
Research on Technologies for Chip Multiprocessor Systems Based on a Reconfigurable Platform

Published: 2018-04-23 14:01

Topics: chip multiprocessor + dual-mode fusion communication; source: Northeastern University, 2013 doctoral dissertation

[Abstract]: Traditional single-core processors are constrained by power consumption and manufacturing technology, and can no longer meet the demands of high-performance embedded applications by raising the clock frequency. Researchers have therefore turned to chip multiprocessor systems. Compared with a conventional multiprocessor system, a chip multiprocessor integrates its processing units on a single chip, which reduces communication cost, lowers power consumption, and improves overall system performance. Chip multiprocessor systems are thus a direction and an inevitable trend in future computer development. As research has progressed, more and more applications have been mapped onto chip multiprocessor systems, but the problems encountered in doing so have challenged existing system architectures. The core of these problems is how to guarantee and improve the parallel efficiency of the system. This dissertation studies four key areas (communication mechanism, design model, routing algorithm, and topology) and makes the following contributions.

(1) A dual-mode fusion communication mechanism. Inter-processor communication is a key factor in the performance of a chip multiprocessor system, and existing mechanisms suffer from low communication efficiency. The proposed mechanism classifies the data exchanged between processors into control messages and data messages according to their characteristics, and transmits each class over its own independent channel. Building on this mechanism, a copy-and-divide-and-conquer task parallelization model is proposed, which replicates tasks in advance to reduce inter-processor scheduling overhead at run time. The mechanism is implemented on a reconfigurable platform, and a particle filter tracking algorithm is parallelized as a case study. Test results show that the dual-mode fusion communication mechanism significantly improves data exchange between processors, reduces parallel overhead, and improves the overall parallel efficiency of the system.

(2) A multi-level parallel design model. Designing a suitable system architecture and task scheduling scheme for the application at hand is the key to improving the performance of a heterogeneous chip multiprocessor system. Existing design models improve parallelism, but they remain serial at the macro level and parallel only locally. The proposed model decomposes the design of a heterogeneous system into three levels (system level, transaction level, and statement level) and exposes task parallelism by refining the design level by level, improving overall system performance. Based on this model, an AVI video encoding and storage system is designed and implemented on a reconfigurable platform. Test results show that the multi-level parallel model effectively addresses the design problem of heterogeneous chip multiprocessor systems and improves parallel efficiency.

(3) A congestion-aware, locally adaptive routing algorithm. Existing routing algorithms make poor use of the network topology, and packets are prone to local congestion during routing. The proposed algorithm follows a rule of global dimension order with local adaptivity: congestion feedback signals are added between routing nodes to monitor the network state in the neighboring region, and the routing path is adjusted dynamically according to actual conditions. Theoretical analysis and simulation show that the algorithm achieves high data throughput and strong adaptivity. Both the proposed algorithm and the XY routing algorithm are implemented on a reconfigurable platform. Comparative tests show that the proposed algorithm can choose among multiple shortest paths, which reduces the load on any single link, and that it can effectively route around a congested region when congestion occurs, improving the parallelism of the system.

(4) A topology based on the idea of halving. In a NoC-based chip multiprocessor system, the master node exchanges data with other nodes far more often than ordinary nodes exchange data with one another, yet current topology research has not been optimized for this characteristic. A new topology, Half-Mesh, is proposed. By adding long horizontal and vertical links between the head nodes of each row and column and ordinary nodes, it shortens the distance between a head node and the center node of the same dimension, and thereby reduces the average path length of the whole NoC. For the Half-Mesh topology, the HTF-XY routing algorithm is proposed. It adopts a partitioned routing strategy that both shortens path lengths between nodes in different regions and improves routing adaptivity. A 7×7 Half-Mesh topology and the HTF-XY routing algorithm are implemented on a reconfigurable platform. Test results show that the Half-Mesh topology improves the head node's ability to exchange data with other nodes, reduces routing latency across the on-chip network, and improves the parallelism of the system.
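
The abstract does not spell out the routing rule of contribution (3), so the following is only a minimal sketch of the general idea it contrasts: deterministic XY routing versus a minimal adaptive router that consults a congestion signal from its neighbors. The function names (`xy_next_hop`, `adaptive_next_hop`, `route`) and the `is_congested` callback are hypothetical and are not taken from the dissertation. The sketch also ignores deadlock avoidance and virtual channels, which a real NoC router would need to address.

```python
from typing import Callable, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in a 2D mesh


def _step(a: int, b: int) -> int:
    """Move coordinate a one hop toward coordinate b."""
    return a + (1 if b > a else -1)


def xy_next_hop(cur: Node, dst: Node) -> Node:
    """Deterministic XY routing: finish the X offset first, then Y."""
    x, y = cur
    dx, dy = dst
    if x != dx:
        return (_step(x, dx), y)
    if y != dy:
        return (x, _step(y, dy))
    return cur


def adaptive_next_hop(cur: Node, dst: Node,
                      is_congested: Callable[[Node], bool]) -> Node:
    """Minimal adaptive routing (illustrative assumption, not the thesis rule).

    Only hops that reduce the distance to dst are considered, so every path
    stays shortest. The X hop is preferred (dimension order); the Y hop is
    taken instead when the X neighbor reports congestion.
    """
    x, y = cur
    dx, dy = dst
    candidates: List[Node] = []
    if x != dx:
        candidates.append((_step(x, dx), y))
    if y != dy:
        candidates.append((x, _step(y, dy)))
    if not candidates:
        return cur
    for nxt in candidates:
        if not is_congested(nxt):
            return nxt
    # Every productive neighbor is congested: fall back to dimension order.
    return candidates[0]


def route(src: Node, dst: Node,
          next_hop: Callable[[Node, Node], Node]) -> List[Node]:
    """Follow next_hop from src until dst is reached and return the path."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    congested = {(1, 0)}  # hypothetical congested router
    print(route((0, 0), (2, 2), xy_next_hop))
    print(route((0, 0), (2, 2),
                lambda c, d: adaptive_next_hop(c, d, lambda n: n in congested)))
```

In this toy setup both routers take the same number of hops, but XY routing passes through the congested router at (1, 0) while the adaptive variant steps in Y first and avoids it, which matches the abstract's claim that several shortest paths are available to choose from.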

[Degree-granting institution]: Northeastern University
[Degree level]: Doctorate
[Year degree granted]: 2013
[Classification number]: TP332


Article ID: 1792270

Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1792270.html
