
Research on the Coordination of Instruction Issue in Data-Parallel Processors

Posted: 2019-01-04 12:03

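[Abstract]: Over the past 20 years, advances in semiconductor process technology and in computer architecture have improved microprocessor performance by more than a thousandfold. Yet the gap between the performance applications demand and the performance processors actually deliver keeps widening. In particular, as further progress in semiconductor processes becomes increasingly difficult and the negative effects of chip power consumption become more pronounced, narrowing this gap has become a difficult and urgent task. Data-parallel processors, which combine multi-core, SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) techniques, offer a promising path to further performance gains because they exploit data parallelism efficiently. It should not be overlooked, however, that these processors still suffer from a problem of coordination in instruction issue. This thesis addresses that problem with instruction-issue techniques as its focus, and strengthens the coordination of instruction issue in data-parallel processors in two respects: efficient integration of multiple instruction-issue modes, and cooperation among hardware resources achieved by overcoming performance bottlenecks. The main results are as follows:

1) A power-aware model for efficiently combining multi-core, SIMD and VLIW in data-parallel processors is analyzed and derived. By adding terms that characterize SIMD and VLIW to Amdahl's law, the thesis applies Amdahl's law to data-parallel processors and derives design guidelines for the number of cores, the SIMD width and the VLIW length. It also identifies the key bottlenecks limiting data-parallel processor performance as serial processing, branch structures, and support for multiple simultaneous SIMD widths.

2) A dual-core framework is proposed that accelerates serial-processing applications and lets control processing cooperate efficiently with data processing. It comprises three key techniques: kernel-level software pipelining, a dynamic decoupling/coupling mechanism, and unified branching with fast data sharing. Kernel-level software pipelining exposes a large amount of parallelism between serial and parallel application kernels, and the dynamic decoupling/coupling mechanism exploits that parallelism efficiently, eliminating the bottleneck posed by serial-processing applications. In addition, unified branching and fast data sharing further improve the framework's performance in the tightly coupled state.

3) An instruction shuffling mechanism is proposed to overcome the bottleneck caused by branch structures. It retains the efficiency of a SIMD structure while gaining the flexibility of a MIMD structure in handling branches: each SIMD lane fetches the instructions that correspond to its own branch outcome, so different branch paths execute in parallel. Because lanes that follow the same branch path still execute in SIMD fashion, the efficiency of the SIMD structure is preserved. The mechanism bridges SIMD and MIMD structures and greatly improves the execution efficiency of data-parallel processors.

4) The instruction shuffling mechanism is extended into a Multiple-SIMD Multiple-Data (MSMD) structure that supports both dynamic and static grouping of SIMD lanes. While handling branches efficiently, MSMD also meets applications' need for multiple simultaneous SIMD widths, allowing several application kernels with different SIMD-width requirements to run in parallel. MSMD additionally improves the instruction-buffer mapping algorithm of the instruction shuffling mechanism, further improving SIMD performance on branches.

5) The dual-core framework and the MSMD structure are combined into a cooperative instruction-issue technique that resolves serial processing, branching and multiple simultaneous SIMD widths together while coordinating the hardware resources of the data-parallel processor. The structure was designed and implemented in a full-chip RTL environment, and the implementation results show that cooperative instruction issue achieves efficient coordination of hardware resources at reasonable overhead.

Data-parallel processor architecture remains an active research topic, and many key problems still call for more systematic and more practically relevant study. Through its model of how multiple instruction-issue modes can be combined, this thesis provides systematic guidance for data-parallel processor design, and it proposes efficient solutions to the key bottlenecks that limit their performance. Verification and evaluation results show that the proposed solutions are effective and can be applied to the design and implementation of future data-parallel processors.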
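Item 1 does not state the thesis's actual formula, so the following is only a hedged illustration of one plausible way to add SIMD and VLIW terms to Amdahl's law. It assumes the serial fraction runs on a single core and benefits only from VLIW issue, capped by the ILP the code exposes, while the parallel fraction f is spread over cores × SIMD lanes × usable VLIW slots: S = 1 / ((1 - f) / min(L, ILP_s) + f / (N · W · min(L, ILP_p))). The parameter names (f, ilp_serial, ilp_parallel, and so on) are assumptions, and power is not modelled.

```python
def speedup(f, cores, simd_width, vliw_len, ilp_serial, ilp_parallel):
    """Amdahl-style speedup over a scalar, single-issue core (hypothetical model).

    f            -- fraction of the original run time that is data-parallel
    cores        -- cores the parallel fraction is spread over
    simd_width   -- SIMD lanes per core
    vliw_len     -- issue slots per VLIW instruction word
    ilp_serial   -- instruction-level parallelism exposed by serial code (assumed)
    ilp_parallel -- instruction-level parallelism exposed by parallel code (assumed)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    # VLIW slots only help as far as the code exposes independent operations.
    serial_rate = min(vliw_len, ilp_serial)
    # The parallel fraction additionally scales with cores and SIMD lanes.
    parallel_rate = cores * simd_width * min(vliw_len, ilp_parallel)
    return 1.0 / ((1.0 - f) / serial_rate + f / parallel_rate)


def speedup_limit(f, vliw_len, ilp_serial):
    """Bound approached as cores * simd_width grows without limit (needs f < 1)."""
    return min(vliw_len, ilp_serial) / (1.0 - f)


if __name__ == "__main__":
    f = 0.95  # hypothetical parallel fraction
    for cores, width in [(1, 1), (4, 8), (16, 16), (64, 64)]:
        s = speedup(f, cores, width, vliw_len=4, ilp_serial=2, ilp_parallel=4)
        print(f"cores={cores:3d} simd={width:3d} speedup={s:6.2f}")
    print(f"limit={speedup_limit(f, vliw_len=4, ilp_serial=2):6.2f}")
```

With f = 0.95 the printed speedups rise steeply at first and then flatten just below the bound 2 / 0.05 = 40, because the serial term (1 - f) / min(L, ILP_s) comes to dominate. Under these assumptions, this matches the abstract's observation that serial processing becomes a key bottleneck once cores and SIMD width are plentiful.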
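The abstract does not describe the shuffling hardware in detail. The minimal, idealized sketch below only illustrates the idea in item 3: lanes are grouped by their branch target, each group issues as one SIMD operation, and different groups run concurrently (the MIMD-like part). It compares this against conventional predicated SIMD, which runs every taken path in turn with the other lanes masked off. The function names, branch-target PCs and path lengths are hypothetical; instruction-buffer ports, the buffer mapping algorithm mentioned in item 4, and reconvergence are not modelled.

```python
from collections import defaultdict


def group_lanes_by_path(branch_targets):
    """Group SIMD lanes that took the same branch target.

    branch_targets[i] is the (hypothetical) PC that lane i jumps to.
    Returns {target_pc: [lane ids]}; each group can issue as one SIMD operation.
    """
    groups = defaultdict(list)
    for lane, target in enumerate(branch_targets):
        groups[target].append(lane)
    return dict(groups)


def cycles_masked_simd(path_lengths, groups):
    # Conventional predicated SIMD: every taken path runs in turn while the
    # lanes on other paths are masked off, so the costs add up.
    return sum(path_lengths[pc] for pc in groups)


def cycles_shuffled(path_lengths, groups):
    # Idealized shuffling: each group fetches its own path and all groups run
    # concurrently, so the branch region costs only its longest taken path.
    return max(path_lengths[pc] for pc in groups)


if __name__ == "__main__":
    # Hypothetical 8-lane example with two branch targets and made-up path lengths.
    targets = [0x40, 0x80, 0x40, 0x40, 0x80, 0x40, 0x80, 0x40]
    lengths = {0x40: 12, 0x80: 7}
    groups = group_lanes_by_path(targets)
    print(groups)
    print("masked SIMD cycles:", cycles_masked_simd(lengths, groups))
    print("shuffled cycles:   ", cycles_shuffled(lengths, groups))
```

In this toy case, predicated SIMD spends 12 + 7 = 19 cycles in the branch region, while the idealized shuffled issue spends 12. Lanes 0, 2, 3, 5 and 7 still share one instruction stream, which is the sense in which the mechanism keeps SIMD efficiency while gaining MIMD-style flexibility.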
[Degree-granting institution]: National University of Defense Technology
[Degree]: Doctorate (PhD)
[Year conferred]: 2013
[Classification number]: TP332



Permalink: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2400263.html
