Research on Dynamic Instruction Mapping Algorithms for the EDGE Architecture
[Abstract]: The centralized structures found throughout out-of-order superscalar processors have severely limited further performance gains in microprocessors. EDGE (Explicit Data Graph Execution) is one of the architectural models proposed to address this bottleneck; its structural model abandons the energy-hungry centralized structures of superscalar designs. In a distributed EDGE architecture, instructions are mapped onto multiple tiles and execute concurrently. Passing operands between tiles incurs delay, which degrades performance. An instruction mapping algorithm tries to recover the performance lost to this partitioning by carefully trading off program parallelism against inter-tile communication delay. The TRIPS microprocessor combines an asymmetric topology for its critical resources with a static instruction mapping algorithm (SPDI, Static Placement Dynamic Issue). This leads to substantial load imbalance and operand-network communication hot spots on the ETs (Execute Tiles), which lowers IPC. In this thesis, an EDGE structure similar to TRIPS is implemented in the M5-EDGE simulator to study the dynamic instruction mapping algorithm Deep. Without compiler scheduling, Deep with cyclic mapping reaches 85% and 98.3% of the IPC of SPDI at issue widths of 1 and 2, respectively. Based on the topological positions of the RTs (Register Tiles) and DTs (Data-cache Tiles), three optimizations of Deep mapping are evaluated, which differ in how ETs are prioritized: by ET number, in a zigzag ("之"-shaped) order, and by the sum of hop counts of each block's global communication. At an issue width of 1, the reported reductions in average hop count relative to the basic Deep algorithm are 2.63% and 4.70%, with IPC gains of 1.07% and 2.11%, respectively.
This shows that reducing the hop count of inter-instruction communication under Deep mapping can improve IPC. Under the Deep mapping algorithm, more than 90% of operands are delivered through the local bypass, which greatly reduces the load on the operand network. When the bypass width is twice the issue width, the local operand transfer delay drops almost to 0. Increasing the local bypass width can therefore effectively reduce operand transfer delay. When RTs are assigned to ETs by number, the IPC of the basic Deep mapping algorithm increases by 1.77. For DT placement, two optimizations are evaluated: preferring the ETs near the DTs, and preferring ETs according to the computed sum of VBS hops. These two optimizations raise IPC by 1.17% and 1.89% over basic Deep mapping, respectively. When the RTs and DTs are merged into the ETs to form a 4x4 topology, the IPC of Deep mapping is 97.18% and 113.42% of that of SPDI at issue widths of 1 and 2, respectively. The corresponding ratios reported for ETs are 97.32% and 114.06%, respectively. When topological distances shrink, or when the Deep mapping algorithm optimizes the number of communication hops, system IPC can be improved significantly.
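The relationship between ET selection order and communication hop count can be illustrated with a small sketch. This is not the thesis's Deep algorithm or the M5-EDGE implementation: it assumes a 4x4 mesh of ETs with Manhattan-distance hops, ignores RT and DT positions, and all function names (`ordered_mapping`, `zigzag_order`, `greedy_mapping`, `total_hops`) are hypothetical. It compares a cyclic assignment in ET-number order, a zigzag order, and a greedy placement that puts each instruction on the free tile closest to its already placed producers.

```python
from itertools import product

ROWS, COLS = 4, 4
# Assumed ET numbering: row-major coordinates on a 4x4 mesh.
TILES = list(product(range(ROWS), range(COLS)))


def hops(a, b):
    """Manhattan distance between two tiles (assumed mesh operand network)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def zigzag_order():
    """Visit rows alternately left-to-right and right-to-left."""
    order = []
    for r in range(ROWS):
        cols = range(COLS) if r % 2 == 0 else reversed(range(COLS))
        order.extend((r, c) for c in cols)
    return order


def ordered_mapping(num_instructions, order):
    """Cyclic mapping: instruction i goes to the i-th tile of `order`, wrapping around."""
    return [order[i % len(order)] for i in range(num_instructions)]


def greedy_mapping(num_instructions, edges, slots_per_tile=1):
    """Place each instruction on the free tile with the fewest hops to its placed producers."""
    producers = {}
    for src, dst in edges:
        producers.setdefault(dst, []).append(src)
    placement = {}
    load = {t: 0 for t in TILES}
    for i in range(num_instructions):
        candidates = [t for t in TILES if load[t] < slots_per_tile]
        best = min(
            candidates,
            key=lambda t: sum(
                hops(placement[p], t) for p in producers.get(i, []) if p in placement
            ),
        )
        placement[i] = best
        load[best] += 1
    return [placement[i] for i in range(num_instructions)]


def total_hops(mapping, edges):
    """Sum of hop counts over all producer-to-consumer operand transfers."""
    return sum(hops(mapping[s], mapping[d]) for s, d in edges)


# A dependence chain of 8 instructions: 0 -> 1 -> ... -> 7.
chain = [(i, i + 1) for i in range(7)]
print(total_hops(ordered_mapping(8, TILES), chain))           # 10
print(total_hops(ordered_mapping(8, zigzag_order()), chain))  # 7
print(total_hops(greedy_mapping(8, chain), chain))            # 7
```

In this toy case the row-major cyclic order pays a 4-hop penalty when the chain wraps from the end of one row to the start of the next, while the zigzag and greedy placements keep every transfer to a single hop. Raising `slots_per_tile` lets dependent instructions share a tile, giving zero-hop transfers in this model.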
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP332;TP301.6
Article ID: 2141553
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2141553.html