Research on Cache Subsystem Optimization for On-Chip Multi-Core Processors
[Abstract]: Current on-chip multi-core processors require a large-capacity cache subsystem to bridge the speed gap between fast processor cores and slow off-chip memory. By exploiting the characteristics of on-chip multi-core processors, both the performance and the power consumption of the cache subsystem can be optimized. This thesis studies several mechanisms for optimizing the cache subsystem of on-chip multi-core processors. Specifically, the research covers three topics: 1) designing an efficient multicast routing algorithm to improve the performance of the on-chip network; 2) using an emerging non-volatile memory to build a low-power cache for on-chip multi-core processors; and 3) exploiting thread progress information to design a more efficient cache coherence protocol.

For the first topic, we propose an efficient multicast routing mechanism for on-chip networks. As the number of cores keeps increasing, the on-chip network provides an efficient and scalable communication infrastructure for multi-core processors. In on-chip networks under multi-core architectures, multicast (one-to-many) communication is common. Without support from an effective multicast routing mechanism, a conventional unicast-based on-chip network handles such multicast traffic inefficiently. This thesis presents a network-based multicast routing mechanism called DPM. DPM effectively reduces the average transmission latency of packets and lowers the power consumption of the on-chip network. In particular, DPM makes routing decisions dynamically according to the load-balance level of the current network and the link-sharing characteristics of the multicast traffic.

The second topic is to use an emerging non-volatile memory, spin-transfer torque random access memory (STT-RAM), to design a low-power cache for on-chip multi-core processors. STT-RAM offers fast access speed, high storage density, and negligible leakage power; however, its large-scale adoption as the cache of multi-core processors is limited by its long write latency and high write energy. Recent studies have shown that reducing the data retention time of the STT-RAM storage cell (the magnetic tunnel junction, MTJ) can effectively improve its write performance. An STT-RAM with reduced retention time, however, loses data easily, so its storage cells must be refreshed periodically to avoid data loss. When such STT-RAM is used to build the last-level cache (LLC) of a multi-core processor, frequent refresh operations increase energy consumption and also degrade system performance. This thesis proposes an efficient refresh scheme called CCear that minimizes refresh operations on this type of STT-RAM cache. CCear eliminates unnecessary refresh operations by interacting with the cache coherence protocol and the cache management algorithm.
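As a rough illustration of how coherence-state information can be used to suppress refreshes, the following minimal C++ sketch models a refresh filter for one set of a reduced-retention STT-RAM LLC. The coherence states, the dead-block flag, and the three skip conditions are assumptions introduced for this example; they are not the actual CCear design described in the thesis.

```cpp
// Illustrative only: the state model and skip conditions are assumptions
// made for this sketch, not the thesis's actual CCear algorithm.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Simplified MESI-style coherence states for a last-level cache (LLC) line.
enum class CoherenceState { Invalid, Shared, Exclusive, Modified };

struct CacheLine {
    CoherenceState state;      // current coherence state of the line
    bool predicted_dead;       // hypothetical flag from a dead-block predictor
    uint64_t last_write_cycle; // cycle at which the MTJ cells were last written
};

// Decide whether a reduced-retention STT-RAM line must be refreshed now.
// Hypothetical policy: skip lines that hold no valid data, lines expected to
// be evicted before reuse, and lines whose retention window still covers them.
bool needsRefresh(const CacheLine& line, uint64_t now, uint64_t retention_cycles) {
    if (line.state == CoherenceState::Invalid) return false;         // nothing to preserve
    if (line.predicted_dead) return false;                           // will be evicted anyway
    if (now - line.last_write_cycle < retention_cycles) return false; // data still retained safely
    return true; // otherwise rewrite the cells to restart the retention window
}

int main() {
    std::vector<CacheLine> set(4);                     // one 4-way set, all lines Invalid
    set[1] = {CoherenceState::Modified, false, 1000};  // dirty, written long ago -> refresh
    set[2] = {CoherenceState::Shared, true, 2000};     // predicted dead -> skip
    set[3] = {CoherenceState::Exclusive, false, 9000}; // written recently -> skip

    const uint64_t now = 10000, retention = 5000;
    for (std::size_t i = 0; i < set.size(); ++i)
        std::cout << "way " << i
                  << (needsRefresh(set[i], now, retention) ? ": refresh\n" : ": skip\n");
}
```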
Finally, we propose an efficient coherence protocol adjustment mechanism to optimize the performance of parallel programs running on on-chip multi-core processors. One of the main goals of on-chip multi-core processors is to keep improving application performance by exploiting thread-level parallelism. However, for a multi-threaded program running on such a system, different threads typically show different execution progress because of non-uniform task assignment and contention for shared resources. This non-uniformity of progress is one of the major bottlenecks limiting multi-threaded program performance: at synchronization points such as memory barriers and locks, cores running faster threads must stall and wait for the slower ones, and such idle waiting not only reduces system performance but also wastes energy. This thesis presents a thread-progress-aware coherence adjustment mechanism called TEACA. TEACA dynamically adjusts the handling of each thread's coherence requests according to its progress information, with the goal of improving the utilization efficiency of on-chip network bandwidth. Specifically, TEACA divides threads into two classes: leader threads and laggard threads. Based on a thread's class, TEACA then applies a class-specific policy when serving its coherence requests.
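To illustrate the leader/laggard idea, the sketch below classifies threads by comparing a per-thread progress counter against the mean and gives laggard threads a higher service priority for their coherence requests. The progress metric, the mean-based threshold, and the two-level priority are illustrative assumptions, not TEACA's actual classification or arbitration rules.

```cpp
// Hypothetical illustration of leader/laggard classification; TEACA's real
// progress metric and request-arbitration policy may differ.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

enum class ThreadClass { Leader, Laggard };

// Classify each thread by comparing its progress counter (e.g. instructions
// retired since the last synchronization point) against the mean progress.
std::vector<ThreadClass> classify(const std::vector<uint64_t>& progress) {
    uint64_t sum = 0;
    for (uint64_t p : progress) sum += p;
    const double mean = static_cast<double>(sum) / progress.size();

    std::vector<ThreadClass> cls;
    cls.reserve(progress.size());
    for (uint64_t p : progress)
        cls.push_back(p >= mean ? ThreadClass::Leader : ThreadClass::Laggard);
    return cls;
}

// Hypothetical arbitration: coherence requests from laggard threads receive a
// higher priority so scarce on-chip bandwidth helps the threads that bound the
// arrival time at the next barrier.
int requestPriority(ThreadClass c) {
    return c == ThreadClass::Laggard ? 1 : 0; // higher value = served first
}

int main() {
    std::vector<uint64_t> progress = {12000, 8000, 15000, 7000};
    const std::vector<ThreadClass> classes = classify(progress);
    for (std::size_t t = 0; t < classes.size(); ++t)
        std::cout << "thread " << t << ": "
                  << (classes[t] == ThreadClass::Laggard ? "laggard" : "leader")
                  << " (priority " << requestPriority(classes[t]) << ")\n";
}
```

In a real design, such a classification would presumably be recomputed periodically (for example, at each synchronization epoch) rather than once, but that detail is outside the scope of this sketch.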
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral
【Year degree conferred】: 2013
【CLC number】: TP332