Research on Cache Optimization Strategies and Parallel Simulation for Multithreaded Applications
Published: 2019-06-24 14:55
[Abstract]: Compared with traditional single-core processors, chip multiprocessors (Chip Multi-Processor, CMP) offer lower design complexity, better scalability, and a better performance-to-cost ratio. Driven by process technology and application demands, CMPs have become the mainstream direction for high-performance microprocessors. Most of the design complexity and performance bottlenecks of a multi-core processor lie in the on-chip memory system: raising the cache hit rate and avoiding high-latency off-chip memory accesses are critical to overall system performance, so the on-chip cache hierarchy has become one of the main research topics for multi-core processors. Much academic work has addressed cache optimization for CMPs, but most of it targets multiprogrammed workloads. For multithreaded applications, whether existing cache optimization techniques improve performance, and how to improve it, remain open questions. This thesis studies cache performance optimization and parallel simulation for multi-core processors. Its contributions are as follows:

1. Cache optimization for tiled CMPs. In a tiled CMP, both the communication traffic among tiles and the capacity utilization of the L2 caches are unbalanced. To address this, the thesis proposes ARP, an adaptive replication policy for multithreaded applications. ARP combines the advantages of private and shared L2 caches: it periodically weighs the benefit of replicating cache data against its cost and dynamically controls how much data is replicated among the L2 caches. Experiments show that, in a 16-core configuration, ARP in the best case reduces network traffic by 52% and raises capacity utilization to 58%; it also performs well at reducing the average access distance.

2. Utility-based cache optimization for multithreaded applications. Traditional cache partitioning schemes mostly target multiprogrammed workloads and ignore the difference between the access patterns of shared and private data in multithreaded workloads, which lowers the efficiency with which shared data is used. Based on the access characteristics of the different data types in multithreaded programs, the thesis proposes UPP, a cache management mechanism for multithreaded programs. UPP monitors the utility of shared and private data in the shared cache, allocates cache space to each thread and to shared data accordingly, and combines this allocation with improved insertion and promotion policies in order to maximize overall data utility and filter out low-reuse data. Experiments show that UPP improves performance by about 4.5% over an LRU-based fully shared cache and by about 5.2% over a fairness-based static cache partitioning.

3. Parallel simulation of multi-core processors. As the number of cores in a CMP and the complexity of the interconnect between them grow, multi-core simulators become larger, more complex, and slower. To address this, the thesis uses multithreading to develop ParaNSim, a modular and extensible parallel simulation module. ParaNSim can run as a standalone network-on-chip simulator, be extended with other modules to form a tiled CMP simulator, or be embedded in another simulator as a submodule. Experiments show that ParaNSim achieves peak speedups of 1.44x with 4 worker threads and 2.42x with 8 worker threads.
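The abstract describes ARP only at a high level: it periodically weighs the benefit of replication against its cost and adjusts how much data is replicated. The sketch below illustrates that general idea and is not the thesis's actual mechanism. The class name `ReplicationController`, the integer replication level, the cycle-based benefit and cost counters, and the address-based selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicationController:
    """Hypothetical epoch-based controller (illustrative, not the thesis's ARP)."""

    level: int = 2      # current replication level; 0 means never replicate
    max_level: int = 4  # level == max_level means always replicate
    benefit: int = 0    # cycles saved this epoch by hits on local replicas
    cost: int = 0       # cycles lost this epoch to misses blamed on replicas

    def record_replica_hit(self, saved_cycles: int) -> None:
        # A hit on a local replica avoided a remote L2 access.
        self.benefit += saved_cycles

    def record_capacity_miss(self, penalty_cycles: int) -> None:
        # A miss attributed to capacity that replicas took away.
        self.cost += penalty_cycles

    def should_replicate(self, block_addr: int) -> bool:
        # Assumed selection rule: replicate a fixed fraction of blocks,
        # level / max_level, chosen by block address.
        return (block_addr % self.max_level) < self.level

    def end_epoch(self) -> int:
        # Periodic decision: replicate more if it paid off, less if it hurt.
        if self.benefit > self.cost and self.level < self.max_level:
            self.level += 1
        elif self.cost > self.benefit and self.level > 0:
            self.level -= 1
        self.benefit = 0
        self.cost = 0
        return self.level
```

In a simulator, a controller of this kind would be consulted on each L2 fill (`should_replicate`) and updated once per epoch (`end_epoch`). The epoch length and the way hits and misses are attributed to replicas are left open here because the abstract does not specify them.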
Article No.: 2505146
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2505146.html