Self-Tuning Data Prefetching for Embedded Systems

Published: 2018-03-23 06:09

Topic: data prefetching; Focus: multicore processors; Source: Zhejiang University, 2013 master's thesis; Type: degree thesis

[Abstract]: To address the memory wall in computer systems, modern processors use prefetching, which exploits regular address access patterns in applications to predict memory accesses and reduce the number of cache misses. However, current prefetching techniques in industry and academia have the following problems: (1) applications contain many linked-list pointer patterns, yet the prefetch engines on mainstream commercial processors predict only linear address patterns; (2) existing pointer prefetching methods decide whether a returned value looks like an address, and their prefetch accuracy is low, typically below 10%; (3) on multicore processors, data prefetch engines intensify contention for shared resources, which degrades overall system performance.

This thesis develops a cycle-level software simulator compatible with the MIPS32 instruction set to model the function, timing, and cost of embedded single-core and multicore processors, and uses this platform to explore solutions to the problems above. Based on an analysis of application characteristics and an exploration of the optimization space, it proposes a multi-mode self-tuning data prefetching scheme for embedded single-core processors. Using runtime information collected by hardware, the scheme adaptively adjusts the aggressiveness of its two prefetch modes through special prefetch instructions, and it improves prefetch accuracy by distinguishing linked (pointer-chain) patterns from linear patterns. Running the EEMBC, SPEC CPU2006, and OLDEN benchmarks on the single-core simulator shows that the multi-mode prefetch engine achieves average accuracies of 36%, 40%, and 56%, respectively, whereas content-directed prefetching (CDP) of pointers achieves 8%, 9%, and 24%. Performance improves by 7%, 6%, and 9% relative to stream prefetching, CDP pointer prefetching, and GHB prefetching, respectively.

For multicore, multithreaded workloads, the thesis proposes a prefetching mechanism based on thread classification to reduce the memory-system resource contention caused by data prefetching. The mechanism includes: (1) a filter that notifies the hardware unit to discard prefetch requests that would invalidate data across threads; (2) classification of threads according to runtime information, which is used to adjust the on/off state and aggressiveness of each thread's data prefetch engine and thereby reduce resource conflicts among threads. A 16-core system is modeled and evaluated with PARSEC, SPLASH-2, and scientific computing programs. Compared with the baseline prefetch engine, the filtering mechanism and the thread-classification-based adjustment of the prefetch policy improve system performance by 2% and 6%, respectively. Compared with applying feedback-directed prefetching (FDP) to the baseline prefetch engine, the proposed mechanism improves system performance by 4% and reduces the energy-delay product by 4%.
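
The abstract describes adjusting the aggressiveness of two prefetch modes from hardware runtime statistics but does not give the policy. The following is a minimal Python sketch of one way such accuracy-driven tuning could look. It is not the thesis's algorithm: the `PrefetchMode` and `retune` names, the 25% and 60% thresholds, and the doubling/halving rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrefetchMode:
    """Counters and current aggressiveness for one prefetch mode (hypothetical)."""
    name: str
    degree: int = 2   # prefetches issued per trigger; 0 means the mode is off
    issued: int = 0   # prefetches issued in the current interval
    useful: int = 0   # issued prefetches later hit by a demand access

    def accuracy(self) -> float:
        # Accuracy = useful prefetches / issued prefetches for this interval.
        return self.useful / self.issued if self.issued else 0.0


def retune(mode: PrefetchMode, low: float = 0.25, high: float = 0.60,
           max_degree: int = 8) -> int:
    """Adjust one mode's degree from its measured accuracy, then reset counters.

    The thresholds and the doubling/halving policy are illustrative assumptions.
    """
    if mode.issued:  # leave the degree unchanged if nothing was issued
        acc = mode.accuracy()
        if acc >= high:
            # Accurate mode: prefetch more aggressively, up to a cap.
            mode.degree = min(mode.degree * 2, max_degree)
        elif acc < low:
            # Inaccurate mode: throttle; reaching 0 switches the mode off.
            mode.degree //= 2
    mode.issued = mode.useful = 0
    return mode.degree
```

In this sketch each mode (for example, one `PrefetchMode("linear")` and one `PrefetchMode("pointer")`) is tuned independently at the end of every sampling interval, so an inaccurate pointer mode can be throttled while an accurate linear mode is ramped up. A mode that reaches degree 0 stays off here; a real design would need some way to re-enable it.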
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333


Article ID: 1652221
