
Optimizing non-coalesced memory access for irregular applications with GPU computing (in English)

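Published: 2021-07-27 08:37
  General-purpose graphics processing units (GPGPUs) can greatly improve the computing performance of regular applications. However, many applications have irregular memory access patterns, which severely limit the performance advantage of GPUs. In recent years, several studies have proposed solutions for removing static irregular memory accesses, but eliminating dynamic irregular memory accesses in software remains a serious challenge. This paper proposes a pure software solution for eliminating dynamic irregular memory accesses, especially indirect memory accesses, that requires neither hardware extensions nor offline profiling. Data reordering and index redirection are proposed to reduce the number of memory accesses and thereby improve GPU kernel performance. To make data reordering more efficient, the reordering operation is offloaded to the GPU to reduce overhead and transfer the data. The overhead of data reordering is reduced further by using Compute Unified Device Architecture (CUDA) streams to run the data reordering and data processing kernels concurrently. With these optimizations, the memory data transfers of the GPU kernels are reduced by 16.7%–50% compared with CUSPARSE-based benchmarks, and kernel performance on an NVIDIA Tesla P4 GPU improves by 9.64%–34.9%.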
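
The idea of data reordering and index redirection can be illustrated with a small CPU-side sketch. This is not the paper's implementation (which performs the reordering on the GPU); it is a minimal model under stated assumptions: a warp of 32 threads, a memory segment holding 8 elements, and the hypothetical helper names `count_transactions` and `reorder_and_redirect`. An indirect access `data[indices[i]]` is rewritten so that elements are stored in the order they are first accessed, and the index array is redirected to the new positions.

```python
def count_transactions(indices, warp_size=32, elems_per_segment=8):
    """Count memory segments touched, warp by warp (simplified model)."""
    total = 0
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        # Each distinct segment touched by a warp costs one transaction.
        total += len({i // elems_per_segment for i in warp})
    return total


def reorder_and_redirect(data, indices):
    """Pack accessed elements in first-access order and remap the indices."""
    new_pos = {}      # old index -> position in the reordered array
    reordered = []    # data laid out in access order
    for i in indices:
        if i not in new_pos:
            new_pos[i] = len(reordered)
            reordered.append(data[i])
    redirected = [new_pos[i] for i in indices]
    return reordered, redirected


# Illustrative input (an assumption): a strided, scattered index pattern.
data = [float(i) for i in range(1024)]
indices = [(i * 37) % 1024 for i in range(256)]

reordered, redirected = reorder_and_redirect(data, indices)
before = count_transactions(indices)
after = count_transactions(redirected)
print(f"transactions before: {before}, after: {after}")
```

The gather `reordered[redirected[i]]` returns the same values as `data[indices[i]]`, but neighbouring threads now read neighbouring elements, so each warp touches fewer segments in this model. On real hardware the benefit depends on how the reordering cost is amortized, which is what the overlap with computation described in the abstract addresses.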

[Source]: Frontiers of Information Technology & Electronic Engineering, 2020, 21(09). Indexed in EI, SCI, CSCD

[Pages]: 18

[Table of contents]:
1 Introduction
2 Related work
3 System analysis and design
    3.1 Analysis of memory access pattern
    3.2 Data reordering with CPU and GPU
    3.3 System design
4 Irregularity elimination
    4.1 Data reordering
    4.2 Index redirection
5 Overhead optimization
    5.1 Overlapping remapping with computation
    5.2 Cache
6 Experiments and evaluation
    6.1 Experimental setup
    6.2 Benefits of data reordering on GPU
        6.2.1 Reduction of the number of memory transactions
        6.2.2 Data reordering varying from CPU to GPU
    6.3 Performance optimization with overlap and cache
        6.3.1 Evaluation of the CG program
        6.3.2 Evaluation of the SP program
        6.3.3 Evaluation of the MD program
    6.4 Overview of results
7 Conclusions



Article number: 3305444


Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/3305444.html

