Performance Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi

Published: 2019-04-21 19:35

[Abstract]: Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific computing, for example in linear system solvers. Traditional SpMV algorithms on the Intel Xeon Phi Many Integrated Core (MIC) architecture suffer from low SIMD utilization, high irregular memory access overhead, and load imbalance, which makes it difficult to exploit the processor's computing power. Targeting the architectural characteristics of Intel Xeon Phi, this paper proposes a general parallel SpMV algorithm based on a blocked, compressed storage representation: (1) building on the ELLPACK storage format, the matrix is partitioned into column blocks and compressed, which increases the density of non-zero elements and improves SIMD utilization; (2) careful data rearrangement preserves the inherent locality of the matrix's non-zero elements, which improves data reuse and reduces memory access overhead; (3) the compressed matrix is divided into approximately equal-sized blocks that are statically assigned in equal amounts to different cores, so that the load across cores is balanced. Experimental results show that, compared with the CSR algorithm in the MKL math library available on Intel Xeon Phi, the proposed algorithm achieves a higher computation-to-memory-access ratio and is on average 2.05 times faster than MKL's CSR algorithm.
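To make the storage formats named in the abstract concrete, the following is a minimal sketch in pure Python. It shows a CSR baseline, plain ELLPACK (not the paper's blocked, compressed variant, whose layout the abstract does not specify), and a static split of rows into near-equal ranges. All function names (`to_csr`, `csr_spmv`, `to_ellpack`, `ell_spmv_rows`, `ell_fill_ratio`, `split_rows`) are hypothetical and chosen for illustration only; the partitioning here is by rows, which is an assumption rather than the paper's block scheme.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A x with A in CSR. Row lengths vary, so the inner loop is irregular."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def to_ellpack(dense):
    """Convert to plain ELLPACK: pad every row to the longest row's nonzero count."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    # Padding slots use column 0 and value 0.0, so they add nothing to the result.
    ell_cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    ell_vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return ell_cols, ell_vals


def ell_spmv_rows(ell_cols, ell_vals, x, start, end):
    """Compute y[start:end] for an ELLPACK matrix. Every row has the same length,
    which is what makes the format friendly to SIMD-style execution."""
    return [
        sum((v * x[j] for j, v in zip(ell_cols[i], ell_vals[i])), 0.0)
        for i in range(start, end)
    ]


def ell_fill_ratio(ell_vals):
    """Fraction of ELLPACK slots holding a nonzero value (1.0 means no padding)."""
    slots = sum(len(r) for r in ell_vals)
    nonzeros = sum(1 for r in ell_vals for v in r if v != 0)
    return nonzeros / slots if slots else 1.0


def split_rows(n_rows, n_workers):
    """Statically split rows into contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

A caller would build the ELLPACK arrays once with `to_ellpack`, obtain row ranges from `split_rows(n_rows, n_cores)`, and run `ell_spmv_rows` on each range independently. A low `ell_fill_ratio` means many padded slots are being multiplied for nothing, which is the kind of waste that increasing the non-zero density in step (1) is meant to reduce.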
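[Author affiliation]: School of Computer Science and Technology, University of Science and Technology of China
[Funding]: Supported by the National High Technology Research and Development Program of China ("863" Program) (2012AA010901, 2012AA010902)
[Classification]: TP332; O241.6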


Article ID: 2462494
