Research on Key SIMD Computing Units in High-Performance DSPs

Published: 2018-07-08 14:10

Topics: SIMD, subword parallelism. Source: master's thesis, National University of Defense Technology, 2012

[Abstract]: Embedded processors are increasingly deployed in large-scale and real-time applications, and high-performance functional units are a key foundation for raising processor performance. Using the FT-X high-performance floating-point DSP as its research platform, this thesis studies subword parallelism in functional units in depth and proposes high-performance algorithms for functional units that support subword parallelism.
1) Taking into account the particular characteristics of functional units and targeting different applications, the thesis analyzes the performance of functional units that use subword parallelism, and gives separate speedup analyses for the multiplication and addition units, which are the most common units in a DSP.
2) Based on a detailed analysis of multiplication algorithms, the thesis proposes a multiplication algorithm that supports subword parallelism. It combines a new Booth encoding technique with a structure that separates ES encoding from CS encoding, which gives a speed advantage for wide multiplications. The multiplier supports three operand-width modes: the thesis illustrates a structure that can perform one 64-bit multiplication, four 32-bit multiplications, or sixteen 16-bit multiplications at the same time, on both signed and unsigned operands. To support the use of the multiplication-matrix algorithm in dot-product instructions, the thesis also proposes an overflow-detection compensation technique that solves overflow detection for dot products and matrix multiplication across multiple data paths.
3) The thesis studies algorithms for finite-field multiplication units, parallelizes finite-field algorithms at the subword level, and proposes a finite-field multiplier whose operation width and primitive polynomial are both adjustable. Compared with existing single-function finite-field multipliers, it has an advantage in overall metrics.
4) The thesis analyzes addition algorithms and, after comparing several advanced ones, proposes an addition algorithm that supports subword parallelism. It suits ALUs that implement logic instructions as well as addition and subtraction, scales well, and performs well.
5) These algorithms were applied in the functional units of the FT-X high-performance floating-point processor. The thesis presents the detailed design and simulation-based verification of these units, together with final synthesis results. The proposed subword-parallel multiplier algorithm has a short critical path, rich functionality, and small area; synthesis results show that it raises the speed of the 64-bit SIMD-capable multiplier by about 4%. The proposed subword-parallel adder supports multiple subword-parallel modes with only a small increase in scalar addition delay and embeds result selection inside the arithmetic datapath; compared with the carry-elimination algorithm, its performance is 11% higher. The M unit built on the proposed multiplication algorithm meets the instruction-set requirements of the target applications. Synthesized with the DC synthesis tool in a TSMC 40 nm process, the M unit of the FT-X DSP occupies 142,275 µm², consumes 28.6863 mW of dynamic power, and reaches a maximum frequency of 1 GHz.
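The speedup analysis in item 1) is not spelled out in the abstract. As a rough illustration only, the sketch below applies Amdahl's law to a subword-parallel unit, assuming a 64-bit datapath split into equal lanes and that a fraction `f` of the scalar run time can be packed into those lanes. The function name and parameters are hypothetical, not taken from the thesis.

```python
def subword_speedup(f: float, datapath_bits: int = 64, subword_bits: int = 16) -> float:
    """Amdahl's-law estimate of overall speedup (hypothetical model, not the thesis's).

    f:             fraction of scalar run time spent in lane-parallel operations
    datapath_bits: width of the functional unit (64 bits assumed)
    subword_bits:  width of each packed subword
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if datapath_bits % subword_bits:
        raise ValueError("subword_bits must divide datapath_bits")
    lanes = datapath_bits // subword_bits  # e.g. 64 // 16 = 4 lanes
    # The serial part is unchanged; the packable part runs `lanes` times faster.
    return 1.0 / ((1.0 - f) + f / lanes)


# Example: 80% of the work packs into four 16-bit lanes, roughly 2.5x overall.
print(subword_speedup(0.8, 64, 16))
```

The model deliberately ignores packing, unpacking, and alignment overhead, so it gives an upper bound rather than a prediction for a real DSP workload.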
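Item 2) builds on Booth encoding, but the abstract does not describe the thesis's new Booth variant or its ES/CS encoding split. For background only, the sketch below shows standard radix-4 (modified) Booth recoding, which halves the number of partial products by turning each overlapping 3-bit group of the multiplier into a digit in {-2, -1, 0, 1, 2}. It is the textbook scheme, not the thesis's method; the function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier y into n/2 radix-4 Booth
    digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    if n % 2:
        raise ValueError("n must be even")
    bits = y & ((1 << n) - 1)        # two's-complement bit pattern of y
    digits = []
    prev = 0                         # implicit bit y_{-1} = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1         # y_{2k}
        b1 = (bits >> (i + 1)) & 1   # y_{2k+1}
        # d_k = -2*y_{2k+1} + y_{2k} + y_{2k-1}
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply signed x by n-bit signed y by summing the Booth partial
    products d_k * x * 4**k; each one is 0, +-x or +-2x, shifted by 2k bits."""
    return sum(d * x * (4 ** k) for k, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(7, 4))   # 7 = -1 + 2*4
```

An unsigned n-bit multiplier can reuse the same recoder by zero-extending it to n+2 bits (one extra digit), which is one common way a single array handles both signed and unsigned modes.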
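Item 2) says one structure serves one 64-bit, four 32-bit, or sixteen 16-bit multiplications. One way to see why a single array can do this: the 64 x 64 partial-product array already contains every subword cross product a_i * b_j, and the full product is their shifted sum. Reading the 4 x 32-bit and 16 x 16-bit modes as these cross-product blocks is an assumption made here for illustration; the sketch is unsigned only and the function names are hypothetical.

```python
def split(v: int, width: int) -> list[int]:
    """Split an unsigned 64-bit value into 64 // width subwords, LSB first."""
    if 64 % width:
        raise ValueError("width must divide 64")
    m = (1 << width) - 1
    return [(v >> (width * i)) & m for i in range(64 // width)]


def cross_products(a: int, b: int, width: int) -> dict[tuple[int, int], int]:
    """All subword cross products a_i * b_j (unsigned): 1 block for width=64,
    4 for width=32, 16 for width=16. Hypothetical mapping onto the array."""
    sa, sb = split(a, width), split(b, width)
    return {(i, j): x * y for i, x in enumerate(sa) for j, y in enumerate(sb)}


def full_product_from_blocks(a: int, b: int, width: int) -> int:
    """Rebuild the 128-bit product a*b by shifting each block by width*(i+j)
    bits and summing, i.e. the scalar 64-bit mode of the same array."""
    return sum(p << (width * (i + j))
               for (i, j), p in cross_products(a, b, width).items())


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(len(cross_products(a, b, 16)), full_product_from_blocks(a, b, 16) == a * b)
```

In the scalar mode the blocks are summed with their shifts; in the subword modes the same blocks would be routed out separately, which is what makes sharing the array attractive. Signed subwords need extra sign-correction terms that this sketch omits.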
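The overflow-detection compensation technique in item 2) is not described in the abstract. The sketch below is only a reference model a dot-product unit could be checked against, assuming signed 16-bit inputs and a saturating signed 32-bit result: it accumulates at full precision and checks overflow once at the end. The names and the saturation policy are assumptions.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def wrap32(v: int) -> int:
    """Value a plain 32-bit two's-complement accumulator would hold."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def dot_product_sat32(a: list[int], b: list[int]) -> tuple[int, bool]:
    """Reference model (hypothetical): signed 16-bit dot product with a
    saturating signed 32-bit result. Returns (result, overflow_flag)."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have equal length")
    for v in (*a, *b):
        if not -(1 << 15) <= v < (1 << 15):
            raise ValueError("operands must be signed 16-bit values")
    exact = sum(x * y for x, y in zip(a, b))  # Python ints do not overflow
    if exact > INT32_MAX:
        return INT32_MAX, True
    if exact < INT32_MIN:
        return INT32_MIN, True
    return exact, False


print(dot_product_sat32([-32768] * 4, [-32768] * 4))  # exact sum is 2**32
print(wrap32(4 * (1 << 30)))                          # a wrapped register shows 0
```

The two example cases show why neither naive check works: for four products of 2^30 the wrapped 32-bit total is 0 and looks harmless, while for products 2^30, 2^30 and -1073709056 a per-step check flags the intermediate 2^31 even though the final sum 1073774592 fits.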
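Item 3) describes a finite-field multiplier whose operation width and primitive polynomial are both adjustable. The thesis's hardware structure is not given in the abstract; the sketch below is a bit-serial software model of GF(2^m) multiplication in which `m` and the reduction polynomial `poly` are parameters, plus a lane-wise wrapper for packed operands. It only requires `poly` to be irreducible of degree m; a primitive polynomial such as 0x13 (x^4 + x + 1) or 0x11D qualifies, and the AES polynomial 0x11B (irreducible, not primitive) is used in the tests because its products are well documented. Function names are hypothetical.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m) defined by the degree-m polynomial `poly`
    (bit m set). Shift-and-add with reduction after every shift."""
    if poly >> m != 1:
        raise ValueError("poly must have degree exactly m")
    if a >> m or b >> m:
        raise ValueError("operands must be m-bit field elements")
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> m:           # degree reached m: reduce modulo poly
            a ^= poly
    return result


def gf2m_mul_packed(a: int, b: int, m: int, poly: int, lanes: int) -> int:
    """Subword-parallel form (software model): multiply each m-bit lane of
    the packed words a and b independently in GF(2^m)."""
    mask = (1 << m) - 1
    out = 0
    for k in range(lanes):
        x = (a >> (k * m)) & mask
        y = (b >> (k * m)) & mask
        out |= gf2m_mul(x, y, m, poly) << (k * m)
    return out


print(hex(gf2m_mul(0x57, 0x83, 8, 0x11B)))  # FIPS-197 example {57}*{83}
```

In hardware the loop would be unrolled into an AND/XOR array, and making `poly` a register input rather than fixed wiring is what allows the reduction polynomial to change at run time.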
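Item 4) proposes a subword-parallel adder for an ALU that also performs subtraction. The thesis's design, which embeds result selection in the datapath, is not reproduced here. The sketch below shows only the basic idea behind any partitioned adder: a 64-bit ripple-carry chain whose carry is cut at every lane boundary, with subtraction done as a + ~b + 1 and the +1 injected at each lane's carry-in. The function name and parameters are hypothetical.

```python
def partitioned_add(a: int, b: int, lane_bits: int, subtract: bool = False,
                    width: int = 64) -> int:
    """Bit-level model of a width-bit adder whose carry chain is cut at every
    lane boundary, so each lane_bits-wide lane adds (or subtracts) on its own.
    Results wrap modulo 2**lane_bits within each lane."""
    if width % lane_bits:
        raise ValueError("lane_bits must divide width")
    inv = 1 if subtract else 0
    result = 0
    carry = 0
    for i in range(width):
        if i % lane_bits == 0:
            carry = inv                  # lane boundary: drop carry, inject +1 for subtract
        x = (a >> i) & 1
        y = ((b >> i) & 1) ^ inv         # invert b when subtracting
        s = x ^ y ^ carry                # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))
        result |= s << i
    return result


print(hex(partitioned_add(0xFF, 0x01, 8)))   # carry stops at the 8-bit boundary
print(hex(partitioned_add(0xFF, 0x01, 16)))  # same inputs, 16-bit lanes
```

A real adder would use a parallel-prefix carry network rather than a ripple chain, with the lane-boundary kill applied to the generate/propagate signals, but the lane-wise results are the same.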
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP332

[Citation]: 劉洋徐瑞. Research on Key SIMD Computing Units in High-Performance DSPs [D]. National University of Defense Technology, 2012.

Article ID: 2107693


Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2107693.html
