Research on In-Memory Computing in 2.5D Packaged Systems
Published: 2018-11-06 09:05
[Abstract]: In data-intensive applications, a large share of energy and latency is spent moving data between compute and storage units, which creates the von Neumann bottleneck. The problem persists in systems integrated with 2.5D packaging. To address it, this paper proposes a new hardware acceleration scheme that introduces in-memory computing into the 2.5D system, giving the off-chip memory the ability to compute. The memory is divided into multiple banks that can be accessed in parallel, and configurable acceleration units are placed inside the memory array so that the array's full bandwidth can be used for parallel computation, reducing the latency and energy of data transfer. The architecture is implemented for inverse quantization and inverse transform in H.264 decoding as a case study. Simulation results show that, compared with a conventional software implementation, the scheme achieves a 7.1x performance improvement and saves 80.5% of the energy, with only 2% area overhead.
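The case-study kernel can be illustrated in software. The sketch below is not the authors' hardware design: it implements the standard H.264 4x4 inverse quantization (assuming flat scaling, i.e. no custom scaling matrices, and the ordinary 4x4 residual path; Intra16x16 luma DC and chroma DC handling is omitted) followed by the 4x4 inverse integer transform. The bank partitioning is mimicked with a hypothetical round-robin mapping of blocks to banks; the names `decode_residuals` and `num_banks` are illustrative, since the abstract does not specify how data is laid out across banks.

```python
from typing import List, Tuple

Block = List[List[int]]

# H.264 dequantization factors v(qp % 6, position class), flat scaling assumed.
# Class 0: (even row, even col); class 1: (odd row, odd col); class 2: otherwise.
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def dequantize(levels: Block, qp: int) -> Block:
    """Inverse quantization of a 4x4 block of quantized levels."""
    v = V[qp % 6]
    shift = qp // 6
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                scale = v[0]
            elif i % 2 == 1 and j % 2 == 1:
                scale = v[1]
            else:
                scale = v[2]
            row.append((levels[i][j] * scale) << shift)
        out.append(row)
    return out


def _butterfly(d0: int, d1: int, d2: int, d3: int) -> Tuple[int, int, int, int]:
    """1-D H.264 inverse core transform of four values (uses only adds and shifts)."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return (e0 + e3, e1 + e2, e1 - e2, e0 - e3)


def inverse_transform(d: Block) -> Block:
    """4x4 inverse integer transform: rows, then columns, then (x + 32) >> 6."""
    rows = [list(_butterfly(*r)) for r in d]
    # cols[j][i] holds output element (i, j) after the vertical pass.
    cols = [_butterfly(rows[0][j], rows[1][j], rows[2][j], rows[3][j]) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def decode_residuals(blocks: List[Block], qp: int, num_banks: int = 4) -> List[Block]:
    """Hypothetical bank mapping: block k is stored in bank k % num_banks.

    Each bank's accelerator processes only its local blocks. The banks are
    simulated one after another here; in hardware they would run concurrently.
    """
    banks = [[] for _ in range(num_banks)]
    for k, blk in enumerate(blocks):
        banks[k % num_banks].append((k, blk))
    results = [None] * len(blocks)
    for bank in banks:
        for k, blk in bank:
            results[k] = inverse_transform(dequantize(blk, qp))
    return results


if __name__ == "__main__":
    dc_only = [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(decode_residuals([dc_only], qp=28)[0][0])  # → [16, 16, 16, 16]
```

Because each block's inverse quantization and transform depend only on that block, blocks can be spread across banks without inter-bank communication, which is the property a bank-parallel in-memory accelerator would exploit.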
[Affiliations]: State Key Laboratory of ASIC and System, Fudan University; SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University; SYSU-CMU Shunde International Joint Research Institute, Shunde, Guangdong
[Funding]: SYSU-CMU Shunde International Joint Research Institute project (20150303); Samsung Electronics industry-sponsored project (SLSI-201403DD013)
[CLC number]: TN405
Article No.: 2313838
Article link: http://www.sikaile.net/kejilunwen/dianzigongchenglunwen/2313838.html