Shared Memory Design for Multiple OpenCL Devices Based on Automatic Memory Access Pattern Analysis
Published: 2018-11-04 20:37
[Abstract]: OpenCL offers good functional portability and is an ideal programming model for master-slave heterogeneous multi-device systems. However, to fully exploit the computing power of the whole heterogeneous system, the programmer must explicitly distribute the workload across devices, control data transfers between devices, and so on, which adds to the programmer's burden. This thesis proposes shared memory for multiple OpenCL devices (OMSM): with runtime support for shared memory, the programmer no longer needs to control data transfers explicitly. OMSM has two main tasks: task partitioning and memory management. Both can be automated because work-groups in the OpenCL programming model are independent. The independence of work-groups in the index space reduces task partitioning to assigning a different number of work-groups to each device, and it also means that the regions written by different work-groups cannot overlap, so the regions that work-groups access are fairly regular. Automating memory access analysis is the key to automating the whole system. The thesis first analyzes the memory access patterns of work-groups and, drawing on the characteristics of kernel programs, proposes a constrained linear abstract description that characterizes the memory access pattern of a kernel's work-groups. To manipulate this abstract description efficiently, we design intersection, normalization, independent-variable elimination, merging, and solving operations, and implement an automatic memory access pattern analysis tool on top of the open-source LLVM compiler framework.
Once the access information is available, the OMSM runtime executes in two phases: one balances the load by profiling each device in the system, and the other uses a segment table to describe how data is distributed across devices and controls data transfers automatically. Experimental results show that OMSM is highly applicable to kernels without indirect accesses and achieves considerable performance improvements on both homogeneous and heterogeneous multi-device platforms.
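The two runtime phases described in the abstract can be illustrated with a small sketch. The code below is not the thesis implementation: the names `partition_work_groups`, `build_segment_table`, and `Segment` are hypothetical, the throughput figures stand in for whatever profiling actually measures, and it assumes a one-dimensional index space in which each work-group writes one contiguous block of `elems_per_group` elements.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    device: int  # index of the device that holds the valid copy (hypothetical field)
    start: int   # first element of the range, inclusive
    end: int     # one past the last element of the range


def partition_work_groups(num_groups, throughputs):
    """Split num_groups work-groups across devices in proportion to throughput."""
    total = sum(throughputs)
    counts = [int(num_groups * t / total) for t in throughputs]
    # Hand out the groups lost to rounding down, fastest devices first.
    leftover = num_groups - sum(counts)
    order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def build_segment_table(counts, elems_per_group):
    """Record which device wrote which contiguous element range."""
    table = []
    first_group = 0
    for device, n in enumerate(counts):
        if n:
            table.append(Segment(device,
                                 first_group * elems_per_group,
                                 (first_group + n) * elems_per_group))
        first_group += n
    return table


if __name__ == "__main__":
    # Assumed profiling result: device 0 is three times as fast as device 1.
    counts = partition_work_groups(10, [3, 1])
    for seg in build_segment_table(counts, elems_per_group=64):
        print(seg)
```

Because work-groups write non-overlapping regions, a table like this is enough to tell a runtime which device holds the current copy of each range, and therefore which ranges would need to be transferred before a later kernel reads them.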
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333
Article ID: 2311067
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2311067.html