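Research on a Parallel Programming Communication Interface for GPU Clusters
Published: 2018-01-19 15:42
Keywords: GPU cluster; parallel programming; cluster communication; global arrays. Source: Huazhong University of Science and Technology, 2012 master's thesis. Document type: degree thesis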
[Abstract]: Graphics processing units (GPUs) excel at large-scale, data-intensive and data-parallel workloads, and the general-purpose parallel architecture CUDA has made GPUs increasingly common in general-purpose computing. Because GPU clusters offer a high performance-to-cost ratio, they are used more and more widely in high-performance computing. GPU cluster parallel programming, however, has no standard communication model. The vast majority of cluster applications are implemented with CUDA+MPI, and both CUDA and MPI are hard to program: the programmer must understand the GPU hardware architecture and the MPI message-passing mechanism, and must explicitly control data transfers between host memory and device memory and between nodes. GPU cluster parallel programming therefore remains a complex problem for programmers.
CUDAGA, a communication interface for GPU clusters, combines the features of GA (Global Arrays), a shared-memory programming model on top of distributed memory, with those of CUDA. It adopts a shared device-memory approach: inter-node GPU-to-GPU data communication is realized through a global shared address space, and data consistency is maintained through temporary global arrays on the CPU side and global arrays on the GPU side, both managed internally and transparent to the user, which ensures that communicated data is correct. The interface also solves the problem of initializing GPU devices in a multi-process, multi-GPU environment, and offers two ways of keeping track of device usage: a query interface for GPU cluster information and a graphical monitoring interface. In addition, CUDAGA optimizes the array operations of the GA library in terms of both data transfer and compute kernels, and the accelerated function library can be used directly. CUDAGA thus gives users a simple and convenient communication interface for GPU cluster parallel programming that reduces programming difficulty while preserving communication performance, and improves the efficiency with which programmers write GPU cluster applications.
CUDAGA was tested using GPU cluster implementations of Cannon's parallel matrix multiplication algorithm and the Jacobi iterative algorithm. The results for programming complexity and communication performance show that, for applications that use arrays as their basic data structure, exchange large volumes of data between nodes, and perform many data access operations, code written with CUDAGA runs faster than the CUDA+MPI version and is less than half as long, which improves programming efficiency.
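To make the communication pattern of the Cannon test case concrete, the following is a minimal single-process sketch of Cannon's algorithm in Python. It is not the thesis code and does not use the CUDAGA, GA, CUDA, or MPI APIs. The function names (`cannon`, `split_blocks`, `join_blocks`, `matmul_add`) are hypothetical, and list rotations stand in for the block shifts between nodes that a GPU cluster implementation would have to perform.

```python
# Single-process simulation of Cannon's algorithm on a q x q process grid.
# Assumption: each grid cell (i, j) plays the role of one process/GPU that
# owns one block of A, B and C. Rotating the block lists models the
# neighbor-to-neighbor transfers of a real cluster implementation.

def matmul_add(c, a, b):
    """Accumulate c += a * b for dense blocks stored as lists of lists."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]

def split_blocks(mat, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    n = len(mat)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    s = n // q
    return [[[row[j * s:(j + 1) * s] for row in mat[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]

def join_blocks(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    s = len(blocks[0][0])
    return [sum((blocks[bi][bj][r] for bj in range(q)), [])
            for bi in range(q) for r in range(s)]

def cannon(a, b, q):
    """Multiply square matrices a and b using Cannon's block-shift scheme."""
    s = len(a) // q
    A = split_blocks(a, q)
    B = split_blocks(b, q)
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]

    # Initial skew: block row i of A shifts left by i,
    # block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]

    for _ in range(q):
        # Local compute step: every cell multiplies the blocks it holds.
        for i in range(q):
            for j in range(q):
                matmul_add(C[i][j], A[i][j], B[i][j])
        # Communication step: A shifts left by one, B shifts up by one.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return join_blocks(C)

if __name__ == "__main__":
    print(cannon([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))  # [[19, 22], [43, 50]]
```

Each of the q iterations pairs a local block multiplication with one shift of A and one shift of B. In a CUDA+MPI version those shifts are the explicit device-to-host, node-to-node, and host-to-device transfers that the abstract describes as the programmer's burden.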
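[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP338.6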
Article ID: 1444838
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1444838.html