GPU-Based Parallel Implementation of TOUGHREACT

Published: 2018-08-24 19:20

[Abstract]: High-performance parallel computing has developed rapidly in recent years. Using new multi-core, many-core, and GPU platforms to efficiently simulate numerical models of physical and chemical states under complex geological conditions has become a topic of growing interest to geoscientists. With the emergence and rapid development of general-purpose GPU computing, more and more researchers use GPUs to accelerate subsurface multiphase flow simulation software in order to meet the demands of large-scale, high-accuracy applications. TOUGHREACT, developed at Lawrence Berkeley National Laboratory, is currently the most widely used simulator for the coupled processes of subsurface multiphase fluid flow and geochemical reactive transport. However, TOUGHREACT runs inefficiently when simulating complex geological problems that require large scale and high accuracy, such as geological storage of carbon dioxide. Accelerating its numerical simulation with GPU parallel computing is therefore of considerable engineering and research value. To that end, this thesis parallelizes TOUGHREACT on a heterogeneous CPU-GPU computing platform.
First, the basic simulation workflow of the software is outlined on the basis of the relevant domain knowledge, and its modular structure is analyzed in detail with reference to existing research. By comparing how the multiphase flow module and the geochemical reactive transport module are solved, and by weighing the size of the linear systems against the concurrency available in the iterative solve within each time step, the multiphase flow simulation is identified as the part better suited to parallel implementation on the GPU.
Many practical problems in the natural and social sciences are modeled numerically with partial differential equations (PDEs) that express conservation of mass and energy, and solving the sparse linear systems produced by discretizing these PDEs is one of the main computational steps. For some large, site-scale problems in particular, solving the sparse linear systems accounts for more than 80% of the run time. This thesis therefore compares the execution time of each module in TOUGHREACT and focuses the parallelization effort on the linear solver.
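To make the link between discretization and sparse systems concrete, here is a minimal sketch in Python. It assumes a hypothetical 1D advection-diffusion model problem with upwind differencing, not TOUGHREACT's actual equations. The function name `assemble_advection_diffusion_csr` and all its parameters are illustrative. The stencil is chosen because, like the multiphase flow matrices, it yields a nonsymmetric matrix.

```python
def assemble_advection_diffusion_csr(n, diffusion=1.0, velocity=1.0, length=1.0):
    """Assemble the n x n matrix of -D u'' + v u' (upwind, v > 0) in CSR form.

    Hypothetical model problem for illustration only. Returns the three CSR
    arrays: nonzero values, their column indices, and the row pointers.
    """
    h = length / (n + 1)                          # grid spacing, n interior nodes
    lower = -diffusion / h**2 - velocity / h      # coefficient of u[i-1]
    diag = 2.0 * diffusion / h**2 + velocity / h  # coefficient of u[i]
    upper = -diffusion / h**2                     # coefficient of u[i+1]

    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        if i > 0:
            values.append(lower)
            col_idx.append(i - 1)
        values.append(diag)
        col_idx.append(i)
        if i < n - 1:
            values.append(upper)
            col_idx.append(i + 1)
        row_ptr.append(len(values))               # end of row i in `values`
    return values, col_idx, row_ptr


values, col_idx, row_ptr = assemble_advection_diffusion_csr(5)
print("nonzeros stored:", len(values), "of", 5 * 5, "dense entries")
print("row pointers:", row_ptr)
```

Each row holds at most three nonzeros however large `n` becomes, so the stored data grows linearly with the number of grid nodes while a dense matrix would grow quadratically.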
Because the coefficient matrices arising in multiphase flow problems are nonsymmetric and not positive definite, the systems are solved with several biconjugate gradient variants from the family of Krylov subspace methods. To avoid sacrificing solver efficiency, the preconditioning step is not ported to the GPU; the CUDA implementation instead targets the two most time-consuming operations in the solve: sparse matrix-vector multiplication (SpMV) and the vector inner product. Once the mapping of each kernel function is fixed, writing the CUDA program is not difficult, but several optimizations significantly improve its performance. This thesis applies the following: choosing a suitable sparse matrix storage format to reduce memory use and host-device data transfer overhead; optimizing memory access by using shared memory and page-locked memory and by merging kernels that execute in sequence, which reduces global memory traffic; optimizing the instruction stream by avoiding unnecessary synchronization and by unrolling loops; and implementing multiple kernel versions together with a decision tree on thread configuration, so that threads are organized to suit the problem size, the processors on the GPU are fully used, and the load is balanced.
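As a CPU-side illustration of why SpMV and the inner product dominate the solve, the following Python sketch builds an unpreconditioned BiCGSTAB iteration from those two primitives. It is not the thesis's CUDA code. The names `spmv_csr`, `dot_tree`, and `bicgstab`, the CSR layout, the 3x3 test matrix, and the omission of preconditioning are all assumptions made for illustration. The comments note where a GPU mapping would typically assign work to threads.

```python
import math


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A x for A stored in CSR. Rows are independent, so on a GPU each
    iteration of the outer loop could be given to one thread (or one warp)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot_tree(a, b):
    """Inner product as a pairwise tree reduction, the pattern a thread block
    would typically run in shared memory after the elementwise products."""
    partial = [ai * bi for ai, bi in zip(a, b)]
    n = len(partial)
    if n == 0:
        return 0.0
    while n > 1:
        half = (n + 1) // 2
        for i in range(n // 2):          # these additions are independent
            partial[i] += partial[i + half]
        n = half
    return partial[0]


def bicgstab(values, col_idx, row_ptr, b, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGSTAB; every step is an SpMV, a dot, or an axpy."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                          # residual for the zero initial guess
    r_hat = list(r)                      # fixed shadow residual
    p = [0.0] * n
    v = [0.0] * n
    rho = alpha = omega = 1.0
    b_norm = math.sqrt(dot_tree(b, b)) or 1.0
    for iteration in range(1, max_iter + 1):
        rho_new = dot_tree(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = spmv_csr(values, col_idx, row_ptr, p)
        alpha = rho_new / dot_tree(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot_tree(s, s)) <= tol * b_norm:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, iteration
        t = spmv_csr(values, col_idx, row_ptr, s)
        omega = dot_tree(t, s) / dot_tree(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if math.sqrt(dot_tree(r, r)) <= tol * b_norm:
            return x, iteration
    return x, max_iter


# Small nonsymmetric test system A x = b with known solution [1, 2, 3]:
# A = [[4, 1, 0], [2, 5, 1], [0, 1, 3]]
values = [4.0, 1.0, 2.0, 5.0, 1.0, 1.0, 3.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [6.0, 15.0, 11.0]

x, iterations = bicgstab(values, col_idx, row_ptr, b)
print("solution:", [round(xi, 8) for xi in x], "iterations:", iterations)
```

Each iteration performs two SpMV calls and several inner products, and the remaining vector updates are cheap by comparison. That is why accelerating those two kernels alone can speed up the whole solver.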
Finally, the parallel preconditioned conjugate gradient solver is integrated into TOUGHREACT. Practical problems of different sizes are simulated on a CPU-GPU platform to test the performance of the parallel BiCG and BiCGSTAB implementations. The experiments show that the parallel linear solver achieves a speedup of up to 3.4x over the serial CPU program, and that the overall solution process of the multiphase flow simulation achieves a speedup of up to 2.8x. These results support the parallelization strategy adopted in this thesis and provide a solid foundation and practical experience for porting the geochemical reactive transport module to the GPU in future work.
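The relationship between the two reported speedups can be explored with Amdahl's law. The Python sketch below rests on an assumption the abstract does not confirm, namely that both maxima (3.4x for the solver and 2.8x overall) come from the same test case. It also treats everything outside the linear solver as unaccelerated. The function names are hypothetical.

```python
def overall_speedup(solver_fraction, solver_speedup):
    """Amdahl's law: whole-program speedup when only the solver is accelerated.

    solver_fraction is the share of serial run time spent in the linear solver.
    """
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


def implied_solver_fraction(overall, solver_speedup):
    """Invert Amdahl's law to recover the solver's share of serial run time."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / solver_speedup)


# Assumption: both reported maxima belong to the same run.
fraction = implied_solver_fraction(overall=2.8, solver_speedup=3.4)
print(f"implied solver share of run time: {fraction:.1%}")

# For comparison: a solver share of exactly 80% with the same solver speedup.
print(f"overall speedup at an 80% share: {overall_speedup(0.8, 3.4):.2f}x")
```

Under these assumptions the implied solver share is roughly 91%, which is consistent with the statement that the linear solve can exceed 80% of the run time for large problems.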
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP338.6



