Research on Hybrid Parallel Molecular Dynamics Computation on Multi-core Clusters
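Published: 2018-02-03 23:58
Keywords: hybrid programming model; multi-core cluster; molecular dynamics; MPI; OpenMP. Source: University of Electronic Science and Technology of China (電子科技大學), 2012 doctoral dissertation. Document type: dissertation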
[Abstract]: With the rapid development of high-performance computers and the growing abundance of computing resources, high-performance computing has become a research focus worldwide. The mainstream architecture of high-performance computers has shifted from massively parallel processors to multi-core clusters, and systems have shifted from a single memory model to a hybrid memory model. Parallel programs designed for these machines must adapt to this change, which has given rise to hybrid parallel programming models. Molecular dynamics (MD) simulation, an important scientific research method, is widely used in many disciplines, so further accelerating MD simulation on multi-core clusters, and thereby advancing research in those fields, has become urgent. However, when designing parallel MD algorithms and other parallel algorithms based on hybrid parallel programming models for multi-core clusters, researchers commonly find that introducing multithreaded parallelism incurs excessive overhead, so that the hybrid model often performs worse than the original pure message-passing model. How to solve this class of problems and speed up scientific and engineering computing programs on multi-core clusters is therefore an important direction of current research.

This dissertation comprehensively and systematically studies the state of research on hybrid parallel programming models and hybrid parallel MD algorithms, together with their shortcomings, and on that basis proposes a series of optimized or improved algorithms for the related problems.

The main contents and contributions are as follows:

(1) It analyzes in depth the basic principles and basic implementation methods of hybrid parallel programming models suited to multi-core clusters and of parallel MD algorithms, laying the foundation for the hybrid parallel MD algorithms on multi-core clusters proposed later.

(2) It demonstrates the scalability problem of using the Critical Section algorithm for multithreaded parallel MD computation. Theoretical analysis and experimental results show that the speedup of the Critical Section algorithm drops markedly when the number of processor cores exceeds 8. It then proposes an optimization called the triangle parallel MD algorithm, which statically assigns sets of atoms so that threads enter the critical section at different times, reducing the idle time of the critical section and speeding up the parallel computation.

(3) It proposes an OpenMP-based parallel MD algorithm, the SPMD-like (Single Program Multiple Data) algorithm. The algorithm adopts the same strategy as an SPMD program, in which each thread processes its own data and redundantly computes cross-region data relationships, yet its implementation is close to a simple OpenMP implementation: the internal computational logic of MD need not be modified; only a few data structures are changed and one spatial decomposition subroutine is added. While retaining the simplicity of an OpenMP implementation, the algorithm achieves parallel performance and scalability close to those of the pure message-passing model.

(4) It proposes a parallel MD algorithm for multi-core clusters based on the hybrid MPI/OpenMP model. Following the principle of keeping modifications as small as possible, the algorithm embeds the SPMD-like algorithm in a pure MPI parallel MD program. The hybrid program uses OpenMP parallelism within each node; while introducing only a small parallel overhead, it markedly reduces inter-node communication time, effectively improving the computing speed and parallel efficiency of the MD program on multi-core clusters.

(5) It proposes a reduction algorithm that avoids critical sections entirely, the block rotation reduction algorithm. While remaining about as simple as the Critical Section algorithm, it offers better parallel performance and scalability. Theoretical analysis and experimental tests show that the algorithm performs well when the number of processor cores per node is 16, but at 32 cores and above its performance is inferior to that of the SPMD-like algorithm. The two algorithms therefore suit different hybrid parallel settings: when a node has few processor cores, the simpler block rotation reduction method can be chosen; when it has many, the better-performing SPMD-like algorithm can be used.

(6) It proposes a parallel MD algorithm based on the hybrid MPI/TBB model and studies its implementation using LAMMPS as an example. Experimental results show that once the number of nodes participating in the computation on a multi-core cluster grows beyond a certain point, the hybrid model achieves better parallel performance than the pure MPI model, mainly because of reduced communication time.
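The abstract names the Critical Section algorithm and the block rotation reduction algorithm but does not describe their internals, so the following Python sketch is only one plausible reading of the two ideas, not the dissertation's implementation. It assumes each thread has already accumulated partial forces into a private array and now has to merge them into one shared array. The function names, the block layout, and the use of Python threads in place of OpenMP are all illustrative assumptions.

```python
import threading


def critical_section_reduce(private):
    """Merge per-thread arrays into one shared array under a single lock.

    Assumed baseline: the lock plays the role of an OpenMP critical
    section, so the merges of different threads are fully serialized.
    """
    n_threads, n = len(private), len(private[0])
    shared = [0.0] * n
    lock = threading.Lock()

    def worker(t):
        with lock:  # only one thread may update `shared` at a time
            for i in range(n):
                shared[i] += private[t][i]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared


def block_rotation_reduce(private):
    """Merge per-thread arrays without any lock (hypothetical scheme).

    The shared array is split into one block per thread. In step s,
    thread t adds its private values for block (t + s) mod n_threads,
    so no two threads ever write the same block in the same step.
    A barrier separates the steps before the blocks rotate.
    """
    n_threads, n = len(private), len(private[0])
    shared = [0.0] * n
    barrier = threading.Barrier(n_threads)
    # Block b covers indices bounds[b] .. bounds[b + 1] - 1.
    bounds = [b * n // n_threads for b in range(n_threads + 1)]

    def worker(t):
        for step in range(n_threads):
            b = (t + step) % n_threads  # distinct block per thread in this step
            for i in range(bounds[b], bounds[b + 1]):
                shared[i] += private[t][i]
            barrier.wait()  # every thread finishes the step before rotating

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared


if __name__ == "__main__":
    # Four "threads", ten force components each; integer-valued floats keep
    # the sums exact regardless of the order in which threads contribute.
    private = [[float((t + 1) * (i + 1)) for i in range(10)] for t in range(4)]
    print(critical_section_reduce(private))
    print(block_rotation_reduce(private))
```

In this reading, the lock-based version serializes every merge, which is consistent with the scalability problem the abstract reports for the Critical Section algorithm. The rotating version replaces the lock with one barrier per step, so its cost grows with the number of threads; that would be consistent with, though it does not by itself establish, the reported drop in performance at 32 or more cores.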
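[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP338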
Article ID: 1488715
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1488715.html