Design and Implementation of a Distributed Storage System for Massive Small-File Data
Published: 2018-04-10 14:18
Topic: massive small files + file system; Source: Hunan University, master's thesis, 2013
[Abstract]: In recent years, the growth of the Internet has made the transmission and storage of massive amounts of information increasingly common, and data storage technology has developed rapidly as a result. Because most information on the Internet takes the form of huge numbers of small files, distributed file systems, an important research direction in small-file storage, have become an active research topic. Storing massive numbers of small files in a distributed file system still commonly suffers from low storage performance, low storage space utilization, performance bottlenecks, and single points of failure. Solving these practical problems in the storage and transmission of small-file data is therefore important work in current storage research.

First, to address these problems, this thesis proposes a data blocking scheme for storing massive numbers of small files on a single data node. The scheme describes the concept of a small file and the associated algorithm, and defines three metrics for a file block: intra-block utilization, intra-block correlation rate, and inter-block correlation rate. These three metrics make it possible to quantify how small files are distributed within each file block, to measure the effect of the file blocks on data queries, and then to optimize in a targeted way.

Second, the thesis proposes an algorithm for determining the number of data replicas in small-file storage. The algorithm takes as its parameter the reliability of the data nodes that hold a small file's replicas. From this parameter the reliability of the small file can be determined quickly, and the system uses that reliability to decide whether the current number of replicas is sufficient. On this basis, a flexible weak-consistency maintenance scheme for small-file replicas is proposed.

Third, based on an analysis of the functional and performance requirements of a distributed storage system for massive small files, the thesis presents the framework of the complete small-file storage and management system. The design and implementation cover four main aspects: the data node (DataNode); the data management server (DataServer); management of the file block inverted table, the file inverted table, and the directory; and the corresponding API functions.

Finally, the system was tested to evaluate its overall performance. Analysis and testing of key metrics lead to the conclusion that the system's performance basically meets the design requirements and can satisfy the demands of a real environment.
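
The abstract does not give the replica-count algorithm itself, so the following is only a minimal sketch of the general idea it describes: derive a small file's reliability from the reliability of the nodes holding its replicas, then check whether the replica count is sufficient. It assumes independent node failures, so a file survives if at least one replica's node survives. The function names, the `target` threshold, and the greedy "most reliable node first" placement are all illustrative assumptions, not the thesis's method.

```python
from math import prod


def file_reliability(node_reliabilities):
    """Probability that at least one replica survives.

    Assumption (not from the thesis): node failures are independent,
    so the file is lost only if every node holding a replica fails.
    """
    return 1.0 - prod(1.0 - p for p in node_reliabilities)


def needs_more_replicas(node_reliabilities, target):
    """True if the current replica set falls short of the target reliability."""
    return file_reliability(node_reliabilities) < target


def replicas_needed(candidate_reliabilities, target):
    """Smallest replica count that reaches the target reliability.

    Hypothetical placement policy: use the most reliable candidate
    nodes first. Returns None if even all candidates are not enough.
    """
    chosen = []
    for p in sorted(candidate_reliabilities, reverse=True):
        chosen.append(p)
        if file_reliability(chosen) >= target:
            return len(chosen)
    return None


if __name__ == "__main__":
    current = [0.9, 0.9]  # reliabilities of nodes holding replicas
    print(file_reliability(current))
    print(needs_more_replicas(current, target=0.995))
    print(replicas_needed([0.9, 0.8, 0.95], target=0.99))
```

Under this model, replicas on more reliable nodes count for more, so a file on highly reliable nodes may need fewer copies than one on unreliable nodes to reach the same target.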
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
Article ID: 1731554
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1731554.html