Research on Data Deduplication Technology in Information Storage Systems
[Abstract]: Data deduplication is a lossless data compression technique for networked storage systems. It curbs the rapid growth of storage overhead and lowers the cost of building, operating, and managing a storage system. As data volumes grow rapidly, deduplication has attracted wide attention from academia and industry. Open technical problems remain, however, including raising the compression ratio, reducing processing time, and improving data reliability. This thesis addresses them from three angles: the deduplication process itself, data reliability under deduplication, and the data distribution strategy of the storage back end. Using a theoretical analysis model and real data sets, it first studies the factors that affect deduplication effectiveness and finds that the duplication characteristics of the target data have a strong influence. It therefore proposes a deduplication strategy based on duplication characteristics to optimize both the compression ratio and the processing time. The strategy has two main parts: semantic data grouping and progressive determination of the chunking granularity. Semantic grouping uses semantic information to distinguish the duplication characteristics and similarity of the data and partitions the target data into groups. The progressive method then treats each group as its unit of operation and sets the chunking strategy according to that group's duplication characteristics.
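The two-part idea above (group the data first, then pick a chunking granularity per group) can be illustrated with a minimal sketch. It is not the thesis's algorithm: grouping by file extension stands in for semantic grouping, chunks are fixed-size and fingerprinted with SHA-256, and the function names, the candidate sizes, and the 10% gain threshold in `pick_chunk_size` are all hypothetical choices made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def group_by_extension(names):
    """Group file names by lower-cased extension (an assumed stand-in for semantic grouping)."""
    groups = defaultdict(list)
    for name in names:
        groups[os.path.splitext(name)[1].lower()].append(name)
    return dict(groups)


def dedup_ratio(data, chunk_size):
    """Return original size / stored size after fixed-size chunk deduplication."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    unique = {}  # SHA-256 fingerprint -> chunk length
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0


def pick_chunk_size(sample, candidates=(4096, 1024, 256), min_gain=1.1):
    """Hypothetical progressive rule: start coarse, go finer only while the ratio keeps improving."""
    best = candidates[0]
    best_ratio = dedup_ratio(sample, best)
    for size in candidates[1:]:
        ratio = dedup_ratio(sample, size)
        if ratio < best_ratio * min_gain:  # finer chunks no longer pay off
            break
        best, best_ratio = size, ratio
    return best
```

In this sketch a sample drawn from each group would be passed to `pick_chunk_size`, so that highly repetitive groups get fine-grained chunks while groups with little duplication keep coarse chunks and avoid the extra fingerprinting cost.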
Experimental results show that the deduplication strategy based on duplication characteristics achieves better overall performance in compression ratio and processing time than other deduplication solutions. To address data reliability under deduplication, the thesis proposes an optimal-redundancy calculation model that improves the reliability of the target data according to reference heat, i.e. how heavily a data unit is referenced. To make the theoretical model practical in a real storage system, it computes empirical values from a sample space of data units and proposes a data redundancy strategy based on reference heat. The optimal redundancy is configured from the relevant attributes of each data unit (its size and its reference heat) so that the target data set reaches optimal reliability at minimum storage cost. Simulation results demonstrate the feasibility and effectiveness of this redundancy strategy. To address the limited flexibility of current data distribution strategies, the thesis proposes a capacity-aware data distribution strategy that improves storage load balance when physical nodes have unequal storage resources. The strategy covers two cases. When data redundancy is not considered, a capacity-aware data distribution strategy is proposed that builds on the consistent hashing data distribution algorithm and introduces virtualization: storage resources are allocated through virtual nodes, and a load balancing method based on node capacity awareness optimizes the distribution of data load across physical storage nodes.
When data redundancy is considered, a data distribution strategy supporting multiple degrees of redundancy is proposed; it gives the data redundancy policy flexible platform support and improves storage load balance. Simulation results show that both distribution strategies improve storage load balance in their respective application scenarios.
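The general technique behind such a distribution strategy, consistent hashing with virtual nodes allotted in proportion to node capacity, can be sketched as follows. This is a generic illustration, not the thesis's algorithm: the class name `CapacityAwareRing`, the `vnodes_per_unit` parameter, the integer capacity units, and the replica placement rule (walk clockwise until enough distinct physical nodes are found) are all assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class CapacityAwareRing:
    def __init__(self, capacities, vnodes_per_unit=64):
        # capacities: physical node name -> integer capacity units (assumed, e.g. TB).
        # Each node gets virtual nodes in proportion to its capacity.
        points = []
        for node, capacity in capacities.items():
            for i in range(capacity * vnodes_per_unit):
                points.append((_hash(f"{node}#{i}"), node))
        if not points:
            raise ValueError("ring needs at least one node with capacity")
        points.sort()
        self._keys = [position for position, _ in points]
        self._nodes = [node for _, node in points]
        self._physical = len(set(self._nodes))

    def locate(self, key, replicas=1):
        """Return `replicas` distinct physical nodes for `key`, walking clockwise."""
        if not 1 <= replicas <= self._physical:
            raise ValueError("replicas must be between 1 and the number of physical nodes")
        i = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        chosen = []
        while len(chosen) < replicas:
            if self._nodes[i] not in chosen:
                chosen.append(self._nodes[i])
            i = (i + 1) % len(self._nodes)
        return chosen
```

For example, `CapacityAwareRing({"a": 1, "b": 3}).locate("chunk-42", replicas=2)` returns two distinct nodes, and over many keys node `b` is expected to receive roughly three times the primary load of node `a`, since it owns three times as many virtual nodes.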
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP333