Research on Data Deduplication Technology in Information Storage Systems

Published: 2018-11-07 11:31

[Abstract]: Data deduplication is a lossless data compression technique for networked storage systems. It effectively curbs the rapid growth of storage overhead and reduces the cost of building, operating, and managing storage systems. As data volumes grow rapidly, deduplication has attracted wide attention from both academia and industry. Many technical problems remain open, however, such as raising the compression ratio, reducing processing time, and improving data reliability. To address these problems, this dissertation studies three aspects: deduplication processing methods, data reliability under deduplication, and data distribution strategies in the storage back end. Using a theoretical analysis model together with measurements on real data sets, it examines the factors that affect deduplication effectiveness. Because the duplication characteristics of the target data strongly influence the results, a deduplication strategy based on duplication characteristics is proposed to optimize both compression ratio and processing time. The strategy consists mainly of a semantics-based data grouping strategy and a progressive method for determining chunking granularity. The semantics-based grouping strategy uses semantic information to judge the duplication characteristics and similarity of the data and partitions the target data into groups. The progressive granularity method then takes each data group as its unit of operation and sets the chunking strategy according to that group's duplication characteristics.
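The abstract does not spell out the grouping rule or the granularity procedure, so the following is only a minimal sketch of the general idea under stated assumptions. The grouping key (file extension), the candidate chunk sizes, the `min_gain` threshold, and the function names `group_by_extension`, `dedup_ratio`, and `pick_chunk_size` are all hypothetical. Fixed-size chunking with SHA-256 fingerprints stands in for whatever chunking method the dissertation actually uses.

```python
import hashlib
import os
from collections import defaultdict


def group_by_extension(files):
    """Assumed 'semantic' grouping: concatenate files that share an extension.

    files maps a file name to its bytes. The extension is only a stand-in
    for richer semantic information.
    """
    groups = defaultdict(bytes)
    for name, content in sorted(files.items()):
        groups[os.path.splitext(name)[1]] += content
    return dict(groups)


def dedup_ratio(data, chunk_size):
    """Return original size divided by the size of the unique chunks."""
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored += len(chunk)
    return len(data) / stored if stored else 1.0


def pick_chunk_size(data, sizes=(4096, 1024, 256), min_gain=1.1):
    """Hypothetical progressive rule: start coarse and move to a finer
    chunk size only while the ratio improves by at least min_gain."""
    best_size = sizes[0]
    best_ratio = dedup_ratio(data, best_size)
    for size in sizes[1:]:
        ratio = dedup_ratio(data, size)
        if ratio < best_ratio * min_gain:
            break  # finer chunks no longer pay for their extra cost
        best_size, best_ratio = size, ratio
    return best_size, best_ratio
```

Calling `pick_chunk_size` once per group returned by `group_by_extension` gives each group its own granularity: groups with little duplication stay at the coarse size and avoid extra fingerprinting work, while highly repetitive groups are chunked more finely. The sketch ignores per-chunk metadata overhead, which a real system would have to weigh against the gain.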
Experiments show that the duplication-characteristic-based strategy achieves better overall performance in compression ratio and processing time than other deduplication solutions. To address data reliability under deduplication, an optimal redundancy calculation model is proposed that improves the reliability of target data according to its reference popularity. To make the theoretical model applicable to real storage systems, it is made practical by computing empirical values from a sample space of data units, and a data redundancy strategy based on reference popularity is proposed. This strategy configures the optimal redundancy for each data unit according to its attributes (its size and its reference popularity), so that the target data set obtains the best data reliability at the minimum storage overhead. Simulation experiments verify the feasibility and effectiveness of the strategy. To address the limited flexibility of current data distribution strategies, a capacity-aware data distribution strategy is proposed to improve storage load balance when physical nodes have unequal storage resources. The strategy covers two cases. When data redundancy is not considered, a capacity-aware data distribution strategy is proposed that builds on the consistent hashing data distribution algorithm and introduces a virtualization-based design: storage resources are allocated by assigning virtual nodes, and a load balancing method based on node capacity awareness adjusts the data load distribution among physical storage nodes.
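As an illustration of capacity-aware virtual node allocation on a consistent hashing ring, here is a minimal sketch. It assumes that each physical node receives virtual nodes in proportion to an integer capacity. The class name `CapacityAwareRing`, the `vnodes_per_unit` parameter, and the choice of SHA-256 as the ring hash are assumptions for illustration, not details taken from the dissertation. The sketch covers only the initial allocation, not the subsequent load balancing adjustment.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class CapacityAwareRing:
    """Consistent hashing ring whose virtual node count follows capacity."""

    def __init__(self, capacities, vnodes_per_unit=50):
        # capacities: physical node name -> integer capacity (arbitrary units).
        points = []
        for node, capacity in capacities.items():
            for i in range(capacity * vnodes_per_unit):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [owner for _, owner in points]

    def locate(self, key):
        """Return the physical node that owns the first virtual node
        clockwise from the key's position."""
        index = bisect.bisect_right(self._hashes, _hash(key))
        return self._owners[index % len(self._owners)]
```

With `CapacityAwareRing({"small": 1, "large": 3})`, the node named `large` owns three times as many virtual nodes as `small`, so it should receive a correspondingly larger share of keys on average. The exact split depends on where the hash values fall.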
When data redundancy is considered, a data distribution strategy supporting multiple redundancy levels is proposed; it gives the data redundancy strategy a flexible platform and also improves storage load balance. Simulation results show that both distribution strategies improve the balance of stored data load in their respective application settings.
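One common way to support a different replica count per data unit on a hash ring is to walk clockwise from the key's position and collect distinct physical nodes. The sketch below assumes that approach. The function names `build_ring`, `place`, and `replicas_for` are hypothetical. `replicas_for` is only a placeholder for the optimal redundancy model, whose formula the abstract does not give: it uses the reference count alone and ignores data unit size.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def build_ring(capacities, vnodes_per_unit=50):
    """Sorted list of (position, physical node); virtual node count is
    proportional to each node's integer capacity."""
    return sorted(
        (_hash(f"{node}#{i}"), node)
        for node, capacity in capacities.items()
        for i in range(capacity * vnodes_per_unit)
    )


def replicas_for(ref_count, max_replicas=3):
    """Placeholder rule (assumption): one more copy per decimal order of
    magnitude in the reference count, capped at max_replicas."""
    return min(max_replicas, len(str(max(ref_count, 1))))


def place(ring, key, replicas):
    """Return `replicas` distinct physical nodes, walking clockwise."""
    physical = {node for _, node in ring}
    if not 1 <= replicas <= len(physical):
        raise ValueError("replica count must be between 1 and the node count")
    # (h,) sorts before every (h, node), so this finds the first virtual
    # node at or after the key's position.
    start = bisect.bisect_left(ring, (_hash(key),))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == replicas:
                break
    return chosen
```

For example, `place(ring, "chunk-42", replicas_for(500))` asks for three copies of a heavily referenced chunk, while a chunk referenced once gets a single copy. Skipping virtual nodes that map to an already chosen physical node keeps the copies on separate machines.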
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP333

Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2316231.html
