Research and Implementation of Key Technologies for Storage Management of Massive Virtual Identity Data

Published: 2018-08-03 20:06

[Abstract]: With the rapid development of computer networks and their applications, more and more network platforms and applications have appeared, and users may generate a large amount of virtual identity application information across them. Both static data, such as registered accounts, and user interaction data, such as messages, count as virtual identity application information, and the total volume stored reaches the TB or even PB level. In the Web 2.0 era, Internet applications must handle large amounts of user-created or user-shared data such as pictures, videos, and blog posts, which differ in type, format, and size. This combination of large volume, diverse types, and varying sizes poses a severe challenge for massive data storage and management. This thesis is based on the 863 major project *** Network Identity Management and Application Technology, specifically its subproject *** Virtual Identity Management. The subproject's main function is to obtain virtual identity data from different platforms by various means, manage the data uniformly, and provide interfaces to real network platforms and applications to support searching and tracing. This thesis studies the key storage technologies for virtual identity data. It addresses and implements the storage data model, data partitioning and data replication in a distributed environment, and multi-dimensional indexing and caching to improve query efficiency, and it evaluates them through simulated operation in a virtual identity tracing system, providing the storage foundation the project requires. The work is built on the Cassandra database, and the main contributions are as follows. (1) Storage: because virtual identity data is large in volume and involves fuzzy queries, a data model combining the MySQL and Cassandra databases is proposed. For the distributed environment, data partitioning and data backup are considered, and the thesis designs and implements a data partitioning method based on a weighted, improved consistent hashing algorithm and a data replica strategy that combines data scale with changes in hot spots. (2) Query: to handle virtual identity queries that specify no column and to locate machine nodes quickly and accurately, the thesis designs and implements a multi-dimensional index that combines Cassandra indexes, an inverted index, and a node index. Based on the locality principle of request access, it also designs and implements a semantic caching technique tailored to the characteristics of virtual identity data. (3) System implementation: using the virtual identity tracing system as the testbed, performance tests are run on the storage-side data model, data partitioning approach, and data replica strategy, and on the query-side multi-dimensional index and semantic cache. The tests show that these methods are effective in improving system efficiency.
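The abstract names a weighted, improved consistent hashing algorithm for data partitioning but does not give its details. As a rough illustration of the general idea only, the sketch below shows one common way to weight a consistent-hash ring: each node receives a number of virtual nodes proportional to its weight. The class name `WeightedHashRing`, the `vnodes_per_weight` parameter, the node names, and the use of MD5 are all assumptions made for this example; none of them is taken from the thesis, and this is not the thesis's improved algorithm.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is used only to spread keys)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class WeightedHashRing:
    """Hypothetical consistent-hash ring; a node's weight sets its virtual-node count."""

    def __init__(self, vnodes_per_weight: int = 100):
        self.vnodes_per_weight = vnodes_per_weight
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name

    def add_node(self, node: str, weight: int = 1) -> None:
        # A heavier node places more virtual nodes and so owns more of the ring.
        for i in range(weight * self.vnodes_per_weight):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # very unlikely collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Drop only this node's positions; all other keys keep their owner.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # The first position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = WeightedHashRing()
    ring.add_node("node-a", weight=1)
    ring.add_node("node-b", weight=3)
    counts = {"node-a": 0, "node-b": 0}
    for i in range(10000):
        counts[ring.get_node(f"account-{i}")] += 1
    print(counts)
```

In this sketch, a node with weight 3 should receive roughly three times as many keys as a node with weight 1, although the exact split depends on the hash function. Removing a node reassigns only the keys that node owned, which is the property that makes consistent hashing attractive for partitioning data across a changing set of machines.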
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333

Article number: 2162866


Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2162866.html
