Information Protection Based on a Related-Entity Retrieval Model

Published: 2018-05-21 09:42

Topics: information protection + entity retrieval; Source: Fudan University, 2012 master's thesis

[Abstract]: With the development of natural language processing, data mining, and related technologies, and especially the widespread use of search engines, people can efficiently bring together information that was originally scattered, and ordinary users can easily find the information they want on the Web. Powerful Web information retrieval is a double-edged sword, however: as users gain faster access to outside knowledge, it becomes increasingly difficult for them to keep their own private information hidden. Posts that users publish on forums, blogs, social networks, and other Web applications may each be harmless on their own, yet an attacker who uses a search engine to infer related entities from them can still cause the users' information to leak. Traditional information protection has concentrated on databases and on information security: the former mainly studies protecting structured data, and the latter mainly studies securing information along its transmission path. As a key subproject of the 863 Program (National High-Tech R&D Program of China) project "Research on Key Technologies for Web-Based User Data Security Protection", this thesis studies the correlations among sensitive information in large-scale unstructured data and builds a framework for protecting sensitive information in the Internet environment; the related research background lies mainly in information retrieval and natural language processing. Building on search engines and targeting the characteristics of Internet user data, the thesis combines a range of text mining and information retrieval techniques and proposes a multi-angle association model that uses related-entity retrieval to predict potential leaks of user information and thereby protect it. The main work of this thesis includes:
● A review of the state of research on information protection, of traditional protection methods in the database and information security fields, and of the techniques and methods involved in protecting large-scale unstructured data.
● An information protection framework based on a related-entity retrieval algorithm, built around a multi-angle entity association model whose retrieval results are improved by in-depth mining of authoritative homepages.
● The design and implementation, on top of this framework, of an information protection system built on a massive Internet corpus. The system's related-entity retrieval module was evaluated on the TREC 2010 related entity task dataset. Compared with other entity retrieval methods based on BM25 and on a Bayesian model, the proposed method performed better on every evaluation metric, demonstrating the accuracy and applicability of the model and the effectiveness of the method.
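The abstract compares the proposed method against a BM25-based entity retrieval baseline. The sketch below is a minimal, hypothetical illustration of what such a baseline can look like, not the thesis's implementation: each candidate entity is assumed to be represented by a tokenized text profile (for example, search snippets collected about it), and candidates are ranked by their Okapi BM25 score against the query terms. The function name `bm25_rank`, the profile representation, the toy data, and the common defaults k1 = 1.2 and b = 0.75 are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_rank(query: List[str],
              entity_profiles: Dict[str, List[str]],
              k1: float = 1.2,
              b: float = 0.75) -> List[Tuple[str, float]]:
    """Rank candidate entities by the Okapi BM25 score of their text profiles.

    entity_profiles maps an entity name to a tokenized profile (hypothetical:
    e.g. search snippets collected about that entity).
    """
    n = len(entity_profiles)
    avgdl = sum(len(toks) for toks in entity_profiles.values()) / n
    # Document frequency: number of profiles containing each term.
    df = Counter()
    for toks in entity_profiles.values():
        df.update(set(toks))

    scores = {}
    for entity, toks in entity_profiles.items():
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            # IDF variant with +1 inside the log so it is never negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy profiles, one bag of words per candidate entity.
    profiles = {
        "Fudan University": "university shanghai campus students research".split(),
        "Shanghai": "city shanghai china port finance".split(),
        "Peking University": "university beijing campus students".split(),
    }
    for entity, score in bm25_rank(["shanghai", "university"], profiles):
        print(f"{entity}: {score:.3f}")
```

On this toy data the profile that matches both query terms ranks first. Representing each entity by a single concatenated profile keeps the baseline simple; a per-document scoring followed by aggregation would be an equally reasonable variant.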
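The abstract does not specify how the multi-angle association model combines its evidence. Purely as a hypothetical sketch of the general idea, the code below min-max normalizes per-angle scores for each candidate entity, fuses them with a weighted sum, and flags a potential leak when an entity the user wants to keep private ranks among the top-k related entities inferred from the user's public posts. The angle names (`cooccurrence`, `bm25`, `homepage`), the weights, the linear fusion rule, the top-k leak test, and the entity names are all assumptions, not the thesis's model.

```python
from typing import Dict, List, Set, Tuple

Ranked = List[Tuple[str, float]]


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one angle's scores to [0, 1] so angles are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # An angle that cannot tell candidates apart contributes nothing.
        return {entity: 0.0 for entity in scores}
    return {entity: (s - lo) / (hi - lo) for entity, s in scores.items()}


def fuse_angles(angle_scores: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> Ranked:
    """Hypothetical multi-angle fusion: weighted sum of normalized per-angle scores."""
    fused: Dict[str, float] = {}
    for angle, scores in angle_scores.items():
        w = weights.get(angle, 0.0)  # angles without a weight are ignored
        for entity, s in normalize(scores).items():
            fused[entity] = fused.get(entity, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


def predict_leaks(ranked: Ranked, sensitive: Set[str], k: int = 3) -> Set[str]:
    """Return the sensitive entities that appear among the top-k related entities."""
    top_k = {entity for entity, _ in ranked[:k]}
    return top_k & sensitive


if __name__ == "__main__":
    # Hypothetical per-angle scores for candidates related to one user's posts.
    angle_scores = {
        "cooccurrence": {"Employer X": 8.0, "Hometown Y": 3.0, "Hobby Z": 1.0},
        "bm25": {"Employer X": 2.5, "Hometown Y": 4.0, "Hobby Z": 0.5},
        "homepage": {"Employer X": 1.0, "Hometown Y": 0.0, "Hobby Z": 0.0},
    }
    weights = {"cooccurrence": 0.4, "bm25": 0.4, "homepage": 0.2}
    ranked = fuse_angles(angle_scores, weights)
    print(ranked)
    print(predict_leaks(ranked, {"Hometown Y"}, k=2))  # {'Hometown Y'}
```

With these toy scores `Employer X` ranks first, and `Hometown Y` is flagged at k = 2 but not at k = 1. A linear fusion is only one simple choice; learned weights or a probabilistic combination would be equally plausible readings of "multi-angle".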
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year conferred]: 2012
[CLC classification number]: TP311.13



Link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1918685.html
