Research on Anonymization-Based Privacy-Preserving Data Mining Techniques

Published: 2019-03-09 09:32

[Abstract]: In recent years, information technology and data science have developed rapidly and are increasingly applied across industries. Data mining can uncover latent information and subtle relationships in data and use them to support decision-making. However, sensitive and private information can be mined as well, which creates security risks for data providers. Anonymization is a privacy-protection technique that makes the quasi-identifier information of users in the same equivalence class indistinguishable, so that a user's identity or sensitive information cannot be identified. Most existing algorithms are designed for relational and other structured datasets and cannot be applied directly to the anonymization of transaction datasets. The few anonymization algorithms that target transaction data face problems such as high dimensionality and sensitivity to sparse data, which lead to high time complexity and large information loss. In addition, what counts as sensitive in real data often varies from person to person, and an attacker's background knowledge is often limited. The main work of this thesis is therefore as follows. For anonymized privacy protection of transaction datasets, the thesis proposes PTA, a K-anonymity privacy-preserving data mining algorithm that specifically addresses the high time complexity and large information loss of transaction-dataset anonymization. The problem of minimizing information loss is converted into the shortest-circuit problem from the TSP, the shortest circuit is found with an approach similar to Prim's algorithm, and information loss is then optimized through purpose-designed mapping, voting, and filtering operations, which yields K-anonymity privacy protection for transaction datasets. A divide-and-conquer strategy is also adopted to reduce the algorithm's time complexity.
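The abstract gives no details of PTA's distance measure or of its mapping, voting, and filtering steps, so the following is only a rough Python sketch of the general idea of ordering transactions along a short route and cutting that route into groups of at least K. It swaps in a plain greedy nearest-neighbour heuristic and Jaccard distance, both of which are assumptions made for illustration. The names `jaccard_distance`, `greedy_tour`, and `group_by_k` are hypothetical and do not come from the thesis.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def jaccard_distance(a: Transaction, b: Transaction) -> float:
    """Assumed dissimilarity between two item sets (not taken from the thesis)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_tour(transactions: List[Transaction]) -> List[int]:
    """Order transaction indices with a greedy nearest-neighbour heuristic.

    Starting from index 0, repeatedly append the unvisited transaction that is
    closest to the last one on the route. Ties are broken by the lower index.
    """
    if not transactions:
        return []
    unvisited = set(range(1, len(transactions)))
    tour = [0]
    while unvisited:
        last = transactions[tour[-1]]
        nxt = min(
            unvisited,
            key=lambda i: (jaccard_distance(last, transactions[i]), i),
        )
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def group_by_k(tour: List[int], k: int) -> List[List[int]]:
    """Cut the route into consecutive groups of k indices.

    A short trailing group is merged into the previous one, so every group has
    at least k members whenever the input holds at least k transactions.
    """
    groups = [tour[i:i + k] for i in range(0, len(tour), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


transactions = [
    frozenset({"a", "b"}),
    frozenset({"x", "y"}),
    frozenset({"a", "b", "c"}),
    frozenset({"x", "y", "z"}),
]
tour = greedy_tour(transactions)
groups = group_by_k(tour, 2)
```

Each resulting group is a candidate equivalence class whose members would then be generalized to a common representation. Neighbours on the route tend to be similar, which is why a shorter route loosely corresponds to lower information loss.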
Experiments show that PTA outperforms existing algorithms in both time complexity and information loss. For personalized anonymized privacy protection, the thesis poses, for the first time, the privacy-protection problem for personalized hierarchical transaction datasets. Considering the shortcomings of traditional L-diversity, it also introduces the concept of (L, P)-diversity and, on that basis, proposes a greedy algorithm, Lnn-means. The algorithm first converts the original data into a form resembling a relational dataset through hierarchical generalization and matrix conversion, and uses clustering to group transaction records with high similarity. Finally, it generates equivalence classes by giving priority to transaction records that incur little information loss and satisfy (L, P)-diversity, thereby achieving (L, P)-diversity privacy protection. Lnn-means not only makes up for the shortcomings of traditional L-diversity but also, to some extent, avoids the semantic attacks that L-diversity may face, giving stronger privacy protection and higher security. Overall, the thesis addresses anonymized privacy protection for transaction datasets and for personalized transaction data. Extensive experiments show that the proposed framework and algorithms are feasible and effective in meeting anonymization requirements.
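The abstract does not define (L, P)-diversity, so it cannot be reproduced here. As background, the sketch below checks the classic distinct L-diversity condition that the thesis builds on: every equivalence class must contain at least L distinct sensitive values. The record layout (a quasi-identifier key paired with one sensitive value) and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumed record layout: (generalized quasi-identifier key, sensitive value).
Record = Tuple[Hashable, str]


def equivalence_classes(records: Iterable[Record]) -> Dict[Hashable, List[str]]:
    """Group sensitive values by their shared quasi-identifier key."""
    classes: Dict[Hashable, List[str]] = defaultdict(list)
    for quasi_id, sensitive in records:
        classes[quasi_id].append(sensitive)
    return dict(classes)


def is_distinct_l_diverse(records: Iterable[Record], l: int) -> bool:
    """True if every equivalence class holds at least l distinct sensitive values."""
    return all(
        len(set(values)) >= l
        for values in equivalence_classes(records).values()
    )


records = [
    (("2*", "F"), "flu"),
    (("2*", "F"), "cold"),
    (("3*", "M"), "flu"),
    (("3*", "M"), "flu"),
]
diverse_1 = is_distinct_l_diverse(records, 1)
diverse_2 = is_distinct_l_diverse(records, 2)
```

In this example the second class contains only "flu", so the data is 1-diverse but not 2-diverse. Distinct L-diversity counts values without regard to their meaning, which is one reason semantically similar sensitive values can still leak information.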
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP309; TP311.13

[Citation]: Liu Qiankun; Research on Anonymization-Based Privacy-Preserving Data Mining Techniques [D]; Harbin Institute of Technology; 2017

Article ID: 2437326


Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2437326.html
