Research on User Semantic Ontology Construction Based on Search Logs

Published: 2018-02-11 08:43

Keywords: user semantic ontology; user logs; concept lattice; formal concept analysis; WordNet　Source: Xihua University, 2012 master's thesis　Document type: degree thesis

[Abstract]: In recent years, with the rapid development of Internet information technology, the information resources on the Internet have reached a massive scale and continue to grow exponentially. The structural complexity of massive web page data, together with the brevity and semantic ambiguity of user queries, poses great challenges for existing search engines. How a retrieval system can accurately understand the information need behind a user's query and return different results for different users, that is, provide personalized service, is a question of growing concern to users. To provide personalized service, it is necessary to mine each user's relevant domain background knowledge and supply the search engine with a user-oriented knowledge model, namely a user ontology. Search engines generally collect large volumes of user search logs, which record users' historical query terms and the pages they clicked; analyzing these historical data makes it possible to mine a user's domain background knowledge. As a key technology of the Semantic Web, an ontology provides the vocabulary and formal concepts of a domain, which makes sharing and exchanging information easier.

The main work of this thesis is as follows. First, it proposes a novel method for computing the semantic similarity of user query terms and uses the AGNES (Agglomerative Nesting) hierarchical clustering algorithm to group a user's query terms into topics according to that user's personal interests and knowledge background. Three semantic similarity relations between user query terms are defined from the search log: (1) similarity based on the user's original query terms themselves, (2) similarity based on the user's expanded query terms, and (3) similarity based on the URLs the user clicked. After analyzing these three relations, the thesis combines them linearly into a new semantic similarity measure for user query terms. Using this similarity function, AGNES clusters the query terms by the topics reflected in the user's search log, with the aim of eliminating their semantic ambiguity.

Second, the thesis proposes a method that uses the query topic clustering results and the general-purpose WordNet ontology to build a domain knowledge model of the user's query interest topics, namely a User Semantic Ontology. The method proceeds as follows: (1) from the clustering results, generate two formal contexts, one between original query terms and clicked documents and one between expanded query terms and clicked documents; (2) optimize the expanded-query formal context, merge it with the original-query formal context, build a concept lattice, and convert the lattice into an initial ontology using concept-lattice-to-ontology transformation rules; (3) refine the initial ontology with WordNet. The resulting user ontology expresses a user's interests and preferences and can be applied to a topic-specific search engine, raising information collection from keyword-based relevance matching to semantic-level search.

Finally, an application developed with VC++ 6.0 is used for validation. The experiments show that, with the ontology construction method proposed here, the real meaning of user query terms can be better distinguished according to the user's interests and knowledge background, eliminating their semantic ambiguity.
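
The linearly combined similarity and the AGNES clustering step described in the abstract can be sketched roughly as follows. The abstract does not state the component similarity measures, the combination weights, the linkage criterion, or the stopping rule, so the Jaccard overlap, the weights (0.4, 0.3, 0.3), average linkage, and the threshold used here are all assumptions, as are the dictionary keys `terms`, `expanded`, and `urls`. The thesis reports a VC++ 6.0 implementation; Python is used here only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_similarity(q1, q2, weights=(0.4, 0.3, 0.3)):
    """Linear combination of three log-derived similarities.

    Each query is a dict with hypothetical keys 'terms', 'expanded', 'urls'.
    The Jaccard components and the weights are illustrative assumptions.
    """
    w_terms, w_expanded, w_urls = weights
    return (w_terms * jaccard(q1["terms"], q2["terms"])
            + w_expanded * jaccard(q1["expanded"], q2["expanded"])
            + w_urls * jaccard(q1["urls"], q2["urls"]))


def agnes(queries, threshold=0.2):
    """Bottom-up (AGNES-style) clustering with average linkage.

    Starts with one cluster per query and repeatedly merges the most
    similar pair of clusters until no pair reaches `threshold`.
    Returns a sorted list of clusters, each a sorted list of query indices.
    """
    clusters = [[i] for i in range(len(queries))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(query_similarity(queries[i], queries[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        (a, b), best = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy log entries, invented for illustration (not data from the thesis).
log = [
    {"terms": {"jaguar", "speed"}, "expanded": {"cat", "animal"},
     "urls": {"zoo.example/jaguar"}},
    {"terms": {"jaguar", "habitat"}, "expanded": {"cat", "rainforest"},
     "urls": {"zoo.example/jaguar"}},
    {"terms": {"jaguar", "price"}, "expanded": {"car", "dealer"},
     "urls": {"cars.example/jaguar"}},
]
clusters = agnes(log)
```

In this toy log the two animal-related queries share a clicked URL and an expanded term, so they merge into one topic, while the car-related query stays separate even though all three contain the ambiguous word "jaguar". This is the kind of disambiguation the clustering step is meant to provide.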
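
The step from a formal context to a concept lattice can also be illustrated with a small sketch of formal concept analysis. It assumes a context whose objects are query terms and whose attributes are clicked documents, which is one reading of the "query term - clicked document" formal context in the abstract. The brute-force enumeration, the function names, and the toy data are assumptions for illustration, not the thesis's algorithm. The concept-lattice-to-ontology rules and the WordNet refinement are not described in the abstract and are not modeled here.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    `context` maps each object (here a query term) to the set of attributes
    (here clicked documents) it is related to. Brute force over object
    subsets, so only suitable for toy inputs.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs
        # (all attributes when objs is empty).
        shared = set(all_attrs)
        for o in objs:
            shared &= context[o]
        return shared

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset)
            a = extent(b)
            concepts.add((frozenset(a), frozenset(b)))
    return concepts


def hierarchy(concepts):
    """Direct sub-concept/super-concept pairs (the lattice's cover relation)."""
    edges = []
    for lower in concepts:
        for upper in concepts:
            if lower[0] < upper[0]:  # extent is a proper subset
                between = any(lower[0] < mid[0] < upper[0] for mid in concepts)
                if not between:
                    edges.append((lower, upper))
    return edges


# Toy formal context, invented for illustration: query term -> clicked docs.
context = {
    "jaguar": {"d1", "d2", "d3"},
    "rainforest": {"d1", "d2"},
    "habitat": {"d2"},
    "dealer": {"d3"},
}
concepts = formal_concepts(context)
edges = hierarchy(concepts)
```

Each concept pairs a set of query terms with the set of documents they all share, and the cover relation orders concepts from specific to general. An ontology builder could plausibly read these edges as candidate sub-concept links, though the exact transformation rules used in the thesis are not given in the abstract.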
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.1

