Research on Entity Information Query Expansion Algorithms in Object Retrieval
Published: 2019-07-10 08:34
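[Abstract]: This thesis studies entity information expansion algorithms for object retrieval. Information needs have gradually shifted from relatively coarse web page retrieval to object retrieval, which has made entity information extraction one of the core technologies; entity information expansion is an important part of it. The goal of entity information extraction is to automatically build an entity knowledge base containing the attribute information associated with each entity. The entity information query expansion studied here has two goals. The first is to enrich entity query terms: when the information in a query term is incomplete, the query term is expanded to remove its ambiguity and make the query intent explicit. The second is to expand coreferent information such as entity aliases, so that information attached to different entity expressions that refer to the same thing can be merged and shared.
The main work of this thesis is as follows.
First, object retrieval is compared with traditional information retrieval, with emphasis on the differences and connections between entity information expansion and traditional query expansion in preprocessing, term selection, relevance computation, and matching methods. On this basis the two main research topics are identified: entity information expansion based on statistical learning, and entity information expansion based on grammatical rules.
Second, for the problem of expanding terms that are highly relevant to an entity, the thesis proposes an entity information expansion method based on probability and statistics. It uses relevance feedback together with a hierarchical clustering algorithm to mine co-occurrence relevance between entities and terms within the set of relevant documents, thereby expanding the entity's descriptive information. Based on this model, related terms were expanded for more than two thousand entities and applied to the TREC 2012 Microblog evaluation task; the results verify the effectiveness of the model.
Finally, for information such as entity aliases, synonyms, and identity descriptions, the thesis presents an entity information expansion method based on grammatical rules. After lexical-analysis preprocessing, coreference resolution is performed on entity mentions according to the grammatical features of coreferent expressions, thereby expanding information such as entity aliases. Using this model, good results were obtained in two subtasks of TAC 2012 KBP, which verifies the effectiveness of the model.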
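The abstract names co-occurrence relevance mining over a feedback document set but does not give the scoring formula. The following is only a minimal sketch of the general idea, assuming documents are already tokenized and using document-level pointwise mutual information (PMI) as a stand-in relevance measure; the function name `expansion_terms` and the parameters `top_k` and `min_df` are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def expansion_terms(entity, docs, top_k=5, min_df=2):
    """Rank candidate expansion terms by document-level PMI with `entity`.

    `docs` is a list of token lists (the feedback documents).
    PMI is an assumed scoring choice, not the thesis's own formula.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    # Document frequency of every term.
    df = Counter(t for s in doc_sets for t in s)
    if df[entity] == 0:
        return []

    # Number of documents in which each term co-occurs with the entity.
    co_df = Counter(
        t for s in doc_sets if entity in s for t in s if t != entity
    )

    scores = {}
    for term, co in co_df.items():
        if df[term] < min_df:
            continue  # skip very rare terms, whose PMI is unreliable
        scores[term] = math.log((co * n_docs) / (df[entity] * df[term]))

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    feedback_docs = [
        ["jaguar", "car", "engine"],
        ["jaguar", "car", "dealer"],
        ["jaguar", "engine"],
        ["banana", "fruit"],
        ["banana", "car"],
    ]
    for term, score in expansion_terms("jaguar", feedback_docs):
        print(f"{term}\t{score:.3f}")
```

In this toy data "engine" ranks above "car" because "car" also appears in a document without the entity, which lowers its association score. A real system would likely need smoothing and a larger feedback set.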
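Figure in text:
Figure caption: Agglomerative hierarchical clustering partition strategy. All documents in this cluster serve as supporting information for the entity; in subsequent steps, entity-specific information is extracted from these documents to expand the entity's information.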
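The agglomerative (bottom-up) clustering strategy mentioned in the figure caption can be illustrated roughly as follows. This is a sketch under stated assumptions, not the thesis's implementation: documents are token lists, similarity is Jaccard overlap, clusters are merged by average linkage, and merging stops at a hypothetical `threshold`; none of these choices are specified in the source.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative_clusters(docs, threshold=0.3):
    """Group document indices bottom-up with average-linkage merging.

    Each document starts as its own cluster; the most similar pair of
    clusters is merged repeatedly until no pair reaches `threshold`.
    """
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Average pairwise similarity between members of two clusters.
        sims = [jaccard(docs[i], docs[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (x, y), best = max(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # x < y is guaranteed by combinations, so deleting y is safe.
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


if __name__ == "__main__":
    sample_docs = [
        ["jaguar", "car", "engine"],
        ["jaguar", "car", "dealer"],
        ["jaguar", "cat", "jungle"],
        ["cat", "jungle", "prey"],
    ]
    print(agglomerative_clusters(sample_docs))
```

With the sample data, the two car-related documents and the two animal-related documents end up in separate clusters; the cluster that matches the intended entity would then supply the supporting documents for extraction.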
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.3
Article ID: 2512481
Article link: http://www.sikaile.net/guanlilunwen/ydhl/2512481.html