Research on Concept-Related Word Groups Based on Chinese Wikipedia

Published: 2018-07-30 06:38

[Abstract]: As the Internet has grown rapidly, demand for information has risen while the volume of information has exploded, making information increasingly hard to collect and find. Finding accurate and comprehensive information within limited time is a major challenge for search technology research, and incorporating semantic knowledge into search engine systems is one important way to improve query efficiency. Words are the smallest unit of semantic representation, but polysemy, aliases, and similar complications leave the meaning of a single word ambiguous, and traditional word relatedness measures do not handle word sense disambiguation well. Traditional methods fall roughly into two groups. The first applies statistical methods to large-scale corpora, but sufficiently large and accurate corpora are scarce in practice. The second relies on manually constructed knowledge systems, which have their own problems, such as small scale and high maintenance cost. In view of these shortcomings and the demand for semantic knowledge in natural language processing, this thesis focuses on word relatedness computation and the mining of concept-related word groups. The specific contributions are as follows.

First, after cleaning and processing Chinese Wikipedia resources, an improved WLVM method is used to build a dataset of word-to-word relatedness. The dataset is evaluated and analyzed, and related word groups are compiled for a number of concepts. A concept's word group can serve as a semantic representation of that concept and can also be applied widely elsewhere in natural language processing, for example in text expansion and knowledge base construction.

Second, a word relatedness computation method is proposed. Building on an analysis of earlier word relatedness methods and a comparison of large-scale corpora, manually constructed knowledge systems, and Wikipedia, the thesis proposes a method for computing semantic relatedness between words that jointly uses links, the category system, text resources, anchor text, and other semantic knowledge, and that disambiguates the relatedness results. In the experiments, the proposed method is used to compute word relatedness on text resources and on the link and category systems separately, and comparison with several other methods demonstrates its effectiveness.
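As a rough illustration of the link-based relatedness the abstract refers to, the sketch below assumes that WLVM denotes the Wikipedia Link Vector Model: each article is represented by a vector of its outgoing links, rarely targeted links are weighted more heavily, and two articles are compared by cosine similarity. This is not the thesis's improved method, whose details the abstract does not give. The function names and the toy link graph are hypothetical.

```python
import math
from collections import Counter


def link_vectors(outlinks):
    """Build one weighted link vector per source article.

    outlinks maps an article title to the list of titles it links to.
    Weight of a link = (times it occurs) * log(total articles / articles
    linking to the target), so links to rarely targeted pages count more.
    """
    # Assumption: the article universe is every title seen as source or target.
    titles = set(outlinks)
    for targets in outlinks.values():
        titles.update(targets)
    total = len(titles)

    # Number of distinct source articles that link to each target.
    inlinks = Counter(t for targets in outlinks.values() for t in set(targets))

    vectors = {}
    for source, targets in outlinks.items():
        counts = Counter(targets)
        vectors[source] = {
            t: c * math.log(total / inlinks[t]) for t, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy link graph, for illustration only.
toy_outlinks = {
    "Cat": ["Mammal", "Pet"],
    "Dog": ["Mammal", "Pet", "Wolf"],
    "Car": ["Engine", "Wheel"],
}

vectors = link_vectors(toy_outlinks)
cat_dog = cosine(vectors["Cat"], vectors["Dog"])
cat_car = cosine(vectors["Cat"], vectors["Car"])
print(f"Cat-Dog: {cat_dog:.3f}, Cat-Car: {cat_car:.3f}")
```

In this toy graph, "Cat" and "Dog" share link targets and so score above zero, while "Cat" and "Car" share none and score zero. Ranking all other articles by such a score against one concept would be one simple way to obtain a candidate related word group for that concept.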
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1

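[Citing literature]

Top 2 related master's theses

1 Luo Chao; Research on Document Ranking Methods Based on the LDA Model [D]; Central China Normal University; 2013

2 Liu Qiang; Research on Expansion Filtering and Weight Calculation for Query Statements [D]; Central China Normal University; 2013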

Article No.: 2154149

Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2154149.html
