Research on Keyword Extraction from Chinese-Vietnamese News Based on Hypergraphs
[Abstract]: With the development of the Belt and Road Initiative, China has begun to pay more attention to Vietnam. News, as a carrier of information, is an important way for people to obtain information. However, Vietnamese is a less commonly used language that few people have mastered, and online news rarely provides keywords, which makes it hard to locate relevant news. Extracting keywords from Chinese and Vietnamese news can save a great deal of time and improve the utilization of information. Existing keyword extraction work usually considers only the features of individual words and ignores the complex relations within news documents, so a model that can express these relations is needed. The hypergraph model can express complex relations among multiple entities, which matches the multiple relationships found in news documents. This thesis therefore uses the hypergraph model to study keyword extraction in single-document, multi-document, and bilingual settings. The main work is as follows:
1. A single-document news keyword extraction method based on hypergraph ranking. Because the hypergraph model can express the relationship between words and sentences in a document, the method first analyzes the structural characteristics of a single document, takes words as vertices, and uses word frequency, part of speech, word span, and position as vertex weights. Each sentence is then treated as a hyperedge, yielding the single-document news hypergraph model.
2. A multi-document news keyword extraction method based on hypergraph ranking. Because a hyperedge in the hypergraph model can represent a news document, the method analyzes how features of the news page affect keyword extraction and uses the time factor and the number of comments as hyperedge feature weights to build the multi-document news hypergraph model.
3. A Chinese-Vietnamese bilingual news keyword extraction method based on hypergraph ranking over multiple documents. Because a hypergraph can express the correspondence between Chinese and Vietnamese words, the method first analyzes the characteristics of bilingual news documents and takes the frequency of bilingual words as the core word feature. Two types of hyperedges are then constructed to build the bilingual news hypergraph model.
Finally, a hypergraph-based random walk algorithm ranks the vertices of the hypergraph, and the top-ranked words are output as the keywords of the news document. Experiments indicate that the method is effective.
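The ranking step described in the abstract (words as vertices, sentences or documents as hyperedges, a random walk over the hypergraph) can be illustrated with a short sketch. This is not the thesis's implementation. The function name `rank_hypergraph`, the damping factor, the iteration count, and the toy input are assumptions made for illustration. Vertex weights are reduced to relative term frequency, and hyperedge weights default to 1.0; in the multi-document setting they would instead be derived from features such as the time factor and the number of comments.

```python
from collections import Counter


def rank_hypergraph(hyperedges, edge_weights=None, damping=0.85, iterations=50):
    """Rank words by a random walk on a hypergraph.

    hyperedges: list of token lists (one per sentence or document).
    edge_weights: one positive weight per hyperedge (assumed 1.0 if omitted).
    Returns a dict mapping word -> score; scores sum to 1.
    """
    if edge_weights is None:
        edge_weights = [1.0] * len(hyperedges)
    # Each hyperedge is the set of distinct words it contains.
    edges = [(set(tokens), w)
             for tokens, w in zip(hyperedges, edge_weights) if tokens]
    if not edges:
        return {}
    vertices = sorted(set().union(*(e for e, _ in edges)))

    # Assumed vertex prior: relative term frequency only.
    tf = Counter(t for tokens in hyperedges for t in tokens)
    total = sum(tf.values())
    prior = {v: tf[v] / total for v in vertices}

    # Weighted vertex degree: total weight of the hyperedges containing v.
    degree = {v: sum(w for e, w in edges if v in e) for v in vertices}

    score = dict(prior)
    for _ in range(iterations):
        # Teleport back to the prior with probability (1 - damping).
        nxt = {v: (1.0 - damping) * prior[v] for v in vertices}
        for e, w in edges:
            # A walker at v picks hyperedge e with probability w / degree[v].
            mass = sum(score[v] * w / degree[v] for v in e)
            # It then moves to a uniformly chosen vertex inside e.
            share = damping * mass / len(e)
            for u in e:
                nxt[u] += share
        score = nxt
    return score


sentences = [
    ["hypergraph", "keyword", "extraction"],
    ["hypergraph", "random", "walk"],
    ["news", "keyword", "hypergraph"],
]
scores = rank_hypergraph(sentences)
top_words = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy input `hypergraph` appears in every sentence, so it receives the highest score. A bilingual variant would add a second type of hyperedge linking corresponding Chinese and Vietnamese words, which this sketch does not attempt.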
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1