Research on Chinese-Vietnamese News Keyword Extraction Based on Hypergraphs

Published: 2018-10-10 10:27

[Abstract]: With the rollout of the Belt and Road Initiative, China has been paying increasing attention to Vietnam, and news, as a carrier of information, is an important way for people to obtain it. However, Vietnamese is a minor language that very few people have mastered, and online news almost never provides keywords, which makes locating relevant news difficult. Chinese-Vietnamese news keyword extraction can save a great deal of time and improve information utilization, and it has significant research value now that relations between China and Vietnam are growing closer. Existing keyword extraction work usually considers only the features of individual words and ignores the complex relations present in news documents, so finding a suitable model to express these relations is a pressing problem. A hyperedge in a hypergraph can express a complex relation among multiple entities, which matches the need to represent the many-way relations in news documents. This thesis therefore uses hypergraph models to study keyword extraction in single-document, multi-document, and bilingual settings. The main contributions are as follows:

1. A single-document news keyword extraction method based on hypergraph ranking. Because a hypergraph can express the relations between the words and the sentences of a document, the method first analyzes the structural features of a single document, takes words as vertices, and uses word frequency, part of speech, word span, and position as the word weight. It then takes sentences as hyperedges to build a single-document news hypergraph model.

2. A multi-document news keyword extraction method based on hypergraph ranking. Because a hyperedge can represent an entire news document, the method analyzes how the features of a news web page affect keyword extraction, and uses the page's time and number of comments as the feature weights of the hyperedge to build a multi-document news hypergraph model.

3. A multi-document Chinese-Vietnamese bilingual news keyword extraction method based on hypergraph ranking. Because hyperedges can express the correspondence between Chinese and Vietnamese words and thereby link the two languages, the method first analyzes the characteristics of bilingual news documents and takes bilingual word frequency as the core feature of a word. It then constructs two types of hyperedges to build a bilingual news hypergraph model.

Finally, a hypergraph-based random walk algorithm ranks the vertices of the hypergraph, and the top-ranked words are output as the keywords of the news documents. Experiments demonstrate the effectiveness of the methods.
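The single-document construction and the ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact formulas, so the function names `build_hypergraph` and `rank_vertices`, the use of plain term frequency as the word weight, the uniform hyperedge weights, and the PageRank-style damping factor are all assumptions made for illustration. The walk moves from a word to one of the sentences containing it (in proportion to hyperedge weight) and then to a word in that sentence (in proportion to word weight).

```python
from collections import Counter


def build_hypergraph(sentences):
    """Words become vertices; each non-empty sentence becomes one hyperedge."""
    edges = [frozenset(s) for s in sentences if s]
    # Assumption: term frequency stands in for the richer word weight
    # (frequency, part of speech, span, position) named in the abstract.
    vertex_weight = dict(Counter(w for s in sentences for w in s))
    return edges, vertex_weight


def rank_vertices(edges, vertex_weight, edge_weight=None, damping=0.85,
                  max_iter=200, tol=1e-12):
    """Score vertices with a damped random walk; all weights must be positive."""
    if edge_weight is None:
        edge_weight = [1.0] * len(edges)  # assumption: uniform hyperedge weights
    vocab = sorted(vertex_weight)
    # Index the hyperedges that contain each vertex.
    incident = {v: [] for v in vocab}
    for i, edge in enumerate(edges):
        for v in edge:
            incident[v].append(i)
    edge_mass = [sum(vertex_weight[u] for u in edge) for edge in edges]
    total = sum(vertex_weight.values())
    prior = {v: vertex_weight[v] / total for v in vocab}
    score = dict(prior)
    for _ in range(max_iter):
        new = {v: (1.0 - damping) * prior[v] for v in vocab}
        for v in vocab:
            if not incident[v]:
                # A vertex outside every hyperedge spreads its score via the prior.
                for u in vocab:
                    new[u] += damping * score[v] * prior[u]
                continue
            out = sum(edge_weight[i] for i in incident[v])
            for i in incident[v]:
                p_edge = edge_weight[i] / out  # step 1: pick a hyperedge
                for u in edges[i]:
                    p_vertex = vertex_weight[u] / edge_mass[i]  # step 2: pick a word
                    new[u] += damping * score[v] * p_edge * p_vertex
        delta = sum(abs(new[v] - score[v]) for v in vocab)
        score = new
        if delta < tol:
            break
    # Highest score first; ties are broken alphabetically.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy document: three tokenized sentences (hypothetical data).
sentences = [
    ["hypergraph", "keyword", "news"],
    ["keyword", "extraction", "news"],
    ["keyword", "vietnam"],
]
edges, vertex_weight = build_hypergraph(sentences)
ranking = rank_vertices(edges, vertex_weight)
top_keywords = [word for word, _ in ranking[:2]]
```

Under the same assumptions, the multi-document setting would pass one hyperedge per news document and supply `edge_weight` values derived from page features such as time and number of comments.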
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

Article ID: 2261433

Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2261433.html
