
A Weibo New-Word Discovery Method Fusing Rules and Statistics

Published: 2018-03-19 01:30

  Topic: Weibo new words  Entry point: word-formation rules  Source: Journal of Computer Applications (《计算机应用》), 2017, Issue 04  Paper type: journal article


[Abstract]: Weibo new words are formed with great freedom and follow highly complex word-formation rules. The traditional C/NC-value method identifies the boundaries of the new words it extracts with low accuracy, and it fails to recognize low-frequency Weibo new words. To address both problems, this paper proposes a Weibo new-word extraction method that combines hand-crafted heuristic rules, an improved C/NC-value algorithm, and a conditional random field (CRF) model. On the one hand, the heuristic rules are word-formation rules obtained by classifying and summarizing Weibo new words, designed from the perspectives of part of speech (POS), character category, and ideographic symbols. On the other hand, the improved C/NC-value method reconstructs the NC-value objective function by introducing statistics such as word frequency, adjacency entropy, and mutual information, and a CRF model is trained to recognize new words, with the aim of improving both boundary-recognition accuracy and the precision of low-frequency new-word recognition. Experimental results show that, compared with traditional methods, the proposed method effectively improves the F-measure of Weibo new-word recognition.
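The abstract names adjacency entropy and mutual information as statistics fed into the reconstructed NC-value objective, but it does not give the objective itself. The following is only a minimal sketch of how those two statistics are commonly computed over raw character strings; the function names, the toy corpus, and the use of corpus character count as the normalizer are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def occurrences(text, s):
    """Yield every start index of s in text, including overlapping matches."""
    start = text.find(s)
    while start != -1:
        yield start
        start = text.find(s, start + 1)


def entropy(counts):
    """Shannon entropy (base 2) of a Counter of neighbor characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacency_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) for a candidate string.

    High entropy on both sides suggests the candidate appears in varied
    contexts, which is a common cue that its boundaries are real.
    """
    left, right = Counter(), Counter()
    for text in corpus:
        for start in occurrences(text, candidate):
            end = start + len(candidate)
            if start > 0:
                left[text[start - 1]] += 1
            if end < len(text):
                right[text[end]] += 1
    return entropy(left), entropy(right)


def pmi(corpus, left_part, right_part):
    """Pointwise mutual information of two adjacent parts.

    Probabilities are estimated as substring frequency divided by the
    total number of characters in the corpus (an assumed normalizer).
    """
    total = sum(len(text) for text in corpus)

    def freq(s):
        return sum(1 for text in corpus for _ in occurrences(text, s))

    joint = freq(left_part + right_part)
    f_left, f_right = freq(left_part), freq(right_part)
    if joint == 0 or f_left == 0 or f_right == 0:
        return float("-inf")
    return math.log2((joint / total) / ((f_left / total) * (f_right / total)))


# Hypothetical toy corpus: "XY" stands in for a candidate new word.
toy_corpus = ["abXYcd", "efXYgh", "abXYgh"]
left_h, right_h = adjacency_entropy(toy_corpus, "XY")
cohesion = pmi(toy_corpus, "X", "Y")
```

In this sketch a candidate such as "XY" scores well when its two halves co-occur far more often than chance (high PMI) and when the characters on either side vary (high adjacency entropy). How the paper weights these statistics against word frequency inside NC-value is not stated in the abstract.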
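[Affiliation]: School of Computer and Information Technology, Beijing Jiaotong University
[Funding]: National Natural Science Foundation of China (61370130, 61473294); Fundamental Research Funds for the Central Universities (2014RC040); International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)
[Classification]: TP391.1; TP393.092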


Article ID: 1632243


Article link: http://www.sikaile.net/guanlilunwen/ydhl/1632243.html

