Recognition of Complex Organization Names in Khmer Based on Markov Logic Networks
Published: 2017-12-27 19:14
Source: Kunming University of Science and Technology, master's thesis, 2017  Document type: degree thesis
Keywords: Khmer, Tri-training, feature selection, Markov logic network, first-order logic
[Abstract]: As exchanges and cooperation between China and Cambodia grow more frequent, natural language processing for Khmer (Cambodian) is becoming increasingly important. Because languages differ substantially from one another, named entity recognition methods developed for other languages cannot be transferred directly to Khmer. To improve the accuracy of Khmer organization name recognition, this thesis studies two key problems, building a recognition model for Khmer organization names and expanding the organization name corpus, and reports the following results:
(1) A Tri-training-based method for recognizing Khmer organization names. An improved Tri-training algorithm combines three different classifiers, based on conditional random fields (CRF), support vector machines (SVM), and a maximum entropy model, into one classification system. Starting from a small annotated corpus, newly added samples are chosen according to an optimized sample selection strategy, and the experiments take the linguistic characteristics of Khmer into account. The results show that the method can recognize Khmer organization names using only a small amount of annotated data.
(2) A Markov logic network-based method for recognizing complex Khmer organization names. A CRF model first recognizes simple organization names. First-order logic rules are then derived from the linguistic characteristics of Khmer and incorporated into a Markov logic network, and the LazySAT inference algorithm is used to recognize complex organization names. The results show that the method achieves better recognition results for complex Khmer organization names.
(3) A prototype system for Khmer organization name recognition was designed and implemented, providing strong support for research on Khmer named entity recognition.
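The co-training loop behind contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy learners `KNN` and `Centroid` stand in for the CRF, SVM, and maximum entropy classifiers, 2-D points stand in for token features, and the labels `"ORG"` and `"O"` are hypothetical. The sketch also leaves out the error-estimate checks that full Tri-training uses to guard against noisy pseudo-labels, as well as the thesis's own sample selection strategy. It only shows the core idea: each classifier is retrained on the unlabeled samples that the other two agree on.

```python
import math
from collections import Counter

class KNN:
    """Toy k-nearest-neighbour classifier (stand-in for a real model)."""
    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        nearest = sorted(self.data, key=lambda p: math.dist(p[0], x))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

class Centroid:
    """Toy nearest-centroid classifier (stand-in for a real model)."""
    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: math.dist(self.centroids[label], x))

def tri_train(learners, X_lab, y_lab, X_unlab, max_rounds=10):
    """Simplified Tri-training over exactly three learners."""
    for m in learners:
        m.fit(X_lab, y_lab)
    previous = None
    for _ in range(max_rounds):
        preds = [[m.predict(x) for x in X_unlab] for m in learners]
        pseudo = []
        for i in range(3):
            j, k = [t for t in range(3) if t != i]
            # Learner i receives the samples its two peers label identically.
            pseudo.append([(x, preds[j][n]) for n, x in enumerate(X_unlab)
                           if preds[j][n] == preds[k][n]])
        if pseudo == previous:  # pseudo-labels stopped changing
            break
        for m, extra in zip(learners, pseudo):
            m.fit(list(X_lab) + [x for x, _ in extra],
                  list(y_lab) + [label for _, label in extra])
        previous = pseudo
    return learners

def vote(learners, x):
    """Final prediction by majority vote of the three learners."""
    return Counter(m.predict(x) for m in learners).most_common(1)[0][0]

# Hypothetical toy data: a few labeled points plus an unlabeled pool.
X_lab = [(0, 0), (1, 0), (5, 5), (6, 5)]
y_lab = ["O", "O", "ORG", "ORG"]
X_unlab = [(0.5, 0.5), (1, 1), (5, 6), (6, 6), (2, 1)]
learners = tri_train([KNN(1), KNN(3), Centroid()], X_lab, y_lab, X_unlab)
```

Calling `vote(learners, x)` on a new point returns the label that at least two of the three learners agree on. Using three different learner types, rather than one learner trained on three bootstrap samples, mirrors the heterogeneous setup described in the abstract.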
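The second stage in contribution (2) can be sketched in the same spirit. A Markov logic network attaches a weight to each first-order rule, and the most probable labeling is the one that maximizes the total weight of satisfied ground clauses. The abstract does not list the thesis's rules, so everything concrete here is an assumption: the single rule ("a connector token between two organization tokens also belongs to the organization name"), the weights, and the token positions are all hypothetical. The sketch also swaps in exhaustive enumeration for LazySAT, which is only feasible for a handful of tokens; LazySAT instead performs a MaxWalkSAT-style search while grounding clauses lazily.

```python
import itertools

def ground_network(n_tokens, crf_org, connectors,
                   w_prior=0.5, w_evidence=5.0, w_rule=2.0):
    """Build weighted ground clauses over atoms InOrg(0..n_tokens-1).

    Each clause is (weight, literals); a literal (i, v) holds when
    InOrg(i) == v, and a clause is satisfied if any literal holds.
    All weights and the rule itself are hypothetical.
    """
    clauses = []
    for i in range(n_tokens):
        # Prior: most tokens are not part of an organization name.
        clauses.append((w_prior, [(i, False)]))
    for i in sorted(crf_org):
        # Evidence: trust the simple names found by the first-stage CRF.
        clauses.append((w_evidence, [(i, True)]))
    for c in sorted(connectors):
        if 0 < c < n_tokens - 1:
            # Rule: InOrg(c-1) and InOrg(c+1) implies InOrg(c).
            clauses.append((w_rule, [(c - 1, False), (c + 1, False), (c, True)]))
    return clauses

def score(clauses, world):
    """Total weight of the ground clauses satisfied by a truth assignment."""
    return sum(w for w, lits in clauses
               if any(world[atom] == value for atom, value in lits))

def map_world(n_tokens, clauses):
    """Exact MAP state by enumerating all 2**n_tokens assignments."""
    worlds = itertools.product([False, True], repeat=n_tokens)
    return max(worlds, key=lambda world: score(clauses, world))

# Hypothetical sentence of 6 tokens: the CRF marked tokens 1 and 3 as
# simple organization names, and token 2 is a connector word.
clauses = ground_network(6, crf_org={1, 3}, connectors={2})
best = map_world(6, clauses)
org_tokens = [i for i, flag in enumerate(best) if flag]
```

In this toy case the rule outweighs the prior, so the connector at position 2 is pulled into the name and the two simple names merge into one complex span covering tokens 1 to 3.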
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; H613
[Similar literature]
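Related master's theses (top 4)
1 Wang Ruolan (王若蘭); Recognition of Complex Organization Names in Khmer Based on Markov Logic Networks [D]; Kunming University of Science and Technology; 2017
2 Li Xiaolong (李小龍, TRY RATANAK); Research on Sentiment Classification of Khmer News Commentary Texts [D]; Kunming University of Science and Technology; 2017
3 Yang Ying (楊穎); A Study of Khmer Affixes [D]; Yunnan Minzu University; 2013
4 Pan Huashan (潘華山); Research on Khmer Lexical Analysis Methods Based on Conditional Random Fields [D]; Kunming University of Science and Technology; 2014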
Article ID: 1342840
Article link: http://www.sikaile.net/shoufeilunwen/zaizhiboshi/1342840.html