Research on Shallow Parsing Methods for Vietnamese

Published: 2018-01-01 02:08

Keywords: Research on Shallow Parsing Methods for Vietnamese; Source: Kunming University of Science and Technology, 2017 master's thesis; Document type: degree thesis

[Abstract]: As the two countries engage in increasingly frequent contact and deeper cooperation in politics, economics and culture, language exchange has become particularly important. Because the two languages differ greatly, communication barriers arise and become a stumbling block to the development of both countries. At the same time, Vietnamese natural language processing plays a core role in artificial intelligence, and shallow parsing, as a low-level task, is the basis and prerequisite of natural language processing: it affects all subsequent work and serves upper-layer applications. For the two countries to develop better, solving the language problem is imperative, and Chinese-Vietnamese machine translation is becoming increasingly important in addressing it. This thesis studies shallow parsing of Vietnamese and completes the following work:

1. Collecting, organizing and preprocessing corpora of Vietnamese multi-category words (words that can take more than one part of speech), entities and chunks. Corpora are a foundational topic in natural language processing, so building them is especially important. The work mainly builds corpora of Vietnamese multi-category words, entities, an entity lexicon and chunks. The data come mainly from a small amount of publicly available corpora, supplemented by manual annotation and proofreading.

2. A method for disambiguating Vietnamese multi-category words based on conditional random fields (CRFs) is proposed. First, the characteristics of Vietnamese multi-category words are analyzed, effective disambiguation features are selected, and the corresponding feature templates are defined. Second, CRFs are used for statistical modeling, yielding a CRF-based disambiguation model for Vietnamese multi-category words. Resolving multi-category words helps improve part-of-speech tagging accuracy and the quality of the part-of-speech corpus, limits as far as possible the propagation of accumulated errors to later stages, and provides a basis and support for Vietnamese named entity recognition.

3. A hybrid method for Vietnamese named entity recognition that incorporates entity characteristics is proposed. First, based on an analysis of the characteristics of the Vietnamese language and of its entities, global and local features are selected as the effective features, and a Vietnamese entity recognition model based on the maximum entropy model is built. Second, a set of rules for Vietnamese entity recognition is drawn up from the same characteristics. Finally, the maximum entropy model and the rule set are combined to recognize entities. Entities can serve as effective features for chunking and also benefit subsequent work.

4. A Vietnamese chunking method that combines conditional random fields with error-driven learning is proposed. First, based on the characteristics of Vietnamese chunks and of the language, basic features and entity features are selected as the effective features, and CRFs are used to build a statistical chunking model. Second, transformation-based learning is used to obtain a candidate set of transformation rules, which is filtered with an evaluation function to yield the final rule set. Finally, the statistical model and the transformation rules are combined to tag chunks. Chunks, used as effective features for entity recognition, in turn help improve entity recognition accuracy.
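The error-driven step in item 4 can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the rule templates, the tag set, or the evaluation function, so the BIO chunk tags, the single rule template (previous tag, current tag, replacement tag), the "errors fixed minus correct tags broken" score, and the names `apply_rule`, `score_rule`, `candidate_rules` and `learn_rules` are all assumptions made for illustration. The `baseline` lists stand in for the output of a CRF chunker.

```python
def apply_rule(tags, rule):
    """Apply one rule (prev_tag, from_tag, to_tag) to a tag sequence."""
    prev_tag, from_tag, to_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are tested on the input tags, so changes do not cascade.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def score_rule(rule, predicted, gold):
    """Assumed evaluation function: errors fixed minus correct tags broken."""
    net = 0
    for pred, ref in zip(predicted, gold):
        new = apply_rule(pred, rule)
        for old_t, new_t, ref_t in zip(pred, new, ref):
            if old_t != new_t:
                if new_t == ref_t:
                    net += 1
                elif old_t == ref_t:
                    net -= 1
    return net


def candidate_rules(predicted, gold):
    """Propose a rule wherever the baseline disagrees with the gold tags.

    Position 0 has no previous tag, so this template never covers it.
    """
    cands = set()
    for pred, ref in zip(predicted, gold):
        for i in range(1, len(pred)):
            if pred[i] != ref[i]:
                cands.add((pred[i - 1], pred[i], ref[i]))
    return cands


def learn_rules(predicted, gold, min_gain=1, max_rules=10):
    """Greedily keep the best-scoring rule until no rule gains min_gain."""
    current = [list(s) for s in predicted]
    learned = []
    for _ in range(max_rules):
        cands = candidate_rules(current, gold)
        if not cands:
            break
        best = max(sorted(cands), key=lambda r: score_rule(r, current, gold))
        if score_rule(best, current, gold) < min_gain:
            break
        learned.append(best)
        current = [apply_rule(s, best) for s in current]
    return learned


# Toy data: the baseline wrongly starts a chunk with I-NP after O.
gold = [["B-NP", "I-NP", "O", "B-NP", "I-NP"], ["O", "B-NP", "O"]]
baseline = [["B-NP", "I-NP", "O", "I-NP", "I-NP"], ["O", "I-NP", "O"]]
rules = learn_rules(baseline, gold)
print(rules)
```

At tagging time the learned rules would be applied, in the order they were learned, to the statistical model's output. A rule that breaks more correct tags than it fixes scores below `min_gain` and is discarded, which is the role the abstract assigns to the evaluation function.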
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

Article ID: 1362453

Article link: http://www.sikaile.net/jingjilunwen/zhengzhijingjixuelunwen/1362453.html
