Description and Recognition of Mongolian Object-Predicate Relations Based on Dependency Grammar
Published: 2018-06-02 22:03
Topic: Mongolian treebank + dependency grammar; Source: Inner Mongolia University, master's thesis, 2017
[Abstract]: Syntactic analysis is a key technology in Mongolian information processing. As this work has advanced in recent years, the development of applications such as text proofreading and machine translation has placed higher demands on the results of syntactic analysis. Taking traditional Mongolian grammar research as its theoretical basis, and building on existing results in Mongolian lexical analysis and dependency parsing, this thesis describes the dynamic characteristics of object-predicate relations in modern Mongolian from a statistical and quantitative perspective, and designs and implements their automatic recognition.
The object-predicate relation is a relatively complex type of dependency relation and accounts for a high proportion of the relations in Mongolian sentences. The complex morphology of Mongolian makes it difficult to raise recognition accuracy for this relation. The main difficulties are recognizing direct object-predicate relations in which the accusative case marker is omitted, and recognizing indirect object-predicate relations. Recognizing object-predicate relations correctly is of great significance for Mongolian syntactic analysis, in two respects: (1) for traditional linguistics, statistical methods provide a means of verification and supporting data for the principles of traditional grammar; (2) for information processing, the work expands the treebank corpus and proposes an innovative approach to refining Mongolian syntactic analysis.
The thesis describes the dynamic characteristics of Mongolian object-predicate relations and studies their automatic labeling in three steps. First, the modern Mongolian dependency treebank was expanded, proofread, and improved; the newly proofread treebank comprises 189,048 words in 13,154 sentences. Second, the lexical, collocational, and dependency-syntactic characteristics of Mongolian object-predicate relations were analyzed statistically in detail, providing the necessary theoretical basis for hand-writing recognition rules and for designing machine-learning feature templates. Third, four groups of recognition experiments were carried out: (1) recognition with a CRF statistical model; (2) recognition with a CRF statistical model plus hand-written rules; (3) recognition with a CRF statistical model plus conditionally restricted rules; (4) recognition with a CRF statistical model after the rules were revised. The accuracies were 89.81%, 89.80%, 89.80%, and 89.73%, respectively.
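The abstract mentions feature templates and hand-written rules combined with a CRF model, but gives no details of either. The following is only a minimal sketch of how such a combination could look, not the thesis's actual method. The `Token` fields, the tag names (`N`, `V`, `ACC`), the relation labels (`OBJ`, `OTHER`), and the rule itself are all hypothetical, and the statistical model is stood in for by a plain label string rather than a trained CRF.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    form: str             # surface word form (placeholder strings in the demo)
    pos: str              # coarse part-of-speech tag, e.g. "N" or "V" (assumed tagset)
    case: Optional[str]   # case tag from morphological analysis, None if unmarked


def pair_features(tokens: List[Token], dep: int, head: int) -> Dict[str, str]:
    """Build a feature dict for a candidate dependent -> head pair."""
    d, h = tokens[dep], tokens[head]
    return {
        "dep_pos": d.pos,
        "head_pos": h.pos,
        "dep_case": d.case or "NONE",
        "pos_pair": f"{d.pos}_{h.pos}",
        "direction": "head_right" if head > dep else "head_left",
    }


def rule_label(feats: Dict[str, str]) -> Optional[str]:
    """Hypothetical hand-written rule: an accusative-marked word whose
    candidate head is a following verb is labeled OBJ."""
    if (feats["dep_case"] == "ACC"
            and feats["head_pos"] == "V"
            and feats["direction"] == "head_right"):
        return "OBJ"
    return None


def combine(model_label: str, feats: Dict[str, str]) -> str:
    """Let the rule override the statistical label when it fires."""
    ruled = rule_label(feats)
    return ruled if ruled is not None else model_label


if __name__ == "__main__":
    # "OTHER" stands in for whatever label a trained model would predict.
    sentence = [
        Token("w1", "N", "ACC"),
        Token("w2", "N", None),
        Token("w3", "V", None),
    ]
    for dep in (0, 1):
        feats = pair_features(sentence, dep, head=2)
        print(dep, combine("OTHER", feats))
```

In this sketch the rule fires only for the accusative-marked token. The unmarked noun falls through to the model's label, which corresponds to the hard case named in the abstract: a direct object whose accusative marker is omitted cannot be caught by a surface rule of this kind.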
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: H212
Article ID: 1970363
Article link: http://www.sikaile.net/shoufeilunwen/zaizhiboshi/1970363.html