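A Study of the Distribution and Statistics of Tibetan Noun Phrase Structure Types
Published: 2018-03-05 17:10
Topic: noun phrases. Focus: phrase structure. Source: Northwest Minzu University (西北民族大學), master's thesis, 2017. Document type: degree thesis.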
[Abstract]: Big-data strategies and deep learning methods have become the mainstream technology in Tibetan natural language processing. At present, the shortage of knowledge resources and annotated corpora is holding back research on intelligent Tibetan language processing. In particular, lexical-semantic resources comparable to WordNet, HowNet, and frame semantics, along with resources for syntactic structure annotation, semantic role annotation, and discourse information annotation, have not yet settled on a unified specification, so mainstream methods such as deep learning cannot be applied to practical training. Building such resources has therefore become a basic and arduous task in Tibetan information processing. The study of noun phrases, verb phrases, and adjective phrases is the core problem faced in constructing a syntactic treebank. Working within the framework of a Tibetan syntactic treebank, this thesis carries out a classification and statistical study of Tibetan noun phrases and their structures. Its aims are to test the accuracy of the classification of Tibetan phrase structures, to improve the efficiency of Tibetan phrase analysis, and to speed up construction of the Tibetan syntactic treebank. The thesis is organized into eight chapters. First, it discusses the background and current state of phrase research, and reviews the syntactic analysis theories of noun phrases in English and Chinese together with the corpus material needed to build a noun phrase structure database. Second, it describes the concept of the noun phrase in English, Chinese, and Tibetan, analyzes the structures that form noun phrases in Tibetan using real Tibetan text corpora, classifies the noun phrases formed through modification by different parts of speech, and on that basis establishes a tag set for Tibetan noun phrases. Finally, drawing on statistics of noun phrase structures in real Tibetan corpora, it builds a noun phrase structure database, a noun phrase part-of-speech tagging database, and noun phrase structure annotation software. Overall, the thesis uses corpus-based evidence, comparative analysis, statistical analysis, manual annotation, and manual proofreading to establish a basic Tibetan noun phrase structure database and a part-of-speech tagged corpus. In sum, the study provides basic resources for Tibetan syntactic and semantic analysis and for treebank construction, and offers some theoretical and technical support for application areas such as information retrieval, search engines, machine translation, text classification, pattern recognition, multimedia teaching, and networking.
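The structure-type statistics that the abstract describes can be illustrated with a minimal sketch. It assumes each noun phrase is already segmented and part-of-speech tagged, reduces each phrase to its tag pattern, and counts how often each pattern occurs. The tokens (`w1`, `w2`, ...), the tag names (`n`, `a`, `m`), and the function names are hypothetical placeholders for illustration; they are not the tag set or the software developed in the thesis.

```python
from collections import Counter

# Hypothetical POS-tagged noun phrases: each phrase is a list of (token, tag) pairs.
# Tokens are placeholders and the tags are illustrative, not the thesis's tag set.
tagged_phrases = [
    [("w1", "n"), ("w2", "a")],
    [("w3", "n"), ("w4", "m")],
    [("w5", "n"), ("w6", "a")],
    [("w7", "n"), ("w8", "a"), ("w9", "m")],
]


def structure_type(phrase):
    """Reduce a tagged phrase to its structure pattern, e.g. 'n+a'."""
    return "+".join(tag for _, tag in phrase)


def structure_distribution(phrases):
    """Return (pattern, count, relative frequency) tuples, most frequent first."""
    counts = Counter(structure_type(p) for p in phrases)
    total = sum(counts.values())
    return [(pattern, count, count / total) for pattern, count in counts.most_common()]


distribution = structure_distribution(tagged_phrases)
for pattern, count, share in distribution:
    print(f"{pattern}\t{count}\t{share:.2%}")
```

A table of this kind, computed over a manually annotated and proofread corpus, is one plausible way to obtain the distribution of structure types that a noun phrase structure database would record.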
[Degree-granting institution]: Northwest Minzu University (西北民族大學)
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: H214