Research on Methods for Improving the Lexical and Transformation Rules of a Shona Part-of-Speech Tagger
Published: 2023-03-10 19:17
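Natural language processing (NLP) is the processing of human language and is a discipline within artificial intelligence. The ultimate goal of NLP research is to parse and understand language, but this goal has not yet been achieved. For this reason, much NLP research concentrates on intermediate tasks, that is, methods that capture some of the inherent structure of language without requiring full understanding of it. One of the main such tasks is part-of-speech (POS) tagging, or simply tagging. Shona lacks a standard POS tagger, and this is a major obstacle for researchers working on Shona in areas such as machine translation, spell checking, dictionary compilation, and automatic syntactic analysis and construction. To date there has been no research on POS tagging for Shona, and tagger performance has not been sufficiently improved. The aim of this thesis is therefore to use a sufficiently large training corpus to improve the lexical and transformation rules of the Brill POS tagger for Shona. To that end, we reviewed the literature on Shona grammar and morphology to understand the nature of the language and to identify a possible tag set. From this reading we settled on a broad tag set of 26 tags, and we extracted 17,473 tagged words from 1,100 sentences containing 6,750 distinct words for training and testing. Of these sentences, 258 came from previous work. Because only a few ready-made standard corpora exist, and building a corpus by manual annotation is an arduous task...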
[Pages]: 66
[Degree level]: Master's
[Table of contents]:
Abstract (in Chinese)
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Statement of the Problem
1.3 Objective
1.3.1 Specific Objectives
1.4 Methodology
1.4.1 Data Collection
1.4.2 Modeling
1.4.3 Testing and Validation
1.5 Tools and Techniques
1.6 Application of Results
1.7 Organization of the Paper
Chapter 2 Literature
2.1 Literature Review
2.1.1 Statistical Approach
2.1.2 Hidden Markov Model
2.1.3 Maximum Entropy Model
2.2 Rule-Based Approach
2.2.1 Transformation-Based Approach
2.2.2 Artificial Neural Network Approach
2.2.3 Hybrid Approach
2.3 Related Works
Chapter 3 Tag-set preparation
3.1 Introduction
3.2 The Shona Language Phonetics
3.3 The Shona Language Sentence Structure
3.4 Shona Language Word Classes
3.4.1 Shona Noun (Zita)
3.4.2 Shona Pronoun
3.4.3 Shona Adjective
3.4.4 Afaan Oromo Verb (Xumura)
3.4.5 Shona Adverbs
3.4.6 Shona Conjunction
3.4.7 Shona Preposition
3.4.8 Shona Interjections
3.4.9 Shona Numeral
3.5 Shona Tags and Tag sets
Chapter 4 Design of the POS tagger
4.1 Introduction
4.2 Approaches and techniques
4.3 Designing Transformation-based error-driven learning
4.2.1 Rules
4.2.2 Learning Phase
4.2.3 The Lexical Rule Learner
4.2.4 The Contextual Rule Learner
4.2.5 Brill Tagger Architecture
Chapter 5 Implementation
5.1 Introduction
5.2 Corpus Preparation
5.3 Implementation of the Brill's Tagger
5.3.1 Implementation of the Initial State Tagger (HMM Tagger)
5.3.2 Implementation of the Brill's tagger Learning phase
Chapter 6 Experiment and performance analysis
6.1 Introduction
6.2 Experiments
6.2.1 Brill's Tagger Versus Corpus Size
6.3 Performance Analysis
6.4 Discussion
Chapter 7 Conclusion and Recommendation
7.1 Conclusion
7.2 Recommendations
References
Acknowledgements
Article ID: 3758406
Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/3758406.html