天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當(dāng)前位置:主頁 > 科技論文 > 軟件論文 >

Research on the Recognition and Translation of Japanese-Chinese Numeric Time Expressions

Published: 2018-10-18 09:01
[Abstract]: Named entity recognition and translation are fundamental tasks in natural language processing. Numeric time expressions are a special class of named entities that carry key information, so recognizing and translating them has both theoretical and practical value. Recognizing and analyzing numeric time expressions underpins NLP tasks such as information retrieval, event extraction, event detection and tracking, and question answering. In multilingual tasks such as machine translation, the alignment and translation quality of numeric time expressions is an important factor in system performance. Research on recognizing and translating them is therefore significant for improving machine translation systems and for advancing artificial intelligence more broadly.

Starting from the characteristics of Japanese and Chinese numeric time expressions, this thesis combines linguistic knowledge with statistical methods. Through extensive data analysis and experiments, it studies how to recognize and translate bilingual Japanese-Chinese numeric time expressions and applies the resulting methods to a machine translation system. The main contributions are as follows.

(1) The work builds on the latest TIMEX3 temporal annotation specification and a general numeral classification scheme. It also accounts for where Japanese and Chinese are structurally isomorphic and where they differ. On this basis, keyword knowledge bases of trigger words, boundary words, and similar cues were built separately for Japanese and for Chinese numeric time expressions. Words expressing approximate quantities were also brought into the recognition scope, giving numeric time expressions a richer range of meaning. Numeric time expressions are then recognized by regular-expression matching. Finally, this rule-based method is combined with a statistical recognition method to recognize Japanese and Chinese numeric time expressions respectively. Experiments show that the method performs well on both languages.

(2) Bilingual alignment of numeric time expressions was incorporated into conventional word alignment. The thesis proposes a bidirectional alignment algorithm for numeric time expressions that combines position constraints with a similarity measure. Experiments show that the algorithm effectively improves bilingual word alignment and helps the machine translation system train a better translation model.

(3) Based on how Japanese and Chinese numeric time expressions are translated, a translation rule base was built specifically for translating numeric time expressions on their own. Three components were integrated into an existing statistical machine translation system: recognition information, alignment information, and the translation rule base. This improves the translation accuracy of numeric time expressions and their neighboring words, and thereby the overall translation quality, as verified experimentally.

In summary, the thesis makes three contributions:
- It designs recognition and translation rules for time words based on TIMEX3 annotation and the characteristics of Japanese and Chinese numeric time expressions, and it includes approximate-quantity words in the recognition scope.
- It proposes a bidirectional alignment algorithm for numeric time expressions based on position constraints and a similarity measure.
- It builds a Japanese-Chinese translation rule base for numeric time expressions.

All three were applied to a machine translation system, and experiments confirm that they effectively improve its overall performance.
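Item (1) describes keyword knowledge bases that feed regular-expression matching. The Python sketch below illustrates only that rule-based stage, under several assumptions:
- The tiny `LEXICON` is hypothetical and far smaller than the thesis's knowledge bases. It holds unit trigger words plus approximate-quantity words for Chinese and Japanese.
- Boundary words are not modeled separately.
- The statistical recognizer that the thesis fuses with the rules is omitted.

The names `LEXICON`, `build_pattern`, and `recognize` are illustrative, not taken from the thesis.

```python
import re

# Hypothetical miniature keyword knowledge bases (illustrative only).
# "triggers" are unit words that signal a numeric time expression;
# "approx_*" are approximate-quantity words absorbed into the expression.
LEXICON = {
    "zh": {
        "triggers": ["小时", "分钟", "年", "月", "日", "号", "点", "分", "秒", "天"],
        "approx_prefix": ["大约", "约", "近"],
        "approx_suffix": ["左右", "前后", "多"],
    },
    "ja": {
        "triggers": ["時間", "年", "月", "日", "時", "分", "秒"],
        "approx_prefix": ["およそ", "約"],
        "approx_suffix": ["ぐらい", "くらい", "ごろ", "頃"],
    },
}

# Arabic (ASCII and full-width) and CJK numerals.
NUMERAL = r"[0-9０-９〇零一二三四五六七八九十百千万两]+"


def _alternation(words):
    # Longest words first, so e.g. "時間" is tried before "時".
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def build_pattern(lang):
    lex = LEXICON[lang]
    return re.compile(
        rf"(?:{_alternation(lex['approx_prefix'])})?"          # optional approximate prefix
        rf"(?:{NUMERAL}(?:{_alternation(lex['triggers'])}))+"  # numeral + unit, chainable
        rf"(?:{_alternation(lex['approx_suffix'])})?"          # optional approximate suffix
    )


def recognize(text, lang):
    """Return (start, end, surface) spans of candidate numeric time expressions."""
    return [(m.start(), m.end(), m.group()) for m in build_pattern(lang).finditer(text)]
```

For example, `recognize("会议于2017年10月18日上午9点左右开始，持续约3小时。", "zh")` finds three spans: 2017年10月18日, 9点左右, and 约3小时. The approximate-quantity words 左右 and 约 become part of the expression instead of being left outside it. 上午 is missed only because this toy lexicon has no time-of-day words.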
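Item (2) names a bidirectional alignment algorithm that combines position constraints with a similarity measure, but the abstract does not define either component. The sketch below therefore makes several assumptions, and all names and parameter values (`alpha`, `max_dist`, `min_score`) are hypothetical:
- **Similarity:** Jaccard overlap of the numeric values in each expression.
- **Position constraint:** a hard cutoff on the distance between relative sentence positions.
- **Combined score:** a linear weighting of the two.
- **Bidirectionality:** a pair is kept only when it is the best match in both the source-to-target and target-to-source directions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NTE:
    surface: str  # recognized numeric time expression
    pos: float    # relative position in its sentence: 0.0 = start, 1.0 = end


def _values(surface):
    # Hypothetical similarity feature: the set of integer values in the expression.
    return {int(d) for d in re.findall(r"\d+", surface)}


def similarity(a, b):
    """Jaccard overlap of numeric values (a stand-in similarity measure)."""
    va, vb = _values(a.surface), _values(b.surface)
    if not (va | vb):
        return 0.0
    return len(va & vb) / len(va | vb)


def score(a, b, alpha=0.7, max_dist=0.3):
    """Combine similarity with a position constraint; None means 'not allowed'."""
    dist = abs(a.pos - b.pos)
    if dist > max_dist:  # hard position constraint: reject far-apart pairs
        return None
    return alpha * similarity(a, b) + (1 - alpha) * (1 - dist)


def best_match(x, candidates):
    """Index of the highest-scoring admissible candidate, or None."""
    scored = [(s, j) for j, c in enumerate(candidates) if (s := score(x, c)) is not None]
    if not scored:
        return None
    return max(scored, key=lambda t: t[0])[1]


def align(src, tgt, min_score=0.5):
    """Bidirectional alignment: keep (i, j) only if i->j and j->i agree."""
    forward = [best_match(s, tgt) for s in src]
    backward = [best_match(t, src) for t in tgt]  # score() is symmetric
    return [
        (i, j)
        for i, j in enumerate(forward)
        if j is not None and backward[j] == i and score(src[i], tgt[j]) >= min_score
    ]
```

Consider `src = [NTE("3時ごろ", 0.2), NTE("2時間", 0.8)]` and `tgt = [NTE("3点左右", 0.25), NTE("2小时", 0.85)]`. Here `align(src, tgt)` returns `[(0, 0), (1, 1)]`. Requiring agreement in both directions is what leaves a second source expression unaligned when it competes for a target already claimed by a better match.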
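Item (3) mentions a rule base for translating numeric time expressions on their own. The following is a minimal sketch that assumes the rules are ordered regular-expression rewrites from Japanese to Chinese. The entries in the hypothetical `JA_ZH_RULES` are examples, not the thesis's rule base.

The rules treat the two kinds of pattern differently:
- Isomorphic patterns such as 年/月/日 dates pass through unchanged.
- Divergent ones are rewritten: 時 becomes 点, 時間 becomes 小时, and ごろ becomes 左右.

```python
import re

# Hypothetical Japanese-to-Chinese rewrite rules, applied in order.
# Duration "時間" must be rewritten before clock-time "時".
JA_ZH_RULES = [
    (r"午前", "上午"),                        # a.m.
    (r"午後", "下午"),                        # p.m.
    (r"約(?=\d)", "约"),                      # approximate prefix before a number
    (r"(\d+)時間", r"\1小时"),                # duration in hours
    (r"(\d+)時", r"\1点"),                    # clock time
    (r"(?:ごろ|頃|ぐらい|くらい)$", "左右"),  # approximate suffix at the end
]


def translate_nte(ja):
    """Rewrite one recognized Japanese numeric time expression into Chinese."""
    zh = ja
    for pattern, repl in JA_ZH_RULES:
        zh = re.sub(pattern, repl, zh)
    return zh


for expr in ["午後3時ごろ", "約2時間", "2017年10月18日", "午前9時半"]:
    print(expr, "->", translate_nte(expr))
```

The loop prints the following translations: 下午3点左右, 约2小时, 2017年10月18日 (unchanged), and 上午9点半. Rule order matters. If the 時 rule ran first, 約2時間 would wrongly become 约2点间.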
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1

本文編號:2278643

資料下載
論文發(fā)表

本文鏈接:http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2278643.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶839eb***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請E-mail郵箱bigeng88@qq.com