

Research on Sentence Compression and Its Applications

Published: 2018-02-10 07:53

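Keywords: sentence compression; structured learning; integer programming; multi-document automatic summarization; natural language processing. Source: Soochow University (蘇州大學), 2013 master's thesis. Type: degree thesis.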


[Abstract]: In recent years, the growing volume of information of all kinds has driven demand for information processing and, with it, rapid development of natural language processing. At the same time, people increasingly care about how to find valuable information promptly within massive amounts of data, and sentence compression, as a foundational natural language processing task, has drawn growing attention from researchers. Sentence compression can be applied to automatic summarization, headline generation, search engines, topic detection, and many other tasks.

Current mainstream work on sentence compression relies mainly on corpus-driven supervised models. This thesis adopts a discriminative supervised model and performs sentence compression by learning how to prune constituency parse trees. The research covers the following aspects:

1. Sentence compression based on structured learning. First, a Chinese parallel corpus is built by matching and extraction. Then a corpus expansion method is proposed, offering a new approach to the problem of small corpus size. Finally, a structured learning algorithm learns how to prune the constituency parse tree of the source sentence, which yields the compressed sentence. Experiments show that the structured-learning compression model performs well and that the proposed corpus expansion method is feasible.
2. Decoding methods for sentence compression. Within the discriminative framework, integer programming is proposed for decoding. Casting sentence compression as an integer programming problem makes it possible to search for the optimal target sentence. This decoding method retains the main information of the source sentence while maintaining a good compression rate.
3. Evaluation metrics for sentence compression. Because sentence compression lacks suitable automatic evaluation metrics, this thesis introduces two metrics, BLEU and N-Gram, into a word-deletion-based compression system, and verifies experimentally that both are applicable.
4. Application of sentence compression. The compression system is applied to multi-document automatic summarization. Experiments show that it can remove some unimportant sentence-level information from multi-document summaries without harming their readability.
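The abstract does not define the BLEU and N-Gram metrics in detail, so the following is only a minimal sketch of the idea behind them: clipped n-gram precision, the building block of BLEU, computed between a system compression and a single reference compression, together with the compression rate. The function names, the example sentences, and the use of one reference are assumptions made for illustration; this is not the thesis's implementation, and it omits BLEU's brevity penalty and the geometric mean over n-gram orders.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference (the clipping used by BLEU).
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def compression_rate(source, compressed):
    """Fraction of source tokens kept in the compressed sentence."""
    return len(compressed) / len(source)


# Hypothetical example: a source sentence, a human reference compression,
# and a system output produced by deleting words from the source.
source = "the committee finally approved the new budget on friday".split()
reference = "the committee approved the new budget".split()
candidate = "the committee approved the budget".split()

unigram_p = ngram_precision(candidate, reference, 1)
bigram_p = ngram_precision(candidate, reference, 2)
rate = compression_rate(source, candidate)
print(unigram_p, bigram_p, round(rate, 3))
```

In a word-deletion system every candidate token already appears in the source, so unigram precision mostly reflects which words were kept, while higher-order n-grams also penalize deletions that break up phrases the reference preserves.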
[Degree-granting institution]: Soochow University (蘇州大學)
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.1



