Research on a Personalized Campus Calendar System Based on Information Extraction
Published: 2018-08-14 14:03
[Abstract]: With the rapid growth of the Internet, information has become increasingly diverse and complex, which makes it harder for users to find what they need. Extracting the required information from the large volume of data that appears every day has therefore become a focus of natural language processing research. Information extraction, the technology studied in this thesis, addresses this need: it pulls large amounts of unordered, irregular information out of text and stores it in structured form, which is important for the progress of information technology.
This thesis studies information extraction centered on events and time, and designs and implements a personalized campus calendar system. The main contributions and results are as follows.
First, a Chinese entity relation extraction algorithm that combines rules with statistical models is designed and implemented. Regular expressions extract high-precision results, and a machine learning method that combines a conditional random field model with a maximum entropy model supplies supplementary results, improving both precision and recall. The method achieved good results in the Slot Filling task of the TAC-KBP evaluation.
Second, a personalized campus calendar system is proposed, designed, and implemented. While extracting event information, the system also organizes the time information attached to each event, giving users cues for understanding the event as a whole. The system uses a rule-based method to extract time expressions from text and normalize them. On this basis, a method based on the word activation force (WAF) model is proposed for identifying the expressions that mark an event's start and end times, which give more information about how the event unfolds. The system has been deployed and put online in the campus entity search engine COSE.
Third, a WAF-based method for expanding a sentiment polarity lexicon and a machine learning method for judging the sentiment polarity of text are proposed. The approach achieved good results in Task 1 of the COAE 2011 evaluation, opinion word extraction and polarity judgment. The model adds sentiment polarity judgment to the campus calendar system, and this function can be further applied to monitoring campus public opinion.
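The rule-based extraction and normalization of time expressions mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual rules, so everything here is an assumption made for illustration: the single pattern `DATE_RE`, the function name `normalize_times`, the choice of ISO 8601 as the normalized form, and the use of a reference date to fill in a missing year.

```python
import re
from datetime import date, datetime

# Hypothetical rule (not taken from the thesis): optional year, then month and day,
# optionally followed by a part of day and an hour, e.g. "2013年3月5日下午2点".
DATE_RE = re.compile(
    r"(?:(?P<year>\d{4})年)?(?P<month>\d{1,2})月(?P<day>\d{1,2})[日号]"
    r"(?:(?P<period>上午|下午|晚上)?(?P<hour>\d{1,2})[点时])?"
)


def normalize_times(text, reference):
    """Return (surface string, ISO 8601 string) pairs for each match in text.

    A missing year is assumed to be the year of `reference`.
    """
    results = []
    for m in DATE_RE.finditer(text):
        year = int(m.group("year")) if m.group("year") else reference.year
        month, day = int(m.group("month")), int(m.group("day"))
        try:
            if m.group("hour") is None:
                value = date(year, month, day).isoformat()
            else:
                hour = int(m.group("hour"))
                # Afternoon and evening hours are shifted to the 24-hour clock.
                if m.group("period") in ("下午", "晚上") and hour < 12:
                    hour += 12
                value = datetime(year, month, day, hour).strftime("%Y-%m-%dT%H:%M")
        except ValueError:
            # Skip impossible dates or hours such as "13月40日".
            continue
        results.append((m.group(0), value))
    return results


if __name__ == "__main__":
    notice = "讲座将于2013年3月5日下午2点举行,报名截止3月1日。"
    for surface, iso in normalize_times(notice, datetime(2013, 1, 1)):
        print(surface, "->", iso)
```

A real system would need many more rules (relative expressions such as "next Monday", time ranges, Chinese numerals); this sketch only shows the extract-then-normalize shape of the approach.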
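[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1
Article ID: 2183091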
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2183091.html