Construction of a Chinese Emotion Expression Commonsense Knowledge Base and Its Application in Emotion Analysis
Published: 2018-07-22 11:01
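[Summary]: As human-computer interaction becomes widely known and applied, computers are expected to process affect and emotion as humans do. In recent years, the rise of social media has led to large volumes of user-generated text being posted online, especially microblogs, blogs, and comments that carry personal emotions. This web text has driven research on analyzing and tracking the emotions of large numbers of real individuals, work that shows important research significance and broad application prospects in social, political, economic, and other fields. This thesis studies the construction of basic Chinese emotion resources and their application in text emotion analysis, from three angles: the emotion taxonomy model, the construction of basic emotion word resources, and automatic multi-label text emotion classification. It comprises four pieces of work. First, to address the scarcity of Chinese emotion lexicons, the English emotion lexicon WordNet-Affect is passed through machine translation, noise filtering, and synonym expansion to automatically build a Chinese emotion word list with relatively high quality and coverage, providing a reliable basic resource for text emotion analysis. Second, existing Chinese emotion lexicons commonly fall short in completeness and precision, and in earlier work the information recorded for an emotion word is usually limited to a simple emotion category and an intensity value. This thesis holds that the emotion of a word divides into two types, expression and cognition. It focuses on mining the deeper information carried by the expression side, and introduces HowNet concept definitions to distinguish the senses of polysemous words. On this basis it proposes a new annotation scheme and builds a fine-grained Chinese emotion expression commonsense knowledge base. Third, because new web text and new words keep appearing, a rule-based new word discovery method is used to expand the knowledge base automatically. Because sentences are short and carry little information, and because emotions expressed by non-emotion words are hard to recognize, word sense concepts are introduced to expand sentences automatically. Fourth, the emotion word resources are applied in multi-label text emotion classification algorithms based on semantic rules and on machine learning. Comparative experiments show that the Chinese emotion word list and the emotion expression commonsense knowledge base built in this thesis give better classification performance than traditional emotion word resources, and that feature representations incorporating information from the knowledge base effectively improve the performance of machine learning classifiers. The contributions of the thesis are as follows. First, it builds a high-quality Chinese emotion word list and the most fine-grained Chinese emotion expression commonsense knowledge base known to date. Second, discovering new emotion words by rules enlarges the knowledge base, and expanding sentences with word concepts helps improve text emotion analysis results. Third, compared with traditional Chinese emotion lexicons and existing feature representation methods in multi-label text emotion classification, the new lexicon and the new fine-grained knowledge base improve classification performance, which demonstrates their advantages and their effectiveness in text emotion computing applications.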
[Abstract]: As human-computer interaction becomes widely known and applied, computers are expected to have the same ability as humans to process affect and emotion. In recent years, the rise of social media has caused user-generated texts, especially Weibo posts, blogs, and comments carrying personal emotions, to be posted online in large numbers. Web text data promotes research on analyzing and tracking the emotions of large numbers of real individuals, and shows important research significance and broad application prospects in social, political, economic, and other fields. This thesis studies the construction of basic Chinese emotion resources and their application in text emotion analysis, from three aspects: the emotion taxonomy model, the construction of basic emotion word resources, and the automatic classification of multi-label text emotions. The thesis includes the following four tasks. First, to address the shortage of Chinese emotion dictionary resources, we use the English emotion dictionary WordNet-Affect and, through machine translation, noise filtering, and synonym expansion steps, automatically construct a Chinese emotion lexicon with relatively high quality and coverage, establishing a reliable basic resource for text emotion analysis. Second, existing Chinese emotion dictionaries generally have problems with completeness and precision. In previous studies, the information attached to an emotion word usually includes only a simple emotion category and an intensity value. This thesis holds that the emotion types of words can be divided into expression and cognition. It mainly explores the deep information contained in the emotion expression of words, and introduces the concept definitions of HowNet to distinguish the senses of polysemous words. On this basis, a new annotation scheme is proposed, and a fine-grained commonsense knowledge base of Chinese emotion expression is constructed.
Third, because new web text and new words keep appearing, a rule-based new word discovery method is used to expand the commonsense knowledge base automatically. Because sentences are short and carry little information, and because it is difficult to recognize emotions expressed by non-emotion words, the sense concepts of words are introduced to extend sentences automatically. Fourth, the emotion word resources are applied to multi-label text emotion classification algorithms based on semantic rules and on machine learning. Comparative experiments show that the classification performance of the Chinese emotion lexicon and the emotion expression commonsense knowledge base constructed in this thesis is superior to that of traditional emotion word resources. They also show that a feature representation incorporating the commonsense knowledge can effectively improve the classification performance of machine learning methods. The contributions of this thesis are as follows. First, a high-quality Chinese emotion lexicon and the most fine-grained commonsense knowledge base of Chinese emotion expression known to date are constructed. Second, using rules to discover new emotion words can expand the scale of the commonsense knowledge base, and using word concepts to expand sentences helps improve text emotion analysis results. Third, compared with traditional Chinese emotion dictionaries and existing feature representation methods in multi-label text emotion classification, the new dictionary and the new fine-grained commonsense knowledge base of Chinese emotion expression improve classification performance, which shows their advantages and their effectiveness in text emotion computing applications.
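As a rough illustration of the first and fourth tasks, the sketch below chains the three lexicon-building steps named in the abstract (machine translation, noise filtering, synonym expansion) and then uses the resulting word list for a simple rule-based multi-label lookup. It is a minimal sketch under stated assumptions, not the thesis's implementation: the seed words, the translation table, the stoplist used as a noise filter, the synonym table, and the function names `build_lexicon` and `tag_emotions` are all hypothetical stand-ins for WordNet-Affect, a real translation system, and a real thesaurus.

```python
from collections import defaultdict

# Hypothetical toy data. None of these tables come from the thesis.
SEED_LEXICON = {"joy": ["happy", "glad"], "anger": ["furious", "cross"]}
TRANSLATIONS = {
    "happy": ["高兴", "快乐"],
    "glad": ["高兴"],
    "furious": ["愤怒"],
    "cross": ["十字", "生气"],  # "十字" is a non-emotional sense of "cross"
}
STOPLIST = {"十字"}  # assumed noise filter: drop known non-emotional translations
SYNONYMS = {"高兴": ["开心"], "愤怒": ["恼怒"]}


def build_lexicon(seed, translations, stoplist, synonyms):
    """Map each Chinese word to the set of emotion labels it inherits."""
    lexicon = defaultdict(set)
    for label, english_words in seed.items():
        for english in english_words:
            # Step 1: machine translation (here a table lookup).
            for chinese in translations.get(english, []):
                # Step 2: noise filtering.
                if chinese in stoplist:
                    continue
                lexicon[chinese].add(label)
                # Step 3: synonym expansion, inheriting the same label.
                for synonym in synonyms.get(chinese, []):
                    lexicon[synonym].add(label)
    return dict(lexicon)


def tag_emotions(tokens, lexicon):
    """Return every emotion label triggered by any token (multi-label)."""
    labels = set()
    for token in tokens:
        labels |= lexicon.get(token, set())
    return sorted(labels)


lexicon = build_lexicon(SEED_LEXICON, TRANSLATIONS, STOPLIST, SYNONYMS)
print(tag_emotions(["我", "很", "开心", "又", "生气"], lexicon))
# prints ['anger', 'joy']
```

A sentence can receive several labels at once because each matching token contributes its own label set, which is the property that separates multi-label classification from single-label classification. A real system would also need word segmentation, negation handling, and intensity values, none of which this sketch attempts.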
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1
Article ID: 2137228
Article link: http://www.sikaile.net/jingjilunwen/zhengzhijingjixuelunwen/2137228.html