Research on Microblog Sentiment Analysis Methods Based on Active Learning

Published: 2018-05-27 00:05

Topic: microblog sentiment analysis + active learning; source: Guan Yafu, master's thesis, Jilin University, 2017

[Abstract]: Text sentiment analysis, an important branch of text mining, currently receives wide attention from researchers. With the rapid development of the Internet and the spread of social media, large volumes of user-generated text are produced online. These texts are highly subjective, carry a clear sentiment orientation and rich sentiment information, and therefore have high research value. Mainstream sentiment classification methods rely heavily on machine learning, whose limitation is that it needs a large annotated corpus as a training set, and annotating that corpus is very costly. In practice, what is easy to obtain is unlabeled text, so how to classify text sentiment with a small amount of labeled data and a large amount of unlabeled data has become an important problem. This thesis integrates active learning into machine-learning-based text sentiment classification in order to make effective use of unlabeled corpora. Because text feature matrices are sparse, using a support vector machine (SVM) as the base classifier offers a considerable advantage in accuracy. Margin sampling is the classic method for active learning with SVMs, but it suffers from accuracy and performance problems such as error cascading, overfitting, and redundant iterations. To address these problems, and still using an SVM as the base classifier, this thesis proposes a new active learning method, Active Learning in Informative Vector Selection (ALIVS). The main work is as follows.

First, the thesis systematically studies the theory of text sentiment classification and active learning, covering the main tasks and research schools of text sentiment classification as well as the basic assumptions and mainstream methods of active learning. It also examines the classic margin-based active learning method and identifies its limitations.

Second, building on that theoretical study, the thesis proposes the new active learning method ALIVS. The method uses the characteristics of the unlabeled sample set to introduce the concept of an informative vector and, combined with SVMs, develops a two-level classification workflow. The workflow rests on the following idea: two levels of classifiers are used. The first-level main classifier performs sentiment classification. The second-level informative-vector classifier uses the classification information learned by the first-level classifier to select, from the unlabeled samples, the informative vectors that carry the most classification information as candidate samples for labeling. After experts label these candidates, they are added to the training set of the first-level classifier. Iterating this loop steadily strengthens the first-level classifier, so that a large amount of unlabeled text and a small amount of labeled text can be used for effective training.

Third, the thesis applies the method to Task 4 of the COAE2014 evaluation, compares it with the widely used margin sampling method, and designs experiments to test and analyze its accuracy and performance. The experimental results show that the proposed ALIVS method performs well in improving accuracy and in reducing overfitting and error cascading, which demonstrates its feasibility. Finally, the thesis discusses future improvements and development of the method.
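
The margin sampling baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the ALIVS method, whose details the abstract does not give. It only shows the general loop of training an SVM, querying the unlabeled samples closest to the decision boundary, and retraining. The small pure-Python linear SVM trainer, the function names, the toy 2-D feature vectors, and the `oracle` callback standing in for the human annotator are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. This is a stand-in trainer for illustration,
    not the solver used in the thesis.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Margin violated: hinge-loss step plus weight decay.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: weight decay only.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def margin_sample(w, b, pool, k):
    """Indices of the k pool samples closest to the hyperplane w.x + b = 0."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(dot(w, pool[i]) + b))
    return ranked[:k]


def active_learn(X_lab, y_lab, pool, oracle, rounds=2, k=2):
    """Margin-sampling loop: train, query the oracle on k samples, repeat.

    `oracle` is a hypothetical callback that returns the true label of a
    sample (the human expert in a real setting). Inputs are not mutated.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        w, b = train_linear_svm(X_lab, y_lab)
        picked = set(margin_sample(w, b, pool, k))
        for i in sorted(picked):
            X_lab.append(pool[i])
            y_lab.append(oracle(pool[i]))
        pool = [x for i, x in enumerate(pool) if i not in picked]
    # Final model trained on everything labeled so far.
    return train_linear_svm(X_lab, y_lab), X_lab, y_lab, pool


if __name__ == "__main__":
    # Toy data (assumed): the sign of the first feature is the sentiment label.
    seed_X, seed_y = [[2.0, 1.0], [-2.0, 1.0]], [1, -1]
    unlabeled = [[0.3, 0.0], [-0.2, 1.0], [1.5, -1.0], [-1.8, 0.5]]
    model, X_lab, y_lab, rest = active_learn(
        seed_X, seed_y, unlabeled, lambda x: 1 if x[0] > 0 else -1
    )
    print("labeled:", len(X_lab), "still unlabeled:", len(rest))
```

Because every query goes to the samples with the smallest absolute decision value, a poor early model can keep steering queries toward the same region, which is one way to picture the error cascading and redundant iterations the abstract attributes to this baseline.
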
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1

Article ID: 1939511

Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1939511.html
