Text Sentiment Prediction Based on Active Learning and Transfer Learning
Published: 2018-04-23 04:31
Topics: active learning + transfer learning; Source: master's thesis, Shanxi University, 2016
[Abstract]: With the widespread use of emerging e-commerce platforms, users not only enjoy convenience but also post their opinions about products on forums. From these reviews, ordinary users can learn how a product performs and make rational purchasing decisions, while manufacturers can quickly grasp market trends and make sound marketing decisions. Opinion mining and sentiment analysis of product reviews are therefore effective ways to address these needs. Traditional supervised learning methods are mostly applied to static, single-domain data and require large amounts of labeled data, whereas transfer learning can use existing labeled data to learn a classification model, addressing the shortage of labeled samples in the target data. Because data from different domains or different time periods differ to some extent, this thesis uses active learning to optimize the classification model and improve text sentiment prediction. The main contributions are as follows:
(1) Problem analysis for text sentiment prediction. Based on the experimental corpora, the thesis analyzes open problems in current sentiment analysis research from three aspects: the limitations of traditional text representations, the diversity of linguistic expression in review text, and the shift in what reviewers focus on across time periods. Corresponding solutions are proposed for each.
(2) Cross-domain text sentiment prediction based on active learning and transfer learning. To address the diversity of linguistic expression caused by domain differences in static cross-domain data, a cross-domain sentiment prediction method based on active learning and transfer learning is proposed. A classification model is first trained on source-domain data, and the target-domain texts it classifies with high confidence are selected as the initial seed samples. In each iteration, low-confidence texts labeled by an expert are added to the training set together with high-confidence texts, which speeds up optimization of the target-domain classifier. A feature set is then extracted dynamically from the training set using a sentiment lexicon, extraction rules for evaluation-word collocations, and auxiliary feature words. Finally, the optimized classifier is used to classify the test set. Active-Semi-Dynamic improves average precision by 2.75 percentage points over Active-Dynamic, showing that adding high-confidence samples enriches the training samples and feature information and helps train the classifier. It also improves average precision by 2.79 percentage points over Active-BOW, showing that combining a sentiment lexicon with dependency parsing to extract sentiment words characterizes the sentiment of a text more accurately and improves cross-domain sentiment prediction.
(3) Sentiment prediction for time-series reviews based on active learning and transfer learning. To address the shift in reviewers' focus caused by reviews being posted at different times in dynamic time-series data, a sentiment prediction method for time-series reviews based on active learning and transfer learning is proposed. Following the idea of transfer learning, initial labeled samples for the current period are obtained from the labeled data of the previous period. During active learning, the SMOTE algorithm is used to balance the training set, and the optimized classifier predicts the sentiment orientation of car reviews from the current period. Compared with UN_SMOTE, SMOTE improves average accuracy by 4.32 percentage points, showing that inserting new samples into the minority class while optimizing the classifier balances the training corpus and improves sentiment prediction for car reviews. The method also achieves sentiment prediction for mixed-class reviews.
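The iterative procedure in (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a tiny multinomial Naive Bayes stands in for the unspecified classifier, confidence is taken as the top class probability, seeds are picked per predicted class (our assumption, so that both classes are present from the start), raw tokens replace the dynamic feature set, and `oracle` is a hypothetical callable standing in for the human expert. The names `NaiveBayes`, `predict` and `active_semi_loop` are illustrative.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over token lists (a stand-in classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        # Laplace-smoothed log-likelihoods, normalised with log-sum-exp.
        v = len(self.vocab)
        logp = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for w in doc:
                s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            logp[c] = s
        m = max(logp.values())
        z = sum(math.exp(s - m) for s in logp.values())
        return {c: math.exp(s - m) / z for c, s in logp.items()}


def predict(model, doc):
    """Return (label, confidence), where confidence is the top class probability."""
    proba = model.predict_proba(doc)
    label = max(proba, key=proba.get)
    return label, proba[label]


def active_semi_loop(source, target, oracle, seeds_per_class=1, rounds=3, n_low=1, n_high=1):
    """source: (tokens, label) pairs; target: unlabeled token lists;
    oracle(i) -> label is a hypothetical stand-in for the human expert."""
    src_model = NaiveBayes().fit([d for d, _ in source], [y for _, y in source])

    # Seeds: per predicted class, the target texts the source model is most sure about.
    by_label = {}
    for i, doc in enumerate(target):
        label, conf = predict(src_model, doc)
        by_label.setdefault(label, []).append((conf, i))
    train = {}
    for label, items in by_label.items():
        for _, i in sorted(items, reverse=True)[:seeds_per_class]:
            train[i] = label

    def refit():
        return NaiveBayes().fit([target[i] for i in train], list(train.values()))

    model = refit()
    pool = [i for i in range(len(target)) if i not in train]
    for _ in range(rounds):
        if not pool:
            break
        ranked = sorted(pool, key=lambda i: predict(model, target[i])[1])
        low = ranked[:n_low]  # least confident: ask the expert
        high = [i for i in ranked[::-1][:n_high] if i not in low]  # most confident: pseudo-label
        for i in high:
            train[i] = predict(model, target[i])[0]
        for i in low:
            train[i] = oracle(i)
        pool = [i for i in pool if i not in train]
        model = refit()
    return model, train
```

In this sketch each round sends the least confident texts to the expert and pseudo-labels the most confident ones, which mirrors the combination the abstract credits for the gain of Active-Semi-Dynamic over Active-Dynamic.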
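The dynamic feature extraction in (2), which combines a sentiment lexicon, collocation extraction rules, auxiliary feature words and dependency parsing, might look roughly like this. Everything concrete here is hypothetical: the tiny `LEXICON` and `AUXILIARY` sets, the use of the Universal Dependencies relations `nsubj` and `amod` as the collocation rules, and the assumption that a dependency parser has already produced `(head, relation, dependent)` triples for each review. It is a sketch of the idea, not the thesis's actual rule set.

```python
# Hypothetical resources: the thesis's actual lexicon, auxiliary words and rules are not given here.
LEXICON = {"bright", "slow", "great", "poor"}
AUXILIARY = {"not", "very"}
COLLOCATION_RELATIONS = ("nsubj", "amod")


def extract_features(tokens, arcs, lexicon=LEXICON, auxiliary=AUXILIARY,
                     relations=COLLOCATION_RELATIONS):
    """tokens: one review; arcs: (head, relation, dependent) triples from a dependency parser."""
    # Lexicon sentiment words and auxiliary words are kept as unigram features.
    feats = {t for t in tokens if t in lexicon or t in auxiliary}
    for head, rel, dep in arcs:
        if rel not in relations:
            continue
        # Collocation rule: a target word linked to a lexicon opinion word -> "target+opinion".
        if head in lexicon and dep not in lexicon:
            feats.add(f"{dep}+{head}")
        elif dep in lexicon and head not in lexicon:
            feats.add(f"{head}+{dep}")
    return feats


def build_feature_set(training_docs):
    """Rebuild the feature set from the current training set (e.g. after each round)."""
    features = set()
    for tokens, arcs in training_docs:
        features |= extract_features(tokens, arcs)
    return sorted(features)


def vectorize(tokens, arcs, feature_set):
    """Binary presence vector over the current feature set."""
    present = extract_features(tokens, arcs)
    return [1 if f in present else 0 for f in feature_set]
```

Rebuilding the feature set from the training set after every active-learning round is what makes it dynamic in this sketch: newly labeled samples can contribute new target+opinion collocations that a fixed bag-of-words vocabulary (as in a BOW baseline) would not capture.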
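SMOTE, which (3) uses to balance the training set, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The pure-Python sketch below assumes reviews have already been turned into dense numeric feature vectors and uses Euclidean distance; it picks the base sample at random for each synthetic point, a common variant of the original per-sample scheme. `smote` and `balance` are illustrative names, not the thesis's code.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples: x + gap * (nn - x), with nn one of the
    k nearest minority neighbours of a randomly chosen minority sample x."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean distance), excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): the new point lies on the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(X, y, rng=None):
    """Oversample every smaller class with SMOTE until all classes match the largest."""
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        if n < target:
            minority = [x for x, lab in zip(X, y) if lab == c]
            new = smote(minority, target - n, rng=rng)
            X_out += new
            y_out += [c] * len(new)
    return X_out, y_out
```

Because every synthetic point lies between two real minority samples, the minority region is filled in rather than simply duplicated, which is the mechanism behind the reported gain of SMOTE over UN_SMOTE.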
[Degree-granting institution]: Shanxi University
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: TP391.1
Article ID: 1790476
Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/1790476.html