Research on Sentiment Text Classification for Product Reviews

Published: 2018-07-24 13:07

[Abstract]: With the growth of e-commerce, the volume of product reviews on websites keeps increasing. Consumers express opinions, stances, and views on the products or services they have purchased, and these opinions reflect the quality of those products or services from different angles. From online reviews, prospective buyers can learn about the products they need, and merchants can promptly address shortcomings in their products or services. Because consumer-written reviews are disorganized, sentiment orientation analysis and classification of review text is necessary, both to help other consumers understand products and to help merchants obtain user feedback in time. Text sentiment analysis mainly analyzes the sentiment features of text. To extract these features effectively, the thesis uses feature selection algorithms and a sentiment lexicon, and then classifies the text. The main contributions are as follows.

(1) An n-gram feature extraction and redundancy reduction method based on the chi-square statistic. Redundancy among n-gram feature terms degrades classification in practice, so the traditional chi-square algorithm is improved. The association between features, measured by their co-occurrence and non-co-occurrence, is used to select associated n-gram features. The correlation between features and categories is then used to decide whether multi-word features are redundant, and redundant features are pruned, yielding n-gram features with high category relevance and low redundancy. The method is tested with an SVM on several sentiment corpora. The experimental results show that it improves the efficiency of text sentiment classification, which validates the method.

(2) A sentiment-lexicon-based method. A sentiment lexicon can extract the sentiment features of a text directly, but the quality of the lexicon affects classification performance, and the contextual structure that modifies a sentiment word also affects its polarity in the text. To address lexicon construction and polarity shift, sentiment classification based on product attributes is proposed. First, Word2vec is used to train word vectors for the features, similar features are clustered by word-vector similarity, and attribute words and sentiment words are extracted using the dependency relations between them. Next, the features of the sentiment text are analyzed, a domain sentiment lexicon is constructed, and the attribute words, sentiment words, and their contextual structure features are extracted from the text. Finally, the text is classified with an SVM, the effect of the method on sentiment classification is analyzed, and the method is shown to be effective for classification.

Building on this, the influence of LDA topic features on text sentiment classification is analyzed. To take the structural information of sentiment features into account, a method that generates n-gram features with an n-gram model is proposed, together with redundancy reduction of the multi-word features. LDA topic probabilities are then used as features, and an SVM is tested on several sentiment corpora to analyze how different n-gram features combined with LDA affect text classification. Finally, the method is compared with other classification methods. The experimental results show that it improves text sentiment classification results, which validates the method.
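The chi-square scoring of n-gram features described in (1) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's improved algorithm: it computes the standard chi-square statistic from document frequencies, and the pruning rule in `prune_redundant` (drop a multi-word n-gram when one of its own words already scores at least as high) is an assumption made here for illustration. The co-occurrence-based association step is not modelled, and the function names and toy reviews are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 feature/class table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class without the feature
    d: documents outside the class without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def score_features(docs, labels, target, n_max=2):
    """Score every n-gram by its chi-square association with the class `target`."""
    n_in = sum(1 for y in labels if y == target)
    n_out = len(labels) - n_in
    in_class, out_class = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        feats = set(ngrams(tokens, n_max))  # count each feature once per document
        (in_class if y == target else out_class).update(feats)
    scores = {}
    for f in set(in_class) | set(out_class):
        a, b = in_class[f], out_class[f]
        scores[f] = chi_square(a, b, n_in - a, n_out - b)
    return scores


def prune_redundant(scores):
    """Illustrative rule (an assumption, not the thesis's criterion):
    drop a multi-word n-gram if one of its words alone scores at least as high."""
    kept = {}
    for f, s in scores.items():
        words = f.split()
        if len(words) > 1 and any(scores.get(w, 0.0) >= s for w in words):
            continue
        kept[f] = s
    return kept


# Hypothetical toy corpus: four tokenized reviews with polarity labels.
docs = [["not", "good"], ["very", "good"], ["not", "bad"], ["very", "bad"]]
labels = ["neg", "pos", "pos", "neg"]

scores = score_features(docs, labels, target="pos")
selected = prune_redundant(scores)
for feature, score in sorted(selected.items(), key=lambda kv: -kv[1]):
    print(f"{feature!r}: {score:.3f}")
```

In this toy corpus each single word appears equally often in both classes, so its score is zero, while the bigrams carry all of the class information. That is the kind of case where n-gram features help and where pruning should keep the multi-word feature. The surviving features would then be turned into vectors and passed to a classifier such as an SVM.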
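The observation in (2) that modifiers around a sentiment word can change its polarity can be illustrated with a small lexicon-based scorer. This is a hedged sketch under stated assumptions: the lexicon, negator list, intensifier weights, and two-token window are all hypothetical values chosen for the example, and the thesis's domain lexicon, attribute-word extraction, and dependency analysis are not reproduced.

```python
# Hypothetical toy resources; a real domain lexicon would be built from review data.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "poor": -1.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}


def sentiment_score(tokens, window=2):
    """Sum lexicon polarities, adjusting each by modifiers in the preceding window."""
    total = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok)
        if polarity is None:
            continue  # not a sentiment word
        for prev in tokens[max(0, i - window):i]:
            if prev in NEGATORS:
                polarity = -polarity  # negation flips the polarity
            elif prev in INTENSIFIERS:
                polarity *= INTENSIFIERS[prev]  # intensifier scales the strength
        total += polarity
    return total


def classify(tokens):
    """Map the summed score to a coarse polarity label."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("the screen is not very good".split()))
print(classify("battery life is great".split()))
```

A fixed window is a crude stand-in for contextual structure. A dependency parse, as the abstract describes, would attach each modifier to the word it actually modifies, which matters in longer sentences.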
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

Article ID: 2141476

Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2141476.html
