Research on Identifying Harmful Network Information Based on Rules and Statistics
[Abstract]: The rapid development of the Internet has had a profound influence on society and on people's lives. As a carrier of information, the Internet has clear advantages over traditional print media. It provides a high-quality platform for disseminating information in fields such as politics, economics, and culture, and it creates new ways for people to communicate with each other. While the Internet makes life more convenient, it also has negative effects. In the virtual network environment, every user is reduced to a string of virtual symbols, and the information and comments that users publish through personal web pages, Weibo, WeChat public accounts, forums, and similar channels are of uncertain reliability. Although many platforms apply pre-publication review and post-publication filtering, some users, hiding behind anonymity and lacking moral awareness or cultural literacy, still post large amounts of false, pornographic, politically sensitive, fraudulent, superstitious, and other harmful content. This content fills every corner of the Internet, corrupts the social atmosphere, misleads the public, and seriously harms people's physical and mental health. Weibo, a social media service with a very large user base, is a platform for sharing, disseminating, and obtaining information based on user relationships. Information posted by a user is pushed to followers through clients or the platform in a timely manner, so that it spreads quickly and in real time. Followers can also interact with the author by commenting, and can repost, comment on, or bookmark a post, which shares the information further, widens its reach, and increases its influence. These same characteristics have made Weibo a hiding place for harmful information, and for this reason Weibo has become a research subject for many scholars.
To clean up the network environment, protect minors from harmful information, and give Internet users a good search experience, it is necessary to control the publication and dissemination of such information and to strengthen supervision and management through appropriate measures. The purpose of this thesis is therefore to identify harmful information on the network, and it carries out experimental research using existing Chinese text mining techniques. A crawler program collects the text of user comments on, and reposts of, particular Weibo posts to obtain the raw data. The raw data are then preprocessed by removing irrelevant symbols, segmenting words, tagging dependencies, and computing word frequency statistics, and a text feature set is extracted from the result. To improve the accuracy of word segmentation, this thesis designs a lexicon of harmful words that comprises a basic word list, a synonym table, an abbreviation lexicon, and a word dependency table. A statistics-based feature extraction algorithm is combined with dependency analysis to extract text features effectively, and a text classification model is implemented with the naive Bayes algorithm. The model is then applied to the classification of Weibo user comments, and the classifier is evaluated experimentally. Compared with the model before improvement, classification accuracy and recall are markedly better. Finally, the thesis summarizes the research, states its innovations and shortcomings, and outlines improvements to pursue in follow-up research.
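The classification step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes comments are already segmented into tokens, it omits the crawler and the dependency analysis, and the names `SYNONYMS`, `normalize`, and `NaiveBayes` as well as the toy training data are hypothetical. It shows only how a synonym table can map variant spellings onto base lexicon words before a multinomial naive Bayes classifier with Laplace smoothing is trained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym table: maps disguised variants to a base lexicon word.
SYNONYMS = {"fr33": "free", "w1n": "win"}


def normalize(tokens):
    """Lowercase each token and replace known variants with their base word."""
    return [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]


class NaiveBayes:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant
        self.class_counts = Counter()           # documents per class
        self.word_counts = defaultdict(Counter) # word counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            tokens = normalize(tokens)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        tokens = normalize(tokens)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][t]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy, made-up training data (already segmented into tokens).
train_docs = [
    ["win", "free", "prize", "now"],
    ["fr33", "money", "click", "link"],
    ["nice", "photo", "thanks", "for", "sharing"],
    ["great", "post", "see", "you", "tomorrow"],
]
train_labels = ["bad", "bad", "normal", "normal"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["w1n", "free", "money"]))
print(model.predict(["nice", "post"]))
```

Normalizing through the synonym table before counting means that a disguised spelling contributes to the same feature as its base word, which is one plausible reason a dedicated lexicon could help a classifier of this kind. For real Chinese text, the tokens would come from a word segmenter rather than being supplied by hand.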
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.1; TP393.092
Article ID: 2382415
Article link: http://www.sikaile.net/guanlilunwen/ydhl/2382415.html