
Research on Identifying Harmful Online Information Based on Rules and Statistics

Published: 2018-12-16 13:21
[Abstract]: The rapid development of the Internet has had a profound effect on society and on people's lives. As a carrier of information, the Internet has advantages over traditional print media that are hard to match: it provides a high-quality platform for spreading information in fields such as politics, economics, and culture, and it creates a new way for people to communicate. Along with this convenience come negative effects. In a virtual network environment every user is reduced to a string of symbols, and the information and opinions that users publish through personal web pages, Weibo, WeChat official accounts, forums, and other online media carry a degree of uncertainty. Even though many platforms review content before publication and filter it afterwards, some users with concealed identities and poor moral awareness or cultural literacy remain. As a result, large amounts of false, pornographic, politically sensitive, fraudulent, and superstitious content fill every corner of the network, corrupting social mores, misleading the public, and seriously harming people's physical and mental health.
As a social medium with a very large user base, Weibo is a platform for sharing, spreading, and obtaining information based on user relationships. Posts can be pushed to followers promptly through the client or the platform, which makes information spread in real time and with little effort. Followers can interact with the author by commenting, and they can repost, comment on, or favorite a post, which widens the reach of the information and strengthens its influence. The same properties make Weibo a hiding place for harmful information, and Weibo has therefore become a subject of study for many researchers.
To clean up the network environment, keep minors away from harmful information, and give Internet users a good search experience, the publication and spread of such information must be controlled, with corresponding measures for supervision and management. This thesis therefore takes the identification of harmful online information as its goal and carries out experiments using existing Chinese text mining techniques. A crawler collects the comments and reposts that Weibo users make on specific posts, which yields the raw data. The raw data are then cleaned of irrelevant symbols, segmented into words, annotated with dependency relations, and counted for word frequency, and the results are used to extract a feature set for each text. To improve segmentation accuracy, the thesis designs a lexicon of harmful words that contains a base word list, a synonym list, an abbreviation list, and a table of dependency relations between words. A statistics-based feature extraction algorithm is combined with dependency analysis to extract text features, and a text classification model is implemented with the naive Bayes algorithm. The model is then applied to classifying user comments on Weibo and tested experimentally. Compared with the classifier before the improvement, accuracy and recall are noticeably higher. Finally, the thesis summarizes the research, states its contributions and shortcomings, and identifies points to refine in follow-up work.
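The classification step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes comments are already segmented into tokens, uses a tiny hypothetical lexicon (`LEXICON`) that maps synonyms and abbreviations to a base word, trains on invented toy data, and leaves out the dependency-relation features. The class name `NaiveBayes` and the labels `harmful` and `normal` are likewise assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: synonyms and abbreviations mapped to a base word.
# The thesis describes such tables but this page does not list their contents.
LEXICON = {"scam": "fraud", "swindle": "fraud", "frd": "fraud"}


def normalize(tokens):
    """Replace each token with its base word when the lexicon knows it."""
    return [LEXICON.get(t, t) for t in tokens]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            tokens = normalize(tokens)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        tokens = normalize(tokens)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch.
train_docs = [
    ["win", "money", "scam"],
    ["swindle", "click", "link"],
    ["nice", "photo"],
    ["great", "post", "thanks"],
]
train_labels = ["harmful", "harmful", "normal", "normal"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["frd", "link"])
print(prediction)
```

Normalizing through the lexicon before counting lets an abbreviation such as `frd`, which never appears in the training data, contribute the same evidence as its base word. That is one plausible reading of why the abstract pairs the lexicon with the statistical features.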
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP393.092


Article ID: 2382415



Article link: http://www.sikaile.net/guanlilunwen/ydhl/2382415.html

