
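Research and Application of Key Text Filtering Technologies Based on the Cloud Service Model

Published: 2018-06-19 15:27

  Topic: text filtering + classification. Source: University of Electronic Science and Technology of China, master's thesis, 2014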


[Abstract]: The rapid development of the Internet has made it one of the main channels through which people exchange information. Its openness, however, has also allowed large amounts of junk content such as pornography, violence, superstition, and reactionary material to spread online, which seriously disrupts everyday Internet use. Many text filtering techniques already exist, but they need continual improvement as the external environment changes. At the same time, as living standards rise, more and more users access the Internet through mobile terminals. Ensuring that mobile users receive healthy, useful, legitimate information on their devices requires implementing text filtering on a cloud platform for mobile terminals, so that junk web pages can be filtered out. To meet this need, this thesis analyzes and discusses the existing key technologies of text filtering, improves the traditional text classification algorithm based on the vector space model as well as the naive Bayes classification algorithm, and uses the two improved algorithms to build a high-performance text filtering system. The system is then deployed on a cloud platform for mobile terminals to provide a text filtering service there, so that mobile users can reach normal, legitimate web pages through their devices. The main contributions of the thesis are:

1. Building on an analysis of the feature selection algorithms commonly used in text filtering, the idea of equal proportion is applied to feature selection, so that the extracted text feature vectors reflect the topic and category information of a text more accurately.
2. Building on an analysis and discussion of the existing weight calculation methods in text filtering, the traditional weight calculation is improved by taking into account the structural information, length information, and proportion information of feature terms, so that the weights better reflect how important a feature term is to web page classification.
3. Because web pages are structured or semi-structured documents, the thesis classifies web pages in a modular way. The proportion-based improved weight calculation and the equal-proportion feature selection method are applied to the traditional classification algorithm based on the vector space model and to the naive Bayes classification algorithm. The two improved algorithms are then used to build a high-performance web page filtering system, which is deployed on the cloud platform to provide a text filtering service.

Test results show that, compared with the traditional algorithms, the improved text classification algorithms achieve higher classification accuracy and precision and lower misjudgment and error rates, and the improved text filtering system accordingly performs better.
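The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It reads "equal proportion" feature selection as "every class contributes the same number of terms", uses a made-up structural weight table (`FIELD_WEIGHT`) so that title terms count more than body terms, and classifies by cosine similarity to class centroids in a vector space model. All names, weights, and the ranking score are hypothetical and are not taken from the thesis. The naive Bayes variant is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical structural weights: a title term counts more than a body term.
FIELD_WEIGHT = {"title": 3.0, "body": 1.0}

def term_counts(page):
    """Structure-weighted term frequencies for one page (field -> token list)."""
    counts = Counter()
    for field, tokens in page.items():
        for tok in tokens:
            counts[tok] += FIELD_WEIGHT.get(field, 1.0)
    return counts

def select_features(pages, labels, per_class):
    """Assumed 'equal proportion' rule: each class contributes per_class terms.

    Terms are ranked inside a class by (weight in class - weight elsewhere),
    a simple stand-in for chi-square or information gain.
    """
    by_class = defaultdict(Counter)
    for page, label in zip(pages, labels):
        by_class[label].update(term_counts(page))
    total = Counter()
    for counts in by_class.values():
        total.update(counts)
    selected = set()
    for counts in by_class.values():
        score = {t: 2 * n - total[t] for t, n in counts.items()}
        ranked = sorted(score, key=lambda t: (-score[t], t))
        selected.update(ranked[:per_class])
    return selected

def vectorize(page, features):
    """Keep only the selected terms of a page's weighted term vector."""
    return {t: w for t, w in term_counts(page).items() if t in features}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(pages, labels, per_class=3):
    """Build one summed term vector (centroid up to scale) per class."""
    features = select_features(pages, labels, per_class)
    centroids = defaultdict(Counter)
    for page, label in zip(pages, labels):
        centroids[label].update(vectorize(page, features))
    return features, dict(centroids)

def classify(page, model):
    """Return the label whose centroid is most similar to the page."""
    features, centroids = model
    vec = vectorize(page, features)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))

# Toy training data, invented for illustration only.
pages = [
    {"title": ["casino", "bonus"], "body": ["win", "cash", "casino"]},
    {"title": ["free", "cash"], "body": ["casino", "bonus", "win"]},
    {"title": ["weather", "report"], "body": ["rain", "forecast", "city"]},
    {"title": ["city", "news"], "body": ["weather", "forecast", "report"]},
]
labels = ["spam", "spam", "normal", "normal"]
model = train(pages, labels, per_class=3)

print(classify({"title": ["casino"], "body": ["cash", "bonus"]}, model))
print(classify({"title": ["forecast"], "body": ["rain", "city"]}, model))
```

Selecting a fixed number of terms per class keeps a small class from being crowded out of the feature set by a large one, which is one plausible motivation for an equal-proportion rule. A real system would replace the toy ranking score and weights with values tuned on labeled web pages.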
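[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.09;TP391.1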

