Research on the Application of Clustering Algorithms in Web Page Classification
[Abstract]: In recent years, with the continuous development of information technology, the number of web pages and the volume of information on the network have grown sharply. Search engines help users filter out large amounts of irrelevant information, and they have now entered a third era marked by intelligence and user-friendliness. The defining characteristic of this era is the introduction of artificial intelligence techniques into search engine systems, with clustering algorithms being the most important of these. Clustering groups the results returned by a search engine into several classes for the user to browse. At present, most existing search engines cluster only on the content of the web pages. Based on an analysis of existing clustering algorithms, this thesis applies the CBC algorithm to web page clustering. The search term is introduced as the main reference data, and the CBC algorithm is improved in its feature weight calculation by increasing the weight of the search term during clustering. The improved CBC algorithm is implemented and compared with the traditional K-means algorithm on a data set. The results show that the improved algorithm is more accurate than traditional K-means and has a clear advantage in efficiency. Finally, this thesis designs a Chinese clustering system based on the improved algorithm and, on that basis, proposes ideas for further improving the algorithm and for future work.
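The abstract does not give the improved weight formula, so the following is only a minimal sketch of the general idea of raising the weight of search terms when building feature vectors for clustering. It assumes plain TF-IDF as the base weight and a constant multiplier for query terms; the names `boosted_tfidf`, `cosine`, and the `boost` parameter are hypothetical and are not taken from the thesis or from the CBC algorithm itself.

```python
import math
from collections import Counter


def boosted_tfidf(docs, query_terms, boost=2.0):
    """Return one {term: weight} dict per tokenized document.

    Base weight is smoothed TF-IDF. Terms that occur in the search query
    are multiplied by `boost` (an assumed scheme, not the thesis's formula).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    query = set(query_terms)
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            weight = (count / len(doc)) * idf
            if term in query:
                weight *= boost  # emphasize the user's search terms
            vec[term] = weight
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Toy, already-tokenized search results for the query "apple".
    docs = [["apple", "fruit", "pie"], ["apple", "phone", "store"]]
    plain = boosted_tfidf(docs, [])
    boosted = boosted_tfidf(docs, ["apple"])
    print(cosine(plain[0], plain[1]), cosine(boosted[0], boosted[1]))
```

Under these assumptions, boosting a term shared by two results raises their cosine similarity, so any similarity-based clustering step that consumes these vectors is pulled toward the sense in which the query term is used. Whether a constant multiplier matches the thesis's actual scheme cannot be determined from the abstract.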
[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.1