Design and Implementation of a Meta-Search Data Acquisition System for Quality and Safety
[Abstract]: Quality and safety incidents occur frequently, and as Internet use has spread, the public increasingly discusses them online. User comments and online media reports on quality and safety can serve as text data for quality and safety analysis, so the Internet can act as a data source for that analysis. This paper designs and implements a meta-search-based data acquisition system that collects web pages related to quality and safety. Here the meta-search engine is not used in the traditional way; instead, it collects data according to query terms set by the user. The system has three functional blocks: meta-search query, web page extraction, and relevance determination. The meta-search query block encapsulates the different engines and manages queries through priority scheduling. The web page extraction block uses two methods: template analysis, which mainly extracts result links, and statistical analysis, which serves as a general-purpose body text extraction method. The relevance determination block uses a support vector machine (SVM) classification algorithm to keep quality- and safety-related data and filter out noise. Finally, the paper tests the web page extraction and classification results and presents the output of the system. Because quality- and safety-related data are scattered across the Internet and have distinctive characteristics, this paper does not use a targeted crawler and instead attempts data acquisition through meta-search engines. The work may serve as a reference for data acquisition research in other fields.
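The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: priority scheduling of queries and statistics-based body text extraction. All names here (`QueryTask`, `QueryScheduler`, `extract_body`, the engine labels, and the `min_chars` threshold) are hypothetical, and the line-length heuristic is a simple stand-in, not the statistical method used in the thesis. The SVM relevance step is not shown.

```python
import heapq
import itertools
import re
from dataclasses import dataclass, field


@dataclass(order=True)
class QueryTask:
    priority: int                       # lower value is served first (assumption)
    seq: int                            # tie-breaker: FIFO order within one priority
    engine: str = field(compare=False)  # hypothetical engine label
    query: str = field(compare=False)   # user-defined query terms


class QueryScheduler:
    """Priority queue of (engine, query) tasks."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, engine, query, priority):
        task = QueryTask(priority, next(self._counter), engine, query)
        heapq.heappush(self._heap, task)

    def next_task(self):
        # Return the most urgent task, or None when the queue is empty.
        return heapq.heappop(self._heap) if self._heap else None


TAG_RE = re.compile(r"<[^>]+>")


def extract_body(html, min_chars=20):
    """Keep lines whose visible text has at least `min_chars` characters.

    Short lines (navigation links, buttons) are treated as noise. This is a
    deliberately crude statistic chosen for illustration.
    """
    kept = []
    for line in html.splitlines():
        text = TAG_RE.sub("", line).strip()
        if len(text) >= min_chars:
            kept.append(text)
    return "\n".join(kept)


if __name__ == "__main__":
    scheduler = QueryScheduler()
    scheduler.submit("engine_a", "milk powder recall", priority=2)
    scheduler.submit("engine_b", "toy safety", priority=1)
    task = scheduler.next_task()
    print(task.engine, task.query)

    page = "<div><a href='/'>Home</a></div>\n<p>The regulator recalled the product after safety tests failed.</p>"
    print(extract_body(page))
```

In a fuller pipeline, the text returned by `extract_body` would be passed to a trained classifier that decides whether the page is related to quality and safety.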
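【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Master's
【Year degree conferred】: 2012
【Classification number】: TP274.2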
Article ID: 2474318
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2474318.html