Keyword Weight Calculation for Web Page Ranking

Published: 2018-11-01 16:11

[Abstract]: With the development of information technology and the spread of the Internet, search engines have become increasingly important. The mainstream search engines of recent years are keyword-based. In such engines, the accuracy with which each word in a user's query is weighted directly affects the quality of the subsequent web page ranking, so computing the weights of query words correctly is critical. This thesis seeks a method for computing keyword weights in user queries for web page ranking, so that keyword-based search engines rank pages better and later retrieval steps rest on a sound basis. The work has three parts.

Analysis of the characteristics of user queries. The thesis analyzes the relationship between query characteristics and word weights in 5,000 queries annotated with core words, analyzes the stop words found in queries and those found in a modern Chinese corpus, and analyzes, with examples, the stop words in queries of different categories.

Keyword weight calculation for web page ranking. The user query log is segmented and part-of-speech tagged, and keyword extraction is treated as a classification task. Based on the characteristics of queries, eight contextual features of each word are selected as the input features of a decision tree forest classifier, and the calculation of each feature is described. An error analysis of the experimental results follows, and rules are added to post-process the classifier's output.

Analysis of experimental results. The decision tree classification method is compared with traditional keyword extraction and weight calculation methods. About 1,000 queries are randomly sampled from the user query log for manual evaluation, and the model's precision and recall are measured by cross-validation. The win rate of the model is compared with that of the traditional weight calculation method used in web page ranking. Finally, several queries are run on "Baidu" to compare the ranking obtained by searching with the keyword sequence chosen by the model against the ranking obtained with the unprocessed query. The results indicate that the keyword extraction and weight calculation method adopted in the thesis is feasible for weight calculation in web page ranking.
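The classification framing described in the abstract can be illustrated with a minimal sketch. It turns each word of a segmented, part-of-speech-tagged query into a feature dictionary, labels each word as keyword or non-keyword, and scores the labels with precision and recall (the usual pairing for this kind of evaluation). Everything named here is an assumption made for illustration: the abstract does not list the eight contextual features, so the six features below are hypothetical, the stop-word set and POS tags are invented, and `placeholder_classifier` is a hand-written rule standing in for the decision tree forest, not the thesis's model.

```python
from dataclasses import dataclass

# Hypothetical stop-word list; the thesis's actual list is not given in the abstract.
STOP_WORDS = {"的", "怎么样", "什么", "是"}


@dataclass
class Token:
    word: str
    pos: str  # part-of-speech tag, e.g. "n" noun, "ns" place name, "u" particle


def word_features(tokens, i):
    """Build an illustrative (hypothetical) feature dict for the i-th query word."""
    tok = tokens[i]
    return {
        "pos": tok.pos,
        "length": len(tok.word),
        "relative_position": i / max(len(tokens) - 1, 1),
        "is_stop_word": tok.word in STOP_WORDS,
        "prev_pos": tokens[i - 1].pos if i > 0 else "<s>",
        "next_pos": tokens[i + 1].pos if i + 1 < len(tokens) else "</s>",
    }


def placeholder_classifier(features):
    """Stand-in for a trained classifier: non-stop-word nouns count as keywords."""
    return (not features["is_stop_word"]) and features["pos"].startswith("n")


def precision_recall(predicted, gold):
    """Precision and recall of boolean keyword labels against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Segmented query "北京 的 天气 怎么样" with made-up POS tags and gold labels.
    query = [Token("北京", "ns"), Token("的", "u"),
             Token("天气", "n"), Token("怎么样", "r")]
    gold = [True, False, True, False]
    predicted = [placeholder_classifier(word_features(query, i))
                 for i in range(len(query))]
    print(predicted, precision_recall(predicted, gold))
```

In a fuller setup the feature dictionaries would be fed to a trained tree ensemble and the scores averaged over cross-validation folds; the rule above only keeps the sketch self-contained.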
[Degree-granting institution]: Graduate School of Chinese Academy of Social Sciences
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3

Article ID: 2304434

Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2304434.html
