
Research on Algorithms for Recognizing the Strength of Users' Video Retrieval Intent

Published: 2018-05-15 01:22

  Topic: short text classification + information retrieval; Source: Zhejiang University, master's thesis, 2015


[Abstract]: As data grows explosively, users find it increasingly hard to choose among the information available to them. Search engines are an important means of obtaining information, and with the spread of smart devices, mobile search plays an increasingly important role. The limited display space of mobile devices means that users must be given information that is as precise and useful as possible, so the user's retrieval intent must be recognized more accurately in order to provide more precise service and improve the user experience. However, users typically express their information needs as short query strings, usually of 3-4 words, whose descriptions are relatively vague and highly ambiguous, so the user's actual need is often not recognized accurately. This thesis uses the rich data resources of a search engine, together with user interaction results, to analyze and address the problem of recognizing the strength of a user's video retrieval intent. The technique applies to general search and video retrieval systems: by analyzing the user's query string, it identifies how strong the video intent is, so that more precise results can be presented to the user in a friendly way. First, the thesis expands the user's query string with the titles of the results displayed by the search engine and of the results the user clicked, and, because the text of the different classes in this task overlaps heavily, proposes a new text feature selection method based on entropy and term frequency. Second, it designs and extracts five groups of features (text, video domain name statistics, the types of results returned by the search engine, semantic information from a deep language model, and session statistics) and runs experiments on them and on their combinations, which verify the effectiveness of the approach. Inspired by the deep learning language model word2vec, it proposes Host2vec, a vector representation for site domain names, bringing deep language models into the problem of retrieval intent strength recognition. Finally, it analyzes and mines how the strength of users' video retrieval intent changes over time.
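The abstract names a feature selection method based on entropy and term frequency but does not give its formula. The sketch below shows one plausible way such a score could be built, under stated assumptions: a term's score is its log-scaled total frequency multiplied by one minus the normalized entropy of its distribution over classes. The function name `entropy_tf_scores`, the weighting, and the log scaling are all hypothetical choices for illustration, not the thesis's actual method.

```python
import math
from collections import Counter, defaultdict


def entropy_tf_scores(docs, labels):
    """Score terms by frequency, discounted by how evenly they spread over classes.

    docs:   list of token lists (e.g. expanded query strings)
    labels: parallel list of class labels (e.g. "strong" / "weak" video intent)

    Assumption: score = log(1 + tf) * (1 - H / H_max), where H is the entropy
    of the term's class distribution. This formula is illustrative only.
    """
    classes = set(labels)
    # term -> Counter mapping class label -> occurrences of the term in that class
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            per_class[tok][label] += 1

    # Maximum possible entropy for this number of classes (guard the 1-class case).
    max_entropy = math.log2(len(classes)) if len(classes) > 1 else 1.0

    scores = {}
    for term, counts in per_class.items():
        total = sum(counts.values())
        # Entropy is 0 when the term occurs in a single class only.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores[term] = math.log1p(total) * (1.0 - h / max_entropy)
    return scores
```

Under this weighting, a term that occurs equally often in every class scores zero however frequent it is, which is one way to cope with heavy text overlap between classes; terms could then be ranked by score and the top ones kept as features.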
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.41

