Application of the Continuous Hidden Markov Model to Click Fraud Detection

Published: 2018-05-31 23:01

Topic: click fraud + continuous hidden Markov model. Source: Shanghai Jiao Tong University, master's thesis, 2013

[Abstract]: With the rapid growth of keyword advertising on search engines, fraudulent clicks have become a major problem for advertisers and search engine companies, and the detection and prevention of click fraud has drawn wide attention from researchers in China and abroad. This thesis analyzes the click fraud problem in online search engine keyword advertising and its behavioral characteristics. Because the click behavior pattern of a keyword advertisement fits the basic assumptions of the hidden Markov model (HMM) reasonably well, the thesis applies the HMM framework to click fraud detection. The main contributions are as follows. (1) The HMM is only a theoretical framework. The thesis analyzes the behavior pattern of keyword clicks, builds a continuous hidden Markov model (CHMM) for search engine keyword advertising, and proposes a method for identifying fraudulent clicks. (2) The CHMM is trained on observed data (parameter estimation) and its detection performance is validated; the statistical results show that the CHMM identifies click fraud with high accuracy. (3) The thesis examines how detection accuracy varies with the model parameters, namely the number of hidden states N, the sequence length R, and the threshold, in order to determine the best number of hidden states (a fixed value), threshold, and other settings. (4) Factors such as time of day and unexpected events can raise the click-through rate of an online keyword advertisement noticeably without any fraud being involved. The thesis uses a dynamic CHMM that continually updates the time series used for training and re-estimates the parameters, which substantially reduces the effect of such factors on detection accuracy.
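
The detection idea summarized above (train a CHMM, score a click sequence of length R, compare the score with a threshold) could look roughly like the following sketch. It assumes one-dimensional Gaussian emissions, a per-observation log-likelihood as the score, and made-up parameter values; the abstract specifies none of these, and every function name here is hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm in log space: returns log P(obs | model)."""
    n = len(log_pi)
    alpha = [log_pi[i] + log_gauss(obs[0], means[i], variances[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def is_suspicious(obs, model, threshold):
    """Flag a window whose average log-likelihood per observation is below threshold."""
    return log_likelihood(obs, *model) / len(obs) < threshold

# Hypothetical 2-state model (N = 2): a "quiet" and a "busy" traffic regime.
# Observations stand for clicks per time slot; all numbers are illustrative.
model = (
    [math.log(0.6), math.log(0.4)],                                     # initial distribution
    [[math.log(0.9), math.log(0.1)], [math.log(0.2), math.log(0.8)]],   # transitions
    [5.0, 20.0],                                                        # emission means
    [4.0, 25.0],                                                        # emission variances
)
normal_window = [4, 6, 5, 7, 18, 22, 19, 6]   # sequence length R = 8
burst_window = [4, 6, 90, 95, 88, 92, 5, 6]   # contains an implausible burst
threshold = -6.0                              # hypothetical cut-off

print(is_suspicious(normal_window, model, threshold))
print(is_suspicious(burst_window, model, threshold))
```

With these illustrative numbers the burst window scores far below the ordinary one, so only the burst falls under the threshold. In practice N, R, and the threshold would be tuned on labeled data, which is the comparison the abstract describes in point (3).
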
A fifth point concerns parameter estimation, which is the key to whether an HMM achieves high accuracy on a recognition problem. The traditional Baum-Welch algorithm has a number of shortcomings. Compared with Baum-Welch, a training algorithm based on Segmental K-Means (SKM) has lower computational complexity and converges faster, and it places more emphasis on automatically classifying and recognizing the model's output patterns. SKM is therefore better targeted at, and more applicable to, click fraud detection. The empirical analysis also shows that SKM training gives better detection results. In addition, the thesis gives a preliminary discussion of HMM parameter estimation by Gibbs sampling, a Markov chain Monte Carlo (MCMC) method.
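
As a rough illustration of segmental K-means training in its generic form (alternate Viterbi decoding with re-estimation from the resulting hard state assignments), consider the sketch below. It is not the thesis's implementation: the abstract does not give the variant used, so the one-dimensional Gaussian emissions, the crude initialisation by splitting the sorted data, the smoothing floors, and all function names are assumptions made for the example.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, log_pi, log_A, means, variances):
    """Most likely hidden state path for one observation sequence."""
    n = len(log_pi)
    delta = [log_pi[i] + log_gauss(obs[0], means[i], variances[i]) for i in range(n)]
    back = []
    for x in obs[1:]:
        prev = [max(range(n), key=lambda i: delta[i] + log_A[i][j]) for j in range(n)]
        delta = [delta[prev[j]] + log_A[prev[j]][j] + log_gauss(x, means[j], variances[j])
                 for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(back):  # follow back-pointers from the last step to the first
        state = prev[state]
        path.append(state)
    return path[::-1]

def reestimate(seqs, paths, means, variances, floor=1e-3, var_floor=1e-2):
    """Update parameters from hard state assignments (counts and sample moments)."""
    n = len(means)
    start = [floor] * n                       # small floor avoids log(0)
    trans = [[floor] * n for _ in range(n)]
    buckets = [[] for _ in range(n)]
    for obs, path in zip(seqs, paths):
        start[path[0]] += 1
        for a, b in zip(path, path[1:]):
            trans[a][b] += 1
        for x, s in zip(obs, path):
            buckets[s].append(x)
    log_pi = [math.log(c / sum(start)) for c in start]
    log_A = [[math.log(c / sum(row)) for c in row] for row in trans]
    new_means, new_vars = list(means), list(variances)
    for s, xs in enumerate(buckets):
        if xs:  # a state with no assigned observations keeps its old parameters
            new_means[s] = sum(xs) / len(xs)
            new_vars[s] = max(sum((x - new_means[s]) ** 2 for x in xs) / len(xs), var_floor)
    return log_pi, log_A, new_means, new_vars

def train_skm(seqs, n_states, max_iter=50):
    """Alternate Viterbi decoding and re-estimation until the paths stop changing."""
    # Assumed initialisation: equal-size chunks of the sorted pooled data
    # (requires at least n_states observations in total).
    pooled = sorted(x for obs in seqs for x in obs)
    chunk = len(pooled) // n_states
    means = [sum(pooled[k * chunk:(k + 1) * chunk]) / chunk for k in range(n_states)]
    overall = sum(pooled) / len(pooled)
    spread = max(sum((x - overall) ** 2 for x in pooled) / len(pooled), 1e-2)
    variances = [spread] * n_states
    log_pi = [math.log(1.0 / n_states)] * n_states
    log_A = [[math.log(1.0 / n_states)] * n_states for _ in range(n_states)]
    paths = None
    for _ in range(max_iter):
        new_paths = [viterbi(obs, log_pi, log_A, means, variances) for obs in seqs]
        if new_paths == paths:
            break
        paths = new_paths
        log_pi, log_A, means, variances = reestimate(seqs, paths, means, variances)
    return log_pi, log_A, means, variances

# Illustrative click-count sequences with two visible regimes.
seqs = [[5, 6, 4, 5, 20, 21, 19, 22, 5, 4], [20, 19, 21, 5, 6, 5, 4, 20, 22, 21]]
log_pi, log_A, means, variances = train_skm(seqs, 2)
print(means)
```

Each iteration needs only one Viterbi pass and simple counting, whereas Baum-Welch computes full forward and backward probabilities for every state at every step. That is consistent with the lower cost claimed above, although the thesis's own complexity comparison is not reproduced here.
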
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3


Article ID: 1961677

Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1961677.html
