Research on Web Log Mining Technology for a Teaching Resource Search Platform
Published: 2018-05-19 02:10
Topics: Web log mining + data preprocessing; Source: Guangxi University, master's thesis, 2014
[Abstract]: As the number of Web applications grows, Web databases keep expanding and the volume of data they hold keeps increasing. Web log mining applies data mining techniques to Web server logs to uncover latent rules and patterns, which are then applied to areas such as website architecture design and personalized services. The process is usually divided into three stages: data preprocessing, pattern discovery, and pattern analysis. Data preprocessing is the most important of the three, because it directly affects the performance and results of the algorithms used in the later pattern discovery and pattern analysis stages. Within preprocessing, session identification is the main task and also the most basic and critical step. The main work of this thesis is as follows:
(1) A Web session identification method based on a dynamic time threshold is presented. Several commonly used session identification methods are described in detail and the advantages and disadvantages of each are analyzed. Building on the time-based heuristic approach, an improved method is proposed that treats a visit to the site's home page as the start of a new session and uses a dynamic time threshold to decide session boundaries. The algorithm flowchart and a concrete implementation are given. Experimental results show that the improved method not only identifies more real user sessions but also effectively improves the precision and recall of session identification.
(2) A teaching resource search platform based on Web log mining is designed. The platform processes the IIS logs of the Guangxi University of Chinese Medicine website, using the log records of one day in July 2013 as the analysis data. The overall system architecture is designed, the functions of the main modules are described in detail, the data table structures and the flowchart of each step are given, and a prototype system is implemented.
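The session identification idea in item (1) can be illustrated with a minimal Python sketch. The abstract does not say how the dynamic threshold is computed, so the rule used here (a multiple of the mean gap between requests in the current session, clamped between a lower and an upper bound) is purely an assumption for illustration and not the thesis's algorithm. The function name, the parameters, the home page path `/index.html`, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def identify_sessions(requests, home_page="/index.html",
                      min_timeout=60.0, max_timeout=1800.0, factor=3.0):
    """Split one user's time-ordered (timestamp, url) requests into sessions.

    A new session starts when the home page is requested or when the gap
    to the previous request exceeds a dynamic threshold.
    ASSUMPTION: threshold = factor * mean gap of the current session,
    clamped to [min_timeout, max_timeout] seconds. The thesis may define
    its threshold differently.
    """
    sessions = []
    current = []   # requests in the session being built
    gaps = []      # gaps (seconds) between requests in the current session
    for ts, url in requests:
        if current:
            gap = (ts - current[-1][0]).total_seconds()
            if gaps:
                threshold = min(max_timeout, max(min_timeout, factor * mean(gaps)))
            else:
                # No gap history yet: fall back to the fixed upper bound.
                threshold = max_timeout
            if url == home_page or gap > threshold:
                # Close the current session and start a new one.
                sessions.append(current)
                current, gaps = [], []
            else:
                gaps.append(gap)
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions


# Hypothetical requests from a single user, already sorted by time.
t0 = datetime(2013, 7, 1, 9, 0, 0)
sample = [
    (t0, "/index.html"),
    (t0 + timedelta(seconds=10), "/a"),
    (t0 + timedelta(seconds=30), "/b"),
    (t0 + timedelta(minutes=40), "/c"),               # long pause
    (t0 + timedelta(minutes=40, seconds=20), "/index.html"),  # home page
    (t0 + timedelta(minutes=40, seconds=50), "/d"),
]
sessions = identify_sessions(sample)
print([len(s) for s in sessions])  # [3, 1, 2]
```

In this sketch the long pause before `/c` and the later home page request each open a new session. A real pipeline would first group log records by user (for example by IP address and user agent) before applying such a function to each user's requests.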
[Degree-granting institution]: Guangxi University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification numbers]: TP391.1; TP393.09
Article ID: 1908236
Article link: http://www.sikaile.net/guanlilunwen/ydhl/1908236.html