
An Algorithm for Analyzing Literature Records in HTML Pages

Published: 2019-04-26 00:39
[Abstract]: Publishers need to find relevant literature promptly among large numbers of web pages, so an algorithm is needed that automatically extracts bibliographic information from HyperText Markup Language (HTML) pages. To this end, a literature-record analysis algorithm based on conditional random fields (CRFs) is designed. First, a segmentation algorithm for the document object (DOM) tree splits the page data into independent blocks at segmentation tags. Each block consists of a sequence of tags and text. Next, this sequence serves as the feature vector of a CRF model, which yields a labeling model for bibliographic information. Finally, a heuristic algorithm extracts the bibliographic data from the output of the labeling model. Experiments verify the effectiveness of the approach.
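The abstract names three stages but gives no implementation details. These stages are DOM-tree segmentation into tag/text sequences, CRF labeling, and heuristic extraction. The standard-library Python sketch below is one way such a pipeline could fit together. It is not the authors' method. Everything concrete in it is a hypothetical assumption:

- the segmentation tags (`SEGMENT_TAGS`);
- the label set (`LABELS`);
- the features in `token_features`;
- the hand-set weights `EMIT` and `TRANS`;
- the rule in `extract_record`.

A real CRF would learn `EMIT` and `TRANS` from annotated pages. That training step is omitted here. For brevity, the sketch also walks the parse events of Python's `html.parser` rather than building an explicit DOM tree. The `viterbi` function performs standard linear-chain CRF decoding. It returns the label sequence that maximizes the sum of per-token emission scores and adjacent-label transition scores.

```python
import re
from html.parser import HTMLParser

# Hypothetical settings: the abstract does not say which segmentation tags,
# labels, features, or weights the paper actually uses.
SEGMENT_TAGS = {"li", "tr", "p", "div"}  # opening/closing one of these ends a block
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag
LABELS = ["TITLE", "AUTHOR", "YEAR", "OTHER"]

class BlockSegmenter(HTMLParser):
    """Split a page into blocks; each block is a list of (enclosing_tag, text)."""
    def __init__(self):
        super().__init__()
        self.blocks, self.current, self.stack = [], [], []

    def _flush(self):
        if self.current:
            self.blocks.append(self.current)
            self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self._flush()
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:  # pop back to the matching open tag
            while self.stack.pop() != tag:
                pass
        if tag in SEGMENT_TAGS:
            self._flush()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.append((self.stack[-1] if self.stack else "#root", text))

    def close(self):
        super().close()
        self._flush()

def token_features(tag, text):
    """Binary features of one token (hypothetical feature set)."""
    feats = {"bias", "tag=" + tag}
    if re.fullmatch(r"(19|20)\d{2}", text):
        feats.add("is_year")
    if "," in text:
        feats.add("has_comma")
    if len(text.split()) >= 4:
        feats.add("many_words")
    return feats

# Hand-set toy weights standing in for parameters a trained CRF would learn.
EMIT = {("tag=b", "TITLE"): 2.0, ("many_words", "TITLE"): 1.5,
        ("tag=i", "AUTHOR"): 2.0, ("has_comma", "AUTHOR"): 1.0,
        ("is_year", "YEAR"): 3.0, ("bias", "OTHER"): 0.5}
TRANS = {("TITLE", "AUTHOR"): 0.5, ("AUTHOR", "YEAR"): 0.5}

def viterbi(tokens):
    """Best label path under a linear-chain CRF score (emissions + transitions)."""
    if not tokens:
        return []
    feats = [token_features(tag, text) for tag, text in tokens]
    def emit(t, y):
        return sum(EMIT.get((f, y), 0.0) for f in feats[t])
    score, back = [{y: emit(0, y) for y in LABELS}], [{}]
    for t in range(1, len(tokens)):
        row, ptr = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[t - 1][p] + TRANS.get((p, y), 0.0))
            row[y] = score[t - 1][prev] + TRANS.get((prev, y), 0.0) + emit(t, y)
            ptr[y] = prev
        score.append(row)
        back.append(ptr)
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for t in range(len(tokens) - 1, 0, -1):  # follow back-pointers
        path.append(back[t][path[-1]])
    return path[::-1]

def extract_record(tokens, labels):
    """Hypothetical heuristic: join tokens per label; keep blocks with a title."""
    fields = {}
    for (_, text), label in zip(tokens, labels):
        if label != "OTHER":
            fields.setdefault(label.lower(), []).append(text)
    return {k: " ".join(v) for k, v in fields.items()} if "title" in fields else None

page = """<ul>
<li><b>An Example Paper Title</b> <i>Zhang, S., Li, M.</i> 2018</li>
<li><a href="/about">About us</a></li>
</ul>"""
seg = BlockSegmenter()
seg.feed(page)
seg.close()
records = [r for r in (extract_record(b, viterbi(b)) for b in seg.blocks) if r]
print(records)
# [{'title': 'An Example Paper Title', 'author': 'Zhang, S., Li, M.', 'year': '2018'}]
```

On the toy page, the first `<li>` block is labeled TITLE, AUTHOR, YEAR. The navigation block is labeled OTHER and then dropped, because the heuristic keeps only blocks that contain a title. With weights learned from data, the `viterbi` decoding step would stay the same; only `EMIT` and `TRANS` would change.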
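[Author affiliations]: School of Information Engineering, Beijing Institute of Graphic Communication; Postdoctoral Research Station of Computer Science and Technology, Tsinghua University; Satellite Direct Broadcasting Management Center for Radio and Television, State Administration of Press, Publication, Radio, Film and Television.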
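[Funding]: Beijing Municipal Education Commission Science and Technology Innovation Service Capability Building Project (PXM2016_014223_000025); Beijing Institute of Graphic Communication University-Level Key Project (ea201507); Beijing Institute of Graphic Communication Faculty Development Program, Doctoral Start-up Fund (27170116005/062); Beijing Institute of Graphic Communication Research Project, Publication Data Asset Evaluation Laboratory Construction Project (20190116005/006).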
[CLC number]: TP393.092




