Design and Implementation of an Incremental Inverted Index Structure
Published: 2018-05-08 13:58
Topics: topic-specific search engine + incremental inverted index; Source: Journal of Jilin University (Science Edition), 2007, No. 06
[Abstract]: Because a topical crawler fetches updated web pages at a high rate, this paper proposes an incremental index structure for web search engines. When the inverted index is built, the posting list of each term is stored in the inverted index file as a chain of linked blocks, and each newly allocated block is larger than the previous one. This structure addresses the difficulty of updating an inverted index whose posting lists are stored contiguously. Experimental results show that, compared with the traditional linked-list storage scheme that supports real-time updates, the structure provides more efficient retrieval, and that trading space for time effectively improves index update efficiency.
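
The linked-block layout described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the abstract only states that each newly allocated block is larger than the last, so the doubling growth rule, the initial capacity of 2, and the names `Block`, `GrowingBlockIndex`, `add`, `postings`, and `block_sizes` are all assumptions made for illustration. The paper stores blocks in an index file, whereas this sketch keeps them in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Block:
    """One fixed-capacity block in a term's chain (hypothetical layout)."""
    capacity: int                                  # postings this block can hold
    postings: List[int] = field(default_factory=list)
    next: Optional["Block"] = None                 # link to the next block

    def is_full(self) -> bool:
        return len(self.postings) >= self.capacity


class GrowingBlockIndex:
    """Inverted index whose posting lists are chains of growing blocks."""

    def __init__(self, first_capacity: int = 2, growth: int = 2) -> None:
        # Assumption: sizes grow geometrically; the abstract does not say how.
        self.first_capacity = first_capacity
        self.growth = growth
        self.heads: Dict[str, Block] = {}   # first block of each term
        self.tails: Dict[str, Block] = {}   # last block, for O(1) appends

    def add(self, term: str, doc_id: int) -> None:
        """Append a posting without moving any existing postings."""
        tail = self.tails.get(term)
        if tail is None:
            # First posting for this term: allocate the smallest block.
            tail = Block(self.first_capacity)
            self.heads[term] = tail
            self.tails[term] = tail
        elif tail.is_full():
            # Tail is full: link a new, larger block to the chain.
            new_block = Block(tail.capacity * self.growth)
            tail.next = new_block
            self.tails[term] = new_block
            tail = new_block
        tail.postings.append(doc_id)

    def postings(self, term: str) -> Iterator[int]:
        """Walk the chain and yield doc ids in insertion order."""
        block = self.heads.get(term)
        while block is not None:
            yield from block.postings
            block = block.next

    def block_sizes(self, term: str) -> List[int]:
        """Capacities of the blocks allocated for a term, in chain order."""
        sizes = []
        block = self.heads.get(term)
        while block is not None:
            sizes.append(block.capacity)
            block = block.next
        return sizes


index = GrowingBlockIndex()
for doc_id in range(1, 8):
    index.add("search", doc_id)
print(index.block_sizes("search"))
print(list(index.postings("search")))
```

Under these assumptions, seven postings for one term occupy three blocks of capacity 2, 4, and 8. Larger later blocks mean fewer links to follow per query than a one-posting-per-node linked list, at the cost of unused slots in the tail block, which is the space-for-time trade-off the abstract mentions.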
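[Affiliation]: College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University
[Funding]: National Natural Science Foundation of China (Grant No. 60373099); Project Fund of the Ministry of Education Key Laboratory of Symbolic Computation and Knowledge Engineering (Grant No. 93K-17); Jilin Province Science and Technology Development Program (Grant No. 20070533)
[Classification No.]: TP391.3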
Article ID: 1861681
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1861681.html