Chinese Keyword Top-N Query Processing on Multi-Table Databases
Published: 2018-05-28 20:24
Topic: relational databases + Chinese keywords; Source: Hebei University, master's thesis, 2013
[Abstract]: The theory and techniques of keyword query have been studied and applied extensively in information retrieval and Web search engines. Traditional database management systems support only pattern matching, not free-form keyword queries. For this reason, keyword query processing over relational databases has become an actively studied frontier topic in recent years. Traditional relational database systems are operated through Structured Query Language (SQL), which requires users to know both SQL and the database schema; this is difficult for ordinary users. Moreover, traditional database systems can only apply simple sorting to the returned results, so users find it hard to pick out the information that interests them most. Existing research on keyword queries mainly targets English keywords, so this thesis proposes a Chinese keyword top-N query processing method for databases with multiple tables. The method creates an index table that stores the Chinese characters extracted from tuples (tuple characters) together with their related information, builds an index on it to match query characters quickly, and adapts a similarity formula from information retrieval (IR) into a ranking strategy suited to Chinese keyword queries. For a given Chinese keyword query, the index is used to match query characters against tuple characters and retrieve the corresponding information. From this information, a candidate-tuple generation linked list and SQL query statements are created, which yield the candidate tuples and their similarity to the query. The top-N results are then returned in order of similarity. The method supports character-by-character search and queries that use Chinese abbreviations. Experiments on a real data set verify query response time and accuracy, and the results show that the method is effective.
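The pipeline described in the abstract (character-level index, IR-style similarity, top-N selection) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the index table schema, the similarity formula, the candidate-tuple linked list, or the generated SQL. The sketch assumes an in-memory dictionary in place of the index table and a standard TF-IDF weighting in place of the thesis's formula. All names (`build_index`, `top_n`, `tables`) are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_index(tables):
    """Map each Chinese character to {(table, row_id): term frequency}.

    `tables` is {table_name: {row_id: text}}. Assumption: an in-memory dict
    stands in for the index table that the thesis stores in the database.
    """
    index = defaultdict(dict)
    for table, rows in tables.items():
        for row_id, text in rows.items():
            counts = Counter(ch for ch in text if is_cjk(ch))
            for ch, tf in counts.items():
                index[ch][(table, row_id)] = tf
    return index


def top_n(query, index, total_tuples, n):
    """Return up to n ((table, row_id), score) pairs, best score first.

    Assumption: scores use a generic TF-IDF sum over the query characters,
    not the similarity formula defined in the thesis.
    """
    scores = defaultdict(float)
    for ch in {c for c in query if is_cjk(c)}:
        postings = index.get(ch, {})
        if not postings:
            continue
        idf = math.log(1 + total_tuples / len(postings))
        for key, tf in postings.items():
            scores[key] += (1 + math.log(tf)) * idf
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


# Hypothetical two-table database used only for illustration.
tables = {
    "university": {1: "河北大学", 2: "北京大学"},
    "paper": {1: "关系数据库关键词查询", 2: "河北省数据库研究"},
}
index = build_index(tables)
total = sum(len(rows) for rows in tables.values())

# "河大" is an abbreviation of "河北大学"; matching per character finds it.
best = top_n("河大", index, total, 1)
print(best[0][0])
```

Because matching is done per character rather than per segmented word, the abbreviation query "河大" ranks the tuple containing "河北大学" first: it is the only tuple that contains both query characters.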
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.13; TP391.1
Article ID: 1948054
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1948054.html