Semantic Search over Large-Scale RDF Data

Published: 2018-01-30 04:27

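Keywords: semantic search; hybrid query; graph data indexing; query optimization; entity matching; query translation; ranking. Source: Shanghai Jiao Tong University, doctoral dissertation, 2013. Document type: degree thesis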

[Abstract]: The Semantic Web gives information explicit structure and semantics, so that machines can not only display it but also understand, process, and integrate it. In recent years, as projects such as Linked Open Data and DBpedia have gathered pace, the number of Semantic Web data sources has surged, and large volumes of graph-structured semantic data using RDF as the data model have been published. The Internet is changing from a Web of Documents, which contains only web pages and the hyperlinks between them, into a Web of Data that describes a great many entities and the rich relationships among them. Against this background, the major search engine companies, led by Google, have built knowledge graphs on this foundation to improve search quality, opening the era of semantic search.
Unlike traditional document retrieval, semantic search must handle structured semantic data of finer granularity, and it therefore faces greater, unprecedented challenges. The mature storage and indexing techniques designed for unstructured Web documents no longer apply to RDF data. Existing ranking algorithms cannot be applied directly to semantic search oriented toward entities and relations. Support for SPARQL queries and data integration across heterogeneous semantic data sources are entirely new problems. In addition, supporting the keyword queries that users are familiar with is essential to the adoption of semantic search.
This dissertation aims to address, comprehensively and systematically, the challenges of semantic search over large-scale RDF data: storage and indexing of large-scale graph data, graph-structured queries that contain keywords, entity-centric structured ranking, fusion of heterogeneous data from multiple sources, and user-friendly interaction. The main content and contributions of each chapter are as follows:
Chapter 1 is the introduction. It presents the research background, summarizes the state of semantic search research in China and abroad, and describes in detail the main challenges facing semantic search over large-scale RDF data.
Chapter 2 is the first to use information retrieval methods to search the Web of Data. It uses and extends the inverted index to support efficient processing of single-variable, tree-shaped hybrid queries. On this basis, I propose a relation-based ranking algorithm that returns relevant entities, faceted browsing that lets users construct hybrid queries interactively, and a block-based index that supports incremental index updates.
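As a rough illustration of how an inverted index can serve a hybrid query that mixes keywords with structural constraints, the sketch below indexes each subject entity under two kinds of terms: tokens from its literal values, and (predicate, object) edges. A single-variable query is then an intersection of posting sets. The triples, the `ex:` prefix, and the function names are all assumptions made for this example; the sketch is not the index layout or ranking used in the dissertation.

```python
from collections import defaultdict

# Toy triples (hypothetical data): (subject, predicate, object)
TRIPLES = [
    ("ex:paper1", "rdfs:label", "semantic search over RDF"),
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:paper2", "rdfs:label", "keyword search in databases"),
    ("ex:paper2", "ex:author", "ex:bob"),
    ("ex:paper3", "rdfs:label", "semantic web ranking"),
    ("ex:paper3", "ex:author", "ex:bob"),
]


def build_index(triples):
    """Map each index term to the set of subject entities it describes."""
    index = defaultdict(set)
    for s, p, o in triples:
        if o.startswith("ex:"):
            # Structural term: the entity has an edge p pointing at resource o.
            index[("rel", p, o)].add(s)
        else:
            # Keyword terms: one posting per lowercase token of the literal.
            for token in o.lower().split():
                index[("kw", token)].add(s)
    return index


def hybrid_query(index, keywords, relations):
    """Return entities matching every keyword and every (predicate, object) edge."""
    terms = [("kw", k.lower()) for k in keywords]
    terms += [("rel", p, o) for p, o in relations]
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)
```

For example, `hybrid_query(build_index(TRIPLES), ["semantic"], [("ex:author", "ex:bob")])` asks for entities whose literals mention "semantic" and that have an `ex:author` edge to `ex:bob`.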
Chapter 3 extends the structured query capability of Chapter 2 and proposes an efficient RDF query engine that executes more general SPARQL queries. In addition, I collect RDF-specific statistics to estimate the execution cost of query plans, and I design a new query optimization algorithm that determines the optimal join order and converts a SPARQL query graph into an optimal query plan.
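To make the idea of statistics-driven join ordering concrete, here is a minimal sketch. It assumes each triple pattern comes with an estimated cardinality and that every join on a shared variable has one fixed selectivity (`join_sel`, an invented constant). It enumerates all left-deep orders and keeps the one with the smallest sum of estimated intermediate result sizes. Both the cost model and the exhaustive enumeration are simplifications chosen for illustration, not the optimizer proposed in the dissertation.

```python
from itertools import permutations


def variables(pattern):
    """Variables of a triple pattern are the terms that start with '?'."""
    return {term for term in pattern if term.startswith("?")}


def plan_cost(order, card, join_sel=0.01):
    """Sum of estimated intermediate result sizes for a left-deep plan."""
    bound = variables(order[0])
    size = card[order[0]]
    cost = size
    for pattern in order[1:]:
        shared = bound & variables(pattern)
        # A shared variable makes this a join; otherwise it is a cross product.
        size = size * card[pattern] * (join_sel if shared else 1.0)
        cost += size
        bound |= variables(pattern)
    return cost


def best_join_order(card):
    """Try every left-deep order (fine for a handful of patterns only)."""
    return min(permutations(card), key=lambda order: plan_cost(order, card))


# Hypothetical statistics: triple pattern -> estimated number of matches.
P_TYPE = ("?x", "rdf:type", "ex:Paper")
P_AUTHOR = ("?x", "ex:author", "?y")
P_NAME = ("?y", "ex:name", "'Alice'")
CARDINALITY = {P_TYPE: 1000, P_AUTHOR: 5000, P_NAME: 2}
```

With these made-up numbers, `best_join_order(CARDINALITY)` starts from the most selective pattern (`P_NAME`) and only then joins the larger ones, which is the behaviour a cost-based optimizer is meant to produce. Exhaustive enumeration grows factorially, so a real engine would need dynamic programming or heuristics.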
Chapter 4 discusses efficient query processing based on RDF graph patterns. It introduces two pattern selection strategies: one uses heuristic rules to select frequent RDF subgraphs, and the other uses the query history to select the subgraph structures that users prefer. Building on the previous two chapters, I further propose an efficient index based on graph patterns, represent query plans as pattern trees, and solve SPARQL query processing by converting it into a sub-pattern cover problem.
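One way to picture the sub-pattern cover idea is as a set cover problem: choose indexed patterns that together cover as many edges of the query as possible. The sketch below is a standard greedy set cover heuristic under a strong simplifying assumption, namely that a pattern can be represented as a set of edge labels, so containment is a plain subset test rather than subgraph matching. The pattern names and labels are invented, and the dissertation's actual cover algorithm may differ.

```python
def greedy_pattern_cover(query_edges, indexed_patterns):
    """Cover the query's edges with indexed patterns, largest gain first."""
    # Only patterns fully contained in the query can answer part of it.
    usable = {name: edges for name, edges in indexed_patterns.items()
              if edges <= query_edges}
    uncovered = set(query_edges)
    chosen = []
    while uncovered and usable:
        name = max(usable, key=lambda n: len(usable[n] & uncovered))
        gained = usable[name] & uncovered
        if not gained:
            break  # the remaining edges must fall back to plain triple indexes
        chosen.append(name)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical query and pattern index, each pattern given as a set of edge labels.
QUERY = {"rdf:type", "ex:author", "ex:name", "ex:year"}
INDEXED = {
    "P1": {"rdf:type", "ex:author"},
    "P2": {"ex:author", "ex:name"},
    "P3": {"rdf:type", "ex:author", "ex:name"},
    "P4": {"ex:year", "ex:venue"},  # not contained in the query, so unusable
}
```

Calling `greedy_pattern_cover(QUERY, INDEXED)` returns the chosen pattern names together with the edges left uncovered, which a query engine would still have to evaluate against its base indexes.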
Chapter 5 proposes a two-stage integrated solution to the entity matching problem in semantic search over large-scale RDF graph data. Blocking is used to filter candidate entity pairs quickly, which addresses scalability. Then the local structural features of the entities are used to cluster within each block and obtain the final matching results. This work is also the first attempt to evaluate entity matching extensively at large scale by using the existing sameAs triples in Linked Open Data.
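The two stages described above can be sketched as follows. Stage one groups entities into blocks by a cheap key, so that only entities in the same block are ever compared. Stage two compares pairs inside a block by the overlap of their neighbouring values and merges sufficiently similar pairs with union-find. The blocking key (first lowercase token of the label), the Jaccard measure, the 0.4 threshold, and the sample entities are all assumptions made for illustration; the dissertation's features and clustering method are not given in this abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entities from two sources: id -> (label, neighbouring values)
ENTITIES = {
    "a:Shanghai": ("Shanghai", {"China", "city", "Huangpu"}),
    "b:shanghai_city": ("shanghai", {"China", "city", "Asia"}),
    "a:Shanghai_Tower": ("Shanghai Tower", {"skyscraper", "Lujiazui"}),
    "a:Paris": ("Paris", {"France", "city"}),
}


def block_key(label):
    """Cheap blocking key: the first lowercase token of the label."""
    return label.lower().split()[0]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_entities(entities, threshold=0.4):
    # Stage 1: blocking, so only entities sharing a key become candidate pairs.
    blocks = defaultdict(list)
    for eid, (label, _) in entities.items():
        blocks[block_key(label)].append(eid)

    # Stage 2: cluster inside each block using neighbourhood similarity.
    parent = {eid: eid for eid in entities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in blocks.values():
        for a, b in combinations(members, 2):
            if jaccard(entities[a][1], entities[b][1]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for eid in entities:
        clusters[find(eid)].add(eid)
    return [c for c in clusters.values() if len(c) > 1]
```

In this toy data, `a:Shanghai` and `a:Shanghai_Tower` land in the same block but share no neighbours, so only the two city records are merged. Matches produced this way could then be scored against existing `owl:sameAs` links, in the spirit of the evaluation the chapter describes.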
Chapter 6 studies a novel and user-friendly way of interacting through keyword search, namely how to translate keyword queries efficiently over large-scale graph data (RDF data in particular). I propose a novel top-k subgraph search algorithm that translates a keyword query into structured queries instead of computing the query results directly. I also use summarization techniques to generate an aggregated graph that contains only graph pattern information, which speeds up query translation.
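The sketch below illustrates keyword query translation on a summary graph in its simplest form. Each keyword is mapped to a class in a small schema-level graph, a breadth-first search finds the shortest connecting path, and that path is written out as a SPARQL-style graph pattern. It handles exactly two keywords and returns only the single shortest connection, so it is a stand-in for, not a reproduction of, the top-k subgraph search described above. The summary graph, the keyword mapping, and the unprefixed `:` names are hypothetical.

```python
from collections import deque

# Hypothetical summary graph: class -> outgoing (predicate, class) edges.
SUMMARY = {
    "Paper": [("author", "Person"), ("publishedIn", "Venue")],
    "Person": [("affiliation", "Organization")],
    "Venue": [],
    "Organization": [],
}
# Hypothetical keyword-to-class mapping (a real system would use a keyword index).
KEYWORD_TO_CLASS = {"paper": "Paper", "university": "Organization", "venue": "Venue"}


def shortest_path(summary, start, goal):
    """Breadth-first search returning the edges of a shortest directed path."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in summary.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None


def translate(keywords):
    """Turn two keywords into a structured query string, or None if unconnected."""
    start, goal = (KEYWORD_TO_CLASS[k.lower()] for k in keywords)
    path = shortest_path(SUMMARY, start, goal)
    if path is None:
        return None

    def var(cls):
        return "?" + cls.lower()

    patterns = [f"{var(start)} rdf:type :{start} ."]
    for s, p, o in path:
        patterns.append(f"{var(s)} :{p} {var(o)} .")
        patterns.append(f"{var(o)} rdf:type :{o} .")
    return "SELECT * WHERE { " + " ".join(patterns) + " }"
```

Because the search runs on the summary graph rather than on the instance data, its cost depends on the size of the schema, which is the reason summarization helps translation.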
Chapter 7 introduces a search infrastructure for the Web of Data that supports pay-as-you-go data integration. The chapter extends query translation to heterogeneous Web data sources, that is, it translates user keywords into a semantic structured query that spans multiple data sources. In addition, I describe in detail the techniques for distributed query processing on the Web of Data, especially the mapping join. The mapping join uses the large-scale entity matching method of Chapter 5 to precompute data-level mappings and to merge efficiently the results obtained from heterogeneous data sources.
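A mapping join can be pictured as an ordinary hash join in which join keys are first rewritten through precomputed entity mappings, so that two sources naming the same entity differently still join. The sketch below assumes the mappings are available as a dictionary from an identifier to its canonical identifier; the row format, the identifiers, and the function names are invented for illustration and say nothing about how the dissertation implements the operator.

```python
from collections import defaultdict


def canonical(entity, same_as):
    """Rewrite an identifier to its canonical form if a mapping exists."""
    return same_as.get(entity, entity)


def mapping_join(left_rows, right_rows, key, same_as):
    """Hash join on `key` after mapping both sides to canonical identifiers."""
    table = defaultdict(list)
    for row in left_rows:
        table[canonical(row[key], same_as)].append(row)

    joined = []
    for row in right_rows:
        entity = canonical(row[key], same_as)
        for match in table.get(entity, []):
            merged = {**match, **row}
            merged[key] = entity  # report the canonical identifier
            joined.append(merged)
    return joined


# Hypothetical partial results from two sources and a precomputed mapping.
LEFT = [
    {"e": "a:Shanghai", "country": "China"},
    {"e": "a:Paris", "country": "France"},
]
RIGHT = [{"e": "b:shanghai_city", "label": "Shanghai"}]
SAME_AS = {"b:shanghai_city": "a:Shanghai"}
```

Here `mapping_join(LEFT, RIGHT, "e", SAME_AS)` joins the two Shanghai rows even though the sources use different identifiers, while the Paris row finds no partner and is dropped.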
Chapter 8 extends the semantic search scenario to a hybrid Web environment that contains graph-structured data, web pages, and the corresponding semantic annotations. By integrating information retrieval and database techniques, it builds a database that scales to large numbers of documents, graph-structured data, and semantic annotations. In addition, I propose a novel data structure to represent the (intermediate) results returned by hybrid search, and I design a series of efficient algorithms for hybrid query processing.
Chapter 9 summarizes the main work and results of this dissertation and discusses directions for further research on semantic search.

[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Doctorate
[Year degree awarded]: 2013
[Classification number]: TP391.1



Article ID: 1475291


Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1475291.html
