基于DFSTree鄰域索引的大圖匹配算法研究
發(fā)布時(shí)間:2018-11-26 12:21
【摘要】:圖結(jié)構(gòu)模型常被用于互聯(lián)網(wǎng),生物,計(jì)算機(jī)視覺(jué)和化學(xué)等領(lǐng)域中描述復(fù)雜的數(shù)據(jù)對(duì)象及對(duì)象之間的關(guān)聯(lián)關(guān)系。圖搜索在圖計(jì)算研究領(lǐng)域扮演著十分重要的角色,圖搜索可根據(jù)其應(yīng)用場(chǎng)景的不同分為兩類(lèi):子圖查詢與子圖匹配,其中子圖查詢是在圖數(shù)據(jù)庫(kù)中找出所有與查詢圖匹配的超圖集合并返回;子圖匹配則是從圖數(shù)據(jù)庫(kù)中找到所有與查詢圖同構(gòu)的子圖集并返回。無(wú)論是圖搜索中的圖查詢亦或是圖匹配,均需要借助于子圖同構(gòu)算法來(lái)對(duì)查詢圖與返回結(jié)果中的圖進(jìn)行同構(gòu)判定。眾所周知子圖同構(gòu)問(wèn)題是NP完全問(wèn)題,在業(yè)界中圖同構(gòu)問(wèn)題是通過(guò)搜索剪枝的快速啟發(fā)算法來(lái)實(shí)現(xiàn);谒阉鞯淖訄D同構(gòu)算法在時(shí)間開(kāi)銷(xiāo)過(guò)大,所以若對(duì)數(shù)據(jù)庫(kù)中的所有圖逐一掃描并判斷同構(gòu)關(guān)系是不現(xiàn)實(shí)的。針對(duì)這一問(wèn)題科研人員提出并運(yùn)用的圖索引技術(shù),通過(guò)為圖構(gòu)建索引來(lái)將一部分不滿足查詢匹配條件的圖過(guò)濾掉。通過(guò)這種方式將需要執(zhí)行同構(gòu)檢測(cè)算法的圖集規(guī)模盡可能的減少,以此來(lái)從全局提升圖匹配查詢算法的執(zhí)行效率?蒲腥藛T也基于圖索引算法提出了現(xiàn)今最為常用的,基于過(guò)濾-驗(yàn)證的圖匹配查詢的算法框架。傳統(tǒng)基于挖掘與非挖掘圖索引算法受限于圖規(guī)模大小,當(dāng)圖規(guī)模變的十分龐大時(shí)便失去研究?jī)r(jià)值。在本文中,我們首先提出了一種基于樹(shù)型結(jié)構(gòu)的鄰域索引,并通過(guò)該索引用來(lái)提升圖搜索算法的執(zhí)行效率。同時(shí)將鄰域索引作為通用的模型,對(duì)基于鄰域索引的圖匹配框架進(jìn)行抽象并總結(jié)出其算法框架的代價(jià)估算函數(shù)。不僅如此,我們還參照于使用十分廣泛的VF2子圖同構(gòu)檢測(cè)算法框架設(shè)計(jì)出了一種以樹(shù)型索引項(xiàng)為單位的圖匹配算法,并通過(guò)該算法來(lái)獲取被查詢圖中所有與查詢圖滿足子圖同構(gòu)的映射關(guān)系集合。在文章的最后,通過(guò)實(shí)驗(yàn)對(duì)比數(shù)據(jù)來(lái)驗(yàn)證我們所提出的基于鄰域模式的DFSTree索引和基于該索引的圖重構(gòu)匹配算法要優(yōu)于當(dāng)今主流的基于鄰域模式的SPath索引算法。在實(shí)驗(yàn)階段我們分別在索引構(gòu)建時(shí)間與空間,索引過(guò)濾候選集大小,以及基于索引的圖匹配查詢算法在返回匹配結(jié)果的平均響應(yīng)時(shí)間與算法返回的匹配結(jié)果來(lái)進(jìn)行對(duì)比分析。通過(guò)實(shí)驗(yàn)結(jié)果數(shù)據(jù)來(lái)驗(yàn)證得到,基于樹(shù)型結(jié)構(gòu)的DFSTree鄰域索引雖然在構(gòu)建時(shí)間與空間上稍遜色于當(dāng)今主流基于最短路徑的鄰域索引SPath算法,但是在查詢響應(yīng)時(shí)間與過(guò)濾后對(duì)候選集中呈負(fù)相關(guān)的候選元素的過(guò)濾強(qiáng)度而言都優(yōu)于SPath算法。
[Abstract]:Graph structure models are often used to describe complex data objects and their relationships in the fields of Internet, biology, computer vision and chemistry. Graph search plays a very important role in the field of graph computing. Graph search can be divided into two categories according to its application scenarios: subgraph query and subgraph matching. The subgraph query is to find out all the hypergraph sets matching the query graph in the graph database and return them. Subgraph matching is to find all the subsets isomorphic to the query graph from the graph database and return. Whether it is graph query or graph matching in graph search, it is necessary to use subgraph isomorphism algorithm to determine the isomorphism between the query graph and the graph in the returned result. It is well known that the subgraph isomorphism problem is a NP complete problem. In the industry, the graph isomorphism problem is realized by the fast heuristic algorithm of searching pruning. The time cost of the search based subgraph isomorphism algorithm is too large, so it is not realistic to scan all the graphs in the database one by one and judge the isomorphism relationship. Aiming at this problem, the graph index technology proposed and applied by researchers, by constructing indexes for graphs, filters out some graphs that do not meet the query matching conditions. In this way, the scale of graph set that needs to perform isomorphism detection algorithm will be reduced as much as possible, so as to improve the execution efficiency of graph matching query algorithm globally. Based on graph indexing algorithm, researchers also proposed the most commonly used, filter-based graph matching query algorithm framework. The traditional algorithms based on mining and non-mining graph index are limited by the size of the graph and lose the research value when the scale of the graph becomes very large. In this paper, we first propose a tree structure based neighborhood index, which is used to improve the execution efficiency of graph search algorithm. At the same time, the neighborhood index is used as a general model to abstract the graph matching framework based on neighborhood index and sum up the cost estimation function of its algorithm framework. Not only that, we also design a graph matching algorithm based on tree index terms according to the widely used framework of VF2 subgraph isomorphism detection algorithm. The algorithm is used to obtain all the mapping relation sets of the queried graph which are isomorphic to the subgraph of the query graph. At the end of the paper, we compare the experimental data to verify that our proposed DFSTree index based on neighborhood schema and graph reconstruction matching algorithm based on this index are better than the current mainstream SPath index algorithm based on neighborhood schema. In the experiment stage, we compare and analyze the time and space of index construction, the size of index filter candidate set, the average response time of graph matching query algorithm based on index and the matching result returned by the algorithm. The experimental results show that the DFSTree neighborhood index based on the tree structure is slightly inferior to the current SPath algorithm based on the shortest path in the construction time and space. However, the filtering intensity of candidate elements with negative correlation between query response time and filtering time is better than that of SPath algorithm.
【學(xué)位授予單位】:遼寧大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2017
【分類(lèi)號(hào)】:O157.5
本文編號(hào):2358544
[Abstract]:Graph structure models are often used to describe complex data objects and their relationships in the fields of Internet, biology, computer vision and chemistry. Graph search plays a very important role in the field of graph computing. Graph search can be divided into two categories according to its application scenarios: subgraph query and subgraph matching. The subgraph query is to find out all the hypergraph sets matching the query graph in the graph database and return them. Subgraph matching is to find all the subsets isomorphic to the query graph from the graph database and return. Whether it is graph query or graph matching in graph search, it is necessary to use subgraph isomorphism algorithm to determine the isomorphism between the query graph and the graph in the returned result. It is well known that the subgraph isomorphism problem is a NP complete problem. In the industry, the graph isomorphism problem is realized by the fast heuristic algorithm of searching pruning. The time cost of the search based subgraph isomorphism algorithm is too large, so it is not realistic to scan all the graphs in the database one by one and judge the isomorphism relationship. Aiming at this problem, the graph index technology proposed and applied by researchers, by constructing indexes for graphs, filters out some graphs that do not meet the query matching conditions. In this way, the scale of graph set that needs to perform isomorphism detection algorithm will be reduced as much as possible, so as to improve the execution efficiency of graph matching query algorithm globally. Based on graph indexing algorithm, researchers also proposed the most commonly used, filter-based graph matching query algorithm framework. The traditional algorithms based on mining and non-mining graph index are limited by the size of the graph and lose the research value when the scale of the graph becomes very large. In this paper, we first propose a tree structure based neighborhood index, which is used to improve the execution efficiency of graph search algorithm. At the same time, the neighborhood index is used as a general model to abstract the graph matching framework based on neighborhood index and sum up the cost estimation function of its algorithm framework. Not only that, we also design a graph matching algorithm based on tree index terms according to the widely used framework of VF2 subgraph isomorphism detection algorithm. The algorithm is used to obtain all the mapping relation sets of the queried graph which are isomorphic to the subgraph of the query graph. At the end of the paper, we compare the experimental data to verify that our proposed DFSTree index based on neighborhood schema and graph reconstruction matching algorithm based on this index are better than the current mainstream SPath index algorithm based on neighborhood schema. In the experiment stage, we compare and analyze the time and space of index construction, the size of index filter candidate set, the average response time of graph matching query algorithm based on index and the matching result returned by the algorithm. The experimental results show that the DFSTree neighborhood index based on the tree structure is slightly inferior to the current SPath algorithm based on the shortest path in the construction time and space. However, the filtering intensity of candidate elements with negative correlation between query response time and filtering time is better than that of SPath algorithm.
【學(xué)位授予單位】:遼寧大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2017
【分類(lèi)號(hào)】:O157.5
【參考文獻(xiàn)】
相關(guān)期刊論文 前2條
1 王楠;王斌;李曉華;楊曉春;;支持動(dòng)態(tài)圖數(shù)據(jù)的子圖查詢方法[J];計(jì)算機(jī)科學(xué)與探索;2014年02期
2 宋美娜;金遠(yuǎn)平;;一個(gè)基于DFS編碼的圖形匹配算法[J];計(jì)算機(jī)與數(shù)字工程;2009年09期
相關(guān)博士學(xué)位論文 前3條
1 鄒曉紅;用于圖分類(lèi)的頻繁子結(jié)構(gòu)挖掘算法研究[D];燕山大學(xué);2011年
2 張碩;圖數(shù)據(jù)庫(kù)查詢處理技術(shù)的研究[D];哈爾濱工業(yè)大學(xué);2010年
3 鄒磊;圖數(shù)據(jù)庫(kù)中的子圖查詢算法研究[D];華中科技大學(xué);2009年
相關(guān)碩士學(xué)位論文 前3條
1 王會(huì)會(huì);精確子圖數(shù)據(jù)庫(kù)查詢技術(shù)研究[D];哈爾濱工業(yè)大學(xué);2014年
2 李辰辰;基于非挖掘索引的圖查詢研究[D];遼寧大學(xué);2014年
3 鄭超;大規(guī)模圖集的頻繁子圖挖掘算法研究[D];燕山大學(xué);2010年
,本文編號(hào):2358544
本文鏈接:http://www.sikaile.net/kejilunwen/yysx/2358544.html
最近更新
教材專著