Biological Network Analysis and Its Application in Complex Disease Research
Published: 2018-04-04 19:43
Topic: systems biology. Focus: biological networks. Source: Central South University, doctoral dissertation, 2012
[Abstract]: Diagnosing and treating complex diseases, of which cancer is the representative example, has long been both a focus and a difficulty of biomedical research. Progress has been limited for a long time by experimental techniques and by methods for analyzing experimental results, and no major breakthrough has been achieved. The rapid development of high-throughput biotechnology now provides massive data sources for the study of complex diseases. In particular, biological networks such as gene regulatory networks and protein-protein interaction networks capture the complex relationships among biological macromolecules well and offer strong data support for this research. As such network data accumulate, researchers urgently need new techniques for analyzing biological networks, and ultimately for supporting the study, diagnosis, and treatment of complex diseases.
Starting from the assessment of the reliability of interaction data among biological macromolecules, this dissertation studies graph clustering and the construction of dynamic networks through multi-source data fusion, and finally applies these techniques to identifying the disease genes and biological processes of complex diseases. The main research work is as follows.
First, biological networks produced by current high-throughput experimental techniques suffer from high false-positive and false-negative rates. We use Gene Ontology (GO) annotations and semantic similarity to assess the reliability of existing protein-protein interaction data, and use statistical analysis and machine learning to find the semantic similarity definition best suited to that assessment.
Second, biological networks taken directly from public databases are static, which clearly fails to reflect the dynamic nature of living systems. We analyze time-course and tissue-specific gene expression data and fuse them with existing static biological networks to construct networks with a degree of spatiotemporal dynamics. We then carry out a basic analysis of these dynamic networks and compare them with the static networks.
Third, most existing algorithms for mining functional modules and complexes from biological networks rely only on network topology. Our analysis shows that essential proteins are unevenly distributed across functional modules and complexes, and that both modules and complexes have core structures, so essential and non-essential proteins should be treated differently during clustering. On this basis we propose EPOF, a graph clustering algorithm based on essential proteins. Applied to the yeast protein-protein interaction network and evaluated by GO enrichment analysis and by comparison with known complexes, EPOF performs significantly better than other algorithms of the same kind.
Finally, building on these analyses of biological networks, we apply graph clustering to gene expression data from controlled studies of diseases and drugs, compare the clustering results using GO semantic similarity, and identify biological processes associated with disease. We also use biological network data to fuse different disease gene signatures and identify genes closely related to the disease.
Beginning with the preprocessing of biological network data, this dissertation studies a range of network analysis methods and applies them to complex disease research with good results. Its contents and results support the study of complex diseases from a systems perspective and help advance research on the diagnosis and treatment of complex diseases such as cancer.
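The abstract does not say how expression data are fused with the static network, so the following is only a minimal sketch of one simple way such a dynamic network could be built. It assumes that a static interaction is kept at a given time point only when both proteins are expressed at or above a threshold. The function name `dynamic_snapshots`, the threshold rule, and the toy data are all hypothetical and are not taken from the dissertation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def dynamic_snapshots(
    static_edges: Set[Edge],
    expression: Dict[str, List[float]],
    threshold: float,
) -> List[Set[Edge]]:
    """Return one edge set per time point.

    Assumption (not from the source): a static edge is kept at time t
    only if both endpoints have expression >= threshold at t.
    All expression series are assumed to have the same length.
    """
    n_points = len(next(iter(expression.values()))) if expression else 0
    snapshots: List[Set[Edge]] = []
    for t in range(n_points):
        # Genes considered "active" at this time point.
        active = {g for g, series in expression.items() if series[t] >= threshold}
        # Keep only static edges whose two endpoints are both active.
        snapshots.append(
            {(u, v) for (u, v) in static_edges if u in active and v in active}
        )
    return snapshots


# Toy data, made up purely for illustration.
static_edges = {("A", "B"), ("B", "C"), ("A", "C")}
expression = {
    "A": [1.0, 0.2, 0.9],
    "B": [0.8, 0.9, 0.1],
    "C": [0.1, 0.7, 0.8],
}

snapshots = dynamic_snapshots(static_edges, expression, threshold=0.5)
for t, edges in enumerate(snapshots):
    print(t, sorted(edges))
```

In this toy case each time point retains a different single edge of the static triangle, which shows how a static network can hide time-dependent structure. A tissue-specific variant would index the expression values by tissue instead of by time point.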
[Degree-granting institution]: Central South University
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: R319; O157.5