Protein Function Prediction and Optimization Based on Manifold Learning

Published: 2018-09-14 10:02

[Abstract]: In the post-genomic era, the rapid development of high-throughput experimental techniques has produced a large amount of protein data. However, the gap between protein data and functional annotation data keeps widening. Even for a widely studied species such as yeast, the functions of nearly a quarter of its proteins remain undetermined. Designing efficient computational methods for the automatic annotation of protein function has therefore become one of the major challenges in bioinformatics. In addition, protein function annotation data obtained from high-throughput experiments or computational prediction contain a high proportion of false-positive and false-negative noise, which seriously degrades the biological and medical applications that depend on protein function annotation. In this thesis, drawing on the topology of protein-protein interaction (PPI) networks, manifold learning, and graph theory, we propose three computational methods that address automatic protein function prediction and noise in function annotation data. The main contributions are as follows.

(1) For the automatic annotation of protein function, a new prediction framework that integrates manifold learning and multi-label learning is proposed. First, the PPI network is weighted by edge betweenness. Then the isometric feature mapping (ISOMAP) algorithm embeds the weighted network into a low-dimensional space, which yields a low-dimensional feature representation of the proteins. Finally, protein function prediction is cast as a classical multi-label learning problem, so that a variety of multi-label learning methods can be used for prediction and evaluation. Experimental results show that the proposed method obtains a more reasonable low-dimensional feature representation of proteins and predicts more accurately than the compared methods.

(2) A robust multi-label linear regression method that incorporates functional correlation is proposed to predict protein function. First, the manifold-learning-based ISOMAP algorithm embeds the edge-betweenness-weighted PPI network into a low-dimensional subspace. Then, according to the distribution of the low-dimensional protein data, linear regression is extended to the multi-label setting; the similarity between protein function labels is computed with cosine similarity and added as a regularization term to the objective function of the multi-label linear regression model. Finally, the effectiveness of the proposed algorithm is evaluated on a yeast database. Experimental results show that the proposed method achieves better prediction performance than other existing methods.

(3) To address the large amount of noise in protein function annotation data, a graph-regularized L1-norm principal component analysis (Gl1PCA) is proposed for protein function optimization. First, a protein graph and a function graph are constructed from the PPI network and the function similarity matrix, respectively. Then the graph Laplacians of the protein graph and the function graph are incorporated as regularization terms into the objective function of L1-norm principal component analysis (l1PCA). Finally, a fast solver for the resulting optimization model, based on the augmented Lagrange multiplier (ALM) method, is given, and the correctness of the algorithm is verified by theoretical proof and optimization experiments. Experimental results show that the proposed algorithm effectively optimizes protein function annotation data.
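The embedding step shared by methods (1) and (2) can be sketched in Python as follows. This is a minimal illustration and not the thesis implementation: it assumes an unweighted, connected PPI graph, assumes that the edge-betweenness value is used directly as the edge length (the abstract does not say how the weights enter the distance), and replaces a library ISOMAP with its two standard ingredients, all-pairs shortest paths followed by classical MDS. The function names and the toy network are hypothetical.

```python
from collections import deque

import numpy as np


def edge_betweenness(adj):
    """Unnormalized edge betweenness of an unweighted, undirected graph
    (Brandes' algorithm). `adj` maps each node to a list of neighbours."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered node pair was visited from both endpoints.
    return {e: val / 2.0 for e, val in score.items()}


def geodesic_distances(nodes, lengths):
    """All-pairs shortest-path lengths (Floyd-Warshall) for the given edge lengths."""
    index = {v: i for i, v in enumerate(nodes)}
    D = np.full((len(nodes), len(nodes)), np.inf)
    np.fill_diagonal(D, 0.0)
    for edge, length in lengths.items():
        u, v = tuple(edge)
        D[index[u], index[v]] = D[index[v], index[u]] = length
    for k in range(len(nodes)):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D


def classical_mds(D, dim):
    """Embed a finite distance matrix into `dim` dimensions (the final ISOMAP step)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]           # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Toy PPI network (hypothetical): two triangles joined by the bridge c-d.
ppi = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
weights = edge_betweenness(ppi)                  # assumption: weight = edge length
D = geodesic_distances(list(ppi), weights)
features = classical_mds(D, dim=2)               # one 2-D feature row per protein
print(np.round(features, 3))
```

Each row of `features` could then serve as the input vector of a protein for any multi-label learner, which is the role the low-dimensional representation plays in the framework described above.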
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q51;TP181

Liang Huadong; Protein Function Prediction and Optimization Based on Manifold Learning [D]; Anhui University; 2017

Article link: http://www.sikaile.net/shoufeilunwen/benkebiyelunwen/2242391.html
