
Research on Protein Subnuclear Localization Based on Feature Fusion and Dimensionality Reduction Algorithms

Published: 2018-07-03 10:55

Topics: protein subnuclear localization + fused representation; Source: Yunnan University, master's thesis, 2016

[Abstract]: With the completion of human genome sequencing, high-throughput sequencing technology has become widespread and protein sequences are being produced in large numbers. Determining the function of newly sequenced proteins has become one of the active topics in bioinformatics. Proteins carry out their biological activities inside the cell, so a protein's subcellular and subnuclear localization is closely related to its function; subnuclear localization information also provides useful clues for the prevention, diagnosis, and treatment of hereditary diseases, cancer, and other disorders. However, obtaining subnuclear localization through traditional biological experiments costs a great deal of time and money. In recent years, with the rapid development of computer science, machine-learning approaches to protein subnuclear localization have become a research focus in bioinformatics, and localization methods built on machine learning predict quickly and at low cost. This thesis uses machine-learning methods to study the protein subnuclear localization problem in depth.

The thesis first reviews the basics of protein subnuclear localization, the background and significance of the problem, and the state of research, and describes the main research content of subnuclear localization in detail. It then discusses protein sequence feature representations and the choice of classifier from several angles, summarizes the problems with current sequence representation methods, and identifies the points of departure for this work.

The first contribution is a subnuclear localization method based on feature fusion and supervised locality preserving projection. Traditional feature representations extract protein features from only a single aspect of sequence information, and classification models designed on top of them do not analyze the data distribution of the representation, which leaves the feature representation and the classification method relatively isolated from each other. The method therefore first fuses representations that carry complementary sequence information, obtaining a fused representation with strong discriminative information. It then uses supervised locality preserving projection to learn a low-dimensional manifold of the data and reduce the dimension of the fused representation, yielding low-dimensional discriminative features that separate classes while preserving within-class structure. Based on this data distribution, a K-nearest-neighbor classifier is chosen to predict the subnuclear location of a sequence. In a variety of comparative experiments on two standard datasets, the method achieves high prediction accuracy. It makes full use of the complementary information in traditional sequence representations and takes into account the relationship between the data distribution of the representation and the classification model, which considerably improves overall prediction accuracy. However, it ignores the differences among proteins at different subnuclear locations, which motivates the second contribution.

The second contribution is a subnuclear localization method based on an efficient fused representation and linear discriminant analysis. Different feature representations contain different sequence information and therefore contribute to subnuclear localization to different degrees, and proteins at different subnuclear locations have different functions. By refining these differences for the proteins at each subnuclear location, the method fuses the feature data of different subnuclear locations to different degrees and constructs two high-dimensional fused representations that contain strong discriminative information; a genetic algorithm is used to find the feature fusion coefficients for each subnuclear location. Because the fused representations are high-dimensional and contain redundant information, linear discriminant analysis is used to reduce their dimension, the dimension that gives the highest prediction accuracy is selected, and a protein subnuclear localization classifier is developed. Extensive experiments on two standard datasets show that the proposed method achieves high prediction accuracy and that the classifier also performs well.
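The fusion-then-classify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the feature vectors, the fusion weights, and the helper names `fuse` and `knn_predict` are all hypothetical, the weights are fixed by hand rather than found by a genetic algorithm, and the dimensionality reduction step (supervised locality preserving projection or linear discriminant analysis) is omitted entirely. It only shows how two representations of the same protein might be combined by weighted concatenation and then classified with K-nearest neighbors.

```python
import math
from collections import Counter


def fuse(feature_sets, weights):
    """Weighted concatenation of several feature vectors of one protein.

    feature_sets: sequence of vectors, one per representation (hypothetical).
    weights: one fusion coefficient per representation (hand-picked here).
    """
    if len(feature_sets) != len(weights):
        raise ValueError("need exactly one weight per feature set")
    fused = []
    for vec, w in zip(feature_sets, weights):
        fused.extend(w * x for x in vec)
    return fused


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: each protein has two representations
# (a 2-dimensional one and a 1-dimensional one) and a subnuclear label.
train_raw = [
    (([0.9, 0.1], [0.8]), "nucleolus"),
    (([0.8, 0.2], [0.9]), "nucleolus"),
    (([0.1, 0.9], [0.2]), "nuclear speckle"),
    (([0.2, 0.8], [0.1]), "nuclear speckle"),
]
weights = [1.0, 0.5]  # assumed fusion coefficients, not learned

train_X = [fuse(features, weights) for features, _ in train_raw]
train_y = [label for _, label in train_raw]

query = fuse(([0.85, 0.15], [0.7]), weights)
prediction = knn_predict(train_X, train_y, query, k=3)
print(prediction)
```

In a fuller pipeline, a dimensionality reduction step would sit between `fuse` and `knn_predict`, and the weights could be searched per subnuclear location instead of being shared across all classes as they are here.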
[Degree-granting institution]: Yunnan University
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: Q51;TP181

Article ID: 2093403


Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2093403.html
