Research on a Cancer Subtype Analysis Algorithm Based on Tensor Decomposition
Published: 2018-10-30 12:06
[Abstract]: Naming cancers by morphology or by the tissue or organ of origin is imprecise; clinical treatment requires more precise subtypes so that therapy can be matched to the disease and targeted. Analyzing microarray data such as mRNA, miRNA, DNA, and protein data makes it possible to discover and identify more accurate cancer subtypes. Integrating multi-source genomic data reveals not only the relationship between tumors and genomic data but also the synergistic effects that different kinds of gene data have on tumors. The difficulty in cancer subtype analysis is to consider different gene data together and analyze the structure they share without losing information. This thesis uses a tensor, a multidimensional array, to integrate multi-source genomic data without intermediate data conversion, preserving the information specific to each original data type while mining the cooperative pathogenic patterns among them. It introduces the principle and framework of the tensor model and constructs a tensor model from breast cancer gene expression profile data and DNA methylation data. The construction performs differential expression analysis on the preprocessed microarray data: genes with significant differences are set to 1 in the tensor or keep their original chip value, while genes that are normally expressed or show no significant difference are sparsified to 0. In this way the gene expression profile data and the methylation data are integrated into a three-dimensional tensor. Building on the existing CP-ARP decomposition algorithm, the thesis introduces non-negativity and sparsity constraints to optimize the CP decomposition model, targeting two characteristics of microarray data: high dimensionality with small sample sizes, and the polarization of genes into differentially expressed and normally expressed.
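As a rough illustration of the tensor construction described in the abstract, the following Python sketch stacks an expression matrix and a methylation matrix into a gene x sample x data-type array. The abstract does not say which differential-expression test or threshold the thesis uses, so the z-score against normal samples, the cutoff `z_cut`, and all function and argument names here are assumptions made for the example only.

```python
import numpy as np

def build_tensor(expr, meth, expr_normal, meth_normal, z_cut=2.0, binary=True):
    """Stack two gene x sample matrices into a gene x sample x 2 tensor.

    Hypothetical helper: a gene/sample entry counts as "significantly
    different" when its z-score against the normal samples reaches z_cut.
    Significant entries become 1 (binary=True) or keep their original value
    (binary=False); all other entries are set to 0.
    """
    slices = []
    for tumor, normal in ((expr, expr_normal), (meth, meth_normal)):
        # Per-gene mean and standard deviation over the normal samples.
        mu = normal.mean(axis=1, keepdims=True)
        sd = normal.std(axis=1, keepdims=True) + 1e-8  # avoid division by zero
        z = (tumor - mu) / sd
        significant = np.abs(z) >= z_cut
        kept = np.ones_like(tumor, dtype=float) if binary else tumor.astype(float)
        # Sparsify: everything that is not significant becomes 0.
        slices.append(np.where(significant, kept, 0.0))
    # Axis 0: genes, axis 1: samples, axis 2: data type (expression, methylation).
    return np.stack(slices, axis=2)
```

Both matrices are assumed to share the same gene rows and sample columns. If original chip values are kept and can be negative, they would need shifting or taking magnitudes before any non-negative decomposition is applied.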
The improved model uses an ALS optimization method based on stochastic gradient descent, which improves computational performance. Comparing the results of the improved decomposition method against the five validated breast cancer subtypes demonstrates the effectiveness of the tensor decomposition model for cancer subtyping. Analysis of the subtyping results verifies Her2, a subtype whose existence has been clinically established. The average silhouette coefficient and survival analysis support the performance of the algorithm and the validity of the resulting subtypes. The proposed method can therefore serve as a reference for cancer subtyping and for cancer diagnosis and treatment.
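The non-negative, sparse CP model can be sketched in a few lines of NumPy. This is not the thesis's optimizer: the abstract mentions an ALS method based on stochastic gradient descent without giving details, so the sketch substitutes standard multiplicative updates with an L1 penalty, which keep the factors non-negative. The names `nonneg_sparse_cp` and `assign_subtypes`, the penalty weight `l1`, the assumption that mode 1 indexes samples, and the argmax rule for assigning a sample to a subtype are all hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R) and (K x R) -> (J*K x R)."""
    rank = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, rank)

def unfold(X, mode):
    """Mode-n unfolding in C order, consistent with khatri_rao above."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def nonneg_sparse_cp(X, rank, l1=0.1, n_iter=200, seed=0, eps=1e-9):
    """Rank-`rank` CP factors of a non-negative 3-way tensor X.

    Approximately minimizes 0.5 * ||X - [[A, B, C]]||^2 + l1 * (sum of all
    factor entries) subject to A, B, C >= 0, using multiplicative updates
    (an assumed stand-in for the optimizer used in the thesis).
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            # (kr.T @ kr) equals the Hadamard product of the two Gram matrices.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + l1 + eps
            # Non-negative numerator and positive denominator keep factors >= 0.
            factors[mode] *= numer / denom
    return factors

def assign_subtypes(sample_factor):
    """Label each sample with the index of its largest component loading."""
    return np.argmax(sample_factor, axis=1)
```

For a gene x sample x data-type tensor, `A, B, C = nonneg_sparse_cp(X, rank=5)` followed by `assign_subtypes(B)` would give one candidate subtype label per sample. The resulting labels could then be scored with a silhouette coefficient or compared against known subtypes, which this sketch does not attempt.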
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R73-3
Article ID: 2299963
Article link: http://www.sikaile.net/yixuelunwen/zlx/2299963.html