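Total Variability Space Estimation Based on Partial Least Squares for I-Vector Speaker Recognition
Published: 2018-04-24 11:46
Topic: speaker recognition + i-vector; Source: Harbin Institute of Technology, master's thesis, 2015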
[Abstract]: As a key multimedia data analysis technology, speaker recognition is widely used in transaction access control, identity authentication, law enforcement, speech data management, and audio surveillance. Among speaker recognition techniques, the i-vector approach outperforms traditional methods and has therefore attracted wide attention. The core step of i-vector speaker recognition is the estimation of the total variability space. Existing estimation methods, however, extract features only by exploiting the relationships among the feature vectors themselves, and they ignore an important piece of prior knowledge: the speaker class information. Because class information matters greatly for classifying and predicting samples, the existing total variability space estimation methods are not optimal. Starting from class information, this thesis proposes a total variability space estimation method based on partial least squares. First, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained to obtain a GMM mean supervector for each speaker. Next, the GMM mean supervectors and the class information are used to estimate the total variability space and to extract the speaker i-vectors. Finally, Within-Class Covariance Normalization (WCCN) is applied for channel compensation, and cosine distance scoring is used as the decision method. Experimental results show a clear improvement in recognition performance on the King-ASR-009 database and on the NIST 2008 database (tasks short2-short3 and 8conv-short3).
Partial least squares is insensitive to points that are similar across classes but relatively sensitive to outliers, so system performance often degrades when such points appear in the training samples. To address this, the thesis proposes a total variability space estimation method based on regression-penalized partial least squares: the training corpus is split into two parts, one used to train the initial total variability space and the other used for the regression penalty. Experimental results show that both speaker verification and speaker identification performance improve on the King-ASR-009 database.
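The back end described in the abstract (a supervised projection, WCCN, then cosine scoring) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the partial least squares variant, so the sketch assumes the simple PLS-SVD form (singular vectors of the cross-covariance between the features and one-hot speaker labels), and the function names `pls_projection`, `wccn_matrix`, and `cosine_score`, as well as the synthetic "supervectors", are hypothetical.

```python
import numpy as np

def pls_projection(X, labels, n_components):
    """Assumed PLS-SVD variant: directions of X with maximal covariance to one-hot labels.

    n_components should not exceed (number of classes - 1).
    """
    labels = np.asarray(labels)
    classes, idx = np.unique(labels, return_inverse=True)
    Y = np.eye(len(classes))[idx]              # one-hot speaker indicator matrix
    Xc = X - X.mean(axis=0)                    # center features
    Yc = Y - Y.mean(axis=0)                    # center labels
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :n_components]                 # shape: (feature_dim, n_components)

def wccn_matrix(W, labels):
    """Return B such that B @ B.T equals the inverse within-class covariance."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = W.shape[1]
    Sw = np.zeros((dim, dim))
    for c in classes:
        Wc = W[labels == c]
        D = Wc - Wc.mean(axis=0)               # deviations from the speaker mean
        Sw += D.T @ D / len(Wc)
    Sw /= len(classes)
    Sw += 1e-6 * np.eye(dim)                   # small ridge for numerical stability
    return np.linalg.cholesky(np.linalg.inv(Sw))

def cosine_score(a, b):
    """Cosine similarity between two vectors; higher means more likely same speaker."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in for GMM mean supervectors: 3 speakers, 20 utterances each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
X = 5.0 * np.eye(3, 10)[labels] + 0.5 * rng.normal(size=(60, 10))

P = pls_projection(X, labels, n_components=2)  # supervised low-dimensional space
W = (X - X.mean(axis=0)) @ P                   # low-dimensional vectors per utterance
Z = W @ wccn_matrix(W, labels)                 # channel-compensated vectors

print("same speaker:     ", cosine_score(Z[0], Z[1]))
print("different speaker:", cosine_score(Z[0], Z[20]))
```

On this toy data the same-speaker score should come out higher than the different-speaker score; a real system would threshold the score for verification or take the best-scoring enrolled speaker for identification.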
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.34