A Partial Least Squares Based Total Variability Space Estimation Method for I-Vector Speaker Recognition

Published: 2018-04-24 11:46

Topic: speaker recognition + i-vector; source: Harbin Institute of Technology, master's thesis, 2015

[Abstract]: As a key multimedia data analysis technology, speaker recognition is widely used in transaction access control, identity verification, law enforcement, speech data management, and audio surveillance. Among the available techniques, the i-vector approach outperforms traditional speaker recognition methods and has therefore attracted wide attention in the field. The core step of i-vector speaker recognition is the estimation of the total variability space. However, existing estimation methods all perform feature extraction by exploring the relationships among the feature vectors themselves, and they ignore an important piece of prior knowledge: the speaker's class label. Because class information matters greatly for classifying and predicting samples, the existing total variability space estimation methods are not optimal. This thesis therefore starts from class information and proposes a total variability space estimation method based on partial least squares. First, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained to obtain a GMM mean supervector for each speaker. Next, the GMM mean supervectors and the class information are used to estimate the total variability space and to extract the speaker i-vectors. Finally, Within-Class Covariance Normalization (WCCN) is applied for channel compensation, and cosine distance scoring is used as the decision method. Experimental results show a clear improvement in recognition performance on the King-ASR-009 database and the NIST 2008 database (tasks short2-short3 and 8conv-short3).
Because partial least squares is insensitive to similar points between classes but fairly sensitive to outliers, system performance often degrades when such points appear in the training samples. To address this, the thesis proposes a total variability space estimation method based on regression-penalized partial least squares: the training corpus is split into two parts, one used to train the initial total variability space and the other used for the regression penalty. Experimental results show that both speaker verification and speaker identification performance on the King-ASR-009 database improve.
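The final two stages named in the abstract, WCCN channel compensation followed by cosine distance scoring, can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `wccn_projection` and `cosine_score`, the `ridge` regularization term, and the toy data are all hypothetical, and the i-vectors are assumed to have been extracted already by an earlier stage.

```python
import numpy as np


def wccn_projection(vectors, labels, ridge=1e-6):
    """Return a matrix B with B @ B.T equal to the inverse within-class covariance.

    vectors: (n_samples, dim) array of i-vectors (assumed already extracted).
    labels:  (n_samples,) array of speaker labels.
    ridge:   small diagonal term (an assumption, added for numerical stability).
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    dim = vectors.shape[1]
    speakers = np.unique(labels)

    # Average the per-speaker covariance matrices over all speakers.
    within = np.zeros((dim, dim))
    for speaker in speakers:
        rows = vectors[labels == speaker]
        centered = rows - rows.mean(axis=0)
        within += centered.T @ centered / len(rows)
    within /= len(speakers)
    within += ridge * np.eye(dim)

    # Cholesky factor of the inverse: inv(within) = B @ B.T, with B lower triangular.
    return np.linalg.cholesky(np.linalg.inv(within))


def cosine_score(enroll, test, B):
    """Cosine similarity between two i-vectors after WCCN projection by B.T."""
    a = B.T @ np.asarray(enroll, dtype=float)
    b = B.T @ np.asarray(test, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage with synthetic "i-vectors": 5 speakers, 20 utterances each, 4 dimensions.
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(5), 20)
speaker_means = rng.normal(size=(5, 4))
toy_vectors = speaker_means[toy_labels] + 0.3 * rng.normal(size=(100, 4))

B = wccn_projection(toy_vectors, toy_labels)
print(cosine_score(toy_vectors[0], toy_vectors[1], B))   # same speaker
print(cosine_score(toy_vectors[0], toy_vectors[99], B))  # different speakers
```

In a verification setting the score would be compared against a threshold tuned on development data. After projection by `B.T` the within-class covariance of the training vectors is approximately the identity, which is the normalization that WCCN is meant to provide.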
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TN912.34

Article ID: 1796491

Article link: http://www.sikaile.net/kejilunwen/wltx/1796491.html
