Research on Text-Independent Speaker Recognition Based on i-vector
Topic: speech signal + GMM-UBM model; Source: Lanzhou University of Technology, master's thesis, 2017
[Abstract]: Speaker recognition is a form of biometric identification. Because it is convenient to use and requires no interaction, it has gradually gained acceptance and has become a research focus in biometrics. Text-independent speaker recognition extracts information that reflects individual characteristics from the speech signal in order to identify and verify the speaker. In recent years speaker recognition has moved toward real-world deployment, but in practice its accuracy still suffers from the acoustic environment, the variety of recording devices, and the varying length of the speaker's utterances. To address the low accuracy caused by short test utterances and by environmental mismatch, this thesis studies the Gaussian model, the i-vector model, and the Gaussian probabilistic linear discriminant analysis (GPLDA) model from the perspective of compensation. First, the thesis introduces speaker recognition models, discusses preprocessing and feature extraction, and uses Mel-frequency cepstral coefficients (MFCCs) as speaker features. To deal with insufficient training and test speech, a GMM-UBM model is built; its principles and modeling procedure are described, and the strengths and weaknesses of the system are analyzed. Experiments are used to choose the number of mixture components, to study how Mel-frequency delta features, which capture both the dynamic and the static characteristics of the speaker, affect recognition, and to evaluate system performance. Second, because GMM-UBM performs poorly across channels, a speaker verification system based on the identity vector (i-vector) is built on the basis of factor analysis.
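The GMM-UBM step mentioned above is commonly implemented by MAP-adapting the means of a universal background model to each speaker's frames. The following is a minimal NumPy sketch of that standard technique, not the thesis's own code: the function name `map_adapt_means`, the diagonal-covariance assumption, and the relevance factor of 16 are all illustrative assumptions.

```python
import numpy as np

def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one speaker's frames (diagonal covariances).

    features: (T, D) frames, weights: (C,), means and variances: (C, D).
    The relevance factor is a hypothetical default, not taken from the thesis.
    """
    # Log-density of every frame under every diagonal Gaussian component.
    diff = features[:, None, :] - means[None, :, :]            # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Component posteriors per frame, normalized in the log domain.
    log_post = np.log(weights) + log_gauss                      # (T, C)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)
    # Zeroth- and first-order sufficient statistics.
    n_c = post.sum(axis=0)                                      # (C,)
    first = post.T @ features                                   # (C, D)
    data_mean = first / np.maximum(n_c, 1e-10)[:, None]
    # Interpolate between the data mean and the UBM mean.
    alpha = (n_c / (n_c + relevance))[:, None]
    return alpha * data_mean + (1.0 - alpha) * means

# Toy example: two well-separated components, 16 frames near the first one.
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [100.0, 100.0]])
ubm_vars = np.ones((2, 2))
frames = np.full((16, 2), 1.0)
adapted = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
```

In the toy example, the component that receives all the frames moves halfway toward the data mean (16 frames against a relevance factor of 16), while the component that receives no frames keeps its UBM mean. This is why MAP adaptation stays usable when enrollment speech is scarce.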
To address channel mismatch, linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) are used to compensate the system, and the effect of each compensation method is analyzed. Experiments are also used to analyze how the i-vector dimension affects the speaker recognition system and to select a suitable dimension. Finally, to address the low accuracy of text-independent speaker verification on short utterances of variable length, the thesis adopts the Gaussian probabilistic linear discriminant analysis (GPLDA) model. When i-vectors are passed to the PLDA model they are length-normalized, and the back-end covariance of the length-normalized i-vectors can then no longer be computed accurately, which harms the robustness of the system. The thesis proposes normalizing the column vectors of the total variability space instead of length-normalizing the i-vectors, and verifies the proposed method experimentally. The results show that the method improves the robustness of the system without reducing the recognition rate.
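For orientation, the conventional back-end steps named above (i-vector length normalization, WCCN, and cosine scoring) can be sketched as follows. This is a minimal NumPy illustration of the standard techniques on random stand-in data. It does not implement the column normalization of the total variability space that the thesis proposes, since its details are not given here. All function names and the small ridge term are illustrative assumptions.

```python
import numpy as np

def length_normalize(ivectors):
    """Scale each i-vector (one per row) to unit Euclidean length."""
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    return ivectors / np.maximum(norms, 1e-12)

def wccn_projection(ivectors, speaker_ids):
    """Return B with B @ B.T == inv(W), W = average within-speaker covariance."""
    dim = ivectors.shape[1]
    speakers = np.unique(speaker_ids)
    within = np.zeros((dim, dim))
    for spk in speakers:
        x = ivectors[speaker_ids == spk]
        centered = x - x.mean(axis=0)
        within += centered.T @ centered / len(x)
    within /= len(speakers)
    within += 1e-6 * np.eye(dim)  # small ridge for invertibility (an assumption)
    return np.linalg.cholesky(np.linalg.inv(within))

def cosine_score(enroll, test, projection):
    """Cosine similarity between two i-vectors after the WCCN projection."""
    a = projection.T @ enroll
    b = projection.T @ test
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: 4 hypothetical speakers, 10 five-dimensional "i-vectors" each.
rng = np.random.default_rng(0)
speaker_ids = np.repeat(np.arange(4), 10)
speaker_means = rng.normal(scale=3.0, size=(4, 5))
ivectors = speaker_means[speaker_ids] + rng.normal(size=(40, 5))

normalized = length_normalize(ivectors)
B = wccn_projection(normalized, speaker_ids)
score = cosine_score(normalized[0], normalized[1], B)
```

The projection whitens the within-speaker variability, so directions dominated by session or channel variation contribute less to the cosine score. A verification decision would compare `score` against a threshold tuned on development data.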
[Degree-granting institution]: Lanzhou University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.3