Optimization of the VTS Feature Compensation Algorithm in Speech Recognition Systems
Published: 2018-05-28 17:09
Topic: vector Taylor series + feature compensation; Source: Southeast University, master's thesis, 2016
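[Abstract, translated from the Chinese]: In real environments, ambient noise keeps speech recognition systems from reaching satisfactory recognition performance. Vector Taylor series (VTS) feature compensation is a model-based feature compensation algorithm that is highly robust and effectively counters the performance loss caused by mismatch between the training and test environments. To address the heavy computational load of VTS and its sharp performance drop at low signal-to-noise ratios (SNR), this thesis optimizes a VTS-based isolated-word recognition system. The optimizations are a VTS feature compensation scheme built on a two-layer Gaussian mixture model (GMM) structure and a better choice of initial values for noise parameter estimation with multi-environment models; together they raise recognition speed and recognition rate and make the system more practical. The main work is as follows. (1) Structural analysis of the robust speech recognition system. The analysis focuses on the key techniques of robust speech recognition: endpoint detection based on weighted sub-band spectral entropy, the VTS feature compensation algorithm, and the acoustic models. The acoustic models comprise the GMM used for feature compensation and the hidden Markov model (HMM) used for pattern recognition. (2) Optimization of the VTS compensation algorithm with a two-layer GMM. To reduce the computational load of VTS feature compensation, the thesis proposes a two-layer GMM structure that separates the noise parameter estimation step from the feature mapping step. Training produces two models: GMM1, with few Gaussian components, and GMM2, with many. During compensation, GMM1 is first used to estimate the mean and variance of the noise in the test speech; GMM2 is then used, under the minimum mean square error (MMSE) criterion, to map the noisy feature parameters of the test speech to clean speech feature parameters. This greatly reduces computation while maintaining recognition performance. (3) Optimization of the initial values for noise parameter estimation in the multi-environment-model VTS algorithm. Multi-environment-model VTS speech recognition selects, from a set of basic environment models, the acoustic model that best matches the current environment and uses it for feature compensation, which effectively reduces the mismatch between training and test environments. Setting the initial noise parameter values according to this best-matching GMM helps the expectation-maximization (EM) algorithm avoid local convergence during the iterative solution, so that it converges to a more accurate estimate in fewer iterations and recognition performance improves. (4) Offline simulation tests in MATLAB and real-time tests on a C platform. Extensive experiments on both platforms verify the effectiveness of the proposed optimizations. They show that the two-layer GMM structure raises recognition speed by about 38% on a Chinese speech corpus, and that the optimized EM initial values yield more accurate noise parameter estimates and therefore a lower misrecognition rate, with the gain most pronounced at low SNR.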
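The feature-mapping step described in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a one-dimensional log-spectral feature, the common additive-noise mismatch function y = x + log(1 + exp(n - x)), and a first-order VTS expansion. The function names (`mismatch`, `noisy_components`, `mmse_clean_estimate`) and the toy `GMM2` values are hypothetical.

```python
import math

def mismatch(mu_x, mu_n):
    """Offset g = log(1 + exp(n - x)) that additive noise adds to a clean value."""
    return math.log1p(math.exp(mu_n - mu_x))

def noisy_components(weights, means, variances, mu_n, var_n):
    """First-order VTS: map each clean Gaussian to its noisy counterpart."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        g = mismatch(mu, mu_n)
        G = 1.0 / (1.0 + math.exp(mu_n - mu))  # dy/dx at the expansion point
        var_y = G * G * var + (1.0 - G) ** 2 * var_n
        comps.append((w, mu + g, var_y, g))
    return comps

def gauss_pdf(y, mu, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mmse_clean_estimate(y, weights, means, variances, mu_n, var_n):
    """MMSE estimate x_hat = y - sum_m P(m | y) * g_m of the clean feature."""
    comps = noisy_components(weights, means, variances, mu_n, var_n)
    likes = [w * gauss_pdf(y, mu_y, var_y) for w, mu_y, var_y, _ in comps]
    total = sum(likes)
    return y - sum((like / total) * g for like, (_, _, _, g) in zip(likes, comps))

# Toy stand-in for the larger clean-speech model "GMM2" (made-up values):
# component weights, means, and variances.
GMM2 = ([0.5, 0.5], [2.0, 8.0], [1.0, 1.0])

# Noise mean and variance are assumed to come from the separate estimation step.
x_hat = mmse_clean_estimate(8.05, *GMM2, mu_n=5.0, var_n=0.1)
```

In the two-layer arrangement, the noise parameters passed in here would be estimated with the smaller GMM1, so the costly iterative estimation runs over few components while the single mapping pass uses the more detailed GMM2.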
[Abstract]: In real environments, the recognition performance of speech recognition systems is degraded by ambient noise. Vector Taylor series (VTS) feature compensation is a model-based feature compensation algorithm that is robust and can effectively solve the loss of recognition performance caused by mismatch between the training and test environments. Because VTS is computationally expensive and its performance deteriorates sharply in low signal-to-noise ratio (SNR) environments, this thesis optimizes a VTS-based isolated-word recognition system. The work includes VTS feature compensation based on a two-layer Gaussian mixture model (GMM) structure and optimization of the initial values for noise parameter estimation with multi-environment models, with the aim of improving the recognition speed and recognition rate of the system and enhancing its practicality. The main work is as follows. (1) Structure analysis of the robust speech recognition system. The key technologies of robust speech recognition are analyzed, including the endpoint detection algorithm based on weighted sub-band spectral entropy, the VTS feature compensation algorithm, and the acoustic models. The acoustic models include the GMM for feature compensation and the hidden Markov model (HMM) for pattern recognition. (2) Optimization of the VTS compensation algorithm based on a two-layer GMM. To solve the problem of the heavy computation of VTS feature compensation, a two-layer GMM VTS structure is proposed in which the noise parameter estimation process and the feature mapping process are separated. In the training stage, a GMM1 model with fewer Gaussian components and a GMM2 model with more Gaussian components are obtained. In the process of feature compensation, the GMM1 model is first used to estimate the mean and variance of the noise in the test speech; then, based on the minimum mean square error criterion, the GMM2 model is used to map the noisy feature parameters of the test speech to clean speech feature parameters. The optimization greatly reduces the amount of computation while maintaining recognition performance. (3) Optimization of the initial values for noise parameter estimation in the multi-environment-model VTS algorithm. Speech recognition based on multi-environment-model VTS selects the acoustic model that best matches the current environment from the set of basic environment models and uses it for feature compensation, which effectively reduces the mismatch between the training and test environments. Setting the initial values of the noise parameters according to the optimal GMM effectively prevents the expectation-maximization (EM) algorithm from falling into local convergence during the iterative solution of the noise parameters, so the EM algorithm converges to a more accurate estimate in fewer iterations and speech recognition performance improves. (4) Offline simulation tests based on MATLAB and real-time tests on a C platform are implemented. A large number of experiments are carried out on the MATLAB and C platforms to verify the effectiveness of the proposed optimizations. The experiments show that the proposed two-layer GMM structure increases recognition speed by about 38% on a Chinese speech corpus, and that the optimized initial values for the EM iteration allow the noise parameters to be estimated more accurately, which lowers the misrecognition rate of the system, especially in low-SNR environments.
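The initial-value idea in item (3) can be sketched in Python under loud assumptions. The abstract does not say how initial values are derived from the best-matching GMM, so this sketch assumes each environment model contributes a candidate noise mean, picks the candidate whose compensated model gives the highest likelihood on the test frames, and starts EM from it. The feature is one-dimensional and log-spectral, the noise variance is held fixed for brevity (the thesis estimates both mean and variance), and all names and values (`pick_initial_noise_mean`, `em_noise_mean`, `GMM1`) are hypothetical.

```python
import math

def noisy_components(gmm, mu_n, var_n):
    """First-order VTS: (weight, noisy mean, noisy variance, dy/dn) per Gaussian."""
    comps = []
    for w, mu, var in gmm:
        G = 1.0 / (1.0 + math.exp(mu_n - mu))        # dy/dx at the expansion point
        mu_y = mu + math.log1p(math.exp(mu_n - mu))  # y = x + log(1 + exp(n - x))
        comps.append((w, mu_y, G * G * var + (1.0 - G) ** 2 * var_n, 1.0 - G))
    return comps

def weighted_likes(y, comps):
    """Weighted Gaussian density of frame y under each noisy component."""
    return [w * math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v, _ in comps]

def log_likelihood(frames, gmm, mu_n, var_n):
    comps = noisy_components(gmm, mu_n, var_n)
    return sum(math.log(sum(weighted_likes(y, comps))) for y in frames)

def pick_initial_noise_mean(frames, gmm, candidates, var_n):
    """Choose the candidate noise mean whose compensated GMM best explains the frames."""
    return max(candidates, key=lambda mu_n: log_likelihood(frames, gmm, mu_n, var_n))

def em_noise_mean(frames, gmm, mu_n, var_n, iterations):
    """EM re-estimation of the noise mean around the current linearization point."""
    for _ in range(iterations):
        comps = noisy_components(gmm, mu_n, var_n)
        num = den = 0.0
        for y in frames:
            likes = weighted_likes(y, comps)
            total = sum(likes)
            for like, (_, m, v, dn) in zip(likes, comps):
                gamma = like / total               # E-step: component posterior
                num += gamma * dn * (y - m) / v
                den += gamma * dn * dn / v
        mu_n += num / den                          # M-step: closed-form update
    return mu_n

# Toy data: clean values corrupted by noise whose true log-spectral mean is 5.0.
GMM1 = [(0.5, 2.0, 1.0), (0.5, 8.0, 1.0)]          # (weight, mean, variance)
frames = [x + math.log1p(math.exp(5.0 - x)) for x in (1.5, 2.0, 2.5, 7.5, 8.0, 8.5)]
init = pick_initial_noise_mean(frames, GMM1, candidates=[0.0, 5.0, 10.0], var_n=0.1)
estimate = em_noise_mean(frames, GMM1, init, var_n=0.1, iterations=5)
```

Starting EM from the selected candidate rather than from an arbitrary value is the point of the sketch: the linearized update is only accurate near the expansion point, so a closer start needs fewer iterations.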
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TN912.34
Article ID: 1947512
Article link: http://www.sikaile.net/kejilunwen/xinxigongchenglunwen/1947512.html