Time-Frequency Speech Presence Probability and Noise Power Spectrum Estimation in Noisy Acoustic Environments
Topic: speech presence probability. Focus: noise power spectrum estimation. Source: Beijing Institute of Technology, 2016 doctoral dissertation.
[Abstract]: The speech presence probability (SPP) and the noise power spectrum are the basic prerequisites on which speech enhancement depends, and they have a decisive effect on the result of noise suppression. Estimating the SPP and estimating the noise power spectrum are two equivalent problems: the solution of either one can be derived from the solution of the other. This dissertation focuses on using statistical models to derive the optimal solutions to both. Traditional statistical modeling methods are heuristic: many empirical rules are used when updating the model parameters, and some important parameters are even set directly from experience. As a result, the model parameters adapt poorly to the data, and the optimality of the solution is hard to guarantee. Traditional modeling methods are also semi-supervised. They usually assume that the input begins with non-speech; this initial non-speech segment is treated as labeled samples for supervised modeling, and the model is then updated with a decision-directed unsupervised method, so the modeling as a whole is semi-supervised. In practical applications, however, the input often begins with speech, so semi-supervised modeling cannot meet practical requirements. To address these problems, this dissertation proposes an optimal estimation method based on unsupervised clustering, in which the parameters of the clustering model are solved under the maximum-likelihood criterion, ensuring that the solutions for the SPP and the noise power spectrum are optimal. Specifically, a binary (two-component) Gaussian mixture model (GMM) and a hidden Markov model (HMM) are used as clustering models, with the speech and non-speech clusters regarded as the two components of the model.
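Maximum-likelihood clustering with a binary GMM can be illustrated for a single subband with a short sketch. It assumes the observations are per-frame subband log-power values in dB, fits a two-component 1-D GMM with plain EM (component 0 = non-speech, component 1 = speech), and reads the SPP as the posterior probability of the speech component. The initialisation at the data minimum and maximum, the variance floor, the synthetic data and all function names are illustrative choices, not taken from the dissertation. The synthetic input deliberately starts with speech: a GMM ignores frame order, so no leading non-speech segment is needed.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log of the univariate Gaussian density N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def e_step(x, w, mu, var):
    """Posterior probability of the speech component (index 1) for each frame."""
    lw0, lw1 = math.log(max(w[0], 1e-12)), math.log(max(w[1], 1e-12))
    return [sigmoid(lw1 + log_gauss(xi, mu[1], var[1])
                    - lw0 - log_gauss(xi, mu[0], var[0])) for xi in x]


def fit_binary_gmm(x, n_iter=100, var_floor=1e-3):
    """Unsupervised maximum-likelihood EM for a two-component 1-D GMM.

    Component 0 (non-speech) starts at the smallest value, component 1
    (speech) at the largest. Returns weights, means, variances and the
    per-frame speech presence probability (SPP).
    """
    n = len(x)
    m = sum(x) / n
    v = max(sum((xi - m) ** 2 for xi in x) / n, var_floor)
    w, mu, var = [0.5, 0.5], [min(x), max(x)], [v, v]
    for _ in range(n_iter):
        r = e_step(x, w, mu, var)              # E-step: responsibilities
        n1 = max(sum(r), 1e-12)                # soft count of speech frames
        n0 = max(n - sum(r), 1e-12)            # soft count of non-speech frames
        w = [n0 / n, n1 / n]                   # M-step: mixture weights
        mu = [sum((1 - ri) * xi for ri, xi in zip(r, x)) / n0,
              sum(ri * xi for ri, xi in zip(r, x)) / n1]
        var = [max(sum((1 - ri) * (xi - mu[0]) ** 2 for ri, xi in zip(r, x)) / n0, var_floor),
               max(sum(ri * (xi - mu[1]) ** 2 for ri, xi in zip(r, x)) / n1, var_floor)]
    return w, mu, var, e_step(x, w, mu, var)


rng = random.Random(0)
# Synthetic subband log-power envelope (dB): 200 speech frames, then 300 noise frames.
x = [rng.gauss(15.0, 4.0) for _ in range(200)] + [rng.gauss(0.0, 2.0) for _ in range(300)]
w, mu, var, spp = fit_binary_gmm(x)
noise_power = 10.0 ** (mu[0] / 10.0)  # non-speech cluster mean mapped back to linear power
```

Using the non-speech cluster mean as the noise estimate, and converting it from dB to linear power, are assumptions of this sketch. Nothing here guards against a subband in which speech never occurs, where both components would end up fitting noise.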
In this work, clustering is equivalent to estimating the model parameters: the solution for the noise power spectrum is given by the cluster mean, and the SPP is derived from the statistical characteristics of the clusters. Because clustering is unsupervised, it does not require the assumption that the input begins with non-speech, and it is therefore closer to practical applications than traditional modeling methods. The specific contributions and innovations of this dissertation are summarized as follows:
1. An unsupervised offline modeling method for the binary GMM is proposed. It models the log-power spectral envelope of each subband and uses the classical EM algorithm to obtain the optimal estimate.
2. An offline modeling method for the binary HMM is proposed. The advantage of the HMM over the GMM is that it accounts for the temporal correlation of the spectral envelope: it treats the subband power spectral envelope as a state sequence that transitions dynamically between speech and non-speech states, and the EM algorithm lets this temporal correlation adapt to the observed data.
3. Building on the classical EM algorithm, an approximately optimal online estimator of the GMM parameters is implemented. The GMM parameter set is updated frame by frame, and the detection and estimation results are output frame by frame.
4. An online likelihood function for the HMM is proposed. From this likelihood function, a first-order recursion for the HMM parameter set is derived using Newton's method, giving frame-by-frame optimal parameter updates.
5. Based on the statistical characteristics of the power spectral envelope, a method for constraining the binary GMM/HMM is proposed, so that the models remain stable when speech is absent for long periods.
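Frame-by-frame SPP output with a two-state HMM can be sketched with the standard forward (filtering) recursion: predict the speech probability through the transition matrix, then update it with the Gaussian emission likelihood ratio of the current frame. This is a minimal sketch under stated assumptions, not the dissertation's online algorithm. The emission means and variances, the self-transition probabilities and the forgetting factor are hypothetical values held fixed rather than updated by a Newton-based recursion, and the noise tracker is a simple (1 - SPP)-weighted, exponentially forgotten average of the log-power, used here as a stand-in for online parameter updates.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log of the univariate Gaussian density N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hmm_online_spp(x, means, variances, p_stay=(0.95, 0.95),
                   prior_speech=0.5, forget=0.98):
    """Causal, frame-by-frame SPP for a two-state Gaussian HMM.

    State 0 is non-speech, state 1 is speech; emission parameters are fixed
    (hypothetical values, e.g. from an offline fit). Returns the filtered SPP
    P(speech at t | x_1..x_t) and a running non-speech mean (noise level).
    """
    a00, a11 = p_stay
    post = None                  # filtered SPP of the previous frame
    w_sum, wx_sum = 0.0, 0.0     # forgotten sums of (1 - SPP) and (1 - SPP) * x
    noise = means[0]             # start the tracker at the model's noise mean
    spp, noise_track = [], []
    for xt in x:
        # Predict: P(speech at t | x_1..x_{t-1}) through the transition matrix.
        pred = prior_speech if post is None else post * a11 + (1.0 - post) * (1.0 - a00)
        # Update: prior log-odds plus the emission log-likelihood ratio.
        z = (math.log(pred) - math.log(1.0 - pred)
             + log_gauss(xt, means[1], variances[1])
             - log_gauss(xt, means[0], variances[0]))
        post = sigmoid(z)
        spp.append(post)
        # Noise tracking: weight each frame by its non-speech probability.
        w_sum = forget * w_sum + (1.0 - post)
        wx_sum = forget * wx_sum + (1.0 - post) * xt
        if w_sum > 1e-6:
            noise = wx_sum / w_sum
        noise_track.append(noise)
    return spp, noise_track


rng = random.Random(1)
# Synthetic subband log-power (dB): speech, then noise, then speech again.
x = ([rng.gauss(15.0, 4.0) for _ in range(80)]
     + [rng.gauss(0.0, 2.0) for _ in range(120)]
     + [rng.gauss(15.0, 4.0) for _ in range(80)])
spp, noise_track = hmm_online_spp(x, means=(0.0, 15.0), variances=(4.0, 16.0))
```

The self-transition probabilities are what carry the temporal correlation: with `p_stay` close to 1, an isolated ambiguous frame inside a speech run is pulled toward the surrounding state, which a frame-independent GMM posterior cannot do.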
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year awarded]: 2016
[Classification number]: TN912.3