Research on Machine-Learning-Based Speech Enhancement Algorithms for Dual-Microphone Mobile Phones
Published: 2018-06-15 19:27
Topics: neural network + mobile phone; Source: Nanjing Normal University, 2017 doctoral dissertation
[Abstract]: The mobile phone is currently the portable communication device with the largest market and the broadest consumer base, and improving its call quality has long received wide attention. Because phones are used in so many settings, the background noise they must cope with is highly complex. A noise-reduction algorithm for the phone platform must therefore handle many kinds of noise flexibly and suppress background noise effectively while preserving speech quality. Its performance must also not degrade when users hold the phone differently or rotate it during a call, so that it stays robust in real environments. In recent years artificial intelligence has spread to many fields. Machine learning, its core, emphasizes improving algorithm performance through continual learning from data, which lets machine-learning algorithms such as neural networks adapt to complex and changing external environments. Applying machine learning to phone noise reduction should markedly improve performance in real scenarios, yet little related work exists. This dissertation applies neural network models to phone noise reduction and improves each part of the noise-reduction algorithm, increasing its flexibility and robustness in real use. The main work and contributions are as follows.
(1) Existing dual-channel voice activity detection (VAD) algorithms depend on fixed thresholds and therefore have difficulty detecting speech and noise accurately across different noise environments. Chapter 2 proposes a new neural-network-based dual-channel VAD algorithm. It uses the sub-band energy difference and the normalized cross-channel correlation as two new classes of features and uses a neural network to classify speech and noise. It does not depend on a fixed threshold, adapts to complex and changing noise environments, and is more accurate than existing VAD algorithms based on the inter-channel energy difference and its improved variants.
(2) Chapter 3 exploits the fact that the ratio of the noisy-speech signal powers received by the phone's two microphones differs between noise segments and speech segments, and proposes a new VAD algorithm based on the inter-channel power ratio. The neural-network VAD of Chapter 2 is then combined with this power-ratio VAD to obtain a speech and noise activity detection algorithm suited to phone noise reduction. The algorithm detects speech and noise separately and accurately, and its detection results control a time-domain speech enhancement algorithm that denoises the noisy speech signal. While removing noise it significantly reduces damage to the speech signal and improves intelligibility, and it suppresses directional speech interference particularly well.
(3) To further remove the linearly uncorrelated noise that remains after the time-domain enhancement of Chapter 3, Chapter 4 transforms the time-domain outputs (the enhanced speech signal and the background noise signal) to the frequency domain for further noise reduction, and improves two key components of the algorithm: noise estimation and noise removal. First, single-microphone and dual-microphone noise estimation algorithms are combined to improve the accuracy of the noise estimate. Then pitch detection is combined with noise reduction: the pitch frequency is estimated in speech frames to identify speech and noise frequency bins, and the Wiener filter parameters are adjusted separately for the two kinds of bins, so that noise is removed while speech bins are preserved as far as possible, reducing speech distortion. Experimental results show that, compared with existing dual-microphone noise-reduction algorithms, the improved frequency-domain algorithm more effectively reduces damage to the speech signal and improves call quality.
(4) Differences in how the user holds the phone, or rotation of the phone during a call, affect the performance of the noise-reduction algorithm. If the phone's position can be determined in real time and the algorithm's parameters adjusted promptly for the current position, performance can be improved. Most existing localization algorithms require an array of three or more microphones and cannot be used directly on a dual-microphone phone. Chapter 5 proposes a new method, specific to the phone scenario, that locates the phone in three-dimensional space using only two microphones. The method takes two inputs, the inter-channel time delay and a new feature, the sub-band inter-channel power ratio, which is derived by analyzing the propagation paths of the target speech to the two microphones, and trains a neural network to output the phone's spatial position.
(5) When the phone is detected to have deviated from the standard call position, the parameters of the time-domain and frequency-domain noise-reduction algorithms of Chapters 3 and 4 are adjusted promptly according to the neural-network localization result of Chapter 5, avoiding the loss of call performance caused by phone movement. Experimental results show that existing dual-microphone noise-reduction algorithms ignore phone rotation and therefore cannot guarantee their performance in real scenarios, whereas the algorithm proposed in this dissertation is more stable and more practical.
The dissertation closes by summarizing the main work and innovative results and by outlining directions for further research.
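The abstract names the dual-channel features but gives no formulas for them. The sketch below shows one plausible way to compute a per-band energy difference, a normalized cross-channel correlation peak, and the inter-channel delay for a single frame. The function names, the frame length, the number of bands, and the lag range are all illustrative assumptions, not the dissertation's definitions, and the neural-network classifier that would consume these features is not shown.

```python
import numpy as np


def subband_energy_difference(primary, secondary, n_fft=512, n_bands=8, eps=1e-12):
    """Per-band energy difference in dB between two microphone frames.

    Hypothetical definition: the spectrum is split into n_bands contiguous
    bands and the log ratio of band energies is returned.
    """
    window = np.hanning(len(primary))
    p_spec = np.abs(np.fft.rfft(primary * window, n_fft)) ** 2
    s_spec = np.abs(np.fft.rfft(secondary * window, n_fft)) ** 2
    p_bands = np.array([band.sum() for band in np.array_split(p_spec, n_bands)])
    s_bands = np.array([band.sum() for band in np.array_split(s_spec, n_bands)])
    return 10.0 * np.log10((p_bands + eps) / (s_bands + eps))


def normalized_cross_correlation(primary, secondary, max_lag=16, eps=1e-12):
    """Peak normalized cross-correlation and the lag (in samples) where it occurs.

    A positive lag means the secondary channel is delayed relative to the
    primary channel. max_lag must be smaller than the frame length.
    """
    norm = np.sqrt(np.dot(primary, primary) * np.dot(secondary, secondary)) + eps
    full = np.correlate(secondary, primary, mode="full")
    center = len(primary) - 1  # index of zero lag in the "full" output
    lags = np.arange(-max_lag, max_lag + 1)
    segment = full[center - max_lag : center + max_lag + 1] / norm
    best = int(np.argmax(segment))
    return float(segment[best]), int(lags[best])


if __name__ == "__main__":
    # Synthetic frame pair: the secondary channel is an attenuated copy of
    # the primary channel delayed by 3 samples (assumed test data).
    rng = np.random.default_rng(0)
    base = rng.standard_normal(300)
    primary = base[10:266]
    secondary = 0.5 * base[7:263]
    print(subband_energy_difference(primary, secondary))
    print(normalized_cross_correlation(primary, secondary))
```

In a phone, the primary microphone sits near the mouth, so target speech tends to produce a large positive energy difference and a strong correlation peak, while diffuse noise tends to produce neither. That contrast is what makes such features candidates for a classifier input in place of a fixed threshold.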
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: TN912.3; TP181
Article ID: 2023281
Article link: http://www.sikaile.net/kejilunwen/xinxigongchenglunwen/2023281.html