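Research on Neural-Network-Based Voice Conversion Algorithms
Published: 2018-02-05 22:01
Keywords: voice conversion; generalized regression neural network; PSO algorithm; LPC model; STRAIGHT model
Source: Xi'an University of Architecture and Technology, master's thesis, 2017. Document type: degree thesis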
[Abstract]: Voice conversion is a technique that transforms a source speaker's voice so that it sounds like a target speaker's voice. As a strongly interdisciplinary subject, voice conversion has already found important applications in text-to-speech synthesis, medical assistance, and secure communication, and it shows broad prospects in other fields. Research on voice conversion advances the theory of signal processing and also benefits the fields that intersect with it, so the topic is significant in several respects.

At present, the models most widely used for voice conversion are the Gaussian Mixture Model (GMM) and Artificial Neural Networks (ANN). Because the GMM suffers from over-smoothing and overfitting, this thesis uses ANN models for voice conversion. Among ANNs, the Radial Basis Function (RBF) neural network has a simple structure and can approximate any nonlinear function. The Generalized Regression Neural Network (GRNN), a special case of the RBF network, offers strong nonlinear mapping ability, a simple network structure, and high robustness. Because the GRNN has exactly one model parameter, this thesis optimizes that parameter with Particle Swarm Optimization (PSO), which yields the PSO-GRNN model. This model reduces the influence of manual parameter selection on the conversion model and also improves the learning ability of the network. The ANN models used in the thesis are therefore the RBF, GRNN, and PSO-GRNN models. Experimental results show that speech converted with the PSO-GRNN model is closer to the target speech than speech converted with the RBF or GRNN model.

The Linear Predictive Coding (LPC) model describes nasals and plosives with low accuracy when it decomposes a speech signal. The STRAIGHT model, by contrast, decomposes a speech signal into mutually independent spectral parameters and fundamental-frequency parameters and can reconstruct speech from those parameters. This thesis therefore uses STRAIGHT instead of LPC for speech analysis and synthesis and carries out the corresponding voice conversion experiments. Similarity evaluation results show that speech converted with STRAIGHT and PSO-GRNN is closer to the target speech than speech converted with LPC and PSO-GRNN.
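The PSO-GRNN idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's features, fitness function, or PSO settings, so everything below is an assumption made for illustration: the synthetic data stands in for source and target spectral features, the fitness is validation mean squared error, and the names `grnn_predict`, `validation_error`, and `pso_sigma` as well as the swarm hyperparameters are hypothetical. The GRNN is written in its standard form, a Gaussian-weighted average of training targets controlled by a single smoothing parameter `sigma`.

```python
import numpy as np


def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN output: Gaussian-weighted average of the training targets."""
    # Squared Euclidean distance between every query and every training vector.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Shift each row by its minimum; the factor cancels in the ratio below
    # and keeps at least one weight equal to 1 (no division by zero).
    d2 = d2 - d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1, keepdims=True)


def validation_error(sigma, x_tr, y_tr, x_val, y_val):
    """Assumed fitness: mean squared error on held-out data."""
    pred = grnn_predict(x_tr, y_tr, x_val, sigma)
    return float(np.mean((pred - y_val) ** 2))


def pso_sigma(objective, lo=0.01, hi=2.0, n_particles=12, n_iter=30,
              inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """One-dimensional global-best PSO over the GRNN smoothing parameter."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    best_pos = pos.copy()                      # personal best positions
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()]            # global best position
    for _ in range(n_iter):
        r1 = rng.random(n_particles)
        r2 = rng.random(n_particles)
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)       # keep sigma inside the search range
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[best_val.argmin()]
    return float(g), float(best_val.min())


# Synthetic stand-in for paired source/target features (not real speech data).
data_rng = np.random.default_rng(1)
x = data_rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * x[:, :1]) + 0.5 * x[:, 1:] ** 2
x_tr, y_tr, x_val, y_val = x[:150], y[:150], x[150:], y[150:]

best_sigma, best_err = pso_sigma(
    lambda s: validation_error(s, x_tr, y_tr, x_val, y_val))
default_err = validation_error(1.0, x_tr, y_tr, x_val, y_val)
print(f"PSO-selected sigma: {best_sigma:.4f}, validation MSE: {best_err:.5f}")
print(f"Hand-picked sigma 1.0, validation MSE: {default_err:.5f}")
```

Because the GRNN has no iterative weight training, each fitness evaluation is a single forward pass, which is what makes a swarm search over the one parameter cheap. In a real system the inputs would be time-aligned spectral parameters from LPC or STRAIGHT analysis, which this sketch does not attempt to reproduce.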
[Degree-granting institution]: Xi'an University of Architecture and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: TN912.3; TP183
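[Similar literature]
Related master's theses (top 4)
1 Yang Xiufeng; Research on Neural-Network-Based Voice Conversion Algorithms [D]; Xi'an University of Architecture and Technology; 2017
2 Shui Jing; Research on Server Push Technology for a Voice Dispatching Web Platform [D]; Chang'an University; 2017
3 Li Lijun; The Chinese Character Family Effect: Global Phonological Activation and Lateral Inhibition Mechanisms [D]; Southwest University; 2017
4 Hao Wei; An Analysis of the Linguistic Features of Errenzhuan Xiaomao [D]; Southwest University; 2017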
Article ID: 1492873
Article link: http://www.sikaile.net/kejilunwen/xinxigongchenglunwen/1492873.html