Research on Voice Conversion Algorithms for Parallel and Non-Parallel Text Based on Improved BLFW
Topic: voice conversion + adaptive Gaussian classification; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
[Abstract]: In speech signal processing, voice conversion transforms the speech of one speaker (the source speaker) so that it sounds as if it were uttered by another speaker (the target speaker), while leaving the semantic content unchanged. Speech carries rich information, including semantic, speaker-identity, linguistic, and emotional information; voice conversion concentrates on the underlying acoustic features of speech, namely its spectral and prosodic characteristics. In many application scenarios, such as entertainment and cross-lingual conversion, a voice conversion system must deliver high-quality speech and work with non-parallel text. Existing systems face two main problems. First, the converted speech cannot achieve high similarity to the target speaker and good sound quality at the same time, so the two must be traded off. Second, training the conversion function depends on parallel corpora, which limits the generality of the system.
To achieve conversion with both high sound quality and high similarity, this thesis first proposes a bilinear frequency warping plus amplitude adjustment algorithm based on adaptive Gaussian classification. Adaptive Gaussian classification models the distribution of the acoustic features of speech more accurately, and conversion is carried out on the basis of the resulting classification. In subjective and objective evaluations, the proposed method raises the average MOS (mean opinion score) of the converted speech by 4.7% and lowers the average MCD (mel-cepstral distortion) by 2.7% compared with the bilinear frequency warping plus amplitude adjustment algorithm that uses a fixed number of classes, which indicates a measurable improvement in the performance of the voice conversion system.
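Bilinear frequency warping (BLFW), as referred to in the abstract, is commonly defined through the phase response of a first-order all-pass filter. The Python sketch below shows only that warping function on a normalized frequency axis. The function name `bilinear_warp` and the `alpha` values are illustrative assumptions; the abstract does not say how the thesis parameterizes or trains the warp, so this is not the author's implementation.

```python
import math


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized angular frequency omega (0..pi) with the
    first-order all-pass (bilinear) function. Requires |alpha| < 1."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    # 1 - alpha*cos(omega) is always positive for |alpha| < 1,
    # so atan2 here equals atan of the ratio.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    # Illustrative alpha values only (assumed, not taken from the thesis).
    for alpha in (-0.1, 0.0, 0.1):
        grid = [k * math.pi / 4 for k in range(5)]
        print(alpha, [round(bilinear_warp(w, alpha), 3) for w in grid])
```

With `alpha = 0` the frequency axis is unchanged; a positive `alpha` moves interior frequencies upward and a negative one moves them downward, while the endpoints 0 and pi stay fixed. Warping with `-alpha` undoes a warp with `alpha`.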
Second, to remove the dependence of voice conversion on parallel corpora, this thesis aligns non-parallel corpora using unit selection and vocal tract length normalization, and then applies the bilinear frequency warping plus amplitude adjustment method based on adaptive Gaussian classification to voice conversion under non-parallel text. Subjective and objective evaluations show that, compared with the INCA method for non-parallel text, this approach raises the average MOS of the converted speech by 7.1% and lowers the average MCD by 4.0%, indicating better sound quality and higher similarity. Compared with the conventional Gaussian mixture model voice conversion method trained on parallel text, its average MCD is 5.1% higher and its average MOS is 3.9% lower, so a performance gap remains; however, the method operates under non-parallel text conditions and is therefore more generally applicable.
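The objective results in the abstract are stated as percentage changes in average MCD. As a minimal sketch of how such a figure can be obtained, the Python code below uses the widely used mel-cepstral distortion formula over time-aligned frames and a plain relative-change helper. The function names are hypothetical, and the abstract does not state the cepstral order, the alignment procedure, or whether coefficient 0 was excluded; excluding it is an assumption made here.

```python
import math
from typing import Sequence


def frame_mcd(ref: Sequence[float], conv: Sequence[float]) -> float:
    """MCD in dB for one pair of aligned mel-cepstral frames.
    Coefficient 0 (energy) is skipped, which is an assumption."""
    if len(ref) != len(conv):
        raise ValueError("frames must have the same number of coefficients")
    squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mean_mcd(ref_frames: Sequence[Sequence[float]],
             conv_frames: Sequence[Sequence[float]]) -> float:
    """Average frame-level MCD over two time-aligned sequences."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("sequences must be non-empty and equally long")
    total = sum(frame_mcd(r, c) for r, c in zip(ref_frames, conv_frames))
    return total / len(ref_frames)


def relative_change(baseline: float, proposed: float) -> float:
    """Signed percentage change of `proposed` relative to `baseline`."""
    return 100.0 * (proposed - baseline) / baseline
```

A negative `relative_change` for MCD means lower distortion than the baseline, while a positive one for MOS means a higher listener rating; the abstract's "lowered by 4.0%" and "raised by 7.1%" are figures of this kind.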
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.3