Quality Evaluation of Synthesized Speech from Tibetan Statistical Parametric Speech Synthesis

Posted: 2018-01-02 02:23

Keywords: quality evaluation of synthesized speech from Tibetan statistical parametric speech synthesis; Source: Northwest Normal University, 2015 master's thesis; Type: degree thesis

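[Abstract]: Statistical parametric speech synthesis has become the mainstream approach in speech synthesis. From a limited training corpus it can synthesize speech in different voices, speaking styles, and emotions, and it offers advantages such as easy modification of the voice characteristics of the synthesized speech and a small model storage footprint. With the evaluation of synthesized speech quality in Tibetan statistical parametric speech synthesis as its research goal, this thesis proposes an automatic speech-unit labeling method for Tibetan statistical parametric speech synthesis, examines how different synthesis units and different time labelings affect the quality of speech produced by a Tibetan statistical parametric synthesis system, and introduces a speaker recognition method to evaluate how similar the synthesized speech is to the source speaker's speech.
The main work and contributions of the thesis are as follows:
1. An automatic speech-unit labeling method for Tibetan statistical parametric speech synthesis is proposed. During acoustic model training for hidden Markov model (HMM)-based Tibetan statistical parametric speech synthesis, the DAEM (Deterministic Annealing Expectation Maximization) algorithm is introduced to automatically time-label Tibetan training speech that has no time annotation. With initials and finals as the synthesis units, the DAEM algorithm is used during training of the initial and final acoustic models to determine the optimal parameters for embedded re-estimation of the HMMs. After the acoustic models are trained, forced alignment is used to obtain the time labels of the initials and finals automatically. Experimental results show that the time labels this method produces for initials and finals are close to manual labels.
2. The effects of different speech units and different unit time labels on the quality of synthesized Tibetan speech are examined. Acoustic models are trained separately on a Tibetan corpus with automatically labeled time boundaries and on one with manually labeled time boundaries, and HMM-based Tibetan statistical parametric speech synthesis systems are built from them. On this basis, the thesis compares initials/finals versus syllables as synthesis units, and manual versus automatic time labeling, in terms of their effect on synthesized speech quality. The results show that with little training data, speech synthesized with either unit type is of relatively poor quality; as the training corpus grows, quality improves for both, and with a certain amount of training data the two unit types yield speech of similar quality. However, when syllables are the units, speech synthesized from the automatically time-labeled corpus still falls somewhat short of speech synthesized from the manually time-labeled corpus.
3. A method is proposed for evaluating the similarity between synthesized speech and the target speaker by means of speaker recognition. A speaker recognition method that combines empirical mode decomposition (EMD) with short-time analysis is applied to the synthesized speech, and the recognition results are used to judge how similar the synthesized speech is to the target speaker. The results show that the synthesized Tibetan speech is highly similar to the target speaker.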
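The abstract names DAEM but does not give its update rules. The Python below is a minimal sketch of the general idea, applied to a one-dimensional Gaussian mixture rather than to the thesis's HMMs: the E-step computes posteriors from p(x, k | θ) raised to a power β, β is increased stage by stage toward 1, and the final stage (β = 1) is ordinary EM. The linear β schedule, the fixed shared variance `var`, and every function name here are illustrative assumptions, not details from the thesis.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of the 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def tempered_posteriors(x, weights, means, var, beta):
    """DAEM E-step: posteriors proportional to p(x, k | theta) ** beta."""
    logs = [beta * (math.log(w) + log_gauss(x, m, var))
            for w, m in zip(weights, means)]
    top = max(logs)  # shift before exponentiating (log-sum-exp trick)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def daem_gmm(data, k, var=1.0, num_stages=10, iters_per_stage=20, seed=0):
    """Estimate weights and means of a k-component 1-D GMM with DAEM.

    The shared variance `var` is held fixed to keep the sketch short (an assumption).
    beta rises linearly from 1/num_stages to 1; the last stage is plain EM.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)  # initial means drawn from the data
    weights = [1.0 / k] * k
    n = len(data)
    for stage in range(1, num_stages + 1):
        beta = stage / num_stages  # hypothetical linear annealing schedule
        for _ in range(iters_per_stage):
            resp = [tempered_posteriors(x, weights, means, var, beta) for x in data]
            # M-step: the usual EM updates, fed with the tempered posteriors.
            for j in range(k):
                nj = sum(r[j] for r in resp) + 1e-12
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                weights[j] = nj / n
    return weights, means


if __name__ == "__main__":
    # Two well-separated clusters centred at -3 and +3.
    rng = random.Random(1)
    data = ([rng.gauss(-3.0, 1.0) for _ in range(100)]
            + [rng.gauss(3.0, 1.0) for _ in range(100)])
    weights, means = daem_gmm(data, k=2)
    print(sorted(means), weights)
```

With a small β the posteriors are nearly flat, so the component means start out close together and separate as β grows; this is what makes DAEM less dependent on the initial parameters than plain EM. In HMM embedded re-estimation, the same tempering would apply to the posteriors over hidden state sequences instead of mixture components.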
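The abstract does not describe the EMD-based speaker recognition front end in detail. As a rough sketch of the decomposition step only, the Python below implements the standard EMD sifting loop: find local extrema, interpolate upper and lower envelopes, subtract their mean, and peel off intrinsic mode functions (IMFs) until the residue is nearly monotonic. To stay dependency-free it assumes linear envelopes and a fixed number of sifting passes instead of the usual cubic splines and stopping criterion; the function names are hypothetical, and how the thesis combines IMFs with short-time analysis is not modeled.

```python
import math


def local_extrema(signal):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(signal) - 1)
              if signal[i - 1] < signal[i] >= signal[i + 1]]
    minima = [i for i in range(1, len(signal) - 1)
              if signal[i - 1] > signal[i] <= signal[i + 1]]
    return maxima, minima


def linear_envelope(signal, knots):
    """Piecewise-linear curve through (i, signal[i]) for each knot index i.

    `knots` must be strictly increasing, starting at 0 and ending at len(signal) - 1.
    """
    env = []
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            env.append((1.0 - t) * signal[a] + t * signal[b])
    env.append(signal[knots[-1]])
    return env


def sift(signal, num_sifts=10):
    """Extract one IMF candidate by repeatedly removing the mean envelope."""
    h = list(signal)
    last = len(h) - 1
    for _ in range(num_sifts):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        # Simplification: the end samples serve as knots for both envelopes.
        upper = linear_envelope(h, [0] + maxima + [last])
        lower = linear_envelope(h, [0] + minima + [last])
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(signal, max_imfs=5, num_sifts=10):
    """Split `signal` into IMFs plus a residue; sum(imfs) + residue == signal up to rounding."""
    residue = list(signal)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 3:
            break  # residue is (nearly) monotonic: nothing left to extract
        imf = sift(residue, num_sifts)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


if __name__ == "__main__":
    # A fast tone (period 8 samples) riding on a slow one (period 64 samples).
    x = [math.sin(2 * math.pi * n / 8) + 0.5 * math.sin(2 * math.pi * n / 64)
         for n in range(256)]
    imfs, residue = emd(x)
    print(len(imfs), "IMFs extracted")
```

One plausible way to combine this with short-time analysis is to split each IMF into short windowed frames and compute features per frame; the abstract does not say which features or classifier the thesis uses.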

[Degree-granting institution]: Northwest Normal University
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TN912.33


[Author]: Xu Shipeng


Article link: http://www.sikaile.net/kejilunwen/wltx/1367277.html
