
Speaker Feature Extraction Based on the Frequency Characteristics of Speech with Suppression of Phoneme Effects

Published: 2018-03-24 10:17

  Topic: speaker identification | Focus: distribution of personal information across phonemes | Source: doctoral dissertation, Tianjin University, 2014


【Abstract】: Speech carries both linguistic information and personal information: linguistic information represents what speakers have in common, while personal information represents each speaker's individual characteristics. Speaker recognition needs to preserve the speaker's personal information while suppressing the linguistic information; however, the two are difficult to separate in the speech signal. To reduce the influence of differences in spoken content on speaker recognition, this thesis proposes a Phoneme Effect Suppression (PES) method that emphasizes differences in speakers' personal information.

To obtain an accurate distribution of speaker information in the frequency domain, the thesis first studies the frequency characteristics of speech. By measuring each phoneme's contribution to speaker individuality in every sub-band (Phoneme F-ratio Contribution, PFC), we derive the distribution of speaker information across phonemes. Speech is shaped by the speaker's vocal organs, manner of articulation, and place of articulation, so the speaker-information distribution of each phoneme reflects the individuality of specific physiological articulators and articulation habits. The acoustic expression of speaker information is studied separately in three languages (English, Chinese, and Korean). Measuring each phoneme's sub-band contributions shows that voiced sounds, unvoiced sounds, and nasals each have a different distribution of speaker information.

On this basis, the thesis proposes the PES method, which suppresses the influence of individual phonemes on speaker individuality and yields the Phoneme Effect Suppressed Speaker Information Distribution (PES-SID) in the frequency domain.

Finally, the thesis proposes a new speaker feature extraction method, built on a non-uniform frequency scale derived from this distribution of speaker information. The proposed features were used with a GMM speaker model in speaker identification experiments and compared against two other features. The results show that the proposed features outperform both. Compared with MFCC (Mel Frequency Cepstrum Coefficient) features, they reduce the identification error rate for every language tested: by 61.1% for English, 68.0% for Korean, and 32.9% for Chinese. Compared with FFCC (F-ratio Frequency Cepstrum Coefficient) features, the error rate drops by 30% (English), 28.5% (Korean), and 6.6% (Chinese). These results indicate that the proposed features provide robust speaker discrimination across different languages.
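The Phoneme F-ratio Contribution described above compares between-speaker variance to within-speaker variance in each sub-band. A minimal sketch of that ratio, assuming frame-level sub-band log-energies grouped by speaker (the thesis's exact PFC normalization is not reproduced here):

```python
import numpy as np

def f_ratio_per_band(band_energies):
    """F-ratio per sub-band: variance of the per-speaker means
    (between-speaker) divided by the average per-speaker variance
    (within-speaker). Higher values mark bands that carry more
    speaker-discriminating information.

    band_energies: dict mapping speaker id -> array of shape
    (n_frames, n_bands) holding sub-band log-energies for one phoneme.
    """
    # Per-speaker mean energy in each band: (n_speakers, n_bands)
    means = np.stack([e.mean(axis=0) for e in band_energies.values()])
    # Within-speaker variance, averaged over speakers: (n_bands,)
    within = np.stack([e.var(axis=0) for e in band_energies.values()]).mean(axis=0)
    # Between-speaker variance of the speaker means: (n_bands,)
    between = means.var(axis=0)
    return between / within
```

Computed separately per phoneme, these curves give the phoneme-dependent speaker-information distributions the abstract refers to.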
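One way to turn a frequency-domain speaker-information distribution such as PES-SID into a non-uniform frequency scale is to place filter-bank edges so that each filter covers an equal share of the cumulative information, analogous to how the mel scale allocates filters perceptually. The abstract does not spell out the thesis's warping, so the following is a hypothetical equal-information placement:

```python
import numpy as np

def info_warped_band_edges(info, freqs, n_bands):
    """Band edges on a non-uniform scale: each of the n_bands filters
    spans an equal share of the cumulative speaker information, so
    informative frequency regions get narrower, denser bands.

    info:  non-negative speaker-information weight per frequency bin
    freqs: bin centre frequencies in Hz (same length as info)
    """
    cdf = np.cumsum(info, dtype=float)
    cdf /= cdf[-1]                               # normalise to [0, 1]
    targets = np.linspace(0.0, 1.0, n_bands + 1)  # equal information shares
    # Invert the cumulative curve to get the edge frequencies.
    return np.interp(targets, cdf, freqs)
```

Cepstral features would then follow the usual pipeline on these edges (filter-bank energies, log, DCT), the same way MFCCs are built on mel-spaced edges.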
【Degree-granting institution】: Tianjin University
【Degree level】: Doctorate
【Year awarded】: 2014
【CLC number】: TN912.3





Article link: http://www.sikaile.net/kejilunwen/wltx/1657858.html

