Research on Speaker Recognition Based on Gender Classification

Published: 2018-01-29 11:51

Keywords: Chinese dialect database; gender recognition; speaker recognition; vector quantization; support vector machine　Source: Jiangsu Normal University, 2012 master's thesis　Type: degree thesis

[Abstract]: Speech signals carry both the semantic content of an utterance and the speaker's individual characteristics, from which identity information such as gender, age, and place of origin can be extracted. Speaker recognition is the technology of automatically determining a speaker's identity from the speech parameters that characterize that speaker. As a biometric authentication technology, it has important application value and broad prospects in information retrieval, criminal investigation, voice-based identity verification, telephone banking, and similar fields. This thesis systematically studies the pipeline from data collection through feature extraction to classification and recognition, and reports the following contributions.

1. A Chinese dialect speech database. Following the design standards of international speech corpora, and taking into account recording channel, dialect type, and the age and gender distribution of speakers, the thesis builds a Chinese dialect speech database covering seven regional dialects (Min, Yue, Wu, Xiang, Northern, Gan, and Hakka) as well as Mandarin (Putonghua). It includes wideband speech (microphone) and narrowband speech (mobile and fixed-line telephone), totaling 106 hours of speech data.

2. A gender identification method based on a codebook model. Semi-supervised clustering is introduced into gender recognition research for the first time: vector quantization is applied to the Chinese dialect speech data in a semi-supervised fashion, yielding male and female codebook models that carry supervision information. The method takes the probability distribution of the speech feature space into account, optimizes codebook generation, and improves the accuracy of the codebook model, addressing the low codebook precision of the traditional vector quantization algorithm and improving the system's recognition performance. Experimental results show that, compared with the traditional vector quantization algorithm, recognition accuracy, system stability, and robustness all improve markedly on both noisy and clean speech.

3. An improved hybrid SVM method for speaker recognition. SVM takes structural risk minimization as its criterion, discriminates well between classes, produces outputs that reflect the differences between samples of different classes, and is suited to classification with continuous input vectors. On this basis, the thesis improves a hybrid SVM recognition system for speaker recognition. After partitioning and clustering the large training set, the method trains one SVM for each cluster of speech samples and combines the outputs of all SVMs to make the classification decision. This mitigates the excessive time cost and low recognition efficiency caused by growing speaker counts and very large speech data sets, and effectively improves the classification and decision capability of the speaker recognition system.

4. A hierarchical speaker recognition system. At present, speaker recognition is hard to deploy in real time on large amounts of data: as speech databases keep growing, existing techniques struggle to meet real-time identification requirements in recognition time, memory use, and recognition accuracy. The thesis examines how several features, including MFCC and SDC, perform in the recognition system. Following the idea of search by classification, it uses dialect identification and gender identification to narrow the number and range of candidate speakers, then applies speaker recognition to determine each speaker's identity, with the aim of building an optimal speaker recognition system model.
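The search-by-classification idea in contributions 2 and 4 can be illustrated with a small sketch. It is not the thesis's implementation, and it makes several simplifying assumptions: codebooks are trained with plain k-means rather than the semi-supervised clustering the thesis proposes, the speaker stage uses per-speaker VQ codebooks in place of the hybrid SVM, the dialect stage is omitted, and synthetic 4-dimensional vectors stand in for MFCC or SDC frames. All function names, speaker labels, and parameters are hypothetical.

```python
import numpy as np


def train_codebook(frames, k, iters=20, seed=0):
    """Train a VQ codebook with plain k-means (Lloyd's algorithm).

    Assumption: this is the conventional baseline, not the thesis's
    semi-supervised variant.
    """
    rng = np.random.default_rng(seed)
    # Initialise codewords from k distinct training frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for j in range(k):
            members = frames[nearest == j]
            if len(members):  # leave an empty cell's codeword unchanged
                codebook[j] = members.mean(axis=0)
    return codebook


def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())


def identify_gender(frames, gender_codebooks):
    """Pick the gender whose codebook quantizes the utterance best."""
    return min(gender_codebooks,
               key=lambda g: distortion(frames, gender_codebooks[g]))


def identify_speaker(frames, gender_codebooks, speaker_codebooks, speaker_gender):
    """Two-stage search: decide gender first, then score only that subset."""
    gender = identify_gender(frames, gender_codebooks)
    candidates = [s for s, g in speaker_gender.items() if g == gender]
    best = min(candidates,
               key=lambda s: distortion(frames, speaker_codebooks[s]))
    return gender, best


def synthetic_frames(center, n, rng):
    """Hypothetical stand-in for feature frames: Gaussian cloud around a center."""
    return center + 0.3 * rng.standard_normal((n, len(center)))


rng = np.random.default_rng(42)
speaker_centers = {
    "m1": np.array([0.0, 0.0, 0.0, 0.0]),
    "m2": np.array([2.0, 0.0, 0.0, 0.0]),
    "f1": np.array([6.0, 6.0, 6.0, 6.0]),
    "f2": np.array([8.0, 6.0, 6.0, 6.0]),
}
speaker_gender = {"m1": "male", "m2": "male", "f1": "female", "f2": "female"}

# Training data and one codebook per speaker.
train = {s: synthetic_frames(c, 200, rng) for s, c in speaker_centers.items()}
speaker_codebooks = {s: train_codebook(x, k=4) for s, x in train.items()}

# One codebook per gender, trained on the pooled frames of that gender.
gender_codebooks = {
    g: train_codebook(
        np.vstack([train[s] for s in train if speaker_gender[s] == g]), k=8)
    for g in ("male", "female")
}

# A held-out utterance from speaker "f2".
test_utterance = synthetic_frames(speaker_centers["f2"], 50, rng)
result = identify_speaker(
    test_utterance, gender_codebooks, speaker_codebooks, speaker_gender)
print(result)
```

In this sketch the gender decision halves the number of speaker models that have to be scored, which is the kind of saving the hierarchical system aims for as the speaker population grows. The cost is that an error in the gender stage cannot be recovered by the speaker stage.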
[Degree-granting institution]: Jiangsu Normal University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H17

