Research on Fast Query-by-Example Speech Retrieval Based on Template Matching

Published: 2018-09-03 11:18

[Abstract]: Query-by-example speech retrieval takes a query example supplied by the user (a waveform segment) and searches a large collection of speech for the segments related to it. It has important applications in information security, voice search engines, and the classification and management of speech resources. Template matching is one of the mainstream approaches to this task. Applied directly, however, it is slow and does not adequately account for variation in acoustic conditions. To address these shortcomings, this thesis studies how to reduce retrieval time and how to re-rank the relevant regions, with the aim of speeding up retrieval and improving its precision. The work falls into three parts.

First, searching for relevant regions by applying dynamic time warping (DTW) directly is very time-consuming. The thesis proposes a DTW algorithm that incorporates a piecewise aggregate approximation lower-bound estimate, which speeds up retrieval by sharply reducing the number of dynamic matches performed during the search. The method first computes the piecewise aggregate approximation lower-bound estimate of the DTW score between the query example and each matching region in the test utterances, then uses K-nearest-neighbor search together with DTW to find the regions related to the query. Experiments show that the method is 5.9 times as fast as direct DTW retrieval, with no effect on retrieval precision.
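The lower-bound-then-DTW search described above can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the abstract does not give the form of its piecewise aggregate approximation bound, so the sketch swaps in a much simpler endpoint bound (every warping path must align the first frames and the last frames), and it works on one-dimensional sequences rather than feature vectors. The names `dtw_distance`, `endpoint_lower_bound`, and `knn_search` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def endpoint_lower_bound(a: Sequence[float], b: Sequence[float]) -> float:
    """Cheap bound (an assumption, standing in for the PAA bound).

    Any warping path contains the first-first and last-last cells,
    so their costs can never exceed the full DTW distance.
    """
    if len(a) == 1 and len(b) == 1:
        return abs(a[0] - b[0])
    return abs(a[0] - b[0]) + abs(a[-1] - b[-1])


def knn_search(
    query: Sequence[float], regions: List[Sequence[float]], k: int
) -> Tuple[List[Tuple[float, int]], int]:
    """Return the k best (distance, region index) pairs and the DTW call count."""
    if k < 1:
        return [], 0
    # Visit candidates in order of increasing lower bound.
    order = sorted((endpoint_lower_bound(query, r), i) for i, r in enumerate(regions))
    best: List[Tuple[float, int]] = []  # max-heap on distance via negation
    dtw_calls = 0
    for lb, i in order:
        # Once the bound reaches the current k-th best distance, no remaining
        # candidate can improve the result, so the search can stop.
        if len(best) == k and lb >= -best[0][0]:
            break
        d = dtw_distance(query, regions[i])
        dtw_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), dtw_calls
```

Because a candidate is skipped only when its lower bound already rules it out, the distances returned match those of an exhaustive DTW search, which is consistent with the abstract's claim that precision is unaffected. How many DTW calls are saved depends entirely on how tight the bound is.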
Second, applying dynamic time warping (DTW) directly involves a large amount of redundant computation and redundant matching. To address this, the thesis proposes a retrieval method based on segmental DTW: the test utterances are first divided into a series of matching regions according to fixed rules, and DTW is then used to perform the retrieval. To improve efficiency further, segmental DTW is combined with the piecewise aggregate approximation lower-bound estimate. To account better for variation in acoustic conditions, pseudo-relevance feedback is used to revise the retrieval results, giving a relevant-region re-ranking method based on pseudo-similarity. Experiments show that this method is 14.6 times as fast as direct DTW retrieval and that its retrieval precision is 5.21% higher.

Third, to overcome the limitations of the two lower-bound-based algorithms above (DTW and segmental DTW, each combined with the lower-bound estimate), the thesis proposes a DTW algorithm that incorporates boundary information. The method first uses hierarchical agglomerative clustering to segment the phoneme posterior probability feature sequences of the query example and the test utterances (that is, boundary detection), computes the mean vector of each segment, and assembles these mean vectors into a new index and a new query. DTW is then used to perform the retrieval, and pseudo-relevance feedback is finally applied to revise the results. Experiments show that this method is 15.4 times as fast as direct DTW retrieval and improves retrieval precision by 0.73% over it.
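The boundary-detection step in the third method (cluster frames into segments, then replace each segment by its mean vector) can be sketched as below. The abstract does not state the distance measure or the stopping rule, so this sketch assumes Euclidean distance between segment means, merges only neighbouring segments, and stops at a caller-supplied segment count. The names `segment_by_adjacent_merging` and `compress` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_vector(frames: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def segment_by_adjacent_merging(
    frames: List[Vector], n_segments: int
) -> List[Tuple[int, int]]:
    """Bottom-up clustering restricted to neighbouring segments.

    Returns half-open (start, end) frame ranges that cover the sequence.
    The stopping rule (a fixed segment count) is an assumption.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    bounds = [(i, i + 1) for i in range(len(frames))]
    while len(bounds) > n_segments:
        means = [mean_vector(frames[s:e]) for s, e in bounds]
        # Find the adjacent pair of segments whose mean vectors are closest.
        j = min(
            range(len(bounds) - 1),
            key=lambda i: math.dist(means[i], means[i + 1]),
        )
        # Merge that pair into a single segment.
        bounds[j:j + 2] = [(bounds[j][0], bounds[j + 1][1])]
    return bounds


def compress(frames: List[Vector], n_segments: int) -> List[Vector]:
    """Replace each detected segment by its mean vector."""
    return [
        mean_vector(frames[s:e])
        for s, e in segment_by_adjacent_merging(frames, n_segments)
    ]
```

DTW cost grows with the product of the two sequence lengths, so shortening both the query and the indexed utterances by a factor of c cuts the matching cost by roughly c squared. That is the general reason a segment-level index can be faster; the 15.4-fold figure in the abstract is the thesis's own measurement.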
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TN912.3


Article ID: 2219826


Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2219826.html
