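Design and Implementation of a Semi-Automatic Speech Annotation System
Published: 2018-06-19 06:50
Topics: DAEM algorithm + STRAIGHT algorithm; Source: Northwest Normal University, master's thesis, 2015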
【Abstract】: With the rapid development of information technology, expectations for the quality of speech synthesis and speech recognition keep rising. More and more laboratory research results are being applied in everyday life, and new speech system products continue to appear. Building a large-scale corpus is an indispensable task in designing a good speech system, and whether the corpus is annotated accurately determines its quality, so corpus annotation plays a key role in speech research. Large-scale manual annotation is time-consuming, labor-intensive and costly. In addition, because the human ear is not sensitive to the boundaries of individual syllables within a word or sentence, manually annotated data can contain considerable errors. This thesis designs a semi-automatic annotation system for speech corpora. The system automatically computes the boundaries and the fundamental frequency (F0) envelope of the speech material, and the automatic results are then corrected by hand, yielding accurate annotation of both the boundaries and the F0 envelope. The main work and contributions of the thesis are as follows:
1. An automatic annotation algorithm for speech unit boundaries is implemented. For recorded speech files without time labels, a forced-alignment algorithm based on the hidden Markov model (HMM) is used to align the time boundaries automatically. The Deterministic Annealing Expectation Maximization (DAEM) algorithm is introduced into the re-estimation step of HMM training, which improves the accuracy of the forced alignment of speech unit boundaries.
2. An automatic annotation algorithm for the fundamental frequency of speech is implemented. On the basis of the duration boundary annotation, the STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weighted spectrogram) algorithm is used to extract the F0 of the speech, and the extracted F0 data are smoothed. Because the distance between two adjacent peak points equals the pitch period, the positions of the peak-point marks can be obtained. From the F0 envelope curve formed by the peak points, missed and mislabeled peak points can be seen directly. Manual correction then yields more accurate annotation data, and this is where the semi-automatic nature of the system lies.
3. A semi-automatic speech annotation system is designed and implemented. The system provides a graphical user interface that draws the boundary of each speech unit on the speech waveform and converts the F0 produced by the STRAIGHT algorithm into peak-point marks on the waveform. On this basis, functions for manually modifying the speech unit boundaries and the peak-point marks are implemented, so that a more precise annotation of the unit boundaries and the F0 envelope can be completed. The result is a visual semi-automatic speech annotation system.
4. An experimental phonetic analysis of the Lanzhou dialect is carried out. Using the implemented system, the boundaries and F0 of single characters (monosyllabic words) in the Lanzhou dialect are annotated and analyzed, verifying the phonetic conclusions about Lanzhou dialect single characters.
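The abstract states that the distance between two peak points equals the pitch period, and that missed or wrong marks show up in the F0 envelope formed by the peaks. The sketch below illustrates only that relationship and is not the thesis implementation. The function names, the 5 ms frame period, the 30% tolerance and the use of a global median as reference are all assumptions made for illustration. The F0 contour is a synthetic constant rather than STRAIGHT output, and the marks are placed by timing alone instead of being snapped to waveform peaks.

```python
from statistics import median


def place_marks(f0_hz, frame_period_s, first_peak_s, end_s):
    """Place marks one pitch period apart, reading F0 from the frame under each mark."""
    marks = []
    t = first_peak_s
    while t < end_s:
        frame = min(int(t / frame_period_s), len(f0_hz) - 1)
        f0 = f0_hz[frame]
        if f0 <= 0:
            # Unvoiced frame (F0 reported as 0): place no mark, step one frame ahead.
            t += frame_period_s
            continue
        marks.append(t)
        t += 1.0 / f0  # the next peak is expected one pitch period later
    return marks


def f0_from_marks(marks):
    """F0 envelope implied by the marks: reciprocal of each inter-mark interval."""
    return [1.0 / (b - a) for a, b in zip(marks, marks[1:])]


def suspicious_intervals(marks, tolerance=0.3):
    """Indices i whose interval marks[i]..marks[i+1] implies an F0 far from the median."""
    envelope = f0_from_marks(marks)
    reference = median(envelope)
    return [i for i, f in enumerate(envelope)
            if abs(f - reference) / reference > tolerance]


# Synthetic example: 20 frames of 5 ms at a constant 100 Hz.
contour = [100.0] * 20
marks = place_marks(contour, 0.005, 0.0, 0.095)

# Simulate a missed peak by dropping one mark; the implied F0 halves at that spot.
damaged = marks[:5] + marks[6:]
print(suspicious_intervals(damaged))
```

A dropped mark doubles the local interval, so the implied F0 falls to about half the surrounding value and the interval is flagged for manual review. An extra mark would raise the implied F0 in the same way. For natural speech with a moving F0, a local reference window would be a more reasonable choice than the global median used here.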
【Degree-granting institution】: Northwest Normal University
【Degree level】: Master's
【Year degree awarded】: 2015
【Classification number】: TN912.3
Article ID: 2038994
Article link: http://www.sikaile.net/kejilunwen/wltx/2038994.html