Research on Multidimensional Speech Information Recognition Technology
[Abstract]: With the growing demand for artificial intelligence and the rapid development of machine learning, voice interaction has become the development trend for the next generation of smart homes and many other applications, and speech recognition, speaker identification, and speech emotion recognition have attracted increasing attention. At present, research on speech recognition at home and abroad mostly addresses the recognition of a single dimension of information or content. In daily life, however, the speech signals people collect are essentially mixed signals carrying three broad kinds of information: the linguistic content of the utterance, information related to the speaker's characteristics (such as gender, age, identity, and emotional state), and background sound. In human dialogue, we recognize all of these kinds of information simultaneously. Recognizing each kind of information separately produces ambiguity in semantic understanding, reduces the robustness of speech recognition, and hinders the development of spoken dialogue systems. If a machine could, like a person, simultaneously recognize multidimensional information such as the speaker's identity, age, gender, emotional state, and even the background sound, it could greatly improve the efficiency of human-computer dialogue and resolve the bottleneck of single-dimension recognition systems. Therefore, our team has proposed a new research topic: the simultaneous recognition of multidimensional speech information. Since the three aspects of information above involve nearly ten kinds of recognition targets, simultaneous recognition is very difficult and the scope of the research is very wide. As a pioneering attempt, this thesis therefore first studies multidimensional information recognition related to the speaker.
Specifically, it develops techniques for gender-dependent emotion recognition and for gender and identity recognition in emotional environments. Starting from the block diagram of a single-dimension information recognition system, the thesis analyzes what traditional single-dimension speaker information recognition systems have in common and where they differ, and focuses on the two key technologies for realizing simultaneous recognition of multidimensional speaker information: feature extraction and model training. (1) Different speech feature parameters represent different speech-related information, and the same feature vectors can also serve different single-dimension speech recognition tasks. The acoustic feature parameters in common use are prosodic features, voice quality features, and spectral features, which together cover the three speaker-related aspects of information, so this thesis uses the combination of these three kinds of acoustic features as the feature parameters for multidimensional speaker information recognition; compared with any single category, the fused features contain richer speech information. Two methods are used to obtain the fused features: low-dimensional features extracted on the Matlab simulation platform, and high-dimensional features extracted with the openSMILE toolkit. (2) Given the lack of mature references and theory for multidimensional information recognition, the thesis first constructs a gender-based multidimensional information recognition baseline system as a reference model, and then compares it against traditional single-dimension systems.
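The idea of fusing the three acoustic feature families can be sketched as follows. This is only a minimal NumPy illustration with simple per-frame proxies (log energy for prosody, zero-crossing rate for voice quality, spectral centroid for the spectrum); the thesis itself extracts low-dimensional features in Matlab and high-dimensional features with openSMILE, and all function names here are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fused_features(x, sr=16000):
    """Concatenate one prosodic, one voice-quality and one spectral
    descriptor per frame into a single fused feature vector, mirroring
    the thesis's idea of combining three feature families."""
    frames = frame_signal(x)
    # prosodic proxy: log frame energy
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # voice-quality proxy: zero-crossing rate
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # spectral proxy: spectral centroid of the magnitude spectrum
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = spec @ freqs / (np.sum(spec, axis=1) + 1e-10)
    return np.column_stack([energy, zcr, centroid])

# toy usage: 1 second of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
feats = fused_features(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 3): 98 frames, 3 fused descriptors each
```

In a real system each family would contribute many coefficients (e.g. full MFCC vectors rather than a single centroid), so the fused vector is much longer, which is exactly the low-dimensional versus high-dimensional trade-off the thesis studies.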
Compared with separate recognition systems for emotion, gender, and identity, the average recognition rate of the multidimensional recognition system is 11.37% higher, which demonstrates the feasibility and effectiveness of the baseline scheme and shows that multidimensional information recognition can also improve the recognition rate of each single dimension, which is itself a new finding. (3) Because multidimensional speaker information recognition is essentially a multi-label learning problem, multi-instance multi-label (MIML) learning algorithms are considered for multidimensional speech recognition, and the multi-instance multi-label support vector machine (MIMLSVM) is applied to this task for the first time. Experiments show that, except for gender recognition, the recognition rate of the improved MIMLSVM system is higher than that of the baseline system. With high-dimensional features, the improved MIMLSVM system's accuracy is lower than with low-dimensional features, and the baseline system is about 1.97% higher. Evidently, proper parameter selection and model matching can significantly improve the recognition rate of multidimensional systems. However, as the number of labels increases, the running time and computational complexity of the system grow accordingly; a certain amount of system complexity is the price to pay.
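The multi-label formulation in (3) can be illustrated with a minimal binary-relevance sketch: one independent linear classifier per speaker attribute (e.g. gender, an emotion class, an identity), all trained on the same fused features. This is only a stand-in for the thesis's improved MIMLSVM; a simple perceptron replaces the SVM, the data is synthetic, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_binary_relevance(X, Y, epochs=50, lr=0.1):
    """Train one linear perceptron per label (binary relevance).
    Each of the Y.shape[1] speaker attributes is predicted independently
    from the same feature vectors, as in a multi-label setting."""
    n, d = X.shape
    W = np.zeros((Y.shape[1], d + 1))          # one weight row per label, plus bias
    Xb = np.hstack([X, np.ones((n, 1))])       # append bias input
    for _ in range(epochs):
        for i in range(n):
            pred = (W @ Xb[i] > 0).astype(float)
            W += lr * np.outer(Y[i] - pred, Xb[i])  # perceptron update per label
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ W.T > 0).astype(int)

# toy data: 2-D features, 2 labels each depending on a different dimension
X = rng.normal(size=(200, 2))
Y = np.column_stack([(X[:, 0] > 0), (X[:, 1] > 0)]).astype(int)
W = train_binary_relevance(X, Y)
acc = (predict(W, X) == Y).mean()
print(acc)  # training accuracy on this linearly separable toy set
```

Binary relevance ignores correlations between labels (e.g. between gender and emotion expression), which is one reason MIML-style algorithms such as MIMLSVM can do better on this task; it also makes visible why cost grows with the number of labels, since each added label adds another classifier to train.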
【Degree-granting institution】: Nanjing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TN912.34
Document ID: 2173801
Document link: http://www.sikaile.net/kejilunwen/xinxigongchenglunwen/2173801.html