Research on Speech-Driven 3D Lip Animation Algorithms
Published: 2018-04-05 22:15
Topic: speech-driven; Focus: 3D animation; Source: Beijing Institute of Technology, 2016 master's thesis
[Abstract]: Speech-driven 3D lip animation lies at the intersection of speech signal processing and 3D animation technology. It can be applied wherever speech and lip shape must be synchronized in 3D animation, such as 3D animated films or videos, 3D games, virtual anchors, and instructional videos. At present there is relatively little research, in China or abroad, on speech-driven lip animation. Lip animation is still produced mainly by hand, which is time-consuming and laborious, so research on speech-driven 3D lip animation algorithms has social significance and practical value. In such algorithms, the mapping from speech to lip shape directly affects the realism of the resulting animation. Existing speech-driven lip animation algorithms face the following main difficulties: (1) the pronunciation rules of phonemes differ between languages, which makes it hard to establish a unified mapping from phonemes to lip shapes; (2) when a back-propagation (BP) neural network is used to map speech feature parameters to lip shapes, speed and accuracy are usually heavily constrained by the number of training samples and the network structure; (3) 3D face models come in many formats and there is no unified standard for lip animation, so generality is limited.
To address these problems, this thesis makes the following improvements on existing speech-driven lip animation algorithms. First, it analyzes the pronunciation rules of Mandarin Chinese and English, attempts to unify them using the International Phonetic Alphabet, and records a training speech corpus on that basis. Second, it tries a Gaussian mixture model algorithm and a multi-class support vector machine algorithm based on a directed acyclic graph (DAG-SVM) in place of a neural network for phoneme classification, and improves DAG-SVM. Finally, it uses the 3D mesh morphing (tweening) animation technique in DirectX to implement general and realistic 3D facial lip animation, combines it with the classification algorithms, and provides a graphical interface. Experimental results show that the proposed algorithm performs well and meets the expected requirements.
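The abstract names DAG-SVM for phoneme classification but does not describe it further. As a rough illustration only, the sketch below shows the standard directed-acyclic-graph evaluation over pairwise binary classifiers: with K classes, each test sample needs K-1 pairwise decisions, and each decision eliminates one candidate class. Everything concrete here is an assumption, not taken from the thesis: the nearest-centroid deciders stand in for trained binary SVMs, the three class labels and 2-D features are toy data, and the thesis's own improvement to DAG-SVM is not reproduced.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
# A pairwise decider for (a, b) returns True if the sample looks more like a than b.
PairwiseDecider = Callable[[Vector], bool]


def centroid(samples: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_pairwise(data: Dict[str, List[Vector]]) -> Dict[Tuple[str, str], PairwiseDecider]:
    """Build one binary decider per class pair.

    Assumption: nearest-centroid rules are used here as a dependency-free
    stand-in for the binary SVMs that a real DAG-SVM would train.
    """
    centroids = {label: centroid(samples) for label, samples in data.items()}
    deciders: Dict[Tuple[str, str], PairwiseDecider] = {}
    for a, b in combinations(sorted(data), 2):
        ca, cb = centroids[a], centroids[b]
        deciders[(a, b)] = lambda x, ca=ca, cb=cb: sq_dist(x, ca) <= sq_dist(x, cb)
    return deciders


def dag_predict(
    x: Vector,
    classes: Iterable[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
) -> Tuple[str, int]:
    """Walk the decision DAG: compare the first and last remaining candidates,
    drop the loser, and repeat until one class is left.

    Returns the predicted class and the number of pairwise evaluations used.
    """
    candidates = sorted(classes)
    evaluations = 0
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        if deciders[(a, b)](x):
            candidates.pop()    # b is eliminated
        else:
            candidates.pop(0)   # a is eliminated
        evaluations += 1
    return candidates[0], evaluations


# Hypothetical toy data: three vowel-like classes with made-up 2-D features.
toy_data = {
    "a": [[1.0, 0.0], [0.9, 0.1]],
    "i": [[0.0, 1.0], [0.1, 0.9]],
    "u": [[-1.0, -1.0], [-0.9, -1.1]],
}
toy_deciders = train_pairwise(toy_data)
prediction = dag_predict([0.05, 0.95], toy_data.keys(), toy_deciders)
print(prediction)
```

In a real pipeline each decider would be a binary SVM trained on speech feature vectors, and the predicted phoneme class would then select the target lip shape for the mesh animation.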
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.41;TN912.3
Article ID: 1716712
Article link: http://www.sikaile.net/kejilunwen/xinxigongchenglunwen/1716712.html