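Multi-Factor Audio Signal Modeling and Applications Based on Tensor Analysis

Published: 2018-01-02 21:25

Keywords: Multi-Factor Audio Signal Modeling and Applications Based on Tensor Analysis　Source: Beijing Institute of Technology, 2016 doctoral dissertation　Type: degree thesis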

[Abstract]: With the continued development of Internet and multimedia technology, the analysis and processing of audio signals, an important component of multimedia signals, has attracted growing attention from researchers. Tensor analysis is a multiway (multilinear) analysis tool that has been widely used in recent years. It can handle signals influenced by more than one factor, including higher-order extensions of a signal and signals that are inherently multidimensional. This thesis introduces tensor analysis into multi-factor audio signal modeling and its applications. As a multi-factor analysis method, it preserves the structural information of the data when processing high-order signals, and this advantage is used to address three application problems: high-order feature modeling of audio signals, high-order subspace analysis for audio classification, and recovery of missing data in multi-channel audio signals. The specific research contents are as follows:

1. For feature modeling of audio signals, the thesis extends the traditional one-dimensional and two-dimensional modeling approaches and uses tensors to model high-order audio features. This reflects the physical meaning of the audio signal in each attribute subspace while preserving the relationships among the subspaces, and tensor decomposition can uncover latent, intrinsic, and discriminative structural information in the signal. In a voice command recognition system for unmanned vehicles, a third-order tensor is built over frame structure, decomposition scale, and feature parameters. In an audio classification system, a third-order tensor is built from the different attributes of the acoustic, perceptual, and psychoacoustic feature spaces. The feature sets obtained through tensor modeling and decomposition help improve the accuracy of audio recognition and classification.

2. For audio classification as a pattern recognition problem, the thesis uses high-order subspace analysis and applies nonnegative tensor decomposition to the task in a novel way. In supervised training, audio signals are represented by a nonnegative tensor model. To ensure that the decomposition is unique, each audio class is learned separately by nonnegative tensor decomposition, yielding a nonnegative core tensor and factor matrices for each class. In testing, the test audio is mapped into each class's audio space through the nonnegative factor matrices produced in training, and the Frobenius norm is used to compare the mapped result with each class's core tensor from training, which completes the classification. Because the nonlinear relationships in the audio data structure are not destroyed during nonnegative tensor decomposition, classification performance is better than with traditional classifiers, and audio databases can be labeled by class more effectively.

3. For recovering missing data in multi-channel audio signals, the thesis introduces tensor decomposition and tensor completion into audio data recovery for the first time. The tensor decomposition method models the audio signal with missing data as a third-order tensor and decomposes it, minimizing the objective function through weighting and an alternating iterative algorithm. The tensor completion method defines the trace norm of a tensor and uses convex relaxation to convert the rank minimization problem into a trace norm minimization problem, that is, it turns a nonconvex optimization problem into a convex one and so makes the NP-hard problem tractable. Missing data in the multi-channel audio signal is then recovered with a simple completion based on block coordinate descent and an accurate completion based on the alternating direction method of multipliers.
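The third-order feature tensor, core tensor, and factor matrices mentioned in the abstract can be illustrated with a small sketch. The code below is not the thesis's method: it uses a plain truncated higher-order SVD (a Tucker-type decomposition without the nonnegativity constraint) in NumPy, and the tensor size (20 frames x 4 decomposition scales x 6 feature parameters), the random data, and all function names are assumptions made only for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    rest = tuple(s for i, s in enumerate(shape) if i != mode)
    return np.moveaxis(matrix.reshape((shape[mode],) + rest), 0, mode)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, tuple(new_shape))


def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns a core tensor and one factor per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Map a core tensor back to the original space through the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical feature tensor: frames x decomposition scales x feature parameters.
rng = np.random.default_rng(0)
features = rng.standard_normal((20, 4, 6))

# With full ranks the decomposition is exact up to rounding error.
core, factors = hosvd(features, ranks=(20, 4, 6))
full_error = np.linalg.norm(features - reconstruct(core, factors))

# With reduced ranks the core is a compact representation of the same data.
small_core, small_factors = hosvd(features, ranks=(5, 3, 4))
relative_error = (
    np.linalg.norm(features - reconstruct(small_core, small_factors))
    / np.linalg.norm(features)
)
print(small_core.shape, full_error, relative_error)
```

In a classification setting like the one the abstract describes, one set of factors would be learned per class and a test tensor would be projected through each set, with the Frobenius norm (the default of `np.linalg.norm` on an array) comparing the projected core with each class's stored core. The nonnegative variant used in the thesis would need a different solver than the SVD shown here.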
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: TN912.3


Article ID: 1370979

Article link: http://www.sikaile.net/shoufeilunwen/xxkjbs/1370979.html
