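Audio Perceptual Hashing Algorithm Based on Auditory Filters and Its Application in Music Retrieval
Published: 2018-04-15 23:02
Topics: audio perceptual hashing + Gammachirp filter bank; Source: East China University of Science and Technology, 2015 master's thesis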
[Abstract]: With the continuing development of the Internet and multimedia technology, people can obtain ever more digital audio with ever greater ease. The human auditory system discriminates audio remarkably well: even in a noisy environment, a listener needs only a few seconds to recognize the song being played. The problem is how to achieve the same recognition automatically by computer as the volume of audio keeps growing. Content-based audio perceptual hashing was developed to address it.

Many existing audio perceptual hashing algorithms are insufficiently robust and computationally expensive. To address this, this thesis proposes a new audio perceptual hashing algorithm. First, we design a new time-frequency feature representation for audio: a multi-channel Gammachirp filter bank filters the signal within the frequency range to which the human ear is most sensitive, the output is divided into frames, and an energy spectrum is computed per frequency band. Experiments show that this feature is robust and resists geometric distortion. Next, non-negative matrix factorization (NMF) extracts local features from the Gammachirp cochlear energy spectrum while reducing the dimensionality of the data. Finally, the local features are differenced and quantized to produce a binary audio perceptual hash. Experimental results show that the proposed algorithm achieves a high recognition rate both under a variety of attacks applied with audio editing software and when retrieving from recordings made in real environments.

Retrieval speed is also an important issue in audio information retrieval. Changing the algorithm alone cannot deliver a significant speed-up in the short term, so it is necessary to use other computing devices to accelerate audio retrieval. A graphics processing unit (GPU) offers powerful parallel computing capability, which makes accelerating existing audio retrieval algorithms on the GPU worthwhile. In this thesis, cooperative CPU-GPU computation substantially reduces the time taken by perceptual hash matching and by the retrieval process as a whole.

Finally, combining the algorithms above, this thesis designs an interactive music retrieval system that records an audio clip of a few seconds and retrieves the corresponding song title, artist, and album cover image.
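The last two stages named in the abstract, differencing and quantizing a feature matrix into a binary hash and then matching hashes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the feature matrix (features x frames) is assumed to be already computed, differencing is taken along the time axis with a simple sign threshold, and matching uses the bit error rate (normalized Hamming distance) over a sliding window. The function names `binary_hash`, `bit_error_rate`, and `best_match_offset` are hypothetical.

```python
import numpy as np


def binary_hash(features):
    """Difference a (features x frames) matrix along time and keep the sign.

    Assumption: a plain sign threshold; the thesis may quantize differently.
    Returns a (features x frames-1) array of 0/1 bits.
    """
    return (np.diff(features, axis=1) > 0).astype(np.uint8)


def bit_error_rate(a, b):
    """Fraction of differing bits between two equal-shape bit arrays."""
    if a.shape != b.shape:
        raise ValueError("hash blocks must have the same shape")
    return float(np.mean(a != b))


def best_match_offset(query_bits, ref_bits):
    """Slide the query hash along the reference hash.

    Returns (frame offset with the lowest bit error rate, that rate).
    """
    n = query_bits.shape[1]
    n_offsets = ref_bits.shape[1] - n + 1
    if n_offsets < 1:
        raise ValueError("query is longer than the reference")
    rates = [
        bit_error_rate(query_bits, ref_bits[:, k:k + n])
        for k in range(n_offsets)
    ]
    best = int(np.argmin(rates))
    return best, rates[best]
```

Each candidate offset is scored independently of the others, so the comparison loop is a natural candidate for parallel execution; how the thesis actually divides this work between CPU and GPU is not stated in the abstract.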
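[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification numbers]: TN713; TP391.3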
Article ID: 1756185
Article link: http://www.sikaile.net/kejilunwen/dianzigongchenglunwen/1756185.html