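Research on Adaptive Video Summarization Algorithms
Published: 2018-03-06 18:24
Topic: video summarization  Focus: dictionary learning  Source: University of Science and Technology of China, 2017 doctoral dissertation  Document type: dissertation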
[Abstract]: With the spread of digital recording devices and advances in network technology, video has become an important way for people to record their personal lives and communicate. Large volumes of video are produced every day, covering news, sports events, TV dramas, variety shows, self-recorded clips, and more. This mass of video places a heavy viewing burden on people, since watching everything is very time-consuming, and it puts great storage pressure on video servers and websites. There is therefore an urgent need for a way to extract the key content of a video for fast viewing and efficient storage. Video summarization was developed to meet this need. It has advanced considerably in recent years but is not yet mature, and this dissertation aims to improve its performance. Video content today is highly varied; even a single video may contain many scenes that differ greatly from one another. This diversity places high demands on the adaptability of a summarization algorithm: it must adapt its feature extraction to the content of the video, segment the video, and extract keyframes to form the summary. To meet these demands, and building on existing work on video summarization, this dissertation draws on dictionary learning, sparse representation, and deep learning to study three stages of video summarization in depth: feature extraction, video segmentation, and the evaluation of content importance. It proposes a method for each stage, tests the methods on standard datasets, and analyzes the results.
The main contributions are as follows:
1) A video summarization algorithm based on graph-regularized sparse coding. In the feature extraction stage, traditional summarization algorithms usually compute feature values directly from predefined rules. Because video content is diverse, features extracted by such fixed rules often fail to describe it accurately. To make the algorithm more adaptive, we use dictionary learning and sparse representation as a form of unsupervised feature learning: a feature space suited to the content of each video is learned adaptively and then used to extract features. The resulting features describe the video content more accurately and adapt well to different scenes.
2) A video summarization algorithm based on an adaptive threshold. Once frame features have been extracted, the video must be segmented to obtain its structure, which serves as a reference for generating the summary. Existing segmentation algorithms measure the similarity between video frames and split the video using a fixed threshold. Because video data is diverse, however, a single fixed threshold rarely works well across different videos: content changes at different intensities in different videos, so the optimal segmentation threshold should differ as well. To make segmentation more adaptive, the proposed algorithm adjusts the segmentation threshold for each video according to how sharply its frames change.
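The idea of a per-video segmentation threshold can be illustrated with a small sketch. The abstract does not give the dissertation's actual thresholding rule, so everything below is an assumption made for illustration: frames are represented by toy feature vectors, consecutive frames are compared by Euclidean distance, and the threshold is set to the mean distance plus `k` standard deviations. The function names and the parameter `k` are hypothetical.

```python
from statistics import mean, stdev


def frame_distances(features):
    """Euclidean distance between each pair of consecutive frame feature vectors."""
    return [
        sum((a - b) ** 2 for a, b in zip(prev, cur)) ** 0.5
        for prev, cur in zip(features, features[1:])
    ]


def adaptive_threshold(distances, k=1.0):
    """Per-video threshold: mean + k * sample standard deviation.

    This rule and the value of k are assumptions for this sketch,
    not the rule used in the dissertation.
    """
    if len(distances) < 2:
        return float("inf")  # too few frames to estimate variability
    return mean(distances) + k * stdev(distances)


def segment_boundaries(features, k=1.0):
    """Return every frame index i at which a new segment starts."""
    distances = frame_distances(features)
    threshold = adaptive_threshold(distances, k)
    # distances[i] compares frame i with frame i + 1
    return [i + 1 for i, d in enumerate(distances) if d > threshold]


# Two toy videos with the same structure but very different change intensity.
calm_video = [[0.0], [0.1], [0.2], [1.2], [1.3], [1.4]]
busy_video = [[0.0], [1.0], [2.0], [12.0], [13.0], [14.0]]

print(segment_boundaries(calm_video))  # -> [3]
print(segment_boundaries(busy_video))  # -> [3]
```

Both toy videos are split at frame 3, although their frame-to-frame distances differ by a factor of ten. A fixed threshold of 0.5, which would suit the first video, would mark every transition in the second video as a boundary.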
This strengthens the algorithm's adaptability and helps improve the quality of the generated summaries.
3) A video summarization algorithm based on an autoencoder. After the video has been segmented and its structure obtained, the importance of each segment must be determined and the most important parts extracted as the summary. Importance evaluation is both critical and difficult. On the one hand, its result directly determines the summary; on the other hand, the importance of video content is subjective and abstract and is hard to capture in a set of formulas. We first use the video's title to collect web images related to its content; we then use an autoencoder to learn the patterns shared by the images and the video; finally, we use the trained autoencoder to score the importance of the video content and generate the summary accordingly. By mining web images with a deep network, the method captures how the general public judges certain things and can therefore assess the importance of video content more accurately.
4) In the experiments, we tested the proposed methods on standard datasets including VSUMM, Youtube, and SumMe, and analyzed the results in detail. The results show that our methods perform better on these datasets and produce higher-quality summaries than existing methods.
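The autoencoder-based importance scoring in contribution 3 can be sketched in miniature. The abstract does not describe the network architecture or the scoring function, so the following is only an illustration under stated assumptions: a tiny linear autoencoder with tied weights stands in for the deep model, it is trained on made-up feature vectors standing in for title-related web images, and a segment's importance is taken to be the negative reconstruction error, on the assumption that segments resembling the web images reconstruct well. All function names, feature values, and hyperparameters are hypothetical.

```python
import random


def encode(W, x):
    """Hidden code h = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(W, h):
    """Reconstruction x_hat = W^T h (tied weights)."""
    return [sum(W[i][j] * h[i] for i in range(len(W))) for j in range(len(W[0]))]


def reconstruction_error(W, x):
    """Squared error between x and its reconstruction."""
    x_hat = decode(W, encode(W, x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def train_autoencoder(samples, hidden=1, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent on the squared reconstruction error."""
    rng = random.Random(seed)
    d = len(samples[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    for _ in range(epochs):
        for x in samples:
            h = encode(W, x)
            e = [a - b for a, b in zip(x, decode(W, h))]  # residual x - x_hat
            We = encode(W, e)
            # Gradient of ||x - W^T W x||^2 w.r.t. W[i][j] is
            # -2 * (h[i] * e[j] + We[i] * x[j]); step against it.
            for i in range(hidden):
                for j in range(d):
                    W[i][j] += 2 * lr * (h[i] * e[j] + We[i] * x[j])
    return W


def importance(W, segment_feature):
    """Assumed score: segments that reconstruct well are rated more important."""
    return -reconstruction_error(W, segment_feature)


def summarize(W, segment_features, n):
    """Indices of the n highest-scoring segments, in temporal order."""
    ranked = sorted(
        range(len(segment_features)),
        key=lambda i: importance(W, segment_features[i]),
        reverse=True,
    )
    return sorted(ranked[:n])


# Made-up features standing in for web images retrieved by the video title.
web_image_features = [[0.7, 0.7, 0.0], [0.6, 0.7, 0.1], [0.8, 0.7, 0.0], [0.7, 0.6, 0.0]]
W = train_autoencoder(web_image_features)

on_topic = [0.7, 0.7, 0.0]   # resembles the web images
off_topic = [0.0, 0.1, 0.9]  # does not
segments = [off_topic, on_topic, [0.1, 0.0, 0.8]]
print(summarize(W, segments, 1))
```

In this toy setting the segment whose features resemble the web images receives the highest score and is selected. A practical system would replace the linear model with a deep autoencoder trained on real image and frame features.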
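[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP391.41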
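[Similar literature]
Related doctoral dissertations (top 1)
1 Li Jiatong (李佳桐); Research on Adaptive Video Summarization Algorithms [D]; University of Science and Technology of China; 2017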
Article ID: 1575969
Article link: http://www.sikaile.net/shoufeilunwen/xxkjbs/1575969.html