Research on Time Series Classification Methods Based on Diversified Top-k Shapelets

Published: 2018-10-08 20:09
[Abstract]: A time series is a sequence formed by arranging the values of a statistical indicator of some phenomenon in chronological order. Because a real system or phenomenon is usually influenced by many internal factors, the time series it produces are complex: they are high-dimensional, structurally complicated, noisy, and subject to similarity distortions. Traditional time series analysis models the series with statistical methods, but these characteristics make it hard for such models to meet the requirements of real systems. Data-mining-based approaches emerged in response, and time series mining has become an active research field. Time series classification is an important topic within time series data mining; its task is to build a classifier that assigns a class label to a given time series. As a classification approach based on local shape features, shapelets can distinguish subtle differences between subsequences and therefore classify well. They have been applied in fields such as medical diagnosis and posture recognition, but open problems remain. This thesis addresses those problems with the following main contributions.

(1) To address the redundancy of the optimal shapelet set in existing shapelet-based classification methods, a time series classification method based on the diversified top-k shapelets transformation (DivTopKShapelet) is proposed. The thesis borrows the diversified top-k query method from the field of data retrieval, defines diversified top-k shapelets and the corresponding diversified top-k shapelets graph, and uses them to process the candidate shapelets, selecting those that are most discriminative and mutually dissimilar. SAX is also used to reduce the dimensionality of the original time series dataset. Experimental results show that the method is more accurate than traditional classification methods. Compared with the clustering-based filtering method (ClusterShapelet) and the shapelet coverage method (ShapeletSelection), classification accuracy improves by up to 48.43% and 32.61%, respectively. Computational efficiency improves on all 15 datasets, with speedups ranging from at least 1.09 times to as much as 287.8 times.

(2) To address the inability of existing shapelet classification methods to handle imbalanced time series classification, a time series classification method based on the diversified top-k shapelets transformation (DivIMShapelet+SMOTE) is proposed. AUC, an evaluation metric for imbalanced data classification, replaces the traditional information entropy as the measure of shapelet quality; the training set is transformed using the diversified top-k shapelets; and the transformed training set is then oversampled with SMOTE. Because AUC is insensitive to class imbalance, shapelet quality is assessed more accurately, so the method both extracts time series features effectively and balances the dataset in feature space. Experiments show that DivIMShapelet+SMOTE performs best when compared with DivTopKShapelet and INOS+SVM: classification accuracy improves by up to 38.8% and 10.2%, AUC by up to 0.37 and 0.08, and F-measure by up to 0.35 and 0.15, respectively. The method can therefore handle imbalanced time series classification effectively.
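
The selection idea in contribution (1) and the AUC-based quality measure in contribution (2) can be illustrated with a minimal Python sketch. The abstract does not describe how the diversified top-k shapelets graph is built, what the similarity criterion is, or how AUC is computed per shapelet, so everything below is an assumption made for illustration: the function names (`subsequence_distance`, `auc_quality`, `diversified_top_k`, `shapelet_transform`), the fixed distance `threshold`, and the greedy pass that stands in for the graph-based procedure. SAX and SMOTE are left out.

```python
import math

def subsequence_distance(short, long):
    """Minimum Euclidean distance between `short` and any equal-length window of `long`."""
    if len(short) > len(long):
        short, long = long, short  # always slide the shorter sequence over the longer one
    m = len(short)
    best = math.inf
    for start in range(len(long) - m + 1):
        d = math.sqrt(sum((long[start + i] - short[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def auc_quality(shapelet, series_list, labels):
    """Assumed quality measure: AUC of ranking series by distance to `shapelet`.
    `labels` uses 1 for the positive (minority) class and 0 otherwise."""
    dists = [subsequence_distance(shapelet, ts) for ts in series_list]
    pos = [d for d, y in zip(dists, labels) if y == 1]
    neg = [d for d, y in zip(dists, labels) if y == 0]
    # Mann-Whitney form: share of (pos, neg) pairs where the positive series is closer;
    # ties count as half a win.
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)  # the shapelet may characterise either class

def diversified_top_k(candidates, scores, k, threshold):
    """Greedy stand-in for the graph-based selection: walk candidates from best to
    worst score and keep one only if it is farther than `threshold` from every
    shapelet already kept. Returns the indices of the kept candidates."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) == k:
            break
        if all(subsequence_distance(candidates[i], candidates[j]) > threshold
               for j in selected):
            selected.append(i)
    return selected

def shapelet_transform(series_list, shapelets):
    """Map each series to a feature vector of its distances to the chosen shapelets."""
    return [[subsequence_distance(s, ts) for s in shapelets] for ts in series_list]

# Toy usage: candidate 1 is a near-duplicate of candidate 0, so it is skipped.
candidates = [[0, 1, 0], [0, 1.1, 0], [1, 0, 1], [0, 0, 0]]
scores = [0.9, 0.85, 0.8, 0.1]
chosen = diversified_top_k(candidates, scores, k=2, threshold=0.5)  # [0, 2]
features = shapelet_transform([[0, 1, 0, 0]], [candidates[i] for i in chosen])
```

In this sketch the second-best candidate is dropped despite its high score because it lies within the threshold of an already selected shapelet, which is the redundancy the diversified selection is meant to remove. The resulting feature vectors could then be passed to any standard classifier.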
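[Degree-granting institution]: China University of Mining and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13; O211.61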


Article ID: 2258104

Article link: http://www.sikaile.net/kejilunwen/yysx/2258104.html

