Research on Imbalanced Data in Credit Card Fraud Detection Based on Neighborhood Resampling and Classifier Ranking
Published: 2024-02-21 00:23
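The widespread use of credit card transactions has made credit card fraud an increasingly serious problem worldwide, causing losses of billions of US dollars every year. Effective credit card fraud detection algorithms can substantially reduce both financial and monetary risk. Such algorithms rely heavily on machine learning and data mining techniques, but the uneven distribution of credit card transaction data makes designing a fraud detection system challenging. Because of this skewed, non-stationary distribution, legitimate transactions far outnumber fraudulent ones, a situation generally referred to as imbalanced data. An imbalanced distribution of this kind typically causes the classifier to be overwhelmed by the majority class (legitimate transactions) and to lose its predictive power, because it fails to predict the minority class (fraudulent transactions). One possible solution is to apply pre-processing techniques at the data level. Pre-processing is a key step in data mining: the processed data are passed directly to classification techniques to build a prediction model. Pre-processing includes data cleaning, data integration, data transformation, data resampling, and so on. This thesis focuses on two of these aspects: data cleaning and data resampling. Noisy data are data containing abnormal variations or errors, and they can severely degrade classification performance. Resampling is used to generate the training data from which the prediction model is built, and the quality of the model depends largely on which samples are used to train it. Resampling techniques produce a balanced training set by reducing the majority class (under-sampling) or enlarging the minority class (over-sampling), and a better-performing prediction model can then be built from such a balanced training set...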
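The abstract rests on three ideas: a majority-dominated classifier can look accurate while missing every fraud, noisy samples can be cleaned away, and resampling yields a balanced training set. The Python sketch below illustrates all three using generic stand-ins. It is not the thesis's own neighborhood-based method, since the abstract does not describe that method. For cleaning, it uses Wilson's Edited Nearest Neighbours (ENN). For resampling, it uses plain random under-sampling and random over-sampling by duplication. The function names (`edited_nearest_neighbours`, `random_undersample`, `random_oversample`), the 95:5 toy data and the choice of k=3 are illustrative assumptions.

```python
import math
import random
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Data cleaning (generic stand-in): Wilson's Edited Nearest Neighbours.

    A sample is treated as noise and dropped when most of its k nearest
    neighbours (Euclidean distance) carry a different label.
    """
    keep = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        agree = sum(1 for _, j in neighbours if y[j] == y[i])
        if agree * 2 > k:  # keep only if a strict majority of neighbours agree
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, majority_label, seed=0):
    """Under-sampling: randomly drop majority rows until both classes are equal.

    Assumes exactly two classes and that the majority class is the larger one.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    kept = sorted(rng.sample(majority, k=len(minority)) + minority)
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X, y, minority_label, seed=0):
    """Over-sampling: randomly duplicate minority rows until both classes are equal."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(max(n_extra, 0))]
    idx = list(range(len(y))) + extra  # originals first, then duplicates
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data (made up): 95 legitimate (0) and 5 fraudulent (1) transactions.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# Without resampling, a "classifier" that labels everything legitimate
# reaches 95% accuracy yet detects no fraud at all.
always_legit = [0] * len(y)
accuracy = sum(p == t for p, t in zip(always_legit, y)) / len(y)
fraud_recall = sum(p == t == 1 for p, t in zip(always_legit, y)) / y.count(1)
print(accuracy, fraud_recall)  # 0.95 0.0

X_u, y_u = random_undersample(X, y, majority_label=0)
X_o, y_o = random_oversample(X, y, minority_label=1)
print(sorted(Counter(y_u).items()))  # [(0, 5), (1, 5)]
print(sorted(Counter(y_o).items()))  # [(0, 95), (1, 95)]

# ENN on a small set with one mislabelled point (1.5 labelled as fraud).
Xn = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [1.5]]
yn = [0, 0, 0, 0, 1, 1, 1, 1]
print(edited_nearest_neighbours(Xn, yn, k=3)[0])
# [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
```

Resampling of this kind is normally applied only to the training data. The test set keeps its original class distribution, so the reported performance measures reflect the real imbalance. The two resampling strategies also involve a trade-off. Random under-sampling discards potentially useful legitimate transactions, while random over-sampling only repeats existing fraud cases and can encourage overfitting. Neighborhood-aware methods such as the one this thesis proposes aim to reduce both drawbacks.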
[Pages]: 137
[Degree]: Doctoral
[Table of Contents]:
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Introduction
1.1.1 Credit Card Fraud
1.1.2 Types of Credit Card Fraud
1.1.2.1 Bankruptcy Fraud
1.1.2.2 Theft Fraud/Counterfeit Fraud
1.1.2.3 Application Fraud
1.1.2.4 Behavioral Fraud
1.1.3 Losses Generated by Credit Card Fraud
1.2 Fraud Analytics and Predictive Analytics
1.3 Predictive Analytics for Credit Card Fraud
1.4 Pre-processing Techniques for Class Imbalance
1.5 Research Motivation and Problem Statement
1.6 Contribution
1.7 Software Implementation for Experimentation
1.8 Layout of Thesis
Chapter 2 Literature Review
2.1 Machine Learning
2.1.1 Unsupervised Learning
2.1.2 Supervised Learning
2.1.2.1 Supervised Learning for Credit Card Fraud Detection
2.1.3 Classification Techniques for Credit Card Fraud
2.1.3.1 Decision Tree
2.1.3.2 Support Vector Machine (SVM)
2.1.3.3 IBK
2.1.3.4 Voted Perceptron
2.1.3.5 Linear Logistic
2.1.3.6 Naïve Bayes
2.1.3.7 Bayesian Network
2.2 Single & Multi-algorithm Classification Techniques used for CCFD
2.3 General Framework of Credit Card Fraud Detection
2.4 Techniques for Handling Class Imbalanced Datasets
2.4.1 Algorithm Level Techniques
2.4.2 Data Level Techniques
2.4.2.1 Under-sampling Techniques
2.4.2.2 Over-sampling Techniques
2.4.2.3 Ensemble Techniques
2.4.2.4 Cost Based Techniques
2.5 Related Work
2.5.1 Literature Survey for Resampling Techniques and Limitations
2.5.2 Literature Survey for Ranking Classification Algorithms using MCDM
Chapter 3 A Novel Resampling Approach for Credit Card Fraud
3.1 Motivation for the Novel Resampling Approach
3.2 Locally Centered Mahalanobis Distance
3.3 Algorithm for Noisy and Borderline Samples
3.3.1 Algorithm for Noisy and Borderline Samples
3.4 Novel Resampling Approach
3.4.1 Novel Under-sampling Approach
3.4.2 Over-sampling Approach
3.4.2.1 Over-sampling Algorithm
3.5 Experimentation
3.5.1 Credit Card Data Sets
3.5.1.1 Australian Credit Approval (ACA)
3.5.1.2 German Credit Data (GCD)
3.5.1.3 Give Me Some Credit (GMSC)
3.5.1.4 PAKDD 2010
3.5.1.5 Indonesian Credit Card Dataset (ICCD)
3.5.2 Dataset Preparation for Supervised Classification
3.5.2.1 Training and Cross-validation Sets
3.5.2.2 Testing Set
3.5.3 Evaluation Criteria for Credit Card Datasets
3.5.3.1 Performance Measures
3.5.4 Experimental Procedure
3.6 Results and Discussion
3.6.1 Under-sampling Results
3.6.2 Over-sampling Results
Chapter 4 Impact of Class Imbalance in Ranking Classifiers
4.1 A Comparative Study of Decision Tree Algorithms for Credit Card Fraud
4.1.1 Experimental Design
4.1.2 Resampling the Datasets
4.1.3 Feature Selection and Classification
4.1.4 Parameter Tuning of Classifiers
4.1.5 Results & Discussion
4.2 Ranking Classifiers Using MCDM for Imbalanced CCFD
4.2.1 Proposed Scheme
4.2.1.1 Pre-Processing Phase
4.2.1.2 Data Mining Phase
4.2.1.3 Ranking Phase
4.2.2 Experimental Design
4.2.3 Results and Discussion
4.2.3.1 MCDM Phase
4.3 Comparison of Different Ranking Approaches for Classifiers
Chapter 5 Conclusion
5.1 Contributions and Conclusions
5.2 Future Work
Acknowledgement
References
Research Results Obtained During the Study for Doctoral Degree
Article ID: 3904750
Article link: http://www.sikaile.net/shoufeilunwen/jjglbs/3904750.html