Credit Evaluation of Sampled A-Share Listed Companies Based on Manifold Learning
Published: 2018-01-04 20:43
Source: University of Electronic Science and Technology of China, master's thesis, 2014. Document type: degree thesis
Keywords: manifold learning; isometric mapping (ISOMAP); support vector machine; cluster analysis; credit evaluation
[Abstract]: With the rapid development of science and technology and the spread of economic globalization, effective credit risk assessment has become an important problem in finance. Accurate risk assessment matters especially in bank lending: even a small improvement in predicting the probability of default can bring a bank additional profit. However, bank staff find it difficult to analyze and exploit the large customer databases that banks maintain. Data mining techniques offer strong support for finding patterns in a bank's existing business data and for building bank decision support systems. Because the data are large and high-dimensional, the raw data need special processing before input so that the data mining algorithms perform well. Manifold learning, a machine learning approach to dimensionality reduction, meets this need. This thesis therefore proposes a hybrid model based on manifold learning and data mining techniques for credit evaluation.

The proposed model works as follows: (1) Z-score normalization is applied as preprocessing to the historical nonlinear financial data of a sample of 250 A-share listed companies. (2) Isometric mapping (ISOMAP), a typical manifold learning algorithm, reduces the dimensionality of the financial data, that is, performs feature extraction. (3) The extracted features are fed into an SVM to classify and predict corporate credit risk. To demonstrate the effectiveness of the model, the performance of "PCA+SVM", "LLE+SVM", and "SVM" is compared with that of the proposed hybrid model "ISOMAP+SVM". (4) Clustering is performed on top of the classification to group the listed companies and assign credit grades, helping banks set corresponding loan strategies.

Combining qualitative and quantitative analysis, and processing the financial data with Matlab R2012a, the thesis reaches the following main conclusions: (1) Results obtained with Z-score normalization are clearly better than those obtained without it; whether the data are normalized strongly affects all subsequent processing. (2) Compared with "PCA+SVM" and "LLE+SVM", the proposed ISOMAP-based credit evaluation model has the best classification accuracy and the lowest Type II error rate, and combining it with cluster analysis further improves classification accuracy. The model thus improves prediction accuracy and the accuracy of credit classification for listed companies. (3) After dimensionality reduction, and building on the binary classification, the k-means algorithm grouped the 250 listed companies into 7 clusters, which helps in evaluating the credit risk of listed companies, assigning credit grades, and setting credit policy. (4) Reducing the dimensionality of nonlinear data with either manifold learning or PCA improves the accuracy of prediction and clustering and lowers the cost of credit classification, but ISOMAP and LLE perform slightly better than PCA on nonlinear data.
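Step (1) and conclusion (1) of the abstract concern Z-score normalization. The sketch below is not the thesis's Matlab code; it is a minimal standard-library Python illustration of why scaling matters before a distance-based method such as ISOMAP, whose neighborhood graph is built from pairwise distances. The function name `zscore_columns` and the four firms with their two indicators are hypothetical values chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit population std. dev."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column (std 0) carries no information; map it to 0.0.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data, NOT from the thesis: (total assets, current ratio).
firms = [
    [1200.0, 0.8],
    [1250.0, 2.4],
    [5400.0, 0.9],
    [5300.0, 2.5],
]

scaled = zscore_columns(firms)

# Compare how far firm 0 is from firm 2 relative to firm 1,
# before and after normalization.
raw_ratio = math.dist(firms[0], firms[2]) / math.dist(firms[0], firms[1])
scaled_ratio = math.dist(scaled[0], scaled[2]) / math.dist(scaled[0], scaled[1])

print(f"raw distance ratio:    {raw_ratio:.2f}")
print(f"scaled distance ratio: {scaled_ratio:.2f}")
```

On the raw values the large-magnitude column dominates every Euclidean distance, so the difference in current ratio between firms 0 and 1 is almost invisible. After normalization both indicators contribute on a comparable scale, which is one plausible reason a neighbor-graph method would behave differently with and without this step.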
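[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification numbers]: F832.51; F275; F832.4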
Article ID: 1380050
Article link: http://www.sikaile.net/jingjilunwen/jinrongzhengquanlunwen/1380050.html