Label Distribution Learning Using the k-means Algorithm
Published: 2018-07-28 16:36
[Abstract]: Label distribution learning (LDL) is a machine learning paradigm proposed in recent years that handles certain kinds of label ambiguity well. Existing LDL algorithms all build parametric models from conditional probabilities, but they do not make full use of the relationship between features and labels. Starting from the observation that samples with similar features should also have similar label distributions, this paper clusters the training samples with k-means, a prototype-based clustering algorithm, and proposes label distribution learning based on the k-means algorithm (LDLKM). First, k-means is used to obtain the mean vector of each cluster, and then the mean vector of the label distributions corresponding to each cluster is computed. Finally, the distances between the test samples and the cluster mean vectors of the training set are used as weights to predict the label distributions of the test set. Experiments on 6 public datasets, with comparisons against 3 existing LDL algorithms on 5 evaluation measures, show that the proposed LDLKM algorithm is effective.
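The three steps named in the abstract (cluster the training features, average the label distributions per cluster, weight those averages by distance to a test sample) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how distances are turned into weights, so the softmax over negative Euclidean distances used here is an assumption, and the function names (`kmeans`, `fit_ldlkm`, `predict_ldlkm`) and the toy data are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest(X, centroids):
    """Index of the closest centroid (Euclidean) for every sample in X."""
    return [min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in X]


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per sample)."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    for _ in range(iters):
        assign = nearest(X, centroids)
        updated = []
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append(mean_vector(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, nearest(X, centroids)


def fit_ldlkm(X, D, k, seed=0):
    """Steps 1 and 2: cluster the features, then average the label
    distributions of the samples that fall into each cluster."""
    centroids, assign = kmeans(X, k, seed=seed)
    label_means = []
    for j in range(k):
        members = [d for d, a in zip(D, assign) if a == j]
        # ASSUMPTION: an empty cluster falls back to the global mean distribution.
        label_means.append(mean_vector(members) if members else mean_vector(D))
    return centroids, label_means


def predict_ldlkm(x, centroids, label_means):
    """Step 3: distance-weighted combination of the per-cluster distributions.

    ASSUMPTION: weights are a softmax over negative distances; the abstract
    only states that distances are used as weights.
    """
    dists = [math.dist(x, c) for c in centroids]
    closest = min(dists)
    raw = [math.exp(-(d - closest)) for d in dists]  # shifted for stability
    total = sum(raw)
    weights = [r / total for r in raw]
    n_labels = len(label_means[0])
    return [sum(w * m[l] for w, m in zip(weights, label_means))
            for l in range(n_labels)]


# Toy data: two well-separated groups with opposite label distributions.
X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
D_train = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
           [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]

centroids, label_means = fit_ldlkm(X_train, D_train, k=2)
pred_low = predict_ldlkm([0.2, 0.2], centroids, label_means)
print(pred_low)
```

Because the weights are non-negative and sum to 1, the prediction is a convex combination of valid label distributions and is therefore itself a valid distribution. Other weightings, such as normalized inverse distance, would fit the abstract's description equally well.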
[Author affiliation]: Key Laboratory of Granular Computing, Minnan Normal University
[Funding]: National Natural Science Foundation of China (61379049, 61379089)
[Classification number]: TP181
Article ID: 2150903
Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2150903.html