當(dāng)前位置：主頁(yè) > 科技論文 > 自動(dòng)化論文 >

代價(jià)敏感屬性約簡(jiǎn)的歸并算法研究

發(fā)布時(shí)間：2018-09-05 17:04

【摘要】：數(shù)據(jù)挖掘又稱數(shù)據(jù)庫(kù)中的知識(shí)發(fā)現(xiàn)(Knowledge Discover in Database, KDD),是目前人工智能和數(shù)據(jù)庫(kù)領(lǐng)域研究的熱點(diǎn)問(wèn)題。在現(xiàn)實(shí)世界中,數(shù)據(jù)集中存儲(chǔ)的屬性有幾十、幾百、甚至上千種。這些屬性中有很多是冗余的,它們會(huì)干擾數(shù)據(jù)挖掘的過(guò)程,也很大程度上會(huì)影響算法的效率。因此,人們提出了屬性約簡(jiǎn)這一預(yù)處理技術(shù)。另一方面,現(xiàn)實(shí)世界中的行為或者事物都有各種代價(jià),如測(cè)試代價(jià)、誤分類(lèi)代價(jià)、延遲代價(jià)等,涉及金錢(qián)、時(shí)間、人工等方面的開(kāi)銷(xiāo)。代價(jià)敏感學(xué)習(xí)致力于涉及各類(lèi)代價(jià)的挖掘問(wèn)題。當(dāng)前被研究的代價(jià)敏感屬性約簡(jiǎn)問(wèn)題包括：最小測(cè)試代價(jià)屬性約簡(jiǎn)問(wèn)題、簡(jiǎn)單公共測(cè)試代價(jià)屬性約簡(jiǎn)問(wèn)題、最小測(cè)試時(shí)間代價(jià)問(wèn)題等。人們將不同算法框架應(yīng)用于這些具體問(wèn)題。啟發(fā)式算法的速度很快,但是由于它們常常會(huì)收斂于局部最優(yōu)解,因此正確率不高�；厮菟惴m然能夠保證找到最優(yōu)解,但是運(yùn)行時(shí)間往往不能被接受。仿生算法也常常能找到最優(yōu)解,不過(guò)其耗費(fèi)的時(shí)間代價(jià)過(guò)大。最近還有學(xué)者提出半貪心算法,能夠在一定時(shí)間內(nèi)得到較好的結(jié)果。本文將分治算法與回溯算法相結(jié)合,提出一種歸并算法,以改善回溯算法的不足。本文的歸并算法包含三個(gè)關(guān)鍵技術(shù)：分組與合并、回溯算法、以及競(jìng)爭(zhēng)機(jī)制。該算法先將屬性隨機(jī)分組,得到一些屬性子集,組的大小g對(duì)算法的性能有很大的影響。在極端情況下,組的大小與屬性數(shù)目相同的情況下,歸并算法將退回為回溯算法。屬性子集通過(guò)回溯算法得到屬性子集的約簡(jiǎn),然后將每對(duì)相鄰的約簡(jiǎn)合并成一個(gè)新的屬性子集。重復(fù)以上過(guò)程直到只剩一個(gè)屬性子集,這個(gè)屬性子集的約簡(jiǎn)就是原問(wèn)題的約簡(jiǎn)。屬性分組后,全局重要的屬性可能在局部約簡(jiǎn)時(shí)被刪除,導(dǎo)致歸并算法得到的解不是全局最優(yōu)解。因此我們采用競(jìng)爭(zhēng)機(jī)制,運(yùn)行歸并算法p次,得到p個(gè)解,再?gòu)倪@p個(gè)解中選取最優(yōu)解。本文將該算法運(yùn)用于最小測(cè)試代價(jià)屬性約簡(jiǎn)問(wèn)題、簡(jiǎn)單公共測(cè)試代價(jià)屬性約簡(jiǎn)問(wèn)題以及最小測(cè)試時(shí)間代價(jià)問(wèn)題這三個(gè)問(wèn)題。并使用來(lái)自UCI(University of California-Irvine)數(shù)據(jù)庫(kù)中的四個(gè)真實(shí)數(shù)據(jù)集對(duì)提出的歸并算法進(jìn)行實(shí)驗(yàn)。其中每種數(shù)據(jù)集使用了三種不同分布的測(cè)試代價(jià)。通過(guò)實(shí)驗(yàn)我們得知：競(jìng)爭(zhēng)機(jī)制能有效提高結(jié)果的質(zhì)量；對(duì)于不同問(wèn)題,p值大于6之后算法結(jié)果趨于穩(wěn)定；最優(yōu)g值對(duì)于不同情況略有不同,在最小測(cè)試代價(jià)問(wèn)題中為6,簡(jiǎn)單公共測(cè)試代價(jià)、最小測(cè)試時(shí)間代價(jià)問(wèn)題中均為7。與現(xiàn)有的啟發(fā)式算法、蟻群算法和回溯算法相比,歸并算法在保持較高的正確率的情況下,能夠大大縮短運(yùn)行時(shí)間。因此,本文提出的歸并算法是針對(duì)該類(lèi)問(wèn)題的一種有效并且高效的算法。
[Abstract]:Data mining, also known as knowledge discovery (Knowledge Discover in Database, KDD),) in database, is a hot issue in the field of artificial intelligence and database. In the real world, data sets store dozens, hundreds, or even thousands of attributes. Many of these attributes are redundant, which interfere with the process of data mining and greatly affect the efficiency of the algorithm. Therefore, the preprocessing technique of attribute reduction is proposed. On the other hand, behaviors or things in the real world have various costs, such as the cost of testing, the cost of misclassification, the cost of delay, the cost of money, time, labor, and so on. Cost-sensitive learning focuses on mining problems involving various costs. The cost sensitive attribute reduction problem which has been studied at present includes: the minimum test cost attribute reduction problem, the simple common test cost attribute reduction problem, the minimum test time cost problem and so on. Different algorithms are applied to these specific problems. The speed of heuristic algorithms is very fast, but the accuracy is not high because they often converge to local optimal solutions. Although the backtracking algorithm can guarantee to find the optimal solution, the running time is often unacceptable. Bionic algorithms can often find the optimal solution, but the cost of time is too large. Recently, some scholars have proposed a half-greedy algorithm, which can get better results in a certain time. In this paper, a merging algorithm is proposed to improve the deficiency of the backtracking algorithm by combining the partition-and-conquer algorithm with the backtracking algorithm. The merging algorithm consists of three key technologies: grouping and merging, backtracking algorithm, and competition mechanism. In this algorithm, the attributes are grouped randomly and some subsets of attributes are obtained. The size of the group g has a great influence on the performance of the algorithm. In extreme cases, when the group size is the same as the number of attributes, the merging algorithm is returned to the backtracking algorithm. The reduction of attribute subset is obtained by backtracking algorithm, and then each pair of adjacent reduction is merged into a new attribute subset. Repeat the above process until there is only a subset of attributes whose reduction is the reduction of the original problem. After the attributes are grouped, the globally important attributes may be deleted in the local reduction, resulting in the solution obtained by the merging algorithm is not the global optimal solution. So we use the competition mechanism, run the merging algorithm p times, obtain p solutions, and then select the optimal solution from the p solution. In this paper, the algorithm is applied to three problems: the minimum test cost attribute reduction problem, the simple common test cost attribute reduction problem and the minimum test time cost problem. Four real data sets from UCI (University of California-Irvine) database are used to test the proposed merging algorithm. Each data set uses three different distribution of test costs. Through experiments, we know that the competition mechanism can effectively improve the quality of the results, the algorithm results tend to be stable when the value of p is greater than 6 for different problems, and the optimal g value is slightly different for different cases. In the minimum test cost problem is 6, the simple common test cost and the minimum test time cost problem are all 7. 5%. Compared with the existing heuristic algorithm, ant colony algorithm and backtracking algorithm, the merging algorithm can greatly shorten the running time while maintaining a higher accuracy. Therefore, the merging algorithm proposed in this paper is an effective and efficient algorithm for this kind of problems.
【學(xué)位授予單位】：西南石油大學(xué)
【學(xué)位級(jí)別】：碩士
【學(xué)位授予年份】：2016
【分類(lèi)號(hào)】：TP18;TP311.13

【參考文獻(xiàn)】

相關(guān)期刊論文前10條

1 孟蕓;王U，

本文編號(hào)：2224887

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會(huì)員下載

Download by Member

本文鏈接：http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2224887.html

上一篇：基于遙感技術(shù)的湖泊歷史變遷檢測(cè)及展示
下一篇：基于粒子群優(yōu)化的網(wǎng)絡(luò)視頻流特征選擇與識(shí)別

論文發(fā)表

·知網(wǎng)|萬(wàn)方|維普|龍?jiān)磡省級(jí)|國(guó)家級(jí)|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

代價(jià)敏感屬性約簡(jiǎn)的歸并算法研究