
Research on Virtual Sample Generation Technology and Its Modeling Applications

Published: 2018-04-23 21:46

Topics: small samples + virtual sample generation; source: Beijing University of Chemical Technology, 2017 doctoral dissertation

[Abstract]: In the era of big data, many fields are rich in data but poor in knowledge, so knowledge has to be discovered through data mining, and data-driven modeling has become a research hotspot. However, an insufficient number of samples, unrepresentative samples, or unevenly distributed samples seriously limit the quality of data-driven models. Against the big-data background, one important problem that cannot be ignored is the "big data, small sample" problem. It arises mainly because data are costly to acquire, are repetitive, or correspond to events with a low probability of occurrence, so the useful data available are limited. How to model effectively from small samples is an important research direction in computational intelligence, with significant theoretical and practical value. The academic community currently addresses the small-sample problem in two main ways: methods based on grey theory and machine learning, and methods that generate virtual samples. Generating new valid data from small-sample data is an effective way to supplement data, and virtual sample generation is an important research direction for the small-sample problem. Building on an extensive literature review, this dissertation addresses the small-sample problem for the labeled and unlabeled data used by supervised and unsupervised machine learning algorithms. It studies the generation, optimization, and application of virtual samples derived from small samples in order to produce sufficient valid data sets, then studies neural network structures and algorithms to propose new data-driven intelligent modeling methods, and finally applies the work to risk analysis of engineering construction costs. The main contributions are as follows.

(1) A new virtual sample generation method based on mega-trend diffusion. Mega-trend diffusion (MTD) is an effective distribution-based virtual sample generation technique, but existing versions use the same data distribution to generate virtual samples in both the original sample region and the diffusion region, and they add a virtual input attribute, which doubles the input space. This dissertation instead combines a non-uniform distribution in the known small-sample region with a uniform distribution in the extended region, and uses this multi-distribution diffusion to estimate the acceptable range of each small-sample attribute. To avoid adding input attributes, the membership function value that represents the possibility of a sample point occurring is no longer computed and used as a virtual input attribute of the model. The result is a more effective virtual sample generation mechanism, the multi-distribution mega-trend diffusion technique (MD-MTD). Its effectiveness is verified on standard functions and industrial data sets.

(2) A new virtual sample generation method based on optimization. To address the optimization of virtual samples, the dissertation builds on MD-MTD and proposes an information diffusion method based on a triangular membership function (TMIE), which yields a new way of determining the lower and upper bounds of the extended regions. Virtual samples are generated with the improved MD-MTD, and particle swarm optimization (PSO) is then applied to the virtual samples of the input attributes to obtain more suitable virtual samples. This gives the PSO-MD-MTD method, whose effectiveness is verified on standard functions and industrial data sets.

(3) A new virtual sample generation method based on interpolation. Distribution-based virtual sample generation techniques are themselves models built on small samples. This motivates building a reasonable and effective neural network model from the small sample and then generating virtual samples according to the linear and nonlinear structural characteristics of that model. To this end, the dissertation proposes a virtual sample generation method based on interpolation in the hidden layer of an extreme learning machine (IVSG): mid-value interpolation is applied to the hidden-layer outputs of the extreme learning machine to produce virtual samples, and the virtual data in the output space of the output layer and in the input space of the input layer are then derived from the hidden-layer virtual samples by working forward and backward through the network. The method is verified on standard functions and industrial data sets, and IVSG, PSO-MD-MTD, and MD-MTD are compared to analyze where each method is applicable.

(4) A new functional link neural network modeling method based on partial least squares. Once the validity of the data samples has been addressed, using data-driven modeling to mine the knowledge hidden in the data becomes an important task. To deal with collinear data in functional link neural networks and to extract knowledge effectively from limited data, the dissertation draws on the extreme learning machine model and replaces the error back-propagation algorithm of the original functional link neural network with a partial least squares learning algorithm for computing the model parameters. The resulting model is a functional link neural network based on partial least squares regression (PLSR-FLNN). Its effectiveness is verified on two industrial data sets, and a comparison with four other modeling methods shows its advantages.

(5) Risk analysis and assessment of engineering construction costs with samples expanded by the Monte Carlo method. Having addressed the data and modeling problems in supervised learning, the dissertation turns to data problems in unsupervised learning. It focuses on the small-sample problem under uncertainty when Monte Carlo methods are used in engineering construction cost risk analysis, and proposes a sample supplementation method based on Monte Carlo simulation. On this basis, the probability distribution and probability density function of each cost item are estimated from the data samples. Monte Carlo simulation is combined with market-factor drivers and Likert-scale analysis to analyze and evaluate the influencing factors comprehensively. This yields a practical method for engineering construction cost risk analysis, whose effectiveness is verified on a real engineering case.
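The abstract does not give formulas for item (1), so the following is only a rough sketch of the idea, not the dissertation's MD-MTD. It assumes the commonly cited mega-trend diffusion formulation for the attribute bounds (midpoint center, skewness weights, and a 1e-20 membership cutoff). Inside the observed range it draws from a triangular distribution peaked at the midpoint, which is an illustrative choice of "non-uniform" distribution, and it draws uniformly in the extended regions. The names `mtd_bounds`, `sample_attribute`, and `ext_fraction` are hypothetical.

```python
import numpy as np

def mtd_bounds(x):
    """Estimate an acceptable range [lower, upper] for one attribute.

    Uses the commonly cited mega-trend diffusion formulas (an assumption;
    the abstract itself states no formulas).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if lo == hi:
        raise ValueError("attribute must take at least two distinct values")
    center = (lo + hi) / 2.0
    n_l = np.sum(x < center)          # observations below the center
    n_u = np.sum(x > center)          # observations above the center
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    var = x.var(ddof=1)               # sample variance
    k = -2.0 * np.log(1e-20)          # diffusion constant from the cutoff
    lower = center - skew_l * np.sqrt(k * var / n_l)
    upper = center + skew_u * np.sqrt(k * var / n_u)
    # The estimated range must at least cover the observed data.
    return min(lower, lo), max(upper, hi)

def sample_attribute(x, n_virtual, ext_fraction=0.2, seed=0):
    """Draw virtual values: non-uniform inside the data range, uniform outside.

    ext_fraction (hypothetical) is the share of virtual values placed in the
    two extended regions; the triangular shape is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    lower, upper = mtd_bounds(x)
    lo, hi = x.min(), x.max()
    n_ext = int(round(ext_fraction * n_virtual))
    n_low = n_ext // 2
    n_high = n_ext - n_low
    n_core = n_virtual - n_ext
    core = rng.triangular(lo, (lo + hi) / 2.0, hi, size=n_core)
    low_ext = rng.uniform(lower, lo, size=n_low)
    high_ext = rng.uniform(hi, upper, size=n_high)
    return np.concatenate([low_ext, core, high_ext])

if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 10.0]   # made-up small sample
    print(mtd_bounds(observed))
    print(sample_attribute(observed, 10))
```

This only produces virtual values for a single input attribute. Assigning outputs to the virtual inputs requires a model fitted on the original small sample, which the sketch leaves out.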

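The Monte Carlo step in item (5) can be illustrated in the same hedged way. The sketch below assumes each cost item is roughly normally distributed, with mean and standard deviation estimated from a handful of observations. The abstract does not say which distributions the dissertation fits, and it also uses market factors and Likert-scale analysis, which are omitted here. All cost figures, item names, and function names are hypothetical.

```python
import numpy as np

def simulate_total_cost(samples_by_item, n_draws=10000, seed=0):
    """Simulate total project cost from small per-item samples.

    Each item is modeled as a normal distribution fitted to its observations
    (an assumption made for illustration only).
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n_draws)
    for observations in samples_by_item.values():
        obs = np.asarray(observations, dtype=float)
        mu, sigma = obs.mean(), obs.std(ddof=1)
        draws = rng.normal(mu, sigma, size=n_draws)
        total += np.clip(draws, 0.0, None)   # a cost cannot be negative
    return total

def risk_summary(total, budget):
    """Summarize the simulated total cost against a budget."""
    return {
        "mean": float(total.mean()),
        "p80": float(np.percentile(total, 80)),
        "prob_over_budget": float(np.mean(total > budget)),
    }

if __name__ == "__main__":
    # Made-up observations for three cost items, in arbitrary units.
    observed = {
        "equipment": [95, 100, 105, 98, 102],
        "labor": [48, 52, 50, 49, 51],
        "materials": [28, 30, 32, 31, 29],
    }
    total = simulate_total_cost(observed)
    print(risk_summary(total, budget=185.0))
```

The simulated totals act as the expanded sample: percentiles and the probability of exceeding a budget are read directly from them, with no need for a closed-form distribution of the sum.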
[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Doctoral
[Year degree conferred]: 2017
[Classification number]: TP18;TP311.13



Article ID: 1793759


Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/1793759.html
