Research and Application of a Reinforcement-Learning-Based Storage Location Optimization Algorithm in a Material Pull System
Published: 2018-11-16 06:51
[Abstract]: The rise of mechanical automation and assembly-line technology has driven the rapid development of modern manufacturing. To gain an advantage in a fiercely competitive environment, manufacturers actively seek ways to control production costs and improve production efficiency. Storage location allocation in automated storage and retrieval warehouses is an important part of manufacturing production logistics and has a significant impact on the operating efficiency and energy consumption of the production line; a good allocation strategy can effectively reduce the time and energy consumed in production logistics and improve the operating efficiency of the production system. Based on the automated high-bay warehouse of a domestic automobile manufacturing plant, this thesis identifies optimization requirements arising from problems in current warehouse management, including low storage and retrieval efficiency, high operating energy consumption, and a low degree of intelligence in location assignment. Because the storage location allocation model involves large-scale state information, discrete inputs and outputs, and global optimization, and after analyzing the characteristics of candidate solution methods, a reinforcement learning algorithm based on environment abstraction and temporal abstraction is adopted to solve the problem. To address the large state space, the environment information is de-redundantized and abstracted into layers, so that concrete storage location details are replaced by detail-independent classification and evaluation features; this reduces the scale of the input and improves both computation speed and convergence speed. Because the problem is a global optimization problem, the decision process is temporally abstracted following the idea of the semi-Markov decision process (SMDP), and real-time evaluation is deferred to periodic evaluation: the model's decision direction is adjusted according to the aggregated statistics of each period, which prevents the pursuit of immediate allocation gains from degrading overall allocation performance. To cope with the shortage of training samples and limited storage space, a BP neural network is used to approximate the value function and is trained on the evaluation results of both the best historical allocation period and the current allocation period, avoiding the huge storage requirements, long training time, and demanding sample requirements of a tabular value function. Finally, a storage location allocation system is built on the basis of this research; its main design and implementation processes are described, and its optimization effect on the production logistics of the automobile plant is demonstrated.
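The abstract describes three interlocking ideas: abstracting concrete storage locations into coarse classification features, deferring the reward to an SMDP-style periodic evaluation of a whole allocation cycle, and approximating the value function with a small BP (backpropagation) neural network trained against the better of the current and best historical cycle scores. The following Python sketch is a minimal illustration of how these pieces could fit together; it is not the thesis implementation, and every name (ValueNet, abstract_state, periodic_reward) as well as the toy cost model and hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch only: toy zone-level model of the allocation ideas in the abstract.
import random
import numpy as np


class ValueNet:
    """Tiny one-hidden-layer BP network approximating the value of an
    (abstracted state, candidate zone) pair."""

    def __init__(self, n_in, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.lr = lr

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = np.tanh(self.x @ self.w1)   # hidden layer activations
        return float(self.h @ self.w2)       # scalar value estimate

    def train(self, x, target):
        err = self.forward(x) - target                 # squared-error gradient
        grad_h = err * self.w2 * (1.0 - self.h ** 2)   # backprop through tanh
        self.w2 -= self.lr * err * self.h
        self.w1 -= self.lr * np.outer(self.x, grad_h)


def abstract_state(rack_occupancy, turnover_class):
    """Environment abstraction: collapse concrete bin coordinates into coarse,
    detail-independent features (one occupancy ratio per zone plus the
    incoming item's turnover class)."""
    zone_ratios = [sum(zone) / len(zone) for zone in rack_occupancy]
    return np.array(zone_ratios + [float(turnover_class)])


def periodic_reward(travel_times, energy_used):
    """Temporal abstraction: score a whole allocation cycle at once instead of
    rewarding every individual put-away decision."""
    return -(np.mean(travel_times) + 0.5 * np.mean(energy_used))


N_ZONES = 3
net = ValueNet(n_in=N_ZONES + 2)   # zone ratios + turnover class + candidate zone
best_score = -np.inf

for cycle in range(30):            # each cycle is one evaluation period
    rack = [[random.randint(0, 1) for _ in range(10)] for _ in range(N_ZONES)]
    travel, energy, visited = [], [], []
    for _ in range(20):            # put-away decisions within the cycle
        state = abstract_state(rack, turnover_class=random.randint(0, 2))
        if random.random() < 0.2:  # epsilon-greedy exploration
            zone = random.randrange(N_ZONES)
        else:
            zone = max(range(N_ZONES),
                       key=lambda z: net.forward(np.append(state, z)))
        visited.append(np.append(state, zone))
        travel.append(1.0 + zone + random.random())  # toy cost: farther zone means
        energy.append(0.5 * (zone + 1))              # longer travel and more energy

    score = periodic_reward(travel, energy)
    best_score = max(best_score, score)
    target = 0.5 * (score + best_score)  # blend current and best historical cycle
    for features in visited:
        net.train(features, target)
    print(f"cycle {cycle:02d}  score {score:6.2f}  best {best_score:6.2f}")
```

The network is kept deliberately small here; in the setting the abstract describes, the inputs would be the layered classification features derived from the real warehouse rather than this toy occupancy/turnover encoding, and the periodic score would come from measured travel time and energy consumption per allocation cycle.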
【Degree-granting institution】: Southwest Jiaotong University
【Degree level】: Master's
【Year of conferral】: 2015
【CLC number】: U468.8
Article No.: 2334777
Article link: http://www.sikaile.net/guanlilunwen/wuliuguanlilunwen/2334777.html