Research on a Lane-Keeping Method Based on Safe Reinforcement Learning and Its Validation in SUMO
Published: 2021-11-03 12:00
Autonomous driving will reshape everyday transportation in the near future, and a great deal of work has gone into decision-making and motion-control algorithms for autonomous vehicles. Reinforcement Learning (RL) is currently the dominant approach in this area. However, when RL is applied to autonomous driving, the actions it takes during exploration can create safety hazards, and the algorithm may also converge too slowly. For RL to move out of the laboratory and into real on-vehicle learning, its safety problem urgently needs to be solved. This thesis proposes a Safe Reinforcement Learning algorithm for autonomous driving that ensures safety during learning by adding constraints: Constrained Policy Optimization (CPO), whose key idea is to introduce conditional constraints into the cost function. Built on the Actor-Critic framework, CPO keeps each policy update safe by imposing hard constraints that bound the size of the update. The main work of the thesis comprises the theoretical proof and derivation of the CPO algorithm, its practical application, and the analysis of simulation results. The proposed algorithm is compared on a variety of maps, and its safety and stability on each map are evaluated and analyzed. The thesis also compares the CPO algorithm with conventional reinforcement…
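For readers unfamiliar with CPO, the per-iteration update it solves can be written in the standard form of Achiam et al. (2017), on which the algorithm is based; the notation below follows that paper rather than the thesis itself:

$$
\begin{aligned}
\pi_{k+1} = \underset{\pi \in \Pi_\theta}{\arg\max}\;\; & \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[ A^{\pi_k}(s,a) \big] \\
\text{s.t.}\;\; & J_C(\pi_k) + \tfrac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[ A_C^{\pi_k}(s,a) \big] \le d, \\
& \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta,
\end{aligned}
$$

where $A^{\pi_k}$ and $A_C^{\pi_k}$ are the reward and cost advantage functions, $J_C$ is the expected discounted cost, $d$ is the cost limit, and $\delta$ is the trust-region size. The KL trust region is the "hard constraint on the update size" referred to above: no candidate policy farther than $\delta$ from the current one is considered, which is what bounds how unsafe any single update can be during learning.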
Source: Tsinghua University, Beijing (Project 211 / Project 985 institution, directly under the Ministry of Education)
Pages: 70
Degree: Master's
Contents:
Abstract (Chinese)
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1 GENERAL INTRODUCTION AND BACKGROUND
1.2 PROBLEM STATEMENT
1.3 OBJECTIVE
1.4 THESIS OUTLINE
CHAPTER 2. LITERATURE REVIEW
2.1 THE RESEARCH STATUS OF REINFORCEMENT LEARNING
2.2 REINFORCEMENT LEARNING THEORY AND STRUCTURE
2.2.1 MARKOV DECISION PROCESS AND STRUCTURE
2.2.2 BELLMAN EQUATION
2.3 REINFORCEMENT LEARNING CLASSIFICATIONS
2.4 REINFORCEMENT LEARNING ALGORITHMS
2.4.1 DYNAMIC PROGRAMMING
2.4.2 Q-LEARNING
2.4.3 SARSA ALGORITHM
2.4.4 POLICY GRADIENT METHODS
2.4.5 ACTOR-CRITIC
2.5 THE RESEARCH STATUS OF SAFE REINFORCEMENT LEARNING
2.5.1 BASED ON THE MODIFICATION IN OPTIMIZATION CRITERIA
2.5.2 BASED ON THE MODIFICATION IN EXPLORATION PROCESS
CHAPTER 3. CONSTRAINED POLICY OPTIMIZATION
3.1 CPO ALGORITHM
3.1.1 CONSTRAINED MARKOV DECISION PROCESS (CMDP)
3.1.2 TRUST REGION POLICY OPTIMIZATION (TRPO) ALGORITHM
3.1.3 TRUST REGION APPLIED TO CONSTRAINED POLICY OPTIMIZATION
3.2 LANE KEEPING BASED ON CONSTRAINED POLICY OPTIMIZATION ALGORITHM
3.2.1 MARKOV MODELING OF LANE KEEPING PROBLEMS
3.2.2 APPROXIMATE SOLUTION OF CPO ALGORITHM
CHAPTER 4. EXPERIMENT DESIGN & DATA ANALYSIS
4.1 EXPERIMENT DESIGN
4.2 MAP DESIGN AND ANALYSIS
4.2.1 STRAIGHT ROAD
4.2.2 S-SHAPED CURVED ROAD
4.2.3 LOOP
4.2.4 ROUNDABOUT
4.3 RL VS CPO ENHANCED SAFE-RL
CHAPTER 5. SIMULATION ANALYSIS
5.1 SUMO (SIMULATION OF URBAN MOBILITY)
5.2 INTRODUCTION TO TRACI
5.3 ANALYSIS OF LANE KEEPING PERFORMANCE
5.4 CHAPTER SUMMARY
CHAPTER 6. CONCLUSION AND FUTURE WORK
6.1 SUMMARY AND CONTRIBUTIONS
6.2 FUTURE WORK
REFERENCES
ACKNOWLEDGEMENT
RESUME
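As Chapter 5 of the outline indicates, the experiments drive SUMO through its TraCI interface. Purely for orientation, here is a minimal sketch of the kind of TraCI control loop such a lane-keeping experiment could use; it is not the thesis's code, and the vehicle id "ego", the config filename lane_keeping.sumocfg, and the proportional policy stub are illustrative assumptions:

import traci

def policy(obs):
    # Placeholder for a trained (e.g. CPO) policy: maps an observation to a
    # lateral correction in metres. A simple proportional stub stands in here.
    lateral_offset, _speed = obs
    return -0.5 * lateral_offset

# Assumed config file; --lateral-resolution must be set in it so the sublane
# model (and hence changeSublane) is active.
traci.start(["sumo", "-c", "lane_keeping.sumocfg"])
try:
    while traci.simulation.getMinExpectedNumber() > 0:
        traci.simulationStep()
        if "ego" in traci.vehicle.getIDList():
            obs = (
                traci.vehicle.getLateralLanePosition("ego"),  # offset from lane centre (m)
                traci.vehicle.getSpeed("ego"),                # speed (m/s)
            )
            traci.vehicle.changeSublane("ego", policy(obs))   # apply lateral correction
finally:
    traci.close()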
Article ID: 3473643
Link: http://www.sikaile.net/kejilunwen/qiche/3473643.html