

Research on a Lane-Keeping Method Based on Safe Reinforcement Learning and Its Validation in SUMO

Published: 2021-11-03 12:00
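  Autonomous driving will change everyday transportation in the near future, and a great deal of work has gone into decision-making and motion-control algorithms for it. Reinforcement learning (RL) is currently one of the main approaches applied in this area. When RL is applied to autonomous driving, however, the actions taken during exploration can create safety hazards, and the algorithm may converge too slowly. To move RL out of the laboratory and into real autonomous vehicle learning, the safety problem in RL therefore urgently needs to be solved. The thesis proposes a safe reinforcement learning algorithm for autonomous driving that adds constraints to ensure safety during learning. The thesis proposes a Constrained Policy Optimization (CPO) algorithm, whose key idea is to introduce conditional constraints into the cost function. CPO is built on the Actor-Critic framework and ensures safety during policy updates by setting hard constraints that reduce the size of each update. The main work of the thesis covers the theoretical proof and derivation of the CPO algorithm, its practical application, and the analysis of simulation results. The thesis compares the proposed algorithm on several maps, evaluating and analyzing its safety and stability on each. The thesis also compares the CPO algorithm with conventional reinforcement...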

[Source]: Tsinghua University, Beijing (Project 211 university, Project 985 university, directly under the Ministry of Education)

[Pages]: 70

[Degree level]: Master's

[Table of contents]:
ABSTRACT (IN CHINESE)
ABSTRACT
CHAPTER 1. INTRODUCTION
    1.1 GENERAL INTRODUCTION AND BACKGROUND
    1.2 PROBLEM STATEMENT
    1.3 OBJECTIVE
    1.4 THESIS OUTLINE
CHAPTER 2. LITERATURE REVIEW
    2.1 THE RESEARCH STATUS OF REINFORCEMENT LEARNING
    2.2 REINFORCEMENT LEARNING THEORY AND STRUCTURE
        2.2.1 MARKOV DECISION PROCESS AND STRUCTURE
        2.2.2 BELLMAN EQUATION
    2.3 REINFORCEMENT LEARNING CLASSIFICATIONS
    2.4 REINFORCEMENT LEARNING ALGORITHMS
        2.4.1 DYNAMIC PROGRAMMING
        2.4.2 Q-LEARNING
        2.4.3 SARSA ALGORITHM
        2.4.4 POLICY GRADIENT METHODS
        2.4.5 ACTOR-CRITIC
    2.5 THE RESEARCH STATUS OF SAFE REINFORCEMENT LEARNING
        2.5.1 BASED ON THE MODIFICATION IN OPTIMIZATION CRITERIA
        2.5.2 BASED ON THE MODIFICATION IN EXPLORATION PROCESS
CHAPTER 3. CONSTRAINED POLICY OPTIMIZATION
    3.1 CPO ALGORITHM
        3.1.1 CONSTRAINED MARKOV DECISION PROCESS (CMDP)
        3.1.2 TRUST REGION POLICY OPTIMIZATION (TRPO) ALGORITHM
        3.1.3 TRUST REGION APPLIED TO CONSTRAINED POLICY OPTIMIZATION
    3.2 LANE KEEPING BASED ON CONSTRAINED POLICY OPTIMIZATION ALGORITHM
        3.2.1 MARKOV MODELING OF LANE KEEPING PROBLEMS
        3.2.2 APPROXIMATE SOLUTION OF CPO ALGORITHM
CHAPTER 4. EXPERIMENT DESIGN & DATA ANALYSIS
    4.1 EXPERIMENT DESIGN
    4.2 MAP DESIGN AND ANALYSIS
        4.2.1 STRAIGHT ROAD
        4.2.2 S-SHAPED CURVED ROAD
        4.2.3 LOOP
        4.2.4 ROUNDABOUT
    4.3 RL VS CPO ENHANCED SAFE-RL
CHAPTER 5. SIMULATION ANALYSIS
    5.1 SUMO (SIMULATION OF URBAN MOBILITY)
    5.2 INTRODUCTION TO TRACI
    5.3 ANALYSIS OF LANE KEEPING PERFORMANCE
    5.4 CHAPTER SUMMARY
CHAPTER 6. CONCLUSION AND FUTURE WORK
    6.1 SUMMARY AND CONTRIBUTIONS
    6.2 FUTURE WORK
REFERENCES
ACKNOWLEDGEMENT
RESUME



Article ID: 3473643


Article link: http://www.sikaile.net/kejilunwen/qiche/3473643.html

