Reinforcement learning with constraints on distribution functions (확률 분포 함수에 대한 제한조건이 있는 강화학습)

Abstract

Reinforcement learning (RL), whose objective is to find an optimal policy for a given environment, has been a major topic in deep learning. RL has been applied successfully to games, simulations, and real robots, but these tasks are relatively easy compared with real-world autonomous control systems. For successful application to such systems, developing RL algorithms that can learn an optimal policy in more challenging settings, such as sparse-reward and safety-critical environments, is of great importance from both theoretical and practical perspectives. In this dissertation, we consider two constraints on a policy: 1) a constraint that yields faster and more stable policy improvement even in sparse-reward environments, and 2) a constraint that guarantees the safety of the policy. Using these constraints, we propose RL learning frameworks based on theory that we prove under mild assumptions.

In the first half, a new population-guided parallel learning scheme is proposed to enhance RL performance. In the proposed scheme, multiple identical learners, each with its own value function and policy, share a common experience replay buffer and search for a good policy collaboratively under the guidance of the best policy's information. The key point is that the best policy's information is fused in a soft manner: a constrained optimization problem is constructed with a constraint on the distance between each non-best policy and the previous best policy. The Lagrangian of this constrained problem serves as an augmented policy loss function, which guides the non-best policies so that the multiple learners jointly search an enlarged region of the policy space. Monotone improvement of the expected cumulative return under the proposed scheme is proved theoretically. A working algorithm is constructed by applying the scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most current state-of-the-art RL algorithms, with significant gains in sparse-reward environments.

In the second half, we propose a framework of quantile-constrained RL to guarantee a target probability for the outage event in which the cumulative sum cost exceeds a given threshold. Most previous constrained-RL works use the expected cumulative sum cost as the constraint; however, optimization under this constraint cannot guarantee a target outage probability. The proposed framework, named Quantile Constrained RL (QCRL), instead constrains the quantile of the cumulative-sum-cost distribution, which is a necessary and sufficient condition for satisfying the outage constraint. This is the first work to tackle the issue of applying the policy gradient theorem to the quantile, and it provides theoretical results for approximating the gradient of the quantile. Based on these results and the Lagrange multiplier technique, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). For the implementation of QCPO, we use distributional RL with the Large Deviation Principle (LDP) to estimate the quantiles and the tail probability of the cumulative sum cost. The implemented algorithm learns an optimal policy while satisfying the outage probability constraint throughout the learning process.
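The soft fusion of the best policy's information admits a compact illustration. Below is a minimal PyTorch sketch of the augmented (Lagrangian) policy loss for a non-best TD3 learner; the names `actor`, `critic`, `best_actor`, the multiplier `beta`, and the distance threshold `delta` are illustrative assumptions, not the thesis's exact implementation.

```python
import torch

def augmented_policy_loss(actor, critic, best_actor, states, beta, delta):
    """TD3-style policy loss plus a soft penalty on the distance to the
    previous best policy (Lagrangian of the constrained problem).

    Assumes deterministic actors (TD3) and a critic callable as
    critic(states, actions) -> Q-values; all names are illustrative.
    """
    actions = actor(states)                      # current learner's actions
    td3_loss = -critic(states, actions).mean()   # maximize Q <=> minimize -Q
    with torch.no_grad():
        best_actions = best_actor(states)        # guidance from the best policy
    # Mean-squared action difference as the policy distance measure (an
    # illustrative choice; the constraint is d(pi_i, pi_best) <= delta).
    distance = ((actions - best_actions) ** 2).mean()
    # beta plays the role of the Lagrange multiplier: the penalty is active
    # only when the distance constraint is violated.
    return td3_loss + beta * torch.clamp(distance - delta, min=0.0)
```

Because the penalty is soft, each non-best learner can still deviate from the best policy where its own critic disagrees, which is what lets the population cover an enlarged region of the policy space.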
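For the quantile constraint, the equivalence P(C > d) <= eps if and only if Q_{1-eps}(C) <= d motivates a Lagrangian relaxation with dual ascent on the multiplier. The plain-Python sketch below shows that relaxation under stated assumptions; `cost_quantile` stands in for the estimate produced by the distributional critic with LDP-based tail estimation in the actual QCPO, and all names are illustrative.

```python
def qcrl_lagrangian(reward_objective, cost_quantile, threshold, lam):
    """L(pi, lam) = J(pi) - lam * (Q_{1-eps}[C] - d).

    Maximizing over the policy while ascending on lam drives the
    (1 - eps)-quantile of the cumulative sum cost below the threshold d,
    which is equivalent to the outage constraint P(C > d) <= eps.
    """
    return reward_objective - lam * (cost_quantile - threshold)

def dual_ascent_step(lam, cost_quantile, threshold, lr=1e-3):
    """Projected dual ascent: raise lam when the quantile constraint is
    violated, lower it (down to 0) when it is satisfied."""
    return max(0.0, lam + lr * (cost_quantile - threshold))
```

The gradient of the quantile term with respect to the policy parameters is the nontrivial part; the thesis's contribution is a policy-gradient-style approximation of that gradient, which this sketch simply treats as given.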
Advisors
Sung, Youngchul (성영철)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Doctoral thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.8, [iv, 73 p.]

Keywords

Reinforcement learning; Constrained reinforcement learning; Probabilistic constraint

URI
http://hdl.handle.net/10203/309131
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1007855&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
