Algorithms for safe reinforcement learning

DC Field | Value | Language
dc.contributor.advisor | Kim, Kee-Eung | -
dc.contributor.advisor | 김기응 | -
dc.contributor.author | Lee, Jongmin | -
dc.date.accessioned | 2023-06-23T19:34:24Z | -
dc.date.available | 2023-06-23T19:34:24Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996365&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/309224 | -
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.2, [vi, 111 p.] | -
dc.description.abstract | Standard reinforcement learning (RL) aims to learn a reward-maximizing policy through online interaction with the MDP environment. However, in many real-world domains, naive application of RL can be problematic, especially when some behaviors of the agent may cause irrecoverable damage to the agent itself or its surroundings. Therefore, for RL to be applied to practical problems, the notion of safety must be considered in both policy learning and policy execution. In this thesis, we address safety in RL from two perspectives: (1) safety via offline learning, and (2) safety via constraints. First, we consider the offline RL problem, in which the agent optimizes its policy solely from pre-collected experiences; this learning process is inherently safe in that it never executes actions sampled from an unoptimized policy in the real environment. We present two offline RL algorithms, one based on gradient-based hyperparameter optimization and the other on stationary distribution correction estimation. Second, we consider the constrained MDP (CMDP), which provides a framework for encoding safety specifications as cost constraints. We present a scalable solution method for CMDPs based on Monte-Carlo tree search. Lastly, we consider the offline constrained RL problem, which lies at the intersection of the two safety considerations. We introduce an efficient offline constrained RL algorithm that computes a cost-conservative policy for actual constraint satisfaction by constraining an upper bound on the cost. (A symbolic sketch of the CMDP objective is given after the metadata listing below.) | -
dc.language | eng | -
dc.publisher | 한국과학기술원 (KAIST) | -
dc.title | Algorithms for safe reinforcement learning | -
dc.title.alternative | 안전한 강화학습을 위한 알고리즘 연구 | -
dc.type | Thesis (Ph.D.) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | 한국과학기술원 (KAIST), 전산학부 (School of Computing) | -
dc.contributor.alternativeauthor | 이종민 | -
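
As a reading aid for the abstract above, the following is a minimal sketch, not taken from the thesis, of the standard CMDP objective that the abstract refers to; the reward r, cost c, discount factor \gamma, and cost budget \hat{c} are generic symbols rather than the thesis's own notation.

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le \hat{c}

Under this reading, the cost-conservative offline algorithm described in the abstract can be understood as replacing the left-hand side of the constraint with an upper-bound estimate obtained from the pre-collected data, so that a policy satisfying the estimated constraint is more likely to satisfy the true constraint at deployment.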
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
