Algorithms for safe reinforcement learning

DC Field | Value | Language
dc.contributor.advisor | Kim, Kee-Eung | -
dc.contributor.advisor | 김기응 | -
dc.contributor.author | Lee, Jongmin | -
dc.date.accessioned | 2023-06-23T19:34:24Z | -
dc.date.available | 2023-06-23T19:34:24Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996365&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/309224 | -
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.2, [vi, 111 p.] | -
dc.description.abstract | Standard reinforcement learning (RL) aims to learn a reward-maximizing policy through online interaction with the MDP environment. However, in many real-world domains, naive application of RL can be problematic, especially when some behaviors of the agent may cause irrecoverable damage to the agent itself or its surroundings. Therefore, for RL to be applied to practical problems, the notion of safety must be considered in both policy learning and policy execution. In this thesis, we address safety in RL from two perspectives: (1) safety via offline learning, and (2) safety via constraints. First, we consider the offline RL problem, in which the agent optimizes its policy solely from pre-collected experiences; this learning process is inherently safe in that it never executes actions sampled from an unoptimized policy in the real environment. We present two offline RL algorithms, one based on gradient-based hyperparameter optimization and the other on stationary distribution correction estimation. Second, we consider the constrained MDP (CMDP), which provides a framework for encoding safety specifications as cost constraints. We present a scalable solution method for CMDPs based on Monte-Carlo tree search. Lastly, we consider the offline constrained RL problem, which lies at the intersection of the two safety considerations. We introduce an efficient offline constrained RL algorithm that computes a cost-conservative policy for actual constraint satisfaction by constraining an upper bound on the cost. (A symbolic sketch of the CMDP objective is given after the metadata listing below.) | -
dc.language | eng | -
dc.publisher | 한국과학기술원 (KAIST) | -
dc.title | Algorithms for safe reinforcement learning | -
dc.title.alternative | 안전한 강화학습을 위한 알고리즘 연구 | -
dc.type | Thesis (Ph.D.) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | 한국과학기술원 (KAIST), 전산학부 (School of Computing) | -
dc.contributor.alternativeauthor | 이종민 | -
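
As a reading aid for the abstract above, the following is a minimal sketch, not taken from the thesis, of the standard CMDP objective that the abstract refers to; the reward r, cost c, discount factor \gamma, and cost budget \hat{c} are generic symbols rather than the thesis's own notation.

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le \hat{c}

Under this reading, the cost-conservative offline algorithm described in the abstract can be understood as replacing the left-hand side of the constraint with an upper-bound estimate obtained from the pre-collected data, so that a policy satisfying the estimated constraint is more likely to satisfy the true constraint at deployment.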
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
