DSpace at KOASAS: Algorithms for offline imitation learning with supplementary demonstrations

Algorithms for offline imitation learning with supplementary demonstrations (Korean title: 추가적인 시연을 활용한 오프라인 모방학습 알고리즘 연구)

DC Field	Value	Language
dc.contributor.advisor	Kim, Kee-Eung	-
dc.contributor.advisor	김기응	-
dc.contributor.advisor	Yang, Hongseok	-
dc.contributor.advisor	양홍석	-
dc.contributor.author	Kim, Geon-Hyeong	-
dc.date.accessioned	2023-06-23T19:34:25Z	-
dc.date.available	2023-06-23T19:34:25Z	-
dc.date.issued	2022	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1007876&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/309225	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.8, [v, 56 p.]	-
dc.description.abstract	We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstrations without further interaction with the environment. One of the main challenges in offline IL is dealing with the narrow support of the data distribution exhibited by the expert demonstrations, which cover only a small fraction of the state and action spaces. In this thesis, we address two offline IL problems: imitation learning from (1) state-action demonstrations by experts, and (2) state-only demonstrations by experts. First, we consider the problem of learning from demonstrations (LfD), in which the agent aims to mimic the expert's behavior from state-action demonstrations. Compared with recent LfD algorithms that adopt adversarial minimax training objectives, we substantially stabilize the overall learning process by reducing the minimax optimization to a direct convex optimization in a principled manner. Next, we consider the problem of learning from observations (LfO), in which the agent aims to mimic the expert's behavior from state-only demonstrations. We introduce an algorithm that solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	Imitation learning; Learning from demonstrations; Learning from observations; Offline imitation learning; Imperfect demonstrations	-
dc.subject	모방학습; 시연으로부터 학습; 관찰로부터 학습; 오프라인 모방학습; 불완전한 시연	-
dc.title	Algorithms for offline imitation learning with supplementary demonstrations	-
dc.title.alternative	추가적인 시연을 활용한 오프라인 모방학습 알고리즘 연구	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): School of Computing	-
dc.contributor.alternativeauthor	김건형	-

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.