DSpace at KOASAS: LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation



LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation


Kim, Geon-Hyeong / Lee, Jongmin / Jang, Youngsoo / Yang, Hongseok / Kim, Kee-Eung

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from state-only expert demonstrations. We additionally assume that the agent cannot interact with the environment but has access to action-labeled transition data collected by agents of unknown quality. This offline LfO setting is appealing in many real-world scenarios where ground-truth expert actions are inaccessible and arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.
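The quantity the abstract refers to, a divergence between state-transition distributions, can be made concrete in a toy tabular setting. The sketch below is a simplified illustration under assumptions not taken from the paper: it estimates an empirical distribution over state transitions (s, s') from state-only trajectories and computes a KL divergence between an agent's and an expert's transition distributions. It does not implement LobsDICE's stationary distribution correction estimation or its convex optimization, and the function names are hypothetical.

```python
import math
from collections import Counter


def transition_distribution(trajectories):
    """Empirical distribution over (s, s') pairs from state-only trajectories.

    Hypothetical helper: each trajectory is a list of hashable states, and no
    actions are needed, mirroring the state-only demonstrations in LfO.
    """
    counts = Counter()
    for states in trajectories:
        # Count every consecutive (state, next_state) pair in the trajectory.
        for s, s_next in zip(states, states[1:]):
            counts[(s, s_next)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a discrete support.

    eps stands in for q's probability on pairs q never visits, so the result
    stays finite (but large) instead of becoming infinite.
    """
    return sum(
        pi * math.log(pi / max(q.get(pair, 0.0), eps))
        for pair, pi in p.items()
        if pi > 0
    )


# Toy usage with made-up integer states (illustrative only).
expert = transition_distribution([[0, 1, 2, 2]])
agent = transition_distribution([[0, 1, 1, 2]])
print(kl_divergence(agent, expert))  # positive: agent takes a (1, 1) transition the expert never makes
```

Matching transition pairs (s, s') rather than state-action pairs is what makes this kind of objective usable when expert actions are unavailable. Which direction of the divergence to use, and how to estimate it from offline data, is left to the paper itself.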

Publisher: Neural information processing systems foundation

Issue Date: 2022-12-01

Language: English

Citation: The 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

URI: http://hdl.handle.net/10203/300568

Appears in Collections: CS-Conference Papers; AI-Conference Papers

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
