Post-hoc policy adjustment for offline actor-critic reinforcement learning methods

Abstract
Offline reinforcement learning (RL) seeks to learn policies from previously collected datasets alone. This requires offline RL methods to address the distribution shift between the data collection policy underlying the dataset and the learned policy. Many offline RL methods regularize the policy or value function during training to discourage the choice of out-of-distribution actions. Despite these efforts, the learned policies often suffer from state distribution shift during deployment. Since there is no direct learning signal for out-of-distribution states, this shift can lead to generalization problems. In this paper, we propose a post-hoc policy adjustment method for the deployment phase to improve the learned policy. Specifically, we focus on offline actor-critic methods employing conservatism, such as conservative Q-learning (CQL). The main concept originates from two key observations: first, for out-of-distribution states, the actor may not be sufficiently optimized with respect to the critic, and second, the conservatively trained critic can aid in locating a nearby in-distribution state. We test our method using the D4RL benchmark and show that it can notably improve the performance of current state-of-the-art offline actor-critic methods.
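The abstract does not spell out the adjustment procedure, but the two observations suggest a deployment-time scheme along these lines: refine the actor's action by ascending the conservative critic's Q-value, and nudge an out-of-distribution state toward a nearby in-distribution one by ascending Q(s, pi(s)), which conservatism keeps low for out-of-distribution inputs. The sketch below is an illustrative assumption, not the thesis implementation; the names `critic`, `actor`, `adjust_state`, and `adjust_action`, and the step counts and learning rates, are hypothetical.

```python
# Illustrative sketch of deployment-time post-hoc adjustment with a
# conservatively trained critic Q(s, a) and an actor pi(s), both assumed
# to be differentiable torch callables. Not the thesis's exact algorithm.
import torch


def adjust_state(critic, actor, state, steps=10, lr=0.01):
    """Nudge a possibly out-of-distribution state toward a nearby
    in-distribution one by gradient ascent on Q(s, pi(s)), which a
    conservative critic is trained to keep low off-distribution."""
    s = state.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([s], lr=lr)
    for _ in range(steps):
        value = critic(s, actor(s)).mean()
        opt.zero_grad()
        (-value).backward()  # ascend the conservative value estimate
        opt.step()
    return s.detach()


def adjust_action(critic, state, action, steps=10, lr=0.01):
    """Refine the actor's proposed action by gradient ascent on Q(s, a),
    compensating for the actor being under-optimized with respect to the
    critic on out-of-distribution states."""
    a = action.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([a], lr=lr)
    for _ in range(steps):
        q = critic(state, a).mean()
        opt.zero_grad()
        (-q).backward()
        opt.step()
    return a.detach().clamp(-1.0, 1.0)  # assumes actions bounded in [-1, 1]


# Example usage with dummy callables (shape sanity check only):
# critic = lambda s, a: -(s.pow(2).sum(-1, keepdim=True)
#                         + a.pow(2).sum(-1, keepdim=True))
# actor = lambda s: torch.tanh(s[..., :2])
# s0 = torch.randn(1, 4)
# s_adj = adjust_state(critic, actor, s0)
# a_adj = adjust_action(critic, s_adj, actor(s_adj))
```

At deployment, one would query the actor at the adjusted state and then optionally refine that action against the critic before acting in the environment; both steps use only the frozen, offline-trained networks.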
Advisors
김기응
Description
Korea Advanced Institute of Science and Technology (KAIST), Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology, Kim Jaechul Graduate School of AI, 2024.2, [iii, 22 p.]

Keywords

Offline Reinforcement Learning; Distribution Shift; Post-hoc Adjustment

URI
http://hdl.handle.net/10203/321364
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096069&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
