Post-hoc policy adjustment for offline actor-critic reinforcement learning methods

Abstract
Offline reinforcement learning (RL) seeks to learn policies from previously collected datasets alone. This requires offline RL methods to address the distribution shift between the data collection policy underlying the dataset and the learned policy. Many offline RL methods regularize the policy or value function during training to discourage the choice of out-of-distribution actions. Despite these efforts, the learned policies often suffer from state distribution shift during deployment. Since there is no direct learning signal for out-of-distribution states, this shift can lead to generalization problems. In this paper, we propose a post-hoc policy adjustment method for the deployment phase to improve the learned policy. Specifically, we focus on offline actor-critic methods employing conservatism, such as conservative Q-learning (CQL). The main concept originates from two key observations: first, for out-of-distribution states, the actor may not be sufficiently optimized with respect to the critic, and second, the conservatively trained critic can aid in locating a nearby in-distribution state. We test our method using the D4RL benchmark and show that it can notably improve the performance of current state-of-the-art offline actor-critic methods.
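The abstract does not spell out the adjustment procedure, but the two observations suggest a deployment-time scheme along these lines: refine the actor's action by ascending the conservative critic's Q-value, and nudge an out-of-distribution state toward a nearby in-distribution one by ascending Q(s, pi(s)), which conservatism keeps low for out-of-distribution inputs. The sketch below is an illustrative assumption, not the thesis implementation; the names `critic`, `actor`, `adjust_state`, and `adjust_action`, and the step counts and learning rates, are hypothetical.

```python
# Illustrative sketch of deployment-time post-hoc adjustment with a
# conservatively trained critic Q(s, a) and an actor pi(s), both assumed
# to be differentiable torch callables. Not the thesis's exact algorithm.
import torch


def adjust_state(critic, actor, state, steps=10, lr=0.01):
    """Nudge a possibly out-of-distribution state toward a nearby
    in-distribution one by gradient ascent on Q(s, pi(s)), which a
    conservative critic is trained to keep low off-distribution."""
    s = state.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([s], lr=lr)
    for _ in range(steps):
        value = critic(s, actor(s)).mean()
        opt.zero_grad()
        (-value).backward()  # ascend the conservative value estimate
        opt.step()
    return s.detach()


def adjust_action(critic, state, action, steps=10, lr=0.01):
    """Refine the actor's proposed action by gradient ascent on Q(s, a),
    compensating for the actor being under-optimized with respect to the
    critic on out-of-distribution states."""
    a = action.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([a], lr=lr)
    for _ in range(steps):
        q = critic(state, a).mean()
        opt.zero_grad()
        (-q).backward()
        opt.step()
    return a.detach().clamp(-1.0, 1.0)  # assumes actions bounded in [-1, 1]


# Example usage with dummy callables (shape sanity check only):
# critic = lambda s, a: -(s.pow(2).sum(-1, keepdim=True)
#                         + a.pow(2).sum(-1, keepdim=True))
# actor = lambda s: torch.tanh(s[..., :2])
# s0 = torch.randn(1, 4)
# s_adj = adjust_state(critic, actor, s0)
# a_adj = adjust_action(critic, s_adj, actor(s_adj))
```

At deployment, one would query the actor at the adjusted state and then optionally refine that action against the critic before acting in the environment; both steps use only the frozen, offline-trained networks.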
Advisors
김기응
Description
Korea Advanced Institute of Science and Technology (KAIST), Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology, Kim Jaechul Graduate School of AI, 2024.2, [iii, 22 p.]

Keywords

Offline Reinforcement Learning; Distribution Shift; Post-hoc Adjustment

URI
http://hdl.handle.net/10203/321364
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096069&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
