DSpace at KOASAS: Online actor-critic method based on incrementally generated radial basis functions

Online actor-critic method based on incrementally generated radial basis functions

Lee, Dong-Hyun / 이동현

Reinforcement learning is learning what to do so as to maximize a numerical reward signal. The reinforcement learning agent is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward through interaction with its environment. Detailed information about the environment is not given to the agent either. Because of these properties, reinforcement learning is a natural approach to sequential decision problems. Direct reinforcement learning methods, such as Q-learning and SARSA, are widely used because of their simplicity, but they are difficult to apply to problems with continuous states and actions. Using them requires a discretization step in advance, which can incur the curse of dimensionality. In addition, the discontinuity of action selection in those methods can cause oscillation or divergence during learning. An alternative is the actor-critic method using the policy gradient. The policy gradient method guarantees convergence to a locally optimal policy. In this thesis, a novel actor-critic method using an incrementally constructed radial basis function network is developed to deal with continuous state and action problems. There is one local model for each basis function, and the number of local models increases as the basis function network grows. The normalized weighted sum of their outputs is used to estimate the value function for the critic, and the models are updated with a heuristic method that uses the local temporal difference error in the receptive field of the corresponding basis function. A Gaussian policy is used for continuous actions, and it is parameterized by its mean and standard deviation. These parameters are determined by the normalized weighted sum of the corresponding sub-parameters assigned to the basis functions, and the regular policy gradient method is used for their update proces...
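The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state the growth criterion, the form of the local models, or the update rules, so the class name `NormalizedRBF`, the shared kernel width, the activation threshold used to add a basis function, and the constant (scalar) local models are all assumptions made for illustration. The sketch only shows how a normalized weighted sum over basis functions could yield a value estimate and the mean and standard deviation of a Gaussian policy; the critic and actor updates are omitted.

```python
import numpy as np


class NormalizedRBF:
    """Illustrative normalized Gaussian RBF network (hypothetical design)."""

    def __init__(self, width=0.5, add_threshold=0.6):
        self.width = width                  # shared kernel width (assumed)
        self.add_threshold = add_threshold  # growth threshold (assumed)
        self.centers = []   # one center per basis function
        self.values = []    # local value model, simplified to a constant
        self.means = []     # policy-mean sub-parameter per basis function
        self.stds = []      # policy-std sub-parameter per basis function

    def activations(self, state):
        """Unnormalized Gaussian activation of every basis function."""
        centers = np.asarray(self.centers)
        sq_dist = np.sum((centers - state) ** 2, axis=1)
        return np.exp(-sq_dist / (2.0 * self.width ** 2))

    def maybe_grow(self, state):
        """Add a basis function at `state` if no existing one covers it."""
        state = np.asarray(state, dtype=float)
        if not self.centers or self.activations(state).max() < self.add_threshold:
            self.centers.append(state.copy())
            self.values.append(0.0)
            self.means.append(0.0)
            self.stds.append(1.0)

    def weights(self, state):
        """Normalized activations; call maybe_grow first so the state is covered."""
        phi = self.activations(np.asarray(state, dtype=float))
        return phi / phi.sum()

    def value(self, state):
        """Critic estimate: normalized weighted sum of local model outputs."""
        return float(self.weights(state) @ np.asarray(self.values))

    def policy(self, state):
        """Gaussian policy parameters as normalized weighted sums."""
        w = self.weights(state)
        return float(w @ np.asarray(self.means)), float(w @ np.asarray(self.stds))

    def sample_action(self, state, rng):
        """Draw a continuous action from the Gaussian policy."""
        mean, std = self.policy(state)
        return float(rng.normal(mean, std))


net = NormalizedRBF()
for s in ([0.0, 0.0], [0.1, 0.0], [2.0, 0.0]):
    net.maybe_grow(s)  # the second state is already covered, so only two bases exist
action = net.sample_action([1.0, 0.0], np.random.default_rng(0))
```

In this sketch the network grows only when a visited state falls outside the receptive fields of all existing basis functions, which keeps the number of local models tied to the region of the state space actually explored. Any other novelty test could be substituted without changing the normalized weighted-sum structure.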

Advisors: Lee, Ju-Jang / 이주장

Description: Korea Advanced Institute of Science and Technology : The Robotics Program

Publisher: Korea Advanced Institute of Science and Technology

Issue Date: 2013

Identifier: 513482/325007 / 020075312

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : The Robotics Program, 2013.2, [ vii, 100 p. ]

Keywords: Reinforcement learning; actor-critic; local model; policy gradient; function approximation

URI: http://hdl.handle.net/10203/179591

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513482&flag=dissertation

Appears in Collection: RE-Theses_Ph.D. (doctoral theses)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
