Learning critic network using priority in proximal policy optimization algorithm

The proximal policy optimization (PPO) algorithm is one of the representative policy-based reinforcement learning methods built on actor-critic networks, and it has been used as a baseline in a wide range of reinforcement learning research. In this thesis, we introduce sample priorities into the learning process of the critic network of the original PPO algorithm. With the help of these priorities, the value function is learned faster, which in turn helps the learning of the actor network. We use two different prioritization methods: one based on the temporal difference (TD) error, as in the prioritized experience replay of deep Q-networks, and the other based on Gaussian process regression.
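For illustration, here is a minimal sketch of the TD-error-based variant, assuming the priority convention of prioritized experience replay, p_i = (|δ_i| + ε)^α. This is not the thesis's exact implementation; the helper name prioritized_critic_loss, the exponent alpha, and the toy network are assumptions made for the example.

```python
# A minimal sketch (not the thesis's exact method) of weighting PPO's critic
# loss by TD-error-based priorities, in the spirit of prioritized experience
# replay. Names such as `alpha` and `prioritized_critic_loss` are assumptions.
import torch

def prioritized_critic_loss(critic, states, returns, alpha=0.6, eps=1e-6):
    """Critic (value) loss where each sample is weighted by its priority.

    Priorities follow the PER convention p_i = (|delta_i| + eps)^alpha,
    where delta_i = return_i - V(s_i) is the TD error of sample i.
    """
    values = critic(states).squeeze(-1)          # V(s_i) for the batch
    td_errors = returns - values                 # delta_i
    with torch.no_grad():                        # priorities are not differentiated
        priorities = (td_errors.abs() + eps) ** alpha
        weights = priorities / priorities.sum()  # normalize to a distribution
    # Weighted squared TD error: high-priority samples contribute more.
    return (weights * td_errors.pow(2)).sum()

# Usage sketch: inside a PPO update step, the usual mean-squared value loss
# would be replaced by the prioritized version above.
if __name__ == "__main__":
    critic = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                                 torch.nn.Linear(64, 1))
    states = torch.randn(32, 4)                  # toy batch of states
    returns = torch.randn(32)                    # toy empirical returns
    loss = prioritized_critic_loss(critic, states, returns)
    loss.backward()                              # gradients flow to the critic
    print(f"prioritized critic loss: {loss.item():.4f}")
```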
Advisors
Hwang, Ganguk (황강욱)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: Department of Mathematical Sciences, 2021.8, [iii, 16 p.]

Keywords

Proximal policy optimization (PPO); Time difference error; Gaussian process regression

URI
http://hdl.handle.net/10203/295413
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963346&flag=dissertation
Appears in Collection
MA-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
