DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hwang, Ganguk | - |
dc.contributor.advisor | 황강욱 | - |
dc.contributor.author | Hong, Kihun | - |
dc.date.accessioned | 2022-04-21T19:31:51Z | - |
dc.date.available | 2022-04-21T19:31:51Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963346&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/295413 | - |
dc.description | Master's thesis - KAIST : Department of Mathematical Sciences (수리과학과), 2021.8, [iii, 16 p.] | - |
dc.description.abstract | The proximal policy optimization (PPO) algorithm is one of the representative policy-based reinforcement learning methods built on actor-critic networks, and it has served as a baseline in a wide range of reinforcement learning work. In this thesis, we newly consider priorities of samples in the learning process of the critic network of the original PPO algorithm. With the help of these priorities, we accelerate learning of the value function, which in turn can aid the learning of the actor network. We use two different prioritization methods: one based on the temporal difference error, as in the prioritized experience replay of deep Q-networks, and the other based on Gaussian process regression. | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Proximal policy optimization (PPO); Time difference error; Gaussian process regression | - |
dc.subject | 근접 정책 최적화; 시간차 오차; 가우시안 과정 회귀 | - |
dc.title | Learning critic network using priority in proximal policy optimization algorithm | - |
dc.title.alternative | 근접 정책 최적화 알고리즘에서 우선순위를 이용한 비평자 네트워크 학습 | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | 한국과학기술원 : 수리과학과 | - |
dc.contributor.alternativeauthor | 홍기훈 | - |
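The abstract describes prioritizing critic training samples by temporal difference error, as in prioritized experience replay. The sketch below illustrates that general idea only; it is not the thesis's implementation. The function name, the PER-style hyperparameters (`alpha`, `beta`), and the bias-correcting importance weights are assumptions carried over from the standard prioritized experience replay formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def prioritized_critic_batch(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Sample transition indices for a critic update with probability
    proportional to |TD error|^alpha (PER-style prioritization).

    Returns the sampled indices and importance-sampling weights that
    correct the bias introduced by non-uniform sampling.
    """
    # Larger TD error -> the value estimate is worse there -> higher priority.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs, replace=True)
    # Importance-sampling weights; beta is typically annealed toward 1.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()  # normalize so weights lie in (0, 1]
    return idx, weights
```

In a PPO-style loop, the critic's squared-error loss on the sampled transitions would be scaled by these weights before the gradient step, so that frequently drawn high-error samples do not bias the value estimate.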