Learning critic network using priority in proximal policy optimization algorithm

DC Field: Value (Language)
dc.contributor.advisor: Hwang, Ganguk
dc.contributor.advisor: 황강욱
dc.contributor.author: Hong, Kihun
dc.date.accessioned: 2022-04-21T19:31:51Z
dc.date.available: 2022-04-21T19:31:51Z
dc.date.issued: 2021
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=963346&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/295413
dc.description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Mathematical Sciences, 2021.8, [iii, 16 p.]
dc.description.abstract: The proximal policy optimization (PPO) algorithm is one of the representative policy-based reinforcement learning methods using actor-critic networks, and it has served as a baseline in many works on reinforcement learning. In this thesis, we introduce priorities for the samples used in the learning process of the critic network of the original PPO algorithm. With the help of these priorities, we accelerate learning of the value function, which in turn helps the learning of the actor network. We use two different prioritization methods: one based on the time difference error, as in the prioritized experience replay of deep Q-networks, and the other based on Gaussian process regression.
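The first prioritization method mentioned in the abstract, sampling critic-update minibatches in proportion to the magnitude of the time difference (TD) error as in prioritized experience replay, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the toy rollout values and the hyperparameters `alpha` (priority exponent) and `eps` (priority floor) are assumptions taken from the standard prioritized-replay formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def prioritized_minibatch(td_errors, batch_size, alpha=0.6, eps=1e-6):
    """Sample transition indices with probability proportional to
    |TD error|^alpha, in the style of prioritized experience replay."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    return rng.choice(len(td_errors), size=batch_size, replace=False, p=probs)

# Toy rollout: critic value estimates and one-step bootstrap targets
# for 8 transitions; transitions 1, 3, and 5 have nonzero TD error.
values  = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.7, 0.4, 0.6])
targets = np.array([0.1, 0.6, 0.2, 0.1, 0.3, 0.9, 0.4, 0.6])
td = targets - values

# Indices chosen for the next critic gradient step: almost all of the
# sampling mass sits on the transitions with large |TD error|.
idx = prioritized_minibatch(td, batch_size=3)
```

The `eps` floor keeps every transition's sampling probability strictly positive, so zero-error transitions can still occasionally be revisited as the value function changes.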
dc.language: eng
dc.publisher: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject: Proximal policy optimization (PPO); Time difference error; Gaussian process regression
dc.subject: 근접 정책 최적화; 시간차 오차; 가우시안 과정 회귀
dc.title: Learning critic network using priority in proximal policy optimization algorithm
dc.title.alternative: 근접 정책 최적화 알고리즘에서 우선순위를 이용한 비평자 네트워크 학습
dc.type: Thesis (Master's)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology: Department of Mathematical Sciences
dc.contributor.alternativeauthor: 홍기훈
Appears in Collection: MA-Theses_Master (Master's theses)
Files in This Item: There are no files associated with this item.
