Off-policy multi-agent policy optimization with agent-wise advantage estimation

In many multi-agent environments, agents must coordinate their actions under partial information in order to cooperate or compete. To overcome miscoordination, frameworks have emerged that use central information to estimate a global value function. Naturally, adapting the policy gradient method to multi-agent reinforcement learning has been actively studied. However, many of these studies either do not address credit assignment or handle it only implicitly. There have been recent attempts to design rewards explicitly, but they have weaknesses. In this paper, we investigate reward shaping and credit assignment in multi-agent systems with a theoretical understanding of the trade-off between variance and bias. We also study off-policy correction for multi-agent systems. Building on these analyses, we propose a multi-agent off-policy optimization algorithm based on a new advantage estimator with off-policy correction. The algorithm supports off-policy estimation while enabling both bias control and explicit credit assignment. Empirical evaluations on the StarCraft II benchmark and multi-agent MuJoCo environments demonstrate that our method outperforms recent algorithms.
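The abstract does not specify the estimator itself; as a rough illustration of the ingredients it names (per-agent advantages, off-policy correction via importance weights, and a bias-variance knob), the following Python sketch combines a GAE-style recursion with truncated importance ratios in the spirit of V-trace. All function and parameter names below are assumptions for illustration, not the thesis's actual method.

# Illustrative sketch, NOT the thesis's estimator: per-agent advantage
# estimates with clipped importance ratios for off-policy correction.
# The lambda and clipping parameters are assumed values for illustration.
import numpy as np

def agentwise_advantages(rewards, values, log_pi, log_mu,
                         gamma=0.99, lam=0.95, rho_clip=1.0):
    """Per-agent GAE-style advantages with truncated importance weights.

    rewards : (T,) shared team reward
    values  : (T+1, n_agents) per-agent value estimates
    log_pi  : (T, n_agents) log-probs under the target (current) policy
    log_mu  : (T, n_agents) log-probs under the behavior policy
    Returns (T, n_agents) advantage estimates.
    """
    T, n = log_pi.shape
    # Truncated importance-sampling ratios: clipping trades bias for variance.
    rho = np.minimum(np.exp(log_pi - log_mu), rho_clip)
    adv = np.zeros((T, n))
    gae = np.zeros(n)
    for t in reversed(range(T)):
        # One-step TD error per agent, weighted by that agent's own ratio.
        delta = rho[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        gae = delta + gamma * lam * rho[t] * gae
        adv[t] = gae
    return adv

# Example usage with random data: 5 steps, 2 agents.
T, n = 5, 2
rng = np.random.default_rng(0)
adv = agentwise_advantages(rng.normal(size=T),
                           rng.normal(size=(T + 1, n)),
                           0.1 * rng.normal(size=(T, n)),
                           0.1 * rng.normal(size=(T, n)))

On-policy data reduces this to ordinary per-agent GAE (all ratios near 1), while the clip threshold bounds the variance contributed by off-policy samples at the cost of some bias, which is the trade-off the abstract highlights.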
Advisors
Sung, Youngchul (성영철)
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [iv, 23 p.]

Keywords

multi-agent reinforcement learning; policy gradient method; off-policy generalization; credit assignment; reward shaping

URI
http://hdl.handle.net/10203/309933
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032871&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
