Learning coordinated behaviors in multi-agent reinforcement learning

Multi-Agent Reinforcement Learning (MARL) is a learning framework that trains intelligent autonomous agents to take actions based on local observations in order to achieve a common goal or individual goals. Following the success of reinforcement learning in the single-agent domain, MARL is being actively studied and applied to real-world problems such as traffic control systems and connected self-driving cars, which can be modeled as multi-agent systems requiring coordinated control. In this thesis, we aim to develop MARL algorithms that learn the coordinated behaviors of multiple agents, a core challenge of MARL, in order to achieve a high level of coordination or cooperation among agents. To this end, we consider two approaches that enhance coordination in explicit and implicit ways: communication-based MARL and coordinated exploration-based MARL. Under these approaches, we propose several learning algorithms that improve coordination between agents.

In the first half, we propose two communication-based MARL algorithms that enhance coordination explicitly. Communication is a core component for learning coordinated behavior in multi-agent systems, since it enables agents to interact with one another directly. Here we address the fundamental questions of what content should be included in a message and how messages can be learned efficiently and robustly. First, we propose a new learning technique named Message-Dropout to improve performance and robustness against communication errors in two application scenarios: 1) multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In both scenarios, we show that message-dropout with a proper dropout rate significantly improves reinforcement learning performance, in terms of both training speed and steady-state performance in the execution phase, and makes learning robust against communication errors during execution. Second, we propose a new communication scheme named Intention Sharing, which harnesses the benefit of communication beyond sharing partial observations. Existing communication methods adopt end-to-end training based on differentiable communication channels, so the trained messages encode only past and current information chosen to maximize the other agents' objectives; they do not capture any future information or intention of the agents. To solve this problem, the proposed intention-sharing method lets each agent share its intention by generating an imagined trajectory that captures that intention and using it as the message content, with an attention mechanism learning the relative importance of the trajectory components. We provide extensive experimental results and ablation studies to show the effectiveness of the proposed algorithms.
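
A minimal sketch of block-wise dropout on received messages, in the spirit of the message-dropout technique described above. It assumes a PyTorch-style actor that conditions on its own observation and the messages received from the other agents; the class and argument names are illustrative, and the inverted-dropout rescaling is an assumption, not the thesis implementation.

    import torch
    import torch.nn as nn

    class MessageDropoutActor(nn.Module):
        def __init__(self, obs_dim, msg_dim, n_others, act_dim, drop_rate=0.5):
            super().__init__()
            self.drop_rate = drop_rate
            self.net = nn.Sequential(
                nn.Linear(obs_dim + n_others * msg_dim, 128),
                nn.ReLU(),
                nn.Linear(128, act_dim),
            )

        def forward(self, obs, messages):
            # obs: (batch, obs_dim); messages: (batch, n_others, msg_dim)
            if self.training and self.drop_rate > 0.0:
                # Drop each received message as a whole with probability drop_rate
                # and rescale the survivors (inverted dropout), so the expected
                # input magnitude matches the execution phase.
                keep = (torch.rand(messages.shape[:2], device=messages.device)
                        >= self.drop_rate).float().unsqueeze(-1)
                messages = messages * keep / (1.0 - self.drop_rate)
            x = torch.cat([obs, messages.flatten(start_dim=1)], dim=-1)
            return self.net(x)  # action logits (or values, depending on the head)

Dropping whole messages rather than individual units is what distinguishes this from ordinary feature dropout: at execution time a communication error typically removes an entire message, and training with block-wise drops mimics that failure mode.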
In the second half, we propose two coordinated exploration methods that yield implicit coordination between agents. Exploration is an essential element of RL, because convergence guarantees for model-free RL assume that all state-action pairs are visited infinitely often, and it becomes even more challenging in MARL since the joint state-action space grows exponentially with the number of agents. Hence, the exploration of each agent should be correlated with that of the other agents so that meaningful unseen states are visited effectively. To this end, we first propose a new approach to mutual information (MI)-based coordination for MARL that coordinates the simultaneous actions of multiple agents. The proposed method introduces a common latent variable to induce mutual information among the simultaneous actions of multiple agents and uses a variational lower bound on the MI that makes the optimization tractable. Under this formulation, we apply policy iteration with redefined value functions and obtain a practical algorithm that learns to coordinate the simultaneous actions of multiple agents. We also propose a new framework based on entropy regularization for adaptive exploration in MARL that handles the multi-agent exploration-exploitation trade-off. The proposed framework allocates different target entropies across agents over time, based on our newly proposed metric for the degree of exploration each agent needs. We provide various experiments, including a didactic example and popular MARL benchmark environments.
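
A minimal sketch of the per-agent entropy-target idea from the second half, written as a SAC-style temperature update. The allocation rule below is a hypothetical stand-in driven by externally supplied exploration scores; it is not the metric proposed in the thesis, and all names are illustrative.

    import torch

    n_agents = 3
    log_alphas = [torch.zeros(1, requires_grad=True) for _ in range(n_agents)]
    alpha_optims = [torch.optim.Adam([la], lr=3e-4) for la in log_alphas]

    def allocate_targets(exploration_scores, base_target=-1.0):
        # Hypothetical allocation: agents judged to need more exploration get a
        # higher (less negative) entropy target; the thesis metric is not shown.
        total = sum(exploration_scores)
        return [base_target * (1.0 - s / total) for s in exploration_scores]

    def update_temperatures(log_probs, target_entropies):
        # log_probs[i]: (batch,) log pi_i(a_i | o_i) of agent i's sampled actions.
        for i in range(n_agents):
            alpha = log_alphas[i].exp()
            alpha_loss = -(alpha * (log_probs[i] + target_entropies[i]).detach()).mean()
            alpha_optims[i].zero_grad()
            alpha_loss.backward()
            alpha_optims[i].step()

Each agent keeps its own temperature, so an agent assigned a higher entropy target is pushed toward more stochastic (exploratory) behavior while the others continue to exploit, which is one way to realize an adaptive multi-agent exploration-exploitation trade-off.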
Advisors
Sung, Youngchul (성영철)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2023.2, [vi, 97 p.]

Keywords

Reinforcement learning; Multi-agent reinforcement learning; Coordinated behaviors; Communication; Exploration

URI
http://hdl.handle.net/10203/309132
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030536&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
