Learning coordinated behaviors in multi-agent reinforcement learning

Multi-Agent Reinforcement Learning (MARL) is a learning framework that trains intelligent autonomous agents to take actions based on local observations in order to achieve a common goal or individual goals. Following the success of reinforcement learning in the single-agent domain, MARL is being actively studied and applied to real-world problems such as traffic control systems and connected self-driving cars, which can be modeled as multi-agent systems requiring coordinated control. In this thesis, we aim to develop MARL algorithms that learn the coordinated behaviors of multiple agents, a core challenge of MARL, in order to achieve a high level of coordination or cooperation among agents. To this end, we consider two approaches that enhance coordination in explicit and implicit ways: communication-based MARL and coordinated exploration-based MARL. Under these approaches, we propose several learning algorithms that improve coordination between agents.

In the first half, we propose two communication-based MARL algorithms that enhance coordination explicitly. Communication is a core component for learning coordinated behavior in multi-agent systems, since it enables agents to interact with one another directly. Here we address the fundamental questions of what content should be included in a message and how messages can be learned efficiently and robustly. First, we propose a new learning technique named Message-Dropout to improve performance and robustness against communication errors in two application scenarios: 1) multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In both scenarios, we show that message-dropout with a proper dropout rate significantly improves reinforcement learning performance, in terms of both training speed and steady-state performance in the execution phase, and makes learning robust against communication errors during execution. Second, we propose a new communication scheme named Intention Sharing, which harnesses the benefit of communication beyond sharing partial observations. Existing communication methods adopt end-to-end training based on differentiable communication channels, so the trained messages encode only past and current information chosen to maximize the other agents' objectives; they do not capture any future information or intention of the agents. To solve this problem, the proposed intention-sharing method lets each agent share its intention by generating an imagined trajectory that captures that intention and using it as the message content, with an attention mechanism learning the relative importance of the trajectory components. We provide extensive experimental results and ablation studies to show the effectiveness of the proposed algorithms.
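
A minimal sketch of block-wise dropout on received messages, in the spirit of the message-dropout technique described above. It assumes a PyTorch-style actor that conditions on its own observation and the messages received from the other agents; the class and argument names are illustrative, and the inverted-dropout rescaling is an assumption, not the thesis implementation.

    import torch
    import torch.nn as nn

    class MessageDropoutActor(nn.Module):
        def __init__(self, obs_dim, msg_dim, n_others, act_dim, drop_rate=0.5):
            super().__init__()
            self.drop_rate = drop_rate
            self.net = nn.Sequential(
                nn.Linear(obs_dim + n_others * msg_dim, 128),
                nn.ReLU(),
                nn.Linear(128, act_dim),
            )

        def forward(self, obs, messages):
            # obs: (batch, obs_dim); messages: (batch, n_others, msg_dim)
            if self.training and self.drop_rate > 0.0:
                # Drop each received message as a whole with probability drop_rate
                # and rescale the survivors (inverted dropout), so the expected
                # input magnitude matches the execution phase.
                keep = (torch.rand(messages.shape[:2], device=messages.device)
                        >= self.drop_rate).float().unsqueeze(-1)
                messages = messages * keep / (1.0 - self.drop_rate)
            x = torch.cat([obs, messages.flatten(start_dim=1)], dim=-1)
            return self.net(x)  # action logits (or values, depending on the head)

Dropping whole messages rather than individual units is what distinguishes this from ordinary feature dropout: at execution time a communication error typically removes an entire message, and training with block-wise drops mimics that failure mode.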
In the second half, we propose two coordinated exploration methods that yield implicit coordination between agents. Exploration is an essential element of RL, because convergence guarantees for model-free RL assume that all state-action pairs are visited infinitely often, and it becomes even more challenging in MARL since the joint state-action space grows exponentially with the number of agents. Hence, the exploration of each agent should be correlated with that of the other agents so that meaningful unseen states are visited effectively. To this end, we first propose a new approach to mutual information (MI)-based coordination for MARL that coordinates the simultaneous actions of multiple agents. The proposed method introduces a common latent variable to induce mutual information among the simultaneous actions of multiple agents and uses a variational lower bound on the MI that makes the optimization tractable. Under this formulation, we apply policy iteration with redefined value functions and obtain a practical algorithm that learns to coordinate the simultaneous actions of multiple agents. We also propose a new framework based on entropy regularization for adaptive exploration in MARL that handles the multi-agent exploration-exploitation trade-off. The proposed framework allocates different target entropies across agents over time, based on our newly proposed metric for the degree of exploration each agent needs. We provide various experiments, including a didactic example and popular MARL benchmark environments.
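
A minimal sketch of the per-agent entropy-target idea from the second half, written as a SAC-style temperature update. The allocation rule below is a hypothetical stand-in driven by externally supplied exploration scores; it is not the metric proposed in the thesis, and all names are illustrative.

    import torch

    n_agents = 3
    log_alphas = [torch.zeros(1, requires_grad=True) for _ in range(n_agents)]
    alpha_optims = [torch.optim.Adam([la], lr=3e-4) for la in log_alphas]

    def allocate_targets(exploration_scores, base_target=-1.0):
        # Hypothetical allocation: agents judged to need more exploration get a
        # higher (less negative) entropy target; the thesis metric is not shown.
        total = sum(exploration_scores)
        return [base_target * (1.0 - s / total) for s in exploration_scores]

    def update_temperatures(log_probs, target_entropies):
        # log_probs[i]: (batch,) log pi_i(a_i | o_i) of agent i's sampled actions.
        for i in range(n_agents):
            alpha = log_alphas[i].exp()
            alpha_loss = -(alpha * (log_probs[i] + target_entropies[i]).detach()).mean()
            alpha_optims[i].zero_grad()
            alpha_loss.backward()
            alpha_optims[i].step()

Each agent keeps its own temperature, so an agent assigned a higher entropy target is pushed toward more stochastic (exploratory) behavior while the others continue to exploit, which is one way to realize an adaptive multi-agent exploration-exploitation trade-off.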
Advisors
Sung, Youngchul (성영철)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2023.2, [vi, 97 p.]

Keywords

Reinforcement learning; Multi-agent reinforcement learning; Coordinated behaviors; Communication; Exploration

URI
http://hdl.handle.net/10203/309132
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030536&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
