Algorithm and application of imitation learning and reinforcement learning for sequential decision making problems with multiple agents

In systems engineering, a system is characterized by a large number of interrelated elements organized to achieve predefined objectives. Accordingly, the general purpose of a sequential decision problem defined on an engineering system is to operate the system toward those objectives through decisions based on the information recognized at each decision epoch. Because a system consists of many interrelated components, many decision problems in a system must be treated as sequential decision-making problems with multiple agents. In this dissertation, we study various sequential decision-making problems with multiple agents, using the emergency medical service system as the major application domain. We mainly use Markov decision process (MDP), decentralized partially observable MDP (dec-POMDP), and stochastic game (SG) models, and apply multi-agent reinforcement learning (MARL) and imitation learning algorithms to problems that are difficult to solve. A representative problem throughout the dissertation is the selective patient admission problem at an emergency department (ED) after a mass-casualty incident.

In Chapter 2, we formulate and analyze an MDP model for the selective patient admission problem, focusing on a single ED. We review the structural properties of the optimal policy of the MDP model and identify how the optimal policy varies with the characteristics of the input functions that represent external factors affecting decision making.

In Chapter 3, we propose a solution method for partially observable multi-agent problems in disaster response operations. A dec-POMDP model suits sequential decision-making problems in disaster response because it assumes a situation where multiple decision-makers choose actions based on partial information. Our method combines MARL with the behavior cloning (BC) technique of imitation learning: it draws on reference policies from previous research on disaster response, and through BC this domain knowledge pretrains the policy network and value network that are subsequently used in reinforcement learning. As a case study, we generalize the mathematical model for the selective patient admission problem to a dec-POMDP model. The proposed method reduces computation time significantly compared with an MARL algorithm that does not use pretraining, and it obtains a near-optimal dec-POMDP policy whose performance is close to the upper-bound value of the problem. Moreover, numerical experiments show that the method remains effective in inherently partially observable environments and in cases where decisions in the prehospital phase affect the performance of the selective patient admission strategy.

In Chapter 4, we propose a method that improves a cooperative MARL algorithm using imitation learning. The method exploits a reference policy obtained from a decision environment with more information than the dec-POMDP problem assumes: it collects demonstrations from the solution of a multi-agent MDP (MMDP) or multi-agent POMDP (MPOMDP) model and mixes them into the training of the policy network in an MARL algorithm.
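Both the pretraining in Chapter 3 and the demonstration mixing in Chapter 4 build on behavior cloning, i.e., supervised learning that maps the observations visited by a reference policy to the actions it chose. The following is a minimal sketch in PyTorch; the network architecture, the names PolicyNet and pretrain_with_bc, the synthetic demonstrations, and all hyperparameters are illustrative assumptions, not the dissertation's implementation.

```python
# Minimal behavior-cloning sketch (PyTorch). Shapes, names, and
# hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

OBS_DIM, N_ACTIONS = 16, 4  # hypothetical observation/action space sizes

class PolicyNet(nn.Module):
    """Maps an agent's partial observation to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs):
        return self.net(obs)

def pretrain_with_bc(policy, demos, epochs=10, lr=1e-3):
    """Supervised pretraining on (observation, action) pairs collected
    from a reference policy; the pretrained weights then initialize the
    policy network used by the MARL algorithm."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, act in DataLoader(demos, batch_size=64, shuffle=True):
            opt.zero_grad()
            loss_fn(policy(obs), act).backward()
            opt.step()
    return policy

# Synthetic demonstrations stand in for a reference disaster-response policy.
demos = TensorDataset(torch.randn(1024, OBS_DIM),
                      torch.randint(0, N_ACTIONS, (1024,)))
policy = pretrain_with_bc(PolicyNet(), demos)
```

The same cross-entropy imitation objective can either initialize the networks before MARL training (Chapter 3) or be mixed into the MARL loss itself (Chapter 4).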
Experiments on benchmark dec-POMDP problems show that the baseline MARL algorithm obtains a better dec-POMDP policy when demonstrations from the solution of a centralized model are mixed in. A comparison test shows that mixing demonstrations is more effective than an alternative method of using demonstrations to improve an MARL algorithm. We also find that, when a reference centralized policy is not provided, investing computational budget in learning a centralized policy during the earlier training steps is effective.
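One way to realize this demonstration-mixing idea is to add a behavior-cloning term on centralized demonstrations to a standard policy-gradient loss and decay its weight over training, which echoes the finding that the imitation signal matters most in the earlier training steps. The sketch below is a hedged illustration under that assumption; the mixed loss, the 0.99 decay factor, the stand-in linear policy, and the synthetic batches are hypothetical, not the dissertation's exact algorithm.

```python
# Hedged sketch: mixing centralized-model demonstrations into a policy
# update. All constants and helpers here are illustrative assumptions.
import torch
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 16, 4
policy = torch.nn.Linear(OBS_DIM, N_ACTIONS)  # stand-in policy network
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def mixed_policy_loss(rl_batch, demo_batch, bc_weight):
    """Policy-gradient term on the agent's own rollouts plus a
    behavior-cloning term on centralized (MMDP/MPOMDP) demonstrations."""
    obs, act, adv = rl_batch
    log_prob = F.log_softmax(policy(obs), dim=-1)
    chosen = log_prob.gather(1, act.unsqueeze(1)).squeeze(1)
    rl_loss = -(chosen * adv).mean()                        # RL objective
    demo_obs, demo_act = demo_batch
    bc_loss = F.cross_entropy(policy(demo_obs), demo_act)   # imitation term
    return rl_loss + bc_weight * bc_loss

# Synthetic batches stand in for on-policy rollouts and demonstrations.
rl_batch = (torch.randn(64, OBS_DIM),
            torch.randint(0, N_ACTIONS, (64,)),
            torch.randn(64))
demo_batch = (torch.randn(64, OBS_DIM), torch.randint(0, N_ACTIONS, (64,)))

bc_weight = 1.0
for step in range(100):
    opt.zero_grad()
    mixed_policy_loss(rl_batch, demo_batch, bc_weight).backward()
    opt.step()
    bc_weight *= 0.99  # lean on demonstrations early, pure RL later
```

Annealing bc_weight toward zero lets the learner exploit the centralized demonstrations early while optimizing the decentralized return on its own later.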
Advisors
Lee, Taesik (이태식)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Doctoral dissertation - Korea Advanced Institute of Science and Technology: Department of Industrial and Systems Engineering, 2020.2, [v, 89 p.]

Keywords

Sequential decision making model; Disaster response system; Multi-agent reinforcement learning; Imitation learning; Emergency medical services system

URI
http://hdl.handle.net/10203/283609
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=908365&flag=dissertation
Appears in Collection
IE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
