Distributed dynamic programming and reinforcement learning from a control system perspective

We investigate distributed dynamic programming (DP) and reinforcement learning (RL) for solving networked multi-agent Markov decision problems (MDPs). We consider a distributed multi-agent setting in which each agent has access only to its own reward and not to the rewards of other agents. Moreover, each agent can share its parameters with its neighbors over a communication network represented by a graph. We propose a distributed DP in the continuous-time domain and prove its convergence from a control-theoretic viewpoint. The proposed analysis can be viewed as a preliminary ordinary differential equation (ODE) analysis of a distributed temporal difference (TD) learning algorithm, whose convergence can be proved using the Borkar-Meyn theorem and the single time-scale approach. Finally, we extend the DP to the corresponding TD learning.
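To make the setting in the abstract concrete, the following is a minimal sketch of one generic way a continuous-time, consensus-based policy evaluation can be written and integrated numerically. It is not taken from the thesis: the primal-dual form of the ODE, the forward-Euler discretization, the function name `simulate_distributed_dp`, and the toy matrices `P`, `R`, and `L` are all assumptions chosen for illustration. Each agent sees only its own reward row, exchanges values with graph neighbors through the Laplacian, and the agents should agree on the value function of the averaged reward.

```python
import numpy as np

def simulate_distributed_dp(P, R, L, gamma=0.9, step=0.01, num_steps=20000):
    """Forward-Euler integration of a consensus-style primal-dual ODE (illustrative).

    P: (n, n) state transition matrix under a fixed policy.
    R: (N, n) local rewards; row i is known only to agent i.
    L: (N, N) Laplacian of an undirected, connected communication graph.
    Returns the (N, n) array of local value estimates, one row per agent.
    """
    N, n = R.shape
    V = np.zeros((N, n))  # local value estimates
    W = np.zeros((N, n))  # dual variables that absorb reward disagreement
    for _ in range(num_steps):
        # Local Bellman residual: gamma * P V_i - V_i + r_i for each agent i.
        bellman = gamma * V @ P.T - V + R
        # Neighbors exchange V and W; L @ X mixes rows across agents.
        dV = bellman - L @ V - L @ W
        dW = L @ V
        V = V + step * dV
        W = W + step * dW
    return V

# Hypothetical toy problem: 3 states, 3 agents on a path graph.
# P is symmetric here, an assumption that keeps the stability argument simple.
P = np.array([[0.5, 0.50, 0.00],
              [0.5, 0.25, 0.25],
              [0.0, 0.25, 0.75]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

V = simulate_distributed_dp(P, R, L)

# Centralized reference: value function of the averaged reward.
V_central = np.linalg.solve(np.eye(3) - 0.9 * P, R.mean(axis=0))
print(np.max(np.abs(V - V_central)))
```

At an equilibrium of this sketch, `L @ V = 0` forces all rows of `V` to coincide on a connected graph, and summing the first equation over agents cancels the Laplacian terms, leaving the Bellman equation for the mean reward. Replacing the exact Bellman residual with sampled TD errors would be the natural route to a learning variant, but that step is not shown here.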
Advisors
Lee, Donghwan (이동환)
Description
Korea Advanced Institute of Science and Technology (KAIST): Robotics Program
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Robotics Program, 2022.8, [ii, 24 p.]

Keywords

Dynamic programming; Markov decision processes; multi-agent systems; consensus; reinforcement learning

URI
http://hdl.handle.net/10203/308279
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1008233&flag=dissertation
Appears in Collection
RE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
