Bayesian learning and planning methods for partially observable dynamical systems

This thesis addresses learning and planning problems for partially observable dynamical systems. The Gaussian process state-space model (GP-SSM) is adopted to model latent dynamical systems and to learn them from partially observable measurements. GP-SSM is a probabilistic representation learning scheme that represents unknown state transition and/or measurement models as Gaussian processes (GPs). While most prior work on learning GP-SSMs focuses on processing a given set of time-series data, in most dynamical systems data arrive and accumulate sequentially over time. Storing all such sequential data and updating the model over the entire data set requires large amounts of memory and computation time. To overcome this difficulty, a practical method termed onlineGPSSM is proposed that incorporates stochastic variational inference (VI) and online VI with a novel formulation. The proposed method reduces the computational complexity without catastrophic forgetting and also supports adaptation to changes in the system and/or the real environment. Furthermore, onlineGPSSM is applied to reinforcement learning (RL) of partially observable dynamical systems by integrating it with Bayesian filtering and trajectory optimization algorithms. The proposed GP-SSM-based RL framework is applied not only to control of partially observable dynamical systems, but also to active sensing with a mobile sensor. Comparative numerical experiments show the validity and efficiency of the proposed methods relative to existing methods.
To extend the proposed learning and planning methods to multi-agent systems, several important challenges need to be addressed. This thesis focuses on one of them: multi-agent path planning for sensing. Non-myopic path planning of mobile sensors suffers from high computational complexity and/or requires high-level decision making. Existing works tackle these issues by heuristically assigning targets to each sensing agent and solving the resulting subproblem for each agent. However, such heuristic methods degrade target estimation performance because they do not account for how the target state estimates change over time. This work sidesteps the task-assignment problem by reformulating the general non-myopic planning problem as a distributed optimization problem with respect to targets. By combining the alternating direction method of multipliers (ADMM) with a local trajectory optimization method, the problem is solved and consensus (i.e., high-level decisions) is induced automatically among the targets. In addition, a modified receding-horizon control (RHC) scheme and an edge-cutting method are proposed for efficient real-time operation. The proposed algorithm is validated through simulations in various scenarios.
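The abstract does not spell out the ADMM formulation, so the following is only a generic illustration of how consensus emerges in ADMM, not the thesis's algorithm. It is a minimal sketch assuming a toy problem in which each local subproblem i has a quadratic cost 0.5*||x_i - a_i||^2 and all local copies must agree on a shared variable z. The function name `consensus_admm`, the argument `local_targets`, and the numeric values are hypothetical. In the thesis the local step would be a trajectory optimization, not the closed-form update used here.

```python
def consensus_admm(local_targets, rho=1.0, iterations=100):
    """Toy consensus ADMM (illustrative only, not the thesis's method).

    Solves: minimize sum_i 0.5 * ||x_i - a_i||^2  subject to  x_i = z for all i.
    The optimal consensus value z is the mean of the a_i.
    """
    n = len(local_targets)
    dim = len(local_targets[0])
    x = [[0.0] * dim for _ in range(n)]  # local copies, one per subproblem
    u = [[0.0] * dim for _ in range(n)]  # scaled dual variables
    z = [0.0] * dim                      # shared consensus variable

    for _ in range(iterations):
        # Local step: each subproblem minimizes its own cost plus a
        # quadratic penalty for disagreeing with z (closed form here).
        for i, a in enumerate(local_targets):
            x[i] = [(a[d] + rho * (z[d] - u[i][d])) / (1.0 + rho)
                    for d in range(dim)]
        # Consensus step: average the local copies plus their duals.
        z = [sum(x[i][d] + u[i][d] for i in range(n)) / n for d in range(dim)]
        # Dual step: accumulate each copy's remaining disagreement with z.
        for i in range(n):
            u[i] = [u[i][d] + x[i][d] - z[d] for d in range(dim)]
    return z, x


# Hypothetical local preferences for three subproblems in 2-D.
targets = [[0.0, 0.0], [4.0, 2.0], [2.0, 7.0]]
z, x = consensus_admm(targets)
```

In this toy setting the local copies converge to the shared value z without any explicit assignment step, which is the sense in which consensus is "induced automatically" by the iteration.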
Choi, Han-Lim (최한림)
Korea Advanced Institute of Science and Technology (KAIST): Department of Aerospace Engineering
Issue Date

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Aerospace Engineering, 2020.8, [v, 108 p.]


Partially Observable Dynamical Systems; Nonlinear System Identification; Gaussian Process; Model-based Reinforcement Learning; Active Sensing; Non-Myopic Path Planning; Distributed Optimization; Multi-target Tracking


