Online neural Q-Learning using heuristic weight assignment algorithm and optimization method = 휴리스틱한 가중치 배당 알고리즘과 최적화 방법을 이용한 온라인 Neural Q-Learning

Classic and even fairly recent robots still rely on fixed behavior-based control. Recent robot models therefore focus on improving the robot's ability to deal with uncertainty in the environment. One approach within this paradigm is to learn from experience and build an appropriate control system from it. This approach is called Reinforcement Learning (RL). RL is a class of intelligent control methods that develop or improve the actions of an agent in an uncertain environment. By interacting with the environment, the agent learns and finds an optimal solution. To find the optimal solution, RL uses the value function. The value function is calculated using the Bellman equation, which is a nonlinear Lyapunov equation. Solving for the value function, however, usually requires knowledge of the system dynamics. To avoid this requirement, Watkins introduced the Q-Learning method for discrete spaces. Another method is action-dependent heuristic dynamic programming (AD HDP), which is based on an actor-critic structure introduced by Werbos. The actor-critic structure, however, involves training two or more function approximators, which makes both the training and the analysis of the results difficult. If training fails, it is unclear whether the cause is the setting of the training parameters, the choice of function approximators, or insufficient exploration in generating the data. In contrast, Neural Q-Learning, which involves training a single function approximator, was introduced by S. Hagen to apply Q-Learning to continuous spaces. This approach is based on Q-Learning for Linear Quadratic Regulation (LQR). However, Neural Q-Learning learns very slowly when applied to very complex systems such as Multi-Input Multi-Output (MIMO) systems. Furthermore, batch learning cannot adapt to other environments without using a large data set for the training process. To solve these problems, we propose three contributions. First, we apply this learning to online learning t...
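For reference, the discrete-space Q-Learning that the abstract attributes to Watkins can be sketched as a tabular update rule. This is a minimal illustration of the standard textbook update, not the thesis's neural method. The function name `q_update`, the dictionary-based table, and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a dict mapping (state, action) pairs to estimated values;
    pairs that have not been visited yet are treated as 0.0.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the one-step bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

The update needs only observed transitions (state, action, reward, next state), which is why no model of the system dynamics is required.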
Lee, Ju-Jang (이주장)
Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics
Issue Date
514920/325007  / 020113119

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Interdisciplinary Program in Robotics, 2013.2, [ v, 39 p. ]


reinforcement learning; intelligent control; neural Q-Learning; balancing control; online learning
