Online neural Q-Learning using heuristic weight assignment algorithm and optimization method

Cited 0 times in Web of Science / Cited 0 times in Scopus
  • Hits: 840
  • Downloads: 0
DC Field : Value
dc.contributor.advisor: Lee, Ju-Jang
dc.contributor.advisor: 이주장
dc.contributor.author: Kim, Yeon-Seob
dc.contributor.author: 김연섭
dc.date.accessioned: 2013-09-12T05:01:31Z
dc.date.available: 2013-09-12T05:01:31Z
dc.date.issued: 2013
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=514920&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/182397
dc.description: Thesis (Master's) - KAIST : Robotics Program, 2013.2, [v, 39 p.]
dc.description.abstract: Classic and even many recent robots still rely on fixed, behavior-based control, so recent robot designs focus on increasing the robot's ability to deal with uncertainty in the environment. One approach within this paradigm is to learn from experience and build an appropriate control system from that experience. This approach is called Reinforcement Learning (RL). RL is a class of intelligent control methods that develop or improve an agent's actions in an uncertain environment. By interacting with the environment, the agent learns and finds an optimal solution. To find the optimal solution, RL uses the value function, which is computed from the Bellman equation, a nonlinear Lyapunov equation. Solving for the value function, however, usually requires knowledge of the system dynamics. To avoid this, Watkins introduced the Q-Learning method for discrete spaces. Another method is action-dependent heuristic dynamic programming (AD HDP), which is based on the actor-critic structure introduced by Werbos. The actor-critic structure, however, involves training two or more function approximators, which makes the training and the analysis of the results difficult: if learning fails, it is unclear whether this is caused by the training parameter settings, the choice of function approximators, or insufficient exploration in generating the data. In contrast, Neural Q-Learning, which trains a single function approximator, was introduced by S. Hagen to apply Q-Learning to continuous spaces. This approach is based on Q-Learning for Linear Quadratic Regulation (LQR). However, Neural Q-Learning learns very slowly on complex systems such as Multi-Input Multi-Output (MIMO) systems, and batch learning cannot adapt to other environments without a large data set for the training process. To solve these problems, we propose three contributions. First, we apply this learning to online learning t... (an illustrative Q-learning sketch follows this record) (language: eng)
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: reinforcement learning
dc.subject: intelligent control
dc.subject: neural Q-Learning
dc.subject: balancing control
dc.subject: 강화학습 (reinforcement learning)
dc.subject: 지능제어 (intelligent control)
dc.subject: 뉴럴 큐 학습 (neural Q-learning)
dc.subject: 밸런싱 제어 (balancing control)
dc.subject: 온라인 학습 (online learning)
dc.subject: online learning
dc.title: Online neural Q-Learning using heuristic weight assignment algorithm and optimization method
dc.title.alternative: 휴리스틱한 가중치 배당 알고리즘과 최적화 방법을 이용한 온라인 Neural Q-Learning (Korean alternative title)
dc.type: Thesis (Master's)
dc.identifier.CNRN: 514920/325007
dc.description.department: KAIST : Robotics Program
dc.identifier.uid: 020113119
dc.contributor.localauthor: Lee, Ju-Jang
dc.contributor.localauthor: 이주장
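
The abstract above rests on two standard ingredients: the tabular Q-Learning update of Watkins for discrete spaces, and Neural Q-Learning, which trains a single function approximator so Q-Learning can be applied to continuous spaces. The sketch below is only an illustration of those two generic updates in plain Python/NumPy; it is not the thesis's heuristic weight-assignment algorithm or optimization method, and the network size, learning rates, and the toy transitions at the end are assumptions made for the example.

import numpy as np

# (1) Tabular Q-Learning (Watkins) for discrete state/action spaces.
# Update rule: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def tabular_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# (2) Online Q-learning with a single small neural approximator for a
# continuous state and a finite action set (illustrative sketch only).
class TinyNeuralQ:
    def __init__(self, n_state, n_action, n_hidden=16, lr=1e-2, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        # One hidden tanh layer; the action is appended to the state as an input.
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_state + 1))
        self.W2 = rng.normal(scale=0.1, size=(1, n_hidden))
        self.lr, self.gamma = lr, gamma
        self.actions = np.arange(n_action)

    def _forward(self, s, a):
        x = np.concatenate([np.asarray(s, dtype=float), [float(a)]])
        h = np.tanh(self.W1 @ x)
        q = (self.W2 @ h).item()
        return q, x, h

    def greedy_action(self, s):
        return max(self.actions, key=lambda a: self._forward(s, a)[0])

    def update(self, s, a, r, s_next):
        # One-step temporal-difference target using the greedy next action.
        q_next = max(self._forward(s_next, a2)[0] for a2 in self.actions)
        q_sa, x, h = self._forward(s, a)
        td_error = (r + self.gamma * q_next) - q_sa
        # Semi-gradient step: hand-written backprop through the two layers.
        grad_W2 = h[None, :]
        grad_W1 = (self.W2.ravel() * (1.0 - h ** 2))[:, None] * x[None, :]
        self.W2 += self.lr * td_error * grad_W2
        self.W1 += self.lr * td_error * grad_W1
        return td_error

# Example usage on made-up toy transitions (hypothetical numbers):
if __name__ == "__main__":
    Q_table = np.zeros((4, 2))                      # 4 discrete states, 2 actions
    tabular_q_update(Q_table, s=0, a=1, r=1.0, s_next=2)

    q_net = TinyNeuralQ(n_state=2, n_action=3)
    s = np.array([0.3, -0.1])
    a = q_net.greedy_action(s)
    q_net.update(s, a, r=0.5, s_next=np.array([0.25, -0.05]))

Updating the single approximator online, one transition at a time, is what allows the controller to keep adapting without the large batch data set criticized in the abstract.
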
Appears in Collection
RE-Theses_Master (석사논문, Master's theses)
Files in This Item
There are no files associated with this item.
