Unified spatio-temporal neural networks for contextual action understanding

Context is important for understanding human action because the same action can be interpreted in different ways depending on context. Context works in either a bottom-up or a top-down manner. In the bottom-up case, context is formed by observing a series of actions and affects the recognition of the next action. In the top-down case, context is predefined or planned before action execution and is then used to execute an intended series of actions. Actions that require contextual information are called contextual actions. Understanding contextual action comprehensively requires handling both bottom-up and top-down contexts, and therefore a mechanism for long-term information processing that maintains contextual information over a long period of time. Despite its importance, contextual action understanding has received little research attention. In this dissertation, we therefore address contextual action understanding, in particular contextual action recognition and planning, from deep neural network modeling to practical problems in training deep neural networks. The contributions of this dissertation are as follows.

First, we propose a unified spatio-temporal neural network to overcome the difficulty that existing action recognition networks have with contextual action recognition. The proposed network combines the spatial hierarchy and the temporal hierarchy of individual neural networks into a single neural network with a spatio-temporal hierarchy. Through this hierarchy, the network extracts low-level motion features in lower layers and high-level motion features in higher layers. Thanks to its long-term processing capability, the proposed network shows robust recognition performance under severe dynamic occlusion and successfully recognizes contextual actions, which existing neural networks cannot do.

Second, we propose a temporal normalization method to enhance the contextual processing capability and the learning speed of the unified spatio-temporal neural network. Although the network has the rich spatio-temporal processing capability required for contextual action recognition, its saturating activation functions cause a vanishing gradient problem that limits long-term processing capability, and training is very slow because of the model complexity. Compared with existing normalization methods, the proposed temporal normalization method shows better learning acceleration and contextual processing capability. The improvement is further boosted when the proposed method is combined with existing spatial normalization methods.

Finally, we propose a unified spatio-temporal neural network based on stochastic predictive coding for planning and executing an appropriate series of actions when a specific context is given. The predictive coding framework can encode multimodal information, but it must predict high-dimensional sensory information, which requires a large amount of computation. In addition, networks under deterministic predictive coding require a large number of training samples for good generalization. The proposed network reduces the computational cost by using dynamic visual attention and improves planning performance by maintaining long-term visuospatial information in an external visuospatial memory. Furthermore, thanks to stochastic predictive coding using variational Bayes, the proposed network generalizes well from a small number of training samples.
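The link the abstract draws between saturating activations, vanishing gradients, and normalization along the time axis can be illustrated with a toy example. The sketch below is not the dissertation's method, which this record does not describe. It assumes, purely for illustration, that "temporal normalization" means standardizing each unit's pre-activation over the time steps of one sequence. The names `normalize_over_time` and `tanh_gradient` are hypothetical.

```python
import math


def normalize_over_time(pre_activations, eps=1e-5):
    """Standardize each unit's pre-activation across the time axis.

    pre_activations: list of T time steps, each a list of N unit values.
    Returns a new T x N list in which every unit has zero mean and
    approximately unit variance over the T steps.
    (Assumed form of normalization, for illustration only.)
    """
    num_steps = len(pre_activations)
    num_units = len(pre_activations[0])
    normalized = [[0.0] * num_units for _ in range(num_steps)]
    for unit in range(num_units):
        series = [step[unit] for step in pre_activations]
        mean = sum(series) / num_steps
        var = sum((v - mean) ** 2 for v in series) / num_steps
        scale = 1.0 / math.sqrt(var + eps)
        for t in range(num_steps):
            normalized[t][unit] = (series[t] - mean) * scale
    return normalized


def tanh_gradient(x):
    """Derivative of tanh; close to 0 when |x| is large (saturation)."""
    return 1.0 - math.tanh(x) ** 2


# Toy sequence: unit 0 sits at large values, where tanh saturates.
raw = [[4.0 + 0.5 * t, 0.1 * t] for t in range(6)]
normed = normalize_over_time(raw)

# Average local gradient of tanh for unit 0, before and after normalization.
raw_grad = sum(tanh_gradient(step[0]) for step in raw) / len(raw)
normed_grad = sum(tanh_gradient(step[0]) for step in normed) / len(normed)
print(raw_grad < normed_grad)
```

In this toy case the normalized pre-activations stay near the steep part of tanh, so the local gradient is larger than for the raw values. That is the general reason normalization can ease gradient flow through saturating units; it says nothing about the specific results reported in the dissertation.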
Advisors
Shin, Jinwoo (신진우)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.2, [vii, 71 p.]

Keywords

Contextual action; unified spatio-temporal neural networks; normalization; goal-directed action planning; visual attention; external visuospatial memory; variational Bayes

URI
http://hdl.handle.net/10203/265218
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=842391&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)