Cross-modal knowledge distillation for one-shot human action recognition

DC Field: Value
dc.contributor.advisor: 최호진
dc.contributor.author: Lee, Jong-Whoa
dc.contributor.author: 이종화
dc.date.accessioned: 2024-07-25T19:31:23Z
dc.date.available: 2024-07-25T19:31:23Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045950&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/320718
dc.description: Master's thesis - KAIST (한국과학기술원): School of Computing, 2023.8, [iii, 29 p.]
dc.description.abstract: Human action recognition (HAR) aims to understand human behavior and predict the correct label for each action from visual inputs such as RGB video, infrared video, depth video, or skeleton information. In action recognition, the same action may be performed with different movements by different performers, or be interpreted as a different action in a specific domain. Such variation makes it challenging to prepare sufficient data for training action recognition models. We therefore consider an efficient method that can be trained with few samples and can transfer its learned features across domains, namely knowledge distillation. In this paper, we propose a teacher-student network that learns representations of the given actions from skeleton sequences and from textual information describing each action. Our teacher network consists of two encoders: a skeleton encoder, a graph-based model that fits the structure of skeletons, and a text encoder pre-trained on large-scale datasets. The teacher network uses the skeleton sequences together with additional textual information, synonyms of the action labels, to provide cross-modality to the student network. The student network contains only a skeleton encoder, identical to the teacher's, and learns the semantic relationships guided by the teacher's knowledge. Experiments on one-shot HAR using the public NTU RGB+D 120 dataset demonstrate the state-of-the-art performance of the proposed method.
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: 행동 인식; 관절 기반 행동 인식; 관절 정보; 원샷 기반 학습; 크로스 모달 지식 증류; 교사-학생 네트워크
dc.subject: human action recognition; skeleton-based human action recognition; skeleton information; one-shot learning; cross-modal knowledge distillation; teacher-student networks
dc.title: Cross-modal knowledge distillation for one-shot human action recognition
dc.title.alternative: 원샷 행동 인식을 위한 크로스 모달 지식 증류 방법
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 (KAIST), School of Computing
dc.contributor.alternativeauthor: Choi, Ho-Jin
Appears in Collection: CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
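The abstract above describes cross-modal distillation: a teacher fuses skeleton and text embeddings, and a skeleton-only student is trained to match the teacher's embedding. The following is a minimal NumPy sketch of that idea only, not the thesis's actual architecture; the dimensions, the linear stand-ins for the graph-based skeleton encoder and pre-trained text encoder, and the MSE distillation objective are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the thesis).
D_SKEL, D_TEXT, D_EMB = 8, 6, 4

# Frozen "teacher" projections: fixed linear maps stand in for the
# graph-based skeleton encoder and the pre-trained text encoder.
W_skel_t = rng.normal(size=(D_SKEL, D_EMB))
W_text_t = rng.normal(size=(D_TEXT, D_EMB))

def teacher_embed(skel, text):
    """Fuse the skeleton and text views into one cross-modal embedding."""
    return 0.5 * (skel @ W_skel_t + text @ W_text_t)

# Student: a skeleton encoder only, trained to mimic the teacher.
W_skel_s = rng.normal(size=(D_SKEL, D_EMB))

def distill_loss(skel, text):
    """MSE between student and teacher embeddings (the distillation target)."""
    return float(np.mean((skel @ W_skel_s - teacher_embed(skel, text)) ** 2))

def step(skel, text, lr=0.05):
    """One gradient-descent step on the student weights (closed-form MSE grad)."""
    global W_skel_s
    err = skel @ W_skel_s - teacher_embed(skel, text)   # (N, D_EMB)
    grad = 2.0 * skel.T @ err / skel.shape[0]           # (D_SKEL, D_EMB)
    W_skel_s = W_skel_s - lr * grad

# Distill on a small batch: the student's loss should drop as it
# absorbs the teacher's cross-modal embedding from skeletons alone.
skel = rng.normal(size=(16, D_SKEL))
text = rng.normal(size=(16, D_TEXT))
before = distill_loss(skel, text)
for _ in range(200):
    step(skel, text)
after = distill_loss(skel, text)
```

At test time only the student's skeleton branch is needed, which mirrors the one-shot setting in the abstract where text is available for training but recognition runs on skeleton input.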
