Attention-based video masking for improving open set action recognition

In real-world scenarios, human action recognition (HAR) is essentially an open set problem: a model must classify actions from known classes and detect actions from unknown classes simultaneously. However, HAR models are easily biased toward static information in the video (e.g., background), which can degrade the performance of open set action recognition (OSAR) models. In this paper, we propose a simple framework for improving OSAR based on the video attention map extracted from a video vision transformer model. Specifically, our framework eliminates patches with static bias in the video using two debiasing steps: (1) frame selection and (2) patch masking. Experimental results show that our framework achieves consistent performance improvements across multiple OSAR methods and challenging benchmarks. Furthermore, we introduce two new OSAR tasks, Kinetics-400 vs. Kinetics-600 exclusive and Kinetics-400 vs. Kinetics-700 exclusive, to validate our method in a setting close to real-world scenarios. With extensive experiments, we demonstrate the effectiveness of our attention-based masking, and in-depth analysis validates the effect of static bias on OSAR.
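The two debiasing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state how frames are chosen or which patches are masked, so everything below is an assumption made for illustration: the attention map is taken to be a list of per-frame lists of patch scores, frames are ranked by total attention, and the lowest-attention patches in each kept frame are treated as static-biased. The names `select_frames`, `patch_mask`, `debias`, `num_keep`, and `mask_ratio` are hypothetical and do not come from the thesis.

```python
from typing import Dict, List


def select_frames(attn: List[List[float]], num_keep: int) -> List[int]:
    """Step 1 (assumed rule): keep the num_keep frames with the highest
    total attention, returned in temporal order."""
    totals = [sum(frame) for frame in attn]
    ranked = sorted(range(len(attn)), key=lambda t: totals[t], reverse=True)
    return sorted(ranked[:num_keep])


def patch_mask(frame_attn: List[float], mask_ratio: float) -> List[bool]:
    """Step 2 (assumed rule): build a keep-mask for one frame.
    False marks the lowest-attention patches, which this sketch
    treats as static-biased; True marks patches that are kept."""
    num_mask = int(len(frame_attn) * mask_ratio)
    order = sorted(range(len(frame_attn)), key=lambda p: frame_attn[p])
    masked = set(order[:num_mask])
    return [p not in masked for p in range(len(frame_attn))]


def debias(attn: List[List[float]], num_keep: int,
           mask_ratio: float) -> Dict[int, List[bool]]:
    """Apply frame selection, then patch masking on each kept frame.
    Returns a mapping from kept frame index to its patch keep-mask."""
    return {t: patch_mask(attn[t], mask_ratio)
            for t in select_frames(attn, num_keep)}


# Toy attention map: 3 frames, 4 patches per frame (values are made up).
toy_attn = [
    [0.1, 0.2, 0.3, 0.4],
    [0.9, 0.8, 0.1, 0.2],
    [0.0, 0.1, 0.0, 0.1],
]
result = debias(toy_attn, num_keep=2, mask_ratio=0.5)
```

In this toy input, the third frame has the lowest total attention and is dropped, and half of the patches in each remaining frame are masked. A real pipeline would take the attention map from the video vision transformer and pass the masked clip to the OSAR model; both stages are outside this sketch.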
Advisors
최호진
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2023.8, [iv, 35 p.]

Keywords

Open set action recognition; video masking; attention map; video vision transformer

URI
http://hdl.handle.net/10203/320725
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045957&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's Theses)