DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 최호진 | - |
dc.contributor.author | Sim, Minho | - |
dc.contributor.author | 심민호 | - |
dc.date.accessioned | 2024-07-25T19:31:24Z | - |
dc.date.available | 2024-07-25T19:31:24Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045957&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/320725 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2023.8, [iv, 35 p.] | - |
dc.description.abstract | In real-world scenarios, human action recognition (HAR) is essentially an open set problem that requires a model to classify actions from known classes and detect actions from unknown classes simultaneously. However, HAR models are easily biased toward static information in the video (e.g., the background), which can degrade the performance of open set action recognition (OSAR) models. In this paper, we propose a simple framework for improving OSAR based on the video attention map extracted from a video vision transformer model. Specifically, our framework eliminates patches with static bias in the video using two debiasing steps: (1) frame selection and (2) patch masking. Experimental results show that our framework achieves consistent performance improvements across multiple OSAR methods and challenging benchmarks. Furthermore, we introduce two new OSAR tasks, Kinetics-400 vs. Kinetics-600 exclusive and Kinetics-400 vs. Kinetics-700 exclusive, to validate our method in a setting close to real-world scenarios. With extensive experiments, we demonstrate the effectiveness of our attention-based masking, and an in-depth analysis validates the effect of static bias on OSAR. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Open set action recognition; video masking; attention map; video vision transformer | - |
dc.title | Attention-based video masking for improving open set action recognition | - |
dc.title.alternative | 오픈셋 행동 인식 향상을 위한 어텐션 기반 비디오 마스킹 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Computing | - |
dc.contributor.alternativeauthor | Choi, Ho-Jin | - |