DSpace at KOASAS: K-centered Patch Sampling for Efficient Video Recognition

K-centered Patch Sampling for Efficient Video Recognition

DC Field	Value	Language
dc.contributor.author	Park, Seong Hyeon	ko
dc.contributor.author	Tack, Jihoon	ko
dc.contributor.author	Heo, Byeongho	ko
dc.contributor.author	Ha, Jung-Woo	ko
dc.contributor.author	Shin, Jinwoo	ko
dc.date.accessioned	2023-03-28T06:00:19Z	-
dc.date.available	2023-03-28T06:00:19Z	-
dc.date.created	2023-03-08	-
dc.date.issued	2022-10	-
dc.identifier.citation	17th European Conference on Computer Vision (ECCV), pp.160 - 176	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10203/305865	-
dc.description.abstract	For decades, it has been common practice to choose a subset of video frames to reduce the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal under recent transformer-based models. Specifically, inspired by the fact that transformers are built upon patches of video frames, we propose to sample patches rather than frames using the greedy K-center search, i.e., the patch farthest from those chosen so far is sampled iteratively. We then show that a transformer trained with the selected video patches can outperform its baseline trained with video frames sampled in the traditional way. Furthermore, by adding a certain spatiotemporal structuredness condition, the proposed K-centered patch sampling can even be applied to recent sophisticated video transformers, boosting their performance further. We demonstrate the superiority of our method on the Something-Something and Kinetics datasets.	-
dc.language	English	-
dc.publisher	SPRINGER INTERNATIONAL PUBLISHING AG	-
dc.title	K-centered Patch Sampling for Efficient Video Recognition	-
dc.type	Conference	-
dc.identifier.wosid	000903538700010	-
dc.identifier.scopusid	2-s2.0-85144529452	-
dc.type.rims	CONF	-
dc.citation.beginningpage	160	-
dc.citation.endingpage	176	-
dc.citation.publicationname	17th European Conference on Computer Vision (ECCV)	-
dc.identifier.conferencecountry	IS	-
dc.identifier.conferencelocation	Tel Aviv	-
dc.identifier.doi	10.1007/978-3-031-19833-5_10	-
dc.contributor.localauthor	Shin, Jinwoo	-
dc.contributor.nonIdAuthor	Heo, Byeongho	-
dc.contributor.nonIdAuthor	Ha, Jung-Woo	-
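
The abstract above describes greedy K-center search as iteratively sampling the patch farthest from those chosen so far. The following is a minimal sketch of that general farthest-point procedure, not the authors' implementation. It assumes each patch is already represented as a numeric feature vector; the function name `greedy_k_center`, the Euclidean distance, and the choice of starting index are illustrative assumptions that the record does not specify.

```python
import math
from typing import List, Sequence


def greedy_k_center(
    points: Sequence[Sequence[float]], k: int, first: int = 0
) -> List[int]:
    """Return indices of k points picked by greedy (farthest-point) K-center search.

    Assumptions (not taken from the record): Euclidean distance between
    feature vectors, and a caller-supplied starting index `first`.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    if not 0 <= first < n:
        raise ValueError("first must be a valid index into points")

    selected = [first]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(selected) < k:
        # Pick the point farthest from everything chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update each point's distance to the nearest selected point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy "patches": four 2-D feature vectors on a line.
toy_patches = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
chosen = greedy_k_center(toy_patches, k=3)
print(chosen)
```

In this toy input the search starts at index 0, then picks the farthest remaining point and continues until `k` indices are selected. Each iteration costs time linear in the number of points because only the nearest-selected distances are updated.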

Appears in Collection: AI-Conference Papers(학술대회논문)

Files in This Item: There are no files associated with this item.

This item is cited by 5 other documents in WoS.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
