K-centered Patch Sampling for Efficient Video Recognition

DC Field | Value | Language
--- | --- | ---
dc.contributor.author | Park, Seong Hyeon | ko
dc.contributor.author | Tack, Jihoon | ko
dc.contributor.author | Heo, Byeongho | ko
dc.contributor.author | Ha, Jung-Woo | ko
dc.contributor.author | Shin, Jinwoo | ko
dc.date.accessioned | 2023-03-28T06:00:19Z | -
dc.date.available | 2023-03-28T06:00:19Z | -
dc.date.created | 2023-03-08 | -
dc.date.issued | 2022-10 | -
dc.identifier.citation | 17th European Conference on Computer Vision (ECCV), pp. 160-176 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.uri | http://hdl.handle.net/10203/305865 | -
dc.description.abstract | For decades, it has been common practice to choose a subset of video frames to reduce the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal under recent transformer-based models. Specifically, inspired by the fact that transformers are built upon patches of video frames, we propose to sample patches rather than frames using the greedy K-center search, i.e., the patch farthest from those chosen so far is sampled iteratively. We then show that a transformer trained with the selected video patches can outperform its baseline trained with the video frames sampled in the traditional way. Furthermore, by adding a certain spatiotemporal structuredness condition, the proposed K-centered patch sampling can even be applied to recent sophisticated video transformers, boosting their performance further. We demonstrate the superiority of our method on the Something-Something and Kinetics datasets. | -
dc.language | English | -
dc.publisher | SPRINGER INTERNATIONAL PUBLISHING AG | -
dc.title | K-centered Patch Sampling for Efficient Video Recognition | -
dc.type | Conference | -
dc.identifier.wosid | 000903538700010 | -
dc.identifier.scopusid | 2-s2.0-85144529452 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 160 | -
dc.citation.endingpage | 176 | -
dc.citation.publicationname | 17th European Conference on Computer Vision (ECCV) | -
dc.identifier.conferencecountry | IS | -
dc.identifier.conferencelocation | Tel Aviv | -
dc.identifier.doi | 10.1007/978-3-031-19833-5_10 | -
dc.contributor.localauthor | Shin, Jinwoo | -
dc.contributor.nonIdAuthor | Heo, Byeongho | -
dc.contributor.nonIdAuthor | Ha, Jung-Woo | -
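
The abstract describes the greedy K-center search only in one clause: the patch farthest from those chosen so far is sampled iteratively. The sketch below illustrates that idea and is not the authors' implementation. Several details are assumptions that the record does not specify: patches are flattened raw-pixel vectors, distance is Euclidean, the first patch is index 0, and the names `video_to_patches` and `greedy_k_center` are hypothetical. The spatiotemporal structuredness condition mentioned in the abstract is not modeled.

```python
import numpy as np


def video_to_patches(video, p):
    """Split a (T, H, W, C) video into flattened non-overlapping p x p patches.

    Assumption: H and W are divisible by p. Returns an array of shape
    (T * H/p * W/p, p * p * C), one row per patch.
    """
    t, h, w, c = video.shape
    if h % p or w % p:
        raise ValueError("frame size must be divisible by the patch size")
    x = video.reshape(t, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # group the two in-patch axes together
    return x.reshape(t * (h // p) * (w // p), p * p * c)


def greedy_k_center(points, k, first=0):
    """Return indices of k points chosen by greedy K-center search.

    Starting from `first` (an arbitrary choice made for this sketch), each
    step picks the point whose Euclidean distance to its nearest
    already-chosen point is largest.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    chosen = [first]
    # min_dist[i] is the distance from point i to its nearest chosen point.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((4, 8, 8, 3))       # toy clip: 4 frames of 8x8 RGB
    patches = video_to_patches(video, 4)   # 16 patches of 48 values each
    keep = greedy_k_center(patches, k=6)
    print(keep)                            # indices of the sampled patches
```

Each iteration costs one pass over all patches, so selecting k of n patches takes O(nk) distance evaluations. If many patches are identical, the farthest-point step can return an already-chosen index; a practical version would need a tie-breaking rule.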
Appears in Collection: AI-Conference Papers