Parallelized Spatiotemporal Slot Binding for Videos

DC Field | Value | Language
dc.contributor.author | Singh, Gautam | ko
dc.contributor.author | Wang, Yue | ko
dc.contributor.author | Yang, Jiawei | ko
dc.contributor.author | Ivanovic, Boris | ko
dc.contributor.author | Ahn, Sungjin | ko
dc.contributor.author | Pavone, Marco | ko
dc.contributor.author | Che, Tong | ko
dc.date.accessioned | 2024-06-18T06:00:57Z | -
dc.date.available | 2024-06-18T06:00:57Z | -
dc.date.created | 2024-06-18 | -
dc.date.issued | 2024-07-25 | -
dc.identifier.citation | The Forty-first International Conference on Machine Learning | -
dc.identifier.uri | http://hdl.handle.net/10203/319834 | -
dc.description.abstract | While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models are yet to fully embrace these architectures. In particular, existing object-centric models for handling sequential inputs, due to their reliance on RNN-based implementation, show poor stability and capacity and are slow to train on long sequences. We introduce Parallelizable Spatiotemporal Binder or PSB, the first temporally-parallelizable slot learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel. This is achieved by refining the initial slots across all time-steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism induced by our architecture, the proposed model exhibits a significant boost in efficiency. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoder options. Compared to the state-of-the-art, our architecture demonstrates stable training on longer sequences, achieves parallelization that results in a 60% increase in training speed, and yields performance that is on par with or better on unsupervised 2D and 3D object-centric scene decomposition and understanding. | -
dc.language | English | -
dc.publisher | The International Conference on Machine Learning (ICML) | -
dc.title | Parallelized Spatiotemporal Slot Binding for Videos | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | The Forty-first International Conference on Machine Learning | -
dc.identifier.conferencecountry | AT | -
dc.identifier.conferencelocation | Vienna | -
dc.contributor.localauthor | Ahn, Sungjin | -
dc.contributor.nonIdAuthor | Singh, Gautam | -
dc.contributor.nonIdAuthor | Wang, Yue | -
dc.contributor.nonIdAuthor | Yang, Jiawei | -
dc.contributor.nonIdAuthor | Ivanovic, Boris | -
dc.contributor.nonIdAuthor | Pavone, Marco | -
dc.contributor.nonIdAuthor | Che, Tong | -
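
The abstract describes refining slots for all time-steps at once through a fixed number of layers with causal attention. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a single slot tracked over time, single-head dot-product attention with no learned projections, and a plain residual update. The names `causal_temporal_attention` and `refine` are hypothetical.

```python
import math

def causal_temporal_attention(slots):
    """slots: list over time of feature vectors for one slot.
    Step t attends only to steps 0..t (causal mask)."""
    d = len(slots[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    # Each iteration reads only the layer input, never another step's
    # output, so all time-steps could be computed in parallel.
    for t in range(len(slots)):
        scores = [scale * sum(q * k for q, k in zip(slots[t], slots[s]))
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([sum(w / z * slots[s][j] for s, w in enumerate(weights))
                    for j in range(d)])
    return out

def refine(slots, num_layers=2):
    """Apply a fixed number of causal-attention layers with residual updates.
    num_layers is an assumed illustrative value."""
    for _ in range(num_layers):
        update = causal_temporal_attention(slots)
        slots = [[x + u for x, u in zip(row, upd)]
                 for row, upd in zip(slots, update)]
    return slots

if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]  # differs only at the last step
    ra, rb = refine(a), refine(b)
    # Causality: earlier time-steps are unaffected by a change at a later step.
    print(ra[0] == rb[0], ra[1] == rb[1], ra[2] != rb[2])
```

Unlike an RNN, no step here waits on the previous step's output within a layer. The sequential depth is the number of layers rather than the sequence length.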
Appears in Collection
CS-Conference Papers (Academic Conference Papers)
Files in This Item
There are no files associated with this item.
