DC Field | Value | Language |
---|---|---|
dc.contributor.author | Singh, Gautam | ko |
dc.contributor.author | Wang, Yue | ko |
dc.contributor.author | Yang, Jiawei | ko |
dc.contributor.author | Ivanovic, Boris | ko |
dc.contributor.author | Ahn, Sungjin | ko |
dc.contributor.author | Pavone, Marco | ko |
dc.contributor.author | Che, Tong | ko |
dc.date.accessioned | 2024-06-18T06:00:57Z | - |
dc.date.available | 2024-06-18T06:00:57Z | - |
dc.date.created | 2024-06-18 | - |
dc.date.issued | 2024-07-25 | - |
dc.identifier.citation | The Forty-first International Conference on Machine Learning | - |
dc.identifier.uri | http://hdl.handle.net/10203/319834 | - |
dc.description.abstract | While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models are yet to fully embrace these architectures. In particular, existing object-centric models for handling sequential inputs, due to their reliance on RNN-based implementation, show poor stability and capacity and are slow to train on long sequences. We introduce Parallelizable Spatiotemporal Binder, or PSB, the first temporally-parallelizable slot learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel. This is achieved by refining the initial slots across all time-steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism induced by our architecture, the proposed model exhibits a significant boost in efficiency. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoder options. Compared to the state of the art, our architecture demonstrates stable training on longer sequences, achieves parallelization that results in a 60% increase in training speed, and performs on par or better on unsupervised 2D and 3D object-centric scene decomposition and understanding. | - |
dc.language | English | - |
dc.publisher | The International Conference on Machine Learning (ICML) | - |
dc.title | Parallelized Spatiotemporal Slot Binding for Videos | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.publicationname | The Forty-first International Conference on Machine Learning | - |
dc.identifier.conferencecountry | AT | - |
dc.identifier.conferencelocation | Vienna | - |
dc.contributor.localauthor | Ahn, Sungjin | - |
dc.contributor.nonIdAuthor | Singh, Gautam | - |
dc.contributor.nonIdAuthor | Wang, Yue | - |
dc.contributor.nonIdAuthor | Yang, Jiawei | - |
dc.contributor.nonIdAuthor | Ivanovic, Boris | - |
dc.contributor.nonIdAuthor | Pavone, Marco | - |
dc.contributor.nonIdAuthor | Che, Tong | - |
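
The abstract describes PSB as refining initial slots for all time-steps in parallel through a fixed number of layers equipped with causal attention. The following is a minimal sketch of only that causal, non-recurrent structure, under loud assumptions: each layer is a hypothetical single-head dot-product attention with a residual connection and no learned projections, normalization, MLP, or attention to input features, and the names `causal_mask`, `attention_layer`, and `refine_slots` are illustrative. It is not the authors' PSB layer.

```python
import math
import random


def causal_mask(num_steps, num_slots):
    # mask[i][j] is True when flattened token i may attend to token j,
    # i.e. when token j's time-step is not later than token i's.
    # Slots in the same time-step see each other; nothing sees the future.
    n = num_steps * num_slots
    return [[(j // num_slots) <= (i // num_slots) for j in range(n)]
            for i in range(n)]


def attention_layer(tokens, mask):
    # Hypothetical simplified layer (assumption, not the paper's design):
    # single-head scaled dot-product attention plus a residual connection.
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i, query in enumerate(tokens):
        allowed = [j for j in range(len(tokens)) if mask[i][j]]
        scores = [scale * sum(a * b for a, b in zip(query, tokens[j]))
                  for j in allowed]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        update = [sum(w * tokens[j][d] for w, j in zip(weights, allowed)) / total
                  for d in range(dim)]
        out.append([x + u for x, u in zip(query, update)])
    return out


def refine_slots(initial_slots, num_layers=2):
    # initial_slots: T time-steps, each a list of K slot vectors of size D.
    # Every layer updates all T*K tokens from the previous layer's output,
    # so there is no recurrence over time.
    num_steps, num_slots = len(initial_slots), len(initial_slots[0])
    tokens = [list(slot) for step in initial_slots for slot in step]
    mask = causal_mask(num_steps, num_slots)
    for _ in range(num_layers):
        tokens = attention_layer(tokens, mask)
    return [tokens[t * num_slots:(t + 1) * num_slots] for t in range(num_steps)]


if __name__ == "__main__":
    random.seed(0)
    T, K, D = 4, 3, 8
    slots = [[[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
             for _ in range(T)]
    refined = refine_slots(slots)

    # Perturb one slot at the last time-step only.
    perturbed = [[list(slot) for slot in step] for step in slots]
    perturbed[-1][0][0] += 5.0
    refined_perturbed = refine_slots(perturbed)

    # Causality: outputs for earlier time-steps are bit-for-bit unchanged.
    print(refined[:-1] == refined_perturbed[:-1])  # True
```

On parallel hardware each layer would be one masked matrix operation over all T*K tokens; the Python loops here run sequentially only for readability. The perturbation check illustrates why causal masking matters: slots for a given time-step never depend on later frames, even though all time-steps are refined in the same pass.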