DSpace at KOASAS: Parallelized Spatiotemporal Slot Binding for Videos

Parallelized Spatiotemporal Slot Binding for Videos

DC Field	Value	Language
dc.contributor.author	Singh, Gautam	ko
dc.contributor.author	Wang, Yue	ko
dc.contributor.author	Yang, Jiawei	ko
dc.contributor.author	Ivanovic, Boris	ko
dc.contributor.author	Ahn, Sungjin	ko
dc.contributor.author	Pavone, Marco	ko
dc.contributor.author	Che, Tong	ko
dc.date.accessioned	2024-06-18T06:00:57Z	-
dc.date.available	2024-06-18T06:00:57Z	-
dc.date.created	2024-06-18	-
dc.date.issued	2024-07-25	-
dc.identifier.citation	The Forty-first International Conference on Machine Learning	-
dc.identifier.uri	http://hdl.handle.net/10203/319834	-
dc.description.abstract	While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models are yet to fully embrace these architectures. In particular, existing object-centric models for handling sequential inputs, due to their reliance on RNN-based implementation, show poor stability and capacity and are slow to train on long sequences. We introduce Parallelizable Spatiotemporal Binder or PSB, the first temporally-parallelizable slot learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel. This is achieved by refining the initial slots across all time-steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism induced by our architecture, the proposed model exhibits a significant boost in efficiency. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoder options. Compared to the state-of-the-art, our architecture demonstrates stable training on longer sequences, achieves parallelization that results in a 60% increase in training speed, and yields performance that is on par with or better on unsupervised 2D and 3D object-centric scene decomposition and understanding.	-
dc.language	English	-
dc.publisher	The International Conference on Machine Learning (ICML)	-
dc.title	Parallelized Spatiotemporal Slot Binding for Videos	-
dc.type	Conference	-
dc.type.rims	CONF	-
dc.citation.publicationname	The Forty-first International Conference on Machine Learning	-
dc.identifier.conferencecountry	AT	-
dc.identifier.conferencelocation	Vienna	-
dc.contributor.localauthor	Ahn, Sungjin	-
dc.contributor.nonIdAuthor	Singh, Gautam	-
dc.contributor.nonIdAuthor	Wang, Yue	-
dc.contributor.nonIdAuthor	Yang, Jiawei	-
dc.contributor.nonIdAuthor	Ivanovic, Boris	-
dc.contributor.nonIdAuthor	Pavone, Marco	-
dc.contributor.nonIdAuthor	Che, Tong	-

Appears in Collection: CS-Conference Papers(학술회의논문)

Files in This Item: There are no files associated with this item.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.