Towards end-to-end generative modeling of long videos with memory-efficient bidirectional transformers

DC Field: Value
dc.contributor.advisor: 홍승훈
dc.contributor.author: Yoo, Jaehoon
dc.contributor.author: 유재훈
dc.date.accessioned: 2024-07-25T19:31:25Z
dc.date.available: 2024-07-25T19:31:25Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045959&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/320727
dc.description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST), School of Computing, 2023.8, [iv, 26 p.]
dc.description.abstract: Autoregressive transformers have shown remarkable success in video generation. However, they are prevented from directly learning long-term dependencies in videos by the quadratic complexity of self-attention, and they inherently suffer from slow inference and error propagation due to the autoregressive process. In this paper, we propose the Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependencies in videos and fast inference. Building on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves linear time complexity in both encoding and decoding by projecting the observable context tokens into a fixed number of latent tokens and conditioning on them to decode the masked tokens through cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvements over autoregressive transformers in both quality and speed when generating moderately long videos.
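The abstract's key mechanism is a latent bottleneck: a fixed number of latent tokens attend to the observed context (encoding), and the masked tokens are then decoded in parallel by attending to those latents (decoding), so cost scales linearly with the number of video tokens. The following is a minimal sketch of that idea in PyTorch, not the thesis implementation; the module name LatentBottleneckDecoder and parameters such as num_latents are illustrative assumptions.

import torch
import torch.nn as nn


class LatentBottleneckDecoder(nn.Module):
    """Sketch of a latent-bottleneck encode/decode step with cross-attention."""

    def __init__(self, dim=512, num_latents=64, num_heads=8):
        super().__init__()
        # Fixed-size set of learned latent tokens (the memory bottleneck).
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        # Encoding: latents attend to the observable context tokens.
        self.encode_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Decoding: masked-token queries attend to the latents.
        self.decode_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, context_tokens, masked_queries):
        # context_tokens:  (B, N_ctx, dim)  embeddings of observed patches
        # masked_queries:  (B, N_mask, dim) positional queries for tokens to predict
        B = context_tokens.size(0)
        latents = self.latents.unsqueeze(0).expand(B, -1, -1)
        # O(N_ctx * num_latents): compress the context into the latent tokens.
        latents, _ = self.encode_attn(latents, context_tokens, context_tokens)
        # O(N_mask * num_latents): decode all masked tokens in parallel from the latents.
        decoded, _ = self.decode_attn(masked_queries, latents, latents)
        return decoded + self.ffn(decoded)


# Usage: 512 observed tokens and 512 masked-token queries per video.
model = LatentBottleneckDecoder()
ctx = torch.randn(2, 512, 512)
queries = torch.randn(2, 512, 512)
out = model(ctx, queries)  # (2, 512, 512)

Because the number of latent tokens is constant, neither attention stage forms the full token-to-token attention matrix, which is what yields the linear complexity claimed in the abstract.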
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: 비디오 생성 모델링; 양방향 트랜스포머; 메모리 효율; 내재 변수 압축
dc.subject: Generative modeling of videos; bidirectional transformer; memory efficiency; latent bottleneck
dc.title: Towards end-to-end generative modeling of long videos with memory-efficient bidirectional transformers
dc.title.alternative: 메모리 효율적 양방향 트랜스포머를 활용한 긴 비디오의 엔드 투 엔드 생성 모델링 연구
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), School of Computing
dc.contributor.alternativeauthor: Hong, Seunghoon
Appears in Collection: CS-Theses_Master (석사논문)
Files in This Item: There are no files associated with this item.