Towards end-to-end generative modeling of long videos with memory-efficient bidirectional transformers

Autoregressive transformers have shown remarkable success in video generation. However, the quadratic complexity of self-attention prevents them from directly learning long-term dependencies in videos, and the autoregressive process makes them inherently suffer from slow inference and error propagation. In this paper, we propose the Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependencies in videos and fast inference. Building on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves linear time complexity in both encoding and decoding by projecting the observable context tokens into a fixed number of latent tokens and conditioning on them to decode the masked tokens through cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvements over autoregressive transformers in both quality and speed when generating moderately long videos.
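The abstract describes compressing a variable number of context tokens into a fixed number of latent tokens, then decoding all masked tokens in parallel by cross-attending to those latents. The following is a minimal NumPy sketch of that latent-bottleneck idea, not the thesis's actual architecture: all array sizes and the single-head, unprojected attention are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product cross-attention: cost is O(len(queries) * len(keys)).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
d_model, n_latent = 16, 8          # latent bottleneck size is FIXED
n_context, n_masked = 1000, 1000   # these grow with video length

context = rng.normal(size=(n_context, d_model))  # observed patch tokens
masked = rng.normal(size=(n_masked, d_model))    # masked-token queries
latents = rng.normal(size=(n_latent, d_model))   # learned latent tokens

# Encode: compress the variable-length context into n_latent tokens,
# costing O(n_context * n_latent) rather than O(n_context^2) self-attention.
z = cross_attention(latents, context, context)

# Decode: every masked token reads from the latent bottleneck in parallel,
# costing O(n_masked * n_latent); total cost is linear in video length.
out = cross_attention(masked, z, z)
print(out.shape)  # (1000, 16)
```

Because the bottleneck size stays constant while the token counts grow, both stages scale linearly with the number of video tokens, which is the memory-efficiency property the abstract claims.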
Advisors
홍승훈
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Computing, 2023.8, [iv, 26 p.]

Keywords

Generative modeling of videos; bidirectional transformer; memory efficiency; latent bottleneck

URI
http://hdl.handle.net/10203/320727
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045959&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)
