Towards end-to-end generative modeling of long videos with memory-efficient bidirectional transformers

DC Field: Value
dc.contributor.advisor: 홍승훈
dc.contributor.author: Yoo, Jaehoon
dc.contributor.author: 유재훈
dc.date.accessioned: 2024-07-25T19:31:25Z
dc.date.available: 2024-07-25T19:31:25Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045959&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/320727
dc.description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST), School of Computing, 2023.8, [iv, 26 p.]
dc.description.abstract: Autoregressive transformers have shown remarkable success in video generation. However, they are prevented from directly learning long-term dependencies in videos by the quadratic complexity of self-attention, and they inherently suffer from slow inference and error propagation due to the autoregressive process. In this paper, we propose the Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependencies in videos and fast inference. Building on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves linear time complexity in both encoding and decoding by projecting the observable context tokens into a fixed number of latent tokens and conditioning on them to decode the masked tokens through cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvements over autoregressive transformers in both quality and speed when generating moderately long videos.
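The abstract's key mechanism is a latent bottleneck: a fixed number of latent tokens attend to the observed context (encoding), and the masked tokens are then decoded in parallel by attending to those latents (decoding), so cost scales linearly with the number of video tokens. The following is a minimal sketch of that idea in PyTorch, not the thesis implementation; the module name LatentBottleneckDecoder and parameters such as num_latents are illustrative assumptions.

import torch
import torch.nn as nn


class LatentBottleneckDecoder(nn.Module):
    """Sketch of a latent-bottleneck encode/decode step with cross-attention."""

    def __init__(self, dim=512, num_latents=64, num_heads=8):
        super().__init__()
        # Fixed-size set of learned latent tokens (the memory bottleneck).
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        # Encoding: latents attend to the observable context tokens.
        self.encode_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Decoding: masked-token queries attend to the latents.
        self.decode_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, context_tokens, masked_queries):
        # context_tokens:  (B, N_ctx, dim)  embeddings of observed patches
        # masked_queries:  (B, N_mask, dim) positional queries for tokens to predict
        B = context_tokens.size(0)
        latents = self.latents.unsqueeze(0).expand(B, -1, -1)
        # O(N_ctx * num_latents): compress the context into the latent tokens.
        latents, _ = self.encode_attn(latents, context_tokens, context_tokens)
        # O(N_mask * num_latents): decode all masked tokens in parallel from the latents.
        decoded, _ = self.decode_attn(masked_queries, latents, latents)
        return decoded + self.ffn(decoded)


# Usage: 512 observed tokens and 512 masked-token queries per video.
model = LatentBottleneckDecoder()
ctx = torch.randn(2, 512, 512)
queries = torch.randn(2, 512, 512)
out = model(ctx, queries)  # (2, 512, 512)

Because the number of latent tokens is constant, neither attention stage forms the full token-to-token attention matrix, which is what yields the linear complexity claimed in the abstract.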
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: 비디오 생성 모델링; 양방향 트랜스포머; 메모리 효율; 내재 변수 압축
dc.subject: Generative modeling of videos; bidirectional transformer; memory efficiency; latent bottleneck
dc.title: Towards end-to-end generative modeling of long videos with memory-efficient bidirectional transformers
dc.title.alternative: 메모리 효율적 양방향 트랜스포머를 활용한 긴 비디오의 엔드 투 엔드 생성 모델링 연구
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), School of Computing
dc.contributor.alternativeauthor: Hong, Seunghoon
Appears in Collection: CS-Theses_Master (석사논문)
Files in This Item: There are no files associated with this item.