HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Seo, Younggyo | ko |
| dc.contributor.author | Lee, Kimin | ko |
| dc.contributor.author | Liu, Fangchen | ko |
| dc.contributor.author | James, Stephen | ko |
| dc.contributor.author | Abbeel, Pieter | ko |
| dc.date.accessioned | 2023-12-12T10:01:14Z | - |
| dc.date.available | 2023-12-12T10:01:14Z | - |
| dc.date.created | 2023-12-08 | - |
| dc.date.issued | 2022-10 | - |
| dc.identifier.citation | 29th IEEE International Conference on Image Processing, ICIP 2022, pp. 3943-3947 | - |
| dc.identifier.issn | 1522-4880 | - |
| dc.identifier.uri | http://hdl.handle.net/10203/316315 | - |
| dc.description.abstract | Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool, separating video prediction into two sub-problems: pre-training an image generator model, followed by learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has yet to be seen. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and produce high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and introduce additional techniques of top-k sampling and data augmentation to further improve video prediction quality. Despite its simplicity, the proposed method achieves performance competitive with state-of-the-art approaches on standard video prediction benchmarks with fewer parameters, and enables high-resolution video prediction on complex and large-scale datasets. | - |
| dc.language | English | - |
| dc.publisher | IEEE International Conference on Image Processing | - |
| dc.title | HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator | - |
| dc.type | Conference | - |
| dc.identifier.wosid | 001058109504005 | - |
| dc.identifier.scopusid | 2-s2.0-85139902097 | - |
| dc.type.rims | CONF | - |
| dc.citation.beginningpage | 3943 | - |
| dc.citation.endingpage | 3947 | - |
| dc.citation.publicationname | 29th IEEE International Conference on Image Processing, ICIP 2022 | - |
| dc.identifier.conferencecountry | FR | - |
| dc.identifier.conferencelocation | Bordeaux | - |
| dc.identifier.doi | 10.1109/ICIP46576.2022.9897982 | - |
| dc.contributor.localauthor | Lee, Kimin | - |
| dc.contributor.nonIdAuthor | Liu, Fangchen | - |
| dc.contributor.nonIdAuthor | James, Stephen | - |
| dc.contributor.nonIdAuthor | Abbeel, Pieter | - |
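
The abstract names top-k sampling as one of the techniques used to improve prediction quality. The following is a minimal sketch of generic top-k sampling over a list of logits, not the authors' implementation. The function name `top_k_sample`, the example logits, and the choice of `k` are illustrative assumptions; in an autoregressive latent model, the logits would presumably be the transformer's scores over discrete image-generator tokens.

```python
import math
import random


def top_k_sample(logits, k, rng=None):
    """Sample one index from `logits`, restricted to the k largest entries."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    if rng is None:
        rng = random.Random()
    # Indices of the k highest-scoring candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    max_logit = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - max_logit) for i in top]
    # Draw one index in proportion to its renormalized weight.
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical scores for a 5-token vocabulary.
logits = [2.0, 0.5, -1.0, 3.0, 0.0]
samples = [top_k_sample(logits, k=2, rng=random.Random(seed)) for seed in range(100)]
```

With `k=2`, only the two highest-scoring indices (3 and 0 here) can ever be drawn, which cuts off the low-probability tail that tends to produce implausible tokens. Setting `k=1` reduces to greedy decoding.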
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.