DC Field | Value | Language |
---|---|---|
dc.contributor.author | Seo, Younggyo | ko |
dc.contributor.author | Lee, Kimin | ko |
dc.contributor.author | Liu, Fangchen | ko |
dc.contributor.author | James, Stephen | ko |
dc.contributor.author | Abbeel, Pieter | ko |
dc.date.accessioned | 2023-12-12T10:01:14Z | - |
dc.date.available | 2023-12-12T10:01:14Z | - |
dc.date.created | 2023-12-08 | - |
dc.date.issued | 2022-10 | - |
dc.identifier.citation | 29th IEEE International Conference on Image Processing, ICIP 2022, pp. 3943-3947 | - |
dc.identifier.issn | 1522-4880 | - |
dc.identifier.uri | http://hdl.handle.net/10203/316315 | - |
dc.description.abstract | Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool by separating video prediction into two sub-problems: pre-training an image generator model, then learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has yet to be seen. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and produce high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and introduce additional techniques of top-k sampling and data augmentation to further improve video prediction quality. Despite its simplicity, the proposed method achieves performance competitive with state-of-the-art approaches on standard video prediction benchmarks with fewer parameters, and enables high-resolution video prediction on complex and large-scale datasets. | - |
dc.language | English | - |
dc.publisher | IEEE International Conference on Image Processing | - |
dc.title | HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator | - |
dc.type | Conference | - |
dc.identifier.wosid | 001058109504005 | - |
dc.identifier.scopusid | 2-s2.0-85139902097 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 3943 | - |
dc.citation.endingpage | 3947 | - |
dc.citation.publicationname | 29th IEEE International Conference on Image Processing, ICIP 2022 | - |
dc.identifier.conferencecountry | FR | - |
dc.identifier.conferencelocation | Bordeaux | - |
dc.identifier.doi | 10.1109/ICIP46576.2022.9897982 | - |
dc.contributor.localauthor | Lee, Kimin | - |
dc.contributor.nonIdAuthor | Liu, Fangchen | - |
dc.contributor.nonIdAuthor | James, Stephen | - |
dc.contributor.nonIdAuthor | Abbeel, Pieter | - |