This paper tackles the task of tracking extreme climate events. Compared with other visual object tracking problems, the task poses unique challenges: a wider range of spatio-temporal dynamics, unclear target boundaries, and a shortage of labeled data. We propose a simple but robust end-to-end model, based on multi-layered ConvLSTMs, that is suited to climate event tracking. The model first learns to imprint the location and appearance of the target in the first frame in an auto-encoding fashion. The learned feature is then fed to a tracking module, which tracks the target in subsequent frames. To address the data shortage, we propose a data augmentation scheme based on conditional generative adversarial networks. Extensive experiments on a hurricane tracking task show that the proposed framework significantly improves tracking performance over several state-of-the-art methods.
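
For reference, one common formulation of a single ConvLSTM layer (without peephole connections) is sketched below. The abstract does not state which variant the model uses, so this formulation and all of its symbols are assumptions for illustration: $X_t$ is the input frame at time $t$, $H_t$ the hidden state, $C_t$ the cell state, $W$ the convolution kernels, and $b$ the biases. Here $*$ denotes convolution, $\circ$ the Hadamard product, and $\sigma$ the logistic sigmoid.

```latex
\begin{align}
  i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
  f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
  o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
  C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
  H_t &= o_t \circ \tanh\left(C_t\right)
\end{align}
```

Because the states $H_t$ and $C_t$ are spatial feature maps and not flat vectors, a stack of such layers can in principle carry both where the target is and what it looks like from one frame to the next.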