Recently, many deep-learning-based pan-sharpening methods have been proposed for generating high-quality pan-sharpened images. Pan-sharpening reconstructs a high-resolution image of a target band (e.g., multi-spectral, MS) from a pair consisting of a low-resolution image of the target band and a high-resolution image of another band (e.g., panchromatic, PAN). In this work, we propose three novel convolutional neural network (CNN) based pan-sharpening methods.

First, we propose a generalized CNN architecture that handles both pan-sharpening and super-resolution, i.e., operates both with and without PAN input images. To achieve high performance in both tasks, we create a dummy PAN image when the PAN input is unavailable, which is then fed to our pan-sharpening network. Moreover, since satellite image datasets have different signal characteristics from one another (i.e., different dynamic ranges and pixel histograms), we employ an efficient normalization for satellite images. We train on a mixed dataset from two different satellites and show promising results on those satellites as well as on an unseen satellite dataset.

Second, we propose a novel loss function for training pan-sharpening CNNs. Conventional pan-sharpening methods have mainly focused on various CNN structures, trained by simply minimizing L1 or L2 losses between high-resolution multi-spectral (MS) target images and generated network outputs. When the PAN and MS images used as CNN inputs have a small ground sample distance (GSD), they often exhibit inter-channel pixel misalignment due to inherent limitations of satellite sensor arrays. Methods trained with L1 or L2 losses on such misaligned datasets tend to produce HR images of inferior visual quality, including double-edge artifacts. We therefore propose a novel loss function, called the spectral-spatial structure (S3) loss, specifically designed to preserve the spectral information of MS targets and the spatial structure of PAN targets in pan-sharpened images. More specifically, our S3 loss consists of two terms: a spectral loss between generated images and MS targets, and a spatial loss between generated images and PAN targets. The S3 loss can be used with any CNN structure for pan-sharpening and yields significant visual improvements over state-of-the-art CNN-based pan-sharpening methods.

Finally, we propose a novel unsupervised learning framework for pan-sharpening. Conventional CNN-based pan-sharpening methods rely on supervised learning, applying a degradation model to the original MS-PAN satellite images to generate the MS-PAN inputs. Consequently, these networks are trained only for the reduced-scale scenario and perform poorly when tested at the original scale. Our proposed unsupervised learning framework overcomes this problem. To achieve high visual quality, we first propose a simple correlation-based multi-resolution MS-PAN registration that obtains a coarsely aligned PAN-resolution MS target from each MS-PAN input pair. We then design two losses for training our network: a spectral loss between network outputs and our aligned MS targets, and a spatial loss between network outputs and the PAN inputs. Experimental results show that our method generates pan-sharpened images with much higher visual quality and better metric scores than both our earlier methods and state-of-the-art pan-sharpening methods.
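The dummy-PAN creation and per-satellite normalization mentioned for the first method can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the abstract does not specify the exact recipes, so the percentile-based normalization and band-averaged nearest-neighbour upsampling below are hypothetical stand-ins, not the thesis's actual procedures.

```python
import numpy as np

def normalize_image(img, lo_pct=1.0, hi_pct=99.0):
    """Illustrative percentile normalization (assumed scheme): maps images
    from satellites with differing dynamic ranges into a common [0, 1] range,
    robust to a small fraction of outlier pixels."""
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    return np.clip((img - lo) / (hi - lo + 1e-8), 0.0, 1.0)

def make_dummy_pan(ms_lr, scale=4):
    """Hypothetical dummy-PAN synthesis for the no-PAN case:
    nearest-neighbour upsample the low-resolution MS image by `scale`,
    then average its bands to a single channel.
    ms_lr: (h, w, C) array -> returns an (h*scale, w*scale) array."""
    up = np.repeat(np.repeat(ms_lr, scale, axis=0), scale, axis=1)
    return up.mean(axis=-1)
```

The dummy PAN has the spatial size the network expects for a real PAN input, so the same architecture can be reused unchanged for plain super-resolution.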
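The two-term structure of the S3 loss can be sketched in NumPy. This is a simplified toy version, assuming L1 penalties, a band-averaged intensity, and an image-gradient comparison for the spatial term; the thesis's actual spectral and spatial terms and their weighting `alpha` may differ.

```python
import numpy as np

def s3_loss(output, ms_target, pan_target, alpha=0.5):
    """Toy spectral-spatial loss: an L1 spectral term against the MS target
    plus an L1 spatial term comparing image gradients with the PAN target.
    output, ms_target: (H, W, C) arrays; pan_target: (H, W) array."""
    # Spectral term: preserve band-wise intensities of the MS target.
    spectral = np.mean(np.abs(output - ms_target))
    # Spatial term: compare gradients of the output's intensity (band mean)
    # with gradients of the PAN image, so edges follow the PAN structure.
    intensity = output.mean(axis=-1)
    gy_o, gx_o = np.gradient(intensity)
    gy_p, gx_p = np.gradient(pan_target)
    spatial = np.mean(np.abs(gy_o - gy_p)) + np.mean(np.abs(gx_o - gx_p))
    return alpha * spectral + (1.0 - alpha) * spatial
```

Because the spatial term compares gradients rather than raw pixels, it tolerates the inter-channel misalignment described above better than a plain L1/L2 loss against a (misaligned) HR MS target.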
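The correlation-based registration used in the unsupervised framework can be illustrated with a minimal single-resolution sketch: an exhaustive integer-shift search that maximizes normalized cross-correlation between one MS band and the PAN image. The thesis describes a multi-resolution scheme; the single-scale search and the `max_shift` parameter below are simplifying assumptions.

```python
import numpy as np

def register_shift(ms_band, pan, max_shift=2):
    """Hypothetical coarse registration: search integer shifts (dy, dx)
    of `ms_band` that maximize its normalized cross-correlation with `pan`.
    Returns the best (dy, dx) pair."""
    best_corr, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(ms_band, dy, axis=0), dx, axis=1)
            a = shifted - shifted.mean()
            b = pan - pan.mean()
            corr = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            if corr > best_corr:
                best_corr, best_shift = corr, (dy, dx)
    return best_shift
```

Applying the recovered shift to each MS band yields the coarsely aligned PAN-resolution MS target used by the spectral loss.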