High dynamic range (HDR) UHD-TVs are being rapidly deployed in consumer markets, offering viewers a highly realistic experience. However, these HDR UHD-TVs must still handle legacy low-resolution (LR) video with standard dynamic range (SDR). In this paper, we propose a convolutional neural network (CNN)-based structure for the joint learning of super-resolution and inverse tone-mapping, which can be used to convert legacy LR-SDR video to high-resolution (HR) HDR video. Our proposed structure is designed to perform three tasks: (i) SDR-to-HDR conversion of LR images, (ii) super-resolution of LR-SDR images to HR-SDR images, and (iii) joint conversion of LR-SDR images to HR-HDR images. Extensive experiments demonstrate the effectiveness of our proposed joint-learning CNN architecture.