Deep spiral convolutional neural network for single image super-resolution and image enhancement (초해상도 단일 영상 복원 및 영상 화질 개선을 위한 심층 나선형 컨볼루셔널 신경망 연구)

Convolutional neural network (CNN) based super-resolution (SR) and restoration algorithms have recently achieved significant improvements on single image super-resolution (SISR) and various image enhancement (IE) tasks. The main objective of SR and IE is to generate a high-quality, high-resolution (HR) image from a given single low-resolution (LR) image or corrupted noisy image. Despite the powerful learning capacity of deep networks, previous CNN-based SR and IE algorithms still have limitations in recovering fine-textured HR results, even when they achieve high numerical similarity scores such as peak signal-to-noise ratio. This dissertation considers a fully end-to-end trainable texture-enhanced multi-scale SR network (TE-MSRN) and IE networks (e.g., a multi-scale denoising network (MsDNN), a multi-scale deblurring network (MsDBN), and a video quality enhancement network (VQENet)) based on a deep spiral CNN, mitigating the limitations of previous deep SR and IE networks in terms of SR and IE performance, training efficiency, and the recovery of fine-textured details. As SR and IE networks get deeper, learning the long-range dependencies of the complex relationships between a corrupted LR image and an HR image becomes more difficult. Deeper networks generally suffer not only from additional computational complexity and memory cost but also from difficulty in training due to over-fitting and exploding/vanishing/shattered gradients. To overcome these difficulties, this dissertation investigates six extensions: an upscaling network with multi-scale feature embedding, a multi-scale restoration network, both global and local residual learning, a texture evaluating network, a deep spiral CNN, and a combination of multiple losses.
The TE-MSRN takes an LR image and reconstructs an HR image using an upscaling network and a restoration network, not only minimizing the corresponding residuals but also enhancing texture by enforcing the HR prediction to reproduce the ground-truth texture through a texture evaluating network. The global residual between the intermediate HR prediction and the ground truth is minimized in a recurrent manner while each local residual is reduced using a deep spiral CNN, and the intermediate output of each recurrent state in the restoration network is supervised by an intermediate auxiliary loss. While reconstructing the HR output, the texture evaluating network is cascaded onto the restoration network so that an accurate texture prediction can be made from the output of the restoration network during training; it is removed during testing. The deep spiral CNN is realized via a recurrent structure that minimizes the restoration residual in multiple stages: a multi-scale recurrent CNN takes its previous output as input and produces an output that is closer to the ground-truth residual. With each iteration, the residual is gradually reduced and the HR prediction moves closer to the ground truth; the entire process is reminiscent of a spiral staircase reaching its destination. The whole model is nevertheless jointly optimized from scratch in a unified single architecture, with all subnetworks specialized for their own purposes, increasing training efficiency and yielding superior SR performance. The TE-MSRN is trained to produce a fine-textured HR image with a suitable combination of losses: $l_1$-loss, $l_2$-loss, perceptual structural similarity loss, and intermediate auxiliary loss. Based on this combination of loss functions, the TE-MSRN is explicitly trained to further reduce visually implausible artifacts, leading to a more accurate HR result.
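The spiral refinement described above can be sketched as an iterative residual-correction loop. The following is a minimal numerical illustration, not the dissertation's actual network: the "restoration step" is replaced by a hypothetical contraction that moves the prediction a fixed fraction of the way toward the target, so the residual shrinks at every stage, like descending a spiral staircase toward its destination.

```python
import numpy as np

def spiral_refine(init_pred, target, steps=5, step_size=0.5):
    """Toy stand-in for recurrent residual refinement: each stage
    takes the previous prediction as input, estimates the residual
    to the ground truth, and applies part of that correction, so the
    residual norm decreases geometrically across stages."""
    pred = init_pred.copy()
    residual_norms = []
    for _ in range(steps):
        residual = target - pred              # global residual to ground truth
        pred = pred + step_size * residual    # hypothetical restoration step
        residual_norms.append(np.linalg.norm(target - pred))
    return pred, residual_norms

rng = np.random.default_rng(0)
target = rng.standard_normal((8, 8))
init = target + rng.standard_normal((8, 8))   # corrupted initial prediction
pred, norms = spiral_refine(init, target)
assert all(b < a for a, b in zip(norms, norms[1:]))  # residual shrinks each stage
```

In the dissertation the correction at each stage is produced by a learned multi-scale recurrent CNN rather than a fixed fraction of the true residual; the loop structure is the point of the sketch.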
The TE-MSRN is completely end-to-end trainable, with all components integrated into a unified single architecture. Its main architecture is adapted into MsDNN, MsDBN, and VQENet for the corresponding tasks. The performance of the TE-MSRN is evaluated on six standard benchmark datasets for SR, including two datasets consisting only of textures, four benchmark datasets for IE, and three benchmark datasets for complex video scene analysis (VSA) with video quality enhancement (VQE). Extensive experimental results show that TE-MSRN, MsDNN, and MsDBN achieve the best performance while making better texture predictions than current state-of-the-art SR and IE algorithms, and that VQENet helps to increase VSA performance.
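The combined training objective named in the abstract can be sketched as a weighted sum of its terms. This is an illustrative assumption, not the dissertation's actual formulation: the weights are made up, and the perceptual structural similarity term is omitted for brevity.

```python
import numpy as np

def combined_loss(pred, gt, intermediates, w1=1.0, w2=0.5, w_aux=0.1):
    """Illustrative weighted sum of the loss terms named in the
    abstract: l1-loss, l2-loss, and an intermediate auxiliary loss
    applied to each recurrent stage's output. Weights are assumed."""
    l1 = np.mean(np.abs(pred - gt))                          # l1-loss
    l2 = np.mean((pred - gt) ** 2)                           # l2-loss
    aux = sum(np.mean(np.abs(p - gt)) for p in intermediates)  # auxiliary loss
    return w1 * l1 + w2 * l2 + w_aux * aux

gt = np.zeros((4, 4))
assert combined_loss(gt, gt, [gt]) == 0.0        # perfect prediction: zero loss
assert combined_loss(gt + 1.0, gt, [gt + 1.0]) > 0.0
```

Supervising the intermediate stages (the `aux` term) is what gives each recurrent state of the restoration network its own training signal, as described in the abstract.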
Advisors
Yoo, Changdong (유창동)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2018.2, [vii, 102 p.]

Keywords

Single Image Super Resolution; Image Enhancement; Convolutional Neural Network; Deep Spiral Convolutional Neural Network; Video Quality Enhancement; Video Scene Analysis

URI
http://hdl.handle.net/10203/265232
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734379&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
