A study on learning-based approaches to video frame interpolation using linear mapping kernels and CNN-based nonlinear mapping kernels

Frame rate up-conversion (FRUC), also called video frame interpolation (VFI), is a low-level computer vision problem of generating one or more intermediate frames between two consecutive original frames of a video. FRUC has been addressed for several decades with heuristic approaches, and deep-learning-based FRUC has been studied more recently. We propose two approaches to FRUC: (i) a learning-based direct linear mapping approach; and (ii) a kernel-based approach using a hierarchical deep convolutional neural network (CNN).
We first present a novel and effective learning-based FRUC scheme using linear mapping. The scheme consists of (i) a novel hierarchical extended bilateral motion estimation (HEBME) method and (ii) a synthesis-based motion-compensated frame interpolation (S-MCFI) method. First, the HEBME method improves the accuracy of motion estimation (ME), which can lead to a significant improvement in FRUC performance. HEBME consists of two three-layer ME pyramids: one pyramid searches motion vectors (MVs) for a first set of block partitions, and the other searches MVs for a second set of blocks shifted by half a block size relative to the first set. The MVs are searched in a coarse-to-fine manner within each pyramid, and jointly combining the MVs from the two pyramids refines them at a resolution four times finer, which is key to the high ME accuracy. HEBME uses a novel matching criterion for ME that consists of the sum of bilateral absolute differences, the edge variance of the average of the two blocks found by bilateral ME, the pixel variances of those two blocks, and the MV difference between the current block and its neighboring blocks.
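The bilateral matching idea behind the first term of the HEBME criterion can be illustrated with a minimal sketch. It covers only a single-level full search using the sum of bilateral absolute differences; the pyramids, the shifted block set, and the variance and MV-difference terms are not modeled. The function names `sbad` and `bilateral_search` and the parameters `bs` and `radius` are hypothetical and do not come from the thesis.

```python
import numpy as np

def sbad(prev, nxt, y, x, dy, dx, bs):
    """Sum of bilateral absolute differences for the block whose top-left
    corner is (y, x) in the intermediate frame, for a symmetric candidate
    motion vector (dy, dx): the block is read at -mv in the previous frame
    and at +mv in the next frame."""
    a = prev[y - dy:y - dy + bs, x - dx:x - dx + bs].astype(np.float64)
    b = nxt[y + dy:y + dy + bs, x + dx:x + dx + bs].astype(np.float64)
    return float(np.abs(a - b).sum())

def bilateral_search(prev, nxt, y, x, bs=8, radius=4):
    """Exhaustive search over symmetric MVs within +/-radius pixels.
    Returns the best (dy, dx) and its cost. Illustrative only."""
    h, w = prev.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose blocks fall outside either frame.
            if min(y - dy, y + dy, x - dx, x + dx) < 0:
                continue
            if max(y - dy, y + dy) + bs > h or max(x - dx, x + dx) + bs > w:
                continue
            cost = sbad(prev, nxt, y, x, dy, dx, bs)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

Because the search is anchored on the block in the frame to be interpolated, every block of the intermediate frame receives exactly one MV, which avoids the holes and overlaps that arise when MVs are estimated on one of the original frames and then projected.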
Second, the S-MCFI method generates the interpolated frames by applying linear mapping kernels to the original frames. For S-MCFI, multiple linear mapping kernels are computed during training by kernel ridge regression, one for each of various edge characteristics.
We also present a kernel-based FRUC scheme based on a convolutional neural network (CNN), in which the proposed hierarchical CNN learns two sets of horizontal and vertical kernels for the two consecutive input frames. This scheme aims to interpolate a single frame between two consecutive input frames. In a kernel-based approach, the number of kernel taps is important for the subjective quality of the interpolated frame, because the kernels only consider pixels within the range of the taps. The number of taps therefore has to grow in order to handle videos with fast motion, which appears as large displacements, and large-format videos such as high-definition (HD) 1080p and 4K ultra-high-definition (UHD). However, increasing the number of kernel taps is difficult because of memory limitations and computational complexity. Hence, we propose a hierarchical CNN for FRUC. The proposed scheme consists of (i) kernel estimation and (ii) shift-able local convolution for interpolating intermediate pixels. With shift-able local convolution, the estimated kernels can cover large regions that are often out of range for conventional kernel-based approaches. To show the effectiveness of the proposed FRUC schemes, we present experimental results on various test sequences.
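The kernel-based interpolation step described above can be sketched as a per-pixel separable local convolution. This is a minimal illustration and not the thesis implementation: the kernels are supplied as arrays instead of being predicted by a CNN, and the optional `offset` argument is only one possible reading of "shift-able" (moving the centre of each pixel's window), which the abstract does not specify. The names `separable_local_conv` and `interpolate` are hypothetical.

```python
import numpy as np

def separable_local_conv(frame, kv, kh, offset=None):
    """Apply a different separable kernel at every pixel.

    frame  : (H, W) float array
    kv, kh : (H, W, T) per-pixel vertical and horizontal taps
    offset : optional (H, W, 2) integer (dy, dx) shift of each window
             centre (an assumption, see the text above)
    """
    h, w = frame.shape
    t = kv.shape[2]
    r = t // 2
    if offset is None:
        offset = np.zeros((h, w, 2), dtype=int)
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            cy = y + offset[y, x, 0]
            cx = x + offset[y, x, 1]
            # Clamp sample coordinates to the frame (edge replication).
            ys = np.clip(np.arange(cy - r, cy - r + t), 0, h - 1)
            xs = np.clip(np.arange(cx - r, cx - r + t), 0, w - 1)
            patch = frame[np.ix_(ys, xs)]
            # Separable kernel: outer(kv, kh) applied to the T x T patch.
            out[y, x] = kv[y, x] @ patch @ kh[y, x]
    return out

def interpolate(prev, nxt, kernels_prev, kernels_next):
    """Intermediate frame as the sum of the two locally filtered inputs.
    Each kernels_* argument is a tuple (kv, kh) or (kv, kh, offset)."""
    return (separable_local_conv(prev, *kernels_prev)
            + separable_local_conv(nxt, *kernels_next))
```

Storing two 1-D kernels of T taps per pixel costs 2T values instead of T² for a full 2-D kernel, which is why separable kernels are attractive when the tap count must grow; shifting the window centre extends the reach without adding taps.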
The experimental results show that our linear-mapping-based FRUC significantly outperforms state-of-the-art heuristic schemes, with an average PSNR gain of 1.50 dB, and that our hierarchical CNN-based FRUC outperforms state-of-the-art schemes, including the latest deep-learning-based FRUC scheme. In particular, the hierarchical CNN-based FRUC scheme with the proposed shift-able local convolution can interpolate a high-quality intermediate frame when objects in the original frames move fast.
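For reference, PSNR between a ground-truth frame and an interpolated frame is commonly computed as in the sketch below. It assumes 8-bit frames with a peak value of 255; the abstract does not state the exact evaluation protocol (for example, luma only versus all channels), so this is a generic definition and not the thesis's evaluation code.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Since PSNR is logarithmic in the mean squared error, a 1.50 dB gain corresponds to an MSE that is lower by a factor of about 10^0.15, roughly 1.41.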
Advisors
Kim, Munchurl (김문철)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2018.2, [iii, 36 p.]

Keywords

Video frame interpolation; Frame rate up-conversion (FRUC); Motion-compensated frame interpolation; Linear mapping; Kernel ridge regression; Convolutional neural network (CNN)

URI
http://hdl.handle.net/10203/266715
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734033&flag=dissertation
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
