Deep predictive video compression using mode-selective uni- and bi-directional predictions based on multi-frame hypothesis

Deep learning-based research in image processing has recently been very active. Likewise, recent methods for image and video compression employ deep neural networks for both non-linear transforms and motion estimation. Deep learning-based image compression has shown significant improvements in coding efficiency and subjective quality; however, relatively little effort has been devoted to deep-neural-network-based video compression. In this study, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, that uses mode-selective uni- and bi-directional predictions based on a multi-frame hypothesis, together with a multi-scale structure and a temporal-context-adaptive entropy model. First, we propose a structure that compresses the current frame using multiple reference frames rather than the single reference frame used in recent methods; single-reference approaches limit coding efficiency because only limited neighboring-frame information is exploited when compressing the current frame. Second, learning from the lessons of conventional video codecs, we incorporate, for the first time, a mode-selective framework with uni- and bi-directional predictive modes into DeepPVCnet in a rate-distortion minimization sense; recent methods use either uni-directional or bi-directional prediction exclusively, which limits their coding efficiency. Third, we propose an entropy model that utilizes temporal context information from the reference frames when coding the current frame. Autoregressive entropy models for CNN-based image and video compression are difficult to compute with parallel processing; in contrast, our entropy model draws on temporally coherent context from the reference frames, so the context information can be computed in parallel. Finally, DeepPVCnet jointly compresses the motion information and residual data generated from the multi-scale structure via feature transformation layers, which is advantageous in terms of computational complexity and the ability to remove redundancy between the two kinds of information. Extensive experiments show that DeepPVCnet outperforms AVC/H.264, HEVC/H.265, and state-of-the-art methods in terms of MS-SSIM.
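The mode selection described in the abstract follows the rate-distortion optimization long used in conventional codecs: each candidate prediction mode is scored with a Lagrangian cost J = D + λR, and the mode with the smallest cost is chosen and signaled. Below is a minimal illustrative sketch, not the thesis code; the mode names, the distortion measure (1 − MS-SSIM), the rate values, and the λ value are all assumptions for the example.

```python
# Hypothetical sketch of mode-selective R-D decision: score each
# candidate prediction mode with J = D + lambda * R and keep the
# cheaper one. All concrete numbers below are illustrative.

from dataclasses import dataclass
from typing import List


@dataclass
class ModeResult:
    name: str          # e.g., "uni" or "bi"
    distortion: float  # e.g., 1 - MS-SSIM of the reconstruction
    rate: float        # estimated bits per pixel from the entropy model


def select_mode(candidates: List[ModeResult], lam: float = 0.01) -> ModeResult:
    """Return the candidate minimizing J = D + lam * R."""
    return min(candidates, key=lambda m: m.distortion + lam * m.rate)


# Example: bi-directional prediction costs more bits but predicts better.
uni = ModeResult("uni", distortion=0.020, rate=0.30)
bi = ModeResult("bi", distortion=0.012, rate=0.45)
print(select_mode([uni, bi]).name)  # "bi" at this lambda
```

A larger λ weights rate more heavily and pushes the decision toward the cheaper uni-directional mode, which is how the trade-off point along the rate-distortion curve is controlled.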
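The parallelism claimed for the temporal-context entropy model can also be made concrete: because the reference frames are fully decoded before the current frame, a context network conditioned only on them can emit entropy parameters for every latent position in a single forward pass, whereas an autoregressive (e.g., masked-convolution) model must visit positions sequentially at decode time. A hypothetical PyTorch sketch, with all layer sizes and names assumed:

```python
import torch
import torch.nn as nn


class TemporalContextEntropyModel(nn.Module):
    """Hypothetical sketch: predict Gaussian entropy parameters for the
    current frame's latents from reference-frame features only. Since the
    references are already decoded, the (mu, sigma) maps for all spatial
    positions come out of one forward pass, i.e. in parallel, unlike an
    autoregressive model that decodes positions one by one."""

    def __init__(self, ref_channels: int, latent_channels: int):
        super().__init__()
        self.param_net = nn.Sequential(
            nn.Conv2d(ref_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 2 * latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, ref_features: torch.Tensor):
        # Split channels into per-position mean and (log-)scale maps.
        mu, log_sigma = self.param_net(ref_features).chunk(2, dim=1)
        return mu, log_sigma.exp()
```

At decode time, mu and sigma would parameterize the arithmetic decoder for all latent positions at once, which is where the parallel speedup over autoregressive context models comes from.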
Advisors
Kim, Munchurl (김문철)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2021
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - KAIST: School of Electrical Engineering, 2021.2, [vii, 60 p.]

Keywords

Deep learning; Video compression; Predictive coding; Entropy coding; AVC/H.264; HEVC/H.265

URI
http://hdl.handle.net/10203/295614
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=956672&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
