FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos
Given a pair of facial performance videos, we present a deep learning-based approach that automatically returns a synchronized version of the pair. Traditional methods require precise facial-landmark tracking and/or clean audio, and are therefore sensitive to tracking inaccuracies and audio noise. To alleviate these issues, our approach leverages large-scale video datasets along with their associated audio tracks and trains a deep network to learn audio-based descriptors of video frames. We then use these descriptors to build a cost matrix of frame-to-frame similarity and compute a low-cost non-linear synchronization path through it. Both quantitative and qualitative evaluations show that our approach outperforms existing state-of-the-art methods.
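The cost-matrix-and-path step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the learned descriptors are fixed-length embedding vectors per frame, uses cosine distance as the frame-similarity cost (an assumption; the paper's exact cost is not specified here), and finds the low-cost non-linear path with classic dynamic time warping.

```python
import numpy as np

def cost_matrix(desc_a, desc_b):
    """Pairwise cosine distance between two sets of per-frame
    descriptors, shaped (n_a, d) and (n_b, d)."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    return 1.0 - a @ b.T  # 0 for identical directions, larger when dissimilar

def sync_path(C):
    """Low-cost monotonic alignment path through cost matrix C,
    computed with standard dynamic time warping."""
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )
    # Backtrack from (n, m) to recover the frame-to-frame mapping.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Aligning a video against itself should recover the identity mapping (the diagonal path), since the cosine distance of each frame to itself is zero.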