DSpace at KOASAS: Speaker-adaptive Lip Reading with User-dependent Padding

DSpace at KOASAS

College of Engineering(공과대학)School of Electrical Engineering(전기및전자공학부)EE-Conference Papers(학술회의논문)

Speaker-adaptive Lip Reading with User-dependent Padding

Cited 5 time in

Cited 0 time in

Hit : 86
Download : 0

Export

Kim, Minsu / Kim, Hyun Jun / Ro, Yong Man researcher

Lip reading aims to predict speech based on lip movements alone. As it focuses on visual information to model the speech, its performance is inherently sensitive to personal lip appearances and movements. This makes the lip reading models show degraded performance when they are applied to unseen speakers due to the mismatch between training and testing conditions. Speaker adaptation technique aims to reduce this mismatch between train and test speakers, thus guiding a trained model to focus on modeling the speech content without being intervened by the speaker variations. In contrast to the efforts made in audio-based speech recognition for decades, the speaker adaptation methods have not well been studied in lip reading. In this paper, to remedy the performance degradation of lip reading model on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding. The user-dependent padding is a speaker-specific input that can participate in the visual feature extraction stage of a pre-trained lip reading model. Therefore, the lip appearances and movements information of different speakers can be considered during the visual feature encoding, adaptively for individual speakers. Moreover, the proposed method does not need 1) any additional layers, 2) to modify the learned weights of the pre-trained model, and 3) the speaker label of train data used during pre-train. It can directly adapt to unseen speakers by learning the userdependent padding only, in a supervised or unsupervised manner. Finally, to alleviate the speaker information insufficiency in public lip reading databases, we label the speaker of a well-known audio-visual database, LRW, and design an unseen-speaker lip reading scenario named LRW-ID. The effectiveness of the proposed method is verified on sentence- and word-level lip reading, and we show it can further improve the performance of a well-trained model with large speaker variations.

Publisher: European Computer Vision Association

Issue Date: 2022-10-25

Language: English

Citation: European Conference on Computer Vision, ECCV 2022, pp.576 - 593

ISSN: 0302-9743

DOI: 10.1007/978-3-031-20059-5_33

URI: http://hdl.handle.net/10203/299635

Appears in Collection: EE-Conference Papers(학술회의논문)

Files in This Item: There are no files associated with this item.

This item is cited by other documents in WoS

⊙ Detail Information in WoSⓡ	Click to see
⊙ Cited 5 items in WoS	Click to see citing articles in

Display Full Item Record

qr_code

트윗하기

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.

KOASAS

KOASAS

Browse

Speaker-adaptive Lip Reading with User-dependent Padding

This item is cited by other documents in WoS

KOASAS

Communities & Collections