Audio-visual learning with semantically similar samples (Korean title: Audio-visual associative learning using semantic similarity)

Instance discrimination-based contrastive learning trains a model by contrasting positive pairs against negative pairs, under the assumption that the two samples in a negative pair carry different semantic information. However, this assumption does not always hold, because training batches are constructed randomly and a negative pair may therefore contain semantically similar samples. Intuitively, such faulty negative pairs disturb training and degrade model performance. This work aims to solve the faulty negative problem for instance discrimination-based audio-visual learning. Existing audio-visual works employ contrastive learning by treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. As in general instance discrimination-based contrastive learning, these negative pairs may contain semantically matched audio-visual information. The key contribution of this work is to show that semantically similar samples can compensate for the effect of faulty negative pairs. Our approach incorporates semantically similar samples directly into the contrastive learning objective. It is applied to two tasks, audio-visual sound source localization and visually grounded speech, and we demonstrate its effectiveness on both.
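The idea of folding semantically similar samples into a contrastive objective can be illustrated with a minimal sketch. This is not the thesis's actual formulation, which this record does not give. It assumes an InfoNCE-style loss over cosine similarities in which every sample marked as semantically similar to the anchor is counted as a positive rather than a negative. The function names, the similarity threshold, and the temperature value are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def logsumexp_rows(x):
    """Numerically stable log(sum(exp(x))) along each row."""
    m = x.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True))).squeeze(1)


def similar_sample_mask(features, threshold=0.9):
    """Hypothetical rule: mark pairs whose cosine similarity reaches the
    threshold as semantically similar. The diagonal (the true pair) is
    always marked."""
    f = l2_normalize(features)
    mask = (f @ f.T) >= threshold
    np.fill_diagonal(mask, True)
    return mask


def contrastive_loss(audio, visual, positive_mask, temperature=0.07):
    """InfoNCE-style audio-to-visual loss with a set of positives per anchor.

    positive_mask[i, j] is True when visual sample j counts as a positive
    for audio sample i. With an identity mask this reduces to the standard
    instance discrimination loss.
    """
    logits = l2_normalize(audio) @ l2_normalize(visual).T / temperature
    # Numerator: only the entries marked as positives.
    pos_logits = np.where(positive_mask, logits, -np.inf)
    per_anchor = logsumexp_rows(logits) - logsumexp_rows(pos_logits)
    return float(per_anchor.mean())


# Toy batch: samples 0 and 1 share the same semantics, sample 2 differs.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
standard = contrastive_loss(emb, emb, np.eye(3, dtype=bool))
with_similar = contrastive_loss(emb, emb, similar_sample_mask(emb))
print(standard, with_similar)
```

In this toy batch the standard loss penalizes samples 0 and 1 for being close to each other even though they are semantically matched, while the masked variant does not. That is the faulty negative effect the abstract describes.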
Advisors
Kweon, In So (권인소)
Description
Korea Advanced Institute of Science and Technology (KAIST): Division of Future Vehicle (미래자동차학제전공)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Division of Future Vehicle, 2023.2, [v, 30 p.]

Keywords

Audio-visual learning (청각-시각연관학습); Sound source localization (음원위치탐색); Visually grounded speech (음성의시각적이해)

URI
http://hdl.handle.net/10203/308327
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032366&flag=dissertation
Appears in Collection
PD-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
