Self-Supervised Visual Representation Learning via Residual Momentum

DC Field | Value | Language
dc.contributor.author | Pham, Trung Xuan | ko
dc.contributor.author | Niu, Axi | ko
dc.contributor.author | Zhang, Kang | ko
dc.contributor.author | Jin, Tee Joshua Tian | ko
dc.contributor.author | Hong, Ji Woo | ko
dc.contributor.author | Yoo, Chang-Dong | ko
dc.date.accessioned | 2023-11-27T02:01:54Z | -
dc.date.available | 2023-11-27T02:01:54Z | -
dc.date.created | 2023-11-25 | -
dc.date.issued | 2023 | -
dc.identifier.citation | IEEE ACCESS, v.11, pp.116706 - 116720 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | http://hdl.handle.net/10203/315212 | -
dc.description.abstract | Self-supervised learning (SSL) has emerged as a promising approach for learning representations from unlabeled data. Among the many SSL methods proposed in recent years, momentum-based contrastive frameworks such as MoCo-v3 have shown remarkable success. However, in these frameworks a significant representation gap exists between the online encoder (student) and the momentum encoder (teacher), limiting performance on downstream tasks. We identify this gap as a bottleneck often overlooked in existing frameworks and propose 'residual momentum', which explicitly reduces the gap during training to encourage the student to learn representations closer to the teacher's. We also show that knowledge distillation (KD), a related technique that reduces the distribution gap with a cross-entropy-based loss in supervised learning, is ineffective in the SSL context, and demonstrate that the intra-representation gap measured by cosine similarity is crucial for EMA-based SSL methods. Extensive experiments on different benchmark datasets and architectures demonstrate the superiority of our method over state-of-the-art contrastive learning baselines. Specifically, our method outperforms MoCo-v3 by 0.7% top-1 on ImageNet, by 2.82% on CIFAR-100, and by 1.8% AP and 3.0% AP75 on VOC detection with COCO pre-training; it also improves DenseCL by 0.5% AP (800 epochs) and 0.6% AP75 (1600 epochs). Our work highlights the importance of reducing the teacher-student intra-gap in momentum-based contrastive learning frameworks and provides a practical solution for improving the quality of learned representations. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Self-Supervised Visual Representation Learning via Residual Momentum | -
dc.type | Article | -
dc.identifier.wosid | 001121769800001 | -
dc.identifier.scopusid | 2-s2.0-85174832711 | -
dc.type.rims | ART | -
dc.citation.volume | 11 | -
dc.citation.beginningpage | 116706 | -
dc.citation.endingpage | 116720 | -
dc.citation.publicationname | IEEE ACCESS | -
dc.identifier.doi | 10.1109/access.2023.3325842 | -
dc.contributor.localauthor | Yoo, Chang-Dong | -
dc.contributor.nonIdAuthor | Pham, Trung Xuan | -
dc.contributor.nonIdAuthor | Niu, Axi | -
dc.contributor.nonIdAuthor | Jin, Tee Joshua Tian | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Contrastive learning | -
dc.subject.keywordAuthor | residual momentum | -
dc.subject.keywordAuthor | representation learning | -
dc.subject.keywordAuthor | self-supervised learning | -
dc.subject.keywordAuthor | knowledge distillation | -
dc.subject.keywordAuthor | teacher-student gap | -
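
The abstract above describes the core idea: on top of a MoCo-v3-style contrastive objective, add a "residual momentum" term that explicitly shrinks the cosine-similarity gap between the online (student) encoder and its EMA momentum (teacher) encoder. The following PyTorch sketch reconstructs that idea from the abstract alone; it is not the authors' implementation, and the toy encoder, the loss weight gap_weight, and helpers such as update_teacher and residual_momentum_loss are illustrative assumptions.

```python
# Minimal sketch of a MoCo-v3-style step with an added teacher-student gap term,
# assembled from the abstract's description (not the authors' released code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in backbone + projection head; any encoder could be used here."""
    def __init__(self, dim_in=32, dim_out=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))
    def forward(self, x):
        return self.net(x)

student = ToyEncoder()
teacher = copy.deepcopy(student)          # momentum (EMA) encoder
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def update_teacher(m=0.99):
    """EMA update of the teacher from the student, as in momentum-based methods."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)

def info_nce(q, k, temperature=0.2):
    """Standard InfoNCE loss: positives are matching indices within the batch."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / temperature
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

def residual_momentum_loss(q, k):
    """Gap term: pull the student's representation toward the teacher's on the
    same view, using 1 - cosine similarity as in the abstract's description."""
    return 1.0 - F.cosine_similarity(q, k.detach(), dim=1).mean()

# One illustrative training step on random "two-view" data.
opt = torch.optim.SGD(student.parameters(), lr=0.05)
view1, view2 = torch.randn(8, 32), torch.randn(8, 32)

q1, q2 = student(view1), student(view2)
with torch.no_grad():
    k1, k2 = teacher(view1), teacher(view2)

gap_weight = 1.0  # assumed hyperparameter; the paper would tune this per setting
loss = info_nce(q1, k2) + info_nce(q2, k1) \
       + gap_weight * 0.5 * (residual_momentum_loss(q1, k1) + residual_momentum_loss(q2, k2))

opt.zero_grad()
loss.backward()
opt.step()
update_teacher()
```

In a realistic setup the encoders would be full backbones with projection and prediction heads, and the gap weight would be tuned per benchmark; this sketch only shows where such a gap-reduction term would enter the training step.
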
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
