CATs++: Boosting Cost Aggregation With Convolutions and Transformers

DC Field | Value | Language
dc.contributor.author | Cho, Seokju | ko
dc.contributor.author | Hong, Sunghwan | ko
dc.contributor.author | Kim, Seungryong | ko
dc.date.accessioned | 2024-08-16T02:00:08Z | -
dc.date.available | 2024-08-16T02:00:08Z | -
dc.date.created | 2024-08-16 | -
dc.date.issued | 2023-06 | -
dc.identifier.citation | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, v.45, no.6, pp.7174 - 7194 | -
dc.identifier.issn | 0162-8828 | -
dc.identifier.uri | http://hdl.handle.net/10203/322308 | -
dc.description.abstract | Cost aggregation is a process in image matching tasks that aims to disambiguate noisy matching scores. Existing methods generally tackle this with hand-crafted or CNN-based approaches, which either lack robustness to severe deformations or inherit the limitations of CNNs, which fail to discriminate incorrect matches due to limited receptive fields and inadaptability. In this paper, we introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among the initial correlation maps, with the help of architectural designs that let us benefit from the global receptive fields of the self-attention mechanism. To this end, we include appearance affinity modeling, which helps to disambiguate the noisy initial correlation maps. Furthermore, we introduce several techniques, including multi-level aggregation to exploit the rich semantics present at different feature levels and swapping self-attention to obtain reciprocal matching scores that act as a regularization. Although CATs attains competitive performance, it faces a limitation, namely high computational cost, which may restrict it to limited input resolutions and hurt performance. To overcome this, we propose CATs++, an extension of CATs. Concretely, we introduce early convolutions prior to transformer-based cost aggregation to control the number of tokens and inject convolutional inductive bias, and then propose a novel transformer architecture for both efficient and effective cost aggregation, which yields a clear performance boost and cost reduction. With the reduced costs, we are able to compose our network with a hierarchical structure to process higher-resolution inputs. We show that the proposed method, with these components integrated, outperforms previous state-of-the-art methods by large margins. | -
dc.language | English | -
dc.publisher | IEEE COMPUTER SOC | -
dc.title | CATs++: Boosting Cost Aggregation With Convolutions and Transformers | -
dc.type | Article | -
dc.identifier.wosid | 000982475600039 | -
dc.identifier.scopusid | 2-s2.0-85141643700 | -
dc.type.rims | ART | -
dc.citation.volume | 45 | -
dc.citation.issue | 6 | -
dc.citation.beginningpage | 7174 | -
dc.citation.endingpage | 7194 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | -
dc.identifier.doi | 10.1109/TPAMI.2022.3218727 | -
dc.contributor.localauthor | Kim, Seungryong | -
dc.contributor.nonIdAuthor | Cho, Seokju | -
dc.contributor.nonIdAuthor | Hong, Sunghwan | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Costs | -
dc.subject.keywordAuthor | Transformers | -
dc.subject.keywordAuthor | Correlation | -
dc.subject.keywordAuthor | Semantics | -
dc.subject.keywordAuthor | Feature extraction | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Computer architecture | -
dc.subject.keywordAuthor | Cost aggregation | -
dc.subject.keywordAuthor | efficient transformer | -
dc.subject.keywordAuthor | semantic visual correspondence | -
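The abstract's core idea, treating an initial correlation map between two images as a set of tokens and aggregating it with self-attention in both the source-to-target and target-to-source directions, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the CATs or CATs++ implementation: the feature maps are random arrays, the attention is a single head with identity query/key/value projections, and the helper names (`correlation_map`, `self_attention`, `swapped_aggregation`) are hypothetical. Appearance affinity modeling, multi-level aggregation, and early convolutions are not modeled.

```python
import numpy as np


def correlation_map(feat_src, feat_tgt, eps=1e-8):
    """Cosine similarity between every source and every target location.

    feat_src, feat_tgt: arrays of shape (C, H, W).
    Returns a cost matrix of shape (H_src * W_src, H_tgt * W_tgt).
    """
    c = feat_src.shape[0]
    fs = feat_src.reshape(c, -1).T  # (N_src, C): one descriptor per source pixel
    ft = feat_tgt.reshape(c, -1).T  # (N_tgt, C): one descriptor per target pixel
    fs = fs / (np.linalg.norm(fs, axis=1, keepdims=True) + eps)
    ft = ft / (np.linalg.norm(ft, axis=1, keepdims=True) + eps)
    return fs @ ft.T


def self_attention(tokens):
    """Toy single-head self-attention with identity Q/K/V projections.

    Hypothetical stand-in for a learned transformer layer: each row of
    `tokens` is one token, and the output has the same shape.
    """
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ tokens


def swapped_aggregation(cost):
    """Aggregate the cost in both matching directions and combine.

    forward: tokens are source positions, each carrying its row of scores.
    backward: tokens are target positions (cost transposed); the result is
    transposed back so both terms share the (N_src, N_tgt) layout.
    """
    forward = self_attention(cost)
    backward = self_attention(cost.T).T
    return forward + backward


rng = np.random.default_rng(0)
src = rng.standard_normal((16, 8, 8))  # (C, H, W) dummy source features
tgt = rng.standard_normal((16, 8, 8))  # (C, H, W) dummy target features

cost = correlation_map(src, tgt)
agg = swapped_aggregation(cost)
print(cost.shape, agg.shape)  # (64, 64) (64, 64)
```

Because the two branches are summed, feeding the transposed cost map returns the transposed result, which is one simple way to read the abstract's "reciprocal matching scores". The published architecture uses learned projections and additional components that this sketch deliberately leaves out.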
Appears in Collection
AI-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.