A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval

Image-text retrieval (ITR), which identifies the most relevant images for a given text query or vice versa, is a fundamental task in emerging vision-and-language machine learning applications. Recently, cross-modal approaches that extract image and text features in separate reasoning pipelines but perform the similarity search on a shared embedding representation have been proposed for real-time ITR systems. However, the similarity search, which finds the most relevant data among huge data embeddings for a given query, becomes the bottleneck of the ITR system. In this paper, we propose a dual-mode similarity search accelerator that removes this computational hurdle for online image-to-text and text-to-image retrieval services. We propose an embedding compression scheme that removes the sparsity in the text embeddings, thereby eliminating the time-consuming masking operations in the subsequent processing pipeline. Combined with data quantization from 32-bit floating point to 8-bit integer, we reduce the target dataset size by 95.1% with less than 0.1% accuracy loss for 1024-dimensional embedding features. In addition, we propose a streamlined similarity search data flow for both query types that minimizes the required memory bandwidth through maximal data reuse; with the optimized data flow, the query and data embeddings are guaranteed to be fetched from external memory only once. Based on the proposed data representation and data flow, we design a scalable similarity search accelerator composed of multiple ITR kernels. Each ITR kernel has a modular design, consisting of a separate memory access module and a computing module. The computing module supports pipelined execution of the four similarity search tasks: dot-product calculation, data reordering, partial score aggregation, and ranking. We double the number of processing operations in the computing module with the DSP packing technique. Finally, we implement the proposed accelerator with six ITR kernels on a Xilinx Alveo U280 FPGA card. It delivers 2.98 tera operations per second (TOPS) at 186 MHz, achieving 526/144 and 1163/306 queries per second (QPS) for image-to-text and text-to-image retrieval on the MS-COCO 1K/5K benchmark. It is up to 359.0× and 13.9× faster, and 503.6× and 68.7× more energy-efficient, than the baseline and optimized GPU implementations on an Nvidia Titan RTX, respectively.
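The abstract does not give implementation details, but the two main ideas it describes can be illustrated with a minimal software sketch: (1) compressing the dataset by dropping sparse, zero-padded rows of the text embeddings and quantizing from FP32 to INT8, and (2) answering a query with an INT8 dot-product similarity search followed by top-k ranking. This is an assumption-laden illustration, not the paper's accelerator design; all function and variable names below are hypothetical.

```python
# Illustrative sketch only (not the authors' implementation): INT8 embedding
# compression plus dot-product top-k retrieval, as described in the abstract.
import numpy as np

def quantize_int8(embeddings: np.ndarray):
    """Symmetric per-tensor quantization of FP32 embeddings to INT8."""
    scale = np.abs(embeddings).max() / 127.0
    q = np.clip(np.round(embeddings / scale), -127, 127).astype(np.int8)
    return q, scale

def compress_text_embeddings(text_emb: np.ndarray, valid_mask: np.ndarray):
    """Drop zero-padded rows so later stages need no masking operations."""
    return text_emb[valid_mask]

def topk_retrieval(query_q: np.ndarray, database_q: np.ndarray,
                   db_scale: float, q_scale: float, k: int = 5):
    """INT8 dot-product similarity, rescaled to float, then top-k ranking."""
    # Accumulate in int32 to avoid overflow, as a hardware MAC array would.
    scores = database_q.astype(np.int32) @ query_q.astype(np.int32)
    scores = scores.astype(np.float32) * db_scale * q_scale
    top = np.argsort(-scores)[:k]
    return top, scores[top]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 1024-dimensional embeddings: 1000 database items, 1 query.
    db = rng.standard_normal((1000, 1024)).astype(np.float32)
    query = rng.standard_normal(1024).astype(np.float32)
    # Text side: remove padded rows before quantization (sparsity removal).
    text_emb = np.zeros((64, 1024), dtype=np.float32)
    text_emb[:40] = rng.standard_normal((40, 1024))
    text_emb = compress_text_embeddings(text_emb, np.arange(64) < 40)
    db_q, db_s = quantize_int8(db)
    q_q, q_s = quantize_int8(query)
    idx, sc = topk_retrieval(q_q, db_q, db_s, q_s, k=5)
    print(idx, sc)
```

In the accelerator, the equivalent of `topk_retrieval` is split across pipelined hardware stages (dot-product calculation, data reordering, partial score aggregation, and ranking), and the INT8 representation is what enables the DSP packing and the 95.1% dataset-size reduction reported above.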
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2022-05
Language
English
Citation

30th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2022

DOI
10.1109/FCCM53951.2022.9786159
URI
http://hdl.handle.net/10203/299728
Appears in Collection
EE-Conference Papers(학술회의논문)