Set2Box: Similarity Preserving Representation Learning for Sets

Sets have been used for modeling various types of objects, and measuring similarity between them has been a key building block of a wide range of applications. However, as sets have grown in number and size, the computational cost and storage required for set similarity computation have become substantial. In this work, we propose Set2Box, which represents sets as boxes to precisely capture overlaps of sets and thus accurately estimate various similarity measures. Additionally, based on the proposed box quantization scheme, we design Set2Box+, which yields more concise but more accurate box representations of sets. Through extensive experiments on 8 real-world datasets, we show that, compared to baseline approaches, Set2Box+ is (a) Accurate: achieving up to 40.8× smaller estimation error while requiring 60% fewer bits to encode sets, (b) Concise: yielding up to 96.8× more concise representations with similar estimation error, and (c) Versatile: enabling the estimation of four set-similarity measures from a single representation of each set. For reproducibility, the source code and datasets used in the paper are available at https://github.com/geon0325/Set2Box. © 2022 IEEE.
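As a rough illustration of the idea in the abstract, the sketch below treats each set as an axis-aligned box and uses box volumes as stand-ins for set sizes, so that the volume of the intersection box plays the role of the overlap between two sets. This is not the authors' implementation: the function names, the plain-list box format, and the choice of Jaccard, Dice, overlap coefficient, and cosine as the derived measures are all assumptions made for illustration, and the boxes here are hand-picked rather than learned. The sketch also assumes both boxes have positive volume.

```python
from math import sqrt

# Hypothetical representation: a box is (lower_corner, upper_corner),
# two coordinate lists of equal length.

def volume(box):
    """Volume of an axis-aligned box; empty boxes have volume 0."""
    lower, upper = box
    v = 1.0
    for lo, hi in zip(lower, upper):
        v *= max(hi - lo, 0.0)  # a negative side length means no extent
    return v

def intersection(a, b):
    """Intersection of two axis-aligned boxes (possibly empty)."""
    lower = [max(x, y) for x, y in zip(a[0], b[0])]
    upper = [min(x, y) for x, y in zip(a[1], b[1])]
    return (lower, upper)

def similarities(a, b):
    """Estimate several similarity measures from a single pair of boxes.

    Assumes both boxes have positive volume.
    """
    va, vb = volume(a), volume(b)
    vi = volume(intersection(a, b))
    return {
        "jaccard": vi / (va + vb - vi),
        "dice": 2 * vi / (va + vb),
        "overlap": vi / min(va, vb),
        "cosine": vi / sqrt(va * vb),
    }

# Two 2-D boxes of volume 4 that share a region of volume 2.
box_a = ([0.0, 0.0], [2.0, 2.0])
box_b = ([1.0, 0.0], [3.0, 2.0])
sims = similarities(box_a, box_b)
```

Because every measure is computed from the same three volumes, one box per set is enough to estimate all of them, which is the sense in which a single representation can serve several similarity measures.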
Publisher
IEEE Computer Society
Issue Date
2022-11-29
Language
English
Citation

The 22nd IEEE International Conference on Data Mining, ICDM 2022, pp. 1023-1028

ISSN
1550-4786
DOI
10.1109/ICDM54844.2022.00125
URI
http://hdl.handle.net/10203/301996
Appears in Collection
IE-Conference Papers (Conference Papers); AI-Conference Papers (Conference Papers)
