Pay-As-You-Go Entity Resolution

Entity resolution (ER) is the problem of identifying which records in a database refer to the same entity. In practice, many applications need to resolve large data sets efficiently, but do not require the ER result to be exact. For example, people data from the web may simply be too large to completely resolve with a reasonable amount of work. As another example, real-time applications may not be able to tolerate any ER processing that takes longer than a certain amount of time. This paper investigates how we can maximize the progress of ER with a limited amount of work using "hints," which give information on records that are likely to refer to the same real-world entity. A hint can be represented in various formats (e.g., a grouping of records based on their likelihood of matching), and ER can use this information as a guideline for which records to compare first. We introduce a family of techniques for constructing hints efficiently and techniques for using the hints to maximize the number of matching records identified using a limited amount of work. Using real data sets, we illustrate the potential gains of our pay-as-you-go approach compared to running ER without using hints.
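The idea in the abstract can be illustrated with a minimal sketch. It assumes the hint is a list of record pairs sorted by an inexpensive likelihood estimate, and that ER stops after a fixed number of expensive comparisons. The names `cheap_score`, `expensive_match`, `build_hint`, and `resolve_with_budget`, the record fields, and the sample data are all hypothetical and are not taken from the paper. The hint here is built by scoring every pair, which is quadratic and only suitable for illustration; the paper's own hint-construction techniques are not reproduced.

```python
from itertools import combinations

def cheap_score(a, b):
    """Inexpensive likelihood estimate (assumed): Jaccard overlap of name tokens."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def expensive_match(a, b):
    """Stand-in for a costly match function (assumed): identical email."""
    return a["email"] == b["email"]

def build_hint(records):
    """Hint as a sorted list of index pairs, most likely matches first."""
    pairs = combinations(range(len(records)), 2)
    return sorted(
        pairs,
        key=lambda p: cheap_score(records[p[0]], records[p[1]]),
        reverse=True,
    )

def resolve_with_budget(records, pair_order, budget):
    """Compare pairs in the given order until `budget` comparisons are spent."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i, j in pair_order:
        if comparisons >= budget:
            break
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already merged; no expensive comparison needed
        comparisons += 1
        if expensive_match(records[i], records[j]):
            parent[rj] = ri

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical sample data.
records = [
    {"name": "John Smith", "email": "js@example.com"},
    {"name": "Mary Jones", "email": "mj@example.com"},
    {"name": "John A Smith", "email": "js@example.com"},
    {"name": "Mary K Jones", "email": "mj@example.com"},
]

with_hint = resolve_with_budget(records, build_hint(records), budget=2)
without_hint = resolve_with_budget(
    records, list(combinations(range(len(records)), 2)), budget=2
)
```

With a budget of two comparisons on this sample, following the hint merges both duplicate pairs, while the unordered pass finds only one. This is the kind of gain the pay-as-you-go approach targets: more matches identified for the same amount of work.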
Publisher
IEEE COMPUTER SOC
Issue Date
2013-05
Language
English
Article Type
Article
Keywords

INFORMATION

Citation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, v.25, no.5, pp.1111-1124

ISSN
1041-4347
DOI
10.1109/TKDE.2012.43
URI
http://hdl.handle.net/10203/240812
Appears in Collection
EE-Journal Papers (Journal Papers)