Entity resolution with iterative blocking

DC Field | Value | Language
dc.contributor.author | Whang, Steven Euijong | ko
dc.contributor.author | Menestrina, David | ko
dc.contributor.author | Koutrika, Georgia | ko
dc.contributor.author | Theobald, Martin | ko
dc.contributor.author | Garcia-Molina, Hector | ko
dc.date.accessioned | 2019-04-16T03:50:50Z | -
dc.date.available | 2019-04-16T03:50:50Z | -
dc.date.created | 2018-03-29 | -
dc.date.issued | 2009-06 | -
dc.identifier.citation | International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09, pp.219 - 231 | -
dc.identifier.uri | http://hdl.handle.net/10203/260329 | -
dc.description.abstract | Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework where the ER results of blocks are reflected to subsequently processed blocks. Blocks are now iteratively processed until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy because reflecting the ER results of blocks to other blocks may generate additional record matches. Iterative blocking may also be more efficient because processing a block now saves the processing time for other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than blocking for large datasets. © 2009 ACM. | -
dc.language | English | -
dc.publisher | ACM Special Interest Group on Management of Data (SIGMOD) | -
dc.title | Entity resolution with iterative blocking | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 219 | -
dc.citation.endingpage | 231 | -
dc.citation.publicationname | International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Providence, Rhode Island | -
dc.identifier.doi | 10.1145/1559845.1559870 | -
dc.contributor.localauthor | Whang, Steven Euijong | -
dc.contributor.nonIdAuthor | Menestrina, David | -
dc.contributor.nonIdAuthor | Koutrika, Georgia | -
dc.contributor.nonIdAuthor | Theobald, Martin | -
dc.contributor.nonIdAuthor | Garcia-Molina, Hector | -
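
The idea described in the abstract (reflect the ER result of one block into the other blocks, and reprocess blocks until none contains matching records) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record representation (frozensets of tokens), the helper names resolve_block and iterative_blocking, the match rule (at least two shared tokens), the merge rule (token union), and the sample records are all assumptions made for illustration.

```python
from itertools import combinations

def resolve_block(records, match, merge):
    """Naive ER inside one block: merge matching pairs until none remain.
    Returns the resolved records and a map from absorbed record to its merge."""
    records, replaced = set(records), {}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(records), 2):
            if match(a, b):
                m = merge(a, b)
                records -= {a, b}
                records.add(m)
                for old in (a, b):
                    if old != m:
                        replaced[old] = m
                changed = True
                break
    return records, replaced

def iterative_blocking(records, block_keys, match, merge):
    """Hypothetical sketch: process blocks until no block yields a new merge."""
    blocks = {}
    for r in records:
        for key in block_keys(r):
            blocks.setdefault(key, set()).add(r)
    pending = list(blocks)
    while pending:
        key = pending.pop()
        blocks[key], replaced = resolve_block(blocks[key], match, merge)
        if not replaced:
            continue

        def final(r):
            # Follow the chain of merges to the record that now stands for r.
            while r in replaced:
                r = replaced[r]
            return r

        # Reflect the merges into every other block and re-queue changed blocks.
        for other_key, other in blocks.items():
            if other_key == key:
                continue
            updated = {final(r) for r in other}
            if updated != other:
                blocks[other_key] = updated
                if other_key not in pending:
                    pending.append(other_key)
    return set().union(*blocks.values())

# Assumed toy data: tokens are "field:value" strings.
r1 = frozenset({"n:john", "p:111", "z:94305"})
r2 = frozenset({"n:john", "p:111", "e:j@x"})
r3 = frozenset({"n:jon", "e:j@x", "z:94305"})

result = iterative_blocking(
    [r1, r2, r3],
    block_keys=lambda r: {t for t in r if t.startswith(("n:", "z:"))},
    match=lambda a, b: len(a & b) >= 2,   # assumed rule: two shared tokens
    merge=lambda a, b: a | b,             # assumed rule: union of tokens
)
```

In this toy run, r1 and r3 share a zip block but only one token, so processing that block alone finds no match. Once r1 and r2 are merged in the name block and the merged record replaces r1 in the zip block, it shares both the email and zip tokens with r3, and all three records end up as a single record. That is the kind of additional match the abstract attributes to reflecting results across blocks.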
Appears in Collection
EE-Conference Papers(학술회의논문)
