n-Gram/2L-approximation: a two-level n-gram inverted index structure for approximate string matching

Cited 4 time in webofscience Cited 0 time in scopus
  • Hit : 608
  • Download : 0
DC FieldValueLanguage
dc.contributor.authorKim, Min Sooko
dc.contributor.authorWhang, Kyu-Youngko
dc.contributor.authorLee, Jae-Gilko
dc.date.accessioned2013-03-07T21:41:05Z-
dc.date.available2013-03-07T21:41:05Z-
dc.date.created2012-02-06-
dc.date.created2012-02-06-
dc.date.created2012-02-06-
dc.date.created2012-02-06-
dc.date.created2012-02-06-
dc.date.issued2007-11-
dc.identifier.citationCOMPUTER SYSTEMS SCIENCE AND ENGINEERING, v.22, pp.365 - 379-
dc.identifier.issn0267-6192-
dc.identifier.urihttp://hdl.handle.net/10203/91431-
dc.description.abstractApproximate string matching is to find all the occurrences of a query string in a text database allowing a specified number of errors. Approximate string matching based on the n-gram inverted index (simply, n-gram Matching) has been widely used. A major reason is that it is scalable for large databases since it is not a main memory algorithm. Nevertheless, n-gram Matching also has drawbacks: the query performance tends to be bad, and many false positives occur if a large number of errors are allowed. in this paper, we propose an inverted index structure, which we call the n-gram/2L-Approximation index, that improves these drawbacks and an approximate string matching algorithm based on it. The n-gram/2L-Approximation is an adaptation of the n-gram/2L index [4], which the authors have proposed earlier for exact matching. inheriting the advantages of the n-gram/2L index, the n-gram/2L-Approximation index reduces the size of the index and improves the query performance compared with the n-gram inverted index. In addition, the n-gram/2L-Approximation index reduces false positives compared with the n-gram inverted index if a large number of errors are allowed. We perform extensive experiments using the text and protein databases. Experimental results using databases of 1 GBytes show that the n-gram/2L-Approximation index reduces the index size by up to 1.8 times and, at the same time, improves the query performance by up to 4.2 times compared with those of the n-gram inverted index.-
dc.languageEnglish-
dc.publisherC R L PUBLISHING LTD-
dc.titlen-Gram/2L-approximation: a two-level n-gram inverted index structure for approximate string matching-
dc.typeArticle-
dc.identifier.wosid000252749200005-
dc.identifier.scopusid2-s2.0-38949126548-
dc.type.rimsART-
dc.citation.volume22-
dc.citation.beginningpage365-
dc.citation.endingpage379-
dc.citation.publicationnameCOMPUTER SYSTEMS SCIENCE AND ENGINEERING-
dc.contributor.localauthorKim, Min Soo-
dc.contributor.localauthorWhang, Kyu-Young-
dc.contributor.localauthorLee, Jae-Gil-
dc.description.isOpenAccessN-
dc.type.journalArticleArticle-
dc.subject.keywordAuthorapproximate string matching-
dc.subject.keywordAuthorn-gram-
dc.subject.keywordAuthorinverted index-
Appears in Collection
CS-Journal Papers(저널논문)IE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 4 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0