n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure

DC Field | Value | Language
dc.contributor.author | Kim, Min-Soo | ko
dc.contributor.author | Whang, Kyu-Young | ko
dc.contributor.author | Lee, Jae-Gil | ko
dc.contributor.author | Lee, Min-Jae | ko
dc.date.accessioned | 2013-03-18T19:36:58Z | -
dc.date.available | 2013-03-18T19:36:58Z | -
dc.date.created | 2012-02-06 | -
dc.date.issued | 2005-09-01 | -
dc.identifier.citation | 31st Int'l Conf. on Very Large Data Bases, pp. 325-336 | -
dc.identifier.uri | http://hdl.handle.net/10203/151764 | -
dc.description.abstract | The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Due to these advantages, it has been widely used in information retrieval and in similar-sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index), which significantly reduces the size and improves query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. It is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves performance compared with the n-gram inverted index, with these improvements becoming more marked as the database grows; 2) query processing time increases only very slightly as the query gets longer. Experimental results using 1-GByte databases show that the n-gram/2L index is up to 1.9 to 2.7 times smaller and, at the same time, delivers up to 13.1 times better query performance than the n-gram inverted index. | -
dc.language | English | -
dc.publisher | VLDB Endowment | -
dc.title | n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-33745621089 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 325 | -
dc.citation.endingpage | 336 | -
dc.citation.publicationname | 31st Int'l Conf. on Very Large Data Bases | -
dc.identifier.conferencecountry | NO | -
dc.identifier.conferencelocation | Trondheim | -
dc.contributor.localauthor | Whang, Kyu-Young | -
dc.contributor.localauthor | Lee, Jae-Gil | -
dc.contributor.nonIdAuthor | Kim, Min-Soo | -
dc.contributor.nonIdAuthor | Lee, Min-Jae | -
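The two-step construction described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the n-1 character overlap between consecutive m-subsequences, the unpadded (possibly shorter) last subsequence, 0-based character offsets, the names front_end/back_end, and every function name here are hypothetical choices made for illustration, and counting postings is only a rough stand-in for index size. Indexing the n-grams of each distinct subsequence once, and then joining the two levels back together, shows how repeated position information can be factored out without losing any postings.

```python
from collections import defaultdict


def ngram_index(docs, n):
    """One-level n-gram inverted index: n-gram -> [(doc_id, offset), ...]."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for p in range(len(text) - n + 1):
            index[text[p:p + n]].append((doc_id, p))
    return index


def subsequences(text, m, n):
    """Cut text into m-subsequences (assumption: consecutive ones overlap by
    n-1 characters, last one is not padded). Every n-gram lies wholly inside
    the subsequence whose range [start, start + m - n + 1) holds its start."""
    if m < n:
        raise ValueError("m must be at least n")
    step = m - n + 1
    return [(start, text[start:start + m])
            for start in range(0, len(text) - n + 1, step)]


def ngram_2l_index(docs, m, n):
    """Two-level index sketch (hypothetical layout):
    front_end: subsequence id -> [(doc_id, offset of subsequence in doc), ...]
    back_end:  n-gram -> [(subsequence id, offset within subsequence), ...]
    """
    sub_ids = {}  # distinct subsequence text -> id
    front_end = defaultdict(list)
    back_end = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for start, sub in subsequences(text, m, n):
            if sub not in sub_ids:
                sub_ids[sub] = len(sub_ids)
                # n-grams of a distinct subsequence are indexed only once,
                # however many documents or positions repeat it
                for q in range(len(sub) - n + 1):
                    back_end[sub[q:q + n]].append((sub_ids[sub], q))
            front_end[sub_ids[sub]].append((doc_id, start))
    return front_end, back_end


def expand(front_end, back_end):
    """Join the two levels back into n-gram -> sorted [(doc_id, offset), ...]."""
    index = defaultdict(list)
    for gram, plist in back_end.items():
        for sid, q in plist:
            for doc_id, start in front_end[sid]:
                index[gram].append((doc_id, start + q))
    return {gram: sorted(p) for gram, p in index.items()}


def postings(index):
    """Total number of postings, used here as a rough proxy for index size."""
    return sum(len(p) for p in index.values())


if __name__ == "__main__":
    docs = ["ACGTACGTACGTACGT"] * 3
    one = ngram_index(docs, n=3)
    front, back = ngram_2l_index(docs, m=6, n=3)
    print(postings(one), postings(front) + postings(back))  # prints: 42 18
    # The join is lossless: it reproduces the one-level postings exactly.
    assert expand(front, back) == {g: sorted(p) for g, p in one.items()}
```

On this deliberately repetitive toy input, the one-level index holds 42 postings while the two levels together hold 18; any real saving depends on how often m-subsequences repeat in the data.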
Appears in Collection
CS-Conference Papers (Academic Conference Papers)
Files in This Item
There are no files associated with this item.