n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure

DC Field	Value	Language
dc.contributor.author	Kim, Min-Soo	ko
dc.contributor.author	Whang, Kyu-Young	ko
dc.contributor.author	Lee, Jae-Gil	ko
dc.contributor.author	Lee, Min-Jae	ko
dc.date.accessioned	2013-03-18T19:36:58Z	-
dc.date.available	2013-03-18T19:36:58Z	-
dc.date.created	2012-02-06	-
dc.date.issued	2005-09-01	-
dc.identifier.citation	31st Int'l Conf. on Very Large Data Bases, pp.325 - 336	-
dc.identifier.uri	http://hdl.handle.net/10203/151764	-
dc.description.abstract	The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Due to these advantages, it has been widely used in information retrieval and in similar-sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and its query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (in short, the n-gram/2L index), which significantly reduces the size and improves the query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. The proposed index is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves the performance compared with the n-gram inverted index, with these improvements becoming more marked as the database size gets larger; 2) the query processing time increases only very slightly as the query length gets longer. Experimental results using databases of 1 GByte show that the n-gram/2L index reduces the index size by a factor of up to 1.9 to 2.7 and, at the same time, improves the query performance by up to 13.1 times compared with the n-gram inverted index.	-
dc.language	English	-
dc.publisher	VLDB Endowment	-
dc.title	n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure	-
dc.type	Conference	-
dc.identifier.scopusid	2-s2.0-33745621089	-
dc.type.rims	CONF	-
dc.citation.beginningpage	325	-
dc.citation.endingpage	336	-
dc.citation.publicationname	31st Int'l Conf. on Very Large Data Bases	-
dc.identifier.conferencecountry	NO	-
dc.identifier.conferencelocation	Trondheim	-
dc.contributor.localauthor	Whang, Kyu-Young	-
dc.contributor.localauthor	Lee, Jae-Gil	-
dc.contributor.nonIdAuthor	Kim, Min-Soo	-
dc.contributor.nonIdAuthor	Lee, Min-Jae	-
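
The two-step construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the in-memory dictionaries, and the choice to let consecutive length-m subsequences overlap by n-1 characters (so that no n-gram spanning a boundary is lost) are assumptions made for this illustration; the abstract itself only states the two steps.

```python
from collections import defaultdict


def extract_subsequences(text, m, n):
    """Step 1: split text into windows of length m.

    Assumption: consecutive windows overlap by n-1 characters so that every
    n-gram of the text lies entirely inside at least one window.
    The last window may be shorter than m.
    """
    step = m - (n - 1)
    if step <= 0:
        raise ValueError("m must be at least n")
    if not text:
        return []
    subs = []
    offset = 0
    while True:
        subs.append((offset, text[offset:offset + m]))
        if offset + m >= len(text):
            break
        offset += step
    return subs


def extract_ngrams(seq, n):
    """Step 2: list every (offset, n-gram) pair inside one subsequence."""
    return [(i, seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_two_level_index(docs, m, n):
    """Build both levels from a dict mapping doc_id to text (hypothetical layout)."""
    # First level: subsequence -> [(doc_id, offset of subsequence in document)]
    front = defaultdict(list)
    for doc_id, text in docs.items():
        for offset, sub in extract_subsequences(text, m, n):
            front[sub].append((doc_id, offset))
    # Second level: n-gram -> [(subsequence, offset of n-gram in subsequence)]
    # Each distinct subsequence is processed once, however often it occurs.
    back = defaultdict(list)
    for sub in front:
        for offset, gram in extract_ngrams(sub, n):
            back[gram].append((sub, offset))
    return dict(front), dict(back)


def ngram_positions(front, back, gram):
    """Recover (doc_id, position) pairs of an n-gram by joining both levels."""
    hits = set()
    for sub, off_in_sub in back.get(gram, []):
        for doc_id, off_in_doc in front[sub]:
            hits.add((doc_id, off_in_doc + off_in_sub))
    return sorted(hits)


front, back = build_two_level_index({0: "ABCDABCD"}, m=4, n=2)
print(ngram_positions(front, back, "AB"))  # [(0, 0), (0, 4)]
```

In this sketch, the offsets of an n-gram inside a subsequence are stored once per distinct subsequence rather than once per occurrence in the documents, which is one way to picture the removal of redundant position information that the abstract refers to.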

Appears in Collection: CS-Conference Papers
