DSpace at KOASAS: HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple Twig Pattern Queries




Choi, HB / Lee, KH / Kim, SH / Lee, Yoon Joon / Moon, BK

The volume of XML data is tremendous in many areas, especially in data logging and scientific domains. XML data in these areas accumulate over time as new data are continuously collected. Processing massive XML data with multiple twig pattern queries submitted by multiple users in a timely manner is a challenge. We showcase HadoopXML, a system that simultaneously processes many twig pattern queries over a massive volume of XML data with Hadoop. Specifically, HadoopXML provides an efficient way to process a single large XML file in parallel. It processes multiple twig pattern queries simultaneously with a shared input scan, so users do not need to iterate MapReduce (M/R) jobs for each query. HadoopXML also saves many I/Os by enabling twig pattern queries to share their path solutions with each other. Moreover, HadoopXML provides a sophisticated runtime load balancing scheme for fairly assigning multiple twig pattern joins across nodes. Using synthetic and real-world XML datasets, we demonstrate how efficiently HadoopXML processes many twig pattern queries in a shared and balanced way.

Publisher: ACM Special Interest Group on Information Retrieval (SIGIR)

Issue Date: 2012-10-31

Language: English

Citation: ACM International Conference on Information and Knowledge Management (CIKM), pp. 2737-2739

DOI: 10.1145/2396761.2398745

URI: http://hdl.handle.net/10203/172953

Appears in Collection: CS-Conference Papers
