DSpace at KOASAS: Efficient parallel query processing of massive XML data in mapreduce

Efficient parallel query processing of massive XML data in mapreduce (맵리듀스를 이용한 대용량 XML 데이터의 병렬 질의 처리 기법)

Choi, Hye-Bong / 최혜봉

Extensible Markup Language (XML) is a markup language for representing electronic documents in a form readable by both humans and machines. Thanks to its simplicity and extensibility, XML has played an increasingly important role in storing and transferring data in industry and academia over the past few decades. The size of XML documents has grown accordingly; in data logging and scientific applications in particular, huge amounts of data are produced periodically and accumulate as new data are collected. The resulting XML data are stored as a single huge XML file. This creates a growing demand for analyzing XML data with multiple user queries that are prepared in advance, before the XML data have been completely generated. However, conventional XML database systems and XML publish/subscribe (pub/sub) systems do not support this type of workload, because they are designed for smaller XML documents. The MapReduce framework is better suited to processing user queries over large XML data because of its scalability. In this thesis, we present a parallel method that processes multiple twig pattern queries simultaneously using two MapReduce jobs: one for path filtering and one for twig join. This avoids a long chain of iterated MapReduce jobs, which incurs substantial redundant I/O cost, and lets queries share input scans and intermediate results to save further I/O. MapReduce normally follows data parallelism: it divides large data into smaller blocks and processes them on distributed nodes. However, a twig join over partitioned inputs yields incomplete join results. Instead, we apply task parallelism within the MapReduce framework, assigning twig join operations to individual reducers. We also devise a run-time load balancing algorithm inside MapReduce that assigns twig join workloads fairly among nodes, based on a cost estimate for each reducer. Applying these appr...
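The abstract does not spell out the load balancing algorithm, so the following is only a minimal sketch of the general idea of cost-based task assignment, not the thesis's method. It assumes each twig join already has an estimated cost and uses a standard greedy heuristic (largest cost first, always to the least-loaded reducer). The names `assign_twig_joins`, `reducer_loads`, and the query labels are hypothetical.

```python
import heapq


def assign_twig_joins(costs, num_reducers):
    """Assign twig join tasks to reducers with a greedy heuristic.

    costs: dict mapping a (hypothetical) query id to its estimated join cost.
    Tasks are taken in descending cost order and each one goes to the
    reducer with the smallest accumulated load so far.
    """
    # Min-heap of (accumulated load, reducer index).
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}

    # Sort by cost descending; break ties by query id for determinism.
    for query, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(query)
        heapq.heappush(heap, (load + cost, reducer))
    return assignment


def reducer_loads(assignment, costs):
    """Total estimated cost assigned to each reducer."""
    return {r: sum(costs[q] for q in queries) for r, queries in assignment.items()}
```

Calling `assign_twig_joins({"q1": 8, "q2": 7, "q3": 6}, 2)` returns a dict from reducer index to the list of query ids it should join. Each join stays whole on one reducer, which matches the task-parallel framing in the abstract: the work is split by join operation rather than by data block.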

Advisors: Lee, Yoon-Joon (이윤준)

Description: Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2014

Identifier: 568611/325007 / 020097095

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2014.2, [ viii, 76 p. ]

Keywords: XML; Big data; parallel processing; query optimization; MapReduce; tree labeling

URI: http://hdl.handle.net/10203/197823

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568611&flag=dissertation

Appears in Collection: CS-Theses_Ph.D. (doctoral theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.