DSpace at KOASAS: Efficient parallel query processing of massive XML data in mapreduce

Efficient parallel query processing of massive XML data in mapreduce (맵리듀스를 이용한 대용량 XML 데이터의 병렬 질의 처리 기법)

DC Field	Value	Language
dc.contributor.advisor	Lee, Yoon-Joon	-
dc.contributor.advisor	이윤준	-
dc.contributor.author	Choi, Hye-Bong	-
dc.contributor.author	최혜봉	-
dc.date.accessioned	2015-04-23T08:30:37Z	-
dc.date.available	2015-04-23T08:30:37Z	-
dc.date.issued	2014	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568611&flag=dissertation	-
dc.identifier.uri	http://hdl.handle.net/10203/197823	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Computer Science, 2014.2, [ viii, 76 p. ]	-
dc.description.abstract	Extensible Markup Language (XML) is a markup language for representing electronic documents in a form readable by both humans and machines. By virtue of its simplicity and extensibility, XML has played an increasingly important role in storing and transferring data in industry and academia over the past few decades. Accordingly, XML documents have grown significantly in size. In particular, data logging and scientific applications periodically produce and accumulate huge amounts of data as new data are collected. The resulting XML data are stored as a single huge XML file. This leads to growing demand for XML data analysis with multiple user queries that are prepared in advance, before the XML data are completely generated. However, conventional XML database systems and XML pub/sub systems do not support this type of workload, as they are designed for smaller XML documents. The MapReduce framework is better suited to processing user queries over large XML data because of its scalability. In this thesis, we present a parallel method that processes multiple twig pattern queries simultaneously using two MapReduce jobs, one for path filtering and another for twig join. In this way, we avoid long chains of iterated MapReduce jobs, which incur a lot of redundant I/O cost, and queries can share their input scans and intermediate results with each other to save I/O cost. We also devise an elaborate run-time load balancing scheme inside MapReduce for fair assignment of twig join workloads. MapReduce fundamentally follows data parallelism: it divides large data into smaller blocks and processes them on distributed nodes. However, a twig join over partitioned inputs yields incomplete join results. Instead, we apply task parallelism within the MapReduce framework, assigning twig join operations to each reducer. We propose a sophisticated run-time load balancing algorithm that assigns twig join workloads fairly among nodes based on a cost estimate for each reducer. 
Applying these appr...	eng
dc.language	eng	-
dc.publisher	한국과학기술원 (Korea Advanced Institute of Science and Technology)	-
dc.subject	XML	-
dc.subject	Big data	-
dc.subject	질의최적화 (query optimization)	-
dc.subject	맵리듀스 (MapReduce)	-
dc.subject	병렬처리 (parallel processing)	-
dc.subject	XML	-
dc.subject	parallel processing	-
dc.subject	query optimization	-
dc.subject	MapReduce	-
dc.subject	tree labeling	-
dc.title	Efficient parallel query processing of massive XML data in mapreduce	-
dc.title.alternative	맵리듀스를 이용한 대용량 XML 데이터의 병렬 질의 처리 기법 (A parallel query processing technique for massive XML data using MapReduce)	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	568611/325007	-
dc.description.department	Korea Advanced Institute of Science and Technology : Department of Computer Science	-
dc.identifier.uid	020097095	-
dc.contributor.localauthor	Lee, Yoon-Joon	-
dc.contributor.localauthor	이윤준	-
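
The abstract describes assigning whole twig join operations to reducers according to an estimated cost per reducer, but this record does not give the algorithm itself. As a rough illustration only, the sketch below uses a standard greedy heuristic (largest estimated cost first, placed on the least-loaded reducer). The function name `assign_joins`, the query identifiers, and the cost values are all hypothetical and are not taken from the thesis.

```python
import heapq


def assign_joins(costs, num_reducers):
    """Greedy sketch: place each twig join on the least-loaded reducer.

    costs maps a (hypothetical) query id to its estimated join cost.
    This is a generic longest-processing-time-first heuristic, assumed
    here for illustration; it is not the algorithm from the thesis.
    """
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")

    # Min-heap of (current load, reducer index).
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}

    # Visit joins from most to least expensive; ties broken by query id.
    for query_id, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(query_id)
        heapq.heappush(heap, (load + cost, reducer))

    loads = {r: sum(costs[q] for q in qs) for r, qs in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Made-up cost estimates for five twig pattern queries.
    estimated = {"q1": 8, "q2": 7, "q3": 6, "q4": 5, "q5": 4}
    assignment, loads = assign_joins(estimated, num_reducers=2)
    print(assignment)
    print(loads)
```

Because each join stays whole on one reducer, this kind of task-level assignment avoids the incomplete results that the abstract attributes to joining over partitioned inputs; the balance it achieves depends entirely on how accurate the cost estimates are.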

Appears in Collection: CS-Theses_Ph.D. (doctoral theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.