Efficient parallel query processing of massive XML data in MapReduce
(Korean title: A parallel query processing technique for large-scale XML data using MapReduce)

Extensible Markup Language (XML) is a markup language for representing electronic documents that are readable by both humans and machines. Owing to its simplicity and extensibility, XML has played an increasingly important role in storing and transferring data in industry and academia over the past few decades. Accordingly, XML documents have grown significantly in size. In data logging and scientific applications in particular, huge amounts of data are produced and accumulated periodically as new data are collected, and the produced XML data are stored as a single huge XML file. This leads to a growing demand for XML data analysis with multiple user queries that are prepared in advance, before the XML data have been completely generated. However, conventional XML database systems and XML pub/sub systems do not support this type of workload because they are designed for smaller XML documents. Because of its scalability, the MapReduce framework is better suited to processing such user queries over large XML data.

In this thesis, we present a parallel method that processes multiple twig pattern queries simultaneously using two MapReduce jobs: one for path filtering and another for twig join. In this way, we avoid long chains of iterated MapReduce jobs, which incur substantial redundant I/O cost, and the queries can share their input scans and intermediate results with each other to save I/O cost. We also devise an elaborate run-time load balancing scheme inside MapReduce for fair assignment of twig join workloads. MapReduce fundamentally follows data parallelism, which divides large data into smaller blocks and processes them on distributed nodes. However, a twig join over partitioned inputs yields incomplete join results, since a single twig match may span several blocks. Instead, we apply task parallelism within the MapReduce framework, assigning whole twig join operations to individual reducers. Our load balancing algorithm assigns twig join workloads fairly among nodes based on a cost estimate for each reducer. Applying these appr...
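The abstract names a path-filtering MapReduce job whose input scan is shared by all queries. The Python sketch below illustrates that idea in a single process under assumptions that are not taken from the thesis: elements carry (start, end, level) region labels (one common tree-labeling scheme), each twig query has already been decomposed into linear root-to-leaf paths using only `/` (child) and `//` (descendant) steps, and paths are matched with regular expressions over slash-joined tag paths. The function names `label_tree`, `compile_path`, and `filter_paths` are hypothetical, and the dictionary grouping merely stands in for the MapReduce shuffle.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_tree(xml_text):
    """Assign (start, end, level) region labels in one depth-first pass.

    Returns (tag_path, label) pairs in document order, where tag_path is the
    tuple of tags from the root down to the element.
    """
    root = ET.fromstring(xml_text)
    counter = 0
    out = []

    def visit(node, parent_path, level):
        nonlocal counter
        start = counter
        counter += 1
        tag_path = parent_path + (node.tag,)
        for child in node:
            visit(child, tag_path, level + 1)
        end = counter
        counter += 1
        out.append((tag_path, (start, end, level)))

    visit(root, (), 0)
    out.sort(key=lambda item: item[1][0])  # document order by start label
    return out


def compile_path(pattern):
    """Turn a linear path such as '//b/c' or '/a//b' into a regex over
    slash-joined tag paths ('/' = child step, '//' = descendant step)."""
    regex = ""
    for axis, tag in re.findall(r"(//|/)([^/]+)", pattern):
        if axis == "//":
            regex += r"(?:/[^/]+)*"  # skip zero or more intermediate tags
        regex += "/" + re.escape(tag)
    return re.compile(regex)


def filter_paths(labels, queries):
    """Simulated map + shuffle: one scan over the labeled elements serves every
    path of every query. Output key is (query_id, path_index)."""
    compiled = [(q, i, compile_path(p))
                for q, paths in queries.items()
                for i, p in enumerate(paths)]
    streams = defaultdict(list)
    for tag_path, label in labels:  # the single shared input scan
        path_str = "/" + "/".join(tag_path)
        for q, i, regex in compiled:
            if regex.fullmatch(path_str):
                streams[(q, i)].append(label)  # "emit" to the join input
    return streams
```

In this sketch, each keyed list is a document-ordered stream of labels for one query path. A twig join step could then combine a query's streams with the usual region-label containment test: element `a` is an ancestor of `d` when `a.start < d.start` and `d.end < a.end`. Namespaced tags are not handled.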
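For the load balancing step, the abstract says only that twig join workloads are assigned to reducers based on cost estimation. As a stand-in, the sketch below uses greedy longest-processing-time-first (LPT) scheduling, a standard heuristic that is not necessarily the thesis's algorithm, together with a hypothetical cost model that approximates a twig join's cost by the total size of its input path streams. Each query's whole twig join goes to a single reducer, mirroring the task-parallel assignment the abstract describes. The names `estimate_cost` and `assign_twig_joins` are illustrative only.

```python
import heapq


def estimate_cost(stream_sizes):
    """Hypothetical cost model: a twig join reads each of its input path
    streams once, so approximate its cost by the total input size."""
    return sum(stream_sizes)


def assign_twig_joins(tasks, num_reducers):
    """Greedy LPT assignment of whole twig joins to reducers.

    tasks: dict mapping query_id -> list of input stream sizes.
    Returns (assignment, loads): reducer index -> list of query_ids, and the
    estimated total load per reducer.
    """
    costs = {q: estimate_cost(sizes) for q, sizes in tasks.items()}
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer)
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}
    loads = [0] * num_reducers
    # Place the most expensive joins first, each on the least-loaded reducer.
    for q in sorted(costs, key=lambda q: (-costs[q], q)):
        load, r = heapq.heappop(heap)
        assignment[r].append(q)
        loads[r] = load + costs[q]
        heapq.heappush(heap, (loads[r], r))
    return assignment, loads
```

For example, five queries with estimated costs 70, 50, 30, 25, and 15 on two reducers end up with loads of 95 and 95. LPT is simple and fast, but it only balances the estimates; its result is only as fair as the cost model behind it.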
Advisors
Lee, Yoon-Joon (이윤준)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2014
Identifier
568611/325007  / 020097095
Language
eng
Description
Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, February 2014, [viii, 76 p.]
Keywords
XML; Big data; query optimization; MapReduce; parallel processing; tree labeling
URI
http://hdl.handle.net/10203/197823
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568611&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Ph.D. theses)
Files in This Item
There are no files associated with this item.