Efficient parallel query processing of massive XML data in MapReduce (맵리듀스를 이용한 대용량 XML 데이터의 병렬 질의 처리 기법)

DC Field | Value | Language
dc.contributor.advisor | Lee, Yoon-Joon | -
dc.contributor.advisor | 이윤준 | -
dc.contributor.author | Choi, Hye-Bong | -
dc.contributor.author | 최혜봉 | -
dc.date.accessioned | 2015-04-23T08:30:37Z | -
dc.date.available | 2015-04-23T08:30:37Z | -
dc.date.issued | 2014 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=568611&flag=dissertation | -
dc.identifier.uri | http://hdl.handle.net/10203/197823 | -
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2014.2, [ viii, 76 p. ] | -
dc.description.abstract | Extensible Markup Language (XML) is a markup language for representing electronic documents in a form readable by both humans and machines. By virtue of its simplicity and extensibility, XML has played an increasingly important role in storing and transferring data in industry and academia over the past few decades. Accordingly, XML documents have grown significantly in size; in data logging and scientific applications in particular, huge amounts of data are periodically produced and accumulated as new data are collected. The resulting XML data are stored as a single huge XML file. This leads to growing demand for analyzing XML data with multiple user queries that are prepared in advance, before the XML data have been completely generated. However, conventional XML database systems and XML pub/sub systems do not support this type of workload, as they are designed for smaller XML documents. The MapReduce framework is better suited to processing user queries over large XML data because of its scalability. In this thesis, we present a parallel method that processes multiple twig pattern queries simultaneously using two MapReduce jobs, one for path filtering and one for twig join. This avoids long iterations of MapReduce jobs, which incur substantial redundant I/O cost, and lets queries share input scans and intermediate results with each other to save I/O cost. We also devise a run-time load balancing scheme inside MapReduce for fair assignment of twig join workloads. MapReduce fundamentally follows data parallelism: it divides large data into smaller blocks and processes them on distributed nodes. However, a twig join over partitioned inputs yields incomplete join results. Instead, we apply task parallelism within the MapReduce framework, assigning twig join operations to individual reducers. We propose a run-time load balancing algorithm that assigns twig join workloads fairly among nodes based on a cost estimate for each reducer. Applying these appr... | eng
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | XML | -
dc.subject | Big data | -
dc.subject | query optimization | -
dc.subject | MapReduce | -
dc.subject | parallel processing | -
dc.subject | tree labeling | -
dc.title | Efficient parallel query processing of massive XML data in MapReduce | -
dc.title.alternative | Parallel query processing technique for large-scale XML data using MapReduce (맵리듀스를 이용한 대용량 XML 데이터의 병렬 질의 처리 기법) | -
dc.type | Thesis(Ph.D) | -
dc.identifier.CNRN | 568611/325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science | -
dc.identifier.uid | 020097095 | -
dc.contributor.localauthor | Lee, Yoon-Joon | -
dc.contributor.localauthor | 이윤준 | -
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
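
Illustrative note on the abstract: the record does not include the thesis's load balancing algorithm, so the following is only a minimal sketch of the general idea of assigning twig join workloads to reducers by estimated cost. It uses a standard greedy heuristic (heaviest task first, onto the currently least-loaded reducer), which is a swapped-in technique and not necessarily the author's method. The function names `assign_twig_joins` and `reducer_loads`, the query identifiers, and the cost values are all hypothetical.

```python
import heapq
from typing import Dict, List


def assign_twig_joins(costs: Dict[str, float], num_reducers: int) -> List[List[str]]:
    """Assign each twig join (keyed by a hypothetical query id) to one reducer.

    Greedy heuristic: take joins in order of decreasing estimated cost and
    give each one to the reducer with the smallest load so far. Each join
    stays whole on a single reducer (task parallelism), so no join input
    is split across reducers.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")

    # Min-heap of (current load, reducer index).
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_reducers)]

    # Heaviest first; ties broken by query id so the result is deterministic.
    for query_id, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, r = heapq.heappop(heap)
        assignment[r].append(query_id)
        heapq.heappush(heap, (load + cost, r))
    return assignment


def reducer_loads(costs: Dict[str, float], assignment: List[List[str]]) -> List[float]:
    """Total estimated cost placed on each reducer."""
    return [sum(costs[q] for q in queries) for queries in assignment]


if __name__ == "__main__":
    # Hypothetical cost estimates for five twig pattern queries.
    estimated = {"q1": 8, "q2": 7, "q3": 6, "q4": 5, "q5": 4}
    plan = assign_twig_joins(estimated, num_reducers=2)
    print(plan, reducer_loads(estimated, plan))
```

In a MapReduce setting, a plan like this would typically be consulted by a custom partitioner so that all intermediate records for one twig join reach the same reducer; how the thesis estimates costs at run time is not stated in this record.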
