An Experimental Comparison of Iterative MapReduce Frameworks

MapReduce has become a dominant framework in big data analysis, and thus there have been significant efforts to implement various data analysis algorithms in MapReduce. Many data analysis algorithms are inherently iterative, repeating the same set of tasks until convergence. To efficiently support iterative algorithms at scale, a few variants of Hadoop and new platforms have been proposed and actively developed in both academia and industry. Representative systems include HaLoop, iMapReduce, Twister, and Spark. In this paper, we experimentally compare Hadoop and the aforementioned systems using various workloads and metrics. The five systems are compared through four iterative algorithms (PageRank, recursive query, k-means, and logistic regression) on 50 Amazon EC2 machines (200 cores in total). We thoroughly explore the effectiveness of their new caching, communication, and scheduling mechanisms in support of iterative computation. Our evaluation also shows how performance depends on data skewness and memory residency. Overall, we believe that our evaluation and interpretation will be useful for designing a new framework or improving existing ones.
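As a rough illustration of what "repeating the same set of tasks until convergence" means for one of the listed workloads, the sketch below runs PageRank as a repeated map step and reduce step in plain, single-process Python. It is not taken from the paper and does not use the API of Hadoop, HaLoop, iMapReduce, Twister, or Spark; the function names (`map_phase`, `reduce_phase`, `pagerank`), the damping factor, and the tolerance are all assumptions chosen for the example, and the graph is assumed to have no dangling nodes.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each node splits its current rank evenly among its out-links."""
    for node, neighbors in graph.items():
        for target in neighbors:
            yield target, ranks[node] / len(neighbors)


def reduce_phase(pairs, nodes, damping):
    """Reduce step: sum contributions per node and apply the damping factor."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(nodes)
    return {node: (1.0 - damping) / n + damping * sums[node] for node in nodes}


def pagerank(graph, damping=0.85, tol=1e-6, max_iters=200):
    """Repeat map + reduce until the ranks stop changing (L1 change below tol).

    Assumes every node has at least one out-link and every link target
    is itself a key of `graph` (hypothetical simplifications).
    """
    nodes = list(graph)
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for iteration in range(1, max_iters + 1):
        # `graph` is identical in every pass; only `ranks` changes.
        new_ranks = reduce_phase(map_phase(graph, ranks), nodes, damping)
        delta = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iteration


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    final_ranks, iterations = pagerank(toy_graph)
    print(iterations, final_ranks)
```

In this sketch the link structure stays fixed across iterations while only the rank values change, which is presumably the kind of loop-invariant data that the caching mechanisms mentioned in the abstract are meant to exploit.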
Publisher
ACM Special Interest Group on Information Retrieval (SIGIR)
Issue Date
2016-10-26
Language
English
Citation

25th ACM International Conference on Information and Knowledge Management (CIKM), pp. 2089-2094

DOI
10.1145/2983323.2983647
URI
http://hdl.handle.net/10203/215623
Appears in Collection
CS-Conference Papers (Academic Conference Papers)