Parallel community detection on large graphs with MapReduce and GraphChi

Community detection from social network data gains much attention from academia and industry since it has many real-world applications. The Girvan-Newman (GN) algorithm is a divisive hierarchical clustering algorithm for community detection, which is regarded as one of the most popular algorithms. It exploits the concept of edge betweenness to divide a network into multiple communities. Though it is being widely used, it has limitations in supporting large-scale networks since it needs to calculate the shortest path between every pair of vertices in a network. In this paper, we develop two parallel versions of the GN algorithm to support large-scale networks. First, we propose a new algorithm, which we call Shortest Path Betweenness MapReduce Algorithm (SPB-MRA), that utilizes the MapReduce model. Second, we propose another new algorithm, which we call Shortest Path Betweenness Vertex-Centric Algorithm (SPB-VCA), that utilizes the vertex-centric model. An approximation technique is also developed to further speed up community detection processes. We implemented SPB-MRA using Hadoop and SPB-VCA using GraphChi, and then evaluated the performance of SPB-MRA on Amazon EC2 instances and that of SPB-VCA on a single commodity PC. The evaluation results showed that the elapsed time of SPB-MRA decreased almost linearly as the number of reducers increased, SPB-VCA outperformed SPB-MRA just on a single PC by 4-6 times, and the approximation technique introduced negligible errors. (C) 2015 Elsevier B.V. All rights reserved.
Publisher
ELSEVIER SCIENCE BV
Issue Date
2016-07
Language
ENG
Keywords

EDGE-BETWEENNESS; NETWORKS

Citation

DATA & KNOWLEDGE ENGINEERING, v.104, pp.17 - 31

ISSN
0169-023X
DOI
10.1016/j.datak.2015.05.001
URI
http://hdl.handle.net/10203/213274
Appears in Collection
KSE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
  • Hit : 156
  • Download : 0
  • Cited 0 times in thomson ci
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡClick to seewebofscience_button
⊙ Cited 2 items in WoSClick to see citing articles inrecords_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0