Large scale social networks analysis joclad 2013

  1. Large Scale Social Networks Analysis (LS SNA). Rui Sarmento, João Gama, Tiago Cunha, Albert Bifet. LIAAD/INESC TEC, FEP - University of Porto. April 13, 2013.
  2. Outline
     1. Motivation
     2. Software Tools
        - State of the art: recent evolution
        - PEGASUS
        - GraphLab
        - Snap (Stanford Network Analysis Platform)
        - Other tools
     3. Case Study
        - Network of companies and financial organizations
        - Some numbers
        - Algorithms and tools used
        - Processing time
     4. Summary & Conclusions
  3. Motivation
     • Generic problem: nowadays, the huge amounts of data available pose problems for analysis with regular hardware and/or software.
     • Example fact: "We have produced more data in the last two years than in all of prior history so we are witnessing a Big Bang of Data" (Tim McGuire, McKinsey)
  4. Motivation
     • Solution: emerging technologies, such as modern models for parallel computing, multicore computers, or even clusters of computers, can be very useful for analyzing massive network data.
  5. Motivation
     • Particular case study: CrunchBase database (accessed May 2012)
       - Network A of companies and financial organizations/funds, e.g. an edge between Y and X means that company Y has a connection to investment fund X
       - Network B of persons and companies, e.g. an edge between A and Y means that person A has a connection to company Y
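
As a rough illustration of how the two networks in the case study could be held in memory, the sketch below builds undirected adjacency maps from edge lists. The sample records (`CompanyY`, `FundX`, `PersonA`, and so on) and the helper name `build_adjacency` are hypothetical; the slides do not show the actual CrunchBase records or the authors' data format.

```python
from collections import defaultdict

# Hypothetical sample records; the real CrunchBase data is not reproduced here.
company_fund_edges = [("CompanyY", "FundX"), ("CompanyZ", "FundX")]      # Network A
person_company_edges = [("PersonA", "CompanyY"), ("PersonB", "CompanyY")]  # Network B


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


network_a = build_adjacency(company_fund_edges)
network_b = build_adjacency(person_company_edges)

print(network_a["FundX"])     # companies connected to FundX
print(network_b["CompanyY"])  # persons connected to CompanyY
```

An edge list like this is also the usual input shape for Hadoop-based tools, where each line of a text file holds one pair of node identifiers.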
  6. Motivation
     • What can we do?
       - We want to analyze the behavior of entities in terms of relationships or other influences.
       - We want to determine characteristics of the network, both from the self-centered point of view (a single entity and its connections) and for the network as a whole.
     • What is the problem?
       - It takes too much time (many hours or even days) with general-purpose software such as Gephi or R, even on a good PC.
  7. Software Tools
     • State of the art: recent evolution
       - 2001: Boost Graph Library (C++)
       - 2005: Parallel BGL (C++), Hadoop (Java)
       - 2007: Development of GraphLab starts
       - 2008: SNAP, Small-world Network Analysis and Partitioning (C, OpenMP)
       - ...
       - 2013: Several graph frameworks using Hadoop and/or HDFS
  8. Software Tools
     • PEGASUS
       - Computation framework written in Java
       - An open-source graph-mining system with massive scalability
       - Depends on Hadoop
       - Graph-oriented tool
  9. Software Tools
     • GraphLab API
       - Computation framework written in C++
       - Computation in GraphLab is applied to dependent records, which are stored as vertices in a large distributed data graph
       - Computation in GraphLab is expressed as vertex programs, which are executed in parallel on each vertex and can interact with neighboring vertices
       - GraphLab programs interact by directly reading the state of neighboring vertices and by modifying the state of adjacent edges
       - HDFS integration: access your data directly from HDFS
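
The vertex-program idea can be illustrated with a toy example. The sketch below is not the GraphLab API (which is C++ and distributed); it is a minimal single-process Python imitation in which every vertex repeatedly reads its neighbours' state and updates its own. The names `run_vertex_programs` and `min_label_program` are made up for this illustration, and the chosen task (connected components by propagating the smallest label) is just one simple program that fits the model.

```python
def run_vertex_programs(adjacency, vertex_program, initial_state, max_rounds=100):
    """Run vertex_program on every vertex, round by round, until no state changes.

    A real engine would run the per-vertex calls in parallel; here they run
    serially, but each call only sees the vertex's own state and its
    neighbours' states from the previous round.
    """
    state = dict(initial_state)
    for _ in range(max_rounds):
        new_state = {}
        for vertex, neighbours in adjacency.items():
            neighbour_states = [state[n] for n in neighbours]
            new_state[vertex] = vertex_program(state[vertex], neighbour_states)
        if new_state == state:  # converged: nothing changed in this round
            break
        state = new_state
    return state


def min_label_program(own_label, neighbour_labels):
    """Vertex program: adopt the smallest label seen in the neighbourhood."""
    return min([own_label] + neighbour_labels)


# Toy graph with two components: a-b-c and d-e.
adjacency = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
}

# Every vertex starts with its own name as label.
labels = run_vertex_programs(adjacency, min_label_program, {v: v for v in adjacency})
print(labels)
```

After convergence, vertices that share a label belong to the same connected component.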
  10. Software Tools
      • Snap (Stanford Network Analysis Platform)
        - Not parallel; however:
        - The SNAP library is written in C++ and optimized for maximum performance and compact graph representation
        - It easily scales to massive networks with hundreds of millions of nodes and billions of edges
        - ...although some algorithms in Snap might be slow due to their complexity
  11. Software Tools
      • Other tools (summary)
        - Several more tools are available:
          • Giraph: graph-oriented
          • RHadoop (package for R and Hadoop): generic tool
        => All of the tools listed here depend on Hadoop, which seems to be more and more commonly adopted
  12. Software Tools: algorithms available from the software install (graph analysis)
      • Pegasus: Degree, PageRank, Random Walk with Restart (RWR), Radius, Connected Components
      • GraphLab: approximate diameter, kcore, pagerank, connected component, simple coloring, directed triangle count, format convert, sssp, undirected triangle count
      • Snap: Cascades, Centrality, Cliques, Community, Concomp, Forestfire, Graphgen, Graphhash, Kcores, Kronem, Krongen, Kronfit, Maggen, Magfit, Motifs, Ncpplot, Netevol, Netinf, Netstat, Mkdatasets, infopath
  13. Case Study => Some numbers
      • Network of companies and financial organizations/funds
        1. Number of firms: 88,269
        2. Number of investment funds: 7,697
      • Network of persons and companies
        1. Number of persons: 118,394
  14. Case Study => Algorithms and tools used
      - Node degree with PEGASUS
      - Friends of friends with Hadoop MapReduce
      - Centrality measures with Snap (Stanford Network Analysis Platform)
      - Triangle counting with GraphLab
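
To make two of these computations concrete, the sketch below expresses node degree and friends of friends as map and reduce steps. It is a single-process Python imitation of the MapReduce pattern, not Hadoop or PEGASUS code, and it is not the authors' implementation (the slides do not show it). The helper `map_reduce`, the mapper and reducer names, and the four-edge sample graph are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical undirected edge list.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "cat")]


# Job 1: node degree. Each edge contributes 1 to both of its endpoints.
def degree_mapper(edge):
    u, v = edge
    yield u, 1
    yield v, 1


degrees = map_reduce(edges, degree_mapper, lambda node, ones: sum(ones))


# Job 2a: build sorted neighbour lists (one list per node).
def neighbour_mapper(edge):
    u, v = edge
    yield u, v
    yield v, u


adjacency = map_reduce(edges, neighbour_mapper, lambda node, ns: sorted(set(ns)))


# Job 2b: every pair of neighbours of a node has that node as a common friend.
def fof_mapper(item):
    node, neighbours = item
    for a, b in combinations(neighbours, 2):
        yield (a, b), node


candidates = map_reduce(adjacency.items(), fof_mapper, lambda pair, common: sorted(common))

# Keep only pairs that are not already directly connected.
friends_of_friends = {
    pair: common
    for pair, common in candidates.items()
    if pair[1] not in adjacency[pair[0]]
}

print(degrees)
print(friends_of_friends)
```

On a cluster, each job's map and reduce calls would be spread over many machines, which is what makes the same logic usable on graphs with hundreds of thousands of nodes.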
  15. Case Study => Processing time [figure not reproduced in this transcript]
  16. Summary & Conclusions
      - This paper summarizes which tools to look for when studying big graphs.
      - We are witnessing a proliferation of software tools aimed at the analysis of large-scale graphs.
      - What was once a problem in dealing with these networks can be solved with the right tools.
  17. References I
      • APACHE. 2012. Apache Giraph [Online]. The Apache Software Foundation. Available: http://incubator.apache.org/giraph/.
      • GRAPHLAB. Graphlab The Abstraction [Online]. Available: http://graphlab.org/home/abstraction/ [Accessed 2012].
      • GRAPHLAB. 2012. Graph Analytics Toolkit [Online]. Available: http://graphlab.org/toolkits/graph-analytics/ [Accessed 2012].
      • HOLMES, A. 2012. Hadoop In Practice, Manning.
      • LESKOVEC, J. Stanford Network Analysis Platform [Online]. Available: http://snap.stanford.edu/snap/ [Accessed 12-2012].
      • MAZZA, G. 2012. FrontPage - Hadoop Wiki [Online]. Available: http://wiki.apache.org/lucene-hadoop/ [Accessed 11-2012].
      • THANEDAR, V. 2012. API Documentation [Online]. Available: http://developer.crunchbase.com/docs [Accessed 04-2012].
  18. References II
      • CARNEGIE MELLON UNIVERSITY. 2012. Project Pegasus [Online]. Available: http://www.cs.cmu.edu/~pegasus/ [Accessed 2012].
      • UNIVERSITY OF WASHINGTON. What is Hadoop? [Online]. Available: http://escience.washington.edu/get-help-now/what-hadoop [Accessed 05-03-2013].
      • OWENS, J. R. 2013. Hadoop Real-World Solutions Cookbook. PACKT Publishing.
      • MCGUIRE, T. Big Data Better Decisions [Online]. Available: http://www.slideshare.net/McK_CMSOForum/big-data-and-advanced-analytics [Accessed 05-03-2013].
  19. End. Thank you! Questions?
