GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013


Published on

GraphLab is like Hadoop for graphs in that it enables users to easily express and execute machine learning algorithms on massive graphs. In this session, we illustrate how GraphLab leverages Amazon EC2 and advances in graph representation, asynchronous communication, and scheduling to achieve orders-of-magnitude performance gains over systems like Hadoop on real-world data.

Published in: Technology, Education
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

  1. 1. Large-Scale Machine Learning and Graphs Carlos Guestrin November 15, 2013 © 2013, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of, Inc.
  5. 5. Three Phases in Technological Development 1. Possibility 2. Scalability 3. Usability Wide Adoption Beyond Experts & Enthusiast
  6. 6. Machine Learning PHASE 1: POSSIBILITY
  7. 7. Rosenblatt 1957
  8. 8. Machine Learning PHASE 2: SCALABILITY
  9. 9. Needless to Say, We Need Machine Learning for Big Data 6 Billion Flickr Photos 28 Million Wikipedia Pages 1 Billion Facebook Users 72 Hours a Minute YouTube “… data a new class of economic asset, like currency or gold.”
  10. 10. Big Learning How will we design and implement parallel learning systems?
  11. 11. MapReduce for Data-Parallel ML Excellent for large data-parallel tasks! Data-Parallel MapReduce Feature Extraction Cross Validation Computing Sufficient Statistics Graph-Parallel Is there more to Machine Learning ?
  12. 12. The Power of Dependencies where the value is!
  13. 13. Flashback to 1998 Why? First Google advantage: a Graph Algorithm & a System to Support it!
  14. 14. It’s all about the graphs…
  15. 15. Social Media • Science Advertising Web Graphs encode the relationships between: People Facts Products Interests • Big: 100 billions of vertices and edges and rich metadata – – Facebook (10/2012): 1B users, 144B friendships Twitter (2011): 15B follower edges Ideas
  16. 16. Examples of Graphs in Machine Learning
  17. 17. Label a Face and Propagate
  18. 18. Pairwise similarity not enough… Not similar enough to be sure
  19. 19. Propagate Similarities & Co-occurrences for Accurate Predictions similarity edges Probabilistic Graphical Models co-occurring faces further evidence
  20. 20. Collaborative Filtering: Exploiting Dependencies Women on the Verge of a Nervous Breakdown The Celebration Latent Factor Models City of God Non-negative Matrix Factorization What do I recommend??? Wild Strawberries La Dolce Vita
  21. 21. Estimate Political Bias ? Liberal ? ? Post Post Semi-Supervised & ? Post Post Transductive Learning ? Post Post ? ? Conservative ? ? Post Post ? ? ? ? Post ? ? ? ? ? ? Post ? Post Post Post ? ? ? ? Post ? Post ? Post ? ? ? ?
  22. 22. Topic Modeling Cat Apple Growth Hat Plant LDA and co.
  23. 23. Machine Learning Pipeline Data Extract Features Graph Formation Structured Machine Learning Algorithm Value from Data Face labels images docs Movie ratings Social activity Doc topics movie recommend sentiment analysis
  24. 24. ML Tasks Beyond Data-Parallelism Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Semi-Supervised Gibbs Sampling Learning Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Label Propagation CoEM Graph Analysis PageRank Triangle Counting
  25. 25. Example of a Graph-Parallel Algorithm
  26. 26. PageRank Depends on rank of who follows her Depends on rank of who follows them… What’s the rank of this user? Rank? Loops in graph  Must iterate!
  27. 27. PageRank Iteration Iterate until convergence: R[j] “My rank is weighted average of my friends’ ranks” wji R[i] – α is the random reset probability – wji is the prob. transitioning (similarity) from j to i
  28. 28. Properties of Graph Parallel Algorithms Dependency Graph Local Updates Iterative Computation My Rank Friends Rank
  29. 29. The Need for a New Abstraction • Need: Asynchronous, Dynamic Parallel Computations Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Gibbs Sampling Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Semi-Supervised Learning Label Propagation CoEM Data-Mining PageRank Triangle Counting
  30. 30. The GraphLab Goals Know how to solve ML problem on 1 machine Efficient parallel predictions
  32. 32. Data Graph Data associated with vertices and edges Graph: • Social Network Vertex Data: • User profile text • Current interests estimates Edge Data: • Similarity weights
  33. 33. How do we program graph computation? “Think like a Vertex.” -Malewicz et al. [SIGMOD’10]
  34. 34. Update Functions User-defined program: applied to vertex transforms data in scope of vertex pagerank(i, scope){ // Get Neighborhood data (R[i], wij, R[j]) scope; Update function applied (asynchronously) // R[i] ¬the + (1- a ) å w ´ R[ j]; Update a vertex data in parallel until convergence ji jÎN [i ] // Reschedule Neighbors if needed Many schedulers available tochanges thencomputation if R[i] prioritize reschedule_neighbors_of(i); } Dynamic computation
  35. 35. The GraphLab Framework Graph Based Data Representation Scheduler Update Functions User Computation Consistency Model
  36. 36. Alternating Least Squares SVD CoEM Lasso LDA Splash Sampler Bayesian Tensor Factorization Belief Propagation PageRank Gibbs Sampling SVM K-Means Dynamic Block Gibbs Sampling Linear Solvers …Many others… Matrix Factorization
  37. 37. Never Ending Learner Project (CoEM) Hadoop 95 Cores 7.5 hrs Distributed GraphLab 32 EC2 machines 80 secs 0.3% of Hadoop time 2 orders of mag faster  2 orders of mag cheaper
  38. 38. – ML algorithms as vertex programs – Asynchronous execution and consistency models
  39. 39. Thus far… GraphLab 1 provided exciting scaling performance But… We couldn’t scale up to Altavista Webgraph 2002 1.4B vertices, 6.7B edges
  40. 40. Natural Graphs [Image from WikiCommons]
  41. 41. Problem: Existing distributed graph computation systems perform poorly on Natural Graphs
  42. 42. Achilles Heel: Idealized Graph Assumption Assumed… Small degree  Easy to partition But, Natural Graphs… Many high degree vertices (power-law degree distribution)  Very hard to partition
  43. 43. Power-Law Degree Distribution 10 Number of Vertices count 10 8 10 High-Degree Vertices: 1% vertices adjacent to 50% of edges 6 10 4 10 2 10 AltaVista WebGraph 1.4B Vertices, 6.6B Edges 0 10 0 10 2 10 4 10 degree Degree 6 10 8 10
  44. 44. High Degree Vertices are Common Popular Movies Users “Social” People Netflix Movies Hyper Parameters θθ ZZ θ θ ZZ Z Z ww Z Z Z ww Z Z ww ww Z ww Z ww ww ww b Docs α Common Words LDA Words Obama
  45. 45. Power-Law Degree Distribution “Star Like” Motif President Obama Followers
  46. 46. Problem: High Degree Vertices  High Communication for Distributed Updates Data transmitted across network Y O(# cut edges) Natural graphs do not have low-cost balanced cuts Machine 1 [Leskovec et al. 08, Lang 04] Machine 2 Popular partitioning tools (Metis, Chaco,…) perform poorly [Abou-Rjeili et al. 06] Extremely slow and require substantial memory
  47. 47. Random Partitioning • Both GraphLab 1, Pregel, Twitter, Facebook,… rely on Random (hashed) partitioning for Natural Graphs For p Machines: 10 Machines  90% of edges cut Machine 1 Machine 2 100 Machines  99% of edges cut! All data is communicated… Little advantage over MapReduce
  48. 48. In Summary GraphLab 1 and Pregel are not well suited for natural graphs • Poor performance on high-degree vertices • Low Quality Partitioning
  49. 49. PowerGraph SCALABILITY 2
  50. 50. Common Pattern for Update Fncs. R[j] GraphLab_PageRank(i) // Compute sum over neighbors total = 0 foreach( j in in_neighbors(i)): total = total + R[j] * wji // Update the PageRank R[i] = 0.1 + total // Trigger neighbors to run again if R[i] not converged then foreach( j in out_neighbors(i)) signal vertex-program on j wji R[i] Gather Information About Neighborhood Apply Update to Vertex Scatter Signal to Neighbors & Modify Edge Data
  51. 51. GAS Decomposition Gather (Reduce) Apply Scatter Accumulate information about neighborhood Apply the accumulated value to center vertex Update adjacent edges and vertices. Σ +…+  Y + Y Parallel “Sum” Y Y Y Y ’ Y’
  52. 52. Many ML Algorithms fit into GAS Model graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …
  53. 53. Minimizing Communication in GL2 PowerGraph: Vertex Cuts GL2 PowerGraph includes novel vertex cut algorithms Y Communication linear  Provides order in #magnitude gains in performance of spanned machines A vertex-cut minimizes # machines per vertex Percolation theory suggests Power Law graphs can be split by removing only a small set of vertices [Albert et al. 2000]  Small vertex cuts possible!
  54. 54. 2 From the Abstraction to a System
  55. 55. Triangle Counting on Twitter Graph 34.8 Billion Triangles Hadoop [WWW’11] 1636 Machines 423 Minutes 64 Machines 15 Seconds Why? Wrong Abstraction  Broadcast O(degree2) messages per Vertex S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
  56. 56. Topic Modeling (LDA) Specifically engineered for this task 64 cc2.8xlarge EC2 Nodes 200 lines of code & 4 human hours • English language Wikipedia – 2.6M Documents, 8.3M Words, 500M Tokens – Computationally intensive algorithm
  57. 57. How well does GraphLab scale? Yahoo Altavista Web Graph (2002): One of the largest publicly available webgraphs 1.4B Webpages, 6.7 Billion Links 7 seconds per iter. 1B links processed per second 30 lines of Nodes code 64 HPC user 1024 Cores (2048 HT) 4.4 TB RAM
  58. 58. GraphChi: Going small with GraphLab Solve huge problems on small or embedded devices? Key: Exploit non-volatile memory (starting with SSDs and HDs)
  59. 59. GraphChi – disk-based GraphLab Challenge: Random Accesses Novel GraphChi solution: Parallel sliding windows method  minimizes number of random accesses
  60. 60. Triangle Counting on Twitter Graph 40M Users Total: 34.8 Billion Triangles 1.2B Edges 1636 Machines 423 Minutes 59 Minutes, 1 Mac Mini! 64 Machines, 1024 Cores 15 Seconds Hadoop results from [Suri & Vassilvitskii '11]
  61. 61. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model
  62. 62. GL2 PowerGraph focused on Scalability at the loss of Usability
  63. 63. GraphLab 1 PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } Explicitly described operations Code is intuitive
  64. 64. GraphLab 1 GL2 PowerGraph Implicit operation PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } gather(edge) { return edge.source.value * edge.weight } merge(acc1, acc2) { return accum1 + accum2 } Implicit Explicitly described operations Code is intuitive apply(v, accum) { aggregation = 0.15 + 0.85 * acc } Need to understand engine to understand code
  65. 65. Great flexibility, but hit scalability wall Scalability, but very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines) What now?
  66. 66. WarpGraph USABILITY 3
  67. 67. GL3 WarpGraph Goals Program Like GraphLab 1 Run Like GraphLab 2 Machine 1 Machine 2
  68. 68. Fine-Grained Primitives Expose Neighborhood Operations through Parallelizable Iterators Y PageRankUpdateFunction(Y) { Y.pagerank = 0.15 + 0.85 * MapReduceNeighbors( lambda nbr: nbr.pagerank*nbr.weight, (aggregate sum over neighbors) lambda (a,b): a + b ) }
  69. 69. Expressive, Extensible Neighborhood API0 MapReduce over Neighbors Parallel Transform Adjacent Edges Broadcast Y Y Y Modify adjacent edges Schedule a selected subset of adjacent vertices Y + …+ Y + Y Parallel Sum
  70. 70. Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph But GL3 is more expressive Multiple gathers UpdateFunction(v) { if ( == 1) accum = MapReduceNeighs(g,m) else ... } Scatter before gather Conditional execution
  71. 71. Graph Coloring GL2 PowerGraph Twitter Graph: 41M Vertices 1.4B Edges 227 seconds GL3 WarpGraph 89 seconds 2.5x Faster WarpGraph outperforms PowerGraph with simpler code 32 Nodes x 16 Cores (EC2 HPC cc2.8x)
  72. 72. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 WarpGraph – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model – Usability is key – Access neighborhood through parallelizable iterators and latency hiding
  73. 73. Usability RECENT RELEASE: GRAPHLAB 2.2, INCLUDING WARPGRAPH ENGINE And support for streaming/dynamic graphs! Consensus that WarpGraph is much easier to use than PowerGraph “User study” group biased… :-)
  74. 74. Usability for Whom??? GL2 PowerGraph GL3 WarpGraph …
  75. 75. Machine Learning PHASE 3 USABILITY
  76. 76. Exciting Time to Work in ML With Big Data, I’ll take over the world!!! We met because of Big Data Why won’t Big Data read my mind??? Unique opportunities to change the world!!  But, every deployed system is an one-off solution, and requires PhDs to make work… 
  77. 77. But… ML key to any new service we want to build Even basics of scalable ML can be challenging 6 months from R/Matlab to production, at best State-of-art ML algorithms trapped in research papers Goal of GraphLab 3: Make huge-scale machine learning accessible to all! 
  78. 78. Step 1 Learning ML in Practice with GraphLab Notebook
  79. 79. Step 2 GraphLab+Python: ML Prototype to Production
  80. 80. Learn: GraphLab Notebook Prototype: pip install graphlab  local prototyping Production: Same code scales execute on EC2 cluster
  81. 81. Step 3 GraphLab Toolkits: Integrated State-of-the-Art ML in Production
  82. 82. GraphLab Toolkits Highly scalable, state-of-the-art machine learning straight from python Graph Analytics Graphical Models Computer Vision Clustering Topic Modeling Collaborative Filtering
  83. 83. Now with GraphLab: Learn/Prototype/Deploy Even basics of scalable ML can be challenging Learn ML with GraphLab Notebook 6 months from R/Matlab to production, at best pip install graphlab then deploy on EC2 State-of-art ML algorithms trapped in research papers Fully integrated via GraphLab Toolkits
  84. 84. We’re selecting strategic partners Help define our strategy & priorities And, get the value of GraphLab in your company
  85. 85. Possibility Scalability Usability GraphLab 2.2 available now: Define our future: Needless to say:
  86. 86. Please give us your feedback on this presentation BDT204 As a thank you, we will select prize winners daily for completed surveys!