Large-Scale Machine Learning and Graphs
Carlos Guestrin
November 15, 2013

GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab is like Hadoop for graphs in that it enables users to easily express and execute machine learning algorithms on massive graphs. In this session, we illustrate how GraphLab leverages Amazon EC2 and advances in graph representation, asynchronous communication, and scheduling to achieve orders-of-magnitude performance gains over systems like Hadoop on real-world data.

Transcript of "GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013"

  1. 1. Large-Scale Machine Learning and Graphs Carlos Guestrin November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. 2. PHASE 1: POSSIBILITY
  3. 3. PHASE 2: SCALABILITY
  4. 4. PHASE 3: USABILITY
  5. 5. Three Phases in Technological Development 1. Possibility 2. Scalability 3. Usability Wide Adoption Beyond Experts & Enthusiasts
  6. 6. Machine Learning PHASE 1: POSSIBILITY
  7. 7. Rosenblatt 1957
  8. 8. Machine Learning PHASE 2: SCALABILITY
  9. 9. Needless to Say, We Need Machine Learning for Big Data 6 Billion Flickr Photos 28 Million Wikipedia Pages 1 Billion Facebook Users 72 Hours a Minute YouTube “… data a new class of economic asset, like currency or gold.”
  10. 10. Big Learning How will we design and implement parallel learning systems?
  11. 11. MapReduce for Data-Parallel ML Excellent for large data-parallel tasks! Data-Parallel MapReduce Feature Extraction Cross Validation Computing Sufficient Statistics Graph-Parallel Is there more to Machine Learning ?
  12. 12. The Power of Dependencies where the value is!
  13. 13. Flashback to 1998 Why? First Google advantage: a Graph Algorithm & a System to Support it!
  14. 14. It’s all about the graphs…
  15. 15. Social Media • Science • Advertising • Web. Graphs encode the relationships between: People, Facts, Products, Interests, Ideas. Big: 100s of billions of vertices and edges and rich metadata – Facebook (10/2012): 1B users, 144B friendships – Twitter (2011): 15B follower edges
  16. 16. Examples of Graphs in Machine Learning
  17. 17. Label a Face and Propagate
  18. 18. Pairwise similarity not enough… Not similar enough to be sure
  19. 19. Propagate Similarities & Co-occurrences for Accurate Predictions similarity edges Probabilistic Graphical Models co-occurring faces further evidence
  20. 20. Collaborative Filtering: Exploiting Dependencies Women on the Verge of a Nervous Breakdown The Celebration Latent Factor Models City of God Non-negative Matrix Factorization What do I recommend??? Wild Strawberries La Dolce Vita
  21. 21. Estimate Political Bias ? Liberal ? ? Post Post Semi-Supervised & ? Post Post Transductive Learning ? Post Post ? ? Conservative ? ? Post Post ? ? ? ? Post ? ? ? ? ? ? Post ? Post Post Post ? ? ? ? Post ? Post ? Post ? ? ? ?
  22. 22. Topic Modeling Cat Apple Growth Hat Plant LDA and co.
  23. 23. Machine Learning Pipeline Data Extract Features Graph Formation Structured Machine Learning Algorithm Value from Data Face labels images docs Movie ratings Social activity Doc topics movie recommend sentiment analysis
  24. 24. ML Tasks Beyond Data-Parallelism Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Semi-Supervised Gibbs Sampling Learning Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Label Propagation CoEM Graph Analysis PageRank Triangle Counting
  25. 25. Example of a Graph-Parallel Algorithm
  26. 26. PageRank Depends on rank of who follows her. Depends on rank of who follows them… What’s the rank of this user? Rank? Loops in graph → Must iterate!
  27. 27. PageRank Iteration. Iterate until convergence: R[i] = α + (1 − α) Σj wji R[j] “My rank is the weighted average of my friends’ ranks” – α is the random reset probability – wji is the probability of transitioning (similarity) from j to i
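The iteration on this slide can be sketched in a few lines of Python (the toy graph and weights are made up for illustration; `pagerank` is a hypothetical helper, not GraphLab code):

```python
# Toy PageRank iteration: R[i] = alpha + (1 - alpha) * sum_j w[j,i] * R[j],
# repeated until the ranks stop changing.

def pagerank(in_neighbors, w, alpha=0.15, tol=1e-10, max_iter=1000):
    """in_neighbors[i]: vertices j linking to i; w[(j, i)]: transition weight j -> i."""
    R = {i: 1.0 for i in in_neighbors}
    for _ in range(max_iter):
        new_R = {i: alpha + (1 - alpha) * sum(w[(j, i)] * R[j]
                                              for j in in_neighbors[i])
                 for i in R}
        if max(abs(new_R[i] - R[i]) for i in R) < tol:
            return new_R
        R = new_R
    return R

# Vertex 0 is followed by 1 and 2; each of them is followed only by 0.
in_nbrs = {0: [1, 2], 1: [0], 2: [0]}
weights = {(1, 0): 1.0, (2, 0): 1.0, (0, 1): 0.5, (0, 2): 0.5}
ranks = pagerank(in_nbrs, weights)  # vertex 0 ends up with the highest rank
```

Because each vertex's outgoing weights sum to 1, the total rank mass is conserved across iterations.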
  28. 28. Properties of Graph Parallel Algorithms Dependency Graph Local Updates Iterative Computation My Rank Friends Rank
  29. 29. The Need for a New Abstraction • Need: Asynchronous, Dynamic Parallel Computations Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Gibbs Sampling Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Semi-Supervised Learning Label Propagation CoEM Data-Mining PageRank Triangle Counting
  30. 30. The GraphLab Goals Know how to solve ML problem on 1 machine Efficient parallel predictions
  31. 31. POSSIBILITY
  32. 32. Data Graph Data associated with vertices and edges Graph: • Social Network Vertex Data: • User profile text • Current interests estimates Edge Data: • Similarity weights
  33. 33. How do we program graph computation? “Think like a Vertex.” -Malewicz et al. [SIGMOD’10]
  34. 34. Update Functions. User-defined program: applied to a vertex, transforms data in the scope of the vertex. pagerank(i, scope){ // Get neighborhood data (R[i], wji, R[j]) from scope // Update the vertex data: R[i] ← α + (1 − α) Σ_{j∈N[i]} wji R[j] // Reschedule neighbors if needed: if R[i] changes then reschedule_neighbors_of(i) } Update function applied (asynchronously) in parallel until convergence. Many schedulers available to prioritize computation. Dynamic computation.
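The update-function-plus-scheduler idea can be sketched as a toy sequential loop (all names here are hypothetical; real GraphLab adds parallelism and consistency models):

```python
from collections import deque

def run_updates(update_fn, initial):
    """Run update_fn(v) on scheduled vertices until the schedule drains.
    update_fn returns the neighbors it wants rescheduled (dynamic computation)."""
    queue, scheduled = deque(initial), set(initial)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        for u in update_fn(v):
            if u not in scheduled:
                scheduled.add(u)
                queue.append(u)

# Example update function: push the largest value seen so far along a path graph
values = {0: 5, 1: 0, 2: 0, 3: 0}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def spread_max(v):
    changed = []
    for u in edges[v]:
        if values[u] < values[v]:
            values[u] = values[v]
            changed.append(u)  # only touched neighbors get rescheduled
    return changed

run_updates(spread_max, [0])  # values converge to 5 everywhere
```

The key property mirrored here is that computation is driven by which vertices changed, not by fixed global sweeps.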
  35. 35. The GraphLab Framework Graph Based Data Representation Scheduler Update Functions User Computation Consistency Model
  36. 36. Alternating Least Squares SVD CoEM Lasso LDA Splash Sampler Bayesian Tensor Factorization Belief Propagation PageRank Gibbs Sampling SVM K-Means Dynamic Block Gibbs Sampling Linear Solvers …Many others… Matrix Factorization
  37. 37. Never Ending Learner Project (CoEM) Hadoop: 95 Cores, 7.5 hrs. Distributed GraphLab: 32 EC2 machines, 80 secs. 0.3% of Hadoop time: 2 orders of magnitude faster, 2 orders of magnitude cheaper
  38. 38. – ML algorithms as vertex programs – Asynchronous execution and consistency models
  39. 39. Thus far… GraphLab 1 provided exciting scaling performance But… We couldn’t scale up to Altavista Webgraph 2002 1.4B vertices, 6.7B edges
  40. 40. Natural Graphs [Image from WikiCommons]
  41. 41. Problem: Existing distributed graph computation systems perform poorly on Natural Graphs
  42. 42. Achilles Heel: Idealized Graph Assumption. Assumed… Small degree → easy to partition. But, Natural Graphs… Many high degree vertices (power-law degree distribution) → very hard to partition
  43. 43. Power-Law Degree Distribution [log-log plot: number of vertices vs. degree, AltaVista WebGraph: 1.4B Vertices, 6.6B Edges] High-Degree Vertices: 1% of vertices are adjacent to 50% of edges
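The "1% of vertices cover 50% of edges" effect falls out of any Zipf-like degree sequence; a quick illustrative check (synthetic degrees, not the AltaVista data):

```python
# Deterministic Zipf-like degree sequence d_k proportional to 1/k^alpha: what
# share of total degree sits on the top 1% of vertices?

def top_share(n, alpha=1.0, top_frac=0.01):
    degrees = [k ** -alpha for k in range(1, n + 1)]
    return sum(degrees[: int(n * top_frac)]) / sum(degrees)

share = top_share(100_000)  # top 1% of vertices carry well over half the degree
```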
  44. 44. High Degree Vertices are Common: popular movies on Netflix (many users per movie), “social” people, hyper-parameters in LDA (common words shared across all docs), celebrities such as Obama [LDA plate-notation diagram: topics θ, assignments Z, words w; priors α, β]
  45. 45. Power-Law Degree Distribution “Star Like” Motif President Obama Followers
  46. 46. Problem: High Degree Vertices → High Communication for Distributed Updates. Data transmitted across network: O(# cut edges). Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]. Popular partitioning tools (Metis, Chaco, …) perform poorly [Abou-Rjeili et al. 06]: extremely slow and require substantial memory
  47. 47. Random Partitioning • GraphLab 1, Pregel, Twitter, Facebook, … all rely on random (hashed) partitioning for natural graphs. For p machines: 10 machines → 90% of edges cut; 100 machines → 99% of edges cut! All data is communicated… Little advantage over MapReduce
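The 90%/99% numbers follow from a one-line expectation: an edge stays local only if both endpoints hash to the same machine, which happens with probability 1/p. A minimal check:

```python
def expected_cut_fraction(p):
    """Fraction of edges cut when vertices are hashed uniformly to p machines.
    An edge survives only if both endpoints land on the same machine (prob 1/p)."""
    return 1.0 - 1.0 / p

cut_10 = expected_cut_fraction(10)    # ~0.90, matching the slide
cut_100 = expected_cut_fraction(100)  # ~0.99
```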
  48. 48. In Summary GraphLab 1 and Pregel are not well suited for natural graphs • Poor performance on high-degree vertices • Low Quality Partitioning
  49. 49. PowerGraph SCALABILITY 2
  50. 50. Common Pattern for Update Fncs. R[j] GraphLab_PageRank(i) // Compute sum over neighbors total = 0 foreach( j in in_neighbors(i)): total = total + R[j] * wji // Update the PageRank R[i] = 0.1 + total // Trigger neighbors to run again if R[i] not converged then foreach( j in out_neighbors(i)) signal vertex-program on j wji R[i] Gather Information About Neighborhood Apply Update to Vertex Scatter Signal to Neighbors & Modify Edge Data
  51. 51. GAS Decomposition. Gather (Reduce): accumulate information about the neighborhood via a parallel “sum” Σ. Apply: apply the accumulated value to the center vertex (Y → Y′). Scatter: update adjacent edges and vertices
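A PageRank step decomposed into gather/apply/scatter might look like this toy sketch (function names are illustrative, not the PowerGraph API):

```python
# One PageRank step split into the three GAS phases.

def gather(R, w, j, i):
    return R[j] * w[(j, i)]           # one neighbor's contribution

def merge(a, b):
    return a + b                      # commutative + associative => parallelizable

def apply_rank(R, i, total, alpha=0.15):
    old = R[i]
    R[i] = alpha + (1 - alpha) * total
    return abs(R[i] - old)            # how much the center vertex moved

def scatter(out_neighbors, i, delta, threshold=1e-3):
    return out_neighbors[i] if delta > threshold else []  # who to signal next

# One step on a two-vertex graph: b -> a with weight 1
R = {"a": 0.0, "b": 1.0}
w = {("b", "a"): 1.0}
total = merge(0.0, gather(R, w, "b", "a"))
delta = apply_rank(R, "a", total)
to_signal = scatter({"a": ["b"], "b": ["a"]}, "a", delta)
```

Because `merge` is commutative and associative, the gather phase can be computed in parallel across machines and combined afterwards.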
  52. 52. Many ML Algorithms fit into GAS Model graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …
  53. 53. Minimizing Communication in GL2 PowerGraph: Vertex Cuts. GL2 PowerGraph includes novel vertex cut algorithms. Communication is linear in # of spanned machines; a vertex-cut minimizes # machines per vertex → provides order of magnitude gains in performance. Percolation theory suggests power-law graphs can be split by removing only a small set of vertices [Albert et al. 2000] → small vertex cuts possible!
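A vertex-cut assigns edges to machines and replicates each vertex on every machine that holds one of its edges; communication then scales with the average replication factor. A toy way to measure it (hypothetical helper, not the PowerGraph partitioner):

```python
# Vertex-cut replication factor: average number of machine copies per vertex
# once edges (not vertices) have been assigned to machines.

def replication_factor(edges, assign):
    """assign(edge) -> machine id. Returns average # machine copies per vertex."""
    replicas = {}
    for e in edges:
        m = assign(e)
        for v in e:
            replicas.setdefault(v, set()).add(m)
    return sum(len(s) for s in replicas.values()) / len(replicas)

# Star graph: center 0 with 4 spokes, split across 2 machines by spoke parity.
# Only the high-degree center is replicated (on both machines).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
rf = replication_factor(star, lambda e: e[1] % 2)
```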
  54. 54. 2 From the Abstraction to a System
  55. 55. Triangle Counting on Twitter Graph: 34.8 Billion Triangles. Hadoop [WWW’11]: 1636 Machines, 423 Minutes. PowerGraph: 64 Machines, 15 Seconds. Why? Wrong abstraction → broadcast O(degree²) messages per vertex. S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
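The graph-parallel formulation counts triangles by intersecting neighbor sets along each edge, avoiding the O(degree²) message broadcast; a minimal sketch:

```python
# Count triangles by intersecting the neighbor sets of each edge (u, v).
# Toy code for the idea, not the actual distributed implementation.

def count_triangles(adj):
    """adj[v]: set of neighbors (undirected). Each triangle counted once."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # third vertex w must be above v so each triangle counts once
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}  # complete graph K4
```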
  56. 56. Topic Modeling (LDA) Specifically engineered for this task 64 cc2.8xlarge EC2 Nodes 200 lines of code & 4 human hours • English language Wikipedia – 2.6M Documents, 8.3M Words, 500M Tokens – Computationally intensive algorithm
  57. 57. How well does GraphLab scale? Yahoo Altavista Web Graph (2002): one of the largest publicly available webgraphs. 1.4B Webpages, 6.7 Billion Links. 7 seconds per iteration, 1B links processed per second. 30 lines of user code. 64 HPC Nodes, 1024 Cores (2048 HT), 4.4 TB RAM
  58. 58. GraphChi: Going small with GraphLab Solve huge problems on small or embedded devices? Key: Exploit non-volatile memory (starting with SSDs and HDs)
  59. 59. GraphChi – disk-based GraphLab. Challenge: random accesses. Novel GraphChi solution: parallel sliding windows method → minimizes number of random accesses
  60. 60. Triangle Counting on Twitter Graph: 40M Users, 1.2B Edges. Total: 34.8 Billion Triangles. Hadoop: 1636 Machines, 423 Minutes. PowerGraph: 64 Machines, 1024 Cores, 15 Seconds. GraphChi: 59 Minutes on 1 Mac Mini! Hadoop results from [Suri & Vassilvitskii '11]
  61. 61. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model
  62. 62. GL2 PowerGraph focused on Scalability at the loss of Usability
  63. 63. GraphLab 1 PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } Explicitly described operations Code is intuitive
  64. 64. GraphLab 1 vs GL2 PowerGraph. GraphLab 1 (explicitly described operations; code is intuitive): PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } GL2 PowerGraph (implicit operations, implicit aggregation; need to understand the engine to understand the code): gather(edge) { return edge.source.value * edge.weight } merge(acc1, acc2) { return acc1 + acc2 } apply(v, accum) { v.pr = 0.15 + 0.85 * accum }
  65. 65. Great flexibility, but hit scalability wall Scalability, but very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines) What now?
  66. 66. WarpGraph USABILITY 3
  67. 67. GL3 WarpGraph Goals Program Like GraphLab 1 Run Like GraphLab 2 Machine 1 Machine 2
  68. 68. Fine-Grained Primitives. Expose Neighborhood Operations through Parallelizable Iterators. PageRankUpdateFunction(Y) { Y.pagerank = 0.15 + 0.85 * MapReduceNeighbors( lambda nbr: nbr.pagerank * nbr.weight, lambda (a, b): a + b ) } // aggregates a sum over neighbors
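The MapReduceNeighbors iterator can be mimicked with plain Python to show the shape of the API (the vertex records and weights here are hypothetical):

```python
# Sequential stand-in for a parallelizable map/reduce over a vertex's neighbors.

def map_reduce_neighbors(neighbors, map_fn, reduce_fn, init=0.0):
    acc = init
    for nbr in neighbors:
        acc = reduce_fn(acc, map_fn(nbr))  # an engine could parallelize this fold
    return acc

class Nbr:
    """Toy neighbor record with a rank and an edge weight."""
    def __init__(self, pagerank, weight):
        self.pagerank, self.weight = pagerank, weight

nbrs = [Nbr(1.0, 0.5), Nbr(2.0, 0.25)]
pagerank = 0.15 + 0.85 * map_reduce_neighbors(
    nbrs,
    lambda n: n.pagerank * n.weight,  # map: each neighbor's contribution
    lambda a, b: a + b)               # reduce: sum
```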
  69. 69. Expressive, Extensible Neighborhood API: MapReduce over Neighbors (parallel sum), Parallel Transform Adjacent Edges (modify adjacent edges), Broadcast (schedule a selected subset of adjacent vertices)
  70. 70. Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph But GL3 is more expressive Multiple gathers UpdateFunction(v) { if (v.data == 1) accum = MapReduceNeighs(g,m) else ... } Scatter before gather Conditional execution
  71. 71. Graph Coloring GL2 PowerGraph Twitter Graph: 41M Vertices 1.4B Edges 227 seconds GL3 WarpGraph 89 seconds 2.5x Faster WarpGraph outperforms PowerGraph with simpler code 32 Nodes x 16 Cores (EC2 HPC cc2.8x)
  72. 72. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 WarpGraph – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model – Usability is key – Access neighborhood through parallelizable iterators and latency hiding
  73. 73. Usability RECENT RELEASE: GRAPHLAB 2.2, INCLUDING WARPGRAPH ENGINE And support for streaming/dynamic graphs! Consensus that WarpGraph is much easier to use than PowerGraph “User study” group biased… :-)
  74. 74. Usability for Whom??? GL2 PowerGraph GL3 WarpGraph …
  75. 75. Machine Learning PHASE 3 USABILITY
  76. 76. Exciting Time to Work in ML. With Big Data, I’ll take over the world!!! We met because of Big Data. Why won’t Big Data read my mind??? Unique opportunities to change the world!! But, every deployed system is a one-off solution, and requires PhDs to make work…
  77. 77. But… ML is key to any new service we want to build. Even the basics of scalable ML can be challenging. 6 months from R/Matlab to production, at best. State-of-the-art ML algorithms trapped in research papers. Goal of GraphLab 3: Make huge-scale machine learning accessible to all!
  78. 78. Step 1 Learning ML in Practice with GraphLab Notebook
  79. 79. Step 2 GraphLab+Python: ML Prototype to Production
  80. 80. Learn: GraphLab Notebook. Prototype: pip install graphlab → local prototyping. Production: same code scales, executes on an EC2 cluster
  81. 81. Step 3 GraphLab Toolkits: Integrated State-of-the-Art ML in Production
  82. 82. GraphLab Toolkits Highly scalable, state-of-the-art machine learning straight from python Graph Analytics Graphical Models Computer Vision Clustering Topic Modeling Collaborative Filtering
  83. 83. Now with GraphLab: Learn/Prototype/Deploy. Even the basics of scalable ML can be challenging → Learn ML with GraphLab Notebook. 6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2. State-of-the-art ML algorithms trapped in research papers → fully integrated via GraphLab Toolkits
  84. 84. We’re selecting strategic partners Help define our strategy & priorities And, get the value of GraphLab in your company partners@graphlab.com
  85. 85. Possibility Scalability Usability GraphLab 2.2 available now: graphlab.com Define our future: partners@graphlab.com Needless to say: jobs@graphlab.com
  86. 86. Please give us your feedback on this presentation BDT204 As a thank you, we will select prize winners daily for completed surveys!