CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Presentation CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

Published in: Technology, Education

Transcript of "CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin"

  1. Large-Scale Machine Learning and Graphs. Yucheng Low.
  2. Phase 1: POSSIBILITY. Benz Patent Motorwagen (1886).
  3. Phase 2: SCALABILITY. Model T Ford (1908).
  4. Phase 3: USABILITY.
  5. Possibility. Scalability + Graph. Usability.
  6. The Big Question of Big Learning: how will we design and implement parallel learning systems?
  7. MapReduce for Data-Parallel ML. Excellent for large data-parallel tasks! Data-parallel (MapReduce): feature extraction, cross-validation, computing sufficient statistics. Graph-parallel: is there more to machine learning? Graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
  8. Estimate Political Bias: semi-supervised and transductive learning. [Diagram: a network of posts, a few labeled Liberal or Conservative, most unknown.]
  9. Flashback to 1998. First Google advantage: a graph algorithm and a system to support it!
  10. The Power of Dependencies: where the value is!
  11. It's all about the graphs…
  12. Social Media, Advertising, Web, Science. Graphs encode the relationships between people, products, ideas, facts, and interests. Big: hundreds of billions of vertices and edges, with rich metadata. Facebook (10/2012): 1B users, 144B friendships. Twitter (2011): 15B follower edges.
  13. Examples of Graphs in Machine Learning.
  14. Collaborative Filtering: Exploiting Dependencies. (Movies: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita.) Latent factor models; matrix completion/factorization models.
  15. Topic Modeling. Latent Dirichlet Allocation, etc. (Example words: cat, apple, growth, hat, plant.)
  16. Example Topics Discovered from Wikipedia.
  17. Machine Learning Pipeline: Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data. (Inputs: face labels, images, docs, movie ratings, doc topics, social activity. Outputs: movie recommendations, sentiment analysis.)
  18. ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross-validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
  19. Example of a Graph-Parallel Algorithm.
  20. PageRank. What's the rank of this user? It depends on the rank of who follows her, which depends on the rank of who follows them… Loops in the graph mean we must iterate!
  21. PageRank Iteration. Iterate until convergence: R[i] = α + (1 − α) Σ_{(j,i)∈E} w_ji R[j]. "My rank is the weighted average of my friends' ranks." Here α is the random reset probability and w_ji is the probability of transitioning (similarity) from j to i.
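The iteration on this slide can be sketched in a few lines of plain Python (the toy graph, function name, and edge-list format are illustrative, not GraphLab's API):

```python
# Plain-Python sketch of the slide's iteration:
#   R[i] = alpha + (1 - alpha) * sum of w_ji * R[j] over in-edges (j, i)
def pagerank(in_edges, alpha=0.15, tol=1e-10):
    """in_edges maps vertex i to a list of weighted in-neighbors (j, w_ji)."""
    R = {i: 0.0 for i in in_edges}
    while True:
        R_new = {i: alpha + (1 - alpha) * sum(w * R[j] for j, w in nbrs)
                 for i, nbrs in in_edges.items()}
        if max(abs(R_new[i] - R[i]) for i in R) < tol:  # converged
            return R_new
        R = R_new

# Tiny 3-cycle where every vertex has one in-neighbor with weight 1;
# the fixed point is R[i] = 1 for every vertex.
ranks = pagerank({0: [(2, 1.0)], 1: [(0, 1.0)], 2: [(1, 1.0)]})
```

With row-stochastic weights the error shrinks by a factor of (1 − α) per sweep, which is why the slide's loop can simply "iterate until convergence".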
  22. Properties of Graph-Parallel Algorithms: a dependency graph, local updates, iterative computation (my rank depends on my friends' ranks).
  23. The Need for a New Abstraction. Need: asynchronous, dynamic parallel computations. Data-parallel (MapReduce): feature extraction, cross-validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), data mining (PageRank, triangle counting).
  24. The GraphLab Goals: know how to solve the ML problem on one machine; get efficient parallel predictions.
  25. POSSIBILITY.
  26. Data Graph: data associated with vertices and edges. Graph: a social network. Vertex data: user profile text, current interest estimates. Edge data: similarity weights.
  27. How do we program graph computation? "Think like a Vertex." (Malewicz et al. [SIGMOD'10])
  28. Update Functions. A user-defined program, applied to a vertex, transforms the data in the scope of that vertex:

        pagerank(i, scope){
            // Get neighborhood data
            (R[i], wij, R[j]) ← scope;
            // Update the vertex data
            R[i] ← α + (1 − α) Σ_{j∈N[i]} w_ji × R[j];
            // Reschedule neighbors if needed
            if R[i] changes then reschedule_neighbors_of(i);
        }

      The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation: dynamic computation.
  29. The GraphLab Framework: graph-based data representation, scheduler, update functions (user computation), consistency model.
  30. Implemented algorithms: Alternating Least Squares, CoEM, Lasso, SVD, belief propagation, LDA, Splash sampler, Bayesian tensor factorization, PageRank, SVM, Gibbs sampling, dynamic block Gibbs sampling, K-means, linear solvers, matrix factorization, and many others.
  31. Never-Ending Learner Project (CoEM). Hadoop: 95 cores, 7.5 hrs. Distributed GraphLab: 32 EC2 machines, 80 secs. That is 0.3% of the Hadoop time: two orders of magnitude faster and two orders of magnitude cheaper.
  32. ML algorithms as vertex programs; asynchronous execution and consistency models.
  33. Thus far: GraphLab 1 provided exciting scaling performance. But we couldn't scale up to the Altavista Webgraph 2002: 1.4B vertices, 6.7B edges.
  34. Natural Graphs. [Image from WikiCommons]
  35. Problem: existing distributed graph computation systems perform poorly on natural graphs.
  36. Achilles' Heel: the idealized graph assumption. Assumed: small degree, hence easy to partition. But natural graphs have many high-degree vertices (power-law degree distribution), and so are very hard to partition.
  37. Power-Law Degree Distribution. [Log-log plot: number of vertices vs. degree for the AltaVista WebGraph, 1.4B vertices, 6.6B edges.] High-degree vertices: 1% of the vertices are adjacent to 50% of the edges.
  38. High-Degree Vertices are Common. Netflix: popular movies (users and movies). Social networks: "social" people. LDA: common words and hyperparameters (docs and words).
  39. Power-Law Degree Distribution: the "star-like" motif (President Obama and his followers).
  40. Problem: high-degree vertices mean high communication for distributed updates. Data transmitted across the network is O(# cut edges). Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]. Popular partitioning tools (Metis, Chaco, …) perform poorly [Abou-Rjeili et al. 06]: extremely slow, and they require substantial memory.
  41. Partitioning. GraphLab 1, Pregel, Twitter, Facebook, and others all rely on random (hashed) partitioning for natural graphs. If vertices are randomly assigned to p machines, the expected fraction of edges cut is E[|edges cut| / |E|] = 1 − 1/p. With 10 machines, 90% of edges are cut; with 100 machines, 99% of edges are cut! Essentially all data is communicated: little advantage over MapReduce.
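The 1 − 1/p figure follows from a one-line argument: an edge survives uncut only when both endpoints hash to the same machine, which happens with probability 1/p. A quick sketch (the Monte Carlo check is just a sanity test, not from the talk):

```python
import random

def expected_cut(p):
    # P(edge cut) = 1 - P(both endpoints land on the same machine) = 1 - 1/p
    return 1.0 - 1.0 / p

def simulated_cut(num_edges, p, seed=0):
    """Hash both endpoints of each edge uniformly; count mismatches."""
    rng = random.Random(seed)
    cut = sum(rng.randrange(p) != rng.randrange(p) for _ in range(num_edges))
    return cut / num_edges

print(expected_cut(10))   # 0.9  -> 90% of edges cut on 10 machines
print(expected_cut(100))  # 0.99 -> 99% of edges cut on 100 machines
```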
  42. In Summary: GraphLab 1 and Pregel are not well suited for natural graphs. Poor performance on high-degree vertices; low-quality partitioning.
  43. SCALABILITY.
  44. Common Pattern for Update Functions:

        GraphLab_PageRank(i)
            // Gather information about the neighborhood:
            // compute the sum over neighbors
            total = 0
            foreach(j in in_neighbors(i)):
                total = total + R[j] * wji
            // Apply the update to the vertex: update the PageRank
            R[i] = 0.1 + total
            // Scatter: signal neighbors to run again, and modify edge data
            if R[i] not converged then
                foreach(j in out_neighbors(i)):
                    signal vertex-program on j

  45. GAS Decomposition. Gather (reduce): accumulate information about the neighborhood with a parallel "sum", Σ = ⊕ + ⊕ + … + ⊕. Apply: apply the accumulated value to the center vertex. Scatter: update adjacent edges and vertices.
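The decomposition above can be sketched for PageRank with three tiny functions and a toy synchronous driver (illustrative only; PowerGraph's engine runs the gather in parallel across machines and scatters signals instead of looping):

```python
def gather(R, j, w_ji):           # Gather: one contribution per in-edge
    return w_ji * R[j]

def merge(a, b):                  # commutative, associative "sum"
    return a + b

def apply_fn(total, alpha=0.15):  # Apply: fold the accumulator into the vertex
    return alpha + (1 - alpha) * total

def gas_round(in_edges, R):
    new_R = {}
    for i, nbrs in in_edges.items():
        acc = 0.0
        for j, w in nbrs:              # gather + merge over the neighborhood
            acc = merge(acc, gather(R, j, w))
        new_R[i] = apply_fn(acc)       # apply (scatter would signal neighbors)
    return new_R

# 3-cycle with unit weights: the fixed point is R[i] = 1 everywhere.
edges = {0: [(2, 1.0)], 1: [(0, 1.0)], 2: [(1, 1.0)]}
R = {0: 0.0, 1: 2.0, 2: 1.0}
for _ in range(100):
    R = gas_round(edges, R)
```

Because merge is commutative and associative, the gather can be computed in partial sums on each machine and combined, which is what makes the decomposition cheap for high-degree vertices.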
  46. Many ML algorithms fit into the GAS model: graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, and more.
  47. Minimizing Communication in GL2 PowerGraph: Vertex Cuts. Communication is linear in the number of machines each vertex spans; a vertex cut minimizes the number of machines per vertex. Percolation theory suggests power-law graphs can be split by removing only a small set of vertices [Albert et al. 2000], so small vertex cuts are possible! GL2 PowerGraph includes novel vertex-cut algorithms that provide order-of-magnitude gains in performance.
  48. From the Abstraction to a System.
  49. Triangle Counting on the Twitter Graph: 34.8 billion triangles. Hadoop [WWW'11]: 1636 machines, 423 minutes. GL2 PowerGraph: 64 machines, 15 seconds. Why? The wrong abstraction: broadcasting O(degree²) messages per vertex. (S. Suri and S. Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11.)
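For scale, the computation being benchmarked is simple to state: for every edge, count the common neighbors of its endpoints. A toy single-machine version (nothing here is the GraphLab or Hadoop implementation):

```python
def count_triangles(adj):
    """adj: vertex -> set of neighbors (undirected). Each triangle counted once."""
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v > u:
                # common neighbors w with u < v < w close each triangle once
                total += sum(1 for w in nbrs & adj[v] if w > v)
    return total

# Complete graph on 4 vertices: C(4,3) = 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(count_triangles(k4))  # 4
```

The O(degree²) cost the slide blames on the MapReduce formulation shows up here as the work over all neighbor pairs of a high-degree vertex.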
  50. Topic Modeling (LDA), measured in million tokens per second. Smola et al.: specifically engineered for this task, 100 Yahoo! machines. GL2 PowerGraph: 64 cc2.8xlarge EC2 nodes, 200 lines of code and 4 human hours. Benchmark: English-language Wikipedia (2.6M documents, 8.3M words, 500M tokens), a computationally intensive algorithm.
  51. How well does GraphLab scale? Yahoo Altavista Web Graph (2002), one of the largest publicly available webgraphs: 1.4B webpages, 6.7 billion links. 7 seconds per iteration on 64 HPC nodes, 1B links processed per second, with 30 lines of user code.
  52. GraphChi: Going Small with GraphLab. Solve huge problems on small or embedded devices? Key: exploit non-volatile memory (starting with SSDs and HDs).
  53. GraphChi, disk-based GraphLab. Challenge: random accesses. Novel GraphChi solution: the parallel sliding windows method minimizes the number of random accesses.
  54. Triangle Counting on the Twitter Graph (40M users, 1.2B edges; total: 34.8 billion triangles). Hadoop: 1636 machines, 423 minutes. GraphChi: 59 minutes, on one Mac Mini! GraphLab 2: 64 machines (1024 cores), 15 seconds. (Hadoop results from [Suri & Vassilvitskii '11].)
  55. ML algorithms as vertex programs; asynchronous execution and consistency models. Natural graphs change the nature of computation: vertex cuts and the gather/apply/scatter model.
  56. GL2 PowerGraph focused on scalability, at the loss of usability.
  57. GraphLab 1, explicitly described operations:

        PageRank(i, scope){
            acc = 0
            for (j in InNeighbors) {
                acc += pr[j] * edge[j].weight
            }
            pr[i] = 0.15 + 0.85 * acc
        }

      The code is intuitive.
  58. GraphLab 1 vs. GL2 PowerGraph. GraphLab 1 (explicitly described operations, intuitive code) uses the update function above. GL2 PowerGraph splits it into implicit operations with implicit aggregation:

        gather(edge) {
            return edge.source.value * edge.weight
        }
        merge(acc1, acc2) {
            return acc1 + acc2
        }
        apply(v, accum) {
            v.pr = 0.15 + 0.85 * accum
        }

      You need to understand the engine to understand the code.
  59. GraphLab 1: great flexibility, but it hit a scalability wall (many contortions were needed to implement SVD++ or Restricted Boltzmann Machines). GL2 PowerGraph: scalability, but a very rigid abstraction. What now?
  60. USABILITY.
  61. GL3 WarpGraph Goals: program like GraphLab 1, run like GraphLab 2.
  62. Fine-Grained Primitives: expose neighborhood operations through parallel iterators. R[i] = 0.15 + 0.85 Σ_{(j,i)∈E} w[j,i] × R[j]:

        PageRankUpdateFunction(Y) {
            Y.pagerank = 0.15 + 0.85 *
                MapReduceNeighbors(
                    lambda nbr: nbr.pagerank * nbr.weight,
                    lambda (a, b): a + b)   // aggregate sum over neighbors
        }
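The MapReduceNeighbors primitive on this slide can be emulated in plain Python to show the contract: a map over neighbors plus an associative combiner. The driver and the Nbr class are illustrative stand-ins, not the real GL3 API.

```python
from functools import reduce

def map_reduce_neighbors(neighbors, map_fn, combine_fn, zero=0.0):
    """Map each neighbor to a value, then fold with an associative combiner."""
    return reduce(combine_fn, (map_fn(n) for n in neighbors), zero)

class Nbr:
    def __init__(self, pagerank, weight):
        self.pagerank, self.weight = pagerank, weight

def pagerank_update(nbrs):
    return 0.15 + 0.85 * map_reduce_neighbors(
        nbrs,
        lambda nbr: nbr.pagerank * nbr.weight,  # map: one term per neighbor
        lambda a, b: a + b)                     # reduce: aggregate sum

value = pagerank_update([Nbr(1.0, 0.5), Nbr(1.0, 0.5)])  # approx 1.0
```

Because only the map and combine functions touch neighbor data, the engine is free to iterate the neighborhood in parallel and hide the latency of remote reads.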
  63. Expressive, Extensible Neighborhood API: parallel transform over adjacent edges (modify adjacent edges); broadcast (schedule a selected subset of adjacent vertices); MapReduce over neighbors (a parallel sum).
  64. Every GL2 PowerGraph program can be expressed (more easily) in GL3 WarpGraph. But GL3 is more expressive: multiple gathers, scatter before gather, and conditional execution:

        UpdateFunction(v) {
            if (v.data == 1)
                accum = MapReduceNeighs(g, m)
            else ...
        }

  65. Graph Coloring on the Twitter graph (41M vertices, 1.4B edges): GL2 PowerGraph, 227 seconds; GL3 WarpGraph, 60 seconds: 3.8x faster. WarpGraph outperforms PowerGraph with simpler code. (32 nodes x 16 cores, EC2 HPC cc2.8x.)
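For reference, the task being timed: assign colors so that no edge connects two vertices of the same color. A minimal sequential greedy sketch, only to make the task concrete; the benchmarked systems do this in parallel over 1.4B edges.

```python
def greedy_color(adj):
    """Greedy coloring: give each vertex the smallest color unused by neighbors."""
    color = {}
    for v in adj:                      # fixed scan order
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:               # smallest color not taken nearby
            c += 1
        color[v] = c
    return color

# Triangle 0-1-2 plus a pendant vertex 3 attached to 2.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colors = greedy_color(g)
```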
  66. ML algorithms as vertex programs; asynchronous execution and consistency models. Natural graphs change the nature of computation: vertex cuts and the gather/apply/scatter model. Usability is key: access the neighborhood through parallelizable iterators and latency hiding.
  67. Usability for Whom??? PowerGraph, WarpGraph, …
  68. Machine Learning, PHASE 3 (part 2): USABILITY.
  69. Exciting Time to Work in ML. "With Big Data, I'll take over the world!!!" "We met because of Big Data." "Why won't Big Data read my mind???" Unique opportunities to change the world! But every deployed system is a one-off solution, and requires PhDs to make it work…
  70. But even the basics of scalable ML can be challenging. ML is key to any new service we want to build; it takes 6 months from R/Matlab to production, at best; state-of-the-art ML algorithms are trapped in research papers. Goal of GraphLab 3: make huge-scale machine learning accessible to all!
  71. Step 0: learn ML with the GraphLab Notebook.
  72. GraphLab + Python. Step 1: pip install graphlab; prototype on a local machine.
  73. GraphLab + Python. Step 2: scale to the full dataset in the cloud, with minimal code changes.
  74. GraphLab + Python. Step 3: deploy in production.
  75. GraphLab + Python. Step 4: ???
  76. GraphLab + Python. Step 4: Profit.
  77. GraphLab Toolkits: highly scalable, state-of-the-art machine learning methods, all accessible from Python. Graph analytics, graphical models, computer vision, clustering, topic modeling, collaborative filtering.
  78. Now with GraphLab: Learn/Prototype/Deploy. Even the basics of scalable ML can be challenging: learn ML with the GraphLab Notebook. 6 months from R/Matlab to production, at best: pip install graphlab, then deploy on EC2/cluster. State-of-the-art ML algorithms trapped in research papers: fully integrated via the GraphLab Toolkits.
  79. We're selecting strategic partners. Help define our strategy and priorities, and get the value of GraphLab in your company. partners@graphlab.com
  80. C++ GraphLab 2.2 available now: graphlab.com. Beta program: beta.graphlab.com. Follow us on Twitter: @graphlabteam. Define our future: partners@graphlab.com. Needless to say: jobs@graphlab.com