CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin
Presentation CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

Presentation Transcript

    • Large-Scale Machine Learning and Graphs. Yucheng Low
    • Phase 1: POSSIBILITY. Benz Patent Motorwagen (1886)
    • Phase 2: SCALABILITY. Model T Ford (1908)
    • Phase 3: USABILITY
    • Possibility → Scalability (+ Graph) → Usability
    • The Big Question of Big Learning: how will we design and implement parallel learning systems?
    • MapReduce for Data-Parallel ML: excellent for large data-parallel tasks! Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. But is there more to machine learning? Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
    • Estimate Political Bias: semi-supervised & transductive learning on a social graph of posts, propagating a few known Liberal/Conservative labels to the many unlabeled (?) vertices.
    • Flashback to 1998. First Google advantage: a graph algorithm & a system to support it!
    • The Power of Dependencies: where the value is!
    • It’s all about the graphs…
    • Social media, advertising, web, science: graphs encode the relationships between people, products, ideas, facts, and interests. Big: hundreds of billions of vertices and edges, with rich metadata. Facebook (10/2012): 1B users, 144B friendships. Twitter (2011): 15B follower edges.
    • Examples of Graphs in Machine Learning
    • Collaborative Filtering: exploiting dependencies. Latent factor models; matrix completion/factorization models. (Example movies: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita.)
    • Topic Modeling: Latent Dirichlet Allocation, etc. (Example words: Cat, Apple, Growth, Hat, Plant.)
    • Example Topics Discovered from Wikipedia
    • Machine Learning Pipeline: Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data. (E.g., images, docs, movie ratings, and social activity become face labels, doc topics, movie recommendations, and sentiment analysis.)
    • ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
    • Example of a Graph-Parallel Algorithm
    • PageRank: what’s the rank of this user? It depends on the rank of who follows her, which depends on the rank of who follows them… Loops in the graph ⇒ must iterate!
    • PageRank Iteration: iterate until convergence. “My rank is the weighted average of my friends’ ranks”:

          R[i] = α + (1 − α) · Σ_{(j,i)∈E} w_ji · R[j]

      where α is the random reset probability and w_ji is the probability of transitioning (similarity) from j to i.
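      A minimal Python sketch of this iteration (a toy synchronous version on an edge list, not GraphLab code; it assumes each vertex’s outgoing weights sum to 1):

          # Toy PageRank: R[i] = alpha + (1 - alpha) * sum over (j,i) of w_ji * R[j]
          def pagerank(edges, n, alpha=0.15, iters=50):
              # edges: list of (j, i, w_ji) triples
              R = [1.0] * n
              for _ in range(iters):
                  nxt = [alpha] * n
                  for j, i, w in edges:
                      nxt[i] += (1 - alpha) * w * R[j]
                  R = nxt
              return R

          # Example: a 3-vertex cycle with unit weights
          print(pagerank([(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)], n=3))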
    • Properties of Graph-Parallel Algorithms: a dependency graph, local updates, iterative computation (my rank depends on my friends’ ranks).
    • The Need for a New Abstraction. Need: asynchronous, dynamic parallel computations. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), data mining (PageRank, triangle counting).
    • The GraphLab Goals: know how to solve an ML problem on 1 machine → efficient parallel predictions.
    • POSSIBILITY  
    • Data Graph: data is associated with vertices and edges. Graph: a social network. Vertex data: user profile text, current interest estimates. Edge data: similarity weights.
    • How do we program graph computation? “Think like a Vertex.” (Malewicz et al. [SIGMOD’10])
    • Update Functions: a user-defined program, applied to a vertex, transforms the data in the scope of that vertex. Update functions are applied (asynchronously) in parallel until convergence, and many schedulers are available to prioritize computation (dynamic computation):

          pagerank(i, scope){
            // Get neighborhood data
            (R[i], wji, R[j]) ← scope;
            // Update the vertex data
            R[i] ← α + (1 − α) · Σ_{j∈N[i]} wji × R[j];
            // Reschedule neighbors if needed
            if R[i] changes then
              reschedule_neighbors_of(i);
          }
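      The dynamic-computation idea can be sketched as a worklist loop in Python (illustrative only, not GraphLab’s scheduler; a vertex reschedules its neighbors only when its own value changes materially):

          from collections import deque

          def dynamic_pagerank(in_nbrs, out_nbrs, w, alpha=0.15, tol=1e-4):
              # in_nbrs[i], out_nbrs[i]: neighbor lists; w[(j, i)]: edge weight
              n = len(in_nbrs)
              R = [1.0] * n
              queue, queued = deque(range(n)), set(range(n))
              while queue:
                  i = queue.popleft()
                  queued.discard(i)
                  new = alpha + (1 - alpha) * sum(w[(j, i)] * R[j] for j in in_nbrs[i])
                  changed = abs(new - R[i]) > tol
                  R[i] = new
                  if changed:  # reschedule neighbors if needed
                      for k in out_nbrs[i]:
                          if k not in queued:
                              queue.append(k)
                              queued.add(k)
              return R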
    • The GraphLab Framework: graph-based data representation, update functions (user computation), scheduler, consistency model.
    • Alternating Least Squares, CoEM, Lasso, SVD, Belief Propagation, LDA, Splash Sampler, Bayesian Tensor Factorization, PageRank, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Linear Solvers, Matrix Factorization, …many others…
    • Never Ending Learner Project (CoEM): Hadoop, 95 cores, 7.5 hrs; Distributed GraphLab, 32 EC2 machines, 80 secs. That is 0.3% of the Hadoop time: two orders of magnitude faster and two orders of magnitude cheaper.
    • ML algorithms as vertex programs; asynchronous execution and consistency models.
    • Thus far… GraphLab 1 provided exciting scaling performance. But… we couldn’t scale up to the Altavista Webgraph (2002): 1.4B vertices, 6.7B edges.
    • Natural Graphs. [Image from WikiCommons]
    • Problem: existing distributed graph computation systems perform poorly on natural graphs.
    • Achilles Heel: the idealized graph assumption. Assumed: small degree ⇒ easy to partition. But natural graphs have many high-degree vertices (power-law degree distribution) ⇒ very hard to partition.
    • Power-Law Degree Distribution (log-log plot: number of vertices vs. degree). High-degree vertices: 1% of vertices are adjacent to 50% of edges. AltaVista WebGraph: 1.4B vertices, 6.6B edges.
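      The skew is easy to reproduce with a toy simulation (illustrative, not the AltaVista data; the exponent 0.85 below is an arbitrary choice that yields a comparably heavy tail):

          import random

          # Sample a Zipf-like degree sequence and measure how concentrated it is.
          random.seed(0)
          degrees = sorted((int(1 / (1 - random.random()) ** 0.85)
                            for _ in range(100000)), reverse=True)
          top = degrees[:len(degrees) // 100]   # the top 1% of vertices
          print(sum(top) / sum(degrees))        # a large share, roughly one half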
    • High-Degree Vertices are Common: popular movies in Netflix (users vs. movies), hyperparameters in LDA (α, β, θ shared across docs, words Z/w), common words in LDA, and “social” people such as Obama.
    • Power-Law Degree Distribution: a “star-like” motif (e.g., President Obama and his followers).
    • Problem: high-degree vertices ⇒ high communication for distributed updates. The data transmitted across the network is O(# cut edges). Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04], and popular partitioning tools (Metis, Chaco, …) perform poorly on them [Abou-Rjeili et al. 06]: extremely slow, and they require substantial memory.
    • Random Partitioning cuts most of the edges. GraphLab 1, Pregel, Twitter, Facebook, … all rely on random (hashed) partitioning for natural graphs. Theorem 5.1: if vertices are randomly assigned to p machines, then the expected fraction of edges cut is

          E[ |Edges Cut| / |E| ] = 1 − 1/p

      Even with just two machines, half the edges are cut, requiring order |E|/2 communication. 10 machines ⇒ 90% of edges cut; 100 machines ⇒ 99% of edges cut! All the data is communicated… little advantage over MapReduce.
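      The 1 − 1/p formula is easy to check numerically (a toy Python simulation under the same random-placement assumption):

          import random

          def cut_fraction(p, trials=100000):
              # Hash each endpoint of an edge to one of p machines; the edge
              # is cut when the two endpoints land on different machines.
              cut = sum(random.randrange(p) != random.randrange(p)
                        for _ in range(trials))
              return cut / trials

          for p in (2, 10, 100):
              print(p, round(cut_fraction(p), 3), "vs. expected", 1 - 1 / p)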
    • In Summary: GraphLab 1 and Pregel are not well suited for natural graphs: poor performance on high-degree vertices and low-quality partitioning.
    • SCALABILITY
    • Common Pattern for Update Functions (PageRank):

          GraphLab_PageRank(i)
            // Gather information about the neighborhood:
            // compute the sum over neighbors
            total = 0
            foreach (j in in_neighbors(i)):
              total = total + R[j] * wji
            // Apply the update to the vertex: update the PageRank
            R[i] = 0.1 + total
            // Scatter a signal to neighbors (modify edge data):
            // trigger neighbors to run again
            if R[i] not converged then
              foreach (j in out_neighbors(i)):
                signal vertex-program on j
    • GAS Decomposition. Gather (reduce): accumulate information about the neighborhood with a parallel “sum”, Σ = ⊕ + ⊕ + … + ⊕. Apply: apply the accumulated value to the center vertex. Scatter: update adjacent edges and vertices.
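      A minimal sketch of PageRank in GAS form (illustrative Python with a toy synchronous driver, not PowerGraph’s actual API):

          # Gather runs per in-edge, merge combines partial results,
          # apply writes the new vertex value. Scatter is omitted here
          # because PageRank does not modify edge data.
          def gather(R, w, j, i):
              return w[(j, i)] * R[j]

          def merge(a, b):
              return a + b

          def apply(R, i, total, alpha=0.15):
              R[i] = alpha + (1 - alpha) * total

          def gas_iteration(R, w, in_nbrs):
              for i in range(len(R)):
                  total = 0.0
                  for j in in_nbrs[i]:
                      total = merge(total, gather(R, w, j, i))
                  apply(R, i, total)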
    • Many ML algorithms fit into the GAS model: graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …
    • Minimizing Communication in GL2 PowerGraph: vertex cuts. Communication is linear in the number of machines each vertex spans, and a vertex cut minimizes the number of machines per vertex. GL2 PowerGraph includes novel vertex-cut algorithms that provide order-of-magnitude gains in performance. Percolation theory suggests power-law graphs can be split by removing only a small set of vertices [Albert et al. 2000] ⇒ small vertex cuts are possible!
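      Since communication grows with the number of machines each vertex spans, a quick simulation of random edge placement (the simplest vertex-cut baseline, not PowerGraph’s greedy heuristic) shows the replication factor it induces:

          import random
          from collections import defaultdict

          def replication_factor(edges, p):
              # Place each edge on a random machine; a vertex must be
              # replicated on every machine that holds one of its edges.
              spans = defaultdict(set)
              for u, v in edges:
                  m = random.randrange(p)
                  spans[u].add(m)
                  spans[v].add(m)
              return sum(len(s) for s in spans.values()) / len(spans)

          edges = [(random.randrange(1000), random.randrange(1000))
                   for _ in range(10000)]
          print(replication_factor(edges, 16))  # avg. machines spanned per vertex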
    • From the Abstraction to a System
    • Triangle Counting on the Twitter Graph: 34.8 billion triangles. Hadoop [WWW’11]: 1636 machines, 423 minutes. GL2 PowerGraph: 64 machines, 15 seconds. Why? The wrong abstraction: it broadcasts O(degree²) messages per vertex. (S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11.)
    • Topic Modeling (LDA), throughput in million tokens per second: Smola et al. (specifically engineered for this task) on 100 Yahoo! machines vs. GL2 PowerGraph on 64 cc2.8xlarge EC2 nodes, in 200 lines of code & 4 human hours. Benchmark: English-language Wikipedia (2.6M documents, 8.3M words, 500M tokens), a computationally intensive algorithm.
    • How well does GraphLab scale? Yahoo Altavista Web Graph (2002), one of the largest publicly available webgraphs: 1.4B webpages, 6.7 billion links. 7 seconds per iteration on 64 HPC nodes: 1B links processed per second, with 30 lines of user code.
    • GraphChi: going small with GraphLab. Solve huge problems on small or embedded devices? Key: exploit non-volatile memory (starting with SSDs and HDs).
    • GraphChi: disk-based GraphLab. Challenge: random accesses. The novel GraphChi solution: the parallel sliding windows method ⇒ it minimizes the number of random accesses.
    • Triangle Counting on the Twitter Graph (40M users, 1.2B edges; 34.8 billion triangles in total). Hadoop: 1636 machines, 423 minutes. GraphChi: 59 minutes, on 1 Mac Mini! GraphLab2: 64 machines (1024 cores), 15 seconds. (Hadoop results from [Suri & Vassilvitskii ’11].)
    • ML algorithms as vertex programs; asynchronous execution and consistency models. Natural graphs change the nature of computation: vertex cuts and the gather/apply/scatter model.
    • GL2 PowerGraph focused on scalability at the loss of usability.
    • GraphLab 1 (explicitly described operations; the code is intuitive):

          PageRank(i, scope){
            acc = 0
            for (j in InNeighbors) {
              acc += pr[j] * edge[j].weight
            }
            pr[i] = 0.15 + 0.85 * acc
          }
    • GraphLab 1 vs. GL2 PowerGraph: the same PageRank, but as implicit operations with implicit aggregation. GraphLab 1 code is intuitive; with PowerGraph you need to understand the engine to understand the code:

          gather(edge) {
            return edge.source.value * edge.weight
          }
          merge(acc1, acc2) {
            return acc1 + acc2
          }
          apply(v, accum) {
            v.pr = 0.15 + 0.85 * accum
          }
    • GraphLab 1: great flexibility, but it hit a scalability wall. GL2 PowerGraph: scalability, but a very rigid abstraction (many contortions are needed to implement SVD++ or Restricted Boltzmann Machines). What now?
    • USABILITY
    • GL3 WarpGraph Goals: program like GraphLab 1, run like GraphLab 2.
    • Fine-Grained Primitives: expose neighborhood operations through parallel iterators.

          R[i] = 0.15 + 0.85 · Σ_{(j,i)∈E} w[j,i] · R[j]

          PageRankUpdateFunction(Y) {
            Y.pagerank = 0.15 + 0.85 *
              MapReduceNeighbors(
                lambda nbr: nbr.pagerank * nbr.weight,   // map over neighbors
                lambda (a, b): a + b)                    // aggregate: sum
          }
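      The MapReduceNeighbors primitive can be mimicked in plain Python (an illustrative stand-in, not the WarpGraph API; functools.reduce plays the combiner’s role):

          from collections import namedtuple
          from functools import reduce

          def map_reduce_neighbors(map_fn, combine_fn, neighbors):
              # Map over each neighbor, then fold the partial results pairwise.
              return reduce(combine_fn, (map_fn(nbr) for nbr in neighbors))

          # PageRank-style use on two toy neighbors:
          Nbr = namedtuple("Nbr", "pagerank weight")
          nbrs = [Nbr(1.0, 0.5), Nbr(2.0, 0.25)]
          print(0.15 + 0.85 * map_reduce_neighbors(
              lambda nbr: nbr.pagerank * nbr.weight,
              lambda a, b: a + b,
              nbrs))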
    • Expressive, Extensible Neighborhood API: parallel transform of adjacent edges (modify adjacent edges), broadcast (schedule a selected subset of adjacent vertices), and MapReduce over neighbors (parallel sum).
    • Every GL2 PowerGraph program can be expressed (more easily) in GL3 WarpGraph, but GL3 is more expressive: multiple gathers, scatter before gather, and conditional execution.

          UpdateFunction(v) {
            if (v.data == 1)
              accum = MapReduceNeighs(g, m)
            else ...
          }
    • Graph Coloring on the Twitter graph (41M vertices, 1.4B edges): GL2 PowerGraph, 227 seconds; GL3 WarpGraph, 60 seconds: 3.8x faster. WarpGraph outperforms PowerGraph with simpler code. (32 nodes × 16 cores, EC2 HPC cc2.8x.)
    • ML algorithms as vertex programs; asynchronous execution and consistency models. Natural graphs change the nature of computation: vertex cuts and the gather/apply/scatter model. Usability is key: access the neighborhood through parallelizable iterators and latency hiding.
    • Usability for whom??? PowerGraph, WarpGraph, …
    • Machine Learning PHASE 3 (part 2): USABILITY
    • An Exciting Time to Work in ML. “With Big Data, I’ll take over the world!!!” “We met because of Big Data.” “Why won’t Big Data read my mind???” Unique opportunities to change the world! But every deployed system is a one-off solution and requires PhDs to make it work…
    • But… even the basics of scalable ML can be challenging: ML is key to any new service we want to build; it takes 6 months from R/Matlab to production, at best; and state-of-the-art ML algorithms are trapped in research papers. Goal of GraphLab 3: make huge-scale machine learning accessible to all!
    • Step 0: learn ML with the GraphLab Notebook.
    • GraphLab + Python. Step 1: pip install graphlab; prototype on a local machine.
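      A local prototype session might look roughly like this (a hypothetical sketch: the file name, column names, and method names are assumptions in the spirit of the GraphLab Create beta of the era, not a documented API):

          import graphlab

          # Load tabular data and train a recommender locally.
          data = graphlab.SFrame.read_csv("ratings.csv")
          model = graphlab.recommender.create(data,
                                              user_id="user",
                                              item_id="movie")
          print(model.recommend(users=["alice", "bob"]))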
    • GraphLab + Python. Step 2: scale to the full dataset in the cloud with minimal code changes.
    • GraphLab + Python. Step 3: deploy in production.
    • GraphLab + Python. Step 4: ???
    • GraphLab + Python. Step 4: Profit.
    • GraphLab Toolkits: highly scalable, state-of-the-art machine learning methods… all accessible from Python. Graph analytics, graphical models, computer vision, clustering, topic modeling, collaborative filtering.
    • Now with GraphLab: learn/prototype/deploy. Even the basics of scalable ML can be challenging → learn ML with the GraphLab Notebook. 6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2/cluster. State-of-the-art ML algorithms trapped in research papers → fully integrated via the GraphLab Toolkits.
    • We’re selecting strategic partners: help define our strategy & priorities, and get the value of GraphLab in your company. partners@graphlab.com
    • C++ GraphLab 2.2 available now: graphlab.com. Beta program: beta.graphlab.com. Follow us on Twitter: @graphlabteam. Define our future: partners@graphlab.com. Needless to say: jobs@graphlab.com