Machine Learning in the Cloud with GraphLab
Talk by Dr. Danny Bickson, GraphLab Inc. at the Applied Machine Learning Day, January 20, 2014 @ MS
    Presentation Transcript

    • Machine Learning in the Cloud with GraphLab. Danny Bickson. Applied Machine Learning Day, January 20, 2014, MS
    • Needless to Say, We Need Machine Learning for Big Data: 6 billion Flickr photos, 28 million Wikipedia pages, 1 billion Facebook users, 72 hours a minute uploaded to YouTube. "… data a new class of economic asset, like currency or gold."
    • Big Learning: how will we design and implement parallel learning systems?
    • A Shift Towards Parallelism: GPUs, multicore, clusters, clouds, supercomputers. Grad students and ML experts repeatedly solve the same parallel design challenges: race conditions, distributed state, communication… The resulting code is difficult to maintain, extend, debug… Avoid these problems by using high-level abstractions.
    • MapReduce for Data-Parallel ML: excellent for large data-parallel tasks! Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Is there more to machine learning? Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
    • The Power of Dependencies: where the value is! Carnegie Mellon University
    • Label  a  Face  and  Propagate  
    • Pairwise similarity is not enough… not similar enough to be sure.
    • Propagate Similarities & Co-occurrences for Accurate Predictions: similarity edges, co-occurring faces, further evidence.
    • Collaborative Filtering, Independent Case: Lord of the Rings, Star Wars IV, Star Wars I, Harry Potter, Pirates of the Caribbean.
    • Collaborative Filtering, Exploiting Dependencies: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita. What do I recommend???
    • Machine Learning Pipeline: Data ⇒ Extract Features (images, faces, docs, movie ratings; important words, side info) ⇒ Graph Formation (similar faces, shared words, rated movies) ⇒ Structured Machine Learning Algorithm (belief propagation, LDA, collaborative filtering) ⇒ Value from Data (face labels, doc topics, movie recommendations).
    • Parallelizing Machine Learning: Data ⇒ Extract Features ⇒ Graph Formation (graph ingress: mostly data-parallel) ⇒ Structured Machine Learning Algorithm (graph-structured computation: graph-parallel) ⇒ Value from Data.
    • ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting).
    • Example of a Graph-Parallel Algorithm
    • PageRank: what's the rank of this user? It depends on the rank of who follows her, which depends on the rank of who follows them… Loops in the graph ⇒ must iterate!
    • PageRank Iteration: "my rank is the weighted average of my friends' ranks." Iterate until convergence:

      R[i] = α + (1 − α) Σ_{(j,i)∈E} w_ji R[j]

      where α is the random reset probability and w_ji is the transition probability (similarity) from j to i.
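    The iteration above can be sketched in a few lines of Python. The toy graph, α, and iteration count here are illustrative assumptions, not numbers from the talk:

    ```python
    # A minimal sketch of the slide's PageRank iteration on a toy weighted graph.
    def pagerank(out_edges, alpha=0.15, iters=100):
        """out_edges: {j: [(i, w_ji), ...]}, weights out of each j summing to 1."""
        ranks = {v: 1.0 for v in out_edges}
        for _ in range(iters):
            # R[i] = alpha + (1 - alpha) * sum over edges (j, i) of w_ji * R[j]
            new = {v: alpha for v in ranks}
            for j, edges in out_edges.items():
                for i, w_ji in edges:
                    new[i] += (1 - alpha) * w_ji * ranks[j]
            ranks = new
        return ranks

    # On a 3-cycle with unit weights every rank converges to 1.0.
    print(pagerank({"a": [("b", 1.0)], "b": [("c", 1.0)], "c": [("a", 1.0)]}))
    ```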
    • Properties of Graph-Parallel Algorithms: a dependency graph, local updates (my rank depends on my friends' ranks), iterative computation.
    • Addressing Graph-Parallel ML. Data-parallel: MapReduce (feature extraction, cross validation, computing sufficient statistics). Graph-parallel: MapReduce? ⇒ a graph-parallel abstraction. Graphical models (Gibbs sampling, belief propagation, variational opt.), collaborative filtering (tensor factorization), semi-supervised learning (label propagation, CoEM), data mining (PageRank, triangle counting).
    • Data Graph: data is associated with vertices and edges. Graph: a social network. Vertex data: user profile text, current interest estimates. Edge data: similarity weights.
    • How do we program graph computation? "Think like a Vertex." (Malewicz et al. [SIGMOD'10])
    • Update Functions: a user-defined program, applied to a vertex, transforms the data in the scope of that vertex.

      pagerank(i, scope) {
        // Get neighborhood data
        (R[i], w_ji, R[j]) ← scope;
        // Update the vertex data
        R[i] ← α + (1 − α) Σ_{j∈N[i]} w_ji × R[j];
        // Reschedule neighbors if needed
        if R[i] changes then reschedule_neighbors_of(i);
      }

      The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation ⇒ dynamic computation.
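    The "reschedule neighbors if needed" idea can be sketched with an explicit scheduler queue. This is a hypothetical Python rendering; GraphLab's actual API is a C++ vertex program, and its schedulers are more sophisticated than a FIFO:

    ```python
    from collections import deque

    # Sketch of dynamic computation: vertices are updated from a scheduler
    # queue, and a vertex reschedules its out-neighbors only when its value
    # changes by more than a tolerance. Data layout and tolerance are assumptions.
    def dynamic_pagerank(in_edges, out_nbrs, alpha=0.15, tol=1e-6):
        """in_edges: {i: [(j, w_ji), ...]}; out_nbrs: {j: [i, ...]}."""
        R = {v: 0.0 for v in in_edges}
        queue, queued = deque(R), set(R)      # all vertices start scheduled
        while queue:
            i = queue.popleft()
            queued.discard(i)
            new = alpha + (1 - alpha) * sum(w * R[j] for j, w in in_edges[i])
            changed = abs(new - R[i]) > tol
            R[i] = new
            if changed:                       # reschedule neighbors if needed
                for n in out_nbrs[i]:
                    if n not in queued:
                        queue.append(n)
                        queued.add(n)
        return R
    ```

    The queue empties once every vertex's update falls below the tolerance, so convergence doubles as the termination condition.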
    • The GraphLab Framework: graph-based data representation, update functions (user computation), scheduler, consistency model.
    • Alternating Least Squares, CoEM, Lasso, SVD, Belief Propagation, LDA, Splash Sampler, Bayesian Tensor Factorization, PageRank, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Linear Solvers, Matrix Factorization, …many others…
    • Never Ending Learner Project (CoEM). Hadoop: 95 cores, 7.5 hrs. Distributed GraphLab: 32 EC2 machines, 80 secs. That is 0.3% of the Hadoop time: 2 orders of magnitude faster ⇒ 2 orders of magnitude cheaper.
    • Thus far… GraphLab 1 provided exciting scaling performance. But… we couldn't scale up to the Altavista Webgraph 2002: 1.4B vertices, 6.7B edges.
    • Natural Graphs [image from WikiCommons]
    • Problem: existing distributed graph computation systems perform poorly on natural graphs.
    • Achilles Heel: the Idealized Graph Assumption. Assumed: small degree ⇒ easy to partition. But natural graphs have many high-degree vertices (a power-law degree distribution) ⇒ very hard to partition.
    • Power-Law Degree Distribution. [Log-log plot: number of vertices vs. degree for the AltaVista WebGraph, 1.4B vertices, 6.6B edges.] High-degree vertices: 1% of the vertices are adjacent to 50% of the edges.
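    The "1% of vertices adjacent to 50% of edges" effect is easy to reproduce for an idealized power law in which the k-th ranked vertex has degree proportional to 1/k. The numbers below are illustrative, not the AltaVista data:

    ```python
    # Illustrative calculation: share of edge endpoints held by the top fraction
    # of vertices when degree falls off as 1/rank (an idealized power law).
    def top_share(n_vertices, top_frac):
        degrees = [1.0 / k for k in range(1, n_vertices + 1)]  # rank-ordered
        top = int(n_vertices * top_frac)
        return sum(degrees[:top]) / sum(degrees)

    share = top_share(1_000_000, 0.01)
    print(f"top 1% of vertices hold {share:.0%} of edge endpoints")
    ```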
    • High-Degree Vertices are Common: popular movies (Netflix users and movies), "social" people, hyperparameters and common words in LDA (docs, words, topics; e.g. "Obama").
    • Power-Law Graphs are Difficult to Partition: power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04], and traditional graph-partitioning algorithms perform poorly on them [Abou-Rjeili et al. 06].
    • GraphLab 2 Solution: program for this, run on that (Machine 1, Machine 2): split high-degree vertices. The new abstraction ⇒ this split-vertex strategy.
    • GAS Decomposition. Gather (reduce): accumulate information about the neighborhood via a parallel "sum" (Σ ← ⊕ + … + ⊕). Apply: apply the accumulated value to the center vertex. Scatter: update adjacent edges and vertices.
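    A gather-apply-scatter update for one vertex can be sketched as follows. This is a hypothetical functional form with an assumed data layout (dicts of values and edge weights); the real GraphLab 2 / PowerGraph API is a C++ vertex program:

    ```python
    # Sketch of one GAS update, using the PageRank sum as the gather.
    def gas_update(v, in_nbrs, values, weights, alpha=0.15, tol=1e-9):
        # Gather (reduce): accumulate over the in-neighborhood. Because the
        # accumulator is a commutative sum, its pieces can be computed in
        # parallel on the machines holding different halves of a split vertex.
        acc = sum(weights[(j, v)] * values[j] for j in in_nbrs)
        # Apply: combine the accumulated value with the center vertex's data.
        new_value = alpha + (1 - alpha) * acc
        changed = abs(new_value - values[v]) > tol
        values[v] = new_value
        # Scatter: report whether adjacent vertices/edges should be signaled.
        return changed
    ```

    Splitting the gather into commutative pieces is exactly what lets high-degree vertices be cut across machines.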
    • GraphChi: Going Small with GraphLab. Solve huge problems on small or embedded devices? Key: exploit non-volatile memory (starting with SSDs and HDs).
    • GraphChi, disk-based GraphLab. Challenge: random accesses. The novel GraphChi solution: the parallel sliding windows method ⇒ minimizes the number of random accesses.
    • GraphChi, disk-based GraphLab. The novel Parallel Sliding Windows algorithm minimizes non-sequential disk accesses and is efficient on both SSD and hard drive, with parallel, asynchronous execution. Fast: it solves tasks as large as current distributed systems.
    • Sample Results. Triangle counting on the Twitter graph (1.5B edges): GraphChi on 1 Mac Mini vs. Hadoop on 1600 nodes [1]. Belief propagation on the Altavista graph (6.7B edges): GraphChi on 1 Mac Mini vs. Hadoop on 100 machines [2]. [1] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. WWW 2011. [2] U. Kang, D. H. Chau, and C. Faloutsos. Inference of Beliefs on Billion-Scale Graphs. KDD-LDMTA'10, pages 1–7, June 2010.
    • Triangle Counting on the Twitter Graph: 40M users, 1.2B edges; 34.8 billion triangles in total. Hadoop: 1636 machines, 423 minutes. GraphChi: 59 minutes, on 1 Mac Mini! GraphLab 2: 64 machines (1024 cores), 1.5 minutes. Hadoop results from [Suri & Vassilvitskii WWW '11].
    • Efficient Multicore Collaborative Filtering. LeBuSiShu team: 5th place in track 1, ACM KDD CUP 2011. Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences); Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University). ACM KDD CUP Workshop 2011.
    • Netflix Collaborative Filtering: alternating least squares matrix factorization. Model: 0.5 million nodes, 99 million edges. [Runtime plot, log scale, vs. number of nodes (4 to 64), comparing MPI, Hadoop, and GraphLab.]
    • Intel Labs Report on GraphLab. Data source: Nezih Yigitbasi, Intel Labs
    • ACM KDD CUP 2012
    • GraphLab team @ WSDM 13
    • Future Plans
    • Future Plans. Learn: the GraphLab Notebook. Prototype: pip install graphlab ⇒ local prototyping. Production: the same code scales; execute on an EC2 cluster.
    • GraphLab Internship Plan
    • GraphLab Conferences: 2012 ⇒ 2013