Scheduling MapReduce Jobs in HPC Clusters

Transcript

  • 1. Scheduling MapReduce Jobs in HPC Clusters
    Marcelo Neves, Tiago Ferreto, Cesar De Rose
    marcelo.neves@acad.pucrs.br
    Faculty of Informatics, PUCRS, Porto Alegre, Brazil
    August 30, 2012
  • 2. Outline
    • Introduction
    • HPC Clusters and MapReduce
    • MapReduce Job Adaptor
    • Evaluation
    • Conclusion
  • 3. Introduction
    • MapReduce (MR)
      – A parallel programming model
      – Simplicity, efficiency, and high scalability
      – Has become a de facto standard for large-scale data analysis
    • MR has also attracted the attention of the HPC community
      – A simpler approach to the parallelization problem
      – Highly visible cases where MR has been successfully used by companies like Google, Facebook, and Yahoo!
  • 4. HPC Clusters and MapReduce
    • HPC clusters
      – Shared among multiple users/organizations
      – Managed by a Resource Management System (RMS), such as PBS/Torque
      – Applications are submitted as batch jobs
      – Users have to explicitly allocate resources, specifying the number of nodes and the amount of time
    • MR implementations (e.g., Hadoop)
      – Have their own complete job management system
      – Users do not have to explicitly allocate resources
      – Require a dedicated cluster
  • 5. Problem
    • Two distinct clusters are required
    How can MapReduce jobs run in an existing HPC cluster alongside regular HPC jobs?
  • 6. Current solutions
    • Hadoop on Demand (HOD) and MyHadoop
      – Create on-demand MR installations as RMS jobs
      – Not transparent: users still must specify the number of nodes and the amount of time to be allocated
    • MESOS
      – Shares a cluster between multiple different frameworks
      – Creates another level of resource management
      – Management is taken away from the cluster's RMS
  • 7. MapReduce Job Adaptor
    [Diagram: an HPC user submits an HPC job (# of nodes, time) directly to the Resource Management System. An MR user submits an MR job (# of map tasks, # of reduce tasks, job profile) to the MR Job Adaptor, which translates it into an HPC job (# of nodes, time) and submits it to the RMS, which schedules both kinds of jobs on the cluster.]
  • 8. MapReduce Job Adaptor
    • The adaptor has three main goals:
      – Facilitate the execution of MR jobs in HPC clusters
      – Minimize the average turnaround time of the jobs
      – Exploit unused resources in the cluster (the result of the various shapes of HPC job requests)
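To make the translation on slide 7 concrete, here is a minimal sketch of the two job descriptions the adaptor mediates between. The type and field names are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass

@dataclass
class MRJob:
    # What the MR user gives the adaptor (slide 7): task counts plus
    # a job profile of per-phase performance invariants measured from
    # a previous run of the same job.
    n_map_tasks: int
    n_reduce_tasks: int
    profile: dict  # e.g. {"map_avg": ..., "map_max": ..., ...} (assumed keys)

@dataclass
class HPCJobRequest:
    # What the RMS expects from any user: an explicit allocation.
    n_nodes: int
    walltime_s: float
```

The adaptor's job is to derive the second from the first, which requires an estimate of how long the MR job will run on a given allocation; that is what the performance model on the next slide provides.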
  • 9. Completion time estimation
    • MR performance model by Verma et al. [1]
      – Job profile with performance invariants
      – Estimates upper/lower bounds of the job completion time
    • N_J^M = number of map tasks
    • N_J^R = number of reduce tasks
    • S_J^M = number of map slots
    • S_J^R = number of reduce slots
    [1] Verma et al.: ARIA: automatic resource inference and allocation for mapreduce environments (2011)
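The model rests on a classic makespan bound: n tasks greedily scheduled on k slots finish in at least n*avg/k and at most (n-1)*avg/k + max time. A minimal sketch, reusing the profile keys assumed in the previous snippet and gliding over ARIA's special handling of the first shuffle wave:

```python
def phase_bounds(n_tasks, n_slots, avg, mx):
    # Makespan bounds for n_tasks greedily scheduled on n_slots,
    # given average and maximum task durations (Verma et al., 2011).
    if n_tasks == 0:
        return 0.0, 0.0
    low = n_tasks * avg / n_slots
    up = (n_tasks - 1) * avg / n_slots + mx
    return low, up

def job_completion_bounds(profile, n_map, n_reduce, map_slots, reduce_slots):
    # Sum the per-phase bounds for map, shuffle, and reduce.
    # ARIA distinguishes the first shuffle wave from typical ones;
    # this sketch ignores that refinement for brevity.
    low_total, up_total = 0.0, 0.0
    for n, s, phase in ((n_map, map_slots, "map"),
                        (n_reduce, reduce_slots, "shuffle"),
                        (n_reduce, reduce_slots, "reduce")):
        low, up = phase_bounds(n, s, profile[f"{phase}_avg"], profile[f"{phase}_max"])
        low_total += low
        up_total += up
    return low_total, up_total
```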
  • 10. Algorithm
    [The adaptor's scheduling algorithm appeared here as a figure; it is not preserved in the transcript.]
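Since the algorithm figure did not survive extraction, the following is only one plausible reading of the stated goals, not the authors' actual algorithm: enumerate candidate node counts, estimate the run time with the bounds above, ask the RMS when each shape could start, and pick the shape with the smallest estimated turnaround. The earliest_start callback is a hypothetical RMS query invented for this sketch:

```python
def choose_allocation(job, total_nodes, slots_per_node, earliest_start):
    # Translate an MRJob into an HPCJobRequest by minimizing the
    # estimated turnaround (wait time + run time), reusing
    # job_completion_bounds from the previous sketch.
    best_turnaround, best_request = float("inf"), None
    for n_nodes in range(1, total_nodes + 1):
        slots = n_nodes * slots_per_node  # simplification: same slot pool for map and reduce
        _, t_up = job_completion_bounds(job.profile,
                                        job.n_map_tasks, job.n_reduce_tasks,
                                        slots, slots)
        walltime = t_up  # request the conservative upper bound
        start = earliest_start(n_nodes, walltime)  # when the RMS has a matching hole
        if start + walltime < best_turnaround:
            best_turnaround = start + walltime
            best_request = HPCJobRequest(n_nodes, walltime)
    return best_request
```

Note that a small allocation can win even when more nodes would run the job faster, because a small hole in a backfilling schedule may open much sooner; that is exactly the "exploit unused resources" goal from slide 8.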
  • 11. Evaluation
    • Simulated environment (using the SimGrid toolkit)
      – Cluster composed of 128 nodes with 2 cores each
      – RMS based on the Conservative Backfilling (CBF) algorithm
      – Stream of job submissions
    • HPC workload
      – Synthetic workload based on the model by Lublin et al. [1]
      – Real-world HPC traces from the Parallel Workloads Archive (SDSC SP2)
    • MR workload
      – Synthetic workload derived from Facebook workloads described by Zaharia et al. [2]
    [1] Lublin et al.: The workload on parallel supercomputers: Modeling the characteristics of rigid jobs (2003)
    [2] Zaharia et al.: Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling (2010)
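Slide 14 varies the mean inter-arrival time of both submission streams. The deck does not state which arrival distribution was used; a minimal way to generate such a stream, assuming exponential inter-arrivals (a common choice for this kind of simulation):

```python
import random

def submission_times(n_jobs, mean_interarrival_s, seed=42):
    # Build a stream of submission timestamps with exponentially
    # distributed inter-arrival times (an assumption; the slides give
    # only the mean inter-arrival time, not the distribution).
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival_s)
        times.append(t)
    return times
```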
  • 12. Turnaround Time and System Utilization
    • Workload:
      – HPC: a "peak hour" of Lublin's model
      – MR: an hour of Facebook-like job submissions
    [Charts: turnaround time and cluster utilization for the Naive and Adaptor approaches]
    • The adaptor obtained shorter turnaround times and better cluster utilization in all cases
      – MR-only: turnaround was reduced by ≈ 40%
      – HPC+MR: overall turnaround was reduced by ≈ 15%
      – HPC+MR: turnaround of MR jobs was reduced by ≈ 73%
  • 13. Influence of the Job Size
    • Shorter turnaround regardless of the job size
    • Better results for bins with smaller jobs
    [Chart: average turnaround time (minutes) per bin, Naive vs. Adaptor]
    Job sizes in the Facebook workload (based on Zaharia et al.):
    Bin | # Map Tasks | # Reduce Tasks | % Jobs at Facebook
    1   | 1           | 0              | 39%
    2   | 2           | 0              | 16%
    3   | 10          | 3              | 14%
    4   | 50          | 0              | 9%
    5   | 100         | 0              | 6%
    6   | 200         | 50             | 6%
    7   | 400         | 0              | 4%
    8   | 800         | 180            | 4%
    9   | 2400        | 0              | 3%
  • 14. Influence of System Load
    [Charts: average turnaround time (minutes) for the Adaptor and Naive algorithms, as a function of the mean HPC job inter-arrival time (5–30 seconds) and of the mean MR job inter-arrival time (1–30 seconds)]
  • 15. Real-world Workload
    • Workload:
      – HPC: a day-long trace from SDSC SP2
      – MR: 1000 Facebook-like MR jobs
    [Charts: Naive vs. Adaptor results (≈ 54%, ≈ 80%)]
    • The adaptor's algorithm performed better in all cases
  • 16. Conclusion
    • Although MR has gained attention from the HPC community, there is still the question of how to run MR jobs along with regular HPC jobs in an HPC cluster
    • MR Job Adaptor
      – Allows transparent MR job submission on HPC clusters
      – Minimizes the average turnaround time
      – Improves the overall utilization by exploiting unused resources in the cluster
  • 17. Thank you!