Adaptive Provisioning of Stream Processing Systems in the Cloud
Presentation Transcript

  • Adaptive Provisioning of Stream Processing Systems in the Cloud
    Javier Cerviño #1, Eva Kalyvianaki *2, Joaquín Salvachúa #3, Peter Pietzuch *4
    # Universidad Politécnica de Madrid, * Imperial College London
    1 jcervino@dit.upm.es, 2 ekalyv@doc.ic.ac.uk, 3 jsalvachua@dit.upm.es, 4 prp@doc.ic.ac.uk
    SMDB 2012
  • Data Stream Processing Systems (DSPS)
    • Real-time processing of continuous data
      – Financial trading, sensor networks, etc.
    • Data from sources arrives as streams
      – Time-ordered sequence of tuples
    • Characteristics
      – Tuple arrival rates are not uniform
    • Performance requirements
      – Low latency
      – Guaranteed throughput
    • Adaptive provisioning
      – Use resources on demand
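    For readers following along, here is a minimal Java sketch (not part of the original deck) of a time-ordered tuple as a DSPS would ingest it; the field names (symbol, price) are illustrative assumptions in the spirit of the stock-market examples used later.

    // Minimal sketch: one tuple of a time-ordered stream. Field names are assumptions.
    public final class Tuple {
        public final long timestampMillis;  // source timestamp; streams are ordered by this
        public final String symbol;         // e.g. a stock symbol
        public final double price;          // payload value

        public Tuple(long timestampMillis, String symbol, double price) {
            this.timestampMillis = timestampMillis;
            this.symbol = symbol;
            this.price = price;
        }
    }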
  • Cloud Computing
    Cloud offers elastic computing by providing resources on demand.
    – Characteristics
      • Scalability
      • Geographical distribution
      • Virtualization
      • Application Programming Interface (API)
    – Amazon EC2
      • Public cloud provider
      • Infrastructure as a Service
      • Images and virtual machines
  • Related work
    • Cloud stream processing
      [Kleiminger et al, SMDB’11]
    • Cloud network performance
      – Do cloud and Internet paths support streaming data into cloud DCs?
        [Barker et al, MMSys’07], [Wang et al, INFOCOM’10], [Jackson et al, CLOUDCOM’10]
    • Cloud computation performance
      – Do best-effort VMs support low-latency, low-jitter and high-throughput stream processing?
        [Barker et al, MMSys’07]
      – Is the computational power of Amazon EC2 VMs sufficient for standard stream processing tasks?
        [Dittrich et al, VLDB’10]
  • Contributions
    • Explore the suitability of cloud infrastructures for stream processing (case study on Amazon EC2)
      – Measure network and processing latencies, jitter and throughput
    • An adaptive algorithm to allocate cloud resources on demand
      – Resizes the number of VMs in a DSPS deployment
    • Algorithm evaluation
      – Deploying the algorithm as part of a DSPS on Amazon EC2
  • Outline
    1. Cloud Performance
       1. Network Measurements
       2. Processing Measurements
       3. Discussion
    2. Adaptive Cloud Stream Processing
       1. Architecture
       2. Algorithm
    3. Experimental Evaluation
       1. Description
       2. Results
    4. Future Work and Conclusions
  • Outline (recap): next section is 1. Cloud Performance
  • Cloud Performance – Network Measurements
    • Goal: explore network parameters that affect stream processing conditions
      – Jitter, latency and bandwidth
    • Experimental set-up
      – Stream engines
        • Mock engines without processing
        • 9 Amazon EC2 instances: 3 in US, 3 in EU and 3 in Asia
        • Large Amazon EC2 instances: 7.5 GB RAM and 4 ECU
      – Stream sources
        • 9 distributed PlanetLab nodes: 3 in US, 3 in EU and 3 in Asia
      – Dataset
        • Random data at three different data rates: 10 kbps, 100 kbps and 1 Mbps
    [Figure: stream sources on PlanetLab nodes in Europe, USA and Asia send to processing engines on cloud instances]
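    To make the set-up concrete, here is a small Java sketch (not the authors' tooling) of a source that pushes random data towards a mock engine at one of the fixed target bit rates, stamping each packet so the receiver can later derive latency and jitter; the host name, port and packet size are illustrative assumptions.

    import java.io.DataOutputStream;
    import java.net.Socket;
    import java.util.Random;

    public class FixedRateSource {
        public static void main(String[] args) throws Exception {
            String engineHost = "ec2-engine.example.com";   // assumed engine address
            int packetBytes = 1250;                          // ~10 kbit per packet
            long targetBps = 100_000;                        // 100 kbps, one of the three rates
            long intervalMs = (packetBytes * 8L * 1000) / targetBps;

            Random rnd = new Random();
            byte[] payload = new byte[packetBytes];
            try (Socket s = new Socket(engineHost, 9999);
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                while (true) {
                    rnd.nextBytes(payload);
                    out.writeLong(System.currentTimeMillis()); // send timestamp for delay/jitter
                    out.write(payload);
                    out.flush();
                    Thread.sleep(intervalMs);                  // pace packets to the target rate
                }
            }
        }
    }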
  • Cloud Performance – Network Measurements
    [Figure: jitter (ms) per PlanetLab node (1–9) at high, medium and low data rates]
    • Average jitter is less than 2.5 μs
    • Some outliers reach almost 4 seconds
    • Low jitter overall, with less than 3% of measurements being high outliers
  • Cloud Performance – Network Measurements
    [Figure: network-level round-trip time vs application-level round-trip time (ms) for nodes in America, Asia and Europe, against the ideal line]
    • Application-level delay includes processing time: t_received - t_sent
    • Network-level delay between the source and the engine: RTT
    • The cloud DC does not increase application-level delay
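    The two metrics on this slide can be computed on the receiving side roughly as in the Java sketch below (not part of the deck): application-level delay from the embedded send timestamp, and jitter as the variation of inter-arrival gaps; clock-skew handling is omitted for brevity and is an assumption of the sketch.

    public class DelayAndJitter {
        private long lastArrivalMs = -1;
        private long lastGapMs = -1;

        /** Call once per received packet, passing the sender's timestamp. */
        public void onPacket(long sentMs) {
            long now = System.currentTimeMillis();
            long appLevelDelayMs = now - sentMs;                   // application-level delay (assumes synced clocks)
            if (lastArrivalMs >= 0) {
                long gapMs = now - lastArrivalMs;                  // inter-arrival gap
                if (lastGapMs >= 0) {
                    long jitterMs = Math.abs(gapMs - lastGapMs);   // simple jitter estimate
                    System.out.printf("delay=%d ms, jitter=%d ms%n", appLevelDelayMs, jitterMs);
                }
                lastGapMs = gapMs;
            }
            lastArrivalMs = now;
        }
    }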
  • Cloud Performance – Processing Measurements
    • Goals
      – Explore performance variation with time-of-day (processing and latency)
      – Check whether cloud VMs can scale efficiently with a varying input rate
    • Experimental set-up
      – Dataset
        • Esper benchmark tool
        • Stream of shares and stock values for a given symbol at a fixed rate (30,000 tuples/sec)
      – Submitter
        • 10 extra-large Amazon EC2 VMs: 15 GB RAM, 8 ECU
      – Nodes
        • 10 small Amazon EC2 VMs: 1.7 GB RAM, 1 ECU
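    The submitter role can be illustrated with a short Java sketch (not the Esper benchmark itself) that emits stock tuples at the fixed target rate of 30,000 tuples/sec; it reuses the Tuple sketch from earlier, and the Consumer interface and the "ACME" symbol stand in for the real engine connection and workload.

    import java.util.Random;
    import java.util.function.Consumer;

    public class FixedRateSubmitter {
        public static void run(Consumer<Tuple> engine, int tuplesPerSec) throws InterruptedException {
            long intervalNanos = 1_000_000_000L / tuplesPerSec;   // e.g. ~33 μs for 30,000 tuples/sec
            Random rnd = new Random();
            long next = System.nanoTime();
            while (!Thread.currentThread().isInterrupted()) {
                engine.accept(new Tuple(System.currentTimeMillis(), "ACME", 100 + rnd.nextGaussian()));
                next += intervalNanos;
                long sleepNanos = next - System.nanoTime();
                if (sleepNanos > 0) {
                    Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
                }
            }
        }
    }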
  • Cloud Performance – Processing Measurements
    [Figure: latency (ms) and throughput (tuples/s) on two days, plotted against time of day (07:00–19:00, 24-hour format)]
    • Throughput remains relatively stable over the measurement period
    • Latency suffers more from unpredictable outliers
    • No obvious pattern correlating performance with time-of-day
  • Cloud Performance – Processing Measurements
    [Figure: throughput (tuples/s) vs input data rate (×10,000 tuples/s) for small and large VM instances]
    • Cloud VMs can be used to scale efficiently with an increasing input rate
    • The number of VMs required depends on their type, as expected
  • Outline (recap): next section is 2. Adaptive Cloud Stream Processing
  • Adaptive Cloud Stream Processing
    • An elastic stream processing system that scales the number of VMs to the input stream rates
    • Goals
      – Low latency with a given throughput
      – Keep VMs operating at their maximum processing capacity
    • The workload is partitioned and balanced across multiple VMs
    • Many VMs are available to scale up and down with workload demands
    • A collector gathers results from the engines and processes additional queries
    [Figure: stream sources feed engine VMs (sub-query 1), whose outputs are merged by a collector VM (sub-query 2)]
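    One way to partition and balance the workload across engine VMs (the deck does not spell out the scheme, so this is an assumption) is to hash each tuple's stock symbol, so every tuple for a symbol reaches the same engine and per-symbol sub-queries stay correct; a minimal Java sketch, reusing the earlier Tuple class:

    import java.util.List;
    import java.util.function.Consumer;

    public class Partitioner {
        private final List<Consumer<Tuple>> engines;  // one consumer per engine VM

        public Partitioner(List<Consumer<Tuple>> engines) {
            this.engines = engines;
        }

        public void route(Tuple t) {
            int idx = Math.floorMod(t.symbol.hashCode(), engines.size());
            engines.get(idx).accept(t);  // the collector later merges the engines' outputs
        }
    }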
  • Adaptive Cloud Stream Processing – Algorithm I
    [Diagram: the tuple submitter's input rate and the per-VM Esper processing rates are combined to compute the extra rate and the average rate]
    • Gathering and calculation stage
      – Gathers processing rates from the N VMs
      – Obtains
        • The total extra processing rate (extra rate)
        • The average processing rate per VM (average rate)
  • Adaptive Cloud Stream Processing – Algorithm II
    [Diagram: if the extra rate is greater than 0, scale up using the average rate; otherwise scale down using the input rate and the stored maximum average rate; return the new VM count N’]
    • Decision stage
      – Calculates the new number of machines (N’)
      – Scale up
        • Stores the current average rate as the maximum average rate
      – Scale down
        • Uses the last stored maximum average rate
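    A small worked example may help tie the two stages together (the numbers are invented for illustration; the full listing appears as Algorithm 1 in the backup slides). With an input rate of 120,000 tuples/s spread over N = 3 VMs, each VM is expected to handle 40,000 tuples/s. If every VM is in fact processing only 30,000 tuples/s, the total extra rate is 3 × 10,000 = 30,000 > 0 and the average rate is 30,000 tuples/s, so the algorithm scales up to N’ = 3 + ceil(30,000 / 30,000) = 4 VMs and stores 30,000 as the maximum per-VM rate. If the input rate later falls and the extra rate becomes negative, the algorithm scales down to N’ = ceil(input rate / 30,000); for an input of 60,000 tuples/s that is 2 VMs.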
  • Outline (recap): next section is 3. Experimental Evaluation
  • Experimental Evaluation – Description
    • Goals
      – Adaptability of the algorithm to varying input rates
      – Implications of adaptation on stream processing performance
    • Experimental set-up
      – Integrated with the Esper stream processing engine
      – Framework to control VMs and to collect performance metrics
        • Throughput, processing latency and network latency
        • Collection of shell scripts
      – Deployed on Amazon EC2
    [Figure: on Amazon EC2, a controller manages Esper tuple submitters and Esper engine VMs; stream sources emit random values for different stock symbols, sub-query 1 computes the maximum value of each stock symbol per second (the same query on every engine), and sub-query 2 collects and merges all results]
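    As a rough illustration of what sub-query 1 could look like, here is a hedged Java sketch using the classic (pre-8.x) Esper API; the deck only describes the query ("maximum value of each stock symbol per second"), so the event type, field names and exact EPL below are assumptions.

    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;
    import com.espertech.esper.client.EPStatement;

    public class MaxPerSymbolQuery {
        public static void main(String[] args) {
            EPServiceProvider esper = EPServiceProviderManager.getDefaultProvider();
            esper.getEPAdministrator().getConfiguration()
                 .addEventType("StockTick", StockTick.class);

            // Maximum value of each stock symbol per second (1-second batch window).
            EPStatement stmt = esper.getEPAdministrator().createEPL(
                "select symbol, max(price) as maxPrice " +
                "from StockTick.win:time_batch(1 sec) group by symbol");

            stmt.addListener((newEvents, oldEvents) -> {
                if (newEvents != null) {
                    System.out.println(newEvents[0].get("symbol") + " -> " + newEvents[0].get("maxPrice"));
                }
            });
        }

        // Assumed event type: a JavaBean with symbol and price properties.
        public static class StockTick {
            private final String symbol;
            private final double price;
            public StockTick(String symbol, double price) { this.symbol = symbol; this.price = price; }
            public String getSymbol() { return symbol; }
            public double getPrice() { return price; }
        }
    }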
  • Experimental Evaluation – Results
    [Figures: for small and for large instances, input rate (tuples/sec), number of VMs and dropped tuples over time (0–700 s)]
    • Processing latency remains low: 7 to 28 μs
    • The algorithm scales the number of VMs up and down as required by the input rate
    • There is a significant reaction delay before VMs are scaled up and down
    • VMs are pre-allocated
  • Outline (recap): next section is 4. Future Work and Conclusions
  • Future Work
    • Investigate ways to reduce the reaction delay to performance violations
    • Predict the future behaviour of input data rates
    • Investigate cost models for the allocation of small and large VM instances
    • Evaluate our system in other cloud environments
    • Extensive evaluation over longer periods of time and different VM types
  • Conclusions
    • An adaptive approach to provisioning stream processing systems in the cloud
    • Public clouds are suitable for stream processing
    • Network latency is the dominating factor in public clouds
    • Our approach can adaptively scale the number of VMs to the input rates
    • Processing latency and data loss remain low
    Javier Cerviño, email: jcervino@dit.upm.es
    Thank you! Questions?
  • Adaptive Cloud Stream Processing – Algorithm (backup slide)

    Algorithm 1: Adaptive provisioning of a cloud-based DSPS
    Require: totalInRate, N, maxRatePerVM
    Ensure: N’ such that projRatePerVM × N’ = totalInRate
      expRatePerVM = floor(totalInRate / N)
      totalExtraRateForVMs = 0; totalProcRate = 0
      for all deployed VMs do
          totalExtraRateForVMs += expRatePerVM - getRate(VM)
          totalProcRate += getRate(VM)
      end for
      avgRatePerVM = floor(totalProcRate / N)
      if totalExtraRateForVMs > 0 then
          N’ = N + ceil(totalExtraRateForVMs / avgRatePerVM)
          maxRatePerVM = avgRatePerVM
      else if totalExtraRateForVMs < 0 then
          N’ = ceil(totalInRate / maxRatePerVM)
      end if
      projRatePerVM = totalInRate / N’
      return N’
  • Adaptive Cloud Stream Processing – Algorithm (backup slide, code form)

    // Cleaned-up version of the slide's pseudocode: names made consistent, the missing
    // processing-rate accumulation and the return value added. deployedVMs, getRate()
    // and the stored maxRatePerVM belong to the controller's state.
    int getExpectedVMs(double totalInRate, int currentVMs) {
        // Input rate calculations
        double expectedRatePerVM = totalInRate / currentVMs;
        double totalExtraRate = 0, totalProcRate = 0;
        for (VM vm : deployedVMs) {
            double vmRate = getRate(vm);
            totalExtraRate += expectedRatePerVM - vmRate;
            totalProcRate += vmRate;
        }
        double avgRatePerVM = totalProcRate / currentVMs;

        int expectedVMs = currentVMs;
        if (totalExtraRate > 0) {
            // Increasing input rate: scale up and remember the per-VM capacity seen
            expectedVMs = currentVMs + (int) Math.ceil(totalExtraRate / avgRatePerVM);
            maxRatePerVM = avgRatePerVM;
        } else if (totalExtraRate < 0) {
            // Decreasing input rate: scale down using the stored maximum per-VM rate
            expectedVMs = (int) Math.ceil(totalInRate / maxRatePerVM);
        }
        return expectedVMs;
    }