XRM: An Event-based Resource Management Framework for XCP

Pradeep Padala
in collaboration with Ken Igarashi, Akshay I. Mehta, and Ulas C. Kozat

Typical scenario in shared infrastructures

• Web search and data analytics applications run on the same shared infrastructure (cloud): the data center!

Xen Summit AMD 2010

Application requirements

• Web search: fast searches, low response time
• Data analytics: analyze large data, high throughput
• QoS differentiation: 3:1

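Sketch (assumption): one way to read the 3:1 QoS differentiation is as a proportional-share weight ratio, like the per-domain weight of Xen's credit scheduler, which divides contended CPU time in proportion to weight (default weight 256). The function name, the 4-core capacity, and the choice of web search as the higher-weighted application are illustrative assumptions, not taken from the slides.

```python
from __future__ import annotations


def proportional_shares(weights: dict[str, int], capacity: float) -> dict[str, float]:
    """Divide `capacity` (e.g. cores) among applications in proportion to weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {app: capacity * w / total for app, w in weights.items()}


# Hypothetical example: a 3:1 ratio expressed as credit-scheduler-style weights
# (768 = 3 x the default weight of 256) on a fully contended 4-core host.
shares = proportional_shares({"web_search": 768, "data_analytics": 256}, capacity=4.0)
print(shares)  # {'web_search': 3.0, 'data_analytics': 1.0}
```

Weights only matter under contention; an idle application's unused share is available to the other one.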
How to host these applications?

Physical partitioning
  Node I: app1 (web)    Node II: app1 (db)    Node III: app2    Node IV: app3
  × Wasteful
  × Difficult to manage

Virtualized data center
  Node I (virtualization layer): app1 (web), app2
  Node II (virtualization layer): app1 (db), app3
  ✓ Improved utilization
  ✓ Reduced costs
  ✓ High flexibility (elastic!)

Virtualized shared data center = a new paradigm!
Challenge: How to allocate resources to meet goals?

Challenge #1: Developers don't want to manage resources

Managing resources by hand:

    ProvisionVMs()
    RunApplications()
    while (true) {
        MonitorApplications()
        if (AppPerformance != GOAL) {
            FindReason()
            if (ScaleUp) {
                FindAvailableResources()
                MigrateVM()
            }
            if (ScaleOut) {
                ProvisionVMs()
                RunApplications()
            }
        }
        if (Consolidation == true) {
            FindSuitableVMs()
            Consolidate()
        }
    }

Open questions:
• Where to provision VMs?
• How to determine what to do: migrate or clone? Scale up or scale out?
• How to consolidate VMs? Cloud providers want to consolidate multiple services too!

Holy Grail: DeployService(); AutoScale();

Challenge #2: Resource management spans multiple layers

  Services
  PaaS
  IaaS
  Hardware
(resource management happens at every layer)

How to pass information between the layers so that they don't make conflicting decisions?

Challenge #3: Complexity of scaling primitives

• Slicing
  ✓ Little overhead
  ✓ Efficient
  × Limited to a single machine
• Live migration
  ✓ Handles overload
  ✓ Small downtime
  × Overhead
• Cloning
  ✓ Stateful clone
  × Overhead
  × Side-effects
• Live replication
  ✓ Maintains connections
  × Overhead

How to combine primitives to achieve goals?

What is a perfect Resource Manager?

An RM that can automatically re-arrange resources among multiple applications/VMs on multiple physical machines, and that provides optimal resource utilization and application performance.

We are building the (ultimate) RM system:
• Automation
• Resource allocation
• High utilization
• High application performance

XRM = first incarnation on XCP!

Outline

• Motivation
• Challenges in RM
• XRM feedback-control-based design
• XRM implementation and preliminary results
• Summary and feedback

How to achieve the automation?

“Almost any system that is considered automatic has some element of feedback control”
    -Hellerstein et al.

XRM = a feedback control system

RM in multiple layers

• Services → PaaS RM: high-level service request
• PaaS RM: does application modeling and may request changes
• PaaS RM → IaaS RM: slice requests and slice changes
• IaaS RM: automated control loop; knows only about VMs and hardware resources
• Hardware

XRM = IaaS RM

XRM's feedback control loop

XCP → Monitor → Model → Control → Action → XCP
• Monitor: collects stats from XCP (e.g., network stats)
• Model: can model applications, VMs, and underlying resources
• Control: driven by performance goals and control parameters
• Action: change resource shares, migrate VMs, power off machines

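A minimal Python sketch of one pass through the Monitor → Model → Control → Action loop for a single VM. It assumes a deliberately simple model (response time inversely proportional to CPU share) and uses hypothetical function names; it is not XRM's code, and a real Action step would go through XCP/XAPI rather than a field assignment.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_share: float               # fraction of the host CPU currently allocated (0..1)
    response_time_ms: float = 0.0  # latest measured response time


def monitor(vm: VM, measured_ms: float) -> None:
    """Monitor: record the latest measurement (stand-in for stats from XCP)."""
    vm.response_time_ms = measured_ms


def model(vm: VM) -> float:
    """Model: estimate demand, ASSUMING response_time * cpu_share stays constant."""
    return vm.response_time_ms * vm.cpu_share


def control(vm: VM, demand: float, goal_ms: float, alpha: float = 0.5,
            lo: float = 0.05, hi: float = 1.0) -> float:
    """Control: the share that meets the goal under the model, clamped to [lo, hi],
    approached by a damped step (alpha) to avoid oscillation."""
    target = min(hi, max(lo, demand / goal_ms))
    return vm.cpu_share + alpha * (target - vm.cpu_share)


def act(vm: VM, new_share: float) -> None:
    """Action: apply the new share (a real RM would issue an XCP/XAPI call here)."""
    vm.cpu_share = new_share


def loop_step(vm: VM, measured_ms: float, goal_ms: float) -> float:
    """One pass of Monitor -> Model -> Control -> Action."""
    monitor(vm, measured_ms)
    new_share = control(vm, model(vm), goal_ms)
    act(vm, new_share)
    return new_share


web = VM("web", cpu_share=0.5)
print(loop_step(web, measured_ms=200.0, goal_ms=100.0))  # 0.75: too slow, share grows
```

The damping factor trades reaction speed for stability; with alpha = 1 the controller would jump straight to the modeled target and overshoot whenever the model is wrong.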
Current incarnation

• Stats monitoring module: receives out-of-band stat updates from XCP nodes and stores them in an RRD database
• Stats analysis module: 1. thresholds, 2. rules; produces filtered stats and stats analysis data
• Core algorithm module, backed by an algorithm bank: consumes the filtered stats and analysis data, then takes action
• Wrapper: turns actions into low-level commands/XAPI commands; targets are the XCP master node and OpenFlow

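A hypothetical sketch of how a stats analysis module with thresholds and rules could turn raw samples into events: a moving-average filter, a per-metric threshold, and a rule that fires once after N consecutive violations. The class, parameter, and event names are illustrative, not XRM's.

```python
from __future__ import annotations

from collections import deque
from statistics import mean


class StatsAnalyzer:
    """Hypothetical stats-analysis module: filter, threshold, then rule."""

    def __init__(self, thresholds: dict[str, float], window: int = 3,
                 consecutive: int = 2) -> None:
        self.thresholds = thresholds      # metric name -> upper limit
        self.window = window              # samples in the moving average
        self.consecutive = consecutive    # violations in a row before an event fires
        self._samples: dict[tuple[str, str], deque] = {}
        self._violations: dict[tuple[str, str], int] = {}

    def update(self, host: str, metric: str, value: float) -> list[str]:
        """Feed one raw sample; return the events (if any) it triggers."""
        key = (host, metric)
        buf = self._samples.setdefault(key, deque(maxlen=self.window))
        buf.append(value)
        filtered = mean(buf)              # filtered stat: moving average
        limit = self.thresholds.get(metric)
        if limit is None:
            return []
        # Threshold: count consecutive filtered samples above the limit.
        if filtered > limit:
            self._violations[key] = self._violations.get(key, 0) + 1
        else:
            self._violations[key] = 0
        # Rule: fire exactly once, when the streak reaches `consecutive`.
        if self._violations[key] == self.consecutive:
            return [f"high_{metric}:{host}"]
        return []


analyzer = StatsAnalyzer({"cpu": 0.8})
for sample in (0.9, 0.9, 0.9):
    print(analyzer.update("host1", "cpu", sample))  # [], then ['high_cpu:host1'], then []
```

Requiring a streak before firing keeps a single noisy sample from triggering migrations or power changes.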
XRM is an event-based framework

• Many algorithms can be developed and plugged in
• The algorithms register for specific events
  – High CPU utilization
  – Packet drops
  – PowerOff
  – PowerOn
  – …
• Different algorithms may take different actions

A common abstraction for ALL algorithms

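A minimal sketch of the registration idea, assuming a simple in-process dispatcher. `EventBus`, the event names, and the example algorithms are hypothetical; the point is that one event can reach several algorithms, each proposing its own actions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# A handler receives the event payload and returns the actions it proposes.
Handler = Callable[[dict], list]


class EventBus:
    """Hypothetical dispatcher: algorithms subscribe to event types."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[tuple[str, Handler]]] = defaultdict(list)

    def register(self, event_type: str, algorithm: str, handler: Handler) -> None:
        self._subscribers[event_type].append((algorithm, handler))

    def publish(self, event_type: str, payload: dict) -> dict[str, list]:
        """Deliver an event to every subscribed algorithm; collect proposed actions."""
        return {name: handler(payload)
                for name, handler in self._subscribers.get(event_type, [])}


bus = EventBus()
# Two illustrative algorithms react differently to the same event.
bus.register("HighCPU", "migration", lambda ev: [f"migrate a VM off {ev['host']}"])
bus.register("HighCPU", "autocontrol", lambda ev: [f"raise CPU shares on {ev['host']}"])
bus.register("PowerOff", "bin_packing", lambda ev: [f"re-place VMs from {ev['host']}"])

print(bus.publish("HighCPU", {"host": "host1"}))
```

Deciding which of several proposed actions to execute is left out here; that arbitration is a separate policy choice.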
What algorithms can you implement?

• AutoControl: automated control of multiple virtualized resources [PadalaEurosys09]
• Models the application and sets VM shares based on application goals

[Diagram: per-application App Controllers, driven by goals, and per-node Node Controllers that set resource shares]

[PadalaEurosys09] Pradeep Padala, Xiaoyun Zhu, Mustafa Uysal, et al. Automated Control of Multiple Virtualized Resources. In Proceedings of EuroSys 2009.

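A sketch of the two-level structure in the diagram, under loud assumptions: each app controller requests the share that would meet its goal if response time scaled inversely with CPU share, and the node controller scales all requests down proportionally when they exceed capacity. This mirrors only the structure; it is not AutoControl's modeling or arbitration algorithm, and all names are hypothetical.

```python
from __future__ import annotations


def app_controller(current_share: float, measured_ms: float, goal_ms: float) -> float:
    """Per-application controller (hypothetical): request the CPU share expected to
    meet the goal, ASSUMING response time scales inversely with the share."""
    return current_share * measured_ms / goal_ms


def node_controller(requests: dict[str, float], capacity: float = 1.0) -> dict[str, float]:
    """Per-node controller (hypothetical): grant all requests if they fit;
    otherwise scale every request by the same factor so they sum to capacity."""
    total = sum(requests.values())
    if total <= capacity:
        return dict(requests)
    scale = capacity / total
    return {vm: share * scale for vm, share in requests.items()}


# Two VMs on one node, both currently at half the CPU.
requests = {
    "web": app_controller(0.5, measured_ms=200.0, goal_ms=100.0),    # too slow: asks for 1.0
    "batch": app_controller(0.5, measured_ms=100.0, goal_ms=100.0),  # on target: asks for 0.5
}
print(node_controller(requests))  # total 1.5 > 1.0, so both are scaled by 2/3
```

Proportional scaling treats all applications equally; a priority-aware node controller could instead honor the QoS differentiation when capacity runs out.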
Outline

• Motivation
• Challenges in RM
• XRM feedback-control-based design
• XRM implementation and preliminary results
• Summary and feedback

XRM features

• Interface to upper layers
• Auto-* features
• External control
• Pluggable algorithms
• Extensibility

XRM implementation

• Implemented on XCP 0.1.1
• Written in Python
• Pluggable algorithms have to be written in Python
• Currently implements four algorithms
  – Bin packing
  – Bin packing + live migration
  – Random host
  – Round-robin
• We have also implemented a simulator (runs 1 million VMs on 100,000 nodes!)
  – Can capture data during a “real” run
  – Runs multiple algorithms on exactly the same trace

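A hypothetical sketch of three of the placement policies as pluggable Python functions, replayed on one seeded random trace so every policy sees exactly the same input, in the spirit of the simulator. Demands are in cores; the defaults of 5 hosts with 4 cores each are an assumption based on the evaluation setup (5 hosts, 4 cores). Bin packing is shown as first-fit, the live-migration variant is omitted, and all names are illustrative rather than XRM's.

```python
import itertools
import random

# A placement policy picks a host index for a VM, given the VM's CPU demand (cores),
# the current per-host loads, and the per-host capacity. All three are sketches.


def bin_packing(demand, loads, capacity):
    """First-fit bin packing: the lowest-numbered host with enough room."""
    for i, load in enumerate(loads):
        if load + demand <= capacity:
            return i
    raise RuntimeError("no host has enough capacity")


def make_round_robin():
    """Round-robin: cycle through hosts, skipping any that are full."""
    counter = itertools.count()

    def round_robin(demand, loads, capacity):
        for _ in range(len(loads)):
            i = next(counter) % len(loads)
            if loads[i] + demand <= capacity:
                return i
        raise RuntimeError("no host has enough capacity")
    return round_robin


def make_random_host(seed):
    """Random host: pick uniformly among hosts with enough room."""
    rng = random.Random(seed)

    def random_host(demand, loads, capacity):
        fits = [i for i, load in enumerate(loads) if load + demand <= capacity]
        if not fits:
            raise RuntimeError("no host has enough capacity")
        return rng.choice(fits)
    return random_host


def replay(trace, policy, n_hosts=5, capacity=4.0):
    """Place every VM of a trace with `policy`; return the number of hosts in use."""
    loads = [0.0] * n_hosts
    for demand in trace:
        loads[policy(demand, loads, capacity)] += demand
    return sum(1 for load in loads if load > 0)


# Generate one trace of 8 VM requests (0.25 to 1 core each) and replay it through
# every policy, so all policies see exactly the same input.
rng = random.Random(42)
trace = [rng.uniform(0.25, 1.0) for _ in range(8)]
for name, policy in [("round-robin", make_round_robin()),
                     ("random host", make_random_host(seed=7)),
                     ("bin packing", bin_packing)]:
    print(name, replay(trace, policy))
```

With per-VM demands of at most one core, round-robin spreads these eight VMs over all five hosts, while first-fit opens a new host only after every open host holds more than 3 cores, so it never needs more than three hosts for this trace.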
XRM evaluation

• 5 hosts, 4 cores
• Random utilizations
• Random slice requests
• Three algorithms
  – Bin packing
  – Round-robin
  – Random host
• Slicing algorithms were evaluated in previous work: AutoControl [PadalaEurosys09]

Comparing three algorithms

[Figure: host utilization (0 to 1000) per time interval (1 to 9), one panel per algorithm]
• Round-robin: uses all five hosts, wasting energy
• Random host: uses up to five hosts, wasting energy
• Bin packing: uses at most three hosts!

AutoControl experiments

• Experiments on Emulab
• 20 server nodes, 80 VMs
• 20 client nodes
• Mix of applications
• Load increased on ½ of the VMs, chosen randomly

[Diagram: VM1 to VM4 in underloaded and overloaded states, annotated “No control needed” and “AutoControl can readjust”]

SLO (performance goal) violations

[Figure: SLO status of each application over time, Default Xen vs. AutoControl; color scale from Bad through Target to Good]

Summary

• Resource management in cloud infrastructures is complex
  – Multiple layers of RM
  – Complex primitives
  – Complex decisions
• We are developing an RM based on feedback control theory
• XRM is event-based, pluggable, and extensible
• Complex algorithms like AutoControl can be developed on XRM
• Research on advanced algorithms is in progress

Summary of our experiences with XCP 0.1.1

• We are trying to build a research cloud based on XCP
• Besides XRM, we are adding fault tolerance and a web-based GUI to XCP
• Having to install a special distribution is difficult
  – Why not ship XCP as a set of packages for RHEL or other distributions?
  – You are breaking toolstacks developed at various companies
• The XCP docs are the same as the Citrix XenServer docs
  – Some of the documented features don't work or are not supported
  – The API needs better documentation
• The XCP GUI needs to improve
  – Bugs in OpenXenCenter

We want feedback from the Xen community

• Comments on the XRM architecture
• Should we incorporate XRM into XCP?
  – XCP's toolstack is written in OCaml, while XRM is written in Python
• Are you interested in an open-source XRM?
  – Does the community want to be involved?
• Questions?

ppadala@docomolabs-usa.com
