Failover and Global Server Load Balancing for Better Network Availability

Speaker Jeremy Hitchcock of Dynamic Network Services presents how to obtain better uptime and availability through network techniques like failover, global server load balancing, and CDN balancing. Presented at Interop NYC 09.

Transcript

  • 1. Failover and Global Server Load Balancing for Better Network Availability. Jeremy Hitchcock, CEO, Dynamic Network Services
  • 2. Overview • Problem space: Keeping services up • About Failover and GSLB • Case Study: Roll your own CDN in...quick • Case Study: Speed and Stability • Case Study: DR You Can Sleep On • General lessons for network availability
  • 3. You are probably… • Software service provider • Completely online • Uptime and revenue directly related • Audience is international (non-geographical) • So is everyone (a lot more of us)!
  • 4. Mean Time Between Failures (MTBF) (Local)
  • 5. Fiber Cuts (Network/global)
  • 6. Failures Are a Way of Life • Affects bottom line • Gets people paged • Brands lose value
  • 7. A Better Way? • Current tools: in-house scripts, appliances, CDN networks • Either high opex or capex • New options in infrastructure • Example: – 5-10 person [bootstrapped] companies rolling self-healing, auto-provisioning networks
  • 8. Optimizing the Wrong Part • Hardware redundancy is expensive • Single points of failure are bad • Infrastructure is not a core function • Things break, so automate everything • Easier (cheaper) than you think
  • 9. Realizations • Things break; route around outages • Infrastructure providers aplenty today • Users more sensitive to outages • Internet users are around the world – Speed of light is still c – RTT of 100 ms with 50 objects adds up • Traffic management is critical
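
A rough worked example of that last sub-point, assuming the 50 objects are fetched one at a time over a 100 ms round trip (sequential fetches, one round trip per object, connection setup and transfer time ignored; all numbers are illustrative):

    # Back-of-envelope latency math under the assumptions above.
    rtt = 0.100        # seconds per round trip to a distant server
    objects = 50       # objects on the page, fetched sequentially
    print(f"Round trips alone cost {rtt * objects:.1f} s")   # 5.0 s before any content renders

Serving users from a nearby POP shrinks the RTT term, which is the whole argument for regionalizing.
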
  • 10. Different Architectures, Different Results
    Old: Use hardware redundancy, local → New: Use software redundancy
    Old: Super-site build out → New: Regionalize, all over-provisioned
    Old: Page on failure, fix based on page → New: Email report in morning
    Old: Planned deployments → New: Automatic load handling
    Old: Single master datacenter → New: Many POPs, all closer to users
    Old: DR is a passive, manual failover → New: DR and failover blended together
  • 11. New Tools (new to some) • Automatic failover • Global server load balancing • CDN balancing/managing • Opex relative to actual usage • Avoid capex step functions
  • 12. Failover • Two active components, traffic switch • Implies external monitoring • Hide outages (diagram: standard operation vs. on failover)
  • 13. Failover Use Cases • Two servers for www.domain.com – On failure, redirect from one to the other – Works via DNS – Redirect to a static page • Requirements – External monitoring point – External DNS – Low DNS caching TTL values
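
A minimal sketch of that failover loop, assuming a probe running outside your own facility and a hypothetical update_dns_record() helper standing in for your managed DNS provider's API (the helper name, health-check path, addresses, and poll interval are all illustrative, not a specific product's interface):

    import time
    import urllib.request

    PRIMARY, BACKUP = "203.0.113.10", "198.51.100.20"   # placeholder addresses

    def healthy(ip, timeout=5):
        """External health check: fetch a known page directly from one server."""
        try:
            urllib.request.urlopen(f"http://{ip}/health", timeout=timeout).close()
            return True
        except OSError:
            return False

    def update_dns_record(name, ip):
        """Hypothetical helper: push a new low-TTL A record via your DNS provider's API."""
        print(f"point {name} -> {ip}")

    current = PRIMARY
    while True:
        target = PRIMARY if healthy(PRIMARY) else BACKUP
        if target != current:                 # act only on a state change
            update_dns_record("www.domain.com", target)
            current = target
        time.sleep(30)                        # real services probe from several vantage points

The low-TTL requirement is what makes the record swap take effect quickly; the Authorize.net interlude later shows what happens without it.
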
  • 14. Global Server Load Balancing (GSLB) • More than two active components • Traffic management – Targeting (geo, network) – Weighting (percent) • Failover plus RTT optimization • Hostname-to-A-record mapping
  • 15. Global Server Load Balancing Use Cases • Regionalize eyeballs/end-users • Avoid Internet outages/subpar speeds • Weight based on load, percentages • Requirements: – Same as failover – A bit of math/algorithms to balance traffic – Many-to-many mappings
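
A toy sketch of the "bit of math" behind a GSLB answer: filter the client's regional pool down to healthy servers, then pick by weight (the pools, regions, and weights are invented for illustration, and the sketch assumes at least one healthy server exists somewhere):

    import random

    # Hypothetical per-region pools: list of (A record, weight).
    POOLS = {
        "us": [("192.0.2.10", 70), ("192.0.2.11", 30)],
        "eu": [("198.51.100.5", 100)],
    }

    def gslb_answer(region, healthy):
        """Return one A record for a client in `region`, respecting health and weights."""
        pool = [(ip, w) for ip, w in POOLS.get(region, POOLS["us"]) if ip in healthy]
        if not pool:   # whole regional pool is down: fail over across regions
            pool = [(ip, w) for p in POOLS.values() for ip, w in p if ip in healthy]
        ips, weights = zip(*pool)
        return random.choices(ips, weights=weights)[0]

    print(gslb_answer("eu", healthy={"192.0.2.10", "198.51.100.5"}))

This is the many-to-many mapping in the requirements: several hostnames can share pools, and one hostname can map to many records.
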
  • 16. CDN Management • Two complete systems • Balance between CDNs – Bandwidth commits – Regional advantages • Works on CNAMEs
  • 17. CDN Manager • Try out a mix of networks – CDNs, infrastructure providers • Better manage traffic – Cost/performance reasons • Requirements – Same as GSLB but with DNS alias CNAMEs
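
The same selection idea applied one layer up: answer with a CNAME instead of an A record. A sketch whose provider names and regional policy are invented (they loosely mirror the Wikia split in the case study below):

    # Hypothetical policy: client region -> CDN hostname to alias to.
    CDN_POLICY = {
        "north-america": "customer.cdn-a.example.net.",    # commercial CDN here
        "europe": "assets.customer-own.example.com.",      # self-run delivery there
    }

    def cdn_cname(region):
        """Pick the CNAME target by region, defaulting to the first CDN."""
        return CDN_POLICY.get(region, CDN_POLICY["north-america"])

    print("www.domain.com. 60 IN CNAME", cdn_cname("europe"))

Because the answer is an alias rather than an address, swapping providers or shifting a percentage of traffic between them never touches the CDNs themselves.
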
  • 18. Traffic Cop: DNS • The Internet doesn't care about domain.com • twitter.com → 128.121.146.228 • Lots of tricks you can do here
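
Why DNS gets to be the traffic cop: every client resolves the name before connecting, so whatever the authoritative servers answer is where the traffic goes. A two-line illustration:

    import socket
    # Clients connect to whatever address the name resolves to at that moment.
    print(socket.gethostbyname("twitter.com"))   # 128.121.146.228 in the slide; answers change over time
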
  • 19. Lenses and Options • Evaluation Criteria – Soft/hard costs, capital/operating costs • Outcome based – Determine your metrics, test those • Potential Outcomes – Roll it in house – CDN Network – Hardware appliances – SaaS-based
  • 20. Which one is better? • Roll it in house – Mid-high capex, higher opex than you think – Lots of soft costs, though application specific • CDN Network – Little capex, high opex – Some have more knobs than others • Hardware appliances – High capex, low opex – Need to make a full investment in the architecture • SaaS-based – Little capex, low-mid opex – Let others worry about this for you
  • 21. Case Study 1: Roll your own CDN in...quick. Wikia and regionalizing CDNs for better delivery
  • 22. CDN Choice and Transparency • Lots of CDNs – Two great public ones – 30 (more?) private providers – Telco/ISP options • Currently give customer hostname – (customer.cdn.com) • Only test with live traffic
  • 23. CDN Manager: Enabling Testing • Segment traffic and test • Try 2 or 10 CDNs • Low-risk method to collect data • Data collection has to be from end points – Your office computer is not the Internet • Can better rate cost/performance
  • 24. CDN Manager: Wikia • Wikia runs several niche wikis (audience) • Optimize traffic delivery for those niches • Wanted to determine the best CDN based on actual data
  • 25. CDN Manager: Wikia • In America, use a CDN • In Europe, use their own • Why? Who knows, but it’s the best for their traffic
  • 26. Discussion • Not all CDNs are the same • Multiple relationships to manage • Cost control/performance of CDNs • Audience and economies drive decisions
  • 27. Case Study 2: Speed and Stability. Twitter and keeping up
  • 28. Speed and Stability • All Internet sites have DNS – Ranges from good to bad to ugly • Online services must be fast and accurate – Latency and uptime are what matter • Things fail all the time; send users to what works
  • 29. Speed and Stability: Twitter • Spiky and growing traffic (like a lot) • Things change too fast to keep up • Load balance a lot • Easier to scale core competencies • One less thing to worry about
  • 30. Speed and Stability: Twitter • DNS is part of the system that makes the site work • Desire not to be an expert in it • Huge, widespread audience • Online-only service
  • 31. Discussion • When infrastructure changes rapidly, external monitoring is good • A failover message is better than timeouts • Keep traffic regionalized through targeting • Outsource non-core competencies • Latency affects page views and ad revenue
  • 32. Case Study 3: Disaster Recovery You Can Sleep With. 37signals and doing what needs to get done
  • 33. Disaster Recovery Implementation • Requirements: – One good facility (A) – One backup facility (B) – Ability to recognize facility A is out – Ability to direct traffic from A to B
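
The subtle requirement is recognizing that facility A is out: a single prober can be fooled by its own connectivity trouble. A sketch of the multi-vantage-point idea with a simple majority rule (the monitor names and threshold are illustrative):

    # Hypothetical external monitors; none of them may live inside facility A.
    MONITORS = ["probe-us-east", "probe-eu-west", "probe-ap-south"]

    def facility_down(results):
        """Declare the facility out only if a majority of vantage points saw it fail."""
        failures = sum(1 for m in MONITORS if not results[m])
        return failures > len(MONITORS) / 2

    # Two of three probes failing -> switch traffic from A to B.
    print(facility_down({"probe-us-east": False, "probe-eu-west": False, "probe-ap-south": True}))
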
  • 34. Authorize.net Interlude • DR implementation timeline – Late July: move to new DR facility and plan – July 2: fire at Fisher Plaza (unplanned) – July 3: … • Only missing a traffic engineering switch • TTLs (DNS record caching) make a big difference – Still a problem today – secure.authorize.net. 86400 IN A 64.94.118.32 • Full discussion: http://bit.ly/23mayf
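
Why that 86400-second TTL matters: resolvers that cached the record keep handing out the old address for up to a full TTL after the switch, no matter how fast the failover itself was. A rough illustration, with caching as the only delay modeled:

    # Worst-case staleness window per TTL, ignoring detection and update time.
    for ttl in (86400, 3600, 60):
        print(f"TTL {ttl:>5}s -> stale answers served for up to {ttl // 60} minutes after a switch")
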
  • 35. DR: 37signals • Cloud-based SaaS tools; they have to be up • External DNS is important for controlling traffic • What if facility A is down and DNS is only at A? • External DNS makes failover/DR possible
  • 36. Discussion • Ensuring full replication is usually easy • Traffic management is usually the problem • People confuse cold assets, warm spares, and hot active • People wait until they have an outage to implement DR
  • 37. Overall Notes • Networked services need to be rock solid • Failover, GSLB, and CDNM are within reach • Wikia, Twitter, and 37signals use external traffic management for their applications • Audience matters, and so do testing and benchmarking
  • 38. Dyntini • twitter.com/dyntini
  • 39. Copy of the presentation? Leave a business card in back (or talk to me afterwards) and I’ll send it to you
  • 40. Contact Us • Dynamic Network Services, Inc. • 1230 Elm St., Fifth Floor, Manchester, NH 03101 • +1 888.840.3258 • jeremy@dyn.com • dyn.com • Join us for drinks: dyntini.com • Follow us on Twitter: @DynInc • Uptime Is the Bottom Line.