Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry

Dyn's Cory von Wallenstein and iovation's Eric Rosenberry recently presented a webinar on setting up active/active failover with managed DNS. These are the official slides.

  1. Going Active/Active
     Eric Rosenberry, Principal Infrastructure Architect, iovation Inc. (eric.rosenberry@iovation.com, @eprosenx)
     Cory von Wallenstein, Chief Technology Officer, Dyn Inc. (@cvonwallenstein)
  2. Introductions
     Cory von Wallenstein, Chief Technology Officer, Dyn Inc. (cvw@dyn.com, @cvonwallenstein)
     Eric Rosenberry, Principal Infrastructure Architect, iovation Inc. (eric.rosenberry@iovation.com, @eprosenx)
  3. What Do We Mean By Active/Active?
     • Active
     • Passive
     • Active/Passive
     • Active/Active
  4. What Are We Looking to Gain?
     • High(er) availability
     • Flexibility to change infrastructure without downtime
     • Flexibility to expand infrastructure without four-walled limitations
     • Disaster resilience
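Why a second active site raises availability can be shown with the standard parallel-availability formula, 1 - prod(1 - a_i). The Python sketch below is illustrative only: it assumes each site's availability is known, that sites fail independently, and uses a hypothetical per-site figure of 99.9%.

```python
# Combined availability of independent sites. A sketch: the independence
# assumption is exactly what a shared disaster zone or fiber path breaks.

MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(site_availabilities):
    """Probability that at least one site is up, assuming independent failures."""
    p_all_down = 1.0
    for a in site_availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def downtime_minutes_per_year(availability):
    """Expected minutes per year with no site serving."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single = 0.999  # hypothetical per-site availability
pair = combined_availability([single, single])
print(f"one site:  {single:.4%}, ~{downtime_minutes_per_year(single):.0f} min/yr down")
print(f"two sites: {pair:.6%}, ~{downtime_minutes_per_year(pair):.2f} min/yr down")
```

The two-site result is only as good as the independence assumption. Two sites in the same flood plain or on the same fiber route fail together, which is why site selection gets so much attention in the rest of the deck.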
  5. Active/Active FUD
     • “It’s impossible!”
       – CAP theorem
       – WAN latency
     • “It’s built into my database!”
       – NoSQL and WAN replication
     • The reality is somewhere in the middle, depending on what problem you’re trying to solve
  6. “Wired people should know something about wires”
     - Neal Stephenson, quoted in Andrew Blum’s TED Talk “What is the Internet, Really?”
     http://www.flickr.com/photos/notaperfectpilot/8119088205/
  7. http://www.ted.com/talks/andrew_blum_what_is_the_internet_really.html
  8. Paradigm Shift
     • All system maintenance is done during business hours without impact
     • All software upgrades are done during business hours
     • Software upgrades do not require downtime, so code can be pushed to production more rapidly (more frequent, smaller iterations)
     • Enables the use of commodity hardware
  9. The Four Questions You Need To Ask Before Embarking
     1. What problem(s) am I attempting to solve?
     2. How will I segment?
     3. Where will I deploy?
     4. How will this affect each part of my app?
  10. Step One: Scope the Problem
      • What are we replicating and why?
      • How close to real time does it need to be?
        – Synchronous vs. asynchronous
      • Think about this for each application tier, and set availability/distribution goals
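The synchronous vs. asynchronous choice comes down to who pays the WAN round trip. A minimal Python sketch, assuming a symmetric path, a hypothetical 1 ms local commit, and that one remote acknowledgement costs exactly one RTT; real databases add protocol overhead on top of this.

```python
# Client-visible write latency under synchronous vs asynchronous
# replication between two sites. All numbers are illustrative assumptions.


def write_latency_ms(local_commit_ms, wan_rtt_ms, synchronous):
    """Synchronous: the write is acknowledged only after the remote site
    confirms it, so every write pays at least one WAN round trip.
    Asynchronous: the write is acknowledged after the local commit and
    shipped to the remote site in the background."""
    if synchronous:
        return local_commit_ms + wan_rtt_ms
    return local_commit_ms


def min_async_lag_ms(wan_rtt_ms):
    """Lower bound on how far the remote copy trails under asynchronous
    replication: one-way propagation (RTT / 2 on a symmetric path).
    Queuing and apply time add to this."""
    return wan_rtt_ms / 2


for rtt in (9.0, 90.0):  # the two RTT figures discussed on slide 16
    sync = write_latency_ms(1.0, rtt, synchronous=True)
    async_ = write_latency_ms(1.0, rtt, synchronous=False)
    print(f"RTT {rtt} ms: sync write {sync} ms, async write {async_} ms, "
          f"remote lag >= {min_async_lag_ms(rtt)} ms")
```

Synchronous replication keeps both copies identical at the cost of every write's latency; asynchronous replication keeps writes fast but leaves a window in which the remote site is stale. Which trade-off is acceptable is exactly the per-tier question this slide asks.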
  11. Step One: Scope the Problem
      • Example:
        – iovation end-user-facing content services must be served from the closest GSLB-selected node, and each node must have N capacity (where N = our full overall global load), so overall we have more than 4N total capacity with all nodes online
        – iovation real-time API services require N+1 redundancy in each of our two Active/Active facilities, i.e. 2 * (N+1). This allows us to lose any server, plus a datacenter, and continue to function
        – Non-real-time API services (e.g. Admin Console) require 2N+ resiliency (i.e. one instance in each of our two Active/Active datacenters, with that instance running on an N+1 virtual cluster)
        – Some internal processes (e.g. Research Analytics) only require placement in one datacenter
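The redundancy rules above can be checked mechanically. A minimal Python sketch, assuming capacity is counted in units where N carries the full load, that N = 4 servers for the API tier is a hypothetical figure, and that `survives` is an illustrative helper rather than iovation's tooling:

```python
# Worst-case check of redundancy rules like those on this slide.
# Capacity is counted in units; `needed` is the capacity that carries full load.


def survives(capacity_per_site, needed, sites_lost, servers_lost=0):
    """True if, after any `sites_lost` whole sites fail and then
    `servers_lost` more individual servers fail anywhere, at least
    `needed` units of capacity remain."""
    if sites_lost > len(capacity_per_site):
        return False
    # Worst case: the largest sites are the ones that fail.
    remaining = sorted(capacity_per_site)[: len(capacity_per_site) - sites_lost]
    return sum(remaining) - servers_lost >= needed


N = 4  # hypothetical number of servers needed for the full API load

# Real-time API tier: 2 * (N+1) survives one datacenter plus one server.
print(survives([N + 1, N + 1], N, sites_lost=1, servers_lost=1))  # True
# With only N per site, the same failure leaves N - 1 servers.
print(survives([N, N], N, sites_lost=1, servers_lost=1))          # False
# Content tier: four nodes each sized for the full load (1 unit = N).
print(survives([1, 1, 1, 1], 1, sites_lost=3))                    # True
```

The second call shows why the slide specifies N+1 per facility rather than N: losing a datacenter and then one more server would otherwise leave the surviving site short.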
  12. Step Two: How Will You Segment?
      • Global Server Load Balancing with DNS
        – Round robin
        – Advanced load balancing
        – Active failover
        – Geographic
      • Other strategies (out of scope for today):
        – Anycast: challenges with TCP
        – HTTP redirection: challenges with performance
        – BGP netblock-based failover
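To make the DNS-based options concrete, here is a rough Python sketch of how round robin, active failover, and geographic selection might each choose an answer. Site names, regions, and the RFC 5737 documentation addresses are hypothetical, and the "advanced load balancing" option is not modeled because its behavior depends on the provider. Real GSLB services layer health probes, TTLs, and resolver geolocation on top of logic like this.

```python
# How three DNS-based GSLB strategies might pick an answer.
# All names, regions, and addresses are hypothetical.
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    ip: str
    healthy: bool = True


def healthy_only(sites):
    return [s for s in sites if s.healthy]


def round_robin(sites, query_number):
    """Rotate through healthy sites: query k gets healthy site k mod count."""
    pool = healthy_only(sites)
    return pool[query_number % len(pool)] if pool else None


def active_failover(sites_in_priority_order):
    """Always answer with the highest-priority site that is healthy."""
    pool = healthy_only(sites_in_priority_order)
    return pool[0] if pool else None


def geographic(preferences, client_region, sites_by_name):
    """Answer with the first healthy site on the client region's list."""
    for name in preferences.get(client_region, []):
        if sites_by_name[name].healthy:
            return sites_by_name[name]
    return None


a = Site("site-a", "192.0.2.10")
b = Site("site-b", "198.51.100.10")
c = Site("site-c", "203.0.113.10")
sites = [a, b, c]
by_name = {s.name: s for s in sites}
prefs = {"region-1": ["site-a", "site-b"], "region-2": ["site-b", "site-c"]}

print([round_robin(sites, k).ip for k in range(4)])
b.healthy = False  # site-b fails its health check
print(active_failover([b, a, c]).name)              # site-a
print(geographic(prefs, "region-2", by_name).name)  # site-c
```

In all three strategies, health is what turns a load-balancing decision into a failover mechanism: an unhealthy site simply stops appearing in answers.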
  13. Step Three: Where Will You Deploy?
      • Going from 1 to N
      • Where are you thinking?
        – What are your current datacenter assets, and how can they be leveraged?
      • And for what reasons?
        – Disaster resilience
        – Get closer to users
        – Room to grow
  14. Disaster Resilience
      http://maps.google.com
  15. Speed of Light
      Speed of light: 299,792.458 km/second (in a vacuum)
      Theoretical RTT: ~40ms
      Real RTT: ~90ms
      http://www.cogentco.com/files/images/network/network_map/networkmap_global_large.png
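The theoretical RTT is just distance divided by propagation speed, counted twice. A minimal Python sketch, assuming a typical silica-fiber refractive index of about 1.47 and a hypothetical 4,000 km one-way path; it deliberately ignores routing detours, switching, and queuing, which is why measured RTTs run well above this floor.

```python
# Lower bound on round-trip time imposed by the speed of light.

C_VACUUM_KM_S = 299_792.458     # speed of light in a vacuum (from the slide)
FIBER_REFRACTIVE_INDEX = 1.47   # assumed typical value for silica fiber


def min_rtt_ms(one_way_km, refractive_index=1.0):
    """Round trip over `one_way_km` at c / refractive_index, in milliseconds."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return 2 * one_way_km / speed_km_s * 1000


distance_km = 4000  # hypothetical long-haul path length
print(f"vacuum: {min_rtt_ms(distance_km):.1f} ms")
print(f"fiber:  {min_rtt_ms(distance_km, FIBER_REFRACTIVE_INDEX):.1f} ms")
```

For the hypothetical 4,000 km path this gives roughly 27 ms in a vacuum and roughly 39 ms in fiber, before any real-world overhead is added.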
  16. Implications for Selection
      • Things don’t work as well at 90ms RTT latency as they do at 9ms RTT latency
      • Where can you go to get out of the way of a disaster but not create latency headaches?
      http://www.globaldatavault.com/natural-disaster-threat-maps.htm
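One concrete reason 90 ms behaves worse than 9 ms is TCP's window limit: a sender can have at most one window of unacknowledged data in flight per round trip, so a single connection's throughput is capped at window / RTT. A minimal Python sketch, assuming a 65,535-byte window (the maximum without the RFC 7323 window-scale option) and ignoring loss and slow start:

```python
# Window-limited TCP throughput and the bandwidth-delay product.


def window_limited_mbps(window_bytes, rtt_ms):
    """Upper bound on one connection's throughput, in megabits per second."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def window_needed_bytes(target_mbps, rtt_ms):
    """Bandwidth-delay product: window required to sustain `target_mbps`."""
    return target_mbps * 1e6 / 8 * (rtt_ms / 1000)


WINDOW = 65_535  # maximum window without window scaling
for rtt in (9, 90):
    print(f"{rtt:>2} ms RTT: <= {window_limited_mbps(WINDOW, rtt):.1f} Mbit/s, "
          f"{window_needed_bytes(1000, rtt) / 1e6:.3f} MB window for 1 Gbit/s")
```

A tenfold increase in RTT cuts the per-connection ceiling tenfold (about 58 Mbit/s down to about 5.8 Mbit/s here) and raises the window needed to fill a fast link by the same factor.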
  17. Where The Fiber Actually Goes
      http://soladrive.com/images/level3-map-large.png
  18. Disaster Resilience: Local Failures
      “A frying squirrel took out half of our Santa Clara data center two years back.”
      - Mike Christian, Yahoo
      http://www.datacenterknowledge.com/archives/2012/07/09/outages-surviving-electric-squirrels-ups-failures/
  19. Local Failures
      “Squirrel chews account for a whopping 17% of our damages so far this year! But let me add that it is down from 28% just last year and it continues to decrease since we added cable guards to our plant.”
      - Fred Lawler, Level(3)
      http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/
  20. Get closer to users
      http://www.akamai.com/html/technology/dataviz1.html
  21. Get closer to users
      http://www.akamai.com/html/technology/dataviz1.html
  22. “Sorry, we’re full”
      http://www.theregister.co.uk/2010/10/12/capgemini_merlin_data_center/
  23. Step Three: Where Will You Deploy?
      • Don’t automatically assume you need vastly different geographic areas
      • How far do you need to go to get out of the same disaster zone?
        – What kinds of disasters happen in your area?
        – What geographic barriers are there?
        – Can you drive it in an emergency?
  24. Portland to Seattle
      http://www.zayo.com/sites/default/files/images/Zayo-US-Network-EXTERNAL-11-1-2012.kmz
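"How far do you need to go?" can be answered first with a straight-line distance and the latency floor it implies. A minimal Python sketch using the haversine great-circle formula, assuming approximate city-center coordinates for Portland and Seattle, a mean Earth radius of 6,371 km, and a fiber refractive index of about 1.47. Actual fiber follows rights-of-way like the routes on the map, so measured RTT will be higher.

```python
# Great-circle distance between two sites and the fiber RTT floor it implies.
import math

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_S = 299_792.458 / 1.47  # assumed refractive index for fiber


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


PORTLAND = (45.52, -122.68)  # approximate city-center coordinates
SEATTLE = (47.61, -122.33)

d = great_circle_km(*PORTLAND, *SEATTLE)
fiber_rtt_ms = 2 * d / FIBER_SPEED_KM_S * 1000
print(f"{d:.0f} km straight-line, >= {fiber_rtt_ms:.1f} ms RTT in fiber")
```

With these coordinates the sketch gives a straight-line distance of roughly 230 km and a fiber RTT floor of roughly 2.3 ms: close enough to drive, and far enough apart that most local disasters will not hit both sites.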
  25. Step Four: Think Through Your Apps
      • How will the different pieces of the architecture behave with increased latency between them?
      • Can you avoid real-time calls across the WAN?
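Avoiding real-time calls across the WAN matters because round trips multiply. A minimal Python sketch, assuming each backend call costs exactly one round trip, with hypothetical inputs of 5 ms of local work, 10 calls per request, and a 0.5 ms in-datacenter RTT:

```python
# Cost of backend calls inside a user-facing request.
# All timings are hypothetical inputs, not measurements.


def request_latency_ms(local_work_ms, calls, call_rtt_ms, parallel=False):
    """Latency of one request that makes `calls` backend calls, each
    costing one round trip. Sequential calls add up; calls issued fully
    in parallel cost about one round trip in total."""
    if calls == 0:
        return local_work_ms
    waiting = call_rtt_ms if parallel else calls * call_rtt_ms
    return local_work_ms + waiting


LOCAL_WORK_MS = 5.0
CALLS = 10
for label, rtt in (("same DC", 0.5), ("9 ms WAN", 9.0), ("90 ms WAN", 90.0)):
    print(f"{label:>9}: {request_latency_ms(LOCAL_WORK_MS, CALLS, rtt):.1f} ms")
```

The same 10-call request goes from about 10 ms inside one datacenter to about 905 ms over a 90 ms link. Parallelizing calls helps, but keeping them local helps more, which is the motivation for handling real-time queries within a single datacenter.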
  26. Step Four: Think Through Your Apps
      • Examples from iovation:
        – Web Device Print code is served from four global nodes using GSLB
          • via Dyn Traffic Management
          • Was our first Active/Active application
        – Real-time API responses are served Active/Active between Portland and Seattle
          • 50% of the time our API URL returns the PDX IP, and 50% of the time it returns the SEA IP
          • Real-time queries are handled locally within a single DC
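The 50/50 PDX/SEA split behaves like a weighted DNS answer with health checks. A rough Python sketch, assuming equal weights of 50 and a simple healthy/unhealthy flag per site; `weighted_answer` is a hypothetical helper, not Dyn's implementation, and a managed DNS service handles this selection itself.

```python
# A 50/50 weighted DNS answer between two sites, with an unhealthy
# site's share shifting to the healthy one. Weights are hypothetical.
import random


def weighted_answer(pool, rng):
    """pool maps site -> (weight, healthy). Returns one healthy site chosen
    with probability proportional to its weight, or None if none are up."""
    candidates = [(site, w) for site, (w, ok) in pool.items() if ok and w > 0]
    if not candidates:
        return None
    sites, weights = zip(*candidates)
    return rng.choices(sites, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so runs are repeatable
pool = {"PDX": (50, True), "SEA": (50, True)}
answers = [weighted_answer(pool, rng) for _ in range(10_000)]
print(f"PDX share: {answers.count('PDX') / len(answers):.3f}")  # close to 0.5

pool["SEA"] = (50, False)  # SEA fails its health check
print({weighted_answer(pool, rng) for _ in range(100)})  # {'PDX'}
```

In practice the observed split is approximate: recursive resolvers cache each answer for its TTL, so the ratio applies to DNS responses rather than to individual API requests.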
  27. Summary
      • Top takeaways
        – Active/Active is a Paradigm Shift
        – It is achievable
        – Choose your locations carefully
          • Network is a primary selection criterion
          • How far do you really need to go?
        – Analyze each application tier’s constraints carefully
        – Start with low-hanging fruit
  28. What iovation Does
      • Identify and re-recognize devices connecting to your business sites
      • Associate groups of devices that would otherwise appear unrelated
      • Assess real-time risk through business rules including velocity, anomaly, proxy use, etc.
  29. Questions?
      Cory von Wallenstein, Chief Technology Officer, Dyn Inc. (cvw@dyn.com, @cvonwallenstein)
      Eric Rosenberry, Principal Infrastructure Architect, iovation Inc. (eric.rosenberry@iovation.com, @eprosenx)
