Migrating to Public Cloud


The Netflix recipe for migrating your organization from building a datacenter-based product to a cloud-based product. First presented at the Silicon Valley Cloud Computing Meetup "Speak Cloudy to Me" on Saturday, April 30th, 2011.

Published in: Technology
  • S5 is hilarious. S6 is probably worth expanding on sometime - lots of biz people understand the capex implication but not so much how lumpy capacity purchases affect revenue models that factor for storage.
  • Many thanks to everyone for viewing these slides; they made it onto the SlideShare home page.

    This was presented at the Silicon Valley Cloud Computing Meetup on April 30th, and a video of that presentation should be available soon.
  • Found a few typos: Jeeves should be Jenkins, RDB should be RDS.

Migrating to Public Cloud

  1. Moving Your Organization To Public Cloud
     April 30th, 2011
     Adrian Cockcroft
     @adrianco #netflixcloud
     http://
  2. With a hop, skip and jump into public cloud…
     • Prototype to get familiar with cloud
     • Convince Managers of cloud value
     • Get Developers comfortable with new tools
     • Incremental deployment strategies
  3. Why Use Public Cloud?
  4. Frictionless Deployment (JFDI)
  5. Things We Don’t Do
  6. Capacity Planning in Clouds
     • Capacity is expensive
     • Capacity takes time to buy and provision
     • Capacity only increases, can’t be shrunk easily
     • Capacity comes in big chunks, paid up front
     • Planning errors can cause big problems
     • Systems are clearly defined assets
     • Systems can be instrumented in detail
  7. Better Business Agility
  8. Data Center
     • Netflix could not build new datacenters fast enough
     • Capacity growth is unpredictable
     • Product launch spikes - iPhone, Wii, PS3, XBox
  9. Which Cloud? What Matters?
     • Scalability over the full range
       – Small scale – trivial sign up and low cost to learn
       – Large scale – deploy 1000’s of systems per hour
     • Large and Mature Feature Set
       – Less work to do yourself
       – Well understood and robust
     • Large Developer Community
       – Easy to find expert staff
       – Lots of tools and open source support
  10. Cloud Portability?
      • Platform vendor lock-in vs. Cloud vendor lock-in
        – Who do you trust for the long term?
        – How likely, how much effort to switch vendors?
      • Portable tools and platform issues
        – Lowest common denominator portability
        – Slow to add advanced features, abstraction conflicts
      • Reach Around the Platform
        – Access to underlying features creeps in
        – You aren’t really portable in the end…
  11. What About Cost?
      • Explicitly a non-goal
        – Don’t distract the developers, catch exceptions only
        – Expect costs to decline over time as market matures
      • Cloud costs are fully burdened
        – Includes power, staffing, automation
        – No charges for idle and obsolete systems
      • Opportunity Costs
        – Dramatically simpler and faster decision making
        – How much is a manager’s attention span worth?
  12. Netflix Choice was AWS with our own platform and tools
  13. Leverage AWS Scale: “the biggest public cloud”
      • AWS investment in tooling and automation
      • Use AWS zones and regions for high availability, scalability and global deployment
  14. Leverage AWS Feature Set: “the market leader”
      • EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDS, VPC…
  15. “The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
      Werner Vogels, Amazon CTO
  16. Developers and Operations
  17. Devops
      • Developers who own their code in production
      • Ops staff who can write code and tools
      • How do they bootstrap into cloud?
        – All key tools are open source or in the cloud
        – Trivial $ investment to learn AWS, NoSQL etc.
        – No excuse to not have it on your resume…
  18. Implications for IT Operations
      • Cloud is run by developer organization
        – Our IT department is the AWS API
        – We have no IT staff working on cloud
      • Cloud capacity is much bigger than Datacenter
        – Datacenter oriented IT staffing is flat
        – We have moved a few people out of IT to write code
      • Traditional IT Roles are going away
        – Don’t need SA, DBA, Storage, Network admins
  19. Datacenter oriented tools don’t work
      • Ephemeral instances
      • High rate of change
  20. “fork-lifted” apps don’t work well
      • Fragile
      • Too many datacenter oriented assumptions
  21. “In the datacenter, robust code is best practice. In the cloud, it’s essential.”
  22. Port to Cloud Architecture
      • Short term investment, long term payback!
      • Pay down technical debt
      • Robust patterns
  23. Transition
      • The Goals
        – Faster, Scalable, Available and
      • Anti-patterns and Cloud Architecture
        – The things we wanted to change and why
      • Developer Transitions and Tools
        – Cloud Bring-up Strategy
  24. Datacenter Anti-Patterns
      What do we currently do in the datacenter that prevents us from our goals?
  25. Old Datacenter vs. New Cloud Arch
      • Central SQL Database → Distributed Key/Value NoSQL
      • Sticky In-Memory Session → Shared Memcached Session
      • Chatty Protocols → Latency Tolerant Protocols
      • Tangled Service Interfaces → Layered Service Interfaces
      • Instrumented Code → Instrumented Service Patterns
      • Fat Complex Objects → Lightweight Serializable Objects
      • Components as Jar Files → Components as Services
  26. Tools and Automation
      • Developer and Build Tools
        – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
        – Builds, creates .war file, .rpm, bakes AMI and launches
      • Custom Netflix Application Console
        – AWS Features at Enterprise Scale (hide the AWS security keys!)
        – Auto Scaling Group is unit of deployment to production
      • Open Source + Support
        – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
        – Soon? Twitter Rainbird http://-rainbird/
      • Monitoring Tools
        – AppDynamics – Developer focus for cloud http://
        – EpicNMS – flexible data collection and plots http://
  27. Cloud Developers JFDI Boot Camp
      • Concentrated Stretch Goal
        – Built a rough prototype working web site in test account
        – Room full of engineers sharing the pain for 1-2 days
      • Hands-on in the cloud with a new code base
        – Debug lots of tooling and conceptual issues very fast
        – Try out architectures and patterns, throwaway, no risk
      • Whiteboard and Wiki Pages – Built During Boot Camp
        – What core objects already exist, how to make your own
        – What components already exist or are work in progress
  28. Developer Instances Collision
      • Development in shared test account
      • Shared data sources and most services
      • Sam and Rex both want to deploy web front end
      • Who wins?
      • Diagram: Sam and Rex both target “web” in the test account
  29. Developer Service Stacks
      • Developer specific service instances
        – Configured via Java at
        – implemented by REST client library
      • Server Configuration
        – Configure discovery service “stack” string
        – Registers as <appname>-<stack>
      • Client Configuration
        – Route traffic on per-service basis including stack
  30. Per-Service Stack: Developers choose what to share
      • Sam: web-sam, backend-dev
      • Rex: web-rex, backend-dev
      • Mike: web-dev, backend-mike
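The per-service stack idea in slides 29 and 30 can be sketched as follows. This is a minimal illustration, assuming a toy in-memory registry: the `Registry` class, the `route` function, and the `*.test.example` addresses are hypothetical and are not Netflix's discovery service or its REST client library. Only the `<appname>-<stack>` naming and the Sam/Rex/Mike layout come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy in-memory stand-in for a discovery service (hypothetical)."""
    instances: dict = field(default_factory=dict)

    def register(self, app, stack, address):
        # Server side: an instance registers as <appname>-<stack>.
        name = f"{app}-{stack}"
        self.instances[name] = address
        return name

    def resolve(self, app, stack):
        return self.instances[f"{app}-{stack}"]


def route(registry, routes, app, default_stack="dev"):
    """Client side: pick a stack per service, falling back to the shared one."""
    stack = routes.get(app, default_stack)
    return registry.resolve(app, stack)


# Layout from slide 30 (addresses are made up for the example).
registry = Registry()
for app, stack in [("web", "sam"), ("web", "rex"), ("web", "dev"),
                   ("backend", "dev"), ("backend", "mike")]:
    registry.register(app, stack, f"{app}-{stack}.test.example")

sam_routes = {"web": "sam"}        # Sam overrides only the web front end
mike_routes = {"backend": "mike"}  # Mike overrides only the backend
```

With this layout Sam and Rex no longer collide: each routes `web` to a private instance while still sharing `backend-dev`, and Mike can test a private backend behind the shared `web-dev`.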
  31. Cloud Product Bring-Up Strategy: Simplest and Soonest
  32. Shadow Traffic Redirection
      • First traffic sent to cloud
        – Real traffic stream to validate cloud back end
        – Uncovered lots of process and tools issues
        – Uncovered Service latency issues
      • TV Device calls Datacenter API
        – Returns Genre/movie list for a customer
        – Asynchronously duplicates request to cloud
        – Start with send-and-forget mode, ignore response
  33. Shadow Redirect Instances
      • Datacenter Instances: Modified Datacenter Service
      • Cloud Instances: Modified Cloud Service
      • One request per visit
      • Data Sources: queueservice, videometadata
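The send-and-forget shadowing described in slide 32 can be sketched like this. It is a minimal illustration under stated assumptions: `handle_request`, the two backend callables, and the thread pool size are hypothetical names and choices, not the code Netflix ran. The datacenter answer is what the caller gets back; the cloud copy runs in the background and its result and errors are ignored.

```python
from concurrent.futures import ThreadPoolExecutor

# Background pool for shadow requests (size is an arbitrary assumption).
_shadow_pool = ThreadPoolExecutor(max_workers=4)


def handle_request(customer_id, datacenter_backend, cloud_backend,
                   pool=_shadow_pool):
    """Serve from the datacenter and duplicate the request to the cloud."""

    def shadow():
        try:
            cloud_backend(customer_id)  # response is deliberately discarded
        except Exception:
            pass  # a failing shadow call must never affect the customer

    pool.submit(shadow)                     # fire and forget
    return datacenter_backend(customer_id)  # the only answer that matters
```

Because the shadow call is submitted before the datacenter call and never awaited, the customer-facing latency stays that of the datacenter path, while the cloud back end still sees a real traffic stream.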
  34. First Web Pages in the Cloud
  35. Starz Page
  36. First Page
      • First full page – Starz Channel Genre
        – Simplest page, no sub-genres, minimal personalization
        – Lots of investment in new Struts based page design
      • New “merchweb” front end instance
        – points to merchweb instance
      • Uncovered lots of latency issues
        – Used memcached to hide S3 and SimpleDB latency
        – Improved from slower to faster than Datacenter
  37. Starz Page Cloud Instances
      • Front End: merchweb
      • Middle Tier: starz, memcached
      • Data Sources: queueservice, rentalhistory, videometadata
      • Multiple requests per visit
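Slide 36 mentions using memcached to hide S3 and SimpleDB latency. A common way to do that is a cache-aside read, sketched below. This is only an illustration: `TTLCache` is a hypothetical in-process stand-in for memcached, `cached_fetch` and `slow_fetch` are assumed names, and the slides do not say which caching pattern or expiry policy was actually used.

```python
import time


class TTLCache:
    """In-process stand-in for memcached with per-entry expiry (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def cached_fetch(cache, key, slow_fetch):
    """Cache-aside read: try the cache, fall back to the slow data source."""
    value = cache.get(key)
    if value is None:  # note: a None result is never cached in this sketch
        value = slow_fetch(key)
        cache.set(key, value)
    return value
```

Only the first request for a key pays the slow data-source latency; later requests within the expiry window are served from memory.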
  38. Controlled Cloud Transition
      • WWW calling code chooses who goes to cloud
        – Filter out corner cases, send percentage of users
      • Redirect if Needed
        – The URL that customers see is http://
        – If problem, redirect to old Datacenter page http://
      • Play Button and Star Action redirect
        – Point URLs for actions that create/modify data back to datacenter to start with
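The percentage-based selection and fallback in slide 38 can be sketched as below. This is an illustration under assumptions: hashing the customer ID into 100 buckets is one common way to get a stable percentage split, and `bucket`, `choose_backend`, `serve_page`, and the `excluded` set are hypothetical names. The slides do not describe how Netflix actually selected users, and the in-process fallback here stands in for the redirect to the old datacenter page.

```python
import hashlib


def bucket(customer_id, buckets=100):
    """Map a customer to a stable bucket so the same user always gets the same path."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_backend(customer_id, cloud_percent, excluded=frozenset()):
    """Filter out corner cases, then send cloud_percent percent of users to the cloud."""
    if customer_id in excluded:
        return "datacenter"
    return "cloud" if bucket(customer_id) < cloud_percent else "datacenter"


def serve_page(customer_id, cloud_percent, render_cloud, render_datacenter,
               excluded=frozenset()):
    """Try the cloud page for selected users; fall back to the datacenter on error."""
    if choose_backend(customer_id, cloud_percent, excluded) == "cloud":
        try:
            return render_cloud(customer_id)
        except Exception:
            pass  # problem in the cloud path: use the old datacenter page
    return render_datacenter(customer_id)
```

Raising `cloud_percent` step by step moves more users to the cloud without changing which users were already there, since each customer's bucket is fixed.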
  39. Big-Bang Transition
      • iPhone Launch (August/Sept 2010)
        – Not enough capacity in the datacenter, cloud only
        – App Store gates release, one shot, can’t back out
      • SOASTA Cloud Based Load Generation
        – Has to work at large scale on day one
        – Stress test API and end-to-end functionality
  40. WWW Page by Page
      • 2010 Gradual Migration from Datacenter
        – Add pages as dependent services come online
        – Home page – most complex and highest traffic
      • 2011 Clean up stragglers and dependencies
        – Shut down datacenter service tiers
        – Move developer focus totally to cloud
  41. Hop, Skip, Jump
      • Move yourself
      • Move your management and colleagues
      • Move your developers and devops
      • Move your product
  42. Takeaway
      Hop, skip, jump…… splash! Come on in, the water’s fine, just a bit cloudy.
      http://
      @adrianco #netflixcloud
  43. Amazon Cloud Terminology Reference
      See
      This is not a full list of Amazon Web Service features
      • AWS – Amazon Web Services (common name for Amazon cloud)
      • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
      • EC2 – Elastic Compute Cloud
        – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
        – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
        – Reserved Instances – pre-paid to reduce cost for long term usage
        – Availability Zone – datacenter with its own power and cooling, hosting cloud instances
        – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
      • ASG – Auto Scaling Group (instances from the same AMI)
      • S3 – Simple Storage Service (http access)
      • EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
      • RDS – Relational Database Service (managed MySQL master and slaves)
      • SDB – Simple Data Base (hosted http based NoSQL data store)
      • SQS – Simple Queue Service (http based message queue)
      • SNS – Simple Notification Service (http and email based topics and messages)
      • EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
      • ELB – Elastic Load Balancer
      • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
      • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
      • IAM – Identity and Access Management (fine grain role based security keys)