Migrating to Public Cloud



The Netflix recipe for migrating your organization from building a datacenter-based product to building a cloud-based one. First presented at the Silicon Valley Cloud Computing Meetup "Speak Cloudy to Me" on Saturday, April 30th, 2011.

Published in: Technology
  • S5 is hilarious. S6 is probably worth expanding on sometime - lots of biz people understand the capex implication but not so much how lumpy capacity purchases affect revenue models that factor for storage.
  • Many thanks to everyone for viewing these slides, they made it onto the slideshare home page.

    This was presented at the Silicon Valley Cloud Computing Meetup on April 30th, and a video of that presentation should be available soon.
  • Found a few typos: Jeeves should be Jenkins, and RDB should be RDS.

  1. Moving Your Organization To Public Cloud
     April 30th, 2011
     Adrian Cockcroft
     @adrianco #netflixcloud
     http://
  2. With a hop, skip and jump into public cloud…
     Prototype to get familiar with cloud
     Convince Managers of cloud value
     Get Developers comfortable with new tools
     Incremental deployment strategies
  3. Why Use Public Cloud?
  4. Frictionless Deployment (JFDI)
  5. Things We Don’t Do
  6. Capacity Planning in Clouds
     • Capacity is expensive
     • Capacity takes time to buy and provision
     • Capacity only increases, can’t be shrunk easily
     • Capacity comes in big chunks, paid up front
     • Planning errors can cause big problems
     • Systems are clearly defined assets
     • Systems can be instrumented in detail
  7. Better Business Agility
  8. Data Center
     Netflix could not build new datacenters fast enough
     Capacity growth is unpredictable
     Product launch spikes - iPhone, Wii, PS3, XBox
  9. Which Cloud? What Matters?
     • Scalability over the full range
       – Small scale – trivial sign up and low cost to learn
       – Large scale – deploy 1000s of systems per hour
     • Large and Mature Feature Set
       – Less work to do yourself
       – Well understood and robust
     • Large Developer Community
       – Easy to find expert staff
       – Lots of tools and open source support
  10. Cloud Portability?
      • Platform vendor lock-in vs. Cloud vendor lock-in
        – Who do you trust for the long term?
        – How likely, how much effort to switch vendors?
      • Portable tools and platform issues
        – Lowest common denominator portability
        – Slow to add advanced features, abstraction conflicts
      • Reach Around the Platform
        – Access to underlying features creeps in
        – You aren’t really portable in the end…
  11. What About Cost?
      • Explicitly a non-goal
        – Don’t distract the developers, catch exceptions only
        – Expect costs to decline over time as market matures
      • Cloud costs are fully burdened
        – Includes power, staffing, automation
        – No charges for idle and obsolete systems
      • Opportunity Costs
        – Dramatically simpler and faster decision making
        – How much is manager/ attention span worth?
  12. Netflix Choice was AWS with our own platform and tools
  13. Leverage AWS Scale
      “the biggest public cloud”
      AWS investment in tooling and automation
      Use AWS zones and regions for high availability, scalability and global deployment
  14. Leverage AWS Feature Set
      “the market leader”
      EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDS, VPC…
  15. “The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
      Werner Vogels, Amazon CTO
  16. Developers and Operations
  17. Devops
      • Developers who own their code in production
      • Ops staff who can write code and tools
      • How do they bootstrap into cloud?
        – All key tools are open source or in the cloud
        – Trivial $ investment to learn AWS, NoSQL etc.
        – No excuse to not have it on your resume…
  18. Implications for IT Operations
      • Cloud is run by developer organization
        – Our IT department is the AWS API
        – We have no IT staff working on cloud
      • Cloud capacity is much bigger than Datacenter
        – Datacenter oriented IT staffing is flat
        – We have moved a few people out of IT to write code
      • Traditional IT Roles are going away
        – Don’t need SA, DBA, Storage, Network admins
  19. Datacenter oriented tools don’t work
      Ephemeral instances
      High rate of change
  20. “fork-lifted” apps don’t work well
      Fragile
      Too many datacenter oriented assumptions
  21. “In the datacenter, robust code is best practice. In the cloud, it’s essential.”
  22. Port to Cloud Architecture
      Short term investment, long term payback!
      Pay down technical debt
      Robust patterns
  23. Transition
      • The Goals
        – Faster, Scalable, Available and
      • Anti-patterns and Cloud Architecture
        – The things we wanted to change and why
      • Developer Transitions and Tools
        – Cloud Bring-up Strategy
  24. Datacenter Anti-Patterns
      What do we currently do in the datacenter that prevents us from reaching our goals?
  25. Old Datacenter vs. New Cloud Arch
      Old Datacenter                 New Cloud Arch
      Central SQL Database           Distributed Key/Value NoSQL
      Sticky In-Memory Session       Shared Memcached Session
      Chatty Protocols               Latency Tolerant Protocols
      Tangled Service Interfaces     Layered Service Interfaces
      Instrumented Code              Instrumented Service Patterns
      Fat Complex Objects            Lightweight Serializable Objects
      Components as Jar Files        Components as Services
  26. Tools and Automation
      • Developer and Build Tools
        – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
        – Builds, creates .war file, .rpm, bakes AMI and launches
      • Custom Netflix Application Console
        – AWS Features at Enterprise Scale (hide the AWS security keys!)
        – Auto Scaler Group is unit of deployment to production
      • Open Source + Support
        – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
        – Soon? Twitter Rainbird http://-rainbird/
      • Monitoring Tools
        – AppDynamics – Developer focus for cloud http://
        – EpicNMS – flexible data collection and plots http://
  27. Cloud Developers JFDI Boot Camp
      • Concentrated Stretch Goal
        – Built a rough prototype working web site in test account
        – Room full of engineers sharing the pain for 1-2 days
      • Hands-on in the cloud with a new code base
        – Debug lots of tooling and conceptual issues very fast
        – Try out architectures and patterns, throwaway, no risk
      • Whiteboard and Wiki Pages – Built During Boot Camp
        – What core objects already exist, how to make your own
        – What components already exist or are work in progress
  28. Developer Instances Collision
      • Development in shared test account
      • Shared data sources and most services
      • Sam and Rex both want to deploy web front end
      • Who wins?
      (Diagram: Sam and Rex both target “web” in the test account.)
  29. Developer Service Stacks
      • Developer specific service instances
        – Configured via Java at
        – implemented by REST client library
      • Server Configuration
        – Configure discovery service “stack” string
        – Registers as <appname>-<stack>
      • Client Configuration
        – Route traffic on per-service basis including stack
  30. Per-Service Stack
      Developers choose what to share
      Sam            Rex            Mike
      web-sam        web-rex        web-dev
      backend-dev    backend-dev    backend-mike
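
The stack mechanism on slides 29 and 30 can be illustrated with a minimal sketch. Only the `<appname>-<stack>` naming and the instance names come from the slides. The `Registry` class, the `resolve` helper, the fallback to a shared `dev` stack, and the addresses are hypothetical stand-ins for the real discovery service and REST client library, which the slides do not describe.

```python
from typing import Dict, Optional


class Registry:
    """Toy in-memory discovery service keyed by "<appname>-<stack>" (hypothetical)."""

    def __init__(self) -> None:
        self._instances: Dict[str, str] = {}

    def register(self, app: str, stack: str, address: str) -> str:
        """Register one instance and return its discovery name."""
        name = f"{app}-{stack}"
        self._instances[name] = address
        return name

    def lookup(self, app: str, stack: str) -> Optional[str]:
        """Return the address registered for app on stack, or None."""
        return self._instances.get(f"{app}-{stack}")


def resolve(registry: Registry, app: str, routes: Dict[str, str],
            shared_stack: str = "dev") -> Optional[str]:
    """Route one service call: the developer's own stack if listed, else the shared one.

    Falling back to a shared "dev" stack is an assumption made for this sketch.
    """
    return registry.lookup(app, routes.get(app, shared_stack))


# Instance names from the per-service stack slide; the addresses are made up.
registry = Registry()
for app, stack, address in [
    ("web", "sam", "10.0.0.1"),
    ("web", "rex", "10.0.0.2"),
    ("web", "dev", "10.0.0.3"),
    ("backend", "dev", "10.0.1.1"),
    ("backend", "mike", "10.0.1.2"),
]:
    registry.register(app, stack, address)

sam_routes = {"web": "sam"}        # Sam overrides only the web tier
mike_routes = {"backend": "mike"}  # Mike overrides only the back end
```

With these routes, Sam's web calls reach `web-sam` while his back-end calls still go to the shared `backend-dev`, so two developers deploying a web front end no longer collide.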
  31. Cloud Product Bring-Up Strategy
      Simplest and Soonest
  32. Shadow Traffic Redirection
      • First traffic sent to cloud
        – Real traffic stream to validate cloud back end
        – Uncovered lots of process and tools issues
        – Uncovered Service latency issues
      • TV Device calls Datacenter API
        – Returns Genre/movie list for a customer
        – Asynchronously duplicates request to cloud
        – Start with send-and-forget mode, ignore response
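
A rough sketch of the send-and-forget shadowing described on slide 32, assuming a thread pool for the asynchronous copy. `ShadowingApi`, the handler callables, and `drain` are hypothetical names; the slides do not say how the duplication was actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Request = Dict[str, str]


class ShadowingApi:
    """Serve from the datacenter and copy each request to the cloud (illustrative only)."""

    def __init__(self, datacenter_handler: Callable[[Request], List[str]],
                 cloud_handler: Callable[[Request], object],
                 workers: int = 4) -> None:
        self._datacenter = datacenter_handler
        self._cloud = cloud_handler
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _send_and_forget(self, request: Request) -> None:
        """Call the cloud back end, ignoring both its response and its errors."""
        try:
            self._cloud(dict(request))  # copy so the shadow call cannot mutate the original
        except Exception:
            pass  # a failing shadow call must never affect the customer

    def handle(self, request: Request) -> List[str]:
        """Queue the shadow copy, then answer from the datacenter as before."""
        self._pool.submit(self._send_and_forget, request)
        return self._datacenter(request)

    def drain(self) -> None:
        """Wait for queued shadow calls to finish (useful in tests and at shutdown)."""
        self._pool.shutdown(wait=True)
```

The customer-facing response depends only on the datacenter handler, which is what makes this mode safe as a first step: cloud latency or failures show up in monitoring, not in what the TV device receives.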
  33. Shadow Redirect Instances
      (Diagram labels)
      Datacenter Instances: Modified Datacenter Service
      Cloud Instances: Modified Cloud Service
      One request per visit
      Data Sources: queueservice, videometadata
  34. First Web Pages in the Cloud
  35. Starz Page
  36. First Page
      • First full page – Starz Channel Genre
        – Simplest page, no sub-genres, minimal personalization
        – Lots of investment in new Struts based page design
      • New “merchweb” front end instance
        – points to merchweb instance
      • Uncovered lots of latency issues
        – Used memcached to hide S3 and SimpleDB latency
        – Improved from slower to faster than Datacenter
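
The "memcached to hide S3 and SimpleDB latency" point on slide 36 is commonly done with a cache-aside lookup, sketched below. This is an assumption about the pattern, not a description of the Netflix code: `TtlCache` is a dict-backed stand-in for memcached, and `cached_fetch` and the key format are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Dict-backed stand-in for memcached with per-entry expiry (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        """Store a value that expires ttl_seconds from now."""
        self._entries[key] = (self._clock() + ttl_seconds, value)


def cached_fetch(cache: TtlCache, key: str, slow_fetch: Callable[[str], str],
                 ttl_seconds: float = 60.0) -> str:
    """Cache-aside read: try the cache first, fall back to the slow store on a miss."""
    value = cache.get(key)
    if value is None:
        value = slow_fetch(key)  # e.g. an S3 or SimpleDB read in the real system
        cache.set(key, value, ttl_seconds)
    return value
```

Only the first request for a key pays the back-end latency; later requests within the expiry window are served from memory, at the cost of possibly stale data until the entry expires.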
  37. Starz Page Cloud Instances
      (Diagram labels)
      Front End: merchweb
      Middle Tier: starz, memcached
      multiple requests per visit
      Data Sources: queueservice, rentalhistory, videometadata
  38. Controlled Cloud Transition
      • WWW calling code chooses who goes to cloud
        – Filter out corner cases, send percentage of users
      • Redirect if Needed
        – The URL that customers see is http://
        – If problem, redirect to old Datacenter page http://
      • Play Button and Star Action redirect
        – Point URLs for actions that create/modify data back to datacenter to start with
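
One way to "filter out corner cases, send percentage of users" is a stable hash bucket per customer, sketched here. The slides do not say how users were selected, so the hashing scheme, the function names, and the `cloud_healthy` flag are all assumptions made for illustration.

```python
import hashlib


def bucket(customer_id: str, buckets: int = 100) -> int:
    """Map a customer to a stable bucket in [0, buckets) using a SHA-256 hash."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def choose_backend(customer_id: str, cloud_percent: int,
                   is_corner_case: bool = False,
                   cloud_healthy: bool = True) -> str:
    """Return "cloud" or "datacenter" for one page request.

    Corner cases and an unhealthy cloud always fall back to the datacenter;
    otherwise customers whose bucket is below cloud_percent go to the cloud.
    """
    if is_corner_case or not cloud_healthy:
        return "datacenter"
    return "cloud" if bucket(customer_id) < cloud_percent else "datacenter"
```

Because the bucket depends only on the customer ID, a given customer sees the same version of the page on every visit, and raising `cloud_percent` only ever moves more customers to the cloud rather than reshuffling existing ones.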
  39. Big-Bang Transition
      • iPhone Launch (August/Sept 2010)
        – Not enough capacity in the datacenter, cloud only
        – App Store gates release, one shot, can’t back out
      • SOASTA Cloud Based Load Generation
        – Has to work at large scale on day one
        – Stress test API and end-to-end functionality
  40. WWW Page by Page
      • 2010 Gradual Migration from Datacenter
        – Add pages as dependent services come online
        – Home page – most complex and highest traffic
      • 2011 Clean up stragglers and dependencies
        – Shut down datacenter service tiers
        – Move developer focus totally to cloud
  41. Hop, Skip, Jump
      • Move yourself
      • Move your management and colleagues
      • Move your developers and devops
      • Move your product
  42. Takeaway
      Hop, skip, jump… splash!
      Come on in, the water’s fine, just a bit cloudy.
      http://
      @adrianco #netflixcloud
  43. Amazon Cloud Terminology Reference
      This is not a full list of Amazon Web Service features
      • AWS – Amazon Web Services (common name for Amazon cloud)
      • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
      • EC2 – Elastic Compute Cloud
        – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
        – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
        – Reserved Instances – pre-paid to reduce cost for long term usage
        – Availability Zone – datacenter with its own power and cooling, hosting cloud instances
        – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
      • ASG – Auto Scaling Group (instances from the same AMI)
      • S3 – Simple Storage Service (http access)
      • EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
      • RDS – Relational Database Service (managed MySQL master and slaves)
      • SDB – Simple Data Base (hosted http based NoSQL data store)
      • SQS – Simple Queue Service (http based message queue)
      • SNS – Simple Notification Service (http and email based topics and messages)
      • EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
      • ELB – Elastic Load Balancer
      • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
      • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
      • IAM – Identity and Access Management (fine grain role based security keys)