Successfully reported this slideshow.
Your SlideShare is downloading. ×

Netflix on Cloud - combined slides for Dev and Ops


Check these out next

1 of 90 Ad

More Related Content

Slideshows for you (20)


Similar to Netflix on Cloud - combined slides for Dev and Ops (20)


Recently uploaded (20)

Netflix on Cloud - combined slides for Dev and Ops

  1. Ne#lix  in  the  Cloud   Nov  6,  2010   Adrian  Cockcro:   @adrianco  #ne#lixcloud   h=p://  
  2. Combined  Slides   For  both  Developer  and  OperaJons  Audiences  (2-­‐3  hours  of  content)   Teaser  intro  slides  also  at   Oct  14th  Beta  #devops  subset  -­‐  Cloud  CompuJng  Meetup   Nov  3rd  GA  –  QConSF  Developer  Oriented  subset   Nov  6th,  2010  –  Combined  slides  at  
  3. With  more  than  16  million  subscribers  in  the   United  States  and  Canada,  Ne9lix,  Inc.  is  the   world’s  leading  Internet  subscripAon  service   for  enjoying  movies  and  TV  shows.   Source:  h=p://  
  4. Why  Give  This  Talk?  
  5. Ne#lix  is  Path-­‐finding   The  Cloud  ecosystem  is  evolving  very  fast   Share  with  and  learn  from  the  cloud  community  
  6. We  want  to  use  clouds,   not  build  them   Cloud  technology  should  be  a  commodity   Public  cloud  and  open  source  for  agility  and  scale  
  7. We  are  looking  for  talent   Ne#lix  wants  to  connect  with  the  very  best   engineers  
  8. Why  Use  AWS?  
  9. We  stopped  building  our  own   datacenters   Capacity  growth  rate  is  acceleraJng,  unpredictable   Product  launch  spikes  -­‐  iPhone,  Wii,  PS3,  XBox   Datacenter  is  large  inflexible  capital  commitment  
  10. Customers   Q3  year/year  +52%  Total  and  +145%  Streaming   18   16   14   12   10   8   6   4   2   0   2009Q2   2009Q3   2009Q4   2010Q1   2010Q2   2010Q3   Source:  h=p://  
  11. Leverage  AWS  Scale   “the  biggest  public  cloud”   AWS  investment  in  tooling  and  automaJon   AWS  zones  for  high  availability,  scalability   AWS  skills  are  common  on  resumes…  
  12. Leverage  AWS  Feature  Set   “two  years  ahead  of  the  others”   EC2,  S3,  SDB,  SQS,  EBS,  EMR,  ELB,  ASG,  IAM,  RDB…  
  13. “The  cloud  lets  its  users  focus   on  delivering  differenAaAng   business  value  instead  of   wasAng  valuable  resources   on  the  undifferen)ated   heavy  li0ing  that  makes   up  most  of  IT   infrastructure.”    Werner  Vogels    Amazon  CTO  
  14. Ne#lix  Deployed  on  AWS   Content   Logs   Play   WWW   API   Video   S3   DRM   Search   Metadata   Masters   EMR   CDN   Movie   Device   EC2   Hadoop   rouJng   Choosing   Config   TV  Movie   S3   Hive   Bookmarks   RaJngs   Choosing   Business   Mobile   CDN   Logging   Similars   Intelligence   iPhone  
  15. Movie  Encoding  farm  (2009)   •  Tens  of  thousands  of  videos   Content   •  Thousands  of  EC2  instances   Video   •  Encoding  apps  on  MS  Windows   Masters   •  ~100  speed/format  permutaJons     •  Petabytes  of  S3   EC2   •  Content  Delivery  Networks   S3   “Ne9lix  is  one  of  the  largest  customers   of  the  biggest  CDNs  Akamai  and   Limelight”   CDN  
  16. Hadoop  -­‐  ElasJc  Map-­‐Reduce  (2009)   •  Web  Access  Logs   Logs   •  Streaming  Service  Logs   S3   •  Terabyte  per  day  scale   •  Easy  Hadoop  via  Amazon  EMR   EMR   •  Hive  SQL  “Data  Mart”   Hadoop   •  Gateway  to  Datacenter  BI   Hive  talks   evamtse  “Ne#lix:  Hive  User  Group”  h=p://   adrianco  “Crunch  Your  Data  In  The  Cloud”  h=p://   Business   Intelligence  
  17. Streaming  Service  Back-­‐end   (early  2010)   •  PC/Mac  Silverlight  Player  Support   Play   •  Highly  available  “play  bu=on”   DRM   •  DRM  Key  Management   CDN   •  Generate  route  to  stream  on  CDN   rouJng   •  Lookup  bookmark  for  user/movie   Bookmarks   •  Update  bookmark  for  user/movie   •  Log  quality  of  service   Logging  
  18. Web  site,  a  page  at  a  Jme   (through  2010)   •  Clean  presentaJon  layer  rewrite   WWW   •  Search  auto-­‐complete   Search   •  Search  backend  and  landing  page   •  Movie  and  genre  choosing   Movie   •  Star  raJngs  and  recommendaJons   Choosing   •  Similar  movies   RaJngs   •  Page  by  page  to  80%  of  views   (leave  account  signup  in  Datacenter  for  now)   Similars  
  19. API  for  TV  devices  and  iPhone  etc.   (2010)   •  REST  API:   API   •  Interfaces  to  everything  else   Metadata   •  TV  Device  ConfiguraJon   •  Personalized  movie  choosing   Device   Config   •  iPhone  Launch  in  the  cloud  only   TV  Movie   Choosing   “Ne9lix  is  an  API  for  streaming  to  TVs   Mobile    (we  also  do  DVD’s  and  a  web  site)”   iPhone  
  20. Ne#lix  EC2  Instances  per  Account   Encoding   Test  and  ProducJon   Log  Analysis  
  21. Learnings…  
  22. Datacenter  oriented  tools  don’t   work   Ephemeral  instances   High  rate  of  change  
  23. Cloud  Tools  Don’t  Scale  for   Enterprise   Too  many  are  “Startup”  oriented   Built  our  own  tools   Drove  vendors  hard  
  24. “fork-­‐li:ed”  apps  don’t  work  well   Fragile   Too  many  datacenter  oriented   assumpJons  
  25. Faster  to  re-­‐code  from  scratch   •  Opportunity  to  pay  down  technical  debt   •  Re-­‐architected  and  re-­‐wrote  most  of  the  code   •  Fine  grain  web  services   •  Leveraged  many  open  source  Java  projects   •  SystemaJcally  instrumented   •  “NoSQL”  SimpleDB  backend  
  26. “In  the  datacenter,  robust  code  is  best   pracAce.  In  the  cloud,  it’s  essenAal.”  
  27. Takeaway   Ne9lix  is  path-­‐finding  the  use  of  public  AWS   cloud  to  replace  in-­‐house  IT  for  non-­‐trivial   applicaAons  with  hundreds  of  developers  and   thousands  of  systems.   (Pause  for  quesJons  before  we  dive  into  details)  
  28. What,  Why  and  How?   The  details…  
  29. Synopsis   •  The  Goals   –  Faster,  Scalable,  Available  and  ProducJve   •  AnJ-­‐pa=erns  and  Cloud  Architecture   –  The  things  we  wanted  to  change  and  why   •  Cloud  Bring-­‐up  Strategy   –  Developer  TransiJons  and  Tools   •  Roadmap  and  Next  Steps  
  30. Goals   •  Faster   –  Lower  latency  than  the  equivalent  datacenter  web  pages  and  API  calls   –  Measured  as  mean  and  99th  percenJle   –  For  both  first  hit  (e.g.  home  page)  and  in-­‐session  hits  for  the  same  user   •  Scalable   –  Avoid  needing  any  more  datacenter  capacity  as  subscriber  count  increases   –  No  central  verJcally  scaled  databases   –  Leverage  AWS  elasJc  capacity  effecJvely   •  Available   –  SubstanJally  higher  robustness  and  availability  than  datacenter  services   –  Leverage  mulJple  AWS  availability  zones   –  No  scheduled  down  Jme,  no  central  database  schema  to  change   •  ProducJve   –  OpJmize  agility  of  a  large  development  team  with  automaJon  and  tools   –  Leave  behind  complex  tangled  datacenter  code  base  (~8  year  old  architecture)   –  Enforce  clean  layered  interfaces  and  re-­‐usable  components  
  31. Cloud  Architecture  Pa=erns   Where  do  we  start?  
  32. Datacenter  AnJ-­‐Pa=erns   What  do  we  currently  do  in  the   datacenter  that  prevents  us  from   meeJng  our  goals?  
  33. Architecture   •  So:ware  Architecture   –  The  abstracJons  and  interfaces  that  developers  build   against   •  Systems  Architecture   –  The  service  instances  that  define  availability,   scalability   •  Compose-­‐ability   –  so:ware  architecture  that  is  independent  of  the   systems  architecture   –  decoupled  flexible  building  block  components    
  34. Rewrite  from  Scratch   Not  everything  is  cloud  specific   Pay  down  technical  debt   Robust  pa=erns  
  35. Old  Datacenter  vs.  New  Cloud  Arch   Central  SQL  Database   Distributed  Key/Value  NoSQL   SJcky  In-­‐Memory  Session   Shared  Memcached  Session   Cha=y  Protocols   Latency  Tolerant  Protocols   Tangled  Service  Interfaces   Layered  Service  Interfaces   Instrumented  Code   Instrumented  Service  Pa=erns   Fat  Complex  Objects   Lightweight  Serializable  Objects   Components  as  Jar  Files   Components  as  Services  
  36. The  Central  SQL  Database   •  Datacenter  has  a  central  database   –  Everything  in  one  place  is  convenient  unJl  it  fails   –  Customers,  movies,  history,  configuraJon   •  Schema  changes  require  downJme   AnA-­‐paTern  impacts  scalability,  availability  
  37. The  Distributed  Key-­‐Value  Store   •  Cloud  has  many  key-­‐value  data  stores   –  More  complex  to  keep  track  of,  do  backups  etc.   –  Each  store  is  much  simpler  to  administer   DBA   –  Joins  take  place  in  java  code   •  No  schema  to  change,  no  scheduled  downJme   •  Latency  for  Memcached  vs.  Oracle  vs.  SimpleDB   –  Memcached  is  dominated  by  network  latency  <1ms   –  Oracle  for  simple  queries  is  a  few  milliseconds   –  SimpleDB  has  replicaJon  and  REST  overheads  >10ms  
  38. The  SJcky  Session   •  Datacenter  SJcky  Load  Balancing   –  Efficient  caching  for  low  latency   –  Tricky  session  handling  code   –  Middle  Jer  load  balancer  has  issues  in  pracJce   •  Encourages  concentrated  funcJonality   –  one  service  that  does  everything   AnA-­‐paTern  impacts  producAvity,  availability  
  39. The  Shared  Session   •  Cloud  Uses  Round-­‐Robin  Load  Balancing   –  Simple  request-­‐based  code   –  External  shared  caching  with  memcached   •  More  flexible  fine  grain  services   –  Works  be=er  with  auto-­‐scaled  instance  counts  
  40. Cha=y  Opaque  and  Bri=le  Protocols   •  Datacenter  service  protocols   –  Assumed  low  latency  for  many  simple  requests   •  Based  on  serializing  exisJng  java  objects   –  Inefficient  formats   –  IncompaJble  when  definiJons  change   AnA-­‐paTern  causes  producAvity,  latency  and   availability  issues  
  41. Robust  and  Flexible  Protocols   •  Cloud  service  protocols   –  JSR311/Jersey  is  used  for  REST/HTTP  service  calls   –  Custom  client  code  includes  service  discovery   –  Support  complex  data  types  in  a  single  request   •  Apache  Avro   –  Evolved  from  Protocol  Buffers  and  Thri:   –  Includes  JSON  header  defining  key/value  protocol   –  Avro  serializaJon  is  half  the  size  and  several  Jmes   faster  than  Java  serializaJon,  more  work  to  code  
  42. Persisted  Protocols   •  Persist  Avro  in  Memcached   –  Save  space/latency  (zigzag  encoding,  half  the  size)   –  Less  bri=le  across  versions   –  New  keys  are  ignored   –  Missing  keys  are  handled  cleanly   •  Avro  protocol  definiJons   –  Can  be  wri=en  in  JSON  or  generated  from  POJOs   –  It’s  hard,  needs  be=er  tooling  
  43. Tangled  Service  Interfaces   •  Datacenter  implementaJon  is  exposed   –  Oracle  SQL  queries  mixed  into  business  logic   •  Tangled  code   –  Deep  dependencies,  false  sharing   •  Data  providers  with  sideways  dependencies   –  Everything  depends  on  everything  else   AnA-­‐paTern  affects  producAvity,  availability  
  44. Untangled  Service  Interfaces   •  New  Cloud  Code  With  Strict  Layering   –  Compile  against  interface  jar   –  Can  use  spring  runJme  binding  to  enforce   •  Service  interface  is  the  service   –  ImplementaJon  is  completely  hidden   –  Can  be  implemented  locally  or  remotely   –  ImplementaJon  can  evolve  independently  
  45. Untangled  Service  Interfaces   Two  layers:   •  SAL  -­‐  Service  Access  Library   –  Basic  serializaJon  and  error  handling   –  REST  or  POJO’s  defined  by  data  provider   •  ESL  -­‐  Extended  Service  Library   –  Caching,  conveniences   –  Can  combine  several  SALs   –  Exposes  faceted  type  system  (described  later)   –  Interface  defined  by  data  consumer  in  many  cases  
  46. Service  InteracJon  Pa=ern   Sample  Swimlane  Diagram  
  47. Service  Architecture  Pa=erns   •  Internal  Interfaces  Between  Services   –  Common  pa=erns  as  templates   –  Highly  instrumented,  observable,  analyJcs   –  Service  Level  Agreements  –  SLAs   •  Library  templates  for  generic  features   –  Instrumented  Ne#lix  Base  Servlet  template   –  Instrumented  generic  client  interface  template   –  Instrumented  S3,  SimpleDB,  Memcached  clients  
  48. CLIENT   Request  Start   Timestamp,   Client   Inbound   Request  End   outbound   deserialize  end   Timestamp   serialize  start   Jmestamp   Jmestamp   Inbound   Client   deserialize   outbound   start   serialize  end   Jmestamp   Jmestamp   Client  network   receive   Jmestamp   Service  Request   Client  Network   send   Jmestamp   Instruments  Every   Service   network  send   Jmestamp   Step  in  the  call   Service   Network   receive   Jmestamp   Service   Service   outbound   inbound   serialize  end   serialize  start   Jmestamp   Jmestamp   Service   Service   outbound   inbound   SERVICE  execute   serialize  start   serialize  end   Jmestamp   request  start   Jmestamp   Jmestamp,   execute  request   end  Jmestamp  
  49. Boundary  Interfaces   •  Isolate  teams  from  external  dependencies   –  Fake  SAL  built  by  cloud  team   –  Real  SAL  provided  by  data  provider  team  later   –  ESL  built  by  cloud  team  using  faceted  objects   •  Fake  data  sources  allow  development  to  start   –  e.g.  Fake  IdenJty  SAL  for  a  test  set  of  customers   –  Development  solidifies  dependencies  early   –  Helps  external  team  provide  the  right  interface  
  50. One  Object  That  Does  Everything   •  Datacenter  uses  a  few  big  complex  objects   –  Movie  and  Customer  objects  are  the  foundaJon   –  Good  choice  for  a  small  team  and  one  instance   –  ProblemaJc  for  large  teams  and  many  instances   •  False  sharing  causes  tangled  dependencies   –  UnproducJve  re-­‐integraJon  work   AnA-­‐paTern  impacAng  producAvity  and   availability  
  51. An  Interface  For  Each  Component   •  Cloud  uses  faceted  Video  and  Visitor   –  Basic  types  hold  only  the  idenJfier   –  Facets  scope  the  interface  you  actually  need   –  Each  component  can  define  its  own  facets   •  No  false-­‐sharing  and  dependency  chains   –  Type  manager  converts  between  facets  as  needed   –  video.asA(PresentaJonVideo)  for  www   –  video.asA(MerchableVideo)  for  middle  Jer  
  52. So:ware  Architecture  Pa=erns   •  Object  Models   –  Basic  and  derived  types,  facets,  serializable   –  Pass  by  reference  within  a  service   –  Pass  by  value  between  services   •  ComputaJon  and  I/O  Models   –  Service  ExecuJon  using  Best  Effort   –  Common  thread  pool  management  
  53. Ne#lix  Systems  Architecture  
  54. API   AWS  EC2   Front  End  ELB   Discovery   Service   API  Proxy   API  etc.   API  ELB   Component   API   SQS   Services   Oracl e   Oracle   Oracle   memcached   memcached   ReplicaJon   EBS   Ne@lix   S3   Data  Center   AWS  Storage   SimpleDB  
  55. Ne#lix  UndifferenJated  Li:ing   •  Middle  Tier  Load   Balancing   •  Discovery  (local  DNS)   •  EncrypJon  Services   •  Caching   •  Distributed  App   Management   We  want  cloud  vendors  to  do  all  this  for  us  as  well!  
  56. Load  Balancing  in  AWS   •  Middle  Jer  currently  not  supported  in  AWS   –  ELB  are  public-­‐facing  only   –  Cannot  apply  security  group  sezngs   •  ELB  verJcal  scalability  for  concentrated  clients   –  Too  few  proxy  IP  addresses  leads  to  hot  spots   •  ELB  needs  support  for  balancing  heurisJcs   –  ProporJonal  balance  across  Availability  Zones   –  Weighted  Least  connecJons,  Weighted  Round  Robin   •  Zone  aware  rouJng   –  Default  to  instances  in  the  same  Availability  Zone   –  Falls  back  to  cross-­‐zone  on  failure  
  57. Discovery   •  Discovery  Service  (Redundant  instances  per  zone)   –  Simple  REST  interface   –  Cloud  apps  register  with  Discovery   •  Apps  send  heartbeats  every  30  sec  to  renew  lease   –  App  evicted  a:er  3  missed  heartbeats   –  Can  re-­‐register  if  the  problem  was  transient   •  Apps  can  store  custom  metadata   –  Version  number,  AMI  id,  Availability  Zone,  etc.   •  So:ware  Round-­‐robin  Load  Balancer   –  Query  Discovery  for  instances  of  specific  applicaJon   –  Baked  into  Ne#lix  REST  client  (JSR311/Jersey  based)   AWS  Middle-­‐)er  ELB  would  eliminate  most  use  cases  
  58. Database  MigraJon   •  Why  SimpleDB?   –  No  DBA’s  in  the  cloud,  Amazon  hosted  service   –  Work  started  two  years  ago,  fewer  viable  opJons   –  Worked  with  Amazon  to  speed  up  and  scale  SimpleDB   •  AlternaJves?   –  InvesJgaJng  adding  Cassandra  and  Membase  to  the  mix   –  Need  several  opJons  to  match  use  cases  well   •  Detailed  SimpleDB  Advice   –  Sid  Anand    -­‐  QConSF  Nov  5th  –  Ne#lix’  TransiJon  to  High   Availability  Storage  Systems   –  Blog  -­‐  h=p://   –  Download  Paper  PDF  -­‐  h=p://  
  59. Oracle  to  SimpleDB   (See  Sid’s  paper  for  details)   •  SimpleDB  Domains   –  De-­‐normalize  mulJple  tables  into  a  single  domain   –  Work  around  size  limits  (10GB  per  domain,  1KB  per  key)   –  Shard  data  across  domains  to  scale   –  Key  –  Use  distributed  sequence  generator,  GUID  or  natural   unique  key  such  as  customer-­‐id     –  Implement  a  schema  validator  to  catch  bad  a=ributes   •  ApplicaJon  layer  support   –  Do  GROUP  BY  and  JOIN  operaJons  in  the  applicaJon   –  Compose  relaJons  in  the  applicaJon  layer   –  Check  constraints  on  read,  and  repair  data  as  a  side  effect   •  Do  without  triggers,  PL/SQL,  clock  operaJons  
  60. Tools  and  AutomaJon   •  Developer  and  Build  Tools   –  Jira,  Eclipse,  Hudson,  Ivy,  ArJfactory   –  Builds,  creates  .war  file,  .rpm,  bakes  AMI  and  launches   •  Custom  Ne#lix  ApplicaJon  Console   –  AWS  Features  at  Enterprise  Scale  (hide  the  keys!)   –  Auto  Scaler  Group  is  unit  of  deployment  to  producJon   •  Open  Source  +  Support   –  Apache,  Tomcat,  OpenJDK,  CentOS   •  Monitoring  Tools   –  Keynote  –  service  monitoring  and  alerJng   –  AppDynamics  –  Developer  focus  for  cloud   –  EpicNMS  –  flexible  data  collecJon  and  plots  h=p://   –  Nimso:  NMS  –  ITOps  focus  for  Datacenter  +  Cloud  alerJng  
  61. Ne#lix  App  Console  
  62. Auto  Scaling  Groups  
  63. ASG  ConfiguraJon  
  64. Monitoring  Tools  
  65. Monitoring  Vision   •  Problem   –  Too  many  tools,  each  with  a  good  reason  to  exist   –  Hard  to  get  an  integrated  view  of  a  problem   –  Too  much  manual  work  building  dashboards   –  Tools  are  not  discoverable,  views  are  not  filtered   •  SoluJon   –  Get  vendors  to  add  deep  linking  and  embedding   –  IntegraJon  “portal”  Jes  everything  together   –  Dynamic  portal  generaJon,  relevant  data,  all  tools  
  66. Cloud  Monitoring  Mechanisms   •  Keynote   –  External  URL  monitoring   •  Amazon  CloudWatch   –  Metrics  for  ELB  and  Instances   •  AppDynamics   –  End  to  end  transacJon  view  showing  resources  used   –  Powerful  real  Jme  debug  tools  for  latency,  CPU  and  Memory   •  Nimso:  NMS   –  Scalable  and  reliable  monitoring  and  alerJng,  integraJon  portal   •  Epic   –  Flexible  and  easy  to  use  to  extend  and  embed  plots   •  Logs   –  High  capacity  logging  and  analysis  framework   –  Hadoop  (log4j  -­‐>  chukwa  -­‐>  EMR)  
  67. Snapshots  for  a  Business  TransacJon   Sort  Call  Graphs  to  Top,  pick  a  slow  one  
  68. Drill  in  to  Slow  Call   Slow  Asynchronous  S3  Write  –  no  big  deal…  
  69. Cloud  Bring-­‐Up  Strategy   Simplest  and  Soonest  
  70. Shadow  Traffic  RedirecJon   •  Early  a=empt  to  send  traffic  to  cloud   –  Real  traffic  stream  to  validate  cloud  back  end   –  Uncovered  lots  of  process  and  tools  issues   –  Uncovered  Service  latency  issues   •  TV  Device  calls  Datacenter  API   –  Returns  Genre/movie  list  for  a  customer   –  Asynchronously  duplicates  request  to  cloud   –  Start  with  send-­‐and-­‐forget  mode,  ignore  response  
  71. Shadow  Redirect  Instances   Modified   Datacenter   Datacenter   Service   Instances   Modified  Cloud   Cloud  Service   One  request  per   Instances   visit   Data  Sources   queueservice   videometadata  
  72. First  Web  Pages  in  the  Cloud  
  73. Starz  Page  
  74. First  Page   •  First  full  page  –  Starz  Channel  Genre   –  Simplest  page,  no  sub-­‐genres,  minimal  personalizaJon   –  Lots  of  investment  in  new  Struts  based  page  design   –  Uses  idenJty  cookie  to  lookup  in  member  info  svc   •  New  “merchweb”  front  end  instance   –  points  to  merchweb  instance   •  Uncovered  lots  of  latency  issues   –  Used  memcached  to  hide  S3  and  SimpleDB  latency   –  Improved  from  slower  to  faster  than  Datacenter  
  75. Starz  Page  Cloud  Instances   Front  End   merchweb   mulJple  requests   Middle  Tier   starz    memcached   per  visit   Data  Sources   queueservice   rentalhistory   videometadata  
  76. Controlled  Cloud  TransiJon   •  WWW  calling  code  chooses  who  goes  to  cloud   –  Filter  out  corner  cases,  send  percentage  of  users   –  The  URL  that  customers  see  is   h=p://   –  If  problem,  redirect  to  old  Datacenter  page   h=p://   •  Play  Bu=on  and  Star  RaJng  AcJon  redirect   –  Point  URLs  for  acJons  that  create/modify  data   back  to  datacenter  to  start  with  
  77. Developers  
  78. Cloud  Developer  Setup   •  Cloud  Boot  Camp   –  Room  full  of  engineers  sharing  the  pain  for  1-­‐2  days   –  Built  a  very  rough  prototype  working  web  site   –  Get  everyone  hands-­‐on  in  the  cloud  with  a  new  code  base   –  Debug  lots  of  tooling  and  conceptual  issues  very  fast   –  Member  info  in  SimpleDB  with  developer’s  accounts  only   •  Cloud  Specific  Key  Setup   –  It’s  a  pain,  need  to  configure  your  IDE’s  JVM   –  Needed  to  integrate  with  AWS  security  model   •  Startup  Guide  Wiki  Pages   –  What  object  facets  already  exist,  how  to  make  your  own   –  What  components  already  exist  or  are  work  in  progress  
  79. Developer  Instances  Collision   Sam  and  Rex  both  want  to  deploy  web  front  end  for   development   Sam   Rex   web  in   test   account  
  80. Per-­‐Service  Namespace  RouJng   Developers  choose  what  to  share   Sam   Rex   Mike   web-­‐sam   web-­‐rex   web-­‐dev   backend-­‐dev   backend-­‐dev   backend-­‐mike  
  81. Developer  Service  Namespaces   •  Developer  specific  service  instances   –  Configured  via  Java  properJes  at  runJme   –  RouJng  implemented  by  REST  client  library   •  Server  ConfiguraJon   –  Configure  discovery  service  version  string   –  Registers  as  <appname>-­‐<namespace>   •  Client  ConfiguraJon   –  Route  traffic  on  per-­‐service  basis  including   namespace  
  82. Current  Status  
  83. WWW  Page  by  Page  during  Q2/Q3/Q4   •  Simplest  possible  page  first   –  Minimal  dependencies   •  Add  pages  as  dependent  services  come  online   •  Home  page  –  most  complex  and  highest  traffic   •  Leave  low  traffic  pages  for  later  cleanup   gradual  migraAon  from  Datacenter  pages  
  84. Big-­‐Bang  TransiJon   •  iPhone  Launch  (August/Sept)   –  No  capacity  in  the  datacenter,  cloud  only   –  App  Store  gates  release,  not  gradual,  can’t  back  out   –  Market  is  huge  (exisJng  and  new  customers)   –  Has  to  work  at  large  scale  on  day  one   •  Datacenter  Shadow  Redirect  Technique   –  Used  to  stress  back-­‐end  and  data  sources   •  SOASTA  Cloud  Based  Load  GeneraJon   –  Used  to  stress  test  API  and  end-­‐to-­‐end  funcJonality  
  85. Current  Work  for  Cloud  Pla#orm   •  Drive  latency  and  availability  goals   –  More  Aggressive  caching   –  Improving  Fault  and  latency  robustness   •  Logging  and  monitoring  portal/dashboards   –  Working  to  integrate  tools  and  data  sources   –  Need  be=er  observability  and  automaJon   •  EvaluaJng  a  range  of  NoSQL  choices   –  Broad  set  of  use  cases,  no  single  winner   –  Good  topic  for  another  talk…  
  86. Wrap  Up  
  87. Next  Few  Years…   •  “System  of  Record”  moves  to  Cloud   –  Master  copies  of  data  live  only  in  the  cloud,  with  backups  etc.   –  Cut  the  datacenter  to  cloud  replicaJon  link   •  InternaJonal  Expansion  –  Global  Clouds   –  Rapid  deployments  to  new  markets   •  GPU  Clouds  opJmized  for  video  encoding   •  Cloud  StandardizaJon   –  Cloud  features  and  APIs  should  be  a  commodity  not  a  differenJator   –  DifferenJate  on  scale  and  quality  of  service   –  CompeJJon  also  drives  cost  down   –  Higher  resilience   –  Higher  scalability   We  would  prefer  to  be  an  insignificant  customer  in  a  giant  cloud  
  88. Remember  the  Goals   Faster   Scalable   Available   ProducJve   Track  progress  against  these  goals  
  89. Takeaway   Ne9lix  is  path-­‐finding  the  use  of  public  AWS   cloud  to  replace  in-­‐house  IT  for  non-­‐trivial   applicaAons  with  hundreds  of  developers  and   thousands  of  systems.   h=p://   @adrianco  #ne#lixcloud