The	  Global	  Ne+lix	  Pla+orm	    A	  Large	  Scale	  Java	  oriented	  PaaS	  running	  on	  AWS	                     O...
Ne9lix	  Inc.	     With	  more	  than	  20	  million	  streaming	  members	  in	  the	     United	  States,	  Canada	  and...
The	  Global	  Ne9lix	  Pla9orm	               Ne9lix	  Cloud	  MigraLon	    Ne9lix	  Pla9orm	  Services	  and	  Interface...
Why	  Use	  Public	  Cloud?	  
Things	  We	  Don’t	  Do	  
Be=er	  Business	  Agility	  
Data	  Center	                  Ne9lix	  could	  not	                                       build	  new	                  ...
Out-­‐Growing	  Data	  Center	               h=p://techblog.ne9lix.com/2011/02/redesigning-­‐ne9lix-­‐api.html   	        ...
Ne9lix.com	  is	  now	  ~100%	  Cloud	      A	  few	  small	  back	  end	  data	  sources	  sLll	  in	  progress	         ...
Ne9lix	  Choice	  was	  AWS	  with	  our	     own	  pla9orm	  and	  tools	       Unique	  pla9orm	  requirements	  and	   ...
Leverage	  AWS	  Scale	     “the	  biggest	  public	  cloud”	   AWS	  investment	  in	  features	  and	  automaLon	  Use	 ...
But	  isn’t	  Amazon	  a	  compeLtor?	  Many	  products	  that	  compete	  with	  Amazon	  run	  on	  AWS	    We	  are	  a...
Could	  Ne9lix	  use	  another	  cloud?	   Would	  be	  nice,	  we	  use	  three	  interchangeable	  CDN	  Vendors	      B...
We	  want	  to	  use	  clouds,	       we	  don’t	  have	  Lme	  to	  build	  them	                               Public	  ...
Ne9lix	  Deployed	  on	  AWS	  Content	          Logs	              Play	              WWW	             API	              ...
Amazon Cloud Terminology Reference     See http://aws.amazon.com/ This is not a full list of Amazon Web Service features• ...
Boot	  Camp	  •  One	  day	  “Ne9lix	  Cloud	  Training”	  class	      –  Has	  been	  run	  5	  Lmes	  for	  20-­‐45	  pe...
Ne9lix	  Built	  a	  PaaS!	  •  Ne9lix	  Cloud	  Systems	  team	  (50+	  rock-­‐stars	  :)	      –  VP	  Cloud	  Systems	 ...
Ne9lix	  Global	  PaaS	  •    Architecture	  Features	  and	  Overview	  •    Portals	  and	  Explorers	  •    Pla9orm	  S...
Global	  PaaS?	              Toys	  are	  nice,	  but	  this	  is	  the	  real	  thing…	  •    Supports	  all	  AWS	  Avai...
Instance	  Architecture	  Linux	  Base	  AMI	  (currently	  Centos	  5)	     OpLonal	     Apache	    frontend,	           ...
Security	  Architecture	  •  Instance	  Level	  Security	  baked	  into	  base	  AMI	      –  Login	  via	  ssh	  only	  a...
Core	  Pla9orm	  Frameworks	  and	  APIs	  
Portals	  and	  Explorers	  •  Ne9lix	  ApplicaLon	  Console	  (NAC)	      –  Primary	  AWS	  provisioning/config	  interfa...
AWS	  Usage	  for	  test,	  carefully	  omi|ng	  any	  $	  numbers…   	  
Cassandra	  Explorer	  
Cassandra	  Explorer	  
Pla9orm	  Services	  •    Discovery	  –	  service	  registry	  for	  “applicaLons”	  •    IntrospecLon	  –	  Entrypoints	 ...
IntrospecLon	  -­‐	  Entrypoints	  •  REST	  API	  for	  tools,	  apps,	  explorers,	  monkeys…	     –  E.g.	  GET	  /REST...
Entrypoints	  Queries	          MongoDB	  is	  good	  for	  low	  traffic	  complex	  queries	  against	  complex	  objects ...
Metrics	  Framework	  •  System	  and	  ApplicaLon	      –  CollecLon,	  AggregaLon,	  Querying	  and	  ReporLng	      –  ...
ConfiguraAon	  Management	  •  Ne9lixConfiguraLon	       –  ValidaLon	  Framework	       –  Sitewide	  ProperLes	  Explorer	...
Interprocess	  CommunicaAon	  •  Discovery	  Service	  registry	  for	  “applicaLons”	      –  “here	  I	  am”	  call	  ev...
Security	  Key	  Management	  •  AKMS	      –  Dynamic	  Key	  Management	  interface	      –  Update	  AWS	  keys	  at	  ...
AWS	  Persistence	  Services	  •  SimpleDB	      –  Got	  us	  started,	  migraLng	  to	  Cassandra	  now	      –  NFSDB	 ...
Ne+lix	  Pla+orm	  Persistence	  •  Eccentric	  VolaLle	  Cache	  –	  evcache	      –  Discovery-­‐aware	  memcached	  bas...
Aside:	  Adrian’s	  Rant	  on	  CAP	  Theorem	  •    Instances	  and	  Networks	  will	  fail	  •    Network	  failure	  =...
Why	  Cassandra?	  •  We	  value	  Availability	  over	  Consistency	  –	  AP	      –  Cassandra	  is	  a	  Java	  distrib...
Priam	  –	  Cassandra	  AutomaLon	                Coming	  soon	  to	  h=p://github.com/ne9lix	  •    Ne9lix	  Pla9orm	  T...
Astyanax	                         Coming	  soon	  to	  h=p://github.com/ne9lix	  •  Cassandra	  java	  client	  •  API	  a...
IniLalizing	  Astyanax	  // Configuration either set in code or nfastyanax.propertiesplatform.ListOfComponentsToInit=LOGGI...
Astyanax	  Query	  Example	  Paginate	  through	  all	  columns	  in	  a	  row	  ColumnList<String>	  columns;	  int	  pag...
Data	  MigraLon	  to	  Cassandra	  
Distributed	  Key-­‐Value	  Stores	  •  Cloud	  has	  many	  key-­‐value	  data	  stores	      –  More	  complex	  to	  ke...
MulA-­‐Regional	  Data	  ReplicaAon	  •  IR	  Framework	  –	  Datacenter	  Item	  Replicator	      –  Built	  in	  2009,	 ...
TransiAonal	  Steps	  •  BidirecLonal	  ReplicaLon	     –  Oracle	  to	  SimpleDB	     –  Queued	  reverse	  path	  using	...
API	  AWS	  EC2	                                              Front	  End	  Load	  Balancer	               Discovery	     ...
Cu|ng	  the	  Umbilical	  •  TransiLon	  Oracle	  Data	  Sources	  to	  Cassandra	      –  Offload	  Datacenter	  Oracle	  h...
API	  AWS	  EC2	                                     Front	  End	  Load	  Balancer	              Discovery	               ...
High	  Availability	  •  Cassandra	  stores	  3	  local	  copies,	  1	  per	  zone	         –  Synchronous	  access,	  dur...
Cassandra	  Write	  Data	  Flows	                           Single	  Region,	  MulLple	  Availability	  Zone	             ...
Data	  Flows	  for	  MulL-­‐Region	  Writes	                                      Consistency	  Level	  =	  Local	  Quorum...
Remote	  Copies	  •  Cassandra	  duplicates	  across	  AWS	  regions	      –  Asynchronous	  write,	  replicates	  at	  de...
Cassandra	  Backup	  •  Full	  Backup	                                                                      Cassandra	    ...
Cassandra	  Restore	  •  Full	  Restore	                                                                   Cassandra	     ...
Cassandra	  Online	  AnalyLcs	  •  Brisk	  =	  Hadoop	  +	  Cass	                                                   Cassan...
Cassandra	  Archive	                       Appropriate	  level	  of	  paranoia	  needed…                	  •  Archive	  co...
Extending	  to	  MulL-­‐Region	                           In	  producLon	  last	  week	  for	  UK/Eire	  support!	  1.    ...
Tools	  and	  AutomaLon	  •  Developer	  and	  Build	  Tools	        –  Jira,	  Perforce,	  Eclipse,	  Jenkins,	  Ivy,	  A...
Developer	  MigraLon	  •  Detailed	  SQL	  to	  NoSQL	  TransiLon	  Advice	     –  Sid	  Anand	  	  -­‐	  QConSF	  Nov	  5...
Cloud	  OperaLons	     Cassandra	  Use	  Cases	  Model	  Driven	  Architecture	  Performance	  and	  Scalability	  
Cassandra	  Use	  Cases	  •  Key	  by	  Customer	  –	  Cross-­‐region	  clusters	       –  Many	  app	  specific	  Cassandr...
Model	  Driven	  Architecture	  •  Datacenter	  PracLces	     –  Lots	  of	  unique	  hand-­‐tweaked	  systems	     –  Har...
Chaos	  Monkey	  •  Make	  sure	  systems	  are	  resilient	      –  Allow	  any	  instance	  to	  fail	  without	  custom...
AppDynamics	  Monitoring	  of	  Cassandra	  –	  AutomaLc	  Discovery	  
Scalability	  TesLng	  •  Cloud	  Based	  TesLng	  –	  fricLonless,	  elasLc	      –  Create/destroy	  any	  sized	  clust...
<DrEvil>ONE	  MILLION</DrEvil>	  
Scale-­‐Up	  Linearity	                          Client	  Writes/s	  by	  node	  count	  –	  ReplicaAon	  Factor	  =	  3	 ...
Per	  Node	  AcLvity	            Per	  Node	               48	  Nodes	         96	  Nodes	         144	  Nodes	           ...
Time	  is	  Money	                                     48	  nodes	        96	  nodes	                  144	  nodes	       ...
Takeaway	                                      	   Ne<lix	  has	  built	  and	  deployed	  a	  scalable	  global	         ...
Global Netflix Platform
Global Netflix Platform
Global Netflix Platform
Global Netflix Platform
Upcoming SlideShare
Loading in...5
×

Global Netflix Platform

110,310

Published on

Slides from QConSF Nov 19th, 2011 focusing this time on describing the globally distributed and scaled industrial strength Java Platform as a Service that Netflix has built and run on top of AWS and Cassandra. Parts of that platform are being released as open source - Curator, Priam and Astyanax.

Published in: Technology, News & Politics
10 Comments
192 Likes
Statistics
Notes
No Downloads
Views
Total Views
110,310
On Slideshare
0
From Embeds
0
Number of Embeds
36
Actions
Shares
0
Downloads
0
Comments
10
Likes
192
Embeds 0
No embeds

No notes for slide

Transcript of "Global Netflix Platform"

  1. 1. The  Global  Ne+lix  Pla+orm   A  Large  Scale  Java  oriented  PaaS  running  on  AWS   October  24th,  2011   Adrian  Cockcro6   @adrianco  #ne9lixcloud   h=p://www.linkedin.com/in/adriancockcro6  
  2. 2. Ne9lix  Inc.   With  more  than  20  million  streaming  members  in  the   United  States,  Canada  and  La8n  America,  Ne<lix,  Inc.   is  the  worlds  leading  Internet  subscrip8on  service  for   enjoying  movies  and  TV  shows.     Interna8onal  Expansion   Ne<lix,  Inc.,  the  leading  global  Internet  movie   subscrip8on  service…  announced  it  will  expand  to  the   United  Kingdom  and  Ireland  in  early  2012.  Source:  h=p://ir.ne9lix.com  
  3. 3. The  Global  Ne9lix  Pla9orm   Ne9lix  Cloud  MigraLon   Ne9lix  Pla9orm  Services  and  Interfaces   Highly  Available  and  Globally   Distributed  Data   Scalability  and  Performance  
  4. 4. Why  Use  Public  Cloud?  
  5. 5. Things  We  Don’t  Do  
  6. 6. Be=er  Business  Agility  
  7. 7. Data  Center   Ne9lix  could  not   build  new   datacenters  fast   enough   Capacity  growth  is  acceleraLng,  unpredictable   Product  launch  spikes  -­‐  iPhone,  Wii,  PS3,  XBox  
  8. 8. Out-­‐Growing  Data  Center   h=p://techblog.ne9lix.com/2011/02/redesigning-­‐ne9lix-­‐api.html   37x  Growth  Jan   2010-­‐Jan  2011  Datacenter  Capacity  
  9. 9. Ne9lix.com  is  now  ~100%  Cloud   A  few  small  back  end  data  sources  sLll  in  progress   All  internaLonal  product  is  cloud  based   USA  specific  logisLcs  remains  in  the  Datacenter   Working  aggressively  on  billing,  PCI  compliance  on  AWS  
  10. 10. Ne9lix  Choice  was  AWS  with  our   own  pla9orm  and  tools   Unique  pla9orm  requirements  and   extreme  scale,  agility  and  flexibility  
  11. 11. Leverage  AWS  Scale   “the  biggest  public  cloud”   AWS  investment  in  features  and  automaLon  Use  AWS  zones  and  regions  for  high  availability,   scalability  and  global  deployment  
  12. 12. But  isn’t  Amazon  a  compeLtor?  Many  products  that  compete  with  Amazon  run  on  AWS   We  are  a  “poster  child”  for  the  AWS  Architecture   Ne9lix  is  one  of  the  biggest  AWS  customers   Strategy  –  turn  compeLtors  into  partners  
  13. 13. Could  Ne9lix  use  another  cloud?   Would  be  nice,  we  use  three  interchangeable  CDN  Vendors   But  no-­‐one  else  has  the  scale  and  features  of  AWS   You  have  to  be  this  tall  to  ride  this  ride…   Maybe  in  2-­‐3  years?  
  14. 14. We  want  to  use  clouds,   we  don’t  have  Lme  to  build  them   Public  cloud  for  agility  and  scale  We  use  electricity  too,  but  don’t  want  to  build  our  own  power  staLon…  AWS  because  they  are  big  enough  to  allocate  thousands  of  instances  per   hour  when  we  need  to  
  15. 15. Ne9lix  Deployed  on  AWS  Content   Logs   Play   WWW   API   CS   Video   InternaLonal   Masters   S3   DRM   Sign-­‐Up   Metadata   CS  lookup   Device   DiagnosLcs   EC2   EMR  Hadoop   CDN  rouLng   Search   Config   &  AcLons   Movie   TV  Movie   Customer   S3   Hive   Bookmarks   Choosing   Choosing   Call  Log   Business   Social   CDNs   Logging   RaLngs   Facebook   CS  AnalyLcs   Intelligence  
  16. 16. Amazon Cloud Terminology Reference See http://aws.amazon.com/ This is not a full list of Amazon Web Service features•  AWS  –  Amazon  Web  Services  (common  name  for  Amazon  cloud)  •  AMI  –  Amazon  Machine  Image  (archived  boot  disk,  Linux,  Windows  etc.  plus  applicaLon  code)  •  EC2  –  ElasLc  Compute  Cloud   –  Range  of  virtual  machine  types  m1,  m2,  c1,  cc,  cg.  Varying  memory,  CPU  and  disk  configuraLons.   –  Instance  –  a  running  computer  system.  Ephemeral,  when  it  is  de-­‐allocated  nothing  is  kept.   –  Reserved  Instances  –  pre-­‐paid  to  reduce  cost  for  long  term  usage   –  Availability  Zone  –  datacenter  with  own  power  and  cooling  hosLng  cloud  instances   –  Region  –  group  of  Availability  Zones  –  US-­‐East,  US-­‐West,  EU-­‐Eire,  Asia-­‐Singapore,  Asia-­‐Japan,  US-­‐Gov  •  ASG  –  Auto  Scaling  Group  (instances  booLng  from  the  same  AMI)  •  S3  –  Simple  Storage  Service  (h=p  access)  •  EBS  –  ElasLc  Block  Storage  (network  disk  filesystem  can  be  mounted  on  an  instance)  •  RDS  –  RelaLonal  Database  Service  (managed  MySQL  master  and  slaves)  •  SDB  –  Simple  Data  Base  (hosted  h=p  based  NoSQL  data  store)  •  SQS  –  Simple  Queue  Service  (h=p  based  message  queue)  •  SNS  –  Simple  NoLficaLon  Service  (h=p  and  email  based  topics  and  messages)  •  EMR  –  ElasLc  Map  Reduce  (automaLcally  managed  Hadoop  cluster)  •  ELB  –  ElasLc  Load  Balancer  •  EIP  –  ElasLc  IP  (stable  IP  address  mapping  assigned  to  instance  or  ELB)  •  VPC  –  Virtual  Private  Cloud  (extension  of  enterprise  datacenter  network  into  cloud)  •  IAM  –  IdenLty  and  Access  Management  (fine  grain  role  based  security  keys)  
  17. 17. Boot  Camp  •  One  day  “Ne9lix  Cloud  Training”  class   –  Has  been  run  5  Lmes  for  20-­‐45  people  each  Lme  •  Half  day  of  presentaLons  •  Half  day  hands-­‐on   –  Create  your  own  hello  world  app   –  Launch  in  AWS  test  account   –  Login  to  your  cloud  instances   –  Find  monitoring  data  on  your  cloud  instances   –  Connect  to  Cassandra  and  read/write  data  
  18. 18. Ne9lix  Built  a  PaaS!  •  Ne9lix  Cloud  Systems  team  (50+  rock-­‐stars  :)   –  VP  Cloud  Systems  (Yury  Izrailevsky)   –  Site  Reliability  Engineering  (@jedberg)  Hiring++!   –  Cloud  Performance  (Denis  Sheahan)   –  Database  Engineering  -­‐  Cassandra+MySQL  (@r39132)     –  Pla9orm  Engineering  –  Astyanax  (Eran  Landau)   –  Cloud  Tools  Engineering  –  Jenkins  (@cquinn)   –  Cloud  SoluLons  Team  –  Monkeys  (@atseitlin)   –  Security  (Jason  Chan)   –  Architecture  (@adrianco)  
  19. 19. Ne9lix  Global  PaaS  •  Architecture  Features  and  Overview  •  Portals  and  Explorers  •  Pla9orm  Services  •  Pla9orm  APIs  •  Pla9orm  Frameworks  •  Persistence  •  Scalability  Benchmark  
  20. 20. Global  PaaS?   Toys  are  nice,  but  this  is  the  real  thing…  •  Supports  all  AWS  Availability  Zones  and  Regions  •  Supports  mulLple  AWS  accounts  {test,  prod,  etc.}  •  Cross  Region/Acct  Data  ReplicaLon  and  Archiving  •  InternaLonalized,  Localized  and  GeoIP  rouLng  •  Security  is  fine  grain,  dynamic  AWS  keys  •  Autoscaling  to  thousands  of  instances  •  Monitoring  for  millions  of  metrics  •  20M+  users  USA,  Canada,  LaLn  America  (UK,  Eire)  
  21. 21. Instance  Architecture  Linux  Base  AMI  (currently  Centos  5)   OpLonal   Apache   frontend,   Java  (choice  of  JDK  6  or  7)  memcached,  non-­‐java  apps   Tomcat   AppDynamics   appagent   Monitoring   Log  rotaLon   ApplicaLon  servlet,  base   to  S3   Healthcheck,  status   GC  and  thread   server,  pla9orm,  interface  AppDynamics   servlets,  JMX  interface   dump  logging   jars  for  dependent  services  machineagent   Epic    
  22. 22. Security  Architecture  •  Instance  Level  Security  baked  into  base  AMI   –  Login  via  ssh  only  allowed  via  portal   –  Each  app  type  runs  as  its  own  userid  app{test|prod}  •  AWS  Security,  IdenLty  and  Access  Management   –  Each  app  has  its  own  security  group  (firewall  ports)   –  Fine  grain  user  roles  and  resource  ACLs  •  Key  Management   –  AWS  Keys  dynamically  provisioned,  easy  updates   –  High  grade  app  key  management  support  
  23. 23. Core  Pla9orm  Frameworks  and  APIs  
  24. 24. Portals  and  Explorers  •  Ne9lix  ApplicaLon  Console  (NAC)   –  Primary  AWS  provisioning/config  interface  •  AWS  Usage  Analyzer   –  Breaks  down  costs  by  applicaLon  and  resource  •  SimpleDB  Explorer   –  Browse  domains,  items,  a=ributes,  values  •  Cassandra  Explorer   –  Browse  clusters,  keyspaces,  column  families  •  Base  Server  Explorer   –  Browse  service  endpoints  configuraLon,  perf  
  25. 25. AWS  Usage  for  test,  carefully  omi|ng  any  $  numbers…  
  26. 26. Cassandra  Explorer  
  27. 27. Cassandra  Explorer  
  28. 28. Pla9orm  Services  •  Discovery  –  service  registry  for  “applicaLons”  •  IntrospecLon  –  Entrypoints  •  Cryptex  –  Dynamic  security  key  management  •  Geo  –  Geographic  IP  lookup  •  Pla9ormservice  –  Dynamic  property  configuraLon  •  LocalizaLon  –  manage  and  lookup  local  translaLons  •  Evcache  –  eccentric  volaLle  (mem)cached  •  Cassandra  –  Persistence  •  Zookeeper  -­‐  CoordinaLon  •  Various  proxies  –  access  to  old  datacenter  stuff  
  29. 29. IntrospecLon  -­‐  Entrypoints  •  REST  API  for  tools,  apps,  explorers,  monkeys…   –  E.g.  GET  /REST/v1/instance/$INSTANCE_ID  •  AWS  Resources   –  Autoscaling  Groups,  EIP  Groups,  Instances  •  Ne9lix  PaaS  Resources   –  Discovery  ApplicaLons,  Clusters  of  ASGs,  History  
  30. 30. Entrypoints  Queries   MongoDB  is  good  for  low  traffic  complex  queries  against  complex  objects  DescripAon   Range  expression  Find  all  acLve  instances.     all()  Find  all  instances  associated  with  a  group   %(cloudmonkey)  name.  Find  all  instances  associated  with  a   /^cloudmonkey$/discovery()  discovery  group.    Find  all  auto  scale  groups  with  no  instances.   asg(),-­‐has(INSTANCES;asg())  How  many  instances  are  not  in  an  auto   count(all(),-­‐info(eval(INSTANCES;asg())))    scale  group?  What  groups  include  an  instance?   *(i-­‐4e108521)  What  auto  scale  groups  and  elasLc  load   filter(TYPE;asg,elb;*(i-­‐4e108521))  balancers  include  an  instance?  What  instance  has  a  given  public  ip?   filter(PUBLIC_IP;174.129.188.{0..255};all())  
  31. 31. Metrics  Framework  •  System  and  ApplicaLon   –  CollecLon,  AggregaLon,  Querying  and  ReporLng   –  Non-­‐blocking  logging,  avoids  log4j  lock  contenLon   –  Chukwa  -­‐>  S3  -­‐>  EMR  -­‐>  Hive  •  Performance,  Robustness,  Monitoring,  Analysis   –  Tracers,  Counters  –  explicit  code  instrumentaLon  log   –  Real  Time  Tracers/Counters   –  SLA  –  service  level  response  Lme  percenLles   –  Epic  (@MonitoredResources)  annotated  JMX  extract  •  Latency  Monkey  Infrastructure   –  Inject  random  delays  into  service  responses  
  32. 32. ConfiguraAon  Management  •  Ne9lixConfiguraLon   –  ValidaLon  Framework   –  Sitewide  ProperLes  Explorer  •  Pla9ormService  •  Mapping  Service  •  ZooKeeper  (Curator)  •  InstanceIdenLty  
  33. 33. Interprocess  CommunicaAon  •  Discovery  Service  registry  for  “applicaLons”   –  “here  I  am”  call  every  30s,  drop  a6er  3  missed   –  “where  is  everyone”  call   –  Redundant,  distributed,  moving  to  Zookeeper  •  NIWS  –  Ne9lix  Internal  Web  Service  client   –  So6ware  Middle  Tier  Load  Balancer   –  Failure  retry  moves  to  next  instance   –  Many  opLons  for  encoding,  etc.  
  34. 34. Security  Key  Management  •  AKMS   –  Dynamic  Key  Management  interface   –  Update  AWS  keys  at  runLme,  no  restart   –  All  keys  stored  securely,  none  on  disk  or  in  AMI  •  Cryptex  -­‐  Flexible  key  store   –  Low  grade  keys  processed  in  client   –  Medium  grade  keys  processed  by  Cryptex  service   –  High  grade  keys  processed  by  hardware  (Ingrian)  
  35. 35. AWS  Persistence  Services  •  SimpleDB   –  Got  us  started,  migraLng  to  Cassandra  now   –  NFSDB  -­‐  Instrumented  wrapper  library   –  Domain  and  Item  sharding  (workarounds)  •  S3   –  Upgraded/Instrumented  JetS3t  based  interface   –  Supports  mulLpart  upload  and  large  files   –  Global  S3  endpoint  management  
  36. 36. Ne+lix  Pla+orm  Persistence  •  Eccentric  VolaLle  Cache  –  evcache   –  Discovery-­‐aware  memcached  based  backend   –  Client  abstracLons  for  zone  aware  replicaLon   –  OpLon  to  write  to  all  zones,  fast  read  from  local  •  Cassandra   –  Highly  available  and  scalable  (more  later…)  •  MongoDB   –  Complex  object/query  model  for  small  scale  use  •  MySQL   –  Hard  to  scale,  legacy  and  small  relaLonal  models  
  37. 37. Aside:  Adrian’s  Rant  on  CAP  Theorem  •  Instances  and  Networks  will  fail  •  Network  failure  =  ParLLon  “P”  is  a  given  •  Distributed  Systems:  two  choices  –  CP  or  AP  •  “Vendor  claims  CA”   –  Usually  they  mean  available  when  instances  fail  •  Master-­‐Slave  =  Consistent  when  ParLLoned   –  You  can’t  write  unless  you  can  see  the  master  •  Quorum  =  Available  when  ParLLoned   –  Writes  proceed,  conflicts  will  be  patched  up  later  
  38. 38. Why  Cassandra?  •  We  value  Availability  over  Consistency  –  AP   –  Cassandra  is  a  Java  distributed  systems  toolkit  •  We  have  a  building  full  of  Java  engineers   –  Riak  is  in  Erlang  –  a  blessing  and  a  curse…  •  We  want  FOSS  +  Support   –  Voldemort  doesn’t  have  a  support  model  •  Writes  are  intrinsically  harder  than  reads   –  Hbase  is  opLmized  for  reads,  Cassandra  for  writes  •  We  tested  Cassandra  and  it  works  for  us   –  Step  by  step  into  full  producLon  over  the  last  year  
  39. 39. Priam  –  Cassandra  AutomaLon   Coming  soon  to  h=p://github.com/ne9lix  •  Ne9lix  Pla9orm  Tomcat  Code  •  Zero  touch  auto-­‐configuraLon  •  State  management  for  Cassandra  JVM  •  Token  allocaLon  and  assignment  •  Broken  node  auto-­‐replacement  •  Full  and  incremental  backup  to  S3  •  Restore  sequencing  from  S3  
  40. 40. Astyanax   Coming  soon  to  h=p://github.com/ne9lix  •  Cassandra  java  client  •  API  abstracLon  on  top  of  Thri6  protocol  •  “Fixed”  ConnecLon  Pool  abstracLon  (vs.  Hector)   –  Round  robin  with  Failover   –  Retry-­‐able  operaLons  not  Led  to  a  connecLon   –  Discovery  integraLon   –  Host  reconnect  (fixed  interval  or  exponenLal  backoff)   –  Token  aware  (in  development)  to  save  a  network  hop  •  Ne9lix  style  configuraLon  (INFLibrary)  •  Batch  mutaLon:  set,  put,  delete,  increment  •  Simplified  use  of  serializers  via  method  overloading  (vs.  Hector)  •  ConnecLonPoolMonitor  interface  for  counters  and  tracers  •  Composite  Column  Names  replacing  deprecated  SuperColumns  
  41. 41. IniLalizing  Astyanax  // Configuration either set in code or nfastyanax.propertiesplatform.ListOfComponentsToInit=LOGGING,APPINFO,DISCOVERYnetflix.environment=testdefault.astyanax.readConsistency=CL_QUORUMdefault.astyanax.writeConsistency=CL_QUORUMMyCluster.MyKeyspace.astyanax.servers=127.0.0.1// Must initialize platform for discovery to workNFLibraryManager.initLibrary(PlatformManager.class, props, false, true);NFLibraryManager.initLibrary(NFAstyanaxManager.class, props, true, false);// Open a keyspace instanceKeyspace keyspace = KeyspaceFactory.openKeyspace(”MyCluster”,”MyKeyspace");
  42. 42. Astyanax  Query  Example  Paginate  through  all  columns  in  a  row  ColumnList<String>  columns;  int  pageize  =  10;  try  {          RowQuery<String,  String>  query  =  keyspace                  .prepareQuery(CF_STANDARD1)                  .getKey("A")                  .setIsPaginaLng()                  .withColumnRange(new  RangeBuilder().setMaxSize(pageize).build());                                      while  (!(columns  =  query.execute().getResult()).isEmpty())  {                  for  (Column<String>  c  :  columns)  {                  }          }  }  catch  (ConnecLonExcepLon  e)  {  }      
  43. 43. Data  MigraLon  to  Cassandra  
  44. 44. Distributed  Key-­‐Value  Stores  •  Cloud  has  many  key-­‐value  data  stores   –  More  complex  to  keep  track  of,  do  backups  etc.   –  Each  store  is  much  simpler  to  administer   DBA   –  Joins  take  place  in  java  code  •  No  schema  to  change,  no  scheduled  downLme  •  Latency  for  typical  queries   –  Memcached  is  dominated  by  network  latency  <1ms   –  Cassandra  takes  a  few  milliseconds   –  SimpleDB  replicaLon  and  REST  auth  overheads  >10ms  
  45. 45. MulA-­‐Regional  Data  ReplicaAon  •  IR  Framework  –  Datacenter  Item  Replicator   –  Built  in  2009,  first  step  to  the  cloud   –  Oracle  to  SimpleDB  or  Cassandra  via  poll  and  push   –  Return  updates  to  Oracle  via  SQS  message  queue  •  SimpleDB  or  S3  to  Cassandra   –  Data  migraLon  tool  for  global  Ne9lix  •  Global  SimpleDB  and  S3  ReplicaLon   –  Cross  region  async  updates  USA  to  Europe  
  46. 46. TransiAonal  Steps  •  BidirecLonal  ReplicaLon   –  Oracle  to  SimpleDB   –  Queued  reverse  path  using  SQS   –  Backups  remain  in  Datacenter  via  Oracle  •  New  Cloud-­‐Only  Data  Sources   –  Cassandra  based   –  No  replicaLon  to  Datacenter   –  Backups  performed  in  the  cloud  
  47. 47. API  AWS  EC2   Front  End  Load  Balancer   Discovery   Service   API  Proxy   API  etc.   Load  Balancer   Component   API   SQS   Services   Oracl e   Oracle   Oracle  Cassandra   memcached   ReplicaLon   memcached   EC2   Internal   Disks   Ne+lix   S3   Data  Center   SimpleDB  
  48. 48. Cu|ng  the  Umbilical  •  TransiLon  Oracle  Data  Sources  to  Cassandra   –  Offload  Datacenter  Oracle  hardware   –  Free  up  capacity  for  growth  of  remaining  services  •  TransiLon  SimpleDB+Memcached  to  Cassandra   –  Primary  data  sources  that  need  backup   –  Keep  simplest  small  use  cases  for  now  •  New  challenges   –  Backup,  restore,  archive,  business  conLnuity   –  Business  Intelligence  integraLon  
  49. 49. API  AWS  EC2   Front  End  Load  Balancer   Discovery   Service   API  Proxy   Load  Balancer   Component   API   Services   memcached   Cassandra   EC2   Internal   Disks   Backup   S3   SimpleDB  
  50. 50. High  Availability  •  Cassandra  stores  3  local  copies,  1  per  zone   –  Synchronous  access,  durable,  highly  available   –  Read/Write  One  fastest,  least  consistent  -­‐  ~1ms   –  Read/Write  Quorum  2  of  3,  consistent  -­‐  ~3ms  •  AWS  Availability  Zones   –  Separate  buildings   –  Separate  power  etc.   –  Fairly  close  together    
  51. 51. Cassandra  Write  Data  Flows   Single  Region,  MulLple  Availability  Zone   Cassandra   • Disks   • Zone  A   2   2   4   2  1.  Client  Writes  to  any   Cassandra  3   3   Cassandra   If  a  node  goes  offline,   Cassandra  Node   • Disks   5 • Disks   5   hinted  handoff  2.  Coordinator  Node   • Zone  C   1 • Zone  A   completes  the  write   replicates  to  nodes   when  the  node  comes   and  Zones   back  up.  3.  Nodes  return  ack  to   Clients     coordinator   Requests  can  choose  to  4.  Coordinator  returns   3   wait  for  one  node,  a   Cassandra   Cassandra   ack  to  client   • Disks   • Disks   5   quorum,  or  all  nodes  to  5.  Data  wri=en  to   • Zone  C   • Zone  B   ack  the  write   internal  commit  log     disk   Cassandra   SSTable  disk  writes  and   • Disks   • Zone  B   compacLons  occur   asynchronously  
  52. 52. Data  Flows  for  MulL-­‐Region  Writes   Consistency  Level  =  Local  Quorum  1.  Client  Writes  to  any   If  a  node  or  region  goes  offline,  hinted  handoff   Cassandra  Node   completes  the  write  when  the  node  comes  back  up.  2.  Coordinator  node  replicates   Nightly  global  compare  and  repair  jobs  ensure   to  other  nodes  Zones  and   everything  stays  consistent.   regions  3.  Local  write  acks  returned  to   coordinator   100+ms  latency   Cassandra   2 7  4.  Client  gets  ack  when  2  of  3   Cassandra   •  Disks   •  Disks   8   2   2   •  Zone  A   4   2   6   6   •  Zone  A   local  nodes  are  commi=ed   Cassandra   3   3   Cassandra   7   Cassandra   Cassandra   5   5  5.  Data  wri=en  to  internal   8   •  Disks   •  Disks   •  Disks   •  Disks   •  Zone  C   •  Zone  A   •  Zone  C   •  Zone  A   1   commit  log  disks   US   EU  6.  When  data  arrives,  remote   Clients   Clients   Cassandra   3   Cassandra   Cassandra   7   Cassandra   node  replicates  data   •  Disks   •  Zone  C   •  Disks   •  Zone  B   5   •  Disks   •  Zone  C   •  Disks   •  Zone  B   8  7.  Ack  direct  to  source  region   Cassandra   Cassandra   coordinator   •  Disks   •  Disks   •  Zone  B   •  Zone  B  8.  Remote  copies  wri=en  to   commit  log  disks  
  53. 53. Remote  Copies  •  Cassandra  duplicates  across  AWS  regions   –  Asynchronous  write,  replicates  at  desLnaLon   –  Doesn’t  directly  affect  local  read/write  latency  •  Global  Coverage   –  Business  agility   –  Follow  AWS…  •  Local  Access   3 3 –  Be=er  latency   3 3 –  Fault  IsolaLon    
  54. 54. Cassandra  Backup  •  Full  Backup   Cassandra   Cassandra   Cassandra   –  Time  based  snapshot   –  SSTable  compress  -­‐>  S3   Cassandra   Cassandra  •  Incremental   S3   Backup   Cassandra   Cassandra   –  SSTable  write  triggers   compressed  copy  to  S3   Cassandra   Cassandra   Cassandra   Cassandra  
  55. 55. Cassandra  Restore  •  Full  Restore   Cassandra   Cassandra   Cassandra   –  Replace  previous  data  •  New  Ring  from  Backup   Cassandra   Cassandra   –  New  name  old  data   S3   Backup   Cassandra   Cassandra  •  Scripted   –  Create  new  instances   Cassandra   Cassandra   –  Parallel  load  -­‐  fast   Cassandra   Cassandra  
  56. 56. Cassandra  Online  AnalyLcs  •  Brisk  =  Hadoop  +  Cass   Cassandra   Brisk   Cassandra   –  Use  split  Brisk  ring   –  Size  each  separately   Brisk   Cassandra  •  Direct  Access   S3   Backup   Cassandra   Cassandra   –  Keyspaces   –  Hive/Pig/Map-­‐Reduce   Cassandra   Cassandra   –  Hdfs  as  a  keyspace   Cassandra   Cassandra   –  Distributed  namenode  
  57. 57. Cassandra  Archive   Appropriate  level  of  paranoia  needed…  •  Archive  could  be  un-­‐readable   –  Restore  S3  backups  weekly  from  prod  to  test  •  Archive  could  be  stolen   –  PGP  Encrypt  archive  •  AWS  East  Region  could  have  a  problem   –  Copy  data  to  AWS  West  •  ProducLon  AWS  Account  could  have  an  issue   –  Separate  Archive  account  with  no-­‐delete  S3  ACL  •  AWS  S3  could  have  a  global  problem   –  Create  an  extra  copy  on  a  different  cloud  vendor  
  58. 58. Extending  to  MulL-­‐Region   In  producLon  last  week  for  UK/Eire  support!  1.  Create  cluster  in  EU   Take  a  Boeing  737  on  a  domesLc  flight,  upgrade  it  to   a  747  by  adding  more  engines  and  fly  it  to  Europe  2.  Backup  US  cluster  to  S3   without  landing  it  on  the  way…  3.  Restore  backup  in  EU  4.  Local  repair  EU  cluster  5.  Global  repair/join   Cassandra   100+ms  latency   Cassandra   1   •  Disks   •  Disks   •  Zone  A   •  Zone  A   Cassandra   Cassandra   Cassandra   Cassandra   •  Disks   •  Disks   •  Disks   •  Disks   •  Zone  C   •  Zone  A   •  Zone  C   •  Zone  A   US   5   EU   Clients   Clients   Cassandra   Cassandra   Cassandra   Cassandra   •  Disks   •  Disks   •  Disks   •  Disks   •  Zone  C   •  Zone  B   •  Zone  C   •  Zone  B   Cassandra   Cassandra   •  Disks   •  Disks   •  Zone  B   3   •  Zone  B   4   2   S3  
  59. 59. Tools  and  AutomaLon  •  Developer  and  Build  Tools   –  Jira,  Perforce,  Eclipse,  Jenkins,  Ivy,  ArLfactory   –  Builds,  creates  .war  file,  .rpm,  bakes  AMI  and  launches  •  Custom  Ne9lix  ApplicaLon  Console   –  AWS  Features  at  Enterprise  Scale  (hide  the  AWS  security  keys!)   –  Auto  Scaler  Group  is  unit  of  deployment  to  producLon  •  Open  Source  +  Support   –  Apache,  Tomcat,  Cassandra,  Hadoop,  OpenJDK,  CentOS   –  Datastax  support  for  Cassandra,  AWS  support  for  Hadoop  via  EMR  •  Monitoring  Tools   –  Datastax  Opscenter  for  monitoring  Cassandra   –  AppDynamics  –  Developer  focus  for  cloud  h=p://appdynamics.com  
  60. 60. Developer  MigraLon  •  Detailed  SQL  to  NoSQL  TransiLon  Advice   –  Sid  Anand    -­‐  QConSF  Nov  5th  –  Ne9lix’  TransiLon   to  High  Availability  Storage  Systems   –  Blog  -­‐  h=p://pracLcalcloudcompuLng.com/   –  Download  Paper  PDF  -­‐  h=p://bit.ly/bhOTLu  •  Mark  Atwood,  "Guide  to  NoSQL,  redux”   –  YouTube  h=p://youtu.be/zAbFRiyT3LU  
  61. 61. Cloud  OperaLons   Cassandra  Use  Cases  Model  Driven  Architecture  Performance  and  Scalability  
  62. 62. Cassandra  Use  Cases  •  Key  by  Customer  –  Cross-­‐region  clusters   –  Many  app  specific  Cassandra  clusters,  read-­‐intensive   –  Keys+Rows  in  memory  using  m2.4xl  Instances  •  Key  by  Customer:Movie  –  e.g.  Viewing  History   –  Growing  fast,  write  intensive  –  m1.xl  instances   –  Keys  cached  in  memory,  one  cluster  per  region  •  Large  scale  data  logging  –  lots  of  writes   –  Column  data  expires  a6er  Lme  period   –  Distributed  counters,  one  cluster  per  region  
  63. 63. Model  Driven  Architecture  •  Datacenter  PracLces   –  Lots  of  unique  hand-­‐tweaked  systems   –  Hard  to  enforce  pa=erns  •  Model  Driven  Cloud  Architecture   –  Perforce/Ivy/Jenkins  based  builds  for  everything   –  Every  producLon  instance  is  a  pre-­‐baked  AMI   –  Every  applicaLon  is  managed  by  an  Autoscaler   Every  change  is  a  new  AMI  
  64. 64. Chaos  Monkey  •  Make  sure  systems  are  resilient   –  Allow  any  instance  to  fail  without  customer  impact  •  Chaos  Monkey  hours   –  Monday-­‐Thursday  9am-­‐3pm  random  instance  kill  •  ApplicaLon  configuraLon  opLon   –  Apps  now  have  to  opt-­‐out  from  Chaos  Monkey  •  Computers  (Datacenter  or  AWS)  randomly  die   –  Fact  of  life,  but  too  infrequent  to  test  resiliency  
  65. 65. AppDynamics  Monitoring  of  Cassandra  –  AutomaLc  Discovery  
  66. 66. Scalability  TesLng  •  Cloud  Based  TesLng  –  fricLonless,  elasLc   –  Create/destroy  any  sized  cluster  in  minutes   –  Many  test  scenarios  run  in  parallel  •  Test  Scenarios   –  Internal  app  specific  tests   –  Simple  “stress”  tool  provided  with  Cassandra  •  Scale  test,  keep  making  the  cluster  bigger   –  Check  that  tooling  and  automaLon  works…   –  How  many  ten  column  row  writes/sec  can  we  do?  
  67. 67. <DrEvil>ONE  MILLION</DrEvil>  
  68. 68. Scale-­‐Up  Linearity   Client  Writes/s  by  node  count  –  ReplicaAon  Factor  =  3  1200000   1099837  1000000   800000   600000   537172   400000   366828   200000   174373   0   0   50   100   150   200   250   300   350  
  69. 69. Per  Node  AcLvity   Per  Node   48  Nodes   96  Nodes   144  Nodes   288  Nodes  Per  Server  Writes/s   10,900  w/s   11,460  w/s   11,900  w/s   11,456  w/s  Mean  Server  Latency   0.0117  ms   0.0134  ms   0.0148  ms   0.0139  ms  Mean  CPU  %Busy   74.4  %   75.4  %   72.5  %   81.5  %  Disk  Read   5,600  KB/s   4,590  KB/s   4,060  KB/s   4,280  KB/s  Disk  Write   12,800  KB/s   11,590  KB/s   10,380  KB/s   10,080  KB/s  Network  Read   22,460  KB/s   23,610  KB/s   21,390  KB/s   23,640  KB/s  Network  Write   18,600  KB/s   19,600  KB/s   17,810  KB/s   19,770  KB/s   Node  specificaLon  –  Xen  Virtual  Images,  AWS  US  East,  three  zones   •  Cassandra  0.8.6,  CentOS,  SunJDK6   •  AWS  EC2  m1  Extra  Large  –  Standard  price  $  0.68/Hour   •  15  GB  RAM,  4  Cores,  1Gbit  network   •  4  internal  disks  (total  1.6TB,  striped  together,  md,  XFS)  
  70. 70. Time  is  Money   48  nodes   96  nodes   144  nodes   288  nodes  Writes  Capacity   174373  w/s   366828  w/s   537172  w/s   1,099,837  w/s  Storage  Capacity   12.8  TB   25.6  TB   38.4  TB   76.8  TB  Nodes  Cost/hr   $32.64   $65.28   $97.92   $195.84  Test  Driver  Instances   10   20   30   60  Test  Driver  Cost/hr   $20.00   $40.00   $60.00   $120.00  Cross  AZ  Traffic   5  TB/hr   10  TB/hr   15  TB/hr   301  TB/hr  Traffic  Cost/10min   $8.33   $16.66   $25.00   $50.00  Setup  DuraLon   15  minutes   22  minutes   31  minutes   662  minutes  AWS  Billed  DuraLon   1hr   1hr   1  hr   2  hr  Total  Test  Cost   $60.97   $121.94   $182.92   $561.68   1  EsLmate  two  thirds  of  total  network  traffic     2  Workaround  for  a  tooling  bug  slowed  setup  
  71. 71. Takeaway     Ne<lix  has  built  and  deployed  a  scalable  global   Pla<orm  as  a  Service.    Also,  benchmarking  in  the  cloud  is  fast,  cheap  and   scalable     h=p://www.linkedin.com/in/adriancockcro6   @adrianco  #ne9lixcloud   acockcro6@ne9lix.com  

×