OSCON Data 2011 -- NoSQL @ Netflix, Part 2


This is my OSCON Data 2011 slide presentation titled "NoSQL @ Netflix". I

Published in: Technology, News & Politics

  1. 1. Part 2 @r39132
  2. 2. Announcements
      - A special thanks to:
        - The OSCON organizers (e.g. Shirley Bailes)
      - Other speakers from Netflix at OSCON:
        - Adrian Cockcroft: Keynote, OSCON Data, Tuesday
        - Daniel Jacobson: API, OSCON, Wed
        - Matt McCarthy/Kim Trott: Webkit, OSCON, Fri
  3. 3. Big Data is Boring!
  4. 4. Big Data is Boring!
      - Our single largest stored data set is our Tomcat logs
        - Hundreds of petabytes
        - Stored on S3
        - Never read or accessed after writing
      - Why do we have so much data?
        - Cumbersome to delete data from S3
      - This is Big Data... and boring!
  5. 5. Fast, Available, Customer-facing, & Global Big Data is What We Do!
  6. 6. Motivation
      - Circa late 2008, Netflix had a single data center
        - Single point of failure (a.k.a. SPOF)
        - Approaching limits on cooling, power, space, traffic capacity
      - Alternatives
        - Build more data centers
        - Outsource the majority of our capacity planning and scale out
          - Allows us to focus on core competencies
  7. 7. Out-Growing Data Center
      [Chart: 37x growth, Jan 2010 to Jan 2011; datacenter capacity]
  8. 8. Data Center
  9. 9. [No text on this slide]
  10. 10. Device Experience: iPhone
      8 screens of the iPhone app (from upper left to lower right):
      - Login Screen
      - Home Screen
      - Genres
      - Queue (… loading…)
      - Queue
      - Video Detail Page
      - Video Playback Starting
      - Video in Progress
  11. 11. Device Experience: iPhone
      These talk to API:
      - Home Screen
      - Genres
      - Queue (… loading…)
      - Queue
      - Video Detail Page

      These talk to NCCP:
      - Video Playback Starting
      - Video in Progress
  12. 12. Device Experience: iPhone
      Playback is a multi-step process:
      - Step 1: Authenticate & authorize the user and device (major oversimplification)
      - Step 2: Stream the video bits till your ISP cries "mother"
  13. 13. The AWS Experience
      - We use the following services:
        - Compute (w/ Auto-scaling)
          - EC2, ELB, CloudWatch, ASG
        - Queueing
          - SQS, starting to use SES and SNS
        - Persistence
          - SDB & S3 (and minimal EBS)
  14. 14. ELB Primer
      - An Elastic Load Balancer (ELB) routes traffic to your EC2 instances
        - e.g.: api-apiproxy-frontend-11111111.us-east-1.elb.amazonaws.com
      - Netflix maps a CNAME to this ELB
        - e.g.: api.netflix.com (just a guess!)
      - Netflix then registers EC2 instances with this ELB, so that the ELB can load balance traffic across EC2 instances
      - ELB periodically polls attached EC2 instances on their HTTP port to ensure the instances are healthy. If they are not, then no traffic is sent to them
  15. 15. ELB Primer: Request Flow
      - Client DNS lookups
        - Netflix CNAME → ELB DNS name
        - ELB DNS name → IP address of an ELB node
      - Client connection to ELB node
      - ELB node round-robins to one of your servers
      - Response sent back to ELB and passed back to the client
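The two ELB behaviours described above (round-robin routing, and withholding traffic from instances that fail a health poll) can be sketched as a toy model. This is only an illustration, not how ELB is implemented; the class name, method names, and instance ids are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy stand-in for one ELB node: rotates over registered instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)   # assume healthy until a poll fails
        self._ring = cycle(self.instances)

    def record_health_check(self, instance, ok):
        """Store the result of a periodic health poll for one instance."""
        if ok:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def route(self):
        """Return the next healthy instance in round-robin order."""
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances registered")


lb = RoundRobinBalancer(["i-aaa", "i-bbb", "i-ccc"])
lb.record_health_check("i-bbb", ok=False)   # failed poll: stop sending traffic
picks = [lb.route() for _ in range(4)]      # alternates between i-aaa and i-ccc
```

Once `i-bbb` passes a later poll, `record_health_check("i-bbb", ok=True)` puts it back into rotation.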
  16. 16. ELB Primer: Auto Scaling
      - Taking this a bit further:
        - We have CloudWatch monitor EC2 instance CPU
        - We set up a CloudWatch alarm on CPU limits
        - We associate this CloudWatch alarm with an Auto Scale policy
          - E.g. if CPU > 60% persists for 5 minutes, do policy z (add 3 nodes/zone)
          - E.g. if CPU < 30% persists for 5 minutes, do policy a (remove 1 node/zone)
      - Supported metrics include:
        - CPU
        - Disk Read Ops or Disk Write Ops
        - Disk Read Bytes or Disk Write Bytes
        - Network In (bytes) or Network Out (bytes)
  17. 17. Event Flow
      - NCCP instances publish data to CloudWatch (standard system or custom metrics)
      - CloudWatch alarms trigger ASG policies
      - Auto-Scaling Service (policies): EC2 instances are added/removed
  18. 18. NCCP Rules

      | Rule              | Description                     |
      |-------------------|---------------------------------|
      | Scale Up Event    | Average CPU > 60% for 5 minutes |
      | Scale Down Event  | Average CPU < 30% for 5 minutes |
      | Cool-Down Period  | 10 minutes                      |
      | Auto-Scale Alerts | DLAutoScaleEvents               |
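A minimal sketch of how the NCCP rules could be evaluated, assuming one average-CPU sample per minute. The thresholds, durations, and cool-down come from the rules above; the node deltas (+3, -1) are borrowed from the example policies on the Auto Scaling slide, and every name in the code is hypothetical. In practice CloudWatch alarms and the Auto Scaling service do this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    threshold: float        # average CPU percent
    above: bool             # True: fire when CPU stays above the threshold
    sustained_minutes: int  # how long the breach must persist
    node_delta: int         # nodes to add (+) or remove (-) per zone


SCALE_UP = ScalingRule(threshold=60.0, above=True, sustained_minutes=5, node_delta=3)
SCALE_DOWN = ScalingRule(threshold=30.0, above=False, sustained_minutes=5, node_delta=-1)
COOL_DOWN_MINUTES = 10


def breached(rule, samples):
    """True if the last `sustained_minutes` samples all breach the threshold."""
    window = samples[-rule.sustained_minutes:]
    if len(window) < rule.sustained_minutes:
        return False
    if rule.above:
        return all(cpu > rule.threshold for cpu in window)
    return all(cpu < rule.threshold for cpu in window)


def decide(samples, minutes_since_last_action):
    """Return the node delta per zone for one-minute average CPU samples."""
    if minutes_since_last_action < COOL_DOWN_MINUTES:
        return 0                      # still cooling down: take no action
    for rule in (SCALE_UP, SCALE_DOWN):
        if breached(rule, samples):
            return rule.node_delta
    return 0
```

The cool-down check comes first, so a sustained breach right after a scaling event is ignored until 10 minutes have passed.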
  19. 19. [No text on this slide]
  20. 20. Queuing: SQS
      Terminology:

      | SQS     | Definition                            |
      |---------|---------------------------------------|
      | Queue   | Message Q (not topic, SNS are topics) |
      | Message | Message                               |
  21. 21. Queuing: SQS
      - SQS
        - API for Queue Management
          - CreateQueue
          - ListQueues
          - DeleteQueue
        - API for Message Management
          - SendMessage (up to 64K in size)
          - ReceiveMessage (up to 10 messages in a batch)
          - DeleteMessage (a.k.a. ACK Message)
          - SetVisibilityTimeout: after which, a message becomes visible to other ReceiveMessage calls
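The receive / delete / visibility-timeout cycle listed above can be illustrated with an in-memory toy queue. This is a sketch of the semantics only, not the SQS client API: the class, its method names, and the explicit `now` argument (used to keep the example deterministic) are all assumptions.

```python
import itertools


class ToyQueue:
    """In-memory model of receive / delete with a visibility timeout."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}          # message id -> body
        self._invisible_until = {}   # message id -> time it becomes visible again
        self._ids = itertools.count(1)

    def send_message(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        self._invisible_until[msg_id] = 0
        return msg_id

    def receive_message(self, now, max_messages=10):
        """Return up to max_messages visible messages and hide them for a while."""
        batch = []
        for msg_id, body in self._messages.items():
            if len(batch) == max_messages:
                break
            if self._invisible_until[msg_id] <= now:
                self._invisible_until[msg_id] = now + self.visibility_timeout
                batch.append((msg_id, body))
        return batch

    def delete_message(self, msg_id):
        """Acknowledge a message so it is never delivered again."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send_message("a")
queue.send_message("b")
first = queue.receive_message(now=0)     # both messages delivered, now hidden
queue.delete_message(first[0][0])        # ACK "a"; "b" is never ACKed
during = queue.receive_message(now=10)   # "b" is still invisible
after = queue.receive_message(now=30)    # timeout expired: "b" is redelivered
```

A consumer that crashes before calling `delete_message` therefore does not lose the message; another consumer sees it again after the timeout.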
  22. 22. Queuing: SQS
      - We are happy with SQS
        - Our previous DC-based WebLogic messaging infrastructure did not scale
          - If the message queue grew too large, the message producer needed to drop messages (or store them on local disk)
          - If the producer tried to force the message onto the WebLogic queue, GC pauses would cripple WebLogic
        - SQS has worked well even with >100M message backlogs
          - As long as you can work through the backlog before any message exceeds 4 days on the queue
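As a rough worked example of the 4-day constraint above: assuming the backlog is consumed roughly oldest-first, the newest message already in the backlog must be reached within 4 days, which sets a floor on the sustained consume rate. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def min_drain_rate(backlog_messages, retention_days=4):
    """Messages per second needed to clear the backlog before the limit."""
    return backlog_messages / (retention_days * SECONDS_PER_DAY)


rate = min_drain_rate(100_000_000)   # the >100M backlog mentioned above
```

For a 100M-message backlog this works out to about 289 messages per second, sustained for the full 4 days.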
  23. 23. Messaging Services
      - SQS Wish List
        - API for Message Management
          - SendMessage
            - Support batch sends
          - ReceiveMessage
            - Record metrics in CloudWatch on the following events
              - Empty Receive Count when Q is not empty
              - Visibility Timeout Expiration Count
          - DeleteMessage
            - Support batch deletes
  24. 24. Pick a Data Store in the Cloud
      During our cloud migration, our initial requirements were:
      - ✓ Hosted
      - ✓ Managed Distribution Model
      - ✓ Works in AWS
      - ✓ AP from CAP
      - ✓ Handles a majority of use cases accessing high-growth, high-traffic data
        - ✓ Specifically, key access by customer id, movie id, or both
  25. 25. Pick a Data Store in the Cloud
      - We picked SimpleDB and S3
        - SimpleDB was targeted as the AP equivalent of our RDBMS databases in our data center
        - S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics)
          - Video encodes
          - Streaming device activity logs (i.e. CLOB, BLOB, etc.)
          - Compressed (old) rental history
  26. 26. SimpleDB
  27. 27. Technology Overview: SimpleDB
      Terminology:

      | SimpleDB  | Hash Table              | Relational Databases  |
      |-----------|-------------------------|-----------------------|
      | Domain    | Hash Table              | Table                 |
      | Item      | Entry                   | Row                   |
      | Item Name | Key                     | Mandatory Primary Key |
      | Attribute | Part of the Entry Value | Column                |
  28. 28. Technology Overview: SimpleDB
      Soccer Players:

      | Key         | Value |
      |-------------|-------|
      | ab12ocs12v9 | First Name = Harold; Last Name = Kewell; Nickname = Wizard of Oz; Teams = Leeds United, Liverpool, Galatasaray |
      | b24h3b3403b | First Name = Pavel; Last Name = Nedved; Nickname = Czech Cannon; Teams = Lazio, Juventus |
      | cc89c9dc892 | First Name = Cristiano; Last Name = Ronaldo; Teams = Sporting, Manchester United, Real Madrid |

      SimpleDB's salient characteristics:
      - SimpleDB offers a range of consistency options
      - SimpleDB domains are sparse and schema-less
      - The Key and all Attributes are indexed
      - Each item must have a unique Key
      - An item contains a set of Attributes
        - Each Attribute has a name
        - Each Attribute has a set of values
      - All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
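The Soccer Players example above can be modelled with plain dictionaries to show three of the listed characteristics: sparse items, multi-valued attributes, and string-only storage. This is a sketch, not the SimpleDB client API; `put_attributes` and `encode_int` are hypothetical helpers. Zero-padding is one common way to make string comparison agree with numeric order when everything is stored as text.

```python
def put_attributes(domain, item_name, attributes):
    """Add (name, value) pairs to an item, creating the item if needed."""
    item = domain.setdefault(item_name, {})
    for name, value in attributes:
        item.setdefault(name, set()).add(str(value))   # every value is a string


def encode_int(number, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)


soccer_players = {}
put_attributes(soccer_players, "ab12ocs12v9", [
    ("First Name", "Harold"), ("Last Name", "Kewell"), ("Nickname", "Wizard of Oz"),
    ("Teams", "Leeds United"), ("Teams", "Liverpool"), ("Teams", "Galatasaray"),
])
put_attributes(soccer_players, "cc89c9dc892", [
    ("First Name", "Cristiano"), ("Last Name", "Ronaldo"),
    ("Teams", "Sporting"), ("Teams", "Manchester United"), ("Teams", "Real Madrid"),
])

# Sparse: the second item simply has no "Nickname" attribute.
has_nickname = "Nickname" in soccer_players["cc89c9dc892"]

# Strings compare lexicographically, so "9" sorts after "10" unless padded.
unpadded_wrong_order = "9" > "10"
padded_right_order = encode_int(9) < encode_int(10)
```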
  29. 29. Technology Overview: SimpleDB
      What does the API look like?
      - Manage Domains
        - CreateDomain
        - DeleteDomain
        - ListDomains
        - DomainMetaData
      - Access Data
        - Retrieving Data
          - GetAttributes: returns a single item
          - Select: returns multiple items using SQL syntax
        - Writing Data
          - PutAttributes: put single item
          - BatchPutAttributes: put multiple items
        - Removing Data
          - DeleteAttributes: delete single item
          - BatchDeleteAttributes: delete multiple items
  30. 30. Technology Overview: SimpleDB
      - Options available on reads and writes
        - Consistent Read
          - Read the most recently committed write
          - May have lower throughput/higher latency/lower availability
        - Conditional Put/Delete
          - i.e. optimistic locking
          - Useful if you want to build a consistent multi-master data store; you will still require your own anti-entropy
          - We do not use this currently, so we don't know how it performs
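A minimal sketch of optimistic locking built on a conditional put, using an in-memory stand-in rather than SimpleDB itself. The `ToyDomain` class, the `version` attribute, and `increment_views` are all hypothetical; the point is only the read, compare, write, retry loop.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the expected attribute value no longer matches."""


class ToyDomain:
    """In-memory stand-in for one domain; single-valued string attributes only."""

    def __init__(self):
        self._items = {}

    def get(self, item_name):
        return dict(self._items.get(item_name, {}))

    def conditional_put(self, item_name, attributes, expected_name, expected_value):
        """Write only if expected_name currently holds expected_value (None = absent)."""
        current = self._items.get(item_name, {})
        if current.get(expected_name) != expected_value:
            raise ConditionalCheckFailed(item_name)
        self._items.setdefault(item_name, {}).update(attributes)


def increment_views(domain, item_name, retries=3):
    """Read-modify-write guarded by a version attribute (optimistic locking)."""
    for _ in range(retries):
        item = domain.get(item_name)
        seen_version = item.get("version")           # None on the first write
        new_item = {
            "views": str(int(item.get("views", "0")) + 1),
            "version": str(int(seen_version or "0") + 1),
        }
        try:
            domain.conditional_put(item_name, new_item, "version", seen_version)
            return int(new_item["views"])
        except ConditionalCheckFailed:
            continue                                 # another writer won: re-read
    raise RuntimeError("gave up after repeated conflicts")
```

A writer holding a stale version gets `ConditionalCheckFailed` instead of silently overwriting a newer value.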
  31. 31. Translate RDBMS Concepts to Key-Value Store Concepts
      - Relational databases are known for relations
      - First, a quick refresher on normal forms
  32. 32. Normalization
      - 1NF: All occurrences of a record type must contain the same number of fields; variable repeating fields and groups are not allowed
      - 2NF: Second normal form is violated when a non-key field is a fact about a subset of a key
        - Violated here: (Part, Warehouse, Quantity, Warehouse-Address)
        - Fixed here: (Part, Warehouse, Quantity) and (Warehouse, Warehouse-Address)
  33. 33. Normalization
      - Issues
        - Wastes storage
          - The warehouse address is repeated for every Part-WH pair
        - Update performance suffers
          - If the address of a warehouse changes, I must update every part in that warehouse, i.e. many rows
        - Data inconsistencies possible
          - I can update the warehouse address for one Part-WH pair and miss Parts for the same WH (a.k.a. update anomaly)
        - Data loss possible
          - An empty warehouse does not have a row, so the address will be lost (a.k.a. deletion anomaly)
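The update and deletion anomalies above can be reproduced with a few rows of made-up data. The part names, warehouse id, and addresses are invented for illustration; only the table shapes come from the slides.

```python
# Single table: the warehouse address is repeated on every (part, warehouse) row.
flat_rows = [
    {"part": "bolt", "warehouse": "WH1", "quantity": 10, "warehouse_address": "1 Main St"},
    {"part": "nut", "warehouse": "WH1", "quantity": 5, "warehouse_address": "1 Main St"},
]

# Update anomaly: changing one row leaves the other row with the stale address.
flat_rows[0]["warehouse_address"] = "9 Elm St"
flat_addresses = {row["warehouse_address"] for row in flat_rows}

# 2NF layout: the address is a fact about the warehouse alone, so it moves out.
inventory = {("bolt", "WH1"): 10, ("nut", "WH1"): 5}
warehouses = {"WH1": "1 Main St"}

warehouses["WH1"] = "9 Elm St"    # one write updates the address everywhere
inventory.clear()                 # the warehouse is emptied of parts...
address_after_emptying = warehouses["WH1"]   # ...but its address survives
```

In the flat layout `flat_addresses` ends up holding two different addresses for the same warehouse; in the split layout the address has exactly one home.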
  34. 34. Normalization
      - RDBMS → KV store migrations can't simply accept denormalization!
        - Especially many-to-many and many-to-one entity relationships
      - Instead, pick your data set candidates carefully!
        - Keep relational data in RDBMS
        - Move key lookups to KV stores
      - Luckily for Netflix, most Web Scale data is accessed by Customer, Video, or both
        - i.e. key lookups that do not violate 2NF or 3NF
  35. 35. Translate RDBMS Concepts to Key-Value Store Concepts
      - Aside from relations, relational databases typically offer the following:
        - Transactions
        - Locks
        - Sequences
        - Triggers
        - Clocks
        - A structured query language (i.e. SQL)
        - Database server-side coding constructs (i.e. PL/SQL)
        - Constraints
  36. 36. Translate RDBMS Concepts to Key-Value Store Concepts
      - Partial or no SQL support (e.g. no joins, group-bys, etc.)
        - BEST PRACTICE
          - Carry these out in the application layer for smallish data
      - No relations between domains
        - BEST PRACTICE
          - Compose relations in the application layer
      - No transactions
        - BEST PRACTICE
          - SimpleDB: Conditional Put/Delete (best effort) w/ fixer jobs
          - Cassandra: Batch Mutate + the same column TS for all writes
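A small sketch of carrying out a join and a group-by in the application layer, as the first two best practices above suggest for smallish data. The two collections stand in for items fetched from two domains; their contents and the function names are made up.

```python
from collections import defaultdict

# Stand-ins for items already fetched from two separate domains.
customers = {"c1": {"name": "Ann"}, "c2": {"name": "Bob"}}
ratings = [
    {"customer_id": "c1", "movie_id": "m1", "stars": 4},
    {"customer_id": "c2", "movie_id": "m1", "stars": 2},
    {"customer_id": "c1", "movie_id": "m2", "stars": 5},
]


def join_ratings_to_customers(ratings, customers):
    """Equivalent of an inner join on customer_id, done in application code."""
    return [
        {**rating, "name": customers[rating["customer_id"]]["name"]}
        for rating in ratings
        if rating["customer_id"] in customers
    ]


def average_stars_by_movie(ratings):
    """Equivalent of SELECT movie_id, AVG(stars) ... GROUP BY movie_id."""
    totals = defaultdict(lambda: [0, 0])    # movie_id -> [sum, count]
    for rating in ratings:
        bucket = totals[rating["movie_id"]]
        bucket[0] += rating["stars"]
        bucket[1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}


joined = join_ratings_to_customers(ratings, customers)
averages = average_stars_by_movie(ratings)
```

Both functions hold the whole data set in memory, which is why the slide limits this approach to smallish data.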
  37. 37. Translate RDBMS Concepts to Key-Value Store Concepts
      - No schema: this is non-obvious. A query for a misspelled attribute name will not fail with an error
        - BEST PRACTICE
          - Implement a schema validator in a common data access layer
      - No sequences
        - BEST PRACTICE
          - Sequences are often used as primary keys
            - In this case, use a naturally occurring unique key
            - If no naturally occurring unique key exists, use a UUID
          - Sequences are also often used for ordering
            - Use a distributed sequence generator or rely on client timestamps
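Two of the practices above, sketched under assumptions: a schema validator that rejects misspelled attribute names before they reach the store, and a UUID used as the item name when no natural key exists. The attribute set and function names are hypothetical; a real data access layer would load its schema from configuration.

```python
import uuid

# Hypothetical schema for one domain.
RATING_ATTRIBUTES = frozenset({"customer_id", "movie_id", "stars"})


class UnknownAttributeError(ValueError):
    """Raised when a query or write names an attribute outside the schema."""


def validate_attribute_names(names, allowed=RATING_ATTRIBUTES):
    """Fail loudly on misspelled names instead of silently matching nothing."""
    unknown = sorted(set(names) - allowed)
    if unknown:
        raise UnknownAttributeError(f"unknown attribute(s): {unknown}")


def new_item_name():
    """Random UUID for use as a primary key when there is no natural key."""
    return str(uuid.uuid4())


validate_attribute_names(["customer_id", "stars"])   # passes silently
item_name = new_item_name()
```

Calling `validate_attribute_names(["starz"])` raises `UnknownAttributeError`, which is the error the store itself would never report.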
  38. 38. Translate RDBMS Concepts to Key-Value Store Concepts
      - No clock operations, PL/SQL, triggers
        - BEST PRACTICE
          - Clocks: instead rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances.
          - PL/SQL, triggers: do without
      - No constraints. Specifically,
        - No uniqueness constraints
        - No foreign key or referential constraints
        - No integrity constraints
        - BEST PRACTICE
          - Applications must implement this functionality
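A sketch of what implementing constraints in the application can look like, using in-memory dictionaries in place of a real store. The class and its fields are hypothetical. Against a remote store this check-then-write pattern is racy, so it would need something like a conditional put or a fixer job behind it.

```python
class ConstraintViolation(Exception):
    """Raised when a write would break an application-level constraint."""


class CustomerStore:
    """Enforces uniqueness and referential checks that the store will not."""

    def __init__(self):
        self.customers = {}        # customer_id -> attributes
        self.email_index = {}      # email -> customer_id (uniqueness check)
        self.queue_entries = []    # (customer_id, movie_id) pairs

    def add_customer(self, customer_id, email):
        if customer_id in self.customers:
            raise ConstraintViolation(f"duplicate customer id: {customer_id}")
        if email in self.email_index:
            raise ConstraintViolation(f"duplicate email: {email}")
        self.customers[customer_id] = {"email": email}
        self.email_index[email] = customer_id

    def add_queue_entry(self, customer_id, movie_id):
        if customer_id not in self.customers:     # stands in for a foreign key
            raise ConstraintViolation(f"unknown customer: {customer_id}")
        self.queue_entries.append((customer_id, movie_id))


store = CustomerStore()
store.add_customer("c1", "ann@example.com")
store.add_queue_entry("c1", "m42")
```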
  39. 39. Leaving Oracle Behind Resources
      - Mark Atwood's "A guide to NoSQL, redux"
        - http://www.youtube.com/watch?v=zAbFRiyT3LU
      - Sid Anand's "NoSQL @ Netflix Talk"
        - http://techblog.netflix.com/2011/03/nosql-netflix-talk-part-1.html
      - Sid Anand's "Netflix's Transition to High-Availability Storage Systems"
        - http://practicalcloudcomputing.com/post/1267489138/netflixcloudstorage
  40. 40. Cassandra
  41. 41. Cassandra
  42. 42. Data Model: Cassandra
      Terminology:

      | SimpleDB  | Cassandra     | Relational Databases  |
      |-----------|---------------|-----------------------|
      |           | Key Space     | "Schema"              |
      | Domain    | Column Family | Table                 |
      | Item      | Row           | Row                   |
      | Item Name | Row Key       | Mandatory Primary Key |
      |           | Super Columns |                       |
      | Attribute | Column        | Column                |
  43. 43. Data Model: Cassandra
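The Cassandra terminology (key space, column family, row key, column) can be pictured as nested maps in which each column carries a value and a timestamp. The sketch below is a toy model under that assumption, with hypothetical names and data; it also illustrates the earlier "Batch Mutate + the same column TS for all writes" practice. Real Cassandra resolves conflicts per column by timestamp with its own tie-breaking, which this sketch simplifies.

```python
def write_column(keyspace, column_family, row_key, name, value, timestamp):
    """Keep the write only if it is at least as new as what is stored."""
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    stored = row.get(name)
    if stored is None or timestamp >= stored[1]:
        row[name] = (value, timestamp)


def batch_mutate(keyspace, mutations, timestamp):
    """Apply several column writes that all carry the same timestamp."""
    for column_family, row_key, name, value in mutations:
        write_column(keyspace, column_family, row_key, name, value, timestamp)


# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {}
batch_mutate(keyspace, [
    ("Customers", "c1", "plan", "streaming"),
    ("Queues", "c1", "m42", "position=1"),
], timestamp=1000)

# A late-arriving write with an older timestamp does not clobber newer data.
write_column(keyspace, "Customers", "c1", "plan", "dvd", timestamp=900)
```

Because every write in the batch shares one timestamp, replaying the whole batch after a partial failure leaves the same final state.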
  44. 44. Earlier Persistence Requirements
      With cloud migration behind us and global expansion in front of us!
      - ✓ Hosted
      - ✓ Managed Distribution Model
      - ✓ Works in AWS: we can make it work in AWS
      - ✓ AP from CAP
      - ✓ Handles a majority of use cases accessing high-growth, high-traffic data
        - ✓ Specifically, key access by customer id, movie id, or both
  45. 45. Persistence Requirements Revisited

      | Requirements                       | SDB | S3  | Cassandra |
      |------------------------------------|-----|-----|-----------|
      | Auto-Sharding                      | No  | Yes | Yes       |
      | Auto-Failover & Failback           | Yes | Yes | Yes       |
      | Fast HA Writes                     | Yes | No  | TBD       |
      | Reads                              | No  | No  | TBD       |
      | Cross-Region                       | No  | No  | Yes       |
      | Exportable for Backup and Recovery | No  | Yes | Yes       |
      | Works in AWS                       | Yes | Yes | Yes       |
      | Hosted                             | Yes | Yes | No        |
      | Open Source                        | No  | No  | Yes       |
  46. 46. Netflix Wants You
      - Cloud Systems
        - Cassandra
        - Netflix Platform
        - Simian Army (e.g. Chaos Monkey)
      - API & Discovery Engineering
        - Video Discovery
      - NCCP a.k.a. Streaming Server
        - Video Playback
      - Partner Product Development
        - PS3, Android / WebKit, etc.