
OSCON Data 2011 -- NoSQL @ Netflix, Part 2



This is my OSCON Data 2011 slide presentation titled "NoSQL @ Netflix". I


  1. Part 2 (@r39132)
  2. Announcements
      • A Special Thanks to
         • The OSCON organizers (e.g. Shirley Bailes)
      • Other Speakers from Netflix @OSCON
         • Adrian Cockcroft: Keynote, OSCON Data, Tuesday
         • Daniel Jacobson: API, OSCON, Wednesday
         • Matt McCarthy/Kim Trott: WebKit, OSCON, Friday
  3. Big Data is Boring!
  4. Big Data is Boring!
      • Our single largest stored data set is our Tomcat logs
         • Hundreds of petabytes
         • Stored on S3
         • Never read or accessed after writing
      • Why do we have so much data?
         • It is cumbersome to delete data from S3
      • This is Big Data... and Boring!
  5. Fast, Available, Customer-facing, & Global Big Data is What We Do!
  6. Motivation
      • Circa late 2008, Netflix had a single data center
         • A single point of failure (a.k.a. SPOF)
         • Approaching limits on cooling, power, space, and traffic capacity
      • Alternatives
         • Build more data centers
         • Outsource the majority of our capacity planning and scale-out
            • Allows us to focus on core competencies
  7. Out-Growing the Data Center
      [Chart: 37x growth, Jan 2010 to Jan 2011, compared with datacenter capacity]
  8. Data Center
  10. Device Experience: iPhone
      8 screens of the iPhone app (from upper left to lower right):
      • Login Screen
      • Home Screen
      • Genres
      • Queue (… loading…)
      • Queue
      • Video Detail Page
      • Video Playback Starting
      • Video in Progress
  11. Device Experience: iPhone
      These talk to the API:
      • Home Screen
      • Genres
      • Queue (… loading…)
      • Queue
      • Video Detail Page
      These talk to NCCP:
      • Video Playback Starting
      • Video in Progress
  12. Device Experience: iPhone
      Playback is a multi-step process:
      • Step 1: Authenticate & authorize the user and device (a major oversimplification)
      • Step 2: Stream the video bits till your ISP cries “mother”
  13. The AWS Experience
      • We use the following services:
         • Compute (w/ Auto Scaling): EC2, ELB, CloudWatch, ASG
         • Queueing: SQS, starting to use SES and SNS
         • Persistence: SDB (SimpleDB) & S3 (and minimal EBS)
  14. 14. ELB Primer ™  An elastic-load balancer (ELB) routes traffic to your EC2 instances ™  e.g.: ™  Netflix maps a CNAME to this ELB ™  e.g.: (just a guess!) ™  Netflix then registers EC2 instances with this ELB, so that the ELB can load balance traffic across EC2 instances ™  ELB periodically polls attached EC2 instances on their http port to ensure the instances are healthy. If they are not, then no traffic is sent to them@r39132 18
  15. ELB Primer: Request Flow
      • Client DNS lookups
         • Netflix CNAME → ELB DNS name
         • ELB DNS name → IP address of an ELB node
      • Client connects to the ELB node
      • The ELB node round-robins to one of your servers
      • The response is sent back to the ELB node and passed back to the client
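The health-check and round-robin behavior described in the ELB Primer slides can be modeled in a few lines. This is a toy, single-process sketch under stated assumptions: the instance IDs are hypothetical strings and health is a boolean flag set by hand rather than a real HTTP poll. It is not how ELB is implemented internally.

```python
from itertools import cycle


class ToyLoadBalancer:
    """Hypothetical model of one ELB node: round-robin over healthy instances only."""

    def __init__(self, instance_ids):
        self.instances = list(instance_ids)
        # Health as last reported by a (simulated) periodic HTTP health check.
        self.healthy = {i: True for i in self.instances}
        self._ring = cycle(self.instances)

    def mark_health(self, instance_id, is_healthy):
        """Record the result of a health check for one registered instance."""
        self.healthy[instance_id] = is_healthy

    def route(self):
        """Return the next healthy instance in round-robin order."""
        # Try each registered instance at most once before giving up.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy instances registered")


lb = ToyLoadBalancer(["i-a", "i-b", "i-c"])
lb.mark_health("i-b", False)
print([lb.route() for _ in range(4)])  # ['i-a', 'i-c', 'i-a', 'i-c']
```

Skipping unhealthy instances inside the rotation, rather than deregistering them, lets an instance rejoin as soon as its health check passes again.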
  16. ELB Primer: Auto Scaling
      • Taking this a bit further:
         • We have CloudWatch monitor EC2 instance CPU
         • We set up a CloudWatch alarm on CPU limits
         • We associate this CloudWatch alarm with an Auto Scaling policy
            • e.g. if CPU > 60% persists for 5 minutes, apply policy z (add 3 nodes/zone)
            • e.g. if CPU < 30% persists for 5 minutes, apply policy a (remove 1 node/zone)
      • Supported metrics include:
         • CPU
         • Disk Read Ops or Disk Write Ops
         • Disk Read Bytes or Disk Write Bytes
         • Network In (bytes) or Network Out (bytes)
  17. Event Flow
      • NCCP instances publish data (standard system or custom metrics) to CloudWatch
      • CloudWatch alarms trigger ASG policies
      • The Auto Scaling service (policies) adds/removes EC2 instances
  18. NCCP Rules
      Rule                 Description
      Scale Up Event       Average CPU > 60% for 5 minutes
      Scale Down Event     Average CPU < 30% for 5 minutes
      Cool-Down Period     10 minutes
      Auto-Scale Alerts    DLAutoScaleEvents
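To make the NCCP rules concrete, here is a hypothetical simulation of how the thresholds, the 5-minute window, and the 10-minute cool-down interact. It assumes one average-CPU datapoint per minute and treats an alarm as firing only when every datapoint in the window breaches its threshold; the class and constant names are illustrative, not the CloudWatch or Auto Scaling API. The per-zone step sizes come from the example policies on the Auto Scaling slide.

```python
from collections import deque

SCALE_UP_THRESHOLD = 60.0    # Average CPU > 60% ...
SCALE_DOWN_THRESHOLD = 30.0  # Average CPU < 30% ...
WINDOW_MINUTES = 5           # ... sustained for 5 minutes
COOLDOWN_MINUTES = 10        # Cool-Down Period
SCALE_UP_PER_ZONE = 3        # example policy z: add 3 nodes/zone
SCALE_DOWN_PER_ZONE = 1      # example policy a: remove 1 node/zone


class ToyScaler:
    """Hypothetical simulation of the NCCP rules, fed one CPU sample per minute."""

    def __init__(self, nodes_per_zone, min_nodes_per_zone=1):
        self.nodes_per_zone = nodes_per_zone
        self.min_nodes_per_zone = min_nodes_per_zone
        self.samples = deque(maxlen=WINDOW_MINUTES)  # only the last 5 minutes count
        self.last_action_minute = None

    def observe(self, minute, avg_cpu):
        """Record one per-minute average CPU sample; return the action taken, if any."""
        self.samples.append(avg_cpu)
        if len(self.samples) < WINDOW_MINUTES:
            return None  # not enough history to evaluate the 5-minute window
        if (self.last_action_minute is not None
                and minute - self.last_action_minute < COOLDOWN_MINUTES):
            return None  # still cooling down from the previous scaling action
        if all(s > SCALE_UP_THRESHOLD for s in self.samples):
            self.nodes_per_zone += SCALE_UP_PER_ZONE
            action = "scale_up"
        elif all(s < SCALE_DOWN_THRESHOLD for s in self.samples):
            new_size = max(self.min_nodes_per_zone,
                           self.nodes_per_zone - SCALE_DOWN_PER_ZONE)
            if new_size == self.nodes_per_zone:
                return None  # already at the floor
            self.nodes_per_zone = new_size
            action = "scale_down"
        else:
            return None
        self.last_action_minute = minute
        return action
```

The cool-down keeps the simulated group from reacting again before newly added instances have had time to absorb load.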
  20. Queuing: SQS Terminology
      SQS        Definition
      Queue      Message queue (not a topic; SNS provides topics)
      Message    Message
  21. 21. Queuing: SQS™  SQS ™  API for Queue Management ™  CreateQueue ™  ListQueues ™  DeleteQueue ™  API for Message Management ™  SendMessage (up to 64K in size) ™  ReceiveMessage (up to 10 messages in a batch) ™  DeleteMessage (a.k.a. ACK Message) ™  SetVisibilityTimeout – after which, a message becomes visible to other ReceivedMessage calls@r39132 26
  22. 22. Queuing: SQS™  We are Happy with SQS ™  Our previous DC-based WebLogic Messaging Infrastructure did not scale ™  If the Message Queue grew too large, the message producer needed to drop messages (or store them on local disk) ™  If the producer tried to force the message onto the WebLogic queue, GC pauses would cripple WL ™  SQS has worked well even with >100M message backlogs ™  As long as you can work through the backlog before any message exceeds 4 days on the queue@r39132 27
  23. 23. Messaging Services™  SQS Wish List ™  API for Message Management ™  SendMessage ™  Support Batch Sends ™  ReceiveMessage ™  Record metrics in Cloud Watch on the following events ™  Empty Receive Count when Q is not empty ™  Visibility Timeout Expiration Count ™  DeleteMessage ™  Support Batch Deletes@r39132 28
  24. Pick a Data Store in the Cloud
      During our cloud migration, our initial requirements were:
      ✓ Hosted
      ✓ Managed distribution model
      ✓ Works in AWS
      ✓ AP from CAP
      ✓ Handles a majority of use cases accessing high-growth, high-traffic data
         • Specifically, key access by customer id, movie id, or both
  25. 25. Pick a Data Store in the Cloud ™  We picked SimpleDB and S3 ™  SimpleDB was targeted as the AP equivalent of our RDBMS databases in our Data Center ™  S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. does not require secondary indices and complex query semantics) ™  Video encodes ™  Streaming device activity logs (i.e. CLOB, BLOB, etc…) ™  Compressed (old) Rental History@r39132 33
  26. SimpleDB
  27. Technology Overview: SimpleDB Terminology
      SimpleDB     Hash Table                Relational Databases
      Domain       Hash Table                Table
      Item         Entry                     Row
      Item Name    Key                       Mandatory Primary Key
      Attribute    Part of the Entry Value   Column
  28. Technology Overview: SimpleDB
      Soccer Players
      Key            Value
      ab12ocs12v9    Nickname = Wizard of Oz; Teams = Leeds United, Liverpool, Galatasaray; First Name = Harold; Last Name = Kewell
      b24h3b3403b    Nickname = Czech Cannon; Teams = Lazio, Juventus; First Name = Pavel; Last Name = Nedved
      cc89c9dc892    Teams = Sporting, Manchester United, Real Madrid; First Name = Cristiano; Last Name = Ronaldo
      SimpleDB's salient characteristics:
      • SimpleDB offers a range of consistency options
      • SimpleDB domains are sparse and schema-less
      • The key and all attributes are indexed
      • Each item must have a unique key
      • An item contains a set of attributes
         • Each attribute has a name
         • Each attribute has a set of values
         • All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)
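Because a SimpleDB item is a set of named, multi-valued string attributes, a plain Python dict of sets is a reasonable mental model. The second half sketches one common workaround for the strings-only rule: encoding integers as fixed-width, offset, zero-padded strings so that string comparisons match numeric order. OFFSET, WIDTH, and the helper functions are assumptions chosen for illustration.

```python
# A SimpleDB-style domain: item name -> attribute name -> *set* of UTF-8 strings.
soccer_players = {
    "ab12ocs12v9": {
        "Nickname": {"Wizard of Oz"},
        "Teams": {"Leeds United", "Liverpool", "Galatasaray"},
        "First Name": {"Harold"},
        "Last Name": {"Kewell"},
    },
    "cc89c9dc892": {  # sparse: this item has no Nickname attribute at all
        "Teams": {"Sporting", "Manchester United", "Real Madrid"},
        "First Name": {"Cristiano"},
        "Last Name": {"Ronaldo"},
    },
}

OFFSET = 10**9  # hypothetical: shifts negative values into the non-negative range
WIDTH = 10      # hypothetical: enough digits for any value in [-OFFSET, OFFSET)


def encode_int(n):
    """Encode an int as a fixed-width string whose lexicographic order matches numeric order."""
    if not -OFFSET <= n < OFFSET:
        raise ValueError("value out of encodable range")
    return str(n + OFFSET).zfill(WIDTH)


def decode_int(s):
    """Invert encode_int."""
    return int(s) - OFFSET
```

Without the encoding, the strings "-7", "1000", "42", "5" sort in exactly that order; after encoding, they sort as -7, 5, 42, 1000.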
  29. Technology Overview: SimpleDB
      What does the API look like?
      • Manage Domains
         • CreateDomain
         • DeleteDomain
         • ListDomains
         • DomainMetadata
      • Access Data
         • Retrieving Data
            • GetAttributes: returns a single item
            • Select: returns multiple items using SQL-like syntax
         • Writing Data
            • PutAttributes: put a single item
            • BatchPutAttributes: put multiple items
         • Removing Data
            • DeleteAttributes: delete a single item
            • BatchDeleteAttributes: delete multiple items
  30. 30. Technology Overview : SimpleDB ™  Options available on reads and writes ™  Consistent Read ™  Read the most recently committed write ™  May have lower throughput/higher latency/lower availability ™  Conditional Put/Delete ™  i.e. Optimistic Locking ™  Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy ™  We do not use this currently, so we don’t know how it performs@r39132 38
  31. 31. Translate RDBMS Concepts to Key-Value Store Concepts ™  Relational Databases are known for relations ™  First, a quick refresher on Normal forms@r39132 40
  32. Normalization
      • 1NF: All occurrences of a record type must contain the same number of fields; variable repeating fields and groups are not allowed
      • 2NF: Second normal form is violated when a non-key field is a fact about a subset of a key
         • Violated here: (Part, Warehouse, Quantity, Warehouse-Address), where the key is Part + Warehouse but Warehouse-Address depends only on Warehouse
         • Fixed here: (Part, Warehouse, Quantity) and (Warehouse, Warehouse-Address)
  33. 33. Normalization ™  Issues ™  Wastes Storage ™  The warehouse address is repeated for every Part-WH pair ™  Update Performance Suffers ™  If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows ™  Data Inconsistencies Possible ™  I can update the warehouse address for one Part-WH pair and miss Parts for the same WH (a.k.a. update anomaly) ™  Data Loss Possible ™  An empty warehouse does not have a row, so the address will be lost. (a.k.a. deletion anomaly)@r39132 42
  34. 34. Normalization ™  RDBMS à KV Store migrations can’t simply accept denormalization! ™  Especially many-to-many and many-to-one entity relationships ™  Instead, pick your data set candidates carefully! ™  Keep relational data in RDBMS ™  Move key-look-ups to KV stores ™  Luckily for Netflix, most Web Scale data is accessed by Customer, Video, or both ™  i.e. Key Lookups that do not violate 2NF or 3NF@r39132 43
  35. 35. Translate RDBMS Concepts to Key-Value Store Concepts ™  Aside from relations, relational databases typically offer the following: ™  Transactions ™  Locks ™  Sequences ™  Triggers ™  Clocks ™  A structured query language (i.e. SQL) ™  Database server-side coding constructs (i.e. PL/SQL) ™  Constraints@r39132 44
  36. 36. Translate RDBMS Concepts to Key-Value Store Concepts ™  Partial or no SQL support (e.g. no Joins, Group Bys, etc…) ™  BEST PRACTICE ™  Carry these out in the application layer for smallish data ™  No relations between domains ™  BEST PRACTICE ™  Compose relations in the application layer ™  No transactions ™  BEST PRACTICE ™  SimpleDB : Conditional Put/Delete (best effort) w/ fixer jobs ™  Cassandra : Batch Mutate + the same column TS for all writes@r39132 45
  37. 37. Translate RDBMS Concepts to Key-Value Store Concepts ™  No schema - This is non-obvious. A query for a misspelled attribute name will not fail with an error ™  BEST PRACTICE ™  Implement a schema validator in a common data access layer ™  No sequences ™  BEST PRACTICE ™  Sequences are often used as primary keys ™  In this case, use a naturally occurring unique key ™  If no naturally occurring unique key exists, use a UUID ™  Sequences are also often used for ordering ™  Use a distributed sequence generator or rely on client timestamps@r39132 46
  38. 38. Translate RDBMS Concepts to Key-Value Store Concepts ™  No clock operations, PL/SQL, Triggers ™  BEST PRACTICE ™  Clocks : Instead rely on client-generated clocks and run NTP. If using clocks to determine order, be aware that this is problematic over long distances. ™  PL/SQL, Triggers : Do without ™  No constraints. Specifically, ™  No uniqueness constraints ™  No foreign key or referential constraints ™  No integrity constraints ™  BEST PRACTICE ™  Applications must implement this functionality@r39132 47
  39. Leaving Oracle Behind: Resources
      • Mark Atwood: "A guide to NoSQL, redux"
      • Sid Anand: "NoSQL @ Netflix Talk"
      • Sid Anand: "Netflix's Transition to High-Availability Storage Systems"
         • netflixcloudstorage
  40. Cassandra
  42. Data Model: Cassandra Terminology
      SimpleDB     Cassandra        Relational Databases
      n/a          Key Space        "Schema"
      Domain       Column Family    Table
      Item         Row              Row
      Item Name    Row Key          Mandatory Primary Key
      n/a          Super Columns    n/a
      Attribute    Column           Column
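The terminology table maps naturally onto nested maps. A hypothetical sketch: the keyspace, column family, row key, and column names are made up, and the slice helper assumes plain string ordering of column names (Cassandra orders a row's columns by the column family's comparator). A super column family simply adds one more level of nesting.

```python
from bisect import bisect_left, bisect_right

# Hypothetical keyspace: column family -> row key -> {column name: column value}
keyspace = {
    "RentalHistory": {               # column family (~ SimpleDB domain / RDBMS table)
        "customer:42": {             # row key (~ item name / primary key)
            "2011-01-03": "movie:7",  # column name -> column value
            "2011-02-14": "movie:19",
            "2011-05-30": "movie:3",
        },
    },
    # Super column family: row key -> super column -> {subcolumn: value}
    "CustomerProfiles": {
        "customer:42": {"address": {"city": "Springfield", "zip": "00000"}},
    },
}


def column_slice(cf, row_key, start, finish):
    """Return (name, value) pairs for columns named in [start, finish], in name order.
    Assumes string ordering of column names for this sketch."""
    row = keyspace[cf].get(row_key, {})
    names = sorted(row)
    lo, hi = bisect_left(names, start), bisect_right(names, finish)
    return [(n, row[n]) for n in names[lo:hi]]
```

Choosing column names that sort meaningfully (here, ISO dates) is what makes range reads within a single row cheap.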
  43. Data Model: Cassandra
  44. Earlier Persistence Requirements
      With cloud migration behind us and global expansion in front of us!
      ✓ Hosted
      ✓ Managed distribution model
      ✓ Works in AWS: we can make it work in AWS
      ✓ AP from CAP
      ✓ Handles a majority of use cases accessing high-growth, high-traffic data
         • Specifically, key access by customer id, movie id, or both
  45. Persistence Requirements Revisited
      Requirement                          SDB   S3    Cassandra
      Auto-Sharding                        No    Yes   Yes
      Auto-Failover & Failback             Yes   Yes   Yes
      Fast HA Writes                       Yes   No    TBD
      Fast HA Reads                        No    No    TBD
      Cross-Region                         No    No    Yes
      Exportable for Backup and Recovery   No    Yes   Yes
      Works in AWS                         Yes   Yes   Yes
      Hosted                               Yes   Yes   No
      Open Source                          No    No    Yes
  46. Netflix Wants You
      • Cloud Systems
         • Cassandra
         • Netflix Platform
         • Simian Army (e.g. Chaos Monkey)
      • API & Discovery Engineering
         • Video Discovery
      • NCCP a.k.a. Streaming Server
         • Video Playback
      • Partner Product Development
         • PS3, Android / WebKit, etc…