Netflix's Transition to NoSQL, P.1
Siddharth "Sid" Anand (@r39132)
Silicon Valley Cloud Computing Group, February 2011
Special Thanks
- Sebastian Stadil – Scalr, SVCCG Meetup Organizer
- SriSatish Ambati – Datastax
- Lynn Bender – Geek Austin
- Scott MacVicar – Facebook
Motivation
Netflix's motivation for moving to the cloud
Motivation
- Circa late 2008, Netflix had a single data center
  - A single point of failure (a.k.a. SPOF)
  - Approaching limits on cooling, power, space, and traffic capacity
- Alternatives
  - Build more data centers
  - Outsource the majority of our capacity planning and scale-out, which allows us to focus on core competencies
Motivation
- Winner: outsource the majority of our capacity planning and scale-out
  - Leverage a leading Infrastructure-as-a-Service provider: Amazon Web Services
- Footnote: since it has taken us a while (~2+ years) to realize our vision of running on the cloud, we needed an interim solution to handle growth
  - We did build a second data center along the way
  - We did outgrow it
Cloud Migration Strategy
What to Migrate?
Cloud Migration Strategy
- Components
  - Applications and software infrastructure
  - Data
- Migration considerations
  - Avoid sensitive data for now: PII and PCI DSS data stays in our DC; the rest can go to the cloud
  - Favor Web-scale applications & data
Cloud Migration Strategy
- Examples of data that can be moved
  - Video-centric data
    - Critics' and users' reviews
    - Video metadata (e.g. director, actors, plot description)
  - User-video-centric data – some of our largest data sets
    - Video Queue
    - Watched History
    - Video Ratings (a 5-star rating system)
    - Video Playback Metadata (e.g. streaming bookmarks, activity logs)
Cloud Migration Strategy
How and When to Migrate?
Cloud Migration Strategy
- High-level requirements for our site
  - No big-bang migrations
  - New functionality needs to launch in the cloud when possible
- High-level requirements for our data
  - Data needs to migrate before applications
  - Data needs to be shared between applications running in the cloud and in our data center during the transition period
Cloud Migration Strategy
[diagram slides: transition architecture]
Cloud Migration Strategy
- Pick a (key-value) data store in the cloud
- Challenges
  1. Translate RDBMS concepts to KV-store concepts
  2. Work around issues specific to the chosen KV store
  3. Create a bi-directional DC-cloud data replication pipeline
Pick a Data Store in the Cloud
Pick a Data Store in the Cloud
An ideal storage solution should have the following features:
- Hosted
- Managed distribution model
- Works in AWS
- AP from CAP
- Handles a majority of use-cases accessing high-growth, high-traffic data
  - Specifically, key access by customer id, movie id, or both
Pick a Data Store in the Cloud
- We picked SimpleDB and S3
- SimpleDB was targeted as the AP equivalent of our RDBMS databases in our data center
- S3 was used for data sets where item or row data exceeded SimpleDB limits and could be looked up purely by a single key (i.e. no secondary indices or complex query semantics required)
  - Video encodes
  - Streaming device activity logs (CLOBs, BLOBs, etc.)
  - Compressed (old) Rental History
Technology Overview
SimpleDB
Technology Overview: SimpleDB
Terminology [diagram]
Technology Overview: SimpleDB
SimpleDB's salient characteristics:
- SimpleDB offers a range of consistency options
- SimpleDB domains are sparse and schema-less
- The key and all attributes are indexed
- Each item must have a unique key
- An item contains a set of attributes
- Each attribute has a name
- Each attribute has a set of values
- All data is stored as UTF-8 character strings (i.e. no support for types such as numbers or dates)

Technology Overview: SimpleDB
What does the API look like?
- Manage domains
  - CreateDomain
  - DeleteDomain
  - ListDomains
  - DomainMetadata
- Access data
  - Retrieving data
    - GetAttributes – returns a single item
    - Select – returns multiple items using SQL syntax
  - Writing data
    - PutAttributes – put a single item
    - BatchPutAttributes – put multiple items
  - Removing data
    - DeleteAttributes – delete a single item
    - BatchDeleteAttributes – delete multiple items
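To make the API shape concrete, here is a minimal sketch of the domain and data calls using the AWS SDK for Java (v1), whose SimpleDB client exposes these operations one-to-one. Credentials, and the domain/item/attribute names, are placeholders, not Netflix's.

```java
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
import com.amazonaws.services.simpledb.model.*;

import java.util.Arrays;
import java.util.List;

public class SimpleDbTour {
    public static void main(String[] args) {
        AmazonSimpleDB sdb =
            new AmazonSimpleDBClient(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        // Manage domains (creation may take a moment to propagate)
        sdb.createDomain(new CreateDomainRequest("VIDEO_RATINGS"));

        // Write a single item: the item name is the key; attributes are name/value pairs
        List<ReplaceableAttribute> attrs = Arrays.asList(
            new ReplaceableAttribute("CUSTOMER_ID", "12345", true),
            new ReplaceableAttribute("VIDEO_ID", "9876", true),
            new ReplaceableAttribute("RATING", "5", true));
        sdb.putAttributes(new PutAttributesRequest("VIDEO_RATINGS", "12345:9876", attrs));

        // Read it back with Select (SQL-like syntax, single domain only, no joins)
        SelectRequest select = new SelectRequest(
            "select RATING from VIDEO_RATINGS where CUSTOMER_ID = '12345'");
        for (Item item : sdb.select(select).getItems()) {
            System.out.println(item.getName() + " -> " + item.getAttributes());
        }

        // Clean up
        sdb.deleteAttributes(new DeleteAttributesRequest("VIDEO_RATINGS", "12345:9876"));
        sdb.deleteDomain(new DeleteDomainRequest("VIDEO_RATINGS"));
    }
}
```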
Technology Overview: SimpleDB
Options available on reads and writes:
- Consistent read
  - Reads the most recently committed write
  - May have lower throughput, higher latency, and lower availability
- Conditional put/delete (i.e. optimistic locking)
  - Useful if you want to build a consistent multi-master data store – you will still require your own anti-entropy
  - We do not use this currently, so we don't know how it performs
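For illustration, a conditional put in the AWS SDK for Java attaches an UpdateCondition to an ordinary put; this is a sketch of the optimistic-locking pattern (remember, the deck says Netflix did not use it), with a hypothetical VERSION attribute and the `sdb` client from the previous sketch.

```java
import com.amazonaws.services.simpledb.model.*;
import java.util.Arrays;

// Optimistic locking: only apply the write if VERSION is still "7".
// If another writer got there first, the put fails with a service
// exception and we must re-read, re-apply our change, and retry.
UpdateCondition expectVersion7 = new UpdateCondition("VERSION", "7", true);
PutAttributesRequest conditionalPut = new PutAttributesRequest(
    "VIDEO_RATINGS",                       // domain
    "12345:9876",                          // item name
    Arrays.asList(
        new ReplaceableAttribute("RATING", "4", true),
        new ReplaceableAttribute("VERSION", "8", true)),
    expectVersion7);
sdb.putAttributes(conditionalPut);
```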
Challenge 1
Translate RDBMS Concepts to Key-Value Store Concepts
Translate RDBMS Concepts to Key-Value Store Concepts
- Relational databases are known for relations
- First, a quick refresher on normal forms
Normalization
- 1NF: all occurrences of a record type must contain the same number of fields – variable repeating fields and groups are not allowed
- 2NF: second normal form is violated when a non-key field is a fact about a subset of a key
[table examples: a Part-Warehouse design violating 2NF, and the fix]
Normalization
Issues:
- Wastes storage
  - The warehouse address is repeated for every Part-WH pair
- Update performance suffers
  - If the address of a warehouse changes, I must update every part in that warehouse – i.e. many rows
- Data inconsistencies possible
  - I can update the warehouse address for one Part-WH pair and miss parts in the same WH (a.k.a. update anomaly)
- Data loss possible
  - An empty warehouse does not have a row, so its address will be lost (a.k.a. deletion anomaly)
Normalization
- RDBMS → KV store migrations can't simply accept denormalization!
  - Especially many-to-many and many-to-one entity relationships
- Instead, pick your data set candidates carefully!
  - Keep relational data in the RDBMS; move key-lookups to KV stores
- Luckily for Netflix, most Web-scale data is accessed by customer, video, or both – i.e. key lookups that do not violate 2NF or 3NF
Translate RDBMS Concepts to Key-Value Store Concepts
Aside from relations, relational databases typically offer the following:
- Transactions
- Locks
- Sequences
- Triggers
- Clocks
- A structured query language (SQL)
- Database server-side coding constructs (e.g. PL/SQL)
- Constraints
Translate RDBMS Concepts to Key-Value Store Concepts
- Partial or no SQL support (e.g. no joins, group-bys, etc.)
  - BEST PRACTICE: carry these out in the application layer for smallish data
- No relations between domains
  - BEST PRACTICE: compose relations in the application layer
- No transactions
  - BEST PRACTICE:
    - SimpleDB: conditional put/delete (best effort) with fixer jobs
    - Cassandra: batch mutate + the same column timestamp for all writes
Translate RDBMS Concepts to Key-Value Store Concepts
- No schema – this is non-obvious: a query for a misspelled attribute name will not fail with an error
  - BEST PRACTICE: implement a schema validator in a common data access layer
- No sequences
  - Sequences are often used as primary keys
    - BEST PRACTICE: use a naturally occurring unique key; if no naturally occurring unique key exists, use a UUID (see the key-minting sketch below)
  - Sequences are also often used for ordering
    - BEST PRACTICE: use a distributed sequence generator or rely on client timestamps
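A minimal key-minting sketch in Java; the customerId/videoId composite is an illustrative natural key, not a schema from the deck.

```java
import java.util.UUID;

// Key minting without sequences: prefer a natural key; otherwise mint a
// client-side UUID, which needs no coordination between writers.
public class Keys {
    static String naturalKey(String customerId, String videoId) {
        return customerId + ":" + videoId;   // e.g. "12345:9876"
    }

    static String syntheticKey() {
        return UUID.randomUUID().toString(); // e.g. "f47ac10b-58cc-4372-a567-0e02b2c3d479"
    }
}
```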
Translate RDBMS Concepts to Key-Value Store Concepts
- No clock operations, PL/SQL, or triggers
  - BEST PRACTICE:
    - Clocks: rely on client-generated clocks and run NTP; if using clocks to determine order, be aware that this is problematic over long distances
    - PL/SQL, triggers: do without
- No constraints. Specifically:
  - No uniqueness constraints
  - No foreign key or referential constraints
  - No integrity constraints
  - BEST PRACTICE: applications must implement this functionality
Challenge 2
Work Around Issues Specific to the Chosen KV Store: SimpleDB
Work Around Issues Specific to the Chosen KV Store
Missing / strange functionality:
- No backup and recovery
- No native support for types (e.g. number, float, date)
- You cannot update one attribute and null out another for an item in a single API call
- Mis-cased or misspelled attribute names in operations fail silently. (Why is SimpleDB case-sensitive?)
- Omitting "limit N" returns a subset of the information. (Why does the absence of an optional parameter not return all of the data?)
- Users need to deal with data set partitioning
- Beware of nulls
- Write throughput is not as high as we need for certain use-cases
Work Around Issues Specific to the Chosen KV Store
No native types – sorting, inequality conditions, etc.
Since sorting is lexicographical, if you plan on sorting by certain attributes:
- Zero-pad logically-numeric attributes
  - e.g. 000000000000000111111 sorts after ("is bigger than") 000000000000000011111
- Use Joda-Time (ISO-8601) to store logical dates
  - e.g. 2010-02-10T01:15:32.864Z sorts after (is more recent than) 2010-02-10T01:14:42.864Z
(A padding/formatting sketch follows.)
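A small sketch of both tricks, assuming Joda-Time on the classpath; the pad width of 20 is an illustrative choice (wide enough for any long).

```java
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.ISODateTimeFormat;

public class SortableStrings {
    // Fixed-width zero-padding makes lexicographic order match numeric order.
    static String padNumber(long n) {
        return String.format("%020d", n);   // e.g. 111111 -> "00000000000000111111"
    }

    // ISO-8601 timestamps also sort lexicographically in time order.
    static String isoNow() {
        return ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC));
    }

    public static void main(String[] args) {
        // "00000000000000111111" > "00000000000000011111", matching 111111 > 11111
        System.out.println(padNumber(111111).compareTo(padNumber(11111)) > 0); // true
        System.out.println(isoNow());       // e.g. "2010-02-10T01:15:32.864Z"
    }
}
```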
Work Around Issues Specific to the Chosen KV Store
- Anti-pattern: avoid "select SOME_FIELD_1 from MY_DOMAIN where SOME_FIELD_2 is null", as this is a full domain scan – nulls are not indexed in a sparse table
  - BEST PRACTICE: replace this check with an (indexed) flag column called IS_FIELD_2_NULL:
    select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y'
- Anti-pattern: when selecting data from a domain and sorting by an attribute, items missing that attribute will not be returned (in Oracle, rows with null columns are still returned)
  - BEST PRACTICE: use a flag column as shown above
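A sketch of the flag-column pattern with the AWS SDK for Java; domain, item, and field names are the slide's illustrative ones, and the point is that the write path must maintain the flag on every put.

```java
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.*;
import java.util.Arrays;

public class FlagColumn {
    // Write path: keep an indexed IS_FIELD_2_NULL attribute in sync with FIELD_2.
    static void writeItem(AmazonSimpleDB sdb, String itemName, String field2) {
        ReplaceableAttribute flag = new ReplaceableAttribute(
            "IS_FIELD_2_NULL", field2 == null ? "Y" : "N", true);
        sdb.putAttributes(new PutAttributesRequest(
            "MY_DOMAIN", itemName, Arrays.asList(flag)));
    }

    // Read path: an indexed equality check instead of an unindexed
    // full-domain "is null" scan.
    static SelectResult findItemsMissingField2(AmazonSimpleDB sdb) {
        return sdb.select(new SelectRequest(
            "select SOME_FIELD_1 from MY_DOMAIN where IS_FIELD_2_NULL = 'Y'"));
    }
}
```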
Work Around Issues Specific to the Chosen KV Store
- BEST PRACTICE: aim for high index selectivity when you formulate your select expressions – SimpleDB select performance is sensitive to it
- Index selectivity
  - Definition: (# of distinct values in the specified attribute) / (# of items in the domain)
  - Good index selectivity (1 is the best): if a table has 100 records and one of its indexed columns has 88 distinct values, that index's selectivity is 88 / 100 = 0.88
  - Bad index selectivity: if an index on a table of 1,000 records has only 5 distinct values, its selectivity is 5 / 1000 = 0.005
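The arithmetic above as a one-line helper, purely for reference:

```java
// Selectivity = distinct attribute values / items in the domain (1.0 is ideal).
static double indexSelectivity(long distinctValues, long itemCount) {
    return (double) distinctValues / itemCount;
}
// indexSelectivity(88, 100)  == 0.88   -> selective; good where-clause candidate
// indexSelectivity(5, 1000)  == 0.005  -> unselective; expect slow selects
```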
Work Around Issues Specific to the Chosen KV Store
Sharding domains – there are two reasons to shard domains:
- You are trying to avoid running into one of the sizing limits (e.g. 10 GB of space or 1 billion attributes)
- You are trying to scale your writes
To scale your writes further, use BatchPutAttributes and BatchDeleteAttributes where possible. (A sharding sketch follows.)
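A minimal sketch of modulo-addressed sharding, assuming a fixed shard count; the deck's speaker notes mention Netflix actually combined forwarding tables with modulo addressing, so treat this as the simplest variant, with illustrative names.

```java
public class DomainShards {
    private static final int NUM_SHARDS = 8;   // assumption: chosen up front and kept stable

    static String domainFor(String itemKey) {
        // Mask the sign bit so Integer.MIN_VALUE hash codes can't go negative
        int shard = (itemKey.hashCode() & 0x7fffffff) % NUM_SHARDS;
        return "VIDEO_RATINGS_" + shard;       // e.g. "VIDEO_RATINGS_3"
    }
}
```

Changing NUM_SHARDS later means re-sharding every item, which is why a forwarding table (mapping key ranges to domains) is the more operable long-term scheme.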
Challenge 3
Create a Bi-directional DC-Cloud Data Replication Pipeline
Create a Bi-directional DC-Cloud Data Replication Pipeline
- Home-grown data replication framework known as IR, for Item Replication
- Two schemes in use currently:
  - Simple IR: polls the main table
    - Doesn't capture deletes, but easy to implement
  - Trigger-journaled IR: polls a journal table that is populated via a trigger on the main table
    - Captures every CRUD operation, but requires the development of triggers
Create a Bi-directional DC-Cloud Data Replication Pipeline
[architecture diagram]
Create a Bi-directional DC-Cloud Data Replication Pipeline
- How often do we poll Oracle? Every 5 seconds
- What does the poll query look like?

    select *
    from QLOG_0
    where LAST_UPDATE_TS > :CHECKPOINT      -- get recent changes
      and LAST_UPDATE_TS < :NOW_MINUS_30s   -- exclude the most recent
    order by LAST_UPDATE_TS                 -- process in order
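A minimal sketch of one poll cycle around that query, assuming a JDBC DataSource for the Oracle source; loadCheckpoint, saveCheckpoint, and replicateRow are hypothetical helpers standing in for the checkpoint store and the SimpleDB write path.

```java
import java.sql.*;
import javax.sql.DataSource;

public class IrPoller {
    void pollOnce(DataSource oracle) throws SQLException {
        Timestamp checkpoint = loadCheckpoint();
        // The 30-second trailing window excludes rows that may still be changing
        Timestamp nowMinus30s = new Timestamp(System.currentTimeMillis() - 30_000L);

        String sql = "select * from QLOG_0 "
                   + "where LAST_UPDATE_TS > ? and LAST_UPDATE_TS < ? "
                   + "order by LAST_UPDATE_TS";
        try (Connection c = oracle.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setTimestamp(1, checkpoint);
            ps.setTimestamp(2, nowMinus30s);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    replicateRow(rs);  // push the change to the cloud store
                    checkpoint = rs.getTimestamp("LAST_UPDATE_TS");
                }
            }
        }
        saveCheckpoint(checkpoint);    // advance only after a successful cycle
    }

    // Hypothetical helpers, not part of the deck:
    Timestamp loadCheckpoint() { throw new UnsupportedOperationException(); }
    void saveCheckpoint(Timestamp ts) { throw new UnsupportedOperationException(); }
    void replicateRow(ResultSet row) { throw new UnsupportedOperationException(); }
}
```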
Create a Bi-directional DC-Cloud Data Replication Pipeline
Data replication challenges & best practices:
- SimpleDB throttles traffic aggressively via HTTP 503 response codes ("Service Unavailable")
  - With singleton writes, I see 70-120 write TPS per domain
  - Shard domains (i.e. partition data sets) to work around these limits
- IR
  - Employs a slow ramp-up
  - Uses BatchPutAttributes instead of (singleton) PutAttributes calls
  - Exercises an exponential bounded back-off algorithm (sketched below)
  - Uses attribute-level replace=false when fork-lifting data
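A sketch of bounded exponential back-off on SimpleDB throttling, using the AWS SDK's real AmazonServiceException/getStatusCode; the initial delay and cap are illustrative, not Netflix's tuned values.

```java
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.BatchPutAttributesRequest;

public class BackoffWriter {
    void putWithBackoff(AmazonSimpleDB sdb, BatchPutAttributesRequest batch)
            throws InterruptedException {
        long delayMs = 100;                 // initial delay
        final long maxDelayMs = 30_000;     // the bound on the back-off
        while (true) {
            try {
                sdb.batchPutAttributes(batch);           // batched, not singleton, writes
                return;
            } catch (AmazonServiceException e) {
                if (e.getStatusCode() != 503) throw e;   // retry only on throttling
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }
}
```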
Netflix's Transition to NoSQL, P.1.5
Cassandra
Data Model
Cassandra
Data Model: Cassandra
Terminology [diagram]
Data Model: Cassandra
[data model diagrams]
API in Action
Cassandra
APIs for Reads
- I want to continue watching Tron from where I left off (quorum read):
  datastore.get("Netflix", "Sid_Anand", Streaming Bookmarks → Tron, ConsistencyLevel.QUORUM)
- When did the True Grit DVD get shipped and returned (fastest read)?
  datastore.get_slice("Netflix", "Sid_Anand", (DVD) Rental History → 5678, ["Ship_TS", "Return_TS"], ConsistencyLevel.ONE)
- How many DVDs have been shipped to me (fastest read)?
  datastore.get_count("Netflix", "Sid_Anand", (DVD) Rental History, ConsistencyLevel.ONE)
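The pseudo-calls above mirror Cassandra's Thrift API of that era; here is a sketch against the 0.7-generation Thrift interface. get/get_slice/get_count are the real Thrift methods, but the keyspace, column-family layout, and key names are our assumptions read off the slide.

```java
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CassandraReads {
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(
            new org.apache.thrift.transport.TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Netflix");

        ByteBuffer rowKey = utf8("Sid_Anand");

        // Quorum read of a single column: StreamingBookmarks -> Tron
        ColumnPath bookmark = new ColumnPath("StreamingBookmarks");
        bookmark.setColumn(utf8("Tron"));
        ColumnOrSuperColumn cosc = client.get(rowKey, bookmark, ConsistencyLevel.QUORUM);
        System.out.println("Bookmark: " + cosc.getColumn());

        // Fastest read of two sub-columns under super column 5678 in RentalHistory
        ColumnParent rentals = new ColumnParent("RentalHistory");
        rentals.setSuper_column(utf8("5678"));
        SlicePredicate shipAndReturn = new SlicePredicate();
        shipAndReturn.setColumn_names(Arrays.asList(utf8("Ship_TS"), utf8("Return_TS")));
        System.out.println(client.get_slice(rowKey, rentals, shipAndReturn,
                                            ConsistencyLevel.ONE));

        // Fastest count of all columns (shipments) in the RentalHistory row
        SlicePredicate all = new SlicePredicate();
        all.setSlice_range(new SliceRange(utf8(""), utf8(""), false, Integer.MAX_VALUE));
        int shipped = client.get_count(rowKey, new ColumnParent("RentalHistory"),
                                       all, ConsistencyLevel.ONE);
        System.out.println("DVDs shipped: " + shipped);

        transport.close();
    }
}
```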
APIs for Writes
- Replicate Netflix hub-operation shipments as batched writes: True Grit and Tron shipped together to Sid
  datastore.batch_mutate("Netflix", mutation_map, ConsistencyLevel.QUORUM)
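A sketch of what that mutation map might look like via the same Thrift client (reusing the utf8 helper from the previous sketch). One timestamp is used for every column so the batch is applied, and can later be superseded, as a unit; movie ids and column names are illustrative.

```java
import org.apache.cassandra.thrift.*;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class CassandraWrites {
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Two shipments (True Grit, Tron) written to one row in a single batch_mutate.
    static void recordShipments(Cassandra.Client client) throws Exception {
        long ts = System.currentTimeMillis() * 1000;   // one timestamp for the whole batch

        List<Mutation> mutations = new ArrayList<Mutation>();
        for (String movieId : Arrays.asList("5678" /* True Grit */, "4321" /* Tron */)) {
            Column shipTs = new Column();              // no-arg ctor + setters for portability
            shipTs.setName(utf8("Ship_TS"));
            shipTs.setValue(utf8("2011-02-10T01:15:32.864Z"));
            shipTs.setTimestamp(ts);

            SuperColumn shipment = new SuperColumn();  // super column keyed by movie id
            shipment.setName(utf8(movieId));
            shipment.setColumns(Arrays.asList(shipTs));

            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setSuper_column(shipment);

            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            mutations.add(m);
        }

        // row key -> (column family -> mutations)
        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
            Collections.singletonMap(utf8("Sid_Anand"),
                Collections.singletonMap("RentalHistory", mutations));

        client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);
    }
}
```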
Performance Model
Cassandra
The Promise of Performance
[diagram]
The Promise of Performance
Managing reads:
- Rows and top-level columns are stored and indexed in sorted order, giving logarithmic time complexity for lookups
- These help:
  - Bloom filters at the row level
  - Key cache
  - Large OS page cache
- These do not help:
  - Disk seeks on reads – it gets worse with more row redundancy across SSTables, so compaction is a necessary evil
  - Compaction wipes out the key cache
The Promise of Performance
[diagram]

Distribution Model
No time to cover this here – read up on the following:
- Merkle Trees + Gossip → Anti-Entropy
- Read-Repair
- Consistent Hashing
- The SEDA (Staged Event-Driven Architecture) paper
- The Dynamo paper
Features We Like
- Rich value model: a value is a set of columns or super-columns
  - Efficiently address, change, and version individual columns
  - Does not require read-whole-row-before-alter semantics
- Effectively no column or column-family limits
  - SimpleDB limits: 256 attributes/item, 1 billion attributes/domain, 1 KB attribute value length
- Growing a cluster (a.k.a. resharding a keyspace) is manageable
  - SimpleDB: users must solve this problem themselves – application code needs to do the migration
Features We Like
- Better handling of types
  - SimpleDB: everything is a UTF-8 string
  - Cassandra: native support for key and column-key types (for sorting); column values are never looked at and are just byte[]
- Open source & Java
  - We can implement our own backup & recovery
  - We can implement our own replication strategy
  - We know Java best (though we think Erlang is cool, with the exception of the fact that each digit in an integer is 1 byte of memory!!!)
  - We can make it work in AWS
Features We Like
- No update-delete anomalies
  - Specify a batch mutation with a delete and a mutation for a single row-key/column-family pair in a single batch – must use the same column timestamp
- Tunable consistency/availability trade-off
  - Strong consistency: quorum reads & writes
  - Eventual consistency:
    - R=1, W=1 (fastest read and fastest write)
    - R=1, W=QUORUM (fastest read and potentially slower write)
    - R=QUORUM, W=1 (potentially slower read and fastest write)
Where Does That Leave Us?
Cassandra
Where Are We With This List?
KV store missing / strange functionality:
- No backup and recovery
- No native support for types (e.g. number, float, date)
- You cannot update one attribute and null out another for an item in a single API call
- Mis-cased or misspelled attribute names in operations fail silently
- Omitting "limit N" returns a subset of the information. (Why does the absence of an optional parameter not return all of the data?)
- Users need to deal with data set partitioning
- Beware of nulls
- Write throughput is not as high as we need for certain use-cases
Pick a Data Store in the Cloud (revisited)
An ideal storage solution should have the following features:
- Hosted
- Managed distribution model
- Works in AWS
- AP from CAP
- Handles a majority of use-cases accessing high-growth, high-traffic data
  - Specifically, key access by customer id, movie id, or both
Netflix Wants You
Cloud Systems is working on interesting problems in distributed systems:
- Cassandra
- Other custom systems
There are a bunch of Netflix and Cloud Systems folks here – feel free to reach out to them!!

Editor's Notes

- #6-#10: Existing functionality needs to move in phases. This limits the risk and exposure to bugs, and limits conflicts with new product launches.
- #11, #18: Dynamo storage doesn't suffer from this!
- #12: This is an issue with any SQL-like query layer over a sparse data model; it can happen in other technologies.
- #13: Cannot treat SimpleDB like a black box for performance-critical applications.
- #14: We found that write availability was affected by the right partitioning scheme. We use a combination of forwarding tables and modulo addressing.
- #16: Mention trickle lift.