cassandra@Netflix

916 views

A brief overview of how Cassandra is used at Netflix

Published in: Technology, Education

  1. Cassandra @ Netflix
     Nitish Korla, Cloud Data Architect
  2. Why Cassandra?
     • High availability / fully distributed
     • Scalability (linear)
     • Write performance
     • Multi-region replication support (bi-directional)
     • Simple to install and operate
  3. Cassandra footprint @ Netflix
     • 50+ Cassandra clusters
     • 1000+ nodes holding 100+ TB data
     • AWS 500 IOPS -> 100,000 IOPS
     • Streaming data completely persisted in Cassandra
     • Related open source projects
       – Cassandra: in-house committer
       – Priam: Cassandra automation
       – Test tools: JMeter
       – http://github.com/netflix
  4. Device Keys - Cassandra
     [Diagram: AWS EU-West (EU apps), AWS US-East (US-E apps), AWS US-West (US-W apps)]
  5. Data Model
     • Row-oriented
     • The number and names of columns can differ from row to row
       xyz  | name: Paul   | zip: 95123
       abc  | name: Adam   | zip: 94538 | sex: Male
       nk12 | name: Nitish
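The row-oriented model on this slide can be sketched in plain Python, assuming a nested dictionary as a stand-in for a column family. The `users` name and the `columns_of` helper are hypothetical and only illustrate that each row carries its own set of columns; nothing here talks to a real Cassandra cluster.

```python
# Hypothetical in-memory stand-in for a column family:
# row key -> {column name: value}. Rows need not share the same columns.
users = {
    "xyz": {"name": "Paul", "zip": "95123"},
    "abc": {"name": "Adam", "zip": "94538", "sex": "Male"},
    "nk12": {"name": "Nitish"},
}


def columns_of(row_key):
    """Return the column names stored for one row, in sorted order."""
    return sorted(users[row_key])


for key in users:
    print(key, columns_of(key))
```

Each row lists only the columns it actually stores, so `nk12` has a single column while `abc` has three.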
  6. Read/Write performance
     • Write performance: very fast
       – Sequential I/O
       – In-memory write
       – Zero locking
     • Point reads: highly performant
     • Range scans
       – Need reverse-key indexes
       – Assess the need for range scans (full-table scans)
       – Use the Netflix Astyanax client library
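Why writes are cheap can be illustrated with a toy write path, assuming the usual Cassandra arrangement of an append-only commit log, an in-memory memtable, and immutable sorted tables written on flush. The `TinyWritePath` class and its `flush_threshold` parameter are hypothetical simplifications, not Cassandra's implementation; compaction, replication, and durability are left out.

```python
class TinyWritePath:
    """Toy model of a log-structured write path (illustrative only)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only list, stands in for sequential disk I/O
        self.memtable = {}     # in-memory buffer that absorbs writes
        self.sstables = []     # immutable, key-sorted snapshots of flushed memtables
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # A write is an append plus an in-memory update; no read is needed first.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None
```

Note that the read may have to consult several tables while the write never does, which is one way to picture why writes are the cheaper operation.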
  7. Wide-row implementation
     • Viewing history
       100  | 22-JAN: json | 1-MAR: json
       501  | 24-jan: jsondata | 25-jan: jsondata | 26-jan: data
       1000 | name: Nitish | 28-jan: jsondata | 29-jan: jsondata
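A wide row for viewing history can be sketched as one row per customer whose columns are kept sorted by date, so a date range becomes a contiguous slice. This is a minimal in-memory sketch: `history`, `record_view`, and `views_between` are hypothetical names, the ISO date strings (including the year) and the JSON payloads are assumptions chosen so that string order matches chronological order.

```python
from bisect import bisect_left, bisect_right, insort

# Hypothetical wide-row store: customer id -> list of (date, json_payload),
# kept sorted by date the way columns are sorted within a row.
history = {}


def record_view(customer_id, date, payload):
    """Insert one viewing event into the customer's row, keeping it sorted."""
    insort(history.setdefault(customer_id, []), (date, payload))


def views_between(customer_id, start, end):
    """Return the events with start <= date <= end as one contiguous slice."""
    row = history.get(customer_id, [])
    dates = [date for date, _ in row]
    return row[bisect_left(dates, start):bisect_right(dates, end)]


record_view("501", "2013-01-26", '{"title": "C"}')
record_view("501", "2013-01-24", '{"title": "A"}')
record_view("501", "2013-01-25", '{"title": "B"}')
print(views_between("501", "2013-01-24", "2013-01-25"))
```

Because the events of one customer live in a single sorted row, the range query touches only that row rather than scanning across row keys.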
  8. Think Data Archival
     • Data stores in Netflix grow exponentially
     • Have a process in place to archive data
       – Work with Data Science Engineering / DW
       – Move data to cheaper hardware
       – Set the right expectations w.r.t. latencies with historical data
     • Cassandra TTLs
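The effect of a TTL can be sketched as an expiry timestamp stored next to each value and checked on read. This is an in-memory illustration under stated assumptions, not Cassandra's mechanism: the `TtlStore` class, its `ttl_seconds` parameter, and the injectable `clock` are all hypothetical.

```python
import time


class TtlStore:
    """Toy key-value store whose entries can expire (illustrative only)."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be exercised without sleeping
        self._data = {}       # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        # Without a TTL the entry never expires.
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired entries are dropped lazily, on the first read after expiry.
            del self._data[key]
            return None
        return value
```

Setting an expiry at write time means old data ages out on its own, which complements an explicit archival process for data that must be kept.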
  9. Read-modify-write patterns
     • The read portion drives the overall latency
     • Revisit your architecture
  10. Observations
     • Cassandra scales linearly without any noticeable degradation to the running cluster
     • Read performance is sufficient to remove memcached in some cases
     • Self-healing: minimal operational noise
     • Developers
       – Mindset needed to shift from normalization to denormalization
       – Need a reasonable understanding of Cassandra architecture
  11. Avoid surprises
     • Benchmark …
     • AWS makes it easy for us
