cassandra@Netflix

A brief overview of how cassandra is being used at Netflix

Notes
  • Share some practical data modeling lessons we have learned over the past 2 years. Understand your data use patterns and match them to your persistence store, at the cost of denormalization. It is very important to spend time coming up with an appropriate data model; the cost of getting it wrong is high. Subscriber example.
  • Start with a live example, then use it as a segue to cover some best practices.
  • Rows are indexed. Columns are sorted based on the comparator you specify, so use it to your benefit. Keep column names short, as they are repeated in every row. Column size = 15 bytes + size of name + size of value. Don't store empty columns if there is no need (schema-free design).
  • Cassandra is built for point queries; it is still OK for small sets of rows.
  • We don't have linear growth. TTL is a fascinating feature, coming from an Oracle background.
  • gps 1.0
  • architecture to reap the benefits of distributed computing / high performance
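The column-size rule of thumb in the notes above (column size = 15 bytes + size of name + size of value) can be turned into a quick estimator. This is a minimal Python sketch assuming that per-column overhead applies to every regular column; it ignores row-level overhead, indexes, compression and replication, and the function names and sample columns are hypothetical.

```python
# Per-column overhead taken from the speaker note; assumed to apply to regular
# (non-expiring, non-counter) columns only.
COLUMN_OVERHEAD_BYTES = 15


def column_size(name: bytes, value: bytes) -> int:
    """Approximate stored size of one column: 15 + len(name) + len(value)."""
    return COLUMN_OVERHEAD_BYTES + len(name) + len(value)


def row_columns_size(columns: dict) -> int:
    """Sum of column sizes for one row (row overhead and indexes ignored)."""
    return sum(column_size(name, value) for name, value in columns.items())


def drop_empty(columns: dict) -> dict:
    """Schema-free design: simply omit columns that have no value."""
    return {name: value for name, value in columns.items() if value}


# Hypothetical subscriber row written with long vs. short column names.
verbose = {b"subscriber_zip_code": b"95123", b"subscriber_first_name": b"Paul"}
terse = {b"zip": b"95123", b"fn": b"Paul"}
print(row_columns_size(verbose))  # 79
print(row_columns_size(terse))    # 44

# An empty column still costs the overhead plus its name.
sparse = {b"zip": b"95123", b"sex": b""}
print(row_columns_size(sparse), row_columns_size(drop_empty(sparse)))  # 41 23
```

The 35 bytes saved by the short names are saved on every row, so the difference grows with row count.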
Transcript

    • 1. Cassandra @ Netflix • Nitish Korla, Cloud Data Architect
    • 2. Why Cassandra? High Availability / Fully distributed Scalability (Linear) Write performance Multi-region replication support (bi-directional) Simple to install and operate
    • 3. Cassandra footprint @ Netflix • 50+ Cassandra clusters • 1000+ nodes holding 100+ TB of data • AWS: 500 IOPS -> 100,000 IOPS • Streaming data completely persisted in Cassandra • Related open source projects – Cassandra: in-house committer – Priam: Cassandra automation – Test tools: JMeter – http://github.com/netflix
    • 4. Device Keys - Cassandra [diagram: AWS EU-West (EU apps), AWS US-East (US-E apps), AWS US-West (US-W apps)]
    • 5. Data Model • Row-oriented • Number of columns/names can differ [example: row xyz: name = Paul, zip = 95123; row abc: name = Adam, zip = 94538, sex = Male; row nk12: name = Nitish]
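To make the slide's row-oriented, schema-free model concrete, here is a toy in-memory Python sketch (not Cassandra's storage engine) loaded with the slide's sample rows: each row key owns its own set of columns, and columns come back sorted by name, standing in for an assumed UTF8-style comparator. `insert` and `get_row` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# row key -> {column name -> value}; rows need not share column names.
Table = Dict[str, Dict[str, str]]


def insert(table: Table, row_key: str, column: str, value: str) -> None:
    """Add or overwrite one column in one row; no schema is checked."""
    table.setdefault(row_key, {})[column] = value


def get_row(table: Table, row_key: str) -> List[Tuple[str, str]]:
    """Return the row's columns ordered by column name (the comparator order)."""
    return sorted(table.get(row_key, {}).items())


users: Table = {}
insert(users, "xyz", "name", "Paul")
insert(users, "xyz", "zip", "95123")
insert(users, "abc", "zip", "94538")
insert(users, "abc", "name", "Adam")
insert(users, "abc", "sex", "Male")
insert(users, "nk12", "name", "Nitish")

print(get_row(users, "abc"))   # [('name', 'Adam'), ('sex', 'Male'), ('zip', '94538')]
print(get_row(users, "nk12"))  # [('name', 'Nitish')]
```

Row `abc` carries three columns and row `nk12` only one; nothing forces them to match, and no placeholder is stored for the missing ones.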
    • 6. Read/Write performance • Write performance: superfast!! – Sequential I/O – In-memory write – Zero locking • Point reads: highly performant • Range scans – Need reverse-key indexes – Assess the need for range scans (full-table scans) – Use the Netflix Astyanax client library
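The write-side claims on this slide (sequential I/O, in-memory write, zero locking) follow from Cassandra's log-structured write path. The Python sketch below is a simplified model under that assumption, not Cassandra code: a write appends to a commit log and updates a memtable, full memtables are flushed to immutable sorted runs (standing in for SSTables), and a point read merges all sources by timestamp. The class name and `flush_threshold` parameter are hypothetical.

```python
import bisect


class ToyWritePath:
    """Simplified log-structured store: append-only log, memtable, sorted runs."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only list standing in for the on-disk commit log
        self.memtable = {}     # key -> (value, timestamp), held in memory
        self.sstables = []     # immutable sorted lists of (key, (value, timestamp))
        self.flush_threshold = flush_threshold

    def write(self, key, value, ts):
        # No read of existing on-disk data and no lock: append, then update memory.
        self.commit_log.append((key, value, ts))
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:
            self.memtable[key] = (value, ts)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Written once, in key order, and never modified afterwards.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        """Point read: collect every version of key and return the newest by timestamp."""
        versions = []
        if key in self.memtable:
            versions.append(self.memtable[key])
        for run in self.sstables:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                versions.append(run[i][1])
        return max(versions, key=lambda v: v[1])[0] if versions else None


store = ToyWritePath(flush_threshold=2)
store.write("a", "1", ts=1)
store.write("b", "2", ts=2)   # memtable now holds 2 keys and is flushed
store.write("a", "3", ts=3)
print(store.read("a"))        # 3
```

Writes stay cheap because they never consult older data; the cost moves to reads, which may have to look in several places. That is also why range scans deserve the scrutiny the slide gives them.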
    • 7. Wide-row implementation • Viewing history [diagram: row keys 100, 501 and 1000 with date-named columns (22-JAN, 1-MAR, 24-jan, 25-jan, 26-jan, 28-jan, 29-jan) holding JSON data; a name column holds Nitish]
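The viewing-history slide keeps each member's history in one wide row: the row key identifies the member and each column name is a point in time, so history comes back in time order and a date range is a single column slice. A minimal Python sketch under those assumptions follows. It uses ISO-8601 date strings as column names so that plain string order matches time order (a real schema would more likely use a TimeUUID or long timestamp comparator), and the member IDs, dates and JSON payloads are hypothetical.

```python
import bisect
import json


class WideRow:
    """One row key with many columns, kept sorted by column name."""

    def __init__(self):
        self.names = []    # sorted column names
        self.values = {}   # column name -> value

    def put(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def slice(self, start, end):
        """Columns with start <= name <= end, in sorted order (a column slice)."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return [(name, self.values[name]) for name in self.names[lo:hi]]


viewing_history = {}  # member id -> WideRow


def record_view(member_id, when, event):
    viewing_history.setdefault(member_id, WideRow()).put(when, json.dumps(event))


record_view("501", "2012-01-25", {"title_id": 2})
record_view("501", "2012-01-24", {"title_id": 1})
record_view("501", "2012-01-26", {"title_id": 3})
for when, data in viewing_history["501"].slice("2012-01-25", "2012-01-26"):
    print(when, json.loads(data))
```

Note why the slide's own labels ("24-jan", "1-MAR") would not work as column names under a string comparator: "1-MAR" sorts before "24-jan". Choosing a comparator-friendly column name is part of the data model.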
    • 8. Think Data Archival • Data stores in Netflix grow exponentially • Have a process in place to archive data – Work with Data Science Engineering / DW – Move data to cheap H/W – Set the right expectations w.r.t. latencies with historical data • Cassandra TTLs
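Cassandra TTLs let data expire on its own: a column written with a TTL stops being returned once the TTL has elapsed (CQL exposes this as `USING TTL`). The Python sketch below models only that read-side behaviour, with an injectable clock so it is deterministic; in Cassandra, expired data is physically purged later during compaction, which this toy does not model. The class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringColumns:
    """Columns that may carry a TTL; expired ones are treated as absent on read."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.cols: Dict[str, Tuple[str, Optional[float]]] = {}  # name -> (value, expires_at)

    def put(self, name: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self.clock() + ttl_seconds
        self.cols[name] = (value, expires_at)

    def get(self, name: str) -> Optional[str]:
        entry = self.cols.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            return None  # expired: hidden on read; physical removal is not modelled
        return value


now = [1000.0]                      # fake clock so the example is deterministic
cols = ExpiringColumns(clock=lambda: now[0])
cols.put("session", "abc", ttl_seconds=60)
cols.put("profile", "xyz")          # no TTL: kept until explicitly deleted
now[0] += 61
print(cols.get("session"), cols.get("profile"))  # None xyz
```

TTLs suit data whose useful life is known at write time; data that must be kept but can tolerate slower access is better served by the archival process described on the slide.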
    • 9. Read-modify-write patterns • The read portion drives the overall latency • Revisit your architecture
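One reason read-modify-write hurts shows up even without a network: the read sits on the critical path, and two clients interleaving read-then-write can lose an update. The Python sketch below contrasts that with one possible alternative, writing a uniquely named column per event without reading first and aggregating at read time. It is an illustration under those assumptions, not a prescribed Netflix design, and all names are hypothetical.

```python
import uuid


def interleaved_increments(store: dict, key: str) -> None:
    """Two clients both read, then both write: one increment is lost."""
    seen_by_a = store.get(key, 0)   # client A reads
    seen_by_b = store.get(key, 0)   # client B reads the same value
    store[key] = seen_by_a + 1      # client A writes
    store[key] = seen_by_b + 1      # client B overwrites A's result


def append_event(rows: dict, key: str) -> None:
    """Blind write: a new time-based unique column per event, no read first."""
    rows.setdefault(key, {})[str(uuid.uuid1())] = 1


def count_events(rows: dict, key: str) -> int:
    """Aggregate at read time instead of at write time."""
    return sum(rows.get(key, {}).values())


counters = {}
interleaved_increments(counters, "plays")
print(counters["plays"])              # 1, although two increments were attempted

events = {}
append_event(events, "plays")
append_event(events, "plays")
print(count_events(events, "plays"))  # 2
```

The trade-off is that reads now do the aggregation work, so this shape fits best when writes far outnumber reads of the total.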
    • 10. Observations • Cassandra scales linearly without any noticeable degradation to the running cluster • Read performance is sufficient to remove memcache in some cases • Self-healing: minimal operational noise • Developers – Mindset needed a shift from normalization to denormalization – Need a reasonable understanding of Cassandra architecture
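The shift from normalization to denormalization noted on this slide usually means writing the same data once per query pattern, so that each read is a single key lookup. A minimal Python sketch, with dicts standing in for two hypothetical column families (`subscribers_by_id`, `subscribers_by_email`) and made-up sample values:

```python
subscribers_by_id = {}     # serves: look up a subscriber by id
subscribers_by_email = {}  # serves: look up a subscriber by email


def add_subscriber(sub_id: str, email: str, name: str, zip_code: str) -> None:
    """Write the same facts into both query-specific tables at write time."""
    subscribers_by_id[sub_id] = {"email": email, "name": name, "zip": zip_code}
    subscribers_by_email[email] = {"id": sub_id, "name": name, "zip": zip_code}


def find_by_email(email: str):
    """One point read; no join or second lookup by id is needed."""
    return subscribers_by_email.get(email)


add_subscriber("nk12", "nitish@example.com", "Nitish", "95123")
print(find_by_email("nitish@example.com"))
```

The cost is extra storage and the application's responsibility for keeping the copies consistent, for example by rewriting both entries whenever a subscriber's zip code changes.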
    • 11. Avoid surprises• Benchmark …• AWS makes it easy for us