Fosdem 2012

  • Transcript

    • 1. The Apache Cassandra storage engine Sylvain Lebresne (sylvain@ .com) FOSDEM ’12, Brussels
    • 2. 1. What is Apache Cassandra; 2. Data Model; 3. The storage engine
    • 3. 1. What is Apache Cassandra; 2. Data Model; 3. The storage engine
    • 4. about:project • Distributed data store aimed at big data. • Apache project since 2010. • Version 1.0 released last October. • Proven in production (Netflix, Twitter, Reddit, Cisco, ...). The largest known cluster holds over 300 TB on over 400 machines.
    • 5. Apache Cassandra
    • 6. Apache Cassandra. A database:
    • 7. Apache Cassandra. A database: • distributed / decentralized
    • 8. Apache Cassandra. A database: • distributed / decentralized • replicated & durable
    • 9. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic
    • 10. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic
    • 11. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic • fault-tolerant / no SPOF
    • 12. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic • fault-tolerant / no SPOF • highly available
    • 13. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic • fault-tolerant / no SPOF • highly available
    • 14. Apache Cassandra. A database: • distributed / decentralized • replicated & durable • scalable / elastic • fault-tolerant / no SPOF • highly available • data center aware (diagram labels: US, Europe)
    • 15. 1. What is Apache Cassandra; 2. Data Model; 3. The storage engine
    • 16. Data Model • Not SQL (no transactions, no joins), but more than key/value. • Inspired by Google BigTable. • Based on column families.
    • 17. Ex: user profiles (column family: Users). “For each user, holds profile infos”. Row 50e8-e29b: birth_year = 1994, fname = Justin, lname = Bieber.
    • 18. Ex: user profiles (column family: Users). “For each user, holds profile infos”. Row 50e8-e29b: birth_year = 1994, fname = Justin, lname = Bieber. Row 2ab1-f1b7: birth_year = 1978, email = a@kutcher.com, fname = Ashton, lname = Kutcher.
    • 19. Ex: user’s Tweets (column family: Timeline). “For each user, tweets he has made”. Row 50e8-e29b: (no columns yet).
    • 20. Ex: user’s Tweets (column family: Timeline). “For each user, tweets he has made”. Row 50e8-e29b: 0 = “@LiveLoveKary glad you had a good birthday #muchlove”.
    • 21. Ex: user’s Tweets (column family: Timeline). “For each user, tweets he has made”. Row 50e8-e29b: 1 = “@NickDeMoura happy bday my dude.”; 0 = “@LiveLoveKary glad you had a good birthday #muchlove”.
    • 22. Ex: user’s Tweets (column family: Timeline). “For each user, tweets he has made”. Row 50e8-e29b: 2 = “@MickyArison @miamiHEAT thanks for the gam tonight”; 1 = “@NickDeMoura happy bday my dude.”; 0 = “@LiveLoveKary glad you had a good birthday #muchlove”.
    • 23. Ex: user’s Tweets (column family: Timeline). “For each user, tweets he has made”. Row 50e8-e29b: 3 = “still a little tired. back in the studio today with Timbaland”; 2 = “@MickyArison @miamiHEAT thanks for the gam tonight”; 1 = “@NickDeMoura happy bday my dude.”; 0 = “@LiveLoveKary glad you had a good birthday #muchlove”.
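
The Users and Timeline examples can be pictured as nested maps: row key to column name to value. The following is only a rough in-memory sketch of that shape, not how Cassandra stores or exposes data; the `slice_columns` helper is hypothetical and just illustrates that columns within a row are ordered by name, which is what makes "give me the latest tweets" a cheap range read.

```python
# Hypothetical in-memory picture of the two example column families.
# Shape: row key -> {column name -> value}.
users = {
    "50e8-e29b": {"birth_year": 1994, "fname": "Justin", "lname": "Bieber"},
    "2ab1-f1b7": {
        "birth_year": 1978,
        "email": "a@kutcher.com",  # rows need not share the same columns
        "fname": "Ashton",
        "lname": "Kutcher",
    },
}

timeline = {
    "50e8-e29b": {
        0: "@LiveLoveKary glad you had a good birthday #muchlove",
        1: "@NickDeMoura happy bday my dude.",
        2: "@MickyArison @miamiHEAT thanks for the gam tonight",
        3: "still a little tired. back in the studio today with Timbaland",
    },
}


def slice_columns(row, start, end):
    """Return (name, value) pairs with start <= name <= end, in name order."""
    return [(name, row[name]) for name in sorted(row) if start <= name <= end]


# The two most recent tweets of one user: a slice over column names 2..3.
latest_two = slice_columns(timeline["50e8-e29b"], 2, 3)
```

Because each user's tweets live in a single row, reading a timeline touches one row rather than scanning many.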
    • 24. There’s more • Secondary indexes • Distributed counters • Composite columns
    • 25. 1. What is Apache Cassandra; 2. Data Model; 3. The storage engine
    • 26. Goal • Writes are harder than reads to scale • Spinning disks aren’t good with random I/O • Goal: minimize random I/O
    • 27. A write’s journal: write( k1 , c1:v1 ). Memory (Memtable): empty. Hard drive (Commit log): empty.
    • 28. A write’s journal: write( k1 , c1:v1 ). Memory (Memtable): k1 c1:v1. Hard drive (Commit log): k1 c1:v1.
    • 29. A write’s journal: ack. Memory (Memtable): k1 c1:v1. Hard drive (Commit log): k1 c1:v1.
    • 30. A write’s journal: write( k1 , c2:v2 ). Memory (Memtable): k1 c1:v1 c2:v2. Hard drive (Commit log): k1 c1:v1; k1 c2:v2.
    • 31. A write’s journal: write( k2 , c1:v1 c2:v2 ). Memory (Memtable): k1 c1:v1 c2:v2; k2 c1:v1 c2:v2. Hard drive (Commit log): k1 c1:v1; k1 c2:v2; k2 c1:v1 c2:v2.
    • 32. A write’s journal: write( k1 , c1:v4 c3:v3 ). Memory (Memtable): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Hard drive (Commit log): k1 c1:v1; k1 c2:v2; k2 c1:v1 c2:v2; k1 c1:v4 c3:v3.
    • 33. A write’s journal: flush, cleanup. Memory (Memtable): flushed. Hard drive (SSTable, with index): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2.
    • 34. A write’s journal: more updates. Memory (Memtable): k1 c1:v5 c4:v4; k2 c1:v2 c3:v3. Hard drive (Commit log): k2 c1:v2 c3:v3; k1 c1:v5 c4:v4. Hard drive (SSTable, with index): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2.
    • 35. A write’s journal: flush. Memory (Memtable): flushed. Hard drive (first SSTable, with index): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Hard drive (second SSTable, with index): k1 c1:v5 c4:v4; k2 c1:v2 c3:v3.
    • 36. Writes properties • No reads or seeks • Only sequential I/O • Immutable SSTables: easy snapshots
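
The write path walked through in slides 27 to 35 can be approximated in a few lines. This is a toy sketch under simplifying assumptions: the `Store` class and its method names are made up for illustration, Python lists and dicts stand in for on-disk files, and flushing is triggered by hand rather than by memory pressure. It only shows the ordering of steps: append to the commit log, update the Memtable, acknowledge, and later flush the Memtable to a sorted, immutable SSTable.

```python
class Store:
    """Toy write path: commit log append, Memtable update, explicit flush."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only file on disk
        self.memtable = {}    # row key -> {column name -> value}
        self.sstables = []    # flushed tables, oldest first, never modified

    def write(self, key, columns):
        # 1. Sequential append for durability; nothing is read or sought.
        self.commit_log.append((key, dict(columns)))
        # 2. In-memory update; newer values overwrite older ones per column.
        self.memtable.setdefault(key, {}).update(columns)
        return "ack"

    def flush(self):
        # Write rows sorted by key (and columns by name) in one pass.
        sstable = {
            key: dict(sorted(cols.items()))
            for key, cols in sorted(self.memtable.items())
        }
        self.sstables.append(sstable)
        # The flushed data is now on disk, so the log entries can be dropped.
        self.memtable = {}
        self.commit_log.clear()


# Replay of the writes shown on the slides.
store = Store()
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
store.flush()
store.write("k2", {"c1": "v2", "c3": "v3"})
store.write("k1", {"c1": "v5", "c4": "v4"})
store.flush()
```

After the replay, `store.sstables` holds two tables with the same contents as the two SSTables on slide 35. Note that an overwrite of `c1` never touches the first SSTable; it simply lands in a newer one.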
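    • 37. A read’s journal: read( k1 ). Memory: ? Hard drive (first SSTable, with index): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Hard drive (second SSTable, with index): k1 c1:v5 c4:v4; k2 c1:v2 c3:v3.
    • 38. A read’s journal: merge. Memory (result): k1 c1:v5 c2:v2 c3:v3 c4:v4. Hard drive (first SSTable, with index): k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Hard drive (second SSTable, with index): k1 c1:v5 c4:v4; k2 c1:v2 c3:v3.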
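
The merge step of a read can be sketched as follows. This is a simplified illustration, not Cassandra's read path: the `read` function is hypothetical, and it assumes sources are listed oldest to newest so that a later source wins for a given column. Real Cassandra reconciles columns by write timestamp rather than by source order.

```python
def read(key, sstables, memtable):
    """Merge one row's columns across all sources, newest value winning."""
    merged = {}
    # Oldest SSTable first, Memtable last, so later updates overwrite earlier.
    for source in [*sstables, memtable]:
        merged.update(source.get(key, {}))
    return dict(sorted(merged.items()))


# The two SSTables from the slides; the Memtable is empty after the flush.
sstable_1 = {
    "k1": {"c1": "v4", "c2": "v2", "c3": "v3"},
    "k2": {"c1": "v1", "c2": "v2"},
}
sstable_2 = {
    "k1": {"c1": "v5", "c4": "v4"},
    "k2": {"c1": "v2", "c3": "v3"},
}

row = read("k1", [sstable_1, sstable_2], memtable={})
```

Here `row` combines `c1:v5` and `c4:v4` from the newer table with `c2:v2` and `c3:v3` from the older one, as on slide 38. The cost of a read grows with the number of SSTables that must be consulted, which motivates compaction.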
    • 39. Compaction • Goal: keep the number of SSTables low • Merge sort against multiple SSTables • Sequential I/O
    • 40. Compaction • Goal: keep the number of SSTables low • Merge sort against multiple SSTables • Sequential I/O. Input SSTables (each with index): first: k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Second: k1 c1:v5 c4:v4; k2 c1:v2 c3:v3.
    • 41. Compaction • Goal: keep the number of SSTables low • Merge sort against multiple SSTables • Sequential I/O. Input SSTables (each with index): first: k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2. Second: k1 c1:v5 c4:v4; k2 c1:v2 c3:v3. Compacted SSTable (with index): k1 c1:v5 c2:v2 c3:v3 c4:v4; k2 c1:v2 c2:v2 c3:v3.
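
Because every SSTable is already sorted by row key, compaction can stream through its inputs once and emit a merged table, which is why it needs only sequential I/O. The sketch below illustrates that merge with `heapq.merge` from the Python standard library. The `compact` function and the list-of-tuples representation are assumptions for illustration, and, as a simplification, the newer table wins on conflicting columns instead of comparing write timestamps.

```python
import heapq
from itertools import groupby


def compact(sstables):
    """Merge SSTables (oldest first) into one table sorted by row key.

    Each SSTable is a list of (key, columns) pairs already sorted by key.
    """
    # Tag every row with its table's age so that, for equal keys, rows from
    # older tables come first and newer values overwrite them below.
    streams = [
        [(key, age, cols) for key, cols in table]
        for age, table in enumerate(sstables)
    ]
    merged = heapq.merge(*streams, key=lambda row: (row[0], row[1]))

    compacted = []
    for key, rows in groupby(merged, key=lambda row: row[0]):
        cols = {}
        for _, _, row_cols in rows:
            cols.update(row_cols)
        compacted.append((key, dict(sorted(cols.items()))))
    return compacted


older = [
    ("k1", {"c1": "v4", "c2": "v2", "c3": "v3"}),
    ("k2", {"c1": "v1", "c2": "v2"}),
]
newer = [
    ("k1", {"c1": "v5", "c4": "v4"}),
    ("k2", {"c1": "v2", "c3": "v3"}),
]

result = compact([older, newer])
```

The `result` list matches the compacted SSTable on slide 41: one row per key, with overwritten values such as `c1:v4` no longer present.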
    • 42. Optimizations • Row cache • Bloom filters: let a read skip whole SSTables that cannot contain the key • Key cache • Row & column indexes • ...
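
To make the Bloom filter bullet concrete, here is a minimal sketch of the data structure. It is not Cassandra's implementation: the `BloomFilter` class, the sizes, and the use of SHA-256 as the hash are all assumptions chosen for readability. The property that matters is that a "no" answer is always correct, so a read can safely skip any SSTable whose filter reports that the key is absent.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per SSTable, filled with that table's row keys at flush time.
sstable_filter = BloomFilter()
for row_key in ("k1", "k2"):
    sstable_filter.add(row_key)


def should_read_sstable(bloom, key):
    """Skip the SSTable entirely when the filter rules the key out."""
    return bloom.might_contain(key)
```

A filter like this fits in memory even for large tables, so the check costs no disk access; the occasional false positive only causes one unnecessary SSTable lookup.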
    • 43. Other features • Compression • Checksums • Time to live
    • 44. Questions?
    • 45. • Cassandra 1.1 scheduled for next month • http://cassandra.apache.org/ • http://wiki.apache.org/cassandra/ • http://www.datastax.com/docs/1.0
    • 46. Data Model: Keyspaces (1 per app) ➝ Column Families (10’s ➝ 100’s) ➝ Rows (∞) ➝ Columns (up to 2B). Diagram labels: Keyspace name, Column Family name, Row key, Column name, Value.
    • 47-52. Leveled Compaction (six animation frames; diagram levels: L0, L1, L2, L3)
