2. 1. What is Apache Cassandra
2. Data Model
3. The storage engine
4. about:project
• Distributed data store aimed at big data
• Apache project since 2010.
• Version 1.0 released in October 2011.
• Proven in production (Netflix, Twitter,
Reddit, Cisco, ...). Largest known cluster has
over 300TB on over 400 machines.
14. Apache Cassandra
A database:
• distributed / decentralized
• replicated & durable
• scalable / elastic
• fault-tolerant / no SPOF
• highly available
• data center aware (e.g. US and Europe)
16. Data Model
• Not SQL (no transactions or joins), but
more than key/value.
• Inspired by Google BigTable.
• Column-family based.
17. Ex: user profiles
“For each user, holds profile information”
Column family: Users
Row key: 50e8-e29b
• birth_year: 1994
• fname: Justin
• lname: Bieber
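A minimal sketch of the row above, assuming a plain in-memory dict stands in for the column family. This is not a Cassandra client API; `users` and `get_column` are hypothetical names used only for illustration.

```python
# Toy stand-in for the "Users" column family (illustration only).
# Outer key: row key. Inner dict: column name -> value.
users = {
    "50e8-e29b": {
        "birth_year": 1994,
        "fname": "Justin",
        "lname": "Bieber",
    },
}


def get_column(column_family, row_key, column_name):
    """Return one column value, or None if the row or column is absent."""
    return column_family.get(row_key, {}).get(column_name)


first_name = get_column(users, "50e8-e29b", "fname")
last_name = get_column(users, "50e8-e29b", "lname")
print(first_name, last_name)
```

Each row can hold a different set of columns, which is why the inner level is a mapping rather than a fixed record.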
19. Ex: user’s Tweets
“For each user, tweets he has made”
Column family: Timeline
Row key: 50e8-e29b
One column is added to the row per tweet (column name: value):
• 3: still a little tired. back in the studio today with Timbaland
• 2: @MickyArison @miamiHEAT thanks for the gam tonight
• 1: @NickDeMoura happy bday my dude.
• 0: @LiveLoveKary glad you had a good birthday #muchlove
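The growing Timeline row can be sketched as follows, assuming integer column names as on the slides (a real schema would more likely use a time-based name). `add_tweet` and `latest_tweets` are hypothetical helpers, not part of any Cassandra API.

```python
def add_tweet(timeline, user_id, text):
    """Append a tweet as a new column in the user's row; return its column name."""
    row = timeline.setdefault(user_id, {})
    # Assumption: column names are consecutive integers, as on the slides.
    column_name = max(row) + 1 if row else 0
    row[column_name] = text
    return column_name


def latest_tweets(timeline, user_id, count):
    """Return up to `count` (column name, tweet) pairs, newest first."""
    row = timeline.get(user_id, {})
    newest_names = sorted(row, reverse=True)[:count]
    return [(name, row[name]) for name in newest_names]


timeline = {}
add_tweet(timeline, "50e8-e29b", "@LiveLoveKary glad you had a good birthday #muchlove")
add_tweet(timeline, "50e8-e29b", "@NickDeMoura happy bday my dude.")
for name, text in latest_tweets(timeline, "50e8-e29b", 2):
    print(name, text)
```

The row key stays the same while the number of columns grows with each tweet, so reading a user's recent tweets is a slice of a single row.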
45. • Cassandra 1.1 scheduled for next month
• http://cassandra.apache.org/
• http://wiki.apache.org/cassandra/
• http://www.datastax.com/docs/1.0
46. Data Model
• Keyspaces (1 per app), identified by keyspace name
• Column Families (10’s ➝ 100’s), identified by column family name
• Rows (∞), identified by row key
• Columns (up to 2B), each a column name and a value
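The nesting on this slide can be sketched as a four-level mapping: keyspace, column family, row key, column name. This is an in-memory illustration under that assumption only; the names `new_store`, `insert`, `get_row`, and the keyspace `"twitter_app"` are hypothetical and do not correspond to a Cassandra API.

```python
from collections import defaultdict


def new_store():
    """Create an empty store: keyspace -> column family -> row key -> {column name: value}."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def insert(store, keyspace, column_family, row_key, column_name, value):
    """Set one column value, creating the enclosing levels on demand."""
    store[keyspace][column_family][row_key][column_name] = value


def get_row(store, keyspace, column_family, row_key):
    """Return the row's (column name, value) pairs ordered by column name."""
    return sorted(store[keyspace][column_family][row_key].items())


store = new_store()
insert(store, "twitter_app", "Users", "50e8-e29b", "fname", "Justin")
insert(store, "twitter_app", "Users", "50e8-e29b", "lname", "Bieber")
insert(store, "twitter_app", "Timeline", "50e8-e29b", 0, "first tweet")
print(get_row(store, "twitter_app", "Users", "50e8-e29b"))
```

The same row key can appear in several column families, as with 50e8-e29b in both Users and Timeline.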