This is an introductory presentation on Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.
7. Cassandra’s Sweet Spot
» Many concurrent users
» Linear scalability
» High volumes of operations
» Distributed
» Inherently clustered
8. The Road to Mastership
Introduction to Cassandra
» Introduction to Cassandra
» Running a Server
» Data Model
» Modeling Data
» Communicating with the Server
» Growing a Cluster
14. The Road to Mastership
Running a Server
15. The Cassandra Project
» Project
» Runs on:
» Apache License
» Current release: 1.0.8
You are here
sonia@hiro:~/apache-cassandra-1.0.8$
16. Running a Server
sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f
....
Now serving reads.
localhost/127.0.0.1:9160
17. Connecting to Our Server
Cassandra command line interface (CLI) tool
sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.0.8
18. Creating a Keyspace
Cassandra’s equivalent of an RDBMS database
[default@unknown] create keyspace demo;
Let’s start using it
[default@unknown] use demo;
[default@demo]
19. Creating a Column Family
A column family holds data, much like a table in an RDBMS.
[default@demo] create column family user;
Start adding data
[default@demo] set user[1][a]=utf8('foo');
[default@demo] set user[2][b]=utf8('bar');
[default@demo] set user[2][c]=utf8('test');
20. Retrieving Data
Retrieving columns by user key
[default@demo] get user[2];
(column=b, value=bar)
(column=c, value=test)
Returned 2 results.
21. The Road to Mastership
Data Model
24. Row
Row id     icon     name          residence
spiderman  (image)  Peter Parker  New York
25. Row
Row id: spiderman
Columns: icon, name, residence
Row id     icon     name          residence
spiderman  (image)  Peter Parker  New York
spiderman → name → Peter Parker
26. Column Family
Row id     icon     name     residence
spiderman  (image)  Peter P  New York
batman     (image)  Bruce W  Gotham
hulk       (image)  Bruce B  New York
27. Column Family
Row id     icon     name     residence
spiderman  (image)  Peter P  New York
batman     (image)  Bruce W  Gotham
hulk       (image)  Bruce B  New York
set user['spiderman']['name'] = 'Peter Parker'
Column family: user
Row id: 'spiderman'
Column name: 'name'
Value: 'Peter Parker'
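The `set` command above addresses a value by column family, row id, and column name. One way to picture that addressing scheme is as a sparse nested dictionary. The following is a minimal Python sketch of the data model only, not of how Cassandra stores data; `set_column` and `get_row` are hypothetical helper names introduced for illustration.

```python
from collections import defaultdict

# A column family modelled as {row_id: {column_name: value}}.
# Rows are sparse: each row holds only the columns that were set.
user = defaultdict(dict)


def set_column(column_family, row_id, column_name, value):
    """Mimic `set cf[row_id][column_name] = value` (hypothetical helper)."""
    column_family[row_id][column_name] = value


def get_row(column_family, row_id):
    """Mimic `get cf[row_id]`: return the row's columns sorted by name."""
    return sorted(column_family.get(row_id, {}).items())


set_column(user, "spiderman", "name", "Peter Parker")
set_column(user, "spiderman", "residence", "New York")
set_column(user, "batman", "residence", "Gotham")

print(get_row(user, "spiderman"))
```

Note that the batman row has no `name` column at all: a missing column costs nothing, which is what "sparse" means here.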
28. The Allies Column Family
Row id     Columns
batman     Robin, Alfred
spiderman  Iceman, Firestar, Iron Man, Storm
29. Published Issues Column Family
Row id     Columns (name = value)
spiderman  1/8/1962 = ###, ..., 1/3/2012 = ###, 8/3/2012 = ###   (~2600 columns)
batman     1/5/1939 = ###, ..., 2/3/2012 = ###, 9/3/2012 = ###   (~3800 columns)
31. Keyspace
» Like an RDBMS database
» A container for column families
[default@unknown] create keyspace demo;
» One keyspace per application, in most cases
32. Expiring Columns – TTL
Row id     icon     name     passwd_reminder  residence
spiderman  (image)  Peter P  abcd             New York
set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;
7200s = 2 hours
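To make the TTL semantics concrete, here is a small Python sketch of a row whose columns can expire. It only illustrates the observable behaviour (a column written with a TTL stops being readable once the TTL has elapsed); the `ExpiringRow` class and its injectable clock are assumptions made for this example and say nothing about how Cassandra implements expiry internally.

```python
import time


class ExpiringRow:
    """A single row whose columns may carry a time-to-live (TTL) in seconds."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be shown without waiting
        self._columns = {}    # column_name -> (value, expires_at or None)

    def set(self, name, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._columns[name] = (value, expires_at)

    def get(self, name):
        value, expires_at = self._columns.get(name, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None       # expired columns read as missing
        return value


now = [0.0]                   # fake clock, in seconds
row = ExpiringRow(clock=lambda: now[0])
row.set("name", "Peter P")
row.set("passwd_reminder", "abcd", ttl=7200)

print(row.get("passwd_reminder"))  # read immediately after the write
now[0] += 7200                     # two hours later
print(row.get("passwd_reminder"))  # read after the TTL has elapsed
```

Columns written without a TTL, such as `name` here, are unaffected by the passage of time.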
33. Distributed Counters
Row id        speakers  sessions
javaedge.com  1035      3402
incr page_views['javaedge.com']['speakers'] by 1
get page_views['javaedge.com']['speakers']
34. The Road to Mastership
Communication with the Server: Clients
35. Cassandra Query Language
» Looks a lot like SQL
INSERT INTO users (KEY, name, universe)
VALUES (hulk, Bruce, marvel)
» Mostly valid SQL
SELECT name, universe
FROM users
WHERE KEY = 'hulk'
36. Advantages of using CQL
» Run ad-hoc queries
» Very familiar, easier to use
» Stable interface
▪ For library developers
▪ For users
37. CQL Example
SELECT name, residence FROM users

SELECT 01/1/2011 .. 1/1/2012
FROM published_issues
WHERE KEY = 'spiderman'

SELECT FIRST 5
FROM allies
WHERE KEY = 'spiderman'
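The range query and the `FIRST 5` query above both rely on the columns of a row being kept sorted by column name, so that a range or a prefix of the row is one contiguous slice. A rough Python sketch of that idea follows. The helper names `slice_columns` and `first_columns` are hypothetical, the column values are placeholders, and representing column names as `datetime.date` objects is an assumption made to get chronological ordering.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def slice_columns(columns, start, end):
    """Return the (name, value) pairs with start <= name <= end.

    `columns` must be a list of (name, value) pairs sorted by name.
    """
    names = [name for name, _ in columns]
    return columns[bisect_left(names, start):bisect_right(names, end)]


def first_columns(columns, count):
    """Return the first `count` columns of the row, in sorted order."""
    return columns[:count]


# Row 'spiderman' of published_issues; the values are placeholders.
published = sorted([
    (date(2012, 3, 8), "###"),
    (date(1962, 8, 1), "###"),
    (date(2012, 3, 1), "###"),
])

# Row 'spiderman' of allies: here the column names themselves are the data.
allies = sorted([("Storm", ""), ("Iceman", ""), ("Iron Man", ""), ("Firestar", "")])

print(slice_columns(published, date(2012, 1, 1), date(2012, 12, 31)))
print(first_columns(allies, 2))
```

Because the row is already sorted, both operations touch only the columns they return, however wide the row is.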
44. Hector: Advanced Features
» Failover support
» Connection pooling
» Load balancing
» JMX counters
» Object mapper
45. Maven plugin
mvn cassandra:start
Run your tests
mvn cassandra:cql-exec
mvn cassandra:stop
46. The Road to Mastership
Modeling Data
47. Queries First
» Use the same Column Family for data that should be fetched together
▪ Reduces IO
» Consider filtering and ordering
48. Denormalize
» Fewer seeks, faster reads
» Storing redundant data
▪ Manually handling data integrity
» Disk space is cheaper than seek time
50. Secondary Index
» Requirement:
Find all superheroes that live in New York
Row id     icon     name          residence
spiderman  (image)  Peter Parker  New York
create column family users
... and column_metadata=
[{column_name: residence, index_type: KEYS}];
» Good for indexes with low cardinality
SELECT name
FROM users
WHERE residence = 'New York'
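Conceptually, a secondary index maps each value of the indexed column to the rows that hold it, and it is kept up to date as part of every write. The Python sketch below illustrates that idea only; it is not Cassandra's implementation, and `IndexedColumnFamily` is a hypothetical class introduced for this example.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Rows plus an index on one column, kept up to date on every write."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = defaultdict(dict)   # row_id -> {column_name: value}
        self.index = defaultdict(set)   # indexed value -> {row_id, ...}

    def set(self, row_id, column_name, value):
        if column_name == self.indexed_column:
            old = self.rows[row_id].get(column_name)
            if old is not None:
                self.index[old].discard(row_id)   # drop the stale entry
            self.index[value].add(row_id)
        self.rows[row_id][column_name] = value

    def find(self, value):
        """Mimic `WHERE <indexed_column> = value`; returns sorted row ids."""
        return sorted(self.index.get(value, ()))


users = IndexedColumnFamily("residence")
users.set("spiderman", "residence", "New York")
users.set("batman", "residence", "Gotham")
users.set("hulk", "residence", "New York")

print(users.find("New York"))
```

With a low-cardinality column such as `residence`, the index holds few distinct values, each pointing at many rows.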
52. Manually Managed Index
» Requirement:
Find a superhero by name
Search term  Keys in users CF
Bruce        hulk, batman
Peter        spiderman
» Manually maintain an inverted index
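A manually managed inverted index is simply a second column family that the application writes alongside the first: the row key is the search term and the column names are keys in the users column family. A minimal Python sketch under those assumptions is shown below; `users_by_name`, `save_user`, `first_name`, and `find_by_first_name` are hypothetical names, and indexing by first name only is a simplification chosen for the example.

```python
from collections import defaultdict

users = defaultdict(dict)          # row_id -> {column_name: value}
users_by_name = defaultdict(dict)  # search term -> {user row_id: ""}


def first_name(full_name):
    """Search term used by this sketch: the first word of the name."""
    return full_name.split()[0]


def save_user(row_id, name):
    """Write the user row and the matching index entry together."""
    old_name = users[row_id].get("name")
    if old_name is not None:
        # The application, not the database, removes stale index entries.
        users_by_name[first_name(old_name)].pop(row_id, None)
    users[row_id]["name"] = name
    users_by_name[first_name(name)][row_id] = ""


def find_by_first_name(term):
    """Read the index row and return the matching user keys, sorted."""
    return sorted(users_by_name.get(term, {}))


save_user("hulk", "Bruce B")
save_user("batman", "Bruce W")
save_user("spiderman", "Peter Parker")

print(find_by_first_name("Bruce"))
print(find_by_first_name("Peter"))
```

The cost of this approach is visible in `save_user`: every write path that changes a name must also update the index, or the two column families drift apart.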
54. The Road to Mastership
Cassandra Cluster
70. The Road to Mastership
Summary
72. Where Do You Sign?
» Cassandra
▪ http://cassandra.apache.org
▪ http://www.datastax.com/
• Docs, tutorials & videos
▪ IRC: #cassandra on freenode
» Hector
▪ https://github.com/rantav/hector
▪ https://github.com/zznate/hector-examples
Editor's Notes
Your application has gone viral; the number of users doubles every week.
Sparse nested hashtables
Keywords: the columns are sorted.
Columns are stored in rows. Rows are indexed by row id; this is the primary index in Cassandra. Keywords: the column is the main tool for storing data. Up to 2 billion columns.
All the imports are java.sql; you only need to make sure your SQL starts with a C.