Cassandra and Clojure

An introduction to Cassandra as well as an example of accessing Cassandra from Clojure.

Includes an introduction to cluster architecture and data model in Cassandra. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo

  1. Cassandra and Clojure
     Nick Bailey, OpsCenter Architect
     ©2013 DataStax. Do not distribute without consent.
  2. Who am I?
     • OpsCenter Architect
     • Monitoring/management tool for Cassandra
     • Organizer of Austin Cassandra Users
       • http://www.meetup.com/Austin-Cassandra-Users/
       • Third Thursday each month. Come join!
     • Working with Cassandra for 4 years
  3. Cassandra - An introduction
  4. Cassandra - Intro
     • Based on Amazon Dynamo and Google BigTable papers
     • Shared nothing
     • Distributed
     • Predictable scaling
     [Figure: Dynamo, BigTable]
  5. Users
     533
  6. Cassandra - Architecture
  7. Cassandra - Cluster Architecture
     • All nodes participate in a cluster
     • Shared nothing
     • Add or remove as needed
     • More capacity? Add a server
  8. Cassandra - Data Distribution
     [Figure: tokens 0, 25, 50, 75]
     • Each node owns 1 or more “tokens”
     • Each piece of data has a “partition key”
     • Partition key is hashed to determine token
     • Hashes:
       • Murmur3 (default)
       • MD5
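The token lookup described on this slide can be sketched in plain Clojure. This is a toy model rather than Cassandra's implementation: the node names are hypothetical, the token space is shrunk to 0-99 to match the figure, and Clojure's built-in `hash` is only a stand-in for the Murmur3 partitioner.

```clojure
;; Toy token ring: token -> node that owns the range ending at that token.
;; Node names are hypothetical; tokens mirror the 0/25/50/75 figure.
(def ring
  (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn token-for
  "Hashes a partition key into the toy 0-99 token space.
  Clojure's `hash` is an assumption standing in for the real partitioner."
  [partition-key]
  (mod (hash partition-key) 100))

(defn owner-of-token
  "Returns the node holding the first token >= `token`, wrapping around
  to the lowest token when `token` is past the last node."
  [ring token]
  (val (or (first (subseq ring >= token))
           (first ring))))

(defn owner-of-key
  "Hashes the partition key, then looks up the owning node."
  [ring partition-key]
  (owner-of-token ring (token-for partition-key)))
```

With this ring, `(owner-of-token ring 26)` returns `"node-c"` and `(owner-of-token ring 80)` wraps around to `"node-a"`.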
  9. Cassandra - Replication
     • Client writes to any node
     • Node coordinates with replicas
     • Data replicated in parallel
     • Replication factor (RF): How many copies of your data?
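As a rough illustration of what the replication factor means, the sketch below picks replicas by walking clockwise around a toy ring, which is how simple (single-datacenter) placement is usually described. The ring, node names, and function name are assumptions made for the example, not Cassandra code.

```clojure
;; Toy ring (hypothetical node names), token -> node.
(def ring
  (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn replicas-for-token
  "Returns up to `rf` distinct nodes: the owner of `token` first,
  then the next nodes clockwise, wrapping past the highest token."
  [ring token rf]
  (let [clockwise (concat (subseq ring >= token) ring)]
    (->> clockwise
         (map val)
         distinct
         (take rf)
         vec)))
```

For example, `(replicas-for-token ring 60 3)` returns `["node-d" "node-a" "node-b"]`: the owner of token 60 plus the next two nodes on the ring.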
  10. Cassandra - Failure Modes
      • Consistency level
      • How many nodes?
      • ONE/QUORUM/ALL
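The "how many nodes?" question comes down to simple arithmetic on the replication factor. A minimal sketch, assuming keywords `:one`, `:quorum`, and `:all` as stand-ins for the consistency levels named on the slide (the function names are made up for this example):

```clojure
(defn quorum
  "A quorum is a strict majority of the replicas: floor(rf / 2) + 1."
  [rf]
  (inc (quot rf 2)))

(defn required-replicas
  "How many replicas must respond for the given consistency level."
  [consistency-level rf]
  (case consistency-level
    :one    1
    :quorum (quorum rf)
    :all    rf))

(defn tolerated-failures
  "How many replicas can be down while requests at this level still succeed."
  [consistency-level rf]
  (- rf (required-replicas consistency-level rf)))
```

With RF = 3, `:quorum` needs 2 replicas and tolerates 1 failure, while `:all` tolerates none.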
  11. Cassandra - Geographically Distributed
      • Client writes local
      • Data syncs across WAN
      • Replication Factor per DC
      • Consistency Level
        • LOCAL_QUORUM
      [Figure: Datacenter East, Datacenter West]
  12. Data Modeling - Concepts
  13. CQL
      • Cassandra Query Language
      • SQL-like
      • Not Relational
  14. Terminology
      • Keyspace
      • Table (Column Family)
      • Row
      • Column
      • Partition Key
      • Clustering Key
  15. Data Types
      cqlsh:clojure_cassandra_demo> help types
      CQL types recognized by this version of cqlsh:
        ascii bigint blob boolean counter decimal double float inet int
        list map set text timestamp timeuuid uuid varchar varint
  16. Advanced Concepts
      • Lightweight Transactions
      • Atomic Batches
      • User Defined Types (coming soon)
  17. Data Modeling - An Example
  18. Approaching Data Modeling
      • Model your queries, not your data
      • Generally, optimize for reads
      • Denormalize!
      • Iterate!
  19. Basic Last.fm Clone
      • See songs that user X has listened to recently
      • See user X’s favorite songs in a specific month
      • See who has recently listened to artist Y
      • See artist Y’s most popular songs in a specific week
  20. Basic Last.fm Clone
      • See songs that user X has listened to recently
      • One of the most common patterns/data models
      • Time series
      • Immutable (good fit for Clojure!)
  21. Basic Last.fm Clone
      • See songs that user X has listened to recently
        SELECT song, artist, played_at
        FROM user_history
        WHERE username = 'nickmbailey'
        ORDER BY played_at DESC;
      • Partition key = ‘username’
      • Clustering key = ‘played_at’
  22. Basic Last.fm Clone
      • See songs that user X has listened to recently
        CREATE TABLE user_history (
          username text,
          played_at timestamp,
          album text,
          artist text,
          song text,
          PRIMARY KEY (username, played_at)
        ) WITH CLUSTERING ORDER BY (played_at DESC)
  23. Basic Last.fm Clone
      • See songs that user X has listened to recently
      • This table has a “bad” partition key: every play by a user lands in one partition, which grows without bound
        CREATE TABLE user_history (
          username text,
          played_at timestamp,
          album text,
          artist text,
          song text,
          PRIMARY KEY (username, played_at)
        ) WITH CLUSTERING ORDER BY (played_at DESC)
  24. Basic Last.fm Clone
      • See songs that user X has listened to recently
      • Much better partition key: (username, year_and_month) limits each partition to one month of a user's plays
        CREATE TABLE user_history (
          username text,
          year_and_month text,
          played_at timestamp,
          album text,
          artist text,
          song text,
          PRIMARY KEY ((username, year_and_month), played_at)
        ) WITH CLUSTERING ORDER BY (played_at DESC)
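With a compound partition key, the client has to compute the `year_and_month` bucket itself, both when writing and when deciding which partitions to read for "recent" plays. A minimal sketch using `java.time`, assuming buckets are formatted like `2014-06` (as in the sample output) and computed in UTC; the function names are hypothetical.

```clojure
(import '(java.time Instant YearMonth ZoneOffset))

(defn year-and-month
  "Returns the partition bucket for a timestamp, e.g. \"2014-06\".
  Using UTC is an assumption; pick one zone and use it everywhere."
  [^Instant played-at]
  (str (YearMonth/from (.atZone played-at ZoneOffset/UTC))))

(defn recent-buckets
  "Returns the `n` most recent buckets, newest first, so a reader can
  query the current month and fall back to earlier months as needed."
  [^Instant now n]
  (let [^YearMonth current (YearMonth/from (.atZone now ZoneOffset/UTC))]
    (mapv #(str (.minusMonths current (long %))) (range n))))
```

A "recently played" read would then query the newest bucket first and move on to the previous one only if it needs more rows.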
  25. Basic Last.fm Clone
      • See songs that user X has listened to recently
      cqlsh:clojure_cassandra_demo> select * from user_history limit 5;

       username    | year_and_month | played_at                | album                    | artist                  | song
      -------------+----------------+--------------------------+--------------------------+-------------------------+--------------------------
       nickmbailey | 2014-06        | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon                | Halloween
       nickmbailey | 2014-06        | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon                | Ember City
       b_hastings  | 2014-06        | 2014-06-30 12:57:12-0500 | Buena Vista Social Club  | Buena Vista Social Club | Chan Chan
       zack_smith  | 2014-07        | 2014-07-30 12:49:35-0500 | Awake Remix              | Tycho                   | Awake (Com Truise Remix)
       zack_smith  | 2014-03        | 2014-03-30 12:44:50-0500 | Awake Remix              | Tycho                   | Awake

      • Partition Key - unordered
      • Clustering Key - ordered
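One way to picture "partition key unordered, clustering key ordered" is a plain map of partitions where each partition is a sorted map of rows. The sketch below is only a mental model built from Clojure data structures, not how Cassandra stores data; the function names are made up and timestamps are kept as strings for brevity.

```clojure
;; Rows inside a partition are kept newest-first, mirroring
;; CLUSTERING ORDER BY (played_at DESC).
(def empty-partition (sorted-map-by #(compare %2 %1)))

(defn add-play
  "Adds a row to the partition identified by [username year_and_month].
  The outer map is an ordinary hash map, so partitions have no useful order."
  [table {:keys [username year_and_month played_at] :as row}]
  (update-in table [[username year_and_month]]
             (fnil assoc empty-partition)
             played_at row))

(defn recent-plays
  "Reads the first `n` rows of one partition, already sorted newest-first."
  [table username year-and-month n]
  (take n (vals (get table [username year-and-month]))))
```

Reading within a partition needs no sorting at query time, which is why the clustering order is chosen to match the query.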
  26. Basic Last.fm Clone
      • See user X’s favorite songs in a specific month
        SELECT song, artist, play_count
        FROM user_history
        WHERE username = 'nickmbailey' AND month = 'July'
        ORDER BY play_count DESC;
      • Partition key = ‘username’, ‘month’
      • Clustering key = ‘play_count’?
      • Counters are a special case
  27. Counters
      • A counter cannot be part of the PRIMARY KEY
      • No ordering based on counter value
      • All non-counter columns must be part of the PRIMARY KEY
      • Limitations due to the storage format
  28. Basic Last.fm Clone
      • See user X’s favorite songs in a specific month
        CREATE TABLE user_song_counts (
          username text,
          year_and_month text,
          artist text,
          song text,
          play_count counter,
          PRIMARY KEY ((username, year_and_month), artist, song))
  29. Basic Last.fm Clone
      • See user X’s favorite songs in a specific month
      • Results unordered
      • Client will have to do the sorting
      cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';

       username    | year_and_month | artist   | song                              | count
      -------------+----------------+----------+-----------------------------------+-------
       nickmbailey | 2014-07        | Amos Lee | Tricksters, Hucksters, And Scamps |    10
       nickmbailey | 2014-07        | Beck     | Blackbird Chain                   |     1
       nickmbailey | 2014-07        | Beck     | Blue Moon                         |     4
       nickmbailey | 2014-07        | Cherub   | <3                                |    12
       nickmbailey | 2014-07        | Cherub   | Chocolate Strawberries            |     6
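Because the counter cannot drive the ordering, ranking happens on the client. A minimal sketch, assuming the rows arrive as Clojure maps keyed by column name and that the counter column is `:play_count` as in the table definition (the sample data is copied from the output above; `top-songs` is a hypothetical helper):

```clojure
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :play_count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :play_count 1}
   {:artist "Beck"     :song "Blue Moon"                         :play_count 4}
   {:artist "Cherub"   :song "<3"                                :play_count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :play_count 6}])

(defn top-songs
  "Sorts rows by play count, highest first, and keeps the first `n`."
  [rows n]
  (->> rows
       (sort-by :play_count >)
       (take n)
       vec))
```

Since the partition holds only one user's plays for one month, the result set stays small enough that sorting it in memory is reasonable.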
  30. Basic Last.fm Clone
      • See who has recently listened to artist Y
        CREATE TABLE artist_history (
          artist text,
          year_and_week text,
          played_at timestamp,
          album text,
          song text,
          username text,
          PRIMARY KEY ((artist, year_and_week), played_at)
        ) WITH CLUSTERING ORDER BY (played_at DESC)
  31. Basic Last.fm Clone
      • See artist Y’s most popular songs in a specific week
        CREATE TABLE artist_song_counts (
          artist text,
          year_and_week text,
          album text,
          song text,
          play_count counter,
          PRIMARY KEY ((artist, year_and_week), album, song))
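Denormalizing into four tables means a single "song played" event turns into four writes. The sketch below builds those statements as plain data (CQL string plus bind values) without touching a driver. Several things here are assumptions: the shape of the event map, the `2014-W27` week format (the slides do not show what `year_and_week` looks like), UTC bucketing, and the function names. Converting `played-at` to whatever timestamp type the driver expects is also left out.

```clojure
(import '(java.time Instant YearMonth ZoneOffset)
        '(java.time.temporal IsoFields))

(defn year-and-month [^Instant t]
  (str (YearMonth/from (.atZone t ZoneOffset/UTC))))

(defn year-and-week
  "ISO week bucket such as \"2014-W27\". The format is an assumption."
  [^Instant t]
  (let [d (.atZone t ZoneOffset/UTC)]
    (format "%d-W%02d"
            (.get d IsoFields/WEEK_BASED_YEAR)
            (.get d IsoFields/WEEK_OF_WEEK_BASED_YEAR))))

(defn play->statements
  "Returns one [cql bind-values] pair per table that must be updated."
  [{:keys [username artist album song played-at]}]
  (let [month (year-and-month played-at)
        week  (year-and-week played-at)]
    [["INSERT INTO user_history (username, year_and_month, played_at, album, artist, song) VALUES (?, ?, ?, ?, ?, ?);"
      [username month played-at album artist song]]
     ;; Counters are changed with UPDATE, never INSERT.
     ["UPDATE user_song_counts SET play_count = play_count + 1 WHERE username = ? AND year_and_month = ? AND artist = ? AND song = ?;"
      [username month artist song]]
     ["INSERT INTO artist_history (artist, year_and_week, played_at, album, song, username) VALUES (?, ?, ?, ?, ?, ?);"
      [artist week played-at album song username]]
     ["UPDATE artist_song_counts SET play_count = play_count + 1 WHERE artist = ? AND year_and_week = ? AND album = ? AND song = ?;"
      [artist week album song]]]))
```

Keeping statement construction separate from execution makes the fan-out easy to inspect at the REPL before handing each pair to a client library.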
  32. Cassandra from Clojure
  33. Building Blocks
      • Java Driver
      • Hayt
  34. Java Driver
      • Fully featured
      • Connection pooling
      • Failover policies
      • Retry policies
      • Sync and Async interfaces
      • Exposes client metrics
      • https://github.com/datastax/java-driver
  35. Hayt
      • CQL DSL
      • Similar to Korma
      • Solely for building CQL strings
      • https://github.com/mpenet/hayt
        (select :foo
                (where {:bar 1 :baz 2}))

        (->raw (select :foo
                       (where {:bar 1 :baz 2})))
        > "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"
  36. Clients
      • Alia
        • https://github.com/mpenet/alia
      • Cassaforte
        • https://github.com/clojurewerkz/cassaforte
      • Both built on Java Driver and Hayt
      • Not particularly different
  37. Alia vs. Cassaforte
      Cassaforte
        (let [conn (cc/connect ["127.0.0.1"])]
          (cql/create-keyspace conn "cassaforte_keyspace"
            (with {:replication {:class "SimpleStrategy"
                                 :replication_factor 1}})))
      Alia
        (def cluster (alia/cluster {:contact-points ["localhost"]}))
        (def session (alia/connect cluster))
        (alia/execute session
          (create-keyspace :alia
            (if-exists false)
            (with {:replication {:class "SimpleStrategy"
                                 :replication_factor 1}})))
  38. Learn by Example - Alia
  39. Cluster Object
      • Entry point
      • Configures relevant client options
        • :contact-points
        • :load-balancing-policy
        • :reconnection-policy
        • :retry-policy
        • and more!
        (def cluster (alia/cluster {:contact-points ["localhost"]}))
  40. Session Object
      • A Session is associated with a keyspace
      • Allows interacting with multiple keyspaces
        (def cluster (alia/cluster {:contact-points ["localhost"]}))
        (def session (alia/connect cluster))
        (def session (alia/connect cluster :my_keyspace))
  41. Querying
      • Multiple ways to query
      • alia/execute
        • Synchronous, block on result
      • alia/execute-async
        • Returns a Lamina result-channel (basically, a promise)
        • Optional success/error callbacks
      • alia/execute-chan
        • Returns a core.async channel
        • We won’t dive into core.async now
  42. Prepared Statements
      • Statements can be prepared server side
      • Better performance for common queries
        (def prepared-statement
          (alia/prepare session "select * from users where user_name=?;"))
  43. What else?
      • See github and docs
      • https://github.com/mpenet/alia
      • http://mpenet.github.io/alia/qbits.alia.html
  44. Demo
  45. Demo
      • https://github.com/nickmbailey/clojure-cassandra-demo
      • Built with
        • CCM - https://github.com/pcmanus/ccm
        • Alia - https://github.com/mpenet/alia
        • ring - https://github.com/ring-clojure/ring
        • compojure - https://github.com/weavejester/compojure
        • hiccup - https://github.com/weavejester/hiccup
        • least - https://github.com/Raynes/least
  46. More
      Cassandra: http://cassandra.apache.org
      DataStax Drivers: https://github.com/datastax
      Documentation: http://www.datastax.com/docs
      Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html
      Developer Blog: http://www.datastax.com/dev/blog
      Cassandra Community Site: http://planetcassandra.org
      Download: http://planetcassandra.org/Download/DataStaxCommunityEdition
      Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars
      Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit