Using Apache Cassandra: What is this thing, and how do I use it?

This is the presentation I gave at the Reflections | Projections conference at UIUC (http://www.acm.uiuc.edu/conference/2013/). It introduces the basics of Apache Cassandra and shows how to get it up and running on your development machine. It then uses the DataStax Python Driver and the Cassandra Query Language (CQL) to create tables, write data to them, and read it back out.

  1. Using Apache Cassandra for Big Data
     What is this thing, and how do I use it?
     Jeremiah Jordan
     Lead Software Engineer/Support
     @zanson
     ©2013 DataStax. Do not distribute without consent.
     Monday, October 14, 13
  2. Who I am
     • Jeremiah Jordan
     • Lead Software Engineer in Support at DataStax
     • Previously Senior Architect at Morningstar, Inc.
     • Using Cassandra since 0.6
     • Before that, wrote code for the F22
  3. Cassandra - An introduction
  4. Cassandra - Intro
     • Based on Amazon Dynamo and Google BigTable papers
     • Shared nothing
     • Distributed
     • Data safe as possible
     • Predictable scaling
  5. Cassandra - More than one server
     • All nodes participate in a cluster
     • Shared nothing
     • Add or remove as needed
     • More capacity? Add a server
     • Each node owns a number of tokens
     • Tokens denote a range of keys
     • 4 nodes? -> Key range/4
     • Each node owns 1/4 the data
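The token arithmetic on this slide can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: the 100-token space, the one-token-per-node layout, the MD5-based `token_for` hash, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib

TOKEN_SPACE = 100  # hypothetical, tiny token space (the real one is far larger)


def assign_tokens(nodes):
    """Give each node one evenly spaced token; a node owns the keys up to and including its token."""
    step = TOKEN_SPACE // len(nodes)
    return {(i + 1) * step - 1: node for i, node in enumerate(nodes)}


def token_for(key):
    """Hash a key onto the ring (MD5 is used here purely for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(ring, token):
    """Return the node whose token is the first one at or after `token`, wrapping around."""
    tokens = sorted(ring)
    index = bisect.bisect_left(tokens, token) % len(tokens)
    return ring[tokens[index]]


ring = assign_tokens(["node1", "node2", "node3", "node4"])

# Count how many tokens each node owns.
share = {node: 0 for node in ring.values()}
for token in range(TOKEN_SPACE):
    share[owner(ring, token)] += 1

print(share)  # each of the 4 nodes owns 25 of the 100 tokens, i.e. 1/4 of the key range
print(owner(ring, token_for("jbellis")))  # the node that owns this key
```

Adding a fifth node to the list shrinks every share to 1/5, which is the "More capacity? Add a server" point in miniature.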
  6. Cassandra - Locally Distributed
     • Client writes to any node
     • Node coordinates with others
     • Data replicated in parallel
     • Replication factor (RF): How many copies of your data?
     • RF = 3 here
     Each node stores 3/4 of the cluster's total data.
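The "3/4 of the data" figure follows from simple counting. The sketch below assumes four equally sized token ranges and a placement rule where each range is stored on its owner plus the next RF - 1 nodes around the ring; the node names and the `replicas_for` helper are hypothetical.

```python
from collections import Counter

NODES = ["node1", "node2", "node3", "node4"]
RF = 3  # replication factor: number of copies of each range


def replicas_for(range_index, nodes=NODES, rf=RF):
    """Replicas for a range: its owner plus the next rf - 1 nodes around the ring."""
    return [nodes[(range_index + i) % len(nodes)] for i in range(rf)]


# Count how many of the 4 ranges each node ends up storing.
ranges_per_node = Counter(
    node for r in range(len(NODES)) for node in replicas_for(r)
)
fraction = {node: count / len(NODES) for node, count in ranges_per_node.items()}

print(replicas_for(3))  # ['node4', 'node1', 'node2'] (wraps around the ring)
print(fraction)         # every node holds 3 of 4 ranges, i.e. 0.75 of the data
```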
  7. Cassandra - Geographically Distributed
     • Client writes local
     • Data syncs across WAN
     • Replication Factor per DC
     Single coordinator
  8. Cassandra - Consistency
     • Consistency Level (CL)
     • Client specifies per read or write
     • ALL = All replicas ack
     • QUORUM = A majority (more than half) of replicas ack
     • LOCAL_QUORUM = A majority of replicas in the local DC ack
     • ONE = Only one replica acks
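Each consistency level boils down to a number of acknowledgements the coordinator waits for. As a rough sketch, a quorum is floor(RF / 2) + 1 replicas. The `required_acks` helper and the two-data-center layout below are hypothetical and only illustrate the arithmetic.

```python
def required_acks(consistency_level, rf_per_dc, local_dc):
    """Number of replica acks needed for a request at the given consistency level."""
    total_rf = sum(rf_per_dc.values())
    if consistency_level == "ALL":
        return total_rf
    if consistency_level == "QUORUM":
        return total_rf // 2 + 1  # majority of all replicas
    if consistency_level == "LOCAL_QUORUM":
        return rf_per_dc[local_dc] // 2 + 1  # majority within the local DC only
    if consistency_level == "ONE":
        return 1
    raise ValueError("unknown consistency level: %s" % consistency_level)


# Hypothetical cluster: two data centers with RF = 3 in each.
rf_per_dc = {"dc1": 3, "dc2": 3}
for cl in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(cl, required_acks(cl, rf_per_dc, local_dc="dc1"))
# ALL 6, QUORUM 4, LOCAL_QUORUM 2, ONE 1
```

LOCAL_QUORUM needs only 2 acks here, all from the local data center, so the request does not wait on the WAN.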
  9. Cassandra - Transparent to the application
     • A single node failure shouldn’t cause an application failure
     • Replication Factor + Consistency Level = Success
     • This example:
       • RF = 3
       • CL = QUORUM
     A majority of replicas ack, so we are good!
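The RF = 3, CL = QUORUM example can be checked by counting which replicas are still up. This is a toy model with hypothetical node names; a real coordinator also deals with timeouts and retries, which are left out here.

```python
def request_succeeds(replica_up, required):
    """A request succeeds when at least `required` replicas are up to acknowledge it."""
    acks = sum(1 for up in replica_up.values() if up)
    return acks >= required


RF = 3
QUORUM = RF // 2 + 1  # 2 of 3 replicas

one_node_down = {"node1": True, "node2": False, "node3": True}
two_nodes_down = {"node1": True, "node2": False, "node3": False}

print(request_succeeds(one_node_down, QUORUM))   # True: 2 acks meet the quorum of 2
print(request_succeeds(two_nodes_down, QUORUM))  # False: only 1 ack
print(request_succeeds(two_nodes_down, 1))       # True: CL ONE still succeeds
```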
  10. Application Example - Layout
      • Active-Active
      • Service-based DNS routing
      Cassandra Replication
  11. Application Example - Uptime
      • Normal server maintenance
      • Application is unaware
      Cassandra Replication
  12. Application Example - Failure
      • Data center failure
      Another happy user!
      • Data is safe. Route traffic.
  13. Five Years of Cassandra
      0.1   Jul-08
      ...
      0.3   Jul-09
      0.6   May-10
      0.7   Feb-11
      1.0   Dec-11
      DSE
      1.2   Oct-12
      2.0   Jul-13
  14. Cassandra 2.0 - Big new features
  15. Lightweight transactions: the problem
      Session 1:
          SELECT * FROM users WHERE username = 'jbellis'
          [empty resultset]
          INSERT INTO users (username, password) VALUES ('jbellis', 'xdg44hh')
      Session 2:
          SELECT * FROM users WHERE username = 'jbellis'
          [empty resultset]
          INSERT INTO users (username, password) VALUES ('jbellis', '8dhh43k')
      It’s a Race! Who wins?
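The race on this slide can be replayed deterministically with an in-memory stand-in for the users table. The dict and helper functions below are hypothetical; the point is only that a plain INSERT behaves as an upsert, so a check-then-insert sequence lets the later write silently replace the earlier one.

```python
users = {}  # in-memory stand-in for the users table


def select_user(username):
    """SELECT: returns None when there is no such row (the empty result set)."""
    return users.get(username)


def insert_user(username, password):
    """Plain INSERT: an unconditional upsert, so the last write wins."""
    users[username] = password


# Both sessions check first, and both see that the name is free.
seen_by_session_1 = select_user("jbellis")
seen_by_session_2 = select_user("jbellis")

# Both then insert, each believing it created the account.
if seen_by_session_1 is None:
    insert_user("jbellis", "xdg44hh")
if seen_by_session_2 is None:
    insert_user("jbellis", "8dhh43k")

print(users)  # {'jbellis': '8dhh43k'}: session 1's row was silently overwritten
```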
  16. LWT: details
      • 4 round trips vs 1 for normal updates
      • Paxos - Paxos made easy
      • Immediate consistency with no leader election or failover
      • For reads, ConsistencyLevel.SERIAL
      • http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
  17. Using LWT
      • Don’t overwrite an existing record
          INSERT INTO USERS (username, email, ...)
          VALUES ('jbellis', 'jbellis@datastax.com', ... )
          IF NOT EXISTS;
      • Only update record if condition is met
          UPDATE USERS
          SET email = 'jonathan@datastax.com', ...
          WHERE username = 'jbellis'
          IF email = 'jbellis@datastax.com';
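The semantics of the two conditional statements can be modelled as compare-and-set operations. The `UsersTable` class below is a hypothetical single-process stand-in: in Cassandra the check and the write are agreed on by the replicas via Paxos, and the outcome is reported back in an `[applied]` column, which the boolean return values here imitate.

```python
class UsersTable:
    """In-memory stand-in that mimics the outcome of conditional writes."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, username, email):
        """INSERT ... IF NOT EXISTS: applied only when the row is absent."""
        if username in self._rows:
            return False
        self._rows[username] = {"email": email}
        return True

    def update_email_if(self, username, new_email, expected_email):
        """UPDATE ... IF email = ...: applied only when the current value matches."""
        row = self._rows.get(username)
        if row is None or row["email"] != expected_email:
            return False
        row["email"] = new_email
        return True

    def email_of(self, username):
        return self._rows[username]["email"]


table = UsersTable()
print(table.insert_if_not_exists("jbellis", "jbellis@datastax.com"))   # True
print(table.insert_if_not_exists("jbellis", "impostor@example.com"))   # False: row exists
print(table.update_email_if("jbellis", "jonathan@datastax.com",
                            "jbellis@datastax.com"))                   # True
print(table.update_email_if("jbellis", "other@example.com",
                            "jbellis@datastax.com"))                   # False: email changed
print(table.email_of("jbellis"))                                       # jonathan@datastax.com
```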
  18. LWT: Use with caution
      • Great for 1% of your application
      • Eventual consistency is your friend
      • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  19. Installing Cassandra
  20. Download Cassandra
  21. Download Cassandra
  22. Download Cassandra
  23. Extract Cassandra
  24. Setup Data and Log Directories
  25. Start Cassandra
  26. Start Cassandra
  27. Installing Cassandra Python Driver
  28. Python Cassandra Driver
  29. Install Python Cassandra Driver
  30. Connect and Create a Keyspace
      import logging
      from cassandra.cluster import Cluster

      # The slides log through `log`; set up a basic logger for it.
      logging.basicConfig(level=logging.INFO)
      log = logging.getLogger()

      # Connect to the node running on the local machine.
      cluster = Cluster(['127.0.0.1'])
      session = cluster.connect()

      log.info("creating keyspace...")
      KEYSPACE = "testkeyspace"
      session.execute("""
          CREATE KEYSPACE IF NOT EXISTS %s
          WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' }
          """ % KEYSPACE)
  31. Create a Table
      log.info("setting keyspace...")
      session.set_keyspace(KEYSPACE)
      log.info("creating table...")
      session.execute("""
          CREATE TABLE IF NOT EXISTS mytable (
              thekey text,
              col1 text,
              col2 text,
              PRIMARY KEY (thekey, col1)
          )
          """)
  32. Insert a Row
      from cassandra import ConsistencyLevel
      from cassandra.query import SimpleStatement

      # Attach a per-statement consistency level to the INSERT.
      query = SimpleStatement("""
          INSERT INTO mytable (thekey, col1, col2)
          VALUES ('key1', 'a', 'b')
          """, consistency_level=ConsistencyLevel.ONE)
      log.info("inserting row")
      session.execute(query)
  33. Insert Rows (Prepared Statement)
      prepared = session.prepare("""
          INSERT INTO mytable (thekey, col1, col2)
          VALUES (?, ?, ?)
          """)
      for i in range(10):
          log.info("inserting row %d" % i)
          bound = prepared.bind(("key%d" % i, "b%d" % i, "c%d" % i))
          session.execute(bound)
  34. Query Results
      future = session.execute_async("""
          SELECT * FROM mytable WHERE thekey='key1'
          """)
      rows = future.result()
      log.info("key\tcol1\tcol2")
      log.info("---\t----\t----")
      for row in rows:
          log.info("\t".join(row))
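To see what this query should return without a running cluster, the table can be modelled in memory. With PRIMARY KEY (thekey, col1), thekey selects the partition and col1 orders the rows inside it, so the two inserts that share 'key1' produce two rows rather than one. The `table` dict and the `insert` and `select_partition` helpers are hypothetical stand-ins, not driver calls.

```python
from collections import namedtuple

Row = namedtuple("Row", ["thekey", "col1", "col2"])

table = {}  # partition key (thekey) -> {clustering value (col1) -> Row}


def insert(thekey, col1, col2):
    """Upsert a row identified by its full primary key (thekey, col1)."""
    table.setdefault(thekey, {})[col1] = Row(thekey, col1, col2)


def select_partition(thekey):
    """Return the rows of one partition, ordered by the clustering column."""
    partition = table.get(thekey, {})
    return [partition[col1] for col1 in sorted(partition)]


# Replay the single-row insert and the prepared-statement loop from the slides.
insert("key1", "a", "b")
for i in range(10):
    insert("key%d" % i, "b%d" % i, "c%d" % i)

lines = ["key\tcol1\tcol2", "---\t----\t----"]
lines += ["\t".join(row) for row in select_partition("key1")]
print("\n".join(lines))
```

Joining a row with "\t" works here only because every column is text; rows with other column types would need converting to strings first.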
  35. Run It
  36. Cassandra Applications - Drivers
      • DataStax Drivers for Cassandra
        • Java
        • C#
        • Python
        • more on the way
  37. Find Out More
      Cassandra: http://cassandra.apache.org
      DataStax Drivers: https://github.com/datastax
      Documentation: http://www.datastax.com/docs
      Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html
      Developer Blog: http://www.datastax.com/dev/blog
      Cassandra Community Site: http://planetcassandra.org
      Download: http://planetcassandra.org/Download/DataStaxCommunityEdition
      Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars
      Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit
  38. ©2013 DataStax Confidential. Do not distribute without consent.