In this session you’ll learn just enough to get started with NoSQL Apache Cassandra and Python, including how to install, build and try out some basic API calls. You’ll learn the basics of how to code your first application in Python on top of Cassandra, and leave the session feeling confident and excited to take the next step!
2. This should be boring
• Talking to a database should not
be any of the following:
• Exciting
• "AH HA!"
• Confusing
git@github.com:rustyrazorblade/python-presentation.git
10. 1 from cassandra.concurrent import execute_concurrent_with_args
2
3 stmt = """SELECT * FROM sensor_data WHERE sensor_id=?
4 ORDER BY created_at DESC LIMIT 1""")
5
6 select_statement = session.prepare(stmt)
7
8 sensor_ids = [["f472d5ff-0c76-404a-8044-038db416685e"],
9 ["940cb741-d5b5-4c5d-82f5-bf1aa61c6d47"],
10 ["497d4b2c-cba2-4d0f-bd80-42de612690fd"],
11 ["1bdeac75-7e12-43ba-80b5-2d38405f9843"]
12
13 result = execute_concurrent_with_args(session, select_statement, sensor_ids)
Async Queries (managed)
prepared statement
automatically manages concurrency
11. Performance Considerations
• Like SQL, CQL features IN() but in
general, it's terrible for
performance
• Results in more GC & perf
problems
• BATCH has the same issue
• Failure to get a single result
causes entire IN() or batch to retry
13. Defining Models
• Each model maps to a single table
• Every model inherits from cassandra.cqlengine.models.Model
• Define fields in your table programatically
• Collections map to native Python types (lists, sets, dict)
• Table management included (no need to write ALTER)
14. Model with Collections
• Sets & Maps are most useful
• Use to denormalize
• Lists can have performance issues if misused
1 class Message(Model):
2 message_id = TimeUUID(primary_key=True, default=uuid1)
3 subject = Text()
4 body = Text()
5 addressed_to = Set(UUID)
6
7 class Photo(Model):
8 photo_id = UUID(primary_key=True, default=uuid4)
9 title = Text()
10 likes = Map<UUID, Text>
15. Clustering Keys
• Automatically determined by
ordering in model
• First primary key is partition key
• The rest are clustering keys
1 class UsersInGroup(Model):
2 group_id = UUID(primary_key=True)
3 user_id = UUID(primary_key=True)
4 is_admin = Boolean()
5
6
1 class UsersInGroupByState(Model):
2 group_id = UUID(primary_key=True, partition_key=True)
3 state = Text(primary_key=True, partition_key=True
4 user_id = UUID(primary_key=True)
5 is_admin = Boolean(default=False)
17. Lightweight Transactions
• Uses paxos for consensus
• IF NOT EXISTS for INSERT
• IF FIELD=VALUE for UPDATE
• Use sparingly - requires
several round trips
18. Batches
• Use only to maintain multiple views (for consistency purposes)
1 class User(Model):
2 name = Text(primary_key=True)
3 twitter = Text()
4 email = Text()
5
6 class TwitterToUser(Model):
7 twitter = Text(primary_key=True)
8 name = Text()
9
10 (twitter, name) = ("rustyrazorblade", "jon")
11
12 with BatchQuery() as b:
13 User.batch(b).create(name=name, twitter=twitter)
14 EmailToUser.batch(b).create(twitter=twitter, name=name)
19. Fetching a Row
• Model.get() can be used to
fetch a single row
• Will throw a DoesNotExist
exception if not found