Cassandra for Python Developers

A high level introduction to Apache Cassandra followed by an introduction to pycassa, the Python client library for Cassandra.

Presented at PyTexas 2011 by Tyler Hobbs.

Transcript

  • 1. Cassandra for Python Developers
      Tyler Hobbs
  • 2. History
      Open-sourced by Facebook (2008)
      Apache Incubator (2009)
      Top-level Apache project (2010)
      DataStax founded (2010)
  • 3. Strengths
      Scalable
        – 2x Nodes == 2x Performance
      Reliable (Available)
        – Replication that works
        – Multi-DC support
        – No single point of failure
  • 4. Strengths
      Fast
        – 10-30k writes/sec, 1-10k reads/sec
      Analytics
        – Integrated Hadoop support
  • 5. Weaknesses
      No ACID transactions
        – Don't need these as often as you'd think
      Limited support for ad-hoc queries
        – You'll give these up anyway when sharding an RDBMS
      Generally complements another system
        – Not intended to be one-size-fits-all
  • 6. Clustering Every node plays the same role – No masters, slaves, or special nodes
  • 7. Clustering 0 50 10 40 20 30
  • 8. Clustering Key: “www.google.com” 0 50 10 40 20 30
  • 9. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 10. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 11. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 12. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30 Replication Factor = 3
  • 13. Clustering Client can talk to any node
  • 14. Data Model
      Keyspace
        – A collection of Column Families
        – Controls replication settings
      Column Family
        – Kinda resembles a table
  • 15. Column Families
      Static
        – Object data
      Dynamic
        – Pre-calculated query results
  • 16. Static Column Families
      Users
        zznate    password: *   name: Nate
        driftx    password: *   name: Brandon
        thobbs    password: *   name: Tyler
        jbellis   password: *   name: Jonathan   site: riptano.com
  • 17. Dynamic Column Families Following zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdennis: pcmanus thobbs: xedin: zznate
  • 18. Dynamic Column Families
      Timeline of tweets by a user
      Timeline of tweets by all of the people a user is following
      List of comments sorted by score
      List of friends grouped by state
  • 19. Pycassa Python client library for Cassandra Open Source (MIT License) – www.github.com/pycassa/pycassa Users – Reddit – ~10k github downloads of every version
  • 20. Installing Pycassa easy_install pycassa – or pip
  • 21. Basic Layout pycassa.pool – Connection pooling pycassa.columnfamily – Primary module for the data API pycassa.system_manager – Schema management
  • 22. The Data API RPC-based API Rows are like a sorted list of (name,value) tuples – Like a dict, but sorted by the names – OrderedDicts are used to preserve sorting
  • 23. Inserting Data
      >>> from pycassa.pool import ConnectionPool
      >>> from pycassa.columnfamily import ColumnFamily
      >>>
      >>> pool = ConnectionPool("MyKeyspace")
      >>> cf = ColumnFamily(pool, "MyCF")
      >>>
      >>> cf.insert("key", {"col_name": "col_value"})
      >>> cf.get("key")
      {"col_name": "col_value"}
  • 24. Inserting Data
      >>> columns = {"aaa": 1, "ccc": 3}
      >>> cf.insert("key", columns)
      >>> cf.get("key")
      {"aaa": 1, "ccc": 3}
      >>>
      >>> # Updates are the same as inserts
      >>> cf.insert("key", {"aaa": 42})
      >>> cf.get("key")
      {"aaa": 42, "ccc": 3}
      >>>
      >>> # We can insert anywhere in the row
      >>> cf.insert("key", {"bbb": 2, "ddd": 4})
      >>> cf.get("key")
      {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
  • 25. Fetching Data
      >>> cf.get("key")
      {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
      >>>
      >>> # Get a set of columns by name
      >>> cf.get("key", columns=["bbb", "ddd"])
      {"bbb": 2, "ddd": 4}
  • 26. Fetching Data
      >>> # Get a slice of columns
      >>> cf.get("key", column_start="bbb",
      ...        column_finish="ccc")
      {"bbb": 2, "ccc": 3}
      >>>
      >>> # Slice from "ccc" to the end
      >>> cf.get("key", column_start="ccc")
      {"ccc": 3, "ddd": 4}
      >>>
      >>> # Slice from "bbb" to the beginning
      >>> cf.get("key", column_start="bbb",
      ...        column_reversed=True)
      {"bbb": 2, "aaa": 42}
  • 27. Fetching Data
      >>> # Get the first two columns in the row
      >>> cf.get("key", column_count=2)
      {"aaa": 42, "bbb": 2}
      >>>
      >>> # Get the last two columns in the row
      >>> cf.get("key", column_reversed=True,
      ...        column_count=2)
      {"ddd": 4, "ccc": 3}
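
The slice behaviour on slides 25-27 follows from treating a row as a list of (name, value) pairs sorted by name. The sketch below models that in pure Python so it can be run without a cluster. It is not pycassa's implementation: `get_slice` is a hypothetical helper, and only its parameter names are borrowed from the slides.

```python
def get_slice(row, column_start=None, column_finish=None,
              column_reversed=False, column_count=100):
    """Return up to column_count (name, value) pairs from a row dict.

    Hypothetical model of the slide semantics: columns are visited in
    name order (or reverse name order) and both bounds are inclusive.
    """
    names = sorted(row, reverse=column_reversed)
    result = []
    for name in names:
        if column_start is not None:
            # Skip columns that come before the start in iteration order.
            if (name > column_start) if column_reversed else (name < column_start):
                continue
        if column_finish is not None:
            # Stop once we have walked past the finish column.
            if (name < column_finish) if column_reversed else (name > column_finish):
                break
        result.append((name, row[name]))
        if len(result) == column_count:
            break
    return result


# Example row, matching the data inserted on slide 24.
row = {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
```

For example, `get_slice(row, column_start="bbb", column_reversed=True)` gives `[("bbb", 2), ("aaa", 42)]`, matching the "slice from bbb to the beginning" case on slide 26.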
  • 28. Fetching Multiple Rows
      >>> columns = {"col": "val"}
      >>> cf.batch_insert({"k1": columns,
      ...                  "k2": columns,
      ...                  "k3": columns})
      >>>
      >>> # Get multiple rows by name
      >>> cf.multiget(["k1", "k2"])
      {"k1": {"col": "val"}, "k2": {"col": "val"}}
      >>> # You can get slices of each row, too
      >>> cf.multiget(["k1", "k2"], column_start="bbb")
      …
  • 29. Fetching a Range of Rows
      >>> # Get a generator over all of the rows
      >>> for key, columns in cf.get_range():
      ...     print key, columns
      "k1" {"col": "val"}
      "k2" {"col": "val"}
      "k3" {"col": "val"}
      >>> # You can get slices of each row
      >>> cf.get_range(column_start="bbb")
      …
  • 30. Fetching Rows by Secondary Index
      >>> from pycassa.index import *
      >>>
      >>> # Build up our index clause to match
      >>> exp = create_index_expression("name", "Joe")
      >>> clause = create_index_clause([exp])
      >>> matches = users.get_indexed_slices(clause)
      >>>
      >>> # matches is a generator over matching rows
      >>> for key, columns in matches:
      ...     print key, columns
      "13" {"name": "Joe", "nick": "thatguy2"}
      "257" {"name": "Joe", "nick": "flowers"}
      "98" {"name": "Joe", "nick": "fr0d0"}
  • 31. Deleting Data
      >>> # Delete a whole row
      >>> cf.remove("key1")
      >>>
      >>> # Or selectively delete columns
      >>> cf.remove("key2", columns=["name", "date"])
  • 32. Connection Management
      pycassa.pool.ConnectionPool
        – Takes a list of servers
            • Can be any set of nodes in your cluster
        – pool_size, max_retries, timeout
        – Automatically retries operations against other nodes
            • Writes are idempotent!
        – Individual node failures are transparent
        – Thread safe
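
The retry behaviour described on slide 32 can be illustrated with a small stand-alone sketch. This is not pycassa's pool code: `NodeUnavailable` and `call_with_failover` are hypothetical names, and the rotation strategy is an assumption. It only shows the idea that a failed operation can be retried on another node because the write is idempotent.

```python
import random


class NodeUnavailable(Exception):
    """Hypothetical error raised when a node cannot be reached."""


def call_with_failover(servers, operation, max_retries=5):
    """Run operation(server), moving to another server after each failure.

    Makes one initial attempt plus up to max_retries retries. Retrying is
    only safe when the operation is idempotent, as the slide notes for writes.
    """
    candidates = list(servers)
    if not candidates:
        raise ValueError("at least one server is required")
    random.shuffle(candidates)  # spread load across the listed nodes
    last_error = None
    for attempt in range(max_retries + 1):
        server = candidates[attempt % len(candidates)]
        try:
            return operation(server)
        except NodeUnavailable as exc:
            last_error = exc  # remember the failure and try the next node
    raise last_error
```

A caller passes any callable that takes a server address, so a single unreachable node is invisible to the caller as long as another node in the list answers within the retry budget.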
  • 33. Async Options
      eventlet
        – Just need to monkeypatch socket and threading
      Twisted
        – Use Telephus instead of Pycassa
        – www.github.com/driftx/telephus
        – Less friendly, documented, etc.
  • 34. Tyler Hobbs
      @tylhobbs
      tyler@datastax.com
