Your SlideShare is downloading. ×
0
HOW DO ICASSANDRA?Rick Branson, DataStaxMemphis Python Users GroupNovember 21st, 2011
WHO?•   This is my first day at DataStax, halp!•   Coroutine for 18 months•   American Roamer for 3 years•   FedEx for 4 y...
OCCUPY SQLWhy is it that 99% of the applications go into 1% of database softwares?
HOW DO I NOSQL?1. Anvil transitions. Deal with it.2. Find data store that doesn’t use SQL3. ANYTHING4. Cram all the things...
SEEMS LEGIT... it really is legit. There are real data sets that don’t make sense in the relational model, nor            ...
THE GREAT HAMMER... but, the typical developer only has a relational       database; everything looks like a nail.
SNAKE OILTypically, developers pick data structures suited to each use case. The purveyors of relational would  have you b...
FIT WHAT INTO WHERE? Trees, Semi-Structured Data, Web Content,     Multi-Dimensional Cubes, Graphs
HAMMERS GONNA HAMMER It does work – so does COBOL. Just like a motion    picture film, the speed of modern computers      ...
MAKE A WHAT OUTTA WHO?     ASSUMPTIONS MADE ON YOUR BEHALF• Data must be synchronized everywhere,  instantaneously• The st...
THE MOVEMENT“No one tool is appropriate for   solving every problem.”                     ~ Cliff McKinney                ...
SRS BZNS10gen, Basho, DataStax, Cloudant, Cloudera,       HortonWorks, VMware, Oracle                         Guess the nu...
MOVING ONOH: “The last DBA in the room went dataway”
AN DEFINITION  Cassandra – affectionately “C*” – is an opensource, distributed store for structured data that scales-out o...
FUH-GET ABOUT IT                     Ruins availability  • Strict Consistency  • Transactions Deadlocks & Not scalable  • ...
THE REWARDS•   Simplicity of Operations•   Transparency•   Very High Availability•   Painless Scale-Out•   Solid, Predicta...
Cassandra is about trading somedeveloper time for operational pain – or 3am phone calls as some of us              know it.
DEM GOOD GENES Put simply: the scalable, efficient storage model from Google’s BigTable atop the incredibly faultresistant...
VS OTHER NoSQL•   Truly Distributed Design•   No Special Servers, No SPOF•   2D Data Model•   Rocks on Spinning Disks•   S...
HORRIBLE IDEAS     THINGS NOT TO USE CASSANDRA FOR•   Prototyping•   Temporal Data•   Distributed Locks•   Message Queues•...
UNBOUNDED HUMANS ... OR DATA THAT THRIVES ON CASSANDRA• Social  Voluntary, human-generated data: text messages, comments, ...
TO START• easy_install pycassa• Pretty much ignore any info pre-0.7• Use DataStax Community Edition• Unfortunately, the “o...
WHY CASSANDRA +          PYTHON?• pycassa is one of the best client• Healthy Cassandra+Python Community• Many examples of ...
PYTHON + CASSANDRANOTABLE IMPLEMENTATIONS• Reddit  Caches generated comment trees in Cassandra• UrbanAirship  Delivers 20M...
DATA MODEL“I gotcha column family ratttt heeeeyyyaahhh!”
AN COLUMN FAMILY jim     age: 36   car: camaro   gender: M carol   age: 37   car: subaru   gender: Fjohnny   age:12    gen...
AN COLUMN FAMILY  • Don’t think in relational database terms  • Columns are { name, value } tuples  • It’s dictionaries al...
Row Key  jim     age: 36   car: camaro   gender: M carol    age: 37   car: subaru   gender: F johnny   age:12    gender: M...
Row Key  jim     age: 36   car: camaro   gender: M carol    age: 37   car: subaru   gender: F johnny   age:12    gender: M...
Columns jim     age: 36    car: camaro   gender: Mcarol    age: 37    car: subaru   gender: Fjohnny   age:12     gender: M...
Column Names jim     age: 36   car: camaro   gender: Mcarol    age: 37   car: subaru   gender: Fjohnny   age: 12   gender:...
Column Names jim       age: 36   car: camaro   gender: Mcarol      age: 37   car: subaru   gender: Fjohnny     age: 12   g...
Column Values jim     age: 36   car: camaro   gender: Mcarol    age: 37   car: subaru   gender: Fjohnny   age: 12   gender...
PARTITIONINGCassandra divides the data set into ranges and distributes rows among multiple servers to              achieve...
Row Key determines node placement  jim     age: 36   car: camaro   gender: M  carol   age: 37   car: subaru   gender: F jo...
Row Key   MD5 Hash  jim     5e02739678...                             MD5 hash carol    a9a0198010...   operation yields a...
Conceptually, hashed keys are placedalong a ring representing the keyspace.              Keyspace
The ring is divided up into ranges, one     for each node in the cluster.           Node A     Node B          Node D     ...
In a 4-node cluster, each node holds      roughly a quarter of the rows.               End (Token)                        ...
Start            EndA   0xc000000000..1 0x0000000000..0B   0x0000000000..1 0x4000000000..0C   0x4000000000..1 0x8000000000...
Start            EndA   0xc000000000..1 0x0000000000..0B   0x0000000000..1 0x4000000000..0C   0x4000000000..1 0x8000000000...
Start            EndA   0xc000000000..1 0x0000000000..0B   0x0000000000..1 0x4000000000..0C   0x4000000000..1 0x8000000000...
Start            EndA   0xc000000000..1 0x0000000000..0B   0x0000000000..1 0x4000000000..0C   0x4000000000..1 0x8000000000...
Start            EndA   0xc000000000..1 0x0000000000..0B   0x0000000000..1 0x4000000000..0C   0x4000000000..1 0x8000000000...
REPLICATIONCassandra creates multiple copies of rows so that if any one node is behaving badly, all of the data       stay...
REPLICATION•   Column Family has “Replication Factor”•   RF = total number of copies of each row•   Don’t be fail; NEVER u...
Replicas are chosen by jumping to the        next node(s) on the ring                        Node A   Node B              ...
Replication Factor = 2                        Node A   Node B                        Node D   Node Ccarol   a9a0198010...
Replication Factor = 3                        Node A   Node B                        Node D   Node Ccarol   a9a0198010...
DEEPER: TIMESTAMPS• I lied earlier• Columns are actually tuples of:  { name, value, timestamp }• Distributed conflict reso...
jim     age: 36   car: camaro   gender: Mcarol    age: 37   car: subaru   gender: Fjohnny   age:12    gender: Msuzy     ag...
jim     age: 36   car: camaro   gender: Mcarol    age: 37   car: subaru   gender: Fjohnny   age:12    gender: Msuzy     ag...
jim       age: 36          car: camaro        gender: M         2011/01/01 12:35   2011/01/01 12:35   2011/01/01 12:35caro...
Per-Column Timestamp      jim       age: 36          car: camaro        gender: M              2011/01/01 12:35   2011/01/...
jim       age: 36          car: camaro        gender: M         2011/01/01 12:35   2011/01/01 12:35   2011/01/01 12:35caro...
carol¹     age: 37          car: subaru         gender: F         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39ca...
insert(“carol”,       { “car”: “daewoo”, 2011/11/15 15:00 })    carol¹     age: 37          car: subaru         gender: F ...
insert(“carol”,       { “car”: “daewoo”, 2011/11/15 15:00 })    carol¹     age: 37          car: subaru         gender: F ...
insert(“carol”,       { “car”: “daewoo”, 2011/11/15 15:00 })    carol¹     age: 37          car: subaru             gender...
insert(“carol”,       { “car”: “daewoo”, 2011/11/15 15:00 })    carol¹     age: 37          car: subaru         gender: F ...
insert(“carol”,       { “car”: “daewoo”, 2011/11/15 15:00 })    carol¹     age: 37          car: daewoo         gender: F ...
DEEPER: CONSISTENCY• I lied again left something out• All ops also have a consistency level• Controls number of replicas i...
WRITE CONSISTENCY LEVEL:                      ONEcarol¹       age: 37          car: subaru         gender: F           201...
WRITE CONSISTENCY LEVEL:               QUORUMcarol¹       age: 37          car: daewoo         gender: F           2011/01...
READ CONSISTENCY LEVEL:                      ONEcarol¹       age: 37          car: subaru         gender: F           2011...
READ CONSISTENCY LEVEL:                    QUORUMTimestamp resolves conflict, “daewoo” wins     carol¹       age: 37      ...
QUICKIE LIST-AS-COLUMNSTIMELINE COLUMN FAMILY         time-uuid:   time-uuid:   time-uuid:  jim         “hello!”      “ech...
THINK IN QUERIES•   Break your normalization habit•   Roughly ~one CF per query•   Denormalize!•   Use in-column entity ca...
QUESTIONS YET?
PYCASSA           CREATE A CONNECTION>>> pool = pycassa.ConnectionPool(“Keyspace”,                           server_list=[...
PYCASSA          OPEN A COLUMN FAMILY>>> cf = pycassa.ColumnFamily(pool, “People”)                               Column Fa...
PYCASSA                    GET A ROW          Row Key>>> cf.get(“carol”){ “age”: “37”, “car”: “subaru”, “gender”: “F” }All...
PYCASSA           GET SPECIFIC COLUMNS         Row Key           Column Names>>> cf.get(“carol”, columns=[“age”, “gender”]...
PYCASSA           GET SLICE OF COLUMNS         Row Key              Column Name Range>>> cf.get(“carol”, column_start=”a”,...
PYCASSA             “UPSERT” A COLUMN                   Column Name>>> cf.insert(“carol”, { “car”: “mercedes” })          ...
PYCASSA                 DELETE A ROW>>> cf.remove(“carol”)            Row Key
PYCASSA              DELETE A COLUMN                         Column Names>>> cf.remove(“carol”, columns=[“car”])          ...
PYCASSA            CUSTOM TIMESTAMPS    • Default is GMT timestamp    • Overridden for inserts & removes    • Per-CF gener...
PYCASSA             CONSISTENCY LEVEL    • Can be overridden on all operations    • Global default read & write is ONE    ...
PYCASSA                      UUIDv4>>> import uuid>>> cf.insert(uuid.uuid4().bytes_le, { “foo”: “bar” })>>> cf.insert(“joe...
PYCASSA            TimeUUID (aka UUIDv1)>>> import uuid>>> timeline.insert(“joe”, { uuid.uuid1(): “hello!” })
DIDN’T COVER•   CQL•   Batch Operations•   Schema Definition•   SuperColumns (don’t use them)•   Key Range Queries (don’t ...
SHAMELESS• DataStax is the leading provider of  Apache Cassandra training, support,  and consulting services• DataStax Ent...
QUESTIONS?
THANKShttp://www.datastax.com/docs/1.0/http://www.datastax.com/dev/tutorials/http://www.datastax.com/products/community/ht...
Upcoming SlideShare
Loading in...5
×

How Do I Cassandra?

30,376

Published on

Published in: Technology, Sports
4 Comments
91 Likes
Statistics
Notes
No Downloads
Views
Total Views
30,376
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
729
Comments
4
Likes
91
Embeds 0
No embeds

No notes for slide

Transcript of "How Do I Cassandra?"

  1. 1. HOW DO ICASSANDRA?Rick Branson, DataStaxMemphis Python Users GroupNovember 21st, 2011
  2. 2. WHO?• This is my first day at DataStax, halp!• Coroutine for 18 months• American Roamer for 3 years• FedEx for 4 years
  3. 3. OCCUPY SQLWhy is it that 99% of the applications go into 1% of database softwares?
  4. 4. HOW DO I NOSQL?1. Anvil transitions. Deal with it.2. Find data store that doesn’t use SQL3. ANYTHING4. Cram all the things into it5. Triumphantly blog this success6. Complain a month later when it bursts into flames
  5. 5. SEEMS LEGIT... it really is legit. There are real data sets that don’t make sense in the relational model, nor modern ACID databases.
  6. 6. THE GREAT HAMMER... but, the typical developer only has a relational database; everything looks like a nail.
  7. 7. SNAKE OILTypically, developers pick data structures suited to each use case. The purveyors of relational would have you believe everything fits into one mold.
  8. 8. FIT WHAT INTO WHERE? Trees, Semi-Structured Data, Web Content, Multi-Dimensional Cubes, Graphs
  9. 9. HAMMERS GONNA HAMMER It does work – so does COBOL. Just like a motion picture film, the speed of modern computers creates an illusion.
  10. 10. MAKE A WHAT OUTTA WHO? ASSUMPTIONS MADE ON YOUR BEHALF• Data must be synchronized everywhere, instantaneously• The structure of data doesn’t change over time• Hardware never fails and networks are perfect• Humans always do the right thing• No upgrading or patches are ever needed• Planned outages are good for the soul
  11. 11. THE MOVEMENT“No one tool is appropriate for solving every problem.” ~ Cliff McKinney Gettysburg Address, cir. 1863
  12. 12. SRS BZNS10gen, Basho, DataStax, Cloudant, Cloudera, HortonWorks, VMware, Oracle Guess the number of craps Russ here gives about databases?
  13. 13. MOVING ONOH: “The last DBA in the room went dataway”
  14. 14. AN DEFINITION Cassandra – affectionately “C*” – is an opensource, distributed store for structured data that scales-out on cheap, commodity hardware and stays up even when things get really bad.
  15. 15. FUH-GET ABOUT IT Ruins availability • Strict Consistency • Transactions Deadlocks & Not scalable • Ad-hoc Queries • Indexes aka “Dumbass” QueriesScumbag index covers everycolumn; ignored by planner
  16. 16. THE REWARDS• Simplicity of Operations• Transparency• Very High Availability• Painless Scale-Out• Solid, Predictable Performance on Commodity and Cloud Servers
  17. 17. Cassandra is about trading somedeveloper time for operational pain – or 3am phone calls as some of us know it.
  18. 18. DEM GOOD GENES Put simply: the scalable, efficient storage model from Google’s BigTable atop the incredibly faultresistant distributed design of Amazon’s Dynamo resisting, not just tolerating!
  19. 19. VS OTHER NoSQL• Truly Distributed Design• No Special Servers, No SPOF• 2D Data Model• Rocks on Spinning Disks• Single-Server Durability
  20. 20. HORRIBLE IDEAS THINGS NOT TO USE CASSANDRA FOR• Prototyping• Temporal Data• Distributed Locks• Message Queues• ... anywhere availability doesn’t matter
  21. 21. UNBOUNDED HUMANS ... OR DATA THAT THRIVES ON CASSANDRA• Social Voluntary, human-generated data: text messages, comments, check- ins, real-time collaboration• Activity Involuntary or indirectly human-generated data: news feeds, timelines, clickstreams, system + application logs• Images Smaller files < ~10MB fit well into columns• ... anywhere availability is paramount, (not necessarily big data or scale either)
  22. 22. TO START• easy_install pycassa• Pretty much ignore any info pre-0.7• Use DataStax Community Edition• Unfortunately, the “official” wiki sucks, so use DataStax docs & tutorials• IRC, Mailing List, StackOverflow, Quora all monitored regularly and kick ass
  23. 23. WHY CASSANDRA + PYTHON?• pycassa is one of the best client• Healthy Cassandra+Python Community• Many examples of real, solid, production Python applications built on- top of Cassandra
  24. 24. PYTHON + CASSANDRANOTABLE IMPLEMENTATIONS• Reddit Caches generated comment trees in Cassandra• UrbanAirship Delivers 20M push notifications per day; bursts to 100k/sec+• GameFly behind their social network The database• Digg Pretty much the whole site
  25. 25. DATA MODEL“I gotcha column family ratttt heeeeyyyaahhh!”
  26. 26. AN COLUMN FAMILY jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: Fjohnny age:12 gender: M suzy age:10 gender: F
  27. 27. AN COLUMN FAMILY • Don’t think in relational database terms • Columns are { name, value } tuples • It’s dictionaries all the way down... { “key”: { “name”: “value” } }Column ColumnFamily Row
  28. 28. Row Key jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  29. 29. Row Key jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F Rows are randomly ordered
  30. 30. Columns jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age:12 gender: Msuzy age:10 gender: F
  31. 31. Column Names jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age: 12 gender: Msuzy age: 10 gender: F
  32. 32. Column Names jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age: 12 gender: Msuzy age: 10 gender: F Column Name dictates order
  33. 33. Column Values jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age: 12 gender: Msuzy age: 10 gender: F
  34. 34. PARTITIONINGCassandra divides the data set into ranges and distributes rows among multiple servers to achieve scale-out.
  35. 35. Row Key determines node placement jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  36. 36. Row Key MD5 Hash jim 5e02739678... MD5 hash carol a9a0198010... operation yields a 128-bit number johnny f4eb27cea7... for keys of any size. suzy 78b421309e...
  37. 37. Conceptually, hashed keys are placedalong a ring representing the keyspace. Keyspace
  38. 38. The ring is divided up into ranges, one for each node in the cluster. Node A Node B Node D Node C
  39. 39. In a 4-node cluster, each node holds roughly a quarter of the rows. End (Token) StartNode A 0x00000000000000000000000000000000 0xc0000000000000000000000000000001Node B 0x40000000000000000000000000000000 0x00000000000000000000000000000001Node C 0x80000000000000000000000000000000 0x40000000000000000000000000000001Node D 0xc0000000000000000000000000000000 0x80000000000000000000000000000001
  40. 40. Start EndA 0xc000000000..1 0x0000000000..0B 0x0000000000..1 0x4000000000..0C 0x4000000000..1 0x8000000000..0D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  41. 41. Start EndA 0xc000000000..1 0x0000000000..0B 0x0000000000..1 0x4000000000..0C 0x4000000000..1 0x8000000000..0D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  42. 42. Start EndA 0xc000000000..1 0x0000000000..0B 0x0000000000..1 0x4000000000..0C 0x4000000000..1 0x8000000000..0D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  43. 43. Start EndA 0xc000000000..1 0x0000000000..0B 0x0000000000..1 0x4000000000..0C 0x4000000000..1 0x8000000000..0D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  44. 44. Start EndA 0xc000000000..1 0x0000000000..0B 0x0000000000..1 0x4000000000..0C 0x4000000000..1 0x8000000000..0D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  45. 45. REPLICATIONCassandra creates multiple copies of rows so that if any one node is behaving badly, all of the data stays available and fully operational.
  46. 46. REPLICATION• Column Family has “Replication Factor”• RF = total number of copies of each row• Don’t be fail; NEVER use 1• Use 3+ in the “cloud”• Use 2+ on redundant hardware• Write operations on any replica
  47. 47. Replicas are chosen by jumping to the next node(s) on the ring Node A Node B Node D Node Ccarol a9a0198010...
  48. 48. Replication Factor = 2 Node A Node B Node D Node Ccarol a9a0198010...
  49. 49. Replication Factor = 3 Node A Node B Node D Node Ccarol a9a0198010...
  50. 50. DEEPER: TIMESTAMPS• I lied earlier• Columns are actually tuples of: { name, value, timestamp }• Distributed conflict resolution• Highest timestamp value wins!• Provided by client for every write
  51. 51. jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age:12 gender: Msuzy age:10 gender: F
  52. 52. jim age: 36 car: camaro gender: Mcarol age: 37 car: subaru gender: Fjohnny age:12 gender: Msuzy age:10 gender: F
  53. 53. jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  54. 54. Per-Column Timestamp jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35 carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35 suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  55. 55. jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  56. 56. carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  57. 57. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  58. 58. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  59. 59. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 2011/11/15 15:00 > 2011/01/01 12:38
  60. 60. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  61. 61. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 Cassandra replicates data automagically
  62. 62. DEEPER: CONSISTENCY• I lied again left something out• All ops also have a consistency level• Controls number of replicas involved in an operation (read or write)• Eventual Consistency > Full Consistency• Use lowest you can get away with• Use quorum in place of “all” consistency
  63. 63. WRITE CONSISTENCY LEVEL: ONEcarol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Writes to any replica and acknowledges to client, other nodes eventually get the update
  64. 64. WRITE CONSISTENCY LEVEL: QUORUMcarol¹ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Writes to majority replicas, other nodes eventually get the update
  65. 65. READ CONSISTENCY LEVEL: ONEcarol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Reads from first replica to respond to query, so could be inconsistent
  66. 66. READ CONSISTENCY LEVEL: QUORUMTimestamp resolves conflict, “daewoo” wins carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Reads from majority replicas, so if paired with quorum writes, guarantees consistency
  67. 67. QUICKIE LIST-AS-COLUMNSTIMELINE COLUMN FAMILY time-uuid: time-uuid: time-uuid: jim “hello!” “echo!” “anyone!” time-uuid: time-uuid: carol “first!” “second!” Time-sortable globally unique IDs orders columns by their insertion time
  68. 68. THINK IN QUERIES• Break your normalization habit• Roughly ~one CF per query• Denormalize!• Use in-column entity caching
  69. 69. QUESTIONS YET?
  70. 70. PYCASSA CREATE A CONNECTION>>> pool = pycassa.ConnectionPool(“Keyspace”, server_list=[“192.168.1.1”]) Moar servers here will distribute this client’s load over multiple nodes in the cluster.
  71. 71. PYCASSA OPEN A COLUMN FAMILY>>> cf = pycassa.ColumnFamily(pool, “People”) Column Family
  72. 72. PYCASSA GET A ROW Row Key>>> cf.get(“carol”){ “age”: “37”, “car”: “subaru”, “gender”: “F” }All columns returned as a dictionary
  73. 73. PYCASSA GET SPECIFIC COLUMNS Row Key Column Names>>> cf.get(“carol”, columns=[“age”, “gender”]){ “age”: “37”, “gender”: “F” }
  74. 74. PYCASSA GET SLICE OF COLUMNS Row Key Column Name Range>>> cf.get(“carol”, column_start=”a”, column_end=”d”){ “age”: “37”, “car”: “subaru” }
  75. 75. PYCASSA “UPSERT” A COLUMN Column Name>>> cf.insert(“carol”, { “car”: “mercedes” }) Row Key Column Value
  76. 76. PYCASSA DELETE A ROW>>> cf.remove(“carol”) Row Key
  77. 77. PYCASSA DELETE A COLUMN Column Names>>> cf.remove(“carol”, columns=[“car”]) Row Key
  78. 78. PYCASSA CUSTOM TIMESTAMPS • Default is GMT timestamp • Overridden for inserts & removes • Per-CF generator function can be specified. Insert beats delete!>>> cf.insert(“carol”, { “car”: “subaru” }, timestamp=10000001)>>> cf.remove(“carol”, columns=[“car”], timestamp=10000000)
  79. 79. PYCASSA CONSISTENCY LEVEL • Can be overridden on all operations • Global default read & write is ONE • Per-CF override for default>>> cf.insert(“carol”, { “car”: “subaru” }, write_consistency_level=ConsistencyLevel.QUORUM)>>> cf.get(“carol”, read_consistency_level=ConsistencyLevel.QUORUM)
  80. 80. PYCASSA UUIDv4>>> import uuid>>> cf.insert(uuid.uuid4().bytes_le, { “foo”: “bar” })>>> cf.insert(“joe”, { uuid.uuid4(): “bar” })
  81. 81. PYCASSA TimeUUID (aka UUIDv1)>>> import uuid>>> timeline.insert(“joe”, { uuid.uuid1(): “hello!” })
  82. 82. DIDN’T COVER• CQL• Batch Operations• Schema Definition• SuperColumns (don’t use them)• Key Range Queries (don’t use them)• Secondary Indexes• ... and the list goes on
  83. 83. SHAMELESS• DataStax is the leading provider of Apache Cassandra training, support, and consulting services• DataStax Enterprise is Hadoop on top of Cassandra; really nice for operations• Seriously smart peeps, no lie• www.datastax.com
  84. 84. QUESTIONS?
  85. 85. THANKShttp://www.datastax.com/docs/1.0/http://www.datastax.com/dev/tutorials/http://www.datastax.com/products/community/http://pycassa.github.com/http://apache.cassandra.org/Twitter: @rbranson
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×