cassandra


Presentation from my speech at Software Craftsmanship Belarus

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
812
On SlideShare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

cassandra

  1. Apache Cassandra
     Vova Miguro
     THE END
     trnl.me@gmail.com
     Thursday, September 22, 2011
  2. What is Cassandra?
     • key-value store with some structure
     • fault-tolerant
     • scalable
     • eventually consistent
     • tunable
       - consistency level
       - replication
  3. Where did it come from?
     • created at Facebook
       - Dynamo: distribution architecture
       - BigTable: data model
     • open-sourced in 2008
     • Apache Incubator in early 2009
     • graduation to a top-level Apache project in February 2010
  4. Who uses it?
     • Facebook (of course)
     • Rackspace
     • Twitter
     • Digg
     • Reddit
     • IBM
     • others...
  5. What problems does it solve?
     • reliability at scale
       - no single point of failure (all nodes are identical)
     • simple scaling (linear)
     • high write throughput
     • large data sets
  6. What problems can't it solve?
     • no flexible indices (more on this later)
     • not good for big binary data (>64 MB) unless you chunk
     • row contents must fit in available memory
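The chunking workaround for large binary values can be sketched as follows. This is a minimal illustration with no Cassandra client involved: the helper names `chunk_blob` and `join_chunks`, the `chunk-NNNNNNNN` column naming, and the 1 MB default chunk size are all assumptions made for the example, not part of the talk.

```python
def chunk_blob(data: bytes, chunk_size: int = 1024 * 1024) -> dict:
    """Split data into columns, one chunk per column (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    columns = {}
    for offset in range(0, len(data), chunk_size):
        # Zero-padded names sort in chunk order under a text comparator.
        name = "chunk-%08d" % (offset // chunk_size)
        columns[name] = data[offset:offset + chunk_size]
    return columns


def join_chunks(columns: dict) -> bytes:
    """Reassemble the blob by reading the columns in name order."""
    return b"".join(columns[name] for name in sorted(columns))


blob = bytes(range(256)) * 10            # 2560 bytes of sample data
cols = chunk_blob(blob, chunk_size=1000)  # small chunk size for the demo
```

Each entry of `cols` would be written as one column of the row that represents the blob, so no single value has to exceed the size limit.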
  7. Clustering: CAP
     • CAP Theorem
       - Consistency
       - Availability
       - Partition tolerance
     • choose two
     • Cassandra chooses A and P, but is tunable to trade some availability for more C
  8. Clustering: Replication & Consistency
     • replication factor
       - how many nodes data is replicated on
     • consistency level
       - zero (async write)
       - any
       - one
       - quorum (rf/2 + 1)
       - all
  9. Clustering: Consistency Level
     Level    Waits for      Applies to
     zero     none           write (async write)
     any      1st response   write (including hinted handoff)
     one      1st response   read/write
     quorum   rf/2 + 1       read/write
     all      all            read/write
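The quorum arithmetic can be made concrete with a few lines. The sketch below assumes integer division in `rf/2 + 1`; the function names are illustrative. It also checks the usual overlap condition: a read sees the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a quorum for replication factor rf."""
    return rf // 2 + 1


def read_sees_write(rf: int, written: int, read: int) -> bool:
    """True when any read set must share a replica with any write set."""
    return written + read > rf


for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print("rf=%d quorum=%d overlap=%s" % (rf, q, read_sees_write(rf, q, q)))
```

Writing and reading at quorum always overlap, while writing and reading at level one with rf=3 do not, which is where eventual consistency shows up.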
  10. Clustering: Ring
      • every node gets a token
        - defines its place in the ring
        - and which keys it is responsible for (ranges)
  11. Clustering: Ring
      • every node gets a token
        - defines its place in the ring
        - and which keys it is responsible for (ranges)
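A toy version of the token ring may help here. The sketch assumes an MD5-based token in the range 0 to 2^127 and assumes that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive); the `Ring` class and node names are made up for the example, and replication is ignored.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for the sketch


def token_for(key: str) -> int:
    """Map a row key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens: dict):
        # node_tokens maps node name -> token
        self._pairs = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._pairs]

    def owner(self, key_token: int) -> str:
        # First node whose token is >= the key token; past the last
        # token the ring wraps around to the first node.
        index = bisect.bisect_left(self._tokens, key_token)
        return self._pairs[index % len(self._pairs)][1]


ring = Ring({
    "A": 0,
    "B": RING_SIZE // 4,
    "C": RING_SIZE // 2,
    "D": 3 * RING_SIZE // 4,
})
print(ring.owner(token_for("jsmith")))
```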
  12. Clustering: Ring
      • new node
        - token assignment
        - ranges adjusted
        - bootstrap
        - only neighbor nodes affected
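The claim that only neighbor nodes are affected can be checked on a small ring. The sketch below uses a toy token space of 0 to 399 and hypothetical node names, ignores replication, and compares key ownership before and after a node E joins between B and C.

```python
import bisect


def owner(node_tokens: dict, key_token: int) -> str:
    """Node responsible for key_token: first token >= key_token, with wrap-around."""
    pairs = sorted((token, node) for node, token in node_tokens.items())
    tokens = [token for token, _ in pairs]
    index = bisect.bisect_left(tokens, key_token) % len(pairs)
    return pairs[index][1]


before = {"A": 0, "B": 100, "C": 200, "D": 300}
after = dict(before, E=150)  # new node takes a token between B and C

# Collect every key token whose owner changed, grouped by (old, new) owner.
moved = {}
for key_token in range(400):
    old, new = owner(before, key_token), owner(after, key_token)
    if old != new:
        moved.setdefault((old, new), []).append(key_token)

print(sorted(moved))
```

Only the range 101 to 150 changes hands, from C to E; the ranges of A, B and D are untouched, so bootstrap only has to stream data from one neighbor in this simplified model.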
  13. Clustering: Ring
      • node dies or becomes isolated
      • hinted handoff
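Hinted handoff can be illustrated with a small simulation. This is a simplified sketch, not Cassandra's implementation: the `Node` class, the choice of the first live replica as hint holder, and the explicit `replay_hints` call are all assumptions made for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value
        self.hints = []  # (target node name, key, value) kept for down replicas


def write(replicas, key, value):
    """Write to every live replica and leave a hint for each one that is down."""
    live = [node for node in replicas if node.up]
    if not live:
        raise RuntimeError("no live replica to accept the write or hold a hint")
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            live[0].hints.append((node.name, key, value))
    return len(live)  # number of replicas that acknowledged


def replay_hints(holder, recovered):
    """Deliver the hints addressed to a recovered node and keep the rest."""
    remaining = []
    for target, key, value in holder.hints:
        if target == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False                                  # c dies or becomes isolated
acks = write([a, b, c], "jsmith", "ch@ngem3a")
c.up = True                                   # c comes back
replay_hints(a, c)
```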
  14. Data Model
      • keyspace
        - column family
          - row (indexed)
            - key
            - columns
              - name (sorted)
              - value
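One way to picture this hierarchy is as nested maps. The sketch below is only a mental model in plain Python, reusing the `users` column family and sample values from the CQL examples later in the talk; the helper name `columns_in_order` is an assumption.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "users": {                         # column family
        "jsmith": {                    # row key
            "state": "TX",             # column name -> value
            "password": "ch@ngem3a",
        },
        "user1": {
            "gender": "f",
            "birth_year": 1968,
        },
    },
}


def columns_in_order(keyspace, column_family, row_key):
    """Return a row's columns sorted by column name, as they are stored."""
    row = keyspace[column_family][row_key]
    return sorted(row.items())


print(columns_in_order(keyspace, "users", "jsmith"))
```

Rows are found by key, while the columns inside a row are kept sorted by name, which is what makes column range reads possible.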
  15. Data Model: ColumnFamily (diagram: Column families)
  16. Data Model: SuperColumnFamily (diagram: Supercolumn families)
  17. Easier to start from the bottom up
  18. Data Model: Column
  19. Data Model: Row
  20. Data Model: Column comparators
      • TimeUUID
      • LexicalUUID
      • UTF8
      • Long
      • Bytes
      • ...
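The comparator decides the order in which column names are stored, and the same names can sort differently under different comparators. The sketch below imitates this with Python sort keys; treating byte order as a stand-in for the lexical UUID comparison is an assumption, and the two UUIDs are hand-built so the result is deterministic.

```python
import uuid

names = ["10", "9", "100"]
utf8_order = sorted(names)           # compared as text
long_order = sorted(names, key=int)  # compared as integers

# Two version 1 UUIDs with hand-picked timestamp fields:
# fields = (time_low, time_mid, time_hi_version, clock_seq_hi, clock_seq_low, node)
early = uuid.UUID(fields=(0xFFFFFFFF, 0x0000, 0x1000, 0x80, 0x00, 0x1))
late = uuid.UUID(fields=(0x00000000, 0x0001, 0x1000, 0x80, 0x00, 0x1))

time_order = sorted([late, early], key=lambda u: u.time)      # by embedded timestamp
lexical_order = sorted([late, early], key=lambda u: u.bytes)  # by raw bytes

print(utf8_order, long_order)
print(time_order == [early, late], lexical_order == [late, early])
```

Numeric names need the Long comparator to come back in numeric order, and time-based column names need TimeUUID to come back in chronological order.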
  21. Data Model: ColumnFamily
  22. Writing
      • simple: put(key,col,value)
      • complex: put(key,[col,value,...col,value])
      • batch: multi key
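The three write shapes can be mimicked on an in-memory stand-in. `ToyColumnFamily` and its method names are hypothetical and only mirror the `put` notation of the slide; they are not the API of any Cassandra client.

```python
from collections import defaultdict


class ToyColumnFamily:
    """In-memory stand-in for a column family (illustration only)."""

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, key, columns):
        """Insert or overwrite one or more columns of a single row."""
        self.rows[key].update(columns)

    def batch(self, mutations):
        """Apply {row key: {column name: value}} for several rows at once."""
        for key, columns in mutations.items():
            self.put(key, columns)


users = ToyColumnFamily()
users.put("jsmith", {"password": "ch@ngem3a"})             # simple: one column
users.put("jsmith", {"state": "TX", "birth_year": 1968})   # complex: several columns
users.batch({                                              # batch: several keys
    "user1": {"state": "TX"},
    "user2": {"state": "CA"},
})
```

A write never has to read the row first: columns are simply added or overwritten, which is part of why writes are cheap.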
  23. Writing (diagram: Writes)
  24. Reading
      • get(): retrieve column by name
      • multiget(): by column name for a number of keys
      • get_slice(): by column name or a range of names
        - returning columns
        - returning supercolumns
      • multiget_slice(): a subset of columns for a set of keys
      • get_count(): number of columns or subcolumns
      • get_range_slice(): subset of columns for a range of keys
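The read calls differ mainly in how many keys and how many columns they cover. The sketch below imitates four of them over a plain dictionary; the function signatures are simplified assumptions and do not match the Thrift API, and supercolumns are left out.

```python
rows = {
    "jsmith": {"password": "ch@ngem3a"},
    "user1": {"birth_year": 1968, "gender": "f",
              "password": "ch@ngem3", "state": "TX"},
}


def get(key, column):
    """Retrieve one column of one row by name."""
    return rows[key][column]


def get_slice(key, start="", finish=None):
    """Columns of one row whose names lie in [start, finish], in name order."""
    return [(name, value) for name, value in sorted(rows[key].items())
            if name >= start and (finish is None or name <= finish)]


def multiget_slice(keys, start="", finish=None):
    """The same slice for a set of keys."""
    return {key: get_slice(key, start, finish) for key in keys}


def get_count(key):
    """Number of columns in a row."""
    return len(rows[key])


print(get_slice("user1", "gender", "password"))
```

Because columns are stored sorted by name, a slice is a contiguous range of a row rather than a filter over all of it.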
  25. Reading (diagram: Reads)
  26. Clients
      • Python:
        - Pycassa: http://github.com/pycassa/pycassa
        - Telephus: http://github.com/driftx/Telephus (Twisted)
      • Java:
        - Hector: http://github.com/rantav/hector
        - Kundera: http://github.com/impetus-opensource/Kundera
        - Pelops: http://github.com/s7/scale7-pelops
        - Cassandrelle (Demoiselle Cassandra): http://demoiselle.sf.net/component/demoiselle-cassandra/
      • .NET:
        - Aquiles: http://aquiles.codeplex.com/
      • Ruby:
        - Cassandra: http://github.com/fauna/cassandra
      • PHP:
        - PHP Client Library: https://github.com/kallaspriit/Cassandra-PHP-Client-Library
        - phpcassa: http://github.com/thobbs/phpcassa
  27. CQL (from 0.8)
      • USE
      • SELECT
      • INSERT/UPDATE
      • DELETE
      • TRUNCATE/DROP
      • BATCH
      • CREATE KEYSPACE
      • CREATE COLUMNFAMILY
      • CREATE INDEX
  28. CQL: Example
      CREATE COLUMNFAMILY users (
        KEY varchar PRIMARY KEY,
        password varchar,
        gender varchar,
        session_token varchar,
        state varchar,
        birth_year bigint);
      INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');
      SELECT * FROM users WHERE KEY='jsmith';
      -- returns: u'jsmith' | u'password',u'ch@ngem3a'
      DROP COLUMNFAMILY users;
  29. CQL: Example
      CREATE INDEX birth_year_key ON users (birth_year);
      CREATE INDEX state_key ON users (state);
      SELECT * FROM users
        WHERE gender='f' AND
        state='TX' AND
        birth_year=1968;
      -- returns: u'user1' | u'birth_year',1968 | u'gender',u'f' | u'password',u'ch@ngem3' | u'state',u'TX'
      DROP COLUMNFAMILY users;
  30. Indexing
      • secondary indexes
        - hashed
        - equality predicates (where column x = y)
        - specified on creation or later
        - best when many rows with similar columns
      • self-managed indexes
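A hashed index that answers equality predicates can be pictured as a map from column value to the set of row keys holding that value. The sketch below is a simplification in plain Python and does not describe how Cassandra executes such a query internally; the rows for `user2` and `user3` and the helper names are made up for the example.

```python
from collections import defaultdict

users = {
    "user1": {"gender": "f", "state": "TX", "birth_year": 1968},
    "user2": {"gender": "m", "state": "TX", "birth_year": 1968},
    "user3": {"gender": "f", "state": "CA", "birth_year": 1975},
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that contain it."""
    index = defaultdict(set)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(key)
    return index


state_index = build_index(users, "state")
birth_year_index = build_index(users, "birth_year")


def select(state, birth_year, gender):
    """Indexed equality predicates narrow the candidates; gender is filtered afterwards."""
    candidates = state_index.get(state, set()) & birth_year_index.get(birth_year, set())
    return sorted(key for key in candidates if users[key].get("gender") == gender)


print(select("TX", 1968, "f"))
```

Such an index only helps with `column = value` lookups; range or ordering queries are where self-managed indexes come in.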
  31. Indexing: Self-managed: one-to-one
      row key: index name
      column names:  indexed value #1 | indexed value #2
      column values: related key      | related key
  32. Indexing: Self-managed: one-to-several
      row key: index name
      column names:  indexed value #1         | indexed value #2
      column values: related key, related key | related key, related key
  33. Indexing: Self-managed: one-to-many
      row key: indexed value #1
      column names:  related key | related key
      column values: -           | -
      row key: indexed value #2
      column names:  related key | related key
      column values: -           | -
  34. Indexing: Self-managed: one-to-many
      row key: indexed value #1
      column names:  ordering value | ordering value
      column values: related key    | related key
      row key: indexed value #2
      column names:  ordering value | ordering value
      column values: related key    | related key
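The last layout, one index row per indexed value with columns sorted by an ordering value, can be sketched as below. The `OrderedIndex` class is hypothetical, and the sample data (users indexed by state and ordered by birth year) is invented for the illustration; in a real application the index row would live in its own column family and be maintained by the application on every write.

```python
import bisect


class OrderedIndex:
    """One row per indexed value; entries kept sorted by ordering value."""

    def __init__(self):
        self.rows = {}  # indexed value -> sorted list of (ordering value, related key)

    def add(self, indexed_value, ordering_value, related_key):
        entries = self.rows.setdefault(indexed_value, [])
        bisect.insort(entries, (ordering_value, related_key))

    def remove(self, indexed_value, ordering_value, related_key):
        # The application must remove stale entries itself.
        self.rows[indexed_value].remove((ordering_value, related_key))

    def lookup(self, indexed_value, limit=None):
        entries = self.rows.get(indexed_value, [])
        return [related_key for _, related_key in entries[:limit]]


by_state = OrderedIndex()
by_state.add("TX", 1968, "user1")
by_state.add("TX", 1955, "user7")
by_state.add("CA", 1975, "user3")
print(by_state.lookup("TX"))
```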
  35. Let's practice: Twitter
      • Get a user record by username
      • Get the friends of a username
      • Get the followers of a username
      • Get a timeline for a user
      • Get a timeline of a specific user's tweets
      • Get a tweet from a tweet ID
      • Create a tweet
      • Create a user
      • Add friends to a user
      • Remove friends from a user
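One possible layout for this exercise uses a column family per query, with timelines filled in at write time. The sketch below is an assumption, not the solution presented in the talk: every name in it is illustrative, dictionaries stand in for column families, and a simple counter stands in for a TimeUUID column name.

```python
import itertools
from collections import defaultdict

users = {}                      # username -> user record
friends = defaultdict(dict)     # username -> {friend username: ""}
followers = defaultdict(dict)   # username -> {follower username: ""}
tweets = {}                     # tweet id -> {"username": ..., "body": ...}
timeline = defaultdict(dict)    # username -> {sequence: tweet id}, friends' tweets
userline = defaultdict(dict)    # username -> {sequence: tweet id}, own tweets
_sequence = itertools.count(1)  # stand-in for a TimeUUID column name


def create_user(username, **record):
    users[username] = record


def add_friend(username, friend):
    friends[username][friend] = ""
    followers[friend][username] = ""   # keep the reverse mapping in step


def remove_friend(username, friend):
    friends[username].pop(friend, None)
    followers[friend].pop(username, None)


def create_tweet(username, body):
    seq = next(_sequence)
    tweet_id = "tweet-%d" % seq
    tweets[tweet_id] = {"username": username, "body": body}
    userline[username][seq] = tweet_id
    for follower in followers[username]:
        timeline[follower][seq] = tweet_id  # fan out on write
    return tweet_id


def newest_first(line, limit=10):
    """Resolve the newest entries of a timeline or userline into tweets."""
    ordered = sorted(line.items(), reverse=True)[:limit]
    return [tweets[tweet_id] for _, tweet_id in ordered]


create_user("alice")
create_user("bob")
add_friend("bob", "alice")
first = create_tweet("alice", "hello")
print(newest_first(timeline["bob"]))
```

Each query in the list above becomes a single-row read: data is duplicated at write time so that no read needs a join.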
  36. Facebook messaging
  37. ?
