
C*ollege Credit: An Introduction to Apache Cassandra

Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This webinar is 101 level and will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency.


    1. DATASTAX C*OLLEGE CREDIT: AN INTRODUCTION TO APACHE CASSANDRA. Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra. @aaronmorton, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
    2. Overview: The Cluster, The Data Model, The API.
    3. Cassandra - Started at Facebook - Open sourced in 2008 - Top Level Apache project since 2010.
    4. Used by... Netflix, Twitter, Reddit, Rackspace...
    5. Inspiration - Google Bigtable (2006) - Amazon Dynamo (2007)
    6. Why Cassandra? - Scale - Operations - Data Model
    7. Why Cassandra? Webinar: "Is My App a Good Fit for Apache Cassandra?" Eric Lubow (CTO, SimpleReach), Wednesday October 24 @ 8:30AM PST. http://www.datastax.com/resources/webinars/collegecredit
    8. Overview: The Cluster, The Data Model, The API.
    9. Store 'foo' key with Replication Factor 3. [Ring diagram: Node 1, Node 2 and Node 3 each hold 'foo'; Node 4 does not.]
    10. Consistent Hashing. - Evenly map keys to nodes - Minimise key movements when nodes join or leave
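The "minimise key movements" point on the slide above can be illustrated with a small sketch. It assumes the toy 0 to 99 token scale used later in the deck, and the new node at token 12 is a hypothetical choice made only for this example. A key belongs to the first node whose token is greater than or equal to the key's token, wrapping around to the lowest token.

```python
from bisect import bisect_left


def owner(node_tokens, key_token):
    """Return the node token owning key_token: the first node token >= it,
    wrapping around to the lowest node token at the end of the ring."""
    ring = sorted(node_tokens)
    idx = bisect_left(ring, key_token)
    return ring[idx % len(ring)]


def moved_keys(before, after, key_tokens):
    """List the key tokens whose owning node changes between two rings."""
    return [t for t in key_tokens if owner(before, t) != owner(after, t)]


before = [0, 25, 50, 75]       # four nodes, as in the deck's ring diagrams
after = [0, 12, 25, 50, 75]    # hypothetical fifth node joining at token 12
moved = moved_keys(before, after, range(100))
print(moved)
```

Only the keys in the range taken over by the new node change owner; every other key stays where it was.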
    11. Partitioner. RandomPartitioner transforms Keys to Tokens using MD5. (Default, there are others.)
    12. Keys and Tokens? [Diagram: a token scale from 0 to 99; key 'fop' has token 10 and key 'foo' has token 90.]
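As a rough sketch of how an MD5-based partitioner could turn a key into a token: the function below hashes the key and reads the 16-byte digest as a signed big-endian integer, then takes the absolute value. The function name is hypothetical, and the exact integer conversion is an assumption about RandomPartitioner rather than something the slides state. Real tokens are very large numbers; the 0 to 99 scale in the deck is a simplification.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Map a row key to a non-negative integer token via MD5.

    Assumption: the digest is interpreted as a signed big-endian integer
    and made non-negative with abs(), giving a value in 0..2**127.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


foo_token = md5_token(b"foo")
fop_token = md5_token(b"fop")
print(foo_token, fop_token)
```

Keys that differ by one character, such as 'foo' and 'fop', land at unrelated positions on the ring, which is what spreads rows evenly across nodes.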
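    13. Token Ring. [Ring diagram: the token scale wraps from 99 back to 0; 'foo' sits at token 90 and 'fop' at token 10.]
    14. Token Ranges. [Ring diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; the ranges 76-0 and 1-25 are labelled.]
    15. Locate Token Range. [Ring diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; 'foo' at token 90.]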
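Locating the token range for a key can be sketched as a binary search over the sorted node tokens. This uses the four-node ring from the slides, where the range 76-0 belongs to the node at token 0 and 1-25 to the node at token 25; the function name `locate` is made up for the example.

```python
from bisect import bisect_left

# (token, node) pairs from the deck's four-node ring, sorted by token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def locate(token, ring=RING):
    """Return the node whose token range contains `token`.

    A node owns the range (previous node's token, its own token], so the
    owner is the first node token >= `token`, wrapping past the highest
    token back to the first node.
    """
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


print(locate(90))  # 'foo' from the slides
print(locate(10))  # 'fop' from the slides
```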
    16. Replication Strategy selects Replication Factor number of nodes for a row.
    17. SimpleStrategy with RF 3. [Ring diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; 'foo' at token 90.]
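A minimal sketch of the SimpleStrategy idea, assuming it places the first replica on the node owning the token and the remaining replicas on the next nodes clockwise around the ring. The ring is the one from the slides; the function name is hypothetical.

```python
from bisect import bisect_left

# (token, node) pairs from the deck's four-node ring, sorted by token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def simple_strategy_replicas(token, rf, ring=RING):
    """Pick `rf` replicas: the owner of `token`, then the next nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


print(simple_strategy_replicas(90, rf=3))
```

For 'foo' at token 90 this selects Node 1, Node 2 and Node 3, matching the earlier "Store 'foo' key with Replication Factor 3" slide.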
    18. NetworkTopologyStrategy uses a Replication Factor per Data Centre. (Default.)
    19. Multi DC Replication with RF 3 and RF 2. [Diagram: West DC ring with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75); East DC ring with Node 10 (token 1), Node 20 (token 26), Node 30 (token 51), Node 40 (token 76); 'foo' at token 90.]
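The per-data-centre replication on the slide above might be sketched as a single clockwise walk over the combined ring, filling each data centre's quota independently. This is a simplification: it ignores racks, and the assignment of RF 3 to West and RF 2 to East is an assumption, since the slide does not say which data centre has which factor.

```python
from bisect import bisect_left

# (token, node, data centre) for both rings on the slide, sorted by token.
RING = sorted([
    (0, "Node 1", "West"), (25, "Node 2", "West"),
    (50, "Node 3", "West"), (75, "Node 4", "West"),
    (1, "Node 10", "East"), (26, "Node 20", "East"),
    (51, "Node 30", "East"), (76, "Node 40", "East"),
])


def per_dc_replicas(token, rf_per_dc, ring=RING):
    """Walk the ring clockwise from `token`, collecting nodes for each data
    centre until that data centre's replication factor is met (racks ignored)."""
    tokens = [t for t, _, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen


placement = per_dc_replicas(90, {"West": 3, "East": 2})  # assumed RF split
print(placement)
```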
    20. The Snitch knows which Data Centre and Rack the Node is in.
    21. SimpleSnitch. Places all nodes in the same DC and Rack. (Default, there are others.)
    22. PropertyFileSnitch. DC and Rack are specified per node via configuration.
    23. EC2Snitch. DC is set to the AWS Region and Rack to the Availability Zone.
    24. DynamicSnitch. Re-orders nodes according to their observed performance. (Wraps another snitch.)
    25. The Client and the Coordinator. [Ring diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; 'foo' at token 90; a Client connects to the ring.]
    26. Gossip. Nodes share information with a small number of neighbours. Who share information with a small number of neigh..
    27. Multi DC Client and the Coordinator. [Diagram: ring with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75); ring with Node 10 (token 1), Node 20 (token 26), Node 30 (token 51), Node 40 (token 76); 'foo' at token 90; a Client connects to the cluster.]
    28. Consistency Level (CL). - Specified for each request - Number of nodes to wait for.
    29. Consistency Level (CL) - ANY* - ONE, TWO, THREE - QUORUM - LOCAL_QUORUM, EACH_QUORUM*
    30. QUORUM at Replication Factor...
        Replication Factor | 2 or 3 | 4 or 5 | 6 or 7
        QUORUM             | 2      | 3      | 4
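The QUORUM values in the table above follow from "more than half of the replicas", which can be written as a one-line function. The function name is chosen for this example only.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that is a strict majority of the replication factor."""
    return replication_factor // 2 + 1


# Reproduce the table from the slide for RF 2 through 7.
table = {rf: quorum(rf) for rf in range(2, 8)}
print(table)
```

Note that RF 2 needs both replicas for a quorum, so it tolerates no unavailable replica at QUORUM, while RF 3 tolerates one.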
    31. QUORUM with RF 3. [Ring diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; 'foo' at token 90; a Client connects to the ring.]
    32. Write 'foo' at QUORUM with Hinted Handoff. [Ring diagram: Nodes 1 to 4 and a Client; 'foo' at token 90; copies of 'foo' are shown on the ring, along with a hint labelled "foo for #3".]
    33. Read 'foo' at QUORUM. [Ring diagram: Nodes 1 to 4 and a Client; 'foo' at token 90; copies of 'foo' are shown on the ring.]
    34. Consistency Level nodes must agree.
    35. Column Timestamps used to resolve differences.
    36. Resolving differences.
        Column     | Node 1                   | Node 2                   | Node 3
        purple     | cromulent (timestamp 10) | cromulent (timestamp 10) | <missing>
        monkey     | embiggens (timestamp 10) | embiggens (timestamp 10) | debigulator (timestamp 5)
        dishwasher | tomato (timestamp 10)    | tomato (timestamp 10)    | tomacco (timestamp 15)
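Resolving the differences in the table above comes down to "highest timestamp wins" per column. The sketch below applies that rule to the slide's data; the `resolve` helper and the (value, timestamp) tuple layout are choices made for this example, and how equal timestamps with different values are handled is left out.

```python
def resolve(replies):
    """Return the (value, timestamp) reply with the highest timestamp.

    `replies` holds one entry per node; None means the node has no value
    for the column.
    """
    present = [reply for reply in replies if reply is not None]
    if not present:
        return None
    return max(present, key=lambda reply: reply[1])


# Per-column replies from Node 1, Node 2 and Node 3, as on the slide.
replies_by_column = {
    "purple": [("cromulent", 10), ("cromulent", 10), None],
    "monkey": [("embiggens", 10), ("embiggens", 10), ("debigulator", 5)],
    "dishwasher": [("tomato", 10), ("tomato", 10), ("tomacco", 15)],
}

resolved = {col: resolve(replies) for col, replies in replies_by_column.items()}
print(resolved)
```

In the last row the single newer value from Node 3 wins over the two older, matching values: agreement is decided by timestamp, not by majority vote.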
    37. Consistent read for 'foo' at QUORUM. [Two ring diagrams, each with Nodes 1 to 4 and a Client; the values shown are 'cromulent' and, in one place in the first diagram, 'embiggens'.]
    38. Strong Consistency: W + R > N (#Write Nodes + #Read Nodes > Replication Factor)
    39. Achieving Strong Consistency. - QUORUM Read + QUORUM Write - ALL Read + ONE Write - ONE Read + ALL Write
    40. Eventual Consistency. W + R <= N
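The W + R > N rule can be checked mechanically. The sketch below assumes a Replication Factor of 3 and uses made-up helper names; it tests the three combinations listed for strong consistency plus one that is only eventually consistent.

```python
def quorum(n: int) -> int:
    """Strict majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """True when the write set and read set must overlap in at least one replica."""
    return w + r > n


N = 3  # assumed Replication Factor for the example
ONE, QUORUM, ALL = 1, quorum(N), N

checks = {
    "QUORUM write + QUORUM read": is_strongly_consistent(QUORUM, QUORUM, N),
    "ONE write + ALL read": is_strongly_consistent(ONE, ALL, N),
    "ALL write + ONE read": is_strongly_consistent(ALL, ONE, N),
    "ONE write + ONE read": is_strongly_consistent(ONE, ONE, N),
}
print(checks)
```

When W + R > N, at least one replica contacted by the read also received the write, so the read sees the newest timestamp.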
    41. Achieving Consistency. - Hinted Handoff - Read Repair - Scheduled nodetool repair
    42. Overview: The Cluster, The Data Model, The API.
    43. Data Model so far. Row Key: Column, Column, Column. (Incomplete.)
    44. Data Model. [Diagram: a Keyspace contains Column Families; each Row Key in a Column Family maps to Columns.] (Column Family and Table mean the same.)
    45. Rows are the unit of replication.
    46. The Column Family is the unit of storage.
    47. Inside the Column Family. [Diagram: Keyspace > Column Family > Row Key > Columns; each Column has a name, value and timestamp.] (Also TTL Columns)
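One way to picture the nesting on the slide above is as dictionaries inside dictionaries. This is only a mental model in Python, not how Cassandra stores data; the type aliases, the use of strings for names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    """A column as described on the slide: name, value, timestamp, optional TTL."""
    name: str
    value: bytes
    timestamp: int
    ttl: Optional[int] = None  # seconds to live; None means the column does not expire


Row = Dict[str, Column]            # column name -> Column
ColumnFamily = Dict[str, Row]      # row key -> Row
Keyspace = Dict[str, ColumnFamily] # column family name -> ColumnFamily

keyspace: Keyspace = {
    "ColumnFamily1": {
        "row_key": {
            "col_name": Column("col_name", b"col_val", timestamp=10),
            "session": Column("session", b"token", timestamp=11, ttl=3600),
        }
    }
}

column = keyspace["ColumnFamily1"]["row_key"]["col_name"]
print(column)
```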
    48. Basic Data Types - ASCII, UTF8 - Integer, Long, Float, Double, Boolean - Date - UUID - Bytes - Counter*
    49. Composite Data Types - Two or more Basic types - Ordered by each component - e.g. (IntegerType, UTF8) to hold (timestamp, user_name)
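"Ordered by each component" behaves much like tuple comparison in Python: the first component is compared first, and the second only breaks ties. The sketch below uses (timestamp, user_name) pairs as in the slide's example; the sample values are invented.

```python
# Composite (timestamp, user_name) values, deliberately out of order.
composite_names = [
    (20, "bob"),
    (10, "carol"),
    (10, "alice"),
]

# Tuples sort by the first component, then by the second within equal timestamps.
ordered = sorted(composite_names)
print(ordered)
```

With such an ordering, all entries for one timestamp sit next to each other, so a range of timestamps can be read as one contiguous slice.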
    50. Data Modelling. Webinar: "Data Modelling for Apache Cassandra", Aaron Morton (Cassandra Committer), Wednesday November 7 @ 11AM PST. http://www.datastax.com/resources/webinars/collegecredit
    51. Overview: The Cluster, The Data Model, The API.
    52. The API. - Original Thrift based RPC - Declarative Cassandra Query Language (CQL)
    53. RPC via Python pycassa.
        # pycassa - Python
        >>> col_fam = pycassa.ColumnFamily(connection_pool, 'ColumnFamily1')
        >>> col_fam.insert(row_key, {col_name: col_val})
    54. RPC via Python pycassa...
        # pycassa - Python
        >>> col_fam.get(row_key)
        {col_name: col_val, col_name2: col_val2}
        >>> col_fam.multiget([row_key], ['col_name'])
        {'row_key': {col_name: col_val}}
    55. RPC via Python pycassa...
        # pycassa - Python
        >>> col_fam.remove(row_key)
        >>> col_fam.remove(row_key, ['col_name'])
    56. CQL.
        -- Cassandra Query Language (CQL)
        INSERT INTO ColumnFamily1 (KEY, col_name) VALUES (row_key, col_value);
    57. CQL...
        -- Cassandra Query Language (CQL)
        SELECT * FROM ColumnFamily1 WHERE KEY IN ('row_key_1');
        SELECT col_name FROM ColumnFamily1 WHERE KEY IN ('row_key_1', 'row_key_2');
    58. CQL...
        -- Cassandra Query Language (CQL)
        DELETE FROM ColumnFamily1 WHERE KEY IN (row_key);
        DELETE col_name FROM ColumnFamily1 WHERE KEY = row_key;
    59. Thanks.
    60. Aaron Morton, @aaronmorton, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.