C*ollege Credit: An Introduction to Apache Cassandra

Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This 101-level webinar will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency.

Transcript

  • 1. DATASTAX C*OLLEGE CREDIT: AN INTRODUCTION TO APACHE CASSANDRA Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • 2. Overview - The Cluster - The Data Model - The API
  • 3. Cassandra - Started at Facebook - Open sourced in 2008 - Top Level Apache project since 2010.
  • 4. Used by... Netflix, Twitter, Reddit, Rackspace...
  • 5. Inspiration - Google Big Table (2006) - Amazon Dynamo (2007)
  • 6. Why Cassandra? - Scale - Operations - Data Model
  • 7. Why Cassandra? Is My App a Good Fit for Apache Cassandra? Eric Lubow (CTO, SimpleReach) Wednesday October 24 @ 8:30AM PST http://www.datastax.com/resources/webinars/collegecredit
  • 8. Overview - The Cluster - The Data Model - The API
  • 9. Store ‘foo’ key with Replication Factor 3. [Ring diagram: Node 1 - foo; Node 2 - foo; Node 3 - foo; Node 4]
  • 10. Consistent Hashing. - Evenly map keys to nodes - Minimise key movements when nodes join or leave
  • 11. Partitioner. RandomPartitioner transforms Keys to Tokens using MD5. (Default, there are others.)
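  • 12. Keys and Tokens? [Diagram: token line from 0 to 99; key fop at token 10; key foo at token 90]
  • 13. Token Ring. [Ring diagram: tokens wrap from 99 to 0; foo (token: 90); fop (token: 10)]
  • 14. Token Ranges. [Ring diagram: Node 1 (token: 0, range 76-0); Node 2 (token: 25, range 1-25); Node 3 (token: 50); Node 4 (token: 75)]
  • 15. Locate Token Range. [Ring diagram: foo (token: 90); Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75)]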
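
The token lookup on the slides above can be sketched in a few lines of Python. This is a toy model, not Cassandra's code: the ring size of 100 and the node tokens come from the slides, while `md5_token` and `owner` are hypothetical helper names, and reducing the MD5 digest modulo 100 is an assumption made only to fit the toy ring.

```python
import bisect
import hashlib

# Toy ring from the slides: tokens run from 0 to 99.
RING_SIZE = 100
NODES = {0: "Node 1", 25: "Node 2", 50: "Node 3", 75: "Node 4"}
TOKENS = sorted(NODES)


def md5_token(key: str) -> int:
    """Map a key to a toy token by reducing its MD5 digest modulo RING_SIZE."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(token: int) -> str:
    """Return the node whose token range contains the given token.

    A node with token T owns the range (previous node token, T], so the
    owner is the first node token >= the key token, wrapping to the start.
    """
    index = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return NODES[TOKENS[index]]


print(owner(90))  # the slides place 'foo' at token 90
print(owner(10))  # the slides place 'fop' at token 10
print(owner(md5_token("foo")))  # owner under the toy MD5 mapping
```

With the slide's tokens, 90 falls in the 76-0 range and 10 falls in the 1-25 range, which is why the lookup wraps around the end of the sorted token list.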
  • 16. Replication Strategy selects Replication Factor number of nodes for a row.
  • 17. SimpleStrategy with RF 3. [Ring diagram: foo (token: 90); Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75)]
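
A minimal sketch of the SimpleStrategy idea, assuming it selects the node that owns the token and then continues clockwise around the ring until it has Replication Factor nodes. The function name `simple_strategy_replicas` is hypothetical; the tokens are the ones on the slide.

```python
import bisect
from typing import List

NODES = {0: "Node 1", 25: "Node 2", 50: "Node 3", 75: "Node 4"}
TOKENS = sorted(NODES)


def simple_strategy_replicas(token: int, replication_factor: int) -> List[str]:
    """Pick the owning node, then the next nodes clockwise around the ring."""
    count = min(replication_factor, len(TOKENS))  # cannot exceed the node count
    start = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return [NODES[TOKENS[(start + i) % len(TOKENS)]] for i in range(count)]


print(simple_strategy_replicas(90, 3))  # replicas for 'foo' at token 90
```

For token 90 this selects Node 1, Node 2 and Node 3, which matches the earlier slide that stores ‘foo’ with Replication Factor 3.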
  • 18. NetworkTopologyStrategy uses a Replication Factor per Data Centre. (Default.)
  • 19. Multi DC Replication with RF 3 and RF 2. [Two ring diagrams: foo (token: 90). West DC: Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75). East DC: Node 10 (token: 1); Node 20 (token: 26); Node 30 (token: 51); Node 40 (token: 76)]
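
The per data centre selection can be sketched as a single clockwise walk that fills each data centre up to its own Replication Factor. This is a simplification that ignores racks, and it assumes RF 3 applies to West and RF 2 to East, which the slide does not state. `network_topology_replicas` is a hypothetical name.

```python
import bisect
from typing import Dict, List, Tuple

# (token, node, data centre) as read from the multi DC slide.
RING: List[Tuple[int, str, str]] = sorted([
    (0, "Node 1", "West"), (25, "Node 2", "West"),
    (50, "Node 3", "West"), (75, "Node 4", "West"),
    (1, "Node 10", "East"), (26, "Node 20", "East"),
    (51, "Node 30", "East"), (76, "Node 40", "East"),
])
TOKENS = [token for token, _, _ in RING]


def network_topology_replicas(token: int, rf_per_dc: Dict[str, int]) -> Dict[str, List[str]]:
    """Walk the ring clockwise, filling each data centre up to its own RF."""
    replicas: Dict[str, List[str]] = {dc: [] for dc in rf_per_dc}
    start = bisect.bisect_left(TOKENS, token)
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if dc in replicas and len(replicas[dc]) < rf_per_dc[dc]:
            replicas[dc].append(node)
    return replicas


# Assumption: RF 3 in West and RF 2 in East.
print(network_topology_replicas(90, {"West": 3, "East": 2}))
```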
  • 20. The Snitch knows which Data Centre and Rack the Node is in.
  • 21. SimpleSnitch. Places all nodes in the same DC and Rack. (Default, there are others.)
  • 22. PropertyFileSnitch. DC and Rack are specified per node via configuration.
  • 23. EC2Snitch. DC is set to AWS Region and Rack to Availability Zone.
  • 24. DynamicSnitch. Re-orders nodes according to their observed performance. (Wraps other snitch.)
  • 25. The Client and the Coordinator. [Ring diagram: Client; foo (token: 90); Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75)]
  • 26. Gossip. Nodes share information with a small number of neighbours. Who share information with a small number of neigh..
  • 27. Multi DC Client and the Coordinator. [Two ring diagrams: Client; foo (token: 90); Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75); Node 10 (token: 1); Node 20 (token: 26); Node 30 (token: 51); Node 40 (token: 76)]
  • 28. Consistency Level (CL). - Specified for each request - Number of nodes to wait for.
  • 29. Consistency Level (CL) - ANY* - ONE, TWO, THREE - QUORUM - LOCAL_QUORUM, EACH_QUORUM*
  • 30. QUORUM at Replication Factor...
    Replication Factor   2 or 3   4 or 5   6 or 7
    QUORUM               2        3        4
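
The QUORUM values in the table are consistent with a majority of the replicas, that is floor(RF / 2) + 1. A short check, with `quorum` as a hypothetical helper name:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


for rf in range(2, 8):
    print(f"RF {rf}: QUORUM {quorum(rf)}")
```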
  • 31. QUORUM with RF 3. [Ring diagram: Client; foo (token: 90); Node 1 (token: 0); Node 2 (token: 25); Node 3 (token: 50); Node 4 (token: 75)]
  • 32. Write ‘foo’ at QUORUM with Hinted Handoff. Node 1 foo foo token: 90 Node 4 Node 2 foo for #3 foo Node 3 Client
  • 33. Read ‘foo’ at QUORUM. Node 1 foo foo token: 90 Node 4 Node 2 foo Node 3 Client
  • 34. Consistency Level nodes must agree.
  • 35. Column Timestamps used to resolve differences.
  • 36. Resolving differences.
    Column      Node 1                     Node 2                     Node 3
    purple      cromulent (timestamp 10)   cromulent (timestamp 10)   <missing>
    monkey      embiggens (timestamp 10)   embiggens (timestamp 10)   debigulator (timestamp 5)
    dishwasher  tomato (timestamp 10)      tomato (timestamp 10)      tomacco (timestamp 15)
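
Timestamp resolution can be sketched as "for each column, keep the value with the highest timestamp". The data below reads the slide's table as three columns named purple, monkey and dishwasher, which is an interpretation of the flattened slide text. `resolve` is a hypothetical helper, and this sketch does not model how timestamp ties are broken.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, int]  # (value, timestamp)


def resolve(replies: List[Dict[str, Column]]) -> Dict[str, Column]:
    """Merge replica replies, keeping the newest value for each column."""
    resolved: Dict[str, Column] = {}
    for reply in replies:
        for name, (value, timestamp) in reply.items():
            if name not in resolved or timestamp > resolved[name][1]:
                resolved[name] = (value, timestamp)
    return resolved


node_1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node_2 = dict(node_1)
node_3 = {"monkey": ("debigulator", 5), "dishwasher": ("tomacco", 15)}  # 'purple' is missing

print(resolve([node_1, node_2, node_3]))
```

Under this reading the missing column is filled from the other nodes, the older value (timestamp 5) loses, and the newer value (timestamp 15) wins.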
  • 37. Consistent read for ‘foo’ at QUORUM. Node 1 Node 1 cromulent cromulent Node 4 Node 2 Node 4 Node 2 embiggens cromulent cromulent Client Client Node 3 Node 3
  • 38. Strong Consistency W + R > N (#Write Nodes + #Read Nodes > Replication Factor)
  • 39. Achieving Strong Consistency. - QUORUM Read + QUORUM Write - ALL Read + ONE Write - ONE Read + ALL Write
  • 40. Eventual Consistency. W + R <= N
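
The W + R > N rule can be checked for the combinations listed on the slides. This sketch assumes a Replication Factor of 3 and maps each Consistency Level to a node count; `is_strongly_consistent` and `LEVELS` are hypothetical names.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_nodes: int, read_nodes: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    return write_nodes + read_nodes > replication_factor


RF = 3  # assumed Replication Factor for this example
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

for write_cl, read_cl in [("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ALL", "ONE"), ("ONE", "ONE")]:
    strong = is_strongly_consistent(LEVELS[write_cl], LEVELS[read_cl], RF)
    print(f"{write_cl} write + {read_cl} read: {'strong' if strong else 'eventual'}")
```

The last pair, ONE write plus ONE read, gives 1 + 1 <= 3, so it falls under eventual consistency.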
  • 41. Achieving Consistency. - Hinted Handoff - Read Repair - Scheduled nodetool repair
  • 42. Overview - The Cluster - The Data Model - The API
  • 43. Data Model so far. Row Key: Column Column Column (Incomplete.)
  • 44. Data Model. Keyspace Column Family Column Family Column Family Column Column Column Row Key: Column Column Column Column Column Column (Column Family and Table mean the same.)
  • 45. Rows are the unit of replication.
  • 46. The Column Family is the unit of storage.
  • 47. Inside the Column Family. Keyspace Column Family Column: name, value, timestamp Row Key: Column: name, value, timestamp Column: name, value, timestamp (Also TTL Columns)
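
One way to picture the nesting on the slides is as plain Python dictionaries: a keyspace holds column families, a column family holds rows, and a row holds columns with a name, value and timestamp. This is only a mental model, not how Cassandra stores data. The `Column` class, the `insert` helper and the sample names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    name: str
    value: str
    timestamp: int
    ttl: Optional[int] = None  # seconds to live; None means the column does not expire


# One keyspace: column family name -> row key -> column name -> Column
Keyspace = Dict[str, Dict[str, Dict[str, Column]]]


def insert(keyspace: Keyspace, column_family: str, row_key: str, column: Column) -> None:
    """Store a column, keeping the copy with the highest timestamp."""
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(column.name)
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column


keyspace: Keyspace = {}
insert(keyspace, "ColumnFamily1", "foo", Column("col_name", "col_val", timestamp=10))
insert(keyspace, "ColumnFamily1", "foo", Column("col_name", "stale", timestamp=5))
print(keyspace["ColumnFamily1"]["foo"]["col_name"].value)
```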
  • 48. Basic Data Types - ASCII, UTF8 - Integer, Long, Float, Double, Boolean - Date - UUID - Bytes - Counter*
  • 49. Composite Data Types - Two or more Basic types - Ordered by each component - e.g. (IntegerType, UTF8) to hold (timestamp, user_name)
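
"Ordered by each component" behaves much like sorting Python tuples: the first component is compared first, and the second breaks ties. The sketch below uses made-up (timestamp, user_name) values, and Python's string ordering stands in for Cassandra's UTF8 comparison as an approximation. `names_at` is a hypothetical helper.

```python
import bisect

# Composite column names as (timestamp, user_name) tuples; values are made up.
columns = [
    (1350000200, "fred"),
    (1350000100, "wilma"),
    (1350000100, "barney"),
]

# Tuples sort by the first component, then by the second.
ordered = sorted(columns)


def names_at(timestamp: int) -> list:
    """Slice out every user_name stored under one timestamp."""
    start = bisect.bisect_left(ordered, (timestamp,))
    end = bisect.bisect_left(ordered, (timestamp + 1,))
    return [user_name for _, user_name in ordered[start:end]]


print(ordered)
print(names_at(1350000100))
```

Because the columns are kept in component order, a range of timestamps is a contiguous slice, which is what makes this kind of composite useful for time-ordered data.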
  • 50. Data Modelling. Data Modelling for Apache Cassandra Aaron Morton (Cassandra Committer) Wednesday November 7 @ 11AM PST http://www.datastax.com/resources/webinars/collegecredit
  • 51. Overview - The Cluster - The Data Model - The API
  • 52. The API. - Original Thrift based RPC - Declarative Cassandra Query Language (CQL)
  • 53. RPC via Python pycassa.
    # pycassa - Python
    >>> col_fam = pycassa.ColumnFamily(connection_pool, 'ColumnFamily1')
    >>> col_fam.insert('row_key', {'col_name': 'col_val'})
  • 54. RPC via Python pycassa...
    # pycassa - Python
    >>> col_fam.get('row_key')
    {'col_name': 'col_val', 'col_name2': 'col_val2'}
    >>> col_fam.multiget(['row_key'], ['col_name'])
    {'row_key': {'col_name': 'col_val'}}
  • 55. RPC via Python pycassa...
    # pycassa - Python
    >>> col_fam.remove('row_key')
    >>> col_fam.remove('row_key', ['col_name'])
  • 56. CQL.
    -- Cassandra Query Language (CQL)
    INSERT INTO ColumnFamily1 (KEY, col_name) VALUES ('row_key', 'col_value');
  • 57. CQL...
    -- Cassandra Query Language (CQL)
    SELECT * FROM ColumnFamily1 WHERE KEY IN ('row_key_1');
    SELECT col_name FROM ColumnFamily1 WHERE KEY IN ('row_key_1', 'row_key_2');
  • 58. CQL...
    -- Cassandra Query Language (CQL)
    DELETE FROM ColumnFamily1 WHERE KEY IN ('row_key');
    DELETE col_name FROM ColumnFamily1 WHERE KEY = 'row_key';
  • 59. Thanks.
  • 60. Aaron Morton @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License