C*ollege Credit: An Introduction to Apache Cassandra
 


Join Aaron Morton, DataStax MVP for Apache Cassandra and learn the basics of the massively scalable NoSQL database. This webinar is 101 level and will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency.


Presentation Transcript

  • DATASTAX C*OLLEGE CREDIT: AN INTRODUCTION TO APACHE CASSANDRA. Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra. @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • Overview: The Cluster, The Data Model, The API
  • Cassandra - Started at Facebook - Open sourced in 2008 - Top Level Apache project since 2010.
  • Used by... Netflix, Twitter, Reddit, Rackspace...
  • Inspiration - Google Big Table (2006) - Amazon Dynamo (2007)
  • Why Cassandra? - Scale - Operations - Data Model
  • Why Cassandra? Is My App a Good Fit for Apache Cassandra? Eric Lubow (CTO, SimpleReach) Wednesday October 24 @ 8:30AM PST http://www.datastax.com/resources/webinars/collegecredit
  • Overview: The Cluster, The Data Model, The API
  • Store ‘foo’ key with Replication Factor 3. (Diagram: Node 1 - foo, Node 2 - foo, Node 3 - foo, Node 4.)
  • Consistent Hashing. - Evenly map keys to nodes - Minimise key movements when nodes join or leave
  • Partitioner. RandomPartitioner transforms Keys to Tokens using MD5. (Default, there are others.)
  • Keys and Tokens? (Diagram: token line from 0 to 99; key fop at token 10, key foo at token 90.)
  • Token Ring. (Diagram: tokens 0 to 99 joined into a ring; fop at token: 10, foo at token: 90.)
  • Token Ranges. (Diagram: ring of Node 1 token: 0 with range 76-0, Node 2 token: 25 with range 1-25, Node 3 token: 50, Node 4 token: 75.)
  • Locate Token Range. (Diagram: ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; foo at token: 90.)
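As a rough illustration of the partitioner and token range slides, the sketch below hashes a key with MD5 and finds the node that owns a token on the four-node ring from the diagrams. The function names `md5_token` and `owner_index` are hypothetical, and folding the hash onto a 0 to 99 ring is an assumption made to match the slides' small numbers; it is not how RandomPartitioner sizes its real token space.

```python
import hashlib
from bisect import bisect_left

def md5_token(key, ring_size=100):
    """Hash a key with MD5 and fold it onto a small illustrative ring.

    Assumption: ring_size=100 mirrors the 0..99 tokens on the slides only.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size

def owner_index(token, node_tokens):
    """Index of the first node whose token is >= the key's token.

    node_tokens must be sorted; tokens past the last node wrap to the first.
    """
    return bisect_left(node_tokens, token) % len(node_tokens)

# Ring from the slides: each node owns the range ending at its own token.
NODE_TOKENS = [0, 25, 50, 75]
NODE_NAMES = ["Node 1", "Node 2", "Node 3", "Node 4"]

# Tokens taken from the slides (foo = 90, fop = 10), not computed by md5_token.
print(NODE_NAMES[owner_index(90, NODE_TOKENS)])  # Node 1 (range 76-0)
print(NODE_NAMES[owner_index(10, NODE_TOKENS)])  # Node 2 (range 1-25)
```

Keeping the node tokens sorted lets a binary search locate the range, and the modulo handles the wrap-around from token 99 back to 0.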
  • Replication Strategy selects Replication Factor number of nodes for a row.
  • SimpleStrategy with RF 3. (Diagram: ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; foo at token: 90.)
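One way to picture SimpleStrategy is as a clockwise walk around the ring: start at the node that owns the key's token and take the next nodes until Replication Factor of them are selected. The sketch below assumes that behaviour; `simple_strategy_replicas` is a hypothetical helper name, not part of Cassandra.

```python
from bisect import bisect_left

def simple_strategy_replicas(token, ring, replication_factor):
    """Pick replicas by walking the ring clockwise from the token's owner.

    ring is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [node_token for node_token, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))  # cannot exceed cluster size
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]

# foo has token 90 on the slides, so the walk starts at Node 1.
print(simple_strategy_replicas(90, RING, 3))  # ['Node 1', 'Node 2', 'Node 3']
```

The result matches the earlier slide where ‘foo’ is stored on Node 1, Node 2 and Node 3.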
  • NetworkTopologyStrategy uses a Replication Factor per Data Centre. (Default.)
  • Multi DC Replication with RF 3 and RF 2. (Diagram: West DC ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; East DC ring of Node 10 token: 1, Node 20 token: 26, Node 30 token: 51, Node 40 token: 76; foo at token: 90.)
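A simplified sketch of per-Data-Centre replication, using the node tokens from the multi DC diagram: walk the combined ring clockwise and keep a node only while its Data Centre still needs replicas. Two assumptions are made here: which Data Centre gets RF 3 and which gets RF 2 is not recoverable from the transcript (West 3, East 2 is a guess), and rack awareness is ignored entirely. `network_topology_replicas` is a hypothetical name.

```python
from bisect import bisect_left

def network_topology_replicas(token, ring, rf_per_dc):
    """Select replicas per data centre by walking the ring clockwise.

    ring is a list of (token, node_name, dc_name) tuples sorted by token.
    rf_per_dc maps dc_name to the replication factor wanted in that DC.
    Simplification: racks are not considered.
    """
    tokens = [node_token for node_token, _, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    replicas = {dc: [] for dc in rf_per_dc}
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if dc in replicas and len(replicas[dc]) < rf_per_dc[dc]:
            replicas[dc].append(node)
    return replicas

RING = sorted([
    (0, "Node 1", "West"), (25, "Node 2", "West"),
    (50, "Node 3", "West"), (75, "Node 4", "West"),
    (1, "Node 10", "East"), (26, "Node 20", "East"),
    (51, "Node 30", "East"), (76, "Node 40", "East"),
])

# Assumed split: RF 3 in West, RF 2 in East.
print(network_topology_replicas(90, RING, {"West": 3, "East": 2}))
# {'West': ['Node 1', 'Node 2', 'Node 3'], 'East': ['Node 10', 'Node 20']}
```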
  • The Snitch knows which DataCentre and Rack the Node is in.
  • SimpleSnitch. Places all nodes in the same DC and Rack. (Default, there are others.)
  • PropertyFileSnitch. DC and Rack is specified per node via configuration.
  • EC2Snitch. DC is set to AWS Region and a Rack to Availability Zone.
  • DynamicSnitch. Re-orders nodes according to their observed performance. (Wraps other snitch.)
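The re-ordering idea can be sketched as a sort over replicas by a measured latency. This is only an illustration of "observed performance": the real DynamicSnitch scoring is more involved, and both the function name `order_by_observed_latency` and the latency figures below are made up for the example.

```python
def order_by_observed_latency(replicas, latency_ms):
    """Return replicas sorted fastest first.

    latency_ms maps node name to an observed latency; nodes with no
    measurement are treated as slowest and sorted to the end.
    """
    return sorted(replicas, key=lambda node: latency_ms.get(node, float("inf")))

# Hypothetical measurements, not taken from the slides.
observed = {"Node 1": 12.0, "Node 2": 3.5, "Node 3": 7.1}
print(order_by_observed_latency(["Node 1", "Node 2", "Node 3"], observed))
# ['Node 2', 'Node 3', 'Node 1']
```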
  • The Client and the Coordinator. (Diagram: Client and a ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; foo at token: 90.)
  • Gossip. Nodes share information with a small number of neighbours. Who share information with a small number of neigh..
  • Multi DC Client and the Coordinator. (Diagram: Client and two rings; Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; Node 10 token: 1, Node 20 token: 26, Node 30 token: 51, Node 40 token: 76; foo at token: 90.)
  • Consistency Level (CL). - Specified for each request - Number of nodes to wait for.
  • Consistency Level (CL) - ANY* - ONE, TWO, THREE - QUORUM - LOCAL_QUORUM, EACH_QUORUM*
  • QUORUM at Replication Factor... Replication Factor 2 or 3: QUORUM 2. Replication Factor 4 or 5: QUORUM 3. Replication Factor 6 or 7: QUORUM 4.
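The QUORUM values on this slide are consistent with "more than half of the replicas", which can be written as integer division plus one. A minimal sketch, with `quorum` as a hypothetical helper name:

```python
def quorum(replication_factor):
    """Smallest number of replicas that is a strict majority."""
    return replication_factor // 2 + 1

for rf in range(2, 8):
    print(rf, quorum(rf))
# 2 2
# 3 2
# 4 3
# 5 3
# 6 4
# 7 4
```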
  • QUORUM with RF 3. (Diagram: Client and a ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75; foo at token: 90.)
  • Write ‘foo’ at QUORUM with Hinted Handoff. (Diagram: Client; foo at token: 90; Node 1 foo; Node 2 foo; Node 4 foo for #3; Node 3.)
  • Read ‘foo’ at QUORUM. (Diagram: Client; foo at token: 90; Node 1 foo; Node 2 foo; Node 3; Node 4.)
  • Consistency Level nodes must agree.
  • Column Timestamps used to resolve differences.
  • Resolving differences. Column purple: Node 1 cromulent (timestamp 10), Node 2 cromulent (timestamp 10), Node 3 <missing>. Column monkey: Node 1 embiggens (timestamp 10), Node 2 embiggens (timestamp 10), Node 3 debigulator (timestamp 5). Column dishwasher: Node 1 tomato (timestamp 10), Node 2 tomato (timestamp 10), Node 3 tomacco (timestamp 15).
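The table above can be worked through in a few lines: for each column, keep the value with the highest timestamp across the replica responses. This is a sketch of that rule only; how ties between equal timestamps are broken is not covered by the slides and is not modelled here. The name `resolve` is hypothetical.

```python
def resolve(replica_responses):
    """Merge replica responses, keeping the newest value per column.

    Each response maps column name -> (value, timestamp); a column a
    replica does not have is simply absent from its response.
    """
    resolved = {}
    for response in replica_responses:
        for column, (value, timestamp) in response.items():
            current = resolved.get(column)
            if current is None or timestamp > current[1]:
                resolved[column] = (value, timestamp)
    return resolved

node_1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node_2 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node_3 = {"monkey": ("debigulator", 5), "dishwasher": ("tomacco", 15)}  # purple is missing

print(resolve([node_1, node_2, node_3]))
# {'purple': ('cromulent', 10), 'monkey': ('embiggens', 10), 'dishwasher': ('tomacco', 15)}
```

Node 3 loses on monkey because its timestamp is older, and wins on dishwasher because its timestamp is newer.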
  • Consistent read for ‘foo’ at QUORUM. Node 1 Node 1 cromulent cromulent Node 4 Node 2 Node 4 Node 2 embiggins cromulent cromulent Client Client Node 3 Node 3
  • Strong Consistency. W + R > N (#Write Nodes + #Read Nodes > Replication Factor)
  • Achieving Strong Consistency. - QUORUM Read + QUORUM Write - ALL Read + ONE Write - ONE Read + ALL Write
  • Eventual Consistency. W + R <= N
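The two rules (W + R > N for strong consistency, W + R <= N for eventual consistency) can be checked directly for the combinations listed on the slides. A small sketch at Replication Factor 3; the helper names are hypothetical.

```python
def quorum(replication_factor):
    """Strict majority of the replicas."""
    return replication_factor // 2 + 1

def is_strongly_consistent(write_nodes, read_nodes, replication_factor):
    """True when the write set and read set must overlap (W + R > N)."""
    return write_nodes + read_nodes > replication_factor

RF = 3
ONE, ALL = 1, RF

print(is_strongly_consistent(quorum(RF), quorum(RF), RF))  # True:  2 + 2 > 3
print(is_strongly_consistent(ONE, ALL, RF))                # True:  1 + 3 > 3
print(is_strongly_consistent(ALL, ONE, RF))                # True:  3 + 1 > 3
print(is_strongly_consistent(ONE, ONE, RF))                # False: 1 + 1 <= 3
```

When W + R > N, at least one node that acknowledged the write is always among the nodes read, which is why the newest timestamp is guaranteed to be seen.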
  • Achieving Consistency. - Hinted Handoff - Read Repair - Scheduled nodetool repair
  • Overview: The Cluster, The Data Model, The API
  • Data Model so far. Row Key: Column Column Column (Incomplete.)
  • Data Model. (Diagram: a Keyspace contains Column Families; each Column Family holds rows; each Row Key has a set of Columns.) (Column Family and Table mean the same.)
  • Rows are the unit of replication.
  • The Column Family is the unit of storage.
  • Inside the Column Family. (Diagram: Keyspace, Column Family, Row Key; each Column has a name, value, timestamp.) (Also TTL Columns)
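The nesting described on the data model slides can be mimicked with plain Python dictionaries: a keyspace holds column families, a column family holds rows by key, and a row holds columns made of a name, value and timestamp. This is a mental model only, not how Cassandra stores data; the `Column` class, the `ttl_seconds` field name and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Column:
    name: str
    value: str
    timestamp: int
    ttl_seconds: Optional[int] = None  # set only for TTL columns

# keyspace -> column family -> row key -> column name -> Column
keyspace = {
    "ColumnFamily1": {
        "row_key": {
            "col_name": Column("col_name", "col_val", timestamp=10),
            # Hypothetical TTL column that would expire after an hour.
            "session": Column("session", "abc", timestamp=11, ttl_seconds=3600),
        }
    }
}

column = keyspace["ColumnFamily1"]["row_key"]["col_name"]
print(column.value, column.timestamp)  # col_val 10
```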
  • Basic Data Types - ASCII, UTF8 - Integer, Long, Float, Double, Boolean - Date - UUID - Bytes - Counter*
  • Composite Data Types - Two or more Basic types - Ordered by each component - e.g. (IntegerType, UTF8) to hold (timestamp, user_name)
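"Ordered by each component" behaves much like tuple comparison in Python: the first component decides, and the second is only consulted when the first is equal. The sketch below uses Python tuples as a stand-in for an (IntegerType, UTF8) composite; the sample timestamps and user names are made up.

```python
# (timestamp, user_name) pairs standing in for composite column names.
composite_names = [
    (20, "bob"),
    (10, "carol"),
    (10, "alice"),
]

# Tuples compare component by component: timestamp first, then user_name.
print(sorted(composite_names))
# [(10, 'alice'), (10, 'carol'), (20, 'bob')]
```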
  • Data Modelling. Data Modelling for Apache Cassandra Aaron Morton (Cassandra Committer) Wednesday November 7 @ 11AM PST http://www.datastax.com/resources/webinars/collegecredit
  • Overview: The Cluster, The Data Model, The API
  • The API. - Original Thrift based RPC - Declarative Cassandra Query Language (CQL)
  • RPC via Python pycassa.
    # pycassa - Python
    >>> import pycassa
    >>> col_fam = pycassa.ColumnFamily(connection_pool, 'ColumnFamily1')
    >>> col_fam.insert('row_key', {'col_name': 'col_val'})
  • RPC via Python pycassa...
    # pycassa - Python
    >>> col_fam.get('row_key')
    {'col_name': 'col_val', 'col_name2': 'col_val2'}
    >>> col_fam.multiget(['row_key'], ['col_name'])
    {'row_key': {'col_name': 'col_val'}}
  • RPC via Python pycassa...
    # pycassa - Python
    >>> col_fam.remove('row_key')
    >>> col_fam.remove('row_key', ['col_name'])
  • CQL.
    -- Cassandra Query Language (CQL)
    INSERT INTO ColumnFamily1 (KEY, col_name) VALUES ('row_key', 'col_value');
  • CQL...
    -- Cassandra Query Language (CQL)
    SELECT * FROM ColumnFamily1 WHERE KEY IN ('row_key_1');
    SELECT col_name FROM ColumnFamily1 WHERE KEY IN ('row_key_1', 'row_key_2');
  • CQL...
    -- Cassandra Query Language (CQL)
    DELETE FROM ColumnFamily1 WHERE KEY IN ('row_key');
    DELETE col_name FROM ColumnFamily1 WHERE KEY = 'row_key';
  • Thanks.
  • Aaron Morton @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License