Cassandra Community Webinar | Introduction to Apache Cassandra 1.2

Title: Introduction to Apache Cassandra 1.2

Details: Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This webinar will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to core concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency. He’ll also cover new features in Cassandra version 1.2, including virtual nodes, the CQL 3 language, and query tracing.

Speaker: Aaron Morton, Apache Cassandra Committer

Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010, he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

    Cassandra Community Webinar | Introduction to Apache Cassandra 1.2 Presentation Transcript

    • CASSANDRA COMMUNITY WEBINARS, APRIL 2013. INTRODUCTION TO APACHE CASSANDRA 1.2. Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra. @aaronmorton, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
    • Cassandra Summit 2013: June 11 & 12, San Francisco. Use SFSummit25 for 25% off.
    • Cassandra Summit 2013: DataStax Ac*ademy. Free certification during the summit.
    • Overview: The Cluster; The Node; The Data Model.
    • Cassandra: started at Facebook; open sourced in 2008; top-level Apache project since 2010.
    • Used by... Netflix, Twitter, Reddit, Rackspace...
    • Inspiration: Google Big Table (2006); Amazon Dynamo (2007).
    • Why Cassandra? Scale; Operations; Data Model.
    • Overview: The Cluster; The Node; The Data Model.
    • Store ‘foo’ key with Replication Factor 3. [Diagram: four nodes; Node 1, Node 2 and Node 3 hold ‘foo’, Node 4 does not.]
    • Consistent Hashing: evenly map keys to nodes; minimise key movements when nodes join or leave.
    • Partitioner: RandomPartitioner transforms Keys to Tokens using MD5. (Default pre version 1.2.)
    • Partitioner: Murmur3Partitioner transforms Keys to Tokens using Murmur3. (Default in version 1.2.)
    • Keys and Tokens? [Diagram: tokens run from 0 to 99; key ‘fop’ has token 10, key ‘foo’ has token 90.]
    • Token Ring. [Diagram: ring running from token 0 to token 99; ‘fop’ at token 10, ‘foo’ at token 90.]
    • Token Ranges pre v1.2. [Diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; ranges labelled 1-25 and 76-0.]
    • Token Ranges with Virtual Nodes in v1.2. [Diagram: ring shared by Node 1, Node 2, Node 3 and Node 4.]
    • Locate Token Range. [Diagram: ring of Node 1 to Node 4; ‘foo’ at token 90.]
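The key-to-token and token-to-node steps on these slides can be sketched in Python. This is a toy model, not Cassandra's implementation: the 0 to 99 token space and the four node tokens come from the slides, while `token_for` and `owner_of` are hypothetical helper names, and taking MD5 modulo 100 is an assumption made only to fit the slides' small ring.

```python
import bisect
import hashlib

# Node tokens from the "Token Ranges pre v1.2" slide.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def token_for(key: str, ring_size: int = 100) -> int:
    """Hash a key to a token in [0, ring_size) using MD5 (toy token space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


def owner_of(token: int) -> str:
    """Return the node whose token range contains `token`.

    A node with token T owns the range (previous node's token, T], so the
    owner is the first node whose token is >= the given token, wrapping
    around to the first node at the end of the ring.
    """
    index = bisect.bisect_left(TOKENS, token) % len(RING)
    return RING[index][1]
```

With the slides' example tokens, `owner_of(90)` (the token shown for ‘foo’) falls in the 76-0 range owned by Node 1, and `owner_of(10)` (‘fop’) falls in the 1-25 range owned by Node 2.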
    • Replication Strategy selects Replication Factor number of nodes for a row.
    • SimpleStrategy with RF 3. [Diagram: ring of Node 1 to Node 4; ‘foo’ at token 90.]
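A minimal sketch of the SimpleStrategy idea, assuming the four-node ring from the earlier slides: take the node that owns the token, then continue clockwise until Replication Factor nodes have been chosen. The function name `simple_strategy_replicas` is hypothetical.

```python
import bisect
from typing import List

# Node tokens from the "Token Ranges pre v1.2" slide.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def simple_strategy_replicas(token: int, replication_factor: int) -> List[str]:
    """Pick the owner of `token`, then the next nodes clockwise on the ring."""
    count = min(replication_factor, len(RING))  # cannot exceed the node count
    first = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(first + step) % len(RING)][1] for step in range(count)]
```

For ‘foo’ at token 90 with RF 3 this selects Node 1, Node 2 and Node 3, which matches the first replication slide where Node 4 holds no copy.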
    • NetworkTopologyStrategy uses a Replication Factor per Data Centre. (Default.)
    • Multi DC Replication with RF 3 and RF 2. [Diagram: West DC and East DC, each a ring of Node 1 to Node 4; ‘foo’ at token 90.]
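The per-data-centre Replication Factor can be illustrated with a simplified sketch. The node names and tokens in `RINGS` are invented for the example, `replicas_per_dc` is a hypothetical helper, and rack awareness is deliberately left out, so this only shows the "one RF per DC" idea rather than the real placement logic.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical layout: two data centres with made-up node tokens.
RINGS: Dict[str, List[Tuple[int, str]]] = {
    "West": [(0, "West 1"), (25, "West 2"), (50, "West 3"), (75, "West 4")],
    "East": [(1, "East 1"), (26, "East 2"), (51, "East 3"), (76, "East 4")],
}


def replicas_per_dc(token: int, rf_by_dc: Dict[str, int]) -> Dict[str, List[str]]:
    """Walk each data centre's ring separately, taking that DC's RF nodes."""
    placement: Dict[str, List[str]] = {}
    for dc, rf in rf_by_dc.items():
        ring = RINGS[dc]
        tokens = [node_token for node_token, _ in ring]
        first = bisect.bisect_left(tokens, token) % len(ring)
        count = min(rf, len(ring))
        placement[dc] = [ring[(first + step) % len(ring)][1] for step in range(count)]
    return placement
```

Calling `replicas_per_dc(90, {"West": 3, "East": 2})` mirrors the slide's RF 3 and RF 2 setup: three replicas are chosen in the West ring and two in the East ring.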
    • The Snitch knows which Data Centre and Rack the Node is in.
    • SimpleSnitch: places all nodes in the same DC and Rack. (Default, there are others.)
    • EC2Snitch: DC is set to AWS Region and a Rack to Availability Zone.
    • The Client and the Coordinator. [Diagram: Client and a ring of Node 1 to Node 4; ‘foo’ at token 90.]
    • Multi DC Client and the Coordinator. [Diagram: Client, a ring of Node 1 to Node 4 and a second ring of Node 10, Node 20, Node 30 and Node 40; ‘foo’ at token 90.]
    • Gossip: nodes share information with a small number of neighbours. Who share information with a small number of neigh..
    • Consistency Level (CL): specified for each request; number of nodes to wait for.
    • Consistency Level (CL): ANY*; ONE, TWO, THREE; QUORUM; LOCAL_QUORUM, EACH_QUORUM*.
    • QUORUM at Replication Factor... Replication Factor 2 or 3: QUORUM 2. Replication Factor 4 or 5: QUORUM 3. Replication Factor 6 or 7: QUORUM 4.
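The QUORUM values on this slide follow from "more than half of the replicas", which is integer division by two plus one. A small sketch (the function name `quorum` is just for illustration):

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that is a strict majority of the RF."""
    return replication_factor // 2 + 1


# Reproduce the slide's table for RF 2 to 7.
for rf in range(2, 8):
    print(f"RF {rf}: QUORUM {quorum(rf)}")
```

This gives 2 for RF 2 or 3, 3 for RF 4 or 5, and 4 for RF 6 or 7, as listed on the slide.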
    • Write ‘foo’ at QUORUM with Hinted Handoff. [Diagram: Client and a ring of Node 1 to Node 4; ‘foo’ at token 90; a hint labelled ‘foo for #3’.]
    • Read ‘foo’ at QUORUM. [Diagram: Client and a ring of Node 1 to Node 4; ‘foo’ at token 90.]
    • Column Timestamps used to resolve differences.
    • Resolving differences.
      Column     | Node 1                   | Node 2                   | Node 3
      -----------+--------------------------+--------------------------+---------------------------
      purple     | cromulent (timestamp 10) | cromulent (timestamp 10) | <missing>
      monkey     | embiggens (timestamp 10) | embiggens (timestamp 10) | debigulator (timestamp 5)
      dishwasher | tomato (timestamp 10)    | tomato (timestamp 10)    | tomacco (timestamp 15)
    • Consistent read for ‘foo’ at QUORUM. [Diagram, two panels, each with the Client and a ring of Node 1 to Node 4: the first panel is labelled cromulent, cromulent, <empty>; the second is labelled cromulent, cromulent.]
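The timestamp rule from the "Resolving differences" slide can be sketched as a per-column merge in which the highest timestamp wins. The data below is copied from that slide; `resolve` is a hypothetical helper, and tie-breaking between equal timestamps is not modelled.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, int]  # (value, timestamp)


def resolve(replica_responses: List[Dict[str, Column]]) -> Dict[str, Column]:
    """Merge replica responses, keeping the newest value for each column."""
    merged: Dict[str, Column] = {}
    for columns in replica_responses:
        for name, (value, timestamp) in columns.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


# Responses as shown on the slide; Node 3 is missing the "purple" column.
node_1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node_2 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node_3 = {"monkey": ("debigulator", 5), "dishwasher": ("tomacco", 15)}

print(resolve([node_1, node_2, node_3]))
```

Merging all three responses keeps cromulent and embiggens (timestamp 10) and picks tomacco (timestamp 15) over tomato (timestamp 10).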
    • Strong Consistency: W + R > N (#Write Nodes + #Read Nodes > Replication Factor).
    • Achieving Strong Consistency: QUORUM Read + QUORUM Write; ALL Read + ONE Write; ONE Read + ALL Write.
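The W + R > N rule can be checked mechanically. The sketch below covers only the ONE, QUORUM and ALL levels named on the slide; `nodes_required` and `is_strongly_consistent` are hypothetical names.

```python
def nodes_required(level: str, replication_factor: int) -> int:
    """Number of replicas a request waits for at the given Consistency Level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, replication_factor: int) -> bool:
    """Apply W + R > N: the read set must overlap the write set."""
    writes = nodes_required(write_level, replication_factor)
    reads = nodes_required(read_level, replication_factor)
    return writes + reads > replication_factor
```

At RF 3, QUORUM writes with QUORUM reads give 2 + 2 > 3, and ONE with ALL gives 1 + 3 > 3, so both combinations from the slide pass; ONE with ONE gives 1 + 1, which does not.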
    • Achieving Consistency: Consistency Level; Hinted Handoff; Read Repair; Anti Entropy.
    • Overview: The Cluster; The Node; The Data Model.
    • Optimised for Writes.
    • Write path: append to Write Ahead Log. (fsync every 10s by default, other options available)
    • Write path... Merge Columns into Memtable. (Lock free, always in memory.)
    • (Later.) Asynchronously flush Memtable to new files. (May be 10’s or 100’s of MB in size.)
    • Data is stored in immutable SSTables. (Sorted String table.)
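The write path on these slides (append to the log, merge into the Memtable, later flush to an immutable SSTable) can be sketched as a toy in-memory model. `ToyNode` and its attributes are hypothetical; the flush here is an explicit synchronous call, whereas the slides describe it as asynchronous, and clearing the whole log after a flush is a simplification.

```python
from typing import Dict, List, Tuple


class ToyNode:
    """Toy single-node write path: log, Memtable, immutable SSTables."""

    def __init__(self) -> None:
        self.commit_log: List[Tuple[str, str, str, int]] = []  # append-only
        self.memtable: Dict[str, Dict[str, Tuple[str, int]]] = {}
        self.sstables: List[tuple] = []  # each SSTable is a nested tuple

    def write(self, row_key: str, column: str, value: str, timestamp: int) -> None:
        # 1. Append to the write ahead log first.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Merge the column into the Memtable; the newest timestamp wins.
        row = self.memtable.setdefault(row_key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)

    def flush(self) -> None:
        """Write the Memtable out as a new SSTable sorted by row key and column."""
        if not self.memtable:
            return
        sstable = tuple(
            (row_key, tuple((name, value, ts) for name, (value, ts) in sorted(columns.items())))
            for row_key, columns in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)  # tuples cannot be modified afterwards
        self.memtable = {}
        self.commit_log = []  # simplification: flushed writes need no replay
```

Existing SSTables are never edited in this model; every flush adds a new one, which is why reads and compaction later have to merge several of them.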
    • SSTable files: *-Data.db, *-Index.db, *-Filter.db. (And others)
    • SSTables. [Diagram: SSTable 1 holds row foo with dishwasher (ts 10): tomato and purple (ts 10): cromulent; SSTable 2 holds row foo with frink (ts 20): flayven and monkey (ts 10): embiggens; SSTable 4 holds row foo with dishwasher (ts 15): tomacco; SSTable 3 and SSTable 5 show no data for foo.]
    • Read purple, monkey, dishwasher. [Diagram: in memory, a Bloom Filter and an Index Sample for each of SSTable 1 to SSTable 5; on disk, the *-Index.db and *-Data.db files for each SSTable, with the same data as on the SSTables slide.]
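A rough sketch of the read on this slide: check each SSTable's Bloom filter, skip SSTables that definitely do not contain the row, and merge the requested columns by timestamp. `BloomFilter`, `SSTable` and `read_row` are hypothetical toy classes and functions; the filter size and hash construction are arbitrary choices for the example, and the Index Sample step is omitted.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Tuple[str, int]]  # column name -> (value, timestamp)


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 64, hash_count: int = 3) -> None:
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = 0

    def _positions(self, key: str) -> Iterator[int]:
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        return all(self.bits & (1 << position) for position in self._positions(key))


class SSTable:
    def __init__(self, rows: Dict[str, Row]) -> None:
        self.rows = rows
        self.bloom = BloomFilter()
        for row_key in rows:
            self.bloom.add(row_key)


def read_row(sstables: List[SSTable], row_key: str, columns: Iterable[str]) -> Tuple[Row, int]:
    """Return the newest value of each requested column and the SSTables read."""
    wanted = set(columns)
    merged: Row = {}
    sstables_read = 0
    for sstable in sstables:
        if not sstable.bloom.might_contain(row_key):
            continue  # the filter rules this SSTable out without touching disk
        sstables_read += 1
        for name, (value, timestamp) in sstable.rows.get(row_key, {}).items():
            if name in wanted and (name not in merged or timestamp > merged[name][1]):
                merged[name] = (value, timestamp)
    return merged, sstables_read
```

With the five SSTables from the slide, a read of purple, monkey and dishwasher for row foo only has to touch SSTables 1, 2 and 4, because the filters for the two SSTables without foo are empty.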
    • Key Cache caches row key position in *-Data.db file. (Removes up to 1 disk seek per SSTable.)
    • Read with Key Cache. [Diagram: the same read as before, with a Key Cache shown in memory for each SSTable alongside its Index Sample and Bloom Filter.]
    • Row Cache caches entire row. (Removes all disk IO.)
    • Read with Row Cache. [Diagram: the Key Cache read diagram with a Row Cache added.]
    • Compaction merges truth from multiple SSTables into one SSTable with the same truth. (Manual and continuous background process.)
    • Compaction. [Table with columns Column, SSTable 1, SSTable 2, SSTable 4 and New.] purple: cromulent (timestamp 10) and <tombstone> (timestamp 15) merge to <tombstone> (timestamp 15). monkey: embiggens (timestamp 10) stays embiggens (timestamp 10). dishwasher: tomato (timestamp 10) and tomacco (timestamp 15) merge to tomacco (timestamp 15).
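Compaction as described here can be sketched as a merge of the columns of one row across several SSTables, keeping the newest entry per column. The values come from the slide, but which SSTable holds the tombstone and the monkey column is an assumption of this example; `TOMBSTONE` and `compact` are hypothetical names.

```python
from typing import Dict, List, Tuple

TOMBSTONE = "<tombstone>"  # marker for a deleted column
Row = Dict[str, Tuple[str, int]]  # column name -> (value, timestamp)


def compact(sstables: List[Row]) -> Row:
    """Merge several SSTables' columns for one row into a single new SSTable.

    The entry with the highest timestamp wins, including tombstones, so the
    new SSTable tells the same truth as its inputs.
    """
    merged: Row = {}
    for columns in sstables:
        for name, (value, timestamp) in columns.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


# Assumed placement of the slide's values across the three input SSTables.
sstable_1 = {"purple": ("cromulent", 10), "dishwasher": ("tomato", 10)}
sstable_2 = {"purple": (TOMBSTONE, 15), "monkey": ("embiggens", 10)}
sstable_4 = {"dishwasher": ("tomacco", 15)}

print(compact([sstable_1, sstable_2, sstable_4]))
```

The result keeps the tombstone for purple, embiggens for monkey and tomacco for dishwasher, matching the New column on the slide. When a tombstone can finally be dropped is outside the scope of this sketch.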
    • Overview: The Cluster; The Node; The Data Model.
    • Cassandra is good at reading data from a row in the order it is stored.
    • Typically an efficient data model will denormalize data and use the storage engine order.
    • To create a good data model, understand the queries your application requires.
    • API Choice: Thrift. Original and still fully supported API.
    • API Choice: CQL3. New and fully supported API.
    • CQL 3: a Table Orientated, Schema Driven, Data Model and Query language similar to SQL.
    • Twitter clone using CQL 3 via the cqlsh tool: bin/cqlsh
    • Queries? Post Tweet to Followers; Get Tweet by ID; List Tweets by User; List Tweets in User Timeline; List Followers.
    • Keyspace: a Namespace container.
    • Our Keyspace
      CREATE KEYSPACE cass_community
      WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1':1};
    • Table: a sparse collection of well known, ordered columns.
    • First Table
      CREATE TABLE User (
          user_name text,
          password text,
          real_name text,
          PRIMARY KEY (user_name)
      );
    • Some users
      cqlsh:cass_community> INSERT INTO User
                        ... (user_name, password, real_name)
                        ... VALUES
                        ... ('fred', 'sekr8t', 'Mr Foo');
      cqlsh:cass_community> select * from User;
       user_name | password | real_name
      -----------+----------+-----------
            fred |   sekr8t |    Mr Foo
    • Some users
      cqlsh:cass_community> INSERT INTO User
                        ... (user_name, password)
                        ... VALUES
                        ... ('bob', 'pwd');
      cqlsh:cass_community> select * from User where user_name = 'bob';
       user_name | password | real_name
      -----------+----------+-----------
             bob |      pwd |      null
    • Data Model (so far)
      Table / Value | User
      --------------+-------------
      user_name     | Primary Key
    • Tweet Table
      CREATE TABLE Tweet (
          tweet_id bigint,
          body text,
          user_name text,
          timestamp timestamp,
          PRIMARY KEY (tweet_id)
      );
    • Tweet Table...
      cqlsh:cass_community> INSERT INTO Tweet
                        ... (tweet_id, body, user_name, timestamp)
                        ... VALUES
                        ... (1, 'The Tweet', 'fred', 1352150816917);
      cqlsh:cass_community> select * from Tweet where tweet_id = 1;
       tweet_id | body      | timestamp                | user_name
      ----------+-----------+--------------------------+-----------
              1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred
    • Data Model (so far)
      Table / Value | User        | Tweet
      --------------+-------------+-------------
      user_name     | Primary Key | Field
      tweet_id      |             | Primary Key
    • UserTweets Table
      CREATE TABLE UserTweets (
          tweet_id bigint,
          user_name text,
          body text,
          timestamp timestamp,
          PRIMARY KEY (user_name, tweet_id)
      );
    • UserTweets Table...
      cqlsh:cass_community> INSERT INTO UserTweets
                        ... (tweet_id, body, user_name, timestamp)
                        ... VALUES
                        ... (1, 'The Tweet', 'fred', 1352150816917);
      cqlsh:cass_community> select * from UserTweets where user_name = 'fred';
       user_name | tweet_id | body      | timestamp
      -----------+----------+-----------+--------------------------
            fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
    • UserTweets Table...
      cqlsh:cass_community> select * from UserTweets where user_name = 'fred' and tweet_id = 1;
       user_name | tweet_id | body      | timestamp
      -----------+----------+-----------+--------------------------
            fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
    • UserTweets Table...
      cqlsh:cass_community> INSERT INTO UserTweets
                        ... (tweet_id, body, user_name, timestamp)
                        ... VALUES
                        ... (2, 'Second Tweet', 'fred', 1352150816918);
      cqlsh:cass_community> select * from UserTweets where user_name = 'fred';
       user_name | tweet_id | body         | timestamp
      -----------+----------+--------------+--------------------------
            fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
            fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
    • UserTweets Table...
      cqlsh:cass_community> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
       user_name | tweet_id | body         | timestamp
      -----------+----------+--------------+--------------------------
            fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
            fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
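The behaviour of the compound primary key `(user_name, tweet_id)` can be modelled in a few lines of Python: the first component selects a partition, and rows inside it are kept sorted by the second component. `UserTweetsModel` is a hypothetical in-memory stand-in written for illustration; it is not how Cassandra stores data and it does not talk to a cluster.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

TweetRow = Tuple[int, str, int]  # (tweet_id, body, timestamp)


class UserTweetsModel:
    """Toy model of PRIMARY KEY (user_name, tweet_id)."""

    def __init__(self) -> None:
        # user_name -> rows kept sorted by tweet_id
        self._partitions: Dict[str, List[TweetRow]] = defaultdict(list)

    def insert(self, user_name: str, tweet_id: int, body: str, timestamp: int) -> None:
        rows = self._partitions[user_name]
        tweet_ids = [row[0] for row in rows]
        index = bisect.bisect_left(tweet_ids, tweet_id)
        new_row = (tweet_id, body, timestamp)
        if index < len(rows) and rows[index][0] == tweet_id:
            rows[index] = new_row  # same primary key: the insert overwrites
        else:
            rows.insert(index, new_row)  # keep the partition in tweet_id order

    def select(self, user_name: str, descending: bool = False) -> List[TweetRow]:
        rows = self._partitions.get(user_name, [])
        return list(reversed(rows)) if descending else list(rows)
```

Inserting fred's tweets 2 and 1 in any order and then calling `select("fred")` returns them in `tweet_id` order, and `select("fred", descending=True)` corresponds to the `order by tweet_id desc` query on the slide.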
    • UserTimeline
      CREATE TABLE UserTimeline (
          user_name text,
          tweet_id bigint,
          tweet_user text,
          body text,
          timestamp timestamp,
          PRIMARY KEY (user_name, tweet_id)
      ) WITH CLUSTERING ORDER BY (tweet_id DESC);
    • UserTimeline
      cqlsh:cass_community> INSERT INTO UserTimeline
                        ... (user_name, tweet_id, tweet_user, body, timestamp)
                        ... VALUES
                        ... ('fred', 1, 'fred', 'The Tweet', 1352150816917);
      cqlsh:cass_community> INSERT INTO UserTimeline
                        ... (user_name, tweet_id, tweet_user, body, timestamp)
                        ... VALUES
                        ... ('fred', 100, 'bob', 'My Tweet', 1352150846917);
    • UserTimeline
      cqlsh:cass_community> select * from UserTimeline where user_name = 'fred';
       user_name | tweet_id | body      | timestamp                | tweet_user
      -----------+----------+-----------+--------------------------+------------
            fred |      100 |  My Tweet | 2012-11-06 10:27:26+1300 |        bob
            fred |        1 | The Tweet | 2012-11-06 10:26:56+1300 |       fred
    • Data Model (so far)
      Table / Value | User        | Tweet       | UserTweets            | UserTimeline
      --------------+-------------+-------------+-----------------------+-----------------------
      user_name     | Primary Key | Field       | Primary Key           | Primary Key
      tweet_id      |             | Primary Key | Primary Key Component | Primary Key Component
    • UserMetrics Table
      CREATE TABLE UserMetrics (
          user_name text,
          tweets counter,
          followers counter,
          following counter,
          PRIMARY KEY (user_name)
      );
    • UserMetrics Table...
      cqlsh:cass_community> UPDATE
                        ... UserMetrics
                        ... SET
                        ... tweets = tweets + 1
                        ... WHERE
                        ... user_name = 'fred';
      cqlsh:cass_community> select * from UserMetrics where user_name = 'fred';
       user_name | followers | following | tweets
      -----------+-----------+-----------+--------
            fred |      null |      null |      1
    • Data Model (so far)
      Table / Value | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics
      --------------+-------------+-------------+-----------------------+-----------------------+-------------
      user_name     | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
      tweet_id      |             | Primary Key | Primary Key Component | Primary Key Component |
    • Relationships
      CREATE TABLE Followers (
          user_name text,
          follower text,
          timestamp timestamp,
          PRIMARY KEY (user_name, follower)
      );
      CREATE TABLE Following (
          user_name text,
          following text,
          timestamp timestamp,
          PRIMARY KEY (user_name, following)
      );
    • Relationships
      cqlsh:cass_community> INSERT INTO
                        ... Following
                        ... (user_name, following, timestamp)
                        ... VALUES
                        ... ('bob', 'fred', 1352247749161);
      cqlsh:cass_community> INSERT INTO
                        ... Followers
                        ... (user_name, follower, timestamp)
                        ... VALUES
                        ... ('fred', 'bob', 1352247749161);
    • Relationships
      cqlsh:cass_community> select * from Following;
       user_name | following | timestamp
      -----------+-----------+--------------------------
             bob |      fred | 2012-11-07 13:22:29+1300
      cqlsh:cass_community> select * from Followers;
       user_name | follower | timestamp
      -----------+----------+--------------------------
            fred |      bob | 2012-11-07 13:22:29+1300
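One way the tables above could serve the "Post Tweet to Followers" query is a fan-out on write, sketched here with plain dictionaries standing in for the tables. The slides do not show this step, so the whole flow is an assumption: `follow`, `post_tweet` and `timeline` are hypothetical helpers, and a real application would issue CQL statements instead.

```python
from collections import defaultdict
from typing import Dict, List, Set

# In-memory stand-ins for the tables from the slides.
tweets: Dict[int, dict] = {}                                      # Tweet
user_tweets: Dict[str, Dict[int, dict]] = defaultdict(dict)       # UserTweets
user_timeline: Dict[str, Dict[int, dict]] = defaultdict(dict)     # UserTimeline
followers: Dict[str, Set[str]] = defaultdict(set)                 # Followers
user_metrics: Dict[str, Dict[str, int]] = defaultdict(lambda: {"tweets": 0})


def follow(follower: str, followed: str) -> None:
    """Record that `follower` follows `followed`."""
    followers[followed].add(follower)


def post_tweet(tweet_id: int, user_name: str, body: str, timestamp: int) -> None:
    """Write one tweet to every table that a later query will read from."""
    row = {"tweet_id": tweet_id, "tweet_user": user_name, "body": body, "timestamp": timestamp}
    tweets[tweet_id] = row                    # Get Tweet by ID
    user_tweets[user_name][tweet_id] = row    # List Tweets by User
    # Denormalise: copy the tweet into the author's and each follower's timeline.
    for reader in {user_name} | followers[user_name]:
        user_timeline[reader][tweet_id] = row
    user_metrics[user_name]["tweets"] += 1    # counter update


def timeline(user_name: str) -> List[dict]:
    """List a user's timeline, newest tweet_id first (clustering order DESC)."""
    rows = user_timeline[user_name]
    return [rows[tweet_id] for tweet_id in sorted(rows, reverse=True)]
```

The design choice illustrated is the one the deck argues for: each query reads a single partition in stored order, at the cost of writing the same tweet several times.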
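    • Data Model
      Table / Value | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics | Follows Followers
      --------------+-------------+-------------+-----------------------+-----------------------+-------------+-------------------
      user_name     | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key | Primary Key
      tweet_id      |             | Primary Key | Primary Key Component | Primary Key Component |             |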
    • Thanks.
    • Aaron Morton, @aaronmorton, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.