C*ollege Credit: Data Modeling for Apache Cassandra
Cassandra stores data differently than traditional RDBMSs. These differences allow for improvements in performance, availability, and scalability. Aaron Morton, DataStax MVP for Apache Cassandra, will present the basics of the data model and outline the differences clearly. This webinar is 101 level and is suitable for people who come from a relational background and are just starting to get into Apache Cassandra.

Presentation Transcript

  • DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA
    Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra
    @aaronmorton www.thelastpickle.com
    Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • General Guidelines, API Choice, Example
  • Cassandra is good at reading data from a row in the order it is stored.
  • Typically an efficient data model will denormalize data and use the storage engine order.
  • To create a good data model, understand the queries your application requires.
  • General Guidelines, API Choice, Example
  • Multiple APIs? Initially there was only a Thrift / RPC API, used by language-specific clients.
  • Multiple APIs... Cassandra Query Language (CQL) started as a higher-level, declarative alternative.
  • Multiple APIs... CQL 3 brings many changes. It is currently in beta in Cassandra v1.1.
  • CQL 3 uses a table-oriented, schema-driven data model. (I said it had many changes.)
  • General Guidelines, API Choice, Example
  • Twitter Clone. Previously done with Thrift at WDCNZ: “Hello @World #Cassandra - Apache Cassandra in action”, http://vimeo.com/49762233
  • Twitter clone... using CQL 3 via the cqlsh tool.
    bin/cqlsh -3
  • Queries?
    * Post Tweet to Followers
    * Get Tweet by ID
    * List Tweets by User
    * List Tweets in User Timeline
    * List Followers
  • Keyspace is a namespace container.
  • Our Keyspace
    CREATE KEYSPACE cass_college
    WITH strategy_class = 'NetworkTopologyStrategy'
    AND strategy_options:datacenter1 = 1;
  • Table is a sparse collection of well known, ordered columns.
  • First Table
    CREATE TABLE User (
        user_name text,
        password text,
        real_name text,
        PRIMARY KEY (user_name)
    );
  • Some users...
    cqlsh:cass_college> INSERT INTO User
                    ... (user_name, password, real_name)
                    ... VALUES
                    ... ('fred', 'sekr8t', 'Mr Foo');
    cqlsh:cass_college> select * from User;
     user_name | password | real_name
    -----------+----------+-----------
     fred      | sekr8t   | Mr Foo
  • Some users...
    cqlsh:cass_college> INSERT INTO User
                    ... (user_name, password)
                    ... VALUES
                    ... ('bob', 'pwd');
    cqlsh:cass_college> select * from User where user_name = 'bob';
     user_name | password | real_name
    -----------+----------+-----------
     bob       | pwd      | null
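The second insert above leaves real_name unset, and the select reports it as null. A minimal Python sketch of that "sparse collection of well known columns" idea follows. It is an in-memory stand-in, not Cassandra's API; the names `USER_COLUMNS`, `users`, `insert_user`, and `select_user` are hypothetical and chosen only for illustration.

```python
# Hypothetical in-memory stand-in for the User table; not Cassandra's API.
USER_COLUMNS = ("user_name", "password", "real_name")  # the well-known columns

users = {}  # primary key (user_name) -> only the columns that were written


def insert_user(**columns):
    """Store only the supplied columns, merging into any existing row."""
    unknown = set(columns) - set(USER_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    users.setdefault(columns["user_name"], {}).update(columns)


def select_user(user_name):
    """Return every declared column, using None where nothing was stored."""
    stored = users[user_name]
    return {name: stored.get(name) for name in USER_COLUMNS}


insert_user(user_name="fred", password="sekr8t", real_name="Mr Foo")
insert_user(user_name="bob", password="pwd")  # no real_name supplied

print(select_user("bob"))
# {'user_name': 'bob', 'password': 'pwd', 'real_name': None}
```

In this sketch nothing is stored for bob's real_name; the None only appears when the row is read back against the declared columns.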
  • Data Model (so far) User
  • Data Model (so far)
    CF / Value | User
    -----------+-------------
    user_name  | Primary Key
  • Tweet Table
    CREATE TABLE Tweet (
        tweet_id bigint,
        body text,
        user_name text,
        timestamp timestamp,
        PRIMARY KEY (tweet_id)
    );
  • Tweet Table...
    cqlsh:cass_college> INSERT INTO Tweet
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (1, 'The Tweet', 'fred', 1352150816917);
    cqlsh:cass_college> select * from Tweet where tweet_id = 1;
     tweet_id | body      | timestamp                | user_name
    ----------+-----------+--------------------------+-----------
     1        | The Tweet | 2012-11-06 10:26:56+1300 | fred
  • Data Model (so far)
    CF / Value | User        | Tweet
    -----------+-------------+-------------
    user_name  | Primary Key | Field
    tweet_id   |             | Primary Key
  • UserTweets Table
    CREATE TABLE UserTweets (
        tweet_id bigint,
        user_name text,
        body text,
        timestamp timestamp,
        PRIMARY KEY (user_name, tweet_id)
    );
  • UserTweets Table...
    cqlsh:cass_college> INSERT INTO UserTweets
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (1, 'The Tweet', 'fred', 1352150816917);
    cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
     user_name | tweet_id | body      | timestamp
    -----------+----------+-----------+--------------------------
     fred      | 1        | The Tweet | 2012-11-06 10:26:56+1300
  • UserTweets Table...
    cqlsh:cass_college> select * from UserTweets where user_name = 'fred' and tweet_id = 1;
     user_name | tweet_id | body      | timestamp
    -----------+----------+-----------+--------------------------
     fred      | 1        | The Tweet | 2012-11-06 10:26:56+1300
  • UserTweets Table...
    cqlsh:cass_college> INSERT INTO UserTweets
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (2, 'Second Tweet', 'fred', 1352150816918);
    cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
     user_name | tweet_id | body         | timestamp
    -----------+----------+--------------+--------------------------
     fred      | 1        | The Tweet    | 2012-11-06 10:26:56+1300
     fred      | 2        | Second Tweet | 2012-11-06 10:26:56+1300
  • UserTweets Table...
    cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
     user_name | tweet_id | body         | timestamp
    -----------+----------+--------------+--------------------------
     fred      | 2        | Second Tweet | 2012-11-06 10:26:56+1300
     fred      | 1        | The Tweet    | 2012-11-06 10:26:56+1300
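The UserTweets queries above rely on the compound primary key (user_name, tweet_id): the first component selects the row, and the second keeps that user's tweets in tweet_id order. The following Python sketch models that behaviour with an in-memory structure. It is only an illustration under that assumption, not Cassandra's storage engine; `user_tweets`, `insert_user_tweet`, and `select_user_tweets` are hypothetical names.

```python
import bisect
from collections import defaultdict

# Hypothetical model: first key component (user_name) -> rows kept sorted by
# the second key component (tweet_id). Each row is (tweet_id, body, timestamp).
user_tweets = defaultdict(list)


def insert_user_tweet(user_name, tweet_id, body, timestamp):
    """Insert a tweet at its sorted position; the same key overwrites."""
    rows = user_tweets[user_name]
    ids = [row[0] for row in rows]
    i = bisect.bisect_left(ids, tweet_id)
    if i < len(rows) and rows[i][0] == tweet_id:
        rows[i] = (tweet_id, body, timestamp)
    else:
        rows.insert(i, (tweet_id, body, timestamp))


def select_user_tweets(user_name, descending=False):
    """Read one user's tweets in stored order, or reversed for 'desc'."""
    rows = user_tweets.get(user_name, [])
    return list(reversed(rows)) if descending else list(rows)


# Inserted out of order on purpose; reads still come back ordered by tweet_id.
insert_user_tweet("fred", 2, "Second Tweet", 1352150816918)
insert_user_tweet("fred", 1, "The Tweet", 1352150816917)

print([row[0] for row in select_user_tweets("fred")])                   # [1, 2]
print([row[0] for row in select_user_tweets("fred", descending=True)])  # [2, 1]
```

Reading in either direction is a walk over already-sorted data, which is the property the general guidelines at the start of the talk recommend designing around.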
  • UserTimeline
    CREATE TABLE UserTimeline (
        tweet_id bigint,
        user_name text,
        body text,
        timestamp timestamp,
        PRIMARY KEY (user_name, tweet_id)
    );
  • Data Model (so far)
    CF / Value | User        | Tweet       | User Tweets           | User Timeline
    -----------+-------------+-------------+-----------------------+-----------------------
    user_name  | Primary Key | Field       | Primary Key           | Primary Key
    tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component
  • UserMetrics Table
    CREATE TABLE UserMetrics (
        user_name text,
        tweets counter,
        followers counter,
        following counter,
        PRIMARY KEY (user_name)
    );
  • UserMetrics Table...
    cqlsh:cass_college> UPDATE
                    ... UserMetrics
                    ... SET
                    ... tweets = tweets + 1
                    ... WHERE
                    ... user_name = 'fred';
    cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
     user_name | followers | following | tweets
    -----------+-----------+-----------+--------
     fred      | null      | null      | 1
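In the output above, only the tweets counter has a value; the counters that were never incremented read back as null. A small Python sketch of that behaviour, assuming a plain in-memory dictionary in place of the table (`COUNTER_COLUMNS`, `user_metrics`, `increment`, and `select_metrics` are hypothetical names, not part of any Cassandra client):

```python
from collections import defaultdict

# Hypothetical stand-in for the UserMetrics table.
COUNTER_COLUMNS = ("followers", "following", "tweets")
user_metrics = defaultdict(dict)  # user_name -> {counter column: value}


def increment(user_name, column, delta=1):
    """Mimic 'SET column = column + delta'; a missing counter starts at 0."""
    if column not in COUNTER_COLUMNS:
        raise ValueError(f"unknown counter: {column}")
    row = user_metrics[user_name]
    row[column] = row.get(column, 0) + delta


def select_metrics(user_name):
    """Counters that were never incremented read back as None (null)."""
    row = user_metrics.get(user_name, {})
    return {column: row.get(column) for column in COUNTER_COLUMNS}


increment("fred", "tweets")
print(select_metrics("fred"))
# {'followers': None, 'following': None, 'tweets': 1}
```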
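  • Data Model (so far)
    CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
    -----------+-------------+-------------+-----------------------+-----------------------+--------------
    user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
    tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |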
  • Relationships
    CREATE TABLE Followers (
        user_name text,
        follower text,
        timestamp timestamp,
        PRIMARY KEY (user_name, follower)
    );
    CREATE TABLE Following (
        user_name text,
        following text,
        timestamp timestamp,
        PRIMARY KEY (user_name, following)
    );
  • Relationships
    INSERT INTO Following (user_name, following, timestamp)
    VALUES ('bob', 'fred', 1352247749161);
    INSERT INTO Followers (user_name, follower, timestamp)
    VALUES ('fred', 'bob', 1352247749161);
  • Relationships
    cqlsh:cass_college> select * from Following;
     user_name | following | timestamp
    -----------+-----------+--------------------------
     bob       | fred      | 2012-11-07 13:22:29+1300
    cqlsh:cass_college> select * from Followers;
     user_name | follower | timestamp
    -----------+----------+--------------------------
     fred      | bob      | 2012-11-07 13:22:29+1300
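The slides list "Post Tweet to Followers" as a query but do not show the write path. One plausible reading, given the denormalized tables above, is that posting writes the tweet to Tweet and UserTweets, then copies it into the UserTimeline of each follower. The Python sketch below models that assumption with in-memory dictionaries; every name in it is hypothetical, and whether the author's own timeline should also receive the tweet is left open.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables on the slides.
tweets = {}                        # tweet_id -> row
user_tweets = defaultdict(dict)    # user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)  # user_name -> {tweet_id: row}
followers = defaultdict(dict)      # user_name -> {follower: timestamp}
following = defaultdict(dict)      # user_name -> {followed user: timestamp}


def follow(follower, followed, timestamp):
    """Record the relationship in both directions, as the two INSERTs do."""
    following[follower][followed] = timestamp
    followers[followed][follower] = timestamp


def post_tweet(tweet_id, user_name, body, timestamp):
    """Assumed write path: one denormalized copy per table that is queried."""
    row = {"tweet_id": tweet_id, "user_name": user_name,
           "body": body, "timestamp": timestamp}
    tweets[tweet_id] = row                  # Get Tweet by ID
    user_tweets[user_name][tweet_id] = row  # List Tweets by User
    for follower in followers.get(user_name, {}):
        user_timeline[follower][tweet_id] = row  # List Tweets in User Timeline


def list_timeline(user_name):
    """Read one user's timeline in tweet_id order."""
    rows = user_timeline.get(user_name, {})
    return [rows[tweet_id] for tweet_id in sorted(rows)]


follow("bob", "fred", 1352247749161)
post_tweet(1, "fred", "The Tweet", 1352150816917)

print([row["body"] for row in list_timeline("bob")])  # ['The Tweet']
```

The cost of this design is extra writes per tweet; in exchange, each of the listed queries becomes a read of a single row in stored order.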
  • Data Model CF / User User User Follows User Tweet Value Tweets Timeline Metrics Followers Primary Primary Primary Primary Primary user_name Field Key Key Key Key Key Field Primary Primary Key Primary Key tweet_id Key Component Component
  • Thanks.
  • Aaron Morton @aaronmorton www.thelastpickle.com
    Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License