Introduction to Cassandra Basics
 

An introduction to some basic concepts and data modeling techniques in Cassandra.

    Presentation Transcript

    • Introduction to Cassandra Nick Bailey @nickmbailey Monday, October 28, 13
    • Who am I? ©2012 DataStax Monday, October 28, 13 2
    • What’s DataStax?
    • On to the good stuff!
    • Why Cassandra? • Cluster Architecture • Node Architecture • Data Modeling • Wrap up
    • Why Cassandra?
    • Time for buzz words! • Big Data! • NoSQL!
    • Big Data • Gartner: “...high-volume, high-velocity and high-variety...” • 2 sides of ‘big data’: Analytics, Real-time
    • NoSQL • A terrible label • Covers a wide range of DBs: Cassandra, Redis, MongoDB, HBase, ...
    • Started by Facebook
    • Dynamo (Amazon) + Bigtable (Google)
    • Cassandra is great for... • Massive, linear scaling (e.g. CERN hadron collider, Barracuda Networks) • Extremely heavy writes (e.g. BlueMountain Capital – financial tick data) • High availability (e.g. eBay, Eventbrite, Netflix, SoundCloud, HealthCare Anytime, Comcast, GoDaddy, Sony Entertainment Network)
    • http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
    • One size does not fit all • Polyglot persistence
    • More Resources • PlanetCassandra.org • Blog • 5 minute interviews
    • Cluster Architecture
    • Data Distribution • Hash_Function(Partition Key) >> Token • [Figure: ring of nodes at tokens 0, 25, 50, 75]
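The mapping from partition key to token to node can be sketched in a few lines of Python. This is a toy illustration only: the node names, the 0..99 token space, and the use of MD5 are assumptions made for the example, not Cassandra's actual partitioner. It assumes each node owns the range ending at its own token and that replicas are the next nodes clockwise on the ring.

```python
import bisect
import hashlib

# Hypothetical four-node ring using the slide's tokens 0, 25, 50, 75.
RING = [(0, "node_a"), (25, "node_b"), (50, "node_c"), (75, "node_d")]
TOKENS = [token for token, _ in RING]
TOKEN_SPACE = 100  # toy token space; real partitioners use a far larger range


def token_for(partition_key):
    """Hash a partition key onto the toy token space (MD5 is only a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner_of(token):
    """Return the first node whose token is >= the given token, wrapping around."""
    index = bisect.bisect_left(TOKENS, token) % len(RING)
    return RING[index][1]


def replicas_for(token, replication_factor):
    """Pick the owner plus the next nodes clockwise on the ring."""
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]
```

For example, `owner_of(token_for("nickmbailey"))` names the node that would own that user's partition in this toy ring.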
    • Replication
    • Failure Modes
    • Consistency Level • Multiple options: ONE, QUORUM, ALL, LOCAL_QUORUM, ... • Can be specified per request
    • Quorum
    • Consistency • Write CL: ONE
    • Consistency • Read CL: ONE
    • Failure Types • UnavailableException: Didn’t even try • TimedOutException: Possible success or failure
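The relationship between consistency levels, quorum size, and the two failure types can be sketched as follows. This is a simplified model, not driver code: the exception classes are local stand-ins that borrow the slide's names, `coordinate_write` is a hypothetical helper, and only ONE, QUORUM, and ALL are modeled. It assumes a quorum is a majority of the replicas, and that a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor.

```python
class UnavailableException(Exception):
    """Local stand-in: too few live replicas, so the request is never attempted."""


class TimedOutException(Exception):
    """Local stand-in: too few acks arrived; some replicas may still have applied the write."""


def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must respond for the given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def coordinate_write(consistency_level, replication_factor, live_replicas, acks_received):
    """Hypothetical coordinator check: fail fast, or fail after trying."""
    needed = required_replicas(consistency_level, replication_factor)
    if live_replicas < needed:
        # "Didn't even try": the coordinator already knows it cannot succeed.
        raise UnavailableException(f"need {needed}, only {live_replicas} alive")
    if acks_received < needed:
        # "Possible success or failure": the write was sent but not confirmed.
        raise TimedOutException(f"need {needed} acks, got {acks_received}")
    return needed


def read_sees_latest_write(write_cl, read_cl, replication_factor):
    """True when the read and write replica sets must overlap."""
    written = required_replicas(write_cl, replication_factor)
    read = required_replicas(read_cl, replication_factor)
    return written + read > replication_factor
```

With a replication factor of 3, writing and reading at ONE gives no overlap guarantee, while QUORUM for both does.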
    • Multi DC
    • Gossip • Manages cluster state: nodes up/down, nodes joining/leaving • Decentralized
    • Snitch • Responsible for determining cluster topology • Tracks node responsiveness • Simple, PropertyFile, Ec2Snitch, etc...
    • Node Architecture
    • Write Path • [Figure: Write → Memtable (memory) and commit log (disk); SSTable (disk)]
    • Read Path • [Figure: Read → Memtable (memory) and SSTables (disk)]
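The write and read paths can be sketched with a toy single-node model. Everything here is an assumption made for illustration: the `ToyNode` class, the flush threshold, and the newest-first lookup are simplifications (a real node reconciles values by write timestamp and manages commit log segments differently).

```python
class ToyNode:
    """Toy model of one node's storage: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record, stands in for the on-disk log
        self.memtable = {}     # in-memory table of recent writes
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        """Append to the commit log, then update the memtable."""
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None
```

Writes never modify an existing SSTable in this model, which is why a read may have to consult the memtable and several SSTables.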
    • Data Modeling
    • CQL • Cassandra Query Language
    • Terminology • Keyspace • Table (Column Family) • Row • Column • Partition Key • Clustering Key (Optional)
    • For Example:
        CREATE KEYSPACE packagetracker WITH REPLICATION =
          { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
        CREATE KEYSPACE packagetracker WITH REPLICATION =
          { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2 };
        CREATE TABLE events (
          package_id text,
          status_timestamp timestamp,
          location text,
          notes text,
          PRIMARY KEY (package_id, status_timestamp)
        );
    • Constructs
    • Basic Data Types • blob • int • text • bigint (long) • uuid • etc
    • More Data Modeling Constructs • Collections: map, set, list • Time to live (TTL) • Counters • Secondary Indexes
    • Approaching Data Modeling • Model your queries, not your data • Optimize your data model for reads • Don’t be afraid to denormalize • You will get it wrong, iterate
    • An Example: User Logins
    • The Query • What are the last 10 locations nickmbailey logged in from?
        SELECT time, location FROM logins
          WHERE user = 'nickmbailey'
          ORDER BY time DESC LIMIT 10;
      • Partition Key: user • Clustering Key: time • Additional Columns: location
        CREATE TABLE logins (
          user text,
          time timestamp,
          location text,
          PRIMARY KEY (user, time)
        );
      • Partition key: user • Primary key: (user, time)
         User        | Time                | Location
        -------------+---------------------+----------------------
         nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
         nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
         jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia
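Why this primary key makes the query cheap can be sketched in Python. This is a rough in-memory analogy, not how Cassandra stores data: the `logins` dictionary and both helper functions are hypothetical, and timestamps are kept as strings that happen to sort chronologically. The partition key selects one group of rows, and the clustering key keeps that group sorted, so "last 10" is a slice from the end.

```python
import bisect
from collections import defaultdict

# partition key (user) -> rows kept sorted by clustering key (time)
logins = defaultdict(list)


def insert_login(user, time, location):
    """Insert a row, overwriting any row with the same (user, time) primary key."""
    rows = logins[user]
    index = bisect.bisect_left(rows, (time,))
    if index < len(rows) and rows[index][0] == time:
        rows[index] = (time, location)   # same primary key: upsert
    else:
        rows.insert(index, (time, location))


def last_locations(user, limit=10):
    """Mimic ORDER BY time DESC LIMIT n within a single partition."""
    rows = logins.get(user, [])
    return rows[::-1][:limit]


insert_login("nickmbailey", "2013-07-19 09:22:18", "Austin, Texas")
insert_login("nickmbailey", "2013-07-19 14:49:27", "Blacksburg, Virginia")
insert_login("jsmith", "2013-07-20 07:59:34", "Atlanta, Georgia")
```

Calling `last_locations("nickmbailey")` touches only that user's rows and returns them newest first.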
    • Time-series data • By far, the most common data model • Event logs • Metrics • Sensor Data • Etc
    • Another Query • When was the last time nickmbailey logged in from San Francisco, California?
        SELECT time FROM logins
          WHERE user = 'nickmbailey' AND location = 'San Francisco, California';
         User        | Time                | Location
        -------------+---------------------+---------------------------
         nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
         nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
         nickmbailey | 2013-07-19 14:49:27 | Austin, Texas
         nickmbailey | 2013-05-19 14:49:27 | Austin, Texas
         nickmbailey | 2013-04-19 14:49:27 | San Francisco, California
         ...         | ...                 | ...
         jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia
    • Another Query • When was the last time nickmbailey logged in from San Francisco, California?
        SELECT time FROM logins_by_location
          WHERE user = 'nickmbailey' AND location = 'San Francisco, California';
        CREATE TABLE logins_by_location (
          user text,
          time timestamp,
          location text,
          PRIMARY KEY (user, location)
        );
         User        | Location                  | Time
        -------------+---------------------------+---------------------
         nickmbailey | Austin, Texas             | 2013-07-19 09:22:18
         nickmbailey | Blacksburg, Virginia      | 2013-07-19 14:49:27
         nickmbailey | San Francisco, California | 2013-07-19 14:49:27
    • Denormalize • Create materialized views of the same data to support different queries • Storage space is cheap, Cassandra is fast
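Denormalizing here means the application writes each login twice, once per query. A minimal sketch, assuming logins are recorded in time order so that the later write for a (user, location) pair is also the more recent login; the dictionaries and function names are hypothetical stand-ins for the two tables.

```python
# Stand-in for the table with PRIMARY KEY (user, time).
logins = {}
# Stand-in for the table with PRIMARY KEY (user, location).
logins_by_location = {}


def record_login(user, time, location):
    """Write the same event to both tables, one per query pattern."""
    logins.setdefault(user, {})[time] = location
    # Same (user, location) key overwrites the previous row: last write wins.
    logins_by_location.setdefault(user, {})[location] = time


def last_login_from(user, location):
    """Answer the second query with a single key lookup."""
    return logins_by_location.get(user, {}).get(location)


record_login("nickmbailey", "2013-04-19 14:49:27", "San Francisco, California")
record_login("nickmbailey", "2013-05-19 14:49:27", "Austin, Texas")
record_login("nickmbailey", "2013-07-19 09:22:18", "Austin, Texas")
```

The second table costs extra storage and an extra write per login, but the location query no longer has to scan every row in the user's partition.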
    • Debugging your data model
        cqlsh> tracing on;
        Now tracing requests.
        cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
        Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
         activity                          | timestamp    | source    | source_elapsed
        -----------------------------------+--------------+-----------+----------------
         execute_cql3_query                | 00:02:37,015 | 127.0.0.1 | 0
         Parsing statement                 | 00:02:37,015 | 127.0.0.1 | 81
         Preparing statement               | 00:02:37,015 | 127.0.0.1 | 273
         Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540
         Sending message to /127.0.0.2     | 00:02:37,015 | 127.0.0.1 | 779
         Messsage received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63
         Applying mutation                 | 00:02:37,016 | 127.0.0.2 | 220
         Acquiring switchLock              | 00:02:37,016 | 127.0.0.2 | 250
         Appending to commitlog            | 00:02:37,016 | 127.0.0.2 | 277
         Adding to memtable                | 00:02:37,016 | 127.0.0.2 | 378
         Enqueuing response to /127.0.0.1  | 00:02:37,016 | 127.0.0.2 | 710
         Sending message to /127.0.0.1     | 00:02:37,016 | 127.0.0.2 | 888
    • A note on Transactions • In general, you want to construct your data model so that it does not depend on them • The latest version of Cassandra has ‘Compare and swap’ • An implementation of Paxos • ...IF NOT EXISTS; • ...IF column1 = 'value';
    • Try it out
    • CCM • CCM - Cassandra Cluster Manager • https://github.com/pcmanus/ccm • Warning: not lightweight • Example:
        ccm create test -v 2.0.1
        ccm populate -n 3
        ccm start
    • Clients • Cqlsh: Bundled with Cassandra • Drivers • java: https://github.com/datastax/java-driver • python: https://github.com/datastax/python-driver • .net: https://github.com/datastax/csharp-driver • and more: http://www.datastax.com/download/clientdrivers
    • Get Help • IRC: #cassandra on freenode • Mailing Lists • Stack Overflow • DataStax Docs: http://www.datastax.com/docs
    • Questions?