Cassandra : Introduction
Patrick McFadin
Chief Evangelist/Solution Architect - DataStax
@PatrickMcFadin

Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

Video: http://youtu.be/B-bTPSwhsDY

Abstract
Patrick McFadin (@PatrickMcFadin), Chief Evangelist for Apache Cassandra at DataStax, will be presenting an introduction to Cassandra as a key player in database technologies. Large and small companies alike choose Apache Cassandra as their database solution, and Patrick will explain why they made that choice.

Patrick will also be discussing Cassandra's architecture, including: data modeling, time-series storage and replication strategies, providing a holistic overview of how Cassandra works and the best way to get started.

About Patrick McFadin
Prior to working for DataStax, Patrick was the Chief Architect at Hobsons, an education services company. His responsibilities included ensuring product availability and scaling for all higher education products. Prior to this position, he was the Director of Engineering at Hobsons, which he joined after they acquired his company, Link-11 Systems, a software services company. While at Link-11 Systems, he built the first widely popular CRM system for universities, Connect. He obtained a BS in Computer Engineering from Cal Poly, San Luis Obispo and holds the distinction of being the only recipient of a medal (as far as anyone can find out) for hacking while serving in the US Navy.

Published in: Technology

Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin

  1. Cassandra: Introduction
     Patrick McFadin, Chief Evangelist/Solution Architect - DataStax
     @PatrickMcFadin
     ©2013 DataStax Confidential. Do not distribute without consent.
  2. Who I am
     • Patrick McFadin
     • Solution Architect at DataStax
     • Cassandra MVP
     • User for years
     • Follow me for more: @PatrickMcFadin
     • "Dude. Uptime == $$"
     • I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
  3. Five Years of Cassandra
     [Timeline figure, years 0 to 5 starting Jul-08. Releases shown: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, DSE, 2.0]
  4. Why Cassandra?
  5. The Best Persistence Tier For Your Application
  6. Cassandra - An introduction
  7. Cassandra - Roots
     • Based on the Amazon Dynamo and Google BigTable papers
     • Shared nothing
     • Data safe as possible
     • Predictable scaling
     [Figure labels: Dynamo, BigTable]
  8. Cassandra - More than one server
     • All nodes participate in a cluster
     • Shared nothing
     • Add or remove as needed
     • More capacity? Add a server
     [Ring diagram: four nodes, each node owns 25% of the data]
  9. Core Concepts Write path
     [Diagram labels: <row,column>; Compacted later]
  10. Core Concepts Read Path
      Real user story
      • New app
      • SSDs
      • 2.5 m requests
      • Client P99: 3.17ms!
  11. Cassandra - Locally Distributed
      • Client writes to any node
      • Node coordinates with others
      • Data replicated in parallel
      • Replication factor: How many copies of your data?
      • RF = 3 here
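The ring and replication-factor ideas from slides 8 and 11 can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `ToyRing`, the node names, and the use of `String.hashCode()` as a stand-in hash are all hypothetical, and Cassandra's real partitioner and replication strategies are more involved than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a ring: each node owns an equal share of the hash space,
// and a key is copied to RF consecutive nodes. Hypothetical illustration only.
final class ToyRing {
    private final List<String> nodes;
    private final int replicationFactor;

    ToyRing(List<String> nodes, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > nodes.size()) {
            throw new IllegalArgumentException("RF must be between 1 and the node count");
        }
        this.nodes = new ArrayList<>(nodes);
        this.replicationFactor = replicationFactor;
    }

    // Index of the node that owns the key (stand-in hash, not Cassandra's partitioner).
    int ownerIndex(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // The owner plus the next RF - 1 nodes, walking around the ring.
    List<String> replicasFor(String key) {
        List<String> replicas = new ArrayList<>();
        int start = ownerIndex(key);
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(nodes.get((start + i) % nodes.size()));
        }
        return replicas;
    }

    public static void main(String[] args) {
        ToyRing ring = new ToyRing(Arrays.asList("node1", "node2", "node3", "node4"), 3);
        // Prints the three nodes that would hold a copy of this key in the toy model.
        System.out.println(ring.replicasFor("pmcfadin"));
    }
}
```

With four nodes and RF = 3, every key lands on three distinct nodes, which is why losing any single node still leaves two copies reachable.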
  12. Cassandra - Consistency
      • Consistency Level (CL)
      • Client specifies per read or write
      • ALL = All replicas ack
      • QUORUM = A majority (more than 50%) of replicas ack
      • LOCAL_QUORUM = A majority (more than 50%) of replicas in the local DC ack
      • ONE = Only one replica acks
  13. Cassandra - Transparent to the application
      • A single node failure shouldn’t bring failure
      • Replication Factor + Consistency Level = Success
      • This example:
        • RF = 3
        • CL = QUORUM
      • 2 of 3 replicas (more than 50%) ack, so we are good!
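The arithmetic behind "Replication Factor + Consistency Level = Success" can be sketched in a few lines. This is a minimal illustration, not driver code: the class `ConsistencyMath` and its method names are hypothetical, and LOCAL_QUORUM is left out because it would need a per-data-center replication factor.

```java
// Sketch of how many replica acknowledgements each consistency level needs.
// Hypothetical helper; not part of any Cassandra driver.
final class ConsistencyMath {
    enum Level { ONE, QUORUM, ALL }

    // A quorum is a strict majority: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return quorum(replicationFactor);
            case ALL:    return replicationFactor;
            default:     throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    // A request can succeed when enough replicas are still up to acknowledge it.
    static boolean canSucceed(Level level, int replicationFactor, int replicasDown) {
        return replicationFactor - replicasDown >= requiredAcks(level, replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(requiredAcks(Level.QUORUM, 3));  // 2
        System.out.println(canSucceed(Level.QUORUM, 3, 1)); // true
        System.out.println(canSucceed(Level.ALL, 3, 1));    // false
    }
}
```

With RF = 3, QUORUM needs two acks, so one replica can be down without the application noticing, while ALL fails as soon as any replica is unavailable.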
  14. My favorite feature. Ever!
  15. Cassandra - Geographically Distributed
      • Client writes local
      • Data syncs across WAN
      • Replication Factor per DC
  16. Cassandra Applications - Drivers
      • DataStax Drivers for Cassandra
        • Java
        • C#
        • Python
        • more on the way
  17. Cassandra Applications - Connecting
      • Create a pool of local servers
      • Client just uses session to interact with Cassandra

          // Example arguments from the slide
          contactPoints = {"10.0.0.1","10.0.0.2"}
          keyspace = "videodb"

          public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
              // Build the cluster handle from the contact points, with default policies
              cluster = Cluster
                  .builder()
                  .addContactPoints(
                      contactPoints.toArray(new String[contactPoints.size()]))
                  .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
                  .withRetryPolicy(Policies.defaultRetryPolicy())
                  .build();

              // The session is what the application uses for all queries
              session = cluster.connect(keyspace);
          }
  18. CQL Intro
      • Cassandra Query Language
      • SQL-like language to query Cassandra
      • Limited predicates. Attempts to prevent bad queries
      • But still offers enough leeway to get into trouble
  19. Data Model Logical containers
      • Cluster - Contains all nodes. Even across WAN
      • Keyspace - Contains all tables. Specifies replication
      • Table (Column Family) - Contains rows
  20. CQL Intro
      • CREATE / DROP / ALTER TABLE
      • SELECT
      • BUT
      • INSERT and UPDATE are similar to each other
      • If a row doesn’t exist, UPDATE will insert it, and if it exists, INSERT will replace it.
      • Think of it as an UPSERT
      • Therefore we never get a key violation
      • For updates, Cassandra never reads (no col = col + 1)
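The upsert behaviour described on slide 20 can be mimicked with a small in-memory model. This is a sketch only: `UpsertTable` is a hypothetical class, not a driver API, and it assumes that a write overwrites whichever columns it names without reading the row first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of upsert semantics. Hypothetical; not the driver API.
final class UpsertTable {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // INSERT and UPDATE share one code path: write the given columns, never read first.
    void insert(String key, Map<String, String> columns) { upsert(key, columns); }
    void update(String key, Map<String, String> columns) { upsert(key, columns); }

    private void upsert(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    // Returns the stored columns, or an empty map when the key was never written.
    Map<String, String> select(String key) {
        return rows.getOrDefault(key, Collections.<String, String>emptyMap());
    }

    public static void main(String[] args) {
        UpsertTable users = new UpsertTable();
        // UPDATE on a missing row creates it; no "row not found" error.
        users.update("pmcfadin", Collections.singletonMap("firstname", "Pat"));
        // INSERT on an existing row overwrites; no key violation.
        users.insert("pmcfadin", Collections.singletonMap("firstname", "Patrick"));
        System.out.println(users.select("pmcfadin").get("firstname")); // Patrick
    }
}
```

Because neither path checks for an existing row, there is nothing that could raise a key violation, which matches the "think of it as an UPSERT" bullet.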
  21. Data Modeling Creating Tables

          CREATE TABLE user (
              username varchar,
              firstname varchar,
              lastname varchar,
              shopping_carts set<varchar>,   -- Collection
              PRIMARY KEY (username)
          );

          CREATE TABLE shopping_cart (
              username varchar,
              cart_name text,
              item_id int,
              item_name varchar,
              description varchar,
              price float,
              item_detail map<varchar,varchar>,
              -- (username, cart_name) creates a compound partition row key
              PRIMARY KEY ((username,cart_name),item_id)
          );
  22. CQL Inserts
      • Insert will always overwrite

          INSERT INTO users (username, firstname, lastname,
              email, password, created_date)
          VALUES ('pmcfadin','Patrick','McFadin',
              ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
              '2011-06-20 13:50:00');

  23. CQL Selects
      • No joins
      • Data is returned in row/column format

          SELECT username, firstname, lastname,
              email, password, created_date
          FROM users
          WHERE username = 'pmcfadin';

           username | firstname | lastname | email                    | password                         | created_date
          ----------+-----------+----------+--------------------------+----------------------------------+--------------------------
           pmcfadin | Patrick   | McFadin  | ['patrick@datastax.com'] | ba27e03fd95e507daf2937c937d499ab | 2011-06-20 13:50:00-0700
  24. Cassandra and Time Series
  25. Time Series Taming the beast
      • Peter Higgs and Francois Englert. Nobel prize for Physics
      • Theorized the existence of the Higgs boson
      • Found using ATLAS
      • Data stored in P-BEAST
      • Time series running on Cassandra
  26. Use Cassandra for time series
      Get a Nobel prize
  27. Time Series Why
      • Storage model from BigTable is perfect
      • One row key and tons of (variable) columns
      • Single layout on disk
      [Row layout: Row Key | Column Name: Column Value | Column Name: Column Value]
  28. Time Series Example
      • Storing weather data
      • One weather station
      • Temperature measurements every minute
      [Row layout: WeatherStation ID | 2013-10-09 10:00 AM: 72 Degrees | 2013-10-09 10:00 AM: 72 Degrees | 2013-10-10 11:00 AM: 65 Degrees]
  29. Time Series Example
      • Query data
      • Weather Station ID = Locality of single node
      Date query

          weatherStationID = 100 AND
          date = 2013-10-09 10:00 AM

      [Row layout: WeatherStation ID 100 | 2013-10-09 10:00 AM: 72 Degrees | 2013-10-09 10:00 AM: 72 Degrees | 2013-10-10 11:00 AM: 65 Degrees]
      OR Date Range

          weatherStationID = 100 AND
          date > 2013-10-09 10:00 AM AND
          date < 2013-10-10 11:01 AM
  30. Time Series How
      • CQL expresses this well
      • Data partitioned by weather station ID and clustered (sorted) by time

          CREATE TABLE temperature (
              weatherstation_id text,
              event_time timestamp,
              temperature text,
              PRIMARY KEY (weatherstation_id,event_time)
          );

      • Easy to insert data

          INSERT INTO temperature(weatherstation_id,event_time,temperature)
          VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

      • Easy to query

          SELECT temperature
          FROM temperature
          WHERE weatherstation_id='1234ABCD'
          AND event_time > '2013-04-03 07:01:00'
          AND event_time < '2013-04-03 07:04:00';
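The "one row key and tons of columns" layout, and why a time-range query on it is cheap, can be illustrated with a sorted map per station. This is a toy model under stated assumptions: `TemperatureStore` is a hypothetical class, and the sample readings are made up for the example; it is not how Cassandra stores data on disk.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the temperature table: one partition per station,
// readings kept sorted by event time. Hypothetical illustration only.
final class TemperatureStore {
    private final Map<String, TreeMap<Instant, String>> partitions = new HashMap<>();

    void insert(String stationId, Instant eventTime, String temperature) {
        partitions.computeIfAbsent(stationId, k -> new TreeMap<>()).put(eventTime, temperature);
    }

    // Mirrors: WHERE weatherstation_id = ? AND event_time > ? AND event_time < ?
    Collection<String> range(String stationId, Instant after, Instant before) {
        TreeMap<Instant, String> readings = partitions.get(stationId);
        if (readings == null) {
            return Collections.emptyList();
        }
        // Both bounds are exclusive, matching the > and < in the CQL query.
        return readings.subMap(after, false, before, false).values();
    }

    public static void main(String[] args) {
        TemperatureStore store = new TemperatureStore();
        store.insert("1234ABCD", Instant.parse("2013-04-03T07:01:00Z"), "72F");
        store.insert("1234ABCD", Instant.parse("2013-04-03T07:02:00Z"), "73F");
        store.insert("1234ABCD", Instant.parse("2013-04-03T07:03:00Z"), "74F");
        store.insert("1234ABCD", Instant.parse("2013-04-03T07:04:00Z"), "75F");
        System.out.println(store.range("1234ABCD",
            Instant.parse("2013-04-03T07:01:00Z"),
            Instant.parse("2013-04-03T07:04:00Z"))); // [73F, 74F]
    }
}
```

Because the readings for one station are already sorted by time, a range query is a contiguous slice of a single partition rather than a scan.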
  31. Time Series Further partitioning
      • At one measurement every minute, a single storage row will eventually fill up
      • 2 billion columns per storage row
      • Data partitioned by weather station ID and time
      • Use the partition key to split things up

          CREATE TABLE temperature_by_day (
              weatherstation_id text,
              date text,
              event_time timestamp,
              temperature text,
              PRIMARY KEY ((weatherstation_id,date),event_time)
          );

  32. Time Series Further Partitioning
      • Still easy to insert

          INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
          VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

      • Still easy to query

          SELECT temperature
          FROM temperature_by_day
          WHERE weatherstation_id='1234ABCD'
          AND date='2013-04-03'
          AND event_time > '2013-04-03 07:01:00'
          AND event_time < '2013-04-03 07:04:00';
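With `temperature_by_day`, the application has to supply the `date` bucket itself on every insert and query. One way to derive it is sketched below; the class `DayBucket` is hypothetical, and bucketing in UTC is an assumption of this example (the slides do not say which time zone the `date` column uses).

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Derives the per-day partition bucket for temperature_by_day.
// Hypothetical helper; UTC bucketing is an assumption of this sketch.
final class DayBucket {
    // One reading per minute gives a bounded number of columns per daily partition.
    static final int READINGS_PER_DAY = 24 * 60;

    // Returns the text `date` value (yyyy-MM-dd) for an event time, in UTC.
    static String dateFor(Instant eventTime) {
        return eventTime.atZone(ZoneOffset.UTC).toLocalDate().toString();
    }

    public static void main(String[] args) {
        System.out.println(dateFor(Instant.parse("2013-04-03T07:01:00Z"))); // 2013-04-03
        System.out.println(READINGS_PER_DAY);                               // 1440
    }
}
```

Whatever rule is chosen, writers and readers must compute the bucket the same way, otherwise a query will look in a different partition from the one the data was written to.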
  33. Time Series Use cases
      • Logging
      • Thing Tracking (IoT)
      • Sensor Data
      • User Tracking
      • Fraud Detection
      • Nobel prizes!
  34. Application Example - Layout
      • Active-Active
      • Service based DNS routing
      [Diagram label: Cassandra Replication]
  35. Application Example - Uptime
      • Normal server maintenance
      • Application is unaware
      [Diagram label: Cassandra Replication]
  36. Application Example - Failure
      • Data center failure
      • Data is safe. Route traffic.
      [Diagram label: Another happy user!]
  37. Cassandra Users and Use Cases
  38. Netflix!
      • If you haven’t heard their story… where have you been?
      • 18B market cap. Runs on Cassandra
      • User accounts
      • Playlists
      • Payments
      • Statistics
  39. Spotify
      • Millions of songs. Millions of users.
      • Playlists
      • 1 billion playlists
      • 30+ Cassandra clusters
      • 50+ TB of data
      • 40k req/sec peak
      http://www.slideshare.net/noaresare/cassandra-nyc
  40. Instagram (Facebook)
      • Loads and loads of photos. (Probably yours)
      • All in AWS
      • Security audits
      • News feed
      • 20k writes/sec. 15k reads/sec.
  41. DataStax Ac*demy for Apache Cassandra
      Content
      • First four sessions available, with weekly roll-out of 7 sessions total
      • Based on DataStax Community Edition
      • CQL, Schema Design and Data Modeling
      • Introduction to Cassandra Objects
      • First Java, then Python, C# and .NET
      Goals
      • 100,000 Registrations by the end of 2014
      • 25,000 Certifications by the end of 2014
      https://datastaxacademy.elogiclearning.com/