SlideShare a Scribd company logo
A Deep Dive into Apache Cassandra for
.NET Developers
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time .NET Developer
• Recently presented at Cassandra Summit 2014 with Microsoft
• Very Recent Denver Transplant
2
Why should I care?
• Cassandra Core Principles
– Ease of Use
– Massive Scalability
– High Performance
– Always Available
• Growth!
3
DB Engines Rankings (January 2014)
http://db-engines.com/en/ranking
1 What is Cassandra and how does it work?
2 Cassandra Query Language (CQL)
3 Data Modeling Like a Pro
4 .NET Driver for Cassandra
5 Tools and Code
4
What is Cassandra and how does it work?
5
What is Cassandra?
• A Linearly Scaling and Fault Tolerant Distributed Database
• Fully Distributed
– Data spread over many nodes
– All nodes participate in a cluster
– All nodes are equal (masterless)
– No SPOF (shared nothing)
6
What is Cassandra?
• Linearly Scaling
– Have More Data? Add more nodes.
– Need More Throughput? Add more nodes.
7
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
What is Cassandra?
• Fault Tolerant
– Nodes Down != Database Down
– Datacenter Down != Database Down
8
What is Cassandra?
• Fully Replicated
• Clients write local
• Data syncs across WAN
• Replication Factor per DC
9
US Europe
Client
Cassandra and the CAP Theorem
• The CAP Theorem limits what distributed systems can do
• Consistency
• Availability
• Partition Tolerance
• Limits? “Pick 2 out of 3”
• Cassandra is an AP system that is Eventually Consistent
10
Two knobs control Cassandra fault tolerance
• Replication Factor (server side)
– How many copies of the data should exist?
11
Client
B
AD
C
AB
A
CD
D
BC
Write A
RF=3
Two knobs control Cassandra fault tolerance
• Consistency Level (client side)
– How many replicas do we need to hear from before we acknowledge?
12
Client
B
AD
C
AB
A
CD
D
BC
Write A
CL=QUORUM
Client
B
AD
C
AB
A
CD
D
BC
Write A
CL=ONE
Consistency Levels
• Applies to both Reads and Writes (i.e. is set on each query)
• ONE – one replica from any DC
• LOCAL_ONE – one replica from local DC
• QUORUM – 51% of replicas from any DC
• LOCAL_QUORUM – 51% of replicas from local DC
• ALL – all replicas
• TWO
13
Consistency Level and Speed
• How many replicas we need to hear from can affect how quickly
we can read and write data in Cassandra
14
Client
B
AD
C
AB
A
CD
D
BC
5 µs ack
300 µs ack
12 µs ack
12 µs ack
Read A
(CL=QUORUM)
Consistency Level and Availability
• Consistency Level choice affects availability
• For example, QUORUM can tolerate one replica being down and
still be available (in RF=3)
15
Client
B
AD
C
AB
A
CD
D
BC
A=2
A=2
A=2
Read A
(CL=QUORUM)
Consistency Level and Eventual Consistency
• Cassandra is an AP system that is Eventually Consistent so
replicas may disagree
• Column values are timestamped
• In Cassandra, Last Write Wins (LWW)
16
Client
B
AD
C
AB
A
CD
D
BC
A=2
Newer
A=1
Older
A=2
Read A
(CL=QUORUM)
Christos from Netflix: “Eventual Consistency != Hopeful Consistency”
https://www.youtube.com/watch?v=lwIA8tsDXXE
Writes in the cluster
• Fully distributed, no SPOF
• Node that receives a request is the Coordinator for request
• Any node can act as Coordinator
17
Client
B
AD
C
AB
A
CD
D
BC
Write A
(CL=ONE)
Coordinator Node
Writes in the cluster – Data Distribution
• Partition Key determines node placement
18
Partition Key
id='pmcfadin' lastname='McFadin'
id='jhaddad' firstname='Jon' lastname='Haddad'
id='ltillman' firstname='Luke' lastname='Tillman'
CREATE TABLE users (
id text,
firstname text,
lastname text,
PRIMARY KEY (id)
);
Writes in the cluster – Data Distribution
• The Partition Key is hashed using a consistent hashing function
(Murmur 3) and the output is used to place the data on a node
• The data is also replicated to RF-1 other nodes
19
Partition Key
id='ltillman' firstname='Luke' lastname='Tillman'
Murmur3
id: ltillman Murmur3: A
B
AD
C
AB
A
CD
D
BC
RF=3
Hashing – Back to Reality
• Back in reality, Partition Keys actually hash to 128 bit numbers
• Nodes in Cassandra own token ranges (i.e. hash ranges)
20
B
AD
C
AB
A
CD
D
BC
Range Start End
A 0xC000000..1 0x0000000..0
B 0x0000000..1 0x4000000..0
C 0x4000000..1 0x8000000..0
D 0x8000000..1 0xC000000..0
Partition Key
id='ltillman' Murmur3 0xadb95e99da887a8a4cb474db86eb5769
Writes on a single node
• Client makes a write request
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Disk
Memory
Writes on a single node
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
• Data is appended to the Commit Log
• Append only, sequential IO == FAST
Writes on a single node
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
Memtable for Users Some
Other
Memtableid='ltillman' firstname='Luke' lastname='Tillman'
• Data is written/merged into Memtable
Writes on a single node
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
Memtable for Users Some
Other
Memtableid='ltillman' firstname='Luke' lastname='Tillman'
• Server acknowledges to client
• Writes in C* are FAST due to simplicity, log structured storage
Writes on a single node
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Data Directory
Disk
Memory
Memtable for Users Some
Other
Memtableid='ltillman' firstname='Luke' lastname='Tillman'
Some
Other
SSTable
SSTable
#1 for
Users
SSTable
#2 for
Users
• Once Memtable is full, data is flushed to disk as SSTable (Sorted
String Table)
Compaction
• Compactions merge and unify data in our SSTables
• SSTables are immutable, so this is when we consolidate rows
26
SSTable
#1 for
Users
SSTable
#2 for
Users
SSTable #3 for
Users
id='ltillman'
firstname='Lucas'
(timestamp=Older)
lastname='Tillman'
id='ltillman' firstname='Luke' lastname='Tillman'
id='ltillman'
firstname='Luke'
(timestamp=Newer)
Reads in the cluster
• Same as writes in the cluster, reads are coordinated
• Any node can be the Coordinator Node
27
Client
B
AD
C
AB
A
CD
D
BC
Read A
(CL=QUORUM)
Coordinator Node
Reads on a single node
• Client makes a read request
28
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
Reads on a single node
• Data is read from (possibly multiple) SSTables and merged
29
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
SSTable #1 for Users
id='ltillman'
firstname='Lucas'
(timestamp=Older)
lastname='Tillman'
SSTable #2 for Users
id='ltillman'
firstname='Luke'
(timestamp=Newer)
firstname='Luke' lastname='Tillman'
Reads on a single node
• Any unflushed Memtable data is also merged
30
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
firstname='Luke' lastname='Tillman'
Memtable
for Users
Reads on a single node
• Client gets acknowledgement with the data
• Reads in Cassandra are also FAST but often limited by Disk IO
31
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
firstname='Luke' lastname='Tillman'
Compaction - Revisited
• Compactions merge and unify data in our SSTables, making
them important to reads (less SSTables = less to read/merge)
32
SSTable
#1 for
Users
SSTable
#2 for
Users
SSTable #3 for
Users
id='ltillman'
firstname='Lucas'
(timestamp=Older)
lastname='Tillman'
id='ltillman' firstname='Luke' lastname='Tillman'
id='ltillman'
firstname='Luke'
(timestamp=Newer)
Cassandra Query Language (CQL)
33
Data Structures
• Keyspace is like RDBMS Database or Schema
• Like RDBMS, Cassandra uses Tables to store data
• Partitions can have one row (narrow) or multiple
rows (wide)
34
Keyspace
Tables
Partitions
Rows
Schema Definition (DDL)
• Easy to define tables for storing data
• First part of Primary Key is the Partition Key
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Schema Definition (DDL)
• One row per partition (familiar)
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
name ...
Keyboard Cat ...
Nyan Cat ...
Original Grumpy Cat ...
videoid
689d56e5- …
93357d73- …
d978b136- …
Clustering Columns
• Second part of Primary Key is Clustering Columns
• Clustering columns affect ordering of data (on disk)
• Multiple rows per partition
37
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Clustering Columns – Wide Rows (Partitions)
• Use of Clustering Columns is where the (old) term “Wide Rows”
comes from
38
videoid='0fe6a...'
userid=
'ac346...'
comment=
'Awesome!'
commentid='82be1...'
(10/1/2014 9:36AM)
userid=
'f89d3...'
comment=
'Garbage!'
commentid='765ac...'
(9/17/2014 7:55AM)
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Inserts and Updates
• Use INSERT or UPDATE to add and modify data
• Both will overwrite data (no constraints like RDBMS)
• INSERT and UPDATE functionally equivalent
39
INSERT INTO comments_by_video (
videoid, commentid, userid, comment)
VALUES (
'0fe6a...', '82be1...', 'ac346...', 'Awesome!');
UPDATE comments_by_video
SET userid = 'ac346...', comment = 'Awesome!'
WHERE videoid = '0fe6a...' AND commentid = '82be1...';
TTL and Deletes
• Can specify a Time to Live (TTL) in seconds when doing an
INSERT or UPDATE
• Use DELETE statement to remove data
• Can optionally specify columns to remove part of a row
40
INSERT INTO comments_by_video ( ... )
VALUES ( ... )
USING TTL 86400;
DELETE FROM comments_by_video
WHERE videoid = '0fe6a...' AND commentid = '82be1...';
Querying
• Use SELECT to get data from your tables
• Always include Partition Key and optionally Clustering Columns
• Can use ORDER BY and LIMIT
• Use range queries (for example, by date) to slice partitions
41
SELECT * FROM comments_by_video
WHERE videoid = 'a67cd...'
LIMIT 10;
Data Modeling Like a Pro
42
Cassandra Data Modeling
• Requires a different mindset than RDBMS modeling
• Know your data and your queries up front
• Queries drive a lot of the modeling decisions (i.e. “table per
query” pattern)
• Denormalize/Duplicate data at write time to do as few queries
as possible come read time
• Remember, disk is cheap and writes in Cassandra are FAST
43
Getting to Know Your Data
44
User
id
firstname
lastname
email
password
Video
id
name
description
location
preview_image
tags
features
Comment
comment
id
adds
timestamp
posts
timestamp
1
n
n
1
1
n
n
m
rates
rating
Application Workflows
45
User Logs
into site
Show basic
information
about user
Show videos
added by a
user
Show
comments
posted by a
user
Search for a
video by tag
Show latest
videos added
to the site
Show
comments
for a video
Show ratings
for a video
Show video
and its
details
Queries come from Workflows
46
Users
User Logs
into site
Find user by email
address
Show basic
information
about user
Find user by id
Comments
Show
comments
for a video
Find comments by
video (latest first)
Show
comments
posted by a
user
Find comments by
user (latest first)
Ratings
Show ratings
for a video Find ratings by video
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
User Logs
into site
Find user by email
address
Show basic
information
about user
Find user by id
Users – The Cassandra Way
User Logs
into site
Find user by email
address
Show basic
information
about user
Find user by id
CREATE TABLE user_credentials (
email text,
password text,
userid uuid,
PRIMARY KEY (email)
);
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
Modeling Relationships – Collection Types
• Cassandra doesn’t support JOINs, but your data will still have
relationships (and you can still model that in Cassandra)
• One tool available is CQL collection types
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
50
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
Currently requires query for video,
followed by query for user by id based
on results of first query
Modeling Relationships – Client Side Joins
• What is the cost? Might be OK in small situations
• Do NOT scale
• Avoid when possible
51
Modeling Relationships – Client Side Joins
52
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
...
user_firstname text,
user_lastname text,
user_email text,
PRIMARY KEY (videoid)
);
CREATE TABLE users_by_video (
videoid uuid,
userid uuid,
firstname text,
lastname text,
email text,
PRIMARY KEY (videoid)
);
or
.NET Driver for Cassandra
53
.NET and Cassandra
• Open Source (on GitHub), available via NuGet
• Bootstrap using the Builder and then reuse the ISession object
Cluster cluster = Cluster.Builder()
.AddContactPoint("127.0.0.1")
.Build();
ISession session = cluster.Connect("killrvideo");
54
.NET and Cassandra
• Executing CQL with SimpleStatement
• Sync and Async API available for executing statements
• Use Async API for executing queries in parallel
var videoId = Guid.NewGuid();
var statement = new SimpleStatement("SELECT * FROM videos WHERE videoid = ?",
videoId);
RowSet rows = await session.ExecuteAsync(statement);
55
.NET and Cassandra
• Getting values from a RowSet is easy
• Rowset is a collection of Row (IEnumerable<Row>)
RowSet rows = await _session.ExecuteAsync(statement);
foreach (Row row in rows)
{
var videoId = row.GetValue<Guid>("videoid");
var addedDate = row.GetValue<DateTimeOffset>("added_date");
var name = row.GetValue<string>("name");
}
56
CQL 3 Data Types to .NET Types
• Full listing available in driver docs (http://www.datastax.com/docs)
CQL 3 Data Type .NET Type
bigint, counter long
boolean bool
decimal, float float
double double
int int
uuid, timeuuid System.Guid
text, varchar string (Encoding.UTF8)
timestamp System.DateTimeOffset
varint System.Numerics.BigInteger
.NET and Cassandra - PreparedStatement
• SimpleStatement useful for one-off CQL execution (or when
dynamic CQL is a possibility)
• Use PreparedStatement for better performance
• Pay the cost of Prepare once (server roundtrip)
• Save the PreparedStatement instance and reuse
PreparedStatement prepared = session.Prepare(
"SELECT * FROM user_credentials WHERE email = ?");
.NET and Cassandra - PreparedStatement
• Bind variable values to get BoundStatement for execution
• Execution only has to send variable values
• You will use these all the time
• Remember: Prepare once, bind and execute many
BoundStatement bound = prepared.Bind("luke.tillman@datastax.com");
RowSet rows = await _session.ExecuteAsync(bound);
.NET and Cassandra - BatchStatement
• Add Simple/Bound statements to a batch
BoundStatement bound = prepared.Bind(video.VideoId, video.Name);
var simple = new SimpleStatement(
"UPDATE videos SET name = ? WHERE videoid = ?"
).Bind(video.Name, video.VideoId);
// Use an atomic batch to send over all the mutations
var batchStatement = new BatchStatement();
batchStatement.Add(bound);
batchStatement.Add(simple);
RowSet rows = await _session.ExecuteAsync(batch);
.NET and Cassandra - BatchStatement
• Batches are Logged (atomic) by default
• Use when you want a group of mutations (statements) to all
succeed or all fail (denormalizing at write time)
• Really large batches are an anti-pattern (and Cassandra will
warn you)
• Not a performance optimization for bulk-loading data
.NET and Cassandra – Statement Options
• Options like Consistency Level and Retry Policy are available at
the Statement level
• If not set on a statement, driver will fallback to defaults set when
building/configuring the Cluster
62
IStatement bound =
prepared.Bind("luke.tillman@datastax.com")
.SetPageSize(100)
.SetConsistencyLevel(ConsistencyLevel.LocalOne)
.SetRetryPolicy(new DefaultRetryPolicy())
.EnableTracing();
Lightweight Transactions (LWT)
• Use when you don’t want writes to step on each other
• AKA Linearizable Consistency
• Serial Isolation Level
• Be sure to read the fine print: has a latency cost associated
with using it, so use only where needed
• The canonical example: unique user accounts
Lightweight Transactions (LWT)
• Returns a column called [applied] indicating success/failure
• Different from the relational world where you might expect an
Exception (i.e. PrimaryKeyViolationException or similar)
var statement = new SimpleStatement("INSERT INTO user_credentials (email,
password) VALUES (?, ?) IF NOT EXISTS");
statement = statement.Bind("user1@killrvideo.com", "Password1!");
RowSet rows = await _session.ExecuteAsync(statement);
var userInserted = rows.Single().GetValue<bool>("[applied]");
Automatic Paging
• The Problem: Loading big result sets into memory is a recipe
for disaster (OutOfMemoryExceptions, etc.)
• Better to load and process a large result set in pages (chunks)
• Doing this manually with Cassandra prior to 2.0 was a pain
• Automatic Paging makes paging on a large RowSet
transparent
Automatic Paging
• Set a page size on a statement
• Iterate over the resulting RowSet
• As you iterate, new pages are fetched transparently when the
Rows in the current page are exhausted
• Will allow you to iterate until all pages are exhausted
boundStatement = boundStatement.SetPageSize(100);
RowSet rows = await _session.ExecuteAsync(boundStatement);
foreach (Row row in rows)
{
}
Typical Pager UI in a Web Application
• Show page of records in UI and allow user to navigate
Typical Pager UI in a Web Application
• Automatic Paging – this is not the feature you are looking for
.NET and Cassandra
• Mapping results to DTOs: if you like using CQL for querying, try
Mapper component (formerly CqlPoco package)
public class User
{
public Guid UserId { get; set; }
public string Name { get; set; }
}
// Create a mapper from your session object
var mapper = new Mapper(session);
// Get a user by id from Cassandra or null if not found
var user = client.SingleOrDefault<User>(
"SELECT userid, name FROM users WHERE userid = ?", someUserId);
69
.NET and Cassandra
• Mapping results to DTOs: if you like LINQ, use built-in LINQ
provider
[Table("users")]
public class User
{
[Column("userid"), PartitionKey]
public Guid UserId { get; set; }
[Column("name")]
public string Name { get; set; }
}
var user = session.GetTable<User>()
.SingleOrDefault(u => u.UserId == someUserId)
.Execute();
70
Tools and Code
71
Installing Cassandra
• Planet Cassandra for all things C* (planetcassandra.org)
Installing Cassandra
• Windows installer is super easy way to do development and
testing on your local machine
• Production Cassandra deployments on Linux (Windows
performance parity is coming in 3.0)
In the Box – Command Line Tools
• Use cqlsh REPL for running CQL against your cluster
74
In the Box – Command Line Tools
• Use nodetool for information on your cluster (and lots more)
• On Windows, available under apache-cassandrabin folder
75
DevCenter
• Get it on DataStax
web site
(www.datastax.com)
• GUI for Cassandra
development (think
SQL Server
Management Studio
for Cassandra)
76
Sample Code – KillrVideo
• Live demo available at http://www.killrvideo.com
– Written in C#
– Live Demo running in Azure
– Open source: https://github.com/luketillman/killrvideo-csharp
77
Questions? Overtime (use cases)?
Follow me for updates or to ask questions later: @LukeTillman
78
Overtime: Who’s using it?
79
Cassandra Adoption
Some Common Use Case Categories
• Product Catalogs and Playlists
• Internet of Things (IoT) and
Sensor Data
• Messaging (emails, IMs, alerts,
comments)
• Recommendation and
Personalization
• Fraud Detection
• Time series and temporal
ordered data
http://planetcassandra.org/apache-cassandra-use-cases/
The “Slide Heard Round the World”
• From Cassandra
Summit 2014, got a
lot of attention
• 75,000+ nodes
• 10s of PBs of data
• Millions ops/s
• One of the largest
known Cassandra
deployments
82
Spotify
• Streaming music web service
• > 24,000,000 music tracks
• > 50TB of data in Cassandra
Why Cassandra?
• Was PostgreSQL, but hit scaling
problems
• Multi Datacenter Availability
• Integration with Spark for data
processing and analytics
Usage
• Catalog
• User playlists
• Artists following
• Radio Stations
• Event notifications
83
http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/
eBay
• Online auction site
• > 250TB of data, dozens of nodes,
multiple data centres
• > 6 billion writes, > 5 billion reads
per day
Why Cassandra?
• Low latency, high scale, multiple data
centers
• Suited for graph structures using
wide rows
Usage
• Building next generation of
recommendation engine
• Storing user activity data
• Updating models of user interests in
real time
84
http://planetcassandra.org/blog/5-minute-c-interview-ebay/
FullContact
• Contact management: from multiple
sources, sync, de-dupe, APIs available
• 2 clusters, dozens of nodes, running
in AWS
• Based here in Denver
Why Cassandra?
• Migated from MongoDB after
running into scaling issues
• Operational simplicity
• Resilience and Availability
Usage
• Person API (search by email, Twitter
handle, Facebook, or phone)
• Searched data from multiple sources
(ingested by Hadoop M/R jobs)
• Resolved profiles
85
http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/
Instagram
• Photo-sharing, video-sharing and
social networking service
• Originally AWS (Now Facebook data
centers?)
• > 20k writes/second, >15k
reads/second
Why Cassandra?
• Migrated from Redis (problems
keeping everything in memory)
• No painful “sharding” process
• 75% reduction in costs
Usage
• Auditing information – security,
integrity, spam detection
• News feed (“inboxes” or activity feed)
– Likes, Follows, etc.
86
http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/
Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY
Netflix
• TV and Movie streaming service
• > 2700+ nodes on over 90 clusters
• 4 Datacenters
• > 1 Trillion operations per day
Why Cassandra?
• Migrated from Oracle
• Massive amounts of data
• Multi datacenter, No SPOF
• No downtime for schema changes
Usage
• Everything! (Almost – 95% of DB use)
• Example: Personalization
– What titles do you play?
– What do you play before/after?
– Where did you pause?
– What did you abandon watching after 5
minutes?
87
http://planetcassandra.org/blog/case-study-netflix/
Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA
Go forth and build awesome things!
Follow me for updates or to ask questions later: @LukeTillman
88

More Related Content

What's hot

All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
Altinity Ltd
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Dive
confluent
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
confluent
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
The Hive
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
DataStax
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
DataStax Academy
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
DataStax Academy
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
ScyllaDB
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
confluent
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Altinity Ltd
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
Altinity Ltd
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requests
grro
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
Brent Theisen
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Altinity Ltd
 

What's hot (20)

All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Dive
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requests
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 

Viewers also liked

Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# DriverCassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
DataStax Academy
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for Cassandra
Luke Tillman
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
DataStax Academy
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
Luke Tillman
 
Building your First Application with Cassandra
Building your First Application with CassandraBuilding your First Application with Cassandra
Building your First Application with Cassandra
Luke Tillman
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
Luke Tillman
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
Acunu
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Luke Tillman
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Luke Tillman
 
Bulk Loading into Cassandra
Bulk Loading into CassandraBulk Loading into Cassandra
Bulk Loading into Cassandra
Brian Hess
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Luke Tillman
 
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
Luke Tillman
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
Jeremy Hanna
 

Viewers also liked (15)

Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# DriverCassandra Day NY 2014: Getting Started with the DataStax C# Driver
Cassandra Day NY 2014: Getting Started with the DataStax C# Driver
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for Cassandra
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Building your First Application with Cassandra
Building your First Application with CassandraBuilding your First Application with Cassandra
Building your First Application with Cassandra
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
 
Bulk Loading into Cassandra
Bulk Loading into CassandraBulk Loading into Cassandra
Bulk Loading into Cassandra
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
 
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 

Similar to A Deep Dive into Apache Cassandra for .NET Developers

Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
nickmbailey
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
ScyllaDB
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
Russell Spitzer
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
Christian Johannsen
 
Devops kc
Devops kcDevops kc
Devops kc
Philip Thompson
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
Jesus Guzman
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Hsien-Hsin Sean Lee, Ph.D.
 
Bloomreach - BloomStore Compute Cloud Infrastructure
Bloomreach - BloomStore Compute Cloud Infrastructure Bloomreach - BloomStore Compute Cloud Infrastructure
Bloomreach - BloomStore Compute Cloud Infrastructure
bloomreacheng
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
DataStax Academy
 
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large SystemsCyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
ZettaScaleTechnology
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Julien Anguenot
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
DataStax Academy
 
SignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series Database
DataStax Academy
 
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
SignalFx
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
Sayyaparaju Sunil
 
Geographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deploymentGeographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deployment
Marco Tusa
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Sid Anand
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
Amazon Web Services
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 

Similar to A Deep Dive into Apache Cassandra for .NET Developers (20)

Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Devops kc
Devops kcDevops kc
Devops kc
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
Bloomreach - BloomStore Compute Cloud Infrastructure
Bloomreach - BloomStore Compute Cloud Infrastructure Bloomreach - BloomStore Compute Cloud Infrastructure
Bloomreach - BloomStore Compute Cloud Infrastructure
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
 
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large SystemsCyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
Cyclone DDS Unleashed: Scalability in DDS and Dealing with Large Systems
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
SignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series Database
 
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
 
Geographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deploymentGeographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deployment
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 

Recently uploaded

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

A Deep Dive into Apache Cassandra for .NET Developers

  • 1. A Deep Dive into Apache Cassandra for .NET Developers Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time .NET Developer • Recently presented at Cassandra Summit 2014 with Microsoft • Very Recent Denver Transplant 2
  • 3. Why should I care? • Cassandra Core Principles – Ease of Use – Massive Scalability – High Performance – Always Available • Growth! 3 DB Engines Rankings (January 2014) http://db-engines.com/en/ranking
  • 4. 1 What is Cassandra and how does it work? 2 Cassandra Query Language (CQL) 3 Data Modeling Like a Pro 4 .NET Driver for Cassandra 5 Tools and Code 4
  • 5. What is Cassandra and how does it work? 5
  • 6. What is Cassandra? • A Linearly Scaling and Fault Tolerant Distributed Database • Fully Distributed – Data spread over many nodes – All nodes participate in a cluster – All nodes are equal (masterless) – No SPOF (shared nothing) 6
  • 7. What is Cassandra? • Linearly Scaling – Have More Data? Add more nodes. – Need More Throughput? Add more nodes. 7 http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 8. What is Cassandra? • Fault Tolerant – Nodes Down != Database Down – Datacenter Down != Database Down 8
  • 9. What is Cassandra? • Fully Replicated • Clients write local • Data syncs across WAN • Replication Factor per DC 9 US Europe Client
  • 10. Cassandra and the CAP Theorem • The CAP Theorem limits what distributed systems can do • Consistency • Availability • Partition Tolerance • Limits? “Pick 2 out of 3” • Cassandra is an AP system that is Eventually Consistent 10
  • 11. Two knobs control Cassandra fault tolerance • Replication Factor (server side) – How many copies of the data should exist? 11 Client B AD C AB A CD D BC Write A RF=3
  • 12. Two knobs control Cassandra fault tolerance • Consistency Level (client side) – How many replicas do we need to hear from before we acknowledge? 12 Client B AD C AB A CD D BC Write A CL=QUORUM Client B AD C AB A CD D BC Write A CL=ONE
  • 13. Consistency Levels • Applies to both Reads and Writes (i.e. is set on each query) • ONE – one replica from any DC • LOCAL_ONE – one replica from local DC • QUORUM – 51% of replicas from any DC • LOCAL_QUORUM – 51% of replicas from local DC • ALL – all replicas • TWO 13
  • 14. Consistency Level and Speed • How many replicas we need to hear from can affect how quickly we can read and write data in Cassandra 14 Client B AD C AB A CD D BC 5 µs ack 300 µs ack 12 µs ack 12 µs ack Read A (CL=QUORUM)
  • 15. Consistency Level and Availability • Consistency Level choice affects availability • For example, QUORUM can tolerate one replica being down and still be available (in RF=3) 15 Client B AD C AB A CD D BC A=2 A=2 A=2 Read A (CL=QUORUM)
  • 16. Consistency Level and Eventual Consistency • Cassandra is an AP system that is Eventually Consistent so replicas may disagree • Column values are timestamped • In Cassandra, Last Write Wins (LWW) 16 Client B AD C AB A CD D BC A=2 Newer A=1 Older A=2 Read A (CL=QUORUM) Christos from Netflix: “Eventual Consistency != Hopeful Consistency” https://www.youtube.com/watch?v=lwIA8tsDXXE
  • 17. Writes in the cluster • Fully distributed, no SPOF • Node that receives a request is the Coordinator for request • Any node can act as Coordinator 17 Client B AD C AB A CD D BC Write A (CL=ONE) Coordinator Node
  • 18. Writes in the cluster – Data Distribution • Partition Key determines node placement 18 Partition Key id='pmcfadin' lastname='McFadin' id='jhaddad' firstname='Jon' lastname='Haddad' id='ltillman' firstname='Luke' lastname='Tillman' CREATE TABLE users ( id text, firstname text, lastname text, PRIMARY KEY (id) );
  • 19. Writes in the cluster – Data Distribution • The Partition Key is hashed using a consistent hashing function (Murmur 3) and the output is used to place the data on a node • The data is also replicated to RF-1 other nodes 19 Partition Key id='ltillman' firstname='Luke' lastname='Tillman' Murmur3 id: ltillman Murmur3: A B AD C AB A CD D BC RF=3
  • 20. Hashing – Back to Reality • Back in reality, Partition Keys actually hash to 128 bit numbers • Nodes in Cassandra own token ranges (i.e. hash ranges) 20 B AD C AB A CD D BC Range Start End A 0xC000000..1 0x0000000..0 B 0x0000000..1 0x4000000..0 C 0x4000000..1 0x8000000..0 D 0x8000000..1 0xC000000..0 Partition Key id='ltillman' Murmur3 0xadb95e99da887a8a4cb474db86eb5769
  • 21. Writes on a single node • Client makes a write request Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Disk Memory
  • 22. Writes on a single node Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory • Data is appended to the Commit Log • Append only, sequential IO == FAST
  • 23. Writes on a single node Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory Memtable for Users Some Other Memtableid='ltillman' firstname='Luke' lastname='Tillman' • Data is written/merged into Memtable
  • 24. Writes on a single node Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory Memtable for Users Some Other Memtableid='ltillman' firstname='Luke' lastname='Tillman' • Server acknowledges to client • Writes in C* are FAST due to simplicity, log structured storage
  • 25. Writes on a single node Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Data Directory Disk Memory Memtable for Users Some Other Memtableid='ltillman' firstname='Luke' lastname='Tillman' Some Other SSTable SSTable #1 for Users SSTable #2 for Users • Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table)
  • 26. Compaction • Compactions merge and unify data in our SSTables • SSTables are immutable, so this is when we consolidate rows 26 SSTable #1 for Users SSTable #2 for Users SSTable #3 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' id='ltillman' firstname='Luke' lastname='Tillman' id='ltillman' firstname='Luke' (timestamp=Newer)
  • 27. Reads in the cluster • Same as writes in the cluster, reads are coordinated • Any node can be the Coordinator Node 27 Client B AD C AB A CD D BC Read A (CL=QUORUM) Coordinator Node
  • 28. Reads on a single node • Client makes a read request 28 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory
  • 29. Reads on a single node • Data is read from (possibly multiple) SSTables and merged 29 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory SSTable #1 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' SSTable #2 for Users id='ltillman' firstname='Luke' (timestamp=Newer) firstname='Luke' lastname='Tillman'
  • 30. Reads on a single node • Any unflushed Memtable data is also merged 30 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory firstname='Luke' lastname='Tillman' Memtable for Users
  • 31. Reads on a single node • Client gets acknowledgement with the data • Reads in Cassandra are also FAST but often limited by Disk IO 31 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory firstname='Luke' lastname='Tillman'
  • 32. Compaction - Revisited • Compactions merge and unify data in our SSTables, making them important to reads (less SSTables = less to read/merge) 32 SSTable #1 for Users SSTable #2 for Users SSTable #3 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' id='ltillman' firstname='Luke' lastname='Tillman' id='ltillman' firstname='Luke' (timestamp=Newer)
  • 34. Data Structures • Keyspace is like RDBMS Database or Schema • Like RDBMS, Cassandra uses Tables to store data • Partitions can have one row (narrow) or multiple rows (wide) 34 Keyspace Tables Partitions Rows
  • 35. Schema Definition (DDL) • Easy to define tables for storing data • First part of Primary Key is the Partition Key CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 36. Schema Definition (DDL) • One row per partition (familiar) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); name ... Keyboard Cat ... Nyan Cat ... Original Grumpy Cat ... videoid 689d56e5- … 93357d73- … d978b136- …
  • 37. Clustering Columns • Second part of Primary Key is Clustering Columns • Clustering columns affect ordering of data (on disk) • Multiple rows per partition 37 CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 38. Clustering Columns – Wide Rows (Partitions) • Use of Clustering Columns is where the (old) term “Wide Rows” comes from 38 videoid='0fe6a...' userid= 'ac346...' comment= 'Awesome!' commentid='82be1...' (10/1/2014 9:36AM) userid= 'f89d3...' comment= 'Garbage!' commentid='765ac...' (9/17/2014 7:55AM) CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 39. Inserts and Updates • Use INSERT or UPDATE to add and modify data • Both will overwrite data (no constraints like RDBMS) • INSERT and UPDATE functionally equivalent 39 INSERT INTO comments_by_video ( videoid, commentid, userid, comment) VALUES ( '0fe6a...', '82be1...', 'ac346...', 'Awesome!'); UPDATE comments_by_video SET userid = 'ac346...', comment = 'Awesome!' WHERE videoid = '0fe6a...' AND commentid = '82be1...';
  • 40. TTL and Deletes • Can specify a Time to Live (TTL) in seconds when doing an INSERT or UPDATE • Use DELETE statement to remove data • Can optionally specify columns to remove part of a row 40 INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400; DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
  • 41. Querying • Use SELECT to get data from your tables • Always include Partition Key and optionally Clustering Columns • Can use ORDER BY and LIMIT • Use range queries (for example, by date) to slice partitions 41 SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;
  • 42. Data Modeling Like a Pro 42
  • 43. Cassandra Data Modeling • Requires a different mindset than RDBMS modeling • Know your data and your queries up front • Queries drive a lot of the modeling decisions (i.e. “table per query” pattern) • Denormalize/Duplicate data at write time to do as few queries as possible come read time • Remember, disk is cheap and writes in Cassandra are FAST 43
  • 44. Getting to Know Your Data 44 User id firstname lastname email password Video id name description location preview_image tags features Comment comment id adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 45. Application Workflows 45 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 46. Queries come from Workflows 46 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 47. Users – The Relational Way • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email User Logs into site Find user by email address Show basic information about user Find user by id
  • 48. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 49. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 50. Modeling Relationships – Client Side Joins 50 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) ); Currently requires query for video, followed by query for user by id based on results of first query
  • 51. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 51
  • 52. Modeling Relationships – Client Side Joins 52 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or
  • 53. .NET Driver for Cassandra 53
  • 54. .NET and Cassandra • Open Source (on GitHub), available via NuGet • Bootstrap using the Builder and then reuse the ISession object Cluster cluster = Cluster.Builder() .AddContactPoint("127.0.0.1") .Build(); ISession session = cluster.Connect("killrvideo"); 54
  • 55. .NET and Cassandra • Executing CQL with SimpleStatement • Sync and Async API available for executing statements • Use Async API for executing queries in parallel var videoId = Guid.NewGuid(); var statement = new SimpleStatement("SELECT * FROM videos WHERE videoid = ?", videoId); RowSet rows = await session.ExecuteAsync(statement); 55
  • 56. .NET and Cassandra • Getting values from a RowSet is easy • Rowset is a collection of Row (IEnumerable<Row>) RowSet rows = await _session.ExecuteAsync(statement); foreach (Row row in rows) { var videoId = row.GetValue<Guid>("videoid"); var addedDate = row.GetValue<DateTimeOffset>("added_date"); var name = row.GetValue<string>("name"); } 56
  • 57. CQL 3 Data Types to .NET Types • Full listing available in driver docs (http://www.datastax.com/docs) CQL 3 Data Type .NET Type bigint, counter long boolean bool decimal, float float double double int int uuid, timeuuid System.Guid text, varchar string (Encoding.UTF8) timestamp System.DateTimeOffset varint System.Numerics.BigInteger
  • 58. .NET and Cassandra - PreparedStatement • SimpleStatement useful for one-off CQL execution (or when dynamic CQL is a possibility) • Use PreparedStatement for better performance • Pay the cost of Prepare once (server roundtrip) • Save the PreparedStatement instance and reuse PreparedStatement prepared = session.Prepare( "SELECT * FROM user_credentials WHERE email = ?");
  • 59. .NET and Cassandra - PreparedStatement • Bind variable values to get BoundStatement for execution • Execution only has to send variable values • You will use these all the time • Remember: Prepare once, bind and execute many BoundStatement bound = prepared.Bind("luke.tillman@datastax.com"); RowSet rows = await _session.ExecuteAsync(bound);
  • 60. .NET and Cassandra - BatchStatement • Add Simple/Bound statements to a batch BoundStatement bound = prepared.Bind(video.VideoId, video.Name); var simple = new SimpleStatement( "UPDATE videos SET name = ? WHERE videoid = ?" ).Bind(video.Name, video.VideoId); // Use an atomic batch to send over all the mutations var batchStatement = new BatchStatement(); batchStatement.Add(bound); batchStatement.Add(simple); RowSet rows = await _session.ExecuteAsync(batch);
  • 61. .NET and Cassandra - BatchStatement • Batches are Logged (atomic) by default • Use when you want a group of mutations (statements) to all succeed or all fail (denormalizing at write time) • Really large batches are an anti-pattern (and Cassandra will warn you) • Not a performance optimization for bulk-loading data
  • 62. .NET and Cassandra – Statement Options • Options like Consistency Level and Retry Policy are available at the Statement level • If not set on a statement, driver will fallback to defaults set when building/configuring the Cluster 62 IStatement bound = prepared.Bind("luke.tillman@datastax.com") .SetPageSize(100) .SetConsistencyLevel(ConsistencyLevel.LocalOne) .SetRetryPolicy(new DefaultRetryPolicy()) .EnableTracing();
  • 63. Lightweight Transactions (LWT) • Use when you don’t want writes to step on each other • AKA Linearizable Consistency • Serial Isolation Level • Be sure to read the fine print: has a latency cost associated with using it, so use only where needed • The canonical example: unique user accounts
  • 64. Lightweight Transactions (LWT) • Returns a column called [applied] indicating success/failure • Different from the relational world where you might expect an Exception (i.e. PrimaryKeyViolationException or similar) var statement = new SimpleStatement("INSERT INTO user_credentials (email, password) VALUES (?, ?) IF NOT EXISTS"); statement = statement.Bind("user1@killrvideo.com", "Password1!"); RowSet rows = await _session.ExecuteAsync(statement); var userInserted = rows.Single().GetValue<bool>("[applied]");
  • 65. Automatic Paging • The Problem: Loading big result sets into memory is a recipe for disaster (OutOfMemoryExceptions, etc.) • Better to load and process a large result set in pages (chunks) • Doing this manually with Cassandra prior to 2.0 was a pain • Automatic Paging makes paging on a large RowSet transparent
  • 66. Automatic Paging • Set a page size on a statement • Iterate over the resulting RowSet • As you iterate, new pages are fetched transparently when the Rows in the current page are exhausted • Will allow you to iterate until all pages are exhausted boundStatement = boundStatement.SetPageSize(100); RowSet rows = await _session.ExecuteAsync(boundStatement); foreach (Row row in rows) { }
  • 67. Typical Pager UI in a Web Application • Show page of records in UI and allow user to navigate
  • 68. Typical Pager UI in a Web Application • Automatic Paging – this is not the feature you are looking for
  • 69. .NET and Cassandra • Mapping results to DTOs: if you like using CQL for querying, try Mapper component (formerly CqlPoco package) public class User { public Guid UserId { get; set; } public string Name { get; set; } } // Create a mapper from your session object var mapper = new Mapper(session); // Get a user by id from Cassandra or null if not found var user = client.SingleOrDefault<User>( "SELECT userid, name FROM users WHERE userid = ?", someUserId); 69
  • 70. .NET and Cassandra • Mapping results to DTOs: if you like LINQ, use built-in LINQ provider [Table("users")] public class User { [Column("userid"), PartitionKey] public Guid UserId { get; set; } [Column("name")] public string Name { get; set; } } var user = session.GetTable<User>() .SingleOrDefault(u => u.UserId == someUserId) .Execute(); 70
  • 72. Installing Cassandra • Planet Cassandra for all things C* (planetcassandra.org)
  • 73. Installing Cassandra • Windows installer is super easy way to do development and testing on your local machine • Production Cassandra deployments on Linux (Windows performance parity is coming in 3.0)
  • 74. In the Box – Command Line Tools • Use cqlsh REPL for running CQL against your cluster 74
  • 75. In the Box – Command Line Tools • Use nodetool for information on your cluster (and lots more) • On Windows, available under apache-cassandrabin folder 75
  • 76. DevCenter • Get it on DataStax web site (www.datastax.com) • GUI for Cassandra development (think SQL Server Management Studio for Cassandra) 76
  • 77. Sample Code – KillrVideo • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp 77
  • 78. Questions? Overtime (use cases)? Follow me for updates or to ask questions later: @LukeTillman 78
  • 81. Some Common Use Case Categories • Product Catalogs and Playlists • Internet of Things (IoT) and Sensor Data • Messaging (emails, IMs, alerts, comments) • Recommendation and Personalization • Fraud Detection • Time series and temporal ordered data http://planetcassandra.org/apache-cassandra-use-cases/
  • 82. The “Slide Heard Round the World” • From Cassandra Summit 2014, got a lot of attention • 75,000+ nodes • 10s of PBs of data • Millions ops/s • One of the largest known Cassandra deployments 82
  • 83. Spotify • Streaming music web service • > 24,000,000 music tracks • > 50TB of data in Cassandra Why Cassandra? • Was PostgreSQL, but hit scaling problems • Multi Datacenter Availability • Integration with Spark for data processing and analytics Usage • Catalog • User playlists • Artists following • Radio Stations • Event notifications 83 http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/
  • 84. eBay • Online auction site • > 250TB of data, dozens of nodes, multiple data centres • > 6 billion writes, > 5 billion reads per day Why Cassandra? • Low latency, high scale, multiple data centers • Suited for graph structures using wide rows Usage • Building next generation of recommendation engine • Storing user activity data • Updating models of user interests in real time 84 http://planetcassandra.org/blog/5-minute-c-interview-ebay/
  • 85. FullContact • Contact management: from multiple sources, sync, de-dupe, APIs available • 2 clusters, dozens of nodes, running in AWS • Based here in Denver Why Cassandra? • Migated from MongoDB after running into scaling issues • Operational simplicity • Resilience and Availability Usage • Person API (search by email, Twitter handle, Facebook, or phone) • Searched data from multiple sources (ingested by Hadoop M/R jobs) • Resolved profiles 85 http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/
  • 86. Instagram • Photo-sharing, video-sharing and social networking service • Originally AWS (Now Facebook data centers?) • > 20k writes/second, >15k reads/second Why Cassandra? • Migrated from Redis (problems keeping everything in memory) • No painful “sharding” process • 75% reduction in costs Usage • Auditing information – security, integrity, spam detection • News feed (“inboxes” or activity feed) – Likes, Follows, etc. 86 http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/ Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY
  • 87. Netflix • TV and Movie streaming service • > 2700+ nodes on over 90 clusters • 4 Datacenters • > 1 Trillion operations per day Why Cassandra? • Migrated from Oracle • Massive amounts of data • Multi datacenter, No SPOF • No downtime for schema changes Usage • Everything! (Almost – 95% of DB use) • Example: Personalization – What titles do you play? – What do you play before/after? – Where did you pause? – What did you abandon watching after 5 minutes? 87 http://planetcassandra.org/blog/case-study-netflix/ Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA
  • 88. Go forth and build awesome things! Follow me for updates or to ask questions later: @LukeTillman 88