C* @ Icon Platforms
Vassilis Bekiaris
@karbonized1
Software Architect
Presentation outline
• Meet Cassandra
• CQL - Data modeling basics
• Counters & Time-series use case: Polls
Meet Cassandra
History
• Started at Facebook
• Historically builds on
• Dynamo for distribution: consistent hashing, eventual consistency
• BigTable for disk storage model
Amazon’s Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Google’s BigTable: http://research.google.com/archive/bigtable.html
Cassandra is
• A distributed database written in Java
• Scalable
• Masterless, no single point of failure
• Tunable consistency
• Network topology aware
Cassandra Data Model
• Original “Map of Maps” schema
• row key ➞ Map<ColumnName, Value>
• Now (in CQL):
• Keyspace = Database
• ColumnFamily = Table
• Row = Partition
• Column = Cell
• Data types
• strings, booleans, integers, decimals
• collections: list, set, map
• not indexable, not individually queryable
• counters
• custom types
Cassandra Replication Factor & Consistency Levels
• CAP Theorem:
• Consistency
• Availability
• Tolerance in the face of network partitions
Original article: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
Review 12 years later: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
Fun with distributed systems under partitions: http://aphyr.com/tags/jepsen
Cassandra Replication Factor & Consistency Levels
• RF: designated per keyspace
• CL:
• Writes: ANY, ONE, QUORUM, ALL
• Reads: ONE, QUORUM, ALL
• Consistent reads & writes are achieved when CL(W) + CL(R) > RF
• QUORUM = floor(RF/2) + 1
• Additional QUORUM variants:
• LOCAL_QUORUM: quorum of replica nodes within the same DC
• EACH_QUORUM: quorum of replica nodes in each DC
Cassandra parameters calculator: http://www.ecyrd.com/cassandracalculator/
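The quorum arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not Cassandra code: the function names are made up for illustration, and consistency levels are represented simply as the number of replicas that must acknowledge a request.

```python
def quorum(rf: int) -> int:
    """Number of replicas in a quorum: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True when every read set overlaps every write set: CL(W) + CL(R) > RF."""
    return write_replicas + read_replicas > rf


rf = 3
q = quorum(rf)
print("QUORUM for RF=3:", q)
print("QUORUM + QUORUM:", reads_see_latest_write(q, q, rf))
print("ONE + ONE:", reads_see_latest_write(1, 1, rf))
print("ALL + ONE:", reads_see_latest_write(rf, 1, rf))
```

With RF = 3, QUORUM writes combined with QUORUM reads satisfy the inequality, while ONE combined with ONE does not, which is why the latter is only eventually consistent.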
Masterless design
• All nodes in the cluster are equal
• Gossip protocol among servers
• Adding / removing nodes is easy
• Clients are cluster-aware
Figure: Partitioning and replication of keys in the Dynamo ring (nodes A to G). Key K falls in range (A, B); nodes B, C and D store keys in that range, including K.
Image from “Dynamo: Amazon’s Highly Available Key-value Store”
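The ring placement shown in the figure can be sketched in Python as follows. The assumptions are loud ones: each node owns a single token (no virtual nodes), MD5 is used as the hash purely for illustration, and the `Ring` class and `token` function are hypothetical names, not part of any Cassandra API.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        # Sorting nodes by token models a clockwise walk around the ring.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the RF nodes found clockwise from the key's position."""
        start = bisect.bisect_left(self.tokens, token(key))
        count = min(self.rf, len(self.entries))
        size = len(self.entries)
        return [self.entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["A", "B", "C", "D", "E", "F", "G"], replication_factor=3)
print(ring.replicas("K"))
```

Because every node can compute the same placement from the key alone, any node can route a request, which is what makes the design masterless.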
Write path
• Storage is log-structured; updates do not overwrite, deletes do not remove
• Commit log: sequential disk access
• Memtables: in-memory data structure (partially off-heap since 2.1b2)
• Memtables are flushed to SSTable on disk
• Compaction: merge SSTables, remove tombstones
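The write path above can be illustrated with a toy log-structured store. This is a simplified sketch under several assumptions: the `Store` class is hypothetical, SSTables are plain dictionaries, the memtable is flushed after a fixed number of keys, and compaction merges everything and drops tombstones immediately (a real cluster keeps tombstones for a grace period).

```python
class Store:
    TOMBSTONE = object()  # marker written by deletes

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, models sequential disk writes
        self.memtable = {}     # key -> (timestamp, value), held in memory
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.commit_log.append((self.clock, key, value))
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write; nothing is removed yet.
        self.write(key, self.TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def compact(self):
        # Merge all SSTables, keeping the newest cell per key.
        merged = {}
        for table in self.sstables:
            for key, cell in table.items():
                if key not in merged or cell[0] > merged[key][0]:
                    merged[key] = cell
        live = {k: c for k, c in merged.items() if c[1] is not self.TOMBSTONE}
        self.sstables = [live] if live else []


store = Store()
store.write("a", 1)
store.write("b", 2)   # memtable full: flushed to the first SSTable
store.write("a", 3)   # newer value lands in a second SSTable
store.delete("b")     # tombstone, flushed with the second SSTable
print(len(store.sstables), "SSTables before compaction")
store.compact()
print(store.sstables)
```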
Read path
• For each SSTable that may contain a partition key:
• Bloom filters: rule out SSTables that definitely do not contain the partition (false positives are possible, false negatives are not)
• Locate offset in SSTable
• Sequential read in SSTable (if query involves several columns)
• A partition’s columns are merged from several SSTables / memtable, as column updates never overwrite data
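The role of the Bloom filter in the read path can be sketched as follows. This is an illustrative toy, not Cassandra's implementation: the `BloomFilter` class, its sizes, the SHA-256 based hashing and the SSTable names are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hash_count=3):
        self.size = size_bits
        self.hash_count = hash_count
        self.bits = [False] * size_bits

    def _positions(self, key: str):
        # Derive hash_count bit positions by salting the key with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(partition_key, sstables):
    """Keep only the SSTables whose filter cannot rule the key out."""
    return [name for name, bloom in sstables if bloom.might_contain(partition_key)]


first, second = BloomFilter(), BloomFilter()
first.add("johndoe")
print(sstables_to_read("johndoe", [("sstable-1", first), ("sstable-2", second)]))
```

The filter lets a read skip SSTables without touching disk; the cost is an occasional unnecessary read when the filter reports a false positive.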
CQL - Data Modeling Basics
CQL
• Cassandra Query Language
• Client API for Cassandra
• CQL3 available since Cassandra 1.2
• Familiar syntax
• Easy to use
• Drivers available for Java, Python, C# and more
Creating a table
Creating a table - what happened??
• A new table was created
• It looks familiar!
• We defined the username as the primary key, therefore we are able to identify a row and query quickly by username
• Primary keys can be composite; the first part of the primary key is the partition key and determines the primary node for the partition
Composite Primary Key
• Partition key (not ordered)
• Clustering column(s) (ordered)
Composite Primary Key - Partition Layout
• username: johndoe
• key: last_login, value: 2014-01-04T12:00:00
• key: married_to, value: janedoe
• username: anna
• key: last_login, value: 2014-04-03T13:57:13
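A partition with clustering columns behaves much like a sorted map nested inside an unordered map. The sketch below models that in Python; the `Table` class is hypothetical, and the pairing of the sample values with the two usernames is an assumption read off the slide layout.

```python
import bisect
from collections import defaultdict


class Table:
    def __init__(self):
        # partition key -> list of (clustering key, value), kept sorted
        self.partitions = defaultdict(list)

    def upsert(self, partition_key, clustering_key, value):
        cells = self.partitions[partition_key]
        keys = [k for k, _ in cells]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(cells) and cells[i][0] == clustering_key:
            cells[i] = (clustering_key, value)   # same primary key: replace
        else:
            cells.insert(i, (clustering_key, value))

    def slice(self, partition_key, start=None, end=None):
        """Cells of one partition with start <= clustering key < end."""
        cells = self.partitions.get(partition_key, [])
        keys = [k for k, _ in cells]
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(cells) if end is None else bisect.bisect_left(keys, end)
        return cells[lo:hi]


attrs = Table()
attrs.upsert("johndoe", "married_to", "janedoe")
attrs.upsert("johndoe", "last_login", "2014-01-04T12:00:00")
attrs.upsert("anna", "last_login", "2014-04-03T13:57:13")
print(attrs.slice("johndoe"))              # ordered by clustering key
print(attrs.slice("johndoe", start="m"))   # range on the clustering key
```

Lookups by partition key are a single map access, and ranges on the clustering key are a contiguous slice, which is why both kinds of query are cheap.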
Insert/Update
• INSERT & UPDATE are functionally equivalent
• New in Cassandra 2.0: Support for lightweight transactions (compare-and-set)
• e.g. INSERT INTO users (username, email) VALUES ('tony', 'tony@gmail.com') IF NOT EXISTS;
• Based on Paxos consensus protocol
Paxos Made Live: An Engineering Perspective: http://research.google.com/archive/paxos_made_live.pdf
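The difference between a plain INSERT (an upsert) and INSERT ... IF NOT EXISTS can be sketched in Python. This models only the outcome seen by the client in a single process; in Cassandra the check and the write are agreed on by the replicas through Paxos. The function names and the second e-mail address are hypothetical.

```python
def upsert(table: dict, key, row: dict) -> None:
    """Plain INSERT/UPDATE: always writes, replacing any existing row."""
    table[key] = row


def insert_if_not_exists(table: dict, key, row: dict) -> bool:
    """Compare-and-set: write only when the key is absent; report if applied."""
    if key in table:
        return False
    table[key] = row
    return True


users = {}
print(insert_if_not_exists(users, "tony", {"email": "tony@gmail.com"}))
print(insert_if_not_exists(users, "tony", {"email": "other@example.com"}))
print(users["tony"])
```

The second call is rejected and leaves the first row untouched, which is the guarantee a username-registration flow needs.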
Select query
• SELECT * FROM user_attributes;
• Selecting across several partitions can be slow
• Default LIMIT 10,000
• Can filter results with WHERE clauses on partition key, partition key & clustering columns or indexed columns
• EQ & IN operators allowed for partition keys
• EQ, <, > … operators allowed for clustering columns
Select query - Ordering
• Partition keys are not ordered
• … but clustering columns are ordered
• Default ordering is mandated by clustering columns
• ORDER BY can be specified on clustering columns at query time; default order can be set WITH CLUSTERING ORDER on table creation
Secondary Indexes
• Secondary indexes allow queries using EQ or IN operators on columns other than the partition key
• Internally implemented as hidden tables
• “Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index.”
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html
Secondary Indexes
Query Performance
• Single-partition queries are fast!
• Queries for ranges on clustering columns are fast!
• Queries for multiple partitions are slow
• Use secondary indexes with caution
Counter columns
Tracing CQL requests
Setting TTL
Counters and Time Series use case: Polls
Use cases
Data access patterns
• View poll ➞ Get poll name & sorted list of answers by poll id
• User votes ➞ Insert answer with user id, poll id, answer id, timestamp
• View result ➞ Retrieve counts per poll & answer
Poll & answers
• POLL (POLL_ID, TEXT)
• POLL_ANSWER (POLL_ID, ANSWER_ID, SORT_ORDER)
• ANSWER (ANSWER_ID, TEXT)
Poll & answers
• Need 3 queries to display a poll
• 2 by PK EQ
• 1 for multiple rows by PK IN
Poll & answers revisited
• POLL (POLL_ID, TEXT)
• POLL_ANSWER (POLL_ID, SORT_ORDER, ANSWER_ID, ANSWER_TEXT)
Poll & answers revisited
• Need 2 queries to display a poll
• both by PK EQ
Poll & answers re-revisited
• POLL (POLL_ID, POLL_TEXT (STATIC), SORT_ORDER, ANSWER_ID, ANSWER_TEXT)
(Requires Cassandra 2.0.6+)
Poll & answers re-revisited
• One table to rule them all
• One query by PK EQ
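The single-table design can be pictured as one partition per poll that carries the poll text once (the static column) alongside its ordered answers. The Python sketch below is a model of that layout only; the `PollPartition` class, the assumption that SORT_ORDER is the leading clustering column, and the sample poll and answers are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PollPartition:
    poll_text: str                                # static: stored once per partition
    answers: dict = field(default_factory=dict)   # sort_order -> (answer_id, answer_text)

    def add_answer(self, sort_order, answer_id, answer_text):
        self.answers[sort_order] = (answer_id, answer_text)

    def rows(self):
        """Rows as a single-partition query would return them, by sort_order."""
        return [
            (self.poll_text, order, *self.answers[order])
            for order in sorted(self.answers)
        ]


polls = {1: PollPartition("Favourite database?")}
polls[1].add_answer(2, 11, "PostgreSQL")
polls[1].add_answer(1, 10, "Cassandra")
for row in polls[1].rows():     # one lookup by poll id returns everything
    print(row)
```

Every returned row repeats the static poll text, so the poll name and its sorted answers arrive together from one partition.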
Votes
• Record user’s votes in a timeline
• Count of votes per answer
Votes
• VOTE (POLL_ID, VOTED_ON, USER_ID, ANSWER_ID)
Time buckets
• If you have tons of votes to record, you may want to split your partitions in buckets e.g. per day
Time buckets
• Partition layout
• Partition poll_id:1, day:20140401
• user_id:21 ➞ answer_id:4
• user_id:22 ➞ answer_id:1
• Partition poll_id:1, day:20140402
• user_id:27 ➞ answer_id:2
• user_id:29 ➞ answer_id:3
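Deriving the bucketed partition key is a small computation on the vote timestamp. The sketch below assumes a per-day bucket formatted as YYYYMMDD, matching the layout above; the function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def vote_partition_key(poll_id: int, voted_on: datetime):
    """Partition key for a vote: the poll id plus a per-day bucket (YYYYMMDD)."""
    return (poll_id, voted_on.strftime("%Y%m%d"))


def buckets_between(poll_id: int, first_day: date, last_day: date):
    """Partition keys to query to read a poll's votes over a range of days."""
    days = (last_day - first_day).days
    return [
        (poll_id, (first_day + timedelta(days=offset)).strftime("%Y%m%d"))
        for offset in range(days + 1)
    ]


print(vote_partition_key(1, datetime(2014, 4, 1, 13, 57, 13)))
print(buckets_between(1, date(2014, 4, 1), date(2014, 4, 2)))
```

The trade-off is that reading a poll's full timeline now takes one query per bucket, so the bucket size should match how the data is usually read.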
Counting votes
• Count per poll_id & answer_id
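The access pattern for vote counts is an increment keyed by poll and answer, plus a read of all counts for one poll. The sketch below shows that pattern with an in-memory `collections.Counter`; it says nothing about how Cassandra's distributed counter columns work internally, and the function names are hypothetical. The sample votes are the ones from the time-bucket layout.

```python
from collections import Counter


def record_vote(counts: Counter, poll_id: int, answer_id: int) -> None:
    """Increment the counter identified by (poll_id, answer_id)."""
    counts[(poll_id, answer_id)] += 1


def results(counts: Counter, poll_id: int) -> dict:
    """Vote count per answer for one poll."""
    return {a: n for (p, a), n in counts.items() if p == poll_id}


vote_counts = Counter()
for answer_id in (4, 1, 2, 3, 1):
    record_vote(vote_counts, 1, answer_id)
print(results(vote_counts, 1))
```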
Links
• http://cassandra.apache.org
• http://planetcassandra.org/ - Cassandra binary distributions, use cases, webinars
• http://www.datastax.com/docs - Excellent documentation for all things Cassandra (and DSE)
• http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries - Cassandra 2.0 new features & time series modeling
Thank you!

Cassandra Basics, Counters and Time Series Modeling
