Cassandra Basics, Counters and Time Series Modeling
C* @ Icon Platforms
Vassilis Bekiaris
@karbonized1
Software Architect
Presentation outline
• Meet Cassandra
• CQL - Data modeling basics
• Counters & Time-series use case: Polls
Meet Cassandra
History
• Started at Facebook
• Historically builds on
• Dynamo for distribution: consistent hashing, eventual consistency
• BigTable for disk storage model
Amazon’s Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Google’s BigTable: http://research.google.com/archive/bigtable.html
Cassandra is
• A distributed database written in Java
• Scalable
• Masterless, no single point of failure
• Tunable consistency
• Network topology aware
Cassandra Data Model
• Original “Map of Maps” schema
• row key ➞ Map<ColumnName, Value>
• Now (in CQL):
• Keyspace = Database
• ColumnFamily = Table
• Row = Partition
• Column = Cell
• Data types
• strings, booleans, integers, decimals
• collections: list, set, map
• not indexable, not individually queryable
• counters
• custom types
Cassandra Replication Factor &
Consistency Levels
• CAP Theorem:
• Consistency
• Availability
• Tolerance in the face of network partitions
Original article: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
Review 12 years later: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
Fun with distributed systems under partitions: http://aphyr.com/tags/jepsen
Cassandra Replication Factor &
Consistency Levels
• RF: designated per keyspace
• CL:
• Writes: ANY, ONE, QUORUM, ALL
• Reads: ONE, QUORUM, ALL
• Consistent reads & writes are achieved when CL(W) + CL(R) > RF
• QUORUM = floor(RF / 2) + 1
• Additional QUORUM variants:
• LOCAL_QUORUM: quorum of replica nodes within the same DC
• EACH_QUORUM: quorum of replica nodes in each DC
Cassandra parameters calculator: http://www.ecyrd.com/cassandracalculator/
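The quorum arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names `quorum` and `overlaps` are illustrative, and CL(W) and CL(R) are taken to mean the number of replicas that must acknowledge a write or a read.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a quorum for replication factor rf."""
    return rf // 2 + 1


def overlaps(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > rf


RF = 3
Q = quorum(RF)
print("QUORUM for RF=3:", Q)
print("QUORUM write + QUORUM read consistent:", overlaps(Q, Q, RF))
print("ONE write + ONE read consistent:", overlaps(1, 1, RF))
print("ALL write + ONE read consistent:", overlaps(RF, 1, RF))
```

With RF = 3, QUORUM on both sides satisfies the inequality (2 + 2 > 3), while ONE on both sides does not (1 + 1 is not greater than 3).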
Masterless design
• All nodes in the cluster are equal
• Gossip protocol among servers
• Adding / removing nodes is easy
• Clients are cluster-aware
“Traditional replicated relational database systems focus on the problem of guaranteeing strong consistency to replicated data. Although strong consistency provides the application writer a convenient programming model, these systems are limited in scalability and availability [7]. These systems are not capable of …” (excerpt truncated in the slide)
Figure 2: Partitioning and replication of keys in Dynamo ring. Nodes A to G are placed on the ring; nodes B, C and D store keys in range (A,B), including key K.
Image from “Dynamo: Amazon’s Highly Available Key-value Store”
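The ring in the figure can be sketched in Python as follows. This is a simplified illustration, not Cassandra's actual partitioner: the `Ring` class and `token` helper are hypothetical names, MD5 is used only to spread keys around the ring, each node gets a single token, and the replicas are simply the next `rf` nodes clockwise from the key.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, rf):
        self.rf = rf
        # Nodes sorted by their ring position.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key):
        """First node at or after the key's position, plus the following nodes, rf in total."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["A", "B", "C", "D", "E", "F", "G"], rf=3)
print("Replicas for key K:", ring.replicas("K"))
```

Because every node can compute the same placement from the node list alone, any node can route a request, which is what makes the masterless design workable.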
Write path
• Storage is log-structured; updates do not overwrite, deletes do not remove
• Commit log: sequential disk access
• Memtables: in-memory data structure (partially off-heap since 2.1b2)
• Memtables are flushed to SSTables on disk
• Compaction: merge SSTables, remove tombstones
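The write path above can be modelled with a toy in-memory store. This is a sketch under loud simplifications: `TinyStore` is a hypothetical class, plain lists and dicts stand in for the on-disk commit log and SSTables, write order replaces timestamps, and compaction drops tombstones immediately rather than after a grace period.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class TinyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory writes not yet flushed
        self.sstables = []     # oldest first; never modified after a flush
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append comes first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.write(key, TOMBSTONE)  # a delete is just another write

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        merged = {}
        for table in self.sstables:  # later tables override earlier ones
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [live] if live else []


store = TinyStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)
store.write("a", 3)
store.delete("b")
print("SSTables before compaction:", len(store.sstables))
store.compact()
print("SSTables after compaction:", len(store.sstables))
```

Before compaction the old value of "a" and the original "b" still sit in the first SSTable; only the merge step discards them.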
Read path
• For each SSTable that may contain a partition key:
• Bloom filter: reports whether the SSTable definitely lacks or possibly contains the partition
• Locate offset in SSTable
• Sequential read in SSTable (if query involves several columns)
• A partition’s columns are merged from several SSTables / memtable, as column updates never overwrite data
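The read path can be sketched in the same spirit. The `BloomFilter` class and `read` function below are illustrative only: the filter uses SHA-256 with a small fixed bit array, SSTables are plain dicts, and "newer wins" is decided by list order instead of per-cell timestamps.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def read(partition_key, sstables):
    """sstables: list of (bloom_filter, {partition_key: {column: value}}), oldest first."""
    merged = {}
    for bloom, data in sstables:
        if not bloom.might_contain(partition_key):
            continue  # skip SSTables that cannot hold the partition
        merged.update(data.get(partition_key, {}))  # columns from newer tables win
    return merged


def make_sstable(data):
    """Build a (bloom_filter, data) pair for the given partitions."""
    bloom = BloomFilter()
    for key in data:
        bloom.add(key)
    return bloom, data


tables = [
    make_sstable({"johndoe": {"last_login": "2014-01-04T12:00:00"}}),
    make_sstable({"johndoe": {"married_to": "janedoe"}}),
]
print(read("johndoe", tables))
```

A false positive from the filter only costs an unnecessary lookup; it never changes the result, because the lookup itself finds nothing.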
CQL - Data Modeling Basics
CQL
• Cassandra Query Language
• Client API for Cassandra
• CQL3 available since Cassandra 1.2
• Familiar syntax
• Easy to use
• Drivers available for Java, Python, C# and more
Creating a table
Creating a table - what happened??
• A new table was created
• It looks familiar!
• We defined the username as the primary key, so we can identify a row and query quickly by username
• Primary keys can be composite; the first part of the primary key is the partition key and determines the primary node for the partition
Composite Primary Key
• Partition key (not ordered)
• Clustering column(s) / clustering key (ordered)
Composite Primary Key - Partition Layout
• Diagram: one partition per username (johndoe, anna); each partition holds key/value cells
• Keys shown: last_login, married_to
• Values shown: 2014-01-04T12:00:00, 2014-04-03T13:57:13, janedoe
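The partition layout can be mimicked with a dictionary of partitions whose cells are read back in clustering order. This is a rough sketch: `Table`, `upsert` and `select` are hypothetical names, and the pairing of the slide's values to users is assumed for illustration. Real storage keeps cells sorted on disk; here they are sorted at read time.

```python
from collections import defaultdict


class Table:
    """Rows grouped by partition key; cells are returned in clustering-key order."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, value):
        # Insert and update are the same operation: the latest value wins.
        self.partitions[partition_key][clustering_key] = value

    def select(self, partition_key, start=None, end=None):
        """Return (clustering_key, value) pairs, optionally limited to a key range."""
        cells = self.partitions.get(partition_key, {})
        return [
            (k, cells[k])
            for k in sorted(cells)
            if (start is None or k >= start) and (end is None or k <= end)
        ]


attrs = Table()
attrs.upsert("johndoe", "married_to", "janedoe")
attrs.upsert("johndoe", "last_login", "2014-01-04T12:00:00")
attrs.upsert("anna", "last_login", "2014-04-03T13:57:13")
print(attrs.select("johndoe"))
print(attrs.select("johndoe", start="m"))
```

Lookups always start from a partition key, and range filters apply only to the ordered clustering key inside that partition.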
Insert/Update
• INSERT & UPDATE are functionally equivalent
• New in Cassandra 2.0: Support for lightweight transactions (compare-and-set)
• e.g. INSERT INTO users (username, email) VALUES ('tony', 'tony@gmail.com') IF NOT EXISTS;
• Based on Paxos consensus protocol
Paxos Made Live: An Engineering Perspective: http://research.google.com/archive/paxos_made_live.pdf
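The effect of IF NOT EXISTS can be illustrated on a single in-memory dictionary. This sketch shows only the compare-and-set outcome; it does not implement Paxos or any coordination between replicas, and `insert_if_not_exists` is a hypothetical helper.

```python
def insert_if_not_exists(table, key, row):
    """Insert row under key only when the key is absent.

    Returns (applied, current_row), loosely mirroring the [applied] flag
    that a conditional CQL statement reports.
    """
    if key in table:
        return False, table[key]
    table[key] = row
    return True, row


users = {}
first = insert_if_not_exists(users, "tony", {"email": "tony@gmail.com"})
second = insert_if_not_exists(users, "tony", {"email": "other@example.com"})
print("first applied:", first[0])
print("second applied:", second[0])
```

A plain INSERT would have silently overwritten the first row; the conditional form leaves it untouched and reports that the write was not applied.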
Select query
• SELECT * FROM user_attributes;
• Selecting across several partitions can be slow
• Default LIMIT: 10,000
• Can filter results with WHERE clauses on the partition key, the partition key & clustering columns, or indexed columns
• EQ & IN operators allowed for partition keys
• EQ, <, > … operators allowed for clustering columns
Select query - Ordering
• Partition keys are not ordered
• … but clustering columns are ordered
• Default ordering is mandated by clustering columns
• ORDER BY can be specified on clustering columns at query time; the default order can be set WITH CLUSTERING ORDER BY on table creation
Secondary Indexes
• Secondary indexes allow queries using EQ or IN operators on columns other than the partition key
• Internally implemented as hidden tables
• “Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index.”
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html
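The "hidden table" idea can be pictured as an inverted map from indexed value to primary keys that is updated on every write. This is a conceptual sketch only, not Cassandra's implementation: `IndexedTable` and the `country` column are hypothetical, and every row is assumed to contain the indexed column.

```python
from collections import defaultdict


class IndexedTable:
    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}
        self.index = defaultdict(set)  # stands in for the hidden index table

    def upsert(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            # Index upkeep on every write: drop the stale entry first.
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)

    def find_by_indexed(self, value):
        """Return the primary keys of rows whose indexed column equals value."""
        return sorted(self.index.get(value, ()))


users = IndexedTable("country")
users.upsert("anna", {"country": "GR"})
users.upsert("johndoe", {"country": "US"})
users.upsert("tony", {"country": "GR"})
print(users.find_by_indexed("GR"))
```

Each write now touches two structures, and a column with mostly unique values produces one index entry per row, which is the overhead the quoted documentation warns about.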
Secondary Indexes
Query Performance
• Single-partition queries are fast!
• Queries for ranges on clustering columns are fast!
• Queries for multiple partitions are slow
• Use secondary indexes with caution
Counter columns
Tracing CQL requests
Setting TTL
Counters and Time Series use case:
Polls
Use cases
Data access patterns
• View poll ➞ Get poll name & sorted list of answers by poll id
• User votes ➞ Insert answer with user id, poll id, answer id, timestamp
• View result ➞ Retrieve counts per poll & answer
Poll & answers
• POLL: POLL_ID, TEXT
• POLL_ANSWER: POLL_ID, ANSWER_ID, SORT_ORDER
• ANSWER: ANSWER_ID, TEXT
Poll & answers
• Need 3 queries to display a poll
• 2 by PK EQ
• 1 for multiple rows by PK IN
Poll & answers revisited
• POLL: POLL_ID, TEXT
• POLL_ANSWER: POLL_ID, SORT_ORDER, ANSWER_ID, ANSWER_TEXT
Poll & answers revisited
• Need 2 queries to display a poll
• both by PK EQ
Poll & answers re-revisited
• POLL: POLL_ID, POLL_TEXT (STATIC), SORT_ORDER, ANSWER_ID, ANSWER_TEXT
(Requires Cassandra 2.0.6+)
Poll & answers re-revisited
• One table to rule them all
• One query by PK EQ
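The single-table design can be pictured as one partition per poll that carries a shared (static) poll text plus answers ordered by SORT_ORDER. The sketch below is illustrative: `add_answer` and `view_poll` are hypothetical helpers, and the poll data is made up.

```python
def add_answer(polls, poll_id, poll_text, sort_order, answer_id, answer_text):
    """Store one answer row; poll_text is shared by the whole partition (static)."""
    partition = polls.setdefault(poll_id, {"poll_text": None, "answers": {}})
    partition["poll_text"] = poll_text
    partition["answers"][sort_order] = (answer_id, answer_text)


def view_poll(polls, poll_id):
    """One lookup by partition key returns the poll text and the ordered answers."""
    partition = polls[poll_id]
    ordered = [partition["answers"][k] for k in sorted(partition["answers"])]
    return partition["poll_text"], ordered


polls = {}
add_answer(polls, 1, "Favourite database?", 2, 11, "PostgreSQL")
add_answer(polls, 1, "Favourite database?", 1, 10, "Cassandra")
print(view_poll(polls, 1))
```

The poll text is stored once per partition rather than once per answer, which is what lets a single query by partition key display the whole poll.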
Votes
• Record user’s votes in a timeline
• Count of votes per answer
Votes
• VOTE: POLL_ID, VOTED_ON, USER_ID, ANSWER_ID
Time buckets
• If you have tons of votes to record, you may want to split your partitions into buckets, e.g. one per day
Time buckets
• Partition layout
• Partition (poll_id:1, day:20140401): user_id:21 answer_id:4; user_id:22 answer_id:1
• Partition (poll_id:1, day:20140402): user_id:27 answer_id:2; user_id:29 answer_id:3
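Daily bucketing amounts to adding the day to the partition key. In this minimal sketch, `bucket_key` and `record_vote` are hypothetical helpers, a dictionary of lists stands in for the partitions, and the vote times are invented; only the user and answer ids come from the layout above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_key(poll_id, voted_on):
    """Partition key made of the poll id and the day of the vote (YYYYMMDD)."""
    return (poll_id, voted_on.strftime("%Y%m%d"))


def record_vote(store, poll_id, voted_on, user_id, answer_id):
    """Append the vote to the partition for its poll and day."""
    store[bucket_key(poll_id, voted_on)].append((voted_on, user_id, answer_id))


votes = defaultdict(list)
record_vote(votes, 1, datetime(2014, 4, 1, 9, 30, tzinfo=timezone.utc), 21, 4)
record_vote(votes, 1, datetime(2014, 4, 1, 17, 5, tzinfo=timezone.utc), 22, 1)
record_vote(votes, 1, datetime(2014, 4, 2, 8, 0, tzinfo=timezone.utc), 27, 2)
record_vote(votes, 1, datetime(2014, 4, 2, 21, 45, tzinfo=timezone.utc), 29, 3)

for key in sorted(votes):
    print(key, len(votes[key]), "votes")
```

Reading a poll's timeline then means querying one partition per day in the range of interest, so no single partition grows without bound.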
Counting votes
• Count per poll_id & answer_id
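Counting per poll_id and answer_id can be sketched with an in-memory counter keyed by that pair. This is only an analogy for a counter column: `count_vote` and `results` are hypothetical helpers, and nothing here is distributed or durable.

```python
from collections import Counter


def count_vote(counts, poll_id, answer_id):
    """Increment the counter for one (poll, answer) pair."""
    counts[(poll_id, answer_id)] += 1


def results(counts, poll_id):
    """Return {answer_id: votes} for a single poll."""
    return {answer: n for (poll, answer), n in counts.items() if poll == poll_id}


vote_counts = Counter()
for poll_id, answer_id in [(1, 4), (1, 1), (1, 4), (2, 4)]:
    count_vote(vote_counts, poll_id, answer_id)

print(results(vote_counts, 1))
```

The counter is only ever incremented, never read and rewritten by the client, which mirrors how a counter update is expressed as a delta rather than a new value.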
Links
• http://cassandra.apache.org
• http://planetcassandra.org/
Cassandra binary distributions, use cases, webinars
• http://www.datastax.com/docs
Excellent documentation for all things Cassandra (and DSE)
• http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries
Cassandra 2.0 new features & time series modeling
Thank you!
