Apache Cassandra for mission critical data
OLEKSANDR SEMENOV
Agenda
1) CAP Theorem
2) NoSQL vs RDBMS: advantages and disadvantages
3) What is Cassandra? History.
4) Cassandra features
5) Cassandra datamodel
6) Ways to access data: Thrift, CQL, Kundera ORM
What is NoSQL
NoSQL does not mean “Not SQL”.
It means “Not Only SQL” or “Not Relational Database”.
CAP Theorem
A distributed data store can guarantee at most two of these three properties at the same time: Consistency, Availability and Partition tolerance.
Choosing AP data stores
Cassandra is an AP data store
RDBMS
+ Strong mathematical basis
+ Referential Integrity
+ ACID transactions
+ Standard SQL
+ Well-known approaches to data modeling
- Poor performance with very large data volumes
- Scaling issues
NoSQL
+ Great performance
+ Flexible data schema
+ Easy scaling
- Data redundancy
- In most cases, integrity must be ensured by the developer
- Different access interfaces for different stores
- Paradigm shift required
- BASE consistency model instead of ACID transactions
ACID consistency model
Atomicity
• Transactions are all or nothing
Consistency
• Data written is valid according to all defined rules
Isolation
• Transactions do not affect each other
Durability
• Data written will not be lost
BASE consistency model
• Basically Available, Soft state, Eventually consistent
BASE system example
What is Cassandra?
Cassandra is a:
• non-relational
• highly scalable
• decentralized
• eventually consistent
key-multivalue store
History
Who uses Cassandra?
Cassandra Features
Decentralized
• Each node has the same role and can process any request
Replication
• Cassandra supports multi-datacenter replication
Scalable
• Read and write throughput both increase linearly as new machines are added
Durable
• Data, once written, survives hardware failure
Cassandra Features
Fault-tolerant
• Data is automatically replicated to multiple nodes for fault tolerance
Tunable consistency
• You can choose the desired consistency level
CQL
• SQL-like query language
Very fast IO
• Both reads and writes are very fast
Availability: partitioning with SPOF
Availability: Cassandra & no SPOF
• Each node can act as a router
• Data is replicated to several nodes according to the replication factor
Replication Factor
Replication Factor = 3
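The replica placement behind a replication factor can be sketched with a toy token ring. This is a minimal sketch assuming a simplified SimpleStrategy-like rule: each node owns a token, a row key is hashed to a token, the first replica is the first node whose token is at or after the key's token, and the remaining replicas are the next distinct nodes clockwise. The class name `TokenRing` and the use of `String.hashCode` as the hash are illustrative assumptions; Cassandra itself uses a partitioner such as RandomPartitioner.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring with SimpleStrategy-like placement (illustrative, not Cassandra's code). */
class TokenRing {
    // token -> node name; TreeMap keeps tokens sorted so we can walk the ring clockwise
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    /** Returns up to replicationFactor distinct nodes, starting at the key's position on the ring. */
    List<String> replicasFor(String key, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int token = key.hashCode(); // assumption: stand-in for the partitioner's hash
        // First node whose token is >= the key's token; wrap to the lowest token if there is none.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        // Never ask for more replicas than there are distinct nodes.
        int wanted = Math.min(replicationFactor, new HashSet<>(ring.values()).size());
        while (replicas.size() < wanted) {
            if (!replicas.contains(entry.getValue())) {
                replicas.add(entry.getValue());
            }
            // Move clockwise to the next token, wrapping around at the end of the ring.
            Map.Entry<Integer, String> nextEntry = ring.higherEntry(entry.getKey());
            entry = (nextEntry != null) ? nextEntry : ring.firstEntry();
        }
        return replicas;
    }
}
```

With four nodes at tokens -1000, 0, 100 and 1000 and a replication factor of 3, a key hashing to 97 lands on the node at 100 and is copied to the next two nodes clockwise, wrapping around to the node at -1000. Because every node can compute this placement, any node can route a request, which is why there is no single point of failure.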
Availability
Tunable consistency
Consistency can be set on a per-operation basis
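The arithmetic behind tunable consistency fits in a few lines. A QUORUM is a strict majority of the replicas, floor(RF / 2) + 1, and a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds RF, because the two sets must then overlap. The helper class `ConsistencyMath` is a hypothetical illustration, not a Cassandra API.

```java
/** Arithmetic behind tunable consistency (hypothetical helper, not a Cassandra API). */
final class ConsistencyMath {
    private ConsistencyMath() {}

    /** Number of replicas that make up a QUORUM: a strict majority of RF. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set, i.e. R + W > RF. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    /** How many replicas may be down while an operation needing `required` acks still succeeds. */
    static int toleratedFailures(int required, int replicationFactor) {
        return replicationFactor - required;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF=3: " + q);                                       // 2
        System.out.println("QUORUM/QUORUM overlaps: " + readsSeeLatestWrite(q, q, rf));  // true
        System.out.println("ONE/ONE overlaps: " + readsSeeLatestWrite(1, 1, rf));        // false
        System.out.println("Replicas that may fail at QUORUM: " + toleratedFailures(q, rf)); // 1
    }
}
```

With RF = 3, writing and reading at QUORUM gives consistent reads while tolerating one replica being down; reading and writing at ONE is faster and more available but may return stale data.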
Write path in Cassandra
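• A write can be sent to any node, which acts as the coordinator for that request
• On each replica, data is appended to the commit log (for durability) and then written to the memtable
• The memtable is periodically flushed to disk as an immutable SSTable, and a new, empty memtable is created in memory
• Deletes are a special kind of write: they write a marker called a tombstone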
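The write path above can be modeled on a single node as a minimal sketch: append to a commit log, update a sorted in-memory memtable, flush it to an immutable "SSTable" once it reaches a size threshold, and implement deletes as writes of a tombstone. The class `ToyWritePath`, the size-based flush trigger and the `"<tombstone>"` marker string are assumptions made for illustration; real Cassandra persists the commit log and SSTables to disk and merges SSTables through compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy single-node model of Cassandra's write path (illustrative only). */
class ToyWritePath {
    // Hypothetical marker value standing in for a tombstone.
    static final String TOMBSTONE = "<tombstone>";

    private final List<String> commitLog = new ArrayList<>();                       // stands in for the on-disk commit log
    private TreeMap<String, String> memtable = new TreeMap<>();                     // sorted, like a memtable
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>();  // immutable, newest last
    private final int flushThreshold;

    ToyWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);   // 1. append to the commit log first, for durability
        memtable.put(key, value);           // 2. then update the memtable
        if (memtable.size() >= flushThreshold) {
            flush();                        // 3. flush to an immutable SSTable
        }
    }

    /** A delete is just a write of a tombstone. */
    void delete(String key) {
        write(key, TOMBSTONE);
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();         // a fresh memtable is created in memory
        commitLog.clear();                  // simplification: flushed data no longer needs replay
    }

    /** Newest data wins: check the memtable first, then SSTables from newest to oldest. */
    String read(String key) {
        String v = memtable.get(key);
        for (int i = sstables.size() - 1; v == null && i >= 0; i--) {
            v = sstables.get(i).get(key);
        }
        return TOMBSTONE.equals(v) ? null : v;   // a tombstone hides any older value
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

Writes never modify an existing SSTable in place, which is one reason writes are cheap: they are a sequential append plus an in-memory update.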
Read path in Cassandra
• Any node can be queried; it acts as the coordinator
• The coordinator contacts the replica(s) holding the requested key
• If the consistency level is lower than ALL, read repair is performed in the background
Read at consistency level = ONE
Read repair
• Read repair means that when a query is made against a given key, we
perform a digest query against all the replicas of the key and push
the most recent version to any out-of-date replicas.
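Read repair as described above can be sketched in two phases: compare per-replica digests and, only on a mismatch, pick the version with the newest timestamp and push it to stale replicas. This is a minimal sketch assuming each replica is an in-memory map and that `Objects.hash(value, timestamp)` stands in for the digest a real replica would return; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Toy read repair across in-memory replicas (illustrative only). */
class ToyReadRepair {
    /** A value together with its write timestamp. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for the digest a replica returns instead of the full data.
    private static int digest(Versioned v) {
        return v == null ? 0 : Objects.hash(v.value, v.timestamp);
    }

    static Versioned readWithRepair(String key, List<Map<String, Versioned>> replicas) {
        // 1. Digest phase: if all replicas agree, there is nothing to repair.
        Set<Integer> digests = new HashSet<>();
        for (Map<String, Versioned> replica : replicas) {
            digests.add(digest(replica.get(key)));
        }
        if (digests.size() <= 1) {
            return replicas.isEmpty() ? null : replicas.get(0).get(key);
        }
        // 2. Mismatch: find the version with the newest timestamp.
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        if (newest == null) {
            return null;
        }
        // 3. Push the newest version to every out-of-date replica.
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v == null || v.timestamp < newest.timestamp) {
                replica.put(key, newest);
            }
        }
        return newest;
    }
}
```

After one repaired read, all replicas hold the same version, so later reads at any consistency level return it.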
Cassandra datamodel
RDBMS       Cassandra
Database    Keyspace
Table       ColumnFamily
Columns     Columns, SuperColumns
ColumnFamily usage patterns
Static
Dynamic
Columns
A column is a tuple that contains three fields: name, value and timestamp.
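The timestamp is what lets replicas agree on which version of a column is current. As a minimal sketch, assuming last-write-wins reconciliation, the version with the higher timestamp survives. The `Column` class is illustrative, and the tie-break on equal timestamps (comparing values so every replica deterministically picks the same winner) is an assumption of this sketch.

```java
/** A Cassandra-style column: name, value and timestamp (illustrative class). */
final class Column {
    final String name;
    final String value;
    final long timestamp; // typically microseconds since the epoch, supplied by the client

    Column(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Last-write-wins: of two versions of the same column, the newer one survives. */
    static Column reconcile(Column a, Column b) {
        if (!a.name.equals(b.name)) {
            throw new IllegalArgumentException("cannot reconcile different columns");
        }
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        // Tie-break (assumption of this sketch): compare values so the result is deterministic.
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }
}
```

Because `reconcile` gives the same answer regardless of argument order, replicas that receive the same writes in different orders still converge.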
Special column types
• Expiring columns – columns that are removed automatically after a time-to-live (TTL)
• Counter columns – columns whose value is incremented or decremented instead of overwritten
• SuperColumns – columns that contain other columns. Deprecated.
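The two special column types can be sketched locally as follows. This is a minimal, single-node illustration assuming a TTL expressed in seconds and counters updated by deltas; the class names are hypothetical, and real Cassandra counters are distributed across replicas, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy versions of expiring and counter columns (single node, no replication). */
final class SpecialColumns {
    private SpecialColumns() {}

    /** An expiring column: a value that disappears ttlSeconds after it was written. */
    static final class ExpiringColumn {
        final String name;
        final String value;
        final long writtenAtMillis;
        final int ttlSeconds;

        ExpiringColumn(String name, String value, long writtenAtMillis, int ttlSeconds) {
            this.name = name;
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
            this.ttlSeconds = ttlSeconds;
        }

        /** Readable only until the TTL has elapsed; afterwards it is treated as deleted. */
        boolean isLive(long nowMillis) {
            return nowMillis < writtenAtMillis + ttlSeconds * 1000L;
        }
    }

    /** Counter columns: clients send increments (deltas) rather than absolute values. */
    static final class CounterColumns {
        private final Map<String, Long> counters = new HashMap<>();

        long increment(String name, long delta) {
            return counters.merge(name, delta, Long::sum); // negative delta decrements
        }

        long get(String name) {
            return counters.getOrDefault(name, 0L);
        }
    }
}
```

A session token written with a 60-second TTL, for example, is readable for 60 seconds and then behaves as if it had been deleted.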
SuperColumns
Indexes
• Primary index – index built on the key of each row
• Secondary index – index on column values; must be created manually. Good only for low-cardinality columns.
Example: a Gender column can have only two values, M and F.
And it is a problem.
• Indexing is performed in the background
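What a secondary index stores can be shown with a toy model: rows are stored by primary key, and the index maps each distinct value of the indexed column to the set of row keys holding that value. This is a minimal in-memory sketch under that assumption, not how Cassandra lays out index data on disk; `ToySecondaryIndex` is a hypothetical name. Note how cardinality shapes the index: a Gender column yields only two index entries, each listing a large share of all row keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy secondary index: column value -> row keys (illustrative only). */
class ToySecondaryIndex {
    private final Map<String, Map<String, String>> rows = new HashMap<>(); // primary index: row key -> columns
    private final Map<String, Set<String>> index = new HashMap<>();        // indexed value -> row keys
    private final String indexedColumn;

    ToySecondaryIndex(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    void insert(String rowKey, Map<String, String> columns) {
        Map<String, String> old = rows.put(rowKey, new HashMap<>(columns));
        // Remove the stale index entry if the row previously had a different value.
        if (old != null && old.containsKey(indexedColumn)) {
            Set<String> keys = index.get(old.get(indexedColumn));
            if (keys != null) {
                keys.remove(rowKey);
            }
        }
        String value = columns.get(indexedColumn);
        if (value != null) {
            index.computeIfAbsent(value, v -> new LinkedHashSet<>()).add(rowKey);
        }
    }

    /** Row keys whose indexed column equals the given value. */
    Set<String> rowKeysWhere(String value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }
}
```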
Data modelling
• A query-driven approach is required
• How do you get data if you can query only by key?
• Denormalize it!
• Create multiple tables for the same data
• Use fast writes so that each query needs as few reads as possible
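The denormalization advice can be sketched as one logical write fanning out to one "table" per query, so that each query becomes a single lookup by key. In this minimal sketch, in-memory maps stand in for column families, and the `DenormalizedUsers` class, its fields and its three queries are hypothetical examples. Keeping the copies consistent on update (for example, removing a user from the old city when the city changes) is the developer's job, which is the integrity trade-off listed among the NoSQL disadvantages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Query-driven, denormalized model: one "table" per query (illustrative only). */
class DenormalizedUsers {
    static final class User {
        final String id;
        final String email;
        final String city;

        User(String id, String email, String city) {
            this.id = id;
            this.email = email;
            this.city = city;
        }
    }

    private final Map<String, User> usersById = new HashMap<>();         // query: get user by id
    private final Map<String, User> usersByEmail = new HashMap<>();      // query: get user by email
    private final Map<String, List<User>> usersByCity = new HashMap<>(); // query: list users in a city

    /** One logical write is duplicated into every query table; writes are cheap. */
    void save(User u) {
        usersById.put(u.id, u);
        usersByEmail.put(u.email, u);
        usersByCity.computeIfAbsent(u.city, c -> new ArrayList<>()).add(u);
    }

    User byId(String id) {
        return usersById.get(id);
    }

    User byEmail(String email) {
        return usersByEmail.get(email);
    }

    List<User> byCity(String city) {
        return usersByCity.getOrDefault(city, List.of());
    }
}
```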
What is Cassandra good for?
Time series data (logs, sensor data)
Write-intensive applications
Applications with a predefined query model
Never use Cassandra
• If you want to replace a traditional RDBMS with it
• If you can’t tell how your data will be queried
• If your workload is read-heavy
• If strong consistency is required (financial, medical areas)
• Cassandra is not a silver bullet
Ways to access data
Thrift
• First and native client. Deprecated.
Hector, Pelops
• Libraries based on Thrift
CQL
• SQL-like language, very limited
Kundera
• ORM/ONM framework
Thrift
• Apache Thrift – a framework for cross-language service development
• Supported languages: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Smalltalk, OCaml and others.
• Developed by Facebook and released in 2007
• Deprecated
Hector
• Hector is a high-level Java client for Apache Cassandra, currently used in a number of production systems.
• Includes a large number of features
Hector main features
• Security – connections using Kerberos
• Integration with the Speed4j monitoring library
• Hector Object Mapper – a simple ORM (not JPA-compliant)
• Connection pooling
• Client-side failover
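Client-side failover, one of the features listed above, can be illustrated generically: try an operation on one host and, if it fails, retry on the next host, rotating the starting host between calls to spread load. This is a minimal sketch with hypothetical names (`FailoverClient`, `execute`); it is not Hector's API, and it is not thread-safe.

```java
import java.util.List;
import java.util.function.Function;

/** Generic client-side failover over a list of hosts (hypothetical, not Hector's API). */
class FailoverClient {
    private final List<String> hosts;
    private int next = 0; // round-robin starting point for the next call

    FailoverClient(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured");
        }
        this.hosts = List.copyOf(hosts);
    }

    /** Runs op against each host at most once; returns the first successful result. */
    <T> T execute(Function<String, T> op) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < hosts.size(); attempt++) {
            String host = hosts.get((next + attempt) % hosts.size());
            try {
                T result = op.apply(host);
                next = (next + attempt + 1) % hosts.size(); // next call starts on the following host
                return result;
            } catch (RuntimeException e) {
                last = e; // host down or timed out: fail over to the next one
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }
}
```

Because every Cassandra node can coordinate any request, a client can fail over to any live node without knowing which replicas own the key.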
CQL
CQL is a SQL-like language introduced in Cassandra 0.8.
It offers the following functionality (JOINs are not supported):
• Creating/dropping keyspaces, column families, columns and rows
• Inserting/retrieving columns
• Indexing
Kundera ORM
Kundera is a “Polyglot Object Mapper”
Supports:
◦ Cassandra
◦ HBase
◦ MongoDB
◦ RDBMS
◦ and others
Kundera ORM
• JPA 2.1 compliant
• Supports cross-datastore persistence
• Supports many-to-many relationships
• Allows adding support for any NoSQL store by implementing the Client Extension
Performance Comparison
Benchmarked on an Amazon EC2 large instance running Ubuntu:
◦ 7.5 GB memory
◦ 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
◦ 64-bit platform
Performance Comparison
Concurrent load – 1 record per thread
Threads   Pelops (s)   Hector (s)   Kundera (s)
10        0.148        0.100        0.117
100       0.350        0.363        0.361
1000      1.793        1.885        2.180
10000     11.478       11.480       14.262
40000     38.887       37.241       41.977
50000     48.646       47.749       49.285
100000    91.280       92.874       97.707
Performance Comparison
[Chart: Concurrent load – 1 record for each thread; time (s) vs. number of threads for Pelops, Hector and Kundera]
Performance Comparison
Concurrent + bulk load – 1000 records per thread
Threads   Pelops (s)   Hector (s)   Kundera (s)
10        5.929        5.286        7.722
100       34.750       32.228       39.124
1000      368.022      352.711      393.931
Performance Comparison
[Chart: Concurrent + bulk load – 1000 records per thread; time (s) vs. number of threads for Kundera, Hector and Pelops]
Cassandra limitations
• The key (and column names) must be smaller than 64 KB.
• The maximum number of columns per row is 2 billion.
• A single column value may not be larger than 2 GB.
• Because Thrift lacks streaming support, all data read in a single request must fit in memory.
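An application can check the size limits above before sending data. The sketch below is a hypothetical client-side guard, not part of any Cassandra driver; it encodes names as UTF-8 to measure their byte length and uses the round "2 billion" figure quoted above for the column count.

```java
import java.nio.charset.StandardCharsets;

/** Client-side guard for the limits listed above (hypothetical helper, not part of any driver). */
final class CassandraLimits {
    static final int MAX_NAME_BYTES = 65_535;                     // keys and column names: < 64 KB
    static final long MAX_COLUMNS_PER_ROW = 2_000_000_000L;       // "2 billion", as stated above
    static final long MAX_VALUE_BYTES = 2L * 1024 * 1024 * 1024;  // 2 GB per column value

    private CassandraLimits() {}

    /** Rejects keys or column names whose UTF-8 encoding is 64 KB or larger. */
    static void checkName(String name) {
        int size = name.getBytes(StandardCharsets.UTF_8).length;
        if (size > MAX_NAME_BYTES) {
            throw new IllegalArgumentException("name is " + size + " bytes; must be < 64 KB");
        }
    }

    /** Rejects column values larger than 2 GB. */
    static void checkValueSize(long valueBytes) {
        if (valueBytes > MAX_VALUE_BYTES) {
            throw new IllegalArgumentException("value is " + valueBytes + " bytes; limit is 2 GB");
        }
    }

    /** Rejects rows that would exceed the column-count limit. */
    static void checkColumnCount(long columnsInRow) {
        if (columnsInRow > MAX_COLUMNS_PER_ROW) {
            throw new IllegalArgumentException("row has " + columnsInRow + " columns; limit is 2 billion");
        }
    }
}
```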
Summary
Great I/O performance
Several data access interfaces
AP data store (CAP)
Production-ready and production-proven
Good for time series data
Extremely available
References
Datastax - http://www.datastax.com/docs/1.1/index
Apache Cassandra - http://cassandra.apache.org/
All Things Distributed - http://www.allthingsdistributed.com/
Hector - http://hector-client.github.com/hector/build/html/index.html
Kundera - https://github.com/impetus-opensource/Kundera
Thank you!

Designing for Privacy in Amazon Web Services
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 

Cassandra for mission critical data

  • 1. Apache Cassandra for mission critical data OLEKSANDR SEMENOV
  • 2. Agenda: 1) CAP Theorem 2) NoSQL vs RDBMS: advantages and disadvantages 3) What is Cassandra? History. 4) Cassandra features 5) Cassandra data model 6) Ways to access data: Thrift, CQL, Kundera ORM
  • 3. What is NoSQL? NoSQL does not mean "Not SQL"; it means "Not Only SQL" or "Not Relational Database".
  • 4. CAP Theorem: you can choose only two of Consistency, Availability and Partition tolerance.
  • 6. Choosing AP data stores: Cassandra is an AP store.
  • 7. RDBMS: + Strong mathematical basis + Referential integrity + ACID transactions + Standard SQL + Well-known approaches to data modeling - Poor performance with large data volumes - Scaling issues
  • 8. NoSQL: + Great performance + Flexible data schema + Easy scaling - Data redundancy - Integrity must be ensured by the developer in most cases - Different access interfaces for different stores - Paradigm shift required - BASE consistency model instead of ACID transactions
  • 9. ACID consistency model: • Atomicity: transactions are all or nothing • Consistency: data written is valid according to all defined rules • Isolation: concurrent transactions do not affect each other • Durability: data, once written, will not be lost
  • 12. What is Cassandra? Cassandra is a non-relational, highly scalable, decentralized, eventually consistent key-multivalue store.
  • 16. Cassandra Features: • Decentralized: each node has the same role and can process any request • Replication: Cassandra supports multi-datacenter replication • Scalable: read and write throughput both increase linearly as new machines are added • Durable: data, once written, survives hardware failure
  • 17. Cassandra Features: • Fault-tolerant: data is automatically replicated to multiple nodes • Tunable consistency: you can choose the desired consistency level • CQL: SQL-like query language • Very fast I/O: both reads and writes are very fast
  • 19. Availability: Cassandra has no SPOF (single point of failure) • Each node can act as a router (coordinator) • Data is replicated to several nodes according to the replication factor
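How "replicated to several nodes according to the replication factor" can work is easiest to see on a token ring. Below is a minimal sketch, assuming SimpleStrategy-style placement with one token per node and an MD5-based token (similar in spirit to RandomPartitioner): the first replica goes to the first node whose token is at or after the key's token, and the remaining RF-1 replicas go to the next nodes clockwise. `TokenRing` and `token_for` are hypothetical names; this is not Cassandra's code, and rack- or datacenter-aware placement is not modelled.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a row key to a position on the ring using an MD5 hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring model in which every node owns exactly one token."""

    def __init__(self, node_tokens):
        # Keep (token, node) pairs sorted by token so we can walk the ring.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, tok, rf):
        # First replica: first node whose token is >= tok, wrapping past the end.
        start = bisect.bisect_left(self.tokens, tok) % len(self.entries)
        # Remaining replicas: the next rf - 1 nodes clockwise (capped at cluster size).
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, key, rf):
        return self.replicas_for_token(token_for(key), rf)


ring = TokenRing({"A": 0, "B": 2**126, "C": 2**127, "D": 3 * 2**126})
print(ring.replicas_for_token(1, 3))  # ['B', 'C', 'D']
print(ring.replicas("user:42", 3))    # three consecutive nodes on the ring
```

Because every node can compute the same replica list from the key alone, any node can coordinate a request without a central router.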
  • 22. Tunable consistency: the consistency level can be set on a per-operation basis
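Because the level is chosen per operation, a common rule of thumb is that a read is guaranteed to overlap the latest successful write when R + W > RF, where R and W are the numbers of replicas that must answer the read and acknowledge the write. The sketch below only does that arithmetic, assuming a single datacenter; it covers the levels ONE, TWO, THREE, QUORUM and ALL and deliberately leaves out ANY, LOCAL_QUORUM and EACH_QUORUM. The function names are hypothetical.

```python
def replicas_required(level, rf):
    """Number of replica responses a consistency level needs for a given RF."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3, "ALL": rf}
    if level == "QUORUM":
        needed = rf // 2 + 1  # a strict majority of the replicas
    elif level in fixed:
        needed = fixed[level]
    else:
        raise ValueError(f"level not modelled in this sketch: {level}")
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def reads_see_latest_write(read_level, write_level, rf):
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True: 2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", 3))        # False: 1 + 1 <= 3
```

QUORUM for both reads and writes is the usual compromise: with RF = 3 it tolerates one unavailable replica on each side while still satisfying R + W > RF.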
  • 23. Write path in Cassandra: • A write can be sent to any node, which then acts as the coordinator and forwards it to the replicas • On each replica, data is written to the commit log (for durability) and then to the memtable • The memtable is periodically flushed to disk as an SSTable, and a new memtable is created in memory • Deletes are a special case of writes: they write tombstones
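The replica-side steps of the write path can be sketched on a single node. This is a minimal model, assuming a memtable that flushes after a fixed number of entries (real Cassandra flushes based on memory thresholds and configuration); `WritePathNode`, `TOMBSTONE` and `memtable_limit` are hypothetical names, and compaction, commit log recycling and tombstone expiry are not modelled. A small `read` method is included to show why a tombstone hides older data still sitting in an SSTable.

```python
TOMBSTONE = object()  # marker stored in place of a value when a column is deleted


class WritePathNode:
    """Hypothetical single-node model of the commit log -> memtable -> SSTable flow."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only log, replayed after a crash
        self.memtable = {}  # (row_key, column) -> (value, timestamp), in memory
        self.sstables = []  # immutable on-disk tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Then apply the write to the memtable, keeping the newer timestamp.
        current = self.memtable.get((row_key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(row_key, column)] = (value, timestamp)
        # 3. Flush once the memtable is "full" (here: a simple entry count).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, row_key, column, timestamp):
        # A delete is just a write whose value is a tombstone.
        self.write(row_key, column, TOMBSTONE, timestamp)

    def flush(self):
        # Persist the memtable as a sorted, immutable SSTable and start a new memtable.
        self.sstables.append(dict(sorted(self.memtable.items(), key=lambda item: item[0])))
        self.memtable = {}

    def read(self, row_key, column):
        # Merge the memtable and all SSTables; the newest timestamp wins.
        candidates = [table[(row_key, column)] for table in [self.memtable, *self.sstables]
                      if (row_key, column) in table]
        if not candidates:
            return None
        value, _ = max(candidates, key=lambda cell: cell[1])
        return None if value is TOMBSTONE else value
```

No step rewrites data already on disk: every write, including a delete, is an append plus an in-memory update, which is why writes are cheap.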
  • 24. Read path in Cassandra: • Any node can be queried; it acts as the coordinator • The coordinator contacts the node(s) holding the requested key • If the consistency level is below ALL, read repair is performed in the background • (Diagram: read at consistency level ONE)
  • 25. Read repair: • Read repair means that when a query is made against a given key, we perform a digest query against all the replicas of the key and push the most recent version to any out-of-date replicas.
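The repair step can be sketched with replicas modelled as plain dictionaries mapping a key to a `(value, timestamp)` cell. This is a simplification: here the coordinator sees every replica's full cell and compares digests locally, whereas Cassandra requests the data from one replica and only digests from the others, and whether repair runs in the foreground or background depends on the consistency level and configuration. `digest` and `read_with_repair` are hypothetical names; ties on timestamp are not handled the way Cassandra handles them.

```python
import hashlib


def digest(cell):
    """Hash of a (value, timestamp) cell, standing in for a replica's digest response."""
    value, timestamp = cell
    return hashlib.md5(f"{value!r}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and fix stale replicas."""
    cells = [replica.get(key) for replica in replicas]
    present = [cell for cell in cells if cell is not None]
    if not present:
        return None
    # Last write wins: the cell with the highest timestamp is the current version.
    newest = max(present, key=lambda cell: cell[1])
    newest_digest = digest(newest)
    for replica, cell in zip(replicas, cells):
        # A missing cell or a digest mismatch marks the replica as out of date.
        if cell is None or digest(cell) != newest_digest:
            replica[key] = newest
    return newest[0]


r1, r2, r3 = {"k": ("old", 1)}, {"k": ("new", 5)}, {}
print(read_with_repair([r1, r2, r3], "k"))  # new
```

Digests keep the common case cheap: when all replicas agree, only short hashes cross the network.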
  • 28. Columns: a column is a tuple of three fields: name, value and timestamp
  • 29. Special column types: • Expiring columns – columns that are removed automatically once their TTL (time to live) has passed • Counter columns – columns whose value is changed by increment and decrement operations • SuperColumns – columns that contain other columns. Deprecated.
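The column tuple and the two non-deprecated special types can be sketched as small data classes. This is an illustrative model only, assuming timestamps in seconds (clients usually send microseconds) and measuring the TTL from the column timestamp for simplicity, whereas Cassandra computes expiration on the server. The counter is modelled as per-replica shards whose sum is the value, which is a simplification of how distributed counters are stored. `Column`, `reconcile` and `CounterColumn` are hypothetical names; SuperColumns are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Column:
    """A column as described on the slide: name, value and timestamp (plus optional TTL)."""
    name: str
    value: bytes
    timestamp: int  # seconds in this sketch
    ttl: Optional[int] = None  # set only for expiring columns

    def is_live(self, now):
        # An expiring column disappears once its TTL has elapsed.
        return self.ttl is None or now < self.timestamp + self.ttl


def reconcile(a, b):
    """Two versions of the same column: the higher timestamp wins (last write wins)."""
    return a if a.timestamp >= b.timestamp else b


@dataclass
class CounterColumn:
    """Counter modelled as per-replica shards whose sum is the value (simplified)."""
    name: str
    shards: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica, delta=1):
        self.shards[replica] = self.shards.get(replica, 0) + delta

    @property
    def value(self):
        return sum(self.shards.values())
```

The timestamp is what makes replicas converge: whichever replica a version comes from, `reconcile` picks the same winner.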
  • 31. Indexes: • Primary index – an index on the row key of each row • Secondary index – an index on column values; it must be created explicitly. Good only for low-cardinality columns, for example a Gender column that can hold only two values, M and F, and this restriction is a problem • Index building is performed in the background
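The difference between the two index kinds can be sketched as two maps: the primary index goes from row key to row, and a secondary index goes from a column value to the set of row keys holding it. In this model the number of entries in a secondary index equals the column's cardinality, so for a two-valued column such as Gender each entry points at roughly half of the rows. `IndexedTable` and its methods are hypothetical names, and the index is built synchronously here rather than in the background.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical table: a primary index by row key plus optional secondary indexes."""

    def __init__(self):
        self.rows = {}  # primary index: row key -> {column: value}
        self.indexes = {}  # secondary indexes: column -> {value: set of row keys}

    def create_index(self, column):
        # Secondary indexes are created explicitly; existing rows are indexed
        # here in one pass (Cassandra builds them in the background).
        index = defaultdict(set)
        for key, columns in self.rows.items():
            if column in columns:
                index[columns[column]].add(key)
        self.indexes[column] = index

    def insert(self, key, columns):
        row = self.rows.setdefault(key, {})
        for column, value in columns.items():
            index = self.indexes.get(column)
            if index is not None and column in row:
                index[row[column]].discard(key)  # drop the stale index entry
            row[column] = value
            if index is not None:
                index[value].add(key)

    def get(self, key):
        return self.rows.get(key)  # lookup through the primary index

    def where(self, column, value):
        if column not in self.indexes:
            raise LookupError(f"no secondary index on {column!r}")
        return set(self.indexes[column].get(value, set()))
```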
  • 32. Data modelling: • A query-driven approach is required • How do you get data if you can query only by key? • Denormalize it! • Create multiple tables for the same data • Use fast writes to keep the number of reads as small as possible
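The query-driven idea can be illustrated with plain dictionaries standing in for tables: each query gets its own table keyed by exactly what that query looks up, and one logical insert writes the same data into all of them. `UserStore` and its three "tables" are a hypothetical schema chosen for illustration, not taken from the slides.

```python
class UserStore:
    """Hypothetical query-driven layout: one table per query, each keyed for direct lookup."""

    def __init__(self):
        self.users_by_id = {}  # query: "get user by id"
        self.users_by_email = {}  # query: "get user by email"
        self.users_by_city = {}  # query: "list users in a city" (city -> {id: name})

    def add_user(self, user_id, email, name, city):
        # One logical insert becomes several cheap writes of duplicated data...
        self.users_by_id[user_id] = {"email": email, "name": name, "city": city}
        self.users_by_email[email] = {"user_id": user_id, "name": name}
        self.users_by_city.setdefault(city, {})[user_id] = name

    # ...so that each read is a single lookup by key, with no joins.
    def by_id(self, user_id):
        return self.users_by_id.get(user_id)

    def by_email(self, email):
        return self.users_by_email.get(email)

    def in_city(self, city):
        return self.users_by_city.get(city, {})
```

The price is the redundancy mentioned on the NoSQL slide: changing a user's name now means updating every copy, and keeping the copies in sync is the developer's job.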
  • 33. What is Cassandra good for? • Time series data (logs, sensor data) • Write-intensive applications • Applications with a predefined query model
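Time series fit the model well because each reading is a cheap append and a typical query is a contiguous time range. Below is a minimal sketch of a wide-row layout, assuming one row per sensor per UTC day with readings kept sorted by timestamp inside the row; the day bucket is an assumption chosen to bound row size, and `SensorLog` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class SensorLog:
    """Hypothetical wide-row layout: one row per sensor per day, columns sorted by time."""

    def __init__(self):
        self.rows = defaultdict(list)  # row key -> sorted list of (timestamp, reading)

    @staticmethod
    def row_key(sensor_id, ts):
        # Bucketing by UTC day keeps any single row from growing without bound.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
        return f"{sensor_id}:{day}"

    def append(self, sensor_id, ts, reading):
        # Writes go into the row for the reading's day, kept in timestamp order.
        bisect.insort(self.rows[self.row_key(sensor_id, ts)], (ts, reading))

    def slice(self, sensor_id, start, end):
        # Range read of [start, end) inside the single day bucket containing `start`.
        row = self.rows.get(self.row_key(sensor_id, start), [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return row[lo:hi]
```

A range query that spans several days would read one row per day; the sketch keeps to a single bucket to stay short.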
  • 34. Never use Cassandra: • If you want to replace a traditional RDBMS with it • If you can't tell how your data will be queried • If your workload is read-heavy • If strong consistency is required (financial, medical areas) • Cassandra is not a silver-bullet solution
  • 36. Ways to access data: • Thrift – first and native client; deprecated • Hector, Pelops – libraries based on Thrift • CQL – SQL-like language, very limited • Kundera – ORM/ONM framework
  • 37. Thrift: • Apache Thrift is a framework for cross-language services development • Supported languages: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Smalltalk, OCaml and others • Developed by Facebook and released in 2007 • Deprecated as a Cassandra client interface
  • 38. Hector: • Hector is a high-level Java client for Apache Cassandra, currently used in a number of production systems • Includes a large number of features
  • 39. Hector main features: • Security: connections using Kerberos • Integration with the Speed4j monitoring library • Hector Object Mapper – a simple ORM (not JPA-compliant) • Connection pooling • Client-side failover
  • 40. CQL: CQL is a SQL-like language introduced in Cassandra 0.8. It offers: • Creating/dropping keyspaces, column families, columns and rows • Inserting/retrieving columns • Indexing • But no JOINs
  • 41. Kundera ORM: Kundera is a “Polyglot Object Mapper”. Supports: ◦ Cassandra ◦ HBase ◦ MongoDB ◦ RDBMS ◦ and others
  • 43. Performance Comparison: benchmarked on an Amazon EC2 large instance running Ubuntu: ◦ 7.5 GB memory ◦ 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) ◦ 64-bit platform
  • 44. Performance Comparison: concurrent load, 1 record per thread (time in seconds)
    Threads | Pelops | Hector | Kundera
    10      | 0.148  | 0.100  | 0.117
    100     | 0.350  | 0.363  | 0.361
    1000    | 1.793  | 1.885  | 2.180
    10000   | 11.478 | 11.480 | 14.262
    40000   | 38.887 | 37.241 | 41.977
    50000   | 48.646 | 47.749 | 49.285
    100000  | 91.280 | 92.874 | 97.707
  • 45. Performance Comparison (chart): concurrent load, 1 record per thread; time (s) vs. number of threads (10 to 100000); series: Pelops, Hector, Kundera
  • 46. Performance Comparison: concurrent + bulk load, 1000 records per thread (time in seconds)
    Threads | Pelops  | Hector  | Kundera
    10      | 5.929   | 5.286   | 7.722
    100     | 34.750  | 32.228  | 39.124
    1000    | 368.022 | 352.711 | 393.931
  • 47. Performance Comparison (chart): concurrent + bulk load, 1000 records per thread; time (s) vs. number of threads (10 to 1000); series: Kundera, Hector, Pelops
  • 48. Cassandra limitations: • The key (and column names) must be smaller than 64 KB • The maximum number of columns per row is 2 billion • A single column value may not be larger than 2 GB • All data returned by a single read must fit in memory, because Thrift does not support streaming
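The size limits translate directly into checks a client could run before writing. This is a minimal sketch that takes the figures exactly as stated on the slide for Thrift-era Cassandra (other versions may differ); it describes a row only by byte sizes and a column count, so nothing large has to be allocated. The constant and function names are hypothetical, and the in-memory read limitation is not modelled.

```python
KB = 1024
GB = 1024 ** 3

# Limits as stated on the slide for Thrift-era Cassandra; other versions may differ.
MAX_KEY_BYTES = 64 * KB - 1  # keys and column names must be smaller than 64 KB
MAX_COLUMNS_PER_ROW = 2_000_000_000  # "2 billion" columns per row
MAX_VALUE_BYTES = 2 * GB  # a single column value may not be larger than 2 GB


def limit_violations(key_len, column_count, longest_name_len, largest_value_len):
    """Check a row, described only by sizes in bytes, against the slide's limits."""
    problems = []
    if key_len > MAX_KEY_BYTES:
        problems.append(f"row key is {key_len} bytes (limit {MAX_KEY_BYTES})")
    if longest_name_len > MAX_KEY_BYTES:
        problems.append(f"column name is {longest_name_len} bytes (limit {MAX_KEY_BYTES})")
    if column_count > MAX_COLUMNS_PER_ROW:
        problems.append(f"{column_count} columns in one row (limit {MAX_COLUMNS_PER_ROW})")
    if largest_value_len > MAX_VALUE_BYTES:
        problems.append(f"column value is {largest_value_len} bytes (limit {MAX_VALUE_BYTES})")
    return problems


print(limit_violations(16, 10, 8, 1024))  # []
```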
  • 49. Summary: • Great I/O performance • Several data access interfaces • AP data store (CAP) • Production-ready and production-proven • Good for time series data • Extremely available
  • 50. References: • DataStax – http://www.datastax.com/docs/1.1/index • Apache Cassandra – http://cassandra.apache.org/ • All Things Distributed – http://www.allthingsdistributed.com/ • Hector – http://hector-client.github.com/hector/build/html/index.html • Kundera – https://github.com/impetus-opensource/Kundera

Editor's Notes

  1. Consistency: all the servers in the system have the same data, so anyone using the system gets the same copy regardless of which server answers their request. Availability: the system always responds to a request (even if the response is not the latest data, is not consistent across the system, or is just a message saying the system isn't working). Partition tolerance: the system continues to operate as a whole even if individual servers fail or can't be reached.
  2. Cassandra was developed at Facebook by one of the authors of Amazon's Dynamo to solve the inbox search problem. In 2008 Cassandra was released as an open-source project on Google Code. In 2010 Cassandra became a top-level Apache project.