SlideShare a Scribd company logo
Cassandra
Introduction & Key Features
Meetup Vienna Cassandra Users
13th of January 2014
philipp.potisk@geroba.com
Definition
Apache Cassandra is an open source, distributed,
decentralized, elastically scalable, highly available,
fault-tolerant, tuneably consistent, column-oriented
database that bases its distribution design on Amazon’s
Dynamo and its data model on Google’s Bigtable.
Created at Facebook, it is now used at some of the most
popular sites on the Web [The Definitive Guide, Eben
Hewitt, 2010]
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

2
History
Dynamo, 2007

Bigtable, 2006

OpenSource, 2008

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

3
Key Features

Distributed
and
Decentralized
High Performance

CQL – A SQL
like query
interface

Elastic
Scalability

Cassandra

Columnoriented
Key-Value
store
13/01/2014

High
Availability
and Fault
Tolerance

Tuneable
Consistency

Cassandra Introduction & Key Features by Philipp Potisk

4
Distributed and Decentralized
Datacenter 1

• Distributed: Capable of running
on multiple machines
• Decentralized: No single point of
failure
No master-slave issues due to
peer-to-peer architecture
(protocol "gossip")
Single Cassandra cluster may run
across geographically dispersed
data centers
13/01/2014

Datacenter 2

1

7

6

2

5

3

4

12

8

11

9
10

Read- and writerequests to any node

Cassandra Introduction & Key Features by Philipp Potisk

5
Elastic Scalability

1
8

1

• Cassandra scales horizontally,
adding more machines that have
all or some of the data on
• Adding of nodes increase
performance throughput linearly
• De-/ and increasing the
nodecount happen seamlessly

4 Performance
2
throughput = N
3

2

Performance
throughput = N x 2

7

4

6
5

Linearly scales to
terabytes and
petabytes of data
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

3

6
Scaling Benchmark By Netflix*
48, 96, 144 and 288
instances, with 10, 20,
30 and 60 clients
respectively. Each client
generated ~20.000w/s
having 400byte in size

Cassandra scales linearly far
beyond our current capacity
requirements, and very
rapid deployment
automation makes it easy to
manage. In particular,
benchmarking in the cloud
is fast, cheap and scalable,

*http://techblog.netflix.com/201
1/11/benchmarking-cassandrascalability-on.html
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

7
High Availability and Fault Tolerance
• High Availability?
Multiple networked computers
operating in a cluster
Facility for recognizing node
failures
Forward failing over requests to
another part of the system

1
6

2

5

3
4

• Cassandra has High Availability

No single point of failure
due to the peer-to-peer
architecture
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

8
Tunable Consistency
• Choose between strong and eventual
consistency
• Adjustable for read- and writeoperations separately
• Conflicts are solved during reads, as
focus lies on write-performance

TUNABLE

Available

Consistency

Use case dependent
level of consistency
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

9
When do we have strong consistency?
• Simple Formula:

jsmith

(nodes_written + nodes_read) >
replication_factor
jsmith

t1
t2

NW: 2
NR: 2
RF: 3

t1
t2

jsmith

t1

• Ensures that a read always
reflects the most recent write
• If not: Weak consistency
 Eventually consistent
jsmith

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

t2
10
Column-oriented Key-Value Store
Row Key1

Column
Key1
Column
Value1

Column
Key2
Column
Value2

Column
Key3
Column
Value3

…
…

…

• Data is stored in sparse
multidimensional hash tables
• A row can have multiple columns –
not necessarily the same amount of
columns for each row
• Each row has a unique key, which
also determines partitioning
• No relations!

Stored sorted by row key *

Stored sorted by column key/value

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
* Row keys (partition keys) should be hashed, in order to distribute data across the cluster evenly
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

11
CQL – An SQL-like query interface
• “CQL 3 is the default and primary interface into the Cassandra DBMS” *
• Familiar SQL-like syntax that maps to Cassandras storage engine and
simplifies data modelling
CRETE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob,
tags set<text>
);

INSERT INTO songs
(id, title, artist,
album, tags)
VALUES(
'a3e64f8f...',
'La Grange',
'ZZ Top',
'Tres Hombres'‚
{'cool', 'hot'});

SELECT *
FROM songs
WHERE id = 'a3e64f8f...';

“SQL-like” but NOT
relational SQL

* http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

12
High Performance
• Optimized from the ground up
for high throughput
• All disk writes are sequential,
append only operations
• No reading before writing
• Cassandra`s threading-concept is
optimized for running on
multiprocessor/ multicore
machines
13/01/2014

Optimized for writing,
but fast reads are
possible as well

Cassandra Introduction & Key Features by Philipp Potisk

13
Benchmark from 2011 (Cassandra 0.7.4)*
ops
Cassandra showed
outstanding throughput in
“INSERT-only” with 20,000
ops

Insert: Enter 50 million 1K-sized records
Read: Search key for a one hour period + optional update
Hardware: Nehalem 6 Core x 2 CPU, 16GB Memory
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

*NoSql Benchmarking by Curbit
http://www.cubrid.org/blog/de
v-platform/nosqlbenchmarking/
14
Benchmark from 2013 (Cassandra 1.1.6)*

* Benchmarking Top NoSQL Databases by End Point Corporation,
http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

15
When do we need these features?
Lots of
Writes,
Statistics, and
Analysis

Geographical
Distribution

Large
Deployments

13/01/2014

Evolving
Applications

Cassandra Introduction & Key Features by Philipp Potisk

16
Who is using Cassandra?

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

17
ebay Data Infrastructure*
•
•
•
•
•
•

Thousands of nodes
> 2K sharded logical host
> 16K tables
> 27K indexes
> 140 billion SQLs/day
> 5 PB provisioned

• 10+ clusters
• 100+ nodes
• > 250 TB provisioned
(local HDD + shared SSD)
• > 9 billion writes/day
• > 5 billion reads/day

• Hundreds of nodes
• Persistent & in-memory
• > 40 billion SQLs/day

Not replacing RDMBS but
complementing!

Hundreds of nodes
> 50 TB
> 2 billion ops/day

• Thousands of nodes
• The world largest cluster
with 2K+ nodes

*by Jay Patel, Cassandra Summit June 2013 San Francisco
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

18
Cassandra Use Case at Ebay
Application/Use Case
• Time-series data and real-time insights
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & Shipment Tracking
•…
• Server metrics collection
• Taste graph-based next-gen recommendation
system
• Social Signals on eBay Product & Item pages
13/01/2014

Why Cassandra?
• Multi-Datacenter (active-active)
• No SPOF
• Easy to scale
• Write performance
• Distributed Counters

Cassandra Introduction & Key Features by Philipp Potisk

19
Cassandra/Hadoop Deployment

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

20
Summary
• History
• Key features of Cassandra
•
•
•
•
•
•
•

Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tunable Consistency
Column-oriented key-value store
CQL interface
High Performance

• Ebay Use Case
13/01/2014

Apache project: http://cassandra.apache.org

Community portal: http://planetcassandra.org

Documentation: http://www.datastax.com/docs

Cassandra Introduction & Key Features by Philipp Potisk

21

More Related Content

What's hot

Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
Kai Sasaki
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
Ashnikbiz
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
DataStax Academy
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
Akash Vacher
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafka
punesparkmeetup
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
dave_revell
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Data Con LA
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
Karthik .P.R
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
DataStax Academy
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
Sky Yin
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
DataStax Academy
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Shivji Kumar Jha
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
Maris Elsins
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
Venu Ryali
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
Erik Onnen
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
Ayyappadas Ravindran (Appu)
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java Enterprise
Alexander Lukyanchikov
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
Karthik .P.R
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
Lynn Langit
 

What's hot (20)

Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafka
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java Enterprise
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
 

Viewers also liked

System Center 2012 - January Licensing Update
System Center 2012 - January Licensing UpdateSystem Center 2012 - January Licensing Update
System Center 2012 - January Licensing Update
Softchoice Corporation
 
SQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni ÖzelliklerSQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni Özelliklerturgaysahtiyan
 
Softchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 ChangesSoftchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Corporation
 
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
Softchoice Corporation
 
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter ServerNordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
Andrea Mauro
 
Limewood Event - VMware
Limewood Event - VMware Limewood Event - VMware
Limewood Event - VMware
BlueChipICT
 
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOLVMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
gguglie
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findwise
 
Site Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturaleSite Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturale
gguglie
 
SQL Server Performans İpuçları
SQL Server Performans İpuçlarıSQL Server Performans İpuçları
SQL Server Performans İpuçları
turgaysahtiyan
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken Cochrane
dotCloud
 
vCenter and ESXi network port communications
vCenter and ESXi network port communicationsvCenter and ESXi network port communications
vCenter and ESXi network port communications
Animesh Dixit
 
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive AdvantageVirtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Softchoice Corporation
 
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld
 
Working Hard or Hardly Networked?
Working Hard or Hardly Networked?Working Hard or Hardly Networked?
Working Hard or Hardly Networked?
Softchoice Corporation
 
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
Vinh Nguyen
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with Hydra
Markus Lanthaler
 
Getting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMSGetting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMS
Softchoice Corporation
 
How to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 secondsHow to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 seconds
Positive Hack Days
 
InfoGrid Core Ideas
InfoGrid Core IdeasInfoGrid Core Ideas
InfoGrid Core Ideas
InfoGrid.org
 

Viewers also liked (20)

System Center 2012 - January Licensing Update
System Center 2012 - January Licensing UpdateSystem Center 2012 - January Licensing Update
System Center 2012 - January Licensing Update
 
SQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni ÖzelliklerSQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni Özellikler
 
Softchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 ChangesSoftchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 Changes
 
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
 
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter ServerNordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
 
Limewood Event - VMware
Limewood Event - VMware Limewood Event - VMware
Limewood Event - VMware
 
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOLVMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
 
Site Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturaleSite Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturale
 
SQL Server Performans İpuçları
SQL Server Performans İpuçlarıSQL Server Performans İpuçları
SQL Server Performans İpuçları
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken Cochrane
 
vCenter and ESXi network port communications
vCenter and ESXi network port communicationsvCenter and ESXi network port communications
vCenter and ESXi network port communications
 
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive AdvantageVirtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
 
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere Replication
 
Working Hard or Hardly Networked?
Working Hard or Hardly Networked?Working Hard or Hardly Networked?
Working Hard or Hardly Networked?
 
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with Hydra
 
Getting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMSGetting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMS
 
How to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 secondsHow to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 seconds
 
InfoGrid Core Ideas
InfoGrid Core IdeasInfoGrid Core Ideas
InfoGrid Core Ideas
 

Similar to Cassandra Introduction & Features

NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
Clarence J M Tauro
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
Christian Johannsen
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
ScyllaDB
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Simon Ambridge
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
Maris Elsins
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
nehabsairam
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
Oleksandr Semenov
 
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ? Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Swiss Data Forum Swiss Data Forum
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructure
Emiliano Fusaglia
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
Brian Enochson
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
SergioBruno21
 
Cassandra
Cassandra Cassandra
Cassandra
Pooja GV
 
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
DataStax Academy
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
Evan Chan
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
Mohammed Fazuluddin
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
Evan Chan
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Victor Coustenoble
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
Yousun Jeong
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 

Similar to Cassandra Introduction & Features (20)

NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ? Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructure
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Cassandra
Cassandra Cassandra
Cassandra
 
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 

Recently uploaded

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 

Recently uploaded (20)

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 

Cassandra Introduction & Features

  • 1. Cassandra Introduction & Key Features Meetup Vienna Cassandra Users 13th of January 2014 philipp.potisk@geroba.com
  • 2. Definition Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web [The Definitive Guide, Eben Hewitt, 2010] 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 2
  • 3. History Dynamo, 2007 Bigtable, 2006 OpenSource, 2008 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 3
  • 4. Key Features Distributed and Decentralized High Performance CQL – A SQL like query interface Elastic Scalability Cassandra Columnoriented Key-Value store 13/01/2014 High Availability and Fault Tolerance Tuneable Consistency Cassandra Introduction & Key Features by Philipp Potisk 4
  • 5. Distributed and Decentralized Datacenter 1 • Distributed: Capable of running on multiple machines • Decentralized: No single point of failure No master-slave issues due to peer-to-peer architecture (protocol "gossip") Single Cassandra cluster may run across geographically dispersed data centers 13/01/2014 Datacenter 2 1 7 6 2 5 3 4 12 8 11 9 10 Read- and writerequests to any node Cassandra Introduction & Key Features by Philipp Potisk 5
  • 6. Elastic Scalability 1 8 1 • Cassandra scales horizontally, adding more machines that have all or some of the data on • Adding of nodes increase performance throughput linearly • De-/ and increasing the nodecount happen seamlessly 4 Performance 2 throughput = N 3 2 Performance throughput = N x 2 7 4 6 5 Linearly scales to terabytes and petabytes of data 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 3 6
  • 7. Scaling Benchmark By Netflix* 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively. Each client generated ~20.000w/s having 400byte in size Cassandra scales linearly far beyond our current capacity requirements, and very rapid deployment automation makes it easy to manage. In particular, benchmarking in the cloud is fast, cheap and scalable, *http://techblog.netflix.com/201 1/11/benchmarking-cassandrascalability-on.html 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 7
  • 8. High Availability and Fault Tolerance • High Availability? Multiple networked computers operating in a cluster Facility for recognizing node failures Forward failing over requests to another part of the system 1 6 2 5 3 4 • Cassandra has High Availability No single point of failure due to the peer-to-peer architecture 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 8
  • 9. Tunable Consistency • Choose between strong and eventual consistency • Adjustable for read- and writeoperations separately • Conflicts are solved during reads, as focus lies on write-performance TUNABLE Available Consistency Use case dependent level of consistency 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 9
  • 10. When do we have strong consistency? • Simple Formula: jsmith (nodes_written + nodes_read) > replication_factor jsmith t1 t2 NW: 2 NR: 2 RF: 3 t1 t2 jsmith t1 • Ensures that a read always reflects the most recent write • If not: Weak consistency  Eventually consistent jsmith 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk t2 10
  • 11. Column-oriented Key-Value Store Row Key1 Column Key1 Column Value1 Column Key2 Column Value2 Column Key3 Column Value3 … … … • Data is stored in sparse multidimensional hash tables • A row can have multiple columns – not necessarily the same amount of columns for each row • Each row has a unique key, which also determines partitioning • No relations! Stored sorted by row key * Stored sorted by column key/value Map<RowKey, SortedMap<ColumnKey, ColumnValue>> * Row keys (partition keys) should be hashed, in order to distribute data across the cluster evenly 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 11
  • 12. CQL – An SQL-like query interface • “CQL 3 is the default and primary interface into the Cassandra DBMS” * • Familiar SQL-like syntax that maps to Cassandras storage engine and simplifies data modelling CRETE TABLE songs ( id uuid PRIMARY KEY, title text, album text, artist text, data blob, tags set<text> ); INSERT INTO songs (id, title, artist, album, tags) VALUES( 'a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres'‚ {'cool', 'hot'}); SELECT * FROM songs WHERE id = 'a3e64f8f...'; “SQL-like” but NOT relational SQL * http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 12
  • 13. High Performance • Optimized from the ground up for high throughput • All disk writes are sequential, append only operations • No reading before writing • Cassandra`s threading-concept is optimized for running on multiprocessor/ multicore machines 13/01/2014 Optimized for writing, but fast reads are possible as well Cassandra Introduction & Key Features by Philipp Potisk 13
  • 14. Benchmark from 2011 (Cassandra 0.7.4)* ops Cassandra showed outstanding throughput in “INSERT-only” with 20,000 ops Insert: Enter 50 million 1K-sized records Read: Search key for a one hour period + optional update Hardware: Nehalem 6 Core x 2 CPU, 16GB Memory 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk *NoSql Benchmarking by Curbit http://www.cubrid.org/blog/de v-platform/nosqlbenchmarking/ 14
  • 15. Benchmark from 2013 (Cassandra 1.1.6)* * Benchmarking Top NoSQL Databases by End Point Corporation, http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 15
  • 16. When do we need these features? Lots of Writes, Statistics, and Analysis Geographical Distribution Large Deployments 13/01/2014 Evolving Applications Cassandra Introduction & Key Features by Philipp Potisk 16
  • 17. Who is using Cassandra? 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 17
  • 18. ebay Data Infrastructure* • • • • • • Thousands of nodes > 2K sharded logical host > 16K tables > 27K indexes > 140 billion SQLs/day > 5 PB provisioned • 10+ clusters • 100+ nodes • > 250 TB provisioned (local HDD + shared SSD) • > 9 billion writes/day • > 5 billion reads/day • Hundreds of nodes • Persistent & in-memory • > 40 billion SQLs/day Not replacing RDMBS but complementing! Hundreds of nodes > 50 TB > 2 billion ops/day • Thousands of nodes • The world largest cluster with 2K+ nodes *by Jay Patel, Cassandra Summit June 2013 San Francisco 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 18
  • 19. Cassandra Use Case at Ebay Application/Use Case • Time-series data and real-time insights • Fraud detection & prevention • Quality Click Pricing for affiliates • Order & Shipment Tracking •… • Server metrics collection • Taste graph-based next-gen recommendation system • Social Signals on eBay Product & Item pages 13/01/2014 Why Cassandra? • Multi-Datacenter (active-active) • No SPOF • Easy to scale • Write performance • Distributed Counters Cassandra Introduction & Key Features by Philipp Potisk 19
  • 21. Summary • History • Key features of Cassandra • • • • • • • Distributed and Decentralized Elastic Scalability High Availability and Fault Tolerance Tunable Consistency Column-oriented key-value store CQL interface High Performance • Ebay Use Case 13/01/2014 Apache project: http://cassandra.apache.org Community portal: http://planetcassandra.org Documentation: http://www.datastax.com/docs Cassandra Introduction & Key Features by Philipp Potisk 21