tyfs.rocks, 26.07.2017
tayfun.sevimli
The History of Cassandra
Where is Cassandra?
Cassandra Architecture – CAP Theorem
Cassandra was designed to fall on the "AP" side of the CAP theorem,
which states that a distributed system can guarantee only two of the
following three properties at the same time: Consistency, Availability
and Partition Tolerance. Cassandra is therefore a good fit for a
solution that needs a distributed database with high availability and
that must keep working when some nodes in the cluster are offline or
unreachable, which is common in distributed systems.
Cassandra Architecture – Data Model
Cassandra is classified as a column-based database: its basic storage
structure is a set of columns, each of which is a pair of column key and
column value. Every row is identified by a unique key, called the partition
key (limited to 64 KB in size). Each set of columns is called a column
family, similar to a table in a relational database.
Cassandra Architecture – Data Model
SortedMap<RowKey,SortedMap<ColumnKey, ColumnValue>>
• A map gives efficient key lookup, and the sorted nature gives efficient scans. In Cassandra, we can use row keys and column keys to do efficient lookups and range scans.
• The number of column keys is unbounded, so you can have wide rows.
• A column key can itself carry the data. In other words, you can have a valueless column.
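The nested sorted-map view above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's storage code: the `SortedMap` class, the `write` helper and the sample keys are all hypothetical, and row keys are kept in plain key order here, whereas Cassandra's default partitioner orders rows by a hashed token.

```python
from bisect import bisect_left, insort


class SortedMap:
    """Toy sorted map: a dict for lookups plus a sorted key list for scans."""

    def __init__(self):
        self._data = {}
        self._keys = []  # always kept in sorted order

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)  # insert while preserving sort order
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def keys(self):
        return list(self._keys)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


# SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>
rows = SortedMap()


def write(row_key, column_key, value=None):
    """Store one column; value=None models a valueless column."""
    columns = rows.get(row_key)
    if columns is None:
        columns = SortedMap()
        rows.put(row_key, columns)
    columns.put(column_key, value)


# A wide row: one row key, many time-ordered column keys.
write("sensor-1", "2017-07-26T10:00", 21.5)
write("sensor-1", "2017-07-26T10:05", 21.7)
write("sensor-1", "2017-07-26T11:00", 22.1)
# A valueless column: the column key itself is the data.
write("followers:alice", "bob")

# Range scan over column keys within a single row.
morning = rows.get("sensor-1").range("2017-07-26T10:00", "2017-07-26T11:00")
```

The dict gives the key lookup and the sorted key list gives the range scan, which is the combination the slide attributes to the sorted map.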
Cassandra Architecture – Write Path
Cassandra Write Path
• Every node first writes the mutation to the commit log and then writes it to the memtable.
• Writing to the commit log makes the write durable: the memtable is an in-memory structure and reaches disk only when it is flushed. A memtable is flushed to disk when:
  • it reaches its maximum allocated size in memory,
  • the maximum number of minutes a memtable may stay in memory elapses, or
  • a user flushes it manually.
• A memtable is flushed to an immutable structure called an SSTable (Sorted String Table). The commit log is replayed to recover data that was in the memtable but lost due to a node failure.
• Each SSTable is written to disk as a set of files, including a bloom filter, a key index and a data file.
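The write path described above can be sketched as a toy model. Everything here is an assumption made for illustration: the `Node` class, its method names and the entry-count flush threshold are hypothetical, the "commit log" is an in-process list rather than a file, and clearing the log on flush simplifies how real commit log segments are recycled.

```python
class Node:
    """Toy write path: commit log first, then memtable, flush to SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory only, lost when the node crashes
        self.sstables = []     # immutable, key-sorted snapshots
        self.memtable_limit = memtable_limit  # hypothetical size threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability
        self.memtable[key] = value            # 2. fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # size-triggered flush

    def flush(self):
        """Turn the memtable into an immutable, sorted SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def crash_and_recover(self):
        """Lose the memtable, then rebuild it by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Calling `flush()` directly corresponds to the manual flush in the list above; a time-based flush would call the same method from a timer.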
Cassandra Architecture – Read Path
Cassandra Read Path
• Every column family stores its data in a number of SSTables, so the data for a particular row can be spread across several SSTables and the memtable. For every read request, Cassandra therefore reads from all applicable SSTables of the column family and scans the memtable for applicable data fragments. The fragments are then merged and returned to the coordinator.
• If the contacted replicas hold different versions of the data, the coordinator returns the latest version to the client and issues a read repair to the node or nodes with the older version. The read repair pushes the newer version of the data to those nodes.
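A rough sketch of the read path above, under simplifying assumptions: each replica is a plain dict with a `memtable` and a list of `sstables`, every stored value is a `(value, timestamp)` pair, and the newest timestamp wins. The function names and the sample data are hypothetical, and bloom filters and key indexes are left out.

```python
def read_local(key, memtable, sstables):
    """Merge fragments on one node: newest (value, timestamp) or None."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        if key in sstable:
            candidates.append(sstable[key])
    if not candidates:
        return None
    return max(candidates, key=lambda vt: vt[1])  # highest timestamp wins


def coordinate_read(key, replicas):
    """Ask every replica, return the latest value, repair stale replicas."""
    responses = {
        name: read_local(key, r["memtable"], r["sstables"])
        for name, r in replicas.items()
    }
    answered = [resp for resp in responses.values() if resp is not None]
    if not answered:
        return None
    latest = max(answered, key=lambda vt: vt[1])
    for name, resp in responses.items():
        if resp != latest:
            # Read repair: push the newer version to the stale replica.
            replicas[name]["memtable"][key] = latest
    return latest[0]


replicas = {
    "n1": {"memtable": {"k": ("new", 2)}, "sstables": [{"k": ("old", 1)}]},
    "n2": {"memtable": {}, "sstables": [{"k": ("old", 1)}]},
}
value = coordinate_read("k", replicas)
```

After the call, `n2` holds the newer version as well, so a later read from `n2` alone returns the same data as `n1`.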
Cassandra Architecture – Cluster Topology
Cluster Concepts
• a node is a Cassandra instance (in production: one node per machine)
• a partition is one ordered and replicable unit of data on a node
• a rack is a logical set of nodes
• a data center is a logical set of racks
• a cluster is the full set of nodes, which map to a single complete token ring
• nodes communicate peer-to-peer using the gossip protocol
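To make the token ring concrete, here is a minimal sketch of how a key could be mapped to replica nodes. It rests on several assumptions that are not in the slides: one token per node (no virtual nodes), rack-unaware placement that simply walks the ring clockwise, and MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3). The `TokenRing` class and node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token_for(key: str) -> int:
    """Map a partition key to a token in [0, 2**32). MD5 is a stand-in."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; one token per node (assumption).
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas_for_token(self, tok, rf):
        """Walk clockwise from the owner of `tok`, collecting `rf` nodes."""
        rf = min(rf, len(self._ring))
        # The owner is the first node whose token is >= tok, wrapping around.
        start = bisect_left(self._tokens, tok) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]

    def replicas_for_key(self, key, rf):
        return self.replicas_for_token(token_for(key), rf)


ring = TokenRing({"A": 0, "B": 100, "C": 200, "D": 300})
owners = ring.replicas_for_token(150, rf=3)  # owner C, then D, then A
```

Because every node can compute the same placement from the ring, any node can act as coordinator for a request, which fits the peer-to-peer design listed above.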
Cassandra Architecture – Data Consistency
Tunable Data Consistency
How many nodes must acknowledge a read/write request?
• choose anywhere between STRONG and EVENTUAL consistency
• possible consistency levels (CL): ANY (writes only), ONE, QUORUM (RF/2 + 1, rounded down), ALL
• tunable per request
• multi-datacenter support
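The arithmetic behind these levels is small enough to sketch. The helper names below are hypothetical, and ANY is left out because it only applies to writes. The rule of thumb used in `is_strongly_consistent` is the usual overlap condition: a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed the replication factor (RF).

```python
def quorum(rf: int) -> int:
    """QUORUM = RF/2 + 1 using integer division."""
    return rf // 2 + 1


# Number of replica acknowledgements each level requires for a given RF.
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def required_acks(level: str, rf: int) -> int:
    return LEVELS[level](rf)


def is_strongly_consistent(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return required_acks(write_cl, rf) + required_acks(read_cl, rf) > rf
```

With RF = 3, QUORUM writes plus QUORUM reads give 2 + 2 > 3, so reads are strongly consistent while one replica can still be down; ONE plus ONE gives 1 + 1, which is not greater than 3, so that combination is only eventually consistent.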
Cassandra Architecture – CQL Language
Cassandra Query Language
• very similar to RDBMS SQL syntax
• create objects via DDL
• core DML commands INSERT, UPDATE and DELETE are supported
• query data with SELECT commands
Cassandra Architecture – Security
Cassandra Security Features
• Authentication based on internally controlled role names and passwords
• Authorization based on object permission management
• JMX authentication and authorization based on user names and passwords
• SSL encryption
Why Cassandra?
• Scales linearly under massive write loads
  • Cassandra can handle very large amounts of data, which makes it a good fit for companies with huge data volumes, such as mobile phone and messaging service providers.
• Highly Fault Tolerant
  • Masterless cluster with no single point of failure. In simple terms, with data replicated across racks and data centers, your users should not notice if a server, an entire rack of servers, or even an entire data center fails. There is also the potential for zero-downtime rolling upgrades.
• Easy Replication / Data Distribution
• Homogeneous Environment
  • No master-slave or sharding setup; all nodes in the ring are equal.
• Ease of Administration
  • Masterless and fault-tolerant; supports temporary loss of nodes with minimal impact on production performance.
• Wide Community
Use Cases of Cassandra
• Messaging & Event Sourcing
  • Cassandra can handle very large amounts of data, so it is preferred by companies that provide mobile phone and messaging services and have to store huge data volumes.
• IoT & High Speed Applications
  • Cassandra can ingest data at high speed, so it suits applications where data arrives at a very high rate from many devices or sensors.
• Product Catalogs and Retail Apps
  • Many retailers use Cassandra for durable shopping cart protection and fast product catalog input and output.
• Social Media Analytics & Recommendations
  • Many online companies and social media providers use Cassandra for analysis and for recommendations to their customers.
Cassandra for Akka Persistence
• Linear scalability
  • Expected massive load
• No SPOF
  • Fault-tolerant, resilient
• Always-On Multi-Data Center
  • Data distribution & replication
  • Cluster over multiple data centers
• Akka Persistence
  • CQRS with Event Sourcing
  • Supported, up-to-date Akka plugin (Lightbend)
• Akka Streams
  • Batch processing over streaming
Cassandra Benchmarks
University of Toronto, NoSQL Database Performance Benchmarks, 2012
[Charts: write latency (read/write workload), throughput (read/scan/write workload), read latency (read/write workload), throughput (read/write workload)]
Cassandra Benchmarks
Netflix, Benchmarking Cassandra Scalability on AWS, 2011
Cassandra Benchmarks
EndPoint database and open source consulting company, 2014
Resources
• Apache Cassandra Web Site
• Planet Cassandra Community
• DataStax Web Site
• The Distributed Architecture Behind Apache Cassandra, Bruno TINOCO
• Introduction to Apache Cassandra's Architecture, Akhil Mehra
• An Overview of Apache Cassandra, DataStax
• NoSQL Performance Benchmarks, DataStax
• Top 10 Reasons to Use Cassandra, Michael COLBY
• Security in Cassandra, IBM developerWorks

Why Cassandra?

  • 2. The History of Cassandra tyfs.rocks 226.07.2017
  • 4. Cassandra Architecture – CAP Theorem tyfs.rocks 426.07.2017 Cassandra was designed to fall in the “AP” intersection of the CAP theorem that states that any distributed system can only guarantee two of the following capabilities at same time; Consistency, Availability and Partition Tolerance. In this way Cassandra is a best fit for a solution seeking a distributed database that brings high availability to a system and is also very tolerant to partition to its data when some node in the cluster is offline, which is common in distributed systems.
  • 5. Cassandra Architecture – Data Model tyfs.rocks 526.07.2017 Cassandra is classified as a column based database, which means that its basic structure to store data is based upon a set of columns, which are comprised, by a pair of column key and column value. Every row is identified by a unique key, a string without a size limit, called partition key. Each set of columns are called column families, similar to a relational database table.
  • 6. Cassandra Architecture – Data Model tyfs.rocks 626.07.2017 SortedMap<RowKey,SortedMap<ColumnKey, ColumnValue>>  A map gives efficient key lookup, and the sorted nature gives efficient scans. In Cassandra, we can use row keys and column keys to do efficient lookups and range scans.  The number of column keys is unbounded. This means, you can have wide rows.  A key can itself hold a value, meaning In other words, you can have a valueless column.
  • 7. Cassandra Architecture – Write Path tyfs.rocks 726.07.2017 Cassandra Write Path  Every node first writes the mutation to the commit log and then writes the mutation to the memtable.  Writing to the commit log ensures durability of the write as the memtable is an in-memory structure and is only written to disk when the memtable is flushed to disk. A memtable is flushed to disk when: • It reaches its maximum allocated size in memory • The number of minutes a memtable can stay in memory elapses. • Manually flushed by a user  A memtable is flushed to an immutable structure called and SSTable (Sorted String Table). The commit log is used for playback purposes in case data from the memtable is lost due to node failure.  Every SSTable creates three files on disk which include a bloom filter, a key index and a data file.
  • 8. Cassandra Architecture – Read Path tyfs.rocks 826.07.2017 Cassandra Read Path  Every Column Family stores data in a number of SSTables. Thus Data for a particular row can be located in a number of SSTables and the memtable. Thus for every read request Cassandra needs to read data from all applicable SSTables ( all SSTables for a column family) and scan the memtable for applicable data fragments. This data is then merged and returned to the coordinator.  If the contacted replicas has a different version of the data the coordinator returns the latest version to the client and issues a read repair command to the node/nodes with the older version of the data. The read repair operation pushes the newer version of the data to nodes with the older version.
  • 9. Cassandra Architecture – Cluster Topology tyfs.rocks 926.07.2017 Cluster Concepts  a node is a cassandra instance (in production: one node per machine)  a partition is one ordered and replicable unit of data on a node  a rack is a logical set of nodes  a Data Center is a logical set or racks  Cluster is the full set of nodes which map to a single complete token ring  peer-to-peer communication gossip protocol
  • 10. Cassandra Architecture – Data Consistency tyfs.rocks 1026.07.2017 Tunable Data Consistency How many nodes must acknowledge a read/write request  choose between STRONG to EVENTUAL  possible CL: ANY, ONE, QUORUM (RF/2+1), ALL  tunable per request support  multi-datacenter support
  • 11. Cassandra Architecture – CQL Language tyfs.rocks 1126.07.2017 Cassandra Query Language  very similar to RDBMS SQL syntax  create objects via DDL  core DML commands insert, update, delete supported  query data with Select commands
  • 12. Cassandra Architecture – Security tyfs.rocks 1226.07.2017 Cassandra Security Features  Authentication based on internally controlled rolename/passwords  Authorization based on object permission management  Authentication and authorization based on JMX username/passwords  SSL encryption
  • 13. Why Cassandra ? tyfs.rocks 1326.07.2017 • Scales linearly with massive write  Cassandra is a great database which can handle a big amount of data. So it is preferred for the companies that provide Mobile phones and messaging services. These companies have a huge amount of data, so Cassandra is best for them. • Highly Fault Tolerant  Masterless cluster with no single point of failure. In simple terms, your users will never know if a server, an entire rack of servers, or even if an entire data center fails. There is also the potential for zero downtime rolling upgrades. • Easy Replication / Data Distribution • Homogenous Environment  No master-slave or sharding setup and that all nodes in the ring are equal. • Ease of Administration  Masterless, fault-tolerant, supports temporary loss of nodes with minimal impact to production performance. • Wide Community  No master-slave or sharding setup and that all nodes in the ring are equal.
  • 14. Use Cases of Cassandra • Messaging & Event Sourcing: Cassandra can handle very large amounts of data, so it is preferred by companies that provide mobile phone and messaging services and generate huge data volumes. • IoT & High-Speed Applications: Cassandra can ingest high-velocity data, which makes it a good fit for applications where data arrives at very high speed from many devices or sensors. • Product Catalogs and Retail Apps: Many retailers use Cassandra for durable shopping cart protection and fast product catalog input and output. • Social Media Analytics & Recommendations: Many online companies and social media providers use Cassandra for analysis and for recommendations to their customers.
  • 15. Cassandra for Akka Persistence • Linear scalability: Expected massive load • No SPOF: Fault-tolerant, resilient • Always-On Multi-Data Center: Data distribution & replication; cluster spanning multiple data centers • Akka Persistence: CQRS with Event Sourcing; Akka's supported, up-to-date plugin (Lightbend) • Akka Streams: Batch processing over streaming
  • 16. Cassandra Benchmarks: University of Toronto, NoSQL Database Performance Benchmarks, 2012. [Charts: write latency for the read/write workload; throughput for the read/scan/write workload; read latency for the read/write workload; throughput for the read/write workload]
  • 17. Cassandra Benchmarks: Netflix, Benchmarking Cassandra Scalability on AWS, 2011
  • 18. Cassandra Benchmarks: End Point (database and open source consulting company), 2014
  • 19. Cassandra Benchmarks: End Point (database and open source consulting company), 2014
  • 20. Resources • Apache Cassandra Web Site • Planet Cassandra Community • DataStax Web Site • The Distributed Architecture Behind Apache Cassandra, Bruno Tinoco • Introduction to Apache Cassandra's Architecture, Akhil Mehra • An Overview of Apache Cassandra, DataStax • NoSQL Performance Benchmarks, DataStax • Top 10 Reasons to Use Cassandra, Michael Colby • Security in Cassandra, IBM developerWorks
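
The cluster concepts on slide 9 (nodes mapped onto a token ring) and the tunable consistency on slide 10 (QUORUM as a function of the replication factor) can be illustrated with a small sketch. This is a toy model under stated assumptions, not Cassandra's implementation: the names `TokenRing`, `token`, `quorum` and `is_strongly_consistent` are hypothetical, each node owns a single token (no virtual nodes), MD5 stands in for the partitioner, and replicas are simply the next nodes clockwise, ignoring racks and data centers.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 is an illustrative choice only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy ring: one token per node; replicas are the next nodes clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, rf: int):
        """Return the rf distinct nodes that hold partition_key."""
        rf = min(rf, len(self._ring))
        # First node whose token is >= the key's token; wraps around the ring.
        start = bisect.bisect_left(self._tokens, token(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


def quorum(rf: int) -> int:
    """QUORUM = floor(RF / 2) + 1 replicas."""
    return rf // 2 + 1


def is_strongly_consistent(read_cl: int, write_cl: int, rf: int) -> bool:
    """A read overlaps the latest acknowledged write when R + W > RF."""
    return read_cl + write_cl > rf


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", rf=3)
print("replicas:", owners, "quorum:", quorum(3))
```

With RF = 3, reading and writing at QUORUM (2 + 2 > 3) gives strong consistency, while ONE/ONE (1 + 1 ≤ 3) is only eventually consistent; that trade-off is what "tunable per request" refers to.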

Editor's Notes

  1. Cassandra was designed to fall in the “AP” intersection of the CAP theorem, which states that a distributed system can guarantee only two of the following three properties at the same time: Consistency, Availability and Partition tolerance. Cassandra is therefore a good fit for a solution seeking a distributed database that provides high availability and remains tolerant of partitions when some nodes in the cluster are offline, which is common in distributed systems.
  6. Each node processes the request individually. Every node first writes the mutation to the commit log and then writes the mutation to the memtable. Writing to the commit log ensures durability of the write, because the memtable is an in-memory structure that only reaches disk when it is flushed. A memtable is flushed to disk when it reaches its maximum allocated size in memory, when the number of minutes a memtable can stay in memory elapses, or when a user flushes it manually. A memtable is flushed to an immutable structure called an SSTable (Sorted String Table). The commit log is used for replay in case data in the memtable is lost due to node failure, for example when the machine has a power outage before the memtable could be flushed. Every SSTable creates three files on disk: a bloom filter, a key index and a data file. Over time a number of SSTables are created, so a read request may need to read multiple SSTables. Compaction is the process of combining SSTables so that related data can be found in a single SSTable, which makes reads much faster.
  7. At the cluster level a read operation is similar to a write operation. As with the write path, the client can connect to any node in the cluster. The chosen node is called the coordinator and is responsible for returning the requested data. A row key must be supplied for every read operation. The coordinator uses the row key to determine the first replica. The replication strategy, in conjunction with the replication factor, is used to determine all other applicable replicas. As with the write path, the consistency level determines the number of replicas that must respond before data is successfully returned. Let's assume that the request has a consistency level of QUORUM and a replication factor of three, thus requiring the coordinator to wait for successful replies from at least two nodes. If the contacted replicas have different versions of the data, the coordinator returns the latest version to the client and issues a read repair command to the node or nodes with the older version. The read repair operation pushes the newer version of the data to nodes with the older version. On a per-SSTable basis the operation becomes a bit more complicated. The key steps when reading data from an SSTable are as follows. Every SSTable has an associated bloom filter, which enables it to quickly ascertain whether data for the requested row key may exist in the corresponding SSTable. This reduces IO when performing a row key lookup. A bloom filter is always held in memory, since its whole purpose is to save disk IO. Cassandra also keeps a copy of the bloom filter on disk, which enables it to recreate the bloom filter in memory quickly. Cassandra does not store the bloom filter on the Java heap; instead it makes a separate allocation for it in memory. If the bloom filter returns a negative response, no data is returned from that particular SSTable. This is a common case, as the compaction operation tries to group all data related to a row key into as few SSTables as possible. If the bloom filter provides a positive response, the partition key cache is checked to ascertain the compression offset for the requested row key. Cassandra then fetches the compressed data on disk and returns the result set. If the partition key cache does not contain a corresponding entry, the partition summary is scanned. The partition summary is a sampled subset of the partition index and helps determine the approximate location of the index entry in the partition index. The partition index is then scanned to locate the compression offset, which is then used to find the appropriate data on disk.
  11. Authentication based on internally controlled role names and passwords: Cassandra authentication is role-based and stored internally in Cassandra system tables. Administrators can create, alter, drop, or list roles using CQL commands, with an associated password. Roles can be created with superuser, non-superuser, and login privileges. The internal authentication is used to access Cassandra keyspaces and tables, by cqlsh and DevCenter to authenticate connections to Cassandra clusters, and by sstableloader to load SSTables. Authorization based on object permission management: Authorization grants access privileges to Cassandra cluster operations based on role authentication. Authorization can grant permission to access the entire database or restrict a role to individual table access. Roles can grant authorization to authorize other roles. Roles can be granted to roles. The CQL commands GRANT and REVOKE are used to manage authorization. Authentication and authorization based on JMX usernames and passwords: JMX (Java Management Extensions) technology provides a simple and standard way of managing and monitoring resources related to an instance of a Java Virtual Machine (JVM). This is achieved by instrumenting resources with Java objects known as Managed Beans (MBeans) that are registered with an MBean server. JMX authentication stores usernames and associated passwords in two files, one for passwords and one for access. JMX authentication is used by nodetool and external monitoring tools such as jconsole. In Cassandra 3.6 and later, JMX authentication and authorization can be accomplished using Cassandra's internal authentication and authorization capabilities. SSL encryption: Cassandra provides secure communication between a client and a database cluster, and between nodes in a cluster. Enabling SSL encryption ensures that data in flight is not compromised and is transferred securely. Client-to-node and node-to-node encryption are independently configured. Cassandra tools (cqlsh, nodetool, DevCenter) can be configured to use SSL encryption. The DataStax drivers can be configured to secure traffic between the driver and Cassandra. General security measures: Typically, production Cassandra clusters will have all non-essential firewall ports closed. Some ports must be open in order for nodes to communicate in the cluster. These ports are detailed.
  12. Goals for the tests: (1) select workloads that are typical of today’s modern applications; (2) use data volumes that are representative of ‘big data’ datasets that exceed the RAM capacity of each node; (3) ensure that all data is written in a manner that allows no data loss (i.e. durable writes), which is what most production environments require. Tested workloads: the following workloads were included in the benchmark: (a) read-mostly workload, based on YCSB’s provided workload B: 95% read to 5% update ratio; (b) read/write combination, based on YCSB’s workload A: 50% read to 50% update ratio; (c) read-modify-write, based on YCSB workload F: 50% read to 50% read-modify-write; (d) mixed operational and analytical: 60% read, 25% update, 10% insert, and 5% scan; (e) insert-mostly combined with read: 90% insert to 10% read ratio.
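
The write path in note 6 (commit log, memtable, flush to SSTable, compaction) and the per-SSTable read path in note 7 (bloom filter check before touching the data) can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not Cassandra's storage engine: the classes `BloomFilter`, `SSTable` and `Node`, the `memtable_limit` parameter and the SHA-256 based hashing are all hypothetical choices, and the key cache, partition summary, partition index and compression are left out.

```python
import hashlib


class BloomFilter:
    """Bit-array filter: may give false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted snapshot of a memtable plus its bloom filter."""

    def __init__(self, memtable: dict):
        self.rows = dict(sorted(memtable.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class Node:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []  # stand-in for the append-only log on disk
        self.memtable = {}
        self.sstables = []    # oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. commit log first (durability)
        self.memtable[key] = value            # 2. then the in-memory memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest data wins
            if not sstable.bloom.might_contain(key):
                continue  # negative answer: skip this SSTable entirely
            if key in sstable.rows:
                return sstable.rows[key]
        return None

    def compact(self) -> None:
        merged = {}
        for sstable in self.sstables:  # oldest first, newer values overwrite
            merged.update(sstable.rows)
        self.sstables = [SSTable(merged)] if merged else []


node = Node(memtable_limit=2)
for k, v in [("k1", "v1"), ("k2", "v2"), ("k1", "v1b"), ("k3", "v3"), ("k4", "v4")]:
    node.write(k, v)
print(node.read("k1"), node.read("k2"), node.read("k4"), len(node.sstables))
node.compact()
print(node.read("k1"), len(node.sstables))
```

The sketch flushes on a row count rather than on memory size or elapsed time, purely to keep the example short; the point is the ordering of the steps and that a negative bloom filter answer lets a read skip an SSTable without looking at its data.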