SlideShare a Scribd company logo
Prepared by,
Pooja G.V
8th sem, CSE
GECH.
• Cassandra is a distributed database from Apache
that is highly scalable and designed to manage
very large amounts of structured data.
•It provides high availability with no single point of
failure.
• Open-source database
management system
(DBMS)
• Several key features of
Cassandra differentiate it
from other similar systems
Continued.....
History of Cassandra
● Cassandra was created to power
the Facebook Inbox Search
● Facebook open-sourced Cassandra in 2008 and
became an Apache Incubator project
● In 2010, Cassandra graduated to a top-level
project, regular update and releases followed
Motivation and Function
● Designed to handle large amount of data across
multiple servers
● There is a lot of unorganized data out there
● Easy to implement and deploy
● Mimics traditional relational database systems, but
with triggers and lightweight transactions
● Raw, simple data structures
General Design Features
Emphasis on performance over analysis
● Still has support for analysis tools such as Hadoop
Organization
● Rows are organized into tables
● First component of a table’s primary key is the partition key
● Rows are clustered by the remaining columns of the key
● Columns may be indexed separately from the primary key
● Tables may be created, dropped, altered at runtime without blocking
queries
Language
● CQL (Cassandra Query Language) introduced, similar to SQL (flattened
learning curve)
Peer to Peer Cluster
• Decentralized design
• No single point of failure
• No bottlenecking
Fault Tolerance/Durability
Failures happen all the time with multiple nodes
● Hardware Failure
● Bugs
● Operator error
● Power Outage, etc.
Solution: Buy cheap, redundant hardware,
replicate, maintain consistency
Fault Tolerance/Durability
• Replication
• Distribution of data to multiple data centers
Performance
• Core architectural designs allow Cassandra to
outperform its competitors
• Very good read and write throughputs
Scalability
Read and write throughput increase linearly as more machines are added
“In terms of scalability, there is a clear winner throughout our experiments.
Cassandra achieves the highest throughput for the maximum number of
nodes…” - University of Toronto
Comparisons
Apache Cassandra Google Big Table Amazon DynamoDB
Storage Type Column Column Key-Value
Best Use Write often, read less Designed for large
scalability
Large database solution
Concurrency Control MVCC Locks ACID
Characteristics High Availability
Partition Tolerance
Persistence
Consistency
High Availability
Partition Tolerance
Persistence
Consistency
High Availability
Cassandra Use Cases
Netflix
• online DVD and Blu-Ray
movie retailer
• Nielsen study showed
that 38% of Americans
use or subscribe to
Netflix
Netflix: Why Cassandra
● Using a central SQL database negatively
impacted scalability and availability
● International Expansion required Multi-
Datacenter solution
● Need for configurable Replication, Consistency,
and Resiliency in the face of failure
● Cassandra on AWS offered high levels of
scalability and availability
Jason Brown,
Senior Software
Engineer at Netflix
Hulu
• a website and a
subscription service
offering on-demand
streaming video media
• ~30 million unique
viewers per month
Hulu: Why Cassandra
• need for Availability
• need for Scalability
• Good Performance
• Nearly Linear Scalability
• Geo-Replication
• Minimal Maintenance Requirements
Andres Rangel,
Senior Software
Engineer at Hulu
Reasons for Choosing Cassandra
• Value availability over
consistency
• Require high write-throughput
• High scalability required
• No single point of failure
CAP Theorem
Cassandra’s Data Model
• Cassandra is a column oriented
NoSQL system
• Column families: sets of key-
value pairs
• A row is a collection of columns
labeled with a name
Key-Value Model
Cassandra Row
• the value of a row is itself a
sequence of key-value pairs
• such nested key-value pairs are
columns
• key = column name
• a row must contain at least 1
column
Example of Columns
Column names storing values
• key: User ID
• column names store
tweet ID values
• values of all column
names are set to “-”
(empty byte array) as
they are not used
• A Key Space is a group of
column families together. It
is only a logical grouping of
column families and
provides an isolated scope
for names
Key Space
Comparing Cassandra (C*) and RDBMS
• with RDBMS, a normalized data model is created
without considering the exact queries
• with C*, the data model is designed for specific
queries
• C*: NO joins, relationships, or foreign keys
Cassandra Query Language - CQL
• creating a keyspace - namespace of tables
CREATE KEYSPACE demo
WITH replication = {‘class’: ’SimpleStrategy’,
replication_factor’: 3};
• to use namespace:
USE demo;
Cassandra Query Language - CQL
• creating tables:
CREATE TABLE users( CREATE TABLE tweets(
email varchar, email varchar,
bio varchar, time_posted timestamp,
birthday timestamp, tweet varchar,
active boolean, PRIMARY KEY (email,
time_posted));
PRIMARY KEY (email));
Cassandra Query Language - CQL
• inserting data
INSERT INTO users (email, bio, birthday, active)
VALUES (‘john.doe@bti360.com’, ‘BT360 Teammate’,
516513600000, true);
Cassandra Query Language - CQL
• querying tables
• SELECT expression reads one or more records from
Cassandra column family and returns a result-set of rows
SELECT * FROM users;
SELECT email FROM users WHERE active = true;
Cassandra: Conclusion
• perfect for time-series data
• high performance
• Decentralization
• nearly linear scalability
• replication support
• no single points of failure
• MapReduce support
Cassandra Advantages
Cassandra Weaknesses
● no referential integrity
● querying options for retrieving data are limited
● sorting data is a design decision
● no support for atomic operations
● first think about queries, then about data model
Cassandra: Points to Consider
● Cassandra is designed as a distributed database
management system
● Cassandra write performance is always excellent, but read
performance depends on write patterns
● having a high-level understanding of some internals is a plus
References
• Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured
storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
• Hewitt, Eben. Cassandra: the definitive guide. O'Reilly Media, 2010.
• http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/a
rchitectureTOC.html
• http://www.slideshare.net/planetcassandra/a-deep-dive-into-understanding-
apache-cassandra
• http://www.slideshare.net/DataStax/evaluating-apache-cassandra-as-a-cloud-
database
• http://planetcassandra.org/functional-use-cases/
• http://marsmedia.info/en/cassandra-pros-cons-and-model.php
• http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-
cassandra
• http://wiki.apache.org/cassandra/CassandraLimitations

More Related Content

What's hot

Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
nickmbailey
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Gokhan Atil
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overview
sameiralk
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scale
Vinay Kumar Chella
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
PritamKathar
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
NoSql
NoSqlNoSql
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
DataStax Academy
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
neela madheswari
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
Varad Meru
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
DataStax
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
Yaroslav Tkachenko
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
Folio3 Software
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Couchbase 101
Couchbase 101 Couchbase 101
Couchbase 101
Dipti Borkar
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
Bishal Khanal
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
YounesCharfaoui
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
Brent Theisen
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 

What's hot (20)

Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overview
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scale
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
NoSql
NoSqlNoSql
NoSql
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Couchbase 101
Couchbase 101 Couchbase 101
Couchbase 101
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 

Viewers also liked

Cassandra devoxx 2010
Cassandra devoxx 2010Cassandra devoxx 2010
Cassandra devoxx 2010
jbellis
 
Cassandra - Deep Dive ...
Cassandra - Deep Dive ...Cassandra - Deep Dive ...
Cassandra - Deep Dive ...
sameiralk
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
T Jake Luciani
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
Eric Evans
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 
Shall we play a game?
Shall we play a game?Shall we play a game?
Shall we play a game?
Maciej Lasyk
 

Viewers also liked (8)

Cassandra devoxx 2010
Cassandra devoxx 2010Cassandra devoxx 2010
Cassandra devoxx 2010
 
Cassandra - Deep Dive ...
Cassandra - Deep Dive ...Cassandra - Deep Dive ...
Cassandra - Deep Dive ...
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Shall we play a game?
Shall we play a game?Shall we play a game?
Shall we play a game?
 

Similar to Cassandra

cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
SergioBruno21
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
nehabsairam
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
Rishikese MR
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
András Fehér
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
fardinjamshidi
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
Chen Robert
 
Big data stores
Big data  storesBig data  stores
Big data stores
Kumaran Ramanujam
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
Fayez Shayeb
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
Janos Geronimo
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
sameerfaizan
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
Amazon Web Services
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
Clarence J M Tauro
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
Vassilis Bekiaris
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
Rajesh Menon
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Lviv Startup Club
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
Rahul Borate
 
Cassandra Architecture FTW
Cassandra Architecture FTWCassandra Architecture FTW
Cassandra Architecture FTW
Jeffrey Carpenter
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
Alessandro Melchiori
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
Mohammed Fazuluddin
 

Similar to Cassandra (20)

cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
 
Big data stores
Big data  storesBig data  stores
Big data stores
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Cassandra Architecture FTW
Cassandra Architecture FTWCassandra Architecture FTW
Cassandra Architecture FTW
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 

Recently uploaded

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 

Recently uploaded (20)

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 

Cassandra

  • 1. Prepared by, Pooja G.V 8th sem, CSE GECH.
  • 2.
  • 3. • Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. •It provides high availability with no single point of failure.
  • 4. • Open-source database management system (DBMS) • Several key features of Cassandra differentiate it from other similar systems Continued.....
  • 5. History of Cassandra ● Cassandra was created to power the Facebook Inbox Search ● Facebook open-sourced Cassandra in 2008 and became an Apache Incubator project ● In 2010, Cassandra graduated to a top-level project, regular update and releases followed
  • 6. Motivation and Function ● Designed to handle large amount of data across multiple servers ● There is a lot of unorganized data out there ● Easy to implement and deploy ● Mimics traditional relational database systems, but with triggers and lightweight transactions ● Raw, simple data structures
  • 7. General Design Features Emphasis on performance over analysis ● Still has support for analysis tools such as Hadoop Organization ● Rows are organized into tables ● First component of a table’s primary key is the partition key ● Rows are clustered by the remaining columns of the key ● Columns may be indexed separately from the primary key ● Tables may be created, dropped, altered at runtime without blocking queries Language ● CQL (Cassandra Query Language) introduced, similar to SQL (flattened learning curve)
  • 8. Peer to Peer Cluster • Decentralized design • No single point of failure • No bottlenecking
  • 9. Fault Tolerance/Durability Failures happen all the time with multiple nodes ● Hardware Failure ● Bugs ● Operator error ● Power Outage, etc. Solution: Buy cheap, redundant hardware, replicate, maintain consistency
  • 10. Fault Tolerance/Durability • Replication • Distribution of data to multiple data centers
  • 11. Performance • Core architectural designs allow Cassandra to outperform its competitors • Very good read and write throughputs
  • 12. Scalability Read and write throughput increase linearly as more machines are added “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes…” - University of Toronto
  • 13. Comparisons Apache Cassandra Google Big Table Amazon DynamoDB Storage Type Column Column Key-Value Best Use Write often, read less Designed for large scalability Large database solution Concurrency Control MVCC Locks ACID Characteristics High Availability Partition Tolerance Persistence Consistency High Availability Partition Tolerance Persistence Consistency High Availability
  • 15. Netflix • online DVD and Blu-Ray movie retailer • Nielsen study showed that 38% of Americans use or subscribe to Netflix
  • 16. Netflix: Why Cassandra ● Using a central SQL database negatively impacted scalability and availability ● International Expansion required Multi- Datacenter solution ● Need for configurable Replication, Consistency, and Resiliency in the face of failure ● Cassandra on AWS offered high levels of scalability and availability Jason Brown, Senior Software Engineer at Netflix
  • 17. Hulu • a website and a subscription service offering on-demand streaming video media • ~30 million unique viewers per month
  • 18. Hulu: Why Cassandra • need for Availability • need for Scalability • Good Performance • Nearly Linear Scalability • Geo-Replication • Minimal Maintenance Requirements Andres Rangel, Senior Software Engineer at Hulu
  • 19. Reasons for Choosing Cassandra • Value availability over consistency • Require high write-throughput • High scalability required • No single point of failure
  • 22. • Cassandra is a column oriented NoSQL system • Column families: sets of key- value pairs • A row is a collection of columns labeled with a name Key-Value Model
  • 23. Cassandra Row • the value of a row is itself a sequence of key-value pairs • such nested key-value pairs are columns • key = column name • a row must contain at least 1 column
  • 25. Column names storing values • key: User ID • column names store tweet ID values • values of all column names are set to “-” (empty byte array) as they are not used
  • 26. • A Key Space is a group of column families together. It is only a logical grouping of column families and provides an isolated scope for names Key Space
  • 27. Comparing Cassandra (C*) and RDBMS • with RDBMS, a normalized data model is created without considering the exact queries • with C*, the data model is designed for specific queries • C*: NO joins, relationships, or foreign keys
  • 28. Cassandra Query Language - CQL • creating a keyspace - namespace of tables CREATE KEYSPACE demo WITH replication = {‘class’: ’SimpleStrategy’, replication_factor’: 3}; • to use namespace: USE demo;
  • 29. Cassandra Query Language - CQL • creating tables: CREATE TABLE users( CREATE TABLE tweets( email varchar, email varchar, bio varchar, time_posted timestamp, birthday timestamp, tweet varchar, active boolean, PRIMARY KEY (email, time_posted)); PRIMARY KEY (email));
  • 30. Cassandra Query Language - CQL • inserting data INSERT INTO users (email, bio, birthday, active) VALUES (‘john.doe@bti360.com’, ‘BT360 Teammate’, 516513600000, true);
  • 31. Cassandra Query Language - CQL • querying tables • SELECT expression reads one or more records from Cassandra column family and returns a result-set of rows SELECT * FROM users; SELECT email FROM users WHERE active = true;
  • 33. • perfect for time-series data • high performance • Decentralization • nearly linear scalability • replication support • no single points of failure • MapReduce support Cassandra Advantages
  • 34. Cassandra Weaknesses ● no referential integrity ● querying options for retrieving data are limited ● sorting data is a design decision ● no support for atomic operations ● first think about queries, then about data model
  • 35. Cassandra: Points to Consider ● Cassandra is designed as a distributed database management system ● Cassandra write performance is always excellent, but read performance depends on write patterns ● having a high-level understanding of some internals is a plus
  • 36. References • Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40. • Hewitt, Eben. Cassandra: the definitive guide. O'Reilly Media, 2010. • http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/a rchitectureTOC.html • http://www.slideshare.net/planetcassandra/a-deep-dive-into-understanding- apache-cassandra • http://www.slideshare.net/DataStax/evaluating-apache-cassandra-as-a-cloud- database • http://planetcassandra.org/functional-use-cases/ • http://marsmedia.info/en/cassandra-pros-cons-and-model.php • http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global- cassandra • http://wiki.apache.org/cassandra/CassandraLimitations

Editor's Notes

  1. It combines Amazon Dynamo’s fully distributed design with Google Bigtable’s column-oriented data model.
  2. designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failur
  3. offering streaming movies through video game consoles, Apple TV, TiVo and more http://www.usatoday.com/story/life/tv/2013/09/18/netflix-hulu-amazon-nielsen-viewership-data/2831535/
  4. Central Oracle database -> everything in one place, convenient until it fails Schema changes required downtime Cassandra stores 3 local copies, 1 per zone Synchronous access, durable Replicates at destination Global Coverage business agility Local Access better latency fault isolation
  5. Geo-Replication – distribution of data across multiple regions
  6. SimpleStrategy is a replication strategy. There are two: SimpleStrategy and NetworkTopologyStrategy. SimpleStrategy is used for simple data center clusters. It is default. NetworkTopologyStrategy is used for multi-datacenter deployments
  7. Table – when schema is static Column Family – when schema is dynamic
  8. No join or subquery support, and limited support for aggregation. This is by design, to force you to denormalize into partitions that can be efficiently queried from a single replica, instead of having to gather data from across the entire cluster. Ordering is done per-partition, and is specified at table creation time. Again, this is to enforce good application design; sorting thousands or millions of rows can be fast in development, but sorting billions in production is a bad idea.
  9. It’s important to analyze how you are going to query your data. Spending time to design your schema around your query pattern can save a lot of hassle debugging performance issues while also ensuring that you can scale easily. Additionally, having a high-level understanding of some of the internals such has how deletions are implemented, how secondary indices operate, and when to use the row cache can go a long way in designing a strong application built atop Cassandra.