Introduction to Apache Cassandra

Cassandra is an open-source, distributed, highly scalable, and fault-tolerant database. It is a strong choice for managing large amounts of structured, semi-structured, or unstructured data.

Agenda
● What is Cassandra
● Gossip communication protocol
● Cassandra- Data Model
● Cassandra- Architecture
● Reading/Writing a node
● Data consistency

Cassandra
● Cassandra is a massively scalable database with a flexible (sparse) schema.
● Open-source database, released under the Apache License.
● Originally developed by Facebook for inbox search.
● Data model based on Google's Bigtable.
● Distributed design based on Amazon's Dynamo.
● Promoted heavily by DataStax.

Gossip Communication Protocol
● Peer-to-peer communication protocol.
● Nodes are arranged in a ring.
● Data is replicated to multiple nodes.
● Nodes periodically exchange the state information they hold about other nodes.
● Nodes also exchange state information about themselves.
● Each gossip message carries a version, so newer information replaces older information.
● There is no master-slave concept, and hence no single point of failure.
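The versioned exchange described above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's actual gossip implementation: `NodeState`, `merge_gossip`, and the status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    status: str   # hypothetical status label, e.g. "UP" or "DOWN"
    version: int  # grows each time the node's state changes


def merge_gossip(local, remote):
    """Return a merged view that keeps the higher-versioned entry per node."""
    merged = dict(local)
    for node, state in remote.items():
        if node not in merged or state.version > merged[node].version:
            merged[node] = state
    return merged


# Each node's view of the cluster before the two of them gossip.
view_a = {"A": NodeState("UP", 5), "B": NodeState("UP", 2)}
view_b = {"B": NodeState("UP", 3), "C": NodeState("DOWN", 7)}

# After one exchange both nodes hold the same, newest information.
view_a = view_b = merge_gossip(view_a, view_b)
```

Because every node runs the same merge with whichever peer it contacts, no node needs to act as a master.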

Cassandra- Data Model
● Column data is stored as key/value pairs.
● A collection of columns makes up a row.
● A column family is a collection of rows.
● In an RDBMS, every column of a row must hold a value or NULL; Cassandra has no such requirement, and a row stores only the columns it actually has.

Cassandra- Data Model
● Consider the following example:
● Now insert a new row that has only some of the columns:
● The above insertion would not fail.

Cassandra- Data Model
● In other words, data is stored as a multi-dimensional sparse array.
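As a rough sketch of the sparse model, a column family can be pictured as a nested map from row key to column name to value. The row keys, column names, and values below are made up for illustration and are not the deck's own example.

```python
# A column family modelled as: row key -> {column name -> value}.
users = {
    "user1": {"name": "Alice", "email": "alice@example.com", "city": "Pune"},
}

# The new row supplies only the columns it has; nothing is padded with NULL.
users["user2"] = {"name": "Bob"}


def get_column(column_family, row_key, column):
    """Look up one cell; an absent column simply does not exist for that row."""
    return column_family.get(row_key, {}).get(column)
```

Here `get_column(users, "user2", "email")` returns `None` because the cell was never stored, not because a NULL was written.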

Cassandra- Architecture
● A ring consists of several nodes.
● Each node is assigned a range of partition token values.
● Data placement is based on the partition key.
● When a client sends a request to a node, that node becomes the coordinator for the request.
● The coordinator determines which nodes in the ring should handle the request.

Cassandra- Architecture
● Virtual Nodes (Vnodes)
– Each physical node owns many small partition token ranges instead of one large range.
– Tokens are calculated and assigned to each node automatically.
– Cluster rebalancing is done automatically.

Cassandra- Architecture
● Which node gets which data is determined by the partition key.
● Cassandra computes a hash value (token) for each partition key.
● The data is stored on the node whose token range contains that hash value.
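The mapping from partition key to node might be sketched as below. This is an assumption-laden illustration: it hashes with MD5 from Python's standard library purely for convenience (Cassandra's default partitioner is Murmur3-based), and `token_for`, `TokenRing`, and the node names are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit integer token (MD5, for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; a node owns the range ending at its token.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token_for(partition_key))
        return self._ring[idx % len(self._ring)][1]


# Four nodes with evenly spaced tokens across the 64-bit token space.
step = 2**64 // 4
ring = TokenRing({f"node{i}": i * step for i in range(4)})
```

Calling `ring.owner("user:42")` always returns the same node for the same key, which is what lets any coordinator locate the data without a central lookup.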

Cassandra- Architecture
● How a write request gets fulfilled:

Data Replication
● Replication strategies
– Simple Strategy
● Used for a cluster in a single data center.
– Network Topology Strategy
● Used for clusters that span multiple data centers.
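A minimal sketch of how Simple Strategy places replicas, assuming a ring represented as a sorted list of `(token, node)` pairs: the first replica goes to the node that owns the token, and the remaining replicas go to the next nodes clockwise, ignoring racks and data centers. The function name, ring layout, and tokens are hypothetical; Network Topology Strategy is not covered here.

```python
import bisect


def replicas_for(token, ring, replication_factor):
    """Walk clockwise from the owning node and collect distinct nodes
    until replication_factor replicas have been chosen."""
    tokens = [tok for tok, _ in ring]
    start = bisect.bisect_left(tokens, token)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


# Illustrative ring: sorted (token, node) pairs.
ring = [(0, "node0"), (100, "node1"), (200, "node2"), (300, "node3")]
print(replicas_for(150, ring, 3))  # ['node2', 'node3', 'node0']
```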

Writing data in a Node
● Write an entry to the commit log.
● Write the data to the memtable.
● When the memtable is full, flush its data to disk as an SSTable.
● SSTables are immutable data structures.
● Writes can also carry a TTL (time to live), after which the data expires.
Because a write is only a sequential commit-log append plus an in-memory update, writes in Cassandra are very fast.
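The write steps above can be sketched as a toy in-memory model. The `Node` class, its `memtable_limit` parameter, and the use of tuples to stand in for immutable SSTables are assumptions for illustration; real commit logs and SSTables live on disk.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # in-memory: key -> (timestamp, value)
        self.sstables = []     # flushed tables, oldest first; never modified
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. commit log
        self.memtable[key] = (timestamp, value)           # 2. memtable
        if len(self.memtable) >= self.memtable_limit:     # 3. flush when full
            self.flush()

    def flush(self):
        # A sorted tuple stands in for an immutable, sorted SSTable file.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("k1", "v1", timestamp=1)
node.write("k2", "v2", timestamp=2)  # memtable reaches its limit and is flushed
```

No existing file is rewritten on the write path: each write appends to the log and updates memory, and a flush only creates a new table.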

Reading data from a Node
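● First, check the memtable.
● Then use each SSTable's Bloom filter to skip SSTables that cannot contain the requested key.
● Fetch the data from the remaining SSTables and merge it with any memtable data before responding.
Cassandra may write many versions of the same row, so how is the latest one identified? Every write carries a timestamp, and the version with the newest timestamp wins.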
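A rough sketch of a read that consults Bloom filters and resolves versions by timestamp is shown below. The `BloomFilter`, `SSTable`, and `read` names, the filter size, and the SHA-256-based hashing are all assumptions chosen to keep the example self-contained; they do not mirror Cassandra's internals.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)  # key -> (timestamp, value)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Gather every version of key and return the value with the newest timestamp."""
    versions = []
    if key in memtable:
        versions.append(memtable[key])
    for table in sstables:
        if table.bloom.might_contain(key) and key in table.rows:
            versions.append(table.rows[key])
    return max(versions, key=lambda v: v[0])[1] if versions else None


old = SSTable({"k1": (1, "v1-old")})
new = SSTable({"k1": (5, "v1-new"), "k2": (3, "v2")})
memtable = {"k2": (9, "v2-latest")}
```

With this data, `read("k1", memtable, [old, new])` returns `"v1-new"` and `read("k2", memtable, [old, new])` returns `"v2-latest"`.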

Update/Delete data from Node
● Data is not deleted immediately.
● A delete is recorded as a marker written alongside the data, and an update is written as a new version with a newer timestamp.
● The deletion marker is called a tombstone.
● Compaction runs periodically.
● During each run, it merges SSTables, keeps the latest version of each record, drops tombstoned data once the grace period (gc_grace_seconds) has passed, and discards the old SSTables.
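Compaction with tombstones can be sketched as follows. The sketch assumes SSTables are plain dicts of `key -> (timestamp, value)`, that timestamps are in seconds, and that a sentinel object marks a tombstone; `compact` and `TOMBSTONE` are hypothetical names.

```python
TOMBSTONE = object()  # sentinel marking a deleted record


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (dicts of key -> (timestamp, value)) into one new table."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)  # keep only the newest version
    # Drop tombstones whose grace period has expired; keep the rest.
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }


older = {"k1": (10, "v1"), "k2": (10, "v2")}
newer = {"k1": (20, TOMBSTONE), "k2": (30, "v2-updated")}
result = compact([older, newer], now=100, gc_grace_seconds=50)
```

Here `result` contains only `k2` with its updated value: the tombstone for `k1` is older than the grace period, so both it and the data it shadows are dropped.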

Data Consistency
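● A given piece of data is not necessarily up to date on every replica at all times.
● The consistency level sets how many replicas must respond to a read or write:
– ONE
– QUORUM
– ALL
● The consistency level has a major impact on performance.
● For strong consistency, with R replicas read, W replicas written, and replication factor N:
R + W > N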
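The R + W > N rule is simple arithmetic, sketched below. The helper names are hypothetical; QUORUM is taken as a majority of the replicas, floor(N / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3
Q = quorum(N)  # 2
print(is_strongly_consistent(Q, Q, N))  # True:  QUORUM reads + QUORUM writes
print(is_strongly_consistent(1, 1, N))  # False: ONE + ONE
print(is_strongly_consistent(1, N, N))  # True:  read ONE, write ALL
```

When the rule holds, at least one replica that answers a read also acknowledged the latest write, at the cost of waiting for more replicas per request.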

References
● O'Reilly, Cassandra: The Definitive Guide
● https://cassandra.apache.org/doc/latest/
● http://docs.datastax.com/en/cassandra/3.0/


Author: Harshit Daga, Software Consultant, Knoldus Software LLP
