Getting started with Apache Cassandra and Python
By Adnan Siddiqi (http://adnansiddiqi.me)
What is Apache Cassandra?
• According to Wikipedia:
"Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication allowing low latency operations for all clients."
History
• Developed at Facebook by two engineers, Avinash Lakshman and Prashant Malik, to power the Inbox Search feature.
• Released as an open-source project in 2008.
• Handed over to the Apache Software Foundation, where it became a top-level project in 2010.
Companies using Cassandra
• Apple
• Netflix
• eBay
• Weather Channel
Architecture
Architecture(Contd…)
• Node:- The basic infrastructure component: a machine on which data is stored.
• Datacenter:- A collection of related nodes. It can be a physical datacenter or a virtual one.
• Cluster:- Contains one or more datacenters and can span multiple physical locations.
• Commit Log:- Every write operation is first appended to the commit log, which is used for crash recovery.
Architecture(Contd…)
• Memtable:- After a write is recorded in the commit log, it is also written to the memtable (memory table), an in-memory structure that holds the data until it reaches a configured size threshold.
• SSTable:- A Sorted String Table (SSTable) is an immutable disk file to which a memtable's data is flushed once the memtable reaches its threshold. SSTables are written to disk sequentially, and a separate set is maintained for each table.
Write Operations
Write Operations(Contd…)
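• Each write request is first appended to the commit log, so the data survives a crash.
• The data is also written to the memtable, which holds it in memory until the memtable reaches its threshold.
• Once the memtable reaches its threshold, its data is flushed to an SSTable on disk.
• The node that receives a client's request is called the coordinator; it forwards the request to the replicas that own the data.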
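The write path described above can be illustrated with a toy, single-process model in Python. It is a sketch under simplifying assumptions, not how Cassandra is implemented: the commit log is a list rather than an append-only file, the memtable is a dict, each SSTable is an immutable tuple of key/value pairs sorted by key, and the memtable is flushed after a fixed number of keys (the hypothetical `memtable_threshold`) instead of a memory-size threshold. The `ToyNode` class and its methods are illustrative names only.

```python
from bisect import bisect_left


class ToyNode:
    """Hypothetical in-memory model of one node's write path (not Cassandra's API)."""

    def __init__(self, memtable_threshold=3):
        self.memtable_threshold = memtable_threshold  # flush after this many keys
        self.commit_log = []   # stands in for the append-only commit log file
        self.memtable = {}     # latest writes, held in memory
        self.sstables = []     # flushed, immutable sorted runs (oldest first)

    def write(self, key, value):
        # 1. Append to the commit log first, for crash recovery.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush to a new SSTable once the memtable reaches its threshold.
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()

    def flush(self):
        # An SSTable is written sorted by key and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # In this toy, flushed data counts as durable, so the log is cleared.
        self.commit_log.clear()

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect_left(keys, key)  # binary search works because keys are sorted
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None


node = ToyNode(memtable_threshold=2)
node.write("alice", 30)
node.write("bob", 25)    # memtable now holds 2 keys, so it is flushed to SSTable #1
node.write("alice", 31)  # the newer value stays in the memtable
print(node.read("alice"), node.read("bob"), len(node.sstables))  # prints: 31 25 1
```

Because this sketch stores whole values, the newest location simply wins; real Cassandra instead merges data from the memtable and SSTables using per-column write timestamps.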
Read Operations
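• Direct Request:- The coordinator node sends a request for the full data to one of the replicas.
• Digest Request:- The coordinator also contacts as many additional replicas as the consistency level requires. Those replicas respond with a digest (a hash) of the requested data instead of the data itself. The coordinator compares the digests to make sure the most up-to-date data is sent back to the client; if they disagree, it fetches the full data and returns the version with the newest write timestamp.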
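A toy model of the direct and digest requests, assuming each replica is a Python dict mapping a key to a `(write_timestamp, value)` pair and that the consistency level is simply the number of replicas contacted (1 standing in for ONE, 2 for QUORUM with three replicas). The function names are hypothetical, MD5 is used here only as an example hash, and real Cassandra also picks replicas by proximity and performs read repair, which this sketch omits.

```python
import hashlib


def digest(answer):
    # Replicas answering a digest request send a hash instead of the data itself.
    return hashlib.md5(repr(answer).encode()).hexdigest()


def coordinator_read(replicas, key, consistency_level):
    """Toy read: replicas are dicts of key -> (write_timestamp, value)."""
    contacted = replicas[:consistency_level]
    # Direct request: the full answer from one replica.
    full = contacted[0].get(key)
    # Digest requests: only hashes from the remaining contacted replicas.
    digests = [digest(replica.get(key)) for replica in contacted[1:]]
    if all(d == digest(full) for d in digests):
        return full[1] if full is not None else None
    # Digests disagree: fetch full answers and keep the newest write.
    answers = [replica[key] for replica in contacted if key in replica]
    return max(answers, key=lambda answer: answer[0])[1]


stale = {"alice": (1, "alice@old.example")}
fresh = {"alice": (2, "alice@new.example")}
print(coordinator_read([stale, fresh, fresh], "alice", consistency_level=1))  # alice@old.example
print(coordinator_read([stale, fresh, fresh], "alice", consistency_level=2))  # alice@new.example
```

With only one replica contacted the coordinator can return a stale value, while contacting two lets it detect the digest mismatch and return the newer write.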
Replication Strategies
• Simple Strategy
• Network Topology Strategy
Simple Strategy
• Used when the cluster has only one datacenter. It places the first replica on the node selected by the partitioner; a partitioner determines how data (including replicas) is distributed across the nodes in the cluster. The remaining replicas are placed on the next nodes clockwise around the node ring.
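The placement rule can be sketched in Python with the ring modeled as a sorted list of `(token, node)` pairs. Two assumptions to note: the `token()` function below hashes with MD5 into a small 0 to 99 range purely for readability (Cassandra's default Murmur3Partitioner uses a different hash over a much larger range), and each node owns a single token, with no virtual nodes. All function and node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(partition_key):
    # Stand-in partitioner: hash the partition key to a position on a 0..99 ring.
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 100


def replicas_for_token(ring, key_token, replication_factor):
    """ring: (token, node) pairs sorted by token. Returns replica nodes in order."""
    tokens = [t for t, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, key_token) % len(ring)
    # Remaining replicas: the following nodes, walking the ring clockwise.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def simple_strategy_replicas(ring, partition_key, replication_factor):
    return replicas_for_token(ring, token(partition_key), replication_factor)


ring = [(10, "node1"), (40, "node2"), (70, "node3")]
print(replicas_for_token(ring, 55, 2))  # ['node3', 'node1']
print(simple_strategy_replicas(ring, "Karachi", 2))
```

With a replication factor of 2, as in the CityInfo keyspace created later, every partition key maps to two neighboring nodes on the ring; "Karachi" is only a hypothetical example key.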
Simple Strategy(Contd…)
Network Topology Strategy
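• Used for deployments that span multiple datacenters.
• Within each datacenter, this strategy places replicas by walking the ring clockwise until it reaches the first node in another rack, so that replicas are spread across racks.
• The number of replicas is configured separately for each datacenter.
• This strategy is recommended for scalability and future expansion, because it makes adding datacenters later straightforward.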
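A simplified sketch of this rack-aware placement, assuming the ring is a sorted list of `(token, node, datacenter, rack)` tuples and the replica count per datacenter is given as a dict. Within each datacenter it walks clockwise from the key's token, takes the first node in each rack it has not used yet, and only falls back to nodes in already-used racks when there are fewer racks than replicas. Function and node names are hypothetical; Cassandra's real implementation also handles virtual nodes and learns each node's datacenter and rack from the snitch.

```python
from bisect import bisect_left


def nts_replicas(ring, key_token, dc_replication):
    """
    ring: (token, node, datacenter, rack) tuples sorted by token.
    dc_replication: replicas wanted per datacenter, e.g. {"dc1": 2, "dc2": 1}.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    # Walk the whole ring once, clockwise from the key's position.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    result = {}
    for dc, wanted in dc_replication.items():
        chosen, seen_racks, skipped = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != dc or len(chosen) == wanted:
                continue
            if rack not in seen_racks:
                chosen.append(node)   # first node found in a rack not used yet
                seen_racks.add(rack)
            else:
                skipped.append(node)  # same rack as an existing replica
        # Fewer racks than replicas: fall back to skipped nodes in ring order.
        chosen.extend(skipped[: wanted - len(chosen)])
        result[dc] = chosen
    return result


ring = [
    (10, "n1", "dc1", "rack1"),
    (20, "n2", "dc1", "rack1"),
    (30, "n3", "dc2", "rack1"),
    (40, "n4", "dc1", "rack2"),
    (50, "n5", "dc2", "rack2"),
]
print(nts_replicas(ring, 5, {"dc1": 2, "dc2": 1}))  # {'dc1': ['n1', 'n4'], 'dc2': ['n3']}
```

Here n2 is passed over for dc1 because it sits in the same rack as n1, so the two dc1 replicas end up in different racks.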
Network Topology Strategy(Contd…)
Installation and Setup
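• Use the Dockerized version.
• docker pull cassandra
• Start a container named cas1 with the CQL port published: docker run --name cas1 -p 9042:9042 -d cassandra
• Allocate at least 4 GB of memory to Docker; otherwise the container may be killed and exit with code 137.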
Installation and Setup(Contd…)
• Check the node's status: docker exec -it cas1 nodetool status
CQL Shell
GUI Client
Cassandra Data Modeling
• Keyspace:- The outermost container for column families. You can think of it as a database in the RDBMS world.
• Column Family:- A container for an ordered collection of rows, each of which is in turn an ordered collection of columns. Think of it as a table in the RDBMS world; in CQL, column families are simply called tables.
Cassandra Data Modeling(Contd…)
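Creating Keyspace
• Create a keyspace named CityInfo:
• CREATE KEYSPACE CityInfo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
• Because the name is unquoted, Cassandra stores it in lowercase as cityinfo, which is the name the Python client uses.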
Data Modeling Goals
• Spread data evenly across the cluster.
• Minimize the number of partitions read per query.
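The first goal can be illustrated with a toy calculation, assuming a three-node cluster where each partition key is assigned to a node by hashing it (MD5 modulo the node count here, a stand-in for Cassandra's partitioner and token ring). The sample `users` rows and the helper names are hypothetical. Partitioning by a high-cardinality column such as `username` spreads rows across all nodes, while a low-cardinality column such as `city` concentrates every row on at most two nodes.

```python
import hashlib
from collections import Counter


def node_for(partition_key, num_nodes):
    # Stand-in for the partitioner and ring: hash the partition key to a node index.
    digest = hashlib.md5(str(partition_key).encode()).hexdigest()
    return int(digest, 16) % num_nodes


def rows_per_node(rows, partition_key_column, num_nodes=3):
    """Count how many rows would land on each node for a given partition key column."""
    return Counter(node_for(row[partition_key_column], num_nodes) for row in rows)


# Hypothetical sample data: 1,000 users spread over only two cities.
users = [
    {"username": f"user{i}", "city": "Karachi" if i % 2 else "Lahore"}
    for i in range(1000)
]

print("by username:", sorted(rows_per_node(users, "username").items()))
print("by city:    ", sorted(rows_per_node(users, "city").items()))
```

The partition key also drives the second goal: a query that supplies the full partition key reads a single partition, so the right choice depends on the queries the table must answer.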
Demo
Data Clustering
Cassandra and Python
• pip install cassandra-driver
Reading Data
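from cassandra.cluster import Cluster

if __name__ == "__main__":
    # Connect to the node whose CQL port (9042) is published on this machine
    cluster = Cluster(['127.0.0.1'], port=9042)
    # Passing the keyspace to connect() makes it the default, so no separate USE is needed
    session = cluster.connect('cityinfo', wait_for_all_pools=True)
    rows = session.execute('SELECT * FROM users')
    for row in rows:
        # Rows are named tuples by default, so columns are available as attributes
        print(row.age, row.name, row.username)
    cluster.shutdown()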
The End
