Apache Cassandra 
Philip Thompson 
Software Engineer 
DataStax 
©2014 DataStax. Do not distribute without consent. 
Who I am 
• Philip Thompson 
• Software Engineer at DataStax 
• Contributor to Apache Cassandra 
• A maintainer of CCM, the Cassandra Cluster Manager
Apache Cassandra™ 
•Apache Cassandra™ is a massively scalable, open source, NoSQL, distributed 
database built for modern, mission-critical online applications. 
•Written in Java; a hybrid of Amazon Dynamo and Google BigTable 
•Masterless with no single point of failure 
•Distributed and data centre aware 
•Designed for 100% uptime 
•Predictable scaling 
©2012 DataStax 4
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html 
Cluster Architecture 
Data Distribution 
[Figure: token ring with four nodes at tokens 0, 25, 50 and 75]
Token = Murmur3 hash(partition key)
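The key-to-node mapping can be sketched in Python on a toy 0-99 ring holding the four tokens from the figure. This is a minimal sketch, not Cassandra's implementation: it uses MD5 from `hashlib` as a stand-in for Murmur3 (Python's standard library has no Murmur3), and the node names are hypothetical. A node owns the range from the previous node's token (exclusive) up to its own token (inclusive).

```python
import bisect
import hashlib

# Toy ring matching the figure: four nodes at tokens 0, 25, 50 and 75 on a 0-99 ring.
# Assumption: real Cassandra hashes with Murmur3 over a 64-bit range; MD5 is only a stand-in.
RING_SIZE = 100
NODE_TOKENS = {0: "node_a", 25: "node_b", 50: "node_c", 75: "node_d"}  # hypothetical names
SORTED_TOKENS = sorted(NODE_TOKENS)


def token_for(partition_key: str) -> int:
    """Hash the partition key and fold the digest onto the toy ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of(token: int) -> str:
    """Return the node owning `token`: the first node whose token is >= it, wrapping around."""
    idx = bisect.bisect_left(SORTED_TOKENS, token)
    if idx == len(SORTED_TOKENS):
        idx = 0  # past the largest token, so the range wraps to the node at token 0
    return NODE_TOKENS[SORTED_TOKENS[idx]]


if __name__ == "__main__":
    for key in ["alice", "bob", "carol"]:
        tok = token_for(key)
        print(f"{key!r} -> token {tok} -> {owner_of(tok)}")
```

Because placement depends only on the hash of the partition key, any node can compute which node owns a key without asking a master.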
Cassandra - More than one server 
• All nodes participate in a cluster 
• Shared nothing 
• Add or remove nodes as needed 
• More capacity? Add a server 
• Each node owns a number of tokens 
• Tokens denote a range of keys 
• 4 nodes? Each node owns 1/4 of the key range 
• Each node owns 1/4 of the data (before replication)
Cassandra - Locally Distributed 
• Client writes to any node 
• That node coordinates with the others 
• Data is replicated in parallel 
• Replication factor (RF): how many copies of your data? 
• RF = 3 here 
Each node stores 3/4 of the cluster's total data. 
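Why each node ends up holding 3/4 of the data can be checked with a short Python sketch. It assumes SimpleStrategy-style placement (the first replica on the node that owns the token, further replicas on the next nodes clockwise), one token per node and equal-sized ranges; the node names are hypothetical.

```python
from fractions import Fraction

# Hypothetical four-node ring from the slide, listed in token order (clockwise).
NODES = ["node_a", "node_b", "node_c", "node_d"]


def replicas_for(owner_index: int, rf: int) -> list:
    """SimpleStrategy-style placement: the owning node plus the next rf - 1 nodes clockwise."""
    return [NODES[(owner_index + i) % len(NODES)] for i in range(rf)]


def share_per_node(rf: int) -> dict:
    """Fraction of the cluster's total data each node stores, assuming equal token ranges."""
    share = {node: Fraction(0) for node in NODES}
    for owner_index in range(len(NODES)):
        # each primary range holds 1/len(NODES) of the data and is copied to rf nodes
        for replica in replicas_for(owner_index, rf):
            share[replica] += Fraction(1, len(NODES))
    return share


if __name__ == "__main__":
    print(share_per_node(3))  # every node ends up with Fraction(3, 4)
```

Each node stores its own range plus copies of the two ranges before it, so RF/N = 3/4 of the data set.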
Cassandra - Geographically Distributed 
• Client writes to its local datacenter 
• Data syncs to the other datacenters across the WAN 
• Replication factor is set per datacenter 
• One coordinator handles each client request 
Cassandra - Replication Factor 
• Replication factor (RF): how many copies of your data? 
• Replication factor is set per keyspace 
• Can be altered by the operator 
RF = 3
Cassandra - Consistency 
• Consistency Level (CL) 
• Client specifies the CL per read or write 
• ALL = all replicas ack 
• QUORUM = a majority of replicas ack: floor(RF / 2) + 1 
• LOCAL_QUORUM = a majority of replicas in the local DC ack 
• ONE = one replica acks 
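The arithmetic behind these levels, sketched in Python. It assumes QUORUM is computed over the sum of the replication factors of all datacenters and LOCAL_QUORUM over the coordinator's datacenter only; only the four levels on the slide are covered, and the DC names are hypothetical.

```python
def quorum(replicas: int) -> int:
    """A majority of `replicas`: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Replica acks the coordinator needs before answering the client."""
    total_rf = sum(rf_per_dc.values())
    if level == "ALL":
        return total_rf
    if level == "QUORUM":
        return quorum(total_rf)  # majority of all replicas, across every DC
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])  # majority of replicas in the coordinator's DC
    if level == "ONE":
        return 1
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}  # hypothetical datacenter names
    for cl in ["ONE", "LOCAL_QUORUM", "QUORUM", "ALL"]:
        print(cl, required_acks(cl, rf, local_dc="DC1"))
```

Note that quorum(4) is 3, so an even RF tolerates no more replica failures at QUORUM than the odd RF just below it.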
Cassandra - Transparent to the application 
• A single node failure shouldn’t make requests fail 
• Replication factor + consistency level = success 
• This example: 
• RF = 3 
• CL = QUORUM 
2 of 3 replicas ack (a majority), so the request succeeds even with one replica down. 
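The slide's example can be checked with a small Python sketch. It assumes a single datacenter and that a request succeeds exactly when the number of live replicas is at least the number of acks the consistency level requires; timeouts and hinted handoff are ignored.

```python
def quorum(rf: int) -> int:
    """Majority of the replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def request_succeeds(rf: int, replicas_down: int, level: str) -> bool:
    """True if enough replicas are still up to satisfy the consistency level."""
    needed = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]
    return rf - replicas_down >= needed


if __name__ == "__main__":
    # The slide's example: RF = 3, CL = QUORUM, one replica down.
    print(request_succeeds(rf=3, replicas_down=1, level="QUORUM"))  # True
    print(request_succeeds(rf=3, replicas_down=1, level="ALL"))     # False
```

The same failure that QUORUM absorbs would fail a request at ALL, which is why the choice of CL, not just RF, determines availability.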
Cassandra - Scaling 
• Take a cluster of four nodes 
• Where does the fifth node go? 
• Rebalancing is costly 
[Figure: four-node token ring at tokens 0, 25, 50 and 75]
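Why the fifth node is awkward can be shown with a Python sketch that computes each node's share of a toy 0-99 ring, assuming one token per node (no virtual nodes). Placing the new node at token 12 (an arbitrary choice) only splits the range of the node at 25; reaching an even 1/5 per node would mean moving the existing nodes' tokens and streaming data between them, which is the costly rebalancing the slide refers to.

```python
from fractions import Fraction


def ownership(tokens: list, ring_size: int = 100) -> dict:
    """Share of the ring each token owns: the range (previous token, token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: Fraction(1)}
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the largest token
        shares[tok] = Fraction((tok - prev) % ring_size, ring_size)
    return shares


if __name__ == "__main__":
    print(ownership([0, 25, 50, 75]))      # each node owns 1/4
    print(ownership([0, 12, 25, 50, 75]))  # the new node at 12 splits only one range
```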
Gossip 
• Manages cluster state 
• Nodes up/down 
• Nodes joining/leaving 
• Decentralized 
• “Heartbeat” every second 
• Every node contacts 1-3 other nodes
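A toy Python model of the behaviour above, assuming each node bumps a heartbeat counter once per round and merges state with 1-3 random peers by keeping the highest version it has seen. Real Cassandra gossip exchanges generation and version numbers per endpoint in a three-message handshake and feeds a phi accrual failure detector; none of that is modelled here, and all names are hypothetical.

```python
import random


def new_cluster(names: list) -> dict:
    """views[a][b] is node a's latest known heartbeat for node b; everyone starts at 0."""
    return {a: {b: 0 for b in names} for a in names}


def gossip_round(views: dict, rng: random.Random, max_peers: int = 3) -> None:
    """One toy gossip round (think: one second)."""
    nodes = list(views)
    for node in nodes:
        views[node][node] += 1  # each node bumps its own heartbeat
    for node in nodes:
        others = [n for n in nodes if n != node]
        if not others:
            continue
        count = rng.randint(1, min(max_peers, len(others)))  # contact 1-3 peers
        for peer in rng.sample(others, count):
            for subject in nodes:
                # both sides keep the highest heartbeat version seen for each node
                newest = max(views[node][subject], views[peer][subject])
                views[node][subject] = newest
                views[peer][subject] = newest


if __name__ == "__main__":
    rng = random.Random(42)
    cluster = new_cluster([f"node{i}" for i in range(8)])
    for _ in range(5):
        gossip_round(cluster, rng)
    print(cluster["node0"])  # node0's view of everyone's heartbeat after five rounds
```

No node is special: state spreads epidemically, so the cluster view stays consistent without a master.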
Snitch 
• Responsible for determining cluster topology 
• Datacenter awareness 
• Tracks node responsiveness 
• Many snitches provided out of the box 
• SimpleSnitch 
• GossipingPropertyFileSnitch (recommended for production) 
• Ec2Snitch and Ec2MultiRegionSnitch 
• For use with AWS 
• A comparable GCE snitch (GoogleCloudSnitch) has recently been added 
• Custom snitches can be added 
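Tracking node responsiveness is the job of Cassandra's dynamic snitch, which wraps the configured snitch and prefers replicas that have recently answered faster. The Python class below is only a toy illustration of that idea, assuming an exponentially weighted moving average of observed latency; the real dynamic snitch uses its own scoring and periodic resets, and the class name, `alpha` and the addresses are hypothetical.

```python
class LatencyTracker:
    """Hypothetical toy: exponentially weighted moving average of latency per node."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.scores = {}    # node -> smoothed latency in milliseconds

    def record(self, node: str, latency_ms: float) -> None:
        old = self.scores.get(node)
        if old is None:
            self.scores[node] = latency_ms
        else:
            self.scores[node] = self.alpha * latency_ms + (1 - self.alpha) * old

    def rank(self, replicas: list) -> list:
        """Order replicas fastest first; nodes with no samples yet go last."""
        return sorted(replicas, key=lambda n: self.scores.get(n, float("inf")))


if __name__ == "__main__":
    tracker = LatencyTracker()
    for node, ms in [("10.0.0.1", 2.0), ("10.0.0.2", 15.0), ("10.0.0.3", 4.0)]:
        tracker.record(node, ms)
    print(tracker.rank(["10.0.0.2", "10.0.0.3", "10.0.0.1"]))
```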
Anti-Entropy - Read Repair
Anti-Entropy - Hinted Handoff 
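• Hints are kept for a replica that has been down for up to three hours (the default hint window) 
• Hints are replayed when the node is restored 
• Stored in the system.hints table on the coordinator 
• Cassandra does not copy Dynamo’s “sloppy quorum”: a hint does not count toward the consistency level (except at CL ANY) 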
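A toy Python model of hinted handoff as described on the slide: the coordinator keeps hints for a replica that is down, stops collecting them once the replica has been down longer than the three-hour window, and replays them when the node returns. The class, method names and the `deliver` callback are hypothetical; real hints are stored in `system.hints` and replayed by Cassandra itself.

```python
import time
from collections import defaultdict

MAX_HINT_WINDOW_S = 3 * 60 * 60  # three-hour window from the slide


class Coordinator:
    """Hypothetical toy coordinator that hints writes for down replicas and replays them."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.down_since = {}            # replica -> time it was marked down
        self.hints = defaultdict(list)  # replica -> mutations waiting for replay

    def mark_down(self, replica: str) -> None:
        self.down_since.setdefault(replica, self.clock())

    def write(self, replica: str, mutation: dict, deliver) -> None:
        if replica not in self.down_since:
            deliver(replica, mutation)
        elif self.clock() - self.down_since[replica] <= MAX_HINT_WINDOW_S:
            self.hints[replica].append(mutation)  # store a hint on the coordinator
        # past the window no hint is kept; repair must bring that replica up to date

    def mark_up(self, replica: str, deliver) -> None:
        self.down_since.pop(replica, None)
        for mutation in self.hints.pop(replica, []):
            deliver(replica, mutation)  # replay hints once the node is restored
```

A write missed after the window closes is never hinted, which is one reason regular repair is still needed.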
Anti-Entropy - Repair 
• nodetool repair 
• Uses Merkle trees for data comparison 
• Should be run weekly (at minimum once per gc_grace_seconds, 10 days by default) 
• Cassandra 2.1 drastically improves repair times with incremental repair 
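A Python sketch of the Merkle tree comparison, assuming both replicas hash the same fixed list of token ranges (each range's data modelled as a bytes value) and using SHA-256 as the hash. Matching roots mean nothing needs to be streamed; otherwise only mismatched subtrees are explored, which keeps the comparison cheap. Cassandra builds its trees during a validation compaction with its own hashing; this only illustrates the comparison.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list) -> list:
    """Build the tree bottom-up; levels[0] holds leaf hashes, levels[-1] is [root]."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:
            prev = prev + [prev[-1]]  # duplicate the last hash on odd-sized levels
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_ranges(a: list, b: list) -> list:
    """Indices of ranges whose hashes differ, descending only into mismatched subtrees."""
    ta, tb = merkle_levels(a), merkle_levels(b)
    if ta[-1] == tb[-1]:
        return []  # roots match: replicas agree, nothing to stream
    suspects = [0]
    for depth in range(len(ta) - 2, -1, -1):
        children = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if child < len(ta[depth]) and ta[depth][child] != tb[depth][child]:
                    children.append(child)
        suspects = children
    return suspects


if __name__ == "__main__":
    local = [b"range0", b"range1", b"range2", b"range3"]
    remote = [b"range0", b"range1-stale", b"range2", b"range3"]
    print(differing_ranges(local, remote))  # only range 1 needs streaming
```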
Node Architecture 
Write Path 
[Figure: a write goes to the commit log (disk) and the memtable (memory); memtables are later flushed to SSTables (disk)]
Write Path 
• By default the commit log is fsynced every 10 s (periodic mode) 
• This can be configured in cassandra.yaml (commitlog_sync, commitlog_sync_period_in_ms) 
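The write path above, as a toy Python model. It assumes a single table of string keys, a memtable flushed after a fixed number of keys (real flushes are driven by memory thresholds) and last-write-wins by timestamp; the class name is hypothetical, and fsync timing and commit log recycling are not modelled.

```python
import io
import json


class ToyNode:
    """Hypothetical toy node: commit log append, memtable update, flush to SSTables."""

    def __init__(self, commitlog: io.TextIOBase, flush_threshold: int = 3):
        self.commitlog = commitlog              # stands in for the on-disk commit log
        self.memtable = {}                      # in memory: key -> (value, timestamp)
        self.sstables = []                      # "on disk": immutable, sorted snapshots
        self.flush_threshold = flush_threshold  # assumed; real flushes are memory-driven

    def write(self, key: str, value: str, timestamp: int) -> None:
        # 1. append the mutation to the commit log for durability
        self.commitlog.write(json.dumps([key, value, timestamp]) + "\n")
        # 2. apply it to the memtable; the newest timestamp wins
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)
        # 3. once the memtable is "full", flush it to a new immutable SSTable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


if __name__ == "__main__":
    node = ToyNode(io.StringIO(), flush_threshold=2)
    node.write("user:1", "alice", timestamp=1)
    node.write("user:2", "bob", timestamp=2)
    print(node.memtable, node.sstables)
```

Writes only append to a log and update memory; SSTables are never modified in place, which is why writes are cheap.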
Read Path 
[Figure: a read merges data from the memtable (memory) and one or more SSTables (disk)]
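A Python sketch of the read path, assuming each source maps a key to a (value, timestamp) pair and that the newest timestamp wins. It omits what makes real reads fast (bloom filters, caches, partition indexes) and ignores tombstones; the function name is hypothetical.

```python
def read(key, memtable: dict, sstables: list):
    """Merge a key's versions from the memtable and every SSTable; newest timestamp wins.

    Each source maps key -> (value, timestamp). Returns the value, or None if absent.
    """
    newest = None
    for source in [memtable, *sstables]:
        version = source.get(key)
        if version is not None and (newest is None or version[1] > newest[1]):
            newest = version
    return None if newest is None else newest[0]


if __name__ == "__main__":
    memtable = {"a": ("new", 5)}
    sstables = [{"a": ("old", 1), "b": ("b1", 2)}, {"b": ("b2", 3)}]
    print(read("a", memtable, sstables), read("b", memtable, sstables))
```

The more SSTables hold versions of a key, the more sources a read must merge, which is what compaction reduces.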
Read Path
Compaction
Compaction
Debugging your data model 
• Tracing 
cqlsh> tracing on; 
Now tracing requests. 
cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example'); 
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9 
activity | timestamp | source | source_elapsed 
-------------------------------------+--------------+-----------+---------------- 
execute_cql3_query | 00:02:37,015 | 127.0.0.1 | 0 
Parsing statement | 00:02:37,015 | 127.0.0.1 | 81 
Preparing statement | 00:02:37,015 | 127.0.0.1 | 273 
Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540 
Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779 
Messsage received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63 
Applying mutation | 00:02:37,016 | 127.0.0.2 | 220 
Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250 
Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277 
Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378 
Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710 
Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888 
Messsage received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334 
Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550 
Request complete | 00:02:37,017 | 127.0.0.1 | 2581
Nodetool 
• Command line interface for monitoring Cassandra and performing routine database operations 
• Commands for viewing detailed metrics for tables, server metrics, and compaction statistics: 
• cfstats: statistics for each table and keyspace 
• cfhistograms: statistics about a table, including read/write latency, row size, column count, and number of SSTables 
• netstats: statistics about network operations and connections 
• tpstats: statistics about the number of active, pending, and completed tasks for each stage of Cassandra operations by thread pool 
Try it out 
Cassandra 
• Get the source: 
• git clone git://git.apache.org/cassandra.git 
• Packaged install and tarballs available: 
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/install_cassandraTOC.html
CCM 
• CCM - Cassandra Cluster Manager 
• https://github.com/pcmanus/ccm 
• Warning: not lightweight; every node runs as a full Cassandra process on your machine 
• Example: 
• ccm create test -v 2.0.1 
• ccm populate -n 3 
• ccm start
Clients 
• cqlsh 
• Bundled with Cassandra 
• Drivers 
• Java: https://github.com/datastax/java-driver 
• Python: https://github.com/datastax/python-driver 
• .NET: https://github.com/datastax/csharp-driver 
• and more: http://www.datastax.com/download/clientdrivers 
• Ruby, C/C++, Node.js
Get Help 
• IRC: #cassandra on freenode 
• Mailing Lists 
• Subscribe at cassandra.apache.org 
• Stack Overflow 
• DataStax Docs 
• http://www.datastax.com/docs 
Questions? 


Devops kc
