Distributed Peer to Peer
● There is no leader/follower.
● Each node is aware of the keys held by other nodes and coordinates with those nodes to fetch the data.
● Depending on the replication factor & consistency level, the coordinator talks to one or more nodes before returning the response to the client.
● Every table defines a partition key.
● Data is distributed across the nodes in the cluster using a hash of the partition key (the consistent hashing algorithm).
● Partitions are replicated across multiple nodes to prevent a single point of failure.
Replication copies the data across multiple nodes within/across the DCs.
The Replication Factor (RF) denotes the number of copies. It is set at the keyspace level.
Snitch: a strategy to identify the DC and rack a node belongs to. This identity can be shared manually across all nodes or via gossiping.
The coordinator knows the RF of each keyspace and coordinates writes up to that factor to the various nodes within/across DCs.
Hinted Handoff: while a replica node is down, the coordinator delays transmission to that node by persisting the data locally, and retransmits it once the replica node is back online. Cassandra configuration sets the duration for which such hints are held.
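The hinted-handoff behavior above can be sketched in a few lines. This is a minimal illustration, not Cassandra's implementation; the hint window, replica addresses, and data shapes are all made up:

```python
# Sketch of hinted handoff: while a replica is down, the coordinator stores
# the write locally as a "hint" and replays it when the replica returns.
# The 3-hour window is illustrative (Cassandra makes this configurable).
import time

class Coordinator:
    def __init__(self, hint_window_s=3 * 3600):
        self.hints = {}                      # replica -> [(timestamp, write)]
        self.hint_window_s = hint_window_s

    def write(self, replica, data, replica_up):
        if replica_up:
            return "sent"
        # Replica is down: persist the write locally as a hint.
        self.hints.setdefault(replica, []).append((time.time(), data))
        return "hinted"

    def replay(self, replica):
        """Replica came back online: replay hints still inside the window."""
        now = time.time()
        pending = self.hints.pop(replica, [])
        return [d for ts, d in pending if now - ts <= self.hint_window_s]

c = Coordinator()
c.write("10.0.0.3", {"id": 1}, replica_up=False)   # stored as a hint
print(c.replay("10.0.0.3"))                        # [{'id': 1}]
```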
Replication & Consistency

Consistency is a level agreed across the nodes that governs when a read/write is accepted.
Consistency can be set for both reads and writes.
Consistency levels (CL) can be set from low to high (ONE, LOCAL_QUORUM, QUORUM, ALL).
CL is a trade-off between consistency and availability.
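The trade-off follows from simple arithmetic: if the read set and write set of replicas always overlap, a read must see the latest acknowledged write. A sketch (the RF and CL values are illustrative):

```python
# Quorum sizing and the R + W > RF rule for strong consistency.

def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM at replication factor rf."""
    return rf // 2 + 1

def is_strongly_consistent(rf: int, read_cl: int, write_cl: int) -> bool:
    # If read and write replica sets always overlap in at least one node,
    # every read sees the latest acknowledged write.
    return read_cl + write_cl > rf

rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(rf, quorum(rf), quorum(rf)))  # True
print(is_strongly_consistent(rf, 1, 1))                    # False
```

With RF=3, QUORUM reads plus QUORUM writes (2 + 2 > 3) give strong consistency while tolerating one node failure; ONE/ONE (1 + 1 = 2) trades that away for availability and latency.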
Read Repair: the coordinator performs a read repair on some or all of the replicas that hold trailing versions. Depending on the CL, this can be done asynchronously during a read request.
Gossip

Each node stores info about itself and every other node in its knowledge base.
Each node initiates gossip every second with 2 or 3 other nodes to share its knowledge base.
Knowledge Base:
Each node increments its heartbeat version every second.
When a node receives a gossip message from another node, it checks each node's heartbeat version and updates its own entry if the received version is newer.
Optimization to reduce message bandwidth during gossiping:
Gossip is initiated with a SYN to the receiving node.
SYN: just a digest; no AppState included.
The receiving node ACKs back to the sender.
ACK: a digest for the versions where the receiver trails, and detailed info (including AppState) for the versions where it leads.
The sender updates its trailing versions and ACKs back (ACK2) with the detailed info for the trailing versions requested by the other end.
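The three-way digest exchange can be sketched as follows. Node addresses, versions, and states are made up for illustration; real gossip state is richer than a (version, status) pair:

```python
# Sketch of the SYN / ACK / ACK2 gossip exchange between nodes A and B.
# State per node: {endpoint: (heartbeat_version, app_state)}.

def split_digest(receiver_state, syn_digest):
    """Given the sender's digest {endpoint: version}, decide what the
    receiver must request (sender leads) and can supply (receiver leads)."""
    request, supply = {}, {}
    for node, version in syn_digest.items():
        local_version = receiver_state.get(node, (0, None))[0]
        if version > local_version:
            request[node] = local_version        # ask for the newer state
        elif version < local_version:
            supply[node] = receiver_state[node]  # send full (version, state)
    return request, supply

a_state = {"10.0.0.1": (12, "NORMAL"), "10.0.0.2": (7, "NORMAL")}
b_state = {"10.0.0.1": (9, "NORMAL"), "10.0.0.2": (8, "LEAVING")}

# SYN: A sends only a digest (versions, no AppState).
syn = {node: ver for node, (ver, _) in a_state.items()}
# ACK: B requests what A has newer, and supplies what B has newer.
request, supply = split_digest(b_state, syn)
print(request)  # {'10.0.0.1': 9}: B trails on 10.0.0.1
print(supply)   # {'10.0.0.2': (8, 'LEAVING')}: B leads on 10.0.0.2
# ACK2: A answers the request with full state and applies B's supply.
a_state.update(supply)
```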
Knowledge Base entry layout:
EndPt State: <IP of a node>
  HeartBeat State:
    Generation: 10
    Version: 34
  Application State:
    Status: Normal/Removed/Arrived…
    DataCenter:
    Rack:
    Load:
    Severity:
    ….
EndPt State: <IP of a node>...
Write Path

The client writes to both the commit log and the memtable. In the event of a node failure, the memtable can be reconstructed from the commit log.
The commit log is append-only and does not maintain any order.
The memtable is partitioned by partition key and ordered by clustering columns.
Eventually the memtable grows beyond its size limit and is flushed to disk as an SSTable. SSTables are immutable, so each flush creates a new SSTable file.
Each SSTable holds rows grouped by partition.
Compaction is the process of merging numerous SSTable files into one. It relies on the timestamp of each row to resolve duplicates.
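The timestamp-based merge can be sketched as below. The rows echo the slide's sample data; the timestamps are invented for illustration:

```python
# Sketch of compaction: merge several SSTables into one, keeping the latest
# write for each (partition key, clustering key) pair by timestamp.

def compact(sstables):
    merged = {}
    for table in sstables:
        for (pkey, ckey), (value, ts) in table.items():
            current = merged.get((pkey, ckey))
            if current is None or ts > current[1]:
                merged[(pkey, ckey)] = (value, ts)   # newer write wins
    return merged

sstable1 = {(23, "USA"): (4, 100), (55, "Korea"): (9, 120)}
sstable2 = {(23, "USA"): (8, 300), (23, "Mexico"): (7, 250)}
sstable3 = {(23, "USA"): (5, 200)}

print(compact([sstable1, sstable2, sstable3]))
# (23, 'USA') appears in all three inputs; the row with timestamp 300 wins.
```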
[Diagram: the memtable in memory is flushed into immutable SSTable files on disk; compaction merges several SSTables into one, with duplicate rows such as (23, USA) resolving to the latest version.]
Read Path

[Diagram: the read path on a replica node, through the memtable, key cache (LRU), bloom filters, summary index, partition index, and SSTables on disk.]
Order of search during a read:
The coordinator node calls one of the replica nodes for the requested partition key.
The replica node first looks in the memtable. If the key is not found there, it follows the path below until the key is found.
Bloom filters determine one of two things: either the key definitely does not exist in the SSTable, or the key may exist in the SSTable.
Key Cache: an LRU cache keyed by partition key whose value is the offset of the partition in the SSTable file.
Summary Index: a range-based index over the keys in the partition index and their offsets.
Partition Index: an indexed lookup from partition key to the offset of the partition in the SSTable file.
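The lookup order above can be sketched with plain dicts standing in for the real structures. This is a simplification: the summary index (which narrows the partition-index scan) is omitted, and every name here is invented:

```python
# Sketch of the read-time search order on a replica: memtable first, then per
# SSTable: bloom filter -> key cache -> partition index -> data file.

def read(key, memtable, sstables):
    if key in memtable:
        return memtable[key]
    for sst in sstables:
        if key not in sst["bloom"]:        # bloom filter: definite miss
            continue
        offset = sst["key_cache"].get(key) # key cache: direct offset hit
        if offset is None:
            offset = sst["partition_index"].get(key)
            if offset is None:
                continue                   # bloom filter false positive
            sst["key_cache"][key] = offset # warm the cache for next time
        return sst["data"][offset]
    return None

memtable = {"k1": "v1"}
sstables = [{"bloom": {"k2"}, "key_cache": {},
             "partition_index": {"k2": 0}, "data": ["v2"]}]
print(read("k2", memtable, sstables))  # v2
```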
References:
● https://academy.datastax.com
● https://www.youtube.com/watch?v=s1xc1HVsRk0&list=PLalrWAGybpB-L1PGA-NfFu2uiWHEsdscD&index=1
● https://www.toptal.com/big-data/consistent-hashing
● https://www.baeldung.com/cassandra-data-modeling
Consistent Hashing

Given a set of key/value pairs, hashing is a strategy to spread the pairs as evenly as possible, so that we can fetch them in near-constant time by their key.
Consistent hashing is one such strategy for spreading keys in a distributed environment.
The hashes of the keys are conceptually placed on a ring. The position a key takes on the ring can be anywhere between 0 and 360, based on the hash of the key (typically a modulo of the hash).
The stores/servers that host these keys are also given positions on the ring (e.g., A, B, C…).
A key is stored on the first server found while traversing the ring anti-clockwise from the key's position.
E.g., key Steve @ 352.3 finds server C @ 81.7.
If we maintain a sorted list of servers and their positions, a quick binary search points us to the server where the key can be found, eliminating the need to query all servers.
Keys can be replicated on succeeding servers to avoid a single point of failure.
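The sorted-list-plus-binary-search idea can be sketched as below. Following the slide's anti-clockwise rule, a key maps to the nearest server position at or below its own position, wrapping past 0; the server positions are made up to reproduce the Steve example:

```python
# Consistent-hash ring lookup via binary search over sorted server positions.
import bisect

class Ring:
    def __init__(self, servers):              # servers: {name: position}
        self.entries = sorted(servers.items(), key=lambda kv: kv[1])
        self.points = [pos for _, pos in self.entries]

    def lookup(self, key_position):
        # Index of the largest server position <= key_position.
        i = bisect.bisect_right(self.points, key_position) - 1
        return self.entries[i][0]             # i == -1 wraps to the last server

ring = Ring({"A": 20.0, "B": 45.0, "C": 81.7})
print(ring.lookup(352.3))  # C  (the slide's "Steve" example)
print(ring.lookup(30.0))   # A
print(ring.lookup(5.0))    # C  (wraps past 0 to the highest position)
```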
Although the keys are spread over several servers, the distribution may not be even, due to uneven clustering of the keys in the real world (names starting with certain letters may be more common).
In such scenarios, to relieve the load on an individual server, we define virtual servers: we give the same server multiple positions on the ring, simulating multiple instances of that server spread across the ring.
With reference to the figure, the refined sorted list of servers now contains virtual instances a1, a2, b2, c3, etc., thereby distributing the load on C across B and A as well.
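One common way to derive the virtual positions is to hash the server name with a per-instance suffix (the a1/a2 labels from the slide). A sketch; the hash function and number of virtual nodes are illustrative choices:

```python
# Sketch: assign each physical server several ring positions (virtual nodes)
# so that its load is spread around the 0-360 ring.
import hashlib

def vnode_positions(server, vnodes=3):
    """Derive ring positions for each virtual instance of a server."""
    positions = {}
    for i in range(1, vnodes + 1):
        label = f"{server}{i}"               # e.g. "a1", "a2", "a3"
        digest = hashlib.md5(label.encode()).hexdigest()
        positions[label] = int(digest, 16) % 360
    return positions

for server in ("a", "b", "c"):
    print(vnode_positions(server))
```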
Bloom Filters

A bloom filter is a probabilistic data structure for determining whether an element is present in a set or not.
It consists of a set of n bits and a collection of independent hash functions, each of which returns a number between 0 and n-1, representing one of the n bits.
Writes:
The key is run through the collection of hash functions, and each resulting bit is turned on to mark the element's presence.
Reads:
The key is run through the collection of hash functions. Only if all the resulting bits are turned on can we say the key MAY be present in the underlying set. If even one of them is not on, we can GUARANTEE that the key is not present.
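The write and read rules above fit in a small sketch. The bit count, hash count, and the salted-blake2b trick for deriving independent hash functions are illustrative choices, not Cassandra's implementation:

```python
# Minimal bloom filter: n bits plus k independent hash functions.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=64, n_hashes=3):
        self.n = n_bits
        self.bits = [False] * n_bits
        self.salts = [bytes([i]) for i in range(n_hashes)]

    def _positions(self, key):
        # One bit position per hash function, each salted differently.
        for salt in self.salts:
            digest = hashlib.blake2b(key.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True            # turn each resulting bit on

    def might_contain(self, key):
        # True  -> key MAY be present (false positives are possible)
        # False -> key is GUARANTEED absent
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("partition-42")
print(bf.might_contain("partition-42"))  # True
```

Note the asymmetry: `might_contain` can return True for a key that was never added (a false positive), but it never returns False for a key that was added; this is exactly why the read path uses it only to skip SSTables.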

More Related Content

What's hot

Locks In Disributed Systems
Locks In Disributed SystemsLocks In Disributed Systems
Locks In Disributed Systemsmridul mishra
 
Istanbul BFT
Istanbul BFTIstanbul BFT
Istanbul BFTYu-Te Lin
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14宇帆 盛
 
Cassandra consistency
Cassandra consistencyCassandra consistency
Cassandra consistencyzqhxuyuan
 
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...wallyqs
 
Cassandra basic
Cassandra basicCassandra basic
Cassandra basiczqhxuyuan
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateFlink Forward
 
Dynamic Resource Management In a Massively Parallel Stream Processing Engine
 Dynamic Resource Management In a Massively Parallel Stream Processing Engine Dynamic Resource Management In a Massively Parallel Stream Processing Engine
Dynamic Resource Management In a Massively Parallel Stream Processing EngineKasper Grud Skat Madsen
 
Distributed System by Pratik Tambekar
Distributed System by Pratik TambekarDistributed System by Pratik Tambekar
Distributed System by Pratik TambekarPratik Tambekar
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
 
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...Flink Forward
 
Cassandra Internals Overview
Cassandra Internals OverviewCassandra Internals Overview
Cassandra Internals Overviewbeobal
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingMatthew Dennis
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency ControlDilum Bandara
 
Building your own Distributed System The easy way - Cassandra Summit EU 2014
Building your own Distributed System The easy way - Cassandra Summit EU 2014Building your own Distributed System The easy way - Cassandra Summit EU 2014
Building your own Distributed System The easy way - Cassandra Summit EU 2014Kévin LOVATO
 
Everything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationEverything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationLaura Frank Tacho
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlAbDul ThaYyal
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleMariaDB plc
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterlybtoddb
 

What's hot (20)

Locks In Disributed Systems
Locks In Disributed SystemsLocks In Disributed Systems
Locks In Disributed Systems
 
Istanbul BFT
Istanbul BFTIstanbul BFT
Istanbul BFT
 
Pulsar connector on flink 1.14
Pulsar connector on flink 1.14Pulsar connector on flink 1.14
Pulsar connector on flink 1.14
 
Cassandra consistency
Cassandra consistencyCassandra consistency
Cassandra consistency
 
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...
 
Cassandra basic
Cassandra basicCassandra basic
Cassandra basic
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Dynamic Resource Management In a Massively Parallel Stream Processing Engine
 Dynamic Resource Management In a Massively Parallel Stream Processing Engine Dynamic Resource Management In a Massively Parallel Stream Processing Engine
Dynamic Resource Management In a Massively Parallel Stream Processing Engine
 
Distributed System by Pratik Tambekar
Distributed System by Pratik TambekarDistributed System by Pratik Tambekar
Distributed System by Pratik Tambekar
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
 
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...
Flink Forward SF 2017: Till Rohrmann - Redesigning Apache Flink’s Distributed...
 
Elixir concurrency 101
Elixir concurrency 101Elixir concurrency 101
Elixir concurrency 101
 
Cassandra Internals Overview
Cassandra Internals OverviewCassandra Internals Overview
Cassandra Internals Overview
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency Control
 
Building your own Distributed System The easy way - Cassandra Summit EU 2014
Building your own Distributed System The easy way - Cassandra Summit EU 2014Building your own Distributed System The easy way - Cassandra Summit EU 2014
Building your own Distributed System The easy way - Cassandra Summit EU 2014
 
Everything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationEverything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About Orchestration
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency control
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
 

Similar to Cassandra Architecture

Dynamo cassandra
Dynamo cassandraDynamo cassandra
Dynamo cassandraWu Liang
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupAdam Hutson
 
Samsung DeepSort
Samsung DeepSortSamsung DeepSort
Samsung DeepSortRyo Jin
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystemAlex Thompson
 
Lab Seminar 2009 12 01 Message Drop Reduction And Movement
Lab Seminar 2009 12 01  Message Drop Reduction And MovementLab Seminar 2009 12 01  Message Drop Reduction And Movement
Lab Seminar 2009 12 01 Message Drop Reduction And Movementtharindanv
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsVineet Gupta
 
Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed CoordinationLuis Galárraga
 
2.communcation in distributed system
2.communcation in distributed system2.communcation in distributed system
2.communcation in distributed systemGd Goenka University
 
Lab Seminar 2009 06 17 Description Based Ad Hoc Networks
Lab Seminar 2009 06 17  Description Based Ad Hoc NetworksLab Seminar 2009 06 17  Description Based Ad Hoc Networks
Lab Seminar 2009 06 17 Description Based Ad Hoc Networkstharindanv
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
OPEN SHORTEST PATH FIRST (OSPF)
OPEN SHORTEST PATH FIRST (OSPF)OPEN SHORTEST PATH FIRST (OSPF)
OPEN SHORTEST PATH FIRST (OSPF)Ann Joseph
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMakerKris Buytaert
 
Session 7 Tp 7
Session 7 Tp 7Session 7 Tp 7
Session 7 Tp 7githe26200
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Open shortest path first (ospf)
Open shortest path first (ospf)Open shortest path first (ospf)
Open shortest path first (ospf)Respa Peter
 

Similar to Cassandra Architecture (20)

Dynamo cassandra
Dynamo cassandraDynamo cassandra
Dynamo cassandra
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User Group
 
Samsung DeepSort
Samsung DeepSortSamsung DeepSort
Samsung DeepSort
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystem
 
Lab Seminar 2009 12 01 Message Drop Reduction And Movement
Lab Seminar 2009 12 01  Message Drop Reduction And MovementLab Seminar 2009 12 01  Message Drop Reduction And Movement
Lab Seminar 2009 12 01 Message Drop Reduction And Movement
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web Systems
 
Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed Coordination
 
Technical presentation
Technical presentationTechnical presentation
Technical presentation
 
2.communcation in distributed system
2.communcation in distributed system2.communcation in distributed system
2.communcation in distributed system
 
Lab Seminar 2009 06 17 Description Based Ad Hoc Networks
Lab Seminar 2009 06 17  Description Based Ad Hoc NetworksLab Seminar 2009 06 17  Description Based Ad Hoc Networks
Lab Seminar 2009 06 17 Description Based Ad Hoc Networks
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
OPEN SHORTEST PATH FIRST (OSPF)
OPEN SHORTEST PATH FIRST (OSPF)OPEN SHORTEST PATH FIRST (OSPF)
OPEN SHORTEST PATH FIRST (OSPF)
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMaker
 
Session 7 Tp 7
Session 7 Tp 7Session 7 Tp 7
Session 7 Tp 7
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Open shortest path first (ospf)
Open shortest path first (ospf)Open shortest path first (ospf)
Open shortest path first (ospf)
 
1 ddbms jan 2011_u
1 ddbms jan 2011_u1 ddbms jan 2011_u
1 ddbms jan 2011_u
 
No sql
No sqlNo sql
No sql
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Cassandra Architecture

  • 1.
  • 2. Distributed Peer to Peer Client ● There is no leader/follower. ● Each node is aware of keys held by other nodes and coordinates with that node to fetch the data. ● Depending on the replication factor & consistency level the coordinator talks to one of more nodes before returning the response to the client. ● Every table defines a partition key. ● Data is distributed across the various nodes in the cluster using the hash on the partition key. Uses Consistent hashing algo. ● Partitions are replicated across multiple nodes to prevent single point of failure.
  • 3. Replication copies of the data across multiple nodes within/across the DCs. Replication Factor (RF) denotes the no of copies. Set at the keyspace level. Snitch: Is a strategy to identify the DC and Rack the node belongs to. This identity can be manually shared across all nodes or via Gossiping. Coordinator is aware of the RF/keyspace and coordinates the writes upto that factor to the various nodes within/across DCs. Hinted Handoff - While the replica node is down the coordinator will delay the transmission to that node by persisting that data locally. It can retransmits it once that replica node is back online. Cassandra configuration sets the duration for holding such data before handoff. Replication & Consistency Consistency is an agreeable factor across the nodes that ensures the acceptance of a read/write. Consistency can be set for both read/writes. Consistency levels (CL) can be set from low to high (ONE, LOCAL_QUOROUM, QUORUM, ALL) CL is a trade off b/w consistency and availability. Read Repair: Coordinator performs a read repair on some/all of the replicas that have trailing versions. Depending on the CL this can be done async during a read request.
  • 4. Gossip Each node stores info about itself and every other node in its Knowledge base. Each node initiates the gossip every second with 2 or 3 other nodes to share its knowledge base. Knowledge Base: Each node increments its heartbeat version every second. When it receives a gossip from other node, it checks each nodes heart beat version and updates if it had received the latest version. Optimization to reduce message bandwidth during gossiping Gossip is initiated with a SYN to the receiving node. SYN: Just a digest - no AppState included Receiving node ACKs back to the sender. ACK: Digest for the trailing versions & detailed (includes AppState) for leading versions. Sender updates the trailing versions and Acks back with the detailed info for the requested trailing versions on the other end. EndPt State: <IP of a node> HeartBeat State: Generation: 10 Version: 34 Application State; Status:: Norma/Removed/Arrived… DataCenter: Rack: Load: Severity: …. EndPt State: <IP of a node>... Knowledge Base
5. Write Path
● The client writes to both the commit log and the memtable. In the event of a node failure, the memtable can be reconstructed from the commit log.
● The commit log is append-only and does not maintain any order.
● The memtable is partitioned by partition key and ordered by clustering columns.
● Eventually the memtable grows too large and is flushed to disk as an SSTable. SSTables are immutable, so each flush creates a new SSTable file. An SSTable holds each partition's rows in sorted order.
● Compaction is the process of merging numerous SSTable files into one. It relies on the timestamp of each row to resolve duplicates.
[Diagram: client → coordinator → replica node; the memtable (memory) is flushed into SSTables (disk), and compaction merges them, e.g. rows keyed (23, USA) with timestamps 4, 5 and 8 collapse to the single row stamped 8.]
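Compaction's duplicate resolution can be sketched in a few lines. The row shape (key, value, timestamp) and the in-memory merge are simplifications; real compaction streams already-sorted files rather than loading everything into a dict.

```python
def compact(*sstables):
    """Merge SSTables, keeping only the latest-timestamped row per key."""
    latest = {}  # key -> (value, timestamp)
    for table in sstables:
        for key, value, ts in table:
            if key not in latest or ts > latest[key][1]:
                latest[key] = (value, ts)
    # Emit sorted by key, as an SSTable stores its partitions in order.
    return [(key, value, ts) for key, (value, ts) in sorted(latest.items())]
```

For instance, three duplicate rows for the same key with timestamps 4, 5 and 8 collapse to the single row carrying timestamp 8.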
6. Read Path
● The coordinator node calls one of the replica nodes for the requested partition key.
● The replica node first looks in the memtable. If the key is not found there, it follows the path below, per SSTable, until the key is found:
  ○ Bloom filter: determines one of two things — the key definitely does not exist in the SSTable, or the key may exist in it.
  ○ Key cache: an LRU cache whose key is the partition key and whose value is the offset of the partition in the SSTable file.
  ○ Summary index: a range-based index over the keys in the partition index and their offsets.
  ○ Partition index: an indexed lookup from partition key to the offset of the partition in the SSTable file.
[Diagram: memtable in memory; bloom filter, key cache, summary index, partition index and SSTables on disk.]
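The lookup order can be sketched with plain dictionaries standing in for the on-disk structures. The field names (bloom_candidates, key_cache, partition_index, data) are invented for illustration; the bloom filter is modelled as a set of "maybe present" keys, and the summary-index step is folded into the partition-index lookup.

```python
def read_key(key, memtable, sstables):
    """Memtable first; then each SSTable, short-circuiting where possible."""
    if key in memtable:
        return memtable[key]
    for sst in sstables:
        # Bloom filter: a definite "no" lets us skip this SSTable entirely.
        if key not in sst["bloom_candidates"]:
            continue
        # Key cache: a hit yields the partition's offset straight away.
        if key in sst["key_cache"]:
            return sst["data"][sst["key_cache"][key]]
        # Otherwise walk the (summary +) partition index for the offset.
        if key in sst["partition_index"]:
            offset = sst["partition_index"][key]
            sst["key_cache"][key] = offset  # warm the LRU cache
            return sst["data"][offset]
    return None
```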
8. References
● https://academy.datastax.com
● https://www.youtube.com/watch?v=s1xc1HVsRk0&list=PLalrWAGybpB-L1PGA-NfFu2uiWHEsdscD&index=1
● https://www.toptal.com/big-data/consistent-hashing
● https://www.baeldung.com/cassandra-data-modeling
9. Consistent Hashing
● Given a set of key/value pairs, hashing is a strategy to spread the pairs as evenly as possible, so that each can be fetched in near-constant time by its key.
● Consistent hashing is one such hashing strategy for spreading keys in a distributed environment.
● The hashes of the keys are hypothetically spread on a ring. The position a key takes on the ring can be anywhere between 0 and 360, based on the hash of the key (usually a mod on the hash).
● The stores/servers that host these keys are also given positions on the ring (e.g., A, B, C…).
● A key is stored on the first server found while traversing the ring anti-clockwise from the key's position. E.g., key Steve @ 352.3 finds server C @ 81.7.
● If we maintain a sorted list of servers and their positions, a quick binary search points us to the server where the key can be found, eliminating the need to query all servers.
● Keys can be replicated on succeeding servers to avoid a single point of failure (SPOF).
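A minimal ring sketch, assuming md5 positions instead of the deck's 0–360 degrees, and walking clockwise (the common convention; the anti-clockwise traversal described above is the same idea mirrored):

```python
import bisect
from hashlib import md5


class HashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs form the ring.
        self.ring = sorted((self._pos(s), s) for s in servers)

    @staticmethod
    def _pos(key: str) -> int:
        return int(md5(key.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        """Binary-search for the first server at or past the key's position."""
        positions = [p for p, _ in self.ring]
        i = bisect.bisect_right(positions, self._pos(key))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring
```

The binary search is what the sorted-list remark above refers to: one O(log n) lookup instead of querying every server.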
10. Consistent Hashing
● Although the keys are spread over several servers, the distribution may not be even, due to uneven clustering of keys in the real world (names starting with a certain letter may be more common).
● To relieve the load on an individual server in such scenarios, we define virtual servers: the same server is given multiple positions on the ring, simulating multiple instances of it spread across the ring.
● With reference to the picture here, the refined sorted list of servers now contains virtual instances a1, a2, b2, c3, etc., thereby distributing the load on C across B and A as well.
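Virtual nodes only change how the ring is built: each physical server is hashed at several positions, which smooths out uneven key clustering. The count of 8 vnodes per server and the "server#i" naming below are arbitrary choices for this sketch.

```python
import bisect
from hashlib import md5


def build_vnode_ring(servers, vnodes_per_server=8):
    """Sorted (position, server) list with several virtual nodes per server."""
    ring = []
    for server in servers:
        for i in range(vnodes_per_server):
            pos = int(md5(f"{server}#{i}".encode()).hexdigest(), 16)
            ring.append((pos, server))
    return sorted(ring)


def server_for(ring, key):
    """First (virtual) server clockwise from the key's position."""
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, int(md5(key.encode()).hexdigest(), 16))
    return ring[i % len(ring)][1]
```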
11. Bloom Filters
● A probabilistic data structure for determining whether an element is present in a set or not.
● It consists of a set of n bits and a collection of independent hash functions, each of which returns a number between 0 and n−1, identifying one of the n bits.
● Writes: a key is run through the collection of hash functions, and each resulting bit is flipped on to mark the element's presence.
● Reads: a key is run through the collection of hash functions. Only if all the resulting bits are turned on can we say the key MAY be present in the underlying set. If even one of them is not flipped on, we can GUARANTEE the key is not present.
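A minimal sketch matching the description above: n bits plus k "independent" hash functions, simulated here by salting a single sha256 hash (an illustrative shortcut; real implementations often use double hashing instead).

```python
from hashlib import sha256


class BloomFilter:
    def __init__(self, n_bits=1024, k=3):
        self.n, self.k = n_bits, k
        self.bits = [False] * n_bits

    def _positions(self, key: str):
        # k salted hashes, each mapped to one of the n bit positions.
        for salt in range(self.k):
            digest = sha256(f"{salt}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # All k bits on => MAY be present; any bit off => definitely absent.
        return all(self.bits[pos] for pos in self._positions(key))
```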