Distributed Counters
            in Cassandra




Friday, August 13, 2010
I: Goal
II: Design
III: Implementation




I: Goal




Goal




       Low Latency,
       Highly Available
       Counters




II: Design




I: Traditional Counter Design
II: Abstract Strategy
III: Distributed Counter Design




Design



                 I: Traditional Counter Design




Traditional Counter Design
       Atomic Counters


       1. single machine
       2. one order of execution
       3. strongly consistent



Traditional Counter Design
       Problems


       1. SPOF / single master
       2. high latency
       3. manually sharded



Traditional Counter Design
       Question




                          What constraints can we relax?




Design



               II: Abstract Strategy




Abstract Strategy
       Constraints to Relax



       1. one order of execution
       2. strong consistency




Abstract Strategy
       Relax: One Order of Execution



       commutative operation:
         - operations must be re-orderable



Abstract Strategy
       Relax: Strong Consistency

       partitioned work:
         - each op must occur once
         - unique partition identifier
       idempotent repair:
         - recognize ops from other partitions

Design



            III: Distributed Counter Design




Distributed Counter Design
       Requirements


       1. commutative operation
       2. partitioned work
       3. idempotent repair



Distributed Counter Design
       Commutative Operation


       addition:
         - commutative operation
         - sum ops performed by all replicas
         - a + b = b + a
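
A minimal sketch of why addition meets the re-ordering requirement: applying the same deltas in every possible order yields one total. The delta values and the `apply_all` helper are hypothetical, chosen only for illustration.

```python
import itertools

# Hypothetical increments and decrements sent to one counter.
deltas = [5, -2, 7, 1]


def apply_all(ops, start=0):
    """Apply each delta in the given order and return the final total."""
    total = start
    for delta in ops:
        total += delta
    return total


# Collect the result of every possible execution order.
totals = {apply_all(order) for order in itertools.permutations(deltas)}
print(totals)  # a single value: order does not matter
```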

Distributed Counter Design
       Partitioned Work



       each op assigned to a replica:
         - every replica sums all of its ops
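
As a rough illustration of partitioned work, the sketch below assigns each op to exactly one replica and lets every replica sum only its own ops. The counter value is then the sum of the per-replica subtotals. The `partition_ops` helper and the replica names are assumptions, not part of the talk.

```python
from collections import defaultdict


def partition_ops(ops):
    """Sum deltas per replica. `ops` is an iterable of (replica_id, delta)."""
    subtotals = defaultdict(int)
    for replica_id, delta in ops:
        subtotals[replica_id] += delta
    return dict(subtotals)


# Each op appears once and names the single replica that owns it.
ops = [("A", 3), ("B", 4), ("A", -1), ("C", 10)]
subtotals = partition_ops(ops)
counter_value = sum(subtotals.values())
```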



Distributed Counter Design
       Idempotent Repair


       save counts from remote replicas:
         - keep highest count seen
       prevent multiple execution:
         - do not transfer the target replica’s count
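
The two rules above can be sketched as a merge function. This is an illustration under assumed names (`apply_repair`, plain dicts keyed by replica id), not the actual Cassandra code: the target keeps the highest count seen for every other replica and ignores any count sent for itself, so applying the same repair twice changes nothing.

```python
def apply_repair(target_id, target_counts, incoming_counts):
    """Merge repaired counts into the target replica's view."""
    merged = dict(target_counts)
    for replica_id, count in incoming_counts.items():
        if replica_id == target_id:
            # The target is the authority for its own partition.
            continue
        # Keep the highest count seen for each remote partition.
        merged[replica_id] = max(merged.get(replica_id, count), count)
    return merged


state = {"A": 5, "B": 2}
incoming = {"A": 99, "B": 3, "C": 1}
once = apply_repair("A", state, incoming)
twice = apply_repair("A", once, incoming)
```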


III: Implementation




I: Data Structure
II: Single Node
III: Eventual Consistency




I: Data Structure




Data Structure
       Requirements


       local counts:
         - incrementally update
       remote counts:
         - independently track partitions

Data Structure
       Context Format



       list of (replica id, count) tuples:
                 [(replica A, count), (replica B, count), ...]




Data Structure
       Context Mutations


       local write:
         sum local count and write delta
         note: memtable
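
A sketch of the local-write mutation on a context held as a list of (replica id, count) tuples. The `local_write` name and the choice to append a new tuple when the local replica is not yet present are assumptions made for illustration.

```python
def local_write(context, local_id, delta):
    """Return a new context with `delta` added to the local replica's count."""
    updated = []
    found = False
    for replica_id, count in context:
        if replica_id == local_id:
            updated.append((replica_id, count + delta))
            found = True
        else:
            # Remote counts are left untouched by a local write.
            updated.append((replica_id, count))
    if not found:
        # Assumption: the first local write creates the local entry.
        updated.append((local_id, delta))
    return updated


after_write = local_write([("A", 4), ("B", 7)], "A", 3)
first_write = local_write([], "A", 3)
```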



Data Structure
       Context Mutations


       remote repair:
         for each replica,
         keep highest count seen
         (local or from repair)


II: Single Node




Single Node
       Write Path

       client
          1. construct column
             - value: delta (big-endian long)
             - clock: empty
          2. thrift: insert / batch_mutate
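
For the column value, a big-endian long is an 8-byte signed integer with the most significant byte first. The sketch below shows only that encoding with Python's `struct` module. The helper names are hypothetical, and the Thrift call itself is not shown.

```python
import struct


def encode_delta(delta):
    """Encode a counter delta as an 8-byte big-endian signed long."""
    return struct.pack(">q", delta)


def decode_delta(raw):
    """Decode an 8-byte big-endian signed long back into an int."""
    (value,) = struct.unpack(">q", raw)
    return value


increment_by_one = encode_delta(1)
decrement_by_five = encode_delta(-5)
```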

Single Node
       Write Path

       coordinator
         1. choose partition
            - choose target replica
            - requirement: ConsistencyLevel.ONE
         2. construct clock
            - context format: [(target replica id, count delta)]
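
A sketch of the coordinator step. The slides do not say how the target replica is chosen, so picking the first live replica is purely an assumption here, as are the `build_clock` and `is_alive` names. The part taken from the slide is the resulting clock: a context holding a single (target replica id, count delta) tuple.

```python
def build_clock(replicas, is_alive, delta):
    """Pick one target replica and build the clock for a counter write.

    Returns (target_replica_id, context).
    """
    for replica_id in replicas:
        if is_alive(replica_id):
            # The whole delta is assigned to exactly one partition.
            return replica_id, [(replica_id, delta)]
    raise RuntimeError("no live replica available for counter write")


# Hypothetical cluster where replica "A" is down.
target, clock = build_clock(["A", "B", "C"], lambda r: r != "A", 5)
```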


Single Node
       Write Path


       target replica
       insert:
                 1. memtable does not contain column
                 2. insert column into memtable



Single Node
       Write Path
       target replica
       update:
                 1. memtable contains column
                 2. retrieve existing column
                 3. create new column
                    - context: sum local count w/ delta from write
                 4. replace column in ConcurrentSkipListMap
                 5. if failed to replace column, go to step 2.
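
The update steps form a compare-and-replace retry loop. The sketch below imitates it in Python with a small lock-guarded `CasMap` class, a hypothetical stand-in for the atomic replace that a concurrent map offers. The column value is reduced to a single integer count for brevity.

```python
import threading


class CasMap:
    """Hypothetical stand-in for a concurrent map with atomic replace."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        """Insert if missing. Returns the existing value, or None if inserted."""
        with self._lock:
            if key in self._data:
                return self._data[key]
            self._data[key] = value
            return None

    def replace(self, key, expected, new):
        """Swap in `new` only if the current value still equals `expected`."""
        with self._lock:
            if key in self._data and self._data[key] == expected:
                self._data[key] = new
                return True
            return False


def apply_delta(memtable, name, delta):
    # Insert path: the memtable does not contain the column yet.
    if memtable.put_if_absent(name, delta) is None:
        return
    # Update path: retrieve, build the new value, replace, retry on failure.
    while True:
        existing = memtable.get(name)
        if memtable.replace(name, existing, existing + delta):
            return


memtable = CasMap()


def worker():
    for _ in range(500):
        apply_delta(memtable, "hits", 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No increment is lost even when several writers race on the same column, because a failed replace simply re-reads the current value and tries again.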


Single Node
       Write Path
       Interesting Note:
       Memtables (MTs) are serialized to SSTables (SSTs) as-is
                 - each SST encapsulates the updates applied
                   while it was an MT
                 - the local count total must be aggregated
                   across the MT and all SSTs

Single Node
       Read Path
       target replica
       read:
                 1. construct collating iterator over:
                    - frozen snapshot of MT
                    - all relevant SSTs
                 2. resolve column
                    - local counts: sum
                    - remote counts: keep max
                 3. construct value
                    - sum local and remote counts (big-endian long)
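
The read steps can be sketched as follows, assuming each source (the memtable snapshot and every relevant SSTable) contributes one context for the column as a list of (replica id, count) tuples. The function names are hypothetical, and the collating iterator is replaced by a plain list of contexts.

```python
import struct


def resolve(local_id, contexts):
    """Resolve one column across sources: sum local counts, max remote counts."""
    local_total = 0
    remote = {}
    for context in contexts:
        for replica_id, count in context:
            if replica_id == local_id:
                # Each source holds only the local updates it absorbed.
                local_total += count
            else:
                remote[replica_id] = max(remote.get(replica_id, count), count)
    return local_total, remote


def read_value(local_id, contexts):
    """Build the column value: local plus remote counts as a big-endian long."""
    local_total, remote = resolve(local_id, contexts)
    return struct.pack(">q", local_total + sum(remote.values()))


# One memtable snapshot followed by two SSTables, as seen on replica "A".
contexts = [
    [("A", 2), ("B", 5)],
    [("A", 3)],
    [("A", 1), ("B", 7), ("C", 4)],
]
resolved = resolve("A", contexts)
value = read_value("A", contexts)
```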

Single Node
       Compaction

       replica
       compaction:
                 1. construct collating iterator over all SSTs
                 2. resolve every column in the CF
                    - local counts: sum
                    - remote counts: keep max
                 3. write out resolved CF



III: Eventual Consistency




Eventual Consistency
       Read Repair


       coordinator / replica
       read repair:
                 1. calculate resolved (superset) CF
                    - resolve every column (local: sum, remote: max)
                 2. return resolved CF to client




Eventual Consistency
       Read Repair

       coordinator / replica
       read repair:
                 1. calculate repair CF for each replica
                    - calculate diff CF between resolved and received
                    - modify columns to remove target replica’s counts
                 2. send repair CF to each replica
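
A sketch of how a per-replica repair might be derived for one column, with contexts simplified to dicts keyed by replica id. The `build_repair` name and the exact diff rule (send an entry whenever the received count differs from the resolved one) are assumptions. The step taken from the slide is that the target replica's own count is never included.

```python
def build_repair(target_id, resolved, received):
    """Return the counts that `target_id` is missing or holds stale."""
    repair = {}
    for replica_id, count in resolved.items():
        if replica_id == target_id:
            # Never send a replica its own count back.
            continue
        if received.get(replica_id) != count:
            repair[replica_id] = count
    return repair


resolved = {"A": 6, "B": 7, "C": 4}
repair_for_a = build_repair("A", resolved, {"A": 6, "B": 5})
repair_for_b = build_repair("B", resolved, {"A": 6, "B": 7, "C": 4})
```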



Eventual Consistency
       Anti-Entropy Service


       sending replica
       AES:
                 1. follow normal AES code path
                    - calculate repair SST based on shared ranges
                    - send repair SST



Eventual Consistency
       Anti-Entropy Service

       receiving replica
       AES:
                 1. post-process streamed SST
                    - re-build streamed SST
                    - note: strip out local replica’s counts
                 2. remove temporary descriptor
                 3. add to SSTableTracker
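
The post-processing note can be illustrated with a small filter over streamed data, modeled here as a dict from column name to context. The `strip_local_counts` name, the in-memory model of an SST, and the choice to drop columns left with no entries are all assumptions made for this sketch.

```python
def strip_local_counts(local_id, streamed_columns):
    """Rebuild streamed columns without the receiving replica's own counts."""
    rebuilt = {}
    for name, context in streamed_columns.items():
        kept = [(rid, count) for rid, count in context if rid != local_id]
        if kept:
            # Assumption: a column with nothing left to repair is dropped.
            rebuilt[name] = kept
    return rebuilt


streamed = {
    "page_views": [("A", 10), ("B", 4)],
    "logins": [("B", 2)],
}
rebuilt = strip_local_counts("B", streamed)
```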



Questions?




More Information
       Issues:
       #580: Vector Clocks
       #1072: Distributed Counters

       Related Work:
       Helland and Campbell, Building on Quicksand, CIDR (2009),
       Sections 5 & 6.


       My email address:
       kakugawa@gmail.com



Distributed Counters in Cassandra (Cassandra Summit 2010)

  • 1. Distributed Counters in Cassandra Friday, August 13, 2010
  • 2. I: Goal; II: Design; III: Implementation
  • 3. I: Goal
  • 4. Goal: Low Latency, Highly Available Counters
  • 5. II: Design
  • 6. I: Traditional Counter Design; II: Abstract Strategy; III: Distributed Counter Design
  • 7. Design. I: Traditional Counter Design
  • 8. Traditional Counter Design. Atomic Counters: 1. single machine; 2. one order of execution; 3. strongly consistent.
  • 9. Traditional Counter Design. Problems: 1. SPOF / single master; 2. high latency; 3. manually sharded.
  • 10. Traditional Counter Design. Question: What constraints can we relax?
  • 11. Design. II: Abstract Strategy
  • 12. Abstract Strategy. Constraints to Relax: 1. one order of execution; 2. strong consistency.
  • 13. Abstract Strategy. Relax: One Order of Execution. commutative operation: operations must be re-orderable.
  • 14. Abstract Strategy. Relax: Strong Consistency. partitioned work: each op must occur once; unique partition identifier. idempotent repair: recognize ops from other partitions.
  • 15. Design. III: Distributed Counter Design
  • 16. Distributed Counter Design. Requirements: 1. commutative operation; 2. partitioned work; 3. idempotent repair.
  • 17. Distributed Counter Design. Commutative Operation. addition: commutative operation; sum ops performed by all replicas; a + b = b + a.
  • 18. Distributed Counter Design. Partitioned Work. each op assigned to a replica: every replica sums all of its ops.
  • 19. Distributed Counter Design. Idempotent Repair. save counts from remote replicas: keep highest count seen. prevent multiple execution: do not transfer the target replica’s count.
  • 20. III: Implementation
  • 21. I: Data Structure; II: Single Node; III: Eventual Consistency
  • 22. I: Data Structure
  • 23. Data Structure. Requirements. local counts: incrementally update. remote counts: independently track partitions.
  • 24. Data Structure. Context Format. list of (replica id, count) tuples: [(replica A, count), (replica B, count), ...]
  • 25. Data Structure. Context Mutations. local write: sum local count and write delta (note: memtable).
  • 26. Data Structure. Context Mutations. remote repair: for each replica, keep highest count seen (local or from repair).
  • 27. II: Single Node
  • 28. Single Node. Write Path, client: 1. construct column (value: delta as a big-endian long; clock: empty); 2. thrift: insert / batch_mutate.
  • 29. Single Node. Write Path, coordinator: 1. choose partition (choose target replica; requirement: ConsistencyLevel.ONE); 2. construct clock (context format: [(target replica id, count delta)]).
  • 30. Single Node. Write Path, target replica insert: 1. memtable does not contain column; 2. insert column into memtable.
  • 31. Single Node. Write Path, target replica update: 1. memtable contains column; 2. retrieve existing column; 3. create new column (context: sum local count with delta from write); 4. replace column in ConcurrentSkipListMap; 5. if the replace fails, go to step 2.
  • 32. Single Node. Write Path, Interesting Note: memtables (MTs) are serialized to SSTables (SSTs) as-is. Each SST encapsulates the updates made while it was an MT; the local count total must be aggregated across the MT and all SSTs.
  • 33. Single Node. Read Path, target replica read: 1. construct collating iterator over a frozen snapshot of the MT and all relevant SSTs; 2. resolve column (local counts: sum; remote counts: keep max); 3. construct value (sum local and remote counts, as a big-endian long).
  • 34. Single Node. Compaction, replica compaction: 1. construct collating iterator over all SSTs; 2. resolve every column in the CF (local counts: sum; remote counts: keep max); 3. write out resolved CF.
  • 35. III: Eventual Consistency
  • 36. Eventual Consistency. Read Repair, coordinator / replica read repair: 1. calculate resolved (superset) CF (resolve every column; local: sum, remote: max); 2. return resolved CF to client.
  • 37. Eventual Consistency. Read Repair, coordinator / replica read repair: 1. calculate repair CF for each replica (calculate diff CF between resolved and received; modify columns to remove target replica’s counts); 2. send repair CF to each replica.
  • 38. Eventual Consistency. Anti-Entropy Service, sending replica AES: 1. follow normal AES code path (calculate repair SST based on shared ranges; send repair SST).
  • 39. Eventual Consistency. Anti-Entropy Service, receiving replica AES: 1. post-process streamed SST (re-build streamed SST; note: strip out local replica’s counts); 2. remove temporary descriptor; 3. add to SSTableTracker.
  • 40. Questions?
  • 41. More Information. Issues: #580: Vector Clocks; #1072: Distributed Counters. Related Work: Helland and Campbell, Building on Quicksand, CIDR (2009), Sections 5 & 6. My email address: kakugawa@gmail.com
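
The context rules from slides 24 to 26, 33 and 37 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cassandra implementation: the dict-based `Context` type and the function names `local_write`, `resolve`, `counter_value` and `strip_replica` are hypothetical, and the real context is a serialized list of (replica id, count) tuples.

```python
from typing import Dict, Iterable

# Hypothetical representation: replica id -> count.
Context = Dict[str, int]


def local_write(context: Context, local_id: str, delta: int) -> Context:
    """Local write: add the delta to this replica's own count."""
    updated = dict(context)
    updated[local_id] = updated.get(local_id, 0) + delta
    return updated


def resolve(fragments: Iterable[Context], local_id: str) -> Context:
    """Merge context fragments (e.g. memtable, SSTables, repairs) on one replica.

    Local counts are summed; remote counts keep the highest value seen.
    """
    resolved: Context = {}
    for fragment in fragments:
        for replica_id, count in fragment.items():
            if replica_id == local_id:
                resolved[replica_id] = resolved.get(replica_id, 0) + count
            else:
                resolved[replica_id] = max(resolved.get(replica_id, count), count)
    return resolved


def counter_value(context: Context) -> int:
    """Value returned to the client: sum of local and remote counts."""
    return sum(context.values())


def strip_replica(context: Context, target_id: str) -> Context:
    """Build a repair context that omits the target replica's own count."""
    return {r: c for r, c in context.items() if r != target_id}


if __name__ == "__main__":
    memtable = local_write(local_write({}, "A", 3), "A", 2)
    sstable = {"A": 4, "B": 7}   # older flushed updates on replica A
    repair = {"B": 9}            # newer count for B received via repair
    merged = resolve([memtable, sstable, repair], local_id="A")
    print(merged, counter_value(merged))
    print(strip_replica(merged, "A"))
```

Applying the same repair fragment twice leaves the remote count unchanged because of the max rule. Stripping the target replica's count before sending a repair keeps that replica from summing its own count a second time.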