Cassandra at 46 Labs: Idempotent Counters
July 17, 2014
Who is this guy?
I’m also the Founder, which in Latin means
“everyone else gets paid before me.”
~ Literal Translation
What is 46 Labs?
Founded in 2012.
We build realtime telecom analytics and security solutions for carriers and enterprises.
Currently handle around half a billion call billing records per day.
Shout Outs
#Cassandra IRC Channel
“Unbelievable resource”
“Thumbs up for the Startup Program”
Nate McCall
“Helped us in our time of need”
Patent Warning
So…we have parts of this process, related to the handling of telecom analytics and billing records, patented.
Fair warning to the telecom folks in the room.
To all of you who aren’t in that ballpark…feel free to take the pitch and swing away.
What is idempotence?
Simple Answer:
You can perform an operation several times, and repeating it does not change the result.
Example:
A “set” is idempotent. An “increment” or “decrement” isn’t. Not just with Cassandra, but with anything, by definition.
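To make the definition concrete, here is a minimal Java sketch. A plain in-memory map stands in for the database, and the class and method names are hypothetical, not from the talk. Replaying the set leaves the value alone; replaying the increment does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names here are illustrative, not from the talk.
class IdempotenceDemo {
    // "Set" overwrites the stored value, so replaying it changes nothing.
    static void set(Map<String, Long> store, String key, long value) {
        store.put(key, value);
    }

    // "Increment" depends on the current value, so every replay changes the result.
    static void increment(Map<String, Long> store, String key, long delta) {
        store.merge(key, delta, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();

        set(store, "calls", 5L);
        set(store, "calls", 5L);                 // replayed write
        System.out.println(store.get("calls"));  // prints 5

        increment(store, "calls", 1L);
        increment(store, "calls", 1L);           // replayed write
        System.out.println(store.get("calls"));  // prints 7
    }
}
```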
Why does it matter?
Because counters are NOT atomic in Cassandra.
But why?
Because it is really, really, really hard to do anything atomic and distributed, especially counters.
So… ???
Since counters aren’t idempotent, by definition, and aren’t atomic in Cassandra, it means that if you repeated the same counter operation 100 times, you might get different results on each run.
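One common way this drift shows up is a retried write. The sketch below is a simulation only, with hypothetical names, and it does not talk to a real Cassandra cluster. It assumes a server that applies every increment but drops some acknowledgements, and a client that retries whenever it sees no acknowledgement.

```java
// Hypothetical simulation; nothing here talks to a real Cassandra cluster.
class RetryOvercountDemo {
    // A pretend server: it always applies the write, but drops every third acknowledgement.
    static class FlakyCounterServer {
        private long value = 0;
        private int calls = 0;

        boolean increment(long delta) {
            value += delta;
            calls++;
            return calls % 3 != 0; // false = ack lost, although the write was applied
        }

        long value() {
            return value;
        }
    }

    // The client retries each operation until it sees an acknowledgement.
    static long sendIncrements(FlakyCounterServer server, int operations) {
        for (int i = 0; i < operations; i++) {
            while (!server.increment(1)) {
                // Ack lost: the client cannot tell whether the write landed, so it retries.
            }
        }
        return server.value();
    }

    public static void main(String[] args) {
        long stored = sendIncrements(new FlakyCounterServer(), 100);
        System.out.println("intended 100, stored " + stored);
    }
}
```

Every lost acknowledgement turns into an extra increment, so the stored total ends up above the 100 the client meant to send. Had the client been sending an absolute value instead, the retries would have been harmless.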
And…?
It means that you can’t use Cassandra counters for anything requiring precision, like billing balances, voting, statistical analysis or any time-series data that must be exact.
The higher the volume and the more nodes you have, the more inaccurate the counters become.
So I should use MySQL or Couchbase?
If you want atomic counters inside a database as of today, then maybe.
Hint: We have tried both (and a lot more). They are slow. Like…really slow for this type of operation, and they have hurdles way beyond just being slow.
So, all is lost?
Is there a chance that a better alternative exists that will allow me to use Cassandra and have atomic and idempotent counters?
Yeap.
But it involves some helpers.
How we do it
RabbitMQ
Our call billing records come off our infrastructure and go into a RabbitMQ cluster.
Hint: you could use Kafka, Redis, 0MQ, etc.
The RabbitMQ queues are a nice and safe place for our messages to sit and wait to be processed.
With RabbitMQ ACKs, we can be sure the messages are fully processed before they are removed.
Workers
We wrote Java workers, whose sole job in life is to:
1. Consume messages from Rabbit.
2. Perform in-memory atomic counter operations (increment/decrement).
3. Persist the message to Cassandra.
4. Push a static counter value into Cassandra (i.e. a set instead of an increment) every X seconds.
5. ACK back to Rabbit that the operation is complete.
(You can use whatever language you prefer.)
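The five steps could be sketched roughly as follows. This is not the 46 Labs implementation. `CounterStore` is a hypothetical stand-in for Cassandra, a `Runnable` stands in for a RabbitMQ acknowledgement, and no real client library is used. The sketch also assumes that acknowledgements are held back until the snapshot that includes them has been written, which is one possible ordering that the slides do not spell out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: CounterStore stands in for Cassandra, and the Runnable
// "ack" stands in for a RabbitMQ acknowledgement. No real client library is used.
class CounterWorkerSketch {
    interface CounterStore {
        void saveRecord(String recordId, String counterKey, long delta); // step 3
        void setCounter(String counterKey, long absoluteValue);          // step 4: a set, not an increment
    }

    static class InMemoryStore implements CounterStore {
        final Map<String, Long> counters = new HashMap<>();
        final List<String> records = new ArrayList<>();

        public void saveRecord(String recordId, String counterKey, long delta) {
            records.add(recordId);
        }

        public void setCounter(String counterKey, long absoluteValue) {
            counters.put(counterKey, absoluteValue);
        }
    }

    static class Worker {
        private final CounterStore store;
        private final Map<String, Long> totals = new HashMap<>(); // touched by one thread only
        private final List<Runnable> pendingAcks = new ArrayList<>();

        Worker(CounterStore store) {
            this.store = store;
        }

        // Steps 1-3 for one consumed message.
        void onMessage(String recordId, String counterKey, long delta, Runnable ack) {
            totals.merge(counterKey, delta, Long::sum);
            store.saveRecord(recordId, counterKey, delta);
            pendingAcks.add(ack); // assumption: ack only after the next snapshot is written
        }

        // Steps 4-5, called every X seconds by the same thread.
        void flush() {
            totals.forEach(store::setCounter);
            pendingAcks.forEach(Runnable::run);
            pendingAcks.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Worker worker = new Worker(store);
        int[] acked = {0};

        worker.onMessage("cdr-1", "carrierA:minutes", 3, () -> acked[0]++);
        worker.onMessage("cdr-2", "carrierA:minutes", -1, () -> acked[0]++);
        worker.flush();
        worker.flush(); // writing the same absolute value again is harmless

        System.out.println(store.counters.get("carrierA:minutes") + " acked=" + acked[0]); // prints 2 acked=2
    }
}
```

Because the flush writes an absolute value, running it twice (or retrying it after a timeout) leaves the stored counter unchanged.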
Why is this special?
1. You can stream analytics in realtime.
2. Being in-memory, it is ridiculously fast and lightweight.
3. It’s atomic because each counter constituent is in a single thread.
4. Cassandra can be used to atomically persist the counter.
5. The counter data matches the underlying data used to generate it exactly.
Wait…
What happens if the worker crashes…it’s all in memory!
Refer to step 4 in what our worker’s job is to do:
“Push a static counter value into Cassandra (i.e. a set instead of an increment) every X seconds.”
Since we push a static counter value into Cassandra, we now have an idempotent way to recover gracefully in the event of a crash. The worker fires up, asks Cassandra what it should have in its memory, then starts its atomic operations again. This backup worker can come up (via ZooKeeper) on a different physical or virtual host if needed.
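A rough sketch of that recovery path, with hypothetical names throughout: a plain map stands in for the counter table in Cassandra, and the sketch assumes that any message not yet covered by a snapshot was never acknowledged, so the queue delivers it again to the replacement worker.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the counter table in Cassandra.
class RecoveryDemo {
    static class Worker {
        private final Map<String, Long> snapshotTable;
        private final Map<String, Long> totals;

        // On start-up, ask the store what should be in memory.
        Worker(Map<String, Long> snapshotTable) {
            this.snapshotTable = snapshotTable;
            this.totals = new HashMap<>(snapshotTable);
        }

        void apply(String key, long delta) {
            totals.merge(key, delta, Long::sum);
        }

        // Overwrite the stored values with the in-memory totals (a set).
        void flush() {
            snapshotTable.putAll(totals);
        }
    }

    static long runWithCrash() {
        Map<String, Long> table = new HashMap<>();

        Worker first = new Worker(table);
        first.apply("balance", 10);
        first.apply("balance", 5);
        first.flush();              // snapshot = 15; these two messages can now be acked
        first.apply("balance", 7);  // applied in memory, then the worker dies before flushing

        Worker replacement = new Worker(table); // starts from the snapshot: 15
        replacement.apply("balance", 7);        // the unacked message is redelivered
        replacement.flush();
        return table.get("balance");
    }

    public static void main(String[] args) {
        System.out.println(runWithCrash()); // prints 22
    }
}
```

The lost in-memory increment is not counted twice: the replacement starts from the last value that was set, and only the unacknowledged message is replayed on top of it.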
You can’t grow!
Since you are limited to a single thread processing a single counter, once you run out of memory or saturate the CPU for that counter you can’t grow!
Yeap. This is why we shard our data at the application layer and not the worker layer. We abstract scalability further out, knowing we have a finite amount of memory and processing power to play with at the worker level.
We can atomically handle 1M ops/sec from a single worker on a single moderately powered server. If you are taxing that single server, you need to re-think your architecture.
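The talk does not say how the application-layer sharding is done. One simple possibility, shown here purely as an assumption, is to hash the counter key so that every update for a given counter always lands on the same queue, and therefore on the same single-threaded worker. The class name and the `counters.N` queue naming are hypothetical.

```java
// Hypothetical sketch of application-layer routing; hash-based sharding is an
// assumption here, not something the talk describes.
class ShardRouter {
    // Map a counter key to a shard index in [0, shardCount).
    static int shardFor(String counterKey, int shardCount) {
        return Math.floorMod(counterKey.hashCode(), shardCount);
    }

    // The producer publishes to the queue owned by that shard's worker.
    static String queueFor(String counterKey, int shardCount) {
        return "counters." + shardFor(counterKey, shardCount);
    }

    public static void main(String[] args) {
        int shards = 4;
        for (String key : new String[] {"carrierA:minutes", "carrierB:minutes", "carrierA:calls"}) {
            System.out.println(key + " -> " + queueFor(key, shards));
        }
    }
}
```

`Math.floorMod` is used instead of `%` because `hashCode()` can be negative. Since a key always maps to the same shard, each counter still has exactly one writer, which is what keeps the in-memory operations atomic.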
Does it work?
Sure it does.
We currently process over 2 million counter operations per second using this method.
Questions?
If you think of any that you forgot to ask, you can email me at trevor@46labs.com.

Austin Cassandra Meetup re: Atomic Counters
