Building Your Own Distributed System 
The Easy Way 
Kévin Lovato - @alprema
What this presentation will 
NOT talk about 
• Gazillions of inserts per second 
• Hundreds of nodes 
• Migrations from old technology to C* that now go 100 times faster
What this presentation will talk 
about 
• Servers that synchronize their state 
• Out of order messages 
• CQL Schema design 
• Time measurement madness
Introduction
• Hedge fund specializing in algorithmic trading 
• ~80 employees 
• Our C* usage 
• Historical data (6+ TB) 
• Time series (metrics) 
• Home-made Service Bus (Zebus)
Service Bus 101 
• Network abstraction layer 
• Allows communication between services (SOA) 
• Communication happens through business-level messages (events) 
• Usually relies on a broker
Zebus 101 
• Developed in .Net 
• P2P 
• Lightweight 
• CQRS oriented 
• 1+ year of production experience 
• ~150M messages / day
Architecture overview
Terminology 
• Peer: A program connected to the Bus 
• Subscription: A message type a Peer is interested in 
• Directory server: A Peer that knows all the Peers and their Subscriptions
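A minimal, purely illustrative model of the first two concepts (these are assumptions for the examples that follow, not Zebus's actual types); a Directory server is then simply a Peer that keeps a collection of all the others:

using System.Collections.Generic;

// Illustrative only: a minimal model of the terminology above.
public class Subscription
{
    public string MessageTypeId { get; set; }           // the message type the Peer is interested in
}

public class Peer
{
    public string PeerId { get; set; }                  // e.g. "Peer.1"
    public string Endpoint { get; set; }                 // where other peers can reach it
    public List<Subscription> Subscriptions { get; set; } = new List<Subscription>();
}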
Directory 1 Directory 2 
Peer 1 Peer 2 
Peer 3 
Peer 1 is not connected and needs to 
register on the bus
Directory 1 Directory 2 
Peer 1 Peer 2 
Peer 3 
Register Peers list + 
Subscriptions
Directory 1 Directory 2 
Peer 1 Peer 2 
Peer 3 
New Peer information
Directory 1 Directory 2 
Peer 1 Peer 2 
Peer 3 
Direct communication
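A purely illustrative, in-memory sketch of the directory side of this exchange (type and method names are hypothetical, not Zebus's real API; the real Directory is backed by Cassandra, as shown later):

using System.Collections.Generic;

public class DirectoryServer
{
    // peer id -> subscribed message types (in-memory here; Cassandra-backed in Zebus)
    private readonly Dictionary<string, List<string>> _subscriptionsByPeer = new Dictionary<string, List<string>>();

    // "Register": store the new peer and answer with the current "Peers list + Subscriptions".
    public IReadOnlyDictionary<string, List<string>> Register(string peerId, List<string> subscriptions)
    {
        var snapshot = new Dictionary<string, List<string>>(_subscriptionsByPeer);
        _subscriptionsByPeer[peerId] = subscriptions;
        // A real Directory would now broadcast the "New Peer information" to the other peers,
        // after which peers exchange messages directly, without going through the Directory.
        return snapshot;
    }
}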
Design requirements
The Directory servers must be identical (no master) 
A peer can contact any of the Directory servers at any time 
Directory servers can be updated/restarted at any time 
Peers have to be able to add Subscriptions one at a time if needed
Option 1: Design a resilient distributed system 
Option 2: Let Cassandra do the heavy lifting ("Pick me! Pick me!")
How?
I. Make the Directory Servers stateless
• Lets us offload state synchronization to Cassandra (QUORUM reads and writes everywhere, as in the sketch below) 
• Makes restart / crash recovery easy 
• Only « business » code in the Directory Server
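A minimal sketch of "QUORUM everywhere" with the DataStax C# driver (host, keyspace and table names are illustrative): every read and write touches a majority of replicas, so any stateless Directory server, even one that just restarted, sees the latest registered state.

using Cassandra;

var cluster = Cluster.Builder()
    .AddContactPoint("cassandra-node-1")
    .WithQueryOptions(new QueryOptions().SetConsistencyLevel(ConsistencyLevel.Quorum))
    .Build();
var session = cluster.Connect("zebus_directory");

// Rebuilding the in-memory view after a restart is just reading the tables back.
var rows = session.Execute("SELECT * FROM dynamic_subscriptions");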
II. Handle out-of-order subscriptions
Directory 1 Directory 2 
Peer 1 
Timestamps: 
Naive implementation (server side) 
Peer 1 is already registered on the Bus and 
will need to do multiple Subscription updates
Directory 1 Directory 2 
Subscriptions update A 
Peer 1 
Timestamps: 
Naive implementation (server side)
Directory 1 Directory 2 
Peer 1 
Subscriptions update B 
Timestamps: 
Naive implementation (server side)
Directory 1 Directory 2 
A delay (network, slow machine, etc.) causes 
Directory 1 to process the update after Directory 2 
Peer 1 
Timestamps: 
Naive implementation (server side)
Subscriptions update B 
Timestamp: 00:00:01 
Directory 1 Directory 2 
Peer 1 
Timestamps: 
Naive implementation (server side)
Timestamps: 
Naive implementation (server side) 
Directory 1 Directory 2 
Peer 1 
Subscriptions update A 
Timestamp: 00:00:02
Directory 1 Directory 2 
Peer 1 
Stored: Subscriptions update A 
Timestamps: 
Naive implementation (server side)
Directory 1 Directory 2 
Peer 1 
Timestamps: 
Zebus implementation (client side) 
Same scenario, but this time using client side 
timestamps
Directory 1 Directory 2 
Subscriptions update A 
Timestamp: 00:00:01 
Peer 1 
Timestamps: 
Zebus implementation (client side)
Directory 1 Directory 2 
Peer 1 
Subscriptions update B 
Timestamp: 00:00:02 
Timestamps: 
Zebus implementation (client side)
Directory 1 Directory 2 
Peer 1 
Timestamps: 
Zebus implementation (client side) 
The delay voodoo happens again
Subscriptions update B 
Timestamp: 00:00:02 
Directory 1 Directory 2 
Peer 1 
Timestamps: 
Zebus implementation (client side)
Timestamps: 
Zebus implementation (client side) 
Directory 1 Directory 2 
Peer 1 
Subscriptions update A 
Timestamp: 00:00:01
Directory 1 Directory 2 
Peer 1 
Timestamp resolution is handled by C* 
Stored: Subscriptions update B 
Timestamps: 
Zebus implementation (client side)
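A minimal sketch of client-side timestamps (table and column names are illustrative, not the actual Zebus schema): the write carries the timestamp the peer attached to the update, via CQL's USING TIMESTAMP. Cassandra keeps the value with the highest timestamp, so update B (00:00:02) wins even if update A (00:00:01) is applied later.

using System;
using Cassandra;

public static class SubscriptionStore
{
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Cassandra timestamps are microseconds since the epoch; .NET ticks are 100 ns.
    private static long ToCassandraTimestamp(DateTime utc) => (utc - Epoch).Ticks / 10;

    public static void StoreSubscription(ISession session, string peerId, string messageType, DateTime clientUtcTimestamp)
    {
        // Sketch only: a real implementation would use a prepared statement instead of string interpolation.
        session.Execute(
            $"INSERT INTO dynamic_subscriptions (peer_id, message_type) " +
            $"VALUES ('{peerId}', '{messageType}') " +
            $"USING TIMESTAMP {ToCassandraTimestamp(clientUtcTimestamp)}");
    }
}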
III. Handle subscriptions efficiently
A Peer is already registered on the bus, and 
has subscribed to one event type 
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } Initial subscriptions
It now needs to add a new subscription 
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } Initial subscriptions
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } 
Peer.1 OtherEvent 
(new) { misc. Info } 
It will send all its current subscriptions + the 
new one
Peer 1 
Directory 1 
Now imagine that the peer adds 10 000 
subscriptions
Peer 1 
Directory 1 
Now imagine that the peer adds 10 000 
subscriptions, one at a time
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } 
Peer.1 OtherEvent 
(new) { misc. Info } 
…10 000 other events… 
Peer.1 NthEvent { misc. Info } 
10 000 times
Peer 1 
Directory 1 
Solution: Transfer subscriptions by message 
type
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 NewEvent (1st) { misc. Info }
Peer 1 
Directory 1 
Peer ID MessageType Sub. Info 
Peer.1 NewEvent (2nd) { misc. Info } 
And so on…
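An illustrative sketch of the shape of such a per-type update (hypothetical names, not Zebus's real messages): the peer sends one small update per message type instead of resending its full subscription list.

using System;
using System.Collections.Generic;

public record SubscriptionsForTypeUpdated(
    string PeerId, string MessageType, IReadOnlyList<string> SubscriptionInfo, DateTime UtcTimestamp);

public static class SubscriptionUpdates
{
    // Builds one per-type update for each message type whose subscriptions changed.
    public static IEnumerable<SubscriptionsForTypeUpdated> For(
        string peerId, IDictionary<string, IReadOnlyList<string>> changedTypes, DateTime utcTimestamp)
    {
        foreach (var kvp in changedTypes)
            yield return new SubscriptionsForTypeUpdated(peerId, kvp.Key, kvp.Value, utcTimestamp);
    }
}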
But then, how do we store that?
IV. Pick the proper row granularity
• We want to only do upserts (no read-before-write) 
• We want Cassandra to use client timestamps to resolve out-of-order updates 
• Subscriptions have to be updatable one by one
One subscription per row 
Peer ID MessageType Subscription Info 
Peer.18 CoolEvent { misc. Info } 
… … … 
• Primary Key (Peer Id, MessageType)
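A possible CQL rendering of this layout (table and column names are illustrative, not the actual Zebus schema):

using Cassandra;

public static class DirectorySchema
{
    public static void CreateDynamicSubscriptions(ISession session)
    {
        // One row per (peer, message type): a single subscription can be upserted or
        // deleted without reading or rewriting the peer's other subscriptions.
        session.Execute(@"
            CREATE TABLE IF NOT EXISTS dynamic_subscriptions (
                peer_id       text,
                message_type  text,
                sub_info      blob,
                PRIMARY KEY (peer_id, message_type))");
    }
}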
Directory 
Peer 1 and 2 need to register on the Bus 
Peer 1 Peer 2
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } 
Peer.1 OtherEvent { misc. Info } 
Directory 
• Peer 1 registers with 2 Subscriptions 
Peer 1 Peer 2
Writing 
Directory 
• Peer 1 registers with 2 Subscriptions 
• Directory starts to write to C* 
Peer 1 Peer 2
Still writing 
Directory 
• Peer 1 registers with 2 Subscriptions 
• Directory starts to write to C* 
• Peer 2 registers during the write 
Register 
Peer 1 Peer 2
Still writing 
Directory 
• Peer 1 registers with 2 Subscriptions 
• Directory starts to write to C* 
• Peer 2 registers during the write 
• Since the insertion was not finished, Peer 2 gets an incomplete state 
Peer ID MessageType Sub. Info 
Peer.1 CoolEvent { misc. Info } 
Peer 1 Peer 2
All subscriptions in one row 
Peer ID All Subscriptions Blob 
Peer.18 { blob } 
… … 
• Primary Key (Peer Id)
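For contrast, a sketch of this single-row layout (names illustrative): every change to the blob is a read-modify-write, which is exactly what breaks in the scenario that follows.

using Cassandra;

public static class SingleRowSchema
{
    public static void Create(ISession session)
    {
        // The whole subscription list is one opaque blob per peer.
        session.Execute(@"
            CREATE TABLE IF NOT EXISTS peer_subscriptions_blob (
                peer_id            text PRIMARY KEY,
                all_subscriptions  blob)");
    }
}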
Directory 1 Directory 2 
Peer 1 is already registered on the Bus 
and needs to add two Subscriptions 
Peer 1
Add Subscription 1 
Directory 1 Directory 2 
Peer 1 
• Peer 1 adds Subscription 1
Add Subscription 2 
Directory 1 Directory 2 
Peer 1 
• Peer 1 adds Subscription 1 
• Peer 1 adds Subscription 2
Directory 1 Directory 2 
A delay (again!) slows down Directory 1, causing both 
Subscription updates to be processed concurrently 
Peer 1
State: 
No subscriptions 
State: 
No subscriptions 
Directory 1 Directory 2 
Peer 1 
• Peer 1 adds Subscription 1 
• Peer 1 adds Subscription 2 
• Directory 1 gets the state to add Subscription 1 
• Directory 2 gets the state to add Subscription 2
Store: 
Subscription 1 
Store: 
Subscription 2 
Directory 1 Directory 2 
Peer 1 
• Peer 1 adds Subscription 1 
• Peer 1 adds Subscription 2 
• Directory 1 gets the state to add Subscription 1 
• Directory 2 gets the state to add Subscription 2 
• They both store the updated state to C*
Stored: 
Either Subscription 1 or 2 depending on 
which was the slowest 
Directory 1 Directory 2 
Peer 1 
• Peer 1 adds Subscription 1 
• Peer 1 adds Subscription 2 
• Directory 1 gets the state to add Subscription 1 
• Directory 2 gets the state to add Subscription 2 
• They both store the updated state to C* 
• Both store only their new subscription
Solution: Compromise 
• We split subscriptions into Static and Dynamic subscriptions 
• Static subscriptions cannot be updated one-by-one 
• The Dynamic subscriptions list cannot be handled atomically 
• Each type has its own Column Family
Static subscriptions schema 
Peer ID Endpoint IsUp […] StaticSubscriptions 
Peer.18 tcp://1.2.3.4:123 true […] { blob } 
… … … […] … 
• Primary Key (Peer Id)
Dynamic subscriptions schema 
Peer ID MessageType Subscription info 
Peer.18 UserCreated { misc. Info } 
… … … 
• Primary Key (Peer Id, MessageType)
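A possible CQL rendering of the static column family (names illustrative, not the actual Zebus schema); the dynamic one is the "one subscription per row" table sketched earlier.

using Cassandra;

public static class StaticSubscriptionsSchema
{
    public static void Create(ISession session)
    {
        // One row per peer, written as a whole each time the peer registers.
        session.Execute(@"
            CREATE TABLE IF NOT EXISTS peers (
                peer_id               text PRIMARY KEY,
                endpoint              text,
                is_up                 boolean,
                static_subscriptions  blob)");
    }
}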
V. Miscellaneous bits of “fun”
DateTime.Now 
• Calling DateTime.Now twice in a row can (and will) return the same value 
• Its resolution is around 10 ms 
• We had to create a unique timestamp provider (add 1 tick when called in the same « time bucket »; see the sketch below)
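A minimal sketch of such a unique timestamp provider (an illustration of the idea described above, not Zebus's actual code):

using System;
using System.Threading;

public static class UniqueTimestampProvider
{
    private static long _lastTicks;

    // Returns a strictly increasing UTC timestamp: if the clock hasn't moved since the
    // last call (DateTime's resolution is ~10-16 ms), bump the previous value by 1 tick.
    public static DateTime NextUtc()
    {
        while (true)
        {
            var last = Interlocked.Read(ref _lastTicks);
            var candidate = Math.Max(DateTime.UtcNow.Ticks, last + 1);
            if (Interlocked.CompareExchange(ref _lastTicks, candidate, last) == last)
                return new DateTime(candidate, DateTimeKind.Utc);
        }
    }
}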
Cassandra timestamp 
• .Net’s DateTime.Ticks is more precise than Cassandra’s timestamps (100 ns vs. 1 μs) 
• Our custom time provider ensured uniqueness by adding 1 tick at a time, which was lost in translation: a 1-tick difference vanishes when ticks are truncated to whole microseconds (see the sketch below)
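The precision mismatch, demonstrated in code (dates are arbitrary examples):

using System;

var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

// .NET ticks are 100 ns, Cassandra timestamps are whole microseconds.
long ToCassandraTimestamp(DateTime utc) => (utc - epoch).Ticks / 10;

var t1 = new DateTime(2014, 12, 3, 10, 0, 0, DateTimeKind.Utc);
var t2 = t1.AddTicks(1);  // "unique" as far as .NET is concerned...

Console.WriteLine(ToCassandraTimestamp(t1) == ToCassandraTimestamp(t2));  // True: uniqueness is lost
// One possible fix (an assumption, not from the slides): have the unique timestamp provider
// advance by at least 10 ticks (1 µs), so distinct values still differ after the division.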
« UselessKey » 
• The Directory CF is really small and needs to be retrieved entirely and frequently 
• We used a « bool UselessKey » partition key to force sequential storage and squeeze out the last bits of speed we needed
« UselessKey » 
UselessKey Peer ID MessageType Subscription info 
false Peer.18 UserCreated { misc. Info } 
… … … … 
• Primary Key (UselessKey, Peer Id, MessageType) 
• You should benchmark (after a flush) with your real data
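A sketch of what that variant of the dynamic subscriptions table looks like in CQL (names illustrative). Keep the caveat above in mind: this packs everything into a single partition, so it only makes sense for a small CF, and you should benchmark it with your real data first.

using Cassandra;

public static class UselessKeySchema
{
    public static void Create(ISession session)
    {
        // A constant boolean partition key puts every row in one partition, so the
        // frequent "read the whole CF" query becomes a single sequential partition read.
        session.Execute(@"
            CREATE TABLE IF NOT EXISTS dynamic_subscriptions (
                useless_key   boolean,
                peer_id       text,
                message_type  text,
                sub_info      blob,
                PRIMARY KEY (useless_key, peer_id, message_type))");

        session.Execute(
            "INSERT INTO dynamic_subscriptions (useless_key, peer_id, message_type) " +
            "VALUES (false, 'Peer.18', 'UserCreated')");
    }
}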
Summary
When you have multiple servers sharing state, Cassandra can save you some headaches 
Schema design is critical: think it through thoroughly and make sure you understand what is atomic and what is not 
Client-provided timestamps can be very useful, but be sure to generate unique timestamps 
If you are not using Java, be well aware of the data type differences between your language and Java
Want to see the code? 
www.github.com/Abc-Arbitrage
Want to see more code? 
jobs@abc-arbitrage.com
Questions?


Editor's Notes

  1. 4 DC
  2. MuleESB: 120 Connectors!
  3. Masterless
  4. Anytime Round-robin
  5. Fail/reboot anytime
  6. Subscriptions one-by-one
  7. No state synchro Crash recovery easy Clean business only code
  8. ONE REGISTERED peer wants MULTIPLE subscriptions
  9. SAME thing
  10. PAUSE after
  11. ONE REGISTERED peer has ONE subscription
  12. I talked about « Subscriptions » update…
  13. Only upserts Client timestamps => each atom in a column Subscriptions one-by-one
  14. peer1 & peer2 SIMULTANEOUSLY
  15. peer1 REGISTERED wants TWO subscriptions
  16. I wanna go « incremental »…
  17. Compromise Can’t be both « transactional » on register AND incremental Have to separate
  18. Now that I’m done with the big steps…
  19. « AboutNow » 10 ms resolution Unique provider +1tick
  20. Ticks (100ns) Vs. C* timestamp (1µs) 1tick => lost in translation
  21. Select * from (no where) often Squeeze the last bits of speed DONT do that
  22. Select * from (no where) often Squeeze the last bits of speed DONT do that
  23. Need a cool picture
  24. Shared state => cassandra good
  25. Schema design => atomic / non atomic
  26. Client timestamp good / but unique timestamp
  27. Warning language differences