$10 thousand per minute of downtime: architecture, queues, streaming and fintech
Max Baginskiy
Solidgate
About me
Head of Engineering
Previously Tech Lead and Platform engineer
10 yrs in Software Engineering
Last 6 years Go, fan of DevOps
Build teams (5 teams, 30+ people hired)
And architecture
Agenda
Company intro
Architecture of the system
Queues and Streams to choose from
Low latency streaming using outbox
CDC to our solution - comparison
Questions.
About company
7+ years online
70 engineers
50 SW engineers
20 Infra + Data engineers + AQA
PCI DSS Compliant
European Acquirer
Business figures
$2.5B annually
15-18M tx monthly
$10k per 1 min of downtime
40+ integrated payment methods and providers
ALB Traffic
We have 100x less traffic on ALB during high season than Shopify
Stripe served 250 mil API calls per day in 2020
Kafka Producer
We started integrating Kafka last year
20 rps average
2 mil events per day
RabbitMQ Producer
100-120 rps average
10 mil events per day
Logs
1.5-2k rps of logs
150 mil events per day. 200-300 GB of logs daily
Architecture
Let the story begin
Go
Project “Taxer”
Non-functional requirements
Durability out of the box
Queue replay
Single active consumer support
Easy to set up and maintain
Partitioning
Easy scaling for publisher and consumer
Extensibility: schema registry support, dynamic routing, enrichment
NFR - explanation
What if a message is lost in between services while processing and we retry the payment?
What if a message is lost in between callback service and callback processor?
What if a message is lost in between payment and finance systems?
What if … what if … what if …?
RabbitMQ dive in
Erlang
Written in Erlang. Erlang was made by Ericsson, which makes telecommunication devices.
Proof of fail-safety
ATM AXD301 example: calculated uptime of 99.9999999%, only one problem in many years.
Mnesia as storage
Mnesia doesn’t support recovery from split brain and other types of failures.
RabbitMQ Durability
Mechanisms
Publisher confirms are a MUST have
RabbitMQ can store data to disk and has different autoheal modes
Different types of queues: Quorum, Mirrored
Has Streaming in “beta”.
What if publisher confirms are disabled?
Delivery after exchange might not happen
Persistence might not happen
A few replicas might not acknowledge the message in Quorum
An overwhelmed cluster will not accept messages, but the publisher will not know.
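The failure modes above can be illustrated with a minimal sketch. It assumes a hypothetical Publisher interface and a flakyBroker test double; neither is the API of a real RabbitMQ client. The point is only that the caller treats a message as delivered once the broker has acknowledged it, and retries otherwise.

```go
package main

import "errors"

// ErrNotConfirmed reports that the broker never acknowledged a message.
var ErrNotConfirmed = errors.New("message not confirmed by broker")

// Publisher is a hypothetical abstraction over a broker client.
// Publish returns true only when the broker has confirmed (acked) the message.
type Publisher interface {
	Publish(body []byte) (confirmed bool, err error)
}

// PublishConfirmed retries until the broker confirms the message or the
// attempt budget is exhausted. Without confirms the caller could not tell
// a dropped message from a delivered one.
func PublishConfirmed(p Publisher, body []byte, maxAttempts int) (attempts int, err error) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		confirmed, pubErr := p.Publish(body)
		if pubErr == nil && confirmed {
			return attempts, nil
		}
	}
	return maxAttempts, ErrNotConfirmed
}

// flakyBroker is a test double that rejects the first `drops` messages,
// for example because the cluster is overwhelmed.
type flakyBroker struct {
	drops     int
	delivered [][]byte
}

func (b *flakyBroker) Publish(body []byte) (bool, error) {
	if b.drops > 0 {
		b.drops--
		return false, nil // not confirmed
	}
	b.delivered = append(b.delivered, body)
	return true, nil
}
```

Retrying on a missing confirm gives at-least-once delivery, so in a design like this consumers would need to tolerate duplicates.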
Quorum queues
+ Pros
Have consensus built in
Data written to disk, metadata in memory
Can easily handle restarts.
− Cons
Doesn’t scale well - replicating millions of messages after a restart can take hours.
Doesn’t have a “replay” mechanism
Consumers don’t scale
Doesn’t preserve order of messages.
Split brain
Split brain - autoheal
ignore
Use when network reliability is the highest practically possible and node availability is of topmost importance.
pause_minority
Appropriate when clustering across racks or availability zones in a single region and the probability of losing a majority of nodes (zones) at once is considered to be very low.
autoheal
Appropriate when you are more concerned with continuity of service than with data consistency across nodes.
Summary - no way to guarantee that autoheal will work properly
RabbitMQ streaming, problem #1
RabbitMQ streaming. Go client, issue #2
RabbitMQ streaming. Go client, issue #3
RabbitMQ + RabbitMQ streaming
New feature that not a lot of companies use.
Go client is not ready; what about Python or Node.js, I’m afraid to ask.
Hard to support. Requires updates of Erlang and then RabbitMQ.
Streaming is a plugin that requires a specific version of RabbitMQ.
Not made for fintech: lack of proper durability, lack of functionality.
Kafka dive in
Java
Written in Java by LinkedIn, then open-sourced and licensed under the Apache license.
Highly available and durable
Has WAL, works in cluster, saves data to disk by default.
Blazing fast
Sequential writes, zero copy.
Kafka dive in
Kafka uses optimizations around sequential writes to optimize disk usage, together with zero copy.
Has a WAL log for replication and durability.
ZooKeeper as a separate system tracks the health of the cluster.
Can work even without ZooKeeper.
Chaos engineering shows that Kafka is a highly available and durable solution.
Debezium
Debezium how-to:
Create a replication slot
Run Debezium Java service in cluster
Configure it with Groovy
Debezium
+ Pros
Uses WAL directly - doesn’t create additional load on WAL (no additional data is written).
Production ready, tested solution
Low latency.
− Cons
How to replay data? Can you specify a Log Sequence Number? What if you need to stream only a fraction of what is written in WAL?
Missing Buf (protobuf on steroids)
Low flexibility and hard configurability
DB Isolation.
Groovy, which is not easy to use
Random disconnects and the need to restart.
Transactional outbox
Why use Transactional Outbox?
Neither Kafka nor CDC can flexibly re-stream data.
Without specific instruments you can’t remove specific events from Kafka.
Replay with Kafka will require setup of additional services.
Consistent state with the usage of transactions.
Outbox table - WAL
ID - ULID (sortable UUIDs).
Bucket - partitioning. Read/Write partitioning.
Payload - JSON body of domain model.
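A minimal sketch of how such a row could be modelled in Go. The field names, the types, the FNV-1a hash and the helper functions are all assumptions for illustration; the slides only name the three columns.

```go
package main

import (
	"encoding/json"
	"hash/fnv"
)

// OutboxRecord mirrors the three columns named on the slide.
// Field names and types are assumptions for illustration.
type OutboxRecord struct {
	ID      string          // ULID: lexicographically sortable identifier
	Bucket  uint32          // partition number shared by readers and writers
	Payload json.RawMessage // JSON body of the domain model
}

// BucketFor maps a partition key (for example an order ID) to one of n buckets.
// FNV-1a is an arbitrary choice here; any stable hash would do. n must be > 0.
func BucketFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return h.Sum32() % n
}

// NewOutboxRecord builds a record for a domain object; the caller supplies the ULID.
func NewOutboxRecord(id, key string, buckets uint32, model any) (OutboxRecord, error) {
	body, err := json.Marshal(model)
	if err != nil {
		return OutboxRecord{}, err
	}
	return OutboxRecord{ID: id, Bucket: BucketFor(key, buckets), Payload: body}, nil
}
```

Because the bucket is derived from a stable key, every event for the same key lands in the same bucket, and each bucket can be read by its own streamer instance.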
Choosing Go lib
confluent-kafka-go - CGO + librdkafka
ibm/sarama
segmentio/kafka-go
Outbox table - WAL
Batch Size - usually 10.
Batch Timeout - 100 ms.
Required Acks - all nodes should confirm message.
Async - false for synchronous error handling.
Kafka write latency
Schema registry -
“Spec first” approach - speeds up development.
Backward compatibility support - linters.
Reusable “mental model” = simplified migration from API to stream.
Client, server and models are generated for various languages.
Simplified versioning.
Taxer v1 option we built
Update payment in Gate (Kotlin).
Transaction: save payment update and create a record in Outbox.
Orderstreamer (Go) - reads batch from outbox.
Publish data to Stream.
Update offset in meta table.
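The streamer side of these steps can be sketched as one loop iteration. The Outbox, Stream and Meta interfaces and the in-memory stand-ins are hypothetical; in the talk they correspond to the PostgreSQL outbox table, the Kafka topic and the meta table. This is not the actual Orderstreamer code.

```go
package main

import "errors"

// Event is one outbox row; in v1 the ID is a ULID, which sorts lexicographically.
type Event struct {
	ID      string
	Payload string
}

// Outbox, Stream and Meta are hypothetical interfaces standing in for the
// outbox table, the stream (topic) and the meta (offset) table.
type Outbox interface {
	ReadAfter(offset string, limit int) []Event
}
type Stream interface {
	Publish(batch []Event) error
}
type Meta interface {
	Offset() string
	SetOffset(offset string)
}

// StreamOnce performs one iteration: read a batch after the stored offset,
// publish it, and only then advance the offset. If publishing fails the
// offset stays put, so the same batch is retried on the next iteration.
func StreamOnce(o Outbox, s Stream, m Meta, batchSize int) (int, error) {
	batch := o.ReadAfter(m.Offset(), batchSize)
	if len(batch) == 0 {
		return 0, nil
	}
	if err := s.Publish(batch); err != nil {
		return 0, err
	}
	m.SetOffset(batch[len(batch)-1].ID)
	return len(batch), nil
}

// In-memory stand-ins used only to exercise the sketch.
type memOutbox []Event // rows are assumed to be sorted by ID

func (o memOutbox) ReadAfter(offset string, limit int) []Event {
	var out []Event
	for _, e := range o {
		if e.ID > offset && len(out) < limit {
			out = append(out, e)
		}
	}
	return out
}

type memStream struct {
	fail bool
	sent []Event
}

func (s *memStream) Publish(batch []Event) error {
	if s.fail {
		return errors.New("broker unavailable")
	}
	s.sent = append(s.sent, batch...)
	return nil
}

type memMeta struct{ offset string }

func (m *memMeta) Offset() string     { return m.offset }
func (m *memMeta) SetOffset(o string) { m.offset = o }
```

Advancing the offset only after a successful publish means a crash between the two steps republishes the batch instead of losing it.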
Streamer
Meta table
Architecture comparison
v1 comparison with typical architecture
+ Pros
We have a full transaction log that can be replayed, reworked, saved, fixed
Only 1 new tech - Kafka
Streamer + Leaser = 200 lines of code + 800 lines of tests. It can be used as a library, not a service
Buf/Go/PostgreSQL - everything reused - maintenance simplified.
− Cons
WAL amplification - 2x. Transactional outbox requires 1 more write for each operation
High delay - 2 min for events
More CPU load than just reading from WAL.
The end.
Nah, I’m kidding.
Orderstreamer delay: 2 min
You learned about 2 min delay
ULIDs don’t allow us to understand the commit order of events or to spot missing parts.
Solution - Logical Clock + Autoincrement
Logical clocks allow a distributed system to enforce a partial ordering of events without physical clocks.
You can also detect missing events with them.
v2 Implementation
Auto increment instead of ULID will help you to report and look for missing IDs. It’s more like “logical time”.
Look for missing IDs, save them in the meta table for 2 mins and restream them when they appear.
v2 Summary
+ Pros
Reduces delay time from 2 mins to literally seconds
We can use this approach not only in reports/taxes but also in processing.
− Cons
Ordering can be broken, but we can support several models of eventual consistency
Higher DB CPU utilization.
The end.
Nah, I’m kidding x2
v3 reading WAL
WAL “reading” can be implemented in just 200 lines of code along with replication slot creation and publication creation.
In PostgreSQL replication slots you have access to received_lsn, latest_end_lsn. It seems like you can replay changes.
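Replaying from a given position requires comparing LSNs. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash. The sketch below converts that text form to a 64-bit position and back. ParseLSN and FormatLSN are illustrative helpers, not part of the V3 library mentioned in the talk.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseLSN converts the textual PostgreSQL LSN form "X/Y" (two hexadecimal
// numbers: the high and low 32 bits) into a single 64-bit WAL position.
func ParseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

// FormatLSN is the inverse of ParseLSN.
func FormatLSN(lsn uint64) string {
	return fmt.Sprintf("%X/%X", uint32(lsn>>32), uint32(lsn))
}
```

Once parsed, two positions can be compared as plain integers to decide whether a change lies before or after a replay point.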
V3 WAL lib - Next Time
See you at the next Highload!
making online payments simple

More Related Content

Similar to "$10 thousand per minute of downtime: architecture, queues, streaming and fintech", Max Baginskiy

The Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To KnowThe Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To Know
All Things Open
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
Justin Borgman
 
Nagios Conference 2014 - David Josephsen - Alert on What You Draw
Nagios Conference 2014 - David Josephsen - Alert on What You DrawNagios Conference 2014 - David Josephsen - Alert on What You Draw
Nagios Conference 2014 - David Josephsen - Alert on What You Draw
Nagios
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
W2O Group
 
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsGo Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Jonas Bonér
 
Logs And Backups
Logs And BackupsLogs And Backups
Logs And Backups
Charles Southerland
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Milen Dyankov
 
Software Architecture Anti-Patterns
Software Architecture Anti-PatternsSoftware Architecture Anti-Patterns
Software Architecture Anti-Patterns
Eduards Sizovs
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Databricks
 
Scaling a Rails Application from the Bottom Up
Scaling a Rails Application from the Bottom Up Scaling a Rails Application from the Bottom Up
Scaling a Rails Application from the Bottom Up
Abhishek Singh
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up Search
Azul Systems Inc.
 
Let's Get to the Rapids
Let's Get to the RapidsLet's Get to the Rapids
Let's Get to the Rapids
Maurice Naftalin
 
Parallel Ruby: Managing the Memory Monster
Parallel Ruby: Managing the Memory MonsterParallel Ruby: Managing the Memory Monster
Parallel Ruby: Managing the Memory Monster
Kevin Miller
 
Sadeem cloud native السحابة الطبيعية
Sadeem cloud native السحابة الطبيعيةSadeem cloud native السحابة الطبيعية
Sadeem cloud native السحابة الطبيعية
Taher Boujrida
 
MongoDB in the Cloud -- Mongo Boulder
MongoDB in the Cloud -- Mongo BoulderMongoDB in the Cloud -- Mongo Boulder
MongoDB in the Cloud -- Mongo Boulder
Justin Smestad
 
Expecto Performa! The Magic and Reality of Performance Tuning
Expecto Performa! The Magic and Reality of Performance TuningExpecto Performa! The Magic and Reality of Performance Tuning
Expecto Performa! The Magic and Reality of Performance Tuning
Atlassian
 
Introducing Akka
Introducing AkkaIntroducing Akka
Introducing Akka
Jonas Bonér
 
Introducingakkajavazone2012 120914094033-phpapp02
Introducingakkajavazone2012 120914094033-phpapp02Introducingakkajavazone2012 120914094033-phpapp02
Introducingakkajavazone2012 120914094033-phpapp02
Typesafe
 
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
CodeOps Technologies LLP
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Jonas Bonér
 

Similar to "$10 thousand per minute of downtime: architecture, queues, streaming and fintech", Max Baginskiy (20)

The Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To KnowThe Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To Know
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
 
Nagios Conference 2014 - David Josephsen - Alert on What You Draw
Nagios Conference 2014 - David Josephsen - Alert on What You DrawNagios Conference 2014 - David Josephsen - Alert on What You Draw
Nagios Conference 2014 - David Josephsen - Alert on What You Draw
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsGo Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
 
Logs And Backups
Logs And BackupsLogs And Backups
Logs And Backups
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
 
Software Architecture Anti-Patterns
Software Architecture Anti-PatternsSoftware Architecture Anti-Patterns
Software Architecture Anti-Patterns
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Scaling a Rails Application from the Bottom Up
Scaling a Rails Application from the Bottom Up Scaling a Rails Application from the Bottom Up
Scaling a Rails Application from the Bottom Up
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up Search
 
Let's Get to the Rapids
Let's Get to the RapidsLet's Get to the Rapids
Let's Get to the Rapids
 
Parallel Ruby: Managing the Memory Monster
Parallel Ruby: Managing the Memory MonsterParallel Ruby: Managing the Memory Monster
Parallel Ruby: Managing the Memory Monster
 
Sadeem cloud native السحابة الطبيعية
Sadeem cloud native السحابة الطبيعيةSadeem cloud native السحابة الطبيعية
Sadeem cloud native السحابة الطبيعية
 
MongoDB in the Cloud -- Mongo Boulder
MongoDB in the Cloud -- Mongo BoulderMongoDB in the Cloud -- Mongo Boulder
MongoDB in the Cloud -- Mongo Boulder
 
Expecto Performa! The Magic and Reality of Performance Tuning
Expecto Performa! The Magic and Reality of Performance TuningExpecto Performa! The Magic and Reality of Performance Tuning
Expecto Performa! The Magic and Reality of Performance Tuning
 
Introducing Akka
Introducing AkkaIntroducing Akka
Introducing Akka
 
Introducingakkajavazone2012 120914094033-phpapp02
Introducingakkajavazone2012 120914094033-phpapp02Introducingakkajavazone2012 120914094033-phpapp02
Introducingakkajavazone2012 120914094033-phpapp02
 
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
Containers and Developer Defined Data Centers - Evan Powell - Keynote in Bang...
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 

More from Fwdays

"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
"Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ..."Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ...
Fwdays
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"$10 thousand per minute of downtime: architecture, queues, streaming and fintech", Max Baginskiy

  • 1. $10 thousand per minute of downtime: architecture, queues, streaming and fintech. Max Baginskiy, Solidgate
  • 2. About me: Head of Engineering, previously Tech Lead and Platform Engineer. 10 years in software engineering, the last 6 years with Go; fan of DevOps. Builds teams (5 teams, 30+ people hired) and architecture.
  • 3. Agenda: Company intro. Architecture of the system. Queues and streams to choose from. Low-latency streaming using outbox. CDC compared to our solution. Questions.
  • 4. About company: 7+ years online. 70 engineers (50 SW engineers; 20 Infra + Data engineers + AQA). PCI DSS compliant. European acquirer.
  • 6. Business figures: $2.5b annually. 15-18m tx monthly. $10k per 1 min of downtime. 40+ integrated payment methods and providers.
  • 7. ALB traffic: We have 100x less traffic on ALB during high season than Shopify. Stripe served 250 mil API calls per day in 2020.
  • 8. Kafka Producer: We started integrating Kafka last year. 20 rps average. 2 mil events per day.
  • 9. RabbitMQ Producer: 100-120 rps average. 10 mil events per day.
  • 10. Logs: 1.5-2k rps of logs. 150 mil events per day. 200-300 GB of logs daily.
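
The daily totals and the per-second averages on slides 8-10 can be cross-checked with simple arithmetic. The sketch below assumes the load is spread evenly over a 24-hour day, which the slides do not state; real traffic will have peaks above these averages.

```go
package main

import "fmt"

// secondsPerDay is the number of seconds in a 24-hour day.
const secondsPerDay = 24 * 60 * 60

// avgPerSecond converts a daily event count into an average per-second rate,
// assuming an even spread over the day.
func avgPerSecond(eventsPerDay float64) float64 {
	return eventsPerDay / secondsPerDay
}

func main() {
	fmt.Printf("Kafka:    %.0f rps\n", avgPerSecond(2_000_000))   // roughly 23
	fmt.Printf("RabbitMQ: %.0f rps\n", avgPerSecond(10_000_000))  // roughly 116
	fmt.Printf("Logs:     %.0f rps\n", avgPerSecond(150_000_000)) // roughly 1736
}
```

The results land close to the averages quoted on the slides (20 rps, 100-120 rps, 1.5-2k rps).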
  • 14. Non-functional requirements: Durability out of the box. Queue replay. Single active consumer support. Easy to set up and to maintain. Partitioning. Easy scaling for publisher and consumer. Extensibility: schema registry support, dynamic routing, enrichment.
  • 15. NFR - explanation: What if a message is lost between services while processing and we retry the payment? What if a message is lost between the callback service and the callback processor? What if a message is lost between the payment and finance systems? What if … what if … what if …?
  • 16. RabbitMQ dive in. Erlang: written in Erlang, which was made by Ericsson, a maker of telecommunication devices. Proof of fail-safety: the ATM AXD301 example, calculated uptime 99.9999999%, only one problem in many years. Mnesia as storage: Mnesia doesn’t support recovery from split brain and other types of failures.
  • 17. RabbitMQ Durability. Mechanisms: Publisher confirms are a MUST have. RabbitMQ can store data to disk and has different autoheal modes. Different types of queues: Quorum, Mirrored. Streaming is in “beta”. What if publisher confirms are disabled? Delivery after the exchange might not happen. Persistence might not happen. A few replicas might not acknowledge the message in a Quorum queue. An overwhelmed cluster will not accept messages, but the publisher will not know.
  • 18. Quorum queues. + Pros: Have consensus built in. Data is written to disk, metadata is kept in memory. Can easily handle restarts. − Cons: Don’t scale well: after a restart, millions of messages can take hours to replicate. Don’t have a “replay” mechanism. Consumers don’t scale. Don’t preserve the order of messages.
  • 20. Split brain - autoheal. ignore: Use when network reliability is the highest practically possible and node availability is of topmost importance. pause_minority: Appropriate when clustering across racks or availability zones in a single region and the probability of losing a majority of nodes (zones) at once is considered to be very low. autoheal: Appropriate when you are more concerned with continuity of service than with data consistency across nodes. Summary: no way to guarantee that autoheal will work properly.
  • 22. RabbitMQ streaming. Go client, issue #2
  • 23. RabbitMQ streaming. Go client, issue #3
  • 24. RabbitMQ + RabbitMQ streaming: New feature that not a lot of companies use. The Go client is not ready; what about Python or Node.js, I’m afraid to ask. Hard to support: requires updates of Erlang and then RabbitMQ. Streaming is a plugin that requires a specific version of RabbitMQ. Not made for fintech: lack of proper durability, lack of functionality.
  • 26. Kafka dive in. Java: written in Java by LinkedIn, then open-sourced and licensed under the Apache license. Highly available and durable: has a WAL, works in a cluster, saves data to disk by default. Blazing fast: sequential writes, zero copy.
  • 27. Kafka dive in: Kafka uses optimizations around sequential writes to optimize disk usage with zero copy. Has a WAL for replication and durability. Zookeeper as a separate system tracks health of the cluster. Can work even without Zookeeper. Chaos engineering shows that Kafka is a highly available and durable solution.
  • 28. Debezium how-to: Create a replication slot. Run the Debezium Java service in a cluster. Configure it with Groovy.
  • 29. Debezium. + Pros: Uses the WAL directly, so it doesn’t create additional load on the WAL (no additional data is written). Production-ready, tested solution. Low latency. − Cons: How to replay data? Can you specify a Log Sequence Number? What if you need to stream only a fraction of what is written in the WAL? Missing Buf (protobuf on steroids). Low flexibility and hard configurability. DB isolation. Groovy, which is not easy to use. Random disconnects and the need to restart.
  • 30. Transactional outbox. Why use a Transactional Outbox? Neither Kafka nor CDC can flexibly re-stream data. Without specific instruments you can’t remove specific events from Kafka. Replay with Kafka requires setting up additional services. Consistent state through the use of transactions.
  • 32. Choosing a Go lib: confluent-kafka-go (CGO + librdkafka), ibm/sarama, segmentio/kafka-go.
  • 33. Outbox table - WAL. BatchSize - usually 10. BatchTimeout - 100 ms. RequiredAcks - all nodes should confirm the message. Async - false, for synchronous error handling.
  • 35. Schema registry. “Spec first” approach - speeds up development. Backward compatibility support - linters. Reusable “mental model” = simplified migration from API to stream. Client, server and models are generated for various languages. Simplified versioning.
  • 36. Taxer v1: the option we built. Update payment in Gate (Kotlin). Transaction: save the payment update and create a record in the Outbox. Orderstreamer (Go) reads a batch from the outbox. Publish data to the Stream. Update the offset in the meta table.
  • 40. v1 comparison with typical architecture. + Pros: We have a full transaction log that can be replayed, reworked, saved, fixed. Only 1 new tech - Kafka. Streamer + Leaser = 200 lines of code + 800 lines of tests. It can be used as a library, not a service. Buf/Go/PostgreSQL - everything reused - maintenance simplified. − Cons: WAL amplification - 2x. Transactional outbox requires 1 more write for each operation. High delay - 2 min for events. More CPU load than just reading from the WAL.
  • 44. You learned about 2 min delay
  • 45. Orderstreamer delay: 2 min. ULIDs don’t allow us to understand the commit order of events or to spot missing parts.
  • 47. v2 implementation. Auto-increment instead of ULID helps you report and look for missing IDs. It’s more like “logical time”. Look for missing IDs, save them in the meta table for 2 min, and restream them when they appear.