SlideShare a Scribd company logo
1 of 60
Download to read offline
https://nbomber.com
https://github.com/stereodb
AGENDA
- intro
- why stateless is slow and less reliable
- tools for building stateful services
PART I
intro to sportsbook domain
and
how we come to stateful
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: no much sense to cache it
- data should be queryable: simple KV is not enough
- we need secondary indexes
20 KB payload for concurrent read and write
Redis, single node: 4vcpu - 8gb
redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842
redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
API API API
DB
API
Cache
DB
but Cache is not queryable
API + DB
Stateful Service
state
state
state
CDC (Debezium)
DB
How to handle a case when
your data is larger than RAM?
10 GB 30 GB
Solution 1: use memory DB that supports data larger than RAM
10 GB
20 GB
UA PL FR
Solution 2: use partition by tenant
Solution 3: use range-based sharding
users
(1-500)
users
(501-1000)
shard A shard B
PART II
why stateless is slow
API + DB
Stateful Service
API
Cache
DB
network latency
network latency
Latency Numbers
Latency
2010 2020
Compress 1KB with Zippy 2μs 2μs
Read 1 MB sequentially from RAM 30μs 3μs
Read 1 MB sequentially from SSD 494μs 49μs
Read 1 MB sequentially from disk 3ms 825μs
Round trip within same datacenter 500μs 500μs
Send packet CA -> Netherlands -> CA 150ms 150ms
https://colin-scott.github.io/personal_website/research/interactive_latency.html
API
Cache
DB
CPU: for serialize/deserialize
CPU: serialize/deserialize
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: serialize/deserialize
CPU: ASYNC request handling
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
CPU for handling query (very cheap compared to
serialization)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
Overreads
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
CPU for handling query (very cheap compared to
serialization)
Object hit rate / Transactional hit rate
A B
C
API
In order to fulfill our transactional flow we need to
fetch records: A, B, C
Record A and B will not impact our latency
Overall Latency = Latency of record C
Most existing cache eviction algorithms focus on maximizing
object hit rate, or the fraction of single object requests served
from cache. However, this approach fails to capture the
inter-object dependencies within transactions.
async / await
async / await
Imagine that we run Redis on localhost. Even with such setup we
usually use async request handling.
public void SimpleMethod()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = Add(i, i);
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
private int Add(int a, int b) => a + b;
public async Task SimpleMethodAsync()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private Task<int> AddAsync(int a, int b)
{
return Task.FromResult(a + b);
}
public async Task SimpleMethodAsyncYield()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private async Task<int> AddAsync(int a, int b)
{
await Task.Yield();
return a + b;
}
public async Task SimpleMethodAsyncYield()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private async Task<int> AddAsync(int a, int b)
{
await Task.Yield();
return await Task.Run(() => a + b);
}
PART III
why stateless is less reliable
API
Cache
DB
API + DB
Stateful Service
We have a higher probability of failure
API
Cache
DB
circuit breaker
retry
fallback
timeout
bulkhead isolation
circuit breaker
retry
fallback
timeout
bulkhead isolation
API + DB
Stateful Service
API
Cache
DB
What about cache invalidation
and data consistency?
API + DB
Stateful Service
API
Cache
DB
What about the predictable scale-out?
Will your RPS increase if you add an
additional API or Cache node?
API + DB
Stateful Service
- Metastable failures occur in open systems with an uncontrolled source of
load where a trigger causes the system to enter a bad state that persists
even when the trigger is removed.
- Paradoxically, the root cause of these failures is often features that
improve the efficiency or reliability of the system.
- The characteristic of a metastable failure is that the sustaining effect keeps
the system in the metastable failure state even after the trigger is
removed.
At least 4 out of 15 major outages in the
last decade at Amazon Web Services
were caused by metastable failures.
PART IV
tools for building stateful services
distributed log with sync replication
In-process memory DB
SQL OLAP
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: no much sense to cache it
- data should be queryable: simple KV is not enough
- we need secondary indexes
At pick to handle big load for 1 tenant we have:
5-10 nodes, 0.5-2 CPU, 6GB RAM
THANKS
always benchmark
https://twitter.com/antyadev

More Related Content

Similar to "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

L’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez ScalewayScaleway
 
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFAlexandre Gouaillard
 
DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)dpc
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...
Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...François-Guillaume Ribreau
 
EEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web ApplicationsEEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web ApplicationsExpertos en TI
 
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandFuenteovejuna
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Ying Xu
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기OnGameServer
 
Sql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesChris Adkin
 
How To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - SlidesHow To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - SlidesSeveralnines
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...Vadym Kazulkin
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Ontico
 
Extending Piwik At R7.com
Extending Piwik At R7.comExtending Piwik At R7.com
Extending Piwik At R7.comLeo Lorieri
 

Similar to "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan (20)

L’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez Scaleway
 
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
 
DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)
 
Cassandra at teads
Cassandra at teadsCassandra at teads
Cassandra at teads
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...
Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...
 
EEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web ApplicationsEEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web Applications
 
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기
 
Sql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architectures
 
How To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - SlidesHow To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - Slides
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
 
Extending Piwik At R7.com
Extending Piwik At R7.comExtending Piwik At R7.com
Extending Piwik At R7.com
 

More from Fwdays

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...Fwdays
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil TopchiiFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro SpodaretsFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym KindritskyiFwdays
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...Fwdays
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...Fwdays
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...Fwdays
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...Fwdays
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...Fwdays
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...Fwdays
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...Fwdays
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...Fwdays
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra MyronovaFwdays
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...Fwdays
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...Fwdays
 

More from Fwdays (20)

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

  • 1.
  • 4. AGENDA - intro - why stateless is slow and less reliable - tools for building stateful services
  • 5. PART I intro to sportsbook domain and how we come to stateful
  • 6. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL
  • 7. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes
  • 8. 20 KB payload for concurrent read and write Redis, single node: 4vcpu - 8gb redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842 redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
  • 10. API Cache DB but Cache is not queryable
  • 11. API + DB Stateful Service
  • 13.
  • 15. How to handle a case when your data is larger than RAM? 10 GB 30 GB
  • 16. Solution 1: use memory DB that supports data larger than RAM 10 GB 20 GB
  • 17. UA PL FR Solution 2: use partition by tenant
  • 18. Solution 3: use range-based sharding users (1-500) users (501-1000) shard A shard B
  • 20. API + DB Stateful Service API Cache DB network latency network latency
  • 21. Latency Numbers Latency 2010 2020 Compress 1KB with Zippy 2μs 2μs Read 1 MB sequentially from RAM 30μs 3μs Read 1 MB sequentially from SSD 494μs 49μs Read 1 MB sequentially from disk 3ms 825μs Round trip within same datacenter 500μs 500μs Send packet CA -> Netherlands -> CA 150ms 150ms https://colin-scott.github.io/personal_website/research/interactive_latency.html
  • 22. API Cache DB CPU: for serialize/deserialize CPU: serialize/deserialize API + DB Stateful Service CPU for serialize (we don’t need to deserialize)
  • 23. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: serialize/deserialize CPU: ASYNC request handling API + DB Stateful Service CPU for serialize (we don’t need to deserialize)
  • 24. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets )
  • 25. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets ) CPU for handling query (very cheap compared to serialization)
  • 26. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets Overreads CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets ) CPU for handling query (very cheap compared to serialization)
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. Object hit rate / Transactional hit rate
  • 33. A B C API In order to fulfill our transactional flow we need to fetch records: A, B, C Record A and B will not impact our latency Overall Latency = Latency of record C
  • 34.
  • 35.
  • 36. Most existing cache eviction algorithms focus on maximizing object hit rate, or the fraction of single object requests served from cache. However, this approach fails to capture the inter-object dependencies within transactions.
  • 38. async / await Imagine that we run Redis on localhost. Even with such setup we usually use async request handling.
  • 39. public void SimpleMethod() { var k = 0; for (int i = 0; i < Iterations; i++) { k = Add(i, i); } } [MethodImpl(MethodImplOptions.NoInlining)] private int Add(int a, int b) => a + b;
  • 40. public async Task SimpleMethodAsync() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private Task<int> AddAsync(int a, int b) { return Task.FromResult(a + b); }
  • 41. public async Task SimpleMethodAsyncYield() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private async Task<int> AddAsync(int a, int b) { await Task.Yield(); return a + b; }
  • 42. public async Task SimpleMethodAsyncYield() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private async Task<int> AddAsync(int a, int b) { await Task.Yield(); return await Task.Run(() => a + b); }
  • 43.
  • 44.
  • 45. PART III why stateless is less reliable
  • 46. API Cache DB API + DB Stateful Service We have a higher probability of failure
  • 47. API Cache DB circuit breaker retry fallback timeout bulkhead isolation circuit breaker retry fallback timeout bulkhead isolation API + DB Stateful Service
  • 48. API Cache DB What about cache invalidation and data consistency? API + DB Stateful Service
  • 49. API Cache DB What about the predictable scale-out? Will your RPS increase if you add an additional API or Cache node? API + DB Stateful Service
  • 50.
  • 51. - Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is removed. - Paradoxically, the root cause of these failures is often features that improve the efficiency or reliability of the system. - The characteristic of a metastable failure is that the sustaining effect keeps the system in the metastable failure state even after the trigger is removed.
  • 52. At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
  • 53.
  • 54. PART IV tools for building stateful services
  • 55. distributed log with sync replication
  • 57.
  • 58.
  • 59. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes At pick to handle big load for 1 tenant we have: 5-10 nodes, 0.5-2 CPU, 6GB RAM