Scaling Horizons
May 2024
Effective Strategies for Wix's Scaling Challenges
@NSilnitsky
Hi,
I’m Natan
Backend Infra Tech Lead @Wix
Yoga enthusiast
Speaker
Blogger
natansilnitsky
www.natansil.com
@NSilnitsky
Taming Distributed Systems @NSilnitsky
Daily HTTP transactions: ~3.5B
OLTP databases: ~4PB
Kafka messages a day: ~70B
Scaling Challenges
* our systems able grow meet
Scaling Challenges
vertical vs. horizontal scaling
▪ Vertical scaling: increase in processing power
▪ Horizontal scaling: increase in number of machines
Vertical scaling - bottlenecks, high cost
100% CPU utilization
Increase in processing power
Scaling Challenges
vertical scaling is limited
horizontal scaling to the rescue
* This talk focuses mainly on horizontal…
Horizontal scaling
▪ Distributes workload
▪ Improves overall performance
Horizontal scaling
multiple dimensions
WWW
Application Layer
▪ Web requests balancing
▪ Kafka consumer balancing
▪ Data sharding
* other messaging brokers
* But another, … key space for routes or shards. goal, uniform, consistency
Routing/Sharding Strategies
* e.g. kafka clusters
Fixed: 1, 2, 3
Dynamic: hash()
…
Routing/Sharding Strategies
5 unique strategies at Wix
Fixed Dynamic
…
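The difference between the two families can be sketched in a few lines. This is a minimal illustration, not Wix's implementation: the tenant names, the routing table and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

# Hypothetical fixed routing table: each known tenant is pinned to a shard by hand.
FIXED_ROUTES = {"tenant-a": 1, "tenant-b": 2, "tenant-c": 3}


def fixed_route(tenant: str) -> int:
    """Fixed routing: look the shard up in a hand-maintained table."""
    return FIXED_ROUTES[tenant]


def dynamic_route(key: str, shard_count: int) -> int:
    """Dynamic routing: derive the shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Fixed routing needs an entry for every tenant but gives full control over placement; hash routing needs no table and works for any number of keys, at the price of not choosing where a given key lands.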
Fixed Routing
1
Kafka cluster tenancy
Fixed Routing
Kafka cluster tenancy - Before change
Diagram: Kafka, Consumer Proxy, Wix App (RPC)
Single point of failure
Fixed Routing
Kafka cluster tenancy
Consumer Proxy, Shared Codebase
▪ Proxy Deployment for Kafka Cluster A
▪ Proxy Deployment for Kafka Cluster B
▪ Proxy Deployment for Kafka Cluster C
Fixed Routing
Kafka cluster tenancy
Diagram: Kafka clusters A, B, C; Consumer Proxy A, Consumer Proxy B, Consumer Proxy C; Wix Apps
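A rough sketch of cluster tenancy as a fixed routing table follows. The topic names and deployment names are hypothetical; only the idea of one proxy deployment per Kafka cluster (A, B, C), built from a shared codebase, comes from the slides.

```python
# Hypothetical assignment of topics to Kafka clusters (cluster tenancy).
TOPIC_TO_CLUSTER = {"orders": "A", "payments": "B", "analytics-events": "C"}

# One proxy deployment per cluster, all built from the same codebase.
CLUSTER_TO_PROXY = {c: f"consumer-proxy-{c.lower()}" for c in ("A", "B", "C")}


def proxy_for_topic(topic: str) -> str:
    """Fixed routing: the topic's cluster decides which proxy deployment serves it."""
    return CLUSTER_TO_PROXY[TOPIC_TO_CLUSTER[topic]]


def affected_topics(failed_cluster: str) -> list[str]:
    """Topics that lose their proxy if one cluster's deployment goes down."""
    return sorted(t for t, c in TOPIC_TO_CLUSTER.items() if c == failed_cluster)
```

With a single proxy, a failure affects every topic; with the split above, `affected_topics("A")` is limited to the topics hosted on cluster A.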
Fixed Routing
2
QoS tenancy
Fixed routing
Kafka events to webhooks
Diagram: Kafka to Webhooks Service, WWW
Fixed Routing
QoS tenancy – Slow / Fast
Diagram: Kafka to Webhooks Service, WWW; Restaurants orders; App URL1; slow-responding.com
Queue (Back to Front, Enqueue at back, Dequeue at front): Restaurants, Restaurants, Restaurants, Slow-respo (2s)
Fixed Routing
QoS tenancy
Kafka to Webhooks Service, Shared Codebase
▪ Webhooks Service Deployment, QoS group 1
▪ Webhooks Service Deployment, QoS group 2
▪ Webhooks Service Deployment, QoS group 3
Diagram: restaurants-orders topic; Filter: Restaurants orders, App URL1; Filter: Restaurants orders, App URL2, slow-responding.com
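The per-deployment filter can be sketched as below. This is an assumption-laden illustration: the group numbers assigned to each target, the `app-url1.example` host and the event shape are invented for the example; the slides only show that each QoS group deployment filters the same topic.

```python
# Hypothetical static assignment of webhook targets to QoS groups.
QOS_GROUP_BY_TARGET = {
    "app-url1.example": 1,
    "slow-responding.com": 3,
}
DEFAULT_QOS_GROUP = 1  # assumption: unknown targets start in the fast group


def should_deliver(target_host: str, my_qos_group: int) -> bool:
    """Each deployment keeps only events whose target belongs to its group."""
    return QOS_GROUP_BY_TARGET.get(target_host, DEFAULT_QOS_GROUP) == my_qos_group


def deliveries_for(events: list[dict], my_qos_group: int) -> list[dict]:
    """Filter a batch consumed from the shared topic down to this group's work."""
    return [e for e in events if should_deliver(e["target"], my_qos_group)]
```

Every deployment reads the whole topic, but a slow target only delays the queue of its own group, so fast targets are no longer stuck behind a 2s call.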
Dynamic routing based on QoS metrics?
Webhooks Service QoS group 1, QoS group 2, QoS group 3
▪ my-webhook.com, response time < 200ms
▪ my-webhook.com, response time > 200ms for 5 minutes: moves to another QoS group
▪ my-webhook.com, response time > 2s for 5 minutes: moves again
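One way the metric-based idea could work is sketched here. The thresholds (200ms, 2s, 5 minutes) are from the slides; the mapping of thresholds to group numbers 1, 2 and 3, the sample format and the function names are assumptions.

```python
WINDOW_SECONDS = 5 * 60   # "for 5 minutes"
SLOW_MS = 200             # first threshold on the slides
VERY_SLOW_MS = 2000       # second threshold on the slides


def slow_for(samples: list[tuple[int, int]], now: int, threshold_ms: int) -> int:
    """Seconds the target has been continuously above threshold_ms, up to now.

    samples are (unix_seconds, response_ms) pairs in any order.
    """
    run_start = None
    for ts, ms in sorted(samples, reverse=True):  # newest first
        if ts > now:
            continue
        if ms <= threshold_ms:
            break  # the slow streak ends at the first fast response
        run_start = ts
    return 0 if run_start is None else now - run_start


def qos_group(samples: list[tuple[int, int]], now: int) -> int:
    """Assumed mapping: 1 = fast, 2 = slow, 3 = very slow."""
    if slow_for(samples, now, VERY_SLOW_MS) >= WINDOW_SECONDS:
        return 3
    if slow_for(samples, now, SLOW_MS) >= WINDOW_SECONDS:
        return 2
    return 1
```

A single fast response resets the streak, so a target that recovers drops back to group 1 on the next evaluation.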
Fixed routing
3 Wix-level traffic
Fixed routing
Wix traffic
Editor Segment
▪ Site Editing
▪ Business management
▪ Route “shard” key, e.g. *.wix.com
Public / SSR Segment
▪ Site rendering
▪ Route “shard” key, e.g. {userName}.wixsite.com/{siteName}
* rendering high avail.
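The host-based split can be sketched as a small routing function. The two host patterns come from the slides; the segment labels and the "unknown" fallback are assumptions, since the slides do not say how other hosts are handled.

```python
def segment_for_host(host: str) -> str:
    """Pick the traffic segment from the request host (the route "shard" key)."""
    host = host.lower().rstrip(".")
    if host.endswith(".wixsite.com"):
        return "public"   # site rendering, e.g. {userName}.wixsite.com/{siteName}
    if host == "wix.com" or host.endswith(".wix.com"):
        return "editor"   # site editing and business management, e.g. *.wix.com
    return "unknown"      # not covered by the slides
```

Because the key is fixed and known up front, the two segments can be deployed and scaled independently, which is what keeps rendering highly available when the editor side has trouble.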
Dynamic Routing
4
High-throughput Reactions application
Dynamic Routing
High-throughput application
• Reactions application
• simple service for users to like/unlike posts or post a custom emoji as a reaction
• Key Design Considerations
• High Read/Write Throughput - Read: 30K RPM, Write: 1K RPM
• Low Latency Reads
• Scalability
* how to address requirements
Kafka Topic Partitions Traffic Balancing
▪ Performance & Durability
▪ Parallel Processing & Order Guarantee
▪ Scalability
Diagram: Producers, Broker with Topics split into Partitions, Consumers
Diagram: Cluster with Server 1 and Server 2 holding partitions P0, P3, P1, P2; consumers C3, C4, C5, C6; Consumer Group A
Diagram: Reactions App sends a Reaction by user A to a Topic with three Partitions, consumed by Reactions Pod A and Reactions Pod B
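Key-based partitioning and partition-to-pod assignment can be sketched as follows. This is a simplified model: Kafka's default partitioner hashes keys with murmur2, while this sketch uses `zlib.crc32` as a stand-in, and real consumer-group assignment is negotiated by the group protocol rather than computed like this. The pod names and the choice of the user id as the key are assumptions.

```python
import zlib

NUM_PARTITIONS = 3
PODS = ["reactions-pod-a", "reactions-pod-b"]


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key, same partition: this is what preserves per-key ordering."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, pods: list[str]) -> dict[str, list[int]]:
    """Round-robin stand-in for consumer-group partition assignment."""
    assignment: dict[str, list[int]] = {pod: [] for pod in pods}
    for p in range(num_partitions):
        assignment[pods[p % len(pods)]].append(p)
    return assignment


def pod_for(key: str) -> str:
    """Which pod ends up processing all messages for this key."""
    partition = partition_for(key)
    for pod, partitions in assign_partitions(NUM_PARTITIONS, PODS).items():
        if partition in partitions:
            return pod
    raise LookupError(f"partition {partition} is not assigned")
```

All reactions by the same user land on one partition and therefore on one pod, so they are processed in order while different users are handled in parallel.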
Dynamic Routing
High-throughput application
• Key Design Considerations
• High Write Throughput
• Low Latency Reads
• Scalability
We handled the traffic bottleneck, what about data persistence?
* only as strong as the weakest link
Dynamic sharding
5
High-throughput Reactions application
Enter –
DynamoDB (Data) Sharding
▪ High-throughput, large-scale applications
▪ Partitioning data across multiple nodes
▪ Distributes workload evenly
▪ Automatically adjusts
DynamoDB
Intro
Table:
Source: Amazon AWS
DynamoDB Sharding
High-throughput application
• Reactions application
• DynamoDB Table Structure
Partition Key: appDef/itemId (image gallery, forum, blog)
Sort Key: userId
Attributes: list of reactions
Example table: partition key AppId/ItemId; sort key UserId with rows David, Laura, Michael, Jane
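The table structure above can be written down as a DynamoDB key schema. This is a sketch only: the table name and attribute names are hypothetical, capacity and billing settings are left out, and the dictionary is not sent to AWS here. `KeySchema`, `AttributeDefinitions`, `HASH` and `RANGE` are the field names of the DynamoDB CreateTable request.

```python
# Hypothetical table and attribute names; key layout follows the slide.
TABLE_DEFINITION = {
    "TableName": "reactions",
    "KeySchema": [
        {"AttributeName": "appDefItemId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "userId", "KeyType": "RANGE"},        # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "appDefItemId", "AttributeType": "S"},
        {"AttributeName": "userId", "AttributeType": "S"},
    ],
}


def partition_key(app_def: str, item_id: str) -> str:
    """Compose the appDef/itemId partition key, e.g. 'blog/post-1'."""
    return f"{app_def}/{item_id}"


def make_item(app_def: str, item_id: str, user_id: str, reactions: list[str]) -> dict:
    """Build one row: all users reacting to the same item share a partition key."""
    return {
        "appDefItemId": partition_key(app_def, item_id),
        "userId": user_id,
        "reactions": list(reactions),
    }
```

With this layout, reading every reaction on one item is a single-partition query, and the sort key lets one user's reaction be read or updated directly.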
DynamoDB
Tables and partitions
DynamoDB
Partition splitting and merging
10 GB of data
3,000 read capacity units
1,000 write capacity units
* exceeding, triggers splitting,
conversely, merging. single item…
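The three per-partition figures can be turned into a quick check. This is a back-of-the-envelope sketch: the limits are the ones on the slide, while the function names and the idea of comparing a single partition's load against them are assumptions for illustration.

```python
MAX_PARTITION_GB = 10      # 10 GB of data
MAX_PARTITION_RCU = 3000   # 3,000 read capacity units
MAX_PARTITION_WCU = 1000   # 1,000 write capacity units


def exceeded_limits(size_gb: float, rcu: float, wcu: float) -> list[str]:
    """Return which per-partition limits a partition's load exceeds."""
    exceeded = []
    if size_gb > MAX_PARTITION_GB:
        exceeded.append("size")
    if rcu > MAX_PARTITION_RCU:
        exceeded.append("read")
    if wcu > MAX_PARTITION_WCU:
        exceeded.append("write")
    return exceeded


def per_second(requests_per_minute: float) -> float:
    """Convert an RPM figure into requests per second."""
    return requests_per_minute / 60
```

For the Reactions figures quoted earlier, 30K RPM of reads is `per_second(30_000)`, i.e. 500 reads per second, which stays below the read limit as long as each read costs about one capacity unit and traffic is spread over more than one hot item.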
DynamoDB
Partition Table
DynamoDB
Partition hash Sort Key
DynamoDB
Cost increases with capacity
Throughput Cost
Dynamic routing
(Fixed data sharding)
6
Data Locality for Enterprise customers
Types of headache
Geo Sharding
Big Enterprise Customers
▪ Data locality e.g. GDPR
▪ EU/US only setup
▪ configurable
Geo Sharding
Data locality
▪ Global Clusters
▪ US: Cluster with nodes only in USA
▪ EU: Cluster with nodes only in EU
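A minimal sketch of geo sharding as a configurable tenant-to-cluster mapping follows. The cluster names and tenant ids are hypothetical; only the three kinds of cluster (global, US-only, EU-only) come from the slides.

```python
# Hypothetical cluster names for the three kinds of cluster on the slide.
CLUSTERS = {
    "GLOBAL": "mysql-global",
    "US": "mysql-us-only",
    "EU": "mysql-eu-only",
}

# Configurable per enterprise customer; everyone else uses the global clusters.
TENANT_LOCALITY = {
    "enterprise-eu-1": "EU",
    "enterprise-us-1": "US",
}


def cluster_for_tenant(tenant_id: str) -> str:
    """Route a tenant's data to the cluster allowed by its locality setting."""
    return CLUSTERS[TENANT_LOCALITY.get(tenant_id, "GLOBAL")]
```

Changing a customer's locality is then a configuration change to the mapping, followed by a data migration to the matching cluster.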
MySQL Routing via ProxySQL
• ProxySQL is a high-performance load balancer for MySQL
• Intelligent Query Routing
• Enhanced High Availability
• Sharding Support
Query rules + Routing hints
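How query rules and routing hints interact can be simulated in plain Python. This is an illustration, not ProxySQL configuration: the field names are loosely modelled on ProxySQL's query rule columns (`rule_id`, `match_pattern`, `destination_hostgroup`), while the `/* shard=… */` hint format and the hostgroup numbers are assumptions.

```python
import re

# Hypothetical rules: a comment hint in the query selects the hostgroup.
QUERY_RULES = [
    {"rule_id": 10, "match_pattern": r"/\*\s*shard=eu\s*\*/", "destination_hostgroup": 20},
    {"rule_id": 20, "match_pattern": r"/\*\s*shard=us\s*\*/", "destination_hostgroup": 30},
]
DEFAULT_HOSTGROUP = 10  # assumption: queries without a hint go to the global cluster


def hostgroup_for(query: str) -> int:
    """Evaluate rules in rule_id order; the first matching pattern wins."""
    for rule in sorted(QUERY_RULES, key=lambda r: r["rule_id"]):
        if re.search(rule["match_pattern"], query, re.IGNORECASE):
            return rule["destination_hostgroup"]
    return DEFAULT_HOSTGROUP
```

The application only adds a hint to its SQL; which servers sit behind each hostgroup stays a proxy-side concern.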
Future Sharding Strategies
▪ Breaking a large database into smaller units
▪ Dedicated database cluster for a specific customer
▪ Supporting additional regional constraints
* By using query rule
Takeaways
Horizontal Scaling increases complexity
▪ Modeling & architecture
▪ Migrations
▪ Mental complexity
Horizontal scaling Tech
How to split Key space? Fixed segments
Segments: Cluster A, Cluster B, Cluster C, Cluster D
▪ Small group of tenants
▪ GEO: US … EU
▪ QoS / SLA: Latency 0 … “regular” abusers
How to split Key space? Dynamic routing
Hash function - %
Ranges
WildCard / Regex
Tenant Mapping
The scaling + sharding decision tree
Degraded Performance
▪ Scale vertically? (small machine / low cost?)
  Yes, cost-effective: Add more compute/Memory
  No, too expensive: scale horizontally
▪ Naïve Sharding? (small set of “tenants”?)
  Yes, small set of “tenants”: Fixed sharding, Manually Partition your Deployment/Table
  No, too many “tenants”
▪ Custom Routing?
  Yes, preset rules: Use rules to determine Shard
  No rule, go wild!: Use Hash Key
You can also mix Fixed with Dynamic
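The decision tree reads naturally as a chain of three questions. The sketch below encodes it directly; the parameter names are made up for the example, and the returned strings are the leaf labels from the slide.

```python
def scaling_strategy(
    vertical_is_cost_effective: bool,
    small_set_of_tenants: bool,
    has_preset_rules: bool,
) -> str:
    """Walk the tree from 'degraded performance' to a recommended action."""
    if vertical_is_cost_effective:
        return "Add more compute/memory"                    # scale vertically
    if small_set_of_tenants:
        return "Manually partition your deployment/table"   # fixed sharding
    if has_preset_rules:
        return "Use rules to determine shard"               # custom routing
    return "Use hash key"                                   # no rule, go wild
```

Each question is only asked when the cheaper option before it has been ruled out, which is why the order of the checks matters.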
Taming Distributed Systems
Q & A
Thank you
natansilnitsky www.natansil.com
@NSilnitsky
👉 slideshare.net/NatanSilnitsky

More Related Content

Similar to Effective Strategies for Wix's Scaling challenges - GeeCon

Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Natan Silnitsky
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
confluent
 
Changing application demands: What developers need to know
Changing application demands: What developers need to knowChanging application demands: What developers need to know
Changing application demands: What developers need to know
IndicThreads
 
Virtualize with bare metal performance
Virtualize with bare metal performanceVirtualize with bare metal performance
Virtualize with bare metal performance
Deba Chatterjee
 
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
Natan Silnitsky
 
AWS Webinar 201: Designing scalable, available & resilient cloud applications
AWS Webinar 201: Designing scalable, available & resilient cloud applicationsAWS Webinar 201: Designing scalable, available & resilient cloud applications
AWS Webinar 201: Designing scalable, available & resilient cloud applications
Amazon Web Services
 
Architectural Commandments for Building & Running Microservices at Scale
Architectural Commandments for Building & Running Microservices at ScaleArchitectural Commandments for Building & Running Microservices at Scale
Architectural Commandments for Building & Running Microservices at Scale
Brian Wilson
 
OpenStack-Based NFV Cloud at Swisscom: challenges and best practices
OpenStack-Based NFV Cloud at Swisscom: challenges and best practicesOpenStack-Based NFV Cloud at Swisscom: challenges and best practices
OpenStack-Based NFV Cloud at Swisscom: challenges and best practices
Avi Networks
 
Openstack Summit: Networking and policies across Containers and VMs
Openstack Summit: Networking and policies across Containers and VMsOpenstack Summit: Networking and policies across Containers and VMs
Openstack Summit: Networking and policies across Containers and VMs
Sanjeev Rampal
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Kai Wähner
 
Apache Kafka® at Dropbox
Apache Kafka® at DropboxApache Kafka® at Dropbox
Apache Kafka® at Dropbox
confluent
 
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Natan Silnitsky
 
Migrating to Multi Cluster Managed Kafka - DevopStars 2022
Migrating to Multi Cluster Managed Kafka - DevopStars 2022Migrating to Multi Cluster Managed Kafka - DevopStars 2022
Migrating to Multi Cluster Managed Kafka - DevopStars 2022
Natan Silnitsky
 
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
Amazon Web Services Korea
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
VMware Tanzu
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
Chiradeep Vittal
 
Service Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay KidService Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay Kid
MyNOG
 
Devoxx UK - Migrating to Multi Cluster Managed Kafka
Devoxx UK - Migrating to Multi Cluster Managed KafkaDevoxx UK - Migrating to Multi Cluster Managed Kafka
Devoxx UK - Migrating to Multi Cluster Managed Kafka
Natan Silnitsky
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
hadooparchbook
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 

Similar to Effective Strategies for Wix's Scaling challenges - GeeCon (20)

Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Changing application demands: What developers need to know
Changing application demands: What developers need to knowChanging application demands: What developers need to know
Changing application demands: What developers need to know
 
Virtualize with bare metal performance
Virtualize with bare metal performanceVirtualize with bare metal performance
Virtualize with bare metal performance
 
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
Migrating to Multi Cluster Managed Kafka - Conf42 - CloudNative
 
AWS Webinar 201: Designing scalable, available & resilient cloud applications
AWS Webinar 201: Designing scalable, available & resilient cloud applicationsAWS Webinar 201: Designing scalable, available & resilient cloud applications
AWS Webinar 201: Designing scalable, available & resilient cloud applications
 
Architectural Commandments for Building & Running Microservices at Scale
Architectural Commandments for Building & Running Microservices at ScaleArchitectural Commandments for Building & Running Microservices at Scale
Architectural Commandments for Building & Running Microservices at Scale
 
OpenStack-Based NFV Cloud at Swisscom: challenges and best practices
OpenStack-Based NFV Cloud at Swisscom: challenges and best practicesOpenStack-Based NFV Cloud at Swisscom: challenges and best practices
OpenStack-Based NFV Cloud at Swisscom: challenges and best practices
 
Openstack Summit: Networking and policies across Containers and VMs
Openstack Summit: Networking and policies across Containers and VMsOpenstack Summit: Networking and policies across Containers and VMs
Openstack Summit: Networking and policies across Containers and VMs
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Apache Kafka® at Dropbox
Apache Kafka® at DropboxApache Kafka® at Dropbox
Apache Kafka® at Dropbox
 
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
Polyglot, fault-tolerant event-driven programming with kafka, kubernetes and ...
 
Migrating to Multi Cluster Managed Kafka - DevopStars 2022
Migrating to Multi Cluster Managed Kafka - DevopStars 2022Migrating to Multi Cluster Managed Kafka - DevopStars 2022
Migrating to Multi Cluster Managed Kafka - DevopStars 2022
 
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
AWS를 활용한 웹, 모바일, 소셜 애플리케이션 구축 방법
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
 
Service Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay KidService Provider Architectures for Tomorrow by Chow Khay Kid
Service Provider Architectures for Tomorrow by Chow Khay Kid
 
Devoxx UK - Migrating to Multi Cluster Managed Kafka
Devoxx UK - Migrating to Multi Cluster Managed KafkaDevoxx UK - Migrating to Multi Cluster Managed Kafka
Devoxx UK - Migrating to Multi Cluster Managed Kafka
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 

More from Natan Silnitsky

Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Natan Silnitsky
 
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
Natan Silnitsky
 
DevSum - Lessons Learned from 2000 microservices
DevSum - Lessons Learned from 2000 microservicesDevSum - Lessons Learned from 2000 microservices
DevSum - Lessons Learned from 2000 microservices
Natan Silnitsky
 
GeeCon - Lessons Learned from 2000 microservices
GeeCon - Lessons Learned from 2000 microservicesGeeCon - Lessons Learned from 2000 microservices
GeeCon - Lessons Learned from 2000 microservices
Natan Silnitsky
 
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven MicroservicesWix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
Natan Silnitsky
 
BuildStuff - Lessons Learned from 2000 Event Driven Microservices
BuildStuff - Lessons Learned from 2000 Event Driven MicroservicesBuildStuff - Lessons Learned from 2000 Event Driven Microservices
BuildStuff - Lessons Learned from 2000 Event Driven Microservices
Natan Silnitsky
 
Lessons Learned from 2000 Event Driven Microservices - Reversim
Lessons Learned from 2000 Event Driven Microservices - ReversimLessons Learned from 2000 Event Driven Microservices - Reversim
Lessons Learned from 2000 Event Driven Microservices - Reversim
Natan Silnitsky
 
Devoxx Ukraine - Kafka based Global Data Mesh
Devoxx Ukraine - Kafka based Global Data MeshDevoxx Ukraine - Kafka based Global Data Mesh
Devoxx Ukraine - Kafka based Global Data Mesh
Natan Silnitsky
 
Dev Days Europe - Kafka based Global Data Mesh at Wix
Dev Days Europe - Kafka based Global Data Mesh at WixDev Days Europe - Kafka based Global Data Mesh at Wix
Dev Days Europe - Kafka based Global Data Mesh at Wix
Natan Silnitsky
 
Kafka Summit London - Kafka based Global Data Mesh at Wix
Kafka Summit London - Kafka based Global Data Mesh at WixKafka Summit London - Kafka based Global Data Mesh at Wix
Kafka Summit London - Kafka based Global Data Mesh at Wix
Natan Silnitsky
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
Natan Silnitsky
 
Open sourcing a successful internal project - Reversim 2021
Open sourcing a successful internal project - Reversim 2021Open sourcing a successful internal project - Reversim 2021
Open sourcing a successful internal project - Reversim 2021
Natan Silnitsky
 
How to successfully manage a ZIO fiber’s lifecycle - Functional Scala 2021
How to successfully manage a ZIO fiber’s lifecycle - Functional Scala 2021How to successfully manage a ZIO fiber’s lifecycle - Functional Scala 2021
How to successfully manage a ZIO fiber’s lifecycle - Functional Scala 2021
Natan Silnitsky
 
Advanced Caching Patterns used by 2000 microservices - Code Motion
Advanced Caching Patterns used by 2000 microservices - Code MotionAdvanced Caching Patterns used by 2000 microservices - Code Motion
Advanced Caching Patterns used by 2000 microservices - Code Motion
Natan Silnitsky
 
Advanced Caching Patterns used by 2000 microservices - Devoxx Ukraine
Advanced Caching Patterns used by 2000 microservices - Devoxx UkraineAdvanced Caching Patterns used by 2000 microservices - Devoxx Ukraine
Advanced Caching Patterns used by 2000 microservices - Devoxx Ukraine
Natan Silnitsky
 
Advanced Microservices Caching Patterns - Devoxx UK
Advanced Microservices Caching Patterns - Devoxx UKAdvanced Microservices Caching Patterns - Devoxx UK
Advanced Microservices Caching Patterns - Devoxx UK
Natan Silnitsky
 
Battle-tested event-driven patterns for your microservices architecture - Sca...
Battle-tested event-driven patterns for your microservices architecture - Sca...Battle-tested event-driven patterns for your microservices architecture - Sca...
Battle-tested event-driven patterns for your microservices architecture - Sca...
Natan Silnitsky
 
Advanced Caching Patterns used by 2000 microservices - Api World
Advanced Caching Patterns used by 2000 microservices - Api WorldAdvanced Caching Patterns used by 2000 microservices - Api World
Advanced Caching Patterns used by 2000 microservices - Api World
Natan Silnitsky
 
Kafka based Global Data Mesh at Wix
Kafka based Global Data Mesh at WixKafka based Global Data Mesh at Wix
Kafka based Global Data Mesh at Wix
Natan Silnitsky
 

More from Natan Silnitsky (20)

Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
Workflow Engines & Event Streaming Brokers - Can they work together? [Current...
 
DevSum - Lessons Learned from 2000 microservices
DevSum - Lessons Learned from 2000 microservicesDevSum - Lessons Learned from 2000 microservices
DevSum - Lessons Learned from 2000 microservices
 
GeeCon - Lessons Learned from 2000 microservices
GeeCon - Lessons Learned from 2000 microservicesGeeCon - Lessons Learned from 2000 microservices
GeeCon - Lessons Learned from 2000 microservices
 
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven MicroservicesWix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
Wix+Confluent Meetup - Lessons Learned from 2000 Event Driven Microservices
 
BuildStuff - Lessons Learned from 2000 Event Driven Microservices
Effective Strategies for Wix's Scaling challenges - GeeCon

  • 1. Scaling Horizons May 2024 Effective Strategies for Wix's Scaling challenges
  • 2. Hi, I’m Natan: Backend Infra Tech Lead @Wix, yoga enthusiast, speaker, blogger. natansilnitsky, www.natansil.com, @NSilnitsky
  • 4. Taming Distributed Systems @NSilnitsky. Scaling Challenges: ~3.5B daily HTTP transactions, ~4PB in OLTP databases, ~70B Kafka messages a day
  • 5. * our systems able grow meet
  • 6. Scaling Challenges: vertical vs. horizontal scaling. Vertical scaling: increase in processing power. Horizontal scaling: increase in number of machines
  • 7. Vertical scaling: bottlenecks, high cost. 100% CPU utilization
  • 8. Vertical scaling: bottlenecks, high cost. 100% CPU utilization. Increase in processing power
  • 10. Scaling Challenges: vertical scaling is limited; horizontal scaling to the rescue
  • 11. * This talk focuses mainly on horizontal…
  • 12. Horizontal scaling ▪ Distributes workload ▪ Improves overall performance
  • 13. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing
  • 14. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing; Kafka consumer balancing. * other messaging brokers
  • 15. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing; Kafka consumer balancing; Data sharding
  • 16. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing; Kafka consumer balancing; Data sharding. Routing/Sharding Strategies. * But another, … key space for routes or shards. goal, uniform, consistency
  • 17. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing; Kafka consumer balancing; Data sharding. Routing/Sharding Strategies: Fixed (1, 2, 3). * e.g. Kafka clusters
  • 18. Horizontal scaling, multiple dimensions: WWW; Application Layer; Web requests balancing; Kafka consumer balancing; Data sharding. Routing/Sharding Strategies: Fixed (1, 2, 3), Dynamic (hash()), …
  • 19. Routing/Sharding Strategies: 5 unique strategies at Wix. Fixed, Dynamic, …
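
The contrast between the fixed and dynamic (hash-based) strategies named on slides 17 to 19 can be illustrated with a minimal Python sketch. The tenant names, cluster names, and shard count are hypothetical and do not come from the talk, and CRC32 is only a stand-in for whichever stable hash a real system would use.

```python
import zlib

# Fixed routing: an explicit, hand-maintained table.
# Tenant and cluster names are hypothetical, for illustration only.
FIXED_ROUTES = {
    "payments": "cluster-a",
    "bookings": "cluster-b",
    "blog": "cluster-c",
}


def fixed_route(tenant: str) -> str:
    """Look the tenant up in the preset table; unknown tenants raise KeyError."""
    return FIXED_ROUTES[tenant]


def hash_route(key: str, num_shards: int) -> int:
    """Dynamic routing: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


print(fixed_route("payments"))  # cluster-a
print(hash_route("user-42", 8))  # a shard index in the range 0..7
```

The fixed table gives full control over placement but has to be edited by hand for every new tenant; the hash spreads an unbounded key space with no configuration, at the price of not choosing where any single key lands.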
  • 20. Fixed Routing 1: Kafka cluster tenancy
  • 21. Fixed Routing 1: Kafka cluster tenancy, before change. Kafka, Consumer Proxy, Wix App, RPC
  • 22. Fixed Routing 1: Kafka cluster tenancy, before change. Kafka, Consumer Proxy, Wix App, RPC. Single point of failure
  • 24. Fixed Routing 1: Kafka cluster tenancy. Consumer Proxy, shared codebase: Proxy Deployment for Kafka Cluster A, Proxy Deployment for Kafka Cluster B, Proxy Deployment for Kafka Cluster C
  • 25. Fixed Routing 1: Kafka cluster tenancy. Consumer Proxy A, Consumer Proxy B, Consumer Proxy C; Wix App, Wix App; A, B, C
  • 28. Fixed Routing 2: QoS tenancy
  • 29. Fixed Routing 2: Kafka events to webhooks. WWW, Kafka to Webhooks Service
  • 30. Fixed Routing 2: QoS tenancy, slow / fast. WWW, Kafka to Webhooks Service. Restaurants orders, App URL1, slow-responding.com
  • 31. Fixed Routing 2: QoS tenancy, slow / fast. Restaurants orders, App URL1, slow-responding.com. Queue: Restaurants, Restaurants, Restaurants, Slow-respo (2s). Back, Front; Enqueue, Dequeue
  • 34. Fixed Routing 2: QoS tenancy. Kafka to Webhooks Service, shared codebase: Webhooks Service Deployment QoS group 1, Webhooks Service Deployment QoS group 2, Webhooks Service Deployment QoS group 3
  • 35. Fixed Routing 2: QoS tenancy. Webhooks Service Deployment QoS group 1, QoS group 2, QoS group 3. Filter restaurants-orders topic; Filter slow-responding.com. Restaurants orders, App URL1; Restaurants orders, App URL2, slow-responding.com
  • 37. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com: response time < 200ms
  • 38. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com: response time > 200ms for 5 minutes
  • 39. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com, my-webhook.com
  • 40. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com: response time > 2s for 5 minutes
  • 41. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com, my-webhook.com
  • 42. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. my-webhook.com
  • 43. Dynamic routing based on QoS metrics? Webhooks Service QoS groups 1, 2, 3. Filter restaurants-orders topic; Filter slow-responding.com. Restaurants orders, App URL1; Restaurants orders, App URL2, slow-responding.com
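
The idea raised on slides 37 to 42, moving a webhook URL between QoS groups as its response time degrades, might look roughly like the Python sketch below. Only the 200 ms and 2 s thresholds and the five-minute window come from the slides; the function name, the "every sample in the window breached the limit" rule, and the handling of missing data are assumptions made for illustration.

```python
from typing import Sequence

FAST_LIMIT_S = 0.2  # 200 ms threshold from the slides
SLOW_LIMIT_S = 2.0  # 2 s threshold from the slides


def qos_group(latencies_last_5_min: Sequence[float]) -> int:
    """Pick a QoS group (1 = fast, 2 = slow, 3 = very slow) for one webhook URL.

    The argument holds the response times, in seconds, observed during the
    last five minutes. A URL is demoted only when every sample breached the
    limit, which approximates "slower than X for 5 minutes".
    """
    if not latencies_last_5_min:
        return 1  # assumption: no samples means no evidence of slowness
    fastest = min(latencies_last_5_min)
    if fastest > SLOW_LIMIT_S:
        return 3
    if fastest > FAST_LIMIT_S:
        return 2
    return 1


print(qos_group([0.05, 0.12]))  # 1
print(qos_group([0.3, 0.9]))    # 2
print(qos_group([2.5, 3.1]))    # 3
```

Requiring the whole window to be slow keeps a single spike from bouncing a URL between groups; a production version would also need a rule for promoting a recovered URL back to a faster group.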
  • 44. Fixed Routing 3: WWW, Wix-level traffic
  • 45. Fixed routing: Wix traffic. Editor Segment; Public Segment. * rendering high avail.
  • 46. Fixed routing: Wix traffic. Editor Segment ▪ Site editing ▪ Business management ▪ Route “shard” key, e.g. *.wix.com. Public/SSR Segment ▪ Site rendering ▪ Route “shard” key, e.g. {userName}.wixsite.com/{siteName}
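
Routing by host pattern, as in the Editor and Public/SSR segments above, can be sketched in Python with wildcard matching. The two patterns follow the examples on slide 46; the segment labels, the rule order, and the fallback for unmatched hosts are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Ordered (pattern, segment) rules; patterns follow the slide's examples.
SEGMENT_RULES = [
    ("*.wix.com", "editor"),
    ("*.wixsite.com", "public"),
]


def segment_for(url: str) -> str:
    """Return the traffic segment for a URL based on its host name."""
    host = (urlsplit(url).hostname or "").lower()
    for pattern, segment in SEGMENT_RULES:
        if fnmatchcase(host, pattern):
            return segment
    # Assumption: any other host is a published site on a custom domain.
    return "public"


print(segment_for("https://manage.wix.com/dashboard"))     # editor
print(segment_for("https://someuser.wixsite.com/my-site"))  # public
```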
  • 47. Dynamic Routing 4: High-throughput Reactions application
  • 48. Dynamic Routing: High-throughput application • Reactions application: a simple service for users to like/unlike posts or post a custom emoji as a reaction
  • 50. Dynamic Routing: High-throughput application • Reactions application: a simple service for users to like/unlike posts or post a custom emoji as a reaction • Key Design Considerations • High Read/Write Throughput (Read: 30K RPM, Write: 1K RPM) • Low Latency Reads • Scalability. * how to address requirements
  • 51. Kafka Topic Partitions: Traffic Balancing ▪ Performance & Durability ▪ Parallel Processing & Order Guarantee ▪ Scalability
  • 52. Kafka Topic Partitions: Traffic Balancing. Broker with three topics of three partitions each; three producers; three consumers
  • 53. Kafka Topic Partitions: Traffic Balancing. Cluster: Server 1, Server 2; partitions P0, P3, P1, P2. Consumer Group A: C3, C4, C5, C6
  • 54. Kafka Topic Partitions: Traffic Balancing. Topic with three partitions. Reactions App: reaction by user A. Reactions Pod A, Reactions Pod B
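
How keyed partitioning gives both parallelism and a per-key order guarantee can be shown with a small in-memory Python simulation. This is not Kafka client code: the partition count, the choice of the user as the message key, and the CRC32 hash are assumptions for illustration (Kafka's default partitioner hashes the key bytes with murmur2).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events):
    """Append (key, value) events to in-memory partition logs, in arrival order."""
    logs = defaultdict(list)
    for key, value in events:
        logs[partition_for(key)].append((key, value))
    return logs


events = [
    ("user-A", "like"),
    ("user-B", "like"),
    ("user-A", "unlike"),
    ("user-A", "emoji"),
]
logs = produce(events)

# Every event for user-A sits in one partition, in the order it was produced.
user_a_log = [v for k, v in logs[partition_for("user-A")] if k == "user-A"]
print(user_a_log)  # ['like', 'unlike', 'emoji']
```

Because one partition is read by a single consumer within a consumer group, the pod that owns user A's partition sees that user's reactions in order, while other partitions are processed in parallel by other pods.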
  • 56. Dynamic Routing: High-throughput application • Key Design Considerations • High Write Throughput • Low Latency Reads • Scalability
  • 57. Dynamic Routing: High-throughput application • Key Design Considerations • High Write Throughput • Low Latency Reads • Scalability. We handled the traffic bottleneck; what about data persistence? * only as strong as the weakest link
  • 58. Dynamic sharding 5: High-throughput Reactions application
  • 59. Enter DynamoDB (Data) Sharding ▪ High-throughput, large-scale applications ▪ Partitioning data across multiple nodes ▪ Distributes workload evenly ▪ Automatically adjusts
  • 60. DynamoDB Intro. Table (source: Amazon AWS)
  • 61. DynamoDB Sharding: High-throughput application • Reactions application • DynamoDB table structure: Partition Key: appDef/itemId (image gallery, forum, blog); Sort Key: userId; Attributes: list of reactions. Example: AppId/ItemId; UserId: David, Laura, Michael, Jane
  • 62. DynamoDB: Tables and partitions
  • 63. DynamoDB: Partition splitting and merging. 10 GB of data, 3,000 read capacity units, 1,000 write capacity units. * exceeding, triggers splitting, conversely, merging. single item…
  • 64. DynamoDB: Partition, Table
  • 65. DynamoDB: Partition, hash, Sort Key
  • 66. DynamoDB: Cost increases with capacity. Throughput, Cost
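
The per-partition limits quoted on slide 63 (10 GB of data, 3,000 read capacity units, 1,000 write capacity units) can be turned into a quick back-of-the-envelope check. The Python sketch below is not DynamoDB's actual splitting algorithm; the class, the function names, and the sample load are hypothetical, and the partition estimate is only the lower bound implied by the three limits.

```python
import math
from dataclasses import dataclass

# Per-partition limits as listed on slide 63.
MAX_SIZE_GB = 10
MAX_RCU = 3000
MAX_WCU = 1000


@dataclass
class PartitionLoad:
    size_gb: float
    read_capacity_units: float
    write_capacity_units: float


def exceeds_partition_limits(load: PartitionLoad) -> bool:
    """True if any single limit is exceeded, the condition that triggers a split."""
    return (
        load.size_gb > MAX_SIZE_GB
        or load.read_capacity_units > MAX_RCU
        or load.write_capacity_units > MAX_WCU
    )


def min_partitions(load: PartitionLoad) -> int:
    """Lower bound on partitions needed so that no limit is exceeded."""
    return max(
        1,
        math.ceil(load.size_gb / MAX_SIZE_GB),
        math.ceil(load.read_capacity_units / MAX_RCU),
        math.ceil(load.write_capacity_units / MAX_WCU),
    )


# Hypothetical table-wide load: 25 GB, 500 RCU, 17 WCU.
table = PartitionLoad(size_gb=25, read_capacity_units=500, write_capacity_units=17)
print(exceeds_partition_limits(table))  # True
print(min_partitions(table))            # 3
```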
  • 68. Dynamic routing (Fixed data sharding) 6: WWW, Data Locality for Enterprise customers
  • 69. Types of headache
  • 70. Geo Sharding: Big Enterprise Customers ▪ Data locality, e.g. GDPR ▪ EU/US-only setup ▪ Configurable
  • 71. Geo Sharding: Data locality. Cluster with nodes only in USA (US); Global Clusters; Cluster with nodes only in EU (EU)
  • 72. MySQL Routing via ProxySQL • ProxySQL is a high-performance load balancer for MySQL • Intelligent Query Routing • Enhanced High Availability • Sharding Support
  • 73. MySQL Routing via ProxySQL
  • 74. Query rules + Routing hints
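
The combination of query rules and routing hints can be pictured with a small Python simulation: the application tags each statement with a hint comment, and an ordered rule list maps the hint to a destination cluster. This is not ProxySQL configuration or its API; the hint format, the cluster names, and the fallback are hypothetical.

```python
import re

# Hypothetical ordered rules: (pattern matched against the SQL text, destination).
QUERY_RULES = [
    (re.compile(r"/\*\s*region=eu\s*\*/", re.IGNORECASE), "eu-cluster"),
    (re.compile(r"/\*\s*region=us\s*\*/", re.IGNORECASE), "us-cluster"),
]
DEFAULT_DESTINATION = "global-cluster"  # assumption: unhinted queries go global


def route_query(sql: str) -> str:
    """Return the destination of the first rule whose pattern matches the query."""
    for pattern, destination in QUERY_RULES:
        if pattern.search(sql):
            return destination
    return DEFAULT_DESTINATION


print(route_query("/* region=eu */ SELECT * FROM orders WHERE id = 7"))  # eu-cluster
print(route_query("SELECT * FROM catalog"))                              # global-cluster
```

Keeping the routing decision in the proxy layer means the application only has to attach the hint; which physical cluster serves an EU-only or US-only customer can then change without touching application code.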
  • 75. Future Sharding Strategies ▪ Breaking a large database into smaller units ▪ Dedicated database cluster for a specific customer ▪ Supporting additional regional constraints. * By using query rules
  • 76. Takeaways
  • 77. Horizontal scaling increases complexity ▪ Modeling & architecture ▪ Migrations ▪ Mental complexity
  • 78. Horizontal scaling Tech
  • 82. How to split the key space? Fixed segments: Cluster A, Cluster B, Cluster C, Cluster D. Small group of tenants. QoS / SLA: Latency 0 … “regular” … abusers. GEO: US … EU
  • 83. How to split the key space? Dynamic routing: Hash function (%), Ranges, Wildcard / Regex, Tenant Mapping
  • 84. The scaling + sharding decision tree. Degraded performance → Small machine / low cost? Yes, scale vertically: add more compute/memory. No, scale horizontally → Small set of “tenants”? Yes, fixed sharding: manually partition your deployment/table. No, too many “tenants” → Custom routing? Yes, preset rules: use rules to determine shard. No rule, go wild!: use hash key
  • 85. The scaling + sharding decision tree. Degraded performance → Scale vertically? Yes, cost-effective: add more compute/memory. No, too expensive → Naïve sharding? Yes, small set of “tenants”: manually partition your deployment/table. No, too many “tenants” → Custom routing? Yes, preset rules: use rules to determine shard. No rule, go wild!: use hash key. You can also mix Fixed with Dynamic
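
The decision tree on slide 85 reads naturally as a chain of three yes/no questions. The Python sketch below encodes it directly; the function and parameter names are made up for illustration, while the questions and outcomes are the ones on the slide.

```python
def scaling_decision(
    vertical_is_cost_effective: bool,
    small_set_of_tenants: bool,
    has_preset_rules: bool,
) -> str:
    """Walk the scaling + sharding decision tree for a service with degraded performance."""
    if vertical_is_cost_effective:
        return "Add more compute/memory"  # scale vertically
    if small_set_of_tenants:
        return "Manually partition your deployment/table"  # fixed sharding
    if has_preset_rules:
        return "Use rules to determine shard"  # custom routing
    return "Use hash key"  # no rule, go wild


print(scaling_decision(True, False, False))   # Add more compute/memory
print(scaling_decision(False, True, False))   # Manually partition your deployment/table
print(scaling_decision(False, False, True))   # Use rules to determine shard
print(scaling_decision(False, False, False))  # Use hash key
```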
  • 87. Q & A. Thank you. natansilnitsky, www.natansil.com, @NSilnitsky 👉 slideshare.net/NatanSilnitsky