PRESENTED BY
Creating a Highly Available Persistent Session Management Service with Redis and a Connection Pooling Proxy
Mohamed Elmergawi, Lead Software Engineer, Zulily

ZULILY'S BUSINESS CREATES INTERESTING TECHNICAL CHALLENGES
• A NEW STORE EVERY DAY: thousands of products at brag-worthy prices
• INSPIRED, DISCOVERY-DRIVEN EXPERIENCE: shopping without specific purchase intent
• HIGHLY CURATED SALES EVENTS: 100+ time-limited sales (72 hours)
• A DAILY DESTINATION: 75% of orders via mobile (Q3 2019)
• MASSIVELY PERSONALIZED APPROACH: millions of versions of the site/app launched daily
• GLOBAL MARKETPLACE: 15,000+ vendors, including Under Armour, Cuisinart, and Melissa & Doug

Problem Definition
A reliable global session service is critical:
• If it goes down, you can't serve customers
• Infrastructure is volatile, so we need persistence
• Speed is key
“Everything fails all the time.” - Werner Vogels, CTO, Amazon

Legacy Architecture
• No HA: any hardware or network degradation leads to a failure
• Sharding logic is coupled to the application layer
• Promoting a slave to master requires manual intervention
• Limits global expansion
• Slave nodes sit idle
[Diagram: the app cluster, the REST API application cluster, and the site cluster each read/write through their own Twemproxy to Redis master nodes; each master replicates asynchronously to a slave node.]

Alternative Approaches
Redis Cluster
• Not suited for applications that require availability in the event of large net splits
• Active/passive mode
Redis Sentinel
• The sharding logic would still be coupled with the application
• Active/passive mode

New Architecture
• Connection Pooling Proxy
• Session Service
• Real-time replication
[Diagram: the site cluster and the app cluster each connect through a proxy and an ALB to session services 1 to n; each session service is made up of Redis + Dynomite nodes, and the nodes span regions a, b, and c.]

Connection Pooling Proxy for Site and App Cluster Nodes
• Reduces the overhead of establishing a new connection
• Reuses existing connections efficiently
• Caps the total number of connections
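The three properties above (reuse, lower setup overhead, a hard cap) can be illustrated with a bounded pool that hands out an idle connection if one exists, opens a new one only while under the cap, and otherwise waits. This is a minimal, hypothetical sketch using only the Python standard library: the talk does not name the proxy it uses, and `BoundedPool` and its `factory` callable are illustrative names, not that proxy's API.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Hypothetical pool: reuses idle connections and caps how many are ever opened."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory      # opens a brand-new connection (the expensive step)
        self._max_size = max_size    # hard limit on open connections
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()  # most recently used first
        self._created = 0
        self._lock = threading.Lock()

    @property
    def created(self) -> int:
        return self._created

    def acquire(self, timeout: Optional[float] = None) -> T:
        # 1. Reuse an idle connection if one is available (no setup cost).
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # 2. Open a new connection only while under the cap.
        with self._lock:
            create = self._created < self._max_size
            if create:
                self._created += 1
        if create:
            try:
                return self._factory()
            except Exception:
                with self._lock:
                    self._created -= 1  # the slot was never used
                raise
        # 3. At the cap: wait for a release (raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        self._idle.put(conn)

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)
```

Two sequential `with pool.connection()` blocks receive the same underlying connection, and once `max_size` connections are checked out, further callers block (or time out) instead of opening more, which is the "constrain the total number of connections" behavior from the slide.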

Session Service
• Request routing based on consistent hashing (Murmur hash)
• Traffic distribution based on geographic location
• Topology-aware load balancing (token aware)
• Request rerouting when functional or latency health checks fail
[Diagram: token ring where node a1 owns tokens 0-100, a2 owns 101-200, and a3 owns 201-300.]
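Token-aware routing with consistent hashing can be sketched as: hash the session ID, map the hash onto the token ring, and send the request to the node whose token range contains it. Assumptions: the slide only says "Murmur hash", so this sketch uses the 32-bit MurmurHash3 variant; the 0-300 token space and the node names a1-a3 come from the slide's diagram purely for illustration (a real Dynomite ring uses a far larger token space); `RING`, `TOKEN_SPACE`, `node_for_token`, and `node_for_session` are hypothetical names.

```python
import bisect


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 (x86, 32-bit)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    nblocks = n // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[4 * nblocks:]  # tail: remaining 0-3 bytes
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= n  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


# Hypothetical ring from the slide: each node owns tokens up to and including its end token.
RING = [(100, "a1"), (200, "a2"), (300, "a3")]
TOKEN_SPACE = 301  # tokens 0..300
_ENDS = [end for end, _ in RING]


def node_for_token(token: int) -> str:
    """Owner = first node whose end token is >= token."""
    if not 0 <= token < TOKEN_SPACE:
        raise ValueError(f"token {token} outside 0..{TOKEN_SPACE - 1}")
    return RING[bisect.bisect_left(_ENDS, token)][1]


def node_for_session(session_id: str) -> str:
    token = murmur3_32(session_id.encode("utf-8")) % TOKEN_SPACE
    return node_for_token(token)
```

Because every proxy computes the same hash, any proxy can route a given session ID to the same owner without coordination, which is what removes the sharding logic from the application.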

Session Service: Real-Time Replication between Redis Nodes via Dynomite
Peer-to-peer (P2P), active/active approach
[Diagram: data centers a, b, and c, each with three nodes (a1-a3, b1-b3, c1-c3). An incoming write for a session ID is routed to its owning node by hashing the session ID, then replicated to the other data centers.]
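The active/active idea, together with rerouting on failed health checks, can be illustrated with a toy in-memory model: a write is applied to the owning node in the local data center and copied to the node at the same ring position in every other data center (a1, b1, c1, and so on), and a read falls back to another data center when the local owner is unhealthy. This is a hypothetical sketch, not Dynomite's implementation: replication here is synchronous and in-process (real replication is asynchronous over the network), `zlib.crc32` stands in for the Murmur-based token ring, and `Node`, `Cluster`, and the `healthy` flag are illustrative names.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True  # stand-in for a functional/latency health check
    store: Dict[str, str] = field(default_factory=dict)


class Cluster:
    """Toy model: every data center holds a full copy of the ring."""

    def __init__(self, dcs: List[str], nodes_per_dc: int) -> None:
        self.dcs = dcs
        self.nodes_per_dc = nodes_per_dc
        self.nodes = {dc: [Node(f"{dc}{i + 1}") for i in range(nodes_per_dc)] for dc in dcs}

    def _position(self, key: str) -> int:
        # Placeholder partitioner (deterministic); same position in every DC.
        return zlib.crc32(key.encode("utf-8")) % self.nodes_per_dc

    def _dc_order(self, local_dc: str) -> List[str]:
        # Local DC first, then the remaining DCs.
        return [local_dc] + [dc for dc in self.dcs if dc != local_dc]

    def write(self, local_dc: str, key: str, value: str) -> List[str]:
        """Apply locally, then replicate to the peer node in every other DC.

        No repair is modeled: a node that is down during a write misses it.
        """
        pos = self._position(key)
        written = []
        for dc in self._dc_order(local_dc):
            node = self.nodes[dc][pos]
            if node.healthy:
                node.store[key] = value
                written.append(node.name)
        return written

    def read(self, local_dc: str, key: str) -> Optional[str]:
        """Read from the local owner; reroute to another DC if it is unhealthy.

        Returns None if no healthy replica exists or the key is absent.
        """
        pos = self._position(key)
        for dc in self._dc_order(local_dc):
            node = self.nodes[dc][pos]
            if node.healthy:
                return node.store.get(key)
        return None
```

In this model, taking down the owning node in two of the three data centers still leaves the session readable from the third, which mirrors the "2 out of 3 network partitions" outage scenario in the results.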

Production Rollout
• Staged rollout
• Double write (starting at time T1)
• Copied data offline from the slave nodes (before T1)
• Double read
• Data sanity checks
• Applied chaos engineering principles to the new system
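The double-write and double-read steps can be sketched as a thin wrapper over the legacy and new stores: every write goes to both, and every read queries both, returns the legacy value, and records any mismatch for the data sanity checks. This is a hypothetical sketch; the `KeyValueStore` protocol, the `DictStore` stand-in, the decision to serve the legacy value during the rollout, and the `mismatches` list are assumptions rather than details from the talk.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class DictStore:
    """In-memory stand-in for either backend."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


class MigratingSessionStore:
    def __init__(self, legacy: KeyValueStore, new: KeyValueStore) -> None:
        self.legacy = legacy
        self.new = new
        # (key, legacy value, new value) for every read where they disagree
        self.mismatches: List[Tuple[str, Optional[str], Optional[str]]] = []

    def set(self, key: str, value: str) -> None:
        # Double write: legacy stays the source of truth until cut-over.
        self.legacy.set(key, value)
        self.new.set(key, value)

    def get(self, key: str) -> Optional[str]:
        # Double read: compare both answers, but serve the legacy one.
        old_value = self.legacy.get(key)
        new_value = self.new.get(key)
        if old_value != new_value:
            self.mismatches.append((key, old_value, new_value))
        return old_value
```

In this sketch, sessions created before T1 show up as mismatches until the offline copy from the slave nodes has been loaded into the new store, so an empty mismatch log is one signal that the new cluster is ready to serve reads on its own.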

Results: Simulated Outage
After simulating an outage on 2 out of 3 network partitions:
• Recovery window: 250 ms
• Failure rate: 0.42%

Drawbacks
• Scaling requires adding multiple hosts at a time
• Higher network traffic volume
• Cross-AZ, cross-region, and cross-data-center traffic costs money
• Adding hosts to the ring is a manual process

Summary
• Connection Pooling Proxy
• Session Service
• Redis is not only a cache; it is also persistent storage
• Design for failure
• Use chaos engineering practices
• Replicate your data across multiple regions using real-time replication
Thank You!
zulily.com/careers

More Related Content

What's hot

SQL Server 2016 New Security Features
SQL Server 2016 New Security FeaturesSQL Server 2016 New Security Features
SQL Server 2016 New Security FeaturesGianluca Sartori
 
RedisConf18 - Microservicesand Redis: A Match made in Heaven
RedisConf18 - Microservicesand Redis: A Match made in HeavenRedisConf18 - Microservicesand Redis: A Match made in Heaven
RedisConf18 - Microservicesand Redis: A Match made in HeavenRedis Labs
 
Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...LeMeniz Infotech
 
JPD1418 TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...
JPD1418  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...JPD1418  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...
JPD1418 TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...chennaijp
 
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...Redis Labs
 
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and SolutionsWSO2
 
RedisConf18 - Video Experience Operational Insights in Real Time.
RedisConf18 - Video Experience Operational Insights in Real Time.RedisConf18 - Video Experience Operational Insights in Real Time.
RedisConf18 - Video Experience Operational Insights in Real Time.Redis Labs
 
5 infrastructure security
5 infrastructure security5 infrastructure security
5 infrastructure securityLen Bass
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...PerformanceVision (previously SecurActive)
 
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorWSO2
 
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020Redis Labs
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Big Data Spain
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentRakuten Group, Inc.
 
Software defined storage real or bs-2014
Software defined storage real or bs-2014Software defined storage real or bs-2014
Software defined storage real or bs-2014Howard Marks
 
REST vs. Messaging For Microservices
REST vs. Messaging For MicroservicesREST vs. Messaging For Microservices
REST vs. Messaging For MicroservicesEberhard Wolff
 
Enterprise Cloud Platform - Keynote
Enterprise Cloud Platform - KeynoteEnterprise Cloud Platform - Keynote
Enterprise Cloud Platform - KeynoteNEXTtour
 
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...VMware Tanzu
 
2016 Internet Outages: Trends, Insights & Analysis
2016 Internet Outages: Trends, Insights & Analysis 2016 Internet Outages: Trends, Insights & Analysis
2016 Internet Outages: Trends, Insights & Analysis ThousandEyes
 
Hyperconverged Infrastructure, It's the Future
Hyperconverged Infrastructure, It's the FutureHyperconverged Infrastructure, It's the Future
Hyperconverged Infrastructure, It's the FutureHoward Marks
 

What's hot (20)

Securing Redis
Securing RedisSecuring Redis
Securing Redis
 
SQL Server 2016 New Security Features
SQL Server 2016 New Security FeaturesSQL Server 2016 New Security Features
SQL Server 2016 New Security Features
 
RedisConf18 - Microservicesand Redis: A Match made in Heaven
RedisConf18 - Microservicesand Redis: A Match made in HeavenRedisConf18 - Microservicesand Redis: A Match made in Heaven
RedisConf18 - Microservicesand Redis: A Match made in Heaven
 
Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...
 
JPD1418 TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...
JPD1418  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...JPD1418  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...
JPD1418 TrustedDB: A Trusted Hardware-Based Database with Privacy and Data C...
 
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...
Lessons From Deploying Redis On Azure For Enterprise Customers: Carl Dacosta,...
 
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
 
RedisConf18 - Video Experience Operational Insights in Real Time.
RedisConf18 - Video Experience Operational Insights in Real Time.RedisConf18 - Video Experience Operational Insights in Real Time.
RedisConf18 - Video Experience Operational Insights in Real Time.
 
5 infrastructure security
5 infrastructure security5 infrastructure security
5 infrastructure security
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...
 
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
 
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020
Moving Beyond Cache by Yiftach Shoolman - Redis Day Bangalore 2020
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Software defined storage real or bs-2014
Software defined storage real or bs-2014Software defined storage real or bs-2014
Software defined storage real or bs-2014
 
REST vs. Messaging For Microservices
REST vs. Messaging For MicroservicesREST vs. Messaging For Microservices
REST vs. Messaging For Microservices
 
Enterprise Cloud Platform - Keynote
Enterprise Cloud Platform - KeynoteEnterprise Cloud Platform - Keynote
Enterprise Cloud Platform - Keynote
 
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...
Integrating Hybrid Cloud Database-as-a-Service with Cloud Foundry’s Service​ ...
 
2016 Internet Outages: Trends, Insights & Analysis
2016 Internet Outages: Trends, Insights & Analysis 2016 Internet Outages: Trends, Insights & Analysis
2016 Internet Outages: Trends, Insights & Analysis
 
Hyperconverged Infrastructure, It's the Future
Hyperconverged Infrastructure, It's the FutureHyperconverged Infrastructure, It's the Future
Hyperconverged Infrastructure, It's the Future
 

Similar to Redis presentation

Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Redis Labs
 
How Automation And Intelligence Can Simplify Your High Availability
How Automation And Intelligence Can Simplify Your High AvailabilityHow Automation And Intelligence Can Simplify Your High Availability
How Automation And Intelligence Can Simplify Your High AvailabilityPrecisely
 
管理向云的迁移过程
管理向云的迁移过程管理向云的迁移过程
管理向云的迁移过程ITband
 
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...Net at Work
 
Visualizing Your Network Health - Driving Visibility in Increasingly Complex...
Visualizing Your Network Health -  Driving Visibility in Increasingly Complex...Visualizing Your Network Health -  Driving Visibility in Increasingly Complex...
Visualizing Your Network Health - Driving Visibility in Increasingly Complex...DellNMS
 
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...VMworld
 
Embracing Failure - AzureDay Rome
Embracing Failure - AzureDay RomeEmbracing Failure - AzureDay Rome
Embracing Failure - AzureDay RomeAlberto Acerbis
 
Migrating IBM i Systems to the Cloud: Exploring the Pros and Cons
Migrating IBM i Systems to the Cloud: Exploring the Pros and ConsMigrating IBM i Systems to the Cloud: Exploring the Pros and Cons
Migrating IBM i Systems to the Cloud: Exploring the Pros and ConsPrecisely
 
Ransomware-Recovery-as-a-Service
Ransomware-Recovery-as-a-ServiceRansomware-Recovery-as-a-Service
Ransomware-Recovery-as-a-ServiceSagi Brody
 
High Scalability Network Performance Management for Enterprises
High Scalability Network Performance Management for EnterprisesHigh Scalability Network Performance Management for Enterprises
High Scalability Network Performance Management for EnterprisesCA Technologies
 
70% Improvement in Service and Product Delivery on Implementing DevOps
70% Improvement in Service and Product Delivery on Implementing DevOps70% Improvement in Service and Product Delivery on Implementing DevOps
70% Improvement in Service and Product Delivery on Implementing DevOpsCygnet Infotech
 
Mainframe Possible: Migrating a Mainframe to AWS
Mainframe Possible: Migrating a Mainframe to AWSMainframe Possible: Migrating a Mainframe to AWS
Mainframe Possible: Migrating a Mainframe to AWSAmazon Web Services
 
Application Darwinism - Why Most Enterprise Apps Will Evolve to the Cloud
Application Darwinism - Why Most Enterprise Apps Will Evolve to the CloudApplication Darwinism - Why Most Enterprise Apps Will Evolve to the Cloud
Application Darwinism - Why Most Enterprise Apps Will Evolve to the CloudSkytap Cloud
 
Contact Center Capabilities
Contact Center CapabilitiesContact Center Capabilities
Contact Center Capabilitiesservice007
 
Bluemix Local – Relay Options and Challenges
Bluemix Local – Relay Options and Challenges Bluemix Local – Relay Options and Challenges
Bluemix Local – Relay Options and Challenges Eduardo Patrocinio
 
Protecting Your Power Systems with Cloud-based HA/DR
Protecting Your Power Systems with Cloud-based HA/DRProtecting Your Power Systems with Cloud-based HA/DR
Protecting Your Power Systems with Cloud-based HA/DRPrecisely
 
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir Patrick Bouillaud
 
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...FINOS
 

Similar to Redis presentation (20)

Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
 
How Automation And Intelligence Can Simplify Your High Availability
How Automation And Intelligence Can Simplify Your High AvailabilityHow Automation And Intelligence Can Simplify Your High Availability
How Automation And Intelligence Can Simplify Your High Availability
 
管理向云的迁移过程
管理向云的迁移过程管理向云的迁移过程
管理向云的迁移过程
 
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
NetSuite For Manufacturing _ Cloud Manufacturing Software for Modern Manufact...
 
Maximize the Cloud Today
Maximize the Cloud TodayMaximize the Cloud Today
Maximize the Cloud Today
 
Visualizing Your Network Health - Driving Visibility in Increasingly Complex...
Visualizing Your Network Health -  Driving Visibility in Increasingly Complex...Visualizing Your Network Health -  Driving Visibility in Increasingly Complex...
Visualizing Your Network Health - Driving Visibility in Increasingly Complex...
 
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...
VMworld 2015: Take Virtualization to the Next Level vSphere with Operations M...
 
Embracing Failure - AzureDay Rome
Embracing Failure - AzureDay RomeEmbracing Failure - AzureDay Rome
Embracing Failure - AzureDay Rome
 
Migrating IBM i Systems to the Cloud: Exploring the Pros and Cons
Migrating IBM i Systems to the Cloud: Exploring the Pros and ConsMigrating IBM i Systems to the Cloud: Exploring the Pros and Cons
Migrating IBM i Systems to the Cloud: Exploring the Pros and Cons
 
Ransomware-Recovery-as-a-Service
Ransomware-Recovery-as-a-ServiceRansomware-Recovery-as-a-Service
Ransomware-Recovery-as-a-Service
 
High Scalability Network Performance Management for Enterprises
High Scalability Network Performance Management for EnterprisesHigh Scalability Network Performance Management for Enterprises
High Scalability Network Performance Management for Enterprises
 
70% Improvement in Service and Product Delivery on Implementing DevOps
70% Improvement in Service and Product Delivery on Implementing DevOps70% Improvement in Service and Product Delivery on Implementing DevOps
70% Improvement in Service and Product Delivery on Implementing DevOps
 
Mainframe Possible: Migrating a Mainframe to AWS
Mainframe Possible: Migrating a Mainframe to AWSMainframe Possible: Migrating a Mainframe to AWS
Mainframe Possible: Migrating a Mainframe to AWS
 
Application Darwinism - Why Most Enterprise Apps Will Evolve to the Cloud
Application Darwinism - Why Most Enterprise Apps Will Evolve to the CloudApplication Darwinism - Why Most Enterprise Apps Will Evolve to the Cloud
Application Darwinism - Why Most Enterprise Apps Will Evolve to the Cloud
 
Contact Center Capabilities
Contact Center CapabilitiesContact Center Capabilities
Contact Center Capabilities
 
Bluemix Local – Relay Options and Challenges
Bluemix Local – Relay Options and Challenges Bluemix Local – Relay Options and Challenges
Bluemix Local – Relay Options and Challenges
 
Lithium: Event-Driven Network Control
Lithium: Event-Driven Network ControlLithium: Event-Driven Network Control
Lithium: Event-Driven Network Control
 
Protecting Your Power Systems with Cloud-based HA/DR
Protecting Your Power Systems with Cloud-based HA/DRProtecting Your Power Systems with Cloud-based HA/DR
Protecting Your Power Systems with Cloud-based HA/DR
 
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir
Softlayer an IBM Compay . Connaissez vous le cloud de l'avenir
 
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...
OSSF 2018 - Peter Crocker of Cumulus Networks - TCO and technical advantages ...
 

Recently uploaded

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS

Redis presentation

  • 1. Creating a Highly Available Persistent Session Management Service with Redis and a Connection Pooling Proxy. PRESENTED BY Mohamed Elmergawi, Lead Software Engineer, Zulily
  • 2. ZULILY’S BUSINESS CREATES INTERESTING TECHNICAL CHALLENGES: • A NEW STORE EVERY DAY: thousands of products at brag-worthy prices • INSPIRED, DISCOVERY-DRIVEN EXPERIENCE without specific purchase intent • HIGHLY CURATED SALES EVENTS: 100+ time-limited sales (72 hours) • A DAILY DESTINATION: 75% of orders via mobile (Q3 '19) • MASSIVELY PERSONALIZED APPROACH: millions of versions of the site/app launched daily • GLOBAL MARKETPLACE: 15,000+ vendors, including Under Armour, Cuisinart, and Melissa & Doug
  • 3. Problem Definition: A reliable global session service is critical: • If it goes down, you can't serve customers • Infrastructure is volatile; we need persistence • Speed is key. “Everything fails all the time” - Werner Vogels, CTO, Amazon
  • 4. Legacy Architecture: • No HA: a hardware failure or network degradation leads to an outage • Sharding logic is coupled to the application layer • Promoting a slave to master requires manual intervention • Limits global expansion • Slave nodes sit idle. [Diagram: the app cluster, the REST API application cluster, and the site cluster each use Twemproxy for R/W access to Redis master nodes, which replicate asynchronously to slave nodes]
  • 5. Alternative Approaches: Redis Cluster • Not suited for applications that must stay available during large network partitions (net splits) • Active-passive mode. Redis Sentinel • The sharding logic would still be coupled with the application • Active-passive mode
  • 6. New Architecture: • Connection pooling proxy • Session service • Real-time replication. [Diagram: the site cluster and app cluster each go through a proxy and an ALB to session services 1 through n, which front rings of Redis + Dynomite nodes in Region a, Region b, and Region c]
  • 7. Connection Pooling Proxy for Site and App Cluster Nodes: • Reduces the overhead of establishing a new connection • Reuses existing connections efficiently • Caps the total number of connections
  • 8. Session Service: • Request routing based on consistent hashing (Murmur hash) • Traffic distribution based on geo location • Topology-aware load balancing (token aware) • Request rerouting after failed functional or latency health checks. [Diagram: a ring of nodes a1, a2, and a3 owning token ranges 0-100, 101-200, and 201-300]
  • 9. Session Service: Real-time replication between Redis nodes via Dynomite, using a P2P, active/active approach. [Diagram: data centers a, b, and c, each with three nodes (a1-a3, b1-b3, c1-c3); incoming writes for session id 1 and session id 2 are hashed to their owning nodes and replicated to the other data centers]
  • 10. Production Rollout: • Staged rollout • Double write (from time T1) • Copied data offline from the slave nodes (prior to T1) • Double read • Data sanity checks • Apply chaos engineering principles to the new system
  • 11. Simulated Outage: results after simulating an outage on 2 out of 3 network partitions: 250 ms recovery window, 0.42% failure rate
  • 12. Drawbacks: • Scaling can only happen across multiple hosts • Higher network traffic volume • Cross-AZ/region/DC traffic costs money • Adding hosts to the ring is a manual process
  • 13. Summary: • Connection pooling proxy • Session service • Redis is not only a cache; it is persistent storage • Design for failure • Use chaos engineering practices • Replicate your data across multiple regions with real-time replication. [Diagram: the new architecture from slide 6, with session services 1 through n in front of Redis + Dynomite nodes in Regions a, b, and c]
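Slide 7 lists what the connection pooling proxy buys: less connection-setup overhead, reuse of existing connections, and a cap on the total. A minimal sketch of that idea in Python follows, assuming a hypothetical `ConnectionPool` class and a `connect` factory that stands in for opening a TCP connection to the load balancer. It is not Zulily's proxy, only an illustration of bounded reuse.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Hypothetical bounded pool: reuses idle connections, never opens more than max_size."""

    def __init__(self, connect: Callable[[], T], max_size: int) -> None:
        self._connect = connect                  # opens one new (expensive) connection
        self._max_size = max_size                # hard cap on connections this pool opens
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()  # most recently used first
        self._created = 0
        self._lock = threading.Lock()

    @property
    def created(self) -> int:
        return self._created

    def acquire(self, timeout: Optional[float] = None) -> T:
        try:
            return self._idle.get_nowait()       # reuse: no connection setup cost
        except queue.Empty:
            pass
        with self._lock:                         # reserve a slot under the cap
            can_open = self._created < self._max_size
            if can_open:
                self._created += 1
        if can_open:
            try:
                return self._connect()
            except Exception:
                with self._lock:                 # give the slot back if connecting fails
                    self._created -= 1
                raise
        # At the cap: wait for a released connection (raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        self._idle.put(conn)

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


# Usage: ten sequential requests share a single connection.
pool = ConnectionPool(connect=object, max_size=4)
for _ in range(10):
    with pool.connection() as conn:
        pass  # send the session request over conn
print(pool.created)  # 1
```

A LIFO queue is a design choice here: it keeps handing out the most recently used connection, so idle extras can age out while hot ones stay warm.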
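Slide 8 routes each request by consistent hashing with Murmur hash over a ring whose nodes own token ranges (a1: 0-100, a2: 101-200, a3: 201-300). The Python sketch below reproduces that toy ring. It includes a pure-Python MurmurHash3 (x86, 32-bit) because the standard library has none. The `TokenRing` name, folding the hash into the slide's 0-300 token space with a modulo, and the "each node owns the range ending at its token" convention are assumptions chosen to match the slide's picture; a production ring would use a much larger token space.

```python
import bisect
from typing import Dict


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, written out in pure Python."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h1 = seed & mask
    n_blocks = len(data) // 4

    def mix_k1(k1: int) -> int:
        k1 = (k1 * c1) & mask
        k1 = ((k1 << 15) | (k1 >> 17)) & mask    # rotl32(k1, 15)
        return (k1 * c2) & mask

    for i in range(n_blocks):                     # body: 4-byte little-endian blocks
        h1 ^= mix_k1(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h1 = ((h1 << 13) | (h1 >> 19)) & mask     # rotl32(h1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & mask

    tail = data[4 * n_blocks:]                    # 0-3 leftover bytes
    if tail:
        h1 ^= mix_k1(int.from_bytes(tail, "little"))

    h1 ^= len(data)                               # finalization mix (fmix32)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & mask
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & mask
    h1 ^= h1 >> 16
    return h1


class TokenRing:
    """Toy ring: each node owns the token range ending at (and including) its token."""

    def __init__(self, node_tokens: Dict[str, int], token_space: int) -> None:
        self._tokens = sorted(node_tokens.values())
        self._owner = {token: node for node, token in node_tokens.items()}
        self._space = token_space

    def owner_of_token(self, token: int) -> str:
        # First node whose token is >= the key's token; wrap around past the last node.
        i = bisect.bisect_left(self._tokens, token)
        return self._owner[self._tokens[i % len(self._tokens)]]

    def owner_of_key(self, key: str) -> str:
        return self.owner_of_token(murmur3_32(key.encode()) % self._space)


ring = TokenRing({"a1": 100, "a2": 200, "a3": 300}, token_space=301)
print(ring.owner_of_key("session-id-1"))  # always the same node for the same session id
```

Because only the ranges adjacent to a new token move when a node joins, consistent hashing limits how many sessions change owners, which is why it suits a ring that grows by adding hosts.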
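Slide 9's replication model can be pictured this way: a write lands on the node that owns the key's token range in the local data center, and the same write is copied to the node holding that token range in every other data center, so any data center can serve reads and writes (active/active). The Python sketch below models this in memory under stated assumptions: dicts stand in for Redis nodes, every data center has the same number of nodes, the copy is synchronous for simplicity, `zlib.crc32` stands in for Murmur hash, and the class names are hypothetical. It does not use any Dynomite API.

```python
import zlib
from typing import Dict, List, Optional


class DataCenter:
    """A data center (a region, an AZ, or an on-prem grouping) holding one ring of nodes."""

    def __init__(self, name: str, n_nodes: int) -> None:
        self.name = name
        # Node i in every data center owns the same token range; dicts stand in for Redis.
        self.nodes: List[Dict[str, str]] = [{} for _ in range(n_nodes)]


class ReplicatedSessionStore:
    """Active/active sketch: accept a write in any DC, copy it to the peer nodes elsewhere."""

    def __init__(self, dcs: List[DataCenter]) -> None:
        self.dcs = {dc.name: dc for dc in dcs}

    @staticmethod
    def node_index(key: str, n_nodes: int) -> int:
        # zlib.crc32 is a stand-in for the Murmur-based token lookup.
        return zlib.crc32(key.encode()) % n_nodes

    def write(self, local_dc: str, key: str, value: str) -> None:
        local = self.dcs[local_dc]
        idx = self.node_index(key, len(local.nodes))
        local.nodes[idx][key] = value                 # write accepted locally
        for name, dc in self.dcs.items():             # peer-to-peer replication
            if name != local_dc:
                dc.nodes[idx][key] = value            # same token range in each remote DC

    def read(self, dc_name: str, key: str) -> Optional[str]:
        dc = self.dcs[dc_name]
        return dc.nodes[self.node_index(key, len(dc.nodes))].get(key)


sessions = ReplicatedSessionStore([DataCenter(name, 3) for name in ("a", "b", "c")])
sessions.write("a", "session-id-1", "cart=3")   # accepted in data center a
print(sessions.read("c", "session-id-1"))      # cart=3, served by data center c
```

Unlike the legacy master/slave layout, no node here is a passive standby: every replica holds live data and can take traffic.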
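Slide 10's rollout steps (an offline copy taken before T1, double writes from T1, double reads, and data sanity checks) can be sketched as a wrapper that keeps the legacy store as the source of truth while shadowing every operation to the new store and counting disagreements. This is a minimal Python sketch under assumptions: `MigratingSessionStore` is a hypothetical name, both stores are modeled as mutable mappings (plain dicts here), and the backfill never overwrites a value written after T1.

```python
from typing import Mapping, MutableMapping, Optional


class MigratingSessionStore:
    """Dual-write/dual-read wrapper: the legacy store stays authoritative until cut-over."""

    def __init__(self, legacy: MutableMapping[str, str], new: MutableMapping[str, str]) -> None:
        self.legacy = legacy      # source of truth during the migration
        self.new = new            # store being validated
        self.reads = 0
        self.mismatches = 0

    def backfill(self, snapshot: Mapping[str, str]) -> None:
        # Offline copy taken before T1 (e.g. from a replica); keep newer double-written values.
        for key, value in snapshot.items():
            self.new.setdefault(key, value)

    def set(self, key: str, value: str) -> None:
        # Double write: from T1 on, every write goes to both stores.
        self.legacy[key] = value
        self.new[key] = value

    def get(self, key: str) -> Optional[str]:
        # Double read: serve the legacy value and compare it with the new store.
        expected = self.legacy.get(key)
        self.reads += 1
        if self.new.get(key) != expected:
            self.mismatches += 1
        return expected

    def mismatch_rate(self) -> float:
        return self.mismatches / self.reads if self.reads else 0.0


legacy = {"s1": "cart=1", "s2": "cart=2"}
snapshot = dict(legacy)                        # copied offline before T1
migration = MigratingSessionStore(legacy, {})
migration.set("s2", "cart=5")                  # double write after T1
migration.backfill(snapshot)                   # s2 keeps the newer value cart=5
migration.get("s1")
migration.get("s2")
print(migration.mismatch_rate())               # 0.0
```

A mismatch rate that stays at zero over real traffic is the kind of sanity signal that would justify flipping reads to the new store.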

Editor's Notes

  1. Lead engineer for the e-commerce platform team at Zulily. I will talk about how we at Zulily used Redis to build a highly available, persistent session management service, how Zulily's business model created its specific technical challenges, and the role of session management.
  2. Zulily's business model is all about a discovery-driven experience: our customers come to the site/apps to discover and enjoy, like going to a mall or a boutique. Zulily launches a new store every day, which technically means launching millions of personalized versions of the site/app daily. That translates into specific technical challenges: • Traffic is spiky, so waiting for the cache to warm up is not an option. • Speed is critical. • The customer session flow is critical for a smooth discovery experience, and the session is read on every single request.
  3. A reliable global session service is critical: if it goes down, you can't serve customers, because every single request to the apps or site requires a session. Infrastructure is volatile, so we need persistence. Speed is critical. As engineers, the main fact we believe in is "Everything fails, all the time": bad code pushes, hardware failures, network latency, and region/AZ outages. That brings us to the reason we're here today: to discuss how we at Zulily evolved our infrastructure into a more distributed system with the help of Redis, to create a more reliable experience. In retail, a session service is critical, especially if your footprint is global. But we all know this familiar quote from Werner Vogels. Failure is bound to happen; our job as engineers is to plan for failure and to ensure that, no matter what, we can serve the customer. Session management service sharded across multiple AZs: one AZ outage affects a percentage of customers, and business impact == $$$.
  4. Typical architecture: • Client layer (apps and site cluster): Twemproxy played the role of proxy and connection pool and was deployed on every client machine. • Application layer: sharding logic was coupled with the application layer, and the session service shared the same resources as other applications. • Customer sessions lived in Redis as permanent storage, with slave nodes as backups. Problems: (1) Not HA: losing hardware or a network partition leads to an outage, and network latency leads to a degraded experience. (2) Sharding was coupled to the application, which limited scaling and Zulily's global expansion; we want our data close to our customers. Losing an AWS AZ caused us a major outage and a degraded experience. Because session data is used for every request to the Zulily app, this was not acceptable.
  6. Client: replaced Twemproxy with a custom proxy that acts as a TCP connection pool, so clients no longer connect directly to Redis. Server: used consistent hashing and abstracted the sharding logic and geo-location detection into a new service that scales horizontally. Storage: used Redis as the storage layer, distributed across multiple regions in a ring topology for consistent hashing, and used Dynomite for replication across regions/data centers. Now I will deep dive into every layer: the client, server, and data layers. What did we need? • Highly available, geo-distributed, and scalable • Tolerates hardware/partition failures and network degradation • Seamless customer experience • 1,000,000s of requests
  7. Connection pooling on every node in the app and site clusters: • Avoids the overhead of establishing a new TCP connection • Collects metrics (service mesh / Envoy proxy) • Leverages existing connections • Constrains the total number of open connections against the load balancer.
  8. We got rid of the master/slave approach and used P2P, with Dynomite, a Netflix open-source project, for replication across regions. A data center is just a virtual grouping: a region, an AZ, or even on-premises hardware. Read request life cycle: consistent hashing is done by the service layer, which routes the request to the node in the ring that has the data (A1, A2, or A3).
  10. [FA] The actual graph isn't important other than to show that latency remained flat; maybe add vertical lines to show when the network was killed...
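Note 8's read request life cycle (the service layer hashes the session id and routes to the ring node that has the data, A1, A2, or A3), combined with slide 8's rerouting after failed functional or latency health checks, can be sketched as a single routing decision. This Python sketch rests on assumptions: `NodeHealth`, `route_read`, the 50 ms latency budget, and the `dc_order` preference list (standing in for the geo-based choice of the nearest data center) are all hypothetical, and `zlib.crc32` stands in for Murmur hash.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List

LATENCY_BUDGET_MS = 50.0   # hypothetical threshold for the latency health check


@dataclass
class NodeHealth:
    healthy: bool          # result of the last functional health check
    latency_ms: float      # recently measured latency to the node


def route_read(session_id: str, dc_order: List[str], n_nodes: int,
               health: Dict[str, NodeHealth]) -> str:
    """Return the first healthy replica of the session's token range, nearest DC first."""
    # Owning ring position, shared by every DC (e.g. position 2 -> "a2", "b2", "c2").
    position = zlib.crc32(session_id.encode()) % n_nodes + 1
    for dc in dc_order:                      # local DC first, then fallback DCs
        node = f"{dc}{position}"
        h = health.get(node)
        if h is not None and h.healthy and h.latency_ms <= LATENCY_BUDGET_MS:
            return node
    raise RuntimeError(f"no healthy replica for {session_id!r}")


ok = NodeHealth(healthy=True, latency_ms=5.0)
health = {f"{dc}{i}": ok for dc in "abc" for i in (1, 2, 3)}
print(route_read("session-id-1", ["a", "b", "c"], 3, health))  # a node in data center a
```

Rerouting to the same ring position in another data center only works because active/active replication keeps a copy of each token range in every data center.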