RESILIENT SYSTEM DESIGN
June 2013
Risk & Compliance Engineering, PayPal
Pradeep Ballal
Staale Nerboe
Greg Berry
This deck contains generic architecture information, and does not
reflect the exact details of current or planned systems.
PROBLEM DEFINITION AND SOLUTION
Problem
In a distributed, virtualized environment, system failures are inevitable.
Solution
Isolate functionality to enable independent implementation of appropriate availability
patterns and increase velocity/flexibility of fixes.
Use asynchronous reconciliation to resolve failures without affecting overall customer
experience.
Confidential and Proprietary
HIGH LEVEL ARCHITECTURE
[Diagram: Clients, Request, Service Container, Orchestration/Response Consolidation, Component Container, Functional Component, Dependency, Circuit Breakers, PPaaS.]
Functional Component (FC): Isolated set
of functionality that can be developed,
deployed and executed independently.
• Fits well into the Agile Development
methodology
• Fallback behavior defined
Service Container (SC): Contains
infrastructure to orchestrate FCs,
consolidate responses, and
initiate reconciliation on failure.
• Component-based model (e.g. OSGi),
including support for hot deploy of FCs
without service downtime
• Malfunctioning FCs show up quickly and
can be handled dynamically through properties
or real-time deployments
• Provides a meaningful response back to clients
SERVICE CONTAINER
Service Container (SC): Contains infrastructure to orchestrate FCs, consolidate responses,
and initiate reconciliation on failure.
• Built on top of PayPal Platform as a Service (PPaaS)
• Component-based model (e.g. OSGi), including support for hot deploy of FCs without
service downtime
• Enforces the concept of coarse-grained services
• Malfunctioning FCs show up quickly and can be handled dynamically through properties or
real-time deployments
• Provides a meaningful response back to clients
• Non-intrusive on the clients
Functional Component (FC): Isolated set of functionality that can be developed,
deployed and executed independently.
• Fits well into the Agile Development methodology
• FCs can fail independently
• Fallback behavior defined
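
The relationship between the Service Container and its FCs can be sketched in a few lines of Python. This is an illustration only, not PayPal's implementation: the class names, the `enabled` flag (standing in for "handled dynamically by properties"), and the in-memory reconciliation queue are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]


@dataclass
class FunctionalComponent:
    """Hypothetical FC: a name, its normal behavior, and its fallback."""
    name: str
    handle: Callable[[Request], Any]
    fallback: Callable[[Request], Any]
    enabled: bool = True  # assumed runtime switch, e.g. driven by a property


@dataclass
class ServiceContainer:
    """Hypothetical SC: orchestrates FCs and consolidates one response."""
    components: List[FunctionalComponent] = field(default_factory=list)
    reconciliation_queue: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, request: Request) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        degraded: List[str] = []
        for fc in self.components:
            try:
                if not fc.enabled:
                    raise RuntimeError(f"{fc.name} is disabled")
                results[fc.name] = fc.handle(request)
            except Exception as exc:
                # One failing FC does not take the others down with it:
                # use its fallback and queue the failure for reconciliation.
                results[fc.name] = fc.fallback(request)
                degraded.append(fc.name)
                self.reconciliation_queue.append(
                    {"component": fc.name, "request": request, "error": str(exc)}
                )
        return {"results": results, "degraded": degraded}


def check_limits(request: Request) -> str:
    return "within-limits"


def screen_account(request: Request) -> str:
    raise TimeoutError("dependency timed out")


container = ServiceContainer([
    FunctionalComponent("limits", check_limits, lambda r: "unknown"),
    FunctionalComponent("screening", screen_account, lambda r: "pending-review"),
])
response = container.call({"account": "A-1"})
```

Here the client still receives a consolidated, meaningful response even though the `screening` component failed; the failure is recorded for later reconciliation rather than surfaced as an outage.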
FALLBACK
To create a resilient system each Functional Component and Dependency SHOULD fail
gracefully and have Fallback Behavior. This can be achieved by utilizing a framework
that enforces normalized behavior across the platform.
PS: Fallback Behavior should not be an
afterthought; it should be worked out
in the design in conjunction with
your business partners.
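
One way a framework could enforce normalized fallback behavior is a small wrapper that every call goes through. The sketch below is an assumption about what such a wrapper might look like; `with_fallbacks`, the example scoring functions, and the final error payload are hypothetical names chosen for illustration.

```python
import logging
from typing import Any, Callable, Sequence

logger = logging.getLogger("fallback")


def with_fallbacks(primary: Callable[..., Any],
                   fallbacks: Sequence[Callable[..., Any]]) -> Callable[..., Any]:
    """Return a callable that tries `primary`, then each fallback in order."""
    def guarded(*args: Any, **kwargs: Any) -> Any:
        last_error = None
        for attempt in (primary, *fallbacks):
            try:
                return attempt(*args, **kwargs)
            except Exception as exc:
                name = getattr(attempt, "__name__", repr(attempt))
                logger.warning("%s failed: %s", name, exc)
                last_error = exc
        # Every layer failed: log and return an error message to the client.
        logger.error("all behaviors failed: %s", last_error)
        return {"status": "error", "message": "service temporarily unavailable"}
    return guarded


def live_score(account: str) -> dict:
    raise ConnectionError("scoring service down")


def cached_score(account: str) -> dict:
    return {"status": "ok", "score": 50, "source": "cache"}


score = with_fallbacks(live_score, [cached_score])
```

Because the wrapper, not each component, decides what happens on failure, every FC and Dependency fails in the same predictable way.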
[Diagram: a Request passes through Circuit Breakers (Local / Global) and Logging / Monitoring to a Functional Component / Dependency; the outcome is Normal Behavior, or Fallback Behavior on FAILURE.]
CIRCUIT BREAKERS*
Circuit Breakers (CBs) serve these purposes:
• They protect clients from slow or broken FCs
• They protect services from demand in excess
of capacity
• Most importantly, they protect the
Business from malfunctioning code by
tracking negative actions (such as declined
payments) and shutting down the FC if
abnormal behavior is found
*Concept first discussed in the excellent book Release It! by Michael Nygard.
Example open source implementation by Netflix: https://github.com/Netflix/Hystrix/wiki
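
For readers who have not seen the pattern, a minimal single-threaded sketch of a circuit breaker follows. It is not the Hystrix API and not the deck's implementation; the class name, thresholds, and injectable `clock` are assumptions made so the example is easy to test.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Minimal closed / open / half-open breaker (illustrative only)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, action: Callable[[], Any], fallback: Callable[[], Any]) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the FC
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the breaker again
        return result
```

While the breaker is open, clients get the fallback immediately instead of waiting on a slow or broken FC, and the FC is spared further load until the trial call succeeds.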
CBs are named after their counterparts
in the physical world.
Local CBs: Track the health of services
Global CBs: Track negative behavior
that impacts the Business or the health of
the overall system
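
A Global CB differs from a Local CB in what it counts: business outcomes rather than technical failures. The sketch below tracks the share of negative actions (for example, declined payments) in a rolling time window. The class name, thresholds, and in-process `deque` are assumptions; a breaker that is truly global across containers would need shared storage instead.

```python
from collections import deque
from typing import Deque, Tuple


class RollingRateBreaker:
    """Trips when the share of negative outcomes in a time window is too high."""

    def __init__(self, window_seconds: float, max_negative_ratio: float,
                 min_samples: int = 20) -> None:
        self.window_seconds = window_seconds
        self.max_negative_ratio = max_negative_ratio
        self.min_samples = min_samples
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, negative?)

    def record(self, timestamp: float, negative: bool) -> None:
        self.events.append((timestamp, negative))

    def is_tripped(self, now: float) -> bool:
        # Drop events that have slid out of the rolling window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        total = len(self.events)
        if total < self.min_samples:
            return False  # too little traffic to judge
        negatives = sum(1 for _, negative in self.events if negative)
        return negatives / total > self.max_negative_ratio


breaker = RollingRateBreaker(window_seconds=300, max_negative_ratio=0.5, min_samples=4)
for second in range(4):
    # Three of four recent payments were declined.
    breaker.record(float(second), negative=(second != 0))
```

The `min_samples` guard is a design choice for the example: without it, a single decline on a quiet system would be a 100% negative rate and would shut the FC down.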
[Diagram: two Service Containers, each with Request, Orchestration/Response Consolidation, Circuit Breakers, Functional Component, and Dependency; plus Circuit Breakers (Global) and Circuit Breakers Config.]
DATA ACCESS – NEED MORE
Globally Distributed
• You can’t have a single system of record that contains all data
• Latency matters (you can’t go faster than the speed of light)
• There must be a way to partition data and processing
Always Available
• Everything needs to be redundant (or dispensable)
• Can’t have a single point of failure
Shares Nothing
• Systems must be able to run completely
independently
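
As a small illustration of partitioning data and processing, the sketch below routes a key to a partition with a stable hash. The deck does not say how partitioning is done; the function name, the partition names, and the choice of simple modulo hashing are assumptions for the example.

```python
import hashlib
from typing import List


def partition_for(key: str, partitions: List[str]) -> str:
    """Map a key to one partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]


REGIONS = ["region-a", "region-b", "region-c"]  # hypothetical partition names
owner = partition_for("account-42", REGIONS)
```

Modulo hashing remaps most keys whenever the number of partitions changes; schemes such as consistent hashing reduce that movement at the cost of more bookkeeping.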
[Diagram: Clients, Read Service, Life Cycle (CRUD) Service, SoR, Journal, Replay, Latency Bridge, Read Replicas.]
EVENTUALLY CONSISTENT*
CAP theorem: Of three properties of distributed data systems (data consistency, system
availability, and tolerance to network partition), only two can be achieved at any
given time.
To account for this, a reconciliation system is required to identify issues and try to
correct them automatically. Only as a last resort should a Manual Review be
conducted.
Design considerations:
• Limited DB table scanning: The system should not rely on heavy DB table scanning or heavy
queries. If required, this SHOULD be done in a data warehouse (DW) or on a Hadoop cluster and fed back
into the real-time system.
• Non-intrusive: It listens only to events from other systems and SHOULD NOT touch code in other
parts of the system (and hence does not need to get on their road map).
Types of reconciliation:
• Stateless: Depends only on the data in the request.
• Stateful: Depends on business processes and their states when the failure occurred. Hence the
point at which the system failed may affect the outcome of the reconciliation.
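
The two types of reconciliation can be contrasted in a short sketch. Everything here is hypothetical: the event fields (`id`, `request`, `state_at_failure`), the `retry` callback, and the policy of escalating to manual review when the business process has moved on are assumptions for illustration, not the deck's design.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
Retry = Callable[[Dict[str, Any]], Any]


def reconcile_stateless(event: Event, retry: Retry) -> Any:
    """Stateless: the data in the failed request is enough to redo the work."""
    return retry(event["request"])


def reconcile_stateful(event: Event, current_state: Optional[str], retry: Retry) -> Any:
    """Stateful: the outcome depends on the process state at failure time."""
    if current_state != event["state_at_failure"]:
        return "manual-review"  # assumed policy: the process moved on, do not guess
    return retry(event["request"])


def run_reconciliation(events: List[Event], states: Dict[str, str],
                       retry: Retry) -> List[str]:
    """Try to fix each failure automatically; return ids needing manual review."""
    manual_review: List[str] = []
    for event in events:
        try:
            if "state_at_failure" in event:
                outcome = reconcile_stateful(event, states.get(event["id"]), retry)
            else:
                outcome = reconcile_stateless(event, retry)
        except Exception:
            outcome = "manual-review"  # automatic correction failed
        if outcome == "manual-review":
            manual_review.append(event["id"])
    return manual_review
```

The reconciler only consumes failure events and a retry hook, so it never needs changes in the systems that produced those events, and manual review stays the last resort.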
*See excellent paper “Eventually Consistent” by Werner Vogels, CTO Amazon
DETAILED DESIGN
[Diagram: Clients and two Service Containers (each with Request, Orchestration/Response Consolidation, Circuit Breakers, Functional Component, Dependency), Circuit Breakers (Global), Circuit Breakers Config, SoR, Events, Queue, Reconciliation & Actions, Reconcile, Reports (Manual).]
THANK YOU

Editor's Notes

  1. Mr. Pradeep Ballal works as a Senior Architect in the Core Service Product Development organization, with a specific focus on Compliance and Risk products, at PayPal Singapore. Mr. Ballal is a software generalist with 13 years of technology experience and has a special interest in decision management, business rules, enterprise software, and architectures. Mr. Staale Nerboe (snerboe@paypal.com) works as a Senior Architect in the Core Service Product Development organization at PayPal Singapore. Mr. Nerboe has 15+ years of Technology Consulting and Software Architecture experience for large global companies worldwide. Mr. Greg Berry (gberry@paypal.com) works as a Principal Architect at PayPal in the Core Services organization. Greg has been an architect in the payments industry for more than 15 years.
  2. In a complex system you will see multiple levels of fallback behavior, like an onion. A fallback behavior can itself have a fallback, e.g. as a last resort, only log and return an error message to the client.
  3. CBs can be implemented in various ways, including Complex Event Processing (CEP), a database, a global cache, or any other fast storage medium. The store needs to support fast reads and writes, and must also handle rolling windows, such as the last 5 minutes, 1 hour, or 24 hours. This gets complex in an environment where the volume of service invocations is high (e.g. with a large number of invocations or