© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How to Crash with Class
Guillaume Marchand
Solutions Architect, AWS France
AWS MEETUP PARIS
#EFFETCAPITAL
https://www.youtube.com/watch?v=BJoEGR47rk0
https://www.youtube.com/watch?v=MCpjEiemsRg
Quick measures
TO AVOID DISASTER
My health check strategy?
“NOBODY TOLD ME I NEEDED ONE…”
Liveness checks
Local health checks
Dependency health checks
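The three kinds of checks above can be sketched as follows. This is a minimal Python illustration, not code from the talk: the function names, the 100 MB free-space threshold, and the probe callables are hypothetical placeholders for your own stack.

```python
import shutil
import tempfile
from typing import Callable, Dict


def liveness() -> int:
    """Liveness check: the process is running and can answer at all."""
    return 200


def local_health(min_free_bytes: int = 100 * 1024 * 1024) -> int:
    """Local health check: verify resources owned by this server only.

    This sketch checks that the temp directory is writable and has enough
    free space (the threshold is a hypothetical default). A real check might
    also look at worker, monitoring and logging processes.
    """
    tmp = tempfile.gettempdir()
    try:
        with tempfile.NamedTemporaryFile(dir=tmp) as f:
            f.write(b"ok")
            f.flush()
        if shutil.disk_usage(tmp).free < min_free_bytes:
            return 503
    except OSError:
        return 503
    return 200


def dependency_health(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Dependency health check: report each dependency separately.

    Any exception raised by a probe counts as "unhealthy" for that dependency.
    """
    status: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            status[name] = bool(probe())
        except Exception:
            status[name] = False
    return status


if __name__ == "__main__":
    print(liveness(), local_health())
    print(dependency_health({"mysql": lambda: True, "redis": lambda: 1 / 0}))
```

One design choice worth weighing: wiring `dependency_health` into the load balancer's target health check means a shared database outage marks every server unhealthy at once, a trade-off discussed in the Builders' Library article on health checks listed in the resources.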
DNS
“IT’S ALWAYS DNS”
Traffic Flow
Amazon Route 53
DNS failover
DNS cache
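DNS failover with Route 53 relies on a pair of failover records tied to a health check, and clients only switch once their cached answer expires. The Python sketch below only builds the `ChangeBatch` dictionary accepted by the Route 53 `ChangeResourceRecordSets` API (for example through boto3's `change_resource_record_sets`); the domain, IP addresses and health check ID are hypothetical, and the worst-case estimate is a rough approximation that ignores resolvers or clients that cache longer than the TTL.

```python
from typing import Any, Dict, Optional


def failover_record(name: str, ip: str, role: str,
                    health_check_id: Optional[str] = None,
                    ttl: int = 60) -> Dict[str, Any]:
    """Build one Route 53 failover A record change (role: PRIMARY or SECONDARY)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),  # unique among records with this name and type
        "Failover": role,
        "TTL": ttl,  # a low TTL lets resolvers pick up the switch sooner
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def worst_case_failover_seconds(check_interval_s: int, failure_threshold: int,
                                ttl_s: int) -> int:
    """Rough upper bound: time to mark the primary unhealthy, plus cache expiry."""
    return check_interval_s * failure_threshold + ttl_s


# Hypothetical values for illustration only.
change_batch = {
    "Changes": [
        failover_record("www.example.com.", "192.0.2.10", "PRIMARY",
                        health_check_id="hypothetical-health-check-id"),
        failover_record("www.example.com.", "192.0.2.20", "SECONDARY"),
    ]
}

if __name__ == "__main__":
    print(worst_case_failover_seconds(30, 3, 60))
```

With a 30-second check interval, a failure threshold of 3 and a 60-second TTL, the estimate is about 150 seconds, which is why the DNS cache matters as much as the health check itself.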
The CDN
THE CDN IS LIFE, EVEN FOR DYNAMIC CONTENT
Amazon CloudFront
cache.monsite.com CNAME xxx.cloudfront.net
Default (*) cache behavior: Min TTL = 2 s
Q: Origin unavailable?
Q: Caching error pages?
Bonus: If-Modified-Since → HTTP 304
Feature: Origin Failover
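The If-Modified-Since bonus pays off when the origin can answer a revalidation with an empty 304 instead of rebuilding the page. A minimal origin-side sketch in Python, assuming the application knows when the content last changed; the function name is hypothetical, and `max_age_s=2` simply mirrors the 2-second Min TTL on the slide (CloudFront applies the distribution's Min TTL even when the origin's max-age is lower).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(if_modified_since: Optional[str],
                         last_modified: datetime,
                         max_age_s: int = 2) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET, honoring If-Modified-Since.

    last_modified must be an aware datetime using timezone.utc.
    """
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": f"public, max-age={max_age_s}",
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: ignore it and send the full body
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # the cached copy is still valid
    return 200, headers
```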
My site is the 500 error page
Amazon Simple Storage Service (S3)
Amazon CloudFront
Amazon API Gateway
AWS Lambda
Amazon DynamoDB
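One way to make the error page the site itself is to host a static page in Amazon S3 and have CloudFront return it whenever the origin fails. The Python sketch below only builds the `CustomErrorResponses` section of a CloudFront distribution configuration as a dictionary; the `/errors/5xx.html` path and the 10-second error caching TTL are hypothetical choices, and the path must be reachable through a cache behavior pointing at the S3 origin so that it still works when the application origin is down.

```python
from typing import Any, Dict, Iterable


def custom_error_responses(codes: Iterable[int],
                           page_path: str = "/errors/5xx.html",
                           error_caching_min_ttl_s: int = 10) -> Dict[str, Any]:
    """Build the CustomErrorResponses block of a CloudFront DistributionConfig."""
    items = [
        {
            "ErrorCode": code,
            "ResponsePagePath": page_path,  # static page served from the S3 origin
            "ResponseCode": str(code),  # keep the real status for clients and crawlers
            # How long CloudFront caches the error before asking the origin again:
            # a short value shields a struggling origin without hiding recovery.
            "ErrorCachingMinTTL": error_caching_min_ttl_s,
        }
        for code in codes
    ]
    return {"Quantity": len(items), "Items": items}
```

Keeping `ResponseCode` equal to the original error code is a deliberate choice: rewriting it to 200 would hide the outage from monitoring and from search engines.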
What about timeouts?
UHH… THEY’RE FINE…
Q: Which value to choose? p99.9
[Diagram: CDN → Application Load Balancer → API → ElastiCache for Redis and MySQL instance, with a “?” timeout to choose at each component]
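The p99.9 rule of thumb can be turned into numbers from latency measurements you already collect. A minimal Python sketch, assuming latency samples in milliseconds for each hop; the 1.2 headroom factor and the example values are hypothetical. `check_chain` assumes a simple linear path in which each caller should wait longer than the component it calls, so the inner component fails first and returns a meaningful error; an API that calls Redis and then MySQL sequentially would need a budget covering both calls, which this sketch does not model.

```python
import math
from fractions import Fraction
from typing import List, Sequence, Tuple


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest_timeout_ms(latencies_ms: Sequence[float], p: float = 99.9,
                       headroom: float = 1.2) -> float:
    """Timeout = observed p99.9 latency times a headroom factor (an assumption)."""
    return percentile(latencies_ms, p) * headroom


def check_chain(timeouts_ms: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs where the caller does not wait longer than the callee.

    timeouts_ms lists hops from the edge inward, e.g. the CDN first, the database last.
    """
    return [
        (outer, inner)
        for (outer, outer_ms), (inner, inner_ms) in zip(timeouts_ms, timeouts_ms[1:])
        if outer_ms <= inner_ms
    ]
```

For instance, with hypothetical timeouts of 30 s at the CDN and 60 s at the load balancer, `check_chain` flags the pair `("CDN", "ALB")`: the CDN would give up while the load balancer is still waiting.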
How did the load tests go?
WE DON’T RUN ANY; THEY’RE NOT REPRESENTATIVE OF REAL TRAFFIC
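Even an imperfect load test beats none. The sketch below is a deliberately small, in-process Python harness, not a substitute for a distributed tool such as Distributed Load Testing on AWS (listed in the resources): `target` is any hypothetical callable that issues one request to your system, and the output is the error rate plus nearest-rank p50 and p99 latencies. Threads in a single process cap the load you can generate, and ramp-up and think time are not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def load_test(target: Callable[[], object], requests: int,
              concurrency: int) -> Dict[str, float]:
    """Call target() `requests` times from `concurrency` threads and summarize."""
    if requests < 1 or concurrency < 1:
        raise ValueError("requests and concurrency must be >= 1")

    def one_call(_: int) -> Tuple[float, bool]:
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False  # any exception counts as a failed request
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(requests)))

    latencies = sorted(duration for duration, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    n = len(latencies)
    p99_rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), nearest-rank method
    return {
        "requests": float(n),
        "errors": float(errors),
        "error_rate": errors / n,
        "p50_s": latencies[(n + 1) // 2 - 1],
        "p99_s": latencies[p99_rank - 1],
    }
```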
“Max Connections”?
TO INFINITY AND BEYOND…
OUCH… SQL
Amazon RDS / Aurora
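The “to infinity and beyond” trap is that each new application server opens its own connection pool, so scaling out the web tier multiplies database connections. A back-of-the-envelope check in Python; every number in it is hypothetical, and the database's real `max_connections` value should be read from its parameter group or configuration.

```python
from typing import NamedTuple


class ConnectionBudget(NamedTuple):
    fits: bool
    needed: int
    available: int


def check_connection_budget(max_instances: int, workers_per_instance: int,
                            pool_size_per_worker: int, db_max_connections: int,
                            reserved: int = 10) -> ConnectionBudget:
    """Compare the worst-case connection count of a scaled-out fleet with the DB limit.

    `reserved` keeps headroom for admin, monitoring and replication sessions
    (the default of 10 is an arbitrary assumption).
    """
    needed = max_instances * workers_per_instance * pool_size_per_worker
    available = db_max_connections - reserved
    return ConnectionBudget(needed <= available, needed, available)
```

For example, `check_connection_budget(20, 8, 10, 1000)` returns `ConnectionBudget(fits=False, needed=1600, available=990)`: the fleet at full scale needs far more connections than the database accepts, which is the kind of gap a connection pooler such as RDS Proxy is meant to absorb.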
Test the failover
“A failure event results in a brief interruption, during which read and write operations fail with an exception. However, service is typically restored in less than 120 seconds, and often less than 60 seconds.”
Read Replica
Separate “insert” from “select” (PHP, Java)
Amazon RDS Proxy
“With RDS Proxy, failover times for Aurora and RDS databases are reduced by up to 66%”
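Separating inserts from selects and surviving the failover window can both be handled in the data-access layer. The Python sketch below is an illustration under assumptions, not the PHP or Java code the slide alludes to: the endpoint names are placeholders, routing uses a naive check on the first SQL keyword, and the retry uses capped exponential backoff with full jitter, as described in the Builders' Library article on timeouts and retries. Reads sent to a replica can lag behind recent writes, so read-after-write paths should stay on the writer.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical endpoints: Aurora exposes a cluster (writer) endpoint and a
# reader endpoint; replace these placeholders with your own.
WRITER_ENDPOINT = "writer.db.example.internal"
READER_ENDPOINT = "reader.db.example.internal"


def route(sql: str) -> str:
    """Send plain SELECTs to the reader endpoint, everything else to the writer."""
    words = sql.strip().split(None, 1)
    is_select = bool(words) and words[0].upper() == "SELECT"
    # SELECT ... FOR UPDATE takes locks, so it must run on the writer.
    if is_select and "FOR UPDATE" not in sql.upper():
        return READER_ENDPOINT
    return WRITER_ENDPOINT


def retry_during_failover(op: Callable[[], T], attempts: int = 6,
                          base_delay_s: float = 1.0, max_delay_s: float = 20.0,
                          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
                          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op() with capped exponential backoff and full jitter.

    retry_on is an assumption: real database drivers raise their own exception types.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With the defaults, the total sleep is at most 1 + 2 + 4 + 8 + 16 = 31 seconds, shorter than the up-to-120-second window quoted above, so `attempts` and `max_delay_s` should be tuned against your own failover tests.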
Conclusion
DO IT TODAY
1. Health checks
2. DNS
3. CDN
4. Error page
5. Timeouts
6. Load tests
7. Max connections
8. Databases
What next?
When I crash…
Mitigate the impact
NO
NO
NO
NO
NO
Health checks
AWS Well-Architected Framework > Operational Excellence > “OPS 8: How do you understand the health of your workload?”
Amazon Builders’ Library > “Implementing health checks”
Workshop “Health check and dependencies”
Timeouts
Amazon Builders’ Library > “Timeouts, retries and backoff with jitter”
“Resources consumed by idle PostgreSQL connections”
Incident management
AWS re:Invent 2020 session: Incident management in a distributed organization
AWS GameDay
Load testing
Distributed Load Testing on AWS
Resilience testing
AWS Fault Injection Simulator
Thank you!
Freepik smashicons
https://www.linkedin.com/in/guillaumemarchand/

How to Crash with Class During a Traffic Spike, a.k.a. #effetcapital


Editor's Notes

  1. Hello everyone, and thank you for having me at this meetup. Imagine that you arrive at work and see your boss being interviewed by TV journalists about the quality and originality of your company’s products, which are available on your website or in your app. The segment will air in 4 days on the 8 p.m. news of a major TV channel. It is free publicity and a great opportunity for your company. They will feature your website and your mobile app. Every viewer will connect to your application at the very same second. And it is almost certain that it will not be available.
  2. I was lucky enough to work for a large broadcasting group in the 2010s. I was always struck by the risk of success, even when that success was recurring. In the first weeks, I crashed miserably.
  3. You don’t need a surprise to crash. What is interesting about the opening ceremony of the Cannes Film Festival is that it happens on the same date every year. Even so, after several climbs up the red-carpet steps, I still fell flat on my face, because of external factors: marketing plans, advertising deals.
  4. Our customers have raised this topic several times.
  5. And so has Sébastien. I’d like to start from the assumption that you already have an existing system and that it runs on premises. The idea is to put quick wins in place so that the crash is not catastrophic. You have to own your crash.
  6. No architecture changes, no development, easy to set up, things you can put in place on your own. If I had applied these quick wins to my applications, I could have avoided disasters. And you will be able to watch the 8 p.m. news calmly from your couch.
  7. I like asking my customers this question during architecture reviews. Suddenly there is silence and the video freezes. You would think it was an Amazon Chime incident, but not at all. Health checks are used to detect and deal with single-server failures. An error page is often faster to render than a real page: if the health check is misconfigured, the load balancer may keep sending traffic to the failing server precisely because it answers faster. A server can also appear to fail because of one of its dependencies, which is clearly a false positive. Liveness checks: HTTP 200. Local health checks: verify that the application works locally (read/write disk, processes, application process, support processes such as monitoring and logging). Dependency health checks: a common pattern is a Read API that queries a database but caches responses locally for some time. If the database is down, the service can still serve cached reads until the database is back online. https://aws.amazon.com/builders-library/implementing-health-checks/ Have you checked the health check configuration of your DNS, your load balancer, your Auto Scaling group, and your virtual instances? Icons made by Freepik (https://www.flaticon.com/authors/freepik) from Flaticon (https://www.flaticon.com/).
  8. Traffic Flow with several rules, including failover and latency-based routing. What is your failover? An error page.
  9. https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#ResponseCustomOriginUnavailable When the origin is unavailable, CloudFront either serves the expired version of the object or serves a custom error page. https://docs.aws.amazon.com/fr_fr/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html#HTTPStatusCodes-custom-error-pages https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html What is your failover?
  10. When a client is waiting longer than usual for a request to complete, it also holds on to the resources it was using for that request for a longer time. When a number of requests hold on to resources for a long time, the server can run out of those resources. Have you properly set the application timeouts toward your databases, your partner services, and your other dependencies? Check the load balancers and the OS as well. A good practice for choosing a timeout for calls within an AWS Region is to start with the latency metrics of the downstream service. So at Amazon, when we make one service call another service, we choose an acceptable rate of false timeouts (such as 0.1%). Then, we look at the corresponding latency percentile on the downstream service (p99.9 in this example).
  11. c5.24xlarge
  12. c5.24xlarge, 192 GB of RAM
  13. 19 Gbps of EBS bandwidth
  14. When a single server fails, that's not a problem, but in a traffic surge to the service, the last thing we want is to shrink the size of the service. Taking servers out of service during an overload can cause a downward spiral. Forcing the remaining servers to take even more traffic makes them more likely to become overloaded, also fail a health check, and shrink the fleet even more. The problem is not that overloaded servers return errors when they're overloaded. It's that servers don't respond to the load balancer ping request in time. After all, load balancer health checks are configured with timeouts, just like any other remote service call. Fortunately, there are some straightforward configuration best practices that we follow to help prevent this kind of downward spiral. Tools like iptables, and even some load balancers, support the notion of “max connections.” In this case, the OS (or load balancer) limits the number of connections to the server so that the server process is not flooded with concurrent requests that would have slowed it down. https://helecloud.com/blog/handling-hundreds-of-thousands-of-concurrent-http-connections-on-aws/
  16. https://www.php.net/manual/en/mysqlnd-ms.rwsplit.php https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing Q: What happens during Multi-AZ failover and how long does it take? Failover is automatically handled by Amazon RDS so that you can resume database operations as quickly as possible without administrative intervention. When failing over, Amazon RDS simply flips the canonical name record (CNAME) for your DB instance to point at the standby, which is in turn promoted to become the new primary. We encourage you to follow best practices and implement database connection retry at the application layer. Failovers, as defined by the interval between the detection of the failure on the primary and the resumption of transactions on the standby, typically complete within one to two minutes. Failover time can also be affected by whether large uncommitted transactions must be recovered; the use of adequately large instance types is recommended with Multi-AZ for best results. AWS also recommends the use of Provisioned IOPS with Multi-AZ instances, for fast, predictable, and consistent throughput performance. --- https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html
  18. In the long run, there is another way to organize the handling of these production incidents.
  19. Democratize the use of chaos engineering.
  20. Thank you for listening. You now know how to crash your applications while staying on your couch watching the TV news.
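Note 7 distinguishes liveness, local and dependency health checks, and describes a Read API that keeps serving cached reads while its database is down. The sketch below illustrates those three ideas in plain Python; it is not the Builders’ Library code. `fetch` is a hypothetical callable that queries the database and raises when the database is unreachable, and the 30-second TTL is an arbitrary example.

```python
import tempfile
import time


def liveness():
    """Liveness check: the process is up and able to answer (HTTP 200)."""
    return 200


def local_health(directory):
    """Local health check: can the process still write to its own disk?"""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return 200
    except OSError:
        return 503


class CachedReader:
    """Read API that caches DB responses and serves stale data if the DB fails."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical DB query function
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # key -> (value, time it was fetched)

    def get(self, key):
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]       # fresh cache hit, no DB call
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[0]   # DB down: serve the stale copy
            raise                  # nothing cached: the read really fails
        self._cache[key] = (value, now)
        return value
```

Because the reader can still answer while the database is down, a dependency failure does not have to make this server fail its health check and be pulled out of the fleet.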
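Note 8 asks what your DNS failover is. With Route 53 failover routing, one record set is marked `PRIMARY` and attached to a health check, and another is marked `SECONDARY`. The sketch below only builds the `ChangeBatch` dictionary in the shape accepted by the boto3 `route53` client’s `change_resource_record_sets`; the function name is hypothetical, and the domain, documentation-range IPs and health check ID are placeholders.

```python
def failover_change_batch(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with a PRIMARY/SECONDARY failover pair."""

    def record(role, ip, check_id=None):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": role.lower(),  # must be unique per record set
            "Failover": role,               # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                     # resolvers cache the answer this long
            "ResourceRecords": [{"Value": ip}],
        }
        if check_id:
            rrset["HealthCheckId"] = check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {"Changes": [record("PRIMARY", primary_ip, health_check_id),
                        record("SECONDARY", secondary_ip)]}


batch = failover_change_batch("www.example.com.", "192.0.2.10",
                              "198.51.100.20", "hypothetical-health-check-id")
```

You would then apply it with `boto3.client("route53").change_resource_record_sets(HostedZoneId=<your zone ID>, ChangeBatch=batch)`. If the secondary is a static error page hosted on S3 or CloudFront, that record needs an `AliasTarget` instead of `TTL` and `ResourceRecords`. Keep the TTL short: DNS caches delay the switch by up to that long.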
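Note 9 links to CloudFront origin failover and custom error pages. As a sketch, assuming a distribution whose origins have the hypothetical IDs `primary-alb` and `s3-static`, these are the two `DistributionConfig` fragments involved, in the dictionary shape used by boto3: an origin group that fails over to the static origin on 5xx responses, and a custom error response that serves `/error.html` and sets how long CloudFront caches the error (`ErrorCachingMinTTL`, in seconds). The group ID and default values are illustrative assumptions.

```python
def origin_failover_fragments(primary_id, fallback_id, error_page="/error.html",
                              error_code=500, error_ttl=10):
    """Return (OriginGroups, CustomErrorResponses) DistributionConfig fragments."""
    failover_codes = [500, 502, 503, 504]  # origin responses that trigger failover
    origin_groups = {
        "Quantity": 1,
        "Items": [{
            "Id": "app-with-fallback",  # hypothetical origin group ID
            "FailoverCriteria": {"StatusCodes": {"Quantity": len(failover_codes),
                                                 "Items": failover_codes}},
            "Members": {"Quantity": 2,
                        "Items": [{"OriginId": primary_id},
                                  {"OriginId": fallback_id}]},
        }],
    }
    custom_errors = {
        "Quantity": 1,
        "Items": [{
            "ErrorCode": error_code,
            "ResponsePagePath": error_page,
            "ResponseCode": str(error_code),  # boto3 expects a string here
            "ErrorCachingMinTTL": error_ttl,  # seconds the error stays cached
        }],
    }
    return origin_groups, custom_errors
```

The cache behavior’s `TargetOriginId` must then reference the origin group ID, and CloudFront only fails over to the secondary origin for GET, HEAD and OPTIONS requests.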
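Note 10 picks a timeout by first choosing an acceptable rate of false timeouts (0.1%) and then reading the matching latency percentile (p99.9) of the downstream service. A minimal sketch of that rule, assuming you already have latency samples in milliseconds measured under representative load; the function name is hypothetical and the percentile is computed with the nearest-rank method.

```python
import math


def timeout_from_latency(samples_ms, false_timeout_rate=0.001):
    """Return the latency that only `false_timeout_rate` of samples exceed.

    With the default 0.1% this is the nearest-rank p99.9 of the observed
    downstream latencies.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Number of healthy calls we accept to see time out (at most n - 1).
    allowed_timeouts = min(math.floor(false_timeout_rate * n), n - 1)
    # Nearest-rank percentile: the value at 1-based rank n - allowed_timeouts.
    return ordered[n - allowed_timeouts - 1]
```

For example, with 1,000 samples of 1 ms to 1,000 ms, one call in a thousand may time out, so the timeout is the 999th smallest value, 999 ms. Computing the rank from the count of allowed timeouts, rather than from `0.999 * n`, avoids floating-point rounding pushing the rank one step too high.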
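Note 14 relies on a “max connections” limit enforced by the OS (iptables) or the load balancer, so that the server process is never flooded with more concurrent requests than it can serve. The sketch below is an in-process analogue of that idea, not the iptables mechanism itself: a bounded semaphore admits at most `max_connections` requests at once and rejects the rest immediately with a 503 instead of queueing them. The class and function names are hypothetical.

```python
import threading


class ConnectionLimiter:
    """Admit at most `max_connections` concurrent requests; reject the rest."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self):
        # Non-blocking: returns False at once when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def handle(limiter, work):
    """Run `work()` if a slot is free, else answer 503 without doing the work."""
    if not limiter.try_acquire():
        return 503, "Service Unavailable"
    try:
        return 200, work()
    finally:
        limiter.release()
```

Rejecting quickly keeps the requests that were admitted fast, instead of letting every request slow down until the load balancer’s health check times out and the fleet starts shrinking.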