Cloud Architecture &
Distributed Systems
Trivia
Dr. Michael Menzel
AQA Session @ Dev Team Meeting
Agenda
1. Distribute & Scale
2. Stabilize & Prevent Failure
3. Deployment
4. Failure in Production
5. Scaling the Persistence Layer
Distribute & Scale
“Distribution and Elasticity are king.”
Load Balancers
• Assume balancing over heterogeneous hardware
• Shared hardware with virtualization
• Different load on machines (long requests)
• Vertical scaling
• Don’t keep state! Stay as stateless as possible
• Incorporate health checks and feedback channels
• Allow “Lame Ducks” (= healthy but busy)
• Reserve time to boot (commission/decommission)
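A minimal sketch of how a balancer might act on the health and "lame duck" feedback listed above. The `Backend` type, its fields, and the least-connections policy are assumptions made for illustration; the slides do not prescribe a specific algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Backend:
    name: str
    healthy: bool          # result of the last health check
    lame_duck: bool        # healthy but busy: prefers not to get new requests
    active_requests: int   # feedback channel: requests currently in flight


def pick_backend(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the least-loaded backend that is healthy and not a lame duck."""
    candidates = [b for b in backends if b.healthy and not b.lame_duck]
    if not candidates:
        # Fall back to lame ducks rather than failing the request outright.
        candidates = [b for b in backends if b.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.active_requests)
```

Because the choice depends on in-flight requests rather than on a fixed rotation, machines that are slower or stuck on long requests receive less new work.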
Health Checks & Monitoring
• Web services typically offer /health or /ping
• Test inwards to give a more precise health score (lame duck)
• Don’t make the health check too expensive, to avoid extra load
• Use monitoring a lot to detect trends and history
• Monitor basics: CPU, Mem, etc.
• Add application-level monitoring (queued requests, etc.)
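A rough sketch of a /health handler that tests inwards and reports a lame-duck state. The `Probe` fields, the queue limit, and the response shape are hypothetical; a real service would fill them from its own dependency checks and metrics.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Probe:
    database_ok: bool      # "test inwards": can we reach our dependencies?
    queued_requests: int   # application-level metric
    queue_limit: int       # assumed threshold above which we call ourselves busy


def health_status(probe: Probe) -> Tuple[int, Dict[str, object]]:
    """Return an (HTTP status code, JSON-like body) pair for a /health endpoint."""
    if not probe.database_ok:
        return 503, {"status": "unhealthy", "reason": "database unreachable"}
    if probe.queued_requests >= probe.queue_limit:
        # Lame duck: alive, but the balancer should send new work elsewhere.
        return 200, {"status": "lame_duck", "queued": probe.queued_requests}
    return 200, {"status": "healthy", "queued": probe.queued_requests}
```

The check only reads values the service already tracks, so it stays cheap enough to be polled frequently.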
Auto Scaling
• Start with capacity planning to skip initial scaling delay
• Benchmark to find the scarce resource of your application
• Monitor ftw & apply rules
• Custom metrics are better than generic
• Test behavior to learn about metrics
• Predict resource requirements (future)
Auto Scaling ctd.
• For best elasticity prepare your VM/docker images to boot quickly
• Test and measure your elasticity!!!
• Stress testing: bursts, volatility
• Performance testing: grow, shrink
• Chaos testing
• Test with “Huge Scales”
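One way to picture a scaling rule driven by a custom metric, as the Auto Scaling slides recommend. The metric (queued requests), the per-instance target, and the bounds are assumptions chosen for the sketch, not values from the talk.

```python
import math


def desired_instances(queued_requests: int,
                      target_queue_per_instance: int,
                      min_instances: int,
                      max_instances: int) -> int:
    """Size the fleet so each instance holds roughly the target queue length."""
    if target_queue_per_instance <= 0:
        raise ValueError("target_queue_per_instance must be positive")
    needed = math.ceil(queued_requests / target_queue_per_instance)
    # Clamp to the planned capacity range so a bad metric cannot scale to 0 or infinity.
    return max(min_instances, min(max_instances, needed))
```

For example, `desired_instances(250, 50, 2, 20)` asks for 5 instances, while an empty queue still keeps the minimum of 2 running to skip the initial scaling delay.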
Stabilize & Prevent Failure
“Expect failures at all loads. Prevent failures before one cascades.”
Degrade Performance
• Introduce grades for important users (if possible)
• Know whose request is processed
• Process only important users on peak loads
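The grading idea above could be sketched as a simple admission check. The grade names and the load thresholds (0.7 and 0.9) are invented for illustration.

```python
from enum import IntEnum


class Grade(IntEnum):
    FREE = 0
    STANDARD = 1
    PREMIUM = 2


def admit(grade: Grade, load: float) -> bool:
    """Decide whether to process a request, given current load in [0, 1]."""
    if load < 0.7:
        return True                      # normal operation: serve everyone
    if load < 0.9:
        return grade >= Grade.STANDARD   # elevated load: drop the lowest grade
    return grade >= Grade.PREMIUM        # peak load: important users only
```

This only works if the grade is known at the entry point, which is why "know whose request is processed" comes first.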
Request Time Thresholds
• Long lasting requests are expensive, example:
“30 sec threshold, 1000 QPS at full load:
5% of requests take ≥ 30 sec, so after 20 sec (at the latest) you are blocked”
• Define thresholds and propagate sub-thresholds
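Example
import scala.concurrent.duration._
// Whichever future completes first wins: the 30 s overall threshold
// or the downstream call with its 10 s sub-threshold.
Future.firstCompletedOf(Seq(
  Promise.timeout(InternalServerError("Oops"), 30.seconds),
  Webservice.call("/fibonacci/next", 10.seconds).map(Ok)
))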
(Diagram: Web Service A, Web Service B, Web Service C)
Request Time Thresholds!
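A sketch of propagating sub-thresholds along a chain of services, here with Python's asyncio. The helper `call_with_deadline`, the stand-in `slow_service`, and all durations are assumptions for the example; they are shortened from the slide's 30 s and 10 s so the sketch runs quickly.

```python
import asyncio
import time


async def call_with_deadline(coro_fn, deadline: float, hop_budget: float):
    """Run coro_fn(sub_deadline) under min(remaining overall time, hop budget)."""
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise asyncio.TimeoutError("deadline already exceeded")
    timeout = min(remaining, hop_budget)
    # The callee receives its own absolute deadline so it can propagate it further.
    return await asyncio.wait_for(coro_fn(time.monotonic() + timeout), timeout)


async def slow_service(sub_deadline: float) -> str:
    await asyncio.sleep(0.2)   # stand-in for a slow downstream web service
    return "result"


async def handler() -> str:
    deadline = time.monotonic() + 0.5   # overall threshold for this request
    try:
        return await call_with_deadline(slow_service, deadline, hop_budget=0.05)
    except asyncio.TimeoutError:
        return "timeout"
```

Each hop gets at most what is left of the overall budget, so a slow service deep in the chain cannot hold the caller longer than the caller's own threshold.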
Anti-Overload: Circuit Breakers & Back-off!
• Back off when web service endpoint does not respond (in time)
• Exponential back-off is well known, but not the best!
• Back-off with random jitter is better! 1)
• Use circuit breakers (e.g. https://github.com/Netflix/Hystrix)
1) Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
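To illustrate the circuit-breaker bullet, here is a deliberately small sketch. It is not Hystrix's API; the class, its parameters, and the "open after N consecutive failures" rule are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; retry after `reset_after` s."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()          # open: fail fast, spare the endpoint
            # Half-open: the wait is over, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()   # (re)open the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None              # success closes the circuit
        return result
```

While the circuit is open the struggling endpoint receives no traffic at all from this caller, which gives it room to recover instead of being hit by retries.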
With jitter: sleep = random_between(0, min(cap, base * 2 ** attempt))
Plain exponential: sleep = min(cap, base * 2 ** attempt)
Random Jitter Back Off
Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
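The two sleep formulas on this slide translate almost directly into Python. The function names and the optional `rng` parameter are additions for this sketch; `base` and `cap` are in seconds here, which the slide does not specify.

```python
import random


def exponential_backoff(attempt: int, base: float, cap: float) -> float:
    """Plain capped exponential back-off: every client waits the same time."""
    return min(cap, base * 2 ** attempt)


def full_jitter_backoff(attempt: int, base: float, cap: float, rng=None) -> float:
    """Capped exponential back-off with jitter: a random wait in [0, limit]."""
    rng = rng or random
    return rng.uniform(0, exponential_backoff(attempt, base, cap))
```

Without jitter, clients that failed together retry together; the random draw spreads their retries over the whole interval.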
Deployment
“Prevent toil and remain stable!”
Package Deployments
• Prepare a full VM/docker image (if possible)
• VMs bring their own operating system and only need a virtualization stack
• Docker containers need a Docker environment but boot more quickly
• Keep old versions for rollbacks and tests/comparisons
• If you don’t package:
• Ensure you deploy into a reset environment (mem usage, temp files, etc.)
• Ensure you use a bundle that contains all dependencies (Java? Node?)
• Coordinate thoroughly to not interfere with other deployments
Maintain multiple environments
• “The more the merrier”, but costly – find your trade-off!
• Allow many testing environments for different types of tests
• Stress & performance tests
• Integration & regression tests
• Chaos testing & Demos
• Automate the creation of new environments
Canary Deployments
• Canaries allow you to monitor new software versions
• Keep track of which servers have which version
• In monitoring
• In logging
• Activate extra logging and notifications for the canaries
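A possible way to track versions in logging and turn on extra logging for canaries, using Python's standard `logging` module. The `VersionFilter` class, the `make_logger` helper, and the version strings are hypothetical.

```python
import logging


class VersionFilter(logging.Filter):
    """Attach the deployed version and canary flag to every log record."""

    def __init__(self, version: str, canary: bool):
        super().__init__()
        self.version = version
        self.canary = canary

    def filter(self, record: logging.LogRecord) -> bool:
        record.version = self.version
        record.canary = self.canary
        return True   # never drop records, only annotate them


def make_logger(name: str, version: str, canary: bool) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.addFilter(VersionFilter(version, canary))
    # Canaries log more verbosely so regressions show up early.
    logger.setLevel(logging.DEBUG if canary else logging.INFO)
    return logger
```

A handler format such as `"%(asctime)s %(version)s canary=%(canary)s %(message)s"` then makes it possible to split logs and dashboards by version.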
Load Balancers during Deployment
• Two strategies
1. Same load balancer: add new instances to existing load balancer
2. Extra load balancer: add whole new load balancer and move over eventually
• Same load balancer tips
• Add instance when ready for health checks
• Tag new instances to differentiate versions
• Extra load balancer tips
• Make sure all settings are identical (infrastructure as code!)
• First run both load balancers in parallel, then switch (use DNS or other LB)
Failure in Production
“Goal is to make your pager obsolete.”
Anything can happen!
Countermeasures for Failures
• Install an immediate response channel (pager, SMS)
• Stop the bleeding first! – Symptoms before cause
• Avoid looking for the cause, but prevent further failures
• Shut down parts of the system if necessary
• Declare a coordinator
Document Failures & Solutions
• Document every step and progress of failure resolutions
• Define protocol templates to reduce overhead
• Analyze and replay old protocols
• Write regression tests with your solution
• Tests make sure old bugs don’t sneak back in
• You documented the symptoms of the bug in code
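As an illustration of documenting a bug's symptoms in code, consider a made-up incident: an empty Retry-After header crashed a client. The function `parse_retry_after` and the incident itself are hypothetical and only serve to show the shape of such a regression test.

```python
import unittest


def parse_retry_after(header: str) -> int:
    """Hypothetical fixed function: seconds to wait, 0 if the header is unusable."""
    try:
        return max(0, int(header.strip()))
    except ValueError:
        return 0


class RetryAfterRegressionTest(unittest.TestCase):
    """Incident (hypothetical): an empty Retry-After header crashed the client."""

    def test_empty_header_does_not_crash(self):
        self.assertEqual(parse_retry_after(""), 0)

    def test_negative_value_is_clamped(self):
        self.assertEqual(parse_retry_after("-5"), 0)

    def test_valid_value_still_works(self):
        self.assertEqual(parse_retry_after(" 30 "), 30)
```

The test names and the docstring carry the symptom description, so the failure protocol and the code stay linked.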
Scaling the Persistence Layer
“Just hard. ‘Nough said.”
CDNs: grab the low-hanging fruit
• CDNs are cheap web serving helpers
• Take load from web servers
• Are quick due to in-mem caching of static content
• Edge location with shorter round-trip = best latency
• Digesting with MD5 hash
8425b886b9a2184c48b34212dfaf103b-index.html
6269a326c6a2184d32b39881baac720c-main.js
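The digest-prefixed file names above can be produced with a few lines of Python. The function name `digest_name` is an assumption; the naming scheme (MD5 hex digest, a dash, the original name) follows the two examples on the slide.

```python
import hashlib


def digest_name(filename: str, content: bytes) -> str:
    """Prefix the file name with the MD5 hex digest of its content."""
    return f"{hashlib.md5(content).hexdigest()}-{filename}"
```

Because the name changes whenever the content changes, the CDN can cache each file indefinitely; a new release simply references new names. MD5 is used here only as a content fingerprint, not for security.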
ReCAP: CAP Theorem?
• Out of C (consistency), A (availability), and P (partition tolerance), only two can be guaranteed at the same time.
Pick your storage systems
• Narrow down by purpose, data structure & features
• ACID vs. BASE
• Basically Available
• Soft state
• Eventually consistent
Complex Queries & Structured
• Key-Value & BigTable
• SQL
Simple Queries & Unstructured
• Blob
• Document
Examples of NoSQL usage
Use multiple stores and even redundant data (if necessary for A)
• Simple JSON-based web service: Document store
• Requests to /profile/{id} loads document “profile-{id}”
• Changes are simple and only per document
• Complex, but predictable queries: BigTable store
• Avoid scans!!!
• Create 1 table per query, don’t fear redundant data
• Video and Image service: Blob store (+ CDN)
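The first two usage patterns above can be mimicked with in-memory dictionaries. This is only a model of the access patterns, not a real store; the order fields and table names are invented for the sketch.

```python
from collections import defaultdict

documents = {}                           # document store: key -> JSON-like dict
orders_by_customer = defaultdict(list)   # one "table" per query pattern
orders_by_day = defaultdict(list)


def put_profile(profile_id: str, profile: dict) -> None:
    documents[f"profile-{profile_id}"] = profile


def get_profile(profile_id: str) -> dict:
    # GET /profile/{id} maps directly to one key lookup.
    return documents[f"profile-{profile_id}"]


def record_order(order: dict) -> None:
    # Write the same order redundantly so each query is a lookup, not a scan.
    orders_by_customer[order["customer"]].append(order)
    orders_by_day[order["day"]].append(order)
```

The price of the redundant tables is paid at write time, which is usually acceptable when the queries are known in advance.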
Database goes global?
• Writing state is hard to distribute globally (cf. Google Spanner)
• Inconsistencies! (A over C)
• http://research.google.com/archive/spanner.html
• Use distributed replicas & caches for read(?)
• Local caches can drift (remember load balancing!)
• Memcached clusters can help per data center
• Expect eventual consistency with outdated reads
• Sharding & Partitioning (in a global cluster)
• Divide data horizontally on application layer (primary keys)
• Partition/Sharding key design is key
• Be careful with JOINs or scans across partitions/shards!
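A minimal sketch of horizontal partitioning by primary key on the application layer. The function name and the choice of SHA-256 are assumptions; any hash that is stable across processes would do.

```python
import hashlib


def shard_for(primary_key: str, shard_count: int) -> int:
    """Map a primary key to a shard index with a hash that is stable across runs."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Python's built-in `hash()` is avoided because it is randomized per process for strings. Note that plain modulo placement moves most keys when `shard_count` changes, which is one reason the key design deserves care up front.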
Knowing your storage system(s) is crucial
• Consistency level & consensus protocols?
Paxos, BFT, 2-phase commit, quorum, hashgraph, etc.
• Replication strategies? Backups?
Replication keys, replication factors, rack/data center-awareness
• Performance? Fault-tolerance?
Benchmark (data layouts, configurations), elasticity, chaos/stress tests
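For quorum-based stores, one question worth answering is whether reads are guaranteed to overlap with the latest write. A small sketch of the usual rule (read quorum plus write quorum greater than the replication factor); the function names are assumptions.

```python
def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True if every read quorum overlaps every write quorum (r + w > n)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("quorum sizes must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """Replicas that may be down while a quorum of this size is still reachable."""
    return n - quorum
```

With a replication factor of 3, writing to 2 and reading from 2 replicas overlaps and survives one failed replica; writing to 1 and reading from 1 is faster but can return outdated data.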
Cloud Architecture & Distributed Systems Trivia

  • 1. Cloud Architecture & Distributed Systems Trivia Dr. Michael Menzel AQA Session @ Dev Team Meeting
  • 2. Agenda 1. Distribute & Scale 2. Stabilize & Prevent Failure 3. Deployment 4. Failure in Production 5. Scaling the Persistence Layer
  • 4. Distribute & Scale “Distribution and Elasticity are king.”
  • 5. Load Balancers • Assume balancing over heterogeneous hardware • Shared hardware with virtualization • Different load on machines (long requests) • Vertical scaling • Don’t keep state! Stay as stateless as possible • Incorporate health checks and feedback channels • Allow “Lame Ducks” (= healthy but busy) • Reserve time to boot (commission/decommission)
  • 6. Health Checks & Monitoring • Web services typically offer /health or /ping • Test inwards to give more precise health score (lame duck) • Don’t make health check too expensive to avoid extra load • Use monitoring a lot to detect trends and history • Monitor basics: CPU, Mem, etc. • Add application-level monitoring (queued requests, etc.)
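A minimal sketch of a cheap, inward-looking health check that can report a lame duck. The names `health_status`, `dependencies_ok`, `queued_requests` and the `max_queue` threshold are hypothetical and not taken from the slides; a real service would return this value from its /health endpoint.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"      # ready for more traffic
    LAME_DUCK = "lame_duck"  # alive but busy: the load balancer should send less
    UNHEALTHY = "unhealthy"  # the load balancer should stop sending traffic


def health_status(dependencies_ok: bool, queued_requests: int, max_queue: int = 100) -> Health:
    """Cheap check that only reads values the service already tracks."""
    if not dependencies_ok:
        return Health.UNHEALTHY
    if queued_requests >= max_queue:
        # Assumed rule: a full queue means "healthy but busy".
        return Health.LAME_DUCK
    return Health.HEALTHY
```

The check does no extra I/O, so polling it frequently adds little load.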
  • 7. Auto Scaling • Start with capacity planning to skip initial scaling delay • Benchmark to find scarce resource of your application • Monitor ftw & apply rules • Custom metrics are better than generic • Test behavior to learn about metrics • Predict resource requirements (future)
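One possible scaling rule on a custom metric, sketched under assumptions: the metric is the number of queued requests, and `target_queue_per_instance`, `min_instances` and `max_instances` are hypothetical parameters you would derive from your own benchmarks.

```python
import math


def desired_instances(queued_requests: int, target_queue_per_instance: int,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Scale so each instance carries at most the target number of queued requests."""
    if target_queue_per_instance <= 0:
        raise ValueError("target_queue_per_instance must be positive")
    needed = math.ceil(queued_requests / target_queue_per_instance)
    # Clamp to the planned capacity range.
    return max(min_instances, min(max_instances, needed))
```

For example, 450 queued requests with a target of 50 per instance asks for 9 instances.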
  • 8. Auto Scaling ctd. • For best elasticity prepare your VM/docker images to boot quickly • Test and measure your elasticity!!! • Stress testing: bursts, volatility • Performance testing: grow, shrink • Chaos testing • Test with “Huge Scales”
  • 9. Stabilize & Prevent Failure “Expect failures at all loads. Prevent failures before one cascades.”
  • 10. Degrade Performance • Introduce grades for important users (if possible) • Know whose request is processed • Process only important users on peak loads
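Graded admission can be sketched as a single decision function. The numeric grades, the `peak_load` threshold and the function name `admit` are assumptions for illustration only.

```python
def admit(user_grade: int, load: float, peak_load: float = 0.8, important_grade: int = 2) -> bool:
    """Below peak load admit everyone; at or above peak admit only important users."""
    if load < peak_load:
        return True
    return user_grade >= important_grade
```

This only works if the request carries enough identity to look up the user's grade before any expensive work starts.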
  • 11. Request Time Thresholds • Long-lasting requests are expensive, example: “30 sec threshold, 1000 QPS with full load, 5% of requests take ≥ 30 sec, after 20 sec (latest) you are blocked” • Define thresholds and propagate sub-thresholds • Example: Future.firstCompletedOf(Seq( Promise.timeout(InternalServerError("Oops"), 30 second), Webservice.call("/fibonacci/next", 10 second).map(Ok) )) • (Diagram: Web Service A, Web Service B, Web Service C)
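The slide's numbers work out if the service can hold 1000 requests at once; that capacity is an assumption, not stated on the slide. The sketch below reproduces the arithmetic and shows one way to propagate sub-thresholds to downstream calls. `Deadline`, `sub_threshold` and `reserve_seconds` are hypothetical names.

```python
import time


def seconds_until_blocked(concurrent_slots: int, qps: float, slow_fraction: float) -> float:
    """Time until slow requests alone occupy every slot.

    Only valid while the result is shorter than the slow request duration,
    because no slow request has finished by then.
    """
    return concurrent_slots / (qps * slow_fraction)


class Deadline:
    """Absolute deadline from which downstream calls derive their timeouts."""

    def __init__(self, budget_seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def sub_threshold(self, reserve_seconds: float) -> float:
        """Timeout for a downstream call, keeping a reserve for local work."""
        return max(0.0, self.remaining() - reserve_seconds)
```

With 1000 slots, 1000 QPS and 5% slow requests, 50 slots are lost per second, so all slots are taken after 20 seconds. A service that received a 30 second budget and reserves 5 seconds for itself would call the next service with a 25 second timeout.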
  • 13. Anti-Overload: Circuit Breakers & Back-off! • Back off when a web service endpoint does not respond (in time) • Exponential back-off is famous, but not best! • A jittered back-off strategy is better! 1) • Use circuit breakers (e.g. https://github.com/Netflix/Hystrix) • Exponential: sleep = min(cap, base * 2 ** attempt) • With jitter: sleep = random_between(0, min(cap, base * 2 ** attempt)) 1) Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
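A hand-rolled circuit breaker sketch to show the idea. It does not mirror the Hystrix API; `CircuitBreaker`, `CircuitOpenError`, `max_failures` and `reset_after` are hypothetical names chosen for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the endpoint."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, not calling endpoint")
            # Half-open: allow one trial call; a single failure re-opens.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit again
        return result
```

While the circuit is open, callers fail fast instead of queueing behind an endpoint that is already overloaded.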
  • 14. Random Jitter Back Off Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
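The two sleep formulas from the previous slide can be written as runnable Python. `random_between` is mapped to `random.uniform` here, and the default values for `base` and `cap` (in seconds) are assumptions.

```python
import random


def exponential_backoff(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Plain capped exponential back-off: every client waits the same time."""
    return min(cap, base * 2 ** attempt)


def jitter_backoff(attempt: int, base: float = 0.1, cap: float = 10.0, rng=random) -> float:
    """Random wait between 0 and the capped exponential value."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```

The random spread keeps clients that failed at the same moment from retrying at the same moment.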
  • 15. Deployment “Prevent toil and remain stable!”
  • 16. Package Deployments • Prepare a full VM/docker image (if possible) • VMs bring their own operating system and only need a virtualization stack • Docker containers need a Docker environment but boot quicker • Keep old versions for rollbacks and tests/comparisons • If you don’t package: • Ensure you deploy into a reset environment (mem usage, temp files, etc.) • Ensure you use a bundle with all dependencies (Java? Node?) • Coordinate thoroughly to not interfere with other deployments
  • 17. Maintain multiple environments • “The more the merrier”, but costly – find your trade-off! • Allow many testing environments for different types of tests • Stress & performance tests • Integration & regression tests • Chaos testing & Demos • Automate the creation of new environments
  • 18. Canary Deployments • Canaries allow you to monitor new software versions • Keep track of which servers have which version • In monitoring • In logging • Activate extra logging and notifications for the canaries
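Once logs are tagged with the software version, canary and baseline can be compared directly. This is a sketch under assumptions: the log records are reduced to hypothetical `(version, succeeded)` pairs, and `tolerance` is an arbitrary threshold.

```python
from collections import defaultdict


def error_rate_by_version(requests):
    """requests: iterable of (version, succeeded) pairs taken from tagged logs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for version, succeeded in requests:
        totals[version] += 1
        if not succeeded:
            errors[version] += 1
    return {version: errors[version] / totals[version] for version in totals}


def canary_is_worse(rates, canary: str, baseline: str, tolerance: float = 0.01) -> bool:
    """Flag the canary when its error rate exceeds the baseline by more than the tolerance."""
    return rates[canary] > rates[baseline] + tolerance
```

A positive result is a reasonable trigger for the extra notifications mentioned above.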
  • 19. Load Balancers during Deployment • Two strategies 1. Same load balancer: add new instances to existing load balancer 2. Extra load balancer: add whole new load balancer and move over eventually • Same load balancer tips • Add instance when ready for health checks • Tag new instances to differentiate versions • Extra load balancer tips • Make sure all settings are identical (infrastructure as code!) • First run both load balancers in parallel, then switch (use DNS or other LB)
  • 20. Failure in production “Goal is to make your pager obsolete.” Anything can happen!
  • 21. Countermeasures for Failures • Install an immediate response channel (pager, SMS) • Stop the bleeding first! – Symptoms before cause • Avoid looking for the cause, but prevent further failures • Shut down parts of the system if necessary • Declare a coordinator
  • 22. Document Failures & Solutions • Document every step and progress of failure resolutions • Define protocol templates to reduce overhead • Analyze and replay old protocols • Write regression tests with your solution • Tests make sure old bugs do not sneak back in • You documented the symptoms of the bug in code
  • 23. Scaling the Persistence Layer “Just hard. ‘Nough said.”
  • 24. CDNs: grab the low-hanging fruit • CDNs are cheap web serving helpers • Take load from web servers • Are quick due to in-mem caching of static content • Edge location with shorter round-trip = best latency • Digesting with MD5 hash, e.g. 8425b886b9a2184c48b34212dfaf103b-index.html, 6269a326c6a2184d32b39881baac720c-main.js
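Digested file names like the two on the slide can be produced with the standard library. The function name `digested_name` is hypothetical; the `<md5>-<filename>` layout follows the slide's examples.

```python
import hashlib


def digested_name(filename: str, content: bytes) -> str:
    """Prefix the file name with the MD5 hex digest of its content."""
    digest = hashlib.md5(content).hexdigest()
    return f"{digest}-{filename}"
```

Because the name changes whenever the content changes, the CDN can cache each file indefinitely. MD5 serves only as a content fingerprint here, not as a security measure.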
  • 25. ReCAP: CAP Theorem? • Out of C, A, and P only two can be kept.
  • 26. Pick your storage systems • Narrow down by purpose, data structure & features • ACID vs. BASE • Basically Available • Soft state • Eventually consistent Complex Queries & Structured • Key-Value & BigTable • SQL Simple Queries & Unstructured • Blob • Document
  • 27. Examples of NoSQL usage Use multiple stores and even redundant data (if necessary for A) • Simple JSON-based web service: Document store • Requests to /profile/{id} loads document “profile-{id}” • Changes are simple and only per document • Complex, but predictable queries: BigTable store • Avoid scans!!! • Create 1 table per query, don’t fear redundant data • Video and Image service: Blob store (+ CDN)
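The document store case can be illustrated with an in-memory stand-in. `DocumentStore` is a hypothetical toy class, not a real client library; only the key pattern `profile-{id}` comes from the slide.

```python
class DocumentStore:
    """Toy in-memory stand-in for a document store (illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = dict(doc)

    def get(self, key: str):
        return self._docs.get(key)


def profile_key(user_id: int) -> str:
    return f"profile-{user_id}"


def load_profile(store: DocumentStore, user_id: int):
    """A request to /profile/{id} becomes one key lookup, with no scan or join."""
    return store.get(profile_key(user_id))
```

Each read and write touches exactly one document, which is why this access pattern is easy to distribute.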
  • 28. Database goes global? • Writing state is hard to distribute globally (cf. Google Spanner) • Inconsistencies! (A over C) • http://research.google.com/archive/spanner.html • Use distributed replicas & caches for read(?) • Local caches can drift (remember load balancing!) • Memcached clusters can help per data center • Expect eventual consistency with outdated reads • Sharding & Partitioning (in a global cluster) • Divide data horizontally on application layer (primary keys) • Partition/sharding key design is key • Be careful with JOINs or scans across partitions/shards!
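Application-layer sharding by primary key can be sketched with a stable hash. The function name `shard_for` and the choice of SHA-256 are assumptions; any hash that is stable across processes would do, whereas Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a primary key to a shard index that is stable across processes."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding moves most keys when `num_shards` changes, which is one reason the key and shard design deserves care up front.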
  • 29. Knowing your storage system(s) is crucial • Consistency level & consensus protocols? Paxos, BFT, 2-phase commit, quorum, hashgraph, etc. • Replication strategies? Backups? Replication keys, replication factors, rack/data center-awareness • Performance? Fault-tolerance? Benchmark (data layouts, configurations), elasticity, chaos/stress tests
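For quorum-based stores, one check worth knowing is whether read and write quorums overlap. The sketch below uses the common N/R/W notation (replicas, read quorum, write quorum), which is an assumption here since the slide only mentions the term "quorum".

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With 3 replicas, reading from 2 and writing to 2 overlaps, while reading from 1 and writing to 1 does not and can return outdated data.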