SlideShare a Scribd company logo
1 of 22
Scalable Service Architectures
Lessons learned
Zoltán Németh
Engineering Manager, Core SystemsAn IBM company
Agenda
 Our scalability experience
 What is Scalability?
 Requirements in detail
 Tips and tools
 Extras, Closing remarks
Our experience
Streaming stack
Defining scalability
Scalability is the ability to handle increased workload
by repeatedly applying a costeffective strategy for
extending a system’s capacity.
(CMU paper, 2006)
How well a solution to some problem will work when
the size of the problem increases. When the size
decreases, the solution must fit. (dictionary.com and
Theo Schlossnagle, 2006)
Self-contained
service
 Explicitly declare and
isolate dependencies
 Isolation from the outside
system
 Static linking
 Do not rely on system
packages
Disposability  Maximize robustness with
fast startup and graceful
shutdown
 Disposable processes
 Graceful shutdown on
SIGTERM
 Handling sudden death:
robust queue backend
Startup and
Shutdown
 Automate all the things
 Chef
 Docker
 Gold image based
deployment
 Immutable
 Handling tasks before
shutdown
Backing Services  Treat backing services as
attached resources
 No distinction between
local and third party
services
 Easily swap out resources
 Export services via port
binding
 Become the backing
service for another app
Processes,
concurrency
 Stateless processes (not
even sticky sessions)
 Process types by work type
 We <3 linux process
 Shared-nothing  adding
concurrency is safe
 Process distribution
spanning machines
Statelessness  Store everything in a
datastore
 Aggregate data
 Chandra
 Aggregator / map &
reduce
 Scalable datastores
 Handling user sessions
Monitoring  Application state and
metrics
 Dashboards
 Alerting
 Health
 Remove failing nodes
 Capacity
 Act on trends
Monitoring  Metrics collecting
 Graphite, New Relic
 Self-aware checks
 Cluster state
 Zookeeper, Consul
 Scaling decision types
 Capacity amount
 Graph derivative
 App requests
Load Balance and
Resource
Allocation
 Load Balance: distribute
tasks
 Utilize machines
efficiently
 VM compatible apps
 Flexibility
 Adapting to available
resources
Load Balance  DNS or API
 App level balance
 Uniform entry point or
proxy
 Balance decisions
 Load
 Zookeeper state
 Resource policies
Service
Separation
 Failure is inevitable
 Protect from failing
components
 Cascading failure
 Fail fast
 Decoupling
 Asynchronous operations
 Message queues
Service
Separation
 Rate limiting
 Circuit Breaker pattern
 Stop cascading failure,
allow recovery
 Hystrix
 Fail fast, fail silent
 Service decoupling
Extras  Debugging features
 Logs
 Clojure / JS consoles
 Runtime configuration
via env
 Scaling API
 Integrating several
cloud providers
 Automatic start / stop
Reading
 Scalable Internet Architectures by Theo Schlossnagle
 The 12-factor App: http://12factor.net/
 Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf
 Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html
 Release It! by Michael T. Nygard
Questions
syntaxerror@ustream.tv

More Related Content

What's hot

Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityElasticsearch
 
Grant Fritchey - Query Tuning In Azure SQL Database
Grant Fritchey - Query Tuning In Azure SQL DatabaseGrant Fritchey - Query Tuning In Azure SQL Database
Grant Fritchey - Query Tuning In Azure SQL DatabaseRed Gate Software
 
Workflows via Event driven architecture
Workflows via Event driven architectureWorkflows via Event driven architecture
Workflows via Event driven architectureMilan Patel
 
Ionut hrubaru, bogdan lazarescu sql server high availability
Ionut hrubaru, bogdan lazarescu   sql server high availabilityIonut hrubaru, bogdan lazarescu   sql server high availability
Ionut hrubaru, bogdan lazarescu sql server high availabilityCodecamp Romania
 
CloudBrew 2016 - Building IoT solution with Service Fabric
CloudBrew 2016 - Building IoT solution with Service FabricCloudBrew 2016 - Building IoT solution with Service Fabric
CloudBrew 2016 - Building IoT solution with Service FabricTeemu Tapanila
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureNewt Global Consulting LLC
 

What's hot (6)

Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
 
Grant Fritchey - Query Tuning In Azure SQL Database
Grant Fritchey - Query Tuning In Azure SQL DatabaseGrant Fritchey - Query Tuning In Azure SQL Database
Grant Fritchey - Query Tuning In Azure SQL Database
 
Workflows via Event driven architecture
Workflows via Event driven architectureWorkflows via Event driven architecture
Workflows via Event driven architecture
 
Ionut hrubaru, bogdan lazarescu sql server high availability
Ionut hrubaru, bogdan lazarescu   sql server high availabilityIonut hrubaru, bogdan lazarescu   sql server high availability
Ionut hrubaru, bogdan lazarescu sql server high availability
 
CloudBrew 2016 - Building IoT solution with Service Fabric
CloudBrew 2016 - Building IoT solution with Service FabricCloudBrew 2016 - Building IoT solution with Service Fabric
CloudBrew 2016 - Building IoT solution with Service Fabric
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices Architecture
 

Similar to Scalable service architectures @ BWS16

Scalable service architectures @ VDB16
Scalable service architectures @ VDB16Scalable service architectures @ VDB16
Scalable service architectures @ VDB16Zoltán Németh
 
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...MSDEVMTL
 
Microsoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialMicrosoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialIIMSE Edu
 
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...Amazon Web Services
 
Eda on the azure services platform
Eda on the azure services platformEda on the azure services platform
Eda on the azure services platformYves Goeleven
 
Designing distributed systems
Designing distributed systemsDesigning distributed systems
Designing distributed systemsMalisa Ncube
 
Muves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalMuves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalElastic Grid, LLC.
 
Deploying SaaS Application on the Cloud - Case Study
Deploying SaaS Application on the Cloud - Case StudyDeploying SaaS Application on the Cloud - Case Study
Deploying SaaS Application on the Cloud - Case StudyNati Shalom
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceEldert Grootenboer
 
CSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionCSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionTom Laszewski
 
Being Elastic -- Evolving Programming for the Cloud
Being Elastic -- Evolving Programming for the CloudBeing Elastic -- Evolving Programming for the Cloud
Being Elastic -- Evolving Programming for the CloudRandy Shoup
 
SaaS Enablement of your existing application (Cloud Slam 2010)
SaaS Enablement of your existing application (Cloud Slam 2010)SaaS Enablement of your existing application (Cloud Slam 2010)
SaaS Enablement of your existing application (Cloud Slam 2010)Nati Shalom
 
Kluczowe elementy infrastruktury...
Kluczowe elementy infrastruktury...Kluczowe elementy infrastruktury...
Kluczowe elementy infrastruktury...Alicja Sieminska
 
Albara Abdalkhalig
Albara AbdalkhaligAlbara Abdalkhalig
Albara Abdalkhaligbrra51
 
Current trends in software engineering
Current trends in software engineeringCurrent trends in software engineering
Current trends in software engineeringbrra51
 

Similar to Scalable service architectures @ BWS16 (20)

Scalable service architectures @ VDB16
Scalable service architectures @ VDB16Scalable service architectures @ VDB16
Scalable service architectures @ VDB16
 
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
 
Microsoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics TutorialMicrosoft Azure Cloud Basics Tutorial
Microsoft Azure Cloud Basics Tutorial
 
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...
(ENT210) Accelerating Business Innovation with DevOps on AWS | AWS re:Invent ...
 
Eda on the azure services platform
Eda on the azure services platformEda on the azure services platform
Eda on the azure services platform
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
Designing distributed systems
Designing distributed systemsDesigning distributed systems
Designing distributed systems
 
Sql Azure
Sql AzureSql Azure
Sql Azure
 
Muves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalMuves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 Final
 
Deploying SaaS Application on the Cloud - Case Study
Deploying SaaS Application on the Cloud - Case StudyDeploying SaaS Application on the Cloud - Case Study
Deploying SaaS Application on the Cloud - Case Study
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
12-Factor App
12-Factor App12-Factor App
12-Factor App
 
CSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionCSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps session
 
Being Elastic -- Evolving Programming for the Cloud
Being Elastic -- Evolving Programming for the CloudBeing Elastic -- Evolving Programming for the Cloud
Being Elastic -- Evolving Programming for the Cloud
 
Managing the cloud
Managing the cloudManaging the cloud
Managing the cloud
 
SaaS Enablement of your existing application (Cloud Slam 2010)
SaaS Enablement of your existing application (Cloud Slam 2010)SaaS Enablement of your existing application (Cloud Slam 2010)
SaaS Enablement of your existing application (Cloud Slam 2010)
 
Kluczowe elementy infrastruktury...
Kluczowe elementy infrastruktury...Kluczowe elementy infrastruktury...
Kluczowe elementy infrastruktury...
 
Albara Abdalkhalig
Albara AbdalkhaligAlbara Abdalkhalig
Albara Abdalkhalig
 
Current trends in software engineering
Current trends in software engineeringCurrent trends in software engineering
Current trends in software engineering
 

More from Zoltán Németh

Reveal The Secrets of Your Videos
Reveal The Secrets of Your VideosReveal The Secrets of Your Videos
Reveal The Secrets of Your VideosZoltán Németh
 
Voxxed Days Belgrade 2017 - How not to do DevOps
Voxxed Days Belgrade 2017 - How not to do DevOpsVoxxed Days Belgrade 2017 - How not to do DevOps
Voxxed Days Belgrade 2017 - How not to do DevOpsZoltán Németh
 
Content protection with UMS
Content protection with UMSContent protection with UMS
Content protection with UMSZoltán Németh
 
Implementing DevOps In Practice
Implementing DevOps In PracticeImplementing DevOps In Practice
Implementing DevOps In PracticeZoltán Németh
 
On-demand real time transcoding
On-demand real time transcoding On-demand real time transcoding
On-demand real time transcoding Zoltán Németh
 
DB séma kezelés Liquibase-el
DB séma kezelés Liquibase-elDB séma kezelés Liquibase-el
DB séma kezelés Liquibase-elZoltán Németh
 

More from Zoltán Németh (9)

Reveal The Secrets of Your Videos
Reveal The Secrets of Your VideosReveal The Secrets of Your Videos
Reveal The Secrets of Your Videos
 
Voxxed Days Belgrade 2017 - How not to do DevOps
Voxxed Days Belgrade 2017 - How not to do DevOpsVoxxed Days Belgrade 2017 - How not to do DevOps
Voxxed Days Belgrade 2017 - How not to do DevOps
 
Content protection with UMS
Content protection with UMSContent protection with UMS
Content protection with UMS
 
Implementing DevOps In Practice
Implementing DevOps In PracticeImplementing DevOps In Practice
Implementing DevOps In Practice
 
Building our own CDN
Building our own CDNBuilding our own CDN
Building our own CDN
 
Culture @ Velocity UK
Culture @ Velocity UKCulture @ Velocity UK
Culture @ Velocity UK
 
On-demand real time transcoding
On-demand real time transcoding On-demand real time transcoding
On-demand real time transcoding
 
DB séma kezelés Liquibase-el
DB séma kezelés Liquibase-elDB séma kezelés Liquibase-el
DB séma kezelés Liquibase-el
 
Daemons in PHP
Daemons in PHPDaemons in PHP
Daemons in PHP
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Scalable service architectures @ BWS16

  • 1. Scalable Service Architectures Lessons learned Zoltán Németh Engineering Manager, Core SystemsAn IBM company
  • 2. Agenda  Our scalability experience  What is Scalability?  Requirements in detail  Tips and tools  Extras, Closing remarks
  • 5.
  • 6. Defining scalability Scalability is the ability to handle increased workload by repeatedly applying a costeffective strategy for extending a system’s capacity. (CMU paper, 2006) How well a solution to some problem will work when the size of the problem increases. When the size decreases, the solution must fit. (dictionary.com and Theo Schlossnagle, 2006)
  • 7. Self-contained service  Explicitly declare and isolate dependencies  Isolation from the outside system  Static linking  Do not rely on system packages
  • 8. Disposability  Maximize robustness with fast startup and graceful shutdown  Disposable processes  Graceful shutdown on SIGTERM  Handling sudden death: robust queue backend
  • 9. Startup and Shutdown  Automate all the things  Chef  Docker  Gold image based deployment  Immutable  Handling tasks before shutdown
  • 10. Backing Services  Treat backing services as attached resources  No distinction between local and third party services  Easily swap out resources  Export services via port binding  Become the backing service for another app
  • 11. Processes, concurrency  Stateless processes (not even sticky sessions)  Process types by work type  We <3 linux process  Shared-nothing  adding concurrency is safe  Process distribution spanning machines
  • 12. Statelessness  Store everything in a datastore  Aggregate data  Chandra  Aggregator / map & reduce  Scalable datastores  Handling user sessions
  • 13. Monitoring  Application state and metrics  Dashboards  Alerting  Health  Remove failing nodes  Capacity  Act on trends
  • 14. Monitoring  Metrics collecting  Graphite, New Relic  Self-aware checks  Cluster state  Zookeeper, Consul  Scaling decision types  Capacity amount  Graph derivative  App requests
  • 15.
  • 16. Load Balance and Resource Allocation  Load Balance: distribute tasks  Utilize machines efficiently  VM compatible apps  Flexibility  Adapting to available resources
  • 17. Load Balance  DNS or API  App level balance  Uniform entry point or proxy  Balance decisions  Load  Zookeeper state  Resource policies
  • 18. Service Separation  Failure is inevitable  Protect from failing components  Cascading failure  Fail fast  Decoupling  Asynchronous operations  Message queues
  • 19. Service Separation  Rate limiting  Circuit Breaker pattern  Stop cascading failure, allow recovery  Hystrix  Fail fast, fail silent  Service decoupling
  • 20. Extras  Debugging features  Logs  Clojure / JS consoles  Runtime configuration via env  Scaling API  Integrating several cloud providers  Automatic start / stop
  • 21. Reading  Scalable Internet Architectures by Theo Schlossnagle  The 12-factor App: http://12factor.net/  Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf  Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html  Release It! by Michael T. Nygard

Editor's Notes

  1. A bit of Ustream intro
  2. Definition Requirements coming from 12-factor, and some added by us Some more detail and tools on selected requirements
  3. 30 day viewer graph. Clear peaks -> need for scaling
  4. Quick description of the streaming stack, roles of components, how they require scaling - Transcontroller/transcoder scaling - UMS scaling
  5. Quick description of the streaming stack, roles of components, how they require scaling - Transcontroller/transcoder scaling - UMS scaling
  6. Carnegie Mellon University paper by Charles B. Weinstock, John B. Goodenough: On System Scalability LINFO: The Linux Information Project http://www.linfo.org/ Next: principles
  7. Example: calling imagemagick or curl from code – they might be there or might not be Bundle everything into the app instead
  8. Disposable process: they can be started or stopped at a moment’s notice For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost. For a worker process, graceful shutdown is achieved by returning the current job to the work queue.
  9. Docker: build images from dockerfile, deploy from repository Tasks before shutdown: moving jobs, log collection, sleep
  10. A backing service is any service the app consumes over the network as part of its normal operation. Examples include datastores (such as MySQL or CouchDB), messaging/queueing systems (such as RabbitMQ or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached). Put a resource locator in the config only – environment variables Example: Easily swap out a local mysql to a remote service The app does not rely on runtime injection of a webserver into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port. One app can become the backing service for another app, by providing the URL to the backing app as a resource handle in the config for the consuming app
  11. Handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks handled by a worker process An individual VM can only grow so large (vertical scale), so the application must also be able to span multiple processes running on multiple physical machines.
  12. Aggregate everything within the app and write it out in bulk – careful about write frequency, must not lose too many data on a crash Aggregator map-reduce Redis: scales reads, write problematic Cassandra: quick scaling questionable Aerospike: scales reads and writes, working together with their eng team User sessions: persistent connection, NIO+
  13. Alerting -> openduty Two important groups: Health vs capacity
  14. Report everything to graphite, constantly check graph trends automatically Apps are self-aware, they know their health App instances report into Zookeeper and thus know about each other Central logic can request resource based on capacity or graph, app can request based on self-check or zookeeper Zookeeper, Consul: miért, mik az előnyei
  15. load balancing distributes workloads across multiple computing resources Flexibility: can increase or decrease its own size, example: Threadpools Adapting to CPU, RAM, disk, network
  16. App level: transcontroller selects transcoder App level balance with proxy can be SPOF, careful Resource policies: even distribution, keep large chunks free for possible large tasks (transcoder use case), group requests together on some attribute (pro, etc)
  17. Failure inevitable because: large numbers, hw issues, independent network Decoupling: serving one request should not wait on others
  18. Hystrix by Netflix 2011/12 Circuit Breaker: Martin Fowler post from 2014 Service decoupling example: inserting layers between DB and UMS -> RGW. Then another layer between RGW and UMS -> Queue
  19. Logs: logs as stream / stdout (factor #9), collect / transport / process Scaling API: Other considerations: price, network line to the cloud provider, instance type (spot vs normal) Openstack, Ganeti