
Capacity Planning Infrastructure for Web Applications (Drupal)

In this session we will try to solve a couple of recurring problems:
Site Launch and User expectations

Imagine a customer who provides a set of hardware requirements, sets a date and launches the site, but forgets to mention that they have sent out (thousands of) emails to half the world announcing the new website launch! What do you think will happen?

Of course, launching a Drupal site involves many preparation steps, but plenty of guides already cover common Drupal launch readiness checklists, so that part is no longer a problem.
What we are really missing here is a plan for capacity.

  1. How to lower the costs of your Drupal site's resources and plan capacity in advance. ricardoamaro, SRE @ Acquia (Image: Carbon Fiber Tank, SpaceX)
  2. About me: @ricardoamaro ● Principal SRE @ Acquia (Cloud Data Team) ● Joined in December 2011 ● Location: Lisbon, Portugal ● Co-author of Seeking SRE (O’Reilly), covering Machine Learning for SRE ● Founder and Lead of the Portuguese Drupal Association ● Fun facts: ○ Has presented at DevOps events, including DrupalCons. ○ Dedicated father of two who still manages to study and write. ○ First Linux installation: Slackware in 1994. ○ Former theatre actor.
  3. Agenda: what we will be talking about ● The Problem ● What Is Capacity ● Why Do Capacity Planning ● Relation to Site Reliability Engineering ● Budget & Capacity Planning ● Load Testing ● Performance Tuning vs. Capacity Planning ● What to Measure ● How to Measure ● How to Track Capacity ● Forecasting ● First Easy Steps ● Conclusions
  4. The Problem: Site Launch & User Expectations (Image: Falcon Heavy launch, SpaceX)
  5. Typical Drupal Site Launch. What about capacity planning? - Disable devel - Configure cron - Check The Upload Sizes & Execution Time - Check Recipient Email Addresses - Set The File Permissions - Protect Your Root Account - Check Permissions - Turn Off Error Reporting - Handle 404 Errors Gracefully - Check Robots.txt - Combine Pathauto With Global Redirect - Create A Maintenance Page - Configure Caching - CSS And JavaScript Optimisation - Check Unpublished Content Is Not Visible - Configure Statistics - Monitor the Site - ** Plan for Failure **
  6. User Expectations (Image: Drupal click screenshot) ● The end goal of capacity planning is a smooth and speedy experience for users ● What that means varies with the type of application and which part of it users interact with
  7. No Silver Bullet ● A site can have plenty of capacity and still be slow or unavailable ● Capacity is only one part of making the end-user experience fast ● We want to measure and track in order to make forecasts ● An intolerable amount of latency should raise a flag
  8. What Is Capacity? The resources required to run your services in the context you have chosen to run them (Image: Carbon Fiber Tank, SpaceX)
  9. Capacity in Site Reliability Engineering (SRE) ● Capacity: the maximum amount of output a product deployment can complete in a given period of time ● Capacity planning: the process that determines the resources needed (people, instances, CPU, memory, time and more) for the company to meet changing demand for its services ● In the Drupal world we focus mostly on web-serving capacity
  10. Resource Management (from The Art of Capacity Planning, Arun Kejariwal and John Allspaw, O'Reilly Media) ● Ensure proper resources are available to handle load ● Define a procurement and approval process ● Justify capital needs ● Manage resources after deployment
  11. Why Do Capacity Planning? (Image: Kroger grocery store, Lexington, Kentucky, 1947, by Brett Streutket)
  12. Quick and Dirty Math ● Only spend as much as you actually need ● Stay ahead of sharp growth ● Avoid emergencies ● Stay fast and reliable
  13. Site Reliability Engineering (Image: Rocket Laboratory, 1952, NASA/William A. Bowles)
  14. Ben Treynor, Google: “...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)...”
  15. Demand Forecasting and Capacity Planning ● Ensure there is sufficient capacity and redundancy ● Serve projected future demand with the required availability ● Ensure the required capacity is in place by the time it is needed ● Take both organic and inorganic growth into account (Image: https://unsplash.com/photos/mexeVPlTB6k)
  16. How SRE Advocates for Capacity Planning ● Perform regular load testing ● Incorporate SLOs on capacity ● Capacity is critical to availability; therefore the SRE team leads capacity planning initiatives and provisioning (Image: https://unsplash.com/photos/DX9X0g0Cg88)
  17. Budget & Capacity Planning (Image: Vintage Grow Your Money, by Chris Potter, ccPixs.com)
  18. Keeping the Costs Low ● Meet with Finance, Engineering and Product ● Gather systems and application metrics ● Use that data to justify the investment (Diagram: three forces that impact capacity planning: Product, Finance and Engineering, around the Plan)
  19. Load Testing: “Hope is not a strategy” (Image: St. Margrethen - Load Test, by Kecko)
  20. Load Testing a Drupal Stack ● How to load test? “Hit it until it breaks” ● Include the points of failure in the calculations ● Determining backend limits can be tricky ● Use those resource ceilings as a basis when predicting future growth (Source: https://docs.acquia.com/acquia-cloud/arch/)
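A minimal Python sketch of the "hit it until it breaks" approach, assuming a step-ramp test: raise the load one level at a time and keep the last level that stayed within an error-rate and latency limit. The names http_step and find_breaking_point, the staging URL in the usage line and the default limits (1% failed requests, 500 ms p99) are hypothetical choices for illustration; a dedicated tool such as JMeter remains the better option for real tests.

```python
import concurrent.futures
import time
import urllib.request
from typing import Callable, List, Optional, Tuple

# One load step reports (error_rate, p99_latency_ms).
StepResult = Tuple[float, float]


def http_step(url: str, concurrency: int, requests_per_worker: int = 20) -> StepResult:
    """Hypothetical step runner: fire GETs at url from `concurrency` threads.

    Only point this at a staging copy of the site, never at production.
    """
    def one_request(_: int) -> Optional[float]:
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                resp.read()
        except Exception:
            return None  # HTTP errors, timeouts and resets all count as failures
        return (time.perf_counter() - start) * 1000.0

    total = concurrency * requests_per_worker
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total)))
    latencies: List[float] = sorted(r for r in results if r is not None)
    errors = total - len(latencies)
    # Approximate p99; with no successful requests the latency is unbounded.
    p99 = latencies[int(0.99 * (len(latencies) - 1))] if latencies else float("inf")
    return errors / total, p99


def find_breaking_point(
    run_step: Callable[[int], StepResult],
    start: int,
    step: int,
    max_load: int,
    max_error_rate: float = 0.01,  # assumed limit: 1% failed requests
    max_p99_ms: float = 500.0,     # assumed limit: p99 under 500 ms
) -> Optional[int]:
    """Ramp load until a step breaks; return the last healthy load level.

    Returns None if even the first step breaks.
    """
    last_ok: Optional[int] = None
    load = start
    while load <= max_load:
        error_rate, p99_ms = run_step(load)
        if error_rate > max_error_rate or p99_ms > max_p99_ms:
            break
        last_ok = load
        load += step
    return last_ok
```

Usage, against a hypothetical staging host: find_breaking_point(lambda c: http_step("https://staging.example.com/", c), start=10, step=10, max_load=200). Keeping the step runner as a plain callable lets the same ramp logic drive other sources, such as exported JMeter results or a simulated model.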
  21. Database Backend Load Test ➔ How many queries per second (QPS) can the DB server handle? ➔ How many QPS can it serve before performance degradation affects the end-user experience? ● What load will cause the database to become unresponsive or fail over? Knowing this allows alert thresholds to be set accordingly. ● What should we expect from adding (or removing) backend nodes? ● When should we begin sizing for new database capacity?
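One way to turn these questions into numbers, sketched under the assumption that each load-test step recorded its throughput (QPS) and p99 query latency: the ceiling is the highest step still inside a latency budget, and the alert threshold is a fraction of it. The function names, the 20 ms budget in the example and the 80% headroom default are hypothetical, not recommendations from the talk.

```python
from typing import Iterable, Optional, Tuple


def qps_ceiling(samples: Iterable[Tuple[float, float]], p99_budget_ms: float) -> Optional[float]:
    """Highest tested QPS whose p99 query latency stayed within the budget.

    samples are (qps, p99_ms) pairs, one per load-test step. The scan stops at
    the first step that breaches the budget, so a lucky reading taken after
    degradation has started does not inflate the ceiling.
    """
    ceiling: Optional[float] = None
    for qps, p99_ms in sorted(samples):
        if p99_ms > p99_budget_ms:
            break
        ceiling = qps
    return ceiling


def alert_threshold(ceiling_qps: float, headroom: float = 0.8) -> float:
    """QPS at which to alert, keeping (1 - headroom) of the ceiling as a buffer."""
    return ceiling_qps * headroom
```

With hypothetical steps [(500, 4.0), (1000, 5.5), (1500, 9.0), (2000, 35.0), (2500, 120.0)] and a 20 ms budget, qps_ceiling returns 1500 and alert_threshold(1500) gives 1200 QPS.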
  22. A Few Load Testing Tools. Simulate: ● LoadRunner (http://bit.ly/microfocus-loadrunner) ● Iago (https://github.com/twitter/iago) ● JMeter (http://jmeter.apache.org/). Collect: ● Prometheus (http://www.prometheus.io/) ● SignalFx (http://www.signalfx.com/) ● Cacti (http://cacti.net) ● Ganglia (http://ganglia.info) ● Nagios (http://nagios.org/) (Image: https://www.gocomics.com/calvinandhobbes/1986/11/26)
  23. Performance Tuning vs. Capacity Planning (different goals) (Image: Top Speed, by Alexander Nie)
  24. What to Measure: defining the metrics (Image: End-of-life, by Dennis van Zuijlekom)
  25. Divide & Conquer ● Splitting nodes ● Understand the capacity demands of each node ● Measure more distinctly ● Learn how requests or queries per second affect resources
  26. Identifying the Key Resources to Measure ● Disk space (MB) ● Disk I/O operations (IOPS) ● CPU performance (FLOPS) ● Memory (RAM, MB) ● Network bandwidth (Mbps) ● Network IP pool (netmask) ● Others
  27. How to Measure (Image: Living Computer Museum, Seattle)
  28. Tools to measure on Linux servers (Image: http://www.brendangregg.com/Perf/linux_perf_tools_full.png)
  29. Collecting resource metrics on web servers ● Example script that sends metrics to statsd ● Low footprint using /proc, df and ps ● For a constant, reliable monitoring service use collectd (https://collectd.org) or Telegraf (https://www.influxdata.com/time-series-platform/telegraf/)
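The example script itself is not part of this transcript. As a stand-in, here is a minimal low-footprint collector in Python, assuming a Linux host and a StatsD daemon listening on 127.0.0.1:8125; it reads /proc directly and uses shutil.disk_usage for the figures df would show. The metric names, the drupal.web01 prefix and the 10-second interval are hypothetical.

```python
import os
import shutil
import socket
import time
from typing import Dict

STATSD_ADDR = ("127.0.0.1", 8125)  # assumption: local StatsD daemon on its usual UDP port
PREFIX = "drupal.web01"            # hypothetical metric prefix


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines such as 'MemAvailable:  123456 kB' into kB values."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def collect() -> Dict[str, float]:
    """Read a few cheap gauges straight from the kernel (Linux only)."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage("/")  # the same figures df shows for /
    # Counting numeric /proc entries gives the process count ps would list.
    procs = sum(1 for name in os.listdir("/proc") if name.isdigit())
    return {
        "load.1min": load1,
        "mem.available_mb": mem["MemAvailable"] / 1024,
        "disk.used_mb": disk.used / 2**20,
        "disk.free_mb": disk.free / 2**20,
        "procs.count": float(procs),
    }


def to_statsd(prefix: str, metrics: Dict[str, float]) -> bytes:
    """Encode metrics as StatsD gauges ('name:value|g'), one per line."""
    lines = [f"{prefix}.{name}:{value:.2f}|g" for name, value in sorted(metrics.items())]
    return "\n".join(lines).encode()


def run_forever(interval_s: float = 10.0) -> None:
    """Send one batch of gauges every interval_s seconds over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(to_statsd(PREFIX, collect()), STATSD_ADDR)
        time.sleep(interval_s)
```

Start run_forever() from a systemd unit or a similar supervisor. It packs several newline-separated gauges into each UDP packet, which the reference StatsD server accepts; for anything long-lived, collectd or Telegraf remain the more robust choice.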
  30. How to Track Capacity
  31. Store and Display Time Series ● SignalFx ● Cacti ● Ganglia ● Graphite ● Datadog ● Ruxit ● LogicMonitor ● Sematext ● CoScale ● Riemann ● Prometheus ● Sensu ● Idera ● Bijk ● X-Pack ● vRealize Hyperic HQ
  32. A Couple of Load Testing Tips ● Load testing tutorials: https://www.tutorialspoint.com/jmeter, https://www.blazemeter.com/load-testing ● Docker app for Grafana: https://github.com/kamon-io/docker-grafana-graphite
  33. Forecasting (predicting trends) (Image: Numbers And Finance, by SeniorLiving.org)
  34. Predict the Future? ● Use context & math ● Make educated guesses ● The long-term view is generally steady ● Generate estimates to sustain growth ● Use an adjustable process ● Let the forecast guide autoscaling policies
  35. Ceilings and Historical Data ● Daily storage consumption example ● Metric: total available disk space ● The cumulative total provides a historical perspective ● From it we can predict future needs ● Storage will probably be exhausted at the point where the trend line meets the ceiling
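A minimal sketch of the storage example, assuming daily consumption figures in GB and a known total capacity: build the cumulative usage series, then divide the remaining space by the average daily growth. Both function names are hypothetical, and averaging from the first to the last point is the crudest possible trend line; a fitted line such as y = mx + b copes better with noisy days.

```python
from itertools import accumulate
from typing import List, Optional


def cumulative_usage(start_used_gb: float, daily_consumption_gb: List[float]) -> List[float]:
    """Disk used at the end of each day, from a starting level plus daily deltas."""
    return list(accumulate(daily_consumption_gb, initial=start_used_gb))[1:]


def days_until_full(used_gb: List[float], capacity_gb: float) -> Optional[float]:
    """Naive projection using the average daily growth over the whole history.

    Returns None when usage is flat or shrinking, i.e. no exhaustion in sight.
    """
    if len(used_gb) < 2:
        return None
    avg_growth = (used_gb[-1] - used_gb[0]) / (len(used_gb) - 1)
    if avg_growth <= 0:
        return None
    return (capacity_gb - used_gb[-1]) / avg_growth
```

For instance, starting at 100 GB and consuming 10 GB a day for four days gives [110, 120, 130, 140]; against a 200 GB disk that leaves about 6 days.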
  36. Curve Fitting ● Creative & scientific ● Stay ahead of growth ● Use time-series data ● Forecast by constructing new data points beyond the known ones ● Reconcile what we know with the best-fit equation ● Consider context before math (e.g. a linear fit: y = mx + b)
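For the linear case y = mx + b, a common choice is an ordinary least-squares fit, which has a closed form. Assuming n time-series points (x_i, y_i), with x the time and y the measured resource, and a ceiling C such as total disk capacity, the fitted line reaches the ceiling at x_C (meaningful only when the fitted slope is positive):

```latex
\hat{m} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b} = \bar{y} - \hat{m}\,\bar{x},
\qquad
x_C = \frac{C - \hat{b}}{\hat{m}} \quad (\hat{m} > 0)
```

As a hypothetical illustration, a fit with a slope of 0.5 TB/day and an intercept of 10 TB against a 30 TB ceiling gives x_C = (30 - 10) / 0.5 = 40 days from the start of the series.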
  37. Forecasting Peak-Driven Resource Usage ● Track how the peaks change over time ● Extrapolate from that data to predict future needs ● Identify the server resource ceilings ● Find a relation between resources and application-level work ● Decide whether to scale vertically or horizontally ● Perform proactive autoscaling
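A minimal sketch of finding a relation between a server resource and application-level work, assuming per-sample (day, requests per second, CPU %) data from one web node: keep each day's peak, fit CPU % as a straight line of request rate, and solve for the request rate at a CPU ceiling. The linear model, the 80% default ceiling and all function names are assumptions; check the fit against the data before trusting it.

```python
from typing import Dict, List, Sequence, Tuple


def daily_peaks(samples: Sequence[Tuple[int, float, float]]) -> List[Tuple[float, float]]:
    """For each day keep the sample with the highest request rate.

    samples are (day, rps, cpu_pct) tuples; returns (rps, cpu_pct) per day, in day order.
    """
    peaks: Dict[int, Tuple[float, float]] = {}
    for day, rps, cpu in samples:
        if day not in peaks or rps > peaks[day][0]:
            peaks[day] = (rps, cpu)
    return [peaks[day] for day in sorted(peaks)]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def rps_ceiling(peaks: Sequence[Tuple[float, float]], cpu_ceiling_pct: float = 80.0) -> float:
    """Request rate at which the fitted CPU line reaches the ceiling on one node."""
    m, b = fit_line([p[0] for p in peaks], [p[1] for p in peaks])
    return (cpu_ceiling_pct - b) / m
```

With the hypothetical peaks (100 rps, 30%), (200 rps, 55%) and (300 rps, 80%), the fit is CPU = 0.25 * rps + 5, so the node reaches 90% CPU at 340 rps. Dividing a forecast peak request rate by that per-node figure gives a first estimate of how many nodes horizontal scaling would need.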
  38. Automating Forecasts with fityk & cfityk ● Fityk is open source software for nonlinear fitting of analytical functions to data (homepage: https://fityk.nieto.pl/) ● Incorporate cfityk scripts into automated curve fitting, like: cfityk ricardo-disk.fit (commands: @0 < ricardo-disk.csv; guess Quadratic; fit; info formula; quit) ● Returns the formula: 4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2 ● Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
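For a pipeline that stays in Python, the guess Quadratic and fit steps can be approximated with an ordinary least-squares quadratic fit. Because a quadratic is linear in its coefficients, solving the normal equations directly should reach the same least-squares optimum that an iterative method such as Levenberg-Marquardt converges to. This is a sketch, not fityk's own code: fit_quadratic, first_x_reaching and the assumption that x is measured in days are illustrative, and reading ricardo-disk.csv is left out because its column layout is not shown.

```python
from typing import List, Optional, Sequence


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [a0, a1, a2] for y = a0 + a1*x + a2*x^2."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of y*x^k
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve(normal, t)


def first_x_reaching(coef: Sequence[float], ceiling: float, start_x: float,
                     step: float = 1.0, max_steps: int = 100_000) -> Optional[float]:
    """Step forward from start_x until the fitted curve reaches the ceiling."""
    a0, a1, a2 = coef
    x = float(start_x)
    for _ in range(max_steps):
        if a0 + a1 * x + a2 * x * x >= ceiling:
            return x
        x += step
    return None
```

For example, first_x_reaching([1.0, 2.0, 0.5], 41.0, 4) returns 8.0, since the curve is 39.5 at x = 7 and 49 at x = 8. Stepping forward instead of applying the quadratic formula also copes with curves that bend downward and never reach the ceiling; in that case the function returns None.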
  39. Forecasting with Machine Learning (Book: Seeking SRE: Conversations About Running Production Systems at Scale, O'Reilly Media) ● The most popular curve-fitting method in fityk is Levenberg-Marquardt ● ML is also an option for forecasting (book I co-authored) ● Code examples and guides: https://github.com/ricardoamaro/MachineLearning4SRE
  40. Start with Easy Steps
  41. Get Started: 1. Select a process owner. 2. Identify the resources to be measured. 3. Measure these resources. 4. Compare to maximum capacity. 5. Collect workload forecasts. 6. Use forecasts for IT resource requirements. 7. Map requirements onto existing utilizations. 8. Predict when the system will be out of capacity. 9. Update forecasts and utilizations.
  42. Set a Goal! ● Two classes: ○ Load: usually expressed as the arrival rate or peak rate of requests hitting the service, e.g. a target of 10,000 concurrent authenticated Drupal users ○ Performance: usually expressed as Service Level Objectives, e.g. the 99th percentile of all requests should return in less than 500 ms
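A minimal sketch of checking the performance goal above, assuming raw per-request latencies in milliseconds are available. It uses the nearest-rank definition of a percentile; monitoring systems often interpolate between samples, so their p99 may differ slightly. The function names are hypothetical; the 99th percentile and 500 ms defaults come from the example goal.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(latencies_ms: Sequence[float], pct: float = 99.0,
                      threshold_ms: float = 500.0) -> bool:
    """True when the pct-th percentile latency is strictly below threshold_ms."""
    return percentile(latencies_ms, pct) < threshold_ms
```

With 100 requests, one slow outlier at 900 ms still meets the goal, because the 99th-ranked value is fast; a second outlier pushes p99 to 900 ms and breaks it.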
  43. Be Proactive (plan & document ahead) (Image: Picasso drawing with Paloma and Claude at Villa la Galloise, 1953, by Edward Quinn, EdwardQuinn.com)
  44. Capacity Planning Dashboard ● Support your conclusions with metrics in a dashboard ● Both manual and automatic scaling decisions should be based on real data ● When to scale? ○ Date and time (be alerted if needed) ● How to scale? ○ Vertical, horizontal or diagonal scaling
(Example) Drupal Cluster Dashboard:
| type | value | limit/node | ceiling units | limit (total) | current (peak) | peak % | estimated days left |
| Varnish cache | 28 | 1024 | req/sec | 2048 | 600 | 29% | 830 |
| Web | 31 | 80 | busy calls | 160 | 145 | 90% | 12 |
| Database | 15 | 60 | connections | 120 | 96 | 80% | 36 |
| Storage | 14 | 30 | TB | 30 | 14 | 46% | 21 |
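The peak % column and the question of how many nodes to run can be sketched as below, assuming horizontal scaling with identical nodes and a hypothetical 70% target utilization; both function names are illustrative. Taking the example Web row (80 busy calls per node, a 160 total limit that implies two nodes, 145 at peak), peak_utilization_pct gives 90.625%, shown on the dashboard as 90%, and nodes_needed returns 3.

```python
import math


def peak_utilization_pct(current_peak: float, limit_per_node: float, nodes: int) -> float:
    """Current peak as a percentage of the cluster-wide limit (limit per node x nodes)."""
    return 100.0 * current_peak / (limit_per_node * nodes)


def nodes_needed(forecast_peak: float, limit_per_node: float, target_pct: float = 70.0) -> int:
    """Smallest node count that keeps the forecast peak at or below target_pct of the limit.

    Horizontal scaling only; target_pct is an assumed safety margin.
    """
    usable_per_node = limit_per_node * target_pct / 100.0
    return math.ceil(forecast_peak / usable_per_node)
```

Feeding a forecast peak rather than the current one into nodes_needed turns the "when to scale" and "how to scale" questions into a single number to compare with the nodes already running.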
  45. Conclusions: Drive the system to the appropriate level of risk for the lowest cost.
  46. Join us for contribution opportunities: Thursday, October 31, 2019 ● Mentored Contribution: 9:00-18:00, Room: Europe Foyer 2 ● First Time Contributor Workshop: 9:00-14:00, Room: Diamond Lounge ● General Contribution: 9:00-18:00, Room: Europe Foyer 2 #DrupalContributions

×