CLL19 - Acceptance Tests as Monitors

In this talk I explain why and how you can run acceptance tests in production as part of your monitoring.

Published in: Technology

  1. Acceptance Tests as Monitors. Phill Barber
  2. In this talk: Why the need to do this? How can you do it? Live demo. Conclusions.
  3. About me: Java Dev ● Media Companies ● Small Startup ● Finance (Investment Bank). DevOps and monitoring have been a focus.
  4. Why? Why run Acceptance Tests in Production?
  5. Monitoring: Incident Response, Failure Detection, Capacity Planning, Performance Analysis, Business Metrics / Insight
  6. Signing off your build. Monitoring your environment. (Image: CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=587238)
  7. Monitoring: “Standard” health checks. Diagram labels: Monitoring/Alerting (Prometheus, Dashing, Nagios etc.), GET /healthcheck, Your App, GET /ping, Upstream App A, GET /ping, Upstream App B, select * from dual;, Your App’s DB.
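
A minimal sketch of what typically sits behind the GET /healthcheck call on slide 7: the app runs one probe per dependency and reports unhealthy if any probe fails. The class name, the probe names and the choice of 200/500 status codes are assumptions for illustration, not taken from the talk or from Dropwizard.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Sketch of the aggregation behind a health check endpoint. All names are hypothetical. */
final class HealthChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    HealthChecks register(String name, BooleanSupplier probe) {
        checks.put(name, probe);
        return this;
    }

    /** Runs every probe; a probe that throws counts as unhealthy. */
    Map<String, Boolean> run() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        checks.forEach((name, probe) -> {
            boolean healthy;
            try {
                healthy = probe.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false;
            }
            results.put(name, healthy);
        });
        return results;
    }

    /** Status the endpoint would return (assumed convention): 200 if all probes pass, else 500. */
    static int httpStatus(Map<String, Boolean> results) {
        return results.containsValue(false) ? 500 : 200;
    }

    public static void main(String[] args) {
        Map<String, Boolean> results = new HealthChecks()
                .register("upstream-app-a", () -> true) // stands in for GET /ping on Upstream App A
                .register("upstream-app-b", () -> true) // stands in for GET /ping on Upstream App B
                .register("database", () -> {           // stands in for select * from dual;
                    throw new IllegalStateException("connection refused");
                })
                .run();
        System.out.println(results + " -> HTTP " + httpStatus(results));
    }
}
```

Checks like these say whether each dependency answers, not whether a user can actually do anything, which is the gap the rest of the talk addresses.
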
  8. See Dropwizard: https://github.com/dropwizard/dropwizard/blob/master/dropwizard-core/src/main/java/io/dropwizard/setup/AdminEnvironment.java
  9. Monitor User/Client Activity. This is one of the best things you can do. ● If you can see actual events, you know it’s OK. ● Low cost, high benefit. But can you alert? ● A pretty graph: does it alert?
  10. Low traffic apps. Is it broken, or is it Christmas Day? Anomaly detection can’t be responsive with low volumes. Maybe you can’t alert?
  11. Build-time tests can be used as monitoring. Instead of alerting that the build is broken, alert that prod is broken. Tests lend themselves well to asserting the health of a system. They: ● pass or fail: simple! ● have descriptive names: helpful! ● can have helpful errors, which helps incident response.
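
The three properties on slide 11 (pass or fail, descriptive names, helpful errors) map directly onto an alert. A rough sketch, assuming tests signal failure by throwing; the test names, the `PROD BROKEN` prefix and the `MonitorRun` class are made up for illustration:

```java
import java.util.List;

/** Sketch: turn test outcomes into alert lines. Names and alert format are hypothetical. */
final class MonitorRun {
    /** A test body signals failure by throwing, as a build-time test would. */
    interface TestBody { void run() throws Exception; }

    /** Outcome of one test: its descriptive name, pass/fail, and the error text if it failed. */
    record Result(String name, boolean passed, String detail) {}

    static Result execute(String name, TestBody body) {
        try {
            body.run();
            return new Result(name, true, "");
        } catch (AssertionError | Exception e) {
            return new Result(name, false, String.valueOf(e.getMessage()));
        }
    }

    /** One alert line per failed test, ready to hand to whatever does the paging. */
    static List<String> alerts(List<Result> results) {
        return results.stream()
                .filter(r -> !r.passed())
                .map(r -> "PROD BROKEN: " + r.name() + " - " + r.detail())
                .toList();
    }

    public static void main(String[] args) {
        List<Result> results = List.of(
                execute("searchReturnsResultsForKnownProduct", () -> {}),
                execute("productDetailsPageShowsPrice", () -> {
                    throw new AssertionError("expected a price but the field was empty");
                }));
        alerts(results).forEach(System.out::println);
    }
}
```

The test name and assertion message are all the on-call engineer sees at first, so they are worth writing with incident response in mind.
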
  12. REDUCE (Image: by Juergen Rosskamp, wiki+spam@eindruckschinderdomain.de, digital still picture, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=1877742)
  13. Know the impact of an issue. Diagram labels: Monitoring/Alerting, GET /healthcheck, Your App, GET /ping, App 1, App 2, select * from dual;, Your App’s DB; plus Search Results Test, Product Summary Test, Product Details Test.
  14. Supplementing a canary release. Diagram labels: Real Clients/Users, Load Balancer, Old Version, New Version, Acceptance Tests as Monitors.
  15. Real availability stats. Check the production site’s homepage? Check the functionality!
  16. Real availability stats. Diagram labels: Your App; Acceptance Tests as Monitors (Search Results Test, Product Summary Test, Product Details Test); Pingdom.
  17. Background. Website for a media company with login. Java code and Java tests. Very concerned about site availability. We were sold. How do we do it?
  18. What to test? 1. Login with correct username/password creates a session. 2. Login with incorrect username/password gives an error. Test user: user.for.test.monitoring.membership-team@thecompany.com
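
The two login checks on slide 18 could look roughly like this. The `LoginClient` interface is hypothetical (a real one would make HTTP calls to the site), and an in-memory stub with a made-up password stands in for production so the sketch runs on its own:

```java
import java.util.Optional;

/** Sketch of the two login checks. LoginClient and the stub are hypothetical. */
final class LoginMonitorTests {
    /** Hypothetical client for the site under test. */
    interface LoginClient {
        /** Returns a session id on success, empty when the credentials are rejected. */
        Optional<String> login(String username, String password);
    }

    static void loginWithCorrectCredentialsCreatesSession(LoginClient client, String user, String password) {
        if (client.login(user, password).isEmpty()) {
            throw new AssertionError("Expected a session for " + user + " but login was rejected");
        }
    }

    static void loginWithIncorrectCredentialsIsRejected(LoginClient client, String user) {
        if (client.login(user, "definitely-not-the-password").isPresent()) {
            throw new AssertionError("Expected an error for a wrong password but a session was created");
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in so the sketch runs without a real site.
        LoginClient stub = (u, p) -> p.equals("s3cret") ? Optional.of("session-1") : Optional.empty();
        String user = "user.for.test.monitoring.membership-team@thecompany.com";
        loginWithCorrectCredentialsCreatesSession(stub, user, "s3cret");
        loginWithIncorrectCredentialsIsRejected(stub, user);
        System.out.println("both login checks passed");
    }
}
```

A dedicated, clearly named test account like the one on the slide makes these logins easy to recognise in production data.
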
  19. How do we do it? First thought: “Let’s run (some of) our acceptance tests from Jenkins against production.” Not good: ● Does your CI server get as much love as production in your org?
  20. How do we do it? Second thought: “Let’s run our tests from Nagios (monitoring server) in production.” Not good: ● Nagios (and its many forks/variants) relies on quick, simple checks. ○ We ended up with chaos! ● Harder to deploy.
  21. How do we do it? Third thought: “Let’s just treat our acceptance tests as another microservice in production.” Great: ● Treated like any other service: ○ Development ○ Deployment ○ Monitoring
  22. atam4j: Acceptance Tests As Monitors 4 Java. Anurag Kapur (@anuragkapur) and I created atam4j: https://github.com/atam4j
  23. Diagram labels: Monitoring/Alerting (Prometheus, Dashing, Nagios etc.), GET /tests, atam4j (Dropwizard microservice), Run Your Tests, Your App.
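
The shape on slide 23 (a service that runs the tests on a schedule and answers GET /tests with the latest outcome) can be sketched with the JDK alone. This is not atam4j's API or implementation: the class, the status names, the port and the 60 second interval are all assumptions, and the JDK's built-in `HttpServer` stands in for Dropwizard.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of "tests as a microservice". Not atam4j; every name here is hypothetical. */
final class TestsAsService {
    enum Status { NOT_RUN_YET, PASSING, FAILING }

    private final AtomicReference<Status> latest = new AtomicReference<>(Status.NOT_RUN_YET);

    /** Runs the suite once and records the outcome; anything thrown counts as a failure. */
    void runOnce(Runnable suite) {
        try {
            suite.run();
            latest.set(Status.PASSING);
        } catch (Throwable t) {
            latest.set(Status.FAILING);
        }
    }

    Status latest() {
        return latest.get();
    }

    /** Serves the latest status on GET /tests so the monitoring system can poll it. */
    HttpServer serve(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/tests", exchange -> {
            byte[] body = latest().name().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (var out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        TestsAsService service = new TestsAsService();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical suite: replace the empty lambda with a call into your acceptance tests.
        scheduler.scheduleWithFixedDelay(() -> service.runOnce(() -> {}), 0, 60, TimeUnit.SECONDS);
        service.serve(8080);
    }
}
```

Keeping the test run separate from the HTTP request means a poll from the monitoring system returns immediately with the last known result, which suits tools that expect quick, simple checks.
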
  24. Three big issues. Data ● Does it pollute? Events ● How significant is the action? Technically ● Where and how will they run?
  25. Demo time. What could possibly go wrong?
  26. Solving the test data and event issues. Data ● Can you hide it? ● Mark it as test data ○ Sometimes you might need to add extra fields: “testData”: true. Events ● How far should events propagate? ○ You might want to cut them short ○ Talk to the downstream systems consuming the data ● Always exclude test events when monitoring real transactions.
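
Once events carry a flag like the “testData”: true field from slide 26, excluding them from business metrics is a one-line filter. A small sketch, assuming a hypothetical `LoginEvent` shape and made-up usernames:

```java
import java.util.List;

/** Sketch: keep monitor-generated events out of real-traffic metrics. Names are hypothetical. */
final class TestDataFilter {
    /** Hypothetical event shape; testData mirrors the "testData": true field. */
    record LoginEvent(String username, boolean testData) {}

    /** Counts only logins made by real users, ignoring those made by the monitoring tests. */
    static long countRealLogins(List<LoginEvent> events) {
        return events.stream().filter(event -> !event.testData()).count();
    }

    public static void main(String[] args) {
        List<LoginEvent> events = List.of(
                new LoginEvent("alice@example.com", false),
                new LoginEvent("user.for.test.monitoring.membership-team@thecompany.com", true),
                new LoginEvent("bob@example.com", false));
        System.out.println("real logins: " + countRealLogins(events));
    }
}
```

Without this filter a low-traffic app would always show some activity from the monitors themselves, hiding exactly the outage the user-activity graph was meant to reveal.
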
  27. atam4j: new features ● Prometheus support? Not yet, but it is on a branch (WIP). ● Atam4Node? Considering it!
  28. Conclusion. It can have benefits. The cost/benefit will depend on your domain. If you do it: ● treat the tests as much like your other tests as possible ● treat the deployment and monitoring as much like your other apps as possible. Tools will help, but the hardest problems are specific to your domain.