
Bringing observability to your testing environments



A talk at European Testing Conference 2019 on how to bring observability principles, commonly used in production environments, to your testing environments with ease.



  1. @fgortazar Bringing production observability to your testing environment. Patxi Gortázar. Fundación Bancaja, Valencia, Spain, 14-15 February 2019. Funded by the European Union. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
  2. Some background
  3. (image-only slide)
  4. (image-only slide)
  5. (image-only slide)
  6. How’s the testing process?
  7. How’s the testing process? 01 Testing activities: automated or manual testing activities; collect information about the running components during the test run. 02 Bug localization (aka root-cause analysis): find the faulty component(s) from the information provided by the failed test. 03 Fixing: fix the error on the faulty component and any other components that might be affected by these changes; submit the changes and restart the process.
  8. (repeat of the previous slide)
  9. How hard is bug localization?
  10. How hard is bug localization? “I’m not a huge fan of triangles or pyramids but there is an interesting relationship between test levels, understanding the impact of a problem and understanding the cause of the problem” Robert Meaney (@RobMeaney)
  11. Problem: Not enough information!!
  12. Test feedback
  13. Tests feedback
  14. Test logs
  15. Solution: Observability + Analytics!!
  16. “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” (Wikipedia)
  17. This is what we usually do in production
  18. Source: honeycomb.io
  19. The four analytics levels. 1. Descriptive: what has happened? A test failed. 2. Diagnostic: why did it happen? A request timed out. Why? The service was down. Why? (narrow the search down) 3. Predictive: what will happen? The test will fail because requests are taking longer and longer. 4. Prescriptive: what should I do? Stop the test and re-run it to discard infrastructure problems: if it passes, mark it as flaky/an infra problem; if it fails, report the error.
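The prescriptive step on this slide (re-run a failed test to discard infrastructure problems) can be sketched in a few lines. `run_test` and the return labels are illustrative placeholders, not part of the talk's tooling:

```python
def classify_failure(run_test, retries=1):
    """Prescriptive-analytics sketch: re-run a failed test.

    If any retry passes, the failure is likely flaky or an
    infrastructure problem; if every retry fails as well, report
    it as a real error. `run_test` is assumed to be a callable
    returning True when the test passes.
    """
    for _ in range(retries):
        if run_test():
            return "flaky-or-infra"
    return "real-failure"
```

In practice this classification would be attached to the test report, so the nightly triage mentioned a few slides later needs no manual inspection.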
  20. The four analytics levels. Most of the common tools we use for testing and CI belong to level 1: they are basically descriptive and don’t help in bug localization. Understanding why the product failed under a specific test scenario is a tough exercise.
  21. The four analytics levels. We used to spend 30-60 minutes per day investigating failed nightly tests in order to classify them as true or false positives.
  22. Now we have a plan
  23. Collect info about your system: ● SUT logs ● SUT health status ● Infrastructure status ● Network traffic ● Resource consumption ● Custom events/metrics
  24. Elasticsearch, Kibana, Logstash
  25. Source: https://logz.io/blog/filebeat-vs-logstash/
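The last item on this slide, custom events/metrics, can be shipped alongside the rest of the collected data. A minimal sketch of shaping such an event as a JSON document ready for indexing; the field names are an assumption for illustration, not a fixed schema:

```python
import json
from datetime import datetime, timezone

def build_test_event(test_name, status, extra=None):
    """Shape a custom test event so it can be indexed next to the
    SUT logs and metrics. Field names are illustrative only."""
    doc = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "kind": "test-event",
        "test_name": test_name,
        "status": status,
    }
    if extra:
        doc.update(extra)  # e.g. custom metrics attached to this event
    return json.dumps(doc)
```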
  26. Recipe (1) ● For each service: use Filebeat to read logs and send them over to Elasticsearch; use a Metricbeat agent to send metrics over to ES; use Packetbeat to send network packets over to ES. ● Send the test logs too, as if the test were just another service; they can be useful! ● Beat agents work with standard processes and with containers!
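A minimal Filebeat configuration in the spirit of this recipe might look like the following sketch; the log path and the Elasticsearch host are assumptions that depend on your deployment:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/my-service/*.log   # adjust to your service's log location

output.elasticsearch:
  hosts: ["elasticsearch:9200"]     # your Elasticsearch endpoint
```

Metricbeat and Packetbeat follow the same output pattern, each with their own input/module configuration.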
  27. Some corner cases. “You cannot keep, and should not try to keep, all the operational spew that issues forth from your systems. Dynamic sampling is your friend here, as are server side limits to lop the top off the mountain when e.g. the site goes down and you get a billion operational msgs at once” @mipsytipsy (Charity Majors)
  28. Ben Sigelman
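The dynamic sampling the quote recommends can be sketched as a per-event keep/drop decision whose rate depends on how interesting the event is. The rates and field names below are illustrative assumptions:

```python
import random

def keep_event(event, error_rate=1.0, noisy_rate=0.01, default_rate=0.2):
    """Dynamic-sampling sketch: always keep errors, heavily sample
    known high-volume events, keep a moderate share of the rest.
    All rates here are illustrative, not recommendations."""
    if event.get("level") == "error":
        rate = error_rate
    elif event.get("high_volume"):
        rate = noisy_rate
    else:
        rate = default_rate
    return random.random() < rate
```

Server-side limits, the quote's other suggestion, would sit in front of this as a hard cap on ingestion volume.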
  29. You don’t want to step into big-data issues: keep a decommission plan for your data
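A decommission plan can be as simple as deleting time-based indices once they fall outside a retention window. Assuming daily indices named like `tests-YYYY.MM.DD` (a common Beats-style pattern; the prefix and retention period are assumptions), a sketch of selecting the expired ones:

```python
from datetime import date, timedelta

def expired_indices(index_names, today, retention_days=14):
    """Return the daily indices (named 'tests-YYYY.MM.DD') that are
    older than the retention window, so they can be deleted.
    Names that don't match the pattern are left alone."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        try:
            day = date(*map(int, name.split("-", 1)[1].split(".")))
        except (IndexError, ValueError):
            continue  # not a dated test index; keep it
        if day < cutoff:
            expired.append(name)
    return expired
```

A scheduled job (or Elasticsearch's own index lifecycle management) would then delete the returned indices.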
  30. Recipe (2) ● Collect any other custom metrics you might need. For OpenVidu: video & audio quality metrics (PESQ, PEVQ, MOS, …), number of participants, number of videoconference sessions (rooms), jitter, WebRTC-specific metrics. ● Build your own dashboards & reports!
  31. Build your own reports!
  32. Grafana, Prometheus, Kubernetes
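On the Prometheus side of this stack, pointing it at your system under test can be as small as one scrape job; the job name and target below are assumptions about your deployment:

```yaml
scrape_configs:
  - job_name: "sut"
    static_configs:
      - targets: ["sut:8080"]   # host:port exposing a /metrics endpoint
```

Grafana dashboards then query Prometheus for these metrics during and after the test run.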
  33. It seems impossible to localize the bug here!!
  34. I need a tool that is aware of my testing activities!
  35. (image-only slide)
  36. (image-only slide)
  37. (image-only slide)
  38. Summary
  39. Summary ● E2E testing is good at detecting user-facing problems, but it does not give much info about what the actual problem is. ● Observability means observing the current status and inferring what’s going on: info should be collected during the test run; visualizations are needed to help in the root-cause analysis; mind the corner cases!
  40. Thanks! Patxi Gortázar
