
Integration Testing as Validation and Monitoring


In the world of software-as-a-service, just about anyone with a laptop and an Internet connection can spin up their very own cloud-based web service. Software startups, in particular, are often big on ideas but small on staff. This makes streamlining the traditional develop-test-integrate-deploy-monitor pipeline critically important. Melissa Benua says that an effective way to accomplish this is to reduce the number of separate test suites that verify many of the same things at each stage. She explains how teams can avoid this duplication by authoring the right set of tests and using the right frameworks. Drawing on lessons learned in companies both large and small, Melissa shows how teams can drastically slash time spent developing automation, verifying builds for release, and monitoring code in production, without sacrificing availability or reliability.

Published in: Technology

  1. 1. INTEGRATION TESTING AS VALIDATION AND MONITORING Melissa Benua Senior Backend Engineer PlayFab, Inc STARWEST 2015
  2. 2. The challenge: Monitoring SaaS products
      Software as a service is exploding, and so is testing complexity:
        1. It is no longer enough to run tests at build time; you also need deploy-time integration tests and continuous network monitoring
        2. Every layer of tests adds complexity and maintenance costs
        3. There is a limited number of engineer-hours in the day
        4. Engineers want to use their time with maximum efficiency
      Time spent writing the same tests over again is time that could be spent doing more interesting and important stuff!
  3. 3. EXISTING OPTIONS Commercial products you can buy now!
  4. 4. Cloud Monitoring Services
      Providers:
        • Keynote
        • Gomez
        • Pingdom
      Pros:
        • Lightweight
        • Integrated alerting
        • Public vs. private status pages
      Cons:
        • Difficult to manage multiple contributors
        • Can’t do complex checks easily (log in a user and verify inventory)
        • Can get expensive or require enterprise contracts
  5. 5. Hosted Monitoring Services
      Providers:
        • Sensu
        • System Center Operations Manager (SCOM)
        • Nagios
      Pros:
        • Extremely powerful
        • Older technology
      Cons:
        • Complex to set up
        • Single centralized server
        • Overkill for many services hosted in the cloud
  6. 6. OUR APPROACH Do it the PlayFab way! October 5, 2015 PlayFab Confidential 6
  7. 7. Our Solution
        1. Author one set of HTTP-level tests
            • Same as how clients connect
            • Self-contained and self-initializing
            • Repeatable and reliable
        2. Deploy tests both within the build environment and within the monitoring cloud
        3. Collect data from tests into one central location
        4. Present data for use by both devops and customers
      Pros:
        • Efficient use of engineering resources
        • VM hosting bill is very small
        • Can run complex tests without worrying about maintainability
      Cons:
        • Pipeline requires some maintenance
        • Requires knowing how to use two different clouds
        • Must be able to do test setup from within a different ecosystem
  8. 8. Our solution, cont’d
      Goals:
        • Minimize number of lines of code duplicated per functional piece
        • Reliable and trustworthy reporting
        • Affordable cost
        • Adequate geo-location
        • Very low maintenance time cost
        • Easy to access
        • More free time for engineering!
      Limitations:
        • Smaller number of monitoring leaf nodes (~10 instead of ~100 or ~1000)
        • Vulnerable to gaps in dev logic
        • Not as straightforward to set up
        • Monitoring is only as good as your testing!
  9. 9. TESTING SCENARIOS One of these may look familiar!
  10. 10. Scenario A – RESTful API
      Sample characteristics:
        • Custom service in Java layered on Apache
        • Private hosting
        • Tests via JUnit
        • Authenticates using private login
        • Connects to several different backend services (MongoDB, SQL, analytics, queueing, etc.)
  11. 11. Scenario B – MVC Website
      Sample characteristics:
        • Built on .NET MVC
        • Hosted in Azure
        • Testing via custom harness
        • Authenticates using OAuth and Facebook
        • Backed by a locally hosted SQL Server
  12. 12. Scenario C – PlayFab
      Characteristics:
        • JSON API built on C# + management website
        • Hosted in Windows on AWS
        • Tests via VSTest
        • Many moving parts
        • Game server hosting
        • Client versus server authentication
        • Third-party purchasing and auth providers
        • Various backend data sources
  13. 13. IMPLEMENTING OUR SOLUTION How to wire up the pipeline!
  14. 14. Architecture
      [Diagram labels: Developer (writes tests, submits code); Build Server (compiles code, runs tests); Production (deploys); Web Server (collects data); Web Site (displays data); regions Europe, US-West, US-East, Asia; Microsoft Azure; Amazon Web Services]
  15. 15. Utilized Tech
      Test Framework
        • VSTest or JUnit or custom executor
        • Must output a predictable, machine-readable format (.TRX from VSTest comes with an XSD for easy parsing)
      Execution + Communication Layer
        • Consul or custom cross-DC chatter
        • Consul API is available in many languages, easy to secure, and simple to configure
        • Regularly executes the test executable
        • Shares test results as ‘service health checks’ across DCs
      Custom Data Bridge
        • Transforms test framework output into Consul input
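A minimal sketch of what the custom data bridge might look like, assuming the test framework is VSTest and the results arrive as a .trx file. The mapping table, the function name `trx_to_checks`, and the sample test names are hypothetical; the slides do not show the actual bridge. Only `Passed` and `Failed` are mapped explicitly, and any other outcome is treated as a warning, which is an assumption rather than the presenter's rule.

```python
import xml.etree.ElementTree as ET

# XML namespace used by VSTest .trx result files
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}

# Hypothetical mapping from TRX outcomes to Consul check states
OUTCOME_TO_STATUS = {"Passed": "pass", "Failed": "fail"}


def trx_to_checks(trx_text):
    """Turn TRX XML text into a list of {name, status, note} dicts."""
    root = ET.fromstring(trx_text)
    checks = []
    for result in root.findall("t:Results/t:UnitTestResult", TRX_NS):
        outcome = result.get("outcome", "")
        checks.append({
            "name": result.get("testName"),
            # Anything that is neither Passed nor Failed becomes a warning
            "status": OUTCOME_TO_STATUS.get(outcome, "warn"),
            # The slides store latency in the note; duration is used here
            "note": "duration=" + result.get("duration", "unknown"),
        })
    return checks


# Illustrative sample input, not taken from a real test run
SAMPLE_TRX = """<TestRun xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010">
  <Results>
    <UnitTestResult testName="LoginWorks" outcome="Passed" duration="00:00:00.2500000" />
    <UnitTestResult testName="InventoryLoads" outcome="Failed" duration="00:00:01.1000000" />
  </Results>
</TestRun>"""

if __name__ == "__main__":
    for check in trx_to_checks(SAMPLE_TRX):
        print(check["name"], check["status"], check["note"])
```

Each resulting dict carries the three pieces of data a Consul check update needs: which check, which state, and an optional note.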
  16. 16. Picking Monitoring Tests
      [Diagram labels: Full App Integration Test Suite; Internal Service A Test Suite; Library Unit Test Suite; Integration Suite; Internal Service B Test Suite; Integration Suite]
  17. 17. Picking Monitoring Tests, cont’d
      Must-haves:
        • Happen at the same layer clients access (HTTP, generally)
        • Cover key ‘P0’ functionality areas
        • Cover areas with lots of ‘moving parts’
      Nice-to-haves:
        • All exposed APIs
        • Third-party integrations
        • Full success-testing run
      Ideal world:
        • Full integration test suite
  18. 18. Scenario Must-Have Test Cases
      REST API:
        • Login/Authenticate
        • Logout
        • One test per downstream service
        • Stretch: one test per API
      MVC Website:
        • One test per login method (OAuth, Facebook)
        • Key pages
        • Basic SQL coverage
  19. 19. Deployment Pipeline
      The fewer manual steps the better!
      Sample flow:
        1. Submit code to repo
        2. CI runs build
        3. CI runs tests
        4. Deployment packages created
        5. Tests deployed into monitor cloud storage
        6. Cloud storage distributes to VMs
  20. 20. Monitoring Cloud Any cloud will do! Number of regions is important • Azure has • AWS has VMs can be teeny – no need for heavy compute or memory usage
  21. 21. Test Execution Frequency
      How complex is it to run your tests?
        • Run a simple executable?
        • Have to download a lot of data?
        • Long setup phase?
        • How long does a full test pass take?
      Periodic execution (every N seconds)
        • Faster is better!
        • Pingdom ‘free’ tier is every 15 minutes per check
        • Ideal range is between 30 seconds and 5 minutes
      Be careful not to drown your ‘real traffic’
        • Test traffic hiding problems with real users is a legitimate issue!
        • Try to stay under 10% of total traffic if possible
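The 10% guideline can be turned into a rough lower bound on the execution interval. The sketch below is a back-of-the-envelope calculation, not a formula from the talk; the function name and all input figures are hypothetical. It assumes every monitoring node runs the full test pass once per interval and that "total traffic" means real plus test requests.

```python
def min_interval_seconds(real_rps, requests_per_pass, nodes, max_share=0.10):
    """Shortest interval that keeps test traffic at or below max_share of total.

    real_rps          -- real user requests per second (assumed steady)
    requests_per_pass -- HTTP requests issued by one full test pass
    nodes             -- number of monitoring nodes running the pass
    """
    if real_rps <= 0 or not 0 < max_share < 1:
        raise ValueError("real_rps must be positive and max_share in (0, 1)")
    # test / (test + real) <= share  =>  test <= real * share / (1 - share)
    max_test_rps = real_rps * max_share / (1.0 - max_share)
    return nodes * requests_per_pass / max_test_rps


if __name__ == "__main__":
    # Hypothetical service: 9 real requests/s, 20 requests per pass, 10 nodes
    print(round(min_interval_seconds(9, 20, 10)), "seconds between passes")
```

With these made-up numbers the bound is about 200 seconds, which sits inside the 30-second to 5-minute range suggested on the slide. A busier service could afford a shorter interval.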
  22. 22. Collecting Results
      Execute tests
      Put machine-readable test results into collator
        • Consul accepts Datacenter, CaseName, Pass/Warn/Fail, Note (we store latency)
        • Agents may be updated using the SDK or directly through the HTTP interface
        • Example: http://localhost:8500/v1/agent/check/pass/mytestcase
        • Full HTTP API:
      Small adapter program reads test results and outputs to Consul Agent (SDK or HTTP)
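A small sketch of the HTTP side of such an adapter, assuming a local Consul agent on port 8500 and a check that was registered beforehand. The helper names `check_update_url` and `report` are hypothetical. Current Consul documentation specifies `PUT` for these endpoints, so the sketch uses that verb; the agent version used in the talk may have been more lenient.

```python
import urllib.request
from urllib.parse import quote, urlencode

AGENT = "http://localhost:8500"  # assumed local Consul agent address


def check_update_url(check_id, status, note=None, agent=AGENT):
    """Build the agent URL that marks a check as pass, warn, or fail."""
    if status not in ("pass", "warn", "fail"):
        raise ValueError("status must be pass, warn, or fail")
    url = f"{agent}/v1/agent/check/{status}/{quote(check_id, safe='')}"
    if note:
        # The note travels as a query parameter, e.g. measured latency
        url += "?" + urlencode({"note": note})
    return url


def report(check_id, status, note=None):
    """Send the update to the agent; requires a running Consul agent."""
    req = urllib.request.Request(
        check_update_url(check_id, status, note), method="PUT"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    print(check_update_url("mytestcase", "pass", note="latency=120ms"))
```

Keeping URL construction separate from the network call makes the adapter easy to test without an agent running.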
  23. 23. Output!
  24. 24. Alerting
      Ideal to hear about outages as a push rather than a pull
      Determine what ‘failure’ means to you
        • Balance between false alarms and missing real alarms
      Many options!
        • Post alerts into VictorOps for paging
        • Send email from monitoring website
        • Send push notification through your cloud
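One common way to balance false alarms against missed outages is to alert only after several consecutive failures of the same check. The sketch below illustrates that idea; it is not the policy described in the talk, and the class name and the threshold of 3 are hypothetical.

```python
class FailureTracker:
    """Signals an alert when a check fails `threshold` times in a row."""

    def __init__(self, threshold=3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.streak = 0

    def record(self, passed):
        """Record one result; return True only when the streak first hits the threshold."""
        if passed:
            self.streak = 0
            return False
        self.streak += 1
        # Fire once per outage rather than on every subsequent failure
        return self.streak == self.threshold


if __name__ == "__main__":
    tracker = FailureTracker(threshold=3)
    for result in [False, False, True, False, False, False, False]:
        if tracker.record(result):
            print("page the on-call engineer")
```

A lower threshold pages sooner but reacts to transient blips; a higher one is quieter but delays detection by roughly the threshold times the execution interval.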
  25. 25. Questions? Melissa Benua
  26. 26. APPENDIX Technical Details and Sample Config
  27. 27. Partial Consul Configuration
      {
        "datacenter": "prd-uswest1",
        "retry_join_wan": [ "", "" ],
        "server": true,
        "service": {
          "name": "pfmonitor",
          "checks": [
            {
              "script": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -file c:\\runtests.ps1",
              "interval": "120s"
            }
          ]
        }
      }
  28. 28. Consul Commands
      Full HTTP API:
      Add a health check:
        $body = '{ "ID": "mypath", "Name": "Path Works", "Notes": "Checking uptime and latency", "HTTP": "", "TTL": "45s" }'
        • Invoke-WebRequest http://localhost:8500/v1/agent/check/register -Method Put -Body $body
      List the health checks:
        • Invoke-WebRequest http://localhost:8500/v1/health/checks/myservice
        [ { "Node": "somenode", "CheckID": "mypath", "Name": "Path Works", "Status": "passing" } ]
  29. 29. Consul Commands
      Update a health check:
        • Can add ?note=foo to pass details like latency
        • Invoke-WebRequest http://localhost:8500/v1/agent/check/pass/mypath -Method Put
        • Invoke-WebRequest http://localhost:8500/v1/agent/check/warn/mypath -Method Put
        • Invoke-WebRequest http://localhost:8500/v1/agent/check/fail/mypath -Method Put