
Self Healing - Bringing Intelligence into Automation



by Mohanbabu Nellore, Director, Visa Inc & Arup Datta, Principal Engineer, Swiggy

Published in: Technology


  1. Self Healing - Bringing Intelligence into Automation. Mohan Babu Nellore, Director of Engineering, Visa; Arup Datta, Principal Engineer, Swiggy
  2. Slide for legal disclaimer
  3. Test Engineering… Productivity Challenges? Quick list…
  4. Employee Satisfaction Troubleshooting E2E Integration Environment Effective Communication Tools Automation Tools Failed Tests Analysis KPI Metrics/Data Requirements
  5. Environments! Any Challenges? Small story… Dev/QA, Regression, Pre-Prod
  6. How do we solve this? Self Healing Automation. What is it?
  7. Application Health Check • Logs • Monitoring • Alerts • Email Notification. System PING, Database, Queues • Listeners • CRUD Operations • Messaging Queues.
     Stages in the workflow:
     1. Ping test
     2. Application health-check
     3. Listeners health-check
     4. E2E flow with database check
     5. Mutual API response check
     6. Log collection
     7. Mail notification
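The check stages of the workflow on slide 7 can be sketched as an ordered pipeline that stops at the first failing stage, which is one way to surface the exact point of failure. This is a minimal sketch, not the presenters' implementation: the `Stage` class, the `run_workflow` function, and the placeholder lambdas are all hypothetical, and stopping at the first failure is an assumption. Real checks (ping, health endpoints, database queries) would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    """One step of the workflow (hypothetical helper, not from the slides)."""
    name: str
    check: Callable[[], bool]  # returns True when the stage is healthy


def run_workflow(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order.

    Returns (names of stages that passed, name of the first failed stage or None).
    """
    passed: List[str] = []
    for stage in stages:
        try:
            ok = stage.check()
        except Exception:
            ok = False  # a check that raises is treated as a failed stage
        if not ok:
            return passed, stage.name
        passed.append(stage.name)
    return passed, None


# Placeholder checks; the third one simulates a broken listener.
stages = [
    Stage("ping test", lambda: True),
    Stage("application health-check", lambda: True),
    Stage("listeners health-check", lambda: False),
    Stage("E2E flow with database check", lambda: True),
]

passed, failed_at = run_workflow(stages)
print("passed:", passed)
print("failed at:", failed_at)
```

In this sketch, log collection and mail notification (stages 6 and 7) would be triggered by the caller whenever `failed_at` is not `None`.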
  8. Self-healing Automation • Dependency graph • Health-check APIs • APIs to start/stop/restart services & applications
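A dependency graph like the one named on slide 8 can drive the order in which components are checked or restarted. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The service names and the `DEPENDENCIES` mapping are invented for illustration, and `check_order` and `affected_by` are hypothetical helpers, not something the slides describe.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical graph: each service maps to the services it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "database": set(),
    "message-queue": set(),
    "orders-api": {"database", "message-queue"},
    "listener": {"message-queue"},
    "e2e-tests": {"orders-api", "listener"},
}


def check_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order services so that every dependency comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(deps: Dict[str, Set[str]], failed: str) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    affected: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for service, needs in deps.items():
            if service in affected:
                continue
            if failed in needs or needs & affected:
                affected.add(service)
                changed = True
    return affected


print(check_order(DEPENDENCIES))
print(sorted(affected_by(DEPENDENCIES, "message-queue")))
```

Checking in dependency order means a failure in a producer is reported before its consumers are probed, and `affected_by` gives the set of consumers that may need a restart afterwards.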
  9. Functionalities
     • Identify dependencies: producers and consumers.
     • Ensure server and application health monitoring/alerts are set up.
     • Ensure all the dependent components are up and running.
     • Restart services/applications automatically if they are not working as desired.
     • If the restart does not help, send out an email notification with logs attached.
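The restart-then-notify behaviour on slide 9 can be sketched as a small loop: check health, restart a limited number of times, and escalate with logs only if the service stays down. Everything here is an assumption for illustration: the `heal` function, its callback parameters, the `max_restarts` limit, and the fake service below are hypothetical. In practice the callbacks would wrap the health-check and start/stop/restart APIs and an email sender.

```python
from typing import Callable, Dict, List


def heal(service: str,
         is_healthy: Callable[[str], bool],
         restart: Callable[[str], None],
         collect_logs: Callable[[str], List[str]],
         notify: Callable[[str, List[str]], None],
         max_restarts: int = 2) -> str:
    """Return 'healthy', 'healed', or 'escalated' for one service."""
    if is_healthy(service):
        return "healthy"
    for _ in range(max_restarts):
        restart(service)
        if is_healthy(service):
            return "healed"
    # Restarts did not help: hand over to a human, with logs attached.
    notify(service, collect_logs(service))
    return "escalated"


# Fake service that comes back after a single restart.
state: Dict[str, bool] = {"queue-listener": False}
restarts: List[str] = []
sent: List[str] = []


def fake_restart(name: str) -> None:
    restarts.append(name)
    state[name] = True


result = heal(
    "queue-listener",
    is_healthy=lambda name: state[name],
    restart=fake_restart,
    collect_logs=lambda name: [f"{name}: no log lines in this demo"],
    notify=lambda name, logs: sent.append(name),
)
print(result, restarts, sent)
```

Capping the number of restarts keeps a permanently broken service from being restarted in a loop, so the notification path is always reached.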
  10. Sample Effort Savings Calculations
  11. Key pointers
     • Use a human-centered design approach to solve the problem.
     • Identify multiple metrics to assess training and monitoring.
     • When possible, directly examine your raw data.
     • Understand the limitations of your dataset and model.
     • Test, test and test.
     • Continue to monitor and update the system after deployment.
  12. Impact on the team
     • Improved uptime, with largely no manual intervention or debugging.
     • Know the exact point of failure for downtime.
     • Consistent test results without false positives.
     • Drastic productivity boost by avoiding manual debugging.
     • Extensibility for reliability/failover testing -> graceful handling.
     • We have time for solving bigger problems!
  13. Thank You! @mohanbn Q & A