With more than 3.2 million customers and a vastly complex tech landscape, Virgin Money's IT team faces huge pressure to provide the ultimate digital banking experience. In this candid Q&A session, Andy Lofthouse will dive into the company's journey from alert storms and countless hours of problem hunting to rapid release cycles and precise digital experience insights, a shift that has saved the company considerable time and money.
8. Why Dynatrace – first impressions (confidential)
• Smartscape - vertical and horizontal topology
• Understand which services, hosts or processes are talking to each other
• Understand which services, processes and hosts are providing the application, directly or indirectly
• No configuration and easy deployment!
• Nodes highlight red if in a current problem
• Quick drill down to the desired component
9. Quick time to value
Problem
• Not repeatable in Test and cannot be troubleshot with current tooling
• After months of investigation and customers being impacted, the root cause of the issue cannot be found
Impact
• Issue causes severe slowdowns and timeouts for users, eventually needing a manual failover to the DR site
• Operations team misled on their investigation path by current alerting
Consequences
• Poor customer experience drives poor conversion rates
Recurring issue for months
479 hours lost in War-room up to today
6 Virgin Money teams and one 3rd party were involved
Happening more frequently
Has cost so far £23,950
Brand reputation impacted by bad tweets
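The two headline figures on this slide (479 hours and £23,950) imply an average cost per war-room hour. A minimal sketch of that arithmetic follows. The hourly rate is not stated on the slide; it is only what the two quoted numbers imply, and the function name is illustrative.

```python
def implied_hourly_cost(total_cost_gbp: float, hours_lost: float) -> float:
    """Average cost per hour implied by a total cost and an hour count."""
    if hours_lost <= 0:
        raise ValueError("hours_lost must be positive")
    return total_cost_gbp / hours_lost


# Figures quoted on the slide.
WAR_ROOM_HOURS = 479
COST_SO_FAR_GBP = 23_950

# Derived value (an inference, not a figure from the slide).
rate = implied_hourly_cost(COST_SO_FAR_GBP, WAR_ROOM_HOURS)
print(f"Implied cost per war-room hour: £{rate:.2f}")
```

On these inputs the implied rate works out to £50 per hour, which suggests the cost figure was derived from the hours lost at a flat rate rather than measured separately.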
10. First 2 weeks - Incidents & Alerting
Foglight Alerts - 128
• 61% of them were false alerts
• 39% of them were genuine issues.
• Out of that 39%, half were duplicate alerts
• Only 26 were real once duplicates and false alerts were taken out
Dynatrace Problem Resolution - 100
• 42% said problem resolved.
• Leaving 58% which were genuine
• 100% accurate!
Noise caused by poor alerting + poor troubleshooting + no root-cause analysis = 479 hours of investigation.
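The Foglight percentages above can be turned back into approximate alert counts. The sketch below assumes the quoted percentages are rounded and apply to the full 128 alerts; the function and key names are illustrative, not part of any tool.

```python
def alert_funnel(total: int, genuine_share: float, duplicate_share: float) -> dict:
    """Split a total alert count into false, genuine and unique genuine alerts."""
    genuine = total * genuine_share
    unique_genuine = genuine * (1 - duplicate_share)
    return {
        "false": total - genuine,
        "genuine": genuine,
        "unique_genuine": unique_genuine,
    }


# Figures quoted on the slide: 128 alerts, 39% genuine, half of those duplicates.
TOTAL_ALERTS = 128
funnel = alert_funnel(TOTAL_ALERTS, genuine_share=0.39, duplicate_share=0.5)
for stage, count in funnel.items():
    print(f"{stage}: {count:.1f}")

# The slide reports 26 real alerts; use that figure for the signal ratio.
REPORTED_REAL_ALERTS = 26
print(f"Share of alerts worth acting on: {REPORTED_REAL_ALERTS / TOTAL_ALERTS:.0%}")
```

The percentages give roughly 78 false alerts, 50 genuine alerts and 25 unique genuine alerts, one fewer than the 26 reported, which is consistent with the percentages having been rounded. Either way, only about one alert in five was worth acting on.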
11. First value we saw
• Database CPU every night between 8 and 9pm
• Peak login times
• Couldn’t see this issue previously
Response time slowdown