Standard Bank Group
Agile, DevOps, Engineering
transformation.
DevOps & AppDynamics in a Complex Banking Environment.
1,221
Branches
8,800
ATMs
Operations in 20
Countries
Est. 154
years ago
14,9 million
Retail Customers
Over 44,000
Staff
A Broad & Complex Footprint
Our vision is to be the leading financial services organisation
in, for and across Africa, delivering exceptional client
experiences and superior value
Vision & Strategy
Environment where people can collaborate and innovate
Engineering Culture
Innovative Engineering Practices
Digital Strategy
Unified Customer Experience
Digital Strategy…
Unified Customer Experience
Mobile, Web, USSD, IVR, ATM, POS & Branch
Staff Self Service, Staff Assist and Self Service
Monolith to Domain Services
Build Pipelines – MVP2
FREQUENCY QUALITY RELIABILITY SECURITY
Code into trunk
Deploys into test
Days since last
prod deploy
Lint
Complexity
Customer
feedback
Failed deploys %
Up-time (% 500 errors)
Coverage
Health checks
TESTING
Checkmarx
SSL Scans
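A minimal sketch of how three of the pipeline measures named on this slide (failed deploys %, days since last prod deploy, % 500 errors) could be computed. The `Deploy` record, its field names and the "test"/"prod" labels are assumptions made for illustration; the slides do not say how the metrics are actually gathered.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deploy:
    day: date
    environment: str  # hypothetical labels: "test" or "prod"
    succeeded: bool


def failed_deploy_pct(deploys):
    """Percentage of deploys that failed (0.0 when there are none)."""
    if not deploys:
        return 0.0
    failed = sum(1 for d in deploys if not d.succeeded)
    return 100.0 * failed / len(deploys)


def days_since_last_prod_deploy(deploys, today):
    """Days since the most recent successful prod deploy, or None if there was none."""
    prod_days = [d.day for d in deploys if d.environment == "prod" and d.succeeded]
    if not prod_days:
        return None
    return (today - max(prod_days)).days


def server_error_pct(status_codes):
    """Share of HTTP responses in the 5xx range, as a percentage."""
    if not status_codes:
        return 0.0
    errors = sum(1 for s in status_codes if 500 <= s <= 599)
    return 100.0 * errors / len(status_codes)
```

Keeping each measure as a small pure function makes it easy to feed from whatever tool holds the raw data and to test without a live pipeline.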
Build Pipelines – MVP2…
Test Coverage – Portfolio
Test Coverage – Feature Teams
Trunk Based Development – Feature Teams
Some History
The Challenges
26 February 2015 Mobile Banking app was down for the whole day!
The Status Quo
If you had an outage then…
1. Invited to the crisis room
2. No access to production
3. No access to the tools being used
4. Tools turned off because it impacts production
5. Yet you need to be able to tell people what is wrong
6. Changes being backed out
7. No clear root cause
!!
The Fallout
The Challenges
• 27 February 2015 Meeting Group CIO to explain what happened
• No RCA
• Devs treated as Second Class Citizens when it comes to prod!
• We needed to do something; doing nothing and leaving things as they
were was not an option…
The Journey to a Solution
Discovery
• How did we monitor applications in the past?
• There is monitoring, but it is not available to the community…
• We tried to use existing tools…
• Why APM?
• Looked at alternatives
• Convince the right people
The Journey to a Solution…
7 Key Requirements
1. Always running in production
2. Easy to deploy and use
3. “I do not want to call the vendor!”
4. DevOps enabler
5. For everybody to use
6. Code drill-down capability
7. Auto discovery
AppDynamics - Standard Bank’s APM Solution
How can we remove
impediments and deliver quality
software to our customers?
Rest of Africa
Nigeria – Internet Banking
• Performance issues
• Login takes 30 Seconds
• Slow responses in services layer
• Developers spend 3 – 4 hours per day looking for/in logs
• It wasn't me…
• Login issue traced to core banking system
• A week later a patch was received; logins reduced to less than 3 seconds
• Developers build features instead of hunting logs and bugs
South Africa
USSD - 2016
• Project ran for more than 2 years
• Complex stack – so many layers…
• Performance issues everywhere
• Pinpoint where the performance issues and errors are
• Metrics to help drive decisions
• Finally in production 26 October 2016
You can prevent outages and reduce
the time to fix production issues…
• MTBF – We can prevent bad customer experience
South Africa – Mobile Banking
• Alert was triggered that the error rate was higher than usual
• Investigation found one node in the cluster not working
• 25% of our customers were experiencing timeouts
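The kind of check behind "error rate was higher than usual" can be sketched as a comparison against a baseline. This is an illustrative sketch, not AppDynamics' implementation: the three-sigma threshold, the function names and the per-node data shape are all assumptions.

```python
from statistics import mean, stdev


def error_rate(errors, requests):
    """Fraction of requests that failed (0.0 when there was no traffic)."""
    return errors / requests if requests else 0.0


def is_abnormal(current, baseline, sigmas=3.0):
    """True when `current` exceeds the baseline mean by more than `sigmas` standard deviations."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return current > threshold


def failing_nodes(per_node, max_rate=0.5):
    """Names of nodes whose own error rate exceeds `max_rate`.

    `per_node` maps a node name to an (errors, requests) tuple.
    """
    return sorted(
        name
        for name, (errors, requests) in per_node.items()
        if error_rate(errors, requests) > max_rate
    )
```

As a usage example, assume a hypothetical four-node cluster with evenly balanced traffic: if one node fails every request, the overall error rate sits near 25%, far above a baseline of around 1%, and `failing_nodes` points at the single node responsible.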
• MTTR – We can solve problems… quicker
South Africa – Internet Banking
• April 2016: an outage was reported on Internet Banking
• 5 minutes into the outage we could pinpoint the problem
• Issue in our Adaptive Risk system
• Issue isolated and service was restored
• No need for a crisis meeting (How boring…)
• Responsible team dealt with the issue
• Service was restored quickly
• MTTR – We can solve problems… quicker
South Africa – Mobile Banking
• Alert triggered due to slow responses for transaction logging
• Not impacting customer experience yet…
• Connection pools started to fill up
• Customers experiencing slow performance
• Pinpointed that the issue was on a message queue
• We could proactively fix the issues and restart the broker
• Response times improved and service returned to normal
Enable feature teams to do dev and
ops
• Visibility on errors and exceptions
• Metrics on how code is performing
• I did not know the code was doing that…in production
• Pinpoint where the issues are
• Better quality code going into production (Engineering practices)
• Alerts when things go wrong
• Have visibility without having to log on to the server
• You can even monitor certificate expiry
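Certificate expiry monitoring, the last item above, can be sketched with the Python standard library alone. This illustrates the idea rather than how the APM tool does it; the function names and the 30-day warning window are assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Whole days between `now` (timezone-aware) and a certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the notAfter field from the certificate a server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(not_after, now, warn_days=30):
    """True when the certificate expires within `warn_days` days (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days
```

In use, a scheduled job would call `fetch_not_after` for each endpoint and raise an alert when `needs_renewal` returns true. Separating the network call from the date arithmetic keeps the expiry logic testable offline.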
Dev and Ops in a feature team is possible
Dev and Ops in a feature team is possible…
DevOps – What does it mean to Standard Bank?
What does it mean to Standard Bank?
DevOps
• Resilience in our teams, applications and infrastructure
• Visibility to our software and infrastructure
• You build it, you own it
• Automate everything
• Culture
• Enabler to get solutions to our customers quicker and more frequently
DevOps = Development + Operations…
Learnings from Standard Bank
TIPS
Automate
Automate as much as you can
• Recipes to install Controller & Agents
• But why?
• Repeatable & consistent
• Machines are better at repetitive things
• People can focus on value add work
The right people
AppDynamics in the right hands…
• Allow dev and ops teams to deploy the agents
• Do not centralise the deployment of the agents
• Dev and ops teams know their applications best
• Find people that care about their applications
Not everything comes for free…
Out of the box good enough?
• You need to manage your licences
• Put some thought into how you structure your apps
• Business transaction limits
Define minimum criteria
Checklist
• Name business transactions you want to monitor
• Name remote services to identify back ends
• Define health rules
• Set up alerts
• Dashboard per feature team
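One way to make the checklist enforceable is to record each feature team's monitoring setup and report which criteria are still missing. The sketch below is hypothetical: the `TeamMonitoring` structure and its field names are illustrative and do not come from the slides or from any AppDynamics API.

```python
from dataclasses import dataclass, field


@dataclass
class TeamMonitoring:
    team: str
    business_transactions: list = field(default_factory=list)
    remote_services: list = field(default_factory=list)
    health_rules: list = field(default_factory=list)
    alerts: list = field(default_factory=list)
    dashboard: str = ""  # name or link of the team's dashboard, empty if none


def missing_criteria(setup):
    """Return the checklist items a team has not yet satisfied, in checklist order."""
    checks = {
        "named business transactions": bool(setup.business_transactions),
        "named remote services": bool(setup.remote_services),
        "health rules": bool(setup.health_rules),
        "alerts": bool(setup.alerts),
        "team dashboard": bool(setup.dashboard),
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty result means the team meets the minimum criteria; anything else is a to-do list before the application is considered monitored.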
DevOps & AppDynamics in a Complex Banking Environment.
Questions?
“They call it Africa,
we call it home”

Standard Bank: How APM Supports DevOps, Agile and Engineering Transformation - AppD Summit Europe

Editor's Notes

  • #2 -Head up digital channels software engineering -Recently asked to drive software engineering across IT (not the admin side)
  • #3 -Geographically distributed. -Core Banking transformation – with many different deployment scenarios. -Solutions need to be flexible enough to deal with variety and a state of transition
  • #4 Digitisation strategy across the board Staff SS, Staff Assist, Self Service Deep stack not just channels Omni channel strategy favoring Self Service So how do you meet the challenge – One key objective was to become obsessed with Software Engineering Background on Tablet app – clueless we were
  • #5 Building 20 or the Magical Incubator Erected during WW2 which lasted 55 years. 9 Nobel Prize Winners worked in this building Many significant innovations came out of this building – radar WW2 and Bose
  • #6 - Blue Green Deployments came out of one of our feature teams - Allows for safe intraday deployments (Beta programs and roll back) - Adopted Agile and later moved into DevOps - 31k per FP to 8.5k per FP and significantly faster. (Governance aside)
  • #7 - Multi Geography Multi Domain Federate ownership of the Tile to the respective Feature Team We also have Companion Apps if justified (Kids banking, OST) where some services are shared
  • #8 Leverage new features: Cross Border Payments (easy on SS, not SA), home loan calculators, beneficiary management. Ties into the concept of feature teams who know the domain. Structuring feature teams is hard: historically they were organised around systems, not domains (which makes ownership of technical debt and migration off the monolith harder)
  • #9 Make use of Elastic Search for many things: Example where we pulled our Stash repo into Elastic Search Easily see which teams are the most active
  • #10 Automate anything that gave us grief. Applying engineering practices to all build pipelines. Things we want to measure for now (Remedy, App Stores). Metrics will be gathered from our tools (Atlassian API, AppDynamics). The concept of the build pipeline is that a commit kicks off an automated process. No fingers in the pie. All governance baked into the pipeline. Test data created automatically. Web Services off Mainframe – challenge the status quo
  • #11 Automate anything that gave us grief. Applying engineering practices to all build pipelines. Things we want to measure for now (Remedy, App Stores). Metrics will be gathered from our tools (Atlassian API, AppDynamics). The concept of the build pipeline is that a commit kicks off an automated process. No fingers in the pie. All governance baked into the pipeline. Test data created automatically. Web Services off Mainframe – challenge the status quo
  • #12 Sonar summary of the portfolio
  • #31 Need to put Ops back into DevOps
  • #35 Walter White
  • #36 Walter White