It is great to be here and listen to some really good examples of how Splunk is being used in different areas.
But before I start, a quick disclosure that my compliance colleague forced me to tattoo on my arm:
All opinions in this presentation are mine and do not represent the position of my employer. LBG is by no means providing financial advice or disclosing any information that is not already public knowledge.
Fine, that is out of the way … so let me share a current example of how we are using Splunk to transform our Operating Model for the Digital Bank.
A bit of background ….
I am aware that there are some people from outside the UK … so some general info about LBG.
LBG is a banking group with four major brands … Lloyds, Halifax, BoS and more recently MBNA. The group also provides other financial services (FS) such as pensions, insurance, personal financing etc… but those are out of scope for this presentation.
Lloyds has been trading for more than 250 years … and we hope it stays that way…. or at least until I retire…. Which is still really far away as I am seriously young.
Lloyds is presumably the largest online bank in Europe by number of unique users (UU, 15m+)…. For this particular presentation, size does matter because volumes drive data crunching…. Last year we recorded almost 4bn logons! On our busiest day we have around 18m logons … a peak minute goes as high as 21k logons! An average digital user logs on 24 times per month …. for those of you in the creative industries, that means 288 times per year… if you compare that with the average number of times a customer goes to a branch in a year … any guess? …. 4 times per year! Clearly our main channel is the Digital Bank.
So what does that mean? Well, the way I see it, we are mainly a digital company that provides an IB platform for our customers' personal finance needs! Our main operations are closer to those of a software company than a brick & mortar business…. and that requires a significant shift in how we operate and manage our business.
For the next few minutes I want to talk about two fundamental changes for us and how they are accelerating transformation.
The first fundamental change is at our infrastructure level…. this used to be our happy place, an on-premise infrastructure with a nice landing zone and an authentication layer. Regulation, new technology and a need for increased agility forced us to change that model ….
We still have our legacy infrastructure but now we also have API Gateways to interact with third-party providers (TPPs). Open Banking is a two-way street that allows us to interact with other FS institutions …. and obviously we also have cloud.
As you can expect, we have Splunk deployed in each one of these layers. We have a central team that provides monitoring and analytics for these layers …. but that is also due to change because there is a second fundamental change ….
The second fundamental change is in our ways of working (WoW)…. a nice and cosy monolithic Business Release (BR) train, with all products lining up their new features to be pushed into production …. But what if something goes wrong in one of those features? Well, normally, you bring back the whole train … fix it and ship it again. I am not going to stop and explain why this is not the most agile way to operate … that debate is long gone.
A central team provides the monitoring and alerts on any unacceptable variance that can jeopardise our production environments.
So the new WoW… no longer a single train to ship all in one go. Each product team can deploy smaller releases more frequently with a certain degree of autonomy.
To add to the complexity, product teams can deploy to different environments … on-prem, hybrid, cloud.
In summary, we are transforming our business from centralised teams…. to a Spoke & Hub model and ultimately …. Autonomous cells
All this sounds great…. I mean, where do I sign? …. Or is there a catch? …. It all comes at a price…. And the price is resilience …..
So this is the eternal battle between…. Good and evil…. Ravenclaw vs Gryffindor …. Lannisters vs Starks …. I still can't get over the fact that GoT is over
The main challenge that these new changes are bringing is resilience …. Resilience is particularly important in FS companies because it drives our group risk appetite, it has severe implications from a regulatory angle and it is one of the main drivers of customer satisfaction …. Oh yes, I did that, I am justifying my actions in the name of its holiness …. “Customer Satisfaction”
What are we doing to tackle this challenge? Introducing the concept of SRE … a well-proven model in the Tech industry but relatively fresh in FS
In a nutshell, we are integrating our Dev and Ops teams to work closer together ….using an SRE approach where we introduce the concept of “Reliability Acceptance Point”
RAP is the point where reliability intersects the cost inflection point. In other words, our appetite to tolerate disruption at a certain cost.
The way to enforce that RAP is, mainly, through error budgets …. and finally …. this is where Splunk comes into play … I was already seeing a worried face from the Splunk team … as in, when is this guy going to mention us?
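For anyone who has not met error budgets before, here is a minimal sketch of the arithmetic in Python. The 99.9% target and the logon counts are made-up numbers for illustration, not LBG figures, and the function names are hypothetical.

```python
def error_budget(total_events: int, slo_target: float) -> float:
    """Number of failed events the reliability target tolerates in the window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return total_events * (1.0 - slo_target)


def budget_remaining(total_events: int, failed_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget(total_events, slo_target)
    if budget == 0:
        # A 100% target leaves no budget at all: any failure overspends it.
        return 0.0 if failed_events == 0 else float("-inf")
    return 1.0 - failed_events / budget


# Illustrative numbers only: 1,000,000 logons in the window, 99.9% target.
allowed_failures = error_budget(1_000_000, 0.999)
unspent = budget_remaining(1_000_000, 250, 0.999)
print(round(allowed_failures))  # 1000
print(round(unspent, 2))        # 0.75
```

The point of the sketch is that the target you accept turns directly into a number of failures a product team is allowed to "spend" before reliability has to take priority.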
If you remember, on my previous slides I emphasised that the monitoring and analytics were mainly driven by a central team (AKA my team). The introduction of SRE and Error Budgets brings two challenges:
- How to extend the use of our tooling
- How to define an error budget
Let me use an example to bring this to life. Think about one of our key Product Teams…. Digital access … the logon journey…. Error Budgets should be defined from a customer point of view … if you think about your banking mobile app we identify that there are mainly two things that matter:
OK…. easy …. Here you go, we have hundreds of dashboards and screens where you can pull the data from Splunk ….
The problem with this approach is that some teams will find it a bit too much for their needs ….
So our challenge is to transform that into a simple business view that can build the bridge between Tech and Business ….
In a nutshell…. Your data should be able to answer straightforward questions….
- Was the mobile app available when I needed it?
- Did the journey deliver to expectations?
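As a rough illustration of how those two questions can become two numbers, here is a small Python sketch. The record shape, the field names and the 3-second threshold are all assumptions made for the example; real logon events in Splunk will look different, and "delivering to expectations" could be defined in other ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogonAttempt:
    # Hypothetical record shape, not an actual Splunk event schema.
    app_reachable: bool  # did the app respond at all?
    succeeded: bool      # did the customer end up logged on?
    duration_ms: int     # time taken by the logon journey


def availability(attempts: list[LogonAttempt]) -> float:
    """Share of attempts where the app was reachable."""
    if not attempts:
        raise ValueError("no attempts to measure")
    return sum(a.app_reachable for a in attempts) / len(attempts)


def journey_success(attempts: list[LogonAttempt], max_ms: int = 3000) -> float:
    """Share of attempts that succeeded within the assumed time threshold."""
    if not attempts:
        raise ValueError("no attempts to measure")
    good = sum(a.succeeded and a.duration_ms <= max_ms for a in attempts)
    return good / len(attempts)


sample = [
    LogonAttempt(True, True, 1200),
    LogonAttempt(True, True, 4500),  # worked, but slower than the threshold
    LogonAttempt(True, False, 900),  # reachable, but the logon failed
    LogonAttempt(False, False, 0),   # app unavailable
]
print(availability(sample))     # 0.75
print(journey_success(sample))  # 0.25
```

Two numbers like these are far easier for a product team to own than hundreds of dashboards, which is the whole idea of the simple business view.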
A simple error budget can drive and change behaviour ….
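One common way an error budget changes behaviour is by linking the unspent share of the budget to how freely a team can release. The Python sketch below shows that idea only; the thresholds and wording are illustrative assumptions, not an LBG policy.

```python
def release_decision(budget_remaining: float) -> str:
    """Map the unspent share of the error budget to a release posture.

    budget_remaining is a fraction: 1.0 means untouched, 0.0 means fully
    spent, negative means overspent. Thresholds are assumptions.
    """
    if budget_remaining <= 0.0:
        return "freeze: reliability work only"
    if budget_remaining < 0.25:
        return "slow down: smaller, lower-risk releases"
    return "ship: deploy at normal pace"


for share in (0.8, 0.1, -0.2):
    print(share, "->", release_decision(share))
```

The design choice worth noting is that the rule is agreed in advance, so the conversation between Tech and Business is about the number and not about opinions.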
At the moment we are on a transformation journey. So far I can share three lessons learnt ….
- Democratise data …. It is OK to have a CoE, but if you want to drive digital transformation at scale you need to make sure data analytics is an intrinsic capability in your product teams
- Tooling rightsize …. Provide the right level of tools so that your teams can do the right level of DIY ….. When needed, bring the big drill!
- Potentially you can use Splunk to measure a ridiculous number of metrics …. But at the end of the day, your customer should be the one that defines your digital journey metrics.
Splunk at Lloyds Banking Group
Splunk + SRE
BACKGROUND ON LBG
• Founded June 1765
• One of the big 4 “Clearing Banks”
• Largest Online Bank (15m UU)
FUNDAMENTAL OPERATIONAL CHANGE (2)
Monolithic Business Release
FUNDAMENTAL OPERATIONAL CHANGE (2)
Agile and Multi Environment Model
CHANGE IN OPERATING MODEL
CENTRALISED SPOKE & HUB AUTONOMOUS
Deploy as many features as possible
Deploy as fast as possible
Focus on reliability
Break as little as possible
The SRE model improves operational reliability of a service, with a focus on actively coaching product teams on system resiliency principles, automation and optimisation initiatives.
The goal of SRE is to enable the product teams to maximise speed and agility in a responsible way.