A strategy for security data analytics - SIRACon 2016

What does a strategy for data analytics looks like, which can win in the often chaotic
reality of the business environment?
2

While I can’t promise this talk will provide a definitive answer, hopefully it will offer
some answers, and if not, at least some inspiration about what to think about, and a
snag list of things to avoid.
3

Security leaders and executives globally face 3 challenging questions:
1. What’s our business risk exposure from cyber?
2. What’s our security capability to manage that risk?
3. And based on the above, what do we prioritize?
4

By applying data analytics to security, we can provide the meaningful, timely, accurate
insights that security leadership need to do 2 things:
1) Support their colleagues in other teams, so they have the information to make
robust, defensible risk decisions; and
2) Gain the evidence we need to justify improvement where it matters most –
hopefully at best cost
5

That’s easy to say, but harder to do.
Because data analytics is a multi-dimensional problem space with a lot of moving
parts.
At the data layer, we have technologies that provide the output we need for analytics.
But even for one type of technology like anti-virus, we may have multiple vendors in
place, outputting data in different structures, with various coverage. They can also be
swapped out from one year to the next.
At the platform layer, how do we marshal our data so that we can deliver the
analytics that meet user need?
At the analysis layer, what techniques are available to us, and how repeatable are
they (or can they be) for the scale and frequency of analysis we want to run?
At the insight layer, how can we manage the fact that one question answered often
leads to many harder questions emerging, with an expectation for faster turn-
around?
6

At the communication layer, how do we make insight relevant and accessible to our
multiple stakeholders, based on their decision making context and concerns? And
how do we make any caveats clear at the different levels of strategic, operational and
tactical?
And lastly, provability. How do we win trust when so much analysis that leaders have
seen is wrong?
6

Taking on multidimensional problems always has a high risk of resulting in a resume
generating event.
So if we know all this, we may be tempted not to try.
7

Because if we do try – and precisely because the problem is complex, we will likely
fail forward on the way to success.
8

To our investors, the teams funding our projects, this is what failure looks like.
- An increasing amount of spend
- Very little visible value
- Someone making a heroic effort to save the situation
- Then getting frustrated with the politics they run into
- And leaving
9

In many cases, this happens because security data analytics efforts are “Death Star
projects”.
They are built on a big visions (and equally big promises), which require large teams
to do stuff successfully they haven’t done before, with coordination of lots of moving
parts over a long period of time.
And sometimes these visions aim to tackle problems that we don’t know that we can
solve, or even that solves the most important problem we have.
10

This cartoon sums up a lot of ‘blue sky thinking’ threat detection projects, which
often turn into sunk costs that the business can’t bear to put an end to because of
the time and money they’ve ploughed into them.
11

With any Death Star project, careful thought is needed about the legacy that other
teams will have to pick up afterwards.
12

But the same is also true at the other end of the spectrum, where it’s easy to end up
spending a lot of money on ‘data science noodling’, which doesn’t provide
foundational building blocks that can be built upon.
We hire some data scientists, set them to work on some interesting things, but
eventually they end up helping us fight the latest fire (either in security operations or
answering this weeks knee jerk question from the Executive Committee), rather than
doing what we hired them for.
And while artisanal data analytics programs definitely have short term value they also
have 2 core problems: 1) they aren’t scalable; and 2) their legacy doesn’t create the
operational engine for future success.
13

Although the actual amounts can differ, both the Death Star and artisanal model can
lead to this situation.
And in security, this isn't an unusual curve.
For many executives it represents their experience across most of the security
projects they’ve seen.
14

So how do we bend the cost-to-value curve to our will?
Not only in terms of security data analytics programs themselves, but also in terms of
how security data analytics can bend the overall cost curve for security by delivering
the meaningful, timely, accurate information that leadership need.
15

Ultimately what we need to win in business, is to win in a game of ratios.
This means :
- minimizing the time where value is less than spend
- maximizing the amount of value delivered for spend; and
- making value very visible to the people who are funding us
16

The conclusion of today’s talk is that we can only achieve this in the following way:
1) Start with a focus on sets of problems, not single use cases – and select initial
problem sets that give us building blocks of data that combine further down the line
to solve higher order problems with greater ease.
17

2) Take an approach to these problem sets that stacks Minimum Viable Products of
‘data plus analytics’ on top of each other.
18

And 3), approach problem sets in a sequence that sets us up to deliver greatest value
across multiple stakeholders with the fewest data sets possible.
19

In summary, this means that the battle ground we select for our data analytics
program needs to be here.
20

So, onto Act 1 of this talk.
21

Let’s imagine we work at ACME Inc., and our data analytics journey so far looks like
this.
Years ago, we invested heavily in SIEM - and discovered that while it was great if you
had a small rule set and a narrow scope, it quickly became clear upon deployment
that this dream would be unattainable.
As we moved from ‘rules’ to ‘search’, we invested in Splunk, only to run into cost
problem that inhibited our ability to ingest high volume data sets.
To manage a few specific headaches, we purchased some point analytics solutions.
And then spun up our own Hadoop cluster, with the vision of feeding these other
technologies with only the data they needed from a data store that we could also use
to run more complex analytics.
23

In meta terms, we could describe our journey as an oscillation between specific and
general platforms …
24

… as we adapted to the changing scale, complexity and IT operating model of our
organisation.
25

Let’s zoom in our our latest project, and walk through some scenarios that we may
have experienced …
26

… as we moved from build …
27

And let’s imagine, not unreasonably, that our ‘value delivered’ curve across our
scenarios looks like this.
31

We’ve built our data lake, ingested data, done analysis, and delivered some insight –
so we’re feeling good.
33

But now we’ve run into problems.
34

And we’re paid a visit by the CFO’s pet T-Rex.
What’s gone wrong?
35

Well, as the questions people asked us got harder over time, at some point our ability
to answer them ran into limits.
36

The first problem we had was the architecture of our data lake.
It’s centrally managed by IT, and it doesn't support the type of analysis we want to do.
Business units using the cluster care about one category of analytics problem, and
the components we need to answer our categories of analytics problems isn’t in IT’s
roadmap.
We put in a change request, but we’re always behind the business in the priority
queue – and change control is taking a long time to go through, as IT have to work
out if and how an upgrade to one component in the stack will affect all the others.
Meanwhile, the business is tapping it's fingers impatiently, which means as a stop gap
we're putting stuff in excel and analysis notebooks … which is exactly what we
wanted to avoid.
37

Fortunately we got that problem solved, but then we encountered another. Now that
we’re generating insights, we’re getting high demand for analysis from lots of
different people.
In essence, we’ve become a service. Everyone who has a question realizes we have a
toolset and team that can provide answers, and we’re get a huge influx of requests
that are pulling us away from our core mission.
Half our team are now working on an effort to understand application dependencies
for a major server migration IT is doing. We're effectively DDoSed by our success, and
have to service the person who can shout the loudest.
We need to wrap a lot of process round servicing incoming requests, but while we’re
trying to do that, prioritization has run amok.
38

To try and get a handle on this, we called in a consultancy, who’ve convinced us that
what we need to do is set up a self service library of recipes so people can answer
their own questions.
We’ve built an intuitive front end interface to Hadoop, but we've quickly discovered
that with the same recipe, two people with different levels of skill can make dishes
that taste and look very different.
Now we're in a battle to scale the right knowledge for different people on how to do
analysis to avoid insights being presented that are accurate.
39

We're also finding that, as we deal with more complex questions, what we thought
was insight is not proving that valuable to the people consuming it.
Our stakeholders don’t want statuses or facts; they wanted us to answer the question
‘What is my best action?’
While we’re used to producing stuff at tactical or operational level for well-defined
problems, they are looking for strategic direction.
40

Here, we’re pre-insight, and doing good stuff building analytics.
42

But it’s taking a lot longer to get to insights than we thought it would.
43

We didn’t understand the amount of work involved to:
- understand data sets, clean them, and prepare them; then
- work out the best analysis for the problem at hand, do that analysis and
communicate the result in a way that's meaningful to the stakeholder receiving them
…
- all with appropriate caveats to communicate the precision and accuracy of the
information they’re looking at
44

We’re now having conversations like this. because someone has read a presentation
or bit of marketing that suggest sciencing data happens auto-magically by throwing it
at a pre-built algorithm.
45

Specifically in the context of machine learning, a lot of the marketing we're seeing on
this today is dangerous.
First, there's a blurring of vocabulary, which doesn’t differentiate the discipline of
data analytics and data science vs the methods that data science and data analytics
use.
So when marketing pushes stories of automagic results from data analytics (which is
used wrongly as a synonym for ML) – and that later turns out to be an illusion - the
good work being done suffers by association.
46

Second, it speaks to us on an emotional level, when we don’t have a good framework
to assess if these ‘solutions’ will do what they claim in our environments.
As the CISO of a global bank said to me a few weeks ago, it is tempting and
comforting to think, when we face all the problems we do with headcount, expertise
and budget that, yes, perhaps some unsupervised machine learning algo can solve
this thorny problem I have.
So we give it a try, and it makes our problems worse not better.
47

Now, this isn’t a new problem in security.
It’s summed up eloquently in a paper called ‘A Market For Silver Bullets’, which
describes the fundamental asymmetry of information we face, where both sellers and
buyers lack knowledge of what an effective solution looks like. (Of course, the threat
actors know, but unfortunately, they’re not telling us).
In the world of ML, algos lack the business context they need – and it’s the
enrichment of algos that make the difference between lots of anomalies that are
interesting, but not high value, vs output we can act on with confidence.
48

But often neither the vendor nor the buyer know exactly how to do that.
So what you end up with is ‘solutions’ that have user experiences like this.
Now, I don’t know if you know the application owners I know, but this is simply not
going to happen.
49

And it's definitely not going to happen if what vendors deliver is the equivalent of
‘false positive as a service’.
50

Because if the first 10 things you give to someone who is very busy with with
business concerns are false positives, that’s going to be pretty much game over in
getting their time and buy-in for the future.
In the same way, Security Operations teams are already being fire hosed with alerts.
This means the tap may as well not be on if this is yet another pipe where there isn’t
time to do the necessary tuning.
51

In short, with ML and it’s promises, we face a classic fruit salad problem.
Knowledge is knowing a tomato is a fruit. Wisdom is not putting it in a fruit salad.
And while lots of vendors provide ML learning algos that have knowledge, it’s refining
those so that they have wisdom in the context of our business that makes them
valuable. Until that is possible (and easy) we’ll continue to be disappointed by results.
52

Here, we’ve built lake and ingested data, but analysis has hit a wall.
54

We didn’t have mature process around data analytics in security when we started this
effort, and what we've done is simply scaled up the approach we were taking before.
This has created a data swamp, with loads of data pools that are poorly maintained.
55

We’re used to running a workflow in which an analyst runs to our tech frankenstack,
pulls any data they can on an ad hoc basis into a spreadsheet, runs some best effort
analysis, creates a pie chart, and sends off a PDF we hope no one will read.
56

By automating part of that mess, we now have … a faster mess.
57

We run into trouble at the ingest stage
59

We’ve decided to ingest everything before starting with analysis.
And because this costs money and takes a lot of time, the business is sat for a long
time tapping their fingers waiting for insight.
Eventually, they get sick of waiting and cut the budget before we have enough in the
lake to do meaningful correlations and get some analysis going. We may try to
present some conclusions, but they’re flimsy and unmoving.
60

In which we run into problems at the very first stage of building the lake.
We’ve been running a big data initiative for 6 months, and the business has come to
ask us how we were doing.
62

We said it would be done soon while wrestling with getting technology set up that
was stable and usable.
63

They checked back on us when the next budget cycle rolled round.
64

We said it would be done soon (while continuing to battle with the tech).
65

And then they decided they were done with a transformation program that was on a
trajectory to be anything but.
66

So, if these are the foreseeable problems to avoid …
67

… what does that mean as we consider our approach at strategic and operational
levels?
68

Let’s imagine at ACME, we understand all the problems we’ve just looked at, because
our team has lived through them in other firms.
And we want to take an MVP approach to solving a big problem, so that it has a good
chance of success.
69

The problem at the top of our agenda is how to deal with newly introduced DevOps
pipelines in several business units.
Our devs are creating code that’s ready to push into production in 2 weeks. Which is
great.
70

What’s not so great, is that security has a 3 month waterfall assurance process.
And at the end of this, multiple high risk findings are raised consistently.
71

So app dev asks the CIO for exceptions, which are now granted so frequently that
eventually security is pretty much ignored all together.
72

Because of the pain involved in going through this risk management process, the
status quo is fast becoming: let’s not go find and manage risk.
73

We need to change this, so we can shift the timeline for getting risk under
management from months to weeks.
We know data analytics is critical to this, both to a) get the information we need to
make good data informed decisions, then b) automate off the back of that to manage
risk at the speed of the business and be as data-driven as possible.
74

This means moving from a policy based approach, where only a tiny bit of code meets
all requirements ...
75

… to a risk based approach, where we can understand risk holistically, and manage it
pragmatically.
76

This means bringing together lots of puzzle pieces across security, ops and dev
processes.
77

And turning those puzzle pieces into a picture, to show risk as a factor of
connectedness, dependencies and activity across the operational entities that
support and delivery business outcomes.
78

Our plan to do this is to understand where we should set thresholds in various
relevant metrics, so that when data analytics identifies toxic combinations (or that
we’re getting close to them) we can jump on the problem.
79

In the long term, ideally we want to be able to do ‘what if’ analysis to address
problems before they arise, and shift thresholds dynamically as internal and external
factors relating to the threat, technology and business landscape.
80

This means we can start measuring risk to business value and revenue across
business units, based on business asset exposure to compromise and impact.
81

To top it all off, we then want to automate action on our environment using ‘security
robotics’ - i.e. orchestration technologies.
82

If what we’re building towards is to stand a chance of doing that, we’re going to want
lots of optionality across the platform (or platforms!) that could eventually support
these outcomes.
83

We’ll need to tie in requirements from lots of stakeholders outside security.
84

And consider how this effort (and other security controls) are introducing friction or
issues into people’s ‘jobs to be done’.
85

Especially where we’ve deployed ‘award winning solutions’ that people talk about like
this in private.
86

If we start with the question ‘What’s the user need?’, we can – no doubt – come up
with a set of foundational insights, which will deliver value to the CIO, project
managers, devs, sec ops the CISO and security risk functions.
87

And we can think about how to make information accessible to interrogate, so lots of
different people can self-serve.
88

The vision driving our MVP approach might look like this.
Which sounds convincing.
89

Except, what if we have 2000 developers in one Business Unit?
Or at least we think we do. We know we’ve got at least 2000, but it could be more.
And our code base is totally fragmented, so we don’t know where all our code is, and
how we‘d get good coverage on scanning it.
And we‘re about to move a load of infrastructure and operations to a Business
Process Outsourcer. Which will make it challenging to get some of the data we want.
And the available data that we can correlate in the short term, well ... to be honest, it
ain‘t great.
90

Perhaps we’ve chosen data analytics as a proxy for the problem that actually needs
solving, as analytics is very unlikely to be able to solve the problem we have.
91

All of which is to say, you can have the best strategy in the world to tackle a problem
you have, but if it isn’t focused on a problem you have, that you also know you can
solve, then we’re back to square one and the CFO’s pet T-Rex.
92

So onto our final act: Act 3.
93

How do we choose our battleground to solve problems we know we have, which we
know we can solve.
94

Simon Wardley is open sourcing really great thinking on strategy, and he talks a lot
about the primacy of ‘where’; i.e. we have to understand our landscape to choose
where to play, in order to win.
In our strawman devops example, the problem we had to solve was dictated to us.
- We had some great ideas and frameworks, but no understanding of our landscape
- We had no time to build that up
- We couldn’t choose a battleground where we had a good chance of winning
- And we had to jump on the problem that was right in front of us, because we were
firefighting
95

This massively limited our chance of success, because we lacked context about the
game, the landscape and the climate.
96

Let’s return to the concept we started out with.
We need to help our leaders demonstrate strong control over risk.
97

And to do that we need to pull lots of puzzle pieces together into a picture.
98

Measuring the probability of badness happening is a topic of great debate in security.
But very often, at a practical level, we can end up in endless meetings arguing about
how naked we need to be to catch a cold, when it would be more productive to just
put our clothes on.
Because if our cyber hygiene levels are low (or inconsistent), not only is the job of
detect and respond harder, but it’s harder to know if we’re in a defensible position
should the worst happen.
99

Starting with foundational building blocks that are possible, and highly palatable to
solve makes good sense.
100

As long as we can present outputs and results that people want to hang behind their
metaphorical desk on the office wall.
101

If this is our battle ground …
102

We can now assess where we have problems that sit in that box.
This may be as simple as assuring that we have the AV coverage and operational
consistency we expect across our different host types (servers and workstations) and
OS types.
103

The output should be relevant to various stakeholders, from the CIO to IT Ops to
security control managers.
104

And we should be able to track that we are moving from here …
105

This is a model I call ‘the security cross fader’.
107

It expresses that investment in detect / respond becomes unsustainable at scale,
where that function is also picking up the side effects of poor cyber hygiene.
This sets up an investment trade off, of implementing preventative controls or change
processes to be secure by design where that makes financial sense, and having detect
/ respond pick up the slack where it’s not.
108

The goal (and challenge) is to find the right balance, so there’s less noise for detect /
respond to sift through, and an ability to for Security Operations to control the scope
of what they need to worry about.
109

How does this shake out into our problem space for security data analytics?
110

… we want to ensure we tackle problem sets in a way that delivers maximum value
for multiple stakeholders with minimal data sets.
112

With that constraint, we can ask, “If you could only choose 5 data sets to meet your
user needs, what would they be?”
Here is an example of an answer we might get back.
113

The correlation we get in Gold is a 1st
order confirmation, and in blue, 2nd
order
confirmation of ‘facts’ about ‘stuff’.
114

We then have inferences we can draw based on our knowledge.
115

And finally 3rd
order signals …
116

That don’t give us strong confirmations, but which we can use to join dots.
117

Now we know what we’re aiming for, we might not start with Netflow, but we can
target data sets for collection and analysis that get us on the ladder we eventually
need to climb.
118

… the journey can start with a user need that is far narrower.
121

As Mark Madsen said in 2011, if you procrastinate long enough, most problems solve
themselves.
122

And when it comes to building data lakes that can handle data volume, velocity and
diversity at scale, this is certainly where the market is heading.
So before investing lots of money to try and get there ourselves (with all the inter-
dependencies and challenges that entails) the best advice may be to wait a while.
123

Finally, onto analysis and insight.
124

If this is the approach we take to iterate quickly …
125

Then what we are setting up is a phased approach to quickly understand our data, the
value we can get from it currently, and the extent of the value we’ll be able to get in
future.
126

Like a musical cannon, we want to solve early problems that make harmonies more
pleasing over time as we add data sources and build analytics.
127

For example, we can use these data sets (at the bottom in grey) to address the
hygiene factors in green above.
128

Over time, we can use this to build upon.
129

Adding larger and more complex data soures as we go.
130

Tacking increasingly complex problems, accruing wisdom as we do.
131

We can then start looking at ‘risk factors’ in populations of operational entities.
132

Be they machine, or people.
133

Giving us the facts and evidence we need to put detections in context, and
understand risk across our landscape.
135

I realize that this is quite high level, and the equivalent in some ways of this guide to
‘How to draw a horse’.
136

None the less, I hope it’s been useful, and will answer any questions you have as best
I can!
Thank you very much.
137

A strategy for security data analytics - SIRACon 2016

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to A strategy for security data analytics - SIRACon 2016

Similar to A strategy for security data analytics - SIRACon 2016 (20)

Recently uploaded

Recently uploaded (20)

A strategy for security data analytics - SIRACon 2016