We need to stop using averages in information security. Breaches and other infosec events are actually power law distributed, not normally distributed. This means a better analogy is "a 10 year breach" rather than "an average breach". This talk looks at 3 distinct datasets to illustrate the phenomenon.
BsidesLV 2014 The Power Law of Information Security
1. NORMAL DISTRIBUTIONS RULE EVERYTHING AROUND ME
Many empirical quantities cluster around a typical value. The dice rolls in these casinos, the number of reporters on the Wall of Sheep every year, the air
pressure, the sea level, the temperature on a sunny BlackHat day in Vegas. All of these things vary somewhat, but their distributions place a negligible
amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say
that it is really fucking hot in Vegas in August because the temperature never deviates very far from this. Even the largest deviations, which are exceptionally rare, are still
only about a factor of two from the mean in either direction and hence the distribution can be well characterized by quoting just its mean and standard
deviation. But not everything behaves this way.
2. ALEX HUTTON DREAMS OF RISK
My name is Alex Hutton and I model risk for a small "too big to fail" bank. Last year, like every other day, I woke up and built a risk model. Since we're a
bank, we track the prices of a lot of things. For one of these widgets, I built a distribution of price movements. This one is a normal distribution and I
assumed that the standard deviation was 3%, which is a typical number for daily price movements in financial markets. My boss used this to make some decisions, and
was quite happy. We made millions from the tiny everyday price fluctuations and trades.
3. SHIT GOES WRONG SLIDE
Today, however, we are fucked. Today is Black Monday, October 19, 1987 and the S&P drops by 21%. My boss freaks out, the firm is in financial ruin, my
kids starve.
4. How could this happen? Under my model, the probability of a 21% fluctuation is 10^-16, or... nonexistent.
So what happened? Well, the distribution of price fluctuations actually has a fat tail. In fact, the mistake I made was using a normal distribution. Take a look
at what happens if we use a power law distribution instead.
5. Probability   0.9   0.99   0.999   10
   NORMAL        3.8   7.0    9.2     21
   POWER         2.8   7.8    38.5    almost 0
SOMEBODY SET UP US THE BOMB!
Now, the chance of a 21% fluctuation is 0.08%, something that my risk model would certainly have included, and that would certainly have changed our
behavior on the financial markets. The good news is most financial firms are aware of this phenomenon, and model accordingly (after a few massive
failures). In infosec, we're just not there yet.
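To make the thin-tail versus fat-tail gap concrete, here is a minimal sketch that compares the chance of a 21% move under a normal distribution and under a fat-tailed one. It assumes a Student t distribution with 3 degrees of freedom as the stand-in for the power law (the slides do not say which fat-tailed distribution was used), rescaled to the same 3% standard deviation. The function names are made up for illustration, and the exact figures depend on these choices, so they need not match the slide's numbers.

```python
import math

def normal_tail(x, sd):
    """P(X > x) for a zero-mean normal with standard deviation sd."""
    return 0.5 * math.erfc(x / (sd * math.sqrt(2)))

def student_t3_tail(x, sd):
    """P(X > x) for a zero-mean Student t with 3 degrees of freedom,
    rescaled so that its standard deviation equals sd.

    Assumption: the t3 distribution is only a stand-in for a power law tail.
    A standard t3 has variance 3, so the scale is sd / sqrt(3), and the
    closed-form CDF argument t / sqrt(3) simplifies to x / sd.
    """
    u = x / sd
    return 0.5 - (u / (1.0 + u * u) + math.atan(u)) / math.pi

if __name__ == "__main__":
    drop, sd = 0.21, 0.03
    print("normal tail probability:   ", normal_tail(drop, sd))
    print("fat-tailed tail probability:", student_t3_tail(drop, sd))
```

Same standard deviation, same mean, and the two models disagree by many orders of magnitude about the day that ruins you.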
6. MOTHERF*RS ACT LIKE THEY FORGOT ABOUT SWANS
Often, as Russell Thomas likes to point out, people mistake events that they did not predict for black swan events. However,
what makes a "Black Swan event" is not the event itself. Instead, it is how that event fits into the object-observer system.
And in fact, the paradigm shift to using power law distributions to describe many of the variables we use in infosec explains away plenty of "black swans"
by making the object-observer system more receptive to rare, high impact events.
7. THE POWER LAW(S) OF INFORMATION SECURITY
@mroytman
But in fact, nothing is linear. This talk is about the power laws which occur in information security, what they mean, where I've found some, and what to do
about them. The research I'll present is far from done, but it's a starting point and I hope to make you think twice before using a normal distribution in a
model again.
8. SLIDE WITH FRACTALS
WHAT ARE POWER LAWS
Power laws are distributions which describe scale-free phenomena. What this means in layman's terms is that the same mechanism is at work across a
range of scales and orders of magnitude. In fact, power laws are a necessary and sufficient condition for scale-free phenomena. The importance and
ubiquity of scale-free behavior was first pointed out by Mandelbrot, who coined the term "fractals". In fractals, we see the same behavior across different
scales of length, time, price or any other relevant variable with a scale attached to it.
9. A quantity is said to follow a power law if it is drawn from a probability distribution that looks like:
P(x) ~ Cx^-alpha
alpha is a constant parameter of the distribution known as the exponent, or scaling parameter. Typical scaling parameters are in the range 2-3, but there
are exceptions.
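A quick way to see what "scale-free" means for this formula: rescaling x by a factor k changes the density by k^-alpha no matter where you start. The sketch below checks that numerically. The function names and the value alpha = 2.5 are illustrative assumptions, not something from the talk.

```python
def power_law_density(x, alpha, c=1.0):
    """Unnormalized power law density C * x^-alpha."""
    return c * x ** (-alpha)

def rescaling_ratio(x, k, alpha):
    """How much the density changes when x is multiplied by k."""
    return power_law_density(k * x, alpha) / power_law_density(x, alpha)

if __name__ == "__main__":
    alpha = 2.5  # illustrative value inside the typical 2-3 range
    for x in (1.0, 10.0, 1000.0):
        # The ratio is the same at every scale: it only depends on k and alpha.
        print(x, rescaling_ratio(x, 2.0, alpha))
```

A normal distribution fails this check: doubling x near the mean barely changes the density, while doubling it far out in the tail wipes it out.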
10. Lots of phenomena follow a power law. The oldest (1948) and cleanest statistical regularity in international relations is Richardson's law,
which states that the severity of warfare is power law distributed. This behavior is not unique to wars, and occurs in natural sciences (traffic jams,
earthquakes, biodiversity, coastlines, Brownian motion, asteroid impacts, etc) and social sciences (language, wealth, firm size, salaries, guild sizes in World
of Warcraft, links to blogs). These power laws are considered fingerprints of a "complex" system, although exactly what is meant by "complex" keeps shifting.
These systems generally produce outputs that are patterned, but have no standard (for lack of a better term) size in the Gaussian sense. More often than
not, a power law only applies to the values of a distribution greater than some minimum x. In these cases, we say that the tail follows a power law.
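Since the power law usually only holds above some minimum x, fitting one means estimating the exponent from the tail alone. Here is a minimal sketch using the continuous maximum-likelihood estimator described in Clauset et al. (alpha = 1 + n / sum(ln(x_i / xmin))). It assumes xmin is already known and the data are continuous; the helper names and the synthetic data are illustrative only.

```python
import math
import random

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples with density proportional to x^-alpha for x >= xmin
    (inverse-transform sampling; requires alpha > 1)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha(data, xmin):
    """Maximum-likelihood estimate of alpha using only the values >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(50_000, alpha=2.5, xmin=15.0, rng=rng)
    print("estimated alpha:", fit_alpha(data, xmin=15.0))
```

In practice xmin is unknown too and has to be chosen, for example by scanning candidates and keeping the one with the best fit.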
11. FAKE SWANS
Tails are vitally important. A power law is an instance of a fat tailed distribution. There exist precise proofs that "sufficiently fat tails" == power law
distributions. Measuring how fat a tail is, is actually quite difficult. The question of proving that something is or isn't a power law is often reduced to a
question of "just how fat the tail is".
12. You can't tell the difference here, but when we go further out...
13. You can see how much smaller the tails of the non-power law distributions are.
14. LACK OF PREDICTION
Why does this matter? It's because when the tails are small, we can say meaningful things about the "mean" and "variance" of the distributions. With a
power law distribution, the mean or variance don't necessarily stay stable over time.
An interesting aspect of power laws is that the exponent has a natural interpretation. It is the cutoff above which moments of the distribution do not exist.
More familiarly, when the exponent of the tail probability P(X > x) is less than 2, the variance does not exist, and the central limit theorem does not apply. In effect, even with an infinite amount
of data, we cannot say much about the variance of such distributions. For tail exponents less than 1, the mean does not exist. For this reason there is no such
thing as an "average flood". There is instead a 100 year flood, a 10 year flood.
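The "moments stop existing" rule can be written down directly for a pure Pareto tail. This sketch assumes P(X > x) = (x / xmin)^-a for x >= xmin, where a is the tail exponent (the function name is made up for illustration). The m-th moment is finite only when m < a.

```python
import math

def pareto_moment(m, tail_exponent, xmin=1.0):
    """m-th raw moment E[X^m] when P(X > x) = (x / xmin)^-tail_exponent.

    Returns math.inf when the moment does not exist (m >= tail_exponent).
    """
    if m >= tail_exponent:
        return math.inf
    return tail_exponent * xmin ** m / (tail_exponent - m)

if __name__ == "__main__":
    print("mean, tail exponent 1.5:       ", pareto_moment(1, 1.5))
    print("2nd moment, tail exponent 1.5: ", pareto_moment(2, 1.5))
    print("mean, tail exponent 0.7:       ", pareto_moment(1, 0.7))
```

With an infinite second moment, a sample variance computed from real data is still a finite number, it just never settles down as more data arrives.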
15. Perhaps we ought to start talking about the Target breach as a "10 year breach".
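One way to turn "10 year breach" into a number is the return level borrowed from flood statistics: the size that is exceeded on average once every N years. The sketch below assumes a pure Pareto tail P(X > x) = (x / xmin)^-a and a fixed number of events per year. Every parameter value in it is hypothetical and chosen only to keep the arithmetic readable.

```python
def return_level(years, events_per_year, tail_exponent, xmin):
    """Event size exceeded on average once every `years` years,
    assuming P(X > x) = (x / xmin)^-tail_exponent for x >= xmin."""
    exceed_prob = 1.0 / (years * events_per_year)
    if exceed_prob >= 1.0:
        return xmin  # every event already exceeds the minimum size
    return xmin * exceed_prob ** (-1.0 / tail_exponent)

if __name__ == "__main__":
    # Hypothetical inputs: 100 breaches a year, each at least 1,000 records.
    for years in (1, 10, 100):
        print(years, "year breach:", return_level(years, 100, 1.0, 1000.0))
```

The fatter the tail (the smaller the exponent), the faster the 100 year event pulls away from the 10 year one.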
But let's get back to our own industry - why would information security exhibit power law behavior? And where?
16. First, when two distributions are combined, the fattest tail always wins. This means that as you add power law distributed factors to a distribution, the results
stay power law distributed.
Second, information security is the combination of a great many factors - the size of the internet, the size of firms, the power of terrorist groups, and the
distribution of wealth are just a few of the ones I can think of that are power law distributed. If each has an exogenous effect on infosec, we would expect
our own variables to inherit those power law properties.
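"The fattest tail wins" is easy to check by simulation. This sketch adds a thin-tailed (standard normal) factor to a fat-tailed (Pareto) factor and compares how often each one, and their sum, lands above a large threshold. The distributions, the tail exponent of 1.5, and the threshold are all illustrative assumptions.

```python
import random

def tail_fraction(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

def simulate(n, tail_exponent, seed=0):
    """Return (fat, thin, total) sample lists, where total = fat + thin."""
    rng = random.Random(seed)
    # Pareto samples with P(X > x) = x^-tail_exponent for x >= 1.
    fat = [(1.0 - rng.random()) ** (-1.0 / tail_exponent) for _ in range(n)]
    thin = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = [f + t for f, t in zip(fat, thin)]
    return fat, thin, total

if __name__ == "__main__":
    fat, thin, total = simulate(200_000, tail_exponent=1.5)
    for name, xs in (("fat", fat), ("thin", thin), ("sum", total)):
        print(name, tail_fraction(xs, 50.0))
```

Far out in the tail the sum behaves like the fat-tailed factor alone, and the thin-tailed factor contributes nothing.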
17. LAW 1
18. BREACH FREQUENCY BY CVE TYPE
P(CVE has breach volume X) = X^-1.5
The Kolmogorov-Smirnov D-value: 0.1134174, xmin: 15, alpha: 1.5
The chance that a particular CVE has high breach volume is substantially higher than we previously thought, just as in the Hutton example the chance that
the S&P dropped by 21% was underestimated.
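For readers who have not met the D-value before: it is the largest gap between the empirical distribution of the tail and the fitted power law, so smaller is better. Here is a minimal sketch of that computation. It assumes continuous data and that alpha is the density exponent (so the model CDF is 1 - (x / xmin)^(1 - alpha)). The breach data here are counts, which a careful fit would treat as discrete, and the function name is illustrative.

```python
def ks_distance(data, xmin, alpha):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    values >= xmin and a power law with density exponent alpha."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs((i + 1) / n - model))
    return d

if __name__ == "__main__":
    # Synthetic data placed exactly on the model's quantiles: D is tiny.
    xmin, alpha, n = 15.0, 1.5, 100
    ideal = [xmin * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]
    print("D for ideal data:  ", ks_distance(ideal, xmin, alpha))
    # Data bunched just above xmin does not look like a power law at all.
    bunched = [xmin + i / n for i in range(n)]
    print("D for bunched data:", ks_distance(bunched, xmin, alpha))
```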
19. ONE VULN WILL CAUSE YOUR BREACH
(OR A COUPLE)
What does this mean for you? It means there are vulnerabilities which have an extremely high probability of causing a breach. Since this breach data
comes from how attackers are behaving, having a handle on threat intelligence globally allows you to identify _which_ vulnerabilities are those most likely
to cause the breaches.
It means shifting your strategy away from trying to fix everything, or even trying to fix everything that comes out on Patch Tuesday, and instead focusing
on identifying and remediating the few vulnerabilities which are _most_ likely to cause a breach. THIS is non-linear thinking.
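As a rough sketch of what "focus on the few" looks like in code: rank vulnerabilities by observed breach volume and keep only as many as it takes to cover most of it. The identifiers and counts below are entirely hypothetical, and `top_share` and `prioritize` are illustrative helper names, not part of any tool mentioned in the talk.

```python
def top_share(volumes, k):
    """Share of total breach volume accounted for by the k largest entries."""
    ranked = sorted(volumes.values(), reverse=True)
    return sum(ranked[:k]) / sum(ranked)

def prioritize(volumes, coverage):
    """Smallest prefix of vulnerabilities, in descending volume order,
    whose combined volume reaches `coverage` (a fraction of the total)."""
    total = sum(volumes.values())
    chosen, covered = [], 0
    for vuln_id, volume in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(vuln_id)
        covered += volume
    return chosen

if __name__ == "__main__":
    # Hypothetical breach counts per vulnerability, for illustration only.
    volumes = {"VULN-A": 9000, "VULN-B": 600, "VULN-C": 250,
               "VULN-D": 100, "VULN-E": 40, "VULN-F": 10}
    print("share of the top vulnerability:", top_share(volumes, 1))
    print("fix these first:", prioritize(volumes, 0.95))
```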
20. LAW 2
21. Kevin Thompson's talk tomorrow at 2pm - This talk introduces the VERIS Community Database (VCDB), a research project aimed at gathering news articles
about information security incidents, extracting data, and serving as a public repository of breach data suitable for analysis and research.
22. ID THEFT FREQUENCY
P(Theft has X victims) = X^-0.7
beta = 0.7 +/- 0.1, Maillart and Sornette, ETH Zurich 2009 (datalossdb).
24. ONE BREACH WILL MATTER MOST
(OR A COUPLE)
The takeaway here is that impact is concentrated in the fat tails of the distributions as well - it means we ought to be tailoring our strategies to preventing
the one big breach. This also means there's no average breach, and estimates of potential losses need to plan for scenarios like the Black Monday that was
missed in the opening example.
25. LAW 3
26. BREACH FREQUENCY BY DAY
P(Day has breach volume X) = X^-1.5
The Kolmogorov-Smirnov D-value: 0.1134174, xmin: 15, alpha: 1.5
28. SLIDE WITH WHAT DO WE DO ABOUT IT
From Russell: Handling Fat Tails for Decisionmakers
Here's a list of things that analysts and decision makers can do to successfully cope with the unruliness of very fat tailed probability distributions:
1. To the method of frequentist statistical analysis of historical data, add other methods and other data. Simulations, laboratory experiments, and
subjective probability estimates by calibrated experts are just three alternative methods that can fill in for the limitations of frequentist methods with
limited sample data.
2. Resist using colloquial terms like "average", "typical", "spread", or even "worst case". Using them will only add to confusion,
misunderstanding, and mis-set expectations.
3. Communicate and decide using quantiles, not the usual summary statistics (mean, standard deviation, etc.). If any summary statistics are used
as decision criteria or in models, use quantiles.
4. Put in some effort to estimate the "fatness" of the tail, either parametrically or non-parametrically. Even a not-very-good fat tail model is much better
than one based on thin tails. There are ways to test how good the alternative models are. In my opinion, the best academic paper on this is "Power-law
distributions in empirical data".
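To illustrate point 3 in the list above, here is a small sketch that reports quantiles next to the mean for a made-up set of losses where one event dwarfs the rest. The data and the `summarize` helper are hypothetical. The only library call used is `statistics.quantiles` from the Python standard library (3.8+).

```python
import statistics

def summarize(losses):
    """Report the mean alongside the 50th, 90th and 99th percentiles."""
    cuts = statistics.quantiles(losses, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(losses),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
    }

if __name__ == "__main__":
    # Hypothetical losses: 90 small events, 9 medium ones, 1 catastrophic one.
    losses = [1] * 90 + [10] * 9 + [10_000]
    for name, value in summarize(losses).items():
        print(name, value)
```

The mean here is roughly a hundred times the median, so "the average loss" describes almost none of the events. The quantiles at least say which part of the distribution you are talking about.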
29. You should model risk differently
You should focus your efforts on identifying things that live in the fat tail or are predictors of it, because there is no average.
You should never ever use metrics like average vulns closed or something like that.
1. Investing to fix 100% of vulns is a poor use of resources.
2. When the Big Loss event happens, only one or a few vulnerabilities will be exploited.
3. Ahead of that (ex ante), you need a systematic method to invest to fix a portfolio of vulns which, with very high confidence, includes ALL of the
vulns that could be part of the Big Loss event. These vulns will be strategically positioned in the most likely attack graphs.
4. And here's how you'd do that in practice ...
31. Dan Geer, Power. Law. http://geer.tinho.net/ieee/ieee.sp.geer.1201a.pdf
Clauset et al., Power Law Distributions in Empirical Data, http://arxiv.org/abs/0706.1062
Farmer and Geanakoplos, Power Laws in Economics and Elsewhere, http://tuvalu.santafe.edu/~jdf/papers/powerlaw3.pdf
Maillart and Sornette, Heavy-Tailed Distribution of Cyber Risks, http://arxiv.org/abs/0803.2256
poweRlaw R Package, http://cran.r-project.org/web/packages/poweRlaw/vignettes/poweRlaw.pdf
Gabaix, Some Nondescript NYU Stern Lecture on Power Laws, http://pages.stern.nyu.edu/~xgabaix/papers/powerLaws.pdf
Russell Thomas for graphs and everything he writes on http://exploringpossibilityspace.blogspot.com/
THANKS!
and Alex Hutton