A Big Dashboard of Problems.pdf

A Big Dashboard of Problems
Travis McPeak, co-founder and CEO @ Resourcely

Who here has heard of Security Monkey? Open sourced in 2014, Security Monkey
was part of Netflix’s Simian Army, and I believe was among the first cloud security
posture management tools, before they were even called CSPM. Security Monkey
scans your cloud resources, creates an inventory, and reports on misconfigurations.
The tool is really useful, and had it been a for-profit company it would probably be
worth a bajillion dollars today.
Fun fact, in the initial Simian Army post, it actually says “Security Monkey terminates
offending instances”, but this didn’t end up being true. We did, however, show all of
the misconfigurations in a dashboard. One time, we were having a meeting with some
security friends from Riot Games. The Riot team flew up from LA and was hanging out
at our Netflix office. Both teams were showing each other some of the tools we had
built to solve a problem. One of the Riot folks told us about a tool they created that
was designed to enforce tagging and would terminate instances that weren’t tagged
correctly. Apparently the system had gone awry and earned the unfortunate name
“Murderbot”.
When the Netflix folks showed Security Monkey, one of the Riot folks said “What am I
supposed to do with a thousand findings?” This was the first time I thought, “huh,
dashboards by themselves aren’t very useful”.

Has anybody seen Repokid? This is Netflix’s open source tool that automatically
rightsizes roles to least privilege. Did you know that before Repokid there was another
tool called Repoman? My former Netflix colleague Patrick Kelley created Repoman, a
tool that would look at application roles in AWS and show the findings to developers in
a dashboard. The developers would click a button to rightsize their role.
The problem? Nobody used it! We learned a few things:
1) Developers don’t really care about least privilege. This is a security concern.
The best you can do is make it automatic for developers to get least privilege.
2) Nobody wants to see problems in a dashboard. We need to do something!
So Repoman became Repokid. Repokid was my first project at Netflix, and it did
*exactly* what Repoman did except with one big change – rather than show issues in
a dashboard with a button to fix, Repokid made the default least privilege and allowed
developers to opt out. We’ll talk more about Repokid later.

At least Repoman had a button to fix issues though. One of the ways I like to spend
time is advising and angel investing. To do this well, you have to look at a lot of
companies. I have seen probably 200 pitches in the last 4 years. I’m shocked at how
many of those companies are big dashboards of problems!
As an industry, we’re so busy. There is so much work to do and we literally can’t pay
people to come do it. There were 3.5 million open cybersecurity jobs in 2021.
Any security folks here? Yeah, my people. How about developers, y’all have a bunch
of extra time you are looking to fill with new projects?
I see dashboards like this and get so frustrated. What am I supposed to do with this?
There are 120 new vulnerabilities and 8.5 vulnerabilities per host. What am I supposed
to do with this information? Who is going to come to the dashboard and what are they
going to do as a result?

Look at this one (by the way, this is just an image I grabbed from the internet).
76 thousand vulnerabilities awaiting attention! If I saw this I would go hide under my
desk or retire or something. Things are not working! Also I love the line at the bottom
that says 32 days to close. LOL. If I saw that graph I’m definitely not thinking it’s going
to 0 in 30 days.

How about this – 907 total assets and 857 non-compliant assets! If I see 1.517K of
anything, it is not actionable. Also, pie charts are so dumb. Is this for people that don’t
understand percentages?

Cyber! What am I supposed to do with this though? Disconnect from the internet?

Today is the day, when I’m going to start taking security seriously. It’s “shields up” day.

Protip: If somebody wants to bulk archive your findings, the product is not working.

My name is Travis, and I’m tired of big dashboards of problems.
Hi!
My name is Travis, for those I haven’t met yet, “it’s great to meet you!”
I have spent most of my career leading some aspect of security at large companies.
One thing I love about security, particularly at large companies, is the strategy
involved. You have very finite resources (security is a cost center, after all), and we
prevent bad things from happening in the future. Humans tend to be really bad at
estimating future risk, so unsurprisingly security has to fight for every dollar they get.
Your job in security is to mitigate risk. Let me ask you, what risk does me having a
shiny new dashboard like one of those I just showed mitigate? Unless the dashboard
is just used as eye candy for the CISO, or to make other executives think you are
doing work, it’s a part of a solution, and not the most useful part.
By the way, I use a lot of Netflix examples in here. This isn’t because Netflix does all
of the great work in security, it’s just where I spent a lot of time. I sourced other
examples from the Internet and gave shoutouts to the folks that sent them to me.

In defense of dashboards, I understand that visibility is the first requirement of
security, and you can’t fix what you can’t see. Unless you’re an inventory solution or a
scanner for something with a known simple solution, we need to do better.
So many products I see are focused on identify and detect, but those are an
incomplete solution. If your product can’t protect and/or respond, it’s probably not that
useful.

What’s wrong with dashboards?
● Unsolvable problems
○ Your internet facing thing is under attack!
● Unactionable findings
○ You have 34,567 new vulnerabilities!
● Unimportant findings
○ Why does the INFO category even exist?
● “Sky is falling”
○ At least give me an “EASY” button for each finding
Even for simple dashboard products, there is too much noise. If your product is telling
me about something it should be really important. The big innovation from the CSPM
v1 products (i.e. Security Monkey, Evident, Redlock) and CSPM v2 (Orca, Wiz) is the
filtering. They pull in a bunch of context about your environment and tell you only
about things that are really important.
“Your thing is internet facing” vs. “Your thing is internet facing, it has 5 critical
vulnerabilities, and it hasn’t been touched in 3 years”.
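Here is a minimal sketch of that kind of context-based filtering. The `Finding` fields, the one-year staleness threshold, and the resource names are all hypothetical; real CSPM products pull in far more context than this.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical fields; a real tool would gather these from an inventory.
    resource: str
    internet_facing: bool
    critical_vulns: int
    days_since_last_change: int


def is_worth_showing(finding: Finding, stale_after_days: int = 365) -> bool:
    """Surface a finding only when several risk signals line up."""
    return (
        finding.internet_facing
        and finding.critical_vulns > 0
        and finding.days_since_last_change >= stale_after_days
    )


findings = [
    Finding("web-1", internet_facing=True, critical_vulns=0, days_since_last_change=10),
    Finding("legacy-api", internet_facing=True, critical_vulns=5, days_since_last_change=3 * 365),
    Finding("batch-7", internet_facing=False, critical_vulns=2, days_since_last_change=900),
]

# Only the resource that is exposed, vulnerable, AND neglected makes the cut.
actionable = [f.resource for f in findings if is_worth_showing(f)]
```

Three raw findings go in and one comes out, which is the difference between a list I ignore and a list I act on.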
Unsolvable problems – not to beat up GuardDuty too much, I think it’s a useful
product for smaller companies that need to do *something* for security, but at Netflix it
would regularly tell me internet facing instances had incoming connections from
known malicious IPs. What am I supposed to do with that? We can block the IP, but
there are lots of false positives in that. At Netflix, we had a way to block known
malicious IPs. You know what would happen? Somebody in a college IP block would
do something abusive and the whole college wouldn’t be able to watch Netflix.
Attackers can change IPs but normal users can’t.
Unactionable findings – if there are more than 1,000 of something, I can’t take an action.
I either need to do more triage/filtering or use this as a signal to go fix something
upstream. In any case, the product telling me about so many findings is not a solution
for me.
Unimportant findings – literally, why is there an INFO category? Realistically,
everybody in the industry ignores lows and most mediums. If you’re going to be a
dashboard, show me things that have a high likelihood of me taking an action, and
then help me take the action.
“Sky is falling” – I get it, security is really hard. If there isn’t a path for me to fix the
problem, I don’t want to know about it. I’ll focus my limited attention in the areas
where I *can* make a difference.

“Costs and Consequences of Gaps in Vulnerability Response” (Ponemon Institute)
60% of breaches occurred because of an unpatched, known vulnerability.
The real problem with these scanners is they address issues too late. OK, it’s better to
have a scanner tell me about it before an attacker uses it. But not really: if I don’t take
an action, the outcome is the same. According to the Ponemon Institute, 60% of breaches
occurred because of an unpatched, known vulnerability.

Hierarchy of security products
Big dashboards of problems
Dashboard of problems with EASY button
Scan/fix and tell me when it’s done
Continuously fix
Catch the problem in test
????
Vulnerability Window & Effort Required
I call this the Pyramid of Shit. At the bottom, we have big dashboards of problems.
Hopefully you all agree with me that these are shit and we can move on.
Above dashboards, we have dashboards that have some easy fix option. Repoman,
for example, is in this category. Some of the new CSPM tools also do this.
Moving up the pyramid we have tools that go fix issues and then report on success.
Repokid is in this category.
Better than scan and fix, we can continuously fix. This is better because the
vulnerability window tends to be smaller. A big tech company told me they have a tool
that runs Lambdas and puts non-compliant cloud resources into compliance
continuously. This is the best level we can achieve if we aren’t willing to change
developer behavior at all.
Some of the newer security solutions are advocating shift-left, and this usually
involves catching issues in CI or in test environments. A good example here is
Bridgecrew and other similar shift-left CSPMs. These solutions are great because the
issue is never actually vulnerable. But technically, we still have to fix issues.
There is a better option…

“Throw computer into the sea”
- Alex Maestrei
The most secure default…
Just kidding… As tempting as this is, we all know you can’t be secure without
availability :(.

Defaults!
Transaction Barriers
Behavioral Biases
● Loss aversion
● Discounting
● Procrastination
Preference Formation
● Implicit advice
● Experience
Examples
● 🫀 organ donation
● 💊 generic prescriptions
● 💳 auto-renewal
We should be looking for secure-by-default. It’s so much more effective than anything
else. Why?
Tyranny of the default – people overwhelmingly don’t change defaults. They just don’t.
A study from Microsoft found that 95% of Word users kept the defaults that were
preloaded.
In fact, there’s a whole branch of economics called “Behavioral Economics” that uses
defaults as a powerful tool.
In a paper for the Australian Government called “Harnessing the power of defaults”
the authors describe why defaults work and how to use them:
Defaults work because of three main categories: transaction barriers, behavioral
biases, and preference formation.
Transaction barriers – essentially some combination of actual pain in the ass or
perceived pain in the ass to change the settings.
Behavioral biases use the way we are wired, specifically:
- Loss aversion – people are wired to avoid perceived losses. The “settings”
they already have are included in this in the brain.
- Discounting – I have to opt out now (pay a cost) but the benefits are in the
future
- Procrastination – when a decision carries cognitive load, people maintain the
status quo
Preference formation – two parts:
- Implicit advice – the defaults are seen as a suggestion by a presumed expert
- Experience – staying in a state for too long leads the decision maker to prefer the
default
Examples:
● Organ donation – Even though many like the idea of organ donation, few
actually go to the trouble to sign up. In countries where donation is the default
(opt-out), up to 90% of people are organ donors. Countries with opt-in struggle
to get to 15%.
● In one research study, researchers changed the default for prescriptions to
use the generic. Generics are basically as effective as brand names and save
the healthcare system a lot of money when used. Despite this, physicians
would stick to prescribing brand names. Researchers made the generic the default,
with a box that allowed physicians to opt out, and increased the share of generic
prescriptions by 23.1 percentage points, to 98.4%.
● Sales people know the value of defaults first-hand; this is why most
subscriptions helpfully auto-renew.
How do we use this? You won’t believe this one simple trick that can make you more
secure than all of your friends’ companies…

Secure by Default – Application Security
The rest of this talk shows some inspiration for products, tools, services, and
approaches that got it right. I’ll break these down by category. First up – security of
applications.

Shoutout: Leif @ Segment
Segment has done a great job of open sourcing a lot of their projects, tools, and
approaches. This PR was made to one of those projects, UI-box, a React component
that implements, you guessed it, a UI box. The box can be used for implementing
buttons easily.
This feature makes it so that you can configure a property that applies to all buttons
created with it. The people making the buttons don’t have to worry about checking the
safety of their destination links. The box will only allow safe destinations (what you
would expect from a link) vs. things like JavaScript exploit code.
Segment built this because they had received a few bug bounty reports with
JavaScript HREFs and they didn’t want to keep playing whack-a-mole. In the first
version it was opt-in and in the next version it was on by default.
Similarly, the web framework Angular requires you to explicitly prepend “unsafe” to
a protocol that isn’t allowlisted. Calling these methods “unsafe” vs. something
generic like “non-default” is a good, simple cue to developers to think about what
they are doing.
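To make the idea concrete, here is a rough sketch of a link helper that only lets allowlisted protocols through. This is not Segment's or Angular's code: `safe_href`, the `SAFE_SCHEMES` allowlist, and the `#` fallback are all assumptions for illustration, and a production sanitizer needs to handle more edge cases.

```python
from urllib.parse import urlsplit

# Assumed allowlist; pick whatever protocols your links legitimately need.
SAFE_SCHEMES = {"http", "https", "mailto"}


def safe_href(url: str, fallback: str = "#") -> str:
    """Return the URL if its scheme is allowlisted (or relative), else a fallback."""
    # Drop control characters such as tabs and newlines, which browsers ignore
    # inside a scheme and which attackers use to hide "javascript:".
    cleaned = "".join(ch for ch in url if ch.isprintable()).strip()
    scheme = urlsplit(cleaned).scheme.lower()
    if scheme == "" or scheme in SAFE_SCHEMES:
        return cleaned
    return fallback
```

The person building the button calls `safe_href(destination)` and never has to think about `javascript:` links. Making that call the default inside the component is what turns it from a best practice into a guarantee.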

Shoutout: Christian Frichot
Read this! hps://go.dev/blog/tls-cipher-suites
The cipher suites in TLS date way back to SSL. The cipher suite lists the
cryptographic algorithms that are used to exchange keys, encrypt the connection, and
verify the certificates. Many servers leave the choice of which algorithms to support to
the developer, and this is a big cognitive load for devs. Without being a cryptography
expert, how are you supposed to know what to support? This is an important issue
because man-in-the-middle attackers can force a cipher suite downgrade, so if a
server supports bad cipher suites, an attacker can force the worst to be used in
some cases.
The choice is so hard there are tools that do nothing but build configuration lists for
you.
Beginning in Go 1.17, Go takes over cipher ordering for all Go users. You can use
configuration to disable cipher suites, but ordering is not developer controlled. The
crypto/tls library makes all ordering decisions based on available cipher suites, local
hardware, and remote capabilities.

Shoutout: Rami McCarthy
Next up is Tink, a project with a self-described goal of “making crypto not feel like
juggling chainsaws in the dark”. That hits home. Even as a security person, every time
I deal with crypto I get nervous. I definitely don’t want to screw this up, and I don’t feel
equipped with all of the context and history to make a decision. I could spend several
hours researching it, and even then I might still make a mistake.
Instead, I can use Google’s open source Tink library, which makes it “easy to use correctly
and hard(er) to misuse”. This is another example of a library that comes with sane
defaults baked in and prevents me from chopping myself up accidentally.

Rails CSRF prevention does what it advertises on the label: it eliminates an entire
major class of web vulnerability. Similar to Segment’s feature, it shipped as an option
at first and then became the default. I look forward to a world where developers don’t
even have to learn about CSRF. Learning about this and keeping it in mind distracts
from the work we want to be doing.

Generate a strong password during install vs. letting the user pick
I have seen a lot of applications with a default password that is supposed to be
changed and then isn’t. A previous employer had a major bug bounty submission due
to this. Passwords are bad, and I would love to move away from them completely, but
until that day comes let’s use strong, pseudorandom passwords for everything. One
way to guarantee this is to simply remove that choice from the user, or give them the
choice to pick their own password but make them jump through hoops to do it, similar
to Tink’s philosophy of making it hard to screw up crypto.
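As a small sketch of what "remove the choice" can look like at install time, this uses Python's standard `secrets` module. The function name and the 24-byte default are my assumptions, not a recommendation from any particular product.

```python
import secrets


def generate_install_password(num_bytes: int = 24) -> str:
    """Generate a random, URL-safe password from the OS's secure random source."""
    # 24 random bytes encode to 32 URL-safe characters.
    return secrets.token_urlsafe(num_bytes)


# The installer would show this once and store only a hash of it.
initial_admin_password = generate_install_password()
```

There is no default password to forget to change, because every install gets a different one.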

Use an ORM to make it really hard to write SQLi vulnerabilities.
An oldie but a goodie. When you use an ORM, you get a lot of technical benefits, but
you also make it much harder to write SQL injection vulnerabilities. This works
because we move away from raw SQL queries. Without the raw SQL queries, there’s
really nothing to inject into. Easy peasy.
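The sketch below shows the underlying mechanism rather than an ORM itself: it uses Python's built-in `sqlite3` to compare a query built by string formatting with a parameterized one, which is roughly what an ORM generates for you. The table, the function names, and the payload are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(name: str):
    # Raw SQL built from user input: the input becomes part of the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str):
    # Parameterized query: the input is passed as data, never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)   # matches every row in the table
blocked = find_user_safe(payload)    # matches nothing
```

With the unsafe version, the attacker's quote characters rewrite the `WHERE` clause. With the parameterized version, the same string is just a name that nobody has.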

Secure by Default – Architecture
OK, next up, secure-by-default architectures.

Immutable Infrastructure +
Auto-patching +
Managed Delivery
or Serverless
or Distroless
Do less patching
More than half of breaches involve some kind of an unpatched vulnerability. So what
can we do to make patching more secure-by-default?
One solution is to make patching less effortful. At Netflix, we invested a lot in guiding
developers to have more cattle and fewer pets. We generally practiced immutable
infrastructure, which means that if you want to change your system you build a new
image and redeploy vs. changing software directly on the instances. This approach
carries lots of benefits. For example, if something happens to your instance,
automation can easily bring up another identical one. We encouraged this practice
with a tool many of you have heard of – Chaos Monkey. Chaos Monkey would go
randomly make your instance unstable to test your ability to automatically recover.
If you get to the point where rebuilding and redeploying is automated, and you invest
in testing and telemetry to tell you when your application is unhealthy, you can lean
into auto-patching. The idea is you constantly redeploy images with the latest
software and if something goes wrong your orchestration routes traffic to the previous
version. Taking this one step further, Netflix has a system called “managed delivery”
where the application developers get a platform that can perform updates
asynchronously from them. Netflix invested so much here that they were able to
patch many Log4j instances in 10 minutes versus “weeks or more than a month”
according to (ISC)² data. Assuming an organization spent one week, Netflix’s 10
minutes would be over 1000 times faster.
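The arithmetic behind that comparison, using the one-week assumption stated above:

```python
MINUTES_PER_WEEK = 7 * 24 * 60       # 10,080 minutes in one week
netflix_patch_minutes = 10           # figure quoted in the talk

speedup = MINUTES_PER_WEEK / netflix_patch_minutes
print(f"{speedup:.0f}x faster")      # prints "1008x faster"
```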
An alternative approach is to simply need to patch less. One way to accomplish this is
with distroless distributions. Many folks treat containers essentially as virtual
machines, with their own operating system images. A better way to use containers is
to use the host OS components and only bundle your application and its direct library
dependencies in the container. This approach yields less overall patching.
Serverless, such as AWS Lambda, also removes the need to patch underlying
systems. Your application gets a simple runtime on top of somebody else’s host OS
that will only let you bundle the application and its library dependencies.

Default empty security groups &
Egress Filtering
or
Default empty IAM roles
Kubernetes:
● No priv containers
● No host network
● MustRunAsNonRoot
shoutout: Shelley Wu
We can also apply secure-by-default principles to ACLs.
One example is Repokid. With Repokid, we start off with intentionally broad IAM roles
that contain most of the permissions folks generally need. We observe the application
over a period of time to see which permissions are actively used. After a while we let
developers know what we’re taking away and if they don’t opt-out we remove the
unused permissions. Note – this isn’t actually “secure-by-default”, because the
application is vulnerable for a period of time. But it is automatic security.
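The core of that flow fits in a few lines. This is a simplified sketch and not Repokid's actual implementation: `plan_repo`, its arguments, and the sample permissions are hypothetical, and the real tool also handles notification, scheduling, and rollback.

```python
from __future__ import annotations


def plan_repo(granted: set[str], used: set[str], opted_out: bool) -> set[str]:
    """Return the permissions a role should keep after rightsizing."""
    if opted_out:
        # The developer said no, so leave the role alone.
        return set(granted)
    # Default path: keep only what was actually observed in use.
    return granted & used


granted = {"s3:GetObject", "s3:PutObject", "sqs:SendMessage", "ec2:TerminateInstances"}
observed = {"s3:GetObject", "sqs:SendMessage"}

kept = plan_repo(granted, observed, opted_out=False)
removed = granted - kept  # what we tell the developer we are taking away
```

The important design choice is which branch is the default: doing nothing gets you least privilege, and keeping the broad role is the path that takes effort.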
Another tactic we used at Netflix is an empty role. Many workloads don’t actually
require any IAM permissions, so our default launching for some of our systems had
empty roles. Of course, for usability we have to make it easy to go get the
permissions you need. We invested a lot in self-service for this.
We can take a similar approach for security groups. I like how AWS has default empty
security groups, which don’t allow incoming traffic. You have to explicitly add incoming
connections that you need. We can also apply secure-by-default on the way out with
egress filtering.
In Kubernetes land, we can launch with NoPriv containers, no host network, and force
containers to run without root. Shoutout to Shelley Wu for these!
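As an illustration, here is a small check for those three Kubernetes settings against a pod spec represented as a Python dict. The field names (`hostNetwork`, `securityContext.privileged`, `securityContext.runAsNonRoot`) are the ones Kubernetes uses; the `pod_violations` helper and the sample specs are hypothetical. In a real cluster you would enforce this with an admission policy rather than a script.

```python
from __future__ import annotations


def pod_violations(pod_spec: dict) -> list[str]:
    """List the ways a pod spec departs from the three secure defaults."""
    problems = []
    if pod_spec.get("hostNetwork", False):
        problems.append("hostNetwork is enabled")
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            problems.append(f"{name}: privileged container")
        # A container-level setting overrides the pod-level one.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not non_root:
            problems.append(f"{name}: runAsNonRoot is not set")
    return problems


safe_spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "app"}],
}
risky_spec = {
    "hostNetwork": True,
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
}
```

`pod_violations(safe_spec)` comes back empty, while `risky_spec` trips all three checks.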

I’m a huge fan of systems that developers want to use because it makes their lives
easier but has security properties baked in.
One case here is Spinnaker. Spinnaker has a ton of auxiliary security benefits, like
making it easy to re-deploy your application for patching like we discussed earlier. But
since so many developers prefer to use it, we have a nice injection point for secure
defaults. In Spinnaker, each application launched with its own app specific role by
default. These roles made it possible to do repoing (as we discussed earlier).
Spinnaker would make it hard to launch instances that weren’t using our golden
image. It also tracked application properties like ownership.
Another example is Lemur. Without Lemur, if a developer wants a certificate for their
microservice they have to select a cipher suite, generate a private key, generate a
certificate, get the certificate to their load balancer and handle rotation. With Lemur,
we replace all of that with a few button clicks. Now we get secure-by-default crypto
algorithms, strong key storage, and an inventory.
Finally, we have Zuul and Zuul’s internally facing sister Wall-E. Both of these services
have really nice security properties baked in that developers get for free.

hps://www.actionlockdoc.com/blog/the-history-of-car-locks/
Shoutout: Dylan Ayrey
Autos have come a long way (shoutout to Dylan for this history)
- Early cars didn’t have keys to prevent theft
- In the 1940s makers started making locks to prevent theft and protect
valuables
- In 1998 most carmakers started introducing central locking systems, to make it
so you don’t have to unlock each door
- Keyless fobs were introduced starting in late 90’s, early 00’s
- By 2018 keyless was standard on 62% of cars
Fobs make it easy to unlock your door AND make it hard to lock your key in your car
The Apple Watch can auto-unlock your machine, which makes it easier to set an
aggressive locking policy.

I think this GIF speaks for itself. The saw will prevent you from chopping your hot dog!

Chromebook automatically updates without the user having to do anything. Also, it has a
secure boot process with validation, so every time the computer starts up it makes
sure it’s booting an untampered Chrome OS.

hps://blog.google/products/chrome/chrome-secure-default-everyone/
The Chrome browser has a ton of security features built in. It has a solid built in
password manager, it can automatically upgrade all connections to HTTPS, and it
tries to use secure DNS to resolve sites. Chrome also makes it really clear when
you’re about to visit something sketchy. A special shoutout to Adrienne Porter Felt,
who has done a ton of work and research on making Chrome secure-by-default and
highly usable.

Shoutout: Peter Collins
Read this! hps://webauthn.guide/
WebAuthn has a ton of secure-by-default properties (shoutout to Peter Collins for this
one). I think most of us can agree that passwords aren’t aging well. All of us still have
a loved one who uses the same terrible password for every site. I highly recommend a
read of the link above.
According to the site, 81% of hacking related breaches use weak or stolen
passwords. Developers have to figure out how to store and manage passwords
securely, which is a big burden for them. Users have to figure out how to store and
manage passwords. This is the kind of stuff that makes my family afraid to use the
internet.
With WebAuthn, we store private keys in HSMs, which are systems purpose built to
keep secrets safe. Users don’t have to deal with keys at all. Developers only have to
store a public key, which is deliberately public, so if it gets disclosed it’s worthless to
an attacker. Probably the biggest game-changer here is that your private key can only
be used to authenticate to the site it’s scoped for, mitigating the risk of phishing.
Phishing is so bad it basically dominates the Verizon DBIR results, so most folks filter
it out when citing DBIR.

Shoutout: Will Bengtson
Here’s a cool example – AirPods have a feature that warns when you’re about to
leave them somewhere. This is good, because the damn things fall out of my pocket
constantly.

GSuite
O365
Both GSuite and O365, two of the biggest hosted email providers, have malware and
phishing prevention built in and on-by-default. This makes email far safer to use by
default.

Shoutout: Material Security
Extending this one step further is one of my favorite vendors, Material Security.
Material noticed that attackers often compromise email and then dig out sensitive
information to use for further attacks. Material will automatically quarantine this info
and require a second-factor push to retrieve it. This way, you still have the info if you
need it, but attackers can’t easily get it.

So, what’s the point?
● Move up the pyramid (more Murderbot)
● If your product requires me to throw a bunch of ops at it, I’m not buying
● Defaults are powerful – make it hard to do the wrong thing
● Users have it hard, make their lives easier
● An ounce of prevention is worth a pound of cure
I want to move the industry up the pyramid. Visibility of security issues shouldn’t be a
viable product.

Takeaway for Developers
● You are responsible for security of your application
● Guide users to safe choices by default, let them opt out
● Walk through setup with your users, observe their struggles
● If there is a clear best practice, make it default

Recommendations
● Chromebooks for your family
● WebAuthn for auth
● Golang for your apps
● Privless for containers, firewalls, and RBAC

Thank you!
@travismcpeak
hps://www.linkedin.com/in/travismcpeak/
