David Hurley, Microsoft
Bryan Jeffrey, Microsoft
Naveed Ahmad, Microsoft
Speed Kills. If you cannot detect and remediate an attacker before they achieve their objective, you lose. The sooner you detect an attacker along the kill chain, the better your chance of disrupting and thwarting the attack.
The M365 Vanquish system leverages open-source technologies such as Spark on Azure HDI clusters and Siphon/Kafka to deliver detections and alerts in minutes at M365 scale. Signals are generated by an on-box agent, HostIDS (developed by ODSP Protect), as well as from Windows security events, and are sent to Vanquish for processing in the cloud. Detection processing is not tightly coupled to workload service resources, and implementing new detections is not tied to the deployment of the workload services being monitored. This removes the risk of changes in security detection processing impacting production services and allows increased agility in detection creation, processing, and deployment.
We’ll show examples from the ‘Red October’ pen test of how this agility allows incident-specific detections to be created, deployed, and delivering results while an incident is in progress. Additionally, M365-wide IOC (Indicator of Compromise) detections can be added in seconds; we’ll show recent pen test examples of how adding incident-specific command-and-control IPs can provide scope and insight into attacker movement. Dynamic signal filters across all of M365 can likewise be applied in seconds to focus an investigation.
Another key factor in enabling agility is being able to validate quickly that changes to the system work as intended. Using AttackBot (the M365 Red Team’s attack generation tool), Vanquish continuously measures detection effectiveness across M365 services with automated, real attack signal, allowing us to validate that detections continue to work. By measuring effectiveness rather than attesting to it, you can quickly identify regressions when they occasionally happen and have confidence that your detections are up and working.
Agility is also a key factor in our machine-learning-based detections. In M365 Vanquish, model training and promotion are fully automated. New models are trained and promoted constantly, with measurements of effectiveness against AttackBot and historical attacks gatekeeping the promotion of each model. By automating this process end to end, we have confidence that we are always armed with the best possible models. Signals are evaluated and scored in real time against the most recent model, enabling high-fidelity, low-latency urgent alerting within minutes of attacker activity. We’ll show examples of pen test activity that produced urgent alerts within minutes!
By designing detection systems that ensure agility with measurable effectiveness, you can improve your security posture with confidence that is not misplaced.
3. Taken from the 2018 Verizon Data Breach Investigations Report
https://www.verizonenterprise.com/resources/reports/rp_DBIR_2018_Report_execsummary_en_xg.pdf
4. What we are going to talk about
What is Vanquish?
Agility by Design
Agility through Measurability
Agility via Machine Learning
Lessons of Red October Pen Test
5. What is Vanquish?
Near real-time security monitoring & analytics platform for M365 Data Center infrastructure
• Detections
• Remediation
• Alerting
• Telemetry from Hosts
• Integrated Context
• Incident Management
• Analyst Tools
7. Agility by Design
Move fast, don’t impact customers
Treat detections as code, not scripts
Leverage the right technologies
Detections at the speed of attackers
Remediation and Investigation at the speed of attackers
8. Vanquish is decoupled from monitored assets
• Move fast, but don’t impact customers
• Deploy new code without risk to monitored assets
• Apply filters across all dimensions in seconds
• Create IOCs across all dimensions in seconds
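The “filters and IOCs across all dimensions in seconds” point can be sketched as a detection-side matcher whose indicator set is updated at runtime rather than at deploy time. This is a minimal illustration under assumed names; the class, the signal fields, and the example address are invented, not the Vanquish interfaces.

```python
# Minimal sketch of a runtime-updatable IOC matcher. All names and
# addresses here are illustrative, not the actual Vanquish API.
class IocMatcher:
    def __init__(self):
        self._iocs = set()

    def add_iocs(self, indicators):
        # A set update takes effect immediately for every signal that
        # follows -- no redeploy of the detection pipeline needed.
        self._iocs.update(indicators)

    def match(self, signal):
        """Return the indicators this signal touches (empty set if none)."""
        return self._iocs & {signal.get("src_ip"), signal.get("dest_ip")}

matcher = IocMatcher()
matcher.add_iocs({"203.0.113.7"})  # e.g. an incident-specific C2 address
```

Because matching is just a set intersection, an incident responder can scope attacker movement across the whole fleet seconds after learning a new command-and-control address.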
9. Detections are code – not scripts
Broken is not agile
Detections are tested – deployment is gated on passing tests
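“Detections are code, deployment is gated on passing tests” can be illustrated with a detection written as a plain, testable function plus a gate that refuses to deploy on any failing case. The detection logic and event fields below are assumptions made for the sketch, not a real Vanquish detection.

```python
# Sketch of "detections are code": each detection is a plain, testable
# function, and deployment is gated on its test cases passing. The
# detection logic and event fields are illustrative assumptions.
def office_spawns_shell(event):
    """Flag Office apps spawning a shell -- a common initial-access pattern."""
    office = {"winword.exe", "excel.exe", "powerpnt.exe"}
    shells = {"cmd.exe", "powershell.exe"}
    return (event.get("parent", "").lower() in office
            and event.get("process", "").lower() in shells)

def gate_deployment(detection, test_cases):
    # Deploy only when every (event, expected_verdict) pair passes.
    return all(detection(ev) == want for ev, want in test_cases)

CASES = [
    ({"parent": "WINWORD.EXE", "process": "cmd.exe"}, True),
    ({"parent": "explorer.exe", "process": "cmd.exe"}, False),
]
```

Keeping negative cases alongside positive ones is what makes “broken is not agile” enforceable: a noisy or silently dead detection fails the gate the same way a compile error would.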
11. Detections at the speed of attackers
Detection of Badness → Forensic Analysis → Created New Detection → Added the Detection to ML Model → More Badness Found
The Hunt for Red October
• Alerted <4 minutes after intrusion
• IOCs added in minutes
• New detection deployed within hours
12. Remediation: Too Slow
1. 9am: Decision to remediate
2. 1pm: Attacker starts pivoting
3. 4pm: Remediation complete
Ample opportunity to remediate
Delayed by tooling
Active attacker kept ahead of us
13. Investigation and Remediation at the speed of attackers
On-Host Telemetry → Cloud-Based Detection → Remediation Subsystem → Data-Center Management Service → On-Host Remediation or Investigation
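The flow above can be sketched as a chain of pluggable stages: on-host telemetry feeds a cloud detection, a verdict yields a remediation action, and the action is executed back on the host via the data-center management layer. Every function and field name below is a stand-in for illustration, not the real subsystem interfaces.

```python
# Sketch of the telemetry -> detection -> remediation -> on-host
# execution chain. All names are illustrative stand-ins.
def remediation_flow(event, detect, plan_action, execute_on_host):
    verdict = detect(event)
    if verdict is None:
        return None  # benign: nothing to remediate
    action = plan_action(verdict)
    return execute_on_host(event["host"], action)
```

Keeping each stage behind an interface is what lets remediation tooling move at the speed of attackers: the slow manual handoffs from the “Too Slow” timeline become one automated call path.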
15. System is Up!
Run pipeline as a service
Monitor for data latency & completeness
Monitor Spark Jobs
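The data-latency check above can be sketched as a freshness monitor, assuming each data source reports the event time of its newest processed record. The source names and thresholds are invented for the example.

```python
import time

# Illustrative data-latency check for pipeline health monitoring,
# assuming each source reports the event time of its newest processed
# record. Source names and thresholds are made up.
def stale_sources(last_event_time, max_lag_seconds, now=None):
    """Return the sources whose newest data exceeds the allowed lag."""
    now = time.time() if now is None else now
    return sorted(src for src, ts in last_event_time.items()
                  if now - ts > max_lag_seconds)
```

A detection pipeline that is running but starved of fresh signal is as blind as one that is down, which is why latency and completeness are monitored alongside the Spark jobs themselves.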
16. Endpoints are covered!
We look for heartbeats and configuration correctness for each host
We have monitoring for HostIDS health
Remediation is automated for unhealthy endpoints
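The endpoint-coverage checks above can be sketched as a triage pass: a host is healthy only if it has heartbeated within the window and reports the expected agent configuration, and anything else is queued for automated remediation. The field names, window, and config string are assumptions for the sketch.

```python
# Sketch of endpoint-coverage triage: heartbeat freshness plus
# configuration correctness, with unhealthy hosts queued for automated
# remediation. Field names, window, and config string are assumptions.
HEARTBEAT_WINDOW = 300            # seconds
EXPECTED_CONFIG = "hostids-v2"    # hypothetical agent config version

def triage_hosts(hosts, now):
    """Split hosts into (healthy, needs_remediation) name lists."""
    healthy, remediate = [], []
    for host in hosts:
        fresh = now - host["last_heartbeat"] <= HEARTBEAT_WINDOW
        configured = host["config"] == EXPECTED_CONFIG
        (healthy if fresh and configured else remediate).append(host["name"])
    return healthy, remediate
```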
18. Pen Test results
Pen tests in the last year which did not trigger a paging alert: 0
Before we get too overconfident – our Red Team is awesome
• Detecting them does not mean that they did not achieve their objective
M365 still believes in the Assume Breach approach
19. AttackBot is constantly validating detections
Automated Attacks Run Frequently → Process Signal / Create Detections → Auto-Label Detections → Measure Results → Adjust (if needed)
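The measurement step in the loop above can be sketched as a regression check: replay a labelled attack, compare the detections that fired against the ones the attack is known to trigger, and surface anything missing. The detection names are invented for illustration.

```python
# Sketch of the "Measure Results" step of the AttackBot loop: compare
# the detections that fired for a replayed attack against the set the
# attack is known to trigger. Detection names are invented.
def detection_regressions(expected, fired):
    """Detections that should have fired for this attack but did not."""
    return set(expected) - set(fired)
```

A non-empty result is what turns “attesting to effectiveness” into “measuring effectiveness”: the regression is caught by automation, not discovered during a real incident.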
22. Anomaly Calculation
• A service in a Data Center is largely uniform
• Automate whitelists for normal behavior
• State snapshots: autorun reg keys, group membership
• Challenges:
• Anomalous ≠ Malicious
• Emerging behaviors create noise
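The whitelist/snapshot idea above can be sketched as a diff: compare a host’s current state (e.g. autorun registry keys, group membership) against an automatically learned baseline of normal values, and report only what the baseline has never seen. Categories and values below are illustrative.

```python
# Minimal sketch of snapshot-based anomaly detection: diff a host's
# current state against an automatically learned whitelist of normal
# values. Category and value names are illustrative.
def anomalies(baseline, snapshot):
    """Per-category items present now that the baseline has never seen."""
    result = {}
    for category, values in snapshot.items():
        new = values - baseline.get(category, set())
        if new:
            result[category] = new
    return result
```

This also shows why anomalous ≠ malicious: a legitimate new autorun entry rolled out by an engineering team produces exactly the same kind of diff as an attacker’s persistence mechanism.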
23. Anomaly Detection is Not Enough
~500 Billion Events Per Day
+ We Catch Attacks
[Charts: Anomaly Detections per Day, 8/25–9/2, in the hundreds of thousands; Alerts per Day over the same period, in single digits]
24. Supervised Machine Learning
Maintain an archive of known malicious behavior
• Pen test, attack automation, and more
• Anything that our security analysts have labelled malicious
New behavior → Is it similar to known attacks?
Limitation – can’t learn what we haven’t seen
• But there is value in auto-learning what we have seen
• With a world-class Pen Test team you can auto-learn a lot
Challenge – M365 evolves quickly → the model becomes stale quickly
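The “is new behavior similar to known attacks?” question can be sketched as nearest-neighbour scoring: reduce behaviour to a feature vector and score it by cosine similarity to the closest vector in the archive of labelled malicious behaviour. The real system trains models over far richer features; this is only a sketch of the idea.

```python
import math

# Illustrative "similar to known attacks?" scoring via cosine
# similarity to the nearest labelled malicious feature vector.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def attack_score(vector, known_attacks):
    """Similarity of a new behaviour to its nearest labelled attack."""
    return max((cosine(vector, k) for k in known_attacks), default=0.0)
```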
25. Repeatable Intelligent Automation
• Data processing, model training, evaluation & promotion take time when done manually
• AI & automation are the key to agility & better results
Pipeline: Data Processing (wrangling, normalization, sampling, bootstrapping, etc.) → Feature Extraction → ML Model Training & Evaluation → Model Promotion & Threshold Selection
Repeatable intelligent automation runs this pipeline end to end, without needing human intervention.
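The promotion step of the pipeline above can be sketched as a gate: a freshly trained model replaces production only if its measured effectiveness on the AttackBot corpus and historical attacks clears a floor and does not regress from the current model. The recall metric and threshold value are invented examples of such a gate, not the actual promotion criteria.

```python
# Sketch of automated, gated model promotion: the candidate must clear
# an effectiveness floor and must not regress from production. The
# metric (recall) and the threshold are invented for illustration.
def should_promote(candidate_recall, production_recall, floor=0.95):
    """Promote only when the candidate clears the floor and doesn't regress."""
    return candidate_recall >= floor and candidate_recall >= production_recall
```

Making the gate mechanical is what gives confidence that you are “always armed with the best possible models”: no model ships on a hunch, and no regression ships at all.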
26. Model Performance and Automated Learning
Hunt for Red October: 24 machines compromised in 4 days
• 10 – alerted by ML before humans
• 6 – tied between ML & humans
• 8 – missed by ML at first; the new malicious behavior was identified and labelled by humans on a couple of machines, learned by ML automatically, and ML then alerted on the rest
27. Agile Model Experimentation and Update
Adding/updating features in the ML model doesn’t require a code change.
Feature Extraction: Normalized Detections → Feature Vectors
<Features>
  <Feature Type="Numeric" Signal="Detection1" Operation="Count" Field="ProcessName" /> <!-- Number of processes captured -->
  <Feature Type="Numeric" Signal="Detection2" Operation="Max" Field="Score" />
  <Feature Type="Numeric" Signal="Detection3" Operation="MaxSum" Field="Bytes,IP,Port" /> <!-- New detection feature: max bytes transferred to a destination -->
</Features>
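Config-driven feature extraction of this kind can be sketched as follows: each <Feature> element becomes one entry in the feature vector, so adding a feature is a config change rather than a code change. The element and attribute names follow the slide; the operations supported here are a small assumed subset of whatever the real extractor handles.

```python
import xml.etree.ElementTree as ET

# Sketch of config-driven feature extraction: one vector entry per
# <Feature> element. Element/attribute names follow the slide; the
# supported operations are an assumed subset.
CONFIG = """
<Features>
  <Feature Type="Numeric" Signal="Detection1" Operation="Count" Field="ProcessName" />
  <Feature Type="Numeric" Signal="Detection2" Operation="Max" Field="Score" />
</Features>
"""

def extract_features(config_xml, detections):
    """Build a feature vector from normalized detection records."""
    vector = []
    for feat in ET.fromstring(config_xml):
        values = [d[feat.get("Field")] for d in detections
                  if d["signal"] == feat.get("Signal")]
        if feat.get("Operation") == "Count":
            vector.append(len(values))
        elif feat.get("Operation") == "Max":
            vector.append(max(values, default=0))
    return vector
```

Because the extractor only interprets declarations, experimenting with a new feature means editing XML and letting the automated training/promotion pipeline decide whether the resulting model is better.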
31. Takeaways
Design to move fast, without impacting customers
Build confidence through continuous validation
Effectiveness at scale through Intelligent Automated ML
If you are an M365 service – get onboarded with us :-)
32. Questions?
Bryan Jeffrey, Naveed Ahmad, David Hurley
O365 Signals - Security Signals Team
Members in Cambridge, Redmond, and Suzhou
Contact us:
O365f-enggsise@microsoft.com
Bryan.Jeffrey@microsoft.com
Navahm@microsoft.com
Davehur@microsoft.com
M365 Service that wants to onboard to Vanquish?
https://aka.ms/getvanquish