This document discusses applying data science techniques to indicator of compromise (IOC)-based detection. It describes challenges with using IOCs, including quality issues and high volume. The document proposes using data enrichment and aggregation to identify relationships between IOCs and measure their "maliciousness ratio" and "rating". This provides context to better determine if an IOC represents a true compromise. The approach aims to make threat intelligence more useful at scale.
2. Who am I?
• Security Data Scientist
• Capybara Enthusiast
• Co-Founder and Chief Data Scientist at Niddel
(@NiddelCorp)
• Lead of MLSec Project (@MLSecProject)
• What is a Niddel?
• Niddel provides a SaaS-based Autonomous Threat Hunting System
• Research from this talk was performed using anonymized Niddel data and
uses concepts implemented in its products.
• Not a vendor-centric talk; the focus is on learning and on enabling you to reproduce this.
3. Agenda
• The Promise of IOCs
• 7 Habits of Highly Effective Analysts (ok, only 3)
• Nation-State APT Detection Deluxe Recipe
• Data Science to Assist on Pivoting
• Maliciousness Ratio
• Maliciousness Rating
• Revisiting TIQ-TEST – Telemetry Test
5. Promise - Some Definitions First
• IOCs: Indicators of compromise
• CTI: Cyber Threat Intelligence
• Will be using them interchangeably
during this presentation
• IOCs -> technical data that allows for
"tactical" discovery of a potential
compromise on a system
• We will be focusing on network IOCs in
this talk
Little Bobby Comics by @RobertMLee and Jeff Haas
6. Promise – Sounds Great! Sign me up!
• Not so fast, my friend
• Main challenges with IOC consumption:
• Quality and Curation
• Vetting and quality control
• Open feeds vs Paid feeds
• Manual vs Automated
• Velocity and Volume
• How to operationalize?
• Add to SIEM?
• Block in Firewall / Web Proxy?
7. Promise – Quality and Velocity at Odds
• AIS (Automated Indicator Sharing) – threat
intel sharing initiative from the US Department of Homeland Security
• I fully support sharing (see previous
intel sharing decks from 2015)
• But if we are resigned to this level of
quality ("it is what it is"), how can CTI /
IOCs be shaped into a useful tool at
scale?
8. Promise – Current Implementation Strategies
1. Alerting based on matching with IOC data:
• By being careful, only matching on more "precise" indicators (URLs >> IPs),
you can reduce the number of False Positives, but it is still challenging
2. Using IOC data to build context for existing alerts:
• Safer bet, but you are not adding any detection power to existing controls
SPOILER ALERT:
Everyone starts with (1) because "the FPs can't be that bad", and then begrudgingly
moves to (2) because there is not enough time in the world to go through all the
noise that (1) generates.
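Strategy (1) can be sketched as a simple set-membership lookup against telemetry. This is a hypothetical minimal example: the log field names and sample data are assumptions, not from any real product.

```python
# Strategy (1): alert on exact matches between telemetry and an IOC feed.
# The feed contents and proxy-log schema below are illustrative only.
ioc_domains = {"evil-cdn.example", "rig-gate.example"}  # vetted IOC feed

proxy_logs = [
    {"src": "10.0.0.5", "dst_domain": "evil-cdn.example"},
    {"src": "10.0.0.9", "dst_domain": "intranet.corp.example"},
]

# Every match becomes an alert -- which is exactly why noisy feeds
# overwhelm analysts: alert volume scales with feed (low) quality.
alerts = [log for log in proxy_logs if log["dst_domain"] in ioc_domains]
print(alerts)
```

Strategy (2) would instead attach the IOC hit as context to an alert an existing control has already raised, rather than generating a new one.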
13. Data Science to Assist on Pivoting
• Doing it ourselves – begin with data collection:
1. Get IOCs from your favorite / available providers – there are a few options
that are fairly good. Please do select according to your collection criteria.
2. "Enrich" the data to gather the "pivot points" and find the connections.
Combine (https://github.com/mlsecproject/combine) can help with IOC gathering
and enrichment for ASN data and pDNS (if you have a Farsight pDNS key)
• IP Addresses:
• AS number
• BGP prefix
• Country
• pDNS relationship to domains
• Domain names:
• pDNS relationship to IPs
• WHOIS Registrations
• SOA
• NS Servers
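Grouping enriched IOCs by each pivot point is what surfaces shared infrastructure. A minimal sketch, assuming enrichment has already produced records like those below (the IPs, ASNs, and field names are made up; in practice the values would come from tools such as Combine or a pDNS provider):

```python
from collections import defaultdict

# Hypothetical enrichment output: one record per IOC with its pivot points.
enriched_iocs = [
    {"ioc": "203.0.113.10", "asn": "AS48096", "country": "RU"},
    {"ioc": "203.0.113.25", "asn": "AS48096", "country": "RU"},
    {"ioc": "198.51.100.7", "asn": "AS16276", "country": "FR"},
]

# Group IOCs by a pivot key (here: ASN) to find indicators that share
# infrastructure. The same pattern works for BGP prefix, NS servers, etc.
by_asn = defaultdict(list)
for rec in enriched_iocs:
    by_asn[rec["asn"]].append(rec["ioc"])

print(dict(by_asn))
```

The same grouping, repeated per pivot point, is the raw material for the graphs built in the aggregation step.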
19. Data Aggregation – Rig EK Example
In summary: let's create a separate graph for each of the pivot points and measure the
cardinality of the node connectedness (i.e., node degree)
AS48096 - ITGRAD
AS16276 – OVH SAS
AS14576 – Hosting Solution Ltd
(actually king-servers.com)
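The "cardinality of node connectedness" can be computed as the degree of each pivot node in a bipartite IOC-to-pivot graph. A sketch using plain counting (the ASN names are from the slide; the IOC IPs are illustrative placeholders):

```python
from collections import Counter

# Bipartite edges: (IOC IP, ASN pivot node). IPs here are made up;
# the ASNs are the ones observed in the Rig EK example.
edges = [
    ("203.0.113.10", "AS48096 - ITGRAD"),
    ("203.0.113.25", "AS48096 - ITGRAD"),
    ("198.51.100.7", "AS16276 - OVH SAS"),
    ("192.0.2.44", "AS14576 - Hosting Solution Ltd"),
    ("192.0.2.45", "AS14576 - Hosting Solution Ltd"),
]

# Degree of each ASN node = how many known-bad IOCs connect to it.
# Higher degree suggests infrastructure more heavily used by the campaign.
degree = Counter(asn for _, asn in edges)
for asn, count in degree.most_common():
    print(asn, count)
```

A graph library (e.g. networkx) would give the same degrees plus richer pivoting, but the counting above captures the core measurement.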