© 2019 SPLUNK INC.
Machine Learning in Infosec
Debunking Buzz and Demystifying Use-cases
Presented by: Dominique Dessy
With content from: Philipp Drieger and David Gamer
LSEC – CTI 23 Jan 2019
Agenda
▶ What is Machine Learning (Really)
▶ How does ML fit into Security?
▶ Working Example (Domain Generation Algorithms)
▶ Further Common Security Use-Cases with ML
▶ Fitting ML into your SOC process
▶ Questions
What is Machine Learning?
Existing Algorithms
‘We’ (Vendors) didn’t just make this stuff up
▶ Logistic Regression (David Cox, 1958)
▶ Random Forest Classification (Tin Kam Ho, 1995)
▶ Principal Component Analysis (PCA) (Karl Pearson, 1901)
Improving
Progressively improving performance without reprogramming
Observations at Machine Speed
Making simple observations faster than humans
Human Speed (dd-hh:mm) vs Machine Speed (:ss.fffff)
Machine Learning for Security?
Machine Learning (ML) in Security
What it ISN’T
ML is NOT:
✘ A silver bullet
✘ A fortune teller
✘ Smarter than an analyst
✘ A replacement or easy button for a SIEM
What ML CAN’T do in security*:
✘ Be 100% effective from day one
✘ Predict breaches
✘ Replace analysts
✘ Work without oversight
* at least not today in real world environments
Machine Learning (ML) in Security
What it IS
✔ Bulk observation and analysis of data, faster than an analyst
✔ Identification of Anomalies, Dynamic Classification, Mining Behavior
✔ Higher level of confidence than vanilla correlation
✔ Focusing Analyst time on the areas most likely to relate to an incident
✔ Making Analysts more effective, saving manual work
“Machine Learning augments your analysts and maximises the value of their time”
David Gamer, Splunk
Let’s lift the veil
Buy a product powered by AI with self-healing capabilities.
All security issues solved?
Search and Respond
Every Search Can Use Machine Learning
[Diagram: data from OT, industrial assets, IT, consumer and mobile devices, third-party applications, smartphones and devices, tickets and email is searched in real time; alerts can send an email, file a ticket, send a text, flash lights or trigger a process flow]
Typical Machine Learning Scenarios
Anomaly detection
► Deviation from past behavior
► Deviation from peers (aka Multivariate AD or Cohesive AD)
► Unusual change in features
► ITSI MAD Anomaly Detection
Predictive Analytics
► Predict Service Health Score
► Predicting Churn
► Predicting Events
► Trend Forecasting
► Detecting influencing entities
► Early warning of failure – predictive maintenance
Clustering
► Identify peer groups
► Event Correlation
► Reduce alert noise
► Behavioral Analytics
► ITSI Event Analytics
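As an illustration of the "deviation from past behavior" scenario, the following is a minimal sketch that flags a value lying several standard deviations away from an entity's own history. The function names, the threshold of 3 sigmas and the login counts are assumptions made for this example; they are not taken from the slides or from any Splunk product.

```python
from statistics import mean, stdev


def zscore(history, value):
    """Return how many standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat baseline: any change at all is a deviation, no change is not.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` when it deviates from past behavior by more than `threshold` sigmas."""
    return abs(zscore(history, value)) > threshold


# Hypothetical daily login counts for one account (illustrative data only).
logins = [12, 15, 11, 14, 13, 12, 16]
print(is_anomalous(logins, 14))  # a value close to the usual range
print(is_anomalous(logins, 90))  # a value far outside the usual range
```

A per-entity baseline like this only covers the first bullet; comparing an entity against its peers would need a baseline computed over the peer group instead.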
Types of Machine Learning
▶ Supervised Learning (labeled data)
• regression
• classification
▶ Unsupervised Learning (unlabeled data)
• clustering
• anomaly detection
▶ Mixed Models (with reinforcement or feedback)
• human in the loop
• autonomous systems
Skill Areas for Machine Learning
Data Science Expertise
• Statistics/math background
• Algorithm selection
• Model building
Domain Expertise (IT, Security, IoT…)
• Identify use cases
• Drive decisions
• Understanding of business impact
Splunk Expertise
• Searching
• Reporting
• Alerting
• Workflow
DGA App for Splunk
Download for free from splunkbase:
https://splunkbase.splunk.com/app/3559/
Domain Generation Algorithms (DGA)
What’s the problem?
Challenges in detecting DGAs:
▶ Static matching runs against a potentially infinite list of blacklist entries: O(∞)
▶ Regex can narrow down this list, but rules are still expensive to compute and hard to find (and exceptions to those rules must be defined)
▶ Unknown unknowns?
▶ Want to get fuzzy?
▶ Let’s ML!
▶ Example of DGAs: example IoCs for WannaCry
(https://cert.europa.eu/static/SecurityAdvisories/2017/CERT-EU-SA2017-012.pdf)
▶ Example of an end-to-end data science process
▶ Disclaimer: this is not a turnkey solution but a template to get you started
▶ Feel free to improve it and let the author know your feedback ☺
▶ Small data set with 100K domain names for training and testing
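Splitting a labeled data set into a training part and a testing part could look roughly like the sketch below. The 70/30 split, the seed, the helper name `train_test_split` and the sample rows are assumptions for illustration; the slides do not state how the DGA app partitions its data.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle labeled rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled rows (domain, label); the data set on the slide has about 100K entries.
rows = [(f"example{i}.com", "legit") for i in range(5)] + [
    (f"xq{i}zk7vbn.net", "dga") for i in range(5)
]
train, test = train_test_split(rows)
print(len(train), len(test))
```

Keeping the test rows out of training is what allows a later check of how the model behaves on domains it has never seen.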
▶ More features can significantly improve your machine learning models
▶ Extend this with your feature engineering ideas (e.g. subdomains, age of
domain registration, rating/scoring from threat lists for known malicious
domains etc.)
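A minimal sketch of lexical feature engineering for domain names is shown below. The chosen features (length, character entropy, digit ratio, vowel ratio), the function names and the sample domains are assumptions for illustration and are not necessarily the features used by the DGA App for Splunk. The sketch also simplifies by looking only at the left-most label of the domain.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain):
    """Compute simple lexical features from the left-most label of a domain."""
    label = domain.lower().split(".")[0]
    letters = [c for c in label if c.isalpha()]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(c.isdigit() for c in label) / len(label) if label else 0.0,
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
    }


# Hypothetical inputs: one dictionary-like name and one made-up random-looking name.
for name in ("google.com", "xkqj7w2vbn9zt4.com"):
    print(name, domain_features(name))
```

Random-looking names tend to have higher character entropy and fewer vowels than dictionary-like names, which is why features of this kind can give a classifier something to separate on.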
▶ Consider your goals when using machine learning in the context of your problem: maximize the detection rate? Minimize false positives?
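The trade-off between detection rate and false positives can be made concrete by moving the decision threshold of a scoring model. The sketch below uses made-up scores and labels, and the function names are hypothetical; it shows the mechanics only, not results from the DGA example.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are flagged as malicious."""
    tp = fp = fn = tn = 0
    for score, is_malicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_malicious:
            tp += 1
        elif flagged:
            fp += 1
        elif is_malicious:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def detection_and_false_positive_rate(scores, labels, threshold):
    """Return (detection rate, false positive rate) at the given threshold."""
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


# Hypothetical model scores (probability of "DGA") and true labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

for threshold in (0.25, 0.5, 0.9):
    print(threshold, detection_and_false_positive_rate(scores, labels, threshold))
```

On this toy data a low threshold catches every malicious domain at the cost of more false positives, while a high threshold removes the false positives but misses detections; which end to favor depends on the goal named on the slide.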
Reality check: Detect Unknown Unknowns?
Example WannaCry
▶ Check how our trained model performs against WannaCry C&C domains that the model has NOT been trained on.
Security Use-Cases
Common ML Use-cases in Security
Fraud
• Outlier transactions
• Unexpected patterns of spending
• Deviation from peer groups
• Account takeover
Malicious Traffic
• New Domain Access
• External Access Patterns
• Unusual Downloads / Uploads
• Account takeover
Insider Threat
• Unusual access to company resources
• Unusual Downloads / Uploads
• Unusual external device access
• Account takeover
Fitting ML into the SOC
Where will all these robots sit?
• Alerts from ML feed into the same workflow as other security notifications
• Level 1 initial analysis and triage is still required for ML alerts
• A higher level of confidence means that these alerts CAN be prioritised
• Alert velocity for ML should be much lower than from other alert sources
• An Analyst should assess all alerts before action is taken, even for ML
• The Output from Machine Learning is INDICATIVE not DEFINITIVE
[Diagram: Raw Security Events, Anomalies and Anomaly Chains (Threats), with Machine Learning, Graph Mining, Threat Models and a Feedback loop. Example anomalies: Lateral Movement, Beaconing, Land-Speed Violation. Other labels: HCI, anomalies graph, entity relationship graph, kill chain sequence, forensic artifacts, threat/risk scoring]
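One of the anomalies named in the diagram, the land-speed violation, can be sketched as a check on whether two logins by the same account imply an impossible travel speed. The haversine formula is standard; the 1000 km/h limit, the tuple layout and the function names are assumptions for this example and do not describe how any Splunk product implements the check.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_land_speed_violation(login_a, login_b, max_kmh=1000.0):
    """Each login is (lat, lon, unix_seconds). True if the implied speed exceeds max_kmh."""
    (lat1, lon1, t1), (lat2, lon2, t2) = login_a, login_b
    hours = abs(t2 - t1) / 3600.0
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        # Simultaneous logins from two different places cannot be the same traveller.
        return distance > 0
    return distance / hours > max_kmh


# Hypothetical logins one hour apart (approximate city coordinates).
brussels = (50.85, 4.35, 0)
antwerp = (51.22, 4.40, 3600)
new_york = (40.71, -74.01, 3600)
print(is_land_speed_violation(brussels, antwerp))
print(is_land_speed_violation(brussels, new_york))
```

In line with the bullets above, a hit from a check like this is indicative rather than definitive: VPNs and imprecise IP geolocation can produce the same pattern, so an analyst still needs to assess the alert.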