SlideShare a Scribd company logo
Data-Driven Threat Intelligence: Useful
Methods and Measurements for
Handling Indicators (#ddti)
Alex Pinto
Chief Data Scientist
Niddel / MLSec Project
@alexcpsec
@MLSecProject
Alexandre Sieira
CTO
Niddel
@AlexandreSieira
@NiddelCorp
MLSec Project / Niddel
• MLSec Project – research-focused branch of Niddel for open-
source tools and community building
• Niddel builds Magnet, the Applied Threat Intelligence Platform
focused on detecting breaches and malware activity
• Looking for trial prospects and research collaboration
• More info at:
• niddel.com
• mlsecproject.org
• Cyber War… Threat Intel – What is it good for?
• Combine and TIQ-test
• Using TIQ-test
• Dataset description
• Tests and more tests
• Use case: Feed Comparison
• Future research direction
Agenda
What is TI good for (1) Attribution
What is TI good for anyway?
TY to @bfist for his work on http://sony.attributed.to
What is TI good for (2) – Cyber Maps!!
TY to @hrbrmstr for his work on
https://github.com/hrbrmstr/pewpew
What is TI good for anyway?
• (3) How about actual defense?
• Strategic and tactical: planning
• Technical indicators: DFIR and monitoring
Affirming the Consequent Fallacy
1. If A, then B.
2. B.
3. Therefore, A.
1. Evil malware talks to 8.8.8.8.
2. I see traffic to 8.8.8.8.
3. ZOMG, APT!!!
Combine and TIQ-Test
• Combine (https://github.com/mlsecproject/combine)
• Gathers TI data (ip/host) from Internet and local files
• Normalizes the data and enriches it (AS / Geo / pDNS)
• Can export to CSV, “tiq-test format” and CRITs
• Coming Soon™: CybOX / STIX / SILK /ArcSight CEF
• TIQ-Test (https://github.com/mlsecproject/tiq-test)
• Runs statistical summaries and tests on TI feeds
• Generates charts based on the tests and summaries
• Written in R (because you should learn a stat language)
Using TIQ-TEST
• Available tests and statistics:
• NOVELTY – How often do they update themselves?
• AGING – How long does an indicator sit on a feed?
• POPULATION – How does this population distribution
compare to another one?
• OVERLAP – How do they compare to what you got?
• UNIQUENESS – How many indicators are found in only
one feed?
• https://github.com/mlsecproject/tiq-test-Summer2015
Using TIQ-TEST – Feeds Selected
• Dataset was separated into “inbound” and “outbound”
TY to @kafeine and John Bambenek for access to their feeds
Using TIQ-TEST – Data Prep
• Extract the “raw” information from indicator feeds
• Both IP addresses and hostnames were extracted
Using TIQ-TEST – Data Prep
• Convert the hostname data to IP addresses:
• Active IP addresses for the respective date (“A” query)
• Passive DNS from Farsight Security (DNSDB)
• For each IP record (including the ones from hostnames):
• Add asnumber and asname (from MaxMind ASN DB)
• Add country (from MaxMind GeoLite DB)
• Add rhost (again from DNSDB) – most popular “PTR”
Using TIQ-TEST – Data Prep Done
Novelty Test
Measuring added and dropped
indicators
Novelty Test - Inbound
Aging Test
Is anyone cleaning this mess up
eventually?
INBOUND
OUTBOUND
Population Test
• Let us use the ASN and
GeoIP databases that we
used to enrich our data as a
reference of the “true”
population.
• But, but, human beings are
unpredictable! We will
never be able to forecast
this!
Is your sampling poll as random as
you think?
Can we get a better look?
• Statistical inference-based comparison models
(hypothesis testing)
• Exact binomial tests (when we have the “true” pop)
• Chi-squared proportion tests (similar to
independence tests)
Overlap Test
More data can be better, but make
sure it is not the same data
Overlap Test - Inbound
Overlap Test - Outbound
Uniqueness Test
Uniqueness Test
• “Domain-based indicators are unique to one list between 96.16% and
97.37%”
• “IP-based indicators are unique to one list between 82.46% and
95.24% of the time”
Intermission
• “You Data Scientists and your
algorithms, how quaint.”
• “Why aren’t you doing some
useful research like nation-state
attribution?”
OPTION 1: Cool Story, Bro!
OPTION 2: How can I use this
awesomeness on my data?
• How about using TIQ-TEST to evaluate a private intel feed?
• Trying stuff before you buy is usually a good idea. Just sayin’
• Let’s compare a new feed, “private1”, against our combined
outbound indicators
Use Case: Comparing Private Feeds
Population Test
Population Test
Population Test
Aging Test
• I guess most DGAs rotate every 24 hours, right?
• Rotation means the private data is still “fresh”, from research or
DGA generation procedures
Mostly DGA Related Churn
A++ WOULD
THREAT INTEL
AGAIN
(or would I?)
I hate quoting myself, but…
Take Aways
• Analyze your data. Extract more value from it!
• If you ABSOLUTELY HAVE TO buy Threat
Intelligence or data, evaluate it first.
• Try the sample data, replicate the experiments:
• https://github.com/mlsecproject/tiq-test-Summer2015
• http://rpubs.com/alexcpsec/tiq-test-Summer2015
• Share data with us. I’ll make sure it gets proper
exercise!
Future Research
• Updating this presentation for Black Hat USA
• Analyzing Threat Intelligence Sharing behavior
• We need anonymous indicator sharing counts:
• “C1 shared X indicators on day Y”
• “C2 marked x indicators as FPs / or down voted
them”
Thanks!
• Q&A?
• Feedback!
”The measure of intelligence is the ability to change."
- Albert Einstein
Alex Pinto
@alexcpsec
@MLSecProject
@NiddelCorp
Alexandre Sieira
@AlexandreSieira
@MLSecProject
@NiddelCorp

More Related Content

What's hot

Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Alex Pinto
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
Alex Pinto
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Alex Pinto
 
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
Joshua R Nicholson
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Alex Pinto
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat Intelligence
Jason Trost
 
CTI ANT: Hunting for Chinese Threat Intelligence
CTI ANT: Hunting for Chinese Threat IntelligenceCTI ANT: Hunting for Chinese Threat Intelligence
CTI ANT: Hunting for Chinese Threat Intelligence
JacklynTsai
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
Raffael Marty
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?
Raffael Marty
 
Threat Hunting with Elastic at SpectorOps: Welcome to HELK
Threat Hunting with Elastic at SpectorOps: Welcome to HELKThreat Hunting with Elastic at SpectorOps: Welcome to HELK
Threat Hunting with Elastic at SpectorOps: Welcome to HELK
Elasticsearch
 
Dreaming of IoCs Adding Time Context to Threat Intelligence
Dreaming of IoCs Adding Time Context to Threat IntelligenceDreaming of IoCs Adding Time Context to Threat Intelligence
Dreaming of IoCs Adding Time Context to Threat Intelligence
Priyanka Aash
 
Cyber Threat Hunting with Phirelight
Cyber Threat Hunting with PhirelightCyber Threat Hunting with Phirelight
Cyber Threat Hunting with Phirelight
Hostway|HOSTING
 
Application of threat intelligence in security operation
Application of threat intelligence in security operationApplication of threat intelligence in security operation
Application of threat intelligence in security operation
Jeremy Li
 
Building a Successful Threat Hunting Program
Building a Successful Threat Hunting ProgramBuilding a Successful Threat Hunting Program
Building a Successful Threat Hunting Program
Carl C. Manion
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
Raffael Marty
 
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pillThe Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
Frode Hommedal
 
Abstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat HuntingAbstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat Hunting
chrissanders88
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at Scale
Raffael Marty
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big Data
Raffael Marty
 
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
MITRE - ATT&CKcon
 

What's hot (20)

Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
2016 FS-ISAC Annual Summit (Miami) - Developing Effective Encryption Strategies
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat Intelligence
 
CTI ANT: Hunting for Chinese Threat Intelligence
CTI ANT: Hunting for Chinese Threat IntelligenceCTI ANT: Hunting for Chinese Threat Intelligence
CTI ANT: Hunting for Chinese Threat Intelligence
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?
 
Threat Hunting with Elastic at SpectorOps: Welcome to HELK
Threat Hunting with Elastic at SpectorOps: Welcome to HELKThreat Hunting with Elastic at SpectorOps: Welcome to HELK
Threat Hunting with Elastic at SpectorOps: Welcome to HELK
 
Dreaming of IoCs Adding Time Context to Threat Intelligence
Dreaming of IoCs Adding Time Context to Threat IntelligenceDreaming of IoCs Adding Time Context to Threat Intelligence
Dreaming of IoCs Adding Time Context to Threat Intelligence
 
Cyber Threat Hunting with Phirelight
Cyber Threat Hunting with PhirelightCyber Threat Hunting with Phirelight
Cyber Threat Hunting with Phirelight
 
Application of threat intelligence in security operation
Application of threat intelligence in security operationApplication of threat intelligence in security operation
Application of threat intelligence in security operation
 
Building a Successful Threat Hunting Program
Building a Successful Threat Hunting ProgramBuilding a Successful Threat Hunting Program
Building a Successful Threat Hunting Program
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
 
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pillThe Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
The Cyber Threat Intelligence Matrix: Taking the attacker eviction red pill
 
Abstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat HuntingAbstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat Hunting
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at Scale
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big Data
 
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
MITRE ATT&CKcon 2.0: Prioritizing Data Sources for Minimum Viable Detection; ...
 

Similar to Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling Indicators

Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Alexandre Sieira
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
Yannick Pouliot
 
PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent
Jonny Daenen
 
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Data Works MD
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Data Science.pptx Data science Data science
Data Science.pptx Data science Data scienceData Science.pptx Data science Data science
Data Science.pptx Data science Data science
Pradeep482741
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
StampedeCon
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big data
webwinkelvakdag
 
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012SplunkLive! Cincinnati - Hurricane Labs - Oct 2012
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012Splunk
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
Philippe Mizrahi
 
Data strategy - How & When to Invest (SXSW V2V Core Conversation)
Data strategy - How & When to Invest (SXSW V2V Core Conversation)Data strategy - How & When to Invest (SXSW V2V Core Conversation)
Data strategy - How & When to Invest (SXSW V2V Core Conversation)
Courtney Hemphill
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
Tao Feng
 
Secure360 May 2018 Lessons Learned from OWASP T10 Datacall
Secure360 May 2018 Lessons Learned from OWASP T10 DatacallSecure360 May 2018 Lessons Learned from OWASP T10 Datacall
Secure360 May 2018 Lessons Learned from OWASP T10 Datacall
Brian Glas
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
Ilkay Altintas, Ph.D.
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data Certification
Experfy
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13
DataDryad
 
Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014
Carly Strasser
 

Similar to Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling Indicators (20)

Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent PXL Data Engineering Workshop By Selligent
PXL Data Engineering Workshop By Selligent
 
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Data Science.pptx Data science Data science
Data Science.pptx Data science Data scienceData Science.pptx Data science Data science
Data Science.pptx Data science Data science
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big data
 
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012SplunkLive! Cincinnati - Hurricane Labs - Oct 2012
SplunkLive! Cincinnati - Hurricane Labs - Oct 2012
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
 
Data strategy - How & When to Invest (SXSW V2V Core Conversation)
Data strategy - How & When to Invest (SXSW V2V Core Conversation)Data strategy - How & When to Invest (SXSW V2V Core Conversation)
Data strategy - How & When to Invest (SXSW V2V Core Conversation)
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Secure360 May 2018 Lessons Learned from OWASP T10 Datacall
Secure360 May 2018 Lessons Learned from OWASP T10 DatacallSecure360 May 2018 Lessons Learned from OWASP T10 Datacall
Secure360 May 2018 Lessons Learned from OWASP T10 Datacall
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data Certification
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13
 
Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014
 

Recently uploaded

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 

Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling Indicators

  • 1. Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling Indicators (#ddti) Alex Pinto Chief Data Scientist Niddel / MLSec Project @alexcpsec @MLSecProject Alexandre Sieira CTO Niddel @AlexandreSieira @NiddelCorp
  • 2. MLSec Project / Niddel • MLSec Project – research-focused branch of Niddel for open- source tools and community building • Niddel builds Magnet, the Applied Threat Intelligence Platform focused on detecting breaches and malware activity • Looking for trial prospects and research collaboration • More info at: • niddel.com • mlsecproject.org
  • 3. • Cyber War… Threat Intel – What is it good for? • Combine and TIQ-test • Using TIQ-test • Dataset description • Tests and more tests • Use case: Feed Comparison • Future research direction Agenda
  • 4. What is TI good for (1) Attribution
  • 5. What is TI good for anyway? TY to @bfist for his work on http://sony.attributed.to
  • 6. What is TI good for (2) – Cyber Maps!! TY to @hrbrmstr for his work on https://github.com/hrbrmstr/pewpew
  • 7. What is TI good for anyway? • (3) How about actual defense? • Strategic and tactical: planning • Technical indicators: DFIR and monitoring
  • 8. Affirming the Consequent Fallacy 1. If A, then B. 2. B. 3. Therefore, A. 1. Evil malware talks to 8.8.8.8. 2. I see traffic to 8.8.8.8. 3. ZOMG, APT!!!
  • 9. Combine and TIQ-Test • Combine (https://github.com/mlsecproject/combine) • Gathers TI data (ip/host) from Internet and local files • Normalizes the data and enriches it (AS / Geo / pDNS) • Can export to CSV, “tiq-test format” and CRITs • Coming Soon™: CybOX / STIX / SILK /ArcSight CEF • TIQ-Test (https://github.com/mlsecproject/tiq-test) • Runs statistical summaries and tests on TI feeds • Generates charts based on the tests and summaries • Written in R (because you should learn a stat language)
  • 10. Using TIQ-TEST • Available tests and statistics: • NOVELTY – How often do they update themselves? • AGING – How long does an indicator sit on a feed? • POPULATION – How does this population distribution compare to another one? • OVERLAP – How do they compare to what you got? • UNIQUENESS – How many indicators are found in only one feed?
  • 12.
  • 13. Using TIQ-TEST – Feeds Selected • Dataset was separated into “inbound” and “outbound” TY to @kafeine and John Bambenek for access to their feeds
  • 14. Using TIQ-TEST – Data Prep • Extract the “raw” information from indicator feeds • Both IP addresses and hostnames were extracted
  • 15. Using TIQ-TEST – Data Prep • Convert the hostname data to IP addresses: • Active IP addresses for the respective date (“A” query) • Passive DNS from Farsight Security (DNSDB) • For each IP record (including the ones from hostnames): • Add asnumber and asname (from MaxMind ASN DB) • Add country (from MaxMind GeoLite DB) • Add rhost (again from DNSDB) – most popular “PTR”
  • 16. Using TIQ-TEST – Data Prep Done
  • 17. Novelty Test Measuring added and dropped indicators
  • 18. Novelty Test - Inbound
  • 19. Aging Test Is anyone cleaning this mess up eventually?
  • 22. Population Test • Let us use the ASN and GeoIP databases that we used to enrich our data as a reference of the “true” population. • But, but, human beings are unpredictable! We will never be able to forecast this!
  • 23.
  • 24. Is your sampling poll as random as you think?
  • 25. Can we get a better look? • Statistical inference-based comparison models (hypothesis testing) • Exact binomial tests (when we have the “true” pop) • Chi-squared proportion tests (similar to independence tests)
  • 26.
  • 27. Overlap Test More data can be better, but make sure it is not the same data
  • 28. Overlap Test - Inbound
  • 29. Overlap Test - Outbound
  • 31. Uniqueness Test • “Domain-based indicators are unique to one list between 96.16% and 97.37%” • “IP-based indicators are unique to one list between 82.46% and 95.24% of the time”
  • 32.
  • 34. • “You Data Scientists and your algorithms, how quaint.” • “Why aren’t you doing some useful research like nation-state attribution?” OPTION 1: Cool Story, Bro!
  • 35. OPTION 2: How can I use this awesomeness on my data?
  • 36. • How about using TIQ-TEST to evaluate a private intel feed? • Trying stuff before you buy is usually a good idea. Just sayin’ • Let’s compare a new feed, “private1”, against our combined outbound indicators Use Case: Comparing Private Feeds
  • 40. Aging Test • I guess most DGAs rotate every 24 hours, right? • Rotation means the private data is still “fresh”, from research or DGA generation procedures Mostly DGA Related Churn
  • 41.
  • 42.
  • 44. I hate quoting myself, but…
  • 45. Take Aways • Analyze your data. Extract more value from it! • If you ABSOLUTELY HAVE TO buy Threat Intelligence or data, evaluate it first. • Try the sample data, replicate the experiments: • https://github.com/mlsecproject/tiq-test-Summer2015 • http://rpubs.com/alexcpsec/tiq-test-Summer2015 • Share data with us. I’ll make sure it gets proper exercise!
  • 46. Future Research • Updating this presentation for Black Hat USA • Analyzing Threat Intelligence Sharing behavior • We need anonymous indicator sharing counts: • “C1 shared X indicators on day Y” • “C2 marked x indicators as FPs / or down voted them”
  • 47.
  • 48. Thanks! • Q&A? • Feedback! ”The measure of intelligence is the ability to change." - Albert Einstein Alex Pinto @alexcpsec @MLSecProject @NiddelCorp Alexandre Sieira @AlexandreSieira @MLSecProject @NiddelCorp

Editor's Notes

  1. During this presentation I’ll be making a quick introduction on some important concepts and our views on what threat intelligence is and the many useful things that can and cannot be done with it. Then, Mr. Pinto will present a couple of open source tools made available at the MLSec Project GitHub repository and an analysis of the metrics we gathered from a set of publicly available and private threat intelligence feeds.
  2. Based on what you read on the media and one some online marketing material, attribution is one of the sexiest parts about threat intelligence. If you understand your adversaries, their intent and capabilities, there is a lot you can do within your risk management and even business decisions to better prepare yourself. However, if you want to be realistic you’ll have to admit that as in the case demonstrated here, attribution is really hard to do. This reflects both on the cost of of getting good CTI data with attribution. Also it is not reasonable to expect that any significant proportion of attacks out there will be attributed at all, given the sheer amount of smaller threat actors out there.
  3. This is beautifully illustrated by the Sony breach controversy. So data science to the rescue! The good humored sony.attributed.to website creates plausible attribution reports for the Sony hack. It is based on data sampled from actual attacks data in the VERIS and DBIR databases consistent with how frequently threat actors, locations and methods are actually used.
  4. One other really common application of threat intelligence is building a threat map. After all, how else will upper management know that your team really has what it takes to prevent pandas, bears and even maybe capybaras from infiltrating your networks? Who knows what those samba-dancing Brazilian hackers are up to, after all? Fear not, now there’s an open source threat map that you can use called Pew pew. And you don’t even need to stand up your own honeypots. Pew pew displays plausible attack patterns that are sampled randomly from public data made available by Arbor Networks. So it is exactly as useful as the real thing.
  5. But seriously, what about using the threat intelligence data to actually defend organizations? The high level data is awesome to help with the high level decisions, of course. However, how do you go about using the technical indicators? It’s straightforward enough to pull a list of IP addresses, domain names and URLs into SIEMs or IDS rules. As Gavin Reid mentioned in his talk yesterday, this can be a great way to reduce the time it takes to detect a novel threat and bypass the detection, rule writing and policy update cycle of your sensors. However, this has to be done with great care.
  6. Threat intelligence feeds are mostly providing indicators of things that malware does: for example IP addresses, domain names and URLs they communicate with. Knowing this can be invaluable for all sorts of investigation and forensics activities. But we really believe the holy grail is detection. A breach detection example the fallacy: some malware observed talking to a destination does NOT mean that all communication towards that destination is indicative of malware. Also you need evaluate the quality and the applicability of the indicators you are consuming (and perhaps paying for) to decide which combination of sources is optimal for your organization. Now Mr. Pinto will tell you a bit about two open source projects we develop to help you perform such an evaluation.
  7. TESTING THE DATA Population is tricky: - Could mean the entire world (all IPv4 space) - Should ideally mean your world (use your data)
  8. 2014 was not a leap year 150k outbound 300k inbound 450k X 365 -> 164,250,000 / 165 MILLION
  9. 28 feeds
  10. - For the hostname / domains feeds: - We extracted the "active" IP addresses for those hostnames on the dates they were reported (using pDNS from Farsight Security) - Passive DNS query of active "A" responses on the reported day (from 00:00 to 23:59) - For this experiment, we got rid of "non-public IPs" (localhost, RFC1918) - Then for each IP record (including the ones got above) enrich it with: - asnumber and asname (from MaxMind ASN DB) - country (from MaxMind GeoLite DB) - rhost - the more popular reverse DNS entry ("PTR") from passive DNS on that date
  11. - Then for each IP record (including the ones got above) enrich it with: - asnumber and asname (from MaxMind ASN DB) - country (from MaxMind GeoLite DB) - rhost - the more popular reverse DNS entry ("PTR") from passive DNS on that date - we are not playing around with this on this talk, data is too sparse
  12. NOVELTY: Always request a trial of the data feed (15/30 days) Measure addition and churn
  13. OVERLAP
  14. OVERLAP
  15. OVERLAP
  16. POPULATION: We will use the ASN and GEO databases as our population - They should cover all the existing IPv4 space, give or take a few anomalies With this, we can simulate a draw from this population for people that are going to be attacking us because human beings are unpredictable and will never be able to forecast where the attacks are coming from right? :troll: ----- COUNTRY / ASN (graphs of country proportions on different feeds)
  17. Inbound – Turkey WINS! US is highly above the population average, and CN is slightly below
  18. ---- HYPOTHESIS TESTING OF PROPORTIONS (explain exact binom test vs. Chi-squared)
  19. ---- HYPOTHESIS TESTING OF PROPORTIONS (explain exact binom test vs. Chi-squared)
  20. -> Explain the diferences - These differences describe a higher probability of specific actors targeting you from specific locations (GEO/ASN) in relation to a completely random actor. - Was one of the 1st features I used for MLSec
  21. Get 100 fish in a pond Tag all Get 100 more fish – how many were tagged? 5?-> 20x more fish
  22. Intermission