SlideShare a Scribd company logo
Defending Networks with Incomplete
Information: A Machine Learning
Approach
Alexandre Pinto
alexcp@mlsecproject.org
@alexcpsec
@MLSecProject
• Security Monitoring: We are doing it wrong
• Machine Learning and the Robot Uprising
• More attacks = more data = better defenses
• Case study: Model to detect malicious agents
• MLSec Project
• Acknowledgments and thanks
Agenda
• 12 years in Information Security, done a little bit
of everything.
• Past 7 or so years leading security consultancy
and monitoring teams in Brazil, London and the
US.
– If there is any way a SIEM can hurt you, it did to me.
• Researching machine learning and data science in
general for the past year or so. Active competitor
in Kaggle machine learning competitions.
Who’s this guy?
• Logs, logs everywhere
• Where?
– Log management
– SIEM solutions
The Monitoring Problem
• Why?
– Compliance
– Incident Response
• Gartner Magic Quadrant for Security Information and Event
Management 2013.
– “Organizations are failing at early breach detection, with more than 92% of
breaches undetected by the breached organization”
– “We continue to see large companies that are re-evaluating SIEM vendors
to replace SIEM technology associated with partial, marginal or failed
deployments.”
• Are these the right tools for the job?
Monitoring / Log Management is Hard
• SANS Eighth Annual 2012 Log and Event Management Survey Results
(http://www.sans.org/reading_room/analysts_program/SortingThruNoise.pdf)
Monitoring / Log Management is Hard
• However, there are
individuals who will
do a good job
• How many do you
know?
• DAM hard (ouch!) to
find these capable
professionals
Not exclusively a tool problem
• How many of these
very qualified
professionals will
we need?
• How many know/
will learn statistics,
data analysis, data
science?
Next up: Big Data Technologies
• How many of these
very qualified
professionals will
we need?
• How many know/
will learn statistics,
data analysis, data
science?
Next up: Big Data Technologies
We need an Army! Of ROBOTS!
• “Machine learning systems automatically learn
programs from data” (*)
• You don’t really code the program, but it is
inferred from data.
• Intuition of trying to mimic the way the brain
learns: that’s where terms like artificial
intelligence come from.
Enter Machine Learning
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)
• Sales
Applications of Machine Learning
• Trading
• Image and
Voice
Recognition
• Supervised Learning:
– Classification (NN, SVM,
Naïve Bayes)
– Regression (linear,
logistic)
Kinds of Machine Learning
Source – scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html
• Unsupervised Learning :
– Clustering (k-means)
– Decomposition (PCA,
SVD)
• The original use case for
ML in Information Security
• Remember the “Bayesian
filters”? There you go.
• How many talks have you
been hearing about SPAM
filtering lately? ;)
Remember SPAM filters?
So what is the fuss?
• Models will get better with more data
– We always have to consider bias and variance as we
select our data points
• “I’ve got 99 problems, but data ain’t one”
Domingos, 2012 Abu-Mostafa, Caltech, 2012
Designing a model to detect external
agents with malicious behavior
• We’ve got all that log data anyway, let’s dig into it
• Most important thing is the “feature engineering”
Model: Data Collection
• Firewall block data from SANS DShield (per day)
• Firewalls, really? Yes, but could be anything.
• We get summarized “malicious” data per port
Not quite “Big Data”, but enough to play
around
Model Intuition: Proximity
• Assumptions to aggregate the data
• Correlation / proximity / similarity BY BEHAVIOUR
• “Bad Neighborhoods” concept:
– Spamhaus x CyberBunker
– Google Report (June 2013)
– Moura 2013
• Group by Netblock
• Group by ASN (thanks, TC)
Model Intuition: Temporal Decay
• Even bad neighborhoods renovate:
– Agents may change ISP, Botnets may be shut down
– Paranoia can be ok, but not EVERYONE is out to get
you
• As days pass, let’s forget, bit by bit, who attacked
• A Half-Life decay function will do just fine
Model Intuition: Temporal Decay
Model: Calculate Features
• Cluster your data: what
behavior are you trying to
predict?
• Create “Badness” Rank =
lwRank (just because)
• Calculate normalized ranks by
IP, Netblock (16, 24) and ASN
• Missing ASNs and Bogons (we
still have those) handled
separately, get higher ranks.
Model: Calculate Features
• We will have a rank calculation per day
– Each “day-rank” will accumulate all the knowledge
we gathered on that IP, Netblock and ASN to that day
• We NEED different days for the training data
• Each entry will have its date:
– Use that “day-rank”
– NO cheating
– Survivorship bias issues!
How are we doing so far?
Training the Model
• YAY! We have a bunch of numbers per IP address!
– How can I use this?
• We get the latest blocked log files (SANS or not):
– We have “badness” data on IP Addresses - features
– If they are blocked, they are “malicious” - label
• Sounds familiar?
• Now, for each behavior to predict:
– Create a dataset with “enough” observations:
– ROT of 50k - 60k because of empirical dimensionality.
Negative and Positive Observations
• We also require “non-
malicious” IPs!
• If we just feed the
algorithms with one label,
they will get lazy.
• CHEAP TRICK: Everything
is “malicious”
• Gather “non-malicious” IP
addresses from Alexa and
Chromium Top 1m Sites.
SVM FTW!
• Use your favorite algorithm! YMMV.
• I chose Support Vector Machines (SVM):
– Good for classification problems with numeric features
– Not a lot of features, so it helps control overfitting, built in
regularization in the model, usually robust
– Also awesome: hyperplane separation on an unknown infinite
dimension.
Jesse Johnson – shapeofdata.wordpress.com
No idea… Everyone copies this one
Results: Training Data
• Cross-Validation: method to test the data against itself
• On the training data itself, 85 to 95% accuracy
• Accuracy = (things we got right) / (everything we had)
• Some behaviors are much more predictable than others:
– Port 3389 is close to the 95%
– Port 22 is close to the 85%
– SANS has much more data on port 3389. Hmmm……
Results: New Data
• And what about new data?
• With new data we know the labels, we find:
– 80 – 85% true positive rate (sensitivity)
– 85 – 90% true negative rate (specificity)
• This means that:
– If the model says something is “bad”, it is 5.3 to 8.5 times
MORE LIKELY to be bad.
• Think about this. Our statistical intuition is bad.
• Wouldn’t you rather have your analysts look at these?
Results: Really New Data
Final Remarks
• These and other algorithms are being developed in a
personal project of mine: MLSec Project
• Sign up, send logs, receive reports generated by models!
– FREE! I need the data! Please help! ;)
• Looking for contributors, ideas, skeptics to support project
as well.
• Please visit http://mlsecproject.org or just e-mail me.
Thanks!
• Q&A?
• Don’t forget your feedback
forms!
Alexandre Pinto
alexcp@mlsecproject.org
@alexcpsec
@MLSecProject

More Related Content

What's hot

MLSEV Virtual. My first BigML Project
MLSEV Virtual. My first BigML ProjectMLSEV Virtual. My first BigML Project
MLSEV Virtual. My first BigML Project
BigML, Inc
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
Xavier Amatriain
 
Lean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven CompaniesLean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven Companies
Xavier Amatriain
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
Koundinya Desiraju
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
Frank Evans
 
DEFCON 23 - John Seymour - “Quantum” Classification of Malware
DEFCON 23 - John Seymour - “Quantum” Classification of MalwareDEFCON 23 - John Seymour - “Quantum” Classification of Malware
DEFCON 23 - John Seymour - “Quantum” Classification of Malware
Felipe Prado
 

What's hot (7)

MLSEV Virtual. My first BigML Project
MLSEV Virtual. My first BigML ProjectMLSEV Virtual. My first BigML Project
MLSEV Virtual. My first BigML Project
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 
Lean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven CompaniesLean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven Companies
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
DEFCON 23 - John Seymour - “Quantum” Classification of Malware
DEFCON 23 - John Seymour - “Quantum” Classification of MalwareDEFCON 23 - John Seymour - “Quantum” Classification of Malware
DEFCON 23 - John Seymour - “Quantum” Classification of Malware
 

Similar to Defcon 21-pinto-defending-networks-machine-learning by pseudor00t

BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
Alex Pinto
 
Rise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupRise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetup
Shlomo Yona
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Alex Pinto
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
HJ van Veen
 
MLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business PerspectiveMLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business Perspective
BigML, Inc
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
Miroslaw Staron
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business Perspective
BigML, Inc
 
Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)
ShehryarSH1
 
From c# Into Machine Learning
From c# Into Machine LearningFrom c# Into Machine Learning
From c# Into Machine Learning
Dev Raj Gautam
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managers
Nitin T Bhat
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
Rod Soto
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
Minhaz A V
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
Raffael Marty
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Alex Pinto
 
AI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousAI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are Dangerous
Raffael Marty
 
Towards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity ModelTowards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity Model
Alex Pinto
 
Semi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text DataSemi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text Data
Tech Triveni
 
Machine Learning in Information Security by Mohammed Zuber
Machine Learning in Information Security by Mohammed ZuberMachine Learning in Information Security by Mohammed Zuber
Machine Learning in Information Security by Mohammed Zuber
OWASP Delhi
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Webinar: Machine Learning para Microcontroladores
Webinar: Machine Learning para MicrocontroladoresWebinar: Machine Learning para Microcontroladores
Webinar: Machine Learning para Microcontroladores
Embarcados
 

Similar to Defcon 21-pinto-defending-networks-machine-learning by pseudor00t (20)

BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
 
Rise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupRise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetup
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
MLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business PerspectiveMLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business Perspective
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business Perspective
 
Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)Artificial Intelligence (ML - DL)
Artificial Intelligence (ML - DL)
 
From c# Into Machine Learning
From c# Into Machine LearningFrom c# Into Machine Learning
From c# Into Machine Learning
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managers
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
 
AI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousAI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are Dangerous
 
Towards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity ModelTowards a Threat Hunting Automation Maturity Model
Towards a Threat Hunting Automation Maturity Model
 
Semi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text DataSemi-Supervised Insight Generation from Petabyte Scale Text Data
Semi-Supervised Insight Generation from Petabyte Scale Text Data
 
Machine Learning in Information Security by Mohammed Zuber
Machine Learning in Information Security by Mohammed ZuberMachine Learning in Information Security by Mohammed Zuber
Machine Learning in Information Security by Mohammed Zuber
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Webinar: Machine Learning para Microcontroladores
Webinar: Machine Learning para MicrocontroladoresWebinar: Machine Learning para Microcontroladores
Webinar: Machine Learning para Microcontroladores
 

More from pseudor00t overflow

Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00tCorreo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
pseudor00t overflow
 
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón Is...
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón  Is...Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón  Is...
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón Is...
pseudor00t overflow
 
Bomba lapa artefactos explosivos by pseudor00t
Bomba lapa artefactos explosivos by pseudor00tBomba lapa artefactos explosivos by pseudor00t
Bomba lapa artefactos explosivos by pseudor00t
pseudor00t overflow
 
Colombia paramilitares confidencial by pseudor00t CIA
Colombia paramilitares confidencial by pseudor00t CIAColombia paramilitares confidencial by pseudor00t CIA
Colombia paramilitares confidencial by pseudor00t CIA
pseudor00t overflow
 
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00tDefcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
pseudor00t overflow
 
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00tDefcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
pseudor00t overflow
 
fingerprinting blackhat by pseudor00t
fingerprinting blackhat by pseudor00tfingerprinting blackhat by pseudor00t
fingerprinting blackhat by pseudor00t
pseudor00t overflow
 
EL RANSOMWARE by pseudor00t
EL RANSOMWARE by pseudor00t EL RANSOMWARE by pseudor00t
EL RANSOMWARE by pseudor00t
pseudor00t overflow
 
Modelo de seguridad Zerotrust by pseudor00t
Modelo de seguridad Zerotrust by pseudor00tModelo de seguridad Zerotrust by pseudor00t
Modelo de seguridad Zerotrust by pseudor00t
pseudor00t overflow
 
Nagios para Dummies By pseudor00t
Nagios para Dummies By pseudor00tNagios para Dummies By pseudor00t
Nagios para Dummies By pseudor00t
pseudor00t overflow
 
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
pseudor00t overflow
 
Hacking Etico by pseudor00t
Hacking Etico by pseudor00tHacking Etico by pseudor00t
Hacking Etico by pseudor00t
pseudor00t overflow
 
Criptopunks. La libertad y el futuro de internet by pseudor00t
Criptopunks. La libertad y el futuro de  internet by pseudor00tCriptopunks. La libertad y el futuro de  internet by pseudor00t
Criptopunks. La libertad y el futuro de internet by pseudor00t
pseudor00t overflow
 
Infor nmap6 listado_de_comandos by pseudor00t
Infor nmap6 listado_de_comandos by pseudor00t Infor nmap6 listado_de_comandos by pseudor00t
Infor nmap6 listado_de_comandos by pseudor00t
pseudor00t overflow
 
Metodologia pentesting-dragon jar by pseudor00t
Metodologia pentesting-dragon jar by pseudor00tMetodologia pentesting-dragon jar by pseudor00t
Metodologia pentesting-dragon jar by pseudor00t
pseudor00t overflow
 

More from pseudor00t overflow (15)

Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00tCorreo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
Correo de carlos castaño a vicente castaño nov. 2002 by pseudor00t
 
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón Is...
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón  Is...Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón  Is...
Carta ramon isaza by pseudor00t Carta ramon isaza AUC by pseudor00t Ramón Is...
 
Bomba lapa artefactos explosivos by pseudor00t
Bomba lapa artefactos explosivos by pseudor00tBomba lapa artefactos explosivos by pseudor00t
Bomba lapa artefactos explosivos by pseudor00t
 
Colombia paramilitares confidencial by pseudor00t CIA
Colombia paramilitares confidencial by pseudor00t CIAColombia paramilitares confidencial by pseudor00t CIA
Colombia paramilitares confidencial by pseudor00t CIA
 
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00tDefcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
Defcon 21-ozavci-vo ip-wars-return-of-the-sip by pseudor00t
 
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00tDefcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
Defcon 21-caceres-massive-attacks-with-distributed-computing by pseudor00t
 
fingerprinting blackhat by pseudor00t
fingerprinting blackhat by pseudor00tfingerprinting blackhat by pseudor00t
fingerprinting blackhat by pseudor00t
 
EL RANSOMWARE by pseudor00t
EL RANSOMWARE by pseudor00t EL RANSOMWARE by pseudor00t
EL RANSOMWARE by pseudor00t
 
Modelo de seguridad Zerotrust by pseudor00t
Modelo de seguridad Zerotrust by pseudor00tModelo de seguridad Zerotrust by pseudor00t
Modelo de seguridad Zerotrust by pseudor00t
 
Nagios para Dummies By pseudor00t
Nagios para Dummies By pseudor00tNagios para Dummies By pseudor00t
Nagios para Dummies By pseudor00t
 
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
Medicamentos disponibles para SARSCOV2 COVID19 by pseudor00t
 
Hacking Etico by pseudor00t
Hacking Etico by pseudor00tHacking Etico by pseudor00t
Hacking Etico by pseudor00t
 
Criptopunks. La libertad y el futuro de internet by pseudor00t
Criptopunks. La libertad y el futuro de  internet by pseudor00tCriptopunks. La libertad y el futuro de  internet by pseudor00t
Criptopunks. La libertad y el futuro de internet by pseudor00t
 
Infor nmap6 listado_de_comandos by pseudor00t
Infor nmap6 listado_de_comandos by pseudor00t Infor nmap6 listado_de_comandos by pseudor00t
Infor nmap6 listado_de_comandos by pseudor00t
 
Metodologia pentesting-dragon jar by pseudor00t
Metodologia pentesting-dragon jar by pseudor00tMetodologia pentesting-dragon jar by pseudor00t
Metodologia pentesting-dragon jar by pseudor00t
 

Recently uploaded

manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaalmanuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
wolfsoftcompanyco
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
zoowe
 
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
cuobya
 
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
zyfovom
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
ukwwuq
 
[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024
hackersuli
 
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
uehowe
 
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
bseovas
 
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
uehowe
 
Azure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdfAzure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdf
AanSulistiyo
 
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
bseovas
 
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdfMeet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
Florence Consulting
 
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
xjq03c34
 
Understanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdfUnderstanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdf
SEO Article Boost
 
Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?
Paul Walk
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
fovkoyb
 
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
cuobya
 
Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!
Toptal Tech
 
Explore-Insanony: Watch Instagram Stories Secretly
Explore-Insanony: Watch Instagram Stories SecretlyExplore-Insanony: Watch Instagram Stories Secretly
Explore-Insanony: Watch Instagram Stories Secretly
Trending Blogers
 
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
vmemo1
 

Recently uploaded (20)

manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaalmanuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
 
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
 
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
 
[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024
 
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
 
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
留学学历(UoA毕业证)奥克兰大学毕业证成绩单官方原版办理
 
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
 
Azure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdfAzure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdf
 
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
 
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdfMeet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdf
 
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
 
Understanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdfUnderstanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdf
 
Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
 
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
 
Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!
 
Explore-Insanony: Watch Instagram Stories Secretly
Explore-Insanony: Watch Instagram Stories SecretlyExplore-Insanony: Watch Instagram Stories Secretly
Explore-Insanony: Watch Instagram Stories Secretly
 
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
 

Defcon 21-pinto-defending-networks-machine-learning by pseudor00t

  • 1. Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject
  • 2. • Security Monitoring: We are doing it wrong • Machine Learning and the Robot Uprising • More attacks = more data = better defenses • Case study: Model to detect malicious agents • MLSec Project • Acknowledgments and thanks Agenda
  • 3. • 12 years in Information Security, done a little bit of everything. • Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US. – If there is any way a SIEM can hurt you, it did to me. • Researching machine learning and data science in general for the past year or so. Active competitor in Kaggle machine learning competitions. Who’s this guy?
  • 4. • Logs, logs everywhere • Where? – Log management – SIEM solutions The Monitoring Problem • Why? – Compliance – Incident Response
  • 5. • Gartner Magic Quadrant for Security Information and Event Management 2013. – “Organizations are failing at early breach detection, with more than 92% of breaches undetected by the breached organization” – “We continue to see large companies that are re-evaluating SIEM vendors to replace SIEM technology associated with partial, marginal or failed deployments.” • Are these the right tools for the job? Monitoring / Log Management is Hard
  • 6. • SANS Eighth Annual 2012 Log and Event Management Survey Results (http://www.sans.org/reading_room/analysts_program/SortingThruNoise.pdf) Monitoring / Log Management is Hard
  • 7. • However, there are individuals who will do a good job • How many do you know? • DAM hard (ouch!) to find these capable professionals Not exclusively a tool problem
  • 8. • How many of these very qualified professionals will we need? • How many know/ will learn statistics, data analysis, data science? Next up: Big Data Technologies
  • 9. • How many of these very qualified professionals will we need? • How many know/ will learn statistics, data analysis, data science? Next up: Big Data Technologies
  • 10. We need an Army! Of ROBOTS!
  • 11. • “Machine learning systems automatically learn programs from data” (*) • You don’t really code the program, but it is inferred from data. • Intuition of trying to mimic the way the brain learns: that’s where terms like artificial intelligence come from. Enter Machine Learning (*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)
  • 12. • Sales Applications of Machine Learning • Trading • Image and Voice Recognition
  • 13. • Supervised Learning: – Classification (NN, SVM, Naïve Bayes) – Regression (linear, logistic) Kinds of Machine Learning Source – scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html • Unsupervised Learning : – Clustering (k-means) – Decomposition (PCA, SVD)
  • 14. • The original use case for ML in Information Security • Remember the “Bayesian filters”? There you go. • How many talks have you been hearing about SPAM filtering lately? ;) Remember SPAM filters?
  • 15. So what is the fuss? • Models will get better with more data – We always have to consider bias and variance as we select our data points • “I’ve got 99 problems, but data ain’t one” Domingos, 2012 Abu-Mostafa, Caltech, 2012
  • 16. Designing a model to detect external agents with malicious behavior • We’ve got all that log data anyway, let’s dig into it • Most important thing is the “feature engineering”
  • 17. Model: Data Collection • Firewall block data from SANS DShield (per day) • Firewalls, really? Yes, but could be anything. • We get summarized “malicious” data per port
  • 18. Not quite “Big Data”, but enough to play around
  • 19. Model Intuition: Proximity • Assumptions to aggregate the data • Correlation / proximity / similarity BY BEHAVIOUR • “Bad Neighborhoods” concept: – Spamhaus x CyberBunker – Google Report (June 2013) – Moura 2013 • Group by Netblock • Group by ASN (thanks, TC)
  • 20. Model Intuition: Temporal Decay • Even bad neighborhoods renovate: – Agents may change ISP, Botnets may be shut down – Paranoia can be ok, but not EVERYONE is out to get you • As days pass, let’s forget, bit by bit, who attacked • A Half-Life decay function will do just fine
  • 22. Model: Calculate Features • Cluster your data: what behavior are you trying to predict? • Create “Badness” Rank = lwRank (just because) • Calculate normalized ranks by IP, Netblock (16, 24) and ASN • Missing ASNs and Bogons (we still have those) handled separately, get higher ranks.
  • 23. Model: Calculate Features • We will have a rank calculation per day – Each “day-rank” will accumulate all the knowledge we gathered on that IP, Netblock and ASN to that day • We NEED different days for the training data • Each entry will have its date: – Use that “day-rank” – NO cheating – Survivorship bias issues!
  • 24. How are we doing so far?
  • 25. Training the Model • YAY! We have a bunch of numbers per IP address! – How can I use this? • We get the latest blocked log files (SANS or not): – We have “badness” data on IP Addresses - features – If they are blocked, they are “malicious” - label • Sounds familiar? • Now, for each behavior to predict: – Create a dataset with “enough” observations: – ROT of 50k - 60k because of empirical dimensionality.
  • 26. Negative and Positive Observations • We also require “non- malicious” IPs! • If we just feed the algorithms with one label, they will get lazy. • CHEAP TRICK: Everything is “malicious” • Gather “non-malicious” IP addresses from Alexa and Chromium Top 1m Sites.
  • 27. SVM FTW! • Use your favorite algorithm! YMMV. • I chose Support Vector Machines (SVM): – Good for classification problems with numeric features – Not a lot of features, so it helps control overfitting, built in regularization in the model, usually robust – Also awesome: hyperplane separation on an unknown infinite dimension. Jesse Johnson – shapeofdata.wordpress.com No idea… Everyone copies this one
  • 28. Results: Training Data • Cross-Validation: method to test the data against itself • On the training data itself, 85 to 95% accuracy • Accuracy = (things we got right) / (everything we had) • Some behaviors are much more predictable than others: – Port 3389 is close to the 95% – Port 22 is close to the 85% – SANS has much more data on port 3389. Hmmm……
  • 29. Results: New Data • And what about new data? • With new data we know the labels, we find: – 80 – 85% true positive rate (sensitivity) – 85 – 90% true negative rate (specificity) • This means that: – If the model says something is “bad”, it is 5.3 to 8.5 times MORE LIKELY to be bad. • Think about this. Our statistical intuition is bad. • Wouldn’t you rather have your analysts look at these?
  • 31. Final Remarks • These and other algorithms are being developed in a personal project of mine: MLSec Project • Sign up, send logs, receive reports generated by models! – FREE! I need the data! Please help! ;) • Looking for contributors, ideas, skeptics to support project as well. • Please visit http://mlsecproject.org or just e-mail me.
  • 32. Thanks! • Q&A? • Don’t forget your feedback forms! Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject