Jonathan Sander, Chief Technology Officer
@sanderiam
Pushing Machine
Learning Down the
Security Stack to Make
It Effective
AGENDA
Machine Learning Basics to form
a vocabulary about what & why
Watching Machine Learning being
applied to SIEM
Why many SIEM Machine Learning
applications fail
The lesson we took away and how
we have applied it
“Machine Learning” by XKCD
(https://xkcd.com/1838/)
MACHINE
LEARNING
ALGORITHM
Neural network,
naïve Bayes,
decision tree,
clustering,
regression, etc.
MODEL
The way you will
make the algorithm
apply to your use
case
FEATURES
If the model is the
graph, then these
are the points and
lines
DATA
The reason it’s
called “data
science” is because
this is where the
real work is
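
One way to make these four terms concrete is the minimal Python sketch below. Everything in it is an illustrative assumption rather than anything from the talk: the login-event fields, the `extract_features` helper, and the choice of a nearest-centroid classifier as the algorithm.

```python
from statistics import mean

# DATA: raw, labeled events (hypothetical fields, invented for illustration).
events = [
    {"hour": 9,  "failed_logins": 0, "label": "normal"},
    {"hour": 10, "failed_logins": 1, "label": "normal"},
    {"hour": 14, "failed_logins": 0, "label": "normal"},
    {"hour": 3,  "failed_logins": 8, "label": "suspicious"},
    {"hour": 2,  "failed_logins": 6, "label": "suspicious"},
]

# FEATURES: the numbers we choose to measure about each event.
def extract_features(event):
    off_hours = 1.0 if event["hour"] < 6 or event["hour"] > 20 else 0.0
    return (off_hours, float(event["failed_logins"]))

# ALGORITHM: nearest centroid, a generic procedure that knows nothing about logins.
def fit_centroids(rows, labels):
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*members))
    return centroids

def predict(centroids, row):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))

# MODEL: the algorithm fitted to our features and data for this use case.
model = fit_centroids([extract_features(e) for e in events],
                      [e["label"] for e in events])

new_event = {"hour": 4, "failed_logins": 7}
print(predict(model, extract_features(new_event)))
```

Swapping the algorithm is a small change to `fit_centroids` and `predict`; deciding what `extract_features` should measure is where the domain knowledge goes.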
“Deep Learning Cars” by Samuel Arzt
(https://www.youtube.com/watch?v=Aut32pR5PQA)
You use Machine Learning when you
know the data and the outcome, but
not how to turn one into the
other. (Sort of… but that’s a good place to
start)
WHY USE ML? WHAT IS IT DOING THAT’S ATTRACTIVE?
• Machine Learning makes prediction cheap.
• How many oil-burning vehicles were there when the first wells were
dug?
• How many business problems were broken down into arithmetic before
the first computers were introduced outside research?
• How much communication was digital-ready when the internet was first
born?
• Which problems will we transform from their current form into prediction
problems?
WE DIDN’T KNOW MACHINE LEARNING WAS A HAMMER YET,
BUT SIEM SURE LOOKED LIKE A NAIL…
• SIEM has tons of data coming in from many sources (when you’re
doing it right)
• The outcomes that are desired are pretty clear
– Find things that represent leading indicators and threats
– Guide systems and staff to address those conditions by arming them with
data
• SIEM begins its life as log aggregation, morphs into a “single pane
of glass” and then changes again to be “analytics” (or at least the
data stream for it)
• This is all rule based in the beginning, which is like the worst
game of whack-a-mole
• Then we have the emergence of UEBA and others that use more math
and ML methods to attempt to cut through that noise and pump up the
signal
It’s at this point we collectively learn the phrases
“false positive” and “false negative”
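
For readers new to the two phrases, a small sketch of how they are counted may help. The labels and alerts below are made-up values for illustration, and `confusion_counts` is a hypothetical helper, not part of any SIEM product.

```python
def confusion_counts(actual, predicted):
    """Count outcomes for a binary 'is this a threat?' detector."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for is_threat, alerted in zip(actual, predicted):
        if alerted and is_threat:
            counts["tp"] += 1
        elif alerted and not is_threat:
            counts["fp"] += 1   # false positive: alert with no real threat
        elif not alerted and is_threat:
            counts["fn"] += 1   # false negative: real threat, no alert
        else:
            counts["tn"] += 1
    return counts

# Made-up ground truth and detector output for eight events.
actual    = [True, False, False, True, False, False, False, True]
predicted = [True, True,  False, False, True, False, False, True]

c = confusion_counts(actual, predicted)
precision = c["tp"] / (c["tp"] + c["fp"])  # share of alerts that were real
recall = c["tp"] / (c["tp"] + c["fn"])     # share of real threats that alerted
print(c, precision, recall)
```

False positives cost analyst time and false negatives are missed threats, so tuning a detector usually means trading one against the other.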
[Diagram: SIEM fed by EVT feeds and syslog feeds; sources labeled Windows stuff, network stuff, Linux/Unix, apps, security scanners, threats]
IF AT FIRST YOU DON’T SUCCEED…
PEER GROUP ANALYSIS
This takes the data and corrals
it, but doesn’t solve all the
problems (e.g., what happens when
peer groups shift with
seasons).
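
As a rough illustration of peer group analysis, the sketch below flags users whose activity sits far from their own group's average. The group names, user names, counts, and the 1.5 standard deviation threshold are all assumptions made up for this example.

```python
from statistics import mean, pstdev

# Hypothetical data: files accessed per day, keyed by user, grouped by peer group.
peer_groups = {
    "finance": {"ana": 40, "ben": 45, "cy": 38, "dee": 42, "eli": 180},
    "engineering": {"fay": 300, "gus": 280, "hal": 310, "ivy": 290},
}

def peer_outliers(groups, threshold=1.5):
    """Return (group, user) pairs whose activity is more than `threshold`
    standard deviations away from their peer-group mean."""
    flagged = []
    for group, activity in groups.items():
        values = list(activity.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # everyone behaves identically; nothing to compare
        for user, value in activity.items():
            if abs(value - mu) / sigma > threshold:
                flagged.append((group, user))
    return flagged

print(peer_outliers(peer_groups))
```

Note that 180 files a day is unusual in the finance group but would be low for engineering, which is the point of grouping. The baseline here is static, so a seasonal shift that moves a whole group at once is exactly the case it handles poorly.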
REDEFINED SUCCESS
The goal becomes pure anomaly
detection and there are efforts
to use “fine tuning” of Machine
Learning models and algorithms.
HYBRID
There is a combination of
aggressive rules and Machine
Learning. The rules essentially
try to narrow the scope of
data.
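
A hedged sketch of the hybrid pattern follows: a rule narrows the scope, then a score is computed only over what is left. The event fields and the `rule_filter` helper are hypothetical, and a simple median-based statistic stands in for whatever ML stage a real product would use.

```python
from statistics import median

# Hypothetical events; field names are invented for illustration.
events = [
    {"user": "ana", "action": "read",  "bytes": 1_200},
    {"user": "ben", "action": "login", "bytes": 0},
    {"user": "cy",  "action": "read",  "bytes": 900},
    {"user": "dee", "action": "read",  "bytes": 1_500},
    {"user": "eli", "action": "read",  "bytes": 250_000},
]

def rule_filter(event):
    """Aggressive rule: only file reads are passed on for scoring."""
    return event["action"] == "read"

def anomaly_scores(rows):
    """Score each event by its distance from the median, measured in
    units of the median absolute deviation (a stand-in for an ML stage)."""
    sizes = [r["bytes"] for r in rows]
    med = median(sizes)
    mad = median(abs(s - med) for s in sizes) or 1  # avoid dividing by zero
    return [(r["user"], abs(r["bytes"] - med) / mad) for r in rows]

in_scope = [e for e in events if rule_filter(e)]
scores = anomaly_scores(in_scope)
top_user = max(scores, key=lambda pair: pair[1])[0]
print(top_user)
```

The scoring stage never sees anything the rule discards, so whatever the rule gets wrong is invisible downstream.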
None of this is using Machine Learning
to do what it is best at doing. My
thesis is that we never got
over being rule based, being
procedural. The troubles were not going to be
solved by refinements in how the
systems work. The trouble is the data.
Who had the title or role Webmaster
at some point in their career?
Let’s embrace the Data Scientist.
FEATURES
ARE HARD
DATA
SCIENCE
MUST BE
DONE CLOSE
TO THE DATA
ALGORITHMS
GET THE
ATTENTION
BECAUSE
MATH SEEMS
LIKE THE
COOL PART
WHY WE CALL IT “DATA SCIENCE”
HOW THIS TRANSLATED INTO OUR JOURNEY
• We came out of the gate attacking the UEBA problem
at the top
– We were fooled by early success because we
modeled our data, which we knew well, in ways
that yielded useful results
– Then we tackled problems of scale,
architecture, etc. (geek comfort food)
• When we started feeding the system other data, it
failed
– That data didn’t fit our models, and we weren't
even looking at the right features for that
type of data in a similar model
– At this stage, I suspect even the algorithms
didn’t apply
• We scaled back immensely and decided to conquer
the data we knew well and had success with to
start with
THINGS WE’VE IGNORED &
WHERE TO FIND THEM…
SUPERVISED VS. UNSUPERVISED MACHINE LEARNING
ADVERSARIAL MODELS
THE “HOW” FROM A TECH POINT OF VIEW
THE SUPER-COOL MATH!
The “deep learning” series by
3Blue1Brown
https://www.youtube.com/watch?v=aircAruvnKk
Get your hands dirty:
https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
The Master Algorithm
How the Quest for the Ultimate Learning
Machine Will Remake Our World
by Pedro Domingos
ISBN-13: 9780465065707
Prediction Machines
The Simple Economics of Artificial Intelligence
by Ajay Agrawal,
Joshua Gans & Avi Goldfarb
ISBN-13: 9781633695672
Thank you!

Pushing Machine Learning Down the Security Stack to Make It More Effective for @Identiverse


Editor's Notes

  1. An algorithm: neural network, naïve Bayes, decision tree, clustering, regression, … A model: the way you will make the algorithm apply to your use case. The features: if the model is the graph, then these are the points and lines. The data: the reason it’s called “data science” is because this is where the real work is.
  2. We get things like peer group analysis. This takes the data and corrals it, but doesn’t solve all the problems; for example, what happens when peer groups shift with seasons? Some redefine success: the goal becomes pure anomaly detection, and there are efforts to use “fine tuning” of ML models and algorithms. Others go “hybrid”: there is a combination of aggressive rules and ML, and the rules essentially try to narrow the scope of data.
  3. Features are hard. Everyone in software knows that picking what to measure is hard. This is doubly so when you don’t get explicit feedback on feature impact. Data science must be done close to the data. The ingenuity of measuring the “lines of sight” for the cars does not translate. Bring your expertise. You will find you know features instinctively; the hardest part is making that explicit. Algorithms get the attention because (ironically) math seems like the cool part, but picking the right algorithm is also about knowing the data.
  4. We came out of the gate attacking the UEBA problem at the top. We were fooled by early success because we modeled our data, which we knew well, in ways that yielded useful results. Then we tackled problems of scale, architecture, etc. (geek comfort food). When we started feeding the system other data, it failed. That data didn’t fit our models, and we weren't even looking at the right features for that type of data in a similar model. At this stage, I suspect even the algorithms didn’t apply. We scaled back immensely and decided to conquer the data we knew well and had success with to start with, and did so with a vastly simpler set of technology.