Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
SecuML: Machine Learning
for Computer Security Experts
Anaël Bonneton
anael.bonneton@ssi.gouv.fr
ANSSI, ENS Paris, INRIA
P...
Intrusion Detection System
Information System
Detection
System
Communications
Alerts
Anaël Bonneton PyParis 2017 - SecuML ...
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts ...
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts ...
Supervised Detection Model
Training
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Train...
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Anaël Bonnet...
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of trai...
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of trai...
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
Anaël Bonneton PyPari...
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Goo...
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Goo...
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross...
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross...
SecuML
Machine Learning for Computer Security Experts
Algorithms (scikit-learn, metric-learn, active learning)
Web user in...
SecuML - Input Data
features.csv
id,f0,f1,f2,f3,f4,...
0,1,3,0,0,0,...
1,1,4,5,4,1,...
2,1,4,32,13,0,...
3,1,3,0,0,0,...
4...
SecuML
1 Set up a Detection Model
2 Acquire a Representative Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - Se...
Set up
a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 11/30
Set up a Detection Model
Training Data
Features and Labels
Training
Diagnosis
Deployment
Anaël Bonneton PyParis 2017 - Sec...
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Anaël...
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Valid...
Trust in the Detection Model
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Training Dat...
Trust in the Detection Model
How does the detection model work ?
CreateDate_correct
producer_space
image_large
Font_count
...
Trust in the Detection Model
Why an alert has been raised ?
OpenAction_count
keywords_unique_count
creator_unique_count
cr...
Train and Diagnose a Detection Model
Demo
./SecuML_classification LogisticRegression PDF contagio
Anaël Bonneton PyParis 2...
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Acquire a Representative
Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - SecuML 18/30
Annotation System
Unlabelled Pool Labelled Dataset
Annotation queries
Security experts = expensive resources
Anaël Bonneto...
Active Learning Strategy
Expert
Active Learning Strategy
Annotation queriesNew labelled instances
Which instances should b...
Reducing the number of annotations is not enough !
Expert
Active Learning Strategy
Annotation queriesNew labelled instance...
ILAB: Interactive LABelling
A whole annotation system
Active learning strategy
Feedback to the expert
User interface for a...
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Anaël Bonneton PyParis 2017 - SecuML 23...
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Clusters = User-defined Families
Anaël B...
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Clusters = User-...
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Edge of the clus...
ILAB
Demo
./SecuML_activeLearning ILAB PDF contagio
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
SecuML
Anaël Bonneton PyParis 2017 - SecuML 25/30
SecuML
1 Acquire a Representative Training Dataset at Low Cost
2 Set up a Detection Model
Anaël Bonneton PyParis 2017 - Se...
What’s next ?
GUI to launch the experiments
Anaël Bonneton PyParis 2017 - SecuML 27/30
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_...
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_...
SecuML
Algorithms and Corresponding Interfaces !
Detection Models
Logistic regression, SVM, Naive Bayes, ...
Performance, ...
SecuML
Algorithms and Corresponding Interfaces !
Clustering
K-means, Gaussian Mixtures, ...
Instances in each cluster
Proj...
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
...
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
...
Upcoming SlideShare
Loading in …5
×
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

0

Share

Download to read offline

• Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

Download to read offline

PyParis 2017
http://pyparis.org

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

• Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

  1. 1. SecuML: Machine Learning for Computer Security Experts Anaël Bonneton anael.bonneton@ssi.gouv.fr ANSSI, ENS Paris, INRIA PyParis 2017
  2. 2. Intrusion Detection System Information System Detection System Communications Alerts Anaël Bonneton PyParis 2017 - SecuML 2/30
  3. 3. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Anaël Bonneton PyParis 2017 - SecuML 3/30
  4. 4. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Machine Learning ! Anaël Bonneton PyParis 2017 - SecuML 3/30
  5. 5. Supervised Detection Model Training PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Predicting PDF Ok Alert Anaël Bonneton PyParis 2017 - SecuML 4/30
  6. 6. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Anaël Bonneton PyParis 2017 - SecuML 5/30
  7. 7. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Anaël Bonneton PyParis 2017 - SecuML 5/30
  8. 8. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Need for interpretation How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 5/30
  9. 9. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Anaël Bonneton PyParis 2017 - SecuML 6/30
  10. 10. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Anaël Bonneton PyParis 2017 - SecuML 6/30
  11. 11. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Machine Learning Libraries scikit-learn Mahout, Weka, Vowpal Wabbit Anaël Bonneton PyParis 2017 - SecuML 6/30
  12. 12. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. Anaël Bonneton PyParis 2017 - SecuML 7/30
  13. 13. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. SecuML Automation of the Machine Learning pipeline Interactive labelling to acquire training data at low cost Graphical User Interface Anaël Bonneton PyParis 2017 - SecuML 7/30
  14. 14. SecuML Machine Learning for Computer Security Experts Algorithms (scikit-learn, metric-learn, active learning) Web user interface (flask server) Any Type of Data PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 8/30
  15. 15. SecuML - Input Data features.csv id,f0,f1,f2,f3,f4,... 0,1,3,0,0,0,... 1,1,4,5,4,1,... 2,1,4,32,13,0,... 3,1,3,0,0,0,... 4,1,3,0,0,0,... 5,1,3,0,0,0,... 6,1,6,7,6,0,... 7,1,3,0,0,0,... 8,1,3,0,0,0,... ... true_labels.csv id,label,family 0,M,CVE-2017-30-10 1,M,CVE-2017-30-10 2,B,slides 3,M,CVE-2016-0945 4,M,CVE-2016-0945 5,B,user_manual 6,B,user_manual 7,B,technical_report 8,B,technical_report ... Anaël Bonneton PyParis 2017 - SecuML 9/30
  16. 16. SecuML 1 Set up a Detection Model 2 Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 10/30
  17. 17. Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 11/30
  18. 18. Set up a Detection Model Training Data Features and Labels Training Diagnosis Deployment Anaël Bonneton PyParis 2017 - SecuML 12/30
  19. 19. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Anaël Bonneton PyParis 2017 - SecuML 13/30
  20. 20. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Validation of a Detection Model 90% Data Training 10% Data Validation Anaël Bonneton PyParis 2017 - SecuML 13/30
  21. 21. Trust in the Detection Model PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Understanding the Classifier How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 14/30
  22. 22. Trust in the Detection Model How does the detection model work ? CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 15/30
  23. 23. Trust in the Detection Model Why an alert has been raised ? OpenAction_count keywords_unique_count creator_unique_count creator_dot AcroForm_count createDate_correct producer_space creator_lower_case modDate_correct creator_upper_case producer_unique_count image_large Filter_count -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 16/30
  24. 24. Train and Diagnose a Detection Model Demo ./SecuML_classification LogisticRegression PDF contagio Anaël Bonneton PyParis 2017 - SecuML 17/30
  25. 25. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  26. 26. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  27. 27. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  28. 28. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  29. 29. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  30. 30. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  31. 31. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  32. 32. Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 18/30
  33. 33. Annotation System Unlabelled Pool Labelled Dataset Annotation queries Security experts = expensive resources Anaël Bonneton PyParis 2017 - SecuML 19/30
  34. 34. Active Learning Strategy Expert Active Learning Strategy Annotation queriesNew labelled instances Which instances should be annotated ? Which instances are the most informative ? Anaël Bonneton PyParis 2017 - SecuML 20/30
  35. 35. Reducing the number of annotations is not enough ! Expert Active Learning Strategy Annotation queriesNew labelled instances Low expert waiting time Feedback: Your annotations are useful ! User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 21/30
  36. 36. ILAB: Interactive LABelling A whole annotation system Active learning strategy Feedback to the expert User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 22/30
  37. 37. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  38. 38. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  39. 39. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  40. 40. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Anaël Bonneton PyParis 2017 - SecuML 23/30
  41. 41. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  42. 42. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  43. 43. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Edge of the clusters Clusters = User-defined Families Will be published at RAID 2017. Anaël Bonneton PyParis 2017 - SecuML 23/30
  44. 44. ILAB Demo ./SecuML_activeLearning ILAB PDF contagio Anaël Bonneton PyParis 2017 - SecuML 24/30
  45. 45. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  46. 46. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  47. 47. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  48. 48. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  49. 49. SecuML Anaël Bonneton PyParis 2017 - SecuML 25/30
  50. 50. SecuML 1 Acquire a Representative Training Dataset at Low Cost 2 Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 26/30
  51. 51. What’s next ? GUI to launch the experiments Anaël Bonneton PyParis 2017 - SecuML 27/30
  52. 52. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 27/30
  53. 53. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Automatic feature extraction PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 27/30
  54. 54. SecuML Algorithms and Corresponding Interfaces ! Detection Models Logistic regression, SVM, Naive Bayes, ... Performance, interpretation Interactive Machine Learning Active learning, Rare category detection Annotations, feedback Anaël Bonneton PyParis 2017 - SecuML 28/30
  55. 55. SecuML Algorithms and Corresponding Interfaces ! Clustering K-means, Gaussian Mixtures, ... Instances in each cluster Projection PCA, RCA, LDA, LMNN, .. Projection on two components Anaël Bonneton PyParis 2017 - SecuML 29/30
  56. 56. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30
  57. 57. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30

PyParis 2017 http://pyparis.org

Views

Total views

889

On Slideshare

0

From embeds

0

Number of embeds

8

Actions

Downloads

40

Shares

0

Comments

0

Likes

0

×