Anaël Bonneton PyParis 2017 - SecuML 3/30
4. Traditional Detection Methods
Signatures: Precise detection rules built by security experts
Low false alert rate
Alerts easy to interpret
Not robust to attack variations, to new attacks
Machine Learning!
5. Supervised Detection Model
Training
[Figure: Training Data (PDF files), Learning Algorithm, Classifier]
Predicting
[Figure: PDF, then Ok or Alert]
8. Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of training data
Few public labelled datasets
No crowdsourcing
Need for interpretation
How does the detection model work?
Why has an alert been raised?
11. Applying Machine Learning
Deep Learning Frameworks
TensorFlow
Microsoft Cognitive Toolkit
Paddle
Cloud Solutions
Google Cloud ML
Microsoft Azure
Amazon Machine Learning
Machine Learning Libraries
scikit-learn
Mahout, Weka, Vowpal Wabbit
13. SecuML: Beyond scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross validation, etc.
SecuML
Automation of the Machine Learning pipeline
Interactive labelling to acquire training data at low cost
Graphical User Interface
14. SecuML
Machine Learning for Computer Security Experts
Algorithms (scikit-learn, metric-learn, active learning)
Web user interface (flask server)
Any Type of Data
PDF
PCAP
JavaScript
DOC
Netflows
EXE
18. Set up a Detection Model
Training Data
Features and Labels
Training
Diagnosis
Deployment
20. Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Validation of a Detection Model
90% Data: Training
10% Data: Validation
21. Trust in the Detection Model
[Figure: Training Data (PDF files), Learning Algorithm, Classifier]
Understanding the Classifier
How does the detection model work?
Why has an alert been raised?
22. Trust in the Detection Model
How does the detection model work?
[Figure: one entry per feature on a horizontal axis from -1.5 to 1.5. Features: CreateDate_correct, producer_space, image_large, Font_count, creator_upper_case, creator_lower_case, Filter_count, Page_count, modDate_correct, OpenAction_count, keywords_unique_count, XFA_count, creator_dot, author_lower_case]
23. Trust in the Detection Model
Why has an alert been raised?
[Figure: one entry per feature on a horizontal axis from -1.5 to 1.5. Features: OpenAction_count, keywords_unique_count, creator_unique_count, creator_dot, AcroForm_count, createDate_correct, producer_space, creator_lower_case, modDate_correct, creator_upper_case, producer_unique_count, image_large, Filter_count]
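An added example, not SecuML's code or the speaker's: the training pipeline from the earlier slides (scaling, cross validation to select the hyperparameter, 90%/10% validation) and the two interpretation questions, written with scikit-learn. The data is constructed, the four feature names are borrowed from the slides' plots, and answering the questions with a logistic regression's weights and with weight times scaled feature value is an assumption of this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["OpenAction_count", "XFA_count", "Page_count", "Font_count"]

def make_dataset(n=200, seed=0):
    """Constructed data: one row per PDF, label 1 = malicious, 0 = benign."""
    rng = np.random.default_rng(seed)
    benign = np.column_stack([rng.poisson(0.1, n), rng.poisson(0.05, n),
                              rng.poisson(12, n), rng.poisson(8, n)])
    malicious = np.column_stack([rng.poisson(1.5, n), rng.poisson(0.8, n),
                                 rng.poisson(1.5, n), rng.poisson(1, n)])
    X = np.vstack([benign, malicious]).astype(float)
    return X, np.array([0] * n + [1] * n)

def train(X, y, seed=0):
    """Scale, select C by 5-fold cross validation on 90%, validate on 10%."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.1, stratify=y, random_state=seed)
    pipe = Pipeline([("scale", StandardScaler()),
                     ("clf", LogisticRegression(max_iter=1000))])
    search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]},
                          cv=5, scoring="roc_auc")
    search.fit(X_tr, y_tr)
    model = search.best_estimator_
    return model, model.score(X_val, y_val)

def model_weights(model):
    """How does the model work: signed weight of each scaled feature."""
    w = model.named_steps["clf"].coef_[0]
    return sorted(zip(FEATURES, w), key=lambda t: -abs(t[1]))

def explain(model, x):
    """Why this alert: weight * scaled value per feature, largest first."""
    z = model.named_steps["scale"].transform([x])[0]
    w = model.named_steps["clf"].coef_[0]
    return sorted(zip(FEATURES, w * z), key=lambda t: -abs(t[1]))

if __name__ == "__main__":
    X, y = make_dataset()
    model, val_acc = train(X, y)
    print("validation accuracy:", round(val_acc, 3))
    suspicious = [3, 1, 1, 0]  # constructed PDF: 3 OpenActions, 1 XFA, 1 page
    print("P(malicious):", round(model.predict_proba([suspicious])[0, 1], 3))
    for name, c in explain(model, suspicious):
        print(f"{name:>18} {c:+.2f}")
```

The per-feature contributions plus the intercept add up to the model's decision score for that file, so the explanation accounts for the whole alert rather than a sample of it.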
24. Train and Diagnose a Detection Model
Demo
./SecuML_classification LogisticRegression PDF contagio
34. Active Learning Strategy
Expert
Active Learning Strategy
Annotation queries
New labelled instances
Which instances should be annotated?
Which instances are the most informative?
35. Reducing the number of annotations is not enough!
Expert
Active Learning Strategy
Annotation queries
New labelled instances
Low expert waiting time
Feedback: Your annotations are useful!
User interface for annotating
36. ILAB: Interactive LABelling
A whole annotation system
Active learning strategy
Feedback to the expert
User interface for annotating
43. ILAB’s Active Learning Strategy
Annotation Queries
Close to the decision boundary
Center of the clusters
Edge of the clusters
Clusters = User-defined Families
Will be published at RAID 2017.
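An added example, not ILAB's implementation or the published algorithm: a simplified selection of the three kinds of annotation queries named on this slide (close to the decision boundary, center of each family cluster, edge of each family cluster). The instance fields, family names, scores and the centroid-distance criterion are assumptions made for illustration.

```python
import math

def select_queries(unlabelled, n_uncertain=2):
    """Pick annotation queries from a pool of unlabelled instances.

    Each instance is a dict: id, p (predicted probability of being
    malicious), family (cluster currently assigned), x (scaled features).
    """
    # 1. Close to the decision boundary: probability nearest to 0.5.
    ranked = sorted(unlabelled, key=lambda i: abs(i["p"] - 0.5))
    uncertain = [i["id"] for i in ranked[:n_uncertain]]

    # Group the instances not already queried by family.
    families = {}
    for inst in unlabelled:
        if inst["id"] not in uncertain:
            families.setdefault(inst["family"], []).append(inst)

    center, edge = {}, {}
    for name, members in families.items():
        dims = range(len(members[0]["x"]))
        centroid = [sum(m["x"][d] for m in members) / len(members) for d in dims]
        by_dist = sorted(members, key=lambda m: math.dist(m["x"], centroid))
        center[name] = by_dist[0]["id"]       # 2. most typical member
        if len(by_dist) > 1:
            edge[name] = by_dist[-1]["id"]    # 3. least typical member
    return {"uncertain": uncertain, "center": center, "edge": edge}

# Constructed pool: file names, scores, families and features are made up.
POOL = [
    {"id": "a.pdf", "p": 0.97, "family": "js_dropper", "x": [2.0, 1.9]},
    {"id": "b.pdf", "p": 0.93, "family": "js_dropper", "x": [2.4, 1.8]},
    {"id": "c.pdf", "p": 0.81, "family": "js_dropper", "x": [3.5, 0.6]},
    {"id": "d.pdf", "p": 0.52, "family": "js_dropper", "x": [0.4, 0.3]},
    {"id": "e.pdf", "p": 0.45, "family": "report", "x": [-0.2, 0.1]},
    {"id": "f.pdf", "p": 0.04, "family": "report", "x": [-1.5, -1.4]},
    {"id": "g.pdf", "p": 0.02, "family": "report", "x": [-1.6, -1.6]},
    {"id": "h.pdf", "p": 0.10, "family": "report", "x": [-3.0, 0.5]},
]

if __name__ == "__main__":
    print(select_queries(POOL))
```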
55. SecuML
Algorithms and Corresponding Interfaces!
Clustering
K-means, Gaussian Mixtures, ...
Instances in each cluster
Projection
PCA, RCA, LDA, LMNN, ...
Projection on two components
57. SecuML is available online!
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts?
Model interpretation
Interactive labelling
Data visualization with projections
Clustering display