SlideShare a Scribd company logo
1 of 57
Download to read offline
SecuML: Machine Learning
for Computer Security Experts
Anaël Bonneton
anael.bonneton@ssi.gouv.fr
ANSSI, ENS Paris, INRIA
PyParis 2017
Intrusion Detection System
Information System
Detection
System
Communications
Alerts
Anaël Bonneton PyParis 2017 - SecuML 2/30
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts easy to interprete
 Not robust to attack variations, to new attacks
Anaël Bonneton PyParis 2017 - SecuML 3/30
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts easy to interprete
 Not robust to attack variations, to new attacks
Machine Learning !
Anaël Bonneton PyParis 2017 - SecuML 3/30
Supervised Detection Model
Training
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Training Data
Learning Algorithm
Classifier
Predicting
PDF
Ok
Alert
Anaël Bonneton PyParis 2017 - SecuML 4/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Anaël Bonneton PyParis 2017 - SecuML 5/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of training data
Few public labelled datasets
No crowdsourcing
Anaël Bonneton PyParis 2017 - SecuML 5/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of training data
Few public labelled datasets
No crowdsourcing
Need for interpretation
How does the detection model work ?
Why an alert has been raised ?
Anaël Bonneton PyParis 2017 - SecuML 5/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
Anaël Bonneton PyParis 2017 - SecuML 6/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Google Cloud ML
 Microsoft Azure
 Amazon Machine Learning
Anaël Bonneton PyParis 2017 - SecuML 6/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Google Cloud ML
 Microsoft Azure
 Amazon Machine Learning
Machine Learning Libraries
 scikit-learn
 Mahout, Weka, Vowpal Wabbit
Anaël Bonneton PyParis 2017 - SecuML 6/30
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross validation, etc.
Anaël Bonneton PyParis 2017 - SecuML 7/30
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross validation, etc.
SecuML
Automation of the Machine Learning pipeline
Interactive labelling to acquire training data at low cost
Graphical User Interface
Anaël Bonneton PyParis 2017 - SecuML 7/30
SecuML
Machine Learning for Computer Security Experts
Algorithms (scikit-learn, metric-learn, active learning)
Web user interface (flask server)
Any Type of Data
PDF
PCAP
JavaScriptDOC Netflows
EXE
Anaël Bonneton PyParis 2017 - SecuML 8/30
SecuML - Input Data
features.csv
id,f0,f1,f2,f3,f4,...
0,1,3,0,0,0,...
1,1,4,5,4,1,...
2,1,4,32,13,0,...
3,1,3,0,0,0,...
4,1,3,0,0,0,...
5,1,3,0,0,0,...
6,1,6,7,6,0,...
7,1,3,0,0,0,...
8,1,3,0,0,0,...
...
true_labels.csv
id,label,family
0,M,CVE-2017-30-10
1,M,CVE-2017-30-10
2,B,slides
3,M,CVE-2016-0945
4,M,CVE-2016-0945
5,B,user_manual
6,B,user_manual
7,B,technical_report
8,B,technical_report
...
Anaël Bonneton PyParis 2017 - SecuML 9/30
SecuML
1 Set up a Detection Model
2 Acquire a Representative Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - SecuML 10/30
Set up
a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 11/30
Set up a Detection Model
Training Data
Features and Labels
Training
Diagnosis
Deployment
Anaël Bonneton PyParis 2017 - SecuML 12/30
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Anaël Bonneton PyParis 2017 - SecuML 13/30
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Validation of a Detection Model
90% Data
Training
10% Data
Validation
Anaël Bonneton PyParis 2017 - SecuML 13/30
Trust in the Detection Model
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Training Data
Learning Algorithm
Classifier
Understanding the Classifier
How does the detection model work ?
Why an alert has been raised ?
Anaël Bonneton PyParis 2017 - SecuML 14/30
Trust in the Detection Model
How does the detection model work ?
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 15/30
Trust in the Detection Model
Why an alert has been raised ?
OpenAction_count
keywords_unique_count
creator_unique_count
creator_dot
AcroForm_count
createDate_correct
producer_space
creator_lower_case
modDate_correct
creator_upper_case
producer_unique_count
image_large
Filter_count
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 16/30
Train and Diagnose a Detection Model
Demo
./SecuML_classification LogisticRegression PDF contagio
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Acquire a Representative
Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - SecuML 18/30
Annotation System
Unlabelled Pool Labelled Dataset
Annotation queries
Security experts = expensive resources
Anaël Bonneton PyParis 2017 - SecuML 19/30
Active Learning Strategy
Expert
Active Learning Strategy
Annotation queriesNew labelled instances
Which instances should be annotated ?
Which instances are the most informative ?
Anaël Bonneton PyParis 2017 - SecuML 20/30
Reducing the number of annotations is not enough !
Expert
Active Learning Strategy
Annotation queriesNew labelled instances
Low expert waiting time
Feedback: Your annotations are useful !
User interface for annotating
Anaël Bonneton PyParis 2017 - SecuML 21/30
ILAB: Interactive LABelling
A whole annotation system
Active learning strategy
Feedback to the expert
User interface for annotating
Anaël Bonneton PyParis 2017 - SecuML 22/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Clusters = User-defined Families
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Clusters = User-defined Families
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Edge of the clusters
Clusters = User-defined Families
Will be published at RAID 2017.
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB
Demo
./SecuML_activeLearning ILAB PDF contagio
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
SecuML
Anaël Bonneton PyParis 2017 - SecuML 25/30
SecuML
1 Acquire a Representative Training Dataset at Low Cost
2 Set up a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 26/30
What’s next ?
GUI to launch the experiments
Anaël Bonneton PyParis 2017 - SecuML 27/30
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 27/30
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Automatic feature extraction
PDF
PCAP
JavaScriptDOC Netflows
EXE
Anaël Bonneton PyParis 2017 - SecuML 27/30
SecuML
Algorithms and Corresponding Interfaces !
Detection Models
Logistic regression, SVM, Naive Bayes, ...
Performance, interpretation
Interactive Machine Learning
Active learning, Rare category detection
Annotations, feedback
Anaël Bonneton PyParis 2017 - SecuML 28/30
SecuML
Algorithms and Corresponding Interfaces !
Clustering
K-means, Gaussian Mixtures, ...
Instances in each cluster
Projection
PCA, RCA, LDA, LMNN, ..
Projection on two components
Anaël Bonneton PyParis 2017 - SecuML 29/30
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
Interactive labelling
Data visualization with projections
Clustering display
Anaël Bonneton PyParis 2017 - SecuML 30/30
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
Interactive labelling
Data visualization with projections
Clustering display
Anaël Bonneton PyParis 2017 - SecuML 30/30

More Related Content

Similar to • Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

Realizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintyRealizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintySteven Vettermann
 
Sopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureSopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureXavier BERNARD
 
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunk
 
Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk
 
A Shelter to Protect your Documents
A Shelter to Protect your DocumentsA Shelter to Protect your Documents
A Shelter to Protect your Documentsteam-WIBU
 
MoniTutor
MoniTutorMoniTutor
MoniTutorIcinga
 
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocSplunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocRene Aguero
 
ML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampKarel Dumon
 
Auto visualization and viml
Auto visualization and vimlAuto visualization and viml
Auto visualization and vimlBill Liu
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
Agent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteAgent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteEduKite
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIMESlides
 
Remote facility monitoring and control using IoT
Remote facility monitoring and control using IoTRemote facility monitoring and control using IoT
Remote facility monitoring and control using IoTBernhard Mehl
 
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...system_plus
 
PyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPiotr Dyba
 
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...Yole Developpement
 
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...system_plus
 
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made EasySalottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made EasyPier Carlo Chiodi
 

Similar to • Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton (20)

Realizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintyRealizing Traceability for Safety and Certainty
Realizing Traceability for Safety and Certainty
 
Sopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureSopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital Architecture
 
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
 
Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017
 
A Shelter to Protect your Documents
A Shelter to Protect your DocumentsA Shelter to Protect your Documents
A Shelter to Protect your Documents
 
MoniTutor
MoniTutorMoniTutor
MoniTutor
 
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocSplunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
 
ML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks Bootcamp
 
Auto visualization and viml
Auto visualization and vimlAuto visualization and viml
Auto visualization and viml
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
Agent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteAgent based modeling level 2 - edukite
Agent based modeling level 2 - edukite
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To Deployment
 
Remote facility monitoring and control using IoT
Remote facility monitoring and control using IoTRemote facility monitoring and control using IoT
Remote facility monitoring and control using IoT
 
Bridging the Gap
Bridging the GapBridging the Gap
Bridging the Gap
 
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
 
PyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPyConPL 2017 - with python: security
PyConPL 2017 - with python: security
 
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
 
BIM and Scrum in construction
BIM and Scrum in construction BIM and Scrum in construction
BIM and Scrum in construction
 
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
 
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made EasySalottino MIX 2017 - ARouteServer - IXP Automation Made Easy
Salottino MIX 2017 - ARouteServer - IXP Automation Made Easy
 

More from Pôle Systematic Paris-Region

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyPôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAPôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentPôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotPôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPôle Systematic Paris-Region
 

More from Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Recently uploaded

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesSanjay Willie
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

• Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

  • 1. SecuML: Machine Learning for Computer Security Experts Anaël Bonneton anael.bonneton@ssi.gouv.fr ANSSI, ENS Paris, INRIA PyParis 2017
  • 2. Intrusion Detection System Information System Detection System Communications Alerts Anaël Bonneton PyParis 2017 - SecuML 2/30
  • 3. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Anaël Bonneton PyParis 2017 - SecuML 3/30
  • 4. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Machine Learning ! Anaël Bonneton PyParis 2017 - SecuML 3/30
  • 5. Supervised Detection Model Training PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Predicting PDF Ok Alert Anaël Bonneton PyParis 2017 - SecuML 4/30
  • 6. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 7. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 8. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Need for interpretation How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 9. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 10. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 11. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Machine Learning Libraries scikit-learn Mahout, Weka, Vowpal Wabbit Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 12. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. Anaël Bonneton PyParis 2017 - SecuML 7/30
  • 13. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. SecuML Automation of the Machine Learning pipeline Interactive labelling to acquire training data at low cost Graphical User Interface Anaël Bonneton PyParis 2017 - SecuML 7/30
  • 14. SecuML Machine Learning for Computer Security Experts Algorithms (scikit-learn, metric-learn, active learning) Web user interface (flask server) Any Type of Data PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 8/30
  • 15. SecuML - Input Data features.csv id,f0,f1,f2,f3,f4,... 0,1,3,0,0,0,... 1,1,4,5,4,1,... 2,1,4,32,13,0,... 3,1,3,0,0,0,... 4,1,3,0,0,0,... 5,1,3,0,0,0,... 6,1,6,7,6,0,... 7,1,3,0,0,0,... 8,1,3,0,0,0,... ... true_labels.csv id,label,family 0,M,CVE-2017-30-10 1,M,CVE-2017-30-10 2,B,slides 3,M,CVE-2016-0945 4,M,CVE-2016-0945 5,B,user_manual 6,B,user_manual 7,B,technical_report 8,B,technical_report ... Anaël Bonneton PyParis 2017 - SecuML 9/30
  • 16. SecuML 1 Set up a Detection Model 2 Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 10/30
  • 17. Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 11/30
  • 18. Set up a Detection Model Training Data Features and Labels Training Diagnosis Deployment Anaël Bonneton PyParis 2017 - SecuML 12/30
  • 19. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Anaël Bonneton PyParis 2017 - SecuML 13/30
  • 20. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Validation of a Detection Model 90% Data Training 10% Data Validation Anaël Bonneton PyParis 2017 - SecuML 13/30
  • 21. Trust in the Detection Model PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Understanding the Classifier How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 14/30
  • 22. Trust in the Detection Model How does the detection model work ? CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 15/30
  • 23. Trust in the Detection Model Why an alert has been raised ? OpenAction_count keywords_unique_count creator_unique_count creator_dot AcroForm_count createDate_correct producer_space creator_lower_case modDate_correct creator_upper_case producer_unique_count image_large Filter_count -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 16/30
  • 24. Train and Diagnose a Detection Model Demo ./SecuML_classification LogisticRegression PDF contagio Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 25. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 26. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 27. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 28. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 29. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 30. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 31. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 32. Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 18/30
  • 33. Annotation System Unlabelled Pool Labelled Dataset Annotation queries Security experts = expensive resources Anaël Bonneton PyParis 2017 - SecuML 19/30
  • 34. Active Learning Strategy Expert Active Learning Strategy Annotation queriesNew labelled instances Which instances should be annotated ? Which instances are the most informative ? Anaël Bonneton PyParis 2017 - SecuML 20/30
  • 35. Reducing the number of annotations is not enough ! Expert Active Learning Strategy Annotation queriesNew labelled instances Low expert waiting time Feedback: Your annotations are useful ! User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 21/30
  • 36. ILAB: Interactive LABelling A whole annotation system Active learning strategy Feedback to the expert User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 22/30
  • 37. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 38. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 39. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 40. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 41. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 42. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 43. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Edge of the clusters Clusters = User-defined Families Will be published at RAID 2017. Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 44. ILAB Demo ./SecuML_activeLearning ILAB PDF contagio Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 45. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 46. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 47. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 48. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 49. SecuML Anaël Bonneton PyParis 2017 - SecuML 25/30
  • 50. SecuML 1 Acquire a Representative Training Dataset at Low Cost 2 Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 26/30
  • 51. What’s next ? GUI to launch the experiments Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 52. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 53. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Automatic feature extraction PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 54. SecuML Algorithms and Corresponding Interfaces ! Detection Models Logistic regression, SVM, Naive Bayes, ... Performance, interpretation Interactive Machine Learning Active learning, Rare category detection Annotations, feedback Anaël Bonneton PyParis 2017 - SecuML 28/30
  • 55. SecuML Algorithms and Corresponding Interfaces ! Clustering K-means, Gaussian Mixtures, ... Instances in each cluster Projection PCA, RCA, LDA, LMNN, .. Projection on two components Anaël Bonneton PyParis 2017 - SecuML 29/30
  • 56. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30
  • 57. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30