SlideShare a Scribd company logo
1 of 57
Download to read offline
SecuML: Machine Learning
for Computer Security Experts
Anaël Bonneton
anael.bonneton@ssi.gouv.fr
ANSSI, ENS Paris, INRIA
PyParis 2017
Intrusion Detection System
Information System
Detection
System
Communications
Alerts
Anaël Bonneton PyParis 2017 - SecuML 2/30
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts easy to interprete
 Not robust to attack variations, to new attacks
Anaël Bonneton PyParis 2017 - SecuML 3/30
Traditional Detection Methods
Signatures: Precise detection rules built by security experts
 Low false alert rate
 Alerts easy to interprete
 Not robust to attack variations, to new attacks
Machine Learning !
Anaël Bonneton PyParis 2017 - SecuML 3/30
Supervised Detection Model
Training
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Training Data
Learning Algorithm
Classifier
Predicting
PDF
Ok
Alert
Anaël Bonneton PyParis 2017 - SecuML 4/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Anaël Bonneton PyParis 2017 - SecuML 5/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of training data
Few public labelled datasets
No crowdsourcing
Anaël Bonneton PyParis 2017 - SecuML 5/30
Computer Security Specificities
Non Machine Learning Experts
Machine Learning pipeline
Machine Learning jargon
Lack of training data
Few public labelled datasets
No crowdsourcing
Need for interpretation
How does the detection model work ?
Why an alert has been raised ?
Anaël Bonneton PyParis 2017 - SecuML 5/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
Anaël Bonneton PyParis 2017 - SecuML 6/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Google Cloud ML
 Microsoft Azure
 Amazon Machine Learning
Anaël Bonneton PyParis 2017 - SecuML 6/30
Applying Machine Learning
 Deep Learning Frameworks
 TensorFlow
 Microsoft Cognitive Toolkit
 Paddle
 Cloud Solutions
 Google Cloud ML
 Microsoft Azure
 Amazon Machine Learning
Machine Learning Libraries
 scikit-learn
 Mahout, Weka, Vowpal Wabbit
Anaël Bonneton PyParis 2017 - SecuML 6/30
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross validation, etc.
Anaël Bonneton PyParis 2017 - SecuML 7/30
SecuML: Beyound scikit-learn
Scikit-learn
Classification, clustering, dimension reduction, etc.
Scaling, grid search, cross validation, etc.
SecuML
Automation of the Machine Learning pipeline
Interactive labelling to acquire training data at low cost
Graphical User Interface
Anaël Bonneton PyParis 2017 - SecuML 7/30
SecuML
Machine Learning for Computer Security Experts
Algorithms (scikit-learn, metric-learn, active learning)
Web user interface (flask server)
Any Type of Data
PDF
PCAP
JavaScriptDOC Netflows
EXE
Anaël Bonneton PyParis 2017 - SecuML 8/30
SecuML - Input Data
features.csv
id,f0,f1,f2,f3,f4,...
0,1,3,0,0,0,...
1,1,4,5,4,1,...
2,1,4,32,13,0,...
3,1,3,0,0,0,...
4,1,3,0,0,0,...
5,1,3,0,0,0,...
6,1,6,7,6,0,...
7,1,3,0,0,0,...
8,1,3,0,0,0,...
...
true_labels.csv
id,label,family
0,M,CVE-2017-30-10
1,M,CVE-2017-30-10
2,B,slides
3,M,CVE-2016-0945
4,M,CVE-2016-0945
5,B,user_manual
6,B,user_manual
7,B,technical_report
8,B,technical_report
...
Anaël Bonneton PyParis 2017 - SecuML 9/30
SecuML
1 Set up a Detection Model
2 Acquire a Representative Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - SecuML 10/30
Set up
a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 11/30
Set up a Detection Model
Training Data
Features and Labels
Training
Diagnosis
Deployment
Anaël Bonneton PyParis 2017 - SecuML 12/30
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Anaël Bonneton PyParis 2017 - SecuML 13/30
Automation of the Machine Learning Pipeline
Training Pipeline
Scaling
Cross validation to select the hyperparameters
Validation of a Detection Model
90% Data
Training
10% Data
Validation
Anaël Bonneton PyParis 2017 - SecuML 13/30
Trust in the Detection Model
PDF
PDF PDF
PDFPDF
PDF PDF PDF
PDF
PDFPDF PDF
PDFPDF
PDF
PDF
PDF
PDF
PDF
PDF
PDF
Training Data
Learning Algorithm
Classifier
Understanding the Classifier
How does the detection model work ?
Why an alert has been raised ?
Anaël Bonneton PyParis 2017 - SecuML 14/30
Trust in the Detection Model
How does the detection model work ?
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 15/30
Trust in the Detection Model
Why an alert has been raised ?
OpenAction_count
keywords_unique_count
creator_unique_count
creator_dot
AcroForm_count
createDate_correct
producer_space
creator_lower_case
modDate_correct
creator_upper_case
producer_unique_count
image_large
Filter_count
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 16/30
Train and Diagnose a Detection Model
Demo
./SecuML_classification LogisticRegression PDF contagio
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Train and Diagnose a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 17/30
Acquire a Representative
Training Dataset at Low Cost
Anaël Bonneton PyParis 2017 - SecuML 18/30
Annotation System
Unlabelled Pool Labelled Dataset
Annotation queries
Security experts = expensive resources
Anaël Bonneton PyParis 2017 - SecuML 19/30
Active Learning Strategy
Expert
Active Learning Strategy
Annotation queriesNew labelled instances
Which instances should be annotated ?
Which instances are the most informative ?
Anaël Bonneton PyParis 2017 - SecuML 20/30
Reducing the number of annotations is not enough !
Expert
Active Learning Strategy
Annotation queriesNew labelled instances
Low expert waiting time
Feedback: Your annotations are useful !
User interface for annotating
Anaël Bonneton PyParis 2017 - SecuML 21/30
ILAB: Interactive LABelling
A whole annotation system
Active learning strategy
Feedback to the expert
User interface for annotating
Anaël Bonneton PyParis 2017 - SecuML 22/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Clusters = User-defined Families
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Clusters = User-defined Families
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB’s Active Learning Strategy
Annotations Queries
Close to the decision boundary
Center of the clusters
Edge of the clusters
Clusters = User-defined Families
Will be published at RAID 2017.
Anaël Bonneton PyParis 2017 - SecuML 23/30
ILAB
Demo
./SecuML_activeLearning ILAB PDF contagio
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
ILAB
Anaël Bonneton PyParis 2017 - SecuML 24/30
SecuML
Anaël Bonneton PyParis 2017 - SecuML 25/30
SecuML
1 Acquire a Representative Training Dataset at Low Cost
2 Set up a Detection Model
Anaël Bonneton PyParis 2017 - SecuML 26/30
What’s next ?
GUI to launch the experiments
Anaël Bonneton PyParis 2017 - SecuML 27/30
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Anaël Bonneton PyParis 2017 - SecuML 27/30
What’s next ?
GUI to launch the experiments
Interpretation of more complex models
CreateDate_correct
producer_space
image_large
Font_count
creator_upper_case
creator_lower_case
Filter_count
Page_count
modDate_correct
OpenAction_count
keywords_unique_count
XFA_count
creator_dot
author_lower_case
-1.5 -1 -0.5 0 0.5 1 1.5
Automatic feature extraction
PDF
PCAP
JavaScriptDOC Netflows
EXE
Anaël Bonneton PyParis 2017 - SecuML 27/30
SecuML
Algorithms and Corresponding Interfaces !
Detection Models
Logistic regression, SVM, Naive Bayes, ...
Performance, interpretation
Interactive Machine Learning
Active learning, Rare category detection
Annotations, feedback
Anaël Bonneton PyParis 2017 - SecuML 28/30
SecuML
Algorithms and Corresponding Interfaces !
Clustering
K-means, Gaussian Mixtures, ...
Instances in each cluster
Projection
PCA, RCA, LDA, LMNN, ..
Projection on two components
Anaël Bonneton PyParis 2017 - SecuML 29/30
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
Interactive labelling
Data visualization with projections
Clustering display
Anaël Bonneton PyParis 2017 - SecuML 30/30
SecuML is available online !
https://github.com/ANSSI-FR/SecuML
Only for Computer Security experts ?
Model interpretation
Interactive labelling
Data visualization with projections
Clustering display
Anaël Bonneton PyParis 2017 - SecuML 30/30

More Related Content

Similar to • Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

Realizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintyRealizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintySteven Vettermann
 
Sopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureSopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureXavier BERNARD
 
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunk
 
Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk
 
A Shelter to Protect your Documents
A Shelter to Protect your DocumentsA Shelter to Protect your Documents
A Shelter to Protect your Documentsteam-WIBU
 
MoniTutor
MoniTutorMoniTutor
MoniTutorIcinga
 
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocSplunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocRene Aguero
 
ML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampKarel Dumon
 
Auto visualization and viml
Auto visualization and vimlAuto visualization and viml
Auto visualization and vimlBill Liu
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
Agent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteAgent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteEduKite
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIMESlides
 
Remote facility monitoring and control using IoT
Remote facility monitoring and control using IoTRemote facility monitoring and control using IoT
Remote facility monitoring and control using IoTBernhard Mehl
 
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 Update
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 UpdateSplunk Forum Frankfurt - 15th Nov 2017 - .conf2017 Update
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 UpdateSplunk
 
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...system_plus
 
PyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPiotr Dyba
 
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...Yole Developpement
 
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...system_plus
 

Similar to • Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton (20)

Realizing Traceability for Safety and Certainty
Realizing Traceability for Safety and CertaintyRealizing Traceability for Safety and Certainty
Realizing Traceability for Safety and Certainty
 
Sopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureSopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital Architecture
 
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense CenterSplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
SplunkLive! Frankfurt 2018 - Customer Presentation: Bosch Cyber Defense Center
 
Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017Splunk ITOA Roundtable - Zurich: 30th November 2017
Splunk ITOA Roundtable - Zurich: 30th November 2017
 
A Shelter to Protect your Documents
A Shelter to Protect your DocumentsA Shelter to Protect your Documents
A Shelter to Protect your Documents
 
MoniTutor
MoniTutorMoniTutor
MoniTutor
 
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensocSplunk live nyc_2017_sec_buildinganalyticsdrivensoc
Splunk live nyc_2017_sec_buildinganalyticsdrivensoc
 
ML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks BootcampML6 talk at Nexxworks Bootcamp
ML6 talk at Nexxworks Bootcamp
 
Auto visualization and viml
Auto visualization and vimlAuto visualization and viml
Auto visualization and viml
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
Agent based modeling level 2 - edukite
Agent based modeling level 2 - edukiteAgent based modeling level 2 - edukite
Agent based modeling level 2 - edukite
 
KNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To DeploymentKNIME Data Science Learnathon: From Raw Data To Deployment
KNIME Data Science Learnathon: From Raw Data To Deployment
 
Remote facility monitoring and control using IoT
Remote facility monitoring and control using IoTRemote facility monitoring and control using IoT
Remote facility monitoring and control using IoT
 
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 Update
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 UpdateSplunk Forum Frankfurt - 15th Nov 2017 - .conf2017 Update
Splunk Forum Frankfurt - 15th Nov 2017 - .conf2017 Update
 
Bridging the Gap
Bridging the GapBridging the Gap
Bridging the Gap
 
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
Sensirion SGP30 Gas Sensor 2018 - teardown reverse costing report published b...
 
PyConPL 2017 - with python: security
PyConPL 2017 - with python: securityPyConPL 2017 - with python: security
PyConPL 2017 - with python: security
 
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
FLIR Boson – a small, innovative,low power, smart thermal camera core 2017 te...
 
BIM and Scrum in construction
BIM and Scrum in construction BIM and Scrum in construction
BIM and Scrum in construction
 
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
ZF S-Cam 4 – Forward Automotive Mono and Tri Camera for Advanced Driver Assis...
 

More from Pôle Systematic Paris-Region

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyPôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAPôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentPôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotPôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPôle Systematic Paris-Region
 

More from Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Recently uploaded

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

• Machine Learning for Computer Security Experts using Python & scikit-learn Anaël Bonneton

  • 1. SecuML: Machine Learning for Computer Security Experts Anaël Bonneton anael.bonneton@ssi.gouv.fr ANSSI, ENS Paris, INRIA PyParis 2017
  • 2. Intrusion Detection System Information System Detection System Communications Alerts Anaël Bonneton PyParis 2017 - SecuML 2/30
  • 3. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Anaël Bonneton PyParis 2017 - SecuML 3/30
  • 4. Traditional Detection Methods Signatures: Precise detection rules built by security experts Low false alert rate Alerts easy to interprete Not robust to attack variations, to new attacks Machine Learning ! Anaël Bonneton PyParis 2017 - SecuML 3/30
  • 5. Supervised Detection Model Training PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Predicting PDF Ok Alert Anaël Bonneton PyParis 2017 - SecuML 4/30
  • 6. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 7. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 8. Computer Security Specificities Non Machine Learning Experts Machine Learning pipeline Machine Learning jargon Lack of training data Few public labelled datasets No crowdsourcing Need for interpretation How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 5/30
  • 9. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 10. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 11. Applying Machine Learning Deep Learning Frameworks TensorFlow Microsoft Cognitive Toolkit Paddle Cloud Solutions Google Cloud ML Microsoft Azure Amazon Machine Learning Machine Learning Libraries scikit-learn Mahout, Weka, Vowpal Wabbit Anaël Bonneton PyParis 2017 - SecuML 6/30
  • 12. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. Anaël Bonneton PyParis 2017 - SecuML 7/30
  • 13. SecuML: Beyound scikit-learn Scikit-learn Classification, clustering, dimension reduction, etc. Scaling, grid search, cross validation, etc. SecuML Automation of the Machine Learning pipeline Interactive labelling to acquire training data at low cost Graphical User Interface Anaël Bonneton PyParis 2017 - SecuML 7/30
  • 14. SecuML Machine Learning for Computer Security Experts Algorithms (scikit-learn, metric-learn, active learning) Web user interface (flask server) Any Type of Data PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 8/30
  • 15. SecuML - Input Data features.csv id,f0,f1,f2,f3,f4,... 0,1,3,0,0,0,... 1,1,4,5,4,1,... 2,1,4,32,13,0,... 3,1,3,0,0,0,... 4,1,3,0,0,0,... 5,1,3,0,0,0,... 6,1,6,7,6,0,... 7,1,3,0,0,0,... 8,1,3,0,0,0,... ... true_labels.csv id,label,family 0,M,CVE-2017-30-10 1,M,CVE-2017-30-10 2,B,slides 3,M,CVE-2016-0945 4,M,CVE-2016-0945 5,B,user_manual 6,B,user_manual 7,B,technical_report 8,B,technical_report ... Anaël Bonneton PyParis 2017 - SecuML 9/30
  • 16. SecuML 1 Set up a Detection Model 2 Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 10/30
  • 17. Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 11/30
  • 18. Set up a Detection Model Training Data Features and Labels Training Diagnosis Deployment Anaël Bonneton PyParis 2017 - SecuML 12/30
  • 19. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Anaël Bonneton PyParis 2017 - SecuML 13/30
  • 20. Automation of the Machine Learning Pipeline Training Pipeline Scaling Cross validation to select the hyperparameters Validation of a Detection Model 90% Data Training 10% Data Validation Anaël Bonneton PyParis 2017 - SecuML 13/30
  • 21. Trust in the Detection Model PDF PDF PDF PDFPDF PDF PDF PDF PDF PDFPDF PDF PDFPDF PDF PDF PDF PDF PDF PDF PDF Training Data Learning Algorithm Classifier Understanding the Classifier How does the detection model work ? Why an alert has been raised ? Anaël Bonneton PyParis 2017 - SecuML 14/30
  • 22. Trust in the Detection Model How does the detection model work ? CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 15/30
  • 23. Trust in the Detection Model Why an alert has been raised ? OpenAction_count keywords_unique_count creator_unique_count creator_dot AcroForm_count createDate_correct producer_space creator_lower_case modDate_correct creator_upper_case producer_unique_count image_large Filter_count -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 16/30
  • 24. Train and Diagnose a Detection Model Demo ./SecuML_classification LogisticRegression PDF contagio Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 25. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 26. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 27. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 28. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 29. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 30. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 31. Train and Diagnose a Detection Model Anaël Bonneton PyParis 2017 - SecuML 17/30
  • 32. Acquire a Representative Training Dataset at Low Cost Anaël Bonneton PyParis 2017 - SecuML 18/30
  • 33. Annotation System Unlabelled Pool Labelled Dataset Annotation queries Security experts = expensive resources Anaël Bonneton PyParis 2017 - SecuML 19/30
  • 34. Active Learning Strategy Expert Active Learning Strategy Annotation queriesNew labelled instances Which instances should be annotated ? Which instances are the most informative ? Anaël Bonneton PyParis 2017 - SecuML 20/30
  • 35. Reducing the number of annotations is not enough ! Expert Active Learning Strategy Annotation queriesNew labelled instances Low expert waiting time Feedback: Your annotations are useful ! User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 21/30
  • 36. ILAB: Interactive LABelling A whole annotation system Active learning strategy Feedback to the expert User interface for annotating Anaël Bonneton PyParis 2017 - SecuML 22/30
  • 37. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 38. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 39. ILAB’s Active Learning Strategy Annotations Queries Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 40. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 41. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 42. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Clusters = User-defined Families Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 43. ILAB’s Active Learning Strategy Annotations Queries Close to the decision boundary Center of the clusters Edge of the clusters Clusters = User-defined Families Will be published at RAID 2017. Anaël Bonneton PyParis 2017 - SecuML 23/30
  • 44. ILAB Demo ./SecuML_activeLearning ILAB PDF contagio Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 45. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 46. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 47. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 48. ILAB Anaël Bonneton PyParis 2017 - SecuML 24/30
  • 49. SecuML Anaël Bonneton PyParis 2017 - SecuML 25/30
  • 50. SecuML 1 Acquire a Representative Training Dataset at Low Cost 2 Set up a Detection Model Anaël Bonneton PyParis 2017 - SecuML 26/30
  • 51. What’s next ? GUI to launch the experiments Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 52. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 53. What’s next ? GUI to launch the experiments Interpretation of more complex models CreateDate_correct producer_space image_large Font_count creator_upper_case creator_lower_case Filter_count Page_count modDate_correct OpenAction_count keywords_unique_count XFA_count creator_dot author_lower_case -1.5 -1 -0.5 0 0.5 1 1.5 Automatic feature extraction PDF PCAP JavaScriptDOC Netflows EXE Anaël Bonneton PyParis 2017 - SecuML 27/30
  • 54. SecuML Algorithms and Corresponding Interfaces ! Detection Models Logistic regression, SVM, Naive Bayes, ... Performance, interpretation Interactive Machine Learning Active learning, Rare category detection Annotations, feedback Anaël Bonneton PyParis 2017 - SecuML 28/30
  • 55. SecuML Algorithms and Corresponding Interfaces ! Clustering K-means, Gaussian Mixtures, ... Instances in each cluster Projection PCA, RCA, LDA, LMNN, .. Projection on two components Anaël Bonneton PyParis 2017 - SecuML 29/30
  • 56. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30
  • 57. SecuML is available online ! https://github.com/ANSSI-FR/SecuML Only for Computer Security experts ? Model interpretation Interactive labelling Data visualization with projections Clustering display Anaël Bonneton PyParis 2017 - SecuML 30/30