Improving the performance of Machine Learning models
with feature selection techniques in the context of
signal/background discrimination for the AGILE telescope
Leonardo Baroncelli (PhD student in Data Science and Computation),
Andrea Bulgarelli, Nicolò Parmiggiani
(INAF/OAS Bologna)
leonardo.baroncelli@inaf.it
Deep Learning @ INAF, F2F meeting, 16-19 September 2019 1
The AGILE-GRID data acquisition system is composed of three
main detectors:
● a Tungsten-Silicon Tracker designed to detect and image
photons in the 30 MeV-50 GeV energy band,
● a Mini-Calorimeter that detects gamma-rays and charged
particles energy deposits between 300 keV and 100 MeV,
● an anti-coincidence (AC) system that surrounds the Silicon
Tracker and the Mini-Calorimeter.
On-ground signal/background discrimination: the FM3.119 filter.
FM3.119 uses AdaBoost and a subset of discriminant variables
(57 features) selected by a domain expert (manual feature
selection).
The task: signal/background discrimination
AGILE (Astrorivelatore Gamma ad Immagini LEggero) is a scientific mission of the Italian Space Agency
(ASI), launched on April 23, 2007, for gamma-ray astrophysics.
2
⇒ Will automatic feature selection techniques lead to better results?
⇒ Which is the best feature selection method for this task? How many features are needed?
The dataset
● It has been generated from Monte Carlo
simulations of the AGILE in-flight data acquisition
system.
● It describes the particle interactions with the
AGILE instruments (silicon tracker, calorimeter,
anti-coincidence system).
● It is a supervised dataset: the particles are
divided into two classes, gamma photons and
background particles.
● It consists of 169,813 rows (particles) and
260 columns (interaction features) in CSV format.
3
Dataset preprocessing
Random (uniform) subsampling of the gamma examples to obtain a balanced dataset.
Initial shape = (169,813, 260)
From 169,185 to 67,952 examples.
Dropped ~60% of the dataset.
Final shape = (67,952, 241) 4
Cleaning + subsampling
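A minimal sketch of this step, assuming the dataset is loaded from a CSV with a binary label column (the file name and column names are hypothetical; the slides do not show them):

import pandas as pd

# Load the Monte Carlo dataset (file and column names are assumptions).
df = pd.read_csv("agile_mc_dataset.csv")           # shape ~ (169813, 260)

# Cleaning: drop rows with missing values and constant (non-informative) columns.
df = df.dropna()
df = df.loc[:, df.nunique() > 1]

# Uniform random subsampling of the gamma (majority) class to balance the classes.
gamma = df[df["label"] == "gamma"]
bg = df[df["label"] == "bg"]
gamma_sub = gamma.sample(n=len(bg), random_state=42)

balanced = pd.concat([gamma_sub, bg]).sample(frac=1, random_state=42)  # shuffle
print(balanced.shape)                              # ~ (67952, 241)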
Dataset exploration
For each feature:
● Counts histogram
● Density histogram
● Counts histogram in
log scale
● Box plot
5
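These four per-feature plots can be produced with a short matplotlib helper; a sketch (the example column name is a placeholder):

import matplotlib.pyplot as plt

def explore_feature(df, col):
    # Counts histogram, density histogram, log-scale counts histogram, box plot.
    fig, axes = plt.subplots(1, 4, figsize=(16, 3))
    axes[0].hist(df[col], bins=50)
    axes[0].set_title(f"{col} (counts)")
    axes[1].hist(df[col], bins=50, density=True)
    axes[1].set_title(f"{col} (density)")
    axes[2].hist(df[col], bins=50)
    axes[2].set_yscale("log")
    axes[2].set_title(f"{col} (log counts)")
    axes[3].boxplot(df[col].dropna())
    axes[3].set_title(f"{col} (box plot)")
    plt.tight_layout()
    plt.show()

# e.g. explore_feature(balanced, "BIG_CL_X")  # hypothetical feature name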
Dataset exploration
For each group of related features: correlation matrix
6
Example: BIG_CL (big cluster) features on the X and Z planes show strong pairwise
correlations (corr. coeff. = 0.92, 0.79, 0.8) ⇒ multicollinearity problem.
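A sketch of the per-group correlation matrix, assuming related features share a common name prefix (the grouping rule is an assumption):

import matplotlib.pyplot as plt

def plot_group_correlation(df, prefix):
    # Pearson correlation matrix for a group of related features.
    group = [c for c in df.columns if c.startswith(prefix)]
    corr = df[group].corr(method="pearson")
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(group)))
    ax.set_xticklabels(group, rotation=90)
    ax.set_yticks(range(len(group)))
    ax.set_yticklabels(group)
    fig.colorbar(im, ax=ax, label="Pearson corr. coeff.")
    plt.tight_layout()
    plt.show()

# e.g. plot_group_correlation(balanced, "BIG_CL")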
Dataset splitting
The dataset has been randomly split into three parts (training, validation, test).
Training set, shape (47566, 241):
● class gamma = 23760
● class bg = 23806
Validation set, shape (10193, 241):
● class gamma = 5104
● class bg = 5089
Test set, shape (10193, 241):
● class gamma = 5112
● class bg = 5081
7
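A sketch of the ~70/15/15 split (the proportions are inferred from the shapes above; stratification keeps the classes balanced):

from sklearn.model_selection import train_test_split

X = balanced.drop(columns=["label"])
y = (balanced["label"] == "gamma").astype(int)   # gamma = 1, background = 0

# Split off ~70% for training, then halve the remainder into validation/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)

print(X_train.shape, X_val.shape, X_test.shape)
# ~ (47566, 240) (10193, 240) (10193, 240); the slide shapes include the label column.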
Dimensionality reduction
Why?
● Data storage is reduced.
● Machine learning models are simplified, which can lead to increased generalization capability.
● Computational complexity for training and testing machine learning models is reduced.
Dimensionality reduction techniques are generally divided into two categories:
● Feature extraction ⇒ a linear or non-linear projection of the data into a lower-dimensional subspace.
● Feature selection ⇒ a subset of the original features, chosen according to some performance criterion.
8
Dimensionality reduction workflow
PEARSON CORR. COEFF. — Correlation with the target.
MUTUAL INFORMATION — Based on the concept of entropy of a random variable.
LASSO — It constrains or shrinks the model coefficients towards zero.
EXTRA TREES — It fits a number of randomized decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
9
...and many more techniques:
Venkatesh, B., & Anuradha, J. (2019). A Review of Feature Selection and Its Methods. Cybernetics and Information Technologies, 19(1), 3. doi:10.2478/cait-2019-0001.
[Workflow diagram: preprocessing → feature selection / feature extraction]
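A sketch of the four selection methods with scikit-learn; how the rankings are turned into subsets of a given size is an assumption, since the slides only name the methods:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LassoCV

def rank_features(X_train, y_train):
    # Returns one ranking of feature indices (best first) per method;
    # y_train is assumed encoded as 1 = gamma, 0 = background.
    pearson = np.abs([np.corrcoef(X_train.iloc[:, j], y_train)[0, 1]
                      for j in range(X_train.shape[1])])         # corr. with target
    mi = mutual_info_classif(X_train, y_train, random_state=42)  # entropy-based
    lasso = np.abs(LassoCV(cv=5, random_state=42)
                   .fit(X_train, y_train).coef_)                 # shrunk coefficients
    et = ExtraTreesClassifier(n_estimators=500, random_state=42)
    trees = et.fit(X_train, y_train).feature_importances_        # averaged importances

    scores = {"pearson": pearson, "mi": mi, "lasso": lasso, "extra_trees": trees}
    return {name: np.argsort(s)[::-1] for name, s in scores.items()}

# e.g. top 60 features by mutual information:
# rankings = rank_features(X_train, y_train); top60_mi = rankings["mi"][:60]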
Automatically selected features vs FM3.119 features
First 57 most discriminant features:
Method | Intersection with FM3.119
PEARSON CORR. COEFF. | 26.31 %
MUTUAL INFORMATION | 38.6 %
LASSO | 35.09 %
EXTRA TREES | 33.33 %
12
Pairwise correlations
First 30 most discriminant features:
Method | Pairwise correlations (corr > 0.6)
PEARSON CORR. COEFF. | 26.6 %
MUTUAL INFORMATION | 2.6 %
LASSO | 2.1 %
EXTRA TREES | 4.8 %
13
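A sketch of how the percentage of correlated pairs can be computed for a selected subset (the counting convention, pairs over the upper triangle, is an assumption):

import numpy as np

def pct_correlated_pairs(X, selected_idx, threshold=0.6):
    # Percentage of feature pairs in the subset with |Pearson corr| > threshold;
    # selected_idx are column indices, e.g. from the rankings sketched above.
    corr = X.iloc[:, list(selected_idx)].corr().abs().to_numpy()
    iu = np.triu_indices_from(corr, k=1)   # upper triangle, diagonal excluded
    return 100.0 * np.mean(corr[iu] > threshold)

# e.g. pct_correlated_pairs(X_train, top30_mi)  # ~2.6 % for Mutual Information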
Training of the ML algorithms
Total models to train = 17 × (100 + 100) = 3,400

Algorithm | Hyper-parameter space
Random Forest | 'criterion': ['gini', 'entropy'], 'n_estimators': [900, 1100, 1300, 1500, 1700], 'max_depth': [15, 20, 25, 30, 35], 'max_features': ['sqrt', 'log2'] ⇒ 100 models
Gradient Boosting | 'learning_rate': [1, 0.1], 'n_estimators': [300, 500, 700, 900, 1100], 'max_depth': [5, 10, 15, 20, 25], 'max_features': ['sqrt', 'log2'] ⇒ 100 models
14

Features subsets
Feature selection method | Number of features
FM3.119 | 57
ALL features | 241
Pearson | 30, 60, 120, 180
Mutual Information | 30, 60, 120, 180
Lasso | 30, 60, 120
Extra Trees | 30, 60, 120, 180
⇒ 17 feature subsets
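A sketch of the two hyper-parameter grids from the slide, in the form expected by scikit-learn's GridSearchCV (each grid expands to 2 × 5 × 5 × 2 = 100 models):

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

grids = {
    "Random Forest": (RandomForestClassifier(random_state=42), {
        'criterion': ['gini', 'entropy'],
        'n_estimators': [900, 1100, 1300, 1500, 1700],
        'max_depth': [15, 20, 25, 30, 35],
        'max_features': ['sqrt', 'log2'],
    }),
    "Gradient Boosting": (GradientBoostingClassifier(random_state=42), {
        'learning_rate': [1, 0.1],
        'n_estimators': [300, 500, 700, 900, 1100],
        'max_depth': [5, 10, 15, 20, 25],
        'max_features': ['sqrt', 'log2'],
    }),
}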
Training/Testing workflow
The 5-fold cross-validation method has been used to assess the performance of each model built with
a specific set of hyper-parameters.
Evaluation metric: F1 score (classification threshold = 0.5)
1. Select a feature subset (1 of 17).
2. Select an algorithm (1 of 2).
3. Grid search to find the best hyper-parameters for the algorithm.
4. Train the best model on the entire training set.
5. Evaluate the best model on the validation set.
⇒ 34 fine-tuned models (a code sketch of the loop follows).
15
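A sketch of the loop over the 17 subsets and 2 algorithms, assuming the grids defined above and a hypothetical feature_subsets dict mapping subset names to column lists:

from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV

results = []
for subset_name, cols in feature_subsets.items():        # 17 feature subsets
    for algo_name, (estimator, grid) in grids.items():   # 2 algorithms
        # 5-fold CV grid search, scored with F1 (threshold 0.5 via predict()).
        search = GridSearchCV(estimator, grid, scoring="f1", cv=5, n_jobs=-1)
        search.fit(X_train[cols], y_train)
        # refit=True (the default) retrains the best model on the whole training set;
        # the fine-tuned model is then evaluated on the validation set.
        f1_val = f1_score(y_val, search.predict(X_val[cols]))
        results.append((subset_name, algo_name, search.best_params_, f1_val))
# => 17 * 2 = 34 fine-tuned models in `results`.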
Performance of the 34 models
[Plots: validation-set performance of the 34 fine-tuned models, with confusion matrices for
selected models; each matrix sums to the 10,193 validation examples:]
[[4451, 661], [869, 4212]]
[[4741, 371], [534, 4547]]
[[4701, 411], [770, 4311]]
[[4710, 402], [760, 4321]]
[[4783, 329], [588, 4493]]
21
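A sketch of recovering precision, recall, and F1 from one of the matrices above, assuming a [[TP, FN], [FP, TN]] layout (the slides do not state the orientation):

def f1_from_confusion(tp, fn, fp, tn):
    # Precision, recall (= TPR, the signal efficiency) and F1 for the gamma class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(f1_from_confusion(4783, 329, 588, 4493))  # last matrix shown above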
Choosing the final model

Top Gradient Boosting models:
Algorithm | FS method | Number of features | F1 score | ROC AUC
Gradient Boosting | ALL features | 239 | 0.909491 | 0.911164
Gradient Boosting | Mutual Information | 180 | 0.908302 | 0.910746
Gradient Boosting | Lasso | 120 | 0.907402 | 0.909958
... | ... | ... | ... | ...

Top Random Forest models:
Algorithm | FS method | Number of features | F1 score | ROC AUC
Random Forest | Mutual Information | 180 | 0.881477 | 0.885892
Random Forest | Extra Trees | 180 | 0.880734 | 0.885109
Random Forest | Lasso | 120 | 0.879837 | 0.883844
... | ... | ... | ... | ... 27
BG rejection vs Signal efficiency
Signal efficiency at fixed background efficiency εB:
Model | εB = 0.01 | εB = 0.1 | εB = 0.3
Gradient Boosting with Lasso (120 features) | 0.798 | 0.949 | 0.983
Gradient Boosting with FM3.119 (57 features) | 0.560 | 0.869 | 0.958
AdaBoost with FM3.119 (57 features, production filter) | 0.747 | 0.925 | 0.976
28
[ROC curve: signal efficiency (TPR) vs background efficiency εB (FPR), obtained by scanning the
classification threshold from 0 to 1; background rejection = 1 - εB.]
⇒ TPR = TP / (TP + FN)
⇒ TNR = TN / (TN + FP)
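A sketch of reading the signal efficiency at a fixed background efficiency off the ROC curve, assuming validation-set probabilities from one of the fine-tuned models above:

import numpy as np
from sklearn.metrics import roc_curve

proba_val = search.predict_proba(X_val[cols])[:, 1]   # P(gamma), from the loop sketch
fpr, tpr, thresholds = roc_curve(y_val, proba_val)    # fpr = εB, tpr = signal efficiency

def signal_efficiency_at(eps_b):
    # Interpolate the TPR at a given background efficiency εB (fpr is increasing).
    return np.interp(eps_b, fpr, tpr)

for eps_b in (0.01, 0.1, 0.3):
    print(f"εB = {eps_b}: signal efficiency = {signal_efficiency_at(eps_b):.3f}")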
● A multicollinearity analysis is required to completely remove the correlation among the features.
● The feature selection techniques MI, Lasso and Extra Trees each selected a subset of features that
produced a better model than the manual selection.
● The best performance was NOT achieved with a subset of automatically selected features: it was
achieved by training the Gradient Boosting model on all the features (Gradient Boosting
performs feature selection implicitly).
● With ~50% of the total features (120) selected by Lasso, it has been possible to obtain performance
very close to the best results, while reducing the data storage requirement.
● With ~12% of the total features (30) selected by Extra Trees, it has been possible to obtain a
reasonably good F1 score of 0.85 and a potentially easily interpretable model (I need the domain expert, now!).
Conclusions
29
Thank you for your attention!
Any questions?
30
EXTRA: GridSearchCV multi-threading scalability
Feature selection hyper-parameters space:
("mi", 10)
Random forest hyper-parameters space:
grid_params_rf = [{ 'rfc__criterion': ['gini', 'entropy'],
'rfc__n_estimators': [300, 500, 700, 900],
'rfc__max_features': ['sqrt', 'log2'],
'rfc__max_depth': [10, 15, 20],
}]
CPU(s): 192
Thread(s) per core: 8
Core(s) per socket: 6
Socket(s): 4
⇒ 96 cores will be used to run the Grid Search
31
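A sketch of the pipeline behind this benchmark, assuming the 'rfc__' prefixes refer to a scikit-learn Pipeline step and that ("mi", 10) denotes a mutual-information selector keeping 10 features (both are inferences from the slide):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("mi", SelectKBest(mutual_info_classif, k=10)),     # ("mi", 10)
    ("rfc", RandomForestClassifier(random_state=42)),
])

grid_params_rf = [{
    'rfc__criterion': ['gini', 'entropy'],
    'rfc__n_estimators': [300, 500, 700, 900],
    'rfc__max_features': ['sqrt', 'log2'],
    'rfc__max_depth': [10, 15, 20],
}]

# n_jobs drives the multi-process scalability being measured:
# 96 of the machine's 192 logical CPUs were used on the slide.
search = GridSearchCV(pipe, grid_params_rf, scoring="f1", cv=5, n_jobs=96)
# search.fit(X_train, y_train)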