SlideShare a Scribd company logo
Scikit-Learn in particle physics 
Gilles Louppe 
CERN, Switzerland 
November 18, 2014 
1 / 13
High Energy Physics (HEP) 
c CERN 
2 / 13
High Energy Physics (HEP) 
c CERN 
Study the nature of the 
constituents of matter 
2 / 13
High Energy Physics (HEP) 
c CERN 
Study the nature of the 
constituents of matter 
2 / 13
Particle detector 101 
c ATLAS CERN 
3 / 13
Particle detector 101 
c ATLAS CERN 
3 / 13
Data analysis tasks in detectors 
1 Track
nding 
Reconstruction of particle trajectories from hits in detectors 
2 Budgeted classi
cation 
Real-time classi
cation of events in triggers 
3 Classi
cation of signal / background events 
Oine statistical analysis for discovery of new particles 
4 / 13
The Kaggle Higgs Boson challenge (in HEP terms) 
 Data comes as a
nite set 
D = f(xi ; yi ;wi )ji = 0; : : : ;N  1g; 
where xi 2 Rd , yi 2 fsignal; backgroundg and wi 2 R+. 
 The goal is to
nd a region G = fxjg(x) = signalg  Rd , 
de
ned from a binary function g, for which the 
background-only hypothesis can be rejected at a strong 
signi
cance level (p = 2:87  107, i.e., 5 sigma). 
 Empirically, this is approximately equivalent to
nding g from 
D so as to maximize AMS  ps 
b 
, where 
s = 
P 
fi jyi =signal;g(xi)=signalg wi 
b = 
P 
fi jyi =background;g(xi)=signalg wi 
5 / 13

More Related Content

Viewers also liked

Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
Gilles Louppe
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
Gilles Louppe
 
Scikit-Learn - Or why I joined an open source software project
Scikit-Learn - Or why I joined an open source software projectScikit-Learn - Or why I joined an open source software project
Scikit-Learn - Or why I joined an open source software project
Gilles Louppe
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
Gilles Louppe
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
Gilles Louppe
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
Gilles Louppe
 

Viewers also liked (6)

Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
Scikit-Learn - Or why I joined an open source software project
Scikit-Learn - Or why I joined an open source software projectScikit-Learn - Or why I joined an open source software project
Scikit-Learn - Or why I joined an open source software project
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 

Similar to Scikit-Learn in Particle Physics

Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep models
Young-Geun Choi
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
Balázs Kégl
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
aimsnist
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
David Gleich
 
PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018 PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018
Natalia Díaz Rodríguez
 
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf OptimizationNew Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
Aboul Ella Hassanien
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMs
Loic Merckel
 
Optimal transport driven CycleGAN for unsupervised learning in inverse problems
Optimal transport driven CycleGAN for unsupervised learning in inverse problemsOptimal transport driven CycleGAN for unsupervised learning in inverse problems
Optimal transport driven CycleGAN for unsupervised learning in inverse problems
ByeongsuSim1
 
Graph Neural Prompting with Large Language Models.pdf
Graph Neural Prompting with Large Language Models.pdfGraph Neural Prompting with Large Language Models.pdf
Graph Neural Prompting with Large Language Models.pdf
Po-Chuan Chen
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
Impetus Technologies
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
butest
 
Individual Brain Charting: third-release dataset validation
Individual Brain Charting: third-release dataset validationIndividual Brain Charting: third-release dataset validation
Individual Brain Charting: third-release dataset validation
Ana Luísa Pinho
 
深度学习639页PPT/////////////////////////////
深度学习639页PPT/////////////////////////////深度学习639页PPT/////////////////////////////
深度学习639页PPT/////////////////////////////
alicejiang7888
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
James Johnson
 
12_applications.pdf
12_applications.pdf12_applications.pdf
12_applications.pdf
KSChidanandKumarJSSS
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
inside-BigData.com
 
Error of Multileaf collimator prediction using recurrent neural network (LSTM)
Error of Multileaf collimator prediction using recurrent neural network (LSTM)Error of Multileaf collimator prediction using recurrent neural network (LSTM)
Error of Multileaf collimator prediction using recurrent neural network (LSTM)
WonjoongCheon
 
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
swapnatoya
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Fabian Pedregosa
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
Ha Phuong
 

Similar to Scikit-Learn in Particle Physics (20)

Chap 8. Optimization for training deep models
Chap 8. Optimization for training deep modelsChap 8. Optimization for training deep models
Chap 8. Optimization for training deep models
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018 PAISS (PRAIRIE AI Summer School) Digest July 2018
PAISS (PRAIRIE AI Summer School) Digest July 2018
 
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf OptimizationNew Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
New Rough Set Attribute Reduction Algorithm based on Grey Wolf Optimization
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMs
 
Optimal transport driven CycleGAN for unsupervised learning in inverse problems
Optimal transport driven CycleGAN for unsupervised learning in inverse problemsOptimal transport driven CycleGAN for unsupervised learning in inverse problems
Optimal transport driven CycleGAN for unsupervised learning in inverse problems
 
Graph Neural Prompting with Large Language Models.pdf
Graph Neural Prompting with Large Language Models.pdfGraph Neural Prompting with Large Language Models.pdf
Graph Neural Prompting with Large Language Models.pdf
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Individual Brain Charting: third-release dataset validation
Individual Brain Charting: third-release dataset validationIndividual Brain Charting: third-release dataset validation
Individual Brain Charting: third-release dataset validation
 
深度学习639页PPT/////////////////////////////
深度学习639页PPT/////////////////////////////深度学习639页PPT/////////////////////////////
深度学习639页PPT/////////////////////////////
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
 
12_applications.pdf
12_applications.pdf12_applications.pdf
12_applications.pdf
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
Error of Multileaf collimator prediction using recurrent neural network (LSTM)
Error of Multileaf collimator prediction using recurrent neural network (LSTM)Error of Multileaf collimator prediction using recurrent neural network (LSTM)
Error of Multileaf collimator prediction using recurrent neural network (LSTM)
 
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
[David a. coley]_an_introduction_to_genetic_algori(book_fi.org)
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
 

Recently uploaded

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 

Recently uploaded (20)

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 

Scikit-Learn in Particle Physics

  • 1. Scikit-Learn in particle physics Gilles Louppe CERN, Switzerland November 18, 2014 1 / 13
  • 2. High Energy Physics (HEP) c CERN 2 / 13
  • 3. High Energy Physics (HEP) c CERN Study the nature of the constituents of matter 2 / 13
  • 4. High Energy Physics (HEP) c CERN Study the nature of the constituents of matter 2 / 13
  • 5. Particle detector 101 c ATLAS CERN 3 / 13
  • 6. Particle detector 101 c ATLAS CERN 3 / 13
  • 7. Data analysis tasks in detectors 1 Track
  • 8. nding Reconstruction of particle trajectories from hits in detectors 2 Budgeted classi
  • 10. cation of events in triggers 3 Classi
  • 11. cation of signal / background events Oine statistical analysis for discovery of new particles 4 / 13
  • 12. The Kaggle Higgs Boson challenge (in HEP terms) Data comes as a
  • 13. nite set D = f(xi ; yi ;wi )ji = 0; : : : ;N 1g; where xi 2 Rd , yi 2 fsignal; backgroundg and wi 2 R+. The goal is to
  • 14. nd a region G = fxjg(x) = signalg Rd , de
  • 15. ned from a binary function g, for which the background-only hypothesis can be rejected at a strong signi
  • 16. cance level (p = 2:87 107, i.e., 5 sigma). Empirically, this is approximately equivalent to
  • 17. nding g from D so as to maximize AMS ps b , where s = P fi jyi =signal;g(xi)=signalg wi b = P fi jyi =background;g(xi)=signalg wi 5 / 13
  • 18. The Kaggle Higgs Boson challenge (in ML terms) Find a binary classi
  • 19. er g : Rd7! fsignal; backgroundg maximizing the objective function AMS s p b ; where s is the weighted number of true positives b is the weighted number of false positives. 6 / 13
  • 20. Winning methods Ensembles of neural networks (1st and 3rd) ; Ensembles of regularized greedy forests (2nd) ; Boosting with regularization (XGBoost package). Most contestants dit not optimize AMS directly ; But chosed the prediction cut-o maximizing AMS in CV. 7 / 13
  • 21. Lessons learned (for machine learning) AMS is highly unstable, hence the need for Rigorous and stable cross-validation to avoid over
  • 22. tting. Ensembles to reduce variance ; Regularized base models. Support of samples weights wi in classi
  • 23. cation models was key for this challenge. Feature engineering hardly helped. (Because features already incorporated physics knowledge.) 8 / 13
  • 24. Lessons learned (for physicists) Domain knowledge hardly helped. Standard machine learning techniques, run on a single laptop, beat benchmarks without much eorts. Physicists started to realize that collaborations with machine learning experts is likely to be bene
  • 25. cial. I worked on the ATLAS experiment for over a decade [...] It is rather depressing to see how badly I scored. The
  • 26. nal results seem to reinforce the idea that the machine learning experience is vastly more important in a similar contest than the knowledge of particle physics. I think that many people underestimate the computers. It is probably the reason why ML experts and physicists should work together for
  • 29. c software in HEP ROOT and TMVA are standard data analysis tools in HEP. Surprisingly, this HEP software ecosystem proved to be rather limited and easily outperformed (at least in the context of the Kaggle challenge). 10 / 13
  • 30. Scikit-Learn in Particle Physics ? The main technical blocker for the larger adoption of Scikit-Learn in HEP remains the full support of sample weights throughout all essential modules. Since 0.16, weights are supported in all ensembles and in most metrics. Next step is to add support in grid search. In parallel, domain-speci
  • 31. c packages are getting traction ROOTpy, for bridging the gap between ROOT data format and NumPy ; lhcb trigger ml, implementing ML algorithms for HEP (mostly Boosting variants), on top of scikit-learn. 11 / 13
  • 32. Major blocker : social reasons ? The adoption of external solutions (e.g., the scienti
  • 33. c Python stack or Scikit-Learn) appears to be slow and dicult in the HEP community because of No ground-breaking added-value ; The learning curve of new tools ; Lack of understanding of non-HEP methods ; Isolation from the community ; Genuine ignorance. 12 / 13
  • 34. Conclusions Scikit-Learn has the potential to become an important tool in HEP. But we are not there yet [WIP]. Overall, both for data analysis and software aspects, this calls for a larger collaboration between data sciences and HEP. The process of attempting as a physicist to compete against ML experts has given us a new respect for a
  • 35. eld that (through ignorance) none of us held in as high esteem as we do now. 13 / 13
  • 36. c xkcd Questions ? 14 / 13