Learning to discover: machine learning in high-energy physics

My slides from my CERN talk on data science and high-energy physics.

http://cds.cern.ch/record/1702668

  1. 1. B. Kégl / AppStat@LAL Learning to discover LEARNING TO DISCOVER: MACHINE LEARNING IN HIGH-ENERGY PHYSICS Linear Accelerator Laboratory and Computer Science Laboratory CNRS/IN2P3 & University Paris-S{ud,aclay} BALÁZS KÉGL CERN, May 13, 2014 1
  2. 2. B. Kégl / AppStat@LAL Learning to discover OUTLINE • What is machine learning? • Three projects to illustrate data science in HEP • budgeted learning for triggers (LHCb) • classification for discovery and the HiggsML challenge (ATLAS) • deep learning for imaging calorimeters (ILC) • Concluding remarks • interdisciplinarity: HEP, ML, data science 2
  3. 3. B. Kégl / AppStat@LAL Learning to discover WHAT IS MACHINE LEARNING? • “The science of getting computers to act without being explicitly programmed” - Andrew Ng (Stanford/Coursera) • part of standard computer science curriculum since the 90s • inferring knowledge from data • generalizing to unseen data • usually no parametric model assumptions • emphasizing the computational challenges [related fields: Machine Learning, Statistics, Optimization, Artificial intelligence, Neuroscience, Cognitive science, Signal processing, Information theory, Statistical physics] 3
  4. 4. B. Kégl / AppStat@LAL Learning to discover MACHINE LEARNING TAXONOMY • Supervised learning: non-parametric (model-free) input - output functions • classification (Trees, BDT, SVM, NN) - what you call MVA • regression (Trees, NN, Gaussian Processes) • Unsupervised learning: non-parametric data representation • clustering (k-means, spectral clustering, Dirichlet processes) • dimensionality reduction (PCA, ISOMAP, LLE, auto-associative NN) • density estimation (kernel density, Gaussian mixtures, the Boltzmann machine) • Reinforcement learning: • learning + dynamic control: learn to behave in an environment to maximize cumulative reward 4
  5. 5. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Character recognition 5
  6. 6. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Emotion recognition 6
  7. 7. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Speech recognition 7
  8. 8. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION • Input: a usually high dimensional vector x • Output: a category (label, class) y • Usually no parametric model • the classification function y = g(x) is learned using a training set D = {(x1, y1), . . . , (xn, yn)} • Well-tested algorithms: • neural networks, support vector machines, boosting 8
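To make this setup concrete, here is a minimal sketch of learning y = g(x) from a training set D and measuring the error on held-out examples. It uses scikit-learn's GradientBoostingClassifier as a stand-in for the "well-tested algorithms" listed above; the synthetic data and every parameter choice are illustrative assumptions, not anything from the talk.

```python
# Minimal sketch of the supervised classification setup: learn y = g(x) from a
# labeled training set, then estimate the error on unseen examples.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n, d = 10000, 10
X = rng.normal(size=(n, d))                          # feature vectors x_i in R^d
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 1).astype(int)   # labels y_i (1 = signal, 0 = background)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

g = GradientBoostingClassifier(n_estimators=200, max_depth=3)  # a boosted-tree classifier
g.fit(X_train, y_train)                              # learn the classification function g
print("held-out error rate P(g(x) != y):", np.mean(g.predict(X_test) != y_test))
```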
  9. 9. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION The only goal is a low probability of error P(g(x) ≠ y) on previously unseen examples (x, y) 9
  10. 10. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time face detection 10
  11. 11. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time web page ranking 11
  12. 12. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time ad placement 12
  13. 13. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time signal/background separation 13
  14. 14. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION The second goal is the fast execution of g(x) 14
  15. 15. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Trade-off between quality and speed 15
  16. 16. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION • Time constraints • Memory constraints • Consumption constraints • Communication constraints 16
  17. 17. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION The common design: cascade classification = trigger with levels. Stage 1 → Stage 2 → Stage 3 → Stage 4: easy background → medium background → hard background → signal/very hard background (Viola-Jones, CVPR 2001) [original motivation: screenshot of the MDDAG paper header] 17
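The cascade design on this slide (cheap stages reject easy background early, so only the hard candidates pay for the expensive stages) can be sketched as follows. The stage scores, thresholds, and costs are hypothetical placeholders, not the Viola-Jones detector or the LHCb trigger.

```python
# Sketch of a cascade ("trigger with levels"): each stage either rejects the candidate
# as background or passes it on to the next, more expensive stage.

def cascade_predict(x, stages):
    """stages: list of (score_function, threshold, cost); returns (label, cost_spent)."""
    spent = 0.0
    for score, threshold, cost in stages:
        spent += cost                      # pay for evaluating this stage's features
        if score(x) < threshold:           # background-like: reject immediately
            return "background", spent
    return "signal", spent                 # survived every stage

# toy usage: three stages of increasing cost (feature names are made up)
stages = [
    (lambda x: x["pt"],        0.5, 0.1),  # very cheap kinematic cut
    (lambda x: x["ip_chi2"],   2.0, 1.0),  # moderately expensive
    (lambda x: x["vertex_fd"], 5.0, 4.0),  # expensive reconstruction-level quantity
]
print(cascade_predict({"pt": 0.3, "ip_chi2": 0.0, "vertex_fd": 0.0}, stages))  # rejected cheaply
print(cascade_predict({"pt": 1.2, "ip_chi2": 3.1, "vertex_fd": 7.5}, stages))  # accepted at full cost
```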
  18. 18. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER • Collaboration with • Vava Gligorov (CERN) • Mike Williams (MIT) • Djalel Benbouzid (LAL) 18
  19. 19. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER • A beautifully complex problem • varying feature costs • cost may depend on the value • events are bags of overlapping candidates 19
  20. 20. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER Immediate cost Bag-dependent cost Value-dependent cost D0_VTX_FD PiS_IP D0C_1_IP D0C_2_IP D0C_2_PT D0C_1_PT PiS_PT D0C_2_IPC D0C_2_TFC D0C_1_IPC D0C_1_TFC PiS_IPC PiS_TFC DstM D0M D0Tau 20
  21. 21. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Easy background Background-like Signal-like EVALUATE Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC 21
  22. 22. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH QUIT Benbouzid et al. ICML 2012 Easy background Background-like Signal-like 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC 22
  23. 23. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC Easy background Benbouzid et al. ICML 201223
  24. 24. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 24
  25. 25. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC SKIP Background-like Signal-like 25
  26. 26. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPCEVALUATE Background-like Signal-like 26
  27. 27. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 27
  28. 28. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 28
  29. 29. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 29
  30. 30. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC SKIP Background-like Signal-like 30
  31. 31. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 31
  32. 32. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC SKIP Background-like Signal-like 32
  33. 33. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard background Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC SKIP SKIP SKIP SKIP Background-like Signal-like 33
  34. 34. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Easy signal Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVALUATE Background-like Signal-like 34
  35. 35. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Easy signal Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC QUIT Background-like Signal-like 35
  36. 36. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Easy signal Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC 36
  37. 37. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH Hard signal Benbouzid et al. ICML 2012 0ms 1.5ms 4ms D0C_2_IP D0_VTX_FD PiS_IP D0C_1_IP PiS_PT D0C_1_PT D0C_2_PT ∞ D0Tau D0M DstM D0C_1_TFC PiS_TFC2 D0C_1_IPC PiS_IPC2 D0C_2_TFC D0C_2_IPC EVAL EVAL EVAL EVAL EVAL EVAL EVAL EVAL SKIP SKIP SKIP SKIP SKIP 37
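Schematically, the MDDAG classifier walked through on the preceding slides scans the base features in cost order and chooses one of three actions at each node. The sketch below only mimics test-time execution with a hand-written policy; in Benbouzid et al. (ICML 2012) the policy itself is learned with a reinforcement-learning formulation, and the per-feature scores and costs here are toy placeholders.

```python
# Test-time execution of an MDDAG-style policy: at each base feature the policy decides
# to EVALUATE (pay the cost, update the score), SKIP (move on for free), or QUIT (stop).

EVALUATE, SKIP, QUIT = "EVALUATE", "SKIP", "QUIT"

def mddag_classify(event, features, policy):
    """features: list of (name, cost, weak_score_fn); policy(score, spent, name) -> action."""
    score, spent = 0.0, 0.0
    for name, cost, weak_score in features:
        action = policy(score, spent, name)
        if action == QUIT:
            break                          # easy events exit after a few cheap features
        if action == EVALUATE:
            spent += cost                  # bag- or value-dependent in the real trigger
            score += weak_score(event[name])
        # SKIP: pay nothing, move to the next feature
    return ("signal" if score > 0 else "background"), spent

# toy usage with a hand-written policy: quit as soon as the score is confidently negative
features = [("PiS_PT",     0.0, lambda v: 1.0 if v > 1.0 else -1.0),
            ("D0_VTX_FD",  1.5, lambda v: 1.0 if v > 4.0 else -1.0),
            ("D0C_1_TFC",  4.0, lambda v: 1.0 if v < 3.0 else -1.0)]
policy = lambda score, spent, name: QUIT if score < -0.5 else EVALUATE
print(mddag_classify({"PiS_PT": 0.2, "D0_VTX_FD": 1.0, "D0C_1_TFC": 5.0}, features, policy))
```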
  38. 38. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION • Classification with test-time constraints • An active research area due to IT applications • To be exploited for trigger design 38
  39. 39. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Challenge 1: the most precise estimate, in minimal CPU time, of the mass of the Higgs boson candidate as a function of the event observables, despite the unmeasured particles. Current precision (Markov chain integration in dimension 5): ~20% in 0.1 s per event. The HiggsML challenge 39
  40. 40. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY • In a nutshell • A vector x of variables is extracted from each event • A classifier g(x) is trained to separate signal from background • The background b is estimated in the selection region G = {x : g(x) = s} • Discovery is made when the number of real events n is significantly higher than b 40
  41. 41. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY • Exciting physics • The Higgs to tau-tau excess is not yet at five sigma (Tech. Rep. ATLAS-CONF-2013-108) • Exciting data science (statistics and machine learning) • What is the theoretical relationship between classification and test sensitivity? • What is the quantitative criterion to optimize? • How to formally include systematic uncertainties? • Can we redesign classical algorithms (boosting, SVM, neural nets) to optimize this criterion? 41
  42. 42. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY We are organizing a data challenge to answer some of these questions Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete: https://www.kaggle.com/c/higgs-boson 42
  43. 43. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY The formal setup • We simulate data: D = {(x1, y1, w1), . . . , (xn, yn, wn)} • xi ∈ R^d is the feature vector • yi ∈ {background, signal} is the label • wi ∈ R+ is a non-negative weight (importance sampling) • let S = {i : yi = s} and B = {i : yi = b} be the index sets of signal and background events, respectively • let $\hat{G} = \{i : g(x_i) = \mathrm{s}\}$ be the selected index set, $s = \sum_{i \in S \cap \hat{G}} w_i$ and $b = \sum_{i \in B \cap \hat{G}} w_i$ • Maximize the Approximate Median Significance (G. Cowan, K. Cranmer, E. Gross, and O. Vitells, EPJ C 71:1554, 2011): $\mathrm{AMS} = \sqrt{2\big((s+b)\ln(1+s/b) - s\big)} \approx s/\sqrt{b}$ 43
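A direct transcription of these definitions into code; the array names and the numerical check are assumptions for illustration, not the challenge's evaluation script.

```python
# s and b are weighted sums over the selected events (those with g(x) = signal), and
# AMS = sqrt(2((s + b) ln(1 + s/b) - s)) ~ s / sqrt(b) when s << b.
import numpy as np

def ams(y, w, y_pred, signal_label=1):
    """y: true labels, w: event weights, y_pred: labels predicted by g (numpy arrays)."""
    selected = (y_pred == signal_label)               # the selection region {i : g(x_i) = s}
    s = np.sum(w[selected & (y == signal_label)])     # weighted selected signal
    b = np.sum(w[selected & (y != signal_label)])     # weighted selected background
    return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

# quick numerical check of the approximation AMS ~ s / sqrt(b)
s, b = 100.0, 10000.0
print(np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s)), s / np.sqrt(b))
```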
  44. 44. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY Signal Background 44
  45. 45. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY Signal Background 45
  46. 46. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY Signal Background 46
  47. 47. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) 2. define g(x) = sign(f(x) − θ) and optimize θ to maximize the AMS CLASSIFICATION FOR DISCOVERY [figure annotations: θ, 3.5σ] 47
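Stage 2 can be sketched as a simple threshold scan on a held-out set: for each candidate θ, select the events with f(x) > θ, compute the weighted s and b, and keep the θ with the largest AMS. The arrays (score, label, weight per event) and the quantile grid are illustrative assumptions.

```python
# Threshold optimization for the two-stage approach: g(x) = sign(f(x) - theta),
# with theta chosen to maximize the AMS on held-out events.
import numpy as np

def best_threshold(scores, y, w, n_grid=200):
    """scores: f(x) per event, y: 1 for signal / 0 for background, w: event weights."""
    thetas = np.quantile(scores, np.linspace(0.5, 0.999, n_grid))  # scan the upper tail
    best_ams, best_theta = -np.inf, None
    for theta in thetas:
        selected = scores > theta
        s = np.sum(w[selected & (y == 1)])
        b = np.sum(w[selected & (y == 0)])
        if b <= 0.0:
            continue
        ams = np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))
        if ams > best_ams:
            best_ams, best_theta = ams, theta
    return best_ams, best_theta
```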
  48. 48. B. Kégl / AppStat@LAL Learning to discover Comparing with Atlas analysis • Atlas does a manual pre-selection (category), the first maximum of the AMS is completely eliminated. Why? CLASSIFICATION FOR DISCOVERY s = 250 b = 5000 ± 500! Systematics! 48
  49. 49. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to handle systematic (model) uncertainties? • OK, so let's design an objective function that can take background systematics into consideration • Likelihood with unknown background $b \sim \mathcal{N}(\mu_b, \sigma_b)$: $L(\mu_s, \mu_b) = P(n, b \mid \mu_s, \mu_b, \sigma_b) = \frac{(\mu_s+\mu_b)^n}{n!} e^{-(\mu_s+\mu_b)} \frac{1}{\sqrt{2\pi}\,\sigma_b} e^{-(b-\mu_b)^2/2\sigma_b^2}$ • Profile likelihood ratio $\lambda(0) = \frac{L(0, \hat{\hat{\mu}}_b)}{L(\hat{\mu}_s, \hat{\mu}_b)}$ • The new Approximate Median Significance (by Glen Cowan): $\mathrm{AMS} = \sqrt{2\left((s+b)\ln\frac{s+b}{b_0} - s - b + b_0\right) + \frac{(b-b_0)^2}{\sigma_b^2}}$ where $b_0 = \frac{1}{2}\left(b - \sigma_b^2 + \sqrt{(b-\sigma_b^2)^2 + 4(s+b)\sigma_b^2}\right)$ 49
  50. 50. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to handle systematic (model) uncertainties? • The new Approximate Median Significance $\mathrm{AMS} = \sqrt{2\left((s+b)\ln\frac{s+b}{b_0} - s - b + b_0\right) + \frac{(b-b_0)^2}{\sigma_b^2}}$ where $b_0 = \frac{1}{2}\left(b - \sigma_b^2 + \sqrt{(b-\sigma_b^2)^2 + 4(s+b)\sigma_b^2}\right)$ [plot legend: New AMS, ATLAS, Old AMS] 50
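A sketch of this systematics-aware AMS as a function; σ_b is the absolute systematic uncertainty on the background, and the numerical check only verifies that the old AMS is recovered as σ_b goes to zero (the s = 250, b = 5000, σ_b = 500 numbers are taken from the earlier slide).

```python
# The systematics-aware AMS: b0 is the profiled background under the Gaussian constraint.
import numpy as np

def ams_with_systematics(s, b, sigma_b):
    b0 = 0.5 * (b - sigma_b**2 + np.sqrt((b - sigma_b**2)**2 + 4.0 * (s + b) * sigma_b**2))
    return np.sqrt(2.0 * ((s + b) * np.log((s + b) / b0) - s - b + b0)
                   + (b - b0)**2 / sigma_b**2)

# with sigma_b -> 0 this approaches the old AMS = sqrt(2((s+b)ln(1+s/b) - s))
print(ams_with_systematics(250.0, 5000.0, 500.0), ams_with_systematics(250.0, 5000.0, 1e-3))
```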
  51. 51. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete: https://www.kaggle.com/c/higgs-boson A tool for getting the ML community excited about your problem OPEN since yesterday 51
  52. 52. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete : https://www.kaggle.com/c/higgs-boson • Organizing committee • David Rousseau (ATLAS / LAL) • Balázs Kégl (AppStat / LAL) • Cécile Germain (LRI / UPSud) • Glen Cowan (ATLAS / Royal Holloway) • Claire Adam Bourdarios (ATLAS / LAL) • Isabelle Guyon (ChaLearn) 52
  53. 53. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete: https://www.kaggle.com/c/higgs-boson • Official ATLAS GEANT4 simulations • 30 features (variables) • 250K training: input, label, weight • 100K public test (AMS displayed real-time), only input • 450K private test (to determine the winner after the closing of the challenge), only input • public and private test sets are shuffled, participants submit a vector of 550K labels • Using the “old” AMS • cannot compare participants if the metric varies a lot 53
  54. 54. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete : https://www.kaggle.com/c/higgs-boson • 16K$ prize pool • 7-4-2K$ for the three top participants • HEP meets ML award for the most useful model, decided by the ATLAS members of the organizing committee 54
  55. 55. B. Kégl / AppStat@LAL Learning to discover LEADERBOARD AS OF THIS MORNING 55
  56. 56. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION Why don’t we train a neural network on the raw ~10^5-10^8 dimensional signal of ATLAS? 56
  57. 57. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION • Because it is notoriously difficult to automatically build a model = learn particle physics just by looking at the event browser • Again, you are not alone: it is also notoriously difficult to automatically build the model of natural scenes = learn a model of our surroundings by just looking at images • In the last 5-10 years, we are getting close • May be interesting if you do not know what you are looking for 57
  58. 58. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [plot: photoelectron count (PE) vs. time t [ns], 0-800 ns] 58
  59. 59. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? Granularity and hadronic cascades. “Start of” hadronic showers in the SiW Ecal. Complex and impressive. Inelastic reaction in SiW Ecal. Labels: Interaction, Initial Pion, Outgoing Fragments, Scattered Pion, Ejected Nucleon. Simple but nice. Short truncated showers 59
  60. 60. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? 60
  61. 61. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? 61
  62. 62. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [plot: photoelectron count (PE) vs. time t [ns], 0-800 ns] 62
  63. 63. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? Granularity and hadronic cascades. “Start of” hadronic showers in the SiW Ecal. Complex and impressive. Inelastic reaction in SiW Ecal. Labels: Interaction, Initial Pion, Outgoing Fragments, Scattered Pion, Ejected Nucleon. Simple but nice. Short truncated showers 63
  64. 64. B. Kégl / AppStat@LAL Learning to discover MODELS • Inference • if you want to be able to answer questions about observed phenomena, you need a model • if you want quantitative answers, you need a formal model • Formal setup • x: observation vector, Θ: parameter vector to infer • likelihood: p(x | Θ) • simulator: given Θ, generate a random x 64
  65. 65. B. Kégl / AppStat@LAL Learning to discover A FORMAL MODEL [hierarchical model of the observatory, flattened from embedded figures and tables: primary parameters (energy, direction, mass), shower observables (X0, XMax, NMuMax, NMuTotal, LDF, asymmetry, S1000, S38, NMu1000), tank observables (S, t0, risetime, jumps), Cherenkov light; table “PEs given ideal response”: expected number of PEs in bin i (n̄i, unitless), number of PEs in bin i (ni, unitless); table “Ideal muon response”: signal decay time [ns], signal risetime td [ns], muon arrival time tµ [ns], muon tracklength Lµ [m], muon energy factor (unitless), average number of PEs per 1 m tracklength [1/m]; trace likelihoods p(x | t), p(x | t1, ..., t4), and the shower model p(t | Θ)] 65
  66. 66. B. Kégl / AppStat@LAL Learning to discover INFERENCE BY SAMPLING 66
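In the likelihood-free situation described here (we can simulate x given Θ but cannot evaluate p(x | Θ) in closed form), inference by sampling can be as simple as approximate Bayesian computation by rejection. The sketch below uses a toy exponential-decay model in place of the actual tank-signal model of the talk; the prior, the summary statistic, and the tolerance are all illustrative assumptions.

```python
# Toy ABC rejection sampler: draw theta from the prior, simulate data, keep the draws
# whose simulated summary statistic lands close to the observed one.
import numpy as np

rng = np.random.RandomState(0)

def simulator(theta, n=200):
    """Given the parameter theta (a decay time), generate fake arrival times."""
    return rng.exponential(scale=theta, size=n)

x_obs = simulator(20.0)                       # pretend these are the observed data

def abc_posterior_samples(x_obs, n_draws=20000, eps=1.0):
    summary = lambda x: np.mean(x)            # a (crude) summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(1.0, 50.0)        # prior on the decay time [ns]
        if abs(summary(simulator(theta)) - summary(x_obs)) < eps:
            accepted.append(theta)            # keep thetas whose simulations look like the data
    return np.array(accepted)

samples = abc_posterior_samples(x_obs)
print("posterior mean:", samples.mean(), "from", len(samples), "accepted draws")
```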
  67. 67. B. Kégl / AppStat@LAL Learning to discover HOW TO BUILD MODELS FOR THESE? [plot: photoelectron count (PE) vs. time t [ns], 0-800 ns] Granularity and hadronic cascades. “Start of” hadronic showers in the SiW Ecal. Complex and impressive. Inelastic reaction in SiW Ecal. Labels: Interaction, Initial Pion, Outgoing Fragments, Scattered Pion, Ejected Nucleon. Simple but nice. Short truncated showers. High granularity permits detailed view into hadronic shower π 67
  68. 68. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION • Training multi-layer neural networks • biological inspiration: we know the brain is multi-layer • appealing from a modeling point of view: abstraction increases with depth • notoriously difficult to train until Hinton et al. (stacked RBMs) and Bengio et al. (stacked autoencoders), around 2006 • the key principle is (was?) unsupervised pre-training • they remain computationally very expensive, but they learn high-level (abstract) features and they scale: with more data they learn more 68
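The building block of the stacked-autoencoder pre-training mentioned above is a single autoencoder layer trained to reconstruct its input through a low-dimensional code. A minimal plain-numpy sketch on toy data (the data, the layer sizes, and the learning rate are illustrative assumptions):

```python
# One autoencoder layer: encode x into a k-dimensional code h, decode back to x, and
# train by gradient descent on the squared reconstruction error.
import numpy as np

rng = np.random.RandomState(0)
n, d, k = 2000, 20, 5                                     # samples, input dim, code dim
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))     # toy data living on a 5-dim subspace

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = 0.1 * rng.normal(size=(d, k)), np.zeros(k)       # encoder parameters
W2, b2 = 0.1 * rng.normal(size=(k, d)), np.zeros(d)       # decoder parameters
lr = 0.01

for epoch in range(200):
    H = sigmoid(X @ W1 + b1)                              # encoder: compressed representation
    X_hat = H @ W2 + b2                                   # decoder: reconstruction
    err = X_hat - X
    # backpropagation of the mean squared reconstruction error
    dW2, db2 = H.T @ err / n, err.mean(axis=0)
    dH = (err @ W2.T) * H * (1.0 - H)
    dW1, db1 = X.T @ dH / n, dH.mean(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

print("reconstruction MSE:", np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - X) ** 2))
```

Stacking means training a second autoencoder on the codes H produced by the first, and so on, before an optional supervised fine-tuning pass.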
  69. 69. B. Kégl / AppStat@LAL Learning to discover • Google passes the “purring test” (ICML’12) • 16K cores watching 10M youtube stills for 3 days • completely unsupervised: the cat has just appeared as a useful concept to represent LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION 69
  70. 70. B. Kégl / AppStat@LAL Learning to discover • Can we also learn physics by observing natural phenomena? LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION Granularity and hadronic cascades. “Start of” hadronic showers in the SiW Ecal. Complex and impressive. Inelastic reaction in SiW Ecal. Labels: Interaction, Initial Pion, Outgoing Fragments, Scattered Pion, Ejected Nucleon. Simple but nice. Short truncated showers 70
  71. 71. B. Kégl / AppStat@LAL Learning to discover DEEP LEARNING FOR IMAGING CALORIMETERS • Collaboration with • Roman Poeschl (ILC / LAL) • Naomi van der Kolk (ILC / LAL) • Sviatoslav Bilokin (ILC / LAL) • Mehdi Cherti (AppStat / LAL) • Trong Hieu Tran (ILC / LLR) • Vincent Boudry (ILC / LLR) 71
  72. 72. B. Kégl / AppStat@LAL Learning to discover FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE Data Science: design of automated methods to analyze massive and complex data in order to extract useful information from them 72
  73. 73. B. Kégl / AppStat@LAL Learning to discover FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE Data science: statistics, machine learning, signal processing, data visualization, databases. Tool building: software engineering, clouds/grids, high-performance computing, optimization. Domain science: life, brain, earth, universe, human society 73
  74. 74. B. Kégl / AppStat@LAL Learning to discover You have been doing data science for a long time Consider transferring knowledge to other sciences on how to organize large scientific projects around data FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE 74
  75. 75. B. Kégl / AppStat@LAL Learning to discover • What is a data challenge: • outreach to public? • peer-to-peer communication? FOOD FOR THOUGHT II: CERN ACCELERATING SCIENCE 75
  76. 76. B. Kégl / AppStat@LAL Learning to discover • What is a data challenge: • between the two: communicating (translating!) your technical problems to another scientific community which might have solutions (and know-how to design solutions) for you • think about building formal channels (as you did for outreach and “classical” publications) FOOD FOR THOUGHT II: CERN ACCELERATING SCIENCE 76
  77. 77. B. Kégl / AppStat@LAL Learning to discover THANK YOU! 77
  78. 78. B. Kégl / AppStat@LAL Learning to discover BACKUP 78
  79. 79. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant function f : R^d → R for balanced AUC or balanced classification error: N'_s = N'_b = 0.5 [plot: AdaBoost learning curves, balanced error vs. number of boosting iterations T (Tree, N = 2, T up to 100000)] 79
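The balanced classification error used here (each class re-normalized to total weight 0.5, so that N'_s = N'_b) can be written as a short helper; the array names and label convention are illustrative assumptions.

```python
# Balanced (class-reweighted) error: each class contributes 0.5 to the total weight.
import numpy as np

def balanced_error(y, y_pred, w):
    """y: true labels (1 = signal, 0 = background), y_pred: predictions, w: event weights."""
    err = 0.0
    for label in (0, 1):
        mask = (y == label)
        err += 0.5 * np.sum(w[mask] * (y_pred[mask] != label)) / np.sum(w[mask])
    return err
```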
  80. 80. B. Kégl / AppStat@LAL Learning to discover Comparing with the official Atlas analysis • Atlas does a manual pre-selection, the first maximum of the AMS is completely eliminated. Why? Have we found something, or they have an implicit reason? • No we haven’t, yes they have • µb has a ~10% relative systematic uncertainty: the 300 signals are completely submerged by the σb = 600 background systematics [plot: unnormalized ROC, selected background b (FP) vs. selected signal s (TP), with AMS = 1σ, 2σ, 3σ, 4σ, 5σ contours] 80
  81. 81. B. Kégl / AppStat@LAL Learning to discover The Higgs boson ML challenge • Dilemma: the physically relevant AMS is optimized in a tiny region • the AMS has a high variance: a bad measure to compare the participants • in the Atlas analysis, there was a 1σ difference between the expected and measured significances 81
  82. 82. B. Kégl / AppStat@LAL Learning to discover The Higgs boson ML challenge • We are even nervous about the original AMS (red), so we regularize it: $\mathrm{AMS} = \sqrt{2\left((s + b + b_{\mathrm{reg}})\ln\left(1 + \frac{s}{b + b_{\mathrm{reg}}}\right) - s\right)}$ with $b_{\mathrm{reg}} = 10$ 82
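The regularized AMS is a one-line change to the original formula; b_reg = 10 is the value quoted on the slide, while the example numbers are arbitrary.

```python
# Regularized AMS: b_reg damps the variance of the measure when the selected background is small.
import numpy as np

def ams_regularized(s, b, b_reg=10.0):
    return np.sqrt(2.0 * ((s + b + b_reg) * np.log(1.0 + s / (b + b_reg)) - s))

print(ams_regularized(s=60.0, b=40.0))   # illustrative numbers, not from the challenge data
```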
