
Big Data Competition: maximizing your potential, exampled with the 2014 Higgs Boson Machine Learning Challenge

The Higgs Boson Machine Learning Challenge is by far one of the biggest big data competitions in the world focused on data analysis. To succeed in such a competition, Cheng applied his knowledge of Computer Science, Mathematics, Statistics, and Physics, along with the problem-solving habits he developed during his training in Civil Engineering.

In this presentation, Cheng uses his experience in this competition to illustrate some important elements of big data analytics and why they matter. The content covers several disciplines, such as physics, statistics, and mathematics, but no background knowledge in these areas is required to understand the essence of the presentation.

In brief, the presentation covers the following content:
An effective framework for general data mining projects,
An introduction to the competition and its related physics background,
Various data exploration techniques and some traps to avoid,
Various ways of feature enhancement,
Model building and selection, and
Optimization of model performance

Published in: Data & Analytics

  1. BIG DATA COMPETITION: MAXIMIZING YOUR POTENTIAL, EXAMPLED WITH THE 2014 HIGGS BOSON MACHINE LEARNING CHALLENGE • Dr. Cheng CHEN • email: cchen@goDCI.com • twitter: @cheng_chen_us • Development Consulting International LLC, goDCI.com • This presentation is copyright protected ©
  2. PRESENTER • Ohio State University, Tongji University • Ph.D. Civil Engineering • M.S. Applied Statistics • Minor Computer Science • Advanced training: City and Regional Planning; Industrial and Systems Engineering; Mathematics • Passion: (this) machine learning
  3. HIGGS BOSON MACHINE LEARNING CHALLENGE • Goal: improve the procedure that produces the selection region of the Higgs Boson • 4-month duration • 1,785 teams • Many machine learning experts, statisticians, and physicists • The top 5 teams are from 5 different countries [Map: Hungary, Netherlands, France, Russia, U.S.A./China] http://www.kaggle.com/c/higgs-boson/leaderboard
  4. [Diagram: project framework spanning Background, Data, and Model, with the steps Understand, Explore, Enhance, Train, Select, Optimize, and Validate, and the activities read, discuss, visualize, reduce, generate, find, innovate, apply, fine-tune, and cross validate] ©
  5. [Framework diagram, repeated from slide 4] ©
  6. READ AND DISCUSS
  7. HIGGS BOSON • a.k.a. the God Particle (explains some mass) • A fundamental particle theorized in 1964 in the Standard Model of Particle Physics • “Considered” discovered in 2011-2013 at the LHC by CERN • A number of prestigious awards in 2013, including a Nobel Prize • "A 'definitive' answer might require 'another few years' after the collider's 2015 restart." (deputy chair of physics at Brookhaven National Laboratory) http://en.wikipedia.org/wiki/Higgs_boson http://upload.wikimedia.org/wikipedia/commons/0/00/Standard_Model_of_Elementary_Particles.svg
  8. CERN: THE EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH • Established in 1954 • Birthplace of the World Wide Web (1989) maps.google.com
  9. LARGE HADRON COLLIDER (LHC) • 27 km (17 mi) in circumference • Up to 175 meters (574 ft) beneath ground • Built from 1998 to 2008 • Over 10,000 scientists and engineers • Over 100 countries • Seven particle detectors https://www.llnl.gov/news/llnl-set-host-international-lattice-physics-conference http://en.wikipedia.org/wiki/Large_Hadron_Collider
  10-12. ATLAS • 46 meters long • 25 meters in diameter • Weighs about 7,000 tonnes • Contains some 3,000 km of cable • Involves roughly 3,000 physicists from over 175 institutions in 38 countries http://en.wikipedia.org/wiki/Large_Hadron_Collider http://higgsml.lal.in2p3.fr/documentation/
  13. CHALLENGES IN DETECTION OF HIGGS BOSON • The Higgs Boson cannot be measured directly (it decays immediately into lighter particles) • Other particles can decay into the same set of lighter particles • The PRODUCTION and DECAY of the Higgs Boson depend on its mass, which was not predicted by theory (we now know it is close to 125 GeV) • Seeing a circular shadow does not mean the real object is a sphere https://www2.physics.ox.ac.uk/sites/default/files/2012-03-27/sinead_farrington_pdf_17376.pdf
  14. CURRENT DETECTION MECHANISM • Raw data collected from the LHC • Hundreds of millions of proton-proton collisions (events) per second • 400 events of interest are selected per second – Signal events (i.e., Higgs Boson) – Background events (i.e., other particles) • Events in an ad hoc selection region (in certain channels) exceeding background noise • The selection criteria need improvement in significance and robustness
  15. SIMPLIFICATIONS FOR COMPETITION • Simulated data • Fixed mass (125 GeV) • Simplified decay channel (next slide) • Simplified background events (three representative types only) – Decay of the Z boson (91.2 GeV) into Tau-Tau – Decay of a pair of top quarks into a lepton and a hadronic tau – “Decay” of the W boson into a lepton and a hadronic tau, due to imperfections in the particle identification procedure • Simplified objective function (significance score)
  16-17. SIMPLIFIED DECAY CHANNEL • Decay of the Tau-Tau channel only • One tau decays into a lepton and two neutrinos • The other tau decays into a hadronic tau and a neutrino • (Note: neutrinos cannot be detected) Hadronic tau: a bunch of hadrons
  18. SIMPLIFIED DECAY CHANNEL • Decay of the Tau-Tau channel only • One tau decays into a lepton and two neutrinos • The other tau decays into a hadronic tau and a neutrino • (Note: neutrinos cannot be detected) [Diagram labels: Jets, MET] Vectorized momenta are given. Hadronic tau: a bunch of hadrons
  19. [Framework diagram, repeated from slide 4] ©
  20. DATA DIMENSION • 250,000 training • 550,000 testing • 30 variables – 17 primitive (momenta, direction) – 13 derived
     First 4 rows of the training data (columns shown in four groups; NA = missing):
     EventId | DER_mass_MMC | DER_mass_transverse_met_lep | DER_mass_vis | DER_pt_h | DER_deltaeta_jet_jet | DER_mass_jet_jet | DER_prodeta_jet_jet | DER_deltar_tau_lep | DER_pt_tot | DER_sum_pt
     100000 | 138.47 | 51.655 | 97.827 | 27.98 | 0.91 | 124.711 | 2.666 | 3.064 | 41.928 | 197.76
     100001 | 160.937 | 68.768 | 103.235 | 48.146 | NA | NA | NA | 3.473 | 2.078 | 125.157
     100002 | NA | 162.172 | 125.953 | 35.635 | NA | NA | NA | 3.148 | 9.336 | 197.814
     100003 | 143.905 | 81.417 | 80.943 | 0.414 | NA | NA | NA | 3.31 | 0.414 | 75.968
     EventId | DER_pt_ratio_lep_tau | DER_met_phi_centrality | DER_lep_eta_centrality | PRI_tau_pt | PRI_tau_eta | PRI_tau_phi | PRI_lep_pt | PRI_lep_eta | PRI_lep_phi | PRI_met
     100000 | 1.582 | 1.396 | 0.2 | 32.638 | 1.017 | 0.381 | 51.626 | 2.273 | -2.414 | 16.824
     100001 | 0.879 | 1.414 | NA | 42.014 | 2.039 | -3.011 | 36.918 | 0.501 | 0.103 | 44.704
     100002 | 3.776 | 1.414 | NA | 32.154 | -0.705 | -2.093 | 121.409 | -0.953 | 1.052 | 54.283
     100003 | 2.354 | -1.285 | NA | 22.647 | -1.655 | 0.01 | 53.321 | -0.522 | -3.1 | 31.082
     EventId | PRI_met_phi | PRI_met_sumet | PRI_jet_num | PRI_jet_leading_pt | PRI_jet_leading_eta | PRI_jet_leading_phi | PRI_jet_subleading_pt | PRI_jet_subleading_eta | PRI_jet_subleading_phi | PRI_jet_all_pt
     100000 | -0.277 | 258.733 | 2 | 67.435 | 2.15 | 0.444 | 46.062 | 1.24 | -2.475 | 113.497
     100001 | -1.916 | 164.546 | 1 | 46.226 | 0.725 | 1.158 | NA | NA | NA | 46.226
     100002 | -2.186 | 260.414 | 1 | 44.251 | 2.053 | -2.028 | NA | NA | NA | 44.251
     100003 | 0.06 | 86.062 | 0 | NA | NA | NA | NA | NA | NA | 0
     EventId | Weight | Label
     100000 | 0.00265331133733 | s
     100001 | 2.23358448717 | b
     100002 | 2.34738894364 | b
     100003 | 5.44637821192 | b
     Data loaded correctly. Notice the NA values.
  21-22. MISSING VALUES (training data; columns not listed, including EventId, Weight, and Label, have no missing values)
     col_name | NA_count | NA_pct
     DER_mass_MMC | 38,114 | 15%
     DER_deltaeta_jet_jet | 177,457 | 71%
     DER_mass_jet_jet | 177,457 | 71%
     DER_prodeta_jet_jet | 177,457 | 71%
     DER_lep_eta_centrality | 177,457 | 71%
     PRI_jet_leading_pt | 99,913 | 40%
     PRI_jet_leading_eta | 99,913 | 40%
     PRI_jet_leading_phi | 99,913 | 40%
     PRI_jet_subleading_pt | 177,457 | 71%
     PRI_jet_subleading_eta | 177,457 | 71%
     PRI_jet_subleading_phi | 177,457 | 71%
     Notice the consistency in missing values.
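The counts above come from tallying missing entries per column. Below is a minimal Python sketch of that tally. It assumes rows have already been parsed into dicts with `None` for NA. The raw challenge CSV marks undefined values with -999.0, so a loader would map that sentinel to `None`. The helper names are hypothetical. The sample uses four columns of the four training rows shown on slide 20. The last line checks the "consistency" remark on these rows: the dijet mass is missing exactly when fewer than two jets are present. This matches the identical 177,457 counts for every dijet and subleading-jet column.

```python
from typing import Optional

MISSING_SENTINEL = -999.0  # assumption: the value the raw CSV uses for undefined fields


def parse_value(text: str) -> Optional[float]:
    """Convert one CSV field to a float, mapping the sentinel to None (NA)."""
    value = float(text)
    return None if value == MISSING_SENTINEL else value


def missing_summary(rows, columns):
    """Return {column: (na_count, na_pct)} for each column with at least one NA."""
    n = len(rows)
    summary = {}
    for col in columns:
        na_count = sum(1 for row in rows if row[col] is None)
        if na_count:
            summary[col] = (na_count, round(100.0 * na_count / n))
    return summary


# Four training rows from slide 20, restricted to four columns.
sample = [
    {"DER_mass_MMC": 138.47, "DER_mass_jet_jet": 124.711, "PRI_jet_num": 2, "PRI_jet_leading_pt": 67.435},
    {"DER_mass_MMC": 160.937, "DER_mass_jet_jet": None, "PRI_jet_num": 1, "PRI_jet_leading_pt": 46.226},
    {"DER_mass_MMC": None, "DER_mass_jet_jet": None, "PRI_jet_num": 1, "PRI_jet_leading_pt": 44.251},
    {"DER_mass_MMC": 143.905, "DER_mass_jet_jet": None, "PRI_jet_num": 0, "PRI_jet_leading_pt": None},
]
columns = ["DER_mass_MMC", "DER_mass_jet_jet", "PRI_jet_num", "PRI_jet_leading_pt"]
print(missing_summary(sample, columns))
# {'DER_mass_MMC': (1, 25), 'DER_mass_jet_jet': (3, 75), 'PRI_jet_leading_pt': (1, 25)}

# Consistency check: the dijet mass is missing exactly when PRI_jet_num < 2.
print(all((row["DER_mass_jet_jet"] is None) == (row["PRI_jet_num"] < 2) for row in sample))  # True
```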
  23-24. HOW TO HANDLE MISSING VALUES • Assign a value – Generate a random value – Fit a value (mean, median, nearest neighbor, etc.) – Fix a value (domain knowledge) • Remove the record • Leave as is
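The options on this slide map onto a few small operations. This is a sketch in plain Python, assuming NA is stored as `None`. The function names are hypothetical, and the fill value `0.0` below is a placeholder, not a physics-motivated choice. "Leave as is" needs no code when the learner can route missing values itself, as many tree-based implementations can. The optional 0/1 indicator lets a model still see that a value was missing after it has been filled.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """'Fit a value': replace NA with the median of the observed values."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_fixed(values: List[Optional[float]], fill: float) -> List[float]:
    """'Fix a value': replace NA with a constant chosen from domain knowledge."""
    return [fill if v is None else v for v in values]


def drop_missing(rows, column):
    """'Remove the record': keep only rows where the column is observed."""
    return [row for row in rows if row[column] is not None]


def missing_indicator(values: List[Optional[float]]) -> List[int]:
    """Optional companion to imputation: 1 where the value was missing, else 0."""
    return [1 if v is None else 0 for v in values]


mass_mmc = [138.47, 160.937, None, 143.905]  # DER_mass_MMC for the four rows on slide 20
print(impute_median(mass_mmc))      # [138.47, 160.937, 143.905, 143.905]
print(impute_fixed(mass_mmc, 0.0))  # [138.47, 160.937, 0.0, 143.905]
print(missing_indicator(mass_mmc))  # [0, 0, 1, 0]
```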
  25. HISTOGRAM: PRI_jet_leading_pt [Panels: Count; Count after log transformation; Count after inverse transformation] Density is more meaningful across the range of x. No fuzzy jump at the edge.
  26. HISTOGRAM (CONT’D): DER_pt_h [Panels: Count; Count after log transformation; Count after inverse transformation] Bi-modality is revealed.
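The point of these panels is that a long right tail crowds most counts into the first few equal-width bins, and a log or inverse transformation spreads them out. A minimal sketch, using hypothetical right-skewed momenta rather than values from the dataset, and plain equal-width binning between the minimum and maximum:

```python
import math


def histogram(values, num_bins):
    """Equal-width bin counts between min(values) and max(values); the maximum goes in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return counts


def log_transform(values):
    """Natural log; every value must be strictly positive."""
    return [math.log(v) for v in values]


def inverse_transform(values):
    """Reciprocal 1/x; every value must be nonzero."""
    return [1.0 / v for v in values]


# Hypothetical right-skewed momenta (not taken from the dataset).
pt = [30, 32, 35, 40, 45, 55, 70, 100, 150, 300]
print(histogram(pt, 3))                     # [8, 1, 1]
print(histogram(log_transform(pt), 3))      # [6, 2, 2]
print(histogram(inverse_transform(pt), 3))  # [3, 3, 4]
```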
  27-28. INTERACTIVE VISUALIZATION: R SHINY. DEMO: http://chencheng.shinyapps.io/demo_higgs
  29. INTERACTIVE VISUALIZATION: R SHINY • Use a reasonable number of bins to display the underlying distribution. DEMO: http://chencheng.shinyapps.io/demo_higgs
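What counts as "a reasonable number of bins" can be given a starting point by two common textbook heuristics: Sturges' rule and the Freedman-Diaconis rule. They are shown here as assumptions. The deck does not say the Shiny demo uses either one.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR * n**(-1/3); returns the resulting bin count."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        return 1
    return math.ceil((max(values) - min(values)) / width)


print(sturges_bins(250_000))                       # 19 (size of the training set)
print(freedman_diaconis_bins(list(range(1, 1001))))  # 10
```

Sturges' rule tends to give too few bins for large, skewed samples. Freedman-Diaconis adapts to the spread of the middle half of the data, which is why an interactive slider remains useful.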
  30. INTERACTIVE VISUALIZATION: R SHINY • Use a reasonable transformation to display the underlying distribution. DEMO: http://chencheng.shinyapps.io/demo_higgs
  31. HISTOGRAM (CONT’D): PRI_tau_eta [Count] Transformations are sometimes not necessary.
  32. Do that for all 30 variables.
  33. PAIRWISE CORRELATIONS: PRI_lep_phi & PRI_met_phi [Count plots, BKG vs. SGN]
  34. PAIRWISE CORRELATIONS: PRI_lep_phi & PRI_met_phi [Count plot, BKG vs. SGN] Set the transparency parameter appropriately to reveal important patterns.
  35. PAIRWISE CORRELATIONS: PRI_lep_phi & PRI_met_phi [Count plots, BKG vs. SGN] A correlation coefficient of 0 does not mean the variables are unrelated.
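A small numeric check of the point on slide 35: Pearson's coefficient only measures linear association. In the synthetic example below, x is symmetric around 0 and y = x² is fully determined by x, yet r comes out at essentially zero. The data are made up for illustration and are not taken from PRI_lep_phi or PRI_met_phi.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic data: x symmetric around 0 and y an exact function of x.
xs = [k / 100 for k in range(-100, 101)]
ys = [x * x for x in xs]
print(abs(pearson(xs, ys)) < 1e-9)  # True: no linear correlation, yet y = x**2 exactly
```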
  36. PAIRWISE CORRELATIONS: PRI_lep_phi & PRI_met_phi [Count plots, BKG vs. SGN]
  37-38. FEATURE ENHANCEMENT: ROTATION. Rotated PRI_lep_phi & PRI_met_phi [BKG vs. SGN] Validate visual “evidence” from various perspectives.
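The rotation re-expresses the (PRI_lep_phi, PRI_met_phi) scatter in axes turned by 45 degrees, so that structure running along the diagonal lines up with one new axis. The deck gives no formula. This minimal sketch assumes a plain 45-degree rotation. It is a change of axes, so distances are preserved and no information is lost.

```python
import math


def rotate_45(x, y):
    """Express (x, y) in axes turned by 45 degrees: u runs along the diagonal y = x, v across it."""
    c = math.sqrt(0.5)  # cos(45 deg) == sin(45 deg)
    return c * (x + y), c * (y - x)


# A point on the diagonal has no component across it.
u, v = rotate_45(1.0, 1.0)
print(round(u, 6), v)  # 1.414214 0.0
```

For raw azimuthal angles, one would usually also wrap the difference y - x into [-pi, pi) first, because phi is periodic.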
  39. PAIRWISE VARIABLES: LOW RES. DER_pt_h & DER_deltar_tau_lep [Count plots, BKG vs. SGN]
  40. PAIRWISE VARIABLES: HIGH RES. DER_pt_h & DER_deltar_tau_lep [Count plots, BKG vs. SGN] Try high resolution.
  41. PAIRWISE VARIABLES: HIGH RES. DER_pt_h & DER_deltar_tau_lep [Count plots, BKG vs. SGN] Curve fitting.
  42. FEATURE ENHANCEMENT: CURVE FITTING. DER_pt_h & DER_deltar_tau_lep [Count plots, BKG vs. SGN] Enhance a variable based on its correlation with another variable.
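The deck does not say which curve was fitted between DER_pt_h and DER_deltar_tau_lep, or exactly how the fit was used. One plausible reading is sketched here as an assumption:

- fit a one-parameter family y ≈ a + b·g(x) by least squares, with a user-chosen g (an inverse curve g(x) = 1/x in the example);
- use the residual y - fit(x) as the "enhanced" variable, i.e. y with its trend in x removed.

The helper names and the synthetic data are hypothetical.

```python
def fit_curve(xs, ys, g):
    """Least-squares fit of y ≈ a + b * g(x); returns (a, b)."""
    zs = [g(x) for x in xs]
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    b = szy / szz
    return my - b * mz, b


def enhance(xs, ys, g):
    """Enhanced feature: y minus the fitted curve (the part of y not explained by x)."""
    a, b = fit_curve(xs, ys, g)
    return [y - (a + b * g(x)) for x, y in zip(xs, ys)]


def inverse(x):
    return 1.0 / x


xs = [1.0, 2.0, 4.0, 5.0, 10.0]
ys = [2.0 + 3.0 / x for x in xs]  # synthetic data lying exactly on y = 2 + 3/x
a, b = fit_curve(xs, ys, inverse)
print(round(a, 6), round(b, 6))  # 2.0 3.0
```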
  43. FEATURE ENHANCEMENT: ROTATION BY PRI_TAU_PHI. DER_pt_h & PRI_lep_phi [Count plots, BKG vs. SGN] Domain knowledge.
  44. FEATURE ENHANCEMENT: ROTATION BY PRI_TAU_PHI. DER_pt_h & PRI_lep_phi [Count plots, BKG vs. SGN] Feature enhancement by applying domain knowledge.
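The domain knowledge here is that collisions are symmetric under rotation around the beam axis. The absolute azimuth of an event therefore carries no information, and only angles relative to another object do. A minimal sketch of "rotation by PRI_tau_phi" under that reading: subtract the tau azimuth from the other phi columns and wrap the result into [-pi, pi). The helper name is hypothetical. The example uses event 100000 from slide 20.

```python
import math


def wrap_angle(phi):
    """Map an angle in radians onto [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def rotate_by_tau_phi(event, phi_columns):
    """Hypothetical helper: re-express azimuthal angles relative to PRI_tau_phi."""
    ref = event["PRI_tau_phi"]
    return {col: wrap_angle(event[col] - ref) for col in phi_columns}


# Event 100000 from slide 20.
event = {"PRI_tau_phi": 0.381, "PRI_lep_phi": -2.414, "PRI_met_phi": -0.277}
rotated = rotate_by_tau_phi(event, ["PRI_lep_phi", "PRI_met_phi"])
print({k: round(v, 3) for k, v in rotated.items()})  # {'PRI_lep_phi': -2.795, 'PRI_met_phi': -0.658}
```

After this rotation, PRI_tau_phi itself becomes 0 for every event and can be dropped. This is one way the rotation also reduces the number of variables.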
  45. FEATURE ENHANCEMENT: ROTATION. PRI_jet_leading_eta & PRI_jet_subleading_eta [Count plots, BKG vs. SGN]
  46. DATA DRILL DOWN • Select variable(s): one variable for a histogram, two variables for a scatter plot. DEMO: http://chencheng.shinyapps.io/demo_higgs
  47. DATA DRILL DOWN • Dynamically select a subset of data: PRI_jet_num = 2. DEMO: http://chencheng.shinyapps.io/demo_higgs
  48. DATA DRILL DOWN • Patterns in the subset: PRI_jet_leading_eta & PRI_jet_subleading_eta. DEMO: http://chencheng.shinyapps.io/demo_higgs
  49. DATA DRILL DOWN • Dynamically select a subset of data: PRI_jet_num = 3. DEMO: http://chencheng.shinyapps.io/demo_higgs
  50. DATA DRILL DOWN • Patterns in the subset: PRI_jet_leading_eta & PRI_jet_subleading_eta. DEMO: http://chencheng.shinyapps.io/demo_higgs
  51. DATA DRILL DOWN • Patterns in the subset: PRI_jet_leading_eta & PRI_jet_subleading_eta, PRI_jet_num = 2 vs. PRI_jet_num = 3. Interactive data visualization techniques are helpful. DEMO: http://chencheng.shinyapps.io/demo_higgs
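Outside the Shiny app, the same drill-down amounts to grouping events by PRI_jet_num and plotting each subset separately. In the released data, PRI_jet_num runs from 0 to 3, with 3 meaning three or more jets. A minimal sketch with hypothetical helper names, using the jet columns of the four events on slide 20:

```python
from collections import defaultdict


def split_by_jet_num(rows):
    """Group events by PRI_jet_num so each subset can be explored on its own."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["PRI_jet_num"]].append(row)
    return dict(groups)


def scatter_points(rows, x_col, y_col):
    """(x, y) pairs for a scatter plot, skipping rows where either value is NA (None)."""
    return [(r[x_col], r[y_col]) for r in rows if r[x_col] is not None and r[y_col] is not None]


# Jet columns of the four events from slide 20.
sample = [
    {"PRI_jet_num": 2, "PRI_jet_leading_eta": 2.15, "PRI_jet_subleading_eta": 1.24},
    {"PRI_jet_num": 1, "PRI_jet_leading_eta": 0.725, "PRI_jet_subleading_eta": None},
    {"PRI_jet_num": 1, "PRI_jet_leading_eta": 2.053, "PRI_jet_subleading_eta": None},
    {"PRI_jet_num": 0, "PRI_jet_leading_eta": None, "PRI_jet_subleading_eta": None},
]
subsets = split_by_jet_num(sample)
print(sorted((k, len(v)) for k, v in subsets.items()))  # [(0, 1), (1, 2), (2, 1)]
print(scatter_points(subsets[2], "PRI_jet_leading_eta", "PRI_jet_subleading_eta"))  # [(2.15, 1.24)]
```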
  52. Do that for all 30 × 29 = 870 (roughly 900) pairs.
  53. PARTICLE LOCATION: (0, S) [Animation] Convert numerical data back into actual objects with meaning.
  54. PARTICLE LOCATION: (0, B) [Animation]
  55. INSPIRATION FROM ANIMATION • Distance ratio between MET-Lep and Tau-Lep: d(MET, Lep)/d(Tau, Lep) [Histogram of dist_ratio_met_lep_tau counts, BKG vs. SGN] Inspiration from meaningful visualization can be helpful.
  56. INSPIRATION FROM ANIMATION • Distance ratio between MET-Lep and Tau-Lep: d(MET, Lep)/d(Tau, Lep) [Histograms of dist_ratio_met_lep_tau counts, BKG vs. SGN] Adjust the visualization for better efficiency.
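The deck does not define the distance d used in d(MET, Lep)/d(Tau, Lep). MET only has a transverse direction (there is no PRI_met_eta), so this sketch assumes d is the azimuthal separation |Δφ|, wrapped into [0, pi]. The small `eps` guard against a zero denominator is also an assumption. The example uses event 100000 from slide 20.

```python
import math


def delta_phi(a, b):
    """Absolute azimuthal separation between two angles, in [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def dist_ratio_met_lep_tau(met_phi, lep_phi, tau_phi, eps=1e-9):
    """d(MET, Lep) / d(Tau, Lep), with d taken as the azimuthal separation (an assumption)."""
    return delta_phi(met_phi, lep_phi) / max(delta_phi(tau_phi, lep_phi), eps)


# Event 100000 from slide 20.
print(round(dist_ratio_met_lep_tau(met_phi=-0.277, lep_phi=-2.414, tau_phi=0.381), 3))  # 0.765
```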
  57. FEATURE ENHANCEMENT • Variable reduction – Simple rotation – Transformation – Domain knowledge – … • Feature generation – Domain knowledge – Inspiration from various visualizations – Statistical approaches – … Examples: 45-degree rotation, curve fitting, rotation by phi, distance_ratio, principal component analysis
  58. [Framework diagram, repeated from slide 4] ©
  59-60. MODELS • Gradient boosting tree • Neural network • Bayesian network • Support vector machine • Generalized additive model
  61-62. GRADIENT BOOSTING TREE • Decision tree – Build many shallow trees • Boosting – Build trees based on residuals • Bagging – Each tree uses a subset of the data • Ensembling – Combine the trees
  63. DECISION TREE • Regression tree [Scatter plot of y (−1.0 to 1.0) against x (0.0 to 10.0)]
  64. DECISION TREE • Regression tree with node depth = 1 [Tree diagram and fitted step function over the y vs. x scatter plot]
       root: mean 0.19, n=100
       x < 6.614: mean −0.08, n=64
       x >= 6.614: mean 0.66, n=36
  65. DECISION TREE • Regression tree with node depth = 2
       x < 6.614 (−0.08, n=64) splits into x < 3.049: 0.67, n=24 and x >= 3.049: −0.53, n=40
       x >= 6.614 (0.66, n=36) splits into x < 8.953: 0.8, n=29 and x >= 8.953: 0.086, n=7
  66. DECISION TREE • Regression tree with node depth = 3, adding:
       3.049 <= x < 6.614 (−0.53, n=40) splits into x < 5.862: −0.67, n=32 and x >= 5.862: 0.045, n=8
       6.614 <= x < 8.953 (0.8, n=29) splits into x < 7.207: 0.57, n=7 and x >= 7.207: 0.87, n=22
  67. DECISION TREE • Regression tree with node depth = 4, adding:
       3.049 <= x < 5.862 (−0.67, n=32) splits into x < 3.594: −0.23, n=7 and x >= 3.594: −0.8, n=25
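Trees like those on slides 63-67 are grown greedily. Each node is split at the x threshold that minimizes the children's summed squared error, and each leaf predicts its node mean. The minimal single-feature sketch below implements that generic CART-style split search. Real implementations add stopping rules, pruning, and multi-feature search that are not shown here, and the names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth):
    """Grow a regression tree on one feature down to `depth` (0 = a single leaf).

    Returns a leaf mean (float) or a (threshold, left_subtree, right_subtree) tuple.
    """
    if depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)  # leaf: predict the node mean
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold separates equal x values
        cost = sse([y for _, y in pts[:i]]) + sse([y for _, y in pts[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pts[i - 1][0] + pts[i][0]) / 2, i)
    _, threshold, i = best
    left = build_tree([x for x, _ in pts[:i]], [y for _, y in pts[:i]], depth - 1)
    right = build_tree([x for x, _ in pts[i:]], [y for _, y in pts[i:]], depth - 1)
    return (threshold, left, right)


def predict(tree, x):
    """Follow x < threshold to the left, otherwise to the right, until a leaf."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


tree = build_tree([0, 1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 1, 1, 1, 1], depth=1)
print(tree)  # (3.5, 0.0, 1.0)
```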
  68. DECISION TREE (the base model of the boosting loop)
       X0 = X; Y0 = Y; W0 = wts;
       latest_model = train_tree(X, Y, wts);                        # base model
       for ii = 1:NUM_ITER
           Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC);
           X = X0[Index_train]; Y = Y0[Index_train]; W = W0[Index_train];
           v_resid = Y - latest_model(X);
           tree(ii) = train_tree(X, v_resid, W);
           latest_model += LEARNING_RATE * tree(ii);
       end
  69. GRADIENT BOOSTING TREE (V. 1)
       X0 = X; Y0 = Y;
       latest_model = train_tree(X, Y);
       for ii = 1:NUM_ITER
           Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC);
           X = X0[Index_train]; Y = Y0[Index_train];
           v_resid = Y - latest_model(X);                           # get the residuals
           tree_add = train_tree(X, v_resid);                       # fit a tree for the residuals
           latest_model += LEARNING_RATE * tree_add;                # additive model
       end
  70. (STOCHASTIC) GRADIENT BOOSTING TREE
       X0 = X; Y0 = Y;                                              # store input
       latest_model = train_tree(X, Y);
       for ii = 1:NUM_ITER
           Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC);   # get sampled index
           X = X0[Index_train]; Y = Y0[Index_train];                # sampled records as input
           v_resid = Y - latest_model(X);
           tree_add = train_tree(X, v_resid);
           latest_model += LEARNING_RATE * tree_add;
       end
  71. (STOCHASTIC) GRADIENT BOOSTING TREE WITH WEIGHT
       X0 = X; Y0 = Y; W0 = wts;
       latest_model = train_tree(X, Y, wts);
       for ii = 1:NUM_ITER
           Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC);
           X = X0[Index_train]; Y = Y0[Index_train]; W = W0[Index_train];   # subsample the weights too
           v_resid = Y - latest_model(X);                           # weights enter the tree fit, not the residual
           tree_add = train_tree(X, v_resid, W);
           latest_model += LEARNING_RATE * tree_add;
       end
  72. (GENERAL) GRADIENT BOOSTING
       X0 = X; Y0 = Y; W0 = wts;
       latest_model = train_base_model(X, Y, wts);
       for ii = 1:NUM_ITER
           Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC);
           X = X0[Index_train]; Y = Y0[Index_train]; W = W0[Index_train];
           v_pseudo_resid = get_pseudo_residual(X, Y, W, latest_model, LOSS_FUNCTION_TYPE);
           model_add_base = train_base_model(X, v_pseudo_resid, W);
           alpha = line_search(cost_function, model_add_base, X, Y, W);
           latest_model += LEARNING_RATE * (alpha * model_add_base);
       end
       [Stochastic Gradient Boosting] Jerome H. Friedman, 1999
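Below is a runnable Python sketch of the weighted stochastic loop on slides 70-71. It makes these assumptions:

- squared-error loss;
- a single numeric feature;
- depth-1 trees (stumps) as the base learner;
- strictly positive weights.

For squared-error loss, the pseudo-residual of slide 72 is the ordinary residual y - f(x). The line search is also absorbed by fitting leaf means, since the optimal step for a least-squares-fitted tree is 1. The names only loosely mirror the pseudocode. This is an illustration, not R's gbm implementation.

```python
import random


def fit_stump(xs, ys, ws):
    """Weighted least-squares depth-1 tree on one feature: (threshold, left_value, right_value)."""
    pts = sorted(zip(xs, ys, ws))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold separates equal x values
        cost, means = 0.0, []
        for side in (pts[:i], pts[i:]):
            mean = sum(w * y for _, y, w in side) / sum(w for _, _, w in side)
            cost += sum(w * (y - mean) ** 2 for _, y, w in side)
            means.append(mean)
        if best is None or cost < best[0]:
            best = (cost, (pts[i - 1][0] + pts[i][0]) / 2, means[0], means[1])
    if best is None:  # all x equal: fall back to a constant (weighted mean)
        mean = sum(w * y for _, y, w in pts) / sum(w for _, _, w in pts)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left, right = stump
    return left if x < threshold else right


def boost(xs, ys, ws, num_iter=100, learning_rate=0.1, frac_train=0.7, seed=0):
    """Stochastic gradient boosting for squared-error loss with stumps as base learners."""
    rng = random.Random(seed)
    n = len(xs)
    base = fit_stump(xs, ys, ws)                 # latest_model = train_tree(X, Y, wts)
    pred = [stump_predict(base, x) for x in xs]  # current model output for every record
    stumps = []
    k = max(2, round(frac_train * n))
    for _ in range(num_iter):
        idx = rng.sample(range(n), k)            # Index_train: random subset of records
        resid = [ys[i] - pred[i] for i in idx]   # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump([xs[i] for i in idx], resid, [ws[i] for i in idx])
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, xs)]

    def model(x):
        return stump_predict(base, x) + learning_rate * sum(stump_predict(s, x) for s in stumps)

    return model


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(100)]
ys = [x / 10 for x in xs]  # synthetic straight-line target
ws = [1.0] * len(xs)
print(mse(boost(xs, ys, ws, num_iter=100), xs, ys) < mse(boost(xs, ys, ws, num_iter=0), xs, ys))  # True
```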
  73. [Framework diagram, repeated from slide 4] ©
  74. APPLYING GBM IN R
       # train is assumed to be a data.table; with = FALSE selects the x_vars columns by name
       gbm_model = gbm.fit(
           x = train[, x_vars, with = FALSE],
           y = train$Label,
           distribution = char_distr,
           w = w,
           n.trees = n_trees,
           interaction.depth = num_inter,
           n.minobsinnode = min_obs_node,
           shrinkage = shrinkage_rate,
           bag.fraction = frac_bag)
  75. VARIABLE IMPORTANCE [Bar chart of relative importance]
  76. APPLY MODEL ON TEST DATA
       EventId | Score | RankOrder | Class
       1 | 0.98 | 501 | s
       2 | 0.42 | 259,579 | b
       3 | 0.46 | 264,125 | b
       ... | ... | ... | ...
       449,998 | 0.86 | 31,154 | s
       449,999 | 0.12 | 489,251 | b
       550,000 | 0.79 | 110,154 | b
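Turning model scores into the RankOrder and Class columns amounts to sorting by score and labeling the top fraction as signal. This sketch follows the convention visible in the table above, where rank 1 is the highest score. The `signal_fraction` of 0.5 is an arbitrary illustrative cutoff; in practice it is tuned, for example by AMS on held-out data. The scores are hypothetical.

```python
def rank_and_classify(event_ids, scores, signal_fraction):
    """Rank events by score (rank 1 = highest) and label the top fraction 's', the rest 'b'."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    cutoff = round(signal_fraction * len(scores))
    return [
        (eid, score, rank[i], "s" if rank[i] <= cutoff else "b")
        for i, (eid, score) in enumerate(zip(event_ids, scores))
    ]


# Hypothetical scores; signal_fraction=0.5 is an arbitrary illustrative cutoff.
rows = rank_and_classify([1, 2, 3, 4, 5, 6], [0.98, 0.42, 0.46, 0.86, 0.12, 0.79], 0.5)
for row in rows:
    print(row)
```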
  77. [Framework diagram, repeated from slide 4]
  78. GRADIENT BOOSTING PARAMETERS • Number of iterations • Minimum number of observations in each node • Bagging fraction (0.5 to 0.8) • Learning rate (< 0.1) • Depth of tree (4 to 8)
  79. [Framework diagram, repeated from slide 4]
  80. CROSS VALIDATION • Split the training data – 70% for training – 30% for cross validation • Train the model (70%) • Measure performance (30%)
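A minimal sketch of the 70/30 split: shuffle the record indices with a fixed seed so the split is reproducible, then cut. The function name and seed are illustrative choices.

```python
import random


def train_validation_split(n_records, frac_train=0.7, seed=42):
    """Shuffle record indices and split them into training and cross-validation index lists."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    cut = round(frac_train * n_records)
    return idx[:cut], idx[cut:]


train_idx, cv_idx = train_validation_split(250_000)
print(len(train_idx), len(cv_idx))  # 175000 75000
```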
  81. PERFORMANCE BASED ON AMS • Trade-off between: – the ratio of signal/background events – the number of records in the selection region
       EventId | Score | RankOrder | Class | truth
       1 | 0.98 | 501 | S | S
       2 | 0.42 | 259,579 | B |
       3 | 0.46 | 264,125 | B |
       ... | ... | ... | ... | ...
       449,998 | 0.86 | 31,154 | S | B
       449,999 | 0.12 | 489,251 | B |
       550,000 | 0.79 | 110,154 | B |
       Selection region: s = sum(S), b = sum(B)
  82. PERFORMANCE BASED ON AMS [Plot: AMS and percentage of signal vs. percentile]
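The challenge scored a selection region by the approximate median significance:

AMS = sqrt(2((s + b + b_r) ln(1 + s/(b + b_r)) − s)), with regularization term b_r = 10.

Here s and b are the summed event weights of true signal and true background inside the selection region. The "sum(S)" and "sum(B)" on slide 81 correspond to these weighted sums. The sweep below reproduces the idea of slide 82: take the top p percent of events by score as the selection region and compute AMS for each p.

AMS depends on the absolute size of s and b. When it is computed on a 30% hold-out, a common adjustment is to rescale the hold-out weights by 1/0.3 so that scores are comparable with the full set.

```python
import math

B_REG = 10.0  # regularization term b_r in the challenge's AMS definition


def ams(s, b, b_reg=B_REG):
    """Approximate median significance: sqrt(2 * ((s + b + b_r) * ln(1 + s / (b + b_r)) - s))."""
    bb = b + b_reg
    radicand = 2.0 * ((s + bb) * math.log1p(s / bb) - s)
    return math.sqrt(max(radicand, 0.0))  # clamp tiny negative rounding error


def ams_by_percentile(scores, labels, weights, percentiles):
    """AMS when the selection region is the top p percent of events by score.

    labels are 's' or 'b'; s and b are the summed weights of selected signal and background.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = {}
    for p in percentiles:
        selected = order[: round(len(order) * p / 100)]
        s = sum(weights[i] for i in selected if labels[i] == "s")
        b = sum(weights[i] for i in selected if labels[i] == "b")
        result[p] = ams(s, b)
    return result


# With s much smaller than b, AMS approaches the familiar s / sqrt(b).
print(round(ams(1.0, 1e6 - B_REG), 6))  # 0.001
```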
  83-84. COMPARE TWO MODEL RESULTS: training vs. cross validation [Plots: AMS and percentage of signal vs. percentile]
  85. AMS BY NUM. ITERATION [Animation: AMS vs. percentile]
  86. [Framework diagram, repeated from slide 4]
  87. HEAT MAP OF AMS ON B-S PLANE [Heat map with axes s and b; label ">> 4"]
  88. OPTIMIZATION BASED ON OBJECTIVE FUNCTION [Plot: AMS vs. percentile, with points A, B, C]
  89. HEAT MAP OF AMS ON B-S PLANE [Heat map with axes s and b, with points A, B, C]
  90. HEAT MAP OF AMS ON B-S PLANE [Heat map with axes s and b, with points A, B, C] Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function.
  91. AMS CURVE ON B-S PLANE [Points A, B, C; arrows for the partial derivative of AMS with respect to s and with respect to b] Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function. Ratio of the derivatives ==> relative weight
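The partial derivatives on slides 90-91 have closed forms. Write B = b + b_r and differentiate AMS² = 2((s + B) ln(1 + s/B) − s). Since 2·AMS·∂AMS = ∂(AMS²), this gives:

- ∂AMS/∂s = ln(1 + s/B) / AMS, which is positive;
- ∂AMS/∂b = (ln(1 + s/B) − s/B) / AMS, which is negative because ln(1 + u) < u.

The deck does not spell out how the ratio becomes training weights. This sketch only evaluates the derivatives and their ratio at a given operating point (s, b), such as A, B, or C. It assumes s > 0.

```python
import math


def ams_and_partials(s, b, b_reg=10.0):
    """AMS and its analytic partial derivatives with respect to s and b (requires s > 0)."""
    bb = b + b_reg
    log_term = math.log1p(s / bb)
    value = math.sqrt(2.0 * ((s + bb) * log_term - s))
    d_s = log_term / value             # > 0: more signal raises AMS
    d_b = (log_term - s / bb) / value  # < 0: more background lowers AMS
    return value, d_s, d_b


def relative_signal_weight(s, b, b_reg=10.0):
    """Marginal AMS gain per unit of signal divided by marginal AMS loss per unit of background."""
    _, d_s, d_b = ams_and_partials(s, b, b_reg)
    return d_s / -d_b


print(relative_signal_weight(100.0, 1000.0) > 1.0)  # True: at this point one signal event outweighs one background event
```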
  92. IMPROVEMENT DUE TO WEIGHTING [Plot: AMS* and AMS vs. Num_Iterations]
  93. IMPROVEMENT DUE TO WEIGHTING (CONT’D) [Plot: AMS* and AMS vs. Num_Iterations]
  94. AUGMENTED GRADIENT BOOSTING [Diagram: Apply GBM; Weight Adjustment] ©
  95. AUGMENTED GRADIENT BOOSTING [Diagram: Apply GBM; Weight Adjustment; Remove very high and very low score records from train and test] ©
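The deck does not give the cutoffs for "very high" and "very low" scores, or say what happens to the removed records. This sketch assumes:

- the cutoffs are taken at score quantiles (hypothetical values here);
- the removed groups are kept aside, so they can be labeled directly as background or signal while the model is retrained on the uncertain middle.

```python
def quantile_cuts(scores, low_q, high_q):
    """Score values at the low_q and high_q quantiles (simple nearest-rank rule)."""
    ordered = sorted(scores)
    last = len(ordered) - 1
    return ordered[round(low_q * last)], ordered[round(high_q * last)]


def split_by_score(records, low_cut, high_cut, key="score"):
    """Partition records into confident background, uncertain middle, and confident signal."""
    low = [r for r in records if r[key] <= low_cut]
    middle = [r for r in records if low_cut < r[key] < high_cut]
    high = [r for r in records if r[key] >= high_cut]
    return low, middle, high


records = [{"id": i, "score": i / 10} for i in range(11)]  # hypothetical scores 0.0 .. 1.0
low_cut, high_cut = quantile_cuts([r["score"] for r in records], 0.1, 0.9)
low, middle, high = split_by_score(records, low_cut, high_cut)
print(len(low), len(middle), len(high))  # 2 7 2
```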
  96. IMPROVEMENT DUE TO ELIMINATION [Plot: AMS* and AMS vs. Num_Iterations]
  97. IMPROVEMENT DUE TO ELIMINATION (CONT’D) [Plot: AMS* and AMS vs. Num_Iterations]
  98. AUGMENTED GRADIENT BOOSTING [Diagram: Apply ML Model; Weight Adjustment; Remove very high and very low score records from train and test] ©
  99. [Framework diagram, repeated from slide 4]
  100. OTHER TOPICS • Version control (Git, SourceTree) – Effectively implement many different ideas • File organization – Efficiently pull out the file needed • Effective code (R, Python) – It matters a great deal when dealing with big data
  101. Thank you for your participation! Any questions? goDCI.com
