Machine Learning Applications
Armando Benitez
BMO Capital Markets
Jul 18, 2016
Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
LHC - CERN
• Located on the outskirts of Geneva, on the France-Switzerland border
• 27 km in circumference
• The tunnel is buried roughly 50 to 175 m underground
ATLAS Detector
[Diagram labels: Particle, Detector, signal, Amplifier, Digitizer, Trig, selection, computers, storage, Trash, 010010. Credit: Shabnam Jabeen (Kansas), 5/6/03]
Multiple Algorithms in Parallel
Boosted Decision Trees
Bayesian Neural Networks
Matrix Elements
Combine: use another ML algorithm to combine the results of the individual classifiers.
Purpose: extract all possible information from the dataset.
The combination produces a single output, from which all measurements are obtained.
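The combination step can be sketched as a small stacking example. This is only an illustration under stated assumptions: the toy events, the two deliberately weak classifiers `classifier_a` and `classifier_b` (each sees a single variable), and the logistic-regression combiner are all hypothetical stand-ins, not the algorithms used in the analysis.

```python
import math

def make_events():
    """Toy events on a 10x10 grid; an event is signal (1) when x + y > 1.05."""
    events = []
    for i in range(10):
        for j in range(10):
            x, y = (i + 0.5) / 10, (j + 0.5) / 10
            events.append(((x, y), 1 if x + y > 1.05 else 0))
    return events

def classifier_a(point):
    """Hypothetical weak classifier: its score uses only x."""
    return point[0]

def classifier_b(point):
    """Hypothetical weak classifier: its score uses only y."""
    return point[1]

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_combiner(score_rows, labels, rate=1.0, steps=3000):
    """Fit a logistic regression on the classifier scores (batch gradient descent)."""
    weights = [0.0] * len(score_rows[0])
    bias = 0.0
    n = len(score_rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(score_rows, labels):
            err = sigmoid(bias + sum(w * s for w, s in zip(weights, row))) - label
            grad_b += err
            for k, s in enumerate(row):
                grad_w[k] += err * s
        bias -= rate * grad_b / n
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def accuracy(predict, events):
    """Fraction of events whose predicted label matches the true label."""
    return sum(1 for point, label in events if predict(point) == label) / len(events)

EVENTS = make_events()
SCORES = [(classifier_a(p), classifier_b(p)) for p, _ in EVENTS]
LABELS = [label for _, label in EVENTS]
WEIGHTS, BIAS = train_combiner(SCORES, LABELS)

def combined(point):
    """Combined decision: the second-level model applied to both scores."""
    z = BIAS + WEIGHTS[0] * classifier_a(point) + WEIGHTS[1] * classifier_b(point)
    return 1 if z > 0 else 0

if __name__ == "__main__":
    print("A alone: ", accuracy(lambda p: 1 if classifier_a(p) > 0.5 else 0, EVENTS))
    print("B alone: ", accuracy(lambda p: 1 if classifier_b(p) > 0.5 else 0, EVENTS))
    print("combined:", accuracy(combined, EVENTS))
```

In a real analysis the combiner would be trained on classifier outputs from events that the individual classifiers did not see during their own training; the toy above skips that for brevity.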
Mobile Market Place
Data Processing and Modelling
[Architecture diagram; component labels:]
• Transaction-grade APIs + MQs
• Data Lake (HBase, Cassandra, etc.)
• Stream Processing
• Batch Processing
• Model Generator
• Decision Engine
• Data flows: (context, event, data), (event), (data)
• Feature Selection, Model Training, Model Evaluation, Model Assembly
• Layers: Real-Time Layer, Batch Processing Layer
Data Science
1. Fraud Detection
2. Search
3. Recommendations
4. Notifications
5. Ratings
6. Merchant Intelligence
7. Engagement Optimization
8. Marketing Optimization
9. App Personalization
10. Ad Network Support
11. Image / Speech Recognition
Theory (Math, Algorithms)
Proof-of-Concept (R, Python, Scala, C++)
Spark Implementation (Scalability, Robustness)
Platform Integration
Fraud Detection
• Very small number of fraud cases

• Large number of good transactions

• Many different “types” of anomalies.
Hard for algorithms to learn from
positive examples what the anomalies
look like

• Future anomalies may look nothing
like any of the anomalous examples
we’ve seen so far
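One common way to cope with so few fraud examples is to model only the good transactions and flag anything that looks improbable under that model. The sketch below assumes independent Gaussian features; the feature choice (amount, number of items), the sample data, and the threshold `EPSILON` are hypothetical and not taken from the talk.

```python
import math

# Hypothetical "good" transactions: (amount, number of items).
GOOD_TRANSACTIONS = [(20, 1), (25, 2), (30, 2), (22, 1), (28, 3), (26, 2)]
EPSILON = 1e-4  # assumed density threshold; in practice tuned on labelled examples

def fit(rows):
    """Estimate a (mean, variance) pair for every feature column."""
    params = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        variance = sum((v - mean) ** 2 for v in column) / len(column)
        params.append((mean, variance))
    return params

def density(row, params):
    """Product of per-feature Gaussian densities (features assumed independent)."""
    p = 1.0
    for value, (mean, variance) in zip(row, params):
        exponent = -((value - mean) ** 2) / (2 * variance)
        p *= math.exp(exponent) / math.sqrt(2 * math.pi * variance)
    return p

def is_anomaly(row, params, epsilon=EPSILON):
    """Flag a transaction whose density under the 'good' model is below epsilon."""
    return density(row, params) < epsilon

PARAMS = fit(GOOD_TRANSACTIONS)

if __name__ == "__main__":
    for candidate in [(24, 2), (500, 1)]:
        print(candidate, "anomaly" if is_anomaly(candidate, PARAMS) else "ok")
```

Because the model is fitted only to normal behaviour, it does not need to have seen a given type of fraud before in order to flag it.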
Personalization
• Offers targeted to each user

• Use browsing history and shopping
habits to determine products the user is
most likely to buy

• Similarity among users

• Similarity among items

• Catalog search results
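Similarity among items can be illustrated with cosine similarity over purchase histories. This is a minimal sketch: the users, items, and the scoring rule in `recommend` are hypothetical, and a production recommender would work on far larger and sparser data.

```python
import math

# Hypothetical purchase histories: user -> set of items bought.
PURCHASES = {
    "user_a": {"phone_case", "charger"},
    "user_b": {"phone_case", "charger", "headphones"},
    "user_c": {"headphones", "speaker"},
}

def item_vectors(purchases):
    """Represent each item as a 0/1 vector over users (1 = that user bought it)."""
    users = sorted(purchases)
    items = sorted({item for basket in purchases.values() for item in basket})
    return {
        item: [1.0 if item in purchases[user] else 0.0 for user in users]
        for item in items
    }

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(user, purchases):
    """Rank items the user has not bought by similarity to items they did buy."""
    vectors = item_vectors(purchases)
    owned = purchases[user]
    scores = {
        candidate: sum(cosine(vector, vectors[item]) for item in owned)
        for candidate, vector in vectors.items()
        if candidate not in owned
    }
    return sorted(scores, key=lambda item: (-scores[item], item))

if __name__ == "__main__":
    print(recommend("user_a", PURCHASES))
```

Similarity among users works the same way with the roles swapped: each user becomes a vector over items.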
Incorporating ML into Design
Visual Inputs
Aural Inputs
Corporal Inputs
Environmental Inputs
Creating Dialogue
• Machine Learning algorithms are capable of discovering patterns in the data presented to them. How can we make use of this?
• Find discovery opportunities that are only possible with the help of Machine Learning
• Designers and programmers should establish a strong collaboration to find ground-breaking applications
• Understand the rules to know which ones to bend or break
Extra Slides
Search Strategy
Traditional searches (Signal to Bkg 20:1): Initial objects → Data Cleaning → Found it!
Small Signal Analysis (Signal to Bkg 1:20): Initial objects → Data Cleaning → Machine Learning → Found it!

[Slide images: excerpts from DØ Note 5403, including four plots of Events/(0.05) versus Invariant Mass (GeV/c^2) from 5.2 to 7]

FIG. 16: $\Xi_b$ mass distribution of background events from $J/\psi$ sideband events after all selection cuts have been applied (top), and these events (red squares) on top of the signal observed in right-sign combination events (open circles) (bottom).

3. $\Xi_b$ reconstruction on $\Lambda_b \to J/\psi\,\Lambda(p\pi)$ MC events.
We applied our $\Xi_b$ selection to 30K generated $\Lambda_b \to J/\psi\,\Lambda(p\pi)$ MC events. This is p17 MC with the same cuts at generation level as those applied to our $\Xi_b$ MC, and reprocessed with the same extended configuration as used on data. No events survived the selection.

VI. CONCLUSIONS
By using a simple set of cuts we observe a signal peak with a mass of 5.774 ± 0.011 GeV/c^2 (stat) ± 0.22 GeV/c^2 (sys) and a width of 0.037 ± 0.008 GeV/c^2, a significance of 5.53 and $S/\sqrt{B} = 7.80$. This peak is shown in Fig. 12 and the results of the fit are in Table II. This supports the previous report of the observation using Bagger Decision Trees [6]. We measure a relative production ratio of

$$\frac{f(b \to \Xi_b^-)\,\mathrm{Br}(\Xi_b^- \to J/\psi\,\Xi^-(\Lambda\pi^-))}{f(b \to \Lambda_b)\,\mathrm{Br}(\Lambda_b \to J/\psi\,\Lambda)} = 0.376 \pm 0.119\,(\mathrm{stat.}) \pm 0.188\,(\mathrm{syst.})$$

[1] D. Buskulic et al., PL B384, 449.
[2] P. Abreu et al., ZPHY C68, 541.
[3] Common Samples Group, http://wwwd0.fnal.gov/Run2Physics/cs/.
[4] See description of "J/psi & dimuon mass continuum" at http://d0server1.fnal.gov/users/nomerot/Run2A/BANA/Dskim.html.
[5] Reconstruction of B hadron signals at DØ, DØ Note 4481.
[6] DØ Note 5401.

DØ Note 5403, Version 4.1 as of June 5, 2007
Observation of the heavy baryon $\Xi_b^-$
E. De La Cruz Burelo, H.A. Neal, and J. Qian (University of Michigan)
B. Abbott (University of Oklahoma)
G.D. Alexeev, Yu.P. Merekov, G.A. Panov, A.M. Rozhdestvensky, L.S. Vertogradov, Yu.L. Vertogradova (Joint Institute for Nuclear Research, Russia)

Using approximately 1.3 fb$^{-1}$ of data collected by the upgraded DØ detector in Run II of the Tevatron, the $\Xi_b^-$ state has been observed in the decay mode $J/\psi(\to \mu^+\mu^-)\,\Xi^-$ ($\Xi \to \Lambda\pi^\pm$, $\Lambda \to \pi p$). A tracking algorithm that reconstructs tracks with large impact parameters more efficiently was used to increase the efficiency of reconstructing the $\Lambda$ and $\Xi^-$. We observe the $\Xi_b^-$ with a significance of $\sqrt{-2\,\Delta\ln(L)} = 5.53$ and $S/\sqrt{B} = 7.80$, with a mass of 5.774 ± 0.011 GeV/c^2 (stat) ± 0.022 GeV/c^2 (sys). We measure the relative production ratio to be

$$\frac{f(b \to \Xi_b^-)\,\mathrm{Br}(\Xi_b^- \to J/\psi\,\Xi^-(\Lambda\pi^-))}{f(b \to \Lambda_b)\,\mathrm{Br}(\Lambda_b \to J/\psi\,\Lambda)} = 0.376 \pm 0.119\,(\mathrm{stat.}) \pm 0.188\,(\mathrm{syst.})$$

[Slide images: plots from section 9.4.2 Observed Results, Event Yield versus tb+tqb DT Output (0 to 1) on linear and logarithmic scales; D0 Run II Prelim. 2.3 fb$^{-1}$, p17+p20 e+µ channel, 1-2 b-tags, 2-4 jets]
Decision Trees
[Figure 8.1: 2D plane of a simple classification problem, and a Decision Tree solving the classification problem of signal and background.]
Task: separate signal from background
Issue: a single split on X or Y is not enough!
Solution: use a series of consecutive splits, generating a tree structure
Decision Trees
Split 1: on the X variable (cut C1)
Events either fail C1 (F1) or pass C1 (P1)
Decision Trees
Split 2: recovers events that failed split 1
Region labels: F: C1, F: C2 / F: C1, P: C2 / Passed C1
Tree nodes: P1, F1, P2, F2
Repeat and continue the splitting process until events are classified
Decision Trees
After 4 splits: Signal and Background regions are separated! Done!
Tree nodes: P1, F1, P2, F2, P3, F3, P4, F4
Region labels, as extracted: F: C1 / P: C2 / P: C1,C2 / F: C4 / P: C1,C3,C4 / F: C1,C2 / P: C1 / F: C2
Toy model: only 2 variables, easy to determine cut values
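The idea of consecutive splits can be sketched with a tiny tree builder. This is a toy under stated assumptions: the grid of points, the choice of Gini impurity to pick each cut, and all function names are hypothetical, and the signal region (upper-right quadrant) is simpler than the one drawn in Figure 8.1.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(points, labels):
    """Try every midpoint cut on x (feature 0) and y (feature 1); keep the purest."""
    best = None
    for feature in (0, 1):
        values = sorted({point[feature] for point in points})
        for low, high in zip(values, values[1:]):
            cut = (low + high) / 2.0
            failed = [l for p, l in zip(points, labels) if p[feature] < cut]
            passed = [l for p, l in zip(points, labels) if p[feature] >= cut]
            score = (len(failed) * gini(failed) + len(passed) * gini(passed)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, cut)
    return best

def build_tree(points, labels, max_depth):
    """Split recursively until a node is pure or max_depth consecutive cuts are used."""
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    if max_depth == 0 or gini(labels) == 0.0:
        return majority  # leaf: signal (1) or background (0)
    split = best_split(points, labels)
    if split is None:
        return majority
    _, feature, cut = split
    failed = [(p, l) for p, l in zip(points, labels) if p[feature] < cut]
    passed = [(p, l) for p, l in zip(points, labels) if p[feature] >= cut]
    return (
        feature,
        cut,
        build_tree([p for p, _ in failed], [l for _, l in failed], max_depth - 1),
        build_tree([p for p, _ in passed], [l for _, l in passed], max_depth - 1),
    )

def classify(tree, point):
    """Follow the cuts from the root down to a leaf."""
    while isinstance(tree, tuple):
        feature, cut, failed, passed = tree
        tree = failed if point[feature] < cut else passed
    return tree

def accuracy(tree, points, labels):
    """Fraction of points whose leaf label matches the true label."""
    return sum(classify(tree, p) == l for p, l in zip(points, labels)) / len(labels)

# Toy data: signal (1) only where both x and y are large.
GRID = (0.1, 0.3, 0.7, 0.9)
POINTS = [(x, y) for x in GRID for y in GRID]
LABELS = [1 if x > 0.5 and y > 0.5 else 0 for x, y in POINTS]

if __name__ == "__main__":
    for depth in (1, 2):
        tree = build_tree(POINTS, LABELS, depth)
        print("depth", depth, "accuracy", accuracy(tree, POINTS, LABELS))
```

With a single split the toy tree misclassifies a quarter of the points; a second, consecutive split on the other variable separates signal from background completely, which mirrors the progression on the slides.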
A/B Testing
Background
• Consultant @ BMO Capital Markets
• Previously:
• Data Scientist @ Paytm Labs
• Researcher - ATLAS Experiment @ CERN
• Researcher - Fermilab National Laboratory
Anomaly detection
๏ Fit model on training set
๏ On a cross validation/test example, predict
๏ Possible evaluation metrics:
๏ True positive, false positive, false negative, true negative
๏ Precision/Recall
๏ F1-score
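The evaluation metrics listed above can be computed directly from predicted and true labels. A minimal sketch, assuming labels are encoded as 1 (anomaly) and 0 (normal); the function names and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1-score; each is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    guess = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(confusion_counts(truth, guess))
    print(precision_recall_f1(truth, guess))
```

With very few anomalies, plain accuracy is misleading (predicting "normal" for everything scores well), which is why precision, recall, and F1 are the more informative choices here.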
Standard Model (SM)
• The SM describes the world around us
• Components:
• 24 particles of matter
• 4 mediators
• Interactions of the particles are explained by the mediators
• Does not include gravity, dark matter, or dark energy
Identity Resolution
• What?
Identify products having similar properties (name, colour, size) as a unique product
• Why?
Recommender systems trained on these products would produce better, non-repetitive recommendations
• How?
• Classify pairs as match or non-match, based on how similar they are
• Make use of known catalog features
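Pair classification can be sketched with a simple similarity rule. This is an assumption-heavy illustration: the product records, the token-based Jaccard similarity, and the fixed `threshold` are hypothetical stand-ins for whatever trained classifier and catalog features are used in practice.

```python
def name_similarity(name_a, name_b):
    """Jaccard similarity between the lowercase word sets of two product names."""
    tokens_a = set(name_a.lower().split())
    tokens_b = set(name_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def is_match(product_a, product_b, threshold=0.5):
    """Call a pair a match when colour and size agree and the names are similar."""
    same_colour = product_a["colour"].lower() == product_b["colour"].lower()
    same_size = product_a["size"] == product_b["size"]
    similar = name_similarity(product_a["name"], product_b["name"]) >= threshold
    return same_colour and same_size and similar

# Hypothetical catalog entries.
SHIRT_1 = {"name": "Cotton T-Shirt Classic", "colour": "Blue", "size": "M"}
SHIRT_2 = {"name": "Classic Cotton T-Shirt", "colour": "blue", "size": "M"}
SHIRT_RED = {"name": "Cotton T-Shirt Classic", "colour": "Red", "size": "M"}
WALLET = {"name": "Leather Wallet", "colour": "Blue", "size": "M"}

if __name__ == "__main__":
    print(is_match(SHIRT_1, SHIRT_2))    # same product listed twice
    print(is_match(SHIRT_1, SHIRT_RED))  # different colour
    print(is_match(SHIRT_1, WALLET))     # different product
```

Products that match can then be merged into one catalog entry before training the recommender, so the same item is not recommended several times under different listings.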

Machine Learning Applications

  • 1. Machine Learning Applications Armando Benitez BMO Capital Markets Jul 18, 2016
  • 2. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 • Located on the outskirts of Geneva. France - Switzerland • 27 km in circumference • The tunnel is buried around 50 to 175 m. underground. 2 LHC - CERN
  • 3. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 3 Atlas Detector Detector Amplifier Digitizer selection storage computers Particle signal Trash 010010 5/6/03 Shabnam Jabeen (Kansas) Trig
  • 4. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 4 Multiple Algorithms in Parallel"#$%&'()&(%*+(,($-.&.+/*%012. !!!!!"##$%&'!!!!!!!!!!!!!!!!"()&$*(+!!!!!!!!!!,(%-*. /&0*$*#+!1-&&$!!2&3-(4!2&%5#-6$!!74&8&+%$ Using another ML algorithm to combine the result of individual classifiers. Purpose: extract all possible information from the Dataset. The Combination produces an output, from where all measurements are obtained Combine
  • 5. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 5 Mobile Market Place
  • 6. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 Data Processing and Modelling Transaction grade APIs + MQs Data Lake HBase, Cassandra, etc. Stream Processing Batch Processing Model Generator Decision Engine (context, event, data) (event) (data) Feature Selection Model Training Model Evaluation Model Assembly Real-Time Layer Batch Processing Layer { Data Science 1. Fraud Detection 2. Search 3. Recommendations 4. Notifications 5. Ratings 6. Merchant Intelligence 7. Engagement Optimization 8. Marketing Optimization 9. App Personalization 10. Ad Network Support 11. Image / Speech Recognition Theory (Math, Algorithms) Proof-of-Concept (R, Python, Scala, C++) Spark Implementation (Scalability, Robustness) Platform Integration
  • 7. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 Fraud Detection 7 • Very small number of fraud cases • Large number of good transactions • Many different “types” of anomalies. Hard for algorithms to learn from positive examples what the anomalies look like • Future anomalies may look nothing like any of the anomalous examples we’ve seen so far
  • 8. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 8 Personalization • Offers targeted for each user • Use browsing history and shopping habits to determine products the user is most likely to buy • Similarity among users • Similarity among items • Catalog search results
  • 9. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 9 Incorporating ML to Design Visual Inputs Aural Inputs Corporal Inputs Environmental Inputs
  • 10. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 • Machine Learning algorithm capable of discovering pattern with data presented to them. How can we make use of it? • Find discovery opportunities that only are possible with the help of Machine Learning • Designers and programmers to establish a strong collaboration to find ground- breaking applications. • Understand rules to know which ones to bend or break 10 Creating Dialogue
  • 12. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 12 Search Strategy Initial objects Found it! 15 )2 Invariant Mass (GeV/c 5.2 5.4 5.6 5.8 6 6.2 6.4 6.6 6.8 7 Events/(0.05) 2 4 6 8 10 12 )2 Invariant Mass (GeV/c 5.2 5.4 5.6 5.8 6 6.2 6.4 6.6 6.8 7 Events/(0.05) 2 4 6 8 10 12 )2 Invariant Mass (GeV/c 5.2 5.4 5.6 5.8 6 6.2 6.4 6.6 6.8 7 Events/(0.05) 0 2 4 6 8 10 12 )2 Invariant Mass (GeV/c 5.2 5.4 5.6 5.8 6 6.2 6.4 6.6 6.8 7 Events/(0.05) 0 2 4 6 8 10 12 FIG. 16: b mass distribution of background events from J/ sideband events after all selection cuts have been applied (top), and these events -red squares- on top of the signal observed in right-sign combination events -open circles- (bottom). 3. ⇥b reconstruction on b ⇥ J/⇥ (p ) MC events. We applied our ⇥b selection on 30K generated b ⇥ J/⇥ (p ) MC events. This is p17 MC with the same cuts at generation level as those applied to our ⇥b MC, and reprocessed with the same extended configuration as used on data. No events survived after selection. VI. CONCLUSIONS By using a simple set of cuts we observe a signal peak with a mass of 5.774 ± 0.011 GeV/c2 (stat) ± 0.22 GeV/c2 (sys) and a width of 0.037 ± 0.008 GeV/c2 , a significance of 5.53 and S/ ⇤ B = 7.80. This peak is showed in Fig. 12 and the results of the fit are in Table II. This support the previous report of the observation by using Bagger Decision Trees [6]. We measure a relative production ratio to be f(b⇥⇥b )Br(⇥b ⇥J/⇥⇥ ( )) f(b⇥ b)Br( b⇥J/⇥ ) = 0.376 ± 0.119stat. ± 0.188syst [1] PL B384 449, D. Buskalic et. al. [2] ZPHY C68 541 P. Abreu et al. [3] Common Samples Group, http://wwwd0.fnal.gov/Run2Physics/cs/. [4] See description of ”J/psi & dimuon mass continuum” at http://d0server1.fnal.gov/users/nomerot/Run2A/BANA/Dskim.html. [5] Reconstruction of B hadron signals at DØ , DØ Note 4481. [6] DØ Note 5401. DØ Note 5403 Version 4.1 as June 5, 2007 Observation of the heavy baryon b E. De La Cruz Burelo, H.A. Neal, and J. 
Qian University of Michigan B. Abbott University of Oklahoma G.D. Alexeev, Yu.P. Merekov, G.A. Panov, A.M. Rozhdestvensky, L.S. Vertogradov, Yu.L. Vertogradova Joint Institute for Nuclear Research, Russia Using approximately 1.3 fb 1 of data collected by the upgraded DØ detector in Run II of the Tevatron, the ⇤b state has been observed in the decay mode J/⇤(⇤ µ+ µ )⇤ (⇤ ⇤ ⇥⇥± , ⇥ ⇤ ⇥p) A tracking algorithm which allows a more e⇧cient method of reconstructing tracks with large impact parameters was used in order to increase the e⇧ciency of reconstructing the ⇥ and ⇤ . We observe the ⇤b with a significance of 2 ln(L) = 5.53, S/ ⌅ B = 7.80 with a mass of 5.774 ± 0.011 GeV/c2 (stat) ± .022 GeV/c2 (sys). We measure the relative production ratio to be f(b ⇤ ⇤b )Br(⇤b ⇤ J/⇤⇤ (⇥⇥ )) f(b ⇤ ⇥b)Br(⇥b ⇤ J/⇤⇥) = 0.376 ± 0.119 stat. ± 0.188 syst. Data Cleaning Signal to Bkg 20:1 Initial objects Found it!Data Cleaning Machine Learning 9.4.2 Observed Results tb+tqb DT Output 0 0.2 0.4 0.6 0.8 1 EventYield 0 200 400 600 800 -1 D0 RunII Prelim. 2.3 fb channelµp17+p20 e+ 1-2 b-tags 2-4 jets tb+tqb DT Output 0 0.2 0.4 0.6 0.8 1 EventYield 0 200 400 600 800 tb+tqb DT Output 0 0.2 0.4 0.6 0.8 1 EventYield 2 10 3 10 -1 D0 RunII Prelim. 2.3 fb channelµp17+p20 e+ 1-2 b-tags 2-4 jets tb+tqb DT Output 0 0.2 0.4 0.6 0.8 1 EventYield 2 10 3 10 ield 60 -1 D0 RunII Prelim. 2.3 fb ield 60 Traditional searches Small Signal Analysis Signal to Bkg 1:20
  • 13. Decision Trees. Task: separate signal from background. Issue: a single split on X or Y is not enough! Solution: use a series of consecutive splits, generating a tree structure. [Figure 8.1: 2D plane of a simple classification problem, and a Decision Tree solving the classification problem of signal and background. The tree nodes are the cuts x < x1, y < y1, x < x2 and y < y2.]
  • 14. Decision Trees. Split 1: on the X variable. [Figure 8.1, annotated with the first split; labels: "Failed C1" (F1), "Passed C1" (P1).]
  • 15. Decision Trees. Split 2: recover events that failed split 1. Repeat and continue the splitting process until all events are classified. [Figure 8.1, annotated with the second split; labels: "F: C1, F: C2", "F: C1, P: C2", "Passed C1", P1, F1, P2, F2.]
  • 16. Decision Trees. After 4 splits: signal and background regions are separated! Done! Toy model: only 2 variables, easy to determine cut values. [Figure 8.1, annotated with all four splits (P1/F1, P2/F2, P3/F3, P4/F4); region labels, where P = passed and F = failed a cut: F: C1 P: C2 P: C1,C2 F: C4 P: C1,C3,C4 F: C1,C2 P: C1 F: C2.]
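The consecutive-split procedure from the Decision Trees slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code behind the talk: the 3x3 toy grid, the use of Gini impurity to pick each cut, and all function names (`gini`, `best_split`, `build_tree`, `predict`, `count_splits`) are hypothetical choices made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, labels):
    """Return (feature_index, cut) that lowers the weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(points)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        # Candidate cuts sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            passed = [l for p, l in zip(points, labels) if p[f] < cut]
            failed = [l for p, l in zip(points, labels) if p[f] >= cut]
            score = (len(passed) * gini(passed) + len(failed) * gini(failed)) / n
            if score < best_score:
                best, best_score = (f, cut), score
    return best


def build_tree(points, labels):
    """Split recursively until no cut improves purity; leaves hold a label."""
    split = best_split(points, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, cut = split
    passed = [i for i, p in enumerate(points) if p[f] < cut]
    failed = [i for i, p in enumerate(points) if p[f] >= cut]
    return {
        "feature": f,
        "cut": cut,
        "pass": build_tree([points[i] for i in passed], [labels[i] for i in passed]),
        "fail": build_tree([points[i] for i in failed], [labels[i] for i in failed]),
    }


def predict(tree, point):
    """Follow the pass/fail branches down to a leaf label."""
    while isinstance(tree, dict):
        branch = "pass" if point[tree["feature"]] < tree["cut"] else "fail"
        tree = tree[branch]
    return tree


def count_splits(tree):
    """Number of internal (cut) nodes in the tree."""
    if not isinstance(tree, dict):
        return 0
    return 1 + count_splits(tree["pass"]) + count_splits(tree["fail"])


# Toy data (assumed): signal sits in the central cell of a 3x3 grid,
# so no single cut on x or y can isolate it.
grid = [0.5, 1.5, 2.5]
points = [(x, y) for x in grid for y in grid]
labels = ["signal" if (x, y) == (1.5, 1.5) else "bkg" for x, y in points]

tree = build_tree(points, labels)
print("splits used:", count_splits(tree))
print("prediction at (1.4, 1.6):", predict(tree, (1.4, 1.6)))
```

On this toy grid each cut peels off a pure background strip, which mirrors the slides: one cut is not enough, and a short chain of cuts on X and Y boxes in the signal region. Real analyses use many more variables, where cut values are not obvious by eye.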
  • 17. A/B Testing
  • 18. Background. Consultant @ BMO Capital Markets. Previously: Data Scientist @ Paytm Labs; Researcher, ATLAS Experiment @ CERN; Researcher, Fermilab National Laboratory.
  • 19. Anomaly detection. Fit the model on the training set. On a cross-validation/test example, predict. Possible evaluation metrics: true positives, false positives, false negatives, true negatives; precision/recall; F1-score.
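The evaluation metrics listed on the anomaly detection slide can be computed directly from the four confusion counts. The sketch below is illustrative only: the labels and predictions are made-up toy values, and the function names (`confusion_counts`, `precision_recall_f1`) are hypothetical. With rare anomalies, precision and recall are more informative than plain accuracy, because predicting "normal" for everything already scores a high accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; each is defined as 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (assumed data): 1 marks an anomaly, 0 a normal example.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```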
  • 20. Standard Model (SM). The SM describes the world around us. Components: 24 particles of matter and 4 mediators. Interactions of the particles are explained by the mediators. Does not include: gravity, dark matter and dark energy.
  • 21. Identity Resolution. What? Identify products having similar properties (name, colour, size) as a unique product. Why? Recommender systems trained on these products would produce better, non-repetitive recommendations. How? Classify pairs as match or non-match, based on how similar they are, making use of known catalog features.
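The pairwise match / non-match idea from the Identity Resolution slide can be sketched as follows. This is a simplified stand-in, not the method used in the talk: a fixed threshold on a hand-weighted similarity score replaces a trained classifier, and the product records, the weights (0.6 / 0.2 / 0.2), the threshold of 0.8, and the function names are all assumptions made for illustration. Only the Python standard library is used.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(p, q):
    """Combine name, colour and size agreement into one score (assumed weights)."""
    name = name_similarity(p["name"], q["name"])
    colour = 1.0 if p["colour"] == q["colour"] else 0.0
    size = 1.0 if p["size"] == q["size"] else 0.0
    return 0.6 * name + 0.2 * colour + 0.2 * size


def classify_pairs(products, threshold=0.8):
    """Label every product pair as a match (True) or non-match (False)."""
    return {
        (p["id"], q["id"]): pair_score(p, q) >= threshold
        for p, q in combinations(products, 2)
    }


# Hypothetical catalog entries: A and B are the same product listed twice.
catalog = [
    {"id": "A", "name": "Cotton T-Shirt", "colour": "blue", "size": "M"},
    {"id": "B", "name": "cotton t-shirt", "colour": "blue", "size": "M"},
    {"id": "C", "name": "Leather Wallet", "colour": "brown", "size": "one size"},
]

matches = classify_pairs(catalog)
for pair, is_match in matches.items():
    print(pair, "match" if is_match else "non-match")
```

In practice the per-feature similarities would feed a trained classifier, and comparing every pair grows quadratically with catalog size, so candidate pairs are usually pre-filtered first.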