This document discusses machine learning applications and provides examples. It begins with an overview of machine learning algorithms being used in parallel to combine results from individual classifiers and extract all possible information from datasets. It then provides examples of mobile marketplaces using machine learning for fraud detection, personalization, and other applications. It concludes by discussing how machine learning can be incorporated into design to make use of visual, aural, corporal, and environmental inputs.
Taking forward change in technology-enhanced education - Richard Hall
My presentation for the JISC-funded Strategy Cascade: Taking forward change in technology-enhanced education workshop, run by Mark Johnson [University of Bolton] and Keith Smythe [Edinburgh Napier University]. See: http://strategycascade.wordpress.com/
We’ve seen many major industries undergo dramatic change in the last decade (i.e. manufacturing, newspapers, and customer service). With the introduction of MOOCs, adaptive learning systems, and content-delivery platforms, higher education doesn’t seem as “untouchable” as it used to. How can you stay ahead of the trends and stay relevant in this new world of technology-enhanced education?
Evidence-based practice in technology-enhanced learning - Jisc
How much do we know about what works in technology-enhanced learning in higher education?
How can universities and course teams ensure that they’re making most effective use of technology to improve students’ learning experience?
In this workshop you will hear from a range of universities on how they explore impact and what they’ve discovered about what works, and share any findings of your own.
We will also discuss how the evidence base can be brought together and made more accessible.
How does technology-enhanced learning contribute to teaching excellence? - Jisc
Speakers:
Sarah Davies, head of higher education and student experience, Jisc
Dr Rhona Sharpe, deputy HR director and head of OCSLD, Oxford Brookes University
Prof Paul Bartholomew, pro vice-chancellor student experience, Ulster University
The introduction of the Teaching Excellence Framework (TEF) has focused attention on how technology-enhanced learning contributes to teaching excellence, and how we can begin to evidence this.
In this session our speakers will consider what strategies universities can use to engage staff and students in order to make the most of technology to support learning, teaching and the student experience.
We also discuss how pedagogy can drive take-up of technology enhanced learning, and how technology-enhanced approaches can contribute to the TEF.
Presentation slides from the IC2020 conference
https://webikeo.fr/webinar/ic-2-partie-1
Yoan Chabot, Thomas Labbé, Jixiong Liu, Raphaël Troncy
DAGOBAH: A context-independent semantic annotation system for tabular data
Crude-Oil Scheduling Technology: moving from simulation to optimization - Brenno Menezes
Scheduling technology in today's crude-oil refining industries, whether commercial or homegrown, relies on complex simulation of scenarios in which the user alone makes many different decisions manually in the search for feasible solutions over a limited time horizon, i.e., trial-and-error heuristics. As a normal outcome, schedulers abandon these solutions and return to their simpler spreadsheet simulators due to: (i) the time-consuming effort of configuring and managing numerous scheduling scenarios, and (ii) the need to update premises and situations that are constantly changing. Moving to solutions based on optimization rather than simulation, this lecture describes the next steps in the refactoring of scheduling technology at PETROBRAS, treating separately the graphical user interface (GUI) and data-communication developments (non-modeling related) from the modeling and process-engineering work, in an automated decision-making environment with built-in problem-representation facilities and integrated data-handling features, among other techniques, in a smart scheduling frontline.
Results of the GPUs for GEC Competition held at GECCO 2013.
Organizers
Daniele Loiacono, Politecnico di Milano
Antonino Tumeo, Pacific Northwest National Laboratory
Webpage
http://gpu.geccocompetitions.com
Recent Developments in Computational Methods for the Analysis of Ducted Prope... - João Baltazar
This paper presents an overview of the recent developments at IST and MARIN in applying computational methods for the hydrodynamic analysis of ducted propellers. The developments focus on the propeller performance prediction in open water conditions using Boundary Element Methods and Reynolds-averaged Navier-Stokes solvers. The paper starts with an estimation of the numerical errors involved in both methods. Then, the different viscous mechanisms involved in the ducted propeller flow are discussed and numerical procedures for the potential flow solution proposed. Finally, the numerical predictions are compared with experimental measurements.
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC, at MLconf NYC - MLconf
Graph Traversal at 30 billion edges per second with NVIDIA GPUs: I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU you can analyze the bitcoin transaction graph in 0.35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in 0.13 seconds, for a throughput of 32 billion traversed edges per second (32 GTEPS). I will explain why GPUs are an interesting option for data-intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform.
MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is hundreds of times faster than main-memory CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo. Learn more at http://MapGraph.io.
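The vertex-centric ("think like a vertex") abstraction mentioned above can be illustrated with a minimal, CPU-only Python sketch of frontier-based breadth-first search; the function and the tiny graph are illustrative only and are not MapGraph's actual API.

```python
from collections import defaultdict

def vertex_bfs(edges, source):
    """Frontier-based BFS in the vertex-centric style: each round expands
    the current frontier by one hop, recording the first-visit depth."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:            # "scatter": each frontier vertex
            for v in adj[u]:          # pushes updates along its out-edges
                if v not in depth:    # "apply": first writer wins
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth

# Tiny directed graph: 0 -> 1 -> 3 and 0 -> 2 -> 3
depths = vertex_bfs([(0, 1), (0, 2), (1, 3), (2, 3)], 0)
```

On a GPU, the inner scatter loop is what gets parallelized across many threads; the Python version only shows the traversal pattern.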
Despite the existence of data analysis tools such as R, SQL, and Excel, these tools are still insufficient to cope with today's big-data analysis needs.
The author proposes a CUI (Character User Interface) toolset with dozens of functions to neatly handle tabular data in TSV (Tab Separated Values) files.
It implements many basic and useful functions that are not found in existing software, with each function following the Unix philosophy and covering the most frequent pre-analysis tasks during the initial exploratory stage of data analysis projects.
It also greatly speeds up basic analysis tasks, such as drawing cross tables and Venn diagrams, for which existing software inevitably requires rather complicated programming and debugging.
Here, tabular data mainly means TSV (Tab-Separated Values) files as well as other CSV (Comma Separated Value)-type files which are all widely used for storing data and suitable for data analysis.
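As an illustration of the kind of Unix-philosophy filter the abstract describes, here is a minimal Python sketch of one such function, a cross table over TSV input. The function name and interface are hypothetical, not the author's toolset.

```python
import csv
import io
from collections import Counter

def crosstab_tsv(tsv_text, row_col, col_col):
    """Count co-occurrences of two TSV columns, in the spirit of a small
    Unix-style filter: TSV in, TSV cross table out."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    counts = Counter((r[row_col], r[col_col]) for r in rows)
    col_vals = sorted({c for _, c in counts})
    row_vals = sorted({r for r, _ in counts})
    lines = ["\t".join([""] + col_vals)]            # header row
    for r in row_vals:
        cells = [str(counts.get((r, c), 0)) for c in col_vals]
        lines.append("\t".join([r] + cells))
    return "\n".join(lines)

data = "shop\titem\nA\tx\nA\tx\nA\ty\nB\tx\n"
table = crosstab_tsv(data, "shop", "item")
```

Because the output is itself TSV, the result composes with other filters in a pipeline, which is the point of the Unix-philosophy design the abstract advocates.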
Presentation by Arjen Markus (Deltares) at the Data Science Symposium 2018, during Delft Software Days - Edition 2018. Thursday 15 November 2018, Delft.
• Used SAS for exploratory analysis of the data, then found the optimal model by testing many candidate models; performed residual analysis and model diagnostics followed by forecast analysis.
• We used time plots, distribution analysis, correlation analysis and stationarity analysis to find the optimal model. The forecast analysis confirmed that the model fit well. It provided a simple parametric function that can be used to describe the volatility evolution, and a simple approach to calculating the value at risk of a financial position in risk management.
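The value-at-risk calculation mentioned in the second bullet can be sketched as one-period parametric VaR under an assumed normal-returns model; the function and the position/volatility figures below are illustrative, not the project's actual SAS code.

```python
from statistics import NormalDist

def parametric_var(position, mu, sigma, confidence=0.99):
    """One-period parametric VaR under a normal-returns assumption:
    the loss threshold exceeded with probability (1 - confidence)."""
    z = NormalDist().inv_cdf(1.0 - confidence)  # ~ -2.326 at 99%
    worst_return = mu + z * sigma               # quantile of the return
    return -position * worst_return             # positive number = loss

# A $1M position with zero mean daily return and 2% daily volatility:
var_99 = parametric_var(1_000_000, mu=0.0, sigma=0.02, confidence=0.99)
```

A fitted volatility model like the one the bullet describes would supply a time-varying `sigma` to the same formula.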
Abstract: Iterative stencils represent the core computational kernel of many applications belonging to different domains, from scientific computing to finance. Given the complex dependencies and the low computation-to-memory-access ratio, these kernels represent a challenging acceleration target on every architecture. This is especially true for FPGAs, whose direct hardware execution offers the possibility of high performance and power efficiency, but where the non-fixed architecture can lead to very large solution spaces to be explored.
In this work, we build upon a previously presented FPGA-based acceleration methodology for iterative stencil algorithms, in which we provide a dataflow architectural template that implements optimal on-chip buffering and scales almost linearly in performance using a technique denoted as iterations queuing. In particular, we propose a set of design improvements and develop an accurate analytical performance model that can be used to support exploration of the design space. Experimental results obtained by implementing a set of benchmarks from different application domains on a Xilinx VC707 board show average performance and power-efficiency increases over the previous work of around 22x and 8x respectively, and a prediction error that is on average less than 1%.
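For readers unfamiliar with the kernel class, a minimal Python sketch of an iterative stencil (a 1D 3-point Jacobi sweep) is below; it illustrates only the dependency pattern and low compute-to-memory ratio the abstract refers to, and says nothing about the paper's FPGA dataflow template.

```python
def jacobi_1d(u, iters):
    """Iterative 3-point stencil: each sweep replaces every interior
    point with the average of its neighbours (boundary values fixed)."""
    u = list(u)
    for _ in range(iters):
        nxt = list(u)
        for i in range(1, len(u) - 1):
            # One multiply-add per point, but three memory reads:
            # the low arithmetic intensity that makes stencils hard.
            nxt[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = nxt
    return u

# Heat-equation-like relaxation: endpoints pinned at 0 and 1
result = jacobi_1d([0.0, 0.0, 0.0, 0.0, 1.0], iters=200)
```

The sweeps converge to the linear profile between the fixed endpoints; "iterations queuing" in the paper refers to chaining such sweeps in hardware rather than looping in software.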
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-yu
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Chen-Ping Yu, Co-founder and CEO of Phiar, presents the "Separable Convolutions for Efficient Implementation of CNNs and Other Vision Algorithms" tutorial at the May 2019 Embedded Vision Summit.
Separable convolutions are an important technique for implementing efficient convolutional neural networks (CNNs), made popular by MobileNet’s use of depthwise separable convolutions. But separable convolutions are not a new concept, and their utility is not limited to CNNs. Separable convolutions have been widely studied and employed in classical computer vision algorithms as well, in order to reduce computation demands.
We begin this talk with an introduction to separable convolutions. We then explore examples of their application in classical computer vision algorithms and in efficient CNNs, comparing some recent neural network models. We also examine practical considerations of when and how to best utilize separable convolutions in order to maximize their benefits.
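The core idea of the talk can be checked in a few lines of pure Python: a kernel that factors into the outer product of a column and a row vector can be applied as two cheap 1D passes instead of one 2D pass. The Sobel-like kernel and image below are invented for illustration.

```python
def conv2d(img, k):
    """Valid-mode 2D correlation with a small kernel (pure Python)."""
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(len(img) - kh + 1):
        row_out = []
        for j in range(len(img[0]) - kw + 1):
            row_out.append(sum(img[i + a][j + b] * k[a][b]
                               for a in range(kh) for b in range(kw)))
        out.append(row_out)
    return out

# A separable kernel is an outer product of a column and a row vector:
col, row = [1, 2, 1], [1, 0, -1]
k2d = [[c * r for r in row] for c in col]   # 3x3 Sobel-like kernel

img = [[float(i * j % 7) for j in range(6)] for i in range(6)]

full = conv2d(img, k2d)                # one 3x3 pass: 9 mults per pixel
vert = conv2d(img, [[c] for c in col]) # 3x1 pass
sep = conv2d(vert, [row])              # then 1x3 pass: 6 mults per pixel
```

For a k x k kernel the cost drops from k² to 2k multiplies per output pixel, which is the saving that both classical vision pipelines and depthwise-separable CNN layers exploit.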
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Short notes on adjusting primitives for graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
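The float-vs-bfloat16 storage comparison in these notes can be mimicked in pure Python by truncating values to bfloat16 precision before and after each accumulation. This is an illustrative emulation of the storage-type effect, not the CUDA/OpenMP code the notes benchmark.

```python
import struct

def to_bf16(x):
    """Round a Python float to bfloat16 precision by truncating a
    float32 to its top 16 bits, emulating a low-precision storage type."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def sum_with_storage(xs, convert):
    """Element sum where every loaded value and the running accumulator
    live in the given storage type."""
    acc = 0.0
    for x in xs:
        acc = convert(acc + convert(x))
    return acc

xs = [0.001] * 10_000
exact = sum(xs)                        # ~10.0 in double precision
bf16 = sum_with_storage(xs, to_bf16)   # stalls: bf16 has ~8 mantissa bits
```

Once the accumulator grows past a few tenths, adding 0.001 no longer changes its 8-bit mantissa, so the naive bf16 sum stalls far below the true value; this is why low-precision storage is usually paired with a higher-precision accumulator or a pairwise/tree reduction.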
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
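For reference, the Monolithic PageRank baseline that Levelwise PageRank is compared against can be sketched as a plain power iteration in Python, with dead-end (dangling) rank redistributed globally; this is a generic sketch, not the report's CPU/GPU implementation.

```python
def pagerank(adj, n, damping=0.85, iters=100):
    """Standard (monolithic) PageRank by power iteration: every vertex
    is updated in every iteration. adj maps u -> list of out-neighbours."""
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        dangling = 0.0
        for u in range(n):
            out = adj.get(u, [])
            if not out:
                dangling += rank[u]      # dead end: share rank globally
            else:
                share = rank[u] / len(out)
                for v in out:
                    nxt[v] += damping * share
        for v in range(n):
            nxt[v] += damping * dangling / n
        rank = nxt
    return rank

# 3-vertex cycle: symmetric, so all ranks converge to 1/3
ranks = pagerank({0: [1], 1: [2], 2: [0]}, 3)
```

Levelwise PageRank would instead iterate each strongly connected component to convergence in topological order; the abstract's "no dead ends" precondition exists because the global dangling-rank redistribution above breaks that per-level independence.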
2. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
LHC - CERN
• Located on the outskirts of Geneva, on the France-Switzerland border
• 27 km in circumference
• The tunnel is buried around 50 to 175 m underground
4. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Multiple Algorithms in Parallel: Boosted Decision Trees, Bayesian Neural Networks, Matrix Elements
Using another ML algorithm to combine the results of the individual classifiers.
Purpose: extract all possible information from the dataset.
The combination produces an output from which all measurements are obtained.
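The combination step described on this slide, feeding the outputs of individual classifiers into another ML algorithm, is often called stacking. Below is a toy Python sketch with a tiny perceptron as the combining model; the base-classifier scores and labels are invented for illustration.

```python
def perceptron_train(features, labels, epochs=50, lr=0.1):
    """Tiny perceptron used as the combining ('stacking') model over
    the scores produced by the individual base classifiers."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:  # mistake-driven update
                for i in range(len(w)):
                    w[i] += lr * (y - pred) * x[i]
                b += lr * (y - pred)
    return w, b

def combine(w, b, scores):
    """Final decision from the combined base-classifier scores."""
    return 1 if sum(wi * si for wi, si in zip(w, scores)) + b > 0 else 0

# Each row: scores from two hypothetical base classifiers for one event
base_scores = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]                  # 1 = signal, 0 = background
w, b = perceptron_train(base_scores, labels)
preds = [combine(w, b, s) for s in base_scores]
```

The learned weights play the role the slide describes: they decide how much to trust each individual classifier when producing the single combined output.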
5. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 5
Mobile Market Place
6. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Data Processing and Modelling
[Pipeline diagram: APIs + MQs (context, event, data); Data Lake (HBase, Cassandra, etc.); Stream Processing (event); Batch Processing (data); Model Generator (Feature Selection → Model Training → Model Evaluation → Model Assembly); Decision Engine; transaction grade; Real-Time Layer; Batch Processing Layer.]
Data Science:
1. Fraud Detection
2. Search
3. Recommendations
4. Notifications
5. Ratings
6. Merchant Intelligence
7. Engagement Optimization
8. Marketing Optimization
9. App Personalization
10. Ad Network Support
11. Image / Speech Recognition
Theory (Math, Algorithms) → Proof-of-Concept (R, Python, Scala, C++) → Spark Implementation (Scalability, Robustness) → Platform Integration
7. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Fraud Detection
• Very small number of fraud cases
• Large number of good transactions
• Many different “types” of anomalies. Hard for algorithms to learn from positive examples what the anomalies look like
• Future anomalies may look nothing like any of the anomalous examples we’ve seen so far
8. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Personalization
• Offers targeted for each user
• Use browsing history and shopping habits to determine products the user is most likely to buy
• Similarity among users
• Similarity among items
• Catalog search results
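The "similarity among users / items" bullets can be sketched with plain cosine similarity over a toy ratings matrix; the user names and purchase counts below are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Rows = users, columns = items (hypothetical purchase counts)
ratings = {
    "alice": [5, 4, 0],
    "bob":   [4, 5, 1],
    "carol": [0, 0, 5],
}

def most_similar_user(target):
    """'Similarity among users': rank the other users by cosine
    similarity to the target's rating vector."""
    others = [(u, cosine(ratings[target], v))
              for u, v in ratings.items() if u != target]
    return max(others, key=lambda t: t[1])[0]

match = most_similar_user("alice")
```

A user-based recommender would then surface items the most similar user bought that the target has not; the item-based variant applies the same similarity over the matrix's columns instead of its rows.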
9. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 9
Incorporating ML to Design
Visual Inputs
Aural Inputs
Corporal Inputs
Environmental Inputs
10. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Creating Dialogue
• Machine learning algorithms are capable of discovering patterns in the data presented to them. How can we make use of this?
• Find discovery opportunities that are only possible with the help of machine learning
• Designers and programmers should establish a strong collaboration to find ground-breaking applications
• Understand the rules, to know which ones to bend or break
12. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Search Strategy
Initial objects → Found it!
[Figure: two panels of Invariant Mass (GeV/c²) vs Events/(0.05).]
FIG. 16: Ξb mass distribution of background events from J/ψ sideband events after all selection cuts have been applied (top), and these events (red squares) on top of the signal observed in right-sign combination events (open circles) (bottom).
3. Ξb reconstruction on Λb → J/ψ Λ(pπ) MC events.
We applied our Ξb selection to 30K generated Λb → J/ψ Λ(pπ) MC events. This is p17 MC with the same cuts at generation level as those applied to our Ξb MC, and reprocessed with the same extended configuration as used on data. No events survived after selection.
VI. CONCLUSIONS
By using a simple set of cuts we observe a signal peak with a mass of 5.774 ± 0.011 (stat) ± 0.022 (sys) GeV/c² and a width of 0.037 ± 0.008 GeV/c², with a significance of 5.53 and S/√B = 7.80. This peak is shown in Fig. 12 and the results of the fit are in Table II. This supports the previous report of the observation using Boosted Decision Trees [6]. We measure the relative production ratio to be
f(b → Ξb⁻)·Br(Ξb⁻ → J/ψ Ξ⁻(→ Λπ⁻)) / f(b → Λb)·Br(Λb → J/ψ Λ) = 0.376 ± 0.119 (stat) ± 0.188 (syst).
[1] D. Buskulic et al., Phys. Lett. B 384, 449 (1996).
[2] P. Abreu et al., Z. Phys. C 68, 541 (1995).
[3] Common Samples Group, http://wwwd0.fnal.gov/Run2Physics/cs/.
[4] See description of “J/psi & dimuon mass continuum” at http://d0server1.fnal.gov/users/nomerot/Run2A/BANA/Dskim.html.
[5] Reconstruction of B hadron signals at DØ, DØ Note 4481.
[6] DØ Note 5401.
DØ Note 5403
Version 4.1, June 5, 2007
Observation of the heavy baryon Ξb⁻
E. De La Cruz Burelo, H.A. Neal, and J. Qian (University of Michigan)
B. Abbott (University of Oklahoma)
G.D. Alexeev, Yu.P. Merekov, G.A. Panov, A.M. Rozhdestvensky, L.S. Vertogradov, Yu.L. Vertogradova (Joint Institute for Nuclear Research, Russia)
Using approximately 1.3 fb⁻¹ of data collected by the upgraded DØ detector in Run II of the Tevatron, the Ξb⁻ state has been observed in the decay mode J/ψ(→ µ⁺µ⁻) Ξ⁻ (Ξ⁻ → Λπ⁻, Λ → pπ). A tracking algorithm which allows a more efficient method of reconstructing tracks with large impact parameters was used in order to increase the efficiency of reconstructing the Λ and Ξ⁻. We observe the Ξb⁻ with a significance of √(2Δln L) = 5.53 and S/√B = 7.80, with a mass of 5.774 ± 0.011 (stat) ± 0.022 (sys) GeV/c². We measure the relative production ratio to be
f(b → Ξb⁻)·Br(Ξb⁻ → J/ψ Ξ⁻(→ Λπ⁻)) / f(b → Λb)·Br(Λb → J/ψ Λ) = 0.376 ± 0.119 (stat) ± 0.188 (syst).
Data Cleaning: signal to background 20:1
Initial objects → Data Cleaning → Machine Learning → Found it!
9.4.2 Observed Results
[Figure: tb+tqb DT output vs event yield, linear and log scale; D0 Run II Prelim. 2.3 fb⁻¹, p17+p20 e+µ channel, 1-2 b-tags, 2-4 jets.]
Traditional searches / Small-signal analysis: signal to background 1:20
13. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Decision Trees
Task: separate signal from background.
Issue: a single split on X or Y is not enough!
Solution: use a series of consecutive splits, generating a tree structure.
Figure 8.1: 2D plane of a simple classification problem, and a Decision Tree solving the classification problem of signal and background.
14. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Decision Trees
Split 1: on the X variable. Events either pass (P1) or fail (F1) cut C1.
15. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Decision Trees
Split 2: recover events that failed split 1 (F: C1, P: C2).
Repeat and continue the splitting process until events are classified.
16. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Decision Trees
After 4 splits, the signal and background regions are separated. Done!
Toy model: only 2 variables, easy to determine cut values.
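The splitting procedure walked through on slides 13-16 can be condensed into a small Python sketch: recursively choose the axis-aligned cut that most reduces Gini impurity until each region is pure. The toy points below mimic the 2-variable example and are not the deck's actual data.

```python
def gini(labels):
    """Gini impurity of a set of 0/1 labels (0 = pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(points, labels):
    """Try every axis-aligned threshold; return the (axis, value) pair
    with the lowest weighted child impurity."""
    best = None
    for axis in (0, 1):
        for t in sorted({p[axis] for p in points}):
            left = [l for p, l in zip(points, labels) if p[axis] < t]
            right = [l for p, l in zip(points, labels) if p[axis] >= t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, axis, t)
    return best[1], best[2]

def grow(points, labels, depth=0, max_depth=4):
    """Recursively split until a leaf is pure (or depth runs out)."""
    if not labels:
        return 0
    if gini(labels) == 0.0 or depth == max_depth:
        return round(sum(labels) / len(labels))   # leaf: majority class
    axis, t = best_split(points, labels)
    lp = [(p, l) for p, l in zip(points, labels) if p[axis] < t]
    rp = [(p, l) for p, l in zip(points, labels) if p[axis] >= t]
    return (axis, t,
            grow([p for p, _ in lp], [l for _, l in lp], depth + 1, max_depth),
            grow([p for p, _ in rp], [l for _, l in rp], depth + 1, max_depth))

def classify(tree, p):
    """Walk the tree: internal nodes are (axis, t, left, right) tuples."""
    while isinstance(tree, tuple):
        axis, t, left, right = tree
        tree = left if p[axis] < t else right
    return tree

# Toy 2D data: signal clustered at large x and y, background elsewhere
points = [(1, 1), (1, 4), (4, 1), (4, 4), (5, 5), (5, 4)]
labels = [0, 0, 0, 1, 1, 1]   # 1 = signal, 0 = background
tree = grow(points, labels)
preds = [classify(tree, p) for p in points]
```

Each recursive call is one of the slides' "splits"; on data this simple, two cuts (one on x, one on y) already leave every region pure.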
17. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016 17
A/B Testing
18. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
• Consultant @ BMO Capital Markets
• Previously:
• Data Scientist @ Paytm Labs
• Researcher - ATLAS Experiment @ CERN
• Researcher - Fermilab National Laboratory
18
Background
19. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Anomaly Detection
• Fit model on training set
• On a cross-validation/test example, predict
• Possible evaluation metrics:
• True positive, false positive, false negative, true negative
• Precision/Recall
• F1-score
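The evaluation metrics listed on this slide can be computed directly from the confusion-matrix counts; a small Python sketch with invented labels is below, chosen to show why precision/recall beat plain accuracy on imbalanced fraud-style data.

```python
def f1_report(y_true, y_pred):
    """Precision, recall and F1 from true/predicted 0-1 labels.
    On imbalanced data (slide: very few fraud cases) these are far
    more informative than accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 1 = anomaly. Predicting "all normal" would score 80% accuracy here
# yet have F1 = 0, which is why the slide lists these metrics instead.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = f1_report(y_true, y_pred)
```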
20. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Standard Model (SM)
• The SM describes the world around us
• Components: 24 particles of matter and 4 mediators
• Interactions of the particles are explained by the mediators
• Does not include: gravity, dark matter and dark energy
21. Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
Identity Resolution
• What? Identify products having similar properties (name, colour, size) as a unique product.
• Why? Recommender systems trained on these products would produce better recommendations (non-repetitive).
• How? Classify pairs as match or non-match based on how similar they are, making use of known catalog features.
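The match/non-match pair classification described above can be sketched in Python with simple similarity features over known catalog fields. The rule-based decision here is a stand-in for the trained classifier the slide implies, and the products are invented.

```python
from difflib import SequenceMatcher

def product_features(a, b):
    """Pairwise similarity features from known catalog fields; a pair
    classifier would be trained on vectors like these."""
    return {
        "name_sim": SequenceMatcher(None, a["name"].lower(),
                                    b["name"].lower()).ratio(),
        "same_colour": a["colour"] == b["colour"],
        "same_size": a["size"] == b["size"],
    }

def is_match(features, name_threshold=0.8):
    """Stand-in for the learned match/non-match classifier: a simple
    rule over the similarity features."""
    return (features["name_sim"] >= name_threshold
            and features["same_colour"] and features["same_size"])

p1 = {"name": "Acme Running Shoe", "colour": "red", "size": "9"}
p2 = {"name": "ACME running shoe", "colour": "red", "size": "9"}
p3 = {"name": "Bolt Sandal", "colour": "blue", "size": "9"}

match_12 = is_match(product_features(p1, p2))
match_13 = is_match(product_features(p1, p3))
```

Matched pairs would then be collapsed into a single catalog entry, which is what keeps the recommender's output non-repetitive.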