Center for Data Science
Paris-Saclay
CNRS & University Paris Saclay	

TektosData
BALÁZS KÉGL
RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)

COLLABORATIVE CHALLENGE WITH CODE SUBMISSION
A bit of context
UNIVERSITÉ PARIS-SACLAY
19 founding partners
UNIVERSITÉ PARIS-SACLAY
+ horizontal multi-disciplinary and multi-partner
initiatives to create cohesion
Center for Data Science
Paris-Saclay
A multi-disciplinary initiative to define, structure, and manage
the data science ecosystem at the Université Paris-Saclay
http://www.datascience-paris-saclay.fr/
Biology & bioinformatics
IBISC/UEvry
LRI/UPSud
Hepatinov
CESP/UPSud-UVSQ-Inserm
IGM-I2BC/UPSud
MIA/Agro
MIAj-MIG/INRA
LMAS/Centrale
Chemistry
EA4041/UPSud
Earth sciences
LATMOS/UVSQ
GEOPS/UPSud
IPSL/UVSQ
LSCE/UVSQ
LMD/Polytechnique
Economy
LM/ENSAE
RITM/UPSud
LFA/ENSAE
Neuroscience
UNICOG/Inserm
U1000/Inserm
NeuroSpin/CEA
Particle physics, astrophysics & cosmology
LPP/Polytechnique
DMPH/ONERA
CosmoStat/CEA
IAS/UPSud
AIM/CEA
LAL/UPSud
250 researchers in 35 laboratories
Machine learning
LRI/UPSud
LTCI/Telecom
CMLA/Cachan
LS/ENSAE
LIX/Polytechnique
MIA/Agro
CMA/Polytechnique
LSS/Supélec
CVN/Centrale
LMAS/Centrale
DTIM/ONERA
IBISC/UEvry
Visualization
INRIA
LIMSI
Signal processing
LTCI/Telecom
CMA/Polytechnique
CVN/Centrale
LSS/Supélec
CMLA/Cachan
LIMSI
DTIM/ONERA
Statistics
LMO/UPSud
LS/ENSAE
LSS/Supélec
CMA/Polytechnique
LMAS/Centrale
MIA/AgroParisTech
machine learning
information retrieval
signal processing
data visualization
databases
Domain science
human society
life
brain
earth
universe
Tool building
software engineering
clouds/grids
high-performance computing
optimization
Domain scientist
Software engineer
datascience-paris-saclay.fr
LIST/CEA
Data domains
energy and physical sciences
health and life sciences
Earth and environment
economy and society
brain
Data scientist
Data trainer
Applied scientist
Domain scientist
Software engineer
Data engineer
Data science
statistics
machine learning
information retrieval
signal processing
data visualization
databases
Tool building
software engineering
clouds/grids
high-performance computing
optimization
• (The lack of) manpower
  • especially at the interfaces
  • industrial brain-drain
• Incentives
  • data scientists are not incentivized to work on domain science
  • scientists are not incentivized to work on tools
• Access
  • no well-developed channels to identify the right experts for a given problem
• Tools
  • few tools that can help domain scientists and data scientists to collaborate efficiently
CHALLENGES
https://medium.com/@balazskegl
TWO ANALYTICS TOOLS FOR INITIATING DOMAIN-DATA SCIENCE INTERACTIONS
RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)
DATA CHALLENGES
DATA CHALLENGES
DATA CHALLENGES
• The HiggsML challenge on Kaggle
  • https://www.kaggle.com/c/higgs-boson
HUGE PUBLICITY
CLASSIFICATION FOR DISCOVERY
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
CLASSIFICATION FOR DISCOVERY
HUGE PUBLICITY
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
yet partially missing the objectives
• Challenges are useful for
  • generating visibility in the data science community about novel application domains
  • benchmarking state-of-the-art techniques fairly on well-defined problems
  • finding talented data scientists
• Limitations
  • not necessarily adapted to solving complex and open-ended data science problems in realistic environments
  • no direct access to solutions and data scientists
  • emphasizes competition
DATA CHALLENGES
HUGE PUBLICITY
We decided to design something better
• Prototyping
• Training
• Human resources
• Collaboration building, networking
• Social science observatory
RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)
RAMPS
• Single-day coding sessions
  • 20-40 participants
  • preparation is similar to challenges
• Goals
  • focusing and motivating top talents
  • promoting collaboration, speed, and efficiency
  • solving (prototyping) real problems
ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE
ANALYTICS TOOL TO PROMOTE COLLABORATION AND CODE REUSE
RAMPS
www.ramp.studio
software + management
backend is open source:
https://github.com/camillemarini/datarun
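To illustrate what a challenge with code submission involves, here is a minimal Python sketch. It is not the RAMP or datarun implementation: the class names, the `fit`/`predict` interface, and the `cross_val_rmse` and `leaderboard` helpers are all assumptions made for this example. Each participant submits a model class; the organizer trains and scores every submission with the same cross-validation folds and ranks the results.

```python
import random
import statistics


class MeanBaseline:
    """Hypothetical submission: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LinearModel:
    """Hypothetical submission: least-squares line on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = statistics.fmean(xs), statistics.fmean(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_val_rmse(model_cls, X, y, n_folds=5):
    """Train and score a submitted class on interleaved k-fold splits."""
    scores = []
    for k in range(n_folds):
        test_idx = set(range(k, len(X), n_folds))
        train = [i for i in range(len(X)) if i not in test_idx]
        test = sorted(test_idx)
        model = model_cls().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(rmse([y[i] for i in test], preds))
    return statistics.fmean(scores)


def leaderboard(submissions, X, y):
    """Rank submissions by cross-validated RMSE, best (lowest) first."""
    rows = [(name, cross_val_rmse(cls, X, y)) for name, cls in submissions.items()]
    return sorted(rows, key=lambda row: row[1])


# Synthetic data for the illustration: y = 2x + 1 plus small noise.
rng = random.Random(0)
X = [[i / 10] for i in range(50)]
y = [2.0 * row[0] + 1.0 + rng.gauss(0, 0.1) for row in X]

board = leaderboard({"mean_baseline": MeanBaseline, "linear": LinearModel}, X, y)
for name, score in board:
    print(f"{name}: RMSE = {score:.3f}")
```

Because every submission is code rather than a file of predictions, the organizer can rerun, compare, and reuse any participant's model, which is the property the slides emphasize.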
2015 Jan 15
The HiggsML challenge
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Apr 10
Classifying variable stars
RAPID ANALYTICS AND MODEL PROTOTYPING
VARIABLE STARS
VARIABLE STARS
accuracy improvement: 89% to 96%
2015 June 16 and Sept 26
Predicting El Niño
RAPID ANALYTICS AND MODEL PROTOTYPING
RMSE improvement: 0.9°C to 0.4°C
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 October 8
Insect classification
RAPID ANALYTICS AND MODEL PROTOTYPING
accuracy improvement: 30% to 70%
RAPID ANALYTICS AND MODEL PROTOTYPING
2016 February 10
Macroeconomic agent-based models
RAPID ANALYTICS AND MODEL PROTOTYPING
F1-score improvement: 0.57 to 0.63
RAPID ANALYTICS AND MODEL PROTOTYPING
2016 February 13
Epidemium cancer survival rate
RAPID ANALYTICS AND MODEL PROTOTYPING
RMSE improvement: 3000 to 300
RAPID ANALYTICS AND MODEL PROTOTYPING
2016 May 11
Drug identification from spectra
RAPID ANALYTICS AND MODEL PROTOTYPING
Drug identification error improvement: 9% to 3%
Drug concentration accuracy improvement: 20% to 12%
RAPID ANALYTICS AND MODEL PROTOTYPING
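The RAMP results above are reported as improvements in accuracy, RMSE, and F1-score. For readers less familiar with these metrics, here is a minimal Python sketch of how each is computed. The toy labels and values are invented for illustration and are not data from any RAMP; the F1-score shown is the binary version, and the slides do not say which averaging was used in the actual events.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy classification labels (illustrative only).
labels_true = [1, 0, 1, 1, 0, 1]
labels_pred = [1, 0, 0, 1, 1, 1]
acc = accuracy(labels_true, labels_pred)
f1 = f1_score(labels_true, labels_pred)

# Toy regression targets (illustrative only).
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

print(f"accuracy={acc:.3f}  F1={f1:.3f}  RMSE={err:.3f}")
```

Accuracy and F1-score are better when higher, while RMSE is better when lower, which is why the RMSE improvements on the slides go from a larger number to a smaller one.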
• Fast development of analytics solutions
• Teaching support
• Networking and HR support
• Support for collaborative team work
• Commercialized through TektosData
THE RAMP TOOL
A prototyping tool for collaborative
development of data science workflows
• We have a cool tool for collaborative data analytics
  • designing workflows beyond scikit-learn predictors
• Data management/munging is a big part of the data analytics workflow; we need tools
  • preparing a RAMP takes two weeks to six months
• Big data is rare: our problems are more about flexible organization of heterogeneous data
TAKE HOME MESSAGES
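One way to read "workflows beyond scikit-learn predictors" is a workflow with more than one submitted component, for example a feature extractor followed by a predictor. The Python sketch below illustrates that idea under stated assumptions: the `FeatureExtractor` and `Regressor` classes, their methods, and `run_workflow` are hypothetical names chosen for this example, not the RAMP tool's API.

```python
import statistics


class FeatureExtractor:
    """Hypothetical component: turns a raw series into rows of lagged values."""

    def __init__(self, n_lags=3):
        self.n_lags = n_lags

    def transform(self, series):
        n = self.n_lags
        # Row i holds the n values that precede series[i].
        return [series[i - n:i] for i in range(n, len(series))]


class Regressor:
    """Hypothetical component: mean of the lags plus a learned offset."""

    def fit(self, X, y):
        residuals = [t - statistics.fmean(row) for row, t in zip(X, y)]
        self.bias_ = statistics.fmean(residuals)
        return self

    def predict(self, X):
        return [statistics.fmean(row) + self.bias_ for row in X]


def run_workflow(extractor, regressor, train_series, test_series):
    """Chain the two components: extract features, fit, then predict."""
    n = extractor.n_lags
    X_train = extractor.transform(train_series)
    y_train = train_series[n:]
    regressor.fit(X_train, y_train)
    X_test = extractor.transform(test_series)
    return regressor.predict(X_test), test_series[n:]


# Toy series (illustrative only): a steadily increasing signal.
train = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test = [10, 11, 12, 13, 14]
preds, targets = run_workflow(FeatureExtractor(n_lags=3), Regressor(), train, test)
print(preds, targets)
```

Splitting the workflow this way lets one participant improve the features while another improves the predictor, and either component can be reused with a different partner component.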
THANK YOU!

More Related Content

Viewers also liked

Game On! Using Video Games to Ramp Up Your Instruction
Game On! Using Video Games to Ramp Up Your InstructionGame On! Using Video Games to Ramp Up Your Instruction
Game On! Using Video Games to Ramp Up Your Instruction
Jennifer LaGarde
 
How to Beat NYC Parking Ticket For: (1) Missing Required Elements (2) Pedest...
How to Beat NYC Parking Ticket For:  (1) Missing Required Elements (2) Pedest...How to Beat NYC Parking Ticket For:  (1) Missing Required Elements (2) Pedest...
How to Beat NYC Parking Ticket For: (1) Missing Required Elements (2) Pedest...
Lawrence Berezin
 
RAMP - Your way out of the modernisation maze
RAMP - Your way out of the modernisation mazeRAMP - Your way out of the modernisation maze
RAMP - Your way out of the modernisation maze
LANSA
 
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
LinkedIn Singapore
 
Time After Time: Creating Reliable Administrative Methods Process (RAMP)
Time After Time: Creating Reliable Administrative Methods Process (RAMP)Time After Time: Creating Reliable Administrative Methods Process (RAMP)
Time After Time: Creating Reliable Administrative Methods Process (RAMP)
Beckie Layton
 
Entreprise Sylvie Lacelle
Entreprise Sylvie LacelleEntreprise Sylvie Lacelle
Entreprise Sylvie Lacelle
Sylvie Lacelle
 
1.07 D.A Other Heat Demos
1.07 D.A Other Heat Demos1.07 D.A Other Heat Demos
1.07 D.A Other Heat Demos
phyz
 
RAMP: Repository Analytics and Metrics Portal
RAMP: Repository Analytics and Metrics PortalRAMP: Repository Analytics and Metrics Portal
RAMP: Repository Analytics and Metrics Portal
Kenning Arlitsch
 
FG Gardiner New Simcoe Ramp - display boards
FG Gardiner   New Simcoe Ramp - display boardsFG Gardiner   New Simcoe Ramp - display boards
FG Gardiner New Simcoe Ramp - display boards
Toronto Public Consultation Unit
 
Stage 2011 Ret D
Stage 2011 Ret DStage 2011 Ret D
Stage 2011 Ret DCerestech
 
Circulatory & respiratory systems
Circulatory & respiratory systemsCirculatory & respiratory systems
Circulatory & respiratory systems
Mr. Fields' Class
 
Concept 30.2
Concept 30.2Concept 30.2
Concept 30.2
Javier Aguirre
 

Viewers also liked (13)

Game On! Using Video Games to Ramp Up Your Instruction
Game On! Using Video Games to Ramp Up Your InstructionGame On! Using Video Games to Ramp Up Your Instruction
Game On! Using Video Games to Ramp Up Your Instruction
 
How to Beat NYC Parking Ticket For: (1) Missing Required Elements (2) Pedest...
How to Beat NYC Parking Ticket For:  (1) Missing Required Elements (2) Pedest...How to Beat NYC Parking Ticket For:  (1) Missing Required Elements (2) Pedest...
How to Beat NYC Parking Ticket For: (1) Missing Required Elements (2) Pedest...
 
RAMP - Your way out of the modernisation maze
RAMP - Your way out of the modernisation mazeRAMP - Your way out of the modernisation maze
RAMP - Your way out of the modernisation maze
 
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
LinkedIn Content Marketing 2.0 Seminar: Five Tactics to Ramp & Scale Your Con...
 
Time After Time: Creating Reliable Administrative Methods Process (RAMP)
Time After Time: Creating Reliable Administrative Methods Process (RAMP)Time After Time: Creating Reliable Administrative Methods Process (RAMP)
Time After Time: Creating Reliable Administrative Methods Process (RAMP)
 
Stage 2011
Stage 2011Stage 2011
Stage 2011
 
Entreprise Sylvie Lacelle
Entreprise Sylvie LacelleEntreprise Sylvie Lacelle
Entreprise Sylvie Lacelle
 
1.07 D.A Other Heat Demos
1.07 D.A Other Heat Demos1.07 D.A Other Heat Demos
1.07 D.A Other Heat Demos
 
RAMP: Repository Analytics and Metrics Portal
RAMP: Repository Analytics and Metrics PortalRAMP: Repository Analytics and Metrics Portal
RAMP: Repository Analytics and Metrics Portal
 
FG Gardiner New Simcoe Ramp - display boards
FG Gardiner   New Simcoe Ramp - display boardsFG Gardiner   New Simcoe Ramp - display boards
FG Gardiner New Simcoe Ramp - display boards
 
Stage 2011 Ret D
Stage 2011 Ret DStage 2011 Ret D
Stage 2011 Ret D
 
Circulatory & respiratory systems
Circulatory & respiratory systemsCirculatory & respiratory systems
Circulatory & respiratory systems
 
Concept 30.2
Concept 30.2Concept 30.2
Concept 30.2
 

Similar to RAMP: Collaborative challenge with code submission

The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data Science
Balázs Kégl
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
Balázs Kégl
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team work
Balázs Kégl
 
RAMP Data Challenge
RAMP Data Challenge RAMP Data Challenge
RAMP Data Challenge
Proto204
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challenges
Balázs Kégl
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
Pedro Príncipe
 
Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
Goethe Univeristy
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
odsc
 
Patrick Consorti for ISSIP
Patrick Consorti for ISSIPPatrick Consorti for ISSIP
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
Institute of Contemporary Sciences
 
Updates from Apereo Learning Analytics Initiative (Apereo LAI)
Updates from Apereo Learning Analytics Initiative (Apereo LAI)Updates from Apereo Learning Analytics Initiative (Apereo LAI)
Updates from Apereo Learning Analytics Initiative (Apereo LAI)
Sandeep M. Jayaprakash
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
Juuso Parkkinen
 
Winning Marie Curie with Open Science 2018
Winning Marie Curie with Open Science 2018Winning Marie Curie with Open Science 2018
Winning Marie Curie with Open Science 2018
Ivo Grigorov
 
Dolap13 v9 7.docx
Dolap13 v9 7.docxDolap13 v9 7.docx
Dolap13 v9 7.docx
Hassan Nazih
 
Efficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular ImagingEfficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular Imaging
PetteriTeikariPhD
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE
 
How to become the best datascientist in Europe
How to become the best datascientist in EuropeHow to become the best datascientist in Europe
How to become the best datascientist in Europe
DigitYser
 
Smb 25092014 helma rutjes pivot park
Smb 25092014 helma rutjes   pivot parkSmb 25092014 helma rutjes   pivot park
Smb 25092014 helma rutjes pivot park
SMBBV
 
Semantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSemantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and Practices
Steffen Staab
 

Similar to RAMP: Collaborative challenge with code submission (20)

The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data Science
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team work
 
RAMP Data Challenge
RAMP Data Challenge RAMP Data Challenge
RAMP Data Challenge
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challenges
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Patrick Consorti for ISSIP
Patrick Consorti for ISSIPPatrick Consorti for ISSIP
Patrick Consorti for ISSIP
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
Updates from Apereo Learning Analytics Initiative (Apereo LAI)
Updates from Apereo Learning Analytics Initiative (Apereo LAI)Updates from Apereo Learning Analytics Initiative (Apereo LAI)
Updates from Apereo Learning Analytics Initiative (Apereo LAI)
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
Winning Marie Curie with Open Science 2018
Winning Marie Curie with Open Science 2018Winning Marie Curie with Open Science 2018
Winning Marie Curie with Open Science 2018
 
Dolap13 v9 7.docx
Dolap13 v9 7.docxDolap13 v9 7.docx
Dolap13 v9 7.docx
 
Efficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular ImagingEfficient Data Labelling for Ocular Imaging
Efficient Data Labelling for Ocular Imaging
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!
 
How to become the best datascientist in Europe
How to become the best datascientist in EuropeHow to become the best datascientist in Europe
How to become the best datascientist in Europe
 
Smb 25092014 helma rutjes pivot park
Smb 25092014 helma rutjes   pivot parkSmb 25092014 helma rutjes   pivot park
Smb 25092014 helma rutjes pivot park
 
Semantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSemantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and Practices
 

More from Balázs Kégl

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural nets
Balázs Kégl
 
Model-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systemsModel-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systems
Balázs Kégl
 
Managing the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loopManaging the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loop
Balázs Kégl
 
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
Balázs Kégl
 
Machine learning in scientific workflows
Machine learning in scientific workflowsMachine learning in scientific workflows
Machine learning in scientific workflows
Balázs Kégl
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricks
Balázs Kégl
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiatives
Balázs Kégl
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
Balázs Kégl
 

More from Balázs Kégl (8)

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural nets
 
Model-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systemsModel-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systems
 
Managing the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loopManaging the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loop
 
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
 
Machine learning in scientific workflows
Machine learning in scientific workflowsMachine learning in scientific workflows
Machine learning in scientific workflows
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricks
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiatives
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
 

Recently uploaded

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 

Recently uploaded (20)

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理

RAMP: Collaborative challenge with code submission

  • 1. Center for Data Science Paris-Saclay, CNRS & University Paris Saclay, TektosData. BALÁZS KÉGL. RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP): COLLABORATIVE CHALLENGE WITH CODE SUBMISSION
  • 2. A bit of context
  • 3. UNIVERSITÉ PARIS-SACLAY: 19 founding partners
  • 4. UNIVERSITÉ PARIS-SACLAY: + horizontal multi-disciplinary and multi-partner initiatives to create cohesion
  • 5. Center for Data Science Paris-Saclay: a multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay (http://www.datascience-paris-saclay.fr/). 250 researchers in 35 laboratories.
 Biology & bioinformatics: IBISC/UEvry, LRI/UPSud, Hepatinov, CESP/UPSud-UVSQ-Inserm, IGM-I2BC/UPSud, MIA/Agro, MIAj-MIG/INRA, LMAS/Centrale.
 Chemistry: EA4041/UPSud.
 Earth sciences: LATMOS/UVSQ, GEOPS/UPSud, IPSL/UVSQ, LSCE/UVSQ, LMD/Polytechnique.
 Economy: LM/ENSAE, RITM/UPSud, LFA/ENSAE.
 Neuroscience: UNICOG/Inserm, U1000/Inserm, NeuroSpin/CEA.
 Particle physics, astrophysics & cosmology: LPP/Polytechnique, DMPH/ONERA, CosmoStat/CEA, IAS/UPSud, AIM/CEA, LAL/UPSud.
 Machine learning: LRI/UPSud, LTCI/Telecom, CMLA/Cachan, LS/ENSAE, LIX/Polytechnique, MIA/Agro, CMA/Polytechnique, LSS/Supélec, CVN/Centrale, LMAS/Centrale, DTIM/ONERA, IBISC/UEvry.
 Visualization: INRIA, LIMSI.
 Signal processing: LTCI/Telecom, CMA/Polytechnique, CVN/Centrale, LSS/Supélec, CMLA/Cachan, LIMSI, DTIM/ONERA.
 Statistics: LMO/UPSud, LS/ENSAE, LSS/Supélec, CMA/Polytechnique, LMAS/Centrale, MIA/AgroParisTech.
 LIST/CEA.
 Diagram labels: machine learning, information retrieval, signal processing, data visualization, databases. Domain science: human society, life, brain, earth, universe. Tool building: software engineering, clouds/grids, high-performance computing, optimization. Roles: domain scientist, software engineer.
  • 6. CHALLENGES (https://medium.com/@balazskegl)
 Diagram labels: Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain. Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases. Tool building: software engineering, clouds/grids, high-performance computing, optimization. Roles: data scientist, data trainer, applied scientist, domain scientist, software engineer, data engineer.
 (The lack of) manpower: especially at the interfaces; industrial brain-drain.
 Incentives: data scientists are not incentivized to work on domain science; scientists are not incentivized to work on tools.
 Access: no well-developed channels to identify the right experts for a given problem.
 Tools: few tools that can help domain scientists and data scientists to collaborate efficiently.
  • 7. TWO ANALYTICS TOOLS FOR INITIATING DOMAIN-DATA SCIENCE INTERACTIONS: RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP); DATA CHALLENGES
  • 8. DATA CHALLENGES
  • 9. DATA CHALLENGES: The HiggsML challenge on Kaggle, https://www.kaggle.com/c/higgs-boson
  • 10. HUGE PUBLICITY (figure: "Classification for discovery")
  • 11. SIGNIFICANT IMPROVEMENT OVER THE BASELINE (figure: "Classification for discovery")
  • 12. HUGE PUBLICITY, SIGNIFICANT IMPROVEMENT OVER THE BASELINE, yet partially missing the objectives
  • 13. DATA CHALLENGES. Challenges are useful for: generating visibility in the data science community about novel application domains; benchmarking state-of-the-art techniques fairly on well-defined problems; finding talented data scientists. Limitations: not necessarily suited to solving complex and open-ended data science problems in realistic environments; no direct access to solutions and data scientists; emphasizes competition.
  • 14. HUGE PUBLICITY. We decided to design something better
  • 15. RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP): Prototyping; Training; Human resources; Collaboration building, networking; Social science observatory
  • 16. RAMPS: Single-day coding sessions; 20-40 participants; preparation is similar to challenges. Goals: focusing and motivating top talents; promoting collaboration, speed, and efficiency; solving (prototyping) real problems
  • 17. ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE
  • 18. ANALYTICS TOOL TO PROMOTE COLLABORATION AND CODE REUSE
  • 19. RAMPS: www.ramp.studio. Software + management backend is open source: https://github.com/camillemarini/datarun
  • 20. RAPID ANALYTICS AND MODEL PROTOTYPING: 2015 Jan 15, The HiggsML challenge
  • 21. RAPID ANALYTICS AND MODEL PROTOTYPING: 2015 Apr 10, Classifying variable stars
  • 22. VARIABLE STARS
  • 23. VARIABLE STARS: accuracy improvement: 89% to 96%
  • 24. RAPID ANALYTICS AND MODEL PROTOTYPING: 2015 June 16 and Sept 26, Predicting El Niño
  • 25. RAPID ANALYTICS AND MODEL PROTOTYPING: RMSE improvement: 0.9°C to 0.4°C
  • 26. RAPID ANALYTICS AND MODEL PROTOTYPING: 2015 October 8, Insect classification
  • 27. RAPID ANALYTICS AND MODEL PROTOTYPING: accuracy improvement: 30% to 70%
  • 28. RAPID ANALYTICS AND MODEL PROTOTYPING: 2016 February 10, Macroeconomic agent-based models
  • 29. RAPID ANALYTICS AND MODEL PROTOTYPING: f1-score improvement: 0.57 to 0.63
  • 30. RAPID ANALYTICS AND MODEL PROTOTYPING: 2016 February 13, Epidemium cancer survival rate
  • 31. RAPID ANALYTICS AND MODEL PROTOTYPING: RMSE improvement: 3000 to 300
  • 32. RAPID ANALYTICS AND MODEL PROTOTYPING: 2016 May 11, Drug identification from spectra
  • 33. RAPID ANALYTICS AND MODEL PROTOTYPING: Drug identification error improvement: 9% to 3%. Drug concentration accuracy improvement: 20% to 12%
  • 34. THE RAMP TOOL: a prototyping tool for collaborative development of data science workflows. Fast development of analytics solutions; Teaching support; Networking and HR support; Support for collaborative team work; Commercialized through TektosData
  • 35. TAKE HOME MESSAGES: We have a cool tool for collaborative data analytics (designing workflows beyond scikit-learn predictors). Data management/munging is a big part of the data analytics workflow, and we need tools for it (preparing a RAMP takes two weeks to six months). Big data is rare: our problems are more about flexible organization of heterogeneous data
  • 36. THANK YOU!
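
The slides describe a challenge in which participants submit code and report results as accuracy, RMSE, or f1-score. The block below is a minimal sketch of that idea, not the RAMP implementation: it assumes a scikit-learn-like `fit`/`predict` interface for submissions, and every name in it (`MajorityClassifier`, `score_submission`, the metric helpers) is hypothetical and chosen for illustration only. It uses only the Python standard library.

```python
import math
from collections import Counter


class MajorityClassifier:
    """Hypothetical submission: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training set.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the stored label for every input row.
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def f1_score(y_true, y_pred, positive=1):
    """F1 score for one class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score_submission(model, X_train, y_train, X_test, y_test):
    """Hypothetical harness: train the submitted model, score it on held-out data."""
    model.fit(X_train, y_train)
    return accuracy(y_test, model.predict(X_test))
```

Because every submission in this sketch exposes the same two methods, an organizer could score all submissions with one harness on the same held-out data, which keeps the reported numbers comparable across participants.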