A data science
observatory
Akin Kazakci, Mines ParisTech
Balazs Kégl, CNRS
Team
Balázs Kégl
CNRS
Alexandre Gramfort
Télécom ParisTech
Akın Kazakçı
Mines ParisTech
Camille Marini
Télécom ParisTech
Mehdi Cherti
UP Saclay
Yohann Sitruk
Mines ParisTech
Djalel Benbouzid
UPMC
1
The research objective & questions
Enough with the chairs
• Design research is falling behind in dealing with contemporary
challenges
• Claim 1a: Too much in-breeding and repetition
• Claim 1b: A huge amount of work is based on ideas from the ’80s
• Design is not about objects, but about reasoning
- Physics (Particle physics, Plasma
physics, astrophysics…)
- Biology (Genetics, Epidemiology…)
- Chemistry
- Economics, Finance, Banking
- Manufacturing, Industrial Internet
- Internet of things, Connected Devices
- Social media
- Transport & Mobility
- …
There are not enough data scientists to handle this much data
Revealing the potential of data: what role for design?
Last year: Crowdsourcing data challenges
1785 teams
Kazakci, A., Data science as a new frontier for design, ICED’15, Milan
Reasonable doubts about
the effectiveness of data
science contests
Crowdsourcing /?/ Design
crowd
/kraʊd/ noun
1. a large number of people gathered together in a disorganized or unruly way
1. How to study the design process of a crowd?
2. How to manage the design process of a crowd?
Seeker Solvers
?
Crowdsourcing: C-K dynamics
Crowdsourced contests Crowdsourced collaboration
Analysis of design strategies
Achieve 5σ
Discovery condition: A discovery is claimed when we find a ‘region’ of the space where there is a significant excess of ‘signal’ events (rejecting the background-only hypothesis with a p-value less than 2.9 × 10^-7, corresponding to 5 sigma).
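The link between "5 sigma" and the quoted p-value can be checked with a few lines of Python. This is a minimal sketch that assumes the usual one-sided convention (upper tail of a standard normal); the function name `one_sided_p_value` is ours, not from the HiggsML material.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond n_sigma."""
    # erfc gives the two-sided tail of the error function; halving it
    # yields the one-sided Gaussian tail.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


p_5sigma = one_sided_p_value(5.0)
print(f"p-value at 5 sigma: {p_5sigma:.2e}")
```

The result rounds to the 2.9 × 10^-7 threshold quoted on the slide.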
Problem formulation: Traditional classification setting: « the task of
the participants is to train a classifier g based on the training data D
with the goal of maximizing the AMS (7) on a held-out (test) data
set » (HiggsML documentation)
With 2 tweaks:
- Training set events are « weighted »
- Maximize « Approximate Median Significance »:
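The objective itself can be sketched as follows, assuming the regularised form of the AMS from the HiggsML documentation, AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)), with b_reg = 10. Here s and b are the weighted sums of selected signal and background events; the helper names are illustrative, and the official documentation remains the reference for the exact definition.

```python
import math

B_REG = 10.0  # regularisation term; assumed value, check the HiggsML documentation


def ams(s: float, b: float, b_reg: float = B_REG) -> float:
    """Approximate Median Significance for weighted signal s and background b."""
    radicand = 2.0 * ((s + b + b_reg) * math.log1p(s / (b + b_reg)) - s)
    return math.sqrt(max(0.0, radicand))  # guard against tiny negative round-off


def weighted_counts(labels, selected, weights):
    """Sum the weights of selected events, split into signal (1) and background (0)."""
    s = sum(w for y, sel, w in zip(labels, selected, weights) if sel and y == 1)
    b = sum(w for y, sel, w in zip(labels, selected, weights) if sel and y == 0)
    return s, b


# Toy example with made-up weights: three events selected, one rejected.
s, b = weighted_counts([1, 1, 0, 0], [True, True, True, False], [0.5, 0.5, 2.0, 3.0])
print(s, b, ams(s, b))
```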
Select a classification
method
Pre-processing
Choose hyper-params
Train
Optimize for
X
SVM Decision Trees NN…..…..
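Performance metrics: During the overall learning process, performance metrics are used to supervise the quality and convergence of a learned model. A traditional metric is accuracy:
accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.
Note that for the HiggsML AMS, TP (s) and FP (b) are of particular importance.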
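As a small illustration of the difference, the sketch below computes the four confusion counts and the accuracy on a made-up toy sample. Accuracy uses all four counts, whereas the AMS depends only on the selected signal (TP) and selected background (FP). The function names are ours.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = signal and 0 = background."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of events classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up toy labels and predictions.
tp, tn, fp, fn = confusion_counts([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```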
Ensemble Methods: Boosting, Bagging, others
(Extended) Dominant Design
Traditional workflow = Dominant design
C space K Space
Fixating others…
Achieve 5σ
Select a classification
method
Pre-processing
Choose hyper-params
Train
Optimize for accuracy
SVM Decision Trees NN…..…..
Integrate AMS
directly in training
during Gradient
Boosting
(John)
Discovery condition: A discovery is claimed
when we …
Problem formulation: Traditional
classification setting…
Cross-Validation: Techniques for evaluating
how a …
Ensemble Methods
during node
split in
random
forest
(John)
Weighted
Classification
Cascades
? ? ? ? ?
Optimization of AMS
Design for statistical efficiency
The biggest challenge is the instability of AMS. Competition results clearly show that only participants who dealt effectively with this issue achieved higher ranks.
1st
2nd
3rd
Ensembles + CV monitoring
+ cutoff threshold seem to be
a winning strategy
monitoring progress with CV
+
ensembles
+
selecting a cutoff threshold that optimises (or stabilises) AMS
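One way to read the cutoff step is as a scan over candidate thresholds, scored by the mean AMS over cross-validation folds rather than by a single split. The sketch below is a hypothetical illustration, not the winners' code. It assumes the regularised AMS with b_reg = 10 and ignores the weight rescaling a real held-out fold would need; all names and the toy data are made up.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance (regularised form, b_reg assumed to be 10)."""
    radicand = 2.0 * ((s + b + b_reg) * math.log1p(s / (b + b_reg)) - s)
    return math.sqrt(max(0.0, radicand))


def ams_at_cutoff(scores, labels, weights, cutoff):
    """AMS when every event with score >= cutoff is selected as signal."""
    rows = list(zip(scores, labels, weights))
    s = sum(w for sc, y, w in rows if sc >= cutoff and y == 1)
    b = sum(w for sc, y, w in rows if sc >= cutoff and y == 0)
    return ams(s, b)


def stable_cutoff(folds, candidate_cutoffs):
    """Pick the cutoff with the best mean AMS across held-out folds."""
    best_cutoff, best_mean = None, float("-inf")
    for cutoff in candidate_cutoffs:
        values = [ams_at_cutoff(sc, y, w, cutoff) for sc, y, w in folds]
        mean_ams = sum(values) / len(values)
        if mean_ams > best_mean:
            best_cutoff, best_mean = cutoff, mean_ams
    return best_cutoff, best_mean


# Two made-up folds: (classifier scores, true labels, event weights).
folds = [
    ([0.9, 0.8, 0.4, 0.2], [1, 1, 0, 0], [1.0, 1.0, 1.0, 1.0]),
    ([0.7, 0.6, 0.3, 0.1], [1, 1, 0, 0], [1.0, 1.0, 1.0, 1.0]),
]
cutoff, mean_ams = stable_cutoff(folds, [0.05, 0.5])
print(cutoff, mean_ams)
```

Averaging over folds is only one possible stabiliser; the spread of AMS across folds could be monitored in the same loop.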
Public guide to AMS 3.6
« moves » many participants to the
given path
Fixation vs. Creative Authority
(Agogué et al, 2014)
Generating new design strategies
Data science as a new frontier for design
A. Kazakci, ICED’15 (submitted)
• Available data for HiggsML
- Forums ➔ 136 topics, 1400+ posts
- Documentation
- Participants’ blog entries
- GitHub codes
• Qualitative interpretation combined with
C-K modelling of participants’ strategies
Data challenges are hard to analyze
How do you put a crowd under a microscope?
2
The research instrument
RAMP - Rapid Analytics and Model Prototyping
A Collaborative Development Platform for Data Science
Instant access to all submitted code - for
participants & organizers
RAMP allows us to collect data on the
data science model development
process
1 We prepare a ‘starting kit’
2 Continuous access to code: organizers can follow in real time what’s happening, and react; participants can analyse and build on every submission
3 Submissions are trained and performances are displayed
4 Users’ actions and interactions are recorded
5 Main output: dozens of predictive models and a performance benchmark
Collecting data with RAMP
- Number of submissions
- Frequency of submissions
- Timing of submissions
- User interactions
- Performance of submissions
- Submitted code
- …
We are interested in
- the variety (code space +
prediction space)
- the mutual influences and
inheritance (code space)
- score and delta score
(impact)
- …
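A few of these quantities can be derived from a plain submission log. The sketch below assumes a hypothetical record layout (`team`, `hour`, `score`) and a score where higher is better; it does not reflect RAMP's actual data model, and the toy values are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Submission:
    team: str     # hypothetical team identifier
    hour: float   # hours since the start of the event
    score: float  # higher is better in this sketch


def submissions_per_team(log):
    """Number of submissions made by each team."""
    return Counter(sub.team for sub in log)


def delta_scores(log):
    """Pair each submission with its score change relative to the best earlier score."""
    deltas, best_so_far = [], None
    for sub in sorted(log, key=lambda s: s.hour):
        # The first submission has no predecessor; its delta is 0.0 by convention.
        delta = 0.0 if best_so_far is None else sub.score - best_so_far
        deltas.append((sub, delta))
        best_so_far = sub.score if best_so_far is None else max(best_so_far, sub.score)
    return deltas


toy_log = [
    Submission("team_a", 1.0, 0.30),
    Submission("team_b", 2.5, 0.35),
    Submission("team_a", 3.0, 0.33),
    Submission("team_c", 4.0, 0.50),
]
counts = submissions_per_team(toy_log)
impacts = [round(delta, 2) for _, delta in delta_scores(toy_log)]
print(counts, impacts)
```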
3
Some applications, preliminary
observations & findings
Climatology
Time Series Based Event
Prediction on Geo-tagged data
Two workshops: Improvement
in RMSE score: from 0.90 to
0.43
El Niño Prediction
- Temperature data
- Geo-tagged time series
- Prediction: 6 months ahead
George Washington University
George Mason University
Astrophysics
Classification of variable stars
One-day workshop: Accuracy
improvement: 89% to 96%
Light curves (luminosity vs
time profiles)
- Static features
- Functional data
Marc Monier (LAL),
Gilles Faÿ (Centrale-Supelec)
Ecology
Finding & Classifying Insects from Image Data
One-day event: Improvement in prediction accuracy: from 0.31 to 0.70
Pollinating Insects
- Image data (20K images)
- 18 types of insects
- Deep neural net models
Paris Museum of Natural History,
SPIPOLL.org, NVIDIA,
Université de Champagne-Ardenne ROMEO HPC Center
Cancer Research
A graph of model similarities
- Steady progression. Some groups have built systematically on a submission they previously created, without being influenced by the others. Their performance may go either up (constantly) or down (constantly).
- Breakdowns or jumps. There are other groups where the performance increased or decreased strongly from one submission to the next. There may be some robustness/vulnerability issue with their approach, to be further investigated.
- Successful expansions. An important “break” happened at 12:00. This corresponds to the “cropping” idea. Strangely, two very similar submissions (small distance) were submitted at the same time: one of them did not improve the score at all (around 0.35, while the leader was around 0.55), whereas the other improved it considerably (0.65).
- Currently, we see no dependency between this break and the winning solution. This might be related to the way we have measured the code similarity.
Some observations
“Note that, following the RAMP approach, this model is
the result of a succession of small improvements
made on top of other participants’ contributions. We
did not reach a prediction score of 0.71 in one shot, but
after applying several tricks and manually tuning some
parameters.”
Heuritech,
Winner of Insect Challenge
Blog entry
How to compare design concepts - represented as code?
~
?
Various distance measures
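The slides do not say which distance measures were used. As one possible example, a line-based edit similarity from Python's standard `difflib` module can be turned into a distance between two submissions' source code. The function name `code_distance` and the toy snippets are ours.

```python
import difflib


def code_distance(code_a: str, code_b: str) -> float:
    """Distance in [0, 1]: 0 for identical code, 1 for no matching lines."""
    matcher = difflib.SequenceMatcher(None, code_a.splitlines(), code_b.splitlines())
    return 1.0 - matcher.ratio()


# Two made-up submissions that differ in one line out of three.
base = "load_data()\nscale_features()\nfit_forest()"
variant = "load_data()\nscale_features()\nfit_boosting()"
print(code_distance(base, base), code_distance(base, variant))
```

A purely textual distance like this is sensitive to renaming and reformatting, which may be one reason a measured code similarity does not line up with what participants actually changed.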
Comparing performance profiles: Promoting novelty search
- Greyness: Model’s raw score
- Size: model’s contribution
- Position: similarity/dissimilarity in predictions
2D projection (MDS) of model’s prediction profiles
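The input to such a projection is a matrix of pairwise dissimilarities between the models' predictions. A minimal sketch, assuming hard class predictions on a shared test set and using the disagreement rate as the dissimilarity (our choice, not necessarily the one behind the figure):

```python
def disagreement(pred_a, pred_b):
    """Fraction of test points on which two models predict different classes."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction profiles must cover the same test points")
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def dissimilarity_matrix(profiles):
    """Return model names and the symmetric matrix of pairwise disagreements."""
    names = sorted(profiles)
    matrix = [[disagreement(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Made-up prediction profiles of three models on four test points.
profiles = {"A": [1, 0, 1, 1], "B": [1, 0, 0, 1], "C": [0, 1, 0, 0]}
names, matrix = dissimilarity_matrix(profiles)
print(names, matrix)
```

The resulting matrix can be handed to any MDS routine that accepts precomputed dissimilarities to obtain the 2D positions.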
In progress:
• Monitoring & Modelling “contribution” (Pierre Fleckinger, Economic Agents & Incentive Theory)
• Pushing towards “Novelty Search” (Jean-Baptiste Mouret, Novelty Search)
• Controlled experiments
We found (to be validated by further studies):
• Gravitation: following a given submission, others hover around the same coordinates, making incremental adjustments
• Repulsion: a new submission uses out-of-the-box code to explore the white space (no previous close-by submissions exist)
• Hybridization: opportunistic integration of previous submissions, involving or inspired by at least two different sources of code.
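The three patterns could be operationalised, very roughly, from distances in code space. The sketch below is a hypothetical rule of thumb, not the procedure used in the study. A new submission with no earlier submission within a threshold `close` is a repulsion; one that is close to two earlier submissions which are themselves far apart is a hybridization; anything else is a gravitation. The threshold value and all names are assumptions.

```python
from itertools import combinations


def classify_move(dist_to_new, dist_between, close=0.2):
    """Label a new submission from its code-space distances.

    dist_to_new:  {earlier_id: distance from the new submission}
    dist_between: {frozenset({id_a, id_b}): distance between earlier submissions}
    close:        hypothetical threshold for "close-by"
    """
    nearby = sorted(sid for sid, d in dist_to_new.items() if d <= close)
    if not nearby:
        return "repulsion"
    for a, b in combinations(nearby, 2):
        if dist_between[frozenset((a, b))] > close:
            # Close to two sources that are not close to each other.
            return "hybridization"
    return "gravitation"


# Made-up distances for three scenarios.
print(classify_move({"s1": 0.7}, {}))
print(classify_move({"s1": 0.05, "s2": 0.10}, {frozenset(("s1", "s2")): 0.08}))
print(classify_move({"s1": 0.15, "s2": 0.18}, {frozenset(("s1", "s2")): 0.30}))
```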
RAMP platform
• The RAMP platform is meant to be a free tool for researchers and students; this opens up new perspectives (pedagogy & research) and hopefully brings different communities closer together
Akin Kazakci, Mines ParisTech
akin.kazakci@mines-paristech.fr
Thank you

More Related Content

What's hot

Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Jerrin George
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMario Cartia
 
The Success of Deep Generative Models
The Success of Deep Generative ModelsThe Success of Deep Generative Models
The Success of Deep Generative Modelsinside-BigData.com
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0Natig Vahabov
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityAlberto Danese
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Edureka!
 
How machines can take decisions
How machines can take decisionsHow machines can take decisions
How machines can take decisionsDeepu S Nath
 
Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Frank Kienle
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learningSri Ambati
 
Fundementals of Machine Learning and Deep Learning
Fundementals of Machine Learning and Deep Learning Fundementals of Machine Learning and Deep Learning
Fundementals of Machine Learning and Deep Learning ParrotAI
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZCharles Vestur
 
Teaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra SystemsTeaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra Systemsinventionjournals
 
Kaggle Days Brussels - Alberto Danese
Kaggle Days Brussels - Alberto DaneseKaggle Days Brussels - Alberto Danese
Kaggle Days Brussels - Alberto DaneseAlberto Danese
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 

What's hot (20)

L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
Clusterix at VDS 2016
Clusterix at VDS 2016Clusterix at VDS 2016
Clusterix at VDS 2016
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By Examples
 
The Success of Deep Generative Models
The Success of Deep Generative ModelsThe Success of Deep Generative Models
The Success of Deep Generative Models
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0
Introduction to Machine Learning, Hands-on Deep Learning with Tensroflow 2.0
 
4.80 sy it
4.80 sy it4.80 sy it
4.80 sy it
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML Interpretability
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
How machines can take decisions
How machines can take decisionsHow machines can take decisions
How machines can take decisions
 
Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
 
Fundementals of Machine Learning and Deep Learning
Fundementals of Machine Learning and Deep Learning Fundementals of Machine Learning and Deep Learning
Fundementals of Machine Learning and Deep Learning
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Teaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra SystemsTeaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra Systems
 
Kaggle Days Brussels - Alberto Danese
Kaggle Days Brussels - Alberto DaneseKaggle Days Brussels - Alberto Danese
Kaggle Days Brussels - Alberto Danese
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 

Viewers also liked

Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...
Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...
Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...Marseille Innovation
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challengesBalázs Kégl
 
Udo ulfkotte megvásárolt újságírók
Udo ulfkotte   megvásárolt  újságírókUdo ulfkotte   megvásárolt  újságírók
Udo ulfkotte megvásárolt újságíróksaraviola
 
Correa tubular corregido
Correa tubular corregidoCorrea tubular corregido
Correa tubular corregidojose antonio
 
Seni hiburan kontemporari di malaysia
Seni hiburan kontemporari di malaysiaSeni hiburan kontemporari di malaysia
Seni hiburan kontemporari di malaysiaNur Atikah Amira
 
Семен Боярский. Одноклассники. Возможности для бизнеса
Семен Боярский. Одноклассники. Возможности для бизнесаСемен Боярский. Одноклассники. Возможности для бизнеса
Семен Боярский. Одноклассники. Возможности для бизнесаWebcom Group
 
PMP MOLD Factory
PMP MOLD Factory PMP MOLD Factory
PMP MOLD Factory Jim Wu
 
Tablete de-stil-de-viata-sanatatea
Tablete de-stil-de-viata-sanatateaTablete de-stil-de-viata-sanatatea
Tablete de-stil-de-viata-sanatateasaraviola
 
Fluorescence in situ hybridization (FISH)
Fluorescence in situ hybridization (FISH)Fluorescence in situ hybridization (FISH)
Fluorescence in situ hybridization (FISH)Nur Atikah Amira
 
CRUP; Laringotraqueobronquitis
CRUP; LaringotraqueobronquitisCRUP; Laringotraqueobronquitis
CRUP; Laringotraqueobronquitispablocortez3004
 
Psychology of persuasion - 7 tactics - 2017 02 23
Psychology of persuasion - 7 tactics - 2017 02 23Psychology of persuasion - 7 tactics - 2017 02 23
Psychology of persuasion - 7 tactics - 2017 02 23Tim Fidgeon
 
Gabriel david morales zapata
Gabriel david morales zapataGabriel david morales zapata
Gabriel david morales zapatapuertaalvaro
 
Content Strategy - Econsultancy - MWL17 - TimFidgeon
Content Strategy - Econsultancy - MWL17 - TimFidgeonContent Strategy - Econsultancy - MWL17 - TimFidgeon
Content Strategy - Econsultancy - MWL17 - TimFidgeonTim Fidgeon
 

Viewers also liked (16)

Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...
Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...
Presentation de la R&D à la RID : Des projets innovants bien financés ou du b...
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challenges
 
Udo ulfkotte megvásárolt újságírók
Udo ulfkotte   megvásárolt  újságírókUdo ulfkotte   megvásárolt  újságírók
Udo ulfkotte megvásárolt újságírók
 
Correa tubular corregido
Correa tubular corregidoCorrea tubular corregido
Correa tubular corregido
 
Final slideshare
Final slideshareFinal slideshare
Final slideshare
 
Seni hiburan kontemporari di malaysia
Seni hiburan kontemporari di malaysiaSeni hiburan kontemporari di malaysia
Seni hiburan kontemporari di malaysia
 
Семен Боярский. Одноклассники. Возможности для бизнеса
Семен Боярский. Одноклассники. Возможности для бизнесаСемен Боярский. Одноклассники. Возможности для бизнеса
Семен Боярский. Одноклассники. Возможности для бизнеса
 
PMP MOLD Factory
PMP MOLD Factory PMP MOLD Factory
PMP MOLD Factory
 
Sebastian y melisa
Sebastian y melisaSebastian y melisa
Sebastian y melisa
 
biografia de ana
biografia de anabiografia de ana
biografia de ana
 
Tablete de-stil-de-viata-sanatatea
Tablete de-stil-de-viata-sanatateaTablete de-stil-de-viata-sanatatea
Tablete de-stil-de-viata-sanatatea
 
Fluorescence in situ hybridization (FISH)
Fluorescence in situ hybridization (FISH)Fluorescence in situ hybridization (FISH)
Fluorescence in situ hybridization (FISH)
 
CRUP; Laringotraqueobronquitis
CRUP; LaringotraqueobronquitisCRUP; Laringotraqueobronquitis
CRUP; Laringotraqueobronquitis
 
Psychology of persuasion - 7 tactics - 2017 02 23
Psychology of persuasion - 7 tactics - 2017 02 23Psychology of persuasion - 7 tactics - 2017 02 23
Psychology of persuasion - 7 tactics - 2017 02 23
 
Gabriel david morales zapata
Gabriel david morales zapataGabriel david morales zapata
Gabriel david morales zapata
 
Content Strategy - Econsultancy - MWL17 - TimFidgeon
Content Strategy - Econsultancy - MWL17 - TimFidgeonContent Strategy - Econsultancy - MWL17 - TimFidgeon
Content Strategy - Econsultancy - MWL17 - TimFidgeon
 

Similar to A data science observatory based on RAMP - rapid analytics and model prototyping

Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Greg Makowski
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptxArthur240715
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyAlon Bochman, CFA
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheonMark Reynolds
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning台灣資料科學年會
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4Salford Systems
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsLionel Briand
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxsumitkumar600840
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveLionel Briand
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-stepsShesha R
 

Similar to A data science observatory based on RAMP - rapid analytics and model prototyping (20)

Big Data Challenges and Solutions
Big Data Challenges and SolutionsBig Data Challenges and Solutions
Big Data Challenges and Solutions
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Manufacturing Data Analytics
Manufacturing Data AnalyticsManufacturing Data Analytics
Manufacturing Data Analytics
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance Systems
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
 

More from Akin Osman Kazakci

Transformation digitale par l'IA et la valorisation des données
Transformation digitale par l'IA et la valorisation des données Transformation digitale par l'IA et la valorisation des données
Transformation digitale par l'IA et la valorisation des données Akin Osman Kazakci
 
Learning, Representations, Generative modelling
Learning, Representations, Generative modellingLearning, Representations, Generative modelling
Learning, Representations, Generative modellingAkin Osman Kazakci
 
Data Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowData Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowAkin Osman Kazakci
 
Data Science for Business Managers - Trends and Evolutions
Data Science for Business Managers - Trends and EvolutionsData Science for Business Managers - Trends and Evolutions
Data Science for Business Managers - Trends and EvolutionsAkin Osman Kazakci
 
Introduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristIntroduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristAkin Osman Kazakci
 
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...Akin Osman Kazakci
 

More from Akin Osman Kazakci (7)

Transformation digitale par l'IA et la valorisation des données
Transformation digitale par l'IA et la valorisation des données Transformation digitale par l'IA et la valorisation des données
Transformation digitale par l'IA et la valorisation des données
 
Learning, Representations, Generative modelling
Learning, Representations, Generative modellingLearning, Representations, Generative modelling
Learning, Representations, Generative modelling
 
Value of Data Science
Value of Data ScienceValue of Data Science
Value of Data Science
 
Data Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowData Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should know
 
Data Science for Business Managers - Trends and Evolutions
Data Science for Business Managers - Trends and EvolutionsData Science for Business Managers - Trends and Evolutions
Data Science for Business Managers - Trends and Evolutions
 
Introduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristIntroduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theorist
 
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...
Innovative Design Workshop - HiggsML and beyond (Machine Learning in Particle...
 

A data science observatory based on RAMP - rapid analytics and model prototyping

  • 1. A data science observatory Akin Kazakci, Mines ParisTech Balazs Kégl, CNRS
  • 2. Team Balázs Kégl CNRS Alexandre Gramfort Télécom ParisTech Akın Kazakçı Mines ParisTech Camille Marini Télécom ParisTech Mehdi Cherti UP Saclay Yohann Sitruk Mines ParisTech Djalel Benbouzid UPMC
  • 4. Enough with the chairs • Design research is falling behind in dealing with contemporary challenges • Claim 1a: Too much in-breeding and repetition • Claim 1b: A huge amount of work is based on ideas from the 1980s • Design is not about objects, but about reasoning
  • 5. - Physics (Particle physics, Plasma physics, Astrophysics…) - Biology (Genetics, Epidemiology…) - Chemistry - Economics, Finance, Banking - Manufacturing, Industrial Internet - Internet of things, Connected Devices - Social media - Transport & Mobility - … There are not enough data scientists to handle this much data. Revealing the potential of data: what role for design?
  • 6. Last year: Crowdsourcing data challenges 1785 teams Kazakci, A., Data science as a new frontier for design, ICED’15, Milan Reasonable doubts about the effectiveness of data science contests
  • 7. Crowdsourcing /?/ Design. crowd /kraʊd/ noun: 1. a large number of people gathered together in a disorganized or unruly way. Questions: 1. How to study the design process of a crowd? 2. How to manage the design process of a crowd?
  • 8. SeekerSolvers ? Crowdsourcing: C-K dynamics Crowdsourced contests Crowdsourced collaboration
  • 9. Analysis of design strategies
    Objective: Achieve 5σ
    Discovery condition: A discovery is claimed when we find a ‘region’ of the space where there is a significant excess of ‘signal’ events (rejecting the background-only hypothesis with a p-value less than 2.9 × 10⁻⁷, corresponding to 5 sigma).
    Problem formulation: Traditional classification setting: « the task of the participants is to train a classifier g based on the training data D with the goal of maximizing the AMS (7) on a held-out (test) data set » (HiggsML documentation), with 2 tweaks: - Training set events are « weighted » - Maximize « Approximate Median Significance »
    Performance metrics: During the overall learning process, performance metrics are used to supervise the quality and convergence of a learned model. A traditional metric is accuracy, (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN count true positives, true negatives, false positives and false negatives. Note that for the HiggsML AMS, TP (s) and FP (b) are of particular importance.
    Traditional workflow = Dominant design: Select a classification method (SVM, Decision Trees, NN, …) - Pre-processing - Choose hyper-params - Train - Optimize for X (accuracy in the traditional workflow)
    (Extended) Dominant Design: Ensemble Methods (Boosting, Bagging, others)
    C-K diagram, C space: Achieve 5σ. Fixating: the traditional workflow above. Others…: Integrate AMS directly in training, during Gradient Boosting (John) or during node split in random forest (John); Weighted Classification Cascades; Optimization of AMS; Design for statistical efficiency.
    C-K diagram, K space: Discovery condition: A discovery is claimed when we …; Problem formulation: Traditional classification setting…; Cross-Validation: Techniques for evaluating how a …; Ensemble Methods.
    The biggest challenge is the instability of AMS. Competition results clearly show that only participants who dealt effectively with this issue achieved higher ranks.
    1st, 2nd, 3rd: Ensembles + CV monitoring + cutoff threshold seem to be a winning strategy (monitoring progress with CV + ensembles + selecting a cutoff threshold that optimises or stabilises AMS).
    Public guide to AMS 3.6 « moves » many participants to the given path: Fixation vs. Creative Authority (Agogué et al., 2014)
    Generating new design strategies
    Data science as a new frontier for design, A. Kazakci, ICED’15 (submitted) • Available data for HiggsML - Forums ➔ 136 topics, 1400+ posts - Documentation - Participants’ blog entries - GitHub codes • Qualitative interpretation combined with C-K modelling of participants’ strategies
    Data challenges are hard to analyze
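The quantities named on slide 9 can be made concrete with a small sketch. It is an illustration only, not code from the talk: the function names are hypothetical, the AMS expression and its regularisation term of 10 follow the published HiggsML challenge documentation as best recalled here, and `s` and `b` stand for the weighted sums of selected signal (true positive) and background (false positive) events.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified events."""
    return (tp + tn) / (tp + tn + fp + fn)


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance.

    s: weighted sum of selected signal events (true positives)
    b: weighted sum of selected background events (false positives)
    b_reg: regularisation term (assumed to be 10, as in the challenge docs)
    """
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def one_sided_p_value(n_sigma):
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    print(accuracy(40, 50, 5, 5))
    print(ams(10.0, 9990.0))       # close to s / sqrt(b) when s is much smaller than b
    print(one_sided_p_value(5.0))  # the 5 sigma discovery threshold
```

Unlike accuracy, AMS ignores true negatives and false negatives entirely, which is one reason a workflow tuned for accuracy does not necessarily maximise it.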
  • 10. How do you put a crowd under a microscope?
  • 12. RAMP - Rapid Analytics and Model Prototyping A Collaborative Development Platform for Data Science Instant access to all submitted code - for participants & organizers
  • 13. RAMP - Rapid Analytics and Model Prototyping: A Collaborative Development Platform for Data Science. RAMP allows us to collect data on the data science model development process. 1. We prepare a ‘starting kit’. 2. Continuous access to code: organizers can follow in real time what is happening, and react; participants can analyse and build on every submission. 3. Submissions are trained and performances are displayed. 4. Users’ actions and interactions are recorded. 5. Main output: dozens of predictive models and a performance benchmark.
  • 14. Collecting data with RAMP - Number of submissions - Frequency of submissions - Timing of submissions - User interactions - Performance of submissions - Submitted code - … We are interested in - the variety (code space + prediction space) - the mutual influences and inheritance (code space) - score and delta score (impact) - …
  • 16. Climatology: El Niño Prediction. Time Series Based Event Prediction on Geo-tagged data - Temperature data - Geo-tagged time series - Prediction: 6 months ahead. Two workshops: Improvement in RMSE score: from 0.90 to 0.43. George Washington University, George Mason University
  • 17. Astrophysics: Classification of variable stars. Light curves (luminosity vs time profiles) - Static features - Functional data. One day workshop: Accuracy improvement: 89% to 96%. Marc Monier (LAL), Gilles Faÿ (Centrale-Supelec)
  • 18. Ecology: Finding & Classifying Insects from Image Data. Pollinating Insects - Image data (20K images) - 18 types of insects - Deep neural net models. One day event: Improvement in prediction accuracy: from 0.31 to 0.70. Paris Museum of Natural History, SPIPOLL.org, NVIDIA, Université de Champagne-Ardenne ROMEO HPC Center
  • 20. A graph of model similarities: some observations - Steady progression. Some groups have built systematically on a submission they previously created, without being influenced by the others. Their performance may go either up (constantly) or down (constantly). - Breakdowns or jumps. There are other groups where the performance increased or decreased strongly from one submission to the next. There may be some robustness/vulnerability issue with their approach, to be further investigated. - Successful expansions. An important “break” happened at 12:00. This corresponds to the “cropping” idea. Strangely, two very similar submissions (small distance) were submitted at the same time: one of them did not improve the score at all (around 0.35, while the leader was around 0.55), whereas the other improved it considerably (0.65). - Currently, we see no dependency between this break and the winning solution. This might be related to the way we have measured the code similarity.
  • 21. “Note that, following the RAMP approach, this model is the result of a succession of small improvements made on top of other participants’ contributions. We did not reach a prediction score of 0.71 in one shot, but after applying several tricks and manually tuning some parameters.” Heuritech, Winner of Insect Challenge Blog entry
  • 22. How to compare design concepts - represented as code? ~ ?
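One simple way to put a number on the question of slide 22 is a line-level diff ratio between two submissions. The slides do not say how code similarity was actually measured on RAMP, so the sketch below is only an assumed stand-in using Python's standard `difflib`; the function names are hypothetical.

```python
import difflib


def code_similarity(source_a, source_b):
    """Similarity in [0, 1] between two source files, compared line by line.

    Blank lines and leading/trailing whitespace are ignored, so that pure
    formatting changes do not count as a difference.
    """
    lines_a = [ln.strip() for ln in source_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in source_b.splitlines() if ln.strip()]
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()


def code_distance(source_a, source_b):
    """Distance usable as an edge weight in a graph of model similarities."""
    return 1.0 - code_similarity(source_a, source_b)


if __name__ == "__main__":
    first = "x = 1\ny = 2\n"
    second = "x = 1\ny = 3\n"
    print(code_distance(first, second))
```

A textual measure like this is blind to semantics: two submissions can be textually close and still behave very differently, which matches the caveat raised about the 12:00 “break” on slide 20.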
  • 24. Comparing performance profiles: Promoting novelty search - Greyness: Model’s raw score - Size: model’s contribution - Position: similarity/dissimilarity in predictions 2D projection (MDS) of model’s prediction profiles
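The dissimilarity that positions models on slide 24 can be sketched without the projection step. The example below is an assumption-laden illustration, not the RAMP implementation: it measures the distance between two prediction profiles as the fraction of test examples on which the models disagree, and scores novelty as the mean distance to the k nearest other models, which is one common formulation in the novelty search literature. The MDS projection itself is not shown, and all names are hypothetical.

```python
def prediction_distance(profile_a, profile_b):
    """Fraction of test examples on which two models predict different labels."""
    if len(profile_a) != len(profile_b):
        raise ValueError("prediction profiles must cover the same test examples")
    disagreements = sum(a != b for a, b in zip(profile_a, profile_b))
    return disagreements / len(profile_a)


def novelty_scores(profiles, k=2):
    """Mean distance from each model to its k nearest neighbours."""
    scores = []
    for i, profile in enumerate(profiles):
        distances = sorted(
            prediction_distance(profile, other)
            for j, other in enumerate(profiles)
            if j != i
        )
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


if __name__ == "__main__":
    profiles = [
        [0, 0, 1, 1],  # model A
        [0, 0, 1, 1],  # model B, identical predictions to A
        [0, 0, 1, 0],  # model C, one disagreement with A and B
        [1, 1, 0, 0],  # model D, far from all the others
    ]
    print(novelty_scores(profiles))
```

A model with a high novelty score sits in an otherwise empty region of prediction space, independently of its raw score.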
  • 25. In progress: • Monitoring & Modelling “contribution” (Pierre Fleckinger, Economic Agents & Incentive Theory) • Pushing towards “Novelty Search” (Jean-Baptiste Mouret, Novelty Search) • Controlled experiments
    We found (to be validated by further studies): • Gravitation: following a given submission, others hover around the same coordinates, making incremental adjustments • Repulsion: a new submission uses out-of-the-box code to explore the white space (no previous close-by submissions exist) • Hybridization: opportunistic integration of previous submissions, involving or inspired by at least two different sources of code
    RAMP platform: • The RAMP platform is meant to be a free tool for researchers and students; this opens up new perspectives (pedagogy & research) and hopefully brings different communities closer together
  • 26. Akin Kazakci, Mines ParisTech akin.kazakci@mines-paristech.fr Thank you