NETWORKED MACHINE LEARNING
JOAQUIN VANSCHOREN (TU/E), 2014
#OpenML
Research different.
1610
GALILEO GALILEI
DISCOVERS SATURN’S RINGS
‘SMAISMRMILMEPOETALEUMIBUNENUGTTAUIRAS’
Research different.
Royal Society: Take nobody’s word for it
Scientific Journal: Reputation-based culture
300 YEARS LATER
JOURNALS SHOW LIMITS
• Complex code not included
• Large data sets not included
• Experiment details scant
• Results hard to reproduce
• Papers not updatable
• Slow, incomplete tracking of paper impact
• Publication bias
• No online public discussion
• Open access?
JOURNALS: LONG-TERM MEMORY
INTERNET: SHORT-TERM WORKING MEMORY
NETWORKED SCIENCE
ONLINE DATABASES
OPEN SOURCE CODE
WEB SERVICES, APIS
COLLABORATIVE TOOLS
OPEN, SCALABLE COLLABORATION
REAL-TIME DISCUSSION
COMBINE, REUSE SCIENTIFIC RESULTS
CITIZEN SCIENCE
Research different.
Polymaths: Solve math problems through massive collaboration (not competition)
Broadcast question, combine many minds to solve it
Solved hard problems in weeks
Many (joint) publications
Research different.
SDSS: Robotic telescope, data publicly online (SkyServer)
Over 1 million distinct users vs. 10,000 astronomers
Broadcast data, allow many minds to ask the right questions
Thousands of papers
Research different.
Galaxy Zoo: citizen scientists classify a million galaxies
Offer right tools so that anybody can be a scientist
Many novel discoveries by scientists and citizens
Research different.
Sharing data sparks discovery
Designed serendipity:
- What’s hard for one scientist is easy for another
- Surprising ideas, observations can spark new discoveries
Share, organise data for easy, large-scale collaboration
Data exploding in all sciences: collaborative data analysis needed
Building reputation
Authorship: easy to contribute + contributions stored, visible online
Collaboration: build trust, work with new people
Citation: more people see, build upon, and cite your work. Tell people how to cite data and code.
Altmetrics: track reuse/interest online (arXiv)
NETWORKED MACHINE LEARNING
Machine learning
Complex code, large-scale data, experiments (impossible to print)
Experiments not shared online: impossible to build on prior work, which inhibits deeper analysis (e.g. meta-learning)
Low reproducibility, generalisability (studies contradict)
What if we could all connect with each other, and with other scientists, to explore and apply machine learning?
Few collaborative tools to speed up research
OpenML
Place to share data, code, experiments in full detail
All results organised, linked together for further (meta)analysis, reuse, discussion, study, education
Links to (open-source) code, open data anywhere online.
Anyone can post data to analyse, anyone can share code and results (models, predictions, evaluations)
Integrated in ML platforms (R, WEKA, RapidMiner, …) to automatically load data, upload results
Scientists can work in teams, but results only publicly visible if data, code shared
OpenML: benefits for scientists
More time: automates routinizable work:
- find data and/or code
- set up and run large-scale experiments
- results compared to state-of-the-art
- log experiment details for future reference
More control:
- state how others should cite your work
- track reuse
- share results more easily
More knowledge:
- more time for actual research
- build directly on prior work
- easier, large-scale collaboration + interaction
Plugins: WEKA
Plugins: MOA
Plugins: RapidMiner
1. OPERATOR TO DOWNLOAD TASK (TASK TYPE SPECIFIC)
2. SUBWORKFLOW THAT SOLVES THE TASK, GENERATES RESULTS
3. OPERATOR FOR UPLOADING RESULTS
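The three steps above (download a task, solve it, upload the results) can be sketched in Python. This is only an illustration with an in-memory stand-in for the server: `FakeServer`, `Task`, `download_task`, `upload_run` and the majority-class "subworkflow" are hypothetical names chosen for this sketch, not the RapidMiner plugin or the OpenML API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task: labelled training rows plus test rows to predict."""
    task_id: int
    train: list  # list of (features, label) pairs
    test: list   # list of feature tuples


@dataclass
class FakeServer:
    """In-memory stand-in for a sharing platform (an assumption, not a real API)."""
    tasks: dict = field(default_factory=dict)
    runs: list = field(default_factory=list)

    def download_task(self, task_id):
        return self.tasks[task_id]

    def upload_run(self, task_id, flow_name, predictions):
        # Store the run next to the task and flow it belongs to.
        self.runs.append(
            {"task_id": task_id, "flow": flow_name, "predictions": predictions}
        )
        return len(self.runs)  # 1-based run id


def majority_class_workflow(task):
    """A trivial 'subworkflow': always predict the most common training label."""
    most_common = Counter(label for _, label in task.train).most_common(1)[0][0]
    return [most_common for _ in task.test]


server = FakeServer()
server.tasks[1] = Task(
    task_id=1,
    train=[((0,), "a"), ((1,), "b"), ((2,), "a")],
    test=[(3,), (4,)],
)

task = server.download_task(1)                # 1. download the task
predictions = majority_class_workflow(task)   # 2. solve it, generate results
run_id = server.upload_run(task.task_id, "majority_class", predictions)  # 3. upload
```

Because the run is stored together with its task id and flow name, anyone can later compare it against other runs on the same task.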
OpenML: under development
OpenML studies
- collection of datasets, flows, runs, results in a study
- online counterpart of paper (with url)
- construct by simply tagging resources
- easily include (build on) data of others
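The idea of constructing a study by tagging resources can be sketched as a small tag index. The sketch is hypothetical throughout: `TagIndex`, the tag names and the resource ids are invented for illustration and do not describe how OpenML stores studies.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical index mapping a tag to the resources that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, kind, resource_id, tag):
        # kind is e.g. "dataset", "flow" or "run"
        self._by_tag[tag].add((kind, resource_id))

    def study(self, tag):
        """Collect everything with the given tag, grouped by resource kind."""
        grouped = defaultdict(list)
        for kind, resource_id in sorted(self._by_tag.get(tag, ())):
            grouped[kind].append(resource_id)
        return dict(grouped)


index = TagIndex()
index.tag("dataset", 61, "study_demo")   # ids are made up for this sketch
index.tag("flow", 7, "study_demo")
index.tag("run", 1001, "study_demo")
index.tag("run", 1002, "study_demo")
index.tag("dataset", 99, "other_study")  # not part of study_demo

study = index.study("study_demo")
```

Here `study` groups the tagged dataset, flow and runs under one name, and including someone else's data in a study only requires adding the tag to it.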
Reputation building
- Profile page: statistics of activity and impact on OpenML
- Collaborative leaderboards: best contributors to solving a task
Teams
- Add scientists in teams (circles)
- Share resources, results within team only
- Make public at any time (e.g. after publication)
Meta-learning support
- Data/Flow qualities: easy adding, better overviews
- Algorithm selection techniques running on website (vs humans?)
JOIN THE CLUB

OpenML Tutorial: Networked Science in Machine Learning
