OpenML
Making machine learning more reproducible
(and easier) by bringing it online
Joaquin Vanschoren
Reproducibility in Machine Learning, ICML 2017
@open_ml
Research different.
Polymath projects: solve math problems
by massive online collaboration.
Broadcast a question, and combine
many minds to solve it.
Networked Science
Designed serendipity: what’s hard for one person is easy for another
Collaboration only scales if all friction is eliminated
Friction in machine learning
• Data hard to find, in different formats
• Code hard to find, difficult to use
• Published experiments are often not reproducible
• Reproducing my own results can be difficult
• No easy overview of state-of-the-art
• Benchmarks are often incomparable
• Experimentation is mostly solitary, offline
• Only afterwards do we share aggregated results
• Limited interaction and visibility in early stages
• Fear of getting scooped
Make machine learning as simple as possible
(but not simpler)
Use any (open-source) tool to try many algorithms;
share code and experiments online.
Find richly annotated datasets, import easily.
Or share your own.
Clear problem descriptions, evaluation protocols,
overview of state of the art.
Reproducible, transparent, reusable results
Organized for easy analysis and reuse
Frictionless online collaboration, track contributions
Easy sharing of data and results
What if we could analyse data collaboratively?
What if we could analyse data collaboratively, on web scale?
What if we could analyse data collaboratively, on web scale, in real time?
Collaborative machine learning
Easy to use: Integrated in many ML tools/environments
Modular contribution: Share data, code, models, evaluations
Organized data: Meta-data, reproducible models, link to people
Reward structure: Track your impact, build reputation
Self-learning: Learn from many experiments to help people
OpenML
It starts with data
Data (ARFF) is uploaded or referenced by URL,
then auto-versioned and analysed, with meta-data
extracted and everything organised online.
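The meta-data OpenML extracts on upload includes dataset-level statistics of the kind sketched below. This is an illustrative, self-contained example of computing a few such meta-features (`simple_meta_features` is a hypothetical helper, not OpenML's actual extraction code):

```python
import math
from collections import Counter

def simple_meta_features(rows, target_index=-1):
    """Compute a few dataset-level meta-features of the kind
    extracted automatically on upload (illustrative sketch)."""
    n_instances = len(rows)
    n_features = len(rows[0]) - 1  # all columns except the target
    labels = [row[target_index] for row in rows]
    counts = Counter(labels)
    # Class entropy: a measure of how balanced the labels are.
    entropy = -sum((c / n_instances) * math.log2(c / n_instances)
                   for c in counts.values())
    return {"instances": n_instances, "features": n_features,
            "classes": len(counts), "class_entropy": round(entropy, 3)}

# A tiny toy dataset: two numeric features plus a class label.
data = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.8, "b"), (5.9, 3.1, "b")]
print(simple_meta_features(data))
# {'instances': 4, 'features': 2, 'classes': 2, 'class_entropy': 1.0}
```

Storing such meta-features alongside each dataset is what makes datasets searchable and comparable online.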
Tasks contain data, goals, and procedures.
Machine-readable by tools, they automate experimentation.
Train-test splits
Classify target X
All results organized online: realtime overview
Collaborate online
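The reason results on a task are directly comparable is that the task fixes the evaluation procedure, including the exact train-test splits. A minimal sketch of the idea, assuming seeded fold generation (this is not OpenML's actual splitting code):

```python
import random

def make_splits(n_instances, n_folds=10, seed=1):
    """Generate reproducible cross-validation fold assignments.
    Because the seed is part of the task definition, every
    participant evaluates on exactly the same splits."""
    rng = random.Random(seed)
    indices = list(range(n_instances))
    rng.shuffle(indices)
    # Deal shuffled indices round-robin into n_folds folds.
    return [indices[i::n_folds] for i in range(n_folds)]

# Two independent calls produce identical splits.
assert make_splits(100) == make_splits(100)
```

With the protocol pinned down like this, any two uploaded results on the same task can be ranked side by side.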
Flows (workflows, scripts) can run anywhere (locally)
Tool integrations + APIs (REST, R, Python, Java,…)
Use any ML library
Run wherever you want
Experiments are auto-uploaded and evaluated online:
reproducible, linked to data, flows, authors,
and all other experiments.
Analyse results objectively
Publish, and track your impact
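What actually gets uploaded for each experiment is a run: the task, the flow with its exact parameter settings, and the predictions, from which the server computes evaluations uniformly for everyone. A hedged local sketch of such a run record (`make_run_record` and the task id are hypothetical, for illustration only):

```python
def make_run_record(task_id, flow_name, parameters, predictions, truth):
    """Bundle everything needed to reproduce and evaluate an
    experiment: the task, the flow (with exact parameters),
    and the predictions (illustrative sketch)."""
    accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
    return {
        "task_id": task_id,
        "flow": {"name": flow_name, "parameters": parameters},
        "predictions": predictions,
        "evaluations": {"predictive_accuracy": accuracy},
    }

run = make_run_record(
    task_id=59,  # hypothetical task id
    flow_name="sklearn.tree.DecisionTreeClassifier",
    parameters={"max_depth": 3},
    predictions=["a", "b", "b", "a"],
    truth=["a", "b", "a", "a"],
)
print(run["evaluations"]["predictive_accuracy"])  # 0.75
```

Because the predictions themselves are stored, evaluations can be recomputed or extended later without rerunning anything.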
Demo
OpenML Community
Nov–Dec 2016:
3000+ registered users,
2500 active in the last 30 days
Studies
Online collaboration environment
Link to GitHub, Jupyter notebooks, wiki, Overleaf,…
Sharing settings
Share datasets, flows, and studies with selected people.
Easily publish them later.
Code submissions, GitHub integration
Sharing versioned code, docker images, archiving
In progress…
Benchmarking suites
Curated datasets + all reproducible results
Meta-learning, learning to learn
Learn from all prior experiments -> bots
Learn across datasets, use OpenML as ‘memory’
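Using OpenML as a 'memory' can be as simple as recommending, for a new dataset, the flow that worked best on the most similar previously seen dataset. A minimal nearest-neighbour sketch over meta-features (the flow names and meta-feature vectors are invented for illustration):

```python
def recommend_flow(new_meta, memory):
    """Meta-learning sketch: return the best-performing flow of the
    dataset whose meta-features are closest to the new dataset's,
    treating past results as a shared memory."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(memory, key=lambda rec: dist(new_meta, rec["meta"]))
    return nearest["best_flow"]

# Past results: (n_instances, n_features) of each dataset + best flow.
memory = [
    {"meta": (150, 4), "best_flow": "knn"},
    {"meta": (70000, 784), "best_flow": "conv_net"},
]
print(recommend_flow((200, 6), memory))  # knn
```

Real meta-learning bots use far richer meta-features and models, but the principle — learn across datasets from all prior experiments — is the same.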
Deep learning integration
Native integration in deep learning libraries
Store/analyse models (architecture, parameters)
In progress (all help welcome!)
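Storing and analysing deep models means serializing their architecture and parameter counts into structured metadata that can be queried alongside results. A hypothetical sketch of such a description (not an actual OpenML schema):

```python
def describe_model(layers):
    """Flatten a network architecture into queryable metadata
    (illustrative only; not an actual storage schema)."""
    return {
        "n_layers": len(layers),
        "n_parameters": sum(l["params"] for l in layers),
        "layers": [{"type": l["type"], "units": l["units"]} for l in layers],
    }

arch = [
    {"type": "dense", "units": 128, "params": 100480},  # 784*128 + 128
    {"type": "dense", "units": 10, "params": 1290},     # 128*10 + 10
]
desc = describe_model(arch)
print(desc["n_parameters"])  # 101770
```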
Daily research
Support other machine learning tasks, tools
Support ML in other scientific domains
Auto-experimentation bot
Join us!
Open initiative, contribute via GitHub/Slack
Regular week-long hackathons
Bring your own data, tools
Resources, ideas welcome
Thank You
@open_ml
OpenML
Now try it yourself :)
www.openml.org
