OpenML
Making machine learning more reproducible
(and easier) by bringing it online
Joaquin Vanschoren
Reproducibility in Machine Learning, ICML 2017
@open_ml
Research different.
Polymath projects: solve math problems
by massive online collaboration.
Broadcast a question, and combine
many minds to solve it.
Networked Science
Designed serendipity: what’s hard for one person is easy for another
Collaboration only scales if all friction is eliminated
Friction in machine learning
• Data hard to find, in different formats
• Code hard to find, difficult to use
• Published experiments are often not reproducible
• Reproducing my own results can be difficult
• No easy overview of state-of-the-art
• Benchmarks are often incomparable
• Experimentation is mostly solitary, offline
• Only afterwards do we share aggregated results
• Limited interaction and visibility in early stages
• Fear of getting scooped
Make machine learning as simple as possible
(but not simpler)
Use any (open-source) tool to try many algorithms;
share code and experiments online.
Find richly annotated datasets, import easily.
Or share your own.
Clear problem descriptions, evaluation protocols,
overview of state of the art.
Reproducible, transparent, reusable results
Organized for easy analysis and reuse
Frictionless online collaboration, track contributions
Easy sharing of data and results
What if we could analyse data collaboratively?
What if we could analyse data collaboratively, on web scale?
What if we could analyse data collaboratively, on web scale, in real time?
Collaborative machine learning
Easy to use: Integrated in many ML tools/environments
Modular contribution: Share data, code, models, evaluations
Organized data: Meta-data, reproducible models, link to people
Reward structure: Track your impact, build reputation
Self-learning: Learn from many experiments to help people
OpenML
It starts with data
Data (ARFF) is uploaded or referenced by URL,
then auto-versioned and analysed, with meta-data
extracted and everything organised online.
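The meta-data OpenML extracts on upload includes dataset-level statistics of the kind sketched below. This is an illustrative, self-contained example of computing a few such meta-features (`simple_meta_features` is a hypothetical helper, not OpenML's actual extraction code):

```python
import math
from collections import Counter

def simple_meta_features(rows, target_index=-1):
    """Compute a few dataset-level meta-features of the kind
    extracted automatically on upload (illustrative sketch)."""
    n_instances = len(rows)
    n_features = len(rows[0]) - 1  # all columns except the target
    labels = [row[target_index] for row in rows]
    counts = Counter(labels)
    # Class entropy: a measure of how balanced the labels are.
    entropy = -sum((c / n_instances) * math.log2(c / n_instances)
                   for c in counts.values())
    return {"instances": n_instances, "features": n_features,
            "classes": len(counts), "class_entropy": round(entropy, 3)}

# A tiny toy dataset: two numeric features plus a class label.
data = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.8, "b"), (5.9, 3.1, "b")]
print(simple_meta_features(data))
# {'instances': 4, 'features': 2, 'classes': 2, 'class_entropy': 1.0}
```

Storing such meta-features alongside each dataset is what makes datasets searchable and comparable online.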
Tasks contain data, goals, and procedures.
Machine-readable by tools, they automate experimentation.
Train-test splits
Classify target X
All results organized online: realtime overview
Collaborate online
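The reason results on a task are directly comparable is that the task fixes the evaluation procedure, including the exact train-test splits. A minimal sketch of the idea, assuming seeded fold generation (this is not OpenML's actual splitting code):

```python
import random

def make_splits(n_instances, n_folds=10, seed=1):
    """Generate reproducible cross-validation fold assignments.
    Because the seed is part of the task definition, every
    participant evaluates on exactly the same splits."""
    rng = random.Random(seed)
    indices = list(range(n_instances))
    rng.shuffle(indices)
    # Deal shuffled indices round-robin into n_folds folds.
    return [indices[i::n_folds] for i in range(n_folds)]

# Two independent calls produce identical splits.
assert make_splits(100) == make_splits(100)
```

With the protocol pinned down like this, any two uploaded results on the same task can be ranked side by side.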
Flows (workflows, scripts) can run anywhere (locally)
Tool integrations + APIs (REST, R, Python, Java,…)
Use any ML library
Run wherever you want
Experiments are auto-uploaded and evaluated online:
reproducible, linked to data, flows, authors,
and all other experiments.
Analyse results objectively
Publish, and track your impact
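What actually gets uploaded for each experiment is a run: the task, the flow with its exact parameter settings, and the predictions, from which the server computes evaluations uniformly for everyone. A hedged local sketch of such a run record (`make_run_record` and the task id are hypothetical, for illustration only):

```python
def make_run_record(task_id, flow_name, parameters, predictions, truth):
    """Bundle everything needed to reproduce and evaluate an
    experiment: the task, the flow (with exact parameters),
    and the predictions (illustrative sketch)."""
    accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
    return {
        "task_id": task_id,
        "flow": {"name": flow_name, "parameters": parameters},
        "predictions": predictions,
        "evaluations": {"predictive_accuracy": accuracy},
    }

run = make_run_record(
    task_id=59,  # hypothetical task id
    flow_name="sklearn.tree.DecisionTreeClassifier",
    parameters={"max_depth": 3},
    predictions=["a", "b", "b", "a"],
    truth=["a", "b", "a", "a"],
)
print(run["evaluations"]["predictive_accuracy"])  # 0.75
```

Because the predictions themselves are stored, evaluations can be recomputed or extended later without rerunning anything.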
Demo
OpenML Community
Nov–Dec 2016:
3000+ registered users,
2500 active in the last 30 days
Studies
Online collaboration environment
Link to GitHub, Jupyter notebooks, wiki, Overleaf,…
Sharing settings
Share datasets, flows, and studies with selected people.
Easily publish them later.
Code submissions, GitHub integration
Sharing versioned code, docker images, archiving
In progress…
Benchmarking suites
Curated datasets + all reproducible results
Meta-learning, learning to learn
Learn from all prior experiments -> bots
Learn across datasets, use OpenML as ‘memory’
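Using OpenML as a 'memory' can be as simple as recommending, for a new dataset, the flow that worked best on the most similar previously seen dataset. A minimal nearest-neighbour sketch over meta-features (the flow names and meta-feature vectors are invented for illustration):

```python
def recommend_flow(new_meta, memory):
    """Meta-learning sketch: return the best-performing flow of the
    dataset whose meta-features are closest to the new dataset's,
    treating past results as a shared memory."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(memory, key=lambda rec: dist(new_meta, rec["meta"]))
    return nearest["best_flow"]

# Past results: (n_instances, n_features) of each dataset + best flow.
memory = [
    {"meta": (150, 4), "best_flow": "knn"},
    {"meta": (70000, 784), "best_flow": "conv_net"},
]
print(recommend_flow((200, 6), memory))  # knn
```

Real meta-learning bots use far richer meta-features and models, but the principle — learn across datasets from all prior experiments — is the same.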
Deep learning integration
Native integration in deep learning libraries
Store/analyse models (architecture, parameters)
In progress (all help welcome!)
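Storing and analysing deep models means serializing their architecture and parameter counts into structured metadata that can be queried alongside results. A hypothetical sketch of such a description (not an actual OpenML schema):

```python
def describe_model(layers):
    """Flatten a network architecture into queryable metadata
    (illustrative only; not an actual storage schema)."""
    return {
        "n_layers": len(layers),
        "n_parameters": sum(l["params"] for l in layers),
        "layers": [{"type": l["type"], "units": l["units"]} for l in layers],
    }

arch = [
    {"type": "dense", "units": 128, "params": 100480},  # 784*128 + 128
    {"type": "dense", "units": 10, "params": 1290},     # 128*10 + 10
]
desc = describe_model(arch)
print(desc["n_parameters"])  # 101770
```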
Daily research
Support other machine learning tasks, tools
Support ML in other scientific domains
Auto-experimentation bot
Join us!
Open initiative, contribute via GitHub/Slack
Regular week-long hackathons
Bring your own data, tools
Resources, ideas welcome
Thank You
@open_ml
OpenML
Now try it yourself :)
www.openml.org
