the open experiment database
      meta-learning for the masses



                        Joaquin Vanschoren   @joavanschoren
The Polymath story




     Tim Gowers
Machine Learning
   are we doing it right?
Computer Science
• The scientific method
 • Make a hypothesis about the world
 • Generate predictions based on this hypothesis
 • Design experiments to verify/falsify the prediction
   • Predictions verified: hypothesis might be true
   • Predictions falsified: hypothesis is wrong
Computer Science
• The scientific method (for ML)
 • Make a hypothesis about (the structure of) given data
 • Generate models based on this hypothesis
 • Design experiments to measure accuracy of the models
   • Good performance: It works (on this data)
   • Bad performance: It doesn’t work on this data
   • Aggregates (it works 60% of the time) not useful
Computer Science
• The scientific method (for ML): open questions
   • How can the data on which the algorithm works well be characterized?
   • What is the effect of parameter settings?
Meta-Learning
• The science of understanding which algorithms work
  well on which types of data
 • Hard: thorough understanding of data and algorithms
 • Requires good data: extensive experimentation

• Why is this separate from other ML research?
 • A thorough algorithm evaluation = a meta-learning study
 • Original authors know algorithms and data best, have large sets
   of experiments, are (presumably) interested in knowing on
   which data their algorithms work well (or not)
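As a rough illustration of meta-learning used for algorithm selection, the sketch below recommends an algorithm for a new dataset. It finds the most similar past dataset in a table of meta-features and returns the algorithm that did best there. The meta-features, the log scaling and the 1-nearest-neighbour rule are assumptions chosen for brevity. The table entries are placeholders, not real results.

```python
import math

# Placeholder meta-data: each past dataset is described by a few meta-features
# and the algorithm that performed best on it. Illustrative values, not real results.
PAST_EXPERIMENTS = [
    {"meta": {"n_instances": 150, "n_features": 4, "class_entropy": 1.58}, "best": "SVM-RBF"},
    {"meta": {"n_instances": 20000, "n_features": 16, "class_entropy": 4.7}, "best": "RandomForest"},
    {"meta": {"n_instances": 300, "n_features": 60, "class_entropy": 1.0}, "best": "NaiveBayes"},
]


def meta_vector(meta):
    """Turn meta-features into a vector; counts are log-scaled so size does not dominate."""
    return (
        math.log1p(meta["n_instances"]),
        math.log1p(meta["n_features"]),
        meta["class_entropy"],
    )


def recommend(new_meta, experiments=PAST_EXPERIMENTS):
    """Return the best algorithm on the most similar past dataset (1-nearest neighbour)."""
    target = meta_vector(new_meta)
    nearest = min(experiments, key=lambda e: math.dist(target, meta_vector(e["meta"])))
    return nearest["best"]


print(recommend({"n_instances": 200, "n_features": 5, "class_entropy": 1.5}))  # prints: SVM-RBF
```

The more (and more varied) past experiments such a table holds, the more useful the recommendation becomes, which is why extensive experimentation matters.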
Meta-Learning
         With the right tools, can we make everyone a meta-learner?

[Diagram: large sets of experiments at the centre, linked to datasets, source code, data insight, algorithm insight, ML algorithm design, algorithm comparison, learning curves, algorithm selection, meta-learning, algorithm characterization, data characterization and bias-variance analysis]
Open Machine Learning
Open science
[Examples: World-wide Telescope, Microarray Databases, GenBank]
Open machine learning?
• We can also be ‘open’
 • Simple, common formats to describe experiments, workflows,
   algorithms,...
 • Platform to share, store, query, interact

• We can go (much) further
 • Share experiments automatically (open source ML tools)
 • Experiment on-the-fly (cheap, no expensive instruments)
 • Controlled experimentation (experimentation engine)
Formalizing machine learning

• Unique names for algorithms, datasets, evaluation
  measures, data characterizations,... (ontology)
  • Based on DMOP, OntoDM, KDOntology, EXPO,...
• Simple, structured way to describe algorithm setups,
  workflows and experiment runs
• Detailed enough to reproduce all experiments
Run
Execution of a predefined setup

[Diagram: a run executes a setup on a machine, consuming input data and producing output data]

Also: start time, author, status,...
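A run could be recorded along these lines. The `Run` class and its field names are hypothetical, chosen only to mirror the slide: the setup that was executed, the machine, input and output data, and metadata such as start time, author and status.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Run:
    """Execution of a predefined setup. Field names are hypothetical."""
    setup_id: str                                    # the setup that was executed
    machine: str                                     # where it was executed
    input_data: list                                 # ids of the data consumed
    output_data: list = field(default_factory=list)  # ids of the data produced
    author: str = "unknown"
    status: str = "running"
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def finish(self, outputs):
        """Attach the produced data and mark the run as finished."""
        self.output_data.extend(outputs)
        self.status = "finished"


run = Run(setup_id="setup-1", machine="host-a", input_data=["dataset-1"])
run.finish(["evaluation-1", "predictions-1"])
print(run.status, run.output_data)
```

Keeping the setup separate from the run means the same plan can be executed many times, on different machines or data, with each execution stored as its own record.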
Setup
Plan of what we want to do

• Hierarchical: setups can be part of other setups
• Parameterized: parameter settings (p=!)
• Abstract/concrete

[Diagram: kinds of setup: algorithm setup, function setup (f(x)), workflow, experiment]
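The hierarchical, parameterized nature of setups can be sketched as a tree. This is one reading of the slides, assuming that `p=!` marks a parameter with a set value. It also assumes a setup counts as concrete once every parameter in it and in all of its parts has a value. The `Setup` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Setup:
    """A plan of what we want to do. Class and field names are hypothetical."""
    name: str
    kind: str                                        # "algorithm", "function", "workflow" or "experiment"
    parameters: dict = field(default_factory=dict)   # name -> value, or None if not yet set
    parts: list = field(default_factory=list)        # child setups ("part of")

    def add_part(self, child):
        """Make child part of this setup and return it, so calls can be chained."""
        self.parts.append(child)
        return child

    def is_concrete(self):
        """True if this setup and all of its parts have every parameter set."""
        own = all(value is not None for value in self.parameters.values())
        return own and all(part.is_concrete() for part in self.parts)


flow = Setup("mainFlow", "workflow")
learner = flow.add_part(Setup("SMO", "algorithm", {"C": 0.01}))
kernel = learner.add_part(Setup("RBF", "function", {"G": None}))
print(flow.is_concrete())   # False: the kernel's G is still unset
kernel.parameters["G"] = 0.01
print(flow.is_concrete())   # True
```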
Algorithm Setup
Fully defined algorithm configuration

[Diagram: an algorithm setup is composed of (part of) an implementation, parameter settings (p=!) and function setups (f(x)); these refer to concepts with unique names: algorithm (described by algorithm qualities), parameter (p=?) and mathematical function]

Roles: learner, base-learner, kernel,...
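An algorithm setup whose components play named roles might be modelled as below. The classes are hypothetical. The identifiers `Weka.SMO` and `Weka.RBF` and the settings `C=0.01` and `G=0.01` follow the workflow example in these slides, with the RBF function setup filling the `kernel` role.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSetup:
    """A configured mathematical function, e.g. a kernel."""
    function: str
    settings: dict = field(default_factory=dict)


@dataclass
class AlgorithmSetup:
    """An implementation plus parameter settings (p=!) and role-labelled components."""
    implementation: str
    settings: dict = field(default_factory=dict)
    components: dict = field(default_factory=dict)   # role -> AlgorithmSetup or FunctionSetup

    def describe(self, role="learner", depth=0):
        """Return one indented line per setup, labelled with the role it plays."""
        lines = [f"{'  ' * depth}{role}: {self.implementation} {self.settings}"]
        for child_role, component in self.components.items():
            if isinstance(component, AlgorithmSetup):
                lines.extend(component.describe(child_role, depth + 1))
            else:
                lines.append(f"{'  ' * (depth + 1)}{child_role}: {component.function} {component.settings}")
        return lines


smo = AlgorithmSetup("Weka.SMO", {"C": 0.01}, {"kernel": FunctionSetup("Weka.RBF", {"G": 0.01})})
print("\n".join(smo.describe()))
```

The same structure covers ensembles: a bagging setup would hold another `AlgorithmSetup` under the `base-learner` role.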
Workflow Setup
[Diagram: a workflow is a setup composed of (part of) algorithm setups, linked by connections that each have a source and a target]

       Workflow: components, connections,
             and parameters (inputs)
Also: ports, datatype
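Connections between components can be checked against port datatypes. This is a minimal sketch assuming each port carries a single datatype name. It treats a connection as valid only when the source and target datatypes match exactly. `Port`, `Connection` and `check_connections` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    component: str   # id of the component that owns the port
    name: str
    datatype: str


@dataclass(frozen=True)
class Connection:
    source: Port     # an output port
    target: Port     # an input port


def check_connections(connections):
    """Return a readable message for every connection whose datatypes differ."""
    errors = []
    for c in connections:
        if c.source.datatype != c.target.datatype:
            errors.append(
                f"{c.source.component}.{c.source.name} ({c.source.datatype}) -> "
                f"{c.target.component}.{c.target.name} ({c.target.datatype})"
            )
    return errors


good = Connection(Port("2:loadData", "data", "Weka.Instances"),
                  Port("3:crossValidate", "data", "Weka.Instances"))
bad = Connection(Port("3:crossValidate", "pred", "Predictions"),
                 Port("1:mainFlow", "eval", "Evaluations"))
print(check_connections([good, bad]))
```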
Workflow Example
1:mainFlow (inputs: url, par)
   2:loadData: Weka.ARFFLoader (p=! location=http://..., logRuns=true); its data output feeds 3:crossValidate
   3:crossValidate: Weka.Evaluation (p=! F=10, p=! S=1, logRuns=true); outputs: eval (evaluations), pred (predictions)
      4:learner: Weka.SMO (p=! C=0.01, logRuns=false)
         5:kernel: Weka.RBF, f(x) (p=! G=0.01)
Workflow outputs: 6 Evaluations, 7 Predictions, 8 data (Weka.Instances)
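The example workflow can be written down as a nested structure. The dict layout and the `flatten` helper are assumptions for illustration. The nesting assumes that 4:learner sits inside 3:crossValidate and 5:kernel inside 4:learner, as the layout of the example suggests. The elided URL is kept as `http://...`.

```python
# The workflow example as nested dicts (layout is an assumption for illustration).
MAIN_FLOW = {
    "id": "1:mainFlow",
    "parts": [
        {"id": "2:loadData", "impl": "Weka.ARFFLoader",
         "settings": {"location": "http://...", "logRuns": True}},
        {"id": "3:crossValidate", "impl": "Weka.Evaluation",
         "settings": {"F": 10, "S": 1, "logRuns": True},
         "parts": [
             {"id": "4:learner", "impl": "Weka.SMO",
              "settings": {"C": 0.01, "logRuns": False},
              "parts": [
                  {"id": "5:kernel", "impl": "Weka.RBF", "settings": {"G": 0.01}},
              ]},
         ]},
    ],
}


def flatten(node, parent=None):
    """Yield (component id, implementation, parent id) in depth-first order."""
    yield node["id"], node.get("impl"), parent
    for child in node.get("parts", []):
        yield from flatten(child, node["id"])


for component, impl, parent in flatten(MAIN_FLOW):
    print(f"{component:16} {impl or '-':16} part of {parent}")
```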
Experiment Setup
[Diagram: an experiment is a setup composed of (part of) algorithm setups, workflows and experiment variables (<X>)]

Also: experiment design, description, literature reference, author,...
Variables: labeled tuples which can be referenced in setups
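One way to read "labeled tuples which can be referenced in setups" is this: an experiment template contains placeholders such as `<X>`, each bound to a tuple of values. Expanding the template then yields one concrete setup per combination of values. The sketch below assumes exactly that. `expand` is a hypothetical helper.

```python
from itertools import product


def expand(template, variables):
    """Yield one concrete setup per combination of variable values.

    template:  dict whose string values may be placeholders such as "<X>"
    variables: dict mapping each placeholder label to a tuple of values
    """
    labels = list(variables)
    for combo in product(*(variables[label] for label in labels)):
        binding = dict(zip(labels, combo))
        # Replace placeholder strings; leave every other value untouched.
        yield {k: binding.get(v, v) if isinstance(v, str) else v for k, v in template.items()}


template = {"learner": "<X>", "evaluation": "10-fold CV"}
for setup in expand(template, {"<X>": ("Weka.SMO", "Weka.J48")}):
    print(setup)
```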
Run
[Diagram: data produced by a run refers to that run as its source; kinds of data: dataset, evaluation, model, predictions; data can be characterized by data qualities]
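Every piece of output data points back to the run that produced it. Provenance can therefore be traced by walking from data to run, then to that run's inputs. The two tables below are hypothetical stand-ins for the database, with made-up ids.

```python
# Hypothetical data table: data id -> (kind, id of the run that produced it).
# Source datasets were not produced by any run.
DATA = {
    "d1": ("dataset", None),
    "m1": ("model", "r1"),
    "p1": ("predictions", "r1"),
    "e1": ("evaluation", "r2"),
}
# Hypothetical run table: run id -> ids of the data it consumed.
RUN_INPUTS = {"r1": ["d1"], "r2": ["p1"]}


def lineage(data_id):
    """Return every data id that data_id was (transitively) derived from."""
    _kind, source_run = DATA[data_id]
    if source_run is None:
        return []
    ancestors = []
    for parent in RUN_INPUTS[source_run]:
        ancestors.append(parent)
        ancestors.extend(lineage(parent))
    return ancestors


print(lineage("e1"))  # prints: ['p1', 'd1']
```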
EXPML
[Figure: the example workflow (1:mainFlow, 2:loadData, 3:crossValidate, 4:learner, 5:kernel) described in ExpML]
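A setup tree like the one in the workflow example could be serialized to XML with the standard library. The element and attribute names (`setup`, `parameter_setting`, `implementation`) are illustrative assumptions. They are not the actual ExpML schema.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Serialize a nested setup dict into an XML element (hypothetical tag names)."""
    elem = ET.Element("setup", id=node["id"])
    if "impl" in node:
        elem.set("implementation", node["impl"])
    for name, value in node.get("settings", {}).items():
        ET.SubElement(elem, "parameter_setting", name=name, value=str(value))
    for child in node.get("parts", []):
        elem.append(to_xml(child))   # child setups nest inside their parent
    return elem


flow = {
    "id": "4:learner", "impl": "Weka.SMO", "settings": {"C": 0.01},
    "parts": [{"id": "5:kernel", "impl": "Weka.RBF", "settings": {"G": 0.01}}],
}
print(ET.tostring(to_xml(flow), encoding="unicode"))
```

Nesting child setups inside their parent keeps the "part of" hierarchy explicit in the document itself.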
Demo
(preview)
Examples
[Figure: Learning curves. Predictive accuracy (0.2 to 1.0) vs. percentage of original dataset size (10% to 100%) for RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes and SVM-RBF]
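A learning curve measures accuracy on a fixed test set while training on growing subsamples of the data. To stay self-contained, the sketch below uses synthetic two-class Gaussian data and a nearest-centroid classifier. Both are assumptions made purely for illustration. Any learner that returns a prediction function could be passed as `fit`.

```python
import random


def make_data(n, seed=0):
    """Synthetic two-class data: class 1 is shifted by +1 on both features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append(((rng.gauss(y, 1.0), rng.gauss(y, 1.0)), y))
    return data


def nearest_centroid(train):
    """Fit a nearest-centroid classifier and return its predict function."""
    centroids = {}
    for label in {y for _, y in train}:
        points = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(c) / len(points) for c in zip(*points))

    def predict(x):
        return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))
    return predict


def learning_curve(train, test, fractions, fit=nearest_centroid, seed=0):
    """Accuracy on a fixed test set when training on growing, nested random subsamples."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    curve = []
    for frac in fractions:
        subset = shuffled[: max(2, int(frac * len(shuffled)))]
        predict = fit(subset)
        accuracy = sum(predict(x) == y for x, y in test) / len(test)
        curve.append((frac, accuracy))
    return curve


train_data, test_data = make_data(500, seed=1), make_data(200, seed=2)
curve = learning_curve(train_data, test_data, [0.1, 0.5, 1.0])
print(curve)
```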
Examples
When does one algorithm outperform another?
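With experiments stored in a database, questions like this become queries over stored results. Below is a sketch using an in-memory SQLite database. The two-table schema, the algorithm names `A` and `B` and all rows are toy values for illustration. They are not the experiment database's real schema or results.

```python
import sqlite3

# Toy schema and toy rows for illustration only (not real experiment results).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (name TEXT PRIMARY KEY, n_instances INTEGER);
CREATE TABLE evaluation (dataset TEXT, algorithm TEXT, accuracy REAL);
INSERT INTO dataset VALUES ('toy_a', 100), ('toy_b', 5000), ('toy_c', 20000);
INSERT INTO evaluation VALUES
  ('toy_a', 'A', 0.80), ('toy_a', 'B', 0.85),
  ('toy_b', 'A', 0.90), ('toy_b', 'B', 0.88),
  ('toy_c', 'A', 0.93), ('toy_c', 'B', 0.89);
""")

# For every dataset: how much does A gain over B, and how large is the dataset?
QUERY = """
SELECT d.name, d.n_instances, a.accuracy - b.accuracy AS gain
FROM dataset d
JOIN evaluation a ON a.dataset = d.name AND a.algorithm = 'A'
JOIN evaluation b ON b.dataset = d.name AND b.algorithm = 'B'
ORDER BY d.n_instances
"""
rows = conn.execute(QUERY).fetchall()
for name, size, gain in rows:
    print(f"{name:6} {size:6d} {'A wins' if gain > 0 else 'B wins'} ({gain:+.2f})")
```

Joining results with dataset properties (here only the number of instances) is what turns a benchmark table into a question about *when* one algorithm wins.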
Examples
[Figure: bias-variance profiles of algorithms, including boosting and bagging]
Bias-variance profile + effect of dataset size
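For zero-one loss, one common choice of bias-variance decomposition is that of Kohavi and Wolpert (1996), assumed here. Train the same algorithm on several resampled training sets and record each model's prediction for every test point. Then split the error into bias² and variance. With a noise-free target, the two terms sum to the average error.

```python
from collections import Counter


def kohavi_wolpert(predictions, truth):
    """Average bias^2 and variance for zero-one loss.

    predictions[i][r]: label predicted for test point i by the model from repetition r
    truth[i]:          true label of test point i (assumed noise-free)
    """
    bias2 = variance = 0.0
    for preds, y in zip(predictions, truth):
        # Estimated distribution of predicted labels over the repetitions.
        p_hat = {lab: c / len(preds) for lab, c in Counter(preds).items()}
        labels = set(p_hat) | {y}
        bias2 += 0.5 * sum(((1.0 if lab == y else 0.0) - p_hat.get(lab, 0.0)) ** 2 for lab in labels)
        variance += 0.5 * (1.0 - sum(p ** 2 for p in p_hat.values()))
    n = len(truth)
    return bias2 / n, variance / n


preds = [["a", "a", "b", "a"], ["b", "b", "b", "b"], ["a", "b", "c", "a"]]
truth = ["a", "a", "c"]
b2, var = kohavi_wolpert(preds, truth)
print(f"bias^2={b2:.3f} variance={var:.3f} error={b2 + var:.3f}")
```

A consistently wrong model (second test point) shows up as pure bias, while a model whose predictions change between repetitions contributes variance.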
Taking it further
       Seamless integration

• Webservice for sharing, querying experiments
• Integrate experiment sharing in ML tools (WEKA,
  KNIME, RapidMiner, R,...)
  • Mapping implementations, evaluation measures,...

• Online platform for custom querying, community
  interaction
• Semantic wiki: algorithm/data descriptions, rankings, ...
Experimentation Engine
• Controlled experimentation (Delve, MLComp)
 • Download datasets, build training/test sets
 • Feed training and test sets to algorithms, retrieve predictions/
   models
 • Run broad set of evaluation measures
 • Benchmarking (Cross-Validation), learning curve analysis,
   bias-variance analysis, workflows(!)
 • Compute data properties for new datasets
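The core loop of such an engine can be sketched as k-fold cross-validation: build training and test sets, feed them to an algorithm, and collect predictions and an evaluation measure. The defaults `k=10` and `seed=1` are an assumption. They mirror `F=10` and `S=1` in the workflow example, read as number of folds and random seed. `fit` is any function that takes a training set and returns a predictor.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=1):
    """Split range(n) into k disjoint, shuffled folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, fit, k=10, seed=1):
    """Run k-fold CV on (x, y) pairs; return per-instance predictions and accuracy."""
    predictions = {}
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        predict = fit(train)
        for i in test_idx:
            predictions[i] = predict(data[i][0])
    accuracy = sum(predictions[i] == y for i, (_, y) in enumerate(data)) / len(data)
    return predictions, accuracy


def majority_class(train):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


data = [((float(i),), 0) for i in range(30)] + [((float(i),), 1) for i in range(10)]
predictions, accuracy = cross_validate(data, majority_class, k=10)
print(accuracy)  # 0.75: every training set has a 0-majority, so all 40 predictions are 0
```

Storing the per-instance predictions, not just the final accuracy, is what lets other evaluation measures be computed later without rerunning the experiment.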
Why would you use it?
            (seeding)
• Let the system run the experiments for you
• Immediate, highly detailed benchmarks (no repeats)
• Up to date, detailed results (vs. static, aggregated in journals)
• All your results organized online (private?), anytime, anywhere
• Interact with people (weird results?)
• Get credit for all your results, including unexpected ones (e.g. citations)
• Visibility, new collaborations
• Check whether your algorithm is really the best (e.g. active testing)
• On which datasets does it perform well/badly?
Question

Is open machine learning possible?
Merci, Danke, Thanks, Xie Xie, Diolch, Toda, Dank U, Grazie, Spasiba, Efharisto, Gracias, Arigato, Köszönöm, Tesekkurler, Kia ora, Dhanyavaad, Hvala

              http://expdb.cs.kuleuven.be

nkrafacyberclub
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

Open Machine Learning

  • 1. The open experiment database: meta-learning for the masses. Joaquin Vanschoren @joavanschoren
  • 2. The Polymath story Tim Gowers
  • 3. Machine Learning: are we doing it right?
  • 4. Computer Science • The scientific method • Make a hypothesis about the world • Generate predictions based on this hypothesis • Design experiments to verify/falsify the prediction • Predictions verified: hypothesis might be true • Predictions falsified: hypothesis is wrong
  • 5. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful
  • 6. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful • How can the data on which the algorithm works well be characterized?
  • 7. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful • How can the data on which the algorithm works well be characterized? • What is the effect of parameter settings?
  • 8. Meta-Learning • The science of understanding which algorithms work well on which types of data • Hard: requires a thorough understanding of data and algorithms • Requires good data: extensive experimentation • Why is this separate from other ML research? • A thorough algorithm evaluation = a meta-learning study • Original authors know algorithms and data best, have large sets of experiments, and are (presumably) interested in knowing on which data their algorithms work well (or not)
  • 9. Meta-Learning • With the right tools, can we make everyone a meta-learner? [diagram linking datasets, ML algorithms, source code and large sets of experiments to meta-learning uses: algorithm comparison, learning curves, bias-variance analysis, algorithm selection, algorithm design, algorithm characterization, data characterization, algorithm insight, data insight]
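Slide 9 lists algorithm selection and data characterization among the uses of large experiment collections. A minimal sketch of one common way to combine them, under stated assumptions: describe each dataset by a few characterizations, find the most similar previously evaluated datasets, and recommend the algorithm that won on them. The dataset names, characterization values and winning algorithms below are hypothetical, and this is not claimed to be the method used in the experiment database.

```python
import math
from collections import Counter

# Hypothetical data characterizations per dataset:
# (log10 of #instances, #features, class entropy)
META_FEATURES = {
    "d1": (3.0, 10, 0.9),
    "d2": (4.5, 200, 1.0),
    "d3": (2.7, 8, 0.5),
    "d4": (4.2, 150, 0.95),
}

# Hypothetical stored results: best-performing algorithm per dataset
BEST_ALGORITHM = {"d1": "C45", "d2": "SVM-RBF", "d3": "NaiveBayes", "d4": "SVM-RBF"}


def distance(a, b):
    """Euclidean distance between two (unscaled) characterization vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(new_features, k=2):
    """Recommend the algorithm that won most often on the k most similar known datasets."""
    neighbours = sorted(META_FEATURES, key=lambda d: distance(META_FEATURES[d], new_features))[:k]
    votes = Counter(BEST_ALGORITHM[d] for d in neighbours)
    return votes.most_common(1)[0][0]


print(recommend((4.3, 180, 0.97)))  # SVM-RBF
```

In practice the characterizations would be normalized first; here the unscaled number of features dominates the distance.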
  • 13. Open science GenBank
  • 14. Open machine learning? • We can also be ‘open’ • Simple, common formats to describe experiments, workflows, algorithms,... • Platform to share, store, query, interact • We can go (much) further • Share experiments automatically (open source ML tools) • Experiment on-the-fly (cheap, no expensive instruments) • Controlled experimentation (experimentation engine)
  • 15. Formalizing machine learning • Unique names for algorithms, datasets, evaluation measures, data characterizations,... (ontology) • Based on DMOP, OntoDM, KDOntology, EXPO,... • Simple, structured way to describe algorithm setups, workflows and experiment runs • Detailed enough to reproduce all experiments
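  • 17. Run: execution of a predefined setup
  • 18. Run: execution of a predefined setup [diagram: a run links to its setup]
  • 20. Run [diagram: a run links to its setup and its input data (in)]
  • 21. Run [diagram: a run links to its setup, the machine it ran on, and its input data (in)]
  • 22. Run [diagram: a run links to its setup, the machine it ran on, its input data (in) and its output data (out)]
  • 23. Run [diagram as in slide 22] Also: start time, author, status,...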
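The run described on slides 17 to 23 can be written down as a simple record. This is an illustration only: the field names, identifiers such as `setup-1` and `node-07`, and the status values are hypothetical, not the actual experiment database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Run:
    """A run: the execution of a predefined setup (all field names are hypothetical)."""
    setup_id: str               # reference to the setup that was executed
    machine: str                # machine the run was executed on
    input_data: List[str]       # data going into the run ("in")
    output_data: List[str] = field(default_factory=list)  # data produced ("out")
    author: str = "unknown"
    start_time: datetime = field(default_factory=datetime.now)
    status: str = "pending"

    def finish(self, outputs):
        """Record the produced outputs and mark the run as finished."""
        self.output_data.extend(outputs)
        self.status = "finished"


run = Run(setup_id="setup-1", machine="node-07", input_data=["dataset:anneal"], author="jdoe")
run.finish(["predictions:run-1", "evaluation:run-1"])
print(run.status, run.output_data)
```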
  • 25. Setup: plan of what we want to do
  • 26. Setup: plan of what we want to do [diagram: kinds of setup: algorithm setup, function setup (f(x)), workflow, experiment]
  • 27. Setup: hierarchical (setups can be part of other setups) [same diagram]
  • 28. Setup: hierarchical, parameterized (parameter settings, p=!) [same diagram]
  • 29. Setup: hierarchical, parameterized, abstract/concrete [same diagram]
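A setup is hierarchical, parameterized, and either abstract or concrete. One way to model this, assuming an unset parameter (p=?) is stored as `None` and a set one (p=!) as its value, is a recursive structure in which a setup counts as concrete only if it and all of its parts have every parameter set. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Setup:
    """A plan of what we want to do; setups can be part of other setups."""
    name: str
    kind: str  # e.g. "algorithm", "function", "workflow", "experiment"
    parameters: Dict[str, Optional[object]] = field(default_factory=dict)  # None means p=? (unset)
    parts: List["Setup"] = field(default_factory=list)

    def is_concrete(self) -> bool:
        """Concrete iff every parameter here and in every sub-setup is set (p=!)."""
        return all(v is not None for v in self.parameters.values()) and all(
            part.is_concrete() for part in self.parts
        )


kernel = Setup("RBF", "function", {"G": 0.01})
learner = Setup("SMO", "algorithm", {"C": None}, parts=[kernel])  # C left open: abstract
print(learner.is_concrete())  # False
learner.parameters["C"] = 0.01
print(learner.is_concrete())  # True
```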
  • 30. Algorithm Setup
  • 31. Algorithm Setup: fully defined algorithm configuration
  • 32. Algorithm Setup: fully defined algorithm configuration [diagram: an algorithm setup combines an implementation, parameter settings (p=!) and function setups (f(x))]
  • 34. Algorithm Setup [same diagram]
  • 35. Algorithm Setup [diagram adds the abstract concepts behind each part: algorithm, algorithm parameter (p=?), mathematical function, algorithm quality]
  • 36. Algorithm Setup [as slide 35, with unique names for these concepts]
  • 37. Algorithm Setup [as slide 36] Roles: learner, base-learner, kernel,...
  • 38. Setup [diagram: algorithm setups, function setups (f(x)), workflows and experiments are all setups, which can be part of other setups]
  • 39. Workflow Setup [diagram: algorithm setups are part of a workflow setup]
  • 40. Workflow Setup [diagram adds connections, each with a source and a target] Workflow: components, connections, and parameters (inputs)
  • 41. Workflow Setup [as slide 40] Also: ports, datatype. Workflow: components, connections, and parameters (inputs)
  • 42. Workflow Example [diagram: Weka workflow 1:mainFlow containing 2:loadData (Weka.ARFFLoader, location=http://...), which passes data to 3:crossValidate (Weka.Evaluation, F=10, S=1); 3:crossValidate uses 4:learner (Weka.SMO, C=0.01) with 5:kernel (Weka.RBF, G=0.01) and produces evaluations and predictions; components carry logRuns=true/false flags]
  • 43. Workflow Example [as slide 42, plus the workflow outputs: 6 Evaluations, 7 Predictions, 8 data (Weka.Instances)]
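The Weka workflow on slides 42 and 43 can be encoded as components with typed ports plus connections between them. This is one possible encoding, not ExpML itself: the `Component` class, the port names, and reading output 8 as the `Weka.Instances` data flowing from the loader into the evaluation are assumptions. The sketch checks that each connection links ports of the same datatype and derives an execution order with Python's `graphlib` (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple


@dataclass
class Component:
    """One workflow component: an implementation plus parameter settings and typed ports."""
    implementation: str
    parameters: Dict[str, object] = field(default_factory=dict)
    inputs: Dict[str, str] = field(default_factory=dict)   # port -> datatype
    outputs: Dict[str, str] = field(default_factory=dict)  # port -> datatype


# Nested algorithm setups: the SMO learner (role: learner) uses an RBF kernel (role: kernel)
kernel = {"implementation": "Weka.RBF", "G": 0.01}
learner = {"implementation": "Weka.SMO", "C": 0.01, "kernel": kernel}

components: Dict[str, Component] = {
    "loadData": Component("Weka.ARFFLoader", {"location": "http://..."},
                          outputs={"data": "Weka.Instances"}),
    "crossValidate": Component("Weka.Evaluation", {"F": 10, "S": 1, "learner": learner},
                               inputs={"data": "Weka.Instances"},
                               outputs={"evaluations": "Evaluations",
                                        "predictions": "Predictions"}),
}

# Connections: (source component, source port, target component, target port)
connections: List[Tuple[str, str, str, str]] = [("loadData", "data", "crossValidate", "data")]


def check_connections(comps, conns):
    """Raise TypeError if a connection links ports with different datatypes."""
    for src, sport, dst, dport in conns:
        if comps[src].outputs[sport] != comps[dst].inputs[dport]:
            raise TypeError(f"{src}.{sport} -> {dst}.{dport}: datatype mismatch")


def execution_order(comps, conns):
    """Order components so that each one runs after the components feeding it."""
    graph = {name: set() for name in comps}
    for src, _, dst, _ in conns:
        graph[dst].add(src)
    return list(TopologicalSorter(graph).static_order())


check_connections(components, connections)
print(execution_order(components, connections))  # ['loadData', 'crossValidate']
```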
  • 44. Setup [diagram: algorithm setups, function setups (f(x)), workflows and experiments are all setups, which can be part of other setups]
  • 45. Experiment Setup [diagram: algorithm and workflow setups are part of an experiment setup, which defines experiment variables (<X>)]
  • 46. Experiment Setup [as slide 45] Also: experiment design, description, literature reference, author,...
  • 48. Experiment Setup: variables are labeled tuples which can be referenced in setups
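Experiment variables are labeled tuples referenced from setups. Assuming a full factorial design (every combination of values) and a hypothetical `$label` syntax for references, expanding an experiment into concrete setups could look like this. The SMO/RBF parameters come from the workflow example on slide 42; the value ranges are made up.

```python
from itertools import product

# Hypothetical experiment variables: labeled tuples of values
variables = {
    "C": (0.01, 0.1, 1.0),
    "G": (0.01, 0.1),
}

# Setup template referencing the variables by label (the "$label" syntax is an assumption)
template = {"implementation": "Weka.SMO", "C": "$C",
            "kernel": {"implementation": "Weka.RBF", "G": "$G"}}


def bind(node, values):
    """Replace every "$label" reference in a (nested) setup with its bound value."""
    if isinstance(node, dict):
        return {key: bind(value, values) for key, value in node.items()}
    if isinstance(node, str) and node.startswith("$"):
        return values[node[1:]]
    return node


def expand(template, variables):
    """Full factorial design: yield one concrete setup per combination of variable values."""
    labels = list(variables)
    for combo in product(*(variables[label] for label in labels)):
        yield bind(template, dict(zip(labels, combo)))


setups = list(expand(template, variables))
print(len(setups))  # 6
print(setups[0])
```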
  • 49. Run [diagram: a run links to its setup, the machine it ran on, its input data (in) and its output data (out)] Also: start time, author, status,...
  • 50. Run: data [diagram: kinds of data: dataset, evaluation, model, predictions]
  • 51. Run: data [as slide 50; each data item links to its source run]
  • 52. Run: data [as slide 51; data also has data qualities]
  • 53. ExpML [diagram: the Weka workflow from slide 42, described in ExpML]
  • 55. Examples [plot: learning curves, predictive accuracy (0.2 to 1) vs. percentage of original dataset size (10 to 100%), for RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes, SVM-RBF]
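Learning curves like the one on slide 55 come down to a grouped query over stored runs. This sketch uses an in-memory SQLite table with a hypothetical, heavily simplified schema (`run` with `algorithm`, `dataset`, `sample_pct`, `accuracy`) and made-up accuracy values; it is not the real database schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema (not the real expdb schema)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE run (algorithm TEXT, dataset TEXT, sample_pct INTEGER, accuracy REAL)")
con.executemany("INSERT INTO run VALUES (?, ?, ?, ?)", [
    ("NaiveBayes", "d1", 10, 0.70), ("NaiveBayes", "d1", 50, 0.74), ("NaiveBayes", "d1", 100, 0.75),
    ("RandomForest", "d1", 10, 0.68), ("RandomForest", "d1", 50, 0.80), ("RandomForest", "d1", 100, 0.85),
])  # made-up numbers, for illustration only

# Learning curve: accuracy as a function of sample size, one curve per algorithm
rows = con.execute("""
    SELECT algorithm, sample_pct, AVG(accuracy)
    FROM run WHERE dataset = ?
    GROUP BY algorithm, sample_pct
    ORDER BY algorithm, sample_pct""", ("d1",)).fetchall()

curves = {}
for algorithm, pct, acc in rows:
    curves.setdefault(algorithm, []).append((pct, acc))
print(curves)
```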
  • 56. Examples When does one algorithm outperform another?
  • 57. Examples When does one algorithm outperform another?
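One simple way to approach "when does one algorithm outperform another?" with stored results is to count wins per dataset and split the datasets by a data characteristic, here the number of instances. The results and the threshold below are hypothetical.

```python
# Hypothetical stored results: dataset -> (number of instances, {algorithm: accuracy})
results = {
    "d1": (150,   {"NaiveBayes": 0.93, "C45": 0.91}),
    "d2": (800,   {"NaiveBayes": 0.78, "C45": 0.80}),
    "d3": (5000,  {"NaiveBayes": 0.70, "C45": 0.84}),
    "d4": (20000, {"NaiveBayes": 0.72, "C45": 0.88}),
}


def wins_by_size(results, a, b, threshold):
    """Count datasets where a beats b, separately below and above a size threshold."""
    counts = {"small": [0, 0], "large": [0, 0]}  # [wins of a, datasets compared]
    for n_instances, acc in results.values():
        group = "small" if n_instances < threshold else "large"
        counts[group][1] += 1
        if acc[a] > acc[b]:
            counts[group][0] += 1
    return counts


print(wins_by_size(results, "NaiveBayes", "C45", threshold=1000))
# {'small': [1, 2], 'large': [0, 2]}
```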
  • 58. Examples Bias-variance profile + effect of dataset size
  • 59. Examples [plot labels: boosting, bagging] Bias-variance profile + effect of dataset size
  • 60. Examples Bias-variance profile + effect of dataset size
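A bias-variance profile needs each algorithm's predictions from models trained on several resampled training sets. The slides do not say which decomposition was used; as one option, this sketch follows Domingos' definitions for 0-1 loss, assuming noise-free labels: the main prediction is the most frequent one, bias is whether that main prediction is wrong, and variance is how often individual predictions disagree with it. The toy predictions are made up.

```python
from collections import Counter


def bias_variance_01(predictions, y_true):
    """Domingos-style bias/variance for 0-1 loss, assuming noise-free labels.

    predictions[r][i] is the label predicted for test point i by the model
    trained on the r-th resampled training set.
    """
    n_points = len(y_true)
    bias = variance = 0.0
    for i in range(n_points):
        preds_i = [run[i] for run in predictions]
        main = Counter(preds_i).most_common(1)[0][0]  # main (most frequent) prediction
        bias += main != y_true[i]                     # 1 if the main prediction is wrong
        variance += sum(p != main for p in preds_i) / len(preds_i)  # disagreement with main
    return bias / n_points, variance / n_points


preds = [["a", "b", "b"], ["a", "a", "b"], ["a", "b", "a"], ["a", "b", "b"]]
print(bias_variance_01(preds, ["a", "b", "a"]))
```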
  • 61. Taking it further: seamless integration • Web service for sharing and querying experiments • Integrate experiment sharing in ML tools (WEKA, KNIME, RapidMiner, R,...) • Mapping implementations, evaluation measures,... • Online platform for custom querying, community interaction • Semantic wiki: algorithm/data descriptions, rankings,...
  • 62. Experimentation Engine • Controlled experimentation (Delve, MLComp) • Download datasets, build training/test sets • Feed training and test sets to algorithms, retrieve predictions/models • Run a broad set of evaluation measures • Benchmarking (cross-validation), learning curve analysis, bias-variance analysis, workflows(!) • Compute data properties for new datasets
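The experimentation engine builds training and test sets itself. A minimal sketch of seeded, unstratified k-fold cross-validation splits follows. Reading the F=10 and S=1 settings of the Weka workflow as number of folds and random seed is an assumption, and Weka's own cross-validation additionally stratifies the folds for nominal class attributes.

```python
import random


def kfold_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Instances are shuffled once with a fixed seed so the folds are reproducible;
    every instance appears in exactly one test fold.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]  # round-robin assignment, sizes differ by at most 1
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


splits = list(kfold_splits(25, k=10, seed=1))
print(len(splits), [len(test) for _, test in splits])  # 10 [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```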
  • 63. Why would you use it? (seeding) • Let the system run the experiments for you • Immediate, highly detailed benchmarks (no repeats) • Up-to-date, detailed results (vs. static, aggregated results in journals) • All your results organized online (private?), anytime, anywhere • Interact with people (weird results?) • Get credit for all your results (e.g. citations), including unexpected results • Visibility, new collaborations • Check whether your algorithm really is the best (e.g. active testing) • On which datasets does it perform well/badly?
  • 64. Question Is open machine learning possible?
  • 65. Merci Danke Thanks Xie Xie Diolch Toda Dank U Grazie Spasiba Efharisto Gracias Arigato Köszönöm Tesekkurler Kia ora Dhanyavaad Hvala http://expdb.cs.kuleuven.be