the open experiment database
      meta-learning for the masses



                        Joaquin Vanschoren   @joavanschoren
The Polymath story




     Tim Gowers
Machine Learning
   are we doing it right?
Computer Science
• The scientific method
 • Make a hypothesis about the world
 • Generate predictions based on this hypothesis
 • Design experiments to verify/falsify the prediction
   • Predictions verified: hypothesis might be true
   • Predictions falsified: hypothesis is wrong
Computer Science
• The scientific method (for ML)
 • Make a hypothesis about (the structure of) given data
 • Generate models based on this hypothesis
 • Design experiments to measure accuracy of the models
   • Good performance: It works (on this data)
   • Bad performance: It doesn’t work on this data
   • Aggregates (it works 60% of the time) not useful
Computer Science
• The scientific method (for ML)
 • How can data be characterized on which the algorithm works well?
 • What is the effect of parameter settings?
Meta-Learning
• The science of understanding which algorithms work
  well on which types of data
 • Hard: thorough understanding of data and algorithms
 • Requires good data: extensive experimentation

• Why is this separate from other ML research?
 • A thorough algorithm evaluation = a meta-learning study
 • Original authors know algorithms and data best, have large sets
   of experiments, are (presumably) interested in knowing on
   which data their algorithms work well (or not)
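
A minimal sketch of the idea above, relating data characteristics to algorithm performance. It assumes a small table of past results is available. The meta-features, the distance measure and every number in `META_DATA` are hypothetical and purely illustrative; they are not taken from the experiment database.

```python
import math

# Hypothetical meta-dataset: simple data characteristics (meta-features)
# paired with the algorithm that did best in earlier experiments.
# All values are made up for illustration only.
META_DATA = [
    ({"n_examples": 150, "n_features": 4, "class_entropy": 1.58}, "NaiveBayes"),
    ({"n_examples": 60000, "n_features": 784, "class_entropy": 3.32}, "RandomForest"),
    ({"n_examples": 1000, "n_features": 20, "class_entropy": 0.88}, "SVM-RBF"),
]


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution, one simple data characterization."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def distance(a, b):
    """Euclidean distance over log-scaled sizes and raw class entropy (an arbitrary choice)."""
    return math.sqrt(
        (math.log10(a["n_examples"]) - math.log10(b["n_examples"])) ** 2
        + (math.log10(a["n_features"]) - math.log10(b["n_features"])) ** 2
        + (a["class_entropy"] - b["class_entropy"]) ** 2
    )


def recommend(meta_features, meta_data=META_DATA):
    """Suggest the algorithm that worked best on the most similar known dataset."""
    nearest = min(meta_data, key=lambda entry: distance(meta_features, entry[0]))
    return nearest[1]
```

With more experiments stored, the lookup table could be replaced by any learner trained on the meta-data; the nearest-neighbour rule here is only the simplest stand-in.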
Meta-Learning
         With the right tools, can we make everyone a
                         meta-learner?

                     datasets     algorithm comparison
               data insight
                                      learning curves
Large sets of experiments
                                       algorithm selection
  ML algorithm
                                          meta-learning
    design
                                   algorithm characterization
     algorithm insight
                                 data characterization
           source code
                            bias-variance analysis
Open Machine Learning
Open science




World-wide Telescope
Open science




Microarray Databases
Open science




   GenBank
Open machine learning?
• We can also be 'open'
 • Simple, common formats to describe experiments, workflows,
   algorithms,...
 • Platform to share, store, query, interact

• We can go (much) further
 • Share experiments automatically (open source ML tools)
 • Experiment on-the-fly (cheap, no expensive instruments)
 • Controlled experimentation (experimentation engine)
Formalizing
          machine learning

• Unique names for algorithms, datasets, evaluation
  measures, data characterizations,... (ontology)
  • Based on DMOP, OntoDM, KDOntology, EXPO,...
• Simple, structured way to describe algorithm setups,
  workflows and experiment runs
• Detailed enough to reproduce all experiments
Run
• Execution of a predefined setup
• Diagram: data → (in) → run → (out) → data, with the run linked to its setup and machine
• Also: start time, author, status,...
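
The run description above could be captured in a small record type. This is a sketch under assumptions: the field names, the `finish` helper and the status values are hypothetical and do not reflect the database's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    """One execution of a predefined setup (all field names are illustrative)."""
    setup_id: str                      # the setup this run executes
    machine: str                       # where the run was executed
    input_data: List[str] = field(default_factory=list)
    output_data: List[str] = field(default_factory=list)
    start_time: Optional[datetime] = None
    author: Optional[str] = None
    status: str = "pending"

    def finish(self, outputs):
        """Record the produced data and mark the run as done."""
        self.output_data.extend(outputs)
        self.status = "done"
```

For example, `Run("setup-1", "machine-a", input_data=["iris"])` describes a pending run, and calling `finish(["evaluations", "predictions"])` attaches its output data.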
Setup
• Plan of what we want to do
• Types: algorithm setup, function setup (f(x)), workflow, experiment
• Hierarchical: a setup can be part of another setup
• Parameterized: parameter settings (p=!)
• Abstract/concrete
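
A hedged sketch of how a hierarchical, parameterized setup might be modelled. The class and field names are hypothetical. The rule that a setup is "concrete" once every parameter in it and in its parts has a value is one plausible reading of the abstract/concrete distinction, not something the slides spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Setup:
    """A plan of what to run; names and structure are illustrative assumptions."""
    name: str
    kind: str  # "algorithm", "function", "workflow" or "experiment"
    # None marks a parameter that has not been given a value yet.
    parameters: Dict[str, Optional[object]] = field(default_factory=dict)
    parts: List["Setup"] = field(default_factory=list)  # sub-setups ("part of")

    def is_concrete(self) -> bool:
        """True when every parameter here and in all sub-setups has a value."""
        own = all(value is not None for value in self.parameters.values())
        return own and all(part.is_concrete() for part in self.parts)
```

A kernel setup nested inside a learner setup illustrates the hierarchy: the learner only becomes concrete once both its own parameters and the kernel's are set.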
Algorithm Setup
• Fully defined algorithm configuration
• Can be part of another setup
• Made up of: implementation, parameter settings (p=!), function setup (f(x))
• Linked to: algorithm quality, algorithm, parameter (p=?), mathematical function (f(x))
• Unique names
• Roles: learner, base-learner, kernel,...
Workflow Setup
• A kind of setup; can be part of another setup
• Workflow: components, connections, and parameters (inputs)
• Connection: source, target
• Also: ports, datatype
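
The workflow vocabulary above (components, connections with a source and a target, ports, datatypes) could be sketched as follows. All class names and the datatype check are assumptions made for illustration; the slides do not say how connections are validated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Port:
    """A named input or output of a workflow component (illustrative)."""
    component: str
    name: str
    datatype: str


@dataclass(frozen=True)
class Connection:
    """Links a source port to a target port."""
    source: Port
    target: Port


def invalid_connections(connections: List[Connection]) -> List[Connection]:
    """Return the connections whose source and target datatypes differ."""
    return [c for c in connections if c.source.datatype != c.target.datatype]
```

Checking datatypes when a workflow is described, rather than when it runs, is one way to catch wiring mistakes before any experiment is executed.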
Workflow
Example
• 1:mainFlow: inputs url, par; outputs evaluations (eval), predictions (pred)
• 2:loadData: Weka.ARFFLoader, p=! location=http://..., logRuns=true; output port: data
• 3:crossValidate: Weka.Evaluation, p=! F=10, p=! S=1, logRuns=true; input port: data; output ports: eval, pred
• 4:learner: Weka.SMO, p=! C=0.01, logRuns=false
• 5:kernel: Weka.RBF, f(x), p=! G=0.01
• Data: 6 Evaluations, 7 Predictions, 8 Weka.Instances
Experiment Setup
• A kind of setup, next to algorithm setups and workflows
• Experiment variable (<X>)
• Also: experiment design, description, literature reference, author,...
• Variables: labeled tuples which can be referenced in setups
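
One way to read "variables referenced in setups" is as placeholders that an experiment expands into concrete setups. The sketch below assumes that reading; the `<C>`-style placeholder syntax, the function name and the full-factorial expansion are all hypothetical choices, not the database's documented behaviour.

```python
from itertools import product


def expand(template, variables):
    """Expand a setup template into concrete setups, one per combination of values.

    template:  dict of parameter name -> value or placeholder such as "<C>"
    variables: dict of placeholder -> list of values to try
    """
    names = sorted(variables)
    setups = []
    for values in product(*(variables[name] for name in names)):
        binding = dict(zip(names, values))
        setups.append({
            key: binding.get(value, value) if isinstance(value, str) else value
            for key, value in template.items()
        })
    return setups
```

For instance, `expand({"C": "<C>", "G": "<G>", "folds": 10}, {"<C>": [0.01, 0.1], "<G>": [0.01]})` yields two concrete setups that differ only in `C`.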
Run
• Data: dataset, evaluation, model, predictions
• source: links data to a run
• Data quality
EXPML
[Figure: the example workflow again: 1:mainFlow with 2:loadData (Weka.ARFFLoader), 3:crossValidate (Weka.Evaluation), 4:learner (Weka.SMO) and 5:kernel (Weka.RBF)]
Demo
(preview)
Examples
[Figure: learning curves. x-axis: percentage of original dataset size (10 to 100); y-axis: predictive accuracy (0.2 to 1); series: RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes, SVM-RBF]
Learning curves
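
A learning curve like the one above can be derived from stored runs by averaging accuracy per algorithm and per dataset-size percentage. This is a sketch only: the record keys (`algorithm`, `percentage`, `accuracy`) are assumed names, not the database's actual fields.

```python
from collections import defaultdict
from statistics import mean


def learning_curves(runs):
    """Group run records into one curve per algorithm.

    runs: iterable of dicts with the (assumed) keys
          "algorithm", "percentage" and "accuracy".
    Returns {algorithm: [(percentage, mean accuracy), ...]} sorted by percentage.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["algorithm"], run["percentage"])].append(run["accuracy"])

    curves = defaultdict(list)
    for (algorithm, percentage), scores in sorted(grouped.items()):
        curves[algorithm].append((percentage, mean(scores)))
    return dict(curves)
```

Because the aggregation runs over whatever experiments are stored, the curve gains detail as more runs are shared.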
Examples




When does one algorithm outperform another?
Examples
[Figure labels: boosting, bagging]
Bias-variance profile + effect of dataset size
Taking it further
       Seamless integration

• Web service for sharing and querying experiments
• Integrate experiment sharing in ML tools (WEKA,
  KNIME, RapidMiner, R, ...)
  • Mapping implementations, evaluation measures,...

• Online platform for custom querying, community
  interaction
• Semantic wiki: algorithm/data descriptions, rankings, ...
Experimentation Engine
• Controlled experimentation (Delve, MLComp)
 • Download datasets, build training/test sets
 • Feed training and test sets to algorithms, retrieve predictions/
   models
 • Run broad set of evaluation measures
 • Benchmarking (Cross-Validation), learning curve analysis,
   bias-variance analysis, workflows(!)
 • Compute data properties for new datasets
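
The "build training/test sets" step can be illustrated with a plain k-fold cross-validation split. This is a minimal sketch, not the engine's implementation; the function name, the default of 10 folds and the seed are arbitrary assumptions.

```python
import random


def k_fold_splits(n_examples, k=10, seed=1):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)        # fixed seed keeps splits reproducible
    folds = [indices[i::k] for i in range(k)]   # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Fixing the seed means every algorithm is evaluated on identical splits, which is what makes results from different contributors comparable.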
Why would you use it?
            (seeding)
• Let the system run the experiments for you
• Immediate, highly detailed benchmarks (no repeats)
• Up to date, detailed results (vs. static, aggregated in journals)
• All your results organized online (private?), anytime, anywhere
• Interact with people (weird results?)
• Get credit for all your results (e.g. citations), unexpected results
• Visibility, new collaborations
• Check if your algorithm is really the best (e.g. active testing)
• On which datasets does it perform well/badly?
Question

     Is
    open
machine learning
  possible?
Merci
                  Danke            Thanks
        Xie Xie
                                            Diolch
     Toda
                                               Dank U
 Grazie
                                                    Spasiba
Efharisto
                                                Gracias
  Arigato
                                                Köszönöm
Tesekkurler
                                               Kia ora
   Dhanyavaad
                                            Hvala

              http://expdb.cs.kuleuven.be

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Open Machine Learning

  • 1. the open experiment database meta-learning for the masses Joaquin Vanschoren @joavanschoren
  • 2. The Polymath story Tim Gowers
  • 3. Machine Learning are we doing it right?
  • 4. Computer Science • The scientific method • Make a hypothesis about the world • Generate predictions based on this hypothesis • Design experiments to verify/falsify the prediction • Predictions verified: hypothesis might be true • Predictions falsified: hypothesis is wrong
  • 5. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful
  • 6. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful • How can data be characterized on which the algorithm works well?
  • 7. Computer Science • The scientific method (for ML) • Make a hypothesis about (the structure of) given data • Generate models based on this hypothesis • Design experiments to measure accuracy of the models • Good performance: It works (on this data) • Bad performance: It doesn’t work on this data • Aggregates (it works 60% of the time) not useful • How can data be characterized on which the algorithm works well? • What is the effect of parameter settings?
  • 8. Meta-Learning • The science of understanding which algorithms work well on which types of data • Hard: thorough understanding of data and algorithms • Requires good data: extensive experimentation • Why is this separate from other ML research? • A thorough algorithm evaluation = a meta-learning study • Original authors know algorithms and data best, have large sets of experiments, are (presumably) interested in knowing on which data their algorithms work well (or not)
  • 9. Meta-Learning With the right tools, can we make everyone a meta-learner? datasets algorithm comparison data insight learning curves Large sets of experiments algorithm selection ML algorithm meta-learning design algorithm characterization algorithm insight data characterization source code bias-variance analysis
  • 13. Open science GenBank
  • 14. Open machine learning? • We can also be ‘open’ • Simple, common formats to describe experiments, workflows, algorithms,... • Platform to share, store, query, interact • We can go (much) further • Share experiments automatically (open source ML tools) • Experiment on-the-fly (cheap, no expensive instruments) • Controlled experimentation (experimentation engine)
  • 15. Formalizing machine learning • Unique names for algorithms, datasets, evaluation measures, data characterizations,... (ontology) • Based on DMOP, OntoDM, KDOntology, EXPO,... • Simple, structured way to describe algorithm setups, workflows and experiment runs • Detailed enough to reproduce all experiments
  • 17. Run Execution of a predefined setup run
  • 18. Run Execution of a predefined setup run setup
  • 20. Run in data run setup
  • 21. Run machine in data run setup
  • 22. Run machine in out data run data setup
  • 23. Run machine in out data run data Also: start time setup author status,...
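The run slides above can be read as a small record type. The following is a minimal sketch under that assumption: the class name, field names, and example values are all hypothetical and are not taken from the experiment database itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    """One execution of a predefined setup (all names here are illustrative)."""
    setup_id: str                                          # the setup being executed
    machine: str                                           # where the run took place
    input_data: List[str] = field(default_factory=list)   # "in" data
    output_data: List[str] = field(default_factory=list)  # "out" data
    start_time: Optional[datetime] = None
    author: str = ""
    status: str = "pending"

    def finish(self, outputs: List[str]) -> None:
        """Attach the produced data and mark the run as done."""
        self.output_data.extend(outputs)
        self.status = "done"


# Hypothetical example values, only to show how the record is filled in.
run = Run(setup_id="setup-1", machine="node-01", input_data=["dataset-A"],
          start_time=datetime(2000, 1, 1), author="example-author")
run.finish(["evaluation-1", "predictions-1"])
```

Keeping the setup as a reference (`setup_id`) rather than a copy mirrors the slides' separation between the plan (setup) and its execution (run).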
  • 25. Setup Plan of what we want to do setup
  • 26. Setup Plan of what we want to do setup f(x) algorithm function workflow experiment setup setup
  • 27. Setup Hierarchical part of setup f(x) algorithm function workflow experiment setup setup
  • 28. Setup Hierarchical Parameterized part of p=! setup parameter setting f(x) algorithm function workflow experiment setup setup
  • 29. Setup Hierarchical Parameterized Abstract/concrete part of p=! setup parameter setting f(x) algorithm function workflow experiment setup setup
  • 30. Algorithm Setup algorithm setup
  • 31. Algorithm Setup Fully defined algorithm part of configuration algorithm setup
  • 32. Algorithm Setup Fully defined algorithm part of configuration algorithm setup p=! f(x) implementation parameter setting function setup
  • 33. Algorithm Setup Fully defined algorithm part of configuration algorithm setup p=! f(x) implementation parameter setting function setup
  • 34. Algorithm Setup part of algorithm setup p=! f(x) implementation parameter setting function setup
  • 35. Algorithm Setup part of algorithm setup p=! f(x) implementation parameter setting function setup p=? f(x) algorithm quality algorithm parameter mathematical function
  • 36. Algorithm Setup part of unique names algorithm setup p=! f(x) implementation parameter setting function setup p=? f(x) algorithm quality algorithm parameter mathematical function
  • 37. Algorithm Setup part of algorithm setup unique names Roles: learner, base-learner, kernel,... p=! f(x) implementation parameter setting function setup p=? f(x) algorithm quality algorithm parameter mathematical function
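A hierarchical, parameterized algorithm setup with roles could be sketched as a nested structure. This is an illustration only: the class and method names are hypothetical, and the example values (Weka.SMO with C=0.01, Weka.RBF with G=0.01) are one reading of the workflow example slide rather than a confirmed configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlgorithmSetup:
    """An implementation plus parameter settings, optionally with nested setups."""
    implementation: str                                   # unique implementation name
    role: str                                             # e.g. learner, base-learner, kernel
    parameters: Dict[str, str] = field(default_factory=dict)
    parts: List["AlgorithmSetup"] = field(default_factory=list)  # "part of" children

    def describe(self, depth: int = 0) -> List[str]:
        """Return one indented line per setup in the hierarchy."""
        params = ", ".join(f"{k}={v}" for k, v in sorted(self.parameters.items()))
        lines = [f"{'  ' * depth}{self.role}: {self.implementation}({params})"]
        for part in self.parts:
            lines.extend(part.describe(depth + 1))
        return lines


# Assumed example: an SVM learner whose kernel is itself a setup.
kernel = AlgorithmSetup("Weka.RBF", "kernel", {"G": "0.01"})
learner = AlgorithmSetup("Weka.SMO", "learner", {"C": "0.01"}, [kernel])
description = learner.describe()
```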
  • 38. Setup part of setup f(x) algorithm function workflow experiment setup setup
  • 39. Workflow Setup part of setup algorithm workflow setup
  • 40. Workflow Setup part of setup algorithm workflow connection (source, target) setup Workflow: components, connections, and parameters (inputs)
  • 41. Workflow Setup part of setup algorithm workflow connection (source, target) setup Also: ports, datatype Workflow: components, connections, and parameters (inputs)
  • 42. Workflow Example (diagram: 1:mainFlow containing 2:loadData (Weka.ARFFLoader, location=http://...), 3:crossValidate (Weka.Evaluation, F=10, S=1), 4:learner (Weka.SMO, C=0.01) and 5:kernel (Weka.RBF, G=0.01); logRuns flags per component; outputs: evaluations, predictions)
  • 43. Workflow Example (same diagram, with outputs: 6 evaluations (Evaluations), 7 predictions (Predictions), 8 data (Weka.Instances))
  • 44. Setup part of setup f(x) algorithm function workflow experiment setup setup
  • 45. Experiment Setup part of setup <X> algorithm workflow experiment experiment setup variable
  • 46. Experiment Setup part of setup <X> algorithm workflow experiment experiment setup variable Also: experiment design, description, literature reference, author,...
  • 48. Experiment Setup Variables: labeled tuples which can be referenced in setups
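One way to picture experiment variables is as placeholders such as `<X>` that are expanded into concrete setups, one per combination of values. The sketch below assumes that reading; the function names, the dictionary layout, and the example values are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List


def resolve(value: str, binding: Dict[str, str]) -> str:
    """Replace a '<NAME>' placeholder with its bound value; pass other values through."""
    if value.startswith("<") and value.endswith(">"):
        return binding[value[1:-1]]
    return value


def expand(template: Dict[str, str],
           variables: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one concrete setup for every combination of variable values."""
    names = sorted(variables)
    for values in product(*(variables[name] for name in names)):
        binding = dict(zip(names, values))
        yield {key: resolve(val, binding) for key, val in template.items()}


# Assumed example: vary the dataset and one parameter of a fixed learner.
template = {"dataset": "<D>", "learner": "Weka.SMO", "C": "<C>"}
variables = {"D": ["data-1", "data-2"], "C": ["0.01", "0.1"]}
setups = list(expand(template, variables))
```

Here the abstract experiment setup stays a single template, while each expanded dictionary corresponds to one concrete setup that a run could execute.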
  • 49. Run machine in out data run data Also: start time setup author status,...
  • 50. Run data dataset evaluation model predictions
  • 51. Run source data run dataset evaluation model predictions
  • 52. Run source data quality data run dataset evaluation model predictions
  • 53. EXPML (diagram: the workflow example with 1:mainFlow, 2:loadData (Weka.ARFFLoader), 3:crossValidate (Weka.Evaluation), 4:learner (Weka.SMO) and 5:kernel (Weka.RBF))
  • 55. Examples Learning curves (figure: predictive accuracy vs. percentage of original dataset size for RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes, SVM-RBF)
  • 56. Examples When does one algorithm outperform another?
  • 57. Examples When does one algorithm outperform another?
  • 58. Examples Bias-variance profile + effect of dataset size
  • 59. Examples boosting bagging Bias-variance profile + effect of dataset size
  • 60. Examples Bias-variance profile + effect of dataset size
  • 61. Taking it further Seamless integration • Webservice for sharing, querying experiments • Integrate experiment sharing in ML tools (WEKA, KNIME, RapidMiner, R, ....) • Mapping implementations, evaluation measures,... • Online platform for custom querying, community interaction • Semantic wiki: algorithm/data descriptions, rankings, ...
  • 62. Experimentation Engine • Controlled experimentation (Delve, MLComp) • Download datasets, build training/test sets • Feed training and test sets to algorithms, retrieve predictions/ models • Run broad set of evaluation measures • Benchmarking (Cross-Validation), learning curve analysis, bias-variance analysis, workflows(!) • Compute data properties for new datasets
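The "build training/test sets" step for cross-validation benchmarking can be sketched in plain Python. This is a generic k-fold split under stated assumptions, not the engine's actual procedure; the function name and the default seed are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_instances: int, k: int,
                  seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)          # fixed seed keeps splits repeatable
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed example: 10 folds over a dataset of 25 instances.
splits = list(k_fold_splits(25, 10))
```

Fixing the seed means the same splits can be handed to every algorithm under comparison, which is what makes the resulting benchmarks comparable.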
  • 63. Why would you use it? (seeding) • Let the system run the experiments for you • Immediate, highly detailed benchmarks (no repeats) • Up to date, detailed results (vs. static, aggregated in journals) • All your results organized online (private?), anytime, anywhere • Interact with people (weird results?) • Get credit for all your results (e.g. citations), unexpected results • Visibility, new collaborations • Check if your algorithm is really the best (e.g. active testing) • On which datasets does it perform well/badly?
  • 64. Question Is open machine learning possible?
  • 65. Merci Danke Thanks Xie Xie Diolch Toda Dank U Grazie Spasiba Efharisto Gracias Arigato Köszönöm Tesekkurler Kia ora Dhanyavaad Hvala http://expdb.cs.kuleuven.be