Open Machine Learning

This talk explores the possibility of turning machine learning research into open science and proposes concrete approaches to achieve this goal.



Presentation Transcript

    • The open experiment database: meta-learning for the masses. Joaquin Vanschoren, @joavanschoren
    • The Polymath story (Tim Gowers)
    • Machine learning: are we doing it right?
    • Computer Science: the scientific method
      • Make a hypothesis about the world
      • Generate predictions based on this hypothesis
      • Design experiments to verify/falsify the predictions
      • Predictions verified: the hypothesis might be true
      • Predictions falsified: the hypothesis is wrong
    • Computer Science: the scientific method (for ML)
      • Make a hypothesis about (the structure of) the given data
      • Generate models based on this hypothesis
      • Design experiments to measure the accuracy of the models
      • Good performance: it works (on this data)
      • Bad performance: it doesn't work on this data
      • Aggregates ("it works 60% of the time") are not useful
      • Open questions: how can we characterize on which data the algorithm works well, and what is the effect of the parameter settings?
    • Meta-learning: the science of understanding which algorithms work well on which types of data
      • Hard: requires a thorough understanding of both data and algorithms
      • Requires good data: extensive experimentation
      • Why is this separate from other ML research? A thorough algorithm evaluation is itself a meta-learning study, and the original authors know their algorithms and data best, have large sets of experiments, and are (presumably) interested in knowing on which data their algorithms work well (or not)
    • Meta-learning: with the right tools, can we make everyone a meta-learner?
      [Diagram: large sets of experiments link datasets, ML algorithms, and source code to algorithm comparison, algorithm selection, algorithm design, data and algorithm insight, learning curves, meta-learning, algorithm and data characterization, and bias-variance analysis]
    • Open Machine Learning
    • Open science: the WorldWide Telescope
    • Open science: microarray databases
    • Open science: GenBank
    • Open machine learning?
      • We can also be 'open': simple, common formats to describe experiments, workflows, algorithms, ...; a platform to share, store, query, and interact
      • We can go (much) further: share experiments automatically (open-source ML tools), experiment on-the-fly (cheap, no expensive instruments), controlled experimentation (an experimentation engine)
    • Formalizing machine learning
      • Unique names for algorithms, datasets, evaluation measures, data characterizations, ... (an ontology), based on DMOP, OntoDM, KDOntology, EXPO, ...
      • A simple, structured way to describe algorithm setups, workflows, and experiment runs
      • Detailed enough to reproduce all experiments
    • Run: the execution of a predefined setup. A run ties the setup to the machine it ran on, its input data, and its output data, and also records the start time, author, status, ...
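A minimal sketch of what such a run record could look like as a data structure. The class and field names below are illustrative assumptions drawn from the slide, not the actual expdb schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical run record; field names mirror the slide (setup, machine,
# input/output data, start time, author, status) but are assumptions.
@dataclass
class Run:
    setup_id: int            # the predefined setup that was executed
    machine: str             # where the run was executed
    input_data: list[str]    # references to the input datasets
    output_data: list[str]   # references to the outputs (models, predictions, ...)
    author: str
    start_time: datetime = field(default_factory=datetime.now)
    status: str = "running"  # e.g. "running", "finished", "failed"
```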
    • Setup: a plan of what we want to do. Setups are hierarchical (a setup can be part of another setup), parameterized (through parameter settings, p=!), and abstract or concrete. Kinds of setups: algorithm setups, functions (f(x)), workflows, and experiments.
    • Algorithm setup: a fully defined algorithm configuration. Its parts are an implementation, parameter settings (p=!), and function setups (f(x)); these are the concrete counterparts of an abstract algorithm (with algorithm qualities), algorithm parameters (p=?), and mathematical functions, all referenced by unique names. Setups can play roles: learner, base-learner, kernel, ...
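To make the hierarchy concrete, here is a hedged sketch of an algorithm setup with parameter settings, roles, and nested sub-setups, reusing the SMO/RBF values from the workflow example below. All class and field names are assumptions, not the ontology's identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class ParameterSetting:
    name: str    # parameter name, unique within the implementation
    value: str

@dataclass
class AlgorithmSetup:
    implementation: str     # concrete implementation, e.g. "Weka.SMO"
    role: str = "learner"   # e.g. "learner", "base-learner", "kernel"
    parameters: list[ParameterSetting] = field(default_factory=list)
    components: list["AlgorithmSetup"] = field(default_factory=list)  # hierarchical

# An SMO learner configured with an RBF kernel as a sub-setup
smo = AlgorithmSetup(
    implementation="Weka.SMO",
    parameters=[ParameterSetting("C", "0.01")],
    components=[AlgorithmSetup(
        implementation="Weka.RBF",
        role="kernel",
        parameters=[ParameterSetting("G", "0.01")],
    )],
)
```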
    • Workflow setup: a workflow is a setup built from components (algorithm setups), connections between them (each with a source, a target, ports, and a datatype), and parameters (inputs)
    • Workflow example (Weka): 1:mainFlow contains 2:loadData (Weka.ARFFLoader), 3:crossValidate (Weka.Evaluation), and 4:learner (Weka.SMO) with 5:kernel (Weka.RBF); parameter settings (p=!) include location=http://..., F=10, C=0.01, G=0.01, S=1, and per-component logRuns flags; data flows from the loader into cross-validation, which exchanges data and predictions with the learner
    • The corresponding run outputs: 6: eval (Evaluations), 7: pred (Predictions), 8: data (Weka.Instances)
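Purely as an illustration, the sketch below rebuilds this flow as components plus connections; the class names and the exact port wiring are assumptions based on the diagram, not the actual expdb API.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    source: str    # "component.port"
    target: str
    datatype: str

@dataclass
class Component:
    name: str              # e.g. "2:loadData"
    implementation: str    # e.g. "Weka.ARFFLoader"
    parameters: dict[str, str] = field(default_factory=dict)

@dataclass
class Workflow:
    name: str
    components: list[Component]
    connections: list[Connection]

main_flow = Workflow(
    name="1:mainFlow",
    components=[
        Component("2:loadData", "Weka.ARFFLoader", {"location": "http://..."}),
        Component("3:crossValidate", "Weka.Evaluation", {"F": "10"}),
        Component("4:learner", "Weka.SMO", {"C": "0.01"}),
        Component("5:kernel", "Weka.RBF", {"G": "0.01", "S": "1"}),
    ],
    connections=[
        Connection("2:loadData.data", "3:crossValidate.data", "Weka.Instances"),
        Connection("3:crossValidate.data", "4:learner.data", "Weka.Instances"),
        Connection("4:learner.predictions", "3:crossValidate.predictions", "Predictions"),
    ],
)
```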
    • Experiment setup: an experiment is a setup that combines algorithm setups and workflows with variables (<X>); it also records the experiment design, description, literature reference, author, ...
    • Experiment setup variables: labeled tuples which can be referenced in setups
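A small sketch of how such a variable might work: an experiment setup references a labeled tuple, and expanding the reference yields one concrete setup per value. The names below are assumptions.

```python
# A variable is a labeled tuple of values; setups reference it by label.
variables = {"<C>": ("0.01", "0.1", "1", "10")}

# An abstract parameter setting that references the variable
base_parameters = {"C": "<C>", "G": "0.01"}

# Expanding the reference yields one concrete setup per value
concrete_setups = [{**base_parameters, "C": v} for v in variables["<C>"]]
# -> four fully concrete SMO configurations, one per value of <C>
```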
    • Run data: a run takes a source dataset (with its data qualities) as input and produces output data: datasets, evaluations, models, and predictions
    • EXPML: the same cross-validation workflow (Weka.ARFFLoader, Weka.Evaluation, Weka.SMO with a Weka.RBF kernel, and their parameter settings) expressed in EXPML
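The slide shows the EXPML rendering of the flow rather than spelling out its grammar, so purely as an illustration, the sketch below assembles an XML document of a similar shape. Element and attribute names here are assumptions, not the real EXPML schema.

```python
import xml.etree.ElementTree as ET

# Illustrative only: tag and attribute names are assumptions.
run = ET.Element("run")
flow = ET.SubElement(run, "workflow", name="mainFlow")
loader = ET.SubElement(flow, "component", name="loadData",
                       implementation="Weka.ARFFLoader")
ET.SubElement(loader, "parameter", name="location", value="http://...")
learner = ET.SubElement(flow, "component", name="learner",
                        implementation="Weka.SMO")
ET.SubElement(learner, "parameter", name="C", value="0.01")
kernel = ET.SubElement(learner, "component", name="kernel",
                       implementation="Weka.RBF")
ET.SubElement(kernel, "parameter", name="G", value="0.01")

print(ET.tostring(run, encoding="unicode"))
```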
    • Demo (preview)
    • Examples: learning curves
      [Plot: predictive accuracy vs. percentage of original dataset size (10-100%) for RandomForest, C4.5, LogisticRegression, RacedIncrementalLogitBoost(Stump), NaiveBayes, and SVM(RBF)]
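For readers who want to reproduce a curve like this locally, here is a minimal sketch using scikit-learn as a stand-in (the talk's own examples ran on Weka); the dataset and algorithm choices are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),  # 10% ... 100% of the data
    cv=10,
    scoring="accuracy",
)
for n, acc in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} examples: accuracy = {acc:.3f}")
```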
    • Examples: when does one algorithm outperform another?
    • Examples: bias-variance profile and the effect of dataset size (boosting vs. bagging)
    • Taking it further: seamless integration
      • A web service for sharing and querying experiments
      • Integrate experiment sharing into ML tools (WEKA, KNIME, RapidMiner, R, ...), mapping implementations, evaluation measures, ...
      • An online platform for custom querying and community interaction
      • A semantic wiki: algorithm/data descriptions, rankings, ...
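As a sketch of what sharing a run through such a web service could look like from a client: the endpoint path and payload fields below are hypothetical, not a documented API; only the host is the project's real URL.

```python
import json
import urllib.request

# Hypothetical payload; field names are assumptions
run_record = {
    "setup": "1:mainFlow",
    "machine": "node-17",
    "evaluations": {"predictive_accuracy": 0.93},
}

req = urllib.request.Request(
    "http://expdb.cs.kuleuven.be/api/runs",  # hypothetical endpoint path
    data=json.dumps(run_record).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```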
    • Experimentation engine: controlled experimentation (cf. Delve, MLComp)
      • Download datasets, build training/test sets
      • Feed training and test sets to algorithms, retrieve predictions/models
      • Run a broad set of evaluation measures
      • Benchmarking (cross-validation), learning curve analysis, bias-variance analysis, workflows(!)
      • Compute data properties for new datasets
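A minimal sketch of this controlled loop, with scikit-learn standing in for the engine's algorithm interface; the dataset and the pair of evaluation measures are arbitrary stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)                # 1. fetch the dataset
results = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    model = GaussianNB().fit(X[train_idx], y[train_idx])  # 2. feed the training set
    preds = model.predict(X[test_idx])                    # 3. retrieve predictions
    results.append({                                      # 4. broad evaluation
        "accuracy": accuracy_score(y[test_idx], preds),
        "f1_macro": f1_score(y[test_idx], preds, average="macro"),
    })
mean_acc = sum(r["accuracy"] for r in results) / len(results)
print(f"10-fold CV accuracy: {mean_acc:.3f}")
```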
    • Why would you use it? (seeding)
      • Let the system run the experiments for you
      • Immediate, highly detailed benchmarks (no repeats)
      • Up-to-date, detailed results (vs. static, aggregated results in journals)
      • All your results organized online (private?), anytime, anywhere
      • Interact with people (weird results?)
      • Get credit for all your results (e.g. citations), including unexpected ones
      • Visibility, new collaborations
      • Check whether your algorithm is really the best (e.g. active testing)
      • On which datasets does it perform well or badly?
    • Question: is open machine learning possible?
    • Merci Danke Thanks Xie Xie Diolch Toda Dank U Grazie Spasiba Efharisto Gracias Arigato Köszönöm Tesekkurler Kia ora Dhanyavaad Hvala http://expdb.cs.kuleuven.be