ML Schema: Machine Learning Schema
Agnieszka Lawrynowicz
Poznan University of Technology, Poland
OpenML2016
March 17, 2016
Agnieszka Lawrynowicz ML Schema: Machine Learning Schema 1 / 18
W3C Machine Learning Schema Community Group
https://www.w3.org/community/ml-schema/
Goals
To define a simple shared schema of data mining/machine learning
(DM/ML) algorithms, datasets, and experiments that may be used in
many different formats: XML, RDF, OWL, spreadsheet tables.
Collect use cases from the academic community and industry
Use this schema as a basis to align existing DM/ML ontologies and
develop more specific ontologies with specific purposes/applications
Prevent a proliferation of incompatible DM/ML ontologies
Turn machine learning algorithms and results into linked open data
Promote the use of this schema, including involving stakeholders like
ML tool developers
Apply for funding (e.g. EU COST, UK Research Councils,
Horizon2020 Coordination and Support Actions) to organize
workshops, and for dissemination
Goals
Use this schema as a basis to align existing DM/ML ontologies
and develop more specific ontologies with specific
purposes/applications
Prevent a proliferation of incompatible DM/ML ontologies
ML ontologies and vocabularies
OntoDM
DMOP
Exposé
MEX vocabulary
others: KDDONTO, KD, DMWF, ...
most have several hundred classes; some are highly axiomatized
OntoDM
Pance Panov, Larisa N. Soldatova, Saso Dzeroski: Ontology of core data mining
entities. Data Min. Knowl. Discov. 28(5-6): 1222-1265 (2014)
built in compliance with the upper-level ontologies BFO, OBI, and IAO; modularized
incorporates structured data mining
Use case: generic, middle level ontology for ML; representing QSAR entities for
drug design, used by Eve Robot Scientist
DMOP: Data Mining Optimization Ontology
C. Maria Keet, Agnieszka Lawrynowicz, Claudia d’Amato, Alexandros Kalousis, Phong
Nguyen, Raul Palma, Robert Stevens, Melanie Hilario: The Data Mining OPtimization
Ontology. J. Web Sem. 32: 43-53 (2015)
development started in e-LICO EU FP7 project (2009-2012)
detailed algorithm internal characteristics (’qualities’)
Use case: meta-learning (’whitebox’), meta-mining, used to produce Intelligent
Discovery Assistant for RapidMiner
Exposé
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes:
Experiment databases - A new way to share, organize and learn from experiments.
Machine Learning 87(2): 127-158 (2012)
re-uses OntoDM (at top-level) and DMOP (at bottom level)
superseded by OpenML DB schema
Use case: experiment databases, ExpML markup
MEX vocabulary
Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck,
Markus Ackermann, Jens Lehmann: MEX vocabulary: a lightweight interchange format
for machine learning experiments. SEMANTICS 2015: 169-176
lightweight interchange format
maps to PROV
Use case: annotating ML experiments and interchanging ML metadata
Previous step towards aligning DM/ML ontologies
DMO Ontology Jamboree, Jožef Stefan Institute, Slovenia, 2010
Goals
To define a simple shared schema of data mining/machine
learning (DM/ML) algorithms, datasets, and experiments that
may be used in many different formats: XML, RDF, OWL,
spreadsheet tables.
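As a rough illustration of what "one schema, many formats" could mean in practice, the sketch below renders a single experiment record both as XML and as a spreadsheet-style CSV row. The field names (algorithm, dataset, measure) and the sample values are made up for this example and are not taken from the ML Schema draft.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record; field names and values are illustrative only.
record = {"algorithm": "decision tree", "dataset": "iris", "measure": "accuracy"}


def to_xml(rec):
    """Serialize the record as a flat XML element with one child per field."""
    root = ET.Element("experiment")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(rec):
    """Serialize the record as a header line plus one data row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


print(to_xml(record))
print(to_csv(record))
```

Both outputs carry the same three fields; only the container format differs, which is the property a shared schema is meant to guarantee.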
The current draft of ML Schema
OpenML2016, Lorentz Center, Netherlands, 2016
(our work may be found at https://github.com/ML-Schema/core)
Goals
Turn machine learning algorithms and results into linked open
data
OpenML2016 plan for integrating OpenML with ML
Schema and Linked Data
Assign URIs to OpenML classes and properties
Align OpenML vocabulary to ML-Schema
Complete an initial specification of ML-Schema v1.0
Develop a tool to provide each OpenML entity with RDF data
(JSON-LD)
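The last step of the plan, giving each OpenML entity RDF data as JSON-LD, might look roughly like the sketch below. Everything specific in it is an assumption: the namespace URI is a placeholder, the entity URI pattern is invented, and the terms Run, executes, and hasInput are hypothetical names, not confirmed parts of ML-Schema or OpenML.

```python
import json

# Placeholder namespace; the slides do not state the ML-Schema namespace URI.
MLS = "http://example.org/ml-schema#"
# Hypothetical URI pattern for OpenML entities.
OPENML = "http://example.org/openml/"


def run_to_jsonld(run_id, flow_id, dataset_id):
    """Describe one run as a JSON-LD document (illustrative terms only)."""
    return {
        "@context": {
            "mls": MLS,
            # "@type": "@id" marks the values as URIs rather than plain strings.
            "executes": {"@id": "mls:executes", "@type": "@id"},
            "hasInput": {"@id": "mls:hasInput", "@type": "@id"},
        },
        "@id": f"{OPENML}run/{run_id}",
        "@type": "mls:Run",
        "executes": f"{OPENML}flow/{flow_id}",
        "hasInput": f"{OPENML}dataset/{dataset_id}",
    }


doc = run_to_jsonld(1, 2, 3)
print(json.dumps(doc, indent=2))
```

Because the context maps short keys to full URIs, the same document reads as ordinary JSON for existing clients and as RDF for linked-data tools.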
Goals
Collect use cases from the academic community and industry
Use cases
Experiment/model sharing
Workflow design/planning
Meta learning
Text mining
Experiment reproducibility in publications
Comparison of ML algorithms
Education
Call for use cases!
Goals
Promote the use of this schema, including involving
stakeholders like ML tool developers
You are invited to join the W3C ML Schema group!
