MEX Vocabulary
A Lightweight Interchange Format for Machine Learning Experiments
Diego Esteves et al.
Department of Computer Science, AKSW
University of Leipzig
17 Sep 2015 - SEMANTiCS
Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 1 / 30
Outline
1 Introduction
Problem
Motivation
Challenges
State of the Art
2 MEX
The Inspiration
The Architecture
Examples
3 Conclusion and Future Work
Motivation
The Problem
How should we represent results of machine learning experiments in a
common, comprehensive and interoperable format?
Motivation
Example 1: Collaborative Project
Three universities are collaborating on a research project.
How can they achieve a high level of interoperability?
A uses the Weka toolkit (http://www.cs.waikato.ac.nz/ml/weka/).
B uses DL-Learner (http://dl-learner.org/).
C uses the Accord Framework (http://accord-framework.net/).
Motivation
Example 2: Hands on
A complex script-based scenario
You are researching stock market prediction and want to store the
data for further analysis.
E.g., a script that takes 2 days to run a multi-level machine learning
algorithm.
Motivation
Example 3: Reading or Reviewing a paper
You are a reviewer or scientist...
Sometimes it is hard to understand the solution proposed in a research
paper.
The ACL POS Tagging website (State of the art) exemplifies a good use
case for MEX on the web
(http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)).
Furthermore, in both cases the task (reviewing or reading) is error-prone
and time-consuming.
Motivation
Solution
Machine-readable data
Motivation
Solution
Existing Standards:
Comma-Separated Values (CSV)
eXtensible Markup Language (XML)
JavaScript Object Notation (JSON)
Value-Object (VO)
Data-Transfer-Objects (DTO)
Database Management System (DBMS)
Motivation
Solution
Motivation
3 drawbacks
1 Lack of a schema definition: you always have to define the
schema yourself and share your model afterwards.
2 A DBMS is technology-dependent and does not provide reasoning
or inference capabilities.
3 Lack of semantic information.
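The first and third drawbacks can be illustrated with a minimal sketch. It assumes two groups export the same result to CSV under headers they each chose themselves; the class name, column names and values are invented for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvSchemaMismatch {

    // Parses a two-line CSV (header + one row) into a column -> value map.
    static Map<String, String> parse(String csv) {
        String[] lines = csv.split("\n");
        String[] header = lines[0].split(",");
        String[] row = lines[1].split(",");
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            record.put(header[i].trim(), row[i].trim());
        }
        return record;
    }

    public static void main(String[] args) {
        // Hypothetical exports: same experiment, two ad-hoc schemas.
        String groupA = "algorithm,accuracy\nSVM,0.96";
        String groupB = "algo_name,acc\nSVM,0.96";

        Map<String, String> a = parse(groupA);
        Map<String, String> b = parse(groupB);

        // A consumer written against group A's header finds nothing in B's file.
        System.out.println("A accuracy: " + a.get("accuracy"));
        System.out.println("B accuracy: " + b.get("accuracy"));
    }
}
```

Nothing in either file says that `accuracy` and `acc` denote the same measure, so every consumer needs a hand-written mapping per producer.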
Motivation
Problem: an example
Motivation
The Problem
The Optimal Scenario
How should we represent results of machine learning experiments in a
common (1), comprehensive but not complex (2), lightweight (3),
interoperable (4) and flexible (5) format, taking into consideration a low
effort-level (6) for implementation?
State of the Art
Related Work
State of the Art
Platforms for e-science workflows
Name [Reference]: Description
MyExperiment [DeRoure2009]: A collaborative environment where scientists can publish their workflows and experiment plans
Wings [Gil2011]: A semantic approach to creating very large scientific workflows
OpenTOX [Tcheremenskaia2012]: An interoperable predictive toxicology framework
OpenML [Vanschoren2014]: A frictionless, collaborative environment for exploring machine learning
State of the Art
Ontologies
Name [Reference]: Description
Exposé [Vanschoren2010]: Data mining experiments; used in conjunction with Experiment Databases
OntoDM [Panov2013]: Data mining investigations
DMOP [Keet2015]: Data Mining OPtimization Ontology; supports informed decision-making at various choice points of the data mining process
MEX.aksw.org
The abstraction
What we want to describe
Machine Learning Definition by T.Mitchell
“A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on
T, as measured by P, improves with experience E” – Tom Mitchell
ML concept -> MEX class
experience E -> mexcore:ExecutionCollection
task T -> mexalgo:Algorithm
performance measure P -> mexperf:ExecutionPerformance
MEX
3 Vocabularies
MEX Core: formalizes the key entities for representing the basic
steps of machine learning executions.
MEX Algorithm: represents the context of machine learning algorithms and
their associated characteristics.
MEX Performance: provides the basic entities for representing the
experimental results of executions of machine learning
algorithms.
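As a rough sketch of how the three layers might appear side by side in one description, the following prints a few Turtle statements that type one resource per layer. Only the class names `mexcore:ExecutionCollection`, `mexalgo:Algorithm` and `mexperf:ExecutionPerformance` come from the slides; the namespace URIs are placeholders, and the `ex:` resources and the Java class itself are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MexLayersSketch {

    // Formats one RDF statement from already-prefixed names.
    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    // Builds a tiny Turtle document that types one resource per MEX layer.
    static List<String> describeRun() {
        List<String> lines = new ArrayList<>();
        // ASSUMPTION: placeholder namespace URIs, not taken from the slides.
        lines.add("@prefix mexcore: <http://example.org/mex-core#> .");
        lines.add("@prefix mexalgo: <http://example.org/mex-algo#> .");
        lines.add("@prefix mexperf: <http://example.org/mex-perf#> .");
        lines.add("@prefix ex: <http://example.org/run/> .");
        // Class names come from the slides; the ex: resources are invented.
        lines.add(triple("ex:executions1", "a", "mexcore:ExecutionCollection"));
        lines.add(triple("ex:svm", "a", "mexalgo:Algorithm"));
        lines.add(triple("ex:performance1", "a", "mexperf:ExecutionPerformance"));
        return lines;
    }

    public static void main(String[] args) {
        describeRun().forEach(System.out::println);
    }
}
```

The properties that link these resources to each other are defined by the vocabulary itself and are deliberately left out here.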
MEX Vocabulary (:mexalgo + :mexcore + :mexperf) and Related Ontologies
[Figure: chart comparing MEX (7 + 14 + 10 = 31) with ONTO-DM, Exposé and DMOP; extracted values: 402, 778, 858, 757]
MEX
Interlinking the 3 layers: mexalgo, mexcore and mexperf
:mexalgo
:mexcore
:mexperf
MEX
ACL POS Tagging website metadata
Next chapter ;-)
RDF? Ontology? Jena?
Dublin Core...? SPARQL?
OWL? PROV-O, What?
public static void main(String[] args) {
    MyMEX_10 mex = new MyMEX_10();
    mex.setAuthorName("D Esteves");
    // Identifier of the experiment configuration
    String eid = "E001S001";
    mex.addConf(eid).setDescription("hello world experiment");
    mex.Conf(eid).addFeature("min;max;op;close");
    mex.Conf(eid).Implementation().set(enumImplementation.Weka);
    // Algorithms used in this configuration and their parameters
    mex.Conf(eid).addAlgorithm(enumAlgorithm.SupportVectorMachines);
    mex.Conf(eid).addAlgorithm(enumAlgorithm.NaiveBayes);
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("C", "10^3");
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("alpha", "0.2");
    ...
    /* your code here */
    ...
    // Log the performance measures of the overall execution
    String exid = mex.Conf(eid).addExecutionOverall.addPerformance(enumMeasures.ACCURACY, .96);
    exid = mex.Conf(eid).ExecutionOverall(exid).addPerformance(enumMeasures.TPR, .78);
    ...
    // Hand the logged experiment to the serializer
    MEXSerializer_10.getInstance().parse(mex);
}
Conclusion
D.Esteves et al.
Requirement: Argumentation
lightweight: 7 is the minimal number of classes you need for representing a basic execution; 31 is the number of the most important entities in the 3 layers.
flexible: Single or overall executions; choose your inputs/outputs.
low effort-level: MEX provides APIs which encapsulate the semantic knowledge, so you can avoid extra implementation effort and just log your inputs and outputs.
Conclusion
Requirement: Argumentation
common: The concepts behind vocabularies allow us to achieve a high level of abstraction, generalization and formalization of concepts.
interoperable: Vocabularies are the current best choice for representing real-world entities.
comprehensive: Classification, regression and clustering problems are covered.
Conclusion
1 Produces provenance metadata.
2 Allows querying of results.
3 Defines an interoperable format for sharing machine learning
experiments.
4 Benefits meta-learning approaches [Vilalta2002].
5 Tends to reduce the probability of misinterpretation
of persuasive and informative aspects [Gillen2006].
6 MEX is flexible and lightweight.
7 Experiment Databases [Blockeel2007][Vanschoren2012] need
an interchange format for experiments.
8 MEX provides APIs which facilitate the file generation process.
9 Benchmark systems [Usbeck2014] can benefit from a standard
format.
10 Generate your LaTeX tables automatically.
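The last point can be sketched as follows: once performance measures are available as structured data, rendering them as a LaTeX table is mechanical. This is only an illustration under stated assumptions; the class and method names are hypothetical and not part of the MEX APIs, and the values reuse the accuracy and TPR figures from the API snippet.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LatexTableSketch {

    // Renders measure -> value pairs as a two-column LaTeX tabular.
    static String toLatex(Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("\\begin{tabular}{lr}\n");
        sb.append("Measure & Value \\\\\n");
        sb.append("\\hline\n");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            sb.append(e.getKey())
              .append(" & ")
              // Locale.ROOT keeps the decimal point independent of the JVM locale.
              .append(String.format(Locale.ROOT, "%.2f", e.getValue()))
              .append(" \\\\\n");
        }
        sb.append("\\end{tabular}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Insertion order is preserved, so rows appear in logging order.
        Map<String, Double> measures = new LinkedHashMap<>();
        measures.put("ACCURACY", 0.96);
        measures.put("TPR", 0.78);
        System.out.print(toLatex(measures));
    }
}
```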
MEX
Thank you so much for your attention!
mex.aksw.org

MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

  • 1. MEX Vocabulary A Lightweight Interchange Format for Machine Learning Experiments Diego Esteves et al. Department of Computer Science, AKSW University of Leipzig 17 Sep 2015 - SEMANTiCS Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 1 / 30
  • 2. Outline 1 Introduction Problem Motivation Challenges State of the Art 2 MEX The Inspiration The Architecture Examples 3 Conclusion and Future Work Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 2 / 30
  • 3. Motivation The Problem The Problem How should we represent results of machine learning experiments in a common, comprehensive and interoperable format? Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 3 / 30
  • 4. Motivation Example 1: Collaborative Project Three Universities are working collaboratively in a research project How to achieve a high level of interoperability? A uses the Weka1 toolkit. B uses DL-Learner2 C uses the Accord Framework3 1 http://www.cs.waikato.ac.nz/ml/weka/ 2 http://dl-learner.org/ 3 http://accord-framework.net/ Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 4 / 30
  • 5. Motivation Example 2: hands on... A complex script-based scenario You are working on your research about stock market predictions and want to store the data for further analysis? eg.: a script which takes 2 days to run a multi-level machine learning algorithm. Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 5 / 30
  • 6. Motivation Example 3: Reading or Reviewing a paper You are a reviewer or scientist... sometimes it’s hard to understand the proposed solution of a research paper. . The ACL POS Tagging website (State of the art) exemplifies a good use case for MEX on the web 1. Furthermore, in both cases the task/reading is error-prone and time-consuming. 1 http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_ the_art) Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 6 / 30
  • 7. Motivation Solution Machine-readable data Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 7 / 30
  • 8. Motivation Solution Existing Standards: Comma-Separated Values (CSV) eXtensible Markup Language (XML) JavaScript Object Notation (JSON) Value-Object (VO) Data-Transfer-Objects (DTO) Database Management System (DBMS) Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 8 / 30
  • 9. Motivation Solution Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 9 / 30
  • 10. Motivation 3 drawbacks 1 The lack of schema definition: you always have to define the schema by yourself and share your model afterwards. 2 DBMS is technology-dependent and does not provides reasoning and inference capabilities. 3 the lack of semantic information. Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 10 / 30
  • 11. Motivation Problem: an example Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 11 / 30
  • 12. Motivation The Problem The Optimal Scenario How should we represent results of machine learning experiments in a common1, comprehensive (but not complex)2, lightweight3, interoperable4 and flexible5 format, taking into consideration a low effort-level6 for implementation? Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 12 / 30
  • 13. State of the Art Related Work Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 13 / 30
  • 14. State of the Art Platforms for e-science workflows Name Description MyExperiment [DeRoure2009 ] A collaborative environment where scientists can publish their workflows and experiment plans Wings [Gil2011 ] A Semantic Approach to creating very large scientific workflows OpenTOX [Tcheremenskaia2012 ] An interoperable predictive toxicology framework OpenML [Vanschoren2014 ] A frictionless, collaborative environment for exploring machine learning Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 14 / 30
  • 15. State of the Art Ontologies (Name: Description)
      Exposé [Vanschoren2010]: Data mining experiments, used in conjunction with Experiment Databases
      OntoDM [Panov2013]: Data mining investigations
      DMOP [Keet2015]: Data Mining OPtimization Ontology; it supports informed decision-making at various choice points of the data mining process
  • 16. MEX.aksw.org
  • 17. The abstraction What we want to describe
      Machine Learning definition by T. Mitchell: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” – Tom Mitchell
      ML Concepts -> MEX Classes
      experience E -> mexcore:ExecutionCollection
      task T -> mexalgo:Algorithm
      performance measure P -> mexperf:ExecutionPerformance
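The concept-to-class mapping on this slide can be written down as a small lookup table. The following is a minimal sketch only: the class name `MitchellToMex` and the helper `layerOf` are hypothetical names chosen for illustration and are not part of the MEX vocabulary or its APIs; only the three prefixed class names come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the MEX APIs.
class MitchellToMex {

    /** Returns the slide's mapping from Mitchell's concepts to MEX classes, in slide order. */
    static Map<String, String> mapping() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("experience E", "mexcore:ExecutionCollection");
        m.put("task T", "mexalgo:Algorithm");
        m.put("performance measure P", "mexperf:ExecutionPerformance");
        return m;
    }

    /** Extracts the vocabulary prefix (the MEX layer) from a prefixed class name. */
    static String layerOf(String prefixedName) {
        return prefixedName.substring(0, prefixedName.indexOf(':'));
    }

    public static void main(String[] args) {
        // Print each concept with its MEX class and the layer that defines it.
        mapping().forEach((concept, cls) ->
            System.out.println(concept + " -> " + cls + " (layer " + layerOf(cls) + ")"));
    }
}
```

Each of Mitchell's three concepts lands in a different prefix (mexcore, mexalgo, mexperf), which is why the helper extracts the prefix from each class name.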
  • 18. MEX 3 Vocabularies
      MEX Core: formalizes the key entities for representing the basic steps of machine learning executions
      MEX Algorithm: represents the context of machine learning algorithms and their associated characteristics
      MEX Performance: provides the basic entities for representing the experimental results of executions of machine learning algorithms
  • 19. MEX Vocabulary (:mexalgo + :mexcore + :mexperf) and Related Ontologies [Chart: MEX (7 + 14 + 10 = 31) compared with ONTO-DM, Exposé and DMOP; chart values 402, 778, 858, 757]
  • 20. MEX Interlinking the 3 layers: mexalgo, mexcore and mexperf
  • 21. :mexalgo
  • 22. :mexcore
  • 23. :mexperf
  • 24. MEX ACL POS Tagging website metadata
  • 25. Next chapter ;-) RDF? Ontology? Jena? Dublin Core...? SPARQL? OWL? PROV-O, What?
  • 26.
      public static void main(String[] args) {
          MyMEX_10 mex = new MyMEX_10();
          mex.setAuthorName("D Esteves");
          String eid = "E001S001";
          mex.addConf(eid).setDescription("hello world experiment");
          mex.Conf(eid).addFeature("min;max;op;close");
          mex.Conf(eid).Implementation().set(enumImplementation.Weka);
          mex.Conf(eid).addAlgorithm(enumAlgorithm.SupportVectorMachines);
          mex.Conf(eid).addAlgorithm(enumAlgorithm.NaiveBayes);
          mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("C", "10^3");
          mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("alpha", "0.2");
          ...
      }
      /* your code here */
      ...
      String exid = mex.Conf(eid).addExecutionOverall.addPerformance(enumMeasures.ACCURACY, .96);
      String exid = mex.Conf(eid).ExecutionOverall(exid).addPerformance(enumMeasures.TPR, .78);
      ...
      MEXSerializer_10.getInstance().parse(mex);
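The slide's snippet depends on the MEX library and elides parts of the code, so it cannot be run on its own. As a rough, self-contained sketch of the same "log your inputs and outputs" pattern, the class below collects a configuration identifier, algorithm names and performance values in memory. `ExperimentLog` and all of its methods are hypothetical stand-ins invented for this illustration; they are not the MEX API and produce no RDF. Only the identifier, description, algorithm names and measure values are taken from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the MEX API: it only mimics the logging pattern on the slide.
class ExperimentLog {
    final String id;
    final String description;
    final List<String> algorithms = new ArrayList<>();
    final Map<String, Double> measures = new LinkedHashMap<>();

    ExperimentLog(String id, String description) {
        this.id = id;
        this.description = description;
    }

    /** Records an algorithm name and returns this log so calls can be chained. */
    ExperimentLog addAlgorithm(String name) {
        algorithms.add(name);
        return this;
    }

    /** Records one performance value under the given measure name. */
    ExperimentLog addPerformance(String measure, double value) {
        measures.put(measure, value);
        return this;
    }

    /** One-line, human-readable summary of everything logged so far. */
    String summary() {
        return id + " [" + String.join(", ", algorithms) + "] " + measures;
    }

    public static void main(String[] args) {
        ExperimentLog log = new ExperimentLog("E001S001", "hello world experiment")
            .addAlgorithm("SupportVectorMachines")
            .addAlgorithm("NaiveBayes")
            .addPerformance("ACCURACY", 0.96)
            .addPerformance("TPR", 0.78);
        System.out.println(log.summary());
    }
}
```

The fluent style mirrors the chained calls on the slide; a real serializer would replace `summary()` with output in the MEX vocabulary.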
  • 27. Conclusion (Requirement: Argumentation)
      lightweight: 7 is the minimal number of classes you need for representing a basic execution; 31 is the number of the most important entities in the 3 layers
      flexible: Single or Overall Executions; choose your inputs/outputs
      low effort-level: MEX provides APIs which encapsulate the semantic knowledge, so you can avoid extra implementation effort and just log your inputs and outputs
  • 28. Conclusion (Requirement: Argumentation)
      common: The concepts behind vocabularies allow us to achieve a high level of abstraction, generalization and formalization of concepts
      interoperable: Vocabularies are the current best choice for representing real-world entities
      comprehensive: classification, regression and clustering problems are covered
  • 29. Conclusion
      1. Produces Provenance Metadata.
      2. Allows Querying Results.
      3. Defines an Interoperable Format for Sharing Machine Learning Experiments.
      4. Benefits Meta-Learning [Vilalta2002] Approaches.
      5. Tends to minimize the misinterpretation probability rate on persuasive and informative aspects [Gillen2006].
      6. MEX is flexible and lightweight.
      7. Experiment Databases [Blockeel2007][Vanschoren2012] need an interchange format for experiments.
      8. MEX provides APIs which facilitate the file generation process.
      9. Benchmark Systems [Usbeck2014] can benefit from a standard format.
      10. Generate your LaTeX table automatically.
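The final point in this list, generating a LaTeX table automatically, can be illustrated with a small sketch. It assumes the measures are already available as name/value pairs; how MEX itself produces such tables is not shown in the slides, so `LatexTable` and `render` are hypothetical names, and the two values are the ones logged on the code slide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the MEX APIs.
class LatexTable {

    /** Renders measure/value pairs as a two-column LaTeX tabular, one row per measure. */
    static String render(Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("\\begin{tabular}{lr}\n");
        sb.append("Measure & Value \\\\\n");
        sb.append("\\hline\n");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            // Locale.ROOT keeps the decimal point independent of the system locale.
            sb.append(String.format(Locale.ROOT, "%s & %.2f \\\\\n", e.getKey(), e.getValue()));
        }
        sb.append("\\end{tabular}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> measures = new LinkedHashMap<>();
        measures.put("ACCURACY", 0.96);
        measures.put("TPR", 0.78);
        System.out.print(render(measures));
    }
}
```

Measure names containing LaTeX special characters such as underscores would need escaping, which this sketch omits.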
  • 30. MEX Thank you so much for your attention! mex.aksw.org