Introduction Three classifiers Experiments and results Summary
Similarity Features, and their Role in Concept
Alignment Learning
Shenghui Wang1 Gwenn Englebienne2 Christophe Guéret1
Stefan Schlobach1 Antoine Isaac1 Martijn Schut1
1 Vrije Universiteit Amsterdam
2 Universiteit van Amsterdam
SEMAPRO 2010
Florence
Outline
1 Introduction
Classification of concept mappings based on instance
similarity
2 Three classifiers
Markov Random Field
Multi-objective Evolution Strategy
Support Vector Machine
3 Experiments and results
4 Summary
Thesaurus mapping
SemanTic Interoperability To access Cultural Heritage
(STITCH) through mappings between thesauri
Scope of the problem:
Big thesauri with tens of thousands of concepts
Huge collections (e.g., National Library of the Netherlands:
80km of books in one collection)
Heterogeneous (e.g., books, manuscripts, illustrations, etc.)
Multi-lingual problem
Solving matching problems is one step to the solution of the
interoperability problem.
e.g., “plankzeilen” vs. “surfsport”
e.g., “archeology” vs. “excavation”
Automatic alignment techniques
Lexical
labels and textual information of entities
Structural
structure of the formal definitions of entities, position in the
hierarchy
Extensional
statistical information of instances, i.e., objects indexed with
entities
Background knowledge
using a shared conceptual reference to find links indirectly
Instance-based techniques: common instance based
Pros and cons
Advantages
Simple to implement
Interesting results
Disadvantages
Requires sufficient amounts of common instances
Only uses part of the available information
Instance-based techniques: Instance similarity based
Representing concepts and the similarity between them
Figure: Instance features, concept features and pair features. Each
instance is described by metadata fields (Creator, Title, Publisher, ...).
For each concept, the field values of its instances are aggregated into
one bag of words per field (term counts for Term 1, Term 2, Term 3, ...).
For a pair Concept1, Concept2, the cosine distance (Cos. dist.) between
the two bags of words of each field gives one pair feature (f1, f2, f3, ...).
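The construction in the figure can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the field names come from the figure, the three-term count vectors are the first counts shown there (the figure elides the rest), and the function names `cosine_similarity` and `pair_features` are made up for this example. The code reports cosine similarity and assumes that the cosine distance named in the figure is one minus that value.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty bag of words is similar to nothing
    return dot / (norm_a * norm_b)

def pair_features(concept_a, concept_b):
    """One similarity feature per metadata field present in both concepts."""
    return {
        field: cosine_similarity(concept_a[field], concept_b[field])
        for field in concept_a
        if field in concept_b
    }

# Toy concept features: aggregated term counts per field (truncated to 3 terms).
concept1 = {"Creator": [4, 1, 0], "Title": [0, 3, 0], "Publisher": [2, 1, 3]}
concept2 = {"Creator": [2, 0, 0], "Title": [0, 4, 1], "Publisher": [4, 1, 1]}

features = pair_features(concept1, concept2)
for field, sim in features.items():
    print(f"{field}: similarity={sim:.3f}, distance={1 - sim:.3f}")
```

Each value in `features` plays the role of one coordinate (f1, f2, f3, ...) of the concept pair.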
Classification of concept mappings based on instance similarity
Classification based on instance similarity
Each pair of concepts is treated as a point in a “similarity
space”
Its position is defined by the features of the pair.
The features of the pair are the different measures of similarity
between the concepts’ instances.
Hypothesis: the label of a point — which represents whether
the pair is a positive mapping or negative one — is correlated
with the position of this point in this space.
Given already labelled points and the similarity values of the
concepts involved, a new point can be classified, i.e., assigned
the correct label, based on its location in this space.
Classification of concept mappings based on instance similarity
Research questions
How do different classifiers perform on this instance-based
mapping task?
What are the benefits of using a machine learning algorithm
to determine the importance of features?
Are there regularities with respect to the relative importance
given to specific features for similarity computation? Are these
weights related to application data characteristics?
Three classifiers used
Markov Random Field (MRF)
Evolution Strategy (ES)
Support Vector Machine (SVM)
Markov Random Field
Markov Random Field
Let $T = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ be the training set
$x^{(i)} \in \mathbb{R}^{K}$, the features
$y^{(i)} \in Y = \{\text{positive}, \text{negative}\}$, the label
The conditional probability of a label given the input is
modelled as
$$p(y^{(i)} \mid x^{(i)}, \theta) = \frac{1}{Z(x^{(i)}, \theta)} \exp\left( \sum_{j=1}^{K} \lambda_j \phi_j(y^{(i)}, x^{(i)}) \right), \tag{1}$$
where $\theta = \{\lambda_j\}_{j=1}^{K}$ are the weights associated with the feature
functions $\phi_j$ and $Z(x^{(i)}, \theta)$ is a normalisation constant
Markov Random Field
The classifier used: Markov Random Field (cont’)
The likelihood of the data set for given model parameters
$p(T \mid \theta)$ is given by:
$$p(T \mid \theta) = \prod_{i=1}^{N} p(y^{(i)} \mid x^{(i)}, \theta) \tag{2}$$
During learning, our objective is to find the most likely values
for $\theta$ for the given training data.
The decision criterion for assigning a label $y^{(i)}$ to a new pair
of concepts $i$ is then simply given by:
$$y^{(i)} = \operatorname*{argmax}_{y} \; p(y \mid x^{(i)}, \theta) \tag{3}$$
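Equations (1) to (3) can be traced numerically with a small sketch. The slides do not define the feature functions, so the code assumes the simplest choice, phi_j(y, x) = x_j for the positive label and 0 for the negative one, under which the model reduces to logistic regression. The weights and data below are made up, and learning the most likely weights (for example by gradient ascent on the log-likelihood) is not shown.

```python
import math

LABELS = ("positive", "negative")

def phi(y, x):
    """Assumed feature functions: phi_j(y, x) = x_j for the positive label, else 0."""
    return list(x) if y == "positive" else [0.0] * len(x)

def conditional(x, lambdas):
    """Equation (1): p(y | x, theta) for every label y."""
    scores = {y: sum(l * f for l, f in zip(lambdas, phi(y, x))) for y in LABELS}
    top = max(scores.values())  # subtract the maximum for numerical stability
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())  # normalisation constant, rescaled by exp(-top)
    return {y: u / z for y, u in unnorm.items()}

def log_likelihood(data, lambdas):
    """Logarithm of equation (2): sum over i of log p(y_i | x_i, theta)."""
    return sum(math.log(conditional(x, lambdas)[y]) for x, y in data)

def predict(x, lambdas):
    """Equation (3): the label with the highest conditional probability."""
    probs = conditional(x, lambdas)
    return max(LABELS, key=lambda y: probs[y])

# Toy training pairs with K = 2 similarity features (made-up values).
train = [([0.9, 0.8], "positive"), ([0.1, 0.2], "negative")]
theta = [2.0, -1.0]  # hypothetical weights, not learned here

print(conditional([0.9, 0.8], theta))
print(log_likelihood(train, theta))
print(predict([0.9, 0.8], theta))
```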
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy
Evolution strategies (ES) have two characteristic properties:
firstly, they are used for continuous value optimisation, and,
secondly, they are self-adaptive.
An ES individual is a direct model of the searched solution,
defined by the weights Λ and the evolution strategy parameters Σ:
$$\Lambda, \Sigma \leftrightarrow \lambda_1, \ldots, \lambda_K, \sigma_1, \ldots, \sigma_K \tag{4}$$
The fitness function is related to the decision criterion for the
ES, which is sign-based:
$$L^{ES}_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy (cont’)
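Maximising the number of correctly classified positive mappings
and the number of correctly classified negative mappings are two
conflicting goals.
$$f_1(\Lambda \mid F, L) = \#\Big\{ F_i \;\Big|\; \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \wedge L_i = 1 \Big\} \tag{6}$$
$$f_2(\Lambda \mid F, L) = \#\Big\{ F_i \;\Big|\; \sum_{j=1}^{K} \lambda_j F_{ij} \le 0 \wedge L_i = 0 \Big\} \tag{7}$$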
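The sign-based decision of equation (5) and the two objectives of equations (6) and (7) translate directly into code. The sketch below uses a made-up feature matrix `F` (one row per concept pair) and label list `L`; the function names are chosen for this example only.

```python
def es_label(lambdas, row):
    """Equation (5): 1 if the weighted feature sum is positive, else 0."""
    return 1 if sum(l * f for l, f in zip(lambdas, row)) > 0 else 0

def objectives(lambdas, F, L):
    """Equations (6) and (7): correctly labelled positives and negatives."""
    predicted = [es_label(lambdas, row) for row in F]
    f1 = sum(1 for p, t in zip(predicted, L) if p == 1 and t == 1)
    f2 = sum(1 for p, t in zip(predicted, L) if p == 0 and t == 0)
    return f1, f2

# Made-up feature matrix F and reference labels L.
F = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]]
L = [1, 0, 1, 0]

print(objectives([1.0, -1.0], F, L))  # weights that separate the toy data
print(objectives([1.0, 1.0], F, L))   # weights that label every pair positive
```

The second call shows why the two objectives pull in different directions: weights that accept every pair keep f1 at its maximum while f2 drops to zero.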
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy (cont’)
Evolution process
Recombination: Two parent individuals are combined using
different weighting, producing two new individuals
Mutation: One parent individual changes itself into a new child
individual
Survivor selection: NSGA-II
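A rough sketch of the two variation operators follows. The slides do not give their exact form, so the code assumes standard choices: a single random mixing weight for recombination, and log-normal self-adaptation of the step sizes for mutation. All names and values are hypothetical, and the NSGA-II survivor selection is not shown.

```python
import math
import random

def recombine(parent_a, parent_b, rng):
    """Weighted recombination: two parents yield two complementary children."""
    w = rng.random()  # assumed: one random mixing weight per mating
    child_a = [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1 - w) * a + w * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b

def mutate(lambdas, sigmas, rng):
    """Self-adaptive mutation: perturb the step sizes, then the weights."""
    tau = 1.0 / math.sqrt(len(lambdas))  # a common learning-rate choice
    new_sigmas = [s * math.exp(tau * rng.gauss(0, 1)) for s in sigmas]
    new_lambdas = [l + s * rng.gauss(0, 1) for l, s in zip(lambdas, new_sigmas)]
    return new_lambdas, new_sigmas

rng = random.Random(0)
# An individual holds lambda_1..lambda_K followed by sigma_1..sigma_K, here K = 2.
parent_a = [0.5, -0.2, 0.1, 0.1]
parent_b = [-0.3, 0.4, 0.2, 0.3]
child_a, child_b = recombine(parent_a, parent_b, rng)
mutant_lambdas, mutant_sigmas = mutate(child_a[:2], child_a[2:], rng)
print(child_a, child_b)
print(mutant_lambdas, mutant_sigmas)
```

Because the step sizes are part of the individual and are themselves mutated, the search adapts its own mutation strength, which is the self-adaptive property mentioned earlier.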
Support Vector Machine
Support Vector Machine
A Support Vector Machine (SVM) is used as a maximum margin
classifier whose task consists in finding a hyperplane separating
the two classes.
The objective is to maximise the margin separating the two
classes whilst minimising the risk of classification error.
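The trade-off between margin and classification error can be made concrete with the usual linear soft-margin objective. The slides do not state which SVM formulation or kernel was used, so the linear form, the constant `C` and the toy data below are assumptions made for illustration.

```python
def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def soft_margin_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses; labels y are +1 or -1."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    return margin_term + C * hinge

# Made-up similarity features for four concept pairs.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, -1, -1]

print(soft_margin_objective([2.0, 2.0], -2.0, X, y))  # no point inside the margin
print(soft_margin_objective([0.5, 0.5], -0.5, X, y))  # wider margin, but hinge losses
```

A small weight vector gives a wide margin (first term) and the hinge term penalises points that fall inside the margin or on the wrong side; training an SVM means minimising the sum of the two.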
Experiments and results
Thesauri to match: GTT (35K) and Brinkman (5K)
Instances: 1 million books
GTT annotated books: 307K
Brinkman annotated books: 490K
Dually annotated books: 222K
Feature selection for similarity calculation
λj  Feature      λj  Feature           λj  Feature
1   Lexical      11  author            21  issued
2   Jaccard      12  contributor       22  language
3   Date         13  creator           23  mods:edition
4   ISBN         14  dateCopyrighted   24  publisher
5   NBN          15  description       25  refNBN
6   PPN          16  extent            26  relation
7   SelSleutel   17  hasFormat         27  spatial
8   abstract     18  hasPart           28  subject
9   alternative  19  identifier        29  temporal
10  annotation   20  isVersionOf       30  title
Table: List of the features
Quality of learning
[Bar charts, vertical axis from 0 to 1: precision, recall and F-measure for MRF 1-30, MRF 3-30, ES and SVM]
Figure: Precision, recall and F-Measure for mappings with a positive
label (top) and a negative label (bottom). Error bars indicate one
standard deviation over the 10 folds of cross-validation.
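For reference, the three measures in the figure can be computed per label as in the sketch below. The label lists are made up, and the function name is chosen for this example; the cross-validation loop and the standard deviation over folds are not shown.

```python
def precision_recall_f(true_labels, predicted_labels, target):
    """Precision, recall and F-measure for one target label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Made-up reference labels and classifier output (1 = positive mapping).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
guess = [1, 1, 0, 0, 0, 1, 0, 1]

print(precision_recall_f(truth, guess, target=1))  # positive label
print(precision_recall_f(truth, guess, target=0))  # negative label
```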
Relative importance of features
Which features of our instances are important for mapping?
Figure: Mutual information between features and labels
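Mutual information between one feature and the labels can be estimated from counts, as sketched below. The slides do not say how the continuous similarity values were handled, so the equal-width binning in `discretise` is an assumption, and the data are made up.

```python
import math
from collections import Counter

def mutual_information(feature_bins, labels):
    """Mutual information (in bits) between a discretised feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    p_feature = Counter(feature_bins)
    p_label = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_fl = count / n
        mi += p_fl * math.log2(p_fl / ((p_feature[f] / n) * (p_label[l] / n)))
    return mi

def discretise(values, bins=2):
    """Assumed preprocessing: equal-width bins over [0, 1]."""
    return [min(int(v * bins), bins - 1) for v in values]

labels = [1, 1, 0, 0]
informative = discretise([0.9, 0.8, 0.1, 0.2])    # separates the labels
uninformative = discretise([0.9, 0.1, 0.8, 0.2])  # unrelated to the labels

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

A feature whose bins line up with the labels carries the full label entropy, while a feature that is independent of the labels scores zero.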
Relative importance of features
ES lambdas are not really conclusive
ES lambdas that are most inconclusive correspond to the least
informative features
Relative importance of features
Important features in terms of mutual information are
associated with large MRF weights
A more detailed analysis
Expected important features:
Label similarity (1), instance overlap (2), subject (28), etc.
Expected unimportant features:
Size of the book (16), format description (17) and language
(22), etc.
Surprisingly important features:
Date (14)
Surprisingly unimportant features:
Description (15) and abstract (8)
Summary
We tried three machine learning classifiers on the
instance-based mapping task, among which MRF and ES can
automatically identify meaningful features.
The MRF and the ES achieve a performance in the
neighbourhood of 90%, showing the validity of the approach.
Our analysis suggests that when many different description
features interact, there is no systematic correlation between
what a learning method could find and what an application
expert may anticipate.
Thank you

Concurrency Control in Database Management systemChristalin Nelson
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 

Recently uploaded (20)

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 

Classifying Concept Mappings using Instance Similarity Features

  • 1. Introduction Three classifiers Experiments and results Summary Similarity Features, and their Role in Concept Alignment Learning. Shenghui Wang (1), Gwenn Englebienne (2), Christophe Guéret (1), Stefan Schlobach (1), Antoine Isaac (1), Martijn Schut (1). (1) Vrije Universiteit Amsterdam; (2) Universiteit van Amsterdam. SEMAPRO 2010, Florence
  • 2. Outline. 1 Introduction (Classification of concept mappings based on instance similarity); 2 Three classifiers (Markov Random Field, Multi-objective Evolution Strategy, Support Vector Machine); 3 Experiments and results; 4 Summary
  • 3. Thesaurus mapping. SemanTic Interoperability To access Cultural Heritage (STITCH) through mappings between thesauri. Scope of the problem: big thesauri with tens of thousands of concepts; huge collections (e.g., National Library of the Netherlands: 80 km of books in one collection); heterogeneous material (e.g., books, manuscripts, illustrations); multilingual problem. Solving matching problems is one step towards solving the interoperability problem, e.g., “plankzeilen” vs. “surfsport” or “archeology” vs. “excavation”.
  • 6. Automatic alignment techniques. Lexical: labels and textual information of entities. Structural: structure of the formal definitions of entities, position in the hierarchy. Extensional: statistical information of instances, i.e., objects indexed with entities. Background knowledge: using a shared conceptual reference to find links indirectly.
  • 7. Instance-based techniques: common instance based
  • 10. Pros and cons. Advantages: simple to implement; interesting results. Disadvantages: requires sufficient amounts of common instances; only uses part of the available information.
  • 11. Instance-based techniques: instance similarity based
  • 14. Representing concepts and the similarity between them. [Diagram: each instance is described by metadata fields (Creator, Title, Publisher, ...). Instance features are aggregated per concept into concept features, one bag of words per field (e.g., Creator: Term 1: 4, Term 2: 1, Term 3: 0, ...). The pair features f1, f2, f3, ... are the cosine distances (Cos. dist.) between the corresponding bags of words of Concept 1 and Concept 2.]
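
The pipeline on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the whitespace tokenisation, the dictionary layout of an instance and the function names are all assumptions made for the example. It computes cosine similarity; the cosine distance named on the slide would be one minus this value.

```python
import math
from collections import Counter

def bag_of_words(instances, field):
    """Aggregate one metadata field over all instances of a concept into term counts."""
    bag = Counter()
    for instance in instances:
        # Assumption: plain lower-cased whitespace tokenisation.
        bag.update(instance.get(field, "").lower().split())
    return bag

def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pair_features(instances_1, instances_2, fields):
    """One similarity value per metadata field: the feature vector of a concept pair."""
    return [
        cosine_similarity(bag_of_words(instances_1, f), bag_of_words(instances_2, f))
        for f in fields
    ]

# Two toy concepts, each annotating a single book (hypothetical data).
concept_1 = [{"creator": "jansen", "title": "wind surfing"}]
concept_2 = [{"creator": "smith", "title": "wind surfing"}]
features = pair_features(concept_1, concept_2, ["creator", "title"])
```

In this toy case the creators share no terms and the titles are identical, so the pair is described by a low first feature and a high second feature.
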
  • 15. Classification based on instance similarity. Each pair of concepts is treated as a point in a “similarity space”. Its position is defined by the features of the pair, which are the different measures of similarity between the concepts’ instances. Hypothesis: the label of a point (which states whether the pair is a positive or a negative mapping) is correlated with the position of this point in this space. Given already labelled points and the actual similarity values of the concepts involved, it is possible to classify a new point, i.e., to assign it the correct label, based on its location.
  • 16. Research questions. (1) How do different classifiers perform on this instance-based mapping task? (2) What are the benefits of using a machine learning algorithm to determine the importance of features? (3) Are there regularities with respect to the relative importance given to specific features for similarity computation? Are these weights related to application data characteristics?
  • 17. Three classifiers used: Markov Random Field (MRF), Evolutionary Strategy (ES), Support Vector Machine (SVM)
  • 18. Markov Random Field. Let $T = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ be the training set, where $x^{(i)} \in \mathbb{R}^{K}$ are the features and $y^{(i)} \in Y = \{\text{positive}, \text{negative}\}$ is the label. The conditional probability of a label given the input is modelled as $p(y^{(i)} \mid x^{(i)}, \theta) = \frac{1}{Z(x^{(i)}, \theta)} \exp\left( \sum_{j=1}^{K} \lambda_j \phi_j(y^{(i)}, x^{(i)}) \right)$ (1), where $\theta = \{\lambda_j\}_{j=1}^{K}$ are the weights associated with the feature functions $\phi_j$ and $Z(x^{(i)}, \theta)$ is a normalisation constant.
  • 19. Markov Random Field (cont’d). The likelihood of the data set for given model parameters, $p(T \mid \theta)$, is given by $p(T \mid \theta) = \prod_{i=1}^{N} p(y^{(i)} \mid x^{(i)})$ (2). During learning, our objective is to find the most likely values of $\theta$ for the given training data. The decision criterion for assigning a label $y^{(i)}$ to a new pair of concepts $i$ is then simply given by $y^{(i)} = \arg\max_{y} p(y \mid x^{(i)})$ (3).
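
Equations (1) to (3) can be evaluated directly once the feature functions are fixed. The slides do not spell out $\phi_j$, so the sketch below assumes a hypothetical choice, $\phi_j(y, x) = x_j$ for the positive label and 0 for the negative one; the learning step (finding the most likely weights) is not shown.

```python
import math

LABELS = ("positive", "negative")

def phi(j, y, x):
    """Hypothetical feature function: x_j for the positive label, 0 otherwise."""
    return x[j] if y == "positive" else 0.0

def conditional_probability(y, x, lambdas):
    """Equation (1): exp(sum_j lambda_j * phi_j(y, x)) / Z(x, theta)."""
    scores = {
        label: sum(lam * phi(j, label, x) for j, lam in enumerate(lambdas))
        for label in LABELS
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[y] - top) / z

def log_likelihood(training_set, lambdas):
    """Logarithm of equation (2): sum of log p(y | x) over the training pairs."""
    return sum(math.log(conditional_probability(y, x, lambdas)) for x, y in training_set)

def classify(x, lambdas):
    """Equation (3): the label with the highest conditional probability."""
    return max(LABELS, key=lambda label: conditional_probability(label, x, lambdas))

# Illustrative weights and points, not values from the experiments.
weights = [2.0, -1.0]
label_a = classify([1.0, 0.5], weights)
label_b = classify([0.0, 1.0], weights)
```

With this choice of feature functions the model reduces to a logistic-regression-style classifier without a bias term, which is one simple instance of the log-linear form in equation (1).
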
  • 20. Multi-objective Evolution Strategy. Evolution strategies (ES) have two characteristic properties: firstly, they are used for continuous value optimisation, and, secondly, they are self-adaptive. An ES individual is a direct model of the searched solution, defined by $\Lambda$ and some evolution strategy parameters: $\langle \Lambda, \Sigma \rangle \leftrightarrow \langle \lambda_1, \dots, \lambda_K, \sigma_1, \dots, \sigma_K \rangle$ (4). The fitness function is related to the decision criterion for the ES, which is sign-based: $L^{ES}_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$ (5)
  • 21. Multi-objective Evolution Strategy (cont’d). Maximising the number of correctly identified positive mappings and the number of correctly identified negative mappings are two opposing goals: $f_1(\Lambda \mid F, L) = \#\{F_i \mid \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \wedge L_i = 1\}$ (6), $f_2(\Lambda \mid F, L) = \#\{F_i \mid \sum_{j=1}^{K} \lambda_j F_{ij} \le 0 \wedge L_i = 0\}$ (7)
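
The sign-based decision rule (5) and the two objectives (6) and (7) translate almost literally into code. The sketch below reads the weighted sum as running over the K features of each pair; the function names and the toy data are assumptions for illustration.

```python
def es_label(lambdas, features):
    """Equation (5): label 1 if the weighted sum of the pair's features is positive."""
    return 1 if sum(lam * f for lam, f in zip(lambdas, features)) > 0 else 0

def objectives(lambdas, F, L):
    """Equations (6) and (7): counts of correctly labelled positive and negative pairs."""
    f1 = sum(1 for Fi, Li in zip(F, L) if es_label(lambdas, Fi) == 1 and Li == 1)
    f2 = sum(1 for Fi, Li in zip(F, L) if es_label(lambdas, Fi) == 0 and Li == 0)
    return f1, f2

# Toy feature matrix (one row per concept pair) and reference labels.
F = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]]
L = [1, 0, 1, 0]

balanced = objectives([1.0, -1.0], F, L)   # separates both classes
greedy = objectives([1.0, 1.0], F, L)      # labels every pair as positive
```

The second weight vector shows why the goals conflict: labelling everything positive maximises the first objective while driving the second to zero.
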
  • 22. Multi-objective Evolution Strategy (cont’d). Evolution process. Recombination: two parent individuals are combined using different weightings, producing two new individuals. Mutation: one parent individual changes itself into a new child individual. Survivor selection: NSGA-II.
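
The slides name the variation operators without giving formulas, so the following is only one plausible reading: weighted arithmetic recombination and the standard self-adaptive Gaussian mutation used in evolution strategies. The learning rate `tau` and both function names are assumptions, and NSGA-II survivor selection is not implemented here.

```python
import math
import random

def recombine(parent_a, parent_b, weight):
    """Weighted arithmetic recombination producing two children (an assumed scheme)."""
    child_1 = [weight * a + (1 - weight) * b for a, b in zip(parent_a, parent_b)]
    child_2 = [(1 - weight) * a + weight * b for a, b in zip(parent_a, parent_b)]
    return child_1, child_2

def mutate(lambdas, sigmas, rng, tau=None):
    """Self-adaptive mutation: perturb the step sizes first, then the weights."""
    if tau is None:
        tau = 1.0 / math.sqrt(len(lambdas))  # a common default, assumed here
    new_sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigmas]
    new_lambdas = [lam + s * rng.gauss(0.0, 1.0) for lam, s in zip(lambdas, new_sigmas)]
    return new_lambdas, new_sigmas

rng = random.Random(0)  # fixed seed so the sketch is reproducible
child_1, child_2 = recombine([1.0, 0.0], [0.0, 1.0], 0.25)
mutated_lambdas, mutated_sigmas = mutate(child_1, [0.1, 0.1], rng)
```

Because the step sizes are multiplied by a positive factor, they stay positive across generations, which is what lets the strategy adapt its own search radius.
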
  • 23. Support Vector Machine. The Support Vector Machine (SVM) is used as a maximum-margin classifier whose task consists in finding a hyperplane separating the two classes. The objective is to maximise the margin separating the two classes whilst minimising the risk of classification error.
  • 24. Experiments and results. Thesauri to match: GTT (35K) and Brinkman (5K). Instances: 1 million books (GTT-annotated books: 307K; Brinkman-annotated books: 490K; dually annotated books: 222K).
  • 25. Feature selection for similarity calculation. Table: List of the features (index j of the weight λj, followed by the feature name): 1 Lexical; 2 Jaccard; 3 Date; 4 ISBN; 5 NBN; 6 PPN; 7 SelSleutel; 8 abstract; 9 alternative; 10 annotation; 11 author; 12 contributor; 13 creator; 14 dateCopyrighted; 15 description; 16 extent; 17 hasFormat; 18 hasPart; 19 identifier; 20 isVersionOf; 21 issued; 22 language; 23 mods:edition; 24 publisher; 25 refNBN; 26 relation; 27 spatial; 28 subject; 29 temporal; 30 title.
  • 26. Quality of learning. Figure: Precision, recall and F-measure for mappings with a positive label (top) and a negative label (bottom). Legend: MRF 1-30, MRF 3-30, ES, SVM. Error bars indicate one standard deviation over the 10 folds of cross-validation.
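
For reference, the three measures in the figure can be computed per label as follows. This is a generic sketch of the standard definitions, with hypothetical predictions rather than data from the experiments.

```python
def precision_recall_f(predicted, actual, target):
    """Precision, recall and F-measure for one target label."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == target and a == target)
    fp = sum(1 for p, a in zip(predicted, actual) if p == target and a != target)
    fn = sum(1 for p, a in zip(predicted, actual) if p != target and a == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure

# Hypothetical classifier output and reference labels (1 = positive mapping).
predicted = [1, 1, 0, 0, 1]
actual = [1, 0, 0, 1, 1]

positive_scores = precision_recall_f(predicted, actual, target=1)
negative_scores = precision_recall_f(predicted, actual, target=0)
```

Reporting the scores separately for the positive and the negative label, as the figure does, guards against a classifier that looks good only because one class dominates.
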
  • 27. Relative importance of features. Which features of our instances are important for mapping? Figure: Mutual information between features and labels
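
Mutual information between one similarity feature and the mapping labels can be estimated from counts. The slides do not say how the estimate was obtained, so the equal-width binning of feature values in [0, 1] below is an assumption, as are the function name and the toy data.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels, bins=10):
    """Estimate I(feature; label) in bits after equal-width binning of values in [0, 1]."""
    n = len(labels)
    binned = [min(int(v * bins), bins - 1) for v in feature_values]
    joint = Counter(zip(binned, labels))
    p_bin = Counter(binned)
    p_label = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_bin[b] / n) * (p_label[y] / n)))
    return mi

labels = [1, 1, 0, 0]
informative = mutual_information([0.9, 0.8, 0.1, 0.2], labels, bins=2)
uninformative = mutual_information([0.9, 0.1, 0.9, 0.1], labels, bins=2)
```

A feature whose bins line up perfectly with the labels carries one full bit about a balanced binary label, while a feature distributed identically in both classes carries none.
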
  • 28. Relative importance of features. ES lambdas are not really conclusive. The ES lambdas that are most inconclusive correspond to the least informative features.
  • 29. Relative importance of features. Important features in terms of mutual information are associated with large MRF weights.
  • 30. A more detailed analysis. Expected important features: label similarity (1), instance overlap (2), subject (28), etc. Expected unimportant features: size of the book (16), format description (17) and language (22), etc. Surprisingly important features: date (14). Surprisingly unimportant features: description (15) and abstract (8).
  • 32. Summary. We tried three machine learning classifiers on the instance-based mapping task, among which the MRF and the ES can automatically identify meaningful features. The MRF and the ES achieve a performance in the neighbourhood of 90%, showing the validity of the approach. Our analysis suggests that when many different description features interact, there is no systematic correlation between what a learning method finds and what an application expert may anticipate.
  • 33. Thank you