Knowledge Discovery for the Semantic Web
An Application to Web Usage Mining
&
How to use semantics in the Preprocessing stage
[Figure: the knowledge discovery pipeline]
Input Data → Data Preprocessing and Transformation → Data Mining → Interpretation and Evaluation → Information / Taking Action
Data Preprocessing and Transformation:
• Data fusion (multiple sources)
• Data cleaning (noise, missing values)
• Feature selection
• Dimensionality reduction
• Data normalization
Interpretation and Evaluation:
• Filtering patterns
• Visualization
• Statistical analysis
- Hypothesis testing
- Attribute evaluation
- Comparing learned models
- Computing confidence intervals
Claudia D’Amato - University of Bari, IT.

Laura Hollink - Centrum Wiskunde & Informatica, Amsterdam, NL.
An application to Web Usage Mining
Web Usage Mining = discovering patterns in logs of user interaction with Web
resources

• logs typically contain an identifier for users (e.g. an IP address), their queries
and their clicks
• What about usage of Linked
Open Data?
• Can we use semantics to
improve mining of Web Usage?
Mining Usage of Linked Open Data in USEWOD
USEWOD: http://usewod.org/ [B. Berendt, L. Hollink, M. Luczak-Roesch, et al.]

1. USEWOD workshop series @ ESWC / WWW since 2011 

2. USEWOD dataset: server logs of DBpedia, BioPortal, LinkedGeoData, etc.,
and client side logs from YASGUI.
Mining Usage of Linked Open Data in USEWOD
• Results of USEWOD: LOD usage mining for more efficient indexing [1],
caching [2], auto-completion [3], etc.

• Issues:

• what is the difference between queries by machines and humans? [4]

• what is the meaning of repeated queries by bots/tools?

• a lot of the usage is invisible due to data dump download [5]

[1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. An empirical study
of real-world SPARQL queries. USEWOD @ WWW 2011.
[2] Lorey, J., & Naumann, F. Caching and prefetching strategies for SPARQL queries. USEWOD @
ESWC 2013.
[3] Kramer, K., Dividino, R. Q., & Gröner, G. SPACE: SPARQL Index for Efficient Autocompletion.
ISWC (Posters & Demos) 2013.
[4] Rietveld, L., & Hoekstra, R. Man vs. Machine: Differences in SPARQL Queries. USEWOD @
ESWC 2014.
[5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic
Relatedness in LOD. NoISE @ ESWC 2015.
Usage mining example 1: clustering rdf:properties
in DBpedia
Instead of listing all DBpedia properties
alphabetically, can we display them in a
more meaningful way? Can we use query
logs for this?
[5]
Approach: Hierarchical Clustering of
properties, where the distance between a
pair of properties is based on how often
they co-occur in a SPARQL query in the
USEWOD2015 logs.
[5] Huelss, J., & Paulheim, H. What SPARQL
Query Logs Tell and do not Tell about Semantic
Relatedness in LOD. NoISE @ ESWC 2015
Disclaimer: simplified discussion of this paper!
Evaluation: run an experiment to
measure how quickly and accurately
people identify facts when looking
at the standard view or the clustered
view.
Result: no significant differences ☹
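The clustering approach on this slide can be sketched in a few lines. This is only an illustration under assumptions: the toy queries are made up, the distance is taken to be 1 minus the Jaccard overlap of the queries in which two properties occur, and average linkage with a cut-off threshold is used. The slide does not say which distance or linkage the paper actually uses.

```python
from itertools import combinations

# Hypothetical toy input: the set of properties used in each logged SPARQL query.
queries = [
    {"dbo:birthDate", "dbo:birthPlace"},
    {"dbo:birthDate", "dbo:birthPlace", "dbo:deathDate"},
    {"dbo:populationTotal", "dbo:areaTotal"},
    {"dbo:populationTotal", "dbo:areaTotal"},
    {"dbo:birthDate", "dbo:deathDate"},
]

def cooccurrence_distance(queries):
    """Assumed distance: 1 - Jaccard overlap of the queries in which two properties occur."""
    occurs = {}
    for i, props in enumerate(queries):
        for p in props:
            occurs.setdefault(p, set()).add(i)
    dist = {}
    for a, b in combinations(sorted(occurs), 2):
        inter = len(occurs[a] & occurs[b])
        union = len(occurs[a] | occurs[b])
        dist[frozenset((a, b))] = 1.0 - inter / union
    return sorted(occurs), dist

def average_linkage(c1, c2, dist):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def cluster(props, dist, threshold):
    """Agglomerative clustering: merge the closest clusters until none is within threshold."""
    clusters = [frozenset([p]) for p in props]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(pair[0], pair[1], dist))
        if average_linkage(c1, c2, dist) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

props, dist = cooccurrence_distance(queries)
clusters = cluster(props, dist, threshold=0.7)
```

On this toy log the date and place properties of people end up in one cluster and the population and area properties in another, which is the kind of grouping a clustered property view would display.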
Usage mining example 2: mining semantically
enriched query logs
Data: queries and clicks on Yahoo! search engine.
Problem when mining ‘raw’ logs: low support of even the most
frequent patterns
[6] Laura Hollink, Peter Mika and Roi Blanco. Web
Usage Mining with Semantic Analysis. WWW 2013.
Usage mining example 2: mining semantically
enriched query logs
Approach:

1. link queries to entities in the LOD cloud:

• Freebase (has a lot of movie-related info)

• DBpedia (Wikipedia is widely used)

2. choose class of entity + selected properties

3. detect modifier words (download, trailer, cast, date, etc.)
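The three steps above can be sketched as a small annotation function. Everything here is an assumption made for illustration: `ENTITY_INDEX` is a hand-made stand-in for a real entity linker over Freebase or DBpedia, the class names are invented, and only the class of the entity is kept (the "selected properties" of step 2 are left out).

```python
# Hypothetical entity index: surface form -> (entity URI, class).
# A real system would link against Freebase or DBpedia; these entries are made up.
ENTITY_INDEX = {
    "moneyball": ("dbpedia:Moneyball_(film)", "Film"),
    "brad pitt": ("dbpedia:Brad_Pitt", "Actor"),
}

# Modifier words taken from the slide.
MODIFIERS = {"download", "trailer", "cast", "date"}

def annotate(query):
    """Return (entity class or None, sorted modifier words) for one query string."""
    text = query.lower()
    entity_class = None
    for surface, (_uri, cls) in ENTITY_INDEX.items():
        if surface in text:
            entity_class = cls                 # step 1 + 2: link, then keep the class
            text = text.replace(surface, " ")
            break
    modifiers = sorted(w for w in text.split() if w in MODIFIERS)  # step 3
    return entity_class, modifiers

annotated = [annotate(q) for q in ["Moneyball trailer", "brad pitt", "weather amsterdam"]]
```

Rewriting each query to a class plus modifiers is what raises the support of patterns: many different movie titles collapse into the single symbol `Film`.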
Usage mining example 2: mining semantically
enriched query logs
• Discover frequent patterns at the class level with sequential pattern mining.

• The efficient PrefixSpan algorithm mines all subsequence patterns that reach a minimum support.
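A minimal sketch of the PrefixSpan idea, assuming each session is a sequence of single class labels (no itemsets) and that support is counted as the number of sessions containing the pattern. The session data is invented, and this is not the implementation used in the paper.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all subsequence patterns with support >= min_support."""
    results = {}

    def mine(prefix, projected):
        # Count, per item, the number of projected sequences that contain it.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project the database: keep what follows the first occurrence of item.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine((), [list(s) for s in sequences])
    return results

# Hypothetical sessions, already rewritten to the class level.
sessions = [
    ["Film", "Actor", "Film"],
    ["Film", "Actor"],
    ["Actor", "Film"],
]
patterns = prefixspan(sessions, min_support=2)
```

With `min_support=2` the toy sessions yield the patterns `Film`, `Actor`, `Film → Actor` and `Actor → Film`; longer patterns occur in only one session and are pruned.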
Usage mining example 3: semantic patterns of
query modification
• Goal: identify frequent query modifications in an image archive

• state of the art = 3 classes: generalization, specification,
reformulation

• Approach:

1. Link queries to entities in the LOD cloud

2. Choose class of entity

3. Determine shortest path between consecutive queries Q1 and Q2

4. Rank property-paths according to support and confidence.
Hollink, V., Tsikrika, T., & de Vries, A. P.
(2011). Semantic search log analysis: a
method and a study on professional image
search. JASIST 62(4), 691-713.
See also:
Huurnink, B., Hollink, L., Van Den Heuvel,
W., & De Rijke, M. (2010). Search behavior
of media professionals at an audiovisual
archive: A transaction log analysis. JASIST,
61(6), 1180-1197.
Conclusions:
• Identified patterns not visible on raw
data.
• but “the method is only moderately
successful in identifying the most
prominent relations for a given query
pair”
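Steps 3 and 4 of the approach can be sketched as a breadth-first search over a small graph followed by a count of the resulting property paths. The triples and query pairs are invented, the `^` prefix for inverse edges is a notational choice of this sketch, and only support is computed because the slide does not define how confidence is calculated.

```python
from collections import Counter, deque

# Hypothetical toy graph: (subject, property, object) triples.
TRIPLES = [
    ("Rembrandt", "bornIn", "Leiden"),
    ("Leiden", "locatedIn", "Netherlands"),
    ("Vermeer", "bornIn", "Delft"),
    ("Delft", "locatedIn", "Netherlands"),
]

def build_adjacency(triples):
    """Index triples in both directions; '^' marks an edge followed backwards."""
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append(("^" + p, s))
    return adj

def shortest_property_path(adj, start, goal):
    """Breadth-first search; returns the tuple of properties on a shortest path, or None."""
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (prop,)))
    return None

adj = build_adjacency(TRIPLES)

# Hypothetical consecutive queries (Q1, Q2), already linked to entities.
query_pairs = [("Rembrandt", "Leiden"), ("Vermeer", "Delft"), ("Leiden", "Netherlands")]
paths = [shortest_property_path(adj, q1, q2) for q1, q2 in query_pairs]

# Support of a property path = number of query pairs connected by it.
support = Counter(p for p in paths if p is not None)
ranked = support.most_common()
```

Here the path `bornIn` has the highest support, so "person to birthplace" would be reported as the most frequent query modification in the toy log.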
The feature selection issue when using LOD
Feature Selection
• Feature selection = limiting the number of features for faster computation,
more understandable models and better predictive performance.

• Using Linked Open Data can lead to a large number of features per data point.

• a DBpedia resource easily has 50 property-value pairs.

• more are easily added using reasoning

• note: these numbers are not large compared to the number of features in
DNA strings, or all words in a text corpus!

• Still, many of them are irrelevant or redundant.
Feature Selection Example
• Goal: learn a relation R between x and y.
• In this paper, R is one of ‘occupation’, ‘gender’, ‘instance_of’, ‘acted_in’, ‘genre’,
‘position_played_on_team’

• Approach: given a training set of pairs (x, y), learn a “whitelist” of properties
in DBpedia, WikiData, YAGO and WordNet that indicate a relation R between
x and y

• Cast as a subset selection problem:

• E = the set of possible properties

• local search over the power set of E (i.e. all subsets) to find the optimal
subset.
Learning to Exploit Structured Resources
for Lexical Inference. Vered Shwartz, Omer
Levy, Ido Dagan and Jacob Goldberger.
CoNLL 2015 (to appear, July).
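The subset selection step can be illustrated with a simple hill-climbing search. This is a sketch under assumptions, not the paper's procedure: the training pairs are invented, a pair is predicted positive when at least one whitelisted property links x and y, and the F1 score is assumed as the objective.

```python
# Hypothetical training data: for each (x, y) pair, the properties linking x to y
# in the resources, and whether relation R really holds for the pair.
TRAIN = [
    ({"occupation", "knownFor"}, True),
    ({"occupation"}, True),
    ({"knownFor", "seeAlso"}, False),
    ({"seeAlso"}, False),
    ({"field"}, True),
]
E = {"occupation", "knownFor", "seeAlso", "field"}  # the set of possible properties

def f1(whitelist):
    """F1 of the rule: predict R iff some whitelisted property links x and y."""
    tp = sum(1 for props, label in TRAIN if label and props & whitelist)
    fp = sum(1 for props, label in TRAIN if not label and props & whitelist)
    fn = sum(1 for props, label in TRAIN if label and not props & whitelist)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def local_search(candidates, score):
    """Hill climbing over subsets: toggle one property at a time while the score improves."""
    current = frozenset()
    best = score(current)
    improved = True
    while improved:
        improved = False
        for prop in sorted(candidates):
            neighbour = current ^ {prop}   # add prop if absent, remove it if present
            s = score(neighbour)
            if s > best:
                current, best, improved = neighbour, s, True
    return current, best

whitelist, best_score = local_search(E, f1)
```

Local search avoids enumerating all 2^|E| subsets, at the price of possibly stopping in a local optimum; on this toy data it first adds `knownFor` and later drops it again once `occupation` is in the whitelist.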
Data Fusion
Data Fusion / Ontology Alignment / Mapping /
Matching / Linking / Integration
Levels:
• Ontology / schema / T-box level
• Instance / data / A-box level
Data fusion tasks:
• Ontology alignment
• Instance matching
• Property matching
• Entity detection / entity linking (in text documents)
Methods for Data Fusion (ontology alignment)
Methods for Data Fusion: structural matchers
• E.g. Similarity Flooding [11]: the similarity of a matched pair s1
and t1 propagates to their respective neighbors s2 and t2.

• neighbors can be defined as subclasses,
superclasses, instances, domain/ranges, etc.

• In practice, structural measures are never used on
their own.
[10] Ngo, Duy Hoa, and Zohra Bellahsene.
YAM++-results for OAEI 2012. OAEI @
ISWC 2012.

[11] Sergey Melnik, Hector Garcia-Molina,
and Erhard Rahm. Similarity flooding: A
versatile graph matching algorithm and its
application to schema matching. ICDE 2002.
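The propagation idea can be sketched as a fixpoint iteration over pairs of nodes. This is a heavily simplified variant written for illustration: edges are unlabelled, every neighbour contributes with weight 1, and scores are normalised by the maximum after each round. The published algorithm uses labelled edges and propagation coefficients that are not reproduced here, and the class names and initial score are invented.

```python
def flood(edges_s, edges_t, initial, iterations=10):
    """Propagate similarity between node pairs of two graphs along parallel edges."""
    # (s1, t1) and (s2, t2) are neighbours if s1-s2 is an edge in the source graph
    # and t1-t2 is an edge in the target graph.
    neighbours = {}
    for s1, s2 in edges_s:
        for t1, t2 in edges_t:
            neighbours.setdefault((s1, t1), []).append((s2, t2))
            neighbours.setdefault((s2, t2), []).append((s1, t1))
    sim = dict(initial)
    for pair in neighbours:
        sim.setdefault(pair, 0.0)
    for _ in range(iterations):
        updated = {
            pair: initial.get(pair, 0.0) + sim[pair]
            + sum(sim[n] for n in neighbours.get(pair, []))
            for pair in sim
        }
        top = max(updated.values()) or 1.0   # normalise so the best pair scores 1.0
        sim = {pair: value / top for pair, value in updated.items()}
    return sim

# Hypothetical input: one subclass edge per ontology, and an initial (e.g. string-based)
# match between the two superclasses.
edges_s = [("Person", "Artist")]
edges_t = [("Human", "Painter")]
initial = {("Person", "Human"): 1.0}
sim = flood(edges_s, edges_t, initial)
```

The pair (Artist, Painter) starts with no similarity at all and gains a score purely because its neighbours (Person, Human) match, which also shows why such a measure needs a seed from another matcher.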
Methods for Data Fusion: instance based matchers
• Match classes based on similarity of their instances

• note: you need a way to assess similarity of the instances!
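A minimal sketch of instance-based matching, assuming the simplest possible instance similarity: instances in both sources already share identifiers, so class similarity is the Jaccard overlap of the instance sets. The class names, instances and threshold are invented for illustration.

```python
def jaccard(a, b):
    """Overlap of two instance sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_classes(source, target, threshold=0.5):
    """Pair each source class with the target class whose instance set overlaps most."""
    matches = {}
    for s_cls, s_inst in source.items():
        best = max(target, key=lambda t_cls: jaccard(s_inst, target[t_cls]))
        score = jaccard(s_inst, target[best])
        if score >= threshold:
            matches[s_cls] = (best, score)
    return matches

# Hypothetical classes with their instances.
source = {"ex1:Painter": {"Rembrandt", "Vermeer", "Van Gogh"}}
target = {
    "ex2:Artist": {"Rembrandt", "Vermeer", "Van Gogh", "Mozart"},
    "ex2:City": {"Delft"},
}
matches = match_classes(source, target)
```

When the two sources do not share instance identifiers, the set intersection has to be replaced by an instance matcher, which is exactly the caveat in the note above.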
Methods for Data Fusion: string based
• String similarity is the most important feature in ontology alignment.

• “nearly all [ontology alignment systems] use a string similarity metric” [12]

• stop-word removal and stemming are not helpful! Nor is using WordNet synonyms. [12]

• In [13] we took an even less semantic approach: linking based on URL syntax.
[12] Cheatham, M., & Hitzler, P. String
similarity metrics for ontology alignment.
ISWC 2013.

[13] The debates of the European
Parliament as Linked Open Data. Under
review. See http://www.talkofeurope.eu/data/
for details.

http://www.dbpedia.org/page/Judith_Sargentini
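A string-based matcher can be sketched with the standard library. The choice of metric is an assumption: `difflib.SequenceMatcher` is used only because it ships with Python, not because [12] recommends it. In line with the slide, no stop-word removal, stemming or synonym lookup is applied; labels are only split on underscores and camelCase.

```python
import re
from difflib import SequenceMatcher

def normalise(label):
    """Split camelCase and underscores into words, then lowercase."""
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return label.replace("_", " ").lower().strip()

def label_similarity(a, b):
    """Similarity in [0, 1] between two labels after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

# Local names of two properties that appear as an example later in the deck.
close = label_similarity("has_the_last_name", "hasLastName")
# A hypothetical unrelated property name for comparison.
far = label_similarity("hasLastName", "populationTotal")
```

A threshold on this score then decides which label pairs become candidate links; where to put the threshold is dataset-dependent and has to be tuned.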
Link types
• Equality: SameAs, EquivalentClasses, EquivalentProperties
- “Den Haag” = “The Hague”
- wood-material = wood
• Hierarchical: rdfs:subClassOf, rdf:type, rdfs:subPropertyOf
- aat:Artist ⊇ wn:Artist
- tgn:Africa ∈ wn:Continent
- conf:has_the_last_name = edas:hasLastName
• Weaker semantics: skos:closeMatch / exactMatch / broadMatch / narrowMatch / relatedMatch
- geonames:Italy skos:closeMatch librarytopics:Italy
• Domain specific links: e.g. born-in, hasStyle, hasPart
- Van Gogh (ULAN) born-in Groot-Zundert (TGN)
Representation of links
As a plain triple:
architecten skos:exactMatch architects

As a link with its own metadata (Link 001):
• concept1: architecten
• concept2: architects
• link type: skos:exactMatch
• link method: manual
• author: L. Hollink
• Open Question: how valid are the
patterns we discover in data when
the quality of the links is low?
• Even more important to be critical
and evaluate the data

• source criticism

• tool criticism (see http://event.cwi.nl/toolcriticism/)
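Representing a link as a record with its own metadata makes source and tool criticism possible in code. The sketch below is one way to hold such a link in memory; the class, its field names and the `trusted` helper are hypothetical, and the method value is given in English.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """A link between two concepts, with provenance attached."""
    link_id: str
    concept1: str
    concept2: str
    link_type: str
    method: str
    author: str

    def as_triple(self):
        """Drop the provenance and return the plain (subject, predicate, object) triple."""
        return (self.concept1, self.link_type, self.concept2)

def trusted(links, methods=("manual",)):
    """Keep only the triples of links created with an accepted method."""
    return [link.as_triple() for link in links if link.method in methods]

link = Link("Link 001", "architecten", "architects", "skos:exactMatch", "manual", "L. Hollink")
triples = trusted([link])
```

Filtering on `method` or `author` before mining is a small step towards checking how much discovered patterns depend on low-quality links.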
Evaluation of Data Fusion / Linking
1. Manually rating (a sample of) mappings

• relatively cheap and easy to interpret

• only precision, no recall
2. Comparison to a reference alignment

• precision and recall

• used in OAEI on the SEALS platform

• more expensive if a reference alignment has to be
created (but: crowd sourcing!)
3. End-to-end evaluation (a.k.a. evaluating an application
that uses the mappings)

• arguably the best method!

• need to have access to an application + users
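For method 2, precision and recall against a reference alignment reduce to set operations. A minimal sketch with invented mappings, where each mapping is a (source concept, target concept) pair:

```python
def precision_recall(found, reference):
    """Precision and recall of a set of mappings against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical mappings produced by a matcher, and a hypothetical reference alignment.
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"),
             ("a5", "b5"), ("a6", "b6"), ("a7", "b7")}
precision, recall = precision_recall(found, reference)
```

Manual rating of a sample (method 1) only yields the first number, because without a reference alignment the missed mappings are unknown.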
Evaluation of Data Fusion / Linking
• Comparison to a reference alignment: alternative measures:

• 1. instead of a binary classification into correct/incorrect mappings, take
into account how wrong a link is:

[formula shown as an image on the slide]

• where r(a) is the semantic distance between correspondence a and
correspondence a’ in the reference alignment, and A is the number of
correspondences.

• 2. weight the score of mappings based on the frequency of their use

• e.g. from usage logs!

Laura Hollink, Mark van Assem, Shenghui
Wang, Antoine Isaac, Guus Schreiber. Two
Variations on Ontology Alignment
Evaluation: Methodological Issues. ESWC
2008.
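The second alternative measure can be read as a precision in which each mapping counts in proportion to how often it is used. The sketch below is one possible interpretation, not the formula from the cited paper; the mappings and the usage counts (for example taken from a query log) are invented.

```python
def weighted_precision(found, reference, usage):
    """Precision where each mapping is weighted by its usage frequency (assumed weighting)."""
    total = sum(usage.get(m, 0) for m in found)
    if total == 0:
        return 0.0
    correct = sum(usage.get(m, 0) for m in found if m in reference)
    return correct / total

# Hypothetical data: one correct and one incorrect mapping, with usage counts.
found = [("a1", "b1"), ("a2", "b7")]
reference = {("a1", "b1")}
usage = {("a1", "b1"): 90, ("a2", "b7"): 10}
score = weighted_precision(found, reference, usage)
```

Unweighted precision on this example is 0.5, while the weighted score is 0.9: the wrong mapping is rarely used, so it hurts an application much less than the plain count suggests.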
Discovering links from text
Pointers to what happens in other communities
• Word2Vec: efficient neural-network algorithm to learn vector representations of
words

• vector similarity captures semantics between words

• No explicit semantics, but we can’t deny that there is meaning there!

• Success seems to be mostly due to big data
Mikolov, Tomas, et al. "Distributed
representations of words and phrases and
their compositionality." Advances in neural
information processing systems. 2013.
Example:

Vec(Madrid) - Vec(Spain) + Vec(France)
is closer to Vec(Paris) than to any other
vector
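The vector arithmetic in the example can be tried out on toy data. The vectors below are hand-made for illustration (dimensions: capital, Spain, France, Germany) and are not learned by Word2Vec; real embeddings have hundreds of dimensions and only approximately satisfy such analogies.

```python
import math

# Hand-made toy vectors, NOT learned embeddings.
VECTORS = {
    "Madrid":  [1, 1, 0, 0],
    "Spain":   [0, 1, 0, 0],
    "Paris":   [1, 0, 1, 0],
    "France":  [0, 0, 1, 0],
    "Berlin":  [1, 0, 0, 1],
    "Germany": [0, 0, 0, 1],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

answer = analogy("Madrid", "Spain", "France")
```

Excluding the three input words from the candidates is the usual convention in analogy tests, since the target vector often stays very close to one of its inputs.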
NELL: Never-Ending Language Learning
• several machine learning approaches to discover facts (beliefs) from text on
the web

• string features, distribution of context words, html structure, visual image
analysis.

• Running since 2010, has so far learned over 80 million beliefs
T. Mitchell, W. Cohen, E. Hruschka, P.
Talukdar, J. Betteridge, A. Carlson, B. Dalvi,
M. Gardner, B. Kisiel, J. Krishnamurthy, N.
Lao, K. Mazaitis, T. Mohamed, N.
Nakashole, E. Platanios, A. Ritter, M.
Samadi, B. Settles, R. Wang, D. Wijaya, A.
Gupta, X. Chen, A. Saparov, M. Greaves, J.
Welling. In Proceedings of the Conference
on Artificial Intelligence (AAAI), 2015.
Research Task Format
Work in 6 groups of 10 students

• 5 people design an approach to
association rules with semantics

• 5 people focus on how that
approach should be evaluated

The idea is to work together!
E.g. which measures are best
for this approach? Which
versions of the approach
should be evaluated? Will this
approach score high on these
measures? In which cases?
• We would like one presentation per group of 10 people

• of 3 or 4 slides

• of max 4 minutes (less is fine too!)

• Send me the slides in PDF, with your group number in the title,
by email to l.hollink@cwi.nl, today before 16:30.

• The presentation should show clearly:

1. the AR method

2. how you took semantics into account

3. the evaluation method

• BONUS: argue when and why your approach will score high.

• BONUS: discuss how the newly learned links can be
represented and used.
Tips:

• you may pick a dataset that
you will use as an example

Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 
Efficient Practices for Large Scale Text Mining Process
Efficient Practices for Large Scale Text Mining ProcessEfficient Practices for Large Scale Text Mining Process
Efficient Practices for Large Scale Text Mining Process
 
The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference Linking
 
The Web of Data is Our Opportunity
The Web of Data is Our OpportunityThe Web of Data is Our Opportunity
The Web of Data is Our Opportunity
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Museum Data Exchange
Museum Data ExchangeMuseum Data Exchange
Museum Data Exchange
 

Viewers also liked

Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
Advance Clustering Technique Based on Markov Chain for Predicting Next User M...Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
idescitation
 
Dotnet titles 2016 17
Dotnet titles 2016 17Dotnet titles 2016 17
Dotnet titles 2016 17
praba123456
 
Webmining ppt
Webmining pptWebmining ppt
Webmining ppt
kiransatyawada
 
Applying web mining application for user behavior understanding
Applying web mining application for user behavior understandingApplying web mining application for user behavior understanding
Applying web mining application for user behavior understandingZakaria Zubi
 
Preprocessing of Web Log Data for Web Usage Mining
Preprocessing of Web Log Data for Web Usage MiningPreprocessing of Web Log Data for Web Usage Mining
Preprocessing of Web Log Data for Web Usage Mining
Amir Masoud Sefidian
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
Devakumar Jain
 
Web Mining Presentation Final
Web Mining Presentation FinalWeb Mining Presentation Final
Web Mining Presentation FinalEr. Jagrat Gupta
 

Viewers also liked (8)

Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
Advance Clustering Technique Based on Markov Chain for Predicting Next User M...Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
Advance Clustering Technique Based on Markov Chain for Predicting Next User M...
 
Dotnet titles 2016 17
Dotnet titles 2016 17Dotnet titles 2016 17
Dotnet titles 2016 17
 
Webmining ppt
Webmining pptWebmining ppt
Webmining ppt
 
5463 26 web mining
5463 26 web mining5463 26 web mining
5463 26 web mining
 
Applying web mining application for user behavior understanding
Applying web mining application for user behavior understandingApplying web mining application for user behavior understanding
Applying web mining application for user behavior understanding
 
Preprocessing of Web Log Data for Web Usage Mining
Preprocessing of Web Log Data for Web Usage MiningPreprocessing of Web Log Data for Web Usage Mining
Preprocessing of Web Log Data for Web Usage Mining
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Web Mining Presentation Final
Web Mining Presentation FinalWeb Mining Presentation Final
Web Mining Presentation Final
 

Similar to Knowledge discoverylaurahollink

FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
Roberto García
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
ebiquity
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble
 
Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1
Kai Eckert
 
BibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 PresentationBibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 Presentation
Reynold Xin
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
Herbert Van de Sompel
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
robin fay
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
Norman Morrison
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
Jamshaid Ashraf
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 
Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking  Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking
Mohamed BEN ELLEFI
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
Mark Wilkinson
 
20141112 courtot big_datasemwebontologies
20141112 courtot big_datasemwebontologies20141112 courtot big_datasemwebontologies
20141112 courtot big_datasemwebontologies
Melanie Courtot
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Anja Jentzsch
 
Role of Semantic Web in Health Informatics
Role of Semantic Web in Health InformaticsRole of Semantic Web in Health Informatics
Role of Semantic Web in Health Informatics
Artificial Intelligence Institute at UofSC
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
Riccardo Albertoni
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
Lucy McKenna
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
Amit Sheth
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
Carole Goble
 

Similar to Knowledge discoverylaurahollink (20)

FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1
 
BibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 PresentationBibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 Presentation
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking  Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
20141112 courtot big_datasemwebontologies
20141112 courtot big_datasemwebontologies20141112 courtot big_datasemwebontologies
20141112 courtot big_datasemwebontologies
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Role of Semantic Web in Health Informatics
Role of Semantic Web in Health InformaticsRole of Semantic Web in Health Informatics
Role of Semantic Web in Health Informatics
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 

Recently uploaded

Fresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptxFresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
SriSurya50
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
christianmathematics
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
ArianaBusciglio
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Delivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and TrainingDelivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and Training
AG2 Design
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
kitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptxkitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptx
datarid22
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 

Recently uploaded (20)

Fresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptxFresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Delivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and TrainingDelivering Micro-Credentials in Technical and Vocational Education and Training
Delivering Micro-Credentials in Technical and Vocational Education and Training
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
kitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptxkitab khulasah nurul yaqin jilid 1 - 2.pptx
kitab khulasah nurul yaqin jilid 1 - 2.pptx
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 

Knowledge discovery (laurahollink)

  • 1. Knowledge Discovery for the Semantic Web: An Application to Web Usage Mining & How to Use Semantics in the Preprocessing Stage. Process: Input Data → Data Preprocessing and Transformation → Data Mining → Interpretation and Evaluation → Information / Taking Action. Data preprocessing and transformation: data fusion (multiple sources), data cleaning (noise, missing values), feature selection, dimensionality reduction, data normalization. Interpretation and evaluation: filtering patterns, visualization, statistical analysis (hypothesis testing, attribute evaluation, comparing learned models, computing confidence intervals). Claudia D’Amato - University of Bari, IT. Laura Hollink - Centrum Wiskunde & Informatica, Amsterdam, NL.
  • 5. An application to Web Usage Mining. Web Usage Mining = discovering patterns in logs of user interaction with Web resources. • Logs typically contain an identifier for users (e.g. IP address), their queries and clicks. • What about usage of Linked Open Data? • Can we use semantics to improve mining of Web usage?
  • 7. Mining Usage of Linked Open Data in USEWOD. USEWOD: http://usewod.org/ [B. Berendt, L. Hollink, M. Luczak-Roesch, et al.] 1. USEWOD workshop series @ ESWC / WWW since 2011. 2. USEWOD dataset: server logs of DBpedia, BioPortal, LinkedGeoData, etc., and client-side logs from YASGUI. (Example removed.)
  • 8. Mining Usage of Linked Open Data in USEWOD. • Results of USEWOD: LOD usage mining for more efficient indexing [1], caching [2], auto-completion [3], etc. • Issues: what is the difference between queries by machines and humans? [4]; what is the meaning of repeated queries by bots/tools?; a lot of the usage is invisible due to data dump downloads [5]. References: [1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. An empirical study of real-world SPARQL queries. USEWOD @ WWW 2011. [2] Lorey, J., & Naumann, F. Caching and prefetching strategies for SPARQL queries. USEWOD @ ESWC 2013. [3] Kramer, K., Dividino, R. Q., & Gröner, G. SPACE: SPARQL Index for Efficient Autocompletion. ISWC (Posters & Demos) 2013. [4] Rietveld, L., & Hoekstra, R. Man vs. Machine: Differences in SPARQL Queries. USEWOD @ ESWC 2014. [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015.
  • 10. Usage mining example 1: clustering rdf:properties in DBpedia. Instead of listing all DBpedia properties alphabetically, can we display them in a more meaningful way? Can we use query logs for this? [5] Reference: [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015. Disclaimer: simplified discussion of this paper!
  • 11. Usage mining example 1: clustering rdf:properties in DBpedia Approach: Hierarchical Clustering of properties, where the distance between a pair of properties is based on how often they co-occur in a SPARQL query in the USEWOD2015 logs. [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015 Disclaimer: simplified discussion of this paper!
  • 12. Usage mining example 1: clustering rdf:properties in DBpedia Approach: Hierarchical Clustering of properties, where the distance between a pair of properties is based on how often they co-occur in a SPARQL query in the USEWOD2015 logs. [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015 Disclaimer: simplified discussion of this paper! Evaluation: run an experiment to measure how quickly and accurately people identify facts when looking at the standard view or the clustered view.
  • 13. Usage mining example 1: clustering rdf:properties in DBpedia Approach: Hierarchical Clustering of properties, where the distance between a pair of properties is based on how often they co-occur in a SPARQL query in the USEWOD2015 logs. [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015 Disclaimer: simplified discussion of this paper! Evaluation: run an experiment to measure how quickly and accurately people identify facts when looking at the standard view or the clustered view. Result: no significant differences ☹
  • 14. Usage mining example 1: clustering rdf:properties in DBpedia. Approach: hierarchical clustering of properties, where the distance between a pair of properties is based on how often they co-occur in a SPARQL query in the USEWOD2015 logs. Evaluation: run an experiment to measure how quickly and accurately people identify facts when looking at the standard view or the clustered view. Result: no significant differences ☹ Reference: [5] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015. Disclaimer: simplified discussion of this paper!
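
The clustering approach in example 1 can be illustrated with a minimal sketch. It is not the method of the cited paper: the slides only say that the distance is based on co-occurrence, so the Jaccard-style distance, the average-linkage merge rule, the function names and the toy queries (already reduced to sets of property names) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_distance(queries):
    """Return (properties, dist), where dist[frozenset((p, q))] is
    1 - Jaccard co-occurrence of p and q (an assumed distance, not the paper's)."""
    occurs = Counter()
    together = Counter()
    for props in queries:
        props = set(props)
        occurs.update(props)
        together.update(frozenset(pair) for pair in combinations(sorted(props), 2))
    properties = sorted(occurs)
    dist = {}
    for p, q in combinations(properties, 2):
        both = together[frozenset((p, q))]
        either = occurs[p] + occurs[q] - both
        dist[frozenset((p, q))] = 1.0 - both / either
    return properties, dist


def average_linkage(properties, dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain."""
    clusters = [[p] for p in properties]

    def linkage(a, b):
        return sum(dist[frozenset((p, q))] for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy stand-in for properties extracted from SPARQL query logs (hypothetical data).
queries = [
    {"dbo:birthDate", "dbo:birthPlace"},
    {"dbo:birthDate", "dbo:birthPlace", "dbo:deathDate"},
    {"dbo:populationTotal", "dbo:areaTotal"},
    {"dbo:populationTotal", "dbo:areaTotal", "dbo:country"},
    {"dbo:birthPlace", "dbo:country"},
]
properties, dist = cooccurrence_distance(queries)
clusters = average_linkage(properties, dist, n_clusters=2)
```

On this toy input the person-related and place-related properties end up in separate clusters. A real pipeline would first have to parse the logged SPARQL queries to extract the properties they mention.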
  • 18. Usage mining example 2: mining semantically enriched query logs. Data: queries and clicks on the Yahoo! search engine. Problem when mining ‘raw’ logs: low support of even the most frequent patterns. Reference: Laura Hollink, Peter Mika and Roi Blanco. Web Usage Mining with Semantic Analysis. WWW 2013.
  • 22. Usage mining example 2: mining semantically enriched query logs Approach: 1. link queries to entities in LOD cloud 2. choose class of entity + selected properties 3. detect modifier words (download, trailer, cast, date, etc.) 1. Link queries to entities in LOD cloud: • Freebase (has a lot of movie related info) • DBpedia (Wikipedia is widely used)
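The three steps of the approach can be sketched as follows. This is a toy illustration only: the gazetteer, the entity URIs and the substring matching are assumptions made for the example, whereas a real system would use a proper entity linker over Freebase or DBpedia.

```python
# Hypothetical toy gazetteer: surface form -> (entity URI, class of the entity).
GAZETTEER = {
    "the hobbit": ("dbr:The_Hobbit_(film_series)", "Film"),
    "ian mckellen": ("dbr:Ian_McKellen", "Actor"),
}
# Modifier words taken from the slide.
MODIFIERS = {"download", "trailer", "cast", "date"}

def annotate(query):
    """Return (entity URI, class, modifiers) for a raw query, or None if no entity matches."""
    text = query.lower().strip()
    # Try longer surface forms first so that longer names win over their substrings.
    for surface in sorted(GAZETTEER, key=len, reverse=True):
        if surface in text:
            uri, cls = GAZETTEER[surface]
            rest = text.replace(surface, " ").split()
            return uri, cls, [word for word in rest if word in MODIFIERS]
    return None

print(annotate("The Hobbit trailer"))
print(annotate("ian mckellen cast"))
```

After this step each query is represented by a class plus modifiers (for instance `Film` + `trailer`) instead of a raw string, which is what raises the support of patterns in the log.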
  • 24. Usage mining example 2: mining semantically enriched query logs • Sequential pattern mining on the class level using PrefixSpan.
  • 25. Usage mining example 2: mining semantically enriched query logs 1. Discover frequent patterns on the class level • Using the efficient PrefixSpan algorithm to mine all possible subsequence patterns
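A compact sketch of PrefixSpan for sequences of single items, applied to sessions in which each query has already been replaced by the class of its linked entity. The sessions and class names are made up for illustration, and the full algorithm also handles itemsets per step, which this sketch leaves out.

```python
def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent subsequence patterns."""
    results = {}

    def mine(prefix, projected):
        # projected holds, per sequence containing the prefix, the suffix after it.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project the database: keep what follows the first occurrence of item.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine((), [tuple(s) for s in sequences])
    return results

# Hypothetical sessions on the class level.
sessions = [
    ["Film", "Actor", "Film"],
    ["Film", "Actor"],
    ["Actor", "Film"],
    ["Film", "Director"],
]
patterns = prefixspan(sessions, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

The support of a pattern is the number of sessions that contain it as a (not necessarily contiguous) subsequence.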
  • 28. Usage mining example 3: semantic patterns of query modification • Goal: Identify frequent query modifications in an image archive • State of the art = 3 classes: generalization, specification, reformulation • Approach: 1. Link queries to entities in the LOD cloud 2. Choose class of entity 3. Determine shortest path between consecutive queries Q1 and Q2 4. Rank property-paths according to support and confidence. Hollink, V., Tsikrika, T., & de Vries, A. P. (2011). Semantic search log analysis: a method and a study on professional image search. JASIST, 62(4), 691-713. See also: Huurnink, B., Hollink, L., Van Den Heuvel, W., & De Rijke, M. (2010). Search behavior of media professionals at an audiovisual archive: A transaction log analysis. JASIST, 61(6), 1180-1197. Conclusions: • Identified patterns not visible on raw data. • But “the method is only moderately successful in identifying the most prominent relations for a given query pair”
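Steps 3 and 4 of the approach can be sketched with a breadth-first search over a small graph. The triples, the query pairs and the `^` notation for inverse properties are assumptions for this example; only support is computed here, since the exact definition of confidence used in the cited paper is not given on the slide.

```python
from collections import Counter, deque

# Hypothetical toy graph of (subject, property, object) triples.
TRIPLES = [
    ("Rembrandt", "bornIn", "Leiden"),
    ("Leiden", "locatedIn", "Netherlands"),
    ("Vermeer", "bornIn", "Delft"),
    ("Delft", "locatedIn", "Netherlands"),
]

def property_path(start, goal, triples):
    """Breadth-first search; returns the property sequence of a shortest path, or None."""
    edges = {}
    for s, p, o in triples:
        edges.setdefault(s, []).append((p, o))
        edges.setdefault(o, []).append(("^" + p, s))  # allow traversal in the inverse direction
    queue, seen = deque([(start, ())]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (prop,)))
    return None

# Hypothetical consecutive query pairs (Q1, Q2), already linked to entities.
pairs = [
    ("Rembrandt", "Leiden"),
    ("Vermeer", "Delft"),
    ("Rembrandt", "Netherlands"),
    ("Leiden", "Rembrandt"),
]
paths = [property_path(q1, q2, TRIPLES) for q1, q2 in pairs]
counts = Counter(p for p in paths if p is not None)
support = {path: n / len(pairs) for path, n in counts.items()}
ranked = sorted(support.items(), key=lambda kv: -kv[1])
print(ranked)
```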
  • 29. The feature selection issue when using LOD • Pipeline: Input Data → Data Preprocessing and Transformation → Data Mining → Interpretation and Evaluation → Information / Taking Action • Preprocessing and transformation: data fusion (multiple sources), data cleaning (noise, missing values), feature selection, dimensionality reduction, data normalization • Interpretation and evaluation: filtering patterns, visualization, statistical analysis (hypothesis testing, attribute evaluation, comparing learned models, computing confidence intervals)
  • 30. Feature Selection • Feature selection = limiting the number of features for faster computation, more understandable models, and better predictive performance. • Using Linked Open Data can lead to a large number of features per data point. • A DBpedia resource easily has 50 property-value pairs. • More are easily added using reasoning. • Note: these numbers are not large compared to the number of features in DNA strings, or all words in a text corpus! • Still, many of them are irrelevant or redundant.
  • 31. Feature Selection Example • Goal: learn a relation R between x and y. • In this paper, R is one of ‘occupation’, ‘gender’, ‘instance_of’, ‘acted_in’, ‘genre’, ‘position_played_on_team’ • Approach: given a training set of pairs (x, y), learn a “whitelist” of properties in DBpedia, WikiData, YAGO and WordNet that indicate a relation R between x and y • Cast as a subset selection problem: • E = the set of possible properties • local search over the power set of E (i.e. all subsets) to find the optimal subset. Learning to Exploit Structured Resources for Lexical Inference. Vered Shwartz, Omer Levy, Ido Dagan and Jacob Goldberger. CoNLL 2015 (to appear)
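A sketch of subset selection by local search. The training examples, the property names, the F1 objective and the toggle-one-property neighbourhood are assumptions chosen for illustration; the search procedure and objective in the cited paper may differ.

```python
def f1(whitelist, examples):
    """F1 of the rule: predict R(x, y) if any whitelisted property links x and y."""
    tp = fp = fn = 0
    for props, label in examples:
        predicted = bool(props & whitelist)
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def local_search(candidates, examples):
    """Hill climbing over subsets of E: toggle one property at a time, keep improvements."""
    current, best = frozenset(), 0.0
    improved = True
    while improved:
        improved = False
        for prop in sorted(candidates):
            neighbour = current ^ {prop}  # add the property if absent, remove it if present
            score = f1(neighbour, examples)
            if score > best:
                current, best, improved = neighbour, score, True
    return current, best

# Hypothetical training set: (properties linking x and y, does R hold?).
examples = [
    ({"occupation"}, True),
    ({"occupation", "sameCountry"}, True),
    ({"profession"}, True),
    ({"sameCountry"}, False),
    ({"sameCountry", "knows"}, False),
]
E = {"occupation", "profession", "sameCountry", "knows"}
whitelist, score = local_search(E, examples)
print(sorted(whitelist), score)
```

Local search only guarantees a local optimum, which is the price paid for not enumerating all 2^|E| subsets.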
  • 32. Data Fusion • Pipeline: Input Data → Data Preprocessing and Transformation → Data Mining → Interpretation and Evaluation → Information / Taking Action • Preprocessing and transformation: data fusion (multiple sources), data cleaning (noise, missing values), feature selection, dimensionality reduction, data normalization • Interpretation and evaluation: filtering patterns, visualization, statistical analysis (hypothesis testing, attribute evaluation, comparing learned models, computing confidence intervals)
  • 33. Data Fusion / Ontology Alignment / Mapping / Matching / Linking / Integration • Ontology / schema / T-box level • Instance / data / A-box level
  • 37. Data Fusion: entity detection / entity linking
  • 38. Methods for Data Fusion (ontology alignment)
  • 40. Methods for Data Fusion: structural matchers • E.g. Similarity Flooding: the similarity of a matched pair s1 and t1 propagates to their respective neighbors s2 and t2. • Neighbors can be defined as subclasses, superclasses, instances, domain/ranges, etc. • Structural measures are in practice never used stand-alone. [10] Ngo, Duy Hoa, and Zohra Bellahsene. YAM++-results for OAEI 2012. OAEI @ ISWC 2012. [11] Sergey Melnik, Hector Garcia-Molina, and Erhard Rahm. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. ICDE 2002.
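A heavily simplified sketch of the propagation idea behind Similarity Flooding. The two toy ontologies, the seed similarity, the equal split of similarity over neighbours and the update rule are assumptions for illustration; this is not the exact algorithm of [11], which defines its own propagation coefficients and fixpoint formulas.

```python
def similarity_flooding(edges_a, edges_b, initial, iterations=10):
    """edges_*: lists of (source, label, target); initial: {(a, b): seed similarity}."""
    # Pairwise connectivity: (s1, t1) and (s2, t2) are neighbours when
    # s1 -label-> s2 holds in A and t1 -label-> t2 holds in B for the same label.
    neighbours = {}
    for s1, label_a, s2 in edges_a:
        for t1, label_b, t2 in edges_b:
            if label_a == label_b:
                neighbours.setdefault((s1, t1), []).append((s2, t2))
                neighbours.setdefault((s2, t2), []).append((s1, t1))
    pairs = set(neighbours) | set(initial)
    sim = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        updated = {}
        for p in pairs:
            # Each neighbour q splits its similarity equally over its own neighbours.
            incoming = sum(sim[q] / len(neighbours[q]) for q in neighbours.get(p, []))
            updated[p] = initial.get(p, 0.0) + sim[p] + incoming
        top = max(updated.values(), default=0.0) or 1.0
        sim = {p: value / top for p, value in updated.items()}  # normalise to [0, 1]
    return sim

# Hypothetical ontologies with one subclass edge each.
edges_a = [("Painter", "subClassOf", "Artist")]
edges_b = [("Schilder", "subClassOf", "Creator")]
seed = {("Artist", "Creator"): 1.0}  # e.g. obtained from a string or dictionary matcher
sim = similarity_flooding(edges_a, edges_b, seed)
print(sim)
```

The pair (Painter, Schilder) starts with similarity 0 and gains similarity only because its superclasses were matched, which is the structural effect the slide describes.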
  • 42. Methods for Data Fusion: instance based matchers • Match classes based on similarity of their instances • Note: you need a way to assess similarity of the instances!
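A minimal sketch of an instance-based matcher. It assumes the hard part mentioned on the slide is already solved, i.e. that instances in both sources share identifiers, and it uses Jaccard overlap as one possible similarity; class names and instance sets are made up.

```python
def jaccard(a, b):
    """Overlap of two instance sets, between 0 and 1."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_classes(instances_a, instances_b, threshold=0.5):
    """Return (class_a, class_b, score) for class pairs whose instance overlap reaches threshold."""
    matches = []
    for class_a, inst_a in instances_a.items():
        for class_b, inst_b in instances_b.items():
            score = jaccard(inst_a, inst_b)
            if score >= threshold:
                matches.append((class_a, class_b, score))
    return sorted(matches, key=lambda m: -m[2])

# Hypothetical class extensions in two vocabularies.
source = {
    "ex1:Painter": {"Rembrandt", "Vermeer", "Van Gogh"},
    "ex1:City": {"Delft", "Leiden"},
}
target = {
    "ex2:painter": {"Rembrandt", "Van Gogh", "Monet"},
    "ex2:town": {"Delft", "Leiden"},
}
matches = match_classes(source, target)
print(matches)
```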
  • 46. Methods for Data Fusion: string based • This is the most important feature in ontology alignment. • “nearly all [ontology alignment systems] use a string similarity metric” [12] • Stop-word removal and stemming are not helpful! Nor is using WordNet synonyms. [12] • In [13] we took an even less semantic approach: linking based on URL syntax, e.g. http://www.dbpedia.org/page/Judith_Sargentini [12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013. [13] The debates of the European Parliament as Linked Open Data. Under review. See http://www.talkofeurope.eu/data/ for details.
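A small sketch of a string-based matcher. The normalisation step and the use of `difflib.SequenceMatcher.ratio` from the Python standard library are stand-ins chosen for this example; [12] compares many metrics and this is not a claim about which one works best.

```python
import re
from difflib import SequenceMatcher

def normalise(label):
    """Lower-case a label and drop separators such as '_', '-' and spaces."""
    return re.sub(r"[^a-z0-9]", "", label.lower())

def label_similarity(a, b):
    """Similarity between 0 and 1 based on matching character blocks."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(label, candidates):
    """Return the candidate label that is most similar to the given label."""
    return max(candidates, key=lambda c: label_similarity(label, c))

# Property labels in the style of the conference ontologies used in alignment benchmarks.
candidates = ["hasLastName", "hasFirstName", "hasEmail"]
print(best_match("has_the_last_name", candidates))
```

Note that no stemming, stop-word removal or synonym lookup is applied here, in line with the findings reported on the slide.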
  • 47. Link types • Equality: SameAs, EquivalentClasses, EquivalentProperties. Examples: “Den Haag” = “The Hague”; wood-material = wood • Hierarchical: rdfs:subClassOf, rdf:type, rdfs:subPropertyOf. Examples: aat:Artist ⊇ wn:Artist; tgn:Africa ∈ wn:Continent; conf:has_the_last_name = edas:hasLastName • Weaker semantics: skos:closeMatch / exactMatch / broadMatch / narrowMatch / relatedMatch. Example: geonames:Italy skos:closeMatch librarytopics:Italy • Domain-specific links: e.g. born-in, hasStyle, hasPart. Example: Van Gogh (ULAN) born-in Groot-Zundert (TGN)
  • 52. Representation of links • Plain link: architecten skos:exactMatch architects • Link as a resource with metadata (Link 001): concept1 = architecten, concept2 = architects, link type = skos:exactMatch, link method = manual, author = L. Hollink • Open Question: how valid are the patterns we discover in data when the quality of the links is low? • Even more important to be critical and evaluate the data • source criticism • tool criticism (see http://event.cwi.nl/toolcriticism/)
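The two representations on this slide (a plain triple versus a link that is itself a resource carrying provenance) can be sketched as a small data structure. The field and property names follow the slide's columns but are otherwise hypothetical; no particular vocabulary for link metadata is implied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    link_id: str
    concept1: str
    concept2: str
    link_type: str
    method: str   # how the link was created, e.g. "manual" or the name of a tool
    author: str

    def as_triple(self):
        """Plain representation: provenance is lost."""
        return (self.concept1, self.link_type, self.concept2)

    def as_reified(self):
        """Link as a resource: one statement per metadata field."""
        return [
            (self.link_id, "concept1", self.concept1),
            (self.link_id, "concept2", self.concept2),
            (self.link_id, "linkType", self.link_type),
            (self.link_id, "linkMethod", self.method),
            (self.link_id, "author", self.author),
        ]

link = Link("Link001", "architecten", "architects", "skos:exactMatch", "manual", "L. Hollink")
print(link.as_triple())
for statement in link.as_reified():
    print(statement)
```

Keeping method and author with each link is what makes source and tool criticism possible later on, for example by filtering out links produced by a matcher that is known to be unreliable.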
  • 56. Evaluation of Data Fusion / Linking 1. Manually rating (a sample of) mappings • relatively cheap and easy to interpret • only precision, no recall 2. Comparison to a reference alignment • precision and recall • used in OAEI on the SEALS platform • more expensive if a reference alignment has to be created (but: crowd sourcing!) 3. End-to-end evaluation (a.k.a. evaluating an application that uses the mappings) • arguably the best method! • need to have access to an application + users
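Method 2 (comparison to a reference alignment) amounts to the following computation. The mappings are invented for the example; each mapping is represented as a (source concept, target concept) pair.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of found mappings against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical alignment produced by a matcher (4 mappings, 3 of them correct).
found = {("a:Artist", "b:Artist"), ("a:Painter", "b:Painter"),
         ("a:City", "b:Town"), ("a:Wood", "b:Forest")}
# Hypothetical reference alignment (6 mappings).
reference = {("a:Artist", "b:Artist"), ("a:Painter", "b:Painter"),
             ("a:City", "b:Town"), ("a:Wood", "b:WoodMaterial"),
             ("a:Sculptor", "b:Sculptor"), ("a:Museum", "b:Museum")}

precision, recall = precision_recall(found, reference)
print(precision, recall)
```

Method 1 (manually rating a sample) yields only the precision part, because without a reference alignment the missed mappings are unknown.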
  • 61. Evaluation of Data Fusion / Linking • Comparison to a reference alignment: alternative measures: 1. Instead of a binary classification into correct/incorrect mappings, take into account how wrong a link is: • where r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment, A is the number of correspondences. 2. Weight the score of mappings based on the frequency of their use • e.g. from usage logs! Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
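One possible reading of the second alternative measure, sketched as a usage-weighted precision. The weighting scheme, the mappings and the usage counts are assumptions for illustration; the exact definition in the cited paper is not reproduced on the slide.

```python
def weighted_precision(found, reference, usage):
    """Precision in which each found mapping counts as often as it is used."""
    total = sum(usage.get(m, 0) for m in found)
    if total == 0:
        return 0.0
    correct = sum(usage.get(m, 0) for m in found if m in reference)
    return correct / total

# Hypothetical found mappings, reference alignment and usage counts from a log.
found = ["m1", "m2", "m3"]
reference = {"m1", "m3"}
usage = {"m1": 90, "m2": 5, "m3": 5}

unweighted = len([m for m in found if m in reference]) / len(found)
weighted = weighted_precision(found, reference, usage)
print(unweighted, weighted)
```

In this toy case the single wrong mapping is rarely used, so the weighted score is much higher than the unweighted one; a frequently used wrong mapping would have the opposite effect.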
  • 65. Discovering links from text Pointers to what happens in other communities • Word2Vec: efficient deep learning algorithm to learn vector representations of words • vector similarity captures semantics between words • No explicit semantics, but we can’t deny that there is meaning there! • Success seems to be mostly due to big data Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013. Example: Vec(Madrid) - Vec(Spain) + Vec(France) is closer to Vec(Paris) than to any other vector
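The analogy example can be reproduced mechanically with vector arithmetic and cosine similarity. The four-dimensional vectors below are hand-made toy values, not trained Word2Vec embeddings; they only illustrate how the nearest-vector lookup works.

```python
import math

# Toy vectors (made up): dimensions loosely mean capital, Spain, France, Germany.
vectors = {
    "Madrid":  (1.0, 1.0, 0.0, 0.0),
    "Spain":   (0.0, 1.0, 0.0, 0.0),
    "Paris":   (1.0, 0.0, 1.0, 0.0),
    "France":  (0.0, 0.0, 1.0, 0.0),
    "Berlin":  (1.0, 0.0, 0.0, 1.0),
    "Germany": (0.0, 0.0, 0.0, 1.0),
}

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(target, vectors, exclude=()):
    """Word whose vector is closest to target, ignoring the words in exclude."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

# Vec(Madrid) - Vec(Spain) + Vec(France)
target = tuple(m - s + f for m, s, f in
               zip(vectors["Madrid"], vectors["Spain"], vectors["France"]))
answer = nearest(target, vectors, exclude=("Madrid", "Spain", "France"))
print(answer)
```

Excluding the three input words from the candidates is common practice in analogy tasks, because one of them is otherwise often the nearest vector.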
  • 67. NELL: Never-Ending Language Learning • Several machine learning approaches to discover facts (beliefs) from text on the web • String features, distribution of context words, HTML structure, visual image analysis. • Running since 2010, has so far learned over 80 million beliefs. T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling. Never-Ending Learning. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2015.
  • 70. Research Task Format Work in 6 groups of 10 students • 5 people design an approach to association rules with semantics • 5 people focus on how that approach should be evaluated The idea is to work together! E.g. which measures are best for this approach? Which versions of the approach should be evaluated? Will this approach score high on these measures? In which cases? • We would like one presentation per group of 10 people • of 3 or 4 slides • of max 4 minutes (less is fine too!) • Send me the slides in PDF, with your group number in the title, by email to l.hollink@cwi.nl, today before 16:30. • The presentation should show clearly: 1. the AR method 2. how did you take into account semantics? 3. the evaluation method • BONUS: argue when and why your approach will score high. • BONUS: discuss how the newly learned links can be represented and used. Tips: • you may pick a dataset that you will use as an example