Improving access to digital collections by
semantic enrichment
Theo van Veen and Juliette Lonij, Semantics 2017, 12-09-2017
Overview
• Motivation and approach
• Entity linking
• Presentation
• Semantic search
• User feedback
• Wikidata as thesaurus
• Conclusions and next steps
T. v. Veen and J. Lonij, Semantics 2017
Motivation and approach
Motivation
• Improving discovery and usability requires intelligently connecting content to the outside world.
• Content contains “knowledge” that requires intelligent preprocessing before it can be found.
• Knowledge should be offered to the user more or less unsolicited.
• Our software should have read, analyzed and enriched our content completely before the user arrives.
Enrichment: purpose and approach
• Making content, especially newspaper articles, easier to find and use
• by enriching text and names in the text with links to related information,
• which in most cases is linked data (links to Wikidata, Polygoon newsreels, images)
• and which enables advanced queries and the presentation of context information
How?
• “Things” in text have to be uniquely identified
• When identifiers link to resource descriptions it is
possible to present context information about
“things”
• Relevant context information can be indexed as part of
a “thing” and so it can be searched for
• Using properties in external resource descriptions enables semantic search
1. Identification
2. Context
3. Indexing
4. Semantic search
Enrichment types
• Newspaper articles and radio bulletins linked to Polygoon newsreels
• Named entities linked to DBpedia (and Wikidata, VIAF etc.)
• Place-street combinations in newspaper articles linked to latitude
and longitude
• Newspaper articles linked to images from the Memory of the
Netherlands
Enrichment categories:
• Named entities: DBpedia, Wikidata, VIAF, Geonames, etc.
• Geodata: street, place, lat., long.; place, lat., long.
• Links: web pages, video, images, sound
• Extracted features: classification, sentiment, relevance, interestingness
• User annotation: tags, stories (oral history)
• Image enrichment: face recognition, emotion detection, object detection, structure detection
Now available
We need:
• Wikidata: https://www.wikidata.org/wiki/Q44306
• VIAF: http://viaf.org/viaf/29540187/
• ISNI: http://www.isni.org/isni/0000000122778487
• KB thesaurus: http://data.kb.nl/thesaurus/068941056
• For bibliographic data, trusted links are available: from thesaurus to VIAF, from VIAF to ISNI, and from ISNI to Wikidata
• For persons, events and locations in text, the links have to be created
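The chain of trusted links (thesaurus to VIAF, VIAF to ISNI, ISNI to Wikidata) can be followed mechanically. The sketch below is only an illustration: it assumes the four identifiers shown on the slide describe the same resource, and the `LINKS` table and the `follow_chain` helper are hypothetical names, not part of any KB system.

```python
# Hypothetical link table; the four URIs are the ones shown on the slide,
# and it is an assumption that they all describe the same resource.
LINKS = {
    "http://data.kb.nl/thesaurus/068941056": "http://viaf.org/viaf/29540187/",
    "http://viaf.org/viaf/29540187/": "http://www.isni.org/isni/0000000122778487",
    "http://www.isni.org/isni/0000000122778487": "https://www.wikidata.org/wiki/Q44306",
}


def follow_chain(start, links):
    """Follow links from one identifier to the next until no link is left."""
    chain = [start]
    seen = {start}
    while chain[-1] in links:
        nxt = links[chain[-1]]
        if nxt in seen:  # guard against cycles in the link table
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


for uri in follow_chain("http://data.kb.nl/thesaurus/068941056", LINKS):
    print(uri)
```

For names found in running text no such chain exists, which is why the links have to be created by entity linking.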
How do we deal with names in text?
• Recognize names (named entity recognition)
• Identify names by searching for them in DBpedia and linking them to the DBpedia descriptions
• Names are ambiguous: does Einstein link to Albert Einstein or Alfred Einstein? We need disambiguation algorithms.
• Machine learning techniques improve the accuracy of the links; conventional “if then else” software isn’t fit for this job
• We need user feedback to correct false links or add missing links, and this feedback will be used for additional training
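The Einstein ambiguity can be illustrated with a deliberately simple sketch. It is not the machine-learning model described in this talk: it only scores each candidate by the number of words its description shares with the article text. The candidate records, their descriptions and the function names are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_candidate(mention_context, candidates):
    """Return the id of the candidate whose description shares most words
    with the context, or None when no candidate shares any word.

    candidates: list of dicts with 'id' and 'description' (hypothetical shape).
    """
    context = tokens(mention_context)
    scored = [(len(context & tokens(c["description"])), c["id"]) for c in candidates]
    score, best_id = max(scored)  # ties are broken by id, which is arbitrary
    return best_id if score > 0 else None


# Hypothetical candidate list for the ambiguous name "Einstein".
CANDIDATES = [
    {"id": "Albert_Einstein",
     "description": "physicist who developed the theory of relativity"},
    {"id": "Alfred_Einstein",
     "description": "musicologist and music editor known for work on Mozart"},
]

context = "Einstein published a new study of the Mozart operas and their music"
print(best_candidate(context, CANDIDATES))
```

A word-overlap score like this counts stop words and ignores word order, which is one reason a trained model with richer features is the better fit for the job.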
Entity linking
Named Entity Linking
[Diagram: an article passes through named entity recognition (get entities); each entity is searched in the DBpedia Solr index, which returns candidates (for example a list of Einsteins); the enrichment and training process finds the best candidate and stores the article id plus resource ids (DBpedia, VIAF, Wikidata, etc.) in the enrichment database.]
Enrichment infrastructure using SURFsara HPC cloud
[Diagram with two panels, KB research environment and SURFsara HPC environment. Components shown: articles, enrichment process (named entity recognition, search, disambiguation), DBpedia index, enrichment database.]
Continuous improvement of enrichment algorithm
[Chart: quality/confidence (%), axis marked 70, 80 and 90, plotted against article number / time, from article 1 to 108 million.]
Stages marked on the chart:
• All DBpedia titles searched in news articles
• Named entities searched in DBpedia
• Speedup by using the SURFsara HPC cloud
• Using context and machine learning
At the end, cycle back to the first article and overwrite earlier enrichments with the newest algorithm.
From conventional entity linking to deep learning and beyond

Algorithm                                 Accuracy  Link recall  Link precision  Link F-measure
Rule based                                .76       .76          .65             .70
Machine learning (SVM)                    .84       .76          .83             .79
Neural network                            .84       .73          .87             .79
Extra features, e.g. word embedding       .85       .81          .82             .82
Extra Wikidata data, more training data   .87       .81          .86             .84
Entity embedding                          .88       .86          .85             .85
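The link F-measure values can be checked against link recall and link precision. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the slides do not state which variant was used, and the function and variable names are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (link recall, link precision, reported link F-measure) per algorithm,
# copied from the results table.
ROWS = {
    "Rule based": (0.76, 0.65, 0.70),
    "Machine learning (SVM)": (0.76, 0.83, 0.79),
    "Neural network": (0.73, 0.87, 0.79),
    "Extra features": (0.81, 0.82, 0.82),
    "Extra Wikidata data": (0.81, 0.86, 0.84),
    "Entity embedding": (0.86, 0.85, 0.85),
}

for name, (recall, precision, reported) in ROWS.items():
    computed = f_measure(precision, recall)
    print(f"{name}: computed {computed:.3f}, reported {reported:.2f}")
```

Under that assumption the computed values agree with the reported ones to within about 0.01, which is what rounding of the two-decimal inputs would explain.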
Development cycle
Justification: our aim is to obtain a higher quality than existing entity linking software (e.g. DBpedia Spotlight).
[Diagram: trust/quality of the stored LNEs, the running algorithm, the algorithm in development and links enriched by users, set against a target trust level; arrows labelled train, replace and improve.]
Example of a comparison of the stored LNEs, the result of the current algorithm, the result of the algorithm under development, and existing software.
Presentation
Research portal
Identified names (LNE)
and other enrichments
Configurable services
Context information
and extra navigation
options for a name
Semantic search
Semantic search: index resource identifiers
[Diagram: for each article X, the indexing process gets the text and gets the enrichments from the enrichment database, then writes text + VIAF id + Wikidata id etc. to the newspaper index. A Wikidata semantic search (SPARQL) provides Wikidata ids, which are then used to search for articles with those Wikidata ids.]
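The indexing idea can be sketched in a few lines. Here a plain Python dictionary stands in for the newspaper index, and the article ids, texts and resource identifiers are placeholders, not real data; `build_documents` and `search_by_ids` are hypothetical helper names.

```python
def build_documents(texts, enrichments):
    """Combine each article's text with its resource ids into one index document."""
    return {
        article_id: {
            "text": text,
            "resource_ids": set(enrichments.get(article_id, [])),
        }
        for article_id, text in texts.items()
    }


def search_by_ids(documents, wanted_ids):
    """Return ids of articles linked to at least one of the wanted identifiers."""
    wanted = set(wanted_ids)
    return sorted(
        article_id
        for article_id, doc in documents.items()
        if doc["resource_ids"] & wanted
    )


# Placeholder data: none of these identifiers are real.
TEXTS = {
    "article-1": "text of article 1",
    "article-2": "text of article 2",
    "article-3": "text of article 3",
}
ENRICHMENTS = {
    "article-1": ["wikidata:EX-1", "viaf:EX-9"],
    "article-2": ["wikidata:EX-2"],
    "article-3": ["wikidata:EX-1", "wikidata:EX-3"],
}

documents = build_documents(TEXTS, ENRICHMENTS)
# The list of ids would come from a Wikidata SPARQL query in the real flow.
print(search_by_ids(documents, ["wikidata:EX-1"]))
```

Because the identifiers sit in the index next to the text, a semantic query only has to produce a list of identifiers; the article search itself remains an ordinary index lookup.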
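Articles mentioning members of parliament not born in the Netherlands

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?p WHERE {
  ?p wdt:P39 wd:Q18887908 .     # position held: member of parliament
  ?p wdt:P19 ?place .           # place of birth
  ?place wdt:P17 ?country .     # the birthplace has a country
  FILTER NOT EXISTS {
    ?place wdt:P17 wd:Q55 .     # exclude birthplaces in the Netherlands
  }
}

For the same query in the catalogue, the Wikidata identifier is converted to the local thesaurus identifier.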
Navigation example
Using square brackets, the software tries a few Wikidata SPARQL queries and replaces the bracketed string with the Wikidata results.
• Put a semantic query between [ ]; in this case it expands to all Roman emperors
• Select the “newspaper+” collection
• Select a result
• Click on a linked named entity for more information
• Click on “More info” for the properties of this entity
• Click on a property to search for more articles about resources with that property
• And see the result: all articles mentioning persons who have been married to Elizabeth Taylor
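The square-bracket expansion can be sketched as a string rewrite. This is a minimal illustration, not the portal's implementation: the lookup is a stub standing in for the Wikidata SPARQL queries, the identifiers are placeholders, and the OR-list output syntax is an assumption.

```python
import re


def expand_brackets(query, lookup):
    """Replace each [phrase] in the query with an OR-list of identifiers.

    lookup: callable mapping a phrase to a list of identifier strings. In the
    portal this role is played by Wikidata SPARQL queries; here it is a stub.
    """
    def replace(match):
        phrase = match.group(1).strip()
        ids = lookup(phrase)
        if not ids:
            return phrase  # fall back to the plain phrase when nothing is found
        return "(" + " OR ".join(ids) + ")"

    return re.sub(r"\[([^\[\]]+)\]", replace, query)


# Stub lookup with placeholder identifiers (not real Wikidata ids).
STUB = {"roman emperors": ["wd:EX-AUGUSTUS", "wd:EX-NERO"]}


def stub_lookup(phrase):
    return STUB.get(phrase.lower(), [])


print(expand_brackets("[Roman emperors] AND Rome", stub_lookup))
```

Keeping the lookup as a parameter means the same rewrite works whether the identifiers come from a live SPARQL endpoint or from a cached list.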
Property used in the final step: spouse=Elizabeth Taylor
User feedback
User feedback is needed
for correcting false links
and adding new links!
This feedback serves as
additional training data
for the disambiguation
software
Wikidata as thesaurus
Wikidata as central hub?
[Diagrams: “Everything links to everything” versus “Everything links to Wikidata”]
• Current situation: many-to-many links (many identifiers for a single resource)
• Proposed: everything links to Wikidata (the same identifier for a single resource)
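The difference between the two situations can be put in numbers. The sketch below is a back-of-the-envelope count, assuming each pair of identifier systems needs one link in the many-to-many case and each system needs one link to the hub in the proposed case; the function names are illustrative.

```python
def pairwise_links(systems):
    """Links needed when every identifier system links to every other one."""
    return systems * (systems - 1) // 2


def hub_links(systems):
    """Links needed when every other system links only to the hub."""
    return systems - 1


# Number of identifier systems, counting the hub itself.
for n in (5, 10, 20):
    print(f"{n} systems: {pairwise_links(n)} pairwise links, {hub_links(n)} hub links")
```

The pairwise count grows quadratically with the number of systems and the hub count only linearly, which is the practical argument for a central hub.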
Wikidata as universal thesaurus for libraries
Conclusions and next steps
• Entity linking combining machine learning and domain knowledge is
promising and we still have ideas for improvements
• We have shown the added value of linking named entities to
Wikidata/DBpedia: it improves findability and usability of content as
demonstrated with the research portal
• Our aim is to increase the confidence of links so users can trust them
“enough” for using them for searching and research
• User feedback provides additional training data and needs to be
deployed on a larger scale
Any questions?
theo.vanveen@kb.nl
juliette.lonij@kb.nl
http://www.kbresearch.nl/xportal

Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
 
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
 
Session 0.0 poster minutes madness
Session 0.0   poster minutes madnessSession 0.0   poster minutes madness
Session 0.0 poster minutes madness
 

Session 1.2 improving access to digital content by semantic enrichment

  • 1. Improving access to digital collections by semantic enrichment Theo van Veen and Juliette Lonij, Semantics 2017, 12-09-2017
  • 2. Overview • Motivation and approach • Entity linking • Presentation • Semantic search • User feedback • Wikidata as thesaurus • Conclusions and next steps
  • 3. Motivation and approach
  • 4. Motivation • Improving discovery and usability requires intelligently connecting content to the outside world. • Content contains “knowledge” that can only be found after intelligent preprocessing. • Knowledge should be offered to the user more or less unsolicited. • Our software should have read, analyzed and enriched our content completely before the user gets to it.
  • 5. Enrichment: purpose and approach • making content easier to find and use, especially newspaper articles • by enriching text and names in the text with links to related information • which is in most cases linked data (links to Wikidata, Polygoon newsreels, images) • and which enables advanced queries and presentation of context information
  • 6. How? • 1. Identification: “things” in text have to be uniquely identified • 2. Context: when identifiers link to resource descriptions, it is possible to present context information about “things” • 3. Indexing: relevant context information can be indexed as part of a “thing” so that it can be searched for • 4. Semantic search: using properties in external resource descriptions enables semantic search
  • 7. Enrichment types • Newspaper articles and radio bulletins linked to Polygoon newsreels • Named entities linked to DBpedia (and Wikidata, VIAF etc.) • Place-street combinations in newspaper articles linked to latitude and longitude • Newspaper articles linked to images from the Memory of the Netherlands • Overview table, by column: • Named entities: DBpedia, Wikidata, VIAF, Geonames, etc. • Geodata: street, place, lat., long.; place, lat., long. • Links: web pages, video, images, sound • Extracted features: classification, sentiment, relevance, interestingness • User annotation: tags, stories (oral history) • Image enrichment: face recognition, emotion detection, object detection, structure detection • Legend: now available
  • 8. We need: • Wikidata: https://www.wikidata.org/wiki/Q44306 • VIAF: http://viaf.org/viaf/29540187/ • ISNI: http://www.isni.org/isni/0000000122778487 • KB thesaurus: http://data.kb.nl/thesaurus/068941056 • For bibliographic data trusted links are available: from thesaurus to VIAF, from VIAF to ISNI and from ISNI to Wikidata • For persons, events and locations in text the links have to be created
  • 9. How do we deal with names in text? • Recognize names (named entity recognition) • Identify names by searching for them in DBpedia and link the names to the DBpedia descriptions • Names are ambiguous: does Einstein link to Albert Einstein or Alfred Einstein? We need disambiguation algorithms. • Machine learning techniques improve the accuracy of the links; conventional “if then else” software isn’t fit for this job • We need user feedback to correct false links and add missing links; this feedback is used for additional training
  • 10. Entity linking
  • 11. Named Entity Linking (diagram: enrichment and training process) • Get entities from the article (named entity recognition) • Search each entity in the DBpedia Solr index, built from DBpedia • This gives a list of candidates, e.g. a list with Einsteins • Find the best candidate • Store article id + resource ids (DBpedia, VIAF, Wikidata, etc.) in the enrichment database
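The linking flow on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the talk: a hard-coded dictionary stands in for the DBpedia Solr index, plain word overlap with the article text stands in for the trained disambiguation model, and all names (`CANDIDATE_INDEX`, `search_candidates`, `link_entity`, `enrich`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    resource_id: str   # illustrative resource name, not fetched from DBpedia
    label: str
    description: str


# Toy stand-in for the DBpedia Solr index: lower-cased name -> candidates.
CANDIDATE_INDEX = {
    "einstein": [
        Candidate("Albert_Einstein", "Albert Einstein",
                  "physicist who developed the theory of relativity"),
        Candidate("Alfred_Einstein", "Alfred Einstein",
                  "musicologist and music editor"),
    ],
}


def search_candidates(name: str) -> list[Candidate]:
    """Return all candidates for a recognized name (the 'list with Einsteins')."""
    return CANDIDATE_INDEX.get(name.lower(), [])


def score(candidate: Candidate, context: str) -> int:
    """Assumed scoring: number of words shared by the context and the description."""
    context_words = set(context.lower().split())
    description_words = set(candidate.description.lower().split())
    return len(context_words & description_words)


def link_entity(name: str, context: str) -> Optional[Candidate]:
    """Pick the best candidate for a name, or None if the index has no match."""
    candidates = search_candidates(name)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, context))


def enrich(article_id: str, names: list[str], text: str) -> dict:
    """Build the record to store: article id plus the resource id chosen per name."""
    links = {}
    for name in names:
        best = link_entity(name, text)
        if best is not None:
            links[name] = best.resource_id
    return {"article_id": article_id, "links": links}
```

Named entity recognition is assumed to have happened already, so `enrich` takes the recognized names as input. In a real pipeline the overlap score would be replaced by the learned model the slides discuss.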
  • 12. Enrichment infrastructure using SURFsara HPC cloud (diagram) • Components shown: articles, enrichment database, KB research environment, SURFsara HPC environment, DBpedia index, enrichment process (named entity recognition, search, disambiguation)
  • 13. Continuous improvement of enrichment algorithm (chart: quality/confidence (%) against article number / time) • All DBpedia titles searched in news articles • Named entities searched in DBpedia • Speedup by using HPC cloud SURFsara • Using context and machine learning • At the end, cycle back to the first article and overwrite earlier enrichments with the newest algorithm
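The "cycle back and overwrite" step could look something like the sketch below. It assumes each stored enrichment carries the version number of the algorithm that produced it; the store layout, the `reenrich` function and the `enrich` callback are hypothetical and only illustrate the idea.

```python
from typing import Callable


def reenrich(article_ids: list[str],
             store: dict,
             algorithm_version: int,
             enrich: Callable[[str], list[str]]) -> list[str]:
    """Overwrite enrichments that were made by an older algorithm version.

    `store` maps article id -> {"version": int, "links": list}. Articles that
    are missing or were enriched by an older version are processed again.
    Returns the ids of the articles that were (re)enriched.
    """
    updated = []
    for article_id in article_ids:
        stored = store.get(article_id)
        if stored is None or stored["version"] < algorithm_version:
            store[article_id] = {
                "version": algorithm_version,
                "links": enrich(article_id),
            }
            updated.append(article_id)
    return updated
```

Keeping the version next to each enrichment means a pass over the collection can be interrupted and resumed without redoing articles that already have the newest result.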
  • 14. From conventional entity linking to deep learning and beyond
      algorithm                                 accuracy   link recall   link precision   link F-measure
      Rule based                                .76        .76           .65              .70
      Machine learning (SVM)                    .84        .76           .83              .79
      Neural network                            .84        .73           .87              .79
      Extra features e.g. word embedding        .85        .81           .82              .82
      Extra Wikidata data, more training data   .87        .81           .86              .84
      Entity embedding                          .88        .86           .85              .85
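The F-measure column can be checked against the recall and precision columns, assuming it is the usual balanced F1 (the harmonic mean of precision and recall); the slide does not state the formula, so that is an assumption. The dictionary below only restates the numbers from the table.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (link recall, link precision, reported link F-measure), copied from the table.
REPORTED = {
    "Rule based": (0.76, 0.65, 0.70),
    "Machine learning (SVM)": (0.76, 0.83, 0.79),
    "Neural network": (0.73, 0.87, 0.79),
    "Extra features e.g. word embedding": (0.81, 0.82, 0.82),
    "Extra Wikidata data, more training data": (0.81, 0.86, 0.84),
    "Entity embedding": (0.86, 0.85, 0.85),
}


def max_deviation() -> float:
    """Largest gap between the recomputed and the reported F-measure."""
    return max(
        abs(f_measure(precision, recall) - reported)
        for recall, precision, reported in REPORTED.values()
    )
```

Recomputed from the rounded recall and precision values, every row agrees with the reported F-measure to within about 0.01, which is consistent with the slide having rounded unrounded underlying scores.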
  • 15. Development cycle • Justification: our aim is to obtain a higher quality than existing entity linking software (e.g. DBpedia Spotlight) • Diagram (trust/quality against a target trust level): stored LNEs, running algorithm, algorithm in development, enriched by users; steps: train, replace, improve • Example of comparison of stored LNEs, result of current algorithm, result of algorithm under development and existing software.
  • 16. Presentation
  • 18. Name and/or date • Theo van Veen, 16-6-2016
  • 19.
  • 20. Research portal • Identified names (LNE) and other enrichments • Configurable services • Context information and extra navigation options for a name
  • 21. Semantic search
  • 22. Semantic search: index resource identifiers (diagram) • Indexing: get text for article X, get enrichments for article X from the enrichment database, and store text + VIAF id + Wikidata id etc. in the newspaper index • Semantic search: a SPARQL query on Wikidata provides Wikidata ids, which are then used to search for articles with those Wikidata ids
  • 23. Articles mentioning members of parliament not born in the Netherlands
      SELECT ?p WHERE {
        ?p wdt:P39 wd:Q18887908 .
        ?p wdt:P19 ?place .
        ?place wdt:P17 ?country .
        FILTER NOT EXISTS { ?place wdt:P17 wd:Q55 . }
      }
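A sketch of how the ids returned by such a query might be turned into a search on the article index. It assumes the endpoint returns the standard SPARQL JSON results format; the index field name `wikidata_id` and the helper names are hypothetical, and the sample ids in the usage note are made up for illustration. No network call is made here; sending `QUERY` to a SPARQL endpoint is left out.

```python
from typing import Optional

# The query from the slide, with explicit prefixes so it is self-contained.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?p WHERE {
  ?p wdt:P39 wd:Q18887908 .
  ?p wdt:P19 ?place .
  ?place wdt:P17 ?country .
  FILTER NOT EXISTS { ?place wdt:P17 wd:Q55 . }
}
"""


def qids_from_results(results: dict, variable: str = "p") -> list[str]:
    """Extract Wikidata ids (e.g. 'Q42') from SPARQL JSON results, in order."""
    qids = []
    for binding in results["results"]["bindings"]:
        uri = binding[variable]["value"]
        qid = uri.rsplit("/", 1)[-1]   # last path segment of the entity URI
        if qid not in qids:            # drop duplicates, keep first occurrence
            qids.append(qid)
    return qids


def index_query(qids: list[str], field: str = "wikidata_id") -> Optional[str]:
    """Build a boolean query on the (assumed) identifier field of the index."""
    if not qids:
        return None
    return f"{field}:(" + " OR ".join(qids) + ")"
```

For example, results containing the entity URIs for the made-up ids Q1001 and Q1002 would give the index query `wikidata_id:(Q1001 OR Q1002)`. Searching on identifiers rather than names is what lets the article index answer a question that only Wikidata has the facts for.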
  • 24. For the same query in the catalogue the Wikidata identifier is converted to the local thesaurus identifier
  • 25. Navigation example • Semantic query between [ ], in this case expanded to all Roman Emperors • Select the “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property to search for more articles about resources with that property • And see the result: all articles mentioning persons who have been married to Elizabeth Taylor • When square brackets are used, the software tries a few Wikidata SPARQL queries and replaces the bracketed string with the Wikidata results.
  • 30. Navigation example, result • spouse=Elizabeth Taylor: all articles mentioning persons who have been married to Elizabeth Taylor
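The square-bracket expansion from the navigation example might be sketched like this. The lookup against Wikidata is passed in as a `resolve` function, because the slides do not say which SPARQL queries the software tries; that function, the fallback behaviour and the ids in the usage note are assumptions for illustration.

```python
import re
from typing import Callable

# Matches one bracketed term such as "[Roman Emperors]" (no nested brackets).
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def expand_semantic_query(query: str,
                          resolve: Callable[[str], list[str]]) -> str:
    """Replace each [term] with an OR-list of the identifiers it resolves to.

    `resolve` stands in for the Wikidata lookups. If it returns nothing,
    the term is kept as plain text so the search still runs.
    """
    def replace(match: re.Match) -> str:
        term = match.group(1).strip()
        ids = resolve(term)
        if not ids:
            return term
        return "(" + " OR ".join(ids) + ")"

    return BRACKETED.sub(replace, query)
```

With a resolver that maps "Roman Emperors" to the made-up ids Q111 and Q222, `expand_semantic_query("[Roman Emperors] AND coin", resolver)` gives `(Q111 OR Q222) AND coin`.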
  • 31. User feedback
  • 32. User feedback is needed for correcting false links and adding new links!
  • 33.
  • 34.
  • 35.
  • 36. This feedback serves as additional training data for the disambiguation software
  • 37. Wikidata as thesaurus
  • 38. Wikidata as central hub? “Everything links to everything” versus “Everything links to Wikidata”
  • 39. Wikidata as universal thesaurus for libraries • Current situation: many-to-many links (many identifiers for a single resource) • Proposed: everything links to Wikidata (same identifier for a single resource)
  • 40. Conclusions and next steps
  • 41. Conclusions and next steps • Entity linking combining machine learning and domain knowledge is promising and we still have ideas for improvements • We have shown the added value of linking named entities to Wikidata/DBpedia: it improves findability and usability of content as demonstrated with the research portal • Our aim is to increase the confidence of links so users can trust them “enough” for using them for searching and research • User feedback provides additional training data and needs to be deployed on a larger scale