Presentation of the tutorial session at the DI4R conference in Kraków (Sept. 2016), by Sahar Vahdati & Giorgos Alexiou. Title: Making Use of the Linked Open Data Services for OpenAIRE: Querying Data about Research Results, Persons, Projects and Organisations
Making Use of Linked Open Data Services for OpenAIRE
1. Making Use of the Linked Open Data Services for OpenAIRE:
Querying Data about Research Results, Persons, Projects and Organizations
Sahar Vahdati, Christoph Lange, Giorgos Alexiou, George Papastefanatos
University of Bonn, Germany; Athena Research Center
Digital Infrastructure for Research (DI4R)
28-30 September 2016
Kraków, Poland
3. Open Access Infrastructure for Research in Europe
Need for digital research infrastructures for all kinds of research outputs, across disciplines and countries!
• comprises a database of all EC FP7 and H2020 funded research projects, publications, datasets
• manages scientific publications and associated scientific material
• aggregates Open Access publications and links them to research data and funding bodies
• supports the Open Access principles via national helpdesks and comprehensive guidelines
http://www.openaire.eu
4. OpenAIRE Services
OpenAIRE focuses on:
• Workflows and processes of scholarly communication rather than resources,
• Research data and other research outputs rather than only publications,
• The links between considered entities,
• Relationship of European OA infrastructures with other regions of the world.
Enables search, discovery and monitoring of the publications and datasets resulting from:
>100k research projects
>17m publications
>23k datasets
>5k repositories.
6. Example of data about Core Entities
Property               Value
Entity type            Result
openaireID             od_______908::fac3db85bbcb1f52ae07c5868d8fb453
dateOfTransformation   2015-02-06
dateOfCollection       2015-02-06
title                  A Patient from Argentina Infected with Rickettsia massiliae
Dateofacceptance       01/04/2010
Publisher              The American Society of Tropical Medicine and Hygiene
Pid                    oai:europepmc.org:2077077;PMC2844561
Language               English
Subject                Articles
BestLicense            Open Access
An entity of type Result
7. Interlink to other databases
Support researchers by answering interesting queries
The OpenAIRE vision:
• Data about scientific events: emergence of scientific topics
• Data about people: affiliation, impact of certain research
8. Use cases:
• Research managers use new indicators for measuring the quality and relevance of research
• Policy makers get a quick overview of the findings and projects
• Researchers find comprehensive citation lists and follow research movement between communities/organizations
• Reviewers get a quick overview of the field covered by the paper or dataset under review
9. Challenges supported by LOD Services
Linked Open Data (LOD)
RDF data model
Publishing the OpenAIRE data as Linked Open Data and linking it to related datasets!
• Diverse data formats
• Various means to access/query data
• Use of different identifiers
• Heterogeneity of metadata schemas
10. Expected values
• Open up a window to the Linked Open Data Web
• Increase the OpenAIRE technical interoperability
• Increase the reusability of the OpenAIRE research metadata
• Engage with additional user communities
• Explore synergies with and added value to related open content initiatives
• Provide links through LOD to similar infrastructures
• Offer new services for OA data monitoring activities
• Provide services to export the OpenAIRE objects as a LOD graph
• Facilitate integration with other LOD graphs related to similar systems and infrastructures
• Find patterns to enrich the OpenAIRE information space
Exposing the OpenAIRE Information Space as linked data!
11. Towards OpenAIRE LOD Services
Phase 1: LOD Production
Phase 2: Interlinking OpenAIRE RDF Graph to LOD cloud
12. Phase 1. LOD Production
Steps:
• Specify an RDF vocabulary
• Specify terms and namespaces
• Map the OA data model to an RDF data model
• Map the OA data to a static RDF dump
• Specify strategies to automate the RDF generation
[Figure: OpenAIRE data (core entities, linking entities) mapped to the OA RDF graph]
…
@prefix oad: <http://lod.openaire.eu/data/> .
@prefix oav: <http://lod.openaire.eu/vocab#> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix vivo: <http://vivoweb.org/files/vivo-isf-public-1.6.owl#> .
@prefix pext: <http://www.ontotext.com/proton-ontology/#> .
@prefix swrc: <http://swrc.ontoware.org/ontology#> .
oad:07553d8e646b69b868a9791da39a1802 a foaf:Person;
foaf:firstName "P."^^xsd:string; foaf:lastName "Jha"^^xsd:string;
foaf:name "Jha, P."^^xsd:string; oav:isAuthorOf … .
oad:755469c995c2cb6cb55c3483634b026 a foaf:Person;
oav:hasTarget resultdoajarticles_6fcd7b3b47ebbd05ce73018731ff9095;
oav:hasLabel "personResult_authorship_isAuthorOf"^^xsd:string;
oav:ranking "6"^^xsd:integer.
oad:075558cd104f737d82a34cb7e9fecd7d a foaf:Person;
foaf:firstName "T."^^xsd:string; foaf:lastName "Bere"^^xsd:string;
foaf:name "Bere, T."^^xsd:string.
…
14. OpenAIRE data as RDF Graph
Entity type      Count
Organizations    68,526
Results*         17,414,766
Persons          62,958,315
Datasources      19,443
Projects         624,417
*including duplicates connected with sameAs
Total Number of Triples: 1,013,527,855
Distinct Entities: 98,256
15. Phase 2. Interlinking OA-RDF Graph to LOD cloud
Steps:
• Identify datasets to interlink with
• Select interlinking tools: LIMES, Silk
• Test interlinking OA with DBLP and DBpedia
• Evaluate resulting link sets
• Specify strategy for interlinking in OA workflow
[Figure: OA LOD interlinked with datasets of the Linked Open Data (LOD) cloud, e.g. DBLP, CiteSeer, CEUR]
http://beta.lod.openaire.eu/
16. RDF (Resource Description Framework)
• Resource: anything uniquely identifiable
• Description: description of a resource via representing properties and relations
• Framework: web-based protocols and semantics
• RDF triples: list of statements
Subject (URI)      Predicate (URI)   Object (URI or Literal)
oad:publication1   oav:hasAuthor     "Juan Carlos García"
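The triple shown above can be sketched in a few lines of Python. This is only an illustration of the subject/predicate/object model, not how the OpenAIRE LOD services produce their data. The namespaces are the ones from the slides' @prefix lines; the `Triple` class and the `to_ntriples` helper are hypothetical names introduced for this example.

```python
from typing import NamedTuple

# Namespaces as given in the slides' @prefix lines.
OAD = "http://lod.openaire.eu/data/"
OAV = "http://lod.openaire.eu/vocab#"


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str                  # always a URI
    predicate: str                # always a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape the characters that may not appear raw inside a literal.
        escaped = (
            t.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# The example statement from the slide: a publication and its author name.
example = Triple(OAD + "publication1", OAV + "hasAuthor", "Juan Carlos García", True)
print(to_ntriples(example))
```

The sketch writes full URIs because N-Triples has no prefix mechanism; Turtle, as used elsewhere in these slides, abbreviates the same statement with `oad:` and `oav:`.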
17. RDF version of example
PREFIX dcterms: <http://purl.org/dc/terms/>
…
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
PREFIX prov: <http://www.w3.org/ns/prov#>
:od_______908::… rdf:type cerif:ResultEntity;
dcterms:description "The first confirmed case";
dcterms:publisher "The American Society of Tropical Medicine and Hygiene";
…
oav:resultSubject "Articles";
oav:dateOfCollection "2015-02-06" .
18. Example of data about Linking entities
An entity of type Person_Result whose ranking property can have the value 1 to indicate the first author.
Person: od_______908::f39…1c4a (rdf:type foaf:Person)
PersonResult: links the person to the result (oav:rank 1)
Result: od_______908::fa3...b453 (rdf:type cerif:ResultEntity)
19. How to query RDF?
SPARQL (SPARQL Protocol and RDF Query Language)
• Query language for RDF-based data
• SPARQL endpoint: RDF-triple database on a server available on the Web
• Pattern matching language
• Protocol layer
• Query interface
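As a rough illustration of the protocol layer, the sketch below builds (but does not send) an HTTP GET request in the style of the SPARQL 1.1 Protocol: the query text goes into a `query` parameter and the `Accept` header asks for JSON results. The endpoint path `/sparql` is an assumption; the slides only give the site URL, so check the actual endpoint address before using it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: hypothetical endpoint path; the slides only name the site itself.
ENDPOINT = "http://beta.lod.openaire.eu/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL Protocol GET request without sending it."""
    # The protocol passes the query text in the URL parameter "query".
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually contact the server; that step is left out here so the sketch runs offline.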
20. How to query?
• SPARQL variables are bound to RDF terms e.g., ?title , ?author
• Inspired by SQL via SELECT statement
Example: SELECT ?title ?author
• Results are returned as a table:
?title                                                        ?author
A Patient from Argentina Infected with Rickettsia massiliae   Juan Carlos García
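The result table above maps directly onto the SPARQL 1.1 Query Results JSON Format, where each row is a "binding" of variables to RDF terms. The following sketch turns such a response into plain rows. The sample response is hand-written from the slide's example row and is not real endpoint output; `bindings_to_rows` is a hypothetical helper name.

```python
import json
from typing import Dict, List, Optional

# Hand-written sample in the SPARQL JSON results format (not real endpoint output).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["title", "author"]},
  "results": {"bindings": [
    {"title": {"type": "literal",
               "value": "A Patient from Argentina Infected with Rickettsia massiliae"},
     "author": {"type": "literal", "value": "Juan Carlos García"}}
  ]}
}
"""


def bindings_to_rows(response_text: str) -> List[Dict[str, Optional[str]]]:
    """Convert a SPARQL JSON result document into a list of {variable: value} rows."""
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row; represent that as None.
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


for row in bindings_to_rows(SAMPLE_RESPONSE):
    print(row["title"], "|", row["author"])
```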
21. OpenAIRE as LOD
• OA LOD in BETA version
• Triples per entity
• Online data: SPARQL endpoint
• Offline data: RDF dump
• Entities and URIs (interactive browsing)
• Dereferenceable URIs for all entities
http://www.beta.lod.openaire.eu
22. Steps:
• Specify an RDF vocabulary
• Specify terms and namespaces
• Map the OA data model to an RDF data model
• Map the OA data to a static RDF dump
• Specify strategies to automate the RDF generation
Data conforming to LOD best practices, published in BETA, December 2015
Main entities, linking entities
http://beta.lod.openaire.eu/
23. Sample query
SELECT (COUNT(DISTINCT ?s) AS ?count) ?flevel
FROM <test>
FROM <relationsTest>
WHERE {
?s a <http://www.eurocris.org/ontologies/cerif/1.3#Project> ;
<http://lod.openaire.eu/vocab/fundingLevel0> ?flevel .
}
GROUP BY ?flevel
ORDER BY ?count
Number of projects for each funding level
25. Interlinking OpenAIRE RDF Graph to LOD cloud
Steps:
• Identify datasets to interlink with
• Select interlinking tools: LIMES, Silk
• Test interlinking OA with DBLP and DBpedia
• Evaluate resulting link sets
• Specify strategy for interlinking in OA workflow
[Figure: OA LOD interlinked with datasets of the Linked Open Data (LOD) cloud, e.g. DBLP, CiteSeer, CEUR]
http://beta.lod.openaire.eu/
26. OA LOD interlinking workflow
Preprocessing
• Process all the dumps from candidate datasets
• Prune useless metadata
• Transform the metadata to key-value pairs (Hadoop key (ID), value ([Properties]))
• Store in HDFS
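The preprocessing steps above can be sketched on a single machine as follows. This is a minimal illustration of the key (ID) / value ([Properties]) shape only; it does not use Hadoop or HDFS, and the allow-list of properties kept after pruning, the record layout and the sample records (including the second title) are assumptions made for the example.

```python
from collections import defaultdict

# Assumption: which properties survive pruning is a hypothetical choice.
KEEP = {"title", "year"}


def to_key_value(records, keep=KEEP):
    """Group (entity_id, property, value) records into {entity_id: [(property, value), ...]}."""
    grouped = defaultdict(list)
    for entity_id, prop, value in records:
        if prop in keep:  # prune metadata that is not needed for matching
            grouped[entity_id].append((prop, value))
    return dict(grouped)


# Illustrative records; identifiers and the second entry are made up.
records = [
    ("pub1", "title", "A Patient from Argentina Infected with Rickettsia massiliae"),
    ("pub1", "publisher", "The American Society of Tropical Medicine and Hygiene"),
    ("pub1", "year", "2010"),
    ("pub2", "title", "Another title"),
]

for key, properties in to_key_value(records).items():
    print(key, properties)
```

In a Hadoop job the same grouping would typically happen in the shuffle phase, with the entity ID as the map output key.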
27. Sample interlinking result
The result of interlinking is a set of links between URIs from the source and target datasets:
<http://lod.openaire...bde783> owl:sameAs <http://dblp.l3s.../BoissonnatN96>
<http://lod.openaire...4f8964> owl:sameAs <http://dblp.l3s.../Shrobe96>
<http://lod.openaire...27fea2> owl:sameAs <http://dblp.l3s.../X96c>
<http://lod.openaire...f433b9> owl:sameAs <http://dblp.l3s.../LiroyG96>
Note: the DBLP dump is not complete.
28. Ideas for LOD in Monitoring
[Figure: OA LOD interlinked with datasets of the Linked Open Data (LOD) cloud, e.g. DBLP, CiteSeer, CEUR]
Monitoring interlinking: when the target dataset grows from one version to the next, we can expect the linkset to grow as well.
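One way to check the monitoring idea above is to compare the linksets produced for two versions of the target dataset. The sketch below assumes a linkset is available as a collection of (source URI, target URI) pairs; the function name and the example.org URIs are hypothetical and do not come from the OpenAIRE workflow.

```python
def linkset_growth(old_links, new_links):
    """Compare two linksets given as iterables of (source_uri, target_uri) pairs."""
    old, new = set(old_links), set(new_links)
    return {
        "added": new - old,            # links that only exist in the newer linkset
        "removed": old - new,          # links that disappeared
        "grew": len(new) > len(old),   # the expectation stated on the slide
    }


# Hypothetical URIs, for illustration only.
v1 = [("http://example.org/oa/1", "http://example.org/target/A")]
v2 = v1 + [("http://example.org/oa/2", "http://example.org/target/B")]

report = linkset_growth(v1, v2)
print(len(report["added"]), len(report["removed"]), report["grew"])
```

A linkset that shrinks, or fails to grow while the target dataset does, would be a signal to re-examine the interlinking configuration.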
29. Scientific events
Bootstrapping datasets for scientific events:
• CEUR-WS.org dataset
• OpenResearch.org
• Include events in OA Data Model (Conference Object?)
• Measure the quality of events
• Related to funding and sponsoring
• Continuity
• Accepted project publications
• Reputation of people
• Location
• Citation
• …
32. Example: What is the overall research output of a given project?
oav:produces and UNION are not working:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX oav: <http://lod.openaire.eu/vocab/>
PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
SELECT ?x ?y
WHERE
{
?y a cerif:ResultEntity .
{ ?y oav:resultType 'dataset' }
UNION
{ ?y oav:resultType 'publication' }
?x a cerif:Project .
?y cerif:linkToProject ?x .
} LIMIT 10
33. Example: What organizations are more active than others w.r.t. projects?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX oav: <http://lod.openaire.eu/vocab/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?o
WHERE
{
?x oav:projectOrganization ?o.
?o a foaf:Organization.
?y oav:projectOrganization ?o2.
?o2 a foaf:Organization.
FILTER (sameTerm(?o, ?o2) && !sameTerm(?x, ?y))
} LIMIT 10
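The query for this example finds organizations that take part in at least two different projects. To actually rank organizations by activity, the (project, organization) pairs can be counted. The sketch below does this in memory on made-up identifiers; it is an illustration of the counting step, not a replacement for running an aggregate query on the endpoint.

```python
from collections import Counter


def rank_organizations(project_org_pairs):
    """Return (organization, number of distinct projects) sorted by activity."""
    counts = Counter()
    for _project, org in set(project_org_pairs):  # de-duplicate repeated statements
        counts[org] += 1
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical identifiers, for illustration only.
pairs = [
    ("projA", "orgX"),
    ("projB", "orgX"),
    ("projA", "orgY"),
    ("projA", "orgX"),  # duplicate statement, counted once
]

print(rank_organizations(pairs))
```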
34. Example: Which datasets have been published by a specific person who is involved in a given project?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX oav: <http://lod.openaire.eu/vocab/>
PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?y
WHERE
{
?p cerif:linksToPerson ?x .
?x a foaf:Person .
?x dcterms:creator ?y .
?y oav:resultType "dataset" .
} LIMIT 10
35. Example: List the full names of all authors who have (co-)authored a publication in project P
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX oav: <http://lod.openaire.eu/vocab/>
PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?y
WHERE
{
?p cerif:linksToPerson ?x .
?x a foaf:Person .
?x dcterms:creator ?y .
?y oav:resultType "dataset" .
}
LIMIT 10
Editor's Notes
UBONN together with ARC develops the LOD services for OA.
CNR provides technical support for synchronizing content of the OpenAIRE Information Space with LOD services and vice versa
The why!
Reviewer: The OpenAIRE LOD itself has information about the subject of a paper or a dataset, which can be linked to subject classification schemes such as the ACM CCS. Furthermore, CiteSeer provides citation graphs of papers. We can thus offer to peer reviewers a service that finds papers or datasets similar to the one under review.
Researcher: A service similar to the one for peer reviewers explained above could be offered to authors.
move in the community, e.g., to other organizations. Use case 7: Having access to the networks of a paper's authors and their organizations, and furthermore taking into account the events in which people participate enables new indicators for measuring the quality and relevance of research that are not just based on counting citations.
To be able to offer such services, you need to deal with diverse data formats. LOD helps to solve such challenges easily.
Explore synergies with and added value to related open content initiatives (e.g. in the Open Educational Resources)
Find patterns to enrich the OpenAIRE information space by exploiting the enrichments inherited by third-party re-use of its LOD graph representation
The how!
Here, show how to map
First, let me explain what the RDF data model is!
The result of mapping
Repeated just as a reminder
(potential research paper e.g., for ESWC 2017 Linked Data Track; cfp. http://2016.eswc-conferences.org/call-papers#3)