Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial session)

Presentation of the tutorial session at the DI4R conference in Krakow (Sept. 2016), by Sahar Vahdati & Giorgos Alexiou. Title: Making Use of the Linked Open Data Services for OpenAIRE: Querying Data about Research Results, Persons, Projects and Organisations

  1. Sahar Vahdati, Christoph Lange, Giorgos Alexiou, George Papastefanatos. Making Use of the Linked Open Data Services for OpenAIRE: Querying Data about Research Results, Persons, Projects and Organizations. Digital Infrastructure for Research (DI4R), 28-30 September 2016, Krakow, Poland. University of Bonn, Germany; Athena Research Center.
  2. Session outline
      • Introduction to OpenAIRE
      • Technical Concepts
      • Hands on Session
  3. Open Access Infrastructure for Research in Europe. Need for digital research infrastructures for all kinds of research outputs, across disciplines and countries! OpenAIRE:
      • comprises a database of all EC FP7 and H2020 funded research projects, publications, datasets
      • manages scientific publications and associated scientific material
      • aggregates Open Access publications and links them to research data and funding bodies
      • supports the Open Access principles via national helpdesks and comprehensive guidelines
      http://www.openaire.eu
  4. OpenAIRE Services. OpenAIRE focuses on:
      • Workflows and processes of scholarly communication rather than resources,
      • Research data and other research outputs rather than only publications,
      • The links between the considered entities,
      • The relationship of European OA infrastructures with other regions of the world.
      OpenAIRE enables search, discovery and monitoring of publications and datasets, covering >100k research projects, >17m publications, >23k datasets and >5k repositories.
  5. OpenAIRE Data Model: Core entities, Linking entities
  6. Example of data about Core Entities: an entity of type Result
      Entity type: Result
      openaireID: od_______908::fac3db85bbcb1f52ae07c5868d8fb453
      dateOfTransformation: 2015-02-06
      dateOfCollection: 2015-02-06
      title: A Patient from Argentina Infected with Rickettsia massiliae
      Dateofacceptance: 01/04/2010
      Publisher: The American Society of Tropical Medicine and Hygiene
      Pid: oai:europepmc.org:2077077;PMC2844561
      Language: English
      Subject: Articles
      BestLicense: Open Access
  7. The OpenAIRE vision: interlink to other databases and support researchers by answering interesting queries
      • Data about scientific events → emergence of scientific topics
      • Data about people affiliation → impact of certain research
  8. Use cases:
      • Research managers → use new indicators for measuring the quality
      • Policy makers → get a quick overview of the findings and projects
      • Researchers → find comprehensive citation lists and research movement between communities/organizations
      • Reviewers → get a quick overview of the field covered by the paper or dataset under review
  9. Challenges supported by LOD Services
      • Diverse data formats
      • Various means to access/query data
      • Use of different identifiers
      • Heterogeneity of metadata schemas
      Approach: publishing the OpenAIRE data as Linked Open Data (LOD), using the RDF data model, and linking it to related datasets!
  10. Expected values: exposing the OpenAIRE Information Space as linked data!
      • Open up a window to the Linked Open Data Web
      • Increase the OpenAIRE technical interoperability
      • Increase the reusability of the OpenAIRE research metadata
      • Engage with additional user communities
      • Explore synergies with and added value to related open content initiatives
      • Provide links through LOD to similar infrastructures
      • Offer new services for OA data monitoring activities
      • Provide services to export the OpenAIRE objects as a LOD graph
      • Facilitate integration with other LOD graphs relative to similar systems and infrastructures
      • Find patterns to enrich the OpenAIRE information space
  11. Towards OpenAIRE LOD Services
      • Phase 1: LOD Production
      • Phase 2: Interlinking OpenAIRE RDF Graph to LOD cloud
  12. Phase 1. LOD Production (OpenAIRE data, core entities and linking entities → OA RDF graph). Steps:
      • Specify an RDF vocabulary
      • Specify terms and namespaces
      • Map the OA data model to an RDF data model
      • Map the OA data to a static RDF dump
      • Specify strategies to automate the RDF generation
      OA RDF graph (excerpt):
          …
          @prefix oad: <http://lod.openaire.eu/data/> .
          @prefix oav: <http://lod.openaire.eu/vocab#> .
          @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
          @prefix vivo: <http://vivoweb.org/files/vivo-isf-public-1.6.owl#> .
          @prefix pext: <http://www.ontotext.com/proton-ontology/#> .
          @prefix swrc: <http://swrc.ontoware.org/ontology#> .

          oad:07553d8e646b69b868a9791da39a1802 a foaf:Person;
              foaf:firstName "P."^^xsd:string;
              foaf:lastName "Jha"^^xsd:string;
              foaf:name "Jha, P."^^xsd:string;
              oav:isAuthorOf .

          oad:755469c995c2cb6cb55c3483634b026 a foaf:Person;
              oav:hasTarget resultdoajarticles_6fcd7b3b47ebbd05ce73018731ff9095;
              oav:hasLabel "personResult_authorship_isAuthorOf"^^xsd:string;
              oav:ranking "6"^^xsd:integer.

          oad:075558cd104f737d82a34cb7e9fecd7d a foaf:Person;
              foaf:firstName "T."^^xsd:string;
              foaf:lastName "Bere"^^xsd:string;
              foaf:name "Bere, T."^^xsd:string.
          …
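The step of mapping OA data to RDF can be illustrated with a minimal sketch. It assumes a person record is available as a plain dictionary; the function names `turtle_literal` and `person_to_turtle` and the record layout are hypothetical, and only the `oad:` prefix and the `foaf:` properties are taken from the slide. It is not the project's actual RDFization code.

```python
def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal typed xsd:string, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'


def person_to_turtle(record: dict) -> str:
    """Serialize one person record (hypothetical layout) as a Turtle snippet."""
    first = record["firstName"]
    last = record["lastName"]
    full_name = f"{last}, {first}"  # same "Last, First" form as on the slide
    lines = [
        "oad:" + record["id"] + " a foaf:Person;",
        "    foaf:firstName " + turtle_literal(first) + ";",
        "    foaf:lastName " + turtle_literal(last) + ";",
        "    foaf:name " + turtle_literal(full_name) + ".",
    ]
    return "\n".join(lines)


record = {"id": "075558cd104f737d82a34cb7e9fecd7d", "firstName": "T.", "lastName": "Bere"}
print(person_to_turtle(record))
```

The output still needs the `@prefix` declarations for `oad:`, `foaf:` and `xsd:` in front of it to be valid Turtle.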
  13. Specify vocabularies
  14. OpenAIRE data as RDF Graph
      Organizations: 68,526
      Results*: 17,414,766
      Persons: 62,958,315
      Datasources: 19,443
      Projects: 624,417
      *including duplicates connected with sameAs
      Total Number of Triples: 1,013,527,855
      Distinct Entities: 98,256
  15. Phase 2. Interlinking OA-RDF Graph to LOD cloud. Steps:
      • Identify datasets to be interlinked to
      • Select interlinking tools: LIMES, Silk
      • Test interlinking OA with DBLP and DBpedia
      • Evaluate resulting link sets
      • Specify strategy for interlinking in OA workflow
      (Figure: the OA LOD Turtle excerpt from slide 12, linked to Linked Open Data (LOD) cloud datasets such as DBLP, CiteSeer, CEUR, …)
      http://beta.lod.openaire.eu/
  16. RDF (Resource Description Framework)
      • Resource: anything uniquely identifiable
      • Description: description of a resource by representing its properties and relations
      • Framework: web-based protocols and semantics
      • RDF triples: list of statements of the form Subject (URI), Predicate (URI), Object (URI or Literal)
      Example triple: oad:publication1 oav:hasAuthor "Juan Carlos García"
  17. RDF version of example
          PREFIX dcterms: <http://purl.org/dc/terms/>
          …
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
          PREFIX prov: <http://www.w3.org/ns/prov#>

          :od_______908::… rdf:type cerif:ResultEntity;
              dcterms:description "The first confirmed case";
              dcterms:publisher "The American Society of Tropical Medicine and Hygiene";
              …
              oav:resultSubject "Articles";
              oav:dateOfCollection "2015-02-06".
  18. Example of data about Linking entities: an entity of type Person_Result whose ranking property can have the value 1 to indicate the first author.
      • od_______908::f39…1c4a: rdf:type foaf:Person; oav:rank 1
      • PersonResult (the link between the two entities)
      • od_______908::fa3...b453: rdf:type cerif:ResultEntity
  19. How to query RDF? SPARQL (SPARQL Protocol and RDF Query Language)
      • Query language for RDF-based data
      • SPARQL endpoint: RDF triple database on a server available on the Web
      • Pattern matching language
      • Protocol layer
      • Query interface
  20. How to query?
      • SPARQL variables are bound to RDF terms, e.g., ?title, ?author
      • Inspired by SQL via the SELECT statement. Example: SELECT ?title ?author
      • Results are returned as a table:
          ?title: A Patient from Argentina Infected with Rickettsia massiliae
          ?author: Juan Carlos García
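How variables get bound to RDF terms can be sketched with a tiny in-memory pattern matcher. This is only an illustration of basic graph pattern matching, not how a SPARQL engine is implemented. The predicate `dcterms:title` and the helper names `match_pattern` and `select` are assumptions made for the example; `oad:publication1` and `oav:hasAuthor` come from the RDF slide.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Two statements about the example publication (dcterms:title is an assumed predicate).
TRIPLES: List[Triple] = [
    ("oad:publication1", "dcterms:title",
     "A Patient from Argentina Infected with Rickettsia massiliae"),
    ("oad:publication1", "oav:hasAuthor", "Juan Carlos García"),
]


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Unify one triple pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns: List[Triple], triples: List[Triple]) -> List[Bindings]:
    """Evaluate a list of triple patterns by joining them left to right."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = select(
    [("?pub", "dcterms:title", "?title"), ("?pub", "oav:hasAuthor", "?author")],
    TRIPLES,
)
for row in rows:
    print(row["?title"], "|", row["?author"])
```

Each solution is one row of the result table; the shared variable `?pub` is what joins the title pattern to the author pattern.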
  21. OpenAIRE as LOD
      • OA LOD in BETA version
      • Triples per entity
      • Online data: SPARQL endpoint
      • Offline data: RDF dump
      • Entities and URIs (interactive browsing)
      • Dereferenceable URIs for all entities
      http://beta.lod.openaire.eu
  22. LOD Production (OpenAIRE data, main entities and linking entities → OA RDF graph). Steps:
      • Specify an RDF vocabulary
      • Specify terms and namespaces
      • Map the OA data model to an RDF data model
      • Map the OA data to a static RDF dump
      • Specify strategies to automate the RDF generation
      Data conforming to LOD best practices published in BETA, December 2015: http://beta.lod.openaire.eu/
      (Figure: the OA RDF graph Turtle excerpt from slide 12.)
  23. Sample query: number of projects per funding level
          SELECT (COUNT(DISTINCT ?s) AS ?count) ?flevel
          FROM <test>
          FROM <relationsTest>
          WHERE {
            ?s a <http://www.eurocris.org/ontologies/cerif/1.3#Project>;
               <http://lod.openaire.eu/vocab/fundingLevel0> ?flevel
          }
          GROUP BY ?flevel
          ORDER BY ?count
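What the sample query computes can be mirrored on a handful of in-memory triples: it counts distinct subjects typed as `cerif:Project` per value of `fundingLevel0` and sorts by that count. The sketch below is only an illustration; the `ex:` identifiers and the level names are made up, and the tie-break on the level name is an addition that keeps the output deterministic.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CERIF_PROJECT = "http://www.eurocris.org/ontologies/cerif/1.3#Project"
FUNDING_LEVEL0 = "http://lod.openaire.eu/vocab/fundingLevel0"

Triple = Tuple[str, str, str]

# Made-up sample data: the ex: identifiers and the level names are hypothetical.
SAMPLE: List[Triple] = [
    ("ex:project1", RDF_TYPE, CERIF_PROJECT),
    ("ex:project1", FUNDING_LEVEL0, "level-A"),
    ("ex:project2", RDF_TYPE, CERIF_PROJECT),
    ("ex:project2", FUNDING_LEVEL0, "level-A"),
    ("ex:project3", RDF_TYPE, CERIF_PROJECT),
    ("ex:project3", FUNDING_LEVEL0, "level-B"),
    ("ex:result1", FUNDING_LEVEL0, "level-A"),  # not typed as a project, so ignored
]


def count_projects_by_funding_level(triples: Iterable[Triple]) -> List[Tuple[str, int]]:
    """Return (funding level, number of distinct projects), ordered by ascending count."""
    triples = list(triples)
    projects = {s for s, p, o in triples if p == RDF_TYPE and o == CERIF_PROJECT}
    # A set of (project, level) pairs plays the role of COUNT(DISTINCT ?s) per group.
    pairs = {(s, o) for s, p, o in triples if p == FUNDING_LEVEL0 and s in projects}
    counts = Counter(level for _, level in pairs)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


for level, count in count_projects_by_funding_level(SAMPLE):
    print(count, level)
```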
  24. General architecture (diagram). Labels in the diagram:
      • OpenAIRE Metadata, OA Data Model, OA Vocabulary
      • RDFization, Interlinking, Deduplication & Inference
      • RDF Store, Apache Solr
      • https://www.openaire.eu and http://beta.lod.openaire.eu
      • Browser (HTML), LOD Client (HTML, RDF)
  25. Interlinking OpenAIRE RDF Graph to LOD cloud. Steps:
      • Identify datasets to be interlinked to
      • Select interlinking tools: LIMES, Silk
      • Test interlinking OA with DBLP and DBpedia
      • Evaluate resulting link sets
      • Specify strategy for interlinking in OA workflow
      (Figure: the OA LOD Turtle excerpt from slide 12, linked to Linked Open Data (LOD) cloud datasets such as DBLP, CiteSeer, CEUR, …)
      http://beta.lod.openaire.eu/
  26. OA LOD interlinking workflow: Preprocessing
      • Process all the dumps from candidate datasets
      • Prune useless metadata
      • Transform the metadata to key-value pairs (Hadoop key (ID), value ([Properties]))
      • Store in HDFS
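The preprocessing steps above can be sketched in memory as follows. The slide describes a Hadoop/HDFS pipeline; this stand-in only mirrors the shape of the data (one key per entity ID, a list of properties as value). The set of properties to keep and the function name `to_key_value` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical choice of properties worth keeping for interlinking.
KEEP = {"title", "author", "year"}

Record = Tuple[str, str, str]  # (entity ID, property name, value)


def to_key_value(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Prune unwanted or empty properties and group the rest by entity ID."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for entity_id, prop, value in records:
        cleaned = value.strip()
        if prop in KEEP and cleaned:
            grouped[entity_id].append((prop, cleaned))
    return dict(grouped)


dump = [
    ("id1", "title", " A sample title "),
    ("id1", "pages", "10"),      # pruned: not in KEEP
    ("id2", "author", "B. Example"),
    ("id2", "title", "   "),     # pruned: empty after stripping
]
print(to_key_value(dump))
```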
  27. Sample interlinking result. The result of interlinking is a set of links between URIs from the source and target datasets (note: the DBLP dump is not complete):
          <http://lod.openaire...bde783> owl:sameAs <http://dblp.l3s.../BoissonnatN96>
          <http://lod.openaire...4f8964> owl:sameAs <http://dblp.l3s.../Shrobe96>
          <http://lod.openaire...27fea2> owl:sameAs <http://dblp.l3s.../X96c>
          <http://lod.openaire...f433b9> owl:sameAs <http://dblp.l3s.../LiroyG96>
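How such a link set comes about can be sketched with a deliberately naive matcher. LIMES and Silk work from configurable link specifications and similarity measures; the stand-in below links two entities only when their normalized titles are identical, which is an assumption made for illustration. All URIs and titles are made up.

```python
import re
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def interlink(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Link source URIs to target URIs whose normalized titles are equal."""
    index: Dict[str, str] = {}
    for uri, title in target.items():
        index.setdefault(normalize(title), uri)  # keep the first target per title
    links = []
    for uri, title in source.items():
        match = index.get(normalize(title))
        if match is not None:
            links.append((uri, match))
    return links


def to_ntriples(links: List[Tuple[str, str]]) -> List[str]:
    """Render links as N-Triples owl:sameAs statements."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t in links]


source = {
    "http://example.org/oa/1": "A Sample Paper: On Graphs",
    "http://example.org/oa/2": "Something Unmatched",
}
target = {"http://example.org/dblp/a": "a sample paper - on graphs"}
for line in to_ntriples(interlink(source, target)):
    print(line)
```

Exact matching on titles will miss variants and can produce false links for common titles, which is why the resulting link sets are evaluated as a separate step.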
  28. Ideas for LOD in monitoring. Monitoring interlinking: when the target dataset grows from one version to another, we can expect the link set to grow as well.
      (Figure: the OA LOD Turtle excerpt from slide 12, linked to Linked Open Data (LOD) cloud datasets such as DBLP, CiteSeer, CEUR, …)
  29. Scientific events. Bootstrapping datasets for scientific events:
      • CEUR-WS.org dataset
      • OpenResearch.org
      • Include events in OA Data Model (Conference Object?)
      • Measure the quality of events
      • Related to funding and sponsoring
      • Continuity
      • Accepted project publications
      • Reputation of people
      • Location
      • Citation
      • …
  30. Hands on
  31. http://beta.lod.openaire.eu/sparql
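Besides the web form, a SPARQL endpoint can be queried over HTTP by passing the query text in the `query` parameter, as defined by the SPARQL protocol. The sketch below assumes the BETA endpoint is reachable and honors an `Accept` header asking for JSON results; neither is confirmed by the slides. The helper names `build_request` and `run_query` are hypothetical, and `run_query` is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://beta.lod.openaire.eu/sparql"  # endpoint URL from the slide


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL protocol GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Asking for JSON results is an assumption about what the endpoint supports.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings (requires network access)."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.full_url)
```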
  32. Example: What is the overall research output of a given project? (oav:produces and UNION are not working.)
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX oav: <http://lod.openaire.eu/vocab/>
          PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
          SELECT ?x ?y
          WHERE {
            ?y a cerif:ResultEntity .
            { ?y oav:resultType 'dataset' }
            UNION
            { ?y oav:resultType 'publication' }
            ?x a cerif:Project .
            ?y cerif:linkToProject ?x
          }
          LIMIT 10
  33. Example: Which organizations are more active than others w.r.t. projects?
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX oav: <http://lod.openaire.eu/vocab/>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?o
          WHERE {
            ?x oav:projectOrganization ?o .
            ?o a foaf:Organization .
            ?y oav:projectOrganization ?o2 .
            ?o2 a foaf:Organization .
            FILTER (sameTerm(?o, ?o2) && !sameTerm(?x, ?y))
          }
          LIMIT 10
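The query above returns organizations that take part in at least two different projects. The same idea, plus an explicit ranking by number of projects, can be sketched over a list of (project, organization) pairs. The pairs and the function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Link = Tuple[str, str]  # (project, organization), as in ?x oav:projectOrganization ?o


def rank_organizations(links: Iterable[Link]) -> List[Tuple[str, int]]:
    """Count distinct projects per organization, most active first."""
    counts = Counter(org for _, org in set(links))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def in_several_projects(links: Iterable[Link]) -> List[str]:
    """Organizations linked to at least two different projects (what the FILTER selects)."""
    return [org for org, count in rank_organizations(links) if count >= 2]


links = [
    ("ex:project1", "ex:orgA"),
    ("ex:project2", "ex:orgA"),
    ("ex:project1", "ex:orgB"),
    ("ex:project1", "ex:orgA"),  # duplicate statement, counted once
]
print(rank_organizations(links))
print(in_several_projects(links))
```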
  34. Example: Which datasets have been published by a specific person who is involved in a given project?
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX oav: <http://lod.openaire.eu/vocab/>
          PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
          PREFIX dcterms: <http://purl.org/dc/terms/>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?y
          WHERE {
            ?p cerif:linksToPerson ?x .
            ?x a foaf:Person .
            ?x dcterms:creator ?y .
            ?y oav:resultType "dataset"
          }
          LIMIT 10
  35. Example: List the full names of all authors who have (co-)authored a publication in project P.
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX oav: <http://lod.openaire.eu/vocab/>
          PREFIX cerif: <http://www.eurocris.org/ontologies/cerif/1.3#>
          PREFIX dcterms: <http://purl.org/dc/terms/>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?y
          WHERE {
            ?p cerif:linksToPerson ?x .
            ?x a foaf:Person .
            ?x dcterms:creator ?y .
            ?y oav:resultType "dataset"
          }
          LIMIT 10