LOP – Capturing and Linking Open Provenance on LOD Cycle


Presentation of the paper "LOP – Capturing and Linking Open Provenance on LOD Cycle" at the 5th International Workshop on Semantic Web Information Management (SWIM 2013). New York, USA – June 23, 2013


  1. LOP – Capturing and Linking Open Provenance on LOD Cycle
     Rogers R. de Mendonça, Jonas F. S. M. De La Cerda, Kelli F. de Cordeiro, Sérgio M. S. da Cruz, Maria Cláudia Cavalcanti, Maria Luiza M. Campos
     5th International Workshop on Semantic Web Information Management (SWIM 2013)
     New York, USA – June 23, 2013
  2. Outline
     Introduction
     – Provenance
     – Linked Open Data Lifecycle
     An Approach for Linked Open Provenance Capture
     – Data Preparation and Transformation Process
     – Data Interlinking Process
     – Linked Open Provenance Architecture
     – Usage Scenario
     Conclusion
     – Contributions
     – Future Works
  3. Increase of the Web of Data
     What about data reliability and quality?
  4. Provenance
     Information about the history of the data:
     – Where did the data come from?
     – Who designed the publishing process?
     – Who executed the publishing process?
     – Which operations were applied to the data?
     Importance to the Web of Data:
     – Support quality and reliability assessment of the published data
  5. Semantic Web Stack
     [Figure: W3C Semantic Web stack diagram, labeled "Provenance"]
  6. Linked Open Provenance (LOP)
     Provenance data available according to LOD principles:
     1. Use URIs as names for things
     2. Use HTTP URIs, so that people can look up those names
     3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
     4. Include links to other URIs, so that they can discover more things
  7. Related Works
     Ontologies / Vocabularies
     – PROV-O (PROV-DM): http://www.w3.org/TR/prov-o/
     – OPMV (OPM): http://purl.org/net/opmv/ns
     – Cogs (ETL): http://vocab.deri.ie/cogs
     – Dublin Core Metadata Terms, FOAF
  8. Related Works
     Use of provenance to support quality and reliability assessment of published data
     – Provenance Information in the Web of Data (HARTIG, 2009)
     – Managing the life-cycle of linked data with the LOD2 stack (AUER et al, 2012)
     – Linked Data Quality Assessment and Fusion (MENDES et al, 2012)
     Focus on metadata about the source and access of the data
  9. Linked Open Data Lifecycle
     [Figure: LOD2 lifecycle diagram with the phases Extraction, Storage, Authoring, Interlinking, Enrichment, Quality, Evolution and Exploration]
  10. Quality Phase
     [Figure: LOD2 lifecycle diagram. Annotation: Quality – quality assessment]
  11. Interlinking Phase
     [Figure: LOD2 lifecycle diagram. Annotations: Interlinking – create and maintain links; Quality – quality assessment]
  12. Extraction Phase
     [Figure: LOD2 lifecycle diagram. Annotations: Extraction – extract and triplify data; Interlinking – create and maintain links; Quality – quality assessment]
  13. Extension of Extraction Phase
     [Figure: LOD2 lifecycle diagram with an added Preparation phase. Annotations: Interlinking – create and maintain links; Quality – quality assessment]
  14. Extension: Preparation Before Triplification
     [Figure: LOD2 lifecycle diagram with an added Preparation phase. Annotations: Extraction and Preparation – extract, prepare and triplify data; Interlinking – create and maintain links; Quality – quality assessment]
  15. Data Publishing and Interlinking Process
  16. Data Publishing and Interlinking Process
     Extraction Phase
  17. Data Preparation and Transformation Process
     [Figure: Heterogeneous Data Sources feeding the steps Extract, Clean, Conform, Pre-Integrate and Triplify]
     ETL (Extraction-Transformation-Loading) approach:
     – Foundation of DW systems
     – Its techniques and tools have been developed and refined over many years in challenging BI scenarios
     – Publishing LOD and LOP can benefit from reusing these techniques and tools
  18. Data Preparation and Transformation Process
     [Figure: Heterogeneous Data Sources feeding the steps Extract, Clean, Conform, Pre-Integrate and Triplify]
     Use of a workflow to have:
     – Systematization of the publishing process
     – Monitoring and management of the several tasks
     – Facilities for reusing the process
     Pentaho Data Integration (a.k.a. Kettle)
     – Open source, large community of users, extensible
  19. Data Publishing and Interlinking Process
     Extraction Phase, Interlinking Phase
  20. Data Interlinking Process
     [Figure: Web Data Access, Schema Mappings, Identity Resolution, Quality Evaluator]
  21. Data Interlinking Process
     Web Data Access: extracts data from its original sources
  22. Data Interlinking Process
     Schema Mappings: matches corresponding terms of multiple vocabularies
  23. Data Interlinking Process
     Identity Resolution: finds and links similar resources on different datasets
  24. Data Interlinking Process
     Quality Evaluator: evaluates data quality based on a set of rules
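
The identity resolution step in slide 23 can be illustrated with a minimal Python sketch. It is not the tool used in the presentation, which the slides do not name. It assumes a plain string-similarity comparison from the standard library (`difflib`), hypothetical resource URIs and labels under example.org, and an arbitrary threshold of 0.85, and it emits `owl:sameAs` links as N-Triples lines.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_resources(left: dict, right: dict, threshold: float = 0.85) -> list:
    """Return owl:sameAs N-Triples lines for label pairs at or above the threshold."""
    links = []
    for left_uri, left_label in left.items():
        for right_uri, right_label in right.items():
            if similarity(left_label, right_label) >= threshold:
                links.append(f"<{left_uri}> <{OWL_SAME_AS}> <{right_uri}> .")
    return links


# Hypothetical datasets (URI -> label); not taken from the presentation.
dataset_a = {"http://example.org/a/project/1": "Linked Data BR"}
dataset_b = {
    "http://example.org/b/project/9": "linked data br",
    "http://example.org/b/project/7": "Complex Social Networks",
}

for triple in link_resources(dataset_a, dataset_b):
    print(triple)
```

The threshold and the similarity measure are exactly the kind of parameters the next slide treats as provenance: recording them makes it possible to explain later why a given link was created.
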
  25. Provenance Opportunity
     [Figure: Data Preparation and Transformation Process (Heterogeneous Data Sources, Extract, Clean, Conform, Pre-Integrate, Triplify) and Data Interlinking Process (Web Data Access, Schema Mappings, Identity Resolution, Quality Evaluator)]
     All steps require extensive parameterization and produce many results
     – The parameter values and techniques employed, as well as the results obtained, are all provenance data
  26. Linked Open Provenance Architecture
  27. Data Interlinking Scenarios
  28. Implementation of PGA
     [Figure labels: Provenance Gathering Agent, Provenance Data, Staging Database, RDF Triple, Triple Store]
  29. Implementation of PGA
     [Figure labels: Provenance Data, Staging Database, RDF Triple, Triple Store]
     The PGA wraps the ETL process and stores provenance in data staging tables. Other specific steps, developed with the Kettle API and Linked Open Data frameworks, then extract, triplify and load this provenance into the triple store.
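
The wrap-then-triplify idea in slide 29 can be sketched in Python under loud assumptions: the actual PGA is a Scala web service working with Kettle, while this sketch uses an in-memory SQLite table as a stand-in staging database, a hypothetical `with_provenance` wrapper, a hypothetical base URI under example.org, and a made-up `clean` step. Only the OPMV and Dublin Core vocabularies come from the slides.

```python
import json
import sqlite3
from datetime import datetime, timezone

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OPMV = "http://purl.org/net/opmv/ns#"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
BASE = "http://example.org/provenance/"  # hypothetical base URI

# Stand-in staging database (assumption: the real one is not described).
staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE step_provenance ("
    "id INTEGER PRIMARY KEY, step TEXT, params TEXT, "
    "rows_in INTEGER, rows_out INTEGER, ended_at TEXT)"
)


def with_provenance(step_name, step_fn, rows, **params):
    """Run one ETL step and record its parameters and row counts in staging."""
    result = step_fn(rows, **params)
    staging.execute(
        "INSERT INTO step_provenance (step, params, rows_in, rows_out, ended_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (step_name, json.dumps(params, sort_keys=True), len(rows), len(result),
         datetime.now(timezone.utc).isoformat()),
    )
    return result


def triplify_provenance():
    """Turn staged provenance rows into N-Triples lines (three per row)."""
    triples = []
    for row_id, step, ended_at in staging.execute(
        "SELECT id, step, ended_at FROM step_provenance ORDER BY id"
    ):
        process = f"<{BASE}process/{row_id}>"
        triples.append(f"{process} <{RDF_TYPE}> <{OPMV}Process> .")
        # json.dumps yields a quoted, escaped string usable as a plain literal.
        triples.append(f"{process} <{DCTERMS}title> {json.dumps(step)} .")
        triples.append(f'{process} <{DCTERMS}date> "{ended_at}"^^<{XSD_DATETIME}> .')
    return triples


def clean(rows, drop_empty=True):
    """Made-up ETL step: strip whitespace and optionally drop empty values."""
    stripped = [r.strip() for r in rows]
    return [r for r in stripped if r] if drop_empty else stripped


cleaned = with_provenance("Clean", clean, [" GRECO ", "", "LinkedDataBR"], drop_empty=True)
for line in triplify_provenance():
    print(line)
```

Keeping capture (the wrapper) separate from publication (the triplify step) mirrors the slide: the ETL run only writes to staging, and loading into the triple store happens afterwards.
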
  30. Implementation of PGA
     [Figure labels: Web Data Access, Schema Mappings, Identity Resolution]
     The Provenance Gathering Agent was implemented as a web service written in Scala (www.scala-lang.org)
  31. Use Case Scenario
  32. Use Case Scenario
     CNPq = Brazilian governmental organization responsible for fostering scientific research
     RNP = Brazilian governmental organization that finances research projects
  33. Use Case Scenario – First Part
  34. Use Case Scenario – First Part
  35. Querying Linked Open Provenance
     Gets research groups, projects and researchers from the data graphs of the domain dataset:

         SELECT ?group_name ?project_name ?researcher_uri ?process_name
         FROM NAMED <http://linkgraph.provenance.br>
         FROM NAMED <http://datagraph.provenance.br>
         FROM NAMED <http://www.cnpq.br>
         FROM NAMED <http://lattes.cnpq.br>
         WHERE {
           GRAPH <http://linkgraph.provenance.br> {
             ?row_uri provprop:cnpqResearchGroup ?group_uri .
             ?row_uri provprop:lattesProject ?project_uri .
             ?row_uri provprop:lattesResearcher ?researcher_uri .
           }
           GRAPH <http://datagraph.provenance.br> {
             ?row_uri opmv:wasGeneratedBy ?process_uri .
             ?process_uri provprop:composition ?process_def_uri .
             ?process_def_uri dcterms:title ?process_name .
           }
           GRAPH <http://www.cnpq.br> {
             ?group_uri cnpq:project ?project_uri .
             ?group_uri foaf:name ?group_name .
           }
           GRAPH <http://lattes.cnpq.br> {
             ?project_uri foaf:name ?project_name .
             ?researcher_uri foaf:name ?researcher_name .
           }
         }

     Data that were in different data sources of the CNPq organization are now integrated in the Web of Data.
  36. Querying Linked Open Provenance
     Also gets the integration process from the provenance graphs of the Linked Open Provenance dataset:

         SELECT ?group_name ?project_name ?researcher_uri ?process_name
         FROM NAMED <http://linkgraph.provenance.br>
         FROM NAMED <http://datagraph.provenance.br>
         FROM NAMED <http://www.cnpq.br>
         FROM NAMED <http://lattes.cnpq.br>
         WHERE {
           GRAPH <http://linkgraph.provenance.br> {
             ?row_uri provprop:cnpqResearchGroup ?group_uri .
             ?row_uri provprop:lattesProject ?project_uri .
             ?row_uri provprop:lattesResearcher ?researcher_uri .
           }
           GRAPH <http://datagraph.provenance.br> {
             ?row_uri opmv:wasGeneratedBy ?process_uri .
             ?process_uri provprop:composition ?process_def_uri .
             ?process_def_uri dcterms:title ?process_name .
           }
           GRAPH <http://www.cnpq.br> {
             ?group_uri cnpq:project ?project_uri .
             ?group_uri foaf:name ?group_name .
           }
           GRAPH <http://lattes.cnpq.br> {
             ?project_uri foaf:name ?project_name .
             ?researcher_uri foaf:name ?researcher_name .
           }
         }
  37. Querying Linked Open Provenance (query results)
     group_name | project_name | researcher_uri | process_name
     "GRECO - Grupo Engenharia do Conhecimento"@pt | "LinkedDataBR - Exposição, compartilhamento e conexão de recursos de dados abertos na Web (Linked Open Data)"@pt | http://lattes.cnpq.br/resource/Researcher/K4781460T3 | "Merge CNPq Research Groups x Lattes Projects"
     "GRECO - Grupo Engenharia do Conhecimento"@pt | "Núcleo de Pesquisa de Sistemas Computacionais Complexos para a Gestão de Emergências"@pt | http://lattes.cnpq.br/resource/Researcher/K4717449A7 | "Merge CNPq Research Groups x Lattes Projects"
     "GRECO - Grupo Engenharia do Conhecimento"@pt | "Identificação e Análise de Redes Sociais Complexas"@pt | http://lattes.cnpq.br/resource/Researcher/K4761314U5 | "Merge CNPq Research Groups x Lattes Projects"
  38. Use Case Scenario – Second Part
  39. Use Case Scenario – Second Part
  40. Use Case Scenario – Provenance Evaluation
     At the end of the execution of both processes, a SPARQL query could be used to ask: "In which projects does a researcher work?"
     The result would include projects declared in the CNPq dataset and in the RNP dataset.
     If the projects returned by CNPq diverge from those returned by RNP, the cause can be investigated by querying and evaluating LOP data.
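
The divergence check in slide 40 can be sketched as a set comparison in Python. This is only an illustration: it assumes the projects of one researcher have already been retrieved from each dataset as a set of titles, and the function name `compare_projects` and the project names are hypothetical.

```python
def compare_projects(cnpq_projects: set, rnp_projects: set) -> dict:
    """Split two project sets into the agreed part and the divergent parts."""
    return {
        "in_both": cnpq_projects & rnp_projects,
        "only_cnpq": cnpq_projects - rnp_projects,
        "only_rnp": rnp_projects - cnpq_projects,
    }


# Hypothetical query results for one researcher.
cnpq = {"Project A", "Project B"}
rnp = {"Project B", "Project C"}

report = compare_projects(cnpq, rnp)
for source in ("only_cnpq", "only_rnp"):
    for project in sorted(report[source]):
        print(f"{project}: {source}, inspect its LOP records")
```

Each project that appears in only one set is a starting point for a provenance query: the process that generated the corresponding row and the parameters it used can be looked up in the LOP data.
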
  41. Conclusion – Contributions
     – New strategy to provide provenance for data and links of the Web of Data
     – LOD cycle is extended with a systematic data preparation and transformation process, supported by an ETL workflow framework
     – Provenance data is available according to LOD principles (Linked Open Provenance)
  42. Conclusion – Future Works
     Development of a provenance query interface
     – Take advantage of LOP and support its exploration
     Development / evolution of a provenance ontology
     – Today, we are using a combination of vocabularies
     Investigation in the area of Big Data
     – Fine-grained provenance generates large volumes of data
  43. Thank You!
     LOP – Capturing and Linking Open Provenance on LOD Cycle
     – Rogers R. de Mendonça (1), rogers@ufrj.br
     – Jonas F. S. M. De La Cerda (2), jonas.ferreira@uniriotec.br
     – Kelli F. de Cordeiro (1), kelli@ufrj.br
     – Sérgio M. S. da Cruz (3), serra@ufrrj.br
     – Maria Cláudia Cavalcanti (2), yoko@ime.eb.br
     – Maria Luiza M. Campos (1), mluiza@ppgi.ufrj.br
     (1) Federal University of Rio de Janeiro - UFRJ
     (2) Military Institute of Engineering - IME
     (3) Federal Rural University of Rio de Janeiro - UFRRJ
