Towards Query Generation for PROV-O Data
Jun Zhao (1), Honghan Wu (2) and Jeff Z. Pan (2)
(1) Lancaster University
@junszhao | j.zhao5 ...
Slides for provAnalytics 2014 at Provenance Week: http://provenanceweek.org/2014/analytics/

  • wasGeneratedBy, startedAtTime, endedAtTime, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, wasInformedBy
  • 1. From prov:wasGeneratedBy

    PREFIX prov: <http://www.w3.org/ns/prov#>

    SELECT DISTINCT *
    WHERE {
      ?s prov:wasGeneratedBy ?o .
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#describedByParameter> ?o1 . }
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#wasOutputFrom> ?o3 . }
      OPTIONAL { ?s <http://www.w3.org/ns/prov#qualifiedGeneration> ?o4 . }
    }
    LIMIT 100

    2. From prov:used

    <http://purl.org/wf4ever/wfprov#usedInput>; 1
    rdfs:label; 1
    prov:endedAtTime; 1
    prov:startedAtTime; 1
    prov:qualifiedAssociation; 1
    prov:qualifiedUsage; 1
    <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.98
    <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.98

    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT DISTINCT *
    WHERE {
      ?s prov:used ?o .
      ?s <http://purl.org/wf4ever/wfprov#usedInput> ?o1 .
      ?s rdfs:label ?o2 .
      ?s prov:endedAtTime ?o3 .
      ?s prov:startedAtTime ?o4 .
      ?s prov:qualifiedAssociation ?o5 .
      ?s prov:qualifiedUsage ?o6 .
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o7 . }
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o8 . }
    }
    LIMIT 100

    3. From prov:wasDerivedFrom

    <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage>; 1
    <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace>; 1

    PREFIX prov: <http://www.w3.org/ns/prov#>

    SELECT DISTINCT *
    WHERE {
      ?s prov:wasDerivedFrom ?o .
      ?s <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage> ?o1 .
      ?s <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace> ?o2 .
    }
    LIMIT 100

    4. From prov:startedAtTime and prov:endedAtTime; these produce results similar to query 2

    rdfs:label; 1
    prov:endedAtTime; 1
    prov:qualifiedAssociation; 1

    <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.97
    <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.97
    prov:qualifiedUsage; 0.90
    prov:used; 0.90
    <http://purl.org/wf4ever/wfprov#usedInput>; 0.90

    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT DISTINCT *
    WHERE {
      ?s prov:startedAtTime ?o .
      ?s rdfs:label ?o1 .
      ?s prov:endedAtTime ?o2 .
      ?s prov:qualifiedAssociation ?o3 .
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o4 . }
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o5 . }
      OPTIONAL { ?s <http://purl.org/wf4ever/wfprov#usedInput> ?o6 . }
      OPTIONAL { ?s prov:qualifiedUsage ?o7 . }
      OPTIONAL { ?s prov:used ?o8 . }
    }
    LIMIT 100
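
    In queries 2 to 4 above, properties listed with a score of 1 appear as required triple patterns, and properties with a lower score are wrapped in OPTIONAL. A minimal Python sketch of that expansion step follows. The function name `build_query`, the `threshold` parameter and the rule "score below threshold means OPTIONAL" are assumptions read off those queries, not a documented ProvQ interface. Query 1 lists no scores, so nothing is assumed about it here.

```python
# Hypothetical sketch: expand a seed PROV-O property into a SPARQL query.
# The required/OPTIONAL split by score is an assumption inferred from the notes.

PREFIXES = (
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def build_query(seed, associations, threshold=1.0, limit=100):
    """Return a SPARQL query string for `seed` and its associated properties.

    `associations` is a list of (property, score) pairs, where each property
    is already written as a prefixed name or an <IRI>.
    """
    lines = ["SELECT DISTINCT *", "WHERE {", f"  ?s {seed} ?o ."]
    for i, (prop, score) in enumerate(associations, start=1):
        pattern = f"?s {prop} ?o{i} ."
        if score >= threshold:
            # Property always co-occurs with the seed: required pattern.
            lines.append(f"  {pattern}")
        else:
            # Property only sometimes co-occurs: keep the row even if absent.
            lines.append(f"  OPTIONAL {{ {pattern} }}")
    lines += ["}", f"LIMIT {limit}"]
    return PREFIXES + "\n".join(lines)


example = build_query(
    "prov:used",
    [
        ("rdfs:label", 1.0),
        ("<http://purl.org/wf4ever/wfprov#describedByProcess>", 0.98),
    ],
)
print(example)
```

    Keeping low-score properties in OPTIONAL means the query still returns every subject that has the seed property, which matches the shape of the queries in these notes.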


  • Three queries were largely the same in both tools, three were returned only by K-Drive, and the rest overlapped to varying degrees.
    One query (prov:wasDerivedFrom) was not returned by K-Drive.
  • Query-generation-for-provo-data-201406

    1. Towards Query Generation for PROV-O Data
       Jun Zhao (1), Honghan Wu (2) and Jeff Z. Pan (2)
       (1) Lancaster University: @junszhao | j.zhao5 at lancaster.ac.uk
       (2) University of Aberdeen: honghan.wu | jeff.z.pan at abdn.ac.uk
    2. Outline
       • Motivation
       • Profile-driven query generation
         – K-Drive
         – ProvQ
       • Result discussion
       • Future work
    3. The Big Picture of PROV: A Motivation Scenario
       http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
    4. The Big Picture of PROV: A Motivation Scenario
       Adapted from: http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
       Provenance information
    5. The Big Picture of PROV: A Motivation Scenario
       http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
    6. Provenance in the Wild vs. ProvBench
       Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
       Domains: workflow / scientific domain, Web resources, social domain
       • 11 repositories so far
       • Various representations
       • Spanning different domains
       • Openly accessible under different open licenses
       https://github.com/provbench
       https://sites.google.com/site/provbench/home
    7. Next Step: Access PROV Datasets
       Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
       • Can we query across them?
       • Can we learn something by querying across them?
       • What can we do with them?
       • ……
    8. Query Generation: A Bottom-up Approach
       Datasets: Taverna-PROV, Wings PROV, Wikipedia-PROV, OBIAMA (social simulation)
       Pipeline: Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for PROV-O datasets
       Example profiles:
       • Class associations
       • Property associations
    9. Query Generation: A First Step
       Pipeline: a PROV dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
       Example profiles:
       • Class associations
       • Property associations
    10. K-Drive Query Generation
        • University of Aberdeen
        • A generic query generation tool for semantic web data
        • Find key sub-graphs in the RDF data
          – Big City: the most instantiated concepts in the data
          – Big Road: the most frequent relations connecting those big cities
        http://www.kdrive-project.eu, EU FP7 Marie-Curie 286348
        Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116
        Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
    11. K-Drive Generator
        Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
    12. K-Drive Generator
        Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html

            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

            SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
            WHERE {
              ?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
              ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
              ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
              ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
            }
    13. ProvQ: Property Association Mining
        Pipeline: a PROV dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
        • Discover properties that are used together with each PROV-O property
        • Expand a set of “seed” PROV-O queries using the discovered associated properties
        https://github.com/junszhao/ProvQ
    14. ProvQ: Property Association Mining
        • Advantages
          – Reduces the performance challenge usually faced in association rule mining
          – Produces provenance-centric queries
        • Disadvantages
          – Could miss queries that are not related to PROV-O terms at all
    15. Expanding Starting Queries
    16. Approach Walk-Through
        • Given a seed atomic query, we have a seed property
        • We find all properties used together with it:
          – http://purl.org/wf4ever/wfprov#describedByParameter
          – http://purl.org/wf4ever/wfprov#wasOutputFrom
          – http://www.w3.org/ns/prov#qualifiedGeneration
        • Return the resulting conjunctive SPARQL query
    17. Results Comparison
        • K-Drive Generator
          – 7 queries
          – 3 of them are not exactly provenance queries
          – Probably easier to understand because classes are included in the queries
          – But queries can be complex
        • ProvQ
          – 7 queries
          – 1 not returned by K-Drive (prov:wasDerivedFrom)
          – Only provenance queries are returned
          – Queries are simple, based on property associations starting from “seed” PROV-O properties
        https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
    18. Future Work
        • Define and evaluate usefulness
        • Test against more datasets
        • Experiment with reasoning
        • Query generation across multiple datasets
    19. Thank you!
        These slides have been created by Jun Zhao.
        This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license: http://creativecommons.org/licenses/by-nc-sa/3.0/
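
The property association step in the ProvQ slides ("discover properties that are used together with each PROV-O property") can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ProvQ implementation: the function name `property_associations` is hypothetical, triples are plain `(subject, property, object)` tuples, and the score is taken to be the standard association-rule confidence, that is, the share of subjects carrying the seed property that also carry the other property.

```python
# Hypothetical sketch of property association mining over RDF-like triples.
# Score = (subjects with seed AND p) / (subjects with seed); this definition
# is an assumption, chosen because it is the usual association-rule confidence.
from collections import defaultdict


def property_associations(triples, seed):
    """Map each property co-occurring with `seed` on a subject to its score."""
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)

    # Only subjects that use the seed property matter for this seed.
    seed_subjects = [ps for ps in props_by_subject.values() if seed in ps]
    if not seed_subjects:
        return {}

    counts = defaultdict(int)
    for props in seed_subjects:
        for prop in props - {seed}:
            counts[prop] += 1
    return {prop: n / len(seed_subjects) for prop, n in counts.items()}


# Toy data, invented for illustration only.
toy_triples = [
    ("act1", "prov:used", "ent1"),
    ("act1", "rdfs:label", "step 1"),
    ("act2", "prov:used", "ent2"),
    ("act2", "rdfs:label", "step 2"),
    ("act2", "prov:endedAtTime", "2014-06-01T10:00:00"),
    ("ent1", "prov:wasGeneratedBy", "act0"),
]
scores = property_associations(toy_triples, "prov:used")
print(scores)
```

Restricting the count to subjects that carry a seed PROV-O property keeps the search space small, which is in line with the advantage listed on slide 14, and it also explains the stated disadvantage: properties that never co-occur with a PROV-O term are never seen.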