Query-generation-for-provo-data-201406
Query-generation-for-provo-data for provAnalytics 2014 at Provenance Week: http://provenanceweek.org/2014/analytics/

  • wasGeneratedBy, startedAtTime, endedAtTime, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, wasInformedBy
  • 1. From prov:wasGeneratedBy:

      Select distinct *
      where {
        ?s prov:wasGeneratedBy ?o .
        optional {?s ?o1.}
        optional {?s ?o3 .}
        optional {?s ?o4 .}
      }
      limit 100

    2. From prov:used:

      Associated properties (property; score):
      ; 1
      rdfs:label; 1
      prov:endedAtTime; 1
      prov:startedAtTime; 1
      prov:qualifiedAssociation; 1
      prov:qualifiedUsage; 1
      ; 0.98
      ; 0.98

      Select distinct *
      where {
        ?s prov:used ?o .
        ?s ?o1 .
        ?s rdfs:label ?o2 .
        ?s prov:endedAtTime ?o3 .
        ?s prov:startedAtTime ?o4 .
        ?s prov:qualifiedAssociation ?o5 .
        ?s prov:qualifiedUsage ?o6 .
        optional {?s ?o7 .}
        optional {?s ?o8 .}
      }
      limit 100

    3. From prov:wasDerivedFrom:

      Associated properties (property; score):
      ; 1
      ; 1

      Select distinct *
      where {
        ?s prov:wasDerivedFrom ?o .
        ?s ?o1.
        ?s ?o2 .
      }
      limit 100

    4. From prov:startedAtTime and prov:endedAtTime (produces a result similar to query 2):

      Associated properties (property; score):
      rdfs:label; 1
      prov:endedAtTime; 1
      prov:qualifiedAssociation; 1
      ; 0.97
      ; 0.97
      prov:qualifiedUsage; 0.90
      prov:used; 0.90
      ; 0.90

      Select distinct *
      where {
        ?s prov:startedAtTime ?o .
        ?s rdfs:label ?o1 .
        ?s prov:endedAtTime ?o2 .
        ?s prov:qualifiedAssociation ?o3 .
        optional {?s ?o4 .}
        optional {?s ?o5 .}
        optional {?s ?o6 .}
        optional {?s prov:qualifiedUsage ?o7 .}
        optional {?s prov:used ?o8 .}
      }
      limit 100
  • Three queries were largely the same in both tools, three queries were returned only by K-Drive, and the rest overlapped to different degrees. One query (prov:wasDerivedFrom) was not returned by K-Drive.

Transcript

  • 1. Towards Query Generation for PROV-O Data. Jun Zhao¹, Honghan Wu² and Jeff Z. Pan². ¹Lancaster University (@junszhao | j.zhao5 at lancaster.ac.uk); ²University of Aberdeen (honghan.wu | jeff.z.pan at abdn.ac.uk)
  • 2. Outline • Motivation • Profile-driven query generation – K-Drive – ProvQ • Result discussion • Future work
  • 3. The Big Picture of PROV: A Motivation Scenario http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
  • 4. The Big Picture of PROV: A Motivation Scenario (figure label: Provenance information). Adapted from: http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
  • 5. The Big Picture of PROV: A Motivation Scenario http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
  • 6. Provenance in the Wild vs. ProvBench: Taverna-PROV, Vistrails-PROV, Wings-PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation); domains: workflow / scientific, Web resources, social • 11 repositories so far • Various representations • Across different domains • Openly accessible under different open licenses https://github.com/provbench https://sites.google.com/site/provbench/home
  • 7. Next Step: Accessing PROV Datasets (Taverna-PROV, Vistrails-PROV, Wings-PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)) • Can we query across them? • Can we learn something by querying across them? • What can we do with them? ……
  • 8. Query Generation: A Bottom-up Approach. Datasets (Taverna-PROV, Wings-PROV, Wikipedia-PROV, OBIAMA (social simulation)) → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for PROV-O datasets. Example profiles: • Class associations • Property associations
  • 9. Query Generation: A First Step. A PROV dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset. Example profiles: • Class associations • Property associations
  • 10. K-Drive Query Generation • University of Aberdeen • A generic query generation tool for Semantic Web data • Finds key sub-graphs in the RDF data – Big City: the most instantiated concepts in the data – Big Road: the most frequent relations connecting those big cities. Slide credit: Dr Wu at Scottish Linked Data Workshop 2014. http://www.kdrive-project.eu EU FP7 Marie-Curie 286348. Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116.
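The Big City / Big Road idea can be sketched in a few lines. This is a minimal illustration, assuming the RDF data is already loaded as a list of (subject, predicate, object) string tuples using prefixed names; the functions `big_cities` and `big_roads`, the top-k cut-off `k`, and the toy data are all hypothetical and are not K-Drive's actual implementation (Pan et al., K-CAP 2013, describe the real method).

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def big_cities(triples, k=3):
    """Return the k classes with the most rdf:type instances (hypothetical "Big Cities")."""
    counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(k)]


def big_roads(triples, cities, k=3):
    """Return the k most frequent (subject class, property, object class) links
    between instances of the given classes (hypothetical "Big Roads")."""
    cities = set(cities)
    # Collect the classes of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Count the link once for every pair of big-city classes it connects.
        for cs in types.get(s, ()):
            for co in types.get(o, ()):
                if cs in cities and co in cities:
                    roads[(cs, p, co)] += 1
    return [road for road, _ in roads.most_common(k)]


# Hypothetical toy data: prefixed names stand in for full IRIs.
triples = [
    ("ex:gen1", "rdf:type", "prov:Generation"),
    ("ex:gen2", "rdf:type", "prov:Generation"),
    ("ex:gen3", "rdf:type", "prov:Generation"),
    ("ex:run1", "rdf:type", "prov:Activity"),
    ("ex:run2", "rdf:type", "prov:Activity"),
    ("ex:out1", "rdf:type", "prov:Entity"),
    ("ex:gen1", "prov:activity", "ex:run1"),
    ("ex:gen2", "prov:activity", "ex:run2"),
    ("ex:gen3", "prov:activity", "ex:run2"),
    ("ex:out1", "prov:qualifiedGeneration", "ex:gen1"),
]

print(big_cities(triples, k=2))
print(big_roads(triples, big_cities(triples, k=3), k=2))
```

Ranking by raw counts is the simplest reading of "most instantiated" and "most frequent"; a real tool would likely also normalise or threshold these counts.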
  • 11. K-Drive Generator Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
  • 12. K-Drive Generator Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html SELECT ?Generation ?x4_1 ?x3_1 ?x0_1 WHERE { ?Generation rdf:type <http://www.w3.org/ns/prov#Generation>. ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 . ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 . ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation . }
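The generated query above is centred on one class (prov:Generation), with some properties pointing away from the typed variable and one pointing into it. A minimal sketch of how such a class-centred query string could be assembled follows; the function `class_centred_query` and its `?outN` / `?inN` variable names are hypothetical, not the K-Drive generator's code. It uses the SPARQL keyword `a` (shorthand for rdf:type) so that no PREFIX declaration is needed.

```python
def class_centred_query(cls, outgoing, incoming):
    """Build a SPARQL SELECT around one class IRI.

    The focus variable is typed with `cls`; each IRI in `outgoing` links the
    focus variable to a new variable, and each IRI in `incoming` links a new
    variable to the focus variable.
    """
    focus = "?" + cls.split("#")[-1]  # e.g. ...prov#Generation -> ?Generation
    variables = [focus]
    patterns = [f"{focus} a <{cls}> ."]
    for i, prop in enumerate(outgoing):
        var = f"?out{i}"
        variables.append(var)
        patterns.append(f"{focus} <{prop}> {var} .")
    for i, prop in enumerate(incoming):
        var = f"?in{i}"
        variables.append(var)
        patterns.append(f"{var} <{prop}> {focus} .")
    body = "\n  ".join(patterns)
    return f"SELECT {' '.join(variables)} WHERE {{\n  {body}\n}}"


PROV = "http://www.w3.org/ns/prov#"
query = class_centred_query(
    PROV + "Generation",
    outgoing=[PROV + "activity", PROV + "hadRole"],
    incoming=[PROV + "qualifiedGeneration"],
)
print(query)
```

Passing the class together with its most frequent outgoing and incoming properties yields a query of the same shape as the demo output.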
  • 13. ProvQ: Property Association Mining. A PROV dataset → Provenance Data Profile Generator: discover the properties that are used together with each PROV-O property → Provenance Query Builder: expand a set of “seed” PROV-O queries using the discovered associated properties → SPARQL queries for the PROV-O dataset. https://github.com/junszhao/ProvQ
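A minimal sketch of the property-association step, assuming triples are (subject, predicate, object) tuples and assuming the score of an associated property is the fraction of subjects using the seed property that also use that property (our reading of the "property; score" lists in the slide notes, not a confirmed description of ProvQ). The function names, the `min_conf` threshold and the toy data are hypothetical. Because only (seed, property) pairs are scored, there is no need to enumerate arbitrary property sets, which is one plausible reading of the performance advantage claimed on the next slide.

```python
from collections import defaultdict


def property_profile(triples):
    """Map each subject to the set of properties it uses."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return props


def associated_properties(triples, seed, min_conf=0.9):
    """Return {property: score} for properties co-occurring with `seed`.

    score = (#subjects using both seed and property) / (#subjects using seed).
    Properties scoring below `min_conf` are dropped; results are sorted by score.
    """
    profile = property_profile(triples)
    subjects = [s for s, ps in profile.items() if seed in ps]
    if not subjects:
        return {}
    counts = defaultdict(int)
    for s in subjects:
        for p in profile[s] - {seed}:
            counts[p] += 1
    result = {}
    for p, n in counts.items():
        score = n / len(subjects)
        if score >= min_conf:
            result[p] = round(score, 2)
    return dict(sorted(result.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical toy data: four activities using prov:used.
triples = [
    ("ex:a1", "prov:used", "ex:e1"),
    ("ex:a1", "prov:startedAtTime", "2014-06-01"),
    ("ex:a1", "rdfs:label", "run 1"),
    ("ex:a1", "prov:wasInformedBy", "ex:a0"),
    ("ex:a2", "prov:used", "ex:e2"),
    ("ex:a2", "prov:startedAtTime", "2014-06-02"),
    ("ex:a2", "rdfs:label", "run 2"),
    ("ex:a3", "prov:used", "ex:e3"),
    ("ex:a3", "prov:startedAtTime", "2014-06-03"),
    ("ex:a3", "rdfs:label", "run 3"),
    ("ex:a4", "prov:used", "ex:e4"),
    ("ex:a4", "prov:startedAtTime", "2014-06-04"),
    ("ex:e1", "prov:wasGeneratedBy", "ex:a1"),
]

print(associated_properties(triples, "prov:used", min_conf=0.5))
```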
  • 14. ProvQ: Property Association Mining • Advantages – Reduces the performance challenge usually faced in association rule mining – Produces provenance-centric queries • Disadvantages – Could miss queries that are not related to PROV-O terms at all
  • 15. Expanding Starting Queries
  • 16. Approach Walk-Through • Given a seed atomic query, we have a seed property: • We find all properties used together with it: – http://purl.org/wf4ever/wfprov#describedByParameter – http://purl.org/wf4ever/wfprov#wasOutputFrom – http://www.w3.org/ns/prov#qualifiedGeneration • Return the resulting conjunctive SPARQL query
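The expansion step can be sketched as follows, assuming the associated properties arrive as a {property: score} dictionary using prefixed names. In the queries in the slide notes, properties scored 1 appear as required triple patterns and lower-scored ones inside OPTIONAL, so this sketch follows that pattern; the function `expand_seed_query` and its `limit` parameter are hypothetical, not ProvQ's code.

```python
PREFIXES = (
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def expand_seed_query(seed, associations, limit=100):
    """Expand the atomic seed pattern `?s seed ?o` into a conjunctive query.

    Properties with score 1 become required patterns; the rest are wrapped
    in OPTIONAL. Each added pattern binds a fresh variable ?o1, ?o2, ...
    """
    lines = [f"  ?s {seed} ?o ."]
    required = [p for p, score in associations.items() if score >= 1.0]
    optional = [p for p, score in associations.items() if score < 1.0]
    i = 1
    for p in required:
        lines.append(f"  ?s {p} ?o{i} .")
        i += 1
    for p in optional:
        lines.append(f"  OPTIONAL {{ ?s {p} ?o{i} . }}")
        i += 1
    body = "\n".join(lines)
    return PREFIXES + f"SELECT DISTINCT * WHERE {{\n{body}\n}}\nLIMIT {limit}"


q = expand_seed_query("prov:used", {"prov:startedAtTime": 1.0, "rdfs:label": 0.75})
print(q)
```

Keeping partially supported properties OPTIONAL means the expanded query still matches subjects that lack them, so it returns as many results as the seed query alone would.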
  • 17. Results Comparison • K-Drive Generator – 7 queries – 3 of them are not strictly provenance queries – Probably easier to understand because classes are included in the queries – But queries can be complex • ProvQ – 7 queries – 1 not returned by K-Drive (prov:wasDerivedFrom) – Only provenance queries are returned – Queries are simple, based on property associations starting from “seed” PROV-O properties https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
  • 18. Future Work • Define and evaluate usefulness • Test against more datasets • Experiment with reasoning • Query generation across multiple datasets
  • 19. Thank you! These slides have been created by Jun Zhao This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported http://creativecommons.org/licenses/by-nc-sa/3.0/