Using Web Data Provenance for Quality Assessment
Olaf Hartig*
Jun Zhao˚




*Humboldt-Universität zu Berlin ...
With these slides I presented our paper at the provenance workshop (SWPM) at the International Semantic Web Conference (ISWC), Oct. 2009.


Transcript of "Using Web Data Provenance for Quality Assessment"

  1. Using Web Data Provenance for Quality Assessment
     Olaf Hartig* and Jun Zhao˚
     *Humboldt-Universität zu Berlin, ˚University of Oxford
  2. Information Quality (IQ)
     ● Common definition: fitness for use of information
     ● Multidimensional concept (classification by Wang and Strong, 1996):
          Category           Criteria / Dimensions
          Intrinsic          Accuracy, Believability, Objectivity, ...
          Contextual         Completeness, Relevance, Timeliness, ...
          Representational   Conciseness, Understandability, ...
          Accessibility      Availability, Security, ...
     ● IQ criteria are not independent of each other
     ● Relevancy of criteria is determined by task and preferences
     Olaf Hartig - Using Web Data Provenance for Quality Assessment
  3. IQ Assessment
     ● Assigning numerical values (IQ scores) to IQ criteria
     ● It is difficult!
          ● Precision vs. practicality
     ● Manual methods: questionnaires
     ● Semi-automatic methods: rating-based, reputation-based
  4. Automated IQ Assessment
     ● Literature only outlines ideas for automatic methods
     ● Content analysis
          ● Comparison (e.g. outlier detection)
          ● Application of information retrieval methods
          ● Analysis of results from data cleansing
          ● Sampling techniques
     ● Context analysis
          ● Analysis of metadata
          ● Utilization of domain knowledge
  5. Our Goal: Methods to automatically assess IQ criteria of Web data
     Primary means: provenance of the assessed data
  6. Outline
     1. Web Data Provenance
     2. General Assessment Approach
     3. Development of Assessment Methods
  7. Existing Provenance Research
     ● Main research areas: (scientific) workflows, DBMSs
     ● General focus: data creation
  8. Provenance of Web Data
  9. Provenance of Web Data
     Web data provenance comprises two dimensions: data creation and data access
  10. Model of Web Data Provenance
     ● Provenance graph describes the provenance of a data item
          ● Nodes: provenance elements (pieces of provenance information)
          ● Edges: relate provenance elements to each other
          ● Subgraphs for related data items possible
  11. Model of Web Data Provenance
     ● Provenance model defines:
          ● Types of provenance elements: actors, executions, artifacts
          ● Relationships
  12. Data Access Dimension
     [Diagram. Element types: Data Item, Document, Data Access (with Execution Time), Data Accessor (non-human), Data Providing Service (non-human), Service Provider, Data Publisher (human), Relation to the provided Information Resource. Relationship labels: contains, performs, retrieved by, accessed, controls, uses.]
  13. Data Access Dimension cont.
     [Diagram. Element types: (Verified) Artifact, Integrity Verification, Verification Result, Signer, Signature Verification, Relation to the signed Data, Signature Method. Constraint label: {incomplete}.]
  14. Data Creation Dimension
     [Diagram. Element types: Data Item, (Encompassing) Data Item, Source Data, Data Creation (with Execution Time), Creation Guidelines, Data Creator (human or non-human), Data Creating Device (e.g. sensor), Data Creating Service (e.g. software agent), Data Creating Entity (e.g. person, group, organization), Relation to the created Data, Provenance Information. Relationship labels: part of, responsible for. Constraint label: {complete,disjoint}.]
  15. Outline
     1. Web Data Provenance
     2. General Assessment Approach
     3. Development of Assessment Methods
  16. A General Approach
     ● Blueprint for actual assessment methods that
          ● Address a specific scenario
          ● Focus on a specific IQ criterion
     ● Provenance elements have an influence on IQ
          ● Impact values represent these influences
     ● Assessment is affected by knowing about the influences
          ● Calculation of the IQ score with an assessment function that combines all impact values
  17. General Assessment Procedure
     Step 1 – Generate a provenance graph for the data item
     Step 2 – Annotate the provenance graph with impact values
     Step 3 – Execute the assessment function
  18. Outline
     1. Web Data Provenance
     2. General Assessment Approach
     3. Development of Assessment Methods
  19. Designing Assessment Methods
     ● Developing the general approach into an actual method
     ● Fundamental design question: For which IQ criterion do we want to apply the method?
  20. Designing Assessment Methods
     ● Developing the general approach into an actual method
     ● Fundamental design question: For which IQ criterion do we want to apply the method?
     ● Timeliness: degree to which the data item is up-to-date with respect to the task at hand
     ● Representation (following Ballou et al., 1998) as an absolute measure in [0,1]
          ● 1: meeting the strictest timeliness standards
          ● 0: unacceptable
  21. Step 1 – Generate the Provenance Graph
     ● What types of provenance elements are necessary?
     ● What level of detail (i.e. granularity) is necessary?
     ● Where and how do we get provenance information?
          ● Two complementary options: recording, analyzing metadata
  22. Step 1 – Generate the Provenance Graph
     Example:
     ● Sensors (e.g. sensor1) take a measurement (e.g. msr) every hour
     ● All measurements are stored in a Web-accessible storage device (store)
     ● Our system (sys) accesses them for further processing
     ● sys assesses the timeliness of every msr
  23. Step 1 – Generate the Provenance Graph
     Example as on slide 22, with the resulting provenance graph.
     Nodes:
          msr      type: Data Item
          cExc     type: Data Creation, Execution Time: 10:00
          sensor1  type: Data Creator
          doc      type: Document
          aExc     type: Data Access, Execution Time: 10:13
          store    type: Data Providing Service
          sys      type: Data Accessor
     Edges:
          msr   created by    cExc
          cExc  performed by  sensor1
          msr   contained by  doc
          doc   retrieved by  aExc
          aExc  accessed      store
          aExc  performed by  sys
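The sensor example on slide 23 can be written down as a small labelled graph. The following is only a minimal sketch: the class names `ProvElement` and `ProvGraph`, their methods, and the attribute key `execution_time` are hypothetical and do not come from the slides or the paper; only the node names, types, edge labels and times are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ProvElement:
    """A node of the provenance graph: one piece of provenance information."""
    name: str
    kind: str                       # element type, e.g. "Data Item"
    attrs: dict = field(default_factory=dict)


@dataclass
class ProvGraph:
    """Provenance elements plus labelled, directed edges between them."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (source, label, target)

    def add(self, name, kind, **attrs):
        self.nodes[name] = ProvElement(name, kind, attrs)

    def relate(self, source, label, target):
        self.edges.append((source, label, target))

    def targets(self, source, label):
        """Names of all nodes reached from `source` via edges labelled `label`."""
        return [t for s, lbl, t in self.edges if s == source and lbl == label]


def build_sensor_example():
    """Provenance graph of the measurement msr as shown on slide 23."""
    g = ProvGraph()
    g.add("msr", "Data Item")
    g.add("cExc", "Data Creation", execution_time="10:00")
    g.add("sensor1", "Data Creator")
    g.add("doc", "Document")
    g.add("aExc", "Data Access", execution_time="10:13")
    g.add("store", "Data Providing Service")
    g.add("sys", "Data Accessor")
    g.relate("msr", "created by", "cExc")
    g.relate("cExc", "performed by", "sensor1")
    g.relate("msr", "contained by", "doc")
    g.relate("doc", "retrieved by", "aExc")
    g.relate("aExc", "accessed", "store")
    g.relate("aExc", "performed by", "sys")
    return g


graph = build_sensor_example()
print(graph.targets("msr", "created by"))   # ['cExc']
```

Storing edges as plain triples keeps the sketch close to the node/edge description on slide 10; a real system would more likely record or extract this information from metadata, as slide 21 suggests.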
  24. Step 2 – Annotation with Impact Values
     ● How might each provenance element influence the IQ criterion?
          ● Systematically analyze each type of provenance element
     ● What kind of impact values are necessary?
     ● How do we represent the influences by impact values?
          ● Impact values are not necessarily numerical
          ● Depends on the assessment function in step 3
     ● How do we determine impact values?
  25. Determining Impact Values
     ● From the provenance information
     ● From user input
          ● Configuration options
          ● Rating-based, reputation-based
     ● By content analysis
          ● Comparison (e.g. outlier detection)
          ● Adoption of information retrieval methods
          ● Adoption of data cleansing techniques
     ● By context analysis
          ● Further metadata
          ● Domain knowledge
  26. Step 2 – Annotation with Impact Values
     How might each provenance element influence the IQ criterion?
     Data Creation Dimension:
          Prov. Element Type     Impact Values
          Data Creation          creation time, weights
          Creation Guidelines    -
          (Source) Data Item     expiry time
          Data Creator           -
  27. Step 2 – Annotation with Impact Values
     Provenance graph of slide 23, shown next to the impact value table of slide 26.
  28. Step 2 – Annotation with Impact Values
     Provenance graph of slide 23 with the first annotation: creation time 10:00 (from the Execution Time of cExc). Impact value table as on slide 26.
  29. Step 2 – Annotation with Impact Values
     Provenance graph of slide 23 with both annotations: creation time 10:00 and expiry time 11:00. Impact value table as on slide 26.
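The annotation step for the timeliness example might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function `annotate`, the helper `hhmm`, the dictionary layout, and the idea of deriving the expiry time as creation time plus a user-configured validity period are all assumptions. The slides only state the resulting values (creation time 10:00, expiry time 11:00) and that sensors measure hourly.

```python
from datetime import timedelta


def hhmm(text):
    """Parse 'HH:MM' into a timedelta since midnight (the slides give no dates)."""
    hours, minutes = map(int, text.split(":"))
    return timedelta(hours=hours, minutes=minutes)


# Data creation part of the example graph (slide 23), as plain dictionaries.
nodes = {
    "msr": {"type": "Data Item"},
    "cExc": {"type": "Data Creation", "execution_time": hhmm("10:00")},
    "sensor1": {"type": "Data Creator"},
}
edges = [("msr", "created by", "cExc"), ("cExc", "performed by", "sensor1")]


def annotate(nodes, edges, validity):
    """Return impact values per provenance element.

    Data Creation elements get a creation time taken from the provenance
    information itself; data items get an expiry time derived from it.
    `validity` is an assumed configuration option (user input).
    """
    impact = {name: {} for name in nodes}
    for name, node in nodes.items():
        if node["type"] == "Data Creation" and "execution_time" in node:
            impact[name]["creation_time"] = node["execution_time"]
    for source, label, target in edges:
        creation = impact.get(target, {}).get("creation_time")
        if label == "created by" and creation is not None:
            impact.setdefault(source, {})["expiry_time"] = creation + validity
    return impact


impact = annotate(nodes, edges, validity=timedelta(hours=1))
print(impact["cExc"]["creation_time"])   # 10:00:00
print(impact["msr"]["expiry_time"])      # 11:00:00
```

Element types that carry no impact value for timeliness in the table on slide 26 (here the Data Creator sensor1) simply keep an empty annotation.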
  30. Step 3 – Assessment Function
     ● How do we represent the IQ criterion by an IQ score?
     ● What does the assessment function look like?
          ● Develop the function together with the impact values
          ● Take incompleteness into consideration
               ● Provenance graphs could be fragmentary
               ● Annotations could be missing
  31. Step 3 – Assessment Function
  32. Step 3 – Assessment Function
     Annotated provenance graph as on slide 29 (creation time 10:00, expiry time 11:00).
  33. Step 3 – Assessment Function
     Annotated provenance graph as on slide 29 (creation time 10:00, expiry time 11:00).
  34. Step 3 – Assessment Function
     t(msr) = 1 - (10:15 - 10:00) / (11:00 - 10:00) = 1 - 0.25 h / 1 h = 0.75
     Annotated provenance graph as on slide 29 (creation time 10:00, expiry time 11:00).
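The calculation on slide 34 can be reproduced with a few lines of code. This is a sketch under assumptions: the function name `timeliness` and the helper `hhmm` are hypothetical; reading 10:15 as the time of assessment is an interpretation (the graph records the access at 10:13, while the slide computes with 10:15); clamping to [0,1] and returning `None` for missing annotations are one possible way to respect the [0,1] range of slide 20 and the incompleteness concern of slide 30, not something the slides prescribe.

```python
from datetime import timedelta


def hhmm(text):
    """Parse 'HH:MM' into a timedelta since midnight (the slides give no dates)."""
    hours, minutes = map(int, text.split(":"))
    return timedelta(hours=hours, minutes=minutes)


def timeliness(creation_time, expiry_time, assessment_time):
    """Timeliness score in [0, 1], or None if it cannot be determined.

    Computes 1 - (assessment - creation) / (expiry - creation),
    the form used on slide 34.
    """
    # Incomplete provenance: a missing impact value yields no score.
    if creation_time is None or expiry_time is None:
        return None
    lifetime = expiry_time - creation_time
    if lifetime <= timedelta(0):
        return None          # expiry not after creation: score undefined
    age = assessment_time - creation_time
    score = 1 - age / lifetime   # timedelta / timedelta gives a float
    # Assumption: clamp so that expired data scores 0 instead of going negative.
    return max(0.0, min(1.0, score))


score = timeliness(hhmm("10:00"), hhmm("11:00"), hhmm("10:15"))
print(score)   # 0.75
```

With the recorded access time 10:13 instead of 10:15, the same function would give a slightly higher score, so the choice of reference time is a design decision of the concrete assessment method.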
  35. Conclusion
     ● Web Data Provenance (data creation + data access)
     ● General approach for provenance-based IQ assessment
          ● Impact values: influence of provenance elements on IQ
     ● Design decisions for actual assessment methods
     ● Application to timeliness (more in the paper)
     ● Future work:
          ● How do we deal with incompleteness?
          ● Application of the approach to other IQ criteria
  36. These slides have been created by Olaf Hartig (http://olafhartig.de)
     This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
     Attribution:
     ● http://www.flickr.com/photos/rrrrred/3809362767/
     ● http://www.hasslefreeclipart.com