User Requirements for Geospatial Provenance

  1. User Requirements for Geospatial Provenance
     Date: 09/06/2014
     Daniel Garijo, Andreas Harth, Yolanda Gil
     Ontology Engineering Group, Universidad Politécnica de Madrid; Information Sciences Institute, University of Southern California; Institute AIFB, Karlsruhe Institute of Technology
  2. Problem statement
     Maps can integrate many different sources:
     • OpenStreetMap
     • GeoNames
     • CIA World Factbook
     • Etc.
     Interaction to standardize
  3. Outline
     1. Challenges
     2. Assumptions
     3. Types of provenance in the geospatial domain
        1. Provenance of datasets and sets of datasets
        2. Provenance of objects and sets of objects
        3. Provenance of properties and sets of properties
        4. Other requirements related to provenance
     4. Modeling geospatial provenance with PROV-O
        1. Dataset level provenance
           • Updating a map
        2. Object level provenance
        3. Property level provenance
     5. Summary
     6. Conclusions and Future work
  4. Challenges concerning provenance
     • Versioning and provenance (map updates)
     • Trust based provenance
     • Data integration and provenance
     • Crowdsourcing and provenance
     • Granularity and provenance
     • Aggregation and provenance
  5. Assumptions
     Simplifying the problem…
     • The entities across datasets have been mapped.
     • The datasets share the same data model and vocabulary.
     • Each dataset contains objects with unique identifiers.
     • The integrated map is going to be presented to a user who is interested in using the information for some purpose.
  6. Summary (repeats the outline from slide 3)
  7. Types of provenance: Provenance of Datasets and sets of Datasets
     Provenance of a map…
     • Sources used to create the map
     • Creator of the map
     • Creation process used (algorithms, etc.)
     • Recent changes of the map
     • Reason why the map has been updated
     Browsing different versions of a map…
     • Most recent maps
     • Maps from an organization
     • Maps created from a version of a dataset or algorithm
     [Figure labels: Map release June; OSM; FAO; GADM; Integration June]
  8. Types of provenance: Provenance of Objects and sets of Objects
     Objects: lower granularity entities in the map
     • Original data source of the object
     • Organizations responsible for the creation of the object
     • Date of creation of the object
     • Date of insertion of the object in the map
     • Process of inclusion in the dataset
     Provenance of collections of objects…
     • Source of the objects of a region/area
     • Objects from a specific organization
     • Objects belonging to a type of source (e.g., crowdsourced map)
     • Objects introduced in the last version of the map
     [Figure labels: A, B, C; bridge, stadium, intersection]
  9. Types of provenance: Provenance of Properties and sets of Properties
     Properties: attributes of objects in a map
     • Sources of the property
     • Creator of the property
     • Date of the creation/update of the property
     • Process by which the property was added
     Provenance of sets of properties…
     • Properties of objects coming from one data source
     • Properties of objects belonging to a crowdsourced map
     • Properties of the selected objects that have the same source
     [Figure labels: Source A; Source B; Height: 20 m; Length: 1 km; Name: 405 Fwy overpass]
  10. Other requirements related to provenance
      Other requirements might not be straightforward to answer…
      • How did a set of manual corrections help to improve the map?
      • What is new in this map?
      • What objects are integrated with a high confidence?
      • Why is an object not appearing?
      • General highlights of the map
      …but they can be addressed by having provenance records.
  11. Summary (repeats the outline from slide 3)
  12. Modeling provenance in the geospatial domain: PROV-O extension
      Simple PROV-O extension to model the dataset level
  13. Dataset Level Provenance: Example
  14. Dataset integration approaches
      There are different alternatives for updating a map
  15. Object level provenance: scalability
  16. Property level provenance
      Asserted properties do not have URIs!
      • New entities for describing their provenance
      [Figure labels: Source A; Source B; :Bridge :height 20m; :Bridge :length 1 km; :Bridge :name “405 Fwy overpass”; :metadata1; :metadata2; prov:wasDerivedFrom (twice)]
  17. Conclusions
      Requirements and major challenges for geospatial provenance
      4 main categories:
      • Provenance of datasets
      • Provenance of objects appearing in the map
      • Provenance of properties
      • Other
      Analogous questions are relevant for dataset/object/property provenance in non-geospatial domains.
  18. Closing slide (repeats the title slide)
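
Slide 16 states that asserted properties have no URIs, so new entities are needed to carry their provenance. The following is a minimal sketch of that idea in plain Python, with triples represented as tuples. The use of RDF reification (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) to tie a metadata entity to a property is an assumption; the slides only show `:metadata1` and `:metadata2` with `prov:wasDerivedFrom`. The helper names, the `:SourceA`/`:SourceB` identifiers and the pairing of properties with sources are also hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is needed.
PROV_DERIVED = "prov:wasDerivedFrom"


def annotate_property(graph, subject, predicate, value, source):
    """Assert one property and attach a source to it via a new metadata entity."""
    # Number metadata entities by how many statements are already described.
    n = sum(1 for _, p, o in graph if p == "rdf:type" and o == "rdf:Statement") + 1
    meta = f":metadata{n}"
    graph.extend([
        (subject, predicate, value),          # the asserted property itself
        (meta, "rdf:type", "rdf:Statement"),  # assumption: RDF reification
        (meta, "rdf:subject", subject),
        (meta, "rdf:predicate", predicate),
        (meta, "rdf:object", value),
        (meta, PROV_DERIVED, source),         # provenance hangs off the new entity
    ])
    return meta


def sources_of(graph, subject, predicate):
    """Return the sources recorded for the property (subject, predicate)."""
    about_subject = {s for s, p, o in graph if p == "rdf:subject" and o == subject}
    about_predicate = {s for s, p, o in graph if p == "rdf:predicate" and o == predicate}
    metas = about_subject & about_predicate
    return sorted(o for s, p, o in graph if s in metas and p == PROV_DERIVED)


graph = []
# Which source backs which property is an illustrative assumption.
annotate_property(graph, ":Bridge", ":height", "20 m", ":SourceA")
annotate_property(graph, ":Bridge", ":length", "1 km", ":SourceB")
annotate_property(graph, ":Bridge", ":name", "405 Fwy overpass", ":SourceB")

print(sources_of(graph, ":Bridge", ":height"))
```

This sketch creates one metadata entity per property; an annotation or bundle covering several properties from the same source, as the speaker notes suggest, would reduce the number of extra entities.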

Editor's Notes

  1. This presentation is a summary of the OWS-9 and OWS-10 discussions (in the context of OGC). Maps integrate information from many resources. Normally the data integration process is automatic, although it may have some manual steps (data curation, etc.). Each source may have its own properties, geometries, data, etc., but only one value is shown to the user for each item. Maps can be updated (e.g., a new road is built), and we need to track the provenance of the information to check its authenticity. This work summarizes the discussions with researchers and practitioners at several meetings and workshops on geospatial data. It is also of great importance for the community, as there is an ongoing effort to standardize how to link entities in geospatial data (OGC and W3C).
  2. Given the previous problem, in this presentation we will show the challenges derived from it, a set of assumptions to simplify the integration scenario, the types of provenance that we can find in it, how to model it with PROV, and the conclusions and future work.
  3. The challenges are:
     • Trust based provenance: if a map is created from many datasets, we need to know whether each dataset is trusted or not.
     • Data integration and provenance: knowing which data came from each dataset can be very relevant to understand why a map is the way it is.
     • Crowdsourcing and provenance: some datasets like OSM depend on the data provided by users. It is key to know who contributed what in order to assess its quality.
     • Granularity and provenance: different datasets provide different levels of granularity. A geographical feature can be a point, a line or a 3D area.
     • Aggregation and provenance: maps are aggregations of features from other sources.
     • Versioning and provenance: map updates.
  4. Given the heterogeneity of the data, in this first approach to the problem we decided to simplify it. In a nutshell, we assume that the datasets use the same model and that the entities across different datasets have been mapped. This is unrealistic, as achieving it takes a great effort. However, the W3C and OGC are already discussing how to align existing approaches into a standard. We make these assumptions to be able to tackle and describe the main challenges regarding provenance in this scenario.
  5. Next I’ll talk about the types of provenance that we can find in the geospatial domain.
  6. Types of provenance: provenance of datasets. This is the most typical one, as it aims to describe the main features of a map: which sources were used, which process led to its creation, what are the changes made to the map, etc. A map may have been updated, and different versions might be available. Therefore we are also interested in browsing the provenance of sets of maps.
  7. Drilling down in granularity: maps are made of objects, and these objects may have their own provenance as well. You could ask where the object comes from, which organizations are responsible for its appearance in the map, the date when the object was inserted, etc. As with maps, we may also be interested in annotating sets of objects (in case they all share the same annotations) instead of annotating them individually.
  8. An object can have properties which have been integrated from different sources. The questions related to them are analogous to those we could ask about an object.
  9. Other requirements are not that easy to answer (not directly with a SPARQL query), but they can benefit from the previous types of provenance. For example, if we want to answer how a set of corrections helped to improve a map, we can show the previous map and gradually introduce the changes, thus showing how the map is completed. We could answer the second question by retrieving the objects introduced in the newer version of the map, we could retrieve the objects with high confidence by modeling extra metadata from the algorithm, etc.
  10. Now that we have introduced the main requirements, how do we tackle them with PROV?
  11. First we need to introduce some basic extensions to PROV. These are very basic extensions and additional ones could be necessary to deal with the different levels of granularity. This is a work in progress and we still haven’t published the vocabulary extensions. We wanted to distinguish crowdsourced maps from integrated maps, as the former will be the inputs and the latter the outputs of the map integration processes. Other entities are the additional datasets consulted by the algorithm responsible for the integration of the map. We were going to introduce roles as well, but in the end decided to cut them out for simplicity.
  12. This would be an example of the integration of a map created from two different maps (GM and OSM). Explain the example a little.
  13. There are three alternative approaches to creating new versions of the map: the new version is generated anew, the new version is generated taking into account the previous version of the map, or only the delta of the changes is generated. We assumed the second one in the previous example, although each approach is possible.
  14. This figure shows several ways to store object level provenance. Maps can be big, and storing the provenance of every object might bring scalability issues.
      • Recording partial provenance: only particular aspects of provenance could be stored. For example, the only provenance assertions for an object could be references to the original objects' identifiers.
      • Recording provenance selectively: during the integration process, specific decisions would be made as to which objects warrant a detailed provenance record and which ones do not. For example, if an object was created with low confidence then detailed provenance would be recorded.
      • Aggregating provenance of objects: objects with equivalent provenance could be grouped into collections, and the provenance would be attached to the collections.
      • Storing provenance separately: provenance can be stored separately from the map itself. Several provenance services could be set up for the same map.
  15. The problem with modeling properties is that they do not have an identifier. Therefore we need to create a new entity (annotation, bundle, etc.) which will contain the provenance for each of them. Explain the example with the bridge.
  16. This is a summary of all the previous requirements, which is the main contribution. Briefly discuss the differences between the sections and summarize each one. Another contribution is the PROV extension.
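
The "aggregating provenance of objects" strategy from note 14 can be sketched as follows: objects whose provenance statements are identical are grouped into a collection, and the statements are attached once to the collection rather than to each object. This is only an illustrative sketch in plain Python. `prov:Collection` and `prov:hadMember` are standard PROV-O terms, but the function name, the `:collectionN` identifiers and the sample objects and sources are hypothetical.

```python
from collections import defaultdict


def aggregate_provenance(object_provenance):
    """Group objects with equivalent provenance and describe each group once.

    object_provenance maps an object id to a set of (predicate, value) pairs.
    Returns a list of (subject, predicate, object) triples.
    """
    groups = defaultdict(list)
    for obj, statements in object_provenance.items():
        # Objects with exactly the same statements end up in the same group.
        groups[frozenset(statements)].append(obj)

    triples = []
    # Sort groups by their members so collection numbering is deterministic.
    ordered = sorted(groups.items(), key=lambda item: sorted(item[1]))
    for i, (statements, members) in enumerate(ordered, start=1):
        collection = f":collection{i}"
        triples.append((collection, "rdf:type", "prov:Collection"))
        for member in sorted(members):
            triples.append((collection, "prov:hadMember", member))
        for predicate, value in sorted(statements):
            # Provenance is attached to the collection, not to each member.
            triples.append((collection, predicate, value))
    return triples


# Hypothetical objects and sources, for illustration only.
object_provenance = {
    ":bridge": {("prov:wasDerivedFrom", ":OSM")},
    ":stadium": {("prov:wasDerivedFrom", ":OSM")},
    ":intersection": {("prov:wasDerivedFrom", ":GADM")},
}

for triple in aggregate_provenance(object_provenance):
    print(triple)
```

The benefit grows with the number of objects per group and the number of statements they share; for very small groups, as in this example, the collection overhead can outweigh the saving.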