Improvement of Spatial Data Quality Using the Data Conflation

Silvija Stankute, Hartmut Asche -Geoinformation Research Group, Department of Geography, University of Potsdam

Published in: Technology, Education
  • With the introduction of digital mapping techniques in the 1960s and of GIS shortly afterwards, researchers realized that error and uncertainty in digital spatial data could cause problems that had not been experienced with paper maps. In the early 1980s an international trend began to design and implement data transfer standards that include the data quality information which had disappeared from the margins of paper maps in the transition to digital data products. The main aim of this work is to present data conflation as one option for improving spatial data quality.
  • In a number of fields, the approach to quality evolved into a definition based on fitness for use. ISO 8402 defines quality as the ‘totality of characteristics of a product that bear on its ability to satisfy stated or implied needs’. Defining quality therefore requires two pieces of information: information on the data being used and information on the user's needs. Spatial data is fit for use if it meets the requirements of the target application. Data quality is described by one or more quality dimensions. For geographic data these are called spatial data quality elements: completeness, logical consistency, positional accuracy, temporal accuracy (the accuracy of the reported time associated with the data) and thematic (semantic, attribute) accuracy. Metadata for spatial data typically describe data quality in terms of these elements.
  • During the conflation process, information from the source input dataset (SDS) and the target input dataset (TDS) has to be matched. The SDS is the dataset from which geospatial information (e.g. thematic information) is taken; the TDS is the dataset to which that information is transferred, i.e. the dataset being extended.
  • To translate the real world into a form a computer can process, it has to be modelled in simplified form according to specific rules. Such data models represent real-world objects as points, lines or areas (polygons). Each object is given x- and y-coordinates and carries information on its spatial reference. This example shows how the same object differs between data formats.
  • Different producers of spatial data capture the same real-world object differently, because there are no uniform rules for spatial data acquisition. As a result, different abstract representations of one and the same real-world object may arise. The figure shows alternative geometric representations of the same real-world object, each generated by a different spatial data provider.
  • The approach presented here improves the quality of spatial data. The method illustrates how to increase the geometric completeness of road network data. Objects available in the source dataset, such as roundabouts, must be located in the target dataset and assigned to the new, amended dataset. The problem is that crossroads which are in fact roundabouts are stored in the dataset as simple crossroads. First, the positions of all crossroads in both datasets have to be found. A roundabout is identified if at least three edges of the road network share the same start or end point: if three edges have a node in common, regardless of whether it is the start or end point of each edge, this intersection is part of the roundabout.
  • Every crossroad in the dataset is checked in this way. If a roundabout is identified, then in the second step the corresponding crossroad is searched for in the second dataset, using the points that serve as traffic entries or exits.
  • All entries and exits of the roundabout are found in the first input dataset, and the corresponding edges are found in the second input dataset. The geometric information about the new objects can now be assigned.
  • After two or more datasets have been merged, the completeness of the input data is always increased. This applies to all data types: polygons, lines and points. One condition must be fulfilled: one of the input datasets must contain more information than the other. Not all new geometric objects in the end dataset carry attribute information, so the end dataset can never be complete in terms of thematic information. Datasets generated by conflation can be complete only in terms of geometric information. Figure 3 illustrates this problem with an example of two datasets. The first (source) dataset contains information about 6 buildings. In the real world, however, there are 8 buildings, so two objects are missing from this dataset. The source dataset includes thematic information about the type of use of these buildings. The second (target) dataset contains geometric information about 5 objects; the existence of buildings 6, 7 and 8 is not recorded. Unlike the source dataset, the target dataset has information about the number of floors, which is missing from the first dataset. The end dataset in figure 3 is complete in terms of geometric information. The table beneath it shows the gain in attributes. Geometric objects that are available in both input datasets have 100% thematic completeness; the remaining objects carry the thematic information of only one input dataset.
  • Conflation approaches can also improve positional and temporal accuracy. The positional accuracy of a dataset can be increased using the information provided by another input dataset. If both datasets deviate substantially from the real world, taking the arithmetic mean of all input datasets can improve this quality element. Temporal accuracy is improved if the metadata provide information about how up to date the spatial data are.
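
The shared-node test from the roundabout notes above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name `junction_nodes`, the dictionary layout of the edges and the toy network are assumptions made for illustration, and the edge tracing that would follow the test is not shown.

```python
from collections import defaultdict


def junction_nodes(edges, min_edges=3):
    """Return nodes that are the start or end point of at least `min_edges` edges.

    `edges` maps a (hypothetical) edge id to a (start_node, end_node) pair.
    The result maps each qualifying node to the sorted ids of its edges.
    """
    incident = defaultdict(set)
    for edge_id, (start, end) in edges.items():
        # It does not matter whether the node is the start or the end point.
        incident[start].add(edge_id)
        incident[end].add(edge_id)
    return {
        node: sorted(ids)
        for node, ids in incident.items()
        if len(ids) >= min_edges
    }


# Hypothetical toy network: a ring A-B-C with one approach road per ring node.
edges = {
    "ring_ab": ("A", "B"),
    "ring_bc": ("B", "C"),
    "ring_ca": ("C", "A"),
    "arm_a": ("A", "P"),
    "arm_b": ("B", "Q"),
    "arm_c": ("C", "R"),
}

candidates = junction_nodes(edges)
print(sorted(candidates))  # nodes that pass the three-edge test
```

In this toy network only the ring nodes pass the test; the outer ends of the approach roads touch a single edge each and are ignored.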
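
The building example from the notes (8 real-world buildings, 6 in the source dataset with type of use, 5 in the target dataset with number of floors) can be replayed in a few lines of Python. This is a sketch under stated assumptions: the notes only say that buildings 6, 7 and 8 are absent from the target dataset, so the choice of which buildings the source dataset contains, the attribute values, and the function names `conflate` and `thematic_completeness` are all hypothetical.

```python
def conflate(source, target):
    """Merge two datasets keyed by object id, keeping every object and attribute.

    If both inputs define the same attribute, the source value wins.
    """
    merged = {}
    for obj_id in sorted(source.keys() | target.keys()):
        attrs = {}
        attrs.update(target.get(obj_id, {}))
        attrs.update(source.get(obj_id, {}))
        merged[obj_id] = attrs
    return merged


def thematic_completeness(merged, fields):
    """Share of the expected attribute fields present for each object."""
    return {
        obj_id: sum(field in attrs for field in fields) / len(fields)
        for obj_id, attrs in merged.items()
    }


# Hypothetical data: the source dataset knows 6 buildings and their use.
source = {
    1: {"use": "residential"},
    2: {"use": "residential"},
    3: {"use": "commercial"},
    6: {"use": "industrial"},
    7: {"use": "residential"},
    8: {"use": "commercial"},
}
# Hypothetical data: the target dataset knows buildings 1 to 5 and their floors.
target = {i: {"floors": floors} for i, floors in zip(range(1, 6), [2, 3, 1, 4, 2])}

end_dataset = conflate(source, target)
completeness = thematic_completeness(end_dataset, ("use", "floors"))
print(len(end_dataset), completeness)
```

With these assumed inputs the end dataset holds all 8 buildings, but only the buildings present in both inputs carry both attributes, which is the limitation the notes describe.
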

    1. Improvement of spatial data quality through data conflation
       • Silvija Stankute, Hartmut Asche
       • Geoinformation Research Group
       • Dept of Geography | University of Potsdam | Germany
       • ICCSA 2011 | GEOG-AN-MOD 2011 | University of Santander | 20-23/06/2011
    2. Summary
       • Motivation: Spatial data quality matters
       • Spatial data quality: Definition, indicators
       • Data conflation: Optimising spatial data quality
       • Data conflation at work: Inserting a roundabout
       • Conclusion: What’s the merit of data conflation?
    3. 1 Motivation: Spatial data quality matters
       • Introduction of digital mapping techniques and GIS in the 1960s made quality of digital spatial data an issue in geoinformation processing (GI)
       • Error and uncertainty in spatial data identified as potential problems in GI processing uncommon in production and use of paper maps
       • Ongoing development from 1980s to design and implement data transfer standards which include data quality information hitherto available on the margins of paper maps only
       • Objective of this work is to present data conflation as one option in GI processing for improvement of spatial data quality
    4. 1 Motivation: Spatial data quality matters
       • Potsdam in different spatial datasets: OpenStreetMap, analog topo map 1:10K, Brandenburg Viewer
    5. 2 Spatial data quality: Definition, indicators
       • Geodata quality
       • ISO 8402: totality of characteristics of a product that bear on its ability to satisfy stated or implied needs > fitness-for-use
       • Definition of spatial data quality necessitates information on (a) geodata used, (b) user requirements
       • Fitness-for-use: data meet requirements of target application
       • Geodata quality indicators
       • Completeness
       • Logical consistency
       • Positional accuracy
       • Temporal accuracy: accuracy of reporting time of data
       • Semantic/thematic/attribute accuracy
       • Information on geodata quality included in metadata
    6. 2 Spatial data quality: Data acquisition
       • Data acquisition
       • Different methods for spatial data acquisition developed by spatial data producers result in different data types, data formats and semantic information of geodata
       • Consequence: multiplicity of spatial data
       • Problem: multiple data use of specific datasets
       • Option: data integration or data conflation applied to existing datasets instead of continuous acquisition of new spatial data with above faults
    7. 3 Data conflation: Optimising spatial data quality
       • Objective
       • Automated merge of heterogeneous geodata to application requirements to produce best-fit dataset for any specific application
       • Figure labels: source dataset (SDS), target dataset (TDS), output dataset; missing data, inserted data
    8. 3 Data conflation: Optimising spatial data quality
       • One spatial object, different data models
       • Real world spatial data transformed into computer-readable digital data model representing spatial features as (a) points, (b) lines or (c) areas (polygons)
       • Modelling of real world spatial data can result in different data models of identical real world object: traffic roundabout
    9. 3 Data conflation: Optimising spatial data quality
       • One spatial object, multiple geometry: OpenStreetMap, TeleAtlas, ATKIS
    10. 4 Data conflation at work: Conceptual framework
        • Substituting roundabout for road crossing
        • Inserting roundabout in dataset where roundabout modelled as road crossing = not defined as roundabout
        • Detecting “missing” roundabout by identifying position of crossings in input datasets: roundabout identified if minimum of 3 edges of road network have identical start and end point
        • When 3 edges are identified which have the same node (start or end point of edge), this intersection is part of roundabout
    11. 4 Data conflation at work: Automated workflow
        • Producing best-fit dataset
        • Figure labels: data sources (dataset 1, dataset 2), pre-processing, object assignment, new dataset
    12. 4 Data conflation at work: Semantic accuracy
        • Inserting roundabout in target dataset
        • (a) edge tracing for identification of roundabout in input dataset 1, (b) search for roundabout access/exits in input dataset 2
        • Merge access/exits with corresponding points on crossroads
    13. 4 Data conflation at work: Geometric completeness
        • Assigning geometric information
        • All access or exits of roundabout found in first input dataset
        • Corresponding edges in second input dataset also detected
        • Geometrical information about new objects can be assigned to target dataset
    14. 4 Data conflation at work: Data quality optimised
        • After completion of merge process of 2 or more datasets (points, lines, polygons) completeness of input data is always increased
        • Prerequisite: one of the input datasets must have more information than the other(s)
        • Not all new geometry objects of target dataset include information on thematic attributes, hence completeness of target dataset can never be complete in terms of thematic information
        • Consequence: Datasets generated by conflation can only be complete in terms of geometrical information
    15. 4 Data conflation at work: Data quality optimised
        • Real world spatial data: 8 buildings
        • Source dataset includes information on 6 buildings (geometry, use)
        • Target dataset includes information on 5 buildings (geometry, floors)
        • End dataset complete with geometric information
        • Geometric objects of both input datasets have 100% thematic completeness
    16. 5 Conclusion: What’s the merit of data conflation?
        • Conflation methods allow the improvement of positional and temporal accuracy of spatial data
        • Positional accuracy of a dataset can be increased with the information provided by another input dataset
        • If both datasets show major variance from the corresponding real world objects, arithmetic average of all input datasets can increase this quality element
        • Temporal accuracy can be improved if metadata provide information about actuality of spatial data
        • Data conflation facilitates multiple use of quality spatial data which can be generated automatically to application requirements from existing suboptimal datasets
    17. Thank you for your attention
        • Questions? Comments? Feedback?
        • Contact: Hartmut Asche | gislab@uni-potsdam.de
        • Dept of Geography | University of Potsdam | GER
        • Web: www.geographie.uni-potsdam.de/geoinformatik
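
The conclusion's remark on averaging input datasets can be illustrated with a small Python sketch. It assumes that the vertices of the same real-world object have already been matched one-to-one, are listed in the same order and share one coordinate reference system; the function name `average_positions` and the coordinates are hypothetical, and the slides do not say how the authors compute the average.

```python
def average_positions(*datasets):
    """Arithmetic mean of matched (x, y) vertices across several datasets.

    Every dataset is a list of (x, y) tuples; vertex i in one list must
    describe the same real-world point as vertex i in the others.
    """
    if not datasets:
        raise ValueError("at least one dataset is required")
    length = len(datasets[0])
    if any(len(points) != length for points in datasets):
        raise ValueError("datasets must contain the same number of vertices")

    averaged = []
    for matched in zip(*datasets):
        mean_x = sum(x for x, _ in matched) / len(matched)
        mean_y = sum(y for _, y in matched) / len(matched)
        averaged.append((mean_x, mean_y))
    return averaged


# Hypothetical coordinates of the same two road vertices in two datasets.
dataset_a = [(0.0, 0.0), (10.0, 0.0)]
dataset_b = [(2.0, 2.0), (12.0, 4.0)]

print(average_positions(dataset_a, dataset_b))
```

Averaging of this kind can only reduce positional error when the inputs are not all displaced in the same direction; a shift shared by every dataset survives the mean unchanged.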
