
#AAG2015 presentation on OSM attribute inconsistency and semantic heterogeneity

This presentation was given in a session dedicated to OpenStreetMap studies during the annual meeting of the Association of American Geographers (AAG) in Chicago, IL.

  1. An intrinsic approach for the detection and correction of attributive inconsistencies and semantic heterogeneity in OSM data
     Martin Loidl | martin.loidl@sbg.ac.at
     Stefan Keller | sfkeller@hsr.ch
     AAG Annual Meeting – Workshop OpenStreetMap Studies, Chicago, April 24th 2015
  2. Problem Statement
     - OSM bottom-up community approach
     - Rudimentary data model and attribute structure (tagging scheme: key = value)
     - Attributes: recommendations ≠ conventions ≠ formalized standard
     - No restriction of tag usage and definition
     Image source: http://www.openstreetmap.es
  3. Attributive Inconsistencies
     - Within one way
     - Within a succession of ways (e.g. street)
     Example ways shown on the slide:
       highway = motorway; name = Kennedy Expressway; bicycle = yes
       highway = motorway; name = Kennedy Expressway; ref = I 90
       highway = motorway; name = Fisher Freeway; ref = I 90
       highway = motorway; name = Kennedy Expressway; ref = I 90
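Both kinds of attributive inconsistency can be expressed as simple checks. The sketch below is minimal and rests on three assumptions: ways are available as Python dicts of tags, the ways of one road are already ordered, and the rules are illustrative only. The rule "bicycle access on a motorway is contradictory" and the majority vote on `name` are not the rules used in the study, and the function names are hypothetical.

```python
from collections import Counter

# Succession of ways along one road, using the tag combinations shown on the slide.
ways = [
    {"highway": "motorway", "name": "Kennedy Expressway", "bicycle": "yes"},
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
    {"highway": "motorway", "name": "Fisher Freeway", "ref": "I 90"},
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
]


def within_way_issues(tags):
    """Check one way against a rule on its own tags (the rule is an assumption)."""
    issues = []
    if tags.get("highway") == "motorway" and tags.get("bicycle") in {"yes", "designated"}:
        issues.append("bicycle access on motorway")
    return issues


def succession_issues(succession, key="name"):
    """Indices of ways whose `key` value deviates from the succession's majority value."""
    values = [w[key] for w in succession if key in w]
    if not values:
        return []
    majority, _ = Counter(values).most_common(1)[0]
    return [i for i, w in enumerate(succession) if key in w and w[key] != majority]


for i, way in enumerate(ways):
    for issue in within_way_issues(way):
        print(f"way {i}: {issue}")
print("name deviates in ways:", succession_issues(ways))  # name deviates in ways: [2]
```

A majority vote only marks candidates. A name can legitimately change along a road, so flagged ways still need review.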
  4. Semantic Heterogeneity
     - Different (correct) descriptions of one and the same entity
     - Specific to crowd-sourced data (≠ authoritative data, which follow strict specifications)
     Alternative taggings shown on the slide:
       highway = cycleway; foot = designated; width = 3
       highway = path; bicycle = designated; foot = yes
       highway = footway; bicycle = designated; surface = asphalt
  5. Relevance
     - Considering attributive inconsistencies and semantic heterogeneity is relevant for:
       - Visualization (data rendering)
       - Descriptive statistics (classification)
       - Spatial analysis (e.g. routing)
     - Improve results through:
       - Harmonization (remove semantic heterogeneities)
       - Correction through estimation (gaps, inconsistencies)
  6. Data Quality
     - Spatial data quality
       - Standards (e.g. ISO 19157 = harmonization of multiple preceding standards) and an extensive body of literature
       - → of limited use for OSM data
     - Quality assessment of OSM data
       - Primarily focused on positional accuracy and geometric completeness
       - Reference data set and/or descriptive statistics
       - Comparatively little work on attribute quality
     References on the slide: Haklay 2010; Hochmair et al. 2015; Barron et al. 2014
  7. Quality Assessment
     - Why an intrinsic approach?
     - An extrinsic approach requires a reference data set, which ideally has:
       - The same geographical coverage
       - The same data model and attribute structure [Koukoletsos et al. (2012): multi-stage process to deal with this to a certain extent]
     - Quality of the reference data set (authoritative data doesn't necessarily imply better data!)
     - Data often created for very different purposes
     Figure: Elsbethen (Austria), authoritative data vs. OSM data
  8. Intrinsic Approach
     - Exclusively based on the respective data set (data-centered approach)
     - Makes use of:
       - Redundancy
       - Inherent logic, functionally related attributes
     - Translation into query statements, e.g. highway = *, surface = *, tracktype = *
  9. Case Study Area
     - 4,600 km² in the Austrian-Bavarian border region
     - ~22,600 km total network length
     - Rural and urban areas
     - Data preparation:
       - Extraction from the OSM database (April 1st 2015)
       - Conversion to a topologically correct (edge-node) graph in a GeoDB
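The study converted the data to an edge-node graph inside a GeoDB. The sketch below only illustrates the underlying idea in plain Python, and it is not the procedure used in the study. It assumes ways are given as ordered lists of OSM node IDs. Each way is cut wherever it reaches a node shared with another way, so that edges meet only at graph nodes. The function name is hypothetical, and geometry, attributes and self-intersections are ignored.

```python
from collections import Counter


def split_into_edges(ways):
    """Split ways (ordered lists of OSM node IDs) into edges that meet only at graph nodes.

    A node becomes a graph node if it is the first/last node of a way or is used by
    more than one way. Simplification: self-intersections within one way are ignored.
    """
    usage = Counter(n for nodes in ways.values() for n in set(nodes))
    endpoints = {n for nodes in ways.values() for n in (nodes[0], nodes[-1])}
    graph_nodes = endpoints | {n for n, count in usage.items() if count > 1}

    edges = []
    for way_id, nodes in ways.items():
        start = 0
        for i in range(1, len(nodes)):
            if nodes[i] in graph_nodes:
                # Close the current edge at this graph node and start the next one here.
                edges.append((way_id, nodes[start:i + 1]))
                start = i
    return graph_nodes, edges


# Two hypothetical ways crossing at node 3.
ways = {"A": [1, 2, 3, 4], "B": [5, 3, 6]}
nodes, edges = split_into_edges(ways)
print(sorted(nodes))  # [1, 3, 4, 5, 6]
print(edges)  # [('A', [1, 2, 3]), ('A', [3, 4]), ('B', [5, 3]), ('B', [3, 6])]
```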
  10. Major Road Network
     - Major road = motorway, primary, secondary (incl. links)
     - Consistent for road category (highway = *)
       - Makes features mappable = primary intent/purpose of OSM
     - Attributes incomplete (n = 11,951 segments):
       - name = *: 64.6%
       - surface = *: 22.93% [→ can be estimated: asphalt]
       - maxspeed = *: 72.19%
       - lanes = *: 57.86%
     - Rather an issue of completeness than of inconsistency and heterogeneity
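Completeness figures like those above can be computed as the share of major-road segments that carry a tag at all. The sketch below assumes segments are Python dicts of tags. The segment data and the function name are hypothetical, and they do not reproduce the study's numbers.

```python
# Road categories counted as "major" on the slide (incl. link roads).
MAJOR = {"motorway", "motorway_link", "primary", "primary_link", "secondary", "secondary_link"}


def tag_completeness(segments, keys):
    """Percentage of major-road segments that carry each tag (with any value)."""
    major = [s for s in segments if s.get("highway") in MAJOR]
    if not major:
        return {}
    return {k: round(100.0 * sum(k in s for s in major) / len(major), 2) for k in keys}


# Hypothetical segments (the study used n = 11,951 major-road segments).
segments = [
    {"highway": "primary", "name": "Example Road", "maxspeed": "100"},
    {"highway": "primary", "maxspeed": "70", "lanes": "2"},
    {"highway": "secondary", "surface": "asphalt"},
    {"highway": "motorway_link"},
    {"highway": "residential", "name": "Side Street"},  # not a major road, ignored
]
print(tag_completeness(segments, ["name", "surface", "maxspeed", "lanes"]))
# {'name': 25.0, 'surface': 25.0, 'maxspeed': 50.0, 'lanes': 25.0}
```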
  11. Local Road Network
     - Majority of ways in OSM
     - Differences in terms of attribute quality (existence, consistency etc.)
     - Relevant e.g. for active modes of transport (cycling, hiking etc.)
     - In many cases more extensive (spatial coverage, attribute details) than authoritative data
  12. Attributive Inconsistencies
     - Define a set of logical/legal contradictions
     - Connect them to the corresponding tags
       - Tag specification according to the OSM Wiki
     - Query the data set for contradictions
     Example query (matches approx. 1 in 1,000):
       ("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5') and "surface" = 'asphalt'
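The query on this slide is a plain SQL predicate, so it can be tested against any relational store. As a minimal, self-contained sketch, the code below runs it unchanged in an in-memory SQLite database. It assumes a hypothetical `segments` table with one column per tag, whereas the study queried its own GeoDB.

```python
import sqlite3

# Hypothetical segment table; real data would come from the OSM extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, highway TEXT, tracktype TEXT, surface TEXT)")
conn.executemany(
    "INSERT INTO segments (id, highway, tracktype, surface) VALUES (?, ?, ?, ?)",
    [
        (1, "track", "grade1", "asphalt"),
        (2, "track", "grade4", "asphalt"),  # matches the contradiction rule
        (3, "track", "grade2", "gravel"),
        (4, "track", "grade5", "ground"),
    ],
)

# Contradiction predicate copied from the slide (constant string, not user input).
CONTRADICTION = """("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5') and "surface" = 'asphalt'"""

flagged = [row[0] for row in conn.execute(f"SELECT id FROM segments WHERE {CONTRADICTION} ORDER BY id")]
total = conn.execute("SELECT COUNT(*) FROM segments").fetchone()[0]
share = len(flagged) / total
print(flagged, f"{share:.1%}")  # [2] 25.0%
```

Each further contradiction from the rule set would become another predicate of the same shape, so the rule set can be kept as a list of strings.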
  13. Spatial Particularities
     - Distribution of inconsistencies:
       - Regional diversity (national laws?)
       - Spatial clusters (local mappers/communities?)
     Example: highway = residential; maxspeed = 80
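A first look at spatial clustering can be as simple as counting flagged segments per grid cell. The sketch below makes several assumptions. Each segment is represented by a midpoint in a projected coordinate system (metres), and the rule "residential street with maxspeed above 50 km/h" uses a hypothetical threshold. Plain grid counts also stand in for a proper hotspot statistic.

```python
from collections import Counter


def is_suspicious(tags, limit=50):
    """Assumed rule: a residential street with maxspeed above `limit` km/h is flagged."""
    if tags.get("highway") != "residential":
        return False
    try:
        return int(tags.get("maxspeed", "")) > limit
    except ValueError:  # missing or non-numeric values such as "walk" or "50 mph"
        return False


def flagged_per_cell(segments, cell_size=1000.0):
    """Count flagged segments per square grid cell; segments are (x, y, tags) midpoints."""
    counts = Counter()
    for x, y, tags in segments:
        if is_suspicious(tags):
            counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


segments = [
    (120.0, 80.0, {"highway": "residential", "maxspeed": "80"}),
    (450.0, 900.0, {"highway": "residential", "maxspeed": "70"}),
    (2500.0, 300.0, {"highway": "residential", "maxspeed": "30"}),
    (2600.0, 350.0, {"highway": "primary", "maxspeed": "80"}),
]
print(flagged_per_cell(segments))  # Counter({(0, 0): 2})
```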
  14. Correction of Inconsistencies
     - Correction without ground truthing = estimation
     - Quality of estimation depends on the number of functionally related attributes
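One way to estimate a value intrinsically is to use a functionally related attribute from the same data set. The sketch below picks the most frequent `surface` among segments with the same `tracktype`, and returns the share of that value as a rough support measure. This conditional-majority scheme is an assumption for illustration, and the function names are hypothetical. With more related attributes (e.g. `highway`, `smoothness`), one could condition on their combination instead.

```python
from collections import Counter, defaultdict


def learn_surface_by_tracktype(segments):
    """For each tracktype: the most frequent surface in the data set and its share."""
    table = defaultdict(Counter)
    for tags in segments:
        if "tracktype" in tags and "surface" in tags:
            table[tags["tracktype"]][tags["surface"]] += 1
    model = {}
    for tracktype, counts in table.items():
        value, count = counts.most_common(1)[0]
        model[tracktype] = (value, count / sum(counts.values()))
    return model


def estimate_surface(tags, model):
    """Estimated (surface, share) for a segment, or None if its tracktype is unknown."""
    return model.get(tags.get("tracktype"))


segments = [
    {"tracktype": "grade2", "surface": "gravel"},
    {"tracktype": "grade2", "surface": "gravel"},
    {"tracktype": "grade2", "surface": "compacted"},
    {"tracktype": "grade4", "surface": "ground"},
    {"tracktype": "grade4", "surface": "asphalt"},  # contradictory, to be corrected
    {"tracktype": "grade4", "surface": "ground"},
]
model = learn_surface_by_tracktype(segments)
print(estimate_surface({"tracktype": "grade4", "surface": "asphalt"}, model))
```

In practice, segments already flagged as contradictory would be excluded before learning, so that they do not bias the estimate.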
  15. Heterogeneity
     - How to map a mixed foot-/cycleway in OSM?
     Image source: http://www.stadt-salzburg.at
  16. Heterogeneity
     - How to map a mixed foot-/cycleway in OSM?
       - Co-existence vs. "tag war"
       - Credibility and reputation (Flanagin & Metzger 2008)
     Query for mixed foot-/cycleways:
       ("highway" = 'footway' and ("bicycle" = 'designated' or "bicycle" = 'yes' or "bicycle" = 'official'))
       OR ("highway" = 'cycleway' and ("foot" = 'designated' or "foot" = 'yes'))
       OR ("highway" = 'path' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))
       OR ("highway" = 'track' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))
     Segment counts shown on the slide: 669; 1,202; 2,655; 73
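The same query can be transcribed into a predicate over a way's tags, for example to count matches per `highway` value outside a database. The sketch below assumes ways are Python dicts, and the function name is hypothetical. A missing tag behaves like SQL NULL here, so the comparison is simply false.

```python
from collections import Counter


def is_mixed_foot_cycleway(tags):
    """Python transcription of the slide's query: True if a way is a mixed foot-/cycleway."""
    highway, foot, bicycle = tags.get("highway"), tags.get("foot"), tags.get("bicycle")
    if highway == "footway":
        return bicycle in {"designated", "yes", "official"}
    if highway == "cycleway":
        return foot in {"designated", "yes"}
    if highway in {"path", "track"}:
        # path and track share the same condition in the query
        return foot in {"designated", "official"} and bicycle in {"designated", "official"}
    return False


# Tag combinations taken from the slides (irrelevant tags omitted).
ways = [
    {"highway": "cycleway", "foot": "designated"},
    {"highway": "path", "bicycle": "designated", "foot": "yes"},
    {"highway": "footway", "bicycle": "designated"},
    {"highway": "track", "foot": "yes", "bicycle": "yes"},
    {"highway": "path", "foot": "designated", "bicycle": "designated"},
]
print(Counter(w["highway"] for w in ways if is_mixed_foot_cycleway(w)))
```

As written, the query does not match two of these ways. One is the path tagged foot = yes from slide 4, and the other is the track tagged foot = yes and bicycle = yes from slide 18. For path and track, the query only accepts designated or official.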
  17. Heterogeneity
     - Different (correct) views of the same entity
     Tagging A (last editor: j_cook): highway = cycleway; surface = asphalt; ref = BGL 3; foot = designated; bicycle = designated; segregated = no
     Tagging B (last editor: pyram): highway = path; surface = asphalt; foot = designated; bicycle = designated
  18. Two taggings of ways named Treppelweg:
       highway = track; name = Treppelweg; surface = gravel; tracktype = grade2; foot = yes; bicycle = yes; width = 3
       highway = path; name = Treppelweg; surface = gravel; tracktype = grade2; foot = designated; bicycle = designated; width = 3
     Imagery: http://www.bing.com/maps
  19. Harmonization of Heterogeneity
     - Define derived attributes that best fit the actual purpose
     Source: Loidl & Zagel (2014)
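A derived attribute maps several equally valid taggings onto one purpose-specific class. The sketch below defines such an attribute for a cycling-oriented purpose. The attribute, its class names and its access rules are hypothetical and are not the attributes defined in Loidl & Zagel (2014). It assumes ways are Python dicts of tags.

```python
ACCESS_YES = {"designated", "yes", "official"}
PATH_LIKE = {"cycleway", "footway", "path", "track"}


def derive_cycling_class(tags):
    """Hypothetical derived attribute for a cycling-oriented purpose.

    Heterogeneous but equally valid taggings collapse into one class, so downstream
    analysis (e.g. routing) no longer depends on the tagging a mapper chose.
    """
    highway = tags.get("highway")
    if highway not in PATH_LIKE:
        return "road_or_other"
    foot = highway == "footway" or tags.get("foot") in ACCESS_YES
    bicycle = highway == "cycleway" or tags.get("bicycle") in ACCESS_YES
    if foot and bicycle:
        return "segregated_foot_cycle" if tags.get("segregated") == "yes" else "shared_foot_cycle"
    if bicycle:
        return "bicycle_only"
    return "no_explicit_bicycle_access"


# The four taggings from slides 17 and 18 all end up in the same class.
variants = [
    {"highway": "cycleway", "foot": "designated", "bicycle": "designated", "segregated": "no"},
    {"highway": "path", "foot": "designated", "bicycle": "designated"},
    {"highway": "track", "foot": "yes", "bicycle": "yes"},
    {"highway": "path", "foot": "designated", "bicycle": "designated"},
]
print({derive_cycling_class(t) for t in variants})  # {'shared_foot_cycle'}
```

Unlike the detection query, this derived attribute deliberately accepts foot = yes and bicycle = yes on paths and tracks. The class boundaries are a choice that depends on the purpose.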
  20. Harmonization of Heterogeneity
     - OSMAXX
       - Extracts OSM data
       - Data cleaning (capital letters etc.) and harmonization (generalization)
       - Conversion to GIS formats
       - For visualization and geospatial analysis
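The slide does not detail OSMAXX's actual cleaning rules. The sketch below only illustrates the kind of value normalization mentioned (capital letters etc.), assuming tags as a Python dict. `NORMALIZE_KEYS` and the function names are hypothetical. Free-text keys such as `name` are left untouched.

```python
# Keys whose values are normalized; free-text keys such as name are left as they are.
NORMALIZE_KEYS = {"highway", "surface", "tracktype", "foot", "bicycle", "segregated"}


def clean_value(value):
    """Lower-case a tag value, trim it and collapse internal runs of whitespace."""
    return " ".join(value.lower().split())


def clean_tags(tags):
    """Return a copy of `tags` with the values of NORMALIZE_KEYS cleaned."""
    return {k: clean_value(v) if k in NORMALIZE_KEYS else v for k, v in tags.items()}


raw = {"highway": " Cycleway", "surface": "Asphalt  ", "name": "Treppelweg"}
print(clean_tags(raw))  # {'highway': 'cycleway', 'surface': 'asphalt', 'name': 'Treppelweg'}
```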
  21. Wrap-Up
     - Inconsistency = quality issue
       - Can be detected with an intrinsic approach
     - Heterogeneity = depends on purpose
       - Definition of derived attributes
     - Implement assessment routines during editing or in post-processing?
       - Tag recommender system during editing (Vandecasteele & Devillers 2014)
         - Probabilistic approach and/or functionally related attributes
         - Prevent contradictions
       - Data tuning in post-processing allows specification for the actual purpose
       - Combination → prevent – detect – repair (Herzog et al. 2007)
     - Data model issue → social complexity of OSM (Spielman 2014)
     @gicycle_ | gicycle.wordpress.com