Beyond good enough? Spatial Data Quality and OpenStreetMap data
 


State of the Map '09 presentation. Covering spatial data quality and comparison of Ordnance Survey data (Meridian 2, 10K Raster, MasterMap ITN) to OSM for England.
Some material appeared in previous presentation.


  • The Athenian metaphor goes even further. You also need to be male, healthy, in your 20s or 30s, and wealthy in order to participate in the fun.
  • The higher level dataset used was OS MasterMap. The Ordnance Survey’s MasterMap is a database that records every fixed feature of Great Britain larger than a few metres in one continuous digital map. It is a framework on which future OS products will be based. The ITN layer is one of four layers, and is the layer used for this project. ITN was obtained from Edina, OSM from CloudMade. The ITN data was obtained in GML format and then converted to shp file format; the required OSM data was downloaded as shp files.
  • 25km square grid, based on OS 1:10,000 grid tiles. The tiles were chosen to cover a range of London (North, South, East and West), two close to the centre and two further out, so that a good sample was chosen.
  • Requires the use of a reference source (ITN) and a test source (OSM). Here we have an example of an ITN road feature and the corresponding OSM road feature. ITN is the higher level dataset, and therefore the ITN road feature is considered to be the actual centreline of the road. A buffer of width x is created around the ITN feature. The proportion of the OSM road that lies within the buffer is then calculated. This analysis was carried out for every A-road, B-road and motorway feature in the datasets.
  • The results are very high, indicating that the positional accuracy of OSM is very good. In particular, the South London tile had an average of 92.62% overlap for A-roads, and the North/Central London tile had 81.46% overlap for B-roads. Only one motorway segment (very high).
  • Results maps for each of the four test regions. The distribution of results in the histogram is clearly illustrated by the results maps. Anything above 90% overlap is in red, between 80-90% in orange, and between 70-80% in yellow.
  • First diagram: The trend line in the scatter diagram shows a slight positive trend, indicating that as the number of users increases, so does the road name attribute completeness. This is certainly the case for any grid square containing more than 25 users, where the road name attribute completeness was no less than 60%, with most results between 80-100%.
    However, the spread of results is very varied for all the grid squares with fewer than 20 users; in fact, removing all results above 20 users would produce no correlation. The vast majority of results appear to lie between 60-100% road name attribute completeness, and are therefore all good results.
    The user analysis research carried out by Dr Haklay found 89.5% of the whole of England covered by 3 or fewer users. Considering that my results showed no grid squares with fewer than 5 users, it is fair to say that these results are not a true reflection of the majority of England, as all my grid squares are covered by more users than normal. This probably explains why the attribute completeness was generally quite high.
  • Second diagram: There is no positive or negative trend here, and all that the results really show is that positional accuracy is very high regardless of the number of users. This is probably due to a number of reasons. Firstly, it only takes a single user to achieve very high positional accuracy, depending on the GPS equipment used and the nature of data collection. Secondly, it is unlikely that users in the same regions set out to measure features already mapped in OSM.

Presentation Transcript

  • Beyond good enough? Spatial Data Quality and OpenStreetMap data
    Dr Muki Haklay
    Department of Civil, Environmental and Geomatic Engineering, UCL
    m.haklay@ucl.ac.uk
    With contributions from Aamer Ather (M.Eng 2009) and Naureen Zulfiqar (M.Eng 2008)
    Ordnance Survey data was kindly provided by the Ordnance Survey research unit.
    OSM data was provided by GeoFabrik & CloudMade
  • Outline
    Understanding quality of geographical information
    Evaluation of OSM with Meridian data set
    Evaluation of OSM with MasterMap
    What does it all mean?
  • The quality issue
    How good is the data?
    First question: good for what? Subjective quality – fitness for purpose/use
    Second question: how to measure? Objective quality – but need to evaluate it in light of the first question
  • The quality issue
    How good is the data?
    Positional accuracy – the position of features or geographic objects in either two or three dimensions
    Temporal accuracy – how up to date is the data? Does it present the existing situation and when will it be updated?
    Thematic/attribute accuracy – for quantitative attributes (width) and qualitative attributes (geographic names)
    Completeness – The presence and absence of objects in a dataset at a particular point in time
    Logical consistency – adherence to the logical rules of the data structure, attribution and relationships
  • The ‘problem’
    We know little about the people that collect it, their skills, knowledge or patterns of data collection
    Loose coordination and no top-down quality assurance processes – can’t produce good data
    It is not complete and comprehensive – there are white areas
  • Who collects?
  • Who collects?
    (c) Dair Grant
    (cc) Shaun McDonald
    (cc) Chris Fleming
  • Working together
  • Users
    Participation inequality – small group of users collect most of the information, lots of users collect very little
    Little ‘on the ground’ collaboration. This is important, as it can be the main source of quality assurance - ‘Given enough eyeballs, all bugs are shallow’ (Raymond, 2001)
    But does Linus’ law apply to OSM?!?
  • Accuracy and Completeness – Study I
    Comparing OSM to OS Meridian 2 roads layer
    Meridian 2 – Motorways, major and minor roads are... Complex junctions are collapsed to single nodes and multi-carriageways to single links... some minor roads and cul-de-sacs less than 200m are not represented... Private roads and tracks are not included...
    Nodes are derived from 1:1,250-1:2,500 mapping, with 20m filter around centre line generalisation
  • Positional Accuracy
    Meridian 2 and OSM – Motorway comparison
  • Goodchild and Hunter (1997), Hunter (1999) method
    Assuming that one dataset is of higher quality
    Create buffer around the dataset with known width
    Calculate the percentage of the evaluated dataset that falls within the buffer
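The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: it assumes planar coordinates in metres, approximates the buffer test by cutting the evaluated line into short pieces and checking whether each piece's midpoint lies within the buffer width of the reference line, and all function names (`point_segment_distance`, `distance_to_line`, `buffer_overlap_percent`) and the `step` parameter are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Shortest distance from point p to a polyline given as a list of (x, y)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def buffer_overlap_percent(reference, tested, buffer_width, step=1.0):
    """Approximate percentage of the tested line's length that lies within
    buffer_width of the reference line. Assumption: each piece of at most
    `step` length is classified by its midpoint."""
    inside = 0.0
    total = 0.0
    for a, b in zip(tested, tested[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        if length == 0.0:
            continue
        pieces = max(1, math.ceil(length / step))
        piece_len = length / pieces
        for i in range(pieces):
            t = (i + 0.5) / pieces
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if distance_to_line(mid, reference) <= buffer_width:
                inside += piece_len
            total += piece_len
    return 100.0 * inside / total if total else 0.0


# Made-up geometry: a straight reference road and a tested road that drifts away.
reference_road = [(0.0, 0.0), (100.0, 0.0)]
tested_road = [(0.0, 5.0), (50.0, 5.0), (50.0, 30.0), (100.0, 30.0)]
overlap = buffer_overlap_percent(reference_road, tested_road, buffer_width=20.0)
print(f"{overlap:.1f}% of the tested road lies within the 20 m buffer")
```

A GIS library would normally build the buffer polygon and intersect it with the evaluated line; the sampling approach here trades exactness for having no dependencies.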
  • Motorway comparison
    Buffer of 20m
    Average of 80% - ranging from 59.81% to 88.80%
  • Estimating positional accuracy
  • Positional accuracy
    On each tile, a sample of 100 points was taken and the distance between OSM and Meridian 2 evaluated
    Can see significant variability: from about 3m to over 8m
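A sampled distance estimate of this kind could look roughly like the sketch below. The slides do not say how the 100 points were chosen or matched, so this version makes its own assumptions: points are spaced evenly along the OSM line, each is measured against the nearest point on the reference line, and coordinates are planar metres. The names `sample_along` and `mean_offset` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_along(line, n):
    """Return n points spaced evenly by arc length along a polyline."""
    segments = list(zip(line, line[1:]))
    lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in segments]
    total = sum(lengths)
    points = []
    for i in range(n):
        target = (i + 0.5) * total / n
        for (a, b), length in zip(segments, lengths):
            if length > 0.0 and target <= length:
                t = target / length
                points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
                break
            target -= length
        else:
            # Rounding pushed the target past the end: use the last vertex.
            points.append(line[-1])
    return points


def mean_offset(reference, tested, n=100):
    """Mean distance from n sample points on the tested line to the reference line."""
    distances = [
        min(point_segment_distance(p, a, b) for a, b in zip(reference, reference[1:]))
        for p in sample_along(tested, n)
    ]
    return sum(distances) / len(distances)


# Made-up geometry: the tested line drifts from 2 m to 6 m off the reference.
reference_road = [(0.0, 0.0), (100.0, 0.0)]
tested_road = [(0.0, 2.0), (100.0, 6.0)]
print(f"mean offset: {mean_offset(reference_road, tested_road):.2f} m")
```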
  • Completeness – bulk method
    Assumption: as Meridian 2 is generalised, for each completed sq km:
    Total length(OSM roads)>Total length(Meridian 2 roads)
    Dividing England into 1km grid squares, and running a comparison for each cell
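The per-cell length comparison can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the procedure actually run for England: coordinates are planar metres, each road segment is credited in full to the grid cell containing its midpoint (a real analysis would clip roads at cell boundaries), and the names `length_per_cell` and `length_difference_by_cell` are hypothetical.

```python
import math
from collections import defaultdict


def length_per_cell(lines, cell_size=1000.0):
    """Total road length per grid cell, keyed by (column, row).
    Assumption: a segment belongs to the cell that contains its midpoint."""
    totals = defaultdict(float)
    for line in lines:
        for (ax, ay), (bx, by) in zip(line, line[1:]):
            mid_x, mid_y = (ax + bx) / 2.0, (ay + by) / 2.0
            cell = (math.floor(mid_x / cell_size), math.floor(mid_y / cell_size))
            totals[cell] += math.hypot(bx - ax, by - ay)
    return totals


def length_difference_by_cell(osm_lines, reference_lines, cell_size=1000.0):
    """OSM length minus reference length for every cell that has any roads.
    A positive value is read as 'OSM at least as complete as the reference'."""
    osm = length_per_cell(osm_lines, cell_size)
    ref = length_per_cell(reference_lines, cell_size)
    return {
        cell: osm.get(cell, 0.0) - ref.get(cell, 0.0)
        for cell in set(osm) | set(ref)
    }


# Made-up roads covering two 1 km cells.
osm_roads = [
    [(0.0, 0.0), (800.0, 0.0)],
    [(100.0, 100.0), (100.0, 900.0)],
    [(1200.0, 500.0), (1500.0, 500.0)],
]
meridian_roads = [
    [(0.0, 0.0), (900.0, 0.0)],
    [(1100.0, 500.0), (1900.0, 500.0)],
]
differences = length_difference_by_cell(osm_roads, meridian_roads)
for cell, diff in sorted(differences.items()):
    print(cell, f"{diff:+.0f} m")
```

In this toy data the first cell has more OSM length than reference length, while the second has less and would be flagged as incomplete.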
  • London
  • Birmingham
  • Manchester and Liverpool
  • Length comparison
    For 29.3% of the area of England, OSM is nearing completion and is as good as Meridian 2 (March 2008). Estimated at 45-50% today.
    When attributes are added to this, the percentage drops to 24.5% (March 2008). Estimated at 35% today.
    Centres of major cities are well mapped.
  • Completeness - visual comparison
  • Completeness – visual comparison
  • Completeness – difference by user?
  • Comparison II – Ordnance Survey MasterMap
    Data used for comparison: OS MasterMap Integrated Transport Network (ITN) layer
    ITN consists of road network information
    The most accurate and up-to-date geographic reference for Great Britain’s road structure
    Any major real world changes are updated within 6 months
    Used for numerous applications
    e.g. Transport management systems, road routing, emergency planning...
  • Four test locations chosen:
    TQ28se
    TQ38se
    TQ17ne
    TQ37sw
  • Buffer analysis – again based on Goodchild and Hunter (1997) buffer comparison technique:
    Comparison methodology (figure: a buffer of width X around the ITN feature, with the OSM feature overlaid)
  • Buffer overlap results:
    109 roads examined covering over 328 km
    Results of MasterMap comparison
  • TQ38se (East London)
    TQ28se (North/Central London)
  • TQ37sw (South London)
    TQ17ne (West London)
  • Quality not linked to length
  • What does it mean?
    OSM is better than Meridian 2 in terms of positional accuracy, and less accurate than MasterMap
    The differences that were found in comparison I are a mix of the positional inaccuracies of both Meridian 2 and OSM. The higher overlap with MasterMap tells us that OSM was the more accurate of the two...
  • However ...
  • What are they paying for?
    Meridian is officially not complete, clearly not accurate in terms of position, and without a clear ‘major changes updated within 6 months’ rule
    Hypothesis:
    When people buy geodata, they pay for the errors, or the notion that the errors are well known and quantified.
    Are they?
  • Putting a price tag on OSM ?
    1 seat of Meridian 2 for England - £1272
    OSM is 35% complete (positional and attribute) ...
    ... But higher positional accuracy than Meridian 2
    So maybe £500 per seat?
    If so, each sq km of OSM is worth about 40p or 0.5€.
  • Linus’ law and OSM – inconclusive
  • So should I use OSM?
    OSM is fit for many purposes for which Meridian 2 is suitable
    Positional accuracy is satisfactory for many applications. Attribute accuracy is also satisfactory.
    Completeness in major urban areas is satisfactory – and if the work is at a specific location, it is easy to improve and complete the dataset
  • Conclusions
    OSM quality is beyond good enough, it is a product that can be used for a wide range of activities
    Better quality proxies can be developed (for example, by user)
    Quality procedures should also be developed with passive sensing from mobile devices
    More work is required on Linus’ Law
  • Further reading
    Haklay, M., 2008, How good is OpenStreetMap information? A comparative study of OpenStreetMap and Ordnance Survey datasets for London and the rest of England, submitted to Environment and Planning B.
    Haklay, M. and Weber, P., 2008, OpenStreetMap – User Generated Street Map, IEEE Pervasive Computing.
    Haklay, M., Singleton, A., and Parker, C., 2008, Web mapping 2.0: the Neogeography of the Geoweb, Geography Compass
    Haklay, M., 2008, Open Knowledge – learning from environmental information, presented at the Open Knowledge Conference (OKCon) 2008, London, 15 March.
    Haklay, M., 2007, OSM and the public - what barriers need to be crossed? Presented at State of the Map conference, Manchester, UK, 14-15 July.
    To get a copy, write to m.haklay@ucl.ac.uk , or get them on povesham.wordpress.com