4B_2_A step towards the improvement of spatial quality of web 2.0 geo-applications

  • The subject of the presentation is the improvement of the spatial data quality (SDQ) of Web 2.0 geo-applications, examining in particular the case of OpenStreetMap (OSM).
  • Well, maybe the most fundamental problem of GIS is how we can put the real world into an information system: how we can model reality in such a way that it fits into a GIS.
  • To deal with that problem, the Ordnance Survey has published a catalogue of real-world objects (RWOs), which actually serves as a specification for their OS MasterMap product. The purpose of this catalogue, which runs to 566 pages, is to provide a list of the RWOs in the product and a list of the features and attributes of each RWO.
  • In fact, OSM has something similar. Well, it is not a catalogue per se but rather a wiki page, yet it serves the same purpose: to provide a list of entities and the possible attributes (or tags) that users can assign to these entities.
  • The thing is, though, that this list has not simply been published; it has been created through democratic procedures with the help of wiki technology. In brief, OSM users can suggest, through a voting system, which entities or tags need to be deleted, altered or added on the Map Features page.
  • So in fact, when we speak of OSM data, we actually speak of the geometry and the attributes, or tags, that users have assigned to real-world entities.
  • Now, regarding the quality of the OSM geometry, there has been some research examining either completeness or positional accuracy against the OS Meridian 2 dataset. But what we haven’t seen up to now is an examination of the quality of the tags.
  • So the question is: what is going on with the tags in OSM?
  • After all, tags are what actually transform spaghetti-like digitized data into a proper map.
  • So, we looked into what is going on with the OSM tags for GB. This graph shows two things. The first is the number of tags recorded for each of these 18 categories for GB. We see that the population of tags ranges from just a few thousand for motorway_link up to 900k tags for the residential roads category. In total, these 18 categories have more than 2.2 million tags. The second thing shown here is the number of unique tags recorded for each category. The interesting thing in this line graph is that we obviously don’t need more than 300 unique tags to describe a residential road nor, well, even 50 unique tags to describe a motorway_link.
  • Now, the thing is that, despite the huge number of tags generated by the OSM contributors, the average number of tags per recorded entity is quite small, with the majority of the categories having between 1 and 3 tags per feature. This really gives us an indication of OSM completeness in terms of entity attribution, and it certainly indicates that the population of tags will keep getting bigger and bigger, both because new entities will be digitized and because the average number of tags per entity will grow.
  • Now, what we wanted to see is how often a new tag is introduced. Well, the answer is that this depends on the total tag population of each category. So, for example, for residential roads we have, on average, a new tag for almost every 3,000 tags, whereas for primary roads we get a new tag for every 600 tags. So, the question now is: is this good or bad? Well, in order to answer that question, we translated these figures into percentage growth.
  • So, this graph shows how much the tag population of each category has to grow in order for a new tag to be introduced in that category. The interesting thing here is that, above a threshold of about 40,000 tags, an increase of 0.3% to 0.5% creates a new tag in each category.
  • Now, the next question is: OK, we do not need that many tags per feature category, but how many tags are actually enough? We see here that just a small fraction of the tags covers 95% of the tag population in each case. So, actually we need only the tip of the iceberg to correctly model the real world and not the whole iceberg itself. So, is there something we can do about that?
  • Now, our initial aim was to examine the quality of OSM tags. But examine the quality against what? OSM is a product that literally has no specification, and it captures reality in much more detail than any other product. So what we wanted first was to create an XML Schema that would work as the OSM specification. We did that both by manually gathering information from the OSM wiki pages and by examining the tags included in the tip of the iceberg that I showed you earlier. So, just to give you an example of what the schema looks like.
  • So, when we had finished some fragments of that Schema, we started performing all sorts of comparisons with the actual data. Here is some of the interesting stuff we found. When we examined the entities of some of the OSM feature categories, we found that the percentage of Schema violations was really high. We noticed that the majority of the entities violated the schema because they had the ‘created_by’ tag, which had been deprecated. When we didn’t take the created_by tag into account, we saw that the percentage of feature violations was considerably smaller.
  • We performed the same evaluation on larger categories, including all OSM Highways, Nodes and Places, but this time we examined only the specific Schema principle that says that tags should not have the created_by key, both for all the entities and for entities created after 30 April last year, when this rule was adopted. The interesting thing here is that when new guidelines are announced for the OSM dataset, users don’t implement them immediately; rather, they continue providing data as they used to.
  • So, what we suggest is that, in order to improve the quality of OSM (and why not of other VGI geo-applications?), some sort of formalization should exist under the hood. By putting the XML Schema as a layer between the editors that users work with and the database, we can have Freedom and Formalization and still preserve the Quality Standards of the dataset.
  • This schema could also be used, again under the hood, together with the voting system, so that any changes decided would be automatically propagated to the schema and finally to the data.
  • Before answering that question, let’s see another parameter of the problem. In OSM, since users are free to digitize or capture with a GPS whatever they want, there are numerous cases where data capture has stopped without the entire entity actually being recorded. Where is the problem with that?
  • Suppose that, while the geometry of the two adjacent features is pretty much OK, there are discrepancies in the tags contributed to each part of that entity. That kind of discrepancy actually diminishes the value of the entity.
  • Now, the OSM community has set up some applications to deal with errors in the OSM dataset. These applications, while quite impressive, in my view deal with the problem in a patchy way, without providing a standard way of solving problems. In other words, while these apps spot some of the problems and invite users to correct them, they provide no guarantee that, while correcting one error, the user is not introducing another one.
  • So, the next time someone shows you that really exciting graph of the OSM growth just remember that for every tiny fraction of that growth the errors are growing as well.
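The tag statistics discussed in the notes (total versus unique tags per category, average tags per entity, and the small set of tags that covers most of the population) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a "tag" is counted by its key, the sample entities are invented rather than taken from OSM, and the function names `tag_stats` and `keys_covering` are hypothetical, not part of any OSM tool or of the analysis presented here.

```python
from collections import Counter


def tag_stats(entities):
    """Return (total tags, unique tag keys, mean tags per entity).

    `entities` is a list of dicts, one dict of tags per recorded entity.
    """
    counts = Counter(key for tags in entities for key in tags)
    total = sum(counts.values())
    mean = total / len(entities) if entities else 0.0
    return total, len(counts), mean


def keys_covering(entities, share=0.95):
    """Most frequent keys, taken in order until they reach `share` of all tags."""
    counts = Counter(key for tags in entities for key in tags)
    total = sum(counts.values())
    covered, chosen = 0, []
    for key, n in counts.most_common():
        if covered >= share * total:
            break
        chosen.append(key)
        covered += n
    return chosen


# Invented sample: tags of a few "residential" ways (not real OSM data).
residential = [
    {"highway": "residential", "name": "X Road"},
    {"highway": "residential", "name": "Y Road", "oneway": "yes"},
    {"highway": "residential"},
    {"highway": "residential", "name": "Z Road", "created_by": "JOSM"},
]

total, unique, mean = tag_stats(residential)
print(total, unique, mean)
print(keys_covering(residential, share=0.75))
```

On a real extract the same two functions would be run once per feature category, which gives the "unique vs total" and "popular tags" figures for each category side by side.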

    1. 1. A step towards the improvement of spatial data quality of Web 2.0 geo-applications The case of OpenStreetMap Vyron Antoniou, Muki Haklay, Jeremy Morley Department of Civil, Environmental and Geomatic Engineering
    2. 2. A fundamental GIS problem Information System Real World http://www.bing.com/maps Google Earth
    3. 4. OSM Map Features
    4. 5. Wiki Democracy +
    5. 6. OSM Data Geometry Attributes (Tags) +
    6. 7. OSM’s Geometry Haklay et al. Antoniou et al. Completeness Positional Accuracy
    7. 8. Tags?
    8. 10. Unique Tags vs Total Tags for each OSM Feature Category (GB) Sum: 2.25M tags
    9. 11. How many tags do we have for each entity?
    10. 12. Residential (2826) Primary (623) How often there is a new Tag introduced?
    11. 13. How often there is a new Tag introduced?
    12. 14. Unique Tags vs Popular Tags (95% of population)
    13. 15. From OSM wiki-pages to XML Schema XML Schema = OSM Specification
    14. 16. From OSM wiki-pages to XML Schema
    15. 17. Data Quality changes whenever there is a change in: the data (e.g. due to a transformation); the ground truth; the specifications of the product
    16. 18. Data Quality changes whenever there is a change in: the data (e.g. due to a transformation); the ground truth; the specifications of the product
    17. 19. Merkaartor Potlatch JOSM Freedom, Formalization and Quality Standards?
    18. 20. Freedom, Formalization and Quality Standards?
    19. 23. Final Points: Tags can greatly affect the quality of a VGI Map. The loose structure of Feature extraction and attribution introduces errors. Errors are proportional to the tags’ population. The uncontrolled voting system deteriorates the overall quality of the OSM data. Errors or discrepancies in Features’ Tags can spread around. Formalization can be achieved under the hood.
    20. 24. Thank you
    21. 25. Thank you!
    22. 26. User Discrepancies Start and Stop Digitizing?
    23. 27. Name: X Road One-way: Yes Name: Y Road One-way: No User Discrepancies
    24. 28. OSM Quality Assurance Openstreetbugs Keep Right
    25. 29. The purpose of this specification (566 pages!) is to: • provide a list of the RWOs present in OS MasterMap data; • define the RWOs present in OS MasterMap data; • list the features and attributes of each of the RWOs in OS MasterMap data; • clarify the representation of the RWOs in OS MasterMap data when needed; and • provide a document that will accommodate change as OS MasterMap data is enhanced.
    26. 30. OSM Growth Source: http://www.openstreetmap.org
    27. 33. OSM Data Quality Tests
    28. 36. Video
    29. 37. Tags: 2.25M tags in 18 OSM Feature Categories (UK)
    30. 38. Unique Tags per OSM Feature Category
    31. 39. Unique Tags vs Total Tags for each OSM Feature Category
    32. 40. What is really new with OSM? Free Raster maps…..Google Maps, Bing Maps, Yahoo! Maps? Licence terms User generated Content Geometry….. Vector data Attributes….. Tags
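The comparison of entities against a specification, with and without the deprecated created_by key, could look roughly like the sketch below. It is not the XML Schema from the talk: the specification here is a hypothetical Python dictionary of allowed keys per category, the sample ways are invented, and the names `SPEC`, `violates` and `violation_rate` are assumptions made for illustration. A real setup would validate OSM XML against an actual schema document.

```python
# Hypothetical, heavily simplified stand-in for the XML Schema:
# the tag keys allowed for each feature category.
SPEC = {
    "residential": {"highway", "name", "oneway", "maxspeed"},
}
DEPRECATED = frozenset({"created_by"})


def violates(tags, category, ignore=frozenset()):
    """True if the entity carries a key the spec does not allow.

    Keys listed in `ignore` are skipped, so the comparison can be
    repeated without counting the deprecated created_by key.
    """
    allowed = SPEC[category]
    return any(k not in allowed and k not in ignore for k in tags)


def violation_rate(entities, category, ignore=frozenset()):
    """Percentage of entities that violate the spec."""
    if not entities:
        return 0.0
    bad = sum(violates(tags, category, ignore) for tags in entities)
    return 100.0 * bad / len(entities)


# Invented sample entities (not real OSM data).
ways = [
    {"highway": "residential", "name": "X Road", "created_by": "JOSM"},
    {"highway": "residential", "created_by": "Potlatch"},
    {"highway": "residential", "name": "Y Road", "surface_typo": "asphalt"},
    {"highway": "residential", "oneway": "yes"},
]

print(violation_rate(ways, "residential"))              # every violation counted
print(violation_rate(ways, "residential", DEPRECATED))  # created_by ignored
```

Running the check twice, once strictly and once with the deprecated key ignored, separates violations caused by an outdated guideline from violations caused by genuinely unexpected tags.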
