
Data Quality and Neogeography



A review of the role played by User Generated Content in creating or augmenting spatial databases.

Published in: Technology

  1. 2. Before We Start <ul><li>I am not here to persuade you about the usefulness or limitations of Neogeography or User Generated Content </li></ul><ul><li>I am here to share my views on issues relating to the topic of spatial data quality and neogeography </li></ul><ul><li>Disclaimer - In general, my observations derive from my familiarity with mapping, navigation and local search </li></ul>
  2. 3. My Background <ul><li>PhD in Geography, specializing in Cartography </li></ul><ul><li>Attended AutoCarto 1 in 1974 (and gave the keynote in 2008) </li></ul><ul><li>Associate Professor of mapping and geography at SUNY Albany (1972–1985) </li></ul><ul><li>Associate at Spad Systems </li></ul><ul><li>Chief Cartographer, Chief Technologist and VP of BizDev for Rand McNally (1986-1999) </li></ul><ul><li>CTO and EVP of Engineering for go2 Systems (YP over cell phones) </li></ul><ul><li>Now run a consulting business focused on geospatial, especially local search, mapping and navigation applications </li></ul>
  3. 4. Data Quality and Neogeography Dr. Mike Dobson President TeleMapics LLC [email_address]
  4. 5. Spatial Data Quality? <ul><li>Overall concern regarding the “fitness” of data for a particular use </li></ul><ul><ul><li>Accuracy of position </li></ul></ul><ul><ul><ul><li>resolution </li></ul></ul></ul><ul><ul><li>Accuracy of Attribution </li></ul></ul><ul><ul><ul><li>Logical Consistency </li></ul></ul></ul><ul><ul><li>Completeness </li></ul></ul><ul><ul><ul><li>Including spatial coverage </li></ul></ul></ul><ul><ul><li>Temporal relevance </li></ul></ul><ul><ul><li>Metadata </li></ul></ul>
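"Accuracy of position" is commonly summarized as a root-mean-square error (RMSE) between database coordinates and independent check points of higher accuracy. The US National Standard for Spatial Data Accuracy (NSSDA) is one example of this approach. The sketch below is a minimal illustration, not a procedure from this presentation. It assumes planar coordinates in metres and assumes that each database point has already been matched to its surveyed counterpart. The function name `horizontal_rmse` and the sample values are hypothetical.

```python
import math


def horizontal_rmse(db_points, ref_points):
    """Horizontal RMSE between matched database and reference coordinates.

    Both arguments are equal-length lists of (x, y) tuples in the same
    planar units (assumed metres); item i of each list is the same feature.
    """
    if len(db_points) != len(ref_points) or not db_points:
        raise ValueError("need two non-empty lists of matched points")
    sq_sum = 0.0
    for (x, y), (rx, ry) in zip(db_points, ref_points):
        sq_sum += (x - rx) ** 2 + (y - ry) ** 2  # squared horizontal offset
    return math.sqrt(sq_sum / len(db_points))


# Hypothetical check points: database positions vs. surveyed positions.
database = [(100.0, 200.0), (150.0, 260.0), (300.0, 410.0)]
surveyed = [(103.0, 204.0), (150.0, 260.0), (300.0, 405.0)]
print(round(horizontal_rmse(database, surveyed), 2))  # → 4.08
```

RMSE covers only one of the dimensions listed on this slide. Attribute accuracy, completeness and temporal relevance each need their own measures.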
  5. 6. Spatial Data’s Emerging Popularity <ul><li>The world of spatial data is exploding </li></ul><ul><ul><ul><li>Accessibility of spatial data is increasing </li></ul></ul></ul><ul><ul><ul><li>Availability of spatial data is increasing </li></ul></ul></ul><ul><ul><ul><li>Today’s online environment provides </li></ul></ul></ul><ul><ul><ul><ul><li>Easy-to-use tools for collecting spatial data </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Easy-to-use tools for analyzing spatial data </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Easy-to-use tools for presenting spatial data </li></ul></ul></ul></ul>
  6. 7. Why Is This of Concern? <ul><li>The quality of spatial data affects how successfully spatial concepts can be communicated </li></ul><ul><ul><li>Could this explosive growth influence the quality of spatial data? </li></ul></ul>
  7. 8. Why Data Quality Is Key
  8. 9. No Integrity!
  9. 10. Neogeography <ul><li>Neogeography </li></ul><ul><ul><li>“new” geography using non-traditional tools </li></ul></ul><ul><ul><li>Neogeographers </li></ul></ul><ul><ul><ul><li>Want to communicate/share their interests in geography and are willing to do something about it </li></ul></ul></ul>
  10. 11. NeoGeos <ul><li>What roles do neogeographers play in the process of communicating spatial data? </li></ul><ul><ul><li>Data collectors – database creators </li></ul></ul><ul><ul><li>Data analyzers </li></ul></ul><ul><ul><li>Data presenters </li></ul></ul><ul><li>While all three roles affect or are affected by “data quality”, today I will focus on neogeographers and data collection/database creation </li></ul>
  11. 12. Spatial Data Quality and Neogeography <ul><li>To help you understand my perspective on data quality and neogeography, I would like to explore User Generated Content (UGC) </li></ul><ul><ul><li>UGC is one of the primary means neogeographers use to express their interest in geography </li></ul></ul><ul><ul><ul><li>On this journey we will loop outside of geography and then fall back in through mapping and other uses of spatial data. </li></ul></ul></ul>
  12. 13. User Generated Content? <ul><li>Content that is produced by users of web sites and digital media </li></ul><ul><ul><li>Contrasted with traditional media producers such as broadcasters, production companies, publishing companies and map database companies </li></ul></ul>
  13. 14. So What’s Important About UGC? <ul><li>Equality of opportunity to publish </li></ul><ul><li>Coupled with one of the most significant demographic trends in the last century: </li></ul><ul><ul><li>“It’s about me” (e.g. use of YouTube, MySpace, Facebook) </li></ul></ul><ul><ul><ul><li>“Especially with respect to the streets, roads and trails I travel, as well as the POIs I frequent and the spatial topics of interest to me” </li></ul></ul></ul>
  14. 15. Social Networking
  15. 16. How Did This Happen? <ul><li>Technology that allows you to be “connected”, as well as to communicate and collaborate on your own terms </li></ul><ul><ul><li>Internet </li></ul></ul><ul><ul><li>Cellular telephony </li></ul></ul><ul><li>Development of comprehensive spatial databases </li></ul><ul><ul><li>Pushing geospatial into the mainstream: neogeography </li></ul></ul>
  16. 17. How Did This Happen? <ul><li>Networks provide for </li></ul><ul><ul><li>Collective intelligence – the hive mentality or perhaps the Borg </li></ul></ul><ul><ul><li>Aggregated knowledge from decentralized sources (Wikipedia – Wikinomics) </li></ul></ul><ul><ul><li>Low cost collaboration </li></ul></ul>
  17. 18. UGC Potential Benefits <ul><li>Linus’s law </li></ul><ul><ul><li>With enough eyes, all bugs (spatial errors) become trivial </li></ul></ul><ul><li>Contributors exhibit </li></ul><ul><ul><li>Self-selection </li></ul></ul><ul><ul><li>Focus </li></ul></ul><ul><ul><li>Self-benefit </li></ul></ul><ul><li>Numerousness </li></ul><ul><ul><li>There should be more interested spatial data contributors than professional map editors </li></ul></ul><ul><li>Spatial distribution </li></ul><ul><ul><li>UGCers are more widely distributed than professional map editors. </li></ul></ul>
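Linus’s law (Eric Raymond’s “given enough eyeballs, all bugs are shallow”) can be given a rough quantitative reading. Suppose each contributor who looks at a map feature independently notices a particular error with probability p. Then the chance that at least one of n contributors notices it is 1 − (1 − p)^n. This model is an illustrative assumption and does not come from the presentation. Its independence assumption is exactly what the criticisms that follow (user priorities, prejudice in responses) call into question. The function names and the 10% figure are hypothetical.

```python
def prob_detected(p, n):
    """Chance that at least one of n reviewers catches a given error,
    assuming each one independently catches it with probability p."""
    return 1.0 - (1.0 - p) ** n


def reviewers_needed(p, target):
    """Smallest number of reviewers n with prob_detected(p, n) >= target."""
    if not (0.0 < p <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 0
    while prob_detected(p, n) < target:
        n += 1
    return n


# Hypothetical: each passer-by notices a wrong street name 10% of the time.
print(round(prob_detected(0.1, 10), 2))   # → 0.65
print(reviewers_needed(0.1, 0.95))        # → 29
```

Under this model, the “enough eyes” argument only holds where contributors actually are. A feature that nobody travels gets n = 0 and no chance of correction, which is the coverage concern raised on the following slides.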
  18. 19. Criticisms Of UGC <ul><li>Some error situations are too complex to be understood real-time </li></ul><ul><li>Usability may be low </li></ul><ul><li>May require extensive error checking </li></ul><ul><li>User priorities may lead to unreliability </li></ul><ul><li>Prejudice in responses </li></ul>
  19. 20. Lake What Road?
  20. 21. Not Enough Contributors – Data Points?
  21. 22. User Priorities - Oooops
  22. 23. Prejudice in Response?
  23. 24. Prejudice in Response
  24. 25. UGC And Spatial Databases
  25. 26. Spatial Database Creation
  26. 27. What’s Being Optimized In The Previous Process? <ul><li>spatial data quality </li></ul><ul><ul><li>Accuracy of position </li></ul></ul><ul><ul><ul><li>resolution </li></ul></ul></ul><ul><ul><li>Accuracy of Attribution </li></ul></ul><ul><ul><ul><li>Logical Consistency </li></ul></ul></ul><ul><ul><li>Completeness </li></ul></ul><ul><ul><ul><li>Including spatial coverage </li></ul></ul></ul><ul><ul><li>Temporal relevance </li></ul></ul><ul><ul><li>Metadata </li></ul></ul>
  27. 28. How Optimized? <ul><li>Data Quality is an integral part of the process </li></ul><ul><ul><li>Initially </li></ul></ul><ul><ul><ul><li>Data collected according to specifications </li></ul></ul></ul><ul><ul><ul><ul><li>Bad data re-collected or placed in the update queue </li></ul></ul></ul></ul><ul><ul><li>Ongoing </li></ul></ul><ul><ul><ul><li>Every year significant spatial changes are accommodated. </li></ul></ul></ul><ul><ul><ul><li>Areas of high change are identified and updated. </li></ul></ul></ul><ul><ul><ul><li>Other changes are found by systematically working research teams through the entire coverage over time </li></ul></ul></ul><ul><ul><li>The overall assignment is designed to maximize the time value of money, while increasing the integrity of the database. </li></ul></ul>
  28. 29. Harmonization <ul><li>It is this attempt to actively harmonize all data that distinguishes database-building efforts. </li></ul><ul><li>Important Issues </li></ul><ul><ul><ul><li>Who directs crowdsourced data from an editorial perspective? </li></ul></ul></ul><ul><ul><ul><li>Who sets standards for crowdsourced data? </li></ul></ul></ul><ul><ul><ul><li>Who quality-controls crowdsourced data? </li></ul></ul></ul><ul><ul><ul><li>What external guidance exists in crowdsourced systems? </li></ul></ul></ul>
  29. 30. Three Categories of Spatial Data <ul><li>Controlled data </li></ul><ul><ul><ul><li>OS, Navteq, TeleAtlas, INFOusa </li></ul></ul></ul><ul><li>Hybrid (a mix of controlled and uncontrolled data) </li></ul><ul><ul><ul><li>Google, Yahoo, MSN, TomTom </li></ul></ul></ul><ul><li>Crowdsourced (uncontrolled) </li></ul><ul><ul><ul><li>OSM, Flickr, etc. </li></ul></ul></ul>
  30. 31. Issue <ul><li>It is possible to manage controlled data quality to meet specific requirements </li></ul><ul><li>It is possible to manage hybrid data quality to meet specific requirements </li></ul><ul><li>But can you manage crowdsourced data quality to meet specific requirements on a reliable basis? </li></ul><ul><li>Let’s look at database compilation for some insights </li></ul>
  31. 32. Compilation <ul><li>Commercial </li></ul><ul><ul><li>Training in compilation </li></ul></ul><ul><ul><li>Specialization </li></ul></ul><ul><ul><li>Staff size limited </li></ul></ul><ul><ul><li>Research limited </li></ul></ul><ul><ul><li>Sweat of the brow </li></ul></ul><ul><ul><ul><li>But salaried sweat of the brow </li></ul></ul></ul><ul><li>Wiki </li></ul><ul><ul><li>Self Selection </li></ul></ul><ul><ul><li>Local experience </li></ul></ul><ul><ul><li>Staff size potentially unlimited </li></ul></ul><ul><ul><li>Research hours potentially unlimited </li></ul></ul><ul><ul><li>Avocation </li></ul></ul>
  32. 33. Compare and Contrast <ul><li>Commercial </li></ul><ul><ul><li>What are my coverage goals? </li></ul></ul><ul><ul><li>What are my accuracy goals? </li></ul></ul><ul><ul><li>How much can I spend on updating? </li></ul></ul><ul><ul><li>How large a capable staff can I afford? </li></ul></ul><ul><ul><ul><li>How well can I pay them? </li></ul></ul></ul><ul><ul><ul><li>How can I otherwise incent them to create the best database possible? </li></ul></ul></ul><ul><li>WIKI </li></ul><ul><ul><li>How many people will contribute? </li></ul></ul><ul><ul><ul><li>How many are capable? </li></ul></ul></ul><ul><ul><li>Where are they located? </li></ul></ul><ul><ul><ul><li>Does this match areas of weak coverage? </li></ul></ul></ul><ul><ul><li>How long will it take to get good results over large coverages? </li></ul></ul><ul><ul><li>How to motivate these collaborators over long periods? </li></ul></ul>
  33. 34. What Are The Potential Weaknesses of WIKI? <ul><li>Common issues </li></ul><ul><ul><li>Not enough data gatherers to validate the data </li></ul></ul><ul><ul><ul><li>or a method to redeploy them </li></ul></ul></ul><ul><ul><li>Not enough coverage to meet the need (the distribution of the UGCers) </li></ul></ul><ul><ul><ul><li>or a method to redeploy them </li></ul></ul></ul><ul><ul><li>Lack of Standards </li></ul></ul><ul><ul><li>Lack of Quality Control </li></ul></ul><ul><li>But all of these limitations can be accommodated </li></ul>
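One way a crowdsourced system might add the missing quality control is to accept an attribute value only when enough independent contributors agree on it. Features without that agreement would go to an editor or a re-check queue. The rule below is a hypothetical sketch, not a method described in the presentation. The function name `consensus_value` and the thresholds `min_reports` and `min_share` are assumptions chosen for illustration.

```python
from collections import Counter


def consensus_value(reports, min_reports=3, min_share=0.6):
    """Return the agreed value for one attribute, or None to flag for review.

    reports: list of (contributor_id, value) pairs for a single feature
    attribute (e.g. a street name). Only one report per contributor is
    counted, so a single user cannot outvote everyone else.
    """
    latest = {}
    for contributor, value in reports:
        latest[contributor] = value  # a later report from the same contributor wins
    if len(latest) < min_reports:
        return None  # too few independent observations: queue for checking
    value, votes = Counter(latest.values()).most_common(1)[0]
    if votes / len(latest) >= min_share:
        return value
    return None  # contributors disagree: route to an editor


reports = [("a", "Lake Shore Dr"), ("b", "Lake Shore Dr"),
           ("c", "Lake Shore Dr"), ("d", "Lakeshore Rd")]
print(consensus_value(reports))  # → Lake Shore Dr
```

A rule like this helps only where contributors are dense enough to reach `min_reports`. In sparsely covered areas it simply flags everything, which is why the coverage and redeployment issues above still matter.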
  34. 35. Getting Around Some UGC Issues
  35. 36. Are Other Types of Spatial Databases Superior? <ul><li>Even with the benefit of moolah ($), major navigation databases are </li></ul><ul><ul><li>Out of date </li></ul></ul><ul><ul><li>Inaccurate </li></ul></ul><ul><ul><li>Non-comprehensive </li></ul></ul><ul><ul><li>Of variable quality </li></ul></ul><ul><ul><li>Too expensive to maintain </li></ul></ul><ul><ul><ul><li>Navteq database extension and update costs in 2007 were over $300,000,000 </li></ul></ul></ul>
  36. 37. Haywards Heath
  37. 38. And That’s Why UGC and Neogeographers <ul><li>Will become an integral part of building spatial databases </li></ul><ul><li>Hybrid data collection systems using UGC and controlled data are where geospatial is going </li></ul><ul><ul><li>Let’s look </li></ul></ul>
  38. 39. Old Information Sharing
  39. 40. New Information Sharing
  40. 41. What’s The New Process
  41. 42. Social Networking Tools Of Interest in Compilation
  42. 43. Spatial Data Collection <ul><ul><li>Some UGC will be active </li></ul></ul><ul><ul><ul><li>User connects to an app and enters relevant spatial data for updating or extending a spatial database </li></ul></ul></ul><ul><ul><li>Some UGC will be passive </li></ul></ul><ul><ul><ul><li>Device tracks and reports (anonymously) user paths, builds database by merging path information over time </li></ul></ul></ul><ul><ul><ul><ul><li>Passive is particularly useful in building navigation databases </li></ul></ul></ul></ul>
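The passive approach, in which devices report anonymous paths and a database is built “by merging path information over time”, can be illustrated with a very simple grid aggregation. Each GPS fix is snapped to a grid cell. A cell becomes a candidate road cell once enough distinct traces have passed through it. This is a hypothetical sketch only. Production systems use map matching, centerline fitting and privacy safeguards that are not shown here. The function name, the default `cell_size` of 0.0005 degrees (roughly 55 m of latitude) and `min_traces` are assumptions.

```python
from collections import defaultdict


def candidate_road_cells(traces, cell_size=0.0005, min_traces=5):
    """Merge anonymous GPS traces into grid cells that probably contain a road.

    traces: dict mapping an anonymous trace id to a list of (lat, lon) fixes.
    Each trace is counted at most once per cell; cells crossed by at least
    min_traces distinct traces are returned as (row, col) grid indices.
    """
    seen_by = defaultdict(set)  # cell -> ids of traces that passed through it
    for trace_id, fixes in traces.items():
        for lat, lon in fixes:
            cell = (int(lat // cell_size), int(lon // cell_size))
            seen_by[cell].add(trace_id)
    return {cell for cell, ids in seen_by.items() if len(ids) >= min_traces}


# Toy example with 1-unit cells: three traces share cell (0, 0).
traces = {
    "t1": [(0.5, 0.5), (1.5, 0.5)],
    "t2": [(0.2, 0.7)],
    "t3": [(0.9, 0.1)],
}
print(candidate_road_cells(traces, cell_size=1.0, min_traces=3))  # → {(0, 0)}
```

Counting distinct traces rather than raw fixes keeps one slow-moving or stationary device from making a cell look like a busy road. The sketch does not interpolate between fixes, so sparse sampling can leave gaps along a real road.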
  43. 44. Relative Cost
  44. 45. Relative Accuracy
  45. 46. Summing Up <ul><li>Data Collection Systems </li></ul><ul><ul><li>Closed – commercial compilation efforts, no UGC </li></ul></ul><ul><ul><li>Open – WIKI approaches, no proprietary data </li></ul></ul><ul><ul><li>Hybrid – where geospatial is going </li></ul></ul><ul><ul><ul><li>Improves spatial data accuracy by combining the best of both approaches. </li></ul></ul></ul>
  46. 47. Raises These Questions <ul><li>Will the winners be </li></ul><ul><ul><li>Established commercial companies that capitalize on UGC to augment their data? </li></ul></ul><ul><ul><li>New competitors that commercialize UGC and augment these data to compete with established commercial systems? </li></ul></ul>
  47. 48. PND Data Flow – A Winner
  48. 49. UGC Open Street Data Flow – No Medal
  49. 50. Commercializing UGC
  50. 51. Relative Benefits Of Types Of UGC By Device
  51. 52. Why We Need UGC and Neogeographers
  52. 53. Thanks