Before We Start
  • I am not here to persuade you about the usefulness or limitations of Neogeography or User Generated Content
  • I am here to share my views on issues relating to spatial data quality and neogeography
  • Disclaimer: in general, my observations derive from my familiarity with mapping, navigation and local search
My Background
  • PhD in Geography, specializing in Cartography
  • Attended AutoCarto 1 in 1974 (and gave the keynote in 2008)
  • Associate Professor of mapping and geography at SUNY Albany (1972–1985)
  • Associate at Spad Systems
  • Chief Cartographer, Chief Technologist and VP of BizDev for Rand McNally (1986–1999)
  • CTO and EVP of Engineering for go2 Systems (YP over cell phones)
  • Now run a consulting business focused on geospatial, especially local search, mapping and navigation applications
Data Quality and Neogeography
Dr. Mike Dobson, President, TeleMapics LLC
[email_address]
Spatial Data Quality?
  • Overall concern regarding the “fitness” of data for a particular use
    • Accuracy of position resolution
    • Accuracy of attribution
    • Logical consistency
    • Completeness, including spatial coverage
    • Temporal relevance
    • Metadata
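Two of the quality components listed on this slide, positional accuracy and completeness, lend themselves to simple numeric checks. The following is only a minimal sketch of how they might be measured, not a method taken from the talk. The function names, the paired-point input format and the use of feature ids are all assumptions made for illustration.

```python
import math


def positional_rmse(measured, reference):
    """Root-mean-square positional error between paired (x, y) points.

    Both lists must be in the same planar coordinate system; the result
    is in the units of the input coordinates.
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal-length, non-empty point lists")
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def completeness(captured_ids, reference_ids):
    """Fraction of reference features that also appear in the captured set."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(captured_ids)) / len(reference)
```

Both checks presuppose a trusted reference dataset to compare against, which is itself one of the open questions for crowdsourced data.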
Spatial Data’s Emerging Popularity
  • World of spatial data is exploding
    • Accessibility to spatial data increasing
    • Availability of spatial data increasing
  • Today’s online environment provides
    • Easy-to-use tools for collecting spatial data
    • Easy-to-use tools for analyzing spatial data
    • Easy-to-use tools for presenting spatial data
Why Is This of Concern?
  • The quality of spatial data affects the success of communicating spatial concepts
  • Could this explosive growth have an influence on the quality of spatial data?
Why Data Quality Is Key
No Integrity!
Neogeography
  • Neogeography: “new” geography using non-traditional tools
  • Neogeographers: want to communicate/share their interests in geography and are willing to do something about it
NeoGeos
  • What roles do neogeographers play in the process of communicating spatial data?
    • Data collectors (database creators)
    • Data analyzers
    • Data presenters
  • While all three roles impact or are influenced by “data quality”, today I will focus on neogeographers and data collection/database creation
Spatial Data Quality and Neogeography
  • To help you understand my views on data quality and neogeography, I would like to explore User Generated Content
  • UGC is one of the primary means that neogeographers use to express their interest in geography
  • On this journey we will loop outside of geography and then fall back in through mapping and other uses of spatial data
User Generated Content?
  • Content that is produced by users of web sites and digital media
  • Contrasted with traditional media producers such as broadcasters, production companies, publishing companies and map database companies
So What’s Important About UGC?
  • Equality of opportunity to publish
  • Coupled with one of the most significant demographic trends in the last century: “It’s about me” (e.g. use of YouTube, MySpace, Facebook)
  • “Especially in respect to the streets, roads and trails I travel, as well as the POIs I frequent and the spatial topics of interest to me”
Social Networking
How Did This Happen?
  • Technology that allows you to be “connected”, as well as to communicate and collaborate on your own terms
    • Internet
    • Cellular telephony
  • Development of comprehensive spatial databases
  • Pushing geospatial into the mainstream: neogeography
How Did This Happen?
  • Networks provide for
    • Collective intelligence – the hive mentality or perhaps the Borg
    • Aggregated knowledge from decentralized sources (Wikipedia – Wikinomics)
    • Low cost collaboration
UGC Potential Benefits
  • Linus’s law: with enough eyes, all bugs (spatial errors) become trivial
  • Contributors exhibit
    • Self selection
    • Focus
    • Self benefit
    • Numerousness: there should be more interested spatial data contributors than professional map editors
    • Spatial distribution: UGCers are more widely distributed than professional map editors
Criticisms of UGC
  • Some error situations are too complex to be understood in real time
  • Usability may be low
  • May require extensive error checking
  • User priorities may lead to unreliability
  • Prejudice in responses
Lake What Road?
Not enough Contributors -Data Points?
User Priorities - Oooops
Prejudice in Response?
Prejudice in Response
UGC And Spatial Databases
Spatial Database Creation
What’s Being Optimized in the Previous Process?
  • Spatial data quality
    • Accuracy of position resolution
    • Accuracy of attribution
    • Logical consistency
    • Completeness, including spatial coverage
    • Temporal relevance
    • Metadata
How Optimized?
  • Data quality is an integral part of the process
  • Initially
    • Data collected according to specifications
    • Bad data re-collected or placed in the update queue
  • Ongoing
    • Every year significant spatial changes are accommodated
    • Areas of high change are identified and updated
    • Other changes are found by systematically working research teams through the entire coverage over time
  • The overall assignment is designed to maximize the time value of money, while increasing the integrity of the database
Harmonization
  • It is this attempt to actively harmonize all data that distinguishes database building efforts
  • Important issues
    • Who directs crowdsourced data from an editorial perspective?
    • Who sets standards for crowdsourced data?
    • Who quality controls crowdsourced data?
    • What external guidance exists in crowdsourced systems?
Three Categories of Spatial Data
  • Controlled data: OS, Navteq, TeleAtlas, INFOusa
  • Hybrid (a mix of controlled and uncontrolled data): Google, Yahoo, MSN, TomTom
  • Crowdsourced (uncontrolled): OSM, Flickr, etc.
Issue
  • It is possible to manage controlled data quality to meet specific requirements
  • It is possible to manage hybrid data quality to meet specific requirements
  • But can you manage crowdsourced data quality to meet specific requirements on a reliable basis?
  • Let’s look at database compilation for some insights
Compilation
  • Commercial
    • Training in compilation
    • Specialization
    • Staff size limited
    • Research limited
    • Sweat of the brow, but salaried sweat of the brow
  • Wiki
    • Self selection
    • Local experience
    • Staff size potentially unlimited
    • Research hours potentially unlimited
    • Avocation
Compare and Contrast
  • Commercial
    • What are my coverage goals?
    • What are my accuracy goals?
    • How much can I spend on updating?
    • What size of capable staff can I afford?
    • How well can I pay them?
    • How can I otherwise incent them to create the best database possible?
  • Wiki
    • How many people will contribute?
    • How many are capable?
    • Where are they located?
    • Does this match areas of weak coverage?
    • How long will it take to get good results over large coverages?
    • How to motivate these collaborators over long periods?
What Are the Potential Weaknesses of Wiki?
  • Common issues
    • Not enough data gatherers to validate the data, or a method to redeploy them
    • Not enough coverage to meet the need (the distribution of the UGCers), or a method to redeploy them
    • Lack of standards
    • Lack of quality control
  • But all of these limitations can be accommodated
Getting Around Some UGC Issues
Are Other Types of Spatial Databases Superior?
  • Even with the benefits of moolah ($), major navigation databases are
    • Out of date
    • Inaccurate
    • Non-comprehensive
    • Of variable quality
    • Too expensive to maintain
  • Navteq database extension and update costs in 2007 were over $300,000,000
www.refnum.com/osm/gmaps/ Haywards Heath
And That’s Why UGC and Neogeographers
  • Will become an integral part of building spatial databases
  • Hybrid data collection systems using UGC and controlled data are where geospatial is going
  • Let’s look
Old Information Sharing
New Information Sharing
What’s The New Process
Social Networking Tools Of Interest in Compilation
Spatial Data Collection
  • Some UGC will be active
    • User connects to an app and enters relevant spatial data for updating or extending a spatial database
  • Some UGC will be passive
    • Device tracks and reports (anonymously) user paths, builds database by merging path information over time
    • Passive is particularly useful in building navigation databases
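The passive case, merging anonymously reported paths over time, can be sketched roughly as follows. This is an illustrative assumption rather than how any particular vendor builds its database: the function name `merge_traces`, the grid-cell approach, the `cell_size` parameter and the `min_users` threshold are all hypothetical. Points are binned into grid cells, and a cell is kept only when several independent traces pass through it, a crude stand-in for the "many eyes" validation discussed earlier.

```python
import math
from collections import defaultdict


def merge_traces(traces, cell_size, min_users=2):
    """Merge anonymous GPS traces into grid cells confirmed by several traces.

    traces: dict mapping an opaque, anonymous trace id (hypothetical) to a
            list of (x, y) points in a planar coordinate system.
    Returns {cell: (mean_x, mean_y)} for every cell visited by at least
    min_users distinct trace ids.
    """
    cells = defaultdict(lambda: {"ids": set(), "sum_x": 0.0, "sum_y": 0.0, "n": 0})
    for trace_id, points in traces.items():
        for x, y in points:
            # Snap the point to the grid cell that contains it.
            cell = (math.floor(x / cell_size), math.floor(y / cell_size))
            entry = cells[cell]
            entry["ids"].add(trace_id)
            entry["sum_x"] += x
            entry["sum_y"] += y
            entry["n"] += 1
    # Keep only cells corroborated by enough independent traces.
    return {
        cell: (e["sum_x"] / e["n"], e["sum_y"] / e["n"])
        for cell, e in cells.items()
        if len(e["ids"]) >= min_users
    }
```

Raising `min_users` trades coverage for confidence: sparsely travelled roads drop out, which mirrors the concern about too few contributors in some areas.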
Relative Cost
Relative Accuracy
Summing Up Data Collection Systems
  • Closed – commercial compilation efforts, no UGC
  • Open – wiki approaches, no proprietary data
  • Hybrid – where geospatial is going
    • Advances spatial data accuracy by combining the best of both approaches
Raises These Questions
  • Will the winners be
    • Established commercial companies that capitalize on UGC to augment their data?
    • New competitors that commercialize UGC and augment these data to compete with established commercial systems?
PND Data Flow – A Winner
UGC Open Street Data Flow – No Medal
Commercializing UGC
Relative Benefits Of Types Of UGC By Device
Why We Need UGC and Neogeographers
Thanks
