Official and crowdsourced geospatial data integration

Searching for solutions to improve cartography production and updating processes, using OpenStreetMap as a data source

Published in: Technology, Education


  1. 1. OFFICIAL AND CROWDSOURCED GEOSPATIAL DATA INTEGRATION. Searching solutions to improve the processes in cartography updating. By Jimena Martínez. Supervisors: Antonio Vázquez and Marianne de Vries
  2. 2. Table of contents: Background. Problems. The idea. The steps to develop the idea, and an example to show it.
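  3. 3. Background
       BCN200: scope: national (Spain); cartography units: provinces; scale: 1/200.000; updating cycle: 2 years; budget: 300.000 €
       BTN25: scope: national (Spain); cartography units: sheets; scale: 1/25.000; updating cycle: 4 years; budget: 3.500.000 €
       BTA5: scope: local (Spanish provinces); cartography units: sheets; scale: 1/5.000; updating cycle: 4 years; budget: 800.000 €
       MGCP: scope: international (Africa, Middle East); cartography units: cells (208 Spain: 6 countries); scale: 1/50.000; updating cycle: 4 years; Spanish budget: 27.000.000 €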
  4. 4. Problems. 1. Why is official cartography never sufficiently up to date? Update process timeline (2011 version): Dec. 2010: 1st real change. Feb. 2011: satellite/aerial images collecting date. May 2011: 2nd real change. Dec. 2011: release date; the official data reflects the 1st change.
  5. 5. Problems. 2. Why is the updating process so long and expensive? Traditional updating process: vector cartography from last year; a set of data sources against which to compare the cartography (images, maps, raster, vector); reviewing the whole cartography unit. Too much time to review, not much time to edit features.
  6. 6. Problems. 2. Why is the updating process so long and expensive? Madrid case (1/200k). Time to update: 4 weeks, 1 person. Percentage of features edited: 30%. Time to edit these features: 1,5 weeks. Would it be possible to save the other 2,5 weeks?
  7. 7. Problems
       1. Why is official cartography never sufficiently up to date? The traditional process is based on different data sources, and the data sources have different dates (collecting dates).
       2. Why is the updating process so long and expensive? Reviewing the whole cartography against different data sources is needed to detect changes.
       As a result: long process; expensive process; not always a useful result (if highly updated cartography is needed).
  8. 8. The idea
       To develop a general methodology to decide whether crowdsourced (OpenStreetMap) and official geodata could be integrated or not, in order to use OSM to improve the official cartography updating process.
       A system that finds where the official dataset needs to be updated, and which type of update each feature needs, without reviewing the whole cartography unit.
       Saving costs and obtaining better updated cartography.
  9. 9. The idea. Data sources in the updating process. Official data. OSM data: better updated features (not always); vector format; not complete; not homogeneous. OSM to indicate where to update the official data.
  10. 10. The idea. Differences in updating processes
       NMAs official data: government (NMA) and tenders (companies); updating and production processes take months/years; MAP v.1, then updating processes, then MAP v.2.
       Crowdsourced data (OSM): users/NMAs/companies; updating processes take hours/days; MAP v.1…v.n.
  11. 11. The idea. Differences in updating processes. Update process timeline (2011 version): Dec. 2010: 1st real change. Jan. 2011: OSM update; OSM reflects the 1st change. Feb. 2011: satellite/aerial images collecting date. May 2011: 2nd real change. June 2011: OSM update; OSM reflects the 2nd change. Dec. 2011: release date; the official data reflects the 1st change.
  12. 12. The idea. Differences. Which dataset is “better”? Official data or OSM data: which one is better? Some studies (Haklay 2008, Zielstra & Zipf, 2010) take this data set (the official one) as the “truth” against which to compare OSM. As a result, OSM is not 100% complete. But what happens with that? The desired result will be: official data and OSM data give types of updates.
  13. 13. The idea. Questions to answer. “Given enough eyeballs, all bugs are shallow” (Linus's Law).
       WHY OSM? Data accuracy. Amount of data. Updated data. Comparative studies.
       WHAT features from OSM? OSM not as features to take, but as indicators to use. If not useful, not used: types of updates. (AIM 3)
       HOW to integrate OSM and official data? Matching data models in a reference semantic model (domain ontology) (AIM 1). Quality indicators (traditional and crowd quality parameters) (AIM 2).
  14. 14. The idea: the proposed system
       [Diagram] INPUT: official data set specifications and OSM data set specifications, related through a semantic reference (domain ontology).
       Matching process (feature classes filter): official feature classes 1…n against candidate OSM feature classes 1…n (50, 80, …, N features).
       Updating process: “updating gaps” and types of updates; VGI teams / online updating; web update of OSM.
       QC and QA (features filter), using crowd quality and ISO 19157 quality: OSM feature classes 1…n after filtering (30, …, N-M features).
  15. 15. The steps to reach the goal, and an example to show them
       1. Making the matching between data models and features. Ontology approach.
       2. Studying quality parameters to decide which features could be used.
       3. Proposing a new updating process based on flags and types of updates.
  16. 16. 1st step: making the matching. Comparing data models (NMA data model / OSM data model)
       Format: database, shp / XML (.osm)
       (Geometric) primitives: node, arc, face / node, way, relations, tag
       Feature class: table, file / primary tag (key)
       Feature (each object): row / primary tag (value)
       Attribute: column / tag (key)
       Values (domains): cells / tag (value)
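
To make the correspondence on this slide concrete, here is a minimal sketch of reading an OSM tag set the way the comparison suggests: the primary tag key plays the role of the feature class, its value identifies the feature type, and the remaining tags act as attribute columns. The `PRIMARY_KEYS` tuple and the `to_feature_row` name are assumptions made for illustration; they do not come from the slides.

```python
# Hypothetical set of keys treated as "primary" tags in this sketch.
PRIMARY_KEYS = ("highway", "building", "railway", "waterway")


def to_feature_row(tags):
    """Split an OSM tag dictionary into feature class, feature and attributes."""
    for key in PRIMARY_KEYS:
        if key in tags:
            # Every tag other than the primary one becomes an attribute (column).
            attributes = {k: v for k, v in tags.items() if k != key}
            return {
                "feature_class": key,   # primary tag (key)
                "feature": tags[key],   # primary tag (value)
                "attributes": attributes,
            }
    return None  # no primary tag found: no feature class in this sketch


row = to_feature_row({"highway": "motorway", "ref": "A-6", "toll": "no"})
print(row)
```

A real integration would take the list of primary keys from the OSM specification being matched rather than from a hard-coded tuple.
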
  17. 17. 1st step: making the matching. An approach (based on H. Uitermark)
       Real world: objects A, B, C, D
       1. Official dataset legend: A1: building of interest; C1: motorway; D1: toll motorway
       2. OpenStreetMap legend: A2: building, church; B2: building, school; C2, D2: highway, motorway
       Candidates: {[(A1,A2), (A1,B2)], [(C1,C2), (C1,D2)], [(D1,C2), (D1,D2)]}
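
The candidate set on this slide can be reproduced with a small sketch. It assumes that each legend entry is reduced to a shared broad concept ("building" or "motorway") and that an official feature is paired with every OSM feature of the same concept. The `CONCEPTS` tuple is a hypothetical stand-in for the reference model; the substring test is only an illustration, not the ontology-based method the slides refer to.

```python
official = {
    "A1": "building of interest",
    "C1": "motorway",
    "D1": "toll motorway",
}
osm = {
    "A2": "building, church",
    "B2": "building, school",
    "C2": "highway, motorway",
    "D2": "highway, motorway",
}

# Hypothetical shared concepts standing in for the reference model.
CONCEPTS = ("building", "motorway")


def concept_of(label):
    """Return the first shared concept mentioned in a legend label, if any."""
    for concept in CONCEPTS:
        if concept in label:
            return concept
    return None


def candidates(official_legend, osm_legend):
    """Group, per official feature, the OSM features sharing its concept."""
    groups = []
    for off_id, off_label in official_legend.items():
        concept = concept_of(off_label)
        pairs = [
            (off_id, osm_id)
            for osm_id, osm_label in osm_legend.items()
            if concept is not None and concept_of(osm_label) == concept
        ]
        if pairs:
            groups.append(pairs)
    return groups


result = candidates(official, osm)
print(result)
```
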
  18. 18. 1st step: making the matching. The example: motorways (BCN Spain-OSM). Something “superior” and semantic is needed to compare 2 data models.
  19. 19. 1st step: making the matching. Using ontologies. First approach: official data set, matching, ontology; OSM data set, matching, ontology; domain ontology (OSMONTO).
  20. 20. 1st step: making the matching. Using ontologies. Second approach: official data set and OSM data set; mapping; ontology; domain ontology (OSMONTO). Tools: ODEMapster, R2RML.
  21. 21. 2nd step: quality study. Studying the quality: traditional parameters
       van Oort (2006): completeness; logical consistency; positional accuracy; attribute accuracy; temporal quality; semantic accuracy; usage, purpose and constraints; lineage
       Haklay (2008): completeness; logical consistency; positional accuracy; attribute accuracy; temporal quality; semantic accuracy; usage, purpose and constraints; lineage
       ISO 19157 (2011): completeness; logical consistency; positional accuracy; thematic accuracy; temporal quality; usability element; lineage (19115)
       Also listed: variation in quality; meta-quality; resolution (≈ scale)
  22. 22. 2nd step: quality study. Studying the quality: (some) crowd quality parameters
       Maué (2007), PGIS: reputation of contributors
       Haklay (2008): longevity of engagement
       van Exel (2010): user quality (local knowledge, experience, recognition); feature-related quality (lineage, positional accuracy, semantic accuracy)
       Others: lineage
       Further parameters listed: information asymmetry; homogeneity in quality; number of editions on a feature; number of contributors on a feature; time between editions on a feature; number of bugs fixed
  23. 23. 2nd step: quality study. Some methods to measure traditional quality (positional accuracy). Legend: higher quality, lower quality.
       Perkal (1966): positional accuracy; interpretation of the epsilon band. Buffer width: until blue is totally inside orange.
       Goodchild and Hunter (1997): positional accuracy; complete data sets are needed; a higher quality dataset is needed. Buffer width: until blue is 90-95% inside orange.
       Haklay (2008): positional accuracy (OS-OSM); complete data sets are needed (he completed OSM); OS is supposed to be of higher quality than OSM. Buffer width: two buffers; compare the overlap areas.
       BCN Spain-OSM: no complete data (and nobody is going to complete it), neither BCN nor OSM; we don't know which data set is better (OSM to update BCN). Buffer width: it could be impossible to achieve 90-95%.
  24. 24. 2nd step: quality study. Example: measures of positional accuracy on motorways. [Map: BCN Spain and OSM]
  25. 25. 2nd step: quality study. Example: measures of positional accuracy on motorways. [Chart: % length of BCN motorways within the OSM buffer; x axis: buffer width (m), from 1 to 500; y axis: % of BCN roads within the OSM buffer] A 500 m buffer around OSM is needed to reach 80% of the BCN length within the buffer, which indicates a lack of completeness in the OSM dataset. BCN scale is 1/200k (the buffer must be ≈ 20 m, which means 73% of the length within the OSM buffer).
  26. 26. 2nd step: quality study. Example: measures of positional accuracy on motorways. [Chart: % length of OSM motorways within the BCN buffer; x axis: buffer width (m), from 1 to 500; y axis: % of OSM roads within the BCN buffer] A 25 m buffer around BCN is needed to reach 90% of the OSM length within the buffer. In this case the method works because all OSM motorways are also in the BCN dataset.
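
The measure behind the two motorway charts (the share of one dataset's length that falls within a buffer of a given width around the other) can be approximated with a short, dependency-free sketch. It assumes planar coordinates in metres and single polylines given as lists of `(x, y)` points. Instead of building buffer polygons, it cuts the tested line into short pieces and checks the distance from each piece's midpoint to the reference line. The function names and the `step` parameter are illustrative assumptions, not the tooling used for the slides.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection of p onto the segment, clamped to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_line(p, line):
    """Shortest distance from point p to a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def share_within_buffer(tested, reference, width, step=1.0):
    """Approximate share of the tested length within `width` of the reference."""
    inside = total = 0.0
    for a, b in zip(tested, tested[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        pieces = max(1, math.ceil(length / step))
        piece_len = length / pieces
        for i in range(pieces):
            t = (i + 0.5) / pieces
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if distance_to_line(mid, reference) <= width:
                inside += piece_len
            total += piece_len
    return inside / total if total else 0.0


reference = [(0.0, 0.0), (100.0, 0.0)]
tested = [(0.0, 5.0), (50.0, 5.0), (50.0, 40.0), (100.0, 40.0)]
for width in (10.0, 50.0):
    print(width, round(share_within_buffer(tested, reference, width), 3))
```

Running the function over a range of widths gives a curve of the same kind as the charts. As the slides note, a low plateau at large widths points to missing features in the reference data set rather than to poor positional accuracy.
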
  27. 27. 2nd step: quality study. Some methods to measure traditional quality (completeness). Legend: higher quality, lower quality.
       OSL Musical Chairs Algorithm (streets):
       • Based on the bounding box (Bbox) of each feature
       • 300 m radius to find candidates to match
       • Additionally, Levenshtein distance: if the Bbox matches, then the street name is compared
       • A higher quality data set is needed to compare
       • http://humanleg.org.uk/code/oslmusicalchairs/
       BCN Spain-OSM:
       • Not useful for motorways or long features
       • Useful for streets or polygons
       • Convex hull could be used instead of Bbox
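
The two ingredients named on this slide, a bounding-box test within a search distance followed by a name comparison using Levenshtein distance, can be sketched as follows. This is not the OSL Musical Chairs code. The 300 m tolerance comes from the slide, while the `max_edits` threshold, the dictionary layout and the function names are assumptions made for illustration. Coordinates are assumed to be planar and in metres.

```python
def bbox(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_match(a, b, tolerance):
    """True when box a, grown by `tolerance` on every side, overlaps box b."""
    return (
        a[0] - tolerance <= b[2]
        and b[0] <= a[2] + tolerance
        and a[1] - tolerance <= b[3]
        and b[1] <= a[3] + tolerance
    )


def levenshtein(s, t):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def is_match(street_a, street_b, tolerance=300.0, max_edits=2):
    """Candidate match: nearby bounding boxes and nearly identical names."""
    if not bboxes_match(bbox(street_a["points"]), bbox(street_b["points"]), tolerance):
        return False
    distance = levenshtein(street_a["name"].lower(), street_b["name"].lower())
    return distance <= max_edits


official_street = {"name": "Calle Mayor", "points": [(0.0, 0.0), (120.0, 10.0)]}
osm_near = {"name": "Calle Major", "points": [(200.0, 0.0), (320.0, 15.0)]}
osm_far = {"name": "Calle Mayor", "points": [(1000.0, 0.0), (1100.0, 0.0)]}
print(is_match(official_street, osm_near), is_match(official_street, osm_far))
```

For long features such as motorways a single bounding box covers a very large area, which is consistent with the slide's remark that the method suits streets or polygons better.
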
  28. 28. 2nd step: quality study. Conclusions about traditional quality
       Which parameter comes first, completeness or positional accuracy?
       • A complete data set (not a measure of completeness) is needed to measure positional accuracy
       It has been proved that OSM is not complete
       • Congrats!
       This brings me to the first statement
       • OSM not as features to take, but as indicators to use
       • It doesn't matter if OSM is not complete
       A new approach
       • “Updating gaps”, which include the lack of completeness of OSM
  29. 29. 3rd step: proposed updating process. Traditional classification of updates: add; delete; modify (geometry, attributes).
  30. 30. 3rd step: proposed updating process. Proposed classification of updates (official data compared with OSM data)
       ROAD_G (official) = ROAD_G (OSM)?
       • YES: ROAD_ATT (official) = ROAD_ATT (OSM)? YES: doesn't need to be updated. NO: updating gap, type I (attribute updating).
       • NO: ROAD_G (official) = OTHER_G (OSM)? YES: updating gap, type II (classification updating). NO: see below.
       Doesn't exist in OSM: updating gap, type III (OSM can't be used, but advised).
       Doesn't exist in official dataset: updating gap, type IV (automatically updating from OSM?).
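
One possible reading of this decision tree is sketched below. It assumes that geometric matching has already been done upstream, so the function receives an official feature and the OSM feature matched to it by geometry, or `None` when there is no counterpart. The dictionary keys, the function name and the returned labels are illustrative assumptions.

```python
def classify_update(official, osm):
    """Classify a geometrically matched pair into an updating-gap type.

    Each argument is a dict with 'feature_class' and 'attributes',
    or None when the feature has no counterpart in that dataset.
    """
    if official is None and osm is None:
        raise ValueError("at least one feature is required")
    if official is None:
        return "type IV"   # missing in the official dataset
    if osm is None:
        return "type III"  # missing in OSM, so OSM cannot guide the update
    if official["feature_class"] != osm["feature_class"]:
        return "type II"   # same geometry, different class: classification updating
    if official["attributes"] != osm["attributes"]:
        return "type I"    # same geometry and class: attribute updating
    return "no update"


road = {"feature_class": "road", "attributes": {"lanes": 2}}
road_changed = {"feature_class": "road", "attributes": {"lanes": 3}}
track = {"feature_class": "track", "attributes": {}}

print(classify_update(road, road))          # no update
print(classify_update(road, road_changed))  # type I
print(classify_update(road, track))         # type II
print(classify_update(road, None))          # type III
print(classify_update(None, road))          # type IV
```
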
  31. 31. The result. Madrid case (1/200k). Time to update: 1,5 weeks, 1 person. Percentage of features edited: 30%. Time saved: 2,5 weeks. Costs saved: 40%.
  32. 32. Next steps
       • Find the best method to compare both data sets and try it on different data sets (based on TQ and CQ).
       • Obtain the different types of updating gaps automatically.
       • Look for a better way to compare data models (ontology approach).
       • Try an automatic method to update the updating gaps based on OSM.
  33. 33. Thank you! Dank je wel! Gracias! J.MartinezRamos@student.tudelft.nl
