Machine Learning for Better Maps
Zhuangfang Yi - Development Seed - @geonanayi
Workshop @Clark University, 01/07/2019
OpenStreetMap label quality and geodiversity for
machine learning applications
Deep Learning
Machine Learning + OpenStreetMap + Satellite Imagery
Urban = Yes
Image Classification Training Dataset
Object Detection training dataset
Segmentation training dataset
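The three dataset types differ only in how the same mapped objects are encoded as labels. A minimal sketch, assuming a toy 8x8 pixel tile and two made-up rectangular building footprints; the helper names are hypothetical and this is not the output format of any particular tool:

```python
# Toy tile: 8x8 pixels with two axis-aligned building footprints,
# given as (xmin, ymin, xmax, ymax) in pixel coordinates (max exclusive).
TILE_SIZE = 8
footprints = [(1, 1, 3, 4), (5, 2, 8, 6)]


def classification_label(boxes):
    """Image classification: one label per tile (1 = contains a building)."""
    return 1 if boxes else 0


def detection_label(boxes):
    """Object detection: one bounding box per building."""
    return [list(box) for box in boxes]


def segmentation_label(boxes, size=TILE_SIZE):
    """Segmentation: a per-pixel mask (1 = building, 0 = background)."""
    mask = [[0] * size for _ in range(size)]
    for xmin, ymin, xmax, ymax in boxes:
        for y in range(ymin, ymax):
            for x in range(xmin, xmax):
                mask[y][x] = 1
    return mask


tile_class = classification_label(footprints)
tile_boxes = detection_label(footprints)
tile_mask = segmentation_label(footprints)
building_pixels = sum(sum(row) for row in tile_mask)
```

The classification label discards position, the detection label keeps one box per object, and the segmentation mask keeps every pixel, which is why segmentation is the most sensitive to label errors.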
Label Maker
OpenStreetMap is an attractive label/tag database for machine learning
applications: its mapped objects are updated rapidly, every day, by thousands of users.
Training Data Completeness Matters
- OSM tag/label info and
popularity
Tag info in France.
Landuse is one of the tags
frequently used by
users.
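A quick way to check tag popularity for an area of interest is to count tag keys and values over the features extracted for it. A minimal sketch, assuming the features have already been reduced to tag dictionaries; the sample features and helper names below are made up for illustration:

```python
from collections import Counter

# Hypothetical sample of OSM features, each reduced to its tag dictionary.
features = [
    {"landuse": "forest"},
    {"landuse": "residential", "name": "Centre"},
    {"building": "yes"},
    {"highway": "residential", "name": "Rue A"},
    {"landuse": "farmland"},
]


def tag_key_counts(feats):
    """Count how many features carry each tag key."""
    counts = Counter()
    for tags in feats:
        counts.update(tags.keys())
    return counts


def tag_value_counts(feats, key):
    """Count the values used for one tag key."""
    return Counter(tags[key] for tags in feats if key in tags)


key_counts = tag_key_counts(features)
landuse_values = tag_value_counts(features, "landuse")
```

Keys or values with very few occurrences in the area are poor candidates for ML classes, since there will be too few labeled examples to train on.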
Training data
Completeness Matters
OpenStreetMap Label Quality for Machine Learning
Applications
ISO standard for geographic information data: positional accuracy,
completeness, and logical consistency.
Other data quality issues in OSM:
- Vandalism
- Missing details
- Completeness and accuracy
Training Data Completeness Matters
Available tools for data quality
assessment:
- OSM analytics (OSM vs.
Human Settlement Layer)
- OSM-lint (e.g. OSM vs.
US Census TIGER in the USA)
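The idea behind this kind of comparison can be sketched as a per-cell ratio of OSM features to features in a reference dataset. This is an illustration only, not how OSM analytics or OSM-lint are implemented; the cell names, counts, and the 0.8 threshold are assumptions:

```python
def completeness_by_cell(osm_counts, reference_counts):
    """Return the OSM/reference feature ratio per grid cell, capped at 1.0.

    Cells with no reference features are skipped because the ratio
    is undefined there.
    """
    ratios = {}
    for cell, ref in reference_counts.items():
        if ref > 0:
            ratios[cell] = min(osm_counts.get(cell, 0) / ref, 1.0)
    return ratios


def cells_fit_for_training(ratios, threshold=0.8):
    """Keep only cells whose completeness ratio reaches the threshold."""
    return sorted(cell for cell, ratio in ratios.items() if ratio >= threshold)


# Hypothetical building counts per grid cell.
osm = {"cell_a": 95, "cell_b": 20, "cell_c": 40}
ref = {"cell_a": 100, "cell_b": 80, "cell_c": 40, "cell_d": 0}

ratios = completeness_by_cell(osm, ref)
usable = cells_fit_for_training(ratios)
```

Restricting training tiles to the cells that pass the threshold avoids teaching a model that unmapped buildings are background.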
Training Data Completeness Matters
Building classification in Vietnam with LeNet
on AWS SageMaker.
Individual building detection with TensorFlow Object
Detection in Mexico
60% -> 84% from Vietnam to Mexico
- OSM label data + satellite images match
- OSM label data is not well-aligned with the paired satellite image
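One way to quantify alignment is intersection over union (IoU) between the label mask and the footprint visible in the image. A minimal sketch on a toy 10x10 tile, assuming a made-up building and a label shifted 2 pixels to the right; the helper names are hypothetical:

```python
def make_mask(size, box):
    """Rasterize an axis-aligned box (xmin, ymin, xmax, ymax; max exclusive)."""
    xmin, ymin, xmax, ymax = box
    return [
        [1 if xmin <= x < xmax and ymin <= y < ymax else 0 for x in range(size)]
        for y in range(size)
    ]


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 1.0


image_building = make_mask(10, (2, 2, 6, 6))  # where the building appears in the image
aligned_label = make_mask(10, (2, 2, 6, 6))   # label matches the image
shifted_label = make_mask(10, (4, 2, 8, 6))   # label offset 2 px to the right

aligned_iou = iou(image_building, aligned_label)
shifted_iou = iou(image_building, shifted_label)
```

In this toy case a 2-pixel offset on a 4-pixel-wide building drops the IoU from 1.0 to one third, so a segmentation model trained on such labels is penalized for predicting the correct pixels.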
Training Data Completeness Matters
HOT Task Manager
Training Data Completeness Matters
Urchn for urban change detection with ML
Training Data
Geodiversity Matters
Applying Machine Learning Applications for Geospatial
Analysis
Training Data Geodiversity Matters
High-voltage grid detection
with deep learning in
Pakistan, Nigeria, and
Zambia
Training Data Geodiversity Matters
Training Data Geodiversity Matters
Training Data Geodiversity Matters
Urban settlement change detection in Ethiopia between 2000 - 2017 with random
Conclusions
When it comes to applying machine learning:
- Training data quality matters. To use OSM label data for ML
applications, I recommend:
1. Do a proper label completeness assessment with currently
available tools;
2. Check OSM tag/label info and frequency for your area of interest;
3. For segmentation applications, make sure the image tiles
align well with your label dataset;
4. Prepare the training dataset using Label Maker, RoboSat, or other
data prep tools.
- Training data geodiversity matters. I recommend:
a. data/image feature similarity analysis;
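A feature similarity analysis can be as simple as comparing average image features between the region a model was trained on and the region it will be applied to. A minimal sketch using cosine similarity; the three-value per-tile features (for example mean red, green, and blue values) and all numbers are made up for illustration, and other features or distance measures may suit a given project better:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical per-tile features for three regions.
region_train = [[0.30, 0.40, 0.20], [0.32, 0.38, 0.22]]
region_similar = [[0.31, 0.41, 0.19], [0.29, 0.39, 0.21]]
region_different = [[0.70, 0.20, 0.05], [0.68, 0.22, 0.07]]

train_mean = mean_vector(region_train)
sim_close = cosine_similarity(train_mean, mean_vector(region_similar))
sim_far = cosine_similarity(train_mean, mean_vector(region_different))
```

A low similarity between the training region and the target region is a signal to add training data from the target region before trusting the model there.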
Contacts
Twitter @geonanayi
GitHub @geoyi
Email nana@developmentseed.org
Data Completeness Matters
HOT Analytics for Health
With support of the Bill and Melinda Gates
Foundation and the Clinton Health Access
Initiative, we have designed an analysis tool
to evaluate the accuracy and precision of
OpenStreetMap field data.
The analysis found the positional accuracy
of OpenStreetMap data to be very good in
comparison to OS MasterMap, with over 80%
overlap between the two datasets for most
of the road objects tested. It also
found a positive correlation
between road name attribute
completeness and the number of users per
area.
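A correlation of this kind can be checked with a Pearson coefficient over per-area values. A minimal sketch; the numbers below are invented purely to show the calculation and are not the data behind the result quoted above:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: contributors per area and share of roads with a name tag.
users_per_area = [1, 2, 4, 6, 9]
name_completeness = [0.35, 0.50, 0.60, 0.80, 0.90]

r = pearson(users_per_area, name_completeness)
```

A value of r close to 1 indicates that areas with more contributors tend to have more complete road names; it does not by itself show that adding contributors causes the improvement.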
