AMBIGUITY & PLAUSIBILITY
Managing Classification Quality in Volunteered Geographic Information
Ahmed Loai Ali, Falko Schmid, Rami Al-Salman, Tomi Kauppinen
University of Bremen
Cognitive Systems Research Group
Reliable
Services
VGI
Data
Quality
Management
Classification
Rendering
Turn left onto Schwachhauser Ring
Go straight forward through the park
Get out of the park right to Am Weidedamm
Navigation
Cross the lake
POI
Search
Ambiguity & Plausibility
park
garden
recreation
grass
Proposed Approach
Locality: maintain locality during learning
Filtering: data with sufficient quality for learning
Classification by Tags
Residential
Industrial
Agriculture
Forest
Park
Garden
Playground
garden
grass
meadow
park
Garden: "a distinguishable planned space, usually outdoors, set aside for the display, cultivation, and enjoyment of plants and other forms of nature. Residential garden is most common, it is generally found in proximity to a residence, such as the front or back garden."
Grass: "smaller areas of mown and managed grass, for example in the middle of a roundabout, verges beside a road or in the middle of a dual-carriageway."
Meadow: "a land primarily vegetated by grass plus other non-woody plants."
Park: "an open, green area for recreation, usually municipal. These are outdoor areas, typically grassy or green areas, set aside for leisure and recreation. Typically open to the public, but may be fenced, and may be closed; e.g., at night time."
Classifier properties → Classifier learning → Classifier validation → Classifier application
Data from densest
cities
Meta-data analysis
Mapper activities analysis
Version edits analysis
Population density = $\frac{\text{population}}{\text{area}}$
A. L. Ali and F. Schmid. Data quality assurance for Volunteered Geographic Information. In Proceedings of the 8th International Conference on Geographic Information Science, GIScience 2014, pages 126-141, 2014.
• 9-Intersection Model (9IM)
• Meet, Overlap and Contains relations
• Assumptions
  • A park usually contains entertainment facilities
  • A “residential” garden often meets “residential” houses
  • Grass meets roads or buildings and rarely contains other objects
  • A meadow likely meets or overlaps with farms/farmlands
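The three relations named above can be illustrated with a minimal sketch. It assumes axis-aligned rectangles instead of real OSM polygons, and it is not the authors' implementation; `Rect` and `relation` are hypothetical names. A full 9IM evaluation would compare the interiors, boundaries, and exteriors of arbitrary geometries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an areal entity (illustration only)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def relation(a: Rect, b: Rect) -> str:
    """Classify how rectangle a relates to rectangle b."""
    # Extent of the shared region along each axis; negative means a gap.
    dx = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    dy = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if dx < 0 or dy < 0:
        return "disjoint"
    if dx == 0 or dy == 0:
        return "meet"  # boundaries touch, interiors do not intersect
    # Simplification: 'covers' and 'equal' are folded into 'contains'.
    if a.xmin <= b.xmin and a.ymin <= b.ymin and a.xmax >= b.xmax and a.ymax >= b.ymax:
        return "contains"
    if b.xmin <= a.xmin and b.ymin <= a.ymin and b.xmax >= a.xmax and b.ymax >= a.ymax:
        return "inside"
    return "overlap"


# Made-up geometries mirroring the assumptions listed on the slide.
park = Rect(0, 0, 10, 10)
playground = Rect(2, 2, 4, 4)
road = Rect(10, 0, 12, 10)
farmland = Rect(8, 8, 15, 15)

print(relation(park, playground))  # contains
print(relation(park, road))        # meet
print(relation(park, farmland))    # overlap
```

Counting such relations per neighbouring key (for example, how many `building` entities an entity meets) would yield the kind of topological properties the slides describe.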
Frequent Keys involved in the topological relations
admin_level 1.99% | building 43.79% | amenity 5.15% | wetland 0.10% | surface 0.00% | bicycle 15.37% | barrier 6.40% | historic 0.49% | aerialway 0.01% | tourism 0.71%
man_made 0.00% | covered 0.05% | landuse 22.79% | aeroway 0.05% | power 1.18% | bridge 0.02% | foot 12.33% | wood 1.19% | bridge 1.22% | service 1.71%
intermittent 0.00% | shop 0.14% | natural 6.16% | leisure 10.30% | office 0.00% | religion 0.28% | ref 0.00% | highway 63.40% | tunnel 1.97% | width 0.00%
construction 0.06% | water 0.08% | harbour 0.00% | military 0.00% | sport 2.74% | place 1.07% | name 0.00% | railway 0.00% | waterway 3.64% | brand 0.01%
Frequent Keys involved in the topological relations (keys retained)
building, amenity, bicycle, barrier, landuse, foot, natural, leisure, highway, sport, waterway
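The selection of frequent keys can be sketched as a simple threshold filter. The frequencies below are an excerpt copied from the slide's table; the function name `select_keys` and the strict "greater than 2%" comparison are assumptions based on the Editor's Notes, not the authors' code.

```python
# Excerpt of (key, frequency in %) pairs taken from the slide's table.
KEY_FREQUENCY = [
    ("admin_level", 1.99), ("building", 43.79), ("amenity", 5.15),
    ("wetland", 0.10), ("bicycle", 15.37), ("barrier", 6.40),
    ("landuse", 22.79), ("power", 1.18), ("foot", 12.33),
    ("highway", 63.40), ("tunnel", 1.97), ("sport", 2.74),
    ("waterway", 3.64),
]


def select_keys(frequencies, threshold=2.0):
    """Return keys whose frequency exceeds the threshold, most frequent first."""
    kept = [(key, freq) for key, freq in frequencies if freq > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in kept]


selected = select_keys(KEY_FREQUENCY)
print(selected)
```

With this excerpt, `admin_level` (1.99%) and `tunnel` (1.97%) fall just below the cut-off, while `sport` (2.74%) is retained.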
Entity
size, meet_A, contains_A, overlap_A
overlap_L, meet_L, contains_L
land use
amenity
building
leisure
sport
highway
waterway
foot
bicycle
barrier
grass
meadow
garden
park
natural
meadow
Classification: eager vs. lazy learning
K-nearest neighbours (lazy learning)
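A minimal k-nearest-neighbours sketch in plain Python shows the majority-vote idea. The two features (area in hectares, number of contained amenities) and all training values are made up for illustration; the presentation does not give the feature encoding or the value of k.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Assign the majority label among the k training entities closest to query.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # Lazy learning: no model is fitted, the full training set is searched per query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training entities: (area in ha, contained amenities) -> label.
TRAIN = [
    ((0.05, 0), "grass"), ((0.08, 0), "grass"), ((0.10, 0), "grass"),
    ((5.0, 4), "park"), ((7.5, 6), "park"), ((6.0, 5), "park"),
]

print(knn_predict(TRAIN, (6.0, 5.5)))   # park
print(knn_predict(TRAIN, (0.07, 0.0)))  # grass
```

In practice the features would need to be scaled to comparable ranges, since Euclidean distance is dominated by the feature with the largest spread.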
Tag Based
  Land use: grass, meadow
  Leisure: garden, park
Label Based
  grass, meadow, garden, park
Label-Based Model (LBM)
Tag-Based Model (TBM)
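The difference between the two models can be sketched as a choice of prediction target: the LBM predicts the class label itself, while the TBM predicts the tag key the label belongs to. The mapping follows the slide (grass and meadow under land use, garden and park under leisure); the function names and the sample predictions are hypothetical.

```python
# Tag key for each class label, as grouped on the slide (OSM key spellings assumed).
LABEL_TO_KEY = {
    "grass": "landuse",
    "meadow": "landuse",
    "garden": "leisure",
    "park": "leisure",
}


def target(label, model):
    """Return the prediction target of a label under 'LBM' or 'TBM'."""
    if model == "LBM":
        return label
    if model == "TBM":
        return LABEL_TO_KEY[label]
    raise ValueError(f"unknown model: {model}")


def share_correct(true_labels, predicted_labels, model):
    """Fraction of entities whose predicted target matches the true target."""
    hits = sum(
        target(t, model) == target(p, model)
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)


# Made-up predictions: park/garden and grass/meadow confusions only hurt the LBM.
truth = ["park", "garden", "grass", "meadow"]
predicted = ["garden", "garden", "meadow", "park"]

print(share_correct(truth, predicted, "LBM"))  # 0.25
print(share_correct(truth, predicted, "TBM"))  # 0.75
```

Because the TBM merges two labels per key, confusions within a key no longer count as errors, which is consistent with the aggregation effect mentioned in the Editor's Notes.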
Accuracy
Area Under ROC Curve
(AUC)
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \times 100\,\%$$
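Both measures can be sketched in a few lines. The accuracy function follows the formula on the slide; the AUC function uses the rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative), which is an assumption about how one might compute it, not a statement of the authors' tooling. All counts and scores are made up.

```python
def accuracy(tp, tn, fp, fn):
    """Percentage of correctly classified entities."""
    return 100.0 * (tp + tn) / (tp + fn + fp + tn)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: classifier scores, higher means more likely positive
    labels: 1 for positive, 0 for negative
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


print(accuracy(tp=40, tn=45, fp=5, fn=10))        # 85.0
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))    # 1.0
print(auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0]))    # 0.5
```

The pairwise form is quadratic in the number of entities; it is meant to show the definition, not to scale to country-sized data sets.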
Model                     Country   Accuracy   AUC
Label-Based Model (LBM)   Germany   64.3 %     0.85
Label-Based Model (LBM)   UK        85.1 %     0.93
Tag-Based Model (TBM)     Germany   76.8 %     0.85
Tag-Based Model (TBM)     UK        89.0 %     0.92
[Bar chart: accuracy (%) by model and country. Germany: LBM 64.3 %, TBM 76.8 %; UK: LBM 85.1 %, TBM 89.0 %]
Manual checking
Re-checking
Empirical study
Evaluation
park garden
meadow grass
Re-checking period: December 2013 to June 2014
Country   Detected outliers   Updated
Germany   6568 entities       ≈ 23 %
UK        310 entities        ≈ 60 %
• Present a sample of entities to participants
• Ask for their opinions about the current classification of the entities
• In case of disagreement with the current class, the participant is asked to provide an appropriate class
157 participants
115 participants completed the study
81 participants gave complete opinions
They represent different cultures
More than 10 mother tongues
Various levels of OSM experience
24 no knowledge, 17 beginners, 21 moderate knowledge, 19 experts
• To evaluate the results, we used Light's Kappa for m raters
• 1.0 means maximum agreement
• Less than 0 means less than chance agreement
• Values from 0.01 to 1.0 are graded as slight, fair, moderate, and substantial agreement
• Light's Kappa for all 81 participants was 0.176
  • slight agreement
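Light's Kappa for m raters is the mean of Cohen's kappa over all pairs of raters. The sketch below implements that definition in plain Python; the function names and the toy ratings are hypothetical and unrelated to the study's data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters who rated the same items."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # convention chosen for this sketch: one shared label throughout
    return (observed - expected) / (1 - expected)


def light_kappa(ratings):
    """Mean pairwise Cohen's kappa; ratings holds one list of labels per rater."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy ratings of four entities by three raters.
rater_a = ["park", "park", "grass", "grass"]
rater_b = ["park", "park", "grass", "grass"]
rater_c = ["park", "grass", "park", "grass"]

print(light_kappa([rater_a, rater_b, rater_c]))
```

Here raters a and b agree perfectly (kappa 1), while each agrees with rater c only at chance level (kappa 0), so the mean over the three pairs is one third.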
Conclusion
• Quality management mechanisms are required for VGI
• Classification is one facet of the data quality of VGI
• In the VGI context, classification depends on multiple factors:
  • User perception, locality, and expertise level
  • Purpose of use
  • Inherent properties
  • The entity's geographic context
Conclusion
• The classification process has various characteristics:
  • Structured or unstructured
• With a vast amount of data, learning tackles the problem
• Crowdsourced revisions act to check the detected outliers
• A guided classification mechanism is needed
  • Guidance only, without enforcement
?
contact: loai@informatik.uni-bremen.de

Ambiguity & Plausibility: managing the classification quality in Volunteered Geographic Information

Editor's Notes

  1. With the ubiquity of technologies and hand-held location-sensing devices, individuals are able to produce geographic information, a phenomenon known as VGI. Nowadays, VGI data is increasingly produced from different sources and supports various types of applications. In the VGI context, data might be produced intentionally or unintentionally, and the geographic information might be implicitly or explicitly annotated.
  2. In our research we focus on VGI data resulting from collaborative mapping projects. In collaborative mapping, a group of individuals, whatever their knowledge and experience, work together to collect, share, and use information about geographical features.
  3. OSM is an example of VGI collaborative mapping. The project aims to develop a free world map that is editable and usable by everyone.
  4. Providing reliable services usually depends on trusted data. In the VGI context, data quality is not guaranteed, so a quality management mechanism is required before such data sources can be relied on to provide reliable services.
  5. Generally, the quality of spatial data has multiple perspectives: positional and attribute accuracy, consistency, completeness, and lineage. In our research we tackle the quality of VGI from the perspective of classification, which is one facet of attribute accuracy.
  6. Classification accuracy plays a major role in various applications. For example, in rendering, inappropriate classification leads to wrong handling of data by algorithms.
  7. In rendering, inappropriate classification leads to wrong handling of data by algorithms. The challenge increases for the classification of similar features, due to their ambiguous characteristics.
  8. Another example where the classification of geographic features plays a role is cognitive navigation. Rather than relying on directions and distances as normal navigation systems do, it uses feature types to guide and direct users, so it depends on the classification to some extent.
  9. Definitely, POI search depends on the classification of features. For example, searching for nearby services or facilities depends on feature classes. In this example, searching for a nearby park comes up with a result that might not satisfy the user's intention.
  10. The classification challenge has various causes. One is remote classification: most contributors edit by tracing satellite images, where the intrinsic properties of an object might be vague or unclear. Classification in such cases depends mainly on the contributor's perspective and requires local knowledge and some common sense about the geographical context.
  11. Another reason behind problematic classification is the ambiguity of terms and of professional classification. Some features have an unclear definition or ambiguous characteristics, so non-professional contributors potentially assign inappropriate classes to these entities. For example, what is the difference between land use and land cover for non-professionals, and should an area densely covered with trees be classified as “forest” or “wood”?
  12. Many other factors add to this, such as the heterogeneity of tools and contributors and the loose classification mechanisms. All of this leads to the classification ambiguity and plausibility problem in the VGI context. For example, this entity could belong to any of the following classes: park, garden, grass, recreation. This is what we call classification ambiguity. However, in comparison with the majority of other similar entities, this entity has a higher plausibility of being a park. This is what we call classification plausibility. When this entity is classified as grass or meadow, we call that inappropriate classification.
  13. For these types of features, which have an arbitrary structure, we propose a learning-based model to check the integrity of the data. To tackle classification ambiguity and plausibility we developed a learning-based integrity checking mechanism that learns the characteristics of specific features by applying a machine learning algorithm. The approach is divided into two phases: learning and consistency checking. The first phase, classification, aims to develop a robust classifier able to distinguish between similar features; the classifier detects problematically classified entities and proposes a recommended class in case of inconsistency. In the second phase, consistency checking, the classifier has three implementation scenarios. CC: checking the classification at contribution time by encoding the classifier into the contribution tool; in this scenario the tool guides the contributor in case of inconsistent classification. MC: depending on manual revision of the outliers, where crowdsourced revision plays a major role. AC: when the classifier is able to detect clear outliers, automatic correction can be applied.
  14. The approach depends mainly on two keys: (1) maintaining locality, and (2) filtering, so that learning uses data of sufficient quality.
  15. In our studies we use OSM data as a prominent example of a successful VGI project. In most VGI projects contributions are annotated by tags; in OSM, entities are classified by means of tags, each of the form key=value.
  16. The key represents the classification perspective and the value represents the class label of an entity.
  17. We exemplify our approach by distinguishing between four classes of grass-covered land: grass, garden, meadow, and park.
  18. Looking into the OSM recommendations on its wiki, the definitions are either too abstract or unclear, which potentially results in a certain level of ambiguity and plausibility. In other cases the definition is too complicated to be recognized by inexpert (non-professional) contributors. All four classes describe land covered by grass; each has its own unique characteristics, but with a certain level of similarity to the others.
  19. Regarding the approach, we built a classifier with dual functionality: pointing out the problematically classified entities, and guiding towards the most appropriate class in case of outliers. The process moves forward from classifier property selection through learning and validation, and ends with applying the developed classifier.
  20. A classifier that is learned and validated on one and the same data set might be biased. At the same time, building a robust classifier requires data of sufficient quality. So first, we used only the data of the ten densest cities, to ensure an active mapping community behind the data. Second, we investigated the meta-data of the contributions to extract subsets for the validation process.
  21. In previous work we investigated only the geometric properties, to distinguish between park and garden entities.
  22. In this work, investigating the area of the entities of the four classes, we found that park and meadow entities usually have larger areas than garden and grass entities.
  23. Investigating the geometric properties alone is probably not enough to learn the intrinsic and extrinsic properties of the entities, so we investigate the topological properties as well. We use the 9IM to investigate the eight topological relations between the target entities and their context. We take three relations into account and ignore the others, as they did not add information to the classifiers. The investigation of the topological features is based on assumptions such as the following:
  24. Further investigation analyses which types of features are frequently involved in a topological relation with the target entities. We checked all keys of entities involved in the three mentioned relations.
  25. Among the vast number of keys, we take into account those with a frequency of more than 2%, which add much information to the classifier.
  26. We divided the keys into keys pointing to areal features and keys pointing to linear features.
  27. For each entity we have the following structure: each entity is associated with a set of properties and assigned a class label.
  28. We use the KNN learning algorithm, in which an entity is assigned the class of the majority of its most similar entities. KNN is a form of lazy learning, in which the classifier is built on the complete existing data set.
  29. We developed two classifier models. The first, LBM, aims to distinguish between the four class labels. The second, TBM, aims to distinguish between the tagging mechanisms themselves, as park and garden entities belong to the “leisure” key while grass and meadow entities belong to the “land use” key.
  30. We depend on two measures to evaluate the performance of the developed classifiers. Accuracy is the percentage of correctly classified entities; more false positives and false negatives mean lower performance. AUC is the area under the Receiver Operating Characteristic (ROC) curve, which relates the true positive rate to the false positive rate. The optimal classifier has an AUC of 1, a random classifier has an AUC of 0.5, and less than 0.5 indicates a low-performance classifier.
  31. For the two developed models we get the following results. We conducted the studies on data sets of two countries known for an acceptable level of data quality, according to previous research. The higher performance of the classifiers on the UK data indicates more consistent classification of entities in the UK data set than in the Germany data set. In both models the AUC has nearly the same value, whereas the higher accuracy of TBM is due to the aggregation of classes.
  32. To evaluate our approach we followed three methodologies: We applied the TBM classification and performed
  33. First, in manually checking the outliers, we noticed a large number of clear cases of inappropriately classified entities, for example: roundabouts classified as garden or park; a garden between residential houses classified as grass; a park entity with playground, sport areas, and water bodies classified as meadow.
  34. Second, we rechecked the detected outliers six months later. A promising share of the detected entities had been updated by the OSM community, which indicates that the OSM mapping community disagreed with the previous classification of these entities.
  35. The third methodology: we conducted an empirical study as follows:
  36. It was a web-based study relying on crowd participants. At the beginning, we introduce the participants to the recommended definitions of the target classes.
  37. Then they are asked to provide some anonymous data: gender, age, OSM experience, and mother tongue as a mirror of their culture.
  38. Afterwards, a sample of 30 entities is presented to the participants, with the option to check the definitions and to show the tags associated with each entity. A satellite image of the entity is provided as well.
  39. Thanks to all the participants who contributed to our studies.
  40. We used Light's kappa for m raters to evaluate the results; the kappa coefficient measures the agreement among multiple raters. A value of 1 means maximum agreement and less than zero means less than chance agreement; the range between 0 and 1 is divided into slight, fair, moderate, and substantial. Light's kappa was 0.176, which indicates slight agreement between the participants, and their disagreement with the given classifications.
  41. We also investigated the results statistically to evaluate the findings. For most of the sample, the participants relatively disagree with the given classifications. A further challenge is that they even disagree among themselves on the classification of individual entities.
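
The dual functionality described in note 19 (flagging a problematic classification and recommending a class) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `check_entity` is a hypothetical name, the neighbour labels stand for the classes of an entity's most similar entities, and the `min_share` threshold is an invented parameter for deciding when a disagreement is clear enough to report.

```python
from collections import Counter


def check_entity(current_label, neighbour_labels, min_share=0.6):
    """Flag an entity as an outlier and recommend a class.

    current_label: class currently assigned to the entity
    neighbour_labels: classes of its most similar entities
    min_share: assumed minimum vote share needed to flag an outlier
    """
    counts = Counter(neighbour_labels)
    best_label, best_count = counts.most_common(1)[0]
    share = best_count / len(neighbour_labels)
    if best_label != current_label and share >= min_share:
        return {"outlier": True, "recommended": best_label, "share": share}
    # The current class is kept when the neighbours agree with it or are undecided.
    current_share = counts[current_label] / len(neighbour_labels)
    return {"outlier": False, "recommended": current_label, "share": current_share}


flagged = check_entity("grass", ["park", "park", "park", "garden", "park"])
accepted = check_entity("park", ["park", "garden", "park"])

print(flagged["outlier"], flagged["recommended"])    # True park
print(accepted["outlier"], accepted["recommended"])  # False park
```

A result like `flagged` could feed any of the three scenarios in note 13: a hint in the contribution tool, an item for manual revision, or, for very clear cases, an automatic correction.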