HYBRID GEO-INFORMATION PROCESSING
CROWDSOURCED SUPERVISION OF GEO-SPATIAL MACHINE LEARNING TASKS
Frank O. Ostermann
18th AGILE Conference, 11.06.2015
• Introduction: Why machine learning and why crowdsource supervision?
• State of the art: Practical examples
• Opportunities and challenges: Future research directions
11.06.2015 F.O. Ostermann - 18th AGILE Conference 2
NEW SOURCES OF GEO-INFORMATION
GEO-SOCIAL MEDIA AS SENSORS
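Two dimensions: geography (explicit / implicit) and participation (explicit / implicit)
• Explicit participation, explicit geography: Volunteered Geographic Information (VGI), e.g. OpenStreetMap
• Explicit participation, implicit geography: Volunteered Geographic Content (VGC), e.g. Wikipedia articles on non-geographic topics containing place names, Foursquare
• Implicit participation, explicit geography: Contributed / Ambient Geographic Information (CGI/AGI), e.g. public Tweets referring to the properties of an identifiable place
• Implicit participation, implicit geography: User-Generated Geographic Content (UGGC), e.g. public Flickr images containing a place name or being georeferenced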
GEO-SOCIAL MEDIA SENSORS - WHAT’S DIFFERENT?
GEO-SOCIAL MEDIA AS SENSORS
• Rich, pre-processed information
• Uneven distribution
• Heterogeneous level of quality
• Varying but high update frequency (stream)
• Redundancy of content and channels (sharing)
• Heterogeneous structure
• Unknown source/lineage
• Unclear / changing licensing, property rights, liability
• Unknown/Immeasurable precision, error, completeness
CALIBRATING GEO-SOCIAL SENSORS
GEO-SOCIAL MEDIA AS SENSORS
How to calibrate? (Should we?)
So far:
• Crowdsourced curation
• Post-hoc analysis
Crowdsourced curation problems
• Scalability
• Sustainability
Automate curation!
• Introduction: Why machine learning and why crowdsource supervision?
• State of the art: Practical examples
• Opportunities and challenges: Future research directions
EXISTING SYSTEMS
TWITCIDENT - CROWDSENSE
EXISTING SYSTEMS
CRISISTRACKER (AIDR)
EXISTING SYSTEMS
AIDR
http://irevolution.net/2013/10/01/aidr-artificial-intelligence-for-disaster-response/
EXISTING SYSTEMS
GEOCONAVI
Adapted from [2, 3]
[Diagram labels: Source, Credibility, Relevance, Content, Location, Context, Geographic Contextualization]
FRENCH FOREST FIRE SOCIAL MEDIA
GEOCONAVI
(1) Containing French keywords: 659,676 Tweets and 39,016 Flickr images
(2) Machine-learned relevance filter: 25,684 items left
(3) Geocoded and context enriched: 5,770 items left
(4) Clustered in space and time: 129 clusters with 2,682 items
(5) Second relevance filter: 11 clusters left with 469 items
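To make the reduction at each step concrete, the counts on this slide can be turned into retention shares. This is a small arithmetic sketch only: the numbers are copied from the slide, while the shortened stage labels and the `retention` helper are illustrative names, not part of GeoCONAVI.

```python
# Item counts copied from the slide; stage labels are shortened for illustration.
stages = [
    ("keyword match", 659_676 + 39_016),  # Tweets + Flickr images
    ("relevance filter", 25_684),
    ("geocoded and enriched", 5_770),
    ("clustered", 2_682),
    ("second relevance filter", 469),
]


def retention(stages):
    """Return (label, count, share of the initial items) for every stage."""
    initial = stages[0][1]
    return [(label, count, count / initial) for label, count in stages]


for label, count, share in retention(stages):
    print(f"{label:<26}{count:>8}  {share:8.4%}")
```

Under these counts, fewer than 4 in 100 keyword-matched items survive the first relevance filter, and fewer than 1 in 1,000 end up in the final clusters.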
GEOCONAVI FIGHTING FOREST FIRES
TOPICALITY MACHINE LEARNING CLASSIFICATION
1. Manually annotated (Yes/No) random sample
2. Counted keyword occurrences
3. Used Weka 10-fold stratified cross-validation with
   a) Decision trees
   b) Naive Bayes
   c) Association rules
4. J48 decision tree works best

Confusion matrix:
                      Classified as YES   Classified as NO
On Forest Fire               1196                370
Not on Forest Fire            403               3712
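The usual summary metrics follow directly from the confusion matrix. The sketch below assumes "On Forest Fire" is the positive class and that the matrix belongs to the best-performing classifier; the slide does not state either explicitly.

```python
# Counts copied from the confusion matrix on the slide.
# Assumption: "On Forest Fire" is treated as the positive class.
tp, fn = 1196, 370   # on forest fire: classified YES / NO
fp, tn = 403, 3712   # not on forest fire: classified YES / NO

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"n={total} accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these counts, accuracy is about 0.86, while precision and recall for the forest-fire class are both near 0.75, so roughly a quarter of the items flagged as relevant are false alarms.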
GEOCONAVI FIGHTING FOREST FIRES
GEOCONAVI
Inputs: Twitter Streaming API, Flickr Search API
1.1 Retrieval: scheduled Java code accessing APIs
1.2 Storage: scheduled Java code writing to DBMS (Oracle DBMS)
2.1 Topicality: scheduled PL/SQL job
2.2 Geo-coding: a) scheduled PL/SQL job, b) scheduled Java code
2.3 Geographic context: scheduled PL/SQL job
2.4 Quality assessment: scheduled PL/SQL job
3.1 Spatio-temporal clustering: scheduled Python script calling SaTScan job
3.2 Quality re-assessment: scheduled PL/SQL job
External data and services: EFFIS Hotspot Data, European Media Monitor Geo-coding API
Dissemination: SMS, WFS, WMS, RSS, SES
HYBRID GEO-INFORMATION PROCESSING
WHY THE EFFORT?
Time-consuming and resource-intensive
• Manual annotation and experiments for topicality filtering
• Parameterization of spatio-temporal clustering
Other challenges:
• Dependency on data quality
• Overfitting
• Diversity of contexts and tasks
• Near real-time
Crowdsourced Supervision
• Introduction: Why machine learning and why crowdsource supervision?
• State of the art: Practical examples
• Opportunities and challenges: Future research directions
INTEGRATING GEO-SOCIAL MEDIA
FUTURE IDEAS
HYBRID GEO-INFORMATION PROCESSING
RESEARCH QUESTIONS
Developing hybrid quality assurance mechanisms for near real-time
geo-information streams
• Link the characteristics of geographic information with machine
learning class labelling and regression
• Provide a multi-modal interface to let human oracles simultaneously
label instances
• Translate the learner models into nomothetic principles on
geographic semantics
MACHINE LEARNING FOR GEO-SOCIAL MEDIA
LINKING GEOINFORMATION WITH MACHINE LEARNERS
Every UGGC instance needs multi-class labelling:
• Content type
• Geographic footprints of locations and/or events
• Distinct event membership
• Credibility based on a combination of the other class labels
Learners have to deal with characteristics of geographic information:
• Spatial autocorrelation
• Vague boundaries and class memberships
• Uncontrolled variance
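Spatial autocorrelation matters for learners because nearby instances are not independent samples. One common way to quantify it is global Moran's I. The sketch below is a minimal version under stated assumptions: a hand-built binary adjacency matrix for four locations in a row, and toy label values; the function name and data are illustrative and do not come from the talk.

```python
def morans_i(values, weights):
    """Global Moran's I for values[i] under a spatial weights matrix weights[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


# Hypothetical layout: four locations in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 0, 0], chain)     # similar labels sit next to each other
alternating = morans_i([1, 0, 1, 0], chain)   # neighbours always differ

print(clustered, alternating)
```

Positive values indicate that similar labels cluster in space, negative values that neighbours tend to differ. A random split into training and test folds can overstate performance when the value is clearly positive.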
MACHINE LEARNING FOR GEO-SOCIAL MEDIA
HUMAN ORACLES AND GEO-SEMANTICS
• Multiple human oracles annotate instances for all model classes
• Responses will modify
  • the learners
  • the parameters used for the geographic analysis steps to compute
    footprints and clusters
• Resulting models indirectly encode the semantic similarity of
geographic places and concepts
• Reference to (linked) data repositories such as DBpedia and
GeoNames when possible.
ACTIVE LEARNING
BASIC CONSIDERATIONS
• Learner chooses instances to be labelled and presents them to the
human annotator
• Maximize the impact of human annotation
• Learner remains flexible towards new instances
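A single round of this loop can be sketched as follows. Everything here is a hypothetical stand-in: `toy_proba` replaces a trained classifier with a crude keyword share, and `ask_oracle` replaces the human annotator. The point is only the control flow: the learner picks the instance it is least sure about, asks for a label, and moves it into the training set.

```python
def toy_proba(text):
    """Hypothetical stand-in for a classifier: share of fire-related words."""
    fire_words = {"fire", "smoke", "flames"}
    words = text.lower().split()
    return sum(w in fire_words for w in words) / len(words)


def active_learning_round(pool, labelled, predict_proba, ask_oracle):
    """Query the pool item whose P(relevant) is closest to 0.5 and store its label."""
    item = min(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    pool.remove(item)
    labelled.append((item, ask_oracle(item)))
    return item


pool = [
    "fire smoke flames",
    "holiday beach sunset",
    "smoke over the hills",
    "big fire sale",
]
labelled = []

# The oracle here always answers "not relevant"; a real one would be a person.
queried = active_learning_round(pool, labelled, toy_proba, lambda text: False)
print(queried, labelled)
```

In this toy pool, the clearly relevant and clearly irrelevant items are skipped and the ambiguous "big fire sale" is sent to the annotator, which is where a human label adds the most.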
ACTIVE LEARNING FOR GEO-SOCIAL MEDIA
ADVANCED CONSIDERATIONS
• Active learners profit from domain expertise
• Passive learners suited for domain novices
• Initial training set should be representative with respect to the
classes that the learning process is to handle; omitting classes from
the initial seed set might result in trouble further down the road
• Batch-mode better suited to multiple, parallel annotators
• Learning costs positively related to labeling informativeness
• Crowdsourced labeling might require repeated labeling to de-noise
existing training instances
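Repeated labeling can be de-noised in many ways; the simplest is a majority vote with an agreement threshold. The sketch below assumes that scheme. The item ids, the vote lists, and the 0.6 threshold are illustrative choices, not values from the talk.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Majority vote per item; None marks items that need another label."""
    result = {}
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        agreed = count / len(labels) >= min_agreement
        result[item_id] = label if agreed else None
    return result


# Hypothetical crowd answers to "Is this item about a forest fire?"
votes = {
    "tweet-1": ["yes", "yes", "no"],   # 2 of 3 agree
    "tweet-2": ["no", "no", "no"],     # unanimous
    "tweet-3": ["yes", "no"],          # tie, below the threshold
}

consensus = aggregate_labels(votes)
print(consensus)
```

Items that come back as `None` are the natural candidates for another round of annotation before they enter the training set.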
ACTIVE LEARNING FOR GEO-SOCIAL MEDIA
QUERY STRATEGIES AND TYPES
• Stream-based selective sampling: Learner samples an instance and
decides whether to query it or not; well-established e.g. for word sense
disambiguation.
• Density-weighted margin-based uncertainty sampling: Avoids
choice of outliers which have high uncertainty but will not improve a
model's performance; well-established for classification tasks
• Membership (is this concept an example of the target concept?)
• Equivalence (is x equivalent to y?)
• Disjointness (are x and y disjoint?)
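The effect of density weighting is easiest to see on a toy example. The sketch below uses one common formulation: the query score is the margin-based uncertainty multiplied by the instance's mean similarity to the rest of the pool, raised to a power `beta`. The similarity function, the coordinates, and the probabilities are assumptions made for illustration.

```python
import math


def margin_uncertainty(p_yes):
    """Binary margin: 1 when both classes are equally likely, 0 when certain."""
    return 1.0 - abs(2.0 * p_yes - 1.0)


def density(i, points):
    """Mean similarity of point i to all others (similarity = 1 / (1 + distance))."""
    others = [p for j, p in enumerate(points) if j != i]
    return sum(1.0 / (1.0 + math.dist(points[i], p)) for p in others) / len(others)


def pick_query(points, p_yes, beta=1.0):
    """Index of the instance maximising uncertainty * density ** beta."""
    scores = [margin_uncertainty(p) * density(i, points) ** beta
              for i, p in enumerate(p_yes)]
    return max(range(len(points)), key=scores.__getitem__)


# Hypothetical pool: three items close together and one far-away outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
p_yes = [0.9, 0.6, 0.8, 0.5]  # assumed classifier outputs

plain_choice = max(range(len(points)), key=lambda i: margin_uncertainty(p_yes[i]))
weighted_choice = pick_query(points, p_yes)
print(plain_choice, weighted_choice)
```

Plain uncertainty sampling selects the outlier (index 3), whose label would say little about the rest of the pool. The density-weighted score prefers index 1, which is nearly as uncertain but representative of the dense cluster.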
ACTIVE LEARNING FOR GEO-SOCIAL MEDIA
QUERIES
Toponym disambiguation:
• “Does this [item] talk about [location A] or [location B], or none, or
both?”
Spatial footprint calculation for vague geographies:
• “Is this spatial footprint for [item] correct? If not, is it too large, too
small, or wrong shape, or wrong place?”
Spatio-temporal clustering:
• “Does this [item] belong to a cluster named [event] in [location]? If
not, what’s wrong: Event, Location, or both?”
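Queries like these could be stored as templates with a closed set of answers, so that the learner fills in the slots and the crowd interface only has to render them. The sketch below shows the toponym query; the `Query` class, its field names, and the answer codes are hypothetical and not part of any system named in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A question template plus the answers an annotator may choose from."""
    task: str
    template: str
    answers: tuple

    def render(self, **slots):
        """Fill the template slots with concrete values."""
        return self.template.format(**slots)


TOPONYM = Query(
    task="toponym disambiguation",
    template="Does this {item} talk about {location_a} or {location_b}, "
             "or none, or both?",
    answers=("location_a", "location_b", "none", "both"),
)

question = TOPONYM.render(
    item="tweet", location_a="Paris, France", location_b="Paris, Texas"
)
print(question)
```

Keeping the answers closed makes each response directly usable as a class label, which is what the learner needs to update itself.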
HYBRID GEO-INFORMATION PROCESSING
WORKFLOW
HYBRID GEO-INFORMATION PROCESSING
KEY METHODS
Key Techniques
• Decision Trees
• Naive Bayes
• Support Vector Machines
Key Technologies
• Apache Spark / Storm (Analytical geoprocessing tasks)
• Pybossa (Crowdsourced supervision)
• Cloud Computing
HYBRID GEO-INFORMATION PROCESSING
FUTURE STRATEGIES
Two future implementation strategies
• Extension of AIDR with GeoCONAVI functionality
• Extension of GeoCONAVI with facilities to crowd-source the
supervision of machine learning tasks and the parameterization of
analysis functions.
“The next step will be to decide on a concrete strategy, followed by a
step-wise, iterative implementation and testing of the geoprocessing
tasks described in this paper.”
CHALLENGES AND OPPORTUNITIES OF GEO-SOCIAL
MEDIA
EARTH OBSERVATION WITH UNCALIBRATED IN-SITU SENSORS
Thank you!
f.o.ostermann@utwente.nl
@f_ostermann
nl.linkedin.com/in/foost
