This paper presents a framework to automatically detect disinformation campaigns on social media. The framework integrates natural language processing, machine learning, graph analysis, and causal inference. It collects social media posts, identifies narratives, classifies accounts as real or influence operations, maps the network of accounts spreading each narrative, and estimates the causal impact of each account in spreading the narrative. The framework was tested on real Twitter data from the 2017 French election and detected influence operation accounts with 96% precision and 79% recall. It also identified communities and high impact accounts that were corroborated by other sources.
Available Data Science M.Sc. Thesis Proposals (Marco Brambilla)
This document discusses 7 problems related to human-generated content and quantitative analysis. The first problem involves knowledge extraction from data, information, and knowledge. The second problem examines analyzing social spaces using data from services like Foursquare and Flickr. The third problem looks at analyzing social aspects of software development using data from GitHub and developer communities. The fourth problem is applying machine learning to computational social science topics like politics, debates, and societal issues. The fifth problem relates to using knowledge bases to help with content understanding tasks. The sixth problem discusses using digital humanities to engage citizens in envisioning future policies and visions. The seventh problem covers developing data models to help with mobility services.
Iterative knowledge extraction from social networks. The Web Conference 2018 (Marco Brambilla)
Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic method. The method uses seeds, i.e. prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates to feature vectors built by using terms occurring in their social content and ranks the candidates by using their distance from the centroid of seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds.
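The seed-centroid ranking step described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's implementation: the function names, the term-frequency representation, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a snippet of social content."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Average the seed vectors into a single centroid vector."""
    c = Counter()
    for v in vectors:
        c.update(v)
    return Counter({t: n / len(vectors) for t, n in c.items()})

def rank_candidates(seed_texts, candidate_texts, top_k=2):
    """Rank candidate entities by similarity of their social content
    to the centroid of the expert-provided seeds."""
    seed_centroid = centroid([tf_vector(t) for t in seed_texts])
    scored = [(cosine(tf_vector(t), seed_centroid), name)
              for name, t in candidate_texts.items()]
    return [name for _, name in sorted(scored, reverse=True)][:top_k]
```

The top-ranked candidates can then be fed back in as new seeds, giving the iterative behaviour the abstract describes.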
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds? (2) How does the reconstructed domain knowledge spread geographically? (3) Can the method be used to inspect the past, present, and future of knowledge? (4) Can the method be used to find emerging knowledge?
This work was presented at The Web Conference 2018, MSM workshop.
This document proposes using a convolutional neural network (CNN) to detect and classify fake news. It first discusses the implications of fake news spreading on social media and the need for automated identification. It then explores existing fake news datasets and data preprocessing techniques. Deep learning approaches like word embeddings and CNNs are presented as promising techniques to capture semantics in text for classification. The document outlines a CNN architecture with word embedding, convolutional, max pooling, and fully connected layers to output probabilities for fake/real classification. It reports the CNN approach achieved 99.8% accuracy on a 2.5GB dataset, significantly outperforming baseline models like SVM and naive Bayes. Finally, contact information is provided for questions.
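The embedding → convolution → max-pooling → dense pipeline that the document outlines can be sketched without a deep learning framework. This is a toy forward pass only (no training), and every name and parameter value here is an illustrative assumption, not taken from the document:

```python
import math
import random

EMB_DIM, FILTER_WIDTH = 8, 3  # assumed toy sizes

def embed(tokens, table, rng):
    """Look up (or lazily create) an embedding vector per token."""
    return [table.setdefault(t, [rng.uniform(-1, 1) for _ in range(EMB_DIM)])
            for t in tokens]

def conv_relu(embs, filters):
    """Slide each filter over windows of FILTER_WIDTH consecutive embeddings."""
    maps = []
    for f in filters:
        fmap = []
        for i in range(len(embs) - FILTER_WIDTH + 1):
            window = [x for e in embs[i:i + FILTER_WIDTH] for x in e]
            fmap.append(max(0.0, sum(w * x for w, x in zip(f, window))))  # ReLU
        maps.append(fmap)
    return maps

def max_pool(maps):
    """Keep only the strongest activation of each feature map."""
    return [max(m) if m else 0.0 for m in maps]

def fake_probability(tokens, table, filters, dense, rng):
    """Embedding -> convolution -> max pooling -> dense layer -> sigmoid."""
    pooled = max_pool(conv_relu(embed(tokens, table, rng), filters))
    z = sum(w * x for w, x in zip(dense, pooled))
    return 1.0 / (1.0 + math.exp(-z))
```

In a real system the embeddings, filters, and dense weights would be learned by backpropagation; here they only demonstrate how the layers compose.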
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING (ijnlc)
Nearly 70% of people are concerned about the propagation of fake news. This paper aims to detect fake news in online articles through the use of semantic features and various machine learning techniques. In this research, we compared recurrent neural networks against naive Bayes and random forest classifiers using five groups of linguistic features. Evaluated on the "real or fake" dataset from kaggle.com, the best performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier. The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or longer phrases, best indicate the authenticity of news.
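The bigram feature extraction behind the best-performing model is simple to sketch. As a stand-in for the paper's random forest, the toy classifier below is a multinomial naive Bayes over bigrams (one of the baselines the paper also evaluates); the class name, data, and smoothing are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def bigrams(text):
    """Extract word-pair features, e.g. 'secret cure' -> 'secret_cure'."""
    toks = text.lower().split()
    return [f"{a}_{b}" for a, b in zip(toks, toks[1:])]

class BigramNB:
    """Multinomial naive Bayes over bigram features, with add-one smoothing."""
    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        self.priors = Counter(labels)
        for t, y in zip(texts, labels):
            self.counts[y].update(bigrams(t))
        self.vocab = {b for c in self.counts.values() for b in c}
        return self

    def predict(self, text):
        best, best_lp = None, -math.inf
        for y in self.priors:
            lp = math.log(self.priors[y] / sum(self.priors.values()))
            total = sum(self.counts[y].values()) + len(self.vocab)
            for b in bigrams(text):
                lp += math.log((self.counts[y][b] + 1) / total)
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```

Swapping this classifier for a random forest only changes the model fed with the same bigram count features.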
IRJET- Fake News Detection and Rumour Source Identification (IRJET Journal)
This document discusses methods for detecting fake news and identifying the source of rumors on social media. It proposes using Bayesian classification to sort information into real or fake categories; if the combined outputs from the classes do not match, the information is considered fake. It also discusses a reverse dissemination strategy that identifies a group of suspects for the original rumor source rather than examining each individual node, which addresses the difficulty of source identification. The method aims to identify the source node based on which nodes have accepted the rumor. Machine learning and natural language processing techniques are used to detect fake news from article content.
Who’s in the Gang? Revealing Coordinating Communities in Social Media (Derek Weber)
Political astroturfing and organised trolling are online malicious behaviours with significant real-world effects. Common approaches examining these phenomena focus on broad campaigns rather than the small groups responsible. To reveal networks of cooperating accounts, we propose a novel temporal window approach that relies on account interactions and metadata alone. It detects groups of accounts engaging in behaviours that, in concert, execute different goal-based strategies, which we describe. Our approach is validated against two relevant datasets with ground truth data. See https://github.com/weberdc/find_hccs for code and data.
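The temporal window idea can be sketched as follows: bucket interactions into fixed windows and flag account pairs that repeatedly act on the same target inside the same window. This is a minimal sketch of the general approach, not the authors' code (see their repository for that); the function name, event format, and thresholds are assumptions:

```python
from collections import defaultdict
from itertools import combinations

def coordinating_pairs(events, window=60, min_cooccurrences=2):
    """events: (timestamp, account, target) tuples, e.g. a retweet of a URL.
    Accounts that act on the same target inside the same time window,
    repeatedly across windows, become candidate coordinating pairs."""
    buckets = defaultdict(set)
    for ts, account, target in events:
        buckets[(ts // window, target)].add(account)
    pair_counts = defaultdict(int)
    for accounts in buckets.values():
        for pair in combinations(sorted(accounts), 2):
            pair_counts[pair] += 1
    return {p for p, n in pair_counts.items() if n >= min_cooccurrences}
```

Tightening `window` or raising `min_cooccurrences` trades recall for precision; the resulting pair set induces the coordination network whose communities are then analysed.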
Presented at ASONAM'20 (2020 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining).
Co-authored with Frank Neumann (University of Adelaide)
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF... (IJCSEA Journal)
In this digital era, social media is an important tool for information dissemination, and Twitter is a popular social media platform. Social media analytics helps make informed decisions based on people's needs and opinions; this information, when properly interpreted, provides valuable insights into domains such as public policymaking, marketing, sales, and healthcare. Topic modeling is an unsupervised technique for discovering hidden patterns in text documents. In this study, we explore the Latent Dirichlet Allocation (LDA) topic model algorithm. We collected tweets with hashtags related to coronavirus discussions. The study compares regular LDA with LDA based on collapsed Gibbs sampling (LDAMallet), and the experiments use different data processing steps: with and without trigrams, and with and without hashtags. This study provides a comprehensive analysis of LDA for short text messages using un-pooled and pooled tweets. The results suggest that a pooling scheme using hashtags helps improve topic inference, yielding a better coherence score.
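The hashtag pooling scheme that drives the study's main result is straightforward: concatenate all tweets sharing a hashtag into one longer pseudo-document before running LDA, since LDA struggles on very short texts. A minimal sketch (the function name and fallback key are assumptions, not from the paper):

```python
from collections import defaultdict

def pool_by_hashtag(tweets):
    """Pool short tweets into longer pseudo-documents keyed by hashtag.
    Tweets without any hashtag fall into a shared catch-all pool."""
    pools = defaultdict(list)
    for tweet in tweets:
        tags = [w.lower() for w in tweet.split() if w.startswith("#")]
        for tag in tags or ["#_untagged"]:
            pools[tag].append(tweet)
    return {tag: " ".join(ts) for tag, ts in pools.items()}
```

The resulting pseudo-documents, rather than the raw tweets, are what get passed to the LDA (or LDAMallet) trainer.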
No misunderstandings during Earthquakes (ISCRAM 2015)
The researchers conducted a study to develop a standardized tweet format for Italy's national earthquake monitoring institute (INGV) to use when automatically detecting earthquakes on Twitter. A survey of over 1,000 people found that they wanted timely earthquake information from INGV on Twitter. Based on the results, the researchers proposed a new tweet format including provisional magnitude, location, and time in local format. The standardized format was approved by INGV to help effectively communicate automatic earthquake detection information on Twitter.
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ... (Sameera Horawalavithana)
The document discusses research into quantifying the relationship between a graph's properties and its vulnerability to deanonymization attacks. It presents three research questions: 1) How topological properties affect attacks, 2) How node attribute placement affects vulnerability, and 3) How diffusion processes impact vulnerability. The methodology section outlines generating synthetic and real-world graphs, modeling attacks, and measuring success. Key findings include some topological properties like transitivity and assortativity impacting privacy independent of degree distribution. Node attribute diversity increases vulnerability more than attribute homophily. Faster spreading diffusions see higher vulnerability growth. The implications are discussed for data owners and privacy researchers.
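Transitivity, one of the topological properties the study links to deanonymization risk, measures how often two neighbours of a node are themselves connected. A short sketch of computing it on a toy undirected graph (the adjacency-dict representation is an assumption for illustration):

```python
from itertools import combinations

def transitivity(adj):
    """Global transitivity of an undirected graph given as an adjacency
    dict {node: set_of_neighbours}: closed triples / all connected triples.
    Each triangle is counted once per centre node, which yields the usual
    3 * triangles / triples ratio."""
    triangles = triples = 0
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                triangles += 1
    return triangles / triples if triples else 0.0
```

A triangle graph scores 1.0 and a simple path scores 0.0, bracketing the range within which real social graphs fall.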
Research @ the Social Media Lab (@SMLabTO) (Philip Mai)
The Social Media Lab at Ryerson University studies how social media can support online communities and political engagement. It develops data analytics software like Netlytic and MyTweeps to analyze social media conversations and visualize social networks. The lab has expertise in social network analysis, information visualization, and computer-mediated communication. It has a single director and includes postdocs, PhD students, master's students, undergraduates, and faculty collaborators from various institutions.
A method to evaluate the reliability of social media data for social network ... (Derek Weber)
In order to study the effects of Online Social Network (OSN) activity on real-world offline events, researchers need access to OSN data, the reliability of which has particular implications for social network analysis. This relates not only to the completeness of any collected dataset, but also to constructing meaningful social and information networks from them. In this multidisciplinary study, we consider the question of constructing traditional social networks from OSN data and then present a measurement case study showing how the reliability of OSN data affects social network analyses. To this end we developed a systematic comparison methodology, which we applied to two parallel datasets we collected from Twitter. We found considerable differences in datasets collected with different tools and that these variations significantly alter the results of subsequent analyses. Our results lead to a set of guidelines for researchers planning to collect online data streams to infer social networks.
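The core of such a comparison methodology can be sketched as measuring (a) how much two parallel collections overlap and (b) whether a downstream analysis result survives the difference. This minimal sketch is not the authors' methodology; the metric choices, function names, and toy edge data are assumptions:

```python
from collections import Counter

def jaccard(a, b):
    """Set overlap of two collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def top_account(edges):
    """Most-retweeted account in a list of (source, target) edges."""
    return Counter(dst for _, dst in edges).most_common(1)[0][0]

def compare_collections(edges_a, edges_b):
    """Compare two parallel collections of the same event: how much the
    retweet-edge sets overlap, and whether both datasets agree on the
    most-retweeted account (a simple downstream analysis result)."""
    return {
        "edge_jaccard": jaccard(edges_a, edges_b),
        "same_top_account": top_account(edges_a) == top_account(edges_b),
    }
```

Even with only 50% edge overlap, a coarse result like the top account may agree across tools while finer network measures diverge, which is the kind of variation the guidelines address.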
Presented at ASONAM'20 (2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining)
Co-authors: Mehwish Nasim (Data61 / CSIRO), Lewis Mitchell (University of Adelaide), Lucia Falzon (University of Melbourne / DST Group)
Myths and challenges in knowledge extraction and analysis from human-generate... (Marco Brambilla)
For centuries, science (in German "Wissenschaft") has aimed to create ("schaften") new knowledge ("Wissen") from the observation of physical phenomena, their modelling, and empirical validation. Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, and messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradicting, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted. The challenge is once again to capture and create knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data). The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice for the objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.
The talk explores this mixed realistic-utopian domain of knowledge extraction and reports on tools and cases where the digital and physical worlds have been brought together to better understand our society.
Big data analysis of news and social media content (Firas Husseini)
This document summarizes research from the Intelligent Systems Laboratory at the University of Bristol on analyzing large amounts of news and social media content using computational methods. It discusses several studies, including analyzing over 400 million tweets to track public mood in the UK, extracting narrative networks from over 125,000 news articles about the 2012 US elections, comparing differences between news outlets and their topics/writing styles using machine learning, modeling the EU news media network using clustering and translation techniques, and predicting popular news articles based on their content. The research demonstrates how computational social science can reveal patterns in large datasets that were previously impossible to analyze at scale.
This document provides an initial literature review and research proposal for a project analyzing social media data from Twitter to predict crime rates. The research aims to identify evidence of criminal activity by analyzing geotagged tweets and visualizing crime hotspots. Challenges include issues with jurisdiction when crimes span multiple countries and limitations of current UK law in addressing online crimes. The methodology will use both traditional legal research and interdisciplinary approaches. The literature review discusses prior research on crime prediction methods and limitations, as well as the potential for social media data to analyze crimes.
Hate speech has been an ongoing problem on the Internet for many years, and social media platforms, especially Facebook and Twitter, have given it a global stage where it can spread far more rapidly. Every social media platform needs an effective hate speech detection system to remove offensive content in real time. There are various approaches to identifying hate speech: rule-based, machine learning based, deep learning based, and hybrid. As this is a review paper, we summarize the valuable work of the various authors who have studied the identification of hate speech using these approaches.
This document summarizes a research study analyzing the content of tweets sent by politicians from several countries including Korea, the UK, US and Canada. The researchers gathered Twitter data for the politicians using an API tool and conducted a preliminary content analysis of tweets from Korean politicians. They categorized the tweets into three types: socio-political messages about events and policies, private personal messages, and conversational messages intended for others using the "@" feature. The results of their content analysis on 878 tweets from Korean politicians found statistically significant differences in the frequencies of these three tweet types.
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING (ijaia)
Unsupervised machine learning techniques such as clustering are gaining wide use with the recent growth of social communication platforms like Twitter and Facebook, since clustering enables the discovery of patterns in unstructured datasets. We collected tweets matching hashtags linked to COVID-19 from a Kaggle dataset and compared the performance of nine clustering algorithms on it. We evaluated the generalizability of these algorithms using a supervised learning model and, using a selected unsupervised learning algorithm, categorized the clusters. The top five categories are Safety, Crime, Products, Countries, and Health. This can prove helpful for organizations that use large amounts of Twitter data and need to quickly find key points in the data before going into further classification.
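Clustering short tweets can be illustrated with a one-pass "leader" scheme: each tweet joins the first cluster whose leader it overlaps with strongly enough, else it starts a new cluster. This is a cheap stand-in for the nine algorithms the paper compares, with assumed function names, threshold, and toy data:

```python
def token_set(text):
    """Crude tokenizer: lowercase words with hashtag/punctuation stripped."""
    return {w.lower().strip("#,.") for w in text.split()}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(tweets, threshold=0.3):
    """One-pass leader clustering over token-set similarity.
    clusters: list of (leader_token_set, member_tweets) pairs."""
    clusters = []
    for t in tweets:
        toks = token_set(t)
        for leader, members in clusters:
            if jaccard(leader, toks) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((toks, [t]))
    return [members for _, members in clusters]
```

Each resulting cluster would then be labelled (Safety, Crime, etc.) by inspecting its most frequent terms, which is the categorization step the abstract describes.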
Social media produces vast amounts of user data that could potentially be used for official statistics. However, most individual social media messages are "noisy" and not directly relevant. Researchers at CBS are developing methods to filter relevant messages from different social media platforms about topics like sentiment, social tensions, and migration intentions. They are also working to characterize social media users and account for differences from the general population in order to use these sources to produce new real-time indicators.
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE (cscpconf)
The anonymity of social networks makes them attractive to those who use hate speech to mask their criminal activities online, posing a challenge to the world and in particular to Ethiopia. With this ever-increasing volume of social media data, identifying hate speech that aggravates conflict between citizens of nations becomes a challenge, and the high rate of production makes it difficult to collect, store, and analyze such big data using traditional detection methods. This paper proposes the application of Apache Spark to hate speech detection to reduce these challenges. The authors developed an Apache Spark based model to classify Amharic Facebook posts and comments into hate and not hate, employing random forest and naive Bayes for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation, the model based on Word2Vec embeddings performed best, with 79.83% accuracy. The proposed method achieves a promising result, exploiting the unique features of Spark for big data.
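TF-IDF, one of the two feature schemes the paper compares against Word2Vec, is easy to sketch outside Spark. A minimal version for a small corpus (the function name and weighting variant are illustrative assumptions; Spark's own `ml.feature` pipeline would be used at scale):

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weight dicts over a small corpus.
    tf = term count / doc length; idf = log(N / document frequency)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t])
                        for t in tf})
    return vectors
```

A term appearing in every document gets weight 0, so classifiers see only the terms that discriminate between hate and not-hate posts.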
ISI 2017 presentation on Big Data and bias (Piet J.H. Daas)
1) The document discusses three ways of using big data in statistics: (1) combined with survey data, (2) from a single complete source, and (3) from a single incomplete source.
2) Examples of type 2 include road sensor traffic data and web-scraped price data. These sources completely cover their target populations.
3) Examples of type 3 include social media data and mobile phone data. Only part of the target population is included, so ways must be found to deal with the missing part, such as determining the characteristics of the included population.
Dealing with Information Overload When Using Social Media for Emergency Manag... (Mirjam-Mona)
Presentation of Starr Roxanne Hiltz and Linda P. Plotnick on the topic "Dealing with Information Overload When Using Social Media for Emergency Management: Emerging Solutions" at ISCRAM2013
Rapid identification of new drugs through online monitoring tools: The case o... (Australian Drug Foundation)
The rapid proliferation of new drugs available to Australians has necessitated the use of innovative techniques to monitor their emergence. In this presentation, Monica uses the example of NBOMe drugs (reportedly sold as ‘legal LSD’) to outline 4 ways of monitoring drug use trends online and in real-time (Google Trends, drug user forums, Twitter, and Silk Road). These tools are freely available for use by clinicians, AOD workers and researchers who are seeking further information about new drugs presented by clients, or that are talked about in their work.
On How the Darknet and its Access to SCADA is a Threat to National Critical I... (Matthew Kurnava)
This document analyzes how the darknet poses a threat to national critical infrastructure. It begins with an introduction that defines the darknet and describes some of the illegal activities that occur there. The research question asks how the darknet threatens critical infrastructure and how vulnerable different sectors are. The hypothesis is that the darknet poses a primary threat to US cyber critical infrastructure due to criminal, hacktivist, and terrorist use that could significantly damage health and welfare. A literature review discusses research on darknet cyber attacks, hacktivist and terrorist groups using the darknet, and critical infrastructure's growing dependency on technology and vulnerability. The methodology will use an analytical approach to examine threats to each of the 16 US critical infrastructure sectors.
Recently, fake news has been causing many problems for our society, and many researchers have been working on identifying it. Most fake news detection systems utilize the linguistic features of the news; however, they have difficulty sensing highly ambiguous fake news, which can be detected only after identifying its meaning and the latest related information. In this paper, to resolve this problem, we present a new Korean fake news detection system using a fact DB that is built and updated by direct human judgement after collecting obvious facts. Our system receives a proposition and searches the fact DB for semantically related articles in order to verify whether the given proposition is true by comparing it with those related articles. To achieve this, we utilize a deep learning model, Bidirectional Multi-Perspective Matching for Natural Language Sentences (BiMPM), which has demonstrated good performance on the sentence matching task. However, BiMPM has some limitations: the longer the input sentence, the lower its performance, and it has difficulty making an accurate judgement when an unlearned word or an unlearned relation between words appears. To overcome these limitations, we propose a new matching technique which exploits article abstraction as well as an entity matching set in addition to BiMPM. In our experiments, we show that our system improves overall performance for fake news detection.
Prasanth. K | Praveen. N | Vijay. S | Auxilia Osvin Nancy. V, "Fake News Detection using Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30014.pdf
Paper Url : https://www.ijtsrd.com/engineering/information-technology/30014/fake-news-detection-using-machine-learning/prasanth-k
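The retrieval step of the Korean fact-DB system above — finding semantically related articles before the BiMPM comparison — can be sketched with simple term overlap. This is a crude stand-in for whatever semantic search the system actually uses; the function name and toy fact DB are assumptions:

```python
def retrieve_related(proposition, fact_db, top_k=1):
    """Rank fact-DB articles by the number of terms they share with the
    proposition, returning the top_k candidates for detailed matching."""
    terms = set(proposition.lower().split())
    scored = sorted(fact_db,
                    key=lambda art: len(terms & set(art.lower().split())),
                    reverse=True)
    return scored[:top_k]
```

The retrieved articles would then be passed, together with the proposition, to the sentence-matching model for the actual true/false judgement.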
The document outlines key aspects of research methodology including:
1. The objectives of research such as defining problems, formulating hypotheses, collecting and evaluating data, making deductions, and testing conclusions.
2. The different types of research including descriptive, applied, quantitative, conceptual, empirical, qualitative, fundamental, and analytical research.
3. The methods of collecting data including primary methods like questionnaires, observations, interviews, and schedules and secondary methods of collecting published and unpublished data from various sources.
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ... (Civic Exchange)
Civic Exchange 2009 The Air We Breathe Conference - Experts Symposium 9 January 2009
Application of Studies on Health Effects of Air Pollution in Hong Kong
presented by Dr CM Wong (Department of Community Medicine and School of Public Health, The University of Hong Kong)
http://air.dialogue.org.hk
This document discusses different types of pollution including air, water, noise, land, and radioactive pollution. It defines each type of pollution, discusses their causes and effects, and provides suggestions for prevention. The document shares information through text and images to raise awareness about various forms of pollution and their impacts on the environment.
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...IJCSEA Journal
In this digital era, social media is an important tool for information dissemination. Twitter is a popular social media platform. Social media analytics helps make informed decisions based on people's needs and opinions. This information, when properly perceived provides valuable insights into different domains, such as public policymaking, marketing, sales, and healthcare. Topic modeling is an unsupervised algorithm to discover a hidden pattern in text documents. In this study, we explore the Latent Dirichlet Allocation (LDA) topic model algorithm. We collected tweets with hashtags related to corona virus related discussions. This study compares regular LDA and LDA based on collapsed Gibbs sampling (LDAMallet) algorithms. The experiments use different data processing steps including trigrams, without trigrams, hashtags, and without hashtags. This study provides a comprehensive analysis of LDA for short text messages using un-pooled and pooled tweets. The results suggest that a pooling scheme using hashtags helps improve the topic inference results with a better coherence score.
No misunderstandings during EarthquakesISCRAM 2015
The researchers conducted a study to develop a standardized tweet format for Italy's national earthquake monitoring institute (INGV) to use when automatically detecting earthquakes on Twitter. A survey of over 1,000 people found that they wanted timely earthquake information from INGV on Twitter. Based on the results, the researchers proposed a new tweet format including provisional magnitude, location, and time in local format. The standardized format was approved by INGV to help effectively communicate automatic earthquake detection information on Twitter.
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Sameera Horawalavithana
The document discusses research into quantifying the relationship between a graph's properties and its vulnerability to deanonymization attacks. It presents three research questions: 1) How topological properties affect attacks, 2) How node attribute placement affects vulnerability, and 3) How diffusion processes impact vulnerability. The methodology section outlines generating synthetic and real-world graphs, modeling attacks, and measuring success. Key findings include some topological properties like transitivity and assortativity impacting privacy independent of degree distribution. Node attribute diversity increases vulnerability more than attribute homophily. Faster spreading diffusions see higher vulnerability growth. The implications are discussed for data owners and privacy researchers.
Research @ the Social Media Lab (@SMLabTO) (Philip Mai)
The Social Media Lab at Ryerson University studies how social media can support online communities and political engagement. It develops data analytics software like Netlytic and MyTweeps to analyze social media conversations and visualize social networks. The lab has expertise in social network analysis, information visualization, and computer-mediated communication. It is directed by one director and includes postdocs, PhD students, master's students, undergraduates, and faculty collaborators from various institutions.
A method to evaluate the reliability of social media data for social network ... (Derek Weber)
In order to study the effects of Online Social Network (OSN) activity on real-world offline events, researchers need access to OSN data, the reliability of which has particular implications for social network analysis. This relates not only to the completeness of any collected dataset, but also to constructing meaningful social and information networks from them. In this multidisciplinary study, we consider the question of constructing traditional social networks from OSN data and then present a measurement case study showing how the reliability of OSN data affects social network analyses. To this end we developed a systematic comparison methodology, which we applied to two parallel datasets we collected from Twitter. We found considerable differences in datasets collected with different tools and that these variations significantly alter the results of subsequent analyses. Our results lead to a set of guidelines for researchers planning to collect online data streams to infer social networks.
Presented at ASONAM'20 (2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining)
Co-authors: Mehwish Nasim (Data61 / CSIRO), Lewis Mitchell (University of Adelaide), Lucia Falzon (University of Melbourne / DST Group)
Myths and challenges in knowledge extraction and analysis from human-generate... (Marco Brambilla)
For centuries, science (in German "Wissenschaft") has aimed to create ("schaften") new knowledge ("Wissen") from the observation of physical phenomena, their modelling, and empirical validation. Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, and messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradictory, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted. The challenge is once again to capture and create knowledge that is new, has not yet been formalized in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data). The myth is that existing tools (spanning fields like the semantic web, machine learning, statistics, NLP, and so on) suffice for this objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.
The talk explores the mixed realistic-utopian domain of knowledge extraction and reports on some tools and cases where the digital and physical worlds have been brought together for a better understanding of our society.
Big data analysis of news and social media content (Firas Husseini)
This document summarizes research from the Intelligent Systems Laboratory at the University of Bristol on analyzing large amounts of news and social media content using computational methods. It discusses several studies, including analyzing over 400 million tweets to track public mood in the UK, extracting narrative networks from over 125,000 news articles about the 2012 US elections, comparing differences between news outlets and their topics/writing styles using machine learning, modeling the EU news media network using clustering and translation techniques, and predicting popular news articles based on their content. The research demonstrates how computational social science can reveal patterns in large datasets that were previously impossible to analyze at scale.
This document provides an initial literature review and research proposal for a project analyzing social media data from Twitter to predict crime rates. The research aims to identify evidence of criminal activity by analyzing geotagged tweets and visualizing crime hotspots. Challenges include issues with jurisdiction when crimes span multiple countries and limitations of current UK law in addressing online crimes. The methodology will use both traditional legal research and interdisciplinary approaches. The literature review discusses prior research on crime prediction methods and limitations, as well as the potential for social media data to analyze crimes.
Hate speech has been an ongoing problem on the Internet for many years, and social media platforms, especially Facebook and Twitter, have given it a global stage where it can spread far more rapidly. Every social media platform needs an effective hate speech detection system to remove offensive content in real time. There are various approaches to identifying hate speech: rule-based, machine learning based, deep learning based, and hybrid. As this is a review paper, we survey the work of the many authors who have studied hate speech identification using these approaches.
This document summarizes a research study analyzing the content of tweets sent by politicians from several countries including Korea, the UK, US and Canada. The researchers gathered Twitter data for the politicians using an API tool and conducted a preliminary content analysis of tweets from Korean politicians. They categorized the tweets into three types: socio-political messages about events and policies, private personal messages, and conversational messages intended for others using the "@" feature. The results of their content analysis on 878 tweets from Korean politicians found statistically significant differences in the frequencies of these three tweet types.
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING (ijaia)
Unsupervised machine learning techniques such as clustering are gaining wide use with the recent growth of social communication platforms like Twitter and Facebook. Clustering enables the finding of patterns in these unstructured datasets. We collected tweets matching hashtags linked to COVID-19 from a Kaggle dataset. We compared the performance of nine clustering algorithms on this dataset and evaluated their generalizability using a supervised learning model. Finally, using a selected unsupervised learning algorithm, we categorized the clusters. The top five categories are Safety, Crime, Products, Countries and Health. This can prove helpful for bodies that use large amounts of Twitter data and need to quickly find key points in the data before going into further classification.
Social media produces vast amounts of user data that could potentially be used for official statistics. However, most individual social media messages are "noisy" and not directly relevant. Researchers at CBS are developing methods to filter relevant messages from different social media platforms about topics like sentiment, social tensions, and migration intentions. They are also working to characterize social media users and account for differences from the general population in order to use these sources to produce new real-time indicators.
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE (cscpconf)
The anonymity of social networks makes them attractive to those spreading hate speech, who mask their criminal activities online, posing a challenge to the world and in particular to Ethiopia. With the ever-increasing volume of social media data, hate speech identification becomes a challenge in aggravating conflict between citizens of nations. The high rate of production makes it difficult to collect, store and analyze such big data using traditional detection methods. This paper proposes the application of Apache Spark to hate speech detection to reduce these challenges. The authors developed an Apache Spark based model to classify Amharic Facebook posts and comments into hate and not hate, employing Random Forest and Naïve Bayes for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation, the model based on Word2Vec embeddings performed best with 79.83% accuracy. The proposed method achieves promising results using the unique big-data features of Spark.
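As a rough illustration of the TF-IDF feature weighting the paper mentions (the study itself used Apache Spark), a minimal standard-library version might look like this; the sample documents are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Minimal TF-IDF: term frequency scaled by inverse document frequency.
    `docs` is a list of token lists; returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / c) for t, c in df.items()}
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return out

docs = [["hate", "speech", "post"], ["friendly", "post"], ["hate", "comment"]]
weights = tf_idf(docs)
# "post" appears in 2 of 3 docs, so it gets a lower weight than the rarer "speech"
```

These per-document weight vectors are the kind of feature representation that a Random Forest or Naïve Bayes classifier would then be trained on.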
Isi 2017 presentation on Big Data and bias (Piet J.H. Daas)
1) The document discusses three types of using big data in statistics: (1) combined with survey data, (2) from a single complete source, and (3) from a single incomplete source.
2) Examples of type 2 include road sensor traffic data and web-scraped price data. These sources completely cover their target populations.
3) Examples of type 3 include social media data and mobile phone data. Only part of the target population is included, so ways must be found to deal with the missing part, such as determining the characteristics of the included population.
Dealing with Information Overload When Using Social Media for Emergency Manag... (Mirjam-Mona)
Presentation of Starr Roxanne Hiltz and Linda P. Plotnick on the topic "Dealing with Information Overload When Using Social Media for Emergency Management: Emerging Solutions" at ISCRAM2013
Rapid identification of new drugs through online monitoring tools: The case o... (Australian Drug Foundation)
The rapid proliferation of new drugs available to Australians has necessitated the use of innovative techniques to monitor their emergence. In this presentation, Monica uses the example of NBOMe drugs (reportedly sold as ‘legal LSD’) to outline four ways of monitoring drug use trends online and in real time: Google Trends, drug user forums, Twitter, and Silk Road. These tools are freely available for use by clinicians, AOD workers and researchers who are seeking further information about new drugs presented by clients, or that are talked about in their work.
On How the Darknet and its Access to SCADA is a Threat to National Critical I... (Matthew Kurnava)
This document analyzes how the darknet poses a threat to national critical infrastructure. It begins with an introduction that defines the darknet and describes some of the illegal activities that occur there. The research question asks how the darknet threatens critical infrastructure and how vulnerable different sectors are. The hypothesis is that the darknet poses a primary threat to US cyber critical infrastructure due to criminal, hacktivist, and terrorist use that could significantly damage health and welfare. A literature review discusses research on darknet cyber attacks, hacktivist and terrorist groups using the darknet, and critical infrastructure's growing dependency on technology and vulnerability. The methodology will use an analytical approach to examine threats to each of the 16 US critical infrastructure sectors.
Recently, fake news has caused many problems for our society, and many researchers have been working on identifying it. Most fake news detection systems rely on linguistic features of the news; however, they have difficulty detecting highly ambiguous fake news, which can be identified only after considering its meaning and the latest related information. In this paper, to resolve this problem, we present a new Korean fake news detection system using a fact DB that is built and updated by direct human judgement after collecting obvious facts. Our system receives a proposition and searches the semantically related articles in the fact DB in order to verify whether the given proposition is true by comparing it with those related articles. To achieve this, we utilize a deep learning model, Bidirectional Multi-Perspective Matching for Natural Language Sentences (BiMPM), which has demonstrated good performance on the sentence matching task. However, BiMPM has some limitations: the longer the input sentence, the lower its performance, and it has difficulty making an accurate judgement when an unlearned word or an unlearned relation between words appears. To overcome these limitations, we propose a new matching technique that exploits article abstraction as well as an entity matching set in addition to BiMPM. In our experiments, we show that our system improves overall fake news detection performance.
Prasanth. K | Praveen. N | Vijay. S | Auxilia Osvin Nancy. V, "Fake News Detection using Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020.
URL: https://www.ijtsrd.com/papers/ijtsrd30014.pdf
Paper Url : https://www.ijtsrd.com/engineering/information-technology/30014/fake-news-detection-using-machine-learning/prasanth-k
The document outlines key aspects of research methodology including:
1. The objectives of research such as defining problems, formulating hypotheses, collecting and evaluating data, making deductions, and testing conclusions.
2. The different types of research including descriptive, applied, quantitative, conceptual, empirical, qualitative, fundamental, and analytical research.
3. The methods of collecting data including primary methods like questionnaires, observations, interviews, and schedules and secondary methods of collecting published and unpublished data from various sources.
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ... (Civic Exchange)
Civic Exchange 2009 The Air We Breathe Conference - Experts Symposium 9 January 2009
Application of Studies on Health Effects of Air Pollution in Hong Kong
presented by Dr CM Wong (Department of Community Medicine and School of Public Health, The University of Hong Kong)
http://air.dialogue.org.hk
This document discusses different types of pollution including air, water, noise, land, and radioactive pollution. It defines each type of pollution, discusses their causes and effects, and provides suggestions for prevention. The document shares information through text and images to raise awareness about various forms of pollution and their impacts on the environment.
This document summarizes a mini project on air and noise pollution in and around industrial areas. It describes surveys conducted at busy traffic places and factories to study pollution levels. The effects include health issues such as lung diseases from air pollution, while noise pollution can cause hearing impairment, lack of concentration, heart attacks, and mental illness. Suggested strategies to mitigate noise pollution include using noise barriers, limiting vehicle speeds, and controlling traffic flow.
Air Pollution, Asthma, Triggers & Health - Research and Remediation Strategies (Sean McCormick)
This content was created to help provide health care practitioners with more detailed information about air pollution, it's impact on health, and low-no-cost strategies for reducing exposure to asthma triggers.
Air, water, and land pollution were discussed. Air pollution comes from natural sources like volcanoes and human sources such as factories and cars. Major air pollutants include carbon monoxide, sulfur dioxide, and particulate matter which can cause health issues. Water pollution comes from point sources like factories and non-point sources like agricultural runoff. Land pollution is caused by construction, agriculture, and domestic and industrial waste. Pollution has consequences like acid rain, smog, and damage to plants and wildlife. Reducing pollution requires efforts from individuals, industries, and governments.
Academic Stress For Management Students (Latha setna)
This document discusses stress among MBA students in India. It identifies several stressors for MBA students, including academic pressures, faculty behavior, relationships with other students, and placement concerns. A study was conducted of 100 MBA students which found that placement issues were the biggest stressor, followed by faculty, other students, and management. The document provides tips for students to better manage their stress through routines, breaks, relaxation, and time management.
This document summarizes a study on the impact of academic stress on MBA students of Gujarat Technological University. The study aimed to identify components of academic stress, including curriculum/instruction, teamwork, assessments, and placement. It surveyed 118 MBA students across Gujarat. The results showed that curriculum/instruction and lack of recreational time highly impacted stress levels. Behavioral stressors like cultural effects also impacted performance. Common outcomes of stress included headaches, sleep issues, nervousness and mood changes. The study provides insight into the sources and effects of academic stress on MBA students.
This document presents an overview of air pollution monitoring using remote sensing and GIS technologies. It discusses how satellite remote sensing can provide synoptic views of large areas and monitor multiple pollutants simultaneously. It also describes some common air pollutants and sources. Two case studies are then presented on using these methods to map ambient air pollution zones and monitor air quality in specific regions.
Dr. B. Victor presented on air pollution. He discussed different types of pollution sources and air pollutants. Some key effects of air pollution include damage to health, vegetation, and structures. Increased carbon dioxide contributes to global warming and climate change through the greenhouse effect. Air pollutants like sulfur dioxide and nitrogen oxides can cause acid rain when dissolved in water, harming aquatic life and soil.
This presentation discusses various types of air pollution including smog, acid rain, the greenhouse effect, and holes in the ozone layer. It notes that air pollution comes from both outdoor sources like vehicle emissions and indoor sources like smoking. The health effects of air pollution can be both short-term and long-term. Prevention efforts focus on reducing waste and changing lifestyles to be more environmentally friendly.
The document discusses air pollution, including its definition, types, causes, effects, and prevention. It defines air pollution as physical, chemical, and biological agents that modify the natural atmosphere. It discusses primary and secondary pollutants like carbon monoxide and ozone. Major causes of air pollution include vehicle emissions, industrial emissions, and natural sources like wildfires. Short-term effects include respiratory issues, while long-term effects involve chronic diseases like lung cancer and heart disease. Prevention strategies include controlling vehicle and industry emissions, restricting smoking, and increasing ventilation.
This document summarizes research methodology and design. It discusses types of research including pure and applied research as well as qualitative and quantitative research. It also outlines the research process including formulating research questions, developing a research proposal, and designing the research. The design considerations covered include design strategy, data collection methods, sampling, and pilot testing. It also discusses research ethics and characteristics of sound research.
The document discusses research methodology and defines research. It provides examples of what constitutes research and what does not. Research is defined as a systematic, logical process that includes understanding the problem, reviewing literature, collecting and analyzing data, drawing conclusions, and generalizing findings. The document also discusses types of research questions, purposes of research, and common challenges in conducting research.
Air pollution: its causes, effects and pollutants (Maliha Eesha)
This presentation gives the complete detail of air, air pollution, air pollutants and their types, each pollutant in detail and its causes and effects, acid rain, methods of prevention,smog,acidification,indoor pollution and so on. It is a complete package and I hope it'll be helpful in school! :)
A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS (Natrah Abd Rahman)
Stress is the feeling that is created when we react to particular events. It can make you feel threatened or upset. It is a combination of psychological, physiological and behavioral reactions that people have in response to events that threaten or challenge them.
This document discusses different types of pollution including air, water, noise, land, and radioactive pollution. It provides definitions and overviews of each type of pollution, describes their causes and effects, and gives recommendations for prevention. The types of pollution covered are air pollution from industries and vehicles, water pollution from industrial and sewage waste, noise pollution from traffic, construction and airports, land pollution from mining, garbage and industrial waste, and radioactive pollution from nuclear power plants and waste. The document aims to educate about various forms of pollution and their impacts.
Pollution is the introduction of contaminants into the natural environment that cause adverse effects. It discusses various types of pollution like air, water, soil, noise, and light pollution. The document outlines causes like industries, vehicles, and agriculture. Effects include health impacts and ecosystem damage. It provides measures to control different types of pollution such as treating wastes, using public transport, and limiting fertilizers. The most polluted world cities include Cairo, Delhi, and Beijing. The conclusion is that reducing pollution requires going green.
This document defines air pollution as occurring when air contains harmful amounts of gases, dust, fumes or odors. It discusses outdoor sources like smog and indoor sources like burning wood. Natural sources include wildfires and volcanoes, while human sources are things like vehicles, power plants, and burning wood. Air pollution can cause health issues for humans and environmental effects like acid rain. The document recommends mitigating air pollution through sustainable development, international agreements, and new technologies.
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U... (Paolo Missier)
talk for paper published at ICWE2019:
Primo F, Missier P, Romanovsky A, Mickael F, Cacho N. A customisable pipeline for continuously harvesting socially-minded Twitter users. In: Procs. ICWE’19. Daedjeon, Korea; 2019.
IRJET- Sentiment Analysis using Machine Learning (IRJET Journal)
This document discusses sentiment analysis of social media posts using machine learning. The authors aim to classify social media posts as having either a political or non-political sentiment. A dictionary of keywords and their sentiments is created to analyze posts. Users can make posts that are then classified, and an admin can hide or delete posts containing harmful keywords. Graphs are generated to analyze the classification of posts as political versus non-political and by sentiment. The accuracy of classification depends on the training data and dictionary.
In the age of social media communication, it is easy to manipulate the minds of users and, in some cases, even instigate violent actions. There is a need for a system that can analyze the threat level of tweets from influential users and rank their Twitter handles, so that dangerous tweets can be kept from going public before fact-checking; such tweets can hurt people's sentiments and take the shape of violence. The study aims to analyse and rank Twitter users according to their influential power and the extremism of their tweets, to help prevent major protests and violent events. We scraped top trending topics and fetched tweets using those hashtags. We propose a custom ranking algorithm that considers source-based and content-based features, along with a knowledge graph, to generate a score and rank Twitter users accordingly. Our aim with this study is to identify and rank extremist Twitter users with regard to their impact and influence, using a technique that takes into consideration both source-based and content-based features of tweets to rank the extremist Twitter users with a high impact factor.
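The combination of source-based and content-based features described here can be sketched as a weighted score. The weights, feature choices, and sample data below are assumptions for illustration, not the study's actual algorithm.

```python
def rank_users(users, flagged_terms, w_source=0.4, w_content=0.6):
    """Illustrative ranking: a source score from account reach plus a content
    score from the fraction of tweets containing flagged terms.
    Weights and features are assumptions, not the paper's exact method."""
    scored = []
    for u in users:
        source = min(u["followers"] / 10_000, 1.0)  # normalised reach, capped at 1
        hits = sum(any(t in tw.lower() for t in flagged_terms) for tw in u["tweets"])
        content = hits / len(u["tweets"]) if u["tweets"] else 0.0
        scored.append((w_source * source + w_content * content, u["handle"]))
    return sorted(scored, reverse=True)

users = [
    {"handle": "@calm", "followers": 50_000, "tweets": ["nice weather today"]},
    {"handle": "@loud", "followers": 2_000, "tweets": ["burn it all down", "riot now"]},
]
ranking = rank_users(users, flagged_terms=["riot", "burn"])
# @loud outranks @calm: low reach, but every tweet matches a flagged term
```

A knowledge graph, as in the study, would replace the flat `flagged_terms` list with scores derived from relations between entities mentioned in the tweets.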
This project analyses the relations on Twitter between politicians and journalists in the triangle of political communication in a hybrid media system (Chadwick, 2013).
From Research to Applications: What Can We Extract with Social Media Sensing? (Yiannis Kompatsiaris)
SIGMAP22 Keynote Presentation:
Social media have transformed the Web into an interactive sharing platform where users upload data and media, comment on, and share this content within their social circles. The large-scale availability of user-generated content in social media platforms has opened up new possibilities for studying and understanding real-world phenomena, trends and events. Social media and websites provide an access to public opinions on certain aspects and therefore play an important role in getting insights on targeted audiences. The objective of this talk is to provide an overview of social media mining, including key aspects such as data collection, multimodal analysis and visualization. Challenges, such as fighting misinformation, will be presented together with applications, results and demonstrations from multiple areas including: news, environment, security, interior and urban design.
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what? (Todd Suomela)
Todd Suomela's dissertation will examine issues related to digital curation of data from citizen science projects. It will focus on identifying stakeholders in citizen science, understanding their data curation needs and awareness, and how information scientists can help meet those needs. The study overlays potential citizen science stakeholders on the DataONE data lifecycle model to explore their roles and concerns at different stages of data management.
Researching Social Media – Big Data and Social Media Analysis (Farida Vis)
Researching Social Media – Big Data and Social Media Analysis, presentation for the Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
Critically Assembling Data, Processes & Things: Toward an Open Smart City (Cottbus Brandenburg University of Technology, Lecture Series on Smart Regions, June 5, 2018)
This lecture will critically focus on smart cities from a data based socio-technological assemblage approach. It is a theoretical and methodological framework that allows for an empirical examination of how smart cities are socially and technically constructed, and to study them as discursive regimes and as a large technological infrastructural systems.
The lecture will refer to the research outcomes of the ERC funded Programmable City Project led by Rob Kitchin at Maynooth University and will feature examples of empirical research conducted in Dublin and other Irish cities.
In addition, the lecture will discuss the research outcomes of the Canadian Open Smart Cities project funded by the Government of Canada GeoConnections Program. Examples will be drawn from five case studies namely about the cities of Edmonton, Guelph, Ottawa and Montreal, and the Ontario Smart Grid as well as number of international best practices. The recent Infrastructure Canada Canadian Smart City Challenge and the controversial Sidewalk Lab Waterfront Toronto project will also be discussed.
It will be argued that no two smart cities are alike, although technological solutionist and networked urbanist approaches dominate, and it is suggested that these kinds of smart cities may not live up to the promise of being better places to live.
In this lecture, the ideals of an Open Smart City are offered instead and in this kind of city residents, civil society, academics, and the private sector collaborate with public officials to mobilize data and technologies when warranted in an ethical, accountable and transparent way in order to govern the city as a fair, viable and livable commons that balances economic development, social progress and environmental responsibility. Although an Open Smart City does not yet exist, it will be argued that it is possible.
IRJET- Identification of Prevalent News from Twitter and Traditional Media us... (IRJET Journal)
This document describes a study that uses community detection models to identify prevalent news topics discussed on both Twitter and traditional media like BBC. It collects tweets and news articles about sports over a one-month period. Keywords are extracted from the data and a graph is constructed to represent relationships between words. Three community detection models - Girvan-Newman clustering, CLIQUE, and Louvain - are used to cluster similar content and detect communities of keywords representing news topics. The number of unique Twitter users engaged with each topic is also calculated to rank topics by user attention. The goal is to analyze how information is distributed between social and traditional media and identify emerging topics with low coverage in traditional sources.
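The keyword graph this entry describes can be built from word co-occurrence within posts. The sketch below uses connected components as a much simpler stand-in for the Girvan-Newman, CLIQUE, and Louvain models named in the study; the sample posts are invented.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(texts):
    """Add an edge between two keywords whenever they appear in the same post."""
    adj = defaultdict(set)
    for text in texts:
        words = set(text.lower().split())
        for a, b in combinations(sorted(words), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def components(adj):
    """Connected components: a crude stand-in for community detection."""
    seen, comms = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comm = [start], set()
        while stack:
            node = stack.pop()
            if node in comm:
                continue
            comm.add(node)
            stack.extend(adj[node] - comm)
        seen |= comm
        comms.append(comm)
    return comms

posts = ["liverpool wins derby", "derby goal liverpool", "tennis final serve"]
communities = components(cooccurrence_graph(posts))
# two communities: the football keywords and the tennis keywords
```

Proper community detection additionally splits densely connected components into sub-communities, which is what the three models in the study do.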
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF... (IJCSEA Journal)
International Journal of Computer Science, Engineering and Applications (IJCSEA) (IJCSEA Journal)
International Journal of Computer Science, Engineering and Applications (IJCSEA) is an open-access, peer-reviewed journal that publishes articles contributing new results in all areas of computer science, engineering and applications. The journal is devoted to the publication of high-quality papers on theoretical and practical aspects of these fields.
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...IJCSEA Journal
In this digital era, social media is an important tool for information dissemination, and Twitter is a popular social media platform. Social media analytics helps make informed decisions based on people's needs and opinions. This information, when properly interpreted, provides valuable insights into different domains, such as public policymaking, marketing, sales, and healthcare. Topic modeling is an unsupervised algorithm for discovering hidden patterns in text documents. In this study, we explore the Latent Dirichlet Allocation (LDA) topic model algorithm. We collected tweets with hashtags related to coronavirus discussions. The study compares regular LDA with LDA based on collapsed Gibbs sampling (LDAMallet), using different data processing steps: with and without trigrams, and with and without hashtags. It provides a comprehensive analysis of LDA for short text messages using un-pooled and pooled tweets. The results suggest that a pooling scheme based on hashtags improves topic inference, yielding a better coherence score.
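The hashtag-pooling scheme the abstract credits with better coherence can be illustrated with a short sketch: tweets that share a hashtag are merged into one pseudo-document before topic modeling, giving LDA longer documents to estimate topics from. This is a minimal, hypothetical illustration (the tokenization is deliberately naive); the pooled token lists would then be fed to any LDA implementation, such as gensim or MALLET.

```python
from collections import defaultdict

def pool_tweets_by_hashtag(tweets):
    """Merge tweets that share a hashtag into one pseudo-document.

    Tweets without any hashtag stay as individual documents, matching
    the un-pooled baseline. Returns token lists ready for LDA.
    """
    pools = defaultdict(list)  # hashtag -> accumulated word tokens
    unpooled = []
    for text in tweets:
        tokens = text.lower().split()
        tags = [t for t in tokens if t.startswith("#")]
        words = [t for t in tokens if not t.startswith("#")]
        if tags:
            for tag in tags:
                pools[tag].extend(words)
        else:
            unpooled.append(words)
    return list(pools.values()) + unpooled

docs = pool_tweets_by_hashtag([
    "#covid19 vaccine rollout starts today",
    "#covid19 new cases reported in the city",
    "nice weather this afternoon",
])
# The two #covid19 tweets collapse into one longer pseudo-document.
```

Un-pooled tweets are often too short for LDA to infer coherent topics; pooling trades document granularity for richer word co-occurrence statistics.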
This document summarizes several research papers that used social network analysis on Twitter data related to COVID-19. The papers analyzed hashtags, retweets, mentions and conversations to understand public debates and information spread about topics like conspiracy theories, medical news, and public responses in different countries. The studies identified influential users, common discussions, and how social media could provide insights into managing pandemic situations.
Information Contagion through Social Media: Towards a Realistic Model of the ...Axel Bruns
Paper by Axel Bruns, Patrik Wikström, Peta Mitchell, Brenda Moon, Felix Münch, Lucia Falzon, and Lucy Resnyansky presented at the ACSPRI 2016 conference, Sydney, 19-22 July 2016.
A framework for real time semantic social media analysis Zelia Blaga
This document presents a framework for real-time semantic analysis of social media content, specifically focused on UK political tweets. The framework collects tweets using the Twitter API, processes them using natural language processing tools to extract entities, topics, and sentiment, and indexes the output using GATE Mimir for semantic search and visualization. It was tested on over 1.8 million tweets related to the 2015 UK general election, extracting useful political insights on discussed topics, sentiment toward parties, and how engagement varied by issue. The framework enables complex queries over annotated text and correlates data to provide summaries and predictive analytics of social media discussions.
Combating propaganda texts using transfer learningIAESIJAI
Recently, it has been observed that people are shifting away from traditional news media sources towards trusting social networks to gather news information. Social networks have become the primary news source, although the validity and reliability of the information provided are uncertain. Memes are a crucial content type, very popular among young people, and play a vital role in social media; they spread quickly among people in a peer-to-peer manner rather than through prescribed channels. Unfortunately, promoters and propagandists have adopted memes to indirectly manipulate public opinion and influence attitudes using psychological and rhetorical techniques. This type of content can lead to unpleasant consequences in communities. This paper introduces an ensemble model system that addresses one of the most recent natural language processing research topics: detection of propaganda techniques in texts extracted from memes. The paper also explores state-of-the-art pre-trained language models. The proposed model uses different optimization techniques, such as data augmentation and model ensembling. It has been evaluated using the reference dataset from SemEval-2021 Task 6. Our system outperforms the baseline and state-of-the-art results, achieving an F1-micro score of 0.604 on the test set.
Use of ICT in Research Writing: Tools and Technologyssuser1310d0
This document outlines the role of information and communication technologies (ICT) in research. It discusses ICT factors affecting research, challenges, and skills needed by researchers. The seminar aims to help researchers understand how to locate, access, and evaluate information, and identify relevant tools to support research work. ICT consists of hardware, software, networks, and media that can change the research process. Researchers must develop skills to manage materials, analyze data, communicate, write academically, and disseminate knowledge using ICT tools.
This document provides a review of techniques, tools, and platforms for analyzing social media data. It discusses the types of social media data and formats available, as well as tools for accessing, cleaning, analyzing, and visualizing social media data. Some key challenges of social media research are the restricted access to comprehensive data sources, lack of tools for in-depth analysis without programming, and need for large data storage and computing facilities to support research at scale. The document provides a methodology and critique of current approaches and outlines requirements to better support social media research.
understanding the pandemic through mining covid news using natural language p...Kishor Datta Gupta
This document summarizes a research presentation on analyzing Covid-19 news reports from newspapers in developed and developing countries using natural language processing. It introduces the research aim to understand how newspapers portray the pandemic using NLP techniques on reports from the US and Bangladesh. The researchers collected over 1000 news articles to create the NNK Dataset, which they preprocessed and analyzed to extract keywords, sentiments, and case numbers. Word clouds of frequent terms and numeric extractions showed how coverage evolved over time. The dataset was made publicly available to encourage further analysis of portraying pandemics through newspapers.
Geographic knowledge discovery (PhD Theme) by Roberto Zagal
1. Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City
Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
Christophe Claramunt, Naval Academy Research Institute
5. Introduction (4)
What about inconsistency?
Id | Type | Description
1 | Tweet, newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #new
6. Related work
• The social data problem has been addressed through:
1. KDD and social mining
2. Formal publications (news media) guiding the classification of the interests of social media users [1]
3. Opinion mining and topic modeling [2]
• But not using GKD with an approach of crossing data layers
7. Goal
Know how to discover the certainty level of information by crossing geographic and social information.
9. Data extraction: Sample tweet (Phase 1)
Id | Type | Description
1 | Tweet, newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
We consider tweets from accounts that periodically report data on air pollution.
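The data-cleaning pass that closes this phase (tokenization, stop-word removal, stemming, per the speaker notes) can be sketched roughly as follows; the stop-word list and suffix rules are toy stand-ins, not the ones used in the work.

```python
import re

STOP_WORDS = {"the", "of", "is", "in", "a", "an", "and"}  # toy list

def clean_tweet(text):
    """Tokenize, drop stop words, and crudely stem a tweet.

    Hashtags and mentions are kept intact so later phases (domain
    detection, location extraction) can still use them.
    """
    tokens = re.findall(r"[#@]?\w+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    stemmed = []
    for t in tokens:
        if not t.startswith(("#", "@")):
            # Strip one common suffix, longest first, as a crude stemmer.
            for suffix in ("ation", "ing", "es", "s"):
                if t.endswith(suffix) and len(t) > len(suffix) + 2:
                    t = t[: -len(suffix)]
                    break
        stemmed.append(t)
    return stemmed

print(clean_tweet("The index of IMECAS is 135 #CDMX"))
# -> ['index', 'imeca', '135', '#cdmx']
```

In practice a library stemmer (e.g. Snowball) would replace the suffix loop; the point here is only the shape of the cleaning step.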
10. Data extraction: Domain Detection (Phase 1)
Id | Type | Description
2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
The post is related to a pollution topic.
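A rough sketch of how this synonym-and-subclass matching could work; the SYNONYMS table and PARENT hierarchy below are invented stand-ins for the pollution ontology the deck refers to.

```python
# Toy stand-in for the pollution ontology: each class lists the surface
# terms (synonyms included) that evidence it; PARENT encodes the class
# hierarchy used to generalize a match upward.
SYNONYMS = {
    "pollution": {"pollution", "contamination", "#contamination", "smog"},
    "IMECAS": {"imecas", "imeca"},
}
PARENT = {"IMECAS": "IndexOfAirQuality", "IndexOfAirQuality": "pollution"}

def detect_domain(tokens):
    """Return the set of ontology classes a tweet's tokens match."""
    matched = set()
    for cls, terms in SYNONYMS.items():
        if any(t.lower() in terms for t in tokens):
            matched.add(cls)
            # A match on a subclass also evidences its ancestors,
            # e.g. IMECAS -> IndexOfAirQuality -> pollution.
            p = cls
            while p in PARENT:
                p = PARENT[p]
                matched.add(p)
    return matched

tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #new".split()
print(detect_domain(tweet))  # the tweet is evidenced as pollution-related
```

Matching "contamination" to the "pollution" class by synonymy and "IMECAS" to a subclass of "IndexOfAirQuality" is exactly the walk the slide describes; a real system would back this with the full ontology rather than two hand-written dictionaries.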
11. Preprocessing (Phase 2)
• Emotion detection [3]
• Location extraction
Id | Type | Description
2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
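Phase 2 could be sketched along these lines; the polarity word lists and the place gazetteer are hypothetical, and real emotion detection (reference [3]) and location extraction would be considerably richer.

```python
# Toy polarity lexicons and place gazetteer; a real Phase 2 would use
# the emotion-detection method of reference [3] and proper geocoding.
NEGATIVE = {"#bad", "bad", ":(", "#alert"}
POSITIVE = {"#good", "good", ":)"}
KNOWN_PLACES = {"#cdmx", "#taxqueña", "#indiosverdes"}

def preprocess(tokens):
    """Attach a polarity label and location candidates to a tweet."""
    score = sum((t.lower() in POSITIVE) - (t.lower() in NEGATIVE) for t in tokens)
    polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    places = [t for t in tokens if t.lower() in KNOWN_PLACES]
    return {"polarity": polarity, "locations": places}

result = preprocess("@ #contamination air is 127 IMECAS #CDMX #bad #new".split())
# result["polarity"] is "negative" (driven by #bad); "#CDMX" is a location candidate.
```

When the location list comes back empty, only the publication time is carried into the later phases, as the speaker notes point out.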
12. Classification: C5 algorithm (Phase 3)
• If we detect to which category each set of data belongs (e.g., Health and Pollution, Transport and Pollution), then we can select which data sources should be crossed with the tweet in order to discover knowledge.
Id | Description | Category
2 | @ #contamination air is 127 IMECAS #CDMX #bad #new | Health and pollution
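C5.0 is a learned decision-tree algorithm (commonly run through its R package); as a stand-in, the sketch below hand-writes the kind of rule tree that Phase 3 might learn over bag-of-words features. The feature words and leaf categories are illustrative, not the trained model.

```python
# Hand-written stand-in for a decision tree that C5.0 might induce from
# labeled tweets in Phase 3; every feature word below is illustrative.
def classify_category(tokens):
    """Assign a pollution-related tweet to a more specific category."""
    has = {t.lower().lstrip("#@") for t in tokens}
    if has & {"imecas", "imeca", "contamination", "air"}:
        if has & {"health", "hospital", "respiratory", "bad"}:
            return "Health and pollution"
        if has & {"traffic", "metro", "transport"}:
            return "Transport and pollution"
        return "Health and pollution"  # default leaf for air-quality reports
    return "Pollution (generic)"

tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #new".split()
print(classify_category(tweet))  # Health and pollution, as on the slide
```

The category returned here is what Phase 4 uses to pick which official data sources to cross with the tweet.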
13. Crossing data (Phase 4)
• Example 1: inconsistencies in tweets 1 and 2?
Id | Type | Description
1 | Tweet, Newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX
What is correct?
14. Crossing data (Phase 4): How do we know which tweet is correct?
Answer:
It was classified in the domain of Health and pollution (in Phase 3).
Then the official data from health reports and pollution reports are selected to be crossed with the tweet (in Phase 4).
28/10/16
15. Crossing data (Phase 4)
• Data are crossed considering different attributes; from the tweet, the date and hour of publication are taken.
• When crossed with the date and hour from the official reports of air quality, a match is found.
16. Crossing data (Phase 4)
We discovered that the tweets are correct but with different locations (the location is not included in the original tweet).
Id | Type | Description
1 | Tweet, newspaper1 | The index of IMECAS is 135 #CDMX #Taxqueña, 10:00 hours
2 | Tweet, Newspaper2 | The #contaminación of air is 127 IMECAS #CDMX #Indios Verdes, 15:00 hours
Knowledge Discovered!
17. Other preliminary results
• Following the same approach
• Knowledge discovered: which topics are talked about in each region
Topic | Geographic area | Period
Health | South, West | March-June
Transport | North, East | January-December
Policy and programs | Center | January-December
Pollution | Surrounding Mexico City | January-June
Public roads | Surrounding Mexico City | January-December
18. Conclusions and future work
• The integration of the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
• The main contribution is that domain discovery and classification of information is a key element of new approaches for discovering geographic information.
19. Conclusions and future work
• Future work:
• Use of clustering or deep learning approaches to improve the classification process
• Location detection is a hard problem; other machine learning methods for social media can be tested [4, 5]
• How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
21. References
[1] Jonghyun Han, Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358-359, 1 September 2016, Pages 112-128, ISSN 0020-0255.
[2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). SigniTrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 871-880). ACM.
[3] Architecture for analysis of feelings in Facebook with semantic approach (in Spanish), pp. 59-69; rec. 2014-06-22; acc. 2014-07-21. Research in Computing Science 75 (2014). http://www.rcs.cic.ipn.mx/rcs/2014_75/
[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19-25. DOI: http://dx.doi.org/10.1145/2876480.2876485
[5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851-860. ACM, 2010.
Editor's Notes
SLIDE 1:
1.- Good morning.
2.- My name is Roberto. I'm a PhD student at the National Polytechnic Institute in Mexico City.
3.- Thanks for the invitation to be here today.
4.- I'm talking about "Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City".
5.- This research is advised by Dr. Felix Mata and Dr. Christophe Claramunt.
7.- In recent years, air pollution in Mexico City has increased considerably.
8.- Air pollution is a problem that requires analysis across multiple domains of knowledge, because we now have more information spread over increasingly complex data sources.
SLIDE 2:
Currently, social networks are becoming increasingly relevant as a means of diffusion and sharing of citizen views.
In order to discover new knowledge on air pollution, we need to consider data from different sources, such as government, social groups, social media, and other web data.
In social media, people make comments and observations that might reflect important views on different topics related to air pollution.
SLIDE 3:
1.- We reviewed three representative and heterogeneous data sources:
2.- The Government of Mexico City, because it generates information in traditional databases about pollution. This information is trustworthy.
3.- News media, an important element because it provides a valuable source for deriving on-the-fly citizen opinions.
4.- For example, people in social networks express complaints, opinions, reports of problems, and observations regarding air pollution.
5.- We consider social networks an instantaneous picture of the social perception of air pollution.
6.- Now, the question is: how can we cross this information to discover new, trustworthy knowledge about pollution?
SLIDE 4:
1. Information produced by institutions has a degree of certainty and veracity; it is assumed to be true.
2. But.
3. Can all information produced in social networks be trusted?
4.- What is the level of certainty of the information produced in social networks relative to other sources?
5.- This is the problem statement of this preliminary investigation.
SLIDE 5:
1. The information sometimes needs to be verified to know whether it is correct or not.
2. For example:
3. We have an inconsistency in the following two tweets about air quality.
4. IMECAS is the acronym of the Metropolitan Index of Air Quality in Mexico City.
5. In tweet 1, the newspaper reports that the IMECAS index is one hundred thirty-five (135).
6. In tweet 2, the newspaper reports that the IMECAS index is one hundred twenty-seven (127).
7. Which one has the correct information?
8. How can we detect and resolve the inconsistency in the information?
SLIDE 6:
1.- These papers do not have an explicit relation with the geographic dimension.
2.- And they do not explore the certainty of information.
SLIDE 7:
1. This means that we can discover the level of certainty of the publications that appear in social media
2. by crossing these data with additional formal sources.
4. The geographic information can be used as a link between different data sources.
SLIDE 8:
1.- We propose a GKD framework for air pollution that includes four phases:
2.- Data extraction: oriented to getting information from social sources and newspapers.
3.- The preprocessing phase: includes location and sentiment detection.
4.- The classification phase categorizes the data into specific topics.
5.- Crossing data: helps to detect the level of information certainty.
SLIDE 9:
1.- For extraction, we consider tweets from accounts that periodically report data on air pollution, for example digital newspapers of Mexico.
2.- Extraction continues using initial key phrases and hashtags, like #CDMX or #AirPollution.
4.- Afterwards, data cleaning is performed; it includes tokenization, removal of stop words, and stemming.
SLIDE 10:
1. Domain detection semantically pre-classifies tweets into a pollution category, for example:
2. In tweet 2, the term "contamination" matches the "pollution" class by synonymy.
3. Next, the word IMECAS matches the class "IMECAS", which is a subclass of "IndexOfAirQuality".
4. We can say that the post is related to a pollution topic, which is a generic class.
5. It is possible that the tweet belongs to a more specific category that describes the nature of the post.
SLIDE 11:
1. In this part, we detect whether the post expresses a positive or negative feeling through words or emoticons. This detection is useful for identifying trends in the social perception of a specific pollution topic, for example tweets positive toward politics and pollution.
2. Regarding the location of the tweet, we assume that each tweet contains metadata about its place and time of publication.
3. Sometimes a tweet does not contain explicit or implicit information that allows defining its location. In this case, only the time of publication is considered in the following phases.
SLIDE 12:
1. If we detect to which specific category each set of data belongs, we can select the data sources that should be crossed with the tweet, in order to discover new knowledge and certainty.
2. Tweet 2 is classified into a more specific category: health and pollution.
3. We chose C5 because it is one of the algorithms that has shown good performance in knowledge discovery in databases.
Slide 13:
At this stage, quantitative values and qualitative values are separated.
1) Using the ontology, we can identify and separate terms like IMECAS, Air, and Pollution.
2) The numerical IMECA value is separated out.
3) Now, we know that this value must be in the range from 0 to 201, according to the definition of the IMECA index. If this holds, we can say that we have found a valid air quality value.
4) It is possible that this approach does not work in some cases.
5) These tweets do not contain information about their location, but we consider the time of publication.
6) Using the IMECA value and the time of the tweet, we proceed to search for matches in government data sources on air quality.
Slide 14:
1. Through the categorization of the tweet, we know that we can cross information with the air quality database, because the tweet is related to pollution and public health topics.
SLIDE 15:
1.- The air quality data is provided by the environmental monitoring ministry of the CDMX government.
SLIDE 16:
1. The tweet has no location, but using its time component,
2. we find a match in the official data using the IMECA value.
3. Then, the official data helps us to discover the tweet's location.
SLIDE 17:
1. In these additional results, we can see the classification of tweets by topic and location.
2. These results show the trend of social perception in certain subjects and geographical areas.
Slide 18:
1.- The integration of dimensions.
2.- The domain discovery.