Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City
Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
Christophe Claramunt, Naval Academy Research Institute
Introduction (1)
• Traditionally, pollution data has been produced by institutions, governments, and vendors
• But now pollution data is produced by individuals, too
Introduction (2)
Information about pollution is expressed in different ways by:
− Government
− News media
− People in social networks
Introduction (3)
But…
What about the certainty of this information?
Introduction (4)
What about ... inconsistency?
Id | Type | Description
1 | Tweet, newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
Related work
• The social data problem has been addressed through:
1. KDD and social mining
2. Formal publications (news media) that guide the classification of the interests of social media users [1]
3. Opinion mining and topic modeling [2]
But not with a GKD approach that crosses data layers
Goal
Know how to discover the certainty level of information by crossing geographic and social information
Solution proposed: GKD Framework for Air Pollution Data
(Framework diagram: Phase 1, Phase 2, Phase 3)
Data extraction: Sample tweet (Phase 1)
Id | Type | Description
1 | Tweet, newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
We consider tweets from accounts that periodically report air pollution data
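As a rough illustration of the extraction step, the sketch below pulls the reported index value and the hashtags out of a tweet like the samples shown. The slides do not say how extraction is implemented; the `PollutionReading` type, the `parse_tweet` function, and the regular expressions are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PollutionReading:
    """Hypothetical record for one extracted tweet (field names are illustrative)."""
    account: str
    imeca: Optional[int]
    hashtags: List[str]


# Assumed pattern: the value either follows the word ("IMECAS is 135")
# or precedes it ("127 IMECAS").
_IMECA_RE = re.compile(r"IMECAS?\D{0,10}?(\d{1,3})|(\d{1,3})\s*IMECAS?", re.IGNORECASE)
_HASHTAG_RE = re.compile(r"#(\w+)")


def parse_tweet(account: str, text: str) -> PollutionReading:
    """Extract the index value (if any) and the hashtags from the tweet text."""
    match = _IMECA_RE.search(text)
    value = int(match.group(1) or match.group(2)) if match else None
    return PollutionReading(account, value, _HASHTAG_RE.findall(text))
```

For instance, `parse_tweet("newspaper1", "The index of IMECAS is 135 #CDMX")` yields a reading whose `imeca` field is 135 and whose only hashtag is `CDMX`.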
Data extraction: Domain Detection (Phase 1)
Id | Type | Description
2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #news
The post is related to a pollution topic
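The slides do not describe how the domain is detected. One simple possibility, shown here only as an assumption, is a keyword check against a small pollution vocabulary; the term list and the `is_pollution_post` name are hypothetical.

```python
import re

# Assumed vocabulary; a real system would likely use a much richer resource.
POLLUTION_TERMS = {"contamination", "contaminación", "pollution", "imeca", "imecas"}


def is_pollution_post(text: str) -> bool:
    """Return True when at least one token of the post is a pollution term."""
    tokens = {token.lower() for token in re.findall(r"\w+", text)}
    return not POLLUTION_TERMS.isdisjoint(tokens)
```

With this vocabulary the sample tweet is flagged as pollution-related, while a post that only mentions traffic is not.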
Preprocessing (Phase 2)
• Emotion detection [3]
• Location extraction
Id | Type | Description
2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #news
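A minimal sketch of the two preprocessing steps, assuming both can be approximated from hashtags alone. The emotion method of [3] is not reproduced here; the tiny lexicons, the gazetteer, and the `preprocess` function are hypothetical stand-ins.

```python
import re
from typing import Optional, Tuple

# Assumed toy resources, for illustration only.
GAZETTEER = {"cdmx": "Mexico City", "taxqueña": "Taxqueña", "indiosverdes": "Indios Verdes"}
NEGATIVE_TAGS = {"bad"}
POSITIVE_TAGS = {"good"}


def preprocess(text: str) -> Tuple[str, Optional[str]]:
    """Return (emotion, location) derived from the hashtags of a post."""
    tags = [tag.lower() for tag in re.findall(r"#(\w+)", text)]
    if any(tag in NEGATIVE_TAGS for tag in tags):
        emotion = "negative"
    elif any(tag in POSITIVE_TAGS for tag in tags):
        emotion = "positive"
    else:
        emotion = "neutral"
    places = [GAZETTEER[tag] for tag in tags if tag in GAZETTEER]
    # Prefer a place more specific than the city as a whole, when one is present.
    specific = [place for place in places if place != "Mexico City"]
    location = (specific or places or [None])[0]
    return emotion, location
```

Applied to the sample tweet, this returns a negative emotion (from `#bad`) and only the city-level location, which is consistent with the later observation that the original tweets lack a precise location.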
Classification C5 algorithm (Phase 3)
• If we detect which category each set of data belongs to:
• Health and Pollution, Transport and Pollution
Then we can select which data sources should be crossed with the tweet in order to discover knowledge
Id | Description | Category
2 | @ #contamination air is 127 IMECAS #CDMX #bad #news | Health and pollution
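The sketch below is not the C5 algorithm itself. It only illustrates entropy-based information gain, the quantity that decision-tree learners of the ID3/C4.5 family build their split criteria on, using a hypothetical toy training set of hashtag sets labeled with the two categories named above.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

Sample = Tuple[Set[str], str]  # (terms in the post, category)


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(samples: List[Sample], term: str) -> float:
    """Entropy reduction obtained by splitting on presence of `term`."""
    labels = [label for _, label in samples]
    present = [label for terms, label in samples if term in terms]
    absent = [label for terms, label in samples if term not in terms]
    remainder = sum(
        len(part) / len(samples) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def best_split(samples: List[Sample]) -> str:
    """Term with the highest information gain (ties broken alphabetically)."""
    terms = sorted(set().union(*(terms for terms, _ in samples)))
    return max(terms, key=lambda term: information_gain(samples, term))


# Hypothetical toy data, invented for this example.
TOY: List[Sample] = [
    ({"contamination", "imecas", "bad"}, "Health and pollution"),
    ({"imecas", "asthma"}, "Health and pollution"),
    ({"traffic", "metro"}, "Transport and pollution"),
    ({"traffic", "imecas"}, "Transport and pollution"),
]
```

On this toy set, the term `traffic` separates the two categories perfectly, so it would be chosen as the first split.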
Crossing data (Phase 4)
• Example 1:
• Inconsistencies in tweets 1 and 2?
Id | Type | Description
1 | Tweet, Newspaper1 | The index of IMECAS is 135 #CDMX
2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX
Which is correct?
How can we know which tweet is correct?
Answer:
It was classified in the domain of Health and pollution (in Phase 3)
Then the official data from health reports and pollution reports are selected to be crossed with the tweet (in Phase 4)
28/10/16
Crossing data (Phase 4)
• Data are crossed on several attributes; the date and hour of publication are taken from the tweet
• When these are crossed with the date and hour from official air quality reports, a match is found
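The crossing step can be sketched as a lookup keyed on date and hour. The report records below are hypothetical (the date and the non-matching values are invented for illustration), and `cross_with_official` is an assumed name, not the authors' implementation.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical official air quality reports: (date and hour, station, index value).
OFFICIAL_REPORTS: List[Tuple[datetime, str, int]] = [
    (datetime(2016, 1, 1, 10), "Taxqueña", 135),
    (datetime(2016, 1, 1, 10), "Indios Verdes", 120),
    (datetime(2016, 1, 1, 15), "Taxqueña", 140),
    (datetime(2016, 1, 1, 15), "Indios Verdes", 127),
]


def cross_with_official(published: datetime, imeca: int) -> Optional[str]:
    """Return the station whose report for the same date and hour has the same value."""
    hour = published.replace(minute=0, second=0, microsecond=0)
    for timestamp, station, value in OFFICIAL_REPORTS:
        if timestamp == hour and value == imeca:
            return station
    return None
```

A tweet published at 15:20 reporting 127 would match the Indios Verdes report in this toy data, while the same value at 10:00 would match nothing.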
We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweet)
Id | Type | Description | Location | Time
1 | Tweet, newspaper1 | The index of IMECAS is in 135 #CDMX | #Taxqueña | 10:00 hours
2 | Tweet, Newspaper2 | The #contaminación of air is in 127 IMECAS #CDMX | #Indios Verdes | 15:00 hours
Knowledge Discovered!
Other preliminary results
• Following the same approach
• Knowledge discovered: which topics are discussed in each region
Topic | Geographic | Period
Health | South, West | March-June
Transport | North, East | January-December
Policy and programs | Center | January-December
Pollution | Surrounding Mexico City | January-June
Public roads | Surrounding Mexico City | January-December
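One way such a topic-by-region summary could be produced, assuming each post has already been classified and assigned a region, is a simple grouped count. The `topics_by_region` function and its input shape are assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def topics_by_region(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each region to its most frequent topic, given (region, topic) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region, topic in records:
        counts[region][topic] += 1
    return {region: counter.most_common(1)[0][0] for region, counter in counts.items()}
```

Given two Health posts and one Transport post from the South, the function reports Health as the dominant topic for that region.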
Conclusions and Future work
• Integrating the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
• The main contribution is domain discovery; classification of information is a key element of new approaches to discovering geographic information.
Conclusions and future work
• Future work
• Use clustering or deep learning approaches to improve the classification process
• Location detection is a hard problem. Other machine learning methods for social media could be tested [4, 5]
• How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
Many Thanks!
Questions?
Roberto Zagal
zagalmmx@gmail.com
IPN, México
References
[1] Jonghyun Han, Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358–359, 1 September 2016, Pages 112-128, ISSN 0020-0255.
[2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). Signitrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 871-880). ACM.
[3] … architecture for analysis of feelings in Facebook with semantic approach (Spanish). Research in Computing Science 75 (2014), pp. 59–69. http://www.rcs.cic.ipn.mx/rcs/2014_75/
[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19-25. DOI=http://dx.doi.org/10.1145/2876480.2876485
[5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.
28/10/16

More Related Content

What's hot

ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
IJCSEA Journal
 
No misunderstandings during Earthquakes
No misunderstandings during EarthquakesNo misunderstandings during Earthquakes
No misunderstandings during Earthquakes
ISCRAM 2015
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Sameera Horawalavithana
 
Research @ the Social Media Lab (@SMLabTO)
Research @ the Social Media Lab (@SMLabTO)Research @ the Social Media Lab (@SMLabTO)
Research @ the Social Media Lab (@SMLabTO)
Philip Mai
 
A method to evaluate the reliability of social media data for social network ...
A method to evaluate the reliability of social media data for social network ...A method to evaluate the reliability of social media data for social network ...
A method to evaluate the reliability of social media data for social network ...
Derek Weber
 
Information propagation in a social network site
Information propagation in a social network siteInformation propagation in a social network site
Information propagation in a social network site
Matteo Magnani
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
Marco Brambilla
 
Big data analysis of news and social media content
Big data analysis of news and social media contentBig data analysis of news and social media content
Big data analysis of news and social media content
Firas Husseini
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
Monisha100
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
Dr. Amarjeet Singh
 
Analysis Tweets Korea Politicians(25 Sep2009)Sj
Analysis Tweets Korea Politicians(25 Sep2009)SjAnalysis Tweets Korea Politicians(25 Sep2009)Sj
Analysis Tweets Korea Politicians(25 Sep2009)Sj
WCU Webometrics Institute
 
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERINGCATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
ijaia
 
Useful by Piet Daas
Useful by Piet DaasUseful by Piet Daas
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
cscpconf
 
Isi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and biasIsi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and bias
Piet J.H. Daas
 
Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...
Mirjam-Mona
 
Rapid identification of new drugs through online monitoring tools: The case o...
Rapid identification of new drugs through online monitoring tools: The case o...Rapid identification of new drugs through online monitoring tools: The case o...
Rapid identification of new drugs through online monitoring tools: The case o...
Australian Drug Foundation
 
On How the Darknet and its Access to SCADA is a Threat to National Critical I...
On How the Darknet and its Access to SCADA is a Threat to National Critical I...On How the Darknet and its Access to SCADA is a Threat to National Critical I...
On How the Darknet and its Access to SCADA is a Threat to National Critical I...
Matthew Kurnava
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learning
ijtsrd
 

What's hot (19)

ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
No misunderstandings during Earthquakes
No misunderstandings during EarthquakesNo misunderstandings during Earthquakes
No misunderstandings during Earthquakes
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
 
Research @ the Social Media Lab (@SMLabTO)
Research @ the Social Media Lab (@SMLabTO)Research @ the Social Media Lab (@SMLabTO)
Research @ the Social Media Lab (@SMLabTO)
 
A method to evaluate the reliability of social media data for social network ...
A method to evaluate the reliability of social media data for social network ...A method to evaluate the reliability of social media data for social network ...
A method to evaluate the reliability of social media data for social network ...
 
Information propagation in a social network site
Information propagation in a social network siteInformation propagation in a social network site
Information propagation in a social network site
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
 
Big data analysis of news and social media content
Big data analysis of news and social media contentBig data analysis of news and social media content
Big data analysis of news and social media content
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
 
Analysis Tweets Korea Politicians(25 Sep2009)Sj
Analysis Tweets Korea Politicians(25 Sep2009)SjAnalysis Tweets Korea Politicians(25 Sep2009)Sj
Analysis Tweets Korea Politicians(25 Sep2009)Sj
 
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERINGCATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
 
Useful by Piet Daas
Useful by Piet DaasUseful by Piet Daas
Useful by Piet Daas
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
 
Isi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and biasIsi 2017 presentation on Big Data and bias
Isi 2017 presentation on Big Data and bias
 
Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...Dealing with Information Overload When Using Social Media for Emergency Manag...
Dealing with Information Overload When Using Social Media for Emergency Manag...
 
Rapid identification of new drugs through online monitoring tools: The case o...
Rapid identification of new drugs through online monitoring tools: The case o...Rapid identification of new drugs through online monitoring tools: The case o...
Rapid identification of new drugs through online monitoring tools: The case o...
 
On How the Darknet and its Access to SCADA is a Threat to National Critical I...
On How the Darknet and its Access to SCADA is a Threat to National Critical I...On How the Darknet and its Access to SCADA is a Threat to National Critical I...
On How the Darknet and its Access to SCADA is a Threat to National Critical I...
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learning
 

Viewers also liked

Research Methodology
Research MethodologyResearch Methodology
Research Methodology
sh_neha252
 
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
Civic Exchange
 
Svs R
Svs RSvs R
Svs R
pollution1
 
AIR AND NOISE POLLUTION RESEARCH
AIR AND NOISE POLLUTION RESEARCHAIR AND NOISE POLLUTION RESEARCH
AIR AND NOISE POLLUTION RESEARCH
kl university
 
Air Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
Air Pollution, Asthma, Triggers & Health - Research and Remediation StrategiesAir Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
Air Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
Sean McCormick
 
Air pollutionand its effects and causes
Air pollutionand its effects and causesAir pollutionand its effects and causes
Air pollutionand its effects and causes
SRINIVASULU N V
 
Academic Stress For Management Students
Academic Stress For Management StudentsAcademic Stress For Management Students
Academic Stress For Management Students
Latha setna
 
Impact of academic stress on students
Impact of academic stress on studentsImpact of academic stress on students
Impact of academic stress on students
Karaikudi Institute of Management
 
AIR POLLUTION MONITORING USING RS
AIR POLLUTION MONITORING USING RSAIR POLLUTION MONITORING USING RS
AIR POLLUTION MONITORING USING RS
Abhiram Kanigolla
 
Air Pollution
Air PollutionAir Pollution
Air Pollution.Ppt
Air Pollution.PptAir Pollution.Ppt
Air Pollution.Ppt
guestc24e66e9
 
Air pollution
Air pollutionAir pollution
Air pollution
Sesham Akhila
 
Research Methodology Lecture for Master & Phd Students
Research Methodology  Lecture for Master & Phd StudentsResearch Methodology  Lecture for Master & Phd Students
Research Methodology Lecture for Master & Phd Students
SHAYA'A OTHMAN MANAGEMENT & RESEARCH METHODOLOGY
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
Babasab Patil
 
Air pollution: its causes,effects and pollutants
Air pollution: its causes,effects and pollutantsAir pollution: its causes,effects and pollutants
Air pollution: its causes,effects and pollutants
Maliha Eesha
 
A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
 A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS  A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
Natrah Abd Rahman
 
Pollution.Ppt
Pollution.PptPollution.Ppt
Pollution.Ppt
SVS
 
Pollution its types, causes and effects by naveed.m
Pollution its types, causes and effects by naveed.mPollution its types, causes and effects by naveed.m
Pollution its types, causes and effects by naveed.m
Naveed Abbas Malik
 
Air pollution final.ppt
Air pollution final.pptAir pollution final.ppt
Air pollution final.ppt
Aeb Yam Durante
 

Viewers also liked (19)

Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
Civic Exchange - 2009 The Air We Breathe Conference - Application of Studies ...
 
Svs R
Svs RSvs R
Svs R
 
AIR AND NOISE POLLUTION RESEARCH
AIR AND NOISE POLLUTION RESEARCHAIR AND NOISE POLLUTION RESEARCH
AIR AND NOISE POLLUTION RESEARCH
 
Air Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
Air Pollution, Asthma, Triggers & Health - Research and Remediation StrategiesAir Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
Air Pollution, Asthma, Triggers & Health - Research and Remediation Strategies
 
Air pollutionand its effects and causes
Air pollutionand its effects and causesAir pollutionand its effects and causes
Air pollutionand its effects and causes
 
Academic Stress For Management Students
Academic Stress For Management StudentsAcademic Stress For Management Students
Academic Stress For Management Students
 
Impact of academic stress on students
Impact of academic stress on studentsImpact of academic stress on students
Impact of academic stress on students
 
AIR POLLUTION MONITORING USING RS
AIR POLLUTION MONITORING USING RSAIR POLLUTION MONITORING USING RS
AIR POLLUTION MONITORING USING RS
 
Air Pollution
Air PollutionAir Pollution
Air Pollution
 
Air Pollution.Ppt
Air Pollution.PptAir Pollution.Ppt
Air Pollution.Ppt
 
Air pollution
Air pollutionAir pollution
Air pollution
 
Research Methodology Lecture for Master & Phd Students
Research Methodology  Lecture for Master & Phd StudentsResearch Methodology  Lecture for Master & Phd Students
Research Methodology Lecture for Master & Phd Students
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
 
Air pollution: its causes,effects and pollutants
Air pollution: its causes,effects and pollutantsAir pollution: its causes,effects and pollutants
Air pollution: its causes,effects and pollutants
 
A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
 A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS  A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
A RESEARCH ON EFFECT OF STRESS AMONG KMPh STUDENTS
 
Pollution.Ppt
Pollution.PptPollution.Ppt
Pollution.Ppt
 
Pollution its types, causes and effects by naveed.m
Pollution its types, causes and effects by naveed.mPollution its types, causes and effects by naveed.m
Pollution its types, causes and effects by naveed.m
 
Air pollution final.ppt
Air pollution final.pptAir pollution final.ppt
Air pollution final.ppt
 

Similar to Geographic knowledge discovery (PhD Theme) by Roberto Zagal

A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
Paolo Missier
 
IRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine LearningIRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine Learning
IRJET Journal
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
RESHAN FARAZ
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
University of Groningen (The Netherlands)
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
Yiannis Kompatsiaris
 
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
Todd Suomela
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
Farida Vis
 
Critically Assembling Data, Processes & Things: Toward and Open Smart City
Critically Assembling Data, Processes & Things: Toward and Open Smart CityCritically Assembling Data, Processes & Things: Toward and Open Smart City
Critically Assembling Data, Processes & Things: Toward and Open Smart City
Communication and Media Studies, Carleton University
 
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET Journal
 
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
IJCSEA Journal
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
IJCSEA Journal
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)
IJCSEA Journal
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
IJCSEA Journal
 
CS322 Network Analysis.docx
CS322 Network Analysis.docxCS322 Network Analysis.docx
CS322 Network Analysis.docx
write31
 
Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...
Axel Bruns
 
A framework for real time semantic social media analysis
A framework for real time semantic social media analysis A framework for real time semantic social media analysis
A framework for real time semantic social media analysis
Zelia Blaga
 
Combating propaganda texts using transfer learning
Combating propaganda texts using transfer learningCombating propaganda texts using transfer learning
Combating propaganda texts using transfer learning
IAESIJAI
 
Use of ICT in Research Writing: Tools and Technology
Use of ICT in Research Writing: Tools and TechnologyUse of ICT in Research Writing: Tools and Technology
Use of ICT in Research Writing: Tools and Technology
ssuser1310d0
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
EngrAliSarfrazSiddiq
 
understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...
Kishor Datta Gupta
 

Similar to Geographic knowledge discovery (PhD Theme) by Roberto Zagal (20)

A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
IRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine LearningIRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine Learning
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
DigiCCurr 2013 PhD Workshop - Citizen Science and Data Curation: Who needs what?
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Critically Assembling Data, Processes & Things: Toward and Open Smart City
Critically Assembling Data, Processes & Things: Toward and Open Smart CityCritically Assembling Data, Processes & Things: Toward and Open Smart City
Critically Assembling Data, Processes & Things: Toward and Open Smart City
 
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
 
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of...
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 


Geographic knowledge discovery (PhD Theme) by Roberto Zagal

  • 1. Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City. Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN; Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN; Christophe Claramunt, Naval Academy Research Institute
  • 2. Introduction (1) • Traditionally, pollution data has been produced by institutions, government and vendors • But now pollution data is produced by individuals, too
  • 3. Introduction (2) Information about pollution is expressed in different ways by: − Government − News media − People in social networks
  • 4. Introduction (3) But what about the certainty of this information?
  • 5. Introduction (4) What about inconsistency?
      Id | Type | Description
      1 | Tweet, Newspaper1 | The index of IMECAS is 135 #CDMX
      2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #new
  • 6. Related work • The social data problem has been addressed through: 1. KDD and social mining 2. Formal publications (news media) that guide the classification of the interests of social media users [1] 3. Opinion mining and topic modeling [2]. But not by using GKD with an approach of crossing data layers
  • 7. Goal Know how to discover the certainty level of information by crossing geographic and social information
  • 8. Solution proposed: GKD Framework for Air Pollution Data. Phase 1, Phase 2, Phase 3
  • 9. Data extraction: Sample tweet (Phase 1)
      Id | Type | Description
      1 | Tweet, Newspaper1 | The index of IMECAS is 135 #CDMX
      2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
      We consider tweets from accounts that periodically report air pollution data
  • 10. Data extraction: Domain detection (Phase 1)
      Id | Type | Description
      2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
      The post is related to a pollution topic
  • 11. Preprocessing (Phase 2) • Emotion detection [3] • Location extraction
      Id | Type | Description
      2 | Tweet, Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
  • 12. Classification with the C5 algorithm (Phase 3) • If we detect which category each set of data belongs to (for example, Health and Pollution, or Transport and Pollution), then we can select which data sources should be crossed with the tweet in order to discover knowledge
      Id | Description | Category
      2 | @ #contamination air is 127 IMECAS #CDMX #bad #new | Health and pollution
  • 13. Crossing data (Phase 4) • Example 1: Are tweets 1 and 2 inconsistent?
      Id | Type | Description
      1 | Tweet, Newspaper1 | The index of IMECAS is 135 #CDMX
      2 | Tweet, Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX
      Which one is correct?
  • 14. Crossing data (Phase 4) How do we know which tweet is correct? Answer: the tweet was classified in the domain of Health and pollution (in Phase 3). Therefore, the official data from health reports and pollution reports are selected to be crossed with the tweet (in Phase 4). 28/10/16
  • 15. Crossing data (Phase 4) • Data are crossed on different attributes; the date and hour of publication are taken from the tweet • When these are crossed with the date and hour from the official air quality reports, a match is found
  • 16. Crossing data (Phase 4) We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweets)
      Id | Type | Description
      1 | Tweet, Newspaper1 | The index of IMECAS is in 135 #CDMX #Taxqueña 10:00 hours
      2 | Tweet, Newspaper2 | The #contaminación of air is in 127 IMECAS #CDMX #Indios Verdes 15:00 hours
      Knowledge discovered!
  • 17. Other preliminary results • Following the same approach • Knowledge discovered: which topics are discussed in each region
      Topic | Geographic area | Period
      Health | South, West | March-June
      Transport | North, East | January-December
      Policy and programs | Center | January-December
      Pollution | Surrounding Mexico City | January-June
      Public roads | Surrounding Mexico City | January-December
  • 18. Conclusions and future work • Integrating the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks. • The main contribution is the domain discovery; classifying information is a key element of new approaches to discovering geographic information.
  • 19. Conclusions and future work • Future work • Use clustering or deep learning approaches to improve the classification process • Location detection is a hard problem; other machine learning methods for social media could be tested [4, 5] • How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
  • 21. References
      [1] Jonghyun Han, Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358–359, 1 September 2016, pages 112–128. ISSN 0020-0255.
      [2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). Signitrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 871–880). ACM.
      [3] ... architecture for analysis of feelings in Facebook with semantic approach (Spanish). Research in Computing Science 75 (2014), pp. 59–69; rec. 2014-06-22; acc. 2014-07-21. http://www.rcs.cic.ipn.mx/rcs/2014_75/
      [4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19–25. DOI=http://dx.doi.org/10.1145/2876480.2876485
      [5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.
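
The crossing step in slides 13 to 16 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the readings table, the calendar date, the 30-minute tolerance, the third (non-matching) reading and the function name `cross_with_official` are all assumptions made for illustration. Only the two value, time and station combinations come from slide 16.

```python
from datetime import datetime, timedelta

# Hypothetical official air quality readings: (station, timestamp, IMECA value).
# The first two rows mirror slide 16; the third row and the date are invented.
OFFICIAL_READINGS = [
    ("Taxqueña", datetime(2016, 10, 28, 10, 0), 135),
    ("Indios Verdes", datetime(2016, 10, 28, 15, 0), 127),
    ("Taxqueña", datetime(2016, 10, 28, 15, 0), 120),
]


def cross_with_official(tweet_time, tweet_value, readings,
                        tolerance=timedelta(minutes=30)):
    """Return the stations whose official reading matches the tweet.

    A reading matches when it has the same IMECA value and was taken
    within `tolerance` of the tweet's publication time (an assumed rule).
    """
    return [
        station
        for station, timestamp, value in readings
        if value == tweet_value and abs(timestamp - tweet_time) <= tolerance
    ]


# The two sample tweets carry a value and a publication time, but no location.
tweets = [
    {"id": 1, "time": datetime(2016, 10, 28, 10, 0), "imeca": 135},
    {"id": 2, "time": datetime(2016, 10, 28, 15, 0), "imeca": 127},
]

# Map each tweet id to the stations recovered from the official data.
locations = {
    t["id"]: cross_with_official(t["time"], t["imeca"], OFFICIAL_READINGS)
    for t in tweets
}
print(locations)
```

Under these assumptions each tweet matches exactly one station, which is how two apparently inconsistent values can both be valid for different places. A tweet with no match would return an empty list and stay unverified.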

Editor's Notes

  1. SLIDE 1: 1.- Good morning. 2.- My name is Roberto. I'm a PhD student at the National Polytechnic Institute in Mexico City. 3.- Thanks for the invitation to be here today. 4.- I'm going to talk about “Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City”. 5.- This research is advised by Dr. Felix Mata and Dr. Christophe Claramunt. 7.- In recent years, air pollution in Mexico City has increased considerably. 8.- Air pollution is a problem that requires analysis across multiple domains of knowledge, because we now have more information in more complex data sources.
  2. SLIDE 2: Social networks are becoming increasingly relevant as a means of disseminating and sharing citizens' views. To discover new knowledge about air pollution, we need to consider data from different sources, such as government, social groups, social media and other web data. On social media, people make comments and observations that may be important for different topics related to air pollution.
  3. SLIDE 3: 1.- We reviewed three representative and heterogeneous data sources. 2.- The Government of Mexico City, because it generates information about pollution in traditional databases. The information is trustworthy. 3.- News media, an important element because they provide a valuable source for deriving citizens' opinions on the fly. 4.- For example, people in social networks express complaints, opinions, problem reports and observations regarding air pollution. 5.- We consider social networks an instantaneous picture of the social perception of air pollution. 6.- Now, the question is: how can we cross this information to discover new, reliable knowledge about pollution?
  4. SLIDE 4: 1. Information produced by institutions has a degree of certainty and veracity; it is assumed to be true. 2. But: 3. Can all information produced in social networks be trusted? 4. What is the level of certainty of the information produced in social networks relative to other sources? 5. This is the problem statement of this preliminary investigation.
  5. SLIDE 5: 1. Information sometimes needs to be verified to know whether it is correct. 2. For example: 3. We have an inconsistency between the following two tweets about air quality. 4. IMECAS is the acronym of the Metropolitan Index of Air Quality in Mexico City. 5. In tweet 1, a newspaper reports that the IMECAS index is one hundred thirty-five (135). 6. In tweet 2, a newspaper reports that the IMECAS index is one hundred twenty-seven (127). 7. Which one has the correct information? 8. How can we detect and resolve the inconsistency in the information?
  6. SLIDE 6: 1.- These papers have no explicit relation to the geographic dimension. 2.- And they do not explore the certainty of information.
  7. SLIDE 7: 1. This means that we can discover the level of certainty of the publications that appear in social media 2. by crossing these data with additional formal data sources. 4. Geographic information can be used as a link between different data sources.
  8. SLIDE 8: 1.- We propose a GKD framework for air pollution that includes four phases: 2.- Data extraction is oriented to getting information from social sources and newspapers. 3.- The preprocessing phase includes location and sentiment detection. 4.- Classification categorizes the data into specific topics. 5.- Crossing data helps to detect the level of certainty of the information.
  9. SLIDE 9: 1.- For extraction, we consider tweets from accounts that periodically report air pollution data, for example digital newspapers in Mexico. 2.- Extraction continues using initial key phrases and hashtags, such as #CDMX or #AirPollution. 4.- Afterwards, the data are cleaned: this includes tokenization, stop word removal and stemming.
  10. SLIDE 10: 1. Domain detection semantically pre-classifies tweets into a pollution category, for example: 2. In tweet 2, the term “contamination” matches the “pollution” class by synonymy. 3. Next, the word IMECAS matches the class “IMECAS”, which is a subclass of “IndexOfAirQuality”. 4. We can say that the post is related to a pollution topic, which is a generic class. 5. The tweet may also belong to a more specific category that describes the nature of the post.
  11. SLIDE 11: 1. In this part, we detect whether the post expresses a positive or negative feeling, based on words or emoticons. This detection is useful for identifying trends in the social perception of a specific pollution topic, for example positive tweets about politics and pollution. 2. Regarding the location of the tweet, we assume that each tweet contains information about its place and time of publication in its metadata. 3. Sometimes a tweet contains no explicit or implicit information that allows its location to be determined. In this case, only the time of publication is considered in the following phases.
  12. SLIDE 12: 1. If we detect which specific category each set of data belongs to, we can select the data sources that should be crossed with the tweet in order to discover new knowledge and certainty. 2. Tweet 2 is classified in a more specific category: health and pollution. 3. We chose C5 because it is one of the algorithms that have shown good performance in knowledge discovery in databases.
  13. SLIDE 13: At this stage, quantitative and qualitative values are separated. 1) Using the ontology, we can identify and separate terms such as IMECAS, Air and Pollution. 2) The numerical IMECA value is separated. 3) We know that this value must be in the range from 0 to 201 according to the definition of the IMECA index. If it is, we can say that we have found a valid air quality value. 4) It is possible that this approach does not work in some cases. 5) The tweets do not contain information about their location, but we consider the time of publication. 6) Using the IMECA value and the time of the tweet, we search for matches in government data sources on air quality.
  14. SLIDE 14: 1. Through the categorization of the tweet, we know that we can cross its information with the air quality database, because it is related to pollution and public health topics.
  15. SLIDE 15: 1.- The air quality data is provided by the environmental monitoring ministry of the CDMX government.
  16. SLIDE 16: 1. The tweet has no location, but using its time component 2. we find a match in the official data using the IMECA value. 3. Then, the official data help us to discover the tweet's location.
  17. SLIDE 17: 1. In these additional results, we can see the classification of tweets by topic and location. 2. These results show the trend of social perception for certain subjects and geographical areas.
  18. Slide 18. 1.- The information of dimensions. 2.- The domain discovery.
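
The domain detection and value separation steps described in notes 10 and 13 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the term table stands in for their ontology, the names `TERM_TO_CLASS`, `SUBCLASS_OF`, `tokenize`, `detect_classes` and `extract_imeca` are hypothetical, and stop word removal and stemming are left out for brevity. The class names and the 0 to 201 range are taken from the notes.

```python
import re

# Hypothetical stand-in for the ontology: surface term -> class.
TERM_TO_CLASS = {
    "pollution": "Pollution",
    "contamination": "Pollution",   # matched by synonymy, as in note 10
    "imecas": "IMECAS",
    "imeca": "IMECAS",
}
# Subclass relation mentioned in note 10.
SUBCLASS_OF = {"IMECAS": "IndexOfAirQuality"}

# Valid range for an IMECA value, as stated in note 13.
IMECA_MIN, IMECA_MAX = 0, 201


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[^\W\d_]+|\d+", text.lower())


def detect_classes(tokens):
    """Return the classes matched by the tokens, plus their parent classes."""
    classes = set()
    for token in tokens:
        cls = TERM_TO_CLASS.get(token)
        if cls is not None:
            classes.add(cls)
            parent = SUBCLASS_OF.get(cls)
            if parent is not None:
                classes.add(parent)
    return classes


def extract_imeca(tokens):
    """Return the first numeric token inside the valid IMECA range, or None."""
    for token in tokens:
        if token.isdigit() and IMECA_MIN <= int(token) <= IMECA_MAX:
            return int(token)
    return None


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #new"
tokens = tokenize(tweet)
classes = detect_classes(tokens)   # qualitative part of the tweet
value = extract_imeca(tokens)      # quantitative part of the tweet
print(classes, value)
```

A tweet whose only number falls outside the range yields `None`, which corresponds to the cases where, as note 13 says, the approach may not work.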