
Geographic knowledge discovery (PhD Theme) by Roberto Zagal



Article published in ACMGIS 2016, November 2016
It is part of a project and PhD theme at Labmovil-UPIITA-IPN and SEPI-UPIITA-IPN.

Published in: Internet


  1. Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City
     Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
     Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
     Christophe Claramunt, Naval Academy Research Institute
  2. Introduction (1)
     • Traditionally, pollution data has been produced by institutions, government, and vendors
     • But now pollution data is produced by individuals, too
  3. Introduction (2)
     Information about pollution is expressed in different ways by:
     − Government
     − News media
     − People in social networks
  4. Introduction (3) But what about the certainty of this information?
  5. Introduction (4)
     What about inconsistency?
     Id | Type | Description
     1 | Tweet Newspaper1 | The index of IMECAS is 135 #CDMX
     2 | Tweet Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #new
  6. Related work
     • The social data problem has been addressed through:
       1. KDD and social mining
       2. Formal publications (news media) that guide the classification of the interests of social media users [1]
       3. Opinion mining and topic modeling [2]
     • But not by using GKD with an approach based on crossing data layers
  7. Goal
     Know how to discover the certainty level of information by crossing geographic and social information
  8. Proposed solution: GKD framework for air pollution data. Phase 1, Phase 2, Phase 3
  9. Data extraction: Sample tweet (Phase 1)
     Id | Type | Description
     1 | Tweet Newspaper1 | The index of IMECAS is 135 #CDMX
     2 | Tweet Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
     We consider tweets from accounts that periodically report air pollution data
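The sample tweets above carry two pieces of machine-readable information: the reported IMECA value and the hashtags. The following is a minimal sketch of how they could be pulled out of the text, assuming simple regular expressions are enough for these samples. The function name `parse_tweet` and both patterns are hypothetical and are not taken from the slides.

```python
import re

# Assumption: the value appears either right before "IMECAS" ("127 IMECAS")
# or somewhere after it ("IMECAS is 135"). Real tweets may need more cases.
IMECA_PATTERN = re.compile(r"(\d+)\s*IMECAS?|IMECAS?\D*?(\d+)", re.IGNORECASE)
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def parse_tweet(text):
    """Extract the reported IMECA value (or None) and the hashtags from a tweet."""
    match = IMECA_PATTERN.search(text)
    value = None
    if match:
        # Exactly one of the two groups is filled, depending on which form matched.
        value = int(match.group(1) or match.group(2))
    return {"imeca": value, "hashtags": HASHTAG_PATTERN.findall(text)}


if __name__ == "__main__":
    print(parse_tweet("The index of IMECAS is 135 #CDMX"))
    print(parse_tweet("@ the #contamination of air is 127 IMECAS #CDMX #bad #news"))
```

Keeping the value and the hashtags as separate fields makes the later phases easier to sketch, since each phase then works on structured fields instead of raw text.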
  10. Data extraction: Domain detection (Phase 1)
     Id | Type | Description
     2 | Tweet Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
     The post is related to a pollution topic
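The slides do not say how domain detection is implemented. As a rough illustration only, the sketch below flags a post as pollution-related when it contains at least one term from a small keyword set. The term list and the name `is_pollution_related` are assumptions made for this example, not the authors' method.

```python
import re

# Hypothetical keyword list; a real system would need a much richer vocabulary.
POLLUTION_TERMS = {"imeca", "imecas", "contamination", "contaminación", "pollution", "smog"}


def is_pollution_related(text):
    """Return True if any lowercase word in the text is a known pollution term."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return not tokens.isdisjoint(POLLUTION_TERMS)


if __name__ == "__main__":
    print(is_pollution_related("@ #contamination air is 127 IMECAS #CDMX #bad #new"))
    print(is_pollution_related("Traffic is slow today #CDMX"))
```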
  11. Preprocessing (Phase 2)
     • Emotion detection [3]
     • Location extraction
     Id | Type | Description
     2 | Tweet Newspaper2 | @ #contamination air is 127 IMECAS #CDMX #bad #new
  12. Classification: C5 algorithm (Phase 3)
     • If we detect which category each set of data belongs to (Health and Pollution, Transport and Pollution), then we can select which data sources should be crossed with the tweet in order to discover knowledge
     Id | Description | Category
     2 | @ #contamination air is 127 IMECAS #CDMX #bad #new | Health and pollution
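Once a tweet has a category, choosing the data sources to cross it with can be seen as a lookup. The sketch below shows only that selection step and does not reproduce the C5 classifier. The sources for "Health and pollution" follow the slides; the entry for "Transport and pollution" and the name `select_sources` are assumptions.

```python
SOURCES_BY_CATEGORY = {
    # From the slides: health reports and pollution reports are crossed with the tweet.
    "Health and pollution": ["health reports", "pollution reports"],
    # Assumption: the slides do not list the sources for this category.
    "Transport and pollution": ["transport reports", "pollution reports"],
}


def select_sources(category):
    """Return the data sources to cross with a tweet of the given category."""
    return SOURCES_BY_CATEGORY.get(category, [])


if __name__ == "__main__":
    print(select_sources("Health and pollution"))
```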
  13. Crossing data (Phase 4)
     • Example 1: are tweets 1 and 2 inconsistent?
     Id | Type | Description
     1 | Tweet Newspaper1 | The index of IMECAS is 135 #CDMX
     2 | Tweet Newspaper2 | @ the #contamination of air is 127 IMECAS #CDMX
     Which one is correct?
  14. Crossing data (Phase 4)
     How do we know which tweet is correct?
     Answer: the tweet was classified in the Health and pollution domain (in Phase 3), so the official data from health reports and pollution reports are selected to be crossed with the tweet (in Phase 4). 28/10/16
  15. Crossing data (Phase 4)
     • Data are crossed on different attributes; the date and hour of publication are taken from the tweet
     • When these are crossed with the date and hour of the official air quality reports, a match is found
  16. Crossing data (Phase 4)
     Knowledge discovered! Both tweets are correct, but they refer to different locations (the location is not included in the original tweet)
     Id | Type | Description
     1 | Tweet Newspaper1 | The index of IMECAS is 135 #CDMX #Taxqueña 10:00 hours
     2 | Tweet Newspaper2 | The #contaminación of air is 127 IMECAS #CDMX #Indios Verdes 15:00 hours
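The crossing step in Phase 4 can be pictured as a join between a tweet and the official report rows. The sketch below assumes each report row has a station, a timestamp, and an IMECA value, and matches on timestamp and value. The row layout, the date, and the name `cross_with_reports` are hypothetical; only the stations, hours, and values come from the slides.

```python
from datetime import datetime

# Hypothetical date: the slides give only the hours, not the day of the tweets.
DAY = (2016, 1, 1)

# Illustrative official report rows, using the stations, hours, and values on the slide.
REPORTS = [
    {"station": "Taxqueña", "time": datetime(*DAY, 10), "imeca": 135},
    {"station": "Indios Verdes", "time": datetime(*DAY, 15), "imeca": 127},
]

# The two sample tweets, reduced to publication time and reported value.
TWEETS = [
    {"id": 1, "time": datetime(*DAY, 10), "imeca": 135},
    {"id": 2, "time": datetime(*DAY, 15), "imeca": 127},
]


def cross_with_reports(tweet, reports):
    """Return the stations whose report matches the tweet's date, hour, and IMECA value."""
    return [
        report["station"]
        for report in reports
        if report["time"] == tweet["time"] and report["imeca"] == tweet["imeca"]
    ]


if __name__ == "__main__":
    for tweet in TWEETS:
        print(tweet["id"], cross_with_reports(tweet, REPORTS))
```

Under these assumptions each tweet matches exactly one station, which is how a location missing from the tweet text can be recovered. An empty result would mean that no official report supports the tweet.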
  17. Other preliminary results
     • Following the same approach
     • Knowledge discovered: which topics are discussed in each region
     Topic | Geographic region | Period
     Health | South, West | March-June
     Transport | North, East | January-December
     Policy and programs | Center | January-December
     Pollution | Surrounding Mexico City | January-June
     Public roads | Surrounding Mexico City | January-December
  18. Conclusions and future work
     • Integrating the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
     • The main contribution is the domain discovery and classification of information, a key element of new approaches to discovering geographic information.
  19. Conclusions and future work
     • Future work
     • Use clustering or deep learning approaches to improve the classification process
     • Location detection is a hard problem; other machine learning methods for social media could be tested [4, 5]
     • How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
  20. Many thanks! Questions? Roberto Zagal, IPN, México 28/10/16
  21. References
     [1] Jonghyun Han and Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358–359, 1 September 2016, pages 112–128. ISSN 0020-0255.
     [2] E. Schubert, M. Weiler, and H. P. Kriegel. SigniTrend: Scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 871–880. ACM, August 2014.
     [3] [...] architecture for analysis of feelings in Facebook with semantic approach (in Spanish). Research in Computing Science 75 (2014), pages 59–69.
     [4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. How events unfold: Spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), pages 19–25.
     [5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.