SlideShare a Scribd company logo
1 of 20
Download to read offline
D-Sieve: A Novel Data Processing
Engine for Crises Related Social
Messages
Soudip Roy Chowdhury (INRIA & UPSUD, France)
Hemant Purohit (Wright State Uni., USA)
Muhammad Imran (QCRI, Doha)
SWDM 2015 May18, 2015
Social Media During Crises
Courtesy:
http://worldmap.harvard.edu/maps/nepalqu
ake
And it contains…
Timely
Information
And it contains…
Actionable
Information
And it contains…
Tactical
Information
Social Data Processing
• Real-time analysis of crises related social data
is necessary
– Manual analysis
• Standby task force [N. Morrow et al. 2011]
– System mediated analysis
• Unsupervised learning [T. Sakaki et al. 2010]
• Supervised learning [S. Roy Chowdhury et al. 2013]
(not scalable)
(noisy / ambiguous data)
(scarce training data / non-uniform
class distribution)
D-Sieve
• Post-classification data processing engine in a
supervised classification setting
• Noise
ClassA
ClassB
• Nois
e
ClassAClassB
Supervised
Classifier
D-SieveRaw Data
Correctly
Classified Data
Data Model
• A message contains
TEXT Date
Data Model
• A message contains
TEXT Date
hashtags
content
URL
Our Approach
• D-Sieve extracts two content features to
classify messages correctly
– Stable hashtag association
– Stable named entity association
Stable Hashtag Association
Relative frequencies of stable hashtags for an
event remain constant over time….
Stable Hashtag Association
• A hashtag is stable, if the moving average score
of its frequency distribution (after k posts)
becomes steady [X.S. Yang et al. 2013]
• Stable hashtag set contains all stable
hashtags for an incident
• Stable hashtag association metric for an input
message
• and are set of stable hashtags for true
positive and true negative classes
(f)
VH =
| (fTP -fTN )Ç Hi | - | (fTN -fTP )Ç Hi |
Hi
fTP
fTN
Stable Named Entity Association
• Named entities are extracted from the
message content + URL content
• Stable named entity association metric for an
input message VE =
| (ETP - ETN )Ç IE | - | (ETN - ETP )Ç IE |
IE
• We calculate the final feature association
metrics
• Feature weights ( ) are chosen such that
ranges [-1,1]
• Rule of thumb weights are set as 0.5
SC =WH.VH +WE.VE
WH ,WE
SC
Functional Architecture
In our experiments..
CrisisLexT6
[A.Oleanu et al.
2014]
Hurricane Sandy
(10k) , Queensland
Flood (10k)
Random
Forest[A.
Liaw et al.
2002]
UB = 0.8
LB = 0.7
50% of the data
used during model
creation and rest
used for testing
Few Observations
• Stable hashtag set contained
– 24 and 21 tags for TP class of Sandy and
Queensland dataset respectively
– 0 and 7 tags for TN class of Sandy and Queensland
dataset respectively
– 68% and 70% of the whole data for Sandy and
Queensland data satisfy the UB and LB constraints
Experimental Results
precision recall
OC 96.09 81.42
OC+HT 96.23 84.51
OC+NE 96.31 86.49
OC++ 96.35 87.55
70
75
80
85
90
95
100
#percentage
Sandy 2012
precesion recall
OC 99.53 36.75
OC+HT 99.8 58.47
OC+NE 99.75 68.9
OC++ 99.8 74.24
0
20
40
60
80
100
120
#percentage
Queensland 2013
D-sieve improves recall values substantially (6%
and 37% improvements for two datasets).
Future Work
• Investigate efficiency of D-sieve both for
online and offline classification settings
• Address the latency issue for feature
extraction (URL content crawling and entitiy
spotting) using parallel computing
• Extend our experiments with Facebook and
G+ data
References
• A. Liaw and M. Wiener. Classification and regression by randomforest. R news,
2(3):18–22, 2002.
• A. Olteanu, C. Castillo, F. Diaz, and S. Vieweg. “Crisislex: A lexicon for collecting
and filtering microblogged communications in crises”. In ICWSM, 2014.
• N. Morrow, N. Mock, A. Papendieck, and N. Kocmich. “Independent evaluation of
the ushahidi haiti project”. Development Information Systems International, 2011.
• S. Roy Chowdhury, M. Imran, M. R. Asghar, S. Amer-Yahia, and C. Castillo.
“Tweet4act: Using incident-specific profiles for classifying crisis-related
messages”. In ISCRAM 2013.
• Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. “Earthquake shakes Twitter
users: real-time event detection by social sensors”. In WWW 2010.
• X. S. Yang, D. W. Cheung, L. Mo, R. Cheng, and B. Kao. “On incentive-based
tagging”. In ICDE 2013.
D-sieve : A Novel Data Processing Engine for Crises Related Social Messages

More Related Content

Similar to D-sieve : A Novel Data Processing Engine for Crises Related Social Messages

EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...
EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...
EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...EarthCube
 
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Project
 
Recruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkRecruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkSC CTSI at USC and CHLA
 
Recommending Sequences RecTour 2017
Recommending Sequences RecTour 2017Recommending Sequences RecTour 2017
Recommending Sequences RecTour 2017Gunjan Kumar
 
Do you really need to test with only 5 users
Do you really need to test with only 5 usersDo you really need to test with only 5 users
Do you really need to test with only 5 usersVictoria Bondarchuk
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBPZihui Li
 
Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1butest
 
Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1butest
 
Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...Robert Munro
 
Conferencia Líder Dr. Rapinder Sawhney
Conferencia Líder Dr. Rapinder SawhneyConferencia Líder Dr. Rapinder Sawhney
Conferencia Líder Dr. Rapinder Sawhneylideresacademicos
 
Simulation: From theory to implementation
Simulation: From theory to implementationSimulation: From theory to implementation
Simulation: From theory to implementationAdam Dubrowski
 
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec... Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...hydrologywebsite1
 
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec... Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...hydrologyproject001
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning AnalyticsXavier Ochoa
 
Dynamics of project-driven systems A production model for repetitive processe...
Dynamics of project-driven systems A production model for repetitive processe...Dynamics of project-driven systems A production model for repetitive processe...
Dynamics of project-driven systems A production model for repetitive processe...Ricardo Magno Antunes
 
Practical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DLPractical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DLAlbert Y. C. Chen
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner
 
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...Artificial Intelligence Institute at UofSC
 
GECCO 2014 - Learning Classifier System Tutorial
GECCO 2014 - Learning Classifier System TutorialGECCO 2014 - Learning Classifier System Tutorial
GECCO 2014 - Learning Classifier System TutorialPier Luca Lanzi
 

Similar to D-sieve : A Novel Data Processing Engine for Crises Related Social Messages (20)

DeepLearning
DeepLearningDeepLearning
DeepLearning
 
EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...
EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...
EarthCube Stakeholder Alignment Survey - End-Users & Professional Societies W...
 
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
 
Recruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkRecruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical Turk
 
Recommending Sequences RecTour 2017
Recommending Sequences RecTour 2017Recommending Sequences RecTour 2017
Recommending Sequences RecTour 2017
 
Do you really need to test with only 5 users
Do you really need to test with only 5 usersDo you really need to test with only 5 users
Do you really need to test with only 5 users
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBP
 
Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1
 
Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1Présentation PowerPoint - Diapositive 1
Présentation PowerPoint - Diapositive 1
 
Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...Subword and spatiotemporal models for identifying actionable information in ...
Subword and spatiotemporal models for identifying actionable information in ...
 
Conferencia Líder Dr. Rapinder Sawhney
Conferencia Líder Dr. Rapinder SawhneyConferencia Líder Dr. Rapinder Sawhney
Conferencia Líder Dr. Rapinder Sawhney
 
Simulation: From theory to implementation
Simulation: From theory to implementationSimulation: From theory to implementation
Simulation: From theory to implementation
 
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec... Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec... Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
Download-manuals-water quality-wq-training-44howtocarryoutcorrelationandspec...
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Dynamics of project-driven systems A production model for repetitive processe...
Dynamics of project-driven systems A production model for repetitive processe...Dynamics of project-driven systems A production model for repetitive processe...
Dynamics of project-driven systems A production model for repetitive processe...
 
Practical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DLPractical computer vision-- A problem-driven approach towards learning CV/ML/DL
Practical computer vision-- A problem-driven approach towards learning CV/ML/DL
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...
Multi-Scale and Multi-Modal Streaming Data Aggregation and Processing for Dec...
 
GECCO 2014 - Learning Classifier System Tutorial
GECCO 2014 - Learning Classifier System TutorialGECCO 2014 - Learning Classifier System Tutorial
GECCO 2014 - Learning Classifier System Tutorial
 

Recently uploaded

Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
Unit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional IntelligenceUnit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional IntelligenceDr Vijay Vishwakarma
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
Shark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsShark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsArubSultan
 
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptx
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptxTransdisciplinary Pathways for Urban Resilience [Work in Progress].pptx
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptxinfo924062
 
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF NAT...
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF  NAT...LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF  NAT...
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF NAT...pragatimahajan3
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
How to create _name_search function in odoo 17
How to create _name_search function in odoo 17How to create _name_search function in odoo 17
How to create _name_search function in odoo 17Celine George
 
Jason Potel In Media Res Media Component
Jason Potel In Media Res Media ComponentJason Potel In Media Res Media Component
Jason Potel In Media Res Media ComponentInMediaRes1
 
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEPART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEMISSRITIMABIOLOGYEXP
 
Sarah Lahm In Media Res Media Component
Sarah Lahm  In Media Res Media ComponentSarah Lahm  In Media Res Media Component
Sarah Lahm In Media Res Media ComponentInMediaRes1
 
The Shop Floor Overview in the Odoo 17 ERP
The Shop Floor Overview in the Odoo 17 ERPThe Shop Floor Overview in the Odoo 17 ERP
The Shop Floor Overview in the Odoo 17 ERPCeline George
 
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...DrVipulVKapoor
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Osopher
 
Paul Dobryden In Media Res Media Component
Paul Dobryden In Media Res Media ComponentPaul Dobryden In Media Res Media Component
Paul Dobryden In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Unit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional IntelligenceUnit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional Intelligence
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Shark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsShark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristics
 
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptx
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptxTransdisciplinary Pathways for Urban Resilience [Work in Progress].pptx
Transdisciplinary Pathways for Urban Resilience [Work in Progress].pptx
 
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF NAT...
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF  NAT...LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF  NAT...
LEVERAGING SYNERGISM INDUSTRY-ACADEMIA PARTNERSHIP FOR IMPLEMENTATION OF NAT...
 
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
How to create _name_search function in odoo 17
How to create _name_search function in odoo 17How to create _name_search function in odoo 17
How to create _name_search function in odoo 17
 
Jason Potel In Media Res Media Component
Jason Potel In Media Res Media ComponentJason Potel In Media Res Media Component
Jason Potel In Media Res Media Component
 
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEPART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
 
Sarah Lahm In Media Res Media Component
Sarah Lahm  In Media Res Media ComponentSarah Lahm  In Media Res Media Component
Sarah Lahm In Media Res Media Component
 
CARNAVAL COM MAGIA E EUFORIA _
CARNAVAL COM MAGIA E EUFORIA            _CARNAVAL COM MAGIA E EUFORIA            _
CARNAVAL COM MAGIA E EUFORIA _
 
The Shop Floor Overview in the Odoo 17 ERP
The Shop Floor Overview in the Odoo 17 ERPThe Shop Floor Overview in the Odoo 17 ERP
The Shop Floor Overview in the Odoo 17 ERP
 
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
 
Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
Paul Dobryden In Media Res Media Component
Paul Dobryden In Media Res Media ComponentPaul Dobryden In Media Res Media Component
Paul Dobryden In Media Res Media Component
 

D-sieve : A Novel Data Processing Engine for Crises Related Social Messages

  • 1. D-Sieve: A Novel Data Processing Engine for Crises Related Social Messages Soudip Roy Chowdhury (INRIA & UPSUD, France) Hemant Purohit (Wright State Uni., USA) Muhammad Imran (QCRI, Doha) SWDM 2015 May18, 2015
  • 2. Social Media During Crises Courtesy: http://worldmap.harvard.edu/maps/nepalqu ake
  • 6. Social Data Processing • Real-time analysis of crises related social data is necessary – Manual analysis • Standby task force [N. Morrow et al. 2011] – System mediated analysis • Unsupervised learning [T. Sakaki et al. 2010] • Supervised learning [S. Roy Chowdhury et al. 2013] (not scalable) (noisy / ambiguous data) (scarce training data / non-uniform class distribution)
  • 7. D-Sieve • Post-classification data processing engine in a supervised classification setting • Noise ClassA ClassB • Nois e ClassAClassB Supervised Classifier D-SieveRaw Data Correctly Classified Data
  • 8. Data Model • A message contains TEXT Date
  • 9. Data Model • A message contains TEXT Date hashtags content URL
  • 10. Our Approach • D-Sieve extracts two content features to classify messages correctly – Stable hashtag association – Stable named entity association
  • 11. Stable Hashtag Association Relative frequencies of stable hashtags for an event remain constant over time….
  • 12. Stable Hashtag Association • A hashtag is stable, if the moving average score of its frequency distribution (after k posts) becomes steady [X.S. Yang et al. 2013] • Stable hashtag set contains all stable hashtags for an incident • Stable hashtag association metric for an input message • and are set of stable hashtags for true positive and true negative classes (f) VH = | (fTP -fTN )Ç Hi | - | (fTN -fTP )Ç Hi | Hi fTP fTN
  • 13. Stable Named Entity Association • Named entities are extracted from the message content + URL content • Stable named entity association metric for an input message VE = | (ETP - ETN )Ç IE | - | (ETN - ETP )Ç IE | IE • We calculate the final feature association metrics • Feature weights ( ) are chosen such that ranges [-1,1] • Rule of thumb weights are set as 0.5 SC =WH.VH +WE.VE WH ,WE SC
  • 15. In our experiments.. CrisisLexT6 [A.Oleanu et al. 2014] Hurricane Sandy (10k) , Queensland Flood (10k) Random Forest[A. Liaw et al. 2002] UB = 0.8 LB = 0.7 50% of the data used during model creation and rest used for testing
  • 16. Few Observations • Stable hashtag set contained – 24 and 21 tags for TP class of Sandy and Queensland dataset respectively – 0 and 7 tags for TN class of Sandy and Queensland dataset respectively – 68% and 70% of the whole data for Sandy and Queensland data satisfy the UB and LB constraints
  • 17. Experimental Results precision recall OC 96.09 81.42 OC+HT 96.23 84.51 OC+NE 96.31 86.49 OC++ 96.35 87.55 70 75 80 85 90 95 100 #percentage Sandy 2012 precesion recall OC 99.53 36.75 OC+HT 99.8 58.47 OC+NE 99.75 68.9 OC++ 99.8 74.24 0 20 40 60 80 100 120 #percentage Queensland 2013 D-sieve improves recall values substantially (6% and 37% improvements for two datasets).
  • 18. Future Work • Investigate efficiency of D-sieve both for online and offline classification settings • Address the latency issue for feature extraction (URL content crawling and entitiy spotting) using parallel computing • Extend our experiments with Facebook and G+ data
  • 19. References • A. Liaw and M. Wiener. Classification and regression by randomforest. R news, 2(3):18–22, 2002. • A. Olteanu, C. Castillo, F. Diaz, and S. Vieweg. “Crisislex: A lexicon for collecting and filtering microblogged communications in crises”. In ICWSM, 2014. • N. Morrow, N. Mock, A. Papendieck, and N. Kocmich. “Independent evaluation of the ushahidi haiti project”. Development Information Systems International, 2011. • S. Roy Chowdhury, M. Imran, M. R. Asghar, S. Amer-Yahia, and C. Castillo. “Tweet4act: Using incident-specific profiles for classifying crisis-related messages”. In ISCRAM 2013. • Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. “Earthquake shakes Twitter users: real-time event detection by social sensors”. In WWW 2010. • X. S. Yang, D. W. Cheung, L. Mo, R. Cheng, and B. Kao. “On incentive-based tagging”. In ICDE 2013.

Editor's Notes

  1. We use InfoGain with Ranker algorithm to select top 800 content based features (uni-gram and bi-gram), and used the Ran- dom Forest [7] as our classification algorithm to learn the best-fit models. The AUC score for Sandy is 78% and 83% for Queensland.