Understanding the Diversity of Tweets
in the Time of Outbreaks
Nattiya Kanhabua and Wolfgang Nejdl
L3S Research Center
Leibniz Universität Hannover, Germany
http://www.L3S.de
Search result from Google
retrieved on 12 May 2013
Tweets in the Time of Outbreaks
Paper by Nattiya Kanhabua and Wolfgang Nejdl
Motivation
• Numerous works use Twitter to infer the existence
and magnitude of real-world events in real time
– Earthquakes [Sakaki et al., 2010]
– Predicting financial time series [Ruiz et al., 2012]
– Influenza epidemics [Culotta, 2010; Lampos et al.,
2011; Paul et al., 2011]
• In the medical domain, there has been a surge of work on
detecting health-related tweets for early warning
– Allows a rapid response from authorities [Diaz-Aviles et
al., 2012]
Health-related tweets
• User status updates and news related to
public health are common on Twitter
– I have the mumps...am I alone?
– my baby girl has a Gastroenteritis so great!! Please
do not give it to meee
– #Cholera breaks out in #Dadaab refugee camp in
#Kenya http://t.co/....
– As many as 16 people have been found infected with
Anthrax in Shahjadpur upazila of the Sirajganj district
in Bangladesh.
Web Observatory
Application
Challenge I. Noisy data
• Ambiguity
– having several meanings
– used in different contexts
• Incompleteness
– missing or under-reported events
– data processing errors
Category     Example tweet
Literature   A two hour train journey, Love In the Time of Cholera ...
Music        Dengue Fever’s “Uku,” Mixed by Paul Dreux Smith Universal Audio...
Marketing    Exclusive distributor of high quality #HIV/AIDS Blood & Urine and #Hepatitis #Self -testers.
General      Identification of genotype 4 Hepatitis E virus binding proteins on swine liver cells: Hepatitis E virus...
Negative     i dont have sniffles and no real coughing..well its coughing but not like an influenza cough.
Joke         Thought I had Bieber Fever. Ends up I just had a combo of the mumps, mono, measles & the hershey squ...
Challenge II. Dynamics
• Time
– seasonal infectious diseases
– rare and spontaneous outbreaks
• Place
– frequency and duration
– levels of prevalence or severity
[Rortais et al., 2010 in Journal of Food Research International]
[Emch et al., 2008 in International Journal of Health Geographics]
Problem Statement
• How to detect outbreaks for general diseases?
– Previous work focuses on a limited number of diseases,
e.g., influenza or dengue, and relies on supervised learning
• How to take temporal and spatial diversity
into account for outbreak detection?
– Previous work does not explicitly model the diversity
dimension
Contributions
• We conduct the first study of temporal diversity
in Twitter
• A method to extract topic dynamics for outbreaks,
used as an estimate of real-world statistics
• A correlation analysis of temporal diversity and the
estimated statistics for 14 outbreak ground truths
System Framework
• Part I. Ground truth creation
– Official outbreak reports
• World Health Organization¹
• ProMED-mail²
• Part II. Creating Twitter time series
1. medical condition
• disease name, synonyms, pathogens, symptoms
2. location
• geographic expressions, geo-location, or user profile
• 3 levels: country, continent, latitude
¹ http://www.who.int
²
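The time series construction in Part II can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the `Tweet` tuple, the two tiny lexicons, the substring matching, and the rule that a tweet must mention both a medical condition and a location are all assumptions made for the example, and the tweet dates are made up.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Tweet(NamedTuple):
    day: date
    text: str


# Hypothetical lexicons; a real system would use far larger term lists.
CONDITION_TERMS = {"cholera", "vibrio cholerae"}
LOCATION_TERMS = {"kenya", "dadaab"}


def matches(tweet: Tweet) -> bool:
    """True if the tweet mentions both a medical condition and a location."""
    text = tweet.text.lower()
    has_condition = any(term in text for term in CONDITION_TERMS)
    has_location = any(term in text for term in LOCATION_TERMS)
    return has_condition and has_location


def daily_counts(tweets):
    """Count matching tweets per day, which gives a simple time series."""
    return Counter(t.day for t in tweets if matches(t))


# Illustrative tweets; the dates are invented for the example.
tweets = [
    Tweet(date(2011, 11, 15), "#Cholera breaks out in #Dadaab refugee camp in #Kenya"),
    Tweet(date(2011, 11, 15), "Reading Love in the Time of Cholera on the train"),
    Tweet(date(2011, 11, 16), "Cholera cases rising in Kenya"),
]
series = daily_counts(tweets)
```

Requiring a location alongside the disease name is one simple way to drop ambiguous mentions such as the book title; it does not handle the other noise categories.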
Ground Truths
• Extract events in a
pipeline fashion
• Annotated documents
– named entities (diseases,
victims and locations)
– temporal expressions
– a set of sentences
• Event e: (v, m, l, t_e)
– who (victim v) was infected
– what (disease m) caused the infection
– where (location l)
– when (time t_e)
[Figure: event extraction pipeline. Labels: Unstructured text collection; Text Annotation (Sentence Extraction, Tokenization, Part-of-speech Tagging, Temporal Expression Extraction, Named Entity Recognition); Annotated Documents; Event Extraction (Identifying Relevant Time, Event Aggregation); Event Profiles; User browsing/retrieving]
[Kanhabua et al., 2012a]
Event Extraction
• An event is a sentence containing two entities
– (1) medical condition and (2) geographic expression
– A minimum requirement by domain experts
• A victim and the time of an event can be identified
from the sentence itself, or its surrounding context
• Output: a set of event candidates
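The minimum requirement above (a sentence that contains both a medical condition and a geographic expression) can be sketched as below. The gazetteers, the regex-based sentence splitter, and the `EventCandidate` type are hypothetical stand-ins for the NLP tools used in the actual pipeline; victim and time identification are left out.

```python
import re
from typing import NamedTuple, Optional


class EventCandidate(NamedTuple):
    disease: str
    location: str
    sentence: str


# Hypothetical gazetteers standing in for a real named entity recognizer.
DISEASES = ("ebola", "cholera", "anthrax")
LOCATIONS = ("uganda", "kenya", "bangladesh")


def first_match(terms, text) -> Optional[str]:
    """Return the first term that occurs as a whole word in text, else None."""
    for term in terms:
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            return term
    return None


def extract_candidates(document: str):
    """Keep each sentence that mentions both a disease and a location."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    candidates = []
    for sentence in sentences:
        disease = first_match(DISEASES, sentence)
        location = first_match(LOCATIONS, sentence)
        if disease and location:
            candidates.append(EventCandidate(disease, location, sentence))
    return candidates


report = (
    "An Ebola outbreak is ongoing in Uganda. "
    "Health workers have been deployed. "
    "No cholera cases were reported."
)
events = extract_candidates(report)
```

Only the first sentence of `report` qualifies: the second has no entities, and the third names a disease but no location.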
Reported by World Health Organization (WHO) on
29 July 2012 about an ongoing Ebola outbreak
in Uganda since the beginning of July 2012
List of 14 Outbreaks
Matching Tweets
[Kanhabua et al., 2012b]
Identifying Topic Dynamics
• Input: time series data of relevant tweets
• For each time t_k, cluster the tweets by topic
(unsupervised)
• Filter the resulting topics by cluster quality
• Output: outbreak-related topic time series
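The per-time-step loop above can be illustrated with a deliberately simple stand-in. The slides do not name the clustering algorithm or the quality measure, so this sketch swaps in a greedy single-pass clustering on Jaccard similarity and uses a minimum cluster size as a crude quality filter; `threshold`, `min_size`, and the sample tweets are assumptions chosen for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(docs, threshold=0.3):
    """Single-pass clustering: attach each document to the first cluster
    whose seed document is similar enough, else start a new cluster."""
    clusters = []  # each cluster is a list of token sets; index 0 is the seed
    for tokens in docs:
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= threshold:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def topic_time_series(tweets_by_day, min_size=2):
    """For each day, cluster the tweets and count those that fall in
    clusters with at least min_size members (the quality filter)."""
    series = {}
    for day, texts in sorted(tweets_by_day.items()):
        docs = [set(text.lower().split()) for text in texts]
        kept = [c for c in greedy_clusters(docs) if len(c) >= min_size]
        series[day] = sum(len(c) for c in kept)
    return series


# Made-up tweets for illustration only.
tweets_by_day = {
    "2011-09-07": [
        "cholera outbreak in camp",
        "cholera outbreak reported in camp",
        "love in the time of cholera",
    ],
    "2011-09-08": ["new cholera outbreak in camp"],
}
series = topic_time_series(tweets_by_day)
```

On the first day the two outbreak tweets form one cluster and the book title is left alone in a singleton that the filter drops; on the second day the only tweet forms a singleton, so the series value is zero.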
Outbreak Negative Terms
Outbreak Topic Dynamics
[Figure panels dated 07 Sep 2011 and 08 Sep 2011]
Diversity Metric
• Refined Jaccard Index (RDJ-index)
– average Jaccard similarity of all object pairs
• Note: lower RDJ corresponds to higher diversity
• Problem: “All-Pair comparison”
• Solution: Estimation algorithms with probabilistic
error bound guarantees
[Deng et al., 2012]
$$RDJ = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} JS(O_i, O_j), \qquad JS(O_i, O_j) = \frac{|O_i \cap O_j|}{|O_i \cup O_j|}$$
where JS is the Jaccard similarity
• Objects compared: (1) top-k terms, (2) entities
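The exact all-pairs computation of the RDJ-index can be written down directly from its definition. The sketch below assumes each object (for example, a tweet) is already represented as a set of terms or entities; treating two empty sets as having similarity 0 is a convention chosen here, not something the slides specify.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rdj(objects) -> float:
    """Average Jaccard similarity over all unordered pairs of objects."""
    n = len(objects)
    if n < 2:
        raise ValueError("RDJ needs at least two objects")
    total = sum(jaccard(a, b) for a, b in combinations(objects, 2))
    return 2.0 * total / (n * (n - 1))


# Made-up term sets for illustration.
tweets = [
    {"cholera", "outbreak", "kenya"},
    {"cholera", "outbreak", "camp"},
    {"love", "time", "cholera"},
]
score = rdj(tweets)  # pair similarities 0.5, 0.2, 0.2 average to 0.3
```

The loop visits n(n-1)/2 pairs, which is the "all-pair comparison" cost that makes the exact value expensive on large tweet collections.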
Estimate Algorithms
• Input: relative error ε, accuracy confidence δ
• Output: estimated RDJ value
• Algorithms: SampleDJ, TrackDJ (claims and
proofs in [Deng et al., 2012])
$$\Pr\left( \frac{|\widehat{RDJ} - RDJ|}{RDJ} > \varepsilon \right) < \delta$$
(slide provided by authors)
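To show why sampling avoids the all-pairs cost, here is a plain Monte Carlo estimator that averages the Jaccard similarity over randomly drawn pairs. It is not SampleDJ or TrackDJ, and it makes no claim about the (ε, δ) guarantee above; `num_pairs` and `seed` are hypothetical parameters introduced for this sketch.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sampled_rdj(objects, num_pairs: int, seed: int = 0) -> float:
    """Estimate RDJ by averaging Jaccard similarity over random pairs.

    Each draw picks two distinct indices uniformly at random, so the
    mean over draws is an unbiased estimate of the all-pairs average.
    """
    n = len(objects)
    if n < 2:
        raise ValueError("need at least two objects")
    if num_pairs < 1:
        raise ValueError("num_pairs must be positive")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct positions
        total += jaccard(objects[i], objects[j])
    return total / num_pairs


# Made-up term sets for illustration.
tweets = [
    {"cholera", "outbreak", "kenya"},
    {"cholera", "outbreak", "camp"},
    {"love", "time", "cholera"},
    {"cholera", "camp", "kenya"},
]
estimate = sampled_rdj(tweets, num_pairs=200)
```

The cost depends on `num_pairs` rather than on the square of the collection size; how many samples are needed for a given error bound is what the cited algorithms address.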
Temporal Diversity
• Here α weights the relative importance of the two metrics;
its value is determined empirically.
Experimental Settings
• Official outbreak reports
– ~3,000 ProMED-mail reports from 2011
• Twitter data
– ~1,200 health-related terms
– Over 112 million tweets from 2011
• Series of NLP tools including
– OpenNLP (tokenization, sentence splitting, POS
tagging)
– OpenCalais (named entity recognition)
– HeidelTime (temporal expression extraction)
Results
• Identified topics show similar
trends during the known time
periods of real-world outbreaks
• Diversity reflects how differently
language (i.e., terms and
locations) is used
• Div(entity) correlates highly
with topic dynamics for some
diseases, namely mumps, Ebola,
botulism and EHEC
• Div(term) correlates with
topic dynamics for cholera,
anthrax and rubella
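A correlation between a diversity series and a topic dynamics series could be computed along these lines. The slides do not say which correlation coefficient was used, so Pearson's r is an assumption here, and the two series are invented solely to exercise the function; they are not results from the study.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented values: series_b is an exact linear function of series_a.
series_a = [1, 2, 4, 8, 4, 2]
series_b = [0.9, 0.8, 0.6, 0.2, 0.6, 0.8]
r = pearson(series_a, series_b)
```

Both series must be aligned on the same time steps before the coefficient is computed.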
[Figure: topic over time and temporal diversity for cholera]
Conclusions
• Study of detecting real-world outbreaks in Twitter
• Proposed a method to compute temporal diversity
• Correlation analysis of temporal diversity and the
estimated magnitude of outbreaks
• Future work: improve diversity measures
1. new representations for tweets, e.g., using other types
of entities
2. a semantic-based similarity measure
References
• [Culotta, 2010] A. Culotta. Towards detecting influenza epidemics by analyzing twitter
messages. In Proceedings of the First Workshop on Social Media Analytics (SOMA’2010), 2010.
• [Diaz-Aviles et al., 2012] E. Diaz-Aviles, A. Stewart, E. Velasco, K. Denecke, and W. Nejdl.
Epidemic intelligence for the crowd, by the crowd. In Proceedings of International AAAI
Conference on Weblogs and Social Media (ICWSM’2012), 2012.
• [Kanhabua et al., 2012a] N. Kanhabua, S. Romano, and A. Stewart. Identifying relevant
temporal expressions for real-world events. In SIGIR 2012 Workshop on Time-aware
Information Access (TAIA’2012), 2012.
• [Kanhabua et al., 2012b] N. Kanhabua, S. Romano, A. Stewart, and W. Nejdl. Supporting
temporal analytics for health related events in microblogs. In Proceedings of CIKM’2012, 2012.
• [Lampos et al., 2011] V. Lampos and N. Cristianini. Nowcasting events from the social web with
statistical learning. ACM TIST, 3, 2011.
• [Paul et al., 2011] M. J. Paul and M. Dredze. You are what you tweet: Analyzing twitter for public
health. In Proceedings of International AAAI Conference on Weblogs and Social Media
(ICWSM’2011), 2011.
• [Ruiz et al., 2012] E. J. Ruiz, V. Hristidis, C. Castillo, A. Gionis, and A. Jaimes. Correlating
financial time series with micro-blogging activity. In Proceedings of WSDM’2012, 2012.
• [Sakaki et al., 2010] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users:
real-time event detection by social sensors. In Proceedings of WWW’2010, 2010.

 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 

Understanding the Diversity of Tweets in the Time of Outbreaks

  • 1. Understanding the Diversity of Tweets in the Time of Outbreaks Nattiya Kanhabua and Wolfgang Nejdl L3S Research Center Leibniz Universität Hannover, Germany http://www.L3S.de
  • 2. Search result from Google retrieved on 12 May 2013
  • 3. Search result from Google retrieved on 12 May 2013: Tweets in the Time of Outbreaks. Paper by Nattiya Kanhabua and Wolfgang Nejdl
  • 4. Motivation • Numerous works use Twitter to infer the existence and magnitude of real-world events in real-time – Earthquake [Sakaki et al., 2010] – Predicting financial time series [Ruiz et al., 2012] – Influenza epidemics [Culotta, 2010; Lampos et al., 2011; Paul et al., 2011] • In the medical domain, there has been a surge in detecting health related tweets for early warning – Allow a rapid response from authorities [Diaz-Aviles et al., 2012]
  • 5. Health related tweets • User status updates or news related to public health are common in Twitter – I have the mumps...am I alone? – my baby girl has a Gastroenteritis so great!! Please do not give it to meee – #Cholera breaks out in #Dadaab refugee camp in #Kenya http://t.co/.... – As many as 16 people have been found infected with Anthrax in Shahjadpur upazila of the Sirajganj district in Bangladesh.
  • 7. Challenge I. Noisy data • Ambiguity – having several meanings – used in different contexts • Incompleteness – missing or under-reported events – data processing errors
  • 8. Challenge I. Noisy data • Ambiguity – having several meanings – used in different contexts • Incompleteness – missing or under-reported events – data processing errors
      Category | Example tweet
      Literature | A two hour train journey, Love In the Time of Cholera ...
      Music | Dengue Fever’s “Uku,” Mixed by Paul Dreux Smith Universal Audio...
      Marketing | Exclusive distributor of high quality #HIV/AIDS Blood & Urine and #Hepatitis #Self -testers.
      General | Identification of genotype 4 Hepatitis E virus binding proteins on swine liver cells: Hepatitis E virus...
      Negative | i dont have sniffles and no real coughing..well its coughing but not like an influenza cough.
      Joke | Thought I had Bieber Fever. Ends up I just had a combo of the mumps, mono, measles & the hershey squ...
  • 9. Challenge II. Dynamics • Time – seasonal infectious diseases – rare and spontaneous outbreaks • Place – frequency and duration – levels of prevalence or severity
  • 10. Challenge II. Dynamics • Time – seasonal infectious diseases – rare and spontaneous outbreaks • Place – frequency and duration – levels of prevalence or severity [Rortais et al., 2010 in Journal of Food Research International]
  • 12. Challenge II. Dynamics [Emch et al., 2008 in International Journal of Health Geographics]
  • 13. Problem Statement • How to detect outbreaks for general diseases? – Previous works focus on a limited number of diseases, e.g., influenza or dengue, and rely on supervised learning • How to take into account temporal and spatial diversities for outbreak detection? – Previous works do not explicitly model the diversity dimension
  • 14. Contributions • We conduct the first study of temporal diversity in Twitter • A method to extract topic dynamics for outbreaks, used as an estimate of real-world statistics • A correlation analysis of temporal diversity and the estimated statistics for 14 outbreak ground truths
  • 15. System Framework • Part I. Ground truth creation – Official outbreak reports: World Health Organization (http://www.who.int) and ProMED-mail • Part II. Creating Twitter time series – (1) medical condition: disease name, synonyms, pathogens, symptoms – (2) location: geographic expressions, geo-location, or user profile; 3 levels: country, continent, latitude
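Part II of the framework can be pictured as counting matching tweets per day, per medical condition, and per location. The following is only a minimal sketch under stated assumptions: the tuple layout `(day, text, country)`, the function name `build_time_series`, and the substring matching are all hypothetical, and the toy tweets are made up. The slides do not describe the actual matching rules.

```python
from collections import Counter
from datetime import date

def build_time_series(tweets, condition_terms):
    """Count tweets per (condition term, country, day).

    `tweets` is an iterable of (day, text, country) tuples; this layout is
    an assumption for illustration, not the format used by the authors.
    """
    series = Counter()
    for day, text, country in tweets:
        lowered = text.lower()
        for term in condition_terms:
            # Naive substring match; a real system would also use
            # synonyms, pathogens and symptoms, as the slide notes.
            if term in lowered:
                series[(term, country, day)] += 1
    return series

# Made-up toy data for illustration only.
toy_tweets = [
    (date(2011, 9, 7), "#Cholera breaks out in refugee camp", "Kenya"),
    (date(2011, 9, 7), "Cholera cases rising again", "Kenya"),
    (date(2011, 9, 8), "I have the mumps...am I alone?", "USA"),
]
series = build_time_series(toy_tweets, ["cholera", "mumps"])
for key in sorted(series, key=lambda k: (k[2], k[0])):
    print(key, series[key])
```

Keying the counter by term, country and day keeps one time series per disease and location, which is the granularity the slide's "country" level suggests.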
  • 16. Ground Truths • Extract events in a pipeline fashion • Annotated documents – named entities (diseases, victims and locations) – temporal expressions – a set of sentences • Event e: (v, m, l, te) – who (victim v) was infected – what (disease m) causes – where (location l) – when (time te) [Figure: pipeline from an unstructured text collection through Text Annotation (sentence extraction, tokenization, part-of-speech tagging, temporal expression extraction, named entity recognition) to annotated documents, then Event Extraction (identifying relevant time, event aggregation) to event profiles for user browsing/retrieving] [Kanhabua et al., 2012a]
  • 17. Event Extraction • An event is a sentence containing two entities – (1) medical condition and (2) geographic expression – A minimum requirement by domain experts • A victim and the time of an event can be identified from the sentence itself, or its surrounding context • Output: a set of event candidates • Example: “Reported by World Health Organization (WHO) on 29 July 2012 about an ongoing Ebola outbreak in Uganda since the beginning of July 2012”
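The minimum requirement above (a sentence must name both a medical condition and a geographic expression) can be sketched as a simple filter. This is an illustrative sketch, not the authors' pipeline: the lexicons `MEDICAL_TERMS` and `LOCATIONS`, the regex-based sentence splitter, and the function names are assumptions. The slides use OpenNLP and OpenCalais for these steps.

```python
import re

# Hypothetical mini-lexicons for illustration; the real system relies on
# named entity recognition rather than fixed word lists.
MEDICAL_TERMS = {"ebola", "cholera", "anthrax", "mumps"}
LOCATIONS = {"uganda", "kenya", "bangladesh"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def event_candidates(text):
    """Return (sentence, diseases, locations) for sentences naming both."""
    candidates = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        diseases = sorted(tokens & MEDICAL_TERMS)
        places = sorted(tokens & LOCATIONS)
        if diseases and places:  # both entity types are required
            candidates.append((sentence, diseases, places))
    return candidates

report = ("WHO reported an ongoing Ebola outbreak in Uganda. "
          "Health workers were deployed last week.")
for sentence, diseases, places in event_candidates(report):
    print(diseases, places, "<-", sentence)
```

Only the first sentence of the toy report qualifies, because the second mentions neither a disease nor a location. Victim and time extraction are left out of this sketch.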
  • 18. List of 14 Outbreaks
  • 21. Identifying Topic Dynamics • Input: time series data of relevant tweets • For each time tk, unsupervised clustering by topic • Filter result topics by cluster quality • Output: outbreak-related topic time series
  • 23. Outbreak Topic Dynamics • Input: time series data of relevant tweets • For each time tk, unsupervised clustering by topic • Filter result topics by cluster quality • Output: outbreak-related topic time series [Figure: example topics for 07 Sep 2011 and 08 Sep 2011]
  • 24. Diversity Metric • Refined Jaccard Index (RDJ-index) – average Jaccard similarity of all object pairs: $RDJ = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} JS(O_i, O_j)$, where $JS(O_i, O_j) = \frac{|O_i \cap O_j|}{|O_i \cup O_j|}$ is the Jaccard similarity • Note: lower RDJ corresponds to higher diversity • Problem: “All-Pair comparison” • Solution: Estimation algorithms with probabilistic error bound guarantees [Deng et al., 2012]
  • 25. Diversity Metric (continued) • The objects compared by the RDJ-index are represented by (1) top-k terms or (2) entities
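The RDJ-index defined on the Diversity Metric slides can be computed directly by averaging the Jaccard similarity over all unordered pairs. The sketch below is the exact all-pairs version, which is the quadratic computation the slides flag as the problem, and not the estimation algorithms of [Deng et al., 2012]. The function names and the toy term sets are assumptions for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rdj(objects):
    """Average Jaccard similarity over all unordered pairs of objects."""
    n = len(objects)
    if n < 2:
        raise ValueError("RDJ needs at least two objects")
    total = sum(jaccard(x, y) for x, y in combinations(objects, 2))
    return 2.0 * total / (n * (n - 1))

# Made-up term sets standing in for tweets from one time window.
tweets = [
    {"cholera", "outbreak", "kenya"},
    {"cholera", "outbreak", "camp"},
    {"love", "time", "cholera"},
]
print(round(rdj(tweets), 3))
```

The three toy sets have pairwise similarities 0.5, 0.2 and 0.2, so the RDJ is 0.3. A collection of near-identical tweets would score close to 1, meaning low diversity.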
  • 26. Estimate Algorithms • Input: relative error ε, accuracy confidence δ • Output: estimated RDJ value $\widehat{RDJ}$ such that $\Pr\left( \frac{|\widehat{RDJ} - RDJ|}{RDJ} > \varepsilon \right) < \delta$ • Algorithms: SampleDJ, TrackDJ (claims and proofs in [Deng et al., 2012]) (slide provided by authors)
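To give a feel for why estimation avoids the all-pairs cost, the sketch below averages the Jaccard similarity over a fixed number of uniformly sampled pairs. This is a naive Monte Carlo stand-in, not SampleDJ or TrackDJ, and it carries none of the (ε, δ) guarantees proved in [Deng et al., 2012]. The function name `sampled_rdj` and its parameters are hypothetical.

```python
import random

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def sampled_rdj(objects, num_pairs, seed=0):
    """Estimate RDJ from `num_pairs` uniformly sampled pairs of distinct objects.

    Plain uniform sampling: the cost is O(num_pairs) similarity
    computations instead of O(n^2), but no error bound is claimed here.
    """
    n = len(objects)
    if n < 2:
        raise ValueError("RDJ needs at least two objects")
    if num_pairs < 1:
        raise ValueError("num_pairs must be positive")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += jaccard(objects[i], objects[j])
    return total / num_pairs

# Made-up data: 1,000 small term sets drawn from a 20-word vocabulary.
rng = random.Random(42)
vocab = [f"w{k}" for k in range(20)]
objects = [set(rng.sample(vocab, 5)) for _ in range(1000)]
print(sampled_rdj(objects, num_pairs=2000))
```

With 1,000 objects the exact computation needs 499,500 pair comparisons, while the sample uses 2,000. How many samples are needed for a given ε and δ is exactly what the cited algorithms address.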
  • 27. Temporal Diversity • where α underlines the importance of both metrics. The value will be empirically determined.
  • 29. Experimental Settings • Official outbreak reports – ~3,000 ProMED-mail reports from 2011 • Twitter data – ~1,200 health-related terms – Over 112 million tweets from 2011 • Series of NLP tools including – OpenNLP (tokenization, sentence splitting, POS tagging) – OpenCalais (named entity recognition) – HeidelTime (temporal expression extraction)
  • 30. Results • Identified topics show similar trends during the known time periods of real-world outbreaks • Diversity reflects how differently the language (i.e., terms and locations) is used • Div(entity) highly correlates with topic dynamics for some diseases, namely mumps, Ebola, botulism and EHEC • Div(term) shows correlation with topic dynamics for cholera, anthrax and rubella [Figure: topic over time and temporal diversity for cholera]
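The correlation between a diversity series and a topic time series can be illustrated with a Pearson coefficient. The slides do not state which correlation measure was used, so Pearson is an assumption here, and the two toy series are made up rather than taken from the results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up daily values for illustration only.
topic_volume = [3, 5, 12, 30, 22, 9, 4]                    # tweets in an outbreak topic
rdj_values = [0.10, 0.12, 0.25, 0.48, 0.40, 0.20, 0.11]    # RDJ per day

print(round(pearson(topic_volume, rdj_values), 3))
```

Since lower RDJ corresponds to higher diversity, a positive coefficient in this toy setup would mean that tweets become more similar to each other as the topic volume grows.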
  • 31. Conclusions • Study of detecting real-world outbreaks in Twitter • Proposed method to compute temporal diversity • Correlation analysis of temporal diversity and estimated magnitude of outbreaks • Future work: improve diversity measures – (1) new representations for tweets, e.g., using other types of entities – (2) employ a semantic-based similarity measurement
  • 32. References
      • [Culotta, 2010] A. Culotta. Towards detecting influenza epidemics by analyzing twitter messages. In Proceedings of the First Workshop on Social Media Analytics (SOMA’2010), 2010.
      • [Diaz-Aviles et al., 2012] E. Diaz-Aviles, A. Stewart, E. Velasco, K. Denecke, and W. Nejdl. Epidemic intelligence for the crowd, by the crowd. In Proceedings of International AAAI Conference on Weblogs and Social Media (ICWSM’2012), 2012.
      • [Kanhabua et al., 2012a] N. Kanhabua, S. Romano, and A. Stewart. Identifying Relevant Temporal Expressions for Real-world Events. In SIGIR 2012 Workshop on Time-aware Information Access (TAIA’2012), 2012.
      • [Kanhabua et al., 2012b] N. Kanhabua, S. Romano, A. Stewart, and W. Nejdl. Supporting Temporal Analytics for Health Related Events in Microblogs. In Proceedings of CIKM’2012, 2012.
      • [Lampos et al., 2011] V. Lampos and N. Cristianini. Nowcasting events from the social web with statistical learning. ACM TIST, 3, 2011.
      • [Paul et al., 2011] M. J. Paul and M. Dredze. You are what you tweet: Analyzing twitter for public health. In Proceedings of International AAAI Conference on Weblogs and Social Media (ICWSM’2011), 2011.
      • [Ruiz et al., 2012] E. J. Ruiz, V. Hristidis, C. Castillo, A. Gionis, and A. Jaimes. Correlating financial time series with micro-blogging activity. In Proceedings of WSDM’2012, 2012.
      • [Sakaki et al., 2010] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of WWW’2010, 2010.