In this talk, we present an event-based Epidemic Intelligence (EI) system framework leveraging social media data, e.g., Twitter messages (or tweets), to provide public health officials with the tools needed to survey and sift through relevant information, namely disease outbreak events. There exist three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation producing reliable warnings based on observed term frequency changes in the filtered messages, and 3) providing search and recommendation functionalities to domain experts for better assessment of the potential outbreak threats associated with the generated signals. We outline possible approaches to these challenges and discuss areas where further research is required. The objective is to provide guidance for similar endeavors, and to give prospective event-based Epidemic Intelligence system builders a more realistic view of the benefits and issues of social media stream analysis.
Tackling mosquitoes together - Preparing to respond to the threat of exotic m... (Dr Cameron Webb)
Authorities need to be better prepared to respond to the introduction of exotic mosquitoes, such as Aedes aegypti and Aedes albopictus, to NSW. This presentation summarizes a capacity-building exercise, including workshops, field exercises and community surveys, undertaken to better understand how local authorities are prepared to respond to these exotic mosquito threats. The presentation was made to the 13th Mosquito Control Association of Australia at Kingscliff, 2-5 September 2018.
Summertime Analytics: Predicting E. coli and West Nile Virus (Domino Data Lab)
Lake Michigan and outdoor recreation are enjoyable aspects of summers in Chicago, but they can come with the risk of E. coli in Lake Michigan or West Nile Virus from mosquitoes. This summer, the City of Chicago launched two new predictive analytics projects to forecast these risks and to proactively limit them. Members of the research team, Gene Leynes and Nick Lucius, discuss the projects and how they're being used as part of city operations.
Estimating Query Difficulty for News Prediction Retrieval (poster presentation, Nattiya Kanhabua)
News prediction retrieval has recently emerged as the task of retrieving predictions related to a given news story (or a query). Predictions are defined as sentences containing time references to future events. Such future-related information is crucially important for understanding the temporal development of news stories, as well as for strategic planning and risk management. Previous work has been shown to retrieve a significant number of relevant predictions; however, only certain news topics achieve good retrieval effectiveness. In this paper, we study how to determine the difficulty of retrieving predictions for a given news story. More precisely, we address the query difficulty estimation problem for news prediction retrieval. We propose different entity-based predictors used for classifying queries into two classes, namely Easy and Difficult. Our prediction model is based on a machine learning approach. Through experiments on real-world data, we show that our proposed approach can predict query difficulty with high accuracy.
Leveraging Learning To Rank in an Optimization Framework for Timeline Summari... (Nattiya Kanhabua)
With the tremendous amount of news published on the Web every day, helping users explore news events on a given topic of interest is an acute problem. Timeline summaries have recently emerged as a simple and effective solution for users to navigate through temporally related news events. In this paper, we propose an optimization framework and demonstrate the use of Learning To Rank (LTR) to automatically construct timeline summaries from Web news articles. Experimental evaluations show that our approach outperforms existing solutions in producing high quality timeline summaries.
Identifying Relevant Temporal Expressions for Real-world Events (Nattiya Kanhabua)
Event detection is an interesting task for many applications, for instance surveillance, scientific discovery, and Topic Detection and Tracking. Numerous works have focused on detecting events from unstructured text and determining what features constitute an event, e.g., key terms or named entities. Although most works are able to find interesting times associated with an event, there is a lack of research on determining the relevance of time for an event. In this paper, we propose a method for automatically extracting real-world events from unstructured text documents. In addition, we propose a machine learning approach to identifying relevant time (i.e., temporal expressions) for the extracted events using three classes of features: sentence-based, document-based and corpus-specific features. Through experiments using real-world data and 3,500 manually judged relevance pairs, we show that our proposed approach is able to identify the relevant time of events with good accuracy.
On the Value of Temporal Anchor Texts in Wikipedia (Nattiya Kanhabua)
Wikipedia has become a widely accepted reference point for information of all kinds: real-world events (e.g., natural disasters, man-made incidents, and political events) as well as specific entities like politicians, celebrities, and entities involved in an event. Due to its open construction and negotiation, Wikipedia is an important new cultural and societal phenomenon, and the content of Wikipedia articles is a valuable source for different applications. For instance, the edit history and view logs of Wikipedia can be leveraged for detecting an event and its associated entities. In this study, we analyze temporal anchor texts extracted from the edit history. We propose a model for Wikipedia and anchor texts viewed as a temporal resource, and a probabilistic method for ranking temporal anchor texts. Our preliminary results show that relevant anchor texts are composed of evolving information (e.g., changes of names and semantic roles, as well as evolving context) that reflects societal trends and perceptions, making them candidates for capturing entity evolution.
Humans are very effective at remembering by abstraction, pattern exploitation, or contextualization. At the same time, humans are also capable of forgetting irrelevant details, a capability that plays an important role in the human brain, helping us to focus on relevant things instead of drowning in details by remembering everything. The research question that we address in this paper is: can we learn from human remembering and forgetting in order to develop more advanced preservation technology? In particular, we aim to study how a managed or controlled form of forgetting can play a role in digital preservation, including personal and organizational archives as well as collective memories. Our research goal is twofold: 1) to establish effective preservation for more concise and accessible digital memories, and 2) to enable the easier and wider adoption of preservation technology. The concept of managed forgetting is discussed in more detail in the research work of the European project ForgetIT, which investigates the proposed concept by means of an integrated information and preservation management approach.
Dynamics of Web: Analysis and Implications from Search Perspective (Nattiya Kanhabua)
The dynamics of the Web and their implications for various components of search systems have attracted considerable attention in the last decade. This course first aims to introduce students to the general and wide topic of Web evolution, and then pinpoints a number of issues related to temporal aspects of search and IR. We plan to start with an overview of seminal works that shed light on the evolution of the Web over time. Next, we will focus on the impacts of this evolution on search, concentrating in particular on the indexing of versioned document collections and time-aware retrieval and ranking. We will discuss the evolution of search results and its effects on caching, and wrap up the course with a review of some recent approaches that aim to predict and search the future.
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ... (Nattiya Kanhabua)
In human memory, forgetting plays a crucial role in focusing on important things and neglecting irrelevant details. In digital memories, the idea of systematic forgetting has found little attention so far. At first glance, forgetting seems to contradict the purpose of archival and preservation. However, we are currently facing a tremendous growth in volumes of digital content. Thus, it becomes ever more important to focus, while forgetting irrelevant details, redundancies and noise. This holds true for better organizing the information space as well as in preservation management for making and revisiting decisions on what to keep. Therefore, we propose the introduction of the concept of managed forgetting as part of a joint information management and preservation management process in digital memories. Managed forgetting models resource selection as a function of attention and significance dynamics. Based on dynamic, multidimensional information value assessment, it identifies information objects (e.g., documents or images) of decreasing importance and/or topicality and triggers forgetting actions. Those actions include a variety of options, namely aggregation and summarization, revised search and ranking behavior, elimination of redundancy, and finally also deletion. In this paper, we present our vision for managed forgetting, discuss the challenges as well as our first ideas for its introduction, and present a case study for its motivation.
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification (Nattiya Kanhabua)
Search result diversification is a common technique for tackling ambiguous and multi-faceted queries by maximizing the coverage of query aspects or subtopics in a result list. In some special cases, the subtopics associated with such queries can be temporally ambiguous; for instance, the query US Open is more likely to target the tennis tournament in September and the golf tournament in June. More precisely, users' search intent can be identified by the popularity of a subtopic with respect to the time when the query is issued. In this paper, we study search result diversification for time-sensitive queries, where the temporal dynamics of query subtopics are explicitly determined and modeled into result diversification. Unlike previous work that, in general, considered only static subtopics, we leverage dynamic subtopics by analyzing two data sources (i.e., query logs and a document collection), which provide insights from different perspectives into how query subtopics change over time. Moreover, we propose novel time-aware diversification methods that leverage the identified dynamic subtopics. A key idea is to re-rank search results based on the freshness and popularity of subtopics. Our experimental results show that the proposed methods can significantly improve diversity and relevance effectiveness for time-sensitive queries in comparison with state-of-the-art methods.
We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access it. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions of future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning-to-rank approach based on four classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus of 1.8 million news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.
Concise Preservation by Combining Managed Forgetting and Contextualized Remem... (Nattiya Kanhabua)
With the growing volumes of and reliance on digital content, there is a clear need for better information access solutions that keep relevant information accessible and usable in the long term. Inspired by the role of forgetting in the human brain, we envision a concept of managed forgetting for systematically dealing with information that progressively ceases to be important as well as with redundant information. Although inspired by human memory, managed forgetting is meant to complement rather than copy human remembering and forgetting. It can be regarded as a function of attention and significance dynamics relying on multi-faceted information assessment. This talk introduces our vision for managed forgetting on the conceptual level as part of an Integrated Cognitive Framework for Time-aware Information Access. We discuss relevant research and application aspects for managed forgetting. Finally, we present our first results and point out issues where further research is required.
Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously kept up-to-date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of the Wikipedia articles of the involved or affected entities. In this paper, we present Wikipedia Event Reporter, a web-based system that supports the entity-centric, temporal analytics of event-related information in Wikipedia by analyzing the whole history of article updates. For a given entity, the system first identifies peaks of update activity for the entity using burst detection and automatically extracts event-related updates using a machine-learning approach. Further, the system determines distinct events through the clustering of updates by exploiting different types of information such as update time, textual similarity, and the position of the updates within an article. Finally, the system generates a meaningful temporal summarization of event-related updates and automatically annotates the identified events in a timeline.
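The burst-detection step for update activity can be illustrated with a minimal moving-average sketch. This is a simplification under assumptions: the window size, threshold, and daily edit counts below are hypothetical, and the actual system likely uses a more refined burst-detection algorithm.

```python
def detect_bursts(counts, window=3, threshold=2.0):
    """Flag time points whose count exceeds `threshold` times the mean
    of the preceding `window` points (a moving-average baseline)."""
    bursts = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] > threshold * baseline:
            bursts.append(i)
    return bursts

# Hypothetical daily update counts for one entity's article;
# the spike on day 4 is flagged as a burst:
edits = [2, 3, 2, 3, 15, 4, 2]
detect_bursts(edits)
```

The flagged indices would then seed the clustering of nearby event-related updates into distinct events.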
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst... (Nattiya Kanhabua)
Going beyond its role as an encyclopedia, Wikipedia has become a global memory place for high-impact events, such as natural disasters and man-made incidents, thus influencing collective memory, i.e., the way we remember the past. Due to the importance of collective memory for framing the assessment of new situations, our actions and our value systems, its open construction and negotiation in Wikipedia is an important new cultural and societal phenomenon. The analysis of this phenomenon not only promises new insights into collective memory; it is also an important foundation for technology that more effectively complements the processes of human forgetting and remembering and better enables us to learn from the past. In this paper, we analyse the long-term dynamics of Wikipedia as a global memory place for high-impact events. This complements existing work analysing the collective memory negotiation and construction process in Wikipedia directly following an event. In more detail, we are interested in catalysts for reviving memories, i.e., in the fuel that keeps memories of past events alive, interrupting the general trend towards fast forgetting. For this purpose, we study the triggers of revisiting behavior for a large set of event pages by exploiting page views and time series analysis, and identify the most important catalyst features.
Understanding the Diversity of Tweets in the Time of Outbreaks (Nattiya Kanhabua)
A microblogging service like Twitter continues to surge in importance as a means of sharing information in social networks. In the medical domain, several works have shown the potential of detecting public health events (i.e., infectious disease outbreaks) using Twitter messages or tweets. Given its real-time nature, Twitter can enhance early outbreak warning for public health authorities so that a rapid response can take place. Most previous works on detecting outbreaks in Twitter simply analyze tweets matching disease names and/or locations of interest. However, the effectiveness of such methods is limited for two main reasons. First, disease names are highly ambiguous, i.e., they may refer to slang or non-health-related contexts. Second, the characteristics of infectious diseases are highly dynamic in time and place: they are strongly time-dependent and vary greatly among different regions. In this paper, we propose to analyze the temporal diversity of tweets during the known periods of real-world outbreaks in order to gain insight into a temporary focus on specific events. More precisely, our objective is to understand whether, and to what extent, the temporal diversity of tweets can be used as an indicator of outbreak events. We employ an efficient algorithm based on sampling to compute the diversity statistics of tweets at a particular time. We then conduct experiments correlating temporal diversity with the estimated event magnitude of 14 real-world outbreak events manually created as ground truth. Our analysis shows that correlation results vary among different outbreaks, which can reflect the characteristics (severity and duration) of the outbreaks.
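A sampling-based diversity statistic of the kind described above can be sketched as follows, here using the average pairwise Jaccard distance between tweets (as term sets) estimated from a sample of pairs rather than all O(n^2) comparisons. The distance measure, tokenization, and sample size are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def sampled_diversity(tweets, pairs=1000, seed=0):
    """Estimate average pairwise Jaccard distance between tweets
    in a time window by sampling `pairs` random tweet pairs."""
    rng = random.Random(seed)
    sets = [set(t.lower().split()) for t in tweets]
    total = 0.0
    for _ in range(pairs):
        a, b = rng.sample(range(len(sets)), 2)  # two distinct tweets
        total += 1.0 - len(sets[a] & sets[b]) / len(sets[a] | sets[b])
    return total / pairs

# Identical tweets give zero diversity; completely disjoint tweets
# give a diversity of one:
sampled_diversity(["flu outbreak in town", "new flu cases reported"])
```

A drop in this statistic during a time window would indicate a temporary focus on a specific event, which is the signal correlated with outbreak magnitude in the analysis above.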
Improving Temporal Language Models For Determining Time of Non-Timestamped Do... (Nattiya Kanhabua)
Taking the temporal dimension into account in searching, i.e., using the time of content creation as part of the search condition, is gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (i.e., the time of creation of the contents) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to the previous models.
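The basic idea behind temporal language models, assigning a document to the time partition whose language model best explains it, can be sketched as follows. The additive smoothing, the toy corpora, and the vocabulary size are simplifications for illustration; the enhancements described above refine this basic scheme.

```python
from collections import Counter
from math import log

def timestamp_score(doc_terms, partition_counts, vocab_size, mu=1.0):
    """Log-likelihood of a document under one time partition's unigram
    language model, with additive (Laplace-style) smoothing."""
    total = sum(partition_counts.values())
    return sum(
        log((partition_counts.get(t, 0) + mu) / (total + mu * vocab_size))
        for t in doc_terms
    )

def guess_timestamp(doc_terms, partitions, vocab_size):
    """Pick the time partition whose model best explains the document."""
    return max(partitions,
               key=lambda p: timestamp_score(doc_terms, partitions[p], vocab_size))

# Toy term counts for two time partitions (hypothetical data):
partitions = {
    "2004": Counter(["tsunami", "aid", "relief", "tsunami"]),
    "2010": Counter(["earthquake", "haiti", "relief", "earthquake"]),
}
guess_timestamp(["earthquake", "relief"], partitions, vocab_size=6)
```

A non-timestamped document is thus dated to the partition whose vocabulary usage it most resembles.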
Determining Time of Queries for Re-ranking Search Results (Nattiya Kanhabua)
Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., their relevance is dependent on time, and temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type, i.e., queries that comprise only keywords, whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. We then show how to increase retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments, we show that our proposed approaches improve retrieval effectiveness.
Exploiting Time-based Synonyms in Searching Document Archives (Nattiya Kanhabua)
Query expansion of named entities can be employed to increase retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
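Using the two synonym types at query time can be sketched as below. The data structures, the validity periods, and the example entries are purely illustrative assumptions; the paper derives such synonyms and their time periods from Wikipedia's edit history.

```python
def expand_query(terms, synonyms, query_time=None):
    """Expand entity terms: time-independent synonyms are always added;
    a time-dependent synonym is added only if the query time falls
    within its validity period."""
    expanded = list(terms)
    for t in terms:
        for syn, period in synonyms.get(t, []):
            if period is None:  # time-independent synonym
                expanded.append(syn)
            elif query_time is not None and period[0] <= query_time <= period[1]:
                expanded.append(syn)
    return expanded

# Hypothetical synonym table; None marks a time-independent synonym,
# a (start, end) tuple marks a time-dependent one:
synonyms = {"Pope Benedict XVI": [("Joseph Ratzinger", None),
                                  ("the Pope", (2005, 2013))]}
expand_query(["Pope Benedict XVI"], synonyms, query_time=2007)
```

A query issued with temporal criteria outside a synonym's validity period would simply skip that expansion, which is the distinction the abstract draws between ordinary and temporal search.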
Searching the Temporal Web: Challenges and Current Approaches (Nattiya Kanhabua)
This talk gives a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term-matching alone can give unsatisfactory results when searching the temporal web. The reason for this is multifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query.
Several major challenges in searching the temporal web will be discussed, namely: 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? Finally, we will present current approaches to the addressed problems as well as outline directions for future research.
We address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term-matching alone can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions are different time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim at improving retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) performing an analysis of temporal queries, and 3) explicitly modeling the time dimension in retrieval and ranking.
Leveraging the time dimension in ranking can improve the retrieval effectiveness if information about the creation or publication time of documents is available. We analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models for determining the time of implicit temporal queries, and the determined time is used for re-ranking search results in order to improve the retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms.
In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be performed to improve the overall performance. When the time dimension is incorporated into ranking, documents will be ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers the time uncertainty, and improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve the retrieval effectiveness in searching temporal document collections.
Why Is It Difficult to Detect Outbreaks in Twitter? (Nattiya Kanhabua)
In this paper, we present an event-based Epidemic Intelligence (EI) system framework leveraging social media data, e.g., Twitter messages (or tweets), to provide public health officials with the tools needed to survey and sift through relevant information, namely disease outbreak events. There exist three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation producing reliable warnings based on observed term frequency changes in the filtered messages, and 3) providing search and recommendation functionalities to domain experts for better assessment of the potential outbreak threats associated with the generated signals. We outline possible approaches to these challenges and discuss areas where further research is required. The aim of this paper is to provide guidance for similar endeavors, and to give prospective event-based Epidemic Intelligence system builders a more realistic view of the benefits and issues of social media stream analysis.
Exploiting temporal information in retrieval of archived documents (doctoral ...) (Nattiya Kanhabua)
In the text retrieval community, many researchers have demonstrated high-quality search over a current snapshot of the Web. However, only a small number have demonstrated high-quality search in a long-term archival domain, where documents are preserved for a long time, i.e., ten years or more. In such a domain, a search application is relevant not only for archivists or historians, but also in the context of national library and enterprise search (searching document repositories, emails, etc.). In the rest of this paper, we explain three problems of searching document archives and propose possible approaches to solving them. Our main research question is: how can we improve the quality of search in a document archive using temporal information?
Temporal Web Dynamics and Implications for Information Retrieval (Nattiya Kanhabua)
In this talk, we will give a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term-matching alone can give unsatisfactory results when searching the temporal web. The reason for this is multifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query.
Several major challenges in searching the temporal web will be discussed, namely: 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? Finally, we will present current approaches to the addressed problems as well as outline directions for future research.
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation) – Nattiya Kanhabua
Retrieval effectiveness for temporal queries can be improved by taking the time dimension into account. Existing temporal ranking models follow one of two main approaches: 1) a mixture model linearly combining textual similarity and temporal similarity, and 2) a probabilistic model generating a query from the textual and temporal parts of a document independently. In this paper, we propose a novel time-aware ranking model based on learning-to-rank techniques. We employ two classes of features for learning a ranking model, entity-based and temporal features, which are derived from annotation data. Entity-based features aim at capturing the semantic similarity between a query and a document, whereas temporal features measure their temporal similarity. Through extensive experiments we show that our ranking model significantly improves retrieval effectiveness over existing time-aware ranking models.
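To make the first of the two existing approaches concrete, here is a minimal sketch of a mixture model that linearly combines textual and temporal similarity. The λ weight and the per-document scores are invented for illustration; the paper's own model is learned, not hand-weighted.

```python
def mixture_score(text_sim, time_sim, lam=0.5):
    """Linear mixture of textual and temporal similarity."""
    return lam * text_sim + (1 - lam) * time_sim

def rank(docs, lam=0.5):
    """docs: list of (doc_id, text_sim, time_sim); returns ids, best first."""
    return [d for d, _, _ in
            sorted(docs, key=lambda x: -mixture_score(x[1], x[2], lam))]

# Hypothetical scores: d2 wins because it balances both similarities.
docs = [("d1", 0.9, 0.1), ("d2", 0.6, 0.8), ("d3", 0.4, 0.4)]
print(rank(docs))  # ['d2', 'd1', 'd3']
```

A learning-to-rank model, as proposed in the paper, would instead learn how to combine many such features from training data.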
Dr. Bryan Lewis and Dr. Madhav Marathe (both at Virginia Tech) will present a data-driven multi-scale approach for modeling the Ebola epidemic in West Africa. They will discuss how the models and tools were used to study a number of important analytical questions, such as: (i) computing weekly forecasts, (ii) optimally placing emergency treatment units and, more generally, health care facilities, and (iii) carrying out a comprehensive counter-factual analysis related to the allocation of scarce pharmaceutical and non-pharmaceutical resources. The role of big data and behavioral adaptation in developing the computational models will be highlighted.
Natural language processing (NLP) for mining online health-related data, a talk given at the International Society of Pharmacovigilance tutorial on Pharmacovigilance and Social Media in 2017.
Listen to this recording by IFLA's ENSULIB standing committee to learn how libraries are working at the forefront of citizen science; the connection between NASA climate change science, citizen science observations, and mosquito-borne disease; how the international GLOBE Mission Mosquito citizen science campaign is providing a common language and approach for meeting the global challenge of ensuring good health for all from mosquito-borne diseases; and examples of resources and partnerships that public, academic, and research libraries can leverage.
Humans are very effective at remembering through abstraction, pattern exploitation, or contextualization. On the other hand, humans are also capable of forgetting irrelevant details, an important function of the human brain that helps us focus on relevant things instead of drowning in detail by remembering everything. The research question we address in this paper is: Can we learn from human remembering and forgetting in order to develop more advanced preservation technology? In particular, we aim to study how a managed or controlled form of forgetting can play a role in digital preservation, including personal and organizational archives as well as collective memories. Our research goal is twofold: 1) to establish effective preservation for more concise and accessible digital memories, and 2) to enable the easier and wider adoption of preservation technology. The concept of managed forgetting is discussed in more detail in the research work of the European project ForgetIT, which investigates the proposed concept by means of an integrated information and preservation management approach.
Dynamics of Web: Analysis and Implications from Search Perspective – Nattiya Kanhabua
The dynamicity of the Web and its implications for various components of search systems have attracted considerable attention in the last decade. This course aims, in the first place, to introduce students to the general and wide topic of Web evolution, and then to pinpoint a number of issues related to temporal aspects of search and IR. We plan to start with an overview of seminal works that shed light on the evolution of the Web over time. Next, we will focus on the impact of this evolution on search, concentrating on the indexing of versioned document collections and on time-aware retrieval and ranking. We will discuss the evolution of search results and its effects on caching, and wrap up the course with a review of some recent approaches that aim to predict and search the future!
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ... – Nattiya Kanhabua
In human memory, forgetting plays a crucial role in focusing on important things and neglecting irrelevant details. In digital memories, the idea of systematic forgetting has so far received little attention. At first glance, forgetting seems to contradict the purpose of archival and preservation. However, we are currently facing a tremendous growth in volumes of digital content. Thus, it becomes ever more important to focus, while forgetting irrelevant details, redundancies, and noise. This holds true for better organizing the information space as well as for preservation management, for making and revisiting decisions on what to keep. Therefore, we propose the introduction of the concept of managed forgetting as part of a joint information management and preservation management process in digital memories. Managed forgetting models resource selection as a function of attention and significance dynamics. Based on dynamic, multidimensional information value assessment, it identifies information objects, e.g., documents or images, of decreasing importance and/or topicality and triggers forgetting actions. Those actions include a variety of options, namely aggregation and summarization, revised search and ranking behavior, elimination of redundancy, and finally, also deletion. In this paper, we present our vision for managed forgetting, discuss the challenges as well as our first ideas for its introduction, and present a case study for its motivation.
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification – Nattiya Kanhabua
Search result diversification is a common technique for tackling ambiguous and multi-faceted queries by maximizing the coverage of query aspects or subtopics in a result list. In some cases, the subtopics associated with such queries are temporally ambiguous; for instance, for the query US Open, users are more likely to be targeting the tennis open in September and the golf tournament in June. More precisely, users' search intent can be identified by the popularity of a subtopic with respect to the time when the query is issued. In this paper, we study search result diversification for time-sensitive queries, where the temporal dynamics of query subtopics are explicitly determined and modeled into result diversification. Unlike the aforementioned work, which in general considered only static subtopics, we leverage dynamic subtopics by analyzing two data sources (i.e., query logs and a document collection), which provide insights from different perspectives into how query subtopics change over time. Moreover, we propose novel time-aware diversification methods that leverage the identified dynamic subtopics. A key idea is to re-rank search results based on the freshness and popularity of subtopics. Our experimental results show that the proposed methods can significantly improve diversity and relevance effectiveness for time-sensitive queries in comparison with state-of-the-art methods.
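In the spirit of the re-ranking idea above, the following sketch greedily selects results whose subtopics add the most popularity-weighted novel coverage. The subtopic weights standing in for "popularity at query time" are hypothetical, and the paper's actual methods are more elaborate.

```python
def diversify(results, subtopic_weight, k=3):
    """Greedily pick results whose subtopics add the most
    popularity-weighted novel coverage to the ranked list.

    results: list of (doc_id, relevance, set_of_subtopics).
    subtopic_weight: subtopic -> temporal popularity at query time."""
    selected, covered, pool = [], set(), list(results)
    while pool and len(selected) < k:
        best = max(pool, key=lambda r: r[1] + sum(
            subtopic_weight.get(s, 0.0) for s in r[2] - covered))
        pool.remove(best)
        selected.append(best[0])
        covered |= best[2]
    return selected

# For "US Open" issued in September, the tennis subtopic dominates.
weights = {"tennis": 0.9, "golf": 0.2}
results = [("a", 0.5, {"golf"}), ("b", 0.4, {"tennis"}), ("c", 0.6, {"golf"})]
print(diversify(results, weights))  # ['b', 'c', 'a']
```

Note how "b" is promoted above the more relevant "c" solely because its subtopic is popular at query time and not yet covered.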
We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access it. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions of future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning-to-rank approach based on four classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus of 1.8 million news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.
Concise Preservation by Combining Managed Forgetting and Contextualized Remem... – Nattiya Kanhabua
With the growing volumes of and reliance on digital content, there is a clear need for better information access solutions that keep relevant information accessible and usable in the long term. Inspired by the role of forgetting in the human brain, we envision a concept of managed forgetting for systematically dealing with information that progressively ceases in importance as well as with redundant information. Although inspired by human memory, managed forgetting is meant to complement rather than copy human remembering and forgetting. It can be regarded as a function of attention and significance dynamics relying on multi-faceted information assessment. This talk introduces our vision for managed forgetting on the conceptual level as part of an Integrated Cognitive Framework for Time-aware Information Access. We discuss relevant research and application aspects for managed forgetting. To this end, we present our first results and point out issues where further research is required.
Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously kept up-to-date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of Wikipedia articles of the involved or affected entities. In this paper, we present Wikipedia Event Reporter, a web-based system that supports the entity-centric, temporal analytics of event-related information in Wikipedia by analyzing the whole history of article updates. For a given entity, the system first identifies peaks of update activity using burst detection and automatically extracts event-related updates using a machine-learning approach. Further, the system determines distinct events through the clustering of updates by exploiting different types of information such as update time, textual similarity, and the position of the updates within an article. Finally, the system generates a meaningful temporal summarization of event-related updates and automatically annotates the identified events in a timeline.
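The burst-detection step can be illustrated with a deliberately simple stand-in: flag a day whose update count far exceeds the mean of the preceding window. The window length and factor below are arbitrary choices; the system's actual burst-detection algorithm is not specified here.

```python
def detect_bursts(updates_per_day, window=7, factor=3.0):
    """Return indices of days whose update count exceeds `factor`
    times the mean of the preceding `window` days."""
    bursts = []
    for i in range(window, len(updates_per_day)):
        baseline = sum(updates_per_day[i - window:i]) / window
        if updates_per_day[i] > factor * max(baseline, 1.0):
            bursts.append(i)
    return bursts

# A quiet article that suddenly receives 40 edits in one day.
print(detect_bursts([2, 1, 3, 2, 1, 2, 3, 40, 2]))  # [7]
```

The updates inside each flagged window would then be clustered into distinct events, as the abstract describes.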
What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalyst... – Nattiya Kanhabua
Going beyond its role as an encyclopedia, Wikipedia has become a global memory place for high-impact events, such as natural disasters and man-made incidents, thus influencing collective memory, i.e., the way we remember the past. Due to the importance of collective memory for framing the assessment of new situations, our actions, and our value systems, its open construction and negotiation in Wikipedia is an important new cultural and societal phenomenon. The analysis of this phenomenon does not only promise new insights into collective memory. It is also an important foundation for technology that more effectively complements the processes of human forgetting and remembering and better enables us to learn from the past. In this paper, we analyse the long-term dynamics of Wikipedia as a global memory place for high-impact events. This complements existing work analysing the collective memory negotiation and construction process in Wikipedia directly following an event. In more detail, we are interested in catalysts for reviving memories, i.e., in the fuel that keeps memories of past events alive, interrupting the general trend of fast forgetting. For this purpose, we study the triggers of revisiting behavior for a large set of event pages by exploiting page views and time-series analysis, and identify the most important catalyst features.
Understanding the Diversity of Tweets in the Time of Outbreaks – Nattiya Kanhabua
A microblogging service like Twitter continues to surge in importance as a means of sharing information in social networks. In the medical domain, several works have shown the potential of detecting public health events (i.e., infectious disease outbreaks) using Twitter messages or tweets. Given its real-time nature, Twitter can enhance early outbreak warning for public health authorities so that a rapid response can take place. Most previous work on detecting outbreaks in Twitter simply analyzes tweets matching disease names and/or locations of interest. However, the effectiveness of such methods is limited for two main reasons. First, disease names are highly ambiguous, i.e., they can refer to slang or non-health-related contexts. Second, the characteristics of infectious diseases are highly dynamic in time and place, namely, strongly time-dependent and varying greatly among different regions. In this paper, we propose to analyze the temporal diversity of tweets during the known periods of real-world outbreaks in order to gain insight into a temporary focus on specific events. More precisely, our objective is to understand whether, and to what extent, the temporal diversity of tweets can be used as an indicator of outbreak events. We employ an efficient sampling-based algorithm to compute the diversity statistics of tweets at a particular time. To this end, we conduct experiments correlating temporal diversity with the estimated event magnitude of 14 real-world outbreak events manually created as ground truth. Our analysis shows that correlation results differ among outbreaks, which can reflect the characteristics (severity and duration) of the outbreaks.
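One very simple diversity statistic that can be estimated by sampling is the fraction of distinct messages in a time window. The sketch below only conveys the sampling idea; the specific statistics and algorithm used in the paper may differ.

```python
import random

def sampled_diversity(tweets, sample_size=1000, seed=0):
    """Estimate a window's diversity as the fraction of distinct
    messages in a bounded random sample, so the cost stays constant
    even on high-volume windows."""
    rng = random.Random(seed)
    if len(tweets) > sample_size:
        tweets = rng.sample(tweets, sample_size)
    return len(set(tweets)) / len(tweets)

# During an outbreak, repeated phrasing lowers the diversity score.
window = ["flu outbreak"] * 8 + ["got the flu", "hospital is full"]
print(sampled_diversity(window))  # 0.3
```

A sudden drop in such a diversity score over time could then be correlated with the magnitude of a known outbreak event.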
Improving Temporal Language Models For Determining Time of Non-Timestamped Do... – Nattiya Kanhabua
Taking the temporal dimension into account in searching, i.e., using the time of content creation as part of the search condition, is now gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (time of creation or publication) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to the previous models.
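The core idea of timestamping with a temporal language model can be sketched as follows: partition a reference corpus into time periods, build a unigram model per period, and assign a document to the period under which it is most likely. The toy vocabulary and smoothing constant below are illustrative; the paper's enhanced techniques go beyond this baseline.

```python
import math
from collections import Counter

def date_document(doc_terms, period_models, smoothing=1e-6):
    """Assign a non-timestamped document to the time partition whose
    unigram language model gives it the highest log-likelihood."""
    best, best_ll = None, float("-inf")
    for period, counts in period_models.items():
        total = sum(counts.values())
        ll = sum(math.log(counts.get(t, 0) / total + smoothing)
                 for t in doc_terms)
        if ll > best_ll:
            best, best_ll = period, ll
    return best

# Toy period models built from hypothetical era-specific vocabulary.
models = {
    "1990s": Counter({"walkman": 5, "fax": 4, "modem": 3}),
    "2010s": Counter({"smartphone": 6, "streaming": 4, "app": 3}),
}
print(date_document(["smartphone", "app", "fax"], models))  # 2010s
```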
Determining Time of Queries for Re-ranking Search Results – Nattiya Kanhabua
Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., their relevance depends on time, and that temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type, i.e., queries that comprise only keywords, whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. We then show how to increase retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments we show that our proposed approaches improve retrieval effectiveness.
Exploiting Time-based Synonyms in Searching Document Archives – Nattiya Kanhabua
Query expansion of named entities can be employed to increase retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types: time-independent and time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search w.r.t. temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
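A sketch of how time-dependent synonyms might be applied at query time follows. The synonym entry and its validity interval are hypothetical examples, not data produced by the paper's Wikipedia-based extraction.

```python
def expand_query(terms, synonyms, query_time=None):
    """Expand query terms with time-independent synonyms always, and
    with time-dependent synonyms only when valid at `query_time`.

    synonyms: term -> list of (synonym, valid_from, valid_to);
    valid_from=None marks a time-independent synonym."""
    expanded = list(terms)
    for t in terms:
        for syn, start, end in synonyms.get(t, []):
            if start is None or (query_time is not None
                                 and start <= query_time <= end):
                expanded.append(syn)
    return expanded

# Hypothetical entry: "Leningrad" as a name valid only from 1924 to 1991.
syns = {"st. petersburg": [("leningrad", 1924, 1991)]}
print(expand_query(["st. petersburg"], syns, query_time=1980))
```

Issued with a temporal criterion of 1980, the query is expanded with the period-appropriate name; issued for 2005, it is not.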
Searching the Temporal Web: Challenges and Current Approaches – Nattiya Kanhabua
This talk gives a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails, and enterprise documents. Unfortunately, traditional IR approaches based on term matching alone can give unsatisfactory results when searching the temporal web. The reasons are manifold: 1) the collection is strongly time-dependent, i.e., it contains multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query.
Several major challenges in searching the temporal web will be discussed, namely: 1) How can we understand the temporal search intent represented by time-sensitive queries? 2) How can we handle the temporal dynamics of queries and documents? 3) How can we explicitly model temporal information in retrieval and ranking models? To this end, we will present current approaches to these problems as well as outline directions for future research.
We address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails, and enterprise documents. Unfortunately, traditional IR approaches based on term matching alone can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions are different time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim at improving retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) performing an analysis of temporal queries, and 3) explicitly modeling the time dimension in retrieval and ranking.
Leveraging the time dimension in ranking can improve the retrieval effectiveness if information about the creation or publication time of documents is available. We analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models for determining the time of implicit temporal queries, and the determined time is used for re-ranking search results in order to improve the retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms.
In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be performed to improve the overall performance. When the time dimension is incorporated into ranking, documents will be ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers the time uncertainty, and improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve the retrieval effectiveness in searching temporal document collections.
Why Is It Difficult to Detect Outbreaks in Twitter? – Nattiya Kanhabua
Listen to this recording by IFLA's ENSULIB standing committee to learn how libraries are working at the forefront of citizen science; the connection between NASA climate change science, citizen science observations, and mosquito-borne disease; how the international GLOBE Mission Mosquito citizen science campaign is providing a common language and approach for meeting the global challenge of ensuring good health for all from mosquito-borne diseases; and examples of resources and partnerships that public, academic, and research libraries can leverage.
APHA Presentation: Using Predictive Analytics for West Nile Disease Prevention – Raed Mansour
Presentation at the 2015 American Public Health Association Annual Meeting in Chicago.
Since 2004, the City of Chicago has had a comprehensive surveillance and control program to address West Nile virus (WNV). Environmental surveillance has included: the collection of mosquitoes from traps located throughout the city; the identification and sorting of mosquitoes collected from these traps; and the testing of specific species of mosquitoes for WNV. Environmental control measures have included targeted adulticiding efforts.
This project will identify factors associated with the presence of West Nile virus (WNV) in mosquitoes and determine the effectiveness of mosquito control measures. The information gained will help the City of Chicago better target its surveillance, prevention, and control efforts.
An open competition to determine the best model is being planned by Kaggle, which will host the competition in partnership with the Robert Wood Johnson Foundation and CDPH. CDPH will provide data and technical support. Eight years of public health data will be incorporated into the model, which will be tested and potentially incorporated into business practice.
Full Abstract: https://apha.confex.com/apha/143am/webprogram/Paper335111.html
Informatics for Disease Surveillance – New Technologies – Dr Wasim Ahmed
A guest lecture on informatics for disease surveillance, looking at a number of new technologies. Delivered at the School of Health and Related Research.
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers – Amit Sheth
Abstract
Kno.e.sis (http://knoesis.org) is a world-class research center that uses semantic, cognitive, and perceptual computing for gathering insights from physical/IoT, cyber/Web, and social and enterprise (e.g., clinical) big data. We innovate and employ semantic web, machine learning, NLP/IR, data mining, network science and highly scalable computing techniques. Our highly interdisciplinary research impacts health and clinical applications, biomedical and translational research, epidemiology, cognitive science, social good, policy, development, etc. A majority of our $12+ million in active funds come from the NSF and NIH. In this talk, I will provide an overview of some of our major research projects.
Kno.e.sis is highly successful in its primary mission of exceptional student outcomes: our students have exceptional publication and real-world impact and our PhDs compete with their counterparts from top 10 schools for initial jobs in research universities, top industry research labs, and highly competitive companies. A key reason for Kno.e.sis' success is its unique work culture involving teamwork to solve complex problems. Practically all our work involves real-world challenges, real-world data, interdisciplinary collaborators, path-breaking research to solve challenges, real-world deployments, real-world use, and measurable real-world impact.
In this talk, I will also seek to discuss our choice of research topics and our unique ecosystem that prepares our students for exceptional careers.
Harnessing the Power of Infectious Disease Information with a Relational Data... – Jay Brown
IDdx is a relational database of 249 infectious diseases and a decision-support software tool. Physicians can use IDdx to build differential diagnosis lists that match the criteria in a query. Available criteria include 99 signs & symptoms and 39 epidemiological factors.
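As a toy analogue of the relational matching IDdx performs, one can score diseases by how many queried signs & symptoms and epidemiological factors they match. The two diseases and their attributes below are invented for illustration, and IDdx's actual matching logic may differ.

```python
def differential(diseases, signs, epi_factors):
    """Rank diseases by the number of matched signs & symptoms and
    epidemiological factors, best match first.

    diseases: name -> (set_of_signs, set_of_epi_factors)."""
    scored = []
    for name, (d_signs, d_epi) in diseases.items():
        score = len(d_signs & signs) + len(d_epi & epi_factors)
        if score:
            scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]

# Invented mini knowledge base with two diseases.
db = {
    "dengue": ({"fever", "rash"}, {"mosquito exposure", "tropics"}),
    "influenza": ({"fever", "cough"}, {"winter season"}),
}
print(differential(db, {"fever", "rash"}, {"mosquito exposure"}))
```

Here "dengue" ranks first because it matches both queried signs and the epidemiological factor, while "influenza" matches only the fever.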
"...On 29 September 2006, Eric Noji (Stanford, 1977) delivered a lecture on the public health consequences of disasters, at the University of Pittsburgh’s main campus. However, this wasn't an ordinary lecture delivered to a packed auditorium of scholars and students. Eric’s lecture was Webcast around the world. It was expected to reach more than 1.5 million viewers, the largest academic lecture in history. Instead they had more than 3 million! Unfortunately, this exceeded the number of global access portals the university and its 12 global telecommunication partners had anticipated. Internet pioneer Vint Cerf (Stanford, 1965), was at Eric’s lecture and managed to wirelessly contact several friends around the world who opened up enough additional access points to allow another 50,000 viewers to log on—just 10 minutes late..."
- Stanford Magazine, JULY/AUGUST 2007
Related Video: https://www.youtube.com/watch?v=WbjSiPT8scI
Thank You for referencing this work, if you find it useful!
Citation of a related scientific paper:
Berrocal, A., Manea, V., De Masi, A., Wac, K., mQoL Lab: Step-By-Step Creation of a Flexible Platform to Conduct Studies Using Interactive, Mobile, Wearable And Ubiquitous Devices, 17th International Conference On Mobile Systems And Pervasive Computing (MobiSPC), August 2020.
The talk details:
Alexandre De Masi, Katarzyna Wac, Getting Most out of your SENSORS: Mixed-Methods Research Methodology Enabling Identification, Modelling and Predicting Human Aspects of Mobile Sensing “In the Wild”, 19th IEEE Conference on Sensors (IEEE SENSORS’20), October 2020.
Digital Scholarship: Enlightenment or Devastated Landscape? TheContentMine
Published on Dec 17, 2015 by PMR
Every year 500 Billion USD of public funding is spent on research, but much of this lies hidden in papers that are never read. I describe how machines can help us to read the literature. However there is massive opposition from publishers who are trying to prevent open scholarship and who build walled gardens that they control
Every year 500 Billion USD of public funding is spent on research, but much of this lies hidden in papers that are never read. I describe how machines can help us to read the literature. However there is massive opposition from publishers who are trying to prevent open scholarship and who build walled gardens that they control
Changing the World in Healthcare, Education, and Energy through Science, Tech...Mohamed Labadi
Changing the World in Healthcare, Education, and Energy through Science, Technology, and Social Entrepreneurship & Innovation!
“Global health and global education problems & challenges are a single-point failure for humanity.”
Acorn Recovery: Restore IT infra within minutesIP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
f you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it to their own needs. In this talk we'll compare measures that are effective against static attackers and how to battle a dynamic attacker who adapts to your counter-measures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
This presentation, created by Syed Faiz ul Hassan, explores the profound influence of media on public perception and behavior. It delves into the evolution of media from oral traditions to modern digital and social media platforms. Key topics include the role of media in information propagation, socialization, crisis awareness, globalization, and education. The presentation also examines media influence through agenda setting, propaganda, and manipulative techniques used by advertisers and marketers. Furthermore, it highlights the impact of surveillance enabled by media technologies on personal behavior and preferences. Through this comprehensive overview, the presentation aims to shed light on how media shapes collective consciousness and public opinion.
This presentation by Morris Kleiner (University of Minnesota), was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found out at oe.cd/crps.
This presentation was uploaded with the author’s consent.
María Carolina Martínez - eCommerce Day Colombia 2024
Can Twitter & Co. Save Lives?
1. Can Twitter & Co. Save Lives?
Nattiya Kanhabua, Avaré Stewart, Sara Romano
Ernesto Diaz-Aviles, Wolf Siberski, and Wolfgang Nejdl
L3S Research Center / Leibniz Universität Hannover, Germany
Research Seminar @MPII, Saarbrücken
22 October 2013
2. Motivation
• Numerous works use Twitter to infer the existence
and magnitude of real-world events in real-time
– Earthquake [Sakaki et al., 2010]
– Predicting financial time series [Ruiz et al., 2012]
– Influenza epidemics [Culotta, 2010; Lampos et al.,
2011; Paul et al., 2011]
4. Health related tweets
• User status updates or news related to
public health are common on Twitter
– I have the mumps...am I alone?
– my baby girl has a Gastroenteritis so great!! Please
do not give it to meee
– #Cholera breaks out in #Dadaab refugee camp in
#Kenya http://t.co/....
– As many as 16 people have been found infected with
Anthrax in Shahjadpur upazila of the Sirajganj district
in Bangladesh.
9. Data Collection
• Official outbreak reports
– ~3,000 ProMED-mail reports from 2011
– WHO reports have very limited coverage
• Twitter data
– ~1,200 health-related terms (i.e., infectious
diseases, their synonyms, pathogens and symptoms)
– Over 112 million tweets from 2011
• Series of NLP tools including
– OpenNLP (tokenization, sentence splitting, POS
tagging)
– OpenCalais (named entity recognition)
– HeidelTime (temporal expression extraction)
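The first-pass term filtering described above can be sketched in a few lines. This is a toy illustration only: the term list here is a tiny stand-in for the ~1,200 expert-curated expressions, and the real pipeline uses full NLP tooling (OpenNLP, OpenCalais, HeidelTime) rather than simple token matching.

```python
# Illustrative first-pass filter: keep only tweets mentioning at least one
# health-related term (diseases, synonyms, pathogens, symptoms).
HEALTH_TERMS = {"cholera", "anthrax", "mumps", "ebola", "influenza",
                "gastroenteritis"}

def filter_tweets(tweets, terms=HEALTH_TERMS):
    """Return the tweets that contain at least one health-related term."""
    kept = []
    for tweet in tweets:
        # Crude normalization: strip hash signs and punctuation, lowercase.
        tokens = {t.strip("#.,!?").lower() for t in tweet.split()}
        if tokens & terms:
            kept.append(tweet)
    return kept

tweets = [
    "#Cholera breaks out in #Dadaab refugee camp in #Kenya",
    "Lovely weather in Hannover today",
]
print(filter_tweets(tweets))  # only the cholera tweet survives
```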
11. Event Extraction
• An event is a sentence containing two entities
– (1) medical condition and (2) geographic expression
– A minimum requirement by domain experts
• A victim and the time of an event can be identified
from the sentence itself, or its surrounding context
• Output: a set of event candidates
Reported by World Health Organization (WHO) on
29 July 2012 about an ongoing Ebola outbreak
in Uganda since the beginning of July 2012
[Kanhabua et al., TAIA’ 12]
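The minimum requirement stated above (a sentence mentioning both a medical condition and a geographic expression) can be sketched with toy gazetteers; in the actual system these entities come from named-entity recognition (OpenCalais), not from fixed lists.

```python
# Toy gazetteers standing in for named-entity recognition output.
MEDICAL = {"ebola", "cholera", "anthrax"}
LOCATIONS = {"uganda", "kenya", "bangladesh"}

def event_candidates(sentences):
    """A sentence is an event candidate iff it mentions both a medical
    condition and a geographic expression."""
    out = []
    for s in sentences:
        tokens = {t.strip(".,#").lower() for t in s.split()}
        med = tokens & MEDICAL
        loc = tokens & LOCATIONS
        if med and loc:
            out.append((s, sorted(med), sorted(loc)))
    return out

sents = [
    "An ongoing Ebola outbreak in Uganda since July 2012.",
    "Ebola is a filovirus.",  # no location -> not an event candidate
]
for cand in event_candidates(sents):
    print(cand)
```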
12. Message Filtering: Challenges
• Ambiguity
– having several meanings
– used in different contexts
• Incompleteness
– missing or under-reported events
– data processing errors
Category / Example tweet
– Literature: A two hour train journey, Love In the Time of Cholera ...
– Music: Dengue Fever’s “Uku,” Mixed by Paul Dreux Smith Universal Audio...
– Marketing: Exclusive distributor of high quality #HIV/AIDS Blood & Urine and #Hepatitis #Self-testers.
– General: Identification of genotype 4 Hepatitis E virus binding proteins on swine liver cells: Hepatitis E virus...
– Negative: i dont have sniffles and no real coughing..well its coughing but not like an influenza cough.
– Joke: Thought I had Bieber Fever. Ends up I just had a combo of the mumps, mono, measles & the hershey squ...
16. Approach for Noisy Data
• MedISys1
– providing a list of negative keywords created
by medical experts
• Urban Dictionary2
– a Web-based dictionary of slang, ethnic
culture words or phrases
1 http://medusa.jrc.it/medisys/homeedition/en/home.html
2 http://www.urbandictionary.com/
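The negative-keyword idea can be sketched as a simple string filter. The keyword list below is hypothetical, standing in for the expert-curated MedISys list and slang harvested from Urban Dictionary.

```python
# Hypothetical negative-keyword list (jokes, fiction titles, marketing terms).
# Tweets matching any of these patterns are treated as noise.
NEGATIVE_KEYWORDS = {"bieber fever", "love in the time of cholera",
                     "distributor"}

def is_noise(tweet):
    """Flag a tweet as noise if it contains any negative keyword."""
    text = tweet.lower()
    return any(kw in text for kw in NEGATIVE_KEYWORDS)

tweets = [
    "Thought I had Bieber Fever. Ends up I just had the mumps.",
    "#Cholera breaks out in #Dadaab refugee camp in #Kenya",
]
print([t for t in tweets if not is_noise(t)])  # keeps only the real report
```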
21. Signal Generation: Challenges
• Temporal Dynamics
– seasonal infectious diseases
– rare and spontaneous outbreaks
• Location Dynamics
– frequency and duration
– levels of prevalence or severity
[Rortais et al., 2010 in Journal of Food Research International]
[Emch et al., 2008 in International Journal of Health Geographics]
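One common way to turn filtered term counts into warning signals is an anomaly detector over the daily time series. The sketch below uses an exponentially weighted moving average baseline; the smoothing factor and threshold are illustrative choices, not the system's actual configuration.

```python
# Sketch of an EWMA-style anomaly detector over daily disease-mention counts.
def ewma_signals(counts, alpha=0.3, threshold=3.0):
    """Flag days whose count exceeds the EWMA baseline by `threshold`
    standard deviations of the past residuals."""
    signals = []
    baseline = counts[0]
    residuals = []
    for day, c in enumerate(counts[1:], start=1):
        resid = c - baseline
        if len(residuals) >= 3:  # need a few residuals to estimate spread
            mean = sum(residuals) / len(residuals)
            var = sum((r - mean) ** 2 for r in residuals) / len(residuals)
            std = var ** 0.5 or 1.0
            if resid > mean + threshold * std:
                signals.append(day)
        residuals.append(resid)
        baseline = alpha * c + (1 - alpha) * baseline  # EWMA update
    return signals

# A flat series with one outbreak-like spike on day 8:
counts = [5, 6, 5, 7, 6, 5, 6, 5, 60, 7]
print(ewma_signals(counts))  # [8]
```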
31. Approach
• Personalized Tweet Ranking for Epidemic
Intelligence
– Learning to rank and recommender systems
– User's context as implicit criteria for recommendation
[Diaz-Aviles et al., ICWSM’12; Diaz-Aviles et al., WWW’12]
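The idea of using the user's context as implicit recommendation criteria can be sketched as a fixed-weight overlap score. PTR4EI itself learns a ranking model from data; the weights and token matching below are only illustrative.

```python
# Minimal sketch of context-based tweet ranking: score each tweet by its
# overlap with the user's context (medical conditions and locations).
def rank_tweets(tweets, medical_context, location_context,
                w_med=2.0, w_loc=1.0):
    """Rank tweets by weighted overlap with the user context; a
    learning-to-rank model would learn these weights from feedback."""
    def score(tweet):
        tokens = {t.strip("#.,!?").lower() for t in tweet.split()}
        return (w_med * len(tokens & medical_context)
                + w_loc * len(tokens & location_context))
    return sorted(tweets, key=score, reverse=True)

tweets = [
    "Traffic jam in Hamburg again",
    "#EHEC cases rising in Hamburg hospitals",
]
ranked = rank_tweets(tweets, {"ehec"}, {"hamburg", "germany"})
print(ranked[0])  # the EHEC tweet ranks first
```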
34. Conclusion
• Can Twitter & Co. Save Lives?
– On a global level, we were able to generate
signals earlier than official reporting
mechanisms.
– The ultimate answer depends on how health
organizations will use and react to the
information provided by our system.
35. Future Work
• Real-Time Analysis of Big and Fast
Social Web Streams
– Scalable, efficient methods for filtering and
generating signals in real-time
– Effective methods for aggregating and
visualizing information in a meaningful way
37. References
• [Culotta, 2010] A. Culotta. Towards detecting influenza epidemics by analyzing twitter
messages. In Proceedings of the First Workshop on Social Media Analytics (SOMA’2010), 2010.
• [Diaz-Aviles et al., 2012a] E. Diaz-Aviles, A. Stewart, E. Velasco, K. Denecke, and W. Nejdl.
Towards personalized learning to rank for epidemic intelligence based on social media streams.
In Proceedings of the 21st World Wide Web Conference (WWW ‘2012), 2012.
• [Diaz-Aviles et al., 2012b] E. Diaz-Aviles, A. Stewart, E. Velasco, K. Denecke, and W. Nejdl.
Epidemic intelligence for the crowd, by the crowd. In Proceedings of International AAAI
Conference on Weblogs and Social Media (ICWSM’2012), 2012.
• [Kanhabua et al., 2012a] N. Kanhabua, S. Romano, and A. Stewart. Identifying Relevant
Temporal Expressions for Real-world Events. In SIGIR 2012 Workshop on Time-aware
Information Access (TAIA'2012), 2012.
• [Kanhabua et al., 2012b] N. Kanhabua, S. Romano, A. Stewart, and W. Nejdl. Supporting
Temporal Analytics for Health Related Events in Microblogs. In Proceedings of CIKM'2012, 2012.
• [Kanhabua and Nejdl 2013] N. Kanhabua and W. Nejdl. Understanding the Diversity of Tweets
in the Time of Outbreaks. In Proceedings of the First International Web Observatory Workshop
(WOW'2013) at WWW'2013, 2013.
• [Lampos et al., 2011] V. Lampos and N. Cristianini. Nowcasting events from the social web with
statistical learning. ACM TIST, 3, 2011.
• [Paul et al., 2011] M. J. Paul and M. Dredze. You are what you tweet: Analyzing twitter for public
health. In Proceedings of International AAAI Conference on Weblogs and Social Media
(ICWSM’2011), 2011.
• [Ruiz et al., 2012] E. J. Ruiz, V. Hristidis, C. Castillo, A. Gionis, and A. Jaimes. Correlating
financial time series with micro-blogging activity. In Proceedings of WSDM’2012, 2012.
• [Sakaki et al., 2010] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users:
real-time event detection by social sensors. In Proceedings of WWW’2010, 2010.
Editor's Notes
To exploit this timeliness potential, we present an event-based Epidemic Intelligence (EI) system. EI has emerged as a type of intelligence gathering aimed at detecting events of interest to public health from unstructured text on the Web.
In the medical domain, there has been a surge in detecting health related tweets for early warning
Allow a rapid response from authorities [Diaz-Aviles et al., 2012]
Note that there are existing EI systems, such as the BioCaster Global Health Monitor or HealthMap. However, they differ from our proposed system in the level of analysis, information sources, language coverage and visualization.
Frequencies of cases reported to RKI and number of tweets mentioning the name of the disease: EHEC. Pearson correlation coefficient = 0.864. The monitor of Twitter allowed M-Eco to generate the first signals on Friday, May 20th, 2011.
We study and propose solutions to three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) producing reliable warning signals (temporal anomalies) based on observed term frequency changes in these messages, using biosurveillance algorithms, and 3) providing suitable information and recommendations to domain experts, for better assessment of the potential outbreak threats associated with the generated signals.
Part I. Ground truth creation
Official outbreak reports
World Health Organization1
ProMED-mail2
Part II. Creating Twitter time series
medical condition
disease name, synonyms, pathogens, symptoms
location
geographic expressions, geo-location, or user profile
3 levels: country, continent, latitude
M-Eco strives to detect a large variety of infectious diseases, so we make use of a list of 1,258 terms consisting of infectious diseases, their synonyms, pathogens and symptoms, which are provided by the domain experts in two languages, namely English and German, for an initial filtering step. All documents and tweets are annotated with locations, medical conditions and temporal expressions using a series of language processing tools, including OpenNLP for tokenization, sentence splitting and part-of-speech tagging, and HeidelTime [34] for temporal expression extraction.
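A tiny stand-in for a temporal tagger such as HeidelTime can illustrate the temporal-expression annotation step. This toy regex only recognises explicit "day Month year" and "Month year" forms; the real tagger handles far richer expressions ("last week", relative dates, etc.).

```python
import re

# Toy temporal-expression extractor (stand-in for HeidelTime).
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
TEMPORAL = re.compile(r"\b(?:\d{1,2} )?(?:%s) \d{4}\b" % MONTHS)

def temporal_expressions(text):
    """Return explicit date expressions found in the text."""
    return TEMPORAL.findall(text)

text = "Reported by WHO on 29 July 2012 about an outbreak since July 2012."
print(temporal_expressions(text))  # ['29 July 2012', 'July 2012']
```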
Hash-tags co-occurring with #EHEC during May 23 and June 19, 2011, the main period of the
outbreak. The hash-tags are classified as entities of type Medical Condition, Location, or Complementary
Context, hash-tags out of these categories are discarded.
Our approach builds upon [18] and extends it by: 1) incorporating the use of an orthogonal vector, which is learned by a Support Vector Machine (SVM), as a description of the feature change; and 2) computing a novelty score that lets the system identify
those tweets that contribute to the feature change, so that their true labels can be obtained.
In order to detect outbreak events for early warning, we exploit different state-of-the-art biosurveillance algorithms as anomaly detectors in disease-related Twitter messages: C1, C2, C3, F-Statistic (FS), Exponentially Weighted Moving Average (EWMA) and Farrington (FA) [Basseville and Nikiforov, 1993; Farrington et al., 1996]. Traditional biosurveillance systems usually exploit information from official sources, e.g., laboratory results, mortality rates, or the number of reported patients suffering from a disease outbreak. In recent years, researchers in the medical domain have begun to leverage real-time social Web data, such as tweets.
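The C-series detectors mentioned here can be sketched as a simple control chart: compare today's count against a sliding baseline window separated from today by a short guard band, and flag counts that exceed the baseline mean by a few standard deviations. Window sizes and the threshold below follow common conventions but are illustrative only.

```python
# Sketch of a C2-style detector over daily disease-mention counts.
def c2_signals(counts, baseline=7, lag=2, k=3.0):
    """Flag day i when counts[i] > mean + k*std of a 7-day baseline
    window ending `lag` days earlier (the guard band)."""
    signals = []
    for day in range(baseline + lag, len(counts)):
        window = counts[day - lag - baseline:day - lag]
        mean = sum(window) / baseline
        var = sum((x - mean) ** 2 for x in window) / baseline
        std = var ** 0.5 or 1.0
        if counts[day] > mean + k * std:
            signals.append(day)
    return signals

# A flat series with one outbreak-like spike on day 9:
counts = [4, 5, 6, 5, 4, 5, 6, 5, 4, 30, 5, 4]
print(c2_signals(counts))  # [9]
```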
Identified topics show similar trends during the known time periods of real-world outbreaks
Diversity reflects how the language (i.e., terms and locations) are used differently
Div(entity) highly correlates with topic dynamics for some diseases, i.e., mumps, ebola, botulism and ehec
Div(term) shows correlation with topic dynamics for cholera, anthrax and rubella
Algorithms: SampleDJ, TrackDJ (claims and proofs in [Deng et al., 2012])
(1) Rely upon abundant user interactions and/or the availability of explicit feedback (e.g., ratings, likes, dislikes)
(2) Within M-Eco, we use the tweets from signals in developing techniques to provide a personalized short list of tweets that meets the context of the investigation. In this section, we review one of them; namely Personalized Tweet Ranking for Epidemic Intelligence (PTR4EI) [13, 14] and discuss the evaluation conducted during a major EHEC outbreak in Germany.
User context: Cu = (t, MCu, Lu), where t is a discrete time interval, MCu the set of Medical Conditions, and Lu the set of Locations of user interest.
More precisely, we expand the user's context, Cu, using latent topics computed with LDA [5] on: 1) an indexed collection of tweets for epidemic intelligence; and 2) the hash-tags that co-occur with this context.
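The hash-tag side of this context expansion can be sketched as a co-occurrence count: collect tags that frequently appear alongside the seed context tags. The LDA-based topic expansion mentioned above is omitted; this is only the co-occurrence half of the idea, on illustrative data.

```python
from collections import Counter

# Expand a seed context with hash-tags that co-occur with it in the stream.
def expand_context(tweets, seed_tags, top_k=2):
    """Return the top_k hash-tags most often co-occurring with seed_tags."""
    cooc = Counter()
    for tweet in tweets:
        tags = {t.lower() for t in tweet.split() if t.startswith("#")}
        if tags & seed_tags:            # tweet matches the seed context
            cooc.update(tags - seed_tags)
    return [tag for tag, _ in cooc.most_common(top_k)]

tweets = [
    "#ehec outbreak confirmed in #hamburg",
    "#ehec linked to #sprouts says lab in #hamburg",
    "#football tonight in #hamburg",   # no seed tag -> ignored
]
print(expand_context(tweets, {"#ehec"}))  # ['#hamburg', '#sprouts']
```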