Validating search protocols for mining of health and disease events on Twitter
1. Validating search protocols
for mining of health and disease events
on Twitter
Aditya Lia Ramadona1,2*, Lutfan Lazuardi3, Sulistyawati1,4,
Anwar Dwi Cahyono5, Åsa Holmner6, Hari Kusnanto3, Joacim Rocklöv1
The International Conference on Public Health (ICPH)
Solo, Indonesia; September 14-15, 2016
https://arxiv.org/abs/1608.05910
3. Introduction Twitter
• free social networking and micro-blogging service
• 140-character posts: news, events,
personal feelings and experiences, …
• May 2016: 24.34 million Indonesian
active users ~ 10% (Statista, 2016)
Twitter offers streams of public data
• might contain health-related
information
• can be explored for public health
monitoring and surveillance
purposes (Paul et al. 2016)
Indonesia Social Media Trend (Jakpat, 2016)
4. Introduction
Previous studies
• Signorini et al. 2011: tracking levels of disease activity
• Eichstaedt et al. 2015: predicting heart disease mortality
• Strom et al. 2013: measuring health-related quality of life
• many more…
Methodological challenges
• data and language processing
• model development
www.bahasakita.com
5. Subjects and Methods
Develop groups of words and phrases relevant to disease symptoms
and health outcomes in Bahasa Indonesia
[Diagram labels: historical Twitter; Twitter stream; 14d; real-time]
6. Subjects and Methods
Sentiment analysis
• examining tweets from Twitter feeds
• decisions made by people with expert knowledge
• for millions of tweets: time-consuming and inefficient
Replicating expert assessment
• develop a model, interpret results and adjust the model
• make predictions
7. Results: text analysis
Historical Twitter feeds: 390 tweets
• "rumah OR sakit OR rawat OR inap OR demam OR panas -cuaca OR berdarah
OR pendarahan OR tombosit OR badan OR muntah OR badan OR tua OR ':('"
Preprocessing
• removing retweets and eliminating some noise
• removing punctuation, numbers, capitalization, and Bahasa Indonesia stop words
(e.g. kamu and aja)
[107] "@XYZ kamu izin aja, bilang kamu sakit :(("
[107] "xyz izin bilang sakit"
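The preprocessing steps above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the stop-word set holds only the two words named on the slide, and the "RT @" prefix test for retweets is an assumption.

```python
import re

# Illustrative subset only: the slide names "kamu" and "aja" as examples;
# the full Bahasa Indonesia stop-word list is not given in the source.
STOP_WORDS = {"kamu", "aja"}


def is_retweet(text):
    """Assumed heuristic: treat tweets starting with 'RT @' as retweets."""
    return text.startswith("RT @")


def preprocess(text):
    """Lowercase, drop punctuation and numbers, then drop stop words."""
    text = text.lower()
    # Replace everything except ASCII letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


raw = "@XYZ kamu izin aja, bilang kamu sakit :(("
cleaned = preprocess(raw)
```

Applied to the example tweet [107], this reproduces the cleaned text shown on the slide.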
8. Results: text analysis
1,632 words
• words most highly correlated with sakit (sick, ill, pain):
hati (0.23) ~ shame, broken heart, …
rasa (0.13) ~ pain
perut (0.12) ~ stomach ache
Figure 1. Words that appear more than 10 times
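One common way to obtain word associations like those above is the Pearson correlation between per-tweet counts of a target word and of every other word. The sketch below assumes that definition; the slides do not state how the correlations were computed, and the four-tweet corpus is made up for illustration.

```python
from math import sqrt


def term_counts(docs, term):
    """Count occurrences of `term` in each whitespace-tokenized document."""
    return [doc.split().count(term) for doc in docs]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant vectors; report 0 by convention
    return cov / sqrt(var_x * var_y)


def associations(docs, target):
    """Correlate `target` with every other word, strongest first."""
    vocab = sorted({w for doc in docs for w in doc.split()} - {target})
    target_vec = term_counts(docs, target)
    scores = {w: pearson(target_vec, term_counts(docs, w)) for w in vocab}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus of already-cleaned tweets.
toy_docs = ["sakit hati", "sakit perut", "izin bilang sakit", "rumah panas"]
ranked = associations(toy_docs, "sakit")
```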
9. Results: model development
Predictors
• the 22 most frequent words
• counting the number of the predictor words in a tweet
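A minimal sketch of the predictor construction: count how often each predictor word occurs in a cleaned tweet. The three words below are placeholders, since the slides do not list the 22 predictors, and whether the model used per-word counts or a single total is not stated, so both are shown.

```python
# Placeholder predictors: the source used the 22 most frequent words,
# which are not listed on the slides.
PREDICTOR_WORDS = ["sakit", "demam", "perut"]


def count_features(tweet, words=PREDICTOR_WORDS):
    """Return one count per predictor word for a cleaned tweet."""
    tokens = tweet.split()
    return [tokens.count(w) for w in words]


def total_predictor_count(tweet, words=PREDICTOR_WORDS):
    """Return the total number of predictor-word occurrences in a tweet."""
    return sum(count_features(tweet, words))


features = count_features("sakit perut sakit")
total = total_predictor_count("sakit perut sakit")
```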
Classification and Regression Trees model
(Breiman et al. 1984)
• rpart package (Therneau et al. 2015)
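To show the kind of decision a classification tree makes at each node, the sketch below searches one numeric predictor for the threshold that minimizes weighted Gini impurity. It illustrates a single CART-style split only; it is not the rpart implementation used in the study, and the data are made up.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Find the threshold t (rule: x <= t) with the lowest weighted Gini.

    Returns (threshold, weighted_impurity), or None if no split is possible.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send data to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: predictor-word count per tweet and the expert label.
word_counts = [0, 0, 1, 2, 3]
expert_labels = [0, 0, 1, 1, 1]
split = best_split(word_counts, expert_labels)
```

A full tree repeats this search recursively over all predictors and then prunes, which rpart handles internally.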
10. Results: model development
390 tweets
historical Twitter feeds
• 273 tweets (70%): training
• 117 tweets (30%): validating
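A 70/30 split of the 390 labelled tweets can be sketched as below. The shuffling and the seed are assumptions; the slides do not say how tweets were assigned to each set.

```python
import random


def split_train_validation(items, train_tenths=7, seed=2016):
    """Shuffle items and split them train_tenths/10 vs. the remainder."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split size.
    n_train = len(shuffled) * train_tenths // 10
    return shuffled[:n_train], shuffled[n_train:]


# Tweet indices stand in for the 390 labelled tweets.
train, validation = split_train_validation(range(390))
```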
1,145,649 tweets
Twitter stream feeds: testing
Indonesia: between 11°S and 6°N and 95°E and 141°E,
7 days: 26th July – 1st August 2016
• 100 sampled from 6,109 TRUE results
• 100 sampled from 1,139,540 FALSE results
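The geographic filter and the per-class sampling can be sketched as follows. The bounding box comes from the slide; the function names, the `(tweet_id, predicted_label)` layout, and the seed are hypothetical.

```python
import random

# Bounding box from the slide: 11°S to 6°N, 95°E to 141°E.
LAT_MIN, LAT_MAX = -11.0, 6.0
LON_MIN, LON_MAX = 95.0, 141.0


def in_indonesia_bbox(lat, lon):
    """True if the coordinate falls inside the study bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def sample_per_class(predictions, n=100, seed=2016):
    """Draw up to n tweet ids from each predicted class.

    `predictions` is a list of (tweet_id, predicted_label) pairs,
    where predicted_label is True or False.
    """
    rng = random.Random(seed)
    positives = [tid for tid, label in predictions if label]
    negatives = [tid for tid, label in predictions if not label]
    return (
        rng.sample(positives, min(n, len(positives))),
        rng.sample(negatives, min(n, len(negatives))),
    )


# Made-up predictions: every tenth tweet is flagged TRUE.
fake_predictions = [(i, i % 10 == 0) for i in range(5000)]
true_sample, false_sample = sample_per_class(fake_predictions)
```

The two samples would then go to expert review to estimate performance on the streamed data.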
13. Results
Model performance                Validation   Testing
AUC                              0.82         0.70
Sensitivity (%)                  80.0         42.0
Specificity (%)                  84.6         98.0
Positive predictive value (%)    86.7         95.5
Negative predictive value (%)    77.2         62.8
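For reference, the four rates in the table follow from a 2×2 confusion matrix as sketched below. The counts are hypothetical: they were chosen because they reproduce the Testing column, but the source reports only percentages, not the underlying counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic rates (as fractions) from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts consistent with the Testing column (not from the source).
metrics = diagnostic_metrics(tp=42, fp=2, fn=58, tn=98)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```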
14. Limitations + Challenges = Future Work
team members involved
• academics, health workers
Twitter users
• telecommunications infrastructure
• characteristics of people
methods
• data: streaming (Indonesia, 7 days at 24 h/day ≈ 1.5 GB in CSV format)
• model: CART, RandomForest, GBM, …
15. Summary
Monitoring of public sentiment on Twitter + contextual knowledge
• a nearly real-time proxy for health-related indicators
Models do not replace expert judgment
• accurately analyze small amounts of information (tweets)
• improve and refine the model
• bias and emotion: integrate assessments of many experts
17. 1 Department of Public Health and Clinical Medicine, Epidemiology and Global Health, Umeå University
2 Center for Environmental Studies, Universitas Gadjah Mada
3 Department of Public Health, Faculty of Medicine, Universitas Gadjah Mada
4 Department of Public Health, Universitas Ahmad Dahlan
5 District Health Office, Yogyakarta
6 Department of Radiation Sciences, Umeå University
*alramadona@ugm.ac.id
www.themexpert.com/images/easyblog_articles/270/twitter_cover.jpg