Norovirus and Twitter - GSS data analysis competition 2015

A short presentation on the use of tweets to measure and predict Norovirus outbreaks. This talk was given at the 2015 mini-conference on data science as one of four competition finalists, and the project eventually won.

Data & Analytics

Using Twitter to predict Norovirus outbreaks
David Millson
on behalf of Callum Staff

© 2015 Food Standards Agency
Predicting and reducing Norovirus
• Tweets discussing Norovirus and its symptoms were first identified as a proxy indicator for Norovirus cases through an MSc project
• Outbreaks are predicted using rises in tweets about Norovirus symptoms (diarrhoea and vomiting)
• Predictions are used to inform FSA and NHS Choices interventions
• Interventions can prevent outbreaks from getting out of control by encouraging sufferers to stay at home and avoid passing the bug on

Citizen Need / Business Need
• Citizen Need:
  • More timely surveillance = quicker reactions = more cases prevented
  • Lower burden on economy and public
• Business Need:
  • To understand the predictive power of social media data
  • Demonstrate to Government the value of including social media analysis in surveillance strategy

FSA and social media analysis – from little acorns
• Cabinet Office approached the FSA to pilot a joint project with Ipsos MORI using machine learning to categorise Tweets
• Led in producing cross-government guidance on social media research
• Set up a review and innovation group to bring the expertise of industry and academia to Government social media research
• Designed and presented a workshop on using social media in policy and analysis

Crowd-sourcing the keywords

Excluding bad keywords

Do people really tweet when they have Norovirus?

The trade-off between usefulness and rigour
• We can rigorously predict Norovirus cases three weeks after they happen
• To be useful to Communications, we need to predict them three weeks before they happen
[Chart series: Tweets; Community cases]

Calibrating the Cut Off Value
[Figures: cut-off values of 0.35, 0.30, 0.25 and 0.20, shown on successive slides]

Outbreak predicted
Outbreak reduced (hopefully)

Value for Money
• Cost of the project
  • One analyst working approx. one day a week for 2/3 year ~ £2,500
  • NHS Choices spend/external research brings total to approx. £20,000
• Cost of Norovirus
  • Estimated 2.8 million cases in the UK a year at a cost of £120 million
Would need to prevent just 500 cases a year, or 0.02% of the total, to be providing value for money.
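
The break-even claim can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted on the slide; it assumes the annual cost is spread evenly across cases and that the approximate £20,000 total is set against a single year of cases, neither of which the slide states. The variable names are illustrative.

```python
# Figures quoted on the "Value for Money" slide.
project_cost_gbp = 20_000        # approximate total cost of the project
annual_cases = 2_800_000         # estimated UK Norovirus cases per year
annual_cost_gbp = 120_000_000    # estimated annual cost of Norovirus

# Assumption: every case costs the same amount.
cost_per_case = annual_cost_gbp / annual_cases

# Number of prevented cases whose cost equals the project cost.
break_even_cases = project_cost_gbp / cost_per_case
break_even_share = break_even_cases / annual_cases

print(f"Cost per case: £{cost_per_case:.2f}")
print(f"Break-even cases: {break_even_cases:.0f}")
print(f"Share of all cases: {break_even_share:.4%}")
```

Under these assumptions the break-even point is a little under 470 cases, or roughly 0.017% of the annual total, which the slide rounds to 500 cases and 0.02%.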

Spatial Mapping

Summary
• A marriage of “supply” and “demand”:
  • Twitter identified as a measure of Norovirus, providing information much more rapidly than lab reports
  • A need to roll out public information on Norovirus at the right time to make the biggest impact
• A gateway project to demonstrate the value of social media analysis
• Low cost and therefore low risk, with potentially high rewards

Slide sections: The Project; Building the Model; Using the Results; Value for Money; Next Steps

Editor's Notes

Staff members were asked on the FSA’s Yammer network (a closed social media network) to share their ideas of words that might be used when discussing norovirus. The results were used in a keyword search to build the dataset of tweets.
As well as keywords that we want to include, there are keywords that often go hand-in-hand with them and indicate that the subject of the tweet was unrelated to Norovirus. We also crowd-sourced these exclusions, as well as looking out for common terms that indicated a “red herring”.
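
The include/exclude keyword approach described in these two notes might look something like the sketch below. The inclusion terms are taken from words that appear in this deck; the exclusion terms and the function name `is_relevant` are purely hypothetical, since the actual crowd-sourced lists are not given here.

```python
# Inclusion terms drawn from this deck; the real crowd-sourced list is not shown.
INCLUDE_TERMS = {"norovirus", "sickness bug", "vomiting"}
# Hypothetical exclusion terms standing in for crowd-sourced "red herrings".
EXCLUDE_TERMS = {"computer", "hangover"}


def is_relevant(tweet, include=INCLUDE_TERMS, exclude=EXCLUDE_TERMS):
    """Keep a tweet if it mentions an inclusion term and no exclusion term."""
    text = tweet.lower()
    has_include = any(term in text for term in include)
    has_exclude = any(term in text for term in exclude)
    return has_include and not has_exclude


tweets = [
    "Off work with the sickness bug again",
    "My computer has caught a sickness bug",
    "Lovely weather today",
]
dataset = [t for t in tweets if is_relevant(t)]
print(dataset)
```

Simple substring matching is used here for brevity; a real filter would probably need word-boundary handling and a much longer list of terms.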
Assuming that lab reports are themselves a good indicator of Norovirus cases in the community, this chart demonstrates that tweets including “sickness bug” and related terms are a similarly good indicator, reproducing not just the seasonality but also features such as the double peak in the winter of 2012/13 and the relative heights of the peaks. For this graph, both lab reports and tweet volumes are smoothed by averaging over seven-week periods.
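
The seven-week smoothing mentioned above can be sketched as a moving average. The note does not say whether the window is centred or trailing, so a centred window that shrinks at the ends of the series is assumed here; the weekly counts are made up for illustration.

```python
def moving_average(values, window=7):
    """Centred moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up weekly tweet counts, for illustration only.
weekly_tweets = [10, 12, 9, 30, 45, 40, 20, 11, 8, 10]
smoothed_tweets = moving_average(weekly_tweets)
print(smoothed_tweets)
```

The same function would be applied to the weekly lab-report counts so that both curves are smoothed in the same way before being compared.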
In order to be a predictive tool, we need to identify the characteristics of the tweets curve at a time prior to the peak of the cases. We therefore look at a lagged set of data where we are comparing tweets to “future” cases.
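
One way to compare tweets with “future” cases is to shift one series against the other and measure how well they line up. The sketch below uses a Pearson correlation at a chosen lag; this is an illustrative choice, not necessarily the comparison used in the project, and the data are invented so that cases trail tweets by exactly three weeks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(tweets, cases, lag):
    """Correlate tweets in week t with cases in week t + lag (lag >= 0)."""
    return pearson(tweets[:len(tweets) - lag], cases[lag:])


# Invented data: cases follow tweets three weeks later.
tweets = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1, 1, 1]
cases = [1, 1, 1] + tweets[:-3]

for lag in range(5):
    print(lag, round(lagged_correlation(tweets, cases, lag), 3))
```

With real data, the lag at which the correlation peaks gives a rough idea of how far ahead of the case curve the tweet curve runs.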
The predictive model, whether it be logistic regression or naïve Bayes, will give us a value that we need to convert into a one or a zero. We need to choose a cutoff value that suits our needs for interventions. This could be done by minimising false positives, maximising true positives, or some other method. The outcome we want is a method that gives an early warning. We are willing to accept false positives in spring and summer on the basis that these can be eliminated by inspection.
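
The effect of the cutoff can be illustrated by converting model scores to ones and zeros at each of the values shown on the slides (0.35, 0.30, 0.25, 0.20) and counting the resulting true and false positives. The scores and outcomes below are made up, and treating a score equal to the cutoff as a positive is an assumption.

```python
def classify(scores, cutoff):
    """Convert model scores to 1 (outbreak predicted) or 0 (no outbreak)."""
    return [1 if score >= cutoff else 0 for score in scores]


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for binary predictions."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1
        elif p == 0 and a == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up weekly model scores and whether an outbreak actually followed.
scores = [0.05, 0.22, 0.31, 0.40, 0.27, 0.10]
actual = [0, 0, 1, 1, 1, 0]

for cutoff in (0.35, 0.30, 0.25, 0.20):
    print(cutoff, confusion_counts(classify(scores, cutoff), actual))
```

In this toy example, lowering the cutoff first picks up more true positives and then starts to admit false positives, which is the trade-off the note describes.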
NHS Choices have created infographics to influence people’s behaviour when they or their children catch Norovirus. These will be released when the model predicts an outbreak (these images are not yet finalised, and may still be subject to change).
Lopman BA, Reacher MH, Vipond IB, Hill D, Perry C, Halladay T, Brown DW, Edmunds WJ, Sarangi J. Epidemiology and cost of nosocomial gastroenteritis, Avon, England, 2002-2003. http://www.ncbi.nlm.nih.gov/pubmed/15504271
Tam CC, Rodrigues LC, Viviani L, Dodds JP, Evans MR, Hunter PR, Gray JJ, Letley LH, Rait G, Tompkins DS, O'Brien SJ, on behalf of the IID2 Study Executive Committee. Longitudinal study of infectious intestinal disease in the UK (IID2 study): incidence in the community and presenting to general practice. http://gut.bmj.com/content/early/2011/06/26/gut.2011.238386.short?q=w_gut_ahead_tab
Some tweets are geotagged, and most carry some location information such as the address of the user. This could potentially be used to map Norovirus outbreaks as they occur, and target interventions even more effectively.
