Statistically Solving Sneezes and Sniffles - A Work In Progress
#ODSC 2016
License: CC By Attribution
Ian Ozsvald @IanOzsvald ModelInsight.io
Giles Weaver @GilesWeaver
Ian.Ozsvald@ModelInsight.io ODSC 2016 @IanOzsvald
@gilesweaver
Who Are We?
● Ian - “Industrial Data Scientist” for 15 yrs
● Giles - bioinformatician turned Data Sci.
Goal
● Help my wife have a less sneezy life - therefore try to understand “what drives a person's Rhinitis?” (i.e. sneezes)
● Can we help folk reduce symptoms by explaining the drivers of those symptoms? A step towards “personalised medicine”?
● Could we help people reduce their medication?
● 10–30% of the Western population is affected by Allergic Rhinitis (overall ≈1.4 billion people?)
● Some antihistamines (AH) have negative health associations (e.g. anticholinergics [inc. U.S. Benadryl] linked to Alzheimer's)
● UK folk don't tend to use these AHs but nobody knows the consequences of long-term usage
Counts of daily sneezes & AH
Using Seaborn and Pandas DataFrames for countplots
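A minimal sketch of the kind of countplot this slide refers to. The daily totals and column names below are made up for illustration; the real figures came from the logging app.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up daily totals (assumption): one row per day.
daily = pd.DataFrame({
    "sneezes": [0, 2, 2, 5, 1, 0, 3, 2],
    "antihistamines": [0, 1, 1, 2, 0, 0, 1, 1],
})

# The numbers the plot displays: how many days had 0, 1, 2, ... sneezes.
counts = daily["sneezes"].value_counts().sort_index()

# countplot draws one bar per distinct value; bar height = number of days.
ax = sns.countplot(x="sneezes", data=daily)
ax.set_xlabel("Sneezes per day")
ax.set_ylabel("Number of days")
```

The same call with `x="antihistamines"` would give the matching plot for daily AH doses.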
Hypothesis
● “Ian's wife Emily suffers from non-allergic Rhinitis” (not allergic or infectious Rhinitis)
● “Possibly it is weather related”
● “Alcohol might make things worse”
● “Airborne pollution might be a factor”
● We need to gather data so we can answer these questions
● Note - sneeze & AH behaviour is similar out of the country and when at home (I'm not the cause! Nor, probably, is our cat, nor the apartment)
Data Gathering Methodology
● iOS
● Event logs
● GPS trace
● Editable history
● Open source
● >1yr old
github.com/radicalrobot/allergy-tracker
Some data issues
● Apple's DateTime epoch != the Unix DateTime epoch (use ISO 8601!)
● GPS on London Underground on iPhone 6 confidently reports location (0,0) # Nigeria?!
● Weak experimental design (in hindsight) - we're only logging positive events - does “0 events” mean “nothing happened” or “we forgot to log stuff”?
● SQLite→DataFrame with Python for clean-up
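The clean-up steps above can be sketched roughly as follows. This assumes the app stores timestamps as seconds since Apple's reference date (2001-01-01 UTC, which is 978,307,200 seconds after the Unix epoch); the table and column names are hypothetical and the app's real schema may differ.

```python
import sqlite3
import pandas as pd

# Seconds between the Unix epoch (1970-01-01) and Apple's reference date
# (2001-01-01), both in UTC.
APPLE_EPOCH_OFFSET = 978_307_200

# Hypothetical schema and rows, built in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, apple_ts REAL, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("sneeze", 0.0, 51.5074, -0.1278),       # Apple time 0 = 2001-01-01T00:00:00Z
        ("antihistamine", 86_400.0, 0.0, 0.0),   # bogus (0, 0) GPS fix
        ("sneeze", 450_000_000.5, 51.5033, 0.0495),
    ],
)

# SQLite -> DataFrame
events = pd.read_sql_query("SELECT * FROM events", conn)
conn.close()

# Shift to the Unix epoch, then parse as timezone-aware UTC timestamps.
events["timestamp"] = pd.to_datetime(
    events["apple_ts"] + APPLE_EPOCH_OFFSET, unit="s", utc=True
)

# Treat an exact (0, 0) fix as "no location" rather than a real position.
bad_fix = (events["lat"] == 0.0) & (events["lon"] == 0.0)
events.loc[bad_fix, ["lat", "lon"]] = float("nan")
```

Calling `.isoformat()` on any value in `events["timestamp"]` gives an ISO 8601 string, which sidesteps the epoch ambiguity when the data is exported.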
Sneezes and AH over 1yr
Self-logged data by Emily
Sneezing by hour & day of week
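One way such an hour-by-weekday view could be tabulated with Pandas is shown below. The timestamps are invented for illustration and the column names are assumptions.

```python
import pandas as pd

# Invented sneeze timestamps (assumption); the real ones came from the app's event log.
sneezes = pd.DataFrame({"timestamp": pd.to_datetime([
    "2015-06-01 08:15", "2015-06-01 08:40", "2015-06-02 22:05",
    "2015-06-06 09:30", "2015-06-08 08:05",
])})

sneezes["hour"] = sneezes["timestamp"].dt.hour
sneezes["weekday"] = sneezes["timestamp"].dt.day_name()

# Rows = day of week, columns = hour of day, cells = number of sneezes.
by_hour_and_day = pd.crosstab(sneezes["weekday"], sneezes["hour"])
```

A table shaped like `by_hour_and_day` can be passed directly to a heatmap function for plotting.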
How long does an AH last for?
Uses: Plan your day? Compare effectiveness of different treatments?
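The slide does not say how the duration was measured. One possible proxy, sketched here under that caveat, is the time from each AH dose to the next logged sneeze, computed with `pandas.merge_asof`. All timestamps and column names are made up.

```python
import pandas as pd

# Made-up event times (assumption).
doses = pd.DataFrame({"dose_time": pd.to_datetime([
    "2015-06-01 08:00", "2015-06-02 07:30",
])})
sneezes = pd.DataFrame({"sneeze_time": pd.to_datetime([
    "2015-06-01 07:50", "2015-06-01 20:00", "2015-06-02 19:00",
])})

# For each dose, find the first sneeze at or after it (both frames must be sorted).
paired = pd.merge_asof(
    doses.sort_values("dose_time"),
    sneezes.sort_values("sneeze_time"),
    left_on="dose_time",
    right_on="sneeze_time",
    direction="forward",
)

# Gap between a dose and the next sneeze, in hours.
paired["hours_until_next_sneeze"] = (
    (paired["sneeze_time"] - paired["dose_time"]).dt.total_seconds() / 3600
)
```

Comparing the distribution of `hours_until_next_sneeze` between two treatments would be one rough way to compare their effectiveness, subject to the logging gaps noted earlier.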
Learning Relationships
● Antihistamine usage is ≈50/50 use/no use per day - treat as a binary classification problem (not timeseries)
● We want a robust, interpretable model
● Logistic Regression with randomly shuffled rows and cross validation
● Can we find any strong features?
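The setup described above can be sketched with scikit-learn roughly as follows. The data is synthetic (three random features, one of which drives the target), so the scores say nothing about the real study; the fold count and the scaling step are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for daily rows: 330 days, 3 features (assumption).
n_days = 330
X = rng.normal(size=(n_days, 3))
# Binary target "took an antihistamine today", driven by feature 0 plus noise.
y = (X[:, 0] + rng.normal(scale=0.5, size=n_days) > 0).astype(int)

# Scaling inside the pipeline keeps each fold's test rows out of the scaler fit.
model = make_pipeline(StandardScaler(), LogisticRegression())

# shuffle=True randomly shuffles rows before splitting, i.e. the days are
# treated as independent samples rather than as a time series.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
```

Shuffling is only reasonable because the problem is framed as per-day classification; with strong day-to-day dependence it can make scores look better than they would on genuinely future data.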
Features - weather and pollution
Annual NO2 pollution via LondonAir.org.uk
weatherData R package for Wunderground London City Airport
1 Year Model
● 84 features (raw & augmented), 330 rows of daily data (resampled from sub-second timestamped raw events)
● Add diet tracker, GPS locations, use of London Underground (Oyster)
● Take a complex model, strip it down, remove everything that doesn't feel right...
● Left with few consistently predictive features - Sneezes per day, Previous day's AH usage <sigh>
● Everything else is not very predictive
● What's wrong with 1 year of data?
● Are signals like external humidity and temperature useful as predictors in e.g. mid-summer or winter?
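As a rough illustration of turning timestamped raw events into daily rows, including a "previous day's AH usage" feature, here is a small Pandas sketch. The events and column names are invented and the real feature set was much larger.

```python
import pandas as pd

# Invented raw events with sub-second timestamps (assumption).
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-06-01 08:15:02.250", "2015-06-01 09:00:10.100",
        "2015-06-01 09:05:00.000", "2015-06-03 07:45:30.500",
        "2015-06-03 21:10:05.750",
    ]),
    "kind": ["sneeze", "antihistamine", "sneeze", "sneeze", "antihistamine"],
})

# One indicator column per event kind, then sum the indicators per calendar day.
indicators = pd.get_dummies(events.set_index("timestamp")["kind"], dtype=int)
daily = indicators.resample("D").sum()

# Binary target and a lagged feature: was an AH taken the day before?
daily["took_ah"] = (daily["antihistamine"] > 0).astype(int)
daily["took_ah_prev_day"] = daily["took_ah"].shift(1)
```

Note that 2015-06-02 appears as a row of zeros. Resampling cannot tell whether that means "nothing happened" or "nothing was logged", which is the experimental design weakness raised earlier.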
April-Aug 2015 Model
Days when Emily was exposed to 'the weather', not in a climate controlled office - suddenly some features emerge
These boxplots show LogReg. coefs. from 5000 models built on 80% randomly sampled training data and scores on 20% test data
Do we trust this?
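The repeated-split procedure behind such boxplots might look something like the sketch below, using scikit-learn's `ShuffleSplit`. The data is synthetic, the feature names are hypothetical, and 200 models are fitted instead of 5000 to keep it quick.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(1)

# Synthetic stand-in for the April-Aug subset (assumption): 150 days, 3 features.
feature_names = ["humidity", "temperature", "no2"]  # hypothetical names
n_days = 150
X = rng.normal(size=(n_days, len(feature_names)))
# Only the first feature carries signal in this toy data.
y = (X[:, 0] + rng.normal(scale=0.7, size=n_days) > 0).astype(int)

n_models = 200  # the slide reports 5000
splitter = ShuffleSplit(n_splits=n_models, test_size=0.2, random_state=1)

coefs, scores = [], []
for train_idx, test_idx in splitter.split(X):
    # Fit on a random 80% of the days, score on the remaining 20%.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    coefs.append(model.coef_[0])
    scores.append(model.score(X[test_idx], y[test_idx]))

coefs = np.array(coefs)    # shape (n_models, n_features): one boxplot per column
scores = np.array(scores)  # shape (n_models,): distribution of test accuracy
```

A coefficient whose box sits clearly away from zero across the splits is a candidate signal; one whose box straddles zero is not stable. With few rows the splits overlap heavily, so the spread understates the true uncertainty, which is one reason to ask "do we trust this?".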
Introducing ‘Nasalcrom’
● “Part Two” - new treatment
● Discussed at King's College with Professor Clive Page and colleague Dr. Emlyn Page (my PyDataLondon co-chair)
● Ruled out allergic reaction (yay!)
● Suggestion was to try Nasalcrom – probably more benign than Loratadine
● New Hypothesis - “Nasalcrom is similarly effective to Loratadine”
Introducing ‘Nasalcrom’
Improving the App’s UI
● We need to log ‘no events happened’ rather than ‘nothing got recorded’ (which might mean we forgot to log events)
● We need feedback in the UI to show that medication is being taken consistently
● Some trend display in the App would probably be useful
● Record “Feeling coldy”
Thoughts to pass on
● Do you have all the data you need to answer your questions?
● Is the data quality high enough?
● Feel free to use our data logger (link earlier) – could you tackle a similar challenge?
● Doing this has opened new doors...
Conclusion
● Challenging problem - we have found 1 potentially predictive signal from scratch
● We can answer “how effective is an antihistamine”
● Nasalcrom and Loratadine seem equally effective
● Thanks to:
Does Alcohol Increase Sneezing?
"Possibly" - we need cleaner data. Hat tip to Jon Sedar for PyMC3 model
