1. Statistically Solving Sneezes and Sniffles - A Work In Progress
#ODSC 2016
License: CC By Attribution
Ian Ozsvald @IanOzsvald ModelInsight.io
Giles Weaver @GilesWeaver
2. Ian.Ozsvald@ModelInsight.io ODSC 2016 @IanOzsvald
@gilesweaver
Who Are We?
● Ian - “Industrial Data Scientist” for 15 yrs
● Giles - bioinformatician turned Data Sci.
3. Goal
● Help my wife have a less sneezy life - therefore try to understand “what drives a person's Rhinitis?” (i.e. sneezes)
● Can we help folk reduce symptoms by explaining the drivers of those symptoms? A step towards “personalised medicine”?
● Could we help people reduce their medication?
● 10–30% of the Western population is affected by Allergic Rhinitis (overall ≈1.4 billion people?)
● Some antihistamines (AH) have negative health associations (e.g. anticholinergics [inc. U.S. Benadryl] linked to Alzheimer's)
● UK folk don't tend to use these AHs, but nobody knows the consequences of long-term usage
5. Hypothesis
● “Ian's wife Emily suffers from non-allergic Rhinitis” (not allergic or infectious Rhinitis)
● “Possibly it is weather related”
● “Alcohol might make things worse”
● “Airborne pollution might be a factor”
● We need to gather data so we can answer these questions
● Note - sneeze & AH behaviour similar out of the country and when at home (I'm not the cause! Nor, probably, is our cat, nor the apartment)
6. Data Gathering Methodology
● iOS
● Event logs
● GPS trace
● Editable history
● Open source
● >1yr old
github.com/radicalrobot/allergy-tracker
7. Some data issues
● Apple's DateTime epoch (2001-01-01) != the Unix DateTime epoch (1970-01-01) - use ISO 8601!
● GPS on London Underground on iPhone 6 confidently reports location (0,0) # Nigeria?!
● Weak experimental design (in hindsight) - we're logging positive events - does “0 events” mean “nothing happened” or “we forgot to log stuff”?
● SQLite→DataFrame with Python for clean-up
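The epoch and (0,0) issues can be handled during clean-up along these lines. This is a minimal sketch using only the standard library; the `events` table, its columns, and the helper names are hypothetical stand-ins, not the app's actual schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Apple reference-date timestamps count seconds from 2001-01-01 UTC,
# whereas Unix timestamps count from 1970-01-01 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_to_iso8601(seconds_since_2001: float) -> str:
    """Convert an Apple reference-date timestamp to an ISO 8601 string."""
    return (APPLE_EPOCH + timedelta(seconds=seconds_since_2001)).isoformat()


def is_bogus_fix(lat: float, lon: float) -> bool:
    """Treat an exact (0, 0) fix as 'no GPS signal' rather than a real place."""
    return lat == 0.0 and lon == 0.0


# Hypothetical schema standing in for the app's SQLite store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, ts REAL, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("sneeze", 0.0, 51.5074, -0.1278),
        ("sneeze", 86400.5, 0.0, 0.0),  # underground: bogus (0, 0) fix
    ],
)

cleaned = []
query = "SELECT kind, ts, lat, lon FROM events ORDER BY ts"
for kind, ts, lat, lon in conn.execute(query):
    location = None if is_bogus_fix(lat, lon) else (lat, lon)
    cleaned.append(
        {"kind": kind, "when": apple_to_iso8601(ts), "location": location}
    )
```

Storing `when` as an ISO 8601 string with an explicit UTC offset removes any ambiguity about which epoch a raw number refers to.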
10. How long does an AH last for?
Uses: Plan your day? Compare effectiveness of different treatments?
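The slide does not say how the duration was estimated. One possible approach, sketched here under that caveat, is to measure the gap between each logged dose and the next logged sneeze. The event times are made up and the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from statistics import median

# Toy, made-up event times (not the talk's data).
doses = [datetime(2015, 6, 1, 8, 0), datetime(2015, 6, 2, 9, 0)]
sneezes = [
    datetime(2015, 6, 1, 7, 30),
    datetime(2015, 6, 1, 20, 0),
    datetime(2015, 6, 2, 19, 0),
]


def hours_until_next_sneeze(doses, sneezes):
    """For each dose, hours until the first sneeze logged after it (if any)."""
    sneezes = sorted(sneezes)
    gaps = []
    for dose in doses:
        # Index of the first sneeze strictly after the dose.
        i = bisect_right(sneezes, dose)
        if i < len(sneezes):
            gaps.append((sneezes[i] - dose).total_seconds() / 3600)
    return gaps


gaps = hours_until_next_sneeze(doses, sneezes)
typical_cover = median(gaps)
```

Computing the same statistic separately per treatment would give a rough basis for comparing them, provided events are logged reliably.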
11. Learning Relationships
● Antihistamine usage is ≈50/50 use/no use per day - treat as binary classification problem (not timeseries)
● We want a robust, interpretable model
● Logistic Regression with randomly shuffled rows and cross validation
● Can we find any strong features?
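The setup described in the bullets can be sketched with scikit-learn as below. The data is synthetic, and the scaling step and 5-fold split are assumptions for illustration; the talk does not state its exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the daily table: 330 rows, 3 made-up features.
X = rng.normal(size=(330, 3))
# Made-up target: "AH taken today" depends mainly on feature 0.
y = (X[:, 0] + 0.5 * rng.normal(size=330) > 0).astype(int)

# Scaling makes coefficient sizes comparable across features.
model = make_pipeline(StandardScaler(), LogisticRegression())

# shuffle=True randomises row order before splitting, so days are treated
# as independent samples rather than as a time series.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Fit once on all rows to inspect the coefficients.
model.fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]
```

With standardised inputs, the coefficient magnitudes in `coefs` give a rough ranking of feature strength, which is what makes this model easy to interpret.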
12. Features - weather and pollution
Annual NO2 pollution via LondonAir.org.uk
weatherData R package for Wunderground London City Airport
13. 1 Year Model
● 84 features (raw & augmented), 330 rows of daily data (resampled from sub-second timestamped raw events)
● Add diet tracker, GPS locations, use of London Underground (Oyster)
● Take a complex model, strip it down, remove everything that doesn't feel right...
● Left with few consistently predictive features - Sneezes per day, Previous day's AH usage <sigh>
● Everything else is not very predictive
● What's wrong with 1 year of data?
● Are signals like external humidity and temperature etc useful as predictors in e.g. mid-summer or winter?
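Resampling raw events into daily rows, plus a "previous day's AH usage" lag feature, might look like the sketch below. It uses only the standard library; the events and column names are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Toy, made-up raw events: (ISO 8601 timestamp, kind).
raw_events = [
    ("2015-06-01T08:00:00.120", "antihistamine"),
    ("2015-06-01T09:15:42.907", "sneeze"),
    ("2015-06-01T21:03:10.001", "sneeze"),
    ("2015-06-03T07:45:00.500", "sneeze"),
]

# Count events per (day, kind).
counts = Counter(
    (datetime.fromisoformat(ts).date(), kind) for ts, kind in raw_events
)

start, end = date(2015, 6, 1), date(2015, 6, 3)
rows = []
day = start
while day <= end:
    rows.append({
        "day": day.isoformat(),
        "sneezes": counts[(day, "sneeze")],
        "ah_taken": int(counts[(day, "antihistamine")] > 0),
    })
    day += timedelta(days=1)

# Lag feature: previous day's AH usage (None for the first day).
for prev, row in zip([None] + rows[:-1], rows):
    row["ah_taken_prev_day"] = None if prev is None else prev["ah_taken"]
```

Note that this sketch fills days without events with 0, which is exactly the "nothing happened" versus "forgot to log" ambiguity raised under the data issues.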
14. April-Aug 2015 Model
Days when Emily was exposed to 'the weather', not in a climate controlled office - suddenly some features emerge
These boxplots show LogReg. coefs. from 5000 models built on 80% randomly sampled training data and scores on 20% test data
Do we trust this?
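The repeated 80/20 resampling behind the boxplots can be sketched as follows. The data and feature names are synthetic stand-ins, and 200 repeats are used instead of 5000 to keep the example quick; the quartiles computed at the end are the values a coefficient boxplot would draw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(1)

# Synthetic stand-in data: 120 "days", 3 made-up standardised features.
feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names
X = rng.normal(size=(120, 3))
y = (X[:, 0] + 0.8 * rng.normal(size=120) > 0).astype(int)

n_models = 200  # the slide used 5000; fewer here to keep the sketch quick
splitter = ShuffleSplit(n_splits=n_models, test_size=0.2, random_state=1)

coefs = np.empty((n_models, X.shape[1]))
scores = np.empty(n_models)
for i, (train_idx, test_idx) in enumerate(splitter.split(X)):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    coefs[i] = clf.coef_[0]                          # one row per model
    scores[i] = clf.score(X[test_idx], y[test_idx])  # accuracy on the 20%

# Per-feature quartiles: rows are the 25th, 50th and 75th percentiles.
quartiles = np.percentile(coefs, [25, 50, 75], axis=0)
```

A feature whose coefficient box sits clearly away from zero across resamples is more plausibly a real signal than one whose box straddles zero.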
15. Introducing ‘Nasalcrom’
● “Part Two” - new treatment
● Discussed at King's College with Professor Clive Page and colleague Dr. Emlyn Page (my PyDataLondon co-chair)
● Ruled out allergic reaction (yay!)
● Suggestion was to try Nasalcrom - probably more benign than Loratadine
● New Hypothesis - “Nasalcrom is similarly effective to Loratadine”
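One simple way to probe this hypothesis is a permutation test on daily sneeze counts under each treatment. This is a swapped-in technique for illustration, not a method named in the talk, and the counts below are made up.

```python
import random
from statistics import mean

# Made-up daily sneeze counts under each treatment (not the talk's data).
loratadine_days = [3, 5, 2, 4, 6, 3, 4, 5]
nasalcrom_days = [4, 3, 5, 4, 2, 5, 3, 4]


def permutation_p_value(a, b, n_permutations=5_000, seed=0):
    """Two-sided permutation test on the difference in mean daily counts."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign days to the two treatments.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


p_value = permutation_p_value(loratadine_days, nasalcrom_days)
```

A large `p_value` only means no difference was detected; on its own it does not demonstrate that the two treatments are equivalent.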
17. Improving the App’s UI
● We need to log ‘no events happened’ rather than ‘nothing got recorded’ (which might mean we forgot to log events)
● We need feedback in UI to show that medication is being taken consistently
● Some trend display in the App would probably be useful
● Record “Feeling coldy”
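The difference between a confirmed zero and a missing day can be made explicit in the data. A minimal sketch, assuming a hypothetical "no events today" check-in that the app would record; all names and values are made up.

```python
from datetime import date, timedelta

# Made-up log: days with events, plus explicit "nothing happened" check-ins.
sneezes_by_day = {date(2016, 3, 1): 4, date(2016, 3, 3): 1}
# Days where the user tapped a hypothetical "no events today" button.
confirmed_quiet_days = {date(2016, 3, 2)}


def daily_sneezes(start, end):
    """Return {day: count}, using None where we cannot tell 0 from 'forgot'."""
    out = {}
    day = start
    while day <= end:
        if day in sneezes_by_day:
            out[day] = sneezes_by_day[day]
        elif day in confirmed_quiet_days:
            out[day] = 0       # a confirmed zero
        else:
            out[day] = None    # missing: no events and no check-in
        day += timedelta(days=1)
    return out


series = daily_sneezes(date(2016, 3, 1), date(2016, 3, 4))
```

Days marked `None` can then be excluded or imputed deliberately at modelling time, rather than being silently counted as symptom-free.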
18. Thoughts to pass on
● Do you have all the data you need to answer your questions?
● Is the data quality high enough?
● Feel free to use our data logger (link earlier) - could you tackle a similar challenge?
● Doing this has opened new doors...
19. Conclusion
● Challenging problem - we have found 1 potentially predictive signal from scratch
● We can answer “how effective is an antihistamine”
● Nasalcrom and Loratadine seem equally effective
● Thanks to:
20. Does Alcohol Increase Sneezing?
"Possibly" - we need cleaner data. Hat tip to Jon Sedar for PyMC3 model