
Statistically Solving Sneezes and Sniffles (ODSC 2016)



Published in: Science


  1. Statistically Solving Sneezes and Sniffles - A Work In Progress. #ODSC 2016. License: CC By Attribution. Ian Ozsvald (@IanOzsvald), Giles Weaver (@GilesWeaver)
  2. ODSC 2016 @IanOzsvald @gilesweaver Who Are We? ● Ian - “Industrial Data Scientist” for 15 years ● Giles - bioinformatician turned data scientist
  3. Goal ● Help my wife have a less sneezy life - therefore try to understand “what drives a person's Rhinitis?” (i.e. sneezes) ● Can we help people reduce symptoms by explaining the drivers of those symptoms? A step towards “personalised medicine”? ● Could we help people reduce their medication? ● 10–30% of the Western population is affected by Allergic Rhinitis (overall ≈1.4 billion people?) ● Some antihistamines (AH) have negative health associations (e.g. anticholinergics, including U.S. Benadryl, have been linked to Alzheimer's) ● People in the UK don't tend to use these AHs, but nobody knows the consequences of long-term usage
  4. Counts of daily sneezes & AH. Using Seaborn and Pandas DataFrames for countplots
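
A minimal sketch of the kind of countplot this slide names, assuming a hypothetical DataFrame with one row per day and a `sneezes` column. The column names and values are invented for illustration; they are not the talk's data or code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical daily log (made-up values, one row per day).
daily = pd.DataFrame({
    "sneezes": [0, 1, 1, 2, 0, 3, 1, 0, 2, 1],
    "antihistamines": [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})

# countplot draws one bar per distinct value, with height = number of days.
ax = sns.countplot(data=daily, x="sneezes")
ax.set_xlabel("Sneezes per day")
ax.set_ylabel("Number of days")
```

Calling `ax.figure.savefig(...)` afterwards would write the chart to disk; swapping `x="sneezes"` for `x="antihistamines"` gives the equivalent plot for daily AH counts.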
  5. Hypothesis ● “Ian's wife Emily suffers from non-allergic Rhinitis” (not allergic or infectious Rhinitis) ● “Possibly it is weather related” ● “Alcohol might make things worse” ● “Airborne pollution might be a factor” ● We need to gather data so we can answer these questions ● Note - sneeze & AH behaviour is similar out of the country and at home (I'm not the cause! Nor, probably, is our cat, nor the apartment)
  6. Data Gathering Methodology ● iOS ● Event logs ● GPS trace ● Editable history ● Open source ● >1 year old
  7. Some data issues ● Apple's DateTime epoch differs from the Unix DateTime epoch (use ISO 8601!) ● GPS on the London Underground on an iPhone 6 confidently reports location (0,0) # Nigeria?! ● Weak experimental design (in hindsight) - we're logging positive events - does “0 events” mean “nothing happened” or “we forgot to log stuff”? ● SQLite→DataFrame with Python for clean-up
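
The first two issues above can be sketched in a few lines of standard-library Python. Apple's reference date is 2001-01-01 00:00:00 UTC, whereas the Unix epoch is 1970-01-01 00:00:00 UTC. That the logging app stores timestamps as seconds since Apple's reference date is an assumption here, and the helper names `apple_to_iso` and `is_plausible_fix` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Apple's reference date versus the Unix epoch, both in UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds to add to an Apple timestamp to get a Unix timestamp.
APPLE_TO_UNIX_OFFSET = (APPLE_EPOCH - UNIX_EPOCH).total_seconds()


def apple_to_iso(seconds_since_2001: float) -> str:
    """Convert seconds since Apple's reference date to an ISO 8601 string.

    Assumption: the stored value counts seconds since 2001-01-01 UTC.
    """
    return (APPLE_EPOCH + timedelta(seconds=seconds_since_2001)).isoformat()


def is_plausible_fix(lat: float, lon: float) -> bool:
    """Reject the (0, 0) fix reported when there is no real GPS signal."""
    return not (lat == 0.0 and lon == 0.0)
```

For example, `apple_to_iso(0)` returns `'2001-01-01T00:00:00+00:00'`, and `is_plausible_fix(0.0, 0.0)` returns `False`. Storing ISO 8601 strings with an explicit offset avoids having to remember which epoch a raw number refers to.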
  8. Sneezes and AH over 1 year. Self-logged data by Emily
  9. Sneezing by hour & day of week
  10. How long does an AH last for? Uses: Plan your day? Compare effectiveness of different treatments?
  11. Learning Relationships ● Antihistamine usage is ≈50/50 use/no use per day - treat as a binary classification problem (not timeseries) ● We want a robust, interpretable model ● Logistic Regression with randomly shuffled rows and cross validation ● Can we find any strong features?
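
A minimal sketch of the setup described on this slide, using scikit-learn on synthetic data. The row count, feature count, scaling step, and 5-fold stratified split are all assumptions chosen for illustration; the talk does not state which cross-validation scheme was used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the daily table: 300 days, 3 made-up features.
# Only the first feature carries signal about the binary target.
n_days = 300
X = rng.normal(size=(n_days, 3))
y = (X[:, 0] + rng.normal(size=n_days) > 0).astype(int)  # 1 = AH taken that day

# Scaling keeps the coefficients comparable across features.
model = make_pipeline(StandardScaler(), LogisticRegression())

# shuffle=True randomises row order before splitting, i.e. each day is
# treated as an independent sample rather than as part of a time series.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Shuffling rows is only reasonable because the problem is framed as per-day classification; if features such as the previous day's AH usage are included, they must be computed before the shuffle.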
  12. Features - weather and pollution. Annual NO2 pollution via R package for Wunderground London City Airport
  13. 1 Year Model ● 84 features (raw & augmented), 330 rows of daily data (resampled from sub-second timestamped raw events) ● Add diet tracker, GPS locations, use of London Underground (Oyster) ● Take a complex model, strip it down, remove everything that doesn't feel right... ● Left with few consistently predictive features - sneezes per day, previous day's AH usage <sigh> ● Everything else is not very predictive ● What's wrong with 1 year of data? ● Are signals like external humidity and temperature etc. useful as predictors in e.g. mid-summer or winter?
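
The resampling from timestamped events to one row per day, plus a "previous day's AH usage" feature, might look roughly like this in pandas. The event names, column names, and timestamps are hypothetical, and this is a sketch of the general technique rather than the pipeline used in the talk.

```python
import pandas as pd

# Hypothetical raw event log with sub-second timestamps (made-up values).
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-04-01 08:15:03.250",
        "2015-04-01 08:15:09.900",
        "2015-04-01 09:00:00.000",
        "2015-04-03 22:41:10.120",
    ]),
    "event": ["sneeze", "sneeze", "antihistamine", "sneeze"],
})

# Count events of each type per calendar day.
day = events["timestamp"].dt.floor("D")
daily = pd.crosstab(day, events["event"])

# Reindex onto every day in the range so days with no events appear as 0.
# Caveat: this cannot tell "nothing happened" from "forgot to log".
all_days = pd.date_range(day.min(), day.max(), freq="D")
daily = daily.reindex(all_days, fill_value=0)

# Binary target and a lagged feature (NaN on the first day, which has no lag).
daily["took_ah"] = (daily["antihistamine"] > 0).astype(int)
daily["took_ah_prev_day"] = daily["took_ah"].shift(1)

print(daily)
```

The zero-filled row for 2015-04-02 is exactly the ambiguity raised under "Some data issues": the data alone cannot say whether it was a symptom-free day or an unlogged one.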
  14. April-Aug 2015 Model. Days when Emily was exposed to 'the weather', not in a climate-controlled office - suddenly some features emerge. These boxplots show logistic regression coefficients from 5000 models built on 80% randomly sampled training data, and scores on the 20% test data. Do we trust this?
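
One way to produce the coefficient distributions behind such boxplots is to refit the model on many random 80/20 splits and collect the coefficients and test scores. The sketch below uses synthetic data, hypothetical feature names, and 200 repeats instead of 5000 to keep it quick; it illustrates the idea and is not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data; feature names are hypothetical labels only.
feature_names = ["humidity", "temperature", "noise"]
n_days = 150
X = rng.normal(size=(n_days, len(feature_names)))
y = (1.5 * X[:, 0] + rng.normal(size=n_days) > 0).astype(int)

n_models = 200  # the slide reports 5000; reduced here for speed
coefs, scores = [], []
for seed in range(n_models):
    # A fresh random 80% train / 20% test split for every model.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = LogisticRegression().fit(X_tr, y_tr)
    coefs.append(clf.coef_[0])
    scores.append(clf.score(X_te, y_te))

coefs = np.array(coefs)    # shape: (n_models, n_features)
scores = np.array(scores)  # test accuracy of each model

# Quartiles per feature: the numbers a boxplot would draw.
q25, q50, q75 = np.percentile(coefs, [25, 50, 75], axis=0)
for name, lo, med, hi in zip(feature_names, q25, q50, q75):
    print(f"{name:12s} median={med:+.2f}  IQR=[{lo:+.2f}, {hi:+.2f}]")
print(f"mean test accuracy: {scores.mean():.2f}")
```

A feature whose interquartile range stays clear of zero across splits is a more credible signal than one whose sign flips. With few rows the splits overlap heavily, though, so the spread understates the real uncertainty, which is one reason to ask "Do we trust this?".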
  15. Introducing ‘Nasalcrom’ ● “Part Two” - new treatment ● Discussed at King's College with Professor Clive Page and colleague Dr. Emlyn Page (my PyDataLondon co-chair) ● Ruled out allergic reaction (yay!) ● Suggestion was to try Nasalcrom - probably more benign than Loratadine ● New Hypothesis - “NasalCrom is similarly effective to Loratadine”
  16. Introducing ‘Nasalcrom’
  17. Improving the App’s UI ● We need to log ‘no events happened’ rather than ‘nothing got recorded’ (which might mean we forgot to log events) ● We need feedback in the UI to show that medication is being taken consistently ● Some trend display in the App would probably be useful ● Record “Feeling coldy”
  18. Thoughts to pass on ● Do you have all the data you need to answer your questions? ● Is the data quality high enough? ● Feel free to use our data logger (link earlier) - could you tackle a similar challenge? ● Doing this has opened new doors...
  19. Conclusion ● Challenging problem - we have found 1 potentially predictive signal from scratch ● We can answer “how effective is an antihistamine” ● Nasalcrom and Loratadine seem equally effective ● Thanks to:
  20. Does Alcohol Increase Sneezing? "Possibly" - we need cleaner data. Hat tip to Jon Sedar for the PyMC3 model