Identifying Rare Diseases from Behavioural Data: A Machine Learning Approach

Presented at CHASE 2016

Published in: Data & Analytics
  1. 1. Identifying Rare Diseases from Behavioural Data: A Machine Learning Approach. Haley MacLeod · Shuo Yang · Kim Oakes · Kay Connelly · Sriraam Natarajan. School of Informatics and Computing, Indiana University. @haley_macleod www.haleymacleod.com
  2. 2. Technology in Consumer Health & Wellness •  Managing health •  Changing behaviour •  Learn about a disease •  Get support •  Track information
  3. 3. Rare Diseases < 0.05%
  4. 4. There are over 7,000 different rare diseases.
  5. 5. 10% of the world’s population has a rare disease.
  6. 6. If everyone with a rare disease lived in the same country, it would be the world's third most populous nation. Rare World
  7. 7. There are experiences in common between people with rare diseases.
  8. 8. These experiences are distinct from the experiences of people with common chronic illnesses.
  9. 9. There is a wealth of opportunities to support these experiences through the design of appropriate technologies.
  10. 10. Can we distinguish between people with common chronic diseases and people with rare diseases?
  11. 11. •  5 – 7 years to get a diagnosis •  2 – 3 misdiagnoses •  Many different specialists & physicians
  12. 12. “They were all like, ‘We don’t know what it is. We don’t want to take this risk. So go away. We have other problems, other patients.’ So I told them, ‘Okay, go on Google and Google it.’ ‘No, no, go to somebody else’s door.’ So we lost a lot of hours. My son had this vomiting episode and it was very scary to have him in the car and going around to find a doctor that is going to be brave enough to give him those shots. And that’s when I said ‘Okay, I’m going to learn to give myself the shot.’”
  13. 13. “I’m always online looking up different things. I just want to be informed . . . I know that Australia is doing an awful lot of research . . . the United States isn’t really doing the kind of research that they are in Australia.”
  14. 14. Can we account for class imbalance as part of the classification process, rather than as a preprocessing step?
  15. 15. The cost of misclassifying the rare class is higher than the cost of misclassifying the common class.
  16. 16. If we treat rare disease as the positive class, then we prefer recall (how many relevant items are selected) over precision (how many selected items are relevant).
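
The preference for recall on slide 16 can be made concrete with the F-beta score, which weights recall beta times as heavily as precision; the results slides report an F3 measure. Below is a minimal sketch, assuming binary labels with 1 for the rare class. The function name and the toy labels are made up for illustration and are not from the talk.

```python
def precision_recall_fbeta(y_true, y_pred, beta=3.0):
    """Return (precision, recall, F-beta), treating label 1 as the positive (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # rare, flagged as rare
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # common, flagged as rare
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # rare, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Toy labels (hypothetical): 1 = rare disease, 0 = common chronic illness.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
precision, recall, f3 = precision_recall_fbeta(y_true, y_pred, beta=3.0)
_, _, f1 = precision_recall_fbeta(y_true, y_pred, beta=1.0)
```

With beta = 3 the score sits closer to recall than the balanced F1 does, so a classifier that misses rare cases is penalised more than one that raises extra false alarms.
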
  17. 17. Oversampling the Minority Class
  18. 18. Undersampling the Majority Class
  19. 19. Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al., 2002)
  20. 20. Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al., 2002)
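
The three pre-processing strategies on slides 17 to 20 can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: rows are assumed to be numeric tuples, the function names are hypothetical, and `smote_like` only captures the core idea of SMOTE (interpolating between a minority row and one of its nearest minority neighbours).

```python
import random


def random_oversample(minority, majority, rng):
    """Duplicate random minority rows until both classes are the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_undersample(minority, majority, rng):
    """Keep a random subset of majority rows, the same size as the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating towards a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k other minority rows closest to `base` (squared Euclidean distance).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy data (hypothetical): 3 minority rows, 7 majority rows.
rng = random.Random(0)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(3.0, 3.0), (3.1, 2.9), (2.9, 3.2), (3.3, 3.1),
            (2.8, 2.7), (3.2, 3.4), (3.0, 2.6)]

over_min, over_maj = random_oversample(minority, majority, rng)
under_min, under_maj = random_undersample(minority, majority, rng)
synthetic = smote_like(minority, n_new=4, k=2, rng=rng)
```

All three change the class distribution before any classifier is trained, which is the step the question on slide 14 proposes to avoid.
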
  21. 21. Today’s Talk: Data (survey design & distribution) Results Approach (soft-margin functional gradient boosting) Concluding thoughts
  22. 22. •  Demographic Information •  Disease Information •  Technology Use •  Health Care Professionals 35 Questions, 4 Topics Data Survey
  23. 23. •  Demographic Information •  Age, gender, country, employment, education, marital status •  Disease Information •  Technology Use •  Health Care Professionals 35 Questions, 4 Topics Data Survey
  24. 24. •  Demographic Information •  Disease Information •  Disease name, years of symptoms, years of diagnosis, severity of symptoms •  Technology Use •  Health Care Professionals 35 Questions, 4 Topics Data Survey
  25. 25. •  Demographic Information •  Disease Information •  Technology Use •  Devices owned, health apps, information seeking, participation in health groups •  Health Care Professionals 35 Questions, 4 Topics Data Survey
  26. 26. •  Demographic Information •  Disease Information •  Technology Use •  Health Care Professionals •  Number of specialists, helpfulness, sources of information 35 Questions, 4 Topics Data Survey
  27. 27. •  30.93% rare diseases, 69.07% common chronic illnesses •  67 Diseases •  Age 18 – 71 •  39.00% male, 59.53% female •  22 countries (mainly US, Canada, UK, and Australia) 341 Responses Data Responses
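  28. 28. Approach Classification Standard Functional Gradient Boosting [Diagram: an initial model produces predictions; data − predictions = gradients; a new function is induced from the gradients and added (+) to the model; iterate.]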
  29. 29. Approach Classification Standard Functional Gradient Boosting Final Model: + + + …
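
The loop on slides 28 and 29 (predict, subtract predictions from the data to get gradients, induce a function from the gradients, add it to the model, iterate) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: a single numeric feature, binary labels, one-split regression stumps as the induced functions, and hypothetical names throughout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Fit a one-split regression stump to the targets by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def predict_proba(model, x):
    """The final model is a sum of functions, squashed to a probability."""
    return sigmoid(sum(f(x) for f in model))


def boost(xs, ys, n_rounds=20):
    """Return the list of functions whose sum is the final model."""
    model = [lambda x: 0.0]  # initial model: P(y = 1 | x) = 0.5 everywhere
    for _ in range(n_rounds):
        probs = [predict_proba(model, x) for x in xs]
        gradients = [y - p for y, p in zip(ys, probs)]  # data minus predictions
        model.append(fit_stump(xs, gradients))          # induce, then add
    return model


# Toy data (hypothetical): the positive class is the minority.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 0, 0, 1, 1]
model = boost(xs, ys)
p_high = predict_proba(model, 0.85)
p_low = predict_proba(model, 0.15)
```

Each round adds one small function, so the final model is the initial model plus the sum of everything induced along the way, as on slide 29.
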
  30. 30. Approach Classification Soft Margin Functional Gradient Boosting Gradients
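
Slide 30 names soft-margin functional gradient boosting and points at the gradients, but the transcript does not preserve the formula. As a loosely related illustration only, and explicitly not the authors' soft-margin gradient, the sketch below shows one generic way to handle imbalance inside the learner rather than by resampling: weighting the pointwise gradient of the log-likelihood by class. The function name and the `rare_weight` parameter are assumptions made for this example.

```python
def weighted_gradients(ys, probs, rare_weight=3.0):
    """Gradient of a class-weighted log-likelihood: w_i * (y_i - p_i).

    Rare-class examples (y = 1) get weight `rare_weight`; common-class
    examples get weight 1. This is a generic cost-sensitive stand-in,
    not the soft-margin formula from the talk.
    """
    return [(rare_weight if y == 1 else 1.0) * (y - p) for y, p in zip(ys, probs)]


# With an uninformed model (p = 0.5), a missed rare case pulls three times harder.
grads = weighted_gradients([1, 0], [0.5, 0.5], rare_weight=3.0)
```

Feeding gradients like these into a boosting loop leaves the training data untouched while still making errors on the rare class more costly.
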
  31. 31. Results Experiments Q1: How effective is a class imbalance approach in detecting rare diseases using self-reported behavioural data?
  32. 32. Results Experiments Q1: How effective is a class imbalance approach in detecting rare diseases using self-reported behavioural data? Standard Classification Methods •  Naïve Bayes •  Logistic Regression •  5-Nearest Neighbours •  Decision Trees Class Imbalance Methods •  Random oversampling •  Random undersampling •  SMOTE •  Soft-FGB VS.
  33. 33. Results Experiments Q1: How effective is a class imbalance approach in detecting rare diseases using self-reported behavioural data? [Figure: two plots comparing Standard Methods and Imbalance Methods, one for True Positive Rate and one for F3 Measure.]
  34. 34. Results Experiments Q2: How effective is our method in handling class imbalance as part of the classification process (as opposed to changing the class distribution)?
  35. 35. Results Experiments Q2: How effective is our method in handling class imbalance as part of the classification process (as opposed to changing the class distribution)? •  Soft-FGB •  Random oversampling •  Random undersampling •  SMOTE VS.
  36. 36. Results Experiments Q2: How effective is our method in handling class imbalance as part of the classification process (as opposed to changing the class distribution)? [Figure: two plots, one for True Positive Rate and one for F3 Measure, comparing the pre-processing methods (Over, Under, SMOTE) with the classification-process method (Soft).]
  37. 37. Concluding Thoughts Discussion People with rare diseases have unique challenges that are distinctly different from those of people with common chronic illnesses, and this presents design opportunities not yet addressed by existing interventions and HCI research.
  38. 38. Concluding Thoughts Discussion People with rare diseases: •  Join health support groups •  Search for information •  Watch videos online •  Post their own videos to share with others •  Post data/test results
  39. 39. Concluding Thoughts Discussion People with common chronic illnesses: •  Never joined a group •  Never posted a review •  Don’t follow friends’ updates •  Trust health professionals •  Use smartphone apps
  40. 40. Concluding Thoughts Future Work •  Social media data •  Integrating into online platforms •  Exploring healthy populations
  41. 41. Haley MacLeod hemacleo@indiana.edu Kay Connelly connelly@indiana.edu Sriraam Natarajan natarasr@indiana.edu Shuo Yang shuoyang@indiana.edu Kim Oakes kimoakes@indiana.edu
