Utilizing Social Health Websites for Cognitive Computing and Clinical Decision Support Systems

Crowdsourced annotation data offers cognitive computing systems insights into lay semantics. This is especially important in health care, where medical terminology is often not aligned with patients' lay language. However, the general crowd often has limited medical knowledge. This research therefore investigated the opportunities of social health websites for obtaining ground truth annotation data for cognitive computing systems, including clinical decision support systems. By identifying these websites and analyzing their data, it offers a starting point for the future utilization of user-generated health content for cognitive systems. However, the opportunities of social health data are currently limited by various legal regulations. This paper therefore also discusses the legal aspects of using social health data in cognitive computing systems.

Published in: Healthcare

  1. Utilizing Social Health Websites for Cognitive Computing: Exploring the Potential of User-Generated Health Content for Clinical Decision Support Systems. Harriëtte Smook, h.smook@vu.nl, 28 October 2014
  2. Cognitive Computing Systems: ‘prostheses’ for human cognition. Interact naturally: bring machines and users closer together by enabling machines to understand human natural language. Expand human cognition: ease processes, especially those involving large data sets or data that requires human interpretation. Learn by being used: humans can often easily detect machine errors, and system usage can be arranged in such a way that humans understand the system and the problems it solves. Introduce a new generation of Clinical Decision Support Systems. Apple Siri, Google Glass, IBM Watson. Why Cognitive Systems? IBM Research. Retrieved from http://www.research.ibm.com/cognitive-computing/why-cognitive-systems.shtml, accessed 16 July 2014. Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.
  3. Clinical Decision Support Systems: IBM Watson. Transformational technologies combined: (1) understands human natural language & human communication; (2) generates & evaluates evidence-based hypotheses; (3) adapts & learns from user selections & responses. Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.
  4. How can Health 2.0 help cognitive computing systems? Social health websites + health tracking tools = PatientsLikeMe, HealthUnlocked, … Social health websites: collaboration of patients, medical experts and researchers; collective aggregation of information, experiences and data. Health tracking tools: tools for collecting, tracking and sharing health information: • Monitoring new treatments • Collecting real-world experiences • Patients have more explicit control over their own data
  5. How can Health 2.0 help cognitive computing systems? Experts provide formal knowledge (“My patient has acute coryza!”) + the crowd provides human perspectives (“Well, I only have a cold.”) = a new generation of Clinical Decision Support Systems. Crowdsourcing human semantics: doctors, patients, health-aware citizens.
  6. How to utilize user-generated health content as training data for cognitive computing systems? (1) Gather the data: PatientsLikeMe, publicly available pages. (2) Data analysis: representativeness, validity, consistency. (3) Create ground truth data: compare with existing Watson data.
  7. Data Analysis: important aspects for obtaining widespread health data, with the values for PatientsLikeMe (PLM). Coverage of different medical conditions: > 500 conditions. Availability of different kinds of data: diverse health tracking tools. Consistency in the used vocabulary: 43% of the symptoms covered by UMLS. Cultural and geographical dispersion of users: > 260,000 users, website in English. Catherine Arnott Smith and Paul J Wicks. PatientsLikeMe: Consumer health vocabulary as a folksonomy. In AMIA Annual Symposium Proceedings, vol. 2008, p. 682. American Medical Informatics Association, 2008.
  8. PLM Data Analysis. Demographic analysis: • Data analysis in terms of demographics & population • Countries of residence, gender & age. Analysis of top-reported conditions: • Prevalence on PLM vs. prevalence in the U.S. • Demographics per top-reported condition vs. official health statistics: gender, peak age & onset age. Analysis of top-reported treatments: • Top-reported treatments vs. official drug prescription statistics • PLM treatments per top-reported condition vs. officially listed treatments in the U.S. Lexical analysis: • PLM conditions and treatments compared with official medical terminology (UMLS)
  9. PLM Data Characteristics: 373600 patients (age, gender, gender per age category); 233153 unique members; 99274 U.S. members; 697 conditions (current age, onset age); 432 conditions (reported treatments, perceived effectiveness of treatments); 1617 treatments (current patients, stopped patients, adherence, burden, costs, current duration, past duration, severity of side effects); 1257 treatments (reported purpose, perceived effectiveness per purpose); 1172 treatments (top reported dosages); 1032 treatments (top reasons why people stopped); 663 treatments (top reported side effects); 663 conditions (current patients, gender, primary condition, condition status, top reported symptoms).
  10. Demographic Analysis: countries of residence, gender and age. 37% of PatientsLikeMe's members live in the United States. Share of members per country of residence: United States 37.2%; United Kingdom 4.2%; Canada 2.7%; Australia 1.1%; India 0.8%; South Africa 0.3%; Ireland 0.3%; New Zealand 0.2%; Other 51.7%.
  11. The dataset is biased towards women. [Chart: percentage of all members per age category (0–4 to 75–79), by gender.] Gender ratio: male 1, female 2.35.
  12. People aged 30–70 are overrepresented. [Chart: percentage per age category (0–4 to 75–79), PLM vs. USA.]
  13. Top-reported conditions are more prevalent on PatientsLikeMe than in the United States. Prevalence per condition (PLM vs. U.S.):
      1. Fibromyalgia: 21.4% vs. 2%
      2. Multiple Sclerosis: 19.3% vs. 0.1%
      3. Major Depressive Disorder: 8.7% vs. 6.7%
      4. Generalized Anxiety Disorder: 7% vs. 3.1%
      5. Chronic Fatigue Syndrome: 6.6% vs. 0.3%
      6. Parkinson’s Disease: 6.6% vs. 0.3%
      7. Epilepsy: 4.5% vs. 0.2%
      8. Rheumatoid Arthritis: 2.4% vs. 0.6%
      9. Amyotrophic Lateral Sclerosis: 3.3% vs. 0.01%
      10. Post-Traumatic Stress Disorder: 3.4% vs. 3.6%
      The most prevalent conditions in the U.S. are mainly related to heart disease and overweight.
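The comparison on slide 13 can be made explicit as a ratio of PLM prevalence to U.S. prevalence. The following is a minimal sketch, not the author's method: the figures are copied from the slide, while the names `PREVALENCE` and `overrepresentation` are illustrative assumptions.

```python
# Prevalence in percent, copied from slide 13: (PLM, U.S.).
PREVALENCE = {
    "Fibromyalgia": (21.4, 2.0),
    "Multiple Sclerosis": (19.3, 0.1),
    "Major Depressive Disorder": (8.7, 6.7),
    "Generalized Anxiety Disorder": (7.0, 3.1),
    "Chronic Fatigue Syndrome": (6.6, 0.3),
    "Parkinson's Disease": (6.6, 0.3),
    "Epilepsy": (4.5, 0.2),
    "Rheumatoid Arthritis": (2.4, 0.6),
    "Amyotrophic Lateral Sclerosis": (3.3, 0.01),
    "Post-Traumatic Stress Disorder": (3.4, 3.6),
}


def overrepresentation(plm_pct, us_pct):
    """Return how many times more prevalent a condition is on PLM than in the U.S."""
    return plm_pct / us_pct


# Hypothetical helper output: one ratio per condition.
ratios = {
    name: overrepresentation(plm, us) for name, (plm, us) in PREVALENCE.items()
}

# Print the conditions from most to least overrepresented.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.1f}x")
```

A ratio above 1 means the condition is reported more often on PLM than it occurs in the U.S. population; a ratio below 1 means the opposite.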
  14. Demographics per condition. Gender: women are overrepresented in all top conditions on PatientsLikeMe. Peak age: PLM patients suffering from mental health conditions are remarkably older than the peak age; PLM patients suffering from conditions common among the elderly are remarkably younger. Onset age: PLM patients suffering from mental health conditions often experience these already in their childhood.
  15. Top-reported treatments are less popular prescription drugs in the U.S.
      Top-reported PLM treatments versus official U.S. rankings (PLM rank, treatment: U.S. rank):
      1. Gabapentin: 20
      2. Duloxetine: n.a.
      3. Pregabalin: n.a.
      4. Baclofen: n.a.
      5. Clonazepam: n.a.
      6. Copaxone: n.a.
      7. Levothyroxine: 2
      8. Tramadol: 21
      9. Lamotrigine: n.a.
      10. Bupropion: n.a.
      Official U.S. rankings versus top-reported PLM treatments (U.S. rank, treatment: PLM rank):
      1. Hydrocodone Paracetamol: 13
      2. Levothyroxine Sodium: 7
      3. Lisinopril: 37
      4. Simvastatin: 42
      5. Metoprolol: 53
      6. Amlodipine: 57
      7. Omeprazole: 9
      8. Metformin: 22
      9. Salbutamol: 28
      10. Atorvastatin: n.a.
      Frequently prescribed drugs in the U.S. are less popular on PLM.
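The rank comparison on slide 15 can be summarized by counting how many of the ten top-reported PLM treatments also appear in the official U.S. ranking. This is a rough sketch under stated assumptions: the ranks are copied from the first table on the slide, "n.a." is encoded as `None`, and the function names are hypothetical.

```python
# U.S. prescription rank of each of the ten top-reported PLM treatments,
# copied from slide 15. None stands for "n.a." on the slide.
US_RANK_OF_PLM_TOP10 = {
    "Gabapentin": 20,
    "Duloxetine": None,
    "Pregabalin": None,
    "Baclofen": None,
    "Clonazepam": None,
    "Copaxone": None,
    "Levothyroxine": 2,
    "Tramadol": 21,
    "Lamotrigine": None,
    "Bupropion": None,
}


def ranked_in_both(ranks):
    """Return the treatments that have a U.S. rank at all."""
    return [name for name, rank in ranks.items() if rank is not None]


def shared_top_n(ranks, n=10):
    """Return the treatments whose U.S. rank falls within the top n."""
    return [name for name, rank in ranks.items() if rank is not None and rank <= n]


ranked = ranked_in_both(US_RANK_OF_PLM_TOP10)
shared = shared_top_n(US_RANK_OF_PLM_TOP10)
print(f"{len(ranked)} of 10 PLM treatments have a U.S. rank: {ranked}")
print(f"{len(shared)} of 10 are also in the U.S. top 10: {shared}")
```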
  16. Lexical analysis: the majority of the treatments and conditions are covered by UMLS. Lexical tools: • BeCas [1] • UMLS Metathesaurus Browser [2] • NCBO BioPortal Annotator [3] • RxTerms [4]. All treatments and conditions from the data set are compared with UMLS: only 2 out of 1025 unique treatments & 9 out of 663 unique conditions are not covered. Reasons a term is not covered: • The term is too general (e.g. accidental fall) • The term is proposed and not yet included in UMLS, or is under discussion • The term has been removed from UMLS • The term is not evidence-based and is used by alternative healers. [1] http://bioinformatics.ua.pt/becas/#!/about [2] http://uts.nlm.nih.gov/home.html [3] http://bioportal.bioontology.org/annotator [4] http://wwwcf.nlm.nih.gov/umlslicense/rxtermApp/rxTerm.cfm
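The coverage figures on slide 16 translate into percentages with simple arithmetic. The sketch below only restates the counts given on the slide; the function name `coverage_pct` is an assumption for illustration, and no UMLS lookup is performed.

```python
def coverage_pct(total_terms, uncovered_terms):
    """Return the percentage of terms that are covered by the vocabulary."""
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    covered = total_terms - uncovered_terms
    return 100.0 * covered / total_terms


# Counts copied from slide 16.
treatment_coverage = coverage_pct(total_terms=1025, uncovered_terms=2)
condition_coverage = coverage_pct(total_terms=663, uncovered_terms=9)

print(f"Treatments covered by UMLS: {treatment_coverage:.2f}%")
print(f"Conditions covered by UMLS: {condition_coverage:.2f}%")
```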
  17. Issues in utilizing user-generated health content as training data for cognitive computing systems. Bias & limitations: each data source comes with bias and limitations that need to be considered. Accessibility: data is not easily accessible. Privacy issues: how to avoid them?
  18. Opportunities in utilizing user-generated health content as training data for cognitive computing systems: access to high coverage of (rare) medical conditions; access to patients and health-aware citizens as an intermediate between the general crowd and experts; knowledge from the patients’ perspective.
  19. In the future: perform analysis on data from alternative geographical contexts; perform analysis on data with different characteristics; generate better ground truth data.
