
Leveraging Text Classification Strategies for Clinical and Public Health Applications

Human-generated text is a critical component of recorded clinical data, yet remains an under-utilised resource in clinical informatics applications due to minimal standards for sharing of unstructured data as well as concerns about patient privacy. Where we can access and analyse clinical text, we find that it provides a hugely valuable resource. In this talk, I will describe two projects where we have used text classification as the basis for addressing a clinical objective: (1) a syndromic surveillance project where the task is the monitoring of health and social media data sources for changes that indicate the onset of disease outbreaks, and (2) the analysis of hospital records to enable retrieval of specific disease cases, for monitoring of the hospital case mix as well as for construction of patient cohorts for clinical research studies. I will end by briefly discussing the huge potential for clinical text analysis to support changing the way modern medicine is practised.

Published in: Health & Medicine


  1. 1. Leveraging Text Classification Strategies for Clinical and Public Health Applications Karin M. Verspoor @karinv karin.verspoor@unimelb.edu.au The University of Melbourne Melbourne, Victoria, Australia January 2016, Qatar Computing Research Institute
  2. 2. (clinical) Data everywhere • Electronic health records – Patient demographics and biometrics – Laboratory test results – Clinical notes • Radiology and pathology – Images: X-ray, MRI and PET Scans – (Synoptic) Reports • Databases – Health Service reporting – National Prescribing Service – Registry data, Births and Deaths – Medicare/insurance claim data etc…..
  3. 3. Don’t forget unstructured data! • About 80% of clinical information is in textual form – ED triage notes – Clinical progress notes – Radiology and Pathology reports – GP and specialist letters – Discharge summaries • Published Literature – Clinical Trials – Molecular-level studies • and … social media text!
  4. 4. How is text used in medicine? • Direct analysis of clinical records – Information retrieval for clinical trials – Syndromic surveillance – Hospital Services Research – Clinical Decision Support – Pharmacovigilance • Literature mining – Evidence-based medicine – Systematic Reviews
  5. 5. Evidence from EHRs Mining electronic health records: towards better research applications and clinical care Peter B. Jensen, Lars J. Jensen & Søren Brunak, Nature Reviews Genetics 13, 395-405 (2012) doi:10.1038/nrg3208
  6. 6. Pharmacovigilance from EHRs Mining of clinical records to identify adverse drug events. Estimated >90% of adverse events do not appear in coded data. LePendu et al. (2013) “Pharmacovigilance Using Clinical Notes” Clinical Pharmacology & Therapeutics 93(6), 547–555; doi: 10.1038/clpt.2013.47
  7. 7. … from social media Pacific Symposium on Biocomputing Shared Task on Social Media Mining Classification of tweets: does the tweet mention an Adverse Drug Reaction? ADR-classified: “@NAME Q makes me hungry. Olanzapine made me want to eat my own arm!” Non-ADR-classified: “I couldnt be a chef without nicotine and caffeine”
  8. 8. Outline Problem setting Approach and Results EHR disease classification Kocbek et al (2015) Evaluating classification power of linked admission data sources with text mining; Proceedings of the Scientific Stream at Big Data in Health Analytics 2015 (BigData 2015).
  9. 9. ICD classification of EHR data • We address the task of detecting clinical records in a large record system corresponding to a given diagnosis of interest, based on text analysis • We focus on lung cancer records for a pilot study • We developed a system that classifies each admission as positive or negative for lung cancer • Not as simple as looking for “lung cancer” or synonyms in the EHRs! Kocbek et al (2015) HISA Big Data conference. http://ceur-ws.org/Vol-1468/bd2015_kocbek.pdf
  10. 10. Alfred REASON platform Kocbek et al. Big Data 2015, Sydney • 15+ years of data. • 171,000+ updates each day. • 62.4 million updates per annum.
  11. 11. Task: example data for one admission (the target is the admission ICD-10 code).
      Radiology question: "50yo complaining of left shoulder pain. Tender generally. Difficulty abducting the shoulder past 45 degrees. Home on HITH tomorrow - either inpatient or outpatient please"
      Radiology report: "Mobile Chest performed on 02-JUN-2012 at 08:27 AM: The nasogastric tube has its tip in the stomach. The tracheostomy is seen at T2 level. …"
      Pathology report: "Urine Culture Acc No: 12-183-0731 Source: Urine ------------ URINE MICROSCOPY (PHASE CONTRAST) ------------- Leucocytes x10^6/L (Ref <10).... <10 Erythrocytes x10^6/L (Ref <10).. <10......."
      Additional data: Age: 50, Date of admission: Jun/12, Gender: F, Country: …
      Admission ICD-10 code: (target label)
  12. 12. Data Characteristics • Extracted data for 2 financial years, 2012-2014: – 150,521 admissions, – 40,800 radiology reports with associated question, – 20,872 pathology reports, – 121,700 additional data entries (demographics, hospital admission info). • Admissions are associated with ICD-10 codes: – Used as ground truth – ICD-10 code C34.*: positive cases for lung cancer – 496 such positive admissions – an additional 496 non-lung cancer admissions randomly subsampled as negatives
  13. 13. Outline Problem setting Approach and Results EHR disease classification
  14. 14. Research Question • Most previous text mining (TM) applications use a single textual data source from the EHR, despite the diversity of potential data • What is the impact of using more than one textual data source for the EHR classification task? – Considering different text sources; – and including patient (structured) meta-data?
  15. 15. Methods: REASON sources (radiology reports, radiology questions, pathology reports, additional data) feed a language processing step drawing on biomedical knowledge sources; the resulting textual and other features are passed to a machine learning algorithm (SVM) to build a classification model.
  16. 16. Text Processing • Medical terminology recognition and normalisation using MetaMap • NegEx to detect negation and negation scope The nasogastric tube has its tip in the stomach. Meta Candidates (Total=9; Excluded=0; Pruned=0; Remaining=9) 1000 C0085678:Nasogastric tube [Medical Device] 1000 C0812428:Nasogastric tube (Nasogastric tube procedures) [Therapeutic Procedure] 861 C0175730:Tube (biomedical tube device) [Medical Device] 861 C0694637:Nasogastric (Nasogastric Route of Drug Administration) [Functional Concept] 861 C1547937:Tube NOS (Specimen Source Codes - Tube) [Intellectual Product] 861 C1561954:tube [Conceptual Entity] 861 C1704730:TUBE (Packaging Tube) [Medical Device] 861 C1704731:Tube (Tube Device Component) [Medical Device] 861 C3282907:Nasogastric [Body Location or Region] Meta Mapping (1000): 1000 C0085678:Nasogastric tube [Medical Device]
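The slide above pairs MetaMap concept recognition with NegEx negation detection. As a rough illustration of the NegEx idea only (not the actual NegEx implementation; the trigger list and scope rule below are invented), a minimal sketch of trigger-plus-scope negation tagging:

```python
import re

# A few NegEx-style negation triggers (the real NegEx lexicon is much larger).
NEGATION_TRIGGERS = ["no evidence of", "no sign of", "denies", "without", "no "]

def negated_spans(text, scope_tokens=5):
    """Return (start, end) character offsets covered by a negation scope.

    Very simplified: the scope is the `scope_tokens` tokens following a trigger.
    """
    spans = []
    lowered = text.lower()
    for trigger in NEGATION_TRIGGERS:
        for match in re.finditer(re.escape(trigger), lowered):
            tail = text[match.end():]
            scope_len = len(" ".join(tail.split()[:scope_tokens]))
            spans.append((match.end(), match.end() + scope_len))
    return spans

def concept_context(text, concept):
    """Label a concept mention as 'Negative' if it falls inside a negation scope."""
    pos = text.lower().find(concept.lower())
    if pos == -1:
        return None
    inside = any(start <= pos < end for start, end in negated_spans(text))
    return "Negative" if inside else "Positive"

print(concept_context("The nasogastric tube has its tip in the stomach.", "nasogastric tube"))  # Positive
print(concept_context("No evidence of pneumothorax.", "pneumothorax"))                         # Negative
```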
  17. 17. Features Texts • bag of (MetaMap) phrases – separate feature for Positive/Negative context – experimented with keeping phrases separated according to source, or merging across sources Patient meta-data • demographic data (gender, age, ethnic origin, country, language, marital status, religion, and death date) • hospital-related admission data (hospital code, admission date and time, discharge date and time, length of stay, reason for admission, admission unit, discharge unit, admission type, source, destination and criteria)
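A minimal sketch of the feature scheme described on this slide: one bag-of-phrases feature per (context, phrase) pair, either merged across sources or kept source-specific. The phrase strings, source names and helper names below are illustrative, not taken from the actual system.

```python
from collections import Counter

def phrase_features(phrases_by_source, merge_sources=True):
    """Build bag-of-phrases features for one admission.

    phrases_by_source: {source_name: [(phrase, context), ...]}
      where context is 'POS' or 'NEG' (e.g. as assigned by NegEx).
    merge_sources: if True, aggregate the same phrase across sources into one feature;
      if False, keep a separate feature per source.
    """
    features = Counter()
    for source, phrases in phrases_by_source.items():
        for phrase, context in phrases:
            key = f"{context}:{phrase}" if merge_sources else f"{source}|{context}:{phrase}"
            features[key] += 1
    return features

admission = {
    "radiology_report": [("nasogastric tube", "POS"), ("pneumothorax", "NEG")],
    "radiology_question": [("shoulder pain", "POS")],
    "additional_data": [],  # structured meta-data would be added as separate features
}

print(phrase_features(admission, merge_sources=True))
print(phrase_features(admission, merge_sources=False))
```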
  18. 18. Experimental setting • Heavily skewed data: undersampling of negatives • 10-fold cross validation • Support Vector Machine (Weka)
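A minimal sketch of this experimental setup under stated assumptions: scikit-learn's LinearSVC and StratifiedKFold stand in for the Weka SVM actually used, and the feature matrix is random toy data sized to the 496 positive / 496 undersampled negative admissions from slide 12.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy stand-in for the real feature matrix: negatives undersampled to match positives.
n_pos, n_neg = 496, 496
X = rng.random((n_pos + n_neg, 100))
y = np.array([1] * n_pos + [0] * n_neg)

clf = LinearSVC()  # the published experiments used an SVM in Weka
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(f"10-fold CV F-score: {scores.mean():.3f} +/- {scores.std():.3f}")
```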
  19. 19. Results: Lung Cancer [Bar chart, F-score: radiology reports alone 0.873; radiology reports + 1 additional data source 0.901. Candidate additional sources: radiology question, pathology report, additional data]
  20. 20. Results: Lung Cancer [Bar chart, F-score: radiology reports alone 0.873; + 1 additional data source 0.901; + 2 additional data sources 0.917. Legend: radiology question, pathology reports, additional data]
  21. 21. Results: Lung Cancer [Bar chart, F-score: radiology reports alone 0.873; + 1 source 0.901; + 2 sources 0.917; all 4 data sources 0.930. Legend: radiology question, pathology reports, additional data]
  22. 22. Discussion • More data sources lead to better performance • The classifier with the highest performance was built using features from all four data sources • Merging sources into aggregate features works better than keeping per-source features • Not all improvements are significant: – The radiology question and metadata add clear value – Pathology reports do not • Not all admissions had a pathology report associated with them.
  23. 23. Case study 1: Conclusions • We built a text mining system for detecting lung cancer admissions using machine learning methods. • Our results show more effective systems can generally be built by including multiple linked data sources. • Work in progress: – Other diseases – Imbalanced datasets – Feature engineering and selection [Bar chart: corresponding F-scores for breast cancer, reaching 0.893]
  24. 24. Outline DOD with Twitter Emotion classification DOD signal 1: Tweet emotion shift DOD signal 2: Tweet lexical shift Disease Outbreak Detection Ofoghi et al (2016) Towards early discovery of salient health threats: A social media emotion classification technique; Pacific Symposium on Biocomputing.
  26. 26. Twitter for Outbreak Detection Assumptions • People tweet about diseases in the context of emerging outbreaks • Twitter can provide an “early warning” of an outbreak “Tweets started to rise in Nigeria 3-7 days prior to the official announcement of the first probable Ebola case. The topics discussed in tweets include risk factors, prevention education, disease trends, and compassion.” Amer J Infection Control (2015)
  27. 27. “Early warning” tweets
  28. 28. Ebola on Twitter
  29. 29. Twitter for Outbreak Detection Strategy • Trends: counting of (hashtag, term) frequencies • Coupled with geographic origin of tweets • Sentiment or content analysis Challenges • High volume of (mostly irrelevant) tweets • Hashtags alone may not be adequate • A mention of a disease does not necessarily indicate an active case
  30. 30. Many reasons to mention Ebola
  31. 31. DOD with Twitter | Previous Work
  32. 32. Is there a local emergent threat? Can we use shifts in emotional and lexical content of tweets to detect a disease outbreak?
  33. 33. A sliding window model
  34. 34. Ebola event/background data (windows of ±7 days around each date):
      ebola-event-1 (29-Dec-14): pre-corpus 73 tweets, vocabulary 204; post-corpus 337 tweets, vocabulary 906
      ebola-event-2 (31-Jan-15): pre-corpus 165 tweets, vocabulary 700; post-corpus 90 tweets, vocabulary 417
      ebola-background (16-Dec-14): pre-corpus 429 tweets, vocabulary 1453; post-corpus 340 tweets, vocabulary 1208
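A minimal sketch of how the ±7-day pre- and post-event windows behind these corpora could be materialised from timestamped tweets; the tweet data and boundary conventions (pre-window half-open, post-window closed) are illustrative assumptions.

```python
from datetime import datetime, timedelta

def pre_post_corpora(tweets, event_date, window_days=7):
    """Split tweets into a pre-event and a post-event corpus around event_date.

    tweets: iterable of (timestamp: datetime, text: str)
    window_days: half-width of the window on each side (here +/- 7 days)
    """
    window = timedelta(days=window_days)
    pre = [text for ts, text in tweets if event_date - window <= ts < event_date]
    post = [text for ts, text in tweets if event_date <= ts <= event_date + window]
    return pre, post

tweets = [
    (datetime(2014, 12, 24), "ebola screening at the airport today"),
    (datetime(2014, 12, 30), "so scared, another suspected ebola case nearby"),
]
pre, post = pre_post_corpora(tweets, datetime(2014, 12, 29))
print(len(pre), len(post))  # 1 1
```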
  35. 35. Outline DOD with Twitter Emotion classification DOD signal 1: Tweet emotion shift DOD signal 2: Tweet lexical shift
  36. 36. Emotion classes • ECs: Ekman’s six basic emotions plus … – News-related – Criticism – Sarcasm (https://www.behance.net/gallery/6-Basic-Emotions/930168) • Examples: Sarcastic: “atsign atsign think I got Ebola there two minutes ago”; News-related: “atsign Another 4 American Ebola workers flown back to USA for monitoring..”
  37. 37. Emotion classifier data • Data collection: – Twitter API – Second half of March 2015 – Total of 12,101 tweets – Contained “ebola” or “#ebola” – 4,405 tweets remained after some filtering… – Amazon’s Mechanical Turk was used to label tweets
  38. 38. Lexicon-Based Classification • Created an emotion vocabulary – Profile of Mood States (POMS) – FrameNet – Existing “feelings list” – Wikipedia • Vector space model – Binary vector per emotion – Binary vector per tweet – Cosine similarity, emotion vs. tweet [Diagram: ~499-dimensional binary vectors indexed by vocabulary entries such as “anxious”, “affronted”, “:-|”] https://bitbucket.org/readbiomed/socialsurveillance
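A minimal sketch of the binary-vector, cosine-similarity scheme on this slide: each emotion class and each tweet is mapped to a binary vector over the emotion vocabulary, and the tweet is assigned the most similar class. The tiny vocabulary and class lexicons below are invented stand-ins for the ~499-entry resource.

```python
import numpy as np

# Illustrative emotion vocabulary and per-class lexicons.
VOCAB = ["anxious", "affronted", "scared", "lol", ":-|"]
CLASS_LEXICONS = {
    "fear":    {"anxious", "scared"},
    "sarcasm": {"lol", ":-|"},
}

def binary_vector(terms):
    return np.array([1.0 if t in terms else 0.0 for t in VOCAB])

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v) / denom

def classify(tweet):
    tweet_vec = binary_vector(set(tweet.lower().split()))
    scores = {c: cosine(binary_vector(lex), tweet_vec) for c, lex in CLASS_LEXICONS.items()}
    return max(scores, key=scores.get), scores

print(classify("so scared and anxious about ebola"))
# ('fear', {'fear': 1.0, 'sarcasm': 0.0})
```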
  40. 40. Outline DOD with Twitter Emotion classification DOD signal 1: Tweet emotion shift DOD signal 2: Tweet lexical shift
  41. 41. Emotion class distribution (paired t-test over pre- vs. post-event windows; * statistically significant at the 5% level):
      6 emotions: ebola-event-1 p = 0.004*, ebola-event-2 p = 0.002*, ebola-background p = 0.259
      6 emotions + 3 additional classes: ebola-event-1 p = 0.009*, ebola-event-2 p = 0.007*, ebola-background p = 0.079
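A sketch of the kind of test reported above, under the assumption that the pairing is over emotion classes (each class's share of tweets in the pre- versus post-event window); the proportions are made up for illustration.

```python
from scipy.stats import ttest_rel

# Hypothetical per-class proportions of tweets in the pre- and post-event windows,
# one value per emotion class (fear, anger, sadness, ...).
pre_proportions = [0.10, 0.05, 0.08, 0.20, 0.12, 0.45]
post_proportions = [0.22, 0.09, 0.07, 0.15, 0.18, 0.29]

t_stat, p_value = ttest_rel(pre_proportions, post_proportions)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value below 0.05 would be flagged as significant, as with the event corpora above.
```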
  42. 42. Jensen-Shannon divergence (pre- vs. post-event, per emotion class):
      Class          ebola-event-1   ebola-event-2   ebola-background
      Sarcasm        0.0227          0.0032          0.1365
      News-related   0.0226          0.0001          0.0074
      Anger          0.0572          0.0382          0.0169
      Criticism      0.0180          0.0056          0.0060
      Surprise       0.1161          0.0220          0.0023
      Fear           0.0768          0.0813          0.0913
      Happiness      0.0444          0.0415          0.0064
      Disgust        0.0604          0.0025          0.0044
      Sadness        0.0023          0.0322          0.0060
      AVERAGE        0.0467          0.0252          0.0308
      Big differences compared with the background collection, in both event-1 and event-2.
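For reference, a small sketch of how Jensen-Shannon divergence values like those above can be computed; the two-outcome (class vs. not-class) modelling of each emotion class is an assumption, as are the example numbers.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions, with 0 * log 0 treated as 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture M = (P + Q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustrative: share of tweets labelled with one class in the pre- vs. post-event window,
# modelled as a two-outcome distribution (class, not-class).
pre = [0.10, 0.90]
post = [0.25, 0.75]
print(round(js_divergence(pre, post), 4))
```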
  43. 43. Outline DOD with Twitter Emotion classification DOD signal 1: Tweet emotion shift DOD signal 2: Tweet lexical shift
  44. 44. Lexical shift analysis: within-corpus analysis (pre- vs. post-event within one collection); cross-corpus analysis (event vs. background collections). [Diagrams]
  45. 45. Term freq changes: Event 1
  46. 46. Term freq changes: Background
  47. 47. Case study 2: Conclusions • We introduced an Ebola tweet-based emotion classifier. • There are statistically significant differences in the distribution of emotion classes and lexical items in tweets preceding and following a salient emergent health threat. • This effect does not occur in a neutral background collection. Proposal: • Disease outbreak detection can be supported with monitoring of tweets using a sliding window model that tests for such distributional changes
  48. 48. Conclusions • There are myriad problems in the clinical context where unstructured data can be leveraged to good effect • Text classification is one tool that can be drawn on to make use of this unstructured data • Heterogeneous data integration is also important • Challenges exist in – Terminology – Skewed data – Missing data
  49. 49. Acknowledgements • Amazon Mechanical Turkers • James McCaw, Melbourne School of Population and Global Health • Bahador Ofoghi, Lawrence Cavedon, Simon Kocbek
  50. 50. Thank you!
  51. 51. ML-Based Classification • MALLET Naïve Bayes • Features – bag of words (with and without lemmatisation) – Lexicon-based similarity – emotion vocabulary – emoticons – punctuation – (Stanford) sentiment
  52. 52. KL-Divergence, full vocabulary
  53. 53. Emotion-level distribution KL-divergence (pre- vs. post-event, post- vs. pre-) P(x) and Q(x) represent probability of positive and negative emotion classes in the respective corpora
  54. 54. Top lexically distinct items
  55. 55. Log Likelihood analysis
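The appendix slide above refers to a log-likelihood comparison of term frequencies across corpora. One standard formulation of that statistic (the Rayson and Garside corpus-comparison log-likelihood, which may differ from the exact variant used in this work) can be sketched as follows; the counts are hypothetical.

```python
import math

def log_likelihood(freq1, freq2, total1, total2):
    """Log-likelihood (G2) keyness of a term across two corpora.

    freq1, freq2: frequency of the term in corpus 1 and corpus 2
    total1, total2: total token counts of the two corpora
    """
    expected1 = total1 * (freq1 + freq2) / (total1 + total2)
    expected2 = total2 * (freq1 + freq2) / (total1 + total2)
    ll = 0.0
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2.0 * ll

# Hypothetical counts: 'quarantine' appears 40 times in 10,000 post-event tokens
# but only 5 times in 12,000 pre-event tokens.
print(round(log_likelihood(40, 5, 10_000, 12_000), 2))
```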
