Semantic Processing of Twitter Traffic for Epidemic Surveillance

Overview of initial research applying semantic natural language processing to Twitter posts about "swine flu", with an outline of next steps.

Published in: Health & Medicine, Technology

  1. 1. Semantic Processing of Twitter Traffic for Epidemic Surveillance David S. Hale, Alla Keselman, Thomas C. Rindflesch Lister Hill National Center for Biomedical Communications Specialized Information Services
  2. 2. Pandemic Preparedness <ul><li>Early detection is critical to effective response </li></ul><ul><li>“The truth is out there, it is just not indexed well” </li></ul><ul><ul><li>(A bumper sticker; NLM parking lot) </li></ul></ul><ul><li>Disaster information traffic: delays, loss, overload </li></ul><ul><li>Outbreak data requires fast collection / dissemination </li></ul><ul><ul><li>Collection – from disease to syndromic surveillance </li></ul></ul><ul><ul><li>Dissemination – from formal announcements to informal channels </li></ul></ul><ul><li>Government agencies are adopting Web 2.0 innovations </li></ul><ul><ul><li>E.g., CDC on Twitter </li></ul></ul>
  3. 3. Monitoring the Internet for Syndromic Surveillance <ul><li>Current methods: keyword analysis </li></ul><ul><ul><li>News </li></ul></ul><ul><ul><ul><li>Aggregator and visualization tools (e.g., HealthMap) </li></ul></ul></ul><ul><ul><li>Web search queries </li></ul></ul><ul><ul><ul><li>Google Trends; Google Flu Trends </li></ul></ul></ul><ul><ul><ul><li>Brownstein et al. – peak for “food poisoning” preceded peak for “salmonella”, “peanut butter”, “recall” </li></ul></ul></ul><ul><li>Requires massive amounts of data </li></ul><ul><li>Ambiguous as to searchers’ precise information needs </li></ul>
  4. 4. Distribution of “swine flu” Google query
  5. 5. The Future of Syndromic Surveillance <ul><li>Social media, chatter </li></ul><ul><ul><li>Blogs, “microblogging” </li></ul></ul><ul><ul><li>More real-time data </li></ul></ul><ul><ul><li>Monitor sentiment as well as events </li></ul></ul><ul><li>NLP analysis </li></ul><ul><ul><li>Requires less data / lower computational intensity </li></ul></ul><ul><ul><li>More informative </li></ul></ul><ul><ul><ul><li>“swine flu” and “travel” vs. “how fast swine flu travels” and “is it safe to travel during a swine flu epidemic” </li></ul></ul></ul>
  6. 6. Twitter <ul><li>Micro-blogging service </li></ul><ul><li>SMS gateway enables posting from mobile devices </li></ul><ul><ul><li>Users post without breaking context or setting </li></ul></ul><ul><ul><li>JIT (just-in-time) blogging </li></ul></ul><ul><li>API promotes community development of user experience and interaction </li></ul><ul><li>4-5 million users (Nov 2008) </li></ul><ul><li>17 million visitors (April 2009) </li></ul>
  7. 7. Tweet Characteristics <ul><li>Format: [username] [text] [date time client] </li></ul><ul><li>Length: Text limited to 140 characters </li></ul><ul><li>Char Set: Not limited to ISO 8859-1 Western (Latin) </li></ul><ul><li>Grammaticality: Variable </li></ul><ul><li>Hashtags (#): Denote topics </li></ul><ul><ul><li>Primarily utilized by experienced users </li></ul></ul>
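The tweet characteristics on slide 7 can be sketched as a small record type. This is only an illustration: the `Tweet` class, its field names, the hashtag pattern, and the example hashtags are assumptions, not part of the original work. The slide fixes only the field order and the 140-character limit.

```python
import re
from dataclasses import dataclass
from typing import List

MAX_TWEET_LENGTH = 140  # character limit stated on the slide

# Assumption: a hashtag is '#' followed by word characters.
# In Python 3, \w matches Unicode letters, so non-Latin tags are accepted too.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


@dataclass
class Tweet:
    """Hypothetical record following the slide's [username] [text] [date time client] layout."""
    username: str
    text: str
    timestamp: str  # kept as raw text; the slide does not specify a date format
    client: str

    def hashtags(self) -> List[str]:
        """Return the topics marked with '#', lower-cased, in order of appearance."""
        return [tag.lower() for tag in HASHTAG_PATTERN.findall(self.text)]

    def within_limit(self) -> bool:
        """Check the text against the 140-character limit."""
        return len(self.text) <= MAX_TWEET_LENGTH


# The text comes from the deck's example tweet; the hashtags are made up for illustration.
example = Tweet(
    username="example_user",
    text="Texas confirms third case of swine flu #swineflu #H1N1",
    timestamp="2009-04-27 10:15",
    client="web",
)
```

Here `example.hashtags()` yields the two topic tags and `example.within_limit()` confirms the text fits the limit.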
  8. 8. Tweet Content <ul><li>Some provide (purported) information </li></ul><ul><ul><li>Authority not determined </li></ul></ul><ul><li>Majority express opinions </li></ul><ul><ul><li>Often with humor or sarcasm </li></ul></ul><ul><li>Value for syndromic surveillance </li></ul><ul><ul><li>Source for assessing public sentiment </li></ul></ul><ul><ul><li>Observation of information trending </li></ul></ul><ul><ul><li>As a guide for government action </li></ul></ul>
  9. 9. Tweets: Examples <ul><li>CDC tips for preventing the flu: wash hands often and stay home when sick </li></ul><ul><li>Oklahoma health officials say swine flu headed to state, public needs to take precautions </li></ul><ul><li>Napolitano says “not a pandemic” yet </li></ul><ul><li>I bet this whole swine flu scare really has Kermit the Frog rethinking his relationship </li></ul><ul><li>What’s next? Three-toed sloth flu? </li></ul>
  10. 10. NLP Analysis <ul><li>Unified Medical Language System (UMLS) </li></ul><ul><ul><li>Medical concepts in semantic types (or classes) </li></ul></ul><ul><li>MetaMap </li></ul><ul><ul><li>Identifies UMLS concepts in text </li></ul></ul><ul><li>SemRep </li></ul><ul><ul><li>Identifies semantic relations between concepts </li></ul></ul><ul><li>Rifampin for tuberculosis </li></ul><ul><ul><li>Rifampin [Pharmacologic Substance] </li></ul></ul><ul><ul><li>TREATS </li></ul></ul><ul><ul><li>Tuberculosis [Disease or Syndrome] </li></ul></ul>
  11. 11. Monitoring Twitter with NLP <ul><li>Processed 1300 Twitter posts </li></ul><ul><ul><li>Known to be about swine flu </li></ul></ul><ul><ul><li>Sent during 1 hour on Monday, April 27, 2009 </li></ul></ul><ul><li>Preprocessed to accommodate the tweet format </li></ul><ul><li>Ran MetaMap and SemRep </li></ul><ul><ul><li>Extracted semantic concepts and relationships </li></ul></ul><ul><li>Defined a semantic schema for influenza epidemic </li></ul>
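The deck does not say what the preprocessing step on slide 11 actually did. The sketch below is one plausible guess, assuming the goal is to strip Twitter-specific tokens (retweet markers, @mentions, URLs, the `#` sign) so that a tool built for ordinary prose sees plain text. The function name and every pattern are hypothetical.

```python
import re

# All three patterns are assumptions about which tokens need removing.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+:?")            # "@user" or "@user:"
RETWEET_PATTERN = re.compile(r"^RT\b:?\s*", re.IGNORECASE)


def preprocess_tweet(text: str) -> str:
    """Remove Twitter-specific tokens and normalise whitespace."""
    text = RETWEET_PATTERN.sub("", text.strip())
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    text = text.replace("#", "")   # keep the hashtag word, drop the marker
    return " ".join(text.split())  # collapse runs of whitespace


cleaned = preprocess_tweet(
    "RT @example_user: Wash hands often #swineflu http://example.com/tips"
)
```

Keeping the hashtag word while dropping the `#` is a design choice: the word may still map to a concept, whereas the marker itself carries no meaning for concept extraction.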
  12. 12. Schema: UMLS Semantic Types <ul><li>Focus output </li></ul><ul><ul><li>In the area of interest </li></ul></ul><ul><ul><li>And with the components in that area </li></ul></ul><ul><li>Schema for influenza epidemic </li></ul><ul><ul><li>Disease or Syndrome </li></ul></ul><ul><ul><li>Sign or Symptom </li></ul></ul><ul><ul><li>Geographic Area </li></ul></ul><ul><ul><li>Mammal </li></ul></ul><ul><ul><li>Health Care Related Organization </li></ul></ul><ul><ul><li>Medical Device </li></ul></ul>
  13. 13. MetaMap and SemRep Output <ul><li>Tweet </li></ul><ul><ul><li>Texas confirms third case of swine flu </li></ul></ul><ul><li>Concepts extracted </li></ul><ul><ul><li>Texas [Geographic Area] </li></ul></ul><ul><ul><li>Third [Quantitative Concept] </li></ul></ul><ul><ul><li>Family suidae [Mammal] </li></ul></ul><ul><ul><li>Influenza [Disease or Syndrome] </li></ul></ul><ul><li>Relationship </li></ul><ul><ul><li>Influenza PROCESS_OF Family suidae </li></ul></ul>
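The output on slide 13 can be held in two small record types, one for a concept with its semantic type and one for a subject-predicate-object relation. This is a sketch for illustration only: the class names `Concept` and `Predication` are assumptions and do not reflect the native output format of MetaMap or SemRep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A UMLS concept name together with its semantic type."""
    name: str
    semantic_type: str


@dataclass(frozen=True)
class Predication:
    """A relation between two concepts, written as on the slide: SUBJECT PREDICATE OBJECT."""
    subject: Concept
    predicate: str
    obj: Concept

    def __str__(self) -> str:
        return f"{self.subject.name} {self.predicate} {self.obj.name}"


# Concepts listed on the slide for "Texas confirms third case of swine flu".
concepts = [
    Concept("Texas", "Geographic Area"),
    Concept("Third", "Quantitative Concept"),
    Concept("Family suidae", "Mammal"),
    Concept("Influenza", "Disease or Syndrome"),
]

# The single relationship listed on the slide.
relation = Predication(subject=concepts[3], predicate="PROCESS_OF", obj=concepts[2])
```

Printing `relation` reproduces the slide's notation, and the frozen dataclasses make both types hashable so they can be counted or used as dictionary keys.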
  14. 14. Results: Most Frequent Concepts <ul><li>371 Family suidae [Mammal] </li></ul><ul><li>324 Influenza [Disease or Syndrome] </li></ul><ul><li>115 Not [Functional Concept] </li></ul><ul><li>113 Mexico [Geographic Area] </li></ul><ul><li>89 Centers for Disease Control and Prevention (U.S.) [Health Care Related Organization] </li></ul><ul><li>71 Case unit dose [Quantitative Concept] </li></ul><ul><li>54 Time [Temporal Concept] </li></ul><ul><li>53 Pandemics [Phenomenon or Process] </li></ul>
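A frequency table like the one on slide 14 can be produced by tallying extracted concepts across tweets. The sketch below assumes every occurrence is counted (the deck does not say whether counts are per occurrence or per tweet). The function name and the three-tweet sample are made up and are not the study's data.

```python
from collections import Counter


def most_frequent_concepts(concepts_per_tweet, top_n=8):
    """Tally (name, semantic type) pairs over all tweets and return the top_n."""
    counts = Counter()
    for concepts in concepts_per_tweet:
        counts.update(concepts)  # assumption: each occurrence counts once
    return counts.most_common(top_n)


# Illustrative input only: three tweets' worth of extracted concepts.
sample = [
    [("Family suidae", "Mammal"), ("Influenza", "Disease or Syndrome"), ("Texas", "Geographic Area")],
    [("Family suidae", "Mammal"), ("Influenza", "Disease or Syndrome"), ("Mexico", "Geographic Area")],
    [("Family suidae", "Mammal"), ("Mexico", "Geographic Area")],
]

ranking = most_frequent_concepts(sample)
```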
  15. 15. Results: Filtered through Schema <ul><li>Disease or Syndrome: Influenza </li></ul><ul><li>Sign or Symptom: Coughing </li></ul><ul><li>Geographic Area: Mexico </li></ul><ul><li>Mammal: Family suidae </li></ul><ul><li>Health Care Related Organization: Centers for Disease Control and Prevention (U.S.) </li></ul><ul><li>Medical Device: Mask </li></ul>
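The schema filtering on slide 15 might work along these lines: keep only concepts whose semantic type is in the schema, then report the most frequent concept for each type. Picking the most frequent concept is an assumption, since the deck does not state how the single entry per type was chosen. The organization type is written as it appears in the concept counts on slide 14, and the sample input is invented.

```python
from collections import Counter, defaultdict

# Semantic types from the influenza schema on slide 12.
INFLUENZA_SCHEMA = (
    "Disease or Syndrome",
    "Sign or Symptom",
    "Geographic Area",
    "Mammal",
    "Health Care Related Organization",
    "Medical Device",
)


def top_concept_per_type(concepts, schema=INFLUENZA_SCHEMA):
    """Drop concepts outside the schema; return the most frequent name per remaining type."""
    counts_by_type = defaultdict(Counter)
    for name, semantic_type in concepts:
        if semantic_type in schema:
            counts_by_type[semantic_type][name] += 1
    return {
        semantic_type: counts.most_common(1)[0][0]
        for semantic_type, counts in counts_by_type.items()
    }


# Illustrative input only, not the study's data.
extracted = [
    ("Influenza", "Disease or Syndrome"),
    ("Mexico", "Geographic Area"),
    ("Mexico", "Geographic Area"),
    ("Texas", "Geographic Area"),
    ("Not", "Functional Concept"),  # outside the schema, so filtered out
    ("Mask", "Medical Device"),
]

filtered = top_concept_per_type(extracted)
```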
  16. 16. Results: PROCESS_OF Relation <ul><li>Influenza PROCESS_OF Family suidae </li></ul><ul><li>Influenza PROCESS_OF Farmer, unspecified </li></ul><ul><li>Influenza PROCESS_OF Hispanics </li></ul><ul><li>Influenza PROCESS_OF Mexican </li></ul><ul><li>Influenza in Birds PROCESS_OF Human </li></ul><ul><li>Influenza-like symptoms PROCESS_OF Passenger </li></ul><ul><li>Flu symptoms PROCESS_OF Family suidae </li></ul><ul><li>Swine influenza PROCESS_OF Family suidae </li></ul>
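One way to read the relations on slide 16 is to group them by the object, that is, by who or what is described as having the condition. The sketch below parses the slide's `SUBJECT PROCESS_OF OBJECT` notation. That notation is how the deck displays relations, not necessarily SemRep's native output, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_object(relations, predicate="PROCESS_OF"):
    """Map each relation object to the list of subjects linked to it by the predicate."""
    separator = f" {predicate} "
    grouped = defaultdict(list)
    for relation in relations:
        if separator not in relation:
            continue  # skip lines that use a different predicate
        subject, obj = relation.split(separator, 1)
        grouped[obj].append(subject)
    return dict(grouped)


# The eight relations listed on the slide.
relations = [
    "Influenza PROCESS_OF Family suidae",
    "Influenza PROCESS_OF Farmer, unspecified",
    "Influenza PROCESS_OF Hispanics",
    "Influenza PROCESS_OF Mexican",
    "Influenza in Birds PROCESS_OF Human",
    "Influenza-like symptoms PROCESS_OF Passenger",
    "Flu symptoms PROCESS_OF Family suidae",
    "Swine influenza PROCESS_OF Family suidae",
]

grouped = group_by_object(relations)
```

Grouped this way, the relations with `Family suidae` as object (which follow from MetaMap mapping "swine" to a mammal concept) are separated from those about people, such as `Passenger` or `Farmer, unspecified`.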
  17. 17. Next Steps <ul><li>Twitter access </li></ul><ul><li>Further testing for effectiveness </li></ul><ul><li>Refine filters (frequency, semantic types) </li></ul><ul><ul><li>User control </li></ul></ul><ul><li>Implement proof-of-concept </li></ul><ul><ul><li>Preprocessing for tweet format </li></ul></ul><ul><ul><li>NLP </li></ul></ul><ul><ul><li>Final filtering </li></ul></ul><ul><li>Output format </li></ul><ul><ul><li>Graphs </li></ul></ul>
  18. 18. Opportunities <ul><li>Biosurveillance </li></ul><ul><li>Monitoring of wide-spread sentiment </li></ul><ul><li>Targeted information provision </li></ul><ul><ul><li>Respond to misinformation trends </li></ul></ul><ul><li>Potential for evaluating authenticity </li></ul><ul><ul><li>Semantic comparison to trusted source </li></ul></ul>
  19. 19. Conclusion <ul><li>Exploiting the Internet for disaster preparedness </li></ul><ul><li>Assessing public sentiment and events </li></ul><ul><li>Leveraging social media, e.g. Twitter </li></ul><ul><li>Using semantic NLP </li></ul><ul><li>Useful to CDC and other government agencies </li></ul><ul><li>Proof-of-concept experiment suggests the viability of this approach </li></ul>