Semantic Processing of Twitter Traffic for Epidemic Surveillance


    1. Semantic Processing of Twitter Traffic for Epidemic Surveillance
      • David S. Hale, Alla Keselman, Thomas C. Rindflesch
      • Lister Hill National Center for Biomedical Communications
      • Specialized Information Services
    2. Pandemic Preparedness
      • Early detection is critical to effective response
      • “The truth is out there, it is just not indexed well”
        • (A bumper sticker; NLM parking lot)
      • Disaster information traffic: delays, loss, overload
      • Outbreak data requires fast collection and dissemination
        • Collection – from disease to syndromic surveillance
        • Dissemination – from formal announcements to informal channels
      • Government agencies are adopting Web 2.0 innovations
        • E.g., CDC on Twitter
    3. Monitoring the Internet for Syndromic Surveillance
      • Current methods: keyword analysis
        • News
          • Aggregator and visualization tools (e.g., HealthMap)
        • Web search queries
          • Google Trends; Google Flu Trends
          • Brownstein et al.: the query peak for “food poisoning” preceded the peaks for “salmonella”, “peanut butter”, and “recall”
      • Requires massive amounts of data
      • Ambiguous as to searchers’ precise information needs
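Keyword analysis of search queries, as in the Brownstein et al. example, comes down to counting matching queries per time period and comparing when each term peaks. A minimal Python sketch, assuming queries are available as (day, query text) pairs; the function names and the toy query log are hypothetical and are not data from that study:

```python
from collections import Counter

def peak_day(queries, keyword):
    """Return the day with the most queries containing `keyword` (lower-case), or None."""
    counts = Counter(day for day, text in queries if keyword in text.lower())
    if not counts:
        return None
    # max() returns the first maximal day in insertion order on ties
    return max(counts, key=counts.get)

def peak_order(queries, keywords):
    """Return the keywords sorted by peak day, earliest first; absent keywords are dropped."""
    peaks = [(peak_day(queries, kw), kw) for kw in keywords]
    return [kw for day, kw in sorted(p for p in peaks if p[0] is not None)]

# Toy query log of (day, query text) pairs; illustrative only
queries = [
    (1, "food poisoning symptoms"),
    (1, "food poisoning how long"),
    (2, "food poisoning"),
    (2, "salmonella peanut butter"),
    (3, "salmonella recall"),
    (3, "salmonella"),
]

print(peak_order(queries, ["salmonella", "food poisoning"]))  # ['food poisoning', 'salmonella']
```

Substring matching is deliberately naive here; it shows why such counts need large query volumes and reveal little about what each searcher actually wanted to know.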
    4. Distribution of “swine flu” Google queries
    5. The Future of Syndromic Surveillance
      • Social media, chatter
        • Blogs, “microblogging”
        • More real-time data
        • Monitor sentiment as well as events
      • NLP analysis
        • Requires less data / lower computational intensity
        • More informative
          • Keywords “swine flu” and “travel” vs. questions such as “how fast swine flu travels” and “is it safe to travel during a swine flu epidemic”
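The contrast on this slide can be made concrete: a substring keyword matcher returns the same hits for both questions, although one asks about how fast the disease spreads and the other about travel safety. A minimal Python sketch; `keyword_hits` is a hypothetical stand-in for keyword analysis, not a specific tool:

```python
def keyword_hits(text, keywords):
    """Return the keywords that occur as substrings of text (case-insensitive)."""
    lowered = text.lower()
    return {kw for kw in keywords if kw in lowered}

KEYWORDS = {"swine flu", "travel"}
speed_question = "how fast swine flu travels"
safety_question = "is it safe to travel during a swine flu epidemic"

# Both questions produce the identical hit set {"swine flu", "travel"},
# so a keyword count cannot tell them apart.
print(keyword_hits(speed_question, KEYWORDS) == keyword_hits(safety_question, KEYWORDS))  # True
```

Distinguishing the two requires recognizing the relation expressed in each sentence, which is what the semantic NLP approach in this presentation aims at.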
    6. Twitter
      • Micro-blogging service
      • SMS gateway enables posting from mobile devices
        • Users post without breaking context or setting
        • JIT (just-in-time) blogging
      • API promotes community development of user experience and interaction
      • 4-5 million users (Nov 2008)
      • 17 million visitors (April 2009)
    7. Tweet Characteristics
      • Format: [username] [text] [date time client]
      • Length: Text limited to 140 characters
      • Char Set: Not limited to ISO 8859-1 Western (Latin)
      • Grammaticality: Variable
      • Hashtags (#): Denote topics
        • Primarily utilized by experienced users
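The tweet characteristics above suggest a simple record type. A minimal Python sketch, assuming each tweet has already been split into the fields of the [username] [text] [date time client] format; the class, field names, and sample tweet are hypothetical:

```python
import re
from dataclasses import dataclass

# '#' followed by word characters. \w is Unicode-aware for str patterns in Python 3,
# so non-Latin hashtags also match. This simplifies Twitter's actual hashtag rules.
HASHTAG = re.compile(r"#(\w+)")

@dataclass
class Tweet:
    username: str
    text: str
    timestamp: str  # the "date time" field, kept as an unparsed string
    client: str     # posting client, e.g. web or an SMS gateway

    def hashtags(self):
        """Return the topic hashtags in the text, lower-cased, without '#'."""
        return [tag.lower() for tag in HASHTAG.findall(self.text)]

    def within_limit(self, limit=140):
        """Check the 140-character limit; len() counts Unicode code points,
        which approximates but may not match how Twitter counted characters."""
        return len(self.text) <= limit

t = Tweet("example_user", "Stay home when sick #SwineFlu #H1N1", "2009-04-27 14:05", "web")
print(t.hashtags())      # ['swineflu', 'h1n1']
print(t.within_limit())  # True
```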
    8. Tweet Content
      • Some provide (purported) information
        • Authority not determined
      • Majority express opinions
        • Often with humor or sarcasm
      • Value for syndromic surveillance
        • Source for assessing public sentiment
        • Observation of information trending
        • As a guide for government action
    9. Tweets: Examples
      • CDC tips for preventing the flu: wash hands often and stay home when sick
      • Oklahoma health officials say swine flu headed to state, public needs to take precautions
      • Napolitano says “not a pandemic” yet
      • I bet this whole swine flu scare really has Kermit the Frog rethinking his relationship
      • What’s next? Three-toed sloth flu?
    10. NLP Analysis
      • Unified Medical Language System (UMLS)
        • Medical concepts organized into semantic types (classes)
      • MetaMap
        • Identifies UMLS concepts in text
      • SemRep
        • Identifies semantic relations between concepts
      • Rifampin for tuberculosis
        • Rifampin [Pharmacologic Substance]
        • TREATS
        • Tuberculosis [Disease or Syndrome]
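The Rifampin example can be modeled as a subject-predicate-object triple whose arguments carry UMLS semantic types. A minimal Python sketch of that data structure; the class names are hypothetical, and this is not the actual output format of MetaMap or SemRep:

```python
from typing import NamedTuple

class Concept(NamedTuple):
    name: str           # concept name, e.g. "Rifampin"
    semantic_type: str  # UMLS semantic type, e.g. "Pharmacologic Substance"

class Predication(NamedTuple):
    subject: Concept
    predicate: str      # semantic relation, e.g. "TREATS"
    obj: Concept

    def __str__(self):
        return f"{self.subject.name} {self.predicate} {self.obj.name}"

rifampin = Concept("Rifampin", "Pharmacologic Substance")
tuberculosis = Concept("Tuberculosis", "Disease or Syndrome")
p = Predication(rifampin, "TREATS", tuberculosis)
print(p)  # Rifampin TREATS Tuberculosis
```

Keeping the semantic type on each argument is what makes filtering by a semantic schema straightforward.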
    11. Monitoring Twitter with NLP
      • Processed 1300 Twitter posts
        • Known to be about swine flu
        • Posted during a single hour on Monday, April 27, 2009
      • Preprocessed to accommodate the tweet format
      • Ran MetaMap and SemRep
        • Extracted semantic concepts and relationships
      • Defined a semantic schema for an influenza epidemic
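The slide does not say what the preprocessing involved. One plausible set of steps, sketched in Python purely as an assumption: remove a leading retweet marker, URLs, and @mentions, keep hashtag words without the '#', and collapse whitespace before the text is passed to MetaMap and SemRep. The function name and every step are hypothetical:

```python
import re

def preprocess(text):
    """Normalize tweet text before NLP (hypothetical steps, not the authors' method)."""
    text = re.sub(r"^RT\s+", "", text)        # leading retweet marker
    text = re.sub(r"https?://\S+", "", text)  # URLs
    text = re.sub(r"@\w+:?", "", text)        # @mentions, with an optional trailing colon
    text = re.sub(r"#(\w+)", r"\1", text)     # keep the hashtag word, drop '#'
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(preprocess("RT @someuser: CDC tips for preventing the #flu http://example.com/tips"))
# CDC tips for preventing the flu
```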
    12. Schema: UMLS Semantic Types
      • Purpose: focus the output
        • On the area of interest
        • And on the components of that area
      • Schema for an influenza epidemic
        • Disease or Syndrome
        • Sign or Symptom
        • Geographic Area
        • Mammal
        • Health Care Related Organization
        • Medical Device
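The schema acts as a whitelist of semantic types. A minimal Python sketch of that filter, assuming concepts arrive as (name, semantic type) pairs; the names are hypothetical, and the example concepts are the ones listed for the sample tweet on the next slide:

```python
# UMLS semantic types in the influenza-epidemic schema. "Health Care Related
# Organization" is the UMLS type name that appears in the frequency results.
INFLUENZA_SCHEMA = frozenset({
    "Disease or Syndrome",
    "Sign or Symptom",
    "Geographic Area",
    "Mammal",
    "Health Care Related Organization",
    "Medical Device",
})

def filter_by_schema(concepts, schema=INFLUENZA_SCHEMA):
    """Keep only (name, semantic type) pairs whose type belongs to the schema."""
    return [(name, stype) for name, stype in concepts if stype in schema]

# Concepts reported for "Texas confirms third case of swine flu"
tweet_concepts = [
    ("Texas", "Geographic Area"),
    ("Third", "Quantitative Concept"),
    ("Family suidae", "Mammal"),
    ("Influenza", "Disease or Syndrome"),
]
print(filter_by_schema(tweet_concepts))
# [('Texas', 'Geographic Area'), ('Family suidae', 'Mammal'), ('Influenza', 'Disease or Syndrome')]
```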
    13. MetaMap and SemRep Output
      • Tweet
        • Texas confirms third case of swine flu
      • Concepts extracted
        • Texas [Geographic Area]
        • Third [Quantitative Concept]
        • Family suidae [Mammal]
        • Influenza [Disease or Syndrome]
      • Relationship
        • Influenza PROCESS_OF Family suidae
    14. Results: Most Frequent Concepts
      • 371 Family suidae [Mammal]
      • 324 Influenza [Disease or Syndrome]
      • 115 Not [Functional Concept]
      • 113 Mexico [Geographic Area]
      • 89 Centers for Disease Control and Prevention (U.S.) [Health Care Related Organization]
      • 71 Case unit dose [Quantitative Concept]
      • 54 Time [Temporal Concept]
      • 53 Pandemics [Phenomenon or Process]
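The frequency list is a straightforward count over the extracted concepts. A minimal Python sketch, assuming each tweet's concepts are available as (name, semantic type) pairs; it counts every mention, and whether the slide's figures count mentions or tweets is not stated. Function names and the toy tweets are hypothetical:

```python
from collections import Counter

def concept_frequencies(concepts_per_tweet):
    """Count every concept mention across all tweets."""
    counts = Counter()
    for concepts in concepts_per_tweet:
        counts.update(concepts)
    return counts

def format_top(counts, n=8):
    """Render the n most frequent concepts as 'count name [type]' lines."""
    return [f"{count} {name} [{stype}]" for (name, stype), count in counts.most_common(n)]

# Toy input: concepts extracted from three hypothetical tweets
tweets = [
    [("Influenza", "Disease or Syndrome"), ("Family suidae", "Mammal"), ("Mexico", "Geographic Area")],
    [("Influenza", "Disease or Syndrome"), ("Family suidae", "Mammal")],
    [("Family suidae", "Mammal")],
]
for line in format_top(concept_frequencies(tweets), n=2):
    print(line)
# 3 Family suidae [Mammal]
# 2 Influenza [Disease or Syndrome]
```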
    15. Results: Filtered through Schema
      • Disease or Syndrome: Influenza
      • Sign or Symptom: Coughing
      • Geographic Area: Mexico
      • Mammal: Family suidae
      • Health Care Related Organization: Centers for Disease Control and Prevention (U.S.)
      • Medical Device: Mask
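One reading of this slide, offered as an assumption, is that it lists the most frequent concept for each schema type. A minimal Python sketch of that selection; the function name is hypothetical, and the input reuses the counts reported on the previous slide (Coughing and Mask do not appear there, so they are not reproduced):

```python
from collections import Counter

SCHEMA = {
    "Disease or Syndrome", "Sign or Symptom", "Geographic Area",
    "Mammal", "Health Care Related Organization", "Medical Device",
}

def top_concept_per_type(counts, schema):
    """For each schema type, return the concept with the highest count."""
    best = {}  # semantic type -> (concept name, count)
    for (name, stype), count in counts.items():
        if stype in schema and (stype not in best or count > best[stype][1]):
            best[stype] = (name, count)
    return {stype: name for stype, (name, _) in best.items()}

# Counts from the "Most Frequent Concepts" slide
counts = Counter({
    ("Family suidae", "Mammal"): 371,
    ("Influenza", "Disease or Syndrome"): 324,
    ("Not", "Functional Concept"): 115,
    ("Mexico", "Geographic Area"): 113,
    ("Centers for Disease Control and Prevention (U.S.)", "Health Care Related Organization"): 89,
    ("Case unit dose", "Quantitative Concept"): 71,
})
for stype, name in top_concept_per_type(counts, SCHEMA).items():
    print(f"{stype}: {name}")
```

Types outside the schema, such as Functional Concept ("Not"), drop out even when they rank high overall.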
    16. Results: PROCESS_OF Relation
      • Influenza PROCESS_OF Family suidae
      • Influenza PROCESS_OF Farmer, unspecified
      • Influenza PROCESS_OF Hispanics
      • Influenza PROCESS_OF Mexican
      • Influenza in Birds PROCESS_OF Human
      • Influenza-like symptoms PROCESS_OF Passenger
      • Flu symptoms PROCESS_OF Family suidae
      • Swine influenza PROCESS_OF Family suidae
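PROCESS_OF links a disease or process to the organism or group in which it occurs, so grouping these relations by disease shows which hosts and populations the traffic associates with each one. A minimal Python sketch, assuming predications are (subject, predicate, object) triples; the function name is hypothetical and the triples are taken from the slides:

```python
from collections import defaultdict

def group_objects(predications, predicate="PROCESS_OF"):
    """Group the distinct objects of one predicate by subject, in first-seen order."""
    groups = defaultdict(list)
    for subj, pred, obj in predications:
        # short-circuit: groups[subj] is only touched for matching predicates
        if pred == predicate and obj not in groups[subj]:
            groups[subj].append(obj)
    return dict(groups)

predications = [
    ("Influenza", "PROCESS_OF", "Family suidae"),
    ("Influenza", "PROCESS_OF", "Farmer, unspecified"),
    ("Influenza", "PROCESS_OF", "Hispanics"),
    ("Influenza", "PROCESS_OF", "Mexican"),
    ("Swine influenza", "PROCESS_OF", "Family suidae"),
    ("Rifampin", "TREATS", "Tuberculosis"),  # different predicate, filtered out
]
print(group_objects(predications))
```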
    17. Next Steps
      • Twitter access
      • Further testing for effectiveness
      • Refine filters (frequency, semantic types)
        • User control
      • Implement proof-of-concept
        • Preprocessing for tweet format
        • NLP
        • Final filtering
      • Output format
        • Graphs
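The proof-of-concept steps listed above (preprocessing, NLP, final filtering, graph output) can be wired together as a small pipeline. A minimal Python sketch under stated assumptions: MetaMap and SemRep are replaced by a hypothetical toy phrase lexicon, preprocessing is reduced to whitespace normalization, a frequency cutoff `min_count` stands in for the refined filters, and a text bar chart stands in for graphs. None of these names come from the presentation:

```python
import re
from collections import Counter

def run_pipeline(tweets, extract, schema, min_count=1):
    """Preprocess tweets, extract concepts, keep schema types, apply a frequency cutoff."""
    counts = Counter()
    for raw in tweets:
        text = re.sub(r"\s+", " ", raw).strip()  # placeholder for tweet-format preprocessing
        counts.update(c for c in extract(text) if c[1] in schema)
    return Counter({c: n for c, n in counts.items() if n >= min_count})

def text_bar_chart(counts, width=20):
    """Render counts as a plain-text bar chart, scaled to the largest count."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for (name, _stype), n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{name:<15} {bar} {n}")
    return "\n".join(lines)

# Hypothetical stand-in for MetaMap: a tiny phrase lexicon (NOT MetaMap's API)
TOY_LEXICON = {
    "swine flu": [("Influenza", "Disease or Syndrome"), ("Family suidae", "Mammal")],
    "mexico": [("Mexico", "Geographic Area")],
    "third": [("Third", "Quantitative Concept")],
}

def toy_extract(text):
    lowered = text.lower()
    return [c for phrase, cs in TOY_LEXICON.items() if phrase in lowered for c in cs]

SCHEMA = {"Disease or Syndrome", "Mammal", "Geographic Area"}
tweets = ["Swine flu   spreading in Mexico", "Texas confirms third case of swine flu"]
counts = run_pipeline(tweets, toy_extract, SCHEMA)
print(text_bar_chart(counts))
```

Passing the extractor in as a function keeps the real NLP step swappable, and `min_count` gives the user-controlled frequency filter a single knob.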
    18. Opportunities
      • Biosurveillance
      • Monitoring of widespread sentiment
      • Targeted information provision
        • Respond to misinformation trends
      • Potential for evaluating authenticity
        • Semantic comparison to trusted source
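In its simplest form, semantic comparison could check whether the predications extracted from tweets also appear among predications extracted from a trusted source, such as agency announcements. A minimal Python sketch assuming exact triple matching (a real comparison would need looser matching); the function names and the triples are hypothetical illustrations, not findings:

```python
def unsupported_claims(tweet_predications, trusted_predications):
    """Return tweet predications that no trusted-source predication matches exactly."""
    trusted = set(trusted_predications)
    return [p for p in tweet_predications if p not in trusted]

def support_ratio(tweet_predications, trusted_predications):
    """Fraction of tweet predications that the trusted source also asserts."""
    if not tweet_predications:
        return 0.0
    unsupported = unsupported_claims(tweet_predications, trusted_predications)
    return 1 - len(unsupported) / len(tweet_predications)

# Hypothetical example triples (subject, predicate, object)
trusted = [("Rifampin", "TREATS", "Tuberculosis")]
from_tweets = [("Rifampin", "TREATS", "Tuberculosis"), ("Rifampin", "TREATS", "Influenza")]
print(unsupported_claims(from_tweets, trusted))  # [('Rifampin', 'TREATS', 'Influenza')]
print(support_ratio(from_tweets, trusted))       # 0.5
```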
    19. Conclusion
      • Exploiting the Internet for disaster preparedness
      • Assessing public sentiment and events
      • Leveraging social media, e.g. Twitter
      • Using semantic NLP
      • Useful to CDC and other government agencies
      • Proof-of-concept experiment suggests the viability of this approach

    Specialized Information Services, U.S. National Library of Medicine


    CC Attribution License
