HealthMap.org: Aggregation of Online Media Reports for Global Infectious Disease Intelligence / Forum One Web Executive Seminar

Clark Freifeld, co-creator of HealthMap.org, discusses the potential of his Google Maps mashup of publicly available RSS feeds (and tools like it) for improving the early reporting of infectious diseases around the world. More information at http://ow.ly/oYll . Contact: Suzanne Rainey / srainey@ForumOne.com .

Published in: Technology, Health & Medicine
  • Surveillance sans frontières: Internet-based emerging infectious disease intelligence
  • Transcript of "HealthMap.org: Aggregation of Online Media Reports for Global Infectious Disease Intelligence / Forum One Web Executive Seminar"

    1. Aggregation of online media reports for global infectious disease intelligence
       Clark Freifeld, Research Software Developer
       Harvard Medical School
       Children’s Hospital Informatics Program
       Harvard-MIT Division of Health Sciences and Technology
    2. Early reporting of SARS
       Timeline, Nov 2002 to Mar 2003 (tracks: Progression of outbreak; Electronic Surveillance)
       • Cases of atypical pneumonia, Foshan: Nov 16th
       • Infected Chinese doctor, Hong Kong hotel: Feb 21st
       • 305 cases of acute resp, Guangdong Province: Feb 11th
       • Pharma report, Guangdong Province: November 27
       • Media reports, Guangdong Province: Feb 10
       • Astute physician on ProMED: Feb 10
       • Initial WHO report: Feb 25
       • Official WHO report: March 10
    3. Traditional vs. Internet-based surveillance
       • Traditional Surveillance
         • Lack of infrastructure
         • Low level training
         • Gaps in coverage
         • Poor information flow
       • Internet-based Surveillance
         • Abundant cheap/free resource
         • Detailed local information
         • Near real-time reporting
         • Less susceptible to political pressure
    4. Source of outbreak news verified by WHO (adapted from Heymann 2001)
    5. Limitations of Web-based surveillance
       • Abundance of resources but none comprehensive
       • Information is unstructured -- free text
       • Each has geographic, expertise, population gaps
       • Lack of integration between tools and information sources
       • No synthesized view of the current state of global health
       Brownstein et al. Institute of Medicine. 2007.
    6. www.healthmap.org
    7. HealthMap Objectives
       • Automated, real-time, multi-stream
       • Supplement existing clinical and public health systems
       • Free and open resource
       • Serve the public as well as professionals
    8. HealthMap Article Processing
       • EXTRACTION: 8 feeds; >10,000 sites; every hour, 24/7
       • TEXT MINING: 1500 disease patterns; 4000 location patterns
       • BAYES FILTERING: >5 million phrases; 90-94% accuracy
       • DUPLICATE ID: text matching; similarity score
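The text mining and duplicate identification stages on slide 8 can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the slide does not say how patterns are matched or which similarity measure is used, so the tiny pattern dictionaries, the function names, and the choice of Jaccard similarity over word bigrams are all hypothetical stand-ins rather than HealthMap's actual method.

```python
import re

# Hypothetical pattern dictionaries. The slide mentions about 1500 disease
# patterns and 4000 location patterns; these few entries are placeholders.
DISEASE_PATTERNS = {
    "legionnaire": "Legionnaires' disease",
    "anthrax": "Anthrax",
    "west nile": "West Nile virus",
}
LOCATION_PATTERNS = {
    "spain": "Spain",
    "argentina": "Argentina",
    "guangdong": "Guangdong Province",
}


def match_patterns(text, patterns):
    """Return the sorted labels whose pattern occurs in text (case-insensitive)."""
    lowered = text.lower()
    return sorted({label for pattern, label in patterns.items() if pattern in lowered})


def tag_article(headline):
    """Text mining step: attach disease and location labels to a headline."""
    return {
        "diseases": match_patterns(headline, DISEASE_PATTERNS),
        "locations": match_patterns(headline, LOCATION_PATTERNS),
    }


def shingles(text, n=2):
    """Return the set of word n-grams in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b, n=2):
    """Jaccard similarity of the two texts' n-gram sets, in [0, 1].

    Assumption: the slide only says "similarity score"; Jaccard is a stand-in.
    """
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(a, b, threshold=0.5):
    """Duplicate ID step: flag two texts as duplicates above a chosen threshold."""
    return similarity(a, b) >= threshold
```

For example, `tag_article("Legionnaire's outbreak reported in Spain")` labels the headline with one disease and one location, and `is_duplicate` can then be applied pairwise to articles that share the same labels. The threshold of 0.5 is arbitrary and would need tuning against real feeds.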
    13. Emerging Disease Surveillance
       Time points shown: Current, 2020, 2050, 2080
       • Lyme disease (Brownstein et al., Env Health Perspectives)
       • West Nile virus (Brownstein et al., Emerging Infectious Diseases)
    15. Implementation Background
       • Came about as a combination of new software tools and existing epidemiology challenges
       • Linux, Apache, MySQL, PHP
         • All free, open source tools, widely available
       • Prototype early
         • Start simple, think big
    16. Evolving Systems and Datasets - Challenges
       • Until now, focus has been knowledge management
       • Methods for analyzing news are currently under-developed
       • Unknown data characteristics: geographic, population, availability
       • Little assessment – sensitivity, specificity, signal:noise, timeliness
       Evaluating statistical characteristics of data is a first step
    17. Data Quality
       • News Sources
         • Local
         • National
         • International
       Axis labels: Specificity, Reliability, Timeliness
    18. Data Quality
       • News Sources
         • Local
         • National
         • International
       Axis labels: Timeliness, Specificity, Reliability
    19. Data Quality
       • News Sources
         • Local
         • National
         • International
       • Mailing lists (ProMED)
       • Multi-national surveillance (Eurosurveillance)
       • Validated official global alerts (WHO)
       Axis labels: Timeliness, Specificity, Reliability
    20. Data Quality
       • Clickstream/Keyword Searching
       • Blogs/Chatrooms
       • News Sources
         • Local
         • National
         • International
       • Mailing lists (ProMED)
       • Multi-national surveillance (Eurosurveillance)
       • Validated official global alerts (WHO)
       Axis labels: Timeliness, Specificity, Reliability
    21. Alert Volume by Source
       • Google News: 3194 (22.8 per day)
       • ProMED: 985 (7.0 per day)
       • WHO: 45 (0.32 per day)
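The counts and daily rates on slide 21 can be cross-checked with a little arithmetic. The slide does not state the observation window; dividing each count by its rate suggests roughly 140 days for all three sources, so the sketch below treats 140 days as an inferred assumption, not a figure from the talk.

```python
# Alert counts and per-day rates exactly as listed on the slide.
alert_counts = {"Google News": 3194, "ProMED": 985, "WHO": 45}
rates_per_day = {"Google News": 22.8, "ProMED": 7.0, "WHO": 0.32}

# Implied observation window for each source: count divided by daily rate.
implied_days = {s: alert_counts[s] / rates_per_day[s] for s in alert_counts}

# Assumption: one common window of 140 days (inferred, not stated on the slide).
WINDOW_DAYS = 140
recomputed_rates = {s: alert_counts[s] / WINDOW_DAYS for s in alert_counts}

# Each source's share of all alerts listed on the slide.
total_alerts = sum(alert_counts.values())
share = {s: alert_counts[s] / total_alerts for s in alert_counts}

for s in alert_counts:
    print(
        f"{s}: implied window {implied_days[s]:.1f} days, "
        f"{recomputed_rates[s]:.2f} per day over {WINDOW_DAYS} days, "
        f"{share[s]:.1%} of listed alerts"
    )
```

Under that assumed window the recomputed rates round to the slide's figures, and the shares make the volume gap explicit: news aggregation supplies most of the alerts, while validated WHO alerts are a small fraction.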
    22. Multi-Stream Alarming: Heat Index
       • Meta-alert composite score, based on
         • Number of sources providing information at a particular location
         • Recentness of alert
       • Marker algorithm
         • Exponentially weighted alerts
         • Increase heat (redness) for more recent event and higher impact
       Color scale: low to high
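One way to picture the heat index on slide 22 is a score that decays exponentially with alert age and grows with the number of distinct sources reporting at a location. The slide gives no formula, so the combination below (source count multiplied by the sum of decayed weights), the `Alert` class, and the 7-day half-life are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str      # e.g. "Google News", "ProMED", "WHO"
    age_days: float  # time since the alert was published, in days


def heat_index(alerts, half_life_days=7.0):
    """Hypothetical composite score for the alerts at one location.

    Each alert contributes a weight that halves every `half_life_days`
    (exponential weighting by recency). The sum is multiplied by the number
    of distinct sources, so corroborated locations score higher.
    """
    if not alerts:
        return 0.0
    recency = sum(0.5 ** (a.age_days / half_life_days) for a in alerts)
    n_sources = len({a.source for a in alerts})
    return n_sources * recency
```

With these choices a single fresh alert scores 1.0, the same alert one half-life later scores 0.5, and two fresh alerts from different sources score 4.0. Mapping the score onto the low-to-high color scale would need a normalization step that the slide does not describe.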
    23. www.healthmap.org
    24. Geographic Representation
       • Alerts by country
         • 1-USA: 4351
         • 2-UK: 1018
         • 3-Canada: 880
         • 4-China: 737
    25. Multi-lingual Surveillance
    27. Coverage Comparison: Argentina
       • English News
         • Bovine Anthrax
         • Citrus Canker
    28. Coverage Comparison: Argentina
       • Spanish News
         • Trichinosis
         • Bronchiolitis
         • Rotavirus
         • Influenza
    29. Case Study: Legionnaire’s in Spain
       • Alert #1: June 30th, Google (ES) alert
       • Alert #2: July 2nd, ProMED-mail alert
       • Alert #3: July 4th, Google (EN) alert
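The timeliness gap in the case study on slide 29 can be expressed as lead time in days relative to the first alert. The slide gives only month and day, so the year below is a placeholder; the day differences between June 30 and early July do not depend on it.

```python
from datetime import date

# Placeholder year: the slide lists only month and day.
PLACEHOLDER_YEAR = 2000

# Alerts in the order shown on the slide.
alerts = [
    ("Google (ES)", date(PLACEHOLDER_YEAR, 6, 30)),
    ("ProMED-mail", date(PLACEHOLDER_YEAR, 7, 2)),
    ("Google (EN)", date(PLACEHOLDER_YEAR, 7, 4)),
]

# Days each source lagged behind the earliest alert.
first_date = min(d for _, d in alerts)
lag_days = {source: (d - first_date).days for source, d in alerts}

for source, lag in lag_days.items():
    print(f"{source}: {lag} day(s) after the first alert")
```

Read this way, the Spanish-language news alert preceded the ProMED-mail alert by two days and the English-language news alert by four.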
    30. Early Stats
       • 150 alerts per day
       • > 34,000 alerts so far
       • Alerts in 201 countries
       • 169 disease categories
    31. Usage
       • 500-600 visits per day
       • 80,000 unique visitors since 9/06 launch
       • Top visitors:
         • dhs.gov
         • cdc.gov
         • state.fl.us
         • reinhartfoodservice.com
         • state.id.us
    32. Public Health Resource
    33. Various implementations
       • International Society for Infectious Disease
       • Liberty Science Museum, NYC
       • HHS Command Center
    34. Tool for general population
    35. Future Directions
       • Improve existing filtering algorithms
       • More sensitive, noisy sources
       • More filters: number of cases, species affected
       • More languages
       • Other areas:
         • Environmental health
         • Chronic disease
         • Violence, conflict zones
         • Pharmaceuticals
       • Your ideas
    36. Acknowledgments
       • Children’s Hospital Informatics Program @ Harvard-MIT HST
       • John Brownstein, PhD
       • Ken Mandl, MD MPH
       • Ben Reis, PhD
       • Mikaela Keller, PhD
       • Isaac Kohane, MD PhD
       • Carlo Venis (Wabash)
       • Roger Araujo (Peru NMRCD)
       • David Blazes (Peru NMRCD)
       • Aranka Anema (UBC)
       • Larry Madoff (ProMED)
       • Funding
         • Google Foundation
         • National Library of Medicine (NLM)
         • Centers for Disease Control and Prevention
         • Canadian Institutes of Health Research (CIHR)
    37. Contact
       • [email_address]
       • www.healthmap.org
       • www.chip.org
