Biosurveillance 2.0
Collaboration and Web 2.0/3.0 Semantic Technologies for Better Early Disease Warning and Effective Response
Taha Kass-Hout, Nicolás di Tada
Background
Late Detection and Response
[Figure: cases by day, with the opportunity for control marked]
Early Detection and Response
[Figure: cases by day, with the opportunity for control marked]
Public Health Measures
Representativeness
Completeness
Predictive Value
Timeliness
Public Health Measures
[Figure: pyramid from 1000 malaria infections (100%) to 50 malaria notifications (5%). Labels: Specificity / Reliability; Sensitivity / Timeliness]
Main attributes
Representativeness
Completeness
Predictive value positive
Get as close to the bottom of the pyramid as possible
Urge frequent reporting: weekly → daily → immediately
Public Health Measures
Analyze and interpret
Automated analysis / thresholds
[Figure: timeline; axis: time]
Main attributes
Timeliness
Signal as early as possible
[Figure label: health care hotline]
The Problem Space
Current system design, analysis, and evaluation have been geared towards specific data sources and detection algorithms, not humans
We have systems in place for the threats we have faced before
The Problem
Traditional DISEASE SURVEILLANCE
In the past two decades, the focus was on
automatically detecting anomalous patterns in data (often a single stream)
Modern methods
rely on human input and judgment
incorporate temporal, spatial, and multivariate information
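The traditional single-stream approach can be sketched as a simple threshold rule. This is a minimal illustration, assuming daily counts and a rule that flags a day when its count exceeds the mean of the preceding baseline window by more than k standard deviations. The function name, window length, threshold, and counts are all hypothetical and are not taken from any deployed system.

```python
from statistics import mean, stdev

def flag_anomalies(counts, window=7, k=3.0):
    """Return indices of days whose count exceeds baseline mean + k * SD.

    The baseline for day i is the `window` days immediately before it.
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sd = stdev(baseline)
        # Guard against a flat baseline, where the SD is zero.
        threshold = mu + k * max(sd, 1.0)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of cough/cold visits for one ZIP code.
daily_counts = [12, 10, 11, 13, 12, 9, 11, 12, 10, 30, 11]
alerts = flag_anomalies(daily_counts)  # day index 9 (count 30) is flagged
```

A rule like this sees one stream only; it has no notion of space, of other streams, or of what an analyst already knows.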
The Problem
Traditional DISEASE SURVEILLANCE
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
[Figure: a huge mass of data feeds a detection algorithm, which produces too many alerts]
“What are we supposed to do with this?”
Our Approach
Human-based
Collaborative and cross-disciplinary
Web 2.0/3.0 platform
Our Approach
Information Sources
Event-based: ad hoc, unstructured reports issued by formal or informal sources
Indicator-based: number of cases, rates, proportion of strains…
MODERN DISEASE SURVEILLANCE
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
[Figure: huge mass of data, with a feedback loop, leading to fewer and more actionable alerts and an effective and coordinated response]
Evolve: Main Components
[Figure: component diagram with the following labels]
Multiple data streams
Feature extraction, reference and baseline information
User-generated and machine-learning metadata
Tags, comments, spatio-temporal, flags/alerts/bookmarks
Evolve Bot
Event classification, characterization, and detection
Previous event training data, previous event control data
Metadata extraction, machine learning, anomaly detection
Social network, professional feedback
Collaborative spaces
Hypothesis generation and testing
Evolve: Process
[Figure: process flow with stages Item, Hypothesis, Field Actions and Verifications, Feedback / Confirmation; multiple items feed the process]
How to represent a document
[Figure: document represented by the terms "cold" and "fever"]
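One common way to represent a document is as a vector of term counts. The sketch below assumes a two-term vocabulary ("cold", "fever") taken from the slide. The tokenizer, function name, and sample sentence are hypothetical.

```python
import re
from collections import Counter

def to_vector(text, vocabulary):
    """Map a document to term counts over a fixed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vocabulary = ["cold", "fever"]  # one dimension per term (assumed)
doc = "Patient reports fever and chills; fever began after a cold."
vector = to_vector(doc, vocabulary)  # [1, 2]
```

Each document becomes a point in a space with one axis per term, which is the input the classifiers and clustering methods work on.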
(1) Classifiers: Problem Definition
Map items to vectors (Feature extraction)
Normalize those vectors
Train the classifier
Measure the results with new information
Feed the results back into the classifier
Separate classes in feature space
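The steps above can be sketched end to end. This is an illustration only: it uses a nearest-centroid classifier as a stand-in (not necessarily the classifier used in Evolve), and the vocabulary, labels, and example items are hypothetical.

```python
import math
import re
from collections import Counter

VOCAB = ["cough", "fever", "outbreak", "football", "election"]  # assumed

def extract(text):
    """Step 1: map an item to a vector of term counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[t]) for t in VOCAB]

def normalize(v):
    """Step 2: scale the vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def train(examples):
    """Step 3: nearest-centroid model, one mean vector per class."""
    sums, sizes = {}, Counter()
    for text, label in examples:
        v = normalize(extract(text))
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, x in enumerate(v):
            acc[i] += x
        sizes[label] += 1
    return {lab: [x / sizes[lab] for x in acc] for lab, acc in sums.items()}

def predict(model, text):
    v = normalize(extract(text))
    def score(label):
        return sum(a * b for a, b in zip(model[label], v))
    return max(model, key=score)

def accuracy(model, examples):
    """Step 4: measure the results on items not seen in training."""
    hits = sum(predict(model, t) == lab for t, lab in examples)
    return hits / len(examples)

training = [
    ("fever and cough reported in village", "health"),
    ("outbreak of fever in school", "health"),
    ("football final draws record crowd", "other"),
    ("election results announced", "other"),
]
new_items = [
    ("cough outbreak spreads", "health"),
    ("football election coverage", "other"),
]
model = train(training)
before = accuracy(model, new_items)

# Step 5: feedback. Items whose labels a reviewer has confirmed are
# added to the training set and the model is retrained.
model = train(training + new_items)
after = accuracy(model, new_items)
```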
Classifiers: Support Vector Machines (SVM)
SVM – Margin Maximization
Support vectors define the separator
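Margin maximization can be illustrated with a small linear SVM. The sketch below minimizes the soft-margin objective with plain full-batch subgradient descent; production SVM libraries use more sophisticated solvers. The data points, the hyperparameters (lam, lr, epochs), and the function names are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two linearly separable classes in a 2-D feature space (made-up data).
points = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predicted = [predict(w, b, x) for x in points]

# The points with the smallest margins are the ones that pin down the
# separator; these are the candidates for support vectors.
margins = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
           for x, y in zip(points, labels)]
```

Only the points closest to the boundary influence the final separator; moving a point that lies far from the boundary leaves the result unchanged.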
SVM – Non-linear?
Φ: x → φ(x)
Map to a higher-dimensional space
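A small worked example of such a mapping, assuming the feature map φ(x) = (x, x²), which is chosen here for illustration and is not taken from the slides. One-dimensional data that no single threshold can separate becomes linearly separable after the mapping.

```python
def phi(x):
    """Map a 1-D point to 2-D: x -> (x, x^2)."""
    return (x, x * x)

# 1-D data: the positive class sits on both sides of the negative
# class, so no single threshold on x separates them.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

mapped = [phi(x) for x in xs]

# In the mapped space a linear rule works: 0*x + 1*x^2 - 1 > 0.
w, b = (0.0, 1.0), -1.0
predictions = [1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1 for p in mapped]
```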
SVM – Filtering or classifying
[Figure: training documents feed a classifier that sorts Document 1, Document 2, and Document 3 into positives and negatives]
(2) Clustering: Problem Definition
Map items to vectors (Feature extraction)
Normalization
Agglomerative or Partitional
Clustering: AGGLOMERATIVE
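Agglomerative clustering starts with every item in its own cluster and repeatedly merges the two closest clusters. The sketch below assumes single linkage (distance between the closest pair of members) and Euclidean distance; both are illustrative choices, and the points are made up.

```python
import math

def single_linkage(points, num_clusters):
    """Merge the two closest clusters until `num_clusters` remain."""
    clusters = [[p] for p in points]

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters = single_linkage(points, 2)
```

Stopping the merges at different levels yields coarser or finer groupings without rerunning the algorithm from scratch.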
Clustering: PARTITIONAL
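Partitional clustering splits the items into a fixed number of groups in one pass over the whole set. A minimal sketch, assuming k-means (Lloyd's algorithm) as the partitional method and caller-supplied starting centroids so the run is deterministic; the points and names are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[k]
            for k, g in enumerate(groups)
        ]
    return centroids, groups

points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
centroids, groups = k_means(points, centroids=[points[0], points[3]])
```

Unlike the agglomerative approach, the number of clusters must be chosen up front, and the result can depend on the starting centroids.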
(3) Bayesian Statistics
P(A|B) = P(B|A) × P(A) / P(B)
P(A|B): probability of disease A (flu) once symptom B (fever) is observed
P(B|A): probability of fever once flu is confirmed
P(A): probability of flu (prior or marginal)
P(B): probability of fever (prior or marginal)
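A worked example of Bayes' theorem for the flu and fever case. The three input probabilities are made-up numbers chosen for illustration, not surveillance estimates.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs:
#   P(fever | flu) = 0.90, P(flu) = 0.05, P(fever) = 0.20
p_flu_given_fever = posterior(0.90, 0.05, 0.20)  # about 0.225
```

With these numbers, observing a fever raises the probability of flu from 5% to about 22.5%.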
(4) Neural Networks
Given a set of stimuli, train a system to produce a given output…
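The idea can be shown in its smallest form: a single sigmoid neuron trained by gradient descent to produce a target output for each stimulus. Real networks stack many such units in layers. The stimuli, learning rate, and epoch count below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def respond(weights, bias, stimulus):
    """Output of the neuron for one stimulus, between 0 and 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, stimulus)) + bias)

def train_neuron(samples, lr=0.5, epochs=2000):
    """Train one sigmoid neuron with per-sample gradient descent
    on the cross-entropy loss."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for stimulus, target in samples:
            error = respond(weights, bias, stimulus) - target
            weights = [w - lr * error * x for w, x in zip(weights, stimulus)]
            bias -= lr * error
    return weights, bias

# Hypothetical stimuli: (fever, cough) -> 1 if either symptom is present.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(samples)
outputs = [round(respond(weights, bias, s)) for s, _ in samples]
```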
Result of incorporating all 5 techniques: Improved Surveillance
InSTEDD Evolve (http://instedd.org/evolve)
Related items (e.g., news articles) are grouped into a thread. Threads are later associated with events (hypothesized or confirmed).
[Screenshot: tag cloud and semantic heatmap]
The filter feature automatically filters for related items and updates the map and associated tags.
Auto-generated (machine-learning) tags. These tags are semantically ranked (a statistical probability match). Users can further train the classifier by accepting or rejecting a suggestion. Users can similarly train the geo-locator by accepting a suggested location, or by rejecting and updating it.
Tracking the recent avian influenza outbreak in Egypt (reports started to appear in late January 2009). Note the pattern of reported incidents along the Nile River.
Acknowledgements
Through funding from:
Thank You!
Taha Kass-Hout
Nicolás di Tada
BACKGROUND MATERIAL
Index
Disease surveillance References
Computing
Automating Laboratory Reporting
Using EMR data for disease surveillance
Related Projects
Misc Readings
Open Source Software (OSS) References
Open Source License References
Open Source References
Open Source and Public Health References
Architectural Matters
Service Oriented Architecture (or SOA)
Synchronization Architecture
Cloud Architecture
DISEASE SURVEILLANCE
References and Related-Efforts
REFERENCES
Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007
EpiNorth-Based material ( http://www.epinorth.org ):
Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006
Daniel Neill, Incorporating Learning into Disease Surveillance Systems
Computing
The Future of Statistical Computing in Wilkinson (2008)
Complex Event Processing Over Uncertain Data in Wasserkrug (2008)
Outbreak detection through automated surveillance: A review of the determinants of detection in Buckeridge (2007)
Approaches to the evaluation of outbreak detection methods in Watkins (2006)
Algorithms for rapid outbreak detection: a research synthesis in Buckeridge (2004)
Data mining in bioinformatics using Weka in Frank (2004)
Aho-Corasick Algorithm in Kilpeläinen
Automating Laboratory Reporting
Automatic Electronic Laboratory-Based Reporting in Panackal (2002)
Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007)
Using EMR Data for Disease Surveillance
Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007)
Electronic Medical Record Support for PH in Klompas (2007)
A knowledgebase to support notifiable disease surveillance in Doyle (2005)
Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007)
Misc Readings
Breakthrough in modeling emerging disease hotspots in Jones (2008)
Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)
Tag Decay: A View Over Aging Folksonomy in Russell (2007)
Cloudalicious: Folksonomy Over Time in Russell (2006)
RELATED PROJECTS
InSTEDD Evolve : ( http://instedd.org/evolve )
Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response ( http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html )
ALPACA "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA
Weka An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/
The R Project for statistical computing: http://www.r-project.org
Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org
The R package surveillance in Höhle (multiple articles)
Google's Research Publications: MapReduce: Simplified Data Processing on Large Clusters ( http://labs.google.com/papers/mapreduce.html )
Hadoop : a software platform that lets one easily write and run applications that process vast amounts of data ( http://hadoop.apache.org/core )
Open Source Development for Public Health: A Primer with Examples of Existing Enterprise Ready Open Source Applications in Turner (2006)
A Quick Survey of Open Source Software for Public Health Organizations in Mirabito and Kass-Hout (2007)
ARCHITECTURAL MATTERS
References and Related-Efforts
REFERENCES
Service Oriented Architecture (or SOA)
Proposal for Fulfilling Strategic Objectives of the U.S. Roadmap for National Action on Decision Support through a Service-oriented Architecture Leveraging HL7 Services in Kawamoto (2007)
Service-oriented Architecture in Medical Software: Promises and Perils in Nadkarni (2007)