Invited lecture at Emory University Rollins School of Public Health. We presented InSTEDD's global early warning and response social platform, Evolve (http://instedd.org/evolve), with a live demonstration.
Biosurveillance 2.0
Collaboration and Web 2.0/3.0 Semantic Technologies for Better Early Disease Warning and Effective Response
Taha Kass-Hout, Nicolás di Tada
Invited by Dr. Barbara Massoudi, PhD, MPH
Lecture at Emory University Rollins School of Public Health
Public Health Informatics, INFO 503
Atlanta, GA, USA
automatically detecting anomalous patterns in data (often a single stream)
rely on human input and judgment
incorporate temporal, spatial, and multivariate information
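The first point above, automated detection on a single data stream, can be illustrated with a very small sketch. It is not the method used in the lecture or in Evolve; it assumes a plain moving-baseline z-score rule, and the function name, window length, threshold, and daily counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, threshold=3.0):
    """Return the indices of days whose count is unusually high.

    A day is flagged when its count lies more than `threshold` standard
    deviations above the mean of the preceding `window` days.
    (Illustrative rule only; window and threshold are assumed values.)
    """
    alerts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Hypothetical floor of 1.0 so a perfectly flat baseline
        # does not cause a division by zero.
        sd = stdev(baseline) or 1.0
        if (counts[i] - mu) / sd > threshold:
            alerts.append(i)
    return alerts


# Made-up daily counts of "cough" chief complaints for one ZIP code.
daily_cough_visits = [12, 11, 13, 12, 10, 12, 11, 12, 13, 30, 12]
print(flag_anomalies(daily_cough_visits))  # prints [9]
```

A rule like this looks at one stream in isolation, which is one reason such detectors tend to raise many alerts that still need human judgment and additional temporal, spatial, and multivariate context.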
The Problem: Traditional Disease Surveillance
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
Huge mass of data
Detection algorithm
Too many alerts
“What are we supposed to do with this?”
Our Approach: Modern Disease Surveillance
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
Huge mass of data
Feedback loop
Fewer and more actionable alerts
Effective and coordinated response
Our Solution
Evolve: Main Components
Feature extraction, reference and baseline information
Tags
Multiple Data Streams
User-Generated and Machine Learning Metadata
Comments
Spatio-temporal
Flags/Alerts/Bookmarks
Evolve Bot
Event Classification, Characterization and Detection
Previous Event Training Data
Previous Event Control Data
Metadata extraction
Machine learning
Social network
Professional feedback
Anomaly detection
Collaborative Spaces
Hypotheses generation and testing
Our Solution
(3) Bayesian Statistics
P(A|B) = P(B|A) P(A) / P(B)
P(A|B): probability of disease A (flu) once symptom B (fever) is observed
P(B|A): probability of fever once flu is confirmed
P(A): probability of flu (prior or marginal)
P(B): probability of fever (prior or marginal)
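The four probabilities named on this slide combine through Bayes' rule. The short sketch below works one example; the numeric values are invented for illustration and are not taken from the lecture or from any surveillance data.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (assumed for illustration only).
p_flu = 0.05              # P(A): prior probability of flu
p_fever_given_flu = 0.90  # P(B|A): probability of fever once flu is confirmed
p_fever = 0.10            # P(B): marginal probability of fever

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(round(p_flu_given_fever, 2))  # prints 0.45
```

With these assumed numbers, observing a fever raises the probability of flu from 5% to roughly 45%, which is the kind of update the slide describes.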
Our Solution
Result of incorporating all 5 techniques: improved surveillance
Our Solution: InSTEDD Evolve (http://instedd.org/evolve)
Related items (e.g., news articles) are grouped into a thread. Threads are later associated with events (hypothesized or confirmed).
Tag cloud and semantic heatmap
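The slides do not say how Evolve decides that items are related. As a rough illustration of grouping related items into threads, the sketch below assumes a simple word-overlap (Jaccard) similarity and a greedy assignment; the function names, the threshold, and the sample headlines are hypothetical.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_threads(items, threshold=0.3):
    """Greedily attach each item to the first sufficiently similar thread.

    The threshold is an assumed value, not one taken from Evolve.
    """
    threads = []
    for item in items:
        words = tokens(item)
        for thread in threads:
            if jaccard(words, thread["words"]) >= threshold:
                thread["items"].append(item)
                thread["words"] |= words  # grow the thread's vocabulary
                break
        else:
            # No existing thread was similar enough: start a new one.
            threads.append({"items": [item], "words": set(words)})
    return [thread["items"] for thread in threads]


headlines = [
    "avian influenza outbreak in egypt",
    "egypt reports new avian influenza case",
    "cholera cases rise in zimbabwe",
]
for thread in group_into_threads(headlines):
    print(thread)
```

Here the two Egypt headlines end up in one thread and the cholera headline starts its own; a production system would more likely rely on richer semantic features than raw word overlap.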
The filter feature automatically filters for related items and updates the map and the associated tags.
Auto-generated (machine-learning) tags. These tags are semantically ranked (a statistical probability match). Users can further train the classifier by accepting or rejecting a suggestion. Users can similarly train the geo-locator by accepting, rejecting, or updating a suggested location.
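The accept/reject training loop described above can be sketched with a toy naive Bayes tag suggester. This is an assumption made for illustration, not Evolve's actual classifier; the class name, method names, and sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class FeedbackTagger:
    """Toy naive Bayes tag suggester that learns from user feedback."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies
        self.doc_counts = Counter()              # tag -> number of training items
        self.vocab = set()

    def train(self, text, tag):
        words = text.lower().split()
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Normalized score per tag (assumes at least one training item)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for tag, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[tag].values())
            logp = math.log(n_docs / total_docs)
            for w in words:
                # Add-one (Laplace) smoothing for unseen words.
                logp += math.log(
                    (self.word_counts[tag][w] + 1) / (total_words + len(self.vocab))
                )
            log_scores[tag] = logp
        top = max(log_scores.values())
        unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
        z = sum(unnorm.values())
        return {t: v / z for t, v in unnorm.items()}

    def suggest(self, text):
        ranked = self.scores(text)
        return max(ranked, key=ranked.get)

    def feedback(self, text, suggested_tag, accepted, corrected_tag=None):
        """Accepting reinforces the suggestion; rejecting trains on the correction."""
        if accepted:
            self.train(text, suggested_tag)
        elif corrected_tag is not None:
            self.train(text, corrected_tag)


tagger = FeedbackTagger()
tagger.train("avian influenza outbreak poultry", "avian-flu")
tagger.train("cholera outbreak contaminated water", "cholera")
print(tagger.suggest("poultry deaths influenza"))  # prints avian-flu

# A user rejects a wrong suggestion and supplies the right tag.
tagger.feedback("h5n1 cases in egypt", "cholera", accepted=False,
                corrected_tag="avian-flu")
print(tagger.suggest("h5n1 egypt"))  # prints avian-flu
```

The normalized scores play the role of the "statistical probability match" used to rank tags, and each accepted or corrected suggestion becomes one more training example.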
Tracking the recent avian influenza outbreak in Egypt (reports started to appear in late January 2009). Notice the pattern of reported incidents along the Nile River.
Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response (http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html)
ALPACA "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA
Weka An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/
The R Project for statistical computing: http://www.r-project.org
Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org
The R package surveillance: see Höhle (multiple articles)
Google's Research Publications: MapReduce: Simplified Data Processing on Large Clusters (http://labs.google.com/papers/mapreduce.html)
Hadoop: a software platform that lets one easily write and run applications that process vast amounts of data (http://hadoop.apache.org/core)