Riff: A Social Network and Collaborative Platform for Public Health Disease Surveillance

A hybrid (event-based and indicator-based) platform designed to streamline the collaboration between domain experts and machine learning algorithms for detection, prediction and response to health-related events (such as disease outbreaks or pandemics). The platform helps synthesize health-related event indicators from a wide variety of information sources (structured and unstructured) into a consolidated picture for analysis, maintenance of “community-wide coherence”, and collaboration processes. The platform offers features to detect anomalies, visualize clusters of potential events, predict the rate and spread of a disease outbreak and provide decision makers with tools, methodologies and processes to investigate the event.

  • Old ideas: crows, recognized for divination in Roman times, are a crucial component of the US West Nile Virus control program. New technologies must bring multiple disciplines together and offer a collaborative, open-source model. OUR MODEL: Commercial models rely on competition to drive innovation. Their tools fail at the edge, where there is no market to drive success. Non-profits know the “edge” challenges, but lack the resources for technical innovation. We recognize our success will be measured by effective adoption at both the edge and the center. And it has to be open-source and free. We’ve decided to rely on environmental forces (rather than a market) to drive innovation. And it works.
  • Our track record: HIV pandemic, Rift Valley fever, FMD pandemic, West Nile Virus in the US, SARS, Monkeypox. No room for complacency!
  • Early detection of disease outbreaks is the holy grail of public health, and has now also become a crucial issue for governments facing the threat of bioterrorism. OUR BIG PICTURE: We want to help people detect things early, connect people with each other so they can respond sooner
  • It is not necessarily a lack of information (we have a lot of information); rather, can we turn the information into intelligence (or context) in a timely manner? Multiple streams include the following (say something about why you need to stitch multiple sources together). How do you put an event into context? And where is the next disease going to emerge from? That is the holy grail in this business.
    • Dead crows on the streets of NYC
    • Pepto-Bismol disappearing from the shelves of grocery stores
    • Phone calls from citizens and the media to the health department about increased absenteeism from schools and businesses
    • Increased Internet search hits on certain terms per week
    Image sources: Dead crow: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Empty shelves: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0
    Sidebar: the 5/50 rule: in 5 years' time, 50% of all content will be user-generated (reference: The Podshow by Ron Bloom, http://www.ronbloom.com/?p=11); 60% of content has geo-spatial and temporal aspects.
    Image sources: Wikipedia: http://www.citris-uc.org/system/files/imce-u10/Wikipedia-logo.png Blogger: http://z.about.com/d/weblogs/1/5/V/-/-/-/BloggerHomePage.PNG OpenMRS: http://ruddzw.files.wordpress.com/2007/05/openmrs_osx.png Remote sensing: http://www.medscape.com/content/2000/00/41/47/414717/art-e0603.01.fig2.jpg Cell phone/iPhone: http://healthinformaticsblog.files.wordpress.com/2008/03/iphone-denticon-patient-thumb.jpg WhoIsSick.org: http://gmapsmania.googlepages.com/whosickgmm.JPG
  • Indicator-based Surveillance: Computation of indicators upon which unusual disease patterns to investigate are detected (number of cases, rates, proportion of strains…) Lack of infrastructure Low level training Gaps in coverage Poor information flow Event-based Surveillance: The detection of public health events based on the capture of ad-hoc unstructured reports issued by formal or informal sources. Abundant cheap/free resource Detailed local information Near real-time reporting Less susceptible to political pressure Novel data sources: Online news, chat rooms, blogs, articles, multimedia Remote sensing: Algal blooms can be used to monitor the threat of cholera (e.g., Southern Baltic Sea)
  • Proportion of infection detected… Control confounding effects by: including not only the demand side (Internet search queries) but also the supply side (e.g., information on news websites; link to Healthmap.org or GPHIN); including longitudinal data on health information supply; including accurate geographic distribution. Infodemiology: develop methodology and real-time measures (indices) to understand patterns and trends for general health information; understand the predictive value of what the community of practice is looking for (demand) for early detection of emerging diseases, infectious disease outbreaks, or bioterrorism; identify and quantify gaps between information supply and demand; discover and validate predictive metrics. Could an X number (threshold) of Internet search hits on fever per week signal a flu outbreak?
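The threshold question in these notes can be made concrete with a simple baseline rule. The following is a minimal sketch, not Riff's method: it assumes weekly counts of searches for "fever" and flags a week when its count exceeds the mean plus `k` standard deviations of the preceding weeks. The function name, the window length, `k`, and the counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_weeks(weekly_hits, baseline_weeks=4, k=2.0):
    """Return indices of weeks whose count exceeds baseline mean + k * sd.

    The baseline for week i is the `baseline_weeks` weeks just before it.
    """
    flagged = []
    for i in range(baseline_weeks, len(weekly_hits)):
        window = weekly_hits[i - baseline_weeks:i]
        threshold = mean(window) + k * stdev(window)
        if weekly_hits[i] > threshold:
            flagged.append(i)
    return flagged

# Hypothetical weekly search-hit counts for the term "fever".
hits = [100, 104, 98, 102, 101, 99, 180, 103]
flagged = flag_weeks(hits)
print(flagged)  # index 6 is the week with 180 hits
```

A rule like this only covers the demand side; the notes above argue that supply-side signals (news volume) and geography are needed to control confounding.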
  • Timeliness… We could potentially observe the progression of a disease outbreak within a population at multiple touch points (data). Some of these data may be collected before visits to the physician or hospital have actually happened. Patients might search the Internet for the symptoms they’re experiencing. Patients might adjust their diet and routine when they feel ill (such as drinking more water or juice and getting more rest). If the symptoms become more severe, patients might seek over-the-counter (OTC) medicine and miss classes or work. In many cases, patients might go to work late or leave for home early. Patients might also experience subtle changes in their behavior at work. When the symptoms continue, patients might seek help from physicians (e.g., schedule appointments, present with chief complaints, have lab tests ordered and medicines prescribed). Similar models can also be established for pollution, non-infectious diseases, chronic diseases, injury, and natural disasters.
  • There is currently NO turnkey solution to this problem. You have to involve humans and provide a collaborative environment for these people to work together, and we’re adopting a web 2.0/3.0 approach to pull everything together. In the Pepto-Bismol example, the most interesting aspect of the event was that the majority of the victims did not seek medical attention at first. The Milwaukee Health Department in 1993 became aware of widespread gastrointestinal illness in the community through phone calls from citizens and the media. There was increased absenteeism from schools and businesses, and groceries and pharmacies reported depletion of anti-diarrheal medications. In an event like this, a human expert could associate certain indications and arrive at a conclusion or a few hypotheses to corroborate or refute an event: there have been unusually heavy rains for the last few weeks; the water authority has received several complaints about cloudy water from customers; now we have all these calls and concerns from the community; so perhaps I should lean towards a waterborne hypothesis vs. something else. The human eye can also quickly detect a cluster of pins on a map over time and space and make certain assumptions. As we’re faced with a cross-disciplinary problem (human, animal, environment, organisms, etc.), it becomes clearer that we need to offer a collaborative space for experts from multiple fields to work together on solving the problem. Back when I was in the trenches of SARS, we found out very quickly the importance of crowdsourcing and the need to share certain types of data quickly.
  • Social distance can be more important than geographic distance. Networks can be incrementally developed and don’t need to be defined a priori. Contradictory assumptions can be investigated in parallel (alternate hypotheses for causes, case definitions, etc.). Items can be merged if duplication is discovered, or split if needed. Each change to an element may trigger notification to users, and business logic. Workflow assumes that actions are taken within specific time windows, or else additional actions will be triggered. Practically every item can be “tagged” by users with notes and supplementary data. Users will communicate and collaborate through existing communication channels as much as possible. Auditing of each step allows users to “back up” characterizations of health events through their history, and provides a wide set of potential metrics for evaluating the processes involved in biosurveillance.
  • Health Information Service (HIS). Metadata definitions: augment data with additional attributes (e.g., location, date, key words, related terms, video, images). Provide a markup language, GHML (Google Health Markup Language), based on national and international standards, which describes the data and extends its meaning. Provide a set of APIs and metadata that can support the following features: search, visualization, collaboration, situational awareness, analysis, alerts. Enhance accuracy, reliability, validity and utility by allowing the community of practice to augment the data. Allow users to tag data of interest to further refine its meaning. Allow users to link and share data that can be used by others (collaboration). Provide publish-and-subscribe functionality (RSS, GeoRSS, SSE, REST…). Allow users to invoke “health agents”.
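As an illustration of the publish-and-subscribe idea in these notes, here is a minimal sketch of one feed entry carrying tags and a location, using the GeoRSS-Simple `georss:point` element ("latitude longitude"). This is not the GHML format or Riff's actual feed; the helper name, title, URL, coordinates, and tags are hypothetical.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

def build_item(title, link, lat, lon, tags):
    """Build an RSS <item> with one <category> per tag and a georss:point."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    for tag in tags:
        ET.SubElement(item, "category").text = tag
    # GeoRSS-Simple encodes a point as "latitude longitude".
    point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {lon}"
    return item

# Hypothetical item: values are placeholders, not real surveillance data.
item = build_item(
    "Suspected cholera cluster",
    "http://example.org/items/345",
    11.56, 104.92,
    ["cholera", "waterborne"],
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Subscribers that understand GeoRSS can place such an item on a map, while plain RSS readers simply ignore the extra element.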
  • 1- Information gets collected from different sources.
    2- Information gets decorated with different layers of data, like remote sensing information about temperature, humidity or terrain.
    3- Machine learning modules classify the articles in the system, determining location, names of diseases, symptoms or syndromes, and extracting structured data like epidemiological numbers of suspected or confirmed cases.
    4- Experts from different disciplines collaborate around the information, creating comments, tagging, relating articles and correcting or training machine-learning algorithms.
    5- Experts can use different visualizations and filtering tools to explore the body of evidence as the event unfolds over time and space, and create hypotheses of events that they can discuss or refine with their team members and decide whether they think that a field investigation is needed.
    6- Field staff can collect and report information that gets incorporated back into the system.
  • Saved filters with subscriptions List, Grid or Map views -Tags -Related items Publish and share information through RSS feeds
  • And of course, you can combine filters by tags, with filters by region or any other property that the article has in the system.
  • Hurdles to be overcome
    • Diagnostics: limited availability
    • Data collection: limited capacity
    • Partial coverage: the black holes are getting larger
    • Inconsistent definitions and quality of data
    • Incompatible reporting systems and stove-piping of information
    • Political filters
    • Technical: Collaboration
      • Commenting capability
      • Notification via a “publish and subscribe” capability
      • Shared group definitions and calendars
      • Shared access to key artifacts
      • Support for mobile devices (e.g., SMS) and VOIP
    • Organizational: China might not want to share information, others might not want to either; lots of policy, etc. required…
    • Evaluation framework
      • Overall measures (situation awareness and shared mental model)
      • Individual processes measures
      • Network parameters: Which automated systems generated the most reliable alerts, and for what types of conditions? Which human users were the most effective in identifying conditions? Which indicators are the most effective in identifying a health event? Which elements of the biosurveillance lifecycle require the most time and/or collaboration? The network history will provide a common point of evaluation for a variety of surveillance and response techniques.
    • System evaluation
      • System description: purpose (detection- and information-based), stakeholders, operations
      • Health-related event detection: timeliness, validity
      • Validation approach
      • Statistical assessment of validity
      • Data quality
      • System experience: system usefulness, flexibility, acceptability, portability, stability, costs, sustainability
  • To recap: human experts interacting with automated systems; the collaborative decision-making environment. We are sure that one day soon we will have an EID (Emerging Infectious Disease) impact assessment, just as there is an environmental impact assessment…
  • E. coli Norwalk-like virus Salmonellosis Dengue fever Herpes Cholera Gastroenteritis Pertussis Rift Valley fever C. difficile Staphylococcal disease Diarrhea Legionellosis Tuberculosis Malaria Chickenpox Measles …

Presentation Transcript

  • Photo credit: IRMA (Integrated Risk Management for Africa)
  • Taha Kass-Hout and Nicolas di Tada, Summer 2008, Washington, DC, USA.
    • What is public health disease surveillance
      • “Public health surveillance is the ongoing systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link in the surveillance chain is the application of these data to prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs.”
    • What is syndromic surveillance?
      • US CDC defines syndromic surveillance as “surveillance using health-related data that precede diagnosis and signal a sufficient probability of a case or an outbreak to warrant further public health response.”
    Thacker, S.B., and Berkelman, R.L. "Public Health Surveillance in the United States." Epidemiologic Reviews 10 (1988): 164-90.
    • The design, analysis and evaluation of current disease surveillance systems have been geared towards specific data sources and detection algorithms, not humans
      • Much less effort has gone towards interaction with responders and domain experts across agencies and at multiple levels
      • Systems often provide contradictory interpretations of ongoing events
    • We have disease surveillance systems in place for those threats we have been faced with before
      • We are more vulnerable to those we know about, but have not faced on a major scale
      • Even more vulnerable to those that we don’t know about
    • The likelihood of disasters and disease outbreaks is growing
      • According to a recent Oxfam report, there has been a four-fold increase in the annual number of natural disasters
      • 30 new infectious diseases identified since 1973
    • Potential impact is getting greater
      • Impact on health, economies & security
      • Capable of spreading faster than ever before
    http://www.oxfam.org/en/policy/briefingpapers/bp108_climate_change_alarm_0711
    • Address these challenges by adopting a social and collaborative decision-making approach in order to facilitate
      • early characterization and identification of potential health threats
      • their verification, assessment and investigation
      • in order to recommend measures (public health and others) to control them
    • Event-based - ad-hoc unstructured reports issued by formal or informal sources
    • Indicator-based - (number of cases, rates, proportion of strains…)
    Timeliness, Representativeness, Completeness, Predictive Value, Quality, Cost, Feasibility, …
  • Identified risks Mandatory notification Laboratory surveillance Emerging risks Syndromic surveillance Mortality monitoring Healthcare activity monitoring Prescription monitoring Non healthcare based Veterinary surveillance Behavioral surveillance Environmental surveillance Poison centers Food safety/water supply … Domestic Media NGOs Field Epi points
    • International
    • Distribution lists
      • ProMed (English, Chinese, Spanish, Russian, etc.)
    • International agencies
      • WHO
      • OIE
      • CDC
      • NASA (e.g., remote sensing, weather, population migration, bird migration, population density, plant, animal)
    • Confidential/Limited mailing list dissemination
      • ProMed (e.g., MBDS)
      • International health regulation agencies (WHO, OIE, CDC, NASA)
      • Threat bulletin (EWARN, ECDC)
    • Public dissemination
      • News, blogs, articles,
      • Health ministry press releases sites
      • Weekly releases (Eurosurveillance)
    Adapted from WHO
  • Reduce Morbidity and Mortality and Improve Health. Adapted from WHO
  • 1000 Shigella infections (100%) 50 Shigella notifications (5%)
    • Main attributes
      • Representativeness
      • Completeness
      • Predictive value positive
    Specificity / Reliability Sensitivity / Timeliness Get as close to the bottom of the pyramid as possible Urge frequent reporting
  • Time
    • Main attributes
      • Timeliness
    Analyze and interpret Signal as early as possible Automated analysis/thresholds
    • Clickstream/Keyword Searching
    • Blogs/Chatrooms
    • News Sources
      • Local
      • National
      • International
    • Curated mailing lists (ProMED)
    • Multi-national surveillance (Eurosurveillance)
    • Validated official global alerts (WHO)
    Sensitivity / Timeliness Specificity / Reliability
    • Main attributes
      • Data quality
  • Figure (adapted from WHO): outbreak curve, CASES by DAY. Labels: Lab Confirmation, Detection/Reporting, First Case, Opportunity for control, Response.
  • Figure (adapted from WHO): outbreak curve, CASES by DAY. Labels: First Case, Detection/Reporting, Confirmation, Investigation, Opportunity for control, Response.
  • Progression of outbreak and electronic surveillance, Nov 2002 to Mar 2003 (adapted from Brownstein, et al.)
    • Cases of atypical pneumonia, Foshan: Nov 16th
    • Infected Chinese doctor, Hong Kong hotel: Feb 21st
    • 305 cases of acute resp, Guangdong Province: Feb 11th
    • Pharma report, Guangdong Province: November 27
    • Media reports, Guangdong Province: Feb 10
    • Astute physician on ProMED: Feb 10
    • Initial WHO Report: Feb 25
    • Official WHO Report: March 10
  • News articles Alerts Disease reports
  • 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data Detection algorithm Too many alerts Duplicative and uni-directional channels Uncoordinated response
    • Hybrid: Machine- and Human-based
    • Social, collaborative and cross-disciplinary
    • Web 2.0/3.0 platform
    • Better detection model
    • Better response model
    Source: http://www.pbs.org/wgbh/pages/frontline/shows/georgia/outbreak/matrix.html Source: www.sociology.columbia.edu/pdf-files/bearmanarticle.pdf
  • News item 345 Field alerts Disease report Health News Field alerts News sources Alerts Data + Metadata
      • Collaboration and multi-directional communication between interested groups
      • Interactions beyond that allowed by original sources and with controlled visibility
      • Customizable, secure ‘social’ and ‘professional’ metadata around information
  • 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data Feedback loop Fewer and more actionable alerts Effective and coordinated response Multi-directional communication
  • Feature extraction (including geo-location) Tags Comments Location Flags/Alerts/Bookmarks Environment Factors Animal Health Factors Remote Sensing Event Classification and Detection Previous Event Training Data Previous Event Control Data Metadata extraction Other reference information Machine learning Show event characterizations Social network Other inferred information … Professional network feedback Professional feedback Anomaly detection Multiple data streams (multi-lingual) User-Generated and Machine Learning Metadata Existing Social Network (e.g., Comm. of interest) Riff Bot
  • Kass-Hout and di Tada: Best Poster Award for Improving Public Health Investigation and Response at the Seventh Annual ISDS Conference, December 3-5, 2008 at the Raleigh Conference Civic Center. http://kasshout.blogspot.com/2008/12/best-poster-award-for-improving-public.html and http://www.isdsjournal.org/article/viewArticle/3308
  • Search: _____ {Tag cloud} Terms tagged by human collaborators or source {Event tag cloud} X Diarrhea X Cholera X Influenza X Respiratory Illness X Fever [Show me unusual distributions]
  • Filters Item (e.g., disease report, news article, alert) summary and location (s) Tag cloud Subscriptions SMS alerts Ratings, comments, alerts, flags Tags (automatic + humans classification) Thread (related Items)
    • LOCATIONS
    • HEATMAP
  • Tracking the Avian Influenza Outbreak in Egypt (reports started to appear late January 2009).
    • Current classifications (automated and corrected by human experts) include:
      • 7 syndromes
      • 10 transmission modes
      • > 100 infectious diseases
      • > 180 micro-organisms
      • > 140 symptoms
      • > 50 chemicals
    HFOSS Disease Ontology Prediction Project http://2009.hfoss.org/Evolve_-_Disease_Ontology_Prediction
    • Over the summer, the Humanitarian FOSS (HFOSS) Project Summer Institute 2008 (May '08 - July '08) carried out an internship project mentored by InSTEDD and a number of HFOSS faculty. During this internship, Juan Pablo Mendoza and Qianqian Lin developed the ALPACA Light Parsing And Classifying Application (ALPACA) to:
      • Transform raw unstructured documents (e.g., news reports, ProMED mail, etc.) into machine-readable and analyzable data using a text parsing module
      • Categorize documents by:
        • a) Classification into a predetermined (user-defined) list of categories as described above (syndromes, symptoms, routes of transmission, diseases, etc.) using an SVM classifier (libSVM), and
        • b) Suggesting additional tags and/or topics using a Naive Bayes classifier, given existing topics and monitoring human input and review. This is especially helpful with new (emerging) threats, or threats that we know about but experience at a much bigger scale than usual (e.g., a far more virulent flu virus than we’ve experienced over the past few years)
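To make the tag-suggestion step (b) concrete, the following is a toy multinomial Naive Bayes tagger in pure Python with add-one smoothing. It is an illustrative stand-in under stated assumptions, not ALPACA's code (which, per the slide, used libSVM for classification); the class name, whitespace tokenization, and training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> number of documents
        self.vocab = set()

    def train(self, text, tag):
        """Record one human-confirmed (text, tag) pair."""
        words = text.lower().split()
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocab.update(words)

    def suggest(self, text):
        """Return the tag with the highest posterior log-probability."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, -math.inf
        for tag in self.doc_counts:
            total_words = sum(self.word_counts[tag].values())
            score = math.log(self.doc_counts[tag] / total_docs)
            for w in words:
                count = self.word_counts[tag][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

# Hypothetical training snippets standing in for expert-reviewed articles.
tagger = NaiveBayesTagger()
tagger.train("watery diarrhea and dehydration after flooding", "cholera")
tagger.train("contaminated water linked to diarrhea cases", "cholera")
tagger.train("fever cough and sore throat reported in schools", "influenza")
tagger.train("seasonal flu brings fever and cough", "influenza")

print(tagger.suggest("children with cough and fever"))
print(tagger.suggest("diarrhea after drinking contaminated water"))
```

Each accept/reject decision by a human reviewer would become another `train` call, which is one simple way to realize the "monitoring human input and review" loop described above.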
    • We tested ALPACA against two widely accepted early sources of information in the public health community: Reuters news and ProMED mail. [Results figure not included in this transcript.]
    • ALPACA is extensible through a plug-in functionality that provides a simple way to add additional parsers and classifiers to the application. We are continuously adding and testing additional algorithms, and we welcome your contribution to help us better calibrate existing classifiers and parsers as well as introduce additional ones (you can visit our collaborative space here.)
    • To date, we have:
      • 480 registered users
      • 394 collaboration spaces
      • 694 streams of information sources (RSS, SMS, etc.)
      • 900,000 items [e.g., news articles, disease reports] analyzed
      • 443,151 geo-coded locations
      • 700 terms [tags] ‘trained’ [accept/reject] by human experts
      • 12,000+ tags ‘suggested’ by human experts
    • Technical considerations
      • Collaboration
      • Workflow
    • Organizational considerations
    • Evaluation framework
    • Latest Progress
      • Ontologies (e.g., BioCaster, SNOMED, ICD)
      • Event reporting, analysis and public announcements (e.g., Thomson Reuters Foundation’s Emergency Information Service (EIS) deployment during the Haiti Response, 2010)
    • Planned Steps
      • API for external extensions and interactions
      • Full support for structured data
      • Automatic field data collection through forms, SMS, etc.
      • Anomaly detection (e.g., EARS)
    http://alertnet.org/db/blogs/1564/2010/00/24-120746-1.htm http://ndt.instedd.org/search/label/eis
  • Taha Kass-Hout, MD, MS http://kasshout.blogspot.com Nicolás di Tada [email_address] Riff http://riff.instedd.org [Software: http://code.google.com/p/riff-evolve Code license: GNU General Public License v3, Content license: Creative Commons 3.0 BY-SA] Cambodia, Photo taken by Taha Kass-Hout, October 2008 “ this pic says it all- our kids are all the same- they deserve the same ”, Comment by Robert Gregg on Facebook, October 2008
    • Detection-focused visualization
      • Individual alert listings
      • Summary alerts
      • Alerts in time-series graph
      • Mapping alerts
    • Information-based visualization (visualizing data and information)
      • Data query
      • Data stratification
      • Time-series graphs
      • Data line listing
      • Matrix portal
      • Mapping
    • How to communicate information to users of the system. Typically there are three basic components:
    • Time-series graphs
    • Maps
    • Data tables
    • However, depending on the primary focus (detection-based, information-based, or a hybrid of both) there can be more components as follows:
    • Detection-based systems (alert listings and maps of current anomalies are the two most important visualization components)
      • What-if scenarios
      • Automatic anomaly detection
    • Statistical anomaly
    • System believes there is an anomaly of interest to the user
    • Information-based systems (GIS, time-series graphs, data tables, query wizards, and real-time displays are the most important visualization components)
      • Create new case definition
      • Select different processing options
      • Customize presentation to meet users' needs
    • Additionally, we propose building a hybrid solution that combines both detection and information-based systems, which supports the following:
    • Early detection
    • Spatio-temporal
    • Belief Networks (BNs)
    • Simulation and modeling
    • Provide earlier notification of a change in the normal levels of observed counts of the desired health indicator.
    • Emphasis on the importance of matching the analytic process to the data type so as to achieve the performance needed for early identification of the event with minimum false alarms and missed events (Type I and Type II errors).
    • Performance evaluation of analytic processes using accepted metrics.
    • Statistical decisions of the analytic data monitor include:
      • which combinations of data sources to test
      • which algorithms to use with respect to characteristics of the data background
      • how to achieve sensitivity over many locations within a manageable false-alert rate
      • how much corroboration among data streams is required to achieve a threshold for escalating the information
    • A multiplicity of data sources has appeal because consistent evidence may be employed to suggest inferential accuracy. In practice, however, multiple data sources can be contradictory. Decision requirements for the prospective analytic data monitor involve when and how deeply to investigate a data anomaly as well as when to escalate the information (as an alert) for action. Unambiguous, corroborated data spikes are the exception rather than the rule. For single data streams, univariate algorithms employ data modeling and hypothesis tests to provide systematic signal escalation protocols.
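As one example of a univariate algorithm for a single data stream, here is a minimal one-sided CUSUM sketch. It is a generic textbook detector chosen for illustration, not Riff's implementation and not the specific EARS methods; the baseline mean, slack, threshold, and daily counts are hypothetical.

```python
def cusum_alerts(counts, baseline_mean, slack, threshold):
    """One-sided CUSUM: accumulate excess over (baseline_mean + slack).

    Returns the indices at which the running sum exceeds `threshold`.
    The sum is reset to zero after each alert.
    """
    s = 0.0
    alerts = []
    for day, x in enumerate(counts):
        s = max(0.0, s + (x - baseline_mean - slack))
        if s > threshold:
            alerts.append(day)
            s = 0.0  # reset after signalling
    return alerts

# Hypothetical daily counts of a health indicator (e.g., clinic visits).
daily_counts = [10, 11, 9, 10, 13, 14, 15, 10, 9]
alerts = cusum_alerts(daily_counts, baseline_mean=10, slack=1, threshold=5)
print(alerts)  # the sustained rise on days 4 to 6 signals on day 6
```

Because the statistic accumulates small excesses, it can signal on a sustained modest rise that a single-day threshold would miss; the slack and threshold control the trade-off between timeliness and false alerts discussed above.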
    • Use of application-linked and hyperlinked-fields for integration of analysis and visualization tools
    • Commenting Capability
    • Notification via a “publish and subscribe” capability
    • Shared group definitions and calendars
    • Shared access to key artifacts
    • Support for Mobile devices (e.g., SMS) and VOIP
    • The enforcement of the business rules for distributing and validating alerts, escalation, and the definition of tasks
    • Keeping the business logic encapsulated in a business engine, as opposed to “coding it into” the core applications
    • Modification of operations “on the fly”, and supporting different modes of operation depending on the current level of emergency
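The workflow points above (rules kept outside the core application, changeable on the fly, and dependent on the emergency level) can be sketched as rules-as-data. This is a minimal illustration under assumed names, not Riff's engine: the `Rule` type, the mode names, the alert fields, and the time windows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over an alert record
    action: str                        # action to trigger when it holds

# Rule sets live in data, so they can be swapped without touching the core.
RULES_BY_MODE: Dict[str, List[Rule]] = {
    "routine": [
        Rule("stale_unverified",
             lambda a: not a["verified"] and a["hours_open"] > 24,
             "escalate_to_supervisor"),
    ],
    "emergency": [
        Rule("stale_unverified",
             lambda a: not a["verified"] and a["hours_open"] > 4,
             "escalate_to_supervisor"),
        Rule("corroborated",
             lambda a: a["corroborating_streams"] >= 2,
             "notify_field_team"),
    ],
}

def actions_for(mode: str, alert: dict) -> List[str]:
    """Return the actions triggered for `alert` under the given mode."""
    return [r.action for r in RULES_BY_MODE[mode] if r.condition(alert)]

# Hypothetical alert that has been open for six hours without verification.
alert = {"verified": False, "hours_open": 6, "corroborating_streams": 2}
print(actions_for("routine", alert))
print(actions_for("emergency", alert))
```

Switching the mode tightens the time window and adds rules without any change to `actions_for`, which is the point of keeping the logic out of the core application.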
    • Overall measures
      • Situation Awareness Global Assessment Technique (SAGAT)
      • The Situation Awareness Rating Technique (SART)
    • Individual processes measures
    • Network parameters
    • Which automated systems generated the most reliable alerts, and for what types of conditions?
    • Which human users were the most effective in identifying conditions?
    • Which indicators are the most effective in identifying a health event?
    • What factors help to minimize or aggravate a health event?
    • Which elements of the biosurveillance lifecycle require the most time and/or collaboration?
    • The network history will provide a common point of evaluation for a variety of surveillance and response techniques
    • System description
      • Purpose (detection- and information-based)
      • Stakeholders
      • Operations
    • Health-related event detection
      • Timeliness
      • Validity
    • Validation approach
    • Statistical assessment of validity
    • Data quality
    • System experience
      • System usefulness
      • Flexibility
      • Acceptability
      • Portability
      • Stability
      • Costs
    • Evaluation here is primarily for the timely detection of health-related events and the effectiveness of the response. We have to keep in mind the flexibility of the system and how it can meet the needs of both regular and advanced users. Advanced users often want control in order to customize queries, modify graphic presentation, adjust sensitivity levels of detection algorithms, etc.
  •  
    • Source Type
    • Non-Specific
      • Syndromic
    • Specific
      • Case Definition
    Note: All tags can follow a hierarchical construct. Ontology example: a subset of the disease ontology, showing relationships between the various forms of pneumonia: Pneumonia and influenza; Pneumonia due to Staphylococcus aureus; Other bacterial pneumonia; Pneumococcal pneumonia; Pneumonia due to Hemophilus influenzae.
    • Human
      • Prodromal
      • Clinical
      • Morbidity and Mortality
    • Animal
    • Environmental (or Climate)
    • Allied Professional Source
    • Building/vessel contamination
    • Continuous or intermittent release of an agent
    • Contagious person-to-person
    • Commercially distributed products
    • Waterborne
    • Vector/host borne
    • Sexually transmitted
    • Other
      • Large-scale bioaerosol
      • Premonitory release of agent
    • CLIMATE
    • PEOPLE
    • Temperature change
    • Precipitation change
    • Wind change
    • Die-offs observed
    • Sentinels tested
    ANIMAL
    • Increased mortality rate
    • Increased presentations for treatment
    • Building/vessel contamination
    • Continuous or intermittent release of an agent
    • Contagious person-to-person
    • Commercially distributed products
    • Waterborne
    • Vector/host borne
    • Sexually transmitted
    • Other
      • Large-scale bioaerosol
      • Premonitory release of agent
    TRANSMISSION ROUTE Note: All tags can follow a hierarchical construct
    • RESPIRATORY
    • BREATHING DIFFICULTY
    • GI
    • Hemoptysis
    • Asthma attack
    • Croup
    • Pneumonia
    • Wheezing
    • Runny or stuffy nose
    • Pleuritic pain
    • Sore throat
    • URI
    • Fever
    • Weakness
    • Anorexia
    • Viral syndrome
    • Faintness
    • Malaise
    • Body aches
    • General illness
    • Chills
    • Lymphadenopathy
    • Sweating
    CONSTITUTIONAL IRRITABLE BABY
    • Abdominal pain
    • Diarrhea
    • Vomiting
    • Nausea
    • Gastroenteritis
    • Dehydration
    • Cough
    • Sore throat
    • Fever
    • Weakness
    • Viral syndrome
    • Body aches
    • Bronchiolitis
    • Pneumonia
    • Upper respiratory infection
    • Malaise
    • Chills
    • Influenza
    INFLUENZA-LIKE ILLNESS (OR ILI) Note: All tags can follow a hierarchical construct
    • UNDIAGNOSED
    • GI
    • Respiratory
    • DIAGNOSED
    • Influenza
      • Avian influenza
    • Can be mapped to standards, such as:
    • Unified Medical Language System (UMLS) [which supports SNOMED, LOINC, ICDs, etc.] http://www.nlm.nih.gov/research/umls/
    • PHIN VADS (http://www.cdc.gov/PHIN)
    • Case Definition:
      • Probable
      • Possible
      • Confirmed
    Note: All tags can follow a hierarchical construct
  • Admin configures a new inference: If an item has Cough [13 of 130], Runny Nose [20 of 130] and Fever [23 of 130], then tag it with Flu [10 of 130]. The system will analyze the existing tagged items and estimate the probability of an item being flu given that it has cough, runny nose and fever. User sees a suggestion for a new item (tags inferred): Flu [85% confidence because of cough, runny nose and fever]; Influenza [55% confidence because of cough and headache].
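The confidence figure in such a suggestion can be estimated by simple counting over the items that are already tagged. The sketch below is a naive frequency estimate rather than a full Bayesian network; the function name `tag_confidence` and the sample items are hypothetical.

```python
def tag_confidence(items, evidence, prediction):
    """Estimate P(prediction | all evidence tags) by counting tagged items.

    `items` is a list of tag sets, one per item.
    """
    evidence = set(evidence)
    matching = [tags for tags in items if evidence <= set(tags)]
    if not matching:
        # No item carries all the evidence tags, so nothing can be estimated.
        return None
    hits = sum(1 for tags in matching if prediction in tags)
    return hits / len(matching)


# Hypothetical tagged items, for illustration only.
tagged_items = [
    {"cough", "runny nose", "fever", "flu"},
    {"cough", "runny nose", "fever", "flu"},
    {"cough", "runny nose", "fever"},
    {"cough", "fever"},
    {"fever", "flu"},
]
confidence = tag_confidence(tagged_items, ["cough", "runny nose", "fever"], "flu")
```

Here three items carry all three evidence tags and two of them are also tagged flu, so the estimate is 2/3.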
  • Figure: 3 items clustered because of their proximity and similar symptoms (axes: Longitude, Latitude, Time; tags: Cough, Fever). Note: This is actually done in an n-dimensional space, n being the number of tags available, plus the number of relevant words detected, plus a possible spatio-temporal dimension.
    • Each item gets represented by a vector of the relevant words it contains with the corresponding frequency.
    • Each tag gets its own linear classifier, which needs at least one positive and one negative sample. Classification is based on the vector for each item: the linear classifier creates a hyperplane that divides the n-space in two, for positive and negative predictions.
    • Whenever a user corrects or confirms a suggestion, we feed that back to the classifier.
    • Any number of Bayesian networks (BNs) can be created to map some “evidence” tags to a “prediction” tag. For each item, the system will measure the probability of having that tag based on the existence of previous tags.
    • The vectors for the items can be grouped to find clusters. Items in the same cluster are near each other in the n-space, so they have similar values for their word content and tags.
    • This is just an initial approach; there are a number of alternative implementations:
      • Automatic tagging can be done using clustering: we create clusters for each tag, and for new items we measure which cluster centroid the item is closest to.
      • Automatic tagging can also be done using BNs: our evidence can be the words, and we can measure the probability of a certain tag based on the words contained in the item.
      • New tag suggestions can be made using clustering instead of BNs: clustering all the items and suggesting tags that some of the items in the cluster have and the others don’t.
    • Provided we implement the algorithms abstractly enough, it should be simple to interchange them and see what works best.
  • P(malaria) = 22% P(influenza) = 13% P(other ILI) = 33%
    • Classifiers
    • Clustering
    • Bayesian Statistics
    • Neural Networks
    • Genetic Algorithms
  • cold fever
    • Map items to vectors (Feature extraction)
    • Normalize those vectors
    • Train the classifier
    • Measure the results with new information
    • Feedback the classifier
    • Separate classes in feature space
    • Support vectors define the separator
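The classifier steps above (feature extraction, normalization, training, feedback) can be sketched as follows. This uses a perceptron as a simple stand-in for a linear classifier; it is not a support vector machine, and the names `to_vector` and `TagClassifier`, along with the sample texts, are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Map an item to a unit-length bag-of-words vector
    (feature extraction plus normalization)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


class TagClassifier:
    """Perceptron: a hyperplane (weights, bias) separating positive
    from negative items for one tag."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, vector):
        return self.bias + sum(self.weights.get(w, 0.0) * v for w, v in vector.items())

    def predict(self, text):
        return self.score(to_vector(text)) > 0

    def feedback(self, text, is_positive):
        """Adjust the hyperplane only when the prediction disagrees with the label."""
        vector = to_vector(text)
        if (self.score(vector) > 0) != is_positive:
            step = 1.0 if is_positive else -1.0
            for w, v in vector.items():
                self.weights[w] = self.weights.get(w, 0.0) + step * v
            self.bias += step

    def train(self, samples, epochs=10):
        for _ in range(epochs):
            for text, is_positive in samples:
                self.feedback(text, is_positive)


# Hypothetical training items for an "ILI" tag: at least one positive and one negative.
samples = [
    ("fever cough chills", True),
    ("cough fever body aches", True),
    ("broken arm fall", False),
    ("ankle sprain", False),
]
ili = TagClassifier()
ili.train(samples)
```

A user's correction or confirmation of a suggestion maps onto a single `feedback` call with the item text and the corrected label.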
  • Φ: x → φ(x) (map to a higher-dimensional space)
  • Figure: a classifier trained on training documents (positives and negatives) and applied to Document 1, Document 2 and Document 3.
    • Map items to vectors (Feature extraction)
    • Normalization
    • Agglomerative or Partitional
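As one illustration of the agglomerative option, the sketch below merges the two closest clusters repeatedly until the closest pair is farther apart than a cut-off (single linkage). The function names, the cut-off value, and the sample points (longitude, latitude, fever tag, cough tag) are hypothetical, and the vectors are assumed to be already normalized.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, max_distance):
    """Single-linkage agglomerative clustering over point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest member-to-member distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    distance(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # the closest clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical items: (longitude, latitude, has_fever, has_cough).
points = [
    (0.0, 0.0, 1, 1),
    (0.1, 0.0, 1, 1),
    (0.0, 0.1, 1, 1),
    (5.0, 5.0, 0, 1),
]
clusters = agglomerate(points, max_distance=0.5)
```

The first three items end up in one cluster because they are close in space and share the same symptom tags; the fourth stays on its own.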
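  • Bayes' theorem: P(A|B) = P(B|A) × P(A) / P(B), where P(A|B) is the probability of disease A (flu) once symptom B (fever) is observed, P(B|A) is the probability of fever once flu is confirmed, P(A) is the probability of flu (prior or marginal), and P(B) is the probability of fever (prior or marginal).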
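These four quantities are related by Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B). A small worked example follows; the probabilities are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b


# Hypothetical values, for illustration only.
p_fever_given_flu = 0.9   # probability of fever once flu is confirmed
p_flu = 0.05              # prior probability of flu
p_fever = 0.15            # prior probability of fever
p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
```

With these numbers, observing a fever raises the probability of flu from 5% to about 30%.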
    • Given a set of stimuli, train a system to produce a given output…
  • Figure: Input Layer {I_0, I_1, …, I_n}, Hidden Layer […], Output Layer {O_0, O_1, …, O_n}, connected by weights.
  • Event?
    • Define the model that you want to optimize
    • Create the fitness function
    • Evolve the gene pool testing against the fitness function.
    • Select the best individual
    • Model the transmission process using a set of parameters ( e.g., an infectious disease ):
      • Onset time between an infection and illness
      • Latency period
      • Incubation period
      • Symptomatic period
      • Infectious period
    (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)
  • Fitness = 1/Area
    • Create an initial population of candidates
    • Use operators to generate new candidates (mating and mutation)
    • Discard worst individuals or select best individuals in generation
    • Repeat from the second step until you find a candidate that satisfies the search criteria
  • (4,5,6,3,5) (4,3,6,2,5) (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (3,4,5,2,6) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4) (5,3,2,6,5) (3,4,4,6,2) (5,3,2,6,5) (3,4,4,6,2)
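The genetic-algorithm loop described above can be sketched for candidates of the form (onset, latency, incubation, symptomatic, infectious). Several things here are assumptions made for illustration: the `TARGET` tuple stands in for observed data, `area` is a placeholder for the area between modeled and observed curves, each parameter is limited to 1 to 6 days, and the fitness adds 1 to the area so that a perfect fit does not divide by zero.

```python
import random

# Hypothetical "observed" parameters in days (the example tuple from the slides).
TARGET = (2, 3, 1, 4, 3)
LOW, HIGH = 1, 6  # assumed allowed range for every parameter


def area(candidate):
    """Placeholder for the area between modeled and observed curves."""
    return sum(abs(c - t) for c, t in zip(candidate, TARGET))


def fitness(candidate):
    # The slides use Fitness = 1/Area; the +1 avoids division by zero.
    return 1.0 / (1.0 + area(candidate))


def mate(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    """Replace each gene by a random value with probability `rate`."""
    return tuple(rng.randint(LOW, HIGH) if rng.random() < rate else g for g in candidate)


def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randint(LOW, HIGH) for _ in TARGET) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if area(population[0]) == 0:
            break  # a candidate satisfies the search criteria
        survivors = population[: pop_size // 2]  # discard the worst half
        children = [
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve()
```

Because the best half of each generation survives unchanged, the best fitness never decreases from one generation to the next.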
  •  
      • Each "pill" is a hypothesis; it has the event tags on top, followed by the author.
      • Each hypothesis can have many responses to it; a response means someone disagrees or wants to add something to the proposed hypothesis.
      • The red area contains items that have been rejected by the responder to the previous hypothesis. Something like "Yes, I agree with you, but this and this don't belong here".
      • The blue area contains items that have been added to the responded hypothesis, like "Yes, you are right, but these items should also be included in this event".
      • A hypothesis response can include changes in the event tags: "You are right, there's definitely something going on with all those items, but I don't think it's airborne, I think it's waterborne"
      • The list of people below the pill shows all the users that have subscribed or agreed to that hypothesis: "Yes, Doc. James is right, I put my signature here."
      • The size of the subscribers below is relative to their "reputation" inside Riff; the whole area below a pill gives an idea of how well supported a hypothesis is. We could also grow the pill proportionally.
      • The "bold" pill is the confirmed one; once there's a confirmation, that should increase the reputation of all the subscribers to that hypothesis.
      • Hovering over each pill's area should display a summary of the number of items there, maybe the area and the items' tags.
      • Clicking should pop up a small navigation of the actual items contained there.
  •  
    • Can trend analysis predict outbreaks?
    • Recent studies show that Internet search has:
      • … considerable potential as one of the earliest indicators for syndromic surveillance
      • … the potential to predict population-based events relevant to public health
      • … the potential for a higher sensitivity compared to other early sources (e.g., media, ProMed)
    • Many individuals experiencing symptoms of illness conduct Internet search prior to seeking medical attention
      • Wilson K, Brownstein JS. Early detection of disease outbreaks using the Internet. CMAJ 2009;180:829-831
      • Ginsberg J, Mohebbi MH, Patel RS, et al. Detecting influenza epidemics using search engine query data. Nature 2009;457:1012–4
      • Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
      • Wethington H, Bartlett P. Usage and data collection patterns for a novel web-based foodborne-disease surveillance system. J Environ Health. 2006 Mar;68(7):25-9
      • Cooper CP, Mallon KP, Leadbetter S, Pollack LA, Peipins LA. Cancer Internet search activity on a major search engine, United States 2001-2003. J Med Internet Res 2005;7:e36
      • Li CS, Aggarwal C, Campbell M, et al. Site-Based Biosurveillance. MMWR September 24, 2004 / 53(Suppl);249
      • Eysenbach G, Kohler C. What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the Internet. Proc AMIA Annu Fall Symp 2003;225-9
  • Internet searches for allergy and ragweed terms increase in the spring, and those for allergy and pollen terms increase significantly in the fall. It would also appear that Texas and Oklahoma are leading locales for ragweed. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
  • A search for the term “leptospirosis” in the United States finds dramatically higher search rates from Honolulu, Hawaii, consistent with the epidemiology of the illness in the United States (more than half of all national cases are reported from Hawaii). Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
  • Internet search for “contact lens” increased in Singapore in February 2006, prior to the notification from CDC of the first US cases of contact lens-associated Fusarium keratitis in March 2006, and prior to widespread news coverage in April 2006. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
  • Following large anti-war protests on the Mall in Washington DC in late September 2005, multiple environmental sensors watching for bioterror events detected the presence of Francisella tularensis. Interestingly, queries appear to have increased prior to discovery of the sensor findings by public health officials on September 30th. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
  • While uncommon words like “croup” readily reveal the expected seasonal pattern of Internet search, more common words like “cough” or “throat” require logical modifiers to rule out more common search phrases. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209