Photo credit: IRMA (Integrated Risk Management for Africa)
Taha Kass-Hout and Nicolas di Tada, Summer 2008, Washington, DC, USA.
<ul><li>What is public health disease surveillance </li></ul><ul><ul><li>“ Public health surveillance is the ongoing syste...
<ul><li>Current systems design, analysis and evaluation of disease surveillance systems has been geared towards specific d...
<ul><li>The likelihood of disasters and disease outbreaks is growing </li></ul><ul><ul><li>According to a recent Oxfam rep...
<ul><li>To address these challenges by adopting a  social and collaborative decision making approach  in order to  facilit...
<ul><li>Event-based -  ad-hoc unstructured reports issued by formal or informal sources </li></ul><ul><li>Indicator-based ...
Identified risks Mandatory notification Laboratory surveillance Emerging risks Syndromic surveillance Mortality monitoring...
Reduce Morbidity and Mortality and Improve Health Adopted from WHO
1000  Shigella  infections (100%) 50  Shigella  notifications (5%) <ul><li>Main attributes </li></ul><ul><ul><li>Represent...
Time <ul><li>Main attributes </li></ul><ul><ul><li>Timeliness </li></ul></ul>Analyze and interpret  Signal as early  as po...
<ul><li>Clickstream/Keyword Searching </li></ul><ul><li>Blogs/Chatrooms </li></ul><ul><li>News Sources </li></ul><ul><ul><...
Lab Confirmation Detection/ Reporting First  Case Opportunity  for control Adopted from WHO Response DAY CASES
First  Case Detection/ Reporting Confirmation Investigation Opportunity  for control Response DAY CASES Adopted from WHO
Nov 2002 Mar 2003 Progression of outbreak Electronic Surveillance Adopted from Brownstein, et al. Cases of atypical pneumo...
News articles Alerts Disease reports
9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000  more records… Huge mass of da...
<ul><li>Hybrid: Machine- and Human-based  </li></ul><ul><li>Social, collaborative and cross-disciplinary </li></ul><ul><li...
<ul><li>Better detection model </li></ul><ul><li>Better response model </li></ul>Source:  http://www.pbs.org/wgbh/pages/fr...
News item 345 Field alerts Disease report Health News Field alerts News sources Alerts Data + Metadata <ul><ul><li>Collabo...
9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000  more records… Huge mass of da...
Feature extraction (including geo-location) Tags Comments Location Flags/Alerts/Bookmarks Environment Factors Animal Healt...
Kass-Hout and di Tada: Best Poster Award for Improving Public Health Investigation and Response at the Seventh Annual ISDS...
Search: _____ {tag Cloud} Terms tagged by human collaborators or source {Event Tag cloud} X  Diarrhea X  Cholera X  Influe...
Filters Item (e.g., disease report, news article, alert) summary and location (s) Tag cloud Subscriptions  SMS alerts Rati...
<ul><li>LOCATIONS </li></ul><ul><li>HEATMAP </li></ul>
Tracking the Avian Influenza Outbreak in Egypt (reports started to appear late January 2009).
<ul><li>Current classifications (automated and corrected by human experts) include:  </li></ul><ul><ul><li>7 syndromes </...
<ul><li>Over the summer, the  Humanitarian FOSS (HFOSS) Project Summer Institute 2008  (May' 08 - July' 08) carried out an...
<ul><li>We tested ALPACA against two widely accepted  early  sources of information in the public health community;  Reute...
<ul><li>To-date, we have: </li></ul><ul><ul><li>480 registered users </li></ul></ul><ul><ul><li>394 collaboration spaces <...
<ul><li>Technical considerations </li></ul><ul><ul><li>Collaboration </li></ul></ul><ul><ul><li>Workflow </li></ul></ul><u...
<ul><li>Latest Progress </li></ul><ul><ul><li>Ontologies (e.g., BioCaster, SNOMED, ICD) </li></ul></ul><ul><ul><li>Event r...
Taha Kass-Hout, MD, MS http://kasshout.blogspot.com   Nicolás di Tada [email_address] Riff http://riff.instedd.org [Softwa...
 
Kass-Hout and di Tada: Best Poster Award for Improving Public Health Investigation and Response at the Seventh Annual ISDS...
<ul><li>Detection-focused visualization </li></ul><ul><ul><li>Individual alert listings </li></ul></ul><ul><ul><li>Summary...
<ul><li>Detection-based systems (alert listings and maps of current anomalies are the two most important visualization com...
<ul><li>Early detection </li></ul><ul><li>Spatio-temporal </li></ul><ul><li>Belief Networks (BNs) </li></ul><ul><li>Simula...
<ul><li>Statistical decision of the  analytic data monitor  include:  </li></ul><ul><ul><li>which combinations of data sou...
<ul><li>Use of application-linked and hyperlinked-fields for integration of analysis and visualization tools  </li></ul><u...
<ul><li>The enforcement of the business rules for distributing and validating alerts, escalation, and the definition of ta...
 
<ul><li>Overall measures  </li></ul><ul><ul><li>Situation Awareness Global Assessment Technique (SAGAT) </li></ul></ul><ul...
<ul><li>Which automated systems generated the most reliable alerts, and for what types of conditions? </li></ul><ul><li>Wh...
<ul><li>System description </li></ul><ul><ul><li>Purpose (detection- and information-based) </li></ul></ul><ul><ul><li>Sta...
 
<ul><li>Source Type </li></ul><ul><li>Non-Specific </li></ul><ul><ul><li>Syndromic </li></ul></ul><ul><li>Specific </li></...
<ul><li>Human </li></ul><ul><ul><li>Prodromal </li></ul></ul><ul><ul><li>Clinical </li></ul></ul><ul><ul><li>Morbidity and...
<ul><li>Building/vessel contamination </li></ul><ul><li>Continuous or intermittent release of an agent </li></ul><ul><li>C...
<ul><li>CLIMATE </li></ul><ul><li>PEOPLE </li></ul><ul><li>Temperature change </li></ul><ul><li>Precipitation change </li>...
<ul><li>RESPIRATORY </li></ul><ul><li>BREATHING DIFFICULTY </li></ul><ul><li>GI </li></ul><ul><li>Hemoptysis </li></ul><ul...
<ul><li>UNDIAGNOSED </li></ul><ul><li>GI </li></ul><ul><li>Respiratory </li></ul><ul><li>… </li></ul><ul><li>DIAGNOSED </l...
Cough [13 of 130] If Item has: Runny Nose [20 of 130] Fever [23 of 130] Then tag it with: Flu [10 of 130] Admin configures...
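The tagging rule on this slide ("if an item has Cough, Runny Nose and Fever, then tag it with Flu") could be sketched roughly as follows. This is a minimal illustration, not Riff's implementation: the `Item` and `TagRule` classes and the sample items are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A report, news article, or alert with its current set of tags."""
    title: str
    tags: set = field(default_factory=set)


@dataclass(frozen=True)
class TagRule:
    """Hypothetical rule: if an item carries every tag in `required`, add `then_tag`."""
    required: frozenset
    then_tag: str

    def apply(self, item: Item) -> bool:
        # The rule fires only when all required tags are present on the item.
        if self.required <= item.tags:
            item.tags.add(self.then_tag)
            return True
        return False


# Rule taken from the slide: Cough + Runny Nose + Fever -> Flu
flu_rule = TagRule(required=frozenset({"Cough", "Runny Nose", "Fever"}), then_tag="Flu")

# Made-up items for illustration
items = [
    Item("Clinic report A", {"Cough", "Runny Nose", "Fever"}),
    Item("News article B", {"Cough", "Fever"}),
]
fired = [flu_rule.apply(item) for item in items]
```

Only the first item carries all three required tags, so only it receives the Flu tag.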
Cough Longitude Latitude Fever 3 items clustered because of their proximity and similar symptoms Note: This is actually done...
<ul><li>Each item gets represented by a vector of the relevant words it contains with the corresponding frequency. </li></...
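As a rough sketch of the representation described here, each item can be mapped to a vector of word frequencies over a fixed vocabulary. The vocabulary, the tokenizer, and the sample report below are assumptions made for illustration; the slides do not specify them. The `normalize` helper shows one common way (unit Euclidean length) to normalize such vectors.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary of "relevant words"; the slides do not list one.
RELEVANT_WORDS = ("cough", "fever", "diarrhea", "cholera")


def to_vector(text, vocabulary=RELEVANT_WORDS):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else list(vector)


# Made-up report text
report = "Patient with fever and cough; cough worsening, fever persistent."
vector = to_vector(report)      # frequencies in vocabulary order
unit_vector = normalize(vector)
```

Here "cough" and "fever" each occur twice and the other vocabulary words do not occur, so the frequency vector is `[2, 2, 0, 0]`.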
<ul><li>This is just an initial approach; there are a number of alternative implementations: </li></ul><ul><ul><li>Automa...
P(malaria) = 22%  P(influenza) = 13%  P(other ILI) = 33%
<ul><li>Classifiers </li></ul><ul><li>Clustering </li></ul><ul><li>Bayesian Statistics </li></ul><ul><li>Neural Networks <...
cold fever
<ul><li>Map items to vectors (Feature extraction) </li></ul><ul><li>Normalize those vectors </li></ul><ul><li>Train the cl...
<ul><li>Support vectors define the separator </li></ul>
Φ :  x   ->   φ ( x ) Map to higher-dimension space
Classifier Document 1 Document 2 Document 3 Positives Negatives Training Document Training Document
<ul><li>Map items to vectors (Feature extraction) </li></ul><ul><li>Normalization </li></ul><ul><li>Agglomerative or Parti...
Probability of disease A (flu) once symptom B (fever) is observed Probability of fever once flu is confirmed Probability o...
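The quantities named on this slide are related by Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B). A small worked example is sketched below; the three input probabilities are invented for illustration and are not figures from the presentation.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b


# Illustrative numbers only; they are assumptions, not data from the slides.
p_fever_given_flu = 0.9   # P(B|A): probability of fever once flu is confirmed
p_flu = 0.05              # P(A): prior probability of flu
p_fever = 0.15            # P(B): overall probability of fever

# P(A|B): probability of flu once fever is observed
p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
```

With these assumed inputs, observing fever raises the probability of flu from the 5% prior to about 30%.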
<ul><li>Given a set of stimuli, train a system to produce a given output… </li></ul>
Hidden Layer Output Layer Input Layer […] […] {I 0 ,I 1 ,……I n } {O 0 ,O 1 ,……O n } Weight
Event?
<ul><li>Define the model that you want to optimize </li></ul><ul><li>Create the fitness function </li></ul><ul><li>Evolve ...
<ul><li>Model the transmission process using a set of parameters ( e.g., an infectious disease ): </li></ul><ul><ul><li>On...
Fitness = 1/Area
<ul><li>Create an initial population of candidates </li></ul><ul><li>Use operators to generate new candidates (mating and ...
(4, 5 ,6, 3 ,5)  (4,3,6,2,5)  (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (3,4,5,2,6) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3...
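The evolutionary steps outlined on these slides (create an initial population, apply mating and mutation, keep the fitter candidates) can be sketched as below. Everything specific here is an assumption: the target tuple `OBSERVED`, the `area` stand-in for the slide's "Area", the `+ 1` added to avoid division by zero, and the population settings. It is a toy illustration, not the authors' transmission model.

```python
import random

OBSERVED = (4, 3, 6, 5, 2)   # hypothetical target; stands in for observed outbreak data
GENE_VALUES = range(2, 7)    # genes take integer values 2..6, like the slide's tuples


def area(candidate):
    """Stand-in for the slide's 'Area': total gap between candidate and observed values."""
    return sum(abs(c - o) for c, o in zip(candidate, OBSERVED))


def fitness(candidate):
    # Fitness = 1/Area; the +1 is an added assumption to avoid dividing by zero.
    return 1.0 / (1.0 + area(candidate))


def mate(rng, a, b):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(rng, candidate, rate=0.2):
    """Replace each gene with a random value with probability `rate`."""
    return tuple(rng.choice(GENE_VALUES) if rng.random() < rate else g for g in candidate)


def evolve(seed=0, population_size=20, generations=30):
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(GENE_VALUES) for _ in OBSERVED) for _ in range(population_size)
    ]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # Keep the fitter half unchanged, then refill with mutated offspring.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [
            mutate(rng, mate(rng, rng.choice(survivors), rng.choice(survivors)))
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return initial_best, max(population, key=fitness)


initial_best, final_best = evolve()
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population can never decrease from one generation to the next.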
 
<ul><ul><li>Each &quot;pill&quot; is a hypothesis; it has the event tags on top, followed by the author.  </li></ul></ul>...
<ul><ul><li>A hypothesis response can include changes in the event tags: &quot;You are right, there's definitely something...
<ul><ul><li>The &quot;bold&quot; pill is the confirmed one, once there's a confirmation, that should increase the reputati...
 
<ul><li>Can trend analysis predict outbreaks?  </li></ul><ul><li>Recent studies show that Internet search has: </li></ul><...
<ul><li>Many individuals experiencing symptoms of illness conduct Internet search prior to seeking medical attention </li>...
Internet search for  allergies and ragweed  search terms  increase in the spring , and  allergy and pollen  search terms  ...
A search for the term “leptospirosis” in the United States finds dramatically higher search rates from Honolulu, Hawaii,  ...
Internet search for “contact lens” increased in Singapore in February 2006,  prior  to the notification from CDC  of the f...
Following large anti-war protests on the Mall in Washington DC in late September 2005, multiple environmental sensors watc...
While uncommon words like “croup” readily reveal the expected seasonal pattern of Internet search, more  common words like...
Riff: A Social Network and Collaborative Platform for Public Health Disease Surveillance


A hybrid (event-based and indicator-based) platform designed to streamline the collaboration between domain experts and machine learning algorithms for detection, prediction and response to health-related events (such as disease outbreaks or pandemics). The platform helps synthesize health-related event indicators from a wide variety of information sources (structured and unstructured) into a consolidated picture for analysis, maintenance of “community-wide coherence”, and collaboration processes. The platform offers features to detect anomalies, visualize clusters of potential events, predict the rate and spread of a disease outbreak and provide decision makers with tools, methodologies and processes to investigate the event.

  • Old ideas: Crows recognized for divination in Roman times: A crucial component of the US West Nile Virus control program New technologies must: Bring multiple disciplines together Offer a collaborative and Open source model OUR MODEL: Commercial models rely on competition to drive innovation. Their tools fail at the edge where there is no market to drive success. Non-profits know the “edge” challenges, but lack the resources for technical innovation We recognize our success will be measured by effective adoption at both the edge and the center. And it has to be open-source and free. We’ve decided to rely on environmental forces (rather than a market) to drive innovation. And it works.
  • Our track record: HIV pandemic Rift valley fever FMD pandemic West Nile Virus in the US SARS Monkeypox No room for complacency!!!
  • Early detection of disease outbreaks is the holy grail of public health, and has now also become a crucial issue for governments facing the threat of bioterrorism. OUR BIG PICTURE: We want to help people detect things early, connect people with each other so they can respond sooner
  • It is not necessarily a lack of information… we have a lot of information… rather, can we put the information into intelligence (or context) in a timely manner? Multiple streams include the following (say something about why you need to stitch multiple sources together)... How do you put an event into context? And where is the next disease going to emerge from... that is the holy grail in this business... Dead crows on the streets of NYC Pepto-Bismol disappearing from the shelves of grocery stores Phone calls from citizens and the media to the health department about increased absenteeism from schools and businesses Increased Internet search hits on certain terms per week Image Source: Dead Crow: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Empty Shelves: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0 Sidebar: 5/50 rule, in 5 years' time, 50% of all content will be user-generated (Reference: The Podshow by Ron Bloom, http://www.ronbloom.com/?p=11) 60% of content has geo-spatial and temporal aspects… Image Sources: Wikipedia: http://www.citris-uc.org/system/files/imce-u10/Wikipedia-logo.png Blogger: http://z.about.com/d/weblogs/1/5/V/-/-/-/BloggerHomePage.PNG OpenMRS: http://ruddzw.files.wordpress.com/2007/05/openmrs_osx.png Remote Sensing: http://www.medscape.com/content/2000/00/41/47/414717/art-e0603.01.fig2.jpg Cell phone/iPhone: http://healthinformaticsblog.files.wordpress.com/2008/03/iphone-denticon-patient-thumb.jpg WhoIsSick.org: http://gmapsmania.googlepages.com/whosickgmm.JPG
  • Indicator-based Surveillance: Computation of indicators upon which unusual disease patterns to investigate are detected (number of cases, rates, proportion of strains…) Lack of infrastructure Low level training Gaps in coverage Poor information flow Event-based Surveillance: The detection of public health events based on the capture of ad-hoc unstructured reports issued by formal or informal sources. Abundant cheap/free resource Detailed local information Near real-time reporting Less susceptible to political pressure Novel data sources: Online news, chat rooms, blogs, articles, multimedia Remote sensing: Algal blooms can be used to monitor the threat of cholera (e.g., Southern Baltic Sea)
  • Proportion of infection detected… Control confounding effects by: Including more than the demand side (Internet search query) but also the supply side (e.g., information on news websites) Link to Healthmap.org or GPHIN Including longitudinal data on health information supply Including accurate geographic distribution Infodemiology: Develop methodology and real-time measures (indices) to understand patterns and trends for general health information Understand the predictive value of what the community of practice is looking for ( demand ) for early detection of emerging diseases, infectious disease outbreaks, or bioterrorism Identify and quantify gaps between information supply and demand Discover and validate predictive metrics Could an X number (threshold) of Internet search hits on fever per week signal a flu outbreak?
  • Timeliness… We could potentially observe the progression of a disease outbreak within a population at multiple touch points (data) Some of these data may be collected before visits to the physician or hospital have actually happened Patients might search the Internet on symptoms they’re experiencing Patients might adjust their diet when they feel ill (such as drinking more water or juice and getting more rest) If the symptoms become more severe, patients might seek over-the-counter (OTC) medicine, and miss classes or work In many cases, patients might go to work late or leave for home early Patients might also experience subtle changes in their behavior at work When the symptoms continue, patients might seek help from physicians (e.g., schedule appointments, present with chief complaints, lab tests ordered, medicines prescribed) Similar models can also be established for pollution, non-infectious diseases, chronic diseases, injury, and natural disasters
  • There is currently NO turnkey solution to this problem… You have to involve humans and provide a collaborative environment for these people to work together… and we’re adopting a web 2.0/3.0 approach to pull everything together: In the Pepto-Bismol example, one of the most interesting aspects of this event was that the majority of the victims did not seek medical attention at first. The Milwaukee Health Department in 1993 became aware of widespread gastrointestinal illness in the community through phone calls from citizens and the media. There was increased absenteeism from schools and businesses, and groceries and pharmacies reported depletion of anti-diarrheal medications. In an event like this, a human expert could associate certain indications and arrive at a conclusion or a few hypotheses to corroborate or refute an event: There have been unusually heavy rains for the last few weeks The Water authority has received several complaints about cloudy water from customers Now we have all these calls and concerns from the community So perhaps I should lean towards a waterborne hypothesis vs. something else… the human eye can also quickly detect a cluster of pins on a map over time and space and make certain assumptions… As we’re faced with a cross-disciplinary problem (human, animal, environment, organisms, etc.) it becomes clearer that we need to offer a collaborative space for experts from multiple fields to work together on solving the problem Back when I was in the trenches of SARS, we found out very quickly the importance of crowdsourcing and the need to share certain types of data quickly
  • Social distance can be more important than the geographic distance Networks can be incrementally developed and don’t need to be defined a priori Contradictory assumptions can be investigated in parallel (alternate hypotheses for causes, case definitions, etc) Items can be merged if duplication is discovered, or split if needed Each change to an element may trigger notification to users, and business logic Workflow assumes that actions be taken within specific time windows or else additional actions will be triggered Practically every item can be “tagged” by users with notes and supplementary data Users will communicate and collaborate through existing communication channels as much as possible Auditing of each step allows users to “back up” characterizations of health events through their history as well as a wide set of potential metrics for evaluating the processes involved in biosurveillance
  • Health Information Service (HIS) Metadata definitions Augment data with additional attributes (e.g., location, date, key words, related terms, video, images) Provide a markup language: GHML (Google Health Markup Language) based on national and international standards which describes the data and extends its meaning Provide a set of APIs and metadata that can support the following features: Search Visualization Collaboration Situational awareness Analysis Alerts Enhance accuracy, reliability, validity and utility by allowing the community of practice to augment the data Allow users to tag data of interest to further refine its meaning Allow users to link and share data that can be used by others (collaboration) Provide publish-and-subscribe functionality (RSS, GeoRSS, SSE, REST…) Allow users to invoke “health agents”
  • 1- Information gets collected from different sources. 2- Information gets decorated with different layers of data, like remote sensing information about temperature, humidity or terrain. 3- Machine learning modules classify the articles in the system, determining location, name of diseases, symptoms or syndromes, and extracting structured data like epidemiological numbers of suspected or confirmed cases. 4- Experts from different disciplines collaborate around the information, creating comments, tagging, relating articles and correcting or training machine-learning algorithms. 5- Experts can use different visualizations and filtering tools to explore the body of evidence as the event unfolds over time and space, and create hypotheses about events that they can discuss or refine with their team members to decide whether a field investigation is needed. 6- Field staff can collect and report information that gets incorporated back into the system.
  • Saved filters with subscriptions List, Grid or Map views -Tags -Related items Publish and share information through RSS feeds
  • And of course, you can combine filters by tags, with filters by region or any other property that the article has in the system.
  • Hurdles to be overcome Diagnostics – limited availability Data collection – limited capacity Partial coverage – the black holes are getting larger Inconsistent definitions and quality of data Incompatible reporting systems and stovepiping of information Political filters Technical: Collaboration: Commenting Capability Notification via a “publish and subscribe” capability Shared group definitions and calendars Shared access to key artifacts Support for Mobile devices (e.g., SMS) and VOIP Organizational – China might not want to share information, others might not want to... lots of policy, etc. required… Evaluation Framework: Overall measures (situation awareness and shared mental model) Individual process measures Network parameters: Which automated systems generated the most reliable alerts, and for what types of conditions? Which human users were the most effective in identifying conditions? Which indicators are the most effective in identifying a health event? Which elements of the biosurveillance lifecycle require the most time and/or collaboration? The network history will provide a common point of evaluation for a variety of surveillance and response techniques System Evaluation: System description Purpose (detection- and information-based) Stakeholders Operations Health-related event detection Timeliness Validity Validation approach Statistical assessment of validity Data quality System experience System usefulness Flexibility Acceptability Portability Stability Costs Sustainability
  • To recap, The human experts interacting with automated systems The collaborative decision making environment We are sure one day soon we will have an EID (Emerging Infectious Disease) impact assessment... just like there is an environmental impact assessment…
  • E. coli Norwalk-like virus Salmonellosis Dengue fever Herpes Cholera Gastroenteritis Pertussis Rift Valley fever C. difficile Staphylococcal disease Diarrhea Legionellosis Tuberculosis Malaria Chickenpox Measles …
  • Transcript of "Riff: A Social Network and Collaborative Platform for Public Health Disease Surveillance"

    1. 1. Photo credit: IRMA (Integrated Risk Management for Africa)
    2. 2. Taha Kass-Hout and Nicolas di Tada, Summer 2008, Washington, DC, USA.
    3. 3. <ul><li>What is public health disease surveillance? </li></ul><ul><ul><li>“Public health surveillance is the ongoing systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link in the surveillance chain is the application of these data to prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs.” </li></ul></ul><ul><li>What is syndromic surveillance? </li></ul><ul><ul><li>US CDC defines syndromic surveillance as “surveillance using health-related data that precede diagnosis and signal a sufficient probability of a case or an outbreak to warrant further public health response.” </li></ul></ul>Thacker, S.B., and Berkelman, R.L. &quot;Public Health Surveillance in the United States.&quot; Epidemiologic Reviews 10 (1988): 164-90.
    4. 4. <ul><li>The design, analysis and evaluation of current disease surveillance systems have been geared towards specific data sources and detection algorithms – not humans </li></ul><ul><ul><li>Much less effort has gone towards interaction with responders and domain experts across agencies and at multiple levels </li></ul></ul><ul><ul><li>Systems often provide contradictory interpretations of ongoing events </li></ul></ul><ul><li>We have disease surveillance systems in place for those threats we have been faced with before </li></ul><ul><ul><li>We are more vulnerable to those we know about, but have not faced on a major scale </li></ul></ul><ul><ul><li>We are even more vulnerable to those that we don’t know about </li></ul></ul>
    5. 5. <ul><li>The likelihood of disasters and disease outbreaks is growing </li></ul><ul><ul><li>According to a recent Oxfam report, there has been a four-fold increase in the annual number of natural disasters </li></ul></ul><ul><ul><li>30 new infectious diseases identified since 1973 </li></ul></ul><ul><li>Potential impact is getting greater </li></ul><ul><ul><li>Impact on health, economies & security </li></ul></ul><ul><ul><li>Capable of spreading faster than ever before </li></ul></ul>http://www.oxfam.org/en/policy/briefingpapers/bp108_climate_change_alarm_0711
    6. 6. <ul><li>To address these challenges by adopting a social and collaborative decision making approach in order to facilitate </li></ul><ul><ul><li>early characterization and identification of potential health threats </li></ul></ul><ul><ul><li>their verification, assessment and investigation </li></ul></ul><ul><ul><li>in order to recommend measures (public health and others) to control them </li></ul></ul>
    7. 7. <ul><li>Event-based - ad-hoc unstructured reports issued by formal or informal sources </li></ul><ul><li>Indicator-based - (number of cases, rates, proportion of strains…) </li></ul>Timeliness, Representativeness, Completeness, Predictive Value, Quality, Cost, Feasibility, …
    8. 8. Identified risks Mandatory notification Laboratory surveillance Emerging risks Syndromic surveillance Mortality monitoring Healthcare activity monitoring Prescription monitoring Non healthcare based Veterinary surveillance Behavioral surveillance Environmental surveillance Poison centers Food safety/water supply … Domestic Media NGOs Field Epi points <ul><li>International </li></ul><ul><li>Distribution lists </li></ul><ul><ul><li>ProMed (English, Chinese, Spanish, Russian, etc.) </li></ul></ul><ul><li>International agencies </li></ul><ul><ul><li>WHO </li></ul></ul><ul><ul><li>OIE </li></ul></ul><ul><ul><li>CDC </li></ul></ul><ul><ul><li>NASA (e.g., remote sensing, weather, population migration, bird migration, population density, plant, animal) </li></ul></ul><ul><li>Confidential/Limited mailing list dissemination </li></ul><ul><ul><li>ProMed (e.g., MBDS) </li></ul></ul><ul><ul><li>International health regulation agencies (WHO, OIE, CDC, NASA) </li></ul></ul><ul><ul><li>Threat bulletin (EWARN, ECDC) </li></ul></ul><ul><li>Public dissemination </li></ul><ul><ul><li>News, blogs, articles, </li></ul></ul><ul><ul><li>Health ministry press releases sites </li></ul></ul><ul><ul><li>Weekly releases (Eurosurveillance) </li></ul></ul>Adopted from WHO
    9. 9. Reduce Morbidity and Mortality and Improve Health Adopted from WHO
    10. 10. 1000 Shigella infections (100%) 50 Shigella notifications (5%) <ul><li>Main attributes </li></ul><ul><ul><li>Representativeness </li></ul></ul><ul><ul><li>Completeness </li></ul></ul><ul><ul><li>Predictive value positive </li></ul></ul>Specificity / Reliability Sensitivity / Timeliness Get as close to the bottom of the pyramid as possible Urge frequent reporting
    11. 11. Time <ul><li>Main attributes </li></ul><ul><ul><li>Timeliness </li></ul></ul>Analyze and interpret Signal as early as possible Automated analysis/thresholds
    12. 12. <ul><li>Clickstream/Keyword Searching </li></ul><ul><li>Blogs/Chatrooms </li></ul><ul><li>News Sources </li></ul><ul><ul><li>Local </li></ul></ul><ul><ul><li>National </li></ul></ul><ul><ul><li>International </li></ul></ul><ul><li>Curated mailing lists (ProMED) </li></ul><ul><li>Multi-national surveillance (Eurosurveillance) </li></ul><ul><li>Validated official global alerts (WHO) </li></ul>Sensitivity / Timeliness Specificity / Reliability <ul><li>Main attributes </li></ul><ul><ul><li>Data quality </li></ul></ul>
    13. 13. Lab Confirmation Detection/ Reporting First Case Opportunity for control Adopted from WHO Response DAY CASES
    14. 14. First Case Detection/ Reporting Confirmation Investigation Opportunity for control Response DAY CASES Adopted from WHO
    15. 15. Nov 2002 Mar 2003 Progression of outbreak Electronic Surveillance Adopted from Brownstein, et al. Cases of atypical pneumonia Foshan Nov 16th Infected Chinese Doctor Hong Kong hotel Feb 21st 305 Cases of acute resp Guangdong Province Feb 11th Pharma report Guangdong Province November 27 Media reports Guangdong Province Feb 10 Astute physician on ProMED Feb 10 Initial WHO Report Feb 25 Official WHO Report March 10
    16. 16.
    17. 17. News articles Alerts Disease reports
    18. 18. 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data Detection algorithm Too many alerts Duplicative and uni-directional channels Uncoordinated response
    19. 19. <ul><li>Hybrid: Machine- and Human-based </li></ul><ul><li>Social, collaborative and cross-disciplinary </li></ul><ul><li>Web 2.0/3.0 platform </li></ul>
    20. 20. <ul><li>Better detection model </li></ul><ul><li>Better response model </li></ul>Source: http://www.pbs.org/wgbh/pages/frontline/shows/georgia/outbreak/matrix.html Source: www.sociology.columbia.edu/pdf-files/bearmanarticle.pdf
    21. 21. News item 345 Field alerts Disease report Health News Field alerts News sources Alerts Data + Metadata <ul><ul><li>Collaboration and multi-directional communication between interested groups </li></ul></ul><ul><ul><li>Interactions beyond that allowed by original sources and with controlled visibility </li></ul></ul><ul><ul><li>Customizable, secure ‘social’ and ‘professional’ metadata around information </li></ul></ul>
    22. 22. 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data Feedback loop Fewer and more actionable alerts Effective and coordinated response Multi-directional communication
    23. 23. Feature extraction (including geo-location) Tags Comments Location Flags/Alerts/Bookmarks Environment Factors Animal Health Factors Remote Sensing Event Classification and Detection Previous Event Training Data Previous Event Control Data Metadata extraction Other reference information Machine learning Show event characterizations Social network Other inferred information … Professional network feedback Professional feedback Anomaly detection Multiple data streams (multi-lingual) User-Generated and Machine Learning Metadata Existing Social Network (e.g., Comm. of interest) Riff Bot
    24. 24.
    25. 25. Kass-Hout and di Tada: Best Poster Award for Improving Public Health Investigation and Response at the Seventh Annual ISDS Conference, December 3-5, 2008 at the Raleigh Conference Civic Center. http://kasshout.blogspot.com/2008/12/best-poster-award-for-improving-public.html and http://www.isdsjournal.org/article/viewArticle/3308
    26. 26. Search: _____ {tag Cloud} Terms tagged by human collaborators or source {Event Tag cloud} X Diarrhea X Cholera X Influenza X Respiratory Illness X Fever [Show me unusual distributions]
    27. 27.
    28. 28.
    29. 29. Filters Item (e.g., disease report, news article, alert) summary and location (s) Tag cloud Subscriptions SMS alerts Ratings, comments, alerts, flags Tags (automatic + humans classification) Thread (related Items)
    30. 30. <ul><li>LOCATIONS </li></ul><ul><li>HEATMAP </li></ul>
    31. 31.
    32. 32.
    33. 33.
    34. 34.
    35. 35.
    36. 36. Tracking the Avian Influenza Outbreak in Egypt (reports started to appear late January 2009).
    37. 37. <ul><li>Current classifications (automated and corrected by human experts) include: </li></ul><ul><ul><li>7 syndromes </li></ul></ul><ul><ul><li>10 transmission modes </li></ul></ul><ul><ul><li>> 100 infectious diseases </li></ul></ul><ul><ul><li>> 180 micro-organisms </li></ul></ul><ul><ul><li>> 140 symptoms </li></ul></ul><ul><ul><li>> 50 chemicals </li></ul></ul>HFOSS Disease Ontology Prediction Project http://2009.hfoss.org/Evolve_-_Disease_Ontology_Prediction
    38. 38. <ul><li>The Humanitarian FOSS (HFOSS) Project Summer Institute 2008 (May to July 2008) carried out an internship project mentored by InSTEDD and a number of HFOSS faculty. During this internship, Juan Pablo Mendoza and Qianqian Lin developed the ALPACA Light Parsing And Classifying Application (ALPACA) to: </li></ul><ul><ul><li>Transform raw unstructured documents (e.g., news reports, ProMED mail, etc.) into machine-readable, analyzable data using a text parsing module </li></ul></ul><ul><ul><li>Categorize documents using an SVM classifier (libSVM) for:  </li></ul></ul><ul><ul><ul><li>a) Classification into a predetermined (user-defined) list of categories as described above (syndromes, symptoms, routes of transmission, diseases, etc.), and  </li></ul></ul></ul><ul><ul><ul><li>b) Suggesting additional tags and/or topics using a Naive Bayes classifier, given existing topics and monitored human input and review. This is especially helpful for new (emerging) threats, or for known threats experienced at a much bigger scale than usual (e.g., a far more virulent flu virus than we’ve experienced over the past few years) </li></ul></ul></ul>
    39. 39. <ul><li>We tested ALPACA against two widely accepted early sources of information in the public health community: Reuters news and ProMED mail. Results are shown here: </li></ul><ul><li>ALPACA is extensible through plug-in functionality that provides a simple way to add parsers and classifiers to the application. We are continuously adding and testing additional algorithms, and we welcome your contributions to help us better calibrate existing classifiers and parsers as well as introduce new ones (you can visit our collaborative space here.) </li></ul>
    40. 40. <ul><li>To date, we have: </li></ul><ul><ul><li>480 registered users </li></ul></ul><ul><ul><li>394 collaboration spaces </li></ul></ul><ul><ul><li>694 streams of information sources (RSS, SMS, etc.) </li></ul></ul><ul><ul><li>900,000 items [e.g., news articles, disease reports] analyzed </li></ul></ul><ul><ul><li>443,151 geo-coded locations </li></ul></ul><ul><ul><li>700 terms [tags] ‘trained’ [accept/reject] by human experts </li></ul></ul><ul><ul><li>12,000+ tags ‘suggested’ by human experts </li></ul></ul>
    41. 41. <ul><li>Technical considerations </li></ul><ul><ul><li>Collaboration </li></ul></ul><ul><ul><li>Workflow </li></ul></ul><ul><li>Organizational considerations </li></ul><ul><li>Evaluation framework </li></ul>
    42. 42. <ul><li>Latest Progress </li></ul><ul><ul><li>Ontologies (e.g., BioCaster, SNOMED, ICD) </li></ul></ul><ul><ul><li>Event reporting, analysis and public announcements (e.g., Thomson Reuters Foundation’s Emergency Information Service (EIS) deployment during the Haiti Response, 2010) </li></ul></ul><ul><li>Planned Steps </li></ul><ul><ul><li>API for external extensions and interactions </li></ul></ul><ul><ul><li>Full support for structured data </li></ul></ul><ul><ul><li>Automatic field data collection through forms, SMS, etc. </li></ul></ul><ul><ul><li>Anomaly detection (e.g., EARS) </li></ul></ul>http://alertnet.org/db/blogs/1564/2010/00/24-120746-1.htm http://ndt.instedd.org/search/label/eis
    43. 43.
    44. 44.
    45. 45. Taha Kass-Hout, MD, MS http://kasshout.blogspot.com Nicolás di Tada [email_address] Riff http://riff.instedd.org [Software: http://code.google.com/p/riff-evolve Code license: GNU General Public License v3, Content license: Creative Commons 3.0 BY-SA] Cambodia, Photo taken by Taha Kass-Hout, October 2008 “ this pic says it all- our kids are all the same- they deserve the same ”, Comment by Robert Gregg on Facebook, October 2008
    46. 47. Kass-Hout and di Tada: Best Poster Award for Improving Public Health Investigation and Response at the Seventh Annual ISDS Conference, December 3-5, 2008 at the Raleigh Conference Civic Center. http://kasshout.blogspot.com/2008/12/best-poster-award-for-improving-public.html and http://www.isdsjournal.org/article/viewArticle/3308
    47. 48. <ul><li>Detection-focused visualization </li></ul><ul><ul><li>Individual alert listings </li></ul></ul><ul><ul><li>Summary alerts </li></ul></ul><ul><ul><li>Alerts in time-series graph </li></ul></ul><ul><ul><li>Mapping alerts </li></ul></ul><ul><li>Information-based visualization (visualizing data and information) </li></ul><ul><ul><li>Data query </li></ul></ul><ul><ul><li>Data stratification </li></ul></ul><ul><ul><li>Time-series graphs </li></ul></ul><ul><ul><li>Data line listing </li></ul></ul><ul><ul><li>Matrix portal </li></ul></ul><ul><ul><li>Mapping </li></ul></ul><ul><li>How to communicate information to users of the system. Typically there are three basic components: </li></ul><ul><li>Time-series graphs </li></ul><ul><li>Maps </li></ul><ul><li>Data tables </li></ul><ul><li>However, depending on the primary focus (detection-based, information-based, or a hybrid of both) there can be more components as follows: </li></ul>
    48. 49. <ul><li>Detection-based systems (alert listings and maps of current anomalies are the two most important visualization components) </li></ul><ul><ul><li>What-if scenarios </li></ul></ul><ul><ul><li>Automatic anomaly detection </li></ul></ul><ul><li>Statistical anomaly </li></ul><ul><li>System believes there is an anomaly of interest to the user </li></ul><ul><li>Information-based systems (GIS, time-series graphs, data tables, query wizards, and real-time displays are the most important visualization components) </li></ul><ul><ul><li>Create new case definition </li></ul></ul><ul><ul><li>Select different processing options </li></ul></ul><ul><ul><li>Customize presentation to meet users' needs </li></ul></ul><ul><li>Additionally, we propose building a hybrid solution that combines both detection and information-based systems, which supports the following: </li></ul>
    49. 50. <ul><li>Early detection </li></ul><ul><li>Spatio-temporal </li></ul><ul><li>Belief Networks (BNs) </li></ul><ul><li>Simulation and modeling </li></ul><ul><li>Provide earlier notification of a change in the normal levels of observed counts of the desired health indicator. </li></ul><ul><li>Emphasis on the importance of matching the analytic process to the data type so as to achieve the performance needed for early identification of the event with minimum false alarms (Type I and Type II errors). </li></ul><ul><li>Performance evaluation of analytic processes using accepted metrics. </li></ul>
    50. 51. <ul><li>Statistical decisions of the analytic data monitor include: </li></ul><ul><ul><li>which combinations of data sources to test </li></ul></ul><ul><ul><li>which algorithms to use with respect to characteristics of the data background </li></ul></ul><ul><ul><li>how to achieve sensitivity over many locations within a manageable false alert rate </li></ul></ul><ul><ul><li>how much corroboration among data streams is required to achieve a threshold for escalating the information </li></ul></ul><ul><li>A multiplicity of data sources has appeal because consistent evidence may be employed to suggest inferential accuracy. In practice, however, multiple data sources can be contradictory. Decision requirements for the prospective analytic data monitor involve when and how deeply to investigate a data anomaly as well as when to escalate the information (as an alert) for action. Unambiguous, corroborated data spikes are the exception rather than the rule. For single data streams, univariate algorithms employ data modeling and hypothesis tests to provide systematic signal escalation protocols. </li></ul>
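A univariate escalation rule for a single data stream can be sketched as follows. This is a simplified moving-baseline check loosely in the spirit of EARS-style methods, not the algorithm used by any particular system: the 7-day baseline, the threshold of 3 standard deviations, the 0.5 floor on the standard deviation, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def baseline_alerts(counts, baseline=7, threshold=3.0, min_sd=0.5):
    """Return indices of days whose count exceeds the baseline mean
    by more than `threshold` standard deviations.

    The baseline for each day is the `baseline` days immediately before it.
    `min_sd` is an assumed floor that avoids division by zero on flat data.
    """
    alerts = []
    for day in range(baseline, len(counts)):
        window = counts[day - baseline:day]
        mu = mean(window)
        sd = max(stdev(window), min_sd)
        if (counts[day] - mu) / sd > threshold:
            alerts.append(day)
    return alerts

# Hypothetical daily counts of a health indicator; day 9 is a spike.
daily_counts = [4, 5, 3, 6, 4, 5, 4, 5, 4, 19, 5]
flagged_days = baseline_alerts(daily_counts)
print(flagged_days)
```

Lowering `threshold` raises sensitivity at the cost of more false alerts, which is the trade-off the decision list above describes.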
    51. 52. <ul><li>Use of application-linked and hyperlinked-fields for integration of analysis and visualization tools </li></ul><ul><li>Commenting Capability </li></ul><ul><li>Notification via a “publish and subscribe” capability </li></ul><ul><li>Shared group definitions and calendars </li></ul><ul><li>Shared access to key artifacts </li></ul><ul><li>Support for Mobile devices (e.g., SMS) and VOIP </li></ul>
    52. 53. <ul><li>The enforcement of the business rules for distributing and validating alerts, escalation, and the definition of tasks </li></ul><ul><li>Keeping the business logic encapsulated in a business engine, as opposed to “coding it into” the core applications </li></ul><ul><li>Modification of operations “on the fly”, and supporting different modes of operation depending on the current level of emergency </li></ul>
    53. 55. <ul><li>Overall measures </li></ul><ul><ul><li>Situation Awareness Global Assessment Technique (SAGAT) </li></ul></ul><ul><ul><li>The Situation Awareness Rating Technique (SART) </li></ul></ul><ul><li>Individual processes measures </li></ul><ul><li>Network parameters </li></ul>
    54. 56. <ul><li>Which automated systems generated the most reliable alerts, and for what types of conditions? </li></ul><ul><li>Which human users were the most effective in identifying conditions? </li></ul><ul><li>Which indicators are the most effective in identifying a health event? </li></ul><ul><li>What factors help to minimize or aggravate a health event? </li></ul><ul><li>Which elements of the biosurveillance lifecycle require the most time and/or collaboration? </li></ul><ul><li>The network history will provide a common point of evaluation for a variety of surveillance and response techniques </li></ul>
    55. 57. <ul><li>System description </li></ul><ul><ul><li>Purpose (detection- and information-based) </li></ul></ul><ul><ul><li>Stakeholders </li></ul></ul><ul><ul><li>Operations </li></ul></ul><ul><li>Health-related event detection </li></ul><ul><ul><li>Timeliness </li></ul></ul><ul><ul><li>Validity </li></ul></ul><ul><li>Validation approach </li></ul><ul><li>Statistical assessment of validity </li></ul><ul><li>Data quality </li></ul><ul><li>System experience </li></ul><ul><ul><li>System usefulness </li></ul></ul><ul><ul><li>Flexibility </li></ul></ul><ul><ul><li>Acceptability </li></ul></ul><ul><ul><li>Portability </li></ul></ul><ul><ul><li>Stability </li></ul></ul><ul><ul><li>Costs </li></ul></ul><ul><li>Evaluation here focuses primarily on the timely detection of health-related events and the effectiveness of the response. We have to keep in mind the flexibility of the system and how well it meets the needs of both regular and advanced users. Advanced users often want control in order to customize queries, modify graphic presentation, adjust sensitivity levels of detection algorithms, etc. </li></ul>
    56. 59. <ul><li>Source Type </li></ul><ul><li>Non-Specific </li></ul><ul><ul><li>Syndromic </li></ul></ul><ul><li>Specific </li></ul><ul><ul><li>Case Definition </li></ul></ul>Note: All tags can follow a hierarchical construct Ontology Example: A subset of disease ontology, showing relationships between the various forms of pneumonia. Pneumonia and influenza Pneumonia due to Staphylococcus aureus Other bacterial pneumonia Pneumococcal pneumonia Pneumonia due to Hemophilus influenzae
    57. 60. <ul><li>Human </li></ul><ul><ul><li>Prodromal </li></ul></ul><ul><ul><li>Clinical </li></ul></ul><ul><ul><li>Morbidity and Mortality </li></ul></ul><ul><li>Animal </li></ul><ul><li>Environmental (or Climate) </li></ul><ul><li>Allied Professional Source </li></ul>
    58. 61. <ul><li>Building/vessel contamination </li></ul><ul><li>Continuous or intermittent release of an agent </li></ul><ul><li>Contagious person-to-person </li></ul><ul><li>Commercially distributed products </li></ul><ul><li>Waterborne </li></ul><ul><li>Vector/host borne </li></ul><ul><li>Sexually transmitted </li></ul><ul><li>Other </li></ul><ul><ul><li>Large-scale bioaerosol </li></ul></ul><ul><ul><li>Premonitory release of agent </li></ul></ul><ul><ul><li>… </li></ul></ul>
    59. 62. <ul><li>CLIMATE </li></ul><ul><li>PEOPLE </li></ul><ul><li>Temperature change </li></ul><ul><li>Precipitation change </li></ul><ul><li>Wind change </li></ul><ul><li>… </li></ul><ul><li>Die-offs observed </li></ul><ul><li>Sentinels tested </li></ul><ul><li>… </li></ul>ANIMAL <ul><li>Increased mortality rate </li></ul><ul><li>Increased presentations for treatment </li></ul><ul><li>… </li></ul><ul><li>Building/vessel contamination </li></ul><ul><li>Continuous or intermittent release of an agent </li></ul><ul><li>Contagious person-to-person </li></ul><ul><li>Commercially distributed products </li></ul><ul><li>Waterborne </li></ul><ul><li>Vector/host borne </li></ul><ul><li>Sexually transmitted </li></ul><ul><li>Other </li></ul><ul><ul><li>Large-scale bioaerosol </li></ul></ul><ul><ul><li>Premonitory release of agent </li></ul></ul><ul><ul><li>… </li></ul></ul>TRANSMISSION ROUTE Note: All tags can follow a hierarchical construct
    60. 63. <ul><li>RESPIRATORY </li></ul><ul><li>BREATHING DIFFICULTY </li></ul><ul><li>GI </li></ul><ul><li>Hemoptysis </li></ul><ul><li>Asthma attack </li></ul><ul><li>Croup </li></ul><ul><li>Pneumonia </li></ul><ul><li>Wheezing </li></ul><ul><li>Runny or stuffy nose </li></ul><ul><li>Pleuritic pain </li></ul><ul><li>Sore throat </li></ul><ul><li>URI </li></ul><ul><li>… </li></ul><ul><li>Fever </li></ul><ul><li>Weakness </li></ul><ul><li>Anorexia </li></ul><ul><li>Viral syndrome </li></ul><ul><li>Faintness </li></ul><ul><li>Malaise </li></ul><ul><li>Body aches </li></ul><ul><li>General illness </li></ul><ul><li>Chills </li></ul><ul><li>Lymphadenopathy </li></ul><ul><li>Sweating </li></ul><ul><li>… </li></ul>CONSTITUTIONAL IRRITABLE BABY <ul><li>Abdominal pain </li></ul><ul><li>Diarrhea </li></ul><ul><li>Vomiting </li></ul><ul><li>Nausea </li></ul><ul><li>Gastroenteritis </li></ul><ul><li>Dehydration </li></ul><ul><li>… </li></ul><ul><li>Cough </li></ul><ul><li>Sore throat </li></ul><ul><li>Fever </li></ul><ul><li>Weakness </li></ul><ul><li>Viral syndrome </li></ul><ul><li>Body aches </li></ul><ul><li>Bronchiolitis </li></ul><ul><li>Pneumonia </li></ul><ul><li>Upper respiratory infection </li></ul><ul><li>Malaise </li></ul><ul><li>Chills </li></ul><ul><li>Influenza </li></ul><ul><li>… </li></ul>INFLUENZA-LIKE ILLNESS (OR ILI) Note: All tags can follow a hierarchical construct
    61. 64. <ul><li>UNDIAGNOSED </li></ul><ul><li>GI </li></ul><ul><li>Respiratory </li></ul><ul><li>… </li></ul><ul><li>DIAGNOSED </li></ul><ul><li>Influenza </li></ul><ul><ul><li>Avian influenza </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Can be mapped to standards , such as: </li></ul><ul><li>Unified Medical Language System (UMLS) [which supports SNOMED, LOINC, ICDs, etc.] http://www.nlm.nih.gov/research/umls/ </li></ul><ul><li>PHIN VADS ( http://www.cdc.gov/PHIN ) </li></ul><ul><li>Case Definition: </li></ul><ul><ul><li>Probable </li></ul></ul><ul><ul><li>Possible </li></ul></ul><ul><ul><li>Confirmed </li></ul></ul>Note: All tags can follow a hierarchical construct
    62. 65.
    63. 66. Cough [13 of 130] If Item has: Runny Nose [20 of 130] Fever [23 of 130] Then tag it with: Flu [10 of 130] Admin configures a new inference: User sees a suggestion for a new item: The system will analyze the existing tagged items and find the probability of an item being flu given that it has cough, runny nose and fever. Flu [85% confidence because of cough, runny nose and fever] Influenza [55% confidence because of cough and headache] Tags inferred
    64. 67. Cough Longitude Latitude Fever 3 items clustered because of their proximity and similar symptoms Note: This is actually done in an n-dimensional space, n being the number of tags available, plus the number of relevant words detected, plus a possible spatio-temporal dimension Time
    65. 68. <ul><li>Each item is represented by a vector of the relevant words it contains, with their corresponding frequencies. </li></ul><ul><li>Each tag gets its own linear classifier, which needs at least one positive and one negative sample. Classification is based on the item vectors: the linear classifier creates a hyperplane that divides the n-space in two, for positive and negative predictions. </li></ul><ul><li>Whenever a user corrects or confirms a suggestion, we feed that back into the classifier. </li></ul><ul><li>Any number of BNs can be created to map some “evidence” tags to a “prediction” tag. For each item, the system measures the probability of having that tag based on the tags already present. </li></ul><ul><li>The item vectors can be grouped to find clusters. Items in a cluster are near each other in the n-space, so they have similar values for their word content and tags. </li></ul>
    66. 69. <ul><li>This is just an initial approach; there are a number of alternative implementations: </li></ul><ul><ul><li>Automatic tagging can be done using clustering: we create a cluster for each tag and, for new items, we measure which cluster centroid the item is closest to. </li></ul></ul><ul><ul><li>Automatic tagging can also be done using BNs: our evidence can be the words, and we can measure the probability of a certain tag based on the words contained in the item. </li></ul></ul><ul><ul><li>New tag suggestions can be done using clustering instead of BNs: clustering all the items and suggesting tags that some of the items in the cluster have and the others don’t. </li></ul></ul><ul><li>Provided we implement the algorithms abstractly enough, it should be simple to interchange them and see what works best. </li></ul>
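One simple way to read the inference above is as a conditional frequency over already-tagged items: among items that carry all the evidence tags, what fraction also carry the target tag? The sketch below assumes that reading; the function name `tag_confidence` and the sample items are hypothetical and are not taken from Riff.

```python
def tag_confidence(items, evidence, target):
    """Estimate P(target | evidence) from tagged items.

    `items` is a list of tag sets. Returns None when no item carries
    all of the evidence tags, since no estimate is possible then.
    """
    matching = [tags for tags in items if evidence <= tags]
    if not matching:
        return None
    return sum(target in tags for tags in matching) / len(matching)

# Hypothetical tagged items.
items = [
    {"cough", "runny nose", "fever", "flu"},
    {"cough", "runny nose", "fever", "flu"},
    {"cough", "runny nose", "fever"},
    {"cough", "fever"},
    {"diarrhea", "fever"},
]
confidence = tag_confidence(items, {"cough", "runny nose", "fever"}, "flu")
print(f"flu: {confidence:.0%} confidence")
```

With very few matching items the estimate is unstable, so a real deployment would likely want a minimum-support rule before showing a suggestion.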
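The word-vector representation, the per-tag linear classifier, and the feedback step can be sketched together. To stay self-contained, this example swaps in a plain perceptron rather than the libSVM classifier mentioned earlier; the class name `TagClassifier` and the training texts are hypothetical.

```python
from collections import Counter

def vectorize(text):
    """Represent an item as word -> frequency (a sparse vector)."""
    return Counter(text.lower().split())

class TagClassifier:
    """A linear classifier for one tag, updated from user feedback."""

    def __init__(self):
        self.weights = Counter()
        self.bias = 0.0

    def score(self, vec):
        # Signed distance-like score relative to the separating hyperplane.
        return self.bias + sum(self.weights[w] * f for w, f in vec.items())

    def predict(self, vec):
        return self.score(vec) > 0

    def feedback(self, vec, is_positive):
        """Perceptron update: adjust only when the prediction was wrong."""
        if self.predict(vec) != is_positive:
            sign = 1 if is_positive else -1
            for word, freq in vec.items():
                self.weights[word] += sign * freq
            self.bias += sign

# One positive and one negative sample, as the minimum described above.
clf = TagClassifier()
samples = [
    ("fever cough outbreak in school", True),
    ("stock market report today", False),
]
for _ in range(5):
    for text, label in samples:
        clf.feedback(vectorize(text), label)

print(clf.predict(vectorize("cough and fever reported")))
```

Each confirmed or corrected suggestion would be passed to `feedback` in the same way, so the hyperplane keeps moving as experts review items.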
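The clustering alternative for automatic tagging (assign the tag whose centroid is closest) can be sketched as below. Euclidean distance and the sample vectors are assumptions made for illustration; any distance measure could be substituted.

```python
import math

def centroid(vectors):
    """Mean of a list of sparse vectors (dicts of feature -> value)."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}

def distance(a, b):
    """Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def suggest_tag(item, tagged):
    """Return the tag whose cluster centroid is nearest to `item`."""
    centroids = {tag: centroid(vecs) for tag, vecs in tagged.items()}
    return min(centroids, key=lambda tag: distance(item, centroids[tag]))

# Hypothetical word-frequency vectors for items already tagged by experts.
tagged = {
    "respiratory": [{"cough": 2, "fever": 1}, {"cough": 1, "wheezing": 1}],
    "gi": [{"diarrhea": 2, "vomiting": 1}, {"nausea": 1, "diarrhea": 1}],
}
suggestion = suggest_tag({"cough": 1, "fever": 1}, tagged)
print(suggestion)
```

Because `suggest_tag` only needs a vector in and a tag out, it could be exchanged with a classifier-based or BN-based tagger behind the same interface, which is the interchangeability the last bullet asks for.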
    67. 70. P(malaria) = 22% P(influenza) = 13% P(other ILI) = 33%
    68. 71. <ul><li>Classifiers </li></ul><ul><li>Clustering </li></ul><ul><li>Bayesian Statistics </li></ul><ul><li>Neural Networks </li></ul><ul><li>Genetic Algorithms </li></ul>
    69. 72. cold fever
    70. 73. <ul><li>Map items to vectors (Feature extraction) </li></ul><ul><li>Normalize those vectors </li></ul><ul><li>Train the classifier </li></ul><ul><li>Measure the results with new information </li></ul><ul><li>Feedback the classifier </li></ul><ul><li>Separate classes in feature space </li></ul>
    71. 74.
    72. 75. <ul><li>Support vectors define the separator </li></ul>
    73. 76. Φ : x -> φ ( x ) Map to higher-dimension space
    74. 77. Classifier Document 1 Document 2 Document 3 Positives Negatives Training Document Training Document
    75. 78. <ul><li>Map items to vectors (Feature extraction) </li></ul><ul><li>Normalization </li></ul><ul><li>Agglomerative or Partitional </li></ul>
    76. 79.
    77. 80.
    78. 81. Probability of disease A (flu) once symptom B (fever) is observed Probability of fever once flu is confirmed Probability of flu (prior or marginal) Probability of fever (prior or marginal)
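The four quantities labelled on this slide are the terms of Bayes' theorem. Written out with A = flu and B = fever, and followed by a worked example whose input numbers are purely hypothetical:

```latex
P(\text{flu} \mid \text{fever})
  = \frac{P(\text{fever} \mid \text{flu}) \, P(\text{flu})}{P(\text{fever})}

% Hypothetical values: P(fever | flu) = 0.9, P(flu) = 0.05, P(fever) = 0.15
P(\text{flu} \mid \text{fever}) = \frac{0.9 \times 0.05}{0.15} = 0.3
```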
    79. 82. <ul><li>Given a set of stimuli, train a system to produce a given output… </li></ul>
    80. 83. Hidden Layer Output Layer Input Layer […] […] {I 0 ,I 1 ,……I n } {O 0 ,O 1 ,……O n } Weight
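The layered diagram can be read as a feed-forward pass: each layer takes a weighted sum of its inputs and applies an activation function. The sketch below assumes a sigmoid activation and a single hidden layer; the weights are placeholders, and training (adjusting the weights toward a desired output) is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: a row of weights per output neuron."""
    return [
        sigmoid(sum(w * i for w, i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Three inputs {I0, I1, I2}, two hidden neurons, one output "Event?" score.
# Placeholder weights only; a trained network would have learned these.
outputs = forward(
    [1.0, 0.0, 1.0],
    hidden_w=[[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    hidden_b=[0.0, 0.0],
    output_w=[[1.0, -1.0]],
    output_b=[0.0],
)
print(outputs)
```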
    81. 84. Event?
    82. 85. <ul><li>Define the model that you want to optimize </li></ul><ul><li>Create the fitness function </li></ul><ul><li>Evolve the gene pool testing against the fitness function. </li></ul><ul><li>Select the best individual </li></ul>
    83. 86. <ul><li>Model the transmission process using a set of parameters (e.g., an infectious disease): </li></ul><ul><ul><li>Onset time between an infection and illness </li></ul></ul><ul><ul><li>Latency period </li></ul></ul><ul><ul><li>Incubation period </li></ul></ul><ul><ul><li>Symptomatic period </li></ul></ul><ul><ul><li>Infectious period </li></ul></ul>(Onset, Latency, Incubation, Symptomatic, Infectious) (2 days, 3 days, 1 day, 4 days, 3 days)
    84. 87. Fitness = 1/Area
    85. 88. <ul><li>Create an initial population of candidates </li></ul><ul><li>Use operators to generate new candidates (mating and mutation) </li></ul><ul><li>Discard worst individuals or select best individuals in generation </li></ul><ul><li>Repeat from 2 until you find a candidate that satisfies the solution searched </li></ul>
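The loop above can be sketched for the five-parameter candidate (Onset, Latency, Incubation, Symptomatic, Infectious). Several pieces are assumptions: `area` here is only a stand-in (distance to a hypothetical target tuple) for what is presumably the area between a simulated and an observed epidemic curve, fitness uses 1/(1 + area) rather than 1/area to avoid division by zero, and the population size, gene range of 1 to 6 days, and seed are arbitrary.

```python
import random

TARGET = (2, 3, 1, 4, 3)  # hypothetical "true" parameters, in days

def area(candidate):
    """Stand-in error measure: total absolute difference from TARGET."""
    return sum(abs(c - t) for c, t in zip(candidate, TARGET))

def fitness(candidate):
    return 1.0 / (1.0 + area(candidate))  # +1 avoids division by zero

def mate(a, b, rng):
    """Single-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(candidate, rng, low=1, high=6):
    """Replace one randomly chosen gene with a random value."""
    genes = list(candidate)
    genes[rng.randrange(len(genes))] = rng.randint(low, high)
    return tuple(genes)

def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population of random candidates.
    population = [tuple(rng.randint(1, 6) for _ in range(5)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if area(population[0]) == 0:  # Step 4: stop once a candidate fits
            break
        # Step 3: keep the best half, discard the rest.
        survivors = population[: pop_size // 2]
        # Step 2: generate new candidates by mating and mutation.
        children = [
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness), history

best, history = evolve()
print(best, fitness(best))
```

Because the best half always survives, the best fitness never decreases from one generation to the next.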
    86. 89. (4, 5 ,6, 3 ,5) (4,3,6,2,5) (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (3,4,5,2,6) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4) ( 5,3 , 2,6,5 ) ( 3,4 , 4,6,2 ) ( 5,3 , 2,6,5 ) ( 3,4 , 4,6,2 )
    87. 90.
    88. 92. <ul><ul><li>Each &quot;pill&quot; is a hypothesis; it has the event tags on top, followed by the author.  </li></ul></ul><ul><ul><li>Each hypothesis can have many responses to it; a response means someone disagrees or wants to add something to the proposed hypothesis. </li></ul></ul><ul><ul><li>The red area shows items that have been rejected by the responder to the previous hypothesis. Something like &quot;Yes, I agree with you, but this and this don't belong here&quot;. </li></ul></ul><ul><ul><li>The blue area shows items that have been added to the responded hypothesis, like &quot;Yes, you are right, but these items should also be included in this event&quot;. </li></ul></ul>
    89. 93. <ul><ul><li>A hypothesis response can include changes in the event tags: &quot;You are right, there's definitely something going on with all those items, but I don't think it's airborne, I think it's waterborne&quot; </li></ul></ul><ul><ul><li>The list of people below the pill shows all the users that have subscribed or agreed to that hypothesis: &quot;Yes, Doc. James is right, I put my signature here.&quot; </li></ul></ul><ul><ul><li>The size of each subscriber below is relative to their &quot;reputation&quot; inside Riff; the whole area below a pill gives an idea of how well supported a hypothesis is. We could also grow the pill proportionally. </li></ul></ul>
    90. 94. <ul><ul><li>The &quot;bold&quot; pill is the confirmed one. Once there's a confirmation, the reputation of all the subscribers to that hypothesis should increase.  </li></ul></ul><ul><ul><li>Hovering over each pill's area should display a summary of the number of items there, and perhaps the area and the items' tags.  </li></ul></ul><ul><ul><li>Clicking should pop up a small navigation view of the actual items contained there. </li></ul></ul>
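The reputation mechanics described on these slides can be sketched as two small functions: the support for a hypothesis is the summed reputation of its subscribers, and confirming a hypothesis raises each subscriber's reputation. The data layout, the fixed `bonus`, and the user names are hypothetical; the slides do not specify how Riff computes reputation.

```python
def support(hypothesis, reputation):
    """Total reputation of everyone subscribed to the hypothesis."""
    return sum(reputation.get(user, 0.0) for user in hypothesis["subscribers"])

def confirm(hypothesis, reputation, bonus=1.0):
    """Mark a hypothesis confirmed and reward its subscribers."""
    hypothesis["confirmed"] = True
    for user in hypothesis["subscribers"]:
        reputation[user] = reputation.get(user, 0.0) + bonus

# Hypothetical users and hypotheses.
reputation = {"james": 5.0, "ana": 2.0, "li": 1.0}
airborne = {"tags": ["airborne"], "subscribers": ["li"], "confirmed": False}
waterborne = {"tags": ["waterborne"], "subscribers": ["james", "ana"], "confirmed": False}

before = (support(airborne, reputation), support(waterborne, reputation))
confirm(waterborne, reputation)
after = support(waterborne, reputation)
print(before, after)
```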
    91. 96. <ul><li>Can trend analysis predict outbreaks? </li></ul><ul><li>Recent studies show that Internet search has: </li></ul><ul><ul><li>… considerable potential as one of the earliest indicators for syndromic surveillance </li></ul></ul><ul><ul><li>… the potential to predict population-based events relevant to public health </li></ul></ul><ul><ul><li>… the potential for a higher sensitivity compared to other early sources (e.g., media, ProMed) </li></ul></ul>
    92. 97. <ul><li>Many individuals experiencing symptoms of illness conduct Internet search prior to seeking medical attention </li></ul><ul><ul><li>Wilson, Kumanan, Brownstein, John S., Early detection of disease outbreaks using the Internet, CMAJ 2009 180: 829-831 </li></ul></ul><ul><ul><li>Ginsberg J, Mohebbi MH, Patel RS, et al. Detecting influenza epidemics using search engine query data. Nature 2009;457:1012–4 </li></ul></ul><ul><ul><li>Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209 </li></ul></ul><ul><ul><li>Wethington H, Bartlett P. Usage and data collection patterns for a novel web-based foodborne-disease surveillance system. J Environ Health. 2006 Mar;68(7):25-9 </li></ul></ul><ul><ul><li>Cooper CP, Mallon KP, Leadbetter S, Pollack LA, Peipins LA. Cancer Internet search activity on a major search engine, United States 2001-2003. J Med Internet Res 2005;7:e36 </li></ul></ul><ul><ul><li>Li CS, Aggarwal C, Campbell M, et al. Site-Based Biosurveillance. MMWR September 24, 2004 / 53(Suppl);249 </li></ul></ul><ul><ul><li>Eysenbach G, Kohler C. What is the prevalence of health-related searches on the World Wide Web? Qualitative and quantitative analysis of search engine queries on the Internet. Proc AMIA Annu Fall Symp 2003;225-9 </li></ul></ul>
    93. 98. Internet search for allergies and ragweed search terms increase in the spring, and allergy and pollen search terms increase significantly in the fall. It would also appear that Texas and Oklahoma are leading locales for ragweed. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
    94. 99. A search for the term “leptospirosis” in the United States finds dramatically higher search rates from Honolulu, Hawaii, consistent with the epidemiology of the illness in the United States (more than half of all national cases are reported from Hawaii). Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
    95. 100. Internet search for “contact lens” increased in Singapore in February 2006, prior to the notification from CDC of the first US cases of contact lens-associated Fusarium keratitis in March 2006, and prior to widespread news coverage in April 2006. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
    96. 101. Following large anti-war protests on the Mall in Washington DC in late September 2005, multiple environmental sensors watching for bioterror events detected the presence of Francisella tularensis. Interestingly, queries appear to have increased prior to discovery of the sensor findings by public health officials on September 30th. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209
    97. 102. While uncommon words like “croup” readily reveal the expected seasonal pattern of Internet search, more common words like “cough” or “throat” require logical modifiers to rule out more common search phrases. Source: Mostashari F. Can Internet searches provide useful data for public health surveillance?. Advances in Disease Surveillance 2007;2:209