Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada

    1. MACHINE LEARNING AND DISEASE SURVEILLANCE
      • Taha Kass-Hout, MD, MS
      • Nicolás di Tada
      • October 2008
    2. (Images. Sources: http://www.birds.cornell.edu/crows/images/deadcrow.jpg and http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0)
    3. LATE DETECTION – RESPONSE
      (figure: cases by day, with the opportunity for control marked)
    4. EARLY DETECTION AND RESPONSE
      (figure: cases by day, with the opportunity for control marked)
    5. INFORMATION SOURCES
      • Event-based – ad-hoc unstructured reports issued by formal or informal sources
      • Indicator-based – structured indicators (number of cases, rates, proportion of strains…)
    6. PUBLIC HEALTH MEASURES
      • Representativeness
      • Completeness
      • Predictive Value
      • Timeliness
    7. PUBLIC HEALTH MEASURES
      (figure: surveillance pyramid, from 1000 malaria infections (100%) at the base to 50 malaria notifications (5%); labels: Sensitivity / Timeliness, Specificity / Reliability)
        • Main attributes
          • Representativeness
          • Completeness
          • Predictive value positive
      • Get as close to the bottom of the pyramid as possible
      • Urge frequent reporting: weekly → daily → immediately
    8. PUBLIC HEALTH MEASURES
      (figure: timeline labeled Time, with the steps "Analyze and interpret" and "Automated analysis / thresholds")
        • Main attributes
          • Timeliness
      (figure label: Health care hotline)
      • Signal as early as possible
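The "automated analysis / thresholds" step on slide 8 can be illustrated with a minimal sketch that flags a day when its count exceeds the mean plus k standard deviations of the preceding days. The window length, k = 2, the function name `threshold_alerts`, and the counts are illustrative assumptions, not values from the slides.

```python
from statistics import mean, stdev

def threshold_alerts(daily_counts, baseline_days=7, k=2.0):
    """Flag days whose count exceeds mean + k * sd of the preceding baseline window.

    baseline_days and k are illustrative defaults (assumptions, not from the slides).
    Returns the indices of the flagged days.
    """
    alerts = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        threshold = mean(baseline) + k * stdev(baseline)
        if daily_counts[day] > threshold:
            alerts.append(day)
    return alerts

# Hypothetical daily counts of cough/cold visits for one region.
counts = [10, 12, 9, 11, 10, 13, 11, 12, 30, 11]
print(threshold_alerts(counts))  # [8]
```

Note that the spike on day 8 also inflates the baseline for day 9; real systems often leave a gap between the baseline window and the day being tested for this reason.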
    9. THE PROBLEM SPACE
      • Current system design, analysis, and evaluation have been geared toward specific data sources and detection algorithms rather than toward humans
      • We have systems in place for the threats we have faced before
    10. PUBLIC HEALTH – TWO PERSPECTIVES
      • Case management
        • Individual cases of notifiable diseases
        • Relationship networks (contact tracing)
      • Population surveillance
        • Larger risk patterns
    11. CASE MANAGEMENT
      • Questions/problems:
        • Is a case due to recent transmission?
        • If so, does the case share any feature with other, recent cases?
      • How it is done today:
        • Investigations/interviews
        • Meeting with other investigators
    12. POPULATION SURVEILLANCE
      • Questions/problems:
        • Are more cases happening than expected?
        • Does an excess suggest ongoing transmission in a specific region?
      • How it is done today:
        • Semi-automated routine temporal and space-time statistical analysis
    13. WHY LOCATION MATTERS – CASE MANAGEMENT
      • If you are studying a newly reported case of a certain disease,
      • it is harder to picture the situation by looking at something like this...
    14. WHY LOCATION MATTERS – CASE MANAGEMENT
    15. WHY LOCATION MATTERS – CASE MANAGEMENT
      • ...than by looking at this.
    16. WHY LOCATION MATTERS – CASE MANAGEMENT
    17. WHY LOCATION MATTERS – POP SURVEILLANCE
      • If you are studying the spatial distribution of a set of disease clusters,
      • this would seem more difficult...
    18. WHY LOCATION MATTERS – POP SURVEILLANCE
    19. WHY LOCATION MATTERS – POP SURVEILLANCE
      • ...than this.
    20. WHY LOCATION MATTERS – POP SURVEILLANCE
    21. MODERN DISEASE SURVEILLANCE
      • In the past two decades, much disease surveillance research has focused on developing analytical methods for automatically detecting anomalous patterns in data
      • Modern methods can achieve timely detection of anomalies by incorporating temporal, spatial, and multivariate information
    22. MODERN DISEASE SURVEILLANCE
      (figure: a huge mass of data, records such as "9/20, 15213, cough/cold, …", "9/21, 15207, antifever, …", "9/22, 15213, CC = cough, ..." plus 1,000,000 more records, feeds a detection algorithm that raises too many alerts: “What are we supposed to do with this?”)
    23. MODERN DISEASE SURVEILLANCE
      (figure: the same mass of data, now with a feedback loop)
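A minimal sketch of one step that can precede detection: turning records like those on slides 22 and 23 into counts per day and area. It assumes the three comma-separated fields are date, ZIP code, and complaint or product category; that reading and the function name `count_by_day_and_zip` are assumptions for illustration.

```python
from collections import Counter

# The three example records from the slide (field meanings are an assumption).
records = [
    "9/20, 15213, cough/cold",
    "9/21, 15207, antifever",
    "9/22, 15213, CC = cough",
]

def count_by_day_and_zip(lines, keyword):
    """Count records whose third field mentions `keyword`, grouped by (date, zip)."""
    counts = Counter()
    for line in lines:
        # Split into at most three fields so commas inside the complaint survive.
        date, zip_code, complaint = (field.strip() for field in line.split(",", 2))
        if keyword in complaint.lower():
            counts[(date, zip_code)] += 1
    return counts

print(count_by_day_and_zip(records, "cough"))
```

Counts produced this way are the kind of daily series a threshold or space-time detector consumes.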
    24. ADVANTAGES OF MACHINE LEARNING
      (figure: example output probabilities P(malaria) = 22%, P(influenza) = 13%, P(other ILI) = 33%)
    25. MACHINE LEARNING TECHNIQUES
      • Classifiers
      • Clustering
      • Bayesian Statistics
      • Neural Networks
      • Genetic Algorithms
    26. HOW TO REPRESENT A DOCUMENT?
      • “This morning I woke up with fever, I might have a flu.”
      • “I had a flu last month. […] I had a flu early this week.”
      (figure: documents plotted along the features flu and fever)
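A minimal bag-of-words sketch of the representation on slide 26: each document becomes a vector of term counts over the vocabulary (flu, fever). The tokenizer is deliberately naive, and the elided "[…]" part of the second quote is simply left out.

```python
import re
from collections import Counter

VOCABULARY = ["flu", "fever"]  # the two features shown on the slide

def to_vector(text, vocabulary=VOCABULARY):
    """Bag of words: count how often each vocabulary term occurs in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[term] for term in vocabulary]

doc1 = "This morning I woke up with fever, I might have a flu."
doc2 = "I had a flu last month. I had a flu early this week."
print(to_vector(doc1))  # [1, 1]
print(to_vector(doc2))  # [2, 0]
```

Word order is discarded, which is why both documents land close together on the "flu" axis even though only the first mentions a current fever.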
    27. CLASSIFIERS – PROBLEM DEFINITION
      • Map items to vectors (Feature extraction)
      • Normalize those vectors
      • Train the classifier
      • Measure the results with new information
      • Feedback the classifier
      • Separate classes in feature space
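The classifier steps above can be sketched end to end with a nearest-centroid classifier, chosen here only for brevity; the slides do not tie these steps to this method. The [flu, fever] counts and labels are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def train(examples):
    """Nearest-centroid training: average the normalized vectors of each class."""
    sums, counts = {}, {}
    for vec, label in examples:
        v = normalize(vec)
        acc = sums.setdefault(label, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def predict(model, vec):
    """Assign the class whose centroid is closest in feature space."""
    v = normalize(vec)
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(v, model[label])))

def accuracy(model, examples):
    return sum(predict(model, v) == y for v, y in examples) / len(examples)

# Hypothetical [flu, fever] term counts with labels.
training = [([2, 0], "flu"), ([1, 1], "flu"), ([0, 2], "other"), ([0, 3], "other")]
held_out = [([3, 1], "flu"), ([0, 1], "other")]

model = train(training)                      # train
first_accuracy = accuracy(model, held_out)   # measure with new information
print(first_accuracy)  # 1.0

# Feedback: once new items are labeled, add them and retrain.
model = train(training + held_out)
```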
    28. CLASSIFIERS - SVM
    29. SVM – MARGIN MAXIMIZATION
      • Support vectors define the separator
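One way to find a maximum-margin linear separator is the Pegasos stochastic sub-gradient method for the regularized hinge loss; the slides do not say which solver was used, so this is a swapped-in technique. The sketch folds the bias into the weights through a constant feature, which also regularizes the bias (a small deviation from the textbook SVM). The data are hypothetical.

```python
import random

def augment(x):
    """Append a constant 1 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, augment(x)))

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on
        lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)),  with labels y in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            violates = y * score(w, x) < 1         # inside the margin or misclassified
            w = [(1 - eta * lam) * wi for wi in w] # shrink: gradient of the regularizer
            if violates:                           # hinge-loss sub-gradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, augment(x))]
    return w

def predict(w, x):
    return 1 if score(w, x) >= 0 else -1

data = [([2, 2], 1), ([3, 3], 1), ([2, 3], 1),
        ([-2, -2], -1), ([-3, -1], -1), ([-1, -3], -1)]
w = train_linear_svm(data)
print([predict(w, x) for x, _ in data])
# Points on or inside the margin play the role of support vectors
# (approximate, because Pegasos returns an approximate solution).
print([x for x, y in data if y * score(w, x) <= 1.0 + 1e-6])
```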
    30. SVM – NON-LINEAR?
      • Map to a higher-dimensional space: $\Phi: x \mapsto \phi(x)$
    31. SVM – FILTERING OR CLASSIFYING
      (figure: positive and negative training documents train the classifier, which then filters or classifies new documents: Document 1, Document 2, Document 3)
    32. CLUSTERING – PROBLEM DEFINITION
      • Map items to vectors (Feature extraction)
      • Normalization
      • Agglomerative and Partitional
    33. CLUSTERING - AGGLOMERATIVE
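A minimal sketch of agglomerative clustering with single linkage (one of several linkage rules; the slide does not specify which). Every item starts as its own cluster, and the two closest clusters are merged until the requested number remains. The case coordinates are hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering.

    Cluster distance = distance between their closest members.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters

# Hypothetical case locations (x, y in km).
cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(cases, 2))
```

This naive version recomputes all pairwise distances at each merge, which is fine for a handful of points but far too slow for large datasets.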
    34. CLUSTERING - PARTITIONAL
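For the partitional family, k-means (Lloyd's algorithm) is a common example, although the slide does not name a specific method. The sketch fixes k in advance and alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The data are the same kind of hypothetical case locations.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points; stop when centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)  # an empty group keeps its old centroid
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = k_means(cases, 2)
print(groups)
```

Unlike the agglomerative approach, k-means needs k up front and can depend on the initial centroids, so it is often run with several seeds.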
    35. BAYESIAN STATISTICS
      $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
      • P(A | B): probability of disease A (flu) once symptom B (fever) is observed
      • P(B | A): probability of fever once flu is confirmed
      • P(A): probability of flu (prior or marginal)
      • P(B): probability of fever (prior or marginal)
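A worked example of Bayes' theorem for flu given fever. All probabilities are hypothetical, and P(fever) is obtained from the law of total probability under an assumed P(fever | not flu).

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

def marginal(p_b_given_a, p_a, p_b_given_not_a):
    """Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Hypothetical: P(fever|flu) = 0.9, P(flu) = 0.05, P(fever|not flu) = 0.06
p_fever = marginal(0.9, 0.05, 0.06)          # 0.045 + 0.057 = 0.102
p_flu_given_fever = posterior(0.9, 0.05, p_fever)
print(round(p_flu_given_fever, 3))  # 0.441
```

Even with a symptom as strongly associated as fever, the low prior keeps the posterior below one half.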
    36. NEURAL NETWORKS
      • Given a set of stimuli, train a system to produce a desired output
    37. NEURAL NETWORKS - STRUCTURE
      (figure: input layer {I_0, I_1, …, I_n}, hidden layer, and output layer {O_0, O_1, …, O_n}, connected by weights)
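A minimal sketch of the structure on slide 37: an input layer, one sigmoid hidden layer, and a single output unit, trained by backpropagation on squared error. The layer sizes, learning rate, and the toy task (signal an "event" only when two hypothetical indicators are both elevated) are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNetwork:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each unit has one weight per incoming value plus a bias (last entry).
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One backpropagation step on (1/2)(out - target)^2 for a single example."""
        xb = list(x) + [1.0]
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1 - out)
        # Hidden deltas use the output weights *before* they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1 - h) for j, h in enumerate(hidden)]
        self.w_out = [w - lr * delta_out * h for w, h in zip(self.w_out, hidden + [1.0])]
        for j, d in enumerate(delta_hidden):
            self.w_hidden[j] = [w - lr * d * v for w, v in zip(self.w_hidden[j], xb)]

def loss(net, data):
    """Mean squared error over a dataset."""
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

# Hypothetical inputs: [fever indicator elevated, cough indicator elevated] -> event?
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
net = TinyNetwork(n_inputs=2, n_hidden=3)
before = loss(net, data)
for _ in range(2000):
    for x, y in data:
        net.train_step(x, y)
after = loss(net, data)
print(round(before, 3), round(after, 3))
```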
    38. NEURAL NETWORK - APPLICATION
      (figure: network output answering “Event?”)
    39. GENETIC ALGORITHM - BASICS
      • Define the model that you want to optimize
      • Create the fitness function
      • Evolve the gene pool, testing against the fitness function
      • Select the best individual
    40. GENETIC ALGORITHM – MODEL
      • Model the transmission process using a set of parameters:
        • Onset time between an infection and illness
        • Latency period
        • Incubation period
        • Symptomatic period
        • Infectious period
      Example: (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)
    41. GENETIC ALGORITHM – MODEL FITNESS Fitness = 1/Area
    42. GENETIC ALGORITHM – PROCESS
      • Create an initial population of candidates
      • Use operators to generate new candidates (mating and mutation)
      • Discard worst individuals or select best individuals in generation
      • Repeat from step 2 until you find a candidate that satisfies the search criteria
    43. GENETIC ALGORITHM - PROCESS
      (figure: a population of candidates (4,5,6,3,5), (4,3,6,2,5), (5,3,4,6,2), (2,4,6,3,5), (4,3,6,5,2), (2,3,4,6,5), (3,4,5,2,6), (3,5,4,6,2), (4,5,3,6,2), (5,4,2,3,6), (4,6,3,2,5), (3,4,2,6,5), (3,6,5,1,4), and a crossover between (5,3,2,6,5) and (3,4,4,6,2) cut after the second gene)
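A minimal sketch of the genetic-algorithm loop from slides 39 to 43, using five-gene candidates (onset, latency, incubation, symptomatic, infectious) and Fitness = 1/Area. The slides do not show the transmission model or what Area measures, so two assumptions are made here: `toy_curve` is a HYPOTHETICAL stand-in simulator, not an epidemiological model, and Area is taken to be the area between the simulated and observed daily case curves. Gene values of 1 to 6 days mirror the example tuples.

```python
import random

GENE_VALUES = range(1, 7)  # assumption: each period is 1-6 days, like the example tuples

def toy_curve(params, days=20):
    """HYPOTHETICAL toy simulator: illness starts after onset + latency + incubation
    days, lasts `symptomatic` days, with `infectious` cases per day while it lasts."""
    onset, latency, incubation, symptomatic, infectious = params
    start = onset + latency + incubation
    return [infectious if start <= d < start + symptomatic else 0 for d in range(days)]

def area_between(a, b):
    """Area between two daily curves, approximated as the sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def fitness(candidate, observed):
    """Fitness = 1/Area; the tiny constant avoids dividing by zero on a perfect fit."""
    return 1.0 / (area_between(toy_curve(candidate), observed) + 1e-9)

def crossover(mother, father, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randint(1, len(mother) - 1)
    return mother[:cut] + father[cut:]

def mutate(candidate, rng, rate=0.1):
    """Replace each gene with a random value with probability `rate`."""
    return tuple(rng.choice(GENE_VALUES) if rng.random() < rate else g for g in candidate)

def evolve(observed, pop_size=20, generations=50, seed=0):
    """Return the best candidate found and the best area in each generation."""
    rng = random.Random(seed)
    # Step 1: an initial population of random candidates.
    population = [tuple(rng.choice(GENE_VALUES) for _ in range(5)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, observed), reverse=True)
        best_area = area_between(toy_curve(population[0]), observed)
        history.append(best_area)
        if best_area == 0:
            break  # Step 4: this candidate reproduces the observed curve exactly
        # Step 3: discard the worst half (the best candidate always survives).
        survivors = population[: pop_size // 2]
        # Step 2: mating (crossover) and mutation refill the population.
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = survivors + children
    return max(population, key=lambda c: fitness(c, observed)), history

observed = toy_curve((2, 3, 1, 4, 3))  # curve generated from the slide's example tuple
best, history = evolve(observed)
print(best, history)
```

Because the survivors always include the current best candidate, the best area per generation never gets worse. In this toy model many tuples share the same onset + latency + incubation sum, so the search can return a different tuple that fits equally well.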
    44. RESULTS – IMPROVED SURVEILLANCE
    45. Q&A
    46. THANK YOU!
      • Taha Kass-Hout, MD, MS
      • http://www.instedd.org
      • [email_address]
      • http://taha.instedd.org
      • Nicolás di Tada
      • http://www.manas.com.ar
      • [email_address]
      • http://weblogs.manas.com.ar/ndt/
    47. BACKUP SLIDES
    51. RELATED PROJECTS
      • InSTEDD RNA (or Event Evolution): Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response ( http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html )
      • ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA
      • Surveillance Project: an open-source R package providing a disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org/
      • Weka: an open-source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/
    © All Rights Reserved

    Go to text version

    • Total Views 1605
      • 1531 on SlideShare
      • 74 from embeds
    • Comments 0
    • Favorites 1
    • Downloads 32
    Most viewed embeds
    • 25 views on http://taha.instedd.org
    • 25 views on http://weblogs.manas.com.ar
    • 17 views on http://instedd.org
    • 5 views on http://www.instedd.org
    • 1 views on http://www.instedd.com

    more

    All embeds
    • 25 views on http://taha.instedd.org
    • 25 views on http://weblogs.manas.com.ar
    • 17 views on http://instedd.org
    • 5 views on http://www.instedd.org
    • 1 views on http://www.instedd.com
    • 1 views on http://www.fachak.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories