2009 EpiSPIDER CDC GIS Day
Herman Tolentino, MD
Director, Public Health Informatics Fellowship Program
Presentation Outline
• What is EpiSPIDER?
• Why was EpiSPIDER built?
• What is event-based surveillance?
• How was EpiSPIDER built?
• The EpiSPIDER “Information Ecosystem”
• Evolution of EpiSPIDER
• How has EpiSPIDER been used?
• What are the challenges in implementing EpiSPIDER?
• Overall challenges in event-based surveillance
• Next steps
• Summary
What is EpiSPIDER?
• The acronym stands for Semantic Processing and Integration of Distributed Electronic Resources for Epidemics and Disasters
• Key words
  - Semantic processing
  - Integration of distributed electronic resources
    • “Mashup”
    • Visualization
Why was EpiSPIDER built?
• 2005: Request from ProMED Mail to represent their emerging infectious disease reports in time and space and to provide RSS feeds to their members
• 2006: Growth beyond ProMED Mail and Google Maps
• 2009 and beyond: Leveraging linked data to reduce information overload
Why was EpiSPIDER built? (continued)
• Early response to disease outbreaks is a public health priority
• Emerging infectious diseases may not be part of routine public health reporting in many countries
• Non-traditional data sources can potentially give practitioners early warning
• Specifically, the Internet's “killer applications” can be leveraged to collect and exchange health event information
• Event information can be extracted from unstructured data and visualized using computer algorithms such as natural language processing (NLP) and text mining (80% of health information remains locked in free text)
Source: The Role of Information Technology and Surveillance Systems in Bioterrorism Readiness. Bioterrorism and Health System Preparedness, Issue Brief No. 5. AHRQ Publication No. 05-0072, March 2005. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/ulp/btbriefs/btbrief5.htm
What is event-based surveillance?
WHO DEFINITION
• Definition: The organized and rapid capture of information about events that are a potential risk to public health
• This information can be rumors and other ad hoc reports transmitted through formal channels (i.e., established routine reporting systems) and informal channels (i.e., reports from the media, health workers, and nongovernmental organizations), including:
  - Events related to the occurrence of disease in humans, such as clustered cases of a disease or syndrome, unusual disease patterns, or unexpected deaths as recognized by health workers and other key informants in the country; and
  - Events related to potential exposure for humans, such as events related to diseases and deaths in animals, contaminated food products or water, and environmental hazards, including chemical and radio-nuclear events
• Information received through event-based surveillance should be rapidly assessed for the risk the event poses to public health and responded to appropriately
Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
Role of event-based surveillance in national surveillance system (WHO)
• Indicator-based surveillance: routine reporting of cases of disease
  - Including: notifiable disease surveillance system; sentinel surveillance; laboratory-based surveillance
  - Commonly: health care facility based; weekly or monthly reporting
• Event-based surveillance: rapid detection, reporting, confirmation, and assessment of public health events
  - Including: clusters of disease; rumors of unexplained deaths
  - Commonly: immediate reporting
• Response: linked to surveillance; national and subnational capacity to respond to alerts
Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
Role of event-based surveillance in national surveillance (ECDC)
• Indicator-based component (surveillance systems): Data → Collect → Analyse → Interpret
• Event-based component (event monitoring): Events → Capture → Filter → Validate
• Both components produce a Signal → Assess → Public health alert → Investigate → Control measures
• Disseminate:
  - Confidential: EWRS
  - Restricted access: network inquiries, ECDC threat bulletin
  - Public: Eurosurveillance, press release, web site
Source: Paquet C, et al. Epidemic intelligence: A new framework for strengthening disease surveillance in Europe. Euro Surveill. 2006;11(12):212-4. URL: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=665
Major challenges in developing automated event-based surveillance systems
• Can event-based surveillance systems be automated?
• Major challenges:
  - Describing what information can be extracted from event reports
  - Identifying methods to extract the desired information
  - Identifying methods to convert unstructured data into structured data
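As a minimal illustration of converting unstructured text into structured data, the Python sketch below turns a short outbreak headline into a typed record with a regular expression. The headline shape ("<disease> - <location> (<update number>)"), the `EventRecord` type, and `to_structured` are hypothetical; this is not EpiSPIDER's extraction method, which relied on NLP and web services.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical headline shape: "<disease> - <location>" with an optional "(<update>)".
HEADLINE = re.compile(
    r"^(?P<disease>[^-]+?)\s+-\s+(?P<location>[^()]+?)\s*(?:\((?P<update>\d+)\))?$"
)


@dataclass
class EventRecord:
    disease: str
    location: str
    update: Optional[int]  # running update number, if the headline has one


def to_structured(headline: str) -> Optional[EventRecord]:
    """Return a structured record for a matching headline, or None otherwise."""
    match = HEADLINE.match(headline.strip())
    if match is None:
        return None
    update = match.group("update")
    return EventRecord(
        disease=match.group("disease").strip(),
        location=match.group("location").strip(),
        update=int(update) if update else None,
    )


print(to_structured("Cholera - Zimbabwe (03)"))
# EventRecord(disease='Cholera', location='Zimbabwe', update=3)
```

Real report text is far less regular than this (a disease name containing a hyphen already defeats the pattern), which is the motivation for the NLP services discussed in the following slides.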
How was EpiSPIDER built?
• Began as a fellowship project in 2005 with Dr. Raoul Kamadjeu
• Built on a “shoestring budget,” using open-source software and freely available web services and data sources:
  - Linux, Apache, MySQL, and PHP (LAMP)
  - Maps: initially Scalable Vector Graphics, then Yahoo Maps and Google Maps
  - Existing RSS feeds and unstructured web content
  - Custom-developed NLP, later replaced with the OpenCalais NLP web service
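Since existing RSS feeds were a primary input, the sketch below shows how RSS 2.0 items might be read into simple records with Python's standard library. `parse_rss_items` and the sample feed are hypothetical; EpiSPIDER itself ran on PHP, so this only illustrates the step, not its code.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

SAMPLE_FEED = """<rss version="2.0"><channel><title>Example outbreak feed</title>
<item><title>Cholera - Zimbabwe</title><link>http://example.org/1</link>
<pubDate>Mon, 02 Nov 2009 10:00:00 GMT</pubDate></item>
<item><title>Dengue - Brazil</title><link>http://example.org/2</link></item>
</channel></rss>"""


def parse_rss_items(xml_text):
    """Turn each RSS 2.0 <item> into a dict with title, link and a parsed date."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # RSS 2.0 dates follow RFC 822, which parsedate_to_datetime can read
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["title"], entry["published"])
```

Optional elements such as `pubDate` are treated as possibly missing, since feeds in the wild often omit them.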
The Ecosystem
• Definition: Any natural unit or entity, including living and non-living parts, that interact to produce a stable system through cyclic exchange of materials [NASA Earth Observatory Glossary].
• The concept can be applied to Internet-based applications that function as information-consuming or information-producing “organisms” and interact interdependently through the exchange of information.
• This information “ecosystem” has:
  - Producers of data
  - Transformers of data
  - Consumers of data
Source: http://earthobservatory.nasa.gov/Glossary/?mode=all
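The producer/transformer/consumer roles can be pictured as a chain of generators. This Python sketch is a toy model under stated assumptions: records are plain dicts, the only transformer tags place names from a small hypothetical gazetteer, and all function names are illustrative rather than part of EpiSPIDER.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Transformer = Callable[[Iterator[Record]], Iterator[Record]]


def produce(headlines: Iterable[str]) -> Iterator[Record]:
    """Producer: emits raw records, here just headline text."""
    for text in headlines:
        yield {"text": text}


def tag_places(gazetteer: Iterable[str]) -> Transformer:
    """Transformer factory: annotates each record with the place names it mentions."""
    places = list(gazetteer)

    def transform(records: Iterator[Record]) -> Iterator[Record]:
        for record in records:
            yield {**record, "places": [p for p in places if p in record["text"]]}

    return transform


def consume(records: Iterator[Record]) -> List[str]:
    """Consumer: keeps only geolocated records and renders them as 'place: text'."""
    return [f"{r['places'][0]}: {r['text']}" for r in records if r["places"]]


def run_pipeline(headlines, transformers, consumer):
    stream = produce(headlines)
    for transformer in transformers:
        stream = transformer(stream)  # chained lazily; nothing runs until consumed
    return consumer(stream)


print(run_pipeline(["Cholera outbreak in Zimbabwe", "Market update"],
                   [tag_places(["Zimbabwe", "Haiti"])], consume))
# ['Zimbabwe: Cholera outbreak in Zimbabwe']
```

Because each stage only agrees on the record shape, a transformer can be swapped (for example, for a geocoding web service) without touching producers or consumers.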
Graphical depiction of “ecosystem”
[Diagram: EpiSPIDER linking producers, transformers, and consumers of data]
• Producers: ProMED Mail, UNDP, CIA, WAHID, Google News, Moreover, Reuters, WHO, GDACS, Twitter, unstructured text (exchanged as RSS, GeoRSS)
• Transformers: Yahoo Pipes, Dapper, OpenCalais, Alchemy, UMLSKS, uClassifier, Geonames, Google Translate, Yahoo Maps, Wikipedia (exchanged as XML, SOAP, REST)
• Consumers and output formats: RSS, KML, Exhibit faceted browsing, Google Maps, JSON data, RDF/XML, SMS via SMTP to a mobile provider
EpiSPIDER Web Services
CATEGORIES BY TASK (task category: services)
• Information retrieval: search engines, RSS feeds, raw HTML sources
• Information extraction: Dapper, Yahoo Pipes, Alchemy
• Language identification: Alchemy, Twitter, uClassifier
• Language translation: Google Translate
• Keyword extraction: Alchemy
• Named entity recognition: OpenCalais, Alchemy
• Text classification: uClassifier
• Visualization: SIMILE Exhibit, Google Visualization API, Google Maps
• Georeferencing: Google Maps, Yahoo Maps, Geonames, Twitter, OpenCalais, Alchemy
• Concept annotation: UMLS Knowledge Source Server
Technology Adoption Timeline
2005
• Data sources: RSS feeds (2); unstructured content (1)
• Visualization tools: Scalable Vector Graphics; JPGraph
• Web services: Yahoo Maps; askMEDLINE
• Products: RSS feeds; visualizations
2006
• Data sources: RSS feeds (4); unstructured content (3); email
• Visualization tools: Google Maps, Yahoo Maps; JPGraph
• Web services: Yahoo Maps; Google Maps; askMEDLINE; Geonames
• Products: RSS feeds; visualizations
2007
• Data sources: RSS feeds (8); unstructured content (4); email
• Visualization tools: SIMILE Exhibit; AJAX visualization tools
• Web services: Yahoo Maps; Google Maps; askMEDLINE; Geonames; Wikipedia
• Products: RSS, GeoRSS feeds; KML feeds; SMS; visualizations; custom products
2008
• Data sources: RSS feeds (8); unstructured content (4); email (server)
• Visualization tools: SIMILE Exhibit; AJAX visualization tools; Google Earth
• Web services: Yahoo Maps; Google Maps; Google Visualization API (1); askMEDLINE; Geonames; Wikipedia; UMLSKS; OpenCalais; Yahoo Pipes; Dapper
• Products: RSS, GeoRSS feeds; KML feeds; SMS; visualizations; custom products
2009
• Data sources: RSS feeds (9); unstructured content (6); linked data; email (server); social networks: Twitter
• Visualization tools: SIMILE Exhibit; AJAX visualization tools; Google Earth; Wordle
• Web services: Yahoo Maps; Google Maps; Google Translate; Google Visualization API (3); askMEDLINE; Geonames; Wikipedia; UMLSKS; OpenCalais; Yahoo Pipes; Dapper; uClassifier; Alchemy; Twitter; URL services
• Products: RSS, GeoRSS feeds; KML feeds; SMS; visualizations; custom products
[Screenshot] EpiSPIDER, 2005-2006: Scalable Vector Graphics map interface
[Screenshot] EpiSPIDER, 2006: Google Maps interface
[Screenshot] ProMED Mail RSS feeds, 2006
[Screenshot] EpiSPIDER, 2009: SIMILE Exhibit interface
[Screenshots] EpiSPIDER, 2009 (two further views)
[Screenshot] EpiSPIDER, 2008: KML feeds for Google Earth
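KML feeds let Google Earth draw each event as a placemark. The Python sketch below builds a minimal KML 2.2 document from event dicts; the event fields (`name`, `description`, `lat`, `lon`) and `events_to_kml` are assumptions for illustration, not EpiSPIDER's feed schema.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def events_to_kml(events):
    """Build a KML document with one Point placemark per event dict."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for event in events:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = event["name"]
        ET.SubElement(placemark, "description").text = event.get("description", "")
        point = ET.SubElement(placemark, "Point")
        # KML coordinates are written longitude,latitude (in that order)
        ET.SubElement(point, "coordinates").text = f"{event['lon']},{event['lat']}"
    return ET.tostring(kml, encoding="unicode")


print(events_to_kml([{"name": "Cholera - Zimbabwe", "lat": -17.83, "lon": 31.05}]))
```

Building the document with ElementTree rather than string concatenation means names containing `&` or `<` are escaped automatically.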
[Screenshot] EpiSPIDER, 2009: SMS using mobile provider gateways (example alerts: server load alert, RSS feed outage, latest ProMED Mail)
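Many mobile carriers have offered email-to-SMS gateways, where mail sent to an address of the form `<number>@<carrier domain>` arrives as a text message. The sketch below only builds such a message with Python's `email` package; the gateway domain `sms.example.com`, the sender address, and the helper names are hypothetical placeholders.

```python
from email.message import EmailMessage

SMS_LIMIT = 160  # characters in a single classic SMS; gateways may truncate differently


def truncate_for_sms(text: str, limit: int = SMS_LIMIT) -> str:
    """Shorten text to fit one SMS, marking the cut with '...'."""
    return text if len(text) <= limit else text[: limit - 3] + "..."


def build_sms_alert(number: str, gateway_domain: str, sender: str, text: str) -> EmailMessage:
    """Build (but do not send) an email addressed to an email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{number}@{gateway_domain}"
    msg.set_content(truncate_for_sms(text))
    return msg


alert = build_sms_alert("5551234567", "sms.example.com", "alerts@example.org",
                        "Server load alert: 1-minute load average above threshold")
print(alert["To"])  # 5551234567@sms.example.com
```

Sending would then be a call such as `smtplib.SMTP(host).send_message(alert)` against an SMTP server you control; the carrier's domain and length limits need to be checked per provider.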
How has EpiSPIDER been used?
• Access by type (most to least):
  - RSS
  - Exhibit
  - KML
• Access by organization:
  - Government agencies
  - Academic institutions
  - Research organizations
  - Health departments
• Access by individuals
Challenges in implementing EpiSPIDER
• Changing nature of web data
• Emergent nature of web services
• Understanding and developing connections with complex APIs
• Information extraction and data linking challenges
• Service delivery expansion increases resource demands
Changing nature of web data
CHALLENGES IN IMPLEMENTING EPISPIDER
• Challenges with underlying HTML structure
  - Non-standard HTML prevents effective parsing of content
• Need to map data to shared terminologies, ontologies, and knowledge metadata
  - For better integration into an information ecosystem, the system needs to let other “organisms” know what information it needs and what type of information it produces
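One common way to cope with non-standard HTML is a forgiving parser that keeps going past unclosed or misnested tags. The Python sketch below uses the standard `html.parser` module to collect visible text and skip `<script>`/`<style>` content; the class and function names are illustrative, and this is not the parser EpiSPIDER used.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text outside <script>/<style>, tolerating unclosed or misnested tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(parser.parts)


print(visible_text("<p>Cholera <b>cases<p>rising</b><script>var x=1;</script>&amp; spreading"))
# Cholera cases rising & spreading
```

Extracting plain text this way recovers the words but not their meaning; mapping them to shared terminologies is the separate step the slide calls for.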
Emergent nature of web services
CHALLENGES IN IMPLEMENTING EPISPIDER
• Adapting to changing interfaces
  - Must go beyond manually “taping” applications together; automated “duct tape” adjustments are needed
  - Difficult for some interfaces (non-SOAP)
• Feed URL changes
  - Tracking them requires subscribing to multiple mailing lists
• Changes in the data structure of service responses
  - A service may add new data elements
  - Example: new Twitter geolocation elements
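A consumer can absorb some of these changes by reading optional fields defensively and flagging, rather than failing on, elements it has not seen before. The Python sketch assumes a hypothetical JSON response with `id`, `text`, `created_at`, and an optional `geo` object holding `lat`/`lon`; it does not reproduce Twitter's actual schema.

```python
import json
from typing import Any, Dict, Optional, Set, Tuple

# Hypothetical fields the consumer was originally written against.
KNOWN_FIELDS = {"id", "text", "created_at", "geo"}


def _location(geo: Any) -> Optional[Tuple[float, float]]:
    """Read an optional {"lat": ..., "lon": ...} object; anything else yields None."""
    if not isinstance(geo, dict):
        return None
    try:
        return float(geo["lat"]), float(geo["lon"])
    except (KeyError, TypeError, ValueError):
        return None


def parse_status(raw: str) -> Tuple[Dict[str, Any], Set[str]]:
    """Parse one response, tolerating missing fields and reporting new ones."""
    data = json.loads(raw)
    record = {
        "id": data.get("id"),
        "text": data.get("text", ""),
        "created_at": data.get("created_at"),
        "location": _location(data.get("geo")),
    }
    unknown = set(data) - KNOWN_FIELDS  # new elements to review, not to crash on
    return record, unknown


record, new_fields = parse_status(
    '{"id": 1, "text": "flu cases", "geo": {"lat": 1.5, "lon": 2}, "place": {}}')
print(record["location"], new_fields)
```

Logging the `unknown` set gives an automated signal that an interface has changed, which is one small piece of the "duct tape" adjustment the slide describes.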
Understanding complex APIs
CHALLENGES IN IMPLEMENTING EPISPIDER
• APIs are in continuous development
  - Complexity is increasing
  - Knowledge bases are rapidly expanding
  - Example: OpenCalais and Alchemy have added named entities, relationships, and linked data (Wikipedia, Freebase) for disambiguation
• Promising developments
  - The number of APIs in different task categories is increasing
Information extraction and data linking challenges
CHALLENGES IN IMPLEMENTING EPISPIDER
• Named entity recognition and disambiguation
  - Named entity recognition by web services may lag behind for emerging diseases and return non-specific references
  - Example: H1N1 may be tagged only as “influenza” (non-specific)
• Missing piece: a UMLS Knowledge Source Server web service for named entity extraction and concept annotation
  - Currently available only as a standalone download: MetaMap Transfer (MMTx)
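Until annotation services catch up, a system could post-process generic tags against a local list of more specific terms. The Python sketch below is a hypothetical refinement step: `SPECIFIC_TERMS` and `refine_entity` are illustrative names, and a small hand-kept lexicon is a stand-in for, not a substitute for, UMLS-based concept annotation.

```python
import re

# Hypothetical local lexicon mapping a generic entity label to more specific terms.
SPECIFIC_TERMS = {
    "influenza": ["H1N1", "H5N1", "H3N2"],
}


def refine_entity(generic_label: str, text: str) -> str:
    """Return a more specific label if the source text names a known subtype."""
    for term in SPECIFIC_TERMS.get(generic_label.lower(), []):
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            return f"{generic_label} ({term})"
    return generic_label


print(refine_entity("influenza", "New pandemic H1N1 cases reported"))  # influenza (H1N1)
```

The lexicon has to be maintained by hand, which is exactly the lag problem the slide describes; the sketch only shows where such a correction would sit in the pipeline.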
Service delivery increases resource demands
CHALLENGES IN IMPLEMENTING EPISPIDER
• Managing contention for scarce computing resources
  - How to process huge amounts of information without crashing the server
  - Automated responses to certain parameters (a feedback loop)
  - Avoiding process collisions
• Alerting mechanisms
  - How to send alerts when the server is about to crash
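Two of these concerns, avoiding process collisions and reacting automatically to server load, can be sketched with a non-blocking file lock plus a load-average check. This Python sketch assumes a Unix host (`fcntl` and `os.getloadavg` are Unix-only); the threshold, lock path, and `notify` callback are hypothetical.

```python
import fcntl
import os

MAX_LOAD = 4.0  # hypothetical 1-minute load-average ceiling


def acquire_lock(path):
    """Return an exclusively locked file object, or None if another run holds the lock."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def load_ok(max_load=MAX_LOAD):
    """Feedback check: is the 1-minute load average below the ceiling?"""
    return os.getloadavg()[0] < max_load


def run_guarded(job, lock_path, notify, max_load=MAX_LOAD):
    """Run job only if the server is not overloaded and no earlier run is active."""
    if not load_ok(max_load):
        notify("Server load high; job skipped")  # e.g. hand off to an SMS alert
        return False
    lock = acquire_lock(lock_path)
    if lock is None:
        return False  # previous run still active: avoid a process collision
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

A scheduled task (for example, one feed-harvesting run) would call `run_guarded(harvest, "/tmp/harvest.lock", send_alert)`, where `harvest` and `send_alert` stand for whatever the system actually does.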
Overall challenges in event-based surveillance for public health threats
• Increasing dependence on, and need for development of, semantic tools to:
  - Identify emerging outbreaks
  - Assign outbreak severity
  - Track escalation/decline, social disruption, and government response over time
• Promoting semantic data sharing among similar systems:
  - Shared terminologies
  - Ontologies
  - Knowledge metadata
Source: Chute C. Biosurveillance, Classification, and Semantic Health Technologies (editorial). J Am Med Inform Assoc. 2008;15:172–173.
Advantages of web services
• Main advantages
  - Outsource complex tasks to providers that can devote resources and economies of scale to delivering high-quality, reliable services and outputs
  - Promote the use of standards for information exchange
• Other advantages
  - Develop and reuse standard tools for processing unstructured information
What could be next steps?
• Critical
  - Mapping the knowledge base to an ontology for event-based surveillance, to enable data sharing across event-based surveillance systems
  - Implementing event-based surveillance systems at the national level to enable targeted, distributed collection of event-based data
  - Exposing the underlying database as Resource Description Framework (RDF) or other standards-based data
  - Collaborating across event-based surveillance systems to enable system-to-system interoperability
• Non-critical
  - Continuing to explore new data sources
  - Providing an annotated view of news articles
  - Providing citizen reporting and participatory information processing interfaces to end-users
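Exposing a database as RDF can be as simple as emitting N-Triples, one subject-predicate-object statement per line. The Python sketch below converts a single event row; the namespace `http://example.org/episp/` and the predicate names (`disease`, `location`, `reportDate`) are hypothetical placeholders, not a published surveillance ontology.

```python
from typing import Dict, List
from urllib.parse import quote

BASE = "http://example.org/episp/"  # hypothetical vocabulary, not a published ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the required characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def event_to_ntriples(event_id: str, event: Dict[str, str]) -> List[str]:
    """Emit one N-Triples line per known field of a single event row."""
    subject = f"<{BASE}event/{quote(event_id, safe='')}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}Event> ."]
    for field in ("disease", "location"):
        if field in event:
            lines.append(f"{subject} <{BASE}{field}> {literal(event[field])} .")
    if "date" in event:  # ISO 8601 date string, typed as xsd:date
        lines.append(f"{subject} <{BASE}reportDate> {literal(event['date'])}^^<{XSD_DATE}> .")
    return lines


print("\n".join(event_to_ntriples("42", {"disease": "Cholera", "location": "Zimbabwe",
                                         "date": "2009-11-02"})))
```

Replacing the placeholder predicates with terms from a shared event-based surveillance ontology is the mapping step listed first among the critical next steps.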
Summary
• An inflection point in the evolution of web services is just “around the corner”
• Challenges remain in:
  - Automation and integration of web services in event-based surveillance systems
  - Integrating event-based surveillance into national surveillance systems (local public health context)
  - Enabling data sharing across event-based surveillance systems
Acknowledgements
• NCIRD: Raoul Kamadjeu
• NLM: Paul Fontelo, Fang Liu, Olivier Bodenreider
• ProMED Mail: Larry Madoff, Marjorie Pollack, Alison Bodenheimer, Drew Tenenholz
The findings and conclusions in this report are those of the author(s) and
do not necessarily represent the official position of the Centers for
Disease Control and Prevention
