2. Presentation Outline
What is EpiSPIDER?
Why was EpiSPIDER built?
What is event-based surveillance?
How was EpiSPIDER built?
The EpiSPIDER “Information Ecosystem”
Evolution of EpiSPIDER
How has EpiSPIDER been used?
What are the challenges in implementing EpiSPIDER?
Overall challenges in event-based surveillance
Next steps
Summary
3. What is EpiSPIDER?
The acronym stands for Semantic Processing and
Integration of Distributed Electronic Resources
for Epidemics and disasters
Key words
Semantic processing
Integration of distributed electronic resources
• “Mashup”
• Visualization
4. Why was EpiSPIDER built?
2005: Request from ProMED Mail to represent
their emerging infectious disease reports in time
and space and provide RSS feeds to their
members
2006: Growth beyond ProMED Mail and Google
Maps
2009 and beyond: Leveraging linked data to
reduce information overload
5. Why was EpiSPIDER built?
Early response to disease outbreaks is a public health priority
Emerging infectious diseases may not be part of routine public
health reporting in many countries
We can potentially leverage non-traditional sources of data to
provide practitioners with early warning
Specifically, leverage Internet killer applications to collect and
exchange health event information
Extracting and visualizing event information from unstructured
data can be done using computer algorithms such as NLP and text
mining (80% of health information remains locked in free text)
The Role of Information Technology and Surveillance Systems in Bioterrorism Readiness.
Bioterrorism and Health System Preparedness, Issue Brief No. 5. AHRQ Publication No. 05-0072,
March 2005. Agency for Healthcare Research and Quality, Rockville, MD.
http://www.ahrq.gov/news/ulp/btbriefs/btbrief5.htm
6. What is event-based surveillance?
WHO DEFINITION
Definition: The organized and rapid capture of information about events that
are a potential risk to public health
Can be rumors and other ad-hoc reports transmitted through formal channels
(i.e. established routine reporting systems) and informal channels (i.e. media,
health workers and nongovernmental organizations reports), including:
Events related to the occurrence of disease in humans, such as clustered cases of a
disease or syndromes, unusual disease patterns or unexpected deaths as
recognized by health workers and other key informants in the country; and
Events related to potential exposure for humans, such as events related to diseases
and deaths in animals, contaminated food products or water, and environmental
hazards including chemical and radio-nuclear events.
Information received through event-based surveillance should be rapidly
assessed for the risk the event poses to public health and responded to
appropriately
Source: WHO, A guide to establishing event-based surveillance, 2008. URL:
http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
7. Role of event-based surveillance in
national surveillance system (WHO)
Source: WHO, A guide to establishing event-based surveillance, 2008. URL:
http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
Indicator-based Surveillance
Routine reporting of cases of disease,
including
• Notifiable disease surveillance system
• Sentinel surveillance
• Laboratory-based surveillance
Commonly
• Health care facility based
• Weekly, monthly reporting
Event-based Surveillance
Rapid detection, reporting,
confirmation, assessment of public
health events including
• Clusters of disease
• Rumors of unexplained deaths
Commonly
• Immediate reporting
Response
Linked to surveillance
National and subnational capacity to respond to alerts
8. Role of event-based surveillance in
national surveillance (ECDC)
[Figure: epidemic intelligence framework for surveillance systems]
Indicator-based component: Data -> Collect, Analyse, Interpret -> Signal
Event-based component (event monitoring): Events -> Capture, Filter, Validate -> Signal
Signal -> Assess -> Public health alert -> Investigate -> Control measures
Disseminate:
• Confidential: EWRS
• Restricted access: network inquiries, ECDC threat bulletin
• Public: Eurosurveillance, press release, web site
Paquet C, et al. Epidemic intelligence: A new framework for strengthening disease surveillance in Europe. Euro Surveill.
2006;11(12): 212-4. URL: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=665
9. Major challenges in developing automated
event-based surveillance systems
Can event-based surveillance systems be
automated?
Major challenges:
Describing what information can be extracted from
event reports
Identifying methods to extract desired information
Identifying methods to convert unstructured to
structured data
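One simple way to approach the last two challenges is dictionary lookup: scan the free text for known disease and place names and emit a structured record. The sketch below assumes two tiny hand-made term lists and a made-up report sentence. It is an illustration only, not the method EpiSPIDER used; a real system would need far larger vocabularies and a disambiguation step.

```python
import re

# Hypothetical, deliberately tiny vocabularies for illustration only.
DISEASES = ["avian influenza", "cholera", "dengue", "measles"]
PLACES = ["Egypt", "Haiti", "Indonesia", "Nigeria"]


def find_terms(text, vocabulary):
    """Return vocabulary terms found in text as whole words, ignoring case."""
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def extract_event(text):
    """Turn one free-text report into a structured record."""
    return {
        "diseases": find_terms(text, DISEASES),
        "places": find_terms(text, PLACES),
        "text": text,
    }


# Made-up report sentence.
report = "Health officials confirmed new cholera cases in northern Haiti."
event = extract_event(report)
print(event["diseases"], event["places"])
```

Keeping the original text in the record lets a human reviewer check what the lookup missed.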
10. How was EpiSPIDER built?
Began as a fellowship project in 2005 with Dr. Raoul
Kamadjeu
On a “shoestring budget,” utilizing Open-Source
software and freely available web services and data
sources
Linux, Apache, MySQL and PHP (LAMP)
Initially Scalable Vector Graphics then Yahoo Maps and
Google Maps
Existing RSS feeds and unstructured web content
Custom-developed NLP later replaced with
OpenCalais NLP web service
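EpiSPIDER itself ran on PHP. Purely as an illustration of what consuming an existing RSS feed involves, here is a minimal Python sketch using only the standard library. The feed content, URLs, and the parse_rss helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a real feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example outbreak feed</title>
    <item>
      <title>Cholera reported in Haiti</title>
      <link>http://example.org/reports/1</link>
      <pubDate>Mon, 02 Nov 2009 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Dengue cases rise in Indonesia</title>
      <link>http://example.org/reports/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item>, tolerating missing optional elements."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate"),  # None when absent
        })
    return items


for entry in parse_rss(SAMPLE_RSS):
    print(entry["title"], entry["published"])
```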
11. The Ecosystem
Definition: Any natural unit or entity including living and non-living
parts that interact to produce a stable system through cyclic exchange
of materials [NASA Earth Observatory Glossary].
Concept can be applied to Internet-based applications that function as
information-consuming or information producing “organisms” that
interact with each other in an interdependent way through exchange of
information.
This information “ecosystem” has:
Producers of data
Transformers of data
Consumers of data
http://earthobservatory.nasa.gov/Glossary/?mode=all
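The three roles above can be sketched as a chain of small functions. This is a toy model with made-up data and a naive keyword test, intended only to show how producers, transformers, and consumers hand information to one another.

```python
def producer():
    """Producer: yields raw items (hypothetical sample data)."""
    yield {"source": "feed-a", "text": "cholera outbreak reported"}
    yield {"source": "feed-b", "text": "road closed after flooding"}


def transformer(items):
    """Transformer: adds a derived field to each item without altering the input."""
    for item in items:
        enriched = dict(item)
        # Naive keyword test, used here only as a placeholder for real NLP.
        enriched["is_health_event"] = "outbreak" in item["text"]
        yield enriched


def consumer(items):
    """Consumer: keeps only what it needs, here a list of alert strings."""
    return [f"{i['source']}: {i['text']}" for i in items if i["is_health_event"]]


alerts = consumer(transformer(producer()))
print(alerts)
```

Because each stage depends only on the shape of the data it receives, any one "organism" can be swapped out without touching the others.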
12. Graphical depiction of “ecosystem”
[Figure: the EpiSPIDER information ecosystem, with components grouped as producers, transformers, and consumers. Labels recoverable from the diagram:]
Sources and services: ProMED Mail, UNDP, CIA, WAHID, Google News, Moreover, Reuters, WHO, GDACS, Twitter, Yahoo Pipes, Dapper, OpenCalais, Alchemy, UMLSKS, uClassifier, Geonames, Google Translate, Wikipedia, Yahoo Maps, Google Maps, Exhibit (faceted browsing), mobile provider
Formats and protocols: unstructured text, RSS, GeoRSS, KML, JSON, RDF, XML, SOAP, REST, SMTP, SMS
13. EpiSPIDER Web Services
CATEGORIES BY TASK
Task category: Services
Information retrieval: Search engines, RSS feeds, raw HTML sources
Information extraction: Dapper, Yahoo Pipes, Alchemy
Language identification: Alchemy, Twitter, uClassifier
Language translation: Google Translate
Keyword extraction: Alchemy
Named entity recognition: OpenCalais, Alchemy
Text classification: uClassifier
Visualization: SIMILE Exhibit, Google Visualization API, Google Maps
Georeferencing: Google Maps, Yahoo Maps, Geonames, Twitter, OpenCalais, Alchemy
Concept annotation: UMLS Knowledge Source Server
14. Technology Adoption Timeline
2005
Data sources: RSS feeds (2), unstructured content (1)
Visualization tools: Scalable Vector Graphics, JPGraph
Web services: Yahoo Maps, askMEDLINE
Products: RSS feeds, visualizations
2006
Data sources: RSS feeds (4), unstructured content (3), email
Visualization tools: Google and Yahoo Maps, JPGraph
Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames
Products: RSS feeds, visualizations
2007
Data sources: RSS feeds (8), unstructured content (4), email
Visualization tools: SIMILE Exhibit, AJAX visualization tools
Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames, Wikipedia
Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
2008
Data sources: RSS feeds (8), unstructured content (4), email (server)
Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth
Web services: Yahoo Maps, Google Maps, Google Visualization API (1), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper
Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
2009
Data sources: RSS feeds (9), unstructured content (6), linked data, email (server), social networks (Twitter)
Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth, Wordle
Web services: Yahoo Maps, Google Maps, Google Translate, Google Visualization API (3), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper, uClassifier, Alchemy, Twitter, URL services
Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
24. How has EpiSPIDER been used?
Access by type (most to least)
RSS
Exhibit
KML
Access by organization
Government agencies
Academic institutions
Research organizations
Health departments
Access by individuals
25. Challenges in implementing EpiSPIDER
Changing nature of data
Emergent nature of web services
Understanding and developing connections with
complex APIs
Information extraction and data linking
challenges
Service delivery expansion increases resource
demands
26. Changing nature of web data
CHALLENGES IN IMPLEMENTING EPISPIDER
Challenges with underlying HTML structure
Non-standard HTML use prevents effective parsing of
content
Need to map data to shared terminologies and
ontologies and knowledge metadata
For better integration into an information ecosystem,
system needs to let other “organisms” know what
information it needs and what type of information it
produces
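One common response to non-standard HTML is to use a parser that tolerates unclosed or mis-nested tags and to pull out only the visible text. The sketch below uses Python's standard html.parser module. The TextExtractor class, the html_to_text helper, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside script or style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML fragment as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Unclosed <p> and <b> tags, as often seen in hand-written pages.
messy = "<html><body><p>Outbreak <b>confirmed<p>in two districts</body>"
print(html_to_text(messy))
```

This recovers the text but not its meaning; mapping that text to shared terminologies remains a separate step.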
27. Emergent nature of web services
Adapting to changing interfaces
Must go beyond “taping” applications together manually
- need for automated “duct tape” adjustments
Difficult for some interfaces (non-SOAP)
Feed URL changes
Have to subscribe to multiple mailing lists
Changes in data structure of service response
Service may have new data elements
Example: new Twitter geolocation elements
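A small part of the "automated duct tape" can be defensive parsing: read optional elements only if they are present, and report any fields the code does not yet know about. The sketch below assumes a hypothetical JSON response with an optional "geo" element. The field names are placeholders and are not taken from any real service's documentation.

```python
import json

# Fields this (hypothetical) client already understands.
KNOWN_FIELDS = {"text", "geo"}


def parse_status(raw_json):
    """Read a service response, tolerating missing or newly added fields."""
    data = json.loads(raw_json)
    record = {"text": data.get("text", ""), "lat": None, "lon": None}

    # Assumed optional element: {"geo": {"coordinates": [lat, lon]}}.
    coords = (data.get("geo") or {}).get("coordinates")
    if isinstance(coords, list) and len(coords) == 2:
        record["lat"], record["lon"] = float(coords[0]), float(coords[1])

    # Surface unknown keys so that interface changes are noticed early.
    record["unexpected_fields"] = sorted(set(data) - KNOWN_FIELDS)
    return record


old_style = parse_status('{"text": "flu cases rising"}')
new_style = parse_status(
    '{"text": "flu cases rising", "geo": {"coordinates": [14.6, 121.0]}, "place": "x"}'
)
print(old_style["lat"], new_style["lat"], new_style["unexpected_fields"])
```

Logging the unexpected fields gives a maintainer an early hint that a service has changed, without waiting for a mailing-list announcement.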
28. Understanding complex APIs
APIs are in continuous development
Complexity increasing
Knowledge base rapidly expanding
Example:
OpenCalais and Alchemy - addition of named entities
and relationships and linked data (Wikipedia,
Freebase) for disambiguation
Promising developments
Number of APIs in different task categories increasing
29. Information extraction and data linking challenges
Named entity recognition and disambiguation
Named entity recognition by web services of emerging
diseases may lag behind and provide non-specific
references
Example: H1N1 may just be tagged as “influenza”
(nonspecific)
Missing piece: UMLS Knowledge Source Server
named-entity extraction and concept annotation
web service
Currently a standalone download: MetaMap Transfer (MMTx)
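For the H1N1 example above, one possible stopgap is a local post-processing rule that refines a generic tag when the source text names a subtype. The sketch below is a hypothetical workaround, not a description of how EpiSPIDER or any named-entity service handles this. The pattern covers only influenza A subtype labels of the form HxNy.

```python
import re

# Assumed pattern for influenza A subtype labels such as H1N1 or H5N1.
SUBTYPE = re.compile(r"\bH\d{1,2}N\d{1,2}\b")


def refine_influenza_tag(text, tags):
    """If a service returned only 'influenza', append any subtype found in text."""
    refined = []
    for tag in tags:
        match = SUBTYPE.search(text) if tag.lower() == "influenza" else None
        refined.append(f"{tag} ({match.group(0)})" if match else tag)
    return refined


print(refine_influenza_tag("New H1N1 cases reported in schools", ["influenza"]))
print(refine_influenza_tag("Seasonal flu activity is low", ["influenza"]))
```

A rule like this has to be written per disease, which is why a concept-annotation web service would be the more general fix.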
30. Service delivery increases resource demands
Managing contention for scarce computing
resources
How to process huge amounts of information
without crashing the server
Automated responses to certain parameters –
feedback loop
Avoiding process collisions
Alerting mechanisms
How to send alerts when the server is about to crash
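One widely used way to avoid process collisions between scheduled jobs is a lock file: a job runs only if it can create the file, and removes it when done. The sketch below is a minimal version under assumptions; the lock path and the run_exclusively helper are hypothetical, and a production version would also need to handle stale locks left by a killed process.

```python
import os
import tempfile

# Hypothetical lock location for this example.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "harvest_example.lock")


def run_exclusively(job, lock_path=LOCK_PATH):
    """Run job() unless another run holds the lock file. Returns True if it ran."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        # Record the owner's process ID to help diagnose stuck locks.
        os.write(fd, str(os.getpid()).encode())
        job()
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)


def harvest():
    print("harvesting feeds")


if not run_exclusively(harvest):
    print("previous run still active, skipping")
```

Skipping a run rather than queueing it keeps a slow harvest from piling up work on an already loaded server.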
31. Overall challenges in event-based surveillance for
public health threats
Increasing dependence on and need for development of
semantic tools to:
Identify emerging outbreaks
Assign outbreak severity
Track escalation/decline, social disruption and government
response over time
Promoting semantic data sharing among similar systems
Shared terminologies
Ontologies
Knowledge metadata
Chute C. Biosurveillance, Classification, and Semantic Health Technologies (editorial), J Am Med Inform Assoc.
2008;15:172–173.
32. Advantages of web services
Main advantages
Outsource complex tasks to agents who can devote
resources and economies of scale to deliver high
quality, reliable service and outputs
Promote use of standards for information exchange
Other advantages
Develop and reuse standard tools for processing
unstructured information
33. What could be next steps?
Critical
Adopting an ontology for event-based surveillance and mapping the knowledge base to it,
to enable sharing of data across event-based surveillance systems
Implementing event-based surveillance systems at national level to enable
targeted, distributed collection of event-based data
Exposing underlying database as Resource Description Framework (RDF) or other
standards-based data
Collaboration across event-based surveillance systems to enable system-to-system
interoperability
Non-critical
Continue to explore new data sources
Annotated view of news articles
Providing citizen reporting and participatory information processing interfaces to
end-users
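To make the RDF item above concrete, here is a minimal sketch that writes one database row as N-Triples, a line-based RDF serialization. The base URI, the property names, and the sample row are placeholders rather than an agreed vocabulary, and the escaping covers only the most common characters.

```python
def literal(value):
    """Format a Python string as an N-Triples plain literal (minimal escaping)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def event_to_ntriples(event, base="http://example.org/"):
    """Serialize one event record as N-Triples lines.

    The base URI and the property names are placeholders, not a real vocabulary.
    """
    subject = f"<{base}event/{event['id']}>"
    lines = []
    for key in ("disease", "location", "date"):
        if key in event:
            predicate = f"<{base}vocab#{key}>"
            lines.append(f"{subject} {predicate} {literal(event[key])} .")
    return "\n".join(lines)


# Made-up row standing in for a record from the underlying database.
row = {"id": 42, "disease": "cholera", "location": "Haiti", "date": "2009-11-02"}
print(event_to_ntriples(row))
```

Sharing data across systems would additionally require the systems to agree on the property URIs, which is where the ontology work listed above comes in.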
34. Summary
Inflection point in evolution of web services just
“around the corner”
Challenges remain in:
Automation and integration of web services in event-
based surveillance systems
Integrating event-based surveillance in national
surveillance systems (local public health context)
Enabling sharing of data across event-based
surveillance systems
35. Acknowledgements
NCIRD: Raoul Kamadjeu
NLM: Paul Fontelo, Fang Liu, Olivier Bodenreider
ProMED Mail: Larry Madoff, Marjorie Pollack,
Alison Bodenheimer, Drew Tenenholz
The findings and conclusions in this report are those of the author(s) and
do not necessarily represent the official position of the Centers for
Disease Control and Prevention