PANDEMIC RESPONSE IN THE ERA OF BIG DATA 1
Pandemic Response in the Era of Big Data:
Exploring the Complexities of Global Influenza Surveillance and Information Overload
Kyle Prier
Johns Hopkins Bloomberg School of Public Health
Introduction
In March 2009, the Mexican Ministry of Health reported to the World Health
Organization (WHO) Global Influenza Surveillance Network (GISN) an unusual increase of
Influenza-like Illnesses (ILI) during a period of seasonal outbreak decline (1). This initial
deviation reported from Mexican authorities became the basis for an alert to global public health
officials of an outbreak of a novel, highly contagious influenza strain, which we now refer to as
the 2009 H1N1 pandemic.
After the March 2009 report by Mexican authorities of the novel influenza A in Mexico,
rapid testing protocols to confirm H1N1 virologically were quickly developed. Using GISN
specimens of viral strains, within days, the Centers for Disease Control and Prevention (CDC)
developed and shared a real-time reverse transcriptase-polymerase chain reaction (RT-PCR)
protocol that could quickly identify cases of H1N1 (2, 3). These testing protocols were critical for
collecting data so that officials could respond to the emerging epidemic.
By July 2009, four months after the initial report by Mexican authorities, the H1N1
pandemic had infected over 100,000 individuals globally. By then, data were coming in so
rapidly that it became too burdensome at the global level to track and validate cases
efficiently using the rapid CDC protocol (2). WHO officials resorted to relying
upon general qualitative indicators of basic pandemic changes that were communicated via
e-mail by country officials (2). Quite literally, at the height of the H1N1 pandemic, the
highest-level global public health decision makers were asking countries if the H1N1 pandemic was
getting better or worse, despite the availability of reported data via existing protocols.
During the height of the pandemic, WHO later determined there was not sufficient time to
develop even “preliminary estimates of severity parameters such as the case fatality ratio,” and
that these estimates “lagged behind key decision-making and response-planning” (2). The rapid
increase of information and data at the global level during the H1N1 pandemic contributed to
impaired decision making, which ultimately limited the
ability of WHO and countries to institute appropriate pharmaceutical and non-pharmaceutical
interventions (4). In the H1N1 case, key decision makers became overwhelmed by the rate and
amount of data, which reduced their organizational ability to manage
the H1N1 pandemic effectively.
Information Overload
The organizational inability of the WHO and other key decision makers to respond
efficiently to the H1N1 pandemic can be explained by the concept of information
overload. Information overload can be described as a psychological and organizational
phenomenon that occurs when the amount of information input to a system or organization
exceeds its processing capacity (5, 6). In describing the concept of information overload, Shenk
argues that “at a certain level of input, the law of diminishing returns takes effect” and that the
“glut of information” leads to a negative situation that “cultivate[s] stress, confusion and even
ignorance” (7). He further argues that information overload leaves us “less cohesive as a
society”, and that on an individual level it “diminishes our control over our own lives,” while
strengthening the positions of those “already in power” (7).
Throughout the emergence and growth of electronic and computer-mediated communication
systems, researchers have warned against the threat of information overload among various
organizations (8, 9). The issue of information overload has extended into the public health
surveillance arena. More specifically, the increased availability and frequency of infectious
disease data from electronic sources has made it burdensome for organizations to respond to disease
outbreaks quickly and accurately (10). In 2006, the HealthMap project was created primarily to
mitigate “information overload” among public health organizations so that they could effectively
monitor global infectious diseases (10).
Big Data
In 2014, researchers, governments, corporations, organizations, and individuals have access
to almost unfathomable amounts of data relating to human behavior, communication, and health
(just to name a few). This influx of available data has been made possible by the growth of
the Internet and the evolution of the World Wide Web.
In 2004, 14.2 % of the global population had access to the Internet, compared to 35.5 % in
2012 (11). The World Wide Web has matured significantly over the last decade from a
communication medium of passive content distribution (Web 1.0) to a platform of interactive
user collaboration and user-generated content (Web 2.0). Today’s Web is made up of numerous
online social networks like Twitter, Facebook, LinkedIn, Pinterest, and Tumblr (among many
others). Such an influx of user-generated content has been accompanied by advancements in
computer hardware and software.
New disciplines within mathematics and computer science (e.g., data mining, natural
language processing, and machine learning) have emerged to study such large amounts of data
in order to infer trends and relationships among potential variables. A fascination with the
utility of “big data” has bled into many disciplines, including public health. Consider some of the
buzzwords: mHealth (12), eHealth (13, 14), infodemiology (15, 16), and infoveillance (16).
Purpose
The primary purposes of this paper are to (1) explore the evolution and complexities
associated with disease surveillance, and (2) consider the utility and implications of these methods for
the capacity of organizations to promptly and effectively respond to infectious disease threats.
Additionally, this paper explores emerging novel syndromic surveillance systems,
including the analytical evaluation of a novel global Twitter-based influenza surveillance system
to determine how closely these Twitter data correlate with traditional weekly influenza reports
among several English-speaking countries.
Influenza
Influenza (or the “flu”) is a contagious acute viral infection that primarily affects the
bronchi, throat, and occasionally the lungs, with an incubation period of about 2 days (17). With
such a relatively short time between infection and onset, infected individuals will often quickly
develop symptoms of fever, cough, sore throat, runny/stuffy nose, muscle aches, and general
malaise and fatigue (18).
Although influenza affects all age groups (19-21), influenza-related mortality is
more likely among population groups at higher risk, including children younger than
2 years, the elderly (65+ years), and those who have preexisting chronic conditions (e.g., chronic
lung disease, heart disease, asthma, diabetes, weakened immune systems, and morbid obesity)
(22-26).
Disease Burden
Although most people with the flu will recover within a few days to less than 2 weeks,
some will develop life-threatening complications from the flu including pneumonia and
worsening of preexisting conditions (26). Although it is sometimes difficult to identify influenza-
related deaths, it is estimated that seasonal influenza accounts for about 250,000 to 500,000
deaths globally each year (17). In addition to flu-related deaths, influenza epidemics and pandemics
decrease worker productivity and economic output, while generating considerable costs for
necessary treatment and prevention interventions each year.
Disease Transmission
Influenza is spread primarily through person-to-person contact. The virus is believed to
spread through droplets or aerosols created when infected individuals cough, sneeze, or talk,
which can reach others up to 6 feet away (27). Influenza often spreads efficiently and quickly through villages, cities,
schools, and other areas where human-to-human contact is likely (17).
Seasonal vs. Pandemic Influenza
In temperate areas, influenza typically occurs annually as regional or national epidemics
(28). This yearly emergence of seasonal influenza is due primarily to constant antigenic drift of
influenza viruses (29, 30). In addition to annual or seasonal influenza epidemics, global flu
pandemics occur rarely, when novel influenza A viruses emerge (31). Examples of such global
influenza pandemics include the 1918 Spanish Influenza, the 1957 Asian Influenza, the 1968 Hong
Kong Influenza, and the 2009 H1N1 Influenza (32).
Disease Surveillance
Traditional Methods
Timely and accurate surveillance of seasonal and pandemic flu trends is a crucial tool for
global, national, state, and local organizations to better understand key
epidemiological and virological aspects of a pandemic. An accurate understanding of these
aspects of a pandemic better enables organizations to determine allocation of key resources (e.g.
vaccinations and antivirals), how to communicate to the public, or whether restrictions on travel
should be implemented. The World Health Organization identifies the primary goal of global
influenza surveillance “to develop a global picture of the event through sharing and analysis of
information provided by individual countries” (2).
Traditional influenza surveillance methodologies typically refer to virological
surveillance, or identification of influenza specimen strains in a laboratory setting. In the United
States, virological surveillance is primarily reported through FluView, which includes laboratory
data from 85 WHO Collaborating Laboratories in the US as well as data from the 60 laboratories
in the US that make up the National Respiratory and Enteric Virus Surveillance System
(NREVSS) (33). Additionally, the WHO Global Influenza Surveillance Network (GISN) has
been in use for over 60 years and comprises over 131 National Influenza Centers (NICs) in 105
countries (2).
Syndromic Surveillance
In contrast to traditional methods of virological surveillance, nontraditional syndromic-
based surveillance methods have become more widely implemented and developed over the last
decade due to a greater “atmosphere of concern” after the terrorist attacks of September 11,
2001 (34, 35). Syndromic surveillance systems primarily implement various statistical analyses of
data that relate to individual behavioral patterns that indicate or suggest influenza infection (36).
Such behavioral indicators may include various data points such as healthcare visits, drug
purchases, or work absence.
Syndromic surveillance in the US is primarily conducted via the U.S. Outpatient Influenza-
like Illness Surveillance Network (ILINet). ILINet consists of about 2900 sentinel outpatient
healthcare providers that send weekly reports to the CDC with information on the number of
patient visits who meet the case definition of having an Influenza-like Illness (ILI) (37).
States also have their own methods of surveillance, and some are implementing more novel
web-based approaches. For example, the Maryland Department of Health and Mental Hygiene
has recently implemented a new online tracking survey called the Maryland Resident Influenza
Tracking Survey (MRITS) (38). MRITS is intended to complement information from sentinel
providers to help identify ILI prevalence through an internet survey of residents.
Early Warning Capability
The practices and aims of virological and syndromic influenza surveillance are perhaps
better understood in terms of medical screening and testing. Syndromic surveillance as a
screening tool aims to provide easily accessible and timely data, which could then be verified
through virological testing. Generally, syndromic surveillance data should be available as close
to real-time as possible, as the primary function of syndromic surveillance is to provide early
warning of potential novel outbreaks (36). As with all medical tests, there is a tradeoff that must
be made between the specificity and sensitivity of such tests. In general, practitioners determine
to what extent their screening methodology should be prone to type 1 statistical errors, that is,
false alarms. This determination is made carefully based on the purpose of the surveillance
system. Protocols have been developed by the CDC and the Institute of Medicine to assess the
quality and effectiveness of syndromic surveillance systems (39, 40).
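The sensitivity and specificity tradeoff described above can be illustrated with a small calculation. The following Python sketch is illustrative only: the weekly signal values, the outbreak labels, and the two alert thresholds are invented for the example and are not drawn from any surveillance system discussed in this paper.

```python
def screening_metrics(signal, outbreak, threshold):
    """Return (sensitivity, specificity) when an alert is raised for signal >= threshold."""
    tp = fp = tn = fn = 0
    for value, is_outbreak in zip(signal, outbreak):
        alert = value >= threshold
        if alert and is_outbreak:
            tp += 1
        elif alert and not is_outbreak:
            fp += 1  # false alarm (type 1 error)
        elif not alert and is_outbreak:
            fn += 1  # missed outbreak week (type 2 error)
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical weekly syndromic signal and hypothetical "true" outbreak weeks.
signal = [1.0, 1.2, 2.5, 3.1, 4.0, 2.2, 1.1, 0.9]
outbreak = [False, False, False, True, True, True, False, False]

low = screening_metrics(signal, outbreak, threshold=2.0)   # more alerts
high = screening_metrics(signal, outbreak, threshold=3.0)  # fewer alerts
print(low, high)
```

With these invented values, the lower threshold catches every outbreak week at the cost of one false alarm, while the higher threshold raises no false alarms but misses one outbreak week.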
Post 9/11 and Bioterrorism. The terrorist attacks of September 11, 2001 undoubtedly
mark the beginning of the emergence of new policies and actions by governments to protect
against potential vulnerabilities for terrorist attacks. In particular, the threat of bioterrorism led
the United States government to further the design and implementation of automated, electronic
surveillance systems that could provide early warnings of emerging epidemics. This new
post-9/11 environment placed a “premium on timeliness,” which led to an “emphasis on
automation of the full cycle of surveillance” (41).
The US military has maintained a laboratory-based global influenza surveillance program
since 1976, initially under a program of the United States Air Force (42). In 1997, this program
was expanded to include all US military services under the Department of Defense Global
Emerging Infections Surveillance and Response System (GEIS), which monitors various
infectious disease outbreaks throughout the world (43). In February 2012, the Department of
Defense (DOD) reorganized the GEIS under the newly established Armed Forces Health
Surveillance Center (AFHSC), which is currently designated as the primary source for all DOD-
level surveillance data (44). The GEIS is a member of the WHO Global Outbreak Alert and Response
Network (GOARN), which comprises various international institutions through which weekly
privileged public health related alerts can be shared (45).
Novel Syndromic Systems
In 2001, the DOD began implementation of their first version of the Electronic
Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE I),
which provided syndromic surveillance for US active military (46). Because of increased
concern for bioterrorism, an automated surveillance reporting mechanism was implemented with
ESSENCE I in 2002 (35).
Since ESSENCE I, multiple newer versions of the syndromic surveillance system have been
developed and tailored for various civilian areas and military facilities (47). ESSENCE II is a
system jointly developed by the Johns Hopkins University Applied Physics Laboratory in
conjunction with the Maryland Department of Health and Mental Hygiene, the District of
Columbia Department of Health, and the Virginia Department of Health (47). ESSENCE II began
incorporating other secondary data sources related to disease transmission into more innovative
statistical and computational models.
GEIS began researching and developing additional novel syndromic surveillance methods
with ESSENCE II. As a part of this development, the Bio-ALIRT Biosurveillance Detection
Algorithm was developed by the Defense Advanced Research Projects Agency (DARPA). Other
contractors involved include Johns Hopkins University Applied Physics Laboratory, the Walter
Reed Army Institute of Research, the University of Pittsburgh/Carnegie Mellon University, the
General Dynamics Advanced Information Systems, the Stanford University Medical Informatics
group, the Potomac Institute, CDC, and the IBM corporation (48).
Additionally, GEIS introduced BioWar into their systems, which is a computer simulation
by Carnegie Mellon University that models disease transmission using social networks,
communication media, weather models, and other nontraditional data sources (49).
Web-based Surveillance. In addition to the collection of disease surveillance data from
sentinel outpatient providers, there has been increased interest in the use of social media and
other Web-based sources for disease surveillance. Traditional disease surveillance methods rely
on the aggregation of data sourced from actual clinical observations, which is comparatively
time-consuming and expensive (50). Alternatively, novel web-based surveillance platforms have
the potential to provide more cost-effective, real-time
surveillance and aggregation of disease-specific data from around the world.
Notable Projects. HealthMap is an internet-based service that collects and combines
disease outbreak data from various information sources. Since 2006, the HealthMap project
has attempted to automate the process of data query, filtering, integration, and visualization of various
web-based reports of disease outbreaks (10). Examples of the project’s sources for
aggregation include news reports (via the Google News aggregation service), official alerts and
announcements (e.g., WHO, CDC), and “expert-curated” accounts that report
disease outbreaks via the ProMED Mail service (10).
Flu Near You uses elements of crowd-sourcing to estimate and communicate local ILI
activity. Flu Near You was created as a partnership between HealthMap at Boston Children’s
Hospital, the American Public Health Association (APHA), and the Skoll Global Threats Fund
(skollglobalthreats.org). Flu Near You estimates local influenza infection rates through self-
reported symptomatic data from its users. Users must be at least 13 years old and reside in
either the United States or Canada. Additionally, Flu Near You aggregates and visualizes
regional ILI activity from user self-reported data, CDC weekly flu activity reports, and Google
Flu Trends ILI detection.
Twitter. Social media platforms like Twitter have enabled people to share concise
messages about their thoughts, opinions, and feelings that often relate to their personal lives.
Twitter is a prevalent microblogging service where users can post short status updates (or tweets)
of 140 characters or fewer. On Twitter, users are able to follow other users, which enables
followers to receive notifications and status updates of those whom they follow. Because tweets
are public, a user who is followed by another user does not have to reciprocate, or confirm, the
connection, as is the case with other social media services like Facebook. Jansen et al.
characterize microblogging platforms like Twitter by the following characteristics: (1) short
messages or status updates, (2) instantaneous or real-time publication of messages, and (3) a
subscription component for users to receive status updates of other users (51). In addition to
these 3 components, a fourth characteristic that should be included is that of the dissemination of
messages through various devices, platforms, and applications. This cross-platform
interoperability component is enabled primarily by the provision of an application programming
interface (API), which enables computer programmers to develop software that can easily
exchange data directly with a service provider like Twitter. The Twitter APIs not only promote
the adoption of the Twitter platform on other platforms like mobile devices and other web
services, but also enable researchers and other third parties to easily access status updates and
other metadata on a large scale. Public access to tweets, coupled with the Twitter
APIs, enables researchers to quickly and cheaply collect large samples of conversational data.
The information contained in tweets often provides relevant real-time insight and
information that relates to the larger contexts beyond the individual (50). Researchers have used
Twitter data in a variety of applications to infer global trends and events, including political
opinion (52) and earthquake monitoring (53). Social media sources like Twitter often contain
geospatial components that could prove useful in tracking health conditions in various locales
where outpatient provider data may be sparse.
Locale-specific information about users is often tracked and calculated by services for
various marketing and advertising objectives regardless of user input or awareness. At the most
granular level, the geographic details of social media users can be directly provided by the users
in the form of latitude and longitude from GPS-equipped mobile devices. Users may be unaware
that they are directly submitting or publishing data that contain geospatial elements. Additionally,
users may perceive that services provide added value (e.g., traffic/commute times, location of
nearby events). Users may also provide geospatial data in order to help others (examples include
Waze for traffic or Mr. Checkpoint for location of DUI checkpoints). Social media services
often approximate users’ locations through a combination of various meta-data even if the
location is not provided by the mobile device. Geospatial data can be gleaned from users by
approximating location through users’ Internet Protocol addresses, phone numbers, and search
queries.
Analysis of Twitter Data
Twitter Surveillance System
Broniatowski et al. have developed and implemented an automated software platform to
query, filter, and integrate Twitter conversational data that estimates ILI prevalence globally
(54). Broniatowski et al. demonstrated that their platform’s estimates of ILI during the 2012-
2013 flu season were strongly correlated with CDC weekly surveillance data reports for the
United States (r = 0.93, p < 0.001) (54). Although Broniatowski et al. have reported how well
their Twitter-based surveillance method performs within the United States, it is currently
unknown how well their methodology will perform in other countries and locales. Furthermore,
it is unknown how the Twitter platform will perform with non-English conversational data.
Research Objectives
The primary objective of this study is to analyze the performance of the Broniatowski et al.
platform for several different English-speaking countries. For this study we included the United
States as a baseline, and we selected the four countries of the UK as additional English-speaking
countries to assess the Twitter platform’s performance outside the United States. We included English-
speaking countries only, so that we could better assess the impact on performance by location.
This study extends the previous performance estimates by including weekly estimates over
3 flu seasons between August 15, 2011 and January 5, 2014. The following research questions
are addressed:
(1) How do the global weekly ILI estimates from the proposed Twitter surveillance
platform correlate with national influenza-like illness (ILI) incidence estimates as reported
weekly by the government surveillance networks in the United States, England, Scotland,
Wales, and Northern Ireland?
(2) How does performance of ILI estimation via Tweets differ by country and year?
Methods
The proposed platform implements a supervised classification model that determines
whether a Tweet indicates infection rather than just concern or discussion of influenza
symptoms. Broniatowski et al. couple this classification model with a specialized geolocation
system to infer influenza prevalence parameters. The details of this inference platform have been
described previously (54, 55).
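To make the distinction between "infection" and "concern" tweets concrete, the sketch below trains a very small multinomial naive Bayes classifier with add-one smoothing. This is a stand-in chosen for brevity, not the model used by Broniatowski et al., whose details are given in (54, 55). The training tweets, the label names, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical hand-labeled tweets, invented for illustration only.
TRAINING = [
    ("i have the flu", "infection"),
    ("got a fever and cough today", "infection"),
    ("sick in bed with flu", "infection"),
    ("worried about the flu season", "concern"),
    ("get your flu shot", "concern"),
    ("flu news is scary", "concern"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.split():
            count = word_counts[label][word]  # Counter returns 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("i have a fever", model))
print(classify("worried about flu shot", model))
```

A production classifier would need far more training data and richer features; the point here is only that the word "flu" alone does not decide the label, since it appears in both classes.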
Data Collection
Broniatowski et al. use the Twitter API to access and download real-time “streams” of
public conversational data. Data were downloaded via 2 separate streams: a “general” stream that
is a random sample of all Tweets and a “health” stream that only downloads Tweets that mention
predefined health-related terms (54). The general stream sample is a random representative
sample of 1% of all tweets, while the health stream sample represents 1% of all Tweets that
include the health terms. This stratified sampling method makes it possible to normalize
influenza prevalence estimation (strata are Tweets that mention health terms, and all Tweets).
Tweets that indicate ILI infection are identified and coded by a complex, automated computer
algorithm. Additionally, Broniatowski et al. used human readers to code a small portion of
Tweets as a means of data validation and cross-checking with the computational method. After
filtering both streams by location, the influenza prevalence among the health stream Tweets is
then normalized by the proportion of all Tweets by location from the general stream. For this
study, the totals of all Tweets collected were not available; instead, daily ILI estimates
generated by the Broniatowski et al. procedure were provided.
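The normalization step can be sketched as a per-location ratio. The exact formula used by Broniatowski et al. is not reproduced in this paper, so the simple ratio below is an assumption, and the function name `normalized_ili_rate` and all counts are hypothetical.

```python
def normalized_ili_rate(ili_health_tweets, general_tweets):
    """Per-location ratio of ILI-coded health-stream tweets to general-stream tweets.

    Assumption: normalization is a simple ratio of the two sampled counts.
    """
    rates = {}
    for location, ili_count in ili_health_tweets.items():
        total = general_tweets.get(location, 0)
        if total == 0:
            rates[location] = None  # no general-stream volume, so the rate is undefined
        else:
            rates[location] = ili_count / total
    return rates


# Hypothetical weekly counts after filtering both streams by location.
ili_health_tweets = {"US": 450, "England": 60, "Wales": 3}
general_tweets = {"US": 90000, "England": 15000}

rates = normalized_ili_rate(ili_health_tweets, general_tweets)
print(rates)
```

Dividing by general-stream volume keeps a location with many Twitter users from appearing to have more influenza simply because it produces more tweets overall.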
I collected government-reported weekly ILI estimates for the various countries
electronically, primarily through the various official online data portals. All data were
publicly available and contained no personal identifiers and were therefore not subject to
Institutional Review Board (IRB) consideration according to the National Human Research
Protections Advisory Committee (NHRPAC) (56) and the Johns Hopkins University IRB (57).
All data retrieved were regional and country estimates that contained no individual-specific
information. Data were accessed during January 2014. For the United States, online archives of
the CDC’s FluView were accessed, which is a weekly report prepared by the CDC’s Influenza
Division, using data from ILINet (58). ILI estimates through ILINet are determined each week
by the percentage of outpatient visits that are due to an influenza-like illness. ILI is
symptomatically defined by ILINet as patients having fever (100 degrees Fahrenheit or more)
and cough and/or sore throat (37). I retrieved and stored weekly ILI estimates from week 35 of
2009 through Week 2 of 2014.
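The ILINet case definition and the weekly percentage described above can be written out directly. The following is a minimal sketch: the function names and the visit records are hypothetical, and real ILINet reporting aggregates counts from many providers rather than individual visit rows.

```python
def meets_ili_definition(temp_f, cough, sore_throat):
    """Case definition as stated above: fever of 100 degrees F or more, plus cough and/or sore throat."""
    return temp_f >= 100.0 and (cough or sore_throat)


def weekly_ili_percent(visits):
    """Percentage of one week's outpatient visits that meet the ILI definition."""
    if not visits:
        return None  # no visits reported, so the percentage is undefined
    ili_visits = sum(1 for visit in visits if meets_ili_definition(*visit))
    return 100.0 * ili_visits / len(visits)


# Hypothetical visits: (temperature in degrees F, cough, sore throat)
visits = [
    (101.2, True, False),   # ILI: fever and cough
    (99.1, True, True),     # not ILI: no fever
    (100.0, False, True),   # ILI: fever at the cutoff and sore throat
    (102.5, False, False),  # not ILI: fever without cough or sore throat
]

print(weekly_ili_percent(visits))
```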
Weekly reporting periods for the United States are from Sunday through Saturday, whereas
the ISO 8601 week is Monday through Sunday. Because the UK reports conform to the ISO
Monday-Sunday week, I grouped daily Twitter estimates into weeks according to the standard used by
each country's government reports. This caveat is important, especially if one tries to compare weekly ILI
reports and predictions between various countries. The weekly periods used in the analysis and
results are based on the standard used by each specific country. In this case, weeks for both US
government and Twitter estimates are from Sunday through Saturday, whereas weeks for the
other countries are Monday through Sunday. Because government-reported data only provide
weekly totals of ILI, it was not possible to standardize data by time across countries.
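Grouping daily estimates under the two week conventions can be sketched as follows. This paper does not state how daily Twitter estimates were combined within a week, so averaging is an assumption here; the function names and the daily values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day, convention):
    """Return the first day of the reporting week that contains `day`.

    "iso": weeks run Monday through Sunday.
    "us":  weeks run Sunday through Saturday.
    """
    if convention == "iso":
        offset = day.weekday()            # Monday == 0
    elif convention == "us":
        offset = (day.weekday() + 1) % 7  # Sunday == 0
    else:
        raise ValueError("unknown convention: " + convention)
    return day - timedelta(days=offset)


def weekly_means(daily, convention):
    """Average the daily estimates within each reporting week (assumed aggregation)."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[week_start(day, convention)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical daily estimates for Saturday 5 January through Tuesday 8 January 2013.
daily = {
    date(2013, 1, 5): 0.010,
    date(2013, 1, 6): 0.020,
    date(2013, 1, 7): 0.030,
    date(2013, 1, 8): 0.040,
}

us_weeks = weekly_means(daily, "us")
iso_weeks = weekly_means(daily, "iso")
print(us_weeks)
print(iso_weeks)
```

The same four days fall into different weeks under the two conventions: Sunday 6 January opens a new US week but closes an ISO week, which is why weekly figures cannot be aligned exactly across countries.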
For the United Kingdom, I collected weekly ILI estimates for each country from the Public
Health England (PHE) Influenza surveillance network reports from week 40 of 2010 through
week 7 of 2014. Weekly ILI estimates from PHE are determined from clinical data by general
practitioners (GPs). In the UK, GP clinical data are provided weekly to PHE by the various GP
networks for each country. For England, these data come from the Royal College of General
Practitioners (RCGP). The RCGP Weekly Returns Service has been run by the RCGP since 1966
(59). Each country within the UK uses a slightly different scheme for GP-based surveillance,
which creates additional challenges in normalizing data between countries. For example, the
RCGP England and NHS Wales include only primary or first-time consultations, while Health
Protection (HP) Scotland includes repeat consultations (59-61). Because we cannot meaningfully
differentiate between first visits and follow-up visits in Scotland, we cannot normalize the data
with other countries in the UK. Additionally, PHE notes that health-seeking behaviors differ
among populations; specifically, according to reports from the Northern Ireland Department of
Health, Social Services and Public Safety (PHA Ireland), those in Northern Ireland go to a GP
more often than those in England (59, 62). Furthermore, the various national surveillance
systems define flu activity differently: RCGP England and HP Scotland use ILI, NPHS Wales
uses influenza, while PHA Northern Ireland combines influenza and ILI (59).
Results
Overall Correlation. For the entire period October 2011 through December 2013, the
correlation of Twitter and government ILI estimates in the United States was 0.84 (p < 0.001).
Twitter ILI correlation was considerably lower in the UK, with a correlation of 0.41 for England
(p < 0.001), 0.37 for Scotland (p < 0.001), 0.35 for Wales (p = 0.001), and 0.36 for
Northern Ireland (p < 0.001).
Yearly. The Twitter algorithm performed better each year in the US with a correlation of
0.71 during 2011-2012 (p<0.001), 0.89 in 2012-2013 (p<0.001) and 0.93 in 2013 (p = 0.001).
The Twitter algorithm performed the best during the 2013 time period in the United States
compared with all other years and countries (r = 0.93).
Among all other countries, the Twitter algorithm performed worse with each subsequent
time period. Additionally, for the 2013 time period Pearson’s correlation coefficient was found
not to be significant at α = 0.05 for all countries besides the United States. Outside of the United
States, the Twitter algorithm performed best in Scotland during 2011-2012 (r = 0.59).
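For reference, Pearson’s correlation coefficient between two weekly series can be computed as in the sketch below. The two series are invented and are not the study data, and the sketch does not compute p-values, which would require a statistical library or a t-distribution calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly estimates for five weeks.
twitter = [1.0, 2.0, 3.0, 4.0, 5.0]
government = [2.1, 3.9, 6.2, 7.8, 10.0]

r = pearson_r(twitter, government)
print(round(r, 3))
```

Because the coefficient is unaffected by linear rescaling, a high r does not imply that the two series agree in magnitude, only that they rise and fall together.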
Time Series Graphical Visualization. Both weekly Twitter and Government estimates
are plotted over the three periods used previously as well as over the entire 2011-2014 dataset.
A visualization of the entire series from 2011 to 2014 encompasses three separate flu seasons
and gives a visual overview of the performance of both surveillance methods over time.
Figures 1 and 2 cover the entire time period (combined). Figure 1 plots estimates from both
Twitter and government for the United States, while Figure 2 contains similar plots for each
country of the United Kingdom in one figure. The combined overview helps describe
seasonal trends and lags between surveillance methods.
In the United States, Twitter ILI estimates are generally higher than the CDC estimates
for both 2010 and 2013 flu seasons. During times of lower flu activity (off season), Twitter
estimates are higher in the United States. In the UK, Twitter estimates are generally higher than
government estimates during the off seasons as well. Also, during the 2011-2012 and
2013-2014 flu seasons, Twitter estimates are higher in UK countries.
As is expected with this data, all of the time series exhibit seasonality; however, the
strength of yearly seasonality does vary among countries and time periods (especially for UK
countries in 2011-2012 and 2013-2014). During flu seasons in 2011-2012 and 2013-2014 in
the UK, Twitter estimates are considerably higher and more apparent than government reports.
Government estimates lag behind Twitter estimates for all countries in the UK during all time
periods.
In the United States, however, there is much less evidence of lag: a small lag is apparent in
2011-2012 and 2013-2014, and there appears to be little or no lag during 2012-2013. For the
2012-2013 season in the United States, CDC and Twitter estimates appear closely correlated.
Conversely, Twitter and Government estimates for countries in the UK indicate a substantial lag
between surveillance methods of at least 10 weeks. This lag is most evident in the 2012-2013 flu
season, which was the season with the most influenza activity among all countries.
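One way to quantify the lag described above, rather than reading it from the plots, is to shift one series against the other and find the shift that gives the highest correlation. The sketch below is illustrative only: the function `best_lag`, the `max_lag` parameter, and the toy series are assumptions introduced here and were not part of this study's analysis.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(leading, lagging, max_lag):
    """Return (lag, r) for the shift, in weeks, at which `lagging`
    correlates most strongly with `leading`."""
    best = None
    for lag in range(max_lag + 1):
        # Pair leading[t] with lagging[t + lag].
        a = leading[: len(leading) - lag]
        b = lagging[lag:]
        n = min(len(a), len(b))
        if n < 3:
            break  # too few overlapping weeks to correlate
        r = pearson_r(a[:n], b[:n])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Toy data: the "government" series repeats the "twitter" series two weeks later.
twitter = [1, 1, 2, 4, 7, 4, 2, 1, 1, 1, 1, 1]
government = [1, 1, 1, 1, 2, 4, 7, 4, 2, 1, 1, 1]

lag, r = best_lag(twitter, government, max_lag=4)
print(f"best lag = {lag} weeks (r = {r:.2f})")
```

A lag estimated this way is still subject to the confounders of reporting practice, so it describes the series rather than explaining the delay.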
Discussion
The Twitter ILI surveillance method performed much better in the US in terms of its
correlation with CDC weekly reports. This is expected, primarily because the computational
model and methods were developed and trained to predict CDC ILI. The CDC ILI estimates
also differ from those of other countries. As mentioned previously, the CDC’s ILI case
definition is slightly different from those of the WHO and the UK, and the CDC does
not use the ISO week reporting period like most other countries. The lack of uniformity in data
collection, evaluation, and distribution among the various countries is problematic for
assessing infectious disease activity at a global level. It is particularly
concerning, however, that the Twitter method generally overestimates CDC reports, especially
during off seasons and during times of increasing ILI rates at the start of an outbreak. It is
particularly challenging to develop a computational method that can differentiate between tweets
from infected individuals and tweets from those who are merely discussing infection (i.e., people
talking about flu activity who are not infected). It makes sense that chatter about flu would be
higher during times of increased infection rates.
The comparatively low correlation among UK countries does demonstrate the difficulty
and challenges in implementing and assessing novel computational instruments such as the
Twitter-based method used for this study. Currently, various countries and organizations use
differing criteria and methodologies for disease surveillance. Because of differing collection and
coding methods of these countries’ national health organizations, it is difficult to estimate and
compare activity between countries. Therefore, we will draw special attention to those areas
where comparisons are rendered ineffective or impossible to make due to this challenge.
Limitations. While this study demonstrates compelling correlations between government-
reported and socially-generated health data, it is not without limitations. Future studies should
apply additional analytic methods to assess the time series data. Seasonality and lag should be
assessed and modeled using more advanced statistical
methods like the Box-Jenkins method, which fits autoregressive moving average
(ARMA) and autoregressive integrated moving average (ARIMA) models to time-series data
(66). Additionally, autocorrelation and partial autocorrelation plots could be used to assess
stationarity and seasonality and to identify the indicated model(s) for the data. While such
statistical tools are interesting, these analyses are beyond the scope of this study. Additionally,
this dataset covers a limited time period overall and includes only partial data for the 2013-2014
season, because the data were accessed in
January 2014. Currently, it is difficult to access and analyze data from various countries. For
example, since January 2014 the UK has changed the format and frequency of influenza reports
(59). Assessing lag is problematic because of potential confounders. Factors that may
influence lag include reporting methods and turnaround times, including whether locales use
internet-based data submission tools. Furthermore, the ILI estimates provided by governments are
adjusted internally before being reported. The CDC will even retrospectively change previously
reported estimates if new laboratory data become available (58).
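To make the autocorrelation check mentioned above concrete, the following Python sketch computes the sample autocorrelation of a weekly series at a chosen lag. It is a minimal illustration under stated assumptions: the function name `autocorrelation` and the synthetic 52-week seasonal series are hypothetical, and fitting an ARMA or ARIMA model would require a dedicated statistics library, which is not shown here.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariance between the series and itself shifted by `lag` weeks.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Synthetic example: three years of weekly values with a 52-week cycle.
series = [2.0 + 1.5 * math.cos(2 * math.pi * week / 52) for week in range(156)]

acf_26 = autocorrelation(series, 26)  # half a cycle apart: strongly negative
acf_52 = autocorrelation(series, 52)  # one full cycle apart: strongly positive
print(f"lag 26: {acf_26:.2f}, lag 52: {acf_52:.2f}")
```

A pronounced positive value near lag 52 alongside a negative value near lag 26 is the pattern one would expect from yearly seasonality in weekly data.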
Influence of Social Factors. There are also significant social, cultural, economic, and
even language considerations that should be assessed among the populations from which the
various surveillance methods sample. One must consider the profiles or potential biases of the
populations sampled by both government and Twitter methods. With each surveillance method,
one should consider which segments of the population are misrepresented. Which groups of
people are these surveillance methods not detecting? How do rates of type II errors (false
negatives, or a failure to detect an infection) vary among population subgroups
for a given surveillance instrument? The ideal surveillance method should have constant type II
error rates across all population subgroups.
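If ground-truth infection status were available for a sample, the comparison of type II error rates across subgroups could be sketched as follows. This is a hypothetical illustration: the subgroup labels, the counts, and the function name `false_negative_rate` are invented for the example and do not come from this study's data.

```python
def false_negative_rate(true_positives, false_negatives):
    """Share of truly infected people that the surveillance method missed."""
    total_infected = true_positives + false_negatives
    if total_infected == 0:
        raise ValueError("no infected individuals in this subgroup")
    return false_negatives / total_infected


# Hypothetical counts of infected people who were detected (tp) or missed (fn).
counts = {
    "urban, insured": {"tp": 80, "fn": 20},
    "rural, uninsured": {"tp": 45, "fn": 55},
}

rates = {
    group: false_negative_rate(c["tp"], c["fn"]) for group, c in counts.items()
}
for group, rate in rates.items():
    print(f"{group}: false negative rate = {rate:.2f}")
```

A large gap between subgroup rates, as in this invented example, would indicate that the instrument systematically misses one group more often than another.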
From a public health intervention standpoint, it is imperative that we better determine the
extent to which these surveillance methods potentially misrepresent key population groups. This
issue is compounded by the potential reality that underrepresented population groups may be at
greater risk for infection and subsequent morbidity and mortality. An accurate assessment of
sampling biases and population characteristics would enable us to better control for these
variations and ultimately ensure that interventional responses are prioritized and delivered to
population groups with the greatest risk.
Government Syndromic Surveillance. For example, people in Northern Ireland are more
likely to go to a general practitioner when they have symptoms of the flu. Because the UK’s
syndromic surveillance methods rely upon ILI reports from Northern Ireland general
practitioners, the government’s ILI estimates for this area tend to be higher than those for other
areas where individuals do not seek care when they have ILI symptoms (59). Because most
government syndromic surveillance methods rely upon reports by sentinel physician networks,
any factors that increase barriers to care would likely increase the likelihood of type II error
among population sub-groups affected by such factors. Those who live in rural areas (or areas
where it is difficult to seek medical care) often have greater barriers to care (76), and they may
be more likely to treat ILI symptoms without visiting a physician. Similarly, individuals without
health insurance and those of lower socio-economic status face greater barriers to care (77), and
they may be less likely to visit a doctor when experiencing ILI symptoms. Furthermore, there is evidence
that people with chronic mental disorders experience significant barriers to primary care access,
which could indicate these individuals would be more likely to be excluded from government-
based surveillance methods (63). Conversely, individuals with pre-existing medical conditions
who receive more frequent medical care (most likely older segments of the population) may in
fact be more likely to go to their doctor when sick. In this particular example, these individuals
would likely be overrepresented through government syndromic surveillance; however, this
group is likely more at risk for infection and greater mortality and morbidity. In this case, this
population subgroup may benefit more from current surveillance methods. A counterexample
would be young children (especially those younger than 2 years) and pregnant women with low
socioeconomic status who have significant barriers to care. Migrant worker families with young
children and pregnant mothers may be less likely to go to a doctor when sick due to a
combination of potential factors (including legal issues, access to transportation, familiarity with
the healthcare system, language barriers, and other socioeconomic and cultural factors) (74, 75).
As with elderly individuals, these groups have a high risk for influenza infection
and complications; however, they are more likely to be underrepresented by government
surveillance.
Future work is needed to integrate other existing datasets that include population-based
data that may be relevant to healthcare access and other risk factors for influenza infection and
complications. Additional research should explore methods that can control for such factors, so
we can more accurately identify disease within the populations and sub-groups from which
various surveillance methods sample.
Twitter Surveillance. In addition to the challenges with traditional syndromic influenza
surveillance methods, it is difficult to describe and characterize the users who tweet about their
own health statuses. While the sensitivity, or recall, of government methods is likely influenced
by access to care, the precision of Twitter methods relies on factors that influence the likelihood of
individuals posting tweets that include individual syndromic information. Regarding Twitter
surveillance, it is important to consider how various factors related to Twitter usage relate to
various population groups and sub-groups globally. In addition to the socio-economic factors
mentioned regarding government surveillance, additional factors should be considered, such as
internet access, social media usage, online communication preferences, and factors that describe
the types of people who share personal syndromic information via Twitter.
It is important to determine how various groups of people are misrepresented with Twitter
surveillance. Lack of internet availability and access among less developed countries is certainly
a concern regarding the ability of a Twitter surveillance method to assess infectious disease
prevalence. Geopolitical factors influence internet access and access to specific web services
including Twitter. Individuals living in some countries may be hesitant to post personal
information on the Web due to political and social pressures. Activity of Twitter users also varies
by continent: North America has the greatest number of active users, while Africa has the
fewest (69). Additionally, there are differences in who uses Twitter based
on simple demographic indicators like age, gender, and race/ethnicity.
There is evidence that Twitter users (in the United States) make up a highly non-uniform
sample of the US population with regard to geography, gender, and race/ethnicity (64). One study
reports that Twitter users in the US are more likely to be men from urban areas (64). In the UK,
younger people between the ages of 18 and 24 are much more likely to use Twitter than other
age groups, while individuals 65 years and older are adopting social media platforms more
frequently, especially Facebook (65). In reality, the demographics and characteristics of Twitter
users are changing, and it is difficult to access demographic information about Twitter users.
Current methods of determining demographics of Twitter users rely on traditional survey
methods using telephone interviews (68). Although Twitter stores detailed information about its
users internally, we are limited to the information the company provides through the Twitter API
(69). Furthermore, the long-term availability of Twitter data is not known. Although
Twitter currently provides access to information, there is a possibility the company could move
towards a more private model like Facebook.
There are likely many other factors that influence the likelihood of a Twitter user to
mention symptomology. There is a need to describe and better understand Twitter users who post
tweets about their own health. For example, one would expect women to be more likely to tweet
about their health, yet in the US, Twitter users are more likely to be younger men from urban
areas. There is evidence men and women use social media sites like Twitter differently (71).
There is evidence that online users discuss and share personal health-related topics, and such
users receive greater social support, emotional support, information support, and sometimes
tangible benefit (72, 73). However, additional research is needed to assess what factors influence
how people discuss personal health symptoms specifically on the Twitter platform. Future
studies and research should seek to control for various characteristics and demographics of
Twitter users, although the Twitter platform makes it difficult (if not impossible) to accurately
describe the characteristics of all Twitter users. The nature of the Twitter platform
makes it challenging to normalize Twitter data because of the difficulty in
characterizing Twitter users. These issues certainly challenge the
generalizability of Twitter as a stand-alone biosurveillance instrument. From a practical
perspective, public health researchers and practitioners should continue to develop methods that
combine traditional syndromic surveillance methods with other novel syndromic methods (67).
For this particular analysis, it is assumed that the government weekly reports are the most
accurate data sources that are readily accessible to the public. Because of this assumption, these
official reports are used as the standard to which the Twitter estimates are compared. There is the
possibility that the novel Twitter surveillance method is more indicative of actual influenza
infection than government reports (70), although this is difficult (if not impossible) to assess on a
global-population level. Government reports may provide a sufficient indication of influenza
outbreak within their specific country; however, because of the varying reporting methods it is
difficult to track infection trends between countries. The challenges of global infectious disease
surveillance are highlighted in how the WHO was unable to respond effectively to the 2009
H1N1 pandemic. A novel passive surveillance method (like the Twitter method) is particularly
promising because it implements a single method across all areas, which bypasses the issue of
various governments’ reporting methods and practices. Further research should address how to
integrate both traditional and novel methods of infectious disease surveillance into a single
instrument that can deliver rapid report packages to key decision makers at global, national,
regional, and community levels. Additional work should focus on creating such a product that
minimizes the potential of information overload, while retaining key information needed for
decision making.
Reflection
This research practicum has been extremely challenging to say the least. Additionally, this
experience has been rewarding and beneficial for me. During the course of my research, I had
to learn and develop new analytic skills to be able to work with time series data of this scale. I
was also able to develop computer and statistical programming skills in order to acquire,
organize, clean, analyze, and visualize data. I was able to use the Python programming
language to develop programs to automate the process of data collection and preparation.
In addition to analytic and technical challenges and growth, I have learned a tremendous
amount about disease surveillance in general and how disease surveillance is changing and
evolving today. It is very exciting to work with large datasets and to assess health conditions
and outcomes on a large population-based scale. In addition to working with large-scale data, it
was even more interesting for me to research and learn how individual stakeholders and groups
integrate such large amounts of data into their specific decisions and interventions at local
levels.
The analysis of the Twitter data highlights very well the difficulty of dealing with large
time-series datasets. It has been difficult for me to fully understand these various datasets,
especially regarding how to make specific interventional decisions at various micro and macro
levels. It was honestly quite reassuring to learn that organizations like the WHO have also
struggled to work through this process in order to make timely and appropriate decisions to
save lives.
Table I. Correlation of Twitter Weekly ILI Estimates by Country and Calendar Year with National Surveillance ILI Reports
Note: Values are Pearson’s product-moment correlation coefficient, r. Coefficients marked with an asterisk were not statistically significant at α = .05.

Country        2011-2014 (Combined)    2011-2012              2012-2013              2013-2014
               10/3/2011 - 12/29/2013  10/3/2011 - 9/30/2012  10/1/2012 - 9/29/2013  9/30/2013 - 12/29/2013
United States  0.84 (p < 0.001)        0.71 (p < 0.001)       0.89 (p < 0.001)       0.93 (p = 0.001)
England        0.41 (p < 0.001)        0.51 (p = 0.0002)      0.48 (p = 0.0004)      -0.18 (p = 0.554)*
Scotland       0.37 (p < 0.001)        0.59 (p < 0.001)       0.54 (p < 0.001)       -0.38 (p = 0.196)*
Wales          0.35 (p = 0.001)        0.54 (p < 0.001)       0.37 (p = 0.0069)      0.33 (p = 0.271)*
N. Ireland     0.36 (p < 0.001)        0.46 (p = 0.0006)      0.44 (p = 0.0014)      -0.35 (p = 0.267)*
Figures
Figure 1-1. United States, 2011 - 2014 Influenza. Weekly influenza (ILI) prevalence (%) by week, 2011w40 to 2014w13. Series: ILI USA (Government), ILI USA (Twitter).
Figure 1-2. United Kingdom, 2011 - 2014 Influenza. Weekly influenza (ILI) prevalence (%) by week, 2011w40 to 2014w13. Series: ILI England, Scotland, Wales, and N. Ireland (Government); ILI England, Scotland, Wales, and N. Ireland (Twitter).
Appendix
Wales, 2011 - 2014 Influenza. Weekly influenza (ILI) prevalence (%) by week, 2011w40 to 2014w13. Series: ILI Wales (Government), ILI Wales (Twitter).
United States, 2011-2012 Influenza Season. Weekly influenza prevalence (%) by week, 2011w31 to 2012w31. Series: ILI USA (Government), ILI USA (Twitter).
United States, 2013-2014 Influenza Season. Weekly influenza prevalence (%) by week, 2013w35 to 2014w13. Series: ILI USA (Government), ILI USA (Twitter).
United States, 2012-2013 Influenza Season. Weekly influenza prevalence (%) and ILI prevalence (% of outpatient visits) by week, 2012w31 to 2013w31. Series: ILI USA (Government), ILI USA (Twitter).
Government ILI estimates over time, 2012w1 to 2014w26 (untitled plot). Series: ILI USA (Government), ILI N. Ireland (Government), ILI England (Government), ILI Wales (Government), ILI Scotland (Government).
References
1. World Health Organization. New influenza A (H1N1) virus infections: Global surveillance
summary, May 2009. 2009.
2. Briand S, Mounts A, Chamberland M. Challenges of global surveillance during an influenza
pandemic. Public Health. 2011;125(5):247-56.
3. World Health Organization. CDC protocol of realtime RTPCR for swine influenza A (H1N1).
2009.
4. Lipsitch M, Riley S, Cauchemez S, Ghani AC, Ferguson NM. Managing and reducing
uncertainty in an emerging influenza pandemic. N Engl J Med. 2009;361(2):112-5.
5. Milord JT, Perry RP. A methodological study of overload. J Gen Psychol. 1977;97(1):131-7.
6. Speier C, Valacich JS, Vessey I. The influence of task interruption on individual decision
making: An information overload perspective. Decision Sciences. 1999;30(2):337-60.
7. Shenk D. Information overload, concept of. Encyclopedia of International Media and
Communications. 2003;2.
8. Hiltz SR, Turoff M. Structuring computer-mediated communication systems to avoid
information overload. Commun ACM. 1985;28(7):680-9.
9. Berghel H. Cyberspace 2000: Dealing with information overload. Commun ACM.
1997;40(2):19-24.
10. Freifeld CC, Mandl KD, Reis BY, Brownstein JS. HealthMap: Global infectious disease
monitoring through automated classification and visualization of internet media reports. J Am
Med Inform Assoc. 2008 Mar-Apr;15(2):150-7.
11. World Bank. Internet users (per 100 people). Data retrieved June 3, 2014, from World
DataBank: World Development Indicators database [Internet]. 2014.
12. Kay M. mHealth: New horizons for health through mobile technologies. World Health
Organization. 2011.
13. Black AD, Car J, Pagliari C, Anandan C, Cresswell K, Bokun T, et al. The impact of eHealth
on the quality and safety of health care: A systematic overview. PLoS medicine.
2011;8(1):e1000387.
14. Eysenbach G, CONSORT-EHEALTH Group. CONSORT-EHEALTH: Improving and
standardizing evaluation reports of web-based and mobile health interventions. J Med Internet
Res. 2011 Dec 31;13(4):e126.
15. Eysenbach G. Infodemiology: Tracking flu-related searches on the web for syndromic
surveillance. AMIA Annu Symp Proc. 2006:244-8.
16. Eysenbach G. Infodemiology and infoveillance: Framework for an emerging set of public
health informatics methods to analyze search, communication and publication behavior on the
internet. J Med Internet Res. 2009 Mar 27;11(1):e11.
17. World Health Organization. Influenza (seasonal) fact sheet no. 211 [Internet]. March 2014.
Available from: http://www.who.int/mediacentre/factsheets/fs211/en/
18. Centers for Disease Control and Prevention. Flu symptoms & severity [Internet]. September
2013. Available from: http://www.cdc.gov/flu/about/disease/symptoms.htm
19. Glezen WP, Greenberg SB, Atmar RL, Piedra PA, Couch RB. Impact of respiratory virus
infections on persons with chronic underlying conditions. JAMA. 2000;283(4):499-505.
20. Glezen WP, Couch RB, MacLean RA, Payne A, Baird JN, Vallbona C, et al. Interpandemic
influenza in the houston area, 1974–76. N Engl J Med. 1978;298(11):587-92.
21. Monto AS, Kioumehr F. The tecumseh study of respiratory illness. IX. occurence of
influenza in the community, 1966--1971. Am J Epidemiol. 1975 Dec;102(6):553-63.
22. Glezen WP. Serious morbidity and mortality associated with influenza epidemics. Epidemiol
Rev. 1982;4:25-44.
23. Monto AS. Influenza: Quantifying morbidity and mortality. Am J Med. 1987;82(6):20-5.
24. Barker WH. Excess pneumonia and influenza associated hospitalization during influenza
epidemics in the united states, 1970-78. Am J Public Health. 1986 Jul;76(7):761-5.
25. Barker WH, Mullooly JP. Impact of epidemic type A influenza in a defined adult population.
Am J Epidemiol. 1980 Dec;112(6):798-811.
26. Centers for Disease Control and Prevention. People at high risk of developing flu-related
complications [Internet]. Available from: http://www.cdc.gov/flu/about/disease/high_risk.htm
27. Centers for Disease Control and Prevention. How flu spreads [Internet]. Available from:
http://www.cdc.gov/flu/about/disease/spread.htm
28. Noble G. Epidemiological and clinical aspects of influenza. In: Beare AS, editor. Basic and
applied influenza research. 1982. p. 11-50.
29. Simonsen L, Clarke MJ, Schonberger LB, Arden NH, Cox NJ, Fukuda K. Pandemic versus
epidemic influenza mortality: A pattern of changing age distribution. J Infect Dis. 1998
Jul;178(1):53-60.
30. Webster RG, Bean WJ, Gorman OT, Chambers TM, Kawaoka Y. Evolution and ecology of
influenza A viruses. Microbiol Rev. 1992 Mar;56(1):152-79.
31. Dolin R. Influenza-interpandemic as well as pandemic disease. N Engl J Med.
2005;353(24):2535.
32. Noble G. Epidemiological and clinical aspects of influenza. 1982.
33. Centers for Disease Control and Prevention. The national respiratory and enteric virus
surveillance system (NREVSS) [Internet]. 2014. Available from:
http://www.cdc.gov/surveillance/nrevss/
34. Henning KJ. What is syndromic surveillance? Morb Mortal Weekly Rep. 2004:7-11.
35. Marsden-Haug N, Foster VB, Gould PL, Elbert E, Wang H, Pavlin JA. Code-based
syndromic surveillance for influenzalike illness by international classification of diseases, ninth
revision. Emerg Infect Dis. 2007 Feb;13(2):207-16.
36. Stoto MA, Schonlau M, Mariano LT. Syndromic surveillance: Is it worth the effort? Chance.
2004;17(1):19-24.
37. Centers for Disease Control and Prevention. Overview of influenza surveillance in the United
States [Internet]. Available from: http://www.cdc.gov/flu/weekly/overview.htm#Viral
38. Maryland Department of Health and Mental Hygiene. Maryland resident influenza tracking
survey [Internet]. Available from: http://flusurvey.dhmh.md.gov/
39. German RR, Lee L, Horan J, Milstein R, Pertowski C, Waller M. Updated guidelines for
evaluating public health surveillance systems. MMWR Recomm Rep. 2001;50(RR-13):1-35.
40. Mallon TM. Progress in implementing recommendations in the national academy of sciences
reports:“Protecting those who serve: Strategies to protect the health of deployed US forces”. Mil
Med. 2011;176(7S):9-16.
41. Buehler JW, Sonricker A, Paladini M, Soper P, Mostashari F. Syndromic surveillance
practice in the united states: Findings from a survey of state, territorial, and selected local health
departments. Advances in Disease Surveillance. 2008;6(3):1-20.
42. Canas LC, Lohman K, Pavlin JA, Endy T, Singh DL, Pandey P, et al. The department of
defense laboratory-based global influenza surveillance system. Mil Med. 2000 Jul;165(7 Suppl
2):52-6.
43. Armed Forces Health Surveillance Center. Global emerging infections surveillance &
response system [Internet]. Available from: http://www.afhsc.mil/geis
44. Department of Defense. Comprehensive health surveillance. 2012. Report No.: DoDD
640.02E.
45. Homeland Security: Improving Public Health Surveillance, Hearing Before the Subcommittee
on Government Reform, House of Representatives of the 108th Congress, 1st Sess. (2003).
46. Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, et al.
Implementing syndromic surveillance: A practical guide informed by the early experience. J Am
Med Inform Assoc. 2004 Mar-Apr;11(2):141-50.
47. Lombardo MJ, Burkom H, Elbert ME, Magruder S, Lewis MSH, Loschen MW, et al. A
systems overview of the electronic surveillance system for the early notification of community-
based epidemics (ESSENCE II). Journal of urban health. 2003;80(1):i32-42.
48. Siegrist D, Pavlin J. Bio-ALIRT biosurveillance detection algorithm evaluation. Morb Mortal
Weekly Rep. 2004:152-8.
49. Carley KM, Altman N, Kaminsky B, Nave D, Yahja A. BioWar: a city-scale multi-agent
network model of weaponized biological attacks. 2004.
50. Dredze M. How social media will change public health. Intelligent Systems, IEEE.
2012;27(4):81-4.
51. Jansen BJ, Zhang M, Sobel K, Chowdury A. Twitter power: Tweets as electronic word of
mouth. J Am Soc Inf Sci Technol. 2009;60(11):2169-88.
52. O'Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: Linking
text sentiment to public opinion time series. ICWSM. 2010;11:122-9.
53. Sakaki T, Okazaki M, Matsuo Y. Earthquake shakes twitter users: Real-time event detection
by social sensors. Proceedings of the 19th international conference on world wide web; ACM;
2010.
54. Broniatowski DA, Paul MJ, Dredze M. National and local influenza surveillance through
twitter: An analysis of the 2012-2013 influenza epidemic. PloS one. 2013;8(12):e83672.
55. Paul MJ, Dredze M. A model for mining public health topics from twitter. HEALTH.
2012;11:16-.
56. National Human Subjects Protection Advisory Committee. Recommendations on public use
data files. Office for Human Research Protection; 2002.
57. JHSPH Institutional Review Board. IRB office preliminary determinations for MPH and other
degree students [Internet]. Available from:
http://www.jhsph.edu/offices-and-services/institutional-review-board/student-projects/other-degree-students.html
58. Centers for Disease Control and Prevention, Influenza Division. FluView: Weekly U.S.
influenza surveillance report [Internet]. Available from:
http://www.cdc.gov/flu/weekly/pastreports.htm
59. Public Health England. Sources of UK flu data: Influenza surveillance in the UK [Internet].
2014. Available from:
https://www.gov.uk/sources-of-uk-flu-data-influenza-surveillance-in-the-uk
60. Public Health Wales. Weekly influenza activity in Wales report [Internet]. 2014.
61. Health Protection Scotland. National influenza report [Internet]. 2014. Available from:
http://www.hps.scot.nhs.uk/resp/influenzareports.aspx
62. Northern Ireland Department of Health, Social Services and Public Safety. Nidirect
[Internet]. 2014. Available from: http://www.dhsspsni.gov.uk/
63. Miller CL, Druss BG, Dombrowski EA, Rosenheck RA. Barriers to primary medical care
among patients at a community mental health center. Psychiatric Services. 2003;54(8):1158-60.
64. Mislove A, Lehmann S, Ahn Y, Onnela J, Rosenquist JN. Understanding the demographics
of twitter users. ICWSM. 2011;11:5th.
65. UK seniors choose Facebook [Internet]. 2013. Available from:
http://www.emarketer.com/Article/UK-Seniors-Choose-Facebook/1010484
66. Anderson OD. Time series analysis and forecasting: The box-jenkins approach. Butterworths
London and Boston; 1976.
67. Wagner, M. M., Espino, J., Tsui, F. C., Gesteland, P., Chapman, W., Ivanov, O., ... &
Hutman, J. (2004). Syndrome and outbreak detection using chief-complaint data—experience of
the Real-Time Outbreak and Disease Surveillance project. Morbidity and Mortality Weekly
Report, 28-31.
68. Duggan, M., & Brenner, J. (2013). The demographics of social media users, 2012 (Vol. 14).
Washington, DC: Pew Research Center's Internet & American Life Project.
69. Java, A., Song, X., Finin, T., & Tseng, B. (2007, August). Why we twitter: understanding
microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD
2007 workshop on Web mining and social network analysis (pp. 56-65). ACM.
70. Aramaki, E., Maskawa, S., & Morita, M. (2011, July). Twitter catches the flu: detecting
influenza epidemics using Twitter. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (pp. 1568-1576). Association for Computational Linguistics.
71. Heil, B., & Piskorski, M. (2009, June 1). New research: Men follow men and nobody tweets.
http://blogs.hbr.org/cs/2009/06/new_twitter_research_men_follo.html
72. Rains, S. A., & Keating, D. M. (2011). The social dimension of blogging about health:
Health blogging, social support, and well-being. Communication Monographs, 78(4), 511-534.
73. Mo, P. K., & Coulson, N. S. (2008). Exploring the communication of social support within
virtual communities: A content analysis of messages posted to an online HIV/AIDS support
group. Cyberpsychology & behavior, 11(3), 371-374.
74. Phillips, K. A., Mayer, M. L., & Aday, L. A. (2000). Barriers to care among racial/ethnic
groups under managed care. Health Affairs, 19(4), 65-75.
75. Ngo‐Metzger, Q., Massagli, M. P., Clarridge, B. R., Manocchia, M., Davis, R. B., Iezzoni, L.
I., & Phillips, R. S. (2003). Linguistic and cultural barriers to care. Journal of general internal
medicine, 18(1), 44-52.
76. Heckman, T. G., Somlai, A. M., Peters, J., Walker, J., Otto-Salaj, L., Galdabini, C. A., &
Kelly, J. A. (1998). Barriers to care among persons living with HIV/AIDS in urban and rural
areas. AIDS care, 10(3), 365-375.
77. Newacheck, P. W., Stoddard, J. J., Hughes, D. C., & Pearl, M. (1998). Health insurance and
access to primary care for children. New England Journal of Medicine, 338(8), 513-519.

Pandemic Response in the Era of Big Data (Prier, 2015)

  • 1. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 1 Pandemic Response in the Era of Big Data: Exploring the Complexities of Global Influenza Surveillance and Information Overload Kyle Prier Johns Hopkins Bloomberg School of Public Health
  • 2. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 2 Pandemic Response in the Era of Big Data: Exploring the Complexities of Global Influenza Surveillance and Information Overload Introduction In March 2009, the Mexican Ministry of Health reported to the World Health Organization (WHO) Global Influenza Surveillance Network (GISN) an unusual increase of Influenza-like Illnesses (ILI) during a period of seasonal outbreak decline (1). This initial deviation reported from Mexican authorities became the basis for an alert to global public health officials of an outbreak of a novel, highly contagious influenza strain, which we now refer to as the 2009 H1N1 pandemic. After the March 2009 report by Mexican authorities of the novel influenza A in Mexico, rapid testing protocols to confirm H1N1 virological were quickly developed. Using GISN specimens of viral strains, within days, the Centers for Disease Control and Prevention (CDC) developed and shared a real-time reverse transcriptase-polymerase chain reaction (RTPCR) protocol that could quickly identify cases of H1N1 (2, 3). These testing protocols were critical to collect data so that officials could respond to the emerging epidemic. By July 2009, after 4 months of the initial report by Mexican authorities, the H1N1 pandemic had infected over 100,000 individuals globally. By July 2009, data were coming in so rapidly, it became too burdensome on a global level to efficiently track and validate cases using the rapid CDC protocol (2). During this time in July 2009, WHO officials resorted to relying upon general qualitative indicators of basic pandemic changes that were communicated via e- mail by country officials (2). Quite literally, at the height of the H1N1 pandemic, the highest level global public health decision makers were asking countries if the H1N1 pandemic was getting better or worse, despite the availability of reported data via existing protocols.
  • 3. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 3 During the height of the pandemic, WHO later determined there was not sufficient time to develop even “preliminary estimates of severity parameters such as the case fatality ratio,” and that these estimates “lagged behind key decision-making and response-planning” (2). The rapid increase of information and data at the global level during the H1N1 pandemic contributed to subsequent negative effects on key decision making, which ultimately negatively impacted the ability of WHO and countries to institute appropriate pharmaceutical and non-pharmaceutical interventions (4). In the H1N1 case, key decision makers became overwhelmed with the rate and amount of data, which effectively muted their organizational ability to most effectively manage the H1N1 pandemic. Information Overload The organizational inability of the WHO and other key decision makers to respond efficiently to the H1N1 pandemic can be attributed to the theory and concept of information overload. Information overload can be described as psychological and organizational phenomenon that occurs when the amount of input of information to a system organization exceeds its processing capacity (5, 6). In describing the concept of information overload, Shenk argues that “at a certain level of input, the law of diminishing returns takes effect” and that the “glut of information” leads to a negative situation that “cultivate[s] stress, confusion and even ignorance” (7). He further argues that information overload leaves us “less cohesive as a society”, and that on an individual level it “diminishes our control over our own lives,” while strengthening the positions of those “already in power” (7). Throughout the emergence and growth of electronic and computer-mediated communication systems, researchers have warned against the threat of information overload among various organizations (8, 9). The issue of information overload has extended into the public health
  • 4. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 4 surveillance arena. More specifically, the increased availability and frequency of infectious disease data from electronic sources has been burdensome for organizations to respond to disease outbreaks quickly and accurately (10). In 2006, the HealthMap project was created primarily to mitigate “information overload” among public health organizations to effectively monitor global infectious diseases (10). Big Data In 2014, researchers, governments, corporations, organizations, and individuals have access to almost unfathomable amounts of data relating to human behavior, communication, and health (just to name a few). The influx in available data has been made possible due to the influence of the Internet and the evolution World Wide Web. In 2004, 14.2 % of the global population had access to the Internet, compared to 35.5 % in 2012 (11). The World Wide Web has matured significantly over the last decade from a communication medium of passive content distribution (Web 1.0) to a platform of interactive user collaboration and user-generated content (Web 2.0). Today’s Web is made up of numerous online social networks like Twitter, Facebook, LinkedIn, Pintrest, and Tumblr (among many others). Such an influx of user-generated content has been accompanied by advancements in computer hardware and software. New disciplines within mathematics and computer science (e.g. Data Mining, Natural Language Processing, and Machine Learning) have emerged to study such large amounts of data in order to induce trends and relationships among potential variables. A fascination with the utility of “big data” has bled into many disciplines, including public health. Consider some of the buzz words: mHealth (12), eHealth (13, 14), infodemiology (15, 16), and infoveillance (16). Purpose
  • 5. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 5 The primary purposes of this paper are (1) explore the evolution and complexities associated with disease surveillance, (2) consider the utility and implications of these methods on the capacity of organizations to promptly and effectively respond to infectious disease threats. Additionally, this paper explores emerging novel syndromic surveillance systems, including the analytical evaluation of a novel global Twitter-based influenza surveillance system to determine how closely these Twitter data correlate with traditional weekly influenza reports among several English-speaking countries. Influenza Influenza (or the “flu”) is a contagious acute viral infection that primarily impacts the brochi, throat, and occasionally the lungs, with an incubation period of about 2 days (17). With such a relatively short time between infection and onset, infected individuals will often quickly develop symptoms of fever, cough, sore throat, runny/stuffy nose, muscle aches, and general malaise and fatigue (18). Although influenza impacts all age groups (19-21), mortality rates of influenza cases are more likely among population groups who are at a higher risk including young children less than 2 years, the elderly (65+ years), and those who have preexisting chronic illnesses (e.g. chronic lung disease, heart disease, asthma, diabetes, weakened immune systems, and morbid obesity) (22-26). The primary purposes of this paper are (1) explore the evolution and complexities associated with disease surveillance, (2) consider the utility and implications of these methods on the capacity of organizations to promptly and effectively respond to infectious disease threats. Additionally, this paper explores emerging novel syndromic surveillance systems, including the analytical evaluation of a novel global Twitter-based influenza surveillance system
  • 6. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 6 to determine how closely these Twitter data correlate with traditional weekly influenza reports among several English-speaking countries. Disease Burden Although most people with the flu will recover within a few days to less than 2 weeks, some will develop life-threatening complications from the flu including pneumonia and worsening of preexisting conditions (26). Although it is sometimes difficult to identify influenza- related mortalities, it is estimated that seasonal influenza accounts for about 250,000 to 500,000 deaths globally (17). In addition to flu related deaths, influenza epidemics and pandemics decrease worker productivity and economic output, while generating considerable costs for necessary treatment and prevention interventions each year. Disease Transmission Influenza is spread primarily from person-to-person contact. The virus is believed to be spread through droplets or aerosols created when infected individuals cough, sneeze, or talk from up to 6 feet away (27). Influenza often spreads efficiently and quickly through villages, cities, schools, and other areas where human-to-human contact is likely (17). Seasonal vs. Pandemic Influenza In temperate areas, influenza typically occurs annually as regional or national epidemics (28). This yearly emergence of seasonal influenza is due primarily to constant antigenic drift of influenza viruses (29, 30). In addition to annual or seasonal influenza infections, global flu pandemics do rarely occur as novel influenza A viruses emerge (31). Examples of such global influenza pandemics include the 1918 Spanish Influenza, a 1957 Asian Influenza, a 1968 Hong Kong Influenza, as well as the 2009 H1N1 Influenza (32). Disease Surveillance
  • 7. PANDEMIC RESPONSE IN THE ERA OF BIG DATA 7 Traditional Methods Timely and accurate surveillance of seasonal and pandemic flu trends is a crucial tool for global, national, state, and local organizations to better understand key important epidemiological and virological aspects of the pandemic. An accurate understanding of these aspects of a pandemic better enables organizations to determine allocation of key resources (e.g. vaccinations and antivirals), how to communicate to the public, or whether restrictions on travel should be implemented. The World Health Organization identifies the primary goal of global influenza surveillance “to develop a global picture of the event through sharing and analysis of information provided by individual countries” (2). Traditional influenza surveillance methodologies typically refer to virological surveillance, or identification of influenza specimen strains in a laboratory setting. In the United States virological surveillance is primarily reported through FluView, which includes laboratory data from 85 WHO Collaborating Laboratories in the US as well as data from the 60 laboratories in the US that make up the National Respiratory and Enteric Virus Surveillance System (NREVSS) (33)) . Additionally, the WHO Global Influenza Surveillance Network (GISN) has been in use for over 60 years and comprises over 131 National Influenza Centers (NICs) in 105 countries (2). Syndromic Surveillance In contrast to traditional methods of virological surveillance, nontraditional syndromic- based surveillance methods have become more widely implemented and developed over the last decade due to a greater “atmosphere of concern” after the terrorist attacks of September 11, 2001(34, 35). Syndromic surveillance systems primarily implement various statistical analyses of data that relate to individual behavioral patterns that indicate or suggest influenza infection (36).
  • 8. Such behavioral indicators may include data points such as healthcare visits, drug purchases, or work absence. Syndromic surveillance in the US is primarily conducted via the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). ILINet consists of about 2900 sentinel outpatient healthcare providers that send weekly reports to the CDC on the number of patient visits that meet the case definition of an Influenza-like Illness (ILI) (37).
States also have their own methods of surveillance, and some are implementing more novel web-based approaches. For example, the Maryland Department of Health and Mental Hygiene has recently implemented a new online tracking survey called the Maryland Resident Influenza Tracking Survey (MRITS) (38). MRITS is intended to complement information from sentinel providers by helping to identify ILI prevalence through an internet survey of residents.
Early Warning Capability
The practices and aims of virological and syndromic influenza surveillance are perhaps better understood in terms of medical screening and testing. Syndromic surveillance, as a screening tool, aims to provide easily accessible and timely data, which can then be verified through virological testing. Generally, syndromic surveillance data should be available as close to real time as possible, because the primary function of syndromic surveillance is to provide early warning of potential novel outbreaks (36). As with all medical tests, there is a tradeoff between specificity and sensitivity. In general, practitioners determine to what extent their screening methodology should be prone to type I statistical errors (false alarms). This determination is made carefully based on the purpose of the surveillance
  • 9. system. Protocols have been developed by the CDC and the Institute of Medicine to assess the quality and effectiveness of syndromic surveillance systems (39, 40).
Post 9/11 and Bioterrorism. The terrorist attacks of September 11, 2001 undoubtedly mark the beginning of new policies and actions by governments to protect against potential vulnerabilities to terrorist attacks. In particular, the threat of bioterrorism led the United States government to further the design and implementation of automated, electronic surveillance systems that could provide early warnings of emerging epidemics. This new post-9/11 environment placed a “premium on timeliness,” which led to an “emphasis on automation of the full cycle of surveillance” (41).
The US military has maintained a laboratory-based global influenza surveillance program since 1976, initially under a program of the United States Air Force (42). In 1997 this program was expanded to include all US military services under the Department of Defense Global Emerging Infections Surveillance and Response System (GEIS), which monitors various infectious disease outbreaks throughout the world (43). In February 2012, the Department of Defense (DOD) reorganized the GEIS under the newly established Armed Forces Health Surveillance Center (AFHSC), which is currently designated as the primary source for all DOD-level surveillance data (44). The GEIS is a member of the WHO Global Outbreak Alert and Response Network (GOARN), which comprises various international institutions through which weekly privileged public health related alerts can be shared (45).
Novel Syndromic Systems
In 2001, the DOD began implementation of the first version of its Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE I),
  • 10. which provided syndromic surveillance for US active military (46). Because of increased concern about bioterrorism, an automated surveillance reporting mechanism was implemented with ESSENCE I in 2002 (35). Since ESSENCE I, multiple newer versions of the syndromic surveillance system have been developed and tailored to various civilian areas and military facilities (47). ESSENCE II is a system developed by the Johns Hopkins University Applied Physics Laboratory in conjunction with the Maryland Department of Health and Mental Hygiene, the District of Columbia Department of Health, and the Virginia Department of Health (47). ESSENCE II began incorporating other secondary data sources related to disease transmission into more innovative statistical and computational models.
GEIS began researching and developing additional novel syndromic surveillance methods with ESSENCE II. As a part of this development, the Bio-ALIRT Biosurveillance Detection Algorithm was developed by the Defense Advanced Research Projects Agency (DARPA). Other contractors involved include Johns Hopkins University Applied Physics Laboratory, the Walter Reed Army Institute of Research, the University of Pittsburgh/Carnegie Mellon University, the General Dynamics Advanced Information Systems, the Stanford University Medical Informatics group, the Potomac Institute, CDC, and the IBM corporation (48). Additionally, GEIS introduced BioWar into its systems, a computer simulation by Carnegie Mellon University that models disease transmission using social networks, communication media, weather models, and other nontraditional data sources (49).
Web-based Surveillance. In addition to the collection of disease surveillance data from sentinel outpatient providers, there has been increased interest in the use of social media and
  • 11. other Web-based sources for disease surveillance. Traditional disease surveillance methods rely on the aggregation of data sourced from actual clinical observations, which is comparatively time-consuming and expensive (50). Alternatively, novel web-based surveillance platforms have the potential to provide more cost-effective, real-time surveillance and aggregation of disease-specific data from around the world.
Notable Projects. HealthMap is an internet-based service that collects and combines disease outbreak data from various information sources. Since 2006 the HealthMap project has attempted to automate the querying, filtering, integration, and visualization of various web-based reports of disease outbreaks (10). Examples of the project’s sources for aggregation include news reports (via the Google News aggregation service), official alerts and announcements (e.g. WHO, CDC), and “expert-curated” accounts that report disease outbreaks via the ProMED Mail service (10).
Flu Near You uses elements of crowd-sourcing to estimate and communicate local ILI activity. Flu Near You was created as a partnership between HealthMap at Boston Children’s Hospital, the American Public Health Association (APHA), and the Skoll Global Threats Fund (skollglobalthreats.org). Flu Near You estimates local influenza infection rates through self-reported symptomatic data from its users. Users must be at least 13 years old and reside in either the United States or Canada. Additionally, Flu Near You aggregates and visualizes regional ILI activity from user self-reported data, CDC weekly flu activity reports, and Google Flu Trends ILI detection.
Twitter. Social media platforms like Twitter have enabled people to share concise messages about their thoughts, opinions, and feelings that often relate to their personal lives.
Twitter is a prevalent microblogging service where users can post short status updates (or tweets)
  • 12. of up to 140 characters. On Twitter, users are able to follow other users, which enables followers to receive notifications and status updates from those whom they follow. Because tweets are public, a user who is followed by another user does not have to reciprocate, or confirm, the connection, as is the case with other social media services like Facebook.
Jansen et al. characterize microblogging platforms like Twitter by the following characteristics: (1) short messages or status updates, (2) instantaneous or real-time publication of messages, and (3) a subscription component for users to receive status updates of other users (51). In addition to these 3 components, a fourth characteristic that should be included is the dissemination of messages through various devices, platforms, and applications. This cross-platform interoperability is enabled primarily by the provision of an application programming interface (API), which enables computer programmers to develop software that can easily exchange data directly with a service provider like Twitter. The Twitter APIs not only promote the adoption of Twitter on other platforms like mobile devices and other web services, but also enable researchers and other third parties to easily access status updates and other metadata on a large scale.
Public access to tweets, coupled with the Twitter APIs, enables researchers to quickly and cheaply collect large samples of conversational data. The information contained in tweets often provides relevant real-time insight into larger contexts beyond the individual (50). Researchers have used Twitter data in a variety of applications to infer global trends and events, including political opinion (52) and earthquake monitoring (53).
Social media sources like Twitter often contain geospatial components that could prove useful in tracking health conditions in various locales where outpatient provider data may be sparse.
  • 13. Locale-specific information about users is often tracked and calculated by services for various marketing and advertising objectives regardless of user input or awareness. At the most granular level, the geographic details of social media users can be directly provided by the users in the form of latitudes and longitudes from GPS-equipped mobile devices. Users may be unaware that they are directly submitting or publishing data that contain geospatial elements. Additionally, users may perceive that services provide added value (e.g. traffic or commute times, location of nearby events). Users may also provide geospatial data in order to help others (examples include Waze for traffic or Mr. Checkpoint for the location of DUI checkpoints). Social media services often approximate users’ locations through a combination of various metadata even if the location is not provided by the mobile device. Geospatial data can be gleaned by approximating location through users’ Internet Protocol addresses, phone numbers, and search queries.
Analysis of Twitter Data
Twitter Surveillance System
Broniatowski et al. have developed and implemented an automated software platform that queries, filters, and integrates Twitter conversational data to estimate ILI prevalence globally (54). Broniatowski et al. demonstrated that their platform’s estimates of ILI during the 2012-2013 flu season were strongly correlated with CDC weekly surveillance data reports for the United States (r = 0.93, p < 0.001) (54). Although Broniatowski et al. have reported how well their Twitter-based surveillance method performs within the United States, it is currently unknown how well their methodology will perform in other countries and locales. Furthermore, it is unknown how the Twitter platform will perform with non-English conversational data.
  • 14. Research Objectives
The primary objective of this study is to analyze the performance of the Broniatowski et al. platform for several different English-speaking countries. For this study we included the United States as a baseline, and we selected the four English-speaking countries of the UK (England, Scotland, Wales, and Northern Ireland) to assess the Twitter platform’s performance outside the United States. We included English-speaking countries only, so that we could better assess the impact of location on performance. This study extends the previous performance estimates by including weekly estimates over 3 flu seasons between August 15, 2011 and January 5, 2014.
The following research questions are addressed: (1) How do the global weekly ILI estimates from the proposed Twitter surveillance platform correlate with national influenza-like illness (ILI) incidence estimates as reported weekly by the government surveillance networks in the United States, England, Scotland, Wales, and Northern Ireland? (2) How does performance of ILI estimation via Tweets differ by country and year?
Methods
The proposed platform implements a supervised classification model that determines whether a Tweet indicates infection rather than just concern about or discussion of influenza symptoms. Broniatowski et al. couple this classification model with a specialized geolocation system to infer influenza prevalence parameters. The details of this inference platform have been described previously (54, 55).
Data Collection
Broniatowski et al. use the Twitter API to access and download real-time “streams” of public conversational data. Data were downloaded via 2 separate streams: a “general” stream that
  • 15. is a random sample of all Tweets and a “health” stream that only downloads Tweets that mention predefined health-related terms (54). The general stream sample is a random representative sample of 1% of all Tweets, while the health stream sample represents 1% of all Tweets that include the health terms. This stratified sampling method makes it possible to normalize influenza prevalence estimation (the strata are Tweets that mention health terms and all Tweets). Tweets that indicate ILI infection are identified and coded by a complex, automated algorithm. Additionally, Broniatowski et al. used human readers to code a small portion of Tweets as a means of validating and cross-checking the computational method. After filtering both streams by location, the influenza prevalence among the health stream Tweets is normalized by the proportion of all Tweets by location from the general stream. For this study, the totals of all Tweets collected were not available; instead, daily ILI estimates generated by the Broniatowski et al. procedure were provided.
I collected government-reported weekly ILI estimates for the various countries electronically, primarily through the official online data portals. All data were publicly available and contained no personal identifiers and were therefore not subject to Institutional Review Board (IRB) consideration according to the National Human Research Protections Advisory Committee (NHRPAC) (56) and the Johns Hopkins University IRB (57). All data retrieved were regional and country estimates that contained no individual-specific information. Data were accessed during January 2014.
For the United States, I accessed online archives of the CDC’s FluView, a weekly report prepared by the CDC’s Influenza Division using data from ILINet (58).
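The normalization of health-stream Tweets by general-stream volume, described earlier in this section, can be illustrated with a minimal sketch. It assumes that per-location counts of ILI-classified Tweets (from the health stream) and of all Tweets (from the general stream) are available. The function name `normalized_ili_rate`, the dictionary layout, and the counts are hypothetical; this is not the Broniatowski et al. estimator, which is described in (54, 55).

```python
def normalized_ili_rate(ili_counts, general_counts):
    """Divide ILI-classified Tweet counts by all-Tweet counts per location.

    Locations with no general-stream Tweets are skipped because the
    ratio is undefined there.
    """
    rates = {}
    for location, total in general_counts.items():
        if total > 0:
            rates[location] = ili_counts.get(location, 0) / total
    return rates


# Hypothetical weekly counts per location (illustrative values only).
ili_counts = {"United States": 120, "England": 30}
general_counts = {"United States": 60000, "England": 10000, "Wales": 0}

rates = normalized_ili_rate(ili_counts, general_counts)
for location, rate in sorted(rates.items()):
    print(f"{location}: {rate:.4f}")
```

Dividing by the general-stream count keeps a location with many Twitter users from appearing to have more influenza simply because it produces more Tweets overall.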
ILI estimates through ILINet are determined each week by the percentage of outpatient visits that are due to an influenza-like illness. ILI is
  • 16. symptomatically defined by ILINet as patients having fever (100 degrees Fahrenheit or more) and cough and/or sore throat (37). I retrieved and stored weekly ILI estimates from week 35 of 2009 through week 2 of 2014. Weekly reporting periods for the United States run from Sunday through Saturday, whereas the ISO 8601 week runs Monday through Sunday. Because the UK reports conform to the ISO Monday-Sunday week, I grouped daily Twitter estimates into weeks according to the standard used by each country's government reports. This caveat is especially important when comparing weekly ILI reports and predictions between countries. The weekly periods used in the analysis and results are based on the standard used by each specific country: weeks for both US government and Twitter estimates run from Sunday through Saturday, whereas weeks for the other countries run from Monday through Sunday. Because government-reported data provide only weekly totals of ILI, it was not possible to standardize data by time across countries.
For the United Kingdom, I collected weekly ILI estimates for each country from the Public Health England (PHE) influenza surveillance network reports from week 40 of 2010 through week 7 of 2014. Weekly ILI estimates from PHE are determined from clinical data from general practitioners (GPs). In the UK, GP clinical data are provided weekly to PHE by the GP networks for each country. For England, these data come from the Royal College of General Practitioners (RCGP), which has run its weekly returns service since 1966 (59). Each country within the UK uses a slightly different scheme for GP-based surveillance, which creates additional challenges in normalizing data between countries. For example, RCGP England and NHS Wales include only primary or first-time consultations, while Health Protection (HP) Scotland includes repeat consultations (59-61).
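The week-alignment step described above (Sunday-Saturday weeks for the United States, ISO Monday-Sunday weeks for the UK) can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the function names `week_start` and `weekly_means` are hypothetical, the daily values are invented, and averaging the daily estimates within a week is an assumption, since the aggregation rule is not specified in the text.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day, convention):
    """Return the first day of the reporting week that contains `day`."""
    if convention == "iso":      # Monday through Sunday (UK reports)
        offset = day.weekday()   # Monday == 0 ... Sunday == 6
    elif convention == "us":     # Sunday through Saturday (CDC reports)
        offset = (day.weekday() + 1) % 7
    else:
        raise ValueError(f"unknown week convention: {convention!r}")
    return day - timedelta(days=offset)


def weekly_means(daily, convention):
    """Average daily estimates within each reporting week (assumed rule)."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[week_start(day, convention)].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}


# Hypothetical daily estimates for Sunday 2014-01-05 through Saturday 2014-01-11.
daily = {date(2014, 1, 5) + timedelta(days=i): float(i + 1) for i in range(7)}

us_weeks = weekly_means(daily, "us")
iso_weeks = weekly_means(daily, "iso")
print(us_weeks)
print(iso_weeks)
```

The same seven days fall into one US reporting week but are split across two ISO weeks, which is why weekly figures from the two conventions cannot be compared date for date.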
Because we cannot meaningfully differentiate between first visits and follow-up visits in Scotland, we cannot normalize the data
  • 17. with other countries in the UK. Additionally, PHE notes that health-seeking behaviors differ among populations. Specifically, according to reports from the Northern Ireland Department of Health, Social Services and Public Safety (PHA Ireland), those in Northern Ireland go to a GP more often than those in England (59, 62). Furthermore, the various national surveillance systems define flu activity differently: RCGP England and HP Scotland use ILI, NPHS Wales uses influenza, while PHA Northern Ireland combines influenza and ILI (59).
Results
Overall Correlation. For the entire period October 2011 through December 2013, the correlation between Twitter and government ILI estimates in the United States was 0.84 (p < 0.001). Correlations were substantially lower in the UK: 0.41 for England (p < 0.001), 0.37 for Scotland (p < 0.001), 0.35 for Wales (p = 0.001), and 0.36 for Northern Ireland (p < 0.001).
Yearly. The Twitter algorithm performed better each year in the US, with a correlation of 0.71 during 2011-2012 (p < 0.001), 0.89 in 2012-2013 (p < 0.001), and 0.93 in 2013 (p = 0.001). Its performance in the United States during the 2013 time period was the best among all years and countries (r = 0.93). In all other countries, the Twitter algorithm performed worse with each subsequent time period. Additionally, for the 2013 time period, Pearson’s correlation coefficient was not significant at α = 0.05 for any country besides the United States. Outside of the United States, the Twitter algorithm performed best in Scotland during 2011-2012 (r = 0.59).
Time Series Graphical Visualization. Both weekly Twitter and government estimates are plotted over the three periods used previously as well as over the entire 2011-2014 dataset.
  • 18. A visualization of the entire series from 2011 to 2014 encompasses three separate flu seasons and gives a visual overview of the performance of both surveillance methods over time. Figures 1 and 2 cover the entire time period (combined). Figure 1 plots estimates from both Twitter and government for the United States, while Figure 2 contains similar plots for each country of the United Kingdom in one figure. The combined overview helps describe seasonal trends and lags between surveillance methods.
In the United States, Twitter ILI estimates are generally higher than the CDC estimates for both 2010 and 2013 flu seasons. During times of lower flu activity (off season), Twitter estimates are higher in the United States. In the UK, Twitter estimates are generally higher than government estimates during the off seasons as well. Also, during the 2011-2012 and 2013-2014 flu seasons, Twitter estimates are higher in UK countries.
As is expected with these data, all of the time series exhibit seasonality; however, the strength of yearly seasonality does vary among countries and time periods (especially for UK countries in 2011-2012 and 2013-2014). During flu seasons in 2011-2012 and 2013-2014 in the UK, Twitter estimates are considerably higher and more pronounced than government reports.
Government estimates lag behind Twitter estimates for all countries in the UK during all time periods. In the United States, however, there is much less evidence of lag in 2011-2012 and 2013-2014, and there appears to be little or no lag during 2012-2013. For the 2012-2013 season in the United States, CDC and Twitter estimates appear closely correlated. Conversely, Twitter and government estimates for countries in the UK indicate a substantial lag between surveillance methods of at least 10 weeks. This lag is most evident in the 2012-2013 flu season, which was the season with the most influenza activity among all countries.
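One simple way to quantify the lag described above is to compute Pearson's correlation between the two weekly series at several week offsets and see which offset gives the highest value. The sketch below is an illustration only and is not the analysis performed in this study, where lag was assessed visually. The function names `pearson` and `lagged_correlations` and the two short series are hypothetical, and both series are assumed to have the same length with one value per week.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlations(twitter, government, max_lag):
    """Correlate twitter[t] with government[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = twitter[:len(twitter) - lag]  # drop the last `lag` weeks
        y = government[lag:]              # drop the first `lag` weeks
        result[lag] = pearson(x, y)
    return result


# Hypothetical weekly ILI estimates: the government series repeats the
# Twitter series two weeks later.
twitter = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
government = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1]

by_lag = lagged_correlations(twitter, government, max_lag=4)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 3))
```

A lag chosen this way is still subject to the confounders discussed under Limitations, such as reporting turnaround times and retrospective revisions of government estimates.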
  • 19. Discussion
The Twitter ILI surveillance method performed much better in the US in terms of its correlation with CDC weekly reports. It is expected that the Twitter estimates would perform better with CDC data in the US, primarily because the computational model and methods were developed and trained to predict CDC ILI. The CDC ILI estimates differ from those of other countries. As mentioned previously, the ILI case definition is slightly different, and the CDC does not use the ISO week reporting period like most other countries. The lack of uniformity in data collection, evaluation, and distribution among the various countries is problematic for implementing and assessing infectious disease surveillance at a global level. For example, the ILI case definition differs slightly among the CDC, WHO, and the UK.
It is particularly concerning, however, that the Twitter method generally overestimates CDC reports, especially during off seasons and during times of increasing ILI rates at the start of an outbreak. It is particularly challenging to develop a computational method that can differentiate between tweets from infected individuals and tweets from those who are merely discussing infection (i.e. people talking about flu activity who are not infected). It makes sense that chatter about flu would be higher during times of increased infection rates.
The comparatively low correlation among UK countries demonstrates the challenges of implementing and assessing novel computational instruments such as the Twitter-based method used for this study. Currently, various countries and organizations use differing criteria and methodologies for disease surveillance. Because of the differing collection and coding methods of these countries’ national health organizations, it is difficult to estimate and compare activity between countries.
Therefore, we will draw special attention to those areas where comparisons are rendered ineffective or impossible to make due to this challenge.
  • 20. Limitations. While this study demonstrates compelling correlations between government-reported and socially generated health data, it is not without limitations. Future studies should apply additional analytic methods to assess the time series data. Seasonality and lag should be assessed and modeled using more advanced statistical methods like the Box-Jenkins method, which fits autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models to time-series data (66). Additionally, autocorrelation and partial autocorrelation plots could be used to assess stationarity and seasonality and to identify the indicated model(s) for the data. While such statistical tools are interesting, these analyses are beyond the scope of this study.
Additionally, this dataset covers a limited time period overall and contains only partial data for the 2013-2014 season, because the data were accessed in January 2014. Currently, it is difficult to access and analyze data from various countries. For example, since January 2014 the UK has changed the format and frequency of influenza reports (59).
Assessing lag is problematic because of potential confounders. Factors that may influence lag include reporting methods and turnaround times, including whether locales use internet-based data submission tools. Furthermore, the ILI estimates provided by governments are adjusted internally before they are reported. The CDC will even retrospectively change previously reported estimates if new laboratory data become available (58).
Influence of Social Factors. There are also significant social, cultural, economic, and even language considerations that should be assessed among the populations from which the various surveillance methods sample.
One must consider the profiles or potential biases of the populations sampled by both government and Twitter methods. With each surveillance method, one should consider which segments of the population are misrepresented. Which groups of
  • 21. people are these surveillance methods not detecting? How do rates of type II errors (false negatives, or a failure to detect infection) vary among population subgroups for a given surveillance instrument? The ideal surveillance method should have a constant type II error rate across all population subgroups. From a public health intervention standpoint, it is imperative that we better determine the extent to which these surveillance methods potentially misrepresent key population groups. This issue is compounded by the possibility that underrepresented population groups may be at greater risk for infection and subsequent morbidity and mortality. An accurate assessment of sampling biases and population characteristics would enable us to better control for these variations and ultimately ensure that interventional responses are prioritized and delivered to population groups with the greatest risk.
Government Syndromic Surveillance. For example, people in Northern Ireland are more likely to go to a general practitioner when they are sick with symptoms of the flu. Because the UK’s syndromic surveillance methods rely upon ILI reports from Northern Ireland general practitioners, the government’s ILI estimates for this area tend to be higher than in other areas where individuals do not seek care when they have ILI symptoms (59). Because most government syndromic surveillance methods rely upon reports by sentinel physician networks, any factors that increase barriers to care would likely increase the likelihood of type II error among the population subgroups affected by such factors. Those who live in rural areas (or areas where it is difficult to seek medical care) often have greater barriers to care (76), and they may be more likely to treat ILI symptoms without visiting a physician.
Similarly, individuals without health insurance and those of lower socio-economic status face greater barriers to care (77), and they may be less likely to visit a doctor when experiencing ILI symptoms. Furthermore, there is evidence
  • 22. that people with chronic mental disorders experience significant barriers to primary care access, which could indicate these individuals would be more likely to be excluded from government-based surveillance methods (63).
Conversely, individuals with pre-existing medical conditions who receive more frequent medical care (most likely older segments of the population) may in fact be more likely to go to their doctor when sick. In this particular example, these individuals would likely be overrepresented in government syndromic surveillance; however, this group is likely at greater risk for infection and for greater morbidity and mortality. In this case, this population subgroup may benefit more from current surveillance methods. A counterexample would be young children (especially those younger than 2 years) and pregnant women with low socioeconomic status who have significant barriers to care. Migrant worker families with young children and pregnant mothers may be less likely to go to a doctor when sick due to a combination of potential factors (including legal issues, access to transportation, familiarity with the healthcare system, language barriers, and other socioeconomic and cultural factors) (74, 75). Like elderly individuals, these individuals have a high risk for influenza infection and complications; however, they are more likely to be underrepresented by government surveillance.
Future work is needed to integrate other existing datasets that include population-based data relevant to healthcare access and other risk factors for influenza infection and complications. Additional research should explore methods that can control for such factors, so we can more accurately identify disease within the populations and subgroups from which various surveillance methods sample.
Twitter Surveillance.
In addition to the challenges with traditional syndromic influenza surveillance methods, it is difficult to describe and characterize the users who tweet about their
  • 23. own health statuses. While the sensitivity, or recall, of government methods is likely influenced by access to care, the precision of Twitter methods relies on factors that influence the likelihood of individuals posting tweets that include individual syndromic information. Regarding Twitter surveillance, it is important to consider how various factors related to Twitter usage relate to various population groups and subgroups globally. In addition to the socio-economic factors mentioned regarding government surveillance, additional factors should be considered, such as internet access, social media usage, online communication preferences, and factors that describe the types of people who share personal syndromic information via Twitter. It is important to determine how various groups of people are misrepresented by Twitter surveillance.
Lack of internet availability and access in less developed countries is certainly a concern regarding the ability of a Twitter surveillance method to assess infectious disease prevalence. Geopolitical factors influence internet access and access to specific web services, including Twitter. Individuals living in some countries may be hesitant to post personal information on the Web due to political and social pressures. Activity of Twitter users also varies by continent: North America has the greatest number of active users, while Africa has the fewest (69). Additionally, there are differences in who uses Twitter based on simple demographic indicators like age, gender, and race/ethnicity. There is evidence that Twitter users (in the United States) make up a highly non-uniform sample of the US population with regard to geography, gender, and race/ethnicity (64). One study reports that Twitter users in the US are more likely to be men from urban areas (64).
In the UK, younger people between the ages of 18 and 24 are much more likely to use Twitter than other age groups, while individuals 65 years and older are adopting social media platforms more frequently, especially Facebook (65). In reality, the demographics and characteristics of Twitter
  • 24. users are changing, and it is difficult to access demographic information about Twitter users. Current methods of determining the demographics of Twitter users rely on traditional survey methods using telephone interviews (68). Although Twitter stores detailed information about its users internally, we are limited to the information the company provides through the Twitter API (69). Furthermore, the long-term availability of Twitter data is not known. Although Twitter currently provides access to information, there is a possibility the company could move towards a more private model like Facebook.
There are likely many other factors that influence the likelihood of a Twitter user mentioning symptomology. There is a need to describe and better understand Twitter users who post tweets about their own health. For example, one would expect women to be more likely to tweet about their health, yet in the US, Twitter users are more likely to be younger men from urban areas. There is evidence that men and women use social media sites like Twitter differently (71). There is evidence that online users discuss and share personal health-related topics, and such users receive greater social support, emotional support, information support, and sometimes tangible benefit (72, 73). However, additional research is needed to assess what factors influence how people discuss personal health symptoms specifically on the Twitter platform.
Future studies and research should seek to control for various characteristics and demographics of Twitter users, although the Twitter platform makes it difficult (if not impossible) to accurately describe the characteristics of all Twitter users. The nature of the Twitter platform makes it challenging to normalize Twitter data because of the difficulty in characterizing Twitter users.
These issues with the Twitter platform certainly challenge the generalizability of Twitter as a stand-alone biosurveillance instrument. From a practical perspective, public health researchers and practitioners should continue to develop methods that combine traditional syndromic surveillance methods with other novel syndromic methods (67). For this particular analysis, it is assumed that the government weekly reports are the most accurate data sources that are readily accessible to the public. Because of this assumption, these official reports are used as the standard to which the Twitter estimates are compared. There is the possibility that the novel Twitter surveillance method is more indicative of actual influenza infection than government reports (70), although this is difficult (if not impossible) to assess at a global population level.
Government reports may provide a sufficient indication of influenza outbreak within their specific country; however, because reporting methods vary, it is difficult to track infection trends between countries. The challenges of global infectious disease surveillance are highlighted by the WHO's inability to respond effectively to the 2009 H1N1 pandemic. A novel passive surveillance method (like the Twitter method) is particularly promising because it applies a single method across all areas, which bypasses the differences in governments' reporting methods and practices. Further research should address how to integrate both traditional and novel methods of infectious disease surveillance into a single instrument that can deliver rapid report packages to key decision makers at global, national, regional, and community levels. Additional work should focus on creating such a product that minimizes the potential for information overload while retaining the key information needed for decision making.
Reflection
This research practicum has been extremely challenging, to say the least. It has also been rewarding and beneficial for me.
During the course of my research, I had to learn and develop new analytic skills to be able to work with time series data of this scale. I was also able to develop computer and statistical programming skills in order to acquire, organize, clean, analyze, and visualize data. I was able to use the Python programming language to develop programs to automate the process of data collection and preparation. In addition to analytic and technical challenges and growth, I have learned a tremendous amount about disease surveillance in general and how disease surveillance is changing and evolving today. It is very exciting to work with large datasets and to assess health conditions and outcomes on a large population-based scale. Beyond working with large-scale data, it was even more interesting for me to research and learn how individual stakeholders and groups integrate such large amounts of data into their specific decisions and interventions at local levels. The analysis of the Twitter data highlights very well the difficulty of dealing with large time-series datasets. It has been difficult for me to fully understand these various datasets, especially regarding how to make specific interventional decisions at various micro and macro levels. It was honestly quite reassuring to learn that organizations like the WHO have also struggled to work through this process in order to make timely and appropriate decisions to save lives.
Table I. Correlation of Twitter Weekly ILI Estimates by Country and Calendar Year with National Surveillance ILI Reports
Note: Correlation coefficients, r, marked with an asterisk were not statistically significant at the p < .05 level.

Time Period     2011-2014 (Combined)     2011-2012                2012-2013                2013-2014
Dates           10/3/2011 - 12/29/2013   10/3/2011 - 9/30/2012    10/1/2012 - 9/29/2013    9/30/2013 - 12/29/2013
Country         Pearson's Product-Moment Correlation Coefficient, r
United States   0.84 (p < 0.001)         0.71 (p < 0.001)         0.89 (p < 0.001)         0.93 (p = 0.001)
England         0.41 (p < 0.001)         0.51 (p = 0.0002)        0.48 (p = 0.0004)        -0.18 (p = 0.554)*
Scotland        0.37 (p < 0.001)         0.59 (p < 0.001)         0.54 (p < 0.001)         -0.38 (p = 0.196)*
Wales           0.35 (p = 0.001)         0.54 (p < 0.001)         0.37 (p = 0.0069)        0.33 (p = 0.271)*
N. Ireland      0.36 (p < 0.001)         0.46 (p = 0.0006)        0.44 (p = 0.0014)        -0.35 (p = 0.267)*
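As an illustration of how coefficients like those in Table I could be computed, the following is a minimal Python sketch and not the code used in this analysis. The two weekly series are invented values that serve only to exercise the functions, and the names `pearson_r` and `t_statistic` are hypothetical. It uses only the standard library, so it reports the t statistic and degrees of freedom rather than a p-value.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for two paired series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    if n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))   # sum of cross-products
    ss_x = sum(a * a for a in dx)                # sum of squares for x
    ss_y = sum(b * b for b in dy)                # sum of squares for y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cross / math.sqrt(ss_x * ss_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical weekly ILI percentages, invented for illustration only.
government_ili = [1.2, 1.5, 2.1, 3.4, 4.0, 3.1, 2.2, 1.6]
twitter_ili = [2.0, 2.4, 3.9, 6.1, 7.5, 5.2, 4.1, 2.9]

r = pearson_r(government_ili, twitter_ili)
t = t_statistic(r, len(government_ili))
print(f"r = {r:.2f}, t = {t:.2f}, df = {len(government_ili) - 2}")
```

A two-sided p-value would then come from the t distribution with n - 2 degrees of freedom, for example through a statistics package. With only a few weeks of data in a period, as in the 2013-2014 column, even moderately large coefficients can fail to reach significance.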
Figures
Figure 1-1. United States 2011 - 2014 Influenza. [Figure: weekly influenza prevalence (%), 2011w40 to 2014w13. Series: ILI USA (Government), ILI USA (Twitter).]
Figure 1-2. United Kingdom 2011 - 2014 Influenza. [Figure: weekly influenza prevalence (%) and ILI prevalence (%), 2011w40 to 2014w13. Series: ILI England, ILI Scotland, ILI Wales, and ILI N. Ireland, each shown as Government and Twitter.]
Appendix
[Figure: Wales 2011 - 2014 Influenza. Axes: influenza prevalence (%), ILI prevalence (%), week (2011w40 to 2014w13). Series: ILI Wales (Government), ILI Wales (Twitter).]
[Figure: United States 2011-2012 Influenza Season. Axes: influenza prevalence (%), week (2011w31 to 2012w31). Series: ILI USA (Government), ILI USA (Twitter).]
[Figure: United States 2013-2014 Influenza Season. Axes: influenza prevalence (%), week (2013w35 to 2014w13). Series: ILI USA (Government), ILI USA (Twitter).]
[Figure: United States 2012-2013 Influenza Season. Axes: influenza prevalence (%), ILI prevalence (% of outpatient visits), week (2012w31 to 2013w31). Series: ILI USA (Government), ILI USA (Twitter).]
[Figure: government-reported ILI over time, 2012w1 to 2014w26. Series: ILI USA (Government), ILI N. Ireland (Government), ILI England (Government), ILI Wales (Government), ILI Scotland (Government).]
References
1. World Health Organization. New influenza A (H1N1) virus infections: Global surveillance summary, May 2009. 2009.
2. Briand S, Mounts A, Chamberland M. Challenges of global surveillance during an influenza pandemic. Public Health. 2011;125(5):247-56.
3. World Health Organization. CDC protocol of realtime RTPCR for swine influenza A (H1N1). 2009.
4. Lipsitch M, Riley S, Cauchemez S, Ghani AC, Ferguson NM. Managing and reducing uncertainty in an emerging influenza pandemic. N Engl J Med. 2009;361(2):112-5.
5. Milord JT, Perry RP. A methodological study of overload. J Gen Psychol. 1977;97(1):131-7.
6. Speier C, Valacich JS, Vessey I. The influence of task interruption on individual decision making: An information overload perspective. Decision Sciences. 1999;30(2):337-60.
7. Shenk D. Information overload, concept of. Encyclopedia of International Media and Communications. 2003;2.
8. Hiltz SR, Turoff M. Structuring computer-mediated communication systems to avoid information overload. Commun ACM. 1985;28(7):680-9.
9. Berghel H. Cyberspace 2000: Dealing with information overload. Commun ACM. 1997;40(2):19-24.
10. Freifeld CC, Mandl KD, Reis BY, Brownstein JS. HealthMap: Global infectious disease monitoring through automated classification and visualization of internet media reports. J Am Med Inform Assoc. 2008 Mar-Apr;15(2):150-7.
11. World Bank. Internet users (per 100 people). Data retrieved June 3, 2014, from World DataBank: World Development Indicators database [Internet]. 2014.
12. Kay M. mHealth: New horizons for health through mobile technologies. World Health Organization. 2011.
13. Black AD, Car J, Pagliari C, Anandan C, Cresswell K, Bokun T, et al. The impact of eHealth on the quality and safety of health care: A systematic overview. PLoS Medicine. 2011;8(1):e1000387.
14. Eysenbach G, CONSORT-EHEALTH Group. CONSORT-EHEALTH: Improving and standardizing evaluation reports of web-based and mobile health interventions. J Med Internet Res. 2011 Dec 31;13(4):e126.
15. Eysenbach G. Infodemiology: Tracking flu-related searches on the web for syndromic surveillance. AMIA Annu Symp Proc. 2006:244-8.
16. Eysenbach G. Infodemiology and infoveillance: Framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the internet. J Med Internet Res. 2009 Mar 27;11(1):e11.
17. World Health Organization. Influenza (seasonal) fact sheet no. 211 [Internet]. March 2014. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/
18. Centers for Disease Control and Prevention. Flu symptoms & severity [Internet]. September 2013. Available from: http://www.cdc.gov/flu/about/disease/symptoms.htm
19. Glezen WP, Greenberg SB, Atmar RL, Piedra PA, Couch RB. Impact of respiratory virus infections on persons with chronic underlying conditions. JAMA. 2000;283(4):499-505.
20. Glezen WP, Couch RB, MacLean RA, Payne A, Baird JN, Vallbona C, et al. Interpandemic influenza in the Houston area, 1974-76. N Engl J Med. 1978;298(11):587-92.
21. Monto AS, Kioumehr F. The Tecumseh study of respiratory illness. IX. Occurrence of influenza in the community, 1966-1971. Am J Epidemiol. 1975 Dec;102(6):553-63.
22. Glezen WP. Serious morbidity and mortality associated with influenza epidemics. Epidemiol Rev. 1982;4:25-44.
23. Monto AS. Influenza: Quantifying morbidity and mortality. Am J Med. 1987;82(6):20-5.
24. Barker WH. Excess pneumonia and influenza associated hospitalization during influenza epidemics in the United States, 1970-78. Am J Public Health. 1986 Jul;76(7):761-5.
25. Barker WH, Mullooly JP. Impact of epidemic type A influenza in a defined adult population. Am J Epidemiol. 1980 Dec;112(6):798-811.
26. Centers for Disease Control and Prevention. People at high risk of developing flu-related complications [Internet]. Available from: http://www.cdc.gov/flu/about/disease/high_risk.htm
27. Centers for Disease Control and Prevention. How flu spreads [Internet]. Available from: http://www.cdc.gov/flu/about/disease/spread.htm
28. Noble G. Epidemiological and clinical aspects of influenza. In: Beare AS, editor. Basic and applied influenza research. 1982. p. 11-50.
29. Simonsen L, Clarke MJ, Schonberger LB, Arden NH, Cox NJ, Fukuda K. Pandemic versus epidemic influenza mortality: A pattern of changing age distribution. J Infect Dis. 1998 Jul;178(1):53-60.
30. Webster RG, Bean WJ, Gorman OT, Chambers TM, Kawaoka Y. Evolution and ecology of influenza A viruses. Microbiol Rev. 1992 Mar;56(1):152-79.
31. Dolin R. Influenza-interpandemic as well as pandemic disease. N Engl J Med. 2005;353(24):2535.
32. Noble G. Epidemiological and clinical aspects of influenza. 1982.
33. Centers for Disease Control and Prevention. The national respiratory and enteric virus surveillance system (NREVSS) [Internet]. 2014. Available from: http://www.cdc.gov/surveillance/nrevss/
34. Henning KJ. What is syndromic surveillance? Morb Mortal Weekly Rep. 2004:7-11.
35. Marsden-Haug N, Foster VB, Gould PL, Elbert E, Wang H, Pavlin JA. Code-based syndromic surveillance for influenzalike illness by international classification of diseases, ninth revision. Emerg Infect Dis. 2007 Feb;13(2):207-16.
36. Stoto MA, Schonlau M, Mariano LT. Syndromic surveillance: Is it worth the effort? Chance. 2004;17(1):19-24.
37. Centers for Disease Control and Prevention. Overview of influenza surveillance in the United States [Internet]. Available from: http://www.cdc.gov/flu/weekly/overview.htm#Viral
38. Maryland Department of Health and Mental Hygiene. Maryland resident influenza tracking survey [Internet]. Available from: http://flusurvey.dhmh.md.gov/
39. German RR, Lee L, Horan J, Milstein R, Pertowski C, Waller M. Updated guidelines for evaluating public health surveillance systems. MMWR Recomm Rep. 2001;50(RR-13):1-35.
40. Mallon TM. Progress in implementing recommendations in the National Academy of Sciences reports: "Protecting those who serve: Strategies to protect the health of deployed US forces". Mil Med. 2011;176(7S):9-16.
41. Buehler JW, Sonricker A, Paladini M, Soper P, Mostashari F. Syndromic surveillance practice in the United States: Findings from a survey of state, territorial, and selected local health departments. Advances in Disease Surveillance. 2008;6(3):1-20.
42. Canas LC, Lohman K, Pavlin JA, Endy T, Singh DL, Pandey P, et al. The Department of Defense laboratory-based global influenza surveillance system. Mil Med. 2000 Jul;165(7 Suppl 2):52-6.
43. Armed Forces Health Surveillance Center. Global emerging infections surveillance & response system [Internet]. Available from: http://www.afhsc.mil/geis
44. Department of Defense. Comprehensive health surveillance. 2012. Report No.: DoDD 640.02E.
45. Homeland Security: Improving Public Health Surveillance, Hearing Before the Subcommittee on Government Reform, House of Representatives of the 108th Congress, 1st Sess. (2003).
46. Mandl KD, Overhage JM, Wagner MM, Lober WB, Sebastiani P, Mostashari F, et al. Implementing syndromic surveillance: A practical guide informed by the early experience. J Am Med Inform Assoc. 2004 Mar-Apr;11(2):141-50.
47. Lombardo MJ, Burkom H, Elbert ME, Magruder S, Lewis MSH, Loschen MW, et al. A systems overview of the electronic surveillance system for the early notification of community-based epidemics (ESSENCE II). Journal of Urban Health. 2003;80(1):i32-42.
48. Siegrist D, Pavlin J. Bio-ALIRT biosurveillance detection algorithm evaluation. Morb Mortal Weekly Rep. 2004:152-8.
49. Carley KM, Altman N, Kaminsky B, Nave D, Yahja A. BioWar: A city-scale multi-agent network model of weaponized biological attacks. 2004.
50. Dredze M. How social media will change public health. Intelligent Systems, IEEE. 2012;27(4):81-4.
51. Jansen BJ, Zhang M, Sobel K, Chowdury A. Twitter power: Tweets as electronic word of mouth. J Am Soc Inf Sci Technol. 2009;60(11):2169-88.
52. O'Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: Linking text sentiment to public opinion time series. ICWSM. 2010;11:122-9.
53. Sakaki T, Okazaki M, Matsuo Y. Earthquake shakes Twitter users: Real-time event detection by social sensors. Proceedings of the 19th international conference on world wide web; ACM; 2010.
54. Broniatowski DA, Paul MJ, Dredze M. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLoS One. 2013;8(12):e83672.
55. Paul MJ, Dredze M. A model for mining public health topics from Twitter. HEALTH. 2012;11:16-.
56. National Human Subjects Protection Advisory Committee. Recommendations on public use data files. Office for Human Research Protection; 2002.
57. JHSPH Institutional Review Board. IRB office preliminary determinations for MPH and other degree students [Internet]. Available from: http://www.jhsph.edu/offices-and-services/institutional-review-board/student-projects/other-degree-students.html
58. Centers for Disease Control and Prevention Influenza Division. FluView: Weekly U.S. influenza surveillance report [Internet]. Available from: http://www.cdc.gov/flu/weekly/pastreports.htm
59. Public Health England. Sources of UK flu data: Influenza surveillance in the UK [Internet]. 2014. Available from: https://www.gov.uk/sources-of-uk-flu-data-influenza-surveillance-in-the-uk
60. Public Health Wales. Weekly influenza activity in Wales report [Internet]. 2014.
61. Health Protection Scotland. National influenza report [Internet]. 2014. Available from: http://www.hps.scot.nhs.uk/resp/influenzareports.aspx
62. Northern Ireland Department of Health, Social Services and Public Safety. Nidirect [Internet]. 2014. Available from: http://www.dhsspsni.gov.uk/
63. Miller CL, Druss BG, Dombrowski EA, Rosenheck RA. Barriers to primary medical care among patients at a community mental health center. Psychiatric Services. 2003;54(8):1158-60.
64. Mislove A, Lehmann S, Ahn Y, Onnela J, Rosenquist JN. Understanding the demographics of Twitter users. ICWSM. 2011;11:5th.
65. UK seniors choose Facebook [Internet]. 2013. Available from: http://www.emarketer.com/Article/UK-Seniors-Choose-Facebook/1010484
66. Anderson OD. Time series analysis and forecasting: The Box-Jenkins approach. Butterworths London and Boston; 1976.
67. Wagner MM, Espino J, Tsui FC, Gesteland P, Chapman W, Ivanov O, et al. Syndrome and outbreak detection using chief-complaint data: Experience of the Real-Time Outbreak and Disease Surveillance project. Morb Mortal Weekly Rep. 2004:28-31.
68. Duggan M, Brenner J. The demographics of social media users, 2012. Vol. 14. Washington, DC: Pew Research Center's Internet & American Life Project; 2013.
69. Java A, Song X, Finin T, Tseng B. Why we twitter: Understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis; ACM; 2007 Aug. p. 56-65.
70. Aramaki E, Maskawa S, Morita M. Twitter catches the flu: Detecting influenza epidemics using Twitter. Proceedings of the Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics; 2011 Jul. p. 1568-76.
71. Heil B, Piskorski M. New research: Men follow men and nobody tweets [Internet]. 2009 Jun 1. Available from: http://blogs.hbr.org/cs/2009/06/new_twitter_research_men_follo.html
72. Rains SA, Keating DM. The social dimension of blogging about health: Health blogging, social support, and well-being. Communication Monographs. 2011;78(4):511-34.
73. Mo PK, Coulson NS. Exploring the communication of social support within virtual communities: A content analysis of messages posted to an online HIV/AIDS support group. Cyberpsychology & Behavior. 2008;11(3):371-4.
74. Phillips KA, Mayer ML, Aday LA. Barriers to care among racial/ethnic groups under managed care. Health Affairs. 2000;19(4):65-75.
75. Ngo-Metzger Q, Massagli MP, Clarridge BR, Manocchia M, Davis RB, Iezzoni LI, et al. Linguistic and cultural barriers to care. Journal of General Internal Medicine. 2003;18(1):44-52.
76. Heckman TG, Somlai AM, Peters J, Walker J, Otto-Salaj L, Galdabini CA, et al. Barriers to care among persons living with HIV/AIDS in urban and rural areas. AIDS Care. 1998;10(3):365-75.
77. Newacheck PW, Stoddard JJ, Hughes DC, Pearl M. Health insurance and access to primary care for children. N Engl J Med. 1998;338(8):513-9.