Social media and its use in disease surveillance
March 2010
✤ How do we improve disease surveillance?
✤ Can social media (e.g. Twitter) be used effectively to monitor disease outbreaks?
Tweets: disease reports
✤ Omg.. The never-ending flu+sore throat.. ☹ bleh.. ☹
✤ Stomach flu. Urgh.
✤ i love puking... f@#k you flu
✤ Having a sore throat,sucks.Having flu,sucks even MORE.DAMMIT!
✤ Feeling dizzy/ feverish ever since that class at the gym! overexertion or the flu??
Tweets: non-disease reports
✤ Study finds H1N1 flu in pregnancy is critical risk - Reuters - http://bit.ly/bLiLnz
✤ This March Madness turns out to be the flu!
✤ Smiling is infectious, You can catch it like the flu. Someone smiled at me today, And I started smiling too.
We need Natural Language Processing (NLP)
✤ We need an NLP engine in order to process tweets:
✤ Tweet → NLP Engine → It's the flu!
Maybe we need NLP + Ontologies
✤ Do we just search for simple keywords?
✤ An ontology can provide us with organized concepts relevant to a domain (e.g. health, biomedicine)
✤ How about processing natural language to match concepts organized in an ontology?
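To see why simple keyword search falls short, here is a minimal sketch that flags any tweet containing the word "flu". The tweets and their labels are taken from the two example slides above; the helper name `mentions_flu` is made up for illustration.

```python
import re

# (tweet text, is it a disease report?) pairs, labels as given on the slides.
tweets = [
    ("Stomach flu. Urgh.", True),
    ("Having a sore throat,sucks.Having flu,sucks even MORE.DAMMIT!", True),
    ("This March Madness turns out to be the flu!", False),
    ("Smiling is infectious, You can catch it like the flu.", False),
]

def mentions_flu(text):
    """Return True if 'flu' appears as a whole word in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "flu" in words

# Every tweet is flagged, including the two that are not disease reports.
flagged = [text for text, _ in tweets if mentions_flu(text)]
false_positives = [
    text for text, is_report in tweets
    if mentions_flu(text) and not is_report
]
```

The keyword alone cannot separate a complaint about being sick from a news headline or a figure of speech.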
Ontologies help answer these questions
✤ How do we know if a user is referring to a symptom or a disease?
✤ We seem to need a set of keywords. Where do we get this set of symptom and disease names?
✤ How do we link references to one or more symptoms to a specific disease?
The UMLS Ontology
✤ A comprehensive thesaurus and ontology of biomedical concepts
✤ Facilitates development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health.
✤ Integrates 2+ million names for ~900k concepts from 60+ families of biomedical vocabularies, and 12 million relations among these concepts.
UMLS & MetaMap
✤ MetaMap is a tool that, given an arbitrary piece of text, finds and returns the relevant concepts available in the UMLS Ontology
✤ MetaMap is a software interface to query the “MetaThesaurus” and the “Semantic Network”, both components of UMLS
Concept mapping with MetaMap
✤ Using MetaMap to query the MetaThesaurus, we can map the following text strings to the concept "Atrial Fibrillation"
  ✤ Atrial fibrillation
  ✤ AF
  ✤ AFib
  ✤ Atrial fibrillation (disorder)
✤ But who actually tweets “atrial fibrillation” ??
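The idea of mapping many surface strings to one concept can be illustrated with a toy lookup table. This is not how MetaMap works internally and it does not call MetaMap; `SYNONYMS` and `map_to_concept` are hypothetical names, and the table holds only the four strings from the slide.

```python
# Hypothetical miniature synonym table. The real Metathesaurus is far larger
# and MetaMap does much more than an exact-string lookup.
SYNONYMS = {
    "atrial fibrillation": "Atrial Fibrillation",
    "af": "Atrial Fibrillation",
    "afib": "Atrial Fibrillation",
    "atrial fibrillation (disorder)": "Atrial Fibrillation",
}

def map_to_concept(text):
    """Return the concept name for a known string, or None if unknown."""
    key = text.strip().lower()
    return SYNONYMS.get(key)
```

An exact lookup like this breaks on the informal wording found in tweets, which is why a tool that handles variants and partial matches is needed.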
“Having a sore throat, sucks.Having flu, sucks even MORE”
✤ Matches:
  ✤ SORETHROAT (Sore Throat) [Sign or Symptom]
  ✤ Flu (Influenza) [Disease or Syndrome]
  ✤ Sucking [Physiologic Function]
“i love puking... damn you flu”
✤ Matches:
  ✤ I (Iodides) [Inorganic Chemical]
  ✤ Love [Mental Process]
  ✤ Flu (Influenza) [Disease or Syndrome]
“Feeling dizzy/ feverish ever since that class at the gym! overexertion or the flu??”
✤ Matches:
  ✤ Feeling dizzy [Sign or Symptom]
  ✤ Feverish (Fever) [Finding]
  ✤ Overexertion (Exhaustion due to excessive exertion) [Injury or Poisoning]
  ✤ Flu (Influenza) [Disease or Syndrome]
“Smiling is infectious, u can catch it like the flu; someone smiled at me today, and I started smiling too”
✤ Matches:
  ✤ Smiling [Social Behavior]
  ✤ Infection [Disease or Syndrome]
  ✤ Catch (Catch - Finding of sensory dimension of pain) [Sign or Symptom]
  ✤ Flu (Influenza) [Disease or Syndrome]
  ✤ Today [Temporal Concept]
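One obvious next step is to keep only the matches whose semantic type looks health-related. The sketch below hard-codes the matches shown on the slides above rather than calling MetaMap; the choice of types in `HEALTH_TYPES` is an assumption made for illustration.

```python
# (concept name, semantic type) pairs copied from the match slides above.
MATCHES = {
    "puking": [
        ("Iodides", "Inorganic Chemical"),
        ("Love", "Mental Process"),
        ("Influenza", "Disease or Syndrome"),
    ],
    "smiling": [
        ("Smiling", "Social Behavior"),
        ("Infection", "Disease or Syndrome"),
        ("Catch - Finding of sensory dimension of pain", "Sign or Symptom"),
        ("Influenza", "Disease or Syndrome"),
        ("Today", "Temporal Concept"),
    ],
}

# Assumed set of semantic types treated as health-related.
HEALTH_TYPES = {
    "Sign or Symptom",
    "Disease or Syndrome",
    "Finding",
    "Injury or Poisoning",
}

def health_concepts(matches):
    """Keep only the concept names whose semantic type is health-related."""
    return [name for name, sem_type in matches if sem_type in HEALTH_TYPES]
```

Filtering removes noise such as "Iodides", but the smiling tweet still yields three health concepts, so concept matches alone do not tell us whether a tweet is a disease report.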
Using MetaMap
✤ Free of charge!
✤ MetaMap Transfer (MMTx) is a Java-based distributable version of the MetaMap program
✤ Requires 7GB disk space (uncompressed) and at least 1GB of RAM (2GB recommended)
✤ “MetaMap is not an end user product. Users will need a moderate amount of programming knowledge to use MMTx effectively.” - from UMLS website
We identified tweets that mention a concept... SO WHAT?
✤ We can't assume it's a case report!
✤ How do we get around this?
✤ Are we done here?
Supervised learning to improve the results?
✤ What if we use machine learning?
✤ Supervised learning is a machine learning technique for inferring a function from labeled training data
Is it feasible?
✤ Weka is a collection of machine learning algorithms for data mining tasks.
✤ Algorithms can be applied directly to a dataset or called from your own Java code.
✤ Input: dataset of concept matches; Output: classifier Java class
✤ This automatically generated Java class can easily be used to answer whether a tweet matching medical concepts X and Y is or is not a disease report
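As a rough illustration of learning a classifier from concept matches, the sketch below trains a tiny perceptron in plain Python. It is a stand-in for whatever algorithm would be chosen in Weka, not Weka itself; the four training examples reuse the concept matches and labels from the earlier slides, which is far too little data for real use. `train_perceptron` and `predict` are hypothetical names.

```python
from collections import defaultdict

# (set of matched concepts, is it a disease report?) from the slides above.
TRAINING = [
    ({"Sore Throat", "Influenza", "Sucking"}, True),
    ({"Iodides", "Love", "Influenza"}, True),
    ({"Feeling dizzy", "Fever",
      "Exhaustion due to excessive exertion", "Influenza"}, True),
    ({"Smiling", "Infection", "Catch", "Influenza", "Today"}, False),
]

def train_perceptron(examples, epochs=10):
    """Learn one integer weight per concept plus a bias term."""
    weights = defaultdict(int)
    bias = 0
    for _ in range(epochs):
        for concepts, label in examples:
            predicted = bias + sum(weights[c] for c in concepts) > 0
            if predicted != label:
                # Nudge the weights toward the correct answer.
                step = 1 if label else -1
                for c in concepts:
                    weights[c] += step
                bias += step
    return weights, bias

def predict(model, concepts):
    """Return True if the model scores the concept set as a disease report."""
    weights, bias = model
    return bias + sum(weights.get(c, 0) for c in concepts) > 0

MODEL = train_perceptron(TRAINING)
```

On this toy data the model learns negative weights for concepts such as "Smiling" and "Today", which is what lets it separate the figure-of-speech tweet from the genuine complaints.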
Processing a tweet: overview
✤ Get tweet
✤ Process tweet using MetaMap
✤ Get matching concepts from MetaMap
✤ Feed the matches to the classifier Java class
✤ Get a True or False answer to “is it a disease report?”
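The steps above can be sketched as one small function that takes the concept extractor and the classifier as parameters. Both are stubs here: `fake_extract` returns canned matches copied from the earlier slides instead of calling MetaMap, and `fake_classify` is a hand-written placeholder rule, not a trained model. All names are hypothetical.

```python
def is_disease_report(tweet, extract_concepts, classify):
    """Run one tweet through concept extraction, then classification."""
    concepts = extract_concepts(tweet)   # stands in for the MetaMap step
    return bool(classify(concepts))      # stands in for the classifier step

# Canned concept matches copied from the slides; a real system would
# obtain these from MetaMap.
CANNED = {
    "i love puking... damn you flu": {"Iodides", "Love", "Influenza"},
    "Smiling is infectious, u can catch it like the flu": {
        "Smiling", "Infection", "Catch", "Influenza", "Today",
    },
}

def fake_extract(tweet):
    """Stub extractor: look the tweet up in the canned table."""
    return CANNED.get(tweet, set())

def fake_classify(concepts):
    """Placeholder rule, used only to make the sketch runnable."""
    return "Influenza" in concepts and "Smiling" not in concepts
```

Passing the two steps in as functions keeps the pipeline independent of how MetaMap and the classifier are actually invoked.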