Exploiting NLP for Digital Disease Informatics
University of Warwick, October 15th 2015
Nigel Collier, Language Technology Lab
Department of Theoretical and Applied Linguistics
Really understanding natural language is the next grand
challenge
• High throughput methods have transformed biomedicine into a
data-rich science
• All genes in a genome, all proteins in a proteome, all transcripts in a
cell, all metabolic processes in a tissue…
• A significant portion of human health data is ‘messy data’
existing only as unstructured text
• Biomedical literature, Clinical trials data, Lab notebooks, Clinical
records, Diagnostic reports, News reports, Social media messages
• Represents the most contextually grounded, high precision
information about an individual’s health, attitudes and behaviours
• Natural language processing (NLP) is a cornerstone
technology to translate ‘messy data’ into structured forms that
are systematically encoded, e.g. SNOMED-CT, ICD.
Experience from personal research
(1) Global infectious disease alerting and
mapping
(2) Extracting a database of phenotype terms
(3) Understanding the voice of the patient
(4) Chemical cancer risk assessment
(5) Critical hypothesis generation from literature
Typical workflow from text to knowledge
raw text document → sentence segmentation → tokenization → lexical featurisation → entity recognition → trigger detection → relation extraction → event extraction → entity grounding → knowledge objects
syntactic parsing
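The first stages of the workflow can be illustrated with a minimal Python sketch. This is not the pipeline used in the talk: the regular expressions are naive rules, and `TOY_LEXICON` and the dictionary lookup in `recognise_entities` are hypothetical stand-ins for a trained entity recogniser.

```python
import re

def segment_sentences(text: str) -> list[str]:
    # Naive rule: split after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence: str) -> list[str]:
    # Words (allowing internal hyphens/apostrophes) or single punctuation marks.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

def recognise_entities(tokens: list[str], lexicon: dict[str, str]) -> list[tuple[str, str]]:
    # Toy dictionary lookup standing in for a trained entity recogniser.
    return [(tok, lexicon[tok.lower()]) for tok in tokens if tok.lower() in lexicon]

# Hypothetical lexicon, for illustration only.
TOY_LEXICON = {"anthrax": "DISEASE", "morocco": "LOCATION"}

text = "Anthrax was reported in Morocco. Two cases were confirmed."
sentences = segment_sentences(text)
tokens = [tokenize(s) for s in sentences]
entities = [recognise_entities(t, TOY_LEXICON) for t in tokens]
```

Later stages (relation and event extraction, grounding) would consume the `entities` output; they are omitted here because the slide does not specify how they are implemented.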
Broad Research Objectives
• Extrinsic: Robust data collection from across health-related text types:
literature, patient records, news, social media (public health alerts, developing
disease profiles, etc.)
• Intrinsic: Understand how NLP/ML/Ontology techniques perform and can be
improved in operational settings
BIOCASTER: GLOBAL INFECTIOUS DISEASE
ALERTING AND MAPPING
Case study #1
[5] Collier, N. et al. (2008). BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24), 2940-2941.
[6] Collier, N., et al. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. J. Biomedical Semantics, 2(S-5), S9.
[7] Hay, S. I., et al. (2013). Global mapping of infectious disease. Philosophical Transactions of the Royal Society of London B: Biological
Sciences, 368(1614), 20120250.
Infectious diseases spread rapidly
“We live in a world where threats to health arise from the speed and volume of air
travel, the way we produce and trade food, the way we use and misuse antibiotics,
and the way we manage the environment…”
- Dr. Margaret Chan, DG WHO
SARS, 2003: HK, world
H5N1 flu, 2003-: PRC, Thailand, ROC, Vietnam
Foot & mouth, 2001: United Kingdom
Ebola, 2014-: Guinea, Liberia, Sierra Leone, Nigeria
Trend graphs
Event summaries
Event alerts
Ontology browsing
Email/GeoRSS alerting
Watchboard, etc.
Real time Twitter analysis
Up to date news in 12 languages
Event database search
GHSI partners: US, UK, FR, DE, WHO, IT, JP, CA
Digital epidemic surveillance with BioCaster
Example frame
<SLOT name="HAS_DISEASE" type="DISEASE" content="Anthrax" alt="" root_term="Anthrax" bid=""/>
<SLOT name="HAS_LOCATION.COUNTRY" type="LOCATION" content="Morocco" alt="" root_term="Morocco" bid=""/>
<SLOT name="HAS_LOCATION.PROVINCE" type="LOCATION" content="Marrakech" alt="" root_term="" bid=""/>
<SLOT name="HAS_AGENT" type="micro_organism" content="Bacillus anthracis" alt="" root_term="" bid=""/>
<SLOT name="HAS_SPECIES" type="animal" content="human" alt="" root_term="" bid=""/>
<SLOT name="TIME.relative" type="string" content=""/>
<SLOT name="INTERNATIONAL_TRAVEL" type="Boolean" content="false"/>
<SLOT name="DELIBERATE_RELEASE" type="Boolean" content="false"/>
<SLOT name="ZOONOSIS" type="Boolean" content="false"/>
<SLOT name="DRUG_RESISTANCE" type="Boolean" content="false"/>
<SLOT name="FOOD_CONTAMINATION" type="Boolean" content="false"/>
<SLOT name="HOSPITAL_WORKER" type="Boolean" content="false"/>
<SLOT name="FARM_WORKER" type="Boolean" content="false"/>
<SLOT name="MALFORMED_PRODUCT" type="Boolean" content="false"/>
<SLOT name="NEW_TYPE_AGENT" type="Boolean" content="false"/>
<SLOT name="SERVICE_DISRUPTION" type="Boolean" content="false"/>
<SLOT name="CATEGORY_A" type="Boolean" content="true"/>
</EVENT>
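A frame like the one above can be read into a plain dictionary with the Python standard library. The sketch below is illustrative only: the excerpt shows just the closing `</EVENT>` tag, so the bare `<EVENT>` opening tag is an assumption, and `parse_frame` is a hypothetical helper rather than part of BioCaster.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the frame; the opening <EVENT> tag is assumed.
FRAME_XML = """<EVENT>
<SLOT name="HAS_DISEASE" type="DISEASE" content="Anthrax" alt="" root_term="Anthrax" bid=""/>
<SLOT name="HAS_LOCATION.COUNTRY" type="LOCATION" content="Morocco" alt="" root_term="Morocco" bid=""/>
<SLOT name="ZOONOSIS" type="Boolean" content="false"/>
<SLOT name="CATEGORY_A" type="Boolean" content="true"/>
</EVENT>"""

def parse_frame(xml_text):
    """Map each slot name to its content, converting Boolean slots to bool."""
    slots = {}
    for slot in ET.fromstring(xml_text).iter("SLOT"):
        value = slot.get("content", "")
        if slot.get("type") == "Boolean":
            value = (value == "true")
        slots[slot.get("name")] = value
    return slots

frame = parse_frame(FRAME_XML)
```

Keeping Boolean slots as real booleans makes downstream alert rules (for example, filtering on `CATEGORY_A`) simple dictionary checks.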
Technical challenges
X0,000 news providers
REAL TIME
SCALING 30,000-40,000 news items/day
900 on topic/day
200 events/day
4 alerts/day
Technical challenges
MULTILINGUALITY
Avian Flu (English)
Influenza aviaire (French)
鳥インフルエンザ (Japanese)
조류인플루엔자 (Korean)
โรคไข้หวัดนก (Thai)
Cúm gia cầm (Vietnamese)
→ Increased sensitivity and timeliness from multilingual news
[Figure: News event counts for porcine foot-and-mouth outbreak in South Korea, 2010-2011]
Technical challenges
AMBIGUITY
“Obama fever builds as Americans
await a new era”
Equine influenza in Camden
Camden (UK) Camden (AU) Camden (CA) + 19 others
Entity identification
Toponym grounding
Tajoura Tajura Tajoora…
Variant transliterations
Coreference
“Two British holidaymakers fell ill… ”
“Two male pensioners died…”
2 or 4 victims?
Temporal identification
“The Spanish flu outbreak…
Semantic pipeline
Source: BioCaster
Outbreak characteristics: Early surge vs multi-modal transmission
News event frequency over time
Looking for bursts of activity
Source: GENI-DB
[Plot: ct, μ, μ+3σ and Gold series; count axis 0-200]
Alerts with the C2 test statistic:
$S_t = \max(0, (C_t - (\mu_t + 3\sigma_t)) / \sigma_t)$
First English language
reports (MMWR + AP)
Understanding norms and their violations
5 detection algorithms
1. Early aberration reporting system (EARS) C2 algorithm
• captures the number of standard deviations that the current count exceeds the history mean;
• $S_t = \max(0, (C_t - (\mu_t + k\sigma_t)) / \sigma_t)$
2. EARS C3 algorithm
• similar to C2 except that C3 uses a weighted sum of the previous 3 days for the current period;
3. W2 algorithm
• a modified version of C2 which ignores history counts on Saturdays and Sundays to compensate for day of week effects;
4. F statistic
• compares the variance in the history window to the variance in the current window;
• $S_t = \sigma_t^2 + \sigma_b^2$
5. Exponential Weighted Moving Average (EWMA)
• provides less weight to days in the history that are further from the test day.
• $S_t = (Y_t - \mu_t) / [\sigma_t \cdot (\lambda/(2-\lambda))^{1/2}]$, where $Y_1 = C_1$ and $Y_t = \lambda C_t + (1-\lambda) Y_{t-1}$
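The C2 and EWMA statistics listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the configuration used in the study: the 7-day baseline window, the 2-day guard band, the `min_sigma` floor and `lam=0.4` are placeholder choices, and the slides do not say how the history mean and standard deviation were estimated.

```python
from statistics import mean, stdev

def _baseline(counts, t, baseline, guard, min_sigma):
    """History mean and standard deviation for day index t (assumed windowing)."""
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history before day t")
    history = counts[start:t - guard]
    # The floor on sigma is an assumption to avoid division by zero.
    return mean(history), max(stdev(history), min_sigma)

def c2_statistic(counts, t, baseline=7, guard=2, k=3.0, min_sigma=0.5):
    """S_t = max(0, (C_t - (mu_t + k*sigma_t)) / sigma_t)."""
    mu, sigma = _baseline(counts, t, baseline, guard, min_sigma)
    return max(0.0, (counts[t] - (mu + k * sigma)) / sigma)

def ewma_series(counts, lam=0.4):
    """Y_1 = C_1 and Y_t = lam*C_t + (1 - lam)*Y_{t-1}."""
    smoothed = [float(counts[0])]
    for c in counts[1:]:
        smoothed.append(lam * c + (1 - lam) * smoothed[-1])
    return smoothed

def ewma_statistic(counts, t, baseline=7, guard=2, lam=0.4, min_sigma=0.5):
    """S_t = (Y_t - mu_t) / (sigma_t * sqrt(lam / (2 - lam)))."""
    mu, sigma = _baseline(counts, t, baseline, guard, min_sigma)
    y_t = ewma_series(counts[: t + 1], lam)[-1]
    return (y_t - mu) / (sigma * (lam / (2 - lam)) ** 0.5)

# Toy daily news counts with a burst on the final day (illustrative values).
daily_counts = [2, 3, 2, 4, 3, 2, 3, 3, 2, 3, 20]
quiet_day = c2_statistic(daily_counts, t=9)
burst_day = c2_statistic(daily_counts, t=10)
burst_ewma = ewma_statistic(daily_counts, t=10)
```

An alert would be raised when the statistic exceeds a chosen threshold; with C2 as written, any positive value already means the count is more than k standard deviations above the history mean.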
Model parameters were estimated based on an additional 5 epidemic data sets from ProMED-mail (data not
shown)
[8] Burkom H. S. (2005), “Accessible Alerting Algorithms for Biosurveillance”. National Syndromic Surveillance Conference
[9] Jackson M. L. et al. (2007), “A simulation study comparing aberration detection algorithms for syndromic surveillance”, BMC Medical Informatics and Decision Making, 7:6, DOI: 10.1186/1472-6947-7-6.
[10] Madoff L. (2004), “ProMED-mail: An early warning system for emerging diseases”. Clin Infect Dis , 39(2): 227–232.
#   Disease            Country      ProMED-alerts
1   Hand, foot, mouth  PR China     9
2   Ebola              Congo        17
3   Yellow fever       Brazil       28
4   Influenza          USA          21
5   Cholera            Iraq         5
6   Chikungunya        Singapore    8
7   Anthrax            USA          15
8   Yellow fever       Argentina    5
9   Ebola Reston       Philippines  15
10  Influenza          Egypt        49
11  Plague             USA          8
12  Dengue             Brazil       27
13  Dengue             Indonesia    14
14  Measles            UK           13
15  Chikungunya        Malaysia     15
16  Yellow fever       Senegal      0
17  Influenza          Indonesia    35
18  Influenza          Bangladesh   3
14 countries and 11 infectious disease types. 366 days of news data was collected from BioCaster for each disease and
country. The study period is 17th June 2008 to 17th June 2009
Creating a benchmark data set
                 C3           C2           W2           F-statistic  EWMA
Sensitivity      0.74         0.66         0.66         0.78         0.73
                 (0.69-0.78)  (0.61-0.72)  (0.60-0.71)  (0.74-0.82)  (0.68-0.78)
Specificity      0.96         0.98         0.98         0.92         0.95
                 (0.95-0.96)  (0.98-0.98)  (0.98-0.99)  (0.91-0.92)  (0.94-0.96)
PPV              0.55         0.64         0.65         0.46         0.47
                 (0.98-0.99)  (0.98-0.99)  (0.98-0.99)  (0.98-0.99)  (0.98-0.99)
NPV              0.98         0.98         0.98         0.98         0.98
                 (0.98-0.99)  (0.98-0.99)  (0.98-0.99)  (0.98-0.98)  (0.98-0.99)
Alarms/100 days  6.48         4.52         4.17         12.34        7.85
F-measure        0.63         0.65         0.66         0.58         0.58
Results in parentheses show 95% confidence intervals
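As a worked check on the table, the F-measure row can be approximately recovered from the sensitivity and PPV rows. The sketch assumes the F-measure is the balanced harmonic mean (F1) of sensitivity and PPV, which the slide does not state; small differences are expected because the inputs are already rounded to two decimals.

```python
def f_measure(sensitivity, ppv):
    """Balanced F-measure: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# (sensitivity, PPV, reported F-measure) copied from the table above.
reported = {
    "C3": (0.74, 0.55, 0.63),
    "C2": (0.66, 0.64, 0.65),
    "W2": (0.66, 0.65, 0.66),
    "F-statistic": (0.78, 0.46, 0.58),
    "EWMA": (0.73, 0.47, 0.58),
}

computed = {name: f_measure(sens, ppv) for name, (sens, ppv, _) in reported.items()}
for name, (_, _, f_reported) in reported.items():
    print(f"{name}: computed {computed[name]:.3f}, reported {f_reported:.2f}")
```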
[11] Collier, N. (2009), “What’s unusual in online disease outbreak news?”, in BMC Biomedical Semantics, 1(2).
Comparison of 5 aberration detection algorithms
Field evaluation
• (2006-2012) Global Health Security Initiative: a unique initiative by G7+WHO+EC to
bring together end-users, system providers and stakeholders to test the feasibility of
open source public health intelligence systems.
[12] Barboza, P., Vaillant, L., Le Strat, Y., Hartley, D. M., Nelson, N. P., Mawudeku, A., Madoff, L. C., Linge, J. P., Collier, N., Brownstein, J. S. and Astagneau,
P. (2014). Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases
Outbreaks. PloS one, 9(3), e90536.
[13] Barboza, P., Vaillant, L., Mawudeku, A., Nelson, N., Hartley, D., Madoff, L., Linge, J., Collier, N., Brownstein, J., Yangarber, R. and Astagneau, P. (2013),
“Evaluation of epidemic intelligence systems integrated in the Early Alerting and Reporting project for the detection of A/H5N1 Influenza events”, PLoS One,
8(3):e57252.
Major findings for A/H5N1:
- Detection rates for individual systems from
31% to 38%
- Rising to 72% for the combined system
- PPV ranged from 3% to 24%
- F1 ranged from 6% to 27%
- Sensitivity ranged from 38% to 72%
- Average improvement in alerting over WHO or
OIE was 10.2 days
User outcomes
• Used by WHO and Japanese MoH to detect early cases during
the A(H1N1) pandemic;
• Used by ECDC to monitor diseases during the Shanghai Expo
2010, London Olympics 2012;
• Used by French Institute for Public Health to monitor for human-
to-human A(H5N1) transmission;
• Used by GHSI members to monitor for suspected accidental or
deliberate releases;
• Used by CDC to help monitor for health impact of the Oil spill in
the Gulf of Mexico;
PHENOMINER/PHENEBANK: EXTRACTING A
DATABASE OF PHENOTYPE TERMS
Case study #2
[14] Collier, N., Groza, T., Smedley, D., Robinson, P., Oellrich, A. and Rebholz-Schuhmann, D. (2015). PhenoMiner: from text to a database of
phenotypes associated with OMIM diseases. Database, Oxford University Press (in press).
[15] Collier, N., Oellrich, A. and Groza, T. (2013), “Toward knowledge support for analysis and interpretation of complex traits”, Genome Biology
14(9):214.
What is a phenotype?
Image courtesy of Washington, Haendel, Mungall, Ashburner, Westfield and Lewis (2009), “Linking human diseases
to animal models using ontology-based phenotype annotation”, PLoS Biology, 7(11):e1000247.
“… patients were selected for FOXP2 screening only if
they fulfilled the following criteria: presence of
speech articulation problems diagnosed by a clinician …”
HPO: 0009088 Speech articulation difficulties
Image courtesy of Damian Smedley,
Wellcome Trust Sanger Institute,
Hinxton and Tudor Groza, University
of Queensland, Brisbane
Coding personal terminology
SVM learn-to-rank (pairwise)
Maximum entropy
Priority list heuristic
“… patients were selected for FOXP2 screening only if
they fulfilled the following criteria: presence of
speech articulation problems diagnosed by a clinician”
Creating a benchmark data set
• Data from OMIM cited autoimmune literature (112 abstracts, 472
phenotypes, 1611 gene/gene products).
F-scores computed using ablation on various domain
ontologies
F-scores using 3 hypothesis resolution strategies
[16] Collier, N., Tran, M., Le, H. Ha, Q., Oellrich, A. Rebholz-Schuhmann, D. (2013), “Learning to recognize phenotype candidates in the auto-immune literature
using SVM re-ranking”, PLoS One 8(10): e72965.
Lesson learnt … sampling matters
Resource        Size (records)
PubMed          23,765,575
GENIA           2,000
PennBioIE       1,414
FSU-PRGE        3,236
Arizona corpus  2,775 sentences
I2B2/VA 2010    826
M1: In domain approach
Sample B1
Learner
Knowledge
Evaluation (A)
Sample B2
Sample B
M2: Out domain approach
Sample A
Learner
Knowledge
Evaluation (B)
Sample B
M3: Mix-in approach
Sample A
Learner
Knowledge
Evaluation (B)
Sample B
Sample B
+
M4: stack approach
Learner
Knowledge
Evaluation (B)
Sample B
Sample B
Sample A
Learner
Knowledge
M5: binary class
Sample A
Learner
Knowledge
Evaluation (B)
Sample B
Sample B
+
Re-label PHE
PHE-1 and PHE-2
Re-label PHE-1
and PHE-2 as PHE
M6: frustratingly simple
Sample A
Learner
Knowledge
Evaluation (B)
Sample B
Sample B
+
Re-label features as
Sample A, Sample B and Joint
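The feature re-labelling in M6 resembles feature augmentation in the style of Daumé III's "frustratingly easy" domain adaptation; that correspondence is an assumption based on the slide's label. A minimal sketch, with the hypothetical helper `augment` and made-up feature names:

```python
def augment(features, domain):
    """Copy every feature into a shared 'joint' version and a domain-specific version."""
    augmented = {}
    for name, value in features.items():
        augmented[f"joint:{name}"] = value      # visible to both samples
        augmented[f"{domain}:{name}"] = value   # visible to one sample only
    return augmented

# Illustrative feature dictionaries for one token from each sample.
sample_a = augment({"word=fever": 1.0, "suffix=itis": 1.0}, "A")
sample_b = augment({"word=fever": 1.0}, "B")

# Only the joint copies overlap, so a learner can share weights across
# domains through them and keep domain-specific behaviour in the other copies.
shared = set(sample_a) & set(sample_b)
```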
Near domain transfer results
How can we do domain adaptation better (with fewer annotations)?
[17] Collier, N., Paster, F., Campus, H., & Tran, A. M. V. (2014), “The impact of near domain transfer on biomedical named entity recognition”, Proc. 5th
International Workshop on Health Text Mining and Information Analysis (LOUHI) at the European Conference on Computational Linguistics (EACL),
Gothenburg, Sweden, pp. 11-20.
SIPHS: UNDERSTANDING THE VOICE OF THE
PATIENT
Case study #3
[18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms in social media messages”,
in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September 2015, pp.
1675-1680.
[19] Limsopatham, N. and Collier, N. (2015), “Towards the semantic interpretation of personal health messages from social media”, in Proceedings
of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Workshop on Understanding the City with
Urban Informatics (UCUI 2015), Melbourne, Australia, 19-23 October 2015.
What do people talk about?
Types                   Tweet samples
Influenza confirmation  I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr
Influenza symptoms      My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1
Flu shots               I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia
Self protection         Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1
Medication              Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote
Tracking anxiety indicators have moderate-strong correlation
with CDC seasonal flu tracking
Category  Spearman’s Rho  P-value
A         0.66            0.020
S         0.66            0.021
I         0.58            0.048
P         0.67            0.017
A+I+P     0.68            0.008
A+I+P+S   0.67            0.017
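For readers unfamiliar with the measure, Spearman's rho is the Pearson correlation of the ranks of two series. The sketch below uses only the standard library; the weekly values are invented for illustration and are not the study's data.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical weekly counts: tweets in one category vs. reported flu cases.
tweets_per_week = [120, 180, 260, 240, 150, 90]
cases_per_week = [900, 1400, 2100, 2300, 1300, 800]
rho = spearman_rho(tweets_per_week, cases_per_week)
```

Because only ranks matter, the measure is insensitive to the very different scales of tweet counts and case counts.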
[Plot: CDC, A, S, I, P, A+I+P and A+I+P+S series; x-axis ticks 46-52 and 1-5; two count axes, 0-450 and 0-3000]
Data source: CDC (2009-2010 flu season)
“Cover your mouth if coughing,
use a tissue, wash your hands often & get
a flu shot - protect and defend your
community”
“I’m still getting flu shots, nothing is worth
flu turning into bronchitis into pneumonia”
“I can ignore this sore throat no longer.
And, um, maybe I should have gotten
that H1N1 vaccine.“
Frustratingly simple models work better
Classifying respiratory syndrome: Turning 225,000 Tweets into a
high correlation influenza tracker
[22] Doan, S., Ohno-Machado, L. and Collier, N. (2012), "Enhancing Twitter data analysis with simple semantic filtering: example in tracking Influenza-Like Illnesses", in
the 2nd IEEE Conference on Healthcare Informatics, Imaging and Systems Biology: Analyzing Big Data for Healthcare and Biomedical Sciences, California, USA,
September 27-28.
Coding the voice of the patient in SIPHS
• Integrate the language of Social Media and Lifescience Ontologies
• ‘Voice of the patient’ – real time public health mapping/risk analysis
• Code patient-centred vocabulary and links
• Generate public health summaries, e.g. infectious diseases, ADRs
Twitter message                           SNOMED preferred term   SNOMED ID
No way I’m getting any sleep 2nite        Insomnia                193462001
Take _DRUG_ and can’t even focus forreal  Unable to concentrate   60032008
_DRUG_ makes u skinny                     Weight loss             89362005
“You shall know a word by the company it keeps”
– (Firth, J. R. 1957)
• Existing work [23,24] used word vector similarity to measure the
semantic similarity between texts
→ Performance seems to depend on the vector representation used (e.g.
CBOW [23], GloVe [24])
[23] Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013
[24] Pennington et al. GloVe: Global vectors for word representation. EMNLP 2014
• Recent advances in deep learning
technology [23,24] allowed the learned
representation of terms (i.e. DWRs) that
could capture the semantic similarity of
terms based on their co-occurrences, e.g.
Continuous bag-of-words (CBOW) [23], Global
Vector (GloVe) [24]
Related work – Phrase-based MT
• Phrase-based MT [25]: Translate between languages by learning local
term dependencies from parallel corpora
→ We adapt phrase-based MT to translate from social media language to
formal medical language
Can’t even focus forreal → no concentrate → ???
[25] Koehn et al. Statistical phrase-based translation. NAACL 2003
Adapting Phrase-based MT for Twitter Normalisation
• We use phrase-based MT to translate social media text to formal
medical text, then map the translated symptoms to a SNOMED-CT
concept
Can’t even focus forreal → (translate) unable to focus → (find semantic distance) unable to concentrate (ID 60032008)
[18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms
in social media messages”, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,
Lisbon, Portugal, September, pp. 1675-1680.
1. A Twitter phrase, e.g. ‘No way I’m getting any sleep 2nite’
2. A phrase-based MT model, using a phrase-based model such as Koehn et al. (2003), trained on pairs of Twitter phrases and SNOMED-CT terms, e.g. ‘no sleep week’ = ‘Insomnia’, ‘so unfocussed!!!’ = ‘Unable to concentrate’
3. Our mapping approach (i.e. Sim, rSim)
4. A ranking of mapped concepts, e.g. 1. Insomnia (193462001), 2. Productivity at work (224403006)
System Architecture
Experimental Setup
• Instantiations of our approach:
→ Sim(1): using only the best translation
→ Sim(5): using the top 5 translations
→ rSim(5): using the top 5 translations
• Baseline: Cosine similarity of vector representations of the original
tweet and the description of a concept
→ One-hot
→ Continuous Bags of Words (CBOW)
→ Global Vector (GloVe)
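The baseline can be illustrated with a small sketch that ranks concept descriptions by cosine similarity to a tweet. It is a simplification rather than the experimental code: a word-count vector (the sum of one-hot word vectors) stands in for the one-hot representation, `rank_concepts` is a hypothetical helper, and the three concepts are the ones shown earlier in the deck.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    # Word-count vector; equivalent to summing one-hot vectors of the tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_concepts(message, concepts):
    """Return (score, concept_id, description) tuples, best match first."""
    query = bag_of_words(message)
    scored = [(cosine(query, bag_of_words(desc)), cid, desc)
              for cid, desc in concepts.items()]
    return sorted(scored, reverse=True)

# SNOMED IDs and preferred terms as listed on the earlier slide.
CONCEPTS = {
    "193462001": "Insomnia",
    "60032008": "Unable to concentrate",
    "89362005": "Weight loss",
}

ranking = rank_concepts("unable to sleep at all", CONCEPTS)
```

With surface word overlap alone, the top-ranked concept for this message is ‘Unable to concentrate’ rather than ‘Insomnia’, which is the kind of error a translation step is meant to reduce.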
Experimental Results
• RQ1: Does our approach perform better than SOTA DWR baselines?
MRR-5 by vector representation and method:
         Baseline  Sim(1)  Sim(5)  rSim(5)
One-hot  0.1675    0.2232  0.2491  0.2458
CBOW     0.1896
GloVe    0.1869
Yes, all instances of our approach markedly outperformed the DWR baselines by up to 33%
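Assuming MRR-5 denotes mean reciprocal rank computed over the top 5 ranked concepts (the slides do not define it), the metric can be sketched as follows; the concept labels in the example are placeholders.

```python
def mrr_at_k(rankings, gold, k=5):
    """Mean reciprocal rank of the gold concept within the top k of each ranking.

    A query scores 1/position if its gold concept appears in the top k,
    and 0 otherwise.
    """
    total = 0.0
    for ranked, target in zip(rankings, gold):
        for position, concept in enumerate(ranked[:k], start=1):
            if concept == target:
                total += 1.0 / position
                break
    return total / len(gold)

# Toy example: gold concept ranked 1st, 2nd, and missing.
rankings = [["A", "B"], ["B", "A", "C"], ["C", "D"]]
gold = ["A", "A", "Z"]
score = mrr_at_k(rankings, gold)
```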
Twitter message: “unable to sleep at all”
Baseline:
Mapping: “unable to sleep at all” → ‘unable to concentrate’
Our approach:
Translation: “unable to sleep at all” → “insomnia of”
Mapping: “insomnia of” → ‘insomnia’
Experimental Results
• RQ2: Which types of DWRs are effective for our approach?
MRR-5 by vector representation and method:
         Baseline  Sim(1)  Sim(5)  rSim(5)
One-hot  0.1675    0.2232  0.2491  0.2458
CBOW     0.1896    0.2070  0.2104  0.2109
GloVe    0.1869    0.2500  0.2638  0.2617
Both Sim and rSim outperform the baseline, regardless of the vector representation used
Experimental Results
• RQ3: Would the performance improve if we consider both original and
translated text when mapping a concept?
Performances improved when using one-hot representation
MRR-5 by vector representation and method:
         Sim(1)  Sim(1)+  Sim(5)  Sim(5)+  rSim(5)  rSim(5)+
One-hot  0.2232  0.242    0.2491  0.2556   0.2458   0.2594
CBOW     0.2070  0.1953   0.2104  0.2144   0.2109   0.207
GloVe    0.2500  0.2532   0.2638  0.2600   0.2617   0.2509
Summary
• How we exploit the base of medical evidence is changing as access to unstructured
‘messy’ data opens up new opportunities
• Data access, bias and standards
• We can expect impact in epidemic detection, pharmacovigilance, translational health,
disease mapping, risk communication, rare disease profiling and many other areas.
• Encoding the data increases value through data mining, exchange and integration
• Machine learning outperforms dictionaries and hand built rules
• Finding the right lexical representation and right target form is key
Thank you
Contributions by:
Nigel Collier
nhc30@cam.ac.uk
Anna Korhonen
alk23@cam.ac.uk
Nut Limsopatham
nl347@cam.ac.uk
Further information at the
Language Technology Lab
http://ltl.mml.cam.ac.uk/
Funding:

More Related Content

What's hot

Introduction to Cancer Genomics Databases
Introduction to Cancer Genomics DatabasesIntroduction to Cancer Genomics Databases
Introduction to Cancer Genomics Databases
Neuro, McGill University
 
Bioinformatics in medicine
Bioinformatics in medicineBioinformatics in medicine
Bioinformatics in medicine
Kokulapalan Wimalanathan
 
Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases
Waliullah Wali
 
Bioinformatics workshop presentation
Bioinformatics   workshop presentationBioinformatics   workshop presentation
Bioinformatics workshop presentation
SKUAST-Kashmir
 
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET
 
Brief introduction to Bioinformatics
Brief introduction to BioinformaticsBrief introduction to Bioinformatics
Brief introduction to Bioinformatics
Cynthia Alexander Rascon
 
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Sage Base
 
Bioinformatics Information Sources
Bioinformatics Information SourcesBioinformatics Information Sources
Bioinformatics Information Sources
Dr. Rupak Chakravarty
 
Patient-Organized Genomic Research Studies
Patient-Organized Genomic Research StudiesPatient-Organized Genomic Research Studies
Patient-Organized Genomic Research Studies
Melanie Swan
 
Computational challenges in precision medicine and genomics
Computational challenges in precision medicine and genomicsComputational challenges in precision medicine and genomics
Computational challenges in precision medicine and genomics
Gary Bader
 
Potentials of 3D models in anticancer drug screening
Potentials of 3D models in anticancer drug screeningPotentials of 3D models in anticancer drug screening
Potentials of 3D models in anticancer drug screening
Anjali R.
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Tanveer Abbas
 
Emerging collaboration models for academic medical centers _ our place in the...
Emerging collaboration models for academic medical centers _ our place in the...Emerging collaboration models for academic medical centers _ our place in the...
Emerging collaboration models for academic medical centers _ our place in the...
Rick Silva
 
BigData in Life Sciences, Genomics and Systems Biology
BigData in Life Sciences, Genomics and Systems BiologyBigData in Life Sciences, Genomics and Systems Biology
BigData in Life Sciences, Genomics and Systems Biology
Harsha Rajasimha
 
John Boikov Personalised Medicine Essay, Mark - 95 out of 100
John Boikov Personalised Medicine Essay, Mark - 95 out of 100John Boikov Personalised Medicine Essay, Mark - 95 out of 100
John Boikov Personalised Medicine Essay, Mark - 95 out of 100
John Boikov
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
Andrew Su
 
Cancer uk 2015_module1_ouellette_ver02
Cancer uk 2015_module1_ouellette_ver02Cancer uk 2015_module1_ouellette_ver02
Cancer uk 2015_module1_ouellette_ver02
Neuro, McGill University
 
Use cases
Use casesUse cases
Use cases
improvemed
 

What's hot (18)

Introduction to Cancer Genomics Databases
Introduction to Cancer Genomics DatabasesIntroduction to Cancer Genomics Databases
Introduction to Cancer Genomics Databases
 
Bioinformatics in medicine
Bioinformatics in medicineBioinformatics in medicine
Bioinformatics in medicine
 
Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases
 
Bioinformatics workshop presentation
Bioinformatics   workshop presentationBioinformatics   workshop presentation
Bioinformatics workshop presentation
 
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
 
Brief introduction to Bioinformatics
Brief introduction to BioinformaticsBrief introduction to Bioinformatics
Brief introduction to Bioinformatics
 
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
 
Bioinformatics Information Sources
Bioinformatics Information SourcesBioinformatics Information Sources
Bioinformatics Information Sources
 
Patient-Organized Genomic Research Studies
Patient-Organized Genomic Research StudiesPatient-Organized Genomic Research Studies
Patient-Organized Genomic Research Studies
 
Computational challenges in precision medicine and genomics
Computational challenges in precision medicine and genomicsComputational challenges in precision medicine and genomics
Computational challenges in precision medicine and genomics
 
Potentials of 3D models in anticancer drug screening
Potentials of 3D models in anticancer drug screeningPotentials of 3D models in anticancer drug screening
Potentials of 3D models in anticancer drug screening
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Emerging collaboration models for academic medical centers _ our place in the...
Emerging collaboration models for academic medical centers _ our place in the...Emerging collaboration models for academic medical centers _ our place in the...
Emerging collaboration models for academic medical centers _ our place in the...
 
BigData in Life Sciences, Genomics and Systems Biology
BigData in Life Sciences, Genomics and Systems BiologyBigData in Life Sciences, Genomics and Systems Biology
BigData in Life Sciences, Genomics and Systems Biology
 
John Boikov Personalised Medicine Essay, Mark - 95 out of 100
John Boikov Personalised Medicine Essay, Mark - 95 out of 100John Boikov Personalised Medicine Essay, Mark - 95 out of 100
John Boikov Personalised Medicine Essay, Mark - 95 out of 100
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
 
Cancer uk 2015_module1_ouellette_ver02
Cancer uk 2015_module1_ouellette_ver02Cancer uk 2015_module1_ouellette_ver02
Cancer uk 2015_module1_ouellette_ver02
 
Use cases
Use casesUse cases
Use cases
 

Viewers also liked

Data integration: The STITCH database of protein-small molecule interactions
Data integration: The STITCH database of protein-small molecule interactionsData integration: The STITCH database of protein-small molecule interactions
Data integration: The STITCH database of protein-small molecule interactions
Lars Juhl Jensen
 
Integration of biomedical literature and databases
Integration of biomedical literature and databasesIntegration of biomedical literature and databases
Integration of biomedical literature and databases
Lars Juhl Jensen
 
Text mining for protein and small molecule relations
Text mining for protein and small molecule relationsText mining for protein and small molecule relations
Text mining for protein and small molecule relations
Lars Juhl Jensen
 
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Julien PLU
 
Open access - making the most of biomedical literature mining
Open access - making the most of biomedical literature miningOpen access - making the most of biomedical literature mining
Open access - making the most of biomedical literature mining
Lars Juhl Jensen
 
Utilizing literature for biological discovery
Utilizing literature for biological discoveryUtilizing literature for biological discovery
Utilizing literature for biological discovery
Lars Juhl Jensen
 
One tagger, many uses - Illustrating the power of ontologies in named entity ...
One tagger, many uses - Illustrating the power of ontologies in named entity ...One tagger, many uses - Illustrating the power of ontologies in named entity ...
One tagger, many uses - Illustrating the power of ontologies in named entity ...
Lars Juhl Jensen
 
STRING - Protein networks from data and text mining
STRING - Protein networks from data and text miningSTRING - Protein networks from data and text mining
STRING - Protein networks from data and text mining
Lars Juhl Jensen
 

Exploiting NLP for Digital Disease Informatics

  • 1. Exploiting NLP for Digital Disease Informatics University of Warwick, October 15th 2015 Nigel Collier, Language Technology Lab Department of Theoretical and Applied Linguistics
  • 2. Really understanding natural language is the next grand challenge • High throughput methods have transformed biomedicine into a data-rich science • All genes in a genome, all proteins in a proteome, all transcripts in a cell, all metabolic processes in a tissue…
  • 3. Really understanding natural language is the next grand challenge • High throughput methods have transformed biomedicine into a data-rich science • All genes in a genome, all proteins in a proteome, all transcripts in a cell, all metabolic processes in a tissue… • A significant portion of human health data is ‘messy data’ existing only as unstructured text • Biomedical literature, Clinical trials data, Lab notebooks, Clinical records, Diagnostic reports, News reports, Social media messages • Represents the most contextually grounded, high precision information about an individual’s health, attitudes and behaviours
  • 4. Really understanding natural language is the next grand challenge • High throughput methods have transformed biomedicine into a data-rich science • All genes in a genome, all proteins in a proteome, all transcripts in a cell, all metabolic processes in a tissue… • A significant portion of human health data is ‘messy data’ existing only as unstructured text • Biomedical literature, Clinical trials data, Lab notebooks, Clinical records, Diagnostic reports, News reports, Social media messages • Represents the most contextually grounded, high precision information about an individual’s health, attitudes and behaviours • Natural language processing (NLP) is a cornerstone technology to translate ‘messy data’ into structured forms that are systematically encoded, e.g. SNOMED-CT, ICD.
  • 5. Experience from personal research (1) Global infectious disease alerting and mapping (2) Extracting a database of phenotype terms (3) Understanding the voice of the patient (4) Chemical cancer risk assessment (5) Critical hypothesis generation from literature
  • 6. Typical workflow from text to knowledge. Stages shown: raw text document; sentence segmentation; tokenization; lexical featurisation; entity recognition; trigger detection; relation extraction; event extraction; entity grounding; knowledge objects; syntactic parsing
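As a rough illustration of the first few stages in this workflow, the sketch below chains sentence segmentation, tokenization and a dictionary-based entity recogniser. It is a toy under stated assumptions, not the system described in the talk: the regular expressions, the `DISEASE_LEXICON` contents and all function names are hypothetical, and a real pipeline would use trained models and an ontology.

```python
import re

# Hypothetical mini-lexicon; a real system would draw on an ontology or a trained model.
DISEASE_LEXICON = {"anthrax": "DISEASE", "avian flu": "DISEASE", "cholera": "DISEASE"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def recognise_entities(tokens, lexicon=DISEASE_LEXICON, max_len=2):
    """Greedy longest-match lookup of lower-cased token n-grams in the lexicon."""
    entities, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                entities.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return entities


def text_to_entities(text):
    """Run the three stages and return (sentence, entities) pairs."""
    return [(s, recognise_entities(tokenize(s))) for s in split_sentences(text)]
```

Later stages on the slide (trigger detection, relation and event extraction, grounding) would consume the entity spans produced here.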
  • 7. Broad Research Objectives • Extrinsic: Robust data collection from across health-related text types: literature, patient records, news, social media (public health alerts, developing disease profiles, etc.) • Intrinsic: Understand how NLP/ML/Ontology techniques perform and can be improved in operational settings
  • 8. BIOCASTER: GLOBAL INFECTIOUS DISEASE ALERTING AND MAPPING Case study #1 [5] Collier, N. et al. (2008). BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24), 2940-2941. [6] Collier, N., et al. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. J. Biomedical Semantics, 2(S-5), S9. [7] Hay, S. I., et al. (2013). Global mapping of infectious disease. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368(1614), 20120250.
  • 9. Infectious diseases spread rapidly “We live in a world where threats to health arise from the speed and volume of air travel, the way we produce and trade food, the way we use and misuse antibiotics, and the way we manage the environment…” - Dr. Margaret Chan, DG WHO. Examples: SARS, 2003 (HK, world); H5N1 flu, 2003- (PRC, Thailand, ROC, Vietnam); Foot & mouth, 2001 (United Kingdom); Ebola, 2014- (Guinea, Liberia, Sierra Leone, Nigeria)
  • 10. Digital epidemic surveillance with BioCaster. Features: trend graphs; event summaries; event alerts; ontology browsing; Email/GeoRSS alerting; Watchboard, etc.; real time Twitter analysis; up to date news in 12 languages; event database search. GHSI partners: US, UK, FR, DE, WHO, IT, JP, CA
  • 11. Example frame
      <SLOT name="HAS_DISEASE" type="DISEASE" content="Anthrax" alt="" root_term="Anthrax" bid=""/>
      <SLOT name="HAS_LOCATION.COUNTRY" type="LOCATION" content="Morocco" alt="" root_term="Morocco" bid=""/>
      <SLOT name="HAS_LOCATION.PROVINCE" type="LOCATION" content="Marrakech" alt="" root_term="" bid=""/>
      <SLOT name="HAS_AGENT" type="micro_organism" content="Bacillus anthracis" alt="" root_term="" bid=""/>
      <SLOT name="HAS_SPECIES" type="animal" content="human" alt="" root_term="" bid=""/>
      <SLOT name="TIME.relative" type="string" content=""/>
      <SLOT name="INTERNATIONAL_TRAVEL" type="Boolean" content="false"/>
      <SLOT name="DELIBERATE_RELEASE" type="Boolean" content="false"/>
      <SLOT name="ZOONOSIS" type="Boolean" content="false"/>
      <SLOT name="DRUG_RESISTANCE" type="Boolean" content="false"/>
      <SLOT name="FOOD_CONTAMINATION" type="Boolean" content="false"/>
      <SLOT name="HOSPITAL_WORKER" type="Boolean" content="false"/>
      <SLOT name="FARM_WORKER" type="Boolean" content="false"/>
      <SLOT name="MALFORMED_PRODUCT" type="Boolean" content="false"/>
      <SLOT name="NEW_TYPE_AGENT" type="Boolean" content="false"/>
      <SLOT name="SERVICE_DISRUPTION" type="Boolean" content="false"/>
      <SLOT name="CATEGORY_A" type="Boolean" content="true">
      </EVENT>
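To show how a frame like this could be consumed downstream, here is a minimal sketch that reads the slots into a dictionary with Python's standard `xml.etree.ElementTree`. Only a few slots are copied from the slide; the opening `<EVENT>` tag and the self-closed `CATEGORY_A` slot are assumptions made so that the snippet is well-formed, and `frame_to_dict` is a hypothetical helper, not part of BioCaster.

```python
import xml.etree.ElementTree as ET

# A few slots copied from the example frame. The opening <EVENT> tag is an
# assumption: the slide shows only the closing tag.
FRAME = """<EVENT>
<SLOT name="HAS_DISEASE" type="DISEASE" content="Anthrax" alt="" root_term="Anthrax" bid=""/>
<SLOT name="HAS_LOCATION.COUNTRY" type="LOCATION" content="Morocco" alt="" root_term="Morocco" bid=""/>
<SLOT name="ZOONOSIS" type="Boolean" content="false"/>
<SLOT name="CATEGORY_A" type="Boolean" content="true"/>
</EVENT>"""


def frame_to_dict(xml_text):
    """Map each slot name to its content, converting Boolean slots to bool."""
    slots = {}
    for slot in ET.fromstring(xml_text).iter("SLOT"):
        value = slot.get("content", "")
        if slot.get("type") == "Boolean":
            value = (value == "true")
        slots[slot.get("name")] = value
    return slots
```

A call such as `frame_to_dict(FRAME)["HAS_DISEASE"]` then gives the disease name as a plain string.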
  • 12. Technical challenges: REAL TIME SCALING. X0,000 news providers; 30,000-40,000 news items/day; 900 on topic/day; 200 events/day; 4 alerts/day
  • 13. Technical challenges: X0,000 news providers; REAL TIME SCALING; MULTILINGUALITY (Avian Flu; Influenza aviaire; 鳥インフルエンザ; 조류인플루엔자; โรคไข้หวัดนก; Cúm gia cầm) • Increased sensitivity and timeliness from multilingual news [Figure: news event counts for porcine foot-and-mouth outbreak in South Korea, 2010-2011]
  • 14. Technical challenges: X0,000 news providers; MULTILINGUALITY; REAL TIME SCALING; AMBIGUITY. Entity identification: “Obama fever builds as Americans await a new era”. Toponym grounding: Equine influenza in Camden: Camden (UK), Camden (AU), Camden (CA) + 19 others. Variant transliterations: Tajoura, Tajura, Tajoora… Coreference: “Two British holidaymakers fell ill… ” “Two male pensioners died…” 2 or 4 victims? Temporal identification: “The Spanish flu outbreak…
  • 16. Looking for bursts of activity. Outbreak characteristics: early surge vs multi-modal transmission [Figure: news event frequency over time. Source: BioCaster]
  • 17. Understanding norms and their violations. Alerts with the C2 test statistic: $S_t = \max(0, (C_t - (\mu_t + 3\sigma_t)) / \sigma_t)$ [Figure: counts $c_t$ plotted against $\mu$ and $\mu + 3\sigma$, with gold alerts and first English language reports (MMWR + AP) marked. Source: GENI-DB]
  • 18. 5 detection algorithms
      1. Early aberration reporting system (EARS) C2 algorithm: captures the number of standard deviations by which the current count exceeds the history mean; $S_t = \max(0, (C_t - (\mu_t + k\sigma_t)) / \sigma_t)$
      2. EARS C3 algorithm: similar to C2 except that C3 uses a weighted sum of the previous 3 days for the current period
      3. W2 algorithm: a modified version of C2 which ignores history counts on Saturdays and Sundays to compensate for day-of-week effects
      4. F statistic: compares the variance in the history window to the variance in the current window; $S_t = \sigma_t^2 + \sigma_b^2$
      5. Exponential Weighted Moving Average (EWMA): gives less weight to days in the history that are further from the test day; $S_t = (Y_t - \mu_t) / [\sigma_t (\lambda / (2 - \lambda))^{1/2}]$, where $Y_1 = C_1$ and $Y_t = \lambda C_t + (1 - \lambda) Y_{t-1}$
      Model parameters were estimated based on an additional 5 epidemic data sets from ProMED-mail (data not shown)
      [8] Burkom H. S. (2005), “Accessible Alerting Algorithms for Biosurveillance”. National Syndromic Surveillance Conference
      [9] Jackson M. L. et al. (2007), “A simulation study comparing aberration detection algorithms for syndromic surveillance”, BMC Medical Informatics and Decision Making, 7(6), DOI: 10.1186/1472-6947-7-6.
      [10] Madoff L. (2004), “ProMED-mail: An early warning system for emerging diseases”. Clin Infect Dis, 39(2): 227–232.
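The C2 statistic and the EWMA recurrence above can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation evaluated in the talk: the 7-day baseline, the 2-day guard band, the `min_sigma` floor that avoids division by zero, and the default `lam` are hypothetical choices, since the slides do not give the parameter values used.

```python
from statistics import mean, stdev


def c2_statistic(counts, t, baseline=7, guard=2, k=3.0, min_sigma=0.5):
    """S_t = max(0, (C_t - (mu_t + k*sigma_t)) / sigma_t) for day index t.

    mu_t and sigma_t come from a baseline window that ends `guard` days
    before day t (window sizes here are assumptions, not from the slides).
    """
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history before day t")
    history = counts[start:t - guard]
    mu = mean(history)
    # Floor sigma so a perfectly flat history does not cause division by zero.
    sigma = max(stdev(history), min_sigma)
    return max(0.0, (counts[t] - (mu + k * sigma)) / sigma)


def ewma(counts, lam=0.4):
    """Smoothed series with Y_1 = C_1 and Y_t = lam*C_t + (1 - lam)*Y_{t-1}."""
    smoothed = [float(counts[0])]
    for c in counts[1:]:
        smoothed.append(lam * c + (1 - lam) * smoothed[-1])
    return smoothed
```

A day would raise an alert when `c2_statistic` returns a value above zero, that is, when the count exceeds the baseline mean by more than `k` standard deviations.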
  • 19. Creating a benchmark data set
      # | Disease | Country | ProMED-alerts
      1 | Hand, foot, mouth | PR China | 9
      2 | Ebola | Congo | 17
      3 | Yellow fever | Brazil | 28
      4 | Influenza | USA | 21
      5 | Cholera | Iraq | 5
      6 | Chikungunya | Singapore | 8
      7 | Anthrax | USA | 15
      8 | Yellow fever | Argentina | 5
      9 | Ebola Reston | Philippines | 15
      10 | Influenza | Egypt | 49
      11 | Plague | USA | 8
      12 | Dengue | Brazil | 27
      13 | Dengue | Indonesia | 14
      14 | Measles | UK | 13
      15 | Chikungunya | Malaysia | 15
      16 | Yellow fever | Senegal | 0
      17 | Influenza | Indonesia | 35
      18 | Influenza | Bangladesh | 3
      14 countries and 11 infectious disease types. 366 days of news data were collected from BioCaster for each disease and country. The study period is 17th June 2008 to 17th June 2009.
  • 20. Comparison of 5 aberration detection algorithms
      Measure | C3 | C2 | W2 | F-statistic | EWMA
      Sensitivity | 0.74 (0.69-0.78) | 0.66 (0.61-0.72) | 0.66 (0.60-0.71) | 0.78 (0.74-0.82) | 0.73 (0.68-0.78)
      Specificity | 0.96 (0.95-0.96) | 0.98 (0.98-0.98) | 0.98 (0.98-0.99) | 0.92 (0.91-0.92) | 0.95 (0.94-0.96)
      PPV | 0.55 (0.98-0.99) | 0.64 (0.98-0.99) | 0.65 (0.98-0.99) | 0.46 (0.98-0.99) | 0.47 (0.98-0.99)
      NPV | 0.98 (0.98-0.99) | 0.98 (0.98-0.99) | 0.98 (0.98-0.99) | 0.98 (0.98-0.98) | 0.98 (0.98-0.99)
      Alarms/100 days | 6.48 | 4.52 | 4.17 | 12.34 | 7.85
      F-measure | 0.63 | 0.65 | 0.66 | 0.58 | 0.58
      Results in parentheses show 95% confidence intervals
      [11] Collier, N. (2009), “What’s unusual in online disease outbreak news?”, in BMC Biomedical Semantics, 1(2).
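The F-measure row appears to be the harmonic mean of PPV (precision) and sensitivity (recall). That reading is an assumption, since the slide does not define the measure, but it matches the C3 and C2 columns. A minimal sketch, with `f_measure` as a hypothetical helper name:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PPV and sensitivity as reported in the table for C3 and C2.
c3 = f_measure(0.55, 0.74)
c2 = f_measure(0.64, 0.66)
```

Rounded to two decimals these give 0.63 and 0.65. Other columns can differ in the last digit because the table's inputs are themselves rounded.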
  • 21. Field evaluation • (2006-2012) Global Health Security Initiative: a unique initiative by G7+WHO+EC to bring together end-users, system providers and stakeholders to test the feasibility of open source public health intelligence systems. Major findings for A/H5N1: detection rates for individual systems ranged from 31% to 38%, rising to 72% for the combined system; PPV ranged from 3% to 24%; F1 ranged from 6% to 27%; sensitivity ranged from 38% to 72%; average improvement in alerting over WHO or OIE was 10.2 days. [12] Barboza, P., Vaillant, L., Le Strat, Y., Hartley, D. M., Nelson, N. P., Mawudeku, A., Madoff, L. C., Linge, J. P., Collier, N., Brownstein, J. S. and Astagneau, P. (2014). Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks. PloS one, 9(3), e90536. [13] Barboza, P., Vaillant, L., Mawudeku, A., Nelson, N., Hartley, D., Madoff, L., Linge, J., Collier, N., Brownstein, J., Yangarber, R. and Astagneau, P. (2013), “Evaluation of epidemic intelligence systems integrated in the Early Alerting and Reporting project for the detection of A/H5N1 Influenza events”, PLoS One, 8(3):e57252.
  • 22. User outcomes • Used by WHO and Japanese MoH to detect early cases during the A(H1N1) pandemic • Used by ECDC to monitor diseases during the Shanghai Expo 2010 and London Olympics 2012 • Used by French Institute for Public Health to monitor for human-to-human A(H5N1) transmission • Used by GHSI members to monitor for suspected accidental or deliberate releases • Used by CDC to help monitor the health impact of the oil spill in the Gulf of Mexico
  • 23. PHENOMINER/PHENEBANK: EXTRACTING A DATABASE OF PHENOTYPE TERMS Case study #2 [14] Collier, N., Groza, T., Smedley, D., Robinson, P., Oellrich, A. and Rebholz-Schuhmann, D. (2015). PhenoMiner: from text to a database of phenotypes associated with OMIM diseases. Database, Oxford University Press (in press). [15] Collier, N., Oellrich, A. and Groza, T. (2013), “Toward knowledge support for analysis and interpretation of complex traits”, Genome Biology 14(9):214.
  • 24. What is a phenotype? Image courtesy of Washington, Haendel, Mungall, Ashburner, Westfield and Lewis (2009), “Linking human diseases to animal models using ontology-based phenotype annotation”, PLoS Biology, 7(11):e1000247.
  • 25. Coding personal terminology: “… patients were selected for FOXP2 screening only if they fulfilled the following criteria: presence of speech articulation problems diagnosed by a clinician …” coded as HPO: 0009088 Speech articulation difficulties. Image courtesy of Damian Smedley, Wellcome Trust Sanger Institute, Hinxton and Tudor Groza, University of Queensland, Brisbane
  • 26. SVM learn-to-rank (pairwise); Maximum entropy; Priority list heuristic. “… patients were selected for FOXP2 screening only if they fulfilled the following criteria: presence of speech articulation problems diagnosed by a clinician”
  • 27. Creating a benchmark data set • Data from OMIM cited autoimmune literature (112 abstracts, 472 phenotypes, 1611 gene/gene products).
  • 28. F-scores computed using ablation on various domain ontologies
  • 29. F-scores using 3 hypothesis resolution strategies [16] Collier, N., Tran, M., Le, H. Ha, Q., Oellrich, A. Rebholz-Schuhmann, D. (2013), “Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking”, PLoS One 8(10): e72965.
  • 30. Lesson learnt … sampling matters
      Resource | Size (records)
      PubMed | 23,765,575
      GENIA | 2,000
      PennBioIE | 1,414
      FSU-PRGE | 3,236
      Arizona corpus | 2,775 sentences
      I2B2/VA 2010 | 826
  • 31. M1: In domain approach [Diagram labels: Sample B1; Learner; Knowledge; Evaluation (A); Sample B2; Sample B]
  • 32. M2: Out domain approach [Diagram labels: Sample A; Learner; Knowledge; Evaluation (B); Sample B]
  • 33. M3: Mix-in approach [Diagram labels: Sample A; Learner; Knowledge; Evaluation (B); Sample B; Sample B; +]
  • 34. M4: stack approach [Diagram labels: Learner; Knowledge; Evaluation (B); Sample B; Sample B; Sample A; Learner; Knowledge]
  • 35. M5: binary class [Diagram labels: Sample A; Learner; Knowledge; Evaluation (B); Sample B; Sample B; +] Re-label PHE as PHE-1 and PHE-2; re-label PHE-1 and PHE-2 as PHE
  • 36. M6: frustratingly simple [Diagram labels: Sample A; Learner; Knowledge; Evaluation (B); Sample B; Sample B; +] Re-label features as Sample A, Sample B and Joint
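The M6 note ("Re-label features as Sample A, Sample B and Joint") reads like feature augmentation in the style of Daumé III's "frustratingly easy" domain adaptation, where every feature is duplicated into a shared copy and a domain-specific copy. Assuming that is what the slide intends, a minimal sketch looks like this; the `joint:` and domain prefixes and the `augment` helper are hypothetical names:

```python
def augment(features, domain):
    """Copy each feature into a shared ('joint') version and a domain-specific version.

    `features` maps feature names to values; `domain` is a label such as "A" or "B".
    """
    augmented = {}
    for name, value in features.items():
        augmented[f"joint:{name}"] = value     # shared across both samples
        augmented[f"{domain}:{name}"] = value  # active only for this sample
    return augmented


# A token feature seen in Sample A and the same feature seen in Sample B.
from_a = augment({"word=fever": 1}, "A")
from_b = augment({"word=fever": 1}, "B")
```

Training a single learner on the augmented vectors from both samples lets it weight the joint copy where the domains agree and the domain-specific copies where they differ.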
  • 38. How can we do domain adaptation better (with fewer annotations)? [17] Collier, N., Paster, F., Campus, H., & Tran, A. M. V. (2014), “The impact of near domain transfer on biomedical named entity recognition”, Proc. 5th International Workshop on Health Text Mining and Information Analysis (LOUHI) at the European Conference on Computational Linguistics (EACL), Gothenburg, Sweden, pp. 11-20.
  • 39. SIPHS: UNDERSTANDING THE VOICE OF THE PATIENT Case study #3 [18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms in social media messages”, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September 2015, pp. 1675-1680. [19] Limsopatham, N. and Collier, N. (2015), “Towards the semantic interpretation of personal health messages from social media”, in Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Workshop on Understanding the City with Urban Informatics (UCUI 2015), Melbourne, Australia, 19-23 October 2015.
  • 40. What do people talk about?
      Types | Tweet samples
      Influenza confirmation | I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr
      Influenza symptoms | My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1
      Flu shots | I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia
      Self protection | Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1
      Medication | Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote
  • 41. Tracking anxiety indicators have moderate-to-strong correlation with CDC seasonal flu tracking
      Category | Spearman’s Rho | P-value
      A | 0.66 | 0.020
      S | 0.66 | 0.021
      I | 0.58 | 0.048
      P | 0.67 | 0.017
      A+I+P | 0.68 | 0.008
      A+I+P+S | 0.67 | 0.017
      [Chart: series CDC, A, S, I, P, A+I+P, A+I+P+S; x-axis ticks 46-52 and 1-5. Data source: CDC (2009-2010 flu season)]
      “Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community”
      “I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia”
      “I can ignore this sore throat no longer. And, um, maybe I should have gotten that H1N1 vaccine.”
  • 42. Frustratingly simple models work better Classifying respiratory syndrome: Turning 225,000 Tweets into a high correlation influenza tracker [22] Doan, S., Ohno-Machado, L. and Collier, N. (2012), "Enhancing Twitter data analysis with simple semantic filtering: example in tracking Influenza-Like Illnesses", in the 2nd IEEE Conference on Healthcare Informatics, Imaging and Systems Biology: Analyzing Big Data for Healthcare and Biomedical Sciences, California, USA, September 27-28.
  • 43. Coding the voice of the patient in SIPHS
      • Integrate the language of social media and life science ontologies
      • ‘Voice of the patient’ – real time public health mapping/risk analysis
      • Code patient-centred vocabulary and links
      • Generate public health summaries, e.g. infectious diseases, ADRs
      Twitter message | SNOMED preferred term | SNOMED ID
      No way I’m getting any sleep 2nite | Insomnia | 193462001
      Take _DRUG_ and can’t even focus forreal | Unable to concentrate | 60032008
      _DRUG_ makes u skinny | Weight loss | 89362005
  • 44. “You shall know a word by the company it keeps” – (Firth, J. R. 1957)
      • Existing work [23,24] used word vector similarity to measure the semantic similarity between texts
          → Performance seems to depend on the vector representation used (e.g. CBOW [23], GloVe [24])
      • Recent advances in deep learning technology [23,24] allow learned representations of terms (i.e. DWRs) that can capture the semantic similarity of terms based on their co-occurrences, e.g. Continuous Bag-of-Words (CBOW) [23], Global Vectors (GloVe) [24]
      [23] Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013
      [24] Pennington et al. GloVe: Global vectors for word representation. EMNLP 2014
  • 45. Related work – Phrase-based MT
      • Phrase-based MT [25]: translates between languages by learning local term dependencies from parallel corpora
          → We adapt phrase-based MT to translate from social media language to formal medical language
      Can’t even focus forreal → no concentrate → ???
      [25] Koehn et al. Statistical phrase-based translation. NAACL 2003
  • 46. Adapting Phrase-based MT for Twitter Normalisation
      • We use phrase-based MT to translate social media text to formal medical text, then map the translated symptoms to a SNOMED-CT concept
      Can’t even focus forreal → (translate) unable to focus → (find semantic distance) unable to concentrate (ID 60032008)
      [18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms in social media messages”, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September 2015, pp. 1675-1680.
  • 47. System Architecture
      • A Twitter phrase, e.g. ‘No way I’m getting any sleep 2nite’
      • A phrase-based MT model, using a phrase-based model such as Koehn et al. (2003), trained on pairs of Twitter phrases and SNOMED-CT terms, e.g. ‘no sleep week’ = ‘Insomnia’, ‘so unfocussed!!!’ = ‘Unable to concentrate’
      • Our mapping approach (i.e. Sim, rSim)
      • A ranking of mapped concepts, e.g. 1. Insomnia (193462001) 2. Productivity at work (224403006)
  • 48. Experimental Setup
      • Instantiations of our approach:
          → Sim(1): using only the best translation
          → Sim(5): using the top 5 translations
          → rSim(5): using the top 5 translations
      • Baseline: Cosine similarity of vector representations of the original tweet and the description of a concept
          → One-hot
          → Continuous Bag-of-Words (CBOW)
          → Global Vectors (GloVe)
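The baseline on the setup slide, cosine similarity between the tweet and each concept description, can be illustrated for the one-hot case. This is a minimal sketch, assuming whitespace tokenisation and a bag-of-words sum of one-hot word vectors; the function names are hypothetical and the three concept terms are the SNOMED preferred terms quoted on the SIPHS slide.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Sum of one-hot word vectors, stored sparsely as token counts (assumed tokenisation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_concepts(phrase, concept_terms):
    """Rank concept terms by similarity to the phrase, most similar first."""
    query = bag_of_words(phrase)
    scored = [(term, cosine(query, bag_of_words(term))) for term in concept_terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    concepts = ["Insomnia", "Unable to concentrate", "Weight loss"]
    for term, score in rank_concepts("unable to sleep at all", concepts):
        print(f"{score:.3f}  {term}")
```

With purely lexical vectors the phrase "unable to sleep at all" ranks "Unable to concentrate" first, because they share the tokens "unable" and "to", while "Insomnia" scores zero. Closing that kind of vocabulary gap is what the translation step is meant to do.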
  • 49. Experimental Results
      • RQ1: Does our approach perform better than SOTA DWR baselines?
      MRR-5, baselines: One-hot 0.1675 | CBOW 0.1896 | GloVe 0.1869
      MRR-5, our approach (one-hot): Sim(1) 0.2232 | Sim(5) 0.2491 | rSim(5) 0.2458
      Yes, all instances of our approach markedly outperformed the DWR baselines, by up to 33%
      Twitter message: “unable to sleep at all”
      Baseline: Mapping: “unable to sleep at all” → ‘unable to concentrate’
      Our approach: Translation: “unable to sleep at all” → “insomnia of”; Mapping: “insomnia of” → ‘insomnia’
  • 50. Experimental Results
      • RQ2: Which types of DWRs are effective for our approach?
      MRR-5 | Baseline | Sim(1) | Sim(5) | rSim(5)
      One-hot | 0.1675 | 0.2232 | 0.2491 | 0.2458
      CBOW | 0.1896 | 0.2070 | 0.2104 | 0.2109
      GloVe | 0.1869 | 0.2500 | 0.2638 | 0.2617
      Both Sim and rSim outperform the baseline, regardless of the vector representation used
  • 51. Experimental Results
      • RQ3: Would the performance improve if we consider both original and translated text when mapping a concept?
      MRR-5 | Sim(1) | Sim(1)+ | Sim(5) | Sim(5)+ | rSim(5) | rSim(5)+
      One-hot | 0.2232 | 0.242 | 0.2491 | 0.2556 | 0.2458 | 0.2594
      CBOW | 0.2070 | 0.1953 | 0.2104 | 0.2144 | 0.2109 | 0.207
      GloVe | 0.2500 | 0.2532 | 0.2638 | 0.2600 | 0.2617 | 0.2509
      Performance improved when using the one-hot representation
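The results slides report MRR-5 without defining it. A minimal sketch follows, assuming it means mean reciprocal rank with rankings cut off at the top 5 concepts, so that a correct concept outside the top 5 scores zero. The function names and the three toy rankings are hypothetical; the concept IDs reuse SNOMED IDs quoted on earlier slides.

```python
def reciprocal_rank_at_k(ranked_ids, gold_id, k=5):
    """1/rank of the gold concept if it appears in the top k, else 0.0."""
    for rank, concept_id in enumerate(ranked_ids[:k], start=1):
        if concept_id == gold_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings, gold_ids, k=5):
    """Mean of the per-message reciprocal ranks; 0.0 for an empty evaluation set."""
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank_at_k(r, g, k) for r, g in zip(rankings, gold_ids))
    return total / len(rankings)


if __name__ == "__main__":
    # Toy system output for three messages (illustrative only, not the reported data).
    rankings = [
        ["193462001", "224403006"],   # gold concept ranked 1st
        ["224403006", "60032008"],    # gold concept ranked 2nd
        ["193462001"],                # gold concept not retrieved
    ]
    gold = ["193462001", "60032008", "89362005"]
    print(mrr_at_k(rankings, gold))   # (1 + 0.5 + 0) / 3
```

Under this definition a score near 0.25, as in the tables, corresponds to the correct concept appearing around rank 4 on average, or to a mix of first-place hits and misses.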
  • 52. Summary
      • How we exploit the base of medical evidence is changing as access to unstructured ‘messy’ data opens up new opportunities
      • Data access, bias and standards
      • We can expect impact in epidemic detection, pharmacovigilance, translational health, disease mapping, risk communication, rare disease profiling and many other areas
      • Encoding the data increases value through data mining, exchange and integration
      • Machine learning outperforms dictionaries and hand-built rules
      • Finding the right lexical representation and the right target form is key
  • 53. Thank you Contributions by: Nigel Collier nhc30@cam.ac.uk Anna Korhonen alk23@cam.ac.uk Nut Limsopatham nl347@cam.ac.uk Further information at the Language Technology Lab http://ltl.mml.cam.ac.uk/ Funding: