Is Natural Language Processing Good for your Health?
25th April 2018
Nigel Collier
Theoretical and Applied Linguistics, MML
Some Preliminaries
• Knowledge graph:
• A type of large-scale semantic network describing concepts and their logical relationships, e.g. WordNet, BabelNet, YAGO, SNOMED CT
• Named entity:
• A sequence of words that denotes a specific term or individual
• Entity linking (‘grounding’, ‘coding’):
• Establishing the specific referent of a named entity in an ontology or knowledge base
• Distributional semantics:
• A way of representing meaning in which words/phrases are points in a low-dimensional geometric space, building on the distributional hypothesis (Firth 1957). Meaning is encoded as a pattern of values across all dimensions, e.g. word2vec embeddings (Mikolov et al. 2013)
• Deep neural networks (‘deep learning’):
• A type of artificial neural network with multiple hidden layers of units
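A minimal Python sketch of the distributional idea follows. The three-dimensional vectors are invented for illustration; real word2vec embeddings are learned from large corpora and typically have a few hundred dimensions. Averaging word vectors is one simple, assumed way of composing a phrase vector, and closeness between points is measured with cosine similarity.

```python
import math

# Invented 3-dimensional "embeddings", for illustration only (not learned from data).
EMBEDDINGS = {
    "head": [0.2, 0.1, 0.9],
    "spinning": [0.9, 0.1, 0.3],
    "dizziness": [0.8, 0.2, 0.5],
    "hunger": [0.1, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def phrase_vector(phrase):
    """Compose a phrase vector by averaging its word vectors (an assumed, deliberately simple scheme)."""
    vectors = [EMBEDDINGS[word] for word in phrase.lower().split()]
    return [sum(values) / len(vectors) for values in zip(*vectors)]

head_spinning = phrase_vector("head spinning")
print(cosine(head_spinning, EMBEDDINGS["dizziness"]))  # higher
print(cosine(head_spinning, EMBEDDINGS["hunger"]))     # lower
```

With these toy vectors, “head spinning” lands closer to “dizziness” than to “hunger”, even though the strings share no words.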
Data Science in Health Requires the Combined Expertise of…
Biologists
Clinicians
Mathematicians
Statisticians
Computer Scientists
Geneticists
Bioinformaticians
Computational Linguists…?
Computational Models of Language are Key to Making Sense of Health Data
• Western clinical notes date back to the 5th/6th centuries BC
• Today 60 to 70% of NHS data exists only as unstructured text
• Sources include biomedical literature, clinical trials data, lab notebooks, clinical records, diagnostic reports, news reports on disease outbreaks, social media messages, patient interviews, patient forum data …
• This text represents the most contextually grounded, high-precision information about an individual’s health, attitudes and behaviours
Case Studies: Health and NLP
• Infectious disease monitoring [4,5]
• Drug safety analysis [6,7]
• Diagnosis of semantic dementia [8]
• Monitoring air quality [9]
• Analysing malpractice [10]
• Diet success [11]
• Exercise and mental health [12]
• Identifying symptoms of schizophrenia [13]
[4] Aramaki, E. et al. (2011). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1568-1576). Association for Computational Linguistics.
[5] Collier, N., Son, N. T., & Nguyen, N. M. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. Journal of Biomedical Semantics, 2(5), S9.
[6] Sarker, A. et al. (2015). Utilizing social media data for pharmacovigilance: A review. Journal of Biomedical Informatics, 54, 202-212.
[7] Yang, C. C., et al. (2014). Postmarketing drug safety surveillance using publicly available health-consumer-contributed content in social media. ACM Transactions on Management Information Systems (TMIS), 5(1), 2.
[8] Pakhomov, S. V., Smith, G. E., Marino, S., Birnbaum, A., Graff-Radford, N., Caselli, R., ... & Knopman, D. S. (2010). A computerized technique to assess language use patterns in patients with frontotemporal dementia. Journal of Neurolinguistics, 23(2), 127-144.
[9] Wang, S., Paul, M. J., & Dredze, M. (2015). Social media as a sensor of air quality and public response in China. Journal of Medical Internet Research, 17(3), e22.
[10] Nakhasi, A., Passarella, R., Bell, S. G., Paul, M. J., Dredze, M., & Pronovost, P. (2012, October). Malpractice and malcontent: Analyzing medical complaints in Twitter. In 2012 AAAI Fall Symposium Series.
[11] Weber, I., & Achananuparp, P. (2015). Insights from machine-learned diet success prediction. arXiv preprint arXiv:1510.04802.
[12] Dos Reis, V. L., & Culotta, A. (2015). Using matched samples to estimate the effects of exercise on mental health from Twitter. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 182-188).
[13] Patel, R. et al. (2015). Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method. BMJ Open, 5(9), e007619.
Combining Language Models for Information Extraction
[Figure: an information extraction pipeline that turns a raw text document into knowledge objects. Components shown: sentence segmentation, tokenization, lexical featurisation, syntactic parsing, entity recognition, trigger detection, relation extraction, event extraction, entity linking]
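To show how the stages in the figure feed one another, here is a toy Python sketch that wires up only three of them: sentence segmentation, tokenization, and a combined dictionary-based entity recognition and linking step. The regular expressions, the two-entry `LEXICON` and the (mention, concept) output format are assumptions for illustration. Featurisation, parsing, trigger, relation and event extraction are omitted.

```python
import re

# Hypothetical mini-lexicon mapping token sequences to concept labels (illustrative only).
LEXICON = {
    ("head", "spinning"): "dizziness",
    ("hungry",): "hunger",
}

def segment_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: lower-cased runs of letters, digits or apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def recognise_and_link(tokens, max_len=3):
    """Greedy longest-match dictionary lookup; each hit yields a (mention, concept) pair."""
    results, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in LEXICON:
                results.append((" ".join(span), LEXICON[span]))
                i += n
                break
        else:  # no entry starts at token i, so move on one token
            i += 1
    return results

def pipeline(text):
    """Raw text -> sentences -> tokens -> linked (mention, concept) 'knowledge objects'."""
    objects = []
    for sentence in segment_sentences(text):
        objects.extend(recognise_and_link(tokenize(sentence)))
    return objects

print(pipeline("My head spinning all day. So hungry!"))
```

Because each stage consumes the previous stage's output, an error early on (a missed sentence boundary, a bad token split) propagates to every later stage.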
Entity Linking: a Central Task in Information Extraction
Textual evidence for ‘JFK International’:
“JetBlue begins direct service between Barnstable Airport and JFK International” [14]
[Figure: the mention is linked to the Wikipedia entry for ‘JFK International’]
[14] Ling, X., Singh, S., & Weld, D. S. (2015). Design challenges for entity linking. Transactions of the Association for Computational Linguistics, 3, 315-328.
Illustrating the Complexities of Entity Linking in Health
Source | Entity mention | Target concept (SNOMED) | Current data-driven solution?
Twitter | hungry | hunger | y
Twitter | gained 2kgs in weight | weight gain | y
Twitter | head spinning | dizziness | y
Twitter | rupturd his bowel | gastrointestinal perforation | y
EHR | No pneumothorax | history of pneumothorax, negative | ?
EHR | right breast cancer | breast cancer + right | n
EHR | A.FIB | atrial fibrillation | ?
Literature | peculiar changes in the dendrites of Purkinje cells | abnormal + Purkinje cell + dendrite + associated morphology | n
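The EHR row “No pneumothorax” shows that linking also has to capture negation. Below is a minimal sketch of a check in the spirit of the NegEx algorithm: it looks for a pre-negation trigger in a small window of tokens before the mention. The trigger list and window size are illustrative assumptions. The real NegEx uses a much larger trigger set plus rules that end a negation's scope (e.g. at “but”), which this sketch lacks.

```python
import re

# Illustrative pre-negation triggers (an assumption; far shorter than NegEx's list).
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "negative for", "no evidence of"]

def tokens_of(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_sequence(haystack, needle):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def is_negated(sentence, mention, window=5):
    """True if a negation trigger appears within `window` tokens before the mention."""
    tokens, target = tokens_of(sentence), tokens_of(mention)
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] == target:
            preceding = tokens[max(0, start - window):start]
            if any(contains_sequence(preceding, tokens_of(t)) for t in NEGATION_TRIGGERS):
                return True
    return False

print(is_negated("No pneumothorax.", "pneumothorax"))               # True
print(is_negated("Small right pneumothorax seen.", "pneumothorax"))  # False
```

A negated match could then be recorded with a qualifier (as in the table's “negative”) rather than as a plain positive finding.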
A Brief Overview of Entity Linking
• 1. Manually defined symbolic features – fast and transparent, but restricted coverage
• String matching on concept labels, e.g. “hungry” → hunger [15]
• 2. Machine translation models using symbolic features – better coverage
• Recognise variant compositions, e.g. “gained 2kgs in weight” → weight gain [16]
• 3. Distributed compositional semantic models – best coverage, but opaque
• Recognise latent similarities, e.g. “head spinning” → dizziness
• Depend to some extent on large-scale data
• Do not by themselves account for complex expressions such as post-coordinated concepts [17]
• Vagueness? “terrible headache this morning” → sinus headache? Tension headache? Hangover?
[15] Lu, Z., et al. (2011). The gene normalization task in BioCreative III. BMC Bioinformatics, 12(8), S2.
[16] Limsopatham, N., & Collier, N. (2015). Adapting phrase-based machine translation to normalise medical terms in social media messages. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1675-1680). Association for Computational Linguistics.
[17] Dhombres, F., & Bodenreider, O. (2016). Interoperability between phenotypes in research and healthcare terminologies—investigating partial mappings between HPO and SNOMED CT. Journal of Biomedical Semantics, 7(1), 3.
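Approach 1 can be approximated with off-the-shelf fuzzy string matching. The sketch below uses Python's standard-library `difflib.get_close_matches`, which scores character overlap between strings. The concept list and the 0.6 cutoff are assumptions; this is not the method used in [15]. It maps “hungry” to hunger but finds nothing for “head spinning”, which is the coverage gap that approaches 2 and 3 address.

```python
from difflib import get_close_matches

# Hypothetical concept labels; a real system would draw these from SNOMED CT or similar.
CONCEPT_LABELS = ["hunger", "weight gain", "dizziness", "gastrointestinal perforation"]

def link_by_string_match(mention, labels=CONCEPT_LABELS, cutoff=0.6):
    """Return the label with the highest character-overlap ratio above `cutoff`, or None."""
    matches = get_close_matches(mention.lower(), labels, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(link_by_string_match("hungry"))         # hunger
print(link_by_string_match("head spinning"))  # None
```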
Experimental Setup: Language Models
Model | Description | Ref.
TF-IDF | Traditional statistical term matching approach | [18]
BM25 | Traditional term ranking function | [19]
SVM LTR | Supervised machine learning (learning-to-rank) model; current state of the art | [20]
DWR | Cosine similarity between word vectors for mention and concept | –
SMT + DWR | Statistical word alignment model | [21]
CNN | Supervised neural network model | [22]
[18] Spärck Jones, K. (2004). IDF term weighting and IR research lessons. Journal of Documentation, 60(5), 521-523.
[19] Robertson, S., Zaragoza, H., & Taylor, M. (2004, November). Simple BM25 extension to multiple weighted fields. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management (pp. 42-49). ACM.
[20] Leaman, R., Islamaj Doğan, R., & Lu, Z. (2013). DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 29(22), 2909-2917.
[21] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., ... & Dyer, C. (2007, June). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (pp. 177-180). Association for Computational Linguistics.
[22] Limsopatham, N., & Collier, N. H. (2016). Normalising medical concepts in social media texts by learning semantic representation.
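To illustrate the TF-IDF baseline in the table, here is a minimal sketch. It treats each concept's label plus a few synonyms as a document and ranks concepts by cosine similarity between TF-IDF vectors. The concept texts are invented, and raw counts with idf = log(N/df) is only one of several common TF-IDF variants. The exact configuration used in the experiments is not given here.

```python
import math
from collections import Counter

# Hypothetical concept "documents": a label plus a few synonyms each (invented for illustration).
CONCEPTS = {
    "weight gain": "weight gain increased body weight",
    "dizziness": "dizziness dizzy giddiness",
    "hunger": "hunger hungry",
}

def tfidf_model(docs):
    """Return per-concept TF-IDF vectors and the idf table (raw counts, idf = log(N / df))."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {term: math.log(len(docs) / count) for term, count in df.items()}
    vectors = {
        name: {term: count * idf[term] for term, count in Counter(toks).items()}
        for name, toks in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_concepts(mention, docs=CONCEPTS):
    """Rank (concept, score) pairs by TF-IDF cosine; terms unseen in the docs get weight 0."""
    vectors, idf = tfidf_model(docs)
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(mention.lower().split()).items()}
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank_concepts("gained 2kgs in weight")[0])  # weight gain ranks first
print(rank_concepts("head spinning"))             # every score is 0.0
```

A mention that shares the word “weight” ranks weight gain first, while “head spinning” scores zero against every concept. Term matching cannot see latent similarity, which is the gap DWR and the neural models target.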
Experimental Setup: Data Sets
• We evaluate our approaches on three different datasets
Dataset | # Queries | # Target concepts | Data source
TwADR-S | 201 | 58 | Twitter messages
TwADR-L | 2,220 | 1,436 | Twitter messages
AskPatient | 8,662 | 1,036 | Blog posts from askapatient.com
Experimental Results
[Figure: bar chart of accuracy (y-axis, 0 to 0.9) for TF-IDF, BM25, DWR, SVM LTR, SMT+DWR and CNN on the TwADR-S, TwADR-L and AskPatient datasets]
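The chart reports accuracy. Assuming this means top-1 accuracy (the share of queries whose highest-ranked concept is the gold-standard concept; the slide does not spell this out), it can be computed as in this minimal sketch.

```python
def accuracy(predictions, gold):
    """Top-1 accuracy: fraction of queries whose predicted concept equals the gold concept."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one query")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

print(accuracy(["hunger", "dizziness"], ["hunger", "weight gain"]))  # 0.5
```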
Conclusion
• NLP can contribute to medical discovery through the intelligent use of existing data, shortening the time to insight.
• I’ve proposed and tested various ways of linking entities to knowledge bases using
machine learning
• I’ve used distributed semantic representations to infer latent similarities between
mentions and concepts
• Future research:
• Generating explanations
• Handling compositionality
• Scaling up to larger knowledge bases
• Get involved:
• Data challenge? CLEF eHealth Lab, BioCreative
• Connect? EPSRC Healtex Network, Alan Turing Institute
• Publish? e.g. ACL, EMNLP, BioNLP, SocialNLP
• Software? Apache cTAKES, GATE
Thank you!
Slides available at: https://www.slideshare.net/nigel_collier
https://sites.google.com/site/nhcollier/
nhc30@cam.ac.uk
ORCID: 0000-0002-7230-4164
Twitter: @nigelhcollier

Cambridge seminar april 2018
