Understanding medical concepts and codes through
NLP methods
Presenter:
Ashis Kumar Chanda
Ph.D. student, CIS Dept.
Contents
● Introduction
● Related works
● Project 1:
■ Improving Medical concept representations with external knowledge
● Project 2:
■ Jointly learning medical concepts and code representation
● Conclusions and future work
Introduction
● What are medical concepts?
● Medical concepts are medical terms, abbreviations, or short-form words.
■ Ex: heart attack, breast cancer, tumor, ‘cp’ for chest pain, or drug names.
● What are medical codes?
● Standard codes for representing diagnoses, procedures, or drugs.
● Different medical organizations provide standard code formats.
■ Ex: 1749 is an ICD-9 code for breast cancer.
■ Ex: 96409 is a CPT code for chemotherapy.
Introduction
● What do Electronic Health Records (EHRs) look like?
● An EHR dataset has both structured data (e.g. lab values, medical codes) and
unstructured data (physicians’ notes).
[Figure: an EHR example with unstructured data (clinical note) and structured data (medical code events)]
Code     Code description
1749     breast cancer
96409    chemotherapy
…        …
Introduction
● What is NLP?
● Natural Language Processing (NLP) is the automatic manipulation of natural
language, such as speech and text.
● A core task is finding machine-readable representations of words and documents.
■ Ex: Understanding the ‘severity’ of a patient’s condition from a physician’s note.
■ Ex: Understanding the semantic similarity between ‘kidney’ and ‘renal’.
■ Ex: Finding patient phenotypes or clusters from clinical note descriptions.
Related works
● Previous research used EHRs for patient phenotyping [1], health risk
prediction [2, 3], cohort selection [4], and visual explorations [5, 6].
● Understanding text written in the medical notes is a very important step
for such research studies.
● Many frequency-based methods, such as BOW, TF-IDF [9], PMI [10], and
GloVe [7], have been developed to represent documents and sentences.
● Recent studies focused on neural network based methods.
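To make the frequency-based methods above concrete, here is a minimal sketch of bag-of-words counts weighted by TF-IDF. The three toy notes and the function name `tf_idf` are illustrative assumptions, not data or code from the presentation, and the weighting shown (relative term frequency times log inverse document frequency) is only one common variant.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy notes, already tokenized on whitespace.
notes = [
    "patient reports chest pain".split(),
    "patient denies chest pain".split(),
    "patient has renal failure".split(),
]
vectors = tf_idf(notes)
```

A term that occurs in every note, such as "patient" here, receives weight zero, while a rarer term such as "renal" is weighted up.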
Skip-gram Model
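● The skip-gram model scans each sentence and maximizes the log-likelihood of the
context words that appear within a window around each scanned (target) word.
● The model assigns a likelihood p(w_i | w_t) to observing the context word w_i
for the target word w_t.
[Figure: context window w_{t-2}, …, w_{t+2} around the target word w_t]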
How would we learn this probability?
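The slide's own formula did not survive extraction. As a hedged illustration, the sketch below uses the standard softmax form of skip-gram from Mikolov et al. [8], p(w_i | w_t) = exp(u_i · v_t) / Σ_w exp(u_w · v_t), which may differ from the notation shown in the talk. The window size of 2, the toy sentence, the random 3-dimensional vectors, and the names `context_pairs` and `softmax_prob` are all assumptions made for this example.

```python
import math
import random

def context_pairs(tokens, window=2):
    """Yield (target, context) pairs within +/- window positions."""
    for t, target in enumerate(tokens):
        lo = max(0, t - window)
        hi = min(len(tokens), t + window + 1)
        for i in range(lo, hi):
            if i != t:
                yield target, tokens[i]

def softmax_prob(context, target, in_vecs, out_vecs):
    """p(context | target) under a softmax over the whole vocabulary."""
    v_t = in_vecs[target]
    scores = {w: sum(a * b for a, b in zip(u, v_t)) for w, u in out_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    return exps[context] / sum(exps.values())

# Hypothetical toy sentence and untrained random vectors.
tokens = "patient reports severe chest pain".split()
vocab = sorted(set(tokens))
rng = random.Random(0)
in_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(3)] for w in vocab}
out_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(3)] for w in vocab}

pairs = list(context_pairs(tokens, window=2))
# Training would adjust the vectors to increase this quantity.
log_likelihood = sum(
    math.log(softmax_prob(context, target, in_vecs, out_vecs))
    for target, context in pairs
)
```

Learning the probability then amounts to adjusting the input and output vectors, for example by gradient ascent, so that the summed log-likelihood over all observed pairs grows.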
Project 1: Improving medical concept
representations with external knowledge
Problem: Learning medical concept representations
● We can run skip-gram on medical notes to find concept representations.
● However, many concepts are rarely used in notes, yet they are important and
carry significant meaning.
● External knowledge can help improve the representations of such concepts.
How can we integrate this knowledge? Modified skip-gram model
Results: Qualitative analysis
● For a given medical concept, we check the 10 nearest neighbors based on the cosine similarity in
the learned vector space.
Top 10 nearest neighbor concepts of “bipolar disorder”
[Figure: neighbors of “bipolar disorder” under our model, including depression and anxiety]
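The nearest-neighbor check described above can be sketched as follows. The 3-dimensional vectors are made-up toy values chosen for illustration, not embeddings learned by the model in the talk, and `nearest_neighbors` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, vectors, k=10):
    """Return the k concepts most similar to `query`, most similar first."""
    q = vectors[query]
    scored = [(cosine(q, v), name) for name, v in vectors.items() if name != query]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings.
vectors = {
    "bipolar disorder": [0.9, 0.8, 0.1],
    "depression": [0.8, 0.9, 0.2],
    "anxiety": [0.7, 0.6, 0.3],
    "fracture": [0.1, 0.0, 0.9],
}
neighbors = nearest_neighbors("bipolar disorder", vectors, k=2)
```

With real embeddings, `vectors` would hold one learned vector per medical concept and `k` would be 10, as in the analysis above.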
Project 2: Jointly learning medical code
and concept representations
T. Bai, A. K. Chanda, B. L. Egleston, S. Vucetic, Joint learning of representations of medical
concepts and words from EHR data, in: IEEE International Conference on Bioinformatics and
Biomedicine, BIBM, 2017.
Problem
● EHRs contain structured data, such as diagnostic codes and laboratory tests,
as well as unstructured clinical notes.
● The Joint Skip-gram model jointly learns medical code and word representations.
● Following the skip-gram model, four types of (context, target) pairs are
considered for learning representations: code to word, code to code, word to
word, and word to code.
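One way to picture the four pair types is to enumerate them for a single visit. The pairing rules below are assumptions for illustration only: word to word pairs come from a sliding window, every two codes in the same visit are paired, and every code is paired with every word of the visit's note. The published model may pair codes and words differently, and `joint_pairs` is a hypothetical name.

```python
from itertools import permutations

def joint_pairs(codes, words, window=2):
    """Return (context, target, pair_type) triples for one visit."""
    pairs = []
    # word to word: standard skip-gram window over the note text
    for t, target in enumerate(words):
        for i in range(max(0, t - window), min(len(words), t + window + 1)):
            if i != t:
                pairs.append((words[i], target, "word to word"))
    # code to code: every ordered pair of codes recorded in the visit
    for context, target in permutations(codes, 2):
        pairs.append((context, target, "code to code"))
    # code to word and word to code: each code with each word of the note
    for code in codes:
        for word in words:
            pairs.append((code, word, "code to word"))
            pairs.append((word, code, "word to code"))
    return pairs

# Hypothetical visit: two codes from the earlier slides plus a toy note.
visit_codes = ["1749", "96409"]
visit_words = "breast cancer treated with chemotherapy".split()
pairs = joint_pairs(visit_codes, visit_words)
pair_types = {pair_type for _, _, pair_type in pairs}
```

Because codes and words feed the same skip-gram objective, they end up in one shared vector space, which is what allows a code to be compared directly with a word.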
Conclusions and future plans:
● Improve medical concept representations using contextual models.
● Clinical BERT is a recent contextual model; skip-gram, by contrast, learns a
single context-free vector per word.
● The skip-gram model could be applied in other fields.
■ Web mining: treat a user’s web clicks as a bag of words.
■ Business transactions: a user’s item purchase history is also a kind of sequence.
■ Human activity: human activity logs also form a sequence.
References:
1. Halpern, Y., Horng, S., Choi, Y., Sontag, D.: Electronic medical record phenotyping using the anchor and learn framework. Journal of the American Medical Informatics Association 23(4), 731–740 (2016)
2. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv preprint arXiv:1602.03686 (2016)
3. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association 24(2), 361–370 (2016)
4. A. B. Nattinger, P. W. Laud, R. Bajorunaite, R. A. Sparapani, and J. L. Freeman, “An algorithm for the use of medicare claims data to identify women with incident breast cancer.” Health services research, 39(6p1):1733–1750, 2004.
5. D. Gotz, F. Wang, and A. Perer. A methodology for interactive mining and visual analysis of clinical event patterns using electronic health record data. Journal of biomedical informatics, 48:148–159, 2014
6. J. Krause, N. Razavian, E. Bertini, and D. Sontag. Visual exploration of temporal data in electronic medical records. In AMIA, 2015
7. Pennington, J.; Socher, R.; and Manning, C. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543
8. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, CoRR abs/1301.3781. arXiv:1301.3781.
9. Ramos, J., 2003, December. Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133-142).
10. P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. J. Artif. Intell. Res., 37:141–188, 2010. doi: 10.1613/jair.2934
11. Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017 Dec;5:135-46.
12. X. Rong, word2vec parameter learning explained, CoRR abs/1411.2738. arXiv:1411.2738. URL http://arxiv.org/abs/1411.2738
13. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. "EHR phenotyping via jointly embedding medical concepts and words into a unified vector space". Journal of BMC medical info., Publisher: BioMed Central, vol. 18, 2018.
14. S. Vucetic, A. K. Chanda, S. Zhang, T. Bai, A. Maiti "Peer assessment of CS doctoral programs shows strong correlation with faculty citations". Journal of Communications of the ACM, vol. 61, p. 70-76, 2018.
15. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. " Joint learning of representations of medical concepts and words from EHR data". IEEE International Conference on Bioinformatics and Biomedicine (BIBM), p 764-769,
2017.
16. Aronson, A. R., and Lang, F.-M. 2010. An overview of metamap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17(3):229–236.
17. Bodenreider, O. 2004. The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research 32(suppl 1):D267–D270
18. Johnson, A. E.; Pollard, T. J.; Shen, L.; Li-wei, H. L.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L. A.; and Mark, R. G. 2016. Mimic-iii, a freely accessible critical care database. Scientific data 3:160035.
19. Mullenbach, J.; Wiegreffe, S.; Duke, J.; Sun, J.; and Eisenstein, J. 2018. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018
20. Pei, Jian, et al. "Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings 17th international conference on data engineering. IEEE, 2001.
21. D. J. Gligorijevic, J. Stojanovic, and Z. Obradovic, “Modeling healthcare quality via compact representations of electronic health records.” Transactions on Computational Biology and Bioinformatics, 2016
Thank you all!
Contact me: ashis@temple.edu
16

More Related Content

Similar to Medical concept and code representations through NLP

Waterloo September 00 Presentations
Waterloo September 00 PresentationsWaterloo September 00 Presentations
Waterloo September 00 Presentationsbrighteyes
 
Learning to speak medicine
Learning to speak medicineLearning to speak medicine
Learning to speak medicineXavier Amatriain
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Thien Q. Tran
 
CV_Min_Jiang
CV_Min_JiangCV_Min_Jiang
CV_Min_JiangMIN JIANG
 
Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Maria Karampela
 
Accessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataAccessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataSofia Ouhbi
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-finalPeter Embi
 
NLM Georgia Biomedical Informatics
NLM Georgia Biomedical InformaticsNLM Georgia Biomedical Informatics
NLM Georgia Biomedical InformaticsAlison Aldrich
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...Kishor Datta Gupta
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-finalPeter Embi
 
NLP support for clinical tasks and decisions
NLP support for clinical tasks and decisionsNLP support for clinical tasks and decisions
NLP support for clinical tasks and decisionsCORIA-TALN 2018
 
Concepts Related to Health Literacy in Online Information Environments: A Sys...
Concepts Related to Health Literacy in Online Information Environments: A Sys...Concepts Related to Health Literacy in Online Information Environments: A Sys...
Concepts Related to Health Literacy in Online Information Environments: A Sys...Tuula Myllylä-Nygård
 
13-Jan-121AHCP 5330Introduction to Informatics.docx
13-Jan-121AHCP 5330Introduction to Informatics.docx13-Jan-121AHCP 5330Introduction to Informatics.docx
13-Jan-121AHCP 5330Introduction to Informatics.docxhyacinthshackley2629
 
Future of Healthcare: 3 Disruptive Trends
Future of Healthcare: 3 Disruptive TrendsFuture of Healthcare: 3 Disruptive Trends
Future of Healthcare: 3 Disruptive TrendsSean Koon, MD, MS
 
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...Health Informatics New Zealand
 
The Many Lives of Data
The Many Lives of DataThe Many Lives of Data
The Many Lives of Dataljmcneill33
 

Similar to Medical concept and code representations through NLP (20)

Waterloo September 00 Presentations
Waterloo September 00 PresentationsWaterloo September 00 Presentations
Waterloo September 00 Presentations
 
Learning to speak medicine
Learning to speak medicineLearning to speak medicine
Learning to speak medicine
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
Deep learning in healthcare: Oppotunities and challenges with Electronic Medi...
 
CV_Min_Jiang
CV_Min_JiangCV_Min_Jiang
CV_Min_Jiang
 
Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.
 
Accessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataAccessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health Data
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-final
 
NLM Georgia Biomedical Informatics
NLM Georgia Biomedical InformaticsNLM Georgia Biomedical Informatics
NLM Georgia Biomedical Informatics
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
Quality of Life Technologies: From Cure to Care
Quality of Life Technologies: From Cure to CareQuality of Life Technologies: From Cure to Care
Quality of Life Technologies: From Cure to Care
 
NLP support for clinical tasks and decisions
NLP support for clinical tasks and decisionsNLP support for clinical tasks and decisions
NLP support for clinical tasks and decisions
 
Concepts Related to Health Literacy in Online Information Environments: A Sys...
Concepts Related to Health Literacy in Online Information Environments: A Sys...Concepts Related to Health Literacy in Online Information Environments: A Sys...
Concepts Related to Health Literacy in Online Information Environments: A Sys...
 
13-Jan-121AHCP 5330Introduction to Informatics.docx
13-Jan-121AHCP 5330Introduction to Informatics.docx13-Jan-121AHCP 5330Introduction to Informatics.docx
13-Jan-121AHCP 5330Introduction to Informatics.docx
 
Future of Healthcare: 3 Disruptive Trends
Future of Healthcare: 3 Disruptive TrendsFuture of Healthcare: 3 Disruptive Trends
Future of Healthcare: 3 Disruptive Trends
 
1-s2.0-S0167923620300944-main.pdf
1-s2.0-S0167923620300944-main.pdf1-s2.0-S0167923620300944-main.pdf
1-s2.0-S0167923620300944-main.pdf
 
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...
Aleksandar Zivaljevic - Annotation of clinical datasets using openEHR Archety...
 
The Many Lives of Data
The Many Lives of DataThe Many Lives of Data
The Many Lives of Data
 
6431 WK10Assn1Pt2DonovanC
6431 WK10Assn1Pt2DonovanC6431 WK10Assn1Pt2DonovanC
6431 WK10Assn1Pt2DonovanC
 

More from Ashis Chanda

Information extraction from EHR
Information extraction from EHRInformation extraction from EHR
Information extraction from EHRAshis Chanda
 
Understanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesUnderstanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesAshis Chanda
 
Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Ashis Chanda
 
Full resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksFull resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksAshis Chanda
 
Iterative deepening search
Iterative deepening searchIterative deepening search
Iterative deepening searchAshis Chanda
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern miningAshis Chanda
 
An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...Ashis Chanda
 
Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Ashis Chanda
 

More from Ashis Chanda (11)

Word2vector
Word2vectorWord2vector
Word2vector
 
Information extraction from EHR
Information extraction from EHRInformation extraction from EHR
Information extraction from EHR
 
Understanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesUnderstanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational Databases
 
Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...
 
Full resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksFull resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networks
 
Iterative deepening search
Iterative deepening searchIterative deepening search
Iterative deepening search
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern mining
 
An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...
 
Data Mining
Data MiningData Mining
Data Mining
 
Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 

Recently uploaded

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 

Recently uploaded (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 

Medical concept and code representations through NLP

  • 1. Understanding medical concepts and codes through NLP methods Presenter: Ashis Kumar Chanda Ph.D. student, CIS Dept.
  • 2. Contents ● Introduction ● Related works ● Project 1: ■ Improving Medical concept representations with external knowledge ● Project 2: ■ Jointly learning medical concepts and code representation ● Conclusions and future work 2
  • 3. Introduction ● What is medical concepts? ● Medical concepts are medical terms, abbreviations or short form words. ■ Ex: heart attack, breast cancer, tumor, ‘cp’ for chest pain, or drug names. ● What is medical codes? ● Standard codes for representing diagnosis, procedures or drugs. ● Different medical organizations provide standard code format. ■ Ex: 1749 is a ICD9 code for breast cancer. ■ Ex: 96409 is a CPT code for chemotherapy. 3
  • 4. Introduction ● How looks like Electronic Health Records (EHRs)? ● This dataset has both structured (i.e. lab values, medical codes) and unstructured data (physician’s note data). 4 Unstructured data (clinical note) Structured data (medical code events) Code Code description 1749 breast cancer 96409 chemotherapy … … …
  • 5. Introduction ● What is NLP? ● Natural Language Processing, or NLP, is defined as the automatic manipulation of natural language, like speech and text. ● Finding machine readable representation for words, and documents. ■ Ex: Understanding ‘severity’ of patients from physician’s note. ■ Understanding semantic similarity between ‘kidney’ and ‘renal’. ■ Finding patient phenotype or clusters from clinical note description. 5
  • 6. Related works ● Previous research used EHRs for patient phenotyping [1], health risk prediction [2, 3], cohort selection [4], and visual explorations [5, 6]. ● Understanding text written in the medical notes is a very important step for such research studies. ● Many frequency based methods, such as BOW, TF-IDF [9], PMI [10], GloVe [7] have been developed to present documents/sentences. ● Recent studies focused on neural network based methods. 6
  • 7. Skip-gram Model ● Skip-gram model scans each sentence to find the log-likelihood of scanned (target) words within their context window. ● The likelihood of observing the context word wi for the target word wt is: Wt + 2- 2 7 How would we learn this probability?
  • 8. Project 1: Improving medical concept representations with external knowledge 8
  • 9. Problem: Learning medical concept represenations ● We can run skip-gram on medical notes to find concept representations. ● However, many concepts are rarely used in notes, but are important and have significant meaning. ● External knowledge can help to improve the medical concept representations. 9How can we integrate this knowledge? Modified skip-gram model
  • 10. Results: Qualitative analysis ● For a given medical concept, we check the 10 nearest neighbors based on the cosine similarity in the learned vector space. Top 10 nearest neighbor concepts of “bipolar disorder” 10 Our model Bipolar disorder depression anxiety
  • 11. Project: Jointly learning medical code and concept representation 11 T. Bai, A. K. Chanda, B. L. Egleston, S. Vucetic, Joint learning of representations of medical concepts and words from EHR data, in: IEEE International Conference on Bioinformatics and Biomedicine, BIBM, 2017.
  • 12. Problem ● EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes. ● Joint Skip-gram model jointly learns medical code and word representations. ● Four types of pairs (context, target) are considered for learning representations following skip-gram model (code to word, code to code, word to word, word to code). 12
  • 14. Conclusions and future plans: ● Improving medical concept representations using a contextual model. ● Clinical BERT is a recent contextual model. ● The skip-gram model could be applied to other fields. ■ Web mining: treating a user’s web clicks as a bag of words. ■ Business transactions: a user’s item purchase history is also a kind of sequence. ■ Human activity: human activity also produces a log sequence.
  • 15. References:
    1. Halpern, Y., Horng, S., Choi, Y., Sontag, D.: Electronic medical record phenotyping using the anchor and learn framework. Journal of the American Medical Informatics Association 23(4), 731–740 (2016)
    2. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv preprint arXiv:1602.03686 (2016)
    3. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association 24(2), 361–370 (2016)
    4. A. B. Nattinger, P. W. Laud, R. Bajorunaite, R. A. Sparapani, and J. L. Freeman. An algorithm for the use of Medicare claims data to identify women with incident breast cancer. Health Services Research, 39(6p1):1733–1750, 2004.
    5. D. Gotz, F. Wang, and A. Perer. A methodology for interactive mining and visual analysis of clinical event patterns using electronic health record data. Journal of Biomedical Informatics, 48:148–159, 2014.
    6. J. Krause, N. Razavian, E. Bertini, and D. Sontag. Visual exploration of temporal data in electronic medical records. In AMIA, 2015.
    7. Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
    8. T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient estimation of word representations in vector space. CoRR abs/1301.3781. arXiv:1301.3781.
    9. Ramos, J. 2003. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning (Vol. 242, pp. 133–142).
    10. P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. J. Artif. Intell. Res., 37:141–188, 2010. doi: 10.1613/jair.2934
    11. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
    12. X. Rong. word2vec parameter learning explained. CoRR abs/1411.2738. arXiv:1411.2738. URL http://arxiv.org/abs/1411.2738
    13. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC Medical Informatics and Decision Making, BioMed Central, vol. 18, 2018.
    14. S. Vucetic, A. K. Chanda, S. Zhang, T. Bai, A. Maiti. Peer assessment of CS doctoral programs shows strong correlation with faculty citations. Communications of the ACM, vol. 61, pp. 70–76, 2018.
    15. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. Joint learning of representations of medical concepts and words from EHR data. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 764–769, 2017.
    16. Aronson, A. R., and Lang, F.-M. 2010. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17(3):229–236.
    17. Bodenreider, O. 2004. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(suppl 1):D267–D270.
    18. Johnson, A. E.; Pollard, T. J.; Shen, L.; Li-wei, H. L.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L. A.; and Mark, R. G. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3:160035.
    19. Mullenbach, J.; Wiegreffe, S.; Duke, J.; Sun, J.; and Eisenstein, J. 2018. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018.
    20. Pei, J., et al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings 17th International Conference on Data Engineering. IEEE, 2001.
    21. D. J. Gligorijevic, J. Stojanovic, and Z. Obradovic. Modeling healthcare quality via compact representations of electronic health records. Transactions on Computational Biology and Bioinformatics, 2016.
  • 16. Thank you all! Contact me: ashis@temple.edu