Medical Text Classification using
Convolutional Neural Networks
Mark Hughes, Irene Li, Spyros Kotoulas and Toyotaro Suzumura
26 April 2017
Informatics for Health
IBM Research Ireland
Japan Science and Technology Agency, Tokyo, Japan
IBM TJ Watson Research Center, New York, USA
Motivation: Medical Text Classification
( A 75-y-o woman) with sudden onset back pain last
night while lifting turkey from oven. The pain is worse
with movement or deep breath, better with rest. No
symptoms in legs, no fever or chills. No chest pain,
cough, wheezing, abdominal pain, headache… Married.
Two children. No smoking.
Unstructured
clinical notes:
Various Topics
Messy
Irrelevant
IBM Watson Smart Notes Project
Search info related to particular illnesses
--- sentence-level classification
State-of-the-art Representations in NLP
[1] Mikolov et al. "Distributed Representations of Words and Phrases and their Compositionality." (2013).
[2] Quoc V. Le et al. "Distributed Representations of Sentences and Documents." (2014).
[3] Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
[4] Dai, Andrew M., Christopher Olah, and Quoc V. Le. "Document embedding with paragraph vectors." (2015).
Distributed Representations: dense vectors
• Embedding Models: Word2vec[1], Doc2vec[2,3]
• Visualization Example:
– Semantically clustered
– Unsupervised learning
– Large training corpus
Convolutional Neural Network Modeling Sentences
Figure from Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).
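The core operation in a sentence-level CNN of the kind shown in Kim's figure can be sketched in a few lines: a filter slides over windows of consecutive word vectors, and max-over-time pooling keeps the strongest response per filter. This is a minimal illustration, not the authors' model. The function names, the ReLU nonlinearity, the zero-padding of short sentences, and the toy vectors are all assumptions made for the example.

```python
def conv_max_pool(sentence, filt, bias=0.0):
    """Apply one filter over word windows, then max-over-time pooling.

    sentence: list of word vectors (one list of floats per word).
    filt: list of h weight rows, one row per word in the window.
    """
    h = len(filt)          # window height in words
    dim = len(filt[0])     # embedding dimension
    # Assumption: sentences shorter than the window are zero-padded.
    padded = sentence + [[0.0] * dim] * max(0, h - len(sentence))
    features = []
    for i in range(len(padded) - h + 1):
        window = padded[i:i + h]
        # Dot product between the filter and the h stacked word vectors.
        s = sum(w * x
                for f_row, w_row in zip(filt, window)
                for w, x in zip(f_row, w_row))
        features.append(max(0.0, s + bias))  # ReLU (assumed nonlinearity)
    return max(features)                     # max-over-time pooling


def sentence_features(sentence, filters):
    """One pooled feature per filter; a classifier layer would follow."""
    return [conv_max_pool(sentence, f) for f in filters]


# Toy 2-d embeddings for a 4-word sentence (hypothetical values).
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_filters = [
    [[1.0, 0.0], [0.0, 1.0]],                  # window of 2 words
    [[-1.0, -1.0], [-1.0, -1.0]],              # window of 2 words
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],      # window of 3 words
]
pooled = sentence_features(toy_sentence, toy_filters)
```

In a full model the pooled features from many filters of several window sizes would be concatenated and passed to a softmax layer over the target categories.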
Proposed Model: Convolutional Neural Network
features…
Datasets
[1]: US National Library of Medicine National Institutes of Health Search database http://www.ncbi.nlm.nih.gov/pubmed
[2]: Merck Manual Dataset http://www.merckmanuals.com/
Pre-trained Word2vec: 15,000 clinical research papers from PubMed[1].
Experiments: 26 categories, 4,000 sentences each, 1,000 validation sentences,
from Merck Manual[2].
Sentence embeddings + SVM
▪ Doc2vec, the distributed memory (PV-DM) model: represent each sentence
as a vector;
▪ Sentence vectors as inputs, supervised learning by SVM.
Mean Word embeddings + SVM
▪ Pair-wise mean sentence embeddings: each sentence is a vector; words unseen
in the embedding vocabulary are replaced by a zero vector or dropped;
▪ Sentence vectors as inputs, supervised learning by SVM.
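The mean-embedding baseline can be sketched as follows, reading the slide's "mean" as the element-wise average of the word vectors in a sentence (an assumption about the authors' exact procedure). The function name, the `unseen` parameter, and the toy embeddings are hypothetical, and the SVM step is left out.

```python
def mean_sentence_vector(tokens, embeddings, dim, unseen="skip"):
    """Average the word vectors of a sentence.

    unseen="skip" drops out-of-vocabulary words;
    unseen="zero" counts them as zero vectors.
    """
    vectors = []
    for tok in tokens:
        if tok in embeddings:
            vectors.append(embeddings[tok])
        elif unseen == "zero":
            vectors.append([0.0] * dim)
    if not vectors:
        # No known words at all: fall back to the zero vector.
        return [0.0] * dim
    # zip(*vectors) iterates over dimensions; average each one.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy 2-d embeddings (hypothetical values); "turkey" is unseen.
toy_embeddings = {"back": [1.0, 0.0], "pain": [0.0, 1.0]}
toy_tokens = ["back", "pain", "turkey"]
vec_skip = mean_sentence_vector(toy_tokens, toy_embeddings, dim=2, unseen="skip")
vec_zero = mean_sentence_vector(toy_tokens, toy_embeddings, dim=2, unseen="zero")
```

The two policies differ only in the denominator: dropping unseen words averages over known words, while zero vectors shrink the mean toward the origin.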
Word embeddings with BOW (Bag-of-Words) Features
▪ K-means: cluster word embeddings into 1000 clusters;
▪ BOW histogram: each sentence represented by a 1000-d vector;
▪ Sentence vectors as inputs, supervised learning by SVM.
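The cluster-histogram representation can be sketched as below, assuming the K-means centroids have already been computed elsewhere. Each known word is assigned to its nearest centroid and the sentence becomes a vector of cluster counts (1000-d on the slide, 3-d in this toy example). The function names, the skipping of unseen words, and all values are illustrative assumptions.

```python
def nearest_cluster(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def cluster_histogram(tokens, embeddings, centroids):
    """Count how many words of the sentence fall into each cluster."""
    hist = [0] * len(centroids)
    for tok in tokens:
        if tok in embeddings:  # assumption: unseen words are skipped
            hist[nearest_cluster(embeddings[tok], centroids)] += 1
    return hist


# Toy data: 3 centroids instead of 1000, 2-d embeddings (hypothetical).
toy_centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
toy_word_vectors = {
    "pain": [1.0, 1.0],
    "ache": [0.0, 1.0],
    "fever": [9.0, 9.0],
    "cough": [1.0, 9.0],
}
toy_sentence_tokens = ["pain", "ache", "fever", "unknown", "cough", "pain"]
hist = cluster_histogram(toy_sentence_tokens, toy_word_vectors, toy_centroids)
```

The resulting count vector is what would be passed to the SVM as the sentence representation.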
Evaluation: Baselines
Results: Accuracy
Conclusions & Discussions
Convolutional Neural Nets
• sentence-level classification in clinical domain;
• possible to be scaled up to paragraph/document level;
• better classification ability than shallow
learning methods.
Representation Learning
• the ability to represent in a distributed way;
• pre-trained embeddings are useful for text
comparison/retrieval tasks.
Future Work
Dataset
• Extend in-domain knowledge: papers, books, relevant topics in
Wikipedia, etc;
• Test on a fine-grained set of clinical datasets.
Potential Applications
• Notes classification;
• Patient2vec (use case on next page): representation learning on
individual patients, a high-level semantic representation of each
patient.
Patient2Vec: Every patient is a vector
Feature extraction from everything:
gender, age, body conditions, history
treatments, …
Thanks!
Q&A
Acknowledgement: This project is partially funded by CREST, Japan Science and
Technology Agency, Tokyo, Japan (Grant number: JPMJCR1303)
