NLP in Healthcare to Predict Adverse Events with Amazon SageMaker (AIM346) - AWS re:Invent 2018

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Garin Kessler
Data Scientist
AWS Machine Learning Solutions Lab

Mayank Thakkar
Life Sciences Specialist
AWS Solutions Architecture

Goal
Learn how to apply machine learning methods to predict adverse events from reported patient data
… and much more

Background
• Pharmacovigilance and patient safety programs
• Adverse events and FDA regulations
• FAERS (FDA Adverse Event Reporting System)
• Workable data
• Call center recording / summaries
• Emails / faxes

Adverse event detection – The challenge
• Disparate data types
• Unstructured data
• Understanding semantic dispositions
• Synonyms, spelling mistakes
• Sentiment detection
• Categorizing interactions
• Various data sources
• Meeting compliance objectives
• True positives, “sleeping doctor”
• Scale, enormous scale
• Cost efficiency

Machine learning to the rescue
• Improve accuracy and reliability
• Doesn’t replace humans – aids humans
• Offload repetitive work – humans can handle edge cases
• Decrease costs
• Repurpose human workforce for ‘value-adding’ endeavors
• Keep up with ongoing research
• Incorporate published articles at scale

Machine learning – The process
Steps: Fetch data, Clean & format data, Prepare & transform data, Train model, Evaluate model, Integrate with prod, Monitor / debug / refresh
Timeline: 6-18 months
Data wrangling
• Set up and manage notebook environments
• Get data to notebooks securely
Experimentation
• Set up and manage clusters
• Scale/distribute ML algorithms
Deployment
• Set up and manage inference clusters
• Manage and auto scale inference APIs
• Testing, versioning, and monitoring

Amazon SageMaker
A managed service that provides one of the quickest and easiest ways for your data scientists and developers to get ML models from idea to production

Introducing Amazon SageMaker
• End-to-end machine learning platform
• Zero setup
• Flexible model training: bring your own deep learning script or your custom algorithm Docker image
• Pay by the second
• One step deployment
• A/B testing
• Low latency, high throughput, high reliability
• Choice of several ML algorithms
• Train faster, in a single pass

Introducing Amazon SageMaker
Choice of several ML algorithms
• XGBoost, FM, and Linear for classification and regression
• K-means and PCA for clustering and dimensionality reduction
• LDA and NTM for topic modeling, seq2seq for translation
• Image classification with convolutional neural networks

Natural language processing methods
• Dataset preprocessing - feature generators
• Latent Dirichlet Allocation (LDA)
• Comprehend topic modeling
• BlazingText word embeddings
• Classification - algorithms utilized
• K-nearest neighbors
• Logistic regression
• XGBoost
• Amazon SageMaker BlazingText Classifier
• Deep convolutional neural network running on TensorFlow and Amazon SageMaker

Preprocessing
• Data sources
• Call center summaries
• Stored in Amazon Simple Storage Service (Amazon S3)
• Preprocessing
• Lemmatization with Natural Language Toolkit (NLTK)
• BlazingText with Amazon SageMaker
Using BlazingText reduced the preprocessing time by 10x
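As a rough illustration of the preprocessing step, the sketch below turns a labeled call center summary into the line format that BlazingText's supervised mode reads: a `__label__` prefix followed by space-separated tokens. The label names, the sample sentences, and the `to_blazingtext_line` helper are hypothetical and are not taken from the talk. The optional `normalize` hook is where a lemmatizer such as NLTK's `WordNetLemmatizer().lemmatize` could be plugged in.

```python
import re


def to_blazingtext_line(label, text, normalize=None):
    """Format one labeled summary as a BlazingText supervised-mode line.

    `normalize` is an optional per-token function (for example a lemmatizer).
    """
    # Lowercase and keep only alphanumeric tokens (a simplifying assumption).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if normalize is not None:
        tokens = [normalize(token) for token in tokens]
    return "__label__" + str(label) + " " + " ".join(tokens)


# Hypothetical examples; real summaries would be read from Amazon S3.
samples = [
    ("adverse_event", "Patient reported severe headache after second dose."),
    ("no_adverse_event", "Caller asked about refill schedule."),
]
lines = [to_blazingtext_line(label, text) for label, text in samples]
for line in lines:
    print(line)
```

Each resulting line can be written to a training file, one example per line.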

Amazon SageMaker tooling
• TensorFlow and Keras
• “Bring your own model”
• Convolutional neural network
• Built-in algorithms
• Automatic model tuning
• Spinning out many jobs simultaneously
• Amazon CloudWatch and TensorBoard
• Monitoring instances and accuracy metrics

Architecture
[Architecture diagram: within the AWS Cloud, an AWS Region contains a VPC with a private subnet in each of two Availability Zones. Each zone shows Training and Deployment, an Auto Scaling group, and an Endpoint. Storage is shown for raw data and model artifacts, and for production data.]

Results by algorithm
Feature generator / Classifier        Accuracy  AUC    FPR    FNR    Precision  Recall  Sensitivity  Specificity
LDA (Latent Dirichlet Allocation)
  kNN                                 0.775     0.767  0.182  0.288  0.729      0.712   0.712        0.818
  Logistic regression                 0.728     0.787  0.277  0.257  0.485      0.743   0.743        0.723
  XGBoost                             0.812     0.905  0.152  0.240  0.774      0.759   0.759        0.848
Comprehend topic modeling
  kNN                                 0.759     0.718  0.254  0.189  0.516      0.811   0.811        0.742
  Logistic regression                 0.516     0.892  0.395  0.602  0.433      0.398   0.398        0.605
  XGBoost                             0.855     0.936  0.069  0.230  0.908      0.769   0.769        0.931
Amazon SageMaker BlazingText
  BlazingText Classifier              0.979     0.997  0.023  0.020  0.980      0.985   0.985        0.970
Amazon SageMaker Deep CNN             0.978     0.998  0.021  0.020  0.978      0.982   0.982        0.972
(FPR = false positive rate, FNR = false negative rate)
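For readers less familiar with the column names, the sketch below computes the threshold-based metrics from a binary confusion matrix using their standard definitions. The counts are made up for illustration and are not the study's data, and the `binary_metrics` helper is hypothetical. AUC is omitted because it needs ranked prediction scores rather than a single confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),   # equals 1 - recall
        "precision": tp / (tp + fp),
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
    }


# Hypothetical counts: 100 true adverse events and 100 non-events.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By these definitions recall and sensitivity are the same quantity, which is why those two columns carry identical values.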

BlazingText embeddings overview
• Plot the top 5000 most common terms
• Terms overlap with semantically similar terms
• Models leverage these semantics for computation and performance
• Will look at terms in two sections of the word embedding space
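One common way to make "semantically similar terms" concrete is cosine similarity between word vectors. The sketch below uses tiny hand-written 3-dimensional vectors as stand-ins for real embeddings. The vectors, the vocabulary, and the `nearest_terms` helper are assumptions for illustration, not output from the model in the talk.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_terms(term, embeddings, k=2):
    """Return the k terms whose vectors are most similar to `term`."""
    target = embeddings[term]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (hypothetical); real embeddings have far more dimensions.
toy_embeddings = {
    "patient": [0.90, 0.10, 0.00],
    "pateint": [0.88, 0.12, 0.02],
    "pt": [0.85, 0.20, 0.05],
    "nausea": [0.05, 0.10, 0.95],
    "vomiting": [0.10, 0.05, 0.90],
}
print(nearest_terms("patient", toy_embeddings))
```

With these toy values, the misspelling and the abbreviation rank closest to "patient", while the symptom terms rank closest to each other.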

BlazingText embeddings: Zoomed in – Part 1
• Model has learned important familial and patient relationships, including caregivers and reporters
• Robust to typos: Patient, Pateint, Pt

BlazingText embeddings: Zoomed in – Part 2
• Model has learned important side effects and adverse drug reactions
• Types of reactions are even clustered

Cost
What does it cost to run this model?
Service      Resources used        Pricing dimension              Cost
Amazon S3    50 GB for one month   $0.023 per GB-month            $1.15
Amazon EFS   Storage                                              $3
                                                                  $1.3
                                   $0.0714 per instance-minute    $8.55
                                   $0.021 per instance-minute     $0.084
Total                                                             $14.08 ($0.11 per 1000 predictions)*
* Amazon SageMaker on-demand ML instances let you pay for machine learning compute capacity by the second, with a one-minute minimum, with no long-term commitments.
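The arithmetic behind the total can be checked with a short sketch. The five cost values are the line items listed on the slide. The `billed_cost` helper is a hypothetical illustration of per-second billing with a one-minute minimum, and it assumes the per-instance-minute rate is simply prorated by the second.

```python
# Line-item costs as listed on the slide, in US dollars.
line_item_costs = [1.15, 3.00, 1.30, 8.55, 0.084]
total = sum(line_item_costs)
print(f"Total: ${total:.2f}")


def billed_cost(seconds_used, rate_per_instance_minute, minimum_seconds=60):
    """Cost of one instance billed by the second with a minimum charge.

    Assumes the per-minute rate is prorated linearly to seconds.
    """
    billable_seconds = max(seconds_used, minimum_seconds)
    return billable_seconds / 60 * rate_per_instance_minute


# A 30-second run is charged as a full minute; a 90-second run as 1.5 minutes.
short_run = billed_cost(30, 0.0714)
longer_run = billed_cost(90, 0.0714)
print(f"30 s: ${short_run:.4f}, 90 s: ${longer_run:.4f}")
```

The line items sum to $14.084, which matches the slide's total of $14.08 after rounding.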

To learn more…
• Amazon SageMaker here
• Blogs:
• Enhanced text classification and word vectors using Amazon SageMaker BlazingText
• https://tinyurl.com/sagemaker-blazingtext
• Bring your own pre-trained MXNet or TensorFlow models into Amazon SageMaker
• https://tinyurl.com/sagemaker-byom

Questions?
Garin Kessler
Data Scientist
Mayank Thakkar

Thank you!
