Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (cardinality greater than 10K), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
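The general idea can be sketched with an off-the-shelf word-embedding library. The snippet below is illustrative only: it assumes gensim's Word2Vec, and all claim codes, sequences, and hyperparameters are made up rather than taken from the actual Aetna pipeline.

```python
# Illustrative only: train word2vec-style embeddings on made-up claim-code
# sequences. All codes and hyperparameters here are hypothetical.
from gensim.models import Word2Vec

# Each "sentence" is one member's chronological sequence of claim codes.
member_code_sequences = [
    ["H60.33", "L20.82", "M16.30"],
    ["Z86.16", "F17.200", "H60.33", "L20.82"],
]

model = Word2Vec(
    sentences=member_code_sequences,
    vector_size=100,  # embedding dimension (~100-d)
    window=5,         # context window over neighboring codes
    min_count=1,      # keep rare codes in this toy example
    sg=1,             # skip-gram variant
)

vector = model.wv["H60.33"]  # dense numeric representation of a diagnosis code
```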
Data Con LA 2022 - Embedding medical journeys with machine learning to improve member health at Aetna
1. Consumer Health & Services
Strictly confidential
Proprietary
Embedding medical journeys with machine learning to improve member health at Aetna
Core Contributors: Jai Bansal, Matt Churgin, Reed Peterson, Evan Lyle
2. Agenda
1. Messaging members to improve health
2. What are embeddings?
3. How can embeddings support an insurer’s work?
4. Evaluation and applications
4. Insurers can impact members through behavior change campaigns
• These campaigns can promote healthy and cost-effective choices for members
• Sample Process
  • Identify a domain where members can benefit from targeted communication. Review the concept with relevant business partners, clinicians, and the legal team
  • Design outreach with a multi-disciplinary group. Outreach channels could include email, direct mail, and text message.
  • Implement outreach using a randomized controlled trial framework and measure results
• Call-To-Action Examples: an insurer could message about
  • Preventive care: encourage members to utilize preventive care benefits to improve their long-term health
  • Medication adherence: encourage members to follow prescribed medication regimens to improve long-term health
  • Preferred site of care: encourage members to seek routine services at in-network providers to reduce out-of-pocket medical spend
Illustrative example of messaging
5. Campaigns can use predictive models to inform targeting. Medical claims data can be used to create model features.
• Predictive models might be used to identify members that have a high likelihood of responding to messaging or developing a preventable condition or illness.
• Insurers could use medical claims to build models. Medical claims are artifacts generated from members' interactions with providers.
• One of the key pieces of data contained in claims is medical codes
  • ICD codes indicate a member's diagnosis
  • CPT codes indicate any procedure a member underwent
  • GPI codes indicate member prescriptions
• There are >10K ICD codes
Sample ICD (Diagnosis) Codes
ICD Code Lookup Site: https://www.icd10data.com/ICD10CM/Codes
• A00.9: Cholera, unspecified
• Z86.16: Personal history of COVID-19
• T33.012D: Superficial frostbite of left ear, subsequent encounter
• F17.200: Nicotine dependence, unspecified, uncomplicated
• W61.02XD: Struck by parrot, subsequent encounter
• Z63.1: Problems in relationship with in-laws
7. Embeddings are simple representations of complex data
Input → Embedding Algorithm → (Made-up) Embedding Representation
• Word: "The" → [0.13, 1.31, -0.13, 0.56, …]
• Sentence: "The dog chased the cat." → [0.36, -0.81, 0.40, 0.43, …]
• Image → [1.32, -0.90, 0.20, 0.73, …]
8. Embeddings capture information about the features they are built from
A famous example from text embeddings is that embeddings should capture relationships between royal and non-royal as well as man and woman.
Raw text: King, Man, Queen, Woman
Embedding representations: [x, y, z], [a, b, c], [q, w, e], [r, t, y]
Embeddings should preserve existing relationships
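As a toy illustration of this analogy, the snippet below uses made-up 3-d vectors to show how "king − man + woman" can land near "queen" when the royal and gender relationships are preserved in the vector space. Real embeddings would be learned and much higher-dimensional.

```python
# Toy illustration of the classic word-embedding analogy, using made-up
# 3-d vectors (real embeddings are learned and higher-dimensional).
import numpy as np

emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land near queen if the relationships hold.
analogy = emb["king"] - emb["man"] + emb["woman"]
print(cosine(analogy, emb["queen"]))  # 1.0 for these toy vectors
```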
10. Medical codes contained in claims are a rich feature source, but cannot be used in models in their raw form
• Diagnosis, procedure, and prescription codes represent granular data about a member's healthcare journey. But they can't be used in models in their raw form.
• Could one-hot encoding solve the issue? Not really
  • There are >10K diagnosis codes, so one-hot encoding would result in extremely sparse vectors
  • One-hot encoded vectors also would not support comparison of codes (but embeddings would)
• Embedding medical codes provides a way to use valuable claims information
• There's also another opportunity here: since all medical claims use these codes, it's possible to build an automated feature generation tool with code representations
Sample ICD (Diagnosis) Codes
ICD Code Lookup Site: https://www.icd10data.com/ICD10CM/Codes
• A00.9: Cholera, unspecified
• Z86.16: Personal history of COVID-19
• T33.012D: Superficial frostbite of left ear, subsequent encounter
• F17.200: Nicotine dependence, unspecified, uncomplicated
• W61.02XD: Struck by parrot, subsequent encounter
• Z63.1: Problems in relationship with in-laws
11. Feature engineering is a critical part of building predictive models and takes substantial data scientist time and effort
• Feature engineering (FE), including data collection and cleaning, takes 80% of data scientist (DS) time during model development
• Models often use similar features, so a lot of individual FE is duplicative. If a typical DS spends 30% of their time on FE and has an all-in cost of $200K, then $60K is being spent on FE per DS per year.
• With individual DSs doing custom FE, model features may miss important information. By creating standardized, comprehensive features, adding embedding features could improve model recall by 10% on average.
Chart: FE share of model development time (80%) and of overall DS time (30%)
12. Embeddings can be trained using de-identified member medical claim data
• Members' de-identified medical history is recorded in ICD + procedure + GPI codes.
  Sample ICD codes: Jan 1: H60.33; Feb 1: L20.82; Mar 1: M16.30
• The codes can then be fed into an embedding training algorithm (for example, word2vec or GloVe). Each code is a token, and a member's series of codes would be treated as a "sentence."
  (Made-up) ICD embeddings: H60.33: [1.3, 2.4, …, 3.2]; L20.82: [9.3, 1.2, …, 8.3]; M16.30: [4.5, 7.6, …, 2.6]
• Embeddings would be trained using claims data for a significant population of members, to the extent permitted by law and client contracts. A member's code embeddings over a user-defined time period should be averaged to obtain the final member-level embedding. (A sketch of this flow follows below.)
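A hypothetical end-to-end sketch of the flow on this slide: treat each member's chronological claim codes as a "sentence", train a word2vec-style model, then average a member's code embeddings over a chosen time window to get one member-level vector. All codes, dates, and parameters are illustrative, not the production pipeline.

```python
# Hypothetical sketch: claim codes as tokens, a word2vec-style model, then
# averaging one member's code embeddings over a user-defined time window.
import numpy as np
from datetime import date
from gensim.models import Word2Vec

sequences = [["H60.33", "L20.82", "M16.30"], ["Z86.16", "F17.200", "H60.33"]]
model = Word2Vec(sequences, vector_size=100, window=5, min_count=1, sg=1)

member_claims = [
    (date(2022, 1, 1), "H60.33"),
    (date(2022, 2, 1), "L20.82"),
    (date(2022, 3, 1), "M16.30"),
]

def member_embedding(claims, start, end):
    """Average the embeddings of codes billed within [start, end]."""
    vectors = [
        model.wv[code]
        for claim_date, code in claims
        if start <= claim_date <= end and code in model.wv
    ]
    return np.mean(vectors, axis=0) if vectors else None

# One member-level feature vector for Q1 2022
q1_embedding = member_embedding(member_claims, date(2022, 1, 1), date(2022, 3, 31))
```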
14. Plotting diagnosis codes in 2D yields reasonable spatial relationships based on domain knowledge
• >10K unique ICD (diagnosis) codes
• Each point is colored by ICD group and represents one ICD code's embedding
• Codes in the same group and related groups tend to cluster together
• Embeddings preserve our qualitative expectation of relationships between codes, with the added benefit that these relationships are quantified
ICD code embeddings (2-D UMAP projection): points cluster by ICD group (e.g., Cancer, Psychiatric, Epilepsy). Example code callouts: O28.1 (Abnormal biochemical finding on antenatal screening of mother), O22.22 (Superficial thrombophlebitis in pregnancy, second trimester), H40.051 (Ocular hypertension, right eye), H18.463 (Peripheral corneal degeneration, bilateral). A minimal sketch of this kind of projection follows below.
Any data contained in this slide is used to the extent permitted by law and client contracts
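For the projection itself, a minimal sketch using the umap-learn package is shown below. It is not the exact tooling behind the slide's figure, and it assumes `model` is a trained gensim Word2Vec over ICD codes (as in the earlier sketch).

```python
# Minimal sketch: project trained code embeddings to 2-D with UMAP and plot.
# Assumes `model` is a trained gensim Word2Vec whose vocabulary is ICD codes.
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

codes = list(model.wv.index_to_key)   # all ICD codes in the vocabulary
vectors = model.wv[codes]             # (n_codes, embedding_dim) array

projection = umap.UMAP(n_components=2, random_state=0).fit_transform(vectors)

plt.scatter(projection[:, 0], projection[:, 1], s=3)  # color by ICD group in practice
plt.title("ICD code embeddings (2-D UMAP projection)")
plt.show()
```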
15. Plotting procedure codes in 2D can reveal interesting differences between members
• The plots below illustrate insights that can be derived from visualizing members’ embeddings
• Each point represents a member’s averaged procedure code embeddings
• Embeddings allow identification and comparison of members based on medical utilization
Plot titles: "Medicare and Commercial members undergo different procedures"; "Procedures are generally similar across gender, with a few important exceptions"; "Members of different ages undergo different procedures"
Any data contained in this slide is used to the extent permitted by law and client contracts
16. Using embeddings as features provides a quantitative evaluation method
• Comparing embedding features to simple group counts for a variety of medical events is a quantitative way to evaluate the effectiveness of embedding features (a placeholder sketch of this comparison follows this slide)
• For most events, embedding features outperform simple count features
• Some medical events are more predictable overall than others
Any data contained in this slide is used to the extent permitted by law and client contracts
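The sketch below shows the shape of this kind of comparison with placeholder data: fit the same classifier on (a) simple per-group count features and (b) averaged embedding features, then compare a held-out metric. The arrays here are random noise, so the scores are meaningless; real member-level features and outcomes would replace them.

```python
# Placeholder sketch of the evaluation idea: same model, two feature sets,
# compare held-out performance. Data below is random noise for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_members = 1000
X_counts = rng.poisson(2, size=(n_members, 20))   # code counts per ICD group
X_embed = rng.normal(size=(n_members, 100))       # averaged member embeddings
y = rng.integers(0, 2, size=n_members)            # future medical event (0/1)

for name, X in [("count features", X_counts), ("embedding features", X_embed)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: held-out AUC = {auc:.3f}")
```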
17. Medical code embeddings can add value in two main ways
Value Add 1: Embedding features provide an easy way to improve performance of existing models.
Value Add 2: Embeddings can be used to quickly train new models with minimal feature engineering.
18. Potential Next Steps for Embeddings
1. Track internal usage via installs and/or monthly active users
2. Test new embedding algorithms
3. Explore embeddings for other types of medical codes
4. Consider more applications for embeddings, for example member clustering
20. Embedding vs. one-hot code representations

One-hot encoded (~10,000-d vector)
• Example:
  Code 1 → [1 0 … 0 0]
  Code 2 → [0 1 … 0 0]
  …
  Code 10,000 → [0 0 … 0 1]
• Pros: 1. Simple to create and interpret
• Cons: 1. Cannot easily compare degree of similarity 2. Cannot easily be used as features in a model

Embedding (~100-d vector)
• Example:
  Code 1 → [0.2 -0.1 … 0.5 -0.25]
  Code 2 → [-0.5 -0.1 … 0.3 -0.1]
  …
  Code 10,000 → [0.15 0.5 … -0.1 -0.3]
• Pros: 1. Enables quantitative comparisons between categories 2. Can be used as features of a predictive model
• Cons: 1. More challenging to interpret
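The contrast in this comparison can be made concrete with a small snippet: with one-hot vectors, any two distinct codes are equally dissimilar, while dense embeddings admit graded similarity. All numbers below are made up for illustration.

```python
# Toy contrast of the two representations above. All numbers are made up.
import numpy as np

vocab_size, embed_dim = 10_000, 100

def one_hot(index):
    v = np.zeros(vocab_size)
    v[index] = 1.0
    return v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: every pair of distinct codes looks equally unrelated.
print(cosine(one_hot(0), one_hot(1)))    # always 0.0 for distinct codes

# Dense embeddings: similarity is graded, so related codes can score higher.
rng = np.random.default_rng(0)
code_a, code_b = rng.normal(size=embed_dim), rng.normal(size=embed_dim)
print(cosine(code_a, code_b))            # some value in (-1, 1)
```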