Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (cardinality greater than 10K), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
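The general idea can be sketched with an off-the-shelf word-embedding library. The snippet below is illustrative only: it assumes gensim's Word2Vec, and all claim codes, sequences, and hyperparameters are made up rather than taken from the actual Aetna pipeline.

```python
# Illustrative only: train word2vec-style embeddings on made-up claim-code
# sequences. All codes and hyperparameters here are hypothetical.
from gensim.models import Word2Vec

# Each "sentence" is one member's chronological sequence of claim codes.
member_code_sequences = [
    ["H60.33", "L20.82", "M16.30"],
    ["Z86.16", "F17.200", "H60.33", "L20.82"],
]

model = Word2Vec(
    sentences=member_code_sequences,
    vector_size=100,  # embedding dimension (~100-d)
    window=5,         # context window over neighboring codes
    min_count=1,      # keep rare codes in this toy example
    sg=1,             # skip-gram variant
)

vector = model.wv["H60.33"]  # dense numeric representation of a diagnosis code
```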
Data Con LA 2022 - Embedding medical journeys with machine learning to improve member health at Aetna
1. Consumer Health & Services
Strictly confidential
Proprietary
Embedding medical journeys with machine learning to improve member health at Aetna
Core Contributors: Jai Bansal, Matt Churgin, Reed Peterson, Evan Lyle
2. Agenda
1. Messaging members to improve health
2. What are embeddings?
3. How can embeddings support an insurer’s work?
4. Evaluation and applications
4. Insurers can impact members through behavior change campaigns
• These campaigns can promote healthy and cost-effective choices for members
• Sample Process
  • Identify a domain where members can benefit from targeted communication. Review the concept with relevant business partners, clinicians, and the legal team
  • Design outreach with a multi-disciplinary group. Outreach channels could include email, direct mail, and text message.
  • Implement outreach using a randomized controlled trial framework and measure results
• Call-To-Action Examples: an insurer could message about
  • Preventive care: encourage members to utilize preventive care benefits to improve their long-term health
  • Medication adherence: encourage members to follow prescribed medication regimens to improve long-term health
  • Preferred site of care: encourage members to seek routine services at in-network providers to reduce out-of-pocket medical spend
Illustrative example of messaging
5. Campaigns can use predictive models to inform targeting. Medical claims data can be used to create model features.
• Predictive models might be used to identify members that have a high likelihood of responding to messaging or developing a preventable condition or illness.
• Insurers could use medical claims to build models. Medical claims are artifacts generated from members' interactions with providers.
• One of the key pieces of data contained in claims is medical codes
  • ICD codes indicate a member's diagnosis
  • CPT codes indicate any procedure a member underwent
  • GPI codes indicate member prescriptions
• There are >10K ICD codes
Sample ICD (Diagnosis) Codes
ICD Code Lookup Site: https://www.icd10data.com/ICD10CM/Codes
• A00.9: Cholera, unspecified
• Z86.16: Personal history of COVID-19
• T33.012D: Superficial frostbite of left ear, subsequent encounter
• F17.200: Nicotine dependence, unspecified, uncomplicated
• W61.02XD: Struck by parrot, subsequent encounter
• Z63.1: Problems in relationship with in-laws
7. Embeddings are simple representations of complex data
Input → Embedding Algorithm → (Made-up) Embedding Representation
• Word: "The" → [0.13, 1.31, -0.13, 0.56, …]
• Sentence: "The dog chased the cat." → [0.36, -0.81, 0.40, 0.43, …]
• Image → [1.32, -0.90, 0.20, 0.73, …]
8. Embeddings capture information about the features they are built from
A famous example from text embeddings is that embeddings should capture relationships between royal and non-royal as well as man and woman.
Raw text: King, Man, Queen, Woman
Embedding representations: [x, y, z], [a, b, c], [q, w, e], [r, t, y]
Embeddings should preserve existing relationships
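As a toy illustration of this analogy, the snippet below uses made-up 3-d vectors to show how "king − man + woman" can land near "queen" when the royal and gender relationships are preserved in the vector space. Real embeddings would be learned and much higher-dimensional.

```python
# Toy illustration of the classic word-embedding analogy, using made-up
# 3-d vectors (real embeddings are learned and higher-dimensional).
import numpy as np

emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land near queen if the relationships hold.
analogy = emb["king"] - emb["man"] + emb["woman"]
print(cosine(analogy, emb["queen"]))  # 1.0 for these toy vectors
```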
10. Medical codes contained in claims are a rich feature source, but cannot be used in models in their raw form
• Diagnosis, procedure, and prescription codes represent granular data about a member's healthcare journey. But they can't be used in models in their raw form.
• Could one-hot encoding solve the issue? Not really
  • There are >10K diagnosis codes, so one-hot encoding would result in extremely sparse vectors
  • One-hot encoded vectors also would not support comparison of codes (but embeddings would)
• Embedding medical codes provides a way to use valuable claims information
• There's also another opportunity here: since all medical claims use these codes, it's possible to build an automated feature generation tool with code representations
Sample ICD (Diagnosis) Codes
ICD Code Lookup Site: https://www.icd10data.com/ICD10CM/Codes
• A00.9: Cholera, unspecified
• Z86.16: Personal history of COVID-19
• T33.012D: Superficial frostbite of left ear, subsequent encounter
• F17.200: Nicotine dependence, unspecified, uncomplicated
• W61.02XD: Struck by parrot, subsequent encounter
• Z63.1: Problems in relationship with in-laws
11. Feature engineering is a critical part of building predictive models and takes substantial data scientist time and effort
• Feature engineering (FE), including data collection and cleaning, takes 80% of data scientist (DS) time during model development
• Models often use similar features, so a lot of individual FE is duplicative. If a typical DS spends 30% of their time on FE and has an all-in cost of $200K, then $60K is being spent on FE per DS per year.
• With individual DSs doing custom FE, model features may miss important information. By creating standardized, comprehensive features, adding embedding features could improve model recall by 10% on average.
Chart: FE share of model development time (80%) and of overall DS time (30%)
12. Embeddings can be trained using de-identified member medical claim data
• Members' de-identified medical history is recorded in ICD + procedure + GPI codes.
  Sample ICD codes: Jan 1: H60.33; Feb 1: L20.82; Mar 1: M16.30
• The codes can then be fed into an embedding training algorithm (for example, word2vec or GloVe). Each code is a token, and a member's series of codes would be treated as a "sentence."
  (Made-up) ICD embeddings: H60.33: [1.3, 2.4, …, 3.2]; L20.82: [9.3, 1.2, …, 8.3]; M16.30: [4.5, 7.6, …, 2.6]
• Embeddings would be trained using claims data for a significant population of members, to the extent permitted by law and client contracts. A member's code embeddings over a user-defined time period should be averaged to obtain the final member-level embedding. (A sketch of this flow follows below.)
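A hypothetical end-to-end sketch of the flow on this slide: treat each member's chronological claim codes as a "sentence", train a word2vec-style model, then average a member's code embeddings over a chosen time window to get one member-level vector. All codes, dates, and parameters are illustrative, not the production pipeline.

```python
# Hypothetical sketch: claim codes as tokens, a word2vec-style model, then
# averaging one member's code embeddings over a user-defined time window.
import numpy as np
from datetime import date
from gensim.models import Word2Vec

sequences = [["H60.33", "L20.82", "M16.30"], ["Z86.16", "F17.200", "H60.33"]]
model = Word2Vec(sequences, vector_size=100, window=5, min_count=1, sg=1)

member_claims = [
    (date(2022, 1, 1), "H60.33"),
    (date(2022, 2, 1), "L20.82"),
    (date(2022, 3, 1), "M16.30"),
]

def member_embedding(claims, start, end):
    """Average the embeddings of codes billed within [start, end]."""
    vectors = [
        model.wv[code]
        for claim_date, code in claims
        if start <= claim_date <= end and code in model.wv
    ]
    return np.mean(vectors, axis=0) if vectors else None

# One member-level feature vector for Q1 2022
q1_embedding = member_embedding(member_claims, date(2022, 1, 1), date(2022, 3, 31))
```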
14. Plotting diagnosis codes in 2D yields reasonable spatial relationships based on domain knowledge
• >10K unique ICD (diagnosis) codes
• Each point is colored by ICD group and represents one ICD code's embedding
• Codes in the same group and related groups tend to cluster together
• Embeddings preserve our qualitative expectation of relationships between codes, with the added benefit that these relationships are quantified
ICD code embeddings (2-D UMAP projection): points cluster by ICD group (e.g., Cancer, Psychiatric, Epilepsy). Example code callouts: O28.1 (Abnormal biochemical finding on antenatal screening of mother), O22.22 (Superficial thrombophlebitis in pregnancy, second trimester), H40.051 (Ocular hypertension, right eye), H18.463 (Peripheral corneal degeneration, bilateral). A minimal sketch of this kind of projection follows below.
Any data contained in this slide is used to the extent permitted by law and client contracts
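For the projection itself, a minimal sketch using the umap-learn package is shown below. It is not the exact tooling behind the slide's figure, and it assumes `model` is a trained gensim Word2Vec over ICD codes (as in the earlier sketch).

```python
# Minimal sketch: project trained code embeddings to 2-D with UMAP and plot.
# Assumes `model` is a trained gensim Word2Vec whose vocabulary is ICD codes.
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

codes = list(model.wv.index_to_key)   # all ICD codes in the vocabulary
vectors = model.wv[codes]             # (n_codes, embedding_dim) array

projection = umap.UMAP(n_components=2, random_state=0).fit_transform(vectors)

plt.scatter(projection[:, 0], projection[:, 1], s=3)  # color by ICD group in practice
plt.title("ICD code embeddings (2-D UMAP projection)")
plt.show()
```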
15. Plotting procedure codes in 2D can reveal interesting differences between members
• The plots below illustrate insights that can be derived from visualizing members’ embeddings
• Each point represents a member’s averaged procedure code embeddings
• Embeddings allow identification and comparison of members based on medical utilization
Plot titles: "Medicare and Commercial members undergo different procedures"; "Procedures are generally similar across gender, with a few important exceptions"; "Members of different ages undergo different procedures"
Any data contained in this slide is used to the extent permitted by law and client contracts
16. Using embeddings as features provides a quantitative evaluation method
• Comparing embedding features to simple group counts for a variety of medical events is a quantitative way to evaluate the effectiveness of embedding features (a placeholder sketch of this comparison follows this slide)
• For most events, embedding features outperform simple count features
• Some medical events are more predictable overall than others
Any data contained in this slide is used to the extent permitted by law and client contracts
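The sketch below shows the shape of this kind of comparison with placeholder data: fit the same classifier on (a) simple per-group count features and (b) averaged embedding features, then compare a held-out metric. The arrays here are random noise, so the scores are meaningless; real member-level features and outcomes would replace them.

```python
# Placeholder sketch of the evaluation idea: same model, two feature sets,
# compare held-out performance. Data below is random noise for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_members = 1000
X_counts = rng.poisson(2, size=(n_members, 20))   # code counts per ICD group
X_embed = rng.normal(size=(n_members, 100))       # averaged member embeddings
y = rng.integers(0, 2, size=n_members)            # future medical event (0/1)

for name, X in [("count features", X_counts), ("embedding features", X_embed)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: held-out AUC = {auc:.3f}")
```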
17. Medical code embeddings can add value in two main ways
Value Add 1: Embedding features provide an easy way to improve performance of existing models.
Value Add 2: Embeddings can be used to quickly train new models with minimal feature engineering.
18. Potential Next Steps for Embeddings
1. Track internal usage via installs and/or monthly active users
2. Test new embedding algorithms
3. Explore embeddings for other types of medical codes
4. Consider more applications for embeddings, for example member clustering
20. Embedding vs. one-hot code representations

One-hot encoded (~10,000-d vector)
• Example:
  Code 1 → [1 0 … 0 0]
  Code 2 → [0 1 … 0 0]
  …
  Code 10,000 → [0 0 … 0 1]
• Pros: 1. Simple to create and interpret
• Cons: 1. Cannot easily compare degree of similarity 2. Cannot easily be used as features in a model

Embedding (~100-d vector)
• Example:
  Code 1 → [0.2 -0.1 … 0.5 -0.25]
  Code 2 → [-0.5 -0.1 … 0.3 -0.1]
  …
  Code 10,000 → [0.15 0.5 … -0.1 -0.3]
• Pros: 1. Enables quantitative comparisons between categories 2. Can be used as features of a predictive model
• Cons: 1. More challenging to interpret
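The contrast in this comparison can be made concrete with a small snippet: with one-hot vectors, any two distinct codes are equally dissimilar, while dense embeddings admit graded similarity. All numbers below are made up for illustration.

```python
# Toy contrast of the two representations above. All numbers are made up.
import numpy as np

vocab_size, embed_dim = 10_000, 100

def one_hot(index):
    v = np.zeros(vocab_size)
    v[index] = 1.0
    return v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: every pair of distinct codes looks equally unrelated.
print(cosine(one_hot(0), one_hot(1)))    # always 0.0 for distinct codes

# Dense embeddings: similarity is graded, so related codes can score higher.
rng = np.random.default_rng(0)
code_a, code_b = rng.normal(size=embed_dim), rng.normal(size=embed_dim)
print(cosine(code_a, code_b))            # some value in (-1, 1)
```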