Consumer Health & Services
Strictly confidential
Proprietary
Embedding medical journeys with machine
learning to improve member health at Aetna
Core Contributors: Jai Bansal, Matt Churgin, Reed Peterson, Evan Lyle
Agenda
1. Messaging members to improve health
2. What are embeddings?
3. How can embeddings support an insurer’s work?
4. Evaluation and applications
Messaging members to improve health
Insurers can impact members through behavior change campaigns
• These campaigns can promote healthy and cost-effective
choices for members
• Sample Process
• Identify domain where members can benefit from targeted communication.
Review concept with relevant business partners, clinicians, and legal team
• Design outreach with multi-disciplinary group. Outreach channels could
include email, direct mail, and text message.
• Implement outreach using a randomized control trial framework and
measure results
• Call-To-Action Examples: an insurer could message about
• Preventive care: encourage members to utilize preventive care
benefits to improve their long-term health
• Medication adherence: encourage members to follow prescribed
medication regimes to improve long-term health
• Preferred site of care: encourage members to seek routine services
at in-network providers to reduce out-of-pocket medical spend
Illustrative example of messaging
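The randomized control trial step in the sample process can be sketched as follows. This is a minimal illustration, not the team's actual implementation: the member IDs, the 50/50 split, the response rates, and the function names `assign_arms` and `outcome_rate` are all assumptions made up for the example.

```python
import random

def assign_arms(member_ids, treatment_share=0.5, seed=0):
    """Randomly split members into a treatment arm (messaged) and a control arm (not messaged)."""
    rng = random.Random(seed)
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * treatment_share)
    return set(shuffled[:cut]), set(shuffled[cut:])

def outcome_rate(arm, outcomes):
    """Share of members in an arm who took the desired action."""
    return sum(outcomes[m] for m in arm) / len(arm)

# Hypothetical member IDs and simulated outcomes (1 = member took the action).
members = [f"m{i}" for i in range(1000)]
treatment, control = assign_arms(members)

# Made-up response rates, used only to generate example data.
sim = random.Random(1)
outcomes = {
    m: int(sim.random() < (0.12 if m in treatment else 0.10))
    for m in members
}

# The measured result is the difference in action rates between the two arms.
lift = outcome_rate(treatment, outcomes) - outcome_rate(control, outcomes)
print(f"treatment rate - control rate: {lift:+.3f}")
```

Because assignment is random, a difference between the arms can be attributed to the outreach rather than to how members were selected. A real campaign would also test whether the difference is statistically significant.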
Campaigns can use predictive models to inform targeting. Medical claims data
can be used to create model features.
• Predictive models might be used to identify members that
have a high likelihood of responding to messaging or
developing a preventable condition or illness.
• Insurers could use medical claims to build models. Medical
claims are artifacts generated from members’ interactions
with providers.
• Medical codes are among the key pieces of data contained
in claims
• ICD codes indicate a member’s diagnosis
• CPT codes indicate any procedure a member underwent
• GPI codes indicate member prescriptions
• There are >10K ICD codes
Sample ICD (Diagnosis) Codes
ICD Code Lookup Site: https://www.icd10data.com/ICD10CM/Codes
ICD Code    Description
A00.9       Cholera, unspecified
Z86.16      Personal history of COVID-19
T33.012D    Superficial frostbite of left ear, subsequent encounter
F17.200     Nicotine dependence, unspecified, uncomplicated
W61.02XD    Struck by parrot, subsequent encounter
Z63.1       Problems in relationship with in-laws
What are embeddings?
Embeddings are simple representations of complex data
An embedding algorithm maps each input to a (made-up) embedding representation:
• Word: "The" → [0.13, 1.31, -0.13, 0.56, …]
• Sentence: "The dog chased the cat." → [0.36, -0.81, 0.40, 0.43, …]
• Image: (image not reproduced) → [1.32, -0.90, 0.20, 0.73, …]
Embeddings capture information about the features they are built from
A famous example from text embeddings: the vectors should capture the relationship between royal
and non-royal as well as the relationship between man and woman.
Raw text → Embedding representations
• King → [x, y, z]
• Man → [a, b, c]
• Queen → [q, w, e]
• Woman → [r, t, y]
Embeddings should preserve existing relationships
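The royal/non-royal example can be made concrete with a few lines of vector arithmetic. The sketch below uses made-up 3-d vectors chosen by hand so that the relationship holds. Real text embeddings are learned from data and have many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors: dimension 0 loosely encodes "royal",
# dimension 1 "man", dimension 2 "woman".
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
}

# king - man + woman should land near queen if the relationships are preserved.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

similarities = {word: cosine(target, vec) for word, vec in emb.items()}
nearest = max(similarities, key=similarities.get)
print(nearest, {w: round(s, 3) for w, s in similarities.items()})
```

With these hand-picked vectors, the word closest to `king - man + woman` is `queen`.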
How can embeddings support an insurer’s work?
Medical codes contained in claims are a rich feature source, but cannot be used
in models in their raw form
• Diagnosis, procedure, and prescription codes represent
granular data about a member’s healthcare journey. But
they can’t be used in models in their raw form.
• Could one-hot encoding solve the issue? Not really
• There are >10K diagnosis codes, so one-hot encoding would result in
extremely sparse vectors
• One-hot encoded vectors also would not support comparison of codes
(but embeddings would)
• Embedding medical codes provides a way to use
valuable claims information
• There’s also another opportunity here: since all medical
claims use these codes, it’s possible to build an automated
feature generation tool with code representations
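The comparison point above can be illustrated with a small sketch. The three ICD codes appear elsewhere in this deck, but the embedding values are made up for the example, and the three-code vocabulary stands in for the real >10K codes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Tiny stand-in vocabulary; a real one would have >10K entries.
codes = ["O28.1", "O22.22", "H40.051"]

def one_hot(code):
    """One position per code in the vocabulary, 1.0 only at the code's own position."""
    return [1.0 if c == code else 0.0 for c in codes]

# Made-up dense embeddings: the two pregnancy-related codes are placed close together.
embedding = {
    "O28.1":   [0.8, 0.1, 0.3],
    "O22.22":  [0.7, 0.2, 0.3],
    "H40.051": [-0.5, 0.9, 0.0],
}

# One-hot: every pair of distinct codes looks equally unrelated.
onehot_related = cosine(one_hot("O28.1"), one_hot("O22.22"))
onehot_unrelated = cosine(one_hot("O28.1"), one_hot("H40.051"))

# Embedding: similarity can differ by pair.
emb_related = cosine(embedding["O28.1"], embedding["O22.22"])
emb_unrelated = cosine(embedding["O28.1"], embedding["H40.051"])

print("one-hot:  ", onehot_related, onehot_unrelated)
print("embedding:", round(emb_related, 3), round(emb_unrelated, 3))
```

Any two distinct one-hot vectors have a cosine similarity of exactly zero, so they carry no notion of "closer" or "further". Dense embeddings can place related codes near each other.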
Feature engineering is a critical part of building predictive models and takes
substantial data scientist time and effort
• Feature engineering (FE), including data collection and cleaning, takes 80% of DS time during model
development and 30% of overall DS time
• Models often use similar features, so much individual FE is duplicative. If a typical DS spends 30% of their
time on FE and has an all-in cost of $200K, then $60K is being spent on FE per DS per year.
• With individual DSs doing custom FE, model features may miss important information. Standardized,
comprehensive embedding features could improve model recall by 10% on average.
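The cost figure follows from simple arithmetic, worked through below. The 30% share and the $200K all-in cost come from the slide. The team size is a hypothetical value added only to show how the figure scales.

```python
def annual_fe_cost(all_in_cost_usd, fe_share_pct):
    """Annual cost of feature engineering for one data scientist, in whole dollars."""
    return all_in_cost_usd * fe_share_pct // 100

# Figures from the slide: $200K all-in cost, 30% of overall time spent on FE.
per_ds = annual_fe_cost(200_000, 30)
print(f"FE cost per DS per year: ${per_ds:,}")

# Hypothetical team size, not from the slides.
team_size = 10
print(f"FE cost for a team of {team_size}: ${per_ds * team_size:,}")
```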
Embeddings can be trained using de-identified member medical claim data
Members’ de-identified medical history is recorded in ICD + procedure + GPI codes.
Sample ICD Codes
• Jan 1: H60.33
• Feb 1: L20.82
• Mar 1: M16.30
The codes can then be fed into an embedding training algorithm (for example, word2vec or GloVe).
Each code is a token, and a member’s series of codes is treated as a “sentence.”
(Made-up) ICD Embeddings
• H60.33 : [1.3, 2.4, …, 3.2]
• L20.82 : [9.3, 1.2, …, 8.3]
• M16.30 : [4.5, 7.6, …, 2.6]
Embeddings would be trained using claims data for a significant population of members to the
extent permitted by law and client contracts. A member’s code embeddings over a user-defined
time period are then averaged to obtain the final member-level embedding.
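The data flow around the training step can be sketched as follows: claims are grouped into per-member "sentences" of codes, and code vectors are averaged over a time window to produce a member-level embedding. The training step itself (word2vec or GloVe) is omitted. The claim rows, the 2-d vectors in `code_vectors`, and the function names are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical de-identified claim rows: (member_id, service_date, ICD code).
claims = [
    ("member_a", date(2022, 3, 1), "M16.30"),
    ("member_a", date(2022, 1, 1), "H60.33"),
    ("member_a", date(2022, 2, 1), "L20.82"),
    ("member_b", date(2022, 2, 15), "L20.82"),
]

def build_sentences(rows):
    """Group codes by member and order them by date: one 'sentence' per member."""
    by_member = defaultdict(list)
    for member_id, service_date, code in rows:
        by_member[member_id].append((service_date, code))
    return {m: [code for _, code in sorted(events)] for m, events in by_member.items()}

# Stand-in for the output of an embedding training algorithm.
# These 2-d vectors are made up; trained vectors would be much longer.
code_vectors = {
    "H60.33": [1.0, 2.0],
    "L20.82": [3.0, 4.0],
    "M16.30": [5.0, 0.0],
}

def member_embedding(rows, member_id, start, end, vectors):
    """Average the vectors of a member's codes with service dates in [start, end]."""
    selected = [
        vectors[code]
        for m, d, code in rows
        if m == member_id and start <= d <= end and code in vectors
    ]
    if not selected:
        return None  # no known codes in the window
    return [sum(dim) / len(selected) for dim in zip(*selected)]

sentences = build_sentences(claims)
print(sentences["member_a"])  # training input for this member

jan_feb = member_embedding(claims, "member_a", date(2022, 1, 1), date(2022, 2, 28), code_vectors)
print(jan_feb)
```

Averaging keeps the member-level vector the same length as the code vectors regardless of how many claims a member has, which makes it usable directly as a fixed-width model feature.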
Evaluation and applications
Plotting diagnosis codes in 2D yields reasonable spatial relationships based on
domain knowledge
• >10K unique ICD (diagnosis)
codes
• Each point is colored by ICD
group and represents one ICD
code’s embedding
• Codes in the same group and
related groups tend to cluster
together
• Embeddings preserve our
qualitative expectation of
relationships between codes,
with the added benefit that these
relationships are quantified
Figure: ICD code embeddings (2-D UMAP projection). Labeled clusters: Cancer, Psychiatric, Epilepsy.
Annotated example codes:
• O28.1: Abnormal biochemical finding on antenatal screening of mother
• O22.22: Superficial thrombophlebitis in pregnancy, second trimester
• H40.051: Ocular hypertension, right eye
• H18.463: Peripheral corneal degeneration, bilateral
Any data contained in this slide is used to the extent permitted by law and client contracts
Plotting procedure codes in 2D can reveal interesting differences between members
• The plots below illustrate insights that can be derived from visualizing members’ embeddings
• Each point represents a member’s averaged procedure code embeddings
• Embeddings allow identification and comparison of members based on medical utilization
Plot panels:
• Medicare and Commercial members undergo different procedures
• Procedures are generally similar across gender, with a few important exceptions
• Members of different ages undergo different procedures
Using embeddings as features provides a quantitative evaluation method
• Comparing embedding features
to simple group counts for a
variety of medical events is a
quantitative way to evaluate the
effectiveness of embedding
features
• For most events, embedding
features outperform simple
count features
• Some medical events are more
predictable overall than others
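The two feature sets being compared can be sketched side by side. The slides do not define "ICD group", so the sketch assumes a group is the first letter of the code. The codes, the 2-d vectors, and the function names are made up for illustration. A full evaluation would train the same model once on each feature set and compare performance per medical event.

```python
from collections import Counter

# Assumption: an "ICD group" is approximated by the code's first letter.
GROUPS = ["H", "L", "M", "O"]

def group_count_features(codes):
    """Baseline features: how many of a member's codes fall in each group."""
    counts = Counter(code[0] for code in codes)
    return [counts.get(group, 0) for group in GROUPS]

def embedding_features(codes, vectors):
    """Embedding features: the average of the member's code vectors."""
    known = [vectors[code] for code in codes if code in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]

# Hypothetical member history and made-up 2-d code vectors.
member_codes = ["H60.33", "H40.051", "L20.82"]
vectors = {
    "H60.33":  [1.0, 2.0],
    "H40.051": [3.0, 2.0],
    "L20.82":  [2.0, 5.0],
}

count_row = group_count_features(member_codes)
embedding_row = embedding_features(member_codes, vectors)
print("count features:    ", count_row)
print("embedding features:", embedding_row)
```

Count features treat the two `H` codes as interchangeable, while the averaged embedding retains where each specific code sits in the vector space.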
Medical code embeddings can add value in two main ways
Value Add 1: Embedding features provide an easy way to improve
performance of existing models.
Value Add 2: Embeddings can be used to quickly train new models
with minimal feature engineering.
Potential Next Steps for Embeddings
1. Track internal usage via installs and/or monthly active users
2. Test new embedding algorithms
3. Explore embeddings for other types of medical codes
4. Consider more applications for embeddings, for example member clustering
Appendix
Embedding vs. one-hot code representations
One-hot encoded (~10,000-d vector)
• Example
  Code 1: [1 0 … 0 0]
  Code 2: [0 1 … 0 0]
  …
  Code 10,000: [0 0 … 0 1]
• Pros
  1. Simple to create and interpret
• Cons
  1. Cannot easily compare degree of similarity
  2. Cannot easily be used as features in a model
Embedding (~100-d vector)
• Example
  Code 1: [0.2 -0.1 … 0.5 -0.25]
  Code 2: [-0.5 -0.1 … 0.3 -0.1]
  …
  Code 10,000: [0.15 0.5 … -0.1 -0.3]
• Pros
  1. Enables quantitative comparisons between categories
  2. Can be used as features of a predictive model
• Cons
  1. More challenging to interpret

Data Con LA 2022- Embedding medical journeys with machine learning to improve member health at Aetna
