FlorenceAI
Reinventing Data Science at Humana
David Mack, PhD
Cognitive/Machine Learning Principal
AI Engineering, Digital Health and Analytics
A more human way to healthcare™
David Mack, PhD – Cognitive/Machine Learning Principal
I have worked at Humana for 5½ years in clinical and enterprise
data science. I have been one of the primary architects and
maintainers of Humana’s ML Platform for the past 2 years that
now serves hundreds of data scientists. I love to tinker with
homemade IoT devices, build cool stuff, and learn new things!
Humana’s bold goal is to address the needs of the whole person
Have focused on community partnerships and social determinants of health
Commitment to help our millions of members achieve their best health
Fortune 50 company with $77.2bn consolidated revenue in 2020
Humana has invested significant resources into fighting:
• COVID-19 Pandemic
• Food Insecurity
• Loneliness and Social Isolation
• Inequities in Healthcare
Formed Digital Health and Analytics Organization in 2018
Through advanced analytics, experiential design, data and technology we are
working to meet our associates, members and the communities we serve,
anytime, anywhere, anyhow
What exactly is FlorenceAI*?
A cloud platform for automating and accelerating the delivery
lifecycle of data science solutions at scale in Azure
Key Foundational Pillars
• Feature stores
• Starter code frameworks
• Notebook based workflow
• Prod deployment partnership
• Extensive training curriculum
End-to-end ecosystem benefits
• Empowers data scientists to solve complex problems
• Promotes access to open-source innovation
• Simplifies model consumption with single interface
• Transforms workflows to improve performance
Microsoft Azure Cloud
Foundational Components
Other Key Tools
* Patent Pending
Feature Stores – Quality Ingredients for ML Algorithms
Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years
Extensive Metadata
• Standard descriptions
• Centralized ref tables
• Ratings to identify any quality impacts
• Enables discovery and exploration
Economies of Scale
• Pre-computed for entire population
• Refreshed regularly at different cadences
• Production ready and pre-validated
Flexible but Specific
• Designed to cover most use cases
• Domain expertise in feature design
• Self-service for custom situations
End-to-End Process
• Cohort Design
• Initial Feature Selection
• Model Training Experiments
• Score and Register Best Model
• Record Training Artifacts
• Scoring Code and Testing
• Promote Model and Automate Scoring
Example Problem to Help Trace the Workflow
Predict the most severe stage of Chronic Kidney Disease in the next 6 months
Criteria to Define the Cohort
• Age ≥ 65, Medicare Advantage
• Evidence of CKD stage in Medical Claims or Lab Results
• Continuous enrollment: over 11 months of enrollment
• Fixed Calendar Date: 12 months of history, 6 months looking forward
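The cohort criteria on this slide can be read as a simple membership test. The sketch below is a plain-Python illustration only, not the platform's code: the `Member` fields, the "MA" plan code, and the reading of "over 11 months of enrollment" as at least 11 of the 12 history months are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    # All field names are hypothetical; they are not taken from the slides.
    birth_date: date
    plan_type: str                 # "MA" stands in for Medicare Advantage
    enrolled_months_prior_12: int  # months enrolled during the 12-month history window
    has_ckd_evidence: bool         # CKD stage seen in medical claims or lab results


def age_on(birth_date: date, index_date: date) -> int:
    """Whole years of age on the fixed calendar (index) date."""
    years = index_date.year - birth_date.year
    if (index_date.month, index_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def in_cohort(member: Member, index_date: date, min_enrolled_months: int = 11) -> bool:
    """Apply the four cohort criteria; the enrollment threshold is an assumption."""
    return (
        age_on(member.birth_date, index_date) >= 65
        and member.plan_type == "MA"
        and member.enrolled_months_prior_12 >= min_enrolled_months
        and member.has_ckd_evidence
    )
```

Keeping the index date as an explicit argument makes it easy to rebuild the same cohort for a different fixed calendar date.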
All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security
Initial Feature Selection and Traditional Model Training
Walkthrough: Initial Feature Selection Notebook
Goal: Identify hundreds of important features among tens of thousands
First Round of Model Experimentation using SparkML
Helper function to execute the run, available in the shared “experiment utility”
Arrive at a “Best Model” using SparkML
Different helper function to save the best model and provide more details
Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or Heatmaps to help catch potential mistakes early
Walkthrough: SparkML Helper Functions
Goals: Abstract complexity and standardize logging
Encouraging Reproducibility with Reusable Code
What items are automatically saved to the MLFlow run?
• Hyperparameters
• Relevant Metrics
• MLFlow model object
• Evaluation Metric Figure (Downloadable)
What other artifacts are saved to ADLS?
• Original Input Schemas before any indexing or feature prep
• Original Training and Test Datasets with just selected features
• String Indexes and Imputation Dictionaries (outside of pipeline models)
• Best Model Scores from both training and test data
Storage Account Scoped
Workspace Scoped
Applying Deep Neural Networks to Tabular Data at Scale
Key Distinctions of Deep Neural Networks
Multiclass Example
Learns over repeated passes called “epochs”
What extra things can we do to help us decide which model is the best?
• Use early stopping to minimize training time and combat overfitting
• Use callbacks to log values at the end of each epoch
• Test on smaller chunks of data and scale up as we learn more
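The first two bullets can be sketched without any deep learning library. The example below is a minimal illustration, assuming we already have one validation loss per epoch: it records each epoch (a stand-in for a logging callback) and stops once the loss has failed to improve for `patience` epochs. The function name and parameters are hypothetical, not part of the platform's helper functions.

```python
def train_with_early_stopping(epoch_losses, patience=2, min_delta=0.0):
    """Consume per-epoch validation losses and stop early when they stall.

    Returns (best_epoch, best_loss, history), where history holds the
    (epoch, loss) pairs that were actually run before stopping.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    history = []
    for epoch, loss in enumerate(epoch_losses):
        history.append((epoch, loss))  # stand-in for an end-of-epoch logging callback
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop: no improvement for `patience` epochs in a row
    return best_epoch, best_loss, history
```

With losses `[0.9, 0.7, 0.65, 0.66, 0.67, 0.5]` and `patience=2`, training stops after the fifth epoch and keeps the model from epoch index 2, so the later 0.5 is never reached. That is the trade-off early stopping makes between training time and the chance of a late improvement.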
Bayesian Hyperparameter Searching with Hyperopt
Attempts to minimize our loss function
Can set our hyperparameter space and the number of trials we want to run
Used a sample of our training data to go quickly over the 20 trials we chose to run
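The shape of such a search (a hyperparameter space, a fixed number of trials, and a loss to minimize) can be sketched in plain Python. Note that this sketch uses simple random sampling as a stand-in: it is not Hyperopt and does not implement its Bayesian search, which chooses each new trial based on earlier results. The space, the parameter names, and `toy_loss` are all hypothetical.

```python
import random


def sample_params(rng):
    """Draw one point from a hypothetical two-layer network search space."""
    return {
        "layer1_units": rng.choice([32, 64, 128, 256]),
        "layer2_units": rng.choice([16, 32, 64, 128]),
        "learning_rate": 10 ** rng.uniform(-4, -2),
    }


def toy_loss(params):
    """Placeholder objective; a real one would train on a data sample and return validation loss."""
    return (abs(params["layer1_units"] - 128) / 128
            + abs(params["layer2_units"] - 32) / 32)


def run_search(objective, n_trials=20, seed=0):
    """Evaluate n_trials sampled configurations and return the one with the lowest loss."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((objective(params), params))
    best_loss, best_params = min(trials, key=lambda trial: trial[0])
    return best_loss, best_params, trials
```

Calling `run_search(toy_loss, n_trials=20)` returns the best loss, its parameters, and the full trial list, which is the kind of record a comparison view needs.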
MLFlow has a Handy Comparison Tool to Help us Focus
Quick Insights: a complex Layer 1 paired with a complex Layer 2 doesn’t do well
A complex Layer 1 with a simpler Layer 2 does much better
Can highlight ranges to focus our attention
Let’s use MORE Data with Distributed Training!
• Driver Only: 1 MM members, 1 worker, 6 sec per epoch. Lots of trials to narrow down our choices
• Petastorm: 10 MM members, 1 worker, 63 sec per epoch. Using all the data, but takes forever
• Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch. Train on all the data much more quickly
We generally see a sqrt(n) speed up over a single worker
Using Petastorm and Horovod, we used all the data and trained 4.5x faster
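The 4.5x figure and the sqrt(n) rule of thumb can be checked against the per-epoch timings reported on this slide. The short calculation below uses only those numbers; the variable names are ours.

```python
import math

single_worker_sec = 63  # 10 MM members, 1 worker, seconds per epoch
multi_worker_sec = 14   # 10 MM members, 16 workers, seconds per epoch
workers = 16

observed_speedup = single_worker_sec / multi_worker_sec  # 63 / 14 = 4.5
sqrt_rule_speedup = math.sqrt(workers)                   # sqrt(16) = 4.0
parallel_efficiency = observed_speedup / workers         # share of ideal linear scaling

print(f"observed {observed_speedup:.1f}x, sqrt rule {sqrt_rule_speedup:.1f}x, "
      f"efficiency {parallel_efficiency:.0%}")
```

The observed speedup is slightly above what the sqrt(n) rule predicts for 16 workers, and well short of linear scaling.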
Walkthrough: Petastorm and Horovod Helper Functions
Goals: Save headaches and empower data scientists to train on all of the data quickly
We Improved the Precision of our Model!
We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes
• SparkML Logistic Regression: weighted F1 score = 0.615 (prw = 0.633, rcw = 0.609)
• TensorFlow NN on all the data: weighted F1 score = 0.615 (prw = 0.646, rcw = 0.602)
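The slide does not say how the weighted scores were computed. Assuming the usual support-weighted definitions (each class's precision, recall, and F1 is weighted by its share of the true labels), the sketch below shows the calculation from a confusion matrix. Under these definitions weighted F1 is an average of per-class F1 scores, not the harmonic mean of prw and rcw, which is one way two models can share a weighted F1 while differing in weighted precision and recall.

```python
def weighted_scores(confusion):
    """Return (weighted precision, weighted recall, weighted F1).

    confusion[i][j] is the number of rows whose true class is i and whose
    predicted class is j. Weights are each class's share of the true labels.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    prw = rcw = f1w = 0.0
    for k in range(n_classes):
        true_positives = confusion[k][k]
        support = sum(confusion[k])                              # rows truly in class k
        predicted = sum(confusion[i][k] for i in range(n_classes))  # rows predicted as k
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = support / total
        prw += weight * precision
        rcw += weight * recall
        f1w += weight * f1
    return prw, rcw, f1w
```

A useful sanity check is that weighted recall computed this way always equals overall accuracy.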
Register, Score, and Preserve the Model Before Deploying it to Production
Scoring with a Spark UDF from MLFlow
• This allows us to easily get the scores into a Spark dataframe from any MLFlow model
• Can repeat for other types of targets or our training DF
Registering the Model
Model Metadata (Screenshot from Models Tab in DB Workspace)
First registered in the Data Scientist’s dev DB workspace
The Data Scientist promotes it to “production” status in the dev workspace after review
The associated MLFlow run is also used to register it in our “production” workspace for automated jobs
This newly registered model is the official version used for automated scoring
The path within the ADLS storage account contains the version so we can support multiple versions at the same time
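One way to picture a version-bearing path is sketched below. The directory layout, the `v` prefix, and the example storage URL are hypothetical; the slides do not show the actual convention.

```python
import posixpath


def model_artifact_path(base_path, model_name, version):
    """Build a storage path with the model version as its own segment.

    Keeping the version in the path lets artifacts for several versions
    sit side by side without overwriting each other.
    """
    return posixpath.join(base_path, model_name, f"v{version}")


# Hypothetical container and account names, for illustration only.
base = "abfss://models@exampleaccount.dfs.core.windows.net/artifacts"
paths = [model_artifact_path(base, "ckd_stage", v) for v in (1, 2)]
```

Here `paths` holds two distinct locations, one per version, so a job pinned to version 1 keeps working after version 2 is registered.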
Production Deployment Pipeline – Notebook-based Workflow
Key Requirements
• Use Azure DevOps to deploy code to various environments for testing and execution
• Tie execution to specific package versions and LTS non-ML Databricks Runtimes
• Use ADF Parameters to provide flexibility to minimize YAML code duplication
Reusable Framework of 3 notebooks: Feature Engineering, Scoring, Validation
• Upstream Dependency Check to prevent flow of bad data and errors from missing data
• Logging via SQL Server to record both success and failure
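The two ideas here, checking upstream data before running and logging both outcomes, can be sketched as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server so the example is self-contained, and the table name, columns, and function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def make_log_db():
    """Create an in-memory log table (SQLite here as a stand-in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_log (logged_at TEXT, job TEXT, status TEXT, detail TEXT)"
    )
    return conn


def log_status(conn, job, status, detail=""):
    """Record one outcome row; called for failures as well as successes."""
    conn.execute(
        "INSERT INTO job_log (logged_at, job, status, detail) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), job, status, detail),
    )
    conn.commit()


def run_with_dependency_check(conn, job, upstream_row_counts, score_fn):
    """Run score_fn only if every upstream source has rows; log the result either way."""
    missing = sorted(name for name, rows in upstream_row_counts.items() if rows == 0)
    if missing:
        log_status(conn, job, "FAILED", "empty upstream: " + ", ".join(missing))
        return False
    try:
        score_fn()
    except Exception as exc:  # log any scoring error instead of losing it
        log_status(conn, job, "FAILED", repr(exc))
        return False
    log_status(conn, job, "SUCCEEDED")
    return True
```

Checking row counts is only the simplest possible dependency test; a fuller check might also compare refresh dates or schemas before letting the scoring step start.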
Partnership Between Data Scientists and AI Engineers is Pivotal
Each of the files required for deployment is part of the starter repo, which helps the data scientist keep the end goal in view from the beginning
Each model is initially reviewed and subsequently monitored for AI Bias in key areas
All models are peer reviewed for both domain and technical accuracy prior to production deployment
Early Wins for the Platform
Key Early Wins – big steps forward
Scaling and automating clunky processes
• Scaled from fewer than 40 condition flags on-premises to more than 3x that number in the cloud
• Got contributions from multiple teams following templates
• Now updates over 1 bn rows daily in 1.5 hours for entire member population
Faster prep, more iterations, better tuning and collaboration
• Reduced the feature engineering step on a very large source from hours to a few minutes
• Enabled DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models
• Reduced scoring step on prospective members from a week to 30 minutes
Shared resources accelerate everyone
• Hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches
• Flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

FlorenceAI: Reinventing Data Science at Humana

  • 1. FlorenceAI: Reinventing Data Science at Humana. David Mack, PhD, Cognitive/Machine Learning Principal, AI Engineering, Digital Health and Analytics. A more human way to healthcare™
  • 2. David Mack, PhD – Cognitive/Machine Learning Principal. I have worked at Humana for 5½ years in clinical and enterprise data science. For the past 2 years I have been one of the primary architects and maintainers of Humana’s ML Platform, which now serves hundreds of data scientists. I love to tinker with homemade IoT devices, build cool stuff, and learn new things!
      Humana’s bold goal is to address the needs of the whole person:
        • Focused on community partnerships and social determinants of health
        • Committed to helping our millions of members achieve their best health
        • Fortune 50 company with $77.2bn consolidated revenue in 2020
      Humana has invested significant resources into fighting:
        • COVID-19 pandemic
        • Food insecurity
        • Loneliness and social isolation
        • Inequities in healthcare
      Formed the Digital Health and Analytics organization in 2018: through advanced analytics, experiential design, data, and technology, we are working to meet our associates, members, and the communities we serve anytime, anywhere, anyhow.
  • 3. What exactly is FlorenceAI*? A cloud platform for automating and accelerating the delivery lifecycle of data science solutions at scale in Azure.
      Key foundational pillars:
        • Feature stores
        • Starter code frameworks
        • Notebook-based workflow
        • Prod deployment partnership
        • Extensive training curriculum
      End-to-end ecosystem benefits:
        • Empowers data scientists to solve complex problems
        • Promotes access to open-source innovation
        • Simplifies model consumption with a single interface
        • Transforms workflows to improve performance
      Diagram labels: Microsoft Azure Cloud; Foundational Components; Other Key Tools.
      * Patent Pending
  • 4. Feature Stores – Quality Ingredients for ML Algorithms. Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years.
      Extensive metadata:
        • Standard descriptions
        • Centralized ref tables
        • Ratings to identify any quality impacts
        • Enables discovery and exploration
      Economies of scale:
        • Pre-computed for the entire population
        • Refreshed regularly at different cadences
        • Production ready and pre-validated
      Flexible but specific:
        • Designed to cover most use cases
        • Domain expertise in feature design
        • Self-service for custom situations
  • 5. End-to-End Process. Cohort Design; Initial Feature Selection; Model Training Experiments; Score and Register Best Model; Record Training Artifacts; Scoring Code and Testing; Promote Model and Automate Scoring.
  • 6. Example Problem to Help Trace the Workflow. Goal: predict the most severe stage of Chronic Kidney Disease (CKD) in the next 6 months.
      Criteria to define the cohort:
        • Age ≥ 65, Medicare Advantage
        • Evidence of CKD stage in medical claims or lab results
        • Continuous enrollment
      Timeline labels: fixed calendar date; 12 months of history; over 11 months of enrollment; 6 months looking forward.
      All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security.
  • 7. Initial Feature Selection and Traditional Model Training
  • 8. Walkthrough: Initial Feature Selection Notebook. Goal: identify hundreds of important features among tens of thousands.
  • 9. First Round of Model Experimentation using SparkML. A helper function to execute the run is available in the shared “experiment utility”.
  • 10. Arrive at a “Best Model” using SparkML. A different helper function saves the best model and provides more details. Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early.
  • 11. Walkthrough: SparkML Helper Functions. Goals: abstract complexity and standardize logging.
  • 12. Encouraging Reproducibility with Reusable Code.
      What items are automatically saved to the MLFlow run?
        • Hyperparameters
        • Relevant metrics
        • MLFlow model object
        • Evaluation metric figure (downloadable)
      What other artifacts are saved to ADLS?
        • Original input schemas before any indexing or feature prep
        • Original training and test datasets with just the selected features
        • String indexes and imputation dictionaries (outside of pipeline models)
        • Best model scores from both training and test data
      Diagram labels: Storage Account Scoped; Workspace Scoped.
  • 13. Applying Deep Neural Networks to Tabular Data at Scale
  • 14. Key Distinctions of Deep Neural Networks (multiclass example). A deep neural network learns over repeated passes called “epochs”.
      What extra things can we do to help us decide which model is the best?
        • Use early stopping to minimize training time and combat overfitting
        • Use callbacks to log values at the end of each epoch
        • Test on smaller chunks of data and scale up as we learn more
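
The early-stopping and per-epoch logging ideas on slide 14 can be sketched in plain Python. This is only an illustration, not the code shown in the talk: the function name `run_with_early_stopping`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def run_with_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Walk through per-epoch validation losses and stop once they stall.

    Returns (best_epoch, last_epoch, log), where log holds one
    (epoch, loss) entry per epoch that actually ran.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0
    log = []  # stands in for a callback that records values each epoch

    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        log.append((epoch, loss))
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop early; later epochs are never run
    return best_epoch, last_epoch, log


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
best_epoch, last_epoch, log = run_with_early_stopping(losses, patience=3)
```

With these made-up losses the loop stops before reaching the final value, which shows the trade-off: training time is saved, but a late improvement can be missed if `patience` is set too low.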
  • 15. Bayesian Hyperparameter Searching with Hyperopt. Hyperopt attempts to minimize our loss function. We can set our hyperparameter space and the number of trials we want to run. We used a sample of our training data to go quickly over the 20 trials we chose to run.
  • 16. MLFlow has a Handy Comparison Tool to Help us Focus. Quick insights: a complex Layer 1 with a complex Layer 2 doesn’t do well; a complex Layer 1 with a simpler Layer 2 does much better. We can highlight ranges to focus our attention.
  • 17. Let’s use MORE Data with Distributed Training!
        • Driver only: 1 MM members, 1 worker, 6 sec per epoch. Lots of trials to narrow down our choices.
        • Petastorm: 10 MM members, 1 worker, 63 sec per epoch. Using all the data, but takes forever.
        • Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch. Train on all the data much more quickly.
      We generally see a sqrt(n) speedup over a single worker. Using Petastorm and Horovod, we used all the data and trained 4.5x faster.
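
The arithmetic behind the 4.5x figure on slide 17 can be checked directly from the per-epoch timings. The sketch below uses only the numbers on the slide; the helper name `speedup` and the derived per-worker efficiency are additions for illustration.

```python
import math


def speedup(single_worker_sec, distributed_sec):
    """Ratio of single-worker epoch time to distributed epoch time."""
    return single_worker_sec / distributed_sec


workers = 16
observed = speedup(63, 14)          # 63 s vs 14 s per epoch -> 4.5
rule_of_thumb = math.sqrt(workers)  # sqrt(16) -> 4.0
efficiency = observed / workers     # fraction of ideal linear scaling
```

The observed 4.5x is slightly better than the sqrt(n) rule of thumb of 4x for 16 workers, while remaining well short of linear scaling.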
  • 18. Walkthrough: Petastorm and Horovod Helper Functions. Goals: save headaches and empower data scientists to train on all of the data quickly.
  • 19. We Improved the Precision of our Model! We don’t see as much over-prediction of the majority class, and we see better precision in the mid-range classes.
        • SparkML Logistic Regression: weighted f1 score = 0.615 (prw = 0.633, rcw = 0.609)
        • Tensorflow NN on all the data: weighted f1 score = 0.615 (prw = 0.646, rcw = 0.602)
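
For readers unfamiliar with the metrics on slide 19, here is a minimal sketch of support-weighted precision, recall, and F1 computed from a multiclass confusion matrix. It assumes that `prw` and `rcw` on the slide denote weighted precision and weighted recall; the function name and the example matrix are hypothetical and are not Humana data.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, and F1.

    confusion[i][j] is the number of rows whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    prw = rcw = f1w = 0.0
    for k in range(n_classes):
        true_pos = confusion[k][k]
        support = sum(confusion[k])                                 # true class k
        predicted = sum(confusion[i][k] for i in range(n_classes))  # predicted k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total  # each class counts by its share of rows
        prw += weight * precision
        rcw += weight * recall
        f1w += weight * f1
    return prw, rcw, f1w


# Hypothetical two-class example.
prw, rcw, f1w = weighted_metrics([[8, 2], [1, 9]])
```

Because the weighted F1 averages per-class F1 scores, it is not simply the harmonic mean of `prw` and `rcw`, which is why two models can share the same weighted F1 while differing in weighted precision and recall.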
  • 20. Register, Score, and Preserve the Model Before Deploying it to Production
  • 21. Scoring with a Spark UDF from MLFlow.
        • This allows us to easily get the scores into a Spark dataframe from any MLFlow model
        • Can repeat for other types of targets or our training DF
  • 22. Registering the Model. Model metadata (screenshot from the Models tab in the DB workspace).
        • The model is first registered in the data scientist’s dev DB workspace
        • The data scientist promotes it to “production” status in the dev workspace after review
        • The associated MLFlow run is also used to register it in our “production” workspace for automated jobs
        • This newly registered model is the official version used for automated scoring
        • The path within the ADLS storage account contains the version, so we can support multiple versions at the same time
  • 23. Production Deployment Pipeline – Notebook-based Workflow.
      Key requirements:
        • Use Azure DevOps to deploy code to various environments for testing and execution
        • Tie execution to specific package versions and LTS non-ML Databricks Runtimes
        • Use ADF parameters to provide flexibility and minimize YAML code duplication
      Reusable framework of 3 notebooks: Feature Engineering, Scoring, Validation.
      Upstream dependency check to prevent the flow of bad data and errors from missing data.
      Logging via SQL Server to record both success and failure.
  • 24. Partnership Between Data Scientists and AI Engineers is Pivotal.
        • Each of the files required for deployment is part of the starter repo, which helps the data scientist keep the end goal in view from the beginning
        • Each model is initially reviewed and subsequently monitored for AI bias in key areas
        • All models are peer reviewed for both domain and technical accuracy prior to production deployment
  • 25. Early Wins for the Platform
  • 26. Key Early Wins – big steps forward.
      Scaling and automating clunky processes:
        • Scaled from less than 40 condition flags on-premise to over 3x this in the cloud
        • Got contributions from multiple teams following templates
        • Now updates over 1 bn rows daily in 1.5 hours for the entire member population
      Faster prep, more iterations, better tuning and collaboration:
        • Reduced the feature engineering step on a very large source from hours to a few minutes
        • Enabled the DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models
        • Reduced the scoring step on prospective members from a week to 30 minutes
      Shared resources accelerate everyone:
        • Hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches
        • Flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format
  • 27. A more human way to healthcare™
  • 28. Feedback. Your feedback is important to us. Don’t forget to rate and review the sessions.