ML Explainability in Michelangelo
Eric Wang
Michelangelo - Uber ML Platform Team
Contents
01 Challenges and Needs
02 Importance of ML explainability
03 Explainers
04 Architecture
05 User workflow and case studies
06 Future opportunities and Q&A
Challenges and Needs
Needs
- Understand and interpret how models make decisions.
- Provide transparency and understanding for ML practitioners
and stakeholders
- Model exploration and understanding
- Efficiency of their features
Why this is important
● Uber operates ~1000 ML pipelines
● No feature importance insight is offered for DL models
● Exploring features by training new models is time consuming
(efficiency/resource cost)
Challenges and Needs
Michelangelo provided visual interfaces for:
- Model performance
- Feature null rate monitoring
Importance of Model Explainability
[Figure: model_1 and model_2 compared by score. Does a better score mean a better model?]
Summary stats (AUC, MAE…) are informative, but not instructive for debugging
Questions:
1. When some features drift or change in quality, are they important
enough to matter?
2. Why do two models perform differently, and which features
contribute more to the outcome?
3. How can explanations be provided for operations and legal teams?
User’s Request
Making models more transparent and interpretable
“Need to implement Explainable AI (XAI) for their Keras model to provide
clear explanations for model decisions. Users are investigating replacing a
formulaic model with the DNN model. The formulaic model is more
interpretable, so the team is looking for the DNN model to be explained by
roughly the same features.”
“Need to provide explanations for business owners regarding how a feed is
promoted. This also involves understanding how the decision is made, for
legal and marketing teams.”
“Need to provide explanations for a DL model in online prediction, following
the same process as the existing XGBoost model. This requires a solution
that integrates into training, so that baselines are available to the
explainer in real time.”
Importance of Model Explainability
Model debugging in Michelangelo: to make the 80% effort more efficient and effective.
Explainability in model debugging: transparency and trust, feature importance
analysis, comparison
ML is a widely used technology for Uber’s business
However, developing successful models is a long and non-trivial process
80/20 rule in machine learning: 20% of the effort goes into building the initial
working model, 80% into improving its performance to the ideal level.
From https://cornellius.substack.com/p/pareto-principle-in-machine-learning
Explanation methods
TreeShap
Interactive tree ensemble model visualizer on frontend
Data source: any serialized tree model (Spark GBDT, XGBoost, ...)
KernelShap
The good
Model Agnostic
Local explanation support
Captures Feature Interactions
Comprehensive Explanations
The bad
Computational Complexity
Scalability Issues
Independence Assumption
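To make the trade-off above concrete, here is a minimal sketch of exact Shapley values computed by brute force. It is not the KernelShap implementation (KernelShap approximates these values by sampling coalitions and fitting a weighted linear regression). The toy model, the zero baseline, and the function names are hypothetical. The sketch only needs to call the model (model agnostic), the number of coalitions doubles with every added feature (computational complexity), and absent features are swapped for baseline values independently of the others (independence assumption).

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are replaced by their baseline value,
    which treats features as independent of each other.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values

# Hypothetical toy model with an interaction between features 1 and 2.
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # the interaction term is split between features 1 and 2
```

With n features this loop evaluates the model on the order of n * 2^(n-1) coalition pairs, which is why sampling-based approximations are used in practice.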
Integrated gradients
Why?
● Gradient based, with comparison against baselines
● A popular interpretability technique applicable to any
differentiable model (e.g. images, text,
structured data)
● Scalable with large computation needs
● Many machine learning libraries
(TensorFlow, PyTorch) provide
implementations of IG.
Integrated gradients
Benefits
● Completeness
● Interpretability
● Feature dependency agnostic
● Efficiency
[Figure labels: feature values, predicted score, average prediction]
Effect of features on prediction:
0 + 0.17 + 0.06 - 0.06 - 0.07 - 0.08 + 0.09 - 0.1 - 0.1 - 0.13 - 0.36 = -0.58 ≈ -0.6
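The sum above illustrates completeness: the per-feature effects add up to the gap between the prediction and a reference value. A minimal NumPy sketch of integrated gradients shows the same property. It assumes a hypothetical logistic model with a hand-written gradient and a zero baseline; the names `integrated_gradients`, `model`, and `model_grad` are illustrative and are not Michelangelo APIs.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output w.r.t. its input.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # (steps, n_features)
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Hypothetical differentiable model: a logistic score over three features.
w = np.array([1.5, -2.0, 0.5])

def model(v):
    return 1.0 / (1.0 + np.exp(-np.dot(w, v)))

def model_grad(v):
    s = model(v)
    return s * (1.0 - s) * w

x = np.array([0.8, 0.3, 1.2])
baseline = np.zeros(3)
attributions = integrated_gradients(model_grad, x, baseline)
gap = model(x) - model(baseline)

# Completeness: attributions sum (approximately) to the prediction gap.
print(attributions, attributions.sum(), gap)
```

In a real setup the gradient would come from the framework's autodiff rather than a hand-written function, and the choice of baseline matters for how the attributions read.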
Integrated gradients
Notes
● Flatten Input features
● Choose the right layers (especially with categorical features)
● Use model wrapper to aggregate all outputs if possible
Using integrated gradients
Model, model and model
Model packaging for serving
[Diagram labels: Explainer; Basis feature set; Feature joins; Aggregated feature set; Feature transformation; Prediction (serving model); Decision threshold; Post processing]
Not a raw model!
Using integrated gradients
Save DL model separately
[Diagram labels: Train; Explainer; Raw model; Serving model; Deploy to endpoint]
[keras.model, torch.nn.model, lightning…]
[torchscript, tf.compat.v1…]
Using integrated gradients
Flatten input features
- Entity - an Uber business entity (ex: city, rider, driver, store)
- Feature Group - a feature group for a given entity maps to a Hive
table and has features that are related and convenient to
compute together
[Diagram: an Entity contains Feature Groups 1-3, which contain Features 1-3; label: Importance level]
Using integrated gradients
Flatten input features
[Diagram: Entity, Feature Groups 1-3, and Features 1-4 as input to the model, labeled Vectorized, Bucketized, and Vectorized; label: Importance level]
Using integrated gradients
Flatten input features
[Diagram: Features 1-4 as input to the model, labeled Vectorized, Bucketized, and Vectorized]
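One way to picture the flattening step is the sketch below. It assumes each row arrives as a dict of scalar, bucketized (one-hot), and vectorized features; the feature names, values, and the `flatten_features` helper are hypothetical and are not the Michelangelo implementation. The helper records which flat columns belong to which feature, so that gradients computed on the flat input can be mapped back to the feature level later.

```python
import numpy as np

def flatten_features(row):
    """Flatten a dict of scalar / vector features into one 1-D array.

    Returns the flat array plus a map from feature name to the slice of
    columns it occupies in the flat array.
    """
    parts, slices, start = [], {}, 0
    for name, value in row.items():
        arr = np.atleast_1d(np.asarray(value, dtype=float)).ravel()
        slices[name] = slice(start, start + arr.size)
        parts.append(arr)
        start += arr.size
    return np.concatenate(parts), slices

# Hypothetical row with mixed feature shapes.
row = {
    "feature_1": 0.7,                    # scalar
    "feature_2": [0.0, 1.0, 0.0],        # bucketized (one-hot)
    "feature_3": [0.12, -0.40],          # vectorized
    "feature_4": [1.0, 2.0, 3.0, 4.0],   # vectorized
}
flat, slices = flatten_features(row)
print(flat.shape, slices)
```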
Using integrated gradients
Choose the right layers
● Support both PyTorch
and Keras
● Support gradients on
input or output
● Ideally, pick the layer
for categorical features
[Diagram label: Explainer]
Using integrated gradients
Use model wrapper to aggregate all outputs if possible
Model prediction pipelines
[Diagram labels: Basis feature set; Feature joins; Aggregated feature set; Feature transformation; Prediction; Decision threshold; Post processing]
Using integrated gradients
Use model wrapper to aggregate all outputs if possible
Model prediction pipelines
[Diagram labels: Basis feature set; Feature joins; Aggregated feature set; Feature transformation; Prediction; Decision threshold; Post processing; Calibrated]
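A model wrapper of this kind could look roughly like the sketch below. It assumes a raw model that returns several named outputs, and it reduces them to one calibrated scalar per row so an explainer has a single value to attribute. The class name, the output keys, and the calibration coefficients are all hypothetical.

```python
import numpy as np

class ExplainableWrapper:
    """Wraps a raw model so explainers see one scalar score per row."""

    def __init__(self, raw_model, output_key, calibrate=None):
        self.raw_model = raw_model       # callable: (n, d) array -> dict of arrays
        self.output_key = output_key     # which output head to explain
        self.calibrate = calibrate or (lambda s: s)

    def __call__(self, features):
        features = np.atleast_2d(np.asarray(features, dtype=float))
        outputs = self.raw_model(features)
        scores = np.asarray(outputs[self.output_key], dtype=float)
        return self.calibrate(scores.reshape(len(features)))

# Hypothetical raw model with two output heads.
def raw_model(batch):
    logits = batch @ np.array([0.5, -1.0])
    return {"logit": logits, "aux": np.zeros(len(batch))}

# Hypothetical calibration: sigmoid scaling with made-up coefficients.
def calibrate(scores, a=1.2, b=-0.3):
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))

wrapped = ExplainableWrapper(raw_model, "logit", calibrate=calibrate)
out = wrapped([[1.0, 0.0], [0.0, 1.0]])
print(out)
```

Explaining the calibrated score rather than the raw logit keeps the attributions in the same units as the value the decision threshold is applied to.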
ML explainer in Michelangelo
Notebooks
1. Model debugging
2. Feature importance comparison
3. Visualization
Enabled for users
1. Different explainers (IG, TreeShap, KernelShap, etc.)
2. Data conversion among different formats
3. Plotting
4. Model wrapper for calibration
Visualize using Shapley values
● Backed by intuitive notions of what a good explanation is
● Allows for both local and global reasoning, and is model agnostic
● Widely adopted by popular explanation techniques
[Plot: Shapley values for Feature_0 through Feature_6]
ML explainer in Michelangelo
Generate feature importance in training pipeline
Model training pipelines
[Diagram labels: Basis feature set; Feature joins; Aggregated feature set; Feature transformation; Trainer; Explainer; Packaging]
ML explainer in Michelangelo
Monitoring Pipelines
1. Generate feature importance at training time
2. Different thresholds based on importance
3. Reduce noise from feature quality (null rate) monitoring
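As an illustration of importance-based thresholds, the sketch below alerts on a feature's null rate only when it exceeds a threshold that tightens as the feature's importance grows. The linear interpolation, the 1% and 20% bounds, and all names are assumptions made for this example and do not describe Michelangelo's actual monitoring rules.

```python
def features_to_alert(importances, null_rates, strict=0.01, relaxed=0.20):
    """Return features whose null rate exceeds an importance-scaled threshold.

    The most important feature gets the strict threshold; a feature with
    zero importance gets the relaxed one (linear interpolation in between).
    """
    top = max(abs(v) for v in importances.values()) or 1.0
    alerts = []
    for name, rate in null_rates.items():
        weight = abs(importances.get(name, 0.0)) / top
        threshold = relaxed - weight * (relaxed - strict)
        if rate > threshold:
            alerts.append(name)
    return alerts

# Hypothetical importances from the training pipeline and observed null rates.
importances = {"feature_0": 0.40, "feature_1": 0.05, "feature_2": 0.0}
null_rates = {"feature_0": 0.05, "feature_1": 0.05, "feature_2": 0.05}
alerts = features_to_alert(importances, null_rates)
print(alerts)
```

The same 5% null rate triggers an alert only for the high-importance feature, which is the noise reduction the list above describes.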
Case Studies
Case 1. Identifying Useful Features
[Figure: suburb vs. non-suburb comparison]
Scenario
- A team at Uber is evaluating why the order
conversion rate is very different
between suburb and non-suburb areas.
- Adding new features did not change the
model’s overall performance.
- Which features differ most in effect between
the datasets?
Method: compare different datasets on the
same model
Findings: the location feature is more
important than engagement features such as
historical orders in the non-suburb dataset
Conclusion: zoom in on the location
feature to make it a bit more accurate.
A smaller hexagon size helps.
Case Studies
Case 2. Identifying false positive/negative
Scenario
A team wants to see which photos the model
predicted incorrectly, since a wrong prediction
has a cost associated with the action taken.
Method:
● generate importance for all low
prediction score features
● generate our features by calling
external label/object detection models
Findings: some object names were not categorized
properly
Conclusion:
- Created one-hot encoded features from
dropped objects
- Created features that check whether the string
contains certain words
Architecture
XAI framework
Components:
1. Data processing
a. Converting from PySpark to NumPy
b. Feature flattening for calculating gradients
2. Explainer
a. Support multiple explainers (TreeShap/KernelShap/IG…)
3. Model wrapper
a. Support different model caller or forward
functions (keyword based or array based)
b. Support calibration and aggregation, or a specific output
layer
4. Importance aggregation
a. Aggregate importance from multiple dimensions
b. Feature mapping from output to input
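The importance aggregation component can be sketched as follows. The sketch assumes attributions were computed on flattened columns, with a map from each feature to its column slice. Summing within a feature and then taking the mean absolute value across rows is one common choice and is an assumption here, as are the helper name and the sample numbers.

```python
import numpy as np

def aggregate_importance(attributions, slices):
    """Aggregate per-column attributions into per-feature global importance.

    attributions: (n_rows, n_columns) array of local attributions.
    slices: feature name -> slice of flattened columns for that feature.
    """
    attributions = np.asarray(attributions, dtype=float)
    per_feature = {}
    for name, cols in slices.items():
        local = attributions[:, cols].sum(axis=1)            # per-row, per-feature
        per_feature[name] = float(np.mean(np.abs(local)))    # global importance
    return per_feature

# Hypothetical mapping and attributions for two rows.
slices = {"feature_1": slice(0, 1), "feature_2": slice(1, 4)}
attributions = np.array([
    [0.2, 0.1, -0.3, 0.0],
    [-0.4, 0.0, 0.5, 0.1],
])
importance = aggregate_importance(attributions, slices)
print(importance)
```

Summing the columns of one feature before taking absolute values keeps the per-row totals consistent with the completeness property of the underlying attributions.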
Future opportunities
● Support LLM explanation in prompt engineering
● Feature selection assistant
● Interactive visualization tools
Q&A
AI/ML Infra Meetup | ML explainability in Michelangelo

More Related Content

Similar to AI/ML Infra Meetup | ML explainability in Michelangelo

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
UXDXConf
 
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Prasanna Hegde
 
IntelligentEnterprise
IntelligentEnterpriseIntelligentEnterprise
IntelligentEnterprise
Barry Grushkin 9,600 +
 
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will loveScaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
June Andrews
 
Super applied in a sitecore migration project
Super applied in a sitecore migration projectSuper applied in a sitecore migration project
Super applied in a sitecore migration project
dodoshelu
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
Databricks
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Benjamin Bengfort
 
Modeling Search Computing Applications
Modeling Search Computing ApplicationsModeling Search Computing Applications
Modeling Search Computing Applications
Marco Brambilla
 
C3 w5
C3 w5C3 w5
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
jaffarbikat
 
School of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docxSchool of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docx
anhlodge
 
A Review of Feature Model Position in the Software Product Line and Its Extra...
A Review of Feature Model Position in the Software Product Line and Its Extra...A Review of Feature Model Position in the Software Product Line and Its Extra...
A Review of Feature Model Position in the Software Product Line and Its Extra...
CSCJournals
 
IRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep LearningIRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep Learning
IRJET Journal
 
Google machine learning engineer exam dumps 2022
Google machine learning engineer exam dumps 2022Google machine learning engineer exam dumps 2022
Google machine learning engineer exam dumps 2022
SkillCertProExams
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Jim Dowling
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Boost your App with Gatling
Boost your App with GatlingBoost your App with Gatling
Boost your App with Gatling
Knoldus Inc.
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Edunomica
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
Aditya Bhattacharya
 

Similar to AI/ML Infra Meetup | ML explainability in Michelangelo (20)

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
 
IntelligentEnterprise
IntelligentEnterpriseIntelligentEnterprise
IntelligentEnterprise
 
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will loveScaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
 
Super applied in a sitecore migration project
Super applied in a sitecore migration projectSuper applied in a sitecore migration project
Super applied in a sitecore migration project
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
Modeling Search Computing Applications
Modeling Search Computing ApplicationsModeling Search Computing Applications
Modeling Search Computing Applications
 
C3 w5
C3 w5C3 w5
C3 w5
 
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
 
School of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docxSchool of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docx
 
A Review of Feature Model Position in the Software Product Line and Its Extra...
A Review of Feature Model Position in the Software Product Line and Its Extra...A Review of Feature Model Position in the Software Product Line and Its Extra...
A Review of Feature Model Position in the Software Product Line and Its Extra...
 
IRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep LearningIRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep Learning
 
Google machine learning engineer exam dumps 2022
Google machine learning engineer exam dumps 2022Google machine learning engineer exam dumps 2022
Google machine learning engineer exam dumps 2022
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Boost your App with Gatling
Boost your App with GatlingBoost your App with Gatling
Boost your App with Gatling
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
 

More from Alluxio, Inc.

Alluxio Webinar | 10x Faster Trino Queries on Your Data Platform
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio Webinar | 10x Faster Trino Queries on Your Data Platform
Alluxio Webinar | 10x Faster Trino Queries on Your Data Platform
Alluxio, Inc.
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
Alluxio, Inc.
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
Alluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
Alluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
Alluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Alluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
Alluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
Alluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Webinar | 10x Faster Trino Queries on Your Data Platform
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio Webinar | 10x Faster Trino Queries on Your Data Platform
Alluxio Webinar | 10x Faster Trino Queries on Your Data Platform
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 

Recently uploaded

一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
dakas1
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
AI/ML Infra Meetup | ML explainability in Michelangelo

  • 1. ML Explainability in Michelangelo Eric Wang Michelangelo - Uber ML Platform Team
  • 2. 01 Challenges and Needs 02 Importance of ML explainability 03 Explainers 04 Architecture 05 User workflow and case studies 06 Future opportunities and Q&A Contents
  • 3. Challenges and Needs Needs - Understand and interpret how models make decisions. - Provide transparency and understanding for ML practitioners and stakeholders - Model exploration and understanding - Efficiency of their features Why this is important ● Uber operates ~1000 ML pipelines ● No offer of feature importance insight for DL models ● Time consuming to explore features by training new models (Efficiency/Resource)
  • 4. Challenges and Needs Model performance Feature null rate monitoring Michelangelo provided visual interfaces for:
  • 5. Importance of Model Explainability. Comparing model_1 and model_2 by score alone: does a better score mean a better model? Summary stats (AUC, MAE, …) are informative, but not instructive for debugging. Questions: 1. Some features drift or change in quality; are they important enough to matter? 2. Why do two models perform differently, and which features drive the outcome more? 3. How can explanations be provided for operations and legal teams?
  • 6. Users' Requests: making models more transparent and interpretable. Request 1: a team needs to implement Explainable AI (XAI) for their Keras model to provide clear explanations for model decisions. The users are investigating replacing a formulaic model with a DNN model. The formulaic model is more interpretable, so the team wants the DNN model to be explained by roughly the same features. Request 2: a team needs to explain to business owners how a feed is promoted, which also involves helping legal and marketing teams understand how the decision is made. Request 3: a team needs explanations for a DL model in online prediction, following the same process as for the existing XGBoost model. This requires a solution that integrates with training, so that baselines are available to the explainer in real time.
  • 7. Importance of Model Explainability. ML is a widely used technology for Uber's business. However, developing successful models is a long and non-trivial process. 80/20 rule in machine learning: 20% of the effort goes into building the initial working model, and 80% into improving its performance to the ideal level (from https://cornellius.substack.com/p/pareto-principle-in-machine-learning). Model debugging in Michelangelo: make the 80% effort more efficient and effective. Explainability in model debugging: transparency and trust, feature importance analysis, comparison.
  • 9. TreeShap Interactive tree ensemble model visualizer on frontend Data source: any serialized tree model (Spark GBDT, XGBoost, ...)
  • 10. KernelShap. The good: model agnostic; local explanation support; captures feature interactions; comprehensive explanations. The bad: computational complexity; scalability issues; independence assumption.
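The computational-complexity point can be made concrete with a minimal sketch of exact Shapley values, the quantity KernelShap approximates by sampling coalitions. Everything here is an illustrative assumption: the toy `model`, the input `X`, and the all-zeros `BASELINE` are hypothetical, and filling absent features from a single baseline row is a simplification that treats features as independent.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (2^n model calls)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy model with an interaction between features 1 and 2.
def model(x):
    return 2.0 * x[0] + x[1] * x[2]

X = [1.0, 2.0, 3.0]          # example to explain (made up)
BASELINE = [0.0, 0.0, 0.0]   # reference input (made up)

def value_fn(coalition):
    # Features in the coalition take their real value, the rest the baseline.
    mixed = [X[j] if j in coalition else BASELINE[j] for j in range(len(X))]
    return model(mixed)

phi = shapley_values(value_fn, len(X))
print(phi)
```

The interaction term `x[1] * x[2]` is split evenly between the two features involved, and the values sum to `model(X) - model(BASELINE)`. The nested loops visit every subset of the other features, which is why the exact computation stops being practical beyond a few dozen features.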
  • 11. Integrated gradients. Why? ● Gradient based, with comparison against baselines ● A popular interpretability technique applicable to any differentiable model (e.g. for images, text, structured data) ● Scalable to large computation needs ● Many machine learning libraries (TensorFlow, PyTorch) provide implementations of IG.
  • 12. Integrated gradients. Benefits ● Completeness ● Interpretability ● Feature dependency agnostic ● Efficiency. Figure: feature values, predicted score, and average prediction. Effect of features on prediction: 0 + 0.17 + 0.06 - 0.06 - 0.07 - 0.08 + 0.09 - 0.1 - 0.1 - 0.13 - 0.36 = -0.58 ≈ -0.6
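The completeness property listed above (attributions add up to the difference between the prediction and the baseline prediction) can be checked numerically. The following is a minimal sketch, not Michelangelo's implementation: the logistic `predict` function, its `WEIGHTS` and `BIAS`, and the all-zeros baseline are hypothetical, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy model: logistic regression with fixed, made-up weights.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = 0.1

def predict(x):
    return sigmoid(sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS)

def gradient(x):
    # Analytic gradient of the sigmoid output with respect to each input.
    p = predict(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Average the gradient along the straight path from baseline to x,
    then scale by (x - baseline) for each feature."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, total)]

x = [1.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness: attributions should sum to predict(x) - predict(baseline).
gap = abs(sum(attributions) - (predict(x) - predict(baseline)))
print(attributions, gap)
```

With enough integration steps the gap shrinks toward zero, which is what lets per-feature effects be read as an additive breakdown of the score, as in the sum shown on the slide.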
  • 13. Integrated gradients Notes ● Flatten Input features ● Choose the right layers (especially with categorical features) ● Use model wrapper to aggregate all outputs if possible
  • 14. Using integrated gradients: model, model and model. Diagram (model packaging for serving): explainer, basis feature set, feature joins, prediction (serving model), aggregated feature set, feature transformation, decision threshold, post processing. Note: the serving model is not a raw model!
  • 15. Using integrated gradients: save the DL model separately. Diagram: explainer, train, serving model, raw model, deploy to endpoint. Formats shown: [keras.model, torch.nn.model, lightning…] [torchscript, tf.compat.v1…]
  • 16. Using integrated gradients: flatten input features. - Entity: an Uber business entity such as city, rider, driver, or store. - Feature Group: a feature group for a given entity maps to a Hive table and has features that are related and convenient to compute together. Diagram: Entity; Feature Group 1, Feature Group 2, Feature Group 3; Feature 1, Feature 2, Feature 3.
  • 17. Using integrated gradients: flatten input features. Diagram: Entity; Feature Group 1, Feature Group 2, Feature Group 3; Feature 1, Feature 2, Feature 3, Feature 4 (importance level); vectorized, bucketized, vectorized; input to model.
  • 18. Using integrated gradients: flatten input features. Diagram: Feature 1, Feature 2, Feature 3, Feature 4 (importance level); vectorized, bucketized, vectorized; input to model.
  • 19. Using integrated gradients: choose the right layers ● Support for both PyTorch and Keras ● Support for gradients on input or output ● Ideally, pick the layer explicitly for categorical features
  • 20. Using integrated gradients: use a model wrapper to aggregate all outputs if possible. Diagram (model prediction pipelines): explainer, basis feature set, feature joins, prediction, aggregated feature set, feature transformation, decision threshold, post processing.
  • 21. Using integrated gradients: use a model wrapper to aggregate all outputs if possible. Diagram (model prediction pipelines): basis feature set, feature joins, prediction, aggregated feature set, feature transformation, decision threshold, post processing, calibrated.
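A model wrapper of the kind described above can be sketched as a small adapter that lets an explainer call the model with a flat array and receive the final, calibrated score. This is an illustrative sketch only: `ModelWrapper`, `raw_model`, the feature names, and the sigmoid used as a calibration step are all hypothetical stand-ins, not Michelangelo APIs.

```python
import math

class ModelWrapper:
    """Expose a keyword-based model as an array-based callable for an explainer,
    optionally applying a calibration step to the raw score."""

    def __init__(self, model_fn, feature_names, calibrate=None):
        self.model_fn = model_fn
        self.feature_names = list(feature_names)
        # Default to the identity so the wrapper also works without calibration.
        self.calibrate = calibrate or (lambda score: score)

    def __call__(self, row):
        # Map the flat input row back to the keyword arguments the model expects.
        kwargs = dict(zip(self.feature_names, row))
        return self.calibrate(self.model_fn(**kwargs))

# Hypothetical raw model that takes named features.
def raw_model(distance, hour):
    return 0.5 * distance - 0.25 * hour

# Hypothetical calibration: squash the raw score into (0, 1).
def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

uncalibrated = ModelWrapper(raw_model, ["distance", "hour"])
wrapped = ModelWrapper(raw_model, ["distance", "hour"], calibrate=sigmoid)

print(uncalibrated([2.0, 4.0]), wrapped([2.0, 4.0]))
```

Explaining `wrapped` rather than `raw_model` means attributions refer to the score that downstream consumers actually see, which is one reason to fold calibration into the wrapper.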
  • 22. ML explainer in Michelangelo. Notebooks: 1. Model debugging 2. Feature importance comparison 3. Visualization. Enabled for users: 1. Different explainers (IG, TreeShap, KernelShap, etc.) 2. Data conversion among different formats 3. Plotting 4. Model wrapper for calibration
  • 23. Visualize using Shapley values ● Grounded in intuitive notions of what makes a good explanation ● Allows for both local and global reasoning ● Model agnostic ● Good adoption among popular explanation techniques. Plot labels: Feature_0, Feature_1, Feature_2, Feature_3, Feature_4, Feature_5, Feature_6
  • 24. ML explainer in Michelangelo Generate feature importance in training pipeline Model training pipelines Basis Feature set Feature joins Trainer Aggregated Feature set Feature transformation Explainer Packaging
  • 25. ML explainer in Michelangelo. Monitoring pipelines: 1. Generate feature importance at training time 2. Apply different thresholds based on importance 3. Reduce noise from feature-quality null-rate alerts
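One way to read the monitoring idea above is that the tolerated null rate for a feature depends on how important that feature is. The sketch below illustrates that reading only; the tier boundaries, the null-rate limits, and the feature names are all made-up assumptions, not values used by Michelangelo.

```python
def null_rate_threshold(importance, high=0.10, low=0.01):
    """Return the maximum tolerated null rate for a feature (hypothetical tiers)."""
    if importance >= high:
        return 0.05   # important feature: alert early
    if importance >= low:
        return 0.20   # moderately important: tolerate more nulls
    return None       # negligible importance: never alert

def features_to_alert(importances, null_rates):
    """List features whose observed null rate exceeds their importance-based limit."""
    alerts = []
    for name, importance in importances.items():
        limit = null_rate_threshold(importance)
        if limit is not None and null_rates.get(name, 0.0) > limit:
            alerts.append(name)
    return sorted(alerts)

# Made-up importances (from training time) and observed null rates.
importances = {"eta": 0.5, "promo": 0.0625, "legacy_flag": 0.001}
null_rates = {"eta": 0.1, "promo": 0.1, "legacy_flag": 0.9}

alerts = features_to_alert(importances, null_rates)
print(alerts)
```

In this example only the important feature triggers an alert: the same 10% null rate is tolerated on a moderately important feature, and the unimportant feature is ignored even at 90% nulls, which is how importance can cut alert noise.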
  • 26. Case Studies. Case 1: Identifying Useful Features (comparing suburb and non-suburb data). Scenario: - A team at Uber finds that the order conversion rate is very different between suburb and non-suburb areas. - Adding new features did not change the model's overall performance. - Which features matter most in each of the datasets? Method: compared different datasets on the same model. Findings: the location feature is more important than engagement features such as historical orders in the non-suburb dataset. Conclusion: zoom in on the location feature to make it a bit more accurate; a smaller hexagon size helps.
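The method in this case study, running the same model on two datasets and comparing feature importance, might look like the following sketch. The attribution rows are invented for illustration and are not results from the talk, and using mean absolute attribution as the global importance measure is an assumption.

```python
def mean_abs_importance(rows):
    """rows: list of dicts mapping feature name -> attribution for one example."""
    names = rows[0].keys()
    return {n: sum(abs(r[n]) for r in rows) / len(rows) for n in names}

def importance_shift(rows_a, rows_b):
    """Per-feature change in mean absolute importance from dataset A to B,
    sorted so the largest shifts come first."""
    a = mean_abs_importance(rows_a)
    b = mean_abs_importance(rows_b)
    return sorted(((n, b[n] - a[n]) for n in a), key=lambda item: -abs(item[1]))

# Made-up per-example attributions from the same model on two datasets.
suburb = [
    {"location": 0.25, "historical_orders": 0.5},
    {"location": -0.25, "historical_orders": 0.5},
]
non_suburb = [
    {"location": 0.75, "historical_orders": 0.25},
    {"location": -0.75, "historical_orders": -0.25},
]

shift = importance_shift(suburb, non_suburb)
print(shift)
```

A large positive shift for a feature suggests the model leans on it more in the second dataset, which is the kind of signal that would point a team toward refining that feature.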
  • 28. Case Studies. Case 2: Identifying false positives/negatives. Scenario: a team wants to see which photos the model predicted incorrectly, since a wrong prediction has a cost associated with the resulting action. Method: ● generate importance for all low-prediction-score features ● generate the features by calling external label/object detection models. Findings: some object names were not categorized properly. Conclusion: created one-hot encoded features from dropped objects; created features that check whether the string contains certain words.
  • 30. Architecture: XAI framework. Components: 1. Data processing: a. converting from PySpark to NumPy; b. feature flattening for calculating gradients. 2. Explainer: a. support for multiple explainers (TreeShap/Kernel/IG…). 3. Model wrapper: a. support for different model caller or forward functions (keyword based or array based); b. support for calibration and aggregation, or a specific output layer. 4. Importance aggregation: a. aggregate importance from multiple dimensions; b. feature mapping from output to input.
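The importance-aggregation component can be illustrated with a short sketch that maps attributions on flattened model inputs (one-hot or bucketized columns) back to the original features. The column names, the mapping, and the option to sum signed or absolute values are illustrative assumptions; the slides do not specify how aggregation is done.

```python
from collections import defaultdict

def aggregate_importance(column_attributions, column_to_feature, absolute=False):
    """Sum per-column attributions into per-feature importance.

    column_attributions: flattened model input column -> attribution
    column_to_feature:   flattened column -> original feature name
    absolute:            sum magnitudes instead of signed values
    """
    totals = defaultdict(float)
    for column, value in column_attributions.items():
        totals[column_to_feature[column]] += abs(value) if absolute else value
    return dict(totals)

# Made-up attributions on flattened columns.
column_attributions = {
    "city_onehot_0": 0.25,
    "city_onehot_1": -0.125,
    "trip_distance": 0.5,
    "hour_bucket_0": 0.0625,
    "hour_bucket_1": 0.0625,
}
# Hypothetical mapping recorded while flattening the inputs.
column_to_feature = {
    "city_onehot_0": "city",
    "city_onehot_1": "city",
    "trip_distance": "trip_distance",
    "hour_bucket_0": "hour",
    "hour_bucket_1": "hour",
}

signed = aggregate_importance(column_attributions, column_to_feature)
magnitude = aggregate_importance(column_attributions, column_to_feature, absolute=True)
print(signed, magnitude)
```

Signed sums preserve the additive breakdown of the prediction, while absolute sums avoid positive and negative columns of the same feature cancelling out; which one fits depends on whether the goal is explaining a single score or ranking features.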
  • 31. Future opportunities ● Support LLM explanation in prompt engineering ● Feature selection assistant ● Interactive visualization tools
  • 32. Q&A