SlideShare a Scribd company logo
Department of Mathematics unchitta@ucla.edu
Interpretability in ML
& Sparse Linear Models
Unchitta Kanjanasaratool
Mentor: Denali Molitor
Department of Mathematics unchitta@ucla.edu
Several components: model transparency, holistic model interpretability,
modular-level interpretability, local interpretability for a single prediction or a
group of predictions.
Proposed levels of evaluating interpretability (Doshi-Velez & Kim, 2017):
1. Application level
2. Human level
3. Function level
Definition (non-mathematical): Interpretability is the degree to which a
human can consistently explain or interpret why a model makes certain
decisions.
Interpretability in ML: Why It’s Important
Department of Mathematics unchitta@ucla.edu
We may not always need interpretability e.g. in low-risk or extensively studied
problems, but knowing the why can help us learn more about why a model
might fail.
Transparency & Ethics: The EU for example mandates that automated decisions
must be explainable and should respect fundamental rights.
Interpretability also satisfies human curiosity and learning.
Interpretability helps explain why a model makes certain predictions.
Interpretability in ML: Why It’s Important
Department of Mathematics unchitta@ucla.edu
Assume a linear relationship between the input and response variables, for example
Y = β0
+ β1
X1
+ … + βp
Xp
+ ϵ ,
we can predict
ŷ = + X1
+ Xp
,
where the ’s are the coefficients in the residual sum of squares (RSS) minimization
problem, namely
β^
= arg minβ0…βp
𝚺i=0,i=p
(yi
- ŷi
)2
.
The linear coefficients (the β’s) make this model easy to interpret on a modular
level, and help us see the level of influence each variable has on the prediction.
Interpretable Models: Linear Regression
Department of Mathematics unchitta@ucla.edu
In practice however, interpreting linear models can still be hard if there are too many
variables. There are certain variations of linear regression that can help with this
problem, such as the sparse models. An example would be the “least absolute
shrinkage and selection operator,” or lasso regression which minimizes
RSS subject to 𝚺j=0,j=p
|βj
| ≤ s.
This is equivalent to minimizing, with regularization, the quantity
RSS + ƛ 𝚺j=0,j=p
|βj
|,
where the second term in the quantity is some constant lambda times the L1 norm of
the coefficients. (Lambda normally chosen using cross-validation to yield the minimal
β’s).
Sparse Models & Lasso Regression
Department of Mathematics unchitta@ucla.edu
Intuitively, lasso penalizes large models and selects small coefficients. Unintuitively
(but we will see why), it also yields a model where a number of the coefficients are 0
or essentially close to 0. This is an example of a sparse regression model, which
penalizes large models (i.e. lots of features) and performs variable selection.
The penalization is controlled via lambda: the larger it is, the bigger the sparsity. If
lambda is small enough, lasso will yield the same results as the least square
estimates (standard linear regression).
Sparse Models & Lasso Regression
Department of Mathematics unchitta@ucla.edu
Having fewer variables often mean better interpretability.
Your model is explained by only a number of significant features, which reduces
complexity and increases explainability. This is especially important when your data
has hundreds or thousands of features; the complexity may be beyond human
comprehension and there may not be enough observations.
Other methods for introducing sparsity to linear models include feature selection
processes, subset selection and step-wise procedures e.g. forward and backward
selection, sparse PCA.
Sparsity & Interpretability
Department of Mathematics unchitta@ucla.edu
Sparse Property of Lasso
The variable selection property of lasso regression comes from the L1 regularizer.
Consider the following figure in 2D problems. In higher dimensions the constraint
region becomes polytopes (shapes with flat sides/sharp edges).
Courtesy of Introduction to Statistical Learning, G. James, et al.
The red ellipses represent the
contours of the RSS of lasso (left)
and ridge regression (right). The solid
green areas are the corresponding
constraint functions,
and .
Department of Mathematics unchitta@ucla.edu
Example: UCI Bike Sharing Dataset
Here we are trying to predict the
number of bike rentals as a function of
11 variables. As lambda increases the
response variable depends on smaller
number of predictors.
Department of Mathematics unchitta@ucla.edu
Example: UCI Bike Sharing Dataset
Likely the most
significant
features
Department of Mathematics unchitta@ucla.edu
Introduction to Statistical Learning with
Applications in R, Gareth James, et al.
Resources
Interpretable Machine Learning: A Guide for
Making Black Box Models Explainable,
Christopher Molnar

More Related Content

Similar to Interpretability in ML & Sparse Linear Regression

Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Madhav Mishra
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
Sara Hooker
 
MF Presentation.pptx
MF Presentation.pptxMF Presentation.pptx
MF Presentation.pptx
HarshitSingh334328
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
Leonardo Auslender
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning Models
Leonardo Auslender
 
4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf
Leonardo Auslender
 
Machine Learning Interview Question and Answer
Machine Learning Interview Question and AnswerMachine Learning Interview Question and Answer
Machine Learning Interview Question and Answer
Learnbay Datascience
 
4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf
Leonardo Auslender
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
uetian12
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
University of Sindh
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
Vimal Gupta
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
Mohamed Essam
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
Studio Synthesis
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
Raman Kannan
 
Chapter01 introductory handbook
Chapter01 introductory handbookChapter01 introductory handbook
Chapter01 introductory handbook
Raman Kannan
 
Linear logisticregression
Linear logisticregressionLinear logisticregression
Linear logisticregression
kongara
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
ijaia
 
Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02
Rabby Bhatt
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
NeeleEilers
 
Numerical models
Numerical modelsNumerical models
Numerical models
Robinson
 

Similar to Interpretability in ML & Sparse Linear Regression (20)

Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
 
MF Presentation.pptx
MF Presentation.pptxMF Presentation.pptx
MF Presentation.pptx
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning Models
 
4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf
 
Machine Learning Interview Question and Answer
Machine Learning Interview Question and AnswerMachine Learning Interview Question and Answer
Machine Learning Interview Question and Answer
 
4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
 
Pentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BIPentaho Meeting 2008 - Statistics & BI
Pentaho Meeting 2008 - Statistics & BI
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
 
Chapter01 introductory handbook
Chapter01 introductory handbookChapter01 introductory handbook
Chapter01 introductory handbook
 
Linear logisticregression
Linear logisticregressionLinear logisticregression
Linear logisticregression
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02Mc0079 computer based optimization methods--phpapp02
Mc0079 computer based optimization methods--phpapp02
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 
Numerical models
Numerical modelsNumerical models
Numerical models
 

Recently uploaded

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 

Recently uploaded (20)

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 

Interpretability in ML & Sparse Linear Regression

  • 1. Department of Mathematics unchitta@ucla.edu Interpretability in ML & Sparse Linear Models Unchitta Kanjanasaratool Mentor: Denali Molitor
  • 2. Department of Mathematics unchitta@ucla.edu Several components: model transparency, holistic model interpretability, modular-level interpretability, local interpretability for a single prediction or a group of predictions. Proposed levels of evaluating interpretability (Doshi-Velez & Kim, 2017): 1. Application level 2. Human level 3. Function level Definition (non-mathematical): Interpretability is the degree to which a human can consistently explain or interpret why a model makes certain decisions. Interpretability in ML: Why It’s Important
  • 3. Department of Mathematics unchitta@ucla.edu We may not always need interpretability e.g. in low-risk or extensively studied problems, but knowing the why can help us learn more about why a model might fail. Transparency & Ethics: The EU for example mandates that automated decisions must be explainable and should respect fundamental rights. Interpretability also satisfies human curiosity and learning. Interpretability helps explain why a model makes certain predictions. Interpretability in ML: Why It’s Important
  • 4. Department of Mathematics unchitta@ucla.edu Assume a linear relationship between the input and response variables, for example Y = β0 + β1 X1 + … + βp Xp + ϵ , we can predict ŷ = + X1 + Xp , where the ’s are the coefficients in the residual sum of squares (RSS) minimization problem, namely β^ = arg minβ0…βp 𝚺i=0,i=p (yi - ŷi )2 . The linear coefficients (the β’s) make this model easy to interpret on a modular level, and help us see the level of influence each variable has on the prediction. Interpretable Models: Linear Regression
  • 5. Department of Mathematics unchitta@ucla.edu In practice however, interpreting linear models can still be hard if there are too many variables. There are certain variations of linear regression that can help with this problem, such as the sparse models. An example would be the “least absolute shrinkage and selection operator,” or lasso regression which minimizes RSS subject to 𝚺j=0,j=p |βj | ≤ s. This is equivalent to minimizing, with regularization, the quantity RSS + ƛ 𝚺j=0,j=p |βj |, where the second term in the quantity is some constant lambda times the L1 norm of the coefficients. (Lambda normally chosen using cross-validation to yield the minimal β’s). Sparse Models & Lasso Regression
  • 6. Department of Mathematics unchitta@ucla.edu Intuitively, lasso penalizes large models and selects small coefficients. Unintuitively (but we will see why), it also yields a model where a number of the coefficients are 0 or essentially close to 0. This is an example of a sparse regression model, which penalizes large models (i.e. lots of features) and performs variable selection. The penalization is controlled via lambda: the larger it is, the bigger the sparsity. If lambda is small enough, lasso will yield the same results as the least square estimates (standard linear regression). Sparse Models & Lasso Regression
  • 7. Department of Mathematics unchitta@ucla.edu Having fewer variables often mean better interpretability. Your model is explained by only a number of significant features, which reduces complexity and increases explainability. This is especially important when your data has hundreds or thousands of features; the complexity may be beyond human comprehension and there may not be enough observations. Other methods for introducing sparsity to linear models include feature selection processes, subset selection and step-wise procedures e.g. forward and backward selection, sparse PCA. Sparsity & Interpretability
  • 8. Department of Mathematics unchitta@ucla.edu Sparse Property of Lasso The variable selection property of lasso regression comes from the L1 regularizer. Consider the following figure in 2D problems. In higher dimensions the constraint region becomes polytopes (shapes with flat sides/sharp edges). Courtesy of Introduction to Statistical Learning, G. James, et al. The red ellipses represent the contours of the RSS of lasso (left) and ridge regression (right). The solid green areas are the corresponding constraint functions, and .
  • 9. Department of Mathematics unchitta@ucla.edu Example: UCI Bike Sharing Dataset Here we are trying to predict the number of bike rentals as a function of 11 variables. As lambda increases the response variable depends on smaller number of predictors.
  • 10. Department of Mathematics unchitta@ucla.edu Example: UCI Bike Sharing Dataset Likely the most significant features
  • 11. Department of Mathematics unchitta@ucla.edu Introduction to Statistical Learning with Applications in R, Gareth James, et al. Resources Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Christopher Molnar