Advanced Regression
and Model Selection
UpGrad Live Session - Ankit Jain
Model Selection Techniques
● If you need a starting point for choosing a machine learning
algorithm for your dataset, here are some general guidelines.
● How large is your training set?
○ Small -- Prefer high bias/low variance classifiers (e.g.
Naive Bayes) over low bias/high variance classifiers (e.g.
KNN) to avoid overfitting.
○ Large -- Low bias/high variance classifiers tend to produce
more accurate models.
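The bias/variance wording in this guideline refers to the standard decomposition of expected squared error. As a reminder, assuming a true relationship y = f(x) + noise with noise variance \sigma^2 and a fitted model \hat{f} (these symbols are not in the slides and are introduced only for illustration):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

With a small training set the variance term tends to dominate, which is why the simpler (higher bias) classifier is the safer default there.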
Adv/Disadv of Various Algorithms
● Naive Bayes:
○ Very simple to implement as it’s just a bunch of counts.
○ If the conditional independence assumption holds, it converges
faster than, say, Logistic Regression, and thus requires less
training data.
○ If you want something fast and easy that performs well, NB is
a good choice.
○ Biggest disadvantage is that it can't learn interactions
between features.
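To make the "just a bunch of counts" point concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. The function names (`train_nb`, `predict_nb`), the add-one smoothing, and the toy data are assumptions made for illustration, not something taken from the session.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels):
    """Collect the counts a categorical Naive Bayes model needs."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, label)][value] = number of occurrences
    feature_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, label)][value] += 1
            values[j].add(value)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Return the label with the highest log-posterior under the model."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for j, value in enumerate(row):
            # Add-one smoothing (an assumption) avoids log(0) for unseen values.
            count = feature_counts[(j, label)][value] + 1
            score += math.log(count / (n_label + len(values[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (weather, temperature) -> activity
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["out", "out", "in", "in", "out", "in"]

model = train_nb(rows, labels)
prediction = predict_nb(model, ("sunny", "cool"))
print(prediction)
```

Each feature contributes its own independent factor to the score, which is also why the model cannot represent interactions between features.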
Adv/Disadv of Various Algorithms
● Logistic Regression:
○ Lots of ways to regularize the model, and no need to worry
about correlated features, unlike in Naive Bayes.
○ Nice probabilistic interpretation. Helpful in problems like
churn prediction.
○ Online algorithm: easy to update the model with new data
(using an online gradient descent method).
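The online update mentioned above can be sketched as a single stochastic gradient step on the log-loss per incoming example. The learning rate, the optional L2 term, and the toy stream below are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def online_update(weights, x, y, lr=0.1, l2=0.0):
    """One gradient step on the log-loss for a single example (x, y), y in {0, 1}."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    # Gradient of the log-loss is (p - y) * x; l2 * w adds optional shrinkage.
    return [w - lr * ((p - y) * xi + l2 * w) for w, xi in zip(weights, x)]


# Hypothetical stream; the first entry of x is a constant 1.0 acting as the bias.
stream = [([1.0, 2.0], 1), ([1.0, -1.5], 0),
          ([1.0, 1.0], 1), ([1.0, -2.0], 0)] * 50

weights = [0.0, 0.0]
for x, y in stream:
    weights = online_update(weights, x, y)  # the model is updated as data arrives

p_pos = sigmoid(weights[0] + weights[1] * 2.0)
p_neg = sigmoid(weights[0] + weights[1] * -2.0)
print(weights, p_pos, p_neg)
```

No past examples need to be stored: the current weights are the only state carried between updates.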
Adv/Disadv of Various Algorithms
● Decision Trees:
○ Easy to explain and interpret (at least for some people)
○ Easily handles feature interactions.
○ No need to worry about outliers or whether data is linearly
separable or not.
○ Doesn't support online learning. Rebuilding the model with
new data every time can be painful.
○ Tends to overfit easily. Solution: ensemble methods such as
random forests (RF).
Adv/Disadv of Various Algorithms
● SVM:
○ High accuracy for many datasets
○ With appropriate kernel, can work well even if your data
isn’t linearly separable in the base feature space.
○ Popular in text processing applications, where high
dimensionality is the norm.
○ Memory intensive, hard to interpret and kind of annoying
to run and tune
ADVANCED REGRESSION
Linear Regression Issues
● Sensitivity to outliers
● Multicollinearity leads to high variance of the estimator.
● Prone to overfitting if there are a lot of variables.
● Hard to interpret when the number of predictors is large. Need
a smaller subset that exhibits the strongest effects.
Regularization Techniques
● Regularization techniques typically work by penalizing the
magnitude of coefficients of features along with minimizing
the error between predicted and actual observations
● Different types of penalization
○ Ridge Regression: penalizes the sum of squared coefficients
(L2 penalty)
○ Lasso Regression: penalizes the sum of absolute values of the
coefficients (L1 penalty)
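Written out, the two penalized objectives look as follows. The notation is an assumption for illustration: y_i is the response for sample i, x_{ij} the value of feature j for sample i, and \lambda \ge 0 a tuning parameter that sets the strength of the penalty.

```latex
\begin{align*}
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta}\;
  \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta}\;
  \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|
\end{align*}
```

In both cases the first term is the usual squared error and the second term grows with the size of the coefficients; the intercept \beta_0 is conventionally left unpenalized.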
Why penalize on model coefficients?
Model1 = beta0 + beta1*x, with beta1 = -0.58
Model2 = beta0 + beta1*x + … + beta10*x^10, with beta1 = -1.4e05
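The contrast between the two models can be reproduced with a small experiment. This is a sketch on synthetic data (a noisy sine curve, chosen here as an assumption; the data behind the slide's numbers is not part of the transcript), so the coefficient values will differ from those quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
# Hypothetical data: a smooth curve plus noise
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

coef_linear = np.polyfit(x, y, deg=1)    # Model1: beta0 + beta1*x
coef_deg10 = np.polyfit(x, y, deg=10)    # Model2: terms up to x^10

largest_linear = np.max(np.abs(coef_linear))
largest_deg10 = np.max(np.abs(coef_deg10))
print("largest |coefficient|, degree 1: ", largest_linear)
print("largest |coefficient|, degree 10:", largest_deg10)
```

The flexible model chases the noise by using very large coefficients of alternating sign, which is the behavior a penalty on coefficient magnitude discourages.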
Ridge Regression
● L2 penalty
● Pros
○ Works when variables >> rows (p >> n)
○ Handles multicollinearity
○ Trades increased bias for lower variance relative to Linear
Regression
● Cons
○ Doesn't produce a parsimonious model (coefficients shrink but
are not set exactly to zero)
Let’s see a collinearity example in R
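The R demo from the session is not included in this transcript. As a stand-in, here is a minimal Python sketch of the same idea under stated assumptions: two nearly collinear synthetic predictors, a closed-form ridge fit without an intercept, and a penalty value picked arbitrarily.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) beta = X^T y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)  # hypothetical response

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
beta_ridge = ridge_fit(X, y, lam=1.0)

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

With predictors this correlated, least squares can only pin down the sum of the two coefficients well, so the individual estimates tend to be large and unstable; the ridge penalty pulls them toward smaller values.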
Example: Leukemia Prediction
● Leukemia Data, Golub et al. Science 1999
● There are 38 training samples and 34 test samples, with ~7000
genes in total (p >> n)
● Xij is the gene expression value for sample i and gene j
● Sample i has tumor type either AML or ALL
● We want to select genes relevant to tumor type
○ eliminate the trivial genes
○ grouped selection as many genes are highly correlated
● Ridge Regression can help with this modeling task
Grouped Selection
● If two predictors are highly correlated with each other, their
estimated ridge coefficients will be similar.
● If some variables are exactly identical, they will have the
same coefficients.
Ridge is good for grouped selection but not good for eliminating
trivial genes
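Both halves of this statement can be checked on a tiny synthetic example. The column names, the penalty value, and the data are assumptions for illustration; `gene_b` is an exact copy of `gene_a`, and `trivial` is unrelated to the response.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
gene_a = rng.normal(size=n)
gene_b = gene_a.copy()           # exact duplicate of gene_a
trivial = rng.normal(size=n)     # unrelated to the response
X = np.column_stack([gene_a, gene_b, trivial])
y = 2.0 * gene_a + rng.normal(scale=0.3, size=n)  # hypothetical response

lam = 1.0  # ridge penalty strength, chosen arbitrarily
# Closed-form ridge solution without an intercept
beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(beta)
```

The two identical columns receive the same coefficient (grouped selection), while the trivial column keeps a small but nonzero coefficient, so nothing is eliminated.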
LASSO
● Pros
○ Allows p >> n
○ Enforces sparsity in parameters
● Cons
○ If a group of predictors is highly correlated, LASSO tends
to pick only one of them and shrink the others to zero
○ Cannot do grouped selection; tends to select one variable
LASSO is good for eliminating trivial genes but not good for
grouped selection
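A minimal sketch of this behavior, using cyclic coordinate descent with soft-thresholding (one common way to fit the lasso; the session does not say which solver it used). The data, the penalty value, and the function names are assumptions. An exact duplicate column is used as the extreme case of correlation.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with predictor j's current contribution added back in
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta


rng = np.random.default_rng(0)
n = 200
gene_a = rng.normal(size=n)
gene_b = gene_a.copy()           # exact duplicate of gene_a
trivial = rng.normal(size=n)     # unrelated to the response
X = np.column_stack([gene_a, gene_b, trivial])
y = 3.0 * gene_a + rng.normal(scale=0.1, size=n)  # hypothetical response

beta = lasso_cd(X, y, lam=0.1)
print(beta)
```

In this setup the first copy absorbs essentially all of the weight and the trivial column is set exactly to zero. Which of the two copies survives depends on the update order, which illustrates why the selection within a correlated group is arbitrary.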
Elastic Net
● Weighted combination of L1 and L2 penalties
● Helps in enforcing sparsity
● Encourages a grouping effect among highly correlated predictors
In the gene selection problem, it can achieve both goals:
removing trivial genes and doing grouped selection
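Both effects can be seen in a small sketch that adds an L2 term to a coordinate-descent lasso update. The objective used here, (1/(2n))||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2, is one of several equivalent parameterizations and is an assumption, as are the data and penalty values.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def elastic_net_cd(X, y, lam1, lam2, n_sweeps=300):
    """Coordinate descent for the elastic net objective above (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            z = X[:, j] @ X[:, j] / n
            # L1 part: soft-threshold; L2 part: extra lam2 in the denominator
            beta[j] = soft_threshold(rho, lam1) / (z + lam2)
    return beta


rng = np.random.default_rng(0)
n = 200
gene_a = rng.normal(size=n)
gene_b = gene_a.copy()           # exact duplicate of gene_a
trivial = rng.normal(size=n)     # unrelated to the response
X = np.column_stack([gene_a, gene_b, trivial])
y = 3.0 * gene_a + rng.normal(scale=0.1, size=n)  # hypothetical response

beta = elastic_net_cd(X, y, lam1=0.1, lam2=0.1)
print(beta)
```

The duplicated columns end up sharing the weight equally (the grouping effect from the L2 part), while the trivial column is still set exactly to zero (the sparsity from the L1 part).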
Other Advanced Regression Methods
Poisson Regression
○ Typically used when the Y variable follows a Poisson
distribution (typically counts of events within a time t)
○ Example: the number of times a customer will visit an
e-commerce website next month
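A minimal sketch of fitting such a model, assuming the usual log link (log of the expected count is linear in the features) and Newton's method on the log-likelihood. The single feature, the true coefficients, and the simulated counts are all hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=25):
    """Newton's method for Poisson regression with a log link.

    Assumes the first column of X is a constant 1 (the intercept).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # start from the overall mean rate
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                # predicted mean count per row
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        information = X.T @ (X * mu[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(information, score)
    return beta


rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)            # hypothetical customer feature
true_beta = np.array([0.5, 0.8])             # hypothetical true coefficients
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(X @ true_beta))       # simulated visit counts

beta_hat = fit_poisson(X, y)
print(beta_hat)
```

Because of the log link, a coefficient is read multiplicatively: a one-unit increase in the feature multiplies the expected count by exp(coefficient).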
Piecewise Linear Regression
● Polynomial regression won't work perfectly here, as it has a
high tendency to overfit or underfit.
● Instead, splitting the curve into separate linear pieces and
building a linear model for each piece leads to better results.
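The split-and-fit idea can be sketched as follows. The breakpoint is assumed to be known in advance (in practice it has to be chosen or estimated), and the data is synthetic with a change of slope at that point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
breakpoint_x = 5.0  # assumed known for this sketch

# Hypothetical data: slope 1 before the breakpoint, slope 3 after it
y = np.where(x < breakpoint_x, x, breakpoint_x + 3.0 * (x - breakpoint_x))
y = y + rng.normal(scale=0.3, size=x.size)

left = x < breakpoint_x
right = ~left

# One straight line per piece, plus a single line over all data for comparison
slope_left, intercept_left = np.polyfit(x[left], y[left], deg=1)
slope_right, intercept_right = np.polyfit(x[right], y[right], deg=1)
slope_all, intercept_all = np.polyfit(x, y, deg=1)


def sse(pred, target):
    """Sum of squared errors."""
    return float(np.sum((pred - target) ** 2))


sse_single = sse(slope_all * x + intercept_all, y)
sse_piecewise = (sse(slope_left * x[left] + intercept_left, y[left])
                 + sse(slope_right * x[right] + intercept_right, y[right]))
print(slope_left, slope_right, sse_single, sse_piecewise)
```

Fitting the pieces independently, as here, does not force the two lines to meet at the breakpoint; a continuous version would need an extra constraint or a hinge-style basis.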
QUESTIONS
