Regression vs ANOVA
By: Aniruddha Deshmukh – M. Sc. Statistics, MCM
Background
ANOVA and regression can be hard to tell apart, because the two methods have more similarities than differences. It can be said that ANOVA and regression are two sides of the same coin: both are special cases of the general linear model.
Let us first understand what continuous data and categorical data are.
Ref: my earlier post on “Data Types”
Continuous Data
• represents measurements
• e.g., height can be measured at progressively more precise scales (meters, centimeters, millimeters, and beyond), so height is continuous data.
Categorical Data
• describes, categorizes, or groups something
• deals with characteristics and descriptors that can't easily be measured but can be observed subjectively, such as smells, tastes, textures, attractiveness, and color.
Which tool to use when?
Regression
• When Continuous Y and
Continuous X’s
• Continuous Y, Continuous AND
Categorical X(s)
• Logistic Regression:
Categorical Y, Continuous AND
Categorical X(s)
ANOVA
• When Continuous Y and
Categorical X’s
• Continuous Y, Continuous AND
Categorical X(s)
• Can be applied to any
regression model (no matter if
the model contains only
continuous, only categorical,
or both kinds of predictors)
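The rows that mix continuous and categorical X(s) work because a categorical predictor can be entered into a regression as 0/1 dummy (indicator) columns. Below is a minimal sketch of treatment coding in Python, assuming the first listed level is the reference category; the helper names `dummy_code` and `design_row` and the color levels are hypothetical, chosen only for illustration.

```python
def dummy_code(value, levels):
    """Treatment (dummy) coding: one 0/1 column per non-reference level.

    The first entry of `levels` is assumed to be the reference level,
    which is encoded as all zeros.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def design_row(x_continuous, x_category, levels):
    """One row of a regression design matrix: intercept, continuous X, dummies."""
    return [1.0, float(x_continuous)] + dummy_code(x_category, levels)


levels = ["red", "green", "blue"]  # hypothetical categorical predictor
observations = [(1.5, "red"), (2.0, "green"), (3.5, "blue")]  # hypothetical (X, category) pairs
rows = [design_row(x, c, levels) for x, c in observations]
for row in rows:
    print(row)
# → [1.0, 1.5, 0.0, 0.0]
#   [1.0, 2.0, 1.0, 0.0]
#   [1.0, 3.5, 0.0, 1.0]
```

With k levels only k − 1 dummy columns are created; adding a column for the reference level as well would make the dummies sum to the intercept column, and the least-squares fit would no longer have a unique solution.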
Regression vs ANOVA: what each does
Regression
• Fits a least-squares straight line to the data
• Predicts a continuous outcome on the basis of one or more continuous predictor variables
• Quantifies effect sizes in terms of “how much is the response expected to change when the predictor(s) change by a given amount?”
• Assesses the quantitative relation between a predictor and the response
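The least-squares line and the “how much does the response change” question can be made concrete with a simple linear regression. Below is a minimal sketch, assuming a single continuous predictor and the closed-form least-squares estimates; `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x.

    Returns (intercept, slope). The slope is the effect size: the expected
    change in y when x increases by one unit.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical data: y rises by roughly 2 units per unit of x.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_line(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")  # → intercept = 1.090, slope = 1.970
```

On Python 3.10 and later, the standard library's `statistics.linear_regression(xs, ys)` returns the same slope and intercept.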
ANOVA
• Sorts data into groups (“boxes”) and finds the group averages
• Predicts a continuous outcome on the basis of one or more categorical predictor variables
• Checks how much the residual variance is reduced by adding predictors to (nested) regression models
• Assesses the impact of a predictor, or a whole set of predictors, on the residuals: how much of the variance in the data can be explained by these predictors?
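The ANOVA bullets (sort into groups, find the averages, ask how much of the variance the grouping explains) correspond to the one-way ANOVA sum-of-squares decomposition. Below is a minimal sketch in plain Python, assuming a single categorical predictor; `one_way_anova` and the group data are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA for a dict mapping group label -> list of observations."""
    all_values = [y for ys in groups.values() for y in ys]
    n, k = len(all_values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(all_values) / n
    # "Sort data into boxes and find averages": one mean per group.
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Total variation of the data around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation explained by the grouping: group means vs the grand mean.
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    # Residual variation: observations vs their own group mean.
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return {
        "means": means,
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f": f_stat,
        "df": (k - 1, n - k),
        "eta_squared": ss_between / ss_total,  # share of variance explained by the groups
    }


# Hypothetical measurements for three groups.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}
res = one_way_anova(groups)
print(res["means"], res["f"], res["eta_squared"])  # → {'A': 5.0, 'B': 8.0, 'C': 2.0} 27.0 0.9
```

Here 90% of the variation is explained by group membership. A p-value would come from the F distribution with (k − 1, n − k) degrees of freedom, for example via SciPy's `scipy.stats.f.sf(F, k - 1, n - k)`; `scipy.stats.f_oneway` computes the same F statistic directly.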
ANOVA is a special case of regression, but the two are used with a different flavor: by convention, ANOVA is the usual choice when the independent/predictor variables are categorical, and regression analysis is the usual choice otherwise.
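The claim that ANOVA is a special case of regression can be checked numerically: a one-way ANOVA F test gives the same statistic as a nested-model comparison between an intercept-only regression and a regression with dummy-coded group indicators. Below is a dependency-free sketch, assuming treatment coding and solving the normal equations by Gaussian elimination (a real analysis would use a numerically sturdier library routine); all function names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: design columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * xij for b, xij in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def anova_via_regression(groups):
    """One-way ANOVA F as a nested comparison of two regression models."""
    levels = list(groups)
    y = [v for g in levels for v in groups[g]]
    labels = [g for g in levels for _ in groups[g]]
    n, k = len(y), len(levels)
    reduced = [[1.0] for _ in y]  # intercept only: all groups share one mean
    full = [[1.0] + [1.0 if lab == g else 0.0 for g in levels[1:]] for lab in labels]
    rss_reduced, rss_full = residual_ss(reduced, y), residual_ss(full, y)
    # F = (reduction in residual SS per added parameter) / (residual mean square of full model)
    return ((rss_reduced - rss_full) / (k - 1)) / (rss_full / (n - k))


def anova_f_direct(groups):
    """Textbook one-way ANOVA F computed from group means."""
    y = [v for vs in groups.values() for v in vs]
    n, k = len(y), len(groups)
    grand = sum(y) / n
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: three groups of three observations.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}
print(anova_via_regression(groups), anova_f_direct(groups))  # both about 27.0
```

The two values agree because the intercept-only model's residual sum of squares is the total sum of squares, while the dummy-coded model's fitted values are the group means, so its residual sum of squares is the within-group sum of squares.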
Types of analysis: independent samples
Outcome | Explanatory | Analysis
Continuous | Dichotomous | t-test, Wilcoxon test
Continuous | Categorical | ANOVA, linear regression
Continuous | Continuous | Correlation, linear regression
Dichotomous | Dichotomous | Chi-square test, logistic regression
Dichotomous | Continuous | Logistic regression
Time to event | Dichotomous | Log-rank test
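The t-test row and the ANOVA row are closely linked: for a dichotomous explanatory variable, the pooled-variance (Student) two-sample t-test is equivalent to a one-way ANOVA, with F = t². Below is a minimal sketch that checks this; the function names and data are hypothetical.

```python
import math


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss_within / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def two_group_anova_f(a, b):
    """One-way ANOVA F statistic when the explanatory variable has exactly two levels."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    grand = (sum(a) + sum(b)) / (na + nb)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2  # df = 2 - 1 = 1
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss_between / (ss_within / (na + nb - 2))


# Hypothetical data for the two levels of a dichotomous explanatory variable.
a = [4.0, 5.0, 6.0]
b = [7.0, 8.0, 9.0]
t = pooled_t(a, b)
f = two_group_anova_f(a, b)
print(round(t ** 2, 6), round(f, 6))  # → 13.5 13.5
```

The equivalence holds for the pooled (equal-variance) t-test; Welch's t-test and the Wilcoxon test listed in the same row do not reduce to the classical ANOVA F in this way.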
Summary
• A regression model is typically based on one or more continuous predictor variables.
• By contrast, an ANOVA model is based on one or more categorical predictor variables.
• In ANOVA there can be several error terms (for example, in split-plot or repeated-measures designs), whereas a standard regression model has a single error term.
• ANOVA is mainly used to determine whether data from various groups share a common mean.
• Regression is widely used for forecasting and prediction.
• It is also used to identify which independent variables are related to the dependent variable.
• The earliest form of regression, the method of least squares, was published by Legendre in 1805.
• It was Francis Galton who coined the term ‘regression’ in the 19th century.
• ANOVA was used informally by researchers in the 1800s. It gained wide popularity after Fisher included it in his 1925 book ‘Statistical Methods for Research Workers.’
For more information, please contact:
Aniruddha Deshmukh – M. Sc. Statistics, MCM
email: annied23@gmail.com
