Lecture 4: Transformations
Regression III:
Advanced Methods
William G. Jacoby
Michigan State University
2
Goals of the lecture
• The Ladder of Roots and Powers
• Changing the shape of distributions
• Transforming for Linearity
3
Why transform data?
1. In some instances it can help us better examine a
distribution
2. Many statistical models are based on the mean and thus
require that the mean is an appropriate measure of
central tendency (i.e., the distribution is approximately
normal)
3. Linear least squares regression assumes that the
relationship between two variables is linear. Often we can
“straighten” a nonlinear relationship by transforming one
or both of the variables
— Often transformations will ‘fix’ problem distributions
so that we can use least-squares regression
— When transformations fail to remedy these problems,
another option is to use nonparametric regression,
which makes fewer assumptions about the data
4
Power transformations for
quantitative variables
• Although there are an infinite number of functions f(x)
that can be used to transform a distribution, in practice
only a relatively small number are regularly used
• For quantitative variables one can usually rely on the
“family” of powers and roots: $X \to X^{p}$
• When p is negative, the transformation is an inverse
power, for example $X^{-1} = 1/X$
• When p is a fraction, the transformation represents a
root, for example $X^{1/2} = \sqrt{X}$
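As a rough illustration of the family of powers and roots, the Python sketch below applies a few values of p to made-up positive numbers. The helper name `power_transform` and the sample values are assumptions for illustration, not part of the lecture.

```python
def power_transform(x, p):
    """Apply the power transformation x -> x**p to one positive number.

    Illustrative helper (not from the lecture). p = 0 is rejected because
    x**0 equals 1 for every x.
    """
    if x <= 0:
        raise ValueError("power transformations assume positive values")
    if p == 0:
        raise ValueError("p = 0 would turn the variable into a constant")
    return x ** p


values = [1.0, 4.0, 9.0, 16.0]  # made-up data

# p = 2 (square), p = 1/2 (square root), p = -1 (inverse)
for p in (2, 0.5, -1):
    print(p, [power_transform(v, p) for v in values])
```

Printing the three rows side by side shows how the spacing between the same four values changes as p moves along the ladder.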
5
Log transformations
• A power transformation of $X^{0}$ should not be used
because it changes all values to 1 (in other words, it
makes the variable a constant)
• Instead we can think of $X^{0}$ as a shorthand for the log
transformation $\log_{e} X$, where $e \approx 2.718$ is the base of
the natural logarithms
• In practice most people prefer to use $\log_{10} X$ because it
is easier to interpret: increasing $\log_{10} X$ by 1 is the same
as multiplying X by 10
• In terms of result, it matters little which base is used
because changing base is equivalent to multiplying the
logged variable by a constant
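One standard way to see why the log sits at the p = 0 position is the scaled power (X^p − 1)/p, which approaches log_e X as p shrinks toward 0. That result is not stated on the slide, so treat the sketch below as background. It also checks that changing the base only rescales the logged variable. The value of `x` is arbitrary.

```python
import math

x = 50.0  # any positive value; chosen arbitrarily

# (x**p - 1) / p gets closer to log_e(x) as p shrinks toward 0.
for p in (1.0, 0.1, 0.001, 1e-6):
    print(p, (x ** p - 1) / p, math.log(x))

# Changing base only rescales the logged variable by a constant:
# log10(x) = log_e(x) / log_e(10).
scale = 1 / math.log(10)
print(math.log10(x), scale * math.log(x))

# Multiplying x by 10 raises log10(x) by 1.
print(math.log10(10 * x) - math.log10(x))
```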
6
Cautions: Power Transformations (1)
• Descending the ladder of powers and roots compresses
the large values of X and spreads out the small values
• As p moves away from 1 in either direction, the
transformation becomes more powerful
• Power transformations are sensible ONLY when all the X
values are POSITIVE. If they are not, this can be solved by
adding a start value
– Some transformations are undefined outside this range
(the log for 0 and negative numbers, the square root for
negative numbers)
– Other power transformations will not be monotone,
thus changing the order of the data
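The loss of ordering can be seen on a tiny made-up example. In the sketch below, squaring data that contains negative values scrambles the order, while adding a start first keeps it. The helper `is_order_preserved` and the start value of 4 are hypothetical choices for illustration.

```python
def is_order_preserved(original, transformed):
    """True if sorting by the transformed values gives the same ordering
    of observations as sorting by the original values."""
    def ordering(vals):
        return sorted(range(len(vals)), key=vals.__getitem__)
    return ordering(original) == ordering(transformed)


x = [-3.0, -1.0, 2.0]          # made-up data with negative values
squared = [v ** 2 for v in x]  # 9, 1, 4: the smallest value becomes the largest

start = 4.0                    # hypothetical start that makes every value positive
shifted_squared = [(v + start) ** 2 for v in x]  # 1, 9, 36

print(is_order_preserved(x, squared))
print(is_order_preserved(x, shifted_squared))
```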
7
Cautions: Power Transformations (2)
• Power transformations are only effective if the ratio of
the largest data value to the smallest data value is
large
• If the ratio is very close to 1, the transformation will
have little effect
• General rule: If the ratio is less than 5, a negative start
value should be considered, since subtracting a constant
from every value increases the ratio
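A minimal sketch of the ratio check, using made-up values and a hypothetical start of −100. It shows only how a negative start changes the ratio of the largest to the smallest value. Choosing a start in practice still needs judgment, and every shifted value must stay positive.

```python
def max_min_ratio(values):
    """Ratio of the largest to the smallest value (values must be positive)."""
    return max(values) / min(values)


x = [101.0, 103.0, 106.0, 110.0]  # made-up data with a ratio close to 1
start = -100.0                    # hypothetical negative start
shifted = [v + start for v in x]  # 1, 3, 6, 10

print(max_min_ratio(x))        # below 5, so a power transformation does little
print(max_min_ratio(shifted))  # much larger after the negative start
```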
8
Transforming Skewed Distributions
• The example below shows
how a log10 transformation
can fix a positive skew
• The density estimate for
average income for
occupations from the
Canadian Prestige data is
shown on top; the bottom
shows the density estimate
of the transformed income
[Figure: density estimates (probability density function) of income, top, and of log.inc, bottom]
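The effect on skewness can be checked numerically. The sketch below does not use the Prestige data. It uses synthetic "incomes" that are evenly spaced on the log10 scale, so the raw values are right-skewed and the logged values are symmetric by construction. The `skewness` helper is a plain moment-based estimate written for this illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed values (NOT the Prestige data):
# evenly spaced on the log10 scale, from 1000 up to about 25000.
income = [1000 * 10 ** (k / 10) for k in range(15)]
log_income = [math.log10(v) for v in income]

print(skewness(income))      # clearly positive
print(skewness(log_income))  # essentially zero
```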
9
Transforming Nonlinearity
When is it possible?
• An important use of
transformations is to
‘straighten’ the relationship
between two variables
• This is possible only when the
nonlinear relationship is simple
and monotone
– Simple implies that the
curvature does not change—
there is one curve
– Monotone implies that the
slope of the curve is always
positive or always negative
• (a) can be transformed, (b) and
(c) cannot
10
The ‘Bulging Rule’ for
transformations
• Tukey and Mosteller’s rule
provides a starting point for
possible transformations to
correct nonlinearity
• Normally we should try to
transform explanatory
variables rather than the
response variable Y, since a
transformation of Y will
affect the relationship of Y
with all Xs, not just the one
with the nonlinear
relationship
• If, however, the response
variable is highly skewed, it
makes sense to transform
it instead
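The diagram for the rule did not survive in this transcript. As it is usually stated, each variable moves in the direction the bulge points: left or right for X, up or down for Y. The lookup below is a restatement of that convention, not a copy of the slide, and the quadrant labels and function name are assumptions.

```python
# Direction of the bulge -> (move for X, move for Y) on the ladder of powers.
# "down" means a smaller p (e.g. square root, log); "up" means a larger p (e.g. square).
BULGING_RULE = {
    "upper-left": ("down", "up"),
    "upper-right": ("up", "up"),
    "lower-left": ("down", "down"),
    "lower-right": ("up", "down"),
}


def suggest_moves(bulge):
    """Return the suggested ladder direction for X and for Y."""
    x_move, y_move = BULGING_RULE[bulge]
    return {"X": x_move, "Y": y_move}


# A decreasing curve that bends toward the origin bulges to the lower left.
print(suggest_moves("lower-left"))
```

The rule only gives a starting direction. How far to move down or up the ladder is still found by trying a transformation and re-plotting.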
11
Transforming relationships
Income and Infant mortality (1)
• Leinhardt’s data from the
car library
• Robust local regression in
the plot shows serious
nonlinearity
• The bulging rule suggests
that both Y and X can be
transformed down the
ladder of powers
• I tried taking the log of
income only, but significant
nonlinearity still remained
• In the end, I took the log10
of both income and infant
mortality
[Figure: scatterplot of infant (vertical axis) against income (horizontal axis)]
12
Income and Infant mortality (2)
[Figure: scatterplot of log.infant (vertical axis) against log.income (horizontal axis)]
• A linear model fits
well here
• Since both variables
are log10-transformed,
the coefficients are
easy to interpret:
– An increase in
income by 1% is
associated, on
average, with a
0.51% decrease in
infant mortality
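The percentage reading of a log-log slope can be checked on synthetic data. The sketch below does not use Leinhardt's data. It generates values that follow y = 5000 · x^(−0.5) exactly, fits a least-squares line to the log10 values, and converts the slope into the percent change in y implied by a 1% increase in x. The constants and the helper name are assumptions for illustration.

```python
import math


def ols_slope_intercept(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data following y = 5000 * x**-0.5 exactly (NOT Leinhardt's data).
income = [100.0, 200.0, 500.0, 1000.0, 2000.0, 5000.0]
infant = [5000 * v ** -0.5 for v in income]

log_income = [math.log10(v) for v in income]
log_infant = [math.log10(v) for v in infant]

slope, intercept = ols_slope_intercept(log_income, log_infant)
print(slope)  # recovers the exponent, about -0.5

# Percent change in y implied by a 1% increase in x.
pct_change = (1.01 ** slope - 1) * 100
print(pct_change)  # close to the slope itself, about -0.5
```

Because the slope is fitted to the logs of both variables, the base of the logarithm does not affect it. Only the intercept changes with the base.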
13
Transforming Proportions
• The logit transformation: (a) removes the boundaries of
the scale, (b) spreads out the tails of the distribution
and (c) makes the distribution symmetric about 0. It
takes the following form: $\mathrm{logit}(P) = \log_{e}\left[P/(1-P)\right]$
• Power transformations will not work for proportions
(including percentages and rates) if the data values
approach the boundaries of 0 and 1
• Instead, we can use the logit or probit transformations
for skewed proportion distributions. If their scales are
equated, these two are practically indistinguishable:
$\mathrm{logit} \approx (\pi/\sqrt{3}) \times \mathrm{probit}$, where $\mathrm{probit}(P) = \Phi^{-1}(P)$ and $\Phi$ is the standard normal CDF
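A small sketch comparing the two transformations on a few proportions. One common way to equate the scales, assumed here, is to multiply the probit by π/√3, the standard deviation of the standard logistic distribution. The probit uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def probit(p):
    """Inverse of the standard normal CDF at p."""
    return NormalDist().inv_cdf(p)


# Assumed scale factor: standard deviation of the standard logistic distribution.
scale = math.pi / math.sqrt(3)

for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(p, round(logit(p), 3), round(scale * probit(p), 3))
```

Both columns are 0 at P = 0.5 and stay close to each other across the range, which is the sense in which the two transformations are nearly interchangeable.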
14
Logit Transformation of a Proportion
• Notice that the
transformation is nearly
linear for proportions
between .20 and .80
• Values close to 0 and 1
are spread out at an
increasing rate, however
• Finally, the transformed
variable is now centered
at 0 rather than .5
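The stretching of the tails can be seen by comparing equal-sized steps in the proportion. The sketch below, with arbitrarily chosen proportions, measures how far the logit moves for a step of 0.10 near the middle and for the same step near the upper boundary.

```python
import math


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


# The same step of 0.10 in the proportion, taken at two places on the scale.
middle_step = logit(0.55) - logit(0.45)
tail_step = logit(0.99) - logit(0.89)

print(middle_step)  # small change on the logit scale
print(tail_step)    # several times larger for the same step in P
```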
15
Next Topics:
– The Basics of Least Squares Regression
• Least-squares fit
• Properties of the least-squares estimator
• Statistical inference
• Regression in matrix form
– The Vector Representation of the Regression Model
