Applied Statistics
Part 4
By
M. H. Farjoo MD, PhD, Bioanimator
Shahid Beheshti University of Medical Sciences
Instagram: @bio_animation
Applied Statistics
Part 4
 Introduction to Correlation and Regression
 Difference Between Correlation & Regression
 Correlation
 Regression
 Simple Linear Regression
 Multiple Linear Regression
 Simple Logistic Regression
 Multiple Logistic Regression
 Nonlinear (Curvilinear) Regression
 Choosing a Test
Introduction
 Correlation and regression are not the same.
 Use correlation to know:
 Whether two measurement variables are associated, and how strongly.
 Whether, as one variable increases, the other increases or decreases.
 Use regression to know:
 The equation of a line that fits the cloud of data, describes the relationship, and predicts unknown values.
Difference Between Correlation & Regression
 Goal:
 Correlation quantifies the degree to which two variables are
related, and does not fit a line through the data points.
 Linear regression finds the best line (equation) that fits data
points.
 Kind of data and sampling:
 Correlation is used when you measure both variables and
sample both variables randomly from a population.
 In regression, X is a variable we manipulate and choose its
values (time, concentration, etc.) and predict Y from X.
Difference Between Correlation & Regression
 Relationship between results:
 Correlation computes a correlation coefficient, r.
 Linear regression quantifies goodness of fit as r2 (or R2).
 Which variable is which?
 In correlation we get the same coefficient (r) if we swap
the two variables.
 Regression gets a different best-fit line (different slope
and intercept) if we swap the two variables, although r2
itself is unchanged.
Correlation
 When two variables vary together, there is
covariation or correlation.
 The null hypothesis implies:
 There is no relationship between the variables
 As the X variable changes, the Y variable does not
change.
 The correlation coefficient is not significantly different
from zero (or statistically: r = 0).
 Correlation does not imply causation.
 But a significant correlation may suggest further
research to test for a cause and effect relation.
Guidelines for Judging Causality
1. Is there a temporal relationship?
2. What is the strength of association?
3. Is there a dose/response relationship?
4. Were the findings replicated?
5. Is there biological plausibility?
6. What happens with cessation of exposure?
7. Is this explanation consistent with other knowledge?
Correlation
 Causal inferences are licensed primarily by the design
of your study, not by the statistical techniques you
use.
 Correlation only quantifies linear (straight line)
covariation.
 A correlation analysis is not helpful if Y changes up to a
point and then reverses direction.
 In this case we obtain a low value of r, even though
the two variables are strongly related.
Correlation
 The value of the correlation coefficient ranges from:
 -1 (perfect inverse relationship; as X goes up, Y goes down)
 1 (perfect positive relationship; as X goes up, so does Y)
 0 (no correlation at all).
 Pearson correlation (r) is parametric and assumes
both X and Y are from a Gaussian distribution.
 Spearman correlation [rs or ρ (rho)] does not make
this assumption and is non-parametric.
 Correlation is not very sensitive to non-normality,
 so you may use the Pearson method any time you have 2
measurement variables, even if they look non-normal.
Value of r (or rs) | Interpretation
1.0                | Perfect correlation.
> 0 to < 1         | The two variables tend to increase or decrease together.
0.0                | The two variables do not vary together at all.
-1 < r < 0         | One variable increases as the other decreases.
-1.0               | Perfect negative or inverse correlation.
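A quick numerical illustration of the Pearson versus Spearman distinction, a sketch in Python with made-up data (the variables and numbers are ours, not from the slides):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up data: Y rises monotonically with X, but along a curve, not a line
x = np.arange(1.0, 11.0)
y = x ** 3

r, _ = pearsonr(x, y)      # parametric: assumes Gaussian X and Y
rho, _ = spearmanr(x, y)   # non-parametric: works on ranks

# rho is 1 (the relationship is perfectly monotonic), while r is high
# but below 1, because Pearson only quantifies straight-line covariation
```

This is exactly the "correlation only quantifies linear covariation" point: the two variables are strongly related, yet r understates it.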
Correlation
 If r or rs is far from zero, there are four possible
explanations:
1. Changes in the X variable cause a change in the value of
the Y variable.
2. Changes in the Y variable cause a change in the value of
the X variable.
3. Changes in another variable influence both X and Y.
4. X and Y don't really correlate at all, and the observed
correlation is a coincidence.
Regression
 In regression we fit a line through the data and use its
equation to predict Y from X.
 We predict scores on one variable (Y axis) from the
scores on a second variable (X axis).
 The variable we are basing our predictions on is the
independent or predictor variable (X axis).
 The variable we are predicting is the criterion or
dependent variable (Y axis).
 Only the dependent variable (Y axis) determines the type
of regression, NOT the independent variable (X axis).
Regression
 The null hypothesis implies: the slope of the best-fit
line is equal to zero.
 Try to use the line equation for prediction within the
X values found in the data set (interpolation).
 Predicting Y values outside the range of X values
(extrapolation) can yield ridiculous results if you go
too far!
 The expansion of an iron rod is related to temperature, but
it will not expand at 2000 °C; it will melt!
Regression
 r2 (in the output window of the results) is called the
coefficient of determination, or "r squared".
 It is a value that ranges from 0 to 1, and is the fraction
of the variation in the two variables that is “shared”.
 Regressions can have a small r2 (a weak relationship), yet
have a slope that is significantly different from zero.
 The null hypothesis has nothing to do with r2.
Simulated data showing the effect of the range of X values on the
r2. For the exact same data, measuring Y over a smaller range of X
values yields a smaller r2.
Simple Linear Regression
 When Y is a continuous variable and there is only one
predictor variable, it is called: simple linear regression.
 An example is: weight of the infant at birth (Y),
predicted by gestational age (X).
 In simple linear regression, the predictions of Y from X
form a straight line.
 The regression line is the best-fitting straight line
through the points; its slope and intercept let us predict
Y from X.
 The line minimizes the sum of the squares of the
vertical distances of the points (errors) from the line.
• The slope quantifies the steepness of the line. It equals the change in
Y for each unit change in X.
• The Y intercept is the Y value of the line when X equals zero. It
defines the elevation of the line.
In Graph A, the points are closer to the line than in Graph B.
Therefore, the predictions in Graph A are more accurate than those in Graph B.
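The birth-weight example above can be sketched numerically; this is a minimal illustration with made-up data, using scipy's `linregress` (the numbers are ours, not from the slides):

```python
import numpy as np
from scipy.stats import linregress

# Made-up data: gestational age in weeks (X) and birth weight in kg (Y)
age = np.array([34.0, 35, 36, 37, 38, 39, 40, 41])
weight = np.array([2.1, 2.3, 2.5, 2.8, 3.0, 3.2, 3.4, 3.5])

fit = linregress(age, weight)       # least-squares best-fit line
slope = fit.slope                   # extra kg per extra week of gestation
intercept = fit.intercept           # Y value of the line when X = 0
r_squared = fit.rvalue ** 2         # fraction of the variation that is "shared"

# Interpolation (within the observed X range, 34-41 weeks) is safe;
# extrapolating far outside it may give nonsense, as with the iron rod.
predicted = intercept + slope * 37.5
```

The line minimizes the sum of squared vertical distances, which is exactly what `linregress` computes.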
Simple Linear Regression
 Anscombe's quartet demonstrates the importance of
looking at your data.
 All four data sets have 11 points, yet they look very different.
 Surprisingly, when analyzed by linear regression, all of
these values are identical for all four graphs:
 The mean values of X and Y
 The slopes and intercepts
 r2
 The SE and CI of the slope and intercept
Frank Anscombe (1918–2001) was the brother-in-law of
another well-known statistician, John Tukey; their wives were
sisters.
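Two of Anscombe's four published data sets are enough to see the effect in code; a sketch using scipy:

```python
import numpy as np
from scipy.stats import linregress

# The shared X values and the first two Y columns of Anscombe's quartet
x  = np.array([10.0, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

f1, f2 = linregress(x, y1), linregress(x, y2)
# Nearly identical slopes (~0.50), intercepts (~3.0), and r (~0.82),
# yet y1 is a noisy straight line and y2 is a smooth curve:
# always plot the data before trusting the summary numbers.
```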
Multiple Linear Regression
 In multiple regression, Y is predicted by two or more
predictor (X) variables.
 We can use it for:
 Predicting the values of the dependent variable.
 To decide which independent variable (X) has a major
effect on the dependent variable (Y).
 An example is: weight of the infant at birth (Y), predicted
by gestational age, the mother's weight, and whether the
mother smokes or not (all on X).
 Not all the predictors (X) are worth including in a
multiple linear regression model.
Multiple Linear Regression
 Another example: to predict a student's university score
based on their high school scores and their total SAT
score.
 The basic idea is to find a linear combination of high
school scores that best predicts university score.
 Be very careful in using multiple regression to understand
cause-and-effect relationships.
 It is very easy to get misled by the results of a fancy
multiple regression analysis.
 The results should be used as a suggestion, rather than for
hypothesis testing.
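The birth-weight example with two predictors can be sketched with plain numpy least squares; the data below are simulated and made up, so the recovered coefficients are only an illustration:

```python
import numpy as np

# Simulate made-up data where birth weight depends on two predictors
rng = np.random.default_rng(0)
n = 50
gest_age = rng.uniform(34, 41, n)     # weeks
mother_wt = rng.uniform(50, 90, n)    # kg
birth_wt = 0.2 * gest_age + 0.01 * mother_wt + rng.normal(0, 0.1, n)

# Design matrix: a column of ones (intercept) plus the two X variables
X = np.column_stack([np.ones(n), gest_age, mother_wt])
coef, *_ = np.linalg.lstsq(X, birth_wt, rcond=None)

# coef[1] estimates kg per extra week of gestation, coef[2] estimates
# kg per extra kg of maternal weight; comparing standardized
# coefficients indicates which predictor matters most.
```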
Simple Logistic Regression
 Simple logistic regression is used when there is one
measurement independent variable (X) and the Y variable
is nominal.
 The goal is:
 To check whether the probability of a particular outcome
of Y is associated with X.
 To predict a particular outcome of Y, given X.
 If Y has only two values, the regression is called: “binary
logistic regression” (male/female, dead/alive).
 If Y has more than two values, the regression is called:
"multinomial logistic regression”.
Simple Logistic Regression
 Example of binary logistic regression: the effect of
study time (X), on exam outcome (Y).
 The model can be used to predict the occurrence of
heart attack based on the plasma cholesterol.
 An example of multinomial logistic regression: the
effect of the grade of a tumor (X), on the treatment
method (radiotherapy, chemotherapy, surgery) (Y).
 The model can be used to choose how to treat the
patient based on the severity of the cancer.
Simple Logistic Regression
Pass: Y = 1
Fail: Y = 0
Y is only 0 or 1 because the result is only pass/fail and
there is nothing in between.
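A sketch of binary logistic regression for the study-time example, fitted by gradient ascent on the log-likelihood in plain numpy; the data, learning rate, and iteration count are all made up for illustration:

```python
import numpy as np

# Made-up data: hours studied (X) and exam result (Y: 1 = pass, 0 = fail)
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0.0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

# Fit logit(p) = b0 + b1*hours by gradient ascent on the log-likelihood
b0, b1 = 0.0, 0.0
for _ in range(20000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * hours)))     # predicted P(pass)
    b0 += 0.01 * np.sum(passed - p)                  # gradient w.r.t. intercept
    b1 += 0.01 * np.sum((passed - p) * hours)        # gradient w.r.t. slope

# The fitted model predicts a probability of passing for any study time
prob_4h = 1.0 / (1.0 + np.exp(-(b0 + b1 * 4.0)))
prob_1h = 1.0 / (1.0 + np.exp(-(b0 + b1 * 1.0)))
```

Note the prediction is a probability between 0 and 1, even though the observed Y values are only 0 or 1.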
Multiple Logistic Regression
 The dependent variable (Y) is nominal and there are 2
or more independent variables (X).
 Example: the effect of cholesterol, age, and weight on
the probability of heart attack in the next year.
 We can measure the risk factors on new individuals
and estimate the probability of heart attack.
 This is done by comparing their odds ratios in the
output window of the software.
Multiple Logistic Regression
 We can try to identify the main risk factor that
changes the probability of the dependent variable.
 The null hypothesis implies:
 There is no relationship between the X variables and
the Y variable
 Adding each X variable does not really improve the fit
of the equation.
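A sketch of the heart-attack example on simulated, made-up data, again with plain numpy; the fitted coefficients are exponentiated to give the odds ratios the slides mention (here per standard deviation of each predictor, since we standardize first):

```python
import numpy as np

# Simulate a made-up cohort: heart-attack risk driven by cholesterol and age
rng = np.random.default_rng(1)
n = 400
chol = rng.normal(5.5, 1.0, n)      # mmol/L
age = rng.normal(55, 10, n)         # years
true_logit = -12 + 1.5 * chol + 0.08 * age
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Standardize predictors, then fit by gradient ascent on the log-likelihood
Z = np.column_stack([(chol - chol.mean()) / chol.std(),
                     (age - age.mean()) / age.std()])
X = np.column_stack([np.ones(n), Z])
b = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ b))
    b += 0.002 * (X.T @ (y - p))    # gradient of the log-likelihood

odds_ratios = np.exp(b[1:])         # per 1 SD of each predictor
```

Both odds ratios exceed 1 (both risk factors raise the probability), and comparing them suggests which factor matters more, which is exactly how the output window is read.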
Nonlinear (Curvilinear) Regression
 If data must be transformed to create a linear
relationship, it is better to fit the untransformed data
with nonlinear regression.
 Avoid transformations such as Scatchard or
Lineweaver-Burk whose only goal is to linearize your
data.
 These methods are outdated, and should not be used
to analyze data.
 You might analyze the data by nonlinear regression
but show the results by linear transformation.
 The human brain and eye are keen on straight lines!
The Scatchard equation is an equation for calculating the affinity
constant of a ligand with a protein.
In biochemistry, the Lineweaver–Burk plot is a representation
of enzyme kinetics.
Nonlinear (Curvilinear) Regression
 Fitting a straight line to transformed data gives different
results than fitting a curved line to untransformed data.
 The equation for a curve is a polynomial equation.
 In polynomial equations X is raised to integer powers
such as X² and X³.
 A quadratic equation, Y = aX + bX² + d, produces a
parabola.
 A cubic equation, Y = aX + bX² + cX³ + d, produces an
S-shaped (sigmoid) curve.
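The quadratic case can be sketched with `numpy.polyfit` on made-up parabolic data; it also shows why a straight line fails when Y rises and then falls:

```python
import numpy as np

# Made-up data: Y rises and then falls, a parabola centered at X = 5
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 21)
y = -1.0 * (x - 5.0) ** 2 + 25.0 + rng.normal(0, 0.5, x.size)

lin = np.polyfit(x, y, 1)   # straight line: slope is near zero here,
                            # hiding the strong relationship
quad = np.polyfit(x, y, 2)  # quadratic: recovers the curvature (about -1)
```

This is the same phenomenon noted for correlation earlier: a linear fit (and a low r) can completely miss a strong curved relationship.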
Nonlinear (Curvilinear) Regression
 Nonlinear regression is used for three purposes:
 To fit a model to data for obtaining the best-fit values
of the parameters.
 To compare the fits of alternative models.
 To simply fit a smooth curve in order to interpolate
values from the curve.
 The goal is not to describe the system perfectly, but to
fit a curve that comes close to the data.
 In this way we can understand the system and reach
valid scientific conclusions.
Nonlinear (Curvilinear) Regression
 The nonlinear method may yield weird results.
 This happens with noisy or incomplete data; examples include:
 A rate constant that is negative.
 A best-fit fraction that is greater than 1.
 A best-fit Kd value that is negative.
 A top of a sigmoid curve far above the highest data point.
 An EC50 outside the range of your X values.
 If the results make no sense, they are unacceptable, even
if the curve comes close to the points and R2 is close to 1.
Correlation & Regression
Hands-on practice
 To calculate correlation & regression in SPSS:
 For Correlation: Analyze => Correlate
 For Regression: Analyze => Regression
 To calculate correlation & regression in Prism:
 XY (from welcome screen) => choose appropriate option
Choosing a Test of Association

Goal | Dependent variable | Independent variable | Parametric test | Non-parametric test
Relationship between 2 continuous variables | Scale | Scale | Pearson's correlation coefficient | Spearman's correlation coefficient
Predicting the value of one variable from the value of a predictor variable, or looking for significant relationships | Scale | Any | Simple linear regression | Transform the data
Predicting the value of one variable from the value of a predictor variable, or looking for significant relationships | Nominal (binary) | Any | Logistic regression | ---------
Assessing the relationship between two categorical variables | Categorical | Categorical | --------- | Chi-squared test
(Decision flowchart: Is your dependent variable (DV) continuous? Is your independent variable (IV) continuous? Do you have only two groups?)
Types of Variables and Commonly Used Statistical Methods

Goal | Measurement (from Gaussian population) | Rank, score, or measurement (from non-Gaussian population) | Binomial (two possible outcomes) | Survival time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test** |
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazards regression**
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression**
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** |
Predict value from another measured variable | Simple linear regression or nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazards regression*
Predict value from several measured or binomial variables | Multiple linear regression* or multiple nonlinear regression** | | Multiple logistic regression* | Cox proportional hazards regression*
Thank you
Any questions?
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 

Applied statistics part 4

  • 1. Applied Statistics Part 4 By M. H. Farjoo MD, PhD, Bioanimator Shahid Beheshti University of Medical Sciences Instagram: @bio_animation
  • 2. Applied Statistics Part 4  Introduction to Correlation and Regression  Difference Between Correlation & Regression  Correlation  Regression  Simple Linear Regression  Multiple Linear Regression  Simple Logistic Regression  Multiple Logistic Regression  Nonlinear (Curvilinear) Regression  Choosing a Test
  • 3. Introduction  Correlation and regression are not the same.  Use correlation to know:  Whether two measurement variables are associated.  Whether as one variable increases, the other increases or decreases.  Use regression to know:  The strength of the association or relation.  The equation of a line that fits the cloud of data, describes the relationship, and predicts unknown values.
  • 4. Difference Between Correlation & Regression  Goal:  Correlation quantifies the degree to which two variables are related; it does not fit a line through the data points.  Linear regression finds the best line (equation) that fits the data points.  Kind of data and sampling:  Correlation is used when you measure both variables and sample both randomly from a population.  In regression, X is a variable we manipulate and whose values we choose (time, concentration, etc.), and we predict Y from X.
  • 5. Difference Between Correlation & Regression  Relationship between results:  Correlation computes the correlation coefficient, r.  Linear regression quantifies goodness of fit, r² (or R²).  Which variable is which?  In correlation we get the same coefficient (r) if we swap the two variables.  Regression yields a different best-fit line and a different coefficient (r²) if we swap the two variables.
  • 6. Correlation  When two variables vary together, there is covariation or correlation.  The null hypothesis implies:  There is no relationship between the variables.  As the X variable changes, the Y variable does not change.  The correlation coefficient is not significantly different from zero (or statistically: r = 0).  Correlation does not imply causation.  But a significant correlation may suggest further research to test for a cause-and-effect relation.
  • 11. Guidelines for Judging Causality 1. Is there a temporal relationship? 2. What is the strength of association? 3. Is there a dose/response relationship? 4. Were the findings replicated? 5. Is there biological plausibility? 6. What happens with cessation of exposure? 7. Is this explanation consistent with other knowledge?
  • 12. Correlation  Causal inferences are licensed primarily by the design of your study, not by the statistical techniques you use.  Correlation only quantifies linear (straight-line) covariation.  A correlation analysis is not helpful if Y changes up to a point and then moves in the opposite direction.  In this case we obtain a low value of r, even though the two variables are strongly related.
  • 13. Correlation  The value of correlation may be:  -1 (perfect inverse relationship; X goes up, Y goes down).  1 (perfect positive relationship; X goes up, so does Y).  0 (no correlation at all).  Pearson correlation (r) is parametric and assumes both X and Y come from a Gaussian distribution.  Spearman correlation [rs or ρ (rho)] does not make this assumption and is non-parametric.  Correlation is not very sensitive to non-normality, so you can use the Pearson method any time you have two measurement variables, even if they look non-normal.
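The Pearson/Spearman distinction above can be sketched in code. This is a minimal illustration with made-up paired measurements; in a real analysis you would use a library such as scipy.stats (pearsonr, spearmanr), which also reports the p-value needed to test the null hypothesis r = 0.

```python
import math

# Made-up paired measurements (e.g., dose X and response Y); illustrative only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 6.3, 6.8, 8.2]

def pearson_r(xs, ys):
    """Pearson correlation: covariation scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))
    return num / den

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks (no ties here)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(xs), ranks(ys))

print(round(pearson_r(x, y), 3))    # → 0.991
print(round(spearman_rho(x, y), 3)) # → 1.0 (Y rises monotonically with X)
```

Because Y increases monotonically with X here, the rank-based Spearman coefficient is exactly 1 even though the relationship is not perfectly linear.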
  • 14. Interpreting the value of r (or rs):
  1.0: Perfect correlation.
  > 0 to < 1: The two variables tend to increase or decrease together.
  0.0: The two variables do not vary together at all.
  -1 to < 0: One variable increases as the other decreases.
  -1.0: Perfect negative (inverse) correlation.
  • 15. Correlation  If r or rs is far from zero, there are four possible explanations: 1. Changes in the X variable cause a change in the value of the Y variable. 2. Changes in the Y variable cause a change in the value of the X variable. 3. Changes in another variable influence both X and Y. 4. X and Y don’t really correlate at all, and the observed correlation arose by chance.
  • 16. Regression  In regression we fit a line through the data and use its equation to predict Y from X.  We predict scores on one variable (Y axis) from the scores on a second variable (X axis).  The variable on which we base our predictions is the independent or predictor variable (X axis).  The variable we are predicting is the criterion or dependent variable (Y axis).  Only the dependent variable (Y axis) determines the type of regression, NOT the independent variable (X axis).
  • 17. Regression  The null hypothesis implies: the slope of the best-fit line is equal to zero.  Try to use the line equation for prediction within the X values found in the data set (interpolation).  Predicting Y values outside the range of X values (extrapolation) can yield ridiculous results if you go too far!  The expansion of an iron rod is related to heat, but it will not expand at 2000 °C, it will melt!!
  • 18. Regression  r² (in the output window of the results) is called the coefficient of determination, or "r squared".  It is a value that ranges from 0 to 1, and is the fraction of the variation in the two variables that is “shared”.  Regressions can have a small r² (a weak relationship), yet have a slope that is significantly different from zero.  The null hypothesis has nothing to do with r².
  • 19. Simulated data showing the effect of the range of X values on r². For the same underlying data, measuring Y over a smaller range of X values yields a smaller r².
  • 20. Simple Linear Regression  When Y is a continuous variable and there is only one predictor variable, it is called simple linear regression.  An example: weight of the infant at birth (Y), predicted by gestational age (X).  In simple linear regression, the predictions of Y from X form a straight line.  The regression line can predict Y from X and is the best-fitting straight line through the points, with a slope and an intercept.  The line minimizes the sum of the squares of the vertical distances of the points (errors) from the line.
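The least-squares fit described above can be written out by hand in a few lines. This is a minimal sketch using made-up gestational-age/birth-weight numbers, purely for illustration:

```python
# Made-up example data: gestational age (weeks) vs. birth weight (kg).
x = [36, 37, 38, 39, 40, 41]
y = [2.6, 2.9, 3.0, 3.2, 3.4, 3.6]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)                   # spread of X
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # covariation of X and Y

slope = sxy / sxx             # change in Y for each unit change in X
intercept = my - slope * mx   # Y value of the line at X = 0

# Interpolation (inside the observed X range, 36-41 weeks) is reasonable;
# extrapolation far outside it is not.
predicted = intercept + slope * 38.5
```

These closed-form expressions for slope and intercept are exactly what statistical packages compute for simple linear regression; they minimize the sum of squared vertical distances mentioned on the slide.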
  • 21. • The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. • The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line.
  • 24. In Graph A, the points are closer to the line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than in Graph B.
  • 25. Simple Linear Regression  Anscombe's quartet demonstrates the importance of looking at your data.  All four data sets have 11 points yet look very different.  Surprisingly, when analyzed by linear regression, all of these values are identical across the four graphs:  The mean values of X and Y  The slopes and intercepts  r²  The SE and CI of the slope and intercept
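This can be checked numerically. Below, two of the four published Anscombe data sets (a roughly linear cloud and a parabola) are fit by hand; despite looking completely different when plotted, their summary statistics agree to two decimals.

```python
# Two of Anscombe's four data sets (values from the published quartet).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # set I
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # set II

def fit(xs, ys):
    """Return (mean of Y, slope, intercept, r²) for a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    syy = sum((b - my) ** 2 for b in ys)
    slope = sxy / sxx
    return my, slope, my - slope * mx, sxy ** 2 / (sxx * syy)

for ys in (y1, y2):
    mean_y, slope, intercept, r2 = fit(x, ys)
    print(round(mean_y, 2), round(slope, 2), round(intercept, 2), round(r2, 2))
    # both lines print: 7.5 0.5 3.0 0.67
```

Only a plot reveals that set II is a perfect parabola for which a straight line is the wrong model, which is exactly Anscombe's point.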
  • 26. Frank Anscombe (1918–2001). He was brother-in-law to another well-known statistician, John Tukey; their wives were sisters.
  • 29. Multiple Linear Regression  In multiple regression, Y is predicted by two or more independent (X) variables.  We can use it for:  Predicting the values of the dependent variable.  Deciding which independent variable (X) has a major effect on the dependent variable (Y).  An example: weight of the infant at birth (Y), predicted by gestational age, weight of the mother, and whether the mother smokes or not (all X variables).  Not all predictors (X) are worth including in a multiple linear regression model.
  • 30. Multiple Linear Regression  Another example: predicting a student's university score based on their high school scores and their total SAT score.  The basic idea is to find the linear combination of high school scores that best predicts the university score.  Be very careful in using multiple regression to understand cause-and-effect relationships.  It is very easy to be misled by the results of a fancy multiple regression analysis.  The results should be used as a suggestion, rather than for hypothesis testing.
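A sketch of a multiple linear regression with two predictors, using numpy's least-squares solver. The data are synthetic, generated from known coefficients (all variable names and values are made up for illustration), so the fit should approximately recover those coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
gest_age = rng.uniform(34, 42, n)    # weeks (made-up predictor)
mother_wt = rng.uniform(50, 90, n)   # kg   (made-up predictor)

# Outcome generated from known coefficients plus noise, so the fitted
# coefficients should come out near 0.15, 0.01, and -3.0.
birth_wt = 0.15 * gest_age + 0.01 * mother_wt - 3.0 + rng.normal(0, 0.1, n)

# Design matrix: one column per predictor, plus a column of ones
# for the intercept.
X = np.column_stack([gest_age, mother_wt, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, birth_wt, rcond=None)
b_age, b_mwt, intercept = coef
```

Comparing the magnitudes of the fitted coefficients (on standardized predictors, in practice) is how one judges which X variable has the major effect on Y.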
  • 32. Simple Logistic Regression  Simple logistic regression is used when there is one measurement independent variable (X) and the Y variable is nominal.  The goal is:  To check whether the probability of a particular outcome of Y is associated with X.  To predict a particular outcome of Y, given X.  If Y has only two values, the regression is called “binary logistic regression” (male/female, dead/alive).  If Y has more than two values, the regression is called "multinomial logistic regression”.
  • 33. Simple Logistic Regression  Example of binary logistic regression: the effect of study time (X) on exam outcome (Y).  The model can be used to predict the occurrence of a heart attack based on plasma cholesterol.  An example of multinomial logistic regression: the effect of the grade of a tumor (X) on the treatment method (radiotherapy, chemotherapy, surgery) (Y).  The model can be used to choose how to treat the patient based on the severity of the cancer.
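A minimal sketch of binary logistic regression for the study-time example, fit by plain gradient descent on made-up data. A real analysis would use statistical software; this only illustrates the model P(pass) = 1 / (1 + e^-(b0 + b1·hours)).

```python
import math

# Made-up data: hours studied (X) and exam outcome (Y: 1 = pass, 0 = fail).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = 0.0, 0.0          # intercept and slope on the log-odds scale
lr, n = 0.5, len(hours)
for _ in range(10000):     # plain gradient descent on the log-loss
    g0 = g1 = 0.0
    for x, y in zip(hours, passed):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += (p - y)        # gradient w.r.t. the intercept
        g1 += (p - y) * x    # gradient w.r.t. the slope
    b0 -= lr * g0 / n
    b1 -= lr * g1 / n

def prob_pass(x):
    """Fitted probability of passing after x hours of study."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```

Note that exp(b1) is the odds ratio for passing per additional hour of study, which is how coefficients in the software's output window are usually interpreted.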
  • 34. Simple Logistic Regression  Pass: Y = 1; Fail: Y = 0.  Y is only 0 or 1 because the result is only pass/fail and there is nothing in between.
  • 35. Multiple Logistic Regression  The dependent variable (Y) is nominal and there are two or more independent variables (X).  Example: the effect of cholesterol, age, and weight on the probability of a heart attack in the next year.  We can measure the risk factors on new individuals and estimate the probability of a heart attack.  This is done by comparing their odds ratios in the output window of the software.
  • 36. Multiple Logistic Regression  We can try to identify the main risk factor that changes the probability of the dependent variable.  The null hypothesis implies:  There is no relationship between the X variables and the Y variable.  Adding each X variable does not really improve the fit of the equation.
  • 37. Nonlinear (Curvilinear) Regression  If data must be transformed just to create a linear relationship, nonlinear regression should be used instead.  Avoid transformations such as Scatchard or Lineweaver-Burk whose only goal is to linearize your data.  These methods are outdated and should not be used to analyze data.  You might analyze the data by nonlinear regression but show the results with a linear transformation.  The human brain and eye are keen on straight lines!
  • 38. The Scatchard equation is an equation for calculating the affinity constant of a ligand with a protein.
  • 39. In biochemistry, the Lineweaver–Burk plot is a representation of enzyme kinetics.
  • 40. Nonlinear (Curvilinear) Regression  Fitting a straight line to transformed data gives different results than fitting a curved line to untransformed data.  The equation for a curve is a polynomial equation.  In polynomial equations X is raised to integer powers such as X² and X³.  A quadratic equation, Y = aX + bX² + d, produces a parabola.  A cubic equation, Y = aX + bX² + cX³ + d, produces an S-shaped (sigmoid) curve.
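Polynomial models like these can be fit directly, without any linearizing transformation, e.g. with numpy.polyfit. A small sketch on synthetic, noise-free data (the coefficients are arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(-3, 3, 25)
y_quad = 2.0 * x ** 2 + 1.0 * x + 0.5    # quadratic: parabola
y_cubic = 0.5 * x ** 3 - 2.0 * x + 1.0   # cubic: S-shaped curve

# polyfit returns coefficients with the highest power first, so the
# fits should recover the generating coefficients.
quad_coefs = np.polyfit(x, y_quad, 2)    # ≈ [2.0, 1.0, 0.5]
cubic_coefs = np.polyfit(x, y_cubic, 3)  # ≈ [0.5, 0.0, -2.0, 1.0]
```

Because a polynomial is linear in its coefficients, this is still solved by least squares; genuinely nonlinear models (exponentials, sigmoids) need iterative nonlinear regression instead.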
  • 41.
  • 42.
  • 43. Nonlinear (Curvilinear) Regression  Nonlinear regression is used for three purposes:  To fit a model to data to obtain the best-fit values of the parameters.  To compare the fits of alternative models.  To simply fit a smooth curve in order to interpolate values from the curve.  The goal is not to describe the system perfectly, but to fit a curve that comes close to the points.  In this way we can understand the system and reach valid scientific conclusions.
  • 44. Nonlinear (Curvilinear) Regression  The nonlinear method may yield weird results.  This happens with noisy or incomplete data, and includes:  A rate constant that is negative.  A best-fit fraction that is greater than 1.  A best-fit Kd value that is negative.  The top of a sigmoid curve far larger than the highest data point.  An EC50 not within the range of your X values.  If the results make no sense, they are unacceptable, even if the curve comes close to the points and R² is close to 1.
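A sketch of such sanity checks: fit a sigmoid dose-response model with scipy's curve_fit to synthetic data, then verify the best-fit EC50 and plateau are physically sensible. The four-parameter form and parameter names below are one common convention, not the only one, and all numbers are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, bottom, top, log_ec50, hill):
    """Four-parameter logistic (Hill) dose-response curve, EC50 on a log scale."""
    return bottom + (top - bottom) / (1.0 + 10 ** (hill * (log_ec50 - np.log10(x))))

# Synthetic responses generated from known parameters (no noise):
# bottom = 5, top = 95, EC50 = 0.5, Hill slope = 1.2.
dose = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
resp = sigmoid(dose, 5.0, 95.0, np.log10(0.5), 1.2)

params, _ = curve_fit(sigmoid, dose, resp, p0=[0.0, 100.0, 0.0, 1.0])
bottom, top, log_ec50, hill = params
ec50 = 10 ** log_ec50

# The sanity checks from the slide: reject fits whose parameters make no sense.
assert dose.min() <= ec50 <= dose.max()   # EC50 inside the tested dose range
assert top <= resp.max() * 1.5            # top not wildly above the data
assert bottom >= 0                        # no negative plateau for this assay
```

Fitting log(EC50) rather than EC50 itself is a common trick that keeps the optimizer from wandering into meaningless negative EC50 values.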
  • 45. Correlation & Regression Hands-on practice  To calculate correlation & regression in SPSS:  For Correlation: Analyze => Correlate  For Regression: Analyze => Regression  To calculate correlation & regression in Prism:  XY (from welcome screen) => choose appropriate option
  • 46. Choosing a Test of Association
  Relationship between two continuous variables: dependent variable scale, independent variable scale; parametric test: Pearson’s correlation coefficient; non-parametric test: Spearman’s correlation coefficient.
  Predicting the value of one variable from the value of a predictor variable, or looking for significant relationships: dependent variable scale, independent variable any; parametric test: simple linear regression; non-parametric alternative: transform the data.
  Same goal, dependent variable nominal (binary), independent variable any: logistic regression (no non-parametric equivalent).
  Assessing the relationship between two categorical variables: both variables categorical; non-parametric test: chi-squared test (no parametric equivalent).
  • 47. [Decision flowchart: Is your dependent variable (DV) continuous? Is your independent variable (IV) continuous? Do you have only two groups? The YES/NO answers lead to the appropriate test.]
  • 48. Choosing a test by goal and type of data. Columns: Measurement (from Gaussian population) | Rank, score, or measurement (from non-Gaussian population) | Binomial (two possible outcomes) | Survival time.
  Describe one group: Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
  Compare one group to a hypothetical value: One-sample t test | Wilcoxon test | Chi-square or binomial test** | -
  Compare two unpaired groups: Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel*
  Compare two paired groups: Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
  Compare three or more unmatched groups: One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazards regression**
  Compare three or more matched groups: Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression**
  Quantify association between two variables: Pearson correlation | Spearman correlation | Contingency coefficients** | -
  Predict value from another measured variable: Simple linear regression or nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazards regression*
  Predict value from several measured or binomial variables: Multiple linear regression* or multiple nonlinear regression** | - | Multiple logistic regression* | Cox proportional hazards regression*
  • 51. Types of Variables and Commonly Used Statistical Methods