Statistical tests SPSS (1).pdf

Data analysis using SPSS
Dr Nauman Arif
PhD Scholar Public Health, MSc Epi & Bio, MPH, CHR
Coordinator MS Epidemiology / CHR
Faculty Epidemiology IPH&SS KMU
National Research Facilitator CPSP
2/19/2022

Variable
• A Variable is a characteristic of a person, object
or phenomenon that can take on different
values.
• A simple example of a variable is a person’s age.
The variable age can take on different values
because a person can be 20 years old, 35 years
old, and so on.

Types of variables
Dependent variable
• The variable that is used to describe or measure
the problem under study (outcome) is called the
dependent variable.
Independent variable
• The variables that are used to describe or
measure the factors that are assumed to cause
or at least to influence the problem are called the
independent (exposure) variables

Data
• Data are the values of observations recorded for
variables e.g. age, weight, sex etc.
• Data once collected should be presented in such a
way as to be easily understood.
• The style of presentation depends on type of data.
• Data can be presented as frequency tables, charts,
graphs, etc.

Types of data
Qualitative / Categorical data
• The characteristic which can’t be expressed numerically
like sex, ethnicity, healing etc.
• Nominal data Example: Gender, Blood groups
• Ordinal data Example: Severity of pain
Quantitative / Scale data
• The characteristic which can be expressed numerically
like age, temperature, no. of children in a family.
• Continuous data Example: BMI
• Discrete data Example: Age in years

Descriptive statistics
1. Qualitative / Categorical data
• For qualitative or categorical data frequencies &
percentages are calculated which are graphically
presented through Bar graph & Pie chart
2. Quantitative / Scale data
• For quantitative or scale data mean, median, mode, SD,
range, quartiles, min, max, skewness and kurtosis are
calculated, and the data is graphically presented through a
histogram, box plot, line graph or scatter plot

Descriptive statistics
• Frequency distribution
In a Frequency Table data is presented in a
tabular form. It gives the frequency with
which (or the number of times) a particular
value appears in the data.
• Cross-tabulation
For a better description of the data, or in order to
look for differences or relevant associations, two
variables are tabulated against each other.
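Outside SPSS, a frequency table and a cross-tabulation can be sketched in a few lines of Python. This is only an illustration: the ten records below are made-up, and the category labels are assumptions chosen to mirror a smoking / disease status example.

```python
from collections import Counter

# Hypothetical records: (smoking status, disease status) for ten people.
records = [
    ("Smoker", "Diseased"), ("Smoker", "Diseased"), ("Smoker", "Diseased"),
    ("Smoker", "Healthy"), ("Non smoker", "Diseased"), ("Non smoker", "Healthy"),
    ("Non smoker", "Healthy"), ("Non smoker", "Healthy"), ("Non smoker", "Healthy"),
    ("Smoker", "Healthy"),
]

# Frequency table: number of times each smoking status appears, with percentages.
smoking_counts = Counter(status for status, _ in records)
smoking_percent = {k: 100 * v / len(records) for k, v in smoking_counts.items()}

# Cross-tabulation: one count per (row category, column category) pair.
crosstab = Counter(records)

for row in ("Smoker", "Non smoker"):
    cells = [crosstab[(row, col)] for col in ("Diseased", "Healthy")]
    print(row, cells, f"({smoking_percent[row]:.0f}% of sample)")
```

Each printed row is one line of the cross-table: the counts of diseased and healthy people within that smoking category.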

Frequency distribution tables
Systolic Blood Pressure of patients coming to a
tertiary care hospital OPD n = 60

Cross-tabulation
Smoking status (Smoker / Non smoker) by Disease Status

Measure of Central Tendency
Mean
Sum of all the observations divided by total number of
observations
Median
Mid-point in the data set if the data is arranged in
ascending or descending order
Mode
The most repeated number in the data set
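As a quick check of the three definitions, the sketch below uses Python's standard `statistics` module on a small made-up list of ages (the values are hypothetical).

```python
import statistics

# Hypothetical ages (years) of nine patients.
ages = [20, 35, 35, 40, 28, 35, 50, 22, 41]

mean_age = statistics.mean(ages)      # sum of observations / number of observations
median_age = statistics.median(ages)  # middle value once the data are sorted
mode_age = statistics.mode(ages)      # most frequently repeated value

print(mean_age, median_age, mode_age)
```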

Measures Of Dispersion
• Range is defined as the difference in value between
the highest (maximum) and the lowest (minimum)
observation
• Variance quantifies the amount of variability or
spread about the mean of the sample.
• Standard deviation is the square root of the variance
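The three measures can be computed on a small made-up sample as follows. The blood pressure values are hypothetical, and the sample formulas (dividing by n - 1) are used here on the assumption that this is what a statistics package reports by default.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg).
sbp = [110, 120, 130, 140, 150]

data_range = max(sbp) - min(sbp)      # highest minus lowest observation
variance = statistics.variance(sbp)   # sample variance, divides by n - 1
sd = statistics.stdev(sbp)            # sample standard deviation

# The standard deviation is the square root of the variance.
assert math.isclose(sd, math.sqrt(variance))
print(data_range, variance, round(sd, 2))
```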

Standard Deviation
• The STANDARD DEVIATION is a measure,
which describes how much individual
measurements differ, on the average from the
mean.
• A large standard deviation shows that there is a
wide scatter of measured values around the
mean, while a small standard deviation shows
that the individual values are concentrated
around the mean with little variation among
them.

Skewness (Symmetry)
The term skewness refers to the lack of symmetry. The lack
of symmetry in a distribution is always determined with
reference to a normal distribution. Note that a normal
distribution is always symmetrical. Absence of skewness
makes a distribution symmetrical.
• Right skewness (+ve) (Mean>Median>Mode)
• Left skewness (-ve) (Mode>Median>Mean)

Skewness (continued)
Three types of distribution can be observed
from a graph:
• Symmetric distribution
• Positively skewed distribution
• Negatively skewed distribution

Skewness Cut-off
• If Skewness > 1 or Mean > Median > Mode,
the distribution is positively skewed.
• If Skewness < -1 or Mean < Median < Mode,
the distribution is negatively skewed.
• If -1 ≤ Skewness ≤ 1 or Mean = Median = Mode,
the distribution is approximately symmetric.
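The cut-offs above can be applied in code. The sketch below is an illustration only: it uses the bias-adjusted sample skewness formula (commonly attributed to SPSS and Excel; treat that match as an assumption), and both data sets are made-up.

```python
import statistics

def sample_skewness(values):
    """Bias-adjusted sample skewness; needs at least 3 observations."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def classify_skewness(skew):
    """Apply the +1 / -1 cut-offs."""
    if skew > 1:
        return "positively skewed"
    if skew < -1:
        return "negatively skewed"
    return "approximately symmetric"

symmetric = [1, 2, 3, 4, 5]            # hypothetical, evenly spread
right_tailed = [1, 1, 1, 2, 2, 3, 20]  # hypothetical, one large value on the right

for data in (symmetric, right_tailed):
    skew = sample_skewness(data)
    print(round(skew, 2), classify_skewness(skew))
```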

Kurtosis (Peakedness)
Kurtosis is the degree of peakedness of a
distribution, usually taken in relation to a normal
distribution.
• Leptokurtic
• Platykurtic
• Mesokurtic

Kurtosis
• A curve having a relatively higher peak than the normal
curve is known as Leptokurtic.
• On the other hand, if the curve is more flat-topped
than the normal curve, it is called Platykurtic.
• A normal curve itself is called Mesokurtic, which is
neither too peaked nor too flat-topped.

Measure of Kurtosis
• If Kurtosis > 1, the distribution is leptokurtic.
• If Kurtosis < -1, the distribution is platykurtic.
• If -1 ≤ Kurtosis ≤ 1,
the distribution is approximately normal (mesokurtic).
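These cut-offs only make sense for excess kurtosis, where a normal curve scores 0. The sketch below assumes the bias-adjusted sample formula for excess kurtosis (commonly attributed to SPSS and Excel's KURT; treat that match as an assumption) and uses two made-up data sets.

```python
import statistics

def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis; needs at least 4 observations."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

def classify_kurtosis(kurt):
    """Apply the +1 / -1 cut-offs."""
    if kurt > 1:
        return "leptokurtic"
    if kurt < -1:
        return "platykurtic"
    return "mesokurtic"

flat = list(range(1, 11))                  # hypothetical, evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]    # hypothetical, most values at the centre

for data in (flat, peaked):
    kurt = excess_kurtosis(data)
    print(round(kurt, 2), classify_kurtosis(kurt))
```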

Figure: Symmetric, right skewed and left skewed distributions

Inferential statistics
• Research hypothesis
• Null hypothesis = No association
• Alternate hypothesis = association
• Statistical significance = 0.05, 0.01, 0.001
• Confidence intervals
• Statistical power

Hypothesis Testing
• Null Hypothesis
Ho = No association b/w smoking & lung cancer
• Alternate Hypothesis
Ha = Statistical association b/w smoking & lung cancer
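• Significance level (cut-off for P value) = 0.05, 0.01 or 0.001
• P value = 0.003 < 0.05: association (reject Ho)
• P value = 0.45 > 0.05: no statistically significant association (fail to reject Ho)
• P value = 0.05
• P value = 0.000 in SPSS output is reported as < 0.001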

Confidence Interval
• A confidence interval is a range of values, calculated
from the sample, that is expected to contain the
population parameter with a stated level of confidence.
OR
• A confidence interval is a range of values,
bounded above and below the sample estimate,
that likely would contain an unknown
population parameter.

Confidence level
• Confidence level refers to the percentage of
confidence intervals that would contain the true
population parameter if you drew random samples
many times.
• Confidence intervals are most often constructed using
confidence levels of 95% or 99%.
• As the confidence level increases the width of the
confidence interval also increases. A larger
confidence level increases the chance that the
correct value will be found in the confidence
interval.

CI / CL & Sample Size
• The width of a confidence interval decreases as the
sample size increases and increases as the confidence
level increases.
Explanation:
• Larger samples give narrower intervals. We are able to
estimate a population proportion more precisely with a
larger sample size.
• As the confidence level increases the width of the
confidence interval also increases. A larger confidence
level increases the chance that the correct value will be
found in the confidence interval. This means that the
interval is larger.
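Both effects can be checked numerically. The sketch below is an illustration in Python under a normal approximation (mean ± z × SD / √n); the mean of 120, the SD of 15 and the sample sizes are made-up values.

```python
import math
from statistics import NormalDist

def ci_for_mean(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

def width(interval):
    return interval[1] - interval[0]

w_n25_95 = width(ci_for_mean(120, 15, 25, 0.95))
w_n100_95 = width(ci_for_mean(120, 15, 100, 0.95))  # larger sample
w_n25_99 = width(ci_for_mean(120, 15, 25, 0.99))    # higher confidence level

print(round(w_n25_95, 2), round(w_n100_95, 2), round(w_n25_99, 2))
```

Quadrupling the sample size halves the width, while moving from 95% to 99% confidence widens it.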

Statistical Power
• Statistical power, or the power of a hypothesis
test, is the probability that the test correctly
rejects the null hypothesis when it is false.
• The higher the statistical power for a given
experiment, the lower the probability of making
a Type II (false negative) error. That is, the higher
the probability of detecting an effect when there
is an effect. In fact, the power is precisely the
complement of the probability of a Type II error
(power = 1 - β).
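To make the link between power and the Type II error concrete, the sketch below computes power for a two-sided one-sample z test. This is a simplification chosen for illustration (it assumes the population SD is known), and the effect size, SD and sample sizes are made-up.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z test for a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd  # true shift in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power_n50 = z_test_power(effect=5, sd=15, n=50)
power_n100 = z_test_power(effect=5, sd=15, n=100)
beta_n50 = 1 - power_n50  # probability of a Type II (false negative) error

print(round(power_n50, 2), round(beta_n50, 2), round(power_n100, 2))
```

With no true effect the function returns alpha itself, which is the Type I error rate rather than power in the usual sense.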

P Value
• A p-value is the probability of observing a
difference at least as large as the one observed if
only random chance were at work (i.e. if the null
hypothesis were true).
• The lower the p-value, the greater the statistical
significance of the observed difference.
• A p-value less than 0.05 is conventionally
statistically significant. A p-value higher than
0.05 is not statistically significant: the data do not
provide enough evidence against the null hypothesis,
which is not the same as evidence that it is true.
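One way to see what a p-value measures is an exact permutation test, used here purely as an illustration (it is not one of the SPSS procedures in these slides). Every possible way of splitting the pooled data into two groups is tried, and the p-value is the share of splits whose mean difference is at least as large as the observed one. The two groups are made-up.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical scores in two groups of four.
p_separated = exact_permutation_p([1, 2, 3, 4], [10, 11, 12, 13])
p_identical = exact_permutation_p([1, 2, 3, 4], [1, 2, 3, 4])
print(round(p_separated, 4), p_identical)
```

Only 2 of the 70 possible splits are as extreme as the fully separated groups, so that p-value is small; for identical groups every split qualifies and the p-value is 1.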

SPSS
Introduction to SPSS /STATA
1. Variables entry
2. Data entry
3. Data import
4. Transformation of data
5. Cleaning of data

SPSS
Descriptive analysis
1. Descriptive analysis of categorical data
2. Descriptive analysis of scale data
3. Graphical presentation of categorical data
4. Graphical presentation of scale data
5. Normality of data

Types of tests
1. Parametric tests: (Data follow normal distribution)
• One Sample T test
• Independent Sample T test
• Paired T test
• One way ANOVA
• Correlation
• Regression
2. Non parametric tests: (Data don’t follow normal
distribution)
• Sign test
• Mann-Whitney U test
• Wilcoxon signed rank test
• Kruskal-Wallis test

SPSS
Comparison of means
1. Student T Test
2. Independent T test
3. Paired T test
4. ANOVA
5. Post Hoc test

SPSS
1. Chi square test
2. Fisher exact test
3. Correlation
4. Logistic Regression
5. Linear Regression

Student t test /One sample t test
Assumptions
• Compares the mean of a single variable with a
population parameter or a standard value
Analysis
• Analyze > Compare means > One Sample t test
Interpretation
• Mean difference + Confidence Interval + P-value
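The three quantities in the interpretation line can be reproduced by hand. The sketch below is an illustration in Python, not SPSS output: the readings and the reference value of 120 are made-up, and the t distribution is evaluated by simple numerical integration (the helper names are hypothetical).

```python
import math
import statistics

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Student's t density, integrated with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * area * h / 3)

def t_critical(df, alpha=0.05):
    """Critical t value found by bisection on the two-sided p-value."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(values, reference):
    n = len(values)
    mean_diff = statistics.mean(values) - reference
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t = mean_diff / se
    margin = t_critical(n - 1) * se
    return mean_diff, (mean_diff - margin, mean_diff + margin), t_two_sided_p(t, n - 1)

# Hypothetical systolic BP readings compared with a reference value of 120 mmHg.
mean_diff, ci, p_value = one_sample_t([110, 120, 130, 140, 150], 120)
print(mean_diff, [round(x, 2) for x in ci], round(p_value, 3))
```

Here the 95% confidence interval for the mean difference includes 0 and the p-value is above 0.05, so the two readings of the result agree.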

Independent T test
Assumptions
• Two independent groups
• Dependent variable continuous
• Independent variable categorical (dichotomous)
Analysis
• Analyze > Compare means > Independent sample t
test
Interpretation
• Mean difference + Confidence Interval + P-value
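A hand calculation of the independent samples t test might look like the sketch below. It assumes equal variances in the two groups (the pooled-variance version of the test), the two groups of readings are made-up, and the p-value helper is a hypothetical function that integrates the t density numerically.

```python
import math
import statistics

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Student's t density, integrated with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * area * h / 3)

def independent_t(group1, group2):
    """Pooled-variance t test for two independent groups."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = mean_diff / se
    df = n1 + n2 - 2
    return mean_diff, t, df, t_two_sided_p(t, df)

# Hypothetical systolic BP in two independent groups.
group_a = [110, 120, 130, 140, 150]
group_b = [100, 110, 120, 130, 140]
mean_diff, t, df, p_value = independent_t(group_a, group_b)
print(mean_diff, round(t, 2), df, round(p_value, 2))
```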

Paired t test
Assumptions
• Variable continuous
• Compares means of two related (paired) measurements
• Comparison of one group before and after
intervention
• Pre and post test
Analysis
• Analyze > Compare means > Paired Samples T- test
Interpretation
Mean difference + Confidence Interval + P-value
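The paired t test works on the within-person differences, which is why it suits pre and post designs. The sketch below computes the mean difference and the t statistic for made-up before / after readings; the p-value would then come from the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the pairwise differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / se, n - 1

# Hypothetical systolic BP of five patients before and after an intervention.
before = [140, 150, 160, 170, 180]
after = [135, 140, 155, 160, 170]
mean_diff, t, df = paired_t(before, after)
print(mean_diff, round(t, 3), df)
```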

One way ANOVA
Assumptions
• 1. Dependent variable continuous
• 2. Independent variable categorical (3 or more
categories)
Analysis
• Analyze > Compare means > One-Way ANOVA
Interpretation
• F statistic + P-value; mean differences between pairs of
groups + Confidence Intervals come from the Post Hoc test
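The F statistic behind one-way ANOVA compares the variation between group means with the variation within groups. The sketch below is an illustration with three made-up groups of scores; it stops at the F statistic and its degrees of freedom and does not compute the p-value.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far each observation lies from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical pain scores in three treatment groups.
f_stat, df_between, df_within = one_way_anova([1, 2, 3], [2, 3, 4], [6, 7, 8])
print(f_stat, df_between, df_within)
```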

Chi square test
Assumptions
• Dependent variable categorical (preferably dichotomous)
• Independent variable categorical
We can’t apply chi square in the following two situations
1. Zero in one of the expected cells
2. If the expected count is less than 5 in more than 20% of cells
• In both situations we go for Fisher’s Exact test
Analysis
• Analyze > Descriptive Statistics > Crosstabs > select variables in rows and
columns
• Click Statistics > check Chi-square > Continue
• Click Cells > Observed and Row percentages > Continue
• OK
Interpretation
• Significance level 0.05
• If P-value is less than 0.05 we reject the null hypothesis (significant)
• If P-value is greater than 0.05 we fail to reject the null hypothesis (non
significant)
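The expected counts, the check on small expected cells and the test itself can be sketched for a 2 x 2 table as follows. This is an illustration only: the counts are made-up, the statistic is the uncorrected Pearson chi square, and the p-value formula used holds only for 1 degree of freedom (a 2 x 2 table) with no zero row or column totals.

```python
import math

def chi_square_2x2(table):
    """Pearson chi square for a 2 x 2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # exact for 1 degree of freedom
    # Rule from the slide: more than 20% of expected cells below 5 -> Fisher's Exact test.
    small_cells = sum(e < 5 for row in expected for e in row)
    use_fisher = small_cells / 4 > 0.20
    return chi2, p_value, expected, use_fisher

# Hypothetical counts: rows = smoker / non smoker, columns = diseased / healthy.
observed = [[30, 20], [10, 40]]
chi2, p_value, expected, use_fisher = chi_square_2x2(observed)
print(round(chi2, 2), p_value < 0.05, expected, use_fisher)
```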

Correlation
• Both the dependent and independent variables are
continuous
• The correlation coefficient r measures the
strength and direction of a linear relationship
between two variables on a scatterplot.
• The value of r is always between +1 and –1.
• R² is the coefficient of determination and we write
it in %

R value +1 to -1
• r value between +1 to -1
• –1. A perfect negative linear relationship
• –0.70. A strong negative linear relationship
• –0.50. A moderate negative relationship
• –0.30. A weak negative linear relationship
• 0. No linear relationship
• +0.30. A weak positive linear relationship
• +0.50. A moderate positive relationship
• +0.70. A strong positive linear relationship
• +1. A perfect positive linear relationship
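For reference, r and the coefficient of determination can be computed directly from their definitions. The sketch below uses made-up paired values; Pearson's r is assumed, as that is the usual coefficient for two continuous variables.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared_percent = 100 * r ** 2  # coefficient of determination, written in %
print(round(r, 2), round(r_squared_percent), "%")
```

An r of about 0.77 here falls on the strong positive side of the scale above, and R² says roughly 60% of the variation in y is explained by x.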

Linear Regression
• 1. Dependent variable continuous
• 2. Independent variable continuous or categorical
• Uses
• To present linear relationship b/w variables
• To adjust for Confounders
• To predict one variable by knowing others

Regression
• Formula (Y = a + bx) (a = constant, b = coefficient)
• Linear regression gives us
• 1. a which is constant
• 2. b which is coefficient
• 3. P-value
• By putting values in formula we can predict one
variable by knowing others
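The constant a and coefficient b for a single predictor come from the least-squares formulas, and prediction is just substitution into Y = a + bx. The sketch below uses made-up data and covers only one independent variable; it does not compute the p-value.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates of a (constant) and b (coefficient) in Y = a + bx."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
predicted = a + b * 6  # predict Y for a new x value of 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```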

Logistic regression
• 1. Dependent variable categorical (dichotomous)
• 2. Independent variable continuous or categorical

Thank You
