How to choose the right statistical technique for different situations. This short presentation provides a compact summary of various statistical methods, both descriptive and inferential.
For further inquiries, please reach me at bodhiyawijaya@gmail.com
1. 19-Oct-18
1
Bodhiya Wijaya Mulya, S.Sos., M.M.
Research aims to discover phenomena that we
assume actually exist
Whatever phenomenon we want to explain, we
collect data from the real world to test our hypotheses
about that phenomenon
Hypothesis testing must involve statistical techniques
Statistics that describe numerical data
Frequency Distribution
Central Tendency
Variation
Data Shape
Easiest way to describe numerical data
Can be presented as a table or a graph
Toyota Quality Ratings
Rating          Frequency
Poor            2
Below Average   3
Average         5
Above Average   9
Excellent       1
Total           20
[Bar chart: Toyota Quality Ratings (x-axis: Rating, y-axis: Frequency)]
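A frequency distribution like the Toyota ratings on this slide can be tallied directly from raw responses. The following is a minimal sketch; the `responses` list is hypothetical raw data constructed to reproduce the counts shown on the slide.

```python
from collections import Counter

# Rating categories in their natural (ordinal) order.
CATEGORIES = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Hypothetical raw responses, built to match the slide's counts (2, 3, 5, 9, 1).
responses = (
    ["Poor"] * 2
    + ["Below Average"] * 3
    + ["Average"] * 5
    + ["Above Average"] * 9
    + ["Excellent"] * 1
)

# Count how often each rating occurs.
frequency = Counter(responses)

# Print the table in category order, followed by the total.
for category in CATEGORIES:
    print(f"{category:<15}{frequency[category]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```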
Mode:
The most common or frequently occurring number
Can be used with nominal, ordinal, interval and
ratio
Median:
The middle point of data. Also known as 50th
Percentile or Second Quartile
Can be used with ordinal, interval, and ratio
Mean:
Arithmetic average
The most widely used measure of central tendency
Can be used with interval and ratio
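The three measures above can be computed with Python's standard `statistics` module. This is a small sketch; the `scores` list is hypothetical interval-level data chosen only for illustration.

```python
import statistics

# Hypothetical interval-level scores, used only for illustration.
scores = [2, 3, 3, 5, 7]

mode = statistics.mode(scores)      # most frequently occurring value
median = statistics.median(scores)  # middle value of the sorted data
mean = statistics.mean(scores)      # arithmetic average

print("Mode:", mode)
print("Median:", median)
print("Mean:", mean)
```

The mean (4) sits above the median (3) here because the single high score of 7 pulls the average upward.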
Measures how the data spread and vary
Range
The maximum value minus the minimum
value
Easiest way to measure variability
Very sensitive to outliers
Percentile
Divides the data into 100 equal parts
Tells us the score at a specific position within the
distribution
Calculating a percentile is similar to calculating the median
Standard Deviation
Gives the “average distance” between all data points and the
mean
The most comprehensive and widely used measure of variation
Can be used only with interval/ratio data
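The range, percentiles, and standard deviation can be sketched with the standard `statistics` module. The `scores` list is hypothetical. Note that `statistics.quantiles` uses one particular interpolation rule (its default "exclusive" method), so other software may report slightly different percentile values.

```python
import statistics

# Hypothetical interval-level scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum value minus minimum value.
data_range = max(scores) - min(scores)

# Percentiles: n=100 returns the 99 cut points between the 100 parts.
cut_points = statistics.quantiles(scores, n=100)
p25, p50, p75 = cut_points[24], cut_points[49], cut_points[74]

# Standard deviation: population form divides by N, sample form by N - 1.
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print("Range:", data_range)
print("25th, 50th, 75th percentiles:", p25, p50, p75)
print("Population SD:", population_sd)
print("Sample SD:", round(sample_sd, 3))
```

The 50th percentile equals the median, which matches the slide's remark that the two are calculated in a similar way.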
Skewness
Kurtosis
An important measure of the shape of a distribution
Pearson coefficient of skewness (sk):
$$sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$
Symmetric:
The mean and median are equal and the data values are evenly spread around
these values
Pearson skewness is zero
Skewness reported by software is near zero
[Figure: three distribution shapes. Left-skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-skewed: Mode < Median < Mean.]
Positive/Right Skewed:
The mean is usually greater than the median
Pearson skewness: 0 < sk < 3
Skewness reported by software is positive
Negative/Left Skewed:
The median is usually greater than the mean
Pearson skewness: -3 < sk < 0
Skewness reported by software is negative
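Pearson's coefficient of skewness, 3 × (mean − median) / standard deviation, can be sketched as follows. The three data sets are hypothetical, and using the sample standard deviation is an assumption, since the slides do not say which form is intended.

```python
import statistics

def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample SD (assumption; the slides do not specify)
    return 3 * (mean - median) / sd

# Hypothetical data sets illustrating the three shapes.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # a few large values pull the mean up
left_skewed = [1, 8, 9, 9, 10, 10]   # a few small values pull the mean down

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name}: sk = {pearson_skewness(data):.2f}")
```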
A measure that provides information about the peakedness
of the distribution
Zero or near-zero kurtosis values (as reported by software) indicate that the
peakedness is close to that of a normal distribution
This shape is sometimes called mesokurtic
Positive kurtosis values indicate that the distribution is
rather peaked (many cases clustered in the
centre)
This shape is sometimes called leptokurtic
Negative kurtosis values indicate that the distribution
is relatively flat (too many
cases in the extremes)
This shape is sometimes called platykurtic
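The sign of the kurtosis value can be illustrated with a simple moment-based calculation. This is a sketch under stated assumptions: it computes excess kurtosis (fourth moment over squared variance, minus 3) without the small-sample corrections that statistical packages usually apply, so packaged software may report slightly different numbers. Both data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (zero for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data: most cases clustered in the centre, with two far-out values.
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]
# Hypothetical data: cases spread evenly across the whole range.
flat = list(range(1, 10))

print("Peaked (leptokurtic):", round(excess_kurtosis(peaked), 2))
print("Flat (platykurtic):", round(excess_kurtosis(flat), 2))
```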
In large samples, skewness and kurtosis values can be
misleading
It is better to inspect the shape of the distribution using a
histogram
Introduction to Social Science Statistics
Normality tests are used to determine whether a data set is
well modeled by a normal distribution
Normality tests help us decide which statistical
technique to use
Most inferential techniques assume normally
distributed data
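The slides do not name a specific normality test. As one illustration, the sketch below uses the Jarque-Bera test, chosen here only because it builds on skewness and kurtosis and needs nothing beyond the standard library; it is not taken from the slides. Its p-value is an asymptotic approximation and is unreliable for small samples. Both samples are hypothetical.

```python
import math
import statistics

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Under normality, JB is approximately chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-jb / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

# Hypothetical sample 1: 50 evenly spaced quantiles of a standard normal distribution.
normal_like = [statistics.NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
# Hypothetical sample 2: strongly right-skewed because of one extreme value.
skewed = [1] * 40 + [50]

for name, sample in [("normal-like", normal_like), ("skewed", skewed)]:
    jb, p = jarque_bera(sample)
    verdict = "consistent with normality" if p > 0.05 else "not normal"
    print(f"{name}: JB = {jb:.2f}, p = {p:.4f} ({verdict})")
```

A p-value above 0.05 means normality is not rejected, which points toward parametric techniques; a p-value below 0.05 points toward their non-parametric counterparts.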
Correlation
Pearson or Spearman correlation can be used to explore the
strength of the relationship between two continuous variables
Pearson is for normally distributed data, Spearman for data that are not
This test gives us the direction and strength of the relationship
Positive correlation: if one variable increases, the other
also increases
Negative correlation: if one variable increases, the other
decreases
Correlation
Lind, Marchal, and Wathen (2012, p. 465)
0 = No Correlation
0.10 – 0.49 = Weak Correlation
0.50 = Medium Correlation
0.51 – 0.99 = Strong Correlation
1 = Perfect Correlation
Correlation
Cohen (1988, pp. 79-81) in Pallant (2007, p. 132)
0.10 – 0.29 = Weak Correlation
0.30 – 0.49 = Medium Correlation
0.50 – 1.00 = Strong Correlation
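Both coefficients and Cohen's strength labels can be sketched in plain Python. The data (`hours`, `scores`) are hypothetical, Spearman's coefficient is computed as Pearson's r on average ranks, and the "negligible" label for values below 0.10 is an assumption, since Cohen's bands as quoted on the slide start at 0.10.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def cohen_strength(r):
    """Label the strength of a correlation using Cohen's (1988) bands."""
    size = abs(r)  # strength ignores the direction (sign)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "weak"
    return "negligible"  # assumption: not part of the bands quoted on the slide

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(f"Pearson r = {r:.3f} ({cohen_strength(r)})")
print(f"Spearman rho = {rho:.3f} ({cohen_strength(rho)})")
```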
Partial Correlation
Extension of correlation that allows us to control for another
variable
This test gives us a more accurate picture of the
relationship between two variables
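For a single control variable, the partial correlation can be obtained from the three pairwise correlations using the standard first-order formula. The sketch below assumes those pairwise values are already known; the numbers passed in are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z (first-order formula)."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations: x and y both correlate 0.5 with z.
controlled = partial_correlation(r_xy=0.5, r_xz=0.5, r_yz=0.5)
print(f"r_xy = 0.50, after controlling for z: {controlled:.2f}")
```

Here part of the original 0.50 relationship was due to the shared influence of z, so the controlled value is lower.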
Simple/Multiple Regression
A sophisticated extension of correlation
These tests are used when we want to explore the predictive ability of one or more
independent variables on one continuous dependent
variable
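Simple regression with one independent variable can be sketched using the ordinary least-squares formulas. The data (`hours`, `scores`) are hypothetical, and the sketch only fits the line; it does not compute significance tests or the model summary that statistical packages report.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predicted score for 6 hours of study

print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"Predicted score for 6 hours: {predicted:.2f}")
```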
Logistic Regression
Similar to multiple regression but uses a categorical dependent
variable
The dependent variable is preferably dichotomous
It does not require the independent
variables to be normally distributed, linearly related, or
of equal variance within each group
Minimum of 50 cases per independent variable
Factor Analysis
Not designed to test hypotheses
This technique is used extensively by researchers to develop
and evaluate tests or scales
Reduces and refines a large number of question items into a
smaller, more manageable set of factors
Example: What are the factors that make people religious?
Discriminant Analysis
Explores the predictive ability of a set of independent variables
on one categorical dependent variable
Used to differentiate groups based on several
predictors (independent variables)
The dependent variable can have more than two groups
Example: Differentiating students who fail, pass, or stand out
based on mid-exam and paper scores
Canonical Correlation
This test is used when we want to analyze the relationship between two sets of
variables
Usually used for exploratory research
Example: a set of variables measuring medical compliance
(willingness to buy drugs, to make return office visits, to use
drugs, to restrict activity) and a set of demographic
characteristics (educational level, religious affiliation,
income, medical insurance)
Structural Equation Modeling
A very sophisticated technique
Combines multiple regression with factor analysis
Evaluates how well a model fits our data
T-Test
Test for comparing the mean scores of two
groups
Comes in independent-samples, paired-samples, and one-sample versions for
different situations
Mann-Whitney U Test and Wilcoxon Signed Rank Test
Non-parametric versions of the t-test
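The independent-samples version of the t-test can be sketched as follows. This assumes equal variances in the two groups (the pooled-variance form) and returns only the t statistic and its degrees of freedom; obtaining a p-value requires the t distribution, which is left to a statistics package or a t table. Both groups are hypothetical data.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical scores for two separate groups of five people each.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t, df = independent_t(group_a, group_b)
print(f"t({df}) = {t:.2f}")
```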
One Way Analysis of Variance (One Way ANOVA)
Test for comparing the mean scores of two or
more groups
In ANOVA, the independent variable is the “group” and the dependent
variable is the score
Can be used for the same group (repeated measures) or different groups (between groups)
Kruskal-Wallis Test and Friedman Test
Non-parametric versions of one-way ANOVA
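For different (between-groups) samples, the one-way ANOVA F ratio compares the variation between group means with the variation within groups. The sketch below computes only the F statistic; the p-value, which requires the F distribution, is left to a statistics package. The three groups are hypothetical data.

```python
def one_way_anova_f(groups):
    """F ratio for a between-groups one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three separate groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_ratio = one_way_anova_f(groups)
print(f"F(2, 6) = {f_ratio:.2f}")
```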
Two Way Analysis of Variance (Two Way ANOVA)
Test for comparing mean scores across two
independent variables (grouping factors)
The independent variables are categorical, for example comparing
GPA across student enrollment group and gender
Can be used for the same group or different groups
Multivariate Analysis of Variance (MANOVA)
Test for comparing mean scores on several
dependent variables at once
Can be one-way or two-way
Example: Comparing leadership skills and GPA based on
the religion and gender of students
Analysis of Covariance (ANCOVA)
Test for comparing mean scores while statistically controlling for a
covariate to reduce error
Similar to partial correlation
Can be one-way, two-way, or multivariate
Example: Comparing leadership skills by gender
while controlling for GPA
Determine the research questions that you want to address
Find the questionnaire items and scales that you will
use to address these questions
Identify the nature of each of your variables
Draw a diagram for each of your research questions (if
possible)
Decide whether a parametric or a non-parametric
statistical technique is appropriate
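The final step, choosing between a parametric technique and its non-parametric counterpart, can be sketched as a simple lookup using the pairs named in this presentation. The design labels (such as `two_independent_groups`) are hypothetical names introduced only for this example.

```python
# Maps a research design to (parametric technique, non-parametric counterpart).
# The design labels are hypothetical; the technique names come from the slides.
TECHNIQUES = {
    "relationship_two_variables": ("Pearson Correlation", "Spearman Correlation"),
    "two_independent_groups": ("Independent-Samples T-Test", "Mann-Whitney U Test"),
    "two_related_groups": ("Paired-Samples T-Test", "Wilcoxon Signed Rank Test"),
    "three_plus_independent_groups": ("One Way ANOVA (between groups)", "Kruskal-Wallis Test"),
    "three_plus_related_groups": ("One Way ANOVA (repeated measures)", "Friedman Test"),
}

def choose_technique(design, parametric):
    """Return the technique for a design, given whether parametric assumptions hold."""
    parametric_name, non_parametric_name = TECHNIQUES[design]
    return parametric_name if parametric else non_parametric_name

# Example: two separate groups whose scores are not normally distributed.
print(choose_technique("two_independent_groups", parametric=False))
```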
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: Sage Publications Ltd.
Lind, D.A., Marchal, W.G., and Wathen, S.A. (2012). Statistical techniques in business and economics. New York: McGraw-Hill.
Pallant, J. (2007). SPSS survival manual. Berkshire: McGraw-Hill Open University Press.
Tabachnick, B.G. and Fidell, L.S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson Education.