19-Oct-18
Bodhiya Wijaya Mulya, S.Sos., M.M.
Research aims to discover phenomena that we assume actually exist.
Whatever phenomenon we want to explain, we collect data from the real world to test our hypotheses about it.
Hypothesis testing must involve statistical techniques.
Statistical Techniques
Descriptive Statistics
• collecting
• organizing
• summarizing
• presenting data
Inferential Statistics
• making inferences
• hypothesis testing
• determining relationships
• making predictions
Statistics that describe numerical data:
• Frequency Distribution
• Central Tendency
• Variation
• Data Shape
Frequency Distribution
• The easiest way to describe numerical data
• Can be presented in a table or a graph
Rating          Frequency
Poor            2
Below Average   3
Average         5
Above Average   9
Excellent       1
Total           20
[Figure: bar chart "Toyota Quality Ratings"; x-axis: Rating (Poor, Below Average, Average, Above Average, Excellent); y-axis: Frequency (1 to 10)]
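The frequency table above can be reproduced with a short Python sketch. The raw list `ratings` is hypothetical: it is constructed only so that its counts match the table, since the slide does not show the individual responses.

```python
from collections import Counter

# Hypothetical raw responses, built to match the slide's frequency table.
ratings = (["Poor"] * 2 + ["Below Average"] * 3 + ["Average"] * 5
           + ["Above Average"] * 9 + ["Excellent"] * 1)

# Ordinal order of the rating scale, lowest to highest.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how often each rating occurs.
frequency = Counter(ratings)

for level in order:
    # One '#' per response gives a crude text version of the bar chart.
    print(f"{level:<14} {frequency[level]:>2} {'#' * frequency[level]}")
print(f"{'Total':<14} {sum(frequency.values()):>2}")
```

The same counts serve both presentations mentioned on the slide: the printed numbers form the table and the `#` bars stand in for the graph.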
Mode:
• The most common (most frequently occurring) value
• Can be used with nominal, ordinal, interval, and ratio data
Median:
• The middle point of the ordered data, also known as the 50th percentile or second quartile
• Can be used with ordinal, interval, and ratio data
Mean:
• The arithmetic average
• The most widely used measure of central tendency
• Can be used with interval and ratio data
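As a minimal illustration of the three measures, the sketch below uses Python's standard `statistics` module. The values in `scores` and `colours` are made up for the example.

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 14]  # hypothetical interval-level scores

mode = statistics.mode(scores)      # most frequent value: 3
median = statistics.median(scores)  # middle of the sorted data: 5
mean = statistics.mean(scores)      # 42 / 7 = 6

# The mode also works for nominal data, where median and mean are undefined.
colours = ["red", "blue", "red", "green"]  # hypothetical nominal data
modal_colour = statistics.mode(colours)    # "red"

print(mode, median, mean, modal_colour)
```

Note how the single large score (14) pulls the mean above the median, while the mode ignores it entirely.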
Measuring how data spread and vary
Range
• The difference between the maximum value and the minimum value
• The easiest way to measure variability
• Very sensitive to outliers
Percentile
• Divides the ordered data into 100 equal parts
• Tells us the score at a specific position within the distribution
• Calculated in a similar way to the median
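The sketch below shows the range and percentiles on made-up data, using `statistics.quantiles` from the Python standard library. The `inclusive` interpolation method is one of several conventions and is chosen here as an assumption; other software may report slightly different percentile values.

```python
import statistics

data = [4, 8, 15, 16, 23, 42, 50]  # hypothetical scores

# Range: maximum minus minimum.
value_range = max(data) - min(data)  # 50 - 4 = 46

# 99 cut points that split the ordered data into 100 parts;
# index k - 1 holds the k-th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p50 = percentiles[49]  # 50th percentile, equal to the median here
p90 = percentiles[89]  # 90th percentile

# A single outlier changes the range dramatically.
with_outlier = data + [500]
outlier_range = max(with_outlier) - min(with_outlier)  # 500 - 4 = 496

print(value_range, p50, p90, outlier_range)
```

Adding one extreme value moves the range from 46 to 496, which is why the range is described as very sensitive to outliers.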
Standard Deviation
• Gives the "average distance" between the data values and the mean
• The most comprehensive and widely used measure of variation
• Can be used only with interval/ratio data
• Together with the mean, it locates a score within the distribution (for example, as a number of standard deviations from the mean)
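A small sketch of the standard deviation, assuming made-up scores and treating them as a whole population (`statistics.pstdev`). For a sample, `statistics.stdev` divides by n - 1 instead and gives a slightly larger value.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical interval-level scores

mean = statistics.mean(scores)  # 40 / 8 = 5
sd = statistics.pstdev(scores)  # population standard deviation: 2.0


def z_score(value, mean, sd):
    """Distance of a score from the mean, in standard deviations."""
    return (value - mean) / sd


# A score of 9 lies two standard deviations above the mean.
print(mean, sd, z_score(9, mean, sd))
```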
Skewness
Kurtosis
An important measure of the shape of a distribution
Pearson coefficient of skewness (sk):
$sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
Symmetric:
• The mean and median are equal, and the data values are evenly spread around them
• The Pearson coefficient of skewness is zero
• The skewness value reported by software is near zero
[Figure: three distribution shapes. Left-skewed: Mean, Median, Mode (left to right). Symmetric: Mean = Median = Mode. Right-skewed: Mode, Median, Mean (left to right).]
Positive/Right Skewed:
• The mean will usually be greater than the median
• The Pearson coefficient of skewness lies in 0 < sk < 3
• The skewness value reported by software is positive
Negative/Left Skewed:
• The median will usually be greater than the mean
• The Pearson coefficient of skewness lies in -3 < sk < 0
• The skewness value reported by software is negative
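The Pearson coefficient of skewness can be computed directly from its definition. The sketch below is illustrative only: the function name `pearson_skewness` and the three data sets are made up, and the sample standard deviation (`statistics.stdev`) is assumed for the denominator.

```python
import statistics


def pearson_skewness(data):
    """Pearson coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (assumption)
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]                      # mean = median = 3
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 13]   # long tail to the right
left_skewed = [-x for x in right_skewed]         # mirror image, tail to the left

print(pearson_skewness(symmetric))     # zero
print(pearson_skewness(right_skewed))  # positive
print(pearson_skewness(left_skewed))   # negative
```

In the right-skewed set the single large value (13) pulls the mean to 4 while the median stays at 3, which is what produces the positive coefficient.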
Kurtosis provides information about the peakedness of the distribution
Zero or near-zero kurtosis values (as reported by software) indicate peakedness similar to that of a normal distribution
Such a distribution is sometimes called mesokurtic
Positive kurtosis values indicate that the distribution is rather peaked (many cases clustered in the centre)
Such a distribution is sometimes called leptokurtic
Negative kurtosis values indicate that the distribution is relatively flat (too many cases in the extremes)
Such a distribution is sometimes called platykurtic
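To make the sign of kurtosis concrete, here is a minimal sketch of the moment-based excess kurtosis (fourth standardized moment minus 3, so that a normal distribution scores about zero). The function name and data are hypothetical, and statistical packages usually apply a small-sample correction, so their values will differ somewhat from this simple version.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


peaked = [5] * 8 + [1, 9]   # most cases clustered in the centre
flat = list(range(1, 11))   # cases spread evenly from 1 to 10

print(excess_kurtosis(peaked))  # positive (leptokurtic)
print(excess_kurtosis(flat))    # negative (platykurtic)
```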
In large samples, skewness and kurtosis values can be misleading, so also inspect the shape of the distribution using a histogram
Introduction to Social Science Statistics
Normality tests are used to determine whether a data set is well modeled by a normal distribution
Normality tests help us determine which statistical technique we should use
Most inferential techniques require a normal distribution
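The slides do not name a specific normality test. As one possible illustration, the sketch below implements the Jarque-Bera test, which combines skewness and excess kurtosis into a single statistic; this choice is an assumption, not the author's stated method. The test is asymptotic, so its p-value is only approximate for small samples. Both data sets are synthetic: evenly spaced quantiles of a normal and of an exponential distribution.

```python
import math
from statistics import NormalDist


def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for the Jarque-Bera normality test."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    # JB is approximately chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


n = 200
probs = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]  # synthetic normal-shaped data
skewed = [-math.log(1 - p) for p in probs]              # synthetic right-skewed data

for name, sample in [("normal-like", normal_like), ("skewed", skewed)]:
    jb, p = jarque_bera(sample)
    verdict = "no evidence against normality" if p > 0.05 else "normality rejected"
    print(f"{name}: JB = {jb:.2f}, p = {p:.4f} ({verdict})")
```

A non-significant result (p > 0.05) supports using parametric techniques, while a significant result points toward their non-parametric alternatives.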
Test of Relationship
Test of Difference
Correlation
• Pearson or Spearman correlation can be used to explore the strength of the relationship between two continuous variables
• Use Pearson for normally distributed data and Spearman for data that are not normally distributed
• The test gives us the direction and strength of the relationship
• Positive correlation: if one variable increases, the other also increases
• Negative correlation: if one variable increases, the other decreases
Correlation
• Lind, Marchal, and Wathen (2012, p. 465)
0 = No Correlation
0.10 – 0.49 = Weak Correlation
0.5 = Medium Correlation
0.51 – 0.99 = Strong Correlation
1 = Perfect Correlation
Correlation
• Cohen (1988, pp. 79–81) in Pallant (2007, p. 132)
0.10 – 0.29 = Weak Correlation
0.30 – 0.49 = Medium Correlation
0.50 – 1 = Strong Correlation
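Both coefficients can be sketched in plain Python. The helper names (`pearson_r`, `ranks`, `spearman_rho`, `cohen_label`) and the data are hypothetical; Spearman's coefficient is computed here as the Pearson correlation of the ranks, and `cohen_label` applies the Cohen thresholds listed above to the absolute value of the coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def cohen_label(r):
    """Strength label using the Cohen (1988) thresholds quoted on the slide."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "weak"
    return "below the weak threshold"


hours = [1, 2, 3, 4, 5]      # hypothetical study hours
scores = [1, 4, 9, 16, 25]   # hypothetical scores, rising but not linearly

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(f"Pearson r = {r:.3f} ({cohen_label(r)}), Spearman rho = {rho:.3f}")
```

Because the relationship is perfectly monotonic but curved, Spearman's rho reaches 1 while Pearson's r stays slightly below it.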
Partial Correlation
• An extension of correlation that allows us to control for another variable
• The test gives us a more accurate picture of the relationship between two variables
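For a single control variable, the partial correlation can be obtained from the three pairwise correlations. The sketch below uses that standard first-order formula; the function name and the correlation values are made up for illustration.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z (first-order formula)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations: x and y look moderately related (0.5),
# but both are also related to a third variable z.
controlled = partial_r(r_xy=0.5, r_xz=0.6, r_yz=0.7)
print(f"{controlled:.3f}")  # about 0.140
```

Here most of the apparent relationship between x and y disappears once z is held constant, which is the "more accurate picture" the technique is meant to provide.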
Simple/Multiple Regression
• A sophisticated extension of correlation
• Used when we want to explore the ability of one or more independent variables to predict one continuous dependent variable
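A minimal sketch of simple (one-predictor) regression using ordinary least squares, written out by hand so that each step is visible. The function name and the data are hypothetical; multiple regression follows the same idea with several predictors and is normally fitted with statistical software.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical continuous dependent variable

intercept, slope = fit_line(x, y)  # intercept 2.2, slope 0.6
predicted = intercept + slope * 6  # prediction for a new case with x = 6
print(f"y = {intercept:.1f} + {slope:.1f}x, prediction at x = 6: {predicted:.1f}")
```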
Logistic Regression
• Similar to multiple regression, but with a categorical dependent variable
• The dependent variable should preferably be dichotomous
• It does not require the independent variables to be normally distributed, linearly related, or of equal variance within each group
• A minimum of 50 cases per independent variable is recommended
Factor Analysis
• Not designed to test hypotheses
• Used extensively by researchers to develop and evaluate tests or scales
• Reduces and refines a large number of question items into a smaller, more manageable set
• Example: What are the factors that make people religious?
Discriminant Analysis
• Explores the predictive ability of a set of independent variables on one categorical dependent variable
• Used to differentiate groups based on several predictors (independent variables)
• The dependent variable can have more than two groups
• Example: Differentiating failing, passing, and outstanding students based on mid-exam and paper scores
Canonical Correlation
• Used when we want to analyze the relationship between two sets of variables
• Usually used for exploratory research
• Example: a set of variables measuring medical compliance (willingness to buy drugs, to make return office visits, to use drugs, to restrict activity) and a set of demographic characteristics (educational level, religious affiliation, income, medical insurance)
Structural Equation Modeling
• A very sophisticated technique
• Combines multiple regression with factor analysis
• Evaluates how well a model fits our data
T-Test
• Compares the mean scores of two groups
• Comes in independent-samples, paired-samples, and one-sample versions for different situations
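As an illustration of the independent-samples version, the sketch below computes the t statistic with a pooled variance, which assumes the two groups have equal variances. The function name and the two groups are hypothetical, and the result still has to be compared against a critical value or converted to a p-value.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


treatment = [5, 6, 7, 8, 9]  # hypothetical scores, mean 7
control = [1, 2, 3, 4, 5]    # hypothetical scores, mean 3

t, df = independent_t(treatment, control)
print(f"t = {t:.2f}, df = {df}")  # t = 4.00, df = 8
```

With 8 degrees of freedom the two-tailed critical value at the 5% level is about 2.306, so a t of 4.00 would indicate a significant difference between the two means.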
Mann-Whitney U Test and Wilcoxon Signed Rank Test
• Non-parametric versions of the independent-samples and paired-samples T-Test, respectively
One Way Analysis of Variance (One Way ANOVA)
• Compares the mean scores of two or more groups
• In ANOVA, the independent variable is the "group" and the dependent variable is the score
• Can be used for the same group (repeated measures) or different groups (between groups)
Kruskal-Wallis Test and Friedman Test
• Non-parametric versions of the between-groups and repeated-measures One Way ANOVA, respectively
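The between-groups One Way ANOVA boils down to a ratio of two variance estimates. The sketch below computes that F ratio by hand for three made-up groups; the function name is hypothetical, and obtaining a p-value from F requires the F distribution, which is left to statistical software.

```python
def one_way_anova_f(groups):
    """F ratio for a between-groups one-way ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variation of the scores around their own group mean.
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")  # F(2, 6) = 13.00
```

A large F means the group means differ by more than the spread inside the groups would explain.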
Two Way Analysis of Variance (Two Way ANOVA)
• Compares mean scores across two independent (grouping) variables
• The independent variables are categorical, for example comparing GPA across student enrollment group and gender
• Can be used for the same group or different groups
Multivariate Analysis of Variance (MANOVA)
• Compares mean scores on several dependent variables
• Can be one way or two way
• Example: Comparing leadership skills and GPA based on the religion and gender of students
Analysis of Covariance (ANCOVA)
• Compares mean scores while statistically controlling for a covariate, to minimize error
• Similar in idea to partial correlation
• Can be one way, two way, or multivariate
• Example: Comparing leadership skills based on gender while controlling for GPA
Decide on the research questions that you want to address
Find the questionnaire items and scales that you will use to address these questions
Identify the nature of each of your variables
Draw a diagram for each of your research questions (if possible)
Decide whether a parametric or a non-parametric statistical technique is appropriate
• Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: Sage Publications Ltd.
• Lind, D. A., Marchal, W. G., and Wathen, S. A. (2012). Statistical techniques in business and economics. New York: McGraw-Hill.
• Pallant, J. (2007). SPSS survival manual. Berkshire: McGraw-Hill Open University Press.
• Tabachnick, B. G. and Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson Education.

Choosing the Right Statistical Techniques
