Research Methodology
Chapter 12
Quantitative Data Analysis: Hypothesis
Testing
Type I Errors, Type II
Errors & Statistical Power
Type I error (α): the probability of rejecting
the null hypothesis when it
is actually true.

Type II error (β): the probability of failing to
reject the null hypothesis
given that the alternative
hypothesis is actually true.
Statistical power (1 - β): the probability
of correctly rejecting the null
hypothesis.

Statistical power depends on:
• alpha (α), the significance level
• sample size
• effect size
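
How power responds to alpha, sample size, and effect size can be sketched numerically. The example below is only an illustration under assumptions not in the slides: it uses a two-sided one-sample z-test with known standard deviation, and the function name `z_test_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized difference (true mean - null mean) / sigma.
    """
    z = NormalDist()                     # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = effect_size * sqrt(n)        # mean of the test statistic under H1
    # Probability that the statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Power for a medium effect (0.5) at several sample sizes
for n in (10, 30, 100):
    print(n, round(z_test_power(0.5, n), 3))
```

With the effect size set to zero the function returns alpha, which is the Type I error rate; raising alpha, the sample size, or the effect size raises power.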
Testing Hypotheses on a Single Mean

• One-sample t-test: a statistical
technique used to test the
hypothesis that the mean of the
population from which a sample is
drawn is equal to a comparison
standard.
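
A minimal sketch of the one-sample t statistic follows. The scores and the comparison standard of 50 are made-up values for illustration, and the critical value is taken from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, comparison_standard):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 denominator
    t = (mean(sample) - comparison_standard) / standard_error
    return t, n - 1


# Hypothetical scores tested against a comparison standard of 50
scores = [52, 48, 55, 51, 49, 53, 50, 54]
t_stat, df = one_sample_t(scores, 50)

# Two-tailed critical value for alpha = 0.05 and 7 degrees of freedom (t table)
T_CRITICAL = 2.365
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, reject_null)
```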
Testing hypotheses about two
related means
• Paired samples t-test: used to examine the differences
in the same group before and after a treatment.
• The Wilcoxon signed-rank test: a nonparametric test for examining significant
differences between two related samples or
repeated measurements on a single sample.
It is used as an alternative to the paired samples t-test when the population cannot be assumed to
be normally distributed.
RESEARCH METHODOLOGY OF TEN STUDENTS
IN THE FIRST WEEK AND LAST WEEK OF
SEMESTER
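
Both tests for two related samples can be sketched with the standard library. The before and after scores below are hypothetical and are not the data from the slide; the function names are also illustrative. The Wilcoxon function returns only the rank sums, which would then be compared with a table or a normal approximation.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and degrees of freedom computed on the paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def wilcoxon_signed_rank(before, after):
    """Sums of ranks of the positive and negative differences (W+, W-)."""
    # Zero differences are dropped, as in the usual form of the test
    diffs = sorted((a - b for b, a in zip(before, after) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2     # mean of ranks i+1 .. j for ties
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical scores of ten students in the first and last week
first_week = [60, 55, 70, 65, 58, 62, 75, 68, 59, 64]
last_week = [66, 58, 69, 72, 63, 70, 79, 68, 61, 74]

t_stat, df = paired_t(first_week, last_week)
w_plus, w_minus = wilcoxon_signed_rank(first_week, last_week)
print(round(t_stat, 3), df, w_plus, w_minus)
```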
Testing hypotheses about two related
means

• McNemar's test: a nonparametric method used on
nominal data. It assesses the significance of the
difference between two dependent samples when
the variable of interest is dichotomous. It is used
primarily in before-after studies to test for an
experimental effect.
Performance of students before and after
extra class
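
McNemar's test uses only the discordant pairs, that is, the students whose outcome changed. The sketch below assumes the uncorrected chi-square form of the statistic with 1 degree of freedom; the counts are invented for illustration.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and its p-value.

    b: pairs that changed from fail to pass
    c: pairs that changed from pass to fail
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 15 students improved after the extra class, 5 got worse
chi2, p_value = mcnemar(15, 5)
print(chi2, round(p_value, 4))
```

Pairs whose outcome did not change (pass both times or fail both times) do not enter the statistic.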
Testing hypotheses about two unrelated
means

• Independent samples t-test: used to see
if there are any significant differences in
the means of two groups on the variable
of interest.
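
A minimal sketch of the independent samples t statistic, assuming equal variances in the two groups (the pooled-variance form). The two groups of scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical scores for two unrelated groups
group_a = [5, 7, 8, 6, 9]
group_b = [3, 4, 6, 5, 2]
t_stat, df = independent_t(group_a, group_b)

# Two-tailed critical value for alpha = 0.05 and 8 degrees of freedom (t table)
T_CRITICAL = 2.306
print(round(t_stat, 3), df, abs(t_stat) > T_CRITICAL)
```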
Testing hypotheses about several
means

• Analysis of variance (ANOVA) helps to examine
whether there are significant mean differences among more than
two groups on an interval or ratio-scaled
dependent variable.
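
The F statistic of a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below uses three small hypothetical groups; the function name is illustrative.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return F, between-groups df, and within-groups df."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three hypothetical groups measured on an interval-scaled variable
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3), df_between, df_within)
```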
Regression Analysis

• Simple regression analysis is used in a
situation where one metric
independent variable is hypothesized
to affect one metric dependent
variable.
Scatter plot
[Figure: scatter plot of LKLHD_DATE (vertical axis, 20 to 100) against PHYS_ATTR (horizontal axis, 30 to 90)]
Simple Linear Regression
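Population regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

Fitted regression line:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope.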
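
The least-squares estimates of the intercept and slope in a simple linear regression can be computed directly from the data. The sketch below uses a tiny made-up data set; the function name `fit_simple_regression` is hypothetical.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for Y = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for one metric independent and one metric dependent variable
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_regression(x_values, y_values)

# Predicted value of Y for a new X
predicted = b0 + b1 * 6
print(round(b0, 3), round(b1, 3), round(predicted, 3))
```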
Standardized regression coefficients

• Standardized regression coefficients, or beta
coefficients, are the estimates resulting from a
multiple regression analysis performed on
variables that have been standardized. This is
usually done to allow the researcher to compare
the relative effects of the independent variables on
the dependent variable when the independent
variables are measured in different units of
measurement.
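
To keep the illustration short, the sketch below uses a single predictor rather than a multiple regression, which is a simplification of the case described above. It shows that fitting the regression on standardized (z-scored) variables gives the same coefficient as rescaling the unstandardized slope by the ratio of standard deviations. The data are hypothetical.

```python
from statistics import mean, stdev


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

b = slope(x_values, y_values)                              # unstandardized
beta_rescaled = b * stdev(x_values) / stdev(y_values)      # rescaled slope
beta_from_z = slope(standardize(x_values), standardize(y_values))
print(round(b, 4), round(beta_rescaled, 4), round(beta_from_z, 4))
```

Because the beta coefficient is unit-free, it does not change if the predictor is expressed in different units, which is what makes coefficients comparable across predictors.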
Regression with dummy
variables
• A dummy variable (also known as an
indicator variable, design variable,
categorical variable, binary variable, or
qualitative variable) is a variable coded 0 or 1
to indicate the absence or presence of a category.
• Dummy variables allow us to use nominal or
ordinal variables as independent variables
to explain, understand, or predict the
dependent variable.
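
A nominal variable with k categories is usually represented by k - 1 dummy variables, with the omitted category serving as the reference group. The sketch below assumes this coding scheme; the variable (mode of transport) and the function name are made up for illustration.

```python
def dummy_code(values, reference):
    """Code a nominal variable as 0/1 dummies, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(value == level) for level in levels}
            for value in values]


# Hypothetical nominal variable with three categories; "bus" is the reference
transport = ["bus", "car", "bike", "car"]
coded = dummy_code(transport, reference="bus")
for original, row in zip(transport, coded):
    print(original, row)
```

Each dummy's regression coefficient is then read as the difference between that category and the reference category.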
MULTICOLLINEARITY
• An often encountered statistical phenomenon in which two or more independent
variables in a multiple regression model are highly correlated.
• It makes the estimates of the regression coefficients unreliable and, in the
extreme case of perfect collinearity, impossible to compute.
• To detect multicollinearity, check the correlation matrix of the
independent variables.
• High correlations are a first sign of sizeable multicollinearity.
TWO MEASURES:
• Tolerance value
• Variance inflation factor (VIF)
Both measures indicate the degree to which one independent variable is explained
by the other independent variables.
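
Tolerance is 1 - R², where R² comes from regressing one independent variable on the others, and VIF is the reciprocal of tolerance. The sketch below covers only the simplest case of two independent variables, where R² reduces to the squared correlation between them. The data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF when the model has exactly two independent variables."""
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance


# Two hypothetical, correlated independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
tolerance, vif = tolerance_and_vif(x1, x2)
print(round(tolerance, 3), round(vif, 3))
```

With more than two independent variables, R² has to come from a regression of each independent variable on all the others, which statistical packages report directly.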
A display of the FEV data in SPSS
• To fit a multiple linear regression model in SPSS using the FEV
data, do the following:
• Choose Analyze > Regression > Linear, then move forced
expiratory volume into the Dependent box and Smoke and age
into the Independent(s) box. Then click OK.
• This will give you the model summary table, the ANOVA table,
and the regression coefficients table in the output window.
A demonstration of how to start fitting the multiple
regression model in SPSS
A demonstration of how to select the dependent and
independent variable(s) for fitting multiple regression in SPSS.
A demonstration of how to select diagnostic statistic for
checking outliers and
multicollinearity issues in SPSS.
Multicollinearity is not a serious problem when the purpose of the
study is prediction: although the estimates of the regression
coefficients may be unstable, the forecasts remain reliable.
But when the objective of the study is to reliably estimate the
individual regression coefficients, multicollinearity is a
problem.
Methods to Reduce Multicollinearity
• Reduce the set of independent variables to a set that is not
collinear.
• Use more sophisticated ways to analyze the data, such as
ridge regression.
• Create a new variable that is a composite of the highly
correlated variables.
Testing moderation using regression
analysis: interaction effects
• An interaction effect means that the effect of one variable (X1) on Y depends on the value of
another variable (X2).
• A moderating variable is a variable that modifies the original
relationship between an independent variable and the dependent
variable.
Example:
H1: The students’ judgement of the university’s library is
affected by the students’ judgement of the computers.
H2: The relationship between the judgement of computers in the
library and the judgement of the library is moderated by computer ownership.
- That is, the relationship between the judgement of computers
in the library and the judgement of the library is affected by
computer ownership.
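
When the moderator is coded 0/1, the coefficient of the product term X1·X2 in the model Y = b0 + b1·X1 + b2·X2 + b3·X1·X2 equals the difference between the slopes of Y on X1 in the two groups. The sketch below uses that equivalence instead of fitting the full model; the ratings are invented, and a significance test of the interaction would still be needed in practice.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical ratings: judgement of computers (X1) and of the library (Y),
# split by the moderator X2 = computer ownership (1 = owner, 0 = non-owner)
computers_owners = [1, 2, 3, 4]
library_owners = [2, 3, 4, 5]
computers_non_owners = [1, 2, 3, 4]
library_non_owners = [1, 3, 5, 7]

slope_owners = slope(computers_owners, library_owners)
slope_non_owners = slope(computers_non_owners, library_non_owners)

# Interaction coefficient b3: how much the slope changes when X2 goes from 0 to 1
interaction = slope_owners - slope_non_owners
print(slope_owners, slope_non_owners, interaction)
```

A nonzero interaction in this made-up data means that the judgement of the computers matters less for owners than for non-owners, which is the pattern H2 describes.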
Other multivariate tests and
analysis
• Discriminant analysis
• Logistic regression
• Conjoint analysis
• Two-way ANOVA
• MANOVA
• Canonical correlation
Other multivariate tests and
analysis
• Discriminant analysis
- helps to identify the IVs that discriminate a
nominally scaled DV of interest.
Other multivariate tests and
analysis
• Logistic regression
- used when the DV is nonmetric
- typically used when the DV has only two
groups
- allows the researcher to predict a discrete
outcome
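
A logistic regression with one predictor and a two-group DV can be sketched with plain gradient ascent on the log-likelihood. This is only an illustration: statistical packages use more refined fitting methods, and the data (hours studied and pass/fail), the learning rate, and the number of steps are all assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate intercept and slope by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)   # observed minus predicted
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: hours studied and whether the student passed (1) or not (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)


def predict_pass(x, cutoff=0.5):
    """Predicted group membership: 1 if the estimated probability exceeds the cutoff."""
    return int(sigmoid(b0 + b1 * x) > cutoff)


print(predict_pass(1), predict_pass(8))
```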
Other multivariate tests and
analysis
• Conjoint analysis
- a statistical technique used in many fields
- used to understand how consumers develop
preferences for products and services
- built on the idea that consumers evaluate
the value of a product or service by
combining the value that is provided by each
attribute
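
The idea of combining the values of individual attributes is commonly modelled as a sum of part-worths, one per attribute level. The sketch below assumes that additive form; the attributes, levels, and part-worth values are invented, and in a real study they would be estimated from respondents' ratings or choices.

```python
# Hypothetical part-worths (utilities) for each level of each attribute
part_worths = {
    "brand": {"A": 0.8, "B": 0.2},
    "price": {"low": 1.0, "high": -0.5},
}


def total_utility(profile):
    """Add up the part-worths of the attribute levels in a product profile."""
    return sum(part_worths[attribute][level]
               for attribute, level in profile.items())


profiles = [
    {"brand": "A", "price": "high"},
    {"brand": "B", "price": "low"},
]
# The profile with the highest total utility is the predicted preference
preferred = max(profiles, key=total_utility)
print(preferred)
```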
Other multivariate tests and
analysis
• Two-way ANOVA
- used to examine the effect of two nonmetric
IVs on a single metric DV
- enables us to examine main effects and
also interaction effects that exist
between the independent variables
Other multivariate tests and
analysis
• Two-way ANOVA
- example
DV: satisfaction with toy
IV: i) toy colour (pink & blue)
ii) gender (male & female)
• Main effect of toy colour: pink toys give significantly more
satisfaction than blue toys.
• Main effect of gender: females are more satisfied with the
toy than males.
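
The main effects and the interaction in a 2 × 2 design can be read from the cell means. The sketch below is descriptive only (it computes mean differences, not the F-tests of a two-way ANOVA), assumes equal cell sizes, and uses invented satisfaction scores.

```python
from statistics import mean

# Hypothetical satisfaction scores for each (toy colour, gender) cell
cells = {
    ("pink", "female"): [8, 9],
    ("pink", "male"): [6, 7],
    ("blue", "female"): [6, 7],
    ("blue", "male"): [4, 5],
}
cell_means = {key: mean(scores) for key, scores in cells.items()}


def marginal_mean(factor_index, level):
    """Mean of the cell means for one level of one factor (equal cell sizes)."""
    return mean(m for key, m in cell_means.items() if key[factor_index] == level)


# Main effects: difference between the marginal means of each factor
colour_effect = marginal_mean(0, "pink") - marginal_mean(0, "blue")
gender_effect = marginal_mean(1, "female") - marginal_mean(1, "male")

# Interaction: does the colour difference depend on gender?
interaction = ((cell_means[("pink", "female")] - cell_means[("blue", "female")])
               - (cell_means[("pink", "male")] - cell_means[("blue", "male")]))
print(colour_effect, gender_effect, interaction)
```

In this made-up data both main effects are present and the interaction contrast is zero, so the colour difference is the same for both genders.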
Other multivariate tests and
analysis
• Multivariate Analysis of Variance
(MANOVA)
- a multivariate extension of analysis of
variance
- the IVs are measured on a nominal scale and the
DVs on an interval/ratio scale
i) The null hypothesis:
H0: µ1 = µ2 = µ3 = ... = µn
ii) The alternate hypothesis:
HA: µ1 ≠ µ2 ≠ µ3 ≠ ... ≠ µn (that is, not all group means are equal)
Other multivariate tests and
analysis
• Canonical correlation
- examines the relationship between two or
more DVs and several IVs
Data warehousing

• Most companies are now aware of the benefits of
creating a data warehouse that serves as the central
repository of all data collected from disparate
sources, including those pertaining to the company's
finance, manufacturing, sales, and the like.
Data Mining

• Complementary to the functions of data
warehousing, many companies resort to data
mining as a strategic tool for reaching new levels of
business intelligence.
• Using algorithms to analyze data in a meaningful
way, data mining more effectively leverages the
data warehouse by identifying hidden relations and
patterns in the data stored in it.
Operations Research

• Operations research (OR) or management science
(MS) is another sophisticated tool used to simplify
and thus clarify certain types of complex problems
that lend themselves to quantification.
