The document discusses chi-square test and its properties. It defines chi-square as a non-parametric statistical test used for discrete data to test for independence and goodness of fit between observed and expected frequencies. The chi-square test has some key assumptions including independent random samples, nominal or ordinal level data, and no expected cell counts below 5. It is calculated by subtracting expected from observed frequencies, squaring the differences, and dividing by expected counts. The chi-square test can identify if there is a significant association between variables but does not measure the strength of the association.
2. One of the most useful and popular tools in social science
research is cross tabulation or joint contingency analysis, known in
statistics as the chi-square test. It is one of the simplest and most
widely used non-parametric tests in statistical work. It makes no
assumptions about the population being sampled.
According to Anthony Walsh,
'The chi-square is a test of significance that is used for discrete data
in the form of frequencies, percentages or proportions, known as
non-parametric statistics.'
According to Spiegel,
'A measure of the discrepancy existing between the observed and
expected frequencies is supplied by the χ² (chi-square)
statistic.'
3. The formula says that chi-square, or χ², is the sum we get if we:
1. Subtract fe from fo for each of the values.
2. Square each of the differences.
3. Divide each squared difference by fe.
4. Sum all of the answers.
[Statistics for Management, Richard I. Levin, David S. Rubin, 7th edition, p. 572]
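The steps above can be sketched in Python; the observed and expected frequencies below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical observed and expected frequencies for a 2x2 table,
# flattened into one list per the four cells.
observed = [30, 20, 20, 30]
expected = [25, 25, 25, 25]

def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

x2 = chi_square(observed, expected)
print(x2)  # 4.0
```

Each cell contributes (5)²/25 = 1, so the statistic here is 4.0; the resulting χ² would then be compared against the critical value for the appropriate degrees of freedom.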
Assumptions for the Use of Chi-square:
1. We have an independent random sample.
2. The data are at the nominal or ordinal level.
3. No expected cell frequency is less than 5.
4. The categories are mutually exclusive.
4. Significance of chi-square
The chi-square test has several applications. As a non-parametric
statistical test, chi-square may be used:
1. As a test of goodness of fit, and
2. As a test of independence.
3. The χ² test is simply a technique for judging the significance of
the measure of the degree of relation between two attributes.
Conditions of the Chi-square Test:
The experimental data (sample observations) must be
independent of each other.
The sample data must be drawn at random from the target
population.
The data should be expressed in original units.
There should not be fewer than five observations in any one cell.
5. Limitations of the chi-square test
The data must come from a random sample.
When applied to a fourfold table with one degree of freedom,
this test will not give a reliable result if the expected
value in any cell is less than 5.
This test tells the presence or absence of an association
between the events but does not measure the strength of the
association.
This test is applicable only when each individual observation in
the sample counts as the occurrence of one event in the sample
under consideration.
This test does not indicate cause and effect; it only tells the
probability that the association occurred by chance.
7. Advantages
Easy to calculate and evaluate.
Can be used on nominal data.
Can be applied in a broad area.
Can test association between variables.
Identifies the difference between observed and expected values.
8. Disadvantages
Cannot be used with percentages.
Data must be in numerical categories; categories with counts of 1 are
not good to compare.
The test becomes invalid if any of the expected values are below 5.
The number of observations must be more than 20.
Quite complicated.
9. Proportional reduction in error
PRE measures are derived by comparing the errors made in
predicting the dependent variable while ignoring the independent
variable with the errors made when the prediction uses
information about the independent variable. Not all measures of
association are PRE measures, but whenever possible we use PRE
measures, because they have a standard interpretation as
indicated by the conceptual formula for PRE measures.
According to Frankfort-Nachmias and Leon-Guerrero (2004:366),
'All PRE measures are based on comparing predictive
error levels that result from each of two methods of
prediction.'
10. Formula and Types
PRE = (E1 − E2) / E1
Where,
E1 = errors of prediction made when the
independent variable is ignored.
E2 = errors of prediction made when the prediction
is based on the independent variable.
The two most commonly used PRE measures of
association are:
1. Lambda (λ)
2. Gamma (γ)
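The PRE formula is straightforward to compute; in this sketch the error counts E1 = 60 and E2 = 45 are hypothetical, used only to show the interpretation:

```python
def pre(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1."""
    return (e1 - e2) / e1

# Hypothetical counts: 60 prediction errors when the independent
# variable is ignored, 45 errors when it is used.
print(pre(60, 45))  # 0.25
```

A value of 0.25 reads as "knowing the independent variable reduces our prediction errors by 25 percent."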
11. Lambda is defined as an asymmetrical measure of association that
is suitable for use with nominal variables. It may range from 0.0
to 1.0. Lambda provides us with an indication of the strength of the
relationship between the independent and dependent variables.
As an asymmetrical measure of association, lambda may
vary depending on which variable is considered the dependent
variable and which variable is considered the independent
variable. Lambda may range in value from 0.0 to 1.0. Zero
indicates that there is nothing to be gained by using the
independent variable to predict the dependent variable. In other
words, the independent variable does not, in any way, predict the
dependent variable. A lambda of 1.0 indicates that the
independent variable is a perfect predictor of the dependent
variable. That is, by using the independent variable as a predictor,
we can predict the dependent variable without any error.
12. Properties of Lambda
It is an asymmetrical statistic.
The value of lambda varies with which variable is
taken as dependent.
It is appropriate for nominal variables.
It involves an unrestrictive minimizing of errors.
It has a considerably larger numerical value.
It ranges from 0.0 to 1.0.
It does not give a direction of association.
13. Formula
There is no simple test of statistical
significance for lambda.
Lambda = (E1 − E2) / E1
To calculate lambda, we need two numbers:
E1 = errors of prediction made when the
independent variable is ignored.
E2 = errors of prediction made when the
prediction is based on the independent
variable.
14. Gamma is defined as a symmetrical measure of
association suitable for use with ordinal variables
or with dichotomous nominal variables.
It can vary from −1.0 to +1.0 and provides us
with an indication of the strength of the
relationship between two variables. Whereas
lambda is an asymmetrical measure of association,
gamma is a symmetrical measure of association.
This means that the value of gamma will be the
same regardless of which variable is considered
the independent variable.
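One standard way to compute gamma is from concordant and discordant pairs of observations, G = (C − D)/(C + D); this sketch assumes that definition, and the data pairs are hypothetical:

```python
# Goodman-Kruskal gamma from two ordinal variables.
def gamma(xs, ys):
    c = d = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx * dy > 0:
                c += 1   # concordant pair: same ordering on both variables
            elif dx * dy < 0:
                d += 1   # discordant pair: opposite ordering
            # tied pairs contribute to neither count
    return (c - d) / (c + d)

print(gamma([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (perfect agreement)
print(gamma([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (perfect inversion)
```

Because the pair counts do not depend on which variable is called independent, swapping the two argument lists gives the same value, which is the symmetry the slide describes.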
16. Origin:
This method of finding out co-variability, or the lack of
it, between two variables was developed by the British
psychologist Charles Edward Spearman in 1904. This
measure is especially useful when a quantitative measure of
certain factors cannot be fixed, but the individuals in the
group can be arranged in order, thereby obtaining for each
individual a number indicating his rank in the group.
In any event, the rank correlation coefficient is
applied to a set of ordinal rank numbers, with 1 for the
individual ranked first in quantity or quality, and so on to N for
the individual ranked last in a group of N individuals.
17. Formula of the rank correlation
coefficient:
R = 1 − 6ΣD² / N(N² − 1)
where R denotes the rank coefficient of correlation and D refers to the
difference of ranks between paired items in the two series. The
value of this coefficient also lies between +1 and −1.
When R is +1, there is complete agreement in the order of the ranks
and they are in the same direction.
When R is −1, there is complete agreement in the order of the ranks
and they are in opposite directions.
This will be clear from the following:
19. Case 1 (ΣD² = 0, N = 3):
R = 1 − 6ΣD² / N(N² − 1)
  = 1 − (6 × 0) / 3(3² − 1)
  = 1 − 0 / 24
  = 1
Case 2 (ΣD² = 8, N = 3):
R = 1 − 6ΣD² / N(N² − 1)
  = 1 − (6 × 8) / 3(3² − 1)
  = 1 − 48 / 24
  = 1 − 2
  = −1
20. In rank correlation, we may have two types of problems:
1. Where the actual ranks are given
2. Where ranks are not given
Calculation of the rank correlation coefficient:
Calculate the rank correlation coefficient for the
following data of marks on two tests given to candidates
for a clerical job.
Preliminary Test: 92, 89, 87, 86, 83, 77, 71, 63, 53, 50
Final Test: 86, 83, 91, 77, 68, 85, 52, 82, 37, 57
22. Now,
R = 1 − 6ΣD² / N(N² − 1)
  = 1 − (6 × 44) / 10(10² − 1)
  = 1 − 264 / 990
  = 1 − 0.267
  = 0.733
Here, there is a high degree of positive correlation between the
preliminary and the final tests.
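The calculation above can be checked programmatically. This sketch re-ranks the raw marks (rank 1 for the highest mark; the data have no ties) and applies the same formula:

```python
def ranks(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

prelim = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
print(round(spearman(prelim, final), 3))  # 0.733
```

The rank differences square and sum to ΣD² = 44, reproducing R = 1 − 264/990 ≈ 0.733.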
23. Merits of the rank method
This method is simpler to understand and easier to apply
compared to Karl Pearson's method.
It can be used where we are given the ranks and not the actual
data.
It can be used where the data are of a qualitative nature, like
honesty, efficiency, intelligence, etc.
Limitations:
It cannot be used for finding out correlation in a grouped
frequency distribution.
Where the number of observations exceeds 30, the
calculations become quite tedious and require a lot of time.
24. Difference between rho and r:
Spearman's rank correlation is used to find the
coefficient of correlation for qualitative data,
where Pearson's correlation cannot be applied.
Spearman's correlation cannot be used for a
bivariate frequency distribution, but Pearson's
correlation can be used in that case.
Spearman's coefficient of rank correlation is
slightly lower in quality than Pearson's
coefficient of correlation.
[Business Statistics, Md. Abdul Aziz]
25. A table that illustrates the relationship
between two variables by displaying the
distribution of one variable across the categories
of a second variable. Bivariate association can be
investigated by finding answers to three
questions:
1. Existence of the association.
2. Strength of the association.
3. Pattern or direction of the association.
27. Direct bivariate relationship:
When the variation in the dependent variable can
be attributed only to the independent variable, the
relationship is said to be direct.
Spurious bivariate relationship:
When a third variable affects both the
independent and dependent variables (think of the
firefighter example), the relationship is said to be
spurious.
Intervening bivariate relationship:
When the independent variable affects the dependent
variable only by way of a mediating variable (sort of like
a chain reaction), it is said to be an intervening
relationship.
Title.
Categories of the independent variable head the
tops of the columns.
Categories of the dependent variable label the
rows.
Order the categories of the two variables from lowest to
highest (from left to right across the columns; from
top to bottom along the rows).
Show totals at the foot of the columns.
29. Productivity by Job Satisfaction

Production (Y)        Job satisfaction (X)
                Low   Moderate   High   Total
Low             30    21         07     58
Moderate        20    25         18     63
High            10    15         27     52
Total           60    61         52     173
30. When the independent variable is the column
variable (as in this text and as is generally, but
not always, the case):
Calculate percentages within the columns
(vertically).
Compare percentages across the columns
(horizontally).
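A short sketch of the column-percentage rule, applied to the productivity-by-job-satisfaction counts shown earlier (rows are the dependent variable, columns the independent variable):

```python
# Cell counts from the productivity-by-job-satisfaction table,
# rows = production (low, moderate, high),
# columns = job satisfaction (low, moderate, high).
table = [
    [30, 21, 7],
    [20, 25, 18],
    [10, 15, 27],
]

def column_percentages(table):
    """Percentage of each cell within its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[round(100 * cell / col_totals[j], 1)
             for j, cell in enumerate(row)]
            for row in table]

for row in column_percentages(table):
    print(row)
```

Reading across the rows of the percentaged table (for example, 50.0% of the low-satisfaction column is low-productivity versus 13.5% of the high-satisfaction column) is the horizontal comparison the slide describes.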