Kinds of variable
The independent variable:
It is the factor that is measured, manipulated or selected by the experimenter to determine its
relationship to an observed phenomenon. It is a stimulus variable or input that operates within a person or
within his environment to affect behavior. An independent variable may be called a factor, and its
variations are called levels.
The dependent variable:
The dependent variable is a response variable or output. The dependent variable is the factor that is
observed and measured to determine the effect of the independent variable; it is the factor that appears,
disappears, or varies as the researcher introduces, removes, or varies the independent variables.
Moderator variable:
It is the factor that is measured, manipulated or selected by the experimenter to discover whether it
modifies the relationship of the independent variable to an observed phenomenon. The term moderator
variable describes a special type of independent variable, a secondary independent variable selected to
determine if it affects the relationship between the study's primary independent variable and its
dependent variable.
Control variable:
Control variables are factors controlled by the experimenter to cancel out or neutralize any effect they
might otherwise have on the observed phenomenon. A single study cannot examine all of the variables in a
situation (situational variables) or in a person (dispositional variables); some must be neutralized to
guarantee that they will not exert differential or moderating effects on the relationship between the
independent variables and dependent variables.
Intervening variable:
An intervening variable is the factor that theoretically affects the observed phenomenon but cannot be seen,
measured, or manipulated; its effects must be inferred from the effects of the independent and
moderator variables on the observed phenomenon.
Consider the hypothesis:
“Among students of the same age and intelligence, skill performance is directly related to the number of
practice trials, the relationship being particularly strong among boys, but also holding, though less
directly, among girls.” This hypothesis, which indicates that practice increases learning, involves several
variables.
Independent variable: number of practice trials
Dependent variable: skill performance
Control variables: age, intelligence
Moderator variable: gender
Intervening variable: learning
Causes                   Relationship             Effects
Independent variables
Moderator variables      Intervening variables    Dependent variables
Control variables
Steps in data processing
Raw data: interviews, questionnaires, observation, interview guide, secondary sources
Editing
Coding: developing a code book, pre-testing the code book, coding the data, verifying the coded data
Analysis: developing a frame of analysis, then analysis (by computer or manual)
Data
There are two types of data: qualitative data and quantitative data.
Qualitative data is further divided into nominal data and ordinal data.
Quantitative data is further divided into interval data and ratio data.
Nominal data
As is obvious from the name, nominal means “to give names”. In the social sciences, qualitative characteristics cannot
be measured on a numerical scale. To work with this type of data, the observations are named and sorted into categories. This named or
categorized data is called nominal data.
Ordinal data
The term ordinal means “ordered or arranged”. This data is arranged in order,
ranking individuals as more than or less than one another. After the data has been sorted into
categories, it is then ordered or arranged to get the desired result. Although ordinal measurement may
require more difficult processes, it gives more informative, precise data.
Interval data
Interval data are scores expressed in units of equal-appearing magnitude. Interval data can be added and subtracted
but cannot be meaningfully multiplied or divided, because the scale has no true zero.
Ratio Data
It has a true zero value, that is, a point that represents the complete absence of the measured
characteristic, so ratios are comparable at different points on the scale. It is much more frequently used in the
physical sciences than in the behavioral sciences.
For example, 9 ohms indicates three times the resistance of 3 ohms, while 6 ohms stands in the same
ratio to 2 ohms.
For example:
Listed below are the scores of a group of students on a mid-semester English test.
64, 61, 56, 51, 52, 34, 64, 31, 31, 31, 59, 61, 34, 59, 51, 38, 38, 38, 36, 36.
How many students received a score of 36?
Did most of the students receive a score above 50?
It is difficult to make any sense out of unordered data like this. We must put it into some sort of
order. One of the most common ways to do this is to prepare a frequency distribution. This is done by
listing the scores in rank order from high to low, with tallies to indicate the number of subjects receiving each score.
Often the scores in a distribution are grouped into intervals. This results in a grouped frequency distribution.
Example of a frequency distribution
Raw score    Frequency
   64            2
   61            2
   59            2
   56            1
   52            1
   51            2
   38            3
   36            2
   34            2
   31            3
             --------
             N = 20
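The tallying step can be sketched in a few lines of Python. This is only an illustration using the standard library's `Counter`; the variable names are chosen here and are not part of the notes. It rebuilds the table from the raw scores and answers the two questions asked earlier.

```python
from collections import Counter

# Raw scores from the mid-semester English test.
scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]

# Count how many students received each score.
frequency = Counter(scores)

# List the scores in rank order from high to low, as in the table.
print("Raw score  Frequency")
for score in sorted(frequency, reverse=True):
    print(f"{score:>9}  {frequency[score]:>9}")
print("N =", sum(frequency.values()))

# The two questions asked about the raw data.
students_with_36 = frequency[36]
students_above_50 = sum(1 for s in scores if s > 50)
print("Students scoring 36:", students_with_36)
print("Students scoring above 50:", students_above_50, "of", len(scores))
```

Two students scored 36, and exactly half of the group (10 of 20) scored above 50, so "most" would be an overstatement.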
Table of a Grouped Frequency Distribution
Raw score (intervals of five)    Frequency
60 – 64                              4
55 – 59                              3
50 – 54                              3
45 – 49                              0
40 – 44                              0
35 – 39                              5
30 – 34                              5
                                 --------
                                 N = 20
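Grouping into intervals can be sketched the same way. The snippet below is a minimal illustration, assuming intervals of width five that start at multiples of five (30, 35, and so on), which matches the table; the names are illustrative only.

```python
from collections import Counter

scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]
interval_width = 5

# Map each score to the lower limit of its interval (for example 61 -> 60).
grouped = Counter((s // interval_width) * interval_width for s in scores)

lowest = (min(scores) // interval_width) * interval_width
highest = (max(scores) // interval_width) * interval_width

# Print every interval from high to low, including the empty ones.
for lower in range(highest, lowest - 1, -interval_width):
    upper = lower + interval_width - 1
    print(f"{lower} - {upper}: {grouped[lower]}")
print("N =", sum(grouped.values()))
```

A `Counter` returns 0 for intervals that contain no scores, which is why 45 to 49 and 40 to 44 still appear in the output.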
[Figure: Frequency Polygon]
Average
It enables a researcher to summarize the data in a frequency distribution with a single number.
It is of three kinds:
Mode
Median
Mean
The mode
The mode is the most frequent score in a distribution, that is, the score attained by more students than any
other score. For example, in the distribution 25, 20, 19, 17, 16, 16, 13, 12, the mode is 16.
What about this distribution?
25, 20, 19, 19, 17, 16, 16, 12, 11.
This distribution has two modes, 19 and 16. Hence it is called a bimodal distribution. The mode does
not tell us very much about a distribution, so it is not often used in educational research.
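As a quick check of the two examples, the mode can be computed with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every score tied for the highest frequency.

```python
from statistics import multimode

one_mode = [25, 20, 19, 17, 16, 16, 13, 12]
two_modes = [25, 20, 19, 19, 17, 16, 16, 12, 11]

# multimode returns a list of all the most frequent values.
print(multimode(one_mode))   # [16]
print(multimode(two_modes))  # [19, 16], a bimodal distribution
```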
Median (the mid point)
The median is the point below and above which 50% of the scores in a distribution fall. In the distribution
1, 2, 3, 4, 5, the median is 3. If the number of scores in a distribution is even, then the median is the point
halfway between the two middle scores. In the distribution
2, 4, 6, 8, 10, 12,
the median is 7.
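Both cases, an odd and an even number of scores, can be verified with `statistics.median` from the Python standard library. This is a small sketch that reuses the two distributions given here.

```python
from statistics import median

odd_count = [1, 2, 3, 4, 5]
even_count = [2, 4, 6, 8, 10, 12]

# With an odd number of scores the median is the middle score.
print(median(odd_count))   # 3

# With an even number it is halfway between the two middle scores (6 and 8).
print(median(even_count))  # 7.0
```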
The Mean
It is determined by adding up all of the scores and then dividing this sum by the total number of scores:
$\bar{X} = \frac{\sum X}{n}$
where $\sum X$ is the sum of the scores, X represents any raw score value, n represents the total number of
scores and $\bar{X}$ represents the mean. Each of the averages summarizes the data in a single value.
Sometimes, however, the researcher cannot get the required results from the data by using an average alone,
because a single value describes only the overall behavior of the data, which can lead to
confusion and ambiguity. Researchers therefore also need measures that describe the spread or variability that exists within a
distribution, that is, how far the scores deviate from the center. There are several ways to calculate this.
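Applied to the twenty English test scores listed earlier, the calculation of the mean looks like this. The sketch computes it by hand and with `statistics.mean`; both give the same value.

```python
from statistics import mean

scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]

# Sum of all scores divided by the number of scores.
total = sum(scores)         # 925
n = len(scores)             # 20
mean_by_hand = total / n    # 46.25

print(mean_by_hand, mean(scores))
```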
1. Measures of Data Variability:
Knowing the measures of central tendency (mean, median, and mode) isn't enough. We also need a method for
determining how closely the data are clustered around the center point(s).
The most typical measures of data variability:
– Range,
– Variance, and
– Standard Deviation.
Range:
• Simplest measure of variability.
• Calculated by subtracting the smallest measurement from the largest measurement.
• It is not a good measure of variability, because it uses only the two extreme values: if two data sets
have the same range, it does not mean that their spread is the same.
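A small sketch of why equal ranges do not imply equal spread. The two data sets below are invented for illustration and are not from the notes.

```python
from statistics import stdev

# Two hypothetical data sets with the same smallest and largest values.
clustered = [1, 5, 5, 5, 5, 9]
scattered = [1, 2, 4, 6, 8, 9]

def value_range(data):
    """Largest measurement minus smallest measurement."""
    return max(data) - min(data)

print(value_range(clustered), value_range(scattered))  # both 8
print(stdev(clustered), stdev(scattered))              # the spreads differ
```

Both ranges are 8, yet the second set is more spread out around its mean, which only a measure such as the standard deviation reveals.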
Variance:
• It is the sum of the squared deviations from the mean divided by (n-1) for a sample, and is
denoted by $s^2$: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$.
• Similarly, for a population it is the sum of the squared deviations from the population mean $\mu$ divided by N, and is
denoted by $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$.
Note: Deviations are squared so that negative and positive differences do not cancel each other out.
Standard Deviation:
• Because the variance is expressed in squared units (i.e. “units squared”), it is hard to interpret directly; taking the positive
square root of the variance provides a measure in the same units as the data itself (i.e. “units”).
– Sample Standard Deviation: s
– Population Standard Deviation: $\sigma$
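The difference between the sample and population versions is only the divisor. The following sketch computes both by hand on the distribution 2, 4, 6, 8, 10, 12 (reused here purely as sample data) and compares the results with the `statistics` module.

```python
from math import sqrt
from statistics import variance, pvariance, stdev, pstdev

data = [2, 4, 6, 8, 10, 12]
n = len(data)
mean = sum(data) / n                                  # 7.0

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)       # 70.0

sample_variance = sum_sq_dev / (n - 1)                # 14.0
population_variance = sum_sq_dev / n                  # about 11.67

# Standard deviation: positive square root of the variance.
sample_sd = sqrt(sample_variance)
population_sd = sqrt(population_variance)

print(sample_variance, variance(data))
print(population_variance, pvariance(data))
print(sample_sd, stdev(data))
print(population_sd, pstdev(data))
```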
Application of mean & standard deviation to observe the behavior of the data
• Data can be standardized using the mean & standard deviation. Thus, for a single data set,
variability can be discussed in terms of how many members of the data set fall within one, two,
three, or more standard deviations of the mean.
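As an illustration, the sketch below counts how many of the twenty English test scores listed earlier fall within one, two and three standard deviations of their mean. It uses the sample standard deviation; that choice is an assumption made here, not something the notes specify.

```python
from statistics import mean, stdev

scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]

m = mean(scores)    # 46.25
s = stdev(scores)   # sample standard deviation, roughly 12.5

def count_within(k):
    """Number of scores lying within k standard deviations of the mean."""
    return sum(1 for x in scores if abs(x - m) <= k * s)

for k in (1, 2, 3):
    print(f"within {k} SD: {count_within(k)} of {len(scores)}")
```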
Standard Score:
It uses a common scale to indicate how an individual compares to other individuals in a group. These
scores are particularly helpful in comparing an individual's relative position. The two standard scores
most frequently used in educational research are:
1. Z-score
2. T-score
Z – Score
The simplest form of standard score is the Z-score. It expresses how far a raw score is from the mean
in standard deviation units. A big advantage of Z-scores is that they allow raw scores on different
tests to be compared.
Researchers use a formula to convert a raw score into a Z-score:
$Z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}}$
For example, a student received raw scores of 60 on a biology test and 80 on a chemistry test. A naïve
observer might be inclined to infer that the student was doing better in chemistry than in biology. But
this might be unwise, for how well the student is doing comparatively cannot be determined until we know
the mean and standard deviation for each distribution of scores. Let us suppose the mean is 50 in
biology and 90 in chemistry. Also assume the standard deviation is 5 in biology and 10 in
chemistry.
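What does this tell us?
The comparison of raw scores and Z scores on the two tests:
Test         Raw score   Mean   SD   Z score   Percentile rank
Biology         60        50     5     +2            98
Chemistry       80        90    10     -1            16
Relative to the other students, the student did better in biology (two standard deviations above the mean)
than in chemistry (one standard deviation below the mean), despite the higher raw score in chemistry.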
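The table can be reproduced with a short sketch. Two things here are assumptions rather than part of the notes: the percentile ranks are derived from a normal distribution (via `statistics.NormalDist`), and the T-score uses the common convention T = 50 + 10Z, which the notes name but do not define.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def percentile_rank(z):
    """Percentage of scores below z, assuming a normal distribution."""
    return NormalDist().cdf(z) * 100

def t_score(z):
    """T-score under the usual convention (mean 50, SD 10); an assumption here."""
    return 50 + 10 * z

# (raw score, mean, standard deviation) for each test in the example.
tests = {"Biology": (60, 50, 5), "Chemistry": (80, 90, 10)}

for name, (raw, mean, sd) in tests.items():
    z = z_score(raw, mean, sd)
    print(f"{name}: Z = {z:+.0f}, percentile rank = {percentile_rank(z):.0f}, "
          f"T = {t_score(z):.0f}")
```

A Z-score of +2 corresponds to roughly the 98th percentile and a Z-score of -1 to roughly the 16th, matching the table.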
Probability and Z score.
Probability:
It refers to the likelihood of an event occurring, expressed as a percentage stated in decimal form. For example,
if an event will occur 25 percent of the time, this event can be said to have a
probability of .25.
Hypothesis:
There are two kinds of hypothesis. One is the predicted outcome of the study, called the research
hypothesis, whereas the null hypothesis is the assumption that there is no relationship between the
variables in the population.
Correlational analysis:
It shows the existing relationship between variables, with no manipulation of the variables. It is also
used to analyze data containing two variables, as well as to examine the reliability and validity of the data
collection procedure.
Types of correlation:
High positive correlation:
(When the variables increase or decrease together)
Low correlation:
(When there is little or no relationship between the variables)
Negative correlation:
(When one variable increases as the other decreases)
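To make the sign and size of a correlation concrete, here is a minimal sketch of the Pearson correlation coefficient computed by hand. The choice of Pearson's r and the small data sets are assumptions made for illustration; the notes do not name a specific coefficient.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)

# Hypothetical data: hours of practice, test scores, and number of errors.
hours = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 70]   # rises with practice
errors = [20, 17, 15, 11, 8]         # falls with practice

print(pearson_r(hours, test_scores))  # close to +1: high positive correlation
print(pearson_r(hours, errors))       # close to -1: negative correlation
```

Values near zero would indicate a low correlation, that is, little or no linear relationship.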
When the researcher wants to make inferences to the population, he will have to examine the
statistical significance of the correlations.
Statistical significance can be determined only if the correlations have been obtained from randomly selected
samples.
The significance of a correlation depends on:
The size of the correlation
The size of the sample
The level of significance is very important since it relates directly to whether the null hypothesis is
rejected or not.
Multivariate analysis:
Like correlational analysis, it examines relationships between variables, but among more than two variables at a time.
Common techniques include:
Multiple regression
Factor analysis
Multiple regression:
Through multiple regression it is possible to examine the relationship between one or
more independent variables and the dependent variable, and their power to predict it. It shows which variables contribute significantly
to explaining the variance in the dependent variable, and how much they contribute.
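A rough sketch of what multiple regression computes, using only the Python standard library. It fits least-squares coefficients by solving the normal equations and reports R squared, the proportion of variance in the dependent variable that is explained. The data are invented so that y = 10 + 2·x1 + 3·x2 exactly; real analyses would normally use a statistics package, and this sketch does not test the significance of each coefficient.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(predictors, y):
    """Return [intercept, b1, b2, ...] minimizing the squared prediction error."""
    rows = [[1.0] + list(p) for p in predictors]     # add the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(predictors, y, coefs):
    """Proportion of the variance in y explained by the fitted equation."""
    predicted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], p))
                 for p in predictors]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, predicted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: two independent variables (x1, x2) and one dependent variable.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [18, 17, 28, 27, 35]

coefs = multiple_regression(predictors, y)
print(coefs)                             # approximately [10, 2, 3]
print(r_squared(predictors, y, coefs))   # approximately 1
```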
Discriminant analysis:
It examines which combination of variables distinguishes between two or more categories of the dependent variable.
Factor analysis:
In factor analysis, independent variables are not related to dependent variables as in regression; rather, the analysis operates
within a number of independent variables, without any need for dependent variables. In factor
analysis the interrelationships between and among the variables of the data are examined in an attempt
to find out how many independent dimensions can be identified in the data. It thus provides
information on the characteristics of the variables. This type of analysis is based on the assumption
that variables measuring the same factor will be highly related, whereas variables measuring different
factors will have low correlations with one another.
Inferential Techniques:
Different designs call for different methods of analysis. A statistical technique
appropriate for quantitative data will generally be inappropriate for categorical data.
Types of Inferential Techniques:
There are two types of inferential techniques that a researcher uses:
Parametric techniques
Non-parametric techniques
Parametric
It is most appropriate for interval data. It makes various kinds of assumptions about the nature of the
population from which the samples involved in the research study are drawn. Parametric techniques are generally more
powerful than non-parametric techniques because they are more likely to reveal a true difference or relationship if one really
exists.
Non-Parametric
It is most appropriate for nominal and ordinal data. It makes few assumptions about the nature of
the population from which the samples are taken.
T-Test:
It is used to compare the means of two groups. The t-test is used to determine the probability that
the difference between the two groups of subjects reflects a real difference rather than chance variation in
the data.
Types:
T-test for independent means:
It is used to compare the mean scores of two different, independent groups.
T-test for correlated means:
It is used to compare the mean scores of the same group before and after a treatment of some sort is
given, to see if any observed gain is significant, or when the research design involves two matched
groups.
The result of a t-test provides the researcher with a t-value.
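For the correlated-means case, the t-value is the mean of the before/after differences divided by its standard error. The sketch below uses invented scores for five students; the data and names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t-value and degrees of freedom for a t-test for correlated means."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

# Hypothetical scores of the same five students before and after a treatment.
before = [10, 12, 9, 14, 11]
after = [13, 14, 10, 17, 13]

t_value, df = paired_t(before, after)
print(f"t = {t_value:.2f}, df = {df}")   # t = 5.88, df = 4
```

The t-value is then compared with the critical value for the chosen level of significance and degrees of freedom.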
Example
A researcher is comparing the performance of two randomly selected groups learning French by
two different methods. The experimental group learns with the aid of a computer while the control group
is taught by the teacher. The researcher investigates the effects of computer practice on students'
achievement in French. After three months both groups take an achievement test.
The researcher uses a t-test to examine whether there are differences in the achievements of the two
groups.
To carry out the test, the researcher first needs descriptive statistics for each group (experimental and control):
the mean $\bar{X}$, the standard deviation (SD) and the sample size N.
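Given those three descriptive statistics for each group, the t-value for independent means can be sketched as follows. The numbers are invented for illustration, and the formula shown is the pooled-variance version, which assumes the two groups have roughly equal variances; that is an assumption made here, not something the notes state.

```python
from math import sqrt

def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-value and degrees of freedom for two independent groups (pooled variance)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Hypothetical results on the French achievement test.
t_value, df = independent_t(mean1=78, sd1=8, n1=30,     # experimental (computer)
                            mean2=72, sd2=10, n2=30)    # control (teacher)
print(f"t = {t_value:.2f}, df = {df}")                  # t = 2.57, df = 58
```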
ANOVA (one-way analysis of variance):
One-way analysis of variance is used to examine the differences among more than two groups.
The analysis is performed on the variances of the groups, focusing on whether the variability between
the groups is greater than the variability within the groups. The F value is the ratio of the between-groups
variance to the within-groups variance:
$F = \frac{\text{between-group variance}}{\text{within-group variance}}$
If the variability between the groups is sufficiently greater than the variability within the groups, then the F value is
significant and the researcher can reject the null hypothesis. If the reverse is true, then the F value is not
significant.
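The F ratio can be computed directly from its definition. The sketch below uses three small invented groups; the names and data are hypothetical.

```python
def one_way_anova_f(groups):
    """F value: between-group variance divided by within-group variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of the scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups taught by different methods.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value = one_way_anova_f(groups)
print(f_value)   # 12.0
```

Here the between-group variance is 12 and the within-group variance is 1, so F = 12; whether that is significant depends on the critical F value for 2 and 6 degrees of freedom at the chosen level of significance.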
Chi-Square (Non-Parametric Technique)
The chi-square test allows analysis of one, two or more nominal variables. It is based on the comparison
between expected frequencies and actual, obtained frequencies.
Example
A researcher might want to compare how many male and female teachers favor a new curriculum to
be instituted in a particular school district. He asks a sample of 50 teachers if they favor or oppose the new
curriculum. If they do not differ significantly in their responses, then we would expect that about the
same proportion of males and females would be in favor of (or opposed to) instituting the curriculum.
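The comparison of expected and obtained frequencies can be sketched as follows. The notes give only the sample size of 50, so the split of responses below is invented for illustration.

```python
def chi_square(observed):
    """Chi-square statistic for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obtained in enumerate(row):
            # Frequency expected if the row and column variables are unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obtained - expected) ** 2 / expected
    return statistic

# Hypothetical responses of 50 teachers: [favor, oppose]
observed = [[15, 10],   # male teachers
            [10, 15]]   # female teachers

chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 2.00, df = 1
```

With 1 degree of freedom the critical value at the .05 level is 3.84, so for these invented counts the difference between male and female teachers would not be significant.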
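Degrees of freedom
The number of scores in a distribution that are free to vary, that is, that are not fixed. For example, if the
mean of five scores is known, only four of the scores are free to vary; the fifth is fixed by the other four.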