Module 8: Statistical Methods and Testing in
Curriculum and Instruction
Module Code: TECS: 624
Credit hours: 2
Prof. Omprakash H M
Department of Curriculum and Instruction
College of Education and Behavioral Sciences
Bule Hora University, Adola, Ethiopia
1. Introduction to statistics in curriculum and
Instruction
1.1 The definition of statistics and other related
terms
1.2 Descriptive statistics
1.3 Inferential statistics
1.4 Function and significance of statistics in
education
1.5 Types and levels of measurement scale
1.1 The definition of statistics and
other related terms
Statistics is:
 The scientific study of the handling of quantitative information.
 Concerned with the scientific methods for
collecting, organizing, summarizing,
presenting and analyzing data as well as
deriving valid conclusions and making
reasonable decisions on the basis of this
analysis.
 The systematic collection of numerical data
and its interpretation.
The knowledge of statistics enables us to:
o Present our results in a summarized, more meaningful and convenient form.
o Draw generalizations and make predictions.
o In short, it helps us to condense (e.g. tabulation and classification), compare (e.g. grand totals, percentages, mean, median, range, variance, etc.), forecast (e.g. regression analysis), estimate results from the data under consideration and test hypotheses.
I. Variable
o It refers to a property or character of an object or event that can take on different values.
Example: Sex, age, grade, height, weight,
intelligence, attitude, depression, etc.
We dichotomize the concept of a variable
in terms of :
a) Independent variable
o The variable which is manipulated by the
researcher
o It is the variable that the experimenter controls (e.g. teaching material, motivation, stress, confidence, etc.)
b) Dependent Variable
o Varies as a result of the independent variable
o It is the result (data) of the study
Example: Achievement scores, self-esteem scores,
etc.
II. Data
o The measurements/observations recorded about
the individuals.
o There are two types of data. These are:
a) Numerical/Quantitative/Measurement Data/Variables
b) Categorical/Qualitative/Frequency Data/Variables
a) Numerical/Quantitative/Measurement Variable/Data
o Is classified according to some characteristics and obtained by
counting or measurements.
Example: Grades on a test, weight, height, self-esteem scores,
number of children, etc.
o We are interested in ‘how much’ of some property a particular
object represents.
There are two types of numerical/ quantitative
data/variable. These are:
1) Discrete data/variables
o The data/variables which can take on integer values/whole numbers only (0, 1, 2, 3, ...) and are usually obtained by counting.
o These are variables that have only a countable number of distinct possible values.
Example: the no. of students in a class, the no. of households, cows in a village, etc.
2. Continuous Data/Variables
o The data which can take on any value in a set of real
numbers.
Example: length, weight, temperature and age, self-esteem
scores, etc.
b) Categorical/ Qualitative/ Frequency Data/Variables
o Are classified according to some attributes
Example. Gender/sex (male/female), marital status (single,
married, divorced, widowed), hair colors (brown, yellow, red,
gray, etc.)
o They yield only discrete (counted) data
Exercise.
Identify the types of data collected
on 50 Freshman psychology students
given below.
a) Their age
b) Their height
c) Their region/city administration
d) No. of brothers and sisters they
have
e) No. of courses they have enrolled in
f) Whether full time/part time
student
g) Their sex/gender
Score
o It is the raw data collected
o It is an individual value for a variable
Example: Score on a test, self-esteem score
1.2 Descriptive Statistics
Descriptive Statistics
o Refers to procedures for organizing,
summarizing and describing quantitative data
about the sample or about the population
o Does not involve the drawing of conclusions or
drawing inferences from the sample or population
o Data can be summarized by:
1) Tabular representation- Frequency distribution
2. Graphic Representation
o Bar graph, pie chart, histogram, polygon, etc.
3. Numerical Representation
o A single value represents many numbers, such as:
a) Measures of central tendency
b) Measures of variability
c) Measures of association/relationship, etc. (see the computational sketch after this list)
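As an illustration of the three kinds of numerical summaries above, the short Python sketch below computes them for a small set of hypothetical scores using only the standard statistics module (the correlation function needs Python 3.10+); the data values are invented for illustration, not taken from the module.

import statistics as st

scores_x = [25, 20, 18, 23, 21, 23, 22, 24, 19, 20]   # hypothetical ages
scores_y = [55, 48, 40, 52, 47, 50, 49, 53, 42, 46]   # hypothetical test scores

# a) Measures of central tendency
print("mean   =", st.mean(scores_x))
print("median =", st.median(scores_x))
print("mode   =", st.mode(scores_x))

# b) Measures of variability
print("range    =", max(scores_x) - min(scores_x))
print("variance =", st.pvariance(scores_x))   # population variance
print("std dev  =", st.pstdev(scores_x))      # population standard deviation

# c) Measure of association/relationship
print("Pearson r =", st.correlation(scores_x, scores_y))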
1.3 Inferential Statistics
 It is the method used to draw conclusions or inferences about the characteristics of populations based on data from a sample.
o It uses data gathered from a sample to make decisions/inferences about the larger population from which the sample was drawn.
 Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
 What can we infer about a population's parameters based on a sample's statistics?
 The major inferential statistical methods are:
 The t-test, Analysis of Variance (ANOVA), Analysis of
Covariance (ANCOVA), Regression Analysis, etc.
Population
o It is the set of all the individuals of interest in a particular study
o Anything we might be interested in studying
Example: College students, elder people, single parent families,
people with depression, etc.
Sample
o A set of data or individuals selected from a population
o It is a subset of the population
o Usually intended to represent the population in a research study
Parameter
o It is a descriptive measure of a population
o It is a value that describes a population
o Usually derived from measurements of the individuals in the population
o It is the real entity of interest
Example: mean, variance, standard deviation, etc. of the population
Statistic
o A descriptive measure of a sample
o A value that describes a sample
o Usually derived from measurements of individuals in the sample
o Statistics are guesses at reality
Example: mean, variance, standard deviation, etc. of the sample
1.4 Function and significance of statistics in education
Function of Statistics:
1. It helps in collecting and presenting the data in a systematic
manner.
2. It helps to understand unwieldy and complex data by simplifying it.
3. It helps to classify the data.
4. It provides basis and techniques for making comparison.
5. It helps to study the relationship between
different phenomena.
6. It helps to indicate the trend of behaviour.
7. It helps to formulate the hypothesis and test it.
8. It helps to draw rational conclusions.
Significance of Statistics in Education
1. It helps the teacher to provide the most exact
type of description.
2. It makes the teacher definite and exact in
procedures and thinking.
3. It enables the teacher to summarize the results in
a meaningful and convenient form.
4. It enables the teacher to draw general conclusions.
5. It helps the teacher to predict the future performance of the pupils.
6. Statistics enables the teacher to analyse some of the causal factors underlying complex and otherwise bewildering events.
1.5 Types and levels of measurement scale
1. What is measurement?
 Measurement is the process of assigning
numbers to objects and events according to
logically acceptable rules.
2. What are levels/scales of measurement?
 There are four levels/scales of measurement. These are:
 Nominal Scale
 Ordinal Scale
 Interval Scale and
 Ratio Scale
 Before discussing these one by one, it is better to look at
some of the attributes possessed by these scales.
a) Magnitude: the quantitative value of an attribute, which determines whether a given value is greater than, less than, or equal to another.
b) Equal interval: the difference between any two adjacent points represents the same amount of the attribute anywhere along the scale.
c) Absolute zero point: refers to the presence or absence of a true zero (0) point, i.e. whether the attribute can have no value at all.
 Nominal Scale
A Nominal Scale is a measurement scale, in which numbers serve
as “tags” or “labels” only, to identify or classify an object.
A nominal scale measurement normally deals only with non-numeric (qualitative) variables, or with numbers that have no quantitative value. Examples of the nominal level of measurement are given below.
 Also known as classificatory scale because it has
the property of classification and labeling.
 The objects are classified or labeled without
ranking or order associated with such classified data.
 It reflects qualitative differences rather than
quantitative ones
 The least precise method of quantification
and does not have any of the three
attributes (magnitude, equal interval and
absolute zero point)
 Uses categorical/qualitative variables and
Found by counting
Example: Yes/No, Pass/Fail, Male/Female, political party affiliation (democratic, republican or independent), nationality, race, occupation, religious affiliation, football players' numbers, etc.
 Ordinal Scale
Ordinal Scale is defined as a variable
measurement scale used to simply depict the
order of variables and not the difference
between each of the variables. These scales are
generally used to depict non-mathematical
ideas such as frequency, satisfaction,
happiness, a degree of pain, etc. It is quite
straightforward to remember the
implementation of this scale as ‘Ordinal’
sounds similar to ‘Order’, which is exactly the
purpose of this scale.
 It is also termed the ranking scale and possesses only the attribute of magnitude.
 Incorporates the property of nominal scale
and in addition it introduces the meaning of
ordering (ranking).
 Has no absolute values and the real
difference between adjacent ranks may not
be equal.
 Numbers are used as labels and do not
indicate amount or quantity. It is used
simply as description.
 Uses categorical/qualitative variables.
Example: Students in rank, professional career
structures, Likert scale of pain (No pain, low pain,
moderate pain, high pain), marathon runners medals
award (Gold, silver, bronze), position at the end of
race-1st, 2nd, 3rd, etc.), state of buildings (Excellent
condition, moderate condition, poor condition).
 Interval Scale
Interval Scale is defined as a numerical scale where the order
of the variables is known as well as the difference between
these variables. Variables that have familiar, constant, and
computable differences are classified using the Interval scale.
It is easy to remember the primary role of this scale too,
‘Interval’ indicates ‘distance between two entities’, which is
what Interval scale helps in achieving.
 Is an arbitrary scale based on equal units of measurement
that indicate how much of a given characteristics is present.
 Possesses two of the attributes, that are magnitude and
equal intervals. Zero point is arbitrary in this scale.
 Uses quantitative variables.
 The difference in the amount of the characteristic possessed by persons with scores of 90 and 91 is assumed to be equivalent to that between persons with scores of 60 and 61, or the difference in temperature between 10°F and 20°F is the same as the difference between 80°F and 90°F.
 Its primary limitation is the lack of absolute zero point.
 It does not have the capacity to measure the complete absence of the trait, and ratios between numbers on the scale are not meaningful (arbitrary).
 We cannot say, for example, that 40°F is half as hot as 80°F or twice as hot as 20°F.
oTherefore, numbers cannot be multiplied and divided
directly.
o Psychological tests and inventories are interval scales and have this limitation: their scores can be added and subtracted, but they cannot be meaningfully multiplied or divided.
Example: IQ test, Temperature, Motivation Score, Attitude
 Ratio Scale
Ratio Scale is defined as a variable measurement scale that
not only produces the order of variables but also makes the
difference between variables known along with information on
the value of true zero. It is calculated by assuming that the
variables have an option for zero, the difference between the
two variables is the same and there is a specific order between
the options.
oThe numerals of the ratio scale have the qualities of real
numbers and can be added, subtracted, multiplied and divided.
 We can say that 5 grams is one half of 10 grams, 15 grams is
three times 5 grams, a weight of 0kg means no weight (mass) at
all.
All statistical measures can be used for a variable measured at
the ratio level, since all the necessary mathematical operations
are defined.
Almost exclusively confined to use in the physical sciences. In other words, it is almost non-existent in psychology and the other social sciences.
oUses quantitative variables
Example: Mass, length, time,
energy, number of correct answers
on a test.
2. Introduction to SPSS Software
A frequency distribution is an overview of all distinct
values in some variable and the number of times they
occur. That is, a frequency distribution tells
how frequencies are distributed over
values. Frequency distributions are mostly used for
summarizing categorical variables.
3. Frequency Distribution:
Frequency Distribution:
 Is the easiest method of organizing data, which converts
raw data into a meaningful pattern for statistical analysis.
Lists all the classes and the number of values (frequency)
that belong to each class
Exhibits how the frequencies are distributed over various
categories
Besides, it makes the data brief and easy to understand. However, individual scores lose their identity.
Frequency Distributions and Graphs for
Ungrouped Data
a) Frequency Distributions for Ungrouped Data
First we arrange numbers/data in ascending or descending
order.
 Then, we tally the numbers/data
Finally, we use a frequency table
E.g. Age of 20 students
25,20,18,23,21,23,22,24,19,20,18,26,18,25,18,24,18,21,24,18
Table 1: Frequency Distribution of Age of 20
students in a class
Age Tally Frequency (f)
18 ////// 6
19 / 1
20 // 2
21 // 2
22 / 1
23 // 2
24 /// 3
25 // 2
26 / 1
Total 20
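The frequency distribution in Table 1 can be reproduced programmatically; a minimal Python sketch using collections.Counter on the same 20 ages is given below.

from collections import Counter

ages = [25, 20, 18, 23, 21, 23, 22, 24, 19, 20,
        18, 26, 18, 25, 18, 24, 18, 21, 24, 18]

freq = Counter(ages)                      # age -> frequency
for age in sorted(freq):                  # ascending order, as in Table 1
    print(age, "/" * freq[age], freq[age])
print("Total", sum(freq.values()))        # 20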
b) Graphs for ungrouped/
Discrete Data
What is graph in statistics?
 It is the geometrical
image of frequency
distribution
It is when frequency
distributions are converted
into visual models to
facilitate understanding
(e.g. pie chart/circle graph,
bar graph, line graph,
histogram, frequency
polygon, etc.)
For ungrouped data we use bar graph, pie chart/circle graph, line
graph, etc.
a) Bar Graph/Chart
It is a detached graph made up of bars whose heights represent
the frequencies of the respective categories.
It is used to summarize categorical data.
b) Pie chart/Circle Graph
It is divided into portions that
represent the frequency, relative
frequency, or percentage of a
population or a sample belonging to
different categories.
Relative frequency of a category =
Frequency of a category/Sum of all
frequencies
Percentages = Relative frequency
x 100%
c) Line Graph
It is a graph that consists of independent and dependent variables, where the former (IV) is plotted on the x-axis and the latter (DV) on the y-axis, and the points are joined by line segments.
Frequency Distribution and Graphs for
Grouped Data
 Some basic technical terms when continuous frequency
distribution is formed or data are classified (grouped) according
to class interval.
a) Class limits
 It is the lowest and the highest values that can be included in
the class. E.g. 30-40; 30 is the lower (lowest) limit (L) and 40
is the upper (highest) limit (U).
 Sometimes, a class limit is missing either at the end of the
first class interval or at the end of the last class interval or
both are not specified. In such case the classes are known as
open-end classes.
Example: below 18, 18-20, 21-23, 24-26, 27 and above years of
age
b) Class boundaries: Are the midpoints of the upper class
limit of one class and the lower class limit of the
subsequent class (LCB = LCL - 0.5, UCB = UCL + 0.5). Example:
class limits class boundaries
30-40 29.5-40.5
41-50 40.5-50.5
c) Class interval
It is each grouping of the data; e.g. 50-75, 76-101, 102-127 are class intervals.
d) Width or size of the class interval (c)
c = Range / number of classes
It is the difference between the lower (or upper) class limit of one class and the lower (or upper) class limit of the previous class.
e) Number of Class Intervals
Should not be too many (5-20).
To decide the number of class intervals, we
choose the lowest and the highest values.
The difference between them will enable us
to determine class intervals.
f) Range (R)
 The difference between the largest and the
smallest value of the observation.
g) Mid value or mid point
 It is the central point of a class boundary/interval
 It is found by adding the upper and lower limits of a
class and dividing the sum by 2
Midpoint = (L + U) / 2
h) Frequency (f)
 Number of observations falling within a
particular class interval
Weight (in Kg) No. of persons (f)
30-40 25
41-50 15
51-60 45
Total 85
Rules to Construct Frequency Distribution of
Grouped Data
1) There should be between 5-20 classes
2) The class width should be an odd number.
This ensures that the midpoint of each class
has the same place value as the data. Always
we have to round up.
3) The classes must be mutually exclusive.
Mutually exclusive classes have non over
lapping class limits so that data cannot be
placed into two classes E.g. 10-20,
21-31, 32-42, 43-53
4) The classes must be continuous. Even if
there are no values in a class, the class must
be included in the frequency distribution
5. The classes must be exhaustive. There
should be enough classes to accommodate
all the data
6. The classes must be equal in width. This avoids a distorted view of the data. The exception is open-ended classes.
E.g. The number of hours 40 employees spends on their job
for the last 30 working days is given below. Construct a
group frequency distribution for these data using 8 classes
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 30 47 37
Solution:
Step 1. Max = 65, Min = 26, so Range = 65 - 26 = 39
Step 2. The number of classes is given as 8.
Step 3. Class width c = 39/8 = 4.875, rounded up to 5.
Step 4. Starting point 26 = lower limit of the first class.
Hence the lower class limits become
26 31 36 41 46 51 56 61
Step 5. The upper limit of the first class = 31 - 1 = 30 and
hence the upper class limits become
30 35 40 45 50 55 60 65
The lower and upper class limits, together with the frequencies counted from the data, can be written as follows:
Group frequency distribution
Class limits   Frequency (f)
26-30          6
31-35          5
36-40          5
41-45          9
46-50          6
51-55          1
56-60          2
61-65          6
Total          40
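A minimal Python sketch of the five steps above, applied to the same 40 "hours" values; it assumes the starting point is the minimum value (26) and rounds the class width up with math.ceil, as in the worked solution.

import math

hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 30, 47, 37]

k = 8                                              # number of classes (given)
width = math.ceil((max(hours) - min(hours)) / k)   # 39 / 8 = 4.875 -> 5

lower = min(hours)                                 # starting point = 26
for _ in range(k):
    upper = lower + width - 1                      # first class: 26-30
    f = sum(lower <= x <= upper for x in hours)    # count values in the class
    print(f"{lower}-{upper}: {f}")
    lower += width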
Exercise:
Construct a group frequency distribution for the
weight in kg of 35 students data using 8 classes
52 60 56 81 72 72 55
73 60 58 55 65 75 56
55 81 64 60 50 70 58
70 55 52 56 55 50 64
65 60 50 52 60 53 64
 Some basic types of graph we use for
grouped/ continuous data are the
following:
a) Histogram
 A histogram is a graph in which class
limits/boundaries are marked on the horizontal
axis (x) and the frequencies, relative frequencies or
percentages are marked on the vertical axis (y).
 The frequencies, relative frequencies, percentages
are represented by the heights of bars.
 The bars are drawn adjacent to each other.
 A histogram is called a frequency, a relative
frequency or a percentage histogram.
 Used to summarize measurement data.
 Relative frequency = Frequency of a class / Sum of all frequencies
 Percentage = Relative Frequency x 100%
Weight (in Kg)   No. of persons (f)   Relative frequency   Percentage (%)
29.5-40.5        25                   0.25                 25
40.5-50.5        15                   0.15                 15
50.5-60.5        45                   0.45                 45
60.5-70.5        10                   0.10                 10
70.5-80.5        5                    0.05                 5
Total            100                  1                    100
B) Frequency Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines.
A polygon with a relative frequency marked on the vertical axis
is known as a relative frequency polygon.
Similarly a polygon with percentages marked on the vertical axis
is called a percentage polygon.
Class boundaries   Frequency   Midpoint
5.5-10.5           1           8
10.5-15.5          2           13
15.5-20.5          6           18
20.5-25.5          8           23
25.5-30.5          3           28
Total              20
C) The Cumulative Frequency Graph/
Ogive
 It is a curve drawn for the
cumulative frequency, cumulative
relative frequency or cumulative
percentage by joining with straight
lines the dots marked above the
upper boundaries of classes at
height equal to cumulative
frequencies, cumulative relative
frequencies or cumulative
percentages of the respective
classes.
Data Example:
Class boundary   Frequency   Cumulative frequency   Cumulative relative frequency   Cumulative percentage
5.5-10.5         1           1                      0.05                            5
10.5-15.5        2           3                      0.15                            15
15.5-20.5        6           9                      0.45                            45
20.5-25.5        8           17                     0.85                            85
25.5-30.5        3           20                     1.00                            100
Total            20
Ogive:
Less than and More than Cumulative Frequency
Example:
Class limits   Frequency   Less than Cf   More than Cf
5-9            1           1              35
10-14          2           3              34
15-19          6           9              32
20-24          8           17             26
25-29          15          32             18
30-34          3           35             3
Total          35
4. Normal Curve and Standard Score
Definition: The normal distribution curve can be used to study many variables that are approximately normally distributed.
 The normal distribution curve depends on the mean and the standard deviation.
 When the majority of the data fall to the right of the mean, the distribution is said to be negatively skewed: the tail of the curve is to the left, and vice versa.
Skewness: It refers to a distortion or asymmetry that deviates from
the symmetrical bell curve, or normal distribution, in a set of data. If
the curve is shifted to the left or to the right, it is said to be skewed.
Skewness can be quantified as a representation of the extent to which
a given distribution varies from a normal distribution. A normal
distribution has a skew of zero, while a lognormal distribution, for
example, would exhibit some degree of right-skew.
Key Points:
Skewness, in statistics, is the degree of asymmetry observed in a
probability distribution.
Distributions can exhibit right (positive) skewness or left
(negative) skewness to varying degrees. A normal distribution
(bell curve) exhibits zero skewness.
Investors note right-skewness when judging a return distribution
because it, like excess kurtosis, better represents the extremes
of the data set rather than focusing solely on the average.
 Skewness: when the majority of the data fall to the left or right of the mean, the distribution is said to be skewed.
The mean of positively skewed data will be greater than the median.
In a distribution that is negatively skewed, the exact opposite is the
case: the mean of negatively skewed data will be less than the
median. If the data graphs symmetrically, the distribution has zero
skewness, regardless of how long or fat the tails are.
Kurtosis: it is the degree of peakedness of a distribution
It is a measure of whether the data are flat/peaked relative to a
normal distribution.
Data sets with high kurtosis tend to have a distinct sharp peak near the mean (positive kurtosis), declining rather rapidly, and have heavy tails.
The data sets with low kurtosis tend to have a flat top (-ve
kurtosis) near the mean rather than a sharp peak.
Kurtosis is a statistical measure that defines
how heavily the tails of a distribution differ
from the tails of a normal distribution. In
other words, kurtosis identifies whether the
tails of a given distribution contain extreme
values.
1. Mesokurtic:
Data that follows a mesokurtic distribution
shows an excess kurtosis of zero or close to
zero. This means that if the data follows a
normal distribution, it follows a mesokurtic
distribution.
Types of Kurtosis:
The types of kurtosis are determined by the excess kurtosis of a
particular distribution. The excess kurtosis can take positive or
negative values, as well as values close to zero.
2. Leptokurtic:
Leptokurtic indicates a positive excess
kurtosis. The leptokurtic distribution shows
heavy tails on either side, indicating
large outliers. In finance, a leptokurtic
distribution shows that the investment
returns may be prone to extreme values on
either side. Therefore, an investment whose
returns follow a leptokurtic distribution is
considered to be risky.
3. Platykurtic:
A platykurtic distribution shows a negative
excess kurtosis. The kurtosis reveals a
distribution with flat tails. The flat tails
indicate the small outliers in a
distribution. In the finance context, the
platykurtic distribution of the investment
returns is desirable for investors because
there is a small probability that the
investment would experience extreme
returns.
Characteristics of Normal Curve:
 The curve is a bell-shaped distribution of a variable.
 The mean, median and mode are equal and located at the center of the distribution.
 It is a unimodal distribution (only one mode).
 The curve is symmetrical about the mean.
 The curve is continuous, i.e. there are no gaps.
 The curve never touches the x-axis, but it gets increasingly closer to the axis.
 The total area under the normal distribution curve is equal to 1 or 100%. This fact may seem unusual, since the curve never touches the x-axis.
 The area under the normal curve that lies between:
 -1 and +1 standard deviations is approximately 0.68 or 68%
 -2 and +2 standard deviations is approximately 0.95 or 95%
 -3 and +3 standard deviations is approximately 0.997 or 99.7%
Application of normal curve:
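As one simple application of the normal curve, the sketch below uses scipy.stats.norm (assumed to be installed) to reproduce the three areas listed above and to find the proportion of scores falling below a given value; the test mean of 70 and standard deviation of 10 are assumed figures chosen only for illustration.

from scipy.stats import norm

# Area under the standard normal curve between -k and +k standard deviations
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"P(-{k} < z < +{k}) = {area:.3f}")      # ~0.683, 0.954, 0.997

# Application: suppose test scores are approximately N(mean = 70, sd = 10).
# What proportion of students score below 85?
z = (85 - 70) / 10                                 # z-score = 1.5
print("P(score < 85) =", round(norm.cdf(z), 3))    # ~0.933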
5. Confidence Interval for the Mean, Proportions,
and Variances
In statistics, a confidence interval (CI) is a type
of estimate computed from the statistics of the observed
data. This gives a range of values for an
unknown parameter (for example, a population mean). The
interval has an associated confidence level that gives the
probability with which the estimated interval will contain the
true value of the parameter. The confidence level is chosen
by the investigator. For a given estimation in a given sample,
using a higher confidence level generates a wider (i.e., less
precise) confidence interval. In general terms, a confidence
interval for an unknown parameter is based on sampling
the distribution of a corresponding estimator.
 Confidence Interval for the Mean:
A confidence interval gives an estimated range of values which
is likely to include an unknown population parameter, the
estimated range being calculated from a given set of sample
data.
The common notation for the parameter in question is θ. Often, this parameter is the population mean μ, which is estimated through the sample mean x̄.
The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter θ.
Example:
Suppose a student measuring the boiling temperature of a
certain liquid observes the readings (in degrees Celsius)
102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be
101.82. If he knows that the standard deviation for this
procedure is 1.2 degrees, what is the confidence interval for
the population mean at a 95% confidence level?
In other words, the student wishes to estimate the true
mean boiling temperature of the liquid using the results of
his measurements. If the measurements follow a normal distribution, then the sample mean will have the distribution N(μ, σ/√n). Since the sample size is 6, the standard deviation of the sample mean is equal to 1.2/sqrt(6) = 0.49.
In the example above, the student calculated the sample mean of the
boiling temperatures to be 101.82, with standard deviation 0.49. The
critical value for a 95% confidence interval is 1.96, where (1-0.95)/2
= 0.025. A 95% confidence interval for the unknown mean is
((101.82 - (1.96*0.49)), (101.82 + (1.96*0.49))) = (101.82 - 0.96,
101.82 + 0.96) = (100.86, 102.78).
As the level of confidence decreases, the
size of the corresponding interval will
decrease. Suppose the student was
interested in a 90% confidence interval for
the boiling temperature. In this case, C =
0.90, and (1-C)/2 = 0.05. The critical
value z* for this level is equal to 1.645, so
the 90% confidence interval is ((101.82 -
(1.645*0.49)), (101.82 + (1.645*0.49))) =
(101.82 - 0.81, 101.82 + 0.81) = (101.01,
102.63)
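The two confidence intervals worked out above can be checked with a few lines of Python; this sketch assumes scipy is available and reuses the six readings and the known standard deviation of 1.2 degrees.

import math
from scipy.stats import norm

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
n = len(readings)
xbar = sum(readings) / n                # sample mean, about 101.82
sigma = 1.2                             # known population standard deviation
se = sigma / math.sqrt(n)               # standard error, about 0.49

for level in (0.95, 0.90):
    z = norm.ppf(1 - (1 - level) / 2)   # 1.96 for 95%, 1.645 for 90%
    lo, hi = xbar - z * se, xbar + z * se
    print(f"{int(level * 100)}% CI: ({lo:.2f}, {hi:.2f})")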
 Confidence Interval for the Proportions:
Suppose we wish to estimate the proportion of people with
diabetes in a population or the proportion of people with
hypertension or obesity. These diagnoses are defined by specific
levels of laboratory tests and measurements of blood pressure
and body mass index, respectively. Subjects are defined as
having these diagnoses or not, based on the definitions. When
the outcome of interest is dichotomous like this, the record for
each member of the sample indicates having the condition or
characteristic of interest or not. Recall that for dichotomous
outcomes the investigator defines one of the outcomes a
"success" and the other a failure. The sample size is denoted by
n, and we let x denote the number of "successes" in the sample.
For example, if we wish to estimate the proportion of
people with diabetes in a population, we consider a
diagnosis of diabetes as a "success" (i.e., and individual
who has the outcome of interest), and we consider lack of
diagnosis of diabetes as a "failure." In this example, X
represents the number of people with a diagnosis of
diabetes in the sample. The sample proportion is p̂ (called "p-hat"), and it is computed by taking the ratio of the number of successes in the sample to the sample size, that is:
p̂ = x/n
The point estimate for the population proportion is the sample proportion, and the margin of error is the product of the Z value for the desired confidence level (e.g., Z = 1.96 for 95% confidence) and the standard error of the point estimate. In other words, the standard error of the point estimate is:
sqrt( p̂(1 - p̂) / n )
This formula is appropriate for samples with at least 5 successes
and at least 5 failures in the sample. This was a condition for the
Central Limit Theorem for binomial outcomes. If there are fewer
than 5 successes or failures then alternative procedures,
called exact methods , must be used to estimate the population
proportion.
Example: During the 7th examination of the Offspring cohort
in the Framingham Heart Study there were 1219 participants
being treated for hypertension and 2,313 who were not on
treatment. If we call treatment a "success", then x=1219 and
n = 3532. The sample proportion is:
p̂ = x/n = 1219/3532 = 0.345
This is the point estimate, i.e., our best estimate of the proportion of the population on treatment for hypertension is 34.5%. The sample is large, so the confidence interval can be computed using the formula:
p̂ ± Z·sqrt( p̂(1 - p̂) / n )
Substituting our values we get
0.345 ± 1.96·sqrt( 0.345 × 0.655 / 3532 )
which is
0.345 ± 0.016
So, the 95% confidence interval is (0.329, 0.361).
Thus we are 95% confident that the true proportion of
persons on antihypertensive medication is between 32.9%
and 36.1%.
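A short sketch of the same calculation, assuming scipy is available; it should reproduce the interval (0.329, 0.361) up to rounding.

import math
from scipy.stats import norm

x, n = 1219, 3532                        # "successes" (on treatment) and sample size
p_hat = x / n                            # sample proportion, about 0.345

z = norm.ppf(0.975)                      # ~1.96 for a 95% confidence level
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
lo, hi = p_hat - z * se, p_hat + z * se
print(f"p-hat = {p_hat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")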
 Confidence Interval for the Variances
When using a sample to calculate a statistic we are estimating a
population parameter. It is just an estimate and the sample due to
the nature of drawing a sample may not create a value (statistic)
that is close to the actual value (parameter).
We can calculate confidence interval about the statistic to
determine where the true and often unknown parameter may
exist. This includes the calculation of a variance statistic.
If you were to draw many different samples all the same size from
a population and plot the variance statistic the resulting
distribution is likely to fit a χ2 distribution. Plotting the means
creates a normal distribution which is symmetrical and produced
symmetrical confidence intervals. The χ2 distribution is not
symmetrical and will produce asymmetrical intervals.
Confidence Intervals for Variances and Standard Deviations:
Estimates of population means can be made from sample means,
and confidence intervals can be constructed to better describe
those estimates. Similarly, we can estimate a population standard
deviation from a sample standard deviation, and when the original
population is normally distributed, we can construct confidence
intervals of the standard deviation as well.
The Theory:
Variances and standard deviations are a very different type of
measure than an average, so we can expect some major differences
in the way estimates are made.
We know that the population variance formula, when used on a
sample, does not give an unbiased estimate of the population
variance. In fact, it tends to underestimate the actual population
variance. For that reason, there are two formulas for variance, one
for a population and one for a sample.
The sample variance formula is an unbiased estimator of the
population variance. (Unfortunately, the sample standard deviation is
still a biased estimator.)
Also, both variance and standard deviation are nonnegative numbers. Since neither can take on a negative value, the domain of the probability distribution for either one is not (-∞, ∞), thus the normal distribution cannot be the distribution of a variance or a standard deviation. The correct PDF must have a domain of [0, ∞). It can be shown that if the original population of data is normally distributed, then the expression (n - 1)s²/σ² has a chi-square distribution with n - 1 degrees of freedom.
The chi-square distribution of the quantity (n - 1)s²/σ² allows us to construct confidence intervals for the variance and the standard deviation (when the original population of data is normally distributed). For a confidence level 1 - α, we will have the inequality
χ²(1 - α/2) ≤ (n - 1)s²/σ² ≤ χ²(α/2).
Solving this inequality for the population variance σ², and then for the population standard deviation σ, leads us to the following pair of confidence intervals:
(n - 1)s²/χ²(α/2) ≤ σ² ≤ (n - 1)s²/χ²(1 - α/2), and
sqrt[(n - 1)s²/χ²(α/2)] ≤ σ ≤ sqrt[(n - 1)s²/χ²(1 - α/2)].
It is worth noting that since the chi-
square distribution is not symmetric,
we will be obtaining confidence
intervals that are not symmetric
about the point estimate.
Example:
A statistician chooses 27 randomly selected dates, and when examining
the occupancy records of a particular motel for those dates, finds a
standard deviation of 5.86 rooms rented. If the number of rooms rented
is normally distributed, find the 95% confidence interval for the
population standard deviation of the number of rooms rented.
For a sample size of n = 27, we will have df = n - 1 = 26 degrees of freedom. For a 95% confidence interval, we have α = 0.05, which gives 2.5% of the area at each end of the chi-square distribution. We find values of χ²(0.975) = 13.844 and χ²(0.025) = 41.923. Evaluating (n - 1)s²/χ², we obtain 21.297 and 64.492. This leads to the inequalities 21.297 ≤ σ² ≤ 64.492 for the variance, and, taking square roots, 4.615 ≤ σ ≤ 8.031 for the standard deviation.
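The interval above can be verified with scipy.stats.chi2, as in the sketch below (n = 27 and s = 5.86, as in the motel example).

from math import sqrt
from scipy.stats import chi2

n, s = 27, 5.86
df = n - 1                               # 26 degrees of freedom
alpha = 0.05

chi_lo = chi2.ppf(alpha / 2, df)         # ~13.844
chi_hi = chi2.ppf(1 - alpha / 2, df)     # ~41.923

var_lo = df * s**2 / chi_hi              # ~21.30
var_hi = df * s**2 / chi_lo              # ~64.49
print("95% CI for the variance:", (round(var_lo, 3), round(var_hi, 3)))
print("95% CI for the std dev :", (round(sqrt(var_lo), 3), round(sqrt(var_hi), 3)))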
Calculate the Variance and Standard Deviation of the
Data Given Below. 35, 45, 30, 35, 40, 25
Variance and standard deviation for grouped data
 For grouped data with class midpoints x and frequencies f, the mean is x̄ = Σf·x / Σf, the sample variance is s² = Σf(x - x̄)² / (n - 1), and the standard deviation is the square root of the variance (a short sketch follows).
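A brief Python sketch for both cases: the ungrouped exercise data above (35, 45, 30, 35, 40, 25) and a grouped calculation with the formulas just stated, reusing the class midpoints and frequencies of the weights table from the histogram slide.

import statistics as st

data = [35, 45, 30, 35, 40, 25]
print("mean                =", st.mean(data))       # 35
print("population variance =", st.pvariance(data))  # 250/6, about 41.67
print("sample variance     =", st.variance(data))   # 250/5 = 50
print("sample std dev      =", round(st.stdev(data), 2))

# Grouped data: class midpoints (x) and frequencies (f) from the weights table.
mids  = [35.0, 45.5, 55.5, 65.5, 75.5]
freqs = [25, 15, 45, 10, 5]
n = sum(freqs)
mean_g = sum(f * x for f, x in zip(freqs, mids)) / n
var_g = sum(f * (x - mean_g) ** 2 for f, x in zip(freqs, mids)) / (n - 1)
print("grouped mean =", round(mean_g, 2),
      " grouped sample variance =", round(var_g, 2),
      " grouped std dev =", round(var_g ** 0.5, 2))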
6. Hypothesis Testing with One and two Sample
Hypothesis is a suggested answer to the problem under
investigation. – John T. Townsend
A hypothesis is a tentative generalization, the validity of
which remains to be tested. – J. W. Best
A hypothesis is a proposition which can be put to test to
determine its validity. It may be proved correct or incorrect.
– Good & Hatt
A hypothesis is a conjectural/hypothetical statement of the
relation between two or more variables. – F. N. Kerlinger
Meaning of Hypothesis:
 On the basis of the definitions, we can say that hypothesis is an
assumption that is still not proved but shows the probable solution
of the problem or predicts the relationship between two or more
variables.
 The assumption is proved true or false by testing it.
 We will not have the solution to the problem until the
assumption is tested.
 Three points, regarding such assumptions, are very
important.
 The assumptions are made on the basis of previous
experiences or primary evidences or by thinking logically.
 Whether the assumptions are true or false is decided by
testing them.
 Testing of assumptions leads to the solution of the problem.
 Suppose we are watching some television programme and
suddenly the TV gets off.
 What will be our reactions to this problem? We start thinking of the reasons for the problem, such as:
 Perhaps there is an interruption in the flow of electricity or
 There may be a problem in particular channel or
 There may be a loose connection of the cable with TV or
 There may be a problem in the system of cable operator.
 We will make such assumptions on the basis of our previous
experiences. Now we will check all the possible reasons of the
problem. For that, first, we will check if there is any problem in the
flow of electricity.
 If electric supply is found okay, we will check if other channels
are working or not.
 If other channels are found okay, we will check whether the
cable connection is proper or not.
 If everything is found okay, then we will call cable operator to
solve the problem.
 In this way, we will collect the evidence and analyse it logically. By testing all the evidence, we come to a conclusion about the solution of the problem. We make many assumptions in our routine life to find solutions to our daily problems.
Characteristics of a good hypothesis:
 A good hypothesis never opposes the universal truth and
natural law and rules.
 It is written in simple and easy language.
 Only one assumption is made in one hypothesis.
 The hypothesis is written in such a language that, after
testing, it can be clearly rejected or not rejected.
 Hypothesis is written in present tense because it is
not a prediction or opinion but it is an assumption
that is based on present factual information.
 A good hypothesis assures that the tool required for testing it is available or can be prepared (developed) easily.
 Before formulating the hypothesis, it is assured
that the data will be available for testing it.
Types of Hypothesis:
It is very difficult to give such a classification of hypotheses as
can be accepted universally because different scholars have
classified the hypotheses in different ways.
 On the basis of different classifications, the types of hypotheses
can be described as follows.
 A hypothesis is of two types:
 Null hypothesis: Null hypothesis is a type of hypothesis in which
we assume that the sample observations are purely by chance. It is
denoted by H0.
 Alternate hypothesis: The alternate hypothesis is a hypothesis in which we assume that sample observations are not by chance. They are affected by some non-random situation. An alternate/declarative hypothesis is denoted by H1, H2, H3, ..., Hn or Ha.
 Some Special features of alternate /declarative hypotheses are as
follows:
 Researcher formulates the declarative hypotheses on the basis of
pre-experience, study of research material or on the basis of the
findings of previous researches.
 Such hypotheses are formulated on the basis of expected
findings of the research.
 Such hypothesis is accepted when null hypothesis is
rejected.
Hypothesis in Question Form:
 In this type of hypothesis, instead of expecting a certain result, a question is formed asking whether a certain type of result will be there or not.
E.g. for the research title 'Study of exam anxiety of higher secondary school students in the context of their stream', the question-type hypotheses will be as follows:
 Is there difference between exam anxiety of arts, commerce
and science students of higher secondary schools? OR
 Is the exam anxiety of arts students of higher secondary
schools more than that of science students?
OR
 Is the exam anxiety of commerce students of higher
secondary schools more than that of science students? OR
 Is the exam anxiety of commerce students of higher secondary schools more than that of arts students?
Null Hypothesis:
If, in the context of dependent variable, the hypothesis
indicates ‘no difference’ between two or more levels of
independent variable, it is called null hypothesis. Null
hypothesis indicates no relationship between two variables, if
correlational study is there.
 Null hypothesis is indicated by the symbol Ho. Such hypothesis is
also called ‘no difference’ type of hypothesis or ‘no relation’ type of
hypothesis.
 Let’s take examples to understand this type of hypothesis.
 For the study of the impact of instructional method on the achievement of grade nine students in English, the null hypothesis will be of the form: Ho: there is no significant difference in the English achievement of grade nine students taught through the different instructional methods.
Special features of Null Hypothesis:
 Researchers prefer to formulate null hypotheses due to some of their special features. These features are as follows:
 It is formulated objectively and not affected by the subjectivity of
the researcher.
 It believes in ‘no difference’ or ‘no relationship’. So the
researcher does not tend to be biased for certain type of the result
and works freely.
 It helps in making the entire research process objective
(unbiased).
 It is tested at certain level of significance.
Hypothesis Testing:
Hypothesis testing is a part of statistics in which we make
assumptions about the population parameter. So, hypothesis
testing mentions a proper procedure by analysing a random
sample of the population to accept or reject the assumption.
Steps of Hypothesis Testing:
 The process to determine whether to reject a null hypothesis
or to fail to reject the null hypothesis, based on sample data is
called hypothesis testing. It consists of four steps:
 Define the null and alternate hypothesis
 Define an analysis plan to find how to use sample data to
estimate the null hypothesis
 Perform some analysis on the sample data to create a single number called the 'test statistic'.
 Understand the result by applying the decision rule to
check whether the Null hypothesis is true or not.
 If the p-value of the test is less than the significance level, we will reject the null hypothesis; otherwise, we will fail to reject the null hypothesis.
 Technically, we never accept the null hypothesis, we say
that either we fail to reject or we reject the null hypothesis.
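The four steps can be illustrated with a one-sample t test in Python; the scores below are hypothetical, and the assumed null hypothesis for the sketch is that the population mean equals 50.

from scipy import stats

sample = [52, 49, 55, 51, 48, 53, 54, 50, 47, 56]   # hypothetical scores
alpha = 0.05                                        # significance level

# Step 3: compute the test statistic (and its p-value) for H0: mu = 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Step 4: decision rule -- compare the p-value with the significance level
if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: reject H0")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: fail to reject H0")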
Errors in hypothesis testing:
We have explained what is hypothesis testing and the steps to
do the testing. Now during performing the hypothesis testing,
there might be some errors.
 We classify these errors in two categories.
 Type-1 error: A Type 1 error is the case when we reject the null hypothesis but it is actually true. The probability of making a Type-1 error is called the significance level alpha (α).
 Type-2 error:
A type II error is a statistical term used within the context
of hypothesis testing that describes the error that occurs
when one accepts a null hypothesis that is actually false. or
 Type 2 error is the case when we fail to reject the null
hypothesis but actually it is false. A type II error produces a
false negative, also known as an error of omission. For
example, a test for a disease may report a negative result,
when the patient is, in fact, infected.
 This is a type II error because we accept the conclusion of the test as negative, even though it is incorrect.
 In other words, a type II error occurs when one fails to reject a null hypothesis that is actually false; the alternative hypothesis is wrongly rejected even though the observed result is not due to chance.
 The probability of having a type-2 error is called beta(β).
Therefore,
 α = P(null hypothesis rejected | null hypothesis is true)
 β = P(null hypothesis not rejected | null hypothesis is false)
Two-Tailed test:
All that we are interested in is a difference. Consequently, when an experimenter wishes to test the null hypothesis Ho: M1 - M2 = 0 against its possible rejection and finds that it is rejected, then he may conclude that a difference really exists between the two means. But he does not make any assertion about the direction of the difference.
Such a test is a non-directional test. It is also named as two-
tailed test, because it employs both sides, positive and negative,
of the distribution (normal or t distribution) in the estimation of
probabilities. Let us consider the probability at 5% significance
level in a two-tailed test with the help of figure 1.1
Figure 1.1: Two-tailed test at the 5% level (mean of the distribution of scores concerning the difference between sample means)
 Therefore, while using both the tails of the distribution we may
say that 2.5% of the area of the normal curve falls to the right of
1.96 standard deviation units above the mean and 2.5% falls to
the left of 1.96 standard deviation units below the mean.
 The area outside these limits is 5% of the total area under the
curve. In this way, for testing the significance at the 5% level, we
may reject a null hypothesis if the computed error of the
difference between means reaches or exceeds the yardstick 1.96.
 Similarly, we may find that a value of 2.58 is required to test
the significance at the 1% level in the case of a two-tailed test.
One-tailed test:
 Here, we have reason to believe that the experimental group
will score higher on the mathematical computation ability test
which is given at the end of the session.
 In our experiment we are interested in finding out the gain in
the acquisition of mathematical computation skills (we are not
interested in the loss, as it seldom happens that coaching will
decrease the level of computation skill).
 In such cases, we make use of the one-tailed or directional test,
rather than the two-tailed or non-directional test to test the
significance of difference between means.
 Consequently, the meaning of the null hypothesis, restricted to a hypothesis of no difference in a two-tailed test, will be somewhat extended in a one-tailed test to include the direction (positive or negative) of the difference between means.
Figure 1.2: One-tailed test at the 5% level (mean of the distribution of scores concerning the difference between sample means)
 Thus, at the 5% level of significance we will have 5% of the area all in one tail (at 1.64 standard deviation units above the mean) rather than having it equally divided into two tails as shown in Figure 1.1 for a two-tailed test.
 Consequently, in a one-tailed test for testing the difference
between large sample means, z score of 1.64 will be taken as a
yardstick at the 5% level of significance for the rejection of the
null hypothesis instead of 1.96 in the case of a two-tailed test.
 In the case of a t distribution of small sample means, in making use of the one-tailed test we have to look for the critical t values, written in Table C of the Appendix, against the row for (N1 + N2 - 2) degrees of freedom under the columns labelled 0.10 and 0.02 (instead of 0.05 and 0.01 as in the case of a two-tailed test) to test the significance at the 0.05 and 0.01 levels of significance, respectively.
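The yardsticks quoted above (1.96 and 2.58 for two-tailed tests, 1.64 for a one-tailed test at the 5% level) come from the normal and t distributions; the sketch below recovers them with scipy, using a hypothetical N1 + N2 - 2 = 18 degrees of freedom for the small-sample case.

from scipy.stats import norm, t

# Large samples (z): two-tailed vs one-tailed critical values
print("two-tailed, 5% :", round(norm.ppf(1 - 0.05 / 2), 2))   # ~1.96
print("two-tailed, 1% :", round(norm.ppf(1 - 0.01 / 2), 2))   # ~2.58
print("one-tailed, 5% :", round(norm.ppf(1 - 0.05), 2))       # ~1.64

# Small samples (t): one-tailed critical value at the 5% level, df = 18
print("t, one-tailed, 5%, df=18:", round(t.ppf(1 - 0.05, df=18), 2))  # ~1.73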
7. Two-way Analysis of Variance
Analysis Of Variance:
 We know the use of z and t tests for testing the significance of the difference between the means of two samples or groups. In practice, however, we often have several samples of the same general character drawn from a given population and wish to know whether there are any significant differences between the means of these samples.
 In such situations, the differences between the several means are examined through a statistical technique known as the analysis of variance, in which all the data are treated together and a general null hypothesis of no difference among the means of the various samples or groups is tested.
Meaning of the Term 'Analysis of Variance :
A composite procedure for testing simultaneously the
difference between several sample means is known as the
analysis of variance.
It helps us to know whether any of the differences between
the means of the given samples are significant.
If the answer is 'yes', we examine pairs (with the help of the t
test) to see just where the significant differences lie. If the
answer is 'no', we do not proceed further.
The term 'analysis of variance' refers to the task of analysing, or breaking up, the total variance of a large sample or a population consisting of a number of equal groups or sub-samples into components (two kinds of variances).
Two kinds of variances, given as follows:
1. "Within-groups" variance. This is the average variance of the
members of each group around their respective group
means, i.e. the mean value of the scores in a sample (as
members of each group may vary among themselves).
2. "Between-groups" variance. This represents the variance of
group means around the total or grand mean of all groups, i.e. the
best estimate of the population mean (as the group means may
vary considerably from each other).
For deriving more useful results, we can use variance as a measure
of dispersion (deviation from the mean) in place of some useful
measures like standard deviation. Consequently, the variance of an
individual's score from the grand mean may be broken into two
parts (as pointed out earlier), viz. within-group variance and
between groups variance.
In this way, the technique of analysis of variance, as a single composite test of significance for the difference between several group means, demands the derivation of two estimates of the population variance: one based on the variance of the group means (between-groups variance) and the other on the average variance within the groups (within-groups variance). Ultimately, the comparison of the sizes of these two variances, the F-ratio = between-groups variance / within-groups variance, is used as a critical ratio for determining the significance of the difference between group means at a given level of significance.
Procedure For Calculating The Analysis Of Variance:
As already pointed out, the deviation of an individual's score belonging to a sample or a group of the population from the grand mean can be divided into two parts:
(i) deviation of the individual's score from group mean; and
(ii) deviation of the group mean from the grand mean.
Consequently, the total variance of the scores of all individuals
included in the study may be partitioned into within-group
variance and between-groups variance.
The formula used for the computation of variance (σ²) is Σ(X - M)²/N, i.e. the sum of the squared deviations from the mean divided by the total number of frequencies.
Hence, by taking N as the common denominator, the total sum of
the squared deviation of scores around the grand or general mean
for all groups combined can be made equal to the sum of the two
partitioned, between groups and within-groups sum of squares.
Mathematically,
Total sum of squares (around the general mean) = between-groups sum of squares + within-groups sum of squares, or SSt = SSb + SSw.
Hence the procedure for the analysis of variance involves the
following main tasks:
Computation of the total sum of squares (SSt)
Computation of the between-groups sum of squares (SSb)
Computation of the within-groups sum of squares (SSw)
Computation of the F-ratio
Use of the t test (if the need for further testing arises).
We understand these steps by adopting the following terminology:
X = raw score of any individual included in the study (any score entry in the given table)
 ΣX = grand sum
 M = ΣX/N = grand or general mean
 X1, X2, ... denote scores within the first group, second group, ...
 n1, n2, n3 = no. of individuals in the first, second and third groups
 M1, M2, ... denote the means of the first group, second group, ...
N = total no. of scores or frequencies = n1 + n2 + n3 + ...
 Step 1. Arrangement of the given table and computation of some initial values.
 In this step, the following values needed in the computation are calculated from the experimental data arranged in proper tabular form:
 Sums of the scores of each group and the grand sum ΣX
 Group means M1, M2, ... and the grand (general) mean M
 Correction term C, computed by the formula
 C = (ΣX)²/N
Source of variation   Sum of squares   df      Mean square variance
Between-groups        SSb              K - 1   SSb/(K - 1)
Within-groups         SSw              N - K   SSw/(N - K)
Let us illustrate now the whole process of using the analysis of
variance technique with the help of an example,
Example 1.1: The aim of an experimental study was to determine
the effect of three different techniques of training on the learning
of a particular skill. Three groups, each consisting of seven
students of class PG in Adola college, assigned randomly, were
given training through these different techniques. The scores
obtained on a performance test were recorded as follows:
Group I Group II Group III
3 4 5
5 5 5
3 3 5
1 4 1
7 9 7
3 5 3
6 5 7
Test the difference between groups by adopting the analysis of
variance technique.
Solution:
Step 1. Original table computation
Scores of the three groups (with row totals)
Group I   Group II   Group III   Total
3 4 5 12
5 5 5 15
3 3 5 11
1 4 1 6
7 9 7 23
3 5 3 11
6 5 7 18
Here n1 = n2 = n3 = 7, N = n1 + n2 + n3 = 21, and the grand sum ΣX = 96.
Group means: M1 = 28/7 = 4, M2 = 35/7 = 5, M3 = 33/7 = 4.71
Correction term C = (ΣX)²/N = 96²/21 = 438.86
Step 2. Squared-table computation
X1²   X2²   X3²   Total
9     16    25    50
25    25    25    75
9     9     25    43
1     16    1     18
49    81    49    179
9     25    9     43
36    25    49    110
Steps 3-6. Computation of the sums of squares:
Total SS = ΣX² - C = 518 - 438.86 = 79.14
Between-groups SS = (28² + 35² + 33²)/7 - C = 442.57 - 438.86 ≈ 3.72
Within-groups SS = 79.14 - 3.72 ≈ 75.43
Step 7. Calculation of F-ratio
Source of variation   Sum of squares   df   Mean square variance (SS/df)
Between-groups        3.72             2    3.72/2 = 1.86
Within-groups         75.43            18   75.43/18 = 4.19
F = 1.86/4.19 = 0.444
Step 8. Interpretation of F-ratio. The F-ratio table (Table R in the Appendix) is referred to with the between-groups (numerator) degrees of freedom, 2, across the top and the within-groups (denominator) degrees of freedom, 18, down the side. The critical values of F are approximately:
Critical value of F = 3.55 at the 0.05 level of significance
Critical value of F = 6.01 at the 0.01 level of significance
Our computed value of F (0.444) is smaller than both critical values and hence is not significant at either level. The null hypothesis cannot be rejected, and we may confidently say that the differences between the means are not significant; therefore, there is no need for further testing with the help of the t test.
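Example 1.1 can be checked with scipy's one-way ANOVA; the sketch below assumes scipy is available and also prints the critical value of F for df = (2, 18) used in Step 8.

from scipy.stats import f_oneway, f

group1 = [3, 5, 3, 1, 7, 3, 6]
group2 = [4, 5, 3, 4, 9, 5, 5]
group3 = [5, 5, 5, 1, 7, 3, 7]

F, p = f_oneway(group1, group2, group3)
print(f"F = {F:.3f}, p = {p:.3f}")       # F is about 0.44, clearly not significant

# Critical value for df = (2, 18) at the 5% level, for comparison
print("F critical (0.05) =", round(f.ppf(0.95, dfn=2, dfd=18), 2))    # ~3.55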
Two-way analysis of variance:
One-way analysis of variance involves a single experimental (independent) variable. If two experimental variables are involved, the experiment requires a two-way classification based on the two experimental variables: we simultaneously study the impact of two experimental variables, each having two or more levels, characteristics or classifications, and hence we carry out a two-way analysis of variance.
So far, we have dealt with one-way analysis of variance involving
one experimental variable. However, experiments may be
conducted simultaneous of two experimental variables. Such
experiments involve two-way classification based on two
experimental variables. Let us make some distinction between the
need for carrying out one-way and two-way analyses of variance
through some illustrations.
Suppose that we want to study the effect of four different
methods of teaching.
Here, the method of teaching is the experimental variable
(independent variable which is to be applied at four levels). We
take four groups of students randomly selected from a class.
These four groups are taught by the same teacher in the same
school but by different methods. At the end of the session, all the
groups are tested through an achievement test by the teacher.
The mean scores of these four groups are computed. If we are
interested in knowing the significance of the differences between
the means of these groups, the best technique is the analysis of
variance.
Since only one experimental variable (effect of the method of
teaching) is to be studied, we have to carry out one-way analysis
of variance.
Let us assume that there is one more experimental or independent
variable in the study, in addition to the method of teaching. Let it
be the school system at three levels which means that three
school systems are chosen for the experiment. These systems can
be government school, government-aided school and public school.
Now the experiment will involve the study of 4 x 3 groups: we have four groups in each of the three types of schools (all randomly selected).
The achievement scores of these groups can then be compared by
the method of analysis of variance by establishing a null
hypothesis that neither the school system nor the methods have
anything to do with the achievement of pupils.
In this way, we have to simultaneously study the impact of two
experimental variables, each having two or more levels,
characteristics or classifications and hence we have to carry out
the two-way analysis of variance. In the two-way classification
situation, an estimate of population variance, i.e. total variance is
supposed to be broken into:
(i) variance due to methods alone,
(ii) variance due to schools alone, and
(iii) residual variance in the groups, called interaction variance (M x S; M = methods, S = schools), which may exist on account of the following factors: chance; uncontrolled variables, like lack of uniformity in teachers; and the relative merits of the methods (which may differ from one school to another).
In other words, there may be interaction between methods and
schools which means that although no method may be regarded
as good or bad in general, yet it may be more suitable or
unsuitable for a particular school system. Hence, the presence of
interaction variance is unique feature with all the experimental
studies involving two or more experimental variables.
Example 7: In a research study, there were two experimental or independent variables: a seven-member group of players and three coaches who were asked to rate the players in terms of a particular trait on a ten-point scale. The data were recorded as under:
Rating by three coaches
Coach   Player 1   Player 2   Player 3   Player 4   Player 5   Player 6   Player 7
A       3          5          3          1          7          3          6
B       4          5          3          4          9          5          5
C       5          5          5          1          7          3          7
Apply the technique of analysis of variance for analyzing these data
Step 1. Arrangement of the data in a proper table and computation of
essential initial values.
Ratings of the three coaches, with row totals:

Player       Coach A   Coach B   Coach C   Row total   (Row total)²
1            3         4         5         12          144
2            5         5         5         15          225
3            3         3         5         11          121
4            1         4         1         6           36
5            7         9         7         23          529
6            3         5         3         11          121
7            6         5         7         18          324
Col. total   28        35        33        96          1500

Squares of the individual ratings, with row totals:

Player       Coach A   Coach B   Coach C   Row total
1            9         16        25        50
2            25        25        25        75
3            9         9         25        43
4            1         16        1         18
5            49        81        49        179
6            9         25        9         43
7            36        25        49        110
Total                                      518
Step 7. Computation of the F-ratios. The table below shows the sums of
squares, mean square variances and F-ratios (the F for each source is its
mean square divided by the residual mean square):

Source of variation       df                    Sum of squares   Mean square variance   F
Rows (players)            (r - 1) = 6           61.15            10.19                  8.57
Columns (coaches)         (c - 1) = 2           3.72             1.86                   1.56
Interaction or residual   (r - 1)(c - 1) = 12   14.28            1.19
Step 8. Interpretation of the F-ratios. The table below gives the critical
values of F (df are listed as: df for the greater mean square variance,
df for the smaller mean square variance):

Kind of F         df      Critical F at 0.05   Critical F at 0.01   Judgement                         Conclusion
F (for rows)      6, 12   3.00                 4.82                 Significant at both levels        Null hypothesis rejected
F (for columns)   2, 12   3.88                 6.93                 Not significant at either level   Null hypothesis not rejected

(The computed F for rows, about 8.57, exceeds both critical values of
F(6, 12); the computed F for columns, about 1.56, falls below both critical
values of F(2, 12).)
Here, F (for rows) is highly significant and hence the null hypothesis is
rejected. It indicates that the coaches did discriminate among the players.
The second F (for columns) is insignificant and hence, the null
hypothesis cannot be rejected.
It indicates that the coaches did not differ significantly among
themselves in their ratings of the players.
In other words, their ratings may be taken as trustworthy and
reliable.
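As a check on the arithmetic of Example 7, the sketch below recomputes the sums of squares, mean squares and F-ratios directly from the ratings table using the standard two-way ANOVA formulas. It is a Python illustration rather than the SPSS procedure the module uses, and the critical values come from scipy rather than Table F.

```python
# A sketch that reproduces the two-way ANOVA of Example 7 (players x coaches)
# from the defining formulas, so the slide's sums of squares (about 61.1,
# 3.7 and 14.3) and the F decisions can be verified.
import numpy as np
from scipy import stats

# Ratings: rows = coaches A, B, C; columns = players 1..7
ratings = np.array([
    [3, 5, 3, 1, 7, 3, 6],   # Coach A
    [4, 5, 3, 4, 9, 5, 5],   # Coach B
    [5, 5, 5, 1, 7, 3, 7],   # Coach C
])
n = ratings.size                        # 21 observations
correction = ratings.sum() ** 2 / n     # correction term = (grand total)^2 / N

ss_total   = (ratings ** 2).sum() - correction
ss_players = (ratings.sum(axis=0) ** 2).sum() / 3 - correction   # rows of the ANOVA table
ss_coaches = (ratings.sum(axis=1) ** 2).sum() / 7 - correction   # columns
ss_resid   = ss_total - ss_players - ss_coaches                  # interaction / residual

df_players, df_coaches, df_resid = 6, 2, 12
ms_players = ss_players / df_players
ms_coaches = ss_coaches / df_coaches
ms_resid   = ss_resid / df_resid

f_players = ms_players / ms_resid       # about 8.6, exceeds F(6, 12) critical values
f_coaches = ms_coaches / ms_resid       # about 1.6, below  F(2, 12) critical values

print(ss_players, ss_coaches, ss_resid)             # about 61.1, 3.7, 14.3
print(f_players, stats.f.ppf([0.95, 0.99], 6, 12))  # critical F at 5% and 1%
print(f_coaches, stats.f.ppf([0.95, 0.99], 2, 12))
```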
Underlying Assumptions In Analysis Of Variance:
The following are the fundamental assumptions for the use of the
analysis of variance technique:
1. The dependent variable which is measured should be
normally distributed in the population.
2. The individuals being observed should be distributed
randomly in the groups.
3. Within-groups variances must be approximately equal.
4. The contributions to variance in the total sample must
be additive.
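A minimal sketch of how two of the assumptions listed above (approximate normality and roughly equal within-group variances) might be screened in practice, assuming Python with scipy is available; the three groups of scores are hypothetical.

```python
# Screening two ANOVA assumptions: normality of the scores within groups
# and homogeneity of the within-group variances. Hypothetical data.
from scipy import stats

groups = [
    [62, 70, 68, 75, 66, 71],
    [58, 64, 60, 67, 63, 61],
    [72, 78, 74, 80, 76, 73],
]

# Shapiro-Wilk test of normality for each group (H0: data are normal)
for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk p = {p:.3f}")

# Levene's test of homogeneity of variance (H0: group variances are equal)
stat, p = stats.levene(*groups)
print(f"Levene p = {p:.3f}")
# Large p-values give no evidence against the normality / equal-variance
# assumptions.
```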
8. Correlation and Simple Linear Regression
Regression is a method for studying the relationship
between two or more quantitative variables
Simple linear regression (SLR):
One quantitative dependent variable
 response variable
 dependent variable
 Y
One quantitative independent variable
 explanatory variable
 predictor variable
 X
 Multiple linear regression:
One quantitative dependent variable
Many quantitative independent variables
 SLR Examples:
 predict salary from years of experience
 estimate effect of lead exposure on school testing performance
 predict force at which a metal alloy rod
bends based on iron content
 Example: Health data
Variables:
 Percent of Obese Individuals
 Percent of Active Individuals
Sl No   Percent Obesity   Percent Active
1       29.7              55.3
2       28.9              51.9
3       35.9              41.2
4       24.7              56.3
5       21.3              60.4
6       26.3              50.9
...     ...               ...
Graph: A scatterplot or scatter diagram can give us a general idea of the
relationship between obesity and activity.
Graph: The points are plotted as the pairs (x_i, y_i) for i = 1, ..., 25.
Inspection suggests a linear relationship between obesity and activity
(i.e. a straight line would go through the bulk of the points, and the
points would look randomly scattered around this line).
Simple Linear Regression
The model
 The basic model
Yi = β0 + β1xi + Єi
 Yi is the observed response or dependent variable for
observation i
 xi is the observed predictor, regressor, explanatory
variable, independent variable, covariate
 Єi is the error term
 Єi are iid N(0, σ²)
(iid means independently and identically distributed)
 So, E[Yi | xi] = β0 + β1xi + E[Єi] = β0 + β1xi, since E[Єi] = 0
The conditional mean (i.e. the expected value of Yi given xi, or after
conditioning on xi) is β0 + β1xi, a point on the regression line.
Some interpretation of parameters:
 β0 is conditional mean when x=0
 β1 is the slope, also stated as the change
in mean of Y per 1 unit change in x
 σ2 is the variability of responses about the
conditional mean
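As an illustration of fitting the SLR model, the sketch below applies scipy.stats.linregress to only the six obesity/activity pairs listed in the table above (the remaining 19 observations are not shown in the slides), and it assumes percent active is the predictor x and percent obese is the response Y. The resulting estimates are therefore illustrative rather than the study's actual fit.

```python
# A sketch of fitting Y = b0 + b1*x by least squares with scipy.
# Data: the six obesity/activity pairs shown above (illustration only).
from scipy import stats

percent_active = [55.3, 51.9, 41.2, 56.3, 60.4, 50.9]   # x (assumed predictor)
percent_obese  = [29.7, 28.9, 35.9, 24.7, 21.3, 26.3]   # Y (assumed response)

fit = stats.linregress(percent_active, percent_obese)

print(f"b0 (intercept) = {fit.intercept:.2f}")
print(f"b1 (slope)     = {fit.slope:.2f}")   # change in mean % obese per 1-point change in % active
print(f"r              = {fit.rvalue:.2f}")  # correlation between the two variables

# Predicted mean percent obese for a hypothetical state with 55% active individuals
print(fit.intercept + fit.slope * 55)
```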
 Simple Linear Regression Assumptions
Key assumptions
 linear relationship exists between Y and x
*we say the relationship between Y and x is linear if the
means of the conditional distributions of Y | x lie on a
straight line
 independent errors
(this essentially equates to independent observations
in the case of SLR)
 constant variance of errors
 normally distributed errors
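The sketch below is a small simulation, with arbitrarily chosen parameter values, that generates data satisfying the model Yi = β0 + β1xi + Єi with iid normal errors and then screens the fitted residuals for the constant-variance and normality assumptions.

```python
# Simulating data that satisfy the SLR assumptions, then checking residuals.
# All parameter values are arbitrary and used only for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
b0, b1, sigma = 2.0, 0.5, 1.0

x = rng.uniform(0, 10, size=100)
y = b0 + b1 * x + rng.normal(0, sigma, size=100)   # iid N(0, sigma^2) errors

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Normality of errors: Shapiro-Wilk test on the residuals
print("Shapiro-Wilk p =", stats.shapiro(residuals).pvalue)

# Constant variance: compare residual spread in the lower and upper halves of x
lower = residuals[x < np.median(x)]
upper = residuals[x >= np.median(x)]
print("Levene p =", stats.levene(lower, upper).pvalue)
```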
9. CHI-SQUARE
What Is a Chi-Square Statistic?
A chi-square (χ2) statistic is a test that measures
how a model compares to actual observed data. The
data used in calculating a chi-square statistic must
be random, raw, mutually exclusive, drawn from
independent variables, and drawn from a large
enough sample. For example, the results of tossing
a fair coin meet these criteria.
Chi-square tests are often used in hypothesis testing. The chi-square
statistic compares the size of any discrepancies between the
expected results and the actual results, given the size of the
sample and the number of variables in the relationship. For these
tests, degrees of freedom are utilized to determine if a
certain null hypothesis can be rejected based on the total number
of variables and samples within the experiment. As with any
statistic, the larger the sample size, the more reliable the results.
Chi-square (χ2)
 A chi-square (χ2) statistic is a measure of the difference between the
observed (fo) and expected (fe) frequencies of the outcomes of a set of
events or variables.
 χ2 depends on the size of the difference between the observed and
expected values, the degrees of freedom, and the sample size.
Goodness of fit
 As a test of goodness of fit, Pearson made use of the chi-square
distribution to devise a test for determining how well experimentally
obtained results fit the results expected theoretically under some
hypothesis.
 χ2 can be used to test whether two variables are related
or independent from one another or to test the goodness-
of-fit between an observed distribution and a theoretical
distribution of frequencies.
What Does a Chi-Square Statistic Tell You?
 There are two main kinds of chi-square tests: the test of
independence, which asks a question of relationship, such as, "Is
there a relationship between student sex and course choice?"; and
the goodness-of-fit test, which asks something like "How well does
the coin in my hand match a theoretically fair coin?"
 When considering student sex and course choice, a χ2 test for
independence could be used. To do this test, the researcher
would collect data on the two chosen variables (sex and courses
picked) and then compare the frequencies at which male and
female students select among the offered classes using the
χ2 formula.
 If there is no relationship between sex and course selection
(that is, if they are independent), then the actual frequencies
at which male and female students select each offered course
should be expected to be approximately equal, or conversely,
the proportion of male and female students in any selected
course should be approximately equal to the proportion of
male and female students in the sample.
 A χ2 test for independence can tell us how likely it is that
random chance can explain any observed difference between
the actual frequencies in the data and these theoretical
expectations.
 χ2 provides a way to test how well a sample of data matches
the (known or assumed) characteristics of the larger population
that the sample is intended to represent. If the sample data do
not fit the expected properties of the population that we are
interested in, then we would not want to use this sample to
draw conclusions about the larger population.
 For example, consider an imaginary coin with exactly a 50/50
chance of landing heads or tails, and a real coin that you toss
100 times. If this real coin is fair, then it will also have
an equal probability of landing on either side, and the expected
result of tossing the coin 100 times is that heads will come up
50 times and tails will come up 50 times.
 In this case, χ2 can tell us how well the actual results of 100
coin flips compare to the theoretical model that a fair coin will
give 50/50 results. The actual toss could come up 50/50, or
60/40, or even 90/10. The farther the actual results of the
100 tosses are from 50/50, the worse the fit of this set of tosses
is to the theoretical expectation of 50/50, and the more likely we
might conclude that this coin is not actually a fair coin.
Null hypothesis:
 The null hypothesis is set up as: there is no actual difference between
the observed frequencies (derived from the experimental results) and the
expected frequencies (derived on the basis of some hypothesis, such as
equal probability/chance or a normal distribution).
 Computation of the value of chi-square. The formula is:
χ2 = Σ [(fo - fe)² / fe]
Application of this formula requires the following computations:
1. Computation of the expected frequencies based on some hypothesis.
2. Finding the square of the difference between the observed and
expected frequencies, dividing it by the expected frequency in each
case, and summing these quotients.
Determining the Number of Degrees of Freedom
 The data in chi square problems are usually available or may be
arranged in the form of a contingency table showing observed and
expected frequencies in some specific rows and columns.
The formula for the computation of the number of degrees of
freedom in a chi-square problem usually runs as follows:
df = (r - 1)(c - 1)
where
 r = number of rows in the contingency table
 c = number of columns in the contingency table
Determining the Critical Value of χ2
 As for the z and t tests of significance, there exists a table of the
critical values of χ2 required for significance at a predetermined
significance level (5% or 1%) for the computed degrees of freedom. The
desired critical value of χ2 may thus be read from Table F given in the
Appendix of this text.
Example (goodness of fit, 100 tosses of a coin):

Outcome   fo    fe    fo - fe   (fo - fe)²   (fo - fe)²/fe
Heads     40    50    -10       100          2
Tails     60    50    10        100          2
Total     100   100                          χ2 = 4
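A minimal Python sketch of the same coin goodness-of-fit test, assuming scipy is available; it reproduces χ2 = 4 and compares it with the critical values of χ2 for df = 1.

```python
# Goodness-of-fit test for the coin tabulated above: 40 heads and 60 tails
# observed in 100 tosses, 50/50 expected under the fair-coin hypothesis.
from scipy import stats

observed = [40, 60]
expected = [50, 50]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.1f}, p = {p:.4f}")          # chi-square = 4.0

critical_05 = stats.chi2.ppf(0.95, df=1)                # about 3.84
critical_01 = stats.chi2.ppf(0.99, df=1)                # about 6.63
print(critical_05, critical_01)
# 4.0 exceeds 3.84 but not 6.63: the departure from a fair coin is
# significant at the 5% level but not at the 1% level.
```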
Example: Chi-square test of independence between two variables.
100 boys and 60 girls were asked to select one of five elective
subjects. The choices of the two sexes were tabulated separately.

Sex     A    B    C    D    E    Total
Boys    25   30   10   25   10   100
Girls   10   15   5    15   15   60
Solution:
Step 1. Establishing a null hypothesis-Null hypothesis. The choice
of the subject is independent of sex, i.e. there exists no significant
difference between the choices of the two sexes.
Step 2. Computation of the expected frequencies. For this purpose,
we first present the given data in the contingency table form. The
expected frequencies, after being computed, are written within
brackets just below the respective observed frequencies. The
contingency table and the process of computing the expected
frequencies are given in the table below.
Contingency table (expected frequencies in brackets):

Sex     A         B         C        D       E         Total
Boys    25        30        10       25      10        100
        (21.9)    (28.1)    (9.4)    (25)    (15.6)
Girls   10        15        5        15      15        60
        (13.1)    (16.9)    (5.6)    (15)    (9.4)
Total   35        45        15       40      25        160
Computation of expected frequencies:
100 x 35/160 = 21.9     60 x 35/160 = 13.1
100 x 45/160 = 28.1     60 x 45/160 = 16.9
100 x 15/160 = 9.4      60 x 15/160 = 5.6
100 x 40/160 = 25       60 x 40/160 = 15
100 x 25/160 = 15.6     60 x 25/160 = 9.4
Step 3. Computation of the value of χ2:

fo      fe      fo - fe   (fo - fe)²   (fo - fe)²/fe
25      21.9    3.1       9.61         0.439
30      28.1    1.9       3.61         0.128
10      9.4     0.6       0.36         0.038
25      25      0         0            0.000
10      15.6    -5.6      31.36        2.010
10      13.1    -3.1      9.61         0.733
15      16.9    -1.9      3.61         0.214
5       5.6     -0.6      0.36         0.064
15      15      0         0            0.000
15      9.4     5.6       31.36        3.336
Total   160     160                    χ2 = 6.962
Step 4. Testing the null hypothesis
 Number of degrees of freedom = (r - 1)(c - 1) = (2 - 1)(5 - 1) = 4. For 4
degrees of freedom, from Table F of the χ2 distribution:
 Critical value of χ2 = 9.488 at the 5% level of significance.
 Critical value of χ2 = 13.277 at the 1% level of significance.
 The computed value of χ2 (6.962) is much less than the critical value of
χ2 at both the 5% and 1% levels of significance.
 Hence it is to be taken as non-significant. Consequently, the null
hypothesis cannot be rejected: the choice of subject is quite independent
of sex.
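As a cross-check of this worked example, the sketch below runs the same test of independence with scipy.stats.chi2_contingency. Because scipy carries the expected frequencies without rounding, the statistic comes out near 7.0 rather than the 6.962 obtained above with rounded values; the decision (do not reject the null hypothesis) is unchanged.

```python
# Chi-square test of independence for sex vs. subject choice, using the
# observed frequencies from the contingency table above.
from scipy import stats

observed = [
    [25, 30, 10, 25, 10],   # Boys
    [10, 15,  5, 15, 15],   # Girls
]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
print(expected)   # unrounded expected frequencies, e.g. 21.875 for Boys/A

critical_05 = stats.chi2.ppf(0.95, df=dof)   # about 9.49 for df = 4
print(critical_05)
# The computed chi-square is below the critical value, so the null
# hypothesis of independence between sex and subject choice is retained.
```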
  • 10. Score o It is the raw data collected o It is an individual value for a variable Example: Score on a test, self-esteem score 1.2 Descriptive Statistics Descriptive Statistics o Refers to procedures for organizing, summarizing and describing quantitative data about the sample or about the population o Does not involve the drawing of conclusions or drawing inferences from the sample or population
  • 11. o Data can be summarized by: 1) Tabular representation- Frequency distribution 2. Graphic Representation o Bar-graph, phi chart, histogram, polygon, etc. 3. Numerical Representation o A single value present many numbers such as: a) Measures of central tendency b) Measures of variability c) Measures of association/ relationship, etc. 1.3 Inferential Statistics  It is the method that used to draw conclusions or inferences about characteristics of populations based on data from a sample.
  • 12. o It uses data gathered from a sample to make decisions/inferences about the larger population from which the samples was drawn.  Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample  What can we infer about a population’s parameters based on a sample’s statistics  The major inferential statistical methods are:  The t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), Regression Analysis, etc.
  • 13. Population o It is the set of all the individuals of interest in a particular study o Anything we might be interested in studying Example: College students, elder people, single parent families, people with depression, etc. Sample o A set of data or individuals selected from a population oIt is the subset of the population oUsually intended to represent the population in a research study
  • 14. Parameter oIt is a descriptive measure of a population oIt is a value that describe a population oUsually derived from measurements of the individuals in the population oIt is the real entities of interest Example. Mean, variance, standard deviation, etc. of the population Statistic o A descriptive measure of a sample o A value that describes of a sample o Usually derived from measurements of individuals in the sample. o Are guesses at reality Example: Mean, variance, standard deviation, etc. of the
  • 15. 1.4 Function and significance of statistics in education Function of Statistics: 1. It helps in collecting and presenting the data in a systematic manner. 2. It helps to understand unwisely and complex data by simplifying it. 3. It helps to classify the data. 4. It provides basis and techniques for making comparison. 5. It helps to study the relationship between different phenomena. 6. It helps to indicate the trend of behaviour. 7. It helps to formulate the hypothesis and test it. 8. It helps to draw rational conclusions.
  • 16. Significance of Statistics in Education 1. It helps the teacher to provide the most exact type of description. 2. It makes the teacher definite and exact in procedures and thinking. 3. It enables the teacher to summarize the results in a meaningful and convenient form. 4. It enables the teacher to draw general conclusions. 5. It helps the teacher to predict the future perfor- mance of the pupils. 6. Statistics enables the teacher to analyse some of the causal factors underlying complex and otherwise be- wildering events.
  • 17. 1.5 Types and levels of measurement scale 1. What is measurement?  Measurement is the process of assigning numbers to objects and events according to logically acceptable rules. 2. What are levels/scales of measurement?  There are four levels/scales of me asurement, these are:  Nominal Scale  Ordinal Scale  Interval Scale and  Ratio Scale
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.  Before discussing these one by one, it is better to look at some of the attributes possessed by these scales. a) Magnitude: The quantitative value that exist to an attribute, will determine whether the value of a given attribute is >, < or = than other b) Equal Interval: The same magnitude (value interval) with in the same interval of any other two points along the same scale c) Absolute zero point: Refers to the absence or presenc of true zero (0) point. Whether the attribute has no value at all.
  • 24.  Nominal Scale A Nominal Scale is a measurement scale, in which numbers serve as “tags” or “labels” only, to identify or classify an object. A nominal scale measurement normally deals only with non- numeric (quantitative) variables or where numbers have no value. Below is an example of Nominal level of measurement.  Also known as classificatory scale because it has the property of classification and labeling.  The objects are classified or labeled without ranking or order associated with such classified data.  It reflects qualitative differences rather than quantitative ones
  • 25.  The least precise method of quantification and does not have any of the three attributes (magnitude, equal interval and absolute zero point)  Uses categorical/qualitative variables and Found by counting Example: Yes/No, Pass/Fail, Male/Female, political party affiliation (democratic, republican 0r independent), nationality, race, occupation, religious affiliation, football players number,
  • 26.  Ordinal Scale Ordinal Scale is defined as a variable measurement scale used to simply depict the order of variables and not the difference between each of the variables. These scales are generally used to depict non-mathematical ideas such as frequency, satisfaction, happiness, a degree of pain, etc. It is quite straight forward to remember the implementation of this scale as ‘Ordinal’ sounds similar to ‘Order’, which is exactly the purpose of this scale.
  • 27.  It is also termed as ranking scale and possesses only the attributes of magnitude.  Incorporates the property of nominal scale and in addition it introduces the meaning of ordering (ranking).  Has no absolute values and the real difference between adjacent ranks may not be equal.  Numbers are used as labels and do not indicate amount or quantity. It is used simply as description.  Uses categorical/qualitative variables.
  • 28. Example: Students in rank, professional career structures, likert scale of pain (No pain, low pain, moderate pain, high pain), marathon runners medals award (Gold, silver, bronze), position at the end of race-1st, 2nd, 3rd, etc.), state of buildings (Excellent condition, moderate condition, poor condition).
  • 29.  Interval Scale Interval Scale is defined as a numerical scale where the order of the variables is known as well as the difference between these variables. Variables that have familiar, constant, and computable differences are classified using the Interval scale. It is easy to remember the primary role of this scale too, ‘Interval’ indicates ‘distance between two entities’, which is what Interval scale helps in achieving.  Is an arbitrary scale based on equal units of measurement that indicate how much of a given characteristics is present.  Possesses two of the attributes, that are magnitude and equal intervals. Zero point is arbitrary in this scale.  Uses quantitative variables.
  • 30.  The difference amount of characteristics possessed by persons with scales of 90 and 91 is assumed to be equivalent to that between persons with scores of 60 and 61 or the difference in temperature between 10OF and 20OF is the same as the difference between 80OF and 90OF  Its primary limitation is the lack of absolute zero point.  It does not have the capacity to measure the complete absence of the trait, and rations between numbers on the scale are not meaningful (arbitrary).  we cannot say, for example, that 40OF is half as hot as 80OF or twice as hot as 20OF
  • 31. oTherefore, numbers cannot be multiplied and divided directly. oPsychological tests and inventories are interval scales and have this limitation, although they can be added, subtracted, multiplied and divided. Example: IQ test, Temperature, Motivation Score, Attitude  Ratio Scale Ratio Scale is defined as a variable measurement scale that not only produces the order of variables but also makes the difference between variables known along with information on the value of true zero. It is calculated by assuming that the variables have an option for zero, the difference between the two variables is the same and there is a specific order between the options.
  • 32. oThe numerals of the ratio scale have the qualities of real numbers and can be added, subtracted, multiplied and divided.  We can say that 5 grams is one half of 10 grams, 15 grams is three times 5 grams, a weight of 0kg means no weight (mass) at all. All statistical measures can be used for a variable measured at the ratio level, since all the necessary mathematical operations are defined. Almost exclusively confined to use in physical sciences. In other words, it is almost non existence in psychology and other social sciences. oUses quantitative variables Example: Mass, length, time, energy, number of correct answers on a test.
  • 33. 2. Introduction to SPSS Software
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75. A frequency distribution is an overview of all distinct values in some variable and the number of times they occur. That is, a frequency distribution tells how frequencies are distributed over values. Frequency distributions are mostly used for summarizing categorical variables. 3. Frequency Distribution:
  • 76. Frequency Distribution:  Is the easiest method of organizing data, which converts raw data into a meaningful pattern for statistical analysis. Lists all the classes and the number of values (frequency) that belong to each class Exhibits how the frequencies are distributed over various categories Besides it makes the data easy and brief to be understood. However, individual score loss its identity.
  • 77. Frequency Distributions and Graphs for Ungrouped Data a) Frequency Distributions for Ungrouped Data First we arrange numbers/data in ascending or descending order.  Then, we tally the numbers/data Finally, we use a frequency table E.g. Age of 20 students 25,20,18,23,21,23,22,24,19,20,18,26,18,25,18,24,18,21,24,18
  • 78. Table 1: Frequency Distribution of Age of 20 students in a class Age Tally Frequency (f) 18 ////// 6 19 / 1 20 // 2 21 // 2 22 / 1 23 // 2 24 /// 3 25 // 2 26 / 1 Total 20
  • 79. b) Graphs for ungrouped/ Discrete Data What is graph in statistics?  It is the geometrical image of frequency distribution It is when frequency distributions are converted into visual models to facilitate understanding (e.g. pie chart/circle graph, bar graph, line graph, histogram, frequency polygon, etc.)
  • 80. For ungrouped data we use bar graph, pie chart/circle graph, line graph, etc. a) Bar Graph/Chart It is a detached graph made up of bars whose heights represent the frequencies of the respective categories. It is used to summarize categorical data.
  • 81. b) Pie chart/Circle Graph It is divided into portions that represent the frequency, relative frequency, or percentage of a population or a sample belonging to different categories. Relative frequency of a category = Frequency of a category/Sum of all frequencies Percentages = Relative frequency x 100%
  • 82. c) Line Graph It is a graph consists of independent and dependent variables, where the former (IV) is written on x-axis and the latter (DV) on y-axis and points are joined by line segments.
  • 83. Frequency Distribution and Graphs for Grouped Data  Some basic technical terms when continuous frequency distribution is formed or data are classified (grouped) according to class interval. a) Class limits  It is the lowest and the highest values that can be included in the class. E.g. 30-40; 30 is the lower (lowest) limit (L) and 40 is the upper (highest) limit (U).  Sometimes, a class limit is missing either at the end of the first class interval or at the end of the last class interval or both are not specified. In such case the classes are known as open-end classes. Example: below 18, 18-20, 21-23, 24-26, 27 and above years of age
  • 84. b) Class boundaries: Are the midpoints of the upper class limit of one class and the lower class limit of the subsequent class (LCB = LCL-0.5, UCB = UCL+.0.5). Example: class limits class boundaries 30-40 29.5-40.5 41-50 40.5-50.5 c) Class interval It is the size of each grouping of data 50-75, 76-101, 102- 127, are class intervals. d) Width or size of the class interval (c) c = Range/no. of classes The difference between the lower or upper class limit of one class and the lower or upper class limit of the previous class.
  • 85. e) Number of Class Intervals Should not be too many (5-20). To decide the number of class intervals, we choose the lowest and the highest values. The difference between them will enable us to determine class intervals. f) Range (R)  The difference between the largest and the smallest value of the observation. g) Mid value or mid point  It is the central point of a class boundary/interval  It is found by adding the upper and lower limits of a class and dividing the sum by 2 Mid point = L+U 2
  • 86. h) Frequency (f)  Number of observations falling within a particular class interval Weight (in Kg) No. of persons (f) 30-40 25 41-50 15 51-60 45 Total 85
  • 87. Rules to Construct Frequency Distribution of Grouped Data 1) There should be between 5-20 classes 2) The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data. Always we have to round up. 3) The classes must be mutually exclusive. Mutually exclusive classes have non over lapping class limits so that data cannot be placed into two classes E.g. 10-20, 21-31, 32-42, 43-53 4) The classes must be continuous. Even if there are no values in a class, the class must be included in the frequency distribution
  • 88. 5. The classes must be exhaustive. There should be enough classes to accommodate all the data 6. The classes must be equal in width. This avoids a distorted view of data. Exception is in open-ended
  • 89.
  • 90.
  • 91. E.g. The number of hours 40 employees spends on their job for the last 30 working days is given below. Construct a group frequency distribution for these data using 8 classes 62 50 35 36 31 43 43 43 41 31 65 30 41 58 49 41 37 62 27 47 65 50 45 48 27 53 40 29 63 34 44 32 58 61 38 41 26 30 47 37
  • 92. Solution: Step1. Max= 65, Mim= 26 that Range= 65- 26= 39 Step2. It is given 8 classes Step3. Class width C= 39/8 = 4.875 ~ 5 Step4. Starting point 26= lower limit of the first class. Hence the lower class limits become 26 31 36 41 46 51 56 61 Step 5. The upper limit of the first class = 31- 1= 30 and hence the upper class limit become 30 35 40 45 50 55 60 65
  • 93. The lower and the upper class limits can be written as follows:
  • 94.
  • 96. Exercise: Construct a group frequency distribution for the weight in kg of 35 students data using 8 classes 52 60 56 81 72 72 55 73 60 58 55 65 75 56 55 81 64 60 50 70 58 70 55 52 56 55 50 64 65 60 50 52 60 53 64
  • 97.  Some basic types of graph we use for grouped/ continuous data are the following: a) Histogram  A histogram is a graph in which class limits/boundaries are marked on the horizontal axis (x) and the frequencies, relative frequencies or percentages are marked on the vertical axis (y).  The frequencies, relative frequencies, percentages are represented by the heights of bars.  The bars are drawn adjacent to each other.  A histogram is called a frequency, a relative frequency or a percentage histogram.  Used to summarize measurement data.
  • 98.  Relative frequency = Frequency of aclass Sum of all frequencies  Percentage = Relative Frequency x 100% Weight (in Kg) No. of persons (f) Relative frequency Percentage (%) 29.5-40.5 25 0.25 25 40.5-50.5 15 0.15 15 50.5-60.5 45 0.45 45 60.5-70.5 10 0.10 10 70.5-80.5 5 0.05 5 Total 100 1 100
  • 99.
  • 100. B) Frequency Polygon A graph formed by joining the midpoints of the tops of successive bars in a histogram with a straight lines. A polygon with a relative frequency marked on the vertical axis is known as a relative frequency polygon. Similarly a polygon with percentages marked on the vertical axis is called a percentage polygon. Class boundaries Frequency Mid point 5.5-10.5 1 8 10.5-15.5 2 13 15.5-20.5 6 18 20.5-25.5 8 23 25.5-30.5 3 28 Total 30
  • 101.
  • 102. C) The Cumulative Frequency Graph/ Ogive  It is a curve drawn for the cumulative frequency, cumulative relative frequency or cumulative percentage by joining with straight lines the dots marked above the upper boundaries of classes at height equal to cumulative frequencies, cumulative relative frequencies or cumulative percentages of the respective classes.
  • 103. Data Example: Class Boundary Frequency Cumulative Frequency Cumulative Relative Frequency Cumulative Percentage 5.5-10.5 1 1 0.05 5 10.5-15.5 2 3 0.15 15 15.5-20.5 6 9 0.45 45 20.5-25.5 8 17 0.85 85 25.5-30.5 3 20 1 100 Total 20
  • 104. Ogive:
  • 105. Less than and More than Cumulative Frequency Example: Class limits Frequency Less than Cf More than Cf 5-9 1 1 35 10-14 2 3 34 15-19 6 9 32 20-24 8 17 26 25-29 15 32 18 30-34 3 35 3 Total
  • 106. 4. Normal Curve and Standard Score Definition: The normal distribution curve can be used to study many variable that are approximately normal.  Normal distribution curve depend on the mean and standard deviation.  when the majority of the data fall to the right of the mean, the distribution said to be negatively skewed. The tail of the curve is to the left and vise a versa. Skewness: It refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. If the curve is shifted to the left or to the right, it is said to be skewed. Skewness can be quantified as a representation of the extent to which a given distribution varies from a normal distribution. A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew.
  • 107. Key Points: Skewness, in statistics, is the degree of asymmetry observed in a probability distribution. Distributions can exhibit right (positive) skewness or left (negative) skewness to varying degrees. A normal distribution (bell curve) exhibits zero skewness. Investors note right-skewness when judging a return distribution because it, like excess kurtosis, better represents the extremes of the data set rather than focusing solely on the average.
  • 108.  Skewedness: when the majority of the data fall to the left or right of the mean, the distribution said to be skewed. The mean of positively skewed data will be greater than the median. In a distribution that is negatively skewed, the exact opposite is the case: the mean of negatively skewed data will be less than the median. If the data graphs symmetrically, the distribution has zero skewness, regardless of how long or fat the tails are.
  • 109. Kurtosis: it is the degree of peakedness of a distribution It is a measure of whether the data are flat/peaked relative to a normal distribution. The data sets with high kurtosis tend to have a distinct sharp peak near the mean (+ve kurtosis), declining rather rapidly and have heavy tail. The data sets with low kurtosis tend to have a flat top (-ve kurtosis) near the mean rather than a sharp peak. Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
  • 110. 1. Mesokurtic: Data that follows a mesokurtic distribution shows an excess kurtosis of zero or close to zero. This means that if the data follows a normal distribution, it follows a mesokurtic distribution. Types of Kurtosis: The types of kurtosis are determined by the excess kurtosis of a particular distribution. The excess kurtosis can take positive or negative values, as well as values close to zero.
  • 111. 2. Leptokurtic: Leptokurtic indicates a positive excess kurtosis. The leptokurtic distribution shows heavy tails on either side, indicating large outliers. In finance, a leptokurtic distribution shows that the investment returns may be prone to extreme values on either side. Therefore, an investment whose returns follow a leptokurtic distribution is considered to be risky.
  • 112. 3. Platykurtic: A platykurtic distribution shows a negative excess kurtosis. The kurtosis reveals a distribution with flat tails. The flat tails indicate the small outliers in a distribution. In the finance context, the platykurtic distribution of the investment returns is desirable for investors because there is a small probability that the investment would experience extreme returns.
• 113. Characteristics of Normal Curve:  The curve is a bell-shaped distribution of a variable.  The mean, median and mode are equal and located at the center of the distribution.  It is a unimodal distribution (only one mode).  The curve is symmetrical about the mean.  The curve is continuous, i.e. there are no gaps.  The curve never touches the x-axis, but it gets increasingly closer to the axis.  The total area under the normal distribution curve is equal to 1 or 100%. This fact may seem unusual, since the curve never touches the x-axis.  The area under the normal curve that lies between -1 and +1 s is approximately 0.68 or 68%  between -2 and +2 s is approximately 0.95 or 95%  between -3 and +3 s is approximately 0.997 or 99.7%
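The 68%, 95% and 99.7% areas quoted above can be verified numerically. The following is a minimal sketch, assuming SciPy is available, that evaluates the area under the standard normal curve between -k and +k standard deviations.

```python
from scipy.stats import norm

# Area under the standard normal curve between -k and +k standard deviations
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"P(-{k} < Z < {k}) = {area:.4f}")
# Approximate output: 0.6827, 0.9545, 0.9973 (the 68-95-99.7 rule)
```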
• 123. 5. Confidence Interval for the Mean, Proportions, and Variances In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data. It gives a range of values for an unknown parameter (for example, a population mean). The interval has an associated confidence level that gives the probability with which the estimated interval will contain the true value of the parameter. The confidence level is chosen by the investigator. For a given estimation in a given sample, using a higher confidence level generates a wider (i.e., less precise) confidence interval. In general terms, a confidence interval for an unknown parameter is based on the sampling distribution of a corresponding estimator.
• 124.  Confidence Interval for the Mean: A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. The common notation for the parameter in question is θ. Often, this parameter is the population mean μ, which is estimated through the sample mean x̄. The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter θ.
• 125. Example: Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% confidence level? In other words, the student wishes to estimate the true mean boiling temperature of the liquid using the results of his measurements. If the measurements follow a normal distribution, then the sample mean will have the distribution N(μ, σ/√n). Since the sample size is 6, the standard deviation of the sample mean is equal to 1.2/sqrt(6) = 0.49.
  • 126. In the example above, the student calculated the sample mean of the boiling temperatures to be 101.82, with standard deviation 0.49. The critical value for a 95% confidence interval is 1.96, where (1-0.95)/2 = 0.025. A 95% confidence interval for the unknown mean is ((101.82 - (1.96*0.49)), (101.82 + (1.96*0.49))) = (101.82 - 0.96, 101.82 + 0.96) = (100.86, 102.78). As the level of confidence decreases, the size of the corresponding interval will decrease. Suppose the student was interested in a 90% confidence interval for the boiling temperature. In this case, C = 0.90, and (1-C)/2 = 0.05. The critical value z* for this level is equal to 1.645, so the 90% confidence interval is ((101.82 - (1.645*0.49)), (101.82 + (1.645*0.49))) = (101.82 - 0.81, 101.82 + 0.81) = (101.01, 102.63)
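The same interval can be reproduced numerically. This is a small sketch, assuming NumPy and SciPy are available, using the six readings and the known standard deviation of 1.2 from the example.

```python
import numpy as np
from scipy.stats import norm

readings = np.array([102.5, 101.7, 103.1, 100.9, 100.5, 102.2])
sigma = 1.2                                # known standard deviation of the procedure
xbar = readings.mean()                     # about 101.82
se = sigma / np.sqrt(len(readings))        # about 0.49

for conf in (0.95, 0.90):
    z = norm.ppf(1 - (1 - conf) / 2)       # 1.96 for 95%, about 1.645 for 90%
    lower, upper = xbar - z * se, xbar + z * se
    print(f"{int(conf * 100)}% CI: ({lower:.2f}, {upper:.2f})")
# Roughly (100.86, 102.78) and (101.01, 102.63), as in the worked example
```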
  • 127.  Confidence Interval for the Proportions: Suppose we wish to estimate the proportion of people with diabetes in a population or the proportion of people with hypertension or obesity. These diagnoses are defined by specific levels of laboratory tests and measurements of blood pressure and body mass index, respectively. Subjects are defined as having these diagnoses or not, based on the definitions. When the outcome of interest is dichotomous like this, the record for each member of the sample indicates having the condition or characteristic of interest or not. Recall that for dichotomous outcomes the investigator defines one of the outcomes a "success" and the other a failure. The sample size is denoted by n, and we let x denote the number of "successes" in the sample.
• 128. For example, if we wish to estimate the proportion of people with diabetes in a population, we consider a diagnosis of diabetes as a "success" (i.e., an individual who has the outcome of interest), and we consider lack of diagnosis of diabetes as a "failure." In this example, X represents the number of people with a diagnosis of diabetes in the sample. The sample proportion is p̂ (called "p-hat"), and it is computed by taking the ratio of the number of successes in the sample to the sample size, that is: p̂ = x / n.
• 129. The point estimate for the population proportion is the sample proportion, and the margin of error is the product of the Z value for the desired confidence level (e.g., Z = 1.96 for 95% confidence) and the standard error of the point estimate. In other words, the standard error of the point estimate is sqrt[p̂(1 − p̂)/n], so the confidence interval is p̂ ± Z·sqrt[p̂(1 − p̂)/n]. This formula is appropriate for samples with at least 5 successes and at least 5 failures in the sample. This was a condition for the Central Limit Theorem for binomial outcomes. If there are fewer than 5 successes or failures, then alternative procedures, called exact methods, must be used to estimate the population proportion.
• 130. Example: During the 7th examination of the Offspring cohort in the Framingham Heart Study there were 1,219 participants being treated for hypertension and 2,313 who were not on treatment. If we call treatment a "success", then x = 1219 and n = 3532. The sample proportion is p̂ = 1219/3532 = 0.345. This is the point estimate, i.e., our best estimate of the proportion of the population on treatment for hypertension is 34.5%. The sample is large, so the confidence interval can be computed using the formula p̂ ± 1.96·sqrt[p̂(1 − p̂)/n].
• 131. Substituting our values we get 0.345 ± 1.96·sqrt[0.345(1 − 0.345)/3532], which is 0.345 ± 0.016. So, the 95% confidence interval is (0.329, 0.361). Thus we are 95% confident that the true proportion of persons on antihypertensive medication is between 32.9% and 36.1%.
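A short sketch of the same computation, assuming NumPy and SciPy are available, with x = 1219 and n = 3532 taken from the example above.

```python
import numpy as np
from scipy.stats import norm

x, n = 1219, 3532                       # treated for hypertension / total sample
p_hat = x / n                           # about 0.345
se = np.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
z = norm.ppf(0.975)                     # 1.96 for a 95% interval

print(round(p_hat - z * se, 3), round(p_hat + z * se, 3))  # roughly 0.329, 0.361
```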
• 132.  Confidence Interval for the Variances: When we use a sample to calculate a statistic, we are estimating a population parameter. It is only an estimate and, because of the nature of drawing a sample, the sample may not produce a value (statistic) that is close to the actual value (parameter). We can calculate a confidence interval about the statistic to determine where the true, and often unknown, parameter may lie. This includes the calculation of a variance statistic. If you were to draw many different samples, all of the same size, from a population and plot the variance statistic, the resulting distribution is likely to fit a χ2 distribution. Plotting the means creates a normal distribution, which is symmetrical and produces symmetrical confidence intervals. The χ2 distribution is not symmetrical and will produce asymmetrical intervals.
  • 133. Confidence Intervals for Variances and Standard Deviations: Estimates of population means can be made from sample means, and confidence intervals can be constructed to better describe those estimates. Similarly, we can estimate a population standard deviation from a sample standard deviation, and when the original population is normally distributed, we can construct confidence intervals of the standard deviation as well. The Theory: Variances and standard deviations are a very different type of measure than an average, so we can expect some major differences in the way estimates are made. We know that the population variance formula, when used on a sample, does not give an unbiased estimate of the population variance. In fact, it tends to underestimate the actual population variance. For that reason, there are two formulas for variance, one for a population and one for a sample.
• 134. The sample variance formula is an unbiased estimator of the population variance. (Unfortunately, the sample standard deviation is still a biased estimator.) Also, both variance and standard deviation are nonnegative numbers. Since neither can take on a negative value, the domain of the probability distribution for either one is not (−∞, ∞), thus the normal distribution cannot be the distribution of a variance or a standard deviation. The correct PDF must have a domain of [0, ∞). It can be shown that if the original population of data is normally distributed, then the expression (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom.
• 135. The chi-square distribution of the quantity (n − 1)s²/σ² allows us to construct confidence intervals for the variance and the standard deviation (when the original population of data is normally distributed). For a confidence level 1 − α, we will have the inequality χ²(1 − α/2) ≤ (n − 1)s²/σ² ≤ χ²(α/2). Solving this inequality for the population variance σ², and then for the population standard deviation σ, leads us to the following pair of confidence intervals: (n − 1)s²/χ²(α/2) ≤ σ² ≤ (n − 1)s²/χ²(1 − α/2), with the square roots of these limits giving the interval for σ. It is worth noting that since the chi-square distribution is not symmetric, we will be obtaining confidence intervals that are not symmetric about the point estimate.
• 136. Example: A statistician chooses 27 randomly selected dates, and when examining the occupancy records of a particular motel for those dates, finds a standard deviation of 5.86 rooms rented. If the number of rooms rented is normally distributed, find the 95% confidence interval for the population standard deviation of the number of rooms rented. For a sample size of n = 27, we will have df = n − 1 = 26 degrees of freedom. For a 95% confidence interval, we have α = 0.05, which gives 2.5% of the area at each end of the chi-square distribution. We find values of χ²(0.975) = 13.844 and χ²(0.025) = 41.923. Evaluating (n − 1)s²/χ², we obtain 21.297 and 64.492. This leads to the inequalities 21.297 ≤ σ² ≤ 64.492 for the variance, and, taking square roots, 4.615 ≤ σ ≤ 8.031 for the standard deviation.
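The motel-occupancy interval above can be checked with a few lines of code. This sketch assumes SciPy is available and uses the chi-square percent points for 26 degrees of freedom.

```python
from scipy.stats import chi2

n, s = 27, 5.86                 # sample size and sample standard deviation
df = n - 1
alpha = 0.05

chi2_upper = chi2.ppf(1 - alpha / 2, df)   # about 41.923
chi2_lower = chi2.ppf(alpha / 2, df)       # about 13.844

var_low = df * s**2 / chi2_upper           # about 21.3
var_high = df * s**2 / chi2_lower          # about 64.5
print(var_low, var_high)                   # CI for the variance
print(var_low**0.5, var_high**0.5)         # CI for the standard deviation, about 4.6 to 8.0
```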
• 140. Calculate the variance and standard deviation of the data given below: 35, 45, 30, 35, 40, 25. Variance and standard deviation for grouped data:
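For the ungrouped data listed above, a minimal sketch (assuming NumPy is available) computes both the population and the sample versions of the variance and standard deviation; which version is wanted should be stated in the exercise.

```python
import numpy as np

scores = np.array([35, 45, 30, 35, 40, 25])

mean = scores.mean()             # 35.0
pop_var = scores.var()           # sum of squared deviations / N = 250/6, about 41.67
pop_sd = scores.std()            # about 6.45
samp_var = scores.var(ddof=1)    # sum of squared deviations / (N - 1) = 50.0
samp_sd = scores.std(ddof=1)     # about 7.07

print(mean, pop_var, pop_sd, samp_var, samp_sd)
```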
  • 143. 6. Hypothesis Testing with One and two Sample Hypothesis is a suggested answer to the problem under investigation. – John T. Townsend A hypothesis is a tentative generalization, the validity of which remains to be tested. – J. W. Best A hypothesis is a proposition which can be put to test to determine its validity. It may be proved correct or incorrect. – Good & Hatt A hypothesis is a conjectural/hypothetical statement of the relation between two or more variables. – F. N. Kerlinger Meaning of Hypothesis:  On the basis of the definitions, we can say that hypothesis is an assumption that is still not proved but shows the probable solution of the problem or predicts the relationship between two or more variables.  The assumption is proved true or false by testing it.
• 144.  We will not have the solution to the problem until the assumption is tested.  Three points regarding such assumptions are very important:  The assumptions are made on the basis of previous experiences, primary evidence, or logical thinking.  Whether the assumptions are true or false is decided by testing them.  Testing of assumptions leads to the solution of the problem.  Suppose we are watching some television programme and suddenly the TV goes off.  What will be our reactions to this problem? We start thinking of possible reasons for the problem, such as:
• 145.  Perhaps there is an interruption in the flow of electricity, or  There may be a problem in a particular channel, or  There may be a loose connection of the cable with the TV, or  There may be a problem in the system of the cable operator.  We make such assumptions on the basis of our previous experiences. Now we check all the possible reasons for the problem. First, we check whether there is any problem in the flow of electricity.  If the electric supply is found okay, we check whether other channels are working or not.  If other channels are found okay, we check whether the cable connection is proper or not.  If everything is found okay, then we call the cable operator to solve the problem.
• 146.  In this way, we collect the evidence and analyse it logically. By testing all the evidence, we come to a conclusion about the solution of the problem. We make many assumptions in our routine life to find solutions to our daily problems. Characteristics of a good hypothesis:  A good hypothesis never opposes universal truths, natural laws and rules.  It is written in simple and easy language.  Only one assumption is made in one hypothesis.  The hypothesis is written in such a language that, after testing, it can be clearly rejected or not rejected.
• 147.  A hypothesis is written in the present tense because it is not a prediction or opinion but an assumption based on present factual information.  A good hypothesis assures that the tool required for testing it is available or can be prepared (developed) easily.  Before formulating the hypothesis, it is assured that the data will be available for testing it. Types of Hypothesis: It is very difficult to give a classification of hypotheses that can be accepted universally, because different scholars have classified hypotheses in different ways.
• 148.  On the basis of different classifications, the types of hypotheses can be described as follows.  A hypothesis is of two types:  Null hypothesis: A null hypothesis is a type of hypothesis in which we assume that the sample observations result purely from chance. It is denoted by H0.  Alternate hypothesis: The alternate hypothesis is a hypothesis in which we assume that sample observations are not due to chance; they are affected by some non-random situation. An alternate/declarative hypothesis is denoted by H1, H2, H3 … Hn or Ha.  Some special features of alternate/declarative hypotheses are as follows:  The researcher formulates declarative hypotheses on the basis of prior experience, the study of research material, or the findings of previous researches.
• 149.  Such hypotheses are formulated on the basis of the expected findings of the research.  Such a hypothesis is accepted when the null hypothesis is rejected. Hypothesis in Question Form:  In this type of hypothesis, instead of expecting a certain result, a question is formed asking whether a certain type of result will be there or not. E.g., in the context of the research title 'Study of exam anxiety of higher secondary school students in the context of their stream', the question-type hypotheses will be as follows:  Is there a difference between the exam anxiety of arts, commerce and science students of higher secondary schools? OR
• 150.  Is the exam anxiety of arts students of higher secondary schools more than that of science students? OR  Is the exam anxiety of commerce students of higher secondary schools more than that of science students? OR  Is the exam anxiety of commerce students of higher secondary schools more than that of arts students? Null Hypothesis: If, in the context of the dependent variable, the hypothesis indicates 'no difference' between two or more levels of the independent variable, it is called a null hypothesis. In a correlational study, the null hypothesis indicates no relationship between the two variables.
• 151.  The null hypothesis is indicated by the symbol Ho. Such a hypothesis is also called a 'no difference' type of hypothesis or a 'no relation' type of hypothesis.  Let us take an example to understand this type of hypothesis.  For the study of the impact of instructional method on the achievement of grade nine students in English, a null hypothesis would be: there is no significant difference in the English achievement of grade nine students taught by different instructional methods. Special features of the Null Hypothesis:  Researchers prefer to formulate null hypotheses due to some of their special features. These features are as follows:  It is formulated objectively and is not affected by the subjectivity of the researcher.  It assumes 'no difference' or 'no relationship', so the researcher does not tend to be biased toward a certain type of result and works freely.
• 152.  It helps in making the entire research process objective (unbiased).  It is tested at a certain level of significance. Hypothesis Testing: Hypothesis testing is a part of statistics in which we make assumptions about a population parameter. Hypothesis testing specifies a proper procedure for analysing a random sample from the population in order to accept or reject the assumption. Steps of Hypothesis Testing:  The process of determining whether to reject a null hypothesis or to fail to reject it, based on sample data, is called hypothesis testing. It consists of four steps:  Define the null and alternate hypotheses
• 153.  Define an analysis plan describing how the sample data will be used to evaluate the null hypothesis  Carry out the analysis on the sample data to produce a single number called the 'test statistic'  Interpret the result by applying the decision rule to check whether the null hypothesis can be rejected.  If the p-value of the test statistic is less than the significance level, we reject the null hypothesis; otherwise, we fail to reject the null hypothesis.  Technically, we never accept the null hypothesis; we say that we either fail to reject or reject the null hypothesis.
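As an illustration of these steps, the sketch below runs a one-sample t test on a small set of hypothetical achievement scores (the data and the hypothesised mean of 50 are invented for illustration, not taken from the module), assuming SciPy and NumPy are available.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical achievement scores; H0: the population mean is 50 (illustrative only)
scores = np.array([52, 48, 55, 60, 47, 53, 58, 49])
alpha = 0.05

t_stat, p_value = ttest_1samp(scores, popmean=50)

if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: fail to reject the null hypothesis")
# For these made-up scores t is about 1.65 and p about 0.14, so H0 is not rejected.
```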
• 154. Errors in hypothesis testing: We have explained what hypothesis testing is and the steps for doing the testing. While performing hypothesis testing, there might be some errors.  We classify these errors into two categories.  Type-1 error: A Type 1 error is the case when we reject the null hypothesis although it is actually true. The probability of making a Type-1 error is called the significance level alpha (α).  Type-2 error: A Type II error is a statistical term used within the context of hypothesis testing that describes the error that occurs when one accepts (fails to reject) a null hypothesis that is actually false, or
• 155.  A Type 2 error is the case when we fail to reject the null hypothesis although it is actually false. A Type II error produces a false negative, also known as an error of omission. For example, a test for a disease may report a negative result when the patient is, in fact, infected.  This is a Type II error because we accept the conclusion of the test as negative, even though it is incorrect.  In effect, the error retains the null hypothesis and rejects the alternative hypothesis, even though the alternative is true.  The probability of making a Type-2 error is called beta (β). Therefore,  α = P(null hypothesis rejected | null hypothesis is true)  β = P(null hypothesis accepted | null hypothesis is false)
• 156. Two-Tailed test: All that we are interested in is a difference. Consequently, when an experimenter wishes to test the null hypothesis H0: M1 = M2 against its possible rejection and finds that it is rejected, he may conclude that a difference really exists between the two means. But he does not make any assertion about the direction of the difference. Such a test is a non-directional test. It is also called a two-tailed test, because it employs both sides, positive and negative, of the distribution (normal or t distribution) in the estimation of probabilities. Let us consider the probability at the 5% significance level in a two-tailed test with the help of Figure 1.1.
• 157. Figure 1.1: Two-tailed test at the 5% level (mean of the distribution of scores concerning the difference between sample means)
• 158.  Therefore, while using both tails of the distribution, we may say that 2.5% of the area of the normal curve falls to the right of 1.96 standard deviation units above the mean and 2.5% falls to the left of 1.96 standard deviation units below the mean.  The area outside these limits is 5% of the total area under the curve. In this way, for testing significance at the 5% level, we may reject a null hypothesis if the computed critical ratio (the difference between means divided by its standard error) reaches or exceeds the yardstick of 1.96.  Similarly, we find that a value of 2.58 is required for significance at the 1% level in the case of a two-tailed test.
• 159. One-tailed test:  Here, we have reason to believe that the experimental group will score higher on the mathematical computation ability test which is given at the end of the session.  In our experiment we are interested in finding out the gain in the acquisition of mathematical computation skills (we are not interested in a loss, as it seldom happens that coaching will decrease the level of computation skill).  In such cases, we make use of the one-tailed or directional test, rather than the two-tailed or non-directional test, to test the significance of the difference between means.  Consequently, the meaning of the null hypothesis, restricted to a hypothesis of no difference in a two-tailed test, is somewhat extended in a one-tailed test to include the direction (positive or negative) of the difference between means.
• 160. Figure 1.2: One-tailed test at the 5% level (mean of the distribution of scores concerning the difference between sample means)
• 161.  Thus, at the 5% level of significance we will have 5% of the area all in one tail (beyond 1.64 standard deviation units above the mean), rather than having it equally divided between two tails as shown in Figure 1.1 for a two-tailed test.  Consequently, in a one-tailed test for the difference between large sample means, a z score of 1.64 is taken as the yardstick at the 5% level of significance for the rejection of the null hypothesis, instead of 1.96 as in the case of a two-tailed test.  In the case of a t distribution for small sample means, when using a one-tailed test we have to look up the critical t values, given in Table C of the Appendix, against the row (N1 + N2 - 2) degrees of freedom under the columns labelled 0.10 and 0.02 (instead of 0.05 and 0.01 as in the case of a two-tailed test) to test significance at the 0.05 and 0.01 levels, respectively.
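The yardsticks 1.96, 2.58 and 1.64 quoted above are simply percent points of the standard normal distribution, and can be recovered as in this short sketch (assuming SciPy is available).

```python
from scipy.stats import norm

# Two-tailed critical values: alpha is split between the two tails
print(norm.ppf(1 - 0.05 / 2))   # about 1.96  (5% level, two-tailed)
print(norm.ppf(1 - 0.01 / 2))   # about 2.58  (1% level, two-tailed)

# One-tailed critical values: all of alpha in one tail
print(norm.ppf(1 - 0.05))       # about 1.64  (5% level, one-tailed)
print(norm.ppf(1 - 0.01))       # about 2.33  (1% level, one-tailed)
```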
• 162. 7. Two-way Analysis of Variance Analysis Of Variance:  We know the use of z and t tests for testing the significance of the difference between the means of two samples or groups. In practice, however, we often have several samples of the same general character drawn from a given population and wish to know whether there are any significant differences between the means of these samples.  In such situations, the differences between the several means are examined together in a statistical technique known as analysis of variance, in which all the data are treated together and a general null hypothesis of no difference among the means of the various samples or groups is tested.
• 163. Meaning of the Term 'Analysis of Variance': A composite procedure for testing simultaneously the difference between several sample means is known as the analysis of variance. It helps us to know whether any of the differences between the means of the given samples are significant. If the answer is 'yes', we examine pairs (with the help of the t test) to see just where the significant differences lie. If the answer is 'no', we do not proceed further. The term 'analysis of variance' deals with the task of analysing, or breaking up, the total variance of a large sample or a population consisting of a number of equal groups or sub-samples into components (two kinds of variances).
  • 164. Two kinds of variances, given as follows: 1. "Within-groups" variance. This is the average variance of the members of each group around their respective group means, i.e. the mean value of the scores in a sample (as members of each group may vary among themselves). 2. "Between-groups" variance. This represents the variance of group means around the total or grand mean of all groups, i.e. the best estimate of the population mean (as the group means may vary considerably from each other). For deriving more useful results, we can use variance as a measure of dispersion (deviation from the mean) in place of some useful measures like standard deviation. Consequently, the variance of an individual's score from the grand mean may be broken into two parts (as pointed out earlier), viz. within-group variance and between groups variance.
• 165. In this way, the technique of analysis of variance, as a single composite test of significance for the difference between several group means, demands the derivation of two estimates of the population variance, one based on the variance of group means (between-groups variance) and the other on the average variance within the groups (within-groups variance). Ultimately, the ratio of the between-groups variance to the within-groups variance, called the F-ratio (F = between-groups variance / within-groups variance), is used as a critical ratio for determining the significance of the difference between group means at a given level of significance.
• 166. Procedure For Calculating The Analysis Of Variance: As already pointed out, the deviation of an individual's score belonging to a sample or a group of the population from the grand mean can be divided into two parts: (i) deviation of the individual's score from the group mean; and (ii) deviation of the group mean from the grand mean. Consequently, the total variance of the scores of all individuals included in the study may be partitioned into within-group variance and between-groups variance. The formula used for the computation of variance (σ²) is Σx²/N, i.e. the sum of the squared deviations from the mean divided by the total number of frequencies.
• 167. Hence, by taking N as the common denominator, the total sum of the squared deviations of scores around the grand or general mean for all groups combined can be made equal to the sum of the two partitioned sums of squares, between-groups and within-groups. Mathematically, Total sum of squares (around the general mean) = between-groups sum of squares + within-groups sum of squares, or SSt = SSb + SSw. Hence the procedure for the analysis of variance involves the following main tasks: Computation of the total sum of squares (SSt); Computation of the between-groups sum of squares (SSb); Computation of the within-groups sum of squares (SSw); Computation of the F-ratio; Use of the t test (if the need for further testing arises).
• 168. We understand these steps by adopting the following terminology: X = raw score of any individual included in the study (any score entry in the given table)  ΣX = grand sum  M = grand or general mean = ΣX/N  X1, X2, … denote scores within the first group, second group, …  n1, n2, n3 = number of individuals in the first, second and third groups  M1, M2, M3, … denote the means of the first group, second group, …  N = total number of scores or frequencies = n1 + n2 + n3 + …
• 169.  Step 1. Arrangement of the given table and computation of some initial values.  In this step, the following values needed in the computation are calculated from the experimental data arranged in proper tabular form:  The sums of squares and the grand sum ΣX  The group means and the grand mean  The correction term C, computed by the formula C = (ΣX)²/N
• 170.
Source of variation | Sum of squares | df | Mean square variance
Between-groups | SSb | K − 1 | SSb / (K − 1)
Within-groups | SSw | N − K | SSw / (N − K)
• 171. Let us illustrate now the whole process of using the analysis of variance technique with the help of an example. Example 1.1: The aim of an experimental study was to determine the effect of three different techniques of training on the learning of a particular skill. Three groups, each consisting of seven students of class PG in Adola college, assigned randomly, were given training through these different techniques. The scores obtained on a performance test were recorded as follows:
Group I | Group II | Group III
3 | 4 | 5
5 | 5 | 5
3 | 3 | 5
1 | 4 | 1
7 | 9 | 7
3 | 5 | 3
6 | 5 | 7
• 172. Test the difference between the groups by adopting the analysis of variance technique. Solution: Step 1. Original table computation (scores of the three groups, with row totals):
Group I | Group II | Group III | Total
3 | 4 | 5 | 12
5 | 5 | 5 | 15
3 | 3 | 5 | 11
1 | 4 | 1 | 6
7 | 9 | 7 | 23
3 | 5 | 3 | 11
6 | 5 | 7 | 18
• 173. Here n1 = n2 = n3 = 7, N = n1 + n2 + n3 = 21. The group sums are 28, 35 and 33, so the grand sum ΣX = 96. Group means: M1 = 28/7 = 4, M2 = 35/7 = 5, M3 = 33/7 = 4.71. Correction term C = (ΣX)²/N = 96²/21 = 438.86.
• 174. Step 2. Squared-table computation:
X1² | X2² | X3² | Total
9 | 16 | 25 | 50
25 | 25 | 25 | 75
9 | 9 | 25 | 43
1 | 16 | 1 | 18
49 | 81 | 49 | 179
9 | 25 | 9 | 43
36 | 25 | 49 | 110
• 175. Steps 3-6. Computation of the sums of squares:
Sum of all squared scores ΣX² = 50 + 75 + 43 + 18 + 179 + 43 + 110 = 518
Total sum of squares SSt = ΣX² − C = 518 − 438.86 = 79.14
Between-groups sum of squares SSb = (28² + 35² + 33²)/7 − C = 442.57 − 438.86 ≈ 3.72
Within-groups sum of squares SSw = SSt − SSb = 79.14 − 3.72 ≈ 75.43
• 177. Step 7. Calculation of the F-ratio:
Source of variation | Sum of squares | df | Mean square variance
Between-groups | 3.72 | 2 | 3.72/2 = 1.86
Within-groups | 75.43 | 18 | 75.43/18 = 4.19
F = 1.86/4.19 = 0.444
• 178. Step 8. Interpretation of F-ratio. The F-ratio table (Table R given in the Appendix) is entered with the between-groups (numerator) degrees of freedom, 2, across the top and the within-groups (denominator) degrees of freedom, 18, down the side. The critical values of F are approximately: Critical value of F = 3.55 at the 0.05 level of significance; Critical value of F = 6.01 at the 0.01 level of significance. Our computed value of F (0.444) is less than 1 and far below both critical values, so it is not significant at either level. Hence, the null hypothesis cannot be rejected and we may confidently say that the differences between the means are not significant; there is therefore no need for further testing with the help of the t test.
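The F-ratio for Example 1.1 can be cross-checked with a one-way ANOVA routine. The sketch below, assuming SciPy is available, uses the three groups of scores from the example.

```python
from scipy.stats import f_oneway

group1 = [3, 5, 3, 1, 7, 3, 6]
group2 = [4, 5, 3, 4, 9, 5, 5]
group3 = [5, 5, 5, 1, 7, 3, 7]

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# F is about 0.44 with p well above 0.05, so the null hypothesis is not rejected.
```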
• 179. Two-way analysis of variance: So far, we have dealt with one-way analysis of variance, involving one experimental (independent) variable. However, an experiment may involve two experimental variables simultaneously. Such experiments involve a two-way classification based on the two experimental variables. In such cases we have to study simultaneously the impact of two experimental variables, each having two or more levels, characteristics or classifications, and hence we have to carry out a two-way analysis of variance. Let us make some distinction between the need for carrying out one-way and two-way analyses of variance through some illustrations.
  • 180. Suppose that we want to study the effect of four different methods of teaching. Here, the method of teaching is the experimental variable (independent variable which is to be applied at four levels). We take four groups of students randomly selected from a class. These four groups are taught by the same teacher in the same school but by different methods. At the end of the session, all the groups are tested through an achievement test by the teacher. The mean scores of these four groups are computed. If we are interested in knowing the significance of the differences between the means of these groups, the best technique is the analysis of variance. Since only one experimental variable (effect of the method of teaching) is to be studied, we have to carry out one-way analysis of variance.
• 181. Let us assume that there is one more experimental or independent variable in the study, in addition to the method of teaching. Let it be the school system at three levels, which means that three school systems are chosen for the experiment. These systems can be government school, government-aided school and public school. Now the experiment will involve the study of 4 x 3 groups. We have four groups in each of the three types of schools (all randomly selected). The achievement scores of these groups can then be compared by the method of analysis of variance by establishing a null hypothesis that neither the school system nor the methods have anything to do with the achievement of pupils.
• 182. In this way, we have to simultaneously study the impact of two experimental variables, each having two or more levels, characteristics or classifications, and hence we have to carry out the two-way analysis of variance. In the two-way classification situation, an estimate of the population variance, i.e. the total variance, is broken into: (i) variance due to methods alone, (ii) variance due to schools alone, and (iii) residual variance in the groups, called interaction variance (M x S; M = methods, S = schools), which may exist on account of the following factors: chance; uncontrolled variables, like lack of uniformity in teachers; relative merits of the methods (which may differ from one school to another).
• 183. In other words, there may be interaction between methods and schools, which means that although no method may be regarded as good or bad in general, it may be more suitable or unsuitable for a particular school system. Hence, the presence of interaction variance is a unique feature of all experimental studies involving two or more experimental variables. Example 7: In a research study, there were two experimental or independent variables: a seven-member group of players and three coaches who were asked to rate the players in terms of a particular trait on a ten-point scale. The data were recorded as under:
Rating by the three coaches
Players | 1 | 2 | 3 | 4 | 5 | 6 | 7
Coach A | 3 | 5 | 3 | 1 | 7 | 3 | 6
Coach B | 4 | 5 | 3 | 4 | 9 | 5 | 5
Coach C | 5 | 5 | 5 | 1 | 7 | 3 | 7
• 184. Apply the technique of analysis of variance for analyzing these data. Step 1. Arrangement of the data in a proper table and computation of essential initial values:
Players | Coach A | Coach B | Coach C | Total of rows | Square of the total of rows
1 | 3 | 4 | 5 | 12 | 144
2 | 5 | 5 | 5 | 15 | 225
3 | 3 | 3 | 5 | 11 | 121
4 | 1 | 4 | 1 | 6 | 36
5 | 7 | 9 | 7 | 23 | 529
6 | 3 | 5 | 3 | 11 | 121
7 | 6 | 5 | 7 | 18 | 324
Total of columns | 28 | 35 | 33 | 96 | 1500
• 185. Squared-table computation:
A² | B² | C² | Total
9 | 16 | 25 | 50
25 | 25 | 25 | 75
9 | 9 | 25 | 43
1 | 16 | 1 | 18
49 | 81 | 49 | 179
9 | 25 | 9 | 43
36 | 25 | 49 | 110
• 186. Steps 2-6. Computation of the sums of squares:
Correction term C = (ΣX)²/N = 96²/21 = 438.86
Total sum of squares SSt = ΣX² − C = 518 − 438.86 = 79.14
Sum of squares for rows (players) = (12² + 15² + 11² + 6² + 23² + 11² + 18²)/3 − C = 1500/3 − 438.86 ≈ 61.15
Sum of squares for columns (coaches) = (28² + 35² + 33²)/7 − C = 442.57 − 438.86 ≈ 3.72
Interaction or residual sum of squares = 79.14 − 61.15 − 3.72 ≈ 14.28
• 187. Step 7. Computation of F-ratio. Table illustrating the computation of the mean square variance values:
Source of variation | df | Sum of squares | Mean square variance
Rows (Players) | (r − 1) = 6 | 61.15 | 61.15/6 = 10.19
Columns (Coaches) | (c − 1) = 2 | 3.72 | 3.72/2 = 1.86
Interaction or residual | (r − 1)(c − 1) = 12 | 14.28 | 14.28/12 = 1.19
• 188. Step 8. Interpretation of F-ratio. The table gives the critical values of F:
Kind of F | Computed F | df for greater mean square variance | df for smaller mean square variance | Critical value of F at 0.05 level | Critical value of F at 0.01 level | Judgement about the significance of computed F | Conclusion
F (for rows, players) | 10.19/1.19 = 8.56 | 6 | 12 | 3.00 | 4.82 | Significant at both levels | Null hypothesis rejected
F (for columns, coaches) | 1.86/1.19 = 1.56 | 2 | 12 | 3.88 | 6.93 | Not significant at either level | Null hypothesis not rejected
• 189. Here, F (for rows) is highly significant and hence the null hypothesis is rejected. It indicates that the coaches did discriminate among the players. The second F (for columns) is insignificant and hence the null hypothesis cannot be rejected. It indicates that the coaches did not differ significantly among themselves in their ratings of the players. In other words, their ratings may be taken as trustworthy and reliable.
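The sums of squares and F-ratios of Example 7 can be reproduced directly from the formulas used above. This is a minimal sketch, assuming NumPy is available, for an unreplicated two-way layout (one rating per player-coach cell).

```python
import numpy as np

# 7 players (rows) rated by 3 coaches (columns), as in Example 7
ratings = np.array([
    [3, 4, 5],
    [5, 5, 5],
    [3, 3, 5],
    [1, 4, 1],
    [7, 9, 7],
    [3, 5, 3],
    [6, 5, 7],
])
r, c = ratings.shape
C = ratings.sum() ** 2 / (r * c)                    # correction term, about 438.86

ss_total = (ratings ** 2).sum() - C                 # about 79.1
ss_rows = (ratings.sum(axis=1) ** 2).sum() / c - C  # players, about 61.1
ss_cols = (ratings.sum(axis=0) ** 2).sum() / r - C  # coaches, about 3.7
ss_resid = ss_total - ss_rows - ss_cols             # interaction/residual, about 14.3

ms_rows = ss_rows / (r - 1)
ms_cols = ss_cols / (c - 1)
ms_resid = ss_resid / ((r - 1) * (c - 1))

print(ms_rows / ms_resid, ms_cols / ms_resid)       # F for rows about 8.6, F for columns about 1.6
```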
  • 190. Underlying Assumptions In Analysis Of Variance: The following are the fundamental assumptions for the use of analysis of variance technique 1. The dependent variable which is measured should be normally distributed in the population. 2. The individuals being observed should be distributed randomly in the groups. 3. Within-groups variances must be approximately equal. 4. The contributions to variance in the total sample must be additive.
  • 191. 8. Correlation and Simple Linear Regression Regression is a method for studying the relationship between two or more quantitative variables Simple linear regression (SLR): One quantitative dependent variable  response variable  dependent variable  Y One quantitative independent variable  explanatory variable  predictor variable  X
• 192.  Multiple linear regression: One quantitative dependent variable Many quantitative independent variables  SLR Examples:  predict salary from years of experience  estimate effect of lead exposure on school testing performance  predict force at which a metal alloy rod bends based on iron content  Example: Health data Variables:  Percent of Obese Individuals  Percent of Active Individuals
• 193.
Sl No | Percent Obesity | Percent Active
1 | 29.7 | 55.3
2 | 28.9 | 51.9
3 | 35.9 | 41.2
4 | 24.7 | 56.3
5 | 21.3 | 60.4
6 | 26.3 | 50.9
... | ... | ...
  • 194. Graph:
• 195. A scatterplot or scatter diagram can give us a general idea of the relationship between obesity and activity... Graph: The points are plotted as the pairs (xi, yi) for i = 1, ..., 25
• 196. Inspection suggests a linear relationship between obesity and activity (i.e. a straight line would go through the bulk of the points, and the points would look randomly scattered around this line). Simple Linear Regression The model  The basic model: Yi = β0 + β1xi + εi  Yi is the observed response or dependent variable for observation i  xi is the observed predictor, regressor, explanatory variable, independent variable, covariate  εi is the error term  the εi are iid N(0, σ²) (iid means independently and identically distributed)
• 197.  So, E[Yi | xi] = β0 + β1xi + E[εi] = β0 + β1xi. The conditional mean (i.e. the expected value of Yi given xi, or after conditioning on xi) is β0 + β1xi (a point on the estimated line). Some interpretation of parameters:  β0 is the conditional mean when x = 0  β1 is the slope, also stated as the change in the mean of Y per 1 unit change in x  σ2 is the variability of responses about the conditional mean
  • 198. Graph:
• 199.  Simple Linear Regression Assumptions Key assumptions  a linear relationship exists between Y and x (we say the relationship between Y and x is linear if the means of the conditional distributions of Y | x lie on a straight line)  independent errors (this essentially equates to independent observations in the case of SLR)  constant variance of errors  normally distributed errors
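A least-squares line of the kind described here can be fitted with a standard routine. The sketch below, assuming SciPy is available, uses only the six (obesity, activity) pairs listed earlier, so its estimates are purely illustrative.

```python
from scipy.stats import linregress

# Only the six (obesity %, activity %) pairs listed in the health-data table
obesity = [29.7, 28.9, 35.9, 24.7, 21.3, 26.3]
active = [55.3, 51.9, 41.2, 56.3, 60.4, 50.9]

# Predict percent obesity (Y) from percent active (x)
result = linregress(active, obesity)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}, r = {result.rvalue:.2f}")
# The slope and r should come out negative: more activity goes with less obesity.
```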
• 200. 9. CHI-SQUARE Chi-square tests are often used in hypothesis testing. The chi-square statistic compares the size of any discrepancies between the expected results and the actual results, given the size of the sample and the number of variables in the relationship. What Is a Chi-Square Statistic? A chi-square (χ2) statistic is a test that measures how a model compares to actual observed data. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a fair coin meet these criteria.
• 201. For these tests, degrees of freedom are utilized to determine whether a certain null hypothesis can be rejected based on the total number of variables and samples within the experiment. As with any statistic, the larger the sample size, the more reliable the results. Chi-square (χ2) theme  A chi-square (χ2) statistic is a measure of the difference between the observed (fo) and expected (fe) frequencies of the outcomes of a set of events or variables.
• 202.  χ2 depends on the size of the difference between the observed and expected values, the degrees of freedom, and the sample size. Goodness of fit  As a test of goodness of fit, Pearson made use of the chi-square distribution to devise a test for determining how well experimentally obtained results fit the results expected theoretically under some hypothesis.  χ2 can be used to test whether two variables are related or independent of one another, or to test the goodness-of-fit between an observed distribution and a theoretical distribution of frequencies.
• 203. What Does a Chi-Square Statistic Tell You?  There are two main kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between student sex and course choice?"; and the goodness-of-fit test, which asks something like "How well does the coin in my hand match a theoretically fair coin?"  When considering student sex and course choice, a χ2 test for independence could be used. To do this test, the researcher would collect data on the two chosen variables (sex and courses picked) and then compare the frequencies at which male and female students select among the offered classes using the χ2 formula.
  • 204.  If there is no relationship between sex and course selection (that is, if they are independent), then the actual frequencies at which male and female students select each offered course should be expected to be approximately equal, or conversely, the proportion of male and female students in any selected course should be approximately equal to the proportion of male and female students in the sample.  A χ2 test for independence can tell us how likely it is that random chance can explain any observed difference between the actual frequencies in the data and these theoretical expectations.
• 205.  χ2 provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent. If the sample data do not fit the expected properties of the population that we are interested in, then we would not want to use this sample to draw conclusions about the larger population.  For example, consider an imaginary coin with exactly a 50/50 chance of landing heads or tails and a real coin that you toss 100 times. If this real coin is fair, then it will also have an equal probability of landing on either side, and the expected result of tossing the coin 100 times is that heads will come up 50 times and tails will come up 50 times.
• 206.  In this case, χ2 can tell us how well the actual results of 100 coin flips compare to the theoretical model that a fair coin will give 50/50 results. The actual toss could come up 50/50, or 60/40, or even 90/10. The farther the actual results of the 100 tosses are from 50/50, the worse the fit of this set of tosses to the theoretical expectation of 50/50, and the more likely we might conclude that this coin is not actually a fair coin. Null hypothesis:  A null hypothesis is set up: there is no actual difference between the observed frequencies (derived from experimental results) and the expected frequencies (derived on the basis of some hypothesis of equal probability or chance, or a hypothesis of normal distribution).  Computation of the value of chi-square:
• 207. χ² = Σ [(fo − fe)² / fe], where fo = observed frequency and fe = expected frequency.
• 208. Application of this formula requires the following computations: 1. Computation of the expected frequencies based on some hypothesis. 2. Finding the square of the difference between the observed and expected frequencies and dividing it by the expected frequency in each case, then summing these quotients. Determining the Degrees of Freedom:  The data in chi-square problems are usually available, or may be arranged, in the form of a contingency table showing observed and expected frequencies in specific rows and columns.
• 209. The formula for the computation of the number of degrees of freedom in a chi-square problem usually runs as follows: df = (r − 1)(c − 1), where  r = No. of rows in the contingency table  c = No. of columns in the contingency table Determining the Critical Value of χ2  As for the z and t tests of significance, there exists a table of the critical values of χ2 required for significance at a predetermined significance level (5% or 1%) for the computed degrees of freedom. The desired critical value of χ2, thus, may be read from Table F given in the Appendix of this text.
• 210. Example: Chi-square test of goodness of fit. A coin is tossed 100 times, giving 40 heads and 60 tails. On the hypothesis that the coin is fair, the expected frequencies are 50 heads and 50 tails. The computation of χ2 is shown below.
• 212.
Outcome | fo | fe | fo − fe | (fo − fe)² | (fo − fe)²/fe
Heads | 40 | 50 | −10 | 100 | 2
Tails | 60 | 50 | 10 | 100 | 2
Total | 100 | 100 | | | χ2 = 4
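The same goodness-of-fit computation, assuming SciPy is available, with observed frequencies of 40 heads and 60 tails against expected frequencies of 50 and 50:

```python
from scipy.stats import chisquare

observed = [40, 60]   # heads, tails in 100 tosses
expected = [50, 50]   # expected under the hypothesis of a fair coin

chi2_stat, p_value = chisquare(observed, f_exp=expected)
print(chi2_stat, p_value)   # chi-square = 4.0, p about 0.046 with 1 degree of freedom
```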
• 213. Example: Chi-square test of independence. 100 boys and 60 girls were asked to select one of five elective subjects. The choices of the two sexes were tabulated separately.
Sex | A | B | C | D | E | Total
Boys | 25 | 30 | 10 | 25 | 10 | 100
Girls | 10 | 15 | 5 | 15 | 15 | 60
• 214. Solution: Step 1. Establishing the null hypothesis. Null hypothesis: the choice of subject is independent of sex, i.e. there exists no significant difference between the choices of the two sexes. Step 2. Computation of the expected frequencies. For this purpose, we first present the given data in contingency-table form. The expected frequencies, after being computed, are written within brackets just below the respective observed frequencies. The completed contingency table is given below.
• 215.
Sex | A | B | C | D | E | Total
Boys | 25 (21.9) | 30 (28.1) | 10 (9.4) | 25 (25) | 10 (15.6) | 100
Girls | 10 (13.1) | 15 (16.9) | 5 (5.6) | 15 (15) | 15 (9.4) | 60
Total | 35 | 45 | 15 | 40 | 25 | 160
• 216. Computation of the expected frequencies:
100 x 35/160 = 21.9    60 x 35/160 = 13.1
100 x 45/160 = 28.1    60 x 45/160 = 16.9
100 x 15/160 = 9.4     60 x 15/160 = 5.6
100 x 40/160 = 25      60 x 40/160 = 15
100 x 25/160 = 15.6    60 x 25/160 = 9.4
• 217.
fo | fe | fo − fe | (fo − fe)² | (fo − fe)²/fe
25 | 21.9 | 3.1 | 9.61 | 0.439
30 | 28.1 | 1.9 | 3.61 | 0.128
10 | 9.4 | 0.6 | 0.36 | 0.038
25 | 25 | 0 | 0 | 0.000
10 | 15.6 | −5.6 | 31.36 | 2.010
10 | 13.1 | −3.1 | 9.61 | 0.733
15 | 16.9 | −1.9 | 3.61 | 0.214
5 | 5.6 | −0.6 | 0.36 | 0.064
15 | 15 | 0 | 0 | 0.000
15 | 9.4 | 5.6 | 31.36 | 3.336
Total | 160 | 160 | | | χ2 = 6.962
• 219. Step 4: Testing the null hypothesis  No. of degrees of freedom = (r − 1)(c − 1) = (2 − 1)(5 − 1) = 4. For 4 degrees of freedom, from Table F of the χ2 distribution:  Critical value of χ2 = 9.488 at the 5% level of significance.  Critical value of χ2 = 13.277 at the 1% level of significance.  The computed value of χ2 (6.962) is less than the critical value of χ2 at both the 5% and 1% levels of significance.  Hence it is to be taken as non-significant. Consequently, the null hypothesis cannot be rejected; the choice of subject is quite independent of sex.
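The whole independence test can be reproduced in a few lines. This sketch, assuming NumPy and SciPy are available, passes the observed 2 x 5 contingency table to a chi-square independence routine, which also returns the expected frequencies.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [25, 30, 10, 25, 10],   # boys
    [10, 15,  5, 15, 15],   # girls
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(round(chi2_stat, 3), dof, round(p_value, 3))   # about 6.96 with 4 df, p > 0.05
print(expected.round(1))                             # matches the bracketed values above
```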