COURSE TITLE: MEASUREMENT,
EVALUATION AND STATISTICS IN
EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN, PHD
0244590189
peshun@uew.edu.gh
Nature, Definition and
Explanation of basic
concepts in Assessment
Assessment, Test, Measurement, & Evaluation
Assessment
Assessment is a broad term defined as a process for
obtaining information that is used for making decisions
about students, curricula and programmes, and educational
policy.
Assessment is done to gather data on learning, teaching,
schools and the education system to enable decision making
on the progress attained by learners.
Some of the reasons for conducting assessment
in the educational sector include the following:
Learner assessment: To ascertain the level of learners’ performance against curriculum
standards and core competences in order to make decisions regarding selection,
remediation, promotion, certification, proficiency and competency.
Teacher appraisal: To improve the teacher’s own practice by identifying gaps in the
content delivery and pedagogy.
School evaluation: To obtain credible information about schools in terms of learners and
teachers’ performances, leadership, resource availability and infrastructure.
System evaluation: To determine the strengths and weaknesses of the entire educational
system and permit a good understanding of how well learning is being facilitated.
What do we Assess?
The Pre-Tertiary Assessment Framework has been designed to assess the
Core Competences, 4Rs, Practical Skills and Values and Attitudes
The six competences are:
• Critical thinking and problem solving
• Creativity and innovation
• Communication and collaboration
• Cultural identity and global citizenship
• Personal development and leadership
• Digital literacy.
4Rs (Reading, wRiting, aRithmetic and cReativity).
Guidelines for selecting and using
classroom assessment
One needs to be clear about the learning target he/she wants to
assess. Before you can assess a student, you must know the kind(s)
of student knowledge, skill(s), and performance(s) about which you
need information.
One needs to be sure that the assessment technique(s) he/she
selects actually match the learning target.
One needs to be sure that the assessment technique(s) serve the
needs of the learners. You should select assessment technique(s)
that provide meaningful feedback to the learners about how closely
they have approximated the learning targets.
Guidelines for selecting and using
classroom assessment
Whenever possible, be sure to use multiple indicators of
performance for each learning target. This will provide a better
assessment of the extent to which a student has achieved a given
learning target.
One needs to be sure that when you interpret the result of
assessments you take their limitations into account.
Assessment is a means to an end. It is not an end in itself.
Assessment provides information upon which decisions are based.
Test
Test connotes the presentation of a standard set of questions to be answered.
Test is an instrument or systematic procedure for observing and describing one
or more characteristics of a student using either a numerical scale or a
classification scheme (Nitko, 2001).
A test is a formal, systematic, usually paper-and-pencil procedure for gathering
information about pupil’s behaviour (Airasian, 1991).
In schools, we usually think of a test as a paper-and-pencil instrument with a
series of questions that students must answer. These tests are usually scored by
adding together the “points” a student earned on each question. Thus they
describe the student using a numerical scale.
Types of test
Tests are classified in different ways using criteria like purpose, uses
and nature.
Using purpose as a criterion, tests can be classified as Achievement
tests, Diagnostic tests, Aptitude tests, Intelligence tests, etc.
Using uses as a criterion, tests can be classified as Norm-referenced
tests and Criterion-referenced tests.
Using nature as a criterion, tests can be classified as paper-and-
pencil tests, oral tests, performance tests, etc.
Achievement test
It is a test designed to measure formal or “school-taught” learning.
Achievement tests measure the degree of student learning in specific curriculum
areas in which instruction has been received.
Achievement tests are measures of previously acquired knowledge.
Achievement tests are designed to measure the extent to which a person has
achieved, gained, attained, or mastered certain skills as a result of specific
instruction and learning.
Achievement tests can be classified into two:
Teacher-made achievement tests
Standardized achievement tests
Measurement
Measurement is the process of quantifying the degree to which
someone or something possesses a given characteristic, quality or
feature.
Most authorities agree that measurement is the assignment of
numbers, numerals or symbols to the traits or characteristics of
persons or events according to specific rules.
Nitko (2001) defined measurement as a procedure for assigning
numbers (usually called scores) to a specified attribute or
characteristics of a person in such a way that the numbers describe
the degree to which the person possesses the attribute.
Scales of Measurement

Depending upon the traits/attribute/characteristics and the way they are measured,
different kinds of data result, representing different scales of measurement.

Variables may be grouped into four categories of scales depending on the amount of
information given by the data.

Different rules apply at each scale of measurement, and each scale dictates certain types of
statistical procedures.
Ratio
Interval
Ordinal
Nominal (or Categorical)
These scales are hierarchical, with nominal being the lowest and the ratio being the highest.
Nominal scale
A nominal scale classifies persons or objects into two or more categories.
Whatever the classification, a person or object can only be in one category, and
members of a given category have a common set of characteristics.
Numbers may be used to represent the variables, but the numbers have no
numerical value or relationship.
The numbers are for identification purposes only.
Examples are sex, occupation, color of eyes, and region of residence. We can
code Male = 1 and Female = 2, or Male = 2 and Female = 1; it makes no
difference because the numbers are only for identification purposes.
Ordinal scale
An ordinal scale means our measurements now contain the property of order.
It provides some information about the order or rank of the variables, but it
does not indicate how much better one score is than another.
This enables us to make statements using the phrases “more than” or “less
than”.
For example, a secondary school teacher might rank three students, Kwame,
Ama and John, with a score of 1, 2 and 3, respectively, on the trait of sociability.
From these data we can conclude that Kwame is more sociable than Ama, who, in
turn, is more sociable than John. However, we cannot say by how much Ama is
more sociable than John.
Interval scale
Interval scales are numeric scales in which we know both the order and the
exact differences between the values.
Interval scales have the characteristics of the nominal and ordinal scales, i.e., the
ability to classify and to indicate the direction of the difference.
On an interval scale, the zero point is arbitrary and does not mean the
absence of the characteristic/trait.
An example of an interval scale is the Fahrenheit temperature scale because of
its equality of units. For instance, the difference between 30° and 34° is the
same as the difference between 72° and 76°.
Ratio scale
The ratio scale incorporates all of the characteristics of the interval
scale with one important addition – an absolute zero.
Examples of ratio scales are height, weight, time and distance.
With an absolute zero point, you can make statements involving
ratios of two observations such as “twice as long as” or “half as fast
as”.
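A quick worked check of this point, using hypothetical Fahrenheit and length values; the conversion helper is just for illustration:

# Ratio statements are meaningful only when zero means "none of the attribute".
# Fahrenheit has an arbitrary zero, so "80 is twice 40" does not survive a
# change of units; length (a true ratio scale) does.

def f_to_c(f):
    """Convert Fahrenheit to Celsius (same attribute, different units)."""
    return (f - 32) * 5 / 9

print(80 / 40)                  # 2.0 -- looks like "twice as hot"
print(f_to_c(80) / f_to_c(40))  # 6.0 -- the "ratio" changes with the units

# Length: 8 m vs 4 m stays twice as long in any unit (e.g., centimetres).
print(8 / 4, 800 / 400)         # 2.0 2.0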
Evaluation
Evaluation is defined as the process of making a value judgement about the
worth of a student’s product or performance (Nitko, 2001).
It is a process by which quantitative and qualitative data are processed to
arrive at a judgement of value, worth, or effectiveness.
The main concern of evaluation in the classroom is to arrive at a judgement on
the worth or effectiveness of teaching and learning.
Evaluation may or may not be based on measurements or tests results.
Forms of Evaluation
Things may be evaluated during development as
well as after they are completely developed.
The terms formative and summative evaluation are
used to distinguish the roles of evaluation during
these two periods.
Formative evaluation
Formative evaluation is judgement about quality or worth made during the design or
development of instructional materials, instructional procedures, curricula, or
educational programmes.
The evaluator directs these judgements towards modifying, forming, or otherwise
improving the product before it is widely used in schools.
A teacher also engages in formative evaluation when revising lessons or learning
materials by using information obtained from their previous use.
Sometimes we speak of formative evaluation of students. This means we are judging
the quality of a student’s achievement of a learning target while the student is still in the
process of learning it. Such judgement can help us guide a student’s next learning steps.
No penalty is given to students in formative evaluation; it is
guidance-oriented in nature.
Summative evaluation
Summative evaluation is judgement about the quality or worth of already-
completed instructional materials, instructional procedures, curricula, or
educational programmes.
Such evaluation tends to summarize strengths and weaknesses, it describes
the extent to which a properly implemented programme or procedure has
attained its stated goals and objectives.
Sometimes we speak of summative evaluation of students. By this we mean
judging the quality or worth of a student’s achievement after the instructional
process is completed.
Giving letter grades on report cards is one example of reporting your
summative evaluation of a student’s achievement during the preceding marking
period.
Purposes of Assessment
Assessment of learning,
Assessment for learning
Assessment as learning.
Assessment of Learning
The purpose of assessment of learning is usually SUMMATIVE and
is mostly done at the end of a task, unit of work, at the end of a unit,
term or semester, and may be used to rank or grade students.
Assessment of learning is designed to provide evidence of
achievement to parents, other educators, the students themselves
and sometimes to outside groups (e.g., employers, other educational
institutions).
It is designed primarily to serve the purposes of accountability, or
of ranking, or of certifying competence.
Teachers’ Roles in Assessment of
Learning:
Effective assessment of learning requires that teachers provide:
a rationale for undertaking a particular assessment of learning at a particular point in time
clear descriptions of the intended learning
processes that make it possible for students to demonstrate their competence and skill
a range of alternative mechanisms for assessing the same outcomes
public and defensible reference points for making judgements
transparent approaches to interpretation
descriptions of the assessment process
strategies for recourse in the event of disagreement about the decisions.
Assessment for Learning
Assessment for learning is any assessment for which the first priority in its
design and practice is to serve the purpose of promoting pupils’ learning.
Assessment for learning involves teachers using evidence about students'
knowledge, understanding and skills to inform their teaching.
Sometimes referred to as ‘formative assessment', it usually occurs
throughout the teaching and learning process to clarify student learning and
understanding.
Students understand exactly what they are to learn, what is expected of
them and are given feedback and advice on how to improve their work.
Teachers’ Roles in Assessment for Learning:
It is interactive, with teachers:
aligning instruction
identifying particular learning needs of students or groups
selecting and adapting materials and resources
creating differentiated teaching strategies and learning
opportunities for helping individual students move forward in their
learning
Providing immediate feedback and direction to students
Assessment as Learning
Assessment as learning occurs when students are their own assessors.
Students monitor their own learning, ask questions and use a range of
strategies to decide what they know and can do, and how to use
assessment for new learning.
Through this process students are able to learn about themselves as
learners and become aware of how they learn, that is, become
metacognitive (knowledge of one’s own thought processes).
Assessment as learning helps students to take more responsibility for
their own learning and monitoring future directions.
In monitoring metacognition, one needs to ask himself/herself the following
questions:
What is the purpose of learning these concepts and skills?
What do I know about this topic?
What strategies do I know that will help me learn this?
Am I understanding these concepts?
What are the criteria for improving my work?
Have I accomplished the goals I set for myself?
Teachers’ Roles in Assessment as Learning:
The teachers’ role in promoting the development of independent learners through
assessment as learning is to:
model and teach the skills of self-assessment
guide students in setting their own goals, and monitoring their progress toward them
provide exemplars and models of good practice and quality work that reflect
curriculum outcomes
work with students to develop clear criteria of good practice
guide students in developing internal feedback or self-monitoring mechanisms to
validate and question their own thinking, and to become comfortable with ambiguity
and uncertainty that is inevitable in learning anything new
End of session
COURSE TITLE: MEASUREMENT,
EVALUATION AND STATISTICS IN
EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN
0244590189
peteshun37@gmail.com
Taxonomies of Educational
Objectives
The basic steps in the classroom assessment process are:
setting targets and writing objectives
choosing assessment items and techniques
administering assessments and analysing the data
sharing the results with students and other stakeholders
What you would like students to be able to do, value, or feel at the
completion of an instructional segment is termed a learning
objective.
Some planned changes are:
You may want students to read a claim made by a political figure and
determine whether there is evidence available to support that claim.
That is cognitive.
You may want students to feel comfortable when talking in front of
their classmates about how to solve mathematics problems. That is
affective. Such changes relate to values.
You may want students to set up, focus, and use a microscope properly
during a science investigation of pond water. That is psychomotor.
What is Taxonomy?
Taxonomy is a system of classification.
It is an ordered classification system.
Taxonomies are hierarchical schemes for classifying learning objectives into
various levels of complexity.
A taxonomy can help you bring to mind the wide range of
important learning objectives and thinking skills, so that you avoid
focusing narrowly on lower-level objectives only.
Although the three domains are explained separately, they are closely related.
THE COGNITIVE DOMAIN
The cognitive domain deals with all mental processes
including perception, memory and information processing
by which the individual acquires knowledge, solves
problems and plans for the future.
There are different taxonomies under the cognitive domain.
Let us start with Bloom’s taxonomy of the cognitive
domain.
Bloom’s taxonomy of cognitive domain
Evaluation (makes judgments about materials and methods)
Synthesis (puts the parts together to form a new whole)
Analysis (break an idea into component part and describe the
relationships)
Application (application of a rule or principle)
Comprehension (lowest level of understanding)
Knowledge (recall of specific information)
Bloom's Revised Taxonomy
Lorin Anderson, a former student of Bloom, and
David Krathwohl revisited the cognitive domain in the
mid-nineties and made some changes:
changing the names of the six categories from noun
to verb forms
rearranging them
Level of Taxonomy: definition (typical verbs) and question stems

Creating: Generating new ideas, products, or ways of viewing things
(designing, constructing, planning, producing, inventing).
Question stems: How would you devise your own way to…? How many ways can you…?

Evaluating: Justifying a decision or course of action
(checking, hypothesizing, critiquing, experimenting, judging).
Question stems: Is there a better solution to…? What do you think about…?

Analyzing: Breaking information into parts to explore understandings and relationships
(comparing, organizing, deconstructing, interrogating, finding).
Question stems: How is … similar to …? What are some other outcomes?

Applying: Using information in another familiar situation
(implementing, carrying out, using, executing).
Question stems: Do you know of another instance where…? Which factors would you change…?

Understanding: Explaining ideas or concepts
(interpreting, summarizing, paraphrasing, classifying, explaining).
Question stems: How would you explain…? What was the main idea…?

Remembering: Recalling information
(recognizing, listing, naming).
Question stems: What is…? Who…?
THE AFFECTIVE DOMAIN
The affective domain describes our feelings, likes and dislikes, and
our experiences, as well as the resulting behaviours (reactions).
Affective learning includes the manner in which we deal with things
emotionally, such as feelings, values, appreciation, enthusiasms,
motivations, and attitudes.
It is demonstrated by behaviours indicating attitudes of awareness,
interest, attention, concern and responsibility, ability to listen and
respond in interactions with others, and ability to demonstrate those
attitudinal characteristics or values, which are appropriate to the test
situation and the field of study.
Krathwohl, Bloom and Masia (1973) put it into five major
categories listed from the simplest behaviour to the most
complex:
Internalizing values (characterization)
Organization
Valuing
Responding to Phenomena
Receiving Phenomena
Receiving Phenomena: Awareness, willingness to hear, selected
attention.
Examples: Listen to others with respect. Listen for and remember the
name of newly introduced people.
Responding to Phenomena: Active participation on the part of the
learners. Attends and reacts to a particular phenomenon. Learning
outcomes may emphasize compliance in responding, willingness to
respond, or satisfaction in responding (motivation).
Examples: Participates in class discussions. Questions new ideals,
concepts, models, etc. in order to fully understand them.
Valuing: The worth or value a person attaches to a
particular object, phenomenon, or behaviour. Valuing
is based on the internalization of a set of specified
values, while clues to these values are expressed in the
learner's overt behaviour and are often identifiable.
Examples: Demonstrates belief in the democratic
process. Is sensitive towards individual and cultural
differences (value diversity). Shows the ability to solve
problems. Proposes a plan for social improvement and
follows through with commitment.
Organization: Organizes values into priorities by
contrasting different values, resolving conflicts
between them, and creating a unique value system.
The emphasis is on comparing, relating, and
synthesizing values.
Examples: Recognizes the need for balance between
freedom and responsible behaviour. Accepts
responsibility for one's behaviour. Explains the role of
systematic planning in solving problems. Prioritizes
time effectively to meet the needs of the organization,
family, and self.
Internalizing values (characterization): Has a value system
that controls their behaviour. The behaviour is pervasive,
consistent, predictable, and most importantly, characteristic
of the learner. Instructional objectives are concerned with
the student's general patterns of adjustment (personal,
social, emotional).
Examples: Shows self-reliance when working
independently. Cooperates in group activities (displays
teamwork). Displays a professional commitment to ethical
practice on a daily basis.
THE PSYCHOMOTOR DOMAIN
This refers to educational outcomes that focus on
motor (movement) skills and perceptual processes.
According to Simpson (1972) the psychomotor
domain includes physical movement, coordination,
and use of the motor-skill areas.
Development of these skills requires practice and
is measured in terms of speed, precision, distance,
procedures, or techniques in execution.
The seven major categories are listed from the
simplest behaviour to the most complex:
Perception (awareness): The ability to use sensory cues to guide
motor activity. Detects non-verbal communication cues. Estimates where
a ball will land after it is thrown and then moves to the correct location
to catch it.
Set: Readiness to act. It includes mental, physical, and emotional sets.
Shows desire to learn a new process (motivation).
Guided Response: The early stages in learning a complex skill that
includes imitation and trial and error. Performs a mathematical equation
as demonstrated.
Mechanism (basic proficiency): Learned responses have become
habitual and the movements can be performed with some confidence
and proficiency. Uses a personal computer.
Complex Overt Response (Expert): The skillful performance
of motor acts that involve complex movement patterns.
Operates a computer quickly and accurately.
Adaptation: Skills are well developed and the individual can
modify movement patterns to fit special requirements. Responds
effectively to unexpected experiences. Modifies instruction to
meet the needs of the learners.
Origination: Creating new movement patterns to fit a
particular situation or specific problem. Constructs a new theory.
Develops a new and comprehensive training programme.
END OF LESSON
COURSE TITLE: MEASUREMENT,
EVALUATION AND STATISTICS IN EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN, PHD
DEPARTMENT OF EDUCATIONAL FOUNDATIONS
PETESHUN37@GMAIL.COM
+233 244590189
School Based Assessment
(SBA)
WHAT IS SBA?
School-based Assessment (SBA) is a kind of assessment carried out in schools by pupils' own
teachers, with the prime purpose of improving pupils' learning.
SBA is a formative and diagnostic task geared towards improving the quality of teaching,
learning and the mode of assessment itself.
Cont…
School Based Assessment is a system of using different test
modes: class tests, class exercises, group work, portfolios,
rubrics, homework, projects and other assessment
procedures to measure what learners have achieved
through a teaching/learning process.
Cont…
Broadly, the SBA is simply all forms/modes of assessment that can be
undertaken internally by any school-level actor (learner, teacher,
headteacher).
This means that SBA includes diagnostic assessments, formative
assessments and summative assessments that can be completed
while at the school.
Cont…
For the purpose of describing the overall learner achievement at the end of a term, an Internal
Assessment Score (IAS) and an End-of-Term Exam Score (ETES) shall be generated and put
together (i.e. added).
NaCCA is responsible for providing guidance and support on SBA, whereas the training of
teachers on effective use of the SBA is the responsibility of GES
Why SBA?
The objectives of the school-based assessment are to:
obtain a better picture of learner performance across an entire programme or
course than that provided by a single-shot examination
enable holistic assessment of the learner
emphasise learner-centred approach to learning
standardise internal assessment practices across schools
assess the core competences which otherwise cannot be assessed by the National
Standard Assessment Test (NSAT)
Cont…
guide improvement of instruction and learning in schools
allow for teachers to develop assessments around challenging areas of the
curriculum
collect information about learners that will be helpful in planning instruction to meet
their most critical learning needs.
assess whether the instruction provided is enough to help learners achieve the
curriculum standards
encourage more individualised instruction
Cont…
identify students who may be “at risk” or who may need extra instruction or intensive
interventions if they are to move toward grade-level standards
monitor all learners’ progress to determine whether “at risk” students are making
adequate progress, and to identify any learner who may be falling behind.
introduce a system of moderation that will ensure accuracy and reliability of
teachers’ marks.
provide teachers with advice on how to conduct remedial instruction on difficult
areas of the curriculum to improve class performance.
Uses of SBA
Among other things, the SBA will be used for the following:
help learners reflect upon their own learning and progress
help learners understand and appreciate their strengths, abilities and areas for development.
help prevent underachievement
improve motivation and self-esteem
Cont…
activate learners as instructional resources for one another
promote teamwork and collaboration
fosters cooperation between the teacher and the learner especially in the area of
learners’ class projects
allow for the holistic approach to assessing learners
Modes of SBA
SBA emphasises a learner-centred approach to learning and seeks to develop high
ability thinking skills, problem solving skills, cooperative learning, teamwork, moral
and spiritual development and formal presentation skills on the part of the learner.
The tasks given in SBA are referred to as Class Assessment Tasks (CATs)
For recording purposes, SBA consists of 12 assessments a year.
Four CATs in a term.
Cont…
The 12 assessments are labelled as CAT 1, CAT 2 up to CAT 12.
The class assessment task (CAT) 1-4 will be administered in Term 1
CAT 5-8 will be administered in Term 2
CAT 9-12 in Term 3.
Cont…
Group exercise
Class test
Project
Cont…
CAT 1 is group exercise
CAT 2 is class test
CAT 3 is group exercise
CAT 4 is Project work
CAT 1, CAT 2, CAT 3, CAT 4 in first term
CAT 5, CAT 6, CAT 7, CAT 8 in second term
CAT 9, CAT 10, CAT 11, CAT 12 in third term
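A minimal sketch of the numbering pattern just described; the helper function cat_number is illustrative only, not part of any official SBA tool:

def cat_number(term, position):
    """Return the CAT label for a given term (1-3) and position (1-4).

    Term 1 holds CATs 1-4, Term 2 holds CATs 5-8, Term 3 holds CATs 9-12.
    """
    return 4 * (term - 1) + position

# The Term-2 and Term-3 equivalents of CAT 1 are CAT 5 and CAT 9:
print(cat_number(2, 1), cat_number(3, 1))  # 5 9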
Cont…
CAT 1 will be administered as a group exercise at
the end of week 4 of Term 1.
When administered in Terms 2 and 3, the equivalent
of CAT 1 will be CAT 5 and CAT 9 respectively.
Cont…
CAT 2 will be administered as a class test at the
end of week 8 of Term 1.
When administered in Terms 2 and 3, the equivalent
of CAT 2 is CAT 6 and 10 respectively.
Cont…
CAT 3 will also be administered as a group exercise
at the end of the 11th or 12th week of the
term.
When administered in Terms 2 and 3, the equivalent
of CAT 3 is CAT 7 and CAT 11 respectively.
Cont…
CAT 4, 8 and 12 are projects.
These are tasks assigned to learners individually or in groups to
complete over an extended period of time (given in week 2 and
collected in week 12).
It may include practical work, investigative study (including
case study) and surveys.
A report will be presented for each project undertaken.
Successful SBA takes into account six
principles of best practice:
Principle 1: Emphasis must be placed on assessment for learning and assessment as learning.
There should be a monitoring and evaluation plan or framework.
Principle 2: School Improvement and Support Officers (SISO) within the MMDA, headteachers
and teachers should be given support in terms of competency training workshops and
Professional Learning Community (PLC) training sessions.
Cont…
Principle 3: Data utilisation must be a priority, in order to inform
teachers and other key stakeholders on current learning progressions
in various schools across the country.
Principle 4: Technology should be utilised as far as possible in
implementing SBA.
Cont…
Principle 5: Project topics should be developed from the different subject curricula.
Projects should be flexible in execution. The teacher should also give learners the
opportunity to sometimes come up with their own project topics, aimed
at solving problems that affect them.
Principle 6: Authentic classroom assessment should be incorporated as part of SBA.
Project-based Assessment under SBA
Projects are learning activities that provide learners with the opportunity to synthesise and
apply knowledge in a real-life situation.
Project work can be in any of the following forms:
Experiments: This will require the teacher to give learners a task that involves carrying out a
test trial or a tentative procedure. It is an act or operation for the purpose of discovering
something unknown or of testing a principle. During experiments, learners observe, record,
analyse and evaluate their results.
Cont…
Investigations: Investigative projects enable learners to create their own questions around a
topic, collect, organise, and evaluate data, draw conclusions and share results through
presentations and explanations. Learners may demonstrate the results of their investigations
through different types of products and experiences, including the writing of a paper, the
development of artwork, oral presentations, audio and videotape productions, photographic
essays, simulations, or plays.
How do we assess projects?
A scoring rubric, usually in the form of a table, is used to
evaluate the project work.
Projects can be assessed at each of the key stages –
preparation, the process, product, and response.
Cont…
Other key considerations may include:
originality of the project;
level of creativity and innovation;
simplicity and cost effectiveness of the project;
the extent to which it fosters learner development and other core competences; and
the extent to which the final product can be used in real-life situations or to address a
challenge.
A project may thus be assessed by teachers, peers, a panel or parents.
End of session
COURSE TITLE: MEASUREMENT,
EVALUATION AND STATISTICS IN
EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN (Ph.D.)
0244590189
peteshun37@gmail.com
Characteristics of a
Good Assessment
Result
The fact that a test is labeled as a measure
of a specific construct does not necessarily
mean that inferences made from its scores
are legitimate.
Any meaningful test should possess
certain characteristics.
These characteristics include the requirement
that the results of the test be relevant to
what is being measured and, at the same
time, stable and therefore trustworthy.
The relevance and stability of test
results bring us to the two main
attributes of instruments in educational
measurement.
These are validity and reliability.
Validity
Validity refers to the legitimacy of the inferences that
we make about a score.
It is not the test itself that is either valid or invalid.
It is the interpretation of the results in the form of
scores and the degree to which they assess the construct
being assessed that should be evaluated in terms of their
validity.
It is possible for the results of a test to be valid
for one purpose and invalid for another.
Validity is the soundness of the interpretations
and uses of students’ assessment results.
Validity is the appropriateness or correctness
of inferences, decisions, or descriptions made
about individuals, groups, or institutions from
test results.
To validate your interpretations and uses of
students’ assessment results, you must
provide evidence that these interpretations
and uses are appropriate.
When discussing the validity of assessment results,
the following points should be kept in mind.
The concept of validity applies to the ways in which we
interpret and use the assessment results and not the
assessment procedure itself.
The assessment results have different degrees of validity
for different purposes and for different situations.
Judgments about the validity of your interpretations or
uses of assessment results should be made only after you
have studied and combined several types of validity
evidence.
Categories of Validity Evidence
Content-related evidence
Criterion-related evidence
Construct-related evidence
Content-related evidence
This type of evidence refers to the content
representativeness and relevance of tasks or items on
an instrument.
It is defined as the degree to which the set of items
included on a particular test are representative of the
entire domain of items that the test is intended to
assess
Content-related evidence answers
questions like:
How well do the assessment tasks represent the
domain of important content?
How well do the assessment tasks represent the
curriculum as defined?
How well do the assessment tasks reflect current
thinking about what should be taught and assessed?
Are the assessment tasks worthy of being learned?
Criterion-related evidence
The criterion approach to establishing validity involves
demonstrating a relationship between test scores and a
criterion that is usually, but not necessarily, another test.
It is most often used to evaluate the scores from aptitude tests
because the instruments are intended to predict concrete
behaviours.
There are two types of criterion-related evidence. These are
concurrent validity and predictive validity. The only procedural
distinction between them pertains to the time period when the
criterion data are gathered.
Construct-related evidence
This type of evidence refers to how well the assessment
results can be interpreted as reflecting an individual’s status
regarding an educational or psychological trait, attribute or
mental process.
Construct-related validity is the extent to which test
performance can be interpreted in terms of certain
psychological traits (Gronlund, 1985).
Examples of such constructs include intelligence,
aggressiveness, honesty, sociability, verbal fluency
and mathematical reasoning.
It is believed that each construct has an underlying
theory which can facilitate the prediction of a
person’s behaviour.
Factors Affecting Validity
Unclear directions.
Reading vocabulary and sentence structure that are too difficult.
Ambiguous statements in assessment tasks and items.
Inadequate time limits.
Inappropriate level of difficulty of the test items.
Improper arrangement of items.
Cheating.
Student emotional disturbances.
Reliability
Would your students get the same scores if they took one of your
tests on two different occasions?
Would they get approximately the same scores if they took two
different forms of one of your tests?
These questions have to do with the consistency with which your
classroom tests measure students’ achievement.
The generic name for consistency is reliability.
Reliability is an essential characteristic of a good test,
because if a test doesn’t measure consistently
(reliably), then one could not count on the scores
resulting from a particular administration to be an
accurate index of students’ achievement.
When applying the concept of reliability
note that:
Reliability refers to the results obtained with an assessment
instrument and not to the instrument itself.
An estimate of reliability refers to a particular type of consistency.
Reliability is a necessary condition but not a sufficient condition
for validity.
Results can be reliable but not valid. For a test to be valid, it must be
reliable.
Reliability is primarily statistical.
It is determined by the reliability coefficient, which is
defined as a correlation coefficient that indicates the
degree of relationship between two sets of scores
intended to be measures of the same characteristic.
True score Theory
It is a theory of testing based on the idea that a person’s observed or
obtained score (X) on a test is the sum of a true score (T) (error-free
score) and an error score (E)
X = T + E
where X is the observed score, T the true score, and E the error score.
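A small simulation of this relation with made-up numbers; the true score and error spread here are hypothetical:

import random

random.seed(1)

true_score = 70            # T: the (unknowable) error-free score
# Simulate ten administrations; E is random error centred on zero.
observed = [true_score + random.gauss(0, 3) for _ in range(10)]

print([round(x, 1) for x in observed])
print(round(sum(observed) / len(observed), 1))  # averages out close to T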
Standard Error of Measurement (SEM)
If you want to track student progress over time, it’s critical to use
an assessment that provides you with accurate estimates of student
achievement, that is, assessment with a high level of precision.
When we refer to measures of precision, we are talking about
something known as the standard error of measurement (SEM).
Before we define SEM, it’s important to remember
that all assessment scores are an estimate. That is, irrespective of
the test being used, all observed scores include some measurement
error, so we can never really know a student’s actual achievement
level (their true score).
Standard Error of Measurement (SEM)
We can estimate the range in which we think a
student’s true score likely falls; in general, the
smaller the range, the greater the precision of the
assessment.
SEM, put in simple terms, is a measure of the
precision of an assessment. The smaller the SEM,
the more precise the measurement capacity of the
instrument.
Consequently, smaller standard errors translate to
more sensitive measurements of student progress.
Standard Error of Measurement (SEM)
The standard error of measurement (SEM) is a measure of how
much measured test scores are spread around a “true” score.
The SEM is especially meaningful to a test taker because it applies
to a single score and it uses the same units as the test.
SEM is defined as the standard deviation of the errors of measurement
associated with the test scores for a specified group of test
takers.
The formula: SEM = s × √(1 − r), where s is the standard deviation of the
test scores and r is the reliability coefficient of the test.
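A minimal sketch of the calculation, assuming the formula above and hypothetical values for the standard deviation and reliability:

import math

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

sd, r = 10.0, 0.91          # hypothetical test SD and reliability
e = sem(sd, r)
print(round(e, 1))          # 3.0

# A rough band for a student's true score around an observed score of 65:
print(round(65 - e, 1), round(65 + e, 1))  # 62.0 68.0 (about 68% of the time)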
Methods of Estimating Reliability
Test–Retest Method
Equivalent Forms Method
Split – half method
Inter-rater method
Test–Retest Method
The same instrument is given twice to the same group of
people.
The reliability is the correlation between the scores on the
two administrations.
This is a measure of the stability of scores over a period of
time.
A major disadvantage of this method lies in the carry over
effects from one testing occasion to another.
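A minimal sketch of the test-retest estimate with hypothetical scores from two administrations; the Pearson correlation between the two sets is the reliability estimate (statistics.correlation requires Python 3.10+):

from statistics import correlation  # Python 3.10+

first = [55, 62, 70, 48, 80, 66]   # hypothetical first administration
second = [58, 60, 73, 50, 78, 68]  # same pupils, second administration

print(round(correlation(first, second), 2))  # high value = stable scores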
Equivalent Forms Method
It is also referred to as Parallel or Alternate method
Two different versions of the instrument are
created. We assume both measure the same thing.
The two forms, which are alternate or parallel with
the same content and level of difficulty for each
item, are administered to the same group of
students.
The forms may be given on the same or nearly the same
occasion or a time interval will elapse before the second
form is given.
The scores on the two administrations are correlated and
the result is the estimate of the reliability of the test.
A problem with this method of estimating the reliability of a
test is the construction of a parallel or an alternate form of a
test.
Split – half method
A single test is given to the students.
The test is then divided into two halves for scoring.
The two scores for each student are correlated to obtain the
estimate of reliability.
The correlation coefficient of the two halves gives the reliability
coefficient of one-half of the test.
This is often used with dichotomous variables that are scored 0 for
incorrect and 1 for correct.
Whole-test reliability = (2 × correlation between half-test scores) / (1 + correlation between half-test scores)
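A minimal sketch of the split-half procedure with hypothetical half-test scores; the correction applied is the Spearman-Brown formula shown above:

from statistics import correlation  # Python 3.10+

# Hypothetical half-test scores (e.g., odd items vs even items) per pupil.
odd_half = [8, 6, 9, 4, 7, 5]
even_half = [7, 6, 10, 5, 6, 5]

r_half = correlation(odd_half, even_half)
whole_test = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction
print(round(r_half, 2), round(whole_test, 2))  # 0.89 0.94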
Inter-rater method
Two or more raters or scorers each score the students’
papers.
The sets of scores from the raters for all the students
are correlated.
The resulting correlation coefficient is known as scorer
reliability or inter-rater reliability.
Factors influencing reliability
Test length
Group variability
Difficulty of items
Scoring objectivity
Duration
READING ASSIGNMENT
Your school is looking for an assessment instrument to measure
reading ability. They have narrowed the selection to two possibilities.
Test A provides data indicating that it has high validity, but there is
no information about its reliability. Test B provides data indicating
that it has high reliability, but there is no information about its
validity. Which test would you recommend? Why?
Use practical example.
COURSE TITLE: MEASUREMENT,
EVALUATION AND STATISTICS IN
EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN (PH.D.)
DEPARTMENT OF EDUCATIONAL FOUNDATIONS
PETESHUN37@GMAIL.COM
+233 244590189
The Stages in Planning and
Constructing Classroom Tests
The stages in classroom testing
According to Etsey (2001), the principal stages
involved in classroom testing are:
Constructing the test,
Administering the test
Scoring the test
Analysing the test results.
Constructing the Test
Steps in the construction of a good classroom test.
Define the purpose of the test
Determine the item format to use
Determine what is to be tested
Write the individual items
Review the items
Prepare scoring key
Write directions
Evaluate the test
Define the purpose of the test
The basic question to answer is “why am I testing?”
Test items must be related to teacher’s classroom
instructional objectives.
Several purposes are served by classroom tests and the
teacher has to be clear on the purpose of the test.
Cont…
This forms part of the planning stage so the teacher has to
answer other questions such as
“Why is the test being given at this time in the course?”
“Who will take the test?”
“Have the test takers been informed?”
“How will the scores be used?”
Determine the item format to use
The choice of format must be appropriate for testing particular topics and
objectives.
One needs to list the objectives of the subject matter for which the test is
being constructed and then the main topics covered or to be covered.
The test items could either be objective, essay or performance types.
It is sometimes necessary to use more than one format in a single test.
Factors to consider in the choice of the appropriate format
the purpose of the test
the time available to prepare and score the test
the number of students to be tested
the skill to be tested
the difficulty desired
physical facilities that are available
age/level of the pupils
test constructor or teacher’s skill in writing the different type of items.
Determine what is to be tested
You need to consider what the test will actually cover.
You consider the topics and the objectives/behaviours (the
various levels of the domain).
You need to construct your Table of specification or Test
Blueprint.
Table of Specification/Test Blueprint

Content               Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Soil condition            -           1             1           1         -           -          3
Weather condition         -           1             1           1         -           -          3
Location of industry      2           1             1           1         -           1          6
Labour processing         -           -             -           1         1           -          2
Marketing                 3           1             -           1         -           1          6
Total                     5           4             3           5         1           2         20
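A small sketch that encodes the blueprint above (cell placement follows the reconstruction in the table) and checks that the row and column totals add up to the intended 20 items:

# Rows are content areas; columns follow the Bloom levels in the table above.
levels = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]
blueprint = {
    "Soil condition":       [0, 1, 1, 1, 0, 0],
    "Weather condition":    [0, 1, 1, 1, 0, 0],
    "Location of industry": [2, 1, 1, 1, 0, 1],
    "Labour processing":    [0, 0, 0, 1, 1, 0],
    "Marketing":            [3, 1, 0, 1, 0, 1],
}

row_totals = {topic: sum(cells) for topic, cells in blueprint.items()}
col_totals = [sum(col) for col in zip(*blueprint.values())]

print(row_totals)                     # 3, 3, 6, 2, 6
print(dict(zip(levels, col_totals)))  # 5, 4, 3, 5, 1, 2
print(sum(col_totals))                # 20 items in all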
Some guidelines to follow when writing
the items
Keep the table of specifications before you and continually refer to it as you
write the items.
Items must match the instructional objectives.
Formulate well-defined items that are not vague or ambiguous; items should
be grammatically correct and free from spelling and typing errors.
Avoid excessive words.
Avoid needless complex sentences.
The item should be based on information that the examinee should know.
Cont…
Write the test items simply and clearly.
Prepare more items than you will actually need.
The task to be performed and the type of answers required should be clearly
defined.
Include questions of varying difficulty.
Write the items and the key as soon as possible after the material has been
taught.
Avoid textbook or stereotyped language.
Write the items in advance of the test date to permit reviews and editing.
Reviewing the items
Critically examine each item at least a week after writing the item.
Items that are ambiguous or poorly constructed, as well as
items that do not match the objectives, must be reworded or removed.
Check the length of the test (number of items) against the purpose.
Check the kinds of test items used and the ability level of the students.
Cont…
To make the review more effective, the test
constructor or teacher could try out the test, that is,
administer the prepared items to a representative
sample of people similar to those who will take the
complete test.
Cont…
The advantages of trying out the test include:
helps to identify items that are not suitable due to ambiguity, undue
complexity or misleading instructions;
makes it possible for the test writer to determine the degree of difficulty
of each item so that items that are too difficult or too easy can be eliminated;
enables the test writer to get information on the number of items to
include in the test and also determine the time limit for the final test.
Prepare Scoring key
Prepare a scoring key or marking scheme while the items are fresh
in your mind.
List the correct responses and acceptable variations for objective-
type tests.
Assign points to the various expected qualities of responses.
Assign values to each item and ensure representative sampling of
content covered.
Write directions
Give clear and concise directions for the entire test as well
as sections of the test.
Clearly state the time limit for the test.
Penalties for undesirable writing must be spelt out.
Directions must include number of items to respond to,
how the answers will be written, where the answers will be
written, etc.
Evaluate the test
Before administration, the test should be evaluated by the following
five criteria:
Clarity
Validity
Practicality
Efficiency
Fairness.
Clarity:
Who is being tested?
What material is the test measuring?
What kind of knowledge is the test measuring?
Do the test items relate to content and course objectives?
Are the test items simple and clear?
Validity:
Is the test a representative sampling of the material
presented in the chapter, unit, section or course?
Does the test faithfully reflect the level of difficulty
of material covered in the class?
Practicality
Will students have enough time to complete the
test?
Are there sufficient materials available to present
the test and complete it effectively?
Efficiency:
Is this the best way to test for the desired knowledge, skill
or attitude?
What problems might arise due to material difficulties or
shortage?
Fairness:
Were the students given advance notice?
Have I adequately prepared students for the test?
Do the students understand the testing procedures?
How will the scores affect the students’ lives?
End of session
Multiple-Choice Tests
It is a type of objective test in which the respondent is
given a stem that introduces a problem or asks a question,
and he/she is to select from among three or more
alternatives (options or responses) the one that best
completes the stem.
The incorrect options are called foils or distracters.
Types of multiple-choice tests.
‘Single best response’ type
‘Multiple response’ type
An example of single best response is:
Write 0.039387 as a decimal correct to 3 significant figures.
A. 0.394
B. 0.393
C. 0.0394
D. 0.0393
E. 0.039
Types of multiple-choice tests (cont.)
An example of multiple response is:
Which of the following action(s) contribute to
general principles of First Aid?
i. Arrest haemorrhage
ii. Bath the patient
iii. Immobilize injured bone
A. i only
B. ii only
C. i and ii
D. i, ii and iii
Multiple-Choice Tests
comment on the following questions
Q1. Planning is …………………
a. Deciding in the present what to do soon
b. It’s the process whereby companies reconcile their resources with
their finances and achievements
c. Providing a rational approach to pre-selected objectives
d. It enables companies to reconcile their resources with their
goals and objectives
Multiple-Choice Tests
comment on the following questions
Q1. Where and when did the European powers partition Africa among themselves
a. 1854 and London
b. 1889 and Berlin
c. 1884 and Berlin
d. 1884 and London
Q2. …………………….. is built directly on the hardware.
a. Computer Environment
b. Application Software
c. Database Systems
d. Operating Systems
Guidelines for constructing good
multiple-choice item
The central issue of the item should be in the stem. The stem should be
concise, easy to read and understand, and should ask a question or
set a task.
All options for a given item should be homogeneous in content,
form and grammatical structure.
Cont…
Repetition of words in the options should be avoided.
Specific determiners which are clues to the best/correct option
should be avoided.
Vary the placement of the correct options. There should be no discernible
pattern of correct/best responses.
Items measuring opinions should not be included.
Cont…
The options must be
◦ parallel in form, i.e., sentences of about the same length
◦ in alphabetical/sequential order
◦ itemized vertically and not horizontally.
Each option must be distinct. Overlapping alternatives should be
avoided.
Present a single, clearly formulated problem in the stem of the item.
Avoid using “all of the above” as an option but “None of the above” can
be used sparingly. It should be used only when an item is of the ‘correct
answer’ type and not the ‘best answer’ type.
Stems and options should be stated positively. However, a negative stem
could be used sparingly, and the word not should be emphasized either by
underlining it or writing it in capital letters.
Create independent items. The answer to one item
should not depend on the knowledge of the answer
to a previous item.
Sentences should not be copied from textbooks or
from past test items. Original items should be made.
Guidelines for constructing
True/False tests
1. A class always has a default constructor.
a. True
b. False
c. Neither True nor False
d. Both True and False
Guidelines for constructing
True/False tests
Statements must be definitely
true or definitely false.
Poor Item:
The value of 2/3 as a decimal
fraction is 0.7
True or False
Good Item:
The value of 2/3 expressed as a
decimal fraction correct to two
decimal places is 0.67
True or False
Guidelines for constructing
True and False tests (cont.)
Avoid words that tend to be clues to the correct
answer.
Words like some, most, often, many, and may are usually
associated with true statements.
All, always, never, and none are associated with false
statements.
Guidelines for constructing
True and False tests (cont.)
Approximately half (50%) of the total number of items
should be false, because it is easier to construct statements
that are true and the tendency is to have more true
statements.
Each statement should possess only one central theme.
Guidelines for constructing
True and False tests (cont.)
State each item positively. A negative item could, however, be used with the
negative word, ‘not’, emphasized by underlining or writing in capital letters.
Double negatives should be avoided.
Statements should be short, simple and clear.
Ambiguous as well as tricky statements should be avoided.
Example:
Kwame Ankumah was the First President of Ghana.
True or False.
Guidelines for constructing
True and False tests (cont.)
Arrange the items such that the correct responses do not
form a discernible pattern.
To avoid scoring problems, let students write the correct
options in full.
MATCHING-TYPE TESTS
The matching type of objective test consists of two
columns and these items require students to match
information in the two columns.
Items in the left-hand column are called premises, and
those in the right-hand column are called responses.
MATCHING-TYPE TESTS
Example:
Match the vitamins in Column A with the diseases and conditions which a lack of the vitamin causes
in Column B, using an arrow line.
Column A (Vitamins)          Column B (Diseases caused by lack)
1. Vitamin A a. Beriberi
2. Vitamin C b. Kwashiorkor
3. Vitamin D c. Pellagra
d. Poor eyesight
e. Rickets
f. Scurvy
Guidelines for constructing matching-type tests
State clearly what each column represents.
Provide complete directions. Instructions should clearly
show what the rules are and also how to respond to the
items.
Do not use perfect matching. Use a larger, or smaller,
number of responses than premises, and permit the
responses to be used more than once.
Guidelines for constructing matching-type tests
(cont)
Arrange premises and responses alphabetically or sequentially.
Column A (premises) should contain the list of longer phrases. The
shorter items should constitute the responses.
Limit the number of items in each set.
Use homogenous options and items.
All options-must be placed (and typed) on the same page.
Guidelines in constructing good
classroom Essay tests
Plan the test
Give adequate time and thought to the preparation of the test
items. The test items must be constructed from a test
specification table and well in advance (at least two weeks) of the
testing date.
The items should be based on novel situations and problems.
Be original. Do not copy directly from textbooks or past test
items.
Cont…
Test items should require the students to show adequate
command of essential knowledge. The items should not
measure rote memorization of facts, definitions and
theorems but must be restricted to the measuring of higher
mental processes such as application, analysis, synthesis
and evaluation.
Cont…
The length of the response and the difficulty level of items should be
adapted to the maturity level of students (age and educational level).
Optional items should not be provided when content is relevant. They
may be necessary only for large external examinations and when the
purpose of the test is to measure writing effectiveness. If students answer
different questions, an analysis of the performances on the test items is
difficult.
Cont…
Prepare a scoring key (marking scheme) at the time the
item is prepared.
Decide in advance what factors will be considered in
evaluating an essay response. Determine the points to be
included and the weights to be assigned for each point. The
preparation of a model answer will help disclose ambiguities
in an item.
Cont…
Establish a framework and specify the limits of the
problem so that the student knows exactly what to
do.
Present the student with a problem which is
carefully worded so that only ONE interpretation is
possible. The questions/items must not be
ambiguous or vague.
Cont..
Indicate the value of the question and the time to be
spent in answering it.
Structure the test item such that it will elicit the type of
behaviours you really want to measure.
The test items must be based on the instructional
objectives for each content unit.
Cont…
Give preference to a large number of items that require brief
answers. These provide a broader sampling of subject content
and are thus better than a few items that require extended
responses.
Start essay test items with words that are clear and as simple as
possible and which require the student to respond to the
expected stimulus. As much as possible, avoid words such as
what, list, and who.
Administering Test
When teachers administer tests and quizzes, they want to
create conditions that optimize students’ performance to
ensure that assessment results accurately reflect what
students know and can do.
If one constructs a very good test but its administration is
not done well, one will not get the desired results.
Cont…
Test administration is concerned with
the physical and psychological setting in
which pupils take test.
Prepare students for the test.
The following information is essential to students’ maximum performance.
When the test will be given (date and time).
Under what conditions it will be given (timed or take-
home, number of items, open book or closed book, place
of test).
The content areas it will cover (study questions or a list
of learning targets).
Cont…
Emphasis or weighting of content areas (value in points).
The kinds of items on the test (objective-types or essay-
type tests).
How the assessment will be scored and graded.
The importance of the results of the test.
Cont…
Students must be made aware of the rules and regulations
covering the conduct of the test. Penalties for malpractice such as
cheating should be clearly spelt out and clearly adhered to.
Avoid giving tests immediately before or after a long vacation,
holidays or other important events where all students are actively
involved physically or psychologically/emotionally.
Cont…
Announcements must be made about the time remaining at regular intervals.
Time left for the completion of the test should be written on the
board where practicable.
Invigilators are expected to stand at a point from which they can view
all students. They should once in a while move among the pupils to
check for malpractice. Such movements should not disturb the pupils.
Invigilators must be vigilant; reading novels or newspapers and grading
papers is not allowed.
Cont…
Invigilators should avoid threatening behaviour. Remarks
like ‘If you don’t write fast, you will fail’ are threatening;
pupils should instead be made to feel at ease.
Do not talk unnecessarily before letting students start
working. Remarks should be kept to a minimum and
related to the test.
Cont…
Avoid giving hints to students who ask about
individual items. Where an item is ambiguous, it
should be clarified for the entire group.
Expect and prepare for emergencies. Emergencies
might include shortages of answer booklets or
question papers, power outages, illness, etc.
Scoring the test
Teachers typically use two types of scoring procedures to
evaluate the quality of students’ responses to essay
questions and products.
These are: the analytic (also called scoring key, point or
trait) method and the holistic (also called global, sorting or
rating) method.
Analytic Scoring Rubrics
The analytic scoring rubric requires the scorer to
develop an outline or a list of major elements that
students should include in the ideal answer.
Also, the scorer needs to decide on the number of
points to award to students when they include each
element.
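To make the point method concrete, here is a minimal sketch in Python. The essay elements and their weights are hypothetical, invented purely for illustration; only the general method (listing the major elements of the ideal answer and awarding the assigned points for each) follows the description above.

```python
# A minimal, hypothetical sketch of analytic (point-method) scoring.
# The elements and weights below are invented for illustration only.
marking_scheme = {
    "defines assessment": 2,
    "gives two purposes of assessment": 3,
    "provides a relevant classroom example": 2,
    "organises the essay coherently": 1,
}

def score_response(elements_present):
    """Award the assigned points for each element the response includes."""
    return sum(points
               for element, points in marking_scheme.items()
               if element in elements_present)

# Example: a response containing two of the four expected elements.
included = {"defines assessment", "provides a relevant classroom example"}
print(score_response(included), "out of", sum(marking_scheme.values()))  # 4 out of 8
```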
Holistic Scoring Rubrics
The holistic scoring rubric requires the scorer to make a
judgment about the overall quality of each student’s response.
Its purpose is to sort students’ responses into categories that
indicate quality.
A very important point in deciding the categories is to be sure
they correspond to your school’s grading system.
The procedure for holistic scoring
consists of the following seven steps:
Establish the scoring categories you will use.
Characterize a response that fits each category.
Read each response rapidly and form an overall impression.
Sort the responses into the designated categories.
Reread the papers that have been placed within a category.
Cont…
Move any clearly superior or inferior responses to
other categories.
Assign the same numerical score to all responses
within a category.
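By contrast, here is a minimal sketch of the holistic method, again in Python. The categories, scores, and student placements are invented for illustration; in practice the categories should correspond to the school’s grading system, as noted above.

```python
# A minimal, hypothetical sketch of holistic (global) scoring.
# Categories, scores, and placements are invented for illustration only.
category_scores = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

# After reading each response rapidly and sorting it into a category
# (steps 3-6 of the procedure), every response placed in a given
# category receives the same numerical score (step 7).
placements = {"Student 1": "A", "Student 2": "C", "Student 3": "B"}

for student, category in placements.items():
    print(student, "->", category, "scores", category_scores[category])
```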
End of Session
COURSE TITLE: MEASUREMENT, EVALUATION AND STATISTICS IN EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN, PHD
DEPARTMENT OF EDUCATIONAL FOUNDATIONS
PETESHUN37@GMAIL.COM
+233 244590189
Basic Descriptive Statistics in Education
Measures of Central Tendency
A measure of central tendency is a single value that attempts
to describe a set of data by identifying the central position
within that set of data.
Measures of central tendency are sometimes called measures
of central location.
The mean, median and mode are all valid measures of central
tendency, but under different conditions some measures become
more appropriate to use than others.
The Mean
The mean is the most popular and well-known measure of central tendency.
It can be used with both discrete and continuous data, although its use is most
often with continuous data.
The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set.
Cont…
An important property of the mean is that it includes every value in your data
set as part of the calculation.
The mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
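As a quick illustration (with invented scores 4, 6 and 8): the mean is (4 + 6 + 8)/3 = 6, and the deviations from the mean sum to (4 − 6) + (6 − 6) + (8 − 6) = −2 + 0 + 2 = 0, demonstrating the zero-sum property stated above.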
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the
influence of outliers. (These are values that are unusual compared to the rest of the
data set by being especially small or large in numerical value.)
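For example (invented scores): the mean of 2, 3, 4, 5 and 6 is 4, but replacing the 6 with an outlier of 60 gives a mean of (2 + 3 + 4 + 5 + 60)/5 = 14.8, while the median remains 4.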
The median
It is the score such that approximately one-half (50%) of the scores fall above it
and one-half (50%) fall below it when the scores are arranged sequentially.
The median is the centermost score if the number of scores is odd. If the
number of scores is even, the median is taken as the average of the two
centermost scores.
To find the median
1. Arrange all observations in order of size from smallest to largest or vice versa.
2. If the number of observations, n, is odd, the median is the center observation,
at the (n + 1)/2 position.
3. If the number of observations, n, is even, the median is the mean of the two
center observations, at positions n/2 and (n/2) + 1 (the (n + 1)/2 position falls
midway between them).
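Worked examples (with invented scores):
• Odd n: for 10, 12, 15, 18, 20 (n = 5), the median is the (5 + 1)/2 = 3rd score, i.e. 15.
• Even n: for 10, 12, 15, 18, 20, 25 (n = 6), the two center scores are the 3rd and 4th, so the median = (15 + 18)/2 = 16.5.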
Properties of the median
It is less sensitive than the mean to extreme scores.
It does not use all the scores in a distribution; it depends only on the
value(s) at the center.
It has limited use for further statistical work.
It can be used when there is incomplete data.
Uses of the median
It is used as the most appropriate measure of location when there is reason to
believe that the distribution is skewed.
It is used as the most appropriate measure of location when there are extreme
scores that would distort the mean.
It is useful when the exact midpoint of the distribution is wanted.
It provides a standard of performance when compared with the mean.
It can be compared with the mean to determine the direction of student
performance.
The mode
The mode is defined as the most frequent score in the
distribution.
The mode is not calculated; rather, it is simply identified once the
various frequencies within a distribution of data are known.
The main advantage of the mode is that it is the only measure of
central tendency that can be used with nominal-scale data.
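For example (invented scores): in the distribution 3, 5, 5, 5, 7, 9, the mode is 5, since 5 occurs most frequently.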
Cont…
Properties of the mode
It can be used when there is incomplete data
It is not sensitive to extreme scores
Uses of the mode
It is used when there is the need for a rough estimate of the measure of location.
It is used when there is the need for the most frequently occurring value.
Measures of Variability
Variability refers to the degree to which sample or population
observations differ or deviate from the distribution’s measures of
central tendency.
Measures of variability are also called measures of dispersion
or scatter.
The main measures of variability are the range, variance,
standard deviation, and quartile deviation.
The Range
The range may be defined as the spread from the lowest score to the highest score in a
distribution.
The actual statistical formula is the highest score in a group, minus the lowest score.
The range is easy to calculate but gives only a relatively crude measure of dispersion,
because it reflects the spread of the two extreme scores and not the spread of any of
the scores in between.
Cont…
Range = Xhigh - Xlow
Consider the set of scores: 31, 20, 45, 65, 48, 67, 78, 63
Range = 78 – 20 = 58.
Uses of the range
When data is too scanty or too scattered to justify the computation of a more
precise measure.
When knowledge of extreme scores or total spread is all that is
needed.
The Variance
The variance can be defined as the mean of the squares of the
deviations from the mean of the distribution.
It is the mean square deviation.
It is a measure of dispersion in which every score in the
distribution is taken into account in its computation.
In symbols, the variance is:

σ² = Σ(x − x̄)² / N

where x represents a score, x̄ the mean of the scores, and N the number of scores.
The steps in computing the variance are:
• compute the mean of the scores
• find the deviation score by subtracting the mean from each score: (x − x̄)
• square each of the resulting deviations: (x − x̄)²
• sum the squared deviations: Σ(x − x̄)²
• divide the sum of the squared deviations by N, the number of scores
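A worked example with invented scores 2, 4, 6 and 8:
Mean = (2 + 4 + 6 + 8)/4 = 5
Deviation scores: −3, −1, 1, 3
Squared deviations: 9, 1, 1, 9
Sum of squared deviations = 20
Variance = 20/4 = 5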
The Standard Deviation
The standard deviation is the square root of the average of the squared deviations
from the mean.
The standard deviation is the square root of the variance.
The standard deviation is a statistic that enables one to make an exact
determination of the distances of scores from the mean.
The mean and the standard deviation, when computed for a specific set of test
scores, enable a teacher/counselor to determine how well an individual performed
in relation to the group.
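Continuing the invented example above (variance = 5), the standard deviation is √5 ≈ 2.24.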
Uses of standard deviation
It helps to find out the variation in achievement among a group of students.
Generally, for scores in the range 1–50, an SD of 5 or below is relatively small; for
scores in the range 51–100, an SD of 10 or below is relatively small. With this
information, the teacher can adapt the teaching method to suit each group.
It is helpful in computing other statistics e.g. standard scores, correlation coefficients.
It is useful in determining the reliability of a test. The split-half correlation method or
internal consistency methods use the standard deviation of the scores.
Quartile deviation (QD)
It is also called the semi-interquartile range and it depends
on quartiles.
Quartiles divide a distribution into four equal parts, so in
practice there are three quartile points (Q1, Q2 and Q3).
The QD is half the distance between the first quartile (Q1) and
the third quartile (Q3).
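In symbols: QD = (Q3 − Q1)/2.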
Consider the following scores: 21, 32, 45, 17, 25, 42, 24, 35, 50, 19, 27, 44, 33, 51, 26.
To calculate the quartile deviation:
First arrange the scores sequentially:
17, 19, 21, 24, 25, 26, 27, 32, 33, 35, 42, 44, 45, 50, 51
Q1 = ¼(15 + 1) = 4th position; from the array, Q1 = 24
Q3 = ¾(15 + 1) = 12th position; from the array, Q3 = 44
QD = (44 − 24)/2 = 10
Properties of Quartile deviation
◦ It is a measure of individual differences.
◦ It does not make use of all the information provided by the scores.
◦ For skewed distributions, where the median is used as the measure
of location, the quartile deviation is a better measure of variability.
End of Session

Full Slides For Measurements, evaluation and statistics

  • 1.
    CORSE TITLE: MEASUREMENT, EVALUATIONAND STATISTICS IN EDUCATION COURSE CODE: EDCR 361 PETER ESHUN, PHD 0244590189 peshun@uew.edu.gh
  • 2.
    Nature, Definition and Explanationof basic concepts in Assessment
  • 3.
    Assessment, Test, Measurement,& Evaluation Assessment Assessment is a broad term defined as a process for obtaining information that is used for making decisions about students, curricula and programmes, and educational policy. Assessment is done to gather data on learning, teaching, schools and the education system to enable decision making on the progress attained by learners.
  • 4.
    Some of thereasons for conducting assessment in educational sector include the following: Learner assessment: To ascertain the level of learners’ performance against curriculum standards and core competences in order to make decisions regarding selection, remediation, promotion, certification, proficiency and competency. Teacher appraisal: To improve the teacher’s own practice by identifying gaps in the content delivery and pedagogy. School evaluation: To obtain credible information about schools in terms of learners and teachers’ performances, leadership, resource availability and infrastructure. System evaluation: To determine the strengths and weaknesses of the entire educational system and permit a good understanding of how well learning is being facilitated.
  • 5.
    What do weAssess? The Pre-Tertiary Assessment Framework has been designed to assess the Core Competences, 4Rs, Practical Skills and Values and Attitudes The six competences are: • Critical thinking and problem solving • Creativity and innovation • Communication and collaboration • Cultural identity and global citizenship • Personal development and leadership • Digital literacy. 4Rs (Reading, wRiting, aRithmetic and cReativity).
  • 6.
    Guidelines for selectingand using classroom assessment One needs to be clear about the learning target he/she wants to assess. Before you can assess a student, you must know the kind(s) of student knowledge, skill(s), and performance(s) about which you need information. One needs to be sure that the assessment technique(s) he/she selects actually match the learning target. One needs to be sure that the assessment technique(s) serve the needs of the learners. You should select assessment technique(s) that provide meaningful feedback to the learners about how closely they have approximated the learning targets.
  • 7.
    Guidelines for selectingand using classroom assessment Whenever possible, be sure to use multiple indicators of performance for each learning target. This will provide a better assessment of the extent to which a student has achieved a given learning target. One needs to be sure that when you interpret the result of assessments you take their limitations into account. Assessment is a means to an end. It is not an end in itself. Assessment provides information upon which decision are based
  • 8.
    Test Test connotes thepresentation of a standard set of questions to be answered. Test is an instrument or systematic procedure for observing and describing one or more characteristics of a student using either a numerical scale or a classification scheme (Nitko, 2001). A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupil’s behaviour (Airasian, 1991). In schools, we usually think of a test as a paper-and-pencil instrument with a series of questions that students must answer. These tests are usually scored by adding together the “points” a student earned on each question. Thus they describe the student using a numerical scale.
  • 9.
    Types of test Testsare classified in different ways using criteria like purpose, uses and nature. Using purpose as a criterion test can be classified as Achievement tests, Diagnostic tests, Aptitude tests, Intelligence tests, etc. Using uses as a criterion test can be classified as Norm-referenced test, and Criterion-referenced tests. Using nature as a criterion test can be classified as paper-and- pencil test, oral test, performance test, etc.
  • 10.
    Achievement test It isa test designed to measure formal or “School taught” learning. Achievement tests measure the degree of student learning in specific curricula areas in which instruction has been received. Achievement tests are measures of previously acquired knowledge Achievement tests are designed to measure the extent to which a person has achieved, gain, attain, mastered certain skills as a result of specific instruction and learning. Achievement tests can be classified into two. Teacher-made achievement tests Standardized achievement tests
  • 11.
    Measurement Measurement is theprocess of quantifying the degree to which someone or something possesses a given characteristic, quality or feature. Most authorities agree that measurement is the assignment of numbers, numerals or symbols to the traits or characteristics of persons or events according to specific rules. Nitko (2001) defined measurement as a procedure for assigning numbers (usually called scores) to a specified attribute or characteristics of a person in such a way that the numbers describe the degree to which the person possesses the attribute.
  • 12.
    Scales of Measurement  Dependingupon the traits/attribute/characteristics and the way they are measured, different kinds of data result, representing different scales of measurement.  Variables may be grouped into four categories of scales depending on the amount of information given by the data.  Different rules apply at each scale of measurement, and each scale dictates certain types of statistical procedures. Ratio Interval Ordinal Nominal (or Categorical) These scales are hierarchical, with nominal being the lowest and the ratio being the highest.
  • 13.
    Nominal scale A nominalscale classifies persons or objects into two or more categories. Whatever the classification, a person or object can only be in one category, and members of a given category have a common set of characteristics. Numbers may be used to represent the variables but the numbers do not have numerical value or relationship. The numbers are only for identification purpose Examples are sex, occupation, color of eyes, and region of residence. We can say Male = 1 and female 2 or Male = 2 and female = 1, it does not make any change because the numbers are only for identification purpose.
  • 14.
    Ordinal scale An ordinalscale means our measurement now contain the property of order. It provides some information about the order or rank of the variables, but it does not indicate how much better one score is than another. This enables us to make statements using the phrases “More than” or less than”. For example, a secondary school teacher might rank three students, Kwame, Ama and John, with a score of 1, 2 and 3, respectively, on the trait of sociability. From this data; we can conclude that Kwame is more social than Ama, who, in turn, is more social than John. However, we can not say by how much Ama is more social than John.
  • 15.
    Interval scale Interval scalesare numeric scales in which we know both the order and the exact differences between the values. Interval scale have the characteristics of the nominal and ordinal scales, ie., the ability to classify and to indicate the direction of the difference. Using the interval scale the Zero point is arbitrary and does not mean the absence of the characteristics/trait. An example of an interval scale is the Fahrenheit temperature scale because of its equality of units. For instance, the difference between 300 and 340 is the same as the difference between 720 to 760 .
  • 16.
    Ratio scale The ratioscale incorporates all of the characteristics of the interval scale with one important addition – an absolute zero. Examples of ratio scales are height, weight, time and distance. With an absolute zero point, you can make statements involving ratios of two observations such as “twice as long as” or “half as fast as”
  • 17.
    Evaluation Evaluation is definedas the process of making a value judgement about the worth of a student’s product or performance (Nitko, 2001). It is a process by which quantitative and qualitative data are processed to arrive at a judgement of value and worth of effectiveness. The main concern of evaluation in the classroom is to arrive at a judgement on the worth or effectiveness of teaching and learning. Evaluation may or may not be based on measurements or tests results.
  • 18.
    Forms of Evaluation Thingsmay be evaluated during development as well as after they are completely developed. The terms formative and summative evaluation are used to distinguish the roles of evaluation during these two periods.
  • 19.
    Formative evaluation Formative evaluationis judgement about quality or worth made during the design or development of instructional materials, instructional procedures, curricula, or educational programmes. The evaluator directs these judgements towards modifying, forming, or otherwise improving the product before it is widely used in schools. A teacher also engages in formative evaluation when revising lessons or learning materials by using information obtained from their previous use. Sometimes we speak of formative evaluation of students. This means we are judging the quality of a student’s achievement of a learning target while the student is still in the process of learning it. Such judgement can help us guide a student’s next learning steps. No penalty is given to students when they are given formative evaluation, and it is guidance –oriented in nature.
  • 20.
    Summative evaluation Summative evaluationis judgement about the quality or worth of already- completed instructional materials, instructional procedures, curricula, or educational programmes. Such evaluation tends to summarize strengths and weaknesses, it describes the extent to which a properly implemented programme or procedure has attained its stated goals and objectives. Sometimes we speak of summative evaluation of students. By this we mean judging the quality or worth of a student’s achievement after the instructional process is completed. Giving letter grades on report cards is one example of reporting your summative evaluation of a student’s achievement during the preceding marking period.
  • 21.
    Purposes of Assessment Assessmentof learning, Assessment for learning Assessment as learning.
  • 22.
    Assessment of Learning Thepurpose of assessment of learning is usually SUMMATIVE and is mostly done at the end of a task, unit of work, at the end of a unit, term or semester, and may be used to rank or grade students. Assessment of learning is designed to provide evidence of achievement to parents, other educators, the students themselves and sometimes to outside groups (e.g., employers, other educational institutions).” It is designed primarily to serve the purposes of accountability, or of ranking, or of certifying competence.
  • 23.
    Teachers’ Roles inAssessment of Learning: Effective assessment of learning requires that teachers provide: a rationale for undertaking a particular assessment of learning at a particular point in time clear descriptions of the intended learning processes that make it possible for students to demonstrate their competence and skill a range of alternative mechanisms for assessing the same outcomes public and defensible reference points for making judgements transparent approaches to interpretation descriptions of the assessment process strategies for recourse in the event of disagreement about the decisions.
  • 24.
    Assessment for Learning Assessmentfor learning is any assessment for which the first priority in its design and practice is to serve the purpose of promoting pupils’ learning. Assessment for learning involves teachers using evidence about students' knowledge, understanding and skills to inform their teaching. Sometimes referred to as ‘formative assessment', it usually occurs throughout the teaching and learning process to clarify student learning and understanding. Students understand exactly what they are to learn, what is expected of them and are given feedback and advice on how to improve their work.
  • 25.
    Teachers’ Roles inAssessment for Learning: It is interactive, with teachers: aligning instruction identifying particular learning needs of students or groups selecting and adapting materials and resources creating differentiated teaching strategies and learning opportunities for helping individual students move forward in their learning Providing immediate feedback and direction to students
  • 26.
    Assessment as Learning Assessmentas learning occurs when students are their own assessors. Students monitor their own learning, ask questions and use a range of strategies to decide what they know and can do, and how to use assessment for new learning. Through this process students are able to learn about themselves as learners and become aware of how they learn, that is, become metacognitive (knowledge of one’s own thought processes). Assessment as learning helps students to take more responsibility for their own learning and monitoring future directions.
  • 27.
    In monitoring metacognitionone need to ask him/herself the following questions; What is the purpose of learning these concepts and skills? What do I know about this topic? What strategies do I know that will help me learn this? Am I understanding these concepts? What are the criteria for improving my work? Have I accomplished the goals I set for myself?
  • 28.
    Teachers’ Roles inAssessment as Learning: The teachers’ role in promoting the development of independent learners through assessment as learning is to: model and teach the skills of self-assessment guide students in setting their own goals, and monitoring their progress toward them provide exemplars and models of good practice and quality work that reflect curriculum outcomes work with students to develop clear criteria of good practice guide students in developing internal feedback or self-monitoring mechanisms to validate and question their own thinking, and to become comfortable with ambiguity and uncertainty that is inevitable in learning anything new
  • 29.
  • 30.
    CORSE TITLE: MEASUREMENT, EVALUATIONAND STATISTICS IN EDUCATION COURSE CODE: EDCR 361 PETER ESHUN 0244590189 peteshun37@gmail.com
  • 31.
  • 32.
    Thebasicstepsintheclassroomassessmentprocessare: setting targets andwriting objectives, choose assessment items and technique, administering assessments and analyse the data share the results with students and other stakeholders
  • 33.
    What you wouldlike students to be able to do, value, or feel at the completion of an instructional segments is termed as learning objective. Some planned changes are: You may want students to read a claim made by a political figure and determine whether there is evidence available to support that claim. That is cognitive. You may want students to feel comfortable when talking in front of their classmates about how to solve mathematics problems. That is affective. Such change relate to values. You may want students to set up, focus, and use a microscope properly during a science investigation of pond water. That is psychomotor.
  • 34.
    What is Taxonomy? Taxonomyis a system of classification. It is an ordered classification system. They are hierarchical schemes for classifying learning objectives into various levels of complexity. Taxonomy can help you to bring to mind the wide range of important learning objectives and thinking skills to avoid narrowly focusing on some lower level objective only. The three domains are explained separately, they are closely related.
  • 35.
    THE COGNITIVE DOMAIN Thecognitive domain deals with all mental processes including perception, memory and information processing by which the individual acquires knowledge, solves problems and plans for the future. There are different taxonomies under the cognitive domain. Let us start with the Bloom’s taxonomy of the cognitive domain.
  • 36.
    Bloom’s taxonomy ofcognitive domain Evaluation (makes judgments about materials and methods) Synthesis (puts the parts together to form a new whole) Analysis (break an idea into component part and describe the relationships) Application (application of a rule or principle) Comprehension (lowest level of understanding) Knowledge (recall of specific information)
  • 37.
    Bloom's Revised Taxonomy LorinAnderson, a former student of Bloom, and David Krathwohl revisited the cognitive domain in the mid-nineties and made some changes. changing the names in the six categories from noun to verb forms rearranging them
  • 39.
    Level of Taxonomy Definition QuestionStems Creating Generating new ideas, products, or ways of viewing things. Designing, constructing, planning, producing, inventing -How would you devise your own way to…? -How many ways can you…? Evaluating Justifying a decision or course of action. Checking, hypothesizing, critiquing, experimenting, judging -Is there a better solution to…? -What do you think about…? Analyzing Breaking information into parts to explore understandings and relationships. Comparing, organizing, deconstructing, interrogating, finding -How is …similar to …? -What are some other outcomes? Applying Using information in another familiar situation. Implementing, carrying out, using, executing -Do you know of another instance where…? -Which factors would you change…? Understanding Explaining ideas or concepts . Interpreting, summarizing, paraphrasing, classifying, explaining -How would you explain…? -What was the main idea…? Remembering Recalling information . Recognizing, listing, naming -What is…? -Who …?
  • 40.
    THE AFFECTIVE DOMAIN Theaffective domain describes our feeling, likes and dislikes, and our experiences as well as the resulting behaviours (reactions). Affective learning includes the manner in which we deal with things emotionally, such as feelings, values, appreciation, enthusiasms, motivations, and attitudes. It is demonstrated by behaviours indicating attitudes of awareness, interest, attention, concern and responsibility, ability to listen and respond in interactions with others, and ability to demonstrate those attitudinal characteristics or values, which are appropriate to the test situation and the field of study.
  • 41.
    Krathwohl, Bloom andMasia (1973) put it into five major categories listed from the simplest behaviour to the most complex: Internalizing values (characterization) Organization Valuing Responding to Phenomena Receiving Phenomena
  • 42.
    Receiving Phenomena: Awareness,willingness to hear, selected attention. Examples: Listen to others with respect. Listen for and remember the name of newly introduced people. Responding to Phenomena: Active participation on the part of the learners. Attends and reacts to a particular phenomenon. Learning outcomes may emphasize compliance in responding, willingness to respond, or satisfaction in responding (motivation). Examples: Participates in class discussions. Questions new ideals, concepts, models, etc. in order to fully understand them.
  • 43.
    Valuing: The worthor value a person attaches to a particular object, phenomenon, or behaviour. Valuing is based on the internalization of a set of specified values, while clues to these values are expressed in the learner's overt behaviour and are often identifiable. Examples: Demonstrates belief in the democratic process. Is sensitive towards individual and cultural differences (value diversity). Shows the ability to solve problems. Proposes a plan to social improvement and follows through with commitment.
  • 44.
    Organization: Organizes valuesinto priorities by contrasting different values, resolving conflicts between them, and creating an unique value system. The emphasis is on comparing, relating, and synthesizing values. Examples: Recognizes the need for balance between freedom and responsible behaviour. Accepts responsibility for one's behaviour. Explains the role of systematic planning in solving problems. Prioritizes time effectively to meet the needs of the organization, family, and self.
  • 45.
    Internalizing values (characterization):Has a value system that controls their behaviour. The behaviour is pervasive, consistent, predictable, and most importantly, characteristic of the learner. Instructional objectives are concerned with the student's general patterns of adjustment (personal, social, emotional). Examples: Shows self-reliance when working independently. Cooperates in group activities (displays teamwork). Displays a professional commitment to ethical practice on a daily basis.
  • 46.
    THE PSYCHOMOTOR DOMAIN Thisrefers to educational outcomes that focus on motor (movement) skills and perceptual processes. According to Simpson (1972) the psychomotor domain includes physical movement, coordination, and use of the motor-skill areas. Development of these skills requires practice and is measured in terms of speed, precision, distance, procedures, or techniques in execution.
  • 47.
    The seven majorcategories are listed from the simplest behaviour to the most complex: Perception (awareness): The ability to use sensory cues to guide motor activity. Detects non-verbal communication cues. Estimate where a ball will land after it is thrown and then moving to the correct location to catch the ball. Set: Readiness to act. It includes mental, physical, and emotional sets. Shows desire to learn a new process (motivation). Guided Response: The early stages in learning a complex skill that includes imitation and trial and error. Performs a mathematical equation as demonstrated. Mechanism (basic proficiency): Learned responses have become habitual and the movements can be performed with some confidence and proficiency. Use a personal computer.
  • 48.
    Complex Overt Response(Expert): The skillful performance of motor acts that involve complex movement patterns. Operates a computer quickly and accurately. Adaptation: Skills are well developed and the individual can modify movement patterns to fit special requirements. Responds effectively to unexpected experiences. Modifies instruction to meet the needs of the learners. Origination: Creating new movement patterns to fit a particular situation or specific problem. Constructs a new theory. Develops a new and comprehensive training programming.
  • 49.
  • 50.
    COURSE TITLE: MEASUREMENT, EVALUATIONAND STATISTICS IN EDUCATION COURSE CODE: EDCR 361 PE TE R ES HUN, P HD DE PA RTM E NT O F E D UC ATIO NA L FO UNDATIO NS PE TES HU N3 7@ GM A I L.CO M +23 3 244 5 90 18 9
  • 51.
  • 52.
    WHAT IS SBA? School-basedAssessment (SBA) is a kind of assessment carried out in schools by pupils' own teachers, with the prime purpose of improving pupils' learning. SBA is a formative and diagnostic task geared towards improving the quality of teaching, learning and the mode of assessment itself.
  • 53.
    Cont… School Based Assessmentis a system of using different test modes: class tests, class exercises, groupworks, portfolio, rubric, homework, projects and other assessment procedures to measure what learners have achieved through a teaching/learning process.
  • 54.
    Cont… Broadly, the SBAis simply all forms/modes of assessment that can be undertaken internally by any school-level actor (learner, teacher, headteacher). This means that SBA include diagnostic assessments, formative assessments and summative assessments that can be completed while at the school.
  • 55.
    Cont… For the purposeof describing the overall learner achievement at the end of a term, an Internal Assessment Score (IAS) and an End-of-Term Exam Score (ETES) shall be generated and put together (i.e. added). NaCCA is responsible for providing guidance and support on SBA, whereas the training of teachers on effective use of the SBA is the responsibility of GES
  • 56.
    Why SBA? The objectivesof the school-based assessment are to: obtain a better picture of the learner performance across an entire programme or course than that provided by a single-shot examination enable holistic assessment of the learner emphasise learner-centred approach to learning  standardise internal assessment practices across schools  assess the core competences which otherwise cannot be assessed by National Standard Assessment Test (NSAT)
  • 57.
    Cont… guide improvement ofinstruction and learning in schools allow for teachers to develop assessments around challenging areas of the curriculum collect information about learners that will be helpful in planning instruction to meet their most critical learning needs. assess whether the instruction provided is enough to help learners achieve the curriculum standards encourage more individualised instruction
  • 58.
    Cont… identify students whomay be “at risk” or who may need extra instruction or intensive interventions if they are to move toward grade-level standards monitor all learners’ progress to determine whether “at risk” students are making adequate progress, and to identify any learner who may be falling behind.  introduce a system of moderation that will ensure accuracy and reliability of teachers’ marks. provides teachers with advice on how to conduct remedial instruction on difficult areas of the curriculum to improve class performance.
  • 59.
    Uses of SBA Amongother things, the SBA will be used for the following: help learners reflect upon their own learning and progress help learners understand and appreciate their strengths, abilities and areas for development. help prevent underachievement improve motivation and self-esteem
  • 60.
    Cont… activate learners asinstructional resources for one another promote teamwork and collaboration fosters cooperation between the teacher and the learner especially in the area of learners’ class projects allow for the holistic approach to assessing learners
  • 61.
    Modes of SBA SBAemphasises a learner-centred approach to learning and seeks to develop high ability thinking skills, problem solving skills, cooperative learning, teamwork, moral and spiritual development and formal presentation skills on the part of the learner. The tasks given in SBA are referred to as Class Assessment Tasks (CATs) For recording purposes it consists of 12 assessments a year. Four CATs in a term.
  • 62.
    Cont… The 12 assessmentsare labelled as CAT 1, CAT 2 up to CAT 12. The class assessment task (CAT) 1-4 will be administered in Term 1 CAT 5-8 will be administered in Term 2 CAT 9-12 in Term 3.
  • 63.
  • 64.
    Cont… CAT 1 isgroup exercise CAT 2 is class test CAT 3 is group exercise CAT 4 is Project work CAT 1, CAT 2, CAT 3, CAT 4 in first term CAT 5, CAT 6, CAT 7, CAT 8 in second term CAT 9, CAT10, CAT11, CAT12 in third term
  • 65.
    Cont… CAT1 will beadministered as group exercise coming at the end of week 4 of Term1. When administered in Terms 2 and 3, the equivalent of CAT 1 will be CAT 5 and CAT 9 respectively.
  • 66.
    Cont… CAT 2 willbe administered as class test coming at the end of week 8 of Term 1. When administered in Terms 2 and 3, the equivalent of CAT 2 is CAT 6 and 10 respectively.
  • 67.
    Cont… CAT3 will alsobe administered as group exercise coming at the end of the 11th or 12th week of the term. When administered in Terms 2 and 3, the equivalent of CAT 3 is CAT 7 and CAT 11 respectively.
  • 68.
    Cont… CAT 4, 8and 12 are projects. These are tasks assigned to learners individually or in groups to complete over an extended period of time (given week 2 and collected week 12) It may include practical work and investigative study (including case study) and survey. A report will be presented for each project undertaken.
  • 69.
    Successful SBA takesinto account six principles of best practice: Principle 1: Emphasis must be placed on assessment for learning and assessment as learning. There should be a monitoring and evaluation plan or framework. Principle 2: School Improvement and Support Officers (SISO) within the MMDA, headteachers and teachers should be given support in terms of competency training workshops and Professional Learning Community (PLC) training sessions.
  • 70.
    Cont… Principle 3: Datautilisation must be a priority, in order to inform teachers and other key stakeholders on current learning progressions in various schools across the country. Principle 4: Technology should be utilised as far as possible in implementing SBA.
  • 71.
    Cont… Principle 5: Projecttopics should be developed from the different subject curricula. Projects should be flexible in executing. The teacher should also give learners the opportunity to sometimes come up with their own project topics that should be aimed at solving problems that affect them. Principle 6: Authentic classroom assessment should be incorporated as part of SBA
  • 72.
    Project-based Assessment underSBA Projects are learning activities that provide learners with the opportunity to synthesise and apply knowledge in a real-life situation. Project work can be in any of the following forms: Experiments: This will require the teacher to give learners a task that involves carrying out a test trial or a tentative procedure. It is an act or operation for the purpose of discovering something unknown or of testing a principle. During experiments, learners observe, record, analyse and evaluate their results.
  • 73.
    Cont… Investigations: Investigative projectsenable learners to create their own questions around a topic, collect, organise, and evaluate data, draw conclusions and share results through presentations and explanations. Learners may demonstrate the results of their investigations through different types of products and experiences, including the writing of a paper, the development of artwork, oral presentations, audio and videotape productions, photographic essays, simulations, or plays.
  • 74.
    How do weassess projects? A scoring rubric, usually in the form of a table, is used to evaluate the project work. Projects can be assessed at each of the key stages – preparation, the process, product, and response.
  • 75.
    Cont… Other key considerationsmay include: originality of the project; level of creativity and innovation; simplicity and cost effectiveness of the project; the extent to which it fosters learner development and other core competences; and the extent to which the final product can be used in real-life situation or to address a challenge. A project may thus be assessed by teachers, peers, a panel or parents.
  • 76.
  • 77.
    CORSE TITLE: MEASUREMENT, EVALUATIONAND STATISTICS IN EDUCATION COURSE CODE: EDCR 361 PETER ESHUN (Ph.D.) 0244590189 peteshun37@gmail.com
  • 78.
    Characteristics of a GoodAssessment Result
  • 79.
    Because a testis labeled as a measure of a specific construct does not necessarily mean that inferences made about test scores are legitimate.
  • 80.
    Any meaningful testshould possess certain characteristics. The characteristics includes the fact that the results which will come out of this test should be relevant to what is being measured and at the same time stable and therefore trustworthy.
  • 81.
    The relevance andstability of the test results bring us to the two main attribute of instruments in educational measurement. These are validity and reliability.
  • 82.
    Validity Validity refers tothe legitimacy of the inferences that we make about a score. It is not the test itself that is either valid or invalid. It is the interpretation of the results in the form of scores and the degree to which they assess the construct being assessed that should be evaluated in terms of their validity.
  • 83.
    It is possiblefor the results of a test to be valid for one purpose and invalid for another. Validity is the soundness of the interpretations and uses of students’ assessment results. Validity is the appropriateness or correctness of inferences, decisions, or descriptions made about individuals, groups, or institutions from test results.
  • 84.
    To validate yourinterpretations and uses of students’ assessment results, you must provide evidence that these interpretations and uses are appropriate.
  • 85.
    When discussing thevalidity of assessment results, the following points should be kept in mind. The concept of validity applies to the ways in which we interpret and use the assessment results and not the assessment procedure itself. The assessment results have different degrees of validity for different purposes and for different situations. Judgments about the validity of your interpretations or uses of assessment results should be made only after you have studied and combined several types of validity evidence.
  • 86.
    Categories of ValidityEvidence Content-related evidence Criterion-related evidence Construct-related evidence
  • 87.
    Content-related evidence This typeof evidence refers to the content representativeness and relevance of tasks or items on an instrument. It is defined as the degree to which the set of items included on a particular test are representative of the entire domain of items that the test is intended to assess
  • 88.
    Content-related evidence answers questionslike: How well do the assessment tasks represent the domain of important content? How well do the assessment tasks represent the curriculum as defined? How well do the assessment tasks reflect current thinking about what should be taught and assessed? Are the assessment tasks worthy of being learned?
  • 89.
    Criterion-related evidence The criteriaapproach to establishing validity involves the demonstration of a relationship between test scores and a criterion that is usually, but not necessary, another test. It is most often used to evaluate the scores from aptitude test because the instruments are intended to predict concrete behaviours. There are two types of criterion- related evidence. These are concurrent validity and predictive validity .The only procedural distinction between these pertains to the time period when the criterion data are gathered.
  • 90.
    Construct-related evidence This typeof evidence refers to how well the assessment results can be interpreted as reflecting an individuals’ status regarding an educational or psychological trait, attribute or mental process. Construct – related validity is the extent to which test performance can be interpreted in terms of certain psychological traits (Grounlund, 1985)
  • 91.
    Examples of suchconstructs include intelligence, aggressiveness, honesty, sociability, verbal fluency and mathematical reasoning. It is believed that each construct has an underlying theory which can facilitate the prediction of a person’s behaviour.
  • 92.
    Factors Affecting Validity Uncleardirections. Too difficult reading vocabulary and sentence structure. Ambiguous statements in assessment tasks and items. Inadequate time limits. Inappropriate level of difficulty of the test items. Improper arrangement of items. Cheating. Student emotional disturbances
  • 93.
    Reliability Would your studentsget the same scores if they took one of your tests on two different occasions? Would they get approximately the same scores if they took two different forms of one of your tests? These questions have to do with the consistency with which your classroom tests measure students’ achievement. The generic name for consistency is reliability.
  • 94.
    Reliability is anessential characteristic of a good test, because if a test doesn’t measure consistently (reliably), then one could not count on the scores resulting from a particular administration to be an accurate index of students’ achievement.
  • 95.
    When applying theconcept of reliability note that: Reliability refers to the results obtained with an assessment instrument and not to the instrument itself. An estimate of reliability refers to a particular type of consistency. Reliability is a necessary condition but not a sufficient condition for validity. Results can be reliable but not valid. For test to be valid, it must be reliable.
  • 96.
    Reliability is primarilystatistical. It is determined by the reliability coefficient, which is defined as a correlation coefficient that indicates the degree of relationship between two sets of scores intended to be measures of the same characteristic.
  • 97.
    True score Theory Itis a theory of testing based on the idea that a person’s observed or obtained score (X) on a test is the sum of a true score (T) (error-free score) and an error score (E) X = T + E Observed score True score Error score
  • 98.
    Standard Error ofMeasurement (SEM) If you want to track student progress over time, it’s critical to use an assessment that provides you with accurate estimates of student achievement, that is, assessment with a high level of precision. When we refer to measures of precision, we are talking about something known as the standard error of measurement (SEM). Before we define SEM, it’s important to remember that all assessment scores are an estimate. That is, irrespective of the test being used, all observed scores include some measurement error, so we can never really know a student’s actual achievement level (their true score).
  • 99.
    Standard Error ofMeasurement (SEM) We can estimate the range in which we think a student’s true score likely falls; in general, the smaller the range, the greater the precision of the assessment. SEM, put in simple terms, is a measure of the precision of an assessment, The smaller the SEM, the more precise the measurement capacity of the instrument. Consequently, smaller standard errors translate to more sensitive measurements of student progress.
  • 100.
    Standard Error ofMeasurement (SEM) The standard error of measurement (SEM) is a measure of how much measured test scores are spread around a “true” score. The SEM is especially meaningful to a test taker because it applies to a single score and it uses the same units as the test. SEM is define as the standard deviation of errors of measurement that is associated with the test scores for a specified group of test takers. The formula:
  • 101.
    Methods of EstimatingReliability Test–Retest Method Equivalent Forms Method Split – half method Inter-rater method
  • 102.
    Test–Retest Method The sameinstrument is given twice to the same group of people. The reliability is the correlation between the scores on the two administrations. This is a measure of the stability of scores over a period of time. A major disadvantage of this method lies in the carry over effects from one testing occasion to another.
  • 103.
    Equivalent Forms Method Itis also referred to as Parallel or Alternate method Two different versions of the instrument are created. We assume both measure the same thing. The two forms, which are alternate or parallel with the same content and level of difficulty for each item, are administered to the same group of students.
  • 104.
    The forms maybe given on the same or nearly the same occasion or a time interval will elapse before the second form is given. The scores on the two administrations are correlated and the result is the estimate of the reliability of the test. A problem with this method of estimating the reliability of a test is the construction of a parallel or an alternate form of a test.
  • 105.
    Split – halfmethod A single test is given to the students. The test is then divided into two halves for scoring. The two scores for each student are correlated to obtain the estimate of reliability. The correlation coefficient of the two halves gives the reliability coefficient of one-half of the test. This is often used with dichotomous variables that are scored 0 for incorrect and 1 for correct.
  • 106.
    Whole Test Reliability= 2 x correlation between half test scores 1 + correlation between half test scores
  • 107.
    Inter-rater method Two ormore raters or scores each score the students papers. The two set of scores from each rater for all the students are correlated. The resulting correlation coefficient is known as scorer reliability or inter-rater reliability.
  • 108.
    Factors influencing reliability Testlength Group variability Difficulty of items Scoring objectivity Duration
  • 109.
    READING ASSIGNMENT Your schoolis looking for an assessment instrument to measure reading ability. They have narrowed the selection to two possibilities. Test A provides data indicating that it has high validity, but there is no information about its reliability. Test B provides data indicating that it has high reliability, but there is no information about its validity. Which test would you recommend? Why? Use practical example.
  • 110.
    COURSE TITLE: MEASUREMENT, EVALUATIONAND STATISTICS IN EDUCATION COURSE CODE: EDCR 361 PETER ESHUN (PH.D.) DEPARTMENT OF EDUCATIONAL FOUNDATIONS PETESHUN37@GMAIL.COM +233 244590189
  • 111.
    The Stages inPlanning and Constructing Classroom Tests
  • 112.
    The stages inclassroom testing According to Etsey (2001), the principal stages involved in classroom testing are: Constructing the test, Administering the test Scoring the test Analysing the test results.
  • 113.
    Constructing the Test Stepsin the construction of a good classroom test. Define the purpose of the test Determine the item format to use Determine what is to be tested Write the individual items Review the items Prepare scoring key Write directions Evaluate the test
  • 114.
    Define the purposeof the test The basic question to answer is “why am I testing?” Test items must be related to teacher’s classroom instructional objectives. Several purposes are served by classroom tests and the teacher has to be clear on the purpose of the test.
  • 115.
    Cont… This forms partof the planning stage so the teacher has to answer other questions such as “Why is the test being given at this time in the course?” Who will take the test?” Have the test takers been informed?” “How will the scores be used?”
  • 116.
    Determine the itemformat to use The choice of format must be appropriate for testing particular topics and objectives. One needs to list the objectives of the subject matter for which the test is being constructed and then the main topics covered or to be covered. The test items could either be objective, essay or performance types. It is sometimes necessary to use more than one format in a single test.
  • 117.
    Factors to considerin the choice of the appropriate format the purpose of the test the time available to prepare and score the test the number of students to be tested the skill to be tested the difficulty desired physical facilities that are available age/level of the pupils test constructor or teacher’s skill in writing the different type of items.
  • 118.
    Determine what isto be tested You need to consider what the test will actually cover. You consider the topics and the objectives/behaviours (the various levels of the domain). You need to construct your Table of specification or Test Blueprint.
  • 119.
    Table of Specification/Testblue-print Behaviours Content Knowledge Comprehensi on Application Analysis Synthesis Evaluation Total Soil Condition 1 1 1 3 Weather condition 1 1 1 3 Location of industry 2 1 1 1 1 6 Labour processing 1 1 2 Marketing 3 1 1 1 6 Total 5 4 3 5 1 2 20
  • 120.
    Some guidelines tofollow when writing the items Keep the table of specifications before you and continually refer to it as you write the items. Items must match the instructional objectives. Formulate well-defined items that are not vague, and ambiguous and should be grammatically correct and free from spelling and typing errors. Avoid excessive words. Avoid needless complex sentences. The item should be based on information that the examinee should know.
  • 121.
    Cont… Write the testitems simply and clearly. Prepare more items than you will actually need. The task to be performed and the type of answers required should be clearly defined. Include questions of varying difficulty. Write the items and the key as soon as possible after the material has been taught. Avoid textbook or stereotyped language. Write the items in advance of the test date to permit reviews and editing.
  • 122.
    Reviewing the items Criticallyexamine each item at least a week after writing the item. Items that are ambiguous and those poorly constructed as well as items that do not match the objectives must be reworded or removed. Check the length of the test (number of items) against the purpose. Check the kinds of test items used and the ability level of the students.
  • 123.
    Cont… To make thereview more effective, the test constructor or teacher could try out the test, that is, administer the prepared items to a representative sample of people similar to those who will take the complete test.
  • 124.
    Cont… The advantages oftrying out the test includes: helps to identify items that are not suitable due to ambiguity, undue complexity or misleading instructions; makes it possible for the test writer to determine the degree of difficulty of each item so that items that are too difficulty or easy will be eliminated; enables the test writer to get information on the number of items to include in the test and also determine the time limit for the final test.
  • 125.
    Prepare Scoring key Preparea scoring key or marking scheme while the items are fresh in your mind. List the correct responses and acceptable variations for objective- type tests. Assign points to the various expected qualities of responses. Assign values to each item and ensure representative sampling of content covered.
  • 126.
    Write directions Give clearand concise directions for the entire test as well as sections of the test. Clearly state the time limit for the test. Penalties for undesirable writing must be spelt out. Directions must include number of items to respond to, how the answers will be written, where the answers will be written, etc.
  • 127.
    Evaluate the test Beforeadministration, the test should be evaluated by the following five criteria: Clarity validity practicality efficiency fairness.
  • 128.
    Clarity: Who is beingtested? What material is the test measuring? What kind of knowledge is the test measuring? Do the test items relate to content and course objectives? Are the test items simple and clear?
  • 129.
    Validity: Is the testa representation sampling of the material presented in the chapter, unit, section or course? Does the test faithfully reflect the level of difficulty of material covered in the class?
  • 130.
    Practicality Will students haveenough time to complete the test? Are there sufficient materials available to present the test to complete it effectively?
  • 131.
    Efficiency: Is this thebest way to test for the desired knowledge, skill or attitude? What problems might arise due to material difficulties or shortage?
  • 132.
    Fairness: Were the studentsgiven advance notice? Have I adequately prepared students for the test? Do the students understand the testing procedures? How will the scores affect the students’ lives?
  • 133.
  • 134.
    Multiple-Choice Tests It isa type of objective test in which the respondent is given a stem that introduces a problem or asks a question, and he/she is to select from among three or more alternatives (options or responses) the one that best completes the stem. The incorrect options are called foils or distracters. 134
  • 135.
Types of multiple-choice tests
'Single best response' type and 'multiple response' type.
An example of the single best response type is:
Write 0.039387 as a decimal correct to 3 significant figures.
A. 0.394
B. 0.393
C. 0.0394
D. 0.0393
E. 0.039

Types of multiple-choice tests (cont.)
An example of the multiple response type is:
Which of the following action(s) contribute to the general principles of First Aid?
i. Arrest haemorrhage
ii. Bath the patient
iii. Immobilize injured bone
A. i only
B. ii only
C. i and ii
D. i, ii and iii
Multiple-Choice Tests: comment on the following questions
Q1. Planning is …………………
a. Deciding in the present what to do soon
b. It's the process whereby companies reconcile their resources with their finances and achievements
c. Providing a rational approach to pre-selected objectives
d. It enables companies to reconcile their resources with their goals and objectives

Multiple-Choice Tests: comment on the following questions
Q1. Where and when did the European powers partition Africa among themselves?
a. 1854 and London
b. 1889 and Berlin
c. 1884 and Berlin
d. 1884 and London
Q2. …………………….. is built directly on the hardware.
a. Computer Environment
b. Application Software
c. Database Systems
d. Operating Systems
Guidelines for constructing good multiple-choice items
The central issue of the item should be in the stem. It should be concise, easy to read and understand. A stem should ask a question or set a task. All options for a given item should be homogeneous in content, form and grammatical structure.

Cont…
Repetition of words in the options should be avoided. Specific determiners which give clues to the best/correct option should be avoided. Vary the placement of the correct options; no discernible pattern of correct/best responses should be noticeable. Items measuring opinions should not be included.

Cont…
The responses must be:
◦ parallel in form, i.e., sentences of about the same length
◦ in alphabetical or sequential order
◦ itemized vertically, not horizontally.
Each option must be distinct; overlapping alternatives should be avoided. Present a single, clearly formulated problem in the stem of the item.

Avoid using "all of the above" as an option, but "none of the above" can be used sparingly. It should be used only when an item is of the 'correct answer' type and not the 'best answer' type. Stems and options should be stated positively. However, a negative stem could be used sparingly, and the word "not" should be emphasized either by underlining it or writing it in capitals.

Create independent items. The answer to one item should not depend on knowledge of the answer to a previous item. Sentences should not be copied from textbooks or from past test items; write original items.
Guidelines for constructing True/False tests
1. A class always has a default constructor.
a. True
b. False
c. Neither True nor False
d. Both True and False
Guidelines for constructing True/False tests
Statements must be definitely true or definitely false.
Poor item: The value of 2/3 as a decimal fraction is 0.7. True or False
Good item: The value of 2/3 expressed as a decimal fraction correct to two decimal places is 0.67. True or False

Guidelines for constructing True and False tests (cont.)
Avoid words that tend to be clues to the correct answer. Words like some, most, often, many and may are usually associated with true statements; all, always, never and none are associated with false statements.

Guidelines for constructing True and False tests (cont.)
Approximately half (50%) of the total number of items should be false, because it is easier to construct statements that are true and the tendency is to have more true statements. Each statement should possess only one central theme.

Guidelines for constructing True and False tests (cont.)
State each item positively. A negative item could, however, be used with the negative word "not" emphasized by underlining or writing in capitals. Double negatives should be avoided. Statements should be short, simple and clear. Ambiguous as well as tricky statements should be avoided. Example: Kwame Ankumah was the First President of Ghana. True or False.

Guidelines for constructing True and False tests (cont.)
Arrange the items such that the correct responses do not form a discernible pattern. To avoid scoring problems, let students write the correct options in full.
    MATCHING-TYPE TESTS The matchingtype of objective test consists of two columns and these items require students to match information in the two columns. Items in the left-hand column are called premises, and those in the right-hand column are called responses. 150
  • 151.
    MATCHING-TYPE TESTS Example: Match thevitamins in Column A with the diseases and conditions which a lack of the vitamin causes in column B using an arrow line Column A: Column B Vitamins Diseases caused by lack 1. Vitamin A a. Bariberi 2. Vitamin C b. Kwashiorkor 3. Vitamin D c. Pellagra d. Poor eyesight e. Rickets f. Scurvy 151
  • 152.
Guidelines for constructing matching-type tests
State clearly what each column represents. Provide complete directions; instructions should clearly show what the rules are and how to respond to the items. Do not use perfect matching: use a larger, or smaller, number of responses than premises, and permit responses to be used more than once.

Guidelines for constructing matching-type tests (cont.)
Arrange premises and responses alphabetically or sequentially. Column A (the premises) should contain the longer phrases; the shorter items should constitute the responses. Limit the number of items in each set. Use homogeneous options and items. All options must be placed (and typed) on the same page.
Guidelines for constructing good classroom essay tests
Plan the test. Give adequate time and thought to the preparation of the test items. The test items must be constructed from a test specification table and well in advance (at least two weeks) of the testing date. The items should be based on novel situations and problems. Be original; do not copy directly from textbooks or past test items.

Cont…
Test items should require the students to show adequate command of essential knowledge. The items should not measure rote memorization of facts, definitions and theorems but must be restricted to measuring higher mental processes such as application, analysis, synthesis and evaluation.

Cont…
The length of the response and the difficulty level of items should be adapted to the maturity level of the students (age and educational level). Optional items should not be provided when the content is relevant; they may be necessary only for large external examinations and when the purpose of the test is to measure writing effectiveness. If students answer different questions, an analysis of performance on the test items is difficult.

Cont…
Prepare a scoring key (marking scheme) at the time the item is prepared. Decide in advance what factors will be considered in evaluating an essay response. Determine the points to be included and the weights to be assigned to each point. The preparation of a model answer will help disclose ambiguities in an item.

Cont…
Establish a framework and specify the limits of the problem so that the student knows exactly what to do. Present the student with a problem which is carefully worded so that only ONE interpretation is possible. The questions/items must not be ambiguous or vague.

Cont…
Indicate the value of the question and the time to be spent in answering it. Structure the test item such that it will elicit the type of behaviours you really want to measure. The test items must be based on the instructional objectives for each content unit.
    Cont… Give preference toa large number of items that require brief answers. These provide a broader sampling of subject content and thus better than a few items that require extended responses. Start essay test items with words that are clear and as simple as possible and which requires the student to respond to the stimulus expected. Avoid words such as: what, list, who, as much as possible.
  • 161.
    Administering Test When teachersadminister tests and quizzes, they want to create conditions that optimize students’ performance to ensure that assessment results accurately reflect what students know and can do. If one construct a very good test and its administration is not done well, one will not get the desired results.
  • 162.
    Cont… Test administration isconcerned with the physical and psychological setting in which pupils take test.
  • 163.
Prepare students for the test
The following information is essential to students' maximum performance: when the test will be given (date and time); under what conditions it will be given (timed or take-home, number of items, open book or closed book, place of test); the content areas it will cover (study questions or a list of learning targets).

Cont…
Emphasis or weighting of content areas (value in points). The kinds of items on the test (objective-type or essay-type). How the assessment will be scored and graded. The importance of the results of the test.
    Cont… Students must bemade aware of the rules and regulations covering the conduct of the test. Penalties for malpractice such as cheating should be clearly spelt out and clearly adhered to. Avoid giving tests immediately before or after a long vacation, holidays or other important events where all students are actively involved physically or psychologically/emotionally.
  • 166.
    Cont… Announcements must bemade about the time at regular intervals. Time left for the completion of the test should be written on the board where practicable. Invigilators are expected to stand at a point where they could view all students. They should once a while move among the pupils to check on malpractices. Such movements should not disturb the pupils. He/she must be vigilant. Reading novels, newspapers, grading papers are not allowed.
  • 167.
    Cont… Threatening behaviours shouldbe avoided by the invigilators. Speeches like ‘If you don’t write fast, you will fail’ are threatening. Pupils should be made to feel at ease. Do not talk unnecessarily before letting students start working. Remarks should be kept to a minimum and related to the test.
  • 168.
    Cont… Avoid giving hintsto students who ask about individual items. Where an item is ambiguous, it should be clarified for the entire group. Expect and prepare for emergencies. Emergencies might include shortages of answer booklets, question papers, power outages, illness etc.
  • 169.
Scoring the test
Teachers typically use two types of scoring procedures to evaluate the quality of students' responses to essay questions and products. These are the analytic (also called scoring key, point or trait) method and the holistic (also called global, sorting or rating) method.
Analytic Scoring Rubrics
The analytic scoring rubric requires the scorer to develop an outline or a list of the major elements that students should include in the ideal answer. The scorer also needs to decide on the number of points to award when students include each element.
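As an illustration, a minimal Python sketch of analytic scoring, assuming a hypothetical essay question; the rubric elements and point weights are invented:

```python
# A minimal sketch of analytic scoring: the rubric lists the major
# elements of the ideal answer and the points awarded for each.
# Element names and weights below are hypothetical.
rubric = {
    "defines assessment": 2,
    "states two purposes": 3,
    "gives a classroom example": 2,
}

def analytic_score(elements_present: set[str]) -> int:
    """Award points only for the rubric elements found in the response."""
    return sum(points for element, points in rubric.items()
               if element in elements_present)

# The scorer checks off the elements found in one student's essay:
print(analytic_score({"defines assessment", "gives a classroom example"}))  # -> 4
```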
Cont… Holistic Scoring Rubrics
The holistic scoring rubric requires the scorer to make a judgment about the overall quality of each student's response. Its purpose is to sort students' responses into categories that indicate quality. A very important point in deciding on the categories is to be sure they correspond to your school's grading system.
The procedure for holistic scoring consists of the following seven steps:
1. Establish the scoring categories you will use.
2. Characterize a response that fits each category.
3. Read each response rapidly and form an overall impression.
4. Sort the responses into the designated categories.
5. Reread the papers that have been placed within a category.
6. Move any clearly superior or inferior responses to other categories.
7. Assign the same numerical score to all responses within a category.
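A minimal sketch of the final, mechanical steps of this procedure; the category labels, score values and student names are all hypothetical:

```python
# A minimal sketch of holistic scoring: every response sorted into a
# category receives that category's single numerical score.
# Category labels, scores and names below are hypothetical.
category_scores = {"excellent": 4, "good": 3, "fair": 2, "poor": 1}

# After sorting and rereading (steps 4-6), each paper sits in one category:
sorted_papers = {
    "excellent": ["ama"],
    "good": ["kofi", "esi"],
    "fair": ["yaw"],
    "poor": [],
}

# Assign the same score to all responses within a category (step 7):
final_scores = {student: category_scores[category]
                for category, students in sorted_papers.items()
                for student in students}
print(final_scores)  # -> {'ama': 4, 'kofi': 3, 'esi': 3, 'yaw': 2}
```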
COURSE TITLE: MEASUREMENT, EVALUATION AND STATISTICS IN EDUCATION
COURSE CODE: EDCR 361
PETER ESHUN, PHD
DEPARTMENT OF EDUCATIONAL FOUNDATIONS
PETESHUN37@GMAIL.COM
+233 244 590 189
Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency are sometimes called measures of central location. The mean, median and mode are all valid measures of central tendency, but under different conditions.
The Mean
The mean is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

Cont…
An important property of the mean is that it includes every value in your data set as part of the calculation. The mean is the only measure of central tendency for which the sum of the deviations of each value from the mean is always zero.
When not to use the mean: the mean has one main disadvantage, namely that it is particularly susceptible to the influence of outliers (values that are unusually small or large compared to the rest of the data set).
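A minimal Python sketch, using a small invented set of scores, illustrating the definition, the zero-sum-of-deviations property, and the effect of an outlier:

```python
# A minimal sketch using a small, invented set of test scores.
scores = [31, 20, 45, 65, 48, 67, 78, 63]

mean = sum(scores) / len(scores)        # sum of values / number of values
print(mean)                             # -> 52.125

# Property: deviations from the mean always sum to zero.
print(sum(x - mean for x in scores))    # -> 0.0 (up to rounding)

# Sensitivity to outliers: one extreme score pulls the mean upward.
print(sum(scores + [500]) / 9)          # -> about 101.9
```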
The Median
The median is the score such that approximately one-half (50%) of the scores are above it and one-half (50%) are below it when the scores are arranged sequentially. The median is the centermost score if the number of scores is odd. If the number of scores is even, the median is taken as the average of the two centermost scores.
To find the median:
1. Arrange all observations in order of size, from smallest to largest or vice versa.
2. If the number of observations, n, is odd, the median is the centre observation, at position (n+1)/2.
3. If the number of observations, n, is even, the median is the mean of the two centre observations, at positions n/2 and (n/2)+1 (the (n+1)/2 position falls halfway between them).
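A minimal worked sketch of both the odd and even cases, with invented scores:

```python
# A minimal sketch of the two cases, with invented scores.
odd = sorted([17, 25, 42, 24, 35])      # n = 5 (odd): [17, 24, 25, 35, 42]
print(odd[(len(odd) + 1) // 2 - 1])     # position (n+1)/2 = 3rd -> 25

even = sorted([17, 25, 42, 24, 35, 50]) # n = 6 (even)
mid = len(even) // 2
print((even[mid - 1] + even[mid]) / 2)  # mean of 3rd and 4th -> 30.0
```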
Properties of the median
It is less sensitive than the mean to extreme scores. It does not use all the scores in a distribution but uses only one value. It has limited use for further statistical work. It can be used when there is incomplete data.

Uses of the median
It is used as the most appropriate measure of location when there is reason to believe that the distribution is skewed. It is used as the most appropriate measure of location when there are extreme scores that would affect the mean. It is useful when the exact midpoint of the distribution is wanted. It provides a standard of performance when compared with the mean; it can be compared with the mean to determine the direction of student performance.
The Mode
The mode is defined as the most frequent score in the distribution. The mode is not calculated; rather, it is simply reported once the various frequencies within a distribution of data are known. The main advantage of the mode is that it is the only measure that is useful for nominal-scale data.
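A minimal sketch showing that the mode is read off from the frequencies rather than calculated, using invented nominal data:

```python
from collections import Counter

# A minimal sketch with invented data: the mode is simply the most
# frequent value, so it works even for nominal (non-numeric) data.
grades = ["B", "A", "B", "C", "B", "A"]
frequencies = Counter(grades)
print(frequencies)                       # Counter({'B': 3, 'A': 2, 'C': 1})
print(frequencies.most_common(1)[0][0])  # -> 'B', the mode
```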
Cont…
Properties of the mode: it can be used when there is incomplete data; it is not sensitive to extreme scores.
Uses of the mode: it is used when a rough estimate of the measure of location is needed; it is used when the most frequently occurring value is needed.
Measures of Variability
Variability refers to the degree to which sample or population observations differ or deviate from the distribution's measures of central tendency. Measures of variability are also called measures of dispersion or scatter. The main measures of variability are the range, variance, standard deviation and quartile deviation.
The Range
The range may be defined as the spread from the lowest score to the highest score in a distribution. The actual statistical formula is the highest score in a group minus the lowest score. The range is easy to calculate but gives only a relatively crude measure of dispersion, because it really measures the spread of the extreme scores and not the spread of any of the scores in between.

Cont…
Range = Xhigh – Xlow. Consider the set of scores: 31, 20, 45, 65, 48, 67, 78, 63. Range = 78 – 20 = 58.
Uses of the range: when data is too scanty or too scattered to justify the computation of a more precise measure; when knowledge of the extreme scores or the total spread is all that is needed.
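A one-line check of the worked example:

```python
# Checking the worked example: range = highest score - lowest score.
scores = [31, 20, 45, 65, 48, 67, 78, 63]
print(max(scores) - min(scores))  # -> 58
```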
The Variance
The variance can be defined as the mean of the squares of the deviations from the mean of the distribution; it is the mean square deviation. It is a measure of dispersion in which every score in the distribution is taken into account in its computation.
The steps in computing the variance are:
• compute the mean of the scores, x̄
• find each deviation score by subtracting the mean from each score: (x − x̄)
• square each of the resulting deviations: (x − x̄)²
• sum the squared deviations: Σ(x − x̄)²
• divide the sum of the squared deviations by N, the number of scores: σ² = Σ(x − x̄)² / N
The Standard Deviation
The standard deviation is the square root of the average of the squared deviations from the mean; that is, the standard deviation is the square root of the variance. It enables one to make an exact determination of the distances of scores from the mean. The mean and the standard deviation, when computed for a specific set of test scores, enable a teacher/counselor to determine how well an individual performed in relation to the group.
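A minimal sketch that follows the five steps listed above on the same invented scores and then takes the square root to obtain the standard deviation:

```python
import math

# A minimal sketch following the variance steps, on the invented scores
# used in the range example above.
scores = [31, 20, 45, 65, 48, 67, 78, 63]

mean = sum(scores) / len(scores)                   # step 1: the mean
squared_devs = [(x - mean) ** 2 for x in scores]   # steps 2-3: (x - mean)^2
variance = sum(squared_devs) / len(scores)         # steps 4-5: divide by N
std_dev = math.sqrt(variance)                      # square root of variance

print(round(variance, 2), round(std_dev, 2))       # -> 340.11 18.44
```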
Uses of the standard deviation
It helps to find out the variation in achievement among a group of students. Generally, for scores of 1-50, an SD of 5 or below is relatively small, and for scores of 51-100, an SD of 10 or below is relatively small. With this information, the teacher can adapt the teaching method to suit each group. It is helpful in computing other statistics, e.g. standard scores and correlation coefficients. It is useful in determining the reliability of a test: the split-half correlation method and internal consistency methods use the standard deviation of the scores.
Quartile Deviation (QD)
It is also called the semi-interquartile range, and it depends on quartiles. Quartiles divide a distribution into 4 equal parts; practically, there are 3 quartiles. The QD is half the distance between the first quartile (Q1) and the third quartile (Q3).
Consider the following scores: 21, 32, 45, 17, 25, 42, 24, 35, 50, 19, 27, 44, 33, 51, 26. To calculate the quartile deviation, first arrange the scores sequentially:
17, 19, 21, 24, 25, 26, 27, 32, 33, 35, 42, 44, 45, 50, 51
Calculate Q1 = ¼(15+1) position = 4th position. From the array, Q1 = 24.
Calculate Q3 = ¾(15+1) position = 12th position. From the array, Q3 = 44.
QD = (44 – 24)/2 = 10.
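A minimal sketch reproducing the worked example with the (n+1)-position rule used above; note that statistical packages may apply slightly different quartile conventions:

```python
# Reproducing the worked example with the (n+1)-position rule used above.
scores = sorted([21, 32, 45, 17, 25, 42, 24, 35, 50, 19, 27, 44, 33, 51, 26])

n = len(scores)                    # 15 scores
q1 = scores[(n + 1) // 4 - 1]      # 4th position  -> 24
q3 = scores[3 * (n + 1) // 4 - 1]  # 12th position -> 44
print((q3 - q1) / 2)               # quartile deviation -> 10.0
```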
Properties of the quartile deviation
◦ It is a measure of individual differences.
◦ It does not use all the information provided by the scores.
◦ For skewed distributions, where the median is used as the measure of location, the quartile deviation is a better measure of variability.