CRMEF Inezgane
The requirements of a good test
Presented by: Hassnaa Idir and Maryam Errami
Trainer: Mr. Ayad Chraa
Outline
 Reliability
 Validity
 Practicality
 Authenticity
 Washback
Reliability
 Test reliability refers to the consistency and dependability
of a test in measuring what it is intended to measure. A
reliable test yields stable results across different
administrations, ensuring that variations in scores are
minimized. Several factors can lead to inconsistent
results, including unclear test items, student distractions,
and scoring biases.
Types of reliability
Test-Retest Reliability
 Test-retest reliability refers to the consistency of a test's results
when administered to the same group of individuals at different
points in time. It is assessed by comparing the scores from the
initial test to those from a subsequent administration, typically
after a short interval. A high correlation between the two sets
of scores indicates strong test-retest reliability.​
 For example, administering an IQ test to a group of participants
and then re-administering the same test to the same group
after a month can help determine the test's reliability over
time. A correlation coefficient close to 1.0 would suggest
excellent reliability.
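The correlation described above can be computed directly. The sketch below uses a hand-rolled Pearson coefficient (no external libraries) on invented scores for five hypothetical participants tested a month apart; all names and numbers are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical IQ scores: first administration vs. one month later
first  = [102, 115, 98, 124, 109]
second = [104, 113, 99, 126, 108]
print(round(pearson(first, second), 3))
```

A coefficient near 1.0, as here, would indicate strong test-retest reliability; a value near 0 would mean the two administrations rank students very differently.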
Types of reliability
Inter-rater reliability
Inter-rater reliability is a measure of the degree to which different raters or examiners provide consistent assessments or scores when evaluating the same test items or responses.
For example, two judges independently rate a student's essay on a scale of 1 to 5. If both assign a score of 4, their inter-rater reliability is high.
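The simplest way to quantify the two-judge agreement described above is the proportion of items on which the raters give identical scores. The sketch below uses invented essay ratings; for a chance-corrected measure one would typically report Cohen's kappa instead.

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters assign identical scores."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical essay scores (scale 1-5) from two independent judges
judge1 = [4, 3, 5, 2, 4, 4]
judge2 = [4, 3, 4, 2, 4, 5]
print(percent_agreement(judge1, judge2))  # 4 of the 6 scores match
```

A value close to 1.0 indicates high inter-rater reliability; large disagreements suggest the scoring rubric needs clearer criteria or rater training.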
Validity
 Validity refers to whether a test measures what it is supposed to measure. Validity is also related to how well a test matches your intended purposes and uses.
Types of validity
 Content validity refers to the extent to which a test comprehensively covers the entire range of the construct it aims to measure.
For example: an English literature final exam includes poetry, drama, and fiction, all covered in class.
Types of validity
Construct validity
Construct validity refers to the degree to which a test or instrument measures the theoretical construct it is intended to measure.
For example: a school creates a reading comprehension test to assess students' ability to understand written passages. The test asks students to read a short story and then answer questions about its themes, characters, and details.
 To establish construct validity, the test developers ensure that the test measures reading comprehension and not other skills like vocabulary knowledge or reading speed. They check that the test correlates strongly with other established reading comprehension measures but only weakly with measures of unrelated skills.
Types of validity
Criterion validity
Criterion validity evaluates how effectively a test predicts an individual's performance on an external criterion, such as future behavior or outcomes. For example: a university administers a college entrance exam to incoming students to assess their readiness for college-level coursework. To establish criterion validity, the university compares students' scores on the entrance exam with their first-year Grade Point Average (GPA).
• If students with higher entrance exam scores tend to have higher GPAs, this indicates that the entrance exam has predictive validity: it accurately predicts future academic performance.
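The exam-score/GPA comparison can be sketched as a correlation plus a least-squares prediction line. Every number below is invented for illustration (six hypothetical students); a real validation study would use a much larger sample.

```python
from math import sqrt

# Hypothetical entrance-exam scores and first-year GPAs
exam = [520, 600, 450, 680, 550, 630]
gpa  = [2.9, 3.2, 2.5, 3.6, 2.8, 3.5]

n = len(exam)
mx, my = sum(exam) / n, sum(gpa) / n
cov = sum((x - mx) * (y - my) for x, y in zip(exam, gpa))
var_x = sum((x - mx) ** 2 for x in exam)
var_y = sum((y - my) ** 2 for y in gpa)

r = cov / sqrt(var_x * var_y)      # predictive-validity coefficient
slope = cov / var_x                # least-squares slope
intercept = my - slope * mx
print(round(r, 2))                         # strength of the exam-GPA link
print(round(intercept + slope * 580, 2))   # predicted GPA for a score of 580
```

A high r means the exam score carries real information about later performance; the fitted line then lets admissions staff translate a score into an expected GPA.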
Practicality
 H. Douglas Brown defines practicality in language assessment as follows:
"An effective test is practical. This means that it:
• Is not excessively expensive.
• Stays within appropriate time constraints.
• Is relatively easy to administer.
• Has a scoring/evaluation procedure that is specific and time-efficient."
Source: Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. Pearson Education.
Authenticity
Arthur Hughes defines authenticity in language testing as:
"the degree to which test tasks and materials replicate real-life language use."
 He emphasizes that incorporating authentic tasks in assessments enhances their relevance and effectiveness, particularly by aligning them with the types of language learners will encounter outside the classroom.
Source: Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge University Press.
Washback
Arthur Hughes defines washback as the effect a test has on teaching and learning.
Positive washback occurs when a test encourages good teaching and useful learning.
Negative washback occurs when a test causes teachers and students to focus only on test content and ignore real communication.
Source: Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge University Press.
Match each scenario with the right quality: washback, practicality, validity, authenticity, or reliability
 1. A reading test asks students to answer questions based on newspaper articles and blog posts instead of artificial texts.
 2. Every time the same group of students takes a listening test, they get nearly the same scores.
 3. A writing test is supposed to measure writing skills, but it doesn't involve any writing tasks at all.
 4. A teacher designs a test that is easy to mark, low-cost, and can be done within a single class period.
 5. After taking a speaking test that focused on real-life conversation skills, students feel motivated to practice speaking more in class.
