2. OVERVIEW
Introduction
1. Roles Of The Stakeholders
2. Roles Of The Test-takers
3. Power Relations In Testing
Test Fairness
1. Test Sensitivity Review
2. Test Bias
Fairness and validity
1. Fairness independent of validity
2. Fairness subsumes validity
3. Fairness and validity are overlapping
4. Fairness as an important aspect of validity
Xi's framework
Kunnan's Test Context Framework
3. INTRODUCTION
• There are many definitions of fairness.
• Your-dictionary.com (2011) states that to be fair is to be “just and honest,” “impartial,” and “unprejudiced,”
specifically, “free from discrimination based on race, religion, sex, etc.”
• Kunnan (2000) argues that fairness embraces three concerns: validity, accessibility of tests to test takers, and
justice.
4. INTRODUCTION
Spaan (2000) defines fairness as an ideal “in which opportunities are equal.”
“In the natural world, test writers and developers cannot be ‘fair’ in the ideal sense, but … they can try to
be equitable.” This equitability is “the joint responsibility of the test developer, the test user, and the examinee,
in a sort of social contract.”
Fairness can be seen as a system or process – as distinct from a quality.
5. INTRODUCTION
What most language testing views of fairness have in common is a desire to avoid the effects of any construct-
irrelevant factors on the entire testing process, from the test-design stage through post-administration decision-
making.
In this context, one dimension of fairness concerns the roles of the major stakeholders in achieving language
testing fairness. These stakeholders, to use Spaan’s (2000) tripartite “social contract” scheme, are:
test developers;
L2/FL learners;
and test users (i.e., teachers, administrators, etc.).
6. INTRODUCTION
Role Of Test Developers
❑They must try to ensure the validity, reliability, and practicality of their test methods;
❑They must also provide test users with easily understandable guidelines for the use of their tests;
❑They must also solicit feedback from test users, to effect further test improvements.
7. INTRODUCTION
Roles Of The Test Takers
❑They must become familiar with the testing format and overall test content before taking the test;
❑They must try to make sure that the level of the test matches their own skill/knowledge level.
Roles Of The Test Administrators
❑They must give tests to students for whom the tests were designed; to do otherwise would be an instance of
test abuse.
8. INTRODUCTION
Power Relations In Testing
Viewing test developer, test taker, and test user as parties to a social contract highlights a second issue, namely
the phenomenon of power relations in testing.
Shohamy (2001) points out that language tests have often been used by powerful agencies such as
governments, educational bureaucracies, or school staff for reasons other than the assessment of language
skills. For example, tests have been used to establish discipline, to impose sanctions on schools or teachers, or
to raise the prestige of the subject matter being tested.
9. INTRODUCTION
Power Relations In Testing (continued)
Shohamy (2001) offers a set of principles organized under the heading of critical language testing, as a way to
engage language testers “in a wider sphere of social dialogue and debate about … the roles tests … have been
assigned to play in society.”
❑Critical language testing “encourages test takers to develop a critical view of tests as well as to act on it by
questioning tests and critiquing the value which is inherent in them.”
A critical language testing perspective asks that all parties to the contract remain vigilant.
10. Test Fairness
Test Sensitivity Review
One approach to examining whether or not test questions are fair is through a test sensitivity review.
Such reviews are performed by trained judges employed by test-development organizations, who examine test
tasks to determine whether they contain language or content that may be considered stereotyping, patronizing,
inflammatory, or otherwise offensive to test takers belonging to subgroups defined by culture, ethnicity, or
gender.
11. Test Fairness
Test Bias
Test bias is a technical term indicating a testing situation in which a particular test use results in different
interpretations of test scores received by cultural, ethnic, gender, or linguistic subgroups.
It is treated here as synonymous with differential item functioning (DIF).
Bias or DIF is considered to be present when a test item is differentially difficult for an ethnic, cultural, or
gender-related subgroup which is otherwise equally matched with another subgroup in terms of knowledge or
skill.
Among the statistical methods used to uncover DIF are:
the Standardization Procedure;
and the Mantel–Haenszel method.
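The two methods named above can be sketched in a few lines of Python. This is a minimal illustration, not an operational DIF analysis: it assumes dichotomously scored items, a reference group and a focal group already matched on total score, and the commonly used formulas (the Mantel–Haenszel common odds ratio, the MH D-DIF transformation of -2.35 times its natural log, and a standardized p-difference weighted by focal-group counts). The function names and the data in demo_records are hypothetical.

```python
import math
from collections import defaultdict


def _tabulate(records):
    """Group item responses by matching score.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1.
    Returns {score: {"ref": [n_correct, n_total], "focal": [n_correct, n_total]}}.
    """
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        cell = strata[score][group]
        cell[0] += correct
        cell[1] += 1
    return strata


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across score levels; values above 1 favor the reference group.

    Raises ZeroDivisionError if the denominator sum is zero (e.g. very sparse data).
    """
    num = 0.0
    den = 0.0
    for cells in _tabulate(records).values():
        a, n_ref = cells["ref"]    # reference correct, reference total
        c, n_foc = cells["focal"]  # focal correct, focal total
        b = n_ref - a              # reference incorrect
        d = n_foc - c              # focal incorrect
        n = n_ref + n_foc
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(odds_ratio):
    """Odds ratio on the delta scale; negative values mean the item is harder for the focal group."""
    return -2.35 * math.log(odds_ratio)


def std_p_dif(records):
    """Standardization Procedure: focal-weighted mean of (focal p-value minus reference p-value)."""
    weighted = 0.0
    total_weight = 0
    for cells in _tabulate(records).values():
        a, n_ref = cells["ref"]
        c, n_foc = cells["focal"]
        if n_ref == 0 or n_foc == 0:
            continue  # a score level with only one group cannot be compared
        weighted += n_foc * (c / n_foc - a / n_ref)
        total_weight += n_foc
    return weighted / total_weight


def demo_records():
    """Hypothetical data: two score levels, 10 test takers per group at each level."""
    counts = {1: {"ref": 6, "focal": 4}, 2: {"ref": 8, "focal": 6}}
    records = []
    for score, by_group in counts.items():
        for group, n_correct in by_group.items():
            records += [(group, score, 1)] * n_correct
            records += [(group, score, 0)] * (10 - n_correct)
    return records
```

With the hypothetical data, both indices point the same way: the odds ratio is above 1 and the standardized difference is negative, i.e., the item is harder for the focal group than for matched reference-group test takers. In practice, such statistics are usually read alongside a significance test and a sensitivity review rather than on their own.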
12. Fairness And Validity
Kane (2010) points out that the relationship between fairness and validity depends on how the two concepts
are defined: narrowly defining validity and broadly defining fairness will result in validity being considered a
component of fairness. On the other hand, a broad definition of validity and a narrow conceptualization of
fairness will result in fairness being understood as a part of validity.
13. Fairness And Validity
1. Fairness Independent Of Validity
One example of this view is the 1999 Standards for Educational and Psychological Testing, which define fairness
as having, at minimum, three components: lack of item bias, equitable treatment of all test takers in the
testing process, and equity of opportunity for examinees to learn the material on a given test. While fairness
here is not linked directly with validity, the Standards do mention that fairness “promotes the validity
and reliability of inferences made from test performance”.
14. Fairness And Validity
2. Fairness Subsumes Validity
Kunnan (2000) articulates a framework in which fairness includes issues of validity, accessibility to test takers,
and justice. Under validity, Kunnan includes issues such as construct validity, DIF, insensitive test-item language,
and content bias. An example of the latter might be a dialect of English employed in the test prompts that differs
in some respects from another English dialect that may constitute the L2 of the test taker. Under accessibility,
Kunnan indicates issues such as affordability, geographic proximity of test taker to the testing site,
accommodations for test takers with disabilities, and opportunity to learn. “Opportunity to learn” is closely
connected with the notion of construct under-representation (Messick, 1989), which describes a test that
measures some aspects of a construct or skill, but not others.
15. Fairness And Validity
2. Fairness Subsumes Validity (continued)
A test may be measuring aspects of a construct, such as knowledge of a particular rule of language pragmatics,
that certain test takers will not have had the opportunity to learn. These test takers may thus score poorly on
the test, despite the fact that they may be proficient in other relevant areas of the construct. Finally, Kunnan’s
facet of justice embraces the notion of whether or not a test contributes to social equity. Kunnan (2004) later
modified this model to include absence of bias and test-administration conditions.
16. Fairness And Validity
3. Fairness And Validity Are Overlapping
Kane’s definition of test fairness draws on political and legal concepts. One, procedural due process, states that the
same rules should be applied to everyone in more or less the same way.
Kane also bases his definition on substantive due process, which states that the procedures applied should be
reasonable both in general and in the context in which they are applied. In applying this twin definition of fairness to
assessment, he gives two principles: the first is procedural fairness, in which test takers are treated “in essentially
the same way…take the same test or equivalent tests, under the same conditions or equivalent conditions, and …
their performances [are] evaluated using the same (or essentially the same) rules and procedures.” The second is
substantive fairness, in which the score interpretation and any test-based decision rule are reasonable and
appropriate for all test takers.
17. Fairness And Validity
4. Fairness As An Important Aspect Of Validity
This view is proposed by Willingham and Cole.
In this context, a fair test is one for which both the (preferably, small) extent of statistical error of
measurement, and the inferences (hopefully, reasonable ones) from the test results regarding test-taker ability,
are comparable from individual to individual and from subgroup to subgroup.
They state that comparable validity must be met at all stages of the testing process – when designing the test,
developing the test, administering the test, and using the test results. Comparable validity must be achieved by
selecting test material that does not give an advantage to some test takers for reasons that are irrelevant to the
construct being measured.
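One way to make "comparable error of measurement from subgroup to subgroup" concrete is to estimate the standard error of measurement (SEM) separately for each subgroup and compare the values. The sketch below is an illustration only and is not drawn from Willingham and Cole: it assumes reliability is estimated with Cronbach's alpha and uses the classical formula SEM = SD × sqrt(1 − reliability). The function names and input layout are hypothetical.

```python
import math
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-test-taker lists of item scores."""
    k = len(item_scores[0])  # number of items
    item_vars = [statistics.pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(item_scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability)."""
    totals = [sum(row) for row in item_scores]
    sd = statistics.pstdev(totals)
    return sd * math.sqrt(1 - cronbach_alpha(item_scores))


def sem_by_subgroup(scores_by_group):
    """Map each subgroup label to its SEM, so the values can be compared side by side."""
    return {
        group: standard_error_of_measurement(rows)
        for group, rows in scores_by_group.items()
    }
```

Markedly different SEM values across subgroups would suggest that scores are not equally precise for all test takers. How large a difference matters is a judgment the analyst has to justify; the sketch does not settle it.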
18. Fairness And Validity
4. Fairness As An Important Aspect Of Validity (continued)
Willingham and Cole see fairness as having three qualities linked to validity:
(1) comparable opportunities for test takers to show their knowledge and skills;
(2) comparable test tasks and scores;
(3) comparable treatment of test takers in test interpretation and use.
Xi both expands the definition of test fairness and offers a new framework for investigating fairness issues.
19. Xi's Framework
Xi states that fairness is “comparable validity for identifiable and relevant groups across all stages of
assessment, from assessment conceptualization to use of assessment results,” where “construct-irrelevant
factors, construct under-representation, inconsistent test administration practices, inappropriate decision-
making procedures or use of test results have no systematic or appreciable effects on test scores, test score
interpretations, score-based decisions and consequences for all relevant groups of examinees”.
20. Xi's Framework
Xi’s framework consists of a fairness argument embedded within a validity argument. A validity argument is a chain
of inferences that leads a test user to appropriate interpretations of test results. Xi’s validity argument framework
consists of six successive sub-arguments, each claiming that there is evidence that:
(1) the domain of L2 use which is of interest provides a meaningful basis for our observations of test-taker
performance on the test;
(2) the observed test scores reflect that domain of L2 use and not construct-irrelevant factors;
(3) the observed scores on the test are generalizable over similar language tasks on other, similar tests;
(4) the abovementioned generalization of observed scores can be linked to a theoretical interpretation (i.e., the
construct, the theoretical skill) of such scores;
(5) the theoretical construct can explain the L2 use in actual situations envisioned by the users of the test;
(6) the language-test results are “relevant, useful, and sufficient” for determining the level of L2 ability.
Each of these sub-arguments is supported by certain assumptions.
21. Xi's Framework
Embedded in the above chain of sub-arguments and underlying assumptions, Xi proposes, can be a fairness
argument, which consists in part of a series of rebuttals, one or more posed to each of the validity
sub-arguments. One can conceive of such rebuttals as research questions into the degree of fairness of a given
language test, i.e., each rebuttal serves as a practical check on the claims of its sub-argument. For example,
in response to the first sub-argument, one can ask whether or not the domain of L2 use actually provides a
meaningful basis for observations of test-taker performance.
22. Kunnan’s Test Context Framework
This approach is intended to consider the wider political, educational, cultural, social, economic, legal, and historical
aspects of a test. In this it differs somewhat from other, more psychometrically focused approaches considered above,
such as DIF, or even Xi’s fairness argument framework. It has a certain overlap with Kane’s (2010) applications of
political and legal concepts to language testing, and it also resonates with the analyses of power relations in language
testing offered by Shohamy (2001), both mentioned above.
Kunnan’s approach thus brings wider social factors into consideration when evaluating the fairness of a language test.