This document discusses key principles of second language assessment: validity, reliability, practicality, equivalency, authenticity, and washback. It provides definitions and explanations of each principle. For example, it states that a test is valid if it accurately reflects the test-taker's ability in a particular area. The document also discusses different types of evaluation (e.g. formative and summative) and research techniques (e.g. observational and experimental studies).
2. PRINCIPLES OF SECOND LANGUAGE ASSESSMENT
Fundamental principles for evaluating and designing second language assessments include validity, reliability, practicality, equivalency, authenticity, and washback.
4. VALIDITY
A test is considered valid when it reflects the test-takers' ability in a particular area and measures nothing else.
5. RELIABILITY
A test is considered reliable if it yields similar results when administered on different occasions.
Brown and Abeywickrama (2010, p. 27) suggested the
following ways to ensure that a test is reliable:
It is consistent in its conditions across two or more
administrations.
It gives clear directions for scoring or evaluation.
It has uniform rubrics for scoring or evaluation.
It lends itself to consistent application of those rubrics
by the rater.
It contains items or tasks that are unambiguous to the
test-takers.
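The idea of "similar results on different occasions" can be estimated numerically as the correlation between scores from two administrations of the same test. A minimal sketch in Python; the scores below are invented for illustration and are not drawn from Brown and Abeywickrama:

```python
# Test-retest reliability: correlate the scores the same group of
# test-takers obtained on two administrations of the same test.
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for five test-takers on two occasions.
first_admin = [78, 85, 62, 90, 71]
second_admin = [80, 83, 65, 92, 69]

r = pearson_r(first_admin, second_admin)
print(f"test-retest reliability r = {r:.2f}")
```

A coefficient near 1 indicates that test-takers were ranked consistently across the two administrations, which is one common operationalization of reliability.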
6. PRACTICALITY
Practicality refers to the logistical, practical, and
administrative issues involved in the process of
constructing, administering, and rating an assessment
instrument (Brown & Abeywickrama, 2010).
Bachman and Palmer (1996, p. 36) defined practicality
as “the relationship between the resources that will be
required in the design, development, and use of the
test and the resources that will be available for testing
activities.”
7. EQUIVALENCY AND AUTHENTICITY
“An assessment has the property of
equivalency if it is directly based on
curriculum standards or instructional
activities. Specifically, equivalency
determines in what ways
assessment design is influenced by
teaching” (Mihai, 2010, p. 45).
8. WASHBACK
Washback has also been called backwash, test impact, measurement-driven instruction, curriculum alignment, and test feedback (Brown & Hudson, 1998).
Washback is the effect of testing and assessment on the language teaching curriculum related to it; the term refers to the influence that a test has on teaching and learning (Hughes, 2003).
9. LYLE F. BACHMAN (1990, p. 18)
‘Measurement’ in the social sciences is the process of quantifying characteristics according to explicit procedures and rules.
Quantification: assigning numbers, including non-numerical categories or rankings such as grades (A, B, C, etc.), as opposed to verbal accounts or non-verbal, visual representations.
Characteristics: attributes such as aptitude, intelligence, motivation, field dependence/independence, attitude, and native language.
Explicit rules and procedures: applying reliable quantification methods.
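Bachman's requirement that quantification follow explicit rules can be illustrated with a short sketch; the grade-to-number mapping below is a made-up example, not Bachman's own:

```python
# Quantification by explicit rule: map a non-numerical ranking
# (letter grades) onto numbers in one fixed, documented way,
# so every rater produces the same numbers for the same grades.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def quantify(grades):
    """Apply the same explicit rule to every observation."""
    return [GRADE_POINTS[g] for g in grades]

print(quantify(["A", "C", "B"]))
```

The point is not the particular mapping but that it is stated in advance and applied uniformly, which is what makes the resulting numbers measurement rather than impression.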
10. EVALUATION
Evaluation is an activity through which human behaviors, actions, and happenings of the world are identified, perceived, and realized. It is the activity that provides valid judgments and conclusions about day-to-day events.
A test is a part of the evaluation process, but not the whole of it.
An evaluation process is complete only when the tests are rightly interpreted, with their pros and cons.
11. TESTING AND EVALUATION IN
CURRICULUM DOMAIN
Tests do not always follow evaluation procedures; in many cases the purpose of a test is specific, and it does not necessarily include evaluation procedures. Tests are mostly conducted and made use of for pedagogical and recruitment purposes.
12. GRANT HENNING (1987, p. 9)
Evaluation of language tests should consider:
Purpose of the test
Characteristics of the examinees
Accuracy of measurement
Suitability of the format and features of a test
Developmental sample
Availability of equivalent or equated forms
Nature of the scoring and reporting of scores
Procurement and
Political compatibility of the test.
13. ROLE OF EVALUATION
Identification of course objectives (the expected or desired learning outcomes).
Defining the objectives in terms of learners' terminal behavior.
Constructing appropriate tools or instruments for measuring the behavior.
Applying or administering the tools/instruments and analyzing the results to determine the degree of learners' achievement in the instructional program.
The above four steps are basically the same in the evaluation of instruction, curriculum, or the program as a whole. Both measurement and evaluation require a broad variety of tools or instruments, such as tests, rating scales, inventories, checklists, questionnaires, etc.
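The fourth step, determining the degree of achievement, can be sketched as a simple mastery calculation; the objectives, cut-off, and scores below are hypothetical, not from the source:

```python
# Step 4 sketch: compare measured scores against course objectives
# to determine a learner's degree of achievement.
CUTOFF = 0.6  # hypothetical mastery threshold per objective

def achievement(scores, max_score):
    """Fraction of objectives on which the learner reached the cutoff."""
    met = sum(1 for s in scores if s / max_score >= CUTOFF)
    return met / len(scores)

# One learner's scores on four objectives, each scored out of 10.
print(f"degree of achievement: {achievement([8, 5, 7, 9], 10):.0%}")
```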
15. ONGOING EVALUATION
Ongoing evaluation is meant to obtain regular feedback after the completion of every step of the process, viz. planning, preparation, production, and application. This enables the program to be improved at various stages while it is still in progress. This type of evaluation is most helpful for modifying anything necessary in the course of the didactic process.
16. TERMINAL EVALUATION
Terminal evaluation is made after the completion of the program and is used to determine whether the program is a success or a failure. This type of evaluation is not used for improving the program. In general, evaluation has been further classified into four categories:
Formative evaluation
Summative evaluation
Brief evaluation and
Extensive evaluation
17. TYPES OF EVALUATION
Formative evaluation
Formative evaluation is carried out from time to time in the course of an instructional program, from one stage to the next. It does not provide an overall impression of the quality of the instructional program, the techniques and methods, the materials, or the media.
Summative evaluation
Summative evaluation takes into consideration the periodic evaluations that have been made, in addition to a total evaluation of the program, process, or product; conclusions are arrived at in view of the outcomes of the periodic evaluations together with the final evaluation.
18. TYPES OF EVALUATION
Brief evaluation
A program can also be evaluated by taking into account only some of its aspects, and the evaluator can give a judgment based on the few aspects chosen for evaluation. Such a judgment will be subjective and impressionistic rather than realistic, but it can be useful for roughly comparing two or more programs.
Extensive evaluation
Extensive evaluation involves the analysis of a program in all its main and sub-aspects. The evaluator has to rate and weigh each of them individually and consolidate a total rating, on the basis of which the value judgment is made. This is more objective and valid.
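The consolidation step in extensive evaluation amounts to a weighted average of the individual aspect ratings; the aspects, ratings, and weights below are invented for illustration:

```python
# Extensive evaluation sketch: rate each aspect of the program,
# weight it by importance, and consolidate into one total rating.
aspects = {  # aspect: (rating out of 5, weight; weights sum to 1)
    "materials": (4, 0.3),
    "methods": (3, 0.4),
    "assessment": (5, 0.3),
}

total = sum(rating * weight for rating, weight in aspects.values())
print(f"consolidated rating: {total:.1f} / 5")
```

Making the weights explicit is what gives the extensive approach its claim to objectivity: two evaluators using the same ratings and weights will reach the same total.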
19. RESEARCH TECHNIQUES
1. Data collection techniques
Observational
Experimental
2. Causality relationships
Descriptive
Analytical
3. Relationships with time
Retrospective
Prospective
Cross-sectional
4. Medium of application
Clinical
Laboratory
Social descriptive research
20. SCIENTIFIC RESEARCH
Scientific research can be classified in several ways: according to data collection techniques, causality, relationship with time, and the medium through which the research is applied.
21. SCIENTIFIC VS. NON-SCIENTIFIC RESEARCH
Scientific research:
Logical
Systematic
Expanding understanding
Can be reproduced and demonstrated
Truth and factual enquiry
Investigation based on natural phenomena
Scientific techniques are utilized: identification of the problem, formulation of hypotheses, experimentation, observation, data analysis and interpretation, recommendations and conclusions
Non-scientific research:
Not logical
Reproduction may result in varied results
Acquiring knowledge and truths about the world using techniques without following the scientific method
22. PRE-SCIENTIFIC MOVEMENT
This period was characterized by translation tests developed exclusively by classroom teachers. Such tests are relatively difficult to score objectively; thus, subjectivity becomes an important factor in their scoring (Brown, 1996).
23. THE PSYCHOMETRIC-STRUCTURALIST PERIOD
With the onset of the psychometric-structuralist movement in language testing, language tests became increasingly scientific, reliable, and precise. In this era, the testers and psychologists responsible for developing modern theories and techniques of educational measurement tried to provide objective measures, using various statistical techniques to ensure reliability and certain kinds of validity.
24. REFERENCES
Bachman, L. F. (2011). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Brown, D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices (2nd ed.). Pearson Education.
Harris, M. (1997). Self-assessment of language learning in formal settings. ELT Journal, 51(1), 14.
Iseni, A. (2011). Assessment, testing and correcting students' errors and mistakes. Language Testing in Asia, 1(3).
Miller, M. J. Reliability and validity. Graduate Research Methods, Western International University.
Rahimi, M., Momeni, G., & Nejati, R. (2012). The impact of lexically based language teaching on students' achievement in learning English as a foreign language. Elsevier Journal, 31.
Underhill, N. (1987). Testing spoken language: A handbook of oral testing techniques (1st ed., pp. 21-86). Cambridge University Press.
Wolfe, K. (2004). Student assessment: Testing and grading. Tips for teaching assistants and new instructors. Journal of Teaching in Travel & Tourism, 4(2), 80.