Standardized tests have evolved from their origins in China in 1880 as a way to assess government job applicants. There are two main types: norm-referenced tests, which compare students with one another, and criterion-referenced tests, which assess mastery of specific content. Test specifications provide guidelines for test construction and administration, describing the knowledge to be tested and including sample questions and supplemental information. Language tests evaluate reading, writing, speaking, and listening through various tasks and scoring rubrics. Challenges include ensuring construct validity and developing diverse, authentic assessment types. Future directions involve refining constructs and incorporating technology for more dynamic evaluations.
Evolution of Standardized Testing: A Historical Overview
Evolution of Standardized Tests
Historical Roots in China (1880)
Originated in China as a method to assess government job applicants based on Confucian
philosophy and poetry (Fletcher, 2009).
• Advantages:
− Standardized tests provide an objective and uniform measure of performance.
− Efficient for evaluating a large number of individuals simultaneously.
• Disadvantages:
− May not capture the full range of a student's abilities, skills, or potential.
− Critics argue that these tests may exhibit bias against certain cultural and
socioeconomic groups.
Purpose of Standardized Tests:
• Primary purpose: to facilitate comparison of competences and aptitudes across a diverse
population.
• Reliability and validity: allow benchmarks to be set, enabling comparisons among
institutions and students.
• Achievement potential: provide insights for teachers to implement data-based strategies.
Types of Standardized Tests:
Norm-Referenced Tests (NRTs):
• Designed to highlight achievement differences between students.
• Produces a dependable rank order across a continuum of achievement.
• Benefits include outlining a performance curve for comparison.
Criterion-Referenced Tests (CRTs):
• Purpose: Gauges the level of mastery achieved by students on a specific body of
knowledge.
• Used by local districts to determine passing scores.
• Teachers use CRTs to track performance and reshape teaching materials.
• Scores indicate what individuals can do, focusing on individual mastery rather than
group comparison.
Test Specifications
Foundational Principles:
o Representativeness: Test must cover a specific knowledge domain
comprehensively.
o Format and Scoring: Benchmarks crucial for test reliability and validity.
o Consistent Conditions: Uniformity in test administration.
Iterative Nature of Test Specifications:
o Test specifications are iterative, acting as a generative blueprint for test
creation.
o Close relationship with test purposes and objectives.
o Flexibility to accommodate multiple versions for diverse test-takers.
Blueprint Analogy:
o Test specifications are likened to a blueprint, with each specification
representing a component of the overall test.
o Common denominator: Provide background on content, number of items,
item nature, delivery method, and additional input materials.
Content-Based vs. Process-Based Specifications:
o Content-Based Specifications: Focus on the substance of the test, covering
aspects like content, item variety, and delivery method.
o Process-Based Specifications: Adapted to the test's requirements, ensuring
alignment with the test's purpose.
Components of Test Specifications in Standardized Testing
General Description (GD):
− Detailed description of what is to be tested.
− Conveys the purpose and motivation of the test.
Prompt Attribute (PA):
− Information given to the test taker.
− The stimulus triggering the response to be measured.
− Also known as the prompt stimulus.
Response Attribute (RA):
− Describes what the test taker will do.
− Can involve selecting (e.g., multiple-choice) or constructing a response (e.g.,
elaborate writing assignment).
Sample Item (SI):
− Provides a tangible example illustrating GD, PA, and RA.
− "Brings to life" the three previous components.
Specification Supplement (SS):
− Additional information not covered in previous sections.
− May include details about the types of text to be selected or other pertinent
information.
− Ensures completeness without making other sections overly complex.
Design, Selection, and Arrangement of Test Items
Importance of Test Item Design:
A well-designed test empowers students to understand the test structure and plan
accordingly.
Common Test Item Variants:
• Multiple Choice Exams
• Essay Questions
Multiple Choice Exams:
• Often perceived as shallow but efficient in grading.
• Suitable for assessing recall of information or facts.
• Speeds up the grading process.
Essay Questions:
Designed to elicit a comprehensive understanding of a topic and to assess critical thinking
skills, organization, creativity, and information management.
The benefits include simplicity in design and time efficiency.
Considerations for Multiple-Choice Exams:
Suitable for assessing recall of facts; grading is efficient, but the format may lack depth.
Appropriate when facts are the core content.
Considerations for Essay Questions:
− Assess broader understanding and critical thinking.
− Simple design and time efficient.
− Reliability in grading may be a challenge due to potential bias.
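The grading-reliability concern above can be made concrete. A minimal sketch (with hypothetical band scores; the function name and data are illustrative) computes the exact-agreement rate between two raters — a low rate signals the kind of bias problem noted above:

```python
def agreement_rate(rater_a, rater_b):
    """Fraction of essays on which two raters gave identical scores."""
    assert len(rater_a) == len(rater_b), "raters must score the same essays"
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical band scores (1-5) from two raters on six essays:
a = [4, 3, 5, 2, 4, 3]
b = [4, 4, 5, 2, 3, 3]
print(agreement_rate(a, b))  # 4 of 6 essays scored identically
```

In practice, large testing programs go further, using chance-corrected statistics (e.g., Cohen's kappa) and rater training to control this problem.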
Reporting Formats in Assessment
"The process of communicating results of assessment and evaluation to various audiences"
(Board, 2013, p.7).
Note: Results must display formality, clarity, and objectivity.
Types of Reporting Formats:
Percentiles:
Aggregate students' performance for comparison. For example, scoring at the 50th
percentile indicates performing as well as or better than 50 percent of students of the same
age (Logsdon, 2020).
Z-Scores:
• Scale from -4 to 4.
• Above-average scores fall closer to 4, below-average scores closer to -4.
• Zero represents the mean (Logsdon, 2020).
T-Scores:
• Range from 10 to 90 points.
• The average is set at 50, with most scores falling between 40 and 60 (Logsdon, 2020).
Stanine Score:
• Standard nine scale, ranging from 1 to 9.
• 5 represents the average score (Logsdon, 2020).
Scaled Scores:
• Extensive scale derived from specific subtests.
• General composite score combining subtest scores (Logsdon, 2020).
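Assuming the scales behave as described above (z: mean 0, SD 1; T: mean 50, SD 10; stanine: mean 5, SD 2) and an approximately normal score distribution, the reporting formats reduce to simple conversions from a raw score. The mean and SD values below are illustrative, not taken from any real test:

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Standardize a raw score: 0 = group mean, +/-1 = one SD."""
    return (raw - mean) / sd

def t_score(z):
    """T-scale: mean 50, SD 10, so most scores fall between 40 and 60."""
    return 50 + 10 * z

def stanine(z):
    """Standard-nine scale: mean 5, SD 2, clamped to the 1-9 range."""
    return max(1, min(9, round(5 + 2 * z)))

def percentile(z):
    """Percent of a normal population scoring at or below this z."""
    return 100 * NormalDist().cdf(z)

# A raw score of 130 on a test with mean 100 and SD 15:
z = z_score(130, 100, 15)    # 2.0
print(t_score(z))            # 70.0
print(stanine(z))            # 9
print(round(percentile(z)))  # 98
```

The conversions are interchangeable because each scale is just a linear re-expression of the same standardized distance from the mean.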
Designing Classroom Language Tests
A narrowed focus caters to students' needs, with objectives aligned to the expected
evaluations, covering forms, functions, constructs, and language abilities.
Note: Components to be assessed should be weighted appropriately.
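The note on weighting can be sketched as a weighted composite score. The components and weights below are hypothetical and would be set to match a course's actual objectives:

```python
# Hypothetical weights for the assessed components of a classroom
# language test; these are illustrative, not prescribed values.
WEIGHTS = {"forms": 0.2, "functions": 0.2, "constructs": 0.3, "abilities": 0.3}

def composite_score(component_scores, weights=WEIGHTS):
    """Combine per-component percentages into one weighted total."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[c] * component_scores[c] for c in weights)

print(composite_score({"forms": 80, "functions": 90,
                       "constructs": 70, "abilities": 85}))
```

Making the weights explicit forces the test designer to state which objectives matter most before scoring begins.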
Items Organization and Scoring:
• Alignment crucial for student comprehension.
• Practical item arrangement.
• A scoring process that provides at least minimal feedback to students.
Importance of Alignment:
Ensures coherence as students progress through test items (Koç, 2020).
Types of Language Tests:
• Language Aptitude Tests
• Language Proficiency Tests
• Placement Tests
• Diagnostic Tests
• Achievement Tests
Language Aptitude Tests:
Assess inherent language learning capabilities.
Language Proficiency Tests:
Measure overall language competence.
Placement Tests:
Determine appropriate language course level.
Diagnostic Tests:
Identify specific language strengths and weaknesses.
Achievement Tests:
Evaluate students' knowledge and skills acquired in a particular course.
Reading Test and Assessment
Comprises various subskills and linguistic knowledge bases (Grabe, 2009).
Measurement extends beyond basic comprehension (Brown, 2004).
Grading Reading Tests:
Evaluated based on the test-taker's level and expected competences.
Approaches to Assessing Reading:
• Classroom Assessment
• Informal Assessment
• Alternative Assessment
• Standardized Assessments
Indicators in Reading Tests:
• Word Recognition Efficiency
• Vocabulary Knowledge
• Morphology, Syntax, and Discourse Knowledge
• Strategic Processing
Distribution of Indicators:
Spread across various items in the reading test.
Example Reading Tasks:
• Word Recognition
• Vocabulary Application
• Understanding Morphology and Syntax
• Strategic Processing Tasks
Note: These tasks enhance understanding of a student's overall reading abilities.
Use of Language Tests in Language Assessment
Focus on obtaining information for inferences about language ability.
Evolution of Language Use Assessment:
• Language use gained prominence for interpreting and creating intended meanings
in discourse.
• Sociocultural factors became crucial in language assessment.
Standardized Tests and Language Use:
Tests like TOEFL or IELTS often fall short in assessing language use, except for the speaking
part.
Sociopragmatics and Pragmalinguistics Testing:
− Explore sociocultural factors affecting language use.
− Focus on speech acts: assertives, directives, commissives, expressives, and
declarations (Hudson, Detmer, & Brown, 1995).
Cambridge Examinations Approach:
• Reduced assessment of language use.
• Focus on context-related completion exercises aligned with written language use.
Considerations for Future Development:
− Addressing contextual limitations in language use assessment.
− Incorporating more diverse and real-world language use scenarios.
Listening Tests in Language Assessment
Concerns center on covering cognitive processes, knowledge sources, and interactive listening.
Types of Listening Tests:
Proficiency Tests:
o Evaluate comprehensive listening competence.
o Inform placement of learners in appropriate courses.
Standardized Tests (e.g., TOEFL, IELTS):
o Establish a common scale for result comparison.
o Ensure uniform assessment conditions.
Challenges in Listening Tests:
• Construct Validity:
o Define the purpose of listening clearly.
o Clarify the context of language use.
• Task Type, Item Type, and Input Mode:
o Address challenges related to task variety, item design, and input methods
(Vandergrift & Goh, 2009).
Key Challenges:
• Questions of construct validity.
• Clarity on task and item types.
• Ensuring the appropriateness of input modes.
Core Components of Listening Competence (Buck, 2001):
− Process extended samples of realistic spoken language in real time.
− Understand linguistic information in the text.
− Make inferences implied by the content of the passage.
Future Directions:
− Refining construct definitions for listening competence.
− Developing diverse task types for a comprehensive evaluation.
Speaking Tests in Language Assessment
Mastery requires control and proficiency due to interactive demands.
Evaluation Aspects:
• Speaking tests assess various aspects:
o Coherence of responses.
o Suitability of vocabulary.
o Time management.
o Fluency in completing tasks.
Evolution of Speaking Assessment:
Focus on the construct of speaking, task construct, performance criteria, and oral
development (Bygate, 2009).
Standardized Tests - Cambridge Examinations:
Speaking section consists of 4 parts:
o Personal information questions.
o Comparing pictures and answering related questions.
o Collaborative discussion with a partner on various topics.
o Addressing intricate questions based on previous tasks, requiring detailed
elaboration.
Purpose of Each Speaking Test Part:
Part 1: Personal information assessment.
Part 2: Comparison of pictures and related questions.
Part 3: Collaborative discussion on various topics.
Part 4: Addressing complex questions, requiring in-depth elaboration.
Key Aspects Evaluated:
− Coherence and suitability of vocabulary.
− Time management skills.
− Fluency in responding to diverse tasks.
Challenges in Speaking Assessment:
• Dynamic nature of speaking.
• Varied performance criteria.
• Ensuring fairness in evaluating diverse tasks.
Future Trends:
− Incorporating technology for more authentic speaking assessments.
− Developing diverse task types to assess a wide range of speaking competencies.
Writing Tests in Language Assessment
Writing tests assess cognitive problem-solving processes.
Fundamental Guidelines in Writing:
1. Writing is an exploratory and recursive process.
2. Acceptance of preset text structures.
3. Flexible assembly of rhetorical devices while adhering to coherence and cohesion
standards (Polio & Williams, 2009).
L2 Writing Testing:
• Large-scale standardized testing focuses on a variety of topics.
• Descriptors cover fixed formats such as essays, reports, letters, emails, reviews, and
proposals.
Scoring Criteria:
− Thorough scoring scales, e.g., Cambridge examinations.
− Each piece is assessed on a scale of up to twenty points.
− A total of forty points for the entire writing part.
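The scale described above (two pieces, up to twenty points each, forty in total) reduces to a small validation-and-sum step, sketched here; the function name and structure are illustrative, not part of any official Cambridge scoring procedure:

```python
MAX_PER_TASK = 20  # each piece is scored on a scale of up to twenty points
NUM_TASKS = 2      # assumption: two pieces make up the writing part

def writing_total(task_scores):
    """Validate each task score against the scale, then sum to a /40 total."""
    assert len(task_scores) == NUM_TASKS, "expected one score per writing piece"
    for s in task_scores:
        assert 0 <= s <= MAX_PER_TASK, "each piece is scored out of 20"
    return sum(task_scores)

print(writing_total([16, 18]))  # 34 out of 40
```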
Types of Writing Tasks:
• Essays.
• Reports.
• Letters.
• Emails.
• Reviews.
• Proposals.
Balanced Assessment:
− Emphasis on cognitive processes.
− Consideration of cultural variations.
− Evaluation of coherence and cohesion.
Evolving Trends:
− Integration of technology for more dynamic writing assessments.
− Exploration of innovative writing task types.