Jesús Ángel González
What does the word suggest?
What sort of emotions does it convey?
Try to write a definition. What does it imply?
• Collecting information
• Analyzing the information and making an assessment
• Making decisions according to the assessment made:
  - Pedagogical decisions (formative assessment)
  - Social decisions
Which characteristics should it have?
• Validity, reliability, feasibility
Assessment: in the CEFR, assessment means assessment of the proficiency of the language user.
Three key concepts:
• Validity: the information gained is an accurate representation of the proficiency of the candidates.
• Reliability: a student tested twice will get the same result (technical concept: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment).
• Feasibility: the procedure needs to be practical, adapted to the resources and conditions available.
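The technical definition of reliability can be made concrete with a small calculation. The sketch below assumes two lists of scores for the same candidates, in the same order, and uses Spearman's rank correlation (a choice made here for illustration, not taken from the slides) to check how closely the rank order is replicated; a value of 1 means the rank order is identical.

```python
# Sketch: is the candidates' rank order replicated in a second administration?
# Assumption (illustrative): Spearman's rank correlation is the agreement index.


def ranks(scores):
    """Return 1-based ranks (1 = lowest score); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(first, second):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    return pearson(ranks(first), ranks(second))


# Hypothetical scores of five candidates on two sittings of the same test.
first_sitting = [12, 18, 25, 30, 22]
second_sitting = [14, 17, 27, 29, 21]
print(round(spearman(first_sitting, second_sitting), 2))
```

In this made-up example every raw score changes between sittings but the rank order does not, so the coefficient is 1.0: in the technical sense, the result has been replicated.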
If we want assessment to be valid, reliable, and feasible, we need to specify:
• What is assessed: according to the CEFR, communicative activities (contexts, texts, and tasks). See examples.
• How performance is interpreted: assessment criteria. See examples.
• How to make comparisons between different tests and ways of assessment (for example, between public examinations and teacher assessment). Two main procedures:
  - Social moderation: discussion between experts
  - Benchmarking: comparison of samples in relation to standardized definitions and examples, which become reference points (benchmarks)
• Guidelines for good practice: EALTA (European Association for Language Testing and Assessment)
TYPES OF ASSESSMENT
1 Achievement assessment / Proficiency assessment
2 Norm-referencing (NR) / Criterion-referencing (CR)
3 Mastery learning CR / Continuum CR
4 Continuous assessment / Fixed assessment points
5 Formative assessment / Summative assessment
6 Direct assessment / Indirect assessment
7 Performance assessment / Knowledge assessment
8 Subjective assessment / Objective assessment
9 Checklist rating / Performance rating
10 Impression / Guided judgement
11 Holistic assessment / Analytic assessment
12 Series assessment / Category assessment
13 Assessment by others / Self-assessment
Types of tests:
• Proficiency tests
• Achievement tests. Two approaches:
  - Basing achievement tests on the textbook/syllabus (contents)
  - Basing them on course objectives, which produces more beneficial washback
• Diagnostic tests
• Placement tests
Validity: the information gained is an accurate representation of the proficiency of the candidates.
Validity types:
• Construct validity (very general: the information gained is an accurate representation of the proficiency of the candidate. It checks the validity of the construct, the thing we want to measure.)
• Content validity: this checks whether the test's content is a representative sample of the skills or structures that it is meant to measure. To check this, we need a complete specification of all the skills or structures we want to cover. A test that covers only 5% of them has less content validity than one that covers 25%.
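The 5% versus 25% comparison is simple arithmetic once the specification exists. The sketch below assumes, purely for illustration, that each item is tagged with the one specified skill or structure it targets; counting covered elements is only a rough proxy, since representativeness also depends on how the elements are weighted.

```python
# Sketch: what percentage of a test specification does a draft test sample?
# Assumption (illustrative): each item is tagged with the one skill or
# structure from the specification that it targets.


def coverage(specification, item_targets):
    """Percentage of specified elements sampled by at least one item."""
    spec = set(specification)
    covered = spec & set(item_targets)
    return 100 * len(covered) / len(spec)


# Hypothetical specification of 20 structures, labelled s1..s20.
spec = {f"s{i}" for i in range(1, 21)}
narrow_test = ["s1", "s1", "s1"]               # three items, one structure
broader_test = ["s1", "s2", "s3", "s4", "s5"]  # five items, five structures
print(coverage(spec, narrow_test))   # 5.0
print(coverage(spec, broader_test))  # 25.0
```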
• Criterion-related validity: results on the test agree with other dependable results (a criterion test).
  - Concurrent validity: we compare the test results with the results of the criterion test, taken at about the same time.
  - Predictive validity: the test predicts future performance. For example, a placement test can be validated by asking the teachers of the placed students whether they were placed appropriately.
• Validity in scoring: not only the items need to be valid, but also the way in which responses are scored (penalizing grammar mistakes in a reading comprehension test, for example, reduces its validity).
• Face validity: the test has to look as if it measures what it is supposed to measure. A written test to check pronunciation has little face validity.
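Concurrent validity is usually reported as a coefficient. The sketch below assumes that the same candidates take the new test and a trusted criterion measure at about the same time, and uses the Pearson correlation as the validity coefficient; the data and the choice of an oral interview as criterion are hypothetical.

```python
# Sketch: concurrent validity as a correlation with a criterion measure.
# Assumptions (illustrative): the same candidates take the new test and a
# trusted criterion measure at about the same time, and the Pearson
# correlation between the two score lists serves as the validity coefficient.
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population statistics."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical data: scores on a new test and bands from an established oral interview.
new_test = [40, 55, 62, 70, 85]
criterion = [3, 4, 4, 5, 6]
print(round(pearson(new_test, criterion), 2))
```

Predictive validity can be estimated with the same calculation; only the timing of the criterion changes (for a placement test, for instance, teachers' later judgements of how well each student was placed).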
How to make tests more valid (Hughes):
• Write specifications for the test.
• Include a representative sample of the content of the specifications in the test.
• Whenever feasible, use direct testing.
• Make sure that the scoring relates directly to what is being tested.
• Try to make the test reliable.
Reliability: a student tested twice will get the same result (technical concept: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment). The result is expressed as a reliability coefficient, whose theoretical maximum is 1, reached when all the students get exactly the same result in both administrations.
- We compare two sets of scores for the same candidates. Methods:
  - Test-retest: the students take the same test again.
  - Alternate forms: the students take two alternate forms of the same test.
  - Split-half: you split the test into two equivalent halves and compare them as if they were two different tests.
- Reliability coefficient / standard error of measurement: a high-stakes test needs a high reliability coefficient (the highest is 1), and therefore a very low standard error of measurement (a number obtained by statistical analysis). A lower-stakes exam does not need such high coefficients.
- True score: the score that a student would get in a perfectly reliable test. In a very reliable test, the true score is narrowly defined (the student will always get a similar result, for example 65-67). In a less reliable test, the range is wider (55-75).
- Scorer reliability (coefficient): you compare the scores given by different scorers (examiners). The more they agree, the higher the scorer reliability coefficient.
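The reliability statistics above can be sketched in a few lines. The assumptions here are illustrative rather than taken from the slides: items are scored 0/1, the two halves are the odd- and even-numbered items, the half-test correlation is adjusted to full length with the Spearman-Brown formula, the standard error of measurement (SEM) is computed as SD × √(1 − r), and the band around an observed score uses ±1.96 SEM (roughly 95% confidence). Scorer reliability is shown as a plain correlation between two examiners.

```python
# Sketch of the reliability statistics listed above (classical test theory).
# Assumptions (illustrative, not from the slides): items are scored 0/1, the
# halves are odd- vs even-numbered items, the Spearman-Brown formula adjusts
# the half-test correlation to full length, and SEM = SD * sqrt(1 - r).
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population statistics."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def split_half_reliability(responses):
    """responses: one list of 0/1 item scores per candidate."""
    odd = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    # Each half is only half as long as the full test, so step the value up.
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(totals, reliability):
    """Shrinks towards 0 as the reliability coefficient approaches 1."""
    return pstdev(totals) * sqrt(1 - reliability)


def true_score_band(observed, sem, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly 95% confidence."""
    return observed - z * sem, observed + z * sem


# Hypothetical 0/1 responses of six candidates to a six-item test.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
rel = split_half_reliability(responses)
sem = standard_error_of_measurement(totals, rel)
print(round(rel, 2), round(sem, 2), true_score_band(totals[2], sem))

# Scorer reliability: correlate two examiners' marks for the same scripts.
scorer_a = [7, 5, 8, 4, 6]
scorer_b = [6, 5, 8, 3, 7]
print(round(pearson(scorer_a, scorer_b), 2))
```

The narrow 65-67 range versus the wide 55-75 range on the slide corresponds to a small versus a large SEM: the lower the reliability coefficient, the wider the band returned by `true_score_band`.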
Item analysis:
• Facility value
• Discrimination indices: drop some items, improve others
• Analyse distractors
• Item banking
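The first three item-analysis steps can be sketched for a single multiple-choice item. The definitions below are one common classical approach, assumed here for illustration: facility value is the proportion of candidates answering correctly; the discrimination index is the facility value in the top-scoring group minus that in the bottom-scoring group (top and bottom thirds by total score); distractor analysis counts how often each option was chosen by each group. The candidate data are hypothetical.

```python
# Sketch of classical item analysis for one multiple-choice item.
# Assumptions (illustrative): upper-lower group method with thirds,
# groups formed by ranking candidates on their total test score.
from collections import Counter


def facility_value(choices, key):
    """Proportion of candidates who chose the keyed (correct) option."""
    return sum(c == key for c in choices) / len(choices)


def split_groups(choices, totals, fraction=1 / 3):
    """Return (top, bottom) lists of choices, ranked by total test score."""
    ranked = [c for _, c in sorted(zip(totals, choices), key=lambda p: p[0], reverse=True)]
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


def discrimination_index(choices, totals, key, fraction=1 / 3):
    """Ranges from -1 to 1; values near 0 or below flag items to drop or revise."""
    top, bottom = split_groups(choices, totals, fraction)
    return facility_value(top, key) - facility_value(bottom, key)


def distractor_table(choices, totals, fraction=1 / 3):
    """Count how often each option was chosen by the top and bottom groups."""
    top, bottom = split_groups(choices, totals, fraction)
    return Counter(top), Counter(bottom)


# Hypothetical answers of nine candidates to one item (key "B") and their totals.
choices = ["B", "B", "B", "A", "B", "C", "A", "C", "D"]
totals = [38, 35, 33, 30, 28, 25, 20, 18, 15]
print(facility_value(choices, "B"))
print(discrimination_index(choices, totals, "B"))
print(distractor_table(choices, totals))
```

A distractor chosen more often by the top group than by the bottom group, or never chosen at all, is usually a candidate for rewriting before the item goes into an item bank.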
How to make tests more reliable (Hughes):
1. Take enough samples of behaviour.
2. Exclude items which do not discriminate well.
3. Do not allow candidates too much freedom.
4. Write unambiguous items.
5. Provide clear and explicit instructions.
6. Ensure that tests are well laid out and perfectly legible.
7. Make candidates familiar with format and testing techniques.
8. Provide uniform and non-distracting conditions of administration.
9. Use items that permit scoring that is as objective as possible.
10. Make comparisons between candidates as direct as possible.
11. Provide a detailed scoring key.
12. Train scorers.
13. Agree acceptable responses and appropriate scores at the beginning of the scoring process.
14. Identify candidates by number, not by name.
15. Employ multiple, independent scorers.
To be valid, a test must be reliable (provide accurate measurement).
A reliable test may not be valid at all (it may be technically perfect but globally wrong: it does not test what it is supposed to test).
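One way to see why reliability is necessary for validity comes from classical test theory, an assumption brought in here rather than stated on the slides: the correlation between a test and any criterion cannot exceed the square root of the product of their reliabilities. The sketch computes that ceiling. It shows only the necessary side of the argument; a test can reach a high ceiling and still measure the wrong thing.

```python
# Sketch: the ceiling that reliability places on a validity coefficient.
# Assumption (classical test theory): r_xy <= sqrt(r_xx * r_yy).
from math import sqrt


def max_validity(test_reliability, criterion_reliability=1.0):
    """Highest correlation the test could reach with a criterion measure."""
    return sqrt(test_reliability * criterion_reliability)


for reliability in (0.95, 0.80, 0.50):
    print(reliability, round(max_validity(reliability), 2))
```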
Achieving beneficial backwash (Hughes):
• Test the abilities/skills you want to encourage.
• Sample widely and unpredictably.
• Use direct testing.
• Make testing criterion-referenced (CEFR).
• Base achievement tests on objectives.
• Ensure that the test is known and understood by students and teachers.
• Counting the cost.
Stages of test development (Hughes):
1. Make a full and clear statement of the testing ‘problem’.
2. Write complete specifications for the test.
3. Write and moderate items.
4. Trial the items informally on native speakers and
reject or modify problematic ones as necessary.
5. Trial the test on a group of non-native speakers
similar to those for whom the test is intended.
6. Analyse the results of the trial and make any
necessary changes.
7. Calibrate scales: collect samples of performance and use them as models (benchmarking).
8. Validate.
9. Write handbooks for test takers, test users and
staff.
10. Train any necessary staff (interviewers, raters,
etc.).
Chapters from Hughes’ Testing for Language Teachers
8. Common test techniques: Elaine, 24th
9. Testing Writing: Marta, Idoia, 22nd
10. Testing Oral Abilities: Paula, Ángela, 24th
11. Testing Reading: Lucía, 24th
12. Testing Listening: Lorena, 22nd
13. Testing Grammar and Vocabulary: Clara, Cristina,
22nd
14. Testing Overall Ability: Jefferson, 22nd
15. Tests for Young Learners: Tania, Diego, 24th

Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 

7.1 assessment and the cefr (1)

  • 2. What does the word suggest? What sort of emotions does it convey? Try to write a definition. What does it imply? Which characteristics should it have?
  • 3. What does the word suggest? What sort of emotions does it convey?
    Try to write a definition. What does it imply?
      - Collecting information
      - Analyzing the information and making an assessment
      - Taking decisions according to the assessment made:
          - Pedagogical decisions (formative assessment)
          - Social decisions
    Which characteristics should it have?
      - Validity, reliability, feasibility
  • 4. Assessment: assessment of the proficiency of the language user
    Three key concepts:
      - Validity: the information gained is an accurate representation of the candidates' proficiency.
      - Reliability: a student tested twice will get the same result (technically: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment).
      - Feasibility: the procedure must be practical and adapted to the resources and conditions available.
  • 5. If we want assessment to be valid, reliable, and feasible, we need to specify:
      - What is assessed: according to the CEFR, communicative activities (contexts, texts, and tasks). See examples.
      - How performance is interpreted: assessment criteria. See examples.
      - How to make comparisons between different tests and ways of assessment (for example, between public examinations and teacher assessment). Two main procedures:
          - Social moderation: discussion between experts
          - Benchmarking: comparison of samples in relation to standardized definitions and examples, which become reference points (benchmarks)
      - Guidelines for good practice: EALTA (European Association for Language Testing and Assessment)
  • 6. TYPES OF ASSESSMENT
      1. Achievement assessment / Proficiency assessment
      2. Norm-referencing (NR) / Criterion-referencing (CR)
      3. Mastery learning CR / Continuum CR
      4. Continuous assessment / Fixed assessment points
      5. Formative assessment / Summative assessment
      6. Direct assessment / Indirect assessment
      7. Performance assessment / Knowledge assessment
      8. Subjective assessment / Objective assessment
      9. Checklist rating / Performance rating
      10. Impression / Guided judgement
      11. Holistic assessment / Analytic assessment
      12. Series assessment / Category assessment
      13. Assessment by others / Self-assessment
  • 7. Types of tests:
      - Proficiency tests
      - Achievement tests. Two approaches:
          - Base achievement tests on the textbook/syllabus (contents)
          - Base them on course objectives (more beneficial washback)
      - Diagnostic tests
      - Placement tests
  • 8. Validity: the information gained is an accurate representation of the candidates' proficiency.
    Types of validity:
      - Construct validity: the most general type. The information gained is an accurate representation of the candidate's proficiency; it checks the validity of the construct, the thing we want to measure.
      - Content validity: checks whether the test's content is a representative sample of the skills or structures it is meant to measure. To check this, we need a complete specification of all the skills or structures we want to cover. A test that covers only 5% of them has less content validity than one that covers 25%.
  • 9. Types of validity (continued):
      - Criterion-related validity: results on the test agree with other dependable results (the criterion test).
          - Concurrent validity: we compare the test results with the results of the criterion test, taken at about the same time.
          - Predictive validity: the test predicts future performance. For example, a placement test is validated by the teachers who teach the students it has placed.
      - Validity in scoring: not only the items but also the way responses are scored must be valid (penalizing grammar mistakes in a reading comprehension test, for example, is not valid).
      - Face validity: the test has to look as if it measures what it is supposed to measure. A written test of pronunciation has little face validity.
  • 10. How to make tests more valid (Hughes):
      - Write specifications for the test.
      - Include a representative sample of the content of the specifications in the test.
      - Whenever feasible, use direct testing.
      - Make sure that the scoring relates directly to what is being tested.
      - Try to make the test reliable.
  • 11. Reliability: a student tested twice will get the same result (technically: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment). The result is a reliability coefficient, with a theoretical maximum of 1 if all the students get exactly the same result both times.
    We compare two tests. Methods:
      - Test-retest: the students take the same test again.
      - Alternate forms: the students take two alternate forms of the same test.
      - Split-half: the test is split into two equivalent halves, which are compared as if they were two different tests.
  • 12. Reliability coefficient / standard error of measurement:
      - A high-stakes test needs a high reliability coefficient (the maximum is 1) and therefore a very low standard error of measurement (a value obtained through statistical analysis). A lower-stakes test does not need such demanding values.
      - True score: the score a student would get on a perfectly reliable test. In a very reliable test, the true score is narrowly defined (the student will always get a similar result, for example 65-67). In a less reliable test, the range is wider (for example 55-75).
      - Scorer reliability (coefficient): compares the scores given by different scorers (examiners). The more they agree, the higher the scorer reliability coefficient.
  • 13. Item analysis:  Facility value  Discrimination indices: drop some, improve others  Analyse distractors  Item banking
  • 14. How to make tests more reliable (Hughes):
      1. Take enough samples of behaviour.
      2. Exclude items which do not discriminate well.
      3. Do not allow candidates too much freedom.
      4. Write unambiguous items.
      5. Provide clear and explicit instructions.
      6. Ensure that tests are well laid out and perfectly legible.
      7. Make candidates familiar with format and testing techniques.
      8. Provide uniform and non-distracting conditions of administration.
  • 15. How to make tests more reliable (continued):
      9. Use items which permit scoring that is as objective as possible.
      10. Make comparisons between candidates as direct as possible.
      11. Provide a detailed scoring key.
      12. Train scorers.
      13. Agree acceptable responses and appropriate scores at the beginning of the scoring process.
      14. Identify candidates by number, not by name.
      15. Employ multiple, independent scorers.
  • 16. To be valid, a test must be reliable (provide accurate measurement). A reliable test may not be valid at all (technically perfect but globally wrong: it does not test what it is supposed to test).
  • 17. Washback/backwash:
      - Test the abilities/skills you want to encourage.
      - Sample widely and unpredictably.
      - Use direct testing.
      - Make testing criterion-referenced (CEFR).
      - Base achievement tests on objectives.
      - Ensure that the test is known and understood by students and teachers.
      - Count the cost.
  • 18.
      1. Make a full and clear statement of the testing 'problem'.
      2. Write complete specifications for the test.
      3. Write and moderate items.
      4. Trial the items informally on native speakers and reject or modify problematic ones as necessary.
      5. Trial the test on a group of non-native speakers similar to those for whom the test is intended.
      6. Analyse the results of the trial and make any necessary changes.
      7. Calibrate scales: collect samples of performance and use them as models (benchmarking).
      8. Validate.
      9. Write handbooks for test takers, test users and staff.
      10. Train any necessary staff (interviewers, raters, etc.).
  • 19. Chapters from Hughes' Testing for Language Teachers:
      8. Common Test Techniques: Elaine, 24th
      9. Testing Writing: Marta, Idoia, 22nd
      10. Testing Oral Abilities: Paula, Ángela, 24th
      11. Testing Reading: Lucía, 24th
      12. Testing Listening: Lorena, 22nd
      13. Testing Grammar and Vocabulary: Clara, Cristina, 22nd
      14. Testing Overall Ability: Jefferson, 22nd
      15. Tests for Young Learners: Tania, Diego, 24th

Editor's Notes

  1. If we want assessment to be valid, reliable, and feasible, we need to specify: what is assessed (according to the CEFR, communicative activities: contexts, texts, and tasks; see examples); how performance is interpreted (assessment criteria; see examples); and how to make comparisons between different tests and ways of assessment (for example, between public examinations and teacher assessment). There are two main procedures for this: social moderation (discussion between experts) and benchmarking (comparison of samples in relation to standardized definitions and examples). Guidelines for good practice: EALTA.
  2. Types of tests:
     Proficiency tests are designed to measure people's ability in a language, regardless of any training they may have had. Being "proficient" means having command of the language, for a particular purpose or for general purposes.
     Achievement tests: most teachers are not responsible for proficiency tests but for achievement tests, which are normally related to language courses. There are two approaches: basing achievement tests on the textbook (or the syllabus), so that only what has been covered in class is tested, or, much better, basing the test content on the course objectives. The latter has more beneficial washback and best serves the students' long-term interests. There are two types: final achievement tests and progress achievement tests (formative assessment).
     Diagnostic tests are used to identify learners' strengths and weaknesses (example: DIALANG).
     Placement tests are used to place students at the stage most appropriate to their abilities.
  3. A test is valid if it accurately measures what it is intended to measure; in other words, the information gained is an accurate representation of the candidate's proficiency. This general type of validity is called "construct validity": the validity of the construct, the thing we want to measure.
     Content validity: a test has content validity if its content constitutes a representative sample of the language skills, structures, etc. that it is meant to measure. So we first need a specification of the skills or structures we want to cover, which we then compare with the test itself. For example, a specification of B2 writing skills may list writing formal letters as one subskill among several; the more of them the test covers, the more valid it will be. The greater the content validity, the greater the construct validity and the more likely the test is to have a beneficial backwash effect.
     Criterion-related validity: results on the test agree with other (independent and highly dependable) results. This independent assessment is the criterion measure. There are two types:
     Concurrent validity: we compare the criterion test with the test we want to check, both taken at about the same time. Example 1: we administer a 45-minute oral test, in which all the subskills, tasks and operations are tested, to only a sample of the students; this is the criterion test. Then we give 10-minute interviews to all the students at that level. Comparing the results tells us whether 10 minutes is enough. The agreement is expressed as a correlation coefficient between the criterion and the test being validated. Example 2: we compare the results of a general standardized test (Pruebas Estandarizadas) with the teachers' assessment.
     Predictive validity: the test predicts the students' future performance. A placement test can easily be validated by the teachers who teach the placed students, by checking whether the students have been placed appropriately.
     Validity in scoring: not only the items but also the way the responses are scored must be valid. For example, a reading test may call for short written responses. If the scoring of these responses takes spelling and grammar into account, it is not valid (it is not measuring what it is intended to measure). The same applies to the scoring of writing or speaking.
     Face validity: the test has to look as if it measures what it is supposed to measure. It is not a scientific notion, but it is important for candidates, teachers and employers. For example, a written test of pronunciation has little face validity.
  5. Reliability: a student tested twice will get the same result (technically: the rank order of the candidates is replicated in two separate, real or simulated, administrations of the same assessment). We compare two tests taken by the same group of students and obtain a reliability coefficient: if all the students get exactly the same result on both, the coefficient is 1 (this never happens). High-stakes tests need a higher coefficient than lower-stakes exams; their results should not depend on chance or on particular circumstances.
     There are three procedures for obtaining two comparable sets of scores:
     Test-retest method: the students take the same test again.
     Alternate forms method: the students take two alternate forms of the same test.
     Split-half method: the test is split into two equivalent halves, which are compared as if they were two different tests. This gives a "coefficient of internal consistency".
     We also need to know the test's standard error of measurement. It is inversely related to the reliability coefficient (the higher the reliability, the lower the standard error) and is obtained through statistical analysis. With it, we can estimate the range within which a student's true score lies. For example, a very reliable test will have a low standard error of measurement, so the student will get a very similar result however many times they take the test; in a less reliable test, the true score is less precisely defined. These figures are important for comparing tests and for decisions based on their results (by companies, governments, etc.). Another statistical approach commonly used today is Item Response Theory, which is very technical.
     Scorer reliability: there is also a scorer reliability coefficient, which reflects the level of agreement between the scores given by the same or different scorers on different occasions. If the scoring is not reliable, the test results cannot be reliable.
  7. Item analysis: facility value; discrimination indices (drop some items, improve others); distractor analysis; item banking. See the example from Fuensanta.
  8. How to make tests more reliable (Hughes):
     Take enough samples of behaviour. The more items, the more reliable the test; the higher the stakes, the longer it should be (example from the Bible, p. 45).
     Exclude items which do not discriminate well between weaker and stronger students.
     Do not allow candidates too much freedom (example, p. 46).
     Write unambiguous items: use the critical scrutiny of colleagues and pre-testing (trialling, piloting).
     Provide clear and explicit instructions: write them down and read them aloud. There is no problem with writing them in the L1.
     Ensure that tests are well laid out and perfectly legible.
     Make candidates familiar with the format and testing techniques.
     Provide uniform and non-distracting conditions of administration (specified timing, good acoustic conditions).
  9. Use items which permit scoring that is as objective as possible (a one-word response is better than multiple choice).
     Make comparisons between candidates as direct as possible (no choice of items).
     Provide a detailed scoring key.
     Train scorers.
     Agree acceptable responses and appropriate scores at the beginning of the scoring process: score a sample, choose representative examples and agree on them; then the scorers can begin scoring.
     Identify candidates by number, not by name.
     Employ multiple, independent scorers: at least two, scoring independently. A third, senior scorer then collects the results and investigates discrepancies.
  10. Washback/backwash is one of the main reasons for a language teacher, school or department to use appropriate forms of assessment.
     Test the abilities/skills you want to encourage, and give them sufficient weight in relation to other skills.
     Sample widely and unpredictably: test across the full range of the specifications.
     Use direct testing.
     Make testing criterion-referenced (CEFR).
     Base achievement tests on objectives.
     Ensure that the test is known and understood by students and teachers (the more transparent, the better); where necessary, provide assistance to teachers.
     Count the cost: individual direct testing is expensive, but what is the cost of not achieving beneficial washback?
  11. Calibrate scales: collect samples of performance and use them as models or reference points (European Study).
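
The item analysis notes (slide 13 and note 7) list the facility value and discrimination indices without showing how they are obtained. The Python sketch below is one common way to compute them, under assumptions that are not stated in the presentation: each response is scored 0/1, the facility value is the proportion of candidates answering an item correctly, and the discrimination index is the facility value in the top third of candidates (ranked by total score) minus that in the bottom third. The function names, the one-third groups and the 0.3 flagging threshold are hypothetical illustrative choices.

```python
def facility_value(responses, item):
    """Proportion of candidates who answered the item correctly (0 to 1).

    responses: one list of 0/1 item scores per candidate.
    """
    return sum(r[item] for r in responses) / len(responses)


def discrimination_index(responses, item, fraction=1 / 3):
    """Facility value in the top group minus facility value in the bottom group.

    Candidates are ranked by total score; the top and bottom groups each hold
    `fraction` of the candidates (a third by default, an assumed choice).
    Values near zero or negative suggest the item does not discriminate well.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    top, bottom = ranked[:n], ranked[-n:]
    return facility_value(top, item) - facility_value(bottom, item)


def flag_items(responses, min_discrimination=0.3):
    """Indices of items whose discrimination index falls below the
    (assumed) threshold: candidates for dropping or improving."""
    n_items = len(responses[0])
    return [i for i in range(n_items)
            if discrimination_index(responses, i) < min_discrimination]


# Hypothetical scored responses: one row per candidate, one 0/1 column per item.
responses = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(flag_items(responses))  # [2]
```

In this made-up data, item 2 is answered correctly only by the two weakest candidates, so its discrimination index is -1.0 and it is the only item flagged. Distractor analysis and item banking are not covered by this sketch.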