1. The document discusses theories and procedures for establishing validity, including Messick's unified concept of validity and his six aspects of validity: content, structural, consequential, external, generalizability, and substantive.
2. It also covers Kane's argument-based approach to validation using interpretive and validity arguments supported by logical and empirical evidence.
3. The document proposes four questions to guide validation of a Vietnamese English test based on Messick's framework: examining test content, measuring students' English ability, correlating test scores with another test, and investigating test score consequences.
It also outlines the traditional types of validity in assessment:
* Face Validity
* Content Validity
* Predictive Validity
* Concurrent Validity
* Construct Validity
2. VALIDATION TASK
To establish whether the interpretation and uses
of the VSTEP test scores were valid for measuring the
English language competence of test-takers
from level 3 to level 5 on the Vietnamese English
language competence scale
25/12/2015
3. VALIDITY & VALIDATION
Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based
on test scores or other modes of assessment.
(Messick, 1989)
Validation is to marshal evidence and arguments in support of,
or counter to, proposed interpretations and uses of test scores.
(Messick, 1989)
6. MESSICK'S (1989) ASPECTS OF VALIDITY
The content aspect
Content relevance
Representativeness
Technical quality
The substantive aspect
Theoretical rationales for observed consistencies in responses
Process of performance
Empirical evidence of process
7. MESSICK'S (1989) ASPECTS OF VALIDITY
The structural aspect
The fidelity of the scoring structure to the construct structure.
The generalizability aspect
The extent to which score properties and interpretations
generalize to and across groups, settings and tasks
Reliability
Content representativeness
8. MESSICK'S (1989) ASPECTS OF VALIDITY
The external aspect
Convergent and discriminant evidence
Criterion relevance
Applied utility
The consequential aspect
Value implications as a basis for action/consequences
Bias
Fairness
9. MESSICK'S (1989) VALIDITY FRAMEWORK
Value
The most influential framework of validity
Criticisms
Abstract
Difficult for a single researcher to carry out
No specific guidance for particular validation contexts
10. VALIDITY THEORIES
Kane (1992)’s and (2006)’s Validity Chapter
Argument-based Approach to Validation
Interpretive Argument
The network of inferences and assumptions
Validity Argument
Logical evidence
Empirical evidence
The
Development
Stage
1025/12/2015
Kane (1992)’s and (2006)’s Validity Chapter
Argument-based Approach to Validation
Interpretive Argument
The network of inferences and assumptions
Validity Argument
Logical evidence
Empirical evidence
The
Appraisal
Stage
11. KANE'S (1992) VALIDITY FRAMEWORK
Values
The most practical, objective framework of validity
Unique interpretive argument, consistent validity argument
steps (Bachman, 2004)
Criticisms
No attention to the structural aspect (Messick, 1995)
Inadequate attention/method to policy context and
consequences of tests (McNamara, 2006).
12. LANGUAGE TEST VALIDATION
Bachman (1990)’s framework, after Messick (1989)’s
Bachman (2004)’s framework, after Kane (1992)’s
14. VALIDATION QUESTIONS
1. To what extent was the test content relevant to and representative of the domain of English language ability?
2. To what extent was each sub-test successful in measuring students’ English language ability?
3. How well did the test-takers’ test scores on the VSTEP correlate with their test scores on the IELTS?
4. What were the consequences of the UEE English test scores' interpretation and use?
16. 01 CONTENT: RELEVANCE
• Topical content
• Typical behavior
• Underlying process
• Test specifications
17. 01 CONTENT: TECHNICAL QUALITY
Empirical evidence
• difficulty level
• discriminating power
Expert judgment
• readability level
• freedom from ambiguity/irrelevancy
• appropriateness of keyed answers & distractors
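The empirical indices on this slide, item difficulty and discriminating power, can be estimated directly from a scored response matrix. A minimal classical-test-theory sketch (the function name and the simulated data are hypothetical, not from the study):

```python
import numpy as np

def item_analysis(responses):
    """Classical item analysis for a 0/1 scored response matrix.

    responses: shape (n_test_takers, n_items), 1 = correct, 0 = incorrect.
    Returns per-item difficulty (proportion correct) and discrimination
    (corrected point-biserial: item score correlated with the total score
    on the REMAINING items, so an item is not correlated with itself).
    """
    responses = np.asarray(responses, dtype=float)
    n_persons, n_items = responses.shape
    difficulty = responses.mean(axis=0)      # proportion answering correctly
    total = responses.sum(axis=1)
    discrimination = np.empty(n_items)
    for i in range(n_items):
        rest = total - responses[:, i]       # total score excluding item i
        discrimination[i] = np.corrcoef(responses[:, i], rest)[0, 1]
    return difficulty, discrimination

# Hypothetical scored responses: 200 test-takers x 4 items
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 4))
diff, disc = item_analysis(data)
print("difficulty:", diff.round(2))
print("discrimination:", disc.round(2))
```

Very high or very low difficulty values and near-zero (or negative) discrimination values are the usual flags for the expert review described above.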
18. 01 CONTENT: REPRESENTATIVENESS
“The breadth of the content specifications for a test should reflect the breadth of the construct invoked in score interpretation” (Messick, 1989, p. 35).
All essential components of the construct domain are covered (Messick, 1994, p. 12).
19. 01 CONTENT: CONTENT ANALYSIS BY EXPERTS
• What knowledge and skills are needed to answer each item correctly?
• How relevant are the items to their assigned objectives and domain?
Domain
• English secondary school curricula
• English program at the college
21. 01 CONTENT: ITEM FIT STATISTICS
Smith (2004) suggested using item fit statistics to evaluate the extent to which items tap into the same construct and place test-takers in the same order:
- the extent to which the use of each item is consistent with the way people have responded to the other items
- does the item rank-order the individuals in a manner similar to other items? (p. 106)
Smith (2004) argued that test-takers should be ranked consistently by items measuring the same construct. If not, items that misfit the Rasch model, i.e. items that measure a different construct, should be revised or eliminated (p. 107).
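Rasch item fit of the kind Smith describes is conventionally reported as infit and outfit mean squares. A minimal sketch of how these are computed for the dichotomous Rasch model, assuming person abilities and item difficulties have already been estimated (the function name is hypothetical; in the study itself such statistics would come from software like ConQuest):

```python
import numpy as np

def rasch_fit(responses, theta, beta):
    """Infit/outfit mean squares for a dichotomous Rasch model.

    responses: (n_persons, n_items) 0/1 matrix
    theta: estimated person abilities, shape (n_persons,)
    beta:  estimated item difficulties, shape (n_items,)
    """
    responses = np.asarray(responses, dtype=float)
    # Model-implied probability of a correct response: logistic(theta - beta)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    var = p * (1.0 - p)                        # Bernoulli variance per cell
    resid2 = (responses - p) ** 2              # squared raw residuals
    outfit = (resid2 / var).mean(axis=0)       # unweighted mean square
    infit = resid2.sum(axis=0) / var.sum(axis=0)  # information-weighted mean square
    return infit, outfit
```

Mean squares near 1 indicate adequate fit; items well above 1 (a common rule of thumb is above about 1.3) are the candidates for the revision or elimination Smith recommends.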
22. 02 SUBSTANTIVE & STRUCTURAL
To what extent was each VSTEP sub-test successful in measuring students’ English language competence?
ITEM RESPONSE THEORY (RASCH MODEL)
item fit
item discrimination
item cluster
DESCRIPTIVE STATISTICS
choice response analysis
23. 03 CRITERION-RELATED
How well did the test-takers’ VSTEP overall and sub-test scores correlate with their overall and sub-test IELTS scores?
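The criterion-related question reduces to computing correlations between paired scores for test-takers who sat both tests. A minimal sketch with hypothetical scores (the data below are illustrative, not from the study):

```python
import numpy as np

# Hypothetical paired overall scores for six test-takers who sat both tests
vstep = np.array([6.0, 7.5, 5.0, 8.0, 6.5, 9.0])
ielts = np.array([5.5, 7.0, 5.0, 7.5, 6.0, 8.5])

# Pearson correlation: strength of the linear relationship between
# the two score sets (criterion-related evidence of validity)
r = np.corrcoef(vstep, ielts)[0, 1]
print(f"VSTEP-IELTS correlation: r = {r:.2f}")
```

The same computation would be repeated for each sub-test pair (e.g. VSTEP reading against IELTS reading); a rank-based coefficient such as Spearman's rho is a common alternative when score distributions are skewed.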
24. 04 CONSEQUENCES
• The value implications of score interpretation
• The actual and potential consequences of score uses
(Messick, 1989)
FOCUS: the validity of test score interpretation and use, i.e. construct under-representation or construct-irrelevant variance
25. 04 CONSEQUENCES
Sources of evidence
• Content relevance and representativeness
• Item bias
• Technical quality of the test
• Expert judgment
26. References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards
for Educational and Psychological Testing. Washington, DC: Authors.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards
for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of
Employment, Education and Training [and] Higher Education Council.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
Berk, R. A. (1980). Item Analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: the state of the art. Baltimore and London: The Johns Hopkins
University Press.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, D.C.: American Council on Education.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, California: Sage Publications.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on
Education/Praeger.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
McNamara, T., & Roever, C. (2006). Language testing: the social dimension. Malden, MA: Blackwell Publishing.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
MOET. (2006). Secondary Education Curriculum: English. Hanoi: Education Publisher.
Moss, P. A. (2007). Reconstructing Validity. Educational Researcher, 36(8), 470-476.
Popham, W. J. (1997). Consequential Validity: Right Concern--Wrong Concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
Purpura, J. E. (1999). Learner strategy use and performance on language tests: a structural equation modeling approach. Cambridge: Cambridge
University Press.
Smith, E. V. (2004). Evidence for Reliability of Measures and Validity of Measure Interpretation: A Rasch Measurement Perspective. In E. V. Smith & R.
M. Smith (Eds.), Introduction to Rasch Measurement: Theory, Models and Applications. Maple Grove: JAM Press.
Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised Item Response Modelling Software [computer program]. Camberwell: Australian
Council for Educational Research.