VALIDITY, RELIABILITY & PRACTICALITY
Prof. Jonathan Magdalena
QUALITIES OF MEASUREMENT DEVICES
Validity: Does it measure what it is supposed to measure?
Reliability: How representative is the measurement?
Objectivity: Do independent scorers agree?
Practicality: Is it easy to construct, administer, score and interpret?
VALIDITY
Validity refers to whether or not a test measures what it intends to measure. A test with high validity has items closely linked to the test’s intended focus. A test with poor validity does not measure the content and competencies it ought to.
VALIDITY - Kinds of Validity
“Content”: related to objectives and their sampling.
“Construct”: referring to the theory underlying the target.
“Criterion”: related to concrete criteria in the real world. It can be concurrent or predictive.
“Concurrent”: correlating highly with another measure already validated.
“Predictive”: capable of anticipating some later measure.
“Face”: related to the test’s overall appearance.
1. CONTENT VALIDITY
Content validity refers to the connections between the test items and the subject-related tasks. The test should evaluate only the content related to the field of study, in a manner sufficiently representative, relevant, and comprehensible.
2. CONSTRUCT VALIDITY
Construct validity implies using the construct (concepts, ideas, notions) in accordance with the state of the art in the field. It seeks agreement between updated subject-matter theories and the specific measuring components of the test. For example, a test of intelligence nowadays must include measures of multiple intelligences, rather than just logical-mathematical and linguistic ability measures.
3. CRITERION-RELATED VALIDITY
Also referred to as instrumental validity, criterion-related validity is used to demonstrate the accuracy of a measure or procedure by comparing it with another process or method that has already been demonstrated to be valid. For example, imagine a hands-on driving test has been proven to be an accurate test of driving skills. A written test can then be validated with a criterion-related strategy in which its results are compared against the hands-on driving test.
4. CONCURRENT VALIDITY
Concurrent validity uses statistical methods of correlation with other measures. Examinees who are known to be either masters or non-masters of the content measured are identified before the test is administered. Once the tests have been scored, the relationship between the examinees’ status as masters or non-masters and their performance on the test (i.e., pass or fail) is estimated.
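The slide does not name a specific statistic; one common choice for two dichotomous variables (known mastery status vs. pass/fail outcome) is the phi coefficient. A minimal Python sketch with hypothetical counts:

    from math import sqrt

    # Hypothetical counts from a 2x2 table of known mastery status vs. test outcome.
    a = 40   # masters who passed
    b = 5    # masters who failed
    c = 8    # non-masters who passed
    d = 47   # non-masters who failed

    # Phi coefficient: a correlation-like index of association for two dichotomous variables.
    phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    print(f"phi = {phi:.2f}")  # values near 1.0 would be read as strong concurrent-validity evidence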
5. PREDICTIVE VALIDITY
Predictive validity estimates the relationship of test scores to an examinee’s future performance as a master or non-master. It considers the question, "How well does the test predict examinees' future status as masters or non-masters?" For this type of validity, the correlation is computed between the test results and the examinee’s later performance. This type of validity is especially useful for test purposes such as selection or admissions.
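Because the later criterion here is dichotomous (master vs. non-master) while the test scores are continuous, one standard way to express the relationship is a point-biserial correlation; this is an illustrative choice, not one prescribed by the slide. A minimal sketch with hypothetical data:

    from statistics import mean, pstdev

    # Hypothetical admission-test scores paired with later status (1 = master, 0 = non-master).
    scores = [78, 85, 62, 90, 70, 55, 88, 64, 92, 59]
    later  = [1,  1,  0,  1,  1,  0,  1,  0,  1,  0]

    masters     = [s for s, m in zip(scores, later) if m == 1]
    non_masters = [s for s, m in zip(scores, later) if m == 0]
    p = len(masters) / len(scores)  # proportion who later became masters

    # Point-biserial correlation: relates a continuous score to a dichotomous outcome.
    r_pb = (mean(masters) - mean(non_masters)) / pstdev(scores) * (p * (1 - p)) ** 0.5
    print(f"point-biserial r = {r_pb:.2f}")  # higher values suggest stronger predictive validity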
6. FACE VALIDITY
Face validity is determined by a review of the items and not through the use of statistical analyses. Unlike content validity, face validity is not investigated through formal procedures. Instead, anyone who looks over the test, including examinees, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure.
QUALITIES OF MEASUREMENT DEVICES
Validity: Does it measure what it is supposed to measure?
Reliability: How representative is the measurement?
Objectivity: Do independent scorers agree?
Practicality: Is it easy to construct, administer, score and interpret?
RELIABILITY
Reliability is the extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials. For researchers, there are four key types of reliability:
RELIABILITY
“Equivalency”: related to the co-occurrence of two items.
“Stability”: related to consistency over time.
“Internal”: related to the instruments.
“Interrater”: related to the examiners’ criteria.
1. EQUIVALENCY RELIABILITY
Equivalency reliability is the extent to which two items measure identical concepts at an identical level of difficulty. It is determined by relating two sets of test scores to one another to highlight the degree of relationship or association.
2. STABILITY RELIABILITY
Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. The results are compared and correlated with the initial test to give a measure of stability. Instruments with high stability reliability include thermometers, compasses, measuring cups, etc.
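Both equivalency and stability estimates come down to correlating two sets of scores, whether from two parallel forms or from the same test given twice. A minimal Python sketch with hypothetical scores, using the standard-library statistics.correlation (Python 3.10+):

    from statistics import correlation  # Pearson's r, Python 3.10+

    # Hypothetical scores: Form A vs. Form B (equivalency), or the same test given twice (stability).
    form_a = [12, 18, 15, 20, 9, 17, 14, 11]
    form_b = [13, 17, 16, 19, 10, 18, 13, 12]

    # Values of r near +1 mean the two administrations rank and space examinees
    # in nearly the same way, i.e. high equivalency or stability.
    r = correlation(form_a, form_b)
    print(f"r = {r:.2f}")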
3. INTERNAL CONSISTENCY
Internal consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the measuring instruments used in a study. This type of reliability often helps researchers interpret data and predict the value of scores and the limits of the relationship among variables.
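In practice, internal consistency is most often summarized with Cronbach's alpha; the slide does not name a statistic, so this is an illustrative choice. A minimal sketch with hypothetical item-level scores:

    from statistics import variance

    # Hypothetical data: each inner list is one examinee's scores on four items of the same test.
    responses = [
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 4, 5, 4],
        [3, 3, 3, 4],
        [1, 2, 2, 1],
    ]

    k = len(responses[0])                           # number of items
    items = list(zip(*responses))                   # column-wise: all scores for each item
    totals = [sum(person) for person in responses]  # each examinee's total score

    # Cronbach's alpha compares the sum of item variances with the variance of total scores.
    alpha = (k / (k - 1)) * (1 - sum(variance(item) for item in items) / variance(totals))
    print(f"alpha = {alpha:.2f}")  # values above roughly 0.7 are usually read as acceptable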
4. INTERRATER RELIABILITY
Interrater reliability is the extent to which two or more individuals (coders or raters) agree. For example, two or more teachers may use a rating scale to score students’ oral responses in an interview (1 being most negative, 5 being most positive). If one rater gives a "1" to a student response while another gives a "5," the interrater reliability is obviously low.
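Agreement between two raters is often quantified with raw percent agreement plus Cohen's kappa, which corrects for agreement expected by chance; again, this is a common choice rather than one prescribed by the slide. A minimal sketch with hypothetical ratings:

    from collections import Counter

    # Hypothetical ratings by two teachers of the same ten oral responses (scale 1-5).
    rater_1 = [4, 3, 5, 2, 4, 1, 3, 5, 2, 4]
    rater_2 = [4, 3, 4, 2, 4, 2, 3, 5, 2, 4]

    n = len(rater_1)
    p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

    # Chance agreement: probability that both raters pick the same category independently.
    c1, c2 = Counter(rater_1), Counter(rater_2)
    p_chance = sum(c1[cat] * c2[cat] for cat in set(rater_1) | set(rater_2)) / n ** 2

    # Cohen's kappa corrects raw agreement for agreement expected by chance.
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")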
SOURCES OF ERROR
Examinee (is a human being)
Examiner (is a human being)
Examination (is designed by and for human beings)
RELATIONSHIP BETWEEN VALIDITY & RELIABILITY
Validity and reliability are closely related. A test cannot be considered valid unless the measurements resulting from it are reliable. Likewise, results from a test can be reliable and not necessarily valid.
BACKWASH EFFECT
The backwash (also known as washback) effect refers to the potentially positive and negative effects of test design and content on the form and content of English language training courseware.
THANKS
