Validity Paper

Published in: Education, Technology

  • 1. Assessment Paper: "Validity". Compiled by: Mila Nentinah Falhan Nina Maryana Nizar Ridaus Syamsi. SEKOLAH TINGGI DAN KEGURUAN ILMU PENDIDIKAN PERSATUAN ISLAM, BANDUNG, 2012
  • 2. CHAPTER I: INTRODUCTION
    1.1. Background
    Assessment is a broad and relatively nonrestrictive descriptor for the kinds of testing and measuring that teachers must do; the word embraces diverse kinds of tests and measurements. Teachers who can test well will be better teachers, because effective testing enhances a teacher's instructional effectiveness. Teachers should never assess students without a clear understanding of what decision will be informed by the results of the assessment. That is why educators carry out assessments in the first place, and why assessment has become an important part of education: educators use the results of assessments to make decisions about students.
  • 3. CHAPTER II: VALIDITY
    2.1. A Quest for Defensible Inferences
    Validity refers to the degree to which a test measures what it purports to measure. To make sound decisions, educators must know about their students. The more educators know about their students' status with respect to educationally relevant variables, the better the educational decisions made about those students will be. For example, a teacher who learns that students have already mastered the planned content is apt to decide that the students will tackle more advanced topics than originally planned. Appropriate educational decisions depend on the accuracy of educational assessment: accurate assessments improve the quality of decisions, whereas inaccurate assessments do the opposite.
    When we measure students, we try to sample the contents of an assessment domain in a representative manner so that, based on the students' performance on the sampled content, we can infer what the students' status is with respect to the entire assessment domain. If a test truly measures what it sets out to measure, then the inferences we make about students based on their test performances are likely to be valid. Valid assessments minimize unintended negative consequences. The focus of validity should be on test-based inferences, not on tests themselves.
  • 4. 2.2. Three Varieties of Validity Evidence
    There are three kinds of evidence that can be used to help educators determine whether their score-based inferences are valid. They are:
    1. Content-related evidence of validity
    This first form can be used to support the defensibility of score-based inferences about a student's status with respect to an assessment domain. Content-related evidence of validity (often referred to simply as content validity) refers to the adequacy with which the content of a test represents the content of the assessment domain about which inferences are to be made. The notion of "content" refers to much more than factual knowledge. The content of assessment domains in which educators are interested can embrace knowledge (such as historical facts), skills (such as higher-order thinking competencies), or attitudes (such as students' disposition toward the study of science). Content, therefore, should be conceived of broadly. When we determine the content representativeness of a test, the content in the assessment domain being sampled can consist of whatever is in that domain.
    Let's illustrate the varying degrees to which an assessment domain can be represented by a test. Take a look at Figure 3-2, where you see an illustrative assessment domain (represented by the shaded rectangle) and the items from different tests (represented by the dots). The less adequately the test items coincide
  • 5. with the assessment domain, the weaker is the content-related evidence of validity.
    [Figure 3-2: Varying Degrees to Which a Test's Items Represent the Assessment Domain about Which Score-Based Inferences Are to Be Made. Elements shown: assessment domain, test items. Panels: A. Excellent Representativeness; B. Inadequate Representativeness; C. Inadequate Representativeness.]
    For example, in illustration A of Figure 3-2, we see that the test's items effectively sample the full range of assessment-domain content represented by the shaded rectangle. In illustration B, however, note that some of the test's items don't even coincide with the assessment domain's content, and that those items falling in the assessment domain don't cover it all that well. Even in illustration C, where all the test's items measure content included in the assessment domain, the breadth of coverage for the domain is insufficient.
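The three illustrations can also be described numerically. The sketch below is only an illustration and is not taken from the source text: it assumes every test item has been tagged with a single topic label, and the domain topics, the item lists, and the function name `content_coverage` are all hypothetical. It reports what share of the domain's topics a test touches and what share of its items fall inside the domain at all.

```python
from collections import Counter

# Hypothetical assessment domain: the topics a course actually taught.
ASSESSMENT_DOMAIN = {
    "linear equations", "inequalities", "factoring", "graphing", "word problems",
}

def content_coverage(item_topics, domain):
    """Summarize how well a test's items sample an assessment domain.

    item_topics: list holding one topic label per test item.
    Returns a tuple: (share of domain topics covered,
                      share of items that fall inside the domain,
                      sorted list of domain topics with no item).
    """
    if not item_topics or not domain:
        raise ValueError("need at least one item and one domain topic")
    counts = Counter(item_topics)
    covered = {topic for topic in counts if topic in domain}
    on_domain_items = sum(n for topic, n in counts.items() if topic in domain)
    return (
        len(covered) / len(domain),
        on_domain_items / len(item_topics),
        sorted(domain - covered),
    )

# Hypothetical tests loosely mirroring illustrations A, B, and C.
test_a = ["linear equations", "inequalities", "factoring",
          "graphing", "word problems", "factoring"]
test_b = ["linear equations", "trigonometry", "matrices", "factoring"]
test_c = ["linear equations", "linear equations", "inequalities", "inequalities"]

for name, items in [("A", test_a), ("B", test_b), ("C", test_c)]:
    print(name, content_coverage(items, ASSESSMENT_DOMAIN))
```

Under these assumptions, test A covers every topic, test B has off-domain items and gaps, and test C stays inside the domain but covers only part of it. Topic counts are a crude proxy; they say nothing about how well each topic is sampled.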
  • 6. To put a bit of reality into those rectangles and dots, think about an "Algebra 1" teacher who is trying to measure his students' mastery of a semester's worth of content by creating a truly comprehensive final examination. Based chiefly on students' performances on the final examination, he will assign grades that will influence whether or not his students can advance to Algebra 2. Let's assume the content the teacher addressed instructionally in Algebra 1 (that is, the algebraic skills and knowledge taught during the Algebra 1 course) is truly prerequisite to Algebra 2. Then, if the assessment domain representing the Algebra 1 content is not satisfactorily represented by the teacher's final examination, the teacher's score-based inferences about students' end-of-course algebraic capabilities, and his resultant decisions about students' readiness for Algebra 2, are apt to be in error. If teachers' educational decisions hinge on students' status regarding an assessment domain's content, then those decisions are likely to be flawed if inferences about students' mastery of the domain are based on a test that doesn't adequately represent the domain's content.
    2. Criterion-related evidence of validity
    This kind of evidence helps educators decide how much confidence can be placed in a score-based inference about a student's status with respect to an assessment domain. However, criterion-related evidence of validity is collected only in situations where educators are using an assessment procedure to predict how well students will perform on some subsequent criterion.
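A predictive relationship of this kind is commonly summarized with a correlation coefficient between the earlier test scores and the later criterion. The sketch below is a minimal illustration, not a procedure from the source: the scores, the grade-point averages, and the function name `pearson_r` are hypothetical, and the data are made up purely to show the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired predictor and criterion values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: each position is one student.
aptitude_scores = [400, 450, 500, 550, 600, 650]   # predictor test, taken earlier
later_gpas = [2.1, 2.4, 2.3, 3.0, 3.2, 3.6]        # criterion, earned later

r = pearson_r(aptitude_scores, later_gpas)
print(f"predictor-criterion correlation: {r:.2f}")
```

A coefficient near 1 would support the inference that higher test scores predict higher later performance; a coefficient near 0 would not. With only a handful of invented students, the number here demonstrates the arithmetic and nothing more.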
  • 7. The easiest way to understand what this second kind of validity evidence looks like is to describe the most common educational setting in which it is collected: namely, the relationship between students' scores on (1) an aptitude test and (2) the grades those students subsequently earn. An aptitude test is an assessment device that is used to predict how well an examinee will perform at some later point. For example, many students complete a scholastic aptitude test while they're still in high school. The test is supposed to be predictive of how well those students are apt to perform in college. More specifically, students' scores on the aptitude test are employed to predict students' grade-point averages (GPAs) in college. It is assumed that the students who score well on the aptitude test will earn higher GPAs in college than those students who score poorly on it.
    3. Construct-related evidence of validity
    The way that construct-related evidence is assembled for a test is, in some sense, quite straightforward. First, based on our understanding of how the hypothetical construct that we're measuring works, we make one or more formal hypotheses about students' performances on the test for which we're gathering construct-related evidence of validity. Second, we gather empirical evidence to see whether the hypothesis (or hypotheses) is confirmed. If it is, we have assembled evidence that the test is measuring what it's supposed to be measuring.
  • 8. As a consequence, we are more apt to be able to draw valid score-based inferences when students take the test. There are three types of strategies most commonly used in construct-related evidence studies.
    a. Intervention Studies
    One kind of investigation that provides construct-related evidence of validity is an intervention study. In an intervention study, we hypothesize that students will respond differently to the assessment instrument after having received some type of treatment (or intervention).
    b. Differential-Population Studies
    In this kind of study, based on our knowledge of the construct being measured, we hypothesize that individuals representing distinctly different populations will score differently on the assessment procedure under consideration.
    c. Related-Measures Studies
    In a related-measures study, we hypothesize that a given kind of relationship will be present between students' scores on the assessment device we're scrutinizing and their scores on a related assessment device.
    2.3. Sanctioned and Unsanctioned Forms of Validity
    Validity is the linchpin of educational measurement. However, because validity is such a central notion in educational assessment, some folks have
  • 9. attached specialized meanings to it that, although helpful at some level, also may introduce confusion. One of these is face validity. A test has face validity when its appearance seems to coincide with the use to which the test is being put. Another, more recently introduced variant of validity is known as consequential validity. Consequential validity refers to whether the uses of test results are valid. Consequential validity is a decent way to remind educators of the importance of consequences when tests are used.
    2.4. The Relationship between Reliability and Validity
    A test could, for example, be measuring with remarkable consistency a construct that the test developer never even contemplated measuring. For instance, although the test developer thought that an assessment procedure was measuring students' punctuation skills, what is actually measured may be students' general intellectual ability, which, not surprisingly, splashes over into how well students can punctuate. Consistency alone therefore does not guarantee validity. Inconsistent results, on the other hand, preclude the validity of score-based inferences: evidence of valid score-based inferences almost certainly requires that consistency of measurement is present.
    2.5. What Do Classroom Teachers Really Need to Know About Validity?
    The author (Popham) recommends that, for your more important tests, you devote at least some attention to content-related evidence of validity. He suggests that giving serious thought to the content of an assessment domain being
  • 10. represented by a test is a good first step. Reviewing your test's content is also an effective way to help make sure that your classroom tests satisfactorily represent the content you are trying to promote, and that your score-based inferences about your students' status are not miles off the mark.
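The construct-related strategies described in section 2.2 each come down to stating a hypothesis and then checking it against scores. As a minimal sketch, with hypothetical data and function names and with descriptive differences only (no significance testing), an intervention study and a differential-population study might be checked like this:

```python
from statistics import mean

def intervention_gain(pre_scores, post_scores):
    """Mean post-minus-pre gain for the same students, in the same order."""
    if not pre_scores or len(pre_scores) != len(post_scores):
        raise ValueError("need matching, non-empty pre and post score lists")
    return mean([post - pre for pre, post in zip(pre_scores, post_scores)])

def population_gap(group_a_scores, group_b_scores):
    """Difference between the mean scores of two distinct populations."""
    if not group_a_scores or not group_b_scores:
        raise ValueError("both groups need at least one score")
    return mean(group_a_scores) - mean(group_b_scores)

# Intervention study (hypothetical): the same four students are tested
# before and after instruction; the hypothesis is that scores rise.
pre = [52, 60, 48, 70]
post = [65, 72, 60, 78]
print("mean gain after instruction:", intervention_gain(pre, post))

# Differential-population study (hypothetical): students who completed the
# course are hypothesized to outscore students who never took it.
completed_course = [80, 85, 90]
never_took_course = [60, 65, 70]
print("gap between populations:", population_gap(completed_course, never_took_course))
```

If the gain or the gap points in the hypothesized direction, that is one piece of construct-related evidence; if not, either the test or the understanding of the construct needs another look. A related-measures study would instead check whether scores on the two instruments are related in the way that was hypothesized.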
  • 11. CHAPTER III
    3.1. Conclusion
    Validity refers to the degree to which a test measures what it purports to measure. There are three kinds of evidence that can be used to help educators determine whether their score-based inferences are valid: content-related evidence of validity, criterion-related evidence of validity, and construct-related evidence of validity.
    Source:
  • 12. Popham, W. James. Classroom Assessment: What Teachers Need to Know. Allyn and Bacon.