Your SlideShare is downloading. ×
0
Chapter FourteenReading Guide
Measurement and assessment Measurement: an evaluation expressed  in quantitative (numerical) terms Assessment: procedure...
Assessment     Assessment includes        many different      possible activities,        not just tests!But, since tests ...
Standardized tests               Assessment instruments given to large                 samples of students under uniform ...
Norm-referenced testing: testing in which scores are compared to           guys named “Norm.” Just kidding. Testing in whi...
Criterion-referenced testinterpretations Testing in which scores are compared to  a set performance standard. The test y...
Descriptive statistics Frequency distributions Measures of central tendency Measures of variability Normal distribution
This                   Frequency distribution                                       arrangement                           ...
Histogram              5            4.5              4            3.5              3            2.5                       ...
Thinking about the scores Clearly no one had mastery of the material—out  of a 50 point test, no one answered all  questi...
Measures of Central Tendency                Mean: average score                Median: middle score                Mode...
More about measures of    central tendencyImagine a classroom that   So how come wehas one genius and 20      can’t just u...
And another thing….                               X                           X                               X           ...
Measures of variability Range: the distance between the top  and bottom score in a distribution of  scores. Standard dev...
Standard deviationWho is the better shot? Standard deviation could tell us because it is essentially theaverage deviation ...
The Normal DistributionImagine the SAT scores for the millions of people who take it. Some people doextremely well. Some d...
Normal distributionThere is an interesting relationship between the percent of people with a score andstandard deviation (...
Normal distributionStatisticians have taken advantage of this relationship between average and standarddeviation in the de...
Interpreting Standardized TestResults Raw scores Percentiles Stanines Grade equivalents Standard scores
Raw score The number of items an individual answered  correctly on a standardized test or subtest. This is a number that...
Percentiles Percentile or percentile rank: a ranking that compares an  individual’s score with the scores of all the othe...
Grade equivalents A score that is determined by comparing  an individual’s score on a standardized  test to the scores of...
Grade equivalents When a student scores above grade level on a test, that does not mean the student is ready for higher l...
Standard scoreImaginevarious            A description of performance on ascores of this      standardized test that uses ...
More about standard scores
Stanines A stanine (standard nine) is a description of an  individual’s standardized test performance that  uses a scale ...
Stanines
Interpreting test scores Reliability Error Confidence interval Validity Absence of bias
Consistency of test results            ReliabilityHow reliable is your bathroom scale? If it reads 100 pounds, then 75 pou...
Standard error of               measurement                True score: the hypothetical average of                 an ind...
Validity An evaluation of the adequacy and  appropriateness of the interpretations  and uses of assessment results. Focu...
Content validity A test’s ability to representatively sample the  content that is taught and measure the extent to  which...
Predictive validity An indicator of a test’s ability to gauge  future performance. You find this by correlating the test...
Construct validity An indicator of the logical connection between a  test and what it is designed to measure. A reading ...
Bias Content Testing procedures Test use
Cultural minorities and high-        stakes tests         Tests can be harmful to people from           cultural minoriti...
Content Content tends to be biased toward white  middle class students. For example, the 4th grade proficiency  test wri...
Procedures Students’ cultures can affect their  response to the whole procedure of  testing. Time limits can be unfair t...
Test use Test results can be used to discriminate  against groups of students.
Eliminating bias Examine test content and analyze the  results. For example, if most students  get a certain answer wrong...
Creating bias-free tests Culture-fair/culture-free test: a test  without cultural bias. This is very difficult to create
Purposes of standardizedtesting Student assessment: how students in one    classroom compare to students across the    na...
Types of Standardized Tests Achievement Diagnostic Intelligence Aptitude
Achievement                  Determining the extent to which studentsWhatachievement        have mastered a content areat...
Diagnostic tests                Usually given individually                Usually have more subtests and                ...
Aptitude tests Standardized tests designed to predict  the potential for future learning and  measure general abilities d...
Intelligence tests A type of aptitude test Standardized tests designed to measure  an individual’s ability to acquire  k...
Teacher’s job Make sure the test content matches  learning goals Prepare students for the test Administer tests accordi...
Issues in Standardized Testing Accountability Testing teachers
Accountability                                Accountability: the process ofAdequate yearly progress:         requiring s...
Accountability When you have high stakes testing  (where the results of the test determine  something major about a perso...
Standardized testing withalternative formats These are time-consuming to score but  they address critics’ concerns that  ...
Implications for teachers You will have to deal with standardized testing  —it’s here to stay. You will need to know mor...
New directions in assessment Authentic assessment: measurement of  important abilities using procedures that  simulate th...
Praxis and new directions         Praxis II uses constructed responses         Praxis III is supposed to be authentic   ...
Vocabulary               Confi-                                            Standa Accoun-                   Grade         ...
Upcoming SlideShare
Loading in...5
×

Standardized tests

8,643

Published on

Published in: Education, Technology
0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
8,643
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
237
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Transcript of "Standardized tests"

  1. 1. Chapter FourteenReading Guide
  2. 2. Measurement and assessment Measurement: an evaluation expressed in quantitative (numerical) terms Assessment: procedures used to obtain information about student performance
  3. 3. Assessment Assessment includes many different possible activities, not just tests!But, since tests are a large part of current assessment practices, this wholechapter is dedicated to them.
  4. 4. Standardized tests  Assessment instruments given to large samples of students under uniform conditions and scored according to uniform proceduresFor example: ACT, SAT, GRE, Praxis, and many of the achievement tests you took inschool.
  5. 5. Norm-referenced testing: testing in which scores are compared to guys named “Norm.” Just kidding. Testing in which scores are compared with the average performance of others. Norms  An individual’s score on a standardized test is compared to some kind of “normal” group.  Norming group: the representative group of individuals whose standardized test scores are compiled for the purpose of constructing national norms.  National norms: scores on standardized tests earned by representative groups of students from around the nation to which an individual’s score is compared.Your score on achievement tests such as SAT or ACT wascompared to the scores of the norming group.
  6. 6. Criterion-referenced testinterpretations Testing in which scores are compared to a set performance standard. The test you took to get a driver’s license was criterion-referenced. Criterion-referenced tests measure mastery of concepts or skills.
  7. 7. Descriptive statistics Frequency distributions Measures of central tendency Measures of variability Normal distribution
  8. 8. This Frequency distribution arrangement doesn’t tell Scores on 50 point test: you much 48, 47, 46, 45, 45, 44, 44, 44, 44, 44, 43, 43, This 43, 43, 42, 42, 42, 42, 41, 41, 41, 40, 40, 39, arrangement 39, 38, 38, 37, 36, 35 shows you X more about what X X X happened. Remember X X X X the general shape that is X X X X X X X X X defined bythe x’s. It will X X X X X X X X X X X X X Xbe important. 35 36 37 38 39 40 41 42 43 44 45 46 47 48 A frequency distribution is a distribution of test scores that shows a simple count of the number of people who obtained each score.
  9. 9. Histogram 5 4.5 4 3.5 3 2.5 SCORES 2 1.5 1 0.5 0 35 37 39 41 43 45 47The same frequency distribution can be represented by a histogram: a bargraph ofa frequency distribution.By the way, if you were a teacher, what would you think of this group of scores?
  10. 10. Thinking about the scores Clearly no one had mastery of the material—out of a 50 point test, no one answered all questions correctly. So, the next question is, does this have something to do with the nature of the students, the nature of the teaching that went on, or the nature of the assessment? Why did so few students do well on this test? An assessment is not just information gathered about students; it also represents something about teaching and potentially something about the assessment procedures themselves.
  11. 11. Measures of Central Tendency  Mean: average score  Median: middle score  Mode: most frequent scoreMeasures of central tendency are quantitative descriptions of how agroup performed as a whole.
  12. 12. More about measures of central tendencyImagine a classroom that So how come wehas one genius and 20 can’t just use theordinary people. The average?genius’s score willartificially raise theaverage of the class.Median and mode are notas affected by a single“outlying” score.
  13. 13. And another thing…. X X X X X X X X X X X X X X X X X X X X X X X X x X X x X X X X X X X 60 61 62 63 64 65 66 67 68 69 70 71You might have a frequency distribution that looks like this. For example, if youmeasured the heights of a group of men and women, you would have a “bump” for theaverage height of men and another “bump” for the height of the women. This situationwould mean that you have two modes (63 and 68 inches in this case) and yourcollection of numbers would be “bimodal.” Keep in mind that “average” for the wholegroup in this case would be somewhere between the men and the women. The averagewouldn’t really represent what is going on here very well.
  14. 14. Measures of variability Range: the distance between the top and bottom score in a distribution of scores. Standard deviation: a statistical measure of the spread of scores Variability: degree of difference from mean.
  15. 15. Standard deviationWho is the better shot? Standard deviation could tell us because it is essentially theaverage deviation from the norm. We would measure the distance between eachhold and the center of the target and then average those distances to come up withthe standard deviation. The archer with the lowest standard deviation would be thebetter archer.
  16. 16. The Normal DistributionImagine the SAT scores for the millions of people who take it. Some people doextremely well. Some do extremely poorly. Most do average. If you graphed the SATscores of everyone who takes it the way we graphed scores earlier in this presentation(with the X’s), you would get a shape like this. This is called the “bell curve” or the“normal distribution.”Normal distribution is a distribution of scores in which the mean, median, and modeare equal and the scores distribute themselves symmetrically in a bell-shaped curve.
  17. 17. Normal distributionThere is an interesting relationship between the percent of people with a score andstandard deviation (the average deviation from the average score). 68% (blue on thisgraph) of the test takers fall within one standard deviation of the norm. This is true forany measure that yields a normal distribution (height of women or height of men, IQ,ACT, SAT, number of pushups people can do, etc.).
  18. 18. Normal distributionStatisticians have taken advantage of this relationship between average and standarddeviation in the development of standardized test scoring.
  19. 19. Interpreting Standardized TestResults Raw scores Percentiles Stanines Grade equivalents Standard scores
  20. 20. Raw score The number of items an individual answered correctly on a standardized test or subtest. This is a number that has not been interpreted. The statistical procedures which follow allow us to interpret a raw score in relation to other people who have taken the test. You cannot make any interpretation of a simple raw score—you need to know the information that follows in order to interpret the score.
  21. 21. Percentiles Percentile or percentile rank: a ranking that compares an individual’s score with the scores of all the others who have taken the test. Your percentile rank tells you how you did in relation to others. A percentile rank of 80 means you did as well or better than 80% of those who took the test. Percentile is not the same as percentage (the number of correct out of the total number of items). Percentile is a comparison of people. Percentile bands are ranges of percentile scores on standardized tests. They allow for the fact that test scores are an estimation rather than a perfect measure.
  22. 22. Grade equivalents A score that is determined by comparing an individual’s score on a standardized test to the scores of students in a particular age group; the first digit represents the grade and the second the month of that school year. A first grader who has the grade equivalent of 2.4 on a reading test has the same score as the AVERAGE 2nd grader in the fourth month of school.
  23. 23. Grade equivalents When a student scores above grade level on a test, that does not mean the student is ready for higher level work. It simply means that the student has mastered the work at his/her grade level.
  24. 24. Standard scoreImaginevarious  A description of performance on ascores of this standardized test that uses standardsort and thengo back to the deviation as a basic unit.chart thatshows you  Z-score: the number of standardthe deviation units from the mean (average).percentage ofpeople in  T-score: standard score that defines 50relation tostandard as the mean and a standard deviation asdeviation. 10.That will showyou the utilityof thisconcept.
  25. 25. More about standard scores
  26. 26. Stanines A stanine (standard nine) is a description of an individual’s standardized test performance that uses a scale ranging from 1-9 points. These 9 points are spread equitably across the range of the bell curve and are standardized in terms of how many standard deviations they cover. While stanines are a simple way to report scores and they reflect the fact that students who score closely in percentile ranks are really not much different, they also may be overly reductive—they reduce a whole test to a single digit.
  27. 27. Stanines
  28. 28. Interpreting test scores Reliability Error Confidence interval Validity Absence of bias
  29. 29. Consistency of test results ReliabilityHow reliable is your bathroom scale? If it reads 100 pounds, then 75 pounds, then105 pounds for the same object, then it’s not reliable.
  30. 30. Standard error of measurement  True score: the hypothetical average of an individual’s scores if repeated testing under ideal conditions were possible.  Standard error of measurement: the range of scores within which an individual’s true score is likely to fall. (also called confidence interval, score band, or profile band).This is how statisticians get around the idea that no test situation isperfect.
  31. 31. Validity An evaluation of the adequacy and appropriateness of the interpretations and uses of assessment results. Focuses on USE of test results, not the test itself. Three types: content, predictive, and construct
  32. 32. Content validity A test’s ability to representatively sample the content that is taught and measure the extent to which learners understand it. In other words, how close is the test to what was actually taught? That is its content validity. If it is not close to the curriculum, then its results do not measure how much students understood the curriculum and it is not appropriate to use it for that purpose.
  33. 33. Predictive validity An indicator of a test’s ability to gauge future performance. You find this by correlating the test score with grades. For example, the extent to which the SAT predicts college grades is .42 (which is not very high—a perfect correlation is 1.0).
  34. 34. Construct validity An indicator of the logical connection between a test and what it is designed to measure. A reading test that has students actually read a passage has construct validity because of the logical connection between the every day task (reading) and the process on the test (reading and responding to questions about a text). A reading test does not typically assess mathematical knowledge, and would not have construct validity for that purpose.
  35. 35. Bias Content Testing procedures Test use
  36. 36. Cultural minorities and high- stakes tests  Tests can be harmful to people from cultural minorities in the US since people from many of these cultures tend to score lower than non-minority students.Assessment bias: qualities of an assessment instrument that offend orunfairly penalize a group of students because of the students’ gender,economic class, race, ethnicity, etc.
  37. 37. Content Content tends to be biased toward white middle class students. For example, the 4th grade proficiency test writing prompt that asked students to write about going camping. Students without this kind of experience would not be able to write well about this topic.
  38. 38. Procedures Students’ cultures can affect their response to the whole procedure of testing. Time limits can be unfair to students for whom English is a foreign language or for students with learning disabilities.
  39. 39. Test use Test results can be used to discriminate against groups of students.
  40. 40. Eliminating bias Examine test content and analyze the results. For example, if most students get a certain answer wrong, they were probably not taught that concept. Adapt testing procedures if possible and teach students about the test. Use more than just standardized tests for making decisions—use alternative assessment data as well.
  41. 41. Creating bias-free tests Culture-fair/culture-free test: a test without cultural bias. This is very difficult to create
  42. 42. Purposes of standardizedtesting Student assessment: how students in one classroom compare to students across the nation (or even around the world) Diagnosis: a student’s specific strengths and weaknesses Selection and placement: standardized tests may determine whether or not a student is invited to take advanced classes Program evaluation: how students in a particular school or program compare with students across the nation Accountability: how students of a particular teacher score on a test.
  43. 43. Types of Standardized Tests Achievement Diagnostic Intelligence Aptitude
  44. 44. Achievement  Determining the extent to which studentsWhatachievement have mastered a content areatests have you  Comparing the performance of studentstaken? with others across the country  Tracking student progress over time  Determining if students have the background knowledge to begin instruction in particular areas  Identifying learning problems Achievement tests are designed to measure and communicate how much students have learned in specified content areas.
  45. 45. Diagnostic tests  Usually given individually  Usually have more subtests and measure knowledge of a particular area in detail  Provide information that teachers can use in order to instruct the child or address weaknessesDiagnostic tests are designed to provide a detailed description of learners’ strengthsand weaknesses in specified skill areas.
  46. 46. Aptitude tests Standardized tests designed to predict the potential for future learning and measure general abilities developed over long periods of time. Different from intelligence tests because aptitude is just one aspect of intelligence. ACT and SAT are aptitude tests (measuring your aptitude for college- level work).
  47. 47. Intelligence tests A type of aptitude test Standardized tests designed to measure an individual’s ability to acquire knowledge, capacity to think and reason in the abstract, and ability to solve novel problems. Individually administered tests are typically more accurate than group- administered intelligence tests.
  48. 48. Teacher’s job Make sure the test content matches learning goals Prepare students for the test Administer tests according to instructions Communicate results to students and their caregivers so their results can be used for educational decision-making
  49. 49. Issues in Standardized Testing Accountability Testing teachers
  50. 50. Accountability  Accountability: the process ofAdequate yearly progress: requiring students to demonstrateobjectives for yearly that they have met specifiedimprovement for all studentsand for specific groups, such standards and holding teachersas students from major ethnic responsible for students’and racial groups, students performance.with disabilities, studentsfrom low-income families, and  High stakes tests: standardizedstudents whose English is tests designed to measure thelimited. extent to which standards are being met. (minimum competency testing).Standards-based education: the process of focusing curricula and instructionon pre-determined goals.
  51. 51. Accountability When you have high stakes testing (where the results of the test determine something major about a person’s life such as whether or not he/she graduates), teachers often “teach to the test.” Instruction becomes boring and students with a low tolerance for busy work struggle. Some say these tests help us to know which schools are effective.
  52. 52. Standardized testing withalternative formats These are time-consuming to score but they address critics’ concerns that multiple choice formats are a limited way to know what a student knows.
  53. 53. Implications for teachers You will have to deal with standardized testing —it’s here to stay. You will need to know more content knowledge as well as about how children learn. You need to know about how to interpret standardized tests. It’s possible to find a creative way to deal with standardized testing (see p. 544 in your book for an example).
  54. 54. New directions in assessment Authentic assessment: measurement of important abilities using procedures that simulate the application of these abilities to real-life problems. Constructed response formats: assessment procedures that require the student to create an answer instead of selecting an answer from a set of choices.
  55. 55. Praxis and new directions  Praxis II uses constructed responses  Praxis III is supposed to be authentic assessment (you actually teach a lesson while being observed).These practices are not without their critics—you don’t have a lot of time todo the constructed responses in Praxis II and the pass rate for Praxis III isvery high—perhaps the universities are doing a good job of preparing pre-service teachers after all…
  56. 56. Vocabulary Confi- Standa Accoun- Grade Percentile dence Median r-dized Validity tability equivalent bands interval tests Achiev Con- High Predictive Standard e-ment struct Stakes Mode Variability validity score tests validity test Stan-Adequate Con- dards- National yearly structed Histogram Range based Z-score norms progress responses edu- cation Culture Normal Aptitude -fair/ Intelligence distribu Reliability Stanine tests culture- tests -tion free test Stan- Assess- Diagnostic Norming dard ment Mean True score tests group de- bias viation Fre- Measure Standard Authentic quency s of error of Percentile T-scoreassessment distribu central measure- -tion tendency ment
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×