Scientific and Economic Value of the Metrological Point of View            William P. Fisher, Jr.        University of Cal...
Fisher2012 Jiaxing China Keynote PROMS
Keynote presentation given at the Pacific Rim Objective Measurement Symposium in Jiaxing China August 2012

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
157
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Ni hao ("nee-how", draw out the "ow"): hello. Ni hao ma: how are you? Wo hen hao: I’m very good. Ni ne: and you? Wo ye hen hao: I’m also very good. Xiexie: thank you. Bu keqi: you’re welcome. Zaijian: goodbye.
  • Both science and commerce flourish when information is communicated efficiently at low cost.
  • There are, of course, a great many problems associated with the efficient markets hypothesis. Many of them stem from the restricted scope in which the hypothesis is applied, so that various kinds of social costs affecting labor, communities, and the environment are pushed out of the market and onto society at large. This process of externalization might be countered if more efficient market functions were created for human, social, and natural capital.
  • When making major investments that are costly and that have long term consequences, we want more information, and we want it to be high quality information. Education is a major investment of this kind. Unfortunately, information on the quality of its products and services is not readily available, is not of very good quality, and is itself very expensive.
  • So that is the context in which I would like to describe for you today three different points of view on measurement.
  • But numbers do not in themselves stand for anything. This becomes readily apparent as soon as we want to compare scores from different tests.
  • Scores from different tests are not comparable, and so it is impossible to know from the information given if A or Z, or School 1 or 2, has greater reading ability. If School 2’s tests are harder, then perhaps Z reads better than A, but if School 1’s tests are harder, perhaps School 1 reads better than School 2. For numbers to have their obvious and natural meanings, a lot of work has to go into making them comparable.
  • As leaves fall from trees in the autumn they drift and blow with the wind, landing where they will, and decaying. Test scores for students and items in True Score Theory are like autumn leaves. Scores are not organized into a common frame of reference and so they are not comparable across tests. The scores accumulate and take up space but are of less and less value as time passes. Further, items also decay in a sense: they cannot be re-used, as students are likely to remember them and may share them with others who would obtain an unfair advantage.
  • Answers to the questions unanswered by True Score Theory can be determined in the context of measurement theory if test items are administered from a common bank, or if two tests are linked with common items and the data are analyzed concurrently. If measures are not estimated in a larger framework informed by theory and evidence, however, questions about long term outcomes may be unanswerable.
  • There is, however, no necessary, legally binding, or scientifically required connection between tests administered in different schools or work places. In real life, these questions are usually as unanswerable in the context of Measurement Theory as they are in True Score Theory.
  • Children, artists, and botanists may collect leaves and use them in creative ways to express themselves or to teach. Measures for students and items in Measurement Theory can be like carefully crafted works of art when the trouble is taken to understand what one is measuring and to use rigorous methods. Much depends, however, on the skills of the artists involved in crafting the test items, administering the tests, analyzing the data, and interpreting the results.
  • Answers to the questions unanswered by True Score Theory and answered by Measurement Theory are answered again in the context of metrologically traceable measures. The difference is that the answers are obtained even when test items are not administered from a common bank, and even when two tests are not linked with common items and no data are analyzed.
  • Forgoing the time and expense of tests by embedding assessments within online reading assignments makes it easier to track growth over time. The overall growth trends for students globally, nationally, regionally, and locally could also be displayed in this same format. Information of this kind is essential to the benchmarking and quality improvement methods that have so remarkably succeeded in improving value at lower cost in other fields.
  • As is the case for virtually everything bought and sold in stores, educational outcomes ought to be universally expressed in uniform measures. Measures made in different schools should be traceable to reference standards, and that traceability should be made necessary, legally binding, and scientifically required. In real life, these questions are usually as unanswerable in the context of Measurement Theory as they are in True Score Theory, but instituting metrological traceability requirements would make the answers available to everyone, everywhere, all the time.
  • After all, by definition, some people will pay a lot more per unit for a lesser gain and others will pay a lot less for a greater gain. And how many things are bought and sold at their average quality, volumes or prices, anyway?
  • With only two time points, individualized year-to-year gain measures may be highly variable and unreliable.
  • …with the high stakes end of year test, but if we also have week-to-week measures from across the school year…
  • …then we will be able to use this low-cost, high-quality information to inform our purchasing decision…
  • Within a school or district, a standard per-unit gain price might be set. But customers would be able to compare prices to seek out the lowest cost per unit gain. And teachers, principals, and researchers will be able to study outcomes in a common language across classrooms, schools, districts, countries, grades, years, etc.
  • Questions raised by this comparison: Why does Classroom C (at the top) make such a small gain, even after adjusting for variation in at-risk profiles, and over the summer lose nearly all of the small gain that was made? What is happening in Classroom B that is not happening in Classroom C? Why do the measures drop in Classroom B in April? Spring fever? Can anything be done to maintain gains over the summer months of June to August?
  • Within a school or district, a standard per-unit gain price might be set. But customers would be able to compare prices to seek out the lowest cost per unit gain. And teachers, principals, and researchers will be able to study outcomes in a common language across classrooms, schools, districts, countries, grades, years, etc. If these measures are adjusted for differences in at-risk profiles, then this kind of natural variation provides a ready framework for experimental comparisons of possible causal relations. First thing to find out is what’s going on in School A. Then, what is School C doing that gives it such an edge in reading outcomes over School B, and at lower cost? Finally, again, what can be done about that summer slump?
  • Stakeholder participation and involvement are key in every area.
  • fēnpī -- scattered; mixed and disorganized (pronounced "fun-pee"). As leaves fall from trees in the autumn they drift and blow with the wind, landing where they will, and decaying. Test scores for students and items in True Score Theory are like autumn leaves. Scores are not organized into a common frame of reference and so they are not comparable across tests. The scores accumulate and take up space but are of less and less value as time passes. Further, items also decay in a sense: they cannot be re-used, as students are likely to remember them and may share them with others who would obtain an unfair advantage.
  • yìshù -- art (pronounced "yee-shu"). Children, artists, and botanists may collect leaves and use them in creative ways to express themselves or to teach. Measures for students and items in Measurement Theory can be like carefully crafted works of art when the trouble is taken to understand what one is measuring and to use rigorous methods. Much depends, however, on the skills of the artists involved in crafting the test items, administering the tests, analyzing the data, and interpreting the results.
  • fāzhǎn -- development; growth; to develop; to grow; to expand  
  • Xiexie -- thank you (pronounced "xshee-ay xshee-ay"; say it fast, clipped)

    1. Scientific and Economic Value of the Metrological Point of View
       William P. Fisher, Jr., University of California, Berkeley
       Pacific Rim Objective Measurement Symposium, 6-9 August 2012, Jiaxing, China
    2. Overview
       • Some basic economic principles shared by science and commerce
       • Three points of view on measurement in education
       • The kinds of markets created by the three approaches to measurement
       • A plan for the future
    3. Economic Principles Shared By Science and Commerce
       • Separate local economies
         – Different currencies
         – Different weights and measures
         – Higher costs of exchange
         – Less efficient, harder to compare values
       • Unified regional and global economies
         – Same currency
         – Same weights and measures
         – Lower costs of exchange
         – More efficient, easier to compare values
    4. Example 1 of Scientific Market
       • Biochemistry
         – Equipment calibrated in universal reference standard metrics
         – Test results always reported in common units
         – Measures available on the spot
         – Easy to coordinate research across labs
         – Result: SARS virus sequenced in weeks by network of labs, vaccine successfully synthesized
    5. Example 2 of a Scientific Market
       • Custom tailored suits
         – Tape measures calibrated in universal reference standard metric
         – Results always reported in common units
         – Measures available on the spot
         – Easy to coordinate across tailors
         – Result: measures can be sent around the world and a well fitting suit obtained with little trouble
    6. Example 3 of Scientific Market
       • Education
         – Tests typically not calibrated at all
         – If they are calibrated, they are in local units
         – Test results are usually reported in unique units
         – Measures available only after costly data analysis
         – Very difficult to compare outcomes outside of special contexts
         – Result: Improvement efforts repeatedly fail, quality uncontrolled, costs spiral higher
    7. The Ideal Efficient Market
       • Cost of estimating value is very low
       • Cost of comparing value for price is very low
       • Supply and demand easily match up
       • Low value for price: cannot compete
       • High value for price: rewarded
       • Improved value easy to recognize
       • Improved value pushes out old value
    8. Basic Economics (diagram)
       • Axis: readily available high quality information on product or service, from High Cost to Low Cost
       • Customer Quality-Seeking: from "Hard for customers to find quality" to "Easy for customers to find quality"
       • Market Efficiency: from "Hard to match supply and demand" to "Easy to match supply and demand"
       • Quality Improvement: from "Hard to know how to improve quality" to "Easy to know how to improve quality"
    9. Three Points of View on How to Present Information on Educational Outcomes
       • True Score Theory
       • Measurement Theory
       • Metrological Traceability
    10. True Score Theory: Disconnected Scores and Tests
        • School 1
          – Student A has a score of 22 on a reading test.
          – This classroom averages a score of 24.
        • School 2
          – Student Z has a score of 18 on a reading test.
          – This classroom averages a score of 26.
    11. True Score Theory: Disconnected Scores and Tests
        • Who has more reading ability, A or Z? ??
        • What can one student read that the other cannot? ??
        • Which classroom reads better on average? ??
        • Which student is more on track for college readiness? ??
    12. True Score Theory: Disconnected Scores and Tests
        • School 1
          – Student A’s reading scores on 2 tests are 22 & 32.
          – The classroom average score goes from 24 to 30.
        • School 2
          – Student Z’s reading scores on 2 tests are 18 & 32.
          – The classroom average score goes from 26 to 40.
    13. True Score Theory: Disconnected Scores and Tests
        • Who gained more in reading ability, A or Z? ??
        • What new texts can A and Z read? ??
        • Which classroom improves more? ??
        • Are both students on track for college readiness? ??
        • Result:
          – Very high cost, almost useless information
    14. Disorganized, uncontrolled, decaying
    15. Measurement Theory: Connected Measures and Tests
        • School 1
          – Student A has a measure of 22 (+/- 2) on a reading test.
          – This classroom averages a measure of 24 (+/- 1).
        • School 2
          – Student Z has a measure of 18 (+/- 2) on a reading test.
          – This classroom averages a measure of 26 (+/- 1).
    16. Measurement Theory: Connected Measures and Tests
        • Who has more reading ability, A or Z? A
        • What can one student read that the other cannot?
          – Text with measures between 18 and 22.
        • Which classroom reads better on average? 2
        • Which student is more on track for college readiness? ??
    17. Measurement Theory: Connected Measures and Tests
        • School 1
          – Student A’s measures on 2 tests are 22 & 32 (+/- 2).
          – The classroom average goes from 24 to 30 (+/- 1).
        • School 2
          – Student Z’s measures on 2 tests are 18 & 32 (+/- 2).
          – The classroom average goes from 26 to 40 (+/- 1).
    18. Measurement Theory: Connected Measures and Tests
        • Who gained more in reading ability, A or Z? Z
        • What new texts can Z read?
          – Those with measures between 18 and 32.
        • Which classroom improves more? 2
        • Are both students on track for college readiness? ??
        • Result:
          – Very high cost, incomplete, but useful information
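
The comparisons on slides 15 to 18 only make sense because the measures share a unit and carry an uncertainty. A minimal sketch of that arithmetic follows. It assumes the "+/-" values are independent standard errors, which the slides do not state, and the function name `compare` is hypothetical.

```python
import math


def compare(measure_a, error_a, measure_b, error_b):
    """Return (a - b, combined uncertainty of the difference).

    Assumption: the two +/- values are independent standard errors,
    so they combine as the square root of the sum of squares.
    """
    difference = measure_a - measure_b
    combined_error = math.sqrt(error_a ** 2 + error_b ** 2)
    return difference, combined_error


# Slide 15: Student A = 22 (+/- 2), Student Z = 18 (+/- 2)
diff, err = compare(22, 2, 18, 2)
print(f"A - Z = {diff} (+/- {err:.2f})")

# Slide 17: each student's gain between the two tests
gain_a, gain_a_err = compare(32, 2, 22, 2)
gain_z, gain_z_err = compare(32, 2, 18, 2)
print(f"Gain A = {gain_a}, gain Z = {gain_z}")
```

With scores from unlinked tests (slides 10 to 13), the same subtraction could be carried out but the result would have no interpretable unit.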
    19. Organized, expressive, preserved
    20. Metrologically Traceable Measures
        • School 1
          – Student A’s measure (22, +/- 2) is inferred when 73% of the items built into a reading assignment targeted at 22 are answered correctly.
          – This classroom averages a measure of 24 (+/- 1).
        • School 2
          – Student Z’s measure (18, +/- 2) is inferred when 76% of the items built into a reading assignment targeted at 18 are answered correctly.
          – This classroom averages a measure of 26 (+/- 1).
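
Slide 20 does not say which model links the percentage of items answered correctly to a measure. In the Rasch tradition that the symposium is devoted to, the usual starting point is the dichotomous Rasch model, given here as an assumption and not as the author's stated method. The symbols are introduced for illustration: \theta_n is the person measure, \delta_i is the item calibration, and p is the expected proportion correct on items sharing that calibration. The units of the 22 and 18 on the slide are not specified, so converting the logit result below into those units would need a scaling rule the slides do not supply.

```latex
P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\qquad\Longrightarrow\qquad
\theta_n = \delta_i + \ln\frac{p}{1 - p}
```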
    21. Metrologically Traceable Measures
        • Who has more reading ability, A or Z? A
        • What can one student read that the other cannot?
          – Text with measures between 18 and 22.
        • Which classroom reads better on average? 2
        • Is one student more on track for college readiness? Yes, A
    22. Metrologically Traceable: Connected Measures and Tests
        • School 1
          – Student A’s measures on 2 tests are 22 & 32 (+/- 2).
          – The classroom average goes from 24 to 30 (+/- 1).
        • School 2
          – Student Z’s measures on 2 tests are 18 & 32 (+/- 2).
          – The classroom average goes from 26 to 40 (+/- 1).
    23. Metrologically Traceable: Connected Measures and Tests
        • Who gained more in reading ability, A or Z? Z
        • What new texts can Z read?
          – Those with measures between 18 and 32.
        • Which classroom improves more? 2
        • Are both students on track for college readiness? No, but A is
        • Result:
          – Very low cost, complete and useful information
    24. Coordinated, harmonized, growing
    25. What to choose? True Score Theory Economics (simulated data)
        • School 1
          – Average Grade 7 End of Year Teacher’s Quiz Reading Score = 89%
          – Average gain in 7th grade reading as measured by in-class quizzes and tests: ??
          – Annual tuition = US$5,000
          – Cost of average gain in reading scores = US$??
        • School 2
          – Average Grade 7 End of Year Teacher’s Quiz Reading Score = 94%
          – Average gain in 7th grade reading as measured by in-class quizzes and tests: ??
          – Annual tuition = US$1,000
          – Cost of average gain in reading scores = US$??
        • Not enough information to decide!
    26. What to choose? Measurement Theory Economics (simulated data)
        • School 1
          – Average Grade 7 End of Year Statewide Reading Measure = 32 (+/- 6)
          – Adjusted average gain in 7th grade reading measures = 10 (+/- 4)
          – Cost of adjusted average gain in reading measures = US$5,000.00
        • School 2 (best buy)
          – Average Grade 7 End of Year Statewide Reading Measure = 34 (+/- 5)
          – Adjusted average gain in 7th grade reading measures = 11 (+/- 3)
          – Cost of adjusted average gain in reading measures = US$1,000.00
        • But do you really want to buy the average gain?
    27. What to choose? Measurement Theory Economics
        • My 7th grader’s gain
          – US$1,000 for 6 units
          – US$166.67 per unit gain
        • Your 7th grader’s gain
          – US$1,000 for 9 units
          – US$111.11 per unit gain
        • 50% greater cost!
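
The per-unit figures on slide 27 follow from dividing tuition by the gain in measure units. A short sketch that reproduces them is below. The function name `cost_per_unit_gain` is hypothetical, and the inputs are the slide's simulated values.

```python
def cost_per_unit_gain(tuition, gain_units):
    """Tuition divided by the gain in measure units."""
    if gain_units <= 0:
        raise ValueError("a per-unit cost needs a positive gain")
    return tuition / gain_units


# Simulated values from slide 27: same tuition, different gains
mine = cost_per_unit_gain(1000, 6)
yours = cost_per_unit_gain(1000, 9)
extra = (mine / yours - 1) * 100  # how much more I pay per unit, in percent

print(f"Mine:  US${mine:.2f} per unit gain")
print(f"Yours: US${yours:.2f} per unit gain")
print(f"I pay {extra:.0f}% more per unit")
```

The division is only meaningful because both gains are expressed in the same unit. With the unlinked scores of slide 25 the denominator is unknown, which is why that slide shows "US$??".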
    28. What to choose? Measurement Theory Economics (figure: Reading Ability Scale)
    29. What to choose? Metrology Economics (simulated data)
        • School 1
          – Average Grade 7 End of Year Statewide Reading Measure = 32 (+/- 6)
          – Adjusted average gain in 7th grade reading measures = 10 (+/- 4)
          – Cost of adjusted average gain in reading measures = US$5,000.00
        • School 2 (best buy)
          – Average Grade 7 End of Year Statewide Reading Measure = 34 (+/- 5)
          – Adjusted average gain in 7th grade reading measures = 11 (+/- 3)
          – Cost of adjusted average gain in reading measures = US$1,000.00
        • We might repeat the Measurement Theory outcomes…
    30. What’s a parent to choose? Metrology Economics (simulated data)
        • My 7th grader’s gain
          – US$833.40 for 6 units
          – US$138.90 per unit gain
        • Your 7th grader’s gain
          – US$1,250.10 for 9 units
          – US$138.90 per unit gain
        • Same per unit cost!
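
Slide 30 runs the calculation in the other direction: a standard price per unit of gain is fixed, and each family's bill is that rate times the gain delivered. The sketch below checks the slide's simulated totals. The names `STANDARD_RATE` and `price_for_gain` are hypothetical, and `Decimal` is used only so the currency arithmetic is exact.

```python
from decimal import Decimal

# Simulated standard price per unit of gain, from slide 30
STANDARD_RATE = Decimal("138.90")


def price_for_gain(gain_units, rate=STANDARD_RATE):
    """Price charged when every unit of gain costs the same."""
    return rate * gain_units


my_price = price_for_gain(6)
your_price = price_for_gain(9)
print(f"6 units: US${my_price}")
print(f"9 units: US${your_price}")
```

Under this scheme the family whose child gains more pays more in total, but both pay the same price per unit.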
    31. Basic Economics (diagram)
        • Axis: readily available high quality information on product or service, from High Cost to Low Cost
        • Customer Quality-Seeking: from "Hard for customers to find quality" to "Easy for customers to find quality"
        • High stakes measurement theory cost per test item: > US$3,000.00
        • Routine theory-informed metrologically traceable cost per test item: < US$0.01
    32. What’s a teacher to choose? Metrology Economics (simulated data)
        • Cost per unit gain: US$620
        • Cost per unit gain: US$180
    33. What’s a principal to choose? Metrology Economics (simulated data)
        • Better Reading Outcomes
        • Cost per unit gained: US$458, US$208, US$116
        • Three schools (A | B | C), twelve months each
    34. Basic Shop Floor Questions
        • What is variation trying to tell us? (Deming)
        • Which variations are due to common causes, and which are due to special causes? (Shewhart)
        • How far can educational outcomes be maximized, and unwanted variation reduced?
        • Can variation in outcomes be reduced by bringing all students to the highest levels?
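
Shewhart's question on slide 34 is conventionally approached with a control chart. The sketch below uses an individuals (XmR) chart, a standard quality-control technique the slides do not themselves specify. Limits sit three estimated standard deviations from the mean, and points outside them are treated as candidate special causes. The weekly measures are invented for illustration, and the function names are hypothetical.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower limit, center line, upper limit) for an XmR chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 1.128 is the standard d2 constant for moving ranges of size 2
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def special_cause_points(values):
    """Indexes of points outside the limits (simplest detection rule only)."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical week-to-week reading measures for one classroom
weekly = [30, 31, 30, 32, 31, 30, 31, 32, 31, 22, 31, 30]
print(individuals_chart_limits(weekly))
print(special_cause_points(weekly))
```

A chart like this presupposes measures in a stable, uniform unit from week to week, which is the condition the talk argues metrological traceability would provide.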
    35. What’s needed?
        • System of distributed units
        • Instruments measuring in uniform metrics
        • Predictive construct theories to bring down costs
        • Low cost items and administration
        • Immediate results
        • Continuous Quality Improvement (CQI) training and tools
        • A culture that rewards innovation
    36. What’s needed?
        • We need commitment to a long range vision of quality education.
        • But vision is not enough; we also need:
          – Skills
          – Incentives
          – Resources
          – Plans
    37. What’s needed?
        • Vision + Skills + Incentives + Resources + Plan = Sustainable Change
        • (missing) + Skills + Incentives + Resources + Plan = Confusion
        • Vision + (missing) + Incentives + Resources + Plan = Anxiety
        • Vision + Skills + (missing) + Resources + Plan = Resistance
        • Vision + Skills + Incentives + (missing) + Plan = Frustration
        • Vision + Skills + Incentives + Resources + (missing) = Treadmill
        Adapted from Knoster, T. P., Villa, R. A., & Thousand, J. S. (2000). A framework for thinking about systems change. In R. A. Villa & J. S. Thousand (Eds.), Restructuring for caring and effective education: Piecing the puzzle together, 2nd Ed (pp. 93-128). Baltimore: Paul H. Brookes.
    38. Disorganized, uncontrolled, decaying
    39. Organized, expressive, preserved
    40. Coordinated, harmonized, growing
    41. Thank you
