The document discusses the scientific and economic value of a metrological point of view in measurement and education. It provides three key points:
1) True score theory results in disconnected scores and tests that make comparisons difficult and costly.
2) Measurement theory connects measures and tests, but the information is still incomplete and costly.
3) A metrologically traceable approach provides complete, useful information at very low cost by calibrating tests and reporting results in common metrics.
This allows for more efficient markets where the value of improvements is easy to recognize, compare, and reward. A metrological system supports continuous quality improvement.
1. Scientific and Economic Value of
the Metrological Point of View
William P. Fisher, Jr.
University of California, Berkeley
Pacific Rim Objective Measurement Symposium
6-9 August 2012
Jiaxing, China
2. Overview
• Some basic economic principles shared by
science and commerce
• Three points of view on measurement in
education
• The kinds of markets created by the three
approaches to measurement
• A plan for the future
3. Economic Principles Shared By
Science and Commerce
• Separate local economies
– Different currencies
– Different weights and measures
– Higher costs of exchange
– Less efficient, harder to compare values
• Unified regional and global economies
– Same currency
– Same weights and measures
– Lower costs of exchange
– More efficient, easier to compare values
4. Example 1 of a Scientific Market
• Biochemistry
– Equipment calibrated in universal reference
standard metrics
– Test results always reported in common units
– Measures available on the spot
– Easy to coordinate research across labs
– Result: SARS virus sequenced in weeks by network
of labs, vaccine successfully synthesized
5. Example 2 of a Scientific Market
• Custom tailored suits
– Tape measures calibrated in universal reference
standard metric
– Results always reported in common units
– Measures available on the spot
– Easy to coordinate across tailors
– Result: measures can be sent around the world
and a well fitting suit obtained with little trouble
6. Example 3 of a Scientific Market
• Education
– Tests typically not calibrated at all
– If they are calibrated, they are in local units
– Test results are usually reported in unique units
– Measures available only after costly data analysis
– Very difficult to compare outcomes outside of
special contexts
– Result: Improvement efforts repeatedly
fail, quality uncontrolled, costs spiral higher
7. The Ideal Efficient Market
• Cost of estimating value is very low
• Cost of comparing value for price is very low
• Supply and demand easily match up
• Low value for price: cannot compete
• High value for price: rewarded
• Improved value easy to recognize
• Improved value pushes out old value
8. Basic Economics
[Figure: three outcomes plotted against the cost of readily available high quality information on product or service, from High Cost to Low Cost]
• Customer Quality-Seeking
– High cost: hard for customers to find quality
– Low cost: easy for customers to find quality
• Market Efficiency
– High cost: hard to match supply and demand
– Low cost: easy to match supply and demand
• Quality Improvement
– High cost: hard to know how to improve quality
– Low cost: easy to know how to improve quality
9. Three Points of View
on How to Present Information
on Educational Outcomes
• True Score Theory
• Measurement Theory
• Metrological Traceability
10. True Score Theory
Disconnected Scores and Tests
• School 1
– Student A has a score of 22 on a reading test.
– This classroom averages a score of 24.
• School 2
– Student Z has a score of 18 on a reading test.
– This classroom averages a score of 26.
11. True Score Theory
Disconnected Scores and Tests
• Who has more reading ability, A or Z? ??
• What can one student read that the other
cannot? ??
• Which classroom reads better on average? ??
• Which student is more on track for college
readiness? ??
12. True Score Theory
Disconnected Scores and Tests
• School 1
– Student A’s reading scores on 2 tests are 22 & 32.
– The classroom average score goes from 24 to 30.
• School 2
– Student Z’s reading scores on 2 tests are 18 & 32.
– The classroom average score goes from 26 to 40.
13. True Score Theory
Disconnected Scores and Tests
• Who gained more in reading ability, A or Z? ??
• What new texts can A and Z read? ??
• Which classroom improves more? ??
• Are both students on track for college
readiness? ??
• Result:
– Very high cost, almost useless information
15. Measurement Theory
Connected Measures and Tests
• School 1
– Student A has a measure of 22 (+/- 2) on a reading
test.
– This classroom averages a measure of 24 (+/- 1).
• School 2
– Student Z has a measure of 18 (+/- 2) on a reading
test.
– This classroom averages a measure of 26 (+/- 1).
16. Measurement Theory
Connected Measures and Tests
• Who has more reading ability, A or Z? A
• What can one student read that the other
cannot?
– Text with measures between 18 and 22.
• Which classroom reads better on average? 2
• Which student is more on track for college
readiness? ??
17. Measurement Theory
Connected Measures and Tests
• School 1
– Student A’s measures on 2 tests are 22 & 32 (+/- 2).
– The classroom average goes from 24 to 30 (+/- 1).
• School 2
– Student Z’s measures on 2 tests are 18 & 32 (+/- 2).
– The classroom average goes from 26 to 40 (+/- 1).
18. Measurement Theory
Connected Measures and Tests
• Who gained more in reading ability, A or Z? Z
• What new texts can Z read?
– Those with measures between 18 and 32.
• Which classroom improves more? 2
• Are both students on track for college readiness?
??
• Result:
– Very high cost, incomplete, but useful information
20. Metrologically Traceable Measures
• School 1
– Student A’s measure (22, +/- 2) is inferred when 73%
of the items built into a reading assignment targeted
at 22 are answered correctly.
– This classroom averages a measure of 24 (+/- 1).
• School 2
– Student Z’s measure (18, +/- 2) is inferred when 76%
of the items built into a reading assignment targeted
at 18 are answered correctly.
– This classroom averages a measure of 26 (+/- 1).
21. Metrologically Traceable Measures
• Who has more reading ability, A or Z? A
• What can one student read that the other
cannot?
– Text with measures between 18 and 22.
• Which classroom reads better on average? 2
• Is one student more on track for college
readiness? Yes, A
23. Metrologically Traceable
Connected Measures and Tests
• School 1
– Student A’s measures on 2 tests are 22 & 32 (+/- 2).
– The classroom average goes from 24 to 30 (+/- 1).
• School 2
– Student Z’s measures on 2 tests are 18 & 32 (+/- 2).
– The classroom average goes from 26 to 40 (+/- 1).
24. Metrologically Traceable
Connected Measures and Tests
• Who gained more in reading ability, A or Z? Z
• What new texts can Z read?
– Those with measures between 18 and 32.
• Which classroom improves more? 2
• Are both students on track for college
readiness? No, but A is
• Result:
– Very low cost, complete and useful information
26. What to choose?
True Score Theory Economics
• School 1
– Average Grade 7 End of Year Teacher’s Quiz Reading Score = 89%
– Average gain in 7th Grade Reading as measured by in-class quizzes and tests: ??
– Annual tuition = US$5,000
– Cost of average gain in reading scores = US$??
• School 2
– Average Grade 7 End of Year Teacher’s Quiz Reading Score = 94%
– Average gain in 7th Grade Reading as measured by in-class quizzes and tests: ??
– Annual tuition = US$1,000
– Cost of average gain in reading scores = US$??
Simulated data
Not enough information to decide!
27. What to choose?
Measurement Theory Economics
• School 1
– Average Grade 7 End of Year Statewide Reading Measure = 32 (+/- 6)
– Adjusted average gain in 7th Grade Reading Measures = 10 (+/- 4)
– Cost of adjusted average gain in reading measures = US$5,000.00
• School 2 (Best buy)
– Average Grade 7 End of Year Statewide Reading Measure = 34 (+/- 5)
– Adjusted average gain in 7th Grade Reading Measures = 11 (+/- 3)
– Cost of adjusted average gain in reading measures = US$1,000.00
Simulated data
But do you really want to buy the average gain?
28. What to choose?
Measurement Theory Economics
• My 7th grader’s gain
– US$1,000 for 6 units
– US$166.67 per unit gain
– 50% greater cost!
• Your 7th grader’s gain
– US$1,000 for 9 units
– US$111.11 per unit gain
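The per-unit figures on this slide come from dividing tuition by the units gained. A minimal sketch of that arithmetic, using the slide's simulated numbers; the function name `cost_per_unit_gain` is hypothetical and introduced here only for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP


def cost_per_unit_gain(tuition: Decimal, gain_units: Decimal) -> Decimal:
    """Return tuition divided by units gained, rounded to cents."""
    if gain_units <= 0:
        raise ValueError("gain must be positive to be priced per unit")
    return (tuition / gain_units).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Simulated figures from the slide: same tuition, different gains.
mine = cost_per_unit_gain(Decimal("1000"), Decimal("6"))
yours = cost_per_unit_gain(Decimal("1000"), Decimal("9"))

# Percentage by which the smaller gain costs more per unit.
extra_percent = round((mine / yours - 1) * 100)

print(f"Mine:  US${mine} per unit gain")
print(f"Yours: US${yours} per unit gain")
print(f"Mine costs about {extra_percent}% more per unit")
```

Paying the same tuition for a smaller gain is what produces the higher per-unit cost; the average gain hides this difference between individual students.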
29. What to choose?
Measurement Theory Economics
[Figure: Reading Ability Scale]
30. What to choose?
Metrology Economics
• School 1
– Average Grade 7 End of Year Statewide Reading Measure = 32 (+/- 6)
– Adjusted average gain in 7th Grade Reading Measures = 10 (+/- 4)
– Cost of adjusted average gain in reading measures = US$5,000.00
• School 2 (Best buy)
– Average Grade 7 End of Year Statewide Reading Measure = 34 (+/- 5)
– Adjusted average gain in 7th Grade Reading Measures = 11 (+/- 3)
– Cost of adjusted average gain in reading measures = US$1,000.00
Simulated data
We might repeat the Measurement Theory outcomes…
32. What’s a parent to choose?
Metrology Economics
• My 7th grader’s gain
– US$833.40 for 6 units
– US$138.90 per unit gain
• Your 7th grader’s gain
– US$1,250.10 for 9 units
– US$138.90 per unit gain
• Same per unit cost!
Simulated data
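On this slide each unit of gain carries the same price, so the amount paid scales with the gain actually obtained. A minimal sketch, assuming a fixed per-unit price of US$138.90 (the slide's simulated figure); `tuition_for_gain` is a hypothetical helper name, not part of any existing system:

```python
from decimal import Decimal

# Per-unit price taken from the slide (simulated data).
UNIT_PRICE = Decimal("138.90")


def tuition_for_gain(gain_units: int, unit_price: Decimal = UNIT_PRICE) -> Decimal:
    """Return the amount owed when every unit of gain has the same price."""
    if gain_units < 0:
        raise ValueError("gain_units must not be negative")
    return unit_price * gain_units


mine = tuition_for_gain(6)
yours = tuition_for_gain(9)

print(f"Mine:  US${mine} for 6 units")
print(f"Yours: US${yours} for 9 units")
```

With a common unit, the larger gain costs more in total but exactly the same per unit, which is the comparison the slide draws.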
33. Basic Economics
[Figure: Customer Quality-Seeking, from "Hard for customers to find quality" to "Easy for customers to find quality", plotted against the cost of readily available high quality information on product or service, from High Cost to Low Cost]
• High stakes measurement theory
– Cost per test item: > US$3,000.00
• Routine theory-informed metrologically traceable
– Cost per test item: < US$0.01
34. What’s a teacher to choose?
Metrology Economics
• Cost per unit gain: US$620
• Cost per unit gain: US$180
Simulated data
35. What’s a principal to choose?
Metrology Economics
[Figure: Better Reading Outcomes, three schools (A, B, C), twelve months each]
• Cost per unit gained
– School A: US$458
– School B: US$208
– School C: US$116
Simulated data
36. Basic Shop Floor Questions
• What is variation trying to tell us? (Deming)
• Which variations are due to common
causes, and which are due to special causes?
(Shewhart)
• How far can educational outcomes be
maximized, and unwanted variation reduced?
• Can variation in outcomes be reduced by
bringing all students to the highest levels?
37. What’s needed?
• System of distributed units
• Instruments measuring in uniform metrics
• Predictive construct theories to bring down costs
• Low cost items and administration
• Immediate results
• Continuous Quality Improvement (CQI) training
and tools
• A culture that rewards innovation
38. What’s needed?
• We need commitment to a long range vision
of quality education.
• But vision is not enough; we also need:
– Skills
– Incentives
– Resources
– Plans
39. What’s needed?
Vision + Skills + Incentives + Resources + Plan = Sustainable Change
______ + Skills + Incentives + Resources + Plan = Confusion
Vision + ______ + Incentives + Resources + Plan = Anxiety
Vision + Skills + ______ + Resources + Plan = Resistance
Vision + Skills + Incentives + ______ + Plan = Frustration
Vision + Skills + Incentives + Resources + ______ = Treadmill
Adapted from Knoster, T. P., Villa, R. A., & Thousand, J. S. (2000). A framework for thinking about systems
change. In R. A. Villa & J. S. Thousand (Eds.), Restructuring for caring and effective education: Piecing the
puzzle together, 2nd Ed (pp. 93-128). Baltimore: Paul H. Brookes.
Ni hao – Hello (pronounced "neehow"; draw out the "ow")
Ni hao ma – How are you?
Wo hen hao – I’m very good.
Ni ne – And you?
Wo ye hen hao – I’m also very good.
Xiexie – Thank you.
Bu keqi – You’re welcome.
Zaijian – Goodbye
Both science and commerce flourish when information is communicated efficiently at low cost.
There are, of course, a great many problems associated with the efficient markets hypothesis. Many of them stem from the restricted scope in which the hypothesis is applied, so that various kinds of social costs affecting labor, communities, and the environment are pushed out of the market and onto society at large. This process of externalization might be countered if more efficient market functions were created for human, social, and natural capital.
When making major investments that are costly and that have long term consequences, we want more information, and we want it to be high quality information. Education is a major investment of this kind. Unfortunately, information on the quality of its products and services is not readily available, is not of very good quality, and is itself very expensive.
So that is the context in which I would like to describe for you today three different points of view on measurement.
But numbers do not in themselves stand for anything. This becomes readily apparent as soon as we want to compare scores from different tests.
Scores from different tests are not comparable, and so it is impossible to know from the information given if A or Z, or School 1 or 2, has greater reading ability. If School 2’s tests are harder, then perhaps Z reads better than A, but if School 1’s tests are harder, perhaps School 1 reads better than School 2. For numbers to have their obvious and natural meanings, a lot of work has to go into making them comparable.
As leaves fall from trees in the autumn they drift and blow with the wind, landing where they will, and decaying. Test scores for students and items in True Score Theory are like autumn leaves. Scores are not organized into a common frame of reference and so they are not comparable across tests. The scores accumulate and take up space but are of less and less value as time passes. Further, items also decay in a sense: they cannot be re-used, as students are likely to remember them and may share them with others who would obtain an unfair advantage.
Answers to the questions unanswered by True Score Theory can be determined in the context of measurement theory if test items are administered from a common bank, or if two tests are linked with common items and the data are analyzed concurrently. If measures are not estimated in a larger framework informed by theory and evidence, however, questions about long term outcomes may be unanswerable.
There is, however, no necessary, legally binding, or scientifically required connection between tests administered in different schools or work places. In real life, these questions are usually as unanswerable in the context of Measurement Theory as they are in True Score Theory.
Children, artists, and botanists may collect leaves and use them in creative ways to express themselves or to teach. Measures for students and items in Measurement Theory can be like carefully crafted works of art when the trouble is taken to understand what one is measuring and to use rigorous methods. Much depends, however, on the skills of the artists involved in crafting the test items, administering the tests, analyzing the data, and interpreting the results.
Answers to the questions unanswered by True Score Theory and answered by Measurement Theory are answered again in the context of metrologically traceable measures. The difference is that the answers are obtained even when test items are not administered from a common bank, and even when two tests are not linked with common items and no data are analyzed.
Foregoing the time and expense of tests by embedding assessments within online reading assignments makes it easier to track growth over time. The overall growth trends for students globally, nationally, regionally, and locally could also be displayed in this same format. Information of this kind is essential to the benchmarking and quality improvement methods that have so remarkably succeeded in improving value at lower cost in other fields.
As is the case for virtually everything bought and sold in stores, educational outcomes ought to be universally expressed in uniform measures. Measures made in different schools should be traceable to reference standards, and that traceability should be made necessary, legally binding, and scientifically required. In real life, these questions are usually as unanswerable in the context of Measurement Theory as they are in True Score Theory. Instituting metrological traceability requirements would make the answers available to everyone, everywhere, all the time.
After all, by definition, some people will pay a lot more per unit for a lesser gain and others will pay a lot less for a greater gain. And how many things are bought and sold at their average quality, volumes or prices, anyway?
With only two time points, individualized year-to-year gain measures may be highly variable and unreliable.
…with the high stakes end of year test, but if we also have week-to-week measures from across the school year…
…then we will be able to use this low-cost, high-quality information to inform our purchasing decision…
Within a school or district, a standard per-unit gain price might be set. But customers would be able to compare prices to seek out the lowest cost per unit gain. And teachers, principals, and researchers will be able to study outcomes in a common language across classrooms, schools, districts, countries, grades, years, etc.
Questions raised by this comparison: Why does Classroom C (at the top) make such a small gain, even after adjusting for variation in at-risk profiles, and over the summer lose nearly all of the small gain that was made? What is happening in Classroom B that is not happening in Classroom C? Why do the measures drop in Classroom B in April? Spring fever? Can anything be done to maintain gains over the summer months of June to August?
If these measures are adjusted for differences in at-risk profiles, then this kind of natural variation provides a ready framework for experimental comparisons of possible causal relations. The first thing to find out is what’s going on in School A. Then, what is School C doing that gives it such an edge in reading outcomes over School B, and at lower cost? Finally, again, what can be done about that summer slump?
Stakeholder participation and involvement are key in every area.
fēnpī -- scattered; mixed and disorganized (pronounced "fun-pee")
yìshù -- art (pronounced "yee-shu")
fāzhǎn -- development; growth; to develop; to grow; to expand