Problematising Assessment
(as if it needed it)

James Atherton
11 March 2013

[Bracketed notes like this one are additional notes for the online version.]

This is the outcome to which the session relates:


3.3: Understand theories, principles and applications of formal and informal assessment

[And if I were teaching Ofsted style I should now recite the objectives...]

And for once I will. At the end of this session you should be...



Confused

...but at a higher level than before.
(Probably from Kelley, 1951, but attributed to various sources.)

[It is frowned upon for you to confuse your students... which may well be the biggest limitation on your teaching.]

Confusion can be constructive in teaching, like ploughing before planting.
1: The Problem of Proxies

[...or surrogates, or substitutes, or stand-ins for the real thing.]

Assessment is rife with them, and diluted by their use, but we are stuck with them.
"This is the essence of intuitive heuristics: when faced with a difficult question, we often answer an easier one instead, usually without noticing the substitution."
(Kahneman 2011: 12, Thinking, Fast and Slow, Penguin)

[And this is exactly what we do in assessment.]
[Diagram: two overlapping circles, "Content" and "Assessment".]

In principle our teaching is governed by content, and the assessment is just to check that it has been learned.

In practice, the demands of the assessment can all too easily take over. ("Will we be tested on this?")
Purposes, Forms, Aspects

[Here are some traditional perspectives on assessment...]
Purposes
• Diagnosis [pre-teaching]
• Feedback [during teaching]
• Standards [after teaching]
Aspects
• Validity
• Reliability
• Fairness
• Security

[Traditional criteria for evaluating assessment.]
Forms
• Criterion-referenced: judging against fixed, pre-specified criteria
• Norm-referenced: judging against other people's performance
• Ipsative: judging against your own prior performance (personal best); see the sketch below
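To make the distinction concrete, here is a minimal, hypothetical sketch: the scores, pass mark and cohort are invented for illustration and are not from the slides.

```python
# Hypothetical illustration only: the scores, pass mark and cohort are invented.
cohort_scores = [48, 55, 61, 67, 70, 74, 82, 90]  # the rest of the group
my_previous_best = 58
my_score = 67
pass_mark = 65  # a fixed, pre-specified criterion

# Criterion-referenced: judged against the fixed pass mark
criterion_pass = my_score >= pass_mark

# Norm-referenced: judged against other people's performance
beaten = sum(1 for s in cohort_scores if my_score > s)
norm_standing = beaten / len(cohort_scores)

# Ipsative: judged against my own prior performance (personal best)
ipsative_gain = my_score - my_previous_best

print(f"Criterion-referenced: {'pass' if criterion_pass else 'fail'} ({my_score} against a pass mark of {pass_mark})")
print(f"Norm-referenced: ahead of {norm_standing:.0%} of the cohort")
print(f"Ipsative: {ipsative_gain:+d} on my previous best")
```

The same mark of 67 is a pass against the criterion, below the middle of this cohort against the norm, and a clear improvement ipsatively, which is exactly why the three forms can tell different stories about one performance.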
And also:
• Formative
• Summative

[...etc. I could now test you on your knowledge of assessment, but...]

See what I've done? I've reduced the whole topic to 12 items of jargon.
Validity


• Does it do what it says on the tin?

• Is it really assessing the outcome?
[Let's look at the whole process of assessment drift. Based on the work of Howard Becker and Etienne Wenger, among others.]

What the area of practice actually requires
What the course sets out to teach

[There's about 80% overlap; never a perfect fit.]

What the course actually does teach

What the course sets out to assess

What the course actually does assess

Now put "what the area of practice actually requires" next to "what the course actually does assess": that's all the overlap left.
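A rough back-of-the-envelope sketch of why so little overlap survives. The slides quantify only the first link (about 80%); assuming, purely for illustration, that each link in the chain keeps a similar proportion:

```python
# Illustration only: assumes every link keeps ~80% of the previous stage's
# relevance, which the slides state only for the first link.
links = [
    "practice requires  -> course sets out to teach",
    "sets out to teach  -> actually teaches",
    "actually teaches   -> sets out to assess",
    "sets out to assess -> actually assesses",
]
overlap = 1.0
for link in links:
    overlap *= 0.8
    print(f"{link}: ~{overlap:.0%} of what practice requires")
# Four 80% links leave roughly 0.8**4, i.e. about 41% overlap between what
# practice requires and what the course actually assesses.
```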
[And if you don't pass very well...]
2: False positives and false negatives: the inherent limitations of testing

[I got into some trouble in this section! The maths are correct, but the problem comes with the labelling of the False Positives (or Type 1 errors) and what happens if you try to eliminate them simply by making the assessment stricter (rather than by targeting it more precisely), so to avoid unnecessary extra confusion I've taken that out of this version.]
Take a hundred people and train them for something...

In the real world, 80% are competent at it and 20% aren't.
[Diagram: Competent (80%) / Not competent (20%).]

But we're not in the real world, we're in a college, and we have to devise a test to determine who can be let loose on the public.

...but tests aren't always good predictors. You devise the best you can, but it may be only, say, 80% accurate.
[Diagram: Accurate (80%) / Inaccurate (20%).]

So the 80% the test passes are not the same as the 80% who are genuinely competent.
[Diagram: True + (64%), False – (16%), True – (16%), False + (4%).]

True positives (64%): they passed the test, and so they should have.

True negatives (16%): they failed, and so they should have done.

False negatives (16%): the unfortunates. The test failed them, but it was wrong. That is technically a 'Type 2' error.

False positives (4%): the 'Type 1' errors. They should have failed, but the test passed them.
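For reference, here is the arithmetic behind those four figures as a minimal sketch, using the slide's own numbers (100 trainees, 80% genuinely competent, a test that classifies 80% of people correctly):

```python
# The slide's worked example, in whole people out of 100.
people = 100
competent = 80                       # genuinely competent
not_competent = people - competent   # 20 not competent
accuracy = 80                        # the test classifies 80% of people correctly

true_positives  = competent * accuracy // 100              # 64: passed, and should have
false_negatives = competent * (100 - accuracy) // 100      # 16: failed, but shouldn't have (Type 2)
true_negatives  = not_competent * accuracy // 100          # 16: failed, and should have
false_positives = not_competent * (100 - accuracy) // 100  #  4: passed, but shouldn't have (Type 1)

print(true_positives, false_negatives, true_negatives, false_positives)  # 64 16 16 4
# Errors = 16 + 4 = 20 people, so the test is 20% wrong overall even though
# only 4 of the 100 candidates are passed when they shouldn't be.
```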
This test will always be 20% wrong, so you can only reduce the False Positives at the cost of increasing the False Negatives.

See the notes for more on this.
So I hope you are now confused at a higher level than before...
• Becker H (1963) "Why school is a lousy place to learn anything in", reprinted in R J Burgess (ed.) Howard Becker on Education, Buckingham: Open University Press, 1998
• Kahneman D (2011) Thinking, Fast and Slow, London: Penguin
• Kay J (2011) Obliquity: why our goals are best achieved indirectly, London: Profile Books
• Wenger E (1998) Communities of Practice: learning, meaning and identity, Cambridge: C.U.P.
www.bedspce.org.uk/cbc
