I wish I could believe you: the frustrating unreliability of some assessment research

The talk that Sally Jordan and I gave at the 2015 Assessment in Higher Education conference.

1. I wish I could believe you: the frustrating unreliability of some assessment research. Tim Hunt & Sally Jordan, The Open University. @tim_hunt @SallyJordan9. Are these two things related?
2. Trick question (of course). From a great website: http://www.tylervigen.com/spurious-correlations
3. Correlation & causation.
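The trap is easy to reproduce: any two quantities that merely share a time trend will correlate strongly, whatever they measure. A minimal sketch with invented numbers (the variable names only echo the kinds of pairs on Vigen's site):

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2010)

# Two made-up series that share nothing except an upward drift over time.
cheese = 30 + 0.5 * (years - 2000) + rng.normal(0, 0.3, years.size)
doctorates = 400 + 40 * (years - 2000) + rng.normal(0, 25, years.size)

r = np.corrcoef(cheese, doctorates)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to 1: strongly correlated, no causal link
```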
4. Sly (1999), "Practice tests as formative assessment improve student performance on computer-managed learning assessments". 614 students; tests P01, S01 and S02. A computerised assessment was quite exciting in itself back in 1999! Questions were picked at random from a bank. P01 and S01 used the same test bank; S02 was different, with no practice test.
5. Sly (1999): the results.
                            n (S01)   P01       S01       S02       n (S02)
   Took practice test P01   417       62.18%    72.72%    66.88%    415
   Did not take P01         197       –         67.56%    62.24%    194
   (614 students sat S01; 609 sat S02. All standard deviations 15–17%.)
6. Sly (1999): the same table, with the differences marked.
   +5.38%: the no-practice group's S01 (67.56%) vs the practice group's P01 (62.18%), questions drawn from the same bank
   +5.16%: S01, practice group vs no-practice group (72.72% vs 67.56%)
   +4.64%: S02, practice group vs no-practice group (66.88% vs 62.24%)
   (All standard deviations 15–17%.)
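Are gaps of about five percentage points meaningful against standard deviations of 15–17%? A rough significance check, assuming a common SD of 16% (the slides give only the range), using SciPy:

```python
from scipy.stats import ttest_ind_from_stats

SD = 16.0  # assumed; the slide says only "all standard deviations 15-17%"

# S01: students who took the practice test (n=417) vs those who did not (n=197)
t, p = ttest_ind_from_stats(72.72, SD, 417, 67.56, SD, 197)
print(f"S01: t = {t:.2f}, p = {p:.4f}, Cohen's d = {(72.72 - 67.56) / SD:.2f}")

# S02: n=415 vs n=194
t, p = ttest_ind_from_stats(66.88, SD, 415, 62.24, SD, 194)
print(f"S02: t = {t:.2f}, p = {p:.4f}, Cohen's d = {(66.88 - 62.24) / SD:.2f}")
```

With groups this large the differences are statistically significant, but the effect sizes are modest (roughly 0.3 SD), and since students chose for themselves whether to take the practice test, no significance test can turn this correlation into causation.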
7. OU level 3 physics (SM358): An investigation into factors affecting physics students' engagement with online assessment (Bolton & Jordan).
8. OU level 3 physics (SM358): the assessment strategy. Students could submit any combination of 0–4 TMAs (tutor-marked assignments) and 0–6 iCMAs (interactive computer-marked assignments), giving a 5 × 7 grid of engagement patterns.
9. OU level 3 physics (SM358): proportion of students in each cell of the grid (columns: 0 TMAs to 4 TMAs; cells without a figure are omitted).
   0 iCMAs: 11.6%, 3.4%, 1.5%, 0.5%, 0.5%
   1 iCMA:  1.5%, 1.0%
   2 iCMAs: 1.5%, 2.4%, 1.5%
   3 iCMAs: 1.5%
   4 iCMAs: 5.3%, 2.4%
   5 iCMAs: 0.5%, 3.9%, 5.8%, 8.2%
   6 iCMAs: 0.5%, 0.5%, 5.8%, 5.8%, 34.3%
10. OU level 3 physics (SM358): exam mark per cell (columns: 0 TMAs to 4 TMAs; cells without a figure are omitted).
   0 iCMAs: 6.0
   1 iCMA:  –
   2 iCMAs: 17.0, 24.0
   3 iCMAs: 60.0
   4 iCMAs: 43.7, 62.0
   5 iCMAs: 23.0, 46.0, 62.6, 69.5
   6 iCMAs: 35.3, 60.8, 77.5
11. OU level 3 physics (SM358): exam mark compared to a predictive model (columns: 0 TMAs to 4 TMAs; cells without a figure are omitted).
   0 iCMAs: −20.8
   1 iCMA:  –
   2 iCMAs: −43.9, −27.5
   3 iCMAs: −9.0
   4 iCMAs: −15.6, +1.8
   5 iCMAs: −3.8, −11.1, +1.4, +2.4
   6 iCMAs: −17.1, +3.4, +4.6
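The slides do not say what the predictive model is, so purely as a sketch of the general approach: fit a simple linear model of exam mark against the numbers of TMAs and iCMAs submitted, then inspect the residuals. The data below are invented for illustration:

```python
import numpy as np

# Hypothetical per-student rows (TMAs submitted, iCMAs submitted, exam mark);
# invented for illustration only: the study's real model and data are not
# given on the slides.
records = np.array([
    [0, 0,  8.0],
    [1, 2, 22.0],
    [2, 2, 25.0],
    [2, 5, 48.0],
    [3, 5, 61.0],
    [3, 6, 64.0],
    [4, 5, 70.0],
    [4, 6, 79.0],
])
X = np.column_stack([np.ones(len(records)), records[:, 0], records[:, 1]])
y = records[:, 2]

# Least-squares fit: exam mark ~ b0 + b1 * TMAs + b2 * iCMAs
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef  # positive = did better than the model predicts
print("coefficients:", coef.round(2))
print("residuals:   ", residuals.round(1))
```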
12. Confounding variables.
13. Berkeley gender bias case (1973): overall admissions.
             Applicants   Admitted
   Men       8442         44%
   Women     4321         35%
   https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
14. Berkeley gender bias case (1973): by department.
                 Men                     Women
   Department    Applicants  Admitted    Applicants  Admitted
   A             825         62%         108         82%
   B             560         63%         25          68%
   C             325         37%         593         34%
   D             417         33%         375         35%
   E             191         28%         393         24%
   F             272         6%          341         7%
   https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
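The reversal is easy to verify from the department figures. Pooling the six departments shown (which do not account for every applicant in the previous slide's totals) gives men the higher overall rate, even though women do better in four departments out of six:

```python
# Slide 14's department-level figures: (applicants, admit rate) for men, women.
depts = {
    "A": ((825, 0.62), (108, 0.82)),
    "B": ((560, 0.63), (25, 0.68)),
    "C": ((325, 0.37), (593, 0.34)),
    "D": ((417, 0.33), (375, 0.35)),
    "E": ((191, 0.28), (393, 0.24)),
    "F": ((272, 0.06), (341, 0.07)),
}

def overall_rate(sex: int) -> float:
    """Pool the six departments: total admits / total applicants."""
    applicants = sum(d[sex][0] for d in depts.values())
    admits = sum(d[sex][0] * d[sex][1] for d in depts.values())
    return admits / applicants

print(f"Men:   {overall_rate(0):.0%}")  # ~46%
print(f"Women: {overall_rate(1):.0%}")  # ~30%
# Yet women have the higher admit rate in four of the six departments.
# The confounder is which departments each group applied to.
```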
15. Real experiments.
16. What is an experiment?
   Split participants into two equal groups.
   Split randomly, so that if there are confounding variables, they are probably split equally between the groups.
   Give a different ‘treatment’ to each group, trying to keep everything else the same.
   Blind the treatment, if possible, to reduce all sorts of biases. But blinding is not normally possible in education. (You probably know if you just sat an exam!)
   [Pick your favourite research methods book.]
   (A sketch of the random split follows.)
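As referenced on the slide above, the core mechanic is just a random split. A minimal sketch (the participant names are invented):

```python
import random

def random_split(participants, seed=None):
    """Shuffle, then split into two equal-sized groups, so any confounding
    variables are (probably) balanced between treatment and control."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

treatment, control = random_split((f"student-{i:02d}" for i in range(20)), seed=1)
print("treatment:", treatment)
print("control:  ", control)
```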
17. Karpicke & Blunt (2011), plus many more: "Retrieval practice produces more learning than elaborative studying with concept mapping".
18. … but! Wooldridge et al. (2014), "The testing effect with authentic educational materials: a cautionary note". “Based on [the testing effect], … some textbooks are now accompanied by quizzing ancillaries … The quizzes are designed with the assumption that answering factual and application questions will promote a more integrated mental model that incorporates the target knowledge.” Typically, the quizzes and test banks sample items from similar sub-sections of the textbook, but not necessarily the same information.
19. … but! Wooldridge et al. (2014), "The testing effect with authentic educational materials: a cautionary note".
20. How reliable is student opinion?
21. Background: our own work with interactive computer-marked assignments (iCMAs).
22. Findings from a questionnaire (Jordan, 2011). Responses were received from 151 students (response rate ≈ 20%).
   “Answering iCMA questions helps me to learn”: 129 (85%) definitely or mostly agree; 7 (5%) neutral; 12 (8%) mostly or definitely disagree.
   “If I get the answer to an iCMA question wrong, the computer-generated feedback is useful”: 128 (85%) definitely or mostly agree; 11 (7%) neutral; 8 (5%) mostly or definitely disagree.
23. Watching students in a usability lab: six students were observed answering questions (Jordan, 2009).
24. Data analysis: much more data is presented in Jordan (2014).
25. Reflection.
   Weaver (2006, p. 386) reports that 90% of students agreed with the statement “Positive comments have boosted my confidence.”
   Marriott (2009, p. 243) reports that 93% of students agreed with the statement “I find the immediate reporting of my test result valuable.”
   It is almost certainly the case that more students report finding feedback useful than actually make good use of it. This is in line with the bias in self-reported behaviour observed in medicine and business (Jordan, 2014, p. 69).
   But student opinion is important (Dermo, 2009). We need to consider student opinion, but we also need to consider students’ actual actions.
26. Ethics.
27. Ethics. Is it ethical to give a helpful intervention to only half the class? Are we allowed to do experiments in education?
28. Look at evidence-based medicine (they have been doing this for a while). How do you know an intervention is effective if you have not done the experiment? And if you don't know whether it is effective, is it ethical to use it?
   [Diagram: how evidence flows in medicine, linking academic researchers, drug companies, the literature, meta-analysis, NICE, medical schools and doctors.]
29. The end.
30. References
   Bolton, J., Jordan, R. & Jordan, S. (2015). An investigation into factors affecting physics students' engagement with online assessment. Manuscript in preparation.
   Cohen, L., Manion, L. & Morrison, K. (2011). Research methods in education (7th ed.). Routledge.
   Dermo, J. (2009). e-Assessment and the student learning experience: A survey of student perceptions of e-assessment. British Journal of Educational Technology, 40(2), 203–214.
   Goldacre, B. (2008). Bad science. Fourth Estate.
   Goldacre, B. (2012). Bad pharma. Fourth Estate.
   Jordan, S. (2009). Assessment for learning: Pushing the boundaries of computer-based assessment. Practitioner Research in Higher Education, 3(1), 11–19.
   Jordan, S. (2011). Using interactive computer-based assessment to support beginning distance learners of science. Open Learning, 26(2), 147–164.
   Jordan, S. (2014). E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics. PhD thesis, The Open University. http://oro.open.ac.uk/41115/
   Karpicke, J. & Blunt, J. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775.
   Marriott, P. (2009). Students' evaluation of the use of online summative assessment on an undergraduate financial accounting module. British Journal of Educational Technology, 40(2), 237–254.
   Sly, L. (1999). Practice tests as formative assessment improve student performance on computer-managed learning assessments. Assessment & Evaluation in Higher Education, 24(3), 339–343.
   Vigen, T. (2014). Spurious correlations. http://www.tylervigen.com/spurious-correlations
   Weaver, M. R. (2006). Do students value feedback? Student perceptions of tutors' written responses. Assessment & Evaluation in Higher Education, 31(3), 379–394.
   Wikipedia (2015). Simpson's paradox. https://en.wikipedia.org/wiki/Simpson%27s_paradox
   Wooldridge, C., Bugg, J., McDaniel, M. & Liu, Y. (2014). The testing effect with authentic educational materials: A cautionary note. Journal of Applied Research in Memory and Cognition, 3(3), 214–221.
31. Summary
   Correlation vs causation.
   Confounding variables.
   Experiments: designed to minimise confounding variables.
   Don't abstract your experiment so much that the results aren't relevant.
   Student opinion and attitudes are important, but they are different from actions or effectiveness.
   Ethical issues are real, but should be overcome.
   @tim_hunt T.J.Hunt@open.ac.uk
   @SallyJordan9 Sally.Jordan@open.ac.uk
