Creating an in-house computerized adaptive testing (CAT) program with Concerto

Presentation at JLTA (Japan Language Testing Association) 2013 National Conference

  1. 1. Creating an in-house computerized adaptive testing (CAT) program with Concerto. Atsushi MIZUMOTO (Kansai University). 2013/09/20, JLTA at Waseda University
  2. 2. Computerized Adaptive Testing
  3. 3. CAT needs Item Response Theory
  4. 4. CTT vs. IRT (Ohtomo, 2009)
        Aspect                         | CTT              | IRT
        Test score                     | Ordinal scale    | Interval scale
        Ability estimate               | Test-dependent   | Test-independent
        Test result                    | Person-dependent | Person-independent
        Measurement target (Precision) | All test-takers  | Individuals
        Equating/CAT                   | Difficult        | Easy
  5. 5. CAT Needs IRT [Diagram with labels "CAT" and "IRT"]
  6. 6. History of CAT Research: 40 years (Thompson & Weiss, 2011); 30 years in language testing (LT) (Koyama, 2010)
  7. 7. Example of CAT
  8. 8. Example of CAT
  9. 9. CBT ≠ CAT
  10. 10. How CAT Works http://www.j-cat.org/page/interpret
  11. 11. Advantages of CAT •Tailored to individual test-takers •Shorter test time •Greater precision (= smaller SE) •No need for random sampling (Source: www.geocities.jp/kosugitti/labo/irtnote.pdf)
  12. 12. Purposes •Creating a CAT program •Evaluation
  13. 13. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  14. 14. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  15. 15. Moodle Plugin http://moodle2x.info
  16. 16. 1. Free account (150 test takers/month) 2. Amazon Machine Images (free for a year) 3. Installing it on your own server
  17. 17. •Open-source •Running R on a server (catR, RMySQL) •HTML-based
  18. 18. Installation on a server https://code.google.com/p/concerto-platform/wiki/installation4
  19. 19. Wiki (Resources) https://code.google.com/p/concerto-platform/wiki/Resources?tm=6
  20. 20. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  21. 21. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  22. 22. Constructing an Item Bank (Pretest) •Vocabulary Test (Mizumoto, 2006) http://www.mizumot.com/files/VocSizeMeasure.pdf •Based on SVL 12,000 (Up to 8,000 level; 30 items for each level) •716 university EFL learners
  23. 23. Sample Question (1) 心の, 精神の ("of the mind, mental") A. essential B. creative C. loose D. mental
  24. 24. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  25. 25. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  26. 26. Calibrating the Item Bank •240 items analyzed (Rasch model) •150 items kept for the item bank •Calibrated with the two-parameter logistic (2PL) model (item difficulty & discrimination) •Upload the CSV file to Concerto
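
As a rough illustration of the two-parameter logistic model named on slide 26, the sketch below computes the probability of a correct response and the Fisher information of one item. The function names and the example parameter values are hypothetical, and the plain logistic metric (no 1.7 scaling constant) is an assumption; the slides do not say which metric was used.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response.

    theta: test-taker ability, a: item discrimination, b: item difficulty.
    Assumes the logistic metric without the 1.7 scaling constant.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item with discrimination 1.2 and difficulty 0.5.
print(p_correct(0.5, 1.2, 0.5))         # 0.5 when ability equals difficulty
print(item_information(0.5, 1.2, 0.5))  # information peaks at theta == b
```

An item is most informative for test-takers whose ability is close to its difficulty, which is why a CAT tries to pick items near the current ability estimate.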
  27. 27. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  28. 28. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  29. 29. Specifications of CAT •Starting point (parameters, initial ability, randomized/fixed) •Ability estimation method (empirical Bayes and others) •Stopping rule (Number of items/Standard error) •Final ability estimation
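
Two of the specifications on slide 29, ability estimation and the stopping rule, can be sketched as follows. This is only an illustration in Python, not the Concerto or catR code: it assumes an expected a posteriori (EAP) estimate with a standard normal prior on a fixed grid as one possible Bayesian estimator, and the defaults of 30 items and SE 0.3 are borrowed from slide 32 purely as example thresholds. All function and parameter names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (logistic metric assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(responses, grid_min=-4.0, grid_max=4.0, n_points=81):
    """EAP ability estimate and its standard error.

    responses: list of (a, b, score) tuples, score 1 = correct, 0 = incorrect.
    Uses an N(0, 1) prior evaluated on an evenly spaced grid (an assumption).
    """
    step = (grid_max - grid_min) / (n_points - 1)
    grid = [grid_min + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for a, b, score in responses:
            p = p_correct(theta, a, b)
            w *= p if score == 1 else 1.0 - p  # multiply in the likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def should_stop(n_administered, se, max_items=30, se_target=0.3):
    """Stop when the item limit is reached or the SE is small enough."""
    return n_administered >= max_items or se <= se_target


# Hypothetical responses: two correct answers and one incorrect answer.
answers = [(1.0, 0.0, 1), (1.2, 0.5, 1), (0.9, 1.0, 0)]
theta_hat, se = eap_estimate(answers)
print(theta_hat, se, should_stop(len(answers), se))
```

With no responses the estimate equals the prior mean of 0, which corresponds to a fixed starting point; a randomized start would draw the initial ability or first item at random instead.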
  30. 30. Magis and Raîche (2012, p. 7)
  31. 31. How many items for what SE? •Simulation with catR package Magis, D., & Raîche, G. (2012). http://www.jstatsoft.org/v48/i08
  32. 32. True Theta = 1, SE = 0.3 Stopping rule = 30 items
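
The talk ran this simulation with the catR package in R. For readers without R, the Python sketch below mimics the same idea under stated assumptions: a true theta of 1, a randomly generated 150-item 2PL bank (hypothetical parameters, not the real item bank), maximum-information item selection, a grid-based Bayesian ability estimate, and a fixed length of 30 items. It records the standard error after each item so the trade-off between test length and precision can be inspected.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (logistic metric assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_cat(true_theta=1.0, bank_size=150, max_items=30, seed=1):
    """Simulate one adaptive test; return the final estimate and the SE trace."""
    rng = random.Random(seed)
    # Hypothetical item bank of (discrimination, difficulty) pairs.
    bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0))
            for _ in range(bank_size)]
    grid = [-4.0 + 0.1 * i for i in range(81)]
    posterior = [math.exp(-0.5 * t * t) for t in grid]  # N(0, 1) prior
    theta_hat = 0.0  # fixed starting ability
    available = list(range(bank_size))
    se_trace = []

    for _ in range(max_items):
        def info(i):
            # Fisher information of item i at the current ability estimate.
            a, b = bank[i]
            p = p_correct(theta_hat, a, b)
            return a * a * p * (1.0 - p)

        item = max(available, key=info)  # most informative unused item
        available.remove(item)
        a, b = bank[item]
        correct = rng.random() < p_correct(true_theta, a, b)  # simulated answer

        # Bayesian update of the posterior over the ability grid.
        posterior = [
            w * (p_correct(t, a, b) if correct else 1.0 - p_correct(t, a, b))
            for t, w in zip(grid, posterior)
        ]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
        theta_hat = sum(t * w for t, w in zip(grid, posterior))
        se = math.sqrt(sum((t - theta_hat) ** 2 * w
                           for t, w in zip(grid, posterior)))
        se_trace.append(se)

    return theta_hat, se_trace


theta_hat, se_trace = simulate_cat()
for n in (10, 20, 30):
    print(n, "items: SE =", round(se_trace[n - 1], 2))
```

Repeating the run with different seeds or bank sizes gives a feel for how many items are needed to reach a target SE, which is the question slide 31 poses.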
  33. 33. Concerto
  34. 34. http://langtest.jp/concerto/?tid=20
  35. 35. Feedback Page
  36. 36. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  37. 37. Creating a CAT Program •Choosing the CAT System •Constructing an Item Bank (Pretest) •Calibrating the Item Bank •Determine Specifications & Feedback •Administering the CAT
  38. 38. 268 test takers (first-year university students) (1) CAT (2) Paper-pencil version (68 items; common-person linking) (3) Questionnaire “What did you think of the CAT result?”
  39. 39. Evaluation CAT vs. Paper-pencil
  40. 40. [Scatterplot matrix: CAT theta (Random, 30 Qs) vs. paper-pencil theta (Fixed, 68 Qs); coefficient shown: 0.92; n = 268]
  41. 41. Theta (n = 268): CAT (30 Qs) M = 1.71, SD = 1.13; P-P (68 Qs) M = 1.72, SD = 0.95
  42. 42. Theta (n = 268): CAT (30 Qs) M = 1.71, SD = 1.13; P-P (68 Qs) M = 1.72, SD = 0.95; Mean diff. = -0.02, 95% CI [-0.07, 0.04], d = 0.01, Power = .06
  43. 43. SE (n = 268): CAT SE (30 Qs) M = 0.39, SD = 0.11; P-P SE (68 Qs) M = 1.71, SD = 1.13
  44. 44. SE (n = 268): CAT SE (30 Qs) M = 0.39, SD = 0.11; P-P SE (68 Qs) M = 1.71, SD = 1.13; Mean diff. of SE = -1.32, 95% CI [-1.44, -1.19], d = 1.65, Power = 0.99
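
The effect sizes on slides 42 and 44 can be approximately reproduced from the reported means and standard deviations. The sketch below assumes Cohen's d computed with the root mean square of the two SDs as the standardizer; the slides do not state which formula was used, so this is one plausible reading rather than the author's method. The function name is hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using the root mean square of the two SDs (equal n assumed)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    return (mean_1 - mean_2) / pooled_sd


# Theta estimates: CAT (M = 1.71, SD = 1.13) vs. paper-pencil (M = 1.72, SD = 0.95).
d_theta = cohens_d(1.71, 1.13, 1.72, 0.95)
# Standard errors: CAT (M = 0.39, SD = 0.11) vs. paper-pencil (M = 1.71, SD = 1.13).
d_se = cohens_d(0.39, 0.11, 1.71, 1.13)

print(round(abs(d_theta), 2), round(abs(d_se), 2))
```

With the rounded summary statistics this gives magnitudes of about 0.01 and 1.64, close to the reported 0.01 and 1.65; the small gap is consistent with rounding in the slide values.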
  45. 45. Evaluation CAT vs. Paper-pencil Means: CAT = Paper-pencil SEs: CAT < Paper-pencil CAT measures the same ability with much more precision (with fewer items).
  46. 46. Evaluation Questionnaire
  47. 47. Result of the Questionnaire [Bar chart: response frequency (axis from 150 through 0 to 150) by response category: Very inaccurate, Inaccurate, Rather inaccurate, Rather accurate, Accurate, Very accurate]
  48. 48. Feedback Page
  49. 49. Future Research •More items in the item bank •Better formula for predicting other test scores •Improved feedback •Collaboration
  50. 50. Summary •Created a CAT program •Evaluation (1) CAT better than Paper-pencil (2) Feedback needs improvement.
