Creating an in-house computerized adaptive testing (CAT) program with Concerto

Presentation at JLTA (Japan Language Testing Association) 2013 National Conference

Presentation Transcript

  • Creating an in-house computerized adaptive testing (CAT) program with Concerto
    Atsushi MIZUMOTO (Kansai University)
    2013/09/20, JLTA at Waseda University
  • Computerized Adaptive Testing
  • CAT needs Item Response Theory
  • CTT vs. IRT (Ohtomo, 2009)
    Aspect                          CTT               IRT
    Test score                      Ordinal scale     Interval scale
    Ability estimate                Test-dependent    Test-independent
    Test result                     Person-dependent  Person-independent
    Measurement target (precision)  All test-takers   Individuals
    Equating/CAT                    Difficult         Easy
  • CAT Needs IRT [diagram]
  • History of CAT Research: 40 years (Thompson & Weiss, 2011); 30 years in language testing (Koyama, 2010)
  • Example of CAT
  • CBT ≠ CAT
  • How CAT Works http://www.j-cat.org/page/interpret
  • Advantages of CAT
      • Tailored for individual test-takers
      • Shorter test time
      • More precision (= smaller SE)
      • No need for random sampling
    www.geocities.jp/kosugitti/labo/irtnote.pdf
  • Purposes
      • Creating a CAT program
      • Evaluation
  • Creating a CAT Program
      • Choosing the CAT System
      • Constructing an Item Bank (Pretest)
      • Calibrating the Item Bank
      • Determining Specifications & Feedback
      • Administering the CAT
  • Moodle Plugin http://moodle2x.info
  • 1. Free account (150 test takers/month)
    2. Amazon Machine Images (free for a year)
    3. Installing it on your own server
  • Open-source; running R on a server (catR, RMySQL); HTML-based
  • Installation on a server https://code.google.com/p/concerto-platform/wiki/installation4
  • Wiki (Resources) https://code.google.com/p/concerto-platform/wiki/Resources?tm=6
  • Constructing an Item Bank (Pretest)
      • Vocabulary Test (Mizumoto, 2006): http://www.mizumot.com/files/VocSizeMeasure.pdf
      • Based on SVL 12,000 (up to the 8,000 level; 30 items for each level)
      • 716 university EFL learners
  • Sample Question
    (1) 心の, 精神の ("of the mind, mental")
        A. essential  B. creative  C. loose  D. mental
  • Calibrating the Item Bank
      • 240 items analyzed (Rasch model)
      • 150 items left for the item bank
      • Calibrated with the two-parameter logistic model (item difficulty & discrimination)
      • Upload the CSV file to Concerto
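The two-parameter logistic (2PL) model named on this slide can be illustrated with a short sketch. It is written in Python for illustration only (the talk itself runs R with catR), and the item parameters below are hypothetical values, not items from the calibrated bank. The sketch assumes the plain logistic form without the 1.7 scaling constant that some texts include. The Rasch model used in the first screening step corresponds to fixing the discrimination at 1.

```python
import math

def p_correct(theta, a, b):
    """2PL probability that a test taker with ability theta answers correctly.

    a = item discrimination, b = item difficulty (same scale as theta).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item, for illustration only.
a, b = 1.2, 0.5
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(p_correct(theta, a, b), 3), round(item_information(theta, a, b), 3))
```

An item is most informative where the ability equals its difficulty (theta = b), which is why a CAT tends to pick items whose difficulty is close to the current ability estimate.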
  • Specifications of CAT
      • Starting point (parameters, initial ability, randomized/fixed)
      • Ability estimation method (empirical Bayes and others)
      • Stopping rule (number of items/standard error)
      • Final ability estimation
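The four specifications listed on this slide map onto the parts of a simple adaptive loop. The following is a minimal Python sketch under stated assumptions, not the Concerto or catR implementation: it uses a 2PL bank, selects the most informative unused item, and estimates ability by expected a posteriori (EAP) on a grid with a standard normal prior as a stand-in for the estimation methods mentioned on the slide. The function names, the item parameters, and the simulated test taker are all hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Ability grid from -4 to 4 used for the EAP estimate (an assumption).
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap(responses, bank):
    """Posterior mean and SD of theta under a N(0, 1) prior.

    responses is a list of (item_index, score) pairs, score being 0 or 1.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, x in responses:
            a, b = bank[idx]
            p = p_correct(t, a, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(GRID, weights)) / total
    return est, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, max_items=30, se_target=None):
    """Run one adaptive test; answer(idx) must return 1 (correct) or 0."""
    theta, se = start_theta, float("inf")  # starting point
    responses, used = [], set()
    while len(responses) < min(max_items, len(bank)):
        # Item selection: most informative unused item at the current estimate.
        idx = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: information(theta, *bank[i]))
        used.add(idx)
        responses.append((idx, answer(idx)))
        theta, se = eap(responses, bank)  # interim ability estimate
        # Stopping rule: standard error target, else the item limit above.
        if se_target is not None and se <= se_target:
            break
    return theta, se, len(responses)  # final estimate reuses the interim method

# Hypothetical 150-item bank and a simulated test taker with true theta = 1.
rng = random.Random(2013)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 3.0)) for _ in range(150)]

def simulated_answer(idx, true_theta=1.0):
    a, b = bank[idx]
    return 1 if rng.random() < p_correct(true_theta, a, b) else 0

theta, se, n = run_cat(bank, simulated_answer, max_items=30, se_target=0.3)
print(f"estimate={theta:.2f} SE={se:.2f} items={n}")
```

In this sketch the final ability estimate is simply the last interim estimate; a real system may use a different estimator for the final score, as the separate "final ability estimation" setting on the slide suggests.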
  • Magis and Raîche (2012, p. 7)
  • How many items for what SE?
      • Simulation with the catR package (Magis & Raîche, 2012): http://www.jstatsoft.org/v48/i08
  • True Theta = 1, SE = 0.3, Stopping rule = 30 items
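The question of how many items are needed for a given SE can also be approximated without a full simulation. The sketch below is not the catR simulation from the slides: it is a rough Python approximation that assumes the true ability is known, takes the most informative items at that ability, and uses SE ≈ 1 / sqrt(sum of item information). This gives an optimistic lower bound on test length. The bank parameters are hypothetical.

```python
import math
import random

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def se_by_length(bank, theta):
    """Approximate SE after 1, 2, ... items, best items first."""
    infos = sorted((information(theta, a, b) for a, b in bank), reverse=True)
    ses, total = [], 0.0
    for info in infos:
        total += info
        ses.append(1.0 / math.sqrt(total))
    return ses

def items_needed(bank, theta, se_target):
    """Smallest test length whose approximate SE meets the target, or None."""
    for n, se in enumerate(se_by_length(bank, theta), start=1):
        if se <= se_target:
            return n
    return None

# Hypothetical 150-item bank; true theta = 1 and SE target 0.3 as on the slide.
rng = random.Random(1)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 3.0)) for _ in range(150)]
print("items needed for SE <= 0.3 at theta = 1:", items_needed(bank, 1.0, 0.3))
```

A real CAT does not know the true ability in advance, so it usually needs more items than this bound suggests; a simulation that administers items adaptively gives a more realistic answer.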
  • Concerto
  • http://langtest.jp/concerto/?tid=20
  • Feedback Page
  • Administering the CAT
  • 268 test takers (university first year)
      (1) CAT
      (2) Paper-pencil version (68 items), common person linking
      (3) Questionnaire: "What did you think of the CAT result?"
  • Evaluation: CAT vs. Paper-pencil
  • [Figure: scatterplot of CAT theta (Random, 30Qs) against paper-pencil theta (Fixed, 68Qs); r = 0.92, n = 268]
  • Theta estimates (n = 268)
      CAT (30Qs): M = 1.71, SD = 1.13
      P-P (68Qs): M = 1.72, SD = 0.95
      Mean diff. = -0.02, 95% CI [-0.07, 0.04], d = 0.01, Power = .06
  • Standard errors (n = 268)
      CAT SE (30Qs): M = 0.39, SD = 0.11
      P-P SE (68Qs): M = 1.71, SD = 1.13
      Mean diff. of SE = -1.32, 95% CI [-1.44, -1.19], d = 1.65, Power = 0.99
  • Evaluation: CAT vs. Paper-pencil
      Means: CAT = Paper-pencil
      SEs: CAT < Paper-pencil
      CAT measures the same ability with much more precision (with fewer items).
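The comparison reported on these slides (mean difference, 95% confidence interval, effect size d, and the correlation between the two sets of estimates) can be sketched for paired scores as follows. This is an illustration in Python, not the author's analysis: the confidence interval uses a normal approximation rather than the t distribution, d is computed with the pooled standard deviation (the slides do not say which formula was used), and the scores below are toy values, not the study data.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_summary(x, y, level=0.95):
    """Mean difference, approximate CI, Cohen's d, and Pearson r for paired scores."""
    n = len(x)
    diffs = [xi - yi for xi, yi in zip(x, y)]
    mean_diff = mean(diffs)
    se_diff = stdev(diffs) / math.sqrt(n)
    # Normal approximation to the CI; reasonable for large n such as 268,
    # too narrow for very small samples like the toy data below.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (mean_diff - z * se_diff, mean_diff + z * se_diff)
    # Cohen's d with the pooled SD of the two score sets (an assumption).
    pooled_sd = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2.0)
    d = abs(mean_diff) / pooled_sd
    # Pearson correlation between the two sets of scores.
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    r = cov / (stdev(x) * stdev(y))
    return {"mean_diff": mean_diff, "ci": ci, "d": d, "r": r}

# Toy theta estimates for five hypothetical test takers (not the study data).
cat_theta = [0.5, 1.2, 1.9, 2.4, 3.1]
paper_theta = [0.7, 1.1, 1.8, 2.6, 2.9]
print(paired_summary(cat_theta, paper_theta))
```

The same function could be applied to the two sets of standard errors to reproduce the second comparison on the slides.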
  • Evaluation: Questionnaire
  • Result of the Questionnaire
    [Figure: frequency of each response on a six-point scale: Very inaccurate, Inaccurate, Rather inaccurate, Rather accurate, Accurate, Very accurate]
  • Feedback Page
  • Future Research
      • More items in the item bank
      • Better formula for predicting other test scores
      • Improved feedback
      • Collaboration
  • Summary
      • Created a CAT program
      • Evaluation
          (1) CAT better than Paper-pencil
          (2) Feedback needs improvement