Automated essay scoring (AES) refers to the use of natural language processing (NLP), machine learning (ML), and artificial intelligence to grade essay responses from exams.
AES requires calibrating a specific model for each rubric and each prompt, based on actual data from human raters. That is, you can't just feed a pile of essays to some bot and tell it to grade them on "growth mindset." Instead, you have to define a very specific grading rubric, score at least a few hundred student essays by hand, and then use NLP and ML software to fit the models.
This PowerPoint provides a broad introduction to the topic. For more information, visit https://assess.com/smartmarq-ai-essay-scoring/
2. 2 | assess.com
What is automated essay scoring?
An application of machine learning and artificial intelligence to the specific problem of how to score student essays
3.
Machine learning… artificial intelligence… data science… big data… data mining… supervised learning… features…
New terms, not new ideas!
4.
New terms, not new ideas!
5.
What is AI?
Also… NOT JUST CHATGPT!
6.
AI is great, but not a panacea
7.
8.
Machine Learning
The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data. (Oxford)
9.
Artificial Intelligence
The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. (Oxford)
Usually leverages an ML model to solve a problem or do a task
10.
Machine learning & AI
• Common example is a logistic model to predict a binary outcome...
• Here’s one that we all use every day!
• Can we apply it to assessment…?
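To make the logistic-model bullet concrete, here is a minimal sketch of fitting one by gradient descent. The data (hours studied vs. pass/fail) and every name in it are made up for illustration; they do not come from the slides.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical training set: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)


def predict_proba(x):
    """Predicted probability of passing after x hours of study."""
    return sigmoid(b0 + b1 * x)


for h in (1.0, 3.0, 5.0):
    print(f"{h} hours -> P(pass) = {predict_proba(h):.2f}")
```

The same idea carries over to assessment: swap the single predictor for features extracted from a response, and the binary outcome for something like pass/fail on a rubric.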
11.
Machine learning & AI
• Another common example: image classification
12.
Machine learning & AI
Warning: You need a good training set!
13.
AI in Assessment: Examples
• Automated item generation
• Automated essay scoring
• Computerized Adaptive Testing
• Automated test assembly
• Item response theory
• Remote proctoring
• Factor analysis
• Cognitive diagnostic models
• Automated report interpretation
• Process Data
14.
15.
Two approaches to AI with essays
Evaluation
Submit your essay to Grammarly or ChatGPT for general feedback (not really AES)
Scoring
Score 20,000 essays on a 0-3 point rubric for grammar conventions, specific to writing about your dream travel destination: train a custom ML model as a second rater
16.
Automated essay scoring (AES)
Can produce massive time savings
But you need to train the model!!!!
17.
NLP: Document-Term Matrix
Student  leadership  School_board  turn_a_profit  Ludacris  similarly  and
1        1           0             0              1         0          3
2        0           0             1              1         0          5
3        0           0             1              0         0          2
4        1           1             0              0         0          7
5        2           4             1              0         2          9
• Your school board is considering the elimination of sports to help balance the budget. Write a letter to argue for or against this, with at least 3 reasons.
Note that a general AI like ChatGPT would not be able to pick up on these features
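A document-term matrix like the one on this slide can be built in a few lines. The sketch below is illustrative only: the two essays are invented, the tokenizer is deliberately crude, and the function names (`tokenize`, `document_term_matrix`) are hypothetical rather than taken from any particular AES package.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs, phrases=()):
    """Return (terms, rows) where rows[i][j] counts term j in document i.

    `phrases` lists multi-word terms that are joined with underscores
    before counting, in the spirit of School_board and turn_a_profit.
    """
    counts = []
    for doc in docs:
        text = " ".join(tokenize(doc))
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            text = re.sub(pattern, phrase.replace(" ", "_"), text)
        counts.append(Counter(text.split()))
    terms = sorted(set().union(*counts))
    rows = [[c[t] for t in terms] for c in counts]
    return terms, rows


# Two made-up responses to the school-board prompt.
essays = [
    "The school board should keep sports and leadership and teamwork.",
    "Sports do not turn a profit, and the school board knows it.",
]

terms, rows = document_term_matrix(
    essays, phrases=("school board", "turn a profit")
)
print(terms)
for student, row in enumerate(rows, start=1):
    print(student, row)
```

Real projects usually add stemming, stop-word handling, and n-gram detection, but the output has the same shape: one row per student, one column per term.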
18.
NLP: Feature Extraction
Student  leadership  School_board  turn_a_profit  Ludacris  similarly  and
1        1           0             0              1         0          3
2        0           0             1              1         0          5
3        0           0             1              0         0          2
4        1           1             0              0         0          7
5        2           4             1              0         2          9
• What can we use from the DTM?
• Minimum N?
• Any other features?
  • Word count
  • Grammar errors
  • Misspelled words
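Two of the questions on this slide can be sketched in code. The example below assumes "Minimum N" means dropping terms that too few students used, and it counts misspellings against a tiny made-up word list. Grammar errors are left out because they need a real grammar checker. All function names are hypothetical.

```python
import re

# The document-term matrix from the slide.
TERMS = ["leadership", "School_board", "turn_a_profit",
         "Ludacris", "similarly", "and"]
DTM = [
    [1, 0, 0, 1, 0, 3],
    [0, 0, 1, 1, 0, 5],
    [0, 0, 1, 0, 0, 2],
    [1, 1, 0, 0, 0, 7],
    [2, 4, 1, 0, 2, 9],
]


def filter_terms(terms, dtm, min_docs):
    """Keep only the terms used by at least `min_docs` students."""
    keep = [
        j for j in range(len(terms))
        if sum(1 for row in dtm if row[j] > 0) >= min_docs
    ]
    kept_terms = [terms[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in dtm]
    return kept_terms, kept_rows


def surface_features(essay, vocabulary):
    """Word count plus a naive misspelling count against `vocabulary`."""
    words = re.findall(r"[a-z']+", essay.lower())
    return {
        "word_count": len(words),
        "misspelled": sum(1 for w in words if w not in vocabulary),
    }


kept_terms, kept_rows = filter_terms(TERMS, DTM, min_docs=2)
print(kept_terms)  # "similarly" is dropped: only student 5 used it

# Hypothetical essay and word list, for illustration only.
print(surface_features("Sports teech leadership", {"sports", "leadership"}))
```

The extra features simply become more columns next to the term counts before a model is fit.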
19.
NLP: Model
Student  leadership  School_board  turn_a_profit  Ludacris  similarly  Human Score
1        1           0             0              1         0          0
2        0           0             1              1         0          1
3        0           0             1              0         0          2
4        1           1             0              0         0          2
5        2           4             0              0         2          3
• Most basic: linear regression
• Far too simple, so we use stronger models
  • Neural Network
  • SVMs
  • Cubist
• How much time do you have?!?!
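As a sketch of the "most basic" option, the code below fits a linear regression to the five rows in the table above. Two things are assumptions added here, not part of the slide: a tiny ridge penalty (needed because there are more columns than essays) and the helper names. With five essays the model simply memorizes the scores, so treat this as a demonstration of the mechanics and nothing more.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_ridge(features, scores, penalty=1e-3):
    """Least squares with an intercept and a small ridge penalty."""
    rows = [[1.0] + [float(v) for v in r] for r in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    for i in range(1, k):  # the intercept is left unpenalized
        xtx[i][i] += penalty
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    """Predicted score: intercept plus weighted sum of the features."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


# Feature columns and human scores from the slide's table.
FEATURES = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [2, 4, 0, 0, 2],
]
HUMAN = [0, 1, 2, 2, 3]

weights = fit_ridge(FEATURES, HUMAN)
for row, score in zip(FEATURES, HUMAN):
    print(f"human={score}  predicted={predict(weights, row):.2f}")
```

In practice the fit is checked on essays the model has never seen, which is where the stronger models listed above earn their keep.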
20.
NLP: Compare/Evaluate
Student  Human Score  Neural Network  SVM  Cubist
1        0            0               0    0
2        1            1               0    1
3        2            2               1    3
4        2            3               1    3
5        3            3               1    2
• Agreement
  • Actual
  • Cohen’s kappa
  • Quadratic-weighted kappa
• Correlation
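The statistics listed on this slide can be computed directly from the table. The sketch below uses the standard textbook definitions, reads "actual" agreement as the share of exact matches, and uses helper names that are hypothetical.

```python
import math
from collections import Counter


def exact_agreement(a, b):
    """Proportion of essays where both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories, weight):
    """1 - observed disagreement / disagreement expected by chance."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    observed = sum(weight(x, y) for x, y in zip(a, b)) / n
    expected = sum(
        weight(i, j) * pa[i] * pb[j] for i in categories for j in categories
    ) / (n * n)
    return 1.0 - observed / expected


def cohen_kappa(a, b, categories):
    """Unweighted kappa: every disagreement counts the same."""
    return weighted_kappa(
        a, b, categories, lambda i, j: 0.0 if i == j else 1.0
    )


def quadratic_weighted_kappa(a, b, categories):
    """Kappa where a miss by 2 points costs 4 times a miss by 1."""
    span = (max(categories) - min(categories)) ** 2
    return weighted_kappa(
        a, b, categories, lambda i, j: (i - j) ** 2 / span
    )


def pearson(a, b):
    """Pearson correlation between two lists of scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Scores from the slide's table.
human = [0, 1, 2, 2, 3]
models = {
    "Neural Network": [0, 1, 2, 3, 3],
    "SVM": [0, 0, 1, 1, 1],
    "Cubist": [0, 1, 3, 3, 2],
}
rubric = [0, 1, 2, 3]

for name, scores in models.items():
    print(
        name,
        round(exact_agreement(human, scores), 2),
        round(cohen_kappa(human, scores, rubric), 2),
        round(quadratic_weighted_kappa(human, scores, rubric), 2),
        round(pearson(human, scores), 2),
    )
```

For the Neural Network column this gives exact agreement of 0.80, Cohen's kappa of about 0.74, quadratic-weighted kappa of about 0.92, and a correlation of about 0.94. Five essays is far too few to trust any of these numbers; the point is only how they are computed.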
21.
22.
Actual data from a project
•Kaggle
23.
Actual data from a project
•Kaggle
24.
Actual data from a project
•Language acquisition task
•Coincidentally, also 0-11 points (rubrics: 4, 4, 3)
•Range of nativeness in examinees
•N=1696 total
•Three sizes of training set: 10%, 20%, 60%
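A rough sketch of how the three training-set sizes could be drawn from the 1,696 essays. The slide does not say how the split was made, so the random sampling, the fixed seed, and the use of the remaining essays as a held-out set are all assumptions.

```python
import random


def split_essays(n_total, train_fraction, seed=0):
    """Randomly split essay indices into training and held-out sets."""
    ids = list(range(n_total))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    n_train = round(n_total * train_fraction)
    return ids[:n_train], ids[n_train:]


N_ESSAYS = 1696  # total from the slide

for fraction in (0.10, 0.20, 0.60):
    train, held_out = split_essays(N_ESSAYS, fraction)
    print(f"{fraction:.0%}: train on {len(train)}, evaluate on {len(held_out)}")
```

Comparing model-human agreement across the three sizes shows how much hand scoring is actually needed before the model becomes useful.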
25.
Actual data from a project
•Write about your dream travel destination and why you want to visit
26.
Actual data from a project
27.
Actual data from a project
•Final output
28.
Actual data from a project
29.
30.
How to actually implement
•Build your own (see Kaggle and GitHub)
• R, Python, Java
•Commercial
• e-rater (ETS)
• IntelliMetric (Vantage)
• Blees.ai
• Project Essay Grade (Measurement Inc)
• SmartMarq (ASC)
31.
Questions?
nate@assess.com