This document discusses learning analytics for evaluating competencies and behaviors in serious games. It begins by introducing the presenters and their affiliations. It then discusses motivations for using games for learning and assessment, noting that games can assess complex skills and be engaging for learners. The document outlines the design, development, and evaluation process for game-based assessment, including gathering data during design and implementing assessment models. It provides an example game called Shadowspect and describes how evidence from the game informs constructs and algorithms to measure skills like efficiency. The document notes future work could include evaluating models with external measures and ensuring generalizability.
6. Main contributors to this research
José A. Ruipérez-Valiente
- BEng in Telecommunication Systems (UCAM); MEng in Telecommunications, MSc and PhD in Telematics (UC3M); Postdoc (MIT)
- 6 years working in learning analytics across many objectives and contexts
- Currently focused on large-scale trends in MOOCs and on game-based assessment
- Juan de la Cierva Researcher at UMU and affiliate at the MIT Playful Journey Lab
YJ (Yoon Jeon) Kim
- Executive Director of the Playful Journey Lab, located at MIT Open Learning
- Assessment scientist
- Focus on games and playful approaches for assessment
7. Topics related to this talk
- Games for Learning
- Game-based Assessment
- Learning Analytics
- … and Design (which cuts across numerous areas and applications)
9. "A game is a voluntary interactive activity, in which one or more players follow rules that constrain their behavior, enacting an artificial conflict that ends in a quantifiable outcome."
~ Eric Zimmerman (2004)
10. Why Games?
● Games are "flexible enough for players to inhabit and explore through meaningful play" (Salen & Zimmerman) (deep learning)
● The majority of children grow up playing games
● Learners have more freedom related to how much effort they choose to expend, and how often they fail and try again (Osterweil, 2014) (real life)
11. "Assessment is a process of reasoning from evidence. Therefore, an assessment is a tool designed to observe students' behavior and produce data that can be used to draw reasonable inferences about what students know."
~ Bob Mislevy
12. Why Games for Assessment?
● Games incorporate multiple pathways to solution(s), where learners can make meaningful choices and demonstrate multiple ways of solving problems
● They use complex and authentic problems → hard-to-measure constructs
○ We need to assess 21st century skills
● Games are motivating and engaging → more accurate assessment (Sundre & Wise, 2003)
● It doesn't feel like assessment (i.e., stealth assessment)
○ Less stressful situations for students
14. The Broad view of Learning Analytics
"…collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs…"
Source: First Learning Analytics and Knowledge Conference
15. The Learning Analytics data-driven Process
[Slide diagram: the learning analytics pipeline, with technology as an engine to enhance learning]
● Learning environments produce raw data generation (which raw data is necessary?)
● Feature engineering turns that data into meaningful features (what to obtain and how to do it?): exploration, correlation, clustering, prediction, causes…
● The processed data feeds visualizations, recommendations, and report generators (what to do with the processed data?)
● Conclusions generate feedback and close the LA loop
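The pipeline above can be sketched as a minimal data flow. All event fields and feature names here are illustrative, not part of any real platform:

```python
# Minimal sketch of the learning-analytics loop: raw events ->
# engineered features -> a report that feeds back to the course.
from collections import defaultdict

raw_events = [  # raw data generated by the learning environment
    {"student": "s1", "action": "attempt", "correct": False, "seconds": 40},
    {"student": "s1", "action": "attempt", "correct": True,  "seconds": 35},
    {"student": "s2", "action": "attempt", "correct": True,  "seconds": 20},
]

def engineer_features(events):
    """Turn low-level events into meaningful per-student features."""
    feats = defaultdict(lambda: {"attempts": 0, "solved": 0, "time": 0})
    for e in events:
        f = feats[e["student"]]
        f["attempts"] += 1
        f["solved"] += int(e["correct"])
        f["time"] += e["seconds"]
    return dict(feats)

def report(features):
    """Reporting step: summarize features for teachers (closes the loop)."""
    return {s: f"{f['solved']}/{f['attempts']} solved in {f['time']}s"
            for s, f in features.items()}

print(report(engineer_features(raw_events)))
# → {'s1': '1/2 solved in 75s', 's2': '1/1 solved in 20s'}
```

In practice each stage would be a separate service over a real event store, but the shape of the loop is the same.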
18. Design
● Design and implementation of the game system
○ Game mechanics that can generate evidence from the constructs, and a data infrastructure that effectively stores that evidence
○ The most iterative step of the process, with very frequent playtesting:
1. Start with paper prototypes
2. Move to drafty digital prototypes
3. End with advanced digital prototypes
● Data collection
○ Diverse audiences and contexts
○ Very important for game mechanics and the tech side
○ Face-to-face playtesting
○ Amazon MTurk
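The "data infrastructure that effectively stores that evidence" typically boils down to an append-only log of timestamped game events. A minimal sketch, with a hypothetical schema (the field and event names are illustrative, not Shadowspect's actual format):

```python
# Sketch of one telemetry record for a playtesting session,
# serialized as a JSON line for an append-only event log.
import json
import time

def make_event(student_id, event_type, payload):
    """Build one low-level game event as a JSON line."""
    return json.dumps({
        "student_id": student_id,
        "type": event_type,          # e.g. "puzzle_started", "shape_placed"
        "payload": payload,          # event-specific details
        "timestamp": time.time(),    # when the event occurred
    })

line = make_event("s1", "puzzle_started", {"puzzle": "Puzzle 1"})
print(json.loads(line)["type"])
# → puzzle_started
```

Keeping events this generic lets the evidence model evolve between playtesting iterations without changing the storage layer.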
24. Model development
● Implementation of the assessment machinery:
○ The process of turning evidence into constructs
○ Content knowledge assessment: following a traditional Evidence-centered Design
○ Cognitive and behavioral assessment: combining a knowledge engineering process and ML with expert labelling
● Data collection:
○ Same high school context, age, and settings
○ Two sessions of one hour each
○ Around 10 US high school classes and more than 200 students
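The "ML with expert labelling" step pairs engineered features with labels judged by experts, then fits a model. A deliberately tiny stand-in, using a single hypothetical feature and a one-threshold rule instead of a real classifier:

```python
# Sketch of fitting a simple model to expert labels: pick the feature
# threshold that best reproduces the experts' judgments.
def fit_threshold(features, labels):
    """Return the threshold on one feature that maximizes accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(features)):
        preds = [f >= t for f in features]
        acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Feature: fraction of puzzles solved; label: expert judged "efficient".
features = [0.2, 0.4, 0.7, 0.9]
labels = [False, False, True, True]
print(fit_threshold(features, labels))
# → 0.7
```

A real pipeline would use multivariate models and cross-validation, but the workflow (features in, expert labels in, decision rule out) is the same.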
27. Common Core Geometry Standards
● Competency model: We focus on the Common Core geometry standards
○ MG.A.1: Use geometric shapes, their measures, and their properties to describe objects (e.g., modeling a tree trunk or a human torso as a cylinder)
○ GMD.B.4: Identify the shapes of two-dimensional cross-sections of three-dimensional objects, and identify three-dimensional objects generated by rotations of two-dimensional objects
○ CO.A.5: Given a geometric figure and a rotation, reflection, or translation, draw the transformed figure
○ CO.B.6: Use geometric descriptions of rigid motions to transform figures and to predict the effect of a given rigid motion on a given figure
28. ECD Summary for Geometry Common Standards Assessment
● Collaboration with geometry specialist, game designer and assessment designer
○ Evidence model: We generate puzzles that generate evidence from the Geometry Common Standards
○ Task model: We map the relationship (none, weak or strong) of each puzzle with the common standard
○ Assembly model: We put all the evidence from a student together to assess their content knowledge
○ Presentation & Delivery model: Reports and dashboards by student/standard. Difficulty by exercise
| Puzzle   | MG.A.1 | GMD.B.4 | … |
| Puzzle 1 | Weak   | Weak    | … |
| Puzzle 2 | None   | None    | … |
| …        | …      | …       | … |

| Student   | Puzzle 1      | Puzzle 2         | … |
| Student 1 | OK, 1 attempt | OK, 3 attempts   | … |
| Student 2 | NA            | Fail, 5 attempts | … |
| …         | …             | …                | … |
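The assembly step above can be sketched in a few lines of Python. This is a hedged illustration, not the actual Shadowspect implementation: the puzzle-to-standard strengths and the `assemble_evidence` helper are hypothetical, mirroring the tables on this slide.

```python
# Hypothetical evidence model: each puzzle's relationship to each standard
# (relationships of strength "None" are simply omitted, as they carry no evidence).
PUZZLE_TO_STANDARD = {
    "Puzzle 1": {"MG.A.1": "Weak", "GMD.B.4": "Weak"},
    "Puzzle 2": {},
}

def assemble_evidence(puzzle_results):
    """Assembly model sketch: puzzle_results maps puzzle -> solved (bool);
    returns, per standard, the list of (puzzle, strength, solved) evidence."""
    report = {}
    for puzzle, solved in puzzle_results.items():
        for standard, strength in PUZZLE_TO_STANDARD.get(puzzle, {}).items():
            report.setdefault(standard, []).append((puzzle, strength, solved))
    return report

print(assemble_evidence({"Puzzle 1": True, "Puzzle 2": False}))
# {'MG.A.1': [('Puzzle 1', 'Weak', True)], 'GMD.B.4': [('Puzzle 1', 'Weak', True)]}
```

A presentation/delivery layer would then aggregate these per-standard lists into the per-student dashboards mentioned above.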
31. Knowledge Engineering Process
● We acquire knowledge about the construct that we want to measure
1. Reading about the construct
2. Conducting interviews with experts
3. Reviewing related scientific literature
● We algorithmically implement features that use the data/evidence that can inform the
construct that we want to measure
32. Our simplified case scenario now updates to:
[Diagram]
- Data → (data schema) → Evidence → (map) → Constructs
- Data → (algorithms) → Features → (inform) → Constructs
33. Efficiency construct
- Efficiency is the ability to do things well, successfully, and without waste. It
often specifically comprises the capability of a specific application of effort
to produce a specific outcome with a minimum amount or quantity of
waste, expense, or unnecessary effort (Wikipedia)
34. Evidence in Shadowspect related to efficiency
● Ability to do things well:
○ Solving puzzles correctly
● Expense or effort:
○ Time invested
○ Number of attempts to solve a problem
35. Mapping evidence into necessary data in Shadowspect
● We need: puzzles solved correctly, time invested and attempts
○ Necessary types of events for that:
■ puzzle_start (timestamp, student, puzzle_id)
■ leave_to_menu (timestamp, student, puzzle_id)
■ puzzle_attempt (timestamp, student, puzzle_id, correct)
37. Algorithm to compute features from data (pseudo-code)
# Note: a deliberately simplified version; it does not aim to be the most
# efficient implementation of this algorithm
def computeEfficiencyFeatures(student):
    student_events = getStudentEvents(student)  # all events for this student, in time order
    correct_puzzles = set()
    number_attempts = 0
    total_time = 0
    puzzle_start_event = None
    for event in student_events:
        if event['type'] == 'puzzle_start':
            puzzle_start_event = event
        elif event['type'] == 'leave_to_menu' and puzzle_start_event is not None:
            total_time += event['timestamp'] - puzzle_start_event['timestamp']
            puzzle_start_event = None
        elif event['type'] == 'puzzle_attempt':
            number_attempts += 1
            if event['correct']:
                correct_puzzles.add(event['puzzle_id'])
    # both ratios are per correctly solved puzzle
    attempts_per_correct_problem = number_attempts / len(correct_puzzles)
    time_per_correct_problem = total_time / len(correct_puzzles)
    return attempts_per_correct_problem, time_per_correct_problem
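The algorithm above can be exercised end to end on a tiny synthetic event log that follows the data schema from slide 35. Everything here is illustrative: the helper is self-contained (it takes the event list directly instead of looking up a student), and the timestamps and puzzle IDs are made up.

```python
def compute_efficiency_features(events):
    """Self-contained sketch: attempts and time per correctly solved puzzle."""
    correct_puzzles = set()
    number_attempts = 0
    total_time = 0
    start = None
    for e in events:
        if e["type"] == "puzzle_start":
            start = e
        elif e["type"] == "leave_to_menu" and start is not None:
            total_time += e["timestamp"] - start["timestamp"]
            start = None
        elif e["type"] == "puzzle_attempt":
            number_attempts += 1
            if e["correct"]:
                correct_puzzles.add(e["puzzle_id"])
    n_correct = len(correct_puzzles)
    return (number_attempts / n_correct, total_time / n_correct)

# One puzzle, solved on the second attempt, 60 seconds in the puzzle overall.
log = [
    {"type": "puzzle_start",   "timestamp": 0,  "puzzle_id": "p1"},
    {"type": "puzzle_attempt", "timestamp": 30, "puzzle_id": "p1", "correct": False},
    {"type": "puzzle_attempt", "timestamp": 50, "puzzle_id": "p1", "correct": True},
    {"type": "leave_to_menu",  "timestamp": 60, "puzzle_id": "p1"},
]
print(compute_efficiency_features(log))  # (2.0, 60.0)
```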
38. The previous general scenario
[Diagram]
- Data → (data schema) → Evidence → (map) → Constructs
- Data → (algorithms) → Features → (inform) → Constructs
39. Model for efficiency in Shadowspect
[Diagram: the general scenario instantiated for efficiency]
- Data (puzzle_start, leave_to_menu, puzzle_attempt) → (data schema) → Evidence (correct puzzles, time, number of attempts) → (map) → Construct: Efficiency
- Data → (algorithm: computeEfficiencyFeatures(student)) → Features (attempts_per_correct_problem, time_per_correct_problem) → (inform) → Construct: Efficiency
41. Expert Labelling and Machine Learning Process
● Two or more experts label text or video replays that can be visually assessed
○ We divide all level interactions into replays that can be labelled
○ Experts review the replays and label them for each construct that we want to measure
■ They may use rubrics; we look for good inter-rater agreement between experts (Cohen's kappa)
○ We implement a supervised machine learning assessment model based on these labels
● Challenges include achieving good inter-rater agreement, technical logistics, replay
resolution and the final implementation of the ML model
Example of a simplified text replay: 1. Start puzzle – 2. Create shape square – 3. Move square – 4. Create cone –
5. Rotate cone – 6. Change perspective – 7. Snapshot – 8. Move cone – 9. Submit – 10. Puzzle correct
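Inter-rater agreement between the two experts can be computed with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A pure-Python sketch; the four labels are made up for illustration.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' labels on the same items."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two experts label four replays as "efficient" (1) or "not efficient" (0).
print(cohens_kappa([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.5
```

Values near 1 indicate strong agreement; values near 0 mean the experts agree no more than chance, a signal that the rubric needs refinement before training an ML model on the labels.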
42. Expert Labelling and Machine Learning Process
[Diagram: the general scenario extended with expert assessment]
- Data → (data schema) → Evidence → (map) → Constructs
- Data → (algorithms) → Features → (inform) → Constructs
- Expert assessment provides labels; an ML/AI model trained on them links Features to Constructs
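The supervised step can be illustrated with a deliberately tiny stand-in: learning a decision threshold on a single feature from expert labels. A real model would use many features and a proper learner; the feature values, labels, and `fit_threshold` helper below are all hypothetical.

```python
def fit_threshold(feature_values, expert_labels):
    """Pick the threshold on one feature (e.g. attempts_per_correct_problem)
    that best reproduces the experts' 0/1 labels on the training replays."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(feature_values):
        predictions = [1 if x >= candidate else 0 for x in feature_values]
        accuracy = sum(p == y for p, y in zip(predictions, expert_labels)) / len(expert_labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold

# Replays with many attempts per solved puzzle were labelled 1 ("inefficient").
print(fit_threshold([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]))  # 8.0
```

Once fit, the model scores new, unlabelled replays automatically, which is what makes the expert-labelling effort scale.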
43. Evaluation
● We are not here yet! Future plans:
● Data collection:
○ Implementation as part of the curriculum in high
school classes
○ Demographic and school data with external measures
● Game analytics: How is the game being used by
students? Improvements, enjoyment…
● Model performance evaluation: How well are the
models working? What do teachers think about the
models?
● Psychometric evaluation: Are our models
correlated with external measures, e.g. traditional
geometry tests or validated spatial reasoning
instruments?
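The psychometric check boils down to correlating game-based model scores with an external measure, typically with Pearson's r. A minimal sketch; all the scores below are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

model_scores  = [0.2, 0.5, 0.6, 0.9]  # hypothetical efficiency-model scores
geometry_test = [55, 70, 72, 95]      # hypothetical external geometry test scores
print(round(pearson_r(model_scores, geometry_test), 3))  # 0.986
```

A strong correlation with a validated instrument is evidence of convergent validity; a weak one would send the constructs and feature algorithms back for revision.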
44. It’s time to say goodbye
But let’s conclude before that
45. Conclusions
● Alternative assessment method with great potential
○ Focus on complex constructs, can assess the process (not only outcomes), and is less stressful
and more enjoyable for students
● Highly challenging and multidisciplinary field, main problems:
○ Cost, scalability and generalization across GBA tools, model validity, trustworthiness, and
teacher literacy
● Some companies are already using GBA as part of their pre-hiring processes
● Difference between Assessment and assessment
● Opportunities for collaboration!
Evidence-centered design begins by identifying what should be assessed in terms of knowledge, skills, or other learner attributes. These variables cannot be observed directly, so behaviors and performances that demonstrate these variables need to be identified instead. The next step is determining the types of tasks or situations that would draw out such behaviors or performances.
Example around simple math knowledge in a game: