Using semantic technologies for giving a formative assessment and supporting scoring in large courses and MOOCs: first experiences at UNED (2015-2017)

Miguel Santamaría Lancho, Mauro Hernández, Ángeles Sánchez-Elvira, José María Luzón Encabo, Guillermo de Jorge-Botana
UNED, Spain

FACULTY OF ECONOMICS, Department of Economic History and Applied Economics
Economic History Teachers Team: Miguel Santamaría, Mauro Hernández
FACULTY OF PSYCHOLOGY, Department of Developmental and Educational Psychology
G-Rubric software developers: José M. Luzón, Guillermo de Jorge
Department of Personality
G-Rubric user: Ángeles Sánchez-Elvira
Our goal was to improve formative assessment in online courses by giving personalised feedback.

Summary
1. Our challenge: how semantic technologies could help us to:
• give personalised feedback on open-ended questions
• support our tutors in scoring TMAs (tutor-marked assignments) more reliably
2. What G-Rubric is and how it works
3. Analysis of our experiences giving automatic formative feedback on open-ended questions
4. Proposal on how G-Rubric could cope with problems related to manual grading
5. Results and conclusions

How to give personalised feedback on
open-ended activities

FEEDBACK IS THE KEY FACTOR FOR:
• Personalising learning
• Fostering performance improvement
• Increasing motivation
01/11/2017 msantamaria@cee.uned.es

CHARACTERISTICS OF EXPECTED FEEDBACK
What kind of feedback do our students expect?
• Quick
• Iterative
• They love learning by trial and error
Only technology can provide this kind of feedback.

Technology-based feedback offers limited solutions
In the classroom:
• "Clickers" (Socrative, Kahoot)
In e-learning platforms:
• Quizzes
• Adaptive quizzes

LEARNING OUTCOMES
• Knowledge about Economic History
• Soft skills:
  • Analysis
  • Critical thinking
ASSESSMENT ACTIVITIES
• Multiple-choice questions
• Open-ended short questions about concepts, historical processes, etc.
• Commentaries on texts, maps, graphs and statistical data
Quizzes have severe limitations for assessing learning outcomes in the field of economic history.
Our challenge was how to give:
• quick and iterative feedback
• for open-ended questions
• in a sustainable way
• by using technologies

WHAT G-RUBRIC IS AND HOW IT WORKS

To implement G-Rubric in a subject, we need to follow 3 steps:
1st step: Build a specialised linguistic corpus (6 Economic History textbooks) and, from it, a Semantic Space (Corpus → Semantic Space).
2nd step: Develop activities based on short open-ended questions.
3rd step: Deliver the activities to our students through a web interface (Students → Web interface → Answer → In-built rubric space with the canon answer → Feedback).
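The slides do not describe how G-Rubric builds its Semantic Space or compares an answer with the canon answer. As a minimal sketch under that caveat, the code below swaps in a plain bag-of-words vector space: the vocabulary comes from the corpus (step 1), and a student answer is scored by cosine similarity against the canon answer (step 3). All function names are hypothetical, and a real semantic space would also relate different words with similar meanings, which this sketch cannot do.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens (accents kept)."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

def build_vocabulary(corpus: list[str]) -> set[str]:
    """Step 1, much simplified: the 'space' is just the set of corpus terms."""
    vocabulary: set[str] = set()
    for document in corpus:
        vocabulary.update(tokenize(document))
    return vocabulary

def vectorize(text: str, vocabulary: set[str]) -> Counter:
    """Term counts of a text, ignoring words that never occur in the corpus."""
    return Counter(t for t in tokenize(text) if t in vocabulary)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def compare_to_canon(answer: str, canon: str, vocabulary: set[str]) -> float:
    """Step 3, much simplified: similarity between a student answer and the canon answer."""
    return cosine(vectorize(answer, vocabulary), vectorize(canon, vocabulary))
```

An answer identical to the canon scores 1.0, an answer sharing no corpus terms with it scores 0.0, and partial answers fall in between.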

Example of a G-Rubric open-ended activity
Question: Mercantilism: policies and objectives.
Canon answer (or golden text):
“Mercantilism is a set of ideas and policies deployed in early modern Europe (16th, 17th and 18th centuries) aimed at strengthening the State through economic power, and especially focused on trade-balance surpluses and accumulation of precious metals (bullionism).
There are several types of policies, emphasizing: a) those focused on obtaining trade-balance surpluses (tariff protectionism, prohibitions on exporting gold, silver or raw materials, privileged trading companies, navigation acts, colonial monopolies); b) promotion of manufactures (import tariffs or prohibitions, laws against luxury, royal manufactures); c) other policies: favoring the birth rate, limitation or fixing of domestic prices.
They are often associated with the names of Colbert in France, or the English East India Company and the Dutch East India Company (VOC).”
Conceptual axes:
Definition: mercantilism, ideas, practices, state, economy, monarchy, strengthen, reinforce, increase, trade balance, favorable, bullionism, precious metals, gold, silver, privileges.
Trade policies: trade, protectionism, tariffs, prohibition, exports, imports, privileged companies, navigation acts, colonies, monopoly, fleet, merchant, surplus.
Manufacturing policies: manufactures, factories, royal, luxury, import substitution.
Context: Europe, England, France, Holland, Colbert, XVI, XVII, XVIII, modern, VOC, East Indies, West Indies.
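One way to picture the conceptual axes is as keyword lists that an answer covers to a greater or lesser extent. The sketch below is an assumption, not G-Rubric's actual scoring: for each axis it computes the fraction of the axis terms that appear in the answer, matching multi-word terms as whole words and treating hyphens like spaces. `AXES` is a hypothetical, abridged copy of the lists above.

```python
import re

# Hypothetical, abridged copy of the conceptual axes listed above.
AXES: dict[str, list[str]] = {
    "Definition": ["mercantilism", "state", "trade balance", "bullionism",
                   "precious metals", "gold", "silver"],
    "Trade": ["protectionism", "tariffs", "exports", "imports",
              "privileged companies", "navigation acts", "monopoly"],
    "Manufactures": ["manufactures", "factories", "royal", "luxury", "import substitution"],
    "Context": ["europe", "england", "france", "holland", "colbert", "voc", "east indies"],
}

def contains_term(text: str, term: str) -> bool:
    """True if `term` (possibly multi-word) occurs in `text` as whole words.

    Words inside a term may be separated by spaces or hyphens, so the term
    "trade balance" also matches "trade-balance".
    """
    words = (re.escape(w) for w in term.split())
    pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def axis_coverage(answer: str, axes: dict[str, list[str]]) -> dict[str, float]:
    """Fraction (0.0 to 1.0) of each axis's terms that the answer mentions."""
    return {
        axis: sum(contains_term(answer, term) for term in terms) / len(terms)
        for axis, terms in axes.items()
    }
```

Applied to the first paragraph of the canon answer alone, this sketch gives a high Definition coverage and zero coverage for the Trade and Manufactures axes, mirroring the kind of imbalance the graphical feedback is meant to reveal.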

An example to understand how G-Rubric works
G-Rubric web interface
The student selects an activity:
1.- Mercantilism
2.- Triangular Trade
3.- Coal and the Industrial Revolution
4.- Gerschenkron
5.- Second Industrial Revolution
6.- Consequences of WWI
7.- Bretton Woods
Selected activity: 1.- Mercantilism

The student enters an answer:
“Mercantilism is a set of ideas and policies deployed in early modern Europe (16th, 17th and 18th centuries) aimed at strengthening the State through economic power, and especially focused on trade-balance surpluses and accumulation of precious metals (bullionism).”

After submitting an answer, the student receives feedback consisting of:
• Content grade: to what extent the answer is correct
• Style grade: grammatical accuracy
• Graphical feedback: the position of the answer on each conceptual axis (Definition, Trade, Manufactures, Context) relative to an acceptance area
Submitted answer: “Mercantilism is a set of ideas and policies deployed in early modern Europe (16th, 17th and 18th centuries) aimed at strengthening the State through economic power, and especially focused on trade-balance surpluses and accumulation of precious metals (bullionism).”
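The feedback above combines a content grade with a per-axis view relative to an acceptance area. A minimal sketch, assuming per-axis scores in [0, 1] (for example from a coverage or similarity function), a hypothetical acceptance threshold of 0.6, and a content grade taken as the mean axis score on a 0-10 scale; none of these choices come from the slides, and the style grade is left out.

```python
from dataclasses import dataclass

@dataclass
class AxisFeedback:
    axis: str
    score: float      # answer's score on this axis, assumed to lie in [0, 1]
    accepted: bool    # True if the score falls inside the (hypothetical) acceptance area

def build_feedback(axis_scores: dict[str, float],
                   acceptance: float = 0.6) -> tuple[float, list[AxisFeedback]]:
    """Return a content grade on a 0-10 scale and one feedback item per axis.

    The content grade is the mean axis score times 10, rounded to 2 decimals;
    an axis is 'accepted' when its score reaches the acceptance threshold.
    """
    if not axis_scores:
        raise ValueError("at least one axis score is required")
    items = [AxisFeedback(axis, score, score >= acceptance)
             for axis, score in axis_scores.items()]
    grade = round(10 * sum(axis_scores.values()) / len(axis_scores), 2)
    return grade, items
```

Re-running `build_feedback` on an improved answer shows both effects described on the next slides: the grade goes up and more axes reach the acceptance area.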

After checking the feedback, the student improves the answer by adding new information:
“Mercantilism is a set of ideas and policies deployed in early modern Europe (16th, 17th and 18th centuries) aimed at strengthening the State through economic power, and especially focused on trade-balance surpluses and accumulation of precious metals (bullionism).
Among mercantilist policies, some stand out, namely those focused on attaining trade-balance surpluses through tariff protection, prohibition of exports of gold, silver and raw materials, creation of chartered trade companies, navigation acts and commercial monopolies.”

New feedback is provided:
• The content grade goes up
• The answer's position on each conceptual axis gets closer to the acceptance area

EXPERIENCES CARRIED OUT IN 2015-2016
Providing personalised formative assessment

Experiences using G-Rubric in 2015 and 2016
• The trials carried out focused on providing formative assessment
• Our goal was to promote deep learning through iterative feedback
• G-Rubric offers two main advantages regarding formative assessment:
  • It allows as many attempts as lecturers set
  • It gives students immediate, rich feedback
• All trials were conducted with first-year Business Administration Degree students

Two experiences (2015 and 2016): goals
Participants: 132 volunteers in 2015 and 120 volunteers in 2016
OUR QUESTIONS
• Could G-Rubric give accurate feedback?
• Could the feedback lead to improvement in subsequent answers?
• Could rich feedback increase the time devoted to the activity?
STUDENTS' OPINIONS ABOUT
• The impact on their motivation
• The usefulness for preparing the final exam
• The level of agreement with the grades received
The enriched graphical feedback increased:
• The number of attempts made by the students
• The amount of time devoted to the task

Content grade improvement
The average percentage score increased between the first and the last attempt
[Chart: average percentage score per activity (Activities 1 to 7), first vs last attempt]
We verified that students using the feedback were able to improve their answers
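The first-versus-last comparison can be computed per activity as below. This is a sketch with hypothetical names: the input maps each student to the scores of their successive attempts at one activity, in the order they were made.

```python
from statistics import mean

def attempt_improvement(attempts: dict[str, list[float]]) -> tuple[float, float, float]:
    """Mean first-attempt score, mean last-attempt score and their difference.

    `attempts` maps a (hypothetical) student id to that student's scores for
    one activity, in the order the attempts were made. Students with no
    attempts are ignored; a single attempt counts as both first and last.
    """
    series = [scores for scores in attempts.values() if scores]
    if not series:
        raise ValueError("no attempts recorded")
    first = mean(s[0] for s in series)
    last = mean(s[-1] for s in series)
    return first, last, last - first
```

Comparing each student's own first and last attempts keeps the figure tied to the same group of students, whatever the number of intermediate attempts.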

Students' agreement with the grades received
The level of agreement was higher in the last trial:
• First trial: 47% agreed very much or totally
• Last trial: 70% agreed very much or totally

G-Rubric had a positive impact on students’ motivation
Totally or very much: 65%
Totally or very much: 60%

Usefulness and positive value
• 80% of students considered G-Rubric very or totally useful for exam preparation
• More than 80% of students considered this experience very or totally positive

BEYOND FORMATIVE ASSESSMENT: HOW
SEMANTIC TECHNOLOGIES CAN HELP
TUTORS TO MARK TMAs

Are humans reliable at marking open-ended questions?
Manual grading has at least two problems:
• Inter-examiner variability: the grade depends on who marked the task
• Intra-examiner reliability: the grade depends on when the same tutor marked the task
Students view manual grading of open-ended questions as subjective
➢ In contrast, automated test assessment is perceived as more objective

Accidental double grading (2012 & 2013)
Two members of the academic team, independently and unknowingly, graded the same exams.
• The grade differential averaged 1.5 points out of 8
• Final grades differed substantially (> 37.5% did not obtain a passing grade)
[Chart: grade differential per doubly-assessed exam (exams 1 to 24); series: essay + short questions grade differential, essay grade differential]
Figure 5. Differential in grades for doubly-assessed exams (June 2012)*
*Referred to 24 Economic History final exams from the Barcelona-CUXAM Regional Center (June 2012)

Accidental double grading (2013)
[Chart: essay + short questions grade differential (grade 2 − grade 1) per doubly-assessed exam]
Figure 6. Differential in grades for doubly-assessed exams (June 2013)*
*Referred to 76 Economic History final exams from the Valencia-Alzira Regional Center (June 2013)
• We found the same pattern
• The grade differential averaged 0.9 points out of 8
• Final grades differed substantially (21% did not obtain a passing grade)
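The two indicators reported for the accidental double grading, the average differential and the change in pass/fail outcome, can be computed as below. A minimal sketch with hypothetical names: `pass_mark` is a parameter because the slides do not state the passing threshold, and the slides do not say whether their 37.5% and 21% figures were computed exactly this way. The differential is signed as grade 2 − grade 1, as in Figure 6.

```python
from statistics import mean

def double_grading_summary(grade_1: list[float], grade_2: list[float],
                           pass_mark: float) -> dict[str, float]:
    """Compare two independent gradings of the same exams."""
    if len(grade_1) != len(grade_2) or not grade_1:
        raise ValueError("need two equally long, non-empty lists of grades")
    diffs = [g2 - g1 for g1, g2 in zip(grade_1, grade_2)]
    # An exam 'disagrees' when it passes under one grading and fails under the other.
    disagreements = sum((g1 >= pass_mark) != (g2 >= pass_mark)
                        for g1, g2 in zip(grade_1, grade_2))
    return {
        "mean_differential": mean(diffs),
        "mean_abs_differential": mean(abs(d) for d in diffs),
        "pass_fail_disagreement": disagreements / len(diffs),
    }
```

Reporting both the signed and the absolute mean matters: differentials of opposite sign cancel in the signed mean but not in the absolute one.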

Correlations between grades assigned by examiners
                  2012   2013
n                 20     76
Global grade      0.82   0.88
Short questions   0.85   0.87
Text commentary   0.70   0.67
Despite these differences between examiners, we found:
• A high correlation for the global grade and the short questions
• A lower correlation for the text-commentary grades
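The figures above show high correlations for the global grade (0.82 and 0.88) alongside average differentials of 1.5 and 0.9 points: a correlation measures whether two examiners rank exams similarly, not whether they give the same grade. The sketch below computes Pearson's r from its definition; the hypothetical test data show a second examiner who adds exactly 1.5 points to every exam and still correlates perfectly. Python 3.10+ also provides `statistics.correlation` for the same computation.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```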

Comparing how tutors and G-Rubric mark TMAs
• G-Rubric could address both problems simultaneously:
  • Inter-examiner variability
  • Intra-examiner reliability
• A fragment of The Wealth of Nations by Adam Smith was selected for students to comment on.
• A rubric was built to minimise inter-examiner variability.
• A G-Rubric object, similar to those described above, was designed, and its axes were aligned with the rubric used by tutors to mark the students' assignments.
• The tutors graded these assignments using the rubric.
• The teaching team then used G-Rubric to grade the students' TMAs again.
• 252 TMAs were double-graded to compare G-Rubric and tutors' marks.
Our first step has been to compare how tutors and G-Rubric grade TMAs.

What have we found comparing grades given by tutors and G-Rubric?
Main descriptives of tutors' and G-Rubric marks (N = 252)
                  M      SD     Min    Max
Tutors' marks     5.95   1.45   1.55   8.54
G-Rubric marks    5.92   1.61   2.13   9.20
1.- No significant difference between means. An independent samples t test yielded no significant differences between the means of tutors' and G-Rubric marks, t(251), p = .720, ns.
2.- The Pearson correlation between G-Rubric's and tutors' marks yielded a large effect size (r = .549**).
** The correlation is significant at the 0.01 level (two-tailed).
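Because the same 252 TMAs were graded twice, the mean comparison can be sketched as a paired-samples t test, whose degrees of freedom are n − 1 = 251, matching the reported t(251). The sketch below computes the statistic and df only; the p-value needs the t distribution, which the standard library lacks (SciPy's `scipy.stats.ttest_rel` returns both). Names are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom (n - 1).

    x[i] and y[i] must be two marks for the same item (here, the same TMA).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```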

Grade distributions: analysis of frequencies
Percentage of grades in each points range (N = 252)
Points range   Tutors' grades   G-Rubric's grades
0-1            0.00             0.00
1-2            0.79             0.00
2-3            4.37             4.76
3-4            6.75             9.92
4-5            6.35             15.08
5-6            30.56            17.06
6-7            28.57            22.62
7-8            14.68            21.83
8-9            7.94             7.94
9-10           0.00             0.79
3.- G-Rubric's marks were more evenly distributed than the tutors' marks, which were concentrated in the ranges between 5 and 7 points.
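The frequency table bins marks into one-point ranges. A minimal sketch of that binning with a hypothetical function name; the slides do not say how a mark lying exactly on a boundary (e.g. 5.0) is assigned, so the half-open convention [k, k + 1), with 10 folded into the last range, is an assumption.

```python
def range_percentages(marks: list[float], top: int = 10) -> list[float]:
    """Percentage of marks in each one-point range 0-1, 1-2, ..., (top-1)-top.

    Assumption: ranges are half-open [k, k + 1), except that a mark equal to
    `top` is counted in the last range.
    """
    if not marks:
        raise ValueError("no marks given")
    counts = [0] * top
    for mark in marks:
        if not 0 <= mark <= top:
            raise ValueError(f"mark out of range: {mark}")
        counts[min(int(mark), top - 1)] += 1
    return [100 * c / len(marks) for c in counts]
```

With N = 252, each TMA contributes 100/252 ≈ 0.40 percentage points, so each 0.79% entry corresponds to 2 TMAs.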

Analysis of the homogeneity of G-Rubric and tutors' marks
              Tutor mark   G-Rubric mark   Mark difference
Chi-square    69.14        47.21           74.49
df            36           36              36
p             .001         .100            < .001
Kruskal-Wallis analyses for the evaluation of mark homogeneity across the 37 tutoring groups
4.- Tutors' marks showed significant inter-group variability, as did the mark difference.
In contrast, G-Rubric marks did not differ significantly across the same tutorial groups, indicating a higher level of homogeneity.
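The Kruskal-Wallis H statistic is compared with a chi-square distribution with k − 1 degrees of freedom, which is why 37 tutoring groups give df = 36 in the table. The sketch computes the tie-corrected H = [12 / (N(N + 1))] Σ R_i² / n_i − 3(N + 1) from mid-ranks using only the standard library; it does not compute the p-value (SciPy's `scipy.stats.kruskal` returns both H and p). Function names are hypothetical.

```python
from collections import Counter

def midranks(values: list[float]) -> list[float]:
    """Ranks 1..N in input order, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def kruskal_wallis(groups: list[list[float]]) -> tuple[float, int]:
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = midranks(values)
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied; H is undefined")
    return h / correction, len(groups) - 1
```

In this setting each group would hold the marks of one tutoring group, so 37 groups yield df = 36.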

Main conclusions
• Automated-assessment software such as G-Rubric is currently mature enough to be used with students.
• The kind of feedback offered helped improve students' performance.
• Results in terms of students' satisfaction are also encouraging.
• For teachers, the time and effort required are affordable.

Regarding how G-Rubric could support grading
• A remarkable correlation (r = .549) and no significant difference between the means were found.
• Tutors' marks showed significant inter-group variability.
• In contrast, G-Rubric's marks did not differ significantly across the same tutorial groups, indicating a higher level of homogeneity.
Our proposal:
Students' essays will first be graded using G-Rubric; tutors will then grade them again to validate or modify the grades given.
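The proposed two-pass workflow can be modelled as a record holding the automatic grade and an optional tutor decision. A hypothetical sketch: `tutor_grade = None` means the tutor validated the G-Rubric grade unchanged, and a number means the tutor modified it. The modification rate is one possible way to monitor how often tutors override the automatic grade.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GradedEssay:
    essay_id: str
    grubric_grade: float                  # first pass: automatic grade
    tutor_grade: Optional[float] = None   # second pass: None means "validated as is"

    @property
    def final_grade(self) -> float:
        """The tutor's grade if one was given, otherwise the validated G-Rubric grade."""
        return self.grubric_grade if self.tutor_grade is None else self.tutor_grade

def modification_rate(essays: list[GradedEssay]) -> float:
    """Share of essays whose automatic grade the tutor actually changed."""
    if not essays:
        raise ValueError("no essays given")
    changed = sum(e.tutor_grade is not None and e.tutor_grade != e.grubric_grade
                  for e in essays)
    return changed / len(essays)
```

A tutor who re-enters the same grade counts as validating it, so only real changes raise the rate.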

Download page
http://www.elsemantico.es/gallito20/download-eng.html

