This document presents a data-driven method for detecting close submitters in online learning environments. The authors designed an algorithm to identify pairs or groups of accounts that consistently submit assignments very close in time. They applied the algorithm to MOOC data from two Coursera courses. The algorithm identified close submitter pairs and communities. Analysis found these close submitters had statistically different outcomes compared to other students, such as higher grades and certificate earning. The authors discuss implications and opportunities for future work improving the algorithm and studying close submitters.
1. A Data-driven Method for the Detection of Close Submitters in Online Learning Environments
José A. Ruipérez-Valiente a,b – @JoseARuiperez
Srećko Joksimović c – @s_joksimovic
Vitomir Kovanović c – @vkovanovic
Dragan Gašević c – @dgasevic
Pedro J. Muñoz-Merino a – @pedmume
Carlos Delgado Kloos a – @cdkloos
a Universidad Carlos III de Madrid
b IMDEA Networks Institute
c The University of Edinburgh
WWW’17, Perth
2. Overview
Detect pairs or groups of accounts that always submit their
assignments very close in time
Main goals:
Design and develop a general algorithm to detect these accounts
Apply it to our specific case study with Massive Open Online Course (MOOC) data
Analyze and discuss the results in different directions
Related to:
Emerging groups and collaboration in MOOCs (surveys and social activity)
Enrolling in a MOOC with friends improves completion rate [Brooks et al., 2015], and learners enjoy
watching videos in groups [Li et al., 2014]
Copying Answers using Multiple Existence Online (CAMEO) [Ruipérez-Valiente et al., 2016;
Northcutt et al., 2016; Alexandron et al., 2016]
Academic dishonesty (breaking honor code) and gaming the system (exploit system properties)
3. Basic problem description
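Given N accounts and M assignments, each account i has a vector of submission timestamps:

$$sp_i = [\, sp_{i,1} \;\; sp_{i,2} \;\; \cdots \;\; sp_{i,M} \,], \quad i \in \{1, \cdots, N\}$$

where $sp_{i,j}$ is the submission timestamp of student i for assignment j. Then we define SP as:

$$SP = \begin{bmatrix} sp_1 \\ sp_2 \\ \vdots \\ sp_N \end{bmatrix} = \begin{bmatrix} sp_{1,1} & sp_{1,2} & sp_{1,3} & \cdots & sp_{1,M} \\ sp_{2,1} & sp_{2,2} & sp_{2,3} & \cdots & sp_{2,M} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ sp_{N,1} & sp_{N,2} & sp_{N,3} & \cdots & sp_{N,M} \end{bmatrix}$$

Then we can define a distance matrix DS where $ds_{i,j} = diss(sp_i, sp_j)$:

$$DS = \begin{bmatrix} ds_{1,1} & ds_{1,2} & ds_{1,3} & \cdots & ds_{1,N} \\ ds_{2,1} & ds_{2,2} & ds_{2,3} & \cdots & ds_{2,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ ds_{N,1} & ds_{N,2} & ds_{N,3} & \cdots & ds_{N,N} \end{bmatrix}$$

Note:
• Matrix is symmetric and hollow
• High complexity: $O(N^2 \cdot d)$
• Keep set D of unique distances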
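The construction of DS and D can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `distance_matrix`, the placeholder dissimilarity `placeholder_diss`, and the toy timestamps are all assumptions made for the example. D is stored here as one distance per unordered pair of accounts, which is one plausible reading of "unique distances".

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def distance_matrix(
    sp: List[Vector], diss: Callable[[Vector, Vector], float]
) -> Tuple[List[List[float]], Dict[Tuple[int, int], float]]:
    """Build the N x N matrix DS and the collection D (one distance per pair)."""
    n = len(sp)
    ds = [[0.0] * n for _ in range(n)]  # hollow: the diagonal stays at zero
    d: Dict[Tuple[int, int], float] = {}
    # Symmetry means only the N * (N - 1) / 2 pairs with i < j are computed.
    for i, j in combinations(range(n), 2):
        value = diss(sp[i], sp[j])
        ds[i][j] = ds[j][i] = value
        d[(i, j)] = value
    return ds, d


def placeholder_diss(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for diss: largest absolute timestamp difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy data (made up): 3 accounts, 2 assignments, timestamps in hours.
toy_sp = [[0.0, 10.0], [1.0, 12.0], [5.0, 10.0]]
toy_ds, toy_d = distance_matrix(toy_sp, placeholder_diss)
```

Computing only the upper triangle halves the work but does not change the quadratic growth in N, which is why the complexity note above matters for courses with thousands of accounts.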
4. Problem operationalization
Assignments
Keep only graded quizzes and last submission to each quiz
Course accounts
Keep those accounts that submitted all graded quizzes
Dissimilarity measure
Mean Absolute Deviation (MAD)
Mean Squared Deviation (MSD)
$$diss_{MAD}(sp_i, sp_j) = \frac{1}{M} \sum_{k=1}^{M} \left| sp_{i,k} - sp_{j,k} \right|$$

$$diss_{MSD}(sp_i, sp_j) = \frac{1}{M} \sum_{k=1}^{M} \left( sp_{i,k} - sp_{j,k} \right)^2$$
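The operationalization on this slide can be sketched as follows. This is a hedged illustration under stated assumptions: the record layout `(account, quiz, timestamp)`, the function names `build_sp`, `diss_mad` and `diss_msd`, and the toy records are hypothetical, and "last submission" is taken to mean the latest timestamp per account and quiz.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# (account, quiz, timestamp in hours); this record layout is an assumption.
Submission = Tuple[str, str, float]


def build_sp(records: Iterable[Submission], quizzes: List[str]) -> Dict[str, List[float]]:
    """Return one timestamp vector per account that submitted every graded quiz."""
    last: Dict[Tuple[str, str], float] = {}
    for account, quiz, ts in records:
        if quiz not in quizzes:
            continue  # keep only graded quizzes
        key = (account, quiz)
        # keep only the last (latest) submission to each quiz
        last[key] = max(ts, last.get(key, ts))
    accounts = {account for account, _ in last}
    return {
        a: [last[(a, q)] for q in quizzes]
        for a in sorted(accounts)
        if all((a, q) in last for q in quizzes)  # drop incomplete accounts
    }


def diss_mad(sp_i: Sequence[float], sp_j: Sequence[float]) -> float:
    """Mean absolute deviation between two timestamp vectors."""
    return sum(abs(x - y) for x, y in zip(sp_i, sp_j)) / len(sp_i)


def diss_msd(sp_i: Sequence[float], sp_j: Sequence[float]) -> float:
    """Mean squared deviation between two timestamp vectors."""
    return sum((x - y) ** 2 for x, y in zip(sp_i, sp_j)) / len(sp_i)


# Toy records (made up): account "c" misses quiz "q2" and is dropped.
toy_records = [
    ("a", "q1", 10.0), ("a", "q1", 12.0), ("a", "q2", 30.0),
    ("b", "q1", 12.5), ("b", "q2", 31.0),
    ("c", "q1", 5.0),
]
toy_vectors = build_sp(toy_records, ["q1", "q2"])
toy_mad = diss_mad(toy_vectors["a"], toy_vectors["b"])
toy_msd = diss_msd(toy_vectors["a"], toy_vectors["b"])
```

With timestamps in hours, MAD is expressed in hours and MSD in squared hours. MSD penalizes a single large gap more heavily than MAD does.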
5. Case study
Two MOOCs on Coursera by the University of Edinburgh
Introduction to Philosophy (PHIL)
• One graded quiz per week, 6-12 questions per quiz
• 7 weeks
• 2359 accounts submitted all assignments
Music Theory (MUSIC)
• One graded quiz per week, 10-14 questions per week
• 5 weeks
• 5159 accounts submitted all assignments
Example of notation: $DS_{mus}^{MAD}$
6. Distances overview and distribution
Compute set D for both courses and dissimilarity measures
$D_{mus}^{MAD}$ and $D_{mus}^{MSD}$: 13,305,061 distances (i.e., (5,159 × 5,158)/2)
$D_{phil}^{MAD}$ and $D_{phil}^{MSD}$: 2,781,261 distances
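The sizes of the distance sets follow from counting unordered pairs of accounts, N(N − 1)/2. A short check using the account counts reported for the two courses (the variable names are chosen for this example):

```python
from math import comb

# Accounts that submitted all graded quizzes, as reported in the case study.
accounts = {"MUSIC": 5159, "PHIL": 2359}

# comb(n, 2) equals n * (n - 1) / 2, the number of unordered account pairs.
pairs = {course: comb(n, 2) for course, n in accounts.items()}

for course, count in pairs.items():
    print(f"{course}: {count:,} distances")
```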
7. Identifying close submitters
We follow these steps:
Select an initial ‘common-sense’ threshold: MAD = 30 minutes
Compute the quantile that this value represents: 4.81e-6 for MUSIC and 5.76e-6 for PHIL
Based on that initial threshold, test different quantiles and select one of them
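The two directions of this procedure (threshold to quantile, then quantile back to threshold) can be sketched as follows. This is an illustration and not the authors' code: `quantile_of` and `threshold_at` are hypothetical names, the empirical-quantile convention used here is one of several possible, and the toy distances are made up.

```python
from bisect import bisect_right
from typing import List


def quantile_of(threshold: float, sorted_d: List[float]) -> float:
    """Fraction of distances that are less than or equal to the threshold."""
    return bisect_right(sorted_d, threshold) / len(sorted_d)


def threshold_at(q: float, sorted_d: List[float]) -> float:
    """Largest distance among the lowest q fraction of the sorted distances."""
    k = int(q * len(sorted_d))  # number of pairs that fall inside the quantile
    if k == 0:
        raise ValueError("quantile too small for this number of distances")
    return sorted_d[k - 1]


# Toy MAD distances in hours (made up), sorted in ascending order.
toy_distances = sorted([0.2, 0.4, 0.5, 1.5, 3.0, 8.0, 12.0, 20.0, 30.0, 48.0])

# Step 1 and 2: start from a 0.5 h threshold and find the quantile it represents.
initial_quantile = quantile_of(0.5, toy_distances)

# Step 3: try a different quantile and read back the threshold it implies.
tested_threshold = threshold_at(0.5, toy_distances)
close_pairs = [d for d in toy_distances if d <= tested_threshold]
```

Working with quantiles rather than raw time thresholds makes the cut-off comparable across courses whose distance distributions differ.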
8. Close submitter pairs by quantile
Quantile   Measure          MUSIC       PHIL
6e-6       Account pairs    78          17
           MAD threshold    0.61 h      0.57 h
           MSD threshold    0.51 h²     0.51 h²
1e-5       Account pairs    132         28
           MAD threshold    0.9 h       1.25 h
           MSD threshold    1.15 h²     1.98 h²
5e-5       Account pairs    664         140
           MAD threshold    2.9 h       4.98 h
           MSD threshold    10.94 h²    38.13 h²
9. Identifying couples and communities
Based on the identified pairs of ‘close submitters’:
Build a graph in which accounts are nodes and each close submitter pair is connected by an undirected edge
MUSIC: 99 different accounts, 30 couples
PHIL: 26 different accounts, 11 couples
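One way to derive couples and communities from the pair list is to take the connected components of the undirected graph. The sketch below assumes that a "couple" is a component with exactly two accounts and a "community" is a component with more than two; the slides do not spell out these definitions, and the function name `components` and the toy pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def components(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the undirected graph defined by the pairs."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting every account reachable from start.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        result.append(group)
    return result


# Toy close submitter pairs (made up).
toy_pairs = [("u1", "u2"), ("u3", "u4"), ("u4", "u5"), ("u3", "u5")]
toy_groups = components(toy_pairs)
toy_couples = [g for g in toy_groups if len(g) == 2]      # assumed definition
toy_communities = [g for g in toy_groups if len(g) > 2]   # assumed definition
```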
10. Examining differences: ‘close submitters’ vs. others
Selected variables:
FinalGrade: The final numeric course grade (between 0 and 100)
GotCertificate: Boolean variable indicating whether the account earned a certificate
SubmissionCount: Number of submissions
ActiveDaysCount: Number of active days
DistinctVideoCount: Number of videos accessed or downloaded
DistinctThreadCount: Number of discussion topics accessed
11. Examining differences: ‘close submitters’ vs. others
MANOVA is significant for both courses and for both certificate and non-certificate earners
All independent t-tests are significant too
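For a single variable, an independent two-sample comparison can be sketched as below. The slides do not say which t-test variant was used; this example uses Welch's unequal-variance statistic as an assumption, computes only the t statistic (no p-value), and runs on made-up toy values that are not data from the study. The name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    # Standard error of the difference in means, using sample variances.
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy values (made up) standing in for one variable, e.g. ActiveDaysCount.
toy_close_submitters = [2.0, 4.0, 6.0]
toy_others = [1.0, 2.0, 3.0]
toy_t = welch_t(toy_close_submitters, toy_others)
```

A positive statistic indicates a higher mean in the first group. Judging significance additionally requires the degrees of freedom and a reference t distribution, which this sketch leaves out.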
12. Discussion and conclusions
‘Close submitters’ are a population that is statistically different from
the rest of the accounts
What are they actually doing?
Is it good or bad for learning achievement?
Implications for learning, research and certificate value
13. Future work
Clustering based on their indicators: assess different associations
Couple and community analysis: roles, good or bad for learning, etc.
Algorithm improvements: more robust, different criteria
Bigger longitudinal study with more MOOCs to increase generalizability
Other settings: e.g., online on-campus courses for credit