A Mini-Thesis Submitted For Transfer From MPhil To PhD: Predicting Student Success With Learning Analytics On Big Data Sets: Conditioning And Behavioural Factors
UNIVERSITY OF SOUTHAMPTON
Faculty of Physical Sciences and Engineering
Electronics and Computer Science
A mini-thesis submitted for transfer from
MPhil to PhD
Supervisors: Ed Zaluska (ejz), Dave Millard (dem)
Examiner: Mark Weal (mjw)
Predicting Student Success with
Learning Analytics on Big Data
Sets: Conditioning and Behavioural
Factors
by Adriana Wilde
July 10, 2014
UNIVERSITY OF SOUTHAMPTON
FACULTY OF PHYSICAL SCIENCES AND ENGINEERING
ELECTRONICS AND COMPUTER SCIENCE
Predicting Student Success with Learning Analytics on Big Data Sets:
Conditioning and Behavioural Factors
A mini-thesis submitted for transfer from MPhil to PhD
by Adriana Wilde
ABSTRACT
Advances in computing technologies have a profound impact on many areas of human
concern, especially in education. Teaching and learning are undergoing a (digital)
revolution, not only by changing the media and methods of delivery but by facilitating
a conceptual shift from traditional face-to-face instruction towards a learner-centred
paradigm with delivery increasingly becoming tailored to student needs. Educational
institutions of the immediate future have the potential to predict (and even facilitate)
student success by applying learning analytics techniques to the large amount of data
they hold about their learners, which includes a number of indicators that measure both
the conditioning factors (to which students are subjected) and the behavioural factors
(what students do) influencing whether a given student will be successful. More than
ever before, key information about successful student habits and learning context can be
discovered.
Our hypothesis is that collective data can be used to construct a model of success for
Higher Education students, which can then be used to identify students at risk. This
is a complex issue which is receiving increased attention amongst e-learning communities
(of which Massive Open Online Courses are an example) and administrators of
learning management systems alike. Smartphones, as sensor-rich, ubiquitous devices, are
expected to become an important source of such data in the imminent future, significantly
increasing the complexity of the problem of devising an accurate predictive model of
success.
This interim thesis presents the relevant issues in predicting student success using
learning analytics approaches by incorporating both conditioning and behavioural factors,
with the ultimate goal of informing behavioural change interventions in the context of
learning in Higher Education. It then discusses our work to date and concludes with a
workplan to generate publishable results.
CONTENTS

6 Conclusions
References
A Beyond this thesis
  A.1 How to help students reflect on their behaviour?
B Predictability of human behaviour
C Survey questions
D A word cloud of concerns
E The U-Cursos experience
F U-Campus Screenshots
G Chilean University Selection Test
H Additional research
  H.1 Audience response systems (zappers)
    H.1.1 Own experience with zappers
  H.2 Privacy
  H.3 Internet of Things
  H.4 Activity Theory
List of Figures

2.1 Multi-level categorisation model of conceptions of teaching
2.2 Smart badges: The Active Badge by Palo Alto Research Centre
2.3 Smart badges: The HBM (external and internal appearance)
2.4 Smart badges: The MIT wearable sociometric badge
2.5 A smartphone sensing architecture
2.6 Components of digital behaviour interventions using smartphones
4.1 Survey responses from UK students (excluding qualitative data)
4.2 Survey of University of Chile students: First screen
4.3 Survey responses from students of the University of Chile (excluding qualitative data)
4.4 U-Cursos view
4.5 Cramped look of the U-Cursos web interface from a smartphone
4.6 Access graph between 2010 and 2014 for U-Cursos
5.1 Data architecture at the University of Chile
D.1 Participants' answers to the question "Do any of the potential applications described cause you any concern? Which ones? Why?"
F.1 U-Campus courses catalogue
F.2 U-Campus module catalogue for the Computer Science course
G.1 Chilean University Selection Test (PSU): step one
G.2 PSU: step two
G.3 PSU: step three
G.4 PSU: step four
H.1 A commercial zapper: A TurningPoint™ response card
H.2 Zappers in action: Example exam question with student responses
H.3 Zappers in action: Appraising students' confidence in their self-assessment before (left slide) and after (right slide) the solution was discussed in class
List of Tables

3.1 What do students do?
4.1 U-Cursos services ranked in ascending order of popularity amongst users
5.1 Schedule of research work and thesis submission (a Gantt chart)
5.2 University Selection Tests (PSU) data fields
5.3 FutureLearn Platform Data Exports
A.1 Table of interventions
Chapter 1
Introduction
Recent developments in mobile technologies are characterised by a high integration of
information processing, connectivity and sensing capabilities into everyday objects. It
is now easier than ever to collect, analyse and exchange data about our daily activities,
revolutionising how humans live, work and learn. This is particularly true amongst
higher education students, who already generate a rich "data trail" as they navigate
their way towards successful completion of their studies.
Traditional learning analytics research focuses on the use of the data an educational
institution holds about its students to promptly identify poor performance so that
actions can be taken to encourage success. Struggling students in particular need to
be directed so that they can complete their courses more successfully (Baepler and Murdoch,
2010), as the failure to do so comes at a great cost, not only to these students but to
their institutions. This is a difficult issue, as measures of success are usually limited
to traditional indicators such as progression and academic performance. For a student,
an educational institution and the wider society, "success" would have to be defined by
retention, level of engagement and contentment as well as achievement of higher marks.
Against this context, Higher Education institutions have, in recent years, devoted
great efforts to support students and encourage them to succeed, by making learning
materials widely available to their students, for example. Furthermore, the greater
affordability of smartphones and the ubiquity of the Internet not only allows students
to access learning materials at any time and anywhere (although students may well
not see this as the primary benefit of such technologies), but also allows academics to
learn more about student habits and context than ever before. In other words: what do
students actually do, and could this information empower them to do better?
One valid approach to understanding how students learn may use technology to
gather data about the conditioning factors for their success as well as the behaviours
they adopt in their student lives. A second step would then use these indicators to
predict student success in time to perform an intervention on those students identified
as "at risk". The technology available for collecting activity data is not only becoming
more diverse and powerful but is also becoming widely available at decreasing costs,
hence increasing the potential for building "Big Data" collections on which sophisticated
prediction models could be devised.
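As an illustrative sketch only, the two-step approach above might combine conditioning and behavioural indicators into a single at-risk score. Every indicator name, threshold and weight below is a hypothetical assumption for illustration, not a result or design decision from this research.

```python
# Illustrative sketch: indicator names, thresholds and weights are
# hypothetical assumptions, not findings from this research.

def risk_score(conditioning, behavioural):
    """Combine conditioning factors (what students are subject to) and
    behavioural factors (what students do) into an at-risk score in
    [0, 1]; higher means more at risk."""
    score = 0.0
    # Conditioning factors: fixed context the student is subject to.
    if conditioning["entry_qualification"] < 0.5:   # normalised 0..1
        score += 0.3
    if conditioning["part_time"]:
        score += 0.2
    # Behavioural factors: observable activity, e.g. from an LMS audit trail.
    if behavioural["weekly_logins"] < 2:
        score += 0.3
    if behavioural["forum_posts"] == 0:
        score += 0.2
    return min(score, 1.0)

student_conditioning = {"entry_qualification": 0.4, "part_time": True}
student_behavioural = {"weekly_logins": 1, "forum_posts": 0}
print(risk_score(student_conditioning, student_behavioural))  # 1.0
```

A real model would of course learn such weights from data rather than fix them by hand; the sketch only shows how the two families of factors feed one prediction.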
Students of today have unprecedented access to a breadth of technology, and this
increase in access justifies in its own right a study into how to bring pervasive computing
ideas into learning analytics. Pervasive computing is a "post-desktop" computing model
under which greater processing power, connectivity and sensing are all available at a low
cost, facilitating a widespread adoption of sensor-loaded, powerful, mobile devices. This
active area of research is concerned with context-awareness, i.e. how tailored services
can be offered to users via interconnected computing devices that are sensitive to the
user's context as determined by the processing of sensor data. One area of application
of increasing interest is education. However, in this area much of the current interest
tends to focus on the delivery of learning resources to students (Laine and Joy, 2009,
and references therein) and the provision of virtual learning environments rather than
on identifying what students do.
The application of pervasive computing in the area of education exploits both the
ubiquity of devices and the increasing interest in new technology exhibited across the
current generation of students. Although there has been a great amount of research
in this direction (Laine and Joy, 2009; Hwang and Tsai, 2011, and references therein),
most of it has focused on the use of pervasive technologies to:
• enrich student learning experiences indoors and/or outdoors with digital
augmentation (Rogers et al., 2004, 2005);
• assess students (Cheng et al., 2005);
• increase access to content and annotation capabilities in support of peer-to-peer
learning (Yang, 2006);
• inform the learning activity design taking student context into account (Hwang,
Tsai, and Yang, 2008);
• increase interaction by broadening discourse in the classroom (Anderson and Serra,
2011; Griswold et al., 2004) or by playing mobile learning games (Laine et al.,
2010);
• enable ubiquitous learning in resource-limited settings, and observing the influence
of new tools in the adaptation of learning activities and community rules (Pimmer
et al., 2013);
• "deconstruct" everyday experiences into digital environments (Owens, Millard,
and Stanford-Clark, 2009; Dix, 2004).
These examples demonstrate the possibility of applying such technologies in education.
However, they did not set out to use contextual information in order to predict
or even understand student behaviours. To address this shortcoming, we will consider
context-aware computing methods and techniques that have been applied successfully in
the areas of healthcare, assisted living and social networking, and apply them to Higher
Education to complement knowledge gained through traditional educational analytics.
Many researchers have worked on the acquisition of context in general and on the
discrimination of human activity in particular, such as dos Santos et al. (2010); Lau (2012);
Bieber and Peter (2008); Huynh and Schiele (2005) and Khattak et al. (2011). Their
findings could be applied in this area of research too, especially as the rapid emergence of
the Internet of Things (IoT) means that the available sensor data will grow exponentially
(Manyika et al., 2011). In my opinion, the application of novel techniques from pervasive
computing to an investigation of student behaviour is worth exploring (Wilde, 2013;
Wilde, Zaluska, and Davis, 2013c,d). Indeed, I am interested in exploring the untapped
possibilities of extending learning analytics in a data-rich environment such as the one
that will be prevalent in the Internet of Things, where all specific activities and general
behaviour of students will leave "fingerprints of data" about them. This data trail
affords specific contextual information, capable of being analysed for measures of
engagement, collaboration and attainment, thereby enabling the provision of adequate
and timely feedback.
Within this research I have already considered certain aspects related to the study of
behaviour in the population of interest, akin to those in ethnographic methods, with my
specific contribution residing in the disconnect between intentions of privacy as declared
by smartphone users and the actual privacy levels evident in their phone interactions
(Wilde et al., 2013b), which is one of the findings from a survey described in detail later
in this report.
The remainder of this upgrade report is organised as follows: Chapter 2 considers
the characteristics of our learners, explores the state of the art in context-aware
technologies and their existing use in education, and looks at the predictability of
human behaviour and the type of data that is available in order to infer behaviour.
Chapter 3 examines the research question to be addressed during this research: what
are the measurable factors for the prediction of student academic success? Chapter 4
presents the research work to date, specifically the design and application of a survey of
Higher Education students (in the UK and in Chile), as well as information discovery
for a suitable dataset to explore these factors (on University of Chile students), which
will be prepared by combining data from the platforms U-Campus and U-Cursos
described here. These chapters lead into a plan for the remaining work, which is detailed
in Chapter 5. Finally, the conclusions of this upgrade thesis are presented in Chapter 6.
Chapter 2
Background and Literature Review
The general motivation for this research is assisting higher education students to achieve
success. As they are the subjects of interest, they are more precisely described in
Section 2.1. Then, I look into the use of digital technologies for learning (in Section 2.2),
both from the educational institutions' and their students' viewpoints, as well as ways
of using mobile and wearable technologies to learn more about students (Section 2.3).
Section 2.4 reviews existing literature on the identification of human behaviour through
these technologies. Finally, Section 2.5 appraises this review as a foundation for the
prediction of student success using a characterisation of students from measurable data
about their conditioning and behavioural factors.
2.1 Higher education learners today
To learn about student behaviour, it is useful to start by identifying salient
characteristics of the students in higher education today, considering those of the "typical"
student, as well as those pertaining to students who do not fit into that classification.
Specifically, I will look into two dimensions: one being the student's level of efficacy or
even engagement with digital technologies (in sub-section 2.1.1), and the other, the age
group to which the student belongs (sub-section 2.1.2).
2.1.1 A digitally-literate generation of students
Prensky's term digital natives (Prensky, 2001a) is one amongst many used to identify
those born "typically between 1982 and 2003 (standard error of ±2 years)" (Berk, 2009,
2010). (Terms include: Millennials, Generation Y, Echo Boomers, Trophy Kids, Net
Generation, Net Geners, First Digitals, Dot.com Generation and Nexters (Berk, 2009).)
Members of this group, by this definition, are now 11 to 32 years old, so the majority
of students in higher education today would belong to it. Furthermore, according
to Prensky (2001b), many may even process and interpret information differently
(allegedly due to the plasticity of the brain). These assertions would imply that what have
been regarded as traditionally effective study habits and behaviours for previous
generations are no longer effective and need to be reviewed to accommodate the needs of
the current generation of students.
Nevertheless, since only a fraction of the world's population accesses digital technologies
to achieve "native"-like fluency in their use, the term "digital natives" is not a fitting
description (Palfrey and Gasser, 2010), and for this reason (amongst others) it has
become less accepted in the current educational discourse. Education, experience,
breadth of use and self-efficacy are more relevant than age in explaining how people
become "digital natives" (Helsper and Eynon, 2010). As a response, Kennedy et al. (2010)
proposed a different classification based on a study comprising 2096 students in Australian
universities: "power users (14% of sample), ordinary users (27%), irregular users (14%)
and basic users (45%)". However, rather than a discrete classification, a more useful
typology is a continuum, along which individuals are placed depending on a number of
factors. Jones and Shao (2011) indicate that various demographic factors affect student
responses to new technologies, such as gender, mode of study (distance or place-based)
and whether the student is a home or an international one. A JISC report questions the
validity of certain characteristics attributed to this generation (Nicholas, Rowlands, and
Huntington, 2008). Examples are a preference for "quick information" and the need
to be constantly connected to the web, now shown to be myths: these traits are not
generational. Whilst Turkle (2008) notes that young people have digital devices always-
on and always-on-them, becoming virtually "tethered", this behaviour is not restricted
to young people. For these reasons, this term has increasingly been replaced by the
term digital residents and its counterpart digital visitors (White et al., 2012).
In any case, we acknowledge that many of our students today are not only engaged
with digital technologies on a daily basis, but have lived in a world in which digital
technologies in various forms have always existed. Even with the proviso that this
behaviour may not be generalisable "outside of the social class currently wealthy enough
to afford such things" (Turkle, 2008), it is an observable behaviour that is becoming
increasingly common as digital technologies have become more affordable than ever
before. This suggests that in the planning of a study involving higher education students
as participants, not only those in this generation should be considered, but also those
outside it, such as mature students.
(Other terms in use include: cybercitizens, netizens, homo digitalis, homo sapiens
digital, technologically enhanced beings, digital youth and the "yuk/wow" generation
(Hockly, 2011; Dawson, 2010).)
2.1.2 Mature students in HE
Ascribing generational traits to todayās learners is somewhat an overgeneralisation. As
Jones and Shao (2011) point out, global empirical evidence indicates that, on the whole,
students do not form a generational cohort but they are āa mixture of groups with var-
ious interests, motives, and behavioursā, not cohering into a single group or generation
of students with common characteristics. In particular, research on higher education
students often focus on the standard age band of students under 21 years of age, not
accounting for mature students (this term is typically used to refer to those who are over
this threshold upon entrance).
Even amongst this group, there are significant differences in behaviour and attainment.
Studies have found that older mature students were more likely to study part-time
than full-time, as family and work commitments have been acquired. In fact, 90% of
part-time undergraduate students are 25 years old or over, and as many as 67% are over
30 (Smith, 2008).
On this note, Baxter and Hatt (1999) argued that mature students could be
disaggregated according to age bands seemingly correlating with various levels of academic
success. Therefore, instead of considering standard and mature students solely (under
and over 21 respectively), they introduced the distinction between younger and older
matures, as those over 24 were more likely to progress into their second year, despite
a longer period of time out of education. In general, the younger mature learners were
more at risk of leaving the course than the older mature students.
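The disaggregation discussed above can be sketched as a simple grouping computation over student records; the records in the example are invented for illustration, not figures from Baxter and Hatt's study.

```python
# Sketch of disaggregating progression by the age bands discussed above:
# standard (under 21), younger mature (21-24), older mature (over 24).
# The records below are invented illustrative data.

def age_band(age):
    if age < 21:
        return "standard"
    if age <= 24:
        return "younger mature"
    return "older mature"

def progression_rates(records):
    """records: list of (age_on_entry, progressed_to_year_two) pairs.
    Returns the proportion progressing per age band."""
    totals, progressed = {}, {}
    for age, ok in records:
        band = age_band(age)
        totals[band] = totals.get(band, 0) + 1
        progressed[band] = progressed.get(band, 0) + (1 if ok else 0)
    return {band: progressed[band] / totals[band] for band in totals}

records = [(19, True), (20, True), (22, False), (23, True), (27, True), (30, True)]
print(progression_rates(records))
```

With real enrolment data, a higher rate for the older-mature band than the younger-mature band would reproduce the pattern Baxter and Hatt describe.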
However, even this division may still be a poor generalisation about (mature)
students, as besides their age, there is a myriad of more relevant factors affecting their
experience, such as their route into HE, their background and their motivation to study,
all of which are difficult (if not pointless) to use for a classification of mature learners
(Waller, 2006). An approach that acknowledges the individual characteristics of learners
is preferable to one that conflates them into a homogeneous group, as concluded by
Waller (2006); adopting such an approach requires educational providers to put in place
means of identifying these characteristics.
2.1.3 Summary
The literature reviewed in this area validates the need for individualised support and
feedback, delivered promptly and directly to each student, if it is to make an impact.
Another conclusion from this review is that students in higher education today have been
exposed to digital technologies (of which wearable and mobile devices are examples),
suggesting that these can become appropriate channels to facilitate this delivery.
2.2 Computers and learning
A natural consequence of the pervasiveness of digital technologies in recent years is that
they are now almost universally used in teaching and learning (to various degrees). In
fact, coinciding with the advent of the personal computer in the 1970s, the term Computer
Assisted Learning was first coined, alongside Computer Assisted Instruction and similar
terms; however, these are now less commonly used, as they are being replaced in the
educational discourse by the term e-learning. The former have been used to characterise
the use of computers in education, or more specifically, settings where digital content is
used in teaching and learning. In contrast, the latter is generally used only when the
content is accessed over the Internet (Derntl, 2005; Hughes, 2007; Jones, 2011; Sun et al., 2008).
2.2.1 Learning Management Systems
Learning Management Systems (LMS), also known as virtual learning environments
(VLE) and course management systems, are excellent examples of the application of
e-learning to support traditional face-to-face instruction. These are systems used in the
context of educational institutions offering technology-enhanced learning or computer-
assisted instruction; Blackboard™ and Moodle are the best-known examples.
Stakeholders may have different objectives for using an LMS. For example, Romero
and Ventura (2010) reviewed 304 studies indicating that students use LMS to personalise
their learning, reviewing specific material and engaging in relevant discussions as
they prepare for their exams. Lecturers and instructors use them to give and receive
prompt feedback about their instruction, as well as to provide timely support to students
(e.g. struggling students need additional attention to complete their courses more
successfully (Baepler and Murdoch, 2010), as the failure to do so comes at a great cost,
not only to these students but to their institutions). Administrators use LMS to inform
their allocation of institutional resources and other decision-making processes (Romero
and Ventura, 2010). These authors argue for the need to integrate educational data
mining tools into the e-learning environment, which can be achieved via LMS.
LMS are increasingly being offered by Higher Education institutions (HEIs), a
technological trend making an impact on these institutions. Another trend is the
proliferation of powerful mobile devices such as smartphones and tablets, from which
on-line resources can be accessed. (These two trends push HEIs to provide LMS access
via smartphones in a visually appealing and accessible way. These are inherent
requirements of the mobile experience, which is fundamentally different to the desktop
one (Benson and Morgan, 2013). Benson and Morgan present their experiences migrating
an existing LMS (StudySpace) to a mobile development, as a response to these pressures
and the pitfalls identified in the Blackboard Mobile™ app.)
It is worth noting that the majority of these systems have a client-server
architecture supporting teacher-centric models of learning (common scenarios have teachers
producing the content while students "consume" it) (Yang, 2006). To put this assertion
in context, pedagogic conceptions of teaching and learning are usually understood in
the literature as falling into one of two categories: teacher-centred (content-driven) and
student-centred (learning-driven) (Jones, 2011, and references therein). Figure 2.1 shows
these orientations as overarching the five main conceptions of teaching and learning,
which act as landmarks along a continuum of roles in learning. Deep learning occurs
at the bottom end of the scale, as opposed to shallow learning, which occurs at the top
end. When student-centred, computer-assisted learning can increase students'
satisfaction and therefore engagement and attainment. It is remarkable that the move
towards learner-centredness in Higher Education coincides with the trends towards
personalisation and user-centredness in Human-Computer Interaction and computing
technologies in general.
Figure 2.1: Multi-level categorisation model of conceptions of teaching (adapted from
Kember, 1997). The continuum runs from teacher-centred (content-driven) conceptions
(imparting information; transmitting structured knowledge), through student-teacher
interaction / apprenticeship, to student-centred (learning-oriented) conceptions
(facilitating understanding; conceptual change / intellectual development).
The trend towards a widespread use of mobile devices, identified earlier, brings an
increased number of opportunities for effecting the conceptual change from the
categorisation above, as it has the potential of making learning more student-centred than
before: it would take place wherever the student goes, whenever it suits the student
best. Additional opportunities to reach students, to either deliver content or to assess
their learning, are coupled with opportunities for other stakeholders at educational
institutions to gain insight into student achievement (typically progression and completion)
via learning analytics, as presented in the next subsection.
2.2.2 Learning analytics
As well as facilitating engagement, content delivery and even assessment and feedback,
digital technologies have increasingly been used to facilitate administrative
tasks and decision-making at educational institutions. In particular, in recent years
HE institutions have begun to use data held about their students for learning analytics
(Barber and Sharkey, 2012; Sharkey, 2011; Bhardwaj and Pal, 2011; Glynn, Sauer, and
Miller, 2003).
Learning analytics (also known as academic analytics and educational data mining)
is widely regarded as the analysis of student records held by the institution as well
as course management system audits, including statistics on online participation and
similar metrics, in order to inform stakeholders' decisions in HE institutions. Academic
analytics are considered useful tools to study scholarly innovations in teaching and
learning (Baepler and Murdoch, 2010). According to these authors, the term academic
analytics was originally coined by the makers of the virtual learning environment (VLE)
Blackboard™, and it has become widely accepted to describe the actions "that can be
taken with real-time data reporting and with predictive modeling", which in turn helps
to suggest likely outcomes from certain behavioural patterns (Baepler and Murdoch,
2010).
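As a deliberately minimal sketch of the predictive-modelling idea, the following fits a one-variable logistic model relating an engagement indicator (weekly LMS logins, with invented data) to course completion. A real system would use many indicators and an established library; this only illustrates the shape of the technique.

```python
import math

# Minimal logistic regression by gradient descent. The login/completion
# pairs are invented illustrative data, not drawn from any real cohort.

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(success) = sigmoid(w*x + b) by minimising cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            dw += (p - y) * x     # gradient of cross-entropy w.r.t. w
            db += (p - y)         # gradient w.r.t. b
        w -= lr * dw / n
        b -= lr * db / n
    return w, b

def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

logins = [0, 1, 1, 2, 4, 5, 6, 8]       # weekly LMS logins (invented)
completed = [0, 0, 0, 0, 1, 1, 1, 1]    # 1 = completed the course
w, b = train_logistic(logins, completed)
print(predict(w, b, 1), predict(w, b, 6))  # low vs high engagement
```

Once fitted, students whose predicted probability falls below a chosen threshold would be flagged for intervention, which is the "real-time data reporting" use the quoted definition points to.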
Educational data mining involves processing such data (collected from the VLE
or other sources) through machine learning algorithms, enabling knowledge discovery,
which is "the nontrivial extraction of implicit, previously unknown, and potentially
useful information from data" (Frawley, Piatetsky-Shapiro, and Matheus, 1992). Whilst
data mining does not explain causality, it can discover important correlations which
might still offer interesting insights. When applied to higher education, this might enable
the discovery of positive behaviours: for example, whether students posting more
than a certain number of times in an online forum tend to have higher final marks, or
whether attendance at lectures is a defining factor for academic success, or for any of
its measures such as "retention, progression and completion" (Sarker, 2014).
(The "anywhere, anytime" maxim driving pervasive computing is also a motivator for the
development of the next generation of e-learning. Rubens, Kaplan, and Okamoto (2014)
discuss the evolution of the field, aligning it to the advent of Web 2.0 and 3.0, central to
this paradigm of learning.)
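The correlation mining described above (forum posts versus final marks) can be sketched with Pearson's r; the paired values below are invented for illustration only, and, as noted, a strong r says nothing about causation.

```python
import math

# Pearson's r between two paired indicator lists. The forum-post and
# final-mark values are invented illustrative data.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

forum_posts = [0, 2, 3, 5, 8, 10]
final_marks = [48, 55, 52, 61, 70, 75]
r = pearson_r(forum_posts, final_marks)
print(round(r, 2))  # strongly positive here, but correlation is not causation
```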
2.2.3 Massive Open Online Courses
Developments in these digital learning technologies have facilitated the rise of massive
open online courses (MOOCs), occasionally referred to as "Massively-Open Online
Courses", where the already difficult issues of assessing and providing feedback increase
dramatically in complexity with classes of up to tens of thousands of learners (Hyman,
2012). Within this context, a considerable amount of interest has very recently been
devoted to the use of learning analytics too, for example:
• on social factors contributing to student attrition in MOOCs (Rosé et al., 2014;
Yang et al., 2013);
• on linguistic analysis of forum posts to predict learner motivation and cognitive
engagement levels in MOOCs (Wen, Yang, and Rosé, 2014).
2.2.4 Summary
The literature reviewed in this area evidences the impact of digital technologies on the
provision of support and feedback to learners and other stakeholders of educational
institutions, both in terms of facilitating learning and assessment (in MOOCs, for
example, but also in e-learning in general) and in terms of characterising the learners
using learning analytics. In doing so, it is possible to identify the variations amongst
learners to better facilitate the learning experience. An important category of digital
technologies used in education comprises portable, light-weight devices, which can
additionally function as sensor carriers, as presented in the following section.
2.3 Smart badges and smartphones
Until recently, cumbersome sensing equipment (often carried in backpacks) was required,
as shown in a survey of early developments in sensing technologies for wearable comput-
ers (Amft and Lukowicz, 2009). These have now been replaced by small, light-weight
sensors which can be embedded within badges and phones, for example.
Smart badges are identity cards with embedded processors, sensors and transmitters.
The concept is not new; in fact, the first of these wearable computers was developed
two decades ago by the Olivetti Research Laboratory (Cambridge) and then further
developed by Xerox PARC: the Active Badge (Want et al., 1992; Weiser, 1999), shown
in Figure 2.2.
More recently, smart badges have been used to study social behaviour, as with
Hitachi's Business Microscope (HBM) (Ara et al., 2011; Watanabe, Matsuda, and
Yano, 2013) and with its predecessor, the MIT wearable sociometric badge (Wu et al.,
4
MOOCs are occasionally referred to as "Massively-Open Online Courses".
Figure 2.2: Smart badges: The Active Badge by Palo Alto Research Centre
(Weiser, 1999)
2008; Pentland, 2010; Dong et al., 2012), shown in Figures 2.3 and 2.4. These badges,
containing tri-axial accelerometers, are able to capture some characteristics of the motion
of the wearer (e.g. being still, walking, gesturing). Thanks to additional sensors such as
infrared transceivers, they are also able to capture face-to-face interaction time. Being
lightweight and with a long battery life, these badges can be carried unobtrusively for
several hours a day.
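To illustrate the kind of inference such accelerometer data permits, the following sketch classifies a window of tri-axial samples by the variability of the acceleration magnitude. This is a minimal illustration only: the threshold values and the three-way classification are assumptions made for this example, not the algorithms actually used in the badges.

```python
import math

def classify_motion(samples, still_thresh=0.05, walk_thresh=0.5):
    """Label a window of tri-axial accelerometer samples (in g) as
    'still', 'gesturing' or 'walking', using the standard deviation of
    the acceleration magnitude.  Threshold values are illustrative."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    mean = sum(mags) / len(mags)
    sd = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    if sd < still_thresh:
        return "still"       # little variation: at rest
    if sd < walk_thresh:
        return "gesturing"   # moderate variation: gesturing or fidgeting
    return "walking"         # large variation: locomotion
```

In practice a deployed system would use overlapping windows and richer features (e.g. dominant frequency), but the variance-threshold idea above captures the essence of distinguishing stillness, gesturing and walking.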
Figure 2.3: Smart badges: Hitachi's Business Microscope
(external and internal appearance) (Ara et al., 2011)
Watanabe et al. (2012) used the HBM in an office environment, finding evidence
that the level of physical activity and interaction with others during break periods
(rather than during working activities) is highly correlated with the performance of
their team. Watanabe et al. (2013) then applied this methodology within a learning
Figure 2.4: Smart badges: The MIT wearable sociometric badge (Dong et al., 2012)
environment, this time using the smart badges on primary school children, observing
a strong correlation between the scholastic attainment of a class and the degree to
which its members are "bodily synchronised". In other words, classes whose members
are all either physically active or resting during the same periods perform better.
These authors also observed a correlation between attainment and the number of face-to-face
interactions per child during break. Their findings suggest that when children in a
class move in a cohesive manner, the class performs well overall, and also that the more
face-to-face interactions an individual has, the better their attainment.
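The notion of a class being "bodily synchronised" can be illustrated by correlating per-pupil activity time series; the sketch below scores a class by the mean pairwise Pearson correlation of its members' activity levels. This is an illustrative metric of my own construction, not necessarily the measure used by Watanabe et al. (2013).

```python
def pearson(a, b):
    """Sample Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def class_synchrony(activity_series):
    """Mean pairwise correlation of per-pupil activity levels: a score
    near 1.0 means the class tends to be active (or at rest) during
    the same periods; a score near -1.0 means members alternate."""
    n = len(activity_series)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(pearson(activity_series[i], activity_series[j])
               for i, j in pairs) / len(pairs)
```

A perfectly synchronised class (all activity curves rising and falling together) scores 1.0 under this metric, regardless of each pupil's absolute activity level.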
The use of badges by all participants is easily enforced in an environment with a
strict dress code, such as school uniforms. Since our population of interest is higher
education students, smartphones are probably more appropriate than smart badges as
sensor carriers, but it is nonetheless interesting to see how much can be learned from
sensor data, especially when combined with learning analytics: as in the case of Watanabe
et al. (2013), certain behaviours can be found to be related to a measure of success.
Smartphones present another advantage over badges. Equipped with ambient light
sensors, proximity sensors, accelerometers, GPS, camera(s), microphone, compass and
gyroscope, plus WiFi and Bluetooth radios, a variety of applications can be built to gather
a great range of sensed data (Lane et al., 2010). Thanks to their communication and
processing capabilities, smartphones could support a sensing architecture such as the
one depicted in Figure 2.5.
Contextual information can be inferred from the sensor data thus gathered, and the
context determined as, for example, location. However, it has long been accepted that
"there is more to context than location" (Schmidt, Beigl, and Gellersen, 1999). Contextual
information broadly falls into one of two types: physical environment context (such
as light, pressure, humidity, temperature, etc.) and human-factor-related context, such
as information about users (habits, emotional state, bio-physiological conditions, etc.),
their social environment (co-location with others, social interaction, group dynamics,
etc.), and their tasks (spontaneous activity, engaged tasks, goals, plans, etc.) (Schmidt
et al., 1999).
Figure 2.5: A smartphone sensing architecture (Lane et al., 2010).
Context acquisition is, however, important not just because of the possibility of offering
customised services that adapt to the circumstances. Context processing can increase
user awareness (Andrew et al., 2007), and thereby prompt alternative actions to better
achieve a desired goal given the current context, thus somewhat modifying an intended
behaviour.
2.3.1 Summary
The literature in this area indicates that sensor data has the potential to help us
understand human behaviour, both collectively and individually, as well as to capture the
context in which it is situated. This would be a suitable foundation for a behavioural
intervention which is aligned to the user's goals, and the smartphone is a suitable sensing
platform which could be used to understand users' behaviour as well as to support them
in achieving their higher goals, as discussed in the next Section.
2.4 Behaviour sensing and intervention
Despite its inherent complexity, researchers have shown that human behaviour is highly
predictable in certain contexts: in the case of human mobility, for example, the degree of
predictability has been quantified at 93% (Song et al., 2010). Evidence suggests that
behaviour can be "mined" and even predicted using sensors on phones or smart badges
(presented in the previous Section):
• identifying structure in routine (for location and activity) to infer organisational
dynamics (Eagle and Pentland, 2006);
• analysing behaviour based on physical activity as detected via smartphones (Bieber
and Peter, 2008);
• predicting work productivity based on face-to-face interaction metrics (Wu et al.,
2008; Watanabe et al., 2012);
• inferring friendship network structure with mobile phone data (Eagle, Pentland,
and Lazer, 2009);
• using mobile phone data to predict the next geographical location based on peers'
mobility (De Domenico, Lima, and Musolesi, 2012), even predicting when the
transition will occur (Baumann, Kleiminger, and Santini, 2013);
• classifying social interactions in contexts where a crowd disaggregates into small
groups (Hung, Englebienne, and Kools, 2013);
• predicting personality traits with mobile phones (de Montjoye et al., 2013);
• mining behaviour even from smart card data, which can be regarded as less personal
than phones or identity cards: Bahamonde et al. (2014) were able to deduce users'
home addresses from the data exposed by their bip! cards, which are used for payment
for public transport in Santiago de Chile.
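As a flavour of the kind of inference listed above, the following sketch guesses a card holder's home stop from raw tap records by taking the most frequent "last stop of the day". This is a deliberate simplification for illustration, not the actual method of Bahamonde et al. (2014), and the tap records in the test are hypothetical.

```python
from collections import Counter

def likely_home_stop(taps):
    """Guess a card holder's home stop from (date, time, stop) tap
    records: take the stop of the last tap of each day and return the
    most frequent one across days."""
    last_by_day = {}
    for date, time, stop in sorted(taps):  # sorted: later taps overwrite earlier ones
        last_by_day[date] = stop
    return Counter(last_by_day.values()).most_common(1)[0][0]
```

The point of the example is how little data is needed: a few days of journeys already expose a strong regularity around a single evening destination.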
From this research we can assert that, given sufficient information, some human behaviour
can be predicted (see Appendix B for more on its high predictability).
Specifically relevant to behaviour sensing in the educational context is the possibility
of "seeing" the learning community (Dawson, 2010) by studying the frequency and types
of interactions amongst learners using social network analysis (SNA), as factors such as
degree centrality5 are positive predictors of a student's sense of community, which is
measurable.
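Degree centrality (defined in the footnote as the number of connections a given node has) can be computed directly from a list of observed interactions, as in this minimal sketch; the student names are hypothetical.

```python
def degree_centrality(edges):
    """Number of distinct connections per node, for an undirected
    interaction graph given as (a, b) pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(links) for node, links in neighbours.items()}

# Hypothetical forum interactions amongst four students:
centrality = degree_centrality([("ana", "ben"), ("ana", "che"),
                                ("ana", "dan"), ("ben", "che")])
```

Here "ana", with three distinct contacts, would be flagged as the most central learner, and hence (following Dawson, 2010) the one with the strongest predicted sense of community.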
Srivastava, Abdelzaher, and Szymanski (2012) acknowledge that the use of smartphones
for sensing is becoming increasingly commonplace in human-centric sensing systems
(whether the humans are the sensing targets, sensor operators or data sources). They
identify various technical challenges to their wider adoption for these systems, one of
them being the difficulty of inferring a rich context in the wild. They warn that earlier
successes on inferences about mobility do not replicate with ease when making inferences
about "physical, physiological, behavioural, social, environmental and other contexts"
(my emphasis).
In terms of behavioural change, the state of the art includes:
• using computers as persuasive technologies6 (Fogg, 2003, 2009; Müller, Rivera-Pelayo,
and Heuer, 2012);
• promoting preventive health behaviours to healthy individuals through SMS, with
positive behaviour change in 13 out of 14 reviewed interventions (Fjeldsoe, Marshall,
and Miller, 2009);
• health-promoting mobile applications (Halko and Kientz, 2010);
• HCI frameworks for assessing technologies for behaviour change for health (Klasnja,
Consolvo, and Pratt, 2011);
• "soft-paternalistic" approaches to nudge users into adopting good behaviours to protect
their own privacy on mobile devices (Balebako et al., 2011);
• nonverbal behaviour approaches to identify emergent leaders in small groups (Sanchez-Cortes
et al., 2012);
• interactions of great impact and recall to facilitate behaviour change (Benford
et al., 2012);
• protocols for behaviour intervention for new university students (Epton et al., 2013);
• using smartphones for digital behavioural interventions (Lathia et al., 2013; Weal
et al., 2012);
• guidance for planning, implementation and assessment of behavioural interventions
for health (Wallace, Brown, and Hilton, 2014).
5
The degree centrality is defined by the number of connections a given node has.
6
Persuasive technologies are not to be confused with pervasive technologies: here the emphasis is on
"persuasion" rather than ubiquity.
In particular, Wallace et al. (2014) argue that interventions involve change processes
"linked to psychological theories of human behaviour, cognition, beliefs and motivation"
with a primary aim of improving experiences and well-being. This must be incorporated
in the planning and implementation of any behavioural intervention, in particular for
digital interventions. Lathia et al. (2013) identify the need to monitor and learn
about the behaviour before delivering an intervention, the effects of which must continue
to be monitored (Figure 2.6).
Figure 2.6: The three components of digital behaviour interventions using
smartphones (adapted from Lathia et al., 2013): Monitor (gather mobile sensing data;
collect online social network relationships and interactions), Learn (develop behaviour
models; infer when to trigger the intervention; adapt sensing) and Deliver (tailored
behaviour change intervention; user feedback via the smartphone).
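The monitor-learn-deliver cycle of Lathia et al. (2013) can be sketched as a simple loop; the function names below are placeholders of my own, standing in for the sensing, modelling, triggering and delivery components.

```python
def intervention_cycle(sense, build_model, should_trigger, deliver, rounds=3):
    """Illustrative monitor-learn-deliver loop (after Lathia et al., 2013).

    sense() gathers one round of data, build_model() learns from the
    accumulated history, should_trigger() decides whether to intervene,
    and deliver() sends the tailored intervention.  All four callables
    are placeholders supplied by the caller."""
    history, delivered = [], 0
    for _ in range(rounds):
        history.append(sense())           # Monitor
        model = build_model(history)      # Learn
        if should_trigger(model):         # infer when to trigger
            deliver(model)                # Deliver
            delivered += 1
    return delivered
```

For instance, sense() could report daily activity counts and should_trigger() could fire when the modelled average falls below a target, so the intervention is only delivered when the learned behaviour warrants it.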
Furthermore, Klasnja et al. (2011) assert that the development of such technologies
presupposes the need for large studies, suggesting that "a critical contribution of
evaluations in this domain, even beyond efficacy, should be to deeply understand how
the design of a technology for behavior change affects the technology's use by its target
audience in situ". Translating this experience to the educational context means that it
is not realistic to measure the success of the development by actual behaviour change,
but instead by the degree of understanding of its potential to influence behaviour.
2.5 Final comments
In the previous section, smartphones and badges were considered as sensing platforms for
behaviour. In addition to the data that could be collected implicitly (i.e. without explicit
intervention from the user) via these devices, the possibility of incorporating user-generated
data is also valuable; examples include life annotations (Smith, O'Hara, and Lewis, 2006) and
"lifelogging" (O'Hara, 2010; Smith et al., 2011). This data could potentially be used to
enrich that typically studied in learning analytics by giving an insight into an additional
dimension of students' lives: what do they do when they are not studying?
Through this (still ongoing) survey of the relevant literature, I have now gained a
greater understanding of the characteristics of Higher Education students (which may
condition their levels of academic success), the devices they use in their learning (in
and out of the classroom), and others from which their behaviour can be sensed, as
behavioural factors may complement conditioning factors in determining student success.
I also explored the state of the art in behavioural interventions, and what data can
be used to facilitate one. This is the foundation upon which key research components
have been created, which are presented in the next Chapter.
24. Chapter 3
A research question
The literature review presented in the previous Chapter surveyed the type of data and
techniques that can be used to understand and predict student behaviour. This Chap-
ter formulates the research question to be addressed, in order to plan an experimental
methodology and a road map for future work.
The research question stated in the introduction is "What are the measurable factors
for the prediction of student academic success?". This Chapter discusses conditioning
and behavioural factors affecting students' academic success and how to gather data for
measures of these factors against academic performance (a proxy for success).
3.1 What are the measurable factors for the prediction of
student academic success?
Most context-aware pervasive systems use location as the most important contextual
information available. Indeed, there is a wealth of research and commercial products
which offer location-based services, which focus on the use of readily available informa-
tion relevant to users in a given location. Not yet so well exploited, although gathering
significant scientific interest, is the use of physical activities as contextual information.
Other sources of contextual information that can become readily available include
the use of social media and learning analytics. Additionally, using sentiment analysis
on social media could help capture users' mood and general outlook over the observable
period. Data mining algorithms could be applied over the collected data; however, the
"ground truth" measure of what constitutes a successful student needs to be established
beforehand and, as explained earlier, is in itself a very difficult question. Proxy
measures of success can be used, such as academic achievement and progression, but
other aspects of student life such as level of engagement and contentedness (if somehow
measurable) could also be taken into account for a more complete portrait of a successful
student.
Table 3.1 lists a range of activities that students in higher education are likely to
engage in, as well as the means of gathering data which could lead to identifying a given
activity, assuming participants' consent and unrestricted access to data sources, and the
practical viability of creating such a data collection based on existing research. As
Table 3.1 suggests, a substantial amount of information about student behaviour can
be harvested and quantified (albeit exhibiting "Big Data" challenges for any practical
purposes). In other words, it is viable to investigate the behavioural factors affecting
student success if, as in traditional learning analytics (based on conditioning
factors1), these are analysed against metrics of academic success, such as retention,
progression and completion. This would give a more complete characterisation of a
student than ever before and, as a consequence, a more powerful and accurate prediction
of their success.
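As a toy illustration of analysing one behavioural factor against one academic metric, the following computes the Pearson correlation between lecture attendance rate and final marks. The per-student numbers are invented purely for illustration; any real analysis would use the institutional datasets discussed above.

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-student records, invented purely for illustration:
attendance = [0.95, 0.80, 0.60, 0.40]   # fraction of lectures attended
marks      = [78, 70, 62, 54]           # final module marks
r = pearson_r(attendance, marks)        # near +1: strong positive association
```

A correlation alone does not establish a causal conditioning or behavioural factor, but this is the shape of the analysis: each sensed feature in Table 3.1 becomes one column, each success metric (retention, progression, completion) another.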
I have now specified the research question, and will next discuss the practical work
conducted to date in pursuit of answers to aspects of this question, arising from the literature
review presented in Chapter 2. This is followed by the formulation of specific research
hypotheses, which will specifically qualify the scope of this research (in Chapter 5).
1
Conditioning factors such as, for example, those highlighted in Table 5.2, page 38.
Table 3.1: What do students do?
Columns: Activity | What could be measured? | Possible data source | Research using "similar" data sources

Attend lectures | Number of lectures attended during the semester, punctuality (by comparing calendar against actual arrival times) | GPS, University timetable, co-location with peer learners, wi-fi | Ara et al. (2011); Watanabe et al. (2013); Wu et al. (2008); Pentland (2010); Dong et al. (2012)
Use a VLE | Forum participation (frequency, number of posts), number of downloads | VLE records | Barber and Sharkey (2012)
Visit libraries | Number of items borrowed, length of the loan, medium, material type | Smartcard, Radio-Frequency Identification (RFID), library records |
Take exams | Academic performance measures (exam results, history of academic performance) | University records, VLE |
Travel | Mode of transport, distance travelled, periodicity | Accelerometer, transport smart card records, GPS | Hemminki, Nurmi, and Tarkoma (2013a); Bahamonde et al. (2014)
Meet other students | Co-location with other learners, certain locations (labs, etc.), noise levels at location | GPS, Bluetooth, microphone, smartcard, RFID tags | Hemminki, Zhao, Ding, Rannanjärvi, Tarkoma, and Nurmi (2013b)
Extra-curricular activities | Participation in societies, sports, games, etc. | VLE forums, Facebook | Wen et al. (2014)
Social networking | Number and frequency of tweets and Facebook posts, number of uploaded photos | Twitter, Facebook |
Physical activities | Frequency, level of activity (walk, cycle, run), fidgeting? | Accelerometer, gyroscope | Hung et al. (2013); Huynh (2008)
Play and rest | Number of hours watching TV or movies | Lifelogging, ambient light sensors, accelerometer | Smith et al. (2011)
Other activities of daily living | Eating and drinking (regularity of meals, frequency) | Lifelogging | Smith et al. (2011)
27. Chapter 4
Outcomes of Work to Date
In addition to the literature review presented in Chapter 2, other work to date has
involved the investigation of students' views via two surveys of Higher Education
students, one in English, of students in the UK (Section 4.1), and a version in Spanish,
of students at the University of Chile (Section 4.2), as well as an investigation into a
platform and its dataset from which student behaviour could be inferred: the U-Cursos
platform (Section 4.3).
4.1 Survey of HE English-speaking students
4.1.1 Methodology
A survey1 of Higher Education students, including undergraduate and postgraduate stu-
dents in several disciplines, was applied between the 16th August and the 18th October
2013. This survey focused on exploring the current use of smartphones by Higher Ed-
ucation students as well as establishing the acceptability of a future application. It was
developed iteratively, applying early versions amongst fellow researchers before deploying
it on the survey platform iSurvey. Data collected using early versions of the survey
were discarded, as their purpose was only to inform the design. The questions appearing
in the final version of the survey can be seen in Appendix C.
Some of the elements in the literature review informed the questionnaire design. For
example, in exploring the use of the smartphone, Questions 2 and 3 intended to
test the extent to which the characterisation of a virtually "tethered" student presented
in Section 2.1.1 is true. Similarly, the considerations presented in Section 2.1.2 helped
in determining the age groups within Question 5(b). In all, the information required fell
across the following areas:
1
Hosted at https://www.isurvey.soton.ac.uk/admin/section_list.php?surveyID=8728.
21
28. Chapter 4 Outcomes of Work to Date 22
• Smartphone ownership: to establish whether participants own (or intend shortly to
acquire) a smartphone and, if so, which brand, to confirm whether an Android
development would be suitable.
• Current use of the smartphone: in which participants are asked about the frequency
of their use of their phone across a range of activities.
• Perception of whether the smartphone helps or hinders participants' personal goals
in general, and their academic success specifically.
• Acceptability of a pervasive application that would provide behavioural "nudges",
and desired features of such an application.
• Other information controlled, including: discipline studied, level of study, modality
of studies (part-time or full-time) and views on adoption of technology.
The survey was publicised on various social networks (LinkedIn, Facebook and Twitter)
as well as by direct e-mail invitation to University of Southampton students2. Participants
were required to be students in Higher Education and over 18 years old. No
compensation was offered, as no detriment arose from participation in the research
other than an investment of ten minutes for the typical participant (of which participants
were duly warned beforehand), and participants were not required to give sensitive
information, as questions in the demographics section of the survey were not
open (instead, meaningful bands were offered for selection whenever possible). Many
questions could be skipped if the participant wished to do so3.
A total of 807 students attempted this questionnaire; however, many could not complete
it due to a limitation of the iSurvey platform, which hosted the survey4. After
discarding incomplete submissions and those from participants in academic institutions
outside the UK, data from 164 participants remained for analysis.
4.1.2 Findings
An analysis of the responses indicates that participants, despite actively using smartphones
in their daily lives, are hesitant to allow these devices to track their behaviour
2
Via Joyce Lewis, Senior Fellow for Partnerships and Business Development.
3
Compliant with recommendations by the British Educational Research Association (BERA), outlined
in "Ethical Guidelines for Educational Research", http://www.bera.ac.uk/system/files/BERA%
20Ethical%20Guidelines%202011.pdf. Also compliant with our institutional guidelines collated under
https://sharepoint.soton.ac.uk/sites/fpas/governance/ethics/default.aspx (both last accessed
28th February 2014). Ethics reference number: ERGO/FoPSE/7447.
4
At the time, there was a requirement for participants to have Flash-enabled devices to complete
surveys with slider questions (as was the case), so participants accessing via iPhones or iPads had
to re-start the survey on other platforms. It is not possible to estimate how many did (given that the
survey was anonymous). This problem has now been resolved (https://www.isurvey.soton.ac.uk/
help/changes-to-the-slider-question-type/) but unfortunately it affected this data collection.
and are unsure whether such feedback is desirable. On the one hand, participants report using
a smartphone for a number of activities, as shown in the charts in Figure 4.1.
Figure 4.1: Survey responses from UK students (excluding qualitative data).
The first 18 charts refer to activities that participants report undertaking with their
smartphones, which correspond to the 18 activities indicated in Question 2 of the survey.
A dominance towards lower numbers in the x axis corresponds to a high frequency in
performing a given activity as reported by the participants. For example, this applies to
making or receiving phone calls and text messages, using social networks and calendars
or reminders. Conversely, a dominance towards higher numbers in the x axis corresponds
to a low frequency, as is the case for blogging, searching for a job, and playing podcasts.
The next two charts in Figure 4.1 show the reported purpose for participants to use
their smartphone both in term time and outside of term. Whilst there is a preference
towards the use of their smartphones for personal reasons, as expected, this was much
more marked for outside of term periods. With regards to the perception of their phone
being a help or a barrier towards their personal goals and their academic success (the
subsequent two charts), most participants leaned towards the left end of the spectrum
(a help).
Figure 4.1 also indicates the reported desirability of features of a future smartphone
application, in charts 23 to 28. In this case, a preference towards the left indicates that
the given category is very desirable, and towards the right that it is not. Participants
were then asked whether they were concerned about any of these possible features5.
With varying degrees of acceptance, the majority welcomed features that would provide
them with information about themselves and their peers, with the exception of checking
in to learning spaces, which was not desired by the majority of the participants
in the survey.
Out of 164 participants, as many as 95 reported no concern about the features
mentioned. The remaining 69 participants had a variety of concerns, most prominently
regarding feedback on their behaviour and about their peers, as well as privacy concerns
regarding the capability of an application to register them when entering learning spaces.
Other privacy concerns focused on the data itself, and on who would access and control
it. Many commented that they would not want their smartphones to have these features,
in particular those regarding physical activity tracking (terms such as "surveillance",
"big brother" and "panopticon" were mentioned), but some others would welcome some
feedback on how they use their time and see the benefits of using such an application.
However, not all respondents have the same attitude towards adopting innovation6,
as they claim identification with one of Rogers' (1962) taxonomy classes: "Innovators,
Early adopters, Early majority, Late majority, or Laggards"7.
4.2 Survey of students from the University of Chile
4.2.1 Methodology
Once it was decided to use data from the University of Chile students, it became relevant
to adapt the survey previously described in Section 4.1 for its application on these
5
See Appendix D for a word cloud based on participantsā responses.
6
Rogers' taxonomy is succinctly summarised as follows: Innovators: first to adopt an innovation; Early
adopters: judicious in balancing financial risks; Early majority: adopt an innovation with early adopters'
advice; Late majority: adopt the innovation after the majority; "Laggards": the last to adopt an innovation
(Rogers, 1962).
7
Currently, this data is being analysed using NVIVO (for the open responses) and SPSS and
SigmaPlot, and further conclusions will be reported in the final thesis.
Figure 4.2: Survey of University of Chile students: First screen.
students8. As well as translating the content for each of the screens (see the example in
Figure 4.2), a question was removed as it was not relevant within this context (the concept of
part-time studying is not formalised via registration), and further options were added to the
educational stage question (as degree courses there typically last a minimum of five years, as
opposed to the UK's three-year courses).
4.2.2 Findings
The general trend of the responses is remarkably similar to that of the UK students,
with only two exceptions, which are explained in the following paragraphs.
Firstly, the Chilean participants seem to prefer phone calls to SMS messaging. This
may be explained by the fact that each SMS text is typically charged for (unlike in the
UK, where most providers offer a number of free messages as part of their services).
Given that Internet providers in Chile offer affordable flat-fare packages, Chilean
students may prefer communicating short texts via social networks (such as Twitter direct
messaging or Facebook chat) or messaging apps (such as WhatsApp and Viber).
A second difference worth noting is that whilst the UK participants perceive
their smartphones as helpful towards the achievement of both their personal goals and
their academic success, this is not so clear for the Chilean participants, who seem divided
in their responses. Although the justification for this difference is yet to emerge from
8
The version of this survey in Spanish is hosted at https://www.isurvey.soton.ac.uk/admin/
section_list.php?surveyID=10807 (closed at present).
Figure 4.3: Survey responses from students of the University of Chile (excluding
qualitative data). Note that it has one chart fewer than Figure 4.1, because there is no
distinction between full- and part-time registration at the University of Chile.
further analysis of the data, one possible explanation may lie with the stage in their
studies: it is conceivable that students who have not progressed as quickly as they had
expected may attribute their lack of progress to distractions related to the use of their
smartphones, which is, nevertheless, comparable to that of their UK counterparts.
4.3 U-Cursos
U-Cursos is a web-based platform designed to support classroom teaching. An in-house
development by the University of Chile, it was first released in 1999, when the Faculty of
Engineering required the automation of academic and administrative tasks. In doing so,
the quality and efficiency of their processes improved, whilst supporting specific tasks
such as coordination, discussion, document sharing and marks publication, amongst oth-
ers. Within a decade, U-Cursos became an indispensable platform to support teaching
across the University, used in all 37 faculties and other related institutions.
Figure 4.4: A typical U-Cursos view. Left: a list of current channels (courses,
communities and associated institutions). Top right: services available for the selected
channel. Bottom right: contents of a service. From Cádiz et al. (2014) (in Appendix
E).
The success of U-Cursos is demonstrated by the high levels of use amongst students
and academics, reaching more than 30,000 active users in 2013. U-Cursos provides
over twenty services to support teaching, as well as community and institutional "channels",
which allow students to network, share interests and engage in discussion about
various topics. Figure 4.4 shows a typical view of U-Cursos. On the left, a list of
"channels" available for the current term is shown. Channels are the "courses", "communities"
and "institutions" associated with the user. Typically, courses are transient,
so they are replaced with new courses (if any) at the start of the term. Communities
are subscription channels which are permanent and typically refer to special interest
groups, usually managed by students, with extracurricular topics. Finally, institutions
Figure 4.5: Cramped look of the U-Cursos web interface from a smartphone (Cádiz
et al., 2014).
refer to administrative figures within the organisation. The institutional channels are
used to communicate official messages on the news publication service and also to allow
students to interact using forums containing students from all of the programmes within
each institution.
A number of services are available for each type of channel. Users can select any
of the shown services and interact with it on the content area of the view. Note that
the majority of the services are provided for all types of channels, but courses also offer
academic services such as homework publication and hand-in, partial marks publication
and electronic transcripts of the final marks. These features make course channels official
points of access for the most important events in a course and have become indispensable
for students.
4.3.1 Current status
The current version of U-Cursos displays well on all regular-size screens (above 9 inches),
such as desktop computers and tablets. However, user interaction becomes cumbersome
on small displays, such as those of smartphones, as shown in Figure 4.5.
Figure 4.6: Access graph (hits per month) between 2010 and 2014 for U-Cursos, with the first term, second term and a student strike marked (Cádiz et al., 2014).
Another shortcoming is the lack of notification facilities, in particular those alerting
users to relevant content updates. The current setting requires users to access
the platform manually and repeatedly to confirm that the information is still current. This behaviour
can be observed in Figure 4.6, which shows access statistics for U-Cursos over the last four
years. There are clear peaks during the end-of-term periods9.
Additional factors may trigger an increased access rate to the service: students ask
more questions, download class material for the final exams, and coordinate projects,
amongst other activities. According to the users, there is a component of uncertainty which
encourages them to access the platform repeatedly during these periods. In response,
researchers from ADI designed a mobile application for the platform, currently in beta
testing.
A research visit to NIC Labs (University of Chile) took place from the 9th to the
19th of March 2014, to provide access to and understanding of the historical data collected
across the University, and also to study the platform itself. A paper on the collaboration
was written and submitted to the 28th British HCI Conference (see Appendix E).
U-Cursos offers a number of services, of which the most frequently used are shown
in Table 4.1, with an indication of how popular they are amongst users, as well as a list
of features students would like to see in U-Cursos (both for mobile and web).
The unique advantage of this data over any other dataset currently available
is that it covers over 30,000 users (staff and students) across the past ten years, so
it is in principle viable for both longitudinal and cross-sectional analysis. Whilst the mobile
platform is still in beta testing, having access to this wide range of data would enable
its analysis via educational analytics.
9 Terms run from March to July and from August to December in Chile. Some events may induce small variations on the actual dates. The university closes for summer holidays in February. Source: http://escuela.ing.uchile.cl/calendarios (in Spanish; last accessed 9th July 2014).
Table 4.1: U-Cursos services ranked in descending order of popularity amongst users.
The number in parentheses indicates the percentage of students who flagged the relevant
service or feature as especially useful or desirable (Cádiz, 2013, adapted).
Current services New mobile features New general features
My timetable (92) Granular push (20) Chat (39)
E-mail (74) Preview material (11) Library (7)
Notifications (70) Search for a room (10) Multiplatform (6)
Teaching material (58) More simplicity (9) Tablet support (6)
Calendar (50) Attendance log (5) Facebook integration (4)
Partial marks (46) People search (4) Campus map (3)
Forum (20) Offline access (4) Room status (2)
Dropbox (14) Book a lab (4) Staff timetable (2)
Guidance notes (11) Timeline (4) 'Read later' (2)
Coursework (7) Certificate requests (4) Virtual Classroom (2)
News (7) Android widget (4) Notes bank (1)
Access to past courses (5) Marks calculator (4) Health benefits (1)
Favourites (3) Google drive (3) Evernote integration (1)
Resolutions (2) Printing queues (2) Anonymous feedback (1)
Polls (2) Institutional mail (2) Foursquare integration (1)
Links (2) Enrolment (2) Group making (1)
Official transcripts (2) Course catalogue (1) Compare timetables (1)
Course administration (1) Find staff offices (1) Anonymous feedback (1)
Posters (1) Shortcuts (1) Reporting admin errors (1)
4.3.2 Summary
This chapter has described the practical experiences in my research, in particular those
related to the application of a survey amongst two different groups of HE students,
and those related to the process of securing a dataset from which a model of student
behaviour could be created, in answer to our first research question. This foundational
work informs the steps for future action described in the next chapter, which lays out
a plan for the following months up to the final thesis submission10.
10
Further work identified yet beyond the scope of this thesis is presented in Appendix A.
Chapter 5
Research Plan for Final Thesis
This research will explore the predictability of student success by applying learning analytics
to big data sets. In particular, I will analyse a rich 'data trail' of student activities
as gathered via their interactions with a Learning Management System (LMS), such as
the University of Chile's U-Cursos1. This data can be combined with data captured by
the institution at first enrolment, such as socio-economic indicators (typically used in
traditional learning analytics). From this analysis, a model of academic success will be
developed, providing insight into the factors influencing academic performance amongst
other measurable proxies for success.
5.1 Motivation
A primary motivation behind seeking such an insight is that it would facilitate the
identification of students 'at risk', and further enable behavioural interventions so that
students can be supported in becoming successful in their studies. A greater, lasting
goal would be to influence student behaviour via persuasive technologies, so that the
students themselves are empowered to effect a significant change in their study. However,
this is a long-term goal beyond the scope of the present research. Whilst the rich
interconnection necessary for a digital behavioural intervention is not yet fully supported,
and the existing student data is both incomplete and noisy for this specific purpose, we
can still gain a good understanding of how it might look by examining current student
data, from both the educational and the pervasive computing perspectives.
A central theme of this research is learning analytics, informed by relevant studies on
behavioural interventions and the application of pervasive computing to education. In
order to build on the traditional learning analytics research approaches (generally limited
1 Developed by the University of Chile's Information Technologies group (ADI, Área de Infotecnologías in Spanish).
to data controlled by the educational institution), I have also considered including data
that could offer additional insight into student behaviour, by articulating descriptions
of what successful students do even when they are not studying.
5.2 Research question and research hypotheses
The general research question to be addressed is:
'What are the measurable factors for the prediction of student academic success?'
This is a very wide-ranging question, which includes a number of conditioning factors
(e.g. what students bring with them before starting Higher Education) as well as
behavioural ones (e.g. how students engage in Higher Education studies). To focus
the research, a number of specific research hypotheses have been identified:
H1: Traditional learning analytics on conditioning factors are suitable predictors
of success. Specifically, are socioeconomic indicators and student competences2
acquired during secondary schooling adequate predictors of student
performance in Higher Education? Existing research has strongly indicated this
to be true; however, the work published to date has limitations, such as:
(a) small sample sizes: for example, Bhardwaj and Pal (2011) studied data
from up to 300 participants;
(b) studies predicting only persistence or attrition rather than measured academic
performance (Glynn et al., 2003).
My investigation of H1 is designed to extend the scope of the analysis and remove
some of these limitations. However, since this and other work published to date
highlight some factors as good predictors of student success, I will especially look
for evidence of such a correlation in the data, to either support or falsify hypothesis
H1. These factors are: socio-economic factors, such as age and parents' level of
education, as well as academic performance in previous learning (such as high-school
marks).
H2: Learning analytics data in the traditional sense can be significantly
enriched by incorporating data from social media and other student-
generated data. Students interacting with the LMS leave a data trail which can
be quantified. Engagement in social forums within the U-Cursos platform is an
additional variable that can be incorporated in the prediction model. Does the
model become more accurate by doing so?
2 By student competences we refer to those measured by the University Selection Test in Chile (PSU, Prueba de Selección Universitaria in Spanish (Dinkelman and Martínez A., 2014)), which is used for university admissions across the country.
H3: Smartphone data can be used to inform the prediction model. In par-
ticular, do measures of engagement with the U-Cursos mobile platform correlate
with those in the web-based version (for which there is substantial historical data
available)?
To test hypothesis H1, I will work with institutional data held by the University
of Chile via the platform U-Campus3, which holds databases of administrative data
related to each student (e.g. status, courses in which they are enrolled, enrolment, progression,
withdrawal and completion), as well as the socio-economic indicators reported
at the time the PSU test (Prueba de Selección Universitaria) was taken. U-Campus
offers a number of services to five4 faculties across the university: those services
related to curriculum management (e.g. enrolments, course programmes, prospectuses,
accreditation), and to administration and personal management (e.g. a repository of University
Council minutes, accreditation statistics).
U-Campus is of interest for this research since the student data held (as outlined
above) could well be used to predict success if H1 is true. In particular, following
on from previous research (Sarker, 2014; Bhardwaj and Pal, 2011; Glynn et al., 2003), I expect
to find a correlation between academic performance and socioeconomic indicators such
as the education level and occupation of the parents.
To test hypothesis H2, I will include in the analysis log data from U-Cursos in-
dicating the time and frequency of interactions with the LMS, including not only the
instances in which students upload content (e.g. submitting coursework) but also the
instances in which they retrieve information of interest (e.g. assessment results and
course information).
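The kind of aggregation this implies can be sketched in a few lines. The snippet below is illustrative only: the field names (student_id, timestamp, action) and the sample rows are invented, and the real U-Cursos log schema will certainly differ.

```python
import csv
from collections import defaultdict
from datetime import datetime
from io import StringIO

# Invented log rows standing in for a U-Cursos interaction log export.
SAMPLE_LOG = """student_id,timestamp,action
s1,2013-03-04T10:15:00,download_material
s1,2013-03-04T10:20:00,view_marks
s1,2013-03-05T09:00:00,submit_coursework
s2,2013-03-04T11:00:00,view_marks
"""

def engagement_features(log_file):
    """Aggregate, per student, the count of each interaction type
    plus the number of distinct days with any activity."""
    counts = defaultdict(lambda: defaultdict(int))
    days = defaultdict(set)
    for row in csv.DictReader(log_file):
        sid = row["student_id"]
        counts[sid][row["action"]] += 1
        days[sid].add(datetime.fromisoformat(row["timestamp"]).date())
    return {sid: {**dict(actions), "active_days": len(days[sid])}
            for sid, actions in counts.items()}

features = engagement_features(StringIO(SAMPLE_LOG))
```

Per-student summaries of this shape (type and frequency of access) can then be fed into the correlation and classification analyses planned for WP5.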
In testing hypothesis H3, I will follow closely the development of the mobile extension
of U-Cursos, which aims firstly at improving accessibility and usability, and
secondly at exploiting smartphone capabilities, such as nudges via granular pushes for the
delivery of information and the possibility of incorporating location data into the timestamp
of an interaction. Rather than investigating the effectiveness of these additions,
I am interested in proposing a framework so that mobile data can be incorporated into
the learning analytics.
There are certain limitations regarding the mobile data which will be available in
the coming months. In particular, this development is still in progress: beta testing
is expected to finish by the end of July 2014 and therefore there is no historical data
available. Additionally, the number of users is currently limited to just 50 (as opposed
to the current 30,000 users of the web-based version of the platform). Despite this
limitation, it is worth exploring whether the prediction model applied using the mobile
3 Access-restricted portal: https://www.u-campus.cl. See Appendix F for screenshots.
4 The University of Chile faculties currently using U-Campus are: Mathematical and Physical Sciences, Medicine, Architecture and Landscaping, Social Sciences, and Philosophy.
data is reasonably aligned with the prediction results achieved when using the web-based
platform.
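As a sketch of how such an alignment could be checked, the snippet below computes Spearman's rank correlation between per-user access counts on the two platforms. The counts are invented for illustration; the real comparison would use logged figures once beta testing ends.

```python
def ranks(values):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Invented monthly access counts for five users on each platform.
web = [120, 45, 300, 80, 210]
mobile = [60, 20, 150, 35, 100]
rho = spearman(web, mobile)  # 1.0 here: the two orderings agree perfectly
```

A rho close to 1 would suggest that web-based engagement is a reasonable proxy for mobile engagement (and vice versa), which is what H3 requires.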
5.3 Work Packages
In order to test the hypotheses presented in the previous section, a number of activities
have been planned. The timescales for the proposed future work are given in the Gantt
chart in Table 5.1, and detailed in the following work packages:
WP1: Enhanced literature review, with a focus on learning analytics as applied to the
three research hypotheses.
WP2: Additional data analysis on surveys conducted in Chile and the UK.
WP3: Data acquisition and the collation of a complete dataset (a subset of U-Campus
and U-Cursos).
WP4: Analysis of historical data from the PSU admission test of University of Chile
students, for indicators associated with completion (available via U-Campus).
WP5: Analysis of U-Cursos data, for factors associated with high marks.
WP6: Integrating WP4 with WP5 findings for a predictive model of academic success.
WP7: Incorporating the additional variables gathered via U-Cursos mobile into the
predictive model from WP4.
I am currently working on the first three work packages (WP1 to WP3). WP1 is
necessary to complement my existing literature review, and will continue for the next
12 months to ensure awareness of state-of-the-art research. In WP2, I will finalise the
quantitative and qualitative analysis of the survey data described in Chapter 4.
WP3 also completes ongoing work, this time regarding the datasets needed for
this research. Work on this package started during my research visit to the University
of Chile from the 9th to the 19th of March 2014, when an improved understanding of the
data architecture of both U-Cursos and U-Campus was achieved (beyond the general
concept presented by Cádiz (2013)). During this trip the collaboration with ADI and
NIC Labs became formally established. Figure 5.1 provides an outline of the processes
and the kinds of data stored, as well as the domains of responsibility for each.
WP4 will undertake a full analysis and evaluation of the PSU test data of students
who have enrolled in the University of Chile since 2003, when the test was first intro-
duced. More specifically, I will study correlations and statistical dependencies (using
Table 5.1: Schedule of research work and thesis submission, July 2014 to October 2015 (Gantt bars not reproduced):
Mini-thesis viva
H1 (conditioning factors):
WP1: Extending literature review
WP2: Additional data analysis on surveys data
WP3: Securing U-Campus and U-Cursos data
WP4: Analysis of U-Campus data (with PSU data)
Second research visit to Chile
H2 (behavioural factors):
WP5: Analysis of U-Cursos data (SPSS and WEKA)
WP6: Integration for a predictive model
Submit WP6 results to Computers and Education
H3 (smartphone data):
WP7: Incorporating mobile data
Working with visiting researcher from Chile
Thesis write-up
Thesis submission
Figure 5.1: Data architecture at the University of Chile: U-Campus and U-Cursos, with processes and entities responsible for their management. ADI is the University of Chile's Information Technologies group (Área de Infotecnologías in Spanish) and STI is the University of Chile's Division of IT and Communications (Dirección de Servicios de Tecnologías de Información y Comunicaciones). U-Campus holds institutional information and student data from the PSU (RUT, name, address, socio-economic data, age, etc.), fed by manual enrolments at Faculty level, automated student enrolments and digitalisation; U-Cursos (and U-Cursos Mobile) holds course data (e.g. syllabus, resources, coursework specs, timetable, news, student polls), student data (e.g. RUT, names, email addresses, avatars, courseworks, partial marks, timetables, final marks or fail status (R/E/I)) and lecturer/instructor data (e.g. roles, courses, permissions), with a monthly forum 'dump'.
SPSS) between 'conditioning' factors and academic performance to date as measured
by the PSU test. Table 5.2 shows the data fields available for this test5, with marks
(X) next to those which are of interest for this analysis, in particular socio-economic
indicators and the average high-school marks, since they are generally accepted in the
literature as reliable predictors of academic performance. Additional factors, such as
gender, age and nationality, have been identified in the global literature as influential,
therefore I will also incorporate this data. Specifically for the Chilean case, it has been
reported that the PSU test is widely regarded as being biased towards school-leavers
from private schools and towards the metropolitan area. Therefore, I will also study the
impact of the educational institution of origin and the home city on academic performance
prior to the test (in this work package) and then later in Higher Education
(in WP5). Finally, after certain pre-processing6, other fields (marked with †) are also
5 See Appendix G for further details, including screenshots of a sample student application.
6 In order to guarantee anonymity, it is necessary to avoid sensitive data, such as the name, phone numbers, email, exact home address (street and house number), and exact date of birth (month and year will suffice).
necessary. In particular, I will require the national identification number (hashed or
otherwise protected), since this will act as a unique key which can be used to link the
data from the PSU test ('conditioning data') to the measures of academic performance
available via U-Cursos in WP5.
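A keyed hash is one simple way to satisfy the 'hashed or otherwise protected' requirement whilst keeping the linking property: the same national ID always maps to the same token, but the ID cannot be recovered without the key. The sketch below uses HMAC-SHA256; the secret and the RUT values are stand-ins, and in practice the key would be held only by the data custodians in Chile.

```python
import hashlib
import hmac

# Stand-in secret; the real key would never leave the data custodian.
SECRET = b"held-by-data-custodian"

def pseudonym(rut: str) -> str:
    """Replace a national ID (RUT) with a keyed hash usable as a join key."""
    return hmac.new(SECRET, rut.encode("utf-8"), hashlib.sha256).hexdigest()

# Invented records from the two sources, joined on the protected key.
psu_record = {"rut": "12345678-5", "parents_education": "secondary"}
ucursos_record = {"rut": "12345678-5", "final_mark": 5.8}

key_a = pseudonym(psu_record["rut"])
key_b = pseudonym(ucursos_record["rut"])
```

Because the mapping is deterministic under a fixed key, records for the same student hash to the same token and can be joined, while different students receive unrelated tokens.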
At this point, I will have sufficient evidence to either support or reject hypothesis
H1 ('traditional learning analytics on conditioning factors are suitable predictors of
success'), as indicated in the Gantt chart (Table 5.1). My findings will be discussed
with researchers at ADI and NIC Labs during my second research visit (for two weeks,
exact dates TBA), when I will complete the analysis and commence work on WP5.
The visit will also be used to agree with these researchers on measurable behavioural
factors that are feasible to study via the smartphone extension of U-Cursos, which will
be required for WP7.
For WP5, data from U-Cursos will offer some information on measures of academic
performance and 'behavioural factors', limited to how students interact with the platform
in terms of the type and frequency of their access, including coursework submission
information and interim assessments. This data will be analysed, and correlations and
statistical dependencies will be studied (using SPSS). Additionally, I will apply data
mining techniques to formulate a prediction model of successful performance, considering
these variables as classifying features.
WP6 concerns the integration of the conditioning factors (as gathered from U-Campus)
and the behavioural factors (from U-Cursos). Since the number of variables
available will increase significantly, it is essential to apply feature selection methods
to improve the model and avoid overfitting. A number of classification methods from
the data mining toolset WEKA could be used, for example Naïve Bayes, which has also
been used by Bhardwaj and Pal (2011) to predict academic performance7. As an outcome
of this work package, I intend to submit a research paper to the journal Computers and
Education8, where the evidence gathered to prove or disprove hypothesis H2 will be discussed.
The effort in writing this paper will count towards the task 'Thesis write-up'
(shown last in Table 5.1), hence this is shown as formally starting at the same time as
WP6, though in practice the writing takes place throughout the research project. Finally,
WP7 is concerned entirely with testing hypothesis H3 ('smartphone data can be used
to inform the prediction model'), and will incorporate data from U-Cursos mobile into
the model created as part of WP6.
7 Bhardwaj and Pal (2011) only used conditioning variables, such as those to be studied in WP4.
8 Some of the journal Computers and Education impact metrics are: Impact per Publication (IPP) of 3.720 and Impact Factor (IF) of 2.775, as reported at http://www.journals.elsevier.com/computers-and-education/ (last accessed on the 4th July 2014).
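The actual modelling will be done with WEKA, but the shape of WP6 (a filter-style feature selection pass followed by a Naïve Bayes classifier) can be illustrated with a short self-contained sketch. Everything here is invented for illustration: the toy features, the pass/fail labels, and the use of variance as the selection criterion (a crude stand-in for WEKA's more principled attribute selectors).

```python
import math

def variance_filter(rows, keep):
    """Keep the indices of the `keep` highest-variance features."""
    def var(j):
        col = [r[j] for r in rows]
        m = sum(col) / len(col)
        return sum((v - m) ** 2 for v in col) / len(col)
    return sorted(sorted(range(len(rows[0])), key=var, reverse=True)[:keep])

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class feature means and variances."""
    def fit(self, rows, labels):
        self.stats, self.prior = {}, {}
        for c in set(labels):
            sub = [r for r, l in zip(rows, labels) if l == c]
            self.prior[c] = len(sub) / len(rows)
            self.stats[c] = []
            for j in range(len(rows[0])):
                col = [r[j] for r in sub]
                m = sum(col) / len(col)
                v = sum((x - m) ** 2 for x in col) / len(col) + 1e-9
                self.stats[c].append((m, v))
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.prior[c])
            for x, (m, v) in zip(row, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            return lp
        return max(self.prior, key=log_posterior)

# Toy rows: [avg school mark, forum posts, logins per week] (all invented).
X = [[6.5, 30, 12], [6.1, 22, 10], [4.2, 2, 1], [4.8, 5, 2]]
y = ["pass", "pass", "fail", "fail"]

cols = variance_filter(X, keep=2)        # select 2 of the 3 features
Xs = [[r[j] for j in cols] for r in X]
model = GaussianNB().fit(Xs, y)

new_student = [6.3, 25, 11]
prediction = model.predict([new_student[j] for j in cols])
```

Comparing the cross-validated accuracy of such a model with and without the behavioural features is also how the H2 question (does adding them make the model more accurate?) would be answered.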
Table 5.2: University Selection Test (Prueba de Selección Universitaria, PSU) data fields. Data from fields marked in bold will be used to validate H1, complemented with other fields of interest (marked X). Note that fields marked † will require some preprocessing for anonymisation. (Based on http://www.demre.cl/instr_incrip_p2014.htm. Last accessed: 3rd July 2014.)
Personal data (comments)
Full name (prefilled on login)
† National identification number (prefilled on login)
X Country of nationality
X Gender (prefilled on login)
† Date of birth (prefilled on login)
X Occupation (two choices: student or blank field)
School data
X Type of applicant (either from current or previous years)
X Educational institution (prefilled)
Educational branch (institutions may have several)
X Year of graduation from high school (prefilled)
X Average high-school marks (prefilled if from previous years)
Geographical area (prefilled)
Test choices data
Test choices (social and/or pure sciences, but just one amongst Biology, Physics and Chemistry)
Admissions office
Test venue (dropdown menu)
Personal contacts
Home address: street, number
X Home: city, region and province (dropdown menus)
Phone numbers
E-mail address
Socio-economic data
X Marital status (dropdown menu)
X Work status (dropdown menu)
X Working hours (dropdown menu)
X Number of working hours a week
X Term-time type of accommodation (dropdown menu)
X Household size
X Number of people in the household in employment
X Who is the head of the household? (dropdown menu)
X Are your parents alive?
X How many people study in your household? (discriminated by educational stage)
X Have you studied in a Higher Education institution? (Yes/No)
X If so, type of institution (dropdown menu)
Name of institution
About each parent
X Occupation (multiple choice)
X Industry (multiple choice)
Funding and payment
X Are you a beneficiary of a JUNAEB scholarship? (dropdown menu)
5.4 Contingency research plan
The research plan described above is predicated on acquiring specific data from a substantially
large group of students, in particular from U-Campus, U-Cursos and U-Cursos
mobile. Although I have successfully established the appropriate contacts at the University
of Chile (in the ADI group and with NIC Labs), and substantial progress has
already been made towards accessing U-Cursos and U-Campus data, a contingency plan
is in place in the event of failure to secure suitable data.
My contacts at the University of Chile have been forthcoming in answering my
questions as I become familiar with the platform and the organisation itself. My contribution
to this collaboration is that my findings will be used to inform the evolution
of the platform, and further extensions are likely to incorporate 'nudges' for a future
digital behavioural intervention seeking to improve retention and to shorten the length
of time students need to graduate. Our close collaboration is already fruitful: during
my research visit last March, we were able to prepare a research paper together in which
U-Cursos is well described (Cádiz et al., 2014, as in Section 4.3). However, despite these
strong assurances evidencing their willingness to share the relevant data with me,
there are some practical issues to be resolved which may affect the feasibility of securing
the data as planned. In particular, the data architecture seems to have followed an
ad-hoc design, and there are many redundancies and inefficiencies of which I have only
just begun to become aware. Because the data is distributed across a number of tables, often on
separate sites, it is not a matter of simply being granted access to a centralised repository.
In addition, our requirement for anonymisation of the data adds another level of
uncertainty (which is hard to quantify), as this will clearly require time and effort from my
Chilean colleagues.
Should the contingency research plan need to be carried out, hypotheses H1
and H2 may alternatively be tested on data from the Massive
Open Online Courses (MOOCs)9 run by the University of Southampton via
FutureLearn.
Data regarding several conditioning factors to test hypothesis H1 are also harvested
during enrolment in these courses as part of a 'pre-course' questionnaire. These include
socio-economic indicators (e.g. age, country, gender, employment status and reported
disabilities, if any), and other conditioning factors such as course expectations, reported
learning preferences, subject areas of interest, and prior education (both in formal education
and in other MOOCs). Given this data, a study similar to that planned for WP4
can still be undertaken, using this data instead.
9 As an example, the MOOC 'How the Web is Changing the World' has had two intakes since 2012 (and is running for a third time this October). Further details at http://www.soton.ac.uk/moocs/webscience.shtml (last accessed on the 26th June 2014).
With regards to the testing of H2, there are a number of datasets available for which
there is implicit consent from participants for their use in research. These datasets are
files in Comma-Separated Value (CSV) format, the most relevant being:
• the End of Course dataset, which contains metrics such as the proportion of those who
enrolled in the course ('joiners') who have abandoned it ('leavers'). Other characterisations
include: 'learners' (those who have viewed at least one step of the course),
'active learners' (those who have marked at least one step as complete), 'returning
learners' (those who completed steps in more than one week), 'social learners'
(those who have left at least one comment), and 'fully participated learners' (sic),
those who have completed a majority of the steps including all tests10;
• the Step Completion dataset. Note that each course has a number of 'steps' that
need to be completed to succeed (typically watching a video, reading a text, or
completing an assessment). Each step can have a number of comments associated with it;
• the Quiz dataset, which would constitute a proxy for 'marks' in the traditional
sense; and
• the Comments dataset. Table 5.3 is a detailed example of the structure of one of these
datasets, the Comments dataset.
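These learner categories can be recomputed directly from the raw exports, which is useful both as a sanity check and for defining the outcome variable of a prediction model. The sketch below assumes per-learner activity summaries have already been distilled from the Step Completion and Comments files; all field names and figures are invented.

```python
# Invented per-learner summaries derived from the CSV exports.
activity = {
    "u1": {"steps_viewed": 0, "steps_completed": 0,
           "weeks_active": 0, "comments": 0, "tests_done": False},
    "u2": {"steps_viewed": 5, "steps_completed": 0,
           "weeks_active": 1, "comments": 0, "tests_done": False},
    "u3": {"steps_viewed": 20, "steps_completed": 18,
           "weeks_active": 4, "comments": 7, "tests_done": True},
}

def categorise(a, total_steps):
    """Apply the FutureLearn-style definitions described in the text;
    everyone who enrolled is a joiner, and further labels accumulate."""
    cats = ["joiner"]
    if a["steps_viewed"] > 0:
        cats.append("learner")
    if a["steps_completed"] > 0:
        cats.append("active learner")
    if a["weeks_active"] > 1:
        cats.append("returning learner")
    if a["comments"] > 0:
        cats.append("social learner")
    if a["steps_completed"] > total_steps / 2 and a["tests_done"]:
        cats.append("fully participated learner")
    return cats

cats = {uid: categorise(a, total_steps=20) for uid, a in activity.items()}
```

Here u1 would remain a joiner only, u2 a (non-active) learner, and u3 would accumulate every label up to fully participated learner.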
A 'post-course' questionnaire, though mainly intended as a course evaluation exercise
(and therefore including questions where the student rates the course in several
ways), also helps in gathering other indicators of learning behaviour, such as the point
of entry (whether from the start of the course or later on), reasons for attrition (if the
course was abandoned), and the specific learning behaviours adopted, investigating dedication
in time and effort, reported frequency of access, reflection, collaboration (through social
media as well as via comments on a step within the course) and connectivity (devices
used to access the course and typical study places), as well as their use of prior learning.
Combined, these datasets record all the interactions between participants through
the platform and hold a complete record of achievement and progress as the students
take on the various tasks and assessments in the course.
Admittedly, hypothesis H3 cannot be tested using MOOC data, but alternatively
we would formulate a domain-specific hypothesis applicable to online-only courses, as
opposed to face-to-face instruction supported by an LMS, which is the case of interest
in the current plan. In this case, a shift in focus would also be necessary, as would an
extension of the literature review presented in Section 2.2.3.
10 Thanks to Kate Dickens from the Centre for Innovation in Technologies and Education (CITE) for facilitating this information.
Table 5.3: FutureLearn platform data exports. Adapted from https://www.futurelearn.com/courses/course-slug/. (Last accessed: 4th July 2014, by Kate Dickens, Project Leader for the Web Science MOOC.)
Comments
id [integer]: a unique id assigned to each comment
author_id [string]: the unique, anonymous id assigned to the author user
parent_id [integer]: the unique id of the parent comment (i.e. the comment this comment replies to)
step [string]: the human-readable step number (e.g. 1.13)
text [string]: the comment text
timestamp [timestamp]: when the comment was posted
moderated [timestamp]: the time at which a comment was moderated, if at all
likes [integer]: the number of likes attributed to the comment
Peer Review - Assignments
id [integer]: a unique id assigned to each assignment submission (referenced by reviews)
step [string]: the human-readable step number (e.g. 1.13)
author_id [string]: the unique, anonymous id assigned to the author user
text [string]: the comment text
first_viewed_at [timestamp]: when the assignment step was first viewed
created_at [timestamp]: when the assignment was submitted
moderated [timestamp]: the time at which a comment was moderated, if at all
review_count [integer]: how many reviews are associated with the assignment
Peer Review - Reviews
id [integer]: a unique id assigned to each assignment review
step [string]: the human-readable step number (e.g. 1.13)
author_id [string]: the unique, anonymous id assigned to the author user
assignment_id [integer]: the id identifying the assignment reviewed
guideline_one_feedback [string]: text submitted for the first guideline
guideline_two_feedback [string]: text submitted for the second guideline
guideline_three_feedback [string]: text submitted for the third guideline
created_at [timestamp]: when the review was submitted
5.5 Summary
This chapter presented the motivation behind the research question 'What are the
measurable factors for the prediction of student academic success?' and outlined three
research hypotheses associated with it. Two of these hypotheses consider conditioning
and behavioural factors as predictors of academic success, whilst the last one regards
smartphone data as suitable to inform a prediction model of success. In order to test
them, a number of work packages (WP1-WP7) are planned, with deliverables at specific
points in the time remaining until the submission of the final thesis. I have also outlined
a contingency research plan should the data expected from the University of Chile prove
difficult to obtain due to unforeseen circumstances.
The following chapter will outline future work that has been identified but is beyond
the scope of this research, given the time and resources remaining.
Chapter 6
Conclusions
This research will explore the predictability of student success from learning analytics
on big data sets. In particular, we seek to analyse a rich 'data trail' of student activities
as gathered via their interactions with a Learning Management System (LMS), such as
the University of Chile's U-Cursos1. This data can be combined with data captured by
the institution at first enrolment, such as socio-economic indicators (typically used in
traditional learning analytics). From this analysis, a model of academic success will be
developed, providing insight into the factors influencing academic performance amongst
other measurable proxies for success.
A primary motivation behind seeking such an insight is that it would facilitate the
identification of students 'at risk', and further enable behavioural interventions so that
students can be supported in becoming successful in their studies. A greater, lasting goal
would be to influence student behaviour via persuasive technologies, so that the students
themselves are empowered to effect a significant change. This is a long-term goal beyond
the scope of the present research. Whilst the rich interconnection necessary for a digital
behavioural intervention is not yet fully supported, and the existing student data is both
incomplete and noisy for this specific purpose, we can still gain some knowledge of how
it might look by examining current student data, from both the educational and the
pervasive computing perspectives.
The central theme of this research is learning analytics, informed by relevant studies
on behavioural interventions and the application of pervasive computing to education. In
order to build on the traditional learning analytics research approaches (generally limited
to data controlled by the educational institution), I have also considered including data
that could offer an additional insight into student behaviour, by articulating descriptions
of what successful students do when they are not studying.
1 Developed by the University of Chile's Information Technologies group (ADI, Área de Infotecnologías in Spanish).