Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena?
In an online learning context, with LMS data, questions such as the following may be answered:
How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens)
How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens)
How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens)
How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens)
How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens)
And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events?
This presentation will introduce survival analysis, its basic assumptions, its practice (using SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret the resulting line graphs and other data visualizations.
Using Large-Scale LMS Data Portal Data to Improve Teaching and Learning (at K... - Shalin Hai-Jew
With any learning management system, a byproduct of its function is data, which may be analyzed to improve awareness, decision-making, and actions. At Kansas State University, its Canvas LMS instance recently made available its cumulative data from its first use in 2013. These flat files open a window to how the university is harnessing its LMS, with some macro-level insights that may suggest some areas to improve teaching and learning. This session describes some approaches to informatizing this empirical “big data” with some basic approaches: reviewing the data dictionary, extracting basic descriptions of the respective data sets, conducting time-based comparisons, surfacing testable hypotheses from data inferences, and conducting other data explorations. This introduces initial data analysis work only, but this does not preclude front-end analysis of courses at the micro level, relational database queries of the data, and other potential follow-on work.
Learning from Time-to-Event Data from Online Learning Contexts - Shalin Hai-Jew
Time-to-event analysis is a statistical analysis approach that enables time-based insights about student learning, such as, How long does it take before a learner makes a new acquaintance in an online course? A new friend? How long does it take before a learner achieves breakout capacity in a particular learning sequence? How long does it take for a learner to commit to a course? This digital poster session presents time-to-event analysis (aka “survival analysis”) from real LMS data and shows how this analysis is done. Terms related to time-to-event analysis will be introduced, and the assertability of extracted data is explored.
Time-to-event analysis, in its simplest form, enables the study of in-world phenomena which includes the time it takes to achieve a particular defined “event” (whether negative or positive, desirable or undesirable), and it includes the nuance of “censored” data (in-world records for which data about event achievement was not attained during the time period of the analysis). This presentation introduces “time-to-event analysis” (on IBM’s SPSS Statistics) as applied to online educational data.
Leveraging Flat Files from the Canvas LMS Data Portal at K-State - Shalin Hai-Jew
A lot of data are created in an LMS instance, and much of this can be analyzed for insight. In 2016, Instructure, the makers of Canvas, made their LMS data available to their customers through a data portal (updated monthly). This portal enables access to a number of flat files related to that particular instance. This presentation showcases how this big data was analyzed on a regular laptop with basic office software, to summarize Kansas State University’s use of the LMS. Methods for analysis include the following: basic descriptive statistics, survival analysis, computational linguistic analysis, and others.
The results are reported out with both numbers and data visualizations, including classic pie charts, line graphs, bar charts, mixed-charts, word clouds, and others. The findings provide some insights about how to approach the data, how to use a data dictionary, and other methods for extracting the data for awareness and practical decision-making. This work also is suggestive of next steps for more advanced analysis (using the flat files in a SQL database).
More information about this may be accessed at http://scalar.usc.edu/works/c2c-digital-magazine-spring--summer-2017/wrangling-big-data-in-a-small-tech-ecosystem.
The K-State Online Canvas LMS Data Portal and Five Years of Activated Third-P... - Shalin Hai-Jew
The presenter will introduce the K-State LMS data portal, share some available insights from it, and focus on one particular facet of this big data: the third-party apps that K-State faculty, admin, and staff have activated, and what that says about how we're using Canvas.
Canvas LMS data portal for the Kansas State University instance
A data dictionary: Version 1.16.2 (https://portal.inshosteddata.com/docs)
Data extraction and processing
What it can tell us: (un)available data and information
Activated third-party tools in K-State Online Canvas LMS instance
Some caveats
What this says about what K-Staters (early adopters) are using
Practical applications of this third-party app activation data
Adding value to LMS data portal data
Creating Effective Data Visualizations in Excel 2016: Some Basics - Shalin Hai-Jew
One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered: it can take roughly 1.05 million rows of data per worksheet (1,048,576), contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of add-ons to empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie charts and doughnuts, bar charts, tree maps and sunburst diagrams, cluster diagrams, spider (radar) charts, scattergraphs, and others. This session will cover how data structures and desired emphases will determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
Also, participants will help co-build data visualizations from open-source and other datasets.
This slideshow reviews some of the features and functionalities of Qualtrics that enable its use in online trainings. This explores some important instructional design elements in online trainings, including for three main types: policy compliance, mass-scale trainings, and customized trainings. This reviews some core elements of online trainings. Finally, there are some reflections on real-world considerations when building an online training on Qualtrics.
Materials for an introduction to adaptive learning and learning analytics, as well as efforts at interoperability standardization. These slides briefly treat the concept of adaptive learning, the reference model of learning analytics, data APIs for learning analytics, and the topic list of the standardization community (ISO/IEC JTC1 SC36).
Presentation from a workshop given at ACRL 2011 conference, Data-Driven Library Web Design: Making Usability Testing Work with Collaborative Partnerships
Education must capitalize on the trend within technology toward big data. New types of data are becoming available. From evidence approaches to xAPI and the whole Training and Learning Architecture (TLA), big data is the foundation of it all.
This slide deck was presented at the 2015 International Conference on Education Research.
I aggregated several of my other partial slides and reports to describe an adaptive learning model pertaining to the concept of learning analytics, as well as LOD for curriculum standards and digital resources. There is a short introduction to the project of ISO/IEC 20748 Learning analytics interoperability - Part 1: Reference model.
Online Learning Design for Diversity and Inclusion - Shalin Hai-Jew
Social inclusion and respect for diversity are some of the most important democratic values that inform learning design. The educational research literature offers methods for how to design teaching and learning for people in all (many of?) their complex dimensions:
demographics;
cultures [including worldviews, beliefs, values, practices, and others];
languages;
learning preferences;
differing perceptions and information processing, and others,
… so that all are included and supported and welcomed. Widely known approaches include accessibility mitigations, universal design practices, multi-cultural adaptations, and others. This presentation provides a light overview of suggested practices and how these are applied to practical instructional designs of online learning with modern technological enablements.
DATA MINING IN EDUCATION: A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE - IJDKP
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, where data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with a data warehouse, and the whole process is divided into an action plan to be performed on the data: selection, transformation, mining, and results interpretation. In this paper, we have reviewed the Knowledge Discovery perspective in Data Mining and consolidated the different areas of data mining and its techniques and methods.
There are many online and in-person courses available for librarians to learn about research data management, data analysis, and visualization, but after you have taken a course, how do you go about applying what you have learned? While it is possible to just start offering classes and consultations, your service will have a better chance of becoming relevant if you consider stakeholders and review your institutional environment. This lecture will give you some ideas to get started with data services at your institution.
Presents the foundational aspects of web analytics and some specifics such as the hotel problem. Discusses trace data, behaviorism, and other web analytics topics.
Information Experience Lab, IE Lab at SISLT - Isa Jahnke
Founded in 2003
The Information Experience Laboratory (IE Lab) is a usability and user experience lab with the mission to improve learning technologies, information, and communication systems.
Here we present the IE Lab and its methods.
Slides for the conference program at e-Learning Korea 2016. These slides also cover ISO/IEC TR 20748-1 Learning Analytics Interoperability - Part 1: Reference model, as well as curriculum standards. They were mainly prepared for LASI-Asia 2016 #lasiasia16.
Slides | Targeting the librarian’s role in research services - Library_Connect
Slides from the Nov. 8, 2016 Library Connect webinar "Targeting the librarian’s role in research services" with Nina Exner, Amanda Horsman and Mark Reed. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=223121
Who Owns Faculty Data?: Fairness and transparency in UCLA's new academic HR s... - chloejreynolds
Abstract: Beginning in 2015, Opus will be the information system of record for faculty activities at the University of California, Los Angeles (UCLA). Opus will serve as both a profile system, storing data about faculty work, and as a workflow and approval engine for the promotion and tenure process. Opus leverages institutional master data wherever possible to collect data about faculty activity. However, re-purposing institutional data collected for purposes not related to academic review necessitates allowing data subjects (UCLA faculty) to contextualize and reframe the data for the review process. Collecting, displaying and storing these augmented records (master data with manually added metadata from faculty) has forced the project team to grapple with questions regarding fairness and transparency to both data subjects and to data consumers. How can we hold to “good design” and usability practices, while faithfully representing the inherent “messiness” of the data? How does the context in which the data was collected impact re-purposing the data for academic review? What does it mean to “own” faculty data? This paper outlines our attempts to address these questions, noting the trade-offs and limitations of the selected solutions.
This topic was presented at the 2015 iConference on March 26, 2015 in Newport Beach, CA. Since 2005, the iConference series has provided forums in which information scholars, researchers and professionals share their insights on critical information issues in contemporary society. An openness to new ideas and research fields in information science is a primary characteristic of the event.
Invited lecture at Emory University Rollins School of Public Health. We presented our InSTEDD global early warning and response social platform, Evolve (http://instedd.org/evolve), with a live demonstration.
Invited talk, INSIGHT Centre for Data Analytics, Univ. Galway, 2 Oct 2013, http://www.insight-centre.org
Abstract:
Data and analytics are transforming how organisations work in all sectors. While there are clearly ethical issues around big data and privacy, there may also be an argument that educational institutions have a moral obligation to use all the information they have to maximize the learner's progress. So, assuming education can't (arguably shouldn't) resist this revolution, the question is how to harness this new capability intelligently. Learning Analytics is an exploding research field and startup market: do leaders know what to ask when the vendors roll up with dazzling dashboards? In this talk I'll provide an overview of developments, and consider some of the key questions we should be asking. Like any modelling technology and accounting system, analytics are not neutral, and do not passively describe sociotechnical reality: they begin to shape it. Moreover, they start with the things that are easiest to count, which doesn't necessarily equate to the things we value in learning. Given the crisis in education at many levels, what realities do we want analytics to perpetuate, or bring into being?
Bio:
Simon Buckingham Shum is Professor of Learning Informatics at the UK Open University's Knowledge Media Institute. He researches, teaches and consults on Learning Analytics, Collective Intelligence and Argument Visualization. His background is B.Sc. Psychology, M.Sc. Ergonomics and Ph.D. Human-Computer Interaction. He co-edited Visualizing Argumentation (Springer 2003), the standard reference in the field, followed by Knowledge Cartography (2008). In the field of Learning Analytics, he served as Program Co-Chair of the 2nd International Learning Analytics LAK12 conference, chaired the LAK13 Discourse-Centric Learning Analytics workshop, and the LASI13 Dispositional Learning Analytics workshop. He is a co-founder of the Society for Learning Analytics Research, Compendium Institute, LearningEmergence.net, and was Co-Founder and General Editor of the Journal of Interactive Media in Education. He serves on the Advisory Groups for a variety of learning analytics initiatives in education and enterprise, and is a Visiting Fellow at University of Bristol Graduate School of Education. Contact him via http://simon.buckinghamshum.net
Measurement and Statistics
Student Name:
Institution Name:
Instructor Name:
Submission Date:
Introduction
There is excessive use of the internet today, both by the old and the young. It is common to see people spending most of their time behind computers or using their mobile phones to browse the internet and access social media. The internet is beneficial for various reasons such as entertainment, social interaction, education, and even work. There have been concerns as to whether this excessive use of the internet has a negative impact on individuals' psychological health. Some of the aspects of psychological health that the internet has been seen to affect include anxiety, sleep, stress, depression, and even social life. Some researchers have conducted studies which link excessive use of the internet to extreme cases of low self-esteem and suicidal thoughts. The focus of this paper is to analyze the statistics and measurements that will be used in the research which will be conducted to confirm whether excessive internet use has a negative impact on psychological health.
Type of Research
The research to find out if excessive use of the internet affects psychological health will be quantitative. This is where the variables will be assigned numerical data in scales during data collection. After results are obtained from the data collection process, the collected data will be analyzed using various statistical analysis tests. The use of quantitative research in this case will be important to the study in various ways. Quantitative research tends to be more objective and reliable compared to qualitative research (Balnaves, 2001). It is also significant because statistics can be used to support a finding. A complex problem in a research study can be restructured and reduced to a smaller number of variables. Quantitative research also focuses on associations and relationships between variables and can describe cause and effect easily. After results have been analyzed, quantitative research can easily test hypotheses and theories. It is important through its assumption that a sample represents the whole population. There is less room for the researcher's subjectivity, and therefore there is less bias in this kind of research.
Variables
The quantitative study will be relational, as it will analyze the relationships between various variables. Data will be collected among university students, who will be selected randomly and will fill out a questionnaire. The variables in the research study can be generally divided into two categories: internet use and psychological health. The data on internet use will be collected using the Online Cognition Scale (OCS) ...
CHAPTER 7: Collecting Qualitative Data (Qualitative da.docx) - RAHUL126667
CHAPTER 7: Collecting Qualitative Data
Qualitative data collection is more than simply deciding on whether you will observe or interview people. Five steps comprise the process of collecting qualitative data. You need to identify your participants and sites, gain access, determine the types of data to collect, develop data collection forms, and administer the process in an ethical manner.
By the end of this chapter, you should be able to:
· ◆ Identify the five process steps in collecting qualitative data.
· ◆ Identify different sampling approaches to selecting participants and sites.
· ◆ Describe the types of permissions required to gain access to participants and sites.
· ◆ Recognize the various types of qualitative data you can collect.
· ◆ Identify the procedures for recording qualitative data.
· ◆ Recognize the field issues and ethical considerations that need to be anticipated in administering the data collection.
Maria is comfortable talking with students and teachers in her high school. She does not mind asking them open-ended research questions such as “What are your (student and teacher) experiences with students carrying weapons in our high school?” She also knows the challenges involved in obtaining their views. She needs to listen without injecting her own opinions, and she needs to take notes or tape-record what people have to say. This phase requires time, but Maria enjoys talking with people and listening to their ideas. Maria is a natural qualitative researcher.
WHAT ARE THE FIVE PROCESS STEPS IN QUALITATIVE DATA COLLECTION?
There are five interrelated steps in the process of qualitative data collection. These steps should not be seen as linear approaches, but often one step in the process does follow another. The five steps are first to identify participants and sites to be studied and to engage in a sampling strategy that will best help you understand your central phenomenon and the research question you are asking. Second, the next phase is to gain access to these individuals and sites by obtaining permissions. Third, once permissions are in place, you need to consider what types of information will best answer your research questions. Fourth, at the same time, you need to design protocols or instruments for collecting and recording the information. Finally and fifth, you need to administer the data collection with special attention to potential ethical issues that may arise.
Some basic differences between quantitative and qualitative data collection are helpful to know at this point. Based on the general characteristics of qualitative research, qualitative data collection consists of collecting data using forms with general, emerging questions to permit the participant to generate responses; gathering word (text) or image (picture) data; and collecting information from a small number of individuals or sites. Thinking more specifically now ...
Journal Club - Best Practices for Scientific Computing - Bram Zandbelt
Journal Club presentation for Cools lab at Donders Institute, Radboud University, Nijmegen, the Netherlands
Date: October 28, 2015
Paper:
Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., ... & Wilson, P. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745.
Learning analytics and Moodle: So much we could measure, but what do we want to measure? A presentation to the USQ Math and Sciences Community of Practice May 2013
Analysis of Multiple Pilots for ICT-supported Lifelong Competence Development, Davinia Hernández-Leo, davinia.hernandez@upf.edu, TENCompetence Winter School 2009, 1-6 February Innsbruck, Austria
A Framework for Statistical Simulation of Physiological Responses (SSPR) - Waqas Tariq
The problem of variable selection from a large number of variables to predict certain important dependent variables has been of interest to both applied statisticians and other researchers in applied physiology. For this purpose, various statistical techniques have been developed. This framework embeds various statistical techniques of sampling and resampling and helps in statistical simulation of physiological responses under different environmental conditions. The population generation and other statistical calculations are based on the inputs provided by the user (a mean vector and covariance matrix) and the data. This framework is developed in a way that it can work for the original data as well as for simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. This framework uses these two inputs and is able to generate a simulated multivariate normal population for any number of variables. The software changes the manual operation into a computer-based system to automate the study and provide efficiency, accuracy, timeliness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data are analyzed using statistical techniques, the results of such analysis will be the same as those using the original data. If data are missing for some of the variables, the system will also help in that case. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even if the actual data are not present.
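A rough sketch of the core generation step this abstract describes (simulating a multivariate-normal population from a user-supplied mean vector and covariance matrix); this is not the SSPR framework's own code, and the mean, covariance, and sample size below are hypothetical.

```python
# Simulating a multivariate-normal "population" from a mean vector and covariance
# matrix, as the described framework does; the numbers here are hypothetical.
import numpy as np

mean = np.array([98.6, 72.0, 16.0])      # e.g., temperature, heart rate, breaths/min
cov = np.array([[0.25, 0.10, 0.02],
                [0.10, 25.0, 1.50],
                [0.02, 1.50, 4.00]])      # symmetric, positive semi-definite

rng = np.random.default_rng(seed=42)
sample = rng.multivariate_normal(mean, cov, size=1000)

print(sample.mean(axis=0))                # approximately recovers the input mean
print(np.cov(sample, rowvar=False))       # approximately recovers the input covariance
```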
Long nonfiction chapters are not in style and may never have been. While average nonfiction book chapters run about 4,000 – 7,000 words, some run to several times that upper figure. The explanation is that there is some irreducible complexity that the chapter addresses that cannot be addressed in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia - Shalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... - Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIs - Shalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables the application of “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about throwing mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer Mistakes - Shalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter has served as a neophyte one for some years before settling into this invited volunteer work for several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AI - Shalin Hai-Jew
CrAIyon (formerly DALL-E Mini, after Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations and remixes and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to working prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple... - Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and... - Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) - Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and Accessibility - Shalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,... - Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o... - Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i... - Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see demand grow and the supply landscape evolve, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
2. PRESENTATION
Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena?
3. PRESENTATION (CONT.)
In an online learning context, with LMS data, questions such as the following may be answered:
How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens)
How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens)
How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens)
How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens)
How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens)
4. PRESENTATION (CONT.)
And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events?
This presentation will introduce survival analysis, its basic assumptions, its practice (using IBM’s SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret the resulting line graphs and other data visualizations.
5. PRESENTATION ORDER
1. Early Applications of “Survival Analysis”
2. Some Common Terms
3. Other Forms and Applications of “Survival Analysis”
4. Basic Elements of a “Time-to-Event” Analysis
5. Applications to Online Learning Data
6. One Example (with Faux Data)
7. A Few Questions
8. Some Takeaways
9. Light Debriefing
7. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES
Use of empirical time-series data of a group of individuals with particular life-threatening health issues to see what their survival trajectories were over time
The “time-to-event” is measured, with the “event” being non-survival
Extraction of a regression curve of those who survived and those who did not (and when they passed)
These datapoints are represented as a non-increasing (not “decreasing,” because there are times of plateaus in which no events of non-survival occur) line graph
Time may be measured in various discrete units (from coarse to fine granularity) or continuously
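A minimal sketch of how such a non-increasing survival curve can be estimated: the example below uses Python and the open-source lifelines library (the deck itself walks through the workflow in IBM SPSS Statistics), with faux durations and event indicators.

```python
# Kaplan-Meier-style survival curve from faux time-to-event data.
# lifelines is an open-source Python library; the presentation itself uses SPSS.
from lifelines import KaplanMeierFitter

weeks = [2, 3, 3, 5, 6, 8, 8, 10, 12, 15]   # observed time per experimental unit
event = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]       # 1 = event occurred, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(weeks, event_observed=event, label="time to event (faux data)")

print(kmf.survival_function_)       # S(t) as a non-increasing step function
print(kmf.median_survival_time_)    # time by which half the units reached the event
kmf.plot_survival_function()        # the survival "line graph" described above
```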
8. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES (CONT.)
In the health context, survival curves may inform actuarial tables for expected survival given particular age, health states, and behavioral practices.
Comparisons of survival curves may be made between comparable groups, albeit those receiving different interventions or treatments (within ethical guidelines).
Particular groups’ survival curves may be compared, such as between males and females, individuals of different age groups, individuals with different lifestyles, individuals from different social classes, individuals from different geographical locations, and so on.
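One standard way to test whether two such groups' survival curves differ is a log-rank test; a minimal sketch with the lifelines library, where the two groups and their faux durations are hypothetical.

```python
# Log-rank comparison of two hypothetical groups' survival experiences.
from lifelines.statistics import logrank_test

weeks_a = [2, 4, 5, 7, 9, 11]     # e.g., a group receiving one intervention
event_a = [1, 1, 0, 1, 1, 0]      # 1 = event, 0 = censored
weeks_b = [3, 6, 8, 10, 12, 14]   # e.g., a comparable group without it
event_b = [1, 0, 1, 1, 0, 0]

result = logrank_test(weeks_a, weeks_b,
                      event_observed_A=event_a,
                      event_observed_B=event_b)
result.print_summary()            # test statistic and p-value for the difference
```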
10. SOME COMMON TERMS
Time Zero is the beginning of the study
S(t) is “survival at time ‘t’”
Survival is a factor of time and also a factor of “hazard” (the risk of non-survival)
The survival rate has a negative correlation with the hazard rate (the higher one is, the lower the other)
The cumulative hazard function is non-decreasing and accumulates over time
Sometimes, hazards are considered constant; other times, hazards may increase or decrease over time, depending on the phenomenon being modeled
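Under the usual definitions (a standard formulation, not spelled out on the slide), the survival function, hazard, and cumulative hazard relate as follows, which is why a higher hazard pulls the survival curve down faster:

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}, \qquad
H(t) = \int_{0}^{t} h(u)\,du, \qquad
S(t) = e^{-H(t)}
```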
11. SOME COMMON TERMS (CONT.)
Data “censoring” refers to the members of the population who are part of the study but who either drop out or do not achieve the event (whatever that event might be in the particular dataset); their data is “lost to follow-up”
Left censoring suggests a lack of event information prior to the participant’s entrance to the study
Right censoring suggests a lack of event information after the end of the study and the participant’s exit from the study
Including censored data precludes “survivorship bias,” or overweighting the effects of data that “survive” the research period because it is salient (attention-getting) and missing the quieter or more subtle data in the background
Including censored data means that the data is more representational of real-world observations
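In practice, right-censoring is usually coded by comparing each record's event timestamp (if any) against the end of the observation window; a sketch with pandas, where the column names, dates, and cutoff are hypothetical.

```python
# Deriving durations and event/censoring flags from raw LMS-style timestamps.
# Column names, dates, and the study cutoff are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "learner_id":  ["u01", "u02", "u03"],
    "enrolled_on": pd.to_datetime(["2017-01-09", "2017-01-09", "2017-01-16"]),
    "event_on":    pd.to_datetime(["2017-02-20", None, "2017-03-06"]),  # None = never observed
})
study_end = pd.Timestamp("2017-05-12")               # end of the observation window

df["event"] = df["event_on"].notna().astype(int)     # 1 = event, 0 = right-censored
end_time = df["event_on"].fillna(study_end)          # censored records run to study end
df["weeks"] = (end_time - df["enrolled_on"]).dt.days // 7

print(df[["learner_id", "weeks", "event"]])
```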
13. OTHER FORMS AND APPLICATIONS OF “SURVIVAL ANALYSIS”
Other Forms of “Survival Analysis”
Time-to-event analysis
Event history analysis
Reliability analysis
Duration analysis
Some Fields of Application
Engineering
Economics
Sociology
Political science
Marketing
Education
14. TIME-TO-EVENT ANALYSIS
For contexts beyond the health sciences, “survival analysis” has evolved to “time-to-event” analysis
The independent variable (IV) is time
The dependent (outcome) variable (DV) is time-to-event
There are potential covariates or other variables that affect survival outcomes—positively or negatively (to varying degrees)
These may affect hazard rates (risk of event at any particular time slice) and survival rates
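One common way to estimate how such covariates shift the hazard is a Cox proportional-hazards model (available in SPSS as Cox Regression); sketched here with the lifelines library on faux data, with hypothetical covariates.

```python
# Cox proportional-hazards sketch: estimating how covariates raise or lower the hazard.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks":        [2, 3, 5, 6, 8, 10, 12, 15],  # time-to-event or censoring (faux)
    "event":        [1, 1, 1, 1, 0, 1, 0, 0],     # 1 = event, 0 = censored
    "prior_online": [0, 1, 0, 1, 1, 0, 1, 0],     # hypothetical covariate
    "logins_wk1":   [1, 4, 2, 6, 5, 1, 7, 0],     # hypothetical covariate
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks", event_col="event")
cph.print_summary()   # exp(coef) > 1 raises the hazard; exp(coef) < 1 lowers it
```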
16. BASIC ELEMENTS NEEDED FOR A SIMPLE TIME-TO-
EVENT ANALYSIS
Required Define-able and Observable
Elements
A population and phenomenon to study
Defined units of time (aka “spell”)
An event (or censoring)
Additional Features
Access to the data, over time
Ability to consistently maintain the
particular unit of time observation
Ability to observe either achievement of
event or non-achievement of event
(censored data)
18. THREE REQUIRED TYPES OF DATA
A population and phenomenon to study (expressed as string data written in
camelCase)
Population may be animate or inanimate
Each member of the population is an “experimental unit” (represented in a data table as row data)
Defined units of time or continuous time (time aka “spell”) (expressed as integer data)
An event (or censoring) (expressed as a dummy variable with 1 = event, 0 =
censored)
19. ADDITIONAL INFORMATION THAT MAY BE
COLLECTED
Univariate data: For each row (or experimental unit), time-to-event (or no record of
achievement of event, in which case there is censored data)
Bivariate data: For each row, capture of both time-to-event and event or
censoring…but also one other qualitative (categorical) or quantitative feature of the
experimental unit
Multivariate data: For each row, capture of time-to-event, event/censoring, and
multiple other qualitative and / or quantitative features of the experimental unit (illustrated in the sketch below)
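A minimal sketch of these three levels in Python; the extra column names (priorOnlineCourses, ageGroup, programLevel) are hypothetical illustrations, not fields from any actual dataset, and the dummy coding follows the deck’s convention of 1 = event, 0 = censored.

# Univariate: the experimental unit plus time-to-event and event/censoring only
univariate_row = {"uniqueIdentifier": "studentA", "days": 4, "event": 1}

# Bivariate: the same, plus exactly one other qualitative or quantitative feature
bivariate_row = {"uniqueIdentifier": "studentB", "days": 9, "event": 0,
                 "priorOnlineCourses": 2}   # hypothetical covariate

# Multivariate: the same, plus multiple other features of the experimental unit
multivariate_row = {"uniqueIdentifier": "studentC", "days": 2, "event": 1,
                    "priorOnlineCourses": 0,         # hypothetical covariates
                    "ageGroup": "18-22",
                    "programLevel": "undergraduate"}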
21. SO…
A time-to-event analysis is a time-series representation of a phenomenon that also
includes the relative frequency of occurrences in time
22. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA
How much time passes before…
An online instructor uses a particular feature or tool in an LMS (learning management system)?
An online instructor reaches out to his / her students?
An online instructor uses the LMS for a non-course application?
An online instructor starts (or stops) usage of a particular digital learning object?
An online instructor uses the mobile app to access the LMS?
An online instructor finalizes and submits grades for the particular term?
An online instructor commits to the online teaching modality?
23. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.)
How much time passes before…
An online learner submits a first assignment?
An online learner makes a first friend online?
An online learner commits to completing an online course or online learning sequence?
An online learner communicates with his or her instructor?
An online learner contests a grade with the instructor?
An online learner uploads an image?
24. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.)
How much time passes before…
A university adds a new feature to an LMS at the instance-level?
A university is able to attract a sufficient number of learners to a program to ensure that it is self-
sustaining?
A university considers moving away from a particular LMS (measured from time-of-adoption)?
25. ONLINE LEARNING DATA
Online learning data comes from a number of sources:
an LMS data portal
scraped discussion board data from an online course
a third-party app used in online learning
admin or instructor access to a course
grades in a student information system
demographic data in a student information system
Some of the data will require more effort to access and collect than others
Some of the data is not collected anywhere and may have to be inferred (from
multiple data streams) or imputed (substituting values for missing data based on a
reasonable method)
26. ONLINE LEARNING DATA (CONT.)
The ability to use data for research depends on a number of policies and laws, so
any research should go through the IRB (institutional review board) process, and
private information cannot generally be used.
There are rules for the safe handling of information as well. These should also be followed to the
letter.
28. THE FAUX DATA
What is the amount of time (in days) for a group of 26 online students to make a
friend in an online learning context?
Three columns: UniqueIdentifier (letters), Days (time unit), Censored (1 or 0); a sketch of preparing such a file appears below
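A hedged sketch of how such a three-column file could be prepared in Python for SPSS to open; the rows and filename below are invented for illustration and are not the presenter’s 26-row faux dataset.

import csv

# Invented rows in the deck's three-column layout:
# UniqueIdentifier (string), Days (integer time units), Censored dummy (1 = event, 0 = censored)
rows = [
    ("a", 2, 1),
    ("b", 4, 1),
    ("c", 5, 0),   # censored: no friend-making observed for this learner
    ("d", 7, 1),
]

with open("faux_friend_making.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["UniqueIdentifier", "Days", "Censored"])
    writer.writerows(rows)

# The resulting file can then be opened in SPSS via File -> Open -> Data
# (with "All Files" enabled), as described on the following slides.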
29. SETUP IN IBM’S SPSS
Open SPSS.
File -> Open -> Data (Enable “All Files” if there is a variety of files…)
Once data are loaded, go to Analyze -> Survival -> Kaplan-Meier
31. SETUP IN IBM’S SPSS (CONT.)
Place the column data into the correct areas: Time, Status, and Label Cases by…
32. SETUP IN IBM’S SPSS (CONT.)
Click the Status section, and then click the activated “Define Event” button below it.
Clarify that 1 is used to indicate the occurrence of an “event,” and 0 means
“censored.”
Click Continue.
33. SETUP IN IBM’S SPSS (CONT.)
Click Save.
Check which features you want: Survival, Standard Error of Survival, Hazard, and
Cumulative events.
Click Continue.
34. SETUP IN IBM’S SPSS (CONT.)
Click Options. Indicate whether you want Quartiles. Also, select the Plots you want:
Survival, One Minus Survival, Hazard, and Log Survival.
Click Continue.
Click Save.
35. SETUP IN IBM’S SPSS(CONT.)
When this is set up properly, the “OK” button at the bottom of the “Kaplan-Meier”
window will be activated.
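For readers without SPSS, a minimal sketch of the same Kaplan-Meier product-limit calculation in plain Python follows; the sample numbers are invented for illustration (they are not the deck's faux dataset), and a dedicated library such as lifelines could produce the same estimate with less code.

# Kaplan-Meier product-limit estimate from (days, event) pairs.
# event = 1 means the event was observed; event = 0 means the record is censored.

def kaplan_meier(durations, events):
    """Return (time, survival probability) steps of the product-limit estimate."""
    records = sorted(zip(durations, events))
    n_at_risk = len(records)
    survival = 1.0
    steps = [(0, 1.0)]                 # at Time Zero, 100% survival (no events yet)
    i = 0
    while i < len(records):
        t = records[i][0]
        events_at_t = 0
        leaving_at_t = 0
        # gather everyone whose recorded time equals t (events and censorings)
        while i < len(records) and records[i][0] == t:
            if records[i][1] == 1:
                events_at_t += 1
            leaving_at_t += 1
            i += 1
        if events_at_t > 0:
            survival *= 1.0 - events_at_t / n_at_risk
            steps.append((t, survival))
        n_at_risk -= leaving_at_t      # censored cases leave the risk set without counting as events
    return steps

# Invented illustration data (not the deck's 26-student faux dataset):
days   = [1, 2, 2, 3, 4, 5, 7, 9]
events = [1, 1, 0, 1, 1, 1, 0, 1]
for t, s in kaplan_meier(days, events):
    print(f"Day {t}: estimated S(t) = {s:.3f}")

The resulting steps form the non-increasing line graph described earlier: the estimate drops only at observed event times and plateaus elsewhere.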
44. A FEW QUESTIONS
How long was the study period (observation period)?
How many students took part?
What was the general time pattern in terms of friend-making?
How many learners had not made online friends by the end of the study period?
What might happen if this faux study went longer? Why?
What might happen if more learners were included?
45. A FEW QUESTIONS (CONT.)
What are some “hazards” for learner friend-making in an online course? Why?
Are there possible “covariates” that might explain friend-making among learners in
an online course?
If this were real data, what might you actually see?
47. SOME TAKEAWAYS
A “survival analysis” or a “time-to-event analysis” shows how much time passes
before an event occurs for a particular population.
In a “survival analysis,” the event is non-survival (and is permanent).
In a “time-to-event analysis,” the event can be any objectively observable defined occurrence in time,
and this event may be positive or negative.
The “population” in a “survival analysis” consists of people (or other living things).
The “population” in a “time-to-event analysis” may be inanimate things,
like equipment (When does this equipment fail under these defined conditions?)
like socio-political phenomena (When does war occur between two non-democratic countries over borders or land in
the contemporary era?)
like technologies (When does a zero-day exploit age out from usefulness in a particular software suite?)
like plants (When does a particular seed germinate in a particular greenhouse environment?), and so on
48. SOME TAKEAWAYS (CONT.)
These analyses include censored data, in order to capture a more real-world sense
of the information and in order to avoid “survivorship bias” of salient information
(which may skew the perception of the data).
“Survivorship bias” refers to the mistaken impression of a phenomenon because the available data is
captured and noticed (is salient) whereas less available data remains invisible and potentially not
noticed.
Just paying attention to “surviving” data will skew impressions and lead to incorrect analysis.
A simple example is that only students who “survive” to the end of an online class will evaluate the
instructor and the online course. Those who are not heard are those who failed to survive to the end,
but they may have helpful insights that would improve the teaching and the online course’s design.
49. SOME TAKEAWAYS (CONT.)
The hazard function and the survival function have a negative correlation. More of
one means less of the other.
The higher the hazard, the lower the survival rate (at a particular time or time period).
The higher the survival rate, the lower the hazard rate (at a particular time or time period).
A one-minus-survival table shows the cumulative proportion experiencing the event over time and
gives a sense of the probability of event at each time unit or juncture (expressed after this list).
At Time Zero, the entire population is alive with 100% survival.
Over time, the population experiences attrition, so the survival rate falls.
Cumulative risk accumulates over time (even when the instantaneous hazard does not rise).
There may be time periods of particular risk, whether early or mid-point or later in a process,
depending on the phenomenon being studied. (A common example is the bathtub curve for the human
life span. Once babies survive early threats to their mortality, they grow into adulthood and tend to
have lower risk through adulthood, but that risk of non-survival rises again as they attain old age. In
other words, hazard functions change over time and vary.)
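The “one-minus-survival” quantity above is simply the cumulative probability of having experienced the event by time t:
F(t) = 1 - S(t)
It starts at 0 at Time Zero, can only stay flat or rise, and mirrors the survival curve: whenever S(t) steps down, F(t) steps up by the same amount.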
50. SOME TAKEAWAYS (CONT.)
The three types of data required for a simple survival analysis include the following:
A population and phenomenon to study (as string data written in camelCase)
Defined units of time (aka “spell”) (as integer data)
An event (or censoring) (as a dummy variable with 1 = event, 0 = censored)
Time may be measured continuously or in discrete units. If discrete, the units have to
be consistent (and the visual display should be accurate to that).
52. STATISTICAL CENTRAL TENDENCIES OF THE DATA
The 95% confidence interval for the mean time-to-event (making a friend in an online
course) runs from 3.4 to 5.8 days; individual learners may well fall outside that
interval.
The median time-to-event for this population (with half of the observed times falling
below and half falling above) has a confidence interval from 1.9 days to 6 days, so there
is a fair amount of uncertainty in the estimate.
53. SOME PERCENTILE-BASED OBSERVATIONS
The vast majority of online learners who ultimately make friends tend to do so
fairly quickly, within about two days spent online.
Half of the online learners in this class who actually make friends do so within four
days online.
A fourth of the population who ultimately make friends do so within
7 days online.
Of the 26 students, three of the learners have “censored” data. What does this
mean? What does it mean that their data is “lost to follow-up”?
54. DATA CENSORING
Left-censoring, if it existed, would be learners who were already friends prior to the
research observation period.
Certainly, this is not an uncommon possibility, with friends taking classes together, so they can support
each other’s learning.
In this faux data example, this was not depicted.
Of course, there are other potential pasts possible with the population. Censoring refers to a lack of
information, and it does not necessarily suggest event occurrence.
Right-censoring, if it existed, would be learners who become friends (achieve event)
or not (do not achieve event) after the end of the research observation period.
In this faux data case, there are some instances of censoring that occur during the
observation period. These may be conceptualized as learners who left the observation
window early (for example, by dropping the course) without any friend-making being
observed; the censoring itself does not say whether they would ever have made a friend.
55. CONTACT AND CONCLUSION
Dr. Shalin Hai-Jew
Kansas State University
212 Hale Library
785-532-5262
shalin@k-state.edu