
"Survival Analysis" for Online Learning Data


Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena?
In an online learning context, with LMS data, questions such as the following may be answered:
How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens)
How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens)
How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens)
How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens)
How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens)
And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events?
This presentation will introduce survival analysis, its basic assumptions, its practice (using SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret related line graphs and other data visualizations.

Published in: Data & Analytics

  1. 1. “SURVIVAL ANALYSIS” FOR ONLINE LEARNING DATA SIDLIT 2017 Aug. 3 – 4, 2017
  2. 2. PRESENTATION Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena? 2
  3. 3. PRESENTATION (CONT.) In an online learning context, with LMS data, questions such as the following may be answered: • How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens) • How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens) • How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens) • How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens) • How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens) 3
  4. 4. PRESENTATION (CONT.) And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events? This presentation will introduce survival analysis, its basic assumptions, its practice (using IBM’s SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret related line graphs and other data visualizations. 4
  5. 5. PRESENTATION ORDER 1. Early Applications of “Survival Analysis” 2. Some Common Terms 3. Other Forms and Applications of “Survival Analysis” 4. Basic Elements of a “Time-to-Event” Analysis 5. Applications to Online Learning Data 6. One Example (with Faux Data) 7. A Few Questions 8. Some Takeaways 9. Light Debriefing 5
  6. 6. EARLY APPLICATIONS OF “SURVIVAL ANALYSIS” 1. 6
  7. 7. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES Use of empirical time-series data from a group of individuals with particular life-threatening health issues to see what their survival trajectories were over time. The “time-to-event” is measured, with the “event” being non-survival. Extraction of a regression curve of those who survived and those who did not (and when they passed). These datapoints are represented as a non-increasing line graph (“non-increasing” rather than “decreasing” because there are plateaus in which no events of non-survival occur). Time may be measured in various discrete units (from coarse to fine granularity) or continuously. 7
  8. 8. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES (CONT.) In the health context, survival curves may inform actuarial tables for expected survival given particular age, health states, and behavioral practices. Comparisons of survival curves may be made between comparable groups that receive different interventions or treatments (within ethical guidelines). Particular groups’ survival curves may be compared, such as between males and females, individuals of different age groups, individuals with different lifestyles, individuals from different social classes, individuals from different geographical locations, and so on. 8
  9. 9. SOME COMMON TERMS 2. 9
  10. 10. SOME COMMON TERMS Time Zero is the beginning of the study. S(t) is “survival at time ‘t’.” Survival is a function of time and also of “hazard” (the risk of non-survival). • The survival rate is inversely related to the hazard rate (the higher one is, the lower the other). • The cumulative hazard is non-decreasing and accumulates over time. • Sometimes, the hazard rate is considered constant; other times, it may increase or decrease over time, depending on the phenomenon being modeled. 10
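The relationship among these terms can be written compactly. The following is a sketch in standard continuous-time notation; the symbols T (the random time of the event), h(t) (the hazard rate), and H(t) (the cumulative hazard) are not used on the slide and are introduced here only for illustration.

```latex
S(t) = P(T > t), \qquad
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\bigl(-H(t)\bigr)
```

Because h(u) is never negative, H(t) can only stay level or rise, so S(t) can only stay level or fall. This holds even when the hazard rate itself rises and falls over time.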
  11. 11. SOME COMMON TERMS (CONT.) Data “censoring” refers to the members of the population who are part of the study but who either drop out or do not achieve the event (whatever that event might be in the particular dataset); their data is “lost to follow-up.” • Left censoring suggests a lack of event information prior to the participant’s entrance to the study. • Right censoring suggests a lack of event information after the end of the study or after the participant’s exit from the study. Including censored data helps avoid “survivorship bias,” or overweighting the data that “survive” the research period because they are salient (attention-getting) while missing the quieter or subtler data in the background. • Including censored data means that the data is more representative of real-world observations. 11
  12. 12. OTHER FORMS AND APPLICATIONS OF “SURVIVAL ANALYSIS” 3. 12
  13. 13. OTHER FORMS AND APPLICATIONS OF “SURVIVAL ANALYSIS” Other Forms of “Survival Analysis” Time-to-event analysis Event history analysis Reliability analysis Duration analysis Some Fields of Application Engineering Economics Sociology Political science Marketing Education 13
  14. 14. TIME-TO-EVENT ANALYSIS For contexts beyond the health sciences, “survival analysis” has evolved into “time-to-event” analysis. The independent variable (IV) is time. The dependent (outcome) variable (DV) is time-to-event. There are potential covariates or other variables that affect survival outcomes, positively or negatively (to varying degrees). • These may affect hazard rates (risk of event at any particular time slice) and survival rates. 14
  15. 15. BASIC ELEMENTS OF A TIME-TO-EVENT ANALYSIS 4. 15
  16. 16. BASIC ELEMENTS NEEDED FOR A SIMPLE TIME-TO-EVENT ANALYSIS Required Define-able and Observable Elements: A population and phenomenon to study; Defined units of time (aka “spell”); An event (or censoring). Additional Features: Access to the data, over time; Ability to consistently maintain the particular unit of time observation; Ability to observe either achievement of event or non-achievement of event (censored data). 16
  17. 17. APPLICATIONS TO ONLINE LEARNING DATA 5. 17
  18. 18. THREE REQUIRED TYPES OF DATA A population and phenomenon to study (expressed as string data written in camelCase). • Population may be animate or inanimate. • Each member of the population is an “experimental unit” (represented in a data table as row data). Defined units of time or continuous time (time aka “spell”) (expressed as integer data). An event (or censoring) (expressed as a dummy variable with 1 = event, 0 = censored). 18
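The three required fields can be sketched as one record per experimental unit. This is a minimal illustration, not the presentation's dataset: the class name `Observation`, its field names, and the three sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One row (experimental unit) in a time-to-event table. Names are hypothetical."""
    unit_id: str  # population member, as a camelCase string
    days: int     # time ("spell") in whole days
    event: int    # dummy variable: 1 = event observed, 0 = censored

    def __post_init__(self):
        # Reject rows that cannot be valid time-to-event records.
        if self.days < 0:
            raise ValueError("time must be non-negative")
        if self.event not in (0, 1):
            raise ValueError("event must be 1 (event) or 0 (censored)")


# Made-up rows for illustration only.
rows = [
    Observation("learnerA", 2, 1),
    Observation("learnerB", 7, 0),
    Observation("learnerC", 4, 1),
]

n_events = sum(r.event for r in rows)
n_censored = len(rows) - n_events
print(f"{n_events} events, {n_censored} censored")
```

Validating the dummy variable at load time catches miscoded rows before they reach the analysis.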
  19. 19. ADDITIONAL INFORMATION THAT MAY BE COLLECTED Univariate data: For each row (or experimental unit), time-to-event (or no record of achievement of event, in which case there is censored data) Bivariate data: For each row, capture of both time-to-event and event or censoring…but also one other qualitative (categorical) or quantitative feature of the experimental unit Multivariate data: For each row, capture of time-to-event, event/censoring, and multiple other qualitative and / or quantitative features of the experimental unit 19
  21. 21. SO… A time-to-event analysis is a time-series representation of a phenomenon that also includes the relative frequency of occurrences in time 21
  22. 22. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT ANALYSIS APPLIED TO ONLINE LEARNING DATA How much time passes before… • An online instructor uses a particular feature or tool in an LMS (learning management system)? • An online instructor reaches out to his / her students? • An online instructor uses the LMS for a non-course application? • An online instructor starts (or stops) usage of a particular digital learning object? • An online instructor uses the mobile app to access the LMS? • An online instructor finalizes and submits grades for the particular term? • An online instructor commits to the online teaching modality? 22
  23. 23. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.) How much time passes before… • An online learner submits a first assignment? • An online learner makes a first friend online? • An online learner commits to completing an online course or online learning sequence? • An online learner communicates with his or her instructor? • An online learner contests a grade with the instructor? • An online learner uploads an image? 23
  24. 24. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.) How much time passes before… • A university adds a new feature to an LMS at the instance-level? • A university is able to attract a sufficient number of learners to a program to ensure that it is self-sustaining? • A university considers moving from a particular LMS (from time-of-adoption)? 24
  25. 25. ONLINE LEARNING DATA Online learning data comes from a number of sources: • an LMS data portal • scraped discussion board data from an online course • a third-party app used in online learning • admin or instructor access to a course • grades in a student information system • demographic data in a student information system. Some of the data will require more effort to collect than others. Some of the data is not collected anywhere and may have to be inferred (from multiple data streams) or imputed (substituting values for missing data based on a reasonable method). 25
  26. 26. ONLINE LEARNING DATA (CONT.) The ability to use data for research depends on a number of policies and laws, so any research should go through the IRB (institutional review board) process, and private information cannot generally be used. • There are rules for the safe handling of information as well. These should also be followed to the letter. 26
  27. 27. ONE EXAMPLE (WITH FAUX DATA) 6. 27
  28. 28. THE FAUX DATA What is the amount of time (in days) for a group of 26 online students to make a friend in an online learning context? Three columns: UniqueIdentifier (letters), Days (time unit), Censored (1 or 0) 28
  29. 29. SETUP IN IBM’S SPSS Open SPSS. File -> Open -> Data (Enable “All Files” if there is a variety of files…) Once data are loaded, go to Analyze -> Survival -> Kaplan-Meier 29
  30. 30. SELECTION OF THE KAPLAN-MEIER SURVIVAL ANALYSIS 30
  31. 31. SETUP IN IBM’S SPSS(CONT.) Place the column data into the correct areas: Time, Status, and Label Cases by… 31
  32. 32. SETUP IN IBM’S SPSS(CONT.) Click the Status section, and then click the activated “Define Event” button below it. Clarify that 1 is used to indicate the occurrence of an “event,” and 0 means “censored.” Click Continue. 32
  33. 33. SETUP IN IBM’S SPSS(CONT.) Click Save. Check which features you want: Survival, Standard Error of Survival, Hazard, and Cumulative events. Click Continue. 33
  34. 34. SETUP IN IBM’S SPSS (CONT.) Click Options. Indicate whether you want Quartiles. Also, select the Plots you want: Survival, One Minus Survival, Hazard, and Log Survival. Click Continue. Click Save. 34
  35. 35. SETUP IN IBM’S SPSS(CONT.) When this is set up properly, the “OK” button at the bottom of the “Kaplan-Meier” window will be activated. 35
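For readers without SPSS, the product-limit (Kaplan-Meier) estimate that the menu path above requests can be sketched from scratch. This is a minimal illustration under stated assumptions, not SPSS's implementation: the function name `kaplan_meier` and the eight sample rows are hypothetical and are not the presentation's 26-student faux dataset. Status uses the same coding as the slides (1 = event, 0 = censored).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct time with at least one event.

    times  -- observed time for each unit (e.g., days)
    events -- 1 if the event occurred at that time, 0 if the unit was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # everyone leaving the risk set at t

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of the risk set
            # that did NOT experience the event at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Units with an event or censoring at t leave the risk set afterward.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical data: days until a learner makes a friend.
days = [1, 2, 2, 3, 4, 4, 6, 7]
status = [1, 1, 1, 0, 1, 1, 0, 1]

curve = kaplan_meier(days, status)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.4f}")
```

Censored learners stay in the risk set until the day they are censored, so they still inform the denominator at earlier event times instead of being discarded.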
  43. 43. A FEW QUESTIONS 7. 43
  44. 44. A FEW QUESTIONS How long was the study period (observation period)? How many students took part? What was the general time pattern in terms of friend-making? How many learners had not made online friends by the end of the study period? What might happen if this faux study went longer? Why? What might happen if more learners were included? 44
  45. 45. A FEW QUESTIONS (CONT.) What are some “hazards” for learner friend-making in an online course? Why? Are there possible “covariates” that might explain friend-making among learners in an online course? If this were real data, what might you actually see? 45
  46. 46. SOME TAKEAWAYS 8. 46
  47. 47. SOME TAKEAWAYS A “survival analysis” or a “time-to-event analysis” shows how much time passes before an event occurs for a particular population. • In a “survival analysis,” the event is non-survival (and is permanent). • In a “time-to-event analysis,” the event can be any objectively observable defined occurrence in time, and this event may be positive or negative. • The “population” in a “survival analysis” is people (or other living things). • The “population” in a “time-to-event analysis” may also be other kinds of things, • like equipment (When does this equipment fail under these defined conditions?) • like socio-political phenomena (When does war occur between two non-democratic countries over a fight over borders or land in the contemporary era?) • like technologies (When does a zero-day exploit age out from usefulness in a particular software suite?) • like plants (When does a particular seed germinate in a particular greenhouse environment?), and so on. 47
  48. 48. SOME TAKEAWAYS (CONT.) These analyses include censored data, in order to capture a more real-world sense of the information and in order to avoid “survivorship bias” of salient information (which may skew the perception of the data). • “Survivorship bias” refers to the mistaken impression of a phenomenon that arises because the available data is captured and noticed (is salient) whereas less available data remains invisible and potentially unnoticed. • Just paying attention to “surviving” data will skew impressions and lead to incorrect analysis. • A simple example is that only students who “survive” to the end of an online class will evaluate the instructor and the online course. Those who are not heard are those who failed to survive to the end, but they may have helpful insights that would improve the teaching and the online course’s design. 48
  49. 49. SOME TAKEAWAYS (CONT.) The hazard function and the survival function are inversely related. More of one means less of the other. • The higher the hazard, the lower the survival rate (at a particular time or time period). • The higher the survival rate, the lower the hazard rate (at a particular time or time period). A one-minus-survival table shows cumulative event accumulation over time and a sense of probability of event at each time unit or juncture. • At Time Zero, the entire population is alive with 100% survival. • Over time, the population experiences attrition, so the survival rate falls. • Cumulative risk (the share of the population that has experienced the event) increases over time. • There may be time periods of particular risk, whether early or mid-point or later in a process, depending on the phenomenon being studied. (A common example is the bathtub curve for the human life span. Once babies survive early threats to their mortality, they grow into adulthood and tend to have lower risk through adulthood, but that risk of non-survival rises again as they attain old age. In other words, hazard functions change over time and vary.) 49
  50. 50. SOME TAKEAWAYS (CONT.) The three types of data required for a simple survival analysis include the following: • A population and phenomenon to study (as string data written in camelCase) • Defined units of time (aka “spell”) (as integer data) • An event (or censoring) (as a dummy variable with 1 = event, 0 = censored). The unit of time may be continuous, or it may be discrete. If it is discrete, the time has to be in consistent units (and the visual display should be accurate to that). 50
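The link between a survival curve, its one-minus-survival values, and the hazard at each time point can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the three-point curve are hypothetical, and the hazard here is the simple discrete-time version (the share of those still "surviving" who experience the event at that step).

```python
def one_minus_survival(curve):
    """Cumulative share of the population that has experienced the event."""
    return [(t, 1.0 - s) for t, s in curve]


def discrete_hazard(curve):
    """Share of those still surviving just before t who have the event at t."""
    hazards = []
    previous = 1.0  # at Time Zero, survival is 100%
    for t, s in curve:
        step = (previous - s) / previous if previous > 0 else 0.0
        hazards.append((t, step))
        previous = s
    return hazards


# Hypothetical survival curve: (time unit, S(t)).
curve = [(1, 0.9), (2, 0.6), (3, 0.3)]

cumulative = one_minus_survival(curve)
hazards = discrete_hazard(curve)

for (t, c), (_, h) in zip(cumulative, hazards):
    print(f"t={t}: cumulative events = {c:.2f}, hazard = {h:.2f}")
```

In this made-up curve the cumulative event share only rises, while the per-step hazard depends on how many units are still at risk; a hazard could just as well fall at some steps without the cumulative share ever decreasing.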
  51. 51. LIGHT DEBRIEFING 9. 51
  52. 52. STATISTICAL CENTRAL TENDENCIES OF THE DATA The mean time-to-event (making a friend in an online course) for this population is estimated, with 95% confidence, to lie between 3.4 and 5.8 days. This is an interval estimate for the average, not a range that contains 95% of the learners. The mid-point of time-to-event for this population (with half of the scores falling below and half of the scores falling above) is estimated to lie between 1.9 days and 6 days, so there is a fair amount of uncertainty. 52
  53. 53. SOME PERCENTILE-BASED OBSERVATIONS About a fourth of the online learners who ultimately make friends do so fairly quickly, within about two days spent online. Half of the online learners in this class who actually make friends do so within four days online. About three-fourths of the population who ultimately make friends do so within 7 days online. Of the 26 students, three of the learners have “censored” data. What does this mean? What does it mean that their data is “lost to follow-up”? 53
  54. 54. DATA CENSORING Left-censoring, if it existed, would be learners who were already friends prior to the research observation period. • Certainly, this is not an uncommon possibility, with friends taking classes together, so they can support each other’s learning. • In this faux data example, this was not depicted. • Of course, there are other potential pasts possible with the population. Censoring refers to a lack of information, and it does not necessarily suggest event occurrence. Right-censoring, if it existed, would be learners who become friends (achieve event) or not (do not achieve event) after the end of the research observation period. In this faux data case, there are some instances of censoring, albeit during the observation period. This may be conceptualized as people who have decided against being friends. 54
  55. 55. CONTACT AND CONCLUSION Dr. Shalin Hai-Jew • Kansas State University • 212 Hale Library • 785-532-5262 • shalin@k-state.edu 55
