Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena?
In an online learning context, with LMS data, questions such as the following may be answered:
How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens)
How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens)
How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens)
How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens)
How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens)
And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events?
This presentation will introduce survival analysis, its basic assumptions, its practice (using SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret the resulting line graphs and other data visualizations.
Using Large-Scale LMS Data Portal Data to Improve Teaching and Learning (at K... - Shalin Hai-Jew
With any learning management system, a byproduct of its function is data, which may be analyzed to improve awareness, decision-making, and actions. At Kansas State University, its Canvas LMS instance recently made available its cumulative data from its first use in 2013. These flat files open a window to how the university is harnessing its LMS, with some macro-level insights that may suggest some areas to improve teaching and learning. This session describes some approaches to informatizing this empirical “big data” with some basic approaches: reviewing the data dictionary, extracting basic descriptions of the respective data sets, conducting time-based comparisons, surfacing testable hypotheses from data inferences, and conducting other data explorations. This introduces initial data analysis work only, but this does not preclude front-end analysis of courses at the micro level, relational database queries of the data, and other potential follow-on work.
Learning from Time-to-Event Data from Online Learning Contexts - Shalin Hai-Jew
Time-to-event analysis is a statistical analysis approach that enables time-based insights about student learning, such as, How long does it take before a learner makes a new acquaintance in an online course? A new friend? How long does it take before a learner achieves breakout capacity in a particular learning sequence? How long does it take for a learner to commit to a course? This digital poster session presents time-to-event analysis (aka “survival analysis”) from real LMS data and shows how this analysis is done. Terms related to time-to-event analysis will be introduced, and the assertability of extracted data is explored.
Time-to-event analysis, in its simplest form, enables the study of in-world phenomena which includes the time it takes to achieve a particular defined “event” (whether negative or positive, desirable or undesirable), and it includes the nuance of “censored” data (in-world records for which data about event achievement was not attained during the time period of the analysis). This presentation introduces “time-to-event analysis” (on IBM’s SPSS Statistics) as applied to online educational data.
Leveraging Flat Files from the Canvas LMS Data Portal at K-State - Shalin Hai-Jew
A lot of data are created in an LMS instance, and much of this can be analyzed for insight. In 2016, Instructure, the makers of Canvas, made their LMS data available to their customers through a data portal (updated monthly). This portal enables access to a number of flat files related to that particular instance. This presentation showcases how this big data was analyzed on a regular laptop with basic office software, to summarize Kansas State University’s use of the LMS. Methods for analysis include the following: basic descriptive statistics, survival analysis, computational linguistic analysis, and others.
The results are reported out with both numbers and data visualizations, including classic pie charts, line graphs, bar charts, mixed-charts, word clouds, and others. The findings provide some insights about how to approach the data, how to use a data dictionary, and other methods for extracting the data for awareness and practical decision-making. This work also is suggestive of next steps for more advanced analysis (using the flat files in a SQL database).
More information about this may be accessed at http://scalar.usc.edu/works/c2c-digital-magazine-spring--summer-2017/wrangling-big-data-in-a-small-tech-ecosystem.
The K-State Online Canvas LMS Data Portal and Five Years of Activated Third-P... - Shalin Hai-Jew
The presenter will introduce the K-State LMS data portal, share some available insights from it, and focus on one particular facet of this big data: the third-party apps that K-State faculty, admin, and staff have activated, and what that says about how we're using Canvas.
Canvas LMS data portal for the Kansas State University instance
A data dictionary: Version 1.16.2 (https://portal.inshosteddata.com/docs)
Data extraction and processing
What it can tell us: (un)available data and information
Activated third-party tools in K-State Online Canvas LMS instance
Some caveats
What this says about what K-Staters (early adopters) are using
Practical applications of this third-party app activation data
Adding value to LMS data portal data
Creating Effective Data Visualizations in Excel 2016: Some Basics - Shalin Hai-Jew
One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered: it can take roughly 1.05 million rows of data per worksheet (1,048,576), contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of add-ons to empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie charts and doughnuts, bar charts, tree maps and sunburst diagrams, cluster diagrams, spider (radar) charts, scattergraphs, and others. This session will cover how data structures and desired emphases will determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
Also, participants will help co-build data visualizations from open-source and other datasets.
This slideshow reviews some of the features and functionalities of Qualtrics that enable its use in online trainings. This explores some important instructional design elements in online trainings, including for three main types: policy compliance, mass-scale trainings, and customized trainings. This reviews some core elements of online trainings. Finally, there are some reflections on real-world considerations when building an online training on Qualtrics.
Materials for an introduction to adaptive learning and learning analytics, as well as efforts at interoperability standardization. These slides briefly treat the concept of adaptive learning, the reference model of learning analytics, data APIs for learning analytics, and the topic list of the standardization community (ISO/IEC JTC1 SC36).
Presentation from a workshop given at ACRL 2011 conference, Data-Driven Library Web Design: Making Usability Testing Work with Collaborative Partnerships
Education must capitalize on the trend within technology toward big data. New types of data are becoming available. From evidence approaches to xAPI and the whole Training and Learning Architecture (TLA), big data is the foundation of it all.
This slide deck was presented at the 2015 International Conference on Education Research.
I aggregated several of my other partial slides and reports to describe an adaptive learning model pertaining to the concept of learning analytics, as well as LOD for curriculum standards and digital resources. There is a short introduction to the project of ISO/IEC 20748 Learning analytics interoperability - Part 1: Reference model.
Online Learning Design for Diversity and Inclusion - Shalin Hai-Jew
Social inclusion and respect for diversity are some of the most important democratic values that inform learning design. The educational research literature offers methods for how to design teaching and learning for people in all (many of?) their complex dimensions:
demographics;
cultures [including worldviews, beliefs, values, practices, and others];
languages;
learning preferences;
differing perceptions and information processing, and others,
… so that all are included and supported and welcomed. Widely known approaches include accessibility mitigations, universal design practices, multi-cultural adaptations, and others. This presentation provides a light overview of suggested practices and how these are applied to practical instructional designs of online learning with modern technological enablements.
DATA MINING IN EDUCATION: A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE - IJDKP
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, where data mining is the core of this process. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with a data warehouse, and the whole process is divided into an action plan to be performed on the data: selection, transformation, mining, and results interpretation. In this paper, we have reviewed the Knowledge Discovery perspective in Data Mining and consolidated the different areas of data mining and its techniques and methods.
There are many online and in-person courses available for librarians to learn about research data management, data analysis, and visualization, but after you have taken a course, how do you go about applying what you have learned? While it is possible to just start offering classes and consultations, your service will have a better chance of becoming relevant if you consider stakeholders and review your institutional environment. This lecture will give you some ideas to get started with data services at your institution.
Presents the foundational aspects of web analytics and some specifics such as the hotel problem. Discusses trace data, behaviorism, and other web analytics topics.
Information Experience Lab, IE Lab at SISLT - Isa Jahnke
Founded in 2003
The Information Experience Laboratory (IE Lab) is a usability and user experience lab with the mission to improve learning technologies, information, and communication systems.
Here we present the IE Lab and its methods.
Slides for the conference program at e-Learning Korea 2016. These slides also cover ISO/IEC TR 20748-1 Learning Analytics Interoperability - Part 1: Reference model, as well as curriculum standards. They were mainly prepared for LASI-Asia 2016 #lasiasia16.
Slides | Targeting the librarian’s role in research services - Library_Connect
Slides from the Nov. 8, 2016 Library Connect webinar "Targeting the librarian’s role in research services" with Nina Exner, Amanda Horsman and Mark Reed. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=223121
Who Owns Faculty Data?: Fairness and transparency in UCLA's new academic HR s... - chloejreynolds
Abstract: Beginning in 2015, Opus will be the information system of record for faculty activities at the University of California, Los Angeles (UCLA). Opus will serve as both a profile system, storing data about faculty work, and as a workflow and approval engine for the promotion and tenure process. Opus leverages institutional master data wherever possible to collect data about faculty activity. However, re-purposing institutional data collected for purposes not related to academic review necessitates allowing data subjects (UCLA faculty) to contextualize and reframe the data for the review process. Collecting, displaying and storing these augmented records (master data with manually added metadata from faculty) has forced the project team to grapple with questions regarding fairness and transparency to both data subjects and to data consumers. How can we hold to “good design” and usability practices, while faithfully representing the inherent “messiness” of the data? How does the context in which the data was collected impact re-purposing the data for academic review? What does it mean to “own” faculty data? This paper outlines our attempts to address these questions, noting the trade-offs and limitations of the selected solutions.
This topic was presented at the 2015 iConference on March 26, 2015 in Newport Beach, CA. Since 2005, the iConference series has provided forums in which information scholars, researchers and professionals share their insights on critical information issues in contemporary society. An openness to new ideas and research fields in information science is a primary characteristic of the event.
Invited lecture at Emory University Rollins School of Public Health. We presented our InSTEDD global early warning and response social platform, Evolve (http://instedd.org/evolve), with a live demonstration.
Invited talk, INSIGHT Centre for Data Analytics, Univ. Galway, 2 Oct 2013, http://www.insight-centre.org
Abstract:
Data and analytics are transforming how organisations work in all sectors. While there are clearly ethical issues around big data and privacy, there may also be an argument that educational institutions have a moral obligation to use all the information they have to maximize the learner's progress. So, assuming education can't (arguably shouldn't) resist this revolution, the question is how to harness this new capability intelligently. Learning Analytics is an exploding research field and startup market: do leaders know what to ask when the vendors roll up with dazzling dashboards? In this talk I'll provide an overview of developments, and consider some of the key questions we should be asking. Like any modelling technology and accounting system, analytics are not neutral, and do not passively describe sociotechnical reality: they begin to shape it. Moreover, they start with the things that are easiest to count, which doesn't necessarily equate to the things we value in learning. Given the crisis in education at many levels, what realities do we want analytics to perpetuate, or bring into being?
Bio:
Simon Buckingham Shum is Professor of Learning Informatics at the UK Open University's Knowledge Media Institute. He researches, teaches and consults on Learning Analytics, Collective Intelligence and Argument Visualization. His background is B.Sc. Psychology, M.Sc. Ergonomics and Ph.D. Human-Computer Interaction. He co-edited Visualizing Argumentation (Springer 2003), the standard reference in the field, followed by Knowledge Cartography (2008). In the field of Learning Analytics, he served as Program Co-Chair of the 2nd International Learning Analytics LAK12 conference, chaired the LAK13 Discourse-Centric Learning Analytics workshop, and the LASI13 Dispositional Learning Analytics workshop. He is a co-founder of the Society for Learning Analytics Research, Compendium Institute, LearningEmergence.net, and was Co-Founder and General Editor of the Journal of Interactive Media in Education. He serves on the Advisory Groups for a variety of learning analytics initiatives in education and enterprise, and is a Visiting Fellow at University of Bristol Graduate School of Education. Contact him via http://simon.buckinghamshum.net
Measurement and Statistics
Student Name:
Institution Name:
Instructor Name:
Submission Date:
Introduction
There is excessive use of the internet today, both by the old and the young. It is common to see people spending most of their time behind computers or using their mobile phones to browse the internet and access social media. The internet is beneficial for various reasons such as entertainment, social interaction, education, and even work. There have been concerns as to whether this excessive use of the internet has a negative impact on individuals' psychological health. Some of the aspects of psychological health that the internet has been seen to affect include anxiety, sleep, stress, depression, and even social life. Some researchers have conducted studies which link excessive use of the internet to extreme cases of low self-esteem and suicidal thoughts. The focus of this paper is to analyze the statistics and measurements that will be used in the research which will be conducted to confirm whether excessive internet use has a negative impact on psychological health.
Type of Research
The research to find out if excessive use of the internet affects psychological health will be quantitative. This is where the variables will be assigned numerical data in scales during data collection. After results are obtained from the data collection process, the collected data will be analyzed using various statistical analysis tests. The use of quantitative research in this case will be important to the study in various ways. Quantitative research tends to be more objective and reliable compared to qualitative research (Balnaves, 2001). It is also significant because statistics can be used to support a finding. A complex problem in a research study can be restructured and reduced to a smaller number of variables. Quantitative research also focuses on associations and relationships between variables and can describe cause and effect easily. After results have been analyzed, quantitative research can easily test hypotheses and theories. It is important through its assumption that a sample represents the whole population. There is less room for the researcher's subjectivity, and therefore there is less bias in this kind of research.
Variables
The quantitative study will be relational, as it will analyze the relationships between various variables. Data will be collected among university students, who will be selected randomly and will fill out a questionnaire. The variables in the research study can be generally divided into two categories: internet use and psychological health. The data on internet use will be collected using the Online Cognition Scale (OCS) ...
CHAPTER 7: Collecting Qualitative Data (Qualitative da.docx) - RAHUL126667
CHAPTER 7: Collecting Qualitative Data
Qualitative data collection is more than simply deciding on whether you will observe or interview people. Five steps comprise the process of collecting qualitative data. You need to identify your participants and sites, gain access, determine the types of data to collect, develop data collection forms, and administer the process in an ethical manner.
By the end of this chapter, you should be able to:
· ◆ Identify the five process steps in collecting qualitative data.
· ◆ Identify different sampling approaches to selecting participants and sites.
· ◆ Describe the types of permissions required to gain access to participants and sites.
· ◆ Recognize the various types of qualitative data you can collect.
· ◆ Identify the procedures for recording qualitative data.
· ◆ Recognize the field issues and ethical considerations that need to be anticipated in administering the data collection.
Maria is comfortable talking with students and teachers in her high school. She does not mind asking them open-ended research questions such as “What are your (student and teacher) experiences with students carrying weapons in our high school?” She also knows the challenges involved in obtaining their views. She needs to listen without injecting her own opinions, and she needs to take notes or tape-record what people have to say. This phase requires time, but Maria enjoys talking with people and listening to their ideas. Maria is a natural qualitative researcher.
WHAT ARE THE FIVE PROCESS STEPS IN QUALITATIVE DATA COLLECTION?
There are five interrelated steps in the process of qualitative data collection. These steps should not be seen as linear approaches, but often one step in the process does follow another. The five steps are first to identify participants and sites to be studied and to engage in a sampling strategy that will best help you understand your central phenomenon and the research question you are asking. Second, the next phase is to gain access to these individuals and sites by obtaining permissions. Third, once permissions are in place, you need to consider what types of information will best answer your research questions. Fourth, at the same time, you need to design protocols or instruments for collecting and recording the information. Finally and fifth, you need to administer the data collection with special attention to potential ethical issues that may arise.
Some basic differences between quantitative and qualitative data collection are helpful to know at this point. Based on the general characteristics of qualitative research, qualitative data collection consists of collecting data using forms with general, emerging questions to permit the participant to generate responses; gathering word (text) or image (picture) data; and collecting information from a small number of individuals or sites. Thinking more specifically now ...
Journal Club - Best Practices for Scientific Computing - Bram Zandbelt
Journal Club presentation for Cools lab at Donders Institute, Radboud University, Nijmegen, the Netherlands
Date: October 28, 2015
Paper:
Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., ... & Wilson, P. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745.
Learning analytics and Moodle: So much we could measure, but what do we want to measure? A presentation to the USQ Math and Sciences Community of Practice May 2013
Analysis of Multiple Pilots for ICT-supported Lifelong Competence Development, Davinia Hernández-Leo, davinia.hernandez@upf.edu, TENCompetence Winter School 2009, 1-6 February Innsbruck, Austria
A Framework for Statistical Simulation of Physiological Responses (SSPR) - Waqas Tariq
The problem of variable selection from a large number of variables to predict certain important dependent variables has been of interest to both applied statisticians and other researchers in applied physiology. For this purpose, various statistical techniques have been developed. This framework embeds various statistical techniques of sampling and resampling and helps in statistical simulation of physiological responses under different environmental conditions. The population generation and other statistical calculations are based on the inputs provided by the user (a mean vector and covariance matrix) and the data. This framework is developed in a way that it can work for the original data as well as for simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. This framework uses these two inputs and is able to generate a simulated multivariate normal population for any number of variables. The software changes the manual operation into a computer-based system to automate the study and provide efficiency, accuracy, timeliness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data are analyzed using statistical techniques, the results of such analysis will be the same as those using the original data. If data are missing for some of the variables, the system will also help in that case. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even if the actual data are not present.
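A rough sketch of the core generation step this abstract describes (simulating a multivariate-normal population from a user-supplied mean vector and covariance matrix); this is not the SSPR framework's own code, and the mean, covariance, and sample size below are hypothetical.

```python
# Simulating a multivariate-normal "population" from a mean vector and covariance
# matrix, as the described framework does; the numbers here are hypothetical.
import numpy as np

mean = np.array([98.6, 72.0, 16.0])      # e.g., temperature, heart rate, breaths/min
cov = np.array([[0.25, 0.10, 0.02],
                [0.10, 25.0, 1.50],
                [0.02, 1.50, 4.00]])      # symmetric, positive semi-definite

rng = np.random.default_rng(seed=42)
sample = rng.multivariate_normal(mean, cov, size=1000)

print(sample.mean(axis=0))                # approximately recovers the input mean
print(np.cov(sample, rowvar=False))       # approximately recovers the input covariance
```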
Long nonfiction chapters are not in style and may never have been. While average nonfiction book chapters run about 4,000 – 7,000 words, some run to several times that upper figure. The explanation is that there is some irreducible complexity that the chapter addresses that cannot be addressed in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia - Shalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... - Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIs - Shalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables the application of “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about throwing mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer Mistakes - Shalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter has served as a neophyte one for some years before settling into this invited volunteer work for several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AI - Shalin Hai-Jew
CrAIyon (formerly DALL-E Mini, after Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations and remixes and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to working prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple... - Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and... - Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) - Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and Accessibility - Shalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,... - Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o... - Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i... - Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see demand grow and the supply landscape evolve, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
2. PRESENTATION
Everything exists in time. “Survival analysis” is a statistical technique long used in the health sciences. As “time-to-event analysis,” it enables the asking of questions like: How much time passes before (an event) occurs, if it occurs, and what does this data suggest about various in-world phenomena?
3. PRESENTATION (CONT.)
In an online learning context, with LMS data, questions such as the following may be answered:
How long does it take for an online learner to (find his or her rhythm) in an online course? (if it happens)
How long does it take for an online instructor (to get to know) a particular student in a more person-to-person way? (if it happens)
How long does it take for an online learner to (form basic facility) with a new software tool? (if it happens)
How long does it take for a student researcher to (achieve breakout capacity) in (a particular skill)? (if it happens)
How long does it take for a doctoral student to (publish his / her first peer-reviewed paper)? (if it happens)
4. PRESENTATION (CONT.)
And what are observable variables that may affect whether the particular observed “state” is achieved or not? And if achieved, whether the occurrence is “early” or “late” in comparison with other comparable events?
This presentation will introduce survival analysis, its basic assumptions, its practice (using IBM’s SPSS), its strengths and limitations, data “censoring” (to avoid “survivorship bias”), and ways to interpret the resulting line graphs and other data visualizations.
5. PRESENTATION ORDER
1. Early Applications of “Survival Analysis”
2. Some Common Terms
3. Other Forms and Applications of “Survival Analysis”
4. Basic Elements of a “Time-to-Event” Analysis
5. Applications to Online Learning Data
6. One Example (with Faux Data)
7. A Few Questions
8. Some Takeaways
9. Light Debriefing
7. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES
Use of empirical time-series data of a group of individuals with particular life-threatening health issues to see what their survival trajectories were over time
The “time-to-event” is measured, with the “event” being non-survival
Extraction of a regression curve of those who survived and those who did not (and when they passed)
These datapoints are represented as a non-increasing (not “decreasing,” because there are times of plateaus in which no events of non-survival occur) line graph
Time may be measured in various discrete units (from coarse to fine granularity) or continuously
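A minimal sketch of how such a non-increasing survival curve can be estimated: the example below uses Python and the open-source lifelines library (the deck itself walks through the workflow in IBM SPSS Statistics), with faux durations and event indicators.

```python
# Kaplan-Meier-style survival curve from faux time-to-event data.
# lifelines is an open-source Python library; the presentation itself uses SPSS.
from lifelines import KaplanMeierFitter

weeks = [2, 3, 3, 5, 6, 8, 8, 10, 12, 15]   # observed time per experimental unit
event = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]       # 1 = event occurred, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(weeks, event_observed=event, label="time to event (faux data)")

print(kmf.survival_function_)       # S(t) as a non-increasing step function
print(kmf.median_survival_time_)    # time by which half the units reached the event
kmf.plot_survival_function()        # the survival "line graph" described above
```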
8. EARLY “SURVIVAL ANALYSIS” IN THE HEALTH SCIENCES (CONT.)
In the health context, survival curves may inform actuarial tables for expected survival given particular age, health states, and behavioral practices.
Comparisons of survival curves may be made between comparable groups, albeit those receiving different interventions or treatments (within ethical guidelines).
Particular groups’ survival curves may be compared, such as between males and females, individuals of different age groups, individuals with different lifestyles, individuals from different social classes, individuals from different geographical locations, and so on.
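One standard way to test whether two such groups' survival curves differ is a log-rank test; a minimal sketch with the lifelines library, where the two groups and their faux durations are hypothetical.

```python
# Log-rank comparison of two hypothetical groups' survival experiences.
from lifelines.statistics import logrank_test

weeks_a = [2, 4, 5, 7, 9, 11]     # e.g., a group receiving one intervention
event_a = [1, 1, 0, 1, 1, 0]      # 1 = event, 0 = censored
weeks_b = [3, 6, 8, 10, 12, 14]   # e.g., a comparable group without it
event_b = [1, 0, 1, 1, 0, 0]

result = logrank_test(weeks_a, weeks_b,
                      event_observed_A=event_a,
                      event_observed_B=event_b)
result.print_summary()            # test statistic and p-value for the difference
```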
10. SOME COMMON TERMS
Time Zero is the beginning of the study
S(t) is “survival at time ‘t’”
Survival is a factor of time and also a factor of “hazard” (the risk of non-survival)
The survival rate has a negative correlation with the hazard rate (the higher one is, the lower the other)
The cumulative hazard function is non-decreasing and accumulates over time
Sometimes, hazards are considered constant; other times, hazards may increase or decrease over time, depending on the phenomenon being modeled
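Under the usual definitions (a standard formulation, not spelled out on the slide), the survival function, hazard, and cumulative hazard relate as follows, which is why a higher hazard pulls the survival curve down faster:

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}, \qquad
H(t) = \int_{0}^{t} h(u)\,du, \qquad
S(t) = e^{-H(t)}
```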
11. SOME COMMON TERMS (CONT.)
Data “censoring” refers to the members of the population who are part of the study but who either drop out or do not achieve the event (whatever that event might be in the particular dataset); their data is “lost to follow-up”
Left censoring suggests a lack of event information prior to the participant’s entrance to the study
Right censoring suggests a lack of event information after the end of the study and the participant’s exit from the study
Including censored data precludes “survivorship bias,” or overweighting the effects of data that “survive” the research period because it is salient (attention-getting) and missing the quieter or more subtle data in the background
Including censored data means that the data is more representational of real-world observations
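In practice, right-censoring is usually coded by comparing each record's event timestamp (if any) against the end of the observation window; a sketch with pandas, where the column names, dates, and cutoff are hypothetical.

```python
# Deriving durations and event/censoring flags from raw LMS-style timestamps.
# Column names, dates, and the study cutoff are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "learner_id":  ["u01", "u02", "u03"],
    "enrolled_on": pd.to_datetime(["2017-01-09", "2017-01-09", "2017-01-16"]),
    "event_on":    pd.to_datetime(["2017-02-20", None, "2017-03-06"]),  # None = never observed
})
study_end = pd.Timestamp("2017-05-12")               # end of the observation window

df["event"] = df["event_on"].notna().astype(int)     # 1 = event, 0 = right-censored
end_time = df["event_on"].fillna(study_end)          # censored records run to study end
df["weeks"] = (end_time - df["enrolled_on"]).dt.days // 7

print(df[["learner_id", "weeks", "event"]])
```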
13. OTHER FORMS AND APPLICATIONS OF “SURVIVAL ANALYSIS”
Other Forms of “Survival Analysis”
Time-to-event analysis
Event history analysis
Reliability analysis
Duration analysis
Some Fields of Application
Engineering
Economics
Sociology
Political science
Marketing
Education
14. TIME-TO-EVENT ANALYSIS
For contexts beyond the health sciences, “survival analysis” has evolved to “time-to-event” analysis
The independent variable (IV) is time
The dependent (outcome) variable (DV) is time-to-event
There are potential covariates or other variables that affect survival outcomes—positively or negatively (to varying degrees)
These may affect hazard rates (risk of event at any particular time slice) and survival rates
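One common way to estimate how such covariates shift the hazard is a Cox proportional-hazards model (available in SPSS as Cox Regression); sketched here with the lifelines library on faux data, with hypothetical covariates.

```python
# Cox proportional-hazards sketch: estimating how covariates raise or lower the hazard.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks":        [2, 3, 5, 6, 8, 10, 12, 15],  # time-to-event or censoring (faux)
    "event":        [1, 1, 1, 1, 0, 1, 0, 0],     # 1 = event, 0 = censored
    "prior_online": [0, 1, 0, 1, 1, 0, 1, 0],     # hypothetical covariate
    "logins_wk1":   [1, 4, 2, 6, 5, 1, 7, 0],     # hypothetical covariate
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks", event_col="event")
cph.print_summary()   # exp(coef) > 1 raises the hazard; exp(coef) < 1 lowers it
```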
16. BASIC ELEMENTS NEEDED FOR A SIMPLE TIME-TO-
EVENT ANALYSIS
Required Define-able and Observable
Elements
A population and phenomenon to study
Defined units of time (aka “spell”)
An event (or censoring)
Additional Features
Access to the data, over time
Ability to consistently maintain the
particular unit of time observation
Ability to observe either achievement of
event or non-achievement of event
(censored data)
18. THREE REQUIRED TYPES OF DATA
A population and phenomenon to study (expressed as string data written in
camelCase)
Population may be animate or inanimate
Each member of the population is an “experimental unit” (represented in a data table as row data)
Defined units of time or continuous time (time aka “spell”) (expressed as integer data)
An event (or censoring) (expressed as a dummy variable with 1 = event, 0 =
censored)
19. ADDITIONAL INFORMATION THAT MAY BE
COLLECTED
Univariate data: For each row (or experimental unit), time-to-event (or no record of
achievement of event, in which case there is censored data)
Bivariate data: For each row, capture of both time-to-event and event or
censoring…but also one other qualitative (categorical) or quantitative feature of the
experimental unit
Multivariate data: For each row, capture of time-to-event, event/censoring, and
multiple other qualitative and / or quantitative features of the experimental unit (illustrated in the sketch below)
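A minimal sketch of these three levels in Python; the extra column names (priorOnlineCourses, ageGroup, programLevel) are hypothetical illustrations, not fields from any actual dataset, and the dummy coding follows the deck’s convention of 1 = event, 0 = censored.

# Univariate: the experimental unit plus time-to-event and event/censoring only
univariate_row = {"uniqueIdentifier": "studentA", "days": 4, "event": 1}

# Bivariate: the same, plus exactly one other qualitative or quantitative feature
bivariate_row = {"uniqueIdentifier": "studentB", "days": 9, "event": 0,
                 "priorOnlineCourses": 2}   # hypothetical covariate

# Multivariate: the same, plus multiple other features of the experimental unit
multivariate_row = {"uniqueIdentifier": "studentC", "days": 2, "event": 1,
                    "priorOnlineCourses": 0,         # hypothetical covariates
                    "ageGroup": "18-22",
                    "programLevel": "undergraduate"}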
21. SO…
A time-to-event analysis is a time-series representation of a phenomenon that also
includes the relative frequency of occurrences in time
22. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA
How much time passes before…
An online instructor uses a particular feature or tool in an LMS (learning management system)?
An online instructor reaches out to his / her students?
An online instructor uses the LMS for a non-course application?
An online instructor starts (or stops) usage of a particular digital learning object?
An online instructor uses the mobile app to access the LMS?
An online instructor finalizes and submits grades for the particular term?
An online instructor commits to the online teaching modality?
23. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.)
How much time passes before…
An online learner submits a first assignment?
An online learner makes a first friend online?
An online learner commits to completing an online course or online learning sequence?
An online learner communicates with his or her instructor?
An online learner contests a grade with the instructor?
An online learner uploads an image?
24. SOME ASKABLE QUESTIONS USING TIME-TO-EVENT
ANALYSIS APPLIED TO ONLINE LEARNING DATA (CONT.)
How much time passes before…
A university adds a new feature to an LMS at the instance-level?
A university is able to attract a sufficient number of learners to a program to ensure that it is self-
sustaining?
A university considers moving away from a particular LMS (measured from time-of-adoption)?
25. ONLINE LEARNING DATA
Online learning data comes from a number of sources:
an LMS data portal
scraped discussion board data from an online course
a third-party app used in online learning
admin or instructor access to a course
grades in a student information system
demographic data in a student information system
Some of the data will require more effort to access and collect than others
Some of the data is not collected anywhere and may have to be inferred (from
multiple data streams) or imputed (substituting values for missing data based on a
reasonable method)
26. ONLINE LEARNING DATA (CONT.)
The ability to use data for research depends on a number of policies and laws, so
any research should go through the IRB (institutional review board) process, and
private information cannot generally be used.
There are rules for the safe handling of information as well. These should also be followed to the
letter.
28. THE FAUX DATA
What is the amount of time (in days) for a group of 26 online students to make a
friend in an online learning context?
Three columns: UniqueIdentifier (letters), Days (time unit), Censored (1 or 0); a sketch of preparing such a file appears below
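A hedged sketch of how such a three-column file could be prepared in Python for SPSS to open; the rows and filename below are invented for illustration and are not the presenter’s 26-row faux dataset.

import csv

# Invented rows in the deck's three-column layout:
# UniqueIdentifier (string), Days (integer time units), Censored dummy (1 = event, 0 = censored)
rows = [
    ("a", 2, 1),
    ("b", 4, 1),
    ("c", 5, 0),   # censored: no friend-making observed for this learner
    ("d", 7, 1),
]

with open("faux_friend_making.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["UniqueIdentifier", "Days", "Censored"])
    writer.writerows(rows)

# The resulting file can then be opened in SPSS via File -> Open -> Data
# (with "All Files" enabled), as described on the following slides.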
29. SETUP IN IBM’S SPSS
Open SPSS.
File -> Open -> Data (Enable “All Files” if there is a variety of files…)
Once data are loaded, go to Analyze -> Survival -> Kaplan-Meier
31. SETUP IN IBM’S SPSS (CONT.)
Place the column data into the correct areas: Time, Status, and Label Cases by…
32. SETUP IN IBM’S SPSS (CONT.)
Click the Status section, and then click the activated “Define Event” button below it.
Clarify that 1 is used to indicate the occurrence of an “event,” and 0 means
“censored.”
Click Continue.
33. SETUP IN IBM’S SPSS (CONT.)
Click Save.
Check which features you want: Survival, Standard Error of Survival, Hazard, and
Cumulative events.
Click Continue.
34. SETUP IN IBM’S SPSS (CONT.)
Click Options. Indicate whether you want Quartiles. Also, select the Plots you want:
Survival, One Minus Survival, Hazard, and Log Survival.
Click Continue.
Click Save.
35. SETUP IN IBM’S SPSS(CONT.)
When this is set up properly, the “OK” button at the bottom of the “Kaplan-Meier”
window will be activated.
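For readers without SPSS, a minimal sketch of the same Kaplan-Meier product-limit calculation in plain Python follows; the sample numbers are invented for illustration (they are not the deck's faux dataset), and a dedicated library such as lifelines could produce the same estimate with less code.

# Kaplan-Meier product-limit estimate from (days, event) pairs.
# event = 1 means the event was observed; event = 0 means the record is censored.

def kaplan_meier(durations, events):
    """Return (time, survival probability) steps of the product-limit estimate."""
    records = sorted(zip(durations, events))
    n_at_risk = len(records)
    survival = 1.0
    steps = [(0, 1.0)]                 # at Time Zero, 100% survival (no events yet)
    i = 0
    while i < len(records):
        t = records[i][0]
        events_at_t = 0
        leaving_at_t = 0
        # gather everyone whose recorded time equals t (events and censorings)
        while i < len(records) and records[i][0] == t:
            if records[i][1] == 1:
                events_at_t += 1
            leaving_at_t += 1
            i += 1
        if events_at_t > 0:
            survival *= 1.0 - events_at_t / n_at_risk
            steps.append((t, survival))
        n_at_risk -= leaving_at_t      # censored cases leave the risk set without counting as events
    return steps

# Invented illustration data (not the deck's 26-student faux dataset):
days   = [1, 2, 2, 3, 4, 5, 7, 9]
events = [1, 1, 0, 1, 1, 1, 0, 1]
for t, s in kaplan_meier(days, events):
    print(f"Day {t}: estimated S(t) = {s:.3f}")

The resulting steps form the non-increasing line graph described earlier: the estimate drops only at observed event times and plateaus elsewhere.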
44. A FEW QUESTIONS
How long was the study period (observation period)?
How many students took part?
What was the general time pattern in terms of friend-making?
How many learners had not made online friends by the end of the study period?
What might happen if this faux study went longer? Why?
What might happen if more learners were included?
45. A FEW QUESTIONS (CONT.)
What are some “hazards” for learner friend-making in an online course? Why?
Are there possible “covariates” that might explain friend-making among learners in
an online course?
If this were real data, what might you actually see?
47. SOME TAKEAWAYS
A “survival analysis” or a “time-to-event analysis” shows how much time passes
before an event occurs for a particular population.
In a “survival analysis,” the event is non-survival (and is permanent).
In a “time-to-event analysis,” the event can be any objectively observable defined occurrence in time,
and this event may be positive or negative.
The “population” in a “survival analysis” consists of people (or other living things).
The “population” in a “time-to-event analysis” may be inanimate things,
like equipment (When does this equipment fail under these defined conditions?)
like socio-political phenomena (When does war occur between two non-democratic countries over borders or land in
the contemporary era?)
like technologies (When does a zero-day exploit age out from usefulness in a particular software suite?)
like plants (When does a particular seed germinate in a particular greenhouse environment?), and so on
48. SOME TAKEAWAYS (CONT.)
These analyses include censored data, in order to capture a more real-world sense
of the information and in order to avoid “survivorship bias” of salient information
(which may skew the perception of the data).
“Survivorship bias” refers to the mistaken impression of a phenomenon because the available data is
captured and noticed (is salient) whereas less available data remains invisible and potentially not
noticed.
Just paying attention to “surviving” data will skew impressions and lead to incorrect analysis.
A simple example is that only students who “survive” to the end of an online class will evaluate the
instructor and the online course. Those who are not heard are those who failed to survive to the end,
but they may have helpful insights that would improve the teaching and the online course’s design.
49. SOME TAKEAWAYS (CONT.)
The hazard function and the survival function have a negative correlation. More of
one means less of the other.
The higher the hazard, the lower the survival rate (at a particular time or time period).
The higher the survival rate, the lower the hazard rate (at a particular time or time period).
A one-minus-survival table shows the cumulative proportion experiencing the event over time and
gives a sense of the probability of event at each time unit or juncture (expressed after this list).
At Time Zero, the entire population is alive with 100% survival.
Over time, the population experiences attrition, so the survival rate falls.
Cumulative risk accumulates over time (even when the instantaneous hazard does not rise).
There may be time periods of particular risk, whether early or mid-point or later in a process,
depending on the phenomenon being studied. (A common example is the bathtub curve for the human
life span. Once babies survive early threats to their mortality, they grow into adulthood and tend to
have lower risk through adulthood, but that risk of non-survival rises again as they attain old age. In
other words, hazard functions change over time and vary.)
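The “one-minus-survival” quantity above is simply the cumulative probability of having experienced the event by time t:
F(t) = 1 - S(t)
It starts at 0 at Time Zero, can only stay flat or rise, and mirrors the survival curve: whenever S(t) steps down, F(t) steps up by the same amount.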
50. SOME TAKEAWAYS (CONT.)
The three types of data required for a simple survival analysis include the following:
A population and phenomenon to study (as string data written in camelCase)
Defined units of time (aka “spell”) (as integer data)
An event (or censoring) (as a dummy variable with 1 = event, 0 = censored)
Time may be measured continuously or in discrete units. If discrete, the units have to
be consistent (and the visual display should be accurate to that).
52. STATISTICAL CENTRAL TENDENCIES OF THE DATA
The 95% confidence interval for the mean time-to-event (making a friend in an online
course) runs from 3.4 to 5.8 days; individual learners may well fall outside that
interval.
The median time-to-event for this population (with half of the observed times falling
below and half falling above) has a confidence interval from 1.9 days to 6 days, so there
is a fair amount of uncertainty in the estimate.
53. SOME PERCENTILE-BASED OBSERVATIONS
The vast majority of online learners who ultimately make friends tend to do so
fairly quickly, within about two days spent online.
Half of the online learners in this class who actually make friends do so within four
days online.
A fourth of the population who ultimately make friends do so within
7 days online.
Of the 26 students, three of the learners have “censored” data. What does this
mean? What does it mean that their data is “lost to follow-up”?
54. DATA CENSORING
Left-censoring, if it existed, would be learners who were already friends prior to the
research observation period.
Certainly, this is not an uncommon possibility, with friends taking classes together, so they can support
each other’s learning.
In this faux data example, this was not depicted.
Of course, there are other potential pasts possible with the population. Censoring refers to a lack of
information, and it does not necessarily suggest event occurrence.
Right-censoring, if it existed, would be learners who become friends (achieve event)
or not (do not achieve event) after the end of the research observation period.
In this faux data case, there are some instances of censoring that occur during the
observation period. These may be conceptualized as learners who left the observation
window early (for example, by dropping the course) without any friend-making being
observed; the censoring itself does not say whether they would ever have made a friend.
55. CONTACT AND CONCLUSION
Dr. Shalin Hai-Jew
Kansas State University
212 Hale Library
785-532-5262
shalin@k-state.edu