This document provides an overview of data analytics processes for learning and academic analytics projects. It discusses key data dimensions including computing, location, time, activity, physical conditions, resources, user attributes, and relations. It then covers applications of analytics for learners and teachers to monitor learning and improve performance. The document outlines the stages of an extract-transform-load data processing workflow. Finally, it discusses different methods for knowledge discovery including prediction, structure discovery through clustering and factor analysis, and relationship mining through association rules, correlations, sequential patterns and causal analysis.
1. Data Analytics process in Learning and Academic Analytics projects
Day 3: Data processing
Alex Rayón Jerez
alex.rayon@deusto.es
DeustoTech Learning – Deusto Institute of Technology – University of Deusto
Avda. Universidades 24, 48007 Bilbao, Spain
www.deusto.es
2. Table of contents
● Data dimensions
● Applications
● Data processing in an ETL (extract-transform-load) workflow: refined data
● Knowledge discovery
5. Data dimensions
1) Computing
● Software
○ Example
■ Q1. Among the tools, which is most representative of the final grade?
■ Q5. What is the impact of social networks on group composition?
■ Q6. Which tools are most likely to foster collaboration?
■ Q7. Does the use of certain collaboration tools have an effect on the final grade?
● Hardware
● Network
8. Data dimensions
4) Activity
● Events
● Tasks
● Goals
● Subject
○ Example
■ Q2. What are the differences in grades between this subject and other subjects whose final grades are already known?
11. Data dimensions
7) User
● Basic info
○ Example
■ Q3. Is there any gender difference in the use of the tools?
● Knowledge
● Interest
● Goals
○ Short-term
○ Long-term
● Learning styles
● Affects
● Background
12. Data dimensions
8) Relations
● Social relations
○ Example
■ Q4. Are there groups of people that repeatedly collaborate across different tools?
■ Q4. Do these groups repeat over time?
● Functional relations
● Compositional relations
● Proximity
● Orientation
● Communication
14. Applications
Why do learners use analytics?
[Ferguson2014]
● Monitor their own activities and interactions
● Monitor the learning process
● Compare their activity with that of others
● Increase awareness, reflect and self-reflect
● Improve discussion participation
● Improve learning behaviour
● Improve performance
● Become better learners
● Learn!
15. Applications
Why do teachers use analytics?
[Ferguson2014]
● Monitor the learning process
● Explore student data
● Identify problems
● Discover patterns
● Find early indicators of success
● Find early indicators of poor marks or dropout
● Assess usefulness of learning materials
16. Applications
Why do teachers use analytics? (II)
● Increase awareness, reflect and self-reflect
● Increase understanding of learning environments
● Intervene, advise and assist
● Improve teaching, resources and the environment
31. Knowledge discovery
1) Prediction methods
● The goal is to develop a model which can infer a single aspect of the data
○ The predicted variable
○ Similar to dependent variables in traditional statistical analysis
● … from some combination of other aspects of the data
○ Predictor variables
○ Similar to independent variables in traditional statistical analysis
32. Knowledge discovery
1) Prediction methods (II)
● Prediction models are commonly used to:
○ Predict future events (Dekker2009; Feng2009; MingMing2012)
○ Predict variables that are not feasible to collect directly in real time
■ Example: collecting data on affect or engagement in real time often requires expensive observations or disruptive self-report measures
■ By contrast, a prediction model based on student log data can be completely non-intrusive (Sabourin2011)
34. Knowledge discovery
1) Prediction methods (IV)
● Three types of prediction models are common in EDM/LA:
○ Classifiers
○ Regressors
○ Latent knowledge estimation
35. Knowledge discovery
1) Prediction methods (V)
Source: Data Mining with WEKA MOOC (http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/)
36. Knowledge discovery
1) Prediction methods (VI)
● Classifiers
○ The predicted variable can be either a binary (e.g. 0 or 1) or a categorical variable
○ Some popular classification methods in educational domains include:
■ Decision trees
■ Random forest
■ Decision rules
■ Step regression
■ Logistic regression
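Logistic regression, one of the classifiers listed above, can be illustrated in a few lines. The following is a minimal pure-Python sketch, not the method used in the original course: it assumes a hypothetical binary predicted variable (1 = passed, 0 = failed) and two hypothetical predictor variables (weekly forum posts and quizzes completed), and it fits the model by plain batch gradient descent on the log-loss. The data, learning rate and number of epochs are illustrative assumptions.

```python
import math

# Hypothetical log-derived predictor variables per student: (forum_posts, quizzes_completed)
X_train = [(0, 1), (1, 0), (1, 1), (2, 1), (4, 5), (5, 4), (5, 5), (6, 5)]
# Hypothetical binary predicted variable: 1 = passed, 0 = failed
y_train = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Error for this example: predicted P(pass) minus the true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (pass) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


w, b = train_logistic(X_train, y_train)
print([predict(w, b, x) for x in X_train])
```

In practice a library implementation (for example WEKA's logistic regression classifier) would be used; the loop is spelled out only to show what fitting a classifier to predictor variables involves.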
37. Knowledge discovery
1) Prediction methods (VII)
Source: Data Mining with WEKA MOOC (http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/)
38. Knowledge discovery
1) Prediction methods (VIII)
● Regressors
○ The predicted variable is a continuous variable
■ For example: whether the grade can be explained by the number of pending subjects and the exam call number
○ The most popular regressor in EDM is linear regression
■ Note that linear regression is not used the same way in EDM/LA as in traditional statistics, despite the identical name
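The regression example above (grade explained by the number of pending subjects and the exam call number) can be sketched as an ordinary least-squares fit. This is a minimal sketch under stated assumptions: the data are synthetic, generated exactly from the hypothetical relation grade = 8 - 0.5 * pending - 1.0 * call, so the fit should recover those coefficients; the normal equations are solved with a small Gaussian elimination routine to keep the example dependency-free.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [val] for row, val in zip(M, v)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_ols(X, y):
    """Return [intercept, coef_1, ..., coef_k] solving the normal equations (A^T A) beta = A^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


# Hypothetical predictors per student: (pending_subjects, call_number)
X_grades = [(0, 1), (1, 1), (2, 1), (0, 2), (3, 2), (1, 3), (4, 3), (2, 4)]
# Synthetic grades from an assumed linear relation (no noise, for illustration only)
y_grades = [8.0 - 0.5 * p - 1.0 * c for p, c in X_grades]

beta = fit_ols(X_grades, y_grades)
print(beta)
```

For real data, a library least-squares solver such as `numpy.linalg.lstsq` is numerically safer than forming the normal equations explicitly.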
39. Knowledge discovery
1) Prediction methods (IX)
Source: Data Mining with WEKA MOOC (http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/)
40. Knowledge discovery
1) Prediction methods (X)
Source: Data Mining with WEKA MOOC (http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/)
41. Knowledge discovery
1) Prediction methods (XI)
● Latent Knowledge Estimation
○ Actually is a special type of classifier
○ A student’s knowledge of specific skills and concepts is
assessed by their patterns of correctness on those
skills
○ A wide range of algorithms exists for latent knowledge estimation, the two most popular being:
■ Bayesian Knowledge Tracing (Corbett & Anderson,
1995)
■ Performance Factors Analysis (Pavlik2009)
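Bayesian Knowledge Tracing is commonly described with four per-skill parameters: initial knowledge, learning (transit), slip and guess. The Python sketch below follows that common formulation. Each response updates P(skill known) by Bayes' rule, and a learning step is then applied. The parameter values are hypothetical; in practice they are fitted per skill from log data.

```python
def bkt_update(p_known, correct, p_transit, p_slip, p_guess):
    """One Bayesian Knowledge Tracing step for a single skill.

    1) Bayes' rule: condition P(skill known) on the observed response.
    2) Learning: an unknown skill becomes known with probability p_transit.
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit

def p_correct(p_known, p_slip, p_guess):
    """Predicted probability that the next response is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess

def trace(responses, p_init, p_transit, p_slip, p_guess):
    """Return P(known) after each response (1 = correct, 0 = incorrect)."""
    p, history = p_init, []
    for r in responses:
        p = bkt_update(p, bool(r), p_transit, p_slip, p_guess)
        history.append(p)
    return history

# Hypothetical parameter values (normally fitted per skill)
params = dict(p_init=0.3, p_transit=0.1, p_slip=0.1, p_guess=0.2)
history = trace([1, 1, 0, 1, 1], **params)
print([round(p, 3) for p in history])
```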
42. Knowledge discovery
1) Prediction methods (XII)
● Classifiers in WEKA are models for predicting
nominal or numeric quantities
● Implemented learning schemes include:
○ Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, etc.
● “Meta”-classifiers include:
○ Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, etc.
44. Knowledge discovery
2) Structure discovery
● Attempts to find structure in the data without an a priori idea of what should be found
● This is a very different goal from that of prediction
○ In prediction, there is a specific variable that the
EDM/LA researcher attempts to model;
○ By contrast, there is not a specific variable of
interest in structure discovery
○ Instead, the researcher attempts to determine what
structure emerges naturally from the data
46. Knowledge discovery
2) Structure discovery (III)
● Clustering
○ The goal is to find data points that naturally group
together, splitting the full data set into a set of clusters
○ Clustering is particularly useful in cases where the
most common categories within the data set are not
known in advance
○ If a set of clusters is well selected, each data point in a cluster will generally be more similar to the other data points in that cluster than to data points in other clusters
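The clustering idea above can be illustrated with a minimal Python sketch of k-means (Lloyd's algorithm), one of the schemes WEKA lists later in this section. The sketch alternates two steps: it assigns each point to its nearest centroid and then moves each centroid to the mean of its points. The two-feature student data is hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # New centroid = mean of assigned points (keep the old one if a cluster is empty)
        new = [tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
               for i, members in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, [nearest(p) for p in points]

# Hypothetical students: (hours online per week, forum posts per week)
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

k-means needs k in advance and depends on the initial centroids. Running it with several seeds, or with a scheme such as X-means that searches over k, is a common safeguard.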
47. Knowledge discovery
2) Structure discovery (IV)
● Clustering
○ Clusters have been used to group students (Beal2006)
and student actions (Amershi2009)
■ Amershi & Conati (2009) found characteristic
patterns in how students use exploratory learning
environments, and used this information to
identify more and less effective student strategies
48. Knowledge discovery
2) Structure discovery (IV)
● Factor analysis
○ A closely related method
○ Here, the goal is to find variables that naturally group
together, splitting the set of variables (as opposed to
the data points) into a set of latent (not directly
observable) factors
○ Factor analysis is frequently used in psychometrics for
validating or determining scales
49. Knowledge discovery
2) Structure discovery (V)
● Factor analysis
○ In EDM/LA, factor analysis is used for dimensionality reduction (i.e., reducing the number of variables) in a wide variety of applications
○ For instance, [Baker2009] used factor analysis to determine which design choices are made in common by the designers of intelligent tutoring systems
■ E.g., tutor designers tend to use principle-based hints rather than concrete hints in tutor problems that have brief problem scenarios
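Fitting a full factor-analysis model (which also estimates a unique variance per variable) is more involved than a short sketch allows. The Python code below therefore swaps in a simpler, related dimensionality-reduction technique, principal component analysis. It builds the correlation matrix and extracts the first principal component by power iteration. This is PCA, not factor analysis. The three lesson-level variables are hypothetical, and the first two are constructed to move together.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    def standardize(col):
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        return [(v - mean) / sd for v in col]

    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_principal_component(matrix, iterations=200):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    v = [1.0] * len(matrix)  # must not be orthogonal to the dominant eigenvector
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the associated eigenvalue (variance captured)
    eigenvalue = sum(vi * sum(m * x for m, x in zip(row, v)) for vi, row in zip(v, matrix))
    return eigenvalue, v

# Hypothetical lesson-level design variables
columns = [
    [1, 2, 3, 4],    # e.g. hints per problem
    [2, 4, 6, 8],    # e.g. worked examples per problem
    [1, -1, -1, 1],  # e.g. scenario length (coded and centred)
]
eigenvalue, component = first_principal_component(correlation_matrix(columns))
print(round(eigenvalue, 3), [round(x, 3) for x in component])
```

The two correlated variables load equally on the first component, and the unrelated third variable does not. This is the kind of grouping of variables that the factor-analysis slides describe.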
50. Knowledge discovery
2) Structure discovery (VI)
● Social Network Analysis
○ Models are built of the relationships and interactions between individual actors, and of the patterns that emerge from those relationships and interactions
○ Examples
■ Understanding the differences between effective
and ineffective project groups [Kay2006]
■ How students’ communication behaviors change
over time [Haythornthwaite2001]
■ How students’ positions in a social network relate to their perception of being part of a learning community [Dawson2008]
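Normalized degree centrality is one of the simplest measures of an actor's position in a network. The Python sketch below computes it from pairwise interactions. The forum replies are hypothetical. As a simplifying assumption, the sketch ignores the direction and repetition of replies.

```python
from collections import defaultdict

def degree_centrality(interactions):
    """Normalized degree centrality of an undirected interaction network.

    Each (a, b) pair links two actors (e.g. a forum reply from a to b); direction
    and repeated interactions are ignored, and self-replies are skipped.
    """
    neighbours = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {actor: 0.0 for actor in neighbours}
    # Share of the other n - 1 actors each actor is directly connected to
    return {actor: len(adj) / (n - 1) for actor, adj in neighbours.items()}

# Hypothetical forum replies: (author of reply, author replied to)
replies = [("ana", "ben"), ("ana", "carla"), ("ana", "dan"),
           ("ben", "carla"), ("ben", "ana")]
for actor, score in sorted(degree_centrality(replies).items(), key=lambda kv: -kv[1]):
    print(f"{actor}: {score:.2f}")
```

Studies of learning communities often go further and use directed, weighted or betweenness-based measures. Degree is only a starting point.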
51. Knowledge discovery
2) Structure discovery (VII)
● Domain structure discovery
○ Consists of finding the structure of knowledge in an
educational domain (e.g., how specific content maps
to specific knowledge components or skills, across
students)
○ This could consist of mapping problems in educational
software to specific knowledge components, in order
to group the problems effectively for latent knowledge
estimation and problem selection [Koedinger2006], or
could consist of mapping test items to skills
[Tatsuoka1995]
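A mapping from items to skills is often written as a Q-matrix. The Python sketch below does not discover such a mapping. It only shows how a given mapping can be represented and used. It assumes a conjunctive rule (an item is solved only if every required skill is mastered) and lists the mastery profiles that are consistent with a student's responses. The items, skills and responses are hypothetical. Real cognitive-diagnosis approaches, such as Tatsuoka's rule-space method, also account for noisy responses, which this toy version does not.

```python
from itertools import combinations

# Hypothetical Q-matrix: which skills (knowledge components) each item requires
Q = {
    "item1": {"common_denominator"},
    "item2": {"common_denominator", "add_numerators"},
    "item3": {"simplify"},
}
SKILLS = sorted(set().union(*Q.values()))

def ideal_responses(q_matrix, mastered):
    """Conjunctive rule: an item is answered correctly only if every required skill is mastered."""
    return {item: skills <= mastered for item, skills in q_matrix.items()}

def consistent_profiles(q_matrix, skills, observed):
    """All mastery profiles (skill subsets) whose ideal responses match the observed ones."""
    matches = []
    for r in range(len(skills) + 1):
        for profile in combinations(skills, r):
            if ideal_responses(q_matrix, set(profile)) == observed:
                matches.append(frozenset(profile))
    return matches

observed = {"item1": True, "item2": True, "item3": False}
print(consistent_profiles(Q, SKILLS, observed))
```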
52. Knowledge discovery
2) Structure discovery (VIII)
● WEKA contains “clusterers” for finding groups
of similar instances in a dataset
● Implemented schemes are:
○ k-Means, EM, Cobweb, X-means, FarthestFirst
● Clusters can be visualized and compared to
“true” clusters (if given)
● Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
53. Knowledge discovery
3) Relationship mining
● Aims to discover relationships between variables in a data set with a large number of variables
● It has historically been the most common category of EDM research [Baker2009]
● It may take the form of finding out which variables are most strongly associated with a single variable of particular interest
● Or it may take the form of discovering which relationships between any two variables are strongest
54. Knowledge discovery
3) Relationship mining (II)
● There are four types of relationship mining
○ Association rule mining
○ Correlation mining
○ Sequential pattern mining
○ Causal data mining
55. Knowledge discovery
3) Relationship mining (III)
● Association rule mining
○ The goal is to find if-then rules stating that, if some set of variable values is found, another variable will generally have a specific value
○ For instance, [BenNaim2009] used association rule mining to find patterns of successful student performance in an engineering simulation, in order to give students who were having difficulty better suggestions on how to improve their performance
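An if-then rule is usually judged by its support (the share of records that contain both sides) and its confidence (the share of records matching the IF part that also match the THEN part). The Python sketch below computes both for a single rule. The student records are hypothetical and are encoded as sets of attribute=value items.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule: IF antecedent THEN consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0
    return support, confidence

# Hypothetical student records encoded as sets of attribute=value items
students = [
    {"video=watched", "hints=used", "result=pass"},
    {"video=watched", "result=pass"},
    {"video=watched", "result=fail"},
    {"hints=used", "result=fail"},
    {"video=watched", "result=pass"},
]
print(rule_stats(students, {"video=watched"}, {"result=pass"}))
```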
56. Knowledge discovery
3) Relationship mining (IV)
● Correlation mining
○ The goal is to find positive or negative linear
correlations between variables (using post-hoc
corrections or dimensionality reduction methods
when appropriate to avoid finding spurious
relationships)
○ An example can be found in [Baker2009], where
correlations were computed between a range of
features of the design of intelligent tutoring system
lessons and students’ prevalence of gaming the system
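Correlation mining with a post-hoc correction might look like the following minimal Python sketch. It computes Pearson's r for every pair of features and derives an approximate two-sided p-value via the Fisher z-transform (a normal approximation, which is rough for very small samples). A pair is flagged only if it stays significant after a Bonferroni correction. The lesson-level data is hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform (needs n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def correlation_mining(features, alpha=0.05):
    """Correlate every pair of features; flag pairs significant after Bonferroni correction."""
    pairs = list(combinations(sorted(features), 2))
    threshold = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests
    results = []
    for a, b in pairs:
        r = pearson(features[a], features[b])
        p = approx_p_value(r, len(features[a]))
        results.append((a, b, r, p, p < threshold))
    return results

# Hypothetical lesson-level data (one value per tutor lesson)
features = {
    "hints": [1, 2, 3, 4, 5, 6, 7, 8],
    "gaming": [3, 5, 7, 9, 11, 13, 15, 17],
    "scenario": [1, -1, -1, 1, 1, -1, -1, 1],
}
for a, b, r, p, significant in correlation_mining(features):
    print(f"{a} ~ {b}: r = {r:+.2f}, p = {p:.3g}, significant = {significant}")
```

Bonferroni is the simplest and most conservative correction. With many features, less conservative procedures, such as controlling the false discovery rate, are often preferred.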
57. Knowledge discovery
3) Relationship mining (V)
● Sequential pattern mining
○ The goal is to find temporal associations between
events
○ One successful use of this approach was the work by [Perera2009], which determined which paths of student collaboration behaviors lead to a more successful eventual group project
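The core operation in sequential pattern mining is counting how many sequences contain a pattern in order, with gaps allowed. The Python sketch below does this for ordered pairs of events only. Full algorithms such as GSP or PrefixSpan extend frequent patterns to longer ones. The group activity logs and event names are hypothetical.

```python
from itertools import product

def is_subsequence(pattern, sequence):
    """True if the events of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)

def frequent_event_pairs(sequences, min_support):
    """Support of every ordered pair (a then b), kept if it reaches min_support."""
    events = sorted({e for s in sequences for e in s})
    frequent = {}
    for a, b in product(events, repeat=2):
        support = sum(is_subsequence((a, b), s) for s in sequences) / len(sequences)
        if support >= min_support:
            frequent[(a, b)] = support
    return frequent

# Hypothetical activity logs, one ordered sequence per project group
groups = [
    ["ticket", "commit", "wiki", "commit"],
    ["ticket", "wiki", "commit"],
    ["wiki", "ticket"],
]
print(frequent_event_pairs(groups, min_support=0.6))
```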
58. Knowledge discovery
3) Relationship mining (VI)
● Causal data mining
○ The goal is to find whether one event (or observed
construct) was the cause of another event (or
observed construct)
○ For example, to predict which factors will lead a student to do poorly in a class [Fancsali2012]
59. Knowledge discovery
3) Relationship mining (VII)
● WEKA contains an implementation of the
Apriori algorithm for learning association
rules
○ Works only with discrete data
● Can identify statistical dependencies between
groups of attributes:
○ milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
● Apriori can compute all rules that have a given
minimum support and exceed a given
confidence
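The Apriori idea can be sketched in Python from scratch. This is not WEKA's implementation. The algorithm grows frequent itemsets level by level, keeps a candidate only if all of its smaller subsets are frequent, and then emits the rules that reach a minimum confidence. Unlike the slide, which gives support as an absolute count (2000), this sketch expresses support as a fraction of transactions. The baskets are hypothetical data that reuse the slide's items.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update((s, support(s)) for s in level)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Rules X => Y with confidence support(X and Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]  # subsets of frequent itemsets are frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules

baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_support=0.4)
for ante, cons, sup, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(ante), "=>", sorted(cons), f"support={sup:.1f} confidence={conf:.2f}")
```

The prune step is what makes Apriori efficient. Because any superset of an infrequent itemset is also infrequent, such candidates are discarded without being counted.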
61. Knowledge discovery
4) Attribute selection
● A WEKA panel that can be used to investigate which (subsets of) attributes are the most predictive ones
● Attribute selection methods contain two parts:
○ A search method: best-first, forward selection,
random, exhaustive, genetic algorithm, ranking
○ An evaluation method: correlation-based, wrapper,
information gain, chi-squared, etc.
● Very flexible: WEKA allows (almost) arbitrary
combinations of these two
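The Python sketch below pairs one evaluation method from the list above (information gain) with one search method (ranking). Each nominal attribute is scored by how much it reduces the entropy of the class, and the attributes are then sorted. The idea is analogous to combining an information-gain evaluator with a ranker in WEKA, but the code is written from scratch and is not WEKA's. The student attributes and outcomes are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def rank_attributes(data, labels):
    """Ranking search: score every attribute individually, best first."""
    scores = {name: information_gain(column, labels) for name, column in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical nominal attributes for four students and their final outcome
outcome = ["pass", "pass", "fail", "fail"]
attributes = {
    "morning_group": ["yes", "no", "yes", "no"],
    "submitted_all_tasks": ["yes", "yes", "no", "no"],
}
print(rank_attributes(attributes, outcome))
```

A ranking search scores attributes one at a time, so it cannot detect redundancy between attributes. Subset-based evaluators, such as correlation-based or wrapper methods, combined with a search like best-first, address that at a higher computational cost.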
64. References
[Amershi2009] Amershi, S., Conati, C. (2009). Combining Unsupervised and Supervised Machine Learning to Build User Models for Exploratory Learning Environments. Journal of Educational Data Mining, 1(1), 71-81.
[BakerSiemens2014] Baker, R., Siemens, G. (2014). Educational data mining and learning analytics. Cambridge Handbook of the Learning Sciences.
[BakerYacef2009] Baker, R.S.J.d., Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1), 3-17.
[Beal2006] Beal, C.R., Qu, L., Lee, H. (2006). Classifying learner engagement through integration of multiple data sources. Paper presented at the 21st National Conference on Artificial Intelligence (AAAI-2006), Boston, MA.
[CorbettAnderson1995] Corbett, A.T., Anderson, J.R. (1995). Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
[Dawson2008] Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224-238.
[Dekker2009] Dekker, G., Pechenizkiy, M., Vleeshouwers, J. (2009). Predicting students drop out: A case study. Proceedings of the 2nd International Conference on Educational Data Mining, EDM'09, 41-50.
[Fancsali2012] Fancsali, S. (2012). Variable Construction and Causal Discovery for Cognitive Tutor Log Data: Initial Results. Proceedings of the 5th International Conference on Educational Data Mining, 238-239.
[Feng2009] Feng, M., Heffernan, N., Koedinger, K. (2009). Addressing the Assessment Challenge in an Intelligent Tutoring System that Tutors as it Assesses. User Modeling and User-Adapted Interaction, 19, 243-266.
[Ferguson2012] Ferguson, R. (2012). The State Of Learning Analytics in 2012: A Review and Future Challenges. Technical Report KMI-12-01, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/publications/techreport/kmi-12-01
[Ferguson2014] Learning analytics FAQs [Online]. URL: http://www.slideshare.net/R3beccaF/learning-analytics-fa-qs
[Haythornthwaite2001] Haythornthwaite, C. (2001). Exploring Multiplexity: Social Network Structures in a Computer-Supported Distance Learning Class. The Information Society: An International Journal, 17(3), 211-226.
[Kay2006] Kay, J., Maisonneuve, N., Yacef, K., Reimann, P. (2006). The Big Five and Visualisations of Team Work Activity. Proceedings of the International Conference on Intelligent Tutoring Systems, 197-206.
65. References (II)
[Koedinger2006] Koedinger, K.R., Corbett, A.T. (2006). Cognitive Tutors: Technology bringing learning science to the classroom. In K. Sawyer (Ed.), The Cambridge Handbook of the Learning Sciences (pp. 61-78). New York: Cambridge University Press.
[MingMing2012] Ming, N.C., Ming, V.L. (2012). Predicting Student Outcomes from Unstructured Data. Proceedings of the 2nd International Workshop on Personalization Approaches in Learning Environments, 11-16.
[Pavlik2009] Pavlik, P.I., Cen, H., Koedinger, K.R. (2009). Performance Factors Analysis - A New Alternative to Knowledge Tracing. Proceedings of AIED 2009.
[Perera2009] Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaiane, O.R. (2009). Clustering and Sequential Pattern Mining of Online Collaborative Learning Data. IEEE Transactions on Knowledge and Data Engineering, 21(6), 759-772.
[RomeroVentura2010] Romero, C., Ventura, S. (2010). Educational data mining: A review of the state-of-the-art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6), 610-618.
[Sabourin2011] Sabourin, J., Rowe, J., Mott, B., Lester, J. (2011). When Off-Task is On-Task: The Affective Role of Off-Task Behavior in Narrative-Centered Learning Environments. Proceedings of the 15th International Conference on Artificial Intelligence in Education, 534-536.
[SiemensBaker2012] Siemens, G., Baker, R.S.J.d. (2012). Learning Analytics and Educational Data Mining: Towards Communication and Collaboration. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge.
[Tatsuoka1995] Tatsuoka, K.K. (1995). Architecture of knowledge structures and cognitive diagnosis: A statistical pattern recognition and classification approach. In P.D. Nichols, S.F. Chipman, R.L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 327-359). Hillsdale, NJ: Erlbaum.
[Verbert2011] Dataset-driven research to improve TEL recommender systems [Online]. URL: http://www.slideshare.net/kverbert/datasetdriven-research-to-improve-tel-recommender-systems
66. Data Analytics process in Learning and Academic Analytics projects
Day 3: Data processing
Alex Rayón Jerez
alex.rayon@deusto.es
DeustoTech Learning – Deusto Institute of Technology – University of Deusto
Avda. Universidades 24, 48007 Bilbao, Spain
www.deusto.es