Challenges in Data-Driven
Education
Beverly Park Woolf, Ivon Arroyo,
Neil Heffernan, Ryan Baker
University of Massachusetts - Amherst
Worcester Polytechnic Institute
University of Pennsylvania - Philadelphia
Supported by
National Science Foundation #1636847
The School of Athens, fresco by Raphael (1509 -1511) in the
Vatican. Plato (left) and Aristotle (right) hold bound copies of
their books.
One Goal: Provide millions of schoolchildren with access
to the personal services of a tutor as well informed as
Plato or Aristotle.
Model the Student
Model the Domain
Personalize Tutoring
Assess Learning
Online tutors can:
• Change curricula in real-time, based on student needs;
• Provide added material for low achieving students.
Results: students learn better,
learn more and learn faster with these systems.
Challenges in Data-Driven Education
• Predict future student events from existing large-scale
longitudinal educational data sets involving
thousands of students;
• Help teachers make sense of dense online data to
influence their teaching;
• Provide personalized instruction based on using big
data that represents student skills and behavior;
• Infer students’ cognitive, motivational, and
metacognitive features in learning.
Example Big Data From PSLC
The percentage of errors made by students on their first attempt. Learning curves
for individual topics. Knowledge components indicate student learning, categorized
by little learning (e.g., square area; rectangle area); no learning (e.g. triangle area);
and still too many errors (e.g., circle-circumference).
Large Data Sets
EventLog Table of a Math Tutoring System. 571,776 rows, just in a year’s time.
Repository for Educational Big Data
Ken Koedinger, CMU, PI.
• The NSF-funded DataShop and LearnSphere host
tens of millions of data points from hundreds
of thousands of students using a variety of
online learning systems.
• Includes log data of student interactions, test
data, and field observation data, stored in fully
de-identified form with all identifiers secured.
NSF-Funded DataShop/LearnSphere
• Central Repository
– Secure place to store & access research data
– Supports various kinds of research
• Analysis & Reporting Tools
– Focus on student-tutor interaction data
– Learning curves & error reports provide summary and
low-level views of student performance
– Performance Profiler aggregates across various levels of
granularity (problem, dataset levels, knowledge
components, etc.)
– Data Export
– New tools created to meet highest demands
Repository
• Supports full data management;
• Controlled access for collaboration;
• File attachments;
• Paper attachments;
• Great for secondary analyses.
DataShop Tools
• Learning Curve
• Error Report
• Performance Profiler
• Export
• Import
LEARNING CURVE
The Tools
How can I visualize student performance over time?
pslcdatashop.org
LearnLab DataShop datashop-help@lists.andrew.cmu.edu
Web application
• Learning curve point
decomposition
• Knowledge component model
analysis with learning curves
Learning Curves
Visualizes changes in
student performance over
time
Time is represented on the x-axis
as ‘opportunity’, or the # of
times a student (or students)
had an opportunity to
demonstrate a KC
Hover over the y-axis to change the
type of Learning Curve.
Types include:
• Error Rate
• Assistance Score
• Number of Incorrects
• Number of Hints
• Step Duration
• Correct Step Duration
• Error Step Duration
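A minimal sketch of how an Error Rate learning curve could be computed from raw first attempts, assuming each record is a (student, KC, correct-on-first-attempt) tuple in time order. The function name and record layout are illustrative assumptions, not DataShop's export format or API.

```python
from collections import defaultdict

def learning_curve(attempts):
    """Return {kc: [error rate at opportunity 1, 2, ...]}.

    attempts: iterable of (student, kc, correct_first_attempt) tuples,
    assumed to be in chronological order (hypothetical record layout).
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    # kc -> opportunity number -> [errors, observations]
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for student, kc, correct in attempts:
        seen[(student, kc)] += 1
        cell = totals[kc][seen[(student, kc)]]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    return {
        kc: [opps[o][0] / opps[o][1] for o in sorted(opps)]
        for kc, opps in totals.items()
    }

attempts = [
    ("s1", "circle-area", False),
    ("s2", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", False),
    ("s1", "circle-area", True),
]
curve = learning_curve(attempts)
print(curve)  # {'circle-area': [1.0, 0.5, 0.0]}
```

Each point on the curve averages over the students who reached that opportunity, so later points usually rest on fewer observations.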
Learning curves: Drill down
Click on a data point to
view point information
Click on the number link to
view details for a particular
drill-down item.
Details include:
• Name
• Value
• Number of Observations
Four types of information
for a data point:
• KCs
• Problems
• Steps
• Students
Students likely received too much
practice for these KCs. Consider
reducing the required number of tasks.
No apparent learning for these KCs.
Consider splitting KC.
Students continued to have difficulty
with these KCs. Consider increasing
opportunities for practice
Students didn't practice these KCs
enough for the data to be interpretable.
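The four categories above can be sketched as a simple rule over a KC's error-rate curve. The thresholds below (20% low, 40% high, at least 3 points, a 5-point drop) are illustrative assumptions chosen for this example, not the values DataShop uses, and the labels are shorthand for the four messages.

```python
def categorize_curve(error_rates, low=0.20, high=0.40,
                     min_points=3, min_drop=0.05):
    """Label one KC's learning curve (list of error rates by opportunity).

    All thresholds are hypothetical defaults for illustration only.
    """
    if len(error_rates) < min_points:
        return "too little data"   # not enough practice to interpret
    if max(error_rates) < low:
        return "low and flat"      # likely too much practice
    if error_rates[0] - error_rates[-1] < min_drop:
        return "no learning"       # consider splitting the KC
    if error_rates[-1] > high:
        return "still high"        # consider more practice
    return "good"

print(categorize_curve([0.5]))             # too little data
print(categorize_curve([0.10, 0.08, 0.05]))  # low and flat
print(categorize_curve([0.50, 0.52, 0.50]))  # no learning
print(categorize_curve([0.90, 0.70, 0.60]))  # still high
```

A fitted model of the curve would be more robust than comparing the first and last points; the endpoint comparison is used here only to keep the sketch short.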
PERFORMANCE PROFILER
The Tools
What was the hardest problem for students? How many
students worked in a particular unit?
Performance Profiler
Aggregate by
• Step
• Problem
• Student
• KC
• Dataset Level
View measures of
• Error Rate
• Assistance Score
• Avg # Hints
• Avg # Incorrect
• Residual Error Rate
Multipurpose tool to
help identify areas that
are too hard or easy
View multiple
samples side by
side
Mouse over a row
to reveal
uniqueness
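A minimal sketch of the kind of aggregation the Performance Profiler performs, assuming each row is one student-step record with hypothetical field names (`problem`, `kc`, `first_attempt`, `hints`, `incorrects`). Here the assistance score is taken to be hints plus incorrect attempts, and a step counts as an error when the first attempt was not correct; both are stated as assumptions of the example.

```python
from collections import defaultdict

def profile(rows, by):
    """Aggregate step records by the given field and report average measures."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    report = {}
    for key, items in groups.items():
        n = len(items)
        report[key] = {
            # first attempt that is a hint or an incorrect counts as an error
            "error_rate": sum(r["first_attempt"] != "correct" for r in items) / n,
            "avg_hints": sum(r["hints"] for r in items) / n,
            "avg_incorrect": sum(r["incorrects"] for r in items) / n,
            "assistance_score": sum(r["hints"] + r["incorrects"] for r in items) / n,
        }
    return report

rows = [
    {"problem": "P1", "kc": "circle-area", "first_attempt": "correct",
     "hints": 0, "incorrects": 0},
    {"problem": "P1", "kc": "circle-area", "first_attempt": "hint",
     "hints": 2, "incorrects": 1},
    {"problem": "P2", "kc": "triangle-area", "first_attempt": "incorrect",
     "hints": 0, "incorrects": 2},
]
report = profile(rows, by="problem")
hardest = max(report, key=lambda k: report[k]["error_rate"])
print(hardest, report[hardest]["error_rate"])  # P2 1.0
```

Passing `by="kc"` or another field name gives the same measures at a different level of granularity.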
ERROR REPORT
The Tools
How can I explore the errors students made and drill down
to see actual responses and feedback?
Web application
◄ Performance Profiler tool for
exploring the data
Change how the
selected measure
is aggregated by
hovering over the title
for the x-axis.
See more details
by hovering over a bar
in the graph.
Change the selected measure by
hovering over the title for the y-axis.
The number of observations by type
(correct, hint, or incorrect)
Details what the
student actually
typed into the
tutor
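A minimal sketch of an error report, assuming each attempt is a (step, outcome, typed answer) tuple where outcome is one of "correct", "hint", or "incorrect". The tuple layout and function name are assumptions for illustration, not DataShop's data model.

```python
from collections import Counter, defaultdict

def error_report(attempts):
    """Count observations by type per step and tally the incorrect answers typed."""
    counts = defaultdict(Counter)    # step -> outcome -> number of observations
    answers = defaultdict(Counter)   # step -> typed incorrect answer -> count
    for step, outcome, typed in attempts:
        counts[step][outcome] += 1
        if outcome == "incorrect":
            answers[step][typed] += 1
    return counts, answers

attempts = [
    ("area", "correct", "12"),
    ("area", "incorrect", "14"),
    ("area", "incorrect", "14"),
    ("area", "hint", ""),
]
counts, answers = error_report(attempts)
print(dict(counts["area"]))            # {'correct': 1, 'incorrect': 2, 'hint': 1}
print(answers["area"].most_common(1))  # [('14', 2)]
```

Grouping identical wrong answers makes common misconceptions stand out, since the same typed value recurs across students.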
NSF Big Data Spoke Award
Train researchers and educators in techniques
and tools that personalize education and make
predictions over large data sets.
Use competitions, hackathons, and workshops as
part of this process.
Topics to be taught include:
Data Mining
Artificial Intelligence
Machine Learning
Learning Sciences
Research Questions
• What kinds of questions are worth
asking/answering?
– What do teachers and students want to know?
– What do researchers in Learning Sciences want to know?
• What are techniques to answer big questions for big
data in education?
Workshops: Topics to Teach
• How and when to use key methods.
• Strengths and weaknesses of standard data mining
methods, and of methods under development, for
different applications.
• How to answer education research questions
and drive intervention and improvement in
education.
• Validity and generalizability; how trustworthy
and applicable are the results.
Workshops
• Philadelphia, PA: Computer Supported Collaborative
Learning; full day, June 18-19, 2017
• Wuhan, China: Artificial Intelligence in Education;
half day, June 28-29, 2017
• Wuhan, China: Educational Data Mining;
June 25-28, 2017
• Worcester, MA: Fall 2017
• New York City: Fall 2017
• Boston, MA: Spring 2017
Topics to Teach
• Google Refine http://code.google.com/p/google-refine/
• Fathom (http://www.keycurriculum.com/products/fathom)
• Rapid Miner (rapidminer.com)
• IBM SPSS Statistics, Version 20
• Tinker Plots
(http://www.keycurriculum.com/products/tinkerplots)
• Many Eyes and IBM Visualization Tools (www-958.ibm.com/)
• TETRAD Causal Modeling Software
Pre-processing techniques, Visualization of single
variables, Decision trees, Bayesian networks,
Regression
Competitions
Use an existing big database to
predict who goes to college and what students
will study.
Longitudinal Data
NSF-supported longitudinal research at WPI
Middle-school students have been tracked for
10 years. The data include their mathematics actions and
college attendance.
Invite people to a Kaggle competition to predict
student progress.
Datathons
• Weekend hackathons in which participants are
encouraged to enhance existing educational
software, including MathSpring and ASSISTments.
• Participants will
– Design improved animated learning companions;
– Develop visualizations of hints and messages;
– Develop adaptively sequenced problems adjusted to
students’ recent levels of ability and effort exerted.
Big Data National PI Meeting 2017

