How do we learn and how can we learn better? Educational technology is undergoing a revolution fueled by learning science and data science. The promise is to make a high-quality personalized education accessible and affordable by all. In this presentation Alfred will describe how Apache Spark and Databricks are at the center of the innovation pipeline at McGraw Hill for developing next-generation learner models and algorithms in support of millions of learners and instructors worldwide.
8. From Print to Digital: 128-year Journey
K-12, Higher Ed &
Professional
businesses
~4,800
employees
9. Parking Lot
Adaptive Platform Leverages MHE Reach and Scale
How do we learn and how can we learn better?.
Introduction of
SmartBook
May 2013 1,500+ adaptive
products available
Now
Learners who have used MHE
Adaptive
~5,500,000 ~10,000,000,000
Student interactions
Authors trained to use MHE Adaptive
~4,000
10. Parking Lot
Research Phase
Research Step: Build Models and Algorithms
Model Iteration: Ship “Data Instrumented” product
to improve and tune models
Product Focus: Iterate prototypes with business
and customers
.
Research Question: How do we learn and how
can we learn better?
11. Stacked
Algorithm
Learning Tool for Optimizing Acquisition and Recall
2 31
Learning
Science
Principles
Effortful Recall
Spaced
Practice
Interleaving
Cognitive
Science
Model
Mobile App
16. The Problem
Students drop out or fail
their course
1
2 At-risk students can be difficult
to identify by instructors
Identify at-risk students
pre-emptively
17. The Solution
A classifier to
predict
abandonment
Jacqueline Feild
Data Scientist
Nicholas Lewkow
Data Scientist
18. Solution: A Classifier to Predict Abandonment
F1 F2 F3 F4 F5 F61
F1 F2 F3 F4 F5 F61
F1 F2 F3 F4 F5 F60
F1 F2 F3 F4 F5 F61
F1 F2 F3 F4 F5 F6
Classification
algorithm
0
• Logistic Regression used for initial
classification algorithm
• Simple algorithm to interpret
• Provides probability estimates
instead of hard classification label
• Allows for simple interpretation of
feature importance
• One classifier works for all disciplines
19. Parking Lot
Companies that we havemet with
and completed Stage One, but there is
no immediate partnership opportunity
Parallel Pipeline for Creating Classifier
How do we learn and how can we learn better?.
Models and Algorithms
Ship “Data Instrumented” Product
Iterate Prototypes with Customers
20. Parking Lot
Companies that we havemet with
and completed Stage One, but there is
no immediate partnership opportunity
Spark Transformation
How do we learn and how can we learn better?.
Models and Algorithms
Ship “Data Instrumented” Product
Iterate Prototypes with Customers
21. Parking Lot
Companies that we havemet with
and completed Stage One, but there is
no immediate partnership opportunity
Speedup with Spark
How do we learn and how can we learn better?.
Models and Algorithms
Ship “Data Instrumented” Product
Iterate Prototypes with Customers
Sn: Speedup from n cores
t1: Time to run on 1 core
tn: Time to run on n cores
22. Parking Lot
Companies that we havemet with
and completed Stage One, but there is
no immediate partnership opportunity
Evaluate Model Accuracy
How do we learn and how can we learn better?.
Models and Algorithms
Iterate Prototypes with Customers
• Use area under the receiver operating characteristic
curve (AUC-ROC) as another measure of model
accuracy
• 0.9 - 1.0 = excellent
• 0.8 - 0.9 = good
• 0.7 - 0.8 = fair
• 0.6 - 0.7 = poor
• 0.5 - 0.6 = fail
• Look at how the AUC-ROC for a model changes
throughout the semester
22