
Big Data Meets Learning Science: Keynote by Al Essa

How do we learn and how can we learn better? Educational technology is undergoing a revolution fueled by learning science and data science. The promise is to make a high-quality personalized education accessible and affordable for all. In this presentation, Alfred will describe how Apache Spark and Databricks are at the center of the innovation pipeline at McGraw Hill for developing next-generation learner models and algorithms in support of millions of learners and instructors worldwide.

Published in: Data & Analytics


  1. Big Data Meets Learning Science. Apache Spark Summit East 2017. Alfred Essa, VP, Research and Data Science, McGraw-Hill Education. @malpaso
  2. Our Journey from Print to Digital: (1) Innovation Pipeline; (2) McGraw-Hill Learning Science; (3) Spark, Databricks
  3. Speed of innovation, not data, is the differentiator.
  4. [Diagram labels: Technology, Time to Market, Spark Factor, People, Process, Apache Spark, Databricks]
  5. Innovation Pipeline: Research, Product Validation, Product Development
  6. Databricks underpins our innovation pipeline and workflow.
  7. Section 2: McGraw-Hill Learning Science
  8. From Print to Digital: 128-year Journey. K-12, Higher Ed & Professional businesses; ~4,800 employees.
  9. Adaptive Platform Leverages MHE Reach and Scale. Introduction of SmartBook: May 2013. Now: 1,500+ adaptive products available. Learners who have used MHE Adaptive: ~5,500,000. Student interactions: ~10,000,000,000. Authors trained to use MHE Adaptive: ~4,000.
  10. Research Phase. Research Question: How do we learn and how can we learn better? Research Step: Build models and algorithms. Model Iteration: Ship “Data Instrumented” product to improve and tune models. Product Focus: Iterate prototypes with business and customers.
  11. Stacked Algorithm: Learning Tool for Optimizing Acquisition and Recall. Components: Learning Science Principles (Effortful Recall, Spaced Practice, Interleaving); Cognitive Science Model; Mobile App.
  12. Section 3: Spark, Databricks
  13. The Problem: (1) Students drop out or fail their course. (2) At-risk students can be difficult for instructors to identify. Identify at-risk students pre-emptively.
  14. The Solution: A classifier to predict abandonment. Jacqueline Feild, Data Scientist; Nicholas Lewkow, Data Scientist.
  15. Solution: A Classifier to Predict Abandonment. [Diagram: rows of features F1 to F6, each labeled 0 or 1, feed a classification algorithm that outputs a label for a new, unlabeled row.]
      • Logistic Regression used for initial classification algorithm
      • Simple algorithm to interpret
      • Provides probability estimates instead of hard classification label
      • Allows for simple interpretation of feature importance
      • One classifier works for all disciplines
  16. Parallel Pipeline for Creating Classifier
  17. Spark Transformation
  18. Speedup with Spark: Sn = t1 / tn, where Sn is the speedup from n cores, t1 is the time to run on 1 core, and tn is the time to run on n cores.
  19. Evaluate Model Accuracy
      • Use area under the receiver operating characteristic curve (AUC-ROC) as another measure of model accuracy
      • 0.9 - 1.0 = excellent
      • 0.8 - 0.9 = good
      • 0.7 - 0.8 = fair
      • 0.6 - 0.7 = poor
      • 0.5 - 0.6 = fail
      • Look at how the AUC-ROC for a model changes throughout the semester
  20. Evaluate Intervention Window: How much time in advance can we provide for an intervention to occur prior to abandonment?
  21. Conclusions: Technology is important, but build an agile innovation workflow with Databricks.
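
The two evaluation quantities named on slides 18 and 19 (speedup and AUC-ROC) can be illustrated with a minimal Python sketch. This is not the presenters' code. It assumes the standard speedup definition (time on 1 core divided by time on n cores), computes AUC-ROC with the pairwise-ranking formulation rather than any particular library, and resolves the overlapping band edges on slide 19 by assigning a boundary value to the higher band. The function names, the sample data, and the "below chance" label for scores under 0.5 are all hypothetical.

```python
from typing import Sequence


def speedup(t1: float, tn: float) -> float:
    """Speedup from n cores: time on 1 core divided by time on n cores."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rate_auc(auc: float) -> str:
    """Map an AUC-ROC value to the rating bands listed on slide 19.
    Assumption: a value on a band edge (e.g. 0.9) goes to the higher band."""
    bands = [(0.9, "excellent"), (0.8, "good"), (0.7, "fair"),
             (0.6, "poor"), (0.5, "fail")]
    for lower, name in bands:
        if auc >= lower:
            return name
    return "below chance"  # hypothetical label; the slide has no band under 0.5


if __name__ == "__main__":
    # Made-up timings: 120 s on 1 core, 20 s on n cores.
    print(speedup(120.0, 20.0))
    # Made-up labels (1 = abandoned) and predicted probabilities.
    labels = [1, 1, 0, 1, 0]
    scores = [0.9, 0.7, 0.6, 0.4, 0.2]
    auc = auc_roc(labels, scores)
    print(round(auc, 3), rate_auc(auc))
```

Tracking `auc_roc` on predictions made at successive points in the term is one way to follow how the score changes throughout the semester, as the last bullet on slide 19 suggests.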
