
Apereo Webinar: Learning What Works When Scaling Analytics Infrastructure (January 24, 2018)



With student success as a primary institutional goal, NC State’s DELTA organization has taken initial steps in building a scalable analytics infrastructure. This webinar provides insights into the open source frameworks, analytics technology, and strategy required to deploy analytics infrastructure with efficient IT delivery. We will discuss planning an architecture to accommodate large amounts of data, while still providing predictions in short order. We will also touch on some work we are doing building cohorts of sub-populations to improve scalability and accuracy. In addition, we will discuss ongoing and future work to improve the infrastructure even further.



  1. Apereo Webinar: Learning What Works When Scaling Analytics Infrastructure. LOU HARRISON, DIRECTOR OF EDUCATIONAL TECHNOLOGY SERVICES, DELTA, NORTH CAROLINA STATE UNIVERSITY, LOU@NCSU.EDU; GARY GILBERT, SOFTWARE ARCHITECT, UNICON, GGILBERT@UNICON.NET
  2. ● Brief History: Open Academic Analytics Initiative (OAAI) ● The research ● Flashback to last year ● From Pilot to Enterprise efforts ● Slice and dice, including examples of ways to segment the population ● Results ● Infrastructure overview ● Next steps / Q&A If you’d like to follow along: https://goo.gl/g2MTCa INTRODUCTION/OVERVIEW
  3. ● Open Academic Analytics Initiative (OAAI) ○ EDUCAUSE Next Gen Learning Challenge (NGLC) ○ Funded by Bill & Melinda Gates Foundation ● Leverage SIS and LMS data to create an open source academic early alert system (and interventions) ● OAAI led to the Learning Analytics Processor (LAP) project, which is part of the Apereo Learning Analytics Initiative ● Exciting results; however, all LMS data was based on Sakai models ● NC State partnered with Unicon and Marist College to bring LAP to NC State, applying it to their Moodle LMS BRIEF HISTORY
  4. Predictive Model worked well and was quite portable to other schools (with some tuning). For more info, see Jayaprakash, S. M., Moody, E. W., Lauría, E. J., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6-47. THE RESEARCH
  5. ● Our Phase 1 Proof of Concept showed a 75% accuracy in predicting at-risk students.* Recall rates were 88-90%, but with high false positives (25%) ● Phase 2 (FY 15-16) ○ Make the LAP more automated, bigger, and badder ○ More Enterprise, more nimble ○ Similar results with much larger datasets *in a small dataset, of incomplete historical data FLASHBACK TO FY 16-17
  6. Phase 3 work ● Cohorts (different models for different types of classes) ○ Maybe, if incremental improvement outweighs cost ○ Tested ways to slice & dice into smaller cohorts to improve accuracy ■ By LMS usage (no, light, med, heavy) ■ By enrollment size (small, med, large) ■ By student level (FR, SO, JR, SR, GR) ● We learned splitting by courses is better than by people ● Splitting by LMS usage shows real promise SLICE AND DICE, SEGMENT POPULATION
  7. SOME PRELIMINARY RESULTS
     Cohort             Precision   Recall*   Accuracy   Testing Error
     Single Model       18.1%       64.0%     80.8%      19.2%
     Low LMS Usage      16.9%       61.2%     81.0%      19.0%
     Medium LMS Usage   18.4%       67.5%     75.9%      24.1%
     High LMS Usage     20.4%       76.0%     77.2%      22.8%
     No LMS Usage       12.5%       40.5%     86.3%      13.7%
     *Recall = “accuracy for at-risk students”
     ● Numbers guy added to the team ● Learning how to set up cohorts and run the models ● There is a steep learning curve
  9. ● Phase 3 - Learning Record Warehouse (LRW) ○ Currently only using Moodle logs (+ demo data) ○ Plans to incorporate data from other tools ■ BB Collaborate, Mediasite, etc. ○ All data input streams feed into LRW ○ Pull from LRW into predictive modeler ■ Models need 3-5 years of historical data to train from, so any data we think we might use later should be saved in the LRW now ● Implement OpenDashboard ○ To expose activity heatmap and possibly predictions ENTERPRISE EFFORTS
  10. Infrastructure Overview
  11. Open Analytics Infrastructure An Open Analytics Infrastructure should support: ● Collection and storage of a variety of data ● Usage of data for analytics, reporting, and visualization ● Interoperability through open standards ● Use of open software, models, and processes where appropriate
  12. Open Analytics Infrastructure
  13. NCSU Realization
  14. OpenLRW ● Supports xAPI, IMS Caliper, and IMS OneRoster ● Java / Spring Boot ○ Heavy use of streams, MapReduce features of Java 8 ○ Follows Spring Boot conventions and best practices ○ LRW is packaged as an executable JAR file ■ Tomcat embedded ● MongoDB
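The slide calls out heavy use of streams and the map-reduce style features of Java 8. As a generic illustration only (not OpenLRW's actual code), a map-reduce style aggregation, grouping raw events by user and counting each group with the streams API, might look like this:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical event record for illustration; OpenLRW's real entity classes differ.
class Event {
    final String userId;
    final String action;
    Event(String userId, String action) { this.userId = userId; this.action = action; }
}

public class EventStats {
    // Map step: extract the user id from each event.
    // Reduce step: count the events in each per-user group.
    static Map<String, Long> countByUser(List<Event> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> e.userId, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("alice", "Viewed"),
                new Event("alice", "Submitted"),
                new Event("bob", "Viewed"));
        System.out.println(countByUser(events)); // per-user event counts
    }
}
```

Because the pipeline is stateless over its input, the same aggregation can be run in parallel (`events.parallelStream()`) without code changes, which is part of what makes the streams style attractive for an analytics workload.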
  15. OpenLRW: Performance & Scalability ● Stateless ● Horizontally Scalable
  16. OpenLRW: Storage ● MongoDB ○ Sharding ○ Replica Sets
  17. OpenLRW: Security ● API Security ○ JWT ● Authorization ○ Tenancy ○ Organization ● Data at Rest ○ Follow MongoDB best practices
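A production deployment would normally use a JWT library rather than hand-rolled code, but the HS256 signing scheme behind JWT API security can be sketched with the JDK alone. The claims and secret below are illustrative, not OpenLRW's actual configuration:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtSketch {
    private static byte[] hmac(String data, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Build a signed token: base64url(header).base64url(payload).base64url(hmac)
    static String sign(String payloadJson, String secret) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String sig = enc.encodeToString(hmac(header + "." + payload, secret));
        return header + "." + payload + "." + sig;
    }

    // Verify by recomputing the signature over the first two segments.
    static boolean verify(String token, String secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        String expected = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(hmac(parts[0] + "." + parts[1], secret));
        return expected.equals(parts[2]);
    }
}
```

A real verifier would also check registered claims such as expiry, and a constant-time comparison is preferable for the signature check; this sketch only shows the shape of the mechanism.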
  18. IMS Caliper / xAPI in OpenLRW ● Caliper Messages are stored ~ as is ● xAPI Messages are converted to Caliper prior to storage ○ Current transformation is based on work done by the Korean Ministry of Education ○ More transformation options coming ■ IMS / ADL (this will be the default when available)
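A drastically simplified sketch of what an xAPI-to-Caliper conversion involves. The field mapping below is an assumption for illustration only; the actual transformation (Korean Ministry of Education work, and the coming IMS / ADL mapping) handles verb vocabularies, object types, and context far more carefully:

```java
import java.util.*;

public class XapiToCaliper {
    // Illustrative mapping only: real xAPI statements and Caliper events are much
    // richer. All field choices here are assumptions for the sketch.
    static Map<String, Object> toCaliper(Map<String, Object> xapi) {
        Map<String, Object> caliper = new LinkedHashMap<>();
        caliper.put("@context", "http://purl.imsglobal.org/ctx/caliper/v1p1");
        caliper.put("type", "Event");
        // xAPI actor.mbox -> Caliper actor identifier
        caliper.put("actor", ((Map<?, ?>) xapi.get("actor")).get("mbox"));
        // xAPI verb.id -> Caliper action
        caliper.put("action", ((Map<?, ?>) xapi.get("verb")).get("id"));
        // xAPI object.id -> Caliper object
        caliper.put("object", ((Map<?, ?>) xapi.get("object")).get("id"));
        // xAPI timestamp -> Caliper eventTime
        caliper.put("eventTime", xapi.get("timestamp"));
        return caliper;
    }
}
```

Storing everything in one event model, as the slide describes, means downstream consumers (the predictive modeler, dashboards) only ever query one schema regardless of which standard the source tool emitted.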
  19. Other Entities in OpenLRW ● Tenants ● Organizations ● Events ○ xAPI & Caliper ● Supporting Data (OneRoster) ○ Users ○ Classes ○ Enrollments ○ Line Items ○ UserMapping & ClassMapping
  20. OpenDashboard ● Originally developed to provide a widget-based framework for visualizations ● Evolved into a faculty / staff facing tool for monitoring student activity ● Java 8 / Spring Boot ○ Heavy use of streams, MapReduce features of Java 8 ○ Follows Spring Boot conventions and best practices ○ Dashboard is packaged as an executable JAR file ■ Tomcat embedded
  21. OpenDashboard
  22. High Level View ● Ultimately the Dashboard may split into two separate deployable components: client and server
  23. OpenDashboard: Session Storage ● Sessions stored in MongoDB ● Allows for horizontal scalability ● Essentially stateless client side
  24. OpenDashboard For Students ● Dashboard is currently only intended for faculty/staff ● To allow student access: ○ APIs would need to apply finer-grained authorization controls ○ UI would need to be adapted for a single user view
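The finer-grained authorization mentioned above could take a shape like the following. The rule set is hypothetical, not OpenDashboard's actual model: staff may read any record within their tenant, while students may read only records about themselves:

```java
public class AccessControl {
    enum Role { STAFF, STUDENT }

    // Hypothetical authorization rule for illustration:
    //   1. Requests never cross tenant boundaries.
    //   2. Staff may read any record in their tenant.
    //   3. Students may read only their own records.
    static boolean canRead(Role role, String requesterId, String requesterTenant,
                           String recordOwnerId, String recordTenant) {
        if (!requesterTenant.equals(recordTenant)) return false; // rule 1
        if (role == Role.STAFF) return true;                     // rule 2
        return requesterId.equals(recordOwnerId);                // rule 3
    }
}
```

The point of the sketch is that the current faculty/staff APIs can assume rule 2; opening the Dashboard to students means every data-returning endpoint must also enforce rule 3 before the single-user UI view becomes safe to expose.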
  25. Data Loader ● How do we get supporting (and maybe event) data into the LRW? ● Java application ● Run as a cron job (or similar) daily, or even more often
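A minimal sketch of how such a loader could schedule itself from within the JVM, using the JDK's ScheduledExecutorService in place of cron. The load step here is a stub standing in for the real work of reading rosters and enrollments and posting them to the LRW:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DataLoader {
    final AtomicInteger runs = new AtomicInteger();

    // One load cycle. In a real loader this would extract supporting data
    // (users, classes, enrollments) and send it to the LRW; here it only
    // counts invocations so the scheduling is observable.
    void loadOnce() {
        runs.incrementAndGet();
    }

    // Schedule the load to run immediately and then once a day.
    ScheduledFuture<?> scheduleDaily(ScheduledExecutorService ses) {
        return ses.scheduleAtFixedRate(this::loadOnce, 0, 1, TimeUnit.DAYS);
    }
}
```

An OS-level cron entry invoking the executable JAR daily achieves the same result and is simpler operationally; the in-JVM scheduler mainly helps when the loader needs to run "even more often" than cron's granularity comfortably allows, or when run state must be shared with the rest of the application.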
  26. ● Phase 4 Needs: FY17-18 ● Plan for integrating dashboard ● Start incorporating data from other tools into LRW ● Possibly add other tool data to predictions ● Start running the modeler regularly (if we work out a way to share data) WHAT’S HAPPENING NOW
  27. ● Disillusioned by big, outrageously expensive, commercial black box analytics systems? ● Can’t afford big, outrageously expensive, commercial black box analytics systems? ● Overwhelmed by all this analytics talk and complicated math? ● Want to get your feet wet without betting the farm? ● Want to join a group of like-minded schools where every new development benefits us all? ● This is not free, but your $$$ goes farther, and you benefit from others’ work ● If you’re interested, contact us Lou Harrison Gary Gilbert lou@ncsu.edu ggilbert@unicon.net WHERE DO YOU FIT IN?
  28. About Unicon TECHNOLOGY CONSULTING, SERVICES, & SUPPORT FOR THE EDUCATION INDUSTRY ● Services, strategy, and support focused on the education industry ● Deep domain-specific expertise ● Open source software foundations ● Learn more at www.unicon.net UNICON CONTRIBUTES TO THE APEREO LEARNING ANALYTICS INITIATIVE ● Unicon has been involved since 2015 ● Developed standards-based integrations for open analytics technologies ● Provides services for open analytics technologies (OpenLRW, OpenDashboard, SSP) ● Learn more at www.apereo.org/communities/learning-analytics-initiative
