MongoDB and the McGraw-Hill Education Learning Analytics Platform: Towards an Open, Scalable, Streaming Solution for Education
MongoDB World 2015
Outline
• McGraw-Hill Education
• My background
• The MHE Learning Analytics Platform (LAP)
• Standardized educational input events
• MongoDB schema design
• Server infrastructure
• Performance
• Conclusions
McGraw-Hill Education
• Global company with over 5,000 employees
• Now a Learning Science Company
• All content available digitally by Fall 2015
• Higher Ed system is Connect
• K-12 LMS is Engrade
• Adaptive systems: LearnSmart, SmartBook, and ALEKS
My Background
• Global and Marine Seismologist
• Small College Physics Professor
• Oracle Database Administrator
• Head of IT Operations at MIT Sloan School of Management
• Head of the Data Science group in the MHE Digital Platform Group's Analytics team
• Systems Engineer on this project
Introduction to the LAP
Motivation
• MHE has several digital educational platforms, including Connect for Higher Ed and Engrade for K-12
• Instrument the platforms to send student/educator events in real time to a central system (LAP)
• Ingest and store education events in a data store (MongoDB)
• Analytics provide “insights” to students/educators
Introduction to the LAP
Demo: Connect Insight for Students (CIS)
IMS Caliper Format for Education
Standardized education events (Caliper)
• Uses the JSON-LD (linked data) format
• Caliper uses an Actor - Verb - Object tuple to form learning events (e.g., student – submit – test)
• Events are triggered by student/educator activity and sent to the LAP input API
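As a rough illustration of the Actor - Verb - Object tuple, the sketch below assembles a minimal event as a plain dictionary. The field names mirror the sample events shown in this deck; the helper `make_event` and the `urn:example:` identifiers are hypothetical and are not part of Caliper or the LAP.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(actor_id, actor_type, verb, object_id, object_type):
    """Build a minimal actor-verb-object learning event as a plain dict.

    make_event is a hypothetical helper for illustration only,
    not a call from any Caliper library.
    """
    return {
        "@type": "AssessmentEvent",
        "@id": "urn:example:eventId/" + str(uuid.uuid4()),
        "generatedAtTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
        "actor": {"@id": actor_id, "@type": actor_type},
        "verb": verb,
        "object": {"@id": object_id, "@type": object_type},
    }


# student - submitted - assessment
event = make_event("urn:example:actor/1", "student", "submitted",
                   "urn:example:assessment/42", "assessment")
print(json.dumps(event, indent=2))
```

Keeping the event a plain dictionary means it serializes directly to JSON for posting to an input API.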
LAP Architectural Design
[Diagram, built up over six slides. Components shown: Caliper event sources (Connect, Engrade, other); LAP Ingestion API; Collection Receiver (early slides) / Collection Worker (later slides); SQS; Long-term Storage; MongoDB Data Store; Results/Analysis; Insight Output API; insight consumers (Connect Insight for Students, Engrade Insight for Teachers, Future Insights).]
Why MongoDB?
• JSON-LD input suggested a document store
• MongoDB is accessible and well documented
• Provided the needed performance and capacity
• Support from MongoDB Inc. (formerly 10gen)
• Six-month development support contract
• Dedicated consultants
• Ongoing support contract
Data Flow Through the LAP
Standardized education events (Caliper)
• Caliper (JSON-LD) events are produced by triggers in the Connect Oracle database
• Events are triggered by student/educator activity and sent to the LAP input API
• The LAP then verifies the input, transforms it into the MongoDB schema, calculates aggregates, and sends the results to the data visualizations
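The verify, transform, and aggregate steps can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the list of required fields, the flattened record layout, and the choice of a mean outcome as the aggregate are not taken from the LAP itself.

```python
REQUIRED_FIELDS = ("@id", "@type", "verb", "actor", "object")  # assumed minimum


def verify(event):
    """Return True if the event carries the fields this sketch treats as required."""
    return all(field in event for field in REQUIRED_FIELDS)


def transform(event):
    """Flatten a Caliper-style event into a hypothetical storage record."""
    return {
        "event_id": event["@id"],
        "verb": event["verb"],
        "actor_id": event["actor"]["@id"],
        "object_id": event["object"]["@id"],
        "outcome": event.get("outcome"),
    }


def mean_outcome(records):
    """Aggregate: average outcome over graded records, or None if there are none."""
    scores = [r["outcome"] for r in records if r["outcome"] is not None]
    return sum(scores) / len(scores) if scores else None


incoming = [
    {"@id": "e1", "@type": "OutcomeEvent", "verb": "graded",
     "actor": {"@id": "instructor-1"}, "object": {"@id": "quiz-1"}, "outcome": 0.85},
    {"@id": "e2", "@type": "OutcomeEvent", "verb": "graded",
     "actor": {"@id": "instructor-1"}, "object": {"@id": "quiz-1"}, "outcome": 0.65},
    {"@type": "AssessmentEvent", "verb": "started"},  # malformed: rejected by verify
]

records = [transform(e) for e in incoming if verify(e)]
print(len(records), mean_outcome(records))
```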
Data Flow Through the LAP
Standardized education events (Caliper examples)
1. Assessment Created
2. Assessment Attempt Started
3. Assessment Attempt Submitted
4. Assessment Attempt Graded
An assessment is an online homework assignment, quiz, or test associated with a McGraw-Hill digital textbook.
Assessment Created Caliper Event
{
  "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type" : "AssessmentEvent",
  "@id" : "mhe-caliper:connect-000/eventId/ea7db7cf-2ed9-43a3-b9d4-1472265157c5",
  "generatedAtTime" : "2012-11-01T08:00:01",
  "verb" : "created",
  "actor" : {
    "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type" : "instructor"
  },
  "startedAtTime" : "2012-11-01T08:00:00",
  "object" : {
    "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type" : "assessment",
    "category" : "homework",
    "origin" : "ASSESSMENT",
    "topics" : ["addition", "subtraction"],
    "maxSubmissionsAllowed" : 3,
    "maxTimeAllowed" : 10800,
    "maxOutcomePossible" : 100.0,
    "startDate" : "2012-11-01T04:00:00",
    "dueDate" : "2012-11-05T08:00:00",
    "assessmentName" : "Sample Assignment Zero",
    "noFeedback" : false,
    "ALEDisplayName" : "Critical Missions",
    "attemptDeductions" : false,
    "lateSubmissionDeduction" : true,
    "studyAttempts" : true,
    "forceSubmission" : true
  },
  "learningContext" : {
    "enrolledIn" : "section",
    "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName" : "Intro to Algebra",
    "sectionName" : "MWF 10-11",
    "timeZone" : "America/New_York"
  }
}
Assessment Attempt Started Caliper Event
{
  "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type" : "AssessmentEvent",
  "@id" : "mhe-caliper:connect-000/eventId/87231361-6c9c-4ef6-8ea3-49e39e78eb4d",
  "generatedAtTime" : "2012-11-01T08:00:01",
  "verb" : "started",
  "actor" : {
    "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type" : "student"
  },
  "startedAtTime" : "2012-11-04T11:00:00",
  "object" : {
    "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type" : "assessment"
  },
  "learningContext" : {
    "enrolledIn" : "section",
    "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName" : "Intro to Algebra",
    "sectionName" : "MWF 10-11",
    "timeZone" : "America/New_York"
  },
  "attemptCount" : 1
}
Assessment Attempt Submitted Caliper Event
{
  "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type" : "AssessmentEvent",
  "@id" : "mhe-caliper:connect-000/eventId/b1db77ea-44a7-4f99-a819-e8b7e142f457",
  "generatedAtTime" : "2012-11-01T08:00:01",
  "verb" : "submitted",
  "actor" : {
    "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type" : "student"
  },
  "startedAtTime" : "2012-11-04T11:45:01",
  "object" : {
    "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type" : "assessment",
    "maxSubmissionsAllowed" : 3,
    "maxTimeAllowed" : 10800,
    "dueDate" : "2012-11-08T12:00:00"
  },
  "learningContext" : {
    "enrolledIn" : "section",
    "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName" : "Intro to Algebra",
    "sectionName" : "MWF 10-11",
    "timeZone" : "America/New_York"
  },
  "attemptCount" : 1,
  "timeTaken" : 1800
}
Assessment Attempt Graded Caliper Event
{
  "@context" : "http://mheducation.com/mhe-caliper/v1/OutcomeEvent",
  "@type" : "OutcomeEvent",
  "@id" : "mhe-caliper:connect-000/eventId/d7c28248-8e14-495e-9a5c-bb1cc1e0882d",
  "generatedAtTime" : "2012-11-01T08:00:01",
  "verb" : "graded",
  "actor" : {
    "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type" : "instructor"
  },
  "startedAtTime" : "2012-11-04T19:00:00",
  "object" : {
    "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type" : "assessment",
    "student" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ=="
  },
  "learningContext" : {
    "enrolledIn" : "section",
    "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName" : "Intro to Algebra",
    "sectionName" : "MWF 10-11",
    "timeZone" : "America/New_York"
  },
  "attemptCount" : 1,
  "outcome" : 0.85,
  "percentDeducted" : 5
}
Data Flow Through the LAP
Constraints on developing the schema
• Several learning activities require multiple Caliper events
  • Example: a student starts, submits, and is graded to complete a quiz
• No guarantee that external applications will send events in chronological order
• Duplicate events may be received
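One way to picture these constraints is an event store keyed by the event `@id`, so a repeated delivery is ignored, with ordering taken from the event's own timestamp rather than its arrival order. The `EventStore` class below is a hypothetical in-memory stand-in, not the LAP implementation, and the trimmed-down events are invented for the example.

```python
class EventStore:
    """In-memory stand-in for a raw-event collection keyed by event @id."""

    def __init__(self):
        self._events = {}

    def insert(self, event):
        """Store the event once; return False if this @id was already seen."""
        if event["@id"] in self._events:
            return False  # duplicate delivery: ignore
        self._events[event["@id"]] = event
        return True

    def timeline(self, object_id):
        """Events for one assessment, ordered by event time, not arrival order."""
        matching = [e for e in self._events.values()
                    if e["object"]["@id"] == object_id]
        # ISO 8601 timestamps in one format sort correctly as strings.
        return sorted(matching, key=lambda e: e["startedAtTime"])


store = EventStore()
arrivals = [  # graded arrives first, started arrives twice
    {"@id": "e3", "verb": "graded", "startedAtTime": "2012-11-04T19:00:00",
     "object": {"@id": "quiz-1"}},
    {"@id": "e1", "verb": "started", "startedAtTime": "2012-11-04T11:00:00",
     "object": {"@id": "quiz-1"}},
    {"@id": "e1", "verb": "started", "startedAtTime": "2012-11-04T11:00:00",
     "object": {"@id": "quiz-1"}},
    {"@id": "e2", "verb": "submitted", "startedAtTime": "2012-11-04T11:45:01",
     "object": {"@id": "quiz-1"}},
]
accepted = [store.insert(e) for e in arrivals]
verbs = [e["verb"] for e in store.timeline("quiz-1")]
print(accepted, verbs)
```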
MongoDB Schema: Version 0.1
• 2-schema model (student and class)
• Class collection describes the class, section, and assignments
• Student collection
  • Assessment array updated when an attempt is complete
  • All events for an activity
  • Attempts for each activity in a sub-array
MongoDB Schema
V0.1 problems
• Too deeply embedded
• Difficult to update a student doc
• Requires a query-logic-update cycle (read the doc, apply logic in the application, write it back)
MongoDB Schema: Version 0.2
• Remove nested arrays
• Move the attempts doc up to the top level
V0.2 problems
• Still requires the query-logic-update cycle
• Difficult to update atomically while maintaining deterministic state
MongoDB Schema: Version 0.3
• Remove arrays altogether
• Replace arrays with assessment and attempt docs, each of which contains several sub-docs
MongoDB Schema: Version 0.3
• Atomic updates are now much easier
• Save the raw Caliper event in an event collection
• Only update the student collection if all required events are in the event collection
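A minimal sketch of why sub-documents make this easier, under several assumptions: `apply_set` imitates the effect of a MongoDB `$set` on a dotted path (which creates intermediate sub-documents as needed), the set of required verbs is taken from the quiz example earlier in the deck, and the flattened event fields and the `assessments.<id>.attempts.<n>` layout are invented for illustration. Plain dictionaries stand in for the event and student collections.

```python
REQUIRED_VERBS = {"started", "submitted", "graded"}  # assumed from the quiz example


def apply_set(doc, path, value):
    """Mimic $set on a dotted path: create sub-documents as needed, then assign."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def ingest(event_coll, student_doc, event):
    """Save the raw event; update the student doc only when the attempt is complete."""
    event_coll[event["@id"]] = event  # keyed by @id, so a duplicate overwrites itself
    key = (event["assessment"], event["attempt"])
    seen = {e["verb"]: e for e in event_coll.values()
            if (e["assessment"], e["attempt"]) == key}
    if not REQUIRED_VERBS.issubset(seen):
        return False  # still waiting for the rest of the attempt
    path = "assessments.{}.attempts.{}".format(*key)
    # Setting the same paths to the same values twice leaves the doc unchanged.
    apply_set(student_doc, path + ".outcome", seen["graded"]["outcome"])
    apply_set(student_doc, path + ".timeTaken", seen["submitted"]["timeTaken"])
    return True


events = {}
student = {"_id": "class-1:student-1"}
arrivals = [  # out of order, with one duplicate
    {"@id": "e3", "verb": "graded", "assessment": "quiz1", "attempt": 1, "outcome": 0.85},
    {"@id": "e1", "verb": "started", "assessment": "quiz1", "attempt": 1},
    {"@id": "e2", "verb": "submitted", "assessment": "quiz1", "attempt": 1, "timeTaken": 1800},
    {"@id": "e2", "verb": "submitted", "assessment": "quiz1", "attempt": 1, "timeTaken": 1800},
]
results = [ingest(events, student, e) for e in arrivals]
print(results)
print(student)
```

Because each value lives at its own keyed path rather than at an array position, re-applying an update is harmless, which is what makes duplicates and arbitrary arrival order tolerable in this sketch.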
Query Utilization
• 3 basic queries build the visualization for CIS
  • All student docs for the current class
  • All student docs for the current student
  • Class doc for the current class
• All queries are on indexed parameters
  • Student doc _id = class_id:student_id
  • Class doc _id = class_id
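The compound `_id` can be sketched as follows. The deck does not say how the class-wide query was expressed, so the anchored prefix match below is an assumption: it is one form of filter that MongoDB can serve from the `_id` index. The helper names are hypothetical, and the filter is applied to an in-memory list rather than a live collection.

```python
import re


def student_doc_id(class_id, student_id):
    """Compose the student document key as class_id:student_id."""
    return "{}:{}".format(class_id, student_id)


def class_prefix_filter(class_id):
    """Filter document matching every student doc in one class.

    Assumption: an anchored, case-sensitive prefix regex on _id.
    """
    return {"_id": {"$regex": "^" + re.escape(class_id + ":")}}


docs = [
    {"_id": student_doc_id("c1", "s1")},
    {"_id": student_doc_id("c1", "s2")},
    {"_id": student_doc_id("c2", "s1")},
]

# Apply the filter locally to show which keys it would select.
flt = class_prefix_filter("c1")
pattern = re.compile(flt["_id"]["$regex"])
in_class = [d["_id"] for d in docs if pattern.search(d["_id"])]
print(in_class)
```

The per-student query is left out because the deck does not state which indexed field it uses.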
Infrastructure
• All servers and storage are in AWS
• Backups are done using EBS snapshots
• DB size is estimated to grow by about 500 GB/year
• The estimated data size is small enough for an unsharded cluster
• 3-member replica sets
• Write to the primary; read from the primary and secondaries
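For a three-member replica set, the client-side configuration might look like the connection string built below. The host names, database name, and replica set name are hypothetical. The `nearest` read preference is also an assumption, chosen because it allows reads from any member; the deck does not say which mode was used. Writes go to the primary regardless of read preference.

```python
from urllib.parse import urlencode

# Hypothetical host names for the three replica set members.
HOSTS = [
    "lap-db-1.example.com:27017",
    "lap-db-2.example.com:27017",
    "lap-db-3.example.com:27017",
]

options = {
    "replicaSet": "lapReplSet",   # hypothetical replica set name
    "readPreference": "nearest",  # assumption: read from primary or secondaries
}

# Standard MongoDB connection string: seed list, database, then options.
uri = "mongodb://{}/lap?{}".format(",".join(HOSTS), urlencode(options))
print(uri)
```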
Performance
• Estimated peak load: 100 events/sec = 100 kB/sec (about 1 kB per event)
• Average load of 1,500,000 events/day
• Maximum of 2,500,000 events/day
• Initially planned on a sharded, replicated cluster, but this is not needed for now
• Added an SQS queue to handle periods of very high load
• Upgraded from MongoDB 2.6 to 3.0 (roughly 10x faster)
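A quick back-of-the-envelope check of these figures, assuming the roughly 1 kB per event implied by the peak-load line. It counts raw event bytes only, which is not the same as the database size on disk.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_EVENT = 1000  # assumption implied by 100 events/sec = 100 kB/sec

avg_events_per_day = 1_500_000
max_events_per_day = 2_500_000
peak_events_per_sec = 100

avg_rate = avg_events_per_day / SECONDS_PER_DAY      # events/sec on an average day
max_day_rate = max_events_per_day / SECONDS_PER_DAY  # events/sec on the busiest day
gb_per_year = avg_events_per_day * BYTES_PER_EVENT * 365 / 1e9
peak_to_avg = peak_events_per_sec / avg_rate

print(round(avg_rate, 1), round(max_day_rate, 1), round(gb_per_year), round(peak_to_avg, 1))
```

The daily averages work out to well under the 100 events/sec peak estimate, and the yearly raw volume lands in the same range as the 500 GB/year growth estimate given on the infrastructure slide.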
Conclusions
• We have a learning analytics platform in production that uses a MongoDB data store
• After several iterations we developed a MongoDB schema which:
  • Handles data arriving in arbitrary order and with duplicates
  • Performs one-step, atomic inserts
  • Delivers high performance during peak loads
