MOOCdb: Developing Data Standards for MOOCs

The development of Massive Open Online Courses (MOOCs) has stimulated educational research and challenged the existing models of distance and online learning. New questions have emerged that could be answered and administered across thousands of students. Although data on the engagement of this large number of online learners are quite accessible, researchers are usually limited to a few sources. Very few researchers have data from different courses, institutions and/or platforms. Student privacy and data protection regulations tend to limit data sharing and suppress the collaborative research that is necessary for addressing some of the main challenges MOOC research is currently facing. The MOOC Data Science Commons is a collaborative platform that aims to enable the next generation of MOOC research by allowing for generalized MOOC data organization and shared analytics. The aim of the project is to bring together educational researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science.

  • MOOCdb is our solution for centralizing and generalizing MOOC data organization and for providing general-purpose analytics for MOOC education research.
  • Example research question: “How does the amount of time spent on videos during a given week correlate with performance on the homework?”
  • Can we have standardized data storage?

    Sharing and reproducing the results: When they publish research, analysts can share their scripts by depositing them in a public archive, where they are retrievable and cross-referenced to their donor and publication.
  • The MOOCdb project aims to bring together educational researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science. The project, founded at MIT, includes a platform-agnostic functional data model for data exhaust from MOOCs, a collaborative, open-source, open-access data visualization framework, a crowdsourced knowledge discovery framework, and a privacy-preserving software framework. The team is currently working to release a number of these tools and frameworks as open source.


    What does MOOCdb provide?

    Concise data storage: MOOCdb's proposed schema is "loss-less" with respect to research-relevant information, i.e. no information is lost in translating raw data to it.

    Access control levels for anonymized data: The data schema offers an organized means of structuring anonymized user identities to safeguard them further.
  • Sharing of data extraction scripts: Scripts for data extraction and descriptive statistics extraction can be open source and shared by everyone because they reference data organized according to the schema.

    Crowdsourcing potential: Machine learning frequently involves humans identifying explanatory variables that could drive a response. Enabling the crowd to help propose variables could greatly scale the community's progress in mining MOOC data.
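
As an illustration of the kind of shareable script the notes describe, the question about video time and homework performance could be sketched as follows. This is only a minimal sketch: the two tables are heavily simplified stand-ins for the observing and submitting modes, and every column name (`resource_type`, `week`, `duration_seconds`, `grade`) as well as the helper names are assumptions made for this example, not the actual MOOCdb schema.

```python
import sqlite3
from math import sqrt

# Hypothetical, simplified tables. The real MOOCdb schema has more tables
# and its own column names; these are assumptions for illustration only.
SCHEMA = """
CREATE TABLE observed_events (
    user_id INTEGER, resource_type TEXT, week INTEGER, duration_seconds REAL
);
CREATE TABLE submissions (
    user_id INTEGER, week INTEGER, grade REAL
);
"""

# Per learner: total video time and mean homework grade in the same week.
QUERY = """
SELECT v.video_seconds, s.mean_grade
FROM (SELECT user_id, SUM(duration_seconds) AS video_seconds
      FROM observed_events
      WHERE resource_type = 'video' AND week = :week
      GROUP BY user_id) AS v
JOIN (SELECT user_id, AVG(grade) AS mean_grade
      FROM submissions
      WHERE week = :week
      GROUP BY user_id) AS s
  ON s.user_id = v.user_id
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def video_time_vs_grade(conn, week):
    """Correlate weekly video time with mean homework grade per learner."""
    rows = conn.execute(QUERY, {"week": week}).fetchall()
    return pearson([r[0] for r in rows], [r[1] for r in rows])


def demo_connection():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO observed_events VALUES (?, ?, ?, ?)", [
        (1, "video", 3, 600.0),
        (1, "video", 3, 600.0),
        (2, "video", 3, 1800.0),
        (3, "video", 3, 2400.0),
        (3, "forum", 3, 999.0),   # ignored by the query: not a video
        (2, "video", 4, 5000.0),  # ignored by the query: different week
    ])
    conn.executemany("INSERT INTO submissions VALUES (?, ?, ?)", [
        (1, 3, 0.5), (2, 3, 0.75), (3, 3, 1.0),
    ])
    return conn


if __name__ == "__main__":
    print(video_time_vs_grade(demo_connection(), week=3))
```

Because the script refers only to table and column names, not to any platform-specific export format, the same file could in principle be run by anyone whose data has been translated into the shared schema.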
  • MOOCdb: Developing Data Standards for MOOCs

    1. MOOCdb: Developing Data Standards for MOOCs
       Srećko Joksimović, s.joksimovic@ed.ac.uk, @s_joksimovic
       Kalyan Veeramachaneni, kalyan@csail.mit.edu
       Dragan Gašević, dragan.gasevic@ed.ac.uk
       FutureLearn Academic Network Conference, 15 June 2015
    2. Data sources
       SQL_anonymized_forum.sql
       SQL_anonymized_general.sql
       SQL_hash_mapping.sql
       standard extract.csv
       SQL_unanonymizable.sql.gz
       clickstream_export.gz
       Personal data
       Demographic data
       …
    3. Data sources
       Weekly data packages
       auth_user-{site}-analytics.sql
       auth_userprofile-{site}-analytics.sql
       certificates_generatedcertificate-{site}-analytics.sql
       Daily data packages
       course_structure-{site}-analytics.json
       courseware_studentmodule-{site}-analytics.sql
       email_opt_in-{site}-analytics.csv
       student_courseenrollment-{site}-analytics.sql
       user_api_usercoursetag-{site}-analytics.sql
       user_id_map-{site}-analytics.sql
       {org}-{course}-{date}-{site}.mongo
       wiki_article-{site}-analytics.sql
       wiki_articlerevision-{site}-analytics.sql
       {org}-{site}-events-{date}.log.gz.gpg
    4. Challenges
       • Analytics across several courses
       • Analytics across different platforms
       • Analytics across different institutions
       • Sharing data
    5. Solution?
       • Collaborative data science platform
         – Standardize data storage
         – Generalizable across courses and data providers (currently OpenEdX, edX and Coursera)
         – “Data being shared without data being exchanged”
         – Sharing and reproducing the results
    6. MOOC data science commons
    7. MOOCdb
       Observing mode
       - Observed Events table
       - Resources table
       - Resources Types table
       - URLs table
       - Resource URLs table
       Submitting mode
       - Problem Types table
       - Problems table
       - Submissions table
       - Assessments table
       Collaborating mode
       - Collaborations table
       - Collaboration Types table
       Feedback mode
       - Feedbacks table
       - Questions table
       - Answers table
       - Surveys table
       User information
       - User PII table
       - Global User table
       - Course User table
       http://moocdb.csail.mit.edu/wiki/index.php?title=MOOCdb
    8. Collaborative platform and applications
       edX, Coursera, MOOCdb doc, GitHub repo, Feature factory, LabelMe, Digital learner quantified, Problem analytics, My MOOCViz, Social network analysis, Forum analysis, Dropout prediction
    9. Current state
       • Established network of institutions
         – MIT, Stanford, University of Michigan, University of Edinburgh, University of Queensland, University of Texas (Austin)
       • Release of open source software
       • Development and release of the first data analytics framework
    10. Next steps
        Digital Learner Quantified
        Discussion forum analysis
        LabelMe
        Problem analytics
        Dropout prediction
        Social network analysis
    11. Collaboration
        • If you are interested in…
          – Development
          – Feature modeling
          – Translating your data
          – Testing
        kalyan@csail.mit.edu
        s.Joksimovic@ed.ac.uk
    12. Q&A
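
For readers interested in the "Translating your data" and "Testing" items on slide 11, the table groups from slide 7 can be written down as a simple checklist. The sketch below is not part of the MOOCdb tooling: the snake_case table identifiers and the `missing_tables` helper are assumptions made for this example, and SQLite is used only because it needs no setup.

```python
import sqlite3

# Table groups as listed on slide 7. The snake_case identifiers are an
# assumption for this sketch, not names confirmed by the MOOCdb project.
MOOCDB_TABLES = {
    "observing": ["observed_events", "resources", "resource_types",
                  "urls", "resource_urls"],
    "submitting": ["problem_types", "problems", "submissions", "assessments"],
    "collaborating": ["collaborations", "collaboration_types"],
    "feedback": ["feedbacks", "questions", "answers", "surveys"],
    "user_information": ["user_pii", "global_user", "course_user"],
}


def missing_tables(conn):
    """Return {mode: [table, ...]} for expected tables absent from the database."""
    present = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    missing = {}
    for mode, tables in MOOCDB_TABLES.items():
        absent = [t for t in tables if t not in present]
        if absent:
            missing[mode] = absent
    return missing


if __name__ == "__main__":
    # A translated course that so far only has its observing-mode tables.
    conn = sqlite3.connect(":memory:")
    for table in MOOCDB_TABLES["observing"]:
        conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    for mode, tables in missing_tables(conn).items():
        print(mode, "is missing:", ", ".join(tables))
```

A check like this only confirms that the expected tables exist; it says nothing about whether the translation preserved the research-relevant information.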
