
Big Data in Learning Analytics - Analytics for Everyday Learning


Keynote at LearnTec 2017



  1. Big Data in Learning Analytics – Analytics for Everyday Learning
     Stefan Dietze, L3S Research Center, Hannover
     24.01.2017, LearnTec 2017, Karlsruhe
  2. L3S Research Center
     Research areas
     • Web science, Information Retrieval, Semantic Web, Social Web Analytics, Knowledge Discovery, Human Computation
     • Interdisciplinary application areas: digital humanities, TEL/education, Web archiving, mobility
     Some projects
     http://l3s.de/ http://stefandietze.net/
  3. Big Data in Learning Analytics? A simplistic perspective
     Technology-enhanced Learning / Web-based Learning
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Examples: JLA special issue on LA datasets, data ranging between a few MB and max. 15 GB
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
  4. Learning Analytics & Knowledge Dataset
     • Cooperation of
     • Near-complete Linked Data corpus of Learning Analytics research publications (~800, since 2009)
     Dietze, S., Taibi, D., D'Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining – the LAK Dataset, Semantic Web Journal, 2017.
     http://lak.linkededucation.org/
  5. Big Data in Learning Analytics? A simplistic perspective
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Examples: JLA special issue on LA datasets, data ranging between a few MB and max. 15 GB
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
     Technology-enhanced Learning / Web-based Learning
     • Broader understanding: informal learning, micro-learning
     • Research often focused on resources: sharing, reusing, recommendation
     • Data examples:
       o "LinkedUp Catalog": > 50 M resources, 300 M statements
       o "LRMI/schema.org": > 45 M quads (Common Crawl 2015)
     Big Data? – Depends, but mostly not! (Volume?)
  6. LinkedUp Catalog of learning resources
     Dataset Catalog/Registry: http://data.linkededucation.org/linkedup/catalog/
     • "LinkedUp" (FP7 project): L3S, OU, OKFN, Elsevier, Exact Learning Solutions
     • Publishing and curation of educational/learning resources according to Linked Data principles
     • Largest collection of Linked Data about learning resources (approx. 50 datasets, 50 M resources) – see the query sketch below
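A minimal sketch of how a Linked Data catalog like this can be queried programmatically, assuming the SPARQLWrapper Python library and a SPARQL endpoint for the catalog; the endpoint URL and the dcterms:title pattern below are illustrative assumptions, not the catalog's documented interface:

```python
# Query a Linked Data catalog of learning resources via SPARQL.
# ASSUMPTION: the endpoint URL below is illustrative; substitute the
# actual LinkedUp catalog endpoint if one is available.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://data.linkededucation.org/sparql"  # hypothetical endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?resource ?title WHERE {
        ?resource dcterms:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["resource"]["value"], "-", binding["title"]["value"])
```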
  7. Learning resource annotations on the Web?
     • "Learning Resource Metadata Initiative (LRMI)": schema.org vocabulary for the annotation of learning resources in Web documents (schema.org etc.)
     • Approx. 5,000 PLDs in the "Common Crawl" (2 bn Web documents)
     • LRMI adoption on the Web (WDC) [LILE16] (per-PLD counts; see the sketch below):
       o 2015: 44,108,511 quads, 6,243,721 resources
       o 2014: 30,599,024 quads, 4,182,541 resources
       o 2013: 10,636,873 quads, 1,461,093 resources
     [Figure: # entities and # statements per ranked PLD, log-scale counts – power-law distribution across providers, 4,805 providers/PLDs]
     Taibi, D., Dietze, S., Towards embedded markup of learning resources on the Web: a quantitative analysis of LRMI term usage, in Companion Publication of the WWW 2016 Conference, IW3C2, Montreal, Canada, April 11, 2016.
     http://lrmi.itd.cnr.it/
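Per-provider statistics like these can be derived by a streaming pass over the Web Data Commons N-Quads dumps. A minimal sketch under stated assumptions: the file name, the small set of LRMI/schema.org properties and the crude PLD heuristic below are illustrative, not the exact setup used in the cited study:

```python
# Count LRMI-related quads per pay-level domain (PLD) in a Web Data
# Commons N-Quads dump. File name, property list and PLD heuristic are
# illustrative simplifications.
import gzip
from collections import Counter
from urllib.parse import urlparse

LRMI_PROPERTIES = {
    "http://schema.org/learningResourceType",
    "http://schema.org/educationalAlignment",
    "http://schema.org/typicalAgeRange",
}

def pld(uri: str) -> str:
    """Crude PLD approximation: last two labels of the host name."""
    host = urlparse(uri).netloc
    return ".".join(host.split(".")[-2:]) if host else host

quads_per_pld = Counter()
with gzip.open("wdc-lrmi-sample.nq.gz", "rt", encoding="utf-8", errors="ignore") as f:
    for line in f:
        # N-Quads line: <subject> <predicate> <object-or-literal> <graph> .
        tokens = line.rstrip().rstrip(".").rstrip().split()
        if len(tokens) < 4:
            continue
        predicate = tokens[1].strip("<>")
        if predicate in LRMI_PROPERTIES:
            graph = tokens[-1].strip("<>")   # graph URI = URL of the crawled page
            quads_per_pld[pld(graph)] += 1

# Ranked providers -> roughly the power-law distribution shown on the slide
for domain, count in quads_per_pld.most_common(20):
    print(f"{domain}\t{count}")
```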
  8. Big Data in Learning Analytics? A simplistic perspective
     Learning Analytics & Educational Data Mining
     • Application of data mining techniques to understand learning activities and performance
     • Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
     • Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
     • Data examples: JLA special issue on LA datasets, data ranging between a few MB and max. 15 GB
     Technology-enhanced Learning / Web-based Learning
     • Broader understanding: informal learning, micro-learning
     • Research focused on resources: sharing, reusing, recommendation
     • Data examples:
       o "LinkedUp Catalog": > 50 M resources, 300 M statements
       o "LRMI/schema.org": > 45 M quads (Common Crawl 2015)
     Big Data? – Depends, but mostly not! (Volume?)
     Big Data? – Depends, but mostly not! (Velocity?)
  9. (Informal) Learning on the Web?
     • Anything can be a learning resource
     • The activity makes the difference (not the resource), i.e. how a resource is being used
     • Learning analytics in online/non-learning environments?
       o Activity streams
       o Social graphs (and their evolution)
       o Behavioural traces (mouse movements, keystrokes)
       o ...
     • Research challenges:
       o How to detect "learning"?
       o How to detect learning-specific notions such as "competences", "learning performance", etc.?
  10. "AFEL – Analytics for Everyday (Online) Learning"
      • H2020 project (since 12/2015) aimed at understanding/supporting learning in social Web environments
      Examples of AFEL data sources:
      • Activity streams and behavioral traces
      • L3S Twitter Crawl: 6 bn tweets
      • Common Crawl (2015): 2 bn documents
      • Web Data Commons (2015): 1 TB = 24 bn quads
      • "German Academic Web": 6 TB Web crawl (recrawled quarterly)
      • Wikipedia edit history: 3 M edits/month (English)
      • ...
  11. Big data challenges/tasks in AFEL & beyond: some examples
      I. Efficient data capture
      • Crawling & extracting activity data
      • Crawling, extracting and indexing learning resources (e.g., Common Crawl; see the sketch below)
      II. Efficient data analysis
      • Understanding learning resources: entity extraction & clustering on large Web crawls of resources
      • "Search as learning": detecting learning in heterogeneous search query logs & click streams
      • Detecting learning activities: detection of learning patterns (e.g., competent behavior) in the absence of learning objectives & assessments (!)
        o Obtaining performance indicators from behavioral traces?
        o Quasi-experiments in crowdsourcing platforms to obtain training data
      Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing, IEEE Intelligent Systems, Vol. 30, Issue 4, Jul/Aug 2015.
      Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys, ACM CHI Conference on Human Factors in Computing Systems (CHI 2015), April 18-23, 2015, Seoul, Korea.
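For the data-capture task, a minimal sketch of scanning a Common Crawl WARC file for pages that embed LRMI markup, assuming the warcio library; the WARC path and the byte-level pre-filter are illustrative simplifications rather than the project's actual pipeline:

```python
# Scan a Common Crawl WARC file for HTML pages that embed LRMI markup.
# ASSUMPTION: file path and the cheap substring pre-filter are illustrative.
from warcio.archiveiterator import ArchiveIterator

LRMI_MARKER = b"learningResourceType"  # cheap pre-filter before full parsing

def lrmi_pages(warc_path):
    """Yield (url, html) pairs for response records mentioning an LRMI property."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            payload = record.content_stream().read()
            if LRMI_MARKER in payload:
                yield url, payload

if __name__ == "__main__":
    for url, _html in lrmi_pages("CC-MAIN-sample.warc.gz"):
        print(url)
```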
  12. Detecting competence in online users?
      Capturing assessment data: microtasks in CrowdFlower
      • "Content Creation (CC)": transcription of captchas
      • "Information Finding (IF)": finding the middle name of famous persons
      • 1,800 assessments: 2 tasks * 3 durations * 3 difficulty levels * 100 users (per assessment)
      • Example ("Find the middle name of:"):
        o Level 1: "Daniel Craig"
        o Level 2: "George Lucas" (profession: Archbishop)
        o Level 3: "Brian Smith" (profession: Ice Hockey, born: 1972)
      Behavioral traces: keystroke and mouse-movement events (see the feature-extraction sketch below)
      • timeBeforeInput, timeBeforeClick
      • tabSwitchFreq
      • windowToggleFreq
      • openNewTabFreq
      • windowFocusFrequency
      • totalMouseMovements
      • scrollUpFreq, scrollDownFreq
      • ...
      • Total number of events: 893,285 (CC tasks), 736,664 (IF tasks)
      Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing, IEEE Intelligent Systems, Vol. 30, Issue 4, Jul/Aug 2015.
      Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys, ACM CHI Conference on Human Factors in Computing Systems (CHI 2015), April 18-23, 2015, Seoul, Korea.
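A minimal sketch of how such per-assessment features could be aggregated from a raw event log; the event schema and names below are illustrative assumptions, not the actual logging format used in the study:

```python
# Aggregate behavioral-trace events into per-user features such as
# timeBeforeInput, tabSwitchFreq and totalMouseMovements.
# ASSUMPTION: the event-log structure is illustrative, not the AFEL schema.
from collections import Counter

def extract_features(events):
    """events: list of dicts like {"t": seconds_since_task_start, "type": "mousemove"}."""
    counts = Counter(e["type"] for e in events)
    first_input = next((e["t"] for e in events if e["type"] in ("keydown", "click")), None)
    return {
        "timeBeforeInput": first_input,
        "tabSwitchFreq": counts["tabswitch"],
        "windowToggleFreq": counts["windowtoggle"],
        "totalMouseMovements": counts["mousemove"],
        "scrollUpFreq": counts["scrollup"],
        "scrollDownFreq": counts["scrolldown"],
        "totalTime": max((e["t"] for e in events), default=0.0),
    }

# Example: a short synthetic trace for one assessment
trace = [
    {"t": 0.4, "type": "mousemove"},
    {"t": 1.2, "type": "keydown"},
    {"t": 2.0, "type": "scrolldown"},
    {"t": 5.3, "type": "tabswitch"},
]
print(extract_features(trace))
```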
  13. Predicting competence from behavioral traces?
      Training data
      • Manual annotation of 1,800 assessments
      • Performance types [CHI15]:
        o "Competent Worker"
        o "Diligent Worker"
        o "Fast Deceiver"
        o "Incompetent Worker"
        o "Rule Breaker"
        o "Smart Deceiver"
        o "Sloppy Worker"
      • Prediction of performance types from behavioral traces?
      Predicting learner types from behavioral traces (see the sketch below)
      • "Random Forest" classifier (per task)
      • 10-fold cross-validation
      • Prediction performance: accuracy, F-measure
      Results
      • Longer assessments → more signals
      • Simpler assessments → more conclusive signals
      • "Competent Workers" (CW, DW): accuracy of 91% and 87%, respectively
      • Most significant features: "TotalTime", "TippingPoint", "MouseMovementFrequency", "WindowFocusFrequency"
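The evaluation setup described on this slide corresponds roughly to the following scikit-learn sketch; the feature matrix and labels are random placeholders standing in for the annotated assessments:

```python
# Predict worker/performance types from behavioral-trace features with a
# Random Forest and 10-fold cross-validation, reporting accuracy and F1.
# ASSUMPTION: X and y below are placeholders, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((1800, 12))            # e.g. 1,800 assessments x 12 trace features
y = rng.integers(0, 7, size=1800)     # 7 performance types ("Competent Worker", ...)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

accuracy = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
f1_macro = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print(f"accuracy: {accuracy.mean():.3f} +/- {accuracy.std():.3f}")
print(f"macro F1: {f1_macro.mean():.3f} +/- {f1_macro.std():.3f}")

# Feature importances (after fitting on all data) hint at which traces matter
# most, analogous to TotalTime / MouseMovementFrequency on the slide.
clf.fit(X, y)
print(sorted(enumerate(clf.feature_importances_), key=lambda p: -p[1])[:3])
```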
  14. Other features to predict competence in learning/assessments?
      "Dunning-Kruger effect"
      • Incompetence in a task/domain reduces the capacity to recognize/assess one's own incompetence
      Research question
      • Self-assessment as an indicator of competence? (see the sketch below)
      Results
      • Self-assessment is a reliable indicator of competence (94% accuracy), superior to mere performance measurement
      • The tendency to over-estimate one's own competence increases with increasing difficulty level
      [Figure: performance ("accuracy") of users classified as "competent"]
      David Dunning, 2011. The Dunning-Kruger Effect: On Being Ignorant of One's Own Ignorance. Advances in Experimental Social Psychology 44 (2011), 247.
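A minimal sketch of turning self-assessment into an additional competence signal: the gap between self-rated and measured performance (a Dunning-Kruger-style over-estimation score) is appended to the feature matrix. Names and scaling are illustrative assumptions:

```python
# Use the gap between self-assessed and measured performance as an extra
# competence signal. ASSUMPTION: column layout and [0, 1] scaling are illustrative.
import numpy as np

def self_assessment_features(self_rating, actual_accuracy):
    """Both arguments are arrays scaled to [0, 1], one entry per user."""
    self_rating = np.asarray(self_rating, dtype=float)
    actual_accuracy = np.asarray(actual_accuracy, dtype=float)
    gap = self_rating - actual_accuracy          # > 0 means over-estimation
    return np.column_stack([self_rating, actual_accuracy, gap])

# Example: two users, the second strongly over-estimates their own competence
extra = self_assessment_features([0.8, 0.9], [0.75, 0.40])
print(extra)
# These columns can be concatenated with the behavioral-trace feature matrix
# before training the classifier, e.g. np.hstack([X, extra]).
```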
  15. Summary & outlook
      • Learning analytics in online & Web-based settings
        o Detection of learning & learning-related notions in the absence of assessment/performance indicators?
        o Analysis of a range of data, including behavioral traces, activity streams, self-assessment, etc.
        o Actual big data
      • Positive results from initial models and classifiers
      • Application of the developed models and classifiers in online (learning) environments (e.g., the AFEL project)
        o GNOSS/Didactalia (200,000 users)
        o LearnWeb
        o Deutsche Welle online
        o ...
  16. Acknowledgements: Team
      • Pavlos Fafalios (L3S)
      • Besnik Fetahu (L3S)
      • Ujwal Gadiraju (L3S)
      • Eelco Herder (L3S)
      • Ivana Marenzi (L3S)
      • Ran Yu (L3S)
      • Pracheta Sahoo (L3S, IIT India)
      • Bernardo Pereira Nunes (L3S, PUC Rio de Janeiro)
      • Mathieu d'Aquin (The Open University, UK)
      • Davide Taibi (CNR, Italy)
      • ...
  17. Acknowledgements: Team
      http://stefandietze.net
