Potentials and Limitations of Educational Datasets

Lecture for Master's students given at the KMI Podium; see the podcast at http://stadium.open.ac.uk/podium/


Published in: Technology, Education

Transcript

  • 1. Potentials and Limitations of Educational Datasets. Hendrik Drachsler, Open University of the Netherlands
  • 2. Hendrik Drachsler
      • Assistant professor at the Centre for Learning Sciences and Technologies (CELSTEC)
      • Track record in TEL projects such as TENCompetence, SC4L, LTfLL, Handover, dataTEL
      • Main research focus:
          – Personalization of learning with information retrieval technologies, recommender systems and educational datasets
          – Visualization of educational data, data mash-up environments, supporting context-awareness by data mining
          – Social and ethical implications of data mining in education
      • Leader of the dataTEL Theme Team of the STELLAR network of excellence (join the SIG on TELeurope.eu)
      • Just recently: new alterEGO project granted by the Netherlands Laboratory for Lifelong Learning (on limitations of learning analytics in formal and informal learning)
  • 3. dataTEL: Potentials and Limitations of Educational Datasets. 24.07.2011, MUP/PLE lecture series, Knowledge Media Institute, Open University UK. Hendrik Drachsler, #dataTEL, Centre for Learning Sciences and Technologies @ Open University of the Netherlands
  • 4. Goals of the lecture
      1. Motivation for dataTEL
      2. The dataTEL project
      3. Potentials of dataTEL
      4. Open issues of dataTEL
  • 5. TEL RecSys Research
  • 9. Survey on TEL Recommender
      • Observation: Half of the systems (11/20) are still at the design or prototyping stage; only 8 systems were evaluated through trials with human users.
      • Conclusion: Small-scale experiments with a few learners who rate some resources add little to a knowledge base on recommender systems and personalization in TEL.
      Manouselis, N., Drachsler, H., Vuorikari, R., Hummel, H. G. K., & Koper, R. (2011). Recommender Systems in Technology Enhanced Learning. In P. B. Kantor, F. Ricci, L. Rokach, & B. Shapira (Eds.), Recommender Systems Handbook (pp. 387-415). Berlin: Springer.
  • 11. The TEL recommender research is a bit like this... We need to design for each domain an appropriate recommender system that fits the goals, tasks, and particular constraints.
  • 13. But... “The performance results of different research efforts in TEL recommender systems are hardly comparable.” (Manouselis et al., 2010)
      The TEL recommender experiments lack transparency. They need to be repeatable to test:
      • Validity
      • Verification
      • Compare results
      Picture: Kaptain Kobold, http://www.flickr.com/photos/kaptainkobold/3203311346/
  • 15. How others compare their recommenders. Although the TEL domain stores plenty of data every day in e-learning environments (LMS, PLEs), there is a lack of shareable and publicly available datasets.
  • 19. Who is dataTEL? dataTEL is a Theme Team funded by the STELLAR network of excellence.
      • Members: Riina Vuorikari, Stephanie Lindstaedt, Katrien Verbert, Nikos Manouselis, Martin Wolpers, Hendrik Drachsler
      • MAVSEL (Miguel Angel Sicilia), CEN PT Social Data (Joris Klerkx)
  • 20. dataTEL::Objectives. Make the research on TEL RecSys more comparable by lowering the entrance barriers for other researchers and increasing the quality. The required benchmarks therefore are:
      1. A collection of publicly available datasets ranging from formal to non-formal learning settings
      2. An overview of the research results of certain RecSys technologies on different datasets
      3. A common approach to evaluate RecSys in the domain of TEL
  • 21. dataTEL::Objectives
      1. Collect publicly available datasets
      2. Sharing policy to (re)use and share datasets
      3. Define dataset standards (documentation, pre-processing)
      4. Address privacy and legal protection rights
      5. Create evaluation criteria for TEL recommender systems
      6. Create a body of knowledge on personalization in TEL
  • 22. dataTEL::Collection
  • 25. dataTEL::Collection
      • Collected data is very different with respect to the number of users and resources
      • Most of the data is very sparse
      • Privacy regulations harm data sharing
      • Mostly data from informal learning settings
      Drachsler, H., Bogers, T., Vuorikari, R., Verbert, K., Duval, E., Manouselis, N., Beham, G., Lindstaedt, S., Stern, H., Friedrich, M., & Wolpers, M. (2010). Issues and Considerations regarding Sharable Data Sets for Recommender Systems in Technology Enhanced Learning. Presentation at the 1st Workshop Recommender Systems in Technology Enhanced Learning (RecSysTEL) in conjunction with 5th European Conference on Technology Enhanced Learning (EC-TEL 2010): Sustaining TEL: From Innovation to Learning and Practice. September 28, 2010, Barcelona, Spain.
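
To make the "very sparse" observation concrete, the following is a minimal sketch of how sparsity could be measured on a user-resource interaction log. The function name, the toy log, and the choice to count only users and resources that appear in the log are assumptions for illustration; they are not taken from the dataTEL datasets.

```python
def sparsity(interactions):
    """Return the fraction of empty cells in the user-resource matrix.

    `interactions` is an iterable of (user, resource) pairs. Only users
    and resources that occur in the log are counted (an assumption).
    """
    pairs = set(interactions)          # repeated events count once
    if not pairs:
        raise ValueError("no interactions given")
    users = {user for user, _ in pairs}
    resources = {resource for _, resource in pairs}
    filled = len(pairs)
    total = len(users) * len(resources)
    return 1.0 - filled / total


# Hypothetical toy log: 3 users, 3 resources, 4 observed interactions.
log = [("u1", "r1"), ("u1", "r2"), ("u2", "r2"), ("u3", "r3")]
print(f"sparsity = {sparsity(log):.2f}")
```

A value close to 1.0 means that almost no user has interacted with almost any resource, which is the situation the slide describes.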
  • 26. dataTEL::Collection
  • 32. dataTEL::Body of knowledge
      • Outcomes: Tanimoto similarity + item-based CF was the most accurate.
      • Outcomes: Implicit ratings like download rates and bookmarks can be used successfully in TEL.
      Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., Beham, G., & Duval, E. (2011). Dataset-driven Research for Improving Recommender Systems for Learning. Learning Analytics & Knowledge: February 27-March 1, 2011, Banff, Alberta, Canada.
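
The combination named above, Tanimoto similarity with item-based collaborative filtering on implicit ratings, can be sketched roughly as follows. This is an illustrative toy version, not the implementation used in the cited study; the function names, the scoring rule (sum of similarities to the user's items), and the event data are assumptions.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(user, events, top_n=3):
    """Item-based CF over implicit feedback.

    `events` is a list of (user, item) pairs such as downloads or
    bookmarks. Each unseen item is scored by summing its Tanimoto
    similarity to the items the user already interacted with.
    """
    users_of = defaultdict(set)   # item -> users who touched it
    items_of = defaultdict(set)   # user -> items they touched
    for u, i in events:
        users_of[i].add(u)
        items_of[u].add(i)

    seen = items_of.get(user, set())
    scores = {}
    for candidate in users_of:
        if candidate in seen:
            continue
        score = sum(tanimoto(users_of[candidate], users_of[s]) for s in seen)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical bookmark events.
events = [
    ("ann", "r1"), ("ann", "r2"),
    ("bob", "r1"), ("bob", "r2"), ("bob", "r3"),
    ("cy", "r2"), ("cy", "r3"),
    ("dee", "r4"),
]
print(recommend("ann", events))
```

Tanimoto similarity only needs to know whether an interaction happened, not how it was rated, which is why it pairs naturally with implicit signals such as downloads and bookmarks.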
  • 34. Potentials of Open Data. Example by Tim Berners-Lee: The year open data went worldwide, TED talk, Feb 2010
  • 39. Data = New Science Paradigm
      • A thousand years ago: science was empirical (describing natural phenomena)
      • Last few hundred years: theoretical branch (using models, generalizations)
      • Last few decades: computational branch (simulating complex phenomena)
      • Nowadays: data science (unifying theory, experiment, and simulation; data captured by instruments and processed by software; linked data)
  • 41. Promises of Open Data for TEL. Unexploited potentials for TEL:
      • The evaluation of learning theories and learning technology from the data side
      • More transparent, mutually comparable, trusted and repeatable experiments that lead to evidence-driven knowledge
      • Development of new educational data tools / products that combine different data sources in data mashups
      • Gain new insights / new knowledge by combining so far unconnected resources / tools
  • 46. Data Products. Educational Data Products:
      • Drop-out Analyzer
      • Group Formation Recommender
      • Question-Answering Tool
      • Awareness Tools
      W. Reinhardt, C. Mletzko, H. Drachsler, and P. Sloep. AWESOME: A widget-based dashboard for awareness-support in Research Networks. In Proceedings of the 2nd PLE Conference, Southampton, UK, 2011.
  • 48. dataTEL::Open issues
      1. Privacy
      2. Prepare datasets
      3. Share datasets
      4. Body of knowledge
  • 53. Privacy: OVERSHARING
      • Were the founders of PleaseRobMe.com actually allowed to take the data from the web and present it in that way?
      • Are we allowed to use data from social services and reuse it for research purposes?
  • 57. Privacy
      1. Privacy as confidentiality: The right to be let alone (Warren and Brandeis, 1890)
      2. Privacy as control: The right of the individual to decide what information about herself should be communicated to others and under which circumstances.
      3. Privacy as practice: The right to intervene in the flows of existing data and the re-negotiation of boundaries with respect to collected data.
  • 61. Privacy solutions
      1. Privacy as confidentiality: Information services that minimize, secure or anonymize the collected information
      2. Privacy as control: Identity Management Systems (IDMS) with access control rules
      3. Privacy as practice: Timestamps on data, data degradation technologies
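
As one possible reading of "timestamps on data, data degradation", the sketch below coarsens records as they age and eventually drops them. The two thresholds, the record fields, and the specific degradation steps are hypothetical choices made for this example; the slides do not prescribe a policy.

```python
from datetime import datetime, timedelta


def degrade(records, now,
            keep_full=timedelta(days=30),      # assumed threshold
            keep_coarse=timedelta(days=365)):  # assumed threshold
    """Apply a simple age-based degradation policy.

    - Records no older than `keep_full` are kept unchanged.
    - Records no older than `keep_coarse` lose the user id and keep
      only month-level time precision.
    - Older records are dropped.
    """
    result = []
    for record in records:
        age = now - record["timestamp"]
        if age <= keep_full:
            result.append(dict(record))
        elif age <= keep_coarse:
            month = record["timestamp"].replace(
                day=1, hour=0, minute=0, second=0, microsecond=0)
            result.append({"user": None,
                           "resource": record["resource"],
                           "timestamp": month})
        # anything older is intentionally not copied
    return result


# Hypothetical activity records.
records = [
    {"user": "u1", "resource": "r1", "timestamp": datetime(2011, 7, 20)},
    {"user": "u2", "resource": "r2", "timestamp": datetime(2011, 1, 15)},
    {"user": "u3", "resource": "r3", "timestamp": datetime(2009, 1, 1)},
]
degraded = degrade(records, now=datetime(2011, 7, 24))
print(len(degraded), "records remain")
```

Storing a timestamp with every record is what makes such a policy enforceable later, which is presumably why the slide lists the two together.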
  • 65. Prepare datasets
      1. Create a dataset that realistically reflects the variables of the learning setting.
      2. Use a sufficiently large set of user profiles.
      3. Create datasets that are comparable to others.
      Picture: Justin Marshall, Coded Ornament, by rootoftwo, http://www.flickr.com/photos/rootoftwo/267285816
  • 66. Prepare datasets
      For informal data sets:
      1. Collect data
      2. Process data
      3. Document data
      4. Share data
      For formal data sets from LMS:
      1. Data storing scripts
      2. Anonymisation scripts
      3. Document data
      4. Share data
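
An "anonymisation script" for an LMS export might look roughly like the sketch below. It replaces user ids with keyed hashes and drops direct identifiers. The field names, the secret key, and the 16-character pseudonym length are assumptions for illustration. Strictly speaking this is pseudonymisation: behaviour patterns in the remaining columns can still allow re-identification, so it is a starting point rather than a guarantee.

```python
import hashlib
import hmac


def pseudonymise(rows, secret, id_field="user_id",
                 drop_fields=("name", "email")):
    """Replace user ids with keyed hashes and drop direct identifiers.

    `secret` is a bytes key that must stay with the data owner; without
    it the pseudonyms cannot be recomputed from the original ids.
    """
    cleaned = []
    for row in rows:
        digest = hmac.new(secret,
                          str(row[id_field]).encode("utf-8"),
                          hashlib.sha256).hexdigest()
        out = {k: v for k, v in row.items() if k not in drop_fields}
        out[id_field] = digest[:16]   # shortened pseudonym (assumption)
        cleaned.append(out)
    return cleaned


# Hypothetical LMS export rows.
rows = [
    {"user_id": 42, "name": "A. Learner", "email": "a@example.org",
     "resource": "r1", "action": "download"},
    {"user_id": 42, "name": "A. Learner", "email": "a@example.org",
     "resource": "r2", "action": "bookmark"},
    {"user_id": 7, "name": "B. Learner", "email": "b@example.org",
     "resource": "r1", "action": "download"},
]
shared = pseudonymise(rows, secret=b"keep-this-key-private")
print(shared[0])
```

Because the same id always maps to the same pseudonym, per-user interaction histories survive, which recommender experiments on the shared dataset still need.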
  • 67. Prepare datasets
  • 68. Share/cite datasets
  • 69. Sharing policies
  • 73. Sharing policy guidelines: A brief guide on data licenses developed by SURF and the Centre for Intellectual Property Law (CIER), 2009, available at www.surffoundation.nl
  • 74. Body of knowledge. Datasets (formal, informal):
      • Data A
          – Algorithms: Algorithm A, Algorithm B, Algorithm C
          – Models: Learner Model A, Learner Model B
          – Measured attributes: Attribute A, Attribute B, Attribute C
      • Data B
          – Algorithms: Algorithm D, Algorithm E
          – Models: Learner Model C, Learner Model E
          – Measured attributes: Attribute A, Attribute B, Attribute C
      • Data C
          – Algorithms: Algorithm B, Algorithm D
          – Models: Learner Model A, Learner Model C
          – Measured attributes: Attribute A, Attribute B, Attribute C
  • 75. Body of knowledge
  • 80. dataTEL::SIG, http://www.teleurope.eu/pg/groups/9405/datatel/. Objectives:
      • Representing dataTEL researchers to promote the release of open datasets from educational providers
      • Fostering the standardization of datasets to enable exchange and interoperability
      • Contributing to policies on ethical guidelines (privacy and legal protection rights)
      • Fostering a shared understanding of evaluation methods in TEL RecSys and Learning Analytics technologies
  • 83. Many thanks for your interest. Free the data.
      • These slides are available at: http://www.slideshare.com/Drachsler
      • Email: hendrik.drachsler@ou.nl
      • Skype: celstec-hendrik.drachsler
      • Blogging at: http://www.drachsler.de
      • Twittering at: http://twitter.com/HDrachsler
      Picture by Tom Raftery, http://www.flickr.com/photos/traftery/4773457853/sizes/l
