Issues and Considerations regarding Sharable Data Sets for Recommender Systems in TEL

Presentation given at the dataTEL Marketplace at the RecSysTEL workshop at the RecSys and EC-TEL conferences, Barcelona, Spain.


  2. Free the data (photo: Tom Raftery, http://www.flickr.com/photos/traftery/4773457853/sizes/l)
  3. Why? (photo: Tom Raftery, http://www.flickr.com/photos/traftery/4773457853/sizes/l)
  4. Because we will get new insights (photo: Tom Raftery, http://www.flickr.com/photos/traftery/4773457853/sizes/l)
  5. Issues and Considerations regarding Sharable Data Sets for Recommender Systems in TEL. 29.09.2010, RecSysTEL workshop at the RecSys and EC-TEL conferences, Barcelona, Spain. Hendrik Drachsler, #dataTEL, Centre for Learning Sciences and Technology @ Open University of the Netherlands
  6. What is dataTEL? dataTEL is a Theme Team funded by the STELLAR Network of Excellence. It addresses two STELLAR Grand Challenges: 1. Connecting Learners, 2. Contextualisation.
  7. Recommenders in TEL
  8. TEL recommenders are a bit like this...
  9. TEL recommenders are a bit like this... We need to select for each application an appropriate recommender system that fits its needs.
  10. But... "The performance results of different research efforts in recommender systems are hardly comparable." (Manouselis et al., 2010) (photo: Kaptain Kobold, http://www.flickr.com/photos/kaptainkobold/3203311346/)
  11. But... TEL recommender experiments lack transparency. They need to be repeatable in order to test validity, allow verification, and compare results. "The performance results of different research efforts in recommender systems are hardly comparable." (Manouselis et al., 2010) (photo: Kaptain Kobold, http://www.flickr.com/photos/kaptainkobold/3203311346/)
  12. How others compare their recommenders
  13. What kind of data sets can we expect in TEL? (graphic: J. Cross, Informal Learning, Pfeifer, 2006)
  14. The Long Tail (graphic: Wilkins, D., 2009; long-tail concept: Anderson, C., 2004)
  15. The Long Tail of Learning (graphic: Wilkins, D., 2009; long-tail concept: Anderson, C., 2004)
  16. The Long Tail of Learning: Formal Data vs. Informal Data (graphic: Wilkins, D., 2009; long-tail concept: Anderson, C., 2004)
  17-21. Guidelines for Data Set Aggregation (photo: Justin Marshall, Coded Ornament by rootoftwo, http://www.flickr.com/photos/rootoftwo/267285816): 1. Create a data set that realistically reflects the variables of the learning setting. 2. Use a sufficiently large set of user profiles. 3. Create data sets that are comparable to others. (A partly automated sanity check along these lines is sketched below.)
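Guidelines 2 and 3 can be partly automated before a data set is shared. The following is a minimal sketch, not part of the original deck, assuming a hypothetical tab-separated usage log with user_id, item_id, rating, and timestamp columns; the 1,000-user threshold for "sufficiently large" is an illustrative assumption, not a number from the slides.

```python
import csv

# Hypothetical shared schema for a TEL usage log (an assumption, not from the deck):
# user_id <TAB> item_id <TAB> rating <TAB> timestamp
REQUIRED_COLUMNS = {"user_id", "item_id", "rating", "timestamp"}
MIN_USERS = 1000  # illustrative threshold for guideline 2 ("sufficiently large")

def check_data_set(path: str) -> str:
    """Rough automated checks for guidelines 2 and 3 before sharing a data set."""
    users = set()
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            # Guideline 3: a common schema keeps data sets comparable to others.
            return f"FAIL: missing columns {sorted(missing)}"
        for row in reader:
            users.add(row["user_id"])
    if len(users) < MIN_USERS:
        # Guideline 2: enough user profiles for meaningful experiments.
        return f"FAIL: only {len(users)} users (< {MIN_USERS})"
    return f"OK: {len(users)} users, schema complete"

if __name__ == "__main__":
    print(check_data_set("usage_log.tsv"))
```

Guideline 1, realistically reflecting the variables of the learning setting, resists this kind of automation and still needs human judgement.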
  22. dataTEL::Objectives: Standardize research on recommender systems in TEL. Five core questions: 1. How can data sets be shared in accordance with privacy and legal protection rights? 2. How can a policy for using and sharing data sets be developed? 3. How can data sets be pre-processed to make them suitable for other researchers? 4. How can common evaluation criteria for TEL recommender systems be defined? 5. How can overview methods be developed to monitor the performance of TEL recommender systems on data sets?
  23-28. 1. Protection Rights: OVERSHARING. Were the founders of PleaseRobMe.com actually allowed to grab the data from the web and present it in that way? Are we allowed to use data from social handles and reuse it for research purposes? And beyond that, there are also company protection rights!
  29-35. 2. Policies to use Data
  36. 3. Pre-Process Data Sets. For informal data sets: 1. Collect data, 2. Process data, 3. Share data. For formal data sets from an LMS: 1. Data-storing scripts, 2. Anonymisation scripts. (A sketch of such an anonymisation script follows below.)
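The deck names the scripts but does not show them, so here is a minimal sketch of what an anonymisation step for formal LMS data could look like: raw user identifiers are replaced with salted one-way hashes, so records stay linkable across files without exposing identities. The column names, file names, and the environment-variable salt are assumptions for illustration.

```python
import csv
import hashlib
import os

# Secret salt kept out of the shared data set; without it, re-identifying
# users by hashing candidate IDs becomes much harder.
SALT = os.environ.get("DATATEL_SALT", "change-me")

def pseudonymise(user_id: str) -> str:
    """Map a raw user ID to a stable, salted one-way hash."""
    return hashlib.sha256((SALT + user_id).encode("utf-8")).hexdigest()[:16]

def anonymise_log(src: str, dst: str) -> None:
    """Copy an LMS event log, pseudonymising the user_id column."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["user_id"] = pseudonymise(row["user_id"])
            writer.writerow(row)

if __name__ == "__main__":
    anonymise_log("lms_events.csv", "lms_events_anonymised.csv")
```

Pseudonymised IDs alone do not make a data set anonymous: quasi-identifiers such as timestamps or free-text fields can still re-identify learners, which is exactly why the protection-rights questions above matter.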
  37-42. 4. Evaluation Criteria: a combined approach (Drachsler et al., 2008; Manouselis et al., 2010) together with the Kirkpatrick model. Technical criteria: 1. Accuracy, 2. Coverage, 3. Precision. Learning-related criteria: 1. Effectiveness of learning, 2. Efficiency of learning, 3. Drop-out rate, 4. Satisfaction. Kirkpatrick levels: 1. Reaction of learner, 2. Learning improved, 3. Behaviour, 4. Results. (Two of the technical metrics are sketched in code below.)
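To make the technical criteria concrete, here is a small sketch, not from the original slides, of how two of them are commonly computed for top-N recommendation lists: precision at k for a single user, and catalog coverage across all users. The users and items in the toy example are made up.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are actually relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

def catalog_coverage(all_recommendations, catalog):
    """Fraction of the item catalog that appears in at least one user's list."""
    recommended_items = set()
    for rec_list in all_recommendations.values():
        recommended_items.update(rec_list)
    return len(recommended_items & set(catalog)) / len(catalog)

# Toy example with made-up users and items:
recs = {"u1": ["a", "b", "c"], "u2": ["b", "d"]}
print(precision_at_k(recs["u1"], relevant={"a", "c"}, k=3))       # ~0.667
print(catalog_coverage(recs, catalog=["a", "b", "c", "d", "e"]))  # 0.8
```

The learning-related criteria (effectiveness, efficiency, drop-out rate, satisfaction) cannot be computed offline from a log alone; they require studies with learners, which is one reason the deck combines the metric families.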
  43. 5. Data Set Framework to Monitor Performance. Formal and informal data sets are monitored along the same dimensions: Data A: Algorithms A, B, C; Learner Models A, B; measured attributes A, B, C. Data B: Algorithms D, E; Learner Models C, E; measured attributes A, B, C. Data C: Algorithms B, D; Learner Models A, C; measured attributes A, B, C. (A minimal registry structure for this matrix is sketched below.)
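Read as a benchmark matrix, the framework pairs every data set with the algorithms, learner models, and measured attributes evaluated on it. Below is a minimal sketch of such a registry; the Data/Algorithm/Model/Attribute placeholder names come from the slide, while the formal/informal labels and the code structure itself are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DataSetEntry:
    """One column of the monitoring framework: a data set and what ran on it."""
    name: str
    kind: str  # "formal" or "informal"; this split is an assumed reading of the slide
    algorithms: list = field(default_factory=list)
    learner_models: list = field(default_factory=list)
    measured_attributes: list = field(default_factory=list)

# Placeholder entries mirroring the slide; real entries would name concrete
# data sets, algorithms, learner models, and metrics.
FRAMEWORK = [
    DataSetEntry("Data A", "formal",
                 ["Algorithm A", "Algorithm B", "Algorithm C"],
                 ["Learner Model A", "Learner Model B"],
                 ["Attribute A", "Attribute B", "Attribute C"]),
    DataSetEntry("Data B", "informal",
                 ["Algorithm D", "Algorithm E"],
                 ["Learner Model C", "Learner Model E"],
                 ["Attribute A", "Attribute B", "Attribute C"]),
    DataSetEntry("Data C", "informal",
                 ["Algorithm B", "Algorithm D"],
                 ["Learner Model A", "Learner Model C"],
                 ["Attribute A", "Attribute B", "Attribute C"]),
]

for entry in FRAMEWORK:
    print(f"{entry.name} ({entry.kind}): {len(entry.algorithms)} algorithms, "
          f"{len(entry.learner_models)} learner models")
```

Because every entry records the same measured attributes, results stay comparable across data sets, which is the point of core question 5 on slide 22.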
  44. Join us for a coffee... http://www.teleurope.eu/pg/groups/9405/datatel/
  45. Upcoming event: dataTEL workshop, Data Sets for Technology Enhanced Learning. Date: March 30th to 31st, 2011. Location: ski resort La Clusaz, Massif des Aravis, French Alps. Funding: food and lodging for 3 nights for 10 selected participants. Submissions: to be considered for funding, please send extended abstracts via http://www.easychair.org/conferences/?conf=datatel2011 (deadline: October 25th, 2010). CfP: http://www.teleurope.eu/pg/pages/view/46082/
  46. Many thanks for your interest! These slides are available at http://www.slideshare.com/Drachsler. Email: hendrik.drachsler@ou.nl. Skype: celstec-hendrik.drachsler. Blog: http://www.drachsler.de. Twitter: http://twitter.com/HDrachsler
