I gave this talk to an MSc class on Semantic Technologies at the Technical University of Graz (TUG) on 2012/01/12.
It presents what recommendation systems are and how they are commonly used, before delving into how they are applied at Mendeley. Real-world results from Mendeley’s article recommendation system are also presented.
The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).
Mendeley: Recommendation Systems for Academic Literature
1. Mendeley: Recommendation Systems for Academic Literature
Kris Jack, PhD
Data Mining Team Lead
2. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”
3. Overview
➔ what's a recommender and what does it look like?
➔ what's Mendeley?
➔ the secrets behind recommenders
➔ recommenders @ Mendeley
4. What's a recommender and what does it look like?
5. What's a recommender?
Definition: A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.
11. What is Mendeley?
...a large data technology startup company
...and it's on a mission to change the way that research is done!
12. Mendeley ↔ Last.fm
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you could also like... and it's the world's biggest open music database
13. Mendeley ↔ Last.fm
music libraries → research libraries
artists → researchers
songs → papers
genres → disciplines
15. Mendeley provides tools to help users...
...organise their research
...collaborate with one another
16. US National Academy of Engineering “Grand Challenges”:
Climate change
Sustainable food supplies
Clean energy
Clean water
Pandemic diseases
Artificial Intelligence
Terrorist violence
Tools of scientific discovery
17. Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research
18.
19. Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research
20. 1.4 million+ users; the 20 largest userbases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
21. [Chart] Real-time data on 28m unique papers:
Thomson Reuters’ Web of Knowledge (dating from 1934): ~50m
Mendeley after 16 months: 28m
22. The secrets behind recommenders
Q1/2: How can a tool generate recommendations?
Q2/2: How can you measure the tool's performance?
23. Q1/2: How can a tool generate recommendations?
Content-based Filtering:
● Find items with characteristics (e.g. title, discipline) similar to what the user previously liked
● Techniques: TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks
● Quickly absorbs new items (overcomes the cold-start problem)
● Can make good recommendations from very few examples
Collaborative Filtering:
● Find items that users who are similar to you also liked (wisdom of the crowds)
● Techniques: user-based and item-based variations, matrix factorisation
● No need to understand item characteristics
● Tends to give more novel recommendations
Hybrid tools too...
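To make the content-based side of this comparison concrete, here is a minimal, self-contained sketch (not Mendeley's code) of tf-idf weighting plus cosine similarity, the kind of scoring that the techniques named above build on; the toy documents are invented for illustration:

```java
import java.util.*;

public class TfIdfSimilarity {
    // Turn each document into a tf-idf vector, then compare with cosine similarity.
    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("semantic", "web", "recommender"),
            Arrays.asList("collaborative", "filtering", "recommender"),
            Arrays.asList("semantic", "web", "ontology"));

        // document frequency: how many documents each term appears in
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs)
            for (String term : new HashSet<>(doc))
                df.merge(term, 1, Integer::sum);

        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> vec = new HashMap<>();
            for (String term : doc)
                vec.merge(term, 1.0, Double::sum);            // raw term frequency
            vec.replaceAll((term, tf) ->
                tf * Math.log((double) n / df.get(term)));    // weight by idf
            vectors.add(vec);
        }

        System.out.printf("sim(doc0, doc1) = %.3f%n", cosine(vectors.get(0), vectors.get(1)));
        System.out.printf("sim(doc0, doc2) = %.3f%n", cosine(vectors.get(0), vectors.get(2)));
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```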
24. Q2/2: How can you measure the tool's performance?
➔ Cross validation with hold-outs
● get yourself a good ground truth
● hide a fraction of your data from the system
● try to predict the hidden fraction from the remaining data
● calculate precision and recall
➔ Let users decide
● set up evaluations with real users (experimental)
● track tool usage by users
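As a rough illustration of the hold-out procedure above, here is a minimal sketch; the five-paper library and the recommendation list are invented stand-ins (a real evaluation would retrain the recommender on the visible fraction and average over many users):

```java
import java.util.*;

public class HoldoutEval {
    // Score a recommendation list against the items that were hidden from the system.
    static double[] precisionRecallAtK(Set<String> hidden, List<String> recommended, int k) {
        List<String> topK = recommended.subList(0, Math.min(k, recommended.size()));
        long hits = topK.stream().filter(hidden::contains).count();
        double precision = topK.isEmpty() ? 0 : (double) hits / topK.size();
        double recall = hidden.isEmpty() ? 0 : (double) hits / hidden.size();
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        // ground truth: a user's full library
        List<String> library = new ArrayList<>(Arrays.asList(
            "paper-a", "paper-b", "paper-c", "paper-d", "paper-e"));
        Collections.shuffle(library, new Random(42));     // randomise before splitting

        int holdout = Math.max(1, library.size() / 5);    // hide ~20% of the library
        Set<String> hidden = new HashSet<>(library.subList(0, holdout));

        // stand-in for a recommender trained on the remaining, visible items
        List<String> recommended = Arrays.asList("paper-c", "paper-x", "paper-y");

        double[] pr = precisionRecallAtK(hidden, recommended, 3);
        System.out.printf("precision@3 = %.2f, recall = %.2f%n", pr[0], pr[1]);
    }
}
```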
25. Recommenders @ Mendeley
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
26.
27. Use Case 1: Related Research
Strategy:
● content-based approach (tf-idf, with a Lucene implementation)
● search for articles with the same metadata (e.g. title, tags)
Evaluation:
● cross-validation with hold-outs on a ground truth data set
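The slide names tf-idf with a Lucene implementation; here is a minimal sketch of how such a lookup can be done with Lucene's MoreLikeThis (Lucene 5+-style API). The index path, field names, and seed document id are assumptions for illustration, not Mendeley's actual setup:

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;

public class RelatedResearch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = DirectoryReader.open(
            FSDirectory.open(Paths.get("article-index")));   // hypothetical index path
        IndexSearcher searcher = new IndexSearcher(reader);

        MoreLikeThis mlt = new MoreLikeThis(reader);          // tf-idf "find similar" query builder
        mlt.setAnalyzer(new StandardAnalyzer());
        mlt.setFieldNames(new String[] {"title", "tag"});     // metadata fields to match on
        mlt.setMinTermFreq(1);                                // keep terms that occur only once
        mlt.setMinDocFreq(2);                                 // ignore terms seen in a single doc

        int seedDoc = 0;                                      // Lucene doc id of the seed article
        Query query = mlt.like(seedDoc);
        for (ScoreDoc hit : searcher.search(query, 6).scoreDocs) {
            if (hit.doc == seedDoc) continue;                 // the seed usually ranks first; skip it
            System.out.println(searcher.doc(hit.doc).get("title") + "  " + hit.score);
        }
        reader.close();
    }
}
```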
28.
29. Use Case 1: Related Research
Q2/2: What are our results?
[Chart: tf-idf Precision per Field when Field is Available; y-axis: precision @ 5, from 0 to 0.5; x-axis: metadata field (tag, abstract, mesh-term, title, general-keyword, author, keyword)]
Results 1) tags are the most informative field for finding related research
30. Use Case 1: Related Research
[Chart: tf-idf Precision for Field Combos when Field is Available; y-axis: precision @ 5, from 0 to 0.5; x-axis: metadata field(s) (tag, bestCombo = abstract+author+general-keyword+tag+title, abstract, mesh-term, title, general-keyword, author, keyword)]
Results 2) tags outperform combinations of fields
31. How does Mendeley use recommendation technologies?
2/2: Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
32.
33. Use Case 2: Personalised Recommendations
Strategy:
● collaborative filtering (item-based, with Apache Mahout)
● recommend articles to researchers that would interest them
Evaluation:
● cross-validation with hold-outs on a ground truth data set
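The slide names item-based collaborative filtering with Apache Mahout; here is a minimal sketch using Mahout's Taste API. The input file of userID,articleID pairs and the choice of Tanimoto similarity are assumptions; Tanimoto (Jaccard) suits boolean "in library / not in library" data:

```java
import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class PersonalisedRecommendations {
    public static void main(String[] args) throws Exception {
        // one "userID,articleID" line per library entry (boolean preference)
        DataModel model = new FileDataModel(new File("libraries.csv"));

        // item-item similarity over the boolean library-membership signal
        ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
        GenericItemBasedRecommender recommender =
            new GenericItemBasedRecommender(model, similarity);

        // top 10 articles for user 42 that are not already in their library
        for (RecommendedItem item : recommender.recommend(42L, 10)) {
            System.out.println(item.getItemID() + "  score=" + item.getValue());
        }
    }
}
```

Item-based CF precomputes item-item similarities, which parallelises naturally; that fits the deck's later point about running the distributed version on EC2.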
34.
35. Use Case 2: Personalised Recommendations
Strategy:
● collaborative filtering (item-based, with Apache Mahout)
● recommend articles to researchers that would interest them
Evaluation:
● cross-validation with hold-outs on a ground truth data set
41. [Chart: Precision by Library Size; y-axis: precision at 10 articles, x-axis: number of articles in user library]
42. Test: 10-fold cross validation on 50,000 user libraries
So, results are comparable to a non-distributed recommender
Completely distributed, so it can easily run on EC2 within 24 hours...
43.
44. Conclusions
Summary
➔ Recommendations can be complementary to search
➔ They can help users to discover interesting items
➔ They can exploit item metadata (content-based)
➔ They can exploit the 'wisdom of the crowds' (CF)
45. Conclusions
Summary
➔ Crowd-sourced metadata can have a powerful informative value (e.g. article tags)
➔ Sometimes you need to let data grow
➔ Evaluations under lab conditions don't always predict real-world results well
➔ Recommenders don't just have to be about making money … remember where we started...?
46. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”