Mendeley: crowdsourcing and recommending research on a large scale
I was invited to be the keynote speaker at a special track on Recommendation; Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07.

It presents the challenges involved in crowdsourcing the world's largest research catalogue and then building a recommendation service on top of it that scales to serve millions of users.

Presentation Transcript

  • Mendeley: crowdsourcing and recommending research on a large scale. Kris Jack, PhD, Data Mining Team Lead
  • Summary: ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • Mendeley is... ...a startup company... ...going to change the way that we do research...
  • Mendeley provides tools to help users... ...organise their research... ...collaborate with one another... ...discover new research
  • Summary: ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • Mendeley / Last.fm. Last.fm works like this: 1) Install “Audioscrobbler” 2) Listen to music 3) Last.fm builds your music profile and recommends you music you also could like ...and it’s the world’s largest open music database!
  • Last.fm / Mendeley: music libraries / research libraries; artists / researchers; songs / papers; genres / disciplines. Mendeley is the world’s largest crowdsourced research catalogue! (Screenshot taken from www.mendeley.com on 04/09/11)
  • Catalogue Crowdsourcing: System Requirements: assimilate research artefacts into catalogue in real time (PDFs + citation metadata); recognise duplicate and non-duplicate artefacts in noisy input
  • Main sources of input: → Mendeley Desktop → Mendeley Web Importer → External catalogue imports (e.g. ArXiv) → External catalogue lookups (e.g. CrossRef). Main types of input: → article PDFs → article metadata (e.g. reference). [Diagram: articles → catalogue generator → catalogue]
  • Aims: → Cluster documents together → Generate catalogue entries. [Diagram: articles → catalogue generator → catalogue]
  • Process: → Filehash check (SHA-1) → Identifier check (e.g. PubMed id) → Document fingerprint (full text) → Metadata similarity check → Update individual article page. [Diagram: articles → catalogue generator → catalogue]
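
The process slide above lists a cascade of checks, from cheapest to most expensive. The following is a minimal sketch of how such a cascade could cluster incoming articles, not Mendeley's actual implementation. The `Catalogue` class, its dictionaries, the title-only metadata comparison and the 0.9 threshold are all assumptions made for illustration, and the full-text fingerprint step is left out because the slides give no detail on it.

```python
import hashlib
from difflib import SequenceMatcher


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of a file's raw bytes."""
    return hashlib.sha1(data).hexdigest()


class Catalogue:
    """Toy in-memory catalogue; every structure here is hypothetical."""

    def __init__(self, title_threshold: float = 0.9):
        self.by_hash = {}        # file SHA-1 -> cluster id
        self.by_identifier = {}  # e.g. ("pmid", "111") -> cluster id
        self.titles = {}         # cluster id -> normalised title
        self.title_threshold = title_threshold  # assumed cut-off
        self._next_id = 0

    @staticmethod
    def _normalise(title: str) -> str:
        """Lower-case the title and collapse runs of whitespace."""
        return " ".join(title.lower().split())

    def add(self, pdf_bytes: bytes, identifiers: dict, title: str) -> int:
        """Assign an incoming article to a cluster and return its id."""
        file_hash = sha1_of(pdf_bytes)
        cluster = self.by_hash.get(file_hash)        # 1) filehash check
        if cluster is None:                          # 2) identifier check
            for ident in identifiers.items():
                if ident in self.by_identifier:
                    cluster = self.by_identifier[ident]
                    break
        norm = self._normalise(title)
        if cluster is None:                          # 3) metadata similarity
            for cid, known in self.titles.items():
                ratio = SequenceMatcher(None, norm, known).ratio()
                if ratio >= self.title_threshold:
                    cluster = cid
                    break
        if cluster is None:                          # nothing matched: new entry
            cluster = self._next_id
            self._next_id += 1
            self.titles[cluster] = norm
        # Remember this file and its identifiers for later lookups.
        self.by_hash[file_hash] = cluster
        for ident in identifiers.items():
            self.by_identifier[ident] = cluster
        return cluster
```

Ordering the checks this way means an exact file match or a shared identifier settles the question before any fuzzy comparison runs, which matters when the input is noisy and arrives in real time.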
  • Catalogue with: → article metadata → aggregated statistics → support recs, etc. [Diagram: articles → catalogue generator → catalogue]
  • Summary: ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ what does this mean for you?
  • Article Recommendation: System Requirements: generate personal article recommendations for users (i.e. “here are some articles that may interest you”); update recommendations every 24 hours
  • Input: User libraries. Output: Recommend 10 articles to each user
  • Recommendation through collaborative filtering (16 months ago): Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: 10-fold cross validation, 50,000 user libraries. Results: <0.025 precision at 10
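
To make the binary-input setup concrete, here is a small sketch of item-based collaborative filtering using two of the similarity metrics named on the slide, cooccurrence and Tanimoto. It is an illustration under stated assumptions rather than the system described in the talk: the function names, the dictionary-of-sets representation of user libraries and the scoring rule (sum of cooccurrence counts with articles the user already has) are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(libraries):
    """Count, for each article pair, how many libraries contain both."""
    counts = defaultdict(int)
    for articles in libraries.values():
        for a, b in combinations(sorted(set(articles)), 2):
            counts[(a, b)] += 1
    return counts


def tanimoto(libraries, a, b):
    """Tanimoto (Jaccard) similarity between the reader sets of two articles."""
    readers_a = {user for user, arts in libraries.items() if a in arts}
    readers_b = {user for user, arts in libraries.items() if b in arts}
    union = readers_a | readers_b
    return len(readers_a & readers_b) / len(union) if union else 0.0


def recommend(libraries, user, n=10):
    """Rank articles the user lacks by cooccurrence with articles they own."""
    counts = cooccurrence_counts(libraries)
    owned = set(libraries[user])
    scores = defaultdict(int)
    for (a, b), count in counts.items():
        if a in owned and b not in owned:
            scores[b] += count
        elif b in owned and a not in owned:
            scores[a] += count
    # Highest score first; ties broken by article id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:n]]
```

With binary input the only signal is whether an article is in a library, so both metrics reduce to set operations over libraries and readers.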
  • Recommendation through collaborative filtering (10 months ago, i.e. + 6 months): Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: 10-fold cross validation, 50,000 user libraries. Results: ~0.1 precision at 10
  • Recommendation through collaborative filtering (10 months ago, i.e. + 6 months): Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: Release to a subset of users. Results: ~0.4 precision at 10
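
The results on these slides are reported as precision at 10, i.e. the fraction of the 10 recommended articles that turn out to be relevant. A minimal sketch of the metric follows. The function names and the idea of passing held-out articles per user are assumptions for illustration, and the slides do not say exactly how relevance was judged in each test.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(recommendations, held_out, k=10):
    """Average precision at k over users present in both mappings."""
    users = [user for user in recommendations if user in held_out]
    if not users:
        return 0.0
    total = sum(
        precision_at_k(recommendations[user], held_out[user], k)
        for user in users
    )
    return total / len(users)
```

Under this definition a precision at 10 of about 0.4 corresponds to roughly 4 of every 10 recommended articles being relevant.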
  • Article Recommendation Acceptance Rates. [Chart: acceptance rate (i.e. accept/reject clicks) against number of months live]
  • Article Recommendation: System Requirements: generate personal article recommendations for users [annotation: 1 million users!] (i.e. “here are some articles that may interest you”); update recommendations every 24 hours [annotation: days!]. How to scale up?
  • Test: 10-fold cross validation, 50,000 user libraries. So, results comparable to non-distributed recommender. Completely distributed, so can easily run on EC2 within 24 hours...
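
The slide does not say how the recommender was distributed. One common way to distribute cooccurrence counting is a map/reduce decomposition: each worker counts article pairs in its own shard of user libraries, and the partial counts are then merged. The sketch below simulates that on one machine. The function names and the sharding are assumptions for illustration, not the setup used in the talk.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_library(articles):
    """Map step: emit a count of 1 for every article pair in one library."""
    return Counter(combinations(sorted(set(articles)), 2))


def reduce_counts(left, right):
    """Reduce step: merge two partial pair-count tables."""
    return left + right


def distributed_cooccurrence(shards):
    """Count pairs per shard (one shard per worker), then merge the partials."""
    partials = [
        reduce(reduce_counts, (map_library(lib) for lib in shard), Counter())
        for shard in shards
    ]
    return reduce(reduce_counts, partials, Counter())
```

Because addition of counts is associative and commutative, the merged result does not depend on how libraries are split across shards, which is what allows the same numbers as a non-distributed run.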
  • Article Recommendation Precision Across User Library Sizes (using cooccurrence). [Chart: precision at 10 articles against number of articles in user library] How will real users react?
  • Summary: ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • Public Data: user libraries (50,000 libraries; 4,848,724 articles; 3,652,285 unique articles); library readership; library stars. Obtain from: http://dev.mendeley.com/datachallenge
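
The three figures on this slide allow a quick back-of-the-envelope characterisation of the data set. The sketch below assumes that "articles" counts library entries (so one article held by several users is counted once per library) and that "unique articles" counts distinct articles. That reading is an interpretation of the slide, and the function name is hypothetical.

```python
def dataset_summary(libraries, entries, unique_articles):
    """Derive simple ratios from the published data set sizes."""
    return {
        # Average number of articles held in one user library.
        "mean_entries_per_library": entries / libraries,
        # Share of library entries that point at a distinct article.
        "unique_share": unique_articles / entries,
        # Entries that duplicate an article already held in another library.
        "repeated_entries": entries - unique_articles,
    }


summary = dataset_summary(
    libraries=50_000,
    entries=4_848_724,
    unique_articles=3_652_285,
)
for name, value in summary.items():
    print(name, value)
```

On this reading the average library holds just under 97 articles, and about three quarters of all library entries refer to distinct articles.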
  • Mendeley's API
  • www.mendeley.com