Mendeley: crowdsourcing and recommending research on a large scale

I was invited to be the keynote speaker at a special track on Recommendation; Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07.

The talk presents the challenges involved in crowdsourcing the world's largest research catalogue and then building a recommendation service on top of it that scales to serve millions of users.

Mendeley: crowdsourcing and recommending research on a large scale Presentation Transcript

  • 1. Mendeley: crowdsourcing and recommending research on a large scale. Kris Jack, PhD, Data Mining Team Lead
  • 2. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • 3. Mendeley is... ...a startup company ...going to change the way that we do research...
  • 4. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research
  • 9. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • 10. Mendeley / Last.fm: Last.fm works like this: 1) Install “Audioscrobbler” 2) Listen to music 3) Last.fm builds your music profile and recommends music you might also like ...and it’s the world’s largest open music database!
  • 11. Last.fm → Mendeley: music libraries → research libraries; artists → researchers; songs → papers; genres → disciplines. Mendeley is the world’s largest crowdsourced research catalogue! (Screenshot taken from www.mendeley.com on 04/09/11)
  • 12. Catalogue Crowdsourcing: System Requirements: assimilate research artefacts into catalogue in real time (PDFs + citation metadata); recognise duplicate and non-duplicate artefacts in noisy input
  • 13. Main sources of input: → Mendeley Desktop → Mendeley Web Importer → External catalogue imports (e.g. ArXiv) → External catalogue lookups (e.g. CrossRef). Main types of input: → article PDFs → article metadata (e.g. reference). Diagram: articles → catalogue generator → catalogue
  • 14. Aims: → Cluster documents together → Generate catalogue entries. Diagram: articles → catalogue generator → catalogue
  • 15. Process: → Filehash check (SHA-1) → Identifier check (e.g. PubMed id) → Document fingerprint (full text) → Metadata similarity check → Update individual article page. Diagram: articles → catalogue generator → catalogue
  • 16. Catalogue with: → article metadata → aggregated statistics → support recs, etc. Diagram: articles → catalogue generator → catalogue
  • 17. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ what does this mean for you?
  • 18. Article Recommendation: System Requirements: generate personal article recommendations for users (i.e. “here are some articles that may interest you”); update recommendations every 24 hours
  • 19. Input:User libraries Output: Recommend 10 articles to each user
  • 20. Recommendation through collaborative filtering: Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: 10-fold cross validation, 50,000 user libraries, 16 months ago. Results: <0.025 precision at 10
  • 21. Recommendation through collaborative filtering: Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: 10-fold cross validation, 50,000 user libraries, 10 months ago (i.e. + 6 months). Results: ~0.1 precision at 10
  • 22. Recommendation through collaborative filtering: Articles in library or not (e.g. binary input). Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto). Test: Release to a subset of users, 10 months ago (i.e. + 6 months). Results: ~0.4 precision at 10
  • 23. Article Recommendation Acceptance Rates [chart: acceptance rate (i.e. accept/reject clicks) against number of months live]
  • 24. Article Recommendation: System Requirements: generate personal article recommendations for users (i.e. “here are some articles that may interest you”); update recommendations every 24 hours. Slide annotations: “1 million users!”, “days!”. How to scale up?
  • 25. Test: 10-fold cross validation, 50,000 user libraries. So, results comparable to non-distributed recommender. Completely distributed, so can easily run on EC2 within 24 hours...
  • 26. Article Recommendation Precision Across User Library Sizes (using cooccurrence) [chart: precision at 10 articles against number of articles in user library]. How will real users react?
  • 27. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  • 28. Public Data: user libraries (50,000 libraries; 4,848,724 articles; 3,652,285 unique articles), library readership, library stars. Obtain from: http://dev.mendeley.com/datachallenge
  • 29. Mendeley's API
  • 30. www.mendeley.com
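
Slide 15 lists the catalogue generator's checks in order: filehash, identifier, document fingerprint, metadata similarity. The Python sketch below illustrates that cascade under loud assumptions: the record fields (`sha1`, `pmid`, `title`), the title-only similarity measure, the 0.9 threshold and the `readers` counter are all hypothetical, and the full-text fingerprint step is omitted. It is not Mendeley's implementation.

```python
import hashlib
from difflib import SequenceMatcher


def file_hash(pdf_bytes):
    """SHA-1 hex digest of the raw file bytes (the filehash check)."""
    return hashlib.sha1(pdf_bytes).hexdigest()


def metadata_similarity(a, b):
    """Hypothetical measure: character-level ratio of normalised titles."""
    title_a = " ".join(a.get("title", "").lower().split())
    title_b = " ".join(b.get("title", "").lower().split())
    return SequenceMatcher(None, title_a, title_b).ratio()


def find_cluster(doc, catalogue, threshold=0.9):
    """Return the index of the catalogue entry that doc belongs to, or None."""
    # 1. Filehash check: an identical file has been seen before.
    for i, entry in enumerate(catalogue):
        if doc.get("sha1") and doc["sha1"] in entry["sha1s"]:
            return i
    # 2. Identifier check: same external id (a PubMed id in this sketch).
    for i, entry in enumerate(catalogue):
        if doc.get("pmid") and doc["pmid"] == entry.get("pmid"):
            return i
    # 3. Metadata similarity check: fuzzy match for noisy input.
    for i, entry in enumerate(catalogue):
        if metadata_similarity(doc, entry) >= threshold:
            return i
    return None


def add_document(doc, catalogue, threshold=0.9):
    """Cluster doc into the catalogue, creating an entry if nothing matches."""
    idx = find_cluster(doc, catalogue, threshold)
    if idx is None:
        catalogue.append({"title": doc.get("title", ""), "pmid": doc.get("pmid"),
                          "sha1s": set(), "readers": 0})
        idx = len(catalogue) - 1
    entry = catalogue[idx]
    if doc.get("sha1"):
        entry["sha1s"].add(doc["sha1"])
    if doc.get("pmid") and not entry.get("pmid"):
        entry["pmid"] = doc["pmid"]
    entry["readers"] += 1  # a stand-in for the aggregated statistics
    return idx


catalogue = []
first = add_document({"title": "Deep Learning for Cats",
                      "sha1": file_hash(b"pdf-1")}, catalogue)
same_title = add_document({"title": "Deep  learning for cats",
                           "sha1": file_hash(b"pdf-2")}, catalogue)
same_file = add_document({"title": "Untitled",
                          "sha1": file_hash(b"pdf-1")}, catalogue)
other = add_document({"title": "A survey of protein folding",
                      "pmid": "123"}, catalogue)
```

The exact checks run before the fuzzy one so that a byte-identical file or a shared identifier always wins over a merely similar title. Each pass here scans the whole catalogue, so a real-time system would need indexes rather than linear scans.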
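
Slides 20 to 22 describe collaborative filtering over binary "article in library or not" input, scored by precision at 10. As a rough illustration only, the Python sketch below uses the Tanimoto coefficient (one of the metrics the slides name) for item-to-item similarity and computes precision at 10 against a held-out set. The function names, the toy libraries and the scoring rule (summing similarities over the user's articles) are assumptions, and the 10-fold cross validation from the talk is not reproduced.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of readers."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(library, libraries, n=10):
    """Rank articles outside `library` by summed similarity to articles in it."""
    # Invert user -> articles into article -> readers (the binary input).
    readers = defaultdict(set)
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    scores = {}
    for candidate, cand_readers in readers.items():
        if candidate in library:
            continue
        score = sum(tanimoto(cand_readers, readers[a])
                    for a in library if a in readers)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by article id for determinism.
    return sorted(scores, key=lambda art: (-scores[art], art))[:n]


def precision_at_n(recommended, held_out, n=10):
    """Fraction of the n recommendation slots filled by held-out articles."""
    return sum(1 for art in recommended[:n] if art in held_out) / n


# Toy data: three user libraries of article ids.
libraries = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p3", "p4"},
}
recs = recommend({"p1"}, libraries)
score = precision_at_n(recs, held_out={"p2"})
```

Dividing by n rather than by the number of recommendations returned means a user who receives fewer than ten suggestions is penalised, which matches the slide 19 goal of recommending 10 articles to each user.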