Mendeley: crowdsourcing and recommending research on a large scale
Kris Jack, PhD, Data Mining Team Lead




I was invited to be the keynote speaker at a special track on Recommendation, Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07.

It presents the challenges involved in crowdsourcing the world's largest research catalogue and then building on top of it a recommendation service that scales to serve millions of users.

Published in: Education, Technology


  1. Mendeley: crowdsourcing and recommending research on a large scale. Kris Jack, PhD, Data Mining Team Lead
  2. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  3. Mendeley is... ...a startup company... ...going to change the way that we do research...
  4. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  5. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  6. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  7. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  8. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  9. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  10. Mendeley and Last.fm. Last.fm works like this: 1) install “Audioscrobbler”; 2) listen to music; 3) Last.fm builds your music profile and recommends music you could also like... and it’s the world’s largest open music database!
  11. Mendeley and Last.fm compared: music libraries ↔ research libraries, artists ↔ researchers, songs ↔ papers, genres ↔ disciplines. Mendeley is the world’s largest crowdsourced research catalogue! (Screenshot taken from www.mendeley.com on 04/09/11.)
  12. Catalogue Crowdsourcing: System Requirements → assimilate research artefacts into the catalogue in real time (PDFs + citation metadata) → recognise duplicate and non-duplicate artefacts in noisy input
  13. Catalogue generator inputs (articles → catalogue generator → catalogue). Main sources of input: → Mendeley Desktop → Mendeley Web Importer → external catalogue imports (e.g. ArXiv) → external catalogue lookups (e.g. CrossRef). Main types of input: → article PDFs → article metadata (e.g. reference)
  14. Catalogue generator (articles → catalogue generator → catalogue). Aims: → cluster documents together → generate catalogue entries
  15. Catalogue generator process: → filehash check (SHA-1) → identifier check (e.g. PubMed id) → document fingerprint (full text) → metadata similarity check → update individual article page
  16. Catalogue generator output: a catalogue with → article metadata → aggregated statistics → support for recommendations, etc.
  17. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ what does this mean for you?
  18. Article Recommendation: System Requirements → generate personal article recommendations for users (i.e. “here are some articles that may interest you”) → update recommendations every 24 hours
  19. Input: user libraries. Output: recommend 10 articles to each user.
  20. Recommendation through collaborative filtering: articles in library or not (i.e. binary input); various similarity metrics (e.g. cooccurrence, log-likelihood, Tanimoto). Test (16 months ago): 10-fold cross validation on 50,000 user libraries. Results: <0.025 precision at 10.
  21. Recommendation through collaborative filtering: articles in library or not (i.e. binary input); various similarity metrics (e.g. cooccurrence, log-likelihood, Tanimoto). Test (10 months ago, i.e. +6 months): 10-fold cross validation on 50,000 user libraries. Results: ~0.1 precision at 10.
  22. Recommendation through collaborative filtering: articles in library or not (i.e. binary input); various similarity metrics (e.g. cooccurrence, log-likelihood, Tanimoto). Test (10 months ago, i.e. +6 months): release to a subset of users. Results: ~0.4 precision at 10.
  23. Article Recommendation Acceptance Rates [chart: acceptance rate (i.e. accept/reject clicks) vs. number of months live]
  24. Article Recommendation: System Requirements → generate personal article recommendations for users (i.e. “here are some articles that may interest you”) → update recommendations every 24 hours. Annotations: “1 million users!”, “days!”. How to scale up?
  25. Test: 10-fold cross validation on 50,000 user libraries. So, results are comparable to the non-distributed recommender. Completely distributed, so it can easily run on EC2 within 24 hours...
  26. Article Recommendation Precision Across User Library Sizes (using cooccurrence) [chart: precision at 10 articles vs. number of articles in user library] How will real users react?
  27. Summary ➔ what is Mendeley? ➔ crowdsourcing on a large scale ➔ recommendations on a large scale ➔ data for you
  28. Public Data: user libraries (50,000 libraries; 4,848,724 articles; 3,652,285 unique articles), library readership, library stars. Obtain from: http://dev.mendeley.com/datachallenge
  29. Mendeley's API
  30. www.mendeley.com
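Slide 15 lists the catalogue generator's deduplication steps without describing their internals. The Python sketch below walks each upload through the same cascade under loudly stated assumptions: the `Upload`, `CatalogueEntry` and `CatalogueGenerator` types are hypothetical, the "document fingerprint" is taken to be word 5-shingles compared by Jaccard overlap, the "metadata similarity check" is reduced to a fuzzy title match with `difflib.SequenceMatcher`, and both thresholds are arbitrary. It illustrates the ordering of the checks, not Mendeley's actual code.

```python
import hashlib
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical types, fingerprint scheme and thresholds; not Mendeley's catalogue code.


@dataclass
class Upload:
    pdf_bytes: bytes
    title: str
    pmid: Optional[str] = None   # e.g. a PubMed id, if the metadata carried one
    full_text: str = ""          # extracted text, if any


@dataclass
class CatalogueEntry:
    title: str
    pmid: Optional[str]
    fingerprint: frozenset
    file_hashes: set = field(default_factory=set)
    reader_count: int = 0        # one example of an aggregated statistic


def normalise(title: str) -> str:
    # Lowercase and drop punctuation so "Item-Based" matches "item based".
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def shingle_fingerprint(text: str, k: int = 5) -> frozenset:
    # Set of word k-shingles of the full text (one simple fingerprint scheme).
    words = re.findall(r"\w+", text.lower())
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CatalogueGenerator:
    def __init__(self, fp_threshold: float = 0.8, title_threshold: float = 0.9):
        self.entries: List[CatalogueEntry] = []
        self.by_hash: Dict[str, CatalogueEntry] = {}
        self.by_pmid: Dict[str, CatalogueEntry] = {}
        self.fp_threshold = fp_threshold
        self.title_threshold = title_threshold

    def _match(self, up: Upload, file_hash: str, fp: frozenset) -> Optional[CatalogueEntry]:
        # 1. Filehash check (SHA-1): a byte-identical file is already catalogued.
        if file_hash in self.by_hash:
            return self.by_hash[file_hash]
        # 2. Identifier check (e.g. PubMed id).
        if up.pmid and up.pmid in self.by_pmid:
            return self.by_pmid[up.pmid]
        # 3. Document fingerprint: near-identical full text.
        for entry in self.entries:
            if fp and jaccard(fp, entry.fingerprint) >= self.fp_threshold:
                return entry
        # 4. Metadata similarity check (here only a fuzzy title comparison).
        title = normalise(up.title)
        for entry in self.entries:
            if title and SequenceMatcher(None, title, normalise(entry.title)).ratio() >= self.title_threshold:
                return entry
        return None

    def add(self, up: Upload) -> CatalogueEntry:
        file_hash = hashlib.sha1(up.pdf_bytes).hexdigest()
        fp = shingle_fingerprint(up.full_text)
        entry = self._match(up, file_hash, fp)
        if entry is None:  # no duplicate found: start a new catalogue entry
            entry = CatalogueEntry(up.title, up.pmid, fp)
            self.entries.append(entry)
        # 5. Update the article page: index the new evidence and bump statistics.
        entry.file_hashes.add(file_hash)
        self.by_hash[file_hash] = entry
        if up.pmid:
            entry.pmid = entry.pmid or up.pmid
            self.by_pmid.setdefault(up.pmid, entry)
        entry.reader_count += 1
        return entry
```

The cheap exact checks run first so that the fuzzy comparisons only see uploads that are not already known. The linear scans in steps 3 and 4 are fine for a sketch but would need an index (for example locality-sensitive hashing) at catalogue scale.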
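Slides 19 and 20 describe item-based collaborative filtering on binary library data ("in library or not") with cooccurrence, log-likelihood and Tanimoto similarity, producing 10 recommendations per user. A minimal in-memory sketch follows, assuming each library is a set of article ids and that a candidate's score is the sum of its similarities to the articles the user already has. The class and method names are hypothetical, and "log-likelihood" is here taken to be Dunning's G² statistic on the 2×2 table of libraries containing both articles, only one, or neither.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def x_log_x(x: int) -> float:
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts: int) -> float:
    # Unnormalised Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    # Dunning's G^2 for a 2x2 contingency table; clamp tiny negative float error to 0.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


class ItemBasedRecommender:
    """Hypothetical item-based CF over binary data (an article is in a library or not)."""

    def __init__(self, libraries: Dict[str, Set[str]], metric: str = "cooccurrence"):
        self.libraries = libraries
        self.metric = metric
        self.n_users = len(libraries)
        self.item_count: Counter = Counter()                  # n_i: libraries holding article i
        self.cooc: Dict[str, Counter] = defaultdict(Counter)  # n_ij: libraries holding both
        for items in libraries.values():
            self.item_count.update(items)
            for a, b in combinations(sorted(items), 2):
                self.cooc[a][b] += 1
                self.cooc[b][a] += 1

    def similarity(self, i: str, j: str) -> float:
        n_ij = self.cooc[i][j]
        n_i, n_j = self.item_count[i], self.item_count[j]
        if self.metric == "cooccurrence":
            return float(n_ij)
        if self.metric == "tanimoto":
            union = n_i + n_j - n_ij
            return n_ij / union if union else 0.0
        if self.metric == "loglikelihood":
            return log_likelihood_ratio(n_ij, n_i - n_ij, n_j - n_ij,
                                        self.n_users - n_i - n_j + n_ij)
        raise ValueError(f"unknown metric: {self.metric}")

    def recommend(self, user: str, n: int = 10) -> List[Tuple[str, float]]:
        owned = self.libraries[user]
        scores: Counter = Counter()
        for i in owned:
            for j in self.cooc[i]:  # only articles that co-occur with something the user owns
                if j not in owned:
                    scores[j] += self.similarity(i, j)
        # Highest score first; ties broken by article id for deterministic output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For example, with libraries `{"u1": {"a", "b", "c"}, "u2": {"a", "b"}, "u3": {"a", "b", "d"}, "u4": {"c", "d"}}` and the cooccurrence metric, user `u2` is offered `c` and `d` with a score of 2.0 each. Swapping the metric string changes only how each co-occurring pair is weighted, which mirrors the slides' comparison of several similarity metrics over the same binary input.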
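Slides 20 to 22 report precision at 10 from 10-fold cross validation but do not say how the folds were drawn. The sketch below assumes one common protocol: each user's library is split into ten parts, one part at a time is hidden, the recommender is refitted on what remains, and precision@10 is the share of the ten recommendation slots that land in the hidden part. `fit_cooccurrence` and `cross_validate` are hypothetical names, and the recommender is a bare stand-in that scores candidates by raw co-occurrence counts.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Set

Libraries = Dict[str, Set[str]]
Recommend = Callable[[str, int], List[str]]


def precision_at_k(recommended: List[str], relevant: Set[str], k: int = 10) -> float:
    # Fraction of the k slots that hit a held-out article (empty slots count as misses).
    return sum(1 for item in recommended[:k] if item in relevant) / k


def fit_cooccurrence(train: Libraries) -> Recommend:
    # Hypothetical stand-in recommender: rank candidates by summed co-occurrence counts.
    cooc: Dict[str, Counter] = defaultdict(Counter)
    for items in train.values():
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    def recommend(user: str, k: int) -> List[str]:
        owned = train.get(user, set())
        scores: Counter = Counter()
        for i in owned:
            for j, n in cooc[i].items():
                if j not in owned:
                    scores[j] += n
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [j for j, _ in ranked[:k]]

    return recommend


def cross_validate(libraries: Libraries, fit: Callable[[Libraries], Recommend],
                   folds: int = 10, k: int = 10, seed: int = 0) -> float:
    """Mean precision@k over folds of each user's library (an assumed protocol)."""
    rng = random.Random(seed)
    assignment = {}  # (user, article) -> fold index
    for user, items in libraries.items():
        shuffled = sorted(items)
        rng.shuffle(shuffled)
        for pos, item in enumerate(shuffled):
            assignment[(user, item)] = pos % folds
    precisions = []
    for fold in range(folds):
        train = {u: {i for i in items if assignment[(u, i)] != fold}
                 for u, items in libraries.items()}
        recommend = fit(train)
        for user, items in libraries.items():
            hidden = {i for i in items if assignment[(user, i)] == fold}
            if hidden and train[user]:  # score only users with something hidden and something left
                precisions.append(precision_at_k(recommend(user, k), hidden, k))
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Because precision@10 always divides by ten, a user with h held-out articles can score at most min(h, 10)/10. Small libraries therefore have a low ceiling under this protocol, which is worth bearing in mind when reading the precision-by-library-size chart on slide 26.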
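Slide 25 says the scaled-up recommender is completely distributed and runs on EC2, without naming a framework. Counting article co-occurrences, the core of the cooccurrence metric, splits naturally into MapReduce-style phases. The sketch below imitates the map, combine, shuffle and reduce phases in plain Python on one machine; the function names are hypothetical and it shows only the data flow, not a real cluster job.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, str]


def map_library(items: Set[str]) -> Iterator[Tuple[Pair, int]]:
    # Map: one user library -> ((article_i, article_j), 1) for every pair it contains.
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def combine(pairs: Iterable[Tuple[Pair, int]]) -> Counter:
    # Combiner: pre-sum counts inside one partition to cut shuffle traffic.
    local: Counter = Counter()
    for key, n in pairs:
        local[key] += n
    return local


def shuffle(partials: List[Counter], num_reducers: int) -> List[Dict[Pair, List[int]]]:
    # Shuffle: route every key to one reducer by hash, grouping its partial counts.
    buckets: List[Dict[Pair, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, n in partial.items():
            buckets[hash(key) % num_reducers][key].append(n)
    return buckets


def reduce_bucket(bucket: Dict[Pair, List[int]]) -> Dict[Pair, int]:
    # Reduce: total co-occurrence count per article pair.
    return {key: sum(values) for key, values in bucket.items()}


def cooccurrence_mapreduce(partitions: List[List[Set[str]]], num_reducers: int = 4) -> Dict[Pair, int]:
    # In a real deployment each partition and each reducer would run on a separate worker.
    partials = [combine(pair for library in part for pair in map_library(library))
                for part in partitions]
    counts: Dict[Pair, int] = {}
    for bucket in shuffle(partials, num_reducers):
        counts.update(reduce_bucket(bucket))  # reducers own disjoint keys, so nothing is overwritten
    return counts
```

A library of n articles emits n(n-1)/2 pairs, so map output grows quadratically with library size; the per-partition combiner shrinks that output before it crosses the network.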
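Taking the slide 28 figures at face value, and assuming that "4,848,724 articles" counts every library entry while "3,652,285 unique articles" counts distinct articles, the shape of the public dataset works out as:

```latex
\frac{4{,}848{,}724}{50{,}000} \approx 97 \ \text{articles per library}, \qquad
\frac{3{,}652{,}285}{4{,}848{,}724} \approx 0.753, \qquad
4{,}848{,}724 - 3{,}652{,}285 = 1{,}196{,}439
```

Under that reading, about 1.2 million entries (roughly a quarter) repeat an article already counted elsewhere in the dataset.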