Mendeley, putting data into the hands of researchers

I was invited to give a keynote presentation at the RecSysTEL Workshop (http://bit.ly/b2Bg2J) on 2010/09/30.

It presents Mendeley's tools for researchers and the data sets that we made available for the dataTEL challenge, which was designed to provide new large-scale data for researchers in recommendation systems.

The event was really enjoyable and the participants were excited about Mendeley.

  1. Mendeley, putting data into the hands of researchers. Kris Jack, PhD, Data Mining Team Coordinator
  2. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”
  3. Summary: ➔ idea behind Mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?
  4. Mendeley and Last.fm. Last.fm works like this: 1) install “Audioscrobbler”; 2) listen to music; 3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database.
  5. Last.fm → Mendeley: music libraries → research libraries, artists → researchers, songs → papers, genres → disciplines.
  6. Summary: ➔ idea behind Mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?
  7. Mendeley helps researchers work smarter
  8. Mendeley helps researchers work smarter: install Mendeley Desktop, and Mendeley extracts research data...
  9. Mendeley helps researchers work smarter: Mendeley extracts research data... and aggregates research data in the cloud
  10. By doing this, Mendeley makes science more collaborative and transparent
  11. Summary: ➔ idea behind Mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?
  12. 500,000+ users and 39,000,000+ articles. The 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  13. We can only use algorithms that scale up: readership statistics, search, most frequent tags, related research, + dozens of other services
  14. Most frequent tags on our scale (services: readership statistics, search, most frequent tags, related research)
  15. Most frequent tags on our scale:
        for each document              (called 39,000,000 times)
          for each tag in document     (called ~3 times per document)
            increment count for tag    (called ~39,000,000 × 3 = ~117,000,000 times)
        sort tags by frequency
  16. Solution: distributed computing with map reduce:
        for each document
          for each tag in document
            increment count for tag
        sort tags by frequency
        for each tag counted
          emit the tag and frequency
      (Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.)
  17. Solution: distributed computing with Hadoop (Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.)
  18. Support vector machines, hidden Markov models
  19. Conditional random fields (Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of LREC '08, Marrakesh, Morocco.)
  20. Deduplication: crowdsourcing new articles from users; file hash check, metadata comparison and document fingerprinting; collapse metadata and update canonical docs; 39,000,000 canonical documents
  21. Statistics: Pig
  22. ReaderRank
  23. Currently: tf-idf similarity between documents. Developing: collaborative filtering.
  24. Contact recommendations: currently based on the contact network; developing a version based on interests.
  25. Summary: ➔ idea behind Mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?
  26. Access to data
  27. Online catalog and dataTEL data set: online article view logs, article tags, library readership, library stars
  28. Mendeley’s API
  29. *New*: you can get all of the articles in a group. Data for you to test related-research algorithms?
  30. Mendeley’s API: mashups with data on chemical compounds, locations, Alzheimer’s research, grant funding and Twitter streams
  31. Want more? Let us know...
  32. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”
  33. www.mendeley.com. We’re hiring!
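The naive tag-counting loop on slide 15 can be written out as a short Python sketch. The document shape (a dict with a "tags" list) and the function name are hypothetical, chosen only to show why the work grows with the number of documents times the tags per document.

```python
from collections import Counter

def most_frequent_tags(documents, top_n=10):
    """Single-machine tag count (hypothetical record shape: {"tags": [...]})."""
    counts = Counter()
    for doc in documents:                  # runs once per document (~39M on the slide)
        for tag in doc.get("tags", []):    # ~3 tags per document
            counts[tag] += 1               # ~117M increments in total
    # Sort tags by frequency, highest first; ties keep first-seen order
    return counts.most_common(top_n)

docs = [
    {"tags": ["genomics", "cancer"]},
    {"tags": ["cancer", "imaging", "mri"]},
    {"tags": ["cancer"]},
]
print(most_frequent_tags(docs, top_n=2))  # [('cancer', 3), ('genomics', 1)]
```

Everything here runs in one process, so both the counting and the final sort are bounded by a single machine's time and memory.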
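The distributed version on slides 16 and 17 can be illustrated with a minimal, single-process simulation of MapReduce, assuming the same hypothetical document shape. It uses the canonical word-count pattern from Dean and Ghemawat: map emits (tag, 1) and reduce sums per tag. This may differ in detail from the job Mendeley actually ran on Hadoop. On a real cluster the framework distributes the map and reduce calls and groups intermediate pairs by key; the `shuffle` helper here only stands in for that step.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit (tag, 1) for every tag in one document."""
    for tag in doc.get("tags", []):
        yield tag, 1

def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(tag, counts):
    """Reduce: sum the partial counts for one tag."""
    return tag, sum(counts)

def tag_frequencies(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    results = [reduce_phase(tag, counts) for tag, counts in grouped.items()]
    # Sort by frequency (descending), then by tag name for a stable order
    return sorted(results, key=lambda kv: (-kv[1], kv[0]))
```

Each `map_phase` call depends on only one document, and each `reduce_phase` call depends on only one tag. That independence is what lets the work be spread across many machines.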
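Two of the deduplication checks named on slide 20, the file hash check and metadata comparison, can be sketched as follows. This is a hypothetical illustration: the `Catalog` class, SHA-1 as the hash, and a title-plus-year key are all assumptions, and document fingerprinting is omitted. A new upload is matched first by exact file hash, then by normalised metadata. If neither matches, it becomes a new canonical document.

```python
import hashlib
import re
from typing import Dict, Optional, Tuple

Key = Tuple[str, Optional[int]]

def file_hash(data: bytes) -> str:
    """Exact-duplicate check: identical files give identical digests (SHA-1 assumed)."""
    return hashlib.sha1(data).hexdigest()

def metadata_key(title: str, year: Optional[int]) -> Key:
    """Hypothetical metadata key: lower-cased title with punctuation collapsed, plus year."""
    normalised = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return normalised, year

class Catalog:
    """Hypothetical catalog of canonical documents."""

    def __init__(self) -> None:
        self.by_hash: Dict[str, int] = {}
        self.by_metadata: Dict[Key, int] = {}
        self.copies: Dict[int, int] = {}  # canonical id -> number of user copies

    def add(self, data: bytes, title: str, year: Optional[int]) -> int:
        h = file_hash(data)
        key = metadata_key(title, year)
        # 1) exact file match, 2) metadata match, 3) otherwise a new canonical doc
        doc_id = self.by_hash.get(h)
        if doc_id is None:
            doc_id = self.by_metadata.get(key)
        if doc_id is None:
            doc_id = len(self.copies)
            self.copies[doc_id] = 0
        # Collapse: point this copy's hash and metadata at the canonical document
        self.by_hash.setdefault(h, doc_id)
        self.by_metadata.setdefault(key, doc_id)
        self.copies[doc_id] += 1
        return doc_id
```

For example, uploading the same PDF twice, and then a different scan titled "MapReduce - simplified data processing!", all resolve to the same canonical id. An unrelated paper gets a new id.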
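Slide 23 says related research is currently based on tf-idf similarity between documents. The sketch below shows one common way to do this: tf-idf weighting with cosine similarity. The tokenisation, weighting variant (relative term frequency × log(N/df)) and function names are assumptions, not Mendeley's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]

def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Weight each term by (relative term frequency) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors: List[Vector] = []
    for doc in docs:
        if not doc:
            vectors.append({})  # empty documents get an empty vector
            continue
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def related(docs: List[List[str]], query: int, top_n: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the documents most similar to `query`."""
    vectors = tfidf_vectors(docs)
    scores = [(i, cosine(vectors[query], v)) for i, v in enumerate(vectors) if i != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]
```

Comparing every document with every other is quadratic. At the 39,000,000-document scale on slide 12, this would itself need to be distributed or approximated, in line with slide 13's point about algorithms that scale.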
