
HBase at Mendeley

The details behind how and why we use HBase in the data mining team at Mendeley.



Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved


  • danharvey Dan Harvey, Lead Engineer, Data Mining at Mendeley: I believe HBase has now improved to the point where you can serve directly from it. You still need to be careful with load management on your cluster, though, so that MapReduce tasks don't soak up all the I/O! 3 months ago
  • AlexMcLintock Alex McLintock, CTO, Tech Guru at Openweb Analysts Ltd: Dan, would you still use Voldemort as a front-end cache for HBase now, or something else? 3 months ago
  • danharvey Dan Harvey, Lead Engineer, Data Mining at Mendeley: I was asked this during the presentation and on Twitter: How come you are putting data in #Voldemort for serving #Mendeley data? Why not serve it directly from #HBase?

    This is mostly because, at this point in time, we don't have as much data to serve as we have to process, so we can get away with less hardware right now. We tried it out, and serving from HBase works fine as long as you are not running a lot of MapReduce jobs over it, as we do. Over time, as we grow and add more features that use HBase in more interesting ways, I'm sure we'll be using it for serving too. Then we'll need a cluster just for serving, and we'll use the replication in 0.90 to link the two together. Far cleaner than writing your own code to do that.
    2 years ago
  • rdmpage Roderic Page, Professor of Taxonomy at University of Glasgow: Great to see some technical details about the Mendeley back-end. 2 years ago

HBase at Mendeley: Presentation Transcript