Choosing the right NOSQL database

My presentation from JavaOne 2010 on how to choose the right NOSQL database.

1. Choosing the Right NOSQL Database
    Tobias Ivarsson, Hacker @ Neo Technology
    twitter: @thobe / @neo4j / #neo4j
    email: tobias@neotechnology.com
    web: http://neo4j.org/ and http://thobe.org/
2. Image credit: http://browsertoolkit.com/fault-tolerance.png
3. (The same image as slide 2.)
4. This is still the view a lot of people have of NOSQL. (Same image.)
5. The Technologies
    - Graph Databases - Neo4j
    - Document Databases - MongoDB
    - ColumnFamily Databases - Cassandra
6. Neo4j is a Graph Database. Graph databases FOCUS on the interconnection between entities.
7. The same statement drawn as a graph: Neo4j -[IS_A]-> Graph Database. Graph databases FOCUS on the interconnection between entities.
8. Other Graph Databases
    - Neo4j
    - Sones GraphDB
    - InfiniteGraph (by Objectivity)
    - AllegroGraph (by Franz Inc.)
    - HypergraphDB
    - InfoGrid
    - DEX
    - VertexDB
    - FlockDB
9. Document Databases
10. Document Databases
    - MongoDB
    - Riak
    - CouchDB
    - SimpleDB (hosted by Amazon)
11. ColumnFamily DBs
12. ColumnFamily Databases
    - Cassandra
    - BigTable (internal at Google)
    - HBase (part of Hadoop)
    - Hypertable
13. Application 1: Blog system
14. Requirements for a Blog System
    - Get blog posts for a specific blog ordered by date
        - possibly filtered by tag
    - Blogs can have an arbitrary number of blog posts
    - Blog posts can have an arbitrary number of comments
15. The choice: Document DB
16. "Schema" design
    - Represent each Blog as a Collection of Post documents
    - Represent Comments as nested documents in the Post documents
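The document schema can be illustrated without a running database. The following is a minimal in-memory sketch, assuming a plain Python list of dicts stands in for one MongoDB collection of post documents; `create_post` and `posts_by_tag` are hypothetical helpers, not driver API, and the posts are made-up sample data. It shows comments nested inside each post and the tag filter with newest-first ordering from the blog requirements.

```python
from datetime import datetime

# Hypothetical stand-in for the "myBlog" collection: a list of post documents.
my_blog = []

def create_post(blog, title, body, tags, pub_date):
    """Add a post document; its comments live inside it as a nested list."""
    post = {
        "title": title,
        "pub_date": pub_date,
        "body": body,
        "tags": list(tags),   # multi-valued field stored directly in the document
        "comments": [],       # nested comment documents, no separate table
    }
    blog.append(post)
    return post

def posts_by_tag(blog, tag=None):
    """Posts carrying `tag` (all posts if tag is None), newest first.

    A tag matches if it is one of the values in the post's "tags" list, which is
    how a MongoDB query such as {"tags": tag} treats an array field.
    """
    matching = [p for p in blog if tag is None or tag in p["tags"]]
    return sorted(matching, key=lambda p: p["pub_date"], reverse=True)

# Sample data for illustration only.
javaone = create_post(my_blog, "JavaOne 2010",
                      "Publishing a post about JavaOne in my MongoDB blog!",
                      ["conference", "java"], datetime(2010, 9, 20))
create_post(my_blog, "NOSQL notes", "Graphs, documents and column families.",
            ["nosql"], datetime(2010, 9, 22))
javaone["comments"].append({"message": "Nice talk!", "date": datetime(2010, 9, 21)})
```

Because a post and its comments form one document, rendering a post with all its comments needs a single lookup; the trade-off is that every new comment rewrites part of the post.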
17. Creating a blog post
        import com.mongodb.Mongo;
        import com.mongodb.DB;
        import com.mongodb.DBCollection;
        import com.mongodb.BasicDBObject;
        import com.mongodb.DBObject;
        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.Date;
        // ...
        Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB
        // ...
        DB blogs = mongo.getDB( "blogs" ); // Access the blogs database
        DBCollection myBlog = blogs.getCollection( "myBlog" );
        DBObject blogPost = new BasicDBObject();
        blogPost.put( "title", "JavaOne 2010" );
        blogPost.put( "pub_date", new Date() );
        blogPost.put( "body", "Publishing a post about JavaOne in my MongoDB blog!" );
        blogPost.put( "tags", Arrays.asList( "conference", "java" ) );
        blogPost.put( "comments", new ArrayList<DBObject>() ); // nested comments start empty
        myBlog.insert( blogPost );
18. Retrieving posts
        // ...
        import com.mongodb.DBCursor;
        // ...
        // db is assumed to be the "blogs" DB obtained on the previous slide
        public Object getAllPosts( String blogName ) {
            DBCollection blog = db.getCollection( blogName );
            return renderPosts( blog.find() );
        }
        public Object getPostsByTag( String blogName, String tag ) {
            DBCollection blog = db.getCollection( blogName );
            // matches posts whose "tags" array contains the given tag
            return renderPosts( blog.find( new BasicDBObject( "tags", tag ) ) );
        }
        private Object renderPosts( DBCursor cursor ) {
            // order by publication date (descending)
            cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );
            // ...
        }
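19. Adding a comment
        import java.util.Date;
        import java.util.List;
        import java.util.NoSuchElementException;
        import org.bson.types.ObjectId;
        // ...
        DBCollection myBlog = blogs.getCollection( "myBlog" );
        // ...
        void addComment( String blogPostId, String message ) {
            // Driver-generated _id values are ObjectIds, so convert the string id
            DBCursor posts = myBlog.find( new BasicDBObject( "_id", new ObjectId( blogPostId ) ) );
            if ( !posts.hasNext() ) throw new NoSuchElementException();
            DBObject blogPost = posts.next();
            List comments = (List) blogPost.get( "comments" );
            comments.add( new BasicDBObject( "message", message )
                    .append( "date", new Date() ) );
            myBlog.save( blogPost ); // writes the whole (modified) document back
        }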
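The `find`/`save` pair above is a read-modify-write: `save` writes the client's entire copy of the post back. The sketch below is a hypothetical in-memory model (plain dicts, no MongoDB, invented helper names) that replays one unlucky interleaving of two clients to show how a comment can be lost that way, and how an in-place append avoids it. Sequential Python cannot show real concurrency; it only reproduces the ordering of reads and writes. On a real server the in-place append corresponds to MongoDB's `$push` update operator, for example `update_one({"_id": post_id}, {"$push": {"comments": comment}})` in pymongo; treat that as a pointer, not a tested call.

```python
import copy
from datetime import datetime

# Hypothetical in-memory "collection": _id -> stored post document.
store = {"post-1": {"_id": "post-1", "title": "JavaOne 2010", "comments": []}}

def find_one(post_id):
    # A client works on its own copy of the document, as after a find().
    return copy.deepcopy(store[post_id])

def save(doc):
    # Like save(): the client's whole copy replaces the stored document.
    store[doc["_id"]] = doc

def push_comment(post_id, message):
    # In-place append on the stored document; this models what a server-side
    # $push does, so no client copy is written back over other changes.
    store[post_id]["comments"].append({"message": message, "date": datetime.now()})

# Two clients interleave read-modify-write cycles on the same post.
client_a, client_b = find_one("post-1"), find_one("post-1")
client_a["comments"].append({"message": "first", "date": datetime.now()})
client_b["comments"].append({"message": "second", "date": datetime.now()})
save(client_a)
save(client_b)  # overwrites client_a's version, so "first" is lost
after_saves = [c["message"] for c in store["post-1"]["comments"]]

push_comment("post-1", "third")
push_comment("post-1", "fourth")
after_pushes = [c["message"] for c in store["post-1"]["comments"]]
```

Here `after_saves` ends up as `["second"]`, while both pushed comments survive in `after_pushes`.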
20. Application 2: Twitter Clone
21. Requirements for a Twitter Clone
    - Handle high load - especially high write load
        - Twitter generates 300GB of tweets / hour (April 2010)
    - Retrieve all posts by a specific user, ordered by date
    - Retrieve all posts by people a specific user follows, ordered by date
22. The choice: ColumnFamily DB
23. Schema design
    - Main keyspace: "Twissandra", with these ColumnFamilies:
        - User - user data, keyed by user id (UUID)
        - Username - inverted index from username to user id
        - Friends - who is user X following?
        - Followers - who is following user X?
        - Tweet - the actual messages
        - Userline - timeline of tweets posted by a specific user
        - Timeline - timeline of tweets posted by users that a specific user follows
24. ... that's a lot of denormalization ...
    - ColumnFamilies are similar to tables in an RDBMS
    - Each ColumnFamily can only have one key (the row key)
    - This makes the data highly shardable
    - Which in turn enables very high write throughput
    - Note, however, that each ColumnFamily requires its own writes
        - There are no ACID transactions
        - YOU as a developer are responsible for consistency!
        - (again, this gives you really high write throughput)
25. Create user
        import time
        import uuid
        new_useruuid = str(uuid.uuid4())
        USER.insert(new_useruuid, {'id': new_useruuid, 'username': username, 'password': password})
        USERNAME.insert(username, {'id': new_useruuid})
    Follow user
        FRIENDS.insert(useruuid, {frienduuid: time.time()})
        FOLLOWERS.insert(frienduuid, {useruuid: time.time()})
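To make the denormalization concrete, here is a minimal in-memory sketch, assuming each ColumnFamily is modelled as a Python dict from row key to a dict of columns. `insert`, `create_user` and `follow` are hypothetical stand-ins, not the pycassa API. Creating a user and following someone each take two independent writes, and nothing here (nor in Cassandra) makes such a pair atomic.

```python
import time
import uuid

# Hypothetical in-memory ColumnFamilies: row key -> {column name: value}.
USER, USERNAME, FRIENDS, FOLLOWERS = {}, {}, {}, {}

def insert(cf, key, columns):
    """Merge `columns` into row `key`, creating the row if needed (an upsert)."""
    cf.setdefault(key, {}).update(columns)

def create_user(username, password):
    user_id = str(uuid.uuid4())
    # Password kept in plain text only to mirror the slide; store a hash in practice.
    insert(USER, user_id, {"id": user_id, "username": username, "password": password})
    # Second, independent write: the username -> id inverted index.
    insert(USERNAME, username, {"id": user_id})
    return user_id

def follow(user_id, friend_id):
    now = time.time()
    # The same fact is stored twice, once per query direction.
    insert(FRIENDS, user_id, {friend_id: now})     # whom does user_id follow?
    insert(FOLLOWERS, friend_id, {user_id: now})   # who follows friend_id?

alice = create_user("alice", "secret1")
bob = create_user("bob", "secret2")
follow(alice, bob)
```

Both "whom do I follow?" and "who follows me?" become single-row reads; if a process dies between the two writes in `follow`, the rows disagree, and repairing that is the application's job.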
26. Create message
        import struct
        tweetuuid = str(uuid.uuid4())
        timestamp = int(time.time() * 1e6)  # microseconds
        TWEET.insert(tweetuuid, {'id': tweetuuid, 'user_id': useruuid, 'body': body, '_ts': timestamp})
        # column name: packed timestamp (sorts by time); column value: the tweet's id
        message_ref = {struct.pack('>d', timestamp): tweetuuid}
        USERLINE.insert(useruuid, message_ref)
        TIMELINE.insert(useruuid, message_ref)
        # fan out: copy the reference into each follower's timeline
        for otheruuid in FOLLOWERS.get(useruuid, column_count=5000):
            TIMELINE.insert(otheruuid, message_ref)
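The loop over `FOLLOWERS` is fan-out on write: posting costs one timeline write per follower, so that reading a timeline later is a single-row lookup. A hypothetical in-memory version follows, again modelling each ColumnFamily as a dict of rows; the helper names and the follower data are invented for illustration, and the optional `timestamp` parameter exists only to make the example deterministic.

```python
import struct
import time
import uuid

# Hypothetical in-memory ColumnFamilies: row key -> {column name: value}.
TWEET, USERLINE, TIMELINE = {}, {}, {}
# Sample data: the FOLLOWERS row for "bob" lists the users who follow him.
FOLLOWERS = {"bob": {"alice": 0.0, "carol": 0.0}}

def insert(cf, key, columns):
    cf.setdefault(key, {}).update(columns)

def post_tweet(user_id, body, timestamp=None):
    if timestamp is None:
        timestamp = int(time.time() * 1e6)          # microseconds, as on the slide
    tweet_id = str(uuid.uuid4())
    insert(TWEET, tweet_id, {"id": tweet_id, "user_id": user_id, "body": body, "_ts": timestamp})
    # Column name = big-endian packed timestamp, which sorts chronologically for
    # positive values; column value = id of the tweet.
    message_ref = {struct.pack(">d", timestamp): tweet_id}
    insert(USERLINE, user_id, message_ref)
    insert(TIMELINE, user_id, message_ref)
    for follower_id in FOLLOWERS.get(user_id, {}):  # fan-out: one write per follower
        insert(TIMELINE, follower_id, message_ref)
    return tweet_id

first = post_tweet("bob", "first", timestamp=1_000_000)
second = post_tweet("bob", "second", timestamp=2_000_000)
# Reading a follower's timeline is now a single-row read, newest first.
alice_timeline = [TIMELINE["alice"][name] for name in sorted(TIMELINE["alice"], reverse=True)]
```

The cost moves to write time: a user with many followers triggers many timeline writes per post, which suits a store optimised for write throughput.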
27. Get messages
    For all users this user follows:
        # start: column to begin the page at ('' starts from the newest, since the slice is reversed)
        timeline = TIMELINE.get(useruuid, column_start=start, column_count=NUM_PER_PAGE, column_reversed=True)
        tweets = TWEET.multiget(timeline.values())
    By a specific user:
        timeline = USERLINE.get(useruuid, column_start=start, column_count=NUM_PER_PAGE, column_reversed=True)
        tweets = TWEET.multiget(timeline.values())
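Paging through a reversed slice is easy to get subtly wrong, because Cassandra slice bounds are inclusive: the column passed as `column_start` comes back again. One common approach, sketched here over a hypothetical in-memory row (a dict from timestamp to tweet id, with made-up data), is to request one column more than the page size and use that extra column as the start of the next page. `get_page` is an invented helper that only mimics the slice parameters.

```python
# Hypothetical TIMELINE row for one user: column name (timestamp) -> tweet id.
timeline_row = {100: "t1", 200: "t2", 300: "t3", 400: "t4", 500: "t5"}
# Hypothetical TWEET column family: tweet id -> tweet columns.
TWEET = {f"t{i}": {"body": f"tweet {i}"} for i in range(1, 6)}

def get_page(row, count, start=None):
    """Return (tweet ids newest first, start column for the next page or None).

    Mimics a slice with column_reversed=True, an inclusive column_start and
    column_count=count + 1 on a row whose columns are sorted by name.
    """
    names = sorted(row, reverse=True)
    if start is not None:
        names = [n for n in names if n <= start]   # inclusive start, walking back in time
    window = names[:count + 1]
    next_start = window[count] if len(window) > count else None
    return [row[n] for n in window[:count]], next_start

pages, start = [], None
while True:
    ids, start = get_page(timeline_row, 2, start)
    pages.append([TWEET[i]["body"] for i in ids])   # the multiget step
    if start is None:
        break
```

The extra column is never shown to the user; it only tells the next request where to begin and whether another page exists at all.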
28. Application 3: Social Network
29. Requirements for a Social Network
    - Interact with friends
    - Get recommendations for new friends
    - View the social context of a person, i.e. "How do I know this person?"
30. The choice: Graph DB
31. "Schema" design
    - Persons represented by Nodes
    - Friendship represented by Relationships between Person Nodes
    - Groups represented by Nodes
    - Group membership represented by a Relationship from Person Node to Group Node
    - Index for Person Nodes for lookup by name
    - Index for Group Nodes for lookup by name
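A hypothetical in-memory sketch of this schema follows, assuming Python dicts for node properties, a per-node list of typed relationships, and one dict per name index. The class and method names are invented for illustration and are not the Neo4j API; relationship direction is deliberately not kept.

```python
from collections import defaultdict

class SocialGraph:
    """Hypothetical minimal property graph following the schema above.

    Persons and groups are nodes with properties; FRIENDSHIP and MEMBERSHIP are
    typed relationships; two dicts play the role of the name indexes.
    """

    def __init__(self):
        self.nodes = {}                          # node id -> property dict
        self.rels = defaultdict(list)            # node id -> [(rel type, other node id)]
        self.index = {"persons": {}, "groups": {}}

    def create_node(self, index_name, name, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = {"name": name, **props}
        self.index[index_name][name] = node_id   # lookup by name
        return node_id

    def relate(self, start, end, rel_type):
        # Recorded on both endpoints so it can be followed from either side
        # (this sketch does not keep the relationship's direction).
        self.rels[start].append((rel_type, end))
        self.rels[end].append((rel_type, start))

    def neighbours(self, node_id, rel_type):
        return [other for t, other in self.rels[node_id] if t == rel_type]

g = SocialGraph()
anderson = g.create_node("persons", "Thomas Anderson", age=29)
morpheus = g.create_node("persons", "Morpheus", rank="Captain")
crew = g.create_node("groups", "Nebuchadnezzar crew")
g.relate(anderson, morpheus, "FRIENDSHIP")
g.relate(anderson, crew, "MEMBERSHIP")
g.relate(morpheus, crew, "MEMBERSHIP")

crew_id = g.index["groups"]["Nebuchadnezzar crew"]
crew_members = sorted(g.nodes[n]["name"] for n in g.neighbours(crew_id, "MEMBERSHIP"))
```

Relationships are stored with the nodes they connect, so "who is in this group?" is answered by following links from one node rather than by a join over a membership table.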
32. A small social graph example (figure): Person nodes Thomas Anderson, Morpheus, Trinity, Cypher, Tank, Dozer, Agent Smith and Agent Brown; Group nodes Nebuchadnezzar crew and Agent taskforce; FRIENDSHIP and MEMBERSHIP relationships, some friendships carrying a qualifier (Qualifier: Family, Qualifier: Lovers).
33. Creating the social graph
        // Application-defined relationship types
        enum SocialGraphTypes implements RelationshipType { FRIENDSHIP, MEMBERSHIP }

        GraphDatabaseService graphDb = new EmbeddedGraphDatabase( GRAPH_STORAGE_LOCATION );
        IndexService indexes = new LuceneIndexService( graphDb );
        Transaction tx = graphDb.beginTx();
        try {
            Node mrAnderson = graphDb.createNode();
            mrAnderson.setProperty( "name", "Thomas Anderson" );
            mrAnderson.setProperty( "age", 29 );
            indexes.index( mrAnderson, "persons", "Thomas Anderson" );
            Node morpheus = graphDb.createNode();
            morpheus.setProperty( "name", "Morpheus" );
            morpheus.setProperty( "rank", "Captain" );
            indexes.index( morpheus, "persons", "Morpheus" ); // index the morpheus node
            Relationship friendship = mrAnderson.createRelationshipTo( morpheus, SocialGraphTypes.FRIENDSHIP );
            tx.success();
        } finally {
            tx.finish();
        }
34. Making new friends
        // (inside a transaction, as on the previous slide)
        Node person1 = indexes.getSingleNode( "persons", person1Name );
        Node person2 = indexes.getSingleNode( "persons", person2Name );
        person1.createRelationshipTo( person2, SocialGraphTypes.FRIENDSHIP );
    Joining a group
        Node person = indexes.getSingleNode( "persons", personName );
        Node group = indexes.getSingleNode( "groups", groupName );
        person.createRelationshipTo( group, SocialGraphTypes.MEMBERSHIP );
35. How do I know this person?
        Node me = ...
        Node you = ...
        PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath(
                Traversal.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
                /* maximum depth: */ 4 );
        Path shortestPath = shortestPathFinder.findSinglePath( me, you );
        if ( shortestPath != null ) { // null when no path exists within the maximum depth
            for ( Node friend : shortestPath.nodes() ) {
                System.out.println( friend.getProperty( "name" ) );
            }
        }
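`GraphAlgoFactory.shortestPath` performs the search for you. To show what such a search involves, here is a plain breadth-first search with a depth limit, assuming friendships are given as a dict from each person to the set of their friends. It is a hypothetical illustration, not Neo4j's implementation, and the friendships are made-up sample data using names from the example graph.

```python
from collections import defaultdict, deque

# Hypothetical sample friendships between people from the example graph.
FRIENDSHIPS = [
    ("Thomas Anderson", "Morpheus"), ("Thomas Anderson", "Trinity"),
    ("Morpheus", "Trinity"), ("Morpheus", "Cypher"), ("Morpheus", "Tank"),
    ("Tank", "Dozer"), ("Cypher", "Agent Smith"),
]
friends = defaultdict(set)
for a, b in FRIENDSHIPS:          # Direction.BOTH: treat each friendship as undirected
    friends[a].add(b)
    friends[b].add(a)

def shortest_path(friends, start, goal, max_depth=4):
    """Breadth-first search; returns the list of names on a shortest path, or None.

    Paths longer than `max_depth` relationships are not explored.
    """
    if start == goal:
        return [start]
    previous = {start: None}      # also serves as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for friend in friends.get(node, ()):
            if friend in previous:
                continue
            previous[friend] = node
            if friend == goal:
                # Walk the predecessor links back to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            frontier.append((friend, depth + 1))
    return None
```

Because breadth-first search reaches every node first along a shortest route, the first time `goal` is discovered the recorded predecessors spell out a shortest path; the depth limit bounds how much of the graph is explored when the two people are far apart or unconnected.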
36. Recommend new friends
        Node person = ...
        TraversalDescription friendsOfFriends = Traversal.description()
                .expand( Traversal.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
                .prune( Traversal.pruneAfterDepth( 2 ) )
                .breadthFirst() // Visit my friends before their friends.
                // Visit a node at most once (don't recommend direct friends)
                .uniqueness( Uniqueness.NODE_GLOBAL )
                .filter( new Predicate<Path>() {
                    // Only return friends of friends
                    public boolean accept( Path traversalPos ) {
                        return traversalPos.length() == 2;
                    }
                } );
        for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) {
            System.out.println( recommendation.getProperty( "name" ) );
        }
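The traversal above relies on the interplay between breadth-first order and `Uniqueness.NODE_GLOBAL`: direct friends are reached at depth 1 and marked as visited, so they cannot reappear at depth 2. A hypothetical plain-Python sketch of the same idea, over made-up sample friendships (the function name is invented for illustration):

```python
from collections import defaultdict

# Hypothetical sample friendships between people from the example graph.
FRIENDSHIPS = [
    ("Thomas Anderson", "Morpheus"), ("Thomas Anderson", "Trinity"),
    ("Morpheus", "Trinity"), ("Morpheus", "Cypher"), ("Morpheus", "Tank"),
    ("Tank", "Dozer"), ("Cypher", "Agent Smith"),
]
friends = defaultdict(set)
for a, b in FRIENDSHIPS:          # friendships are followed in both directions
    friends[a].add(b)
    friends[b].add(a)

def recommend_friends(friends, person):
    """Breadth-first to depth 2, visiting each node at most once (like NODE_GLOBAL).

    Direct friends are claimed at depth 1, so only friends of friends who are
    not already friends are left at depth 2.
    """
    visited = {person}
    frontier = [person]
    for _ in range(2):                       # prune after depth 2
        next_frontier = []
        for node in frontier:
            for friend in sorted(friends.get(node, ())):
                if friend not in visited:
                    visited.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return frontier                          # exactly the nodes first reached at depth 2
```

With depth-first order the same uniqueness rule would not be enough: a direct friend could first be reached through a longer path, be recorded at depth 2, and then be recommended.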
37. When to use Document DB (e.g. MongoDB)
    - When data is collections of similar entities
        - But semi-structured (sparse) rather than tabular
        - When fields in entries have multiple values
38. When to use ColumnFamily DB (e.g. Cassandra)
    - When scalability is the main issue
        - Both scaling size and scaling load
            - In particular scaling write load
    - Linear scalability (as you add servers) both in read and write
    - Low level - will require you to duplicate data to support queries
39. When to use Graph DB (e.g. Neo4j)
    - When deep traversals are important
    - For complex and domains
    - When how entities relate is an important aspect of the domain
40. When not to use a NOSQL Database
    - RDBMSes have been the de-facto standard for years, and still have better tools for some tasks
        - Especially for reporting
    - When maintaining a system that already works
    - Sometimes when data is uniform / structured
    - When aggregations over (subsets of) the entire dataset are key
    - But please don't use a Relational database for persisting objects
41. Complex problem? Right tool for each job! (Image credits: unknown)
42. Polyglot persistence
    - Use multiple databases in the same system - use the right tool for each part of the system
    - Examples:
        - Use an RDBMS for structured data and a Graph Database for modeling the relationships between entities
        - Use a Graph Database for the domain model and a Document Database for storing large data objects
43. Neo Technology - the Graph Database company: http://neotechnology.com
