Choosing the right NOSQL database


Tobias Ivarsson
Hacker @ Neo Technology
twitter: @thobe / @neo4j / #neo4j
email: tobias@neotechnology.com
web: http://neo4j.org/
web: http://thobe.org/

Image credit: http://browsertoolkit.com/fault-tolerance.png

2


3

This is still the view a lot
of people have of NOSQL.


4

The Technologies

๏ Graph Databases
- Neo4j

๏ Document Databases
- MongoDB

๏ Column Family Databases
- Cassandra
5

Neo4j is a Graph Database
Graph databases FOCUS on the interconnection between entities.

6


Other Graph Databases
๏ Neo4j
๏ Sones GraphDB
๏ Infinite Graph (by Objectivity)
๏ AllegroGraph (by Franz Inc.)
๏ HypergraphDB
๏ InfoGrid
๏ DEX
๏ VertexDB
๏ FlockDB
7

Document Databases

8

Document Databases
๏ MongoDB
๏ Riak
๏ CouchDB
๏ SimpleDB (hosted service from Amazon)

9

ColumnFamily DBs

10

ColumnFamily Databases
๏ Cassandra
๏ BigTable (internal at Google)
๏ HBase (part of Hadoop)
๏ Hypertable

11

Application 1:
Blog system

12

Requirements for a Blog System
๏ Get blog posts for a specific blog ordered by date
• possibly filtered by tag
๏ Blogs can have an arbitrary number of blog posts
๏ Blog posts can have an arbitrary number of comments

13

the choice:
Document DB

14

“Schema” design

๏ Represent each Blog as a Collection of Post documents

๏ Represent Comments as nested documents in the Post documents
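
As a rough illustration of this layout, a single Post document with nested comments might look like the Python dictionary below. The field names mirror those used in the Java snippets of this deck; the dates, the comment text and the `add_comment` helper are made up for the example.

```python
from datetime import datetime, timezone

# One Post document in the "myBlog" collection (one collection per blog).
post = {
    "title": "JavaOne 2010",
    "pub_date": datetime(2010, 9, 20, tzinfo=timezone.utc),  # illustrative date
    "body": "Publishing a post about JavaOne in my MongoDB blog!",
    "tags": ["conference", "java"],
    # Comments live inside the post, so one read fetches the whole thread.
    "comments": [
        {"message": "Nice talk!", "date": datetime(2010, 9, 21, tzinfo=timezone.utc)},
    ],
}


def add_comment(post, message, date):
    """Append a nested comment document to a post (hypothetical helper)."""
    post["comments"].append({"message": message, "date": date})
    return post
```

Because comments are nested, there is no separate comments collection to join against; the trade-off is that adding a comment rewrites the post document.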

15

Creating a blog post
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
// ...
Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB
// ...
DB blogs = mongo.getDB( "blogs" ); // Access the blogs database
DBCollection myBlog = blogs.getCollection( "myBlog" ); // One collection per blog

DBObject blogPost = new BasicDBObject();
blogPost.put( "title", "JavaOne 2010" );
blogPost.put( "pub_date", new Date() );
blogPost.put( "body", "Publishing a post about JavaOne in my MongoDB blog!" );
blogPost.put( "tags", Arrays.asList( "conference", "java" ) );
blogPost.put( "comments", new ArrayList<DBObject>() ); // Comments are nested in the post

myBlog.insert( blogPost );

16

Retrieving posts
// ...
import com.mongodb.DBCursor;
// ...

// db is the DB handle for the "blogs" database (called blogs on the previous slide)
public Object getAllPosts( String blogName ) {
    DBCollection blog = db.getCollection( blogName );
    return renderPosts( blog.find() );
}

public Object getPostsByTag( String blogName, String tag ) {
    DBCollection blog = db.getCollection( blogName );
    // A single value matched against an array field matches any element
    return renderPosts( blog.find(
        new BasicDBObject( "tags", tag ) ) );
}

private Object renderPosts( DBCursor cursor ) {
    // order by publication date (descending)
    cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );
    // ...
}

17

Adding a comment
import org.bson.types.ObjectId;
// ...
DBCollection myBlog = blogs.getCollection( "myBlog" );
// ...

void addComment( String blogPostId, String message ) {
    // Auto-generated _id values are ObjectIds, so convert the string form first
    DBCursor posts = myBlog.find(
        new BasicDBObject( "_id", new ObjectId( blogPostId ) ) );
    if ( !posts.hasNext() ) throw new NoSuchElementException();

    DBObject blogPost = posts.next();

    // Comments are nested inside the post document
    List comments = (List)blogPost.get( "comments" );
    comments.add( new BasicDBObject( "message", message )
        .append( "date", new Date() ) );

    myBlog.save( blogPost ); // Write the whole post back
}

18

Application 2:
Twitter Clone

19

Requirements for a Twitter Clone
๏ Handle high load - especially high write load
• Twitter generates 300GB of tweets / hour (April 2010)
๏ Retrieve all posts by a specific user, ordered by date
๏ Retrieve all posts by people a specific user follows, ordered by date

20

the choice:
ColumnFamily DB

21

Schema design
๏ Main keyspace: “Twissandra”, with these ColumnFamilies:
• User - user data, keyed by user id (UUID)
• Username - inverted index from username to user id
• Friends - who is user X following?
• Followers - who is following user X?
• Tweet - the actual messages
• Userline - timeline of tweets posted by a specific user
• Timeline - timeline of tweets posted by users that a specific user follows

22

... that’s a lot of denormalization ...
๏ ColumnFamilies are similar to tables in an RDBMS
๏ Each ColumnFamily can only have one Key
๏ This makes the data highly shardable
๏ Which in turn enables very high write throughput
๏ Note however that each ColumnFamily will require its own writes
• There are no ACID transactions
• YOU as a developer are responsible for Consistency!
• (again, this gives you really high write throughput)
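
To make the sharding argument concrete, here is a minimal sketch of hash partitioning by row key. It is not how Cassandra itself places rows (Cassandra uses a token ring with configurable partitioners); the node names and the `node_for` and `plan_writes` helpers are assumptions for illustration only.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(row_key, nodes=NODES):
    """Pick a node from the row key alone; no other row needs to be consulted."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def plan_writes(writes):
    """Group (column_family, row_key) writes by the node that owns each key."""
    plan = {}
    for column_family, row_key in writes:
        plan.setdefault(node_for(row_key), []).append((column_family, row_key))
    return plan


# Following a user touches two ColumnFamilies under two different keys, so the
# two writes may land on different nodes and are not applied atomically.
follow = [("Friends", "alice-uuid"), ("Followers", "bob-uuid")]
plan = plan_writes(follow)
```

Since each write is routed by its own key, writes spread across nodes without coordination, which is also why keeping the two rows in step is left to the application.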
23

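Create user
import time
from uuid import uuid4 as uuid  # assumption: any UUID generator works here

# USER, USERNAME, FRIENDS and FOLLOWERS are ColumnFamily handles
new_useruuid = str(uuid())

USER.insert(new_useruuid, {
    'id': new_useruuid,
    'username': username,
    'password': password
})
# Inverted index: username -> user id
USERNAME.insert(username, {
    'id': new_useruuid
})

Follow user
# Two writes, one per ColumnFamily; nothing makes them atomic
FRIENDS.insert(useruuid, {frienduuid: time.time()})
FOLLOWERS.insert(frienduuid, {useruuid: time.time()})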

24

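Create message
import struct
# ...
tweetuuid = str(uuid())
timestamp = int(time.time() * 1e6)  # microseconds since the epoch

TWEET.insert(tweetuuid, {
    'id': tweetuuid,
    'user_id': useruuid,
    'body': body,
    '_ts': timestamp})

# The column name is the packed timestamp, so columns sort by time
message_ref = {
    struct.pack('>d', timestamp): tweetuuid}
USERLINE.insert(useruuid, message_ref)

# Fan out: write the reference to the author's and every follower's timeline
TIMELINE.insert(useruuid, message_ref)
for otheruuid in FOLLOWERS.get(useruuid, column_count=5000):
    TIMELINE.insert(otheruuid, message_ref)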
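
The fan-out on write above can be mimicked with plain dictionaries. This sketch models each ColumnFamily as a dict of rows, each row being a dict of columns; it is an in-memory stand-in rather than a real Cassandra client, and `post_tweet` is a hypothetical helper.

```python
import time
import uuid

# Each ColumnFamily is modelled as {row_key: {column_name: value}}.
TWEET, USERLINE, TIMELINE, FOLLOWERS = {}, {}, {}, {}


def post_tweet(user_id, body, timestamp=None):
    """Store a tweet once, then write a reference into every affected timeline."""
    tweet_id = str(uuid.uuid4())
    if timestamp is None:
        timestamp = int(time.time() * 1e6)  # microseconds, as on the slide
    TWEET[tweet_id] = {"id": tweet_id, "user_id": user_id, "body": body, "_ts": timestamp}

    # The author's own lines.
    USERLINE.setdefault(user_id, {})[timestamp] = tweet_id
    TIMELINE.setdefault(user_id, {})[timestamp] = tweet_id
    # One extra write per follower: cheap reads are paid for at write time.
    for follower_id in FOLLOWERS.get(user_id, {}):
        TIMELINE.setdefault(follower_id, {})[timestamp] = tweet_id
    return tweet_id


FOLLOWERS["alice"] = {"bob": 1, "carol": 2}  # bob and carol follow alice
tweet_id = post_tweet("alice", "hello", timestamp=1000)
```

A user with many followers makes a single tweet expensive to write, which is the price of reading any timeline from exactly one row.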

25

Get messages
For all users this user follows
timeline = TIMELINE.get(useruuid,
                        column_start=start,
                        column_count=NUM_PER_PAGE,
                        column_reversed=True)
tweets = TWEET.multiget( timeline.values() )

By a specific user
timeline = USERLINE.get(useruuid,
                        column_start=start,
                        column_count=NUM_PER_PAGE,
                        column_reversed=True)
tweets = TWEET.multiget( timeline.values() )
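
Reading a page is a slice over one row, newest first. The sketch below imitates the `column_start`, `column_count` and `column_reversed` arguments on a plain dict; `get_slice` is a hypothetical helper, and treating the start column as inclusive is an assumption.

```python
def get_slice(row, column_start=None, column_count=20, column_reversed=True):
    """Return up to column_count (name, value) pairs from one row, ordered by column name."""
    names = sorted(row, reverse=column_reversed)
    if column_start is not None:
        if column_reversed:
            names = [n for n in names if n <= column_start]
        else:
            names = [n for n in names if n >= column_start]
    return [(n, row[n]) for n in names[:column_count]]


timeline_row = {1000: "t1", 2000: "t2", 3000: "t3", 4000: "t4"}  # timestamp -> tweet id
tweets = {"t1": "first", "t2": "second", "t3": "third", "t4": "fourth"}

page1 = get_slice(timeline_row, column_count=2)
page2 = get_slice(timeline_row, column_start=2000, column_count=2)
bodies = [tweets[tweet_id] for _, tweet_id in page1]  # the "multiget" step
```

The timeline row only holds references, so rendering a page takes two steps: one slice to get the tweet ids, then one lookup per id.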

26

Application 3:
Social Network

27

Requirements for a Social Network
๏ Interact with friends
๏ Get recommendations for new friends
๏ View the social context of a person
i.e. How do I know this person?

28

the choice:
Graph DB

29

“Schema” design
๏ Persons represented by Nodes
๏ Friendship represented by Relationships between Person Nodes
๏ Groups represented by Nodes
๏ Group membership represented by Relationship from Person Node to Group Node
๏ Index for Person Nodes for lookup by name
๏ Index for Group Nodes for lookup by name
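
The design above can be sketched as a tiny in-memory property graph. This is only a model of the ideas (nodes with properties, typed relationships, name indexes); the `Graph` class and its methods are hypothetical and unrelated to the Neo4j API.

```python
class Graph:
    """Tiny in-memory property graph: nodes, typed relationships, name indexes."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, type, end id)
        self.index = {"persons": {}, "groups": {}}  # index name -> {name: node id}

    def create_node(self, index_name, name, **properties):
        node_id = len(self.nodes)
        self.nodes[node_id] = dict(properties, name=name)
        self.index[index_name][name] = node_id
        return node_id

    def relate(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))

    def neighbours(self, node_id, rel_type):
        """Follow relationships of one type in both directions."""
        for start, kind, end in self.relationships:
            if kind != rel_type:
                continue
            if start == node_id:
                yield end
            elif end == node_id:
                yield start


g = Graph()
neo = g.create_node("persons", "Thomas Anderson", age=29)
morpheus = g.create_node("persons", "Morpheus", rank="Captain")
crew = g.create_node("groups", "Nebuchadnezzar crew")
g.relate(neo, "FRIENDSHIP", morpheus)
g.relate(morpheus, "MEMBERSHIP", crew)
```

Persons and groups are both plain nodes; what distinguishes them is the index they are registered in and the relationship types that connect them.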

30

A small social graph example
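[Figure: a social graph with two relationship types, FRIENDSHIP and MEMBERSHIP.
Person nodes: Thomas Anderson, Morpheus, Trinity, Cypher, Tank, Dozer, Agent Smith, Agent Brown.
Group nodes: Nebuchadnezzar crew, Agent taskforce.
Two relationships carry a Qualifier property: "Family" and "Lovers".]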
31

Creating the social graph
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(
    GRAPH_STORAGE_LOCATION );
IndexService indexes = new LuceneIndexService( graphDb );
Transaction tx = graphDb.beginTx(); // All writes happen inside a transaction
try {
    Node mrAnderson = graphDb.createNode();
    mrAnderson.setProperty( "name", "Thomas Anderson" );
    mrAnderson.setProperty( "age", 29 );
    indexes.index( mrAnderson, "persons", "Thomas Anderson" );

    Node morpheus = graphDb.createNode();
    morpheus.setProperty( "name", "Morpheus" );
    morpheus.setProperty( "rank", "Captain" );
    indexes.index( morpheus, "persons", "Morpheus" );

    Relationship friendship = mrAnderson.createRelationshipTo(
        morpheus, SocialGraphTypes.FRIENDSHIP );

    tx.success(); // Mark the transaction as successful
} finally {
    tx.finish(); // Commits if marked successful, otherwise rolls back
}

32

Making new friends
// Run inside a transaction, as when creating the graph
Node person1 = indexes.getSingle( "persons", person1Name );
Node person2 = indexes.getSingle( "persons", person2Name );

person1.createRelationshipTo(
    person2, SocialGraphTypes.FRIENDSHIP );

Joining a group
// Run inside a transaction, as when creating the graph
Node person = indexes.getSingle( "persons", personName );
Node group = indexes.getSingle( "groups", groupName );

person.createRelationshipTo(
    group, SocialGraphTypes.MEMBERSHIP );

33

How do I know this person?
Node me = ...
Node you = ...

PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath(
    Traversal.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
    /* maximum depth: */ 4 );

Path shortestPath = shortestPathFinder.findSinglePath( me, you );

if ( shortestPath != null ) { // null when no path exists within the depth limit
    for ( Node friend : shortestPath.nodes() ) {
        System.out.println( friend.getProperty( "name" ) );
    }
}
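
For readers unfamiliar with what a shortest-path finder does, the same question can be answered with a breadth-first search over a plain adjacency map. This is a minimal sketch, not Neo4j's algorithm; the friendship data is sample data invented for the example, and `shortest_path` is a hypothetical helper.

```python
from collections import deque


def shortest_path(friends, start, goal, max_depth=4):
    """Breadth-first search over an undirected friendship map.

    Returns the list of names from start to goal, or None if goal is not
    reachable within max_depth hops.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(path[-1], ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(path + [friend])
    return None


# Undirected friendships, listed in both directions (sample data).
friends = {
    "Thomas Anderson": ["Morpheus", "Trinity"],
    "Morpheus": ["Thomas Anderson", "Trinity", "Cypher"],
    "Trinity": ["Thomas Anderson", "Morpheus"],
    "Cypher": ["Morpheus", "Agent Smith"],
    "Agent Smith": ["Cypher"],
}
path = shortest_path(friends, "Thomas Anderson", "Agent Smith")
```

Breadth-first order guarantees that the first path reaching the goal is a shortest one, and the depth limit keeps the search from wandering across the whole graph.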

34

Recommend new friends
Node person = ...

TraversalDescription friendsOfFriends = Traversal.description()
    .expand( Traversal.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
    .prune( Traversal.pruneAfterDepth( 2 ) )
    .breadthFirst() // Visit my friends before their friends.
    // Visit a node at most once (don’t recommend direct friends)
    .uniqueness( Uniqueness.NODE_GLOBAL )
    .filter( new Predicate<Path>() {
        // Only return friends of friends
        public boolean accept( Path traversalPos ) {
            return traversalPos.length() == 2;
        }
    } );

for ( Node recommendation :
        friendsOfFriends.traverse( person ).nodes() ) {
    System.out.println( recommendation.getProperty( "name" ) );
}

35
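
The traversal rules above (breadth first, stop at depth 2, visit each node once, keep only depth-2 results) can be sketched over a plain adjacency map. The friendship data and the `recommend_friends` helper are invented for the example; this models the idea, not the Neo4j traversal framework.

```python
def recommend_friends(friends, person):
    """Return people exactly two friendship hops away, in breadth-first order."""
    direct = list(friends.get(person, ()))
    seen = {person, *direct}  # visit each node at most once
    recommendations = []
    for friend in direct:
        for candidate in friends.get(friend, ()):
            if candidate not in seen:
                seen.add(candidate)
                recommendations.append(candidate)
    return recommendations


# Undirected friendships, listed in both directions (sample data).
friends = {
    "Thomas Anderson": ["Morpheus", "Trinity"],
    "Morpheus": ["Thomas Anderson", "Trinity", "Cypher", "Tank"],
    "Trinity": ["Thomas Anderson", "Morpheus", "Cypher"],
    "Cypher": ["Morpheus", "Trinity"],
    "Tank": ["Morpheus"],
}
suggested = recommend_friends(friends, "Thomas Anderson")
```

Marking direct friends as seen before expanding is what keeps them out of the result, which corresponds to the global node uniqueness used in the traversal description.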

When to use Document DB (e.g. MongoDB)
๏ When data is collections of similar entities
• But semi structured (sparse) rather than tabular
• When fields in entries have multiple values

36

When to use ColumnFamily DB (e.g. Cassandra)
๏ When scalability is the main issue
• Both scaling size and scaling load
‣ In particular scaling write load

๏ Linear scalability (as you add servers) both in read and write
๏ Low level - will require you to duplicate data to support queries

37

When to use Graph DB (e.g. Neo4j)
๏ When deep traversals are important
๏ For complex domains
๏ When how entities relate is an important aspect of the domain

38

When not to use a NOSQL Database
๏ RDBMSes have been the de-facto standard for years, and still have better tools for some tasks
• Especially for reporting
๏ When maintaining a system that works already
๏ Sometimes when data is uniform / structured
๏ When aggregations over (subsets of) the entire dataset are key

๏ But please don’t use a Relational database for persisting objects

39

Complex problem? - right tool for each job!

Image credits: Unknown :’(

40

Polyglot persistence
๏ Use multiple databases in the same system
- use the right tool for each part of the system
๏ Examples:
• Use an RDBMS for structured data and a Graph Database for modeling the relationships between entities
• Use a Graph Database for the domain model and a Document Database for storing large data objects

41

Neo Technology - the Graph Database company

http://neotechnology.com
