14. Requirements for a Blog System
๏ Get blog posts for a specific blog ordered by date
• possibly filtered by tag
๏ Blogs can have an arbitrary number of blog posts
๏ Blog posts can have an arbitrary number of comments
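The requirements above can be sketched with plain Python dicts standing in for blog-post documents. This is only an illustration: the field names (`blog`, `date`, `tags`, `comments`), the sample data, and the newest-first ordering are assumptions, not part of the original slides, and no real database is involved.

```python
from datetime import date

# Hypothetical in-memory stand-in for a collection of blog-post documents.
# Comments are embedded in each post, so a post can hold any number of them.
posts = [
    {"blog": "tech", "title": "Intro to NOSQL", "date": date(2010, 4, 1),
     "tags": ["nosql"], "comments": []},
    {"blog": "tech", "title": "Graphs", "date": date(2010, 5, 1),
     "tags": ["nosql", "graph"],
     "comments": [{"author": "ann", "text": "Nice"}]},
    {"blog": "food", "title": "Soup", "date": date(2010, 4, 15),
     "tags": ["recipes"], "comments": []},
]


def posts_for_blog(posts, blog, tag=None):
    """Return the posts of one blog, newest first, optionally filtered by tag."""
    selected = [p for p in posts
                if p["blog"] == blog and (tag is None or tag in p["tags"])]
    return sorted(selected, key=lambda p: p["date"], reverse=True)


titles = [p["title"] for p in posts_for_blog(posts, "tech")]
graph_titles = [p["title"] for p in posts_for_blog(posts, "tech", tag="graph")]
```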
21. Requirements for a Twitter Clone
๏ Handle high load - especially high write load
• Twitter generates 300GB of tweets / hour (April 2010)
๏ Retrieve all posts by a specific user, ordered by date
๏ Retrieve all posts by people a specific user follows, ordered by date
23. Schema design
๏ Main keyspace: “Twissandra”, with these ColumnFamilies:
• User - user data, keyed by user id (UUID)
• Username - inverted index from username to user id
• Friends - who is user X following?
• Followers - who is following user X?
• Tweet - the actual messages
• Userline - timeline of tweets posted by a specific user
• Timeline - timeline of tweets posted by users that a specific user follows
24. ... that’s a lot of denormalization ...
๏ ColumnFamilies are similar to tables in an RDBMS
๏ Each ColumnFamily can only have one Key
๏ This makes the data highly shardable
๏ Which in turn enables very high write throughput
๏ Note however that each ColumnFamily will require its own writes
• There are no ACID transactions
• YOU as a developer are responsible for Consistency!
• (again, this gives you really high write throughput)
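What "each ColumnFamily will require its own writes" means in practice can be sketched as follows. Plain dicts stand in for the ColumnFamilies; this is not a Cassandra client, and `post_tweet` and its parameters are hypothetical names chosen for the illustration. Using the timestamp as the column name is also an assumption.

```python
import time
import uuid
from collections import defaultdict

# Plain dicts standing in for ColumnFamilies (assumption: no real Cassandra).
TWEET = {}                      # tweet id -> tweet data
USERLINE = defaultdict(dict)    # user id -> {timestamp: tweet id}
TIMELINE = defaultdict(dict)    # user id -> {timestamp: tweet id}
FOLLOWERS = defaultdict(set)    # user id -> ids of the users following them


def post_tweet(user_id, body, timestamp=None):
    """Store one tweet. Each ColumnFamily gets its own, separate write."""
    tweet_id = str(uuid.uuid4())
    ts = timestamp if timestamp is not None else time.time()
    TWEET[tweet_id] = {"user_id": user_id, "body": body}   # write 1
    USERLINE[user_id][ts] = tweet_id                        # write 2
    for follower_id in FOLLOWERS[user_id]:                  # one write per follower
        TIMELINE[follower_id][ts] = tweet_id
    return tweet_id


FOLLOWERS["alice"].update({"bob", "carol"})
tweet_id = post_tweet("alice", "hello", timestamp=1)
```

Nothing ties these writes together: if the process stops halfway through the loop, some followers see the tweet and others do not, which is the consistency burden the slide refers to.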
27. Get messages
For all users this user follows:
timeline = TIMELINE.get(useruuid,
                        column_start=start,
                        column_count=NUM_PER_PAGE,
                        column_reversed=True)
tweets = TWEET.multiget( timeline.values() )
By a specific user:
timeline = USERLINE.get(useruuid,
                        column_start=start,
                        column_count=NUM_PER_PAGE,
                        column_reversed=True)
tweets = TWEET.multiget( timeline.values() )
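The paging parameters used above can be illustrated with a small emulation over a plain dict. `get_slice` is a hypothetical helper, not the client library's API, and it assumes that columns are sorted by name and that `column_start` is inclusive.

```python
def get_slice(row, column_start=None, column_count=100, column_reversed=False):
    """Emulate a column slice over one row whose columns are sorted by name."""
    names = sorted(row, reverse=column_reversed)
    if column_start is not None:
        if column_reversed:
            # Reading backwards: start at column_start and go down.
            names = [n for n in names if n <= column_start]
        else:
            names = [n for n in names if n >= column_start]
    return {n: row[n] for n in names[:column_count]}


# Column name = timestamp, column value = tweet id (illustrative data).
timeline_row = {1: "t1", 2: "t2", 3: "t3", 4: "t4", 5: "t5"}
NUM_PER_PAGE = 2

# First page: the newest tweets.
page1 = get_slice(timeline_row, column_count=NUM_PER_PAGE, column_reversed=True)
# Next page: continue from the first column not yet shown.
page2 = get_slice(timeline_row, column_start=3,
                  column_count=NUM_PER_PAGE, column_reversed=True)
```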
29. Requirements for a Social Network
๏ Interact with friends
๏ Get recommendations for new friends
๏ View the social context of a person
i.e. How do I know this person?
31. “Schema” design
๏ Persons represented by Nodes
๏ Friendship represented by Relationships between Person Nodes
๏ Groups represented by Nodes
๏ Group membership represented by Relationship
from Person Node to Group Node
๏ Index for Person Nodes for lookup by name
๏ Index for Group Nodes for lookup by name
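The model above can be sketched in a few lines of Python. This is an in-memory illustration, not the graph database's API: the `Node` class, the helper functions, and the sample relationships are assumptions. As a simplification, every relationship is stored on both ends so it can be followed in either direction.

```python
class Node:
    """A graph node with properties and a list of relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # (relationship type, other node) pairs

    def create_relationship_to(self, other, rel_type):
        # Record the relationship on both ends so it can be followed
        # in either direction.
        self.relationships.append((rel_type, other))
        other.relationships.append((rel_type, self))

    def neighbours(self, rel_type):
        return [n for t, n in self.relationships if t == rel_type]


# Name indexes, one for Person nodes and one for Group nodes.
persons = {}
groups = {}


def add_person(name):
    persons[name] = Node(name=name)
    return persons[name]


def add_group(name):
    groups[name] = Node(name=name)
    return groups[name]


trinity = add_person("Trinity")
morpheus = add_person("Morpheus")
crew = add_group("Nebuchadnezzar crew")

trinity.create_relationship_to(morpheus, "FRIENDSHIP")
trinity.create_relationship_to(crew, "MEMBERSHIP")
morpheus.create_relationship_to(crew, "MEMBERSHIP")

crew_names = sorted(n.properties["name"] for n in crew.neighbours("MEMBERSHIP"))
```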
32. A small social graph example
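[Figure: social graph. Person nodes: Thomas Anderson, Trinity, Morpheus, Cypher, Tank, Dozer, Agent Smith, Agent Brown. Group nodes: Nebuchadnezzar crew, Agent taskforce. Relationship types: FRIENDSHIP and MEMBERSHIP. Two relationships carry a qualifier: "Family" and "Lovers".]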
34. Making new friends
Node person1 = indexes.getSingle( "persons", person1Name );
Node person2 = indexes.getSingle( "persons", person2Name );
person1.createRelationshipTo(
    person2, SocialGraphTypes.FRIENDSHIP );
Joining a group
Node person = indexes.getSingle( "persons", personName );
Node group = indexes.getSingle( "groups", groupName );
person.createRelationshipTo(
    group, SocialGraphTypes.MEMBERSHIP );
35. How do I know this person?
Node me = ...
Node you = ...
PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(
    Traversals.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
    /* maximum depth: */ 4 );
Path shortestPath = shortestPathFinder.findSinglePath(me, you);
for ( Node friend : shortestPath.nodes() ) {
    System.out.println( friend.getProperty( "name" ) );
}
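The idea behind the shortest-path query can be sketched as a depth-limited breadth-first search. This is an illustration of the technique, not the library's implementation. The friendship edges below are made up for the example (they only reuse names from the social graph), and `shortest_path` is a hypothetical helper.

```python
from collections import defaultdict, deque

# Illustrative friendship edges (assumption: not taken from the figure).
edges = [
    ("Thomas Anderson", "Trinity"),
    ("Thomas Anderson", "Morpheus"),
    ("Morpheus", "Trinity"),
    ("Morpheus", "Cypher"),
    ("Cypher", "Agent Smith"),
    ("Agent Smith", "Agent Brown"),
]

# Friendship is followed in both directions, so store each edge twice.
friends = defaultdict(set)
for a, b in edges:
    friends[a].add(b)
    friends[b].add(a)


def shortest_path(graph, start, goal, max_depth=4):
    """Return one shortest path as a list of names, or None if the goal
    cannot be reached within max_depth relationships."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the maximum depth
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


path = shortest_path(friends, "Thomas Anderson", "Agent Smith")
too_far = shortest_path(friends, "Thomas Anderson", "Agent Smith", max_depth=2)
```

Because the search can come back empty, callers should check for `None` before walking the path.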
36. Recommend new friends
Node person = ...
TraversalDescription friendsOfFriends = Traversal.description()
    .expand( Traversals.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
    .prune( Traversal.pruneAfterDepth( 2 ) )
    .breadthFirst() // Visit my friends before their friends.
    // Visit a node at most once (don’t recommend direct friends)
    .uniqueness( Uniqueness.NODE_GLOBAL )
    .filter( new Predicate<Path>() {
        // Only return friends of friends
        public boolean accept( Path traversalPos ) {
            return traversalPos.length() == 2;
        }
    } );
for ( Node recommendation :
        friendsOfFriends.traverse( person ).nodes() ) {
    System.out.println( recommendation.getProperty("name") );
}
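The same recommendation rule (friends of friends who are not already direct friends) can be written out over a plain adjacency map. This is a sketch of the logic only; the edges are illustrative and `recommend_friends` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Illustrative friendship edges (assumption: not taken from the figure).
edges = [
    ("Thomas Anderson", "Trinity"),
    ("Thomas Anderson", "Morpheus"),
    ("Morpheus", "Trinity"),
    ("Morpheus", "Cypher"),
    ("Cypher", "Agent Smith"),
    ("Agent Smith", "Agent Brown"),
]

friends = defaultdict(set)
for a, b in edges:
    friends[a].add(b)
    friends[b].add(a)


def recommend_friends(graph, person):
    """Return people exactly two friendships away from person, sorted by name."""
    direct = graph.get(person, set())
    recommendations = set()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and anyone who is already a friend.
            if candidate != person and candidate not in direct:
                recommendations.add(candidate)
    return sorted(recommendations)


for_neo = recommend_friends(friends, "Thomas Anderson")
for_cypher = recommend_friends(friends, "Cypher")
```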
37. When to use Document DB (e.g. MongoDB)
๏ When data is collections of similar entities
• But semi-structured (sparse) rather than tabular
• When fields in entries have multiple values
38. When to use ColumnFamily DB (e.g. Cassandra)
๏ When scalability is the main issue
• Both scaling size and scaling load
‣In particular scaling write load
๏ Linear scalability (as you add servers) both in read and write
๏ Low level - will require you to duplicate data to support queries
39. When to use Graph DB (e.g. Neo4j)
๏ When deep traversals are important
๏ For complex domains
๏ When how entities relate is an important aspect of the domain
40. When not to use a NOSQL Database
๏ RDBMSes have been the de-facto standard for years, and still have
better tools for some tasks
• Especially for reporting
๏ When maintaining a system that works already
๏ Sometimes when data is uniform / structured
๏ When aggregations over (subsets of) the entire dataset are key
๏ But please don’t use a Relational database for persisting objects
41. Complex problem? - right tool for each job!
Image credits: Unknown :’(
42. Polyglot persistence
๏ Use multiple databases in the same system
- use the right tool for each part of the system
๏ Examples:
• Use an RDBMS for structured data and a Graph Database for modeling the relationships between entities
• Use a Graph Database for the domain model and a Document Database for storing large data objects
43. Neo Technology - the Graph Database company
http://neotechnology.com