Your SlideShare is downloading. ×
An overview of NOSQL (JFokus 2011)
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

An overview of NOSQL (JFokus 2011)

2,111
views

Published on

An overview of the four major categories of NOSQL and an example of how to handle polyglot persistence using Spring Data.

An overview of the four major categories of NOSQL and an example of how to handle polyglot persistence using Spring Data.

Published in: Technology

0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,111
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
4
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. NOSQL - an overview - JFokus 2011Emil Eifrem @emileifremCEO, Neo Technology emil@neotechnology.com #neo4j
  • 2. So what’s the plan?๏ Why NOSQL?๏ The NOSQL landscape๏ NOSQL challenges๏ Conclusion 2
  • 3. First off: the name๏ WE ALL HATES IT, M’KAY? 3
  • 4. NOSQL is NOT... 4
  • 5. NOSQL is NOT... ๏ NO to SQL 4
  • 6. NOSQL is NOT... ๏ NO to SQL ๏ NEVER SQL 4
  • 7. NOSQL is simplyNot Only SQL 5
  • 8. NOSQL - Why now? Four trends 7
  • 9. Trend 1:data set size402007 Source: IDC 2007
  • 10. 988Trend 1:data set size402007 2010 Source: IDC 2007
  • 11. Trend 2: Connectedness Information connectivity web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 10
  • 12. Trend 2: Connectedness Information connectivity Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 11
  • 13. Trend 2: Connectedness Information connectivity Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 12
  • 14. Trend 2: Connectedness Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 13
  • 15. Trend 2: Connectedness Giant Global Graph (GGG) Ontologies RDF Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 14
  • 16. Trend 2: Connectedness Giant Global Graph (GGG) Ontologies RDF Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 15
  • 17. Trend 3: Semi-structure๏ Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In Or 15? lists of the 2000s, we need 5 job columns! Or 8? the salary๏ All encompassing “entire world views” • Store more data about each entity๏ Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”) 16
  • 18. Aside: RDBMS performance Relational database Requirement of application Performance Data complexity 17
  • 19. Aside: RDBMS performance Relational database Requirement of application Performance Data complexity 18
  • 20. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Data complexity 19
  • 21. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Data complexity 20
  • 22. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Social network } Semantic Trading custom Data complexity 21
  • 23. Trend 4: Architecture 1980s: Application (<-- note lack of s) Application DB 22
  • 24. Trend 4: Architecture 1990s: Database as integration hub Application Application Application DB 23
  • 25. Trend 4: Architecture 2000s: (moving towards) Decoupled services with their own backend Service Service Service DB DB DB 24
  • 26. Why NOSQL Now?๏Trend 1: Size๏Trend 2: Connectedness๏Trend 3: Semi-structure๏Trend 4: Architecture 25
  • 27. Four NOSQL categories 26
  • 28. Category 1: Key-Value stores๏ Lineage: • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)๏ Data model: • Global key-value mapping • Think: Globally available HashMap/Dict/etc๏ Examples: • Project Voldemort • Tokyo {Cabinet,Tyrant, etc} 27 Riak
  • 29. Category 1: Key-Value stores๏ Strengths • Simple data model • Great at scaling out horizontally๏ Weaknesses: • Simplistic data model • Poor for complex data 28
  • 30. Category II: ColumnFamily (BigTable) stores๏ Lineage: • “Bigtable:(2006) Data” A Distributed Storage System for Structured๏ Data model: • A big table, with column families๏ Examples: • HBase • HyperTable • Cassandra 29
  • 31. Category III: Document databases๏ Lineage: • Lotus Notes๏ Data model: • Collections of documents • A document is a key-value collection๏ Examples: • CouchDB • MongoDB 30 Terrastore
  • 32. Document db: An example๏ How would we model a blogging software?๏ One stab: • Represent each Blog as a Collection of Post documents • Represent Comments as nested documents in the Post documents 31
  • 33. Document db: Creating a blog postimport com.mongodb.Mongo;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;// ...Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB// ...DB blogs = mongo.getDB( "blogs" ); // Access the blogs databaseDBCollection myBlog = blogs.getCollection( "Thobe’s blog" );DBObject blogPost = new BasicDBObject();blogPost.put( "title", "JFokus 2011" );blogPost.put( "pub_date", new Date() );blogPost.put( "body", "Publishing a post about JFokus in my MongoDB blog!" );blogPost.put( "tags", Arrays.asList( "conference", "names" ) );blogPost.put( "comments", new ArrayList() );myBlog.insert( blogPost ); 32
  • 34. Retrieving posts// ...import com.mongodb.DBCursor;// ...public Object getAllPosts( String blogName ) { DBCollection blog = db.getCollection( blogName ); return renderPosts( blog.find() );}private Object renderPosts( DBCursor cursor ) { // order by publication date (descending) cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) ); // ...} 33
  • 35. Category IV: Graph databases๏ Lineage: • Euler and graph theory๏ Data model: • Nodes with properties • Typed relationships with properties๏ Examples: • Sones GraphDB • InfiniteGraph 34 Neo4j
  • 36. Property Graph model 35
  • 37. Property Graph model LOVES LIVES WITH LOVES OWNS DRIVES 36
  • 38. Property Graph model name: “Mary” LOVESname: “James” age: 35age: 32 LIVES WITHtwitter: “@spam” LOVES OWNS property type: “car” DRIVES brand: “Volvo” model: “V70” 37
  • 39. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. Image credits: Tobias Ivarsson 38
  • 40. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model thobe from the whiteboard is implemented directly. Joe project blog Wardrobe Strength Hello Joe Modularizing Jython Neo4j performance analysis Image credits: Tobias Ivarsson 39
  • 41. Graph db: Creating a social graphGraphDatabaseService graphDb = new EmbeddedGraphDatabase( GRAPH_STORAGE_LOCATION );Transaction tx = graphDb.beginTx();try { Node mrAnderson = graphDb.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); Node morpheus = graphDb.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); Relationship friendship = mrAnderson.createRelationshipTo( morpheus, SocialGraphTypes.FRIENDSHIP ); tx.success();} finally { tx.finish();} 40
  • 42. Graph db: How do I know this person?Node me = ...Node you = ...PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath( Traversals.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ), /* maximum depth: */ 4 );Path shortestPath = shortestPathFinder.findSinglePath(me, you);for ( Node friend : shortestPath.nodes() ) { System.out.println( friend.getProperty( "name" ) );} 41
  • 43. Graph db: Recommend new friendsNode person = ...TraversalDescription friendsOfFriends = Traversal.description() .expand( Traversals.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) ) .prune( Traversal.pruneAfterDepth( 2 ) ) .breadthFirst() // Visit my friends before their friends. //Visit a node at most once (don’t recommend direct friends) .uniqueness( Uniqueness.NODE_GLOBAL ) .filter( new Predicate<Path>() { // Only return friends of friends public boolean accept( Path traversalPos ) { return traversalPos.length() == 2; } } );for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) { System.out.println( recommendedFriend.getProperty("name") );} 42
  • 44. Four emerging NOSQL categories ๏Key-Value stores ๏ColumnFamiy stores ๏Document databases ๏Graph databases 43
  • 45. Scaling to size vs. Scaling to complexity Size Key/Value stores Complexity 44
  • 46. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Complexity 44
  • 47. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Complexity 44
  • 48. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Graph databases Complexity 44
  • 49. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Graph databases Billions of nodes and relationships My subjective view: > 90% of use cases Complexity 44
  • 50. NOSQL challenges? 45
  • 51. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”) 45
  • 52. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”)๏ Tool support • Both devtime tools and runtime ops tools • Standards may help? • ... or maybe just time 45
  • 53. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”)๏ Tool support • Both devtime tools and runtime ops tools • Standards may help? • ... or maybe just time๏ Middleware support 45
  • 54. Middleware support?๏ Lemme tell you the story about Mike and his restaurant site 46
  • 55. Domain & data model Restaurant UserAccount @Entity @Entity public class Restaurant { @Table(name = "user_account") @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites; 47
  • 56. Step 1: Buildsing a web site Tomcat One box MySQL 48
  • 57. Step II: Whoa, ppl are actually using it? 49
  • 58. Step II: Whoa, ppl are actually using it? Tomcat Two boxes MySQL 49
  • 59. Step III: That’s a LOT of pages served... 50
  • 60. Step III: That’s a LOT of pages served... Tomcat Tomcat Tomcat n boxes MySQL 1 box 50
  • 61. Step IV: Our DB is completely overwhelmed... 51
  • 62. Step IV: Our DB is completely overwhelmed... Tomcat Tomcat Tomcat n boxes MySQL (m) MySQL (s) n boxes 51
  • 63. Step V: Our DBs are STILL overwhelmed ? 52
  • 64. What does the site look like now? 53
  • 65. Step V: Our DBs are STILL overwhelmed๏ Turns out the problem is due to joins๏ A while back Mike introduced a new feature • Recommend restaurants based on the user’s friends (and friends of friends) • Whoa, recommendations aren’t just simple get and put! • They’re killing us with joins๏ What about sharding?๏ What about SSDs? 54
  • 66. Polyglot persistence (Not Only SQL)๏ How did we get into this situation?๏ Well, data sets are increasingly less uniform๏ Parts of Mike’s data fits well in an RDBMS๏ But parts of it is graph-shaped • It fits much better in a graph database! • And I’m sure that there is or will be very key-value-esque parts of the dataset๏ Simple, just store some of it in a graph db and some of it in MySQL! But what does the code look like? 55
  • 67. We were here 56
  • 68. We were here Restaurant UserAccount @Entity @Entity public class Restaurant { @Table(name = "user_account") @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites; 56
  • 69. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 57
  • 70. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  • 71. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  • 72. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  • 73. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  • 74. And we’re back to talking about Middleware 59
  • 75. And we’re back to talking about Middleware๏ This example was from the Spring Data project • Specifically Spring Data Graph • Available at: http://www.springsource.org/spring-data • Currently in M02 -- will be released 1.0 end of March • Would LOVEkinda love,REALLY love, likeifValentine’s day user yesterday <--- like some feedback you’re a Spring 59
  • 76. And we’re back to talking about Middleware๏ This example was from the Spring Data project • Specifically Spring Data Graph • Available at: http://www.springsource.org/spring-data • Currently in M02 -- will be released 1.0 end of March • Would LOVEkinda love,REALLY love, likeifValentine’s day user yesterday <--- like some feedback you’re a Spring๏ But this really should be available in any middleware stack • Ask Redhat / JBoss • Ask Sun / Oracle 59 Ask Microsoft
  • 77. Conclusion๏ There’s an explosion of ‘nosql’ databases out there • Some are immature and experimental • Some are coming out of years of battle-hardened production๏ NOSQL is about finding the right tool for the job • Frequently that’s an RDBMS • But increasingly commonly an RDBMS is the perfect fit๏ We will have heterogenous data backends in the future • Now the restthatthe stack needs to step up and help developers cope with of 60
  • 78. Key takeawayNot Only SQL is here 61
  • 79. Key takeawayNot Only SQLis here to stay 62
  • 80. Key takeaway Not Only SQLis FUN - dl & play around now! 63
  • 81. Image credits: Lost! Sorry... :( 64
  • 82. http://neotechnology.com

×