An overview of NOSQL (JFokus 2011)

2,437 views
2,344 views

Published on

An overview of the four major categories of NOSQL and an example of how to handle polyglot persistence using Spring Data.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,437
On SlideShare
0
From Embeds
0
Number of Embeds
13
Actions
Shares
0
Downloads
0
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

An overview of NOSQL (JFokus 2011)

  1. 1. NOSQL - an overview - JFokus 2011Emil Eifrem @emileifremCEO, Neo Technology emil@neotechnology.com #neo4j
  2. 2. So what’s the plan?๏ Why NOSQL?๏ The NOSQL landscape๏ NOSQL challenges๏ Conclusion 2
  3. 3. First off: the name๏ WE ALL HATES IT, M’KAY? 3
  4. 4. NOSQL is NOT... 4
  5. 5. NOSQL is NOT... ๏ NO to SQL 4
  6. 6. NOSQL is NOT... ๏ NO to SQL ๏ NEVER SQL 4
  7. 7. NOSQL is simplyNot Only SQL 5
  8. 8. NOSQL - Why now? Four trends 7
  9. 9. Trend 1:data set size402007 Source: IDC 2007
  10. 10. 988Trend 1:data set size402007 2010 Source: IDC 2007
  11. 11. Trend 2: Connectedness Information connectivity web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 10
  12. 12. Trend 2: Connectedness Information connectivity Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 11
  13. 13. Trend 2: Connectedness Information connectivity Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 12
  14. 14. Trend 2: Connectedness Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 13
  15. 15. Trend 2: Connectedness Giant Global Graph (GGG) Ontologies RDF Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 14
  16. 16. Trend 2: Connectedness Giant Global Graph (GGG) Ontologies RDF Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 15
  17. 17. Trend 3: Semi-structure๏ Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In Or 15? lists of the 2000s, we need 5 job columns! Or 8? the salary๏ All encompassing “entire world views” • Store more data about each entity๏ Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”) 16
  18. 18. Aside: RDBMS performance Relational database Requirement of application Performance Data complexity 17
  19. 19. Aside: RDBMS performance Relational database Requirement of application Performance Data complexity 18
  20. 20. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Data complexity 19
  21. 21. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Data complexity 20
  22. 22. Aside: RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Social network } Semantic Trading custom Data complexity 21
  23. 23. Trend 4: Architecture 1980s: Application (<-- note lack of s) Application DB 22
  24. 24. Trend 4: Architecture 1990s: Database as integration hub Application Application Application DB 23
  25. 25. Trend 4: Architecture 2000s: (moving towards) Decoupled services with their own backend Service Service Service DB DB DB 24
  26. 26. Why NOSQL Now?๏Trend 1: Size๏Trend 2: Connectedness๏Trend 3: Semi-structure๏Trend 4: Architecture 25
  27. 27. Four NOSQL categories 26
  28. 28. Category 1: Key-Value stores๏ Lineage: • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)๏ Data model: • Global key-value mapping • Think: Globally available HashMap/Dict/etc๏ Examples: • Project Voldemort • Tokyo {Cabinet,Tyrant, etc} 27 Riak
  29. 29. Category 1: Key-Value stores๏ Strengths • Simple data model • Great at scaling out horizontally๏ Weaknesses: • Simplistic data model • Poor for complex data 28
  30. 30. Category II: ColumnFamily (BigTable) stores๏ Lineage: • “Bigtable:(2006) Data” A Distributed Storage System for Structured๏ Data model: • A big table, with column families๏ Examples: • HBase • HyperTable • Cassandra 29
  31. 31. Category III: Document databases๏ Lineage: • Lotus Notes๏ Data model: • Collections of documents • A document is a key-value collection๏ Examples: • CouchDB • MongoDB 30 Terrastore
  32. 32. Document db: An example๏ How would we model a blogging software?๏ One stab: • Represent each Blog as a Collection of Post documents • Represent Comments as nested documents in the Post documents 31
  33. 33. Document db: Creating a blog postimport com.mongodb.Mongo;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;// ...Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB// ...DB blogs = mongo.getDB( "blogs" ); // Access the blogs databaseDBCollection myBlog = blogs.getCollection( "Thobe’s blog" );DBObject blogPost = new BasicDBObject();blogPost.put( "title", "JFokus 2011" );blogPost.put( "pub_date", new Date() );blogPost.put( "body", "Publishing a post about JFokus in my MongoDB blog!" );blogPost.put( "tags", Arrays.asList( "conference", "names" ) );blogPost.put( "comments", new ArrayList() );myBlog.insert( blogPost ); 32
  34. 34. Retrieving posts// ...import com.mongodb.DBCursor;// ...public Object getAllPosts( String blogName ) { DBCollection blog = db.getCollection( blogName ); return renderPosts( blog.find() );}private Object renderPosts( DBCursor cursor ) { // order by publication date (descending) cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) ); // ...} 33
  35. 35. Category IV: Graph databases๏ Lineage: • Euler and graph theory๏ Data model: • Nodes with properties • Typed relationships with properties๏ Examples: • Sones GraphDB • InfiniteGraph 34 Neo4j
  36. 36. Property Graph model 35
  37. 37. Property Graph model LOVES LIVES WITH LOVES OWNS DRIVES 36
  38. 38. Property Graph model name: “Mary” LOVESname: “James” age: 35age: 32 LIVES WITHtwitter: “@spam” LOVES OWNS property type: “car” DRIVES brand: “Volvo” model: “V70” 37
  39. 39. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. Image credits: Tobias Ivarsson 38
  40. 40. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model thobe from the whiteboard is implemented directly. Joe project blog Wardrobe Strength Hello Joe Modularizing Jython Neo4j performance analysis Image credits: Tobias Ivarsson 39
  41. 41. Graph db: Creating a social graphGraphDatabaseService graphDb = new EmbeddedGraphDatabase( GRAPH_STORAGE_LOCATION );Transaction tx = graphDb.beginTx();try { Node mrAnderson = graphDb.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); Node morpheus = graphDb.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); Relationship friendship = mrAnderson.createRelationshipTo( morpheus, SocialGraphTypes.FRIENDSHIP ); tx.success();} finally { tx.finish();} 40
  42. 42. Graph db: How do I know this person?Node me = ...Node you = ...PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath( Traversals.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ), /* maximum depth: */ 4 );Path shortestPath = shortestPathFinder.findSinglePath(me, you);for ( Node friend : shortestPath.nodes() ) { System.out.println( friend.getProperty( "name" ) );} 41
  43. 43. Graph db: Recommend new friendsNode person = ...TraversalDescription friendsOfFriends = Traversal.description() .expand( Traversals.expanderForTypes( SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) ) .prune( Traversal.pruneAfterDepth( 2 ) ) .breadthFirst() // Visit my friends before their friends. //Visit a node at most once (don’t recommend direct friends) .uniqueness( Uniqueness.NODE_GLOBAL ) .filter( new Predicate<Path>() { // Only return friends of friends public boolean accept( Path traversalPos ) { return traversalPos.length() == 2; } } );for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) { System.out.println( recommendedFriend.getProperty("name") );} 42
  44. 44. Four emerging NOSQL categories ๏Key-Value stores ๏ColumnFamiy stores ๏Document databases ๏Graph databases 43
  45. 45. Scaling to size vs. Scaling to complexity Size Key/Value stores Complexity 44
  46. 46. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Complexity 44
  47. 47. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Complexity 44
  48. 48. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Graph databases Complexity 44
  49. 49. Scaling to size vs. Scaling to complexity Size Key/Value stores ColumnFamily stores Document databases Graph databases Billions of nodes and relationships My subjective view: > 90% of use cases Complexity 44
  50. 50. NOSQL challenges? 45
  51. 51. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”) 45
  52. 52. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”)๏ Tool support • Both devtime tools and runtime ops tools • Standards may help? • ... or maybe just time 45
  53. 53. NOSQL challenges?๏ Mindshare • But that’s also product usability (“how do you query it?”)๏ Tool support • Both devtime tools and runtime ops tools • Standards may help? • ... or maybe just time๏ Middleware support 45
  54. 54. Middleware support?๏ Lemme tell you the story about Mike and his restaurant site 46
  55. 55. Domain & data model Restaurant UserAccount @Entity @Entity public class Restaurant { @Table(name = "user_account") @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites; 47
  56. 56. Step 1: Buildsing a web site Tomcat One box MySQL 48
  57. 57. Step II: Whoa, ppl are actually using it? 49
  58. 58. Step II: Whoa, ppl are actually using it? Tomcat Two boxes MySQL 49
  59. 59. Step III: That’s a LOT of pages served... 50
  60. 60. Step III: That’s a LOT of pages served... Tomcat Tomcat Tomcat n boxes MySQL 1 box 50
  61. 61. Step IV: Our DB is completely overwhelmed... 51
  62. 62. Step IV: Our DB is completely overwhelmed... Tomcat Tomcat Tomcat n boxes MySQL (m) MySQL (s) n boxes 51
  63. 63. Step V: Our DBs are STILL overwhelmed ? 52
  64. 64. What does the site look like now? 53
  65. 65. Step V: Our DBs are STILL overwhelmed๏ Turns out the problem is due to joins๏ A while back Mike introduced a new feature • Recommend restaurants based on the user’s friends (and friends of friends) • Whoa, recommendations aren’t just simple get and put! • They’re killing us with joins๏ What about sharding?๏ What about SSDs? 54
  66. 66. Polyglot persistence (Not Only SQL)๏ How did we get into this situation?๏ Well, data sets are increasingly less uniform๏ Parts of Mike’s data fits well in an RDBMS๏ But parts of it is graph-shaped • It fits much better in a graph database! • And I’m sure that there is or will be very key-value-esque parts of the dataset๏ Simple, just store some of it in a graph db and some of it in MySQL! But what does the code look like? 55
  67. 67. We were here 56
  68. 68. We were here Restaurant UserAccount @Entity @Entity public class Restaurant { @Table(name = "user_account") @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites; 56
  69. 69. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 57
  70. 70. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  71. 71. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  72. 72. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  73. 73. Then we added recommendations Restaurant UserAccount @Entity @Entity @NodeEntity(partial = true) @Table(name = "user_account") public class Restaurant { @NodeEntity(partial = true) @Id @GeneratedValue public class UserAccount { private Long id; @Id @GeneratedValue private String name; private Long id; private String city; private String userName; private String state; private String firstName; private String zipCode; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) Recommendation private Set<Restaurant> favorites; @RelationshipEntity @GraphProperty public class Recommendation { String nickname; @StartNode @RelatedTo(type = "friends", private UserAccount user; elementClass = UserAccount.class) @EndNode Set<UserAccount> friends; private Restaurant restaurant; @RelatedToVia(type = "recommends", private int stars; elementClass = Recommendation.class) private String comment; Iterable<Recommendation> recommendations; 58
  74. 74. And we’re back to talking about Middleware 59
  75. 75. And we’re back to talking about Middleware๏ This example was from the Spring Data project • Specifically Spring Data Graph • Available at: http://www.springsource.org/spring-data • Currently in M02 -- will be released 1.0 end of March • Would LOVEkinda love,REALLY love, likeifValentine’s day user yesterday <--- like some feedback you’re a Spring 59
  76. 76. And we’re back to talking about Middleware๏ This example was from the Spring Data project • Specifically Spring Data Graph • Available at: http://www.springsource.org/spring-data • Currently in M02 -- will be released 1.0 end of March • Would LOVEkinda love,REALLY love, likeifValentine’s day user yesterday <--- like some feedback you’re a Spring๏ But this really should be available in any middleware stack • Ask Redhat / JBoss • Ask Sun / Oracle 59 Ask Microsoft
  77. 77. Conclusion๏ There’s an explosion of ‘nosql’ databases out there • Some are immature and experimental • Some are coming out of years of battle-hardened production๏ NOSQL is about finding the right tool for the job • Frequently that’s an RDBMS • But increasingly commonly an RDBMS is the perfect fit๏ We will have heterogenous data backends in the future • Now the restthatthe stack needs to step up and help developers cope with of 60
  78. 78. Key takeawayNot Only SQL is here 61
  79. 79. Key takeawayNot Only SQLis here to stay 62
  80. 80. Key takeaway Not Only SQLis FUN - dl & play around now! 63
  81. 81. Image credits: Lost! Sorry... :( 64
  82. 82. http://neotechnology.com

×