NOSQL Overview

Presented at JavaOne 2013, Wednesday September 25.

Published in: Technology, Design

  1. NOSQL Overview
     Tobias Lindaaker, Software Developer @ Neo Technology
     twitter: @thobe / @neo4j / #neo4j
     email: tobias@neotechnology.com
     web: http://neo4j.org/
     web: http://thobe.org/
     CON6449
  2. Agenda
     ๏ Key/Value Stores
     ๏ Document Databases
     ๏ NewSQL Databases
     ๏ Graph Databases
     ๏ Column Oriented Databases
     ๏ Caches
     ๏ Message Queues
     ๏ Hadoop
  3. General
  4. Two main categories
     ๏ Aggregate oriented
     ๏ Graph
     Distinction defined by Martin Fowler. Source: NoSQL Distilled
  5. Trend: Less uniformity
  6. Sparse data - Relational mismatch
     Wide table: one row per id (1337, 2468, 3145, 3579, 4468, 7878) and one column per attribute (α β γ δ ε ζ η θ ι κ λ μ π τ)
     Entity/key/value table:
     entity  key  value
     1337    a    lorem ipsum
     1337    b    lorem ipsum
     3145    b    lorem ipsum
     3578    a    lorem ipsum
     3579    f    lorem ipsum
     3579    j    lorem ipsum
     4468    c    lorem ipsum
     4468    f    lorem ipsum
     7878    g    lorem ipsum
     7878    f    lorem ipsum
  7. Sparse data - Relational mismatch
     Search Tables:
     id    foo          id    bar
     1337  bar          1337  foo
     2468  baz          2468  baz
     3145  quux         3145  quux
     3579  quux         3579  quux
     4468  waldo        4468  waldo
     7878  fred         7878  fred
     Data Table:
     id    data
     1337  {"foo":"bar", ...}
     2468  {"foo":"bar", ...}
     3145  {"foo":"bar", ...}
     3579  {"foo":"bar", ...}
     4468  {"foo":"bar", ...}
     7878  {"foo":"bar", ...}
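The mismatch on the two sparse data slides can be sketched in a few lines. This is only an illustration: the entity ids and keys are borrowed from the slides, while the function names `to_eav_rows` and `to_wide_rows` are made up for the example.

```python
# Hypothetical sparse entities: each one has only a few of the possible attributes.
entities = {
    1337: {"a": "lorem ipsum", "b": "lorem ipsum"},
    3145: {"b": "lorem ipsum"},
    4468: {"c": "lorem ipsum", "f": "lorem ipsum"},
}


def to_eav_rows(entities):
    """Flatten entities into (entity, key, value) rows."""
    return [
        (entity_id, key, value)
        for entity_id, attrs in sorted(entities.items())
        for key, value in sorted(attrs.items())
    ]


def to_wide_rows(entities):
    """One column per attribute; attributes an entity lacks become None."""
    columns = sorted({key for attrs in entities.values() for key in attrs})
    rows = [
        [entity_id] + [attrs.get(column) for column in columns]
        for entity_id, attrs in sorted(entities.items())
    ]
    return columns, rows


eav = to_eav_rows(entities)
columns, wide = to_wide_rows(entities)
# Count the cells that exist only because the table must be rectangular.
empty_cells = sum(cell is None for row in wide for cell in row)
```

The wide table wastes cells on attributes that are absent, the entity/key/value table needs one row per attribute, and the dictionary form holds exactly what each entity has.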
  8. Trend: Exponential data growth [chart; time axis 2005 to 2012]
  9. Trend: Data becomes more connected [chart: Connectedness over Time]
  10. Nothing is new - everything changes
      Then
      ๏ Navigational databases: IDS (Codasyl), IMS (IBM)
      ๏ Multivalued databases: PICK/BASIC
      ๏ Key/Value databases: MUMPS/M
      ๏ COPYBOOK: COBOL
      ๏ Object databases: Objectivity, db4o
      ๏ XML databases
      Now
      ๏ Graph databases: Neo4j
      ๏ Column databases: Cassandra
      ๏ Key/Value databases: Couchbase
      ๏ Document databases: MongoDB, Redis
      Still recent enough to not have “new” counterparts...
  11. Key/Value stores
  12. Key/Value stores
      ๏ Amazon SimpleDB
      ๏ memcached
      ๏ Oracle NoSQL Database
      ๏ Redis
  13-16. Key/Value stores [diagram, built up over four slides: nodes A, B, C, D, E, F, G]
  17. Sample use case: Content sharing
  18. Document Databases
  19. Document Databases
      ๏ Lotus Notes
      ๏ MongoDB
      ๏ Riak
      ๏ Redis
      ๏ CouchDB
  20-23. Document Databases [three example documents, built up over four slides]
      ‣ id: 99CC
      ‣ fname: John
      ‣ lname: Smith
      ‣ clock:
        ‣ type: Fob watch
        ‣ make: Gallifreyan
        ‣ diameter: 2”

      ‣ id: 1337
      ‣ fname: Martha
      ‣ lname: Jones
      ‣ occupation: MD

      ‣ id: 2468
      ‣ fname: Rose
      ‣ lname: Tyler
      ‣ in_love_with: 99CC
  24-25. Document Databases [diagram: a post document with embedded comments]
      post
        title: ___
        text: ___
        tags: [...]
        comments
          text: ___
          text: ___
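A minimal sketch of the post-with-comments aggregate from the slide, assuming a plain dictionary as a stand-in for a document collection. The `_id` key and the `save`/`load` helpers are illustrative names, not any particular database's API.

```python
import json

# A hypothetical blog post stored as one aggregate: the comments live inside it.
post = {
    "_id": "post-1",  # assumed key name for the document id
    "title": "___",
    "text": "___",
    "tags": ["nosql", "overview"],
    "comments": [{"text": "___"}, {"text": "___"}],
}

store = {}  # stand-in for a document collection, keyed by document id


def save(doc):
    """Write the whole aggregate as a single serialized unit."""
    store[doc["_id"]] = json.dumps(doc)


def load(doc_id):
    """One lookup returns the post together with its tags and comments."""
    return json.loads(store[doc_id])


save(post)
loaded = load("post-1")
```

Reading the post needs no join: everything that belongs to the aggregate comes back from one key lookup.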
  26. The rise of REST for databases
      ๏ It’s actually all about Hypermedia:
        • When one aggregate root references another
        • Not necessarily on the same host
        • Hyperlinks provide the desired decoupling, and can reference documents qualified by host
      ๏ HTTP, and the ease of developing client drivers for it, is a further driver
  27. NewSQL
  28. NewSQL defined
      ๏ Relational Databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases
      ๏ Automatic/Transparent sharding of data
      ๏ Distributed, Fault Tolerant, Highly Available
  29. NewSQL databases
      ๏ Google Spanner
      ๏ VoltDB
      ๏ TokuDB (MySQL engine)
      ๏ Clustrix
      ๏ RethinkDB
  30. Graph Databases
  31. Neo4j is a Graph Database
  32. Neo4j -[:IS_A]-> Graph Database
  33. Example Graph Databases
      ๏ Neo4j
      ๏ InfiniteGraph (by Objectivity)
      ๏ AllegroGraph (by Franz Inc.)
      ๏ HyperGraphDB
      ๏ InfoGrid
      ๏ DEX
      ๏ VertexDB
      ๏ FlockDB
  34-41. [Figure, built up over eight slides: an example graph with relationships labelled from, stole, companion, married, enemy, plays and in; node labels include “A Good Man Goes to War” and “Bad Wolf”]
  42. Graph Databases
  43-45. Querying Graph Databases (Neo4j)
      Graph Patterns: node A connected to node B by a LOVES relationship
      ASCII art: A -[:LOVES]-> B
      START A=node:person(name="A") MATCH A -[:LOVES]-> B RETURN B as lover
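To make the pattern concrete, here is a small sketch of what matching `A -[:LOVES]-> B` means. It scans a list of relationship triples, which is only an illustration and not how Neo4j evaluates a query; the data and the `match` function are made up for the example.

```python
# Relationships as (start, type, end) triples; the names are illustrative only.
relationships = [
    ("A", "LOVES", "B"),
    ("B", "LOVES", "C"),
    ("A", "KNOWS", "C"),
]


def match(start, rel_type, relationships):
    """Return every node x for which (start)-[:rel_type]->(x) exists."""
    return [
        end
        for rel_start, rel, end in relationships
        if rel_start == start and rel == rel_type
    ]


# Corresponds to binding A to a start node and returning B as lover.
lovers = match("A", "LOVES", relationships)
```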
  46. Column Oriented Databases
  47. Column Store
  48. Column Oriented Databases
      ๏ Cassandra
      ๏ BigTable (internal at Google)
      ๏ HBase (part of Hadoop)
      ๏ Hypertable
  49. Column DB - Classic example: Twitter clone
  50. Column Databases
      ๏ Use as underlying storage for a higher level data storage model
      ๏ E.g. a graph database model implemented on top of Cassandra
        • Notable example: Aurelius Titan
  51. Caches
  52. Caches - Improving Reads
      ๏ Read from cache first, only read from DB on cache miss
      ๏ Preferably cache aggregates, possibly after passing through App-level processing
      ๏ memcached - mainly a cache, though there have been attempts to reposition it as a NOSQL DB
        • other cache products have tried the same
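The read path described on this slide can be sketched as follows, assuming an in-process dictionary as a stand-in for a cache such as memcached. `CachedReader` and the `db_read` callback are hypothetical names for the example.

```python
class CachedReader:
    """Read from the cache first; go to the database only on a cache miss."""

    def __init__(self, db_read):
        self.db_read = db_read  # function: key -> value, reads from the database
        self.cache = {}         # in-process stand-in for a real cache
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.db_read(key)  # the only place the database is touched
        self.cache[key] = value    # cache the whole aggregate for later reads
        return value


database = {"user:1": {"name": "John"}}  # hypothetical data
reader = CachedReader(database.__getitem__)
first = reader.get("user:1")   # miss: read from the database
second = reader.get("user:1")  # hit: served from the cache
```

The sketch leaves out invalidation and expiry, which a real deployment has to decide on.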
  53. Message Queues
  54. Message Queues - Improving Writes
      ๏ Write to Queue, process work from Queue in batches
        • Alleviates transactional overhead by grouping writes
        • Still guarantees writes if the Queue has durability guarantees
        • Needs tx synchronization with DB (2PC)
      ๏ Writes not immediately visible, delayed through queue
        • Write-to-cache can be used to get around this, if a cache is used
      ๏ Amazon SQS
      ๏ RabbitMQ
      ๏ ZeroMQ
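A rough sketch of the batching idea, assuming an in-memory `deque` in place of a real queue. That means it has no durability and no 2PC, so it shows only the grouping of writes and the delayed visibility. `BatchingWriter` and `commit_batch` are names invented for the example.

```python
from collections import deque


class BatchingWriter:
    """Queue individual writes and commit them to the database in batches."""

    def __init__(self, commit_batch, batch_size=3):
        self.queue = deque()
        self.commit_batch = commit_batch  # function: list of writes -> None (one transaction)
        self.batch_size = batch_size

    def write(self, item):
        self.queue.append(item)  # returns immediately; not yet visible in the DB

    def drain(self):
        """Commit everything queued so far; return the number of transactions used."""
        transactions = 0
        while self.queue:
            size = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(size)]
            self.commit_batch(batch)
            transactions += 1
        return transactions


committed = []  # stand-in for the database: one entry per committed transaction
writer = BatchingWriter(committed.append, batch_size=3)
for i in range(7):
    writer.write(i)
visible_before = list(committed)  # nothing has reached the database yet
tx_count = writer.drain()
```

Seven writes end up in three transactions instead of seven, at the cost of not being readable until the queue has been drained.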
  55-58. Hadoop - Big Data processing [diagram, built up over four slides; labels: Oracle, Neo4j, Cassandra, Map Reduce]
  59. Hadoop - Data Analysis/Processing
      ๏ Batch process large amounts of data, typically offline or semi-online, not for interactive querying
      ๏ Ingest data from your DB, process and generate report
        • Ex. Read Neo4j graph, generate centrality analysis report
      ๏ Ingest data from event stream, process and generate data for DB
        • Ex. Read access logs, create Neo4j data for security analysis
      ๏ Ingest data from one DB, process and generate data for another
        • Ex. Read MySQL transaction logs, create Neo4j data for query acceleration
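The map and reduce steps can be sketched in plain Python on a toy access log. This runs in a single process and does not use Hadoop's API; the log format and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical access log lines: "<ip> <method> <path>".
access_log = [
    "10.0.0.1 GET /login",
    "10.0.0.2 GET /home",
    "10.0.0.1 GET /admin",
]


def map_phase(line):
    """Emit one (ip, 1) pair per log line."""
    ip, _method, _path = line.split()
    yield ip, 1


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run(lines):
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)  # shuffle step: group values by key
    return dict(reduce_phase(key, values) for key, values in groups.items())


requests_per_ip = run(access_log)
```

The resulting per-IP counts are the kind of derived data that a batch job could then load into another database.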
  60. More DB history
  61. Building Databases is hard
      ๏ The current NOSQL wave took off in 2009
      ๏ ... many much older databases still have issues...
      ๏ Most likely there will be issues
      ๏ https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)
        • ... most distributed databases fail in the event of Partitions
      ๏ Test, Test, Test, and Test
        • Test the database heavily before you put it in production
        • Test for your use cases - generic benchmarks are useless
        • Test with real load
        • Test continuously
  62. Serious Database Vendors take Data Seriously
      ๏ Make sure to test their product under “real” load
      ๏ Make sure to test their product in the event of failures
      ๏ But you still need to Test!
      ๏ Report issues to the vendor
      ๏ Data loss is too embarrassing - will be fixed!
      ๏ Performance is important - you’ll be heard!
  63. Polyglot Persistence: combining multiple databases
  64. Polyglot Persistence - Multiple DBs
      ๏ Real world examples:
        • RDBMS as system of record, Neo4j for accelerating (join) queries
        • Neo4j for storing metadata and structure, Cassandra for storing event logs, S3 for storing BLOB data
  65. Conclusion
  66. It is all about modelling
      Simplify the world enough
      ‣ to reason about
      ‣ to store and process
  67. Model mis-match [diagram: Real World, Model]
  68. Complex problem? - right tool for each job!
      Image credits: Unknown :’(
  69. Key/Value stores
      ๏ Examples:
        • Amazon SimpleDB, memcached, Oracle NoSQL, Redis
      ๏ Use when Data is opaque
      ๏ Scalability is important
      ๏ Scale simply with the addition of more servers
        • rebalance equally simply
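One common way to get this kind of simple rebalancing is consistent hashing. The slides do not say which scheme any of the listed stores use, so treat this as an illustrative sketch. `HashRing` is a hypothetical class with one ring position per server; real systems usually add virtual nodes to even out the load.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers):
        self._ring = sorted((_hash(server), server) for server in servers)

    def add(self, server):
        bisect.insort(self._ring, (_hash(server), server))

    def server_for(self, key):
        """A key belongs to the first server clockwise from the key's position."""
        positions = [position for position, _ in self._ring]
        index = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = ["key-%d" % i for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {key: ring.server_for(key) for key in keys}
ring.add("D")
after = {key: ring.server_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Adding a server only moves keys onto the new server. Keys never move between the existing servers, which is what keeps the rebalance simple.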
  70. Document Databases
      ๏ Examples:
        • MongoDB, Riak
      ๏ Use when data is collections of similar entities
        • But semi structured (sparse) rather than tabular
        • When fields in entries have multiple values
  71. Column Family Databases
      ๏ Examples:
        • Cassandra
      ๏ Use when scalability is the main issue
        • Both scaling size and scaling load
          ‣ In particular scaling write load
      ๏ Linear scalability (as you add servers) both in read and write
      ๏ Low level - will require you to duplicate data to support queries
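What duplicating data to support queries can look like, using the Twitter clone from the earlier slide as the example. Dictionaries stand in for column families here. The names and the fan-out-on-write design are assumptions for the sketch, not something the slides prescribe.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "bob": ["carol"]}  # hypothetical data

tweets_by_author = defaultdict(list)  # row key: author
timeline_by_user = defaultdict(list)  # row key: reader; holds copies of tweets


def post_tweet(author, text):
    """Write the tweet once per query it has to serve."""
    tweets_by_author[author].append(text)
    for follower in followers.get(author, []):
        # One duplicate per follower, so reading a timeline is a single row lookup.
        timeline_by_user[follower].append((author, text))


post_tweet("alice", "hello")
post_tweet("bob", "hi")
carol_timeline = timeline_by_user["carol"]
```

Each query gets its own pre-built row, so the write does more work and the read needs no join.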
  72. Graph Databases
      ๏ Examples:
        • Neo4j, DEX, InfiniteGraph
      ๏ Use when (deep) traversals are important
      ๏ For complex domains
      ๏ When how entities relate is an important aspect of the domain
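A deep traversal, sketched as a breadth-first search over an in-memory adjacency list. The graph and the `within` function are made up for the example. A graph database runs this kind of query against its own storage rather than a Python dictionary.

```python
from collections import deque

graph = {  # adjacency list; node names are illustrative
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def within(start, max_depth):
    """Nodes reachable from start in at most max_depth hops, with their depth."""
    depth = {start: 0}
    todo = deque([start])
    while todo:
        node = todo.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                todo.append(neighbour)
    del depth[start]
    return depth


reachable = within("A", 2)
```

Each extra hop is one more round of following relationships, rather than one more self-join over a table.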
  73. When not to use a NOSQL Database
      ๏ RDBMSes have been the de-facto standard for years, and still have better tools for some tasks
        • Especially for reporting
      ๏ When maintaining a system that works already
      ๏ Sometimes when data is uniform / structured
      ๏ When aggregations over (subsets of) the entire dataset are key
      ๏ But please don’t use a Relational database for persisting objects
  74. Questions? http://neotechnology.com
