NOSQL Overview

Presented at JavaOne 2013, Wednesday September 25.

Usage rights: CC Attribution License

NOSQL Overview: Presentation Transcript

  • NOSQL Overview (CON6449)
    Tobias Lindaaker, Software Developer @ Neo Technology
    twitter: @thobe / @neo4j / #neo4j
    email: tobias@neotechnology.com
    web: http://neo4j.org/
    web: http://thobe.org/
  • Agenda
    ๏ Key/Value Stores
    ๏ Document Databases
    ๏ NewSQL Databases
    ๏ Graph Databases
    ๏ Column Oriented Databases
    ๏ Caches
    ๏ Message Queues
    ๏ Hadoop
  • General
  • Two main categories: Aggregate oriented and Graph
    Distinction defined by Martin Fowler (Source: NoSQL Distilled)
  • Trend: Less uniformity
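  • Sparse data - Relational mismatch
    [Figure: a wide table keyed by id (1337, 2468, 3145, 3579, 4468, 7878) with one column per attribute (α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ, μ, π, τ)]
    Alternative: entity/key/value rows
    entity | key | value
    1337   | a   | lorem ipsum
    1337   | b   | lorem ipsum
    3145   | b   | lorem ipsum
    3578   | a   | lorem ipsum
    3579   | f   | lorem ipsum
    3579   | j   | lorem ipsum
    4468   | c   | lorem ipsum
    4468   | f   | lorem ipsum
    7878   | g   | lorem ipsum
    7878   | f   | lorem ipsum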
  • Sparse data - Relational mismatch
    Data Table:
    id   | data
    1337 | {"foo":"bar", ...}
    2468 | {"foo":"bar", ...}
    3145 | {"foo":"bar", ...}
    3579 | {"foo":"bar", ...}
    4468 | {"foo":"bar", ...}
    7878 | {"foo":"bar", ...}
    Search Tables:
    id   | foo
    1337 | bar
    2468 | baz
    3145 | quux
    3579 | quux
    4468 | waldo
    7878 | fred

    id   | bar
    1337 | foo
    2468 | baz
    3145 | quux
    3579 | quux
    4468 | waldo
    7878 | fred
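The slides show sparse attributes stored as entity/key/value rows, as JSON documents, and as separate search tables. The sketch below makes that relationship concrete, with two assumptions: the rows are `(entity, key, value)` tuples taken from the table above, and both helpers, `eav_to_documents` and `build_search_table`, are hypothetical names rather than part of any database API.

```python
from collections import defaultdict


def eav_to_documents(rows):
    """Group (entity, key, value) rows into one dict per entity.

    An entity simply has no entry for attributes it lacks, so sparse
    data does not produce the empty cells of a wide relational table.
    """
    docs = defaultdict(dict)
    for entity, key, value in rows:
        docs[entity][key] = value
    return dict(docs)


def build_search_table(docs, field):
    """Secondary index ("search table"): field value -> sorted ids having it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return {value: sorted(ids) for value, ids in index.items()}


rows = [
    (1337, "a", "lorem ipsum"),
    (1337, "b", "lorem ipsum"),
    (3145, "b", "lorem ipsum"),
    (3579, "f", "lorem ipsum"),
    (3579, "j", "lorem ipsum"),
]
docs = eav_to_documents(rows)
print(docs[1337])  # {'a': 'lorem ipsum', 'b': 'lorem ipsum'}
print(build_search_table(docs, "b"))  # {'lorem ipsum': [1337, 3145]}
```

The search table is derived data. It has to be rebuilt or updated whenever a document changes, which is the maintenance cost the slide's layout implies.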
  • Trend: Exponential data growth
    [Chart: data growth per year, 2005 to 2012]
  • Trend: Data becomes more connected
    [Chart: connectedness over time]
  • Nothing is new - everything changes
    Then:
    ๏ Navigational databases: IDS (Codasyl), IMS (IBM)
    ๏ Multivalued databases: PICK/BASIC
    ๏ Key/Value databases: MUMPS/M
    ๏ COPYBOOK (COBOL)
    ๏ Object databases: Objectivity, db4o
    ๏ XML databases
    Now:
    ๏ Graph databases: Neo4j
    ๏ Column databases: Cassandra
    ๏ Key/Value databases: Couchbase
    ๏ Document databases: MongoDB, Redis
    Still recent enough not to have “new” counterparts...
  • Key/Value stores
  • Key/Value stores
    ๏ Amazon SimpleDB
    ๏ memcached
    ๏ Oracle NoSQL Database
    ๏ Redis
  • Key/Value stores
    [Diagram: items A, B, C, D, E, F, G]
  • Sample use case: Content sharing
  • Document Databases
  • Document Databases
    ๏ Lotus Notes
    ๏ MongoDB
    ๏ Riak
    ๏ Redis
    ๏ CouchDB
  • Document Databases
    ‣ id: 99CC
    ‣ fname: John
    ‣ lname: Smith
    ‣ clock:
      ‣ type: Fob watch
      ‣ make: Gallifreyan
      ‣ diameter: 2"

    ‣ id: 1337
    ‣ fname: Martha
    ‣ lname: Jones
    ‣ occupation: MD

    ‣ id: 2468
    ‣ fname: Rose
    ‣ lname: Tyler
    ‣ in_love_with: 99CC
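The three documents above do not share a field set, and one of them holds another document's id (`in_love_with: 99CC`). Below is a minimal in-memory sketch of querying such schema-free documents and following that reference. It assumes a plain dict keyed by id stands in for the database. `find_with_field` and `resolve` are hypothetical helpers, not a real document-store API.

```python
# The slide's documents, keyed by id. Each document has its own set of fields.
people = {
    "99CC": {"fname": "John", "lname": "Smith",
             "clock": {"type": "Fob watch", "make": "Gallifreyan",
                       "diameter": '2"'}},
    "1337": {"fname": "Martha", "lname": "Jones", "occupation": "MD"},
    "2468": {"fname": "Rose", "lname": "Tyler", "in_love_with": "99CC"},
}


def find_with_field(docs, field):
    """Return the ids of documents that contain `field` at all."""
    return sorted(doc_id for doc_id, doc in docs.items() if field in doc)


def resolve(docs, doc_id, ref_field):
    """Follow the id stored in `ref_field` to the referenced document.

    The "join" happens in application code: one extra lookup per reference.
    Returns None when the document has no such reference.
    """
    ref = docs[doc_id].get(ref_field)
    return docs.get(ref) if ref is not None else None


print(find_with_field(people, "occupation"))               # ['1337']
print(resolve(people, "2468", "in_love_with")["fname"])    # John
```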
  • Document Databases
    post
      title: ___
      text: ___
      tags: [...]
      comments
        text: ___
        text: ___
  • The rise of REST for databases
    ๏ It’s actually all about Hypermedia:
      • When one aggregate root references another
      • Not necessarily on the same host
      • Hyperlinks provide the desired decoupling, and can reference documents qualified by host
    ๏ HTTP, and the ease of developing client drivers for it, is a further driver
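The hypermedia point is that a reference is a URL, so the referenced aggregate can live on another host. The sketch below shows link following under three assumptions:

- A dict keyed by absolute URL stands in for HTTP GET, so the example runs offline.
- The hosts under `example.org` / `example.net` and the `links` field layout are hypothetical.
- `follow` is an illustrative helper, not any particular database's REST API.

```python
from urllib.parse import urljoin

# Hypothetical documents on two hosts. A real client would GET these URLs.
STORE = {
    "http://people.example.org/person/2468": {
        "fname": "Rose",
        "links": {
            # Absolute link: the referenced aggregate lives on another host.
            "in_love_with": "http://archive.example.net/person/99CC",
            # Relative link: resolved against the current document's URL.
            "self": "/person/2468",
        },
    },
    "http://archive.example.net/person/99CC": {"fname": "John", "links": {}},
}


def fetch(url):
    """Stand-in for an HTTP GET that returns a JSON document."""
    return STORE[url]


def follow(doc_url, rel):
    """Follow the hyperlink named `rel` in the document at `doc_url`."""
    href = fetch(doc_url)["links"][rel]
    target = urljoin(doc_url, href)  # absolute hrefs are returned unchanged
    return target, fetch(target)


url, doc = follow("http://people.example.org/person/2468", "in_love_with")
print(url, doc["fname"])  # http://archive.example.net/person/99CC John
```

The client only needs the link, not knowledge of where each aggregate is stored. This is the decoupling the slide attributes to hyperlinks.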
  • NewSQL
  • NewSQL defined
    ๏ Relational databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases
    ๏ Automatic/Transparent sharding of data
    ๏ Distributed, Fault Tolerant, Highly Available
  • NewSQL databases
    ๏ Google Spanner
    ๏ VoltDB
    ๏ TokuDB (MySQL engine)
    ๏ Clustrix
    ๏ RethinkDB
  • Graph Databases
  • Neo4j is a Graph Database
    (Neo4j) -[:IS_A]-> (Graph Database)
  • Example Graph Databases
    ๏ Neo4j
    ๏ InfiniteGraph (by Objectivity)
    ๏ AllegroGraph (by Franz Inc.)
    ๏ HypergraphDB
    ๏ InfoGrid
    ๏ DEX
    ๏ VertexDB
    ๏ FlockDB
  • [Diagram: an example graph with relationships from, stole, plays, companion, enemy, married and in; named nodes include “A Good Man Goes to War” and “Bad Wolf”]
  • Graph Databases
  • Querying Graph Databases (Neo4j)
    Graph Patterns: (A) -[:LOVES]-> (B)
    ASCII art: A -[:LOVES]-> B
    START A=node:person(name="A")
    MATCH A-[:LOVES]->B
    RETURN B as lover
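To show what the Cypher query above does step by step, here is a plain-Python sketch of the same pattern match over a tiny in-memory graph. Several parts are assumptions for illustration and are not how Neo4j is implemented:

- Node `C` and its `KNOWS` relationship are hypothetical.
- The dict `person_index` stands in for the `node:person` index used by `START`.
- `match_outgoing` is an illustrative name.

```python
# Nodes with properties, plus each node's own list of outgoing relationships.
nodes = {1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "C"}}
outgoing = {1: [("LOVES", 2)], 3: [("KNOWS", 1)]}

# Stand-in for the index queried by START A=node:person(name="A").
person_index = {props["name"]: node_id for node_id, props in nodes.items()}


def match_outgoing(name, rel_type="LOVES"):
    """Names of all B such that (A)-[:rel_type]->(B), where A has `name`."""
    a = person_index[name]                       # START: find A via the index
    return [nodes[b]["name"]                     # RETURN B as lover
            for rtype, b in outgoing.get(a, [])  # MATCH: walk A's relationships
            if rtype == rel_type]


print(match_outgoing("A"))  # ['B']
```

After the index lookup, the match only walks A's own relationships instead of scanning every relationship in the store.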
  • Column Oriented Databases
  • Column Store
  • Column Oriented Databases
    ๏ Cassandra
    ๏ BigTable (internal at Google)
    ๏ HBase (part of Hadoop)
    ๏ Hypertable
  • Column DB - Classic example: Twitter clone
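The slide names the Twitter clone without showing its design. The sketch below models one common way to build it on a column store: one wide row per user, column names ordered by time, and each tweet copied into every follower's timeline row at write time. These are assumptions, not the talk's design. Nested dicts stand in for column families, and the users and the `post` / `read_timeline` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical follower lists: author -> users who follow them.
followers = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []}

# Two "column families": row key -> {column name: value}.
# Column names are (timestamp, author) so they sort by time and do not
# collide when two authors post at the same timestamp.
tweets = defaultdict(dict)     # row key: author
timelines = defaultdict(dict)  # row key: reader


def post(author, ts, text):
    """Write once to the author's row and once to each follower's row.

    Denormalised fan-out on write: reading a timeline needs no join.
    """
    tweets[author][(ts, author)] = text
    for reader in followers[author]:
        timelines[reader][(ts, author)] = text


def read_timeline(user, limit=10):
    """Newest-first slice of a single row."""
    row = timelines[user]
    return [(author, row[(ts, author)])
            for ts, author in sorted(row, reverse=True)[:limit]]


post("alice", 1, "hello")
post("bob", 2, "hi alice")
print(read_timeline("carol"))  # [('bob', 'hi alice'), ('alice', 'hello')]
```

This is the duplication the closing slides warn about. Each tweet is stored once per follower so that each timeline read touches one row.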
  • Column Databases
    ๏ Use as underlying storage for a higher-level data storage model
    ๏ E.g. a graph database model implemented on top of Cassandra
      • Notable example: Aurelius Titan
  • Caches
  • Caches - Improving Reads
    ๏ Read from cache first, only read from DB on cache miss
    ๏ Preferably cache aggregates, possibly after passing through app-level processing
    ๏ memcached: mainly a cache, though there have been attempts to reposition it as a NOSQL DB
      • as there have been with other cache products
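The read path described above is often called cache-aside. Here is a minimal sketch under stated assumptions: an in-process dict stands in for memcached (there is no expiry or eviction), and `load_post` is a hypothetical loader that assembles a post aggregate. It follows the slide's advice to cache whole aggregates after app-level processing.

```python
class CacheAside:
    """Check the cache first; fall back to the database on a miss."""

    def __init__(self, load_aggregate):
        self._cache = {}
        self._load = load_aggregate
        self.db_reads = 0  # number of cache misses, for illustration

    def get(self, key):
        if key in self._cache:        # 1. read from cache first
            return self._cache[key]
        self.db_reads += 1            # 2. cache miss: read from the DB
        value = self._load(key)
        self._cache[key] = value      # 3. keep the assembled aggregate
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying data is written."""
        self._cache.pop(key, None)


def load_post(post_id):
    # Hypothetical loader standing in for several DB queries plus app logic.
    return {"id": post_id, "title": "Hello", "comments": ["first!"]}


cache = CacheAside(load_post)
cache.get(42)
cache.get(42)
print(cache.db_reads)  # 1
```

Invalidating on write (rather than updating the cached value in place) keeps the sketch simple. The next read repopulates the entry from the database.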
  • Message Queues
  • Message Queues - Improving Writes
    ๏ Write to Queue, process work from Queue in batches
      • Alleviates transactional overhead by grouping writes
      • Still guarantees writes if the Queue has durability guarantees
      • Needs tx synchronization with DB (2PC)
    ๏ Writes not immediately visible, delayed through queue
      • Write-to-cache can be used to get around this, if a cache is used
    ๏ Amazon SQS
    ๏ RabbitMQ
    ๏ ZeroMQ
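Batching queued writes can be sketched with the standard library's `queue.Queue`. That queue is in-process and not durable. It stands in here for a durable broker such as the ones listed above, whose acknowledgement and transaction-synchronisation details are not modelled. `drain_in_batches` and the `write_batch` callback are hypothetical; in practice `write_batch` would run one DB transaction per batch.

```python
import queue


def drain_in_batches(q, write_batch, batch_size=100):
    """Take queued writes off `q` and hand them to `write_batch` in groups,
    so one DB transaction covers up to `batch_size` writes."""
    batch = []
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            write_batch(batch)
            batch = []
    if batch:                 # flush the final, partial batch
        write_batch(batch)


work = queue.Queue()
for i in range(250):
    work.put({"id": i})       # producers enqueue instead of writing to the DB

batch_sizes = []
drain_in_batches(work, lambda b: batch_sizes.append(len(b)))
print(batch_sizes)  # [100, 100, 50]
```

Until a batch is flushed, its writes are not visible in the database. This is the delay the slide mentions, and a write-to-cache step could cover it.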
  • Hadoop - Big Data processing
    [Diagram: Oracle, Neo4j, Cassandra; Map Reduce]
  • Hadoop - Data Analysis/Processing
    ๏ Batch process large amounts of data, typically offline or semi-online; not for interactive querying
    ๏ Ingest data from your DB, process and generate report
      • Ex. Read Neo4j graph, generate centrality analysis report
    ๏ Ingest data from event stream, process and generate data for DB
      • Ex. Read access logs, create Neo4j data for security analysis
    ๏ Ingest data from one DB, process and generate data for another
      • Ex. Read MySQL transaction logs, create Neo4j data for query acceleration
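The "read access logs, create Neo4j data" example follows the map/shuffle/reduce shape. The single-process sketch below shows that shape; Hadoop distributes these same phases across machines. Two things are assumptions: the log format `IP METHOD PATH` is hypothetical, and so is the idea of turning each `(ip, path)` count into a graph relationship such as `(ip)-[:ACCESSED {count}]->(path)`.

```python
from collections import defaultdict


def map_phase(line):
    """Emit ((ip, path), 1) for a log line of the form 'IP METHOD PATH'."""
    ip, _method, path = line.split()
    yield (ip, path), 1


def reduce_phase(key, values):
    """Sum the counts emitted for one (ip, path) key."""
    return key, sum(values)


def run_map_reduce(lines):
    grouped = defaultdict(list)            # shuffle: group values by key
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


logs = [
    "10.0.0.1 GET /login",
    "10.0.0.1 GET /login",
    "10.0.0.2 GET /admin",
]
counts = run_map_reduce(logs)
print(counts[("10.0.0.1", "/login")])  # 2
```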
  • More DB history
  • Building Databases is hard
    ๏ The current NOSQL wave took off in 2009
    ๏ ... many much older databases still have issues...
    ๏ Most likely there will be issues
    ๏ https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)
      • ... most distributed databases fail in the event of partitions
    ๏ Test, Test, Test, and Test
      • Test the database heavily before you put it in production
      • Test for your use cases - generic benchmarks are useless
      • Test with real load
      • Test continuously
  • Serious Database Vendors take Data Seriously
    ๏ Make sure to test their product under “real” load
    ๏ Make sure to test their product in the event of failures
    ๏ But you still need to Test!
    ๏ Report issues to the vendor
    ๏ Data loss is too embarrassing - will be fixed!
    ๏ Performance is important - you’ll be heard!
  • Polyglot Persistence: combining multiple databases
  • Polyglot Persistence - Multiple DBs
    ๏ Real world examples:
      • RDBMS as system of record, Neo4j for accelerating (join) queries
      • Neo4j for storing metadata and structure, Cassandra for storing event logs, S3 for storing BLOB data
  • Conclusion
  • It is all about modelling
    Simplify the world enough
    ‣ to reason about
    ‣ to store and process
  • Model mismatch
    [Diagram: Real World, Model]
  • Complex problem? Right tool for each job!
    Image credits: Unknown :’(
  • Key/Value stores
    ๏ Examples:
      • Amazon SimpleDB, memcached, Oracle NoSQL, Redis
    ๏ Use when data is opaque
    ๏ Scalability is important
    ๏ Scale simply with the addition of more servers
      • rebalance equally simply
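"Scale by adding servers, rebalance simply" is often achieved with consistent hashing. This is an assumption: the slides do not say how any listed product partitions its keys. In the sketch below, each server is placed on a hash ring at many virtual positions, and a key belongs to the first position at or after its own hash. Adding a server therefore moves only the keys it takes over. The server names and the `HashRing` class are hypothetical.

```python
import bisect
import hashlib


def ring_position(value):
    """Stable ring position (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes: a key belongs to the first
    virtual node at or after its position, wrapping around the ring."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def server_for(self, key):
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}

ring.add("db4")  # scale out by one server
moved = [k for k in keys if ring.server_for(k) != before[k]]
# Every key that moved now lives on the new server; all others stay put.
print(all(ring.server_for(k) == "db4" for k in moved))  # True
```

With plain `hash(key) % n_servers`, going from three to four servers would instead reassign most keys. That is why a ring is a common choice when rebalancing has to stay cheap.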
  • Document Databases
    ๏ Examples:
      • MongoDB, Riak
    ๏ Use when data is collections of similar entities
      • But semi-structured (sparse) rather than tabular
      • When fields in entries have multiple values
  • Column Family Databases
    ๏ Examples:
      • Cassandra
    ๏ Use when scalability is the main issue
      • Both scaling size and scaling load
        ‣ In particular scaling write load
    ๏ Linear scalability (as you add servers) for both reads and writes
    ๏ Low level - will require you to duplicate data to support queries
  • Graph Databases
    ๏ Examples:
      • Neo4j, DEX, InfiniteGraph
    ๏ Use when (deep) traversals are important
    ๏ For complex domains
    ๏ When how entities relate is an important aspect of the domain
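A "(deep) traversal" means following relationships hop by hop to some depth. The sketch below is a breadth-first traversal limited by relationship type and depth. Its assumptions: the small social graph and the `traverse` helper are hypothetical, and an adjacency dict stands in for a graph store. Each hop reads only the current node's own relationships. A relational schema would typically express each extra hop as another self-join.

```python
from collections import deque


def traverse(graph, start, rel_types, max_depth):
    """Breadth-first traversal following only `rel_types` relationships.

    Returns {node: hops from start} for every node reached within
    `max_depth` hops; each hop reads only the current node's relationships.
    """
    depth_of = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        depth = depth_of[node]
        if depth == max_depth:
            continue
        for rel, neighbour in graph.get(node, []):
            if rel in rel_types and neighbour not in depth_of:
                depth_of[neighbour] = depth + 1
                frontier.append(neighbour)
    return depth_of


# Hypothetical graph: node -> [(relationship type, other node)].
graph = {
    "ann": [("KNOWS", "ben"), ("WORKS_WITH", "eve")],
    "ben": [("KNOWS", "cat")],
    "cat": [("KNOWS", "dan")],
}
print(traverse(graph, "ann", {"KNOWS"}, max_depth=2))
# {'ann': 0, 'ben': 1, 'cat': 2}
```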
  • When not to use a NOSQL Database
    ๏ RDBMSes have been the de-facto standard for years, and still have better tools for some tasks
      • Especially for reporting
    ๏ When maintaining a system that works already
    ๏ Sometimes when data is uniform / structured
    ๏ When aggregations over (subsets of) the entire dataset are key
    ๏ But please don’t use a relational database for persisting objects
  • Questions?
    http://neotechnology.com