NOSQL for Dummies

NOSQL introduction/overview session presented at Miracle Open World 2010, at Hotel Legoland in Denmark.

Presentation Transcript

  • NOSQL for Dummies
    Tobias Ivarsson, Hacker @ Neo Technology
    twitter: @thobe / #neo4j
    email: tobias@neotechnology.com
    web: http://www.neo4j.org/ and http://www.thobe.org/
  • This is still the view a lot of people have of NOSQL. (Image credit: http://browsertoolkit.com/fault-tolerance.png)
  • NOSQL: defined by what it is not
    • “Any database that is not a relational database”
    • The term was coined at a meetup with the creators of some prominent emerging databases
    • “Non-relational databases” might be more correct, but it’s a mouthful!
    • ... then there was a conference ...
    • ... and a mailing list ...
    • ... the name caught on ...
    • ... then there were more conferences ...
    • ... and here we are!
  • NOSQL: what’s in the name?
  • NO to SQL? It’s not about saying that SQL should never be used, or that SQL is dead...
  • Not Only SQL: it’s about recognizing that for some problems, other storage solutions are better suited!
  • NOSQL: why now? Four trends
  • Trend 1: Data size
    • Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.
    • Exabytes (10¹⁸ bytes) of data stored per year (data source: IDC 2007):
        Year:  2006  2007  2008  2009  2010
        EB:     161   253   397   623   988
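The growth on this slide can be put into numbers. The following minimal Python sketch uses only the five chart values above; the steady-exponential-growth model in its second half is an assumption of this sketch, not something the slide states.

```python
import math

# Exabytes stored per year, as read off the IDC 2007 chart on this slide
eb_per_year = {2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988}

years = sorted(eb_per_year)
first, last = years[0], years[-1]

# Year-over-year growth factors, e.g. 2007 vs. 2006
yoy = {y: eb_per_year[y] / eb_per_year[y - 1] for y in years[1:]}

# Compound annual growth rate over the whole period
cagr = (eb_per_year[last] / eb_per_year[first]) ** (1 / (last - first)) - 1
g = 1 + cagr

# Assumption: steady geometric growth by factor g, stretching back indefinitely.
# With x = the latest year's volume, the last two years hold x * (1 + 1/g),
# while all earlier years together hold x / (g * (g - 1)).
last_two = 1 + 1 / g
all_before = 1 / (g * (g - 1))

print(f"CAGR: {cagr:.1%}, doubling time: {math.log(2) / math.log(g):.2f} years")
print(f"last two years: {last_two:.2f}x, all earlier years: {all_before:.2f}x")
```

With these values the compound growth rate comes out at roughly 57% a year, a doubling time of about a year and a half. Under that idealized model the two most recent years hold more data than all earlier years combined, which is consistent with the slide's claim.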
  • Trend 2: Connectedness
    • Over time, data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingbacks, tagging groups all related data.
    • [Figure: information connectivity over time, 1990 to 2020, progressing from text documents and hypertext through RSS, blogs, wikis, user-generated content, tagging and folksonomies to RDF, ontologies and the Giant Global Graph (GGG); the eras are labeled web 1.0, web 2.0 and “web 3.0”.]
  • Trend 3: Semi-structure
    • Individualization of content
      ‣ In the salary lists of the 1970s, all elements had exactly one job
      ‣ In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
    • All-encompassing “entire world views”
      ‣ Store more data about each entity
    • Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
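To make the salary-list example concrete, here is a minimal Python sketch contrasting a fixed-column row with a semi-structured record. The five job columns, the function names and the people are hypothetical.

```python
# Hypothetical relational-style schema: a fixed number of job columns, chosen up front
JOB_COLUMNS = ("job1", "job2", "job3", "job4", "job5")

def to_fixed_row(name, jobs):
    """Fit a person into the fixed schema, padding unused job columns with None."""
    if len(jobs) > len(JOB_COLUMNS):
        raise ValueError(
            f"{name} has {len(jobs)} jobs, but the schema has {len(JOB_COLUMNS)} job columns"
        )
    padded = list(jobs) + [None] * (len(JOB_COLUMNS) - len(jobs))
    return {"name": name, **dict(zip(JOB_COLUMNS, padded))}

def to_record(name, jobs, **extra):
    """Semi-structured record: any number of jobs, plus any extra fields we happen to know."""
    return {"name": name, "jobs": list(jobs), **extra}

print(to_fixed_row("Alice", ["developer"]))
print(to_record("Bob", [f"job {i}" for i in range(8)], twitter="@bob"))
```

The fixed schema has to be altered (or abused) whenever someone has more jobs than there are columns, while the semi-structured record simply grows.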
  • Trend 4: Architecture
    • 1980s: Mainframe applications (one application, one DB)
    • 1990s: Database as integration hub (several applications sharing one DB)
    • 2000s: (moving towards) Decoupled services with their own backend (each application with its own DB)
  • Why NOSQL now?
    • Trend 1: Size
    • Trend 2: Connectedness
    • Trend 3: Semi-structure
    • Trend 4: Architecture
  • RDBMS performance
    • We are building applications today that have size and load requirements that ...
    • [Figure: performance vs. data complexity, comparing relational database performance with the requirements of applications: salary list, majority of web apps, social network, semantic trading, custom.]
  • Four emerging NOSQL categories
  • Key-Value stores
    • Focus on scaling to huge amounts of data
    • Designed to handle massive load
    • Based on Amazon’s Dynamo paper
    • Data model: (global) collection of key-value pairs
    • Dynamo ring partitioning and replication
    • Examples:
      ‣ Dynomite
      ‣ Voldemort
      ‣ Tokyo {Tyrant, Cabinet, etc...}
  • Key-Value stores
    • We find the position of each object by its key. Here the keys are the names of the objects, alphabetically sorted.
    • Each object is replicated in a few other stores for redundancy; in this example we use 3 replicas.
    • [Figure: a ring of stores labeled A through G.]
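The ring above can be sketched in a few lines of Python. This is a minimal illustration loosely following the Dynamo paper, not code from any of the stores listed: node names and keys are hashed with MD5 onto one ring (instead of the alphabetical placement used in the slide's example), a key belongs to the first node clockwise from its position, and its 3 replicas are that node plus the next two. The class `Ring` and the method `preference_list` are hypothetical names; virtual nodes, failure handling and rebalancing are left out.

```python
import bisect
import hashlib

def ring_position(name):
    """Map a node name or key onto a 128-bit ring using MD5."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")

class Ring:
    def __init__(self, nodes, replicas=3):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.replicas = replicas
        # (position, node) pairs sorted by position describe the ring clockwise
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key):
        """Nodes storing `key`: the first node clockwise from the key's
        position, followed by its successors, `replicas` nodes in total."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replicas)]

ring = Ring(list("ABCDEFG"), replicas=3)
print(ring.preference_list("user:42"))
```

Adding or removing a node only moves the keys in the neighbouring part of the ring, which is why this scheme suits stores that grow by adding machines.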
  • BigTable clones
    • Like column-oriented relational databases, but with a twist
    • Tables similar to an RDBMS, but handling semi-structured data
    • Based on Google’s BigTable paper
    • Data model:
      ‣ Columns → column families → ACL
      ‣ Datums keyed by: row, column, time, index
      ‣ Row range → tablet → distribution
    • Examples:
      ‣ HBase
      ‣ Hypertable
      ‣ Cassandra
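A BigTable-style table can be thought of as a sparse, sorted map from (row, column, timestamp) to a value, with columns grouped into families. The toy Python class below is a sketch of that data model only: `SparseTable` and `webtable` are hypothetical names, the per-family ACLs and the splitting of row ranges into tablets are not modeled, and the example row and column names echo the web-table example from the BigTable paper.

```python
from collections import defaultdict

class SparseTable:
    """Toy BigTable-style map: (row, 'family:qualifier', timestamp) -> value."""

    def __init__(self, families):
        self.families = set(families)   # column families are declared up front
        self._rows = defaultdict(dict)  # row -> column -> {timestamp: value}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column, ts=None):
        """Newest version at or before `ts` (or the newest overall if ts is None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield rows in [start_row, end_row) in sorted order, like a tablet's row range."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, self._rows[row]

webtable = SparseTable(["contents", "anchor"])
webtable.put("com.cnn.www", "contents:", "<html>v1", ts=1)
webtable.put("com.cnn.www", "contents:", "<html>v2", ts=2)
webtable.put("com.cnn.www", "anchor:cnnsi.com", "CNN", ts=1)
webtable.put("org.example", "contents:", "<html>", ts=1)
print(webtable.get("com.cnn.www", "contents:"))        # newest version
print(webtable.get("com.cnn.www", "contents:", ts=1))  # version as of timestamp 1
```

Because rows are kept in sorted order, a contiguous row range (what BigTable calls a tablet) is the natural unit for distributing the table across machines.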
  • Document databases
    • Similar to key-value stores, but the DB knows what the value is
    • Inspired by Lotus Notes
    • Data model: collections of key-value collections
    • Documents are often versioned
    • Examples:
      ‣ CouchDB
      ‣ MongoDB
      ‣ Redis
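The difference from a key-value store is that a document database parses the value, so it can answer queries about the fields inside it. The minimal Python sketch below stores JSON documents and checks a revision number on every update, loosely in the spirit of CouchDB's revisions; `DocumentStore`, `put`, `get` and `find` are hypothetical names, not any product's API.

```python
import json

class DocumentStore:
    """Toy document store: values are parsed, so the store can query inside them."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, JSON text)

    def put(self, doc_id, doc, expected_rev=0):
        """Create (expected_rev=0) or update a document; reject stale revisions."""
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else 0
        if expected_rev != current_rev:
            raise ValueError(f"conflict: {doc_id} is at revision {current_rev}, not {expected_rev}")
        self._docs[doc_id] = (current_rev + 1, json.dumps(doc))
        return current_rev + 1

    def get(self, doc_id):
        rev, text = self._docs[doc_id]
        return rev, json.loads(text)

    def find(self, **criteria):
        """Return ids of documents whose top-level fields match all criteria."""
        matches = []
        for doc_id, (_, text) in self._docs.items():
            doc = json.loads(text)
            if all(doc.get(name) == value for name, value in criteria.items()):
                matches.append(doc_id)
        return matches

store = DocumentStore()
rev = store.put("post-1", {"author": "thobe", "title": "Hello"})
store.put("post-1", {"author": "thobe", "title": "Hello, world"}, expected_rev=rev)
store.put("post-2", {"author": "joe", "title": "Hi"})
print(store.find(author="thobe"))
```

A writer holding an out-of-date revision gets a conflict instead of silently overwriting someone else's change.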
  • Graph databases
    • Focus on modeling the structure of data: interconnectivity
    • Scales to the complexity of the data
    • Inspired by mathematical graph theory (G = (V, E))
    • Data model: “Property Graph”
      ‣ Nodes
      ‣ Relationships/edges between nodes (first class)
      ‣ Key-value pairs on both
      ‣ Possibly edge labels and/or node/edge types
    • Examples:
      ‣ Neo4j
      ‣ AllegroGraph
      ‣ Sones graphDB
  • Property Graph model
    • Nodes
    • Relationships between nodes
    • Relationships have labels
    • Relationships are directed, but can be traversed equally fast in both directions
    • The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
    • Nodes have key-value properties
    • Relationships have key-value properties
    • [Figure: an example graph with person nodes “Mary” and “James” (properties include age: 35, age: 32 and twitter: “@spam”), connected by LOVES and LIVES WITH relationships, and a car node (brand: “Volvo”, model: “V70”) connected by OWNS (property type: “car”) and DRIVES relationships.]
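The property graph model maps naturally onto a few small classes. The following Python sketch is an in-memory illustration, not the Neo4j API: all class and method names are hypothetical, and since the slide's diagram did not survive, which person OWNS and which DRIVES the car is an assumption of the example. Each relationship is indexed at both of its endpoints, which is what lets a directed relationship be followed equally cheaply in either direction.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node
    end: Node
    label: str
    props: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self._nodes = []
        # Each relationship is indexed at both endpoints, so following it
        # forwards (outgoing) or backwards (incoming) costs the same.
        self._out = defaultdict(list)
        self._in = defaultdict(list)

    def create_node(self, **props):
        node = Node(len(self._nodes), props)
        self._nodes.append(node)
        return node

    def relate(self, start, label, end, **props):
        rel = Relationship(start, end, label, props)
        self._out[start.id].append(rel)
        self._in[end.id].append(rel)
        return rel

    def outgoing(self, node, label=None):
        return [r.end for r in self._out[node.id] if label in (None, r.label)]

    def incoming(self, node, label=None):
        return [r.start for r in self._in[node.id] if label in (None, r.label)]

g = PropertyGraph()
mary = g.create_node(name="Mary")
james = g.create_node(name="James")
car = g.create_node(brand="Volvo", model="V70")
g.relate(mary, "LOVES", james)
g.relate(mary, "LIVES_WITH", james)       # symmetric meaning: stored once, read from either end
g.relate(james, "OWNS", car, type="car")  # hypothetical: who owns the car is assumed
g.relate(mary, "DRIVES", car)             # hypothetical: who drives the car is assumed
print([n.props["name"] for n in g.incoming(car)])
```

A symmetric relationship such as LIVES_WITH is stored once and simply read from either end, while LOVES keeps its direction: `g.outgoing(james, "LOVES")` is empty here.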
  • Graphs are whiteboard friendly
    • An application domain model outlined on a whiteboard or piece of paper would be translated to an ER diagram, then normalized to fit a relational database. With a graph database, the model from the whiteboard is implemented directly.
    • [Figure: the same domain model drawn as a whiteboard sketch, as an ER diagram with 1-to-* cardinalities, and as a graph with nodes such as thobe, Joe, project blog, Wardrobe Strength, Hello Joe, Modularizing Jython and Neo4j performance analysis.] Image credits: Tobias Ivarsson
  • Four emerging NOSQL categories
    • Key-Value stores
    • BigTable clones
    • Document databases
    • Graph databases
  • ... and one that’s been around for a while
    • Object databases
      ‣ Neither gaining nor losing traction
      ‣ Not part of the NOSQL community
      ‣ Still a good solution to a lot of problems
      ‣ Focuses on matching the object-oriented programming paradigm
        ‣ Simplicity to integrate
        ‣ Ease of use
  • Scaling to size vs. scaling to complexity
    • [Figure: key/value stores, Bigtable clones, document databases and graph databases placed along a size axis and a complexity axis, annotated “Billions of nodes and relationships” and “> 90% of use cases”.]
  • Who is NOSQL? A healthy mix of big players and independent vendors.
  • “Ok, it’s not a database. How do I query it?”
    • RESTful interfaces (HTTP as an access API)
    • Query languages other than SQL
      ‣ GQL: SQL-like query language for Google BigTable
      ‣ SPARQL: query language for the Semantic Web
      ‣ Gremlin: the graph traversal language
      ‣ Sones Graph Query Language
    • Query APIs
      ‣ The Google BigTable DataStore API
      ‣ The Neo4j Traversal API
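Traversal-style querying (as in Gremlin or a traversal API) describes a query as a walk over the graph rather than as joins. The function below is neither of those; it is a plain breadth-first traversal with a depth limit, written as a minimal Python sketch over a hypothetical friendship graph stored as an adjacency dictionary.

```python
from collections import deque

def traverse(adjacency, start, max_depth):
    """Breadth-first traversal returning {node: depth} for nodes within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # don't expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depths:  # visit each node only once
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths

# Hypothetical friendship graph
friends = {
    "thobe": ["joe", "mary"],
    "joe": ["thobe", "james"],
    "mary": ["james"],
    "james": ["anna"],
}
# Friends-of-friends: exactly two hops away from "thobe"
fof = sorted(n for n, d in traverse(friends, "thobe", 2).items() if d == 2)
print(fof)
```

Asking for friends-of-friends is just a depth limit of 2; in a normalized relational schema the same question typically needs a self-join per hop.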
  • Why is the database RESTing?
    • Because hyperlinks make it possible to reference data on different hosts without hassle. RESTful is really all about hypermedia!
    • [Figure: four hosts, http://one/, http://two/, http://three/ and http://four/; the resource http://one/fishie says “My best friend is http://three/flounder!”]
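The point of the slide is that a link can name data on any host. The minimal Python sketch below fakes the network with a dictionary so it runs offline; `FAKE_WEB`, `http_get`, `follow` and the `best_friend` field are hypothetical, while the hosts and resource names come from the slide.

```python
from urllib.parse import urlparse

# Stand-in for the web: each URL maps to the JSON a GET would return.
FAKE_WEB = {
    "http://one/fishie": {"name": "fishie", "best_friend": "http://three/flounder"},
    "http://three/flounder": {"name": "flounder"},
}

def http_get(url):
    """Pretend HTTP GET; a real client would issue a network request here."""
    return FAKE_WEB[url]

def follow(url, link_field):
    """Fetch a resource, then dereference one of its links, wherever it lives."""
    resource = http_get(url)
    target = resource[link_field]
    return urlparse(target).netloc, http_get(target)

host, friend = follow("http://one/fishie", "best_friend")
print(host, friend["name"])
```

Swapping `http_get` for a real HTTP client would not change `follow`: the caller needs no knowledge of where the linked resource lives.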
  • How about data manipulation?
    • RESTful interfaces again (HTTP PUT, POST, DELETE)
    • Data manipulation APIs
      ‣ Google BigTable DataStore API
      ‣ Neo4j GraphDatabase API
    • Serialization formats
      ‣ JSON
      ‣ Thrift
      ‣ Protocol Buffers
      ‣ RDF
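How HTTP verbs map onto data manipulation can be shown with a toy handler. In this minimal Python sketch, POST creates an item and returns its new URL, PUT creates or replaces an item at a known URL, GET reads it and DELETE removes it, with JSON as the serialization format. The `/nodes` paths and the `RestResource` class are hypothetical and do not reproduce any particular database's REST API.

```python
import itertools
import json

class RestResource:
    """Toy mapping of HTTP verbs onto create/read/update/delete for one collection.

    '/nodes' is the collection and '/nodes/<id>' an item (hypothetical paths).
    Request and response bodies are JSON text.
    """

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        parts = path.strip("/").split("/")
        if parts[0] != "nodes" or len(parts) > 2:
            return 404, None                         # Not Found
        if len(parts) == 1:
            if method == "POST":                     # create; the server picks the id
                item_id = str(next(self._ids))
                while item_id in self._items:        # skip ids already taken via PUT
                    item_id = str(next(self._ids))
                self._items[item_id] = json.loads(body)
                return 201, f"/nodes/{item_id}"      # Created, with the new item's URL
            return 405, None                         # Method Not Allowed
        item_id = parts[1]
        if method == "PUT":                          # create or replace at a known URL
            status = 200 if item_id in self._items else 201
            self._items[item_id] = json.loads(body)
            return status, None
        if item_id not in self._items:
            return 404, None
        if method == "GET":
            return 200, json.dumps(self._items[item_id])
        if method == "DELETE":
            del self._items[item_id]
            return 204, None                         # No Content
        return 405, None

api = RestResource()
status, location = api.handle("POST", "/nodes", json.dumps({"name": "Mary"}))
print(status, location, api.handle("GET", location))
```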
  • NOSQL in the Enterprise
    • Availability
    • Security: this presentation does not cover security. The interesting parts of security are an application-layer issue anyway.
    • Correctness
    • Performance
  • Availability
    • Replication
      ‣ Write to many
      ‣ (Multi-)master to slave replication
    • Master re-election
    • Failover
      ‣ Either by another machine taking over
      ‣ or by the client knowing to attempt a replica
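Client-side failover, the last point above, amounts to trying replicas in order. The Python sketch below simulates replicas as plain functions; `ReplicaDown`, `read_with_failover`, `unreachable` and the replica names are hypothetical.

```python
class ReplicaDown(Exception):
    """Raised by a (simulated) replica that cannot be reached."""

def read_with_failover(replicas, key):
    """Try replicas in order; return (replica name, value) from the first that answers."""
    errors = []
    for name, read in replicas:
        try:
            return name, read(key)
        except ReplicaDown as exc:
            errors.append(f"{name}: {exc}")  # note the failure, then try the next replica
    raise ReplicaDown("all replicas failed: " + "; ".join(errors))

def unreachable(key):
    raise ReplicaDown("connection refused")

data = {"user:42": "Mary"}
replicas = [("master", unreachable), ("slave-1", data.__getitem__), ("slave-2", data.__getitem__)]
print(read_with_failover(replicas, "user:42"))
```

A slave fed by asynchronous replication may return slightly stale data, which is one of the correctness trade-offs that follow from this kind of availability.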
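  • Correctness
    • Brewer’s CAP theorem: a distributed system cannot simultaneously guarantee Consistency, Availability and Partition tolerance
      ‣ Most NOSQL databases sacrifice consistency
      ‣ Some use “read-correction”, treating the values read from replicas as votes
    • Some NOSQL databases don’t have transactions
      ‣ Instead they have only atomic single operations
      ‣ This makes some operations, such as those that must update several items atomically, impossible to implement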
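Read-correction can be sketched as: read a key from every replica, let the answers vote, and write the majority value back to the replicas that disagreed. The Python function below is a simplified illustration using plain dictionaries as replicas; real systems such as Dynamo or Cassandra reconcile using version information (vector clocks or timestamps) rather than a bare majority, and the function name is hypothetical.

```python
from collections import Counter

def read_with_correction(replicas, key):
    """Read `key` from every replica, let the answers vote, and repair the losers.

    Replicas are plain dicts here (an assumption of this sketch); a missing
    key counts as a vote for "absent".
    """
    votes = Counter(replica.get(key) for replica in replicas)
    winner, count = votes.most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError(f"no majority for {key!r}: {dict(votes)}")
    for replica in replicas:
        if replica.get(key) != winner:
            # Read repair: bring the disagreeing replica in line with the majority
            if winner is None:
                replica.pop(key, None)
            else:
                replica[key] = winner
    return winner

replicas = [{"x": "v2"}, {"x": "v2"}, {"x": "v1"}]
print(read_with_correction(replicas, "x"))
print(replicas)  # the third replica has been repaired
```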
  • Performance
    • This is where all the focus seems to be
    • A surprising number sacrifice durability for performance
      ‣ On-disk durability
      ‣ Multiple-replica durability
    • All NOSQL databases outperform RDBMSes
      ‣ ... in their particular niche ...
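On-disk durability is where the trade-off is most visible. The minimal Python sketch below appends records to a log and, only when asked, calls `os.fsync` so the operating system flushes the data to the storage device before the write returns. The log format and the function name are hypothetical; multiple-replica durability is not modeled.

```python
import os
import tempfile

def append_record(path, record: bytes, durable: bool) -> None:
    """Append one record to a log file.

    durable=False: the write is handed to the OS and may sit in its page cache,
    so a power failure can lose it (faster).
    durable=True: os.fsync asks the OS to push it to the storage device before
    returning (slower, but survives a machine crash if the device honours the flush).
    """
    with open(path, "ab") as log:
        log.write(record + b"\n")
        if durable:
            log.flush()             # move Python's buffer into the OS
            os.fsync(log.fileno())  # and the OS's buffer onto the device

log_path = os.path.join(tempfile.mkdtemp(), "writes.log")
append_record(log_path, b"set x=1", durable=False)
append_record(log_path, b"set x=2", durable=True)
with open(log_path, "rb") as f:
    print(f.read().splitlines())
```

Skipping the `fsync` call makes writes cheaper, but a power failure can lose records that the application already considered written.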
  • Up until recently there was only one database, the RDBMS. The days of a single database that rules all are over. [Image: “One database to rule them all”; credits: The Lord of the Rings, New Line Cinema]
  • Use the best-suited storage for each kind of data: the era of using RDBMSes for all problems is over. Instead we should use the database best suited to the problem at hand.
  • Polyglot persistence: ... we could even use multiple databases in conjunction, and let each database handle the things it does best.
  • Polyglot persistence: SQL && NOSQL. All databases are welcome! SQL and NOSQL: it is Not Only SQL!
  • Summary
    • Two steps forward (but first one step back...)
    • The era of a single DBMS is over
    • Use the right tool for the right job
    • Polyglot persistence happens already, and will grow more common
    • Solves different scalability issues
      ‣ Scale to size: huge amounts of data, many, many machines
      ‣ Scale to complexity: handle complicated schemas, avoid being bogged down by deep JOINs
    • Driven by big players and independent vendors: a healthy community
  • Open source implementations to play with!
    • Neo4j: talk to me, or visit http://neo4j.org/
    • CouchDB: http://couchdb.apache.org/
    • Cassandra: http://cassandra.apache.org/
    • Hadoop + HBase (clones of GFS + BigTable): http://hadoop.apache.org/
    • MongoDB: http://www.mongodb.org/
    • Redis: http://code.google.com/p/redis/
    • Oracle Berkeley DB: http://www.oracle.com/database/berkeley-db/
    • FlockDB: http://github.com/twitter/flockdb
    • ... and more ...
  • http://neotechnology.com