NOSQL
            for Dummies
                          twitter: @thobe / #neo4j
Tobias Ivarsson           email: tobias@n...
This is still the view a lot
of people have of NOSQL.




Image credit: http://browsertoolkit.com/fault-tolerance.png

   ...
NOSQL - Defined by what it is Not
๏ “Any database that is not a Relational Database”
๏ The term was coined at a meetup with...
NOSQL
 What’s in the name...




NO to SQL
  It’s not about saying that
  SQL should never be used,
  or that SQL is dead...




                          ...
Not Only SQL
    It’s about recognizing
    that for some problems
    other storage solutions
    are better suited!




...
NOSQL - Why now?
    Four trends

Trend 1: Data size
               ExaBytes (10¹⁸) of data stored per year
                                                ...
Trend 2: Connectedness
                                                                                                   ...
Trend 3: Semi-structure
๏ Individualization of content
   • In the salary lists of the 1970s, all elements had exactly one...
Trend 4: Architecture

              1980s: Mainframe applications


                       Application




              ...
Trend 4: Architecture

             1990s: Database as integration hub


          Application   Application    Applicatio...
Trend 4: Architecture

         2000s: (moving towards) Decoupled services
                        with their own backend
...
Why NOSQL Now?

๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture

                  ...
RDBMS performance
               Salary List                                    Relational database

                     ...
Four emerging NOSQL categories



Key-Value stores
๏ Focus on scaling to huge amounts of data
๏ Designed to handle massive load
๏ Based on Amazon’s Dynamo p...
Key-Value stores
                                   We find the position of each
                                   object...
BigTable clones
๏ Like column oriented Relational Databases, but with a twist
๏ Tables similarly to RDBMS, but handles sem...
Document databases
๏ Similar to Key-Value stores, but the DB knows what the Value is
๏ Inspired by Lotus Notes
๏ Data mode...
Graph databases
๏ Focus on modeling the structure of data - interconnectivity
๏ Scales to the complexity of the data
๏ Ins...
Property Graph model




•Nodes
•Relationships between Nodes
•Relationships have Labels
•Relationships are directed, but ...
Property Graph model


                                                LIVES WITH
                                        ...
Property Graph model

                                                           LOVES

                                  ...
Property Graph model
                                                                          name: “Mary”
              ...
Graphs are whiteboard friendly                        An application domain model
                                        ...
Four emerging NOSQL categories

       ๏Key-Value stores
       ๏BigTable clones
       ๏Document databases
       ๏Graph ...
... and one that’s been around for a while
๏Object databases
   • Neither gaining nor losing traction
   • Not part of th...
Scaling to size vs. Scaling to complexity
    Size
       Key/Value stores

                          Bigtable clones

   ...
Who is NOSQL?   A healthy mix of
                big players and
                independent
                vendors.




...
“Ok, it’s not a database. How do I query it?”
๏ RESTful interfaces (HTTP as an access API)
๏ Query languages other than SQ...
Why is the database RESTing?
                                                                       Because hyperlinks
   ...
How about Data Manipulation?
๏ RESTful interfaces again (http PUT, POST, DELETE)
๏ Data Manipulation APIs
   • Google BigT...
NOSQL in the Enterprise

๏Availability
๏Security
                This presentation does not cover
                Security...
Availability
๏ Replication
   • Write to many
   • (Multi-)Master to Slave replication
๏ Master reelection
๏ Failover
   •...
Correctness
๏ Brewer’s CAP theorem
   • Most NOSQL db’s sacrifice Consistency
     ‣Some use “read-correction”, treat read ...
Performance
๏ This is where all the focus seems to be
๏ A surprising number sacrifice Durability for performance
   • On-di...
Up until recently there was
                                                   only one Database, the
                    ...
Use best suited storage for each kind of data
                                       The era of using
                    ...
Polyglot persistence
                       ... we could even use
                       multiple databases in
           ...
Polyglot persistence
                 SQL && NOSQL




      All databases are welcome!
      SQL and NOSQL - it is Not On...
Summary
๏ Two steps forward ( but first one step back... )
๏ The era of a single DBMS is over
๏ Use the right tool for the ...
Open source implementations to play with!
๏ Neo4j - talk to me, or visit http://neo4j.org/
๏ CouchDB - http://couchdb.apac...
http://neotechnology.com
NOSQL introduction/overview session presented at Miracle Open World 2010, at Hotel Legoland in Denmark.

Published in: Technology
NOSQL for Dummies

  1. 1. NOSQL for Dummies twitter: @thobe / #neo4j Tobias Ivarsson email: tobias@neotechnology.com web: http://www.neo4j.org/ Hacker @ Neo Technology web: http://www.thobe.org/
  2. 2. This is still the view a lot of people have of NOSQL. Image credit: http://browsertoolkit.com/fault-tolerance.png 4
  3. 3. NOSQL - Defined by what it is Not ๏ “Any database that is not a Relational Database” ๏ The term was coined at a meetup with the creators behind some prominent emerging databases ๏ “Non-Relational Databases” might be more correct - But it’s a mouthful! ๏ ... then there was a conference ... ๏ ... and a mailing list ... ๏ ... the name caught on ... ๏ ... then there were more conferences ... ๏ ... and here we are! 5
  4. 4. NOSQL What’s in the name... 6
  5. 5. NO to SQL It’s not about saying that SQL should never be used, or that SQL is dead... 7
  6. 6. Not Only SQL It’s about recognizing that for some problems other storage solutions are better suited! 8
  7. 7. NOSQL - Why now? Four trends 9
  8. 8. Trend 1: Data size ExaBytes (10¹⁸) of data stored per year: 161 (2006), 253 (2007), 397 (2008), 623 (2009), 988 (2010). Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that. Data source: IDC 2007 10
  9. 9. Trend 2: Connectedness Over time data has evolved to be more and more interlinked and connected. Hypertext has links, Blogs have pingback, Tagging groups all related data. [Figure: information connectivity over time, 1990 to 2020, rising through Text documents, Hypertext, RSS, Blogs, Wikis, User-generated content, Tagging, Folksonomies, RDF and Ontologies to the Giant Global Graph (GGG); eras marked web 1.0, web 2.0, “web 3.0”] 11
  10. 10. Trend 3: Semi-structure ๏ Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15? ๏ All encompassing “entire world views” • Store more data about each entity ๏ Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”) 12
  11. 11. Trend 4: Architecture 1980s: Mainframe applications Application DB 13
  12. 12. Trend 4: Architecture 1990s: Database as integration hub Application Application Application DB 14
  13. 13. Trend 4: Architecture 2000s: (moving towards) Decoupled services with their own backend Application Application Application DB DB DB 15
  14. 14. Why NOSQL Now? ๏Trend 1: Size ๏Trend 2: Connectedness ๏Trend 3: Semi-structure ๏Trend 4: Architecture 16
  15. 15. RDBMS performance [Figure: performance plotted against data complexity. Labels: Relational database; Requirement of application; Salary List; Majority of Webapps; Social network; Semantic Trading; custom] We are building applications today that have size and load requirements that ... 17
  16. 16. Four emerging NOSQL categories 18
  17. 17. Key-Value stores ๏ Focus on scaling to huge amounts of data ๏ Designed to handle massive load ๏ Based on Amazon’s Dynamo paper ๏ Data model: (global) collection of Key-Value pairs ๏ Dynamo ring partitioning and replication ๏ Examples: • Dynomite • Voldemort • Tokyo{Tyrant, Cabinet, etc...} 19
  18. 18. Key-Value stores We find the position of each object by its key. Here the keys are the names of the objects, alphabetically sorted. Each object is replicated in a few other stores for redundancy; in this example we use 3 replicas. [Figure: a ring of stores labeled A, B, C, D, E, F, G] 20
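The ring placement described on the Key-Value slide can be sketched roughly as follows. This is a toy illustration, not Dynamo's actual algorithm: the rule that a key's home store is picked by its first letter, and the reading of "3 replicas" as three copies in total, are assumptions made for the example.

```python
from bisect import bisect_left

STORES = ["A", "B", "C", "D", "E", "F", "G"]  # ring positions from the slide
REPLICAS = 3  # assumption: three copies in total, home store included


def stores_for(key: str, stores=STORES, replicas=REPLICAS) -> list:
    """Return the stores holding `key`: its home store plus the next ones on the ring."""
    # Assumption: the home store is the first one whose name sorts at or after
    # the key's first letter; keys past the last store wrap around to the first.
    start = bisect_left(stores, key[:1].upper()) % len(stores)
    return [stores[(start + i) % len(stores)] for i in range(replicas)]


print(stores_for("banana"))  # home store B, then the next two stores on the ring
print(stores_for("zebra"))   # sorts after G, so it wraps around to A
```

Because every client can compute the same list from the key alone, no central directory is needed to locate an object or its replicas.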
  23. 23. BigTable clones ๏ Like column oriented Relational Databases, but with a twist ๏ Tables similar to those of an RDBMS, but able to handle semi-structured data ๏ Based on Google’s BigTable paper ๏ Data model: ‣Columns → column families → ACL ‣Datums keyed by: row, column, time, index ‣Row-range → tablet → distribution ๏ Examples: • HBase • Hypertable • Cassandra 21
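To make "datums keyed by row, column, time" concrete, here is a minimal in-memory sketch. `SparseTable` and its methods are hypothetical names invented for this example; column families, ACLs, the "index" key component and tablet distribution from the slide are deliberately left out.

```python
from collections import defaultdict


class SparseTable:
    """Toy table: values are addressed by (row, column, timestamp)."""

    def __init__(self):
        # row -> column -> list of (timestamp, value); rows need not share columns
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        self._rows[row][column].append((timestamp, value))

    def get(self, row, column, at=None):
        """Return the newest value, or the newest one not later than `at`."""
        versions = self._rows.get(row, {}).get(column, [])
        visible = [(t, v) for t, v in versions if at is None or t <= at]
        return max(visible, key=lambda tv: tv[0])[1] if visible else None


table = SparseTable()
table.put("user:1", "profile:name", 1, "Ann")
table.put("user:1", "profile:name", 2, "Anna")
table.put("user:2", "contact:email", 1, "bob@example.com")  # different columns per row

print(table.get("user:1", "profile:name"))        # latest version
print(table.get("user:1", "profile:name", at=1))  # value as of timestamp 1
print(table.get("user:2", "profile:name"))        # column absent for this row
```

The point of the sketch is the semi-structure: a row only stores the columns it actually has, and old versions stay addressable by time.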
  24. 24. Document databases ๏ Similar to Key-Value stores, but the DB knows what the Value is ๏ Inspired by Lotus Notes ๏ Data model: Collections of Key-Value collections ๏ Documents are often versioned ๏ Examples: • CouchDB • MongoDB • Redis 22
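A rough sketch of what "the DB knows what the Value is" buys you: since each value is itself a key-value collection, the store can query on fields and keep versions. `DocumentStore` is a hypothetical toy class, not the API of CouchDB or MongoDB, and the sample documents are made up.

```python
class DocumentStore:
    """Toy document store: each value is itself a key-value collection (a dict)."""

    def __init__(self):
        self._docs = {}  # document id -> list of versions, newest last

    def save(self, doc_id, document):
        """Store a new version and return its version number (starting at 1)."""
        versions = self._docs.setdefault(doc_id, [])
        versions.append(dict(document))
        return len(versions)

    def get(self, doc_id, version=None):
        versions = self._docs[doc_id]
        return versions[-1] if version is None else versions[version - 1]

    def find(self, **criteria):
        """Because the store can look inside values, it can query on their fields."""
        return sorted(
            doc_id
            for doc_id, versions in self._docs.items()
            if all(versions[-1].get(k) == v for k, v in criteria.items())
        )


store = DocumentStore()
store.save("p1", {"name": "Mary", "age": 35})
store.save("p2", {"name": "James", "age": 32, "twitter": "@spam"})
store.save("p1", {"name": "Mary", "age": 36})  # second version of the same document

print(store.find(age=32))
print(store.get("p1")["age"], store.get("p1", version=1)["age"])
```

A plain key-value store could only return the opaque value for "p1"; the field query in `find` is what distinguishes the document model.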
  25. 25. Graph databases ๏ Focus on modeling the structure of data - interconnectivity ๏ Scales to the complexity of the data ๏ Inspired by mathematical Graph Theory ( G=(E,V) ) ๏ Data model: “Property Graph” ‣Nodes ‣Relationships/Edges between Nodes (first class) ‣Key-Value pairs on both ‣Possibly Edge Labels and/or Node/Edge Types ๏ Examples: • Neo4j • AllegroGraph • Sones graphDB 23
  26. 26. Property Graph model •Nodes •Relationships between Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 24
  31. 31. Property Graph model [Figure: two person nodes (name: “Mary”, age: 35; name: “James”, age: 32, twitter: “@spam”) and a car node (brand: “Volvo”, model: “V70”), connected by relationships labeled LOVES (twice), LIVES WITH, OWNS (with property type: “car”) and DRIVES] •Nodes •Relationships between Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 24
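The property graph model from these slides can be sketched in a few lines. This is a toy in-memory illustration, not the Neo4j API; `Node`, `Relationship`, `relate` and `neighbours` are names invented here, and which person owns the car is an arbitrary choice because the slide text does not say.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)
    incoming: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    start: Node
    label: str
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, label, end, properties=None):
    """Create a directed, labeled relationship and register it on both end nodes."""
    rel = Relationship(start, label, end, properties or {})
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbours(node, label, direction="both"):
    """Follow relationships with `label`; both directions cost the same list scan."""
    found = []
    if direction in ("out", "both"):
        found += [r.end for r in node.outgoing if r.label == label]
    if direction in ("in", "both"):
        found += [r.start for r in node.incoming if r.label == label]
    return found


mary = Node({"name": "Mary", "age": 35})
james = Node({"name": "James", "age": 32})
car = Node({"brand": "Volvo", "model": "V70"})
relate(mary, "LOVES", james)
relate(mary, "LIVES WITH", james)  # stored once, with a direction
relate(mary, "OWNS", car, {"property type": "car"})  # assumption: owner chosen arbitrarily

# The application decides that LIVES WITH is symmetric by ignoring direction.
print([n.properties["name"] for n in neighbours(james, "LIVES WITH")])
# LOVES is not symmetric, so the application asks for one direction only.
print([n.properties["name"] for n in neighbours(james, "LOVES", "out")])
```

Storing each relationship on both of its end nodes is what lets a traversal run at the same cost in either direction, which is the property the slide emphasizes.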
  32. 32. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. Image credits: Tobias Ivarsson 25
  33. 33. Graphs are whiteboard friendly [Figure: the whiteboard model translated to an ER-diagram with one-to-many (1, *) multiplicities] Image credits: Tobias Ivarsson 25
  34. 34. Graphs are whiteboard friendly [Figure: the whiteboard model implemented directly as a graph, with nodes labeled thobe, Joe, project, blog, Wardrobe Strength, Hello Joe, Modularizing Jython, Neo4j performance analysis] Image credits: Tobias Ivarsson 25
  35. 35. Four emerging NOSQL categories ๏Key-Value stores ๏BigTable clones ๏Document databases ๏Graph databases 26
  36. 36. ... and one that’s been around for a while ๏Object databases • Neither gaining nor losing traction • Not part of the NOSQL community • Still a good solution to a lot of problems • Focus on matching the object-oriented programming paradigm ‣Simplicity to integrate ‣Ease of use 27
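  38. 38. Scaling to size vs. Scaling to complexity [Figure: axes Size and Complexity; Key/Value stores, Bigtable clones, Document databases and Graph databases placed in order of decreasing size and increasing complexity; annotations: “Billions of nodes and relationships” and “> 90% of use cases”] 28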
  39. 39. Who is NOSQL? A healthy mix of big players and independent vendors. 29
  40. 40. “Ok, it’s not a database. How do I query it?” ๏ RESTful interfaces (HTTP as an access API) ๏ Query languages other than SQL • GQL - SQL-like QL for Google BigTable • SPARQL - Query language for the Semantic Web • Gremlin - the graph traversal language • Sones Graph Query Language ๏ Query APIs • The Google BigTable DataStore API • The Neo4j Traversal API 30
  41. 41. Why is the database RESTing? Because hyperlinks make it possible to reference data on different hosts without hassle. RESTful is really all about hypermedia! [Figure: four hosts, http://one/, http://two/, http://three/ and http://four/; the resource http://one/fishie reads “My best friend is http://three/flounder!”] 31
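The hypermedia idea can be illustrated with a small sketch in which a resource on one host links to a resource on another. The `WEB` dictionary is a hypothetical in-memory stand-in for the hosts; the `friend` field name and the text of the second resource are made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for independent hosts; a real client
# would issue HTTP GET requests instead of dictionary lookups.
WEB = {
    "http://one/fishie": {"text": "My best friend is", "friend": "http://three/flounder"},
    "http://three/flounder": {"text": "I live on another host"},
}


def follow(url, link_name):
    """Fetch a resource and dereference one of its hyperlinks."""
    target = WEB[url][link_name]
    return target, WEB[target]


target, resource = follow("http://one/fishie", "friend")
print(urlparse("http://one/fishie").hostname, "->", urlparse(target).hostname)
print(resource["text"])
```

The reference is just a URL, so the client needs no knowledge of how data is partitioned across hosts; it simply follows the link.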
  42. 42. How about Data Manipulation? ๏ RESTful interfaces again (http PUT, POST, DELETE) ๏ Data Manipulation APIs • Google BigTable DataStore API • Neo4j GraphDatabase API ๏ Serialization Formats • JSON • Thrift • ProtoBuffers • RDF 32
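As a hedged illustration of manipulating data over HTTP with a JSON serialization, the sketch below builds PUT and DELETE requests with Python's standard library without sending them. The `BASE` URL and the resource layout are hypothetical and do not correspond to the REST API of any particular database.

```python
import json
from urllib.request import Request

BASE = "http://localhost:8080/things"  # hypothetical endpoint, not a real product API


def build(method, path, document=None):
    """Build (but do not send) an HTTP request carrying an optional JSON body."""
    body = None if document is None else json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body else {}
    return Request(f"{BASE}/{path}", data=body, headers=headers, method=method)


create = build("PUT", "mary", {"name": "Mary", "age": 35})
remove = build("DELETE", "mary")
print(create.get_method(), create.full_url, create.data)
print(remove.get_method(), remove.full_url)
```

Passing a request to `urllib.request.urlopen` would actually send it; that step is omitted here because no server is assumed to exist.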
  43. 43. NOSQL in the Enterprise ๏Availability ๏Security ๏Correctness ๏Performance This presentation does not cover Security. The interesting parts of Security are an application layer issue anyway. 33
  44. 44. Availability ๏ Replication • Write to many • (Multi-)Master to Slave replication ๏ Master reelection ๏ Failover • Either by another machine taking over • or by the client knowing to attempt a replica 34
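Client-side failover, the second option on the Availability slide, might look roughly like this. `FlakyStore` and `Unavailable` are hypothetical stand-ins for real machines and network errors; the sketch only shows the retry order, not how replicas are kept in sync.

```python
class Unavailable(Exception):
    """Raised by a store that cannot serve the request."""


class FlakyStore:
    """Stand-in for one machine; `up` toggles whether it answers."""

    def __init__(self, data, up=True):
        self.data, self.up = data, up

    def get(self, key):
        if not self.up:
            raise Unavailable(key)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    for store in replicas:
        try:
            return store.get(key)
        except Unavailable:
            continue  # this machine is down; attempt the next replica
    raise Unavailable(f"no replica could serve {key!r}")


master = FlakyStore({"k": "v"}, up=False)  # the preferred machine is down
replica = FlakyStore({"k": "v"})
print(read_with_failover([master, replica], "k"))
```

The alternative on the slide, another machine taking over, moves this logic to the server side so that clients keep talking to a single address.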
  45. 45. Correctness ๏ Brewer’s CAP theorem • Most NOSQL databases sacrifice Consistency ‣Some use “read-correction”, treat read values as votes ๏ Some NOSQL databases don’t have transactions • Instead they have only atomic single operations • This makes some operations impossible to implement 35
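"Treat read values as votes" can be sketched as a plain majority vote followed by a repair of stale copies. This is an assumption about the simplest form of the idea; real systems typically use versioning schemes that the slide does not describe, and the replicas here are ordinary dictionaries.

```python
from collections import Counter


def read_with_correction(replicas, key):
    """Read `key` from every replica, take the majority value, repair the rest."""
    votes = Counter(replica.get(key) for replica in replicas)
    winner, _ = votes.most_common(1)[0]
    for replica in replicas:
        if replica.get(key) != winner:
            replica[key] = winner  # write the winning value back to stale copies
    return winner


replicas = [{"x": 2}, {"x": 2}, {"x": 1}]  # the third replica missed an update
print(read_with_correction(replicas, "x"))
print(replicas)
```

Note that a majority vote needs hashable values and an odd number of replicas to avoid ties, and it can only repair what a reader happens to touch.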
  46. 46. Performance ๏ This is where all the focus seems to be ๏ A surprising number sacrifice Durability for performance • On-disk durability • Multiple-replicas durability ๏ All NOSQL databases outperform RDBMSes • ... in their particular niche ... 36
  47. 47. Up until recently there was only one Database, the RDBMS. The days of a single database that rules all are over. One database to rule them all Image credits: The Lord of the Rings, New Line Cinema 37
  48. 48. Use best suited storage for each kind of data The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand. 38
  49. 49. Polyglot persistence ... we could even use multiple databases in conjunction, and let each database handle the things it does best. 39
  50. 50. Polyglot persistence SQL && NOSQL All databases are welcome! SQL and NOSQL - it is Not Only SQL! 40
  51. 51. Summary ๏ Two steps forward ( but first one step back... ) ๏ The era of a single DBMS is over ๏ Use the right tool for the right job ๏ Polyglot persistence happens already, and will grow more common ๏ Solves different scalability issues • Scale to size - huge amounts of data, many many machines • Scale to complexity - handle complicated schemas - avoid being bogged down by deep JOINs ๏ Driven by big players and independent vendors - healthy community 41
  52. 52. Open source implementations to play with! ๏ Neo4j - talk to me, or visit http://neo4j.org/ ๏ CouchDB - http://couchdb.apache.org/ ๏ Cassandra - http://cassandra.apache.org/ ๏ Hadoop + HBase (clones GFS + BigTable) - http://hadoop.apache.org/ ๏ MongoDB - http://www.mongodb.org/ ๏ Redis - http://code.google.com/p/redis/ ๏ Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/ ๏ FlockDB - http://github.com/twitter/flockdb ๏ ... and more ... 42
  53. 53. http://neotechnology.com