NoSQL Overview


This presentation introduces NoSQL, the motivation behind it, and its basic flavors.


  1. NoSQL: What Is It and Why Would I Care? Eberhard Wolff, 21.09.11
  2. Alternative Databases: NoSQL
     ► NoSQL: Not only SQL
     ► A good example of a catchy but bad name
     ► Not a positive definition, rather "not something else"
     ► Now: even less clear
  3. Why NoSQL?
     ► Exponential data growth
     ► More and more connected data
       > Hypertext, blogs, user generated content
     ► Semi-structured data
       > User generated content
       > Full text search / indices instead of Query-by-Example
     ► Integration at the database level is less common
     ► Cloud prefers scale out over scale up
       > Cloud supports scale up: reboot into a larger machine
       > ...but eventually you will need to scale out, i.e. add more machines
  4. NoSQL Flavors
     ► Key / value store
     ► Document
     ► Wide Column: lots of columns
     ► Graph database: graphs with nodes, relationships and properties
     ► Object databases: store objects, not rows
     ► Note: NoSQL is actually vaguely defined
  5. Key-Value Stores
     ► Maps keys to values
       > Example: key 42 → value "Some data"
     ► Just a large, globally available Map
     ► I.e. not a very powerful data model
     ► Advantages
       > Easy to understand
       > Easier to build scale out solutions (no joins, easy sharding etc.)
     ► Disadvantages
       > Simplistic data model
       > Not a good fit for complex data
       > Might add complexity to the application code
     ► Focus on scalability
     ► Redis: think cache + persistence
     ► Riak
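
The "large globally available Map" and the "easy sharding" point on this slide can be illustrated with a minimal in-memory sketch. It is not how Redis or Riak work internally; the class name `ShardedKeyValueStore` and the hash-modulo placement are assumptions made for illustration only.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store spread over several in-memory shards (hypothetical)."""

    def __init__(self, shard_count=4):
        # Each shard is a plain dict; a real store would put shards on separate machines.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # The key alone decides the shard, so no joins or cross-shard lookups are needed.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKeyValueStore(shard_count=4)
store.put("42", "Some data")
print(store.get("42"))
```

Because every operation touches exactly one key, adding shards is mostly a question of how keys are redistributed, which this sketch does not address.
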
  6. Key-Value Store: Hybrid Approach
     ► Might just be used to store specific data
     ► E.g. scores of players in an online game
       > No complex structure
       > Need to scale
       > Lots of reads and writes
     ► Player name, age, address would still be in an RDBMS
     ► Hybrid approach
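
A rough sketch of the hybrid approach described on this slide: player master data stays in a relational table, while the frequently updated score lives under a key. The in-memory SQLite database, the plain dict standing in for the key-value store, the table layout, and the sample player are all assumptions for illustration.

```python
import sqlite3

# Relational side: structured, rarely changing player data (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE player (name TEXT PRIMARY KEY, age INTEGER, address TEXT)")
db.execute("INSERT INTO player VALUES (?, ?, ?)", ("someplayer", 30, "1 Example Street"))

# Key-value side: a dict stands in for the store that takes the many reads and writes.
scores = {}


def add_points(name, points):
    """Increment the score stored under the player's key and return the new total."""
    key = f"score:{name}"
    scores[key] = scores.get(key, 0) + points
    return scores[key]


def player_summary(name):
    """Combine the relational record with the score from the key-value side."""
    row = db.execute("SELECT name, age FROM player WHERE name = ?", (name,)).fetchone()
    if row is None:
        return None
    return {"name": row[0], "age": row[1], "score": scores.get(f"score:{name}", 0)}


add_points("someplayer", 10)
add_points("someplayer", 5)
print(player_summary("someplayer"))
```
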
  7. Key-Value Stores: Store All Data
     ► Storing data as serialized blobs
       > "user:someuser" → "someuser||more|data|here"
     ► Storing data as multiple keys
       > "user:username:someuser" → "someuser"
       > "user:email:someuser" → ""
       > Requires multi get/set to be efficient
       > Allows some querying if the database supports wildcards, like "user:email:someuser*"
     ► Storing links
       > Blob: "basket:someuser" → "...|item|1|product|product:123|..."
       > Separate keys: "basket:someuser:item:1:product" → "product:123"
         - Multi-get: "basket:someuser:*" loads the shopping basket and all items
     ► Easy to understand, hard to implement
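
The two layouts on this slide (one serialized blob versus several keys read with a multi-get and a wildcard) could look roughly like this. A dict stands in for the store, and the helpers `multi_set`, `multi_get` and `keys_matching` are hypothetical names, not the API of any particular database. The e-mail address and the second basket item are made-up sample values.

```python
from fnmatch import fnmatchcase

kv = {}  # stand-in for the key-value store


def multi_set(pairs):
    """Write several keys in one call."""
    kv.update(pairs)


def multi_get(keys):
    """Read several keys in one call; missing keys yield None."""
    return [kv.get(key) for key in keys]


def keys_matching(pattern):
    """Return the keys matching a shell-style wildcard, in sorted order."""
    return sorted(key for key in kv if fnmatchcase(key, pattern))


# Layout 1: one serialized blob per user; the application must parse it.
kv["user:someuser"] = "someuser||more|data|here"
blob_fields = kv["user:someuser"].split("|")

# Layout 2: one key per attribute or link.
multi_set({
    "user:username:someuser": "someuser",
    "user:email:someuser": "someuser@example.com",  # sample value, not from the slides
    "basket:someuser:item:1:product": "product:123",
    "basket:someuser:item:2:product": "product:456",  # sample value, not from the slides
})

# Wildcard plus multi-get loads the whole basket.
basket_products = multi_get(keys_matching("basket:someuser:*"))
print(basket_products)
```

The blob needs only one read but pushes parsing into the application; the multi-key layout keeps values simple at the cost of more keys and a store that supports pattern lookups.
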
  8. Document Stores
     ► Aggregates are typically stored as "documents" (key-value collections)
     ► JSON, BSON (binary JSON) and XML are common
     ► Still no schema, so any data can be added at runtime
     ► The semi-structure of the document allows the database to build indexes, allowing queries that address properties of the document
       > E.g. "find all baskets that contain the product 123"
     ► Relations might be modeled as links
     ► Advantages
       > Good fit for semi-structured data
       > In particular a good fit for JSON, XML, HTML...
       > Probably the easiest transition from RDBMS
     ► Disadvantages
       > Does not scale as well as key/value stores
     ► Focus on semi-structured data, e.g. JSON
     ► MongoDB, CouchDB
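
As a sketch of how an index over a document property makes the "find all baskets that contain the product 123" query possible, the following uses plain Python dicts as documents. The document layout, the field names and the `baskets_containing` helper are assumptions; MongoDB and CouchDB expose this through their own query and view mechanisms.

```python
from collections import defaultdict

# Schemaless documents: the second basket carries a field the first one lacks.
baskets = {
    "basket:someuser": {
        "owner": "someuser",
        "items": [{"product": "product:123", "quantity": 1}],
    },
    "basket:otheruser": {
        "owner": "otheruser",
        "items": [{"product": "product:456", "quantity": 2}],
        "coupon": "WELCOME",
    },
}

# Secondary index: product id -> ids of the baskets that contain it.
product_index = defaultdict(set)
for basket_id, document in baskets.items():
    for item in document.get("items", []):
        product_index[item["product"]].add(basket_id)


def baskets_containing(product_id):
    """Answer the query from the index instead of scanning every document."""
    return sorted(product_index.get(product_id, set()))


print(baskets_containing("product:123"))
```
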
  9. Wide Column
     ► Add any "column" you like to a row
     ► Not a key-value store, but a "key-(column-value)" store
     ► Column families are like tables
     ► E.g. in the "Users" column family
       > "someuser" → ("username" → "someuser"), ("email" → "")
     ► Since columns are named, some databases provide indexing
       > E.g. Google AppEngine allows you to define columns that can be queried
     ► Advantages
       > Easy to store complex and heterogeneous data
     ► Apache Cassandra
     ► Amazon SimpleDB
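
The "key-(column-value)" idea can be sketched as a dict of dicts: each row key maps to its own set of named columns, and rows in the same column family need not share columns. The helper names and sample values below are hypothetical, and the lookup by column is a simple scan rather than a real index.

```python
# Column family "Users": row key -> {column name: value}
users = {}


def put_column(family, row_key, column, value):
    """Add or overwrite one named column on a row, creating the row if needed."""
    family.setdefault(row_key, {})[column] = value


def rows_where(family, column, value):
    """Return the row keys whose named column equals the given value (full scan)."""
    return sorted(key for key, columns in family.items() if columns.get(column) == value)


put_column(users, "someuser", "username", "someuser")
put_column(users, "someuser", "email", "someuser@example.com")  # sample value
put_column(users, "otheruser", "username", "otheruser")
put_column(users, "otheruser", "twitter", "@otheruser")  # a column only this row has

print(rows_where(users, "username", "otheruser"))
```
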
  10. Graph
      ► Nodes with properties
      ► Typed relationships with properties
      ► Ideal for modeling, e.g., relations in a social network
      ► Easy to find the number of followers, the degree of relation etc.
      ► Neo4j
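
A small sketch of the graph model on this slide: nodes with properties, typed relationships with properties, and the two queries mentioned (number of followers, degree of relation). The node names, the `FOLLOWS` type and the helper functions are assumptions for illustration; a graph database such as Neo4j provides traversals like this natively.

```python
from collections import deque

# Nodes with properties (sample data).
nodes = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}

# Typed relationships with properties: (source, type, target, properties).
relationships = [
    ("alice", "FOLLOWS", "bob", {"since": 2010}),
    ("bob", "FOLLOWS", "carol", {"since": 2011}),
    ("dave", "FOLLOWS", "bob", {"since": 2011}),
]


def follower_count(user):
    """Count incoming FOLLOWS relationships."""
    return sum(1 for _src, kind, target, _props in relationships
               if kind == "FOLLOWS" and target == user)


def degree_of_relation(start, goal, rel_type="FOLLOWS"):
    """Breadth-first search: number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, distance = queue.popleft()
        for source, kind, target, _props in relationships:
            if source == current and kind == rel_type and target not in seen:
                if target == goal:
                    return distance + 1
                seen.add(target)
                queue.append((target, distance + 1))
    return None


print(follower_count("bob"), degree_of_relation("alice", "carol"))
```
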
  11. What happened to Queries?
      ► Data is easily and quickly read/stored using the primary key
      ► Denormalize data for commonly used queries
        > Store the Twitter inbox in key/value as
          - "inbox:someuser" → ("posts:123", "posts:234", ...)
        > instead of doing the query (RDBMS)
          - select p.* from POSTS p, POSTLINKS pl where = pl.postId and pl.userid=42
      ► Store reverse lookup
        > "ewolff|following" → ("spring_rod", "spring_juergen")
        > "post:435|RT" → ("post:42", "post:21")
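
The denormalization described here moves work from read time to write time: when a post is published, its id is appended to each follower's inbox key, so reading an inbox is a single key lookup. The sketch below uses a dict as the store. The `|followers` key, the function names and the post texts are assumptions added for illustration; the slide itself only shows the `|following` and `inbox:` keys.

```python
kv = {}  # stand-in for the key-value store


def follow(follower, followee):
    """Store the relation in both directions so either lookup is a single key read."""
    kv.setdefault(f"{follower}|following", []).append(followee)
    kv.setdefault(f"{followee}|followers", []).append(follower)  # assumed reverse key


def publish(author, post_id, text):
    """Store the post, then fan its id out to every follower's inbox."""
    kv[post_id] = text
    for follower in kv.get(f"{author}|followers", []):
        kv.setdefault(f"inbox:{follower}", []).append(post_id)


def read_inbox(user):
    """One key lookup for the inbox, then one per post; no join needed."""
    return [kv[post_id] for post_id in kv.get(f"inbox:{user}", [])]


follow("ewolff", "spring_rod")
follow("ewolff", "spring_juergen")
publish("spring_rod", "posts:123", "Hello")
publish("spring_juergen", "posts:234", "World")

print(read_inbox("ewolff"))
```

The trade-off is extra writes and duplicated data per follower in exchange for cheap reads.
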
  12. What It Means for Developers
      ► More technologies to have fun with
      ► Broader choice of persistence stores
      ► Probably cross-store persistence
        > Store name, firstname etc. in RDBMS
        > Store followers in Graph database
        > Store Content in RDBMS
        > Store User Generated Content in Document database
      ► Spring Data
        > Similar APIs for JPA and NoSQL
        > Support for cross-store persistence
        > Sophisticated support for generic DAOs
        > E.g. just add a findByName() method, the implementation is provided
      ► QueryDSL
        > JPA Criteria API done right