NoSQL Overview

NoSQL: What Is It and Why Would I Care?
Eberhard Wolff

21.09.11

Alternative Databases: NoSQL
►  NoSQL: Not only SQL

►  A good example for a catchy but bad name
►  Not a positive definition, but rather “not something else”
►  Now: Even less clear

Why NoSQL?
►  Exponential data growth

►  More and more connected data
>  Hypertext, blogs, user generated content

►  Semi structured
>  User generated content
>  Full text search / indices instead of Query-by-Example

►  Integration via the database is less common

►  Cloud prefers scale out over scale up
>  Cloud supports scale up: Reboot into larger machine
>  …but eventually you will need to scale out i.e. add more machines

NoSQL Flavors
►  Key / value store
►  Document
►  Wide Column: Lots of Columns

►  Graph Database: Graphs with nodes, relationships and properties
►  Object databases: Store objects – not rows

►  Note: NoSQL is actually vaguely defined

Key-Value Stores
►  Maps keys to values
►  Just a large globally available Map
>  Example: key 42 → value "Some data"
►  i.e. not very powerful data model
►  Advantages
>  Easy to understand
>  Easier to build scale out solutions
(no joins, easy sharding etc)
►  Disadvantages
>  Simplistic data model
>  Not a good fit for complex data
>  Might add complexity to the application code
•  Focus on scalability
•  Redis: Think cache + Persistence
•  Riak
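
The "easy sharding" point can be sketched in a few lines of Python. Because every operation addresses exactly one key, a hash of the key is enough to pick the machine that owns it. The class name ShardedKeyValueStore and the in-memory dicts standing in for nodes are assumptions for illustration only; this is not the API of Redis or Riak.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one machine of a scale-out cluster (assumption).
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKeyValueStore(node_count=3)
store.put("42", "Some data")
store.put("score:someuser", 1200)
print(store.get("42"))
```

Plain modulo hashing is the simplest choice, but it moves most keys when the number of nodes changes. Real stores typically use schemes such as consistent hashing to limit that.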

Key Value Store: Hybrid Approach
►  Might just be used to store specific data

►  E.g. scores of players in an online game
>  No complex structure
>  Need to scale
>  Lots of reads and writes

►  Player name, age, address would still be in a RDBMS

►  Hybrid approach
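
A minimal sketch of the hybrid approach, assuming an in-memory SQLite database as a stand-in for the RDBMS and a plain dict as a stand-in for the key-value store. The table layout, the sample player data and the helper add_points are hypothetical.

```python
import sqlite3

# Relational side: player master data (in-memory SQLite stands in for the RDBMS).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE player (name TEXT PRIMARY KEY, age INTEGER, address TEXT)")
db.execute("INSERT INTO player VALUES (?, ?, ?)", ("someuser", 30, "Some Street 1"))

# Key-value side: a plain dict stands in for the key-value store.
scores = {}


def add_points(player_name, points):
    """Update the score with a single key lookup; no joins, no schema."""
    key = "score:" + player_name
    scores[key] = scores.get(key, 0) + points
    return scores[key]


add_points("someuser", 100)
add_points("someuser", 50)

# Rarely changing, structured data is still queried with SQL.
age = db.execute("SELECT age FROM player WHERE name = ?", ("someuser",)).fetchone()[0]
print(scores["score:someuser"], age)
```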

Key-Value Stores: Store All Data
►  Storing data as serialized blobs
>  "user:someuser" → "someuser|someuser@example.com|more|data|here"
►  Storing data as multiple keys
>  "user:username:someuser" → "someuser"
>  "user:email:someuser" → "someuser@example.com"
>  Requires multi get/set to be efficient
>  Allows some querying if the database supports wildcards,
like "user:email:someuser*"
►  Storing links
>  Blob: "basket:someuser" → "...|item|1|product|product:123|..."
>  Separate keys: "basket:someuser:item:1:product" → "product:123"
–  Multi-get: "basket:someuser:*" loads the shopping basket and all items
►  Easy to understand, hard to implement
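
The two storage variants and the wildcard multi-get can be sketched with a plain Python dict standing in for the store. The helper multi_get and the second basket item (product:456) are hypothetical. The blob is shortened to two fields, and the prefix scan only emulates what a database with wildcard support would do.

```python
store = {}

# Variant 1: one serialized blob per user (shortened to two fields here).
store["user:someuser"] = "|".join(["someuser", "someuser@example.com"])
username, email = store["user:someuser"].split("|")

# Variant 2: one key per attribute.
store["user:username:someuser"] = "someuser"
store["user:email:someuser"] = "someuser@example.com"

# Links stored as separate keys that point at other keys.
store["basket:someuser:item:1:product"] = "product:123"
store["basket:someuser:item:2:product"] = "product:456"


def multi_get(prefix):
    """Emulate a wildcard multi-get such as "basket:someuser:*"."""
    return {key: value for key, value in store.items() if key.startswith(prefix)}


basket = multi_get("basket:someuser:")
print(basket)
```

The blob variant needs one read but forces the application to parse and rewrite the whole value. The multi-key variant allows partial updates at the cost of more keys to keep consistent.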

Document Stores
►  Aggregates are typically stored as "documents" (key-value collection)
►  JSON, BSON (binary JSON) and XML are common
►  Still no schema, so add any data at runtime
►  The semi-structure of the document allows the database to build indexes, allowing
queries that address properties of the document
>  E.g. "find all baskets that contain the product 123"
►  Relations might be modeled as links
►  Advantages
>  Good fit for semi structured data
>  In particular a good fit for JSON, XML, HTML…
>  Probably the easiest transition from RDBMS
►  Disadvantages
>  Does not scale to the key/value store level
►  Focus on semi structured data e.g. JSON
►  MongoDB, CouchDB
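
A minimal sketch of how a document store can answer "find all baskets that contain the product 123": documents are schema-free dicts, and a secondary index maps a document property to the documents that contain it. The document layout, the sample data and the function baskets_containing are assumptions for illustration; this is not the MongoDB or CouchDB API.

```python
from collections import defaultdict

# Schema-free documents: the second basket carries an extra "coupon" field.
baskets = {
    "basket:someuser": {
        "owner": "someuser",
        "items": [{"product": "product:123", "quantity": 1}],
    },
    "basket:otheruser": {
        "owner": "otheruser",
        "items": [{"product": "product:456", "quantity": 2}],
        "coupon": "WELCOME",
    },
}

# Secondary index: product id -> ids of the baskets that contain it.
product_index = defaultdict(set)
for basket_id, document in baskets.items():
    for item in document.get("items", []):
        product_index[item["product"]].add(basket_id)


def baskets_containing(product_id):
    """Answer the query from the index instead of scanning every document."""
    return sorted(product_index.get(product_id, set()))


print(baskets_containing("product:123"))
```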

Wide Column
►  Add any "column" you like to a row
►  Not a key-value store, but a "key-(column-value)" store
►  Column families are like tables
►  E.g. in the "Users" column family
>  "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com")
►  Since columns are named, some databases provide indexing
>  E.g. Google AppEngine allows you to define columns that can be queried
►  Advantages
>  Easy to store complex and heterogeneous data

§  Apache Cassandra
§  Amazon SimpleDB
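
The "key-(column-value)" idea can be sketched as nested maps: column family, then row key, then named columns. The helpers put and get_row and the second row (otheruser) are hypothetical; this is not the Cassandra or SimpleDB API.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
column_families = defaultdict(lambda: defaultdict(dict))


def put(family, row_key, column, value):
    """Set a single named column; rows need not share the same columns."""
    column_families[family][row_key][column] = value


def get_row(family, row_key):
    """Return a copy of all columns of a row, or an empty dict if it is unknown."""
    return dict(column_families.get(family, {}).get(row_key, {}))


put("Users", "someuser", "username", "someuser")
put("Users", "someuser", "email", "someuser@example.com")
# A different row in the same column family with a different set of columns.
put("Users", "otheruser", "username", "otheruser")
put("Users", "otheruser", "twitter", "@otheruser")

print(get_row("Users", "someuser"))
```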

Graph
►  Nodes with Properties
►  Typed relationships with properties

►  Ideal e.g. to model relations in a social network

►  Easy to find number of followers, degree of relation etc.

►  Neo4j
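
A minimal sketch of the graph model in plain Python, assuming a small made-up social network: nodes with properties, typed relationships with properties, a follower count, and the degree of relation computed with a breadth-first search. All names and data are hypothetical; a graph database such as Neo4j offers its own query facilities for this.

```python
from collections import deque

# Nodes with properties.
nodes = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}

# Typed relationships with properties: (source, type, target, properties).
relationships = [
    ("alice", "FOLLOWS", "bob", {"since": 2010}),
    ("bob", "FOLLOWS", "carol", {"since": 2011}),
    ("dave", "FOLLOWS", "bob", {"since": 2011}),
]


def follower_count(user):
    """Count incoming FOLLOWS relationships."""
    return sum(1 for _, rel, dst, _ in relationships if rel == "FOLLOWS" and dst == user)


def degree_of_relation(start, goal):
    """Smallest number of FOLLOWS hops from start to goal, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for src, rel, dst, _ in relationships:
            if rel == "FOLLOWS" and src == node and dst not in seen:
                seen.add(dst)
                queue.append((dst, distance + 1))
    return None


print(follower_count("bob"), degree_of_relation("alice", "carol"))
```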

What happened to Queries?
►  Data is easily and quickly read/stored using primary key
►  Denormalize data for commonly used queries
>  Store Twitter inbox in key/value as
–  "inbox:someuser" → ("posts:123", "posts:234", ...)
>  instead of doing the query (RDBMS)
–  select p.* from POSTS p, POSTLINKS pl where p.id = pl.postId and pl.userid = 42
►  Store reverse lookup
>  "ewolff|following" → ("spring_rod", "spring_juergen")
>  "post:435|RT" → ("post:42", "post:21")
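
A minimal sketch of denormalizing on write, with a plain dict standing in for the key-value store: each new post is copied into the inbox of every follower, so reading an inbox is a key lookup instead of a join. The functions follow, publish and read_inbox and the "|followers" key are assumptions for illustration; only the "inbox:" and "|following" key shapes come from the slides.

```python
store = {}


def follow(follower, followee):
    """Store the relation in both directions so each lookup is a single key read."""
    store.setdefault(follower + "|following", []).append(followee)
    store.setdefault(followee + "|followers", []).append(follower)  # reverse lookup


def publish(author, post_id, text):
    """Write the post once, then fan its key out to every follower's inbox."""
    post_key = "posts:" + post_id
    store[post_key] = text
    for follower in store.get(author + "|followers", []):
        store.setdefault("inbox:" + follower, []).append(post_key)


def read_inbox(user):
    """Read the precomputed inbox; no join is needed at read time."""
    return [store[post_key] for post_key in store.get("inbox:" + user, [])]


follow("ewolff", "spring_rod")
follow("ewolff", "spring_juergen")
publish("spring_rod", "123", "first post")
publish("spring_juergen", "234", "second post")
print(read_inbox("ewolff"))
```

The trade-off is more work and more storage on every write in exchange for cheap reads, which suits read-heavy workloads.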

What It Means for Developers
§  More technologies to have fun with
§  Broader choice of persistence stores
§  Probably Cross Store Persistence
•  Store name, firstname etc in RDBMS
•  Store followers in Graph database

•  Store Content in RDBMS
•  Store User Generated Content in Document database

§  Spring Data
•  Similar APIs for JPA and NoSQL
•  Support for cross store persistence
•  Sophisticated support for generic DAOs
•  E.g. just add findByName() method, implementation is provided
§  QueryDSL
•  JPA Criteria API done right
