1. NoSQL: What Is It and Why Would I Care?
Eberhard Wolff
21.09.11
2. Alternative Databases: NoSQL
► NoSQL: Not only SQL
► A good example of a catchy but bad name
► Not a positive definition, rather "not something else"
► Now: Even less clear
3. Why NoSQL?
► Exponential data growth
► More and more connected data
> Hypertext, Blogs, User generated content
► Semi structured
> User generated content
> Full text search / indices instead of Query-by-Example
► Integration at the database level is less common
► Cloud prefers scale out over scale up
> Cloud supports scale up: reboot into a larger machine
> …but eventually you will need to scale out, i.e. add more machines
4. NoSQL Flavors
► Key / value store
► Document
► Wide Column: Lots of Columns
► Graph Database: Graphs with nodes, relationships and properties
► Object databases: Stores objects – not rows
► Note: NoSQL is actually vaguely defined
5. Key-Value Stores
► Maps keys to values
► Just a large globally available Map
> E.g. key 42 → value "Some data"
► I.e. not a very powerful data model
► Advantages
> Easy to understand
> Easier to build scale out solutions
(no joins, easy sharding etc)
► Disadvantages
> Simplistic data model
> Not a good fit for complex data
> Might add complexity to the application code
• Focus on scalability
• Redis: Think cache + Persistence
• Riak
6. Key Value Store: Hybrid Approach
► Might just be used to store specific data
► E.g. scores of players in an online game
> No complex structure
> Need to scale
> Lots of reads and writes
► Player name, age, address would still be in an RDBMS
► Hybrid approach
7. Key-Value Stores: Store All Data
► Storing data as serialized blobs
> "user:someuser" → "someuser|someuser@example.com|more|data|here"
► Storing data as multiple keys
> "user:username:someuser" → "someuser"
> "user:email:someuser" → "someuser@example.com"
> Requires multi get/set to be efficient
> Allows some querying if the database supports wildcards,
like "user:email:someuser*"
► Storing links
> Blob: "basket:someuser" → "...|item|1|product|product:123|..."
> Separate keys: "basket:someuser:item:1:product" → "product:123"
– Multi-get: "basket:someuser:*" loads the shopping basket and all items
► Easy to understand, hard to implement
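The two storage styles above can be sketched in a few lines. This is only an illustration: a plain Python dict stands in for the key-value store, and the helper names (put_user_blob, put_user_keys, multi_get) are hypothetical, not the API of Redis, Riak or any other product. The wildcard multi-get is emulated with glob matching from the standard library.

```python
from fnmatch import fnmatchcase

# Assumption: a plain dict stands in for the key-value store.
store = {}

def put_user_blob(name, email):
    # One key, all fields serialized into a single pipe-separated blob.
    store[f"user:{name}"] = f"{name}|{email}"

def put_user_keys(name, email):
    # One key per field; the field name is part of the key.
    store[f"user:username:{name}"] = name
    store[f"user:email:{name}"] = email

def multi_get(pattern):
    # Emulates a wildcard multi-get by glob-matching every key.
    return {k: v for k, v in store.items() if fnmatchcase(k, pattern)}

put_user_blob("someuser", "someuser@example.com")
put_user_keys("someuser", "someuser@example.com")

# Links stored as separate keys: each basket item points to a product key.
store["basket:someuser:item:1:product"] = "product:123"
store["basket:someuser:item:2:product"] = "product:456"

blob_fields = store["user:someuser"].split("|")  # the application must parse the blob
basket = multi_get("basket:someuser:*")          # loads all items of the basket
```

With the blob, the application owns the parsing. With separate keys, the key layout becomes the schema, which is where the "hard to implement" part tends to show up.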
8. Document Stores
► Aggregates are typically stored as "documents" (key-value collections)
► JSON, BSON (binary JSON) and XML are common
► Still no schema, so add any data at runtime
► The semi-structure of the document allows the database to build indexes, allowing
queries that address properties of the document
> E.g. "find all baskets that contain the product 123"
► Relations might be modeled as links
► Advantages
> Good fit for semi structured data
> In particular a good fit for JSON, XML, HTML…
> Probably the easiest transition from RDBMS
► Disadvantages
> Does not scale as well as a key/value store
► Focus on semi structured data e.g. JSON
► MongoDB, CouchDB
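A minimal sketch of why the semi-structure matters, assuming documents are plain Python dicts and that the store indexes the "product" property of basket items. The document layout and the helper build_product_index are made up for this example; real document stores such as MongoDB or CouchDB have their own index and query APIs.

```python
# Assumption: each basket is a schemaless document, shown here as a dict.
baskets = [
    {"_id": "basket:someuser", "owner": "someuser",
     "items": [{"product": "product:123", "quantity": 1}]},
    # A document may carry extra fields ("coupon") without any schema change.
    {"_id": "basket:otheruser", "owner": "otheruser", "coupon": "WELCOME",
     "items": [{"product": "product:456", "quantity": 2}]},
]

def build_product_index(docs):
    # Maps a product id to the ids of all baskets that contain it.
    index = {}
    for doc in docs:
        for item in doc.get("items", []):
            index.setdefault(item["product"], set()).add(doc["_id"])
    return index

product_index = build_product_index(baskets)

# "Find all baskets that contain the product 123"
matches = sorted(product_index.get("product:123", set()))
```

The index can only be built because the database can look inside the document. With an opaque blob in a key-value store, this query would need a full scan in application code.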
9. Wide Column
► Add any "column" you like to a row
► Not a key-value store, but a "key-(column-value)" store
► Column families are like tables
► E.g. in the "Users" column family
> "someuser" → ("username" → "someuser"),
("email" → "someuser@example.com")
► Since columns are named, some databases provide indexing
> E.g. Google AppEngine allows you to define columns that can be queried
► Advantages
> Easy to store complex and heterogeneous data
§ Apache Cassandra
§ Amazon SimpleDB
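The "key-(column-value)" idea can be sketched as nested maps: column family → row key → column name → value. This is a toy in-memory model under that assumption. The functions put and rows_where are hypothetical names and do not correspond to the Cassandra or SimpleDB APIs.

```python
from collections import defaultdict

# Assumption: column family -> row key -> {column name -> value}
column_families = defaultdict(lambda: defaultdict(dict))

def put(family, row_key, column, value):
    # Any column name is accepted; rows need not share the same columns.
    column_families[family][row_key][column] = value

def rows_where(family, column, value):
    # Named columns make a simple "query by column" possible.
    return sorted(
        key for key, cols in column_families[family].items()
        if cols.get(column) == value
    )

put("Users", "someuser", "username", "someuser")
put("Users", "someuser", "email", "someuser@example.com")
put("Users", "otheruser", "username", "otheruser")
put("Users", "otheruser", "twitter", "@otheruser")  # a column no other row has
```

Here rows_where scans every row. A database that indexes named columns would answer the same question without the scan.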
10. Graph
► Nodes with Properties
► Typed relationships with properties
► Ideal e.g. to model relations in a social network
► Easy to find number of followers, degree of relation etc.
► Neo4j
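To make "number of followers" and "degree of relation" concrete, here is a small in-memory sketch. The user names, the FOLLOWS relationship type and both functions are invented for illustration. This is not Neo4j's API, only a breadth-first search over typed relationships with properties.

```python
from collections import deque

# Assumption: (source node, relationship type, target node, properties)
relationships = [
    ("alice", "FOLLOWS", "bob", {"since": 2010}),
    ("carol", "FOLLOWS", "bob", {"since": 2011}),
    ("bob", "FOLLOWS", "dave", {"since": 2011}),
]

def followers(user):
    # Everyone with a FOLLOWS relationship pointing at the user.
    return sorted(src for src, rel, dst, _ in relationships
                  if rel == "FOLLOWS" and dst == user)

def degree_of_relation(start, goal):
    # Fewest FOLLOWS hops between two users, ignoring direction; None if unconnected.
    neighbours = {}
    for src, rel, dst, _ in relationships:
        if rel == "FOLLOWS":
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None
```

In an RDBMS the same traversal would need one self-join per hop, which is why graph databases treat relationships as first-class data.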
11. What happened to Queries?
► Data is easily and quickly read/stored using primary key
► Denormalize data for commonly used queries
> Store Twitter inbox in key/value as
– "inbox:someuser" → ("posts:123", "posts:234", ...)
> instead of doing the query (RDBMS)
– select p.* from POSTS p, POSTLINKS pl where p.id = pl.postId and
pl.userid=42
► Store reverse lookup
> "ewolff|following" → ("spring_rod", "spring_juergen")
> "post:435|RT" → ("post:42", "post:21")
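A minimal sketch of the denormalization idea: the inbox is written at publish time (fan-out on write), so reading it is a single primary-key lookup instead of a join. A dict again stands in for the key-value store. The "|followers" key and the functions follow, publish and read_inbox are assumptions made for this example, not part of the slides.

```python
# Assumption: a plain dict stands in for the key-value store.
store = {}

def follow(follower, followee):
    # Store both directions so each lookup is one key read.
    store.setdefault(f"{follower}|following", []).append(followee)
    store.setdefault(f"{followee}|followers", []).append(follower)

def publish(author, post_id, text):
    store[post_id] = {"author": author, "text": text}
    # Fan-out on write: append the post id to every follower's inbox.
    for follower in store.get(f"{author}|followers", []):
        store.setdefault(f"inbox:{follower}", []).append(post_id)

def read_inbox(user):
    # One key lookup for the inbox, then one lookup per post; no join.
    return [store[post_id] for post_id in store.get(f"inbox:{user}", [])]

follow("someuser", "spring_rod")
follow("someuser", "spring_juergen")
publish("spring_rod", "posts:123", "hello")
publish("spring_juergen", "posts:234", "hi")
```

The trade-off is more work and more storage per write in exchange for cheap reads, which suits data that is read far more often than it is written.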
12. What It Means for Developers
§ More technologies to have fun with
§ Broader choice of persistence stores
§ Probably Cross Store Persistence
• Store name, firstname etc in RDBMS
• Store followers in Graph database
• Store Content in RDBMS
• Store User Generated Content in Document database
§ Spring Data
• Similar APIs for JPA and NoSQL
• Support for cross store persistence
• Sophisticated support for generic DAOs
• E.g. just add findByName() method, implementation is provided
§ QueryDSL
• JPA Criteria API done right