3. The NoSQL Movement
NoSQL Databases:
Non-relational
Less ACID
More BASE
CAP Trading
Highly Scalable
Highly Performant
NoSQL = Not Only SQL
4. Less ACID
• Atomic
• Basically means it supports transactions: a group of changes is applied all-or-nothing
• Consistent
• Has hard constraints & rejects non-conforming data
• Isolated
• No peeking at incomplete commits
• Durable
• Once a commit is finished, it lasts forever.
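A minimal sketch of the "Atomic" and "Consistent" bullets above, assuming a hypothetical in-memory `accounts` object and a hypothetical `transfer` helper (neither comes from any database API). Changes are staged on a copy and published only if every step succeeds:

```javascript
// Hypothetical in-memory "database": account name -> balance.
const accounts = { alice: 100, bob: 50 };

// Apply a transfer atomically: stage the changes on a copy and
// publish them only if every step succeeds.
function transfer(db, from, to, amount) {
  const staged = { ...db };      // work on a copy, not the live data
  staged[from] -= amount;
  if (staged[from] < 0) {
    return false;                // hard constraint violated: nothing is applied
  }
  staged[to] += amount;
  Object.assign(db, staged);     // "commit": both balances change together
  return true;
}

const ok = transfer(accounts, "alice", "bob", 30);        // fits the constraint
const rejected = transfer(accounts, "bob", "alice", 500); // would overdraw bob
console.log(ok, rejected, accounts);
```

The rejected transfer leaves both balances untouched, which is the all-or-nothing behavior the slide describes.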
6. CAP Trading
• Consistency (client perceives set of operations
completed)
• Availability (operations terminate with an
expected result)
• Partition tolerance (operations will complete,
even if a required resource is unavailable)
• Only 2 of the 3 can be guaranteed at once in a distributed system.
– Eric Brewer
7. The NoSQL Movement
Why:
• SQL is tedious and difficult
• Strongly typed schemas are inflexible and
painful to maintain
• Inadequate performance of RDBMS on huge
data stores
• Poor Scalability of RDBMS
• Poor Replication Support
8. Types of NoSQL Databases
• Document Stores
• Graph
• Key/Value Store
• Object Database
• Tabular
10. MongoDB
Combining the best features of
document databases, key-value
stores, and RDBMSes.
• Scalable
• High-Performance
• Open Source
• Schema-free
• Document Oriented
11. MongoDB Features
• Document-oriented storage (BSON)
• Dynamic Queries
• Full index support (including embedded objects & arrays)
• Fast, in-place updates
• Efficient Blob storage
• Replication
• Auto-sharding
• MapReduce
• Driver support for many languages
• Cross-Platform
• Admin Tools
13. Dynamic Queries
• No indexes required to find data.
• RDBMSes all support this as well.
Examples
• All records:
db.players.find({})
• All Red Wings:
db.players.find({"team": "Red Wings"})
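What "no indexes required" means can be sketched in plain JavaScript: without an index, the query is answered by checking every document. The `players` array and the `find` helper below are hypothetical stand-ins, not the MongoDB implementation, and only equality matches are handled:

```javascript
// Hypothetical sample data standing in for the players collection.
const players = [
  { name: "Player A", team: "Red Wings" },
  { name: "Player B", team: "Penguins" },
  { name: "Player C", team: "Red Wings" },
];

// Full scan: keep every document whose fields equal the query's fields.
// An empty query has no conditions, so every document matches.
function find(collection, query) {
  const keys = Object.keys(query);
  return collection.filter((doc) => keys.every((k) => doc[k] === query[k]));
}

const all = find(players, {});
const redWings = find(players, { team: "Red Wings" });
console.log(all.length, redWings.map((p) => p.name));
```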
14. Index Support
• B-Tree format
• Default index on PK
• Supports unique, compound, document
indexes (indexes on nested documents) and
multikey indexes (allows indexing of arrays of
values)
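The multikey idea can be sketched as follows: each element of an array field gets its own index entry pointing back to the document. This sketch uses a `Map` rather than a B-Tree to stay short, and the `posts` data and `buildMultikeyIndex` helper are hypothetical:

```javascript
// Hypothetical documents with an array-valued field.
const posts = [
  { _id: 1, tags: ["nosql", "mongodb"] },
  { _id: 2, tags: ["sql"] },
  { _id: 3, tags: ["mongodb", "sharding"] },
];

// Build a multikey-style index: one entry per array element,
// each mapping a value to the _ids of the documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(posts, "tags");
const mongoPosts = tagIndex.get("mongodb"); // _ids of posts tagged "mongodb"
console.log(mongoPosts);
```

A lookup by tag now reads one index entry instead of scanning every post.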
15. Fast in-place updates
• Updates are made to existing documents
within a collection.
• Many “NoSQL” databases (such as CouchDB)
do not update records in place and instead store
a new version of the record on each update.
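The contrast between the two update styles can be sketched like this. Both helpers and both stores are hypothetical simplifications; in particular, the versioned variant only illustrates the append-a-revision idea and is not how any specific database implements it:

```javascript
// In-place: the stored document itself is modified.
function updateInPlace(collection, id, changes) {
  const doc = collection.find((d) => d._id === id);
  if (doc) Object.assign(doc, changes);
  return doc;
}

// Versioned: every update appends a new revision; older ones remain.
function updateVersioned(history, id, changes) {
  const revisions = history.filter((d) => d._id === id);
  const latest = revisions[revisions.length - 1];
  if (!latest) return null;  // nothing to update
  const next = { ...latest, ...changes, rev: latest.rev + 1 };
  history.push(next);
  return next;
}

const inPlaceStore = [{ _id: 1, goals: 10 }];
updateInPlace(inPlaceStore, 1, { goals: 11 });

const versionedStore = [{ _id: 1, rev: 1, goals: 10 }];
updateVersioned(versionedStore, 1, { goals: 11 });

console.log(inPlaceStore.length, versionedStore.length);
```

After one update the in-place store still holds a single document, while the versioned store holds both the old and the new revision.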
16. Efficient Blob Storage
• Blob = Binary Large Object
• Up to 4MB within document
• GridFS specification is followed for larger
items and external files
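The idea behind storing larger items can be sketched as splitting a blob into numbered chunk documents and reassembling them on read. The field names `files_id`, `n`, and `data` are modeled on GridFS chunk documents; the helpers, the file id, and the tiny chunk size are hypothetical choices for illustration (runs on Node.js):

```javascript
// Split a Buffer into fixed-size chunk documents, numbered from 0.
function splitIntoChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,
      n,
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Reassemble a file by concatenating its chunks in order of n.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const blob = Buffer.from("0123456789");
const chunks = splitIntoChunks("file-1", blob, 4); // tiny chunk size for demo
const restored = reassemble(chunks);
console.log(chunks.length, restored.toString());
```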
17. Replication
• Enhanced master-slave configuration
– one server active for writes at a time.
– Provides failover and redundancy
– Implemented with Replica Pairs
• When master fails, slave takes over
• When slave fails, control reverts to master
• Limited Master-master
18. Auto-Sharding
• Sharding:
– Breaking database down into “shards” and
spreading those across distributed/commodity
servers.
– highly scalable approach for increased throughput
and performance of high-transaction, large
database applications.
– MongoDB manages data storage and retrieval
behind the scenes.
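A rough sketch of how a router might decide which shard holds a document, assuming range-based partitioning on a shard key. The `chunkRanges` table, shard names, and `shardFor` helper are all hypothetical, and keys are assumed to be lowercase strings:

```javascript
// Hypothetical routing table: each shard owns a range of
// shard-key values, from min (inclusive) to max (exclusive).
const chunkRanges = [
  { shard: "shard-a", min: "a", max: "h" },
  { shard: "shard-b", min: "h", max: "p" },
  { shard: "shard-c", min: "p", max: "{" }, // "{" sorts just after "z"
];

// Find the shard whose range contains the key, or null if none does.
function shardFor(key) {
  const range = chunkRanges.find((r) => key >= r.min && key < r.max);
  return range ? range.shard : null;
}

const routed = ["adams", "miller", "young"].map((k) => [k, shardFor(k)]);
console.log(routed);
```

The application only supplies the key; the lookup that picks a server is the part handled "behind the scenes".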
19. MapReduce
• Term comes from Google.
– Patented framework for processing huge datasets
on certain kinds of distributable problems
using a large number of servers.
– MongoDB applies it to single server instances as
well.
• Useful for batch operations
• Aggregation: NoSQL answer to GROUP BY
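The GROUP BY comparison can be sketched with a tiny single-process map/reduce. The `mapReduce` helper and the sample data are hypothetical; here `emit` is passed to the map function explicitly to keep the sketch self-contained:

```javascript
// Hypothetical sample data.
const players = [
  { name: "Player A", team: "Red Wings", goals: 3 },
  { name: "Player B", team: "Penguins", goals: 5 },
  { name: "Player C", team: "Red Wings", goals: 4 },
];

// Run map over every document, group emitted values by key,
// then reduce each group to a single value.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

// Roughly: SELECT team, SUM(goals) FROM players GROUP BY team
const goalsByTeam = mapReduce(
  players,
  (doc, emit) => emit(doc.team, doc.goals),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(goalsByTeam);
```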
27. How it works
• Focused on documents
– Document = sequence of key-value pairs in BSON
• Value can be another document
• Additional types vs. JSON, e.g. dates, regular expressions
• Messages (passed over TCP/IP) are in BSON; drivers convert code to BSON
• Memory mapped storage engine (MMSE) – all disk access takes place
through MMSE
• Query Optimizer:
– find({x: 10, y: "foo"})
– Launches multiple simultaneous queries based on indexes & table scan. Stops
when one finishes, remembers which one was the fastest for future similar
queries. Can use hint option to specify which index to use.
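The plan-racing behavior described above can be sketched as follows. This is an illustration under stated assumptions, not MongoDB's optimizer: candidate plans are modeled as generators that yield once per document examined, a single-threaded round-robin stands in for "simultaneous", and all names (`tableScan`, `indexScan`, `planCache`, `runQuery`) are hypothetical:

```javascript
const docs = [
  { x: 1, y: "bar" }, { x: 10, y: "foo" }, { x: 10, y: "baz" },
  { x: 7, y: "foo" }, { x: 3, y: "foo" },
];

const matches = (doc, q) => Object.keys(q).every((k) => doc[k] === q[k]);

// Simple single-field index: value -> documents with that value.
function buildIndex(field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// A plan yields once per document it examines, then returns its results.
function* scan(candidates, q) {
  const found = [];
  for (const doc of candidates) {
    yield;
    if (matches(doc, q)) found.push(doc);
  }
  return found;
}
const tableScan = (q) => scan(docs, q);
const indexScan = (index, field) => (q) => scan(index.get(q[field]) || [], q);

const plans = {
  tableScan,
  indexOnX: indexScan(buildIndex("x"), "x"),
  indexOnY: indexScan(buildIndex("y"), "y"),
};
const planCache = new Map(); // query shape -> name of the winning plan

function runQuery(q) {
  const shape = Object.keys(q).sort().join(",");
  const cached = planCache.get(shape);
  const names = cached ? [cached] : Object.keys(plans);
  const running = names.map((name) => ({ name, it: plans[name](q) }));
  // Advance every plan one step at a time; the first to finish wins.
  while (true) {
    for (const p of running) {
      const step = p.it.next();
      if (step.done) {
        planCache.set(shape, p.name); // remember the winner for this shape
        return { plan: p.name, results: step.value };
      }
    }
  }
}

const first = runQuery({ x: 10, y: "foo" });  // races all three plans
const second = runQuery({ x: 10, y: "foo" }); // reuses the remembered plan
console.log(first.plan, first.results, second.plan);
```

The index on `x` examines the fewest documents for this query, so it finishes first and is remembered for later queries with the same fields.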
28. Why?
• Applications where schema gets in the way
• Performance
• Scalability
• RAD
• More natural fit with OO Languages