2. WHAT IS NoSQL?
"Not only SQL": a movement around distributed, non-relational DBs.
Flexible schemas, join-less querying,
horizontal scalability.
Various types of DBs with different data models.
More options when building solutions and solving
problems than classic RDBMS.
3. CAP THEOREM
Three core system properties:
1. Consistency
ACID - what you write is what you will read.
CAP - all nodes see the same piece of data.
2. Availability
Service is available and responsive.
3. Partition Tolerance
System continues to operate even if some
of the nodes are unavailable.
4. SYSTEM REQUIREMENTS
Only 2 out of the 3 CAP properties can be
achieved at any given time:
- All NoSQL systems should be partition tolerant
- The choice between C and A depends on the required level of consistency
Strong Consistency          Eventual Consistency
- no stale reads            - stale reads possible
- higher read latency       - lowest read latency
- lower read throughput     - highest read throughput
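The trade-off in the comparison above can be sketched with a toy two-node store. The class and method names (ReplicatedRegister, read_strong, read_eventual, sync) are hypothetical and only illustrate the idea; no real database exposes exactly this API.

```python
class ReplicatedRegister:
    """Toy store: one primary plus one replica that lags until sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted but not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Replication catches up: apply queued writes in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_strong(self, key):
        # Served by the node that accepted the write: never stale.
        return self.primary.get(key)

    def read_eventual(self, key):
        # Served by the replica: no coordination, but may lag behind.
        return self.replica.get(key)


store = ReplicatedRegister()
store.write("cart", ["book"])
store.sync()
store.write("cart", ["book", "pen"])   # not replicated yet

stale = store.read_eventual("cart")    # old value from the replica
fresh = store.read_strong("cart")      # latest value from the primary
store.sync()
converged = store.read_eventual("cart")
```

Once replication catches up, the eventual read returns the same value as the strong read; until then it may return the older one.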
5. VARIOUS DATA MODELS
Document Oriented
- collection of documents
- flexible schema, programmer/web friendly, REST
- MongoDB, CouchDB, RavenDB
Key-Value Stores
- collection of key-value pairs
- handles size well, very fast
- Riak, Redis, Membase
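A rough sketch of how the two models above differ, using plain Python structures rather than any real driver. The field names and the "user:<id>" key scheme are illustrative assumptions.

```python
import json

# Document model: a collection of self-describing documents.
# Documents in one collection need not share the same fields.
users = [
    {"_id": 1, "name": "Ann", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
]

# The store can look inside documents, so queries can filter on fields.
admins = [doc["name"] for doc in users if "admin" in doc.get("tags", [])]

# Key-value model: the store sees only a key and an opaque value.
# Lookups go through the key; the value is decoded by the application.
kv = {f"user:{doc['_id']}": json.dumps(doc) for doc in users}
bob = json.loads(kv["user:2"])
```

The document store pays for field-level queries with more machinery; the key-value store stays simple and fast because it never interprets the value.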
6. VARIOUS DATA MODELS (continued)
Graph Oriented
- graph theory based
- complex relationships, fast
- Neo4j, OrientDB*
Object Oriented
- objects
- complex objects, direct serialization, fast
- db4o, Objectivity, Versant
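To illustrate the kind of relationship query the graph-oriented model is built for, here is a minimal breadth-first traversal over an adjacency list. The sample data and the within_hops helper are hypothetical; a graph database would run such a query natively instead of in application code.

```python
from collections import deque

# Directed "follows" edges, stored as an adjacency list.
follows = {
    "ann": ["bob", "cid"],
    "bob": ["dan"],
    "cid": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


reachable = within_hops(follows, "ann", 2)
```

In a relational schema the same question would need one self-join per hop, which is the main reason such data fits a graph store better.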
7. MongoDB
Main Properties
- written in C++ (AGPL license)
- binary protocol, JSON style documents
- Master/slave replication, auto sharding
- JavaScript based ad hoc and map/reduce querying
Use Cases
- general purpose NoSQL system
- caching, high volume data store
- real-time statistics, archiving, logging, commerce
- users: foursquare, bit.ly, SourceForge, GitHub, ...
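The map/reduce querying pattern mentioned above can be sketched in plain Python. MongoDB itself takes the map and reduce functions as JavaScript; the helper below only mimics the data flow, and all names and sample documents are assumptions.

```python
from collections import defaultdict

checkins = [
    {"user": "ann", "venue": "cafe"},
    {"user": "bob", "venue": "cafe"},
    {"user": "ann", "venue": "gym"},
]


def map_checkin(doc):
    # Emit one (key, value) pair per document.
    yield doc["venue"], 1


def reduce_counts(key, values):
    # Combine all values emitted for the same key.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


per_venue = map_reduce(checkins, map_checkin, reduce_counts)
```

Because the map step looks at one document at a time, it can run on each shard independently before the results are reduced.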
8. CouchDB
Main Properties
- written in Erlang (Apache license)
- REST interface, JSON style documents
- MVCC, N master replication
- map/reduce querying, reliable design
Use Cases
- N master replication on various systems
- scenarios where versioning is important
- scenarios where ad hoc queries are not required
- users: BBC, Ubuntu One, Engine Yard, CERN, ...
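A minimal sketch of the revision check behind MVCC-style updates: a write must name the revision it was based on, otherwise it is rejected as a conflict. Integer revisions and the RevisionedStore class are simplifying assumptions; CouchDB's own revision identifiers look different.

```python
class Conflict(Exception):
    """Raised when an update is based on an outdated revision."""


class RevisionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevisionedStore()
rev1 = store.put("page", {"title": "draft"})
rev2 = store.put("page", {"title": "final"}, rev=rev1)

try:
    # A second writer still holds rev1 and tries to overwrite.
    store.put("page", {"title": "late edit"}, rev=rev1)
    rejected = False
except Conflict:
    rejected = True
```

No locks are taken: readers always see a complete revision, and the losing writer has to re-read and retry.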
9. Riak
Main Properties
- written in Erlang (Apache license)
- REST interface, binary protocol
- shard-partitioned storage, tunable consistency
- map/reduce querying, full-text search
Use Cases
- scenarios with tunable level of consistency
- built-in search requirement
- performance oriented systems
- users: Mozilla, Yammer, Aol, Voxer, ...
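Tunable consistency is commonly described with three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. The sketch below checks the usual rule of thumb (R + W > N means every read overlaps the latest write) by brute force; the function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: reads see the latest write when R + W > N."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute force: does every possible read set share a node with every write set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# With N = 3: R = W = 2 is consistent, R = W = 1 favours latency.
strong = quorums_overlap(3, 2, 2)
fast = quorums_overlap(3, 1, 1)
```

Lowering R or W trades consistency for latency and availability, and the setting can differ per use case.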
10. Redis
Main Properties
- written in C (BSD license)
- binary protocol
- disk-backed in-memory database (VM support)
- advanced data structures, pub/sub, very fast
Use Cases
- caching
- real-time data, analytics, statistics
- scenarios where performance matters
- users: GitHub, StackOverflow, Blizzard, Disqus, ...
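The caching use case above usually follows a cache-aside pattern with expiring keys. The sketch below models it in plain Python instead of calling a Redis client; TTLCache, cached_lookup and the 60 second default are assumptions made for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value


def cached_lookup(cache, key, load, ttl=60):
    """Cache-aside: try the cache first, fall back to the slow source."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value, ttl)
    return value
```

For example, cached_lookup(cache, "user:1", load_from_db) calls the hypothetical load_from_db only on a miss or after the entry has expired.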
11. Neo4j
Main Properties
- written in Java (GPL license)
- REST interface, embedding
- optimized for reads, Gremlin traversal language
- deployable as a full server or a very slim database
Use Cases
- graph style data
- social relations, network topologies
- tagging, metadata annotations, hierarchic data
- users: Schor.ly, Namesake, Face2Face, ...
12. OrientDB
Main Properties
- written in Java (Apache license)
- REST interface, binary protocol
- object/document/graph database hybrid
- embeddable, SQL-like querying
Use Cases
- general purpose NoSQL solution
- scenarios where fast insertion matters
- cross-platform requirements
- users: NuvolaBase
13. db4o
Main Properties
- written in Java/.NET (GPL license)
- REST interface, binary protocol
- embeddable, low footprint (~1MB)
- data access through Native Queries, LINQ
Use Cases
- ORM-free data manipulation
- database installation-free scenarios
- cross-platform requirements
- users: Boeing, BOSCH, Seagate, IBM, Intel, ...
14. BENCHMARKS IN GENERAL
Real benchmarks require real-world data/load.
Speed versus data durability.
More operations per second != better system.
Cost of I/O:
L1 cache              3 cycles
L2 cache             14 cycles
RAM                 250 cycles
Disk         41 000 000 cycles
Network     240 000 000 cycles
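To put the cycle counts above in perspective, they can be converted to wall-clock time and compared with each other. The 3 GHz clock rate is an assumption chosen for the arithmetic; the cycle counts are the ones listed on the slide.

```python
CLOCK_HZ = 3_000_000_000  # assumed 3 GHz CPU, not stated in the source

COST_CYCLES = {
    "L1 cache": 3,
    "L2 cache": 14,
    "RAM": 250,
    "Disk": 41_000_000,
    "Network": 240_000_000,
}


def cycles_to_seconds(cycles, clock_hz=CLOCK_HZ):
    return cycles / clock_hz


def relative_to(baseline, costs=COST_CYCLES):
    """How many times slower each access is than the baseline access."""
    return {name: cycles / costs[baseline] for name, cycles in costs.items()}


for name, cycles in COST_CYCLES.items():
    print(f"{name:<10}{cycles_to_seconds(cycles) * 1e6:>12.3f} microseconds")

vs_ram = relative_to("RAM")
```

Under that assumption a disk access costs roughly 14 milliseconds and a network round trip 80 milliseconds, that is 164 000 and 960 000 RAM accesses respectively.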
15. RESOURCES
NoSQL Ecosystem
Brewer's CAP Theorem
Systemic Requirements
Choosing consistency
Eventually Consistent - Revisited
List of NoSQL Databases
NoSQL Databases Comparison
Choosing SQL, NoSQL or Both
Use Cases For Choosing NoSQL Database
NoSQL, NewSQL and Beyond