ToroDB is an open source document-oriented NoSQL database that uses PostgreSQL as its storage backend. It stores JSON documents by splitting them into subdocuments based on hierarchy levels and storing each subdocument type in a separate PostgreSQL table. This allows ToroDB to scale like MongoDB by querying "by structure" and only scanning relevant subsets of data. Version 0.4 of ToroDB adds support for replicating data from a MongoDB master database to a PostgreSQL database using MongoDB's replication protocol.
3. ToroDB @NoSQLonSQL
About *8Kdata*
● Research & Development in databases
● Consulting, Training and Support in PostgreSQL
● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)
● About myself: CTO at 8Kdata:
@ahachete
http://linkd.in/1jhvzQ3
www.8kdata.com
5. ToroDB @NoSQLonSQL
ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
● Uses PostgreSQL as a storage backend
6. ToroDB @NoSQLonSQL
ToroDB storage
● Data is stored in tables. No blobs
● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately
● Subdocuments are classified by “type”. Each “type” maps to a different table
7. ToroDB @NoSQLonSQL
ToroDB storage (II)
● A “structure” table keeps the subdocument “schema”
● Keys in JSON are mapped to attributes, which retain the original name
● Tables are created dynamically and transparently to match the exact types of the documents
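The storage idea on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ToroDB's actual code: the function names (`split_document`, `subdocument_type`, `store`) are hypothetical, a "type" is approximated as the sorted set of (key, scalar type) pairs, tables are plain in-memory lists, and arrays are ignored.

```python
def split_document(doc, path=""):
    """Split a nested document into flat subdocuments, one per nesting level.

    Returns a list of (path, flat_subdocument) pairs. Nested dicts are moved
    into their own subdocument, so no returned subdocument contains nesting.
    """
    flat = {}
    result = [(path, flat)]
    for key, value in doc.items():
        if isinstance(value, dict):
            child_path = f"{path}.{key}" if path else key
            result.extend(split_document(value, child_path))
        else:
            flat[key] = value  # the key keeps its original name
    return result


def subdocument_type(subdoc):
    """Hypothetical 'type': the sorted (key, scalar type name) pairs."""
    return tuple(sorted((k, type(v).__name__) for k, v in subdoc.items()))


def store(doc, tables):
    """Append each subdocument to the table matching its type.

    `tables` maps a type to a list of rows; a new table is created
    on first use, mimicking dynamic table creation.
    """
    for path, subdoc in split_document(doc):
        tables.setdefault(subdocument_type(subdoc), []).append((path, subdoc))


if __name__ == "__main__":
    tables = {}
    store({"name": "toro", "address": {"city": "Madrid", "zip": 28001}}, tables)
    for type_id, rows in tables.items():
        print(type_id, rows)
```

Two documents with the same shape end up in the same tables, while a document with a different set of keys or value types gets tables of its own.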
13. ToroDB @NoSQLonSQL
ToroDB: query “by structure”
● ToroDB is effectively partitioning by type
● Structures (schemas, partitioning types) are cached in ToroDB memory
● Queries only scan a subset of the data
● Negative queries are served directly from memory
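A minimal sketch of querying "by structure", assuming a simplified in-memory model: the class name `StructureCache` and its methods are hypothetical, a type is reduced to the sorted tuple of key names, and each "table" is a Python list. It shows how a cached key-to-types index limits the scan to matching tables and lets a query on an unknown key return without touching any data.

```python
class StructureCache:
    """Toy model of partitioning by type with an in-memory structure cache."""

    def __init__(self):
        self.types_by_key = {}  # key name -> set of type ids containing it
        self.tables = {}        # type id -> list of rows (flat dicts)

    def insert(self, subdoc):
        type_id = tuple(sorted(subdoc))
        self.tables.setdefault(type_id, []).append(subdoc)
        for key in subdoc:
            self.types_by_key.setdefault(key, set()).add(type_id)

    def find(self, key, value):
        """Return (matching rows, number of rows scanned)."""
        candidates = self.types_by_key.get(key)
        if not candidates:
            # Negative query: no structure has this key, so the answer
            # comes from the cache alone and nothing is scanned.
            return [], 0
        hits, scanned = [], 0
        for type_id in candidates:
            for row in self.tables[type_id]:
                scanned += 1
                if row[key] == value:
                    hits.append(row)
        return hits, scanned


if __name__ == "__main__":
    cache = StructureCache()
    cache.insert({"name": "a", "age": 1})
    cache.insert({"name": "b"})
    cache.insert({"city": "x"})
    print(cache.find("city", "x"))     # scans only the table that has "city"
    print(cache.find("missing", 1))    # answered without scanning any table
```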
15. ToroDB @NoSQLonSQL
Big Data: NoSQL vs SQL
http://www.networkworld.com/article/2226514/tech-debates/what-s-better-for-your-big-data-application--sql-or-nosql-.html
18. ToroDB @NoSQLonSQL
Vertical scalability
Concurrency scalability
● SQL is usually better (e.g. PostgreSQL):
➔ Finer locking
➔ MVCC
➔ Better caching
● NoSQL often needs sharding within the same host to scale
19. ToroDB @NoSQLonSQL
Vertical scalability
Hardware scalability
● Scaling with the number of cores?
● Process/threading model?
Query scalability
● Use of indexes? Use of more than one?
● Table/collection partitioning?
● ToroDB “by-type” partitioning
20. ToroDB @NoSQLonSQL
Read scalability: replication
● Replicate data to slave nodes, available read-only: scale-out reads
● Both NoSQL and SQL support it
● Binary replication usually faster (e.g. PostgreSQL's Streaming Replication)
● Not free from undesirable phenomena
23. ToroDB @NoSQLonSQL
MongoDB's dirty and stale reads
Dirty reads
A primary in the minority partition accepts a write that other clients see, but it later steps down and the write is rolled back (fixed in 3.2?)
Stale reads
A primary in the minority partition serves a value that ought to be current, but a newer value has already been written to the other primary, on the majority side
24. ToroDB @NoSQLonSQL
Write scalability (sharding)
● NoSQL better prepared than SQL
● But many compromises in data modeling (schema design): no FKs
● There are also solutions for SQL:
➔ Shared-disk, limited scalability (RAC)
➔ Sharding (like pg_shard)
➔ PostgreSQL's FDWs
26. ToroDB @NoSQLonSQL
Replication protocol choice
● ToroDB is based on PostgreSQL
● PostgreSQL has either binary streaming replication (async or sync) or logical replication
● MongoDB has logical replication
● ToroDB uses MongoDB's protocol
27. ToroDB @NoSQLonSQL
MongoDB's replication protocol
● Every change is recorded in JSON documents, idempotent format (collection: local.oplog.rs)
● Slaves pull these documents from master (or other slaves) asynchronously
● Changes are applied and feedback is sent upstream
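The pull-and-apply loop described above can be sketched as follows. This is a simplified model, not MongoDB's or ToroDB's implementation: the entry fields (`ts`, `op`, `o`, `o2`) imitate the general shape of oplog documents, but real entries carry more fields and other update formats, timestamps are plain integers here, and the helper names `apply_entry` and `pull_and_apply` are hypothetical. The point is idempotency: replaying the same entries leaves the data unchanged.

```python
def apply_entry(collection, entry):
    """Apply one simplified oplog entry to a dict keyed by _id."""
    op = entry["op"]
    if op == "i":
        # Insert as an upsert, so re-applying the entry is harmless.
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif op == "u":
        # Update: "o2" identifies the document, "o" holds a $set of new values.
        target = collection.get(entry["o2"]["_id"])
        if target is not None:
            target.update(entry["o"]["$set"])
    elif op == "d":
        # Delete: removing an already missing document is a no-op.
        collection.pop(entry["o"]["_id"], None)


def pull_and_apply(oplog, collection, last_applied_ts):
    """Apply every entry newer than last_applied_ts and return the new position.

    The returned timestamp stands in for the feedback sent upstream.
    """
    for entry in oplog:
        if entry["ts"] <= last_applied_ts:
            continue
        apply_entry(collection, entry)
        last_applied_ts = entry["ts"]
    return last_applied_ts


if __name__ == "__main__":
    oplog = [
        {"ts": 1, "op": "i", "o": {"_id": 1, "name": "toro"}},
        {"ts": 2, "op": "u", "o2": {"_id": 1}, "o": {"$set": {"name": "ToroDB"}}},
        {"ts": 3, "op": "i", "o": {"_id": 2, "name": "tmp"}},
        {"ts": 4, "op": "d", "o": {"_id": 2}},
    ]
    data = {}
    position = pull_and_apply(oplog, data, 0)
    print(position, data)
```

Calling `pull_and_apply(oplog, data, 0)` a second time replays every entry and yields the same `data`, which is what allows a slave to resume safely from an older position.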
28. ToroDB @NoSQLonSQL
MongoDB slave states
● Secondary: slave is more or less up to date and pulling “diffs” from other nodes
● InitialSync: copy * from all databases, all collections. Used to init slaves or when sync is lost (rollback didn't find common root; resync is requested)
● Rollback: there is data to DELETE
31. ToroDB @NoSQLonSQL
ToroDB v0.4
● ToroDB works as a secondary slave of a MongoDB master (or slave)
● Implements the full replication protocol (not an oplog tailable query)
● Replicates from MongoDB to PostgreSQL
● Open source: github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)
32. ToroDB @NoSQLonSQL
Advantages of ToroDB w/ replication
● Native SQL
● Query “by type”
● Better SQL scaling
● Less concurrency contention
● Better hardware utilization
● No need for ETL from Mongo to PG!
33. ToroDB @NoSQLonSQL
ToroDB: native SQL
● NoSQL is trying to get back to SQL
● ToroDB is SQL native!
● Insert with Mongo, query with SQL!
● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc.