A Survey of Advanced Non-relational Database Systems: Approaches and Applications
1. A Survey of Advanced Non-relational Database
Systems: Approaches and Applications
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
2. Outline
• Introduction
• Non-relational database systems
– Requirements
– Concepts
– Approaches
– Optimization
– Examples
• Comparison between RDBMSs and non-relational
database systems
3. Problem
• The Web introduces a new scale for applications, in
terms of:
– Concurrent users (millions of reqs/second)
– Data (peta-bytes generated daily)
– Processing (all this data needs processing)
– Exponential growth (surging unpredictable demands)
• Shortcomings of existing RDBMSs
– Oracle, MS SQL, Sybase, MySQL, PostgreSQL, …
– Struggle when dealing with very heavy traffic
– Even with their high-end clustering solutions
4. Problem
• Why?
– Applications using a normalized database schema require
joins, which perform poorly over large volumes of data
and/or many nodes
– Existing RDBMS clustering solutions rely on scale-up, which
is limited and cannot keep pace with exponential growth
(e.g., 1000+ nodes)
– Machines have upper limits on capacity
5. Problem
• Why not just use sharding?
– Very complex and application-specific
• Increased complexity of SQL
• Single point of failure
• Failover servers more complex
• Backups more complex
• Operational complexity added
– Very problematic when adding/removing nodes
– Basically, you end up denormalizing everything and losing
all benefits of relational databases
Sharding: Split one or more tables by row across potentially multiple instances of the
schema and database servers.
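The pain of adding/removing nodes can be made concrete with a small sketch. The snippet below (an illustration, not any particular system's implementation) routes row keys to shards by naive modulo hashing and shows that changing the shard count remaps most keys, forcing a large data reshuffle:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a row key to a shard by hashing it (illustrative only)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# With naive modulo hashing, changing the shard count remaps most keys,
# which is one reason adding/removing nodes is so problematic.
keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved when going from 4 to 5 shards")
```

Consistent hashing reduces this remapping cost, which is why systems such as Dynamo and Cassandra use it instead of plain modulo hashing.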
6. Who faced this problem?
• Web applications dealing with high traffic and massive
data
– Web service providers
• Google, Yahoo!, Amazon, Facebook, Twitter, LinkedIn, …
– Scientific data analysis
• Weather, ocean, tides, geothermal, …
– Complex information processing
• Financial, stock, telecommunication, …
7. Solution
• A new kind of DBMS, capable of handling web scale
– Possibly sacrificing some level of feature
• CAP theorem*: You can guarantee at most 2 of these 3
– Consistency - the system is in a consistent state after an operation
• All nodes see the same data at the same time
• Strong consistency (ACID) vs. eventual consistency (BASE)
– Availability - the system is “always on”, no downtime
• Node failure tolerance: All clients can find some available replica.
• software/hardware upgrade tolerance
– Partition tolerance
• The system continues to operate (read/write) despite arbitrary message
loss or failure of part of the system
* Eric A. Brewer, Towards Robust Distributed Systems, Proceedings of the 19th Annual
ACM Symposium on Principles of Distributed Computing (PODC), 2000
8. Non-relational database systems
• Various solutions & products
– BigTable, LevelDB (developed at Google)
– HBase (Apache project, modeled on Bigtable)
– Dynamo (developed at Amazon)
– Cassandra (developed at Facebook)
– Voldemort (developed at LinkedIn)
– Riak, Redis, CouchDB, MongoDB, Berkeley DB, …
• Research projects
– NoDB, Walnut, LogBase, Albatross, Citrusleaf, HadoopDB
– PIQL, RAMCloud
10. Cost
• Consistency (ACID) may be sacrificed
– under certain circumstances, where the application can tolerate it
• Non-standard new API model
• Non-standard new Schema model
• New knowledge required to tune/optimize
• Less mature
11. Data/API/Schema model
• Data model: Key-Value store
– (row:string, column:string, time:int64) → string
– Values are opaque serialized objects
• API model
– Get(key)
– Put(key, value)
– Delete(key)
– Execute(operation, key_list)
• Schema model
– None
– Effectively a sparse table
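The API model above can be sketched as a minimal in-memory key-value store (an illustration of the interface, not any product's implementation; values are treated as opaque bytes):

```python
class KeyValueStore:
    """Minimal in-memory sketch of the Get/Put/Delete/Execute API model."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if absent

    def delete(self, key):
        self._data.pop(key, None)

    def execute(self, operation, key_list):
        # Apply a caller-supplied operation to the value of each listed key.
        return [operation(self._data.get(k)) for k in key_list]

store = KeyValueStore()
store.put("user:1", b'{"name": "alice"}')  # the value is an opaque blob
print(store.get("user:1"))
store.delete("user:1")
print(store.get("user:1"))  # None
```

Note there is no schema: the store never inspects the value, which is why any serialized object fits.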
12. Data processing
• MapReduce*
– An API exposed by non-relational databases to process data
– A functional programming pattern for parallelizing work
– Brings the workers to the data
• excellent fit for non-relational databases
– Minimizes the programming to 2 simple functions
• map & reduce
*: Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on
Large Clusters, Proceedings of the 6th Symposium on Operating Systems Design
and Implementation (OSDI), 2004.
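The two-function pattern can be illustrated with the classic word-count example. This single-process sketch only shows the programming model; a real MapReduce runtime distributes the map tasks, the shuffle, and the reduce tasks across many machines:

```python
from collections import defaultdict
from itertools import chain

def map_fn(document: str):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.split():
        yield (word, 1)

def reduce_fn(word: str, counts):
    """Reduce: sum all partial counts for one word."""
    return (word, sum(counts))

def mapreduce(documents):
    # Shuffle: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_fn, documents)):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = mapreduce(["the quick fox", "the lazy dog"])
print(result["the"])  # 2
```

Because each map call touches one document and each reduce call touches one key, both stages parallelize trivially, which is what "brings the workers to the data".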
13. Optimization: Distributed indexing
• Exploit the characteristics of Cayley graphs to provide the scalability for
supporting multiple distributed indexes of different types.
• Define a methodology to map various types of data and P2P overlays to a
generalized Cayley graph structure.
• Propose self-tuning strategies to optimize the performance of the indexes
defined over the generic Cayley overlay.
14. Optimization: Data migration
• Albatross is a technique for live migration in a
multitenant database which can migrate a live tenant
database with no aborted transactions.
– Phase 1: Begin Migration.
– Phase 2: Iterative Copying.
– Phase 3: Atomic Handover.
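The three phases can be pictured with a much-simplified sketch (an assumed toy model over plain dicts; the real Albatross protocol also copies caches and transaction state, and synchronizes the handover):

```python
def migrate(source: dict, writes_during_copy):
    """Toy sketch of the three migration phases (not the real protocol)."""
    # Phase 1: Begin Migration -- take an initial snapshot at the destination.
    destination = dict(source)
    # Phase 2: Iterative Copying -- apply the writes that arrived at the
    # source while the snapshot was being copied.
    for key, value in writes_during_copy:
        source[key] = value
        destination[key] = value
    # Phase 3: Atomic Handover -- the destination becomes authoritative,
    # so no in-flight transaction ever has to be aborted.
    return destination

src = {"a": 1}
dst = migrate(src, [("b", 2)])
print(dst == {"a": 1, "b": 2})  # True
```

The point of the iterative phase is that each copying round shrinks the backlog of pending changes, so the final handover pause can be made very short.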
15. Example: Oracle Berkeley DB
• High-performance embeddable database providing
SQL, Java Object and Key-Value storage
– Relational Storage - Support SQL.
– Synchronization - Extend existing applications to mobile
devices with a high-performance, robust data store on the
mobile device.
– Replication - Provide a single-master multi-replica highly
available database configuration.
16. Example: Amazon DynamoDB
• Fully managed NoSQL database service providing fast
and predictable performance with seamless scalability
– Provisioned throughput
• Allocates dedicated resources to a table according to its performance
requirements, and automatically partitions data over a sufficient
number of servers to meet the requested capacity.
– Consistency model
• The eventual consistency option maximizes read throughput.
– Data Model
• Attributes, Items and Tables
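The Tables → Items → Attributes hierarchy can be pictured with plain dicts (the table and attribute names below are hypothetical examples, not from the DynamoDB documentation). An item is addressed by its primary key, and items in the same table need not share a schema:

```python
# A table maps a primary key to an item; each item is a free-form set
# of attributes, so items in one table need not share a schema.
music = {}  # hypothetical "Music" table, keyed by (partition key, sort key)

music[("Artist#1", "Song#1")] = {
    "Artist": "No One You Know",
    "SongTitle": "Call Me Today",
    "Year": 2012,
}
music[("Artist#2", "Song#7")] = {
    "Artist": "Acme Band",
    "SongTitle": "Happy Day",  # note: no "Year" attribute at all
}

item = music[("Artist#1", "Song#1")]
print(item["Year"])  # 2012
```

This schema flexibility is what lets the service repartition items across servers without coordinating schema changes.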
17. Example: HBase
• Non-relational, distributed database running on top of
HDFS providing Bigtable-like capabilities for Hadoop
– Strongly consistent reads/writes
– Automatic sharding
– Hadoop/HDFS Integration
– Block Cache and Bloom Filters
– Operational Management
18. Example: CouchDB
• Scalable, fault-tolerant, and schema-free document-oriented
database
– Document Storage
– Distributed Architecture with Replication
– Map/Reduce Views and Indexes
– ACID Semantics
– Eventual Consistency
– Built for Offline
19. Example: Riak
• A distributed database architected for availability,
fault-tolerance, operational simplicity and scalability.
– Operate in highly distributed environments
– Scale simply and intelligently
– Master-less
– Highly fault-tolerant
– Incredibly stable
20. Example: MongoDB
• Document-oriented NoSQL database system
– Scale horizontally without compromising functionality
– Document-oriented storage
– Full index support
– Master-slave replication
– Rich, document-based queries
21. Comparison with RDBMS
• Transaction
– Web apps can (usually) do without transactions / strong
consistency / integrity
– Bigtable does not support transactions across multiple rows
• Supports single-row transactions
• Provides an interface for batching writes across row keys at
the client
• Scalability
– Parallel DBMSs vs. MapReduce-based systems
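The single-row transaction guarantee above can be sketched as a per-row lock around a read-modify-write (a minimal in-memory sketch of the idea, not Bigtable's implementation):

```python
import threading

class SingleRowStore:
    """Sketch: atomicity per row key, with no cross-row guarantees."""

    def __init__(self):
        self._rows = {}
        self._locks = {}

    def _lock(self, key):
        return self._locks.setdefault(key, threading.Lock())

    def update_row(self, key, fn):
        # Atomic read-modify-write on ONE row. Updating two rows means
        # two separate transactions -- exactly Bigtable's limitation.
        with self._lock(key):
            self._rows[key] = fn(self._rows.get(key, {}))
            return self._rows[key]

store = SingleRowStore()
store.update_row("r1", lambda row: {**row, "count": row.get("count", 0) + 1})
row = store.update_row("r1", lambda row: {**row, "count": row["count"] + 1})
print(row["count"])  # 2
```

Per-row locking avoids distributed lock coordination, which is what makes the guarantee cheap at scale; the price is that multi-row updates are not atomic.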
24. Example of the CAP theorem
• When you have a lot of data which needs to be highly
available, you'll usually need to partition it across
machines & also replicate it to be more fault-tolerant
• This means that when writing a record, all replicas
must be updated too
• Now you need to choose between:
– Lock all relevant replicas during update => be less available
– Don't lock the replicas => be less consistent
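The two write strategies above can be contrasted in a toy sketch (an illustration of the trade-off, not a real replication protocol):

```python
class Replica:
    def __init__(self, up=True):
        self.up = up
        self.value = None

def write_consistent(replicas, value):
    """Lock-all-replicas write: consistent, but unavailable if any replica is down."""
    if not all(r.up for r in replicas):
        raise RuntimeError("write rejected: a replica is unreachable")
    for r in replicas:
        r.value = value

def write_available(replicas, value):
    """Best-effort write: always succeeds, but down replicas become stale."""
    for r in replicas:
        if r.up:
            r.value = value

replicas = [Replica(), Replica(up=False)]
write_available(replicas, "v1")     # succeeds; second replica is now stale
print([r.value for r in replicas])  # ['v1', None]
try:
    write_consistent(replicas, "v2")
except RuntimeError as e:
    print(e)                        # the consistent write is refused
```

Eventually-consistent systems take the second path and later repair the stale replica in the background (e.g., via anti-entropy or read repair).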
Editor's Notes
Machines have upper limits on capacity
Increased complexity of SQL - Increased bugs, because developers have to write more complicated SQL to handle sharding logic.
Single point of failure - Corruption of one shard due to network/hardware/system problems causes failure of the entire table.
Failover servers more complex - Failover servers must themselves have copies of the fleets of database shards.
Backups more complex - Database backups of the individual shards must be coordinated with the backups of the other shards.
Operational complexity added - Adding/removing indexes, adding/deleting columns, and modifying the schema become much more difficult.