Intro database talk given to the Spokane chapter of Build Guild. Covers the differences between SQL and NoSQL database systems and their use cases, as well as some core concepts such as ACID and CAP.
2. Who Am I?
Adam Martinek - adam.martinek@rackspace.com
● Attended EWU
● Worked at Wolfram Research on Wolfram|Alpha’s data framework
● Now work for Rackspace Monitoring team
● Passion for data storage (directly because of Martin Fowler's NoSQL Distilled)
3. Overview
● Relational Databases
● Set Theory/Relational Algebra
● ACID
● CAP Theorem
● Graph Databases
● Key/Value Stores
● Document Stores
● Column/Family Databases
4. Relational Databases
Data Modelling Ideology: Set Theory/Relational Algebra
Use Cases: Default
Relevant Databases: MySQL, PostgreSQL, SQL Server, Oracle...
5. Set Theory/Relational Algebra
Good to understand to get yourself thinking in sets
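● Set Union: All rows from both tables (SQL UNION; as a join, a full outer join)
● Set Intersection: Rows common to both tables (SQL INTERSECT, or an inner join on a shared value)
● Set Difference: All values in one table that are not in the other table (SQL EXCEPT)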
Diagrams and sample SQL queries to follow
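As a minimal sketch of the three set operations, the following uses Python's built-in sqlite3 module with an in-memory database. The tables customers and subscribers and their contents are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical tables: two single-column lists of names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT);
    CREATE TABLE subscribers (name TEXT);
    INSERT INTO customers VALUES ('ann'), ('bob'), ('cat');
    INSERT INTO subscribers VALUES ('bob'), ('cat'), ('dan');
""")


def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Union: every name that appears in either table (duplicates removed).
union = names("SELECT name FROM customers UNION SELECT name FROM subscribers")

# Intersection: names present in both tables.
intersection = names(
    "SELECT name FROM customers INTERSECT SELECT name FROM subscribers"
)

# The same intersection written as an inner join on the shared value.
joined = names(
    "SELECT c.name FROM customers c JOIN subscribers s ON c.name = s.name"
)

# Difference: names in customers that are not in subscribers.
difference = names(
    "SELECT name FROM customers EXCEPT SELECT name FROM subscribers"
)
```

The join form matches INTERSECT here only because each table holds distinct names; with duplicate rows a join would return repeated matches.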
9. Indexes
● Usually based on B-Trees (depends on the specific database)
● Range lookups are more complex than single-value lookups
● Don’t use auto increment integers as primary key
Best Practice
● Find the candidate key(s)
● If there are no candidate keys then consider a UUID (remember that this incurs a write
penalty for generating the UUID)
● If the database doesn’t support UUIDs then you can consider using an auto
10. Candidate Keys
● Assuming no duplicate rows
● A Candidate Key is a set of columns whose combined values, based on the data, are unique for every row
● The set must also be irreducible (no column can be removed from it without
losing uniqueness)
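The definition above can be sketched as a brute-force search over column combinations. This is an illustrative sketch only: the function name candidate_keys and the sample rows are hypothetical, and the search is exponential in the number of columns, so it suits small examples rather than real schemas.

```python
from itertools import combinations


def candidate_keys(columns, rows):
    """Return every irreducible set of columns whose values are unique across rows."""
    keys = []
    # Try small column sets first so that any superset of a key can be skipped.
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of an existing key is unique but reducible, so skip it.
            if any(set(key) <= set(combo) for key in keys):
                continue
            projected = [tuple(row[c] for c in combo) for row in rows]
            if len(set(projected)) == len(rows):
                keys.append(combo)
    return keys


# Hypothetical sample data with no duplicate rows.
rows = [
    {"email": "a@example.com", "first": "Ann", "last": "Lee", "city": "Spokane"},
    {"email": "b@example.com", "first": "Bob", "last": "Lee", "city": "Spokane"},
    {"email": "c@example.com", "first": "Ann", "last": "Ray", "city": "Spokane"},
]
keys = candidate_keys(["email", "first", "last", "city"], rows)
```

Uniqueness in a sample only suggests a key. Whether (first, last) stays unique as data grows is a modelling decision, not something the data can prove.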
11. ACID
● Atomicity: All or nothing for a transaction
● Consistency: Each transaction brings the database from one valid state to another valid state
● Isolation: Result of concurrent transactions must be the same as if applied
serially
● Durability: Even in the event of power loss, crashes or errors each committed
transaction must remain committed
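Atomicity and consistency can be seen in a short sketch using Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # the rule that defines a valid state
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(amount):
    """Move money from alice to bob as a single transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK constraint, so the credit is undone too.
        return False


def balances():
    """Return the current balances as a dict of name to balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(500)      # alice cannot cover this, nothing is applied
after_failed = balances()
succeeded = transfer(40)    # both updates are applied together
after_success = balances()
```

The failed transfer leaves both rows untouched even though the credit to bob ran first, which is the all-or-nothing behaviour described above.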
12. CAP Theorem
● Consistency: All nodes see the same data at the same time
● Availability: A guarantee that every request receives a response about
whether it succeeded or failed
● Partition Tolerance: The system continues to operate despite arbitrary
partitioning due to network failures
Choose two
Many NoSQL databases work around this by letting you choose the trade-off you want
at query time (for example, a consistency level set per request)
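One common way to express a per-request trade-off is quorum arithmetic: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below assumes that Dynamo-style model; the function and parameter names are hypothetical and do not belong to any particular database's API.

```python
def quorums_overlap(n_replicas, write_acks, read_acks):
    """True when any read quorum shares at least one replica with any write quorum."""
    return read_acks + write_acks > n_replicas


# Hypothetical per-request settings for a 3-replica cluster.
strong = quorums_overlap(3, write_acks=2, read_acks=2)  # favors consistency
fast = quorums_overlap(3, write_acks=1, read_acks=1)    # favors availability and latency
```

With the second setting a request can succeed while most replicas are unreachable, at the cost of possibly reading stale data.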
13. Graph Databases
Data Modelling Ideology: Graph Theory
Use Cases: Recommendation Engine, Search, Network Modelling...
ACID compliance: Dependent (Most are)
Relevant Databases: Neo4j, Sparksee, HyperGraphDB…
Gotchas: Nodes are indexed (for finding an entry point into the graph) but edges
attached to a node are not
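The gotcha above can be sketched with plain Python structures: an index finds the starting node directly, but each node's edges have to be scanned one by one. The node names, edge labels, and the reachable helper are hypothetical, and real graph databases store this differently.

```python
from collections import deque

# Node index: name -> node id, used only to find the entry point into the graph.
node_index = {"alice": 0, "bob": 1, "carol": 2, "dave": 3}
node_names = {node_id: name for name, node_id in node_index.items()}

# Edges are kept per node as an unindexed list of (label, target id) pairs.
edges = {
    0: [("FOLLOWS", 1)],
    1: [("FOLLOWS", 2), ("BLOCKS", 3)],
    2: [("FOLLOWS", 3)],
    3: [],
}


def reachable(start_name, label):
    """Return the names reachable from start_name by following edges with this label."""
    start = node_index[start_name]  # indexed lookup of the entry point
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # No index on edges: every edge of the node is inspected.
        for edge_label, target in edges[node]:
            if edge_label == label and target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return sorted(node_names[node_id] for node_id in seen)


followed = reachable("alice", "FOLLOWS")
```

A node with a very large number of edges makes the inner loop expensive, which is where this gotcha tends to show up.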
14. Key/Value
Data Modelling Ideology: Hash Map
Use Cases: Caching, simple applications, large data values
Relevant Databases: Redis, Riak, memcached...
Gotchas: Can theoretically only query on the key
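A hash map makes the gotcha concrete: a lookup by key is direct, while any question about the values means visiting every entry. This is a minimal sketch using a Python dict as a stand-in for a key/value store; the helper names and the session keys are hypothetical.

```python
store = {}


def put(key, value):
    """Store a value under a key, replacing any previous value."""
    store[key] = value


def get(key):
    """Direct lookup by key; returns None when the key is absent."""
    return store.get(key)


def find_keys(predicate):
    """Query on values: the only option is a full scan of every entry."""
    return sorted(key for key, value in store.items() if predicate(value))


put("session:1", {"user": "ann", "cart": 3})
put("session:2", {"user": "bob", "cart": 0})
put("session:3", {"user": "cat", "cart": 5})

one = get("session:2")
non_empty = find_keys(lambda value: value["cart"] > 0)
```

Some stores add secondary indexes or richer value types to soften this, but the key lookup remains the access path they are built around.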
15. Document Stores
Data Modelling Ideology: JSON
Use Cases: Hierarchical Data
Relevant Databases: MongoDB, CouchDB, PostgreSQL, ...
Gotchas: Indexes are difficult to change, Document size limits
16. Column/Family
Data Modelling Ideology: Hierarchical/key value
Use Cases: Time series, Fraud detection
Relevant Databases: Cassandra, HBase
Gotchas: Purposeful denormalization, limited query capability
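The time series use case and the limited query capability can be sketched with a partition key that selects a group of rows and a sort order within each group. The code below is a simplified model, not how Cassandra or HBase are implemented; the sensor ids, the insert and read_range helpers, and the data are hypothetical.

```python
import bisect
from collections import defaultdict

# Partition key (sensor id) -> rows kept sorted by timestamp.
table = defaultdict(list)


def insert(sensor_id, timestamp, value):
    """Add a reading, keeping the partition ordered by timestamp."""
    bisect.insort(table[sensor_id], (timestamp, value))


def read_range(sensor_id, start, end):
    """Return readings for one sensor with start <= timestamp < end."""
    rows = table.get(sensor_id, [])
    # A 1-tuple sorts before any (timestamp, value) row with the same timestamp.
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


insert("s1", 30, 2.5)
insert("s1", 10, 1.0)
insert("s1", 20, 1.5)
insert("s2", 15, 9.0)

window = read_range("s1", 10, 30)
```

Queries that do not start from the partition key, such as "all readings above 2.0 across every sensor", have no efficient path here. That is why data is often written a second time under a different key, which is the purposeful denormalization mentioned above.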
Editor's Notes
For each database system I will also provide a short list of use cases to give you an idea of when and where they should be used
CAP is CP for relational databases
Diagrams might be better
CouchDB fully supports ACID but doesn’t have awareness of different types of documents. All documents in a particular database are in a single table
This is admittedly the type of database that I know the least about.