1. Non-Relational Databases
What are they, and why are they
getting a lot of press?
(plus a super-secret mystery guest tacked on at the end of the presentation)
D DiPaolo - Lightning Lunch - 5/8/2009
2. What types of non-relational
databases are there?
• Column-Oriented
– Google’s BigTable
– HBase (Apache Hadoop)
• Document-Oriented
– CouchDB
– Amazon SimpleDB
– MongoDB
• Key-Value
– Facebook Cassandra
– Memcached (sort of)
– Tokyo Cabinet
– Redis
– Amazon Dynamo
3. Motivation for each type
• Column-Oriented
– Not all that different from relational: similar
structures, just stored by column instead of by row to
maximize disk performance
– Queries generally don’t need every column of a record
• Document-Oriented
– Lack of structure allows for tight packing of data
– Lack of strongly typed fields, akin to dynamically typed
languages
• Key-Value
– More generic version of Document-Oriented, with no
guarantee/requirement of structure in the values
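To make the column-oriented motivation concrete, here is a minimal sketch. The records, field names, and the `to_columns` helper are made up for illustration and do not come from any of the databases listed. It holds the same data row-wise and column-wise; reading one field from the columnar layout touches a single contiguous list instead of every record.

```python
# Hypothetical records; field names and values are illustrative only.
rows = [
    {"id": 1, "name": "ann", "age": 31},
    {"id": 2, "name": "bob", "age": 27},
    {"id": 3, "name": "cat", "age": 45},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented: every whole record is visited to read one field.
ages_from_rows = [record["age"] for record in rows]

# Column-oriented: the field is already one contiguous sequence.
ages_from_columns = columns["age"]
```

Both reads return the same values; the difference is how much unrelated data has to be walked past to get them.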
4. Why are they popular now?
• Speed, speed, speed
– Smaller/less data = faster throughput
– Less “structure” means less overhead
– Similar data stored sequentially means high compression
• Scalability too
– RDBMSes weren’t designed to be spread across many networked machines
• Moore’s Law isn’t enough
– Faster processing can’t compensate (enough)
• Actually kind of gross hacks that are getting pretty
faces put on them
• Lose some “nice features”
5. You lose some stuff, but…
• Giving up
– Enforced Structure
– Constraints
– DB-side logic
– ACID guarantees
• The use-cases for these generally don’t need
those
• If you absolutely need both speed and relational
data, you can denormalize
– But you probably don’t
6. Parting Thought: Bloom Filters
• These are freaking wild
• Who knew lossy storage would actually be
useful in databases?
• Basic idea: constant-space membership test over
unlimited data, but it may lie a teeny bit (false
positives are possible, false negatives are not)
• This data structure is used in several of these
non-relational DB implementations
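A minimal sketch of the idea, not how any particular database implements it. The class name, the 64-bit size, and the use of salted SHA-256 to stand in for independent hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash positions per key (illustrative)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key):
        # Assumption: salting SHA-256 with the hash index stands in for
        # k independent hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Inserting a key only ever sets bits, so space stays constant.
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

The filter never stores the keys themselves, which is why it cannot list or delete them and why a lookup can only answer "no" or "maybe".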
7. Bloom filter example
Bloom filter: Hash functions: x, y, z; 8 bits
Feed key "foo" into bloom filter:
x("foo") maps to: 0010 0000
y("foo") maps to: 0001 0000
z("foo") maps to: 1000 0010
So the filter "result" is 1011 0010
Now feed key "bar" into bloom filter:
x("bar") maps to: 0001 0000
y("bar") maps to: 0000 1000
z("bar") maps to: 0110 0000
So the filter "result" is 0111 1000
Combine (bitwise OR) this with the previous result
and the filter is now: 1111 1010
Now we want to see if "wtf" is in the bloom filter:
x("wtf") maps to: 0001 0001
y("wtf") maps to: 0010 0000
z("wtf") maps to: 1000 0000
Our filter "result" is 1011 0001
So are all these bits set in the filter?
1011 0001 = "wtf"
1111 1010 = filter
y_yy ___n = not in the filter
So we know "wtf" has never been put through this filter.
What about "lol"?
x("lol") maps to: 1100 0000
y("lol") maps to: 0010 0000
z("lol") maps to: 0000 0010
result: 1110 0010
Is "lol" in the filter:
1110 0010 = "lol"
1111 1010 = filter
yyy_ __y_ = it might be! (but it isn’t)
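The arithmetic on this slide can be checked with a short sketch. The bit patterns in `HASHES` are copied from the slide; the helper names `key_mask` and `might_contain` are made up for illustration. Inserting a key ORs its bits into the filter, and a lookup asks whether every bit of the key's mask is already set.

```python
# Hash outputs copied from the slide, written as 8-bit masks (x, y, z).
HASHES = {
    "foo": [0b00100000, 0b00010000, 0b10000010],
    "bar": [0b00010000, 0b00001000, 0b01100000],
    "wtf": [0b00010001, 0b00100000, 0b10000000],
    "lol": [0b11000000, 0b00100000, 0b00000010],
}


def key_mask(key):
    """OR together the x, y and z outputs for one key."""
    mask = 0
    for bits in HASHES[key]:
        mask |= bits
    return mask


def might_contain(filter_bits, key):
    """True only if every bit of the key's mask is set in the filter."""
    mask = key_mask(key)
    return (filter_bits & mask) == mask


# Insert "foo" and "bar": the filter is the OR of their masks.
filter_bits = key_mask("foo") | key_mask("bar")

print(format(filter_bits, "08b"))          # 11111010
print(might_contain(filter_bits, "wtf"))   # False: definitely never inserted
print(might_contain(filter_bits, "lol"))   # True: a false positive
```

"lol" was never inserted, yet all of its bits happen to be covered by "foo" and "bar", which is exactly the small lie the filter is allowed to tell.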