Overview of MongoDB and Other Non-Relational Databases
My Minnesota PHP Usergroup (mnphp.org) presentation, in which I give an overview of MongoDB and other non-relational databases and their ability to solve unique, complex problems.


Usage Rights

CC Attribution-NonCommercial-ShareAlike License


    Presentation Transcript

    • A general overview of the non-relational database
      By Andrew Kandels
    • When to Use an RDBMS?
      Organized, structured data matched by common characteristics.
      • Financial & Medical Records
      • Personal Information
      • Access Control (Usernames & Passwords)
      • Order Processing
      • Logistics
      • Mailing Lists
      … or, any data that works more efficiently when normalized
    • What Relational Databases are Bad At
      • Content Management System (CMS)
      • Real-time Analytics
      • Caching
      • Logging and Archiving Events
      • Messaging
      • Job Queue
      • Social Networking
      • Data Mining and Warehousing
    • When to Consider NoSQL?
      • De-normalizing SQL as last resort
      • Consistency can be sacrificed for scale
      • Dynamic data models
      • Tables storing meta-data
      • BLOB tables storing serialized data!
      • Very high writes, reads, or both
      • Don’t have a DBA
      • Temporary & volatile data
      Caching layers are a band-aid over problems the RDBMS was never meant to handle.
    • Brewer’s CAP Theorem
      Consistency
      Service operates fully or not at all. You either clicked “Place Order” or you didn’t.
      Availability
      Service is always available with no need for scheduled downtime or maintenance windows.
      Partition Tolerance
      No set of failures less than total network failure is allowed to cause the system to respond incorrectly.
      Pick two.
    • (CA) Consistency, Availability
      • Relational Databases
      Trouble with partitions & scale. Deal with it through replication.
      (CP) Consistency, Partition-Tolerant
      • MongoDB
      • HBase
      • Redis
      Trouble with availability while staying consistent.
      (AP) Availability, Partition-Tolerant
      • CouchDB
      • Cassandra
      • Riak
      • Voldemort
      Trouble with consistency. Deal with it through eventual consistency.
    • Non-Relational Databases
      • Key/Value Stores
      • Document Databases
      • Graph Databases
      • Big Data & Warehousing Databases
    • Key/Value Store
      Memcached
      Simple, high-performance distributed memory object caching system.
      Pros:
      • Caching
      • Rate limiting
      • Real-time analytics
      Cons:
      • Serialization
      • Replication
      • Not fault tolerant
      Redis
      Advanced key-value store with support for hashes, lists, sets and sorted sets.
      Pros:
      • Disk-backed, persistent, journaled (fault tolerant)
      • Replication out-of-the-box
      • VERY fast reads/writes
      Cons:
      • Complex to query
    • Key/Value Store
      Cassandra
      Very scalable, distributed and decentralized data store.
      Pros:
      • Extremely fast reads and writes (Twitter boasts 100k/second+)
      • Massive, engaged open source community (Twitter, Facebook)
      • Fault tolerant
      Cons:
      • Java (see: Riak, an Erlang/C alternative that’s very similar)
      • Not production ready
      Voldemort
      LinkedIn’s distributed persistent caching solution.
      Pros:
      • Distributed storage
      • In-memory with disk-backed persistence and fault tolerance (no single point of failure)
      • Very fast reads and writes (10-20k/second)
      • Drop-in storage layer (great for unit testing mock objects)
      • MVCC
      • Native Serialization (hash tables, arrays, etc.)
    • Document Databases
      MongoDB
      Scalable, high-performance database with familiar RDBMS functionality.
      Pros:
      • Semi-structured (hash tables, lists, dates, …)
      • Full, range and nested Indexes
      • Replication and distributed storage
      • Query language and Map/Reduce
      • GridFS file storage (NFS replacement)
      • BSON Serialization
      • Capped Collections
      Cons:
      • Map/Reduce is single process (soon to be resolved)
      CouchDB
      Portable, fault-tolerant document database.
      Pros:
      • Bi-directional replication (offline access)
      • Some transaction support (ACID)
      Cons:
      • Complicated to query (Map/Reduce)
    • Graph Databases
      Neo4J
      Designed on an object-oriented, flexible network structure rather than with strict and static tables. Ideal for social networking applications.
      Pros:
      • Read optimized
      • Indexing
      • Complex relationship tree processing
    • Big Data & Warehouse Databases
      HBase
      The Hadoop database. For very large tables (billions of rows times millions of columns) on commodity hardware.
      Pros:
      • On-demand distributed processing (Map/Reduce)
      • ETL optional
      • Integrates tightly in Hadoop ecosystem (Pig, Hive, HDFS)
      Cons:
      • Slow, seconds or minutes (not milliseconds)
      InfiniDB
      Distributed column-oriented database.
      Pros:
      • Data warehousing (high speed data loader)
      • Very fast queries and joins
      • Analytics & Metrics
      Cons:
      • Slow Updates
      • Schema designed up-front (hard to change later)
    • My Two Cents
    • Why Choose MongoDB?
      • Semi-structured Data
      • Native BSON Serialization
      • Full Index Support
      • Built-In Replication & Cluster Management
      • Distributed Storage (Sharding)
      • Easy to Query
      • Fast In-Place Updates
      • GridFS File Storage
      • Capped collections
      MongoDB in many ways “feels” like an RDBMS. It’s easy to learn and quick to implement.
    • Semi-Structured Data
      MongoDB is NOT a key/value store. Store complex documents as arrays, hash tables, integers, objects and everything else supported by JSON:
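      The example that followed this slide text was not preserved in the transcript. As a minimal sketch of what such a semi-structured document can look like, here is a nested record written as a Python dictionary and round-tripped through JSON. Every field name and value is invented for illustration and does not come from the original slides.

```python
import json

# Hypothetical user document: nested objects, arrays, booleans and integers
# live side by side in one record, with no fixed schema.
user = {
    "username": "jdoe",
    "visits": 42,
    "active": True,
    "tags": ["php", "mongodb"],
    "address": {"city": "Minneapolis", "state": "MN"},
    "logins": [
        {"ip": "10.0.0.5", "ok": True},
        {"ip": "10.0.0.9", "ok": False},
    ],
}

encoded = json.dumps(user)     # serialize the whole document to a JSON string
decoded = json.loads(encoded)  # and read it back as nested Python objects
print(decoded["address"]["city"])
```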
    • Native BSON Serialization
      100,000 serialize/de-serialize runs of bson_encode(), json_encode() and serialize() in PHP:
      The PHP MongoDB extension serializes the data in C outside of the runtime leading to even better results.
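      The benchmark chart itself is missing from this transcript. The following is only a sketch of how a comparable round-trip benchmark could be structured. It is written in Python rather than PHP and compares the standard library's json and pickle modules as stand-ins, so it does not measure bson_encode() and does not reproduce the author's numbers. The sample document and the run count are assumptions.

```python
import json
import pickle
import timeit

# Hypothetical sample document used for every run.
doc = {"name": "example", "counts": list(range(20)),
       "nested": {"a": 1, "b": [1.5, "x", None]}}

def bench(label, encode, decode, runs=1000):
    """Time `runs` encode-then-decode round trips of the sample document."""
    seconds = timeit.timeit(lambda: decode(encode(doc)), number=runs)
    return label, seconds

results = [
    bench("json", json.dumps, json.loads),
    bench("pickle", pickle.dumps, pickle.loads),
]
for label, seconds in results:
    print(f"{label}: {seconds:.4f}s for 1000 round trips")
```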
    • Full Index Support
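      The slide body was not preserved. To illustrate the idea of an index over a nested field that supports range lookups, here is a small pure-Python sketch. It is a conceptual model only, not MongoDB's index implementation. The documents, the helper names (get_path, build_index, range_ids) and the "profile.age" path are all hypothetical.

```python
import bisect

docs = [
    {"_id": 1, "name": "ann", "profile": {"age": 31}},
    {"_id": 2, "name": "bob", "profile": {"age": 24}},
    {"_id": 3, "name": "cy", "profile": {"age": 45}},
    {"_id": 4, "name": "di", "profile": {"age": 38}},
]

def get_path(doc, dotted):
    """Follow a dotted path such as 'profile.age' into nested dicts."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value

def build_index(documents, dotted):
    """Return a sorted list of (field value, _id) pairs."""
    return sorted((get_path(d, dotted), d["_id"]) for d in documents)

def range_ids(index, low, high):
    """Return the _ids whose indexed value v satisfies low <= v <= high."""
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [doc_id for _, doc_id in index[start:end]]

age_index = build_index(docs, "profile.age")
print(range_ids(age_index, 30, 40))  # ids of users aged 30 to 40
```

      Because the index is kept sorted, a range lookup needs two binary searches instead of a scan over every document.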
    • Built-In Replication & Cluster Management
      • Data redundancy
      • Fault tolerant (automatic failover AND recovery)
      • Consistency (wait-for-propagate or write-and-forget)
      • Distribute read load
      • Simplified maintenance
      • Servers in the cluster managed by an elected leader
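      The trade-off between wait-for-propagate and write-and-forget can be sketched with a toy in-process model. This is not MongoDB's replication protocol or driver API. The ReplicaSet class, its methods and the wait_for_replicas parameter are hypothetical names chosen for illustration.

```python
class ReplicaSet:
    """Toy primary with in-process secondaries (illustration only)."""

    def __init__(self, secondaries):
        self.primary = {}
        self.secondaries = [dict() for _ in range(secondaries)]
        self.pending = []  # writes not yet copied to every secondary

    def write(self, key, value, wait_for_replicas=0):
        """Write to the primary; optionally wait until N secondaries have it."""
        self.primary[key] = value
        self.pending.append((key, value))
        if wait_for_replicas > 0:
            self.propagate(limit=wait_for_replicas)
        return True

    def propagate(self, limit=None):
        """Copy pending writes to the first `limit` secondaries (all if None)."""
        targets = self.secondaries if limit is None else self.secondaries[:limit]
        for key, value in self.pending:
            for node in targets:
                node[key] = value
        if limit is None or limit >= len(self.secondaries):
            self.pending.clear()

rs = ReplicaSet(secondaries=2)
rs.write("cart", 1)                       # write-and-forget: returns at once
print(rs.secondaries[0].get("cart"))      # a secondary may still be stale
rs.write("order", 2, wait_for_replicas=2) # wait-for-propagate
print(rs.secondaries[0])                  # both writes are now replicated
```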
    • Easy to Query
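      The query example from this slide was not preserved. MongoDB queries are themselves documents, with operators such as $gt, $lt and $in. As a rough sketch of that idea, the following pure-Python matcher evaluates a small subset of such query documents against in-memory dictionaries. It is not the MongoDB driver, it handles only flat fields, and the matches and find helpers and the sample data are hypothetical.

```python
import operator

# A small subset of MongoDB-style comparison operators.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}

def matches(doc, query):
    """Return True if every condition in the query document holds for doc."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                # In this sketch a missing field never satisfies an operator.
                if value is None or not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            return False
    return True

def find(collection, query):
    """Return all documents in the list that match the query document."""
    return [doc for doc in collection if matches(doc, query)]

people = [
    {"name": "ann", "age": 31, "city": "St. Paul"},
    {"name": "bob", "age": 24, "city": "Duluth"},
    {"name": "cy", "age": 45, "city": "St. Paul"},
]
print(find(people, {"city": "St. Paul", "age": {"$gt": 40}}))
```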
    • Fast In-Place Updates
      MongoDB stores documents in padded memory slots. Typical RDBMS updates on VARCHAR columns:
      • Mark the row and index as deleted (without freeing the space)
      • Append the new updated row
      • Append the new index and possibly rebuild the tree
      Most updates are small and don’t drastically change the size of the row:
      • Last login date
      • UUID replace / Password update
      • Session cookie
      • Counters (failed login attempts, visits)
      MongoDB can apply most updates over the existing row, keeping the index and data structure relatively untouched, and do so VERY FAST.
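      The padded-slot idea can be sketched as follows. This is a toy model, not MongoDB's storage engine. The PaddedStore class, the 1.5 padding factor and the sample values are assumptions made for illustration.

```python
class PaddedStore:
    """Toy record file: each record is stored with spare room after it."""

    def __init__(self, padding=1.5):
        self.data = bytearray()
        self.slots = {}  # key -> (offset, capacity, length)
        self.padding = padding
        self.moves = 0   # how many updates had to relocate a record

    def put(self, key, value: bytes):
        if key in self.slots:
            offset, capacity, _ = self.slots[key]
            if len(value) <= capacity:
                # Fits in the existing slot: overwrite in place, nothing moves.
                self.data[offset:offset + len(value)] = value
                self.slots[key] = (offset, capacity, len(value))
                return
            self.moves += 1  # outgrew its slot; the old space is abandoned
        capacity = int(len(value) * self.padding)
        offset = len(self.data)
        self.data.extend(value.ljust(capacity, b"\0"))
        self.slots[key] = (offset, capacity, len(value))

    def get(self, key):
        offset, _, length = self.slots[key]
        return bytes(self.data[offset:offset + length])

store = PaddedStore()
store.put("u1", b"last_login=2011-01-01")
store.put("u1", b"last_login=2011-02-15")  # same size: updated in place
print(store.get("u1"), store.moves)
```

      Small updates such as a new login date reuse the slot, so only an update that outgrows the padding pays the cost of a relocation.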
    • GridFS File Storage
      Efficiently store binary files in MongoDB:
      • Videos
      • Pictures
      • Translations
      • Configuration files
      Files are split into chunks (256 KB by default) and stored redundantly in your MongoDB network, so a file can exceed the 4 or 16 MB limit on a single document.
      • No serialization / fast reads
      • Command line and PHP extension access
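      The chunking idea behind GridFS can be sketched in a few lines. This is a conceptual model in plain Python, not the GridFS driver API. The chunk documents borrow the files_id, n and data field names that GridFS uses for its chunks, while the helper functions, the file id and the tiny chunk size are hypothetical.

```python
def split_into_chunks(file_id, payload: bytes, chunk_size: int):
    """Split a binary payload into ordered chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": payload[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(payload), chunk_size))
    ]

def reassemble(chunks):
    """Rebuild the payload from chunk documents in any order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

payload = bytes(range(256)) * 10  # 2,560 bytes of sample binary data
chunks = split_into_chunks("video-1", payload, chunk_size=1000)
print(len(chunks), [len(c["data"]) for c in chunks])
```

      Because each chunk carries its sequence number, chunks can be stored and fetched independently and still be put back in order.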
    • Capped Collections
      Fixed-size round robin tables with extremely fast reads and writes.
      Perfect for:
      • Logging
      • Messaging
      • Job Queues
      • Caching
      Features:
      • Automatically “ages out” old data
      • Can also query, delete and update out of FIFO order
      • FIFO reads/writes are nearly as fast as cat > file; tail -f file
      • Tailable cursor stays open and reads rows as they are added
      • Persistent, fault-tolerant, distributed
      • Atomically pop items off the stack
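      The round-robin behavior can be modeled with a bounded deque. This is a minimal sketch of the aging-out idea only, not a MongoDB client. Real capped collections are sized in bytes, whereas this hypothetical CappedCollection class is capped by document count to keep the example short.

```python
from collections import deque

class CappedCollection:
    """Toy fixed-size collection that ages out the oldest documents."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        # When the deque is full, appending silently drops the oldest entry.
        self._docs.append(doc)

    def find(self):
        """Return documents in insertion (FIFO) order."""
        return list(self._docs)

    def pop_oldest(self):
        """Remove and return the oldest document, or None if empty."""
        return self._docs.popleft() if self._docs else None

log = CappedCollection(max_docs=3)
for i in range(5):
    log.insert({"event": i})
print(log.find())  # only the three newest events remain
```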
    • Object Document Mapper
      doctrine-project.org/projects/mongodb_odm
      The Doctrine MongoDB Object Document Mapper is built for PHP 5.3.2+ and provides transparent persistence for PHP objects.
      The PHP MongoDB extension is simple, but the ODM makes the following even easier:
      • Document generation seamlessly from your class
      • Query using your existing class structures
      • Very easy migration path from an ORM
      • Rapid Application Development
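      The general idea of an object document mapper, turning class instances into documents and back, can be sketched with Python dataclasses. This is not Doctrine's API and not PHP. The User and Address classes and the to_document and from_document helpers are hypothetical names used only to show the mapping.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Address:
    city: str
    state: str

@dataclass
class User:
    username: str
    address: Address
    tags: list = field(default_factory=list)

def to_document(user):
    """Convert a User (and its nested Address) into a plain nested dict."""
    return asdict(user)

def from_document(doc):
    """Rebuild a User object from a document produced by to_document."""
    return User(
        username=doc["username"],
        address=Address(**doc["address"]),
        tags=list(doc["tags"]),
    )

user = User("jdoe", Address("Minneapolis", "MN"), ["php"])
document = to_document(user)
print(document)
```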