A general overview of the non-relational database<br />By Andrew Kandels<br />
When to use an RDMS?<br />Organized, structured data matched by common characteristics.<br /><ul><li> Financial & Medical ...
 Personal Information
 Access Control (Usernames & Passwords)
 Order Processing
 Logistics
 Mailing Lists</li></ul>… or, any data that works more efficiently when normalized<br />
What Relational Databases are Bad At<br /><ul><li>Content Management System (CMS)
Real-time Analytics
Caching
Logging and Archiving Events
Messaging
Job Queue
Social Networking
Data Mining and Warehousing</li></li></ul><li>When to Consider NoSQL?<br /><ul><li> De-normalizing SQL as last resort
 Consistency can be sacrificed for scale
 Dynamic data models
 Tables storing meta-data
 BLOB tables storing serialized data!
 Very high writes, reads, or both
 Don’t have a DBA
 Temporary & volatile data</li></ul>Caching layers are a band aid that fix problems the RDMS was never meant to handle<br />
Brewer’s CAP Theorem<br />Consistency<br />Service operates fully or not at all. You either clicked “Place Order” or you d...
Upcoming SlideShare
Loading in …5
×

Overview of MongoDB and Other Non-Relational Databases

3,821 views
3,626 views

Published on

My Minnesota PHP Usergroup (mnphp.org) presentation where I give an overview on MongoDB and other non-relational databases and their ability to solve unique, complex problems.

Published in: Technology

Overview of MongoDB and Other Non-Relational Databases

  1. 1. A general overview of the non-relational database<br />By Andrew Kandels<br />
  2. 2. When to use an RDMS?<br />Organized, structured data matched by common characteristics.<br /><ul><li> Financial & Medical Records
  3. 3. Personal Information
  4. 4. Access Control (Usernames & Passwords)
  5. 5. Order Processing
  6. 6. Logistics
  7. 7. Mailing Lists</li></ul>… or, any data that works more efficiently when normalized<br />
  8. 8. What Relational Databases are Bad At<br /><ul><li>Content Management System (CMS)
  9. 9. Real-time Analytics
  10. 10. Caching
  11. 11. Logging and Archiving Events
  12. 12. Messaging
  13. 13. Job Queue
  14. 14. Social Networking
  15. 15. Data Mining and Warehousing</li></li></ul><li>When to Consider NoSQL?<br /><ul><li> De-normalizing SQL as last resort
  16. 16. Consistency can be sacrificed for scale
  17. 17. Dynamic data models
  18. 18. Tables storing meta-data
  19. 19. BLOB tables storing serialized data!
  20. 20. Very high writes, reads, or both
  21. 21. Don’t have a DBA
  22. 22. Temporary & volatile data</li></ul>Caching layers are a band aid that fix problems the RDMS was never meant to handle<br />
  23. 23. Brewer’s CAP Theorem<br />Consistency<br />Service operates fully or not at all. You either clicked “Place Order” or you didn’t.<br />Availability<br />Service is always available with no need for scheduled downtime or maintenance windows.<br />Partition Tolerance<br />No set of failures less than total network failure is allowed to cause the system to respond incorrectly.<br />Pick two.<br />
  24. 24. (CA) Consistency, Availability<br /><ul><li> Relational Databases</li></ul>Trouble with partitions & scale. Deal with it through replication.<br />(CP) Consistency, Partition-Tolerant<br /><ul><li> MongoDB
  25. 25. HBase
  26. 26. Redis</li></ul>Trouble with availability while staying consistent.<br />(AP) Availability, Partition-Tolerant<br /><ul><li> CouchDB
  27. 27. Cassandra
  28. 28. Riak
  29. 29. Voldemort</li></ul>Trouble with partitions & scale. Deal with it through replication.<br />
  30. 30. Non-Relational Databases<br /><ul><li> Key/Value Stores
  31. 31. Document Databases
  32. 32. Graph Databases
  33. 33. Big Data & Warehousing Databases</li></li></ul><li>Key/Value Store<br />Memcached<br />Simple, high-performance distributed memory object caching system.<br />Pros:<br /><ul><li> Caching
  34. 34. Rate limiting
  35. 35. Real-time analytics</li></ul>Cons:<br /><ul><li> Serialization
  36. 36. Replication
  37. 37. Not fault tolerant</li></ul>Redis<br />Advanced key-value store with support for hashes, lists, sets and sorted sets.<br />Pros:<br /><ul><li> Disk-backed, persistent, journaled (fault tolerant)
  38. 38. Replication out-of-the-box
  39. 39. VERY fast reads/writes</li></ul>Cons:<br /><ul><li>Complex to query</li></li></ul><li>Key/Value Store<br />Cassandra<br />Very scalable, distributed and decentralized data store.<br />Pros:<br /><ul><li> Extremely fast reads and writes (Twitter boasts 100k/second+)
  40. 40. Massive, engaged open source community (Twitter, Facebook)
  41. 41. Fault tolerant</li></ul>Cons:<br /><ul><li> Java (see: Riak, an Erlang/C alternative that’s very similar)
  42. 42. Not production ready</li></ul>Voldemort<br />LinkedIn’s distributed persistent caching solution.<br />Pros:<br /><ul><li> Distributed storage
  43. 43. In-memory with disk-backed persistence and fault tolerance (no single POF)
  44. 44. Very fast reads and writes (10-20k/second)
  45. 45. Drop-in storage layer (great for unit testing mock objects)
  46. 46. MVCC
  47. 47. Native Serialization (hash tables, arrays, etc.)</li></li></ul><li>Document Databases<br />MongoDB<br />Scalable, high performance database with familiar RDMS functionality.<br />Pros:<br /><ul><li> Semi-structured (hash tables, lists, dates, …)
  48. 48. Full, range and nested Indexes
  49. 49. Replication and distributed storage
  50. 50. Query language and Map/Reduce
  51. 51. GridFS file storage (NFS replacement)
  52. 52. BSON Serialization
  53. 53. Capped Collections</li></ul>Cons:<br /><ul><li> Map/Reduce is single process (soon to be resolved)</li></ul>CouchDB<br />Portable, fault-tolerant document database.<br />Pros:<br /><ul><li> Bi-directional replication (offline access)
  54. 54. Some transaction support (ACID)</li></ul>Cons:<br /><ul><li>Complicated to query (Map/Reduce)</li></li></ul><li>Graph Databases<br />Neo4J<br />Designed on an object-oriented, flexible network structure rather than with strict and static tables. Ideal for social networking applications.<br />Pros:<br /><ul><li> Read optimized
  55. 55. Indexing
  56. 56. Complex relationship tree processing</li></li></ul><li>Big Data & Warehouse Databases<br />HBase<br />The Hadoop database. For very large tables (billions of rows times millions of columns) on commodity hardware.<br />Pros:<br /><ul><li> On-demand distributed processing (Map/Reduce)
  57. 57. ETL optional
  58. 58. Integrates tightly in Hadoop ecosystem (Pig, Hive, HDFS)</li></ul>Cons:<br /><ul><li> Slow, seconds or minutes (not milliseconds)</li></ul>InfiniDB<br />Distributed column-oriented database.<br />Pros:<br /><ul><li> Data warehousing (high speed data loader)
  59. 59. Very fast queries and joins
  60. 60. Analytics & Metrics</li></ul>Cons:<br /><ul><li>Slow Updates
  61. 61. Schema designed up-front (hard to change later)</li></li></ul><li>My Two Cents<br />
  62. 62. Why Choose MongoDB?<br /><ul><li> Semi-structured Data
  63. 63. Native BSON Serialization
  64. 64. Full Index Support
  65. 65. Built-In Replication & Cluster Management
  66. 66. Distributed Storage (Sharding)
  67. 67. Easy to Query
  68. 68. Fast In-Place Updates
  69. 69. GridFS File Storage
  70. 70. Capped collections</li></ul>MongoDB in many ways “feels” like an RDMS. It’s easy to learn and quick to implement.<br />
  71. 71. Semi-Structured Data<br />MongoDB is NOT a key/value store. Store complex documents as arrays, hash tables, integers, objects and everything else supported by JSON:<br />
  72. 72. Native BSON Serialization<br />100,000 serialize/de-serialize runs of bson_encode(), json_encode() and serialize() in the PHP:<br />The PHP MongoDB extension serializes the data in C outside of the runtime leading to even better results.<br />
  73. 73. Full Index Support<br />
  74. 74. Built-In Replication & Cluster Management<br /><ul><li>Data redundancy
  75. 75. Fault tolerant (automatic failover AND recovery)
  76. 76. Consistency (wait-for-propagate or write-and-forget)
  77. 77. Distribute read load
  78. 78. Simplified maintenance
  79. 79. Servers in the cluster managed by an elected leader</li></li></ul><li>Easy to Query<br />
  80. 80. Fast In-Place Updates<br />MongoDB stores documents in padded memory slots. Typical RDMS updates on VARCHAR columns:<br /><ul><li> Mark the row and index as deleted (without freeing the space)
  81. 81. Append the new updated row
  82. 82. Append the new index and possibly rebuild the tree</li></ul>Most updates are small and don’t drastically change the size of the row:<br /><ul><li> Last login date
  83. 83. UUID replace / Password update
  84. 84. Session cookie
  85. 85. Counters (failed login attempts, visits)</li></ul>MongoDB can apply most updates over the<br />existing row, keeping the index and data<br />structure relatively untouched – and do so VERY FAST.<br />
  86. 86. GridFS File Storage<br />Efficiently store binary files in MongoDB:<br /><ul><li> Videos
  87. 87. Pictures
  88. 88. Translations
  89. 89. Configuration files</li></ul>Data is distributed in 4 or 16 MB chunks and stored redundantly in your MongoDB network.<br /><ul><li> No serialization / fast reads
  90. 90. Command line and PHP extension access</li></li></ul><li>Capped Collections<br />Fixed-size round robin tables with extremely fast reads and writes. <br />Perfect for:<br /><ul><li>Logging
  91. 91. Messaging
  92. 92. Job Queues
  93. 93. Caching</li></ul>Features:<br /><ul><li>Automatically “ages out” old data
  94. 94. Can also query, delete and update out of FIFO order
  95. 95. FIFO reads/writes are nearly as fast as cat > file; tail –f /file
  96. 96. Tailable cursor stays open as reads rows as they are added
  97. 97. Persistent, fault-tolerant, distributed
  98. 98. Atomic pop items off the stack</li></li></ul><li>Object Document Mapper<br />doctrine-project.org/<br /> projects/mongodb_odm<br />The Doctrine MongoDB Object Document Mapper is built for PHP 5.3.2+ and provides transparent persistence for PHP objects.<br />The PHP MongoDB extension is simple; but, this makes it even easier for:<br /><ul><li>Document generation seamlessly from your class
  99. 99. Query using your existing class structures
  100. 100. Very easy migration path from an ORM
  101. 101. Rapid Application Development</li>

×