
Overview of MongoDB and Other Non-Relational Databases

My Minnesota PHP Usergroup (mnphp.org) presentation, giving an overview of MongoDB and other non-relational databases and their ability to solve unique, complex problems.

A general overview of the non-relational database
By Andrew Kandels

When to use an RDBMS?
Organized, structured data matched by common characteristics:
- Financial & medical records
- Personal information
- Access control (usernames & passwords)
- Order processing
- Logistics
- Mailing lists
… or any data that works more efficiently when normalized.
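
To make "works more efficiently when normalized" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema and rows are hypothetical: each customer is stored once, orders point at customers by key, and a join reassembles the picture, so changing an email address is a single-row update.

```python
import sqlite3
from typing import List, Tuple


def build_order_db() -> sqlite3.Connection:
    """Create a small, normalized order-processing schema in memory (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
        [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
        [(1, 1999), (1, 500), (2, 4250)],
    )
    conn.commit()
    return conn


def order_totals(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Join orders back to customers: (name, number of orders, total in cents)."""
    return conn.execute(
        """
        SELECT c.name, COUNT(o.id), SUM(o.total_cents)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_order_db()
    # The email lives in exactly one row, so this update cannot leave stale copies behind.
    conn.execute("UPDATE customers SET email = ? WHERE id = ?", ("ada@new.example", 1))
    print(order_totals(conn))  # [('Ada', 2, 2499), ('Grace', 1, 4250)]
```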

What Relational Databases Are Bad At
- Content management systems (CMS)
- Real-time analytics
- Caching
- Logging and archiving events
- Messaging
- Job queues
- Social networking
- Data mining and warehousing

When to Consider NoSQL?
- You are de-normalizing your SQL schema as a last resort
- Consistency can be sacrificed for scale
- Dynamic data models
- Tables storing metadata
- BLOB tables storing serialized data!
- Very high writes, reads, or both
- You don't have a DBA
- Temporary & volatile data
Caching layers are a band-aid that fixes problems the RDBMS was never meant to handle.
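
The "tables storing metadata" smell usually takes the form of an entity-attribute-value (EAV) table. This hypothetical sketch uses sqlite3 to store arbitrary product attributes as rows and then pivots them back into one dict per product. Every new attribute is a new row rather than a new column, every value is squeezed into TEXT, and every read needs a pivot back into the nested shape that a document database would store directly.

```python
import sqlite3
from typing import Dict


def make_eav_store() -> sqlite3.Connection:
    """Hypothetical entity-attribute-value table for products with dynamic attributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_meta ("
        " product_id INTEGER, attr TEXT, value TEXT,"
        " PRIMARY KEY (product_id, attr))"
    )
    rows = [
        (1, "name", "T-shirt"),
        (1, "color", "red"),
        (1, "size", "M"),
        (2, "name", "Laptop"),
        (2, "ram_gb", "16"),  # numbers are forced into TEXT as well
    ]
    conn.executemany("INSERT INTO product_meta VALUES (?, ?, ?)", rows)
    return conn


def load_product(conn: sqlite3.Connection, product_id: int) -> Dict[str, str]:
    """Pivot one product's attribute rows back into a single dict (a 'document')."""
    cur = conn.execute(
        "SELECT attr, value FROM product_meta WHERE product_id = ? ORDER BY attr",
        (product_id,),
    )
    return {attr: value for attr, value in cur}


if __name__ == "__main__":
    conn = make_eav_store()
    print(load_product(conn, 1))
    print(load_product(conn, 2))
```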

Brewer’s CAP Theorem
Consistency
Every node sees the same data at the same time: once a write completes, every later read returns it. Once you click “Place Order”, every server agrees the order exists.
Availability
Every request to a working node gets a response, with no need for scheduled downtime or maintenance windows.
Partition Tolerance
No set of failures less than total network failure is allowed to cause the system to respond incorrectly.
Pick two. (More precisely: when the network partitions, a distributed system must give up either consistency or availability.)
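
The trade-off can be seen in a deliberately tiny toy model: two replicas of one key/value store and a flag that simulates a network partition. This is an illustration under simplifying assumptions (synchronous replication, a single writer), not any real database's replication protocol. In "CP" mode the cluster refuses writes it cannot replicate; in "AP" mode it accepts them and the replicas diverge.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    data: Dict[str, str] = field(default_factory=dict)


class ToyCluster:
    """Two replicas of one key/value store; a toy model of the CAP trade-off.

    mode="CP": refuse writes during a partition, so the replicas never disagree.
    mode="AP": keep accepting writes during a partition, so the replicas may diverge.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, via: int, key: str, value: str) -> bool:
        """Write through replica `via`; return False if the write is refused."""
        if not self.partitioned:
            # Healthy network: apply the write to every replica before acknowledging.
            for replica in self.replicas:
                replica.data[key] = value
            return True
        if self.mode == "CP":
            # The other replica is unreachable: sacrifice availability.
            return False
        # AP: stay available by writing locally; replicas now disagree until repaired.
        self.replicas[via].data[key] = value
        return True

    def read(self, via: int, key: str) -> Optional[str]:
        return self.replicas[via].data.get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        cluster = ToyCluster(mode)
        cluster.write(0, "order:42", "placed")
        cluster.partitioned = True
        accepted = cluster.write(0, "order:42", "cancelled")
        print(mode, accepted, cluster.read(0, "order:42"), cluster.read(1, "order:42"))
    # CP False placed placed
    # AP True cancelled placed
```

The CP toy still answers reads from both sides; a stricter design might also refuse reads on the minority side of a partition.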

(CA) Consistency, Availability
- Relational databases
Trouble with partitions & scale. Deal with it through replication.

(CP) Consistency, Partition-Tolerant
- MongoDB
- HBase
- Redis
Trouble with availability while staying consistent.

(AP) Availability, Partition-Tolerant
- CouchDB
- Cassandra
- Riak
- Voldemort
Trouble with consistency. Deal with it through eventual consistency and conflict resolution.

Non-Relational Databases
- Key/value stores
- Document databases
- Graph databases
- Big data & warehousing databases

Key/Value Store
Memcached
Simple, high-performance, distributed memory object caching system.
Pros:
- Caching
- Rate limiting
- Real-time analytics
Cons:
- Serialization
- Replication
- Not fault tolerant
Redis
Advanced key/value store with support for hashes, lists, sets and sorted sets.
Pros:
- Disk-backed, persistent, journaled (fault tolerant)
- Replication out of the box
- VERY fast reads/writes
Cons:
- Complex to query

Key/Value Store
Cassandra
Very scalable, distributed and decentralized data store.
Pros:
- Extremely fast reads and writes (Twitter boasts 100k/second+)
- Massive, engaged open source community (Twitter, Facebook)
- Fault tolerant
Cons:
- Java (see: Riak, a very similar Erlang/C alternative)
- Not production ready
Voldemort
LinkedIn’s distributed, persistent caching solution.
Pros:
- Distributed storage
- In-memory with disk-backed persistence and fault tolerance (no single point of failure)
- Very fast reads and writes (10-20k/second)
- Drop-in storage layer (great for mocking in unit tests)
- MVCC
- Native serialization (hash tables, arrays, etc.)

Document Databases
MongoDB
Scalable, high-performance database with familiar RDBMS functionality.
Pros:
- Semi-structured (hash tables, lists, dates, …)
- Full, range and nested indexes
- Replication and distributed storage
- Query language and Map/Reduce
- GridFS file storage (NFS replacement)
- BSON serialization
- Capped collections
Cons:
- Map/Reduce is single-process (soon to be resolved)
CouchDB
Portable, fault-tolerant document database.
Pros:
- Bi-directional replication (offline access)
- Some transaction support (ACID)
Cons:
- Complicated to query (Map/Reduce)

Graph Databases
Neo4j
Built on a flexible, object-oriented network structure rather than strict, static tables. Ideal for social networking applications.
Pros:
- Read optimized
- Indexing
- Complex relationship tree processing

Big Data & Warehouse Databases
HBase
The Hadoop database. For very large tables (billions of rows times millions of columns) on commodity hardware.
Pros:
- On-demand distributed processing (Map/Reduce)
- ETL optional
- Integrates tightly with the Hadoop ecosystem (Pig, Hive, HDFS)
Cons:
- Slow: seconds or minutes (not milliseconds)
InfiniDB
Distributed, column-oriented database.
Pros:
- Data warehousing (high-speed data loader)
- Very fast queries and joins
- Analytics & metrics
Cons:
- Slow updates
- Schema designed up front (hard to change later)

My Two Cents

Why Choose MongoDB?
- Semi-structured data
- Native BSON serialization
- Full index support
- Built-in replication & cluster management
- Distributed storage (sharding)
- Easy to query
- Fast in-place updates
- GridFS file storage
- Capped collections
In many ways MongoDB “feels” like an RDBMS. It’s easy to learn and quick to implement.

Semi-Structured Data
MongoDB is NOT a key/value store. Store complex documents containing arrays, hash tables, integers, objects and everything else supported by JSON.
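
For illustration, here are two hypothetical documents from one "users" collection, written as Python dicts (the shape a driver would send to MongoDB). They nest hash tables and arrays, and they do not share the same fields. The JSON round trip only shows the structure is JSON-compatible; BSON itself adds types such as dates and binary data. The `leaf_paths` helper is a made-up name that lists the dotted path to every leaf value.

```python
import json
from typing import List

# Two hypothetical documents from the same "users" collection; their fields differ.
users = [
    {
        "_id": 1,
        "name": {"first": "Ada", "last": "Lovelace"},
        "emails": ["ada@example.com", "ada@work.example"],
        "logins": 42,
        "prefs": {"theme": "dark", "notify": {"email": True, "sms": False}},
    },
    {
        "_id": 2,
        "name": {"first": "Grace", "last": "Hopper"},
        "roles": ["admin"],  # a field the first document does not have
    },
]


def leaf_paths(doc: dict, prefix: str = "") -> List[str]:
    """List the dotted path to every non-dict value, e.g. 'prefs.notify.email'."""
    paths = []
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path + "."))
        else:
            paths.append(path)  # scalars and arrays count as leaves here
    return sorted(paths)


if __name__ == "__main__":
    # The nested structure survives a JSON round trip unchanged.
    assert json.loads(json.dumps(users)) == users
    for doc in users:
        print(leaf_paths(doc))
```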

Native BSON Serialization
Benchmark: 100,000 serialize/de-serialize runs each of bson_encode(), json_encode() and serialize() in PHP.
The PHP MongoDB extension serializes the data in C, outside the PHP runtime, leading to even better results.
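
The slide's benchmark numbers are not reproduced here. To show the shape of such a benchmark, this sketch times 100,000 encode/decode round trips in Python, with json and pickle standing in for json_encode() and serialize(); the standard library has no BSON encoder (pymongo ships a bson module if you want to add that row). The sample record is hypothetical and timings depend entirely on the machine.

```python
import json
import pickle
import timeit
from typing import Dict

# Hypothetical sample record; not the data used in the original PHP benchmark.
SAMPLE = {
    "user": "ada",
    "logins": 42,
    "tags": ["admin", "beta"],
    "prefs": {"theme": "dark", "notify": True},
}


def round_trip_seconds(runs: int = 100_000) -> Dict[str, float]:
    """Time `runs` encode+decode cycles of SAMPLE for each stdlib serializer."""
    codecs = {
        "json": (json.dumps, json.loads),
        "pickle": (pickle.dumps, pickle.loads),
    }
    results = {}
    for name, (encode, decode) in codecs.items():
        # timeit runs immediately, so the lambda sees this iteration's encode/decode.
        results[name] = timeit.timeit(lambda: decode(encode(SAMPLE)), number=runs)
    return results


if __name__ == "__main__":
    for name, seconds in round_trip_seconds().items():
        print(f"{name:>6}: {seconds:.3f} s")
```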

Full Index Support
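
As a rough mental model (not MongoDB's actual B-tree implementation), a secondary index is a sorted list of (value, document id) pairs, so equality and range queries become binary searches instead of full scans. This hypothetical sketch indexes a nested field with Python's bisect module and assumes all indexed values are mutually comparable.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List


def _lookup(doc: dict, dotted: str) -> Any:
    """Follow a dotted path like 'profile.age'; return None if any part is missing."""
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


class SortedIndex:
    """Toy secondary index: sorted keys with a parallel list of document _ids.

    Assumes all indexed values are mutually comparable (e.g. all ints).
    """

    def __init__(self, docs: List[dict], field: str) -> None:
        pairs = sorted(
            (_lookup(doc, field), doc["_id"])
            for doc in docs
            if _lookup(doc, field) is not None  # skip documents without the field
        )
        self.keys = [key for key, _ in pairs]
        self.ids = [doc_id for _, doc_id in pairs]

    def equal(self, value: Any) -> list:
        """_ids of documents whose field equals `value` (two binary searches)."""
        return self.ids[bisect_left(self.keys, value):bisect_right(self.keys, value)]

    def between(self, low: Any, high: Any) -> list:
        """_ids of documents with low <= field <= high."""
        return self.ids[bisect_left(self.keys, low):bisect_right(self.keys, high)]


if __name__ == "__main__":
    users = [
        {"_id": 1, "profile": {"age": 34}},
        {"_id": 2, "profile": {"age": 27}},
        {"_id": 3, "profile": {"age": 41}},
        {"_id": 4, "name": "no profile yet"},
    ]
    by_age = SortedIndex(users, "profile.age")
    print(by_age.equal(27))        # [2]
    print(by_age.between(30, 45))  # [1, 3]
```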

Built-In Replication & Cluster Management
- Data redundancy
- Fault tolerance (automatic failover AND recovery)
- Consistency options (wait for propagation, or write and forget)
- Distributed read load
- Simplified maintenance
- Servers in the cluster managed by an elected leader

Easy to Query
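
MongoDB queries are themselves documents, for example {"age": {"$gte": 21}}. To show the idea without a server, here is a toy in-memory matcher that understands only equality, dot-notation paths and the $gt, $gte, $lt, $lte and $in operators. It imitates the query-document syntax; it is not the driver's API, and it ignores many real semantics (matching inside arrays, cross-type ordering, null matching missing fields).

```python
import operator
from typing import Any, Dict, List

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored None

# The handful of query operators this toy understands (MongoDB has many more).
_OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def _get(doc: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dot-notation path such as 'address.city'."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def _is_operator_dict(condition: Any) -> bool:
    return isinstance(condition, dict) and bool(condition) and all(
        key.startswith("$") for key in condition
    )


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every field condition in `query` holds for `doc` (implicit AND)."""
    for field, condition in query.items():
        value = _get(doc, field)
        if value is _MISSING:
            return False  # simplification: absent fields never match here
        if _is_operator_dict(condition):
            # Unsupported operators raise KeyError rather than being silently ignored.
            if not all(_OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in collection if matches(doc, query)]


if __name__ == "__main__":
    users = [
        {"name": "ada", "age": 36, "address": {"city": "London"}},
        {"name": "grace", "age": 85, "address": {"city": "Arlington"}},
        {"name": "linus", "age": 21},
    ]
    query = {"age": {"$gte": 30, "$lt": 90}, "address.city": {"$in": ["London", "Paris"]}}
    print([u["name"] for u in find(users, query)])  # ['ada']
```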

Fast In-Place Updates
MongoDB stores documents in padded memory slots. A typical RDBMS update on a VARCHAR column:
- Marks the row and index entry as deleted (without freeing the space)
- Appends the new, updated row
- Appends the new index entry and possibly rebuilds the tree
Most updates are small and don’t drastically change the size of the row:
- Last login date
- UUID replacement / password update
- Session cookie
- Counters (failed login attempts, visits)
MongoDB can apply most updates over the existing document, leaving the index and data structure relatively untouched, and do so VERY FAST.
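
A simplified model of why padding helps: each record reserves more space than it needs, so an update that still fits is written over the old bytes and the record's offset (and therefore any index entry pointing at it) does not change. Only an update that outgrows its slot forces a move. The class name and the 1.5x padding factor are hypothetical; this is not MongoDB's storage engine.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Slot:
    offset: int    # position in the toy "data file"
    capacity: int  # bytes reserved, including padding
    payload: bytes


class PaddedStore:
    """Toy record store that pads each record so small updates fit in place."""

    def __init__(self, padding_factor: float = 1.5) -> None:
        self.padding_factor = padding_factor
        self.slots: Dict[str, Slot] = {}
        self.end = 0    # next free offset at the end of the file
        self.moves = 0  # how many updates had to relocate a record

    def _allocate(self, payload: bytes) -> Slot:
        capacity = max(1, int(len(payload) * self.padding_factor))
        slot = Slot(self.end, capacity, payload)
        self.end += capacity
        return slot

    def insert(self, key: str, payload: bytes) -> None:
        self.slots[key] = self._allocate(payload)

    def update(self, key: str, payload: bytes) -> bool:
        """Return True if the update fit in place, False if the record had to move."""
        slot = self.slots[key]
        if len(payload) <= slot.capacity:
            slot.payload = payload  # overwrite in place: offset (and index entry) unchanged
            return True
        self.slots[key] = self._allocate(payload)  # old slot is left behind as a hole
        self.moves += 1
        return False


if __name__ == "__main__":
    store = PaddedStore()
    store.insert("user:1", b'{"last_login": "2011-05-01", "visits": 9}')
    print(store.update("user:1", b'{"last_login": "2011-05-02", "visits": 10}'))  # True
    print(store.update("user:1", b"x" * 200))  # False: outgrew its padded slot
```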

GridFS File Storage
Efficiently store binary files in MongoDB:
- Videos
- Pictures
- Translations
- Configuration files
Files are split into small chunks (255 KB by default in current versions), each stored as its own document and replicated redundantly across your MongoDB servers. This also lets a file exceed the BSON document size limit (4 MB in early versions, now 16 MB).
- No serialization / fast reads
- Command line and PHP extension access

Capped Collections
Fixed-size, round-robin collections with extremely fast reads and writes.
Perfect for:
- Logging
- Messaging
- Job queues
- Caching
Features:
- Automatically “ages out” old data
- Can also query, delete and update out of FIFO order
- FIFO reads/writes are nearly as fast as cat > file; tail -f file
- Tailable cursors stay open and return documents as they are added
- Persistent, fault-tolerant, distributed
- Atomically pop items off the stack

Object Document Mapper
doctrine-project.org/projects/mongodb_odm
The Doctrine MongoDB Object Document Mapper is built for PHP 5.3.2+ and provides transparent persistence for PHP objects.
The PHP MongoDB extension is already simple, but the ODM makes it even easier:
- Seamless document generation from your classes
- Query using your existing class structures
- Very easy migration path from an ORM
- Rapid application development
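
Doctrine's ODM is PHP, and its API is not reproduced here. As a language-neutral illustration of the core idea, this hypothetical sketch derives a document's shape from a class definition: it maps Python dataclasses (including a nested one) to a plain dict and back. The function names `to_document` and `from_document` are made up for this example.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass
from typing import Any, Dict, List, Type, TypeVar

T = TypeVar("T")


@dataclass
class Address:
    city: str
    country: str


@dataclass
class User:
    name: str
    email: str
    address: Address
    tags: List[str] = field(default_factory=list)


def to_document(obj: Any) -> Dict[str, Any]:
    """Turn a dataclass instance, including nested dataclasses, into a plain dict."""
    return asdict(obj)


def from_document(cls: Type[T], doc: Dict[str, Any]) -> T:
    """Rebuild a dataclass instance from a dict, recursing into nested dataclasses."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in doc:
            continue  # fall back to the field's default (TypeError if it has none)
        value = doc[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_document(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


if __name__ == "__main__":
    user = User("Ada", "ada@example.com", Address("London", "UK"), ["admin"])
    doc = to_document(user)  # what would be handed to the database driver
    print(doc)
    assert from_document(User, doc) == user
```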
