New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2


MongoDB 4.2 will be generally available soon and delivers new features in multiple areas. In this talk, we focus on the new capabilities of the aggregation framework, covering the new operators and expressions. We also explore how update commands can now use aggregation framework operators, and present the aggregation framework improvements around on-demand materialized views. Finally, we explore the wildcard indexes introduced in MongoDB 4.2 and how they change the way we design documents and build queries and aggregations, and we touch on the new index build system.


  1. 1. Indexing & Aggregation in MongoDB 4.2 #what_is_new Antonios Giannopoulos, DBA @ ObjectRocket by Rackspace. Connect: linkedin.com/in/antonis/ Follow: @iamantonios
  2. 2. Introduction www.objectrocket.com 2 Antonios Giannopoulos Database troubleshooter aka troublemaker @ObjectRocket Troubleshoot: MongoDB, CockroachDB & Postgres Troublemaking: All the things
  3. 3. Overview • Index builds • Index Limitations • Wildcard Indexes • Materialized views • Updates using aggregations • Server side updates
  4. 4. The art of typing… How many of you have mistyped a mongo command? How many of you have mistyped the background option on an index build?
  5. 5. Index builds db.collection.createIndex({keys},{options}) db.collection.createIndex({keys},{options, background:true}) In MongoDB 4.2 the background flag is deprecated! MongoDB 4.2 uses a new hybrid approach (best of both worlds)
  6. 6. Hybrid Index - Build Stages
     Initialization
       o Exclusive lock against the collection being indexed
       o Application is blocked
     Data Ingestion and Processing
       o Intent locks against the collection being indexed
       o Application can read/write
     Cleanup
       o Exclusive lock (same as initialization)
     Completion
       o Makes the index available
  7. 7. Build Stages - Verbose
     • Lock - obtains an exclusive X lock
     • Initialization
     • Lock - downgrades the exclusive X collection lock to an intent exclusive IX lock
     • Scan Collection
     • Process Side Writes Table
     • Lock - shared S lock
     • Finish Processing Temporary Side Writes Table
     • Lock - upgrades the shared S lock on the collection to an exclusive X lock
     • Drop Side Writes Table
     • Process Constraint Violation Table
     • Mark the Index as Ready
     • Lock - releases the X lock on the collection
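The lock sequence above can be read as a small table of which steps block the application. The sketch below is only an illustration built from the steps listed on the slide; the `stages` and `blocks` names are made up here, and the blocking rules are the usual lock-mode semantics (X blocks readers and writers, S blocks writers, IX blocks neither).

```javascript
// Steps and lock modes transcribed from the "Build Stages - Verbose" slide.
const stages = [
  { step: "Initialization", lock: "X" },
  { step: "Scan collection", lock: "IX" },
  { step: "Process side writes table", lock: "IX" },
  { step: "Finish processing side writes table", lock: "S" },
  { step: "Drop side writes table", lock: "X" },
  { step: "Process constraint violation table", lock: "X" },
  { step: "Mark the index as ready", lock: "X" },
];

// Assumed lock-mode semantics: what application traffic each mode blocks.
const blocks = {
  X: { reads: true, writes: true },
  S: { reads: false, writes: true },
  IX: { reads: false, writes: false },
};

// Print, for every step, whether reads and writes are blocked.
for (const { step, lock } of stages) {
  const b = blocks[lock];
  console.log(`${step} [${lock}] reads blocked: ${b.reads}, writes blocked: ${b.writes}`);
}
```

Only the two IX steps leave writes unblocked, which is where the bulk of the build time is expected to be spent.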
  8. 8. Index builds - Animated: Index Metadata Entry, Collection, Side Writes Table, Constraint Violation Table
  9. 9. Index builds - Logs
  10. 10. Index builds – Nothing fancy
  11. 11. Index builds – Measure Impact
  12. 12. Index builds – Impact (4.0) Foreground Index in MongoDB 4.0 o One big lock o Locks the database (Hybrid locks the collection) Background Index in MongoDB 4.0 o Execution time was not affected o Latency was affected o Doesn’t lock, versus the locks Hybrid needs
  13. 13. Index builds – Size & Time
     Time
       Foreground Index: 43950 ms (43.95 seconds)
       Background Index: 675243 ms (675.24 seconds)
       Hybrid Index: 116467 ms (116.47 seconds)
     Size
       Foreground Index: 176635904 bytes (168.45 MiB)
       Background Index: 365649920 bytes (348.71 MiB)
       Hybrid Index: 176934912 bytes (168.74 MiB)
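To put these measurements in relative terms, a few lines of arithmetic are enough. This sketch reuses only the numbers from the slide; the variable and helper names are made up for illustration.

```javascript
// Build times (ms) and index sizes (bytes) copied from the slide.
const buildMs = { foreground: 43950, background: 675243, hybrid: 116467 };
const sizeBytes = { foreground: 176635904, background: 365649920, hybrid: 176934912 };

const toSeconds = (ms) => ms / 1000;
const toMiB = (bytes) => bytes / (1024 * 1024);

// Hybrid is slower than foreground but faster than background.
const hybridVsForeground = buildMs.hybrid / buildMs.foreground;
const backgroundVsHybrid = buildMs.background / buildMs.hybrid;

console.log(`hybrid: ${toSeconds(buildMs.hybrid).toFixed(2)} s, ${toMiB(sizeBytes.hybrid).toFixed(2)} MiB`);
console.log(`hybrid takes ${hybridVsForeground.toFixed(2)}x the foreground time`);
console.log(`background takes ${backgroundVsHybrid.toFixed(2)}x the hybrid time`);
```

In this one test the hybrid build produced an index about the size of the foreground one, while the background build was roughly twice as large.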
  14. 14. Index Limitations • Index Key Limit • Index Name
  15. 15. Index Key Limit
     Before 4.2: The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.
     > q = db.limitations.findOne({}, {payload: 1, _id: 0})
     > Object.bsonsize(q) / 1024
     10.5458984375
     With
     > db.adminCommand( { setFeatureCompatibilityVersion: "4.0" } )
  16. 16. Index Key Limit
     Starting in version 4.2, MongoDB removes the Index Key Limit for FCV set to "4.2" or greater.
     > q = db.limitations.findOne({}, {payload: 1, _id: 0})
     > Object.bsonsize(q) / 1024
     10.5458984375
     With
     > db.adminCommand( { setFeatureCompatibilityVersion: "4.2" } )
  17. 17. Index Name Length Limit Before 4.2: fully qualified index names, which include the namespace and the dot separators (i.e. <database name>.<collection name>.$<index name>), cannot be longer than 127 bytes.
  18. 18. Index Name Length Limit Starting in version 4.2, MongoDB removes the Index Name Length Limit for MongoDB versions with FCV set to "4.2" or greater.
  19. 19. Wildcard Indexes • State prior to 4.2 • Definition • Use cases • Limitations
  20. 20. Indexing metadata There can be no more than 32 fields in a compound index. A single collection can have no more than 64 indexes. The above limitations may cause issues in a data model like: o Too many combos o Too many indexes o Too many fields for a single index o Sparse fields, new fields, etc.
  21. 21. Indexing metadata The key-value store approach:
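The slide's own example is an image that did not survive extraction. As a stand-in, here is a minimal sketch of the key-value (attribute) pattern on a hypothetical product document. The collection name `products` and the field names `attributes`, `k` and `v` are assumptions made for illustration, not taken from the talk.

```javascript
// Hypothetical "natural" document: one field per attribute.
const natural = { _id: 1, name: "t-shirt", color: "red", size: "M", price: 20 };

// Key-value form: every attribute becomes a { k, v } element of one array,
// so a single compound multikey index can serve any attribute.
function toKeyValue(doc, fixedFields = ["_id", "name"]) {
  const out = {};
  const attributes = [];
  for (const [key, value] of Object.entries(doc)) {
    if (fixedFields.includes(key)) out[key] = value;
    else attributes.push({ k: key, v: value });
  }
  return { ...out, attributes };
}

const keyValue = toKeyValue(natural);
console.log(JSON.stringify(keyValue));

// Index and query one would run in the mongo shell (not executed here):
//   db.products.createIndex({ "attributes.k": 1, "attributes.v": 1 })
//   db.products.find({ attributes: { $elemMatch: { k: "color", v: "red" } } })
```

The price of this flexibility is a larger document, since every attribute name is stored as data.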
  22. 22. Wildcard Indexes
     Create a wildcard index on a <field>
       db.collection.createIndex( { "<field>.$**" : 1 } )
     Create a wildcard index on all fields (excluding _id)
       db.collection.createIndex( { "$**" : 1 } )
     Specify fields to index
       db.collection.createIndex( { "$**" : 1 }, { "wildcardProjection" : { "<field>" : 1, "<field>.<subfield>" : 1 } } )
     Specify fields to exclude from index
       db.collection.createIndex( { "$**" : 1 }, { "wildcardProjection" : { "<field>" : 0, "<field>.<subfield>" : 0 } } )
  23. 23. Wildcard Indexes Same example as the key-value approach
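To make the wildcard definition more concrete, the sketch below lists the field paths of a hypothetical document that a `{ "$**": 1 }` index would hold entries for. It is a simplified illustration, not MongoDB's implementation: it descends into embedded documents, skips the top-level `_id`, and treats arrays as a single value.

```javascript
// Enumerate the field paths of a document, skipping the top-level _id.
// Simplification: arrays are not traversed, only embedded documents are.
function wildcardPaths(doc, prefix = "") {
  const paths = [];
  for (const [key, value] of Object.entries(doc)) {
    if (prefix === "" && key === "_id") continue; // _id is omitted by default
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      paths.push(...wildcardPaths(value, path)); // descend into embedded documents
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Hypothetical product document in the "natural" model.
const product = { _id: 1, name: "t-shirt", attributes: { color: "red", size: "M" } };
console.log(wildcardPaths(product));

// Restricting the index to one subtree in the mongo shell (not executed here):
//   db.products.createIndex({ "attributes.$**": 1 })
```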
  24. 24. Considerations / Notes o Support at most one field in any given query predicate. o The featureCompatibilityVersion must be 4.2 o Wildcard indexes omit the _id field by default o You can create multiple wildcard indexes in a collection o A wildcard index may cover the same fields as other indexes in the collection o Wildcard indexes are Sparse Indexes
  25. 25. Considerations / Notes Wildcard indexes can support a covered query only if all of the following are true: o The query planner selects the wildcard index for satisfying the query predicate. o The query predicate specifies exactly one field covered by the wildcard index. o The projection explicitly excludes _id and includes only the query field. o The specified query field is never an array.
  26. 26. Considerations / Notes MongoDB can use a wildcard index for satisfying the sort() only if all of the following are true: o The query planner selects the wildcard index for satisfying the query predicate. o The sort() specifies only the query predicate field. o The specified field is never an array.
  27. 27. Considerations / Notes Wildcard indexes can support at most one query predicate field. That is: o MongoDB cannot use a non-wildcard index to satisfy one part of a query predicate and a wildcard index to satisfy another. o MongoDB cannot use one wildcard index to satisfy one part of a query predicate and another wildcard index to satisfy another. o Even if a single wildcard index could support multiple query fields, MongoDB can use the wildcard index to support only one of the query fields. All remaining fields are resolved without an index. $or is not restricted by the above limitation (query and aggregation).
  28. 28. Considerations / Notes Unsupported query patterns. Wildcard indexes cannot support: o a query condition that checks if a field does not exist. o a query condition that checks if a field is or is not equal to a document or an array. o a query condition that checks if a field is not equal to null. o the $min or $max aggregation operators.
  29. 29. Restrictions You cannot shard a collection using a wildcard index. You cannot create a compound wildcard index. You cannot specify the following properties for a wildcard index: o TTL o Unique You cannot create the following index types using wildcard syntax: o 2d (Geospatial) o 2dsphere (Geospatial) o Hashed
  30. 30. Comparison – Nothing fancy II
  31. 31. Natural vs Key-Value Model
  32. 32. Natural vs Key-Value Model Both indexes scan only one branch of the $and
  33. 33. Natural vs Key-Value Model Let’s add price to the equation - Wildcard doesn’t support compound indexes. Scans the index for Red. Scans the index for Price. Scans for Price & Red.
  34. 34. Natural vs Key-Value Model o Key-Value adds overhead to the collection (doc size) o Both indexing models can utilize one field (combo for the k-v) o $exists:false can only be satisfied by the key-value model o Key-value supports compound indexes o Natural can cover a query (see considerations), key-value can’t (multikey) o Key-value looks more flexible… (with a lot of buts…) o Natural is a good idea for selective fields / unpredictable queries
  35. 35. Aggregation Framework - New Operators • Trigonometry Expressions • Arithmetic Expressions • Regular Expressions • New Stages
  36. 36. Trigonometry Expressions $sin: Returns the sine of a value that is measured in radians. $cos: Returns the cosine of a value that is measured in radians. $tan: Returns the tangent of a value that is measured in radians. $degreesToRadians: Converts a value from degrees to radians. $radiansToDegrees: Converts a value from radians to degrees. Full list of trigonometry expressions: https://bit.ly/2knln2D Example:
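The example on the slide was an image and is not part of this transcript. A minimal sketch of how these expressions might be combined follows, assuming a hypothetical `angles` collection with documents such as `{ deg: 30 }`. The plain-JavaScript lines underneath mirror the two stages so the result can be checked without a server.

```javascript
// Pipeline sketch: convert degrees to radians, then take the sine.
// Collection and field names are hypothetical.
const trigPipeline = [
  { $set: { rad: { $degreesToRadians: "$deg" } } },
  { $set: { sine: { $sin: "$rad" } } },
];
// mongo shell (not executed here): db.angles.aggregate(trigPipeline)

// The same computation in plain JavaScript.
const degreesToRadians = (deg) => (deg * Math.PI) / 180;
const angle = { deg: 30 };
const rad = degreesToRadians(angle.deg);
console.log({ ...angle, rad, sine: Math.sin(rad) });
```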
  37. 37. Arithmetic Expressions MongoDB 4.2 adds the $round aggregation expression. MongoDB 4.2 adds expanded functionality and new syntax to $trunc. Example:
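Since the slide's example image is missing, here is a small sketch of the two-argument form `[ <number>, <place> ]` that both expressions accept in 4.2. The `value` field is hypothetical. The helper `truncTo` only imitates `$trunc` for illustration and ignores floating-point edge cases; `$round` is not imitated because it rounds halves to the nearest even digit, which differs from `Math.round`.

```javascript
// Pipeline sketch using the 4.2 syntax with an explicit decimal place.
const roundPipeline = [
  {
    $project: {
      rounded: { $round: ["$value", 1] },   // one decimal place
      truncated: { $trunc: ["$value", 1] }, // cut off after one decimal place
      tens: { $trunc: ["$value", -1] },     // negative place: truncate to tens
    },
  },
];
// mongo shell (not executed here): db.samples.aggregate(roundPipeline)

// Plain-JavaScript imitation of $trunc with a place argument.
function truncTo(value, place = 0) {
  const factor = 10 ** place;
  return Math.trunc(value * factor) / factor;
}

console.log(truncTo(19.99, 1), truncTo(19.99, -1));
```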
  38. 38. Regular Expressions $regexFind: Applies a regular expression (regex) to a string and returns information on the first matched substring. $regexFindAll: Applies a regular expression (regex) to a string and returns information on all matched substrings. $regexMatch: Applies a regular expression (regex) to a string and returns true if a match is found and false if a match is not found. Example:
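In place of the missing slide image, the sketch below shows one way `$regexFind` could be used on a hypothetical `email` field, together with a plain-JavaScript function that imitates the shape of its result (`match`, `idx`, `captures`). The imitation is an assumption-laden stand-in and only agrees with the server for ASCII input.

```javascript
// Pipeline sketch: extract the domain name from an email address.
const regexPipeline = [
  { $set: { found: { $regexFind: { input: "$email", regex: /@(\w+)\.com$/ } } } },
];
// mongo shell (not executed here): db.users.aggregate(regexPipeline)

// Plain-JavaScript imitation of the $regexFind result document.
function regexFind(input, regex) {
  const m = regex.exec(input);
  return m === null ? null : { match: m[0], idx: m.index, captures: m.slice(1) };
}

// $regexMatch only reports whether there was a match.
const regexMatch = (input, regex) => regex.test(input);

console.log(regexFind("ada@example.com", /@(\w+)\.com$/));
console.log(regexMatch("ada@example.org", /@(\w+)\.com$/));
```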
  39. 39. New Stages $merge: Writes the aggregation results to a collection. $planCacheStats: Provides plan cache information for a collection. $replaceWith: Replaces the input document with the specified document. Alias to the $replaceRoot stage. $set: Adds new fields to documents. Alias to the $addFields stage. $unset: Excludes fields from documents. Alias to the $project stage.
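The three alias stages can be illustrated by writing the same pipeline twice. This is a sketch on hypothetical documents shaped like `{ _id, order: { price, tax, internalNote } }`; the field names are assumptions.

```javascript
// Pipeline written with the 4.2 alias stages.
const withAliases = [
  { $replaceWith: "$order" },                        // promote the embedded document
  { $set: { total: { $add: ["$price", "$tax"] } } }, // add a computed field
  { $unset: ["internalNote"] },                      // drop a field
];

// The same pipeline written with the pre-4.2 stages.
const longForm = [
  { $replaceRoot: { newRoot: "$order" } },
  { $addFields: { total: { $add: ["$price", "$tax"] } } },
  { $project: { internalNote: 0 } },
];

const stageNames = (pipeline) => pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames(withAliases), stageNames(longForm));
// mongo shell (not executed here): db.orders.aggregate(withAliases)
```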
  40. 40. Materialized Views • Logical vs Physical view • Definition • Implementation • Use cases • Under the hood • Considerations
  41. 41. Materialized Views
     MongoDB 3.4 adds support for views on a collection:
       o MongoDB computes the view contents by executing the aggregation on demand during read operations
       o Views are not associated with data structures on disk
       o db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline> } )
     MongoDB 4.2 adds support for materialized views on a collection:
       o Introduces the $merge stage for the aggregation pipeline
       o Materialized views are associated with data structures on disk (collections)
     { $merge: {
         into: <collection> -or- { db: <db>, coll: <collection> },   // Mandatory
         on: <identifier field> -or- [ <identifier field1>, ... ],   // Optional
         let: <variables>,                                           // Optional
         whenMatched: <replace|keepExisting|merge|fail|pipeline>,    // Optional
         whenNotMatched: <insert|discard|fail>                       // Optional
     } }
  42. 42. MV – Definition - Into Into: The collection name. Format: into: "myOutput" or into: { db: "myDB", coll: "myOutput" } If the output collection does not exist, $merge creates the collection: o For a replica set, if the output database does not exist, $merge also creates the database o For a sharded cluster, the specified output database must already exist. The output collection cannot be the same collection as the collection being aggregated. The output collection cannot appear in any other stages of the pipeline. The output collection can be a sharded collection.
  43. 43. MV – Definition - Into
  44. 44. MV – Definition - On On: (Optional) Field or fields that act as a unique identifier for a document. Format: on: "_id" or on: [ "date", "customerId" ] The order of the fields in the array does not matter, and you cannot specify the same field multiple times. For the specified field (or fields): o The aggregation results documents must contain the field(s) specified in the on, unless the on field is the _id field o The specified field or fields cannot contain a null or an array value. o $merge requires a unique index with keys that correspond to the on identifier fields. o For output collections that already exist, the corresponding index must already exist. The default value for on depends on the output collection.
  45. 45. MV – Definition - On
  46. 46. MV–Definition - WhenMatched WhenMatched: (Optional) The behavior of $merge if a result document and an existing document in the collection have the same value for the specified on field(s). Options: replace: Replace the existing document in the output collection with the matching results document. keepExisting: Keep the existing document in the output collection. merge (default): Merge the matching documents (similar to the $mergeObjects operator).
  47. 47. MV–Definition - WhenMatched Options (continued): fail: Stop and fail the aggregation operation. Any changes to the output collection from previous documents are not reverted. pipeline: An aggregation pipeline to update the document in the collection. Can only contain $addFields and its alias $set, $project and its alias $unset, $replaceRoot and its alias $replaceWith.
  48. 48. MV–Definition - WhenMatched
  49. 49. MV–Definition - WhenMatched
  50. 50. MV–Definition - WhenNotMatched WhenNotMatched: (Optional) The behavior of $merge if a result document does not match an existing document in the output collection. Options: insert (default): Insert the document into the output collection. discard: Discard the document; i.e. $merge does not insert the document into the output collection. fail: Stop and fail the aggregation operation. Any changes to the output collection from previous documents are not reverted.
  51. 51. MV–Definition - WhenNotMatched
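The whenMatched and whenNotMatched options described on the preceding slides can be imitated on plain arrays to see how they interact. The function below is an in-memory illustration written for this transcript, not MongoDB code: `mergeInto` is a made-up name, the `pipeline` option is left out, and `on` is limited to a single field.

```javascript
// In-memory imitation of $merge's whenMatched / whenNotMatched options.
// Mutates and returns `target`, so earlier changes survive a later "fail".
function mergeInto(target, results, { on = "_id", whenMatched = "merge", whenNotMatched = "insert" } = {}) {
  for (const doc of results) {
    const i = target.findIndex((existing) => existing[on] === doc[on]);
    if (i === -1) {
      if (whenNotMatched === "insert") target.push({ ...doc });
      else if (whenNotMatched === "fail") throw new Error(`no match for ${on}=${doc[on]}`);
      // "discard": the result document is dropped
      continue;
    }
    if (whenMatched === "replace") target[i] = { ...doc };
    else if (whenMatched === "merge") target[i] = { ...target[i], ...doc };
    else if (whenMatched === "fail") throw new Error(`duplicate ${on}=${doc[on]}`);
    // "keepExisting": the existing document is left untouched
  }
  return target;
}

// Hypothetical output collection and aggregation results.
const existingDocs = [{ _id: 1, total: 10, note: "kept" }];
const resultDocs = [{ _id: 1, total: 15 }, { _id: 2, total: 7 }];
const copy = () => existingDocs.map((d) => ({ ...d }));

console.log(mergeInto(copy(), resultDocs)); // defaults: merge + insert
console.log(mergeInto(copy(), resultDocs, { whenMatched: "replace", whenNotMatched: "discard" }));
```

With the defaults the `note` field of the existing document survives, while `replace` drops it.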
  52. 52. MV – Combine collections
  53. 53. MV – Under the hood merge performs a $set update; replace performs a full update (updateobj); keepExisting performs a $setOnInsert.
  54. 54. MV – Under the hood insert performs a $set update with {upsert:true}; discard performs a $set update with {upsert:false}. What about fail?
  55. 55. MV - $merge vs $out
     $merge | $out
     Available starting in MongoDB 4.2 | Available starting in MongoDB 2.6
     Can output to a collection in the same or a different database | Can output to a collection in the same database
     Creates a new collection if the output collection does not already exist | Creates a new collection if the output collection does not already exist
     Can incorporate results (previous slides) | Replaces the output collection completely if it already exists
     Input/Output can be sharded | Only Input can be sharded
  56. 56. MV–$merge restrictions o The output collection cannot be: - the same collection as the collection being aggregated - a collection that appears in any other stages of the pipeline ($lookup) o An aggregation pipeline cannot use $merge inside a transaction. o A view definition cannot include the $merge stage. o A $lookup or $facet stage’s nested pipeline cannot include the $merge stage. o The $merge stage cannot be used in conjunction with read concern "linearizable".
  57. 57. MV – Use Cases Reporting: Rolling up a summary of sales daily. Pre-compute aggregation: Aggregating averages of events every N <time unit>. Data warehouse: Merging new, different sources of data into a single view. Caching: Keep a subset of documents that meet read requirements. A use case based on: The Concept of Materialized Views in MongoDB Sharded Clusters https://www.percona.com/community-blog/2019/07/16/concept-materialized-views-mongodb-sharded-clusters/
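As an illustration of the reporting use case, a daily sales roll-up could look roughly like the pipeline below. The collection names `sales` and `dailySales` and the fields `soldAt` and `amount` are hypothetical. Because the merge key is `_id`, the unique index that `$merge` requires is the default `_id` index.

```javascript
// Pipeline sketch: total sales per day, merged into a materialized view.
const rollupPipeline = [
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$soldAt" } },
      total: { $sum: "$amount" },
    },
  },
  {
    $merge: {
      into: "dailySales",       // output collection (the materialized view)
      on: "_id",                // one document per day
      whenMatched: "replace",   // refresh days that already exist
      whenNotMatched: "insert", // add days seen for the first time
    },
  },
];
// mongo shell (not executed here): db.sales.aggregate(rollupPipeline)

// $merge has to be the last stage of the pipeline.
const lastStage = Object.keys(rollupPipeline[rollupPipeline.length - 1])[0];
console.log(lastStage);
```

Running the same pipeline on a schedule keeps `dailySales` up to date without rebuilding it from scratch, which is the difference from `$out`.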
  58. 58. MV – Scatter Gather Scenario: An update-heavy user profile collection, with reads on various fields, including _id & email. Our goal: Avoid scatter-gather as much as possible. Two collections: o Users: Contains all user-related information (sharded on _id: "hashed") o Cache: Contains static content (sharded on email) o Two queries instead of one, using the _id from cache o Refresh at regular intervals o On "fail" the app retries on the users collection (scatter-gather)
  59. 59. More expressive Update Language • Aggregation framework & updates • Server-side updates
  60. 60. Aggr. Expressions on Updates Starting in MongoDB 4.2, you can use the aggregation pipeline for update operations. The statements that can use the aggregation pipeline are: o db.collection.findAndModify() o db.collection.findOneAndUpdate() o db.collection.updateOne() o db.collection.updateMany() o db.collection.update() Meaning: o Updates can be specified with the aggregation pipeline o All fields from the existing document can be accessed o More powerful but slower…
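A pipeline-style update might look like the sketch below, which computes one field from two existing ones and fills in a default for a missing field. The `orders` collection and its fields are hypothetical, and `applyUpdate` merely mirrors the pipeline on a plain object so the effect can be checked without a server.

```javascript
// Update expressed as an aggregation pipeline (note the array brackets).
const filter = { _id: 1 };
const updatePipeline = [
  {
    $set: {
      total: { $add: ["$price", "$tax"] },         // reads fields of the existing document
      status: { $ifNull: ["$status", "pending"] }, // default when status is missing or null
    },
  },
];
// mongo shell (not executed here): db.orders.updateOne(filter, updatePipeline)

// The same logic applied to a plain object.
function applyUpdate(doc) {
  return { ...doc, total: doc.price + doc.tax, status: doc.status ?? "pending" };
}

console.log(applyUpdate({ _id: 1, price: 10, tax: 2 }));
```

A classic update document cannot reference `$price` or `$tax`, which is why this form needed a read followed by a write before 4.2.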
  61. 61. Examples - Handle exceptions
  62. 62. Handle missing/default values
  63. 63. Upsert an array with documents
  64. 64. The oplog… Update
  65. 65. The oplog… Aggregation
  66. 66. Recap & Takeaways Hybrid indexes: Less impactful than foreground, faster than background (best of both worlds!). We still recommend the secondary build method for large collections. Wildcard indexes: Very powerful when the queries are unknown. Better than the key-value model when no other field is involved in the query predicates. Aggregation framework: New operators and stages introduced. The new $merge stage creates materialized views. Aggregation framework language can be used in update operations. More powerful updates, but slower…
  68. 68. Rate My Session: I still have problems sleeping, but counting bugs in 4.2 helps me sleep / I had problems sleeping, but I took a quick nap / I had problems sleeping but not anymore
  69. 69. We’re Hiring! Looking to join a dynamic & innovative team? https://www.objectrocket.com/careers/
  70. 70. Questions?
  71. 71. Thank you! Address: 9001 N Interstate Hwy 35 #150, Austin, TX 78753 Support: US Toll free: 1-855-722-8165 UK Toll free: +448081686840 support@objectrocket.com Sales: 1-888-440-3242 sales@objectrocket.com www.objectrocket.com
  • netizen1976

    Jun. 28, 2020
