Does NoSQL feel like a bunch of NoSense to you? If so, you don’t want to miss this workshop! We’ll walk you through the basics of how to use the open database MongoDB.
We’ll begin with an overview of how data is stored in MongoDB and compare that to the table-based (relational) structure you may be used to. Then we’ll get hands-on! You’ll create a database and learn how to perform the basic CRUD (create, read, update, and delete) operations. Then you’ll load a large dataset into your database so you can see how to explore it and understand what schemaless data really looks like. We’ll wrap up with some tips and tricks, so you will leave the workshop feeling confident you’re ready to use MongoDB when you build your next app!
Intro to MongoDB Workshop
Intro to MongoDB
Lauren Schaefer Ken Alger
@Lauren_Schaefer @KenWAlger
While you’re waiting, get out your laptop and connect to the Wi-Fi.
Bonus points for following us on Twitter.
#AllThingsOpen #MongoDB @KenWAlger @Lauren_Schaefer
The story of this workshop is that it’s about MongoDB
1. Create a MongoDB cluster
2. Map terms & concepts from SQL to MongoDB
3. Load sample data
4. Execute the CRUD operations
5. Tips & tricks
Term mapping summary
SQL → MongoDB
Row → Document
Column → Field
Table → Collection
Database → Database
Index → Index
Join → Embedding
Join → Database References
Left Outer Join → $lookup
Recursive Common Table Expressions → $graphLookup
View → View
Transaction → Transaction
Prepare for CRUD
1. Navigate to http://bit.ly/ATO_MongoDB_Notebook
2. Save the file with the .ipynb extension (NOT .txt)
3. Navigate to https://jupyter.org/try
4. Click “Try JupyterLab”
5. Import the notebook you just downloaded from GitHub
6. Execute all steps in “Set up”
Use Indexes for Read Speed
• Very important for reads.
• However, they come with overhead.
• New in MongoDB 4.2: Wildcard Indexes
Use Indexes for Read Speed
Indexes support the efficient execution of queries in MongoDB.
Index Types in MongoDB
• Single Field: { karma: 1 }
• Compound Field: { karma: 1, user_id: -1 }
• Multikey: { "address.postal_code": 1 }
• Geospatial
• Text
• Hashed
• Wildcard
Model Data Using Schema Design Patterns
• A different way of modeling from the legacy database paradigm.
• Schema design is important.
Why Do We Create Models?
Ensure:
• Good performance
• Scalability
despite constraints:
Hardware
• RAM faster than disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
Database Server
• Maximum size for a document
Data set
• Size of data
Pattern: Schema Versioning
Add a field to track the schema version number, per document.
Does not have to exist for version 1.
Schema Versioning Pattern
Problem:
Updating the schema of a database is:
• Not atomic
• A long operation
• Something you may not want to do to all documents at once; you may prefer to update each document only when it is next modified
Use cases:
Practically any database that will go to production
Schema Versioning Pattern: Solution
Solution:
Have a field keeping track of the schema version
Benefits:
Don't need to update all the documents at once
May not have to update documents until their next modification
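A minimal Python sketch of how an application might apply the Schema Versioning pattern lazily, upgrading a document one version at a time when it is read or next modified. The field name `schema_version`, the version numbers, and both upgrade steps are hypothetical; following the slide, a document with no `schema_version` field is treated as version 1. Writing the upgraded document back to the database is left to the application.

```python
# Hypothetical current schema version for a "users" collection.
CURRENT_VERSION = 3


def upgrade_v1_to_v2(doc):
    # Hypothetical change: split a single "name" field into first and last name.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"] = first
    doc["last_name"] = last
    return doc


def upgrade_v2_to_v3(doc):
    # Hypothetical change: move a flat "phone" string into a "contact" subdocument.
    phone = doc.pop("phone", None)
    if phone is not None:
        doc["contact"] = {"phone": phone}
    return doc


# Maps a version number to the function that upgrades it to the next version.
UPGRADES = {1: upgrade_v1_to_v2, 2: upgrade_v2_to_v3}


def upgrade(doc):
    """Return a copy of `doc` migrated to CURRENT_VERSION, one step at a time.
    A missing schema_version field is read as version 1."""
    doc = dict(doc)  # shallow copy so the caller's document is untouched
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version += 1
    doc["schema_version"] = version
    return doc


old = {"name": "Sydney Smith", "phone": "555-0102"}  # no version field, so v1
print(upgrade(old))
```

Because each step only knows how to move from one version to the next, adding version 4 later means writing one more function and registering it, while documents still on version 1 keep working.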
Reduce Aggravation with the Aggregation Framework
• Use whenever possible
• Operations are done server-side
• Order of stages matters
5. Tips & tricks
• Use Indexes for Read Speed
• Model Data Using Schema Design Patterns
• Reduce Aggravation with the Aggregation Pipeline
Don’t be Ron Swanson
(in this particular case)
Change your mindset & get the full value of MongoDB
Additional resources on data modeling patterns
• Advanced Schema Design Patterns (webinar)
• Building with Patterns: A Summary (blog series)
• M320: Data Modeling (MongoDB University course, brand new!)
Additional resources
• The MongoDB Docs
• JSON Schema Validation – Locking down your model the smart way
• JSON Schema Validation - Checking Your Arrays
• M121: The MongoDB Aggregation Framework
Get the slides on our Twitter pages: @KenWAlger @Lauren_Schaefer
Please rate this session in the app!
3 problems:
Snail mail is way slower than posting the review to Yelp, where it will be instantly available.
The business he’s reviewing may never open the review.
No one else will benefit from the review.
At a high level, a cluster is a set of nodes where copies of your database will be stored.
Download the MongoDB Server, run it, and manage it yourself.
MongoDB Atlas: a fully managed database as a service
JavaScript Object Notation
MongoDB’s document model is not just a key-value store.
Fields
Field values
Strings
32-bit integers
Doubles, longs, and decimals
Geolocation
Fields can contain arrays. Here we see an array of strings.
Fields can contain an array of subdocuments.
The value of a field can be a variety of types including objects, booleans, dates, and timestamps among others.
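To make these value types concrete, here is a hypothetical user document written as a Python dict, the form a driver such as PyMongo accepts for inserts. Every field name and value is illustrative. PyMongo encodes a Python int as a 32-bit integer when it fits (64-bit otherwise), a float as a double, and a datetime as a BSON date; a decimal would need the driver's `Decimal128` type, so none appears here.

```python
from datetime import datetime, timezone

# A hypothetical user document; all names and values are illustrative.
paul = {
    "first_name": "Paul",                                  # string
    "cell": "555-0100",                                    # string
    "age": 42,                                             # integer
    "height_m": 1.83,                                      # double
    "location": {                                          # GeoJSON point
        "type": "Point",
        "coordinates": [-0.1276, 51.5072],                 # [longitude, latitude]
    },
    "profession": ["banking", "finance", "trader"],        # array of strings
    "cars": [                                              # array of subdocuments
        {"model": "Bentley", "year": 1973},
        {"model": "Rolls Royce", "year": 1965},
    ],
    "is_verified": True,                                   # boolean
    "joined": datetime(2019, 10, 14, tzinfo=timezone.utc),  # date
}

# Print each field alongside the Python type that holds its value.
for field, value in paul.items():
    print(f"{field:12} {type(value).__name__}")
```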
When people talk about MongoDB, they’ll often use the term nonrelational. But that doesn’t mean we don’t store relationships; we just store them in a different way. Let’s walk through an example of how you would model the same data we have in the document on the left in SQL tables.
We store documents together in Collections. Collections roughly map to SQL tables.
Not all documents have to have the same shape. We use the term polymorphic for this. As you can see, the Lauren document doesn’t have fields for location or cars, which is completely ok. We simply omit the fields from the document.
In the SQL world, all rows in a table must have the same fields. Since we don’t have location data for Lauren, we have to use NULL values, which is typically discouraged.
Now let’s take a look at the Sydney document. In this case, Sydney is a kid. She doesn’t have a lot of the data that Paul did. She’s missing a cell phone, location, profession, and car data. If we take a look at the Users table on the right, we can see that she has NULL values in her row just like Lauren did.
Since Sydney is a kid, she has an extra field that neither the Paul document nor the Lauren document had. She has a school field. When you’re using documents, this is not a problem. You can simply add the field as we did here.
Now when we look at the SQL table on the right, things get a bit more complicated. What do we have to do to store this data? We need to convince our DBA to add the field, take our database down, add the school column, add NULL values for every row that doesn’t have a value for school, and bring the database back up. It’s a bit messier.
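A small Python sketch of the contrast above: three hypothetical documents with different shapes, then the same data forced into one fixed set of columns, as a single SQL table would require. Field names and values are illustrative, and `None` plays the role of NULL.

```python
# Hypothetical documents of different shapes (polymorphism): Lauren has no
# location, and Sydney has an extra "school" field but no cell or profession.
users = [
    {"first_name": "Paul", "cell": "555-0100", "city": "London", "profession": "banking"},
    {"first_name": "Lauren", "cell": "555-0101", "profession": "developer advocate"},
    {"first_name": "Sydney", "school": "Hypothetical Elementary"},
]


def to_rows(docs):
    """Force documents into one fixed set of columns, as a SQL table would:
    every row gets every column, with None standing in for NULL."""
    columns = sorted({field for doc in docs for field in doc})
    rows = [tuple(doc.get(col) for col in columns) for doc in docs]
    return columns, rows


columns, rows = to_rows(users)
print(columns)
for row in rows:
    print(row)
nulls = sum(value is None for row in rows for value in row)
print(f"{nulls} NULLs needed; the documents themselves needed none")
```

Adding Sydney’s `school` field cost the documents nothing, but it added a column, and a NULL, to every other row.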
http://gph.is/28MrIOY
There are a few different ways to handle joins in MongoDB.
In general, we recommend embedding the information you would otherwise put in a separate table as a subdocument or an array. The rule of thumb is that data that is accessed together should be stored together. So, if you’ll be accessing the information that you would have put in separate tables together most of the time, you should likely just embed it.
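A minimal Python sketch of “data that is accessed together is stored together”: rows from two hypothetical normalized tables (users and their cars) become user documents with the cars embedded as an array. The table names, field names, and the `embed_children` helper are assumptions for illustration; a user with no cars simply gets no `cars` field, the same way the polymorphic documents omit fields they don’t need.

```python
from collections import defaultdict

# Hypothetical rows from two normalized SQL tables.
users_table = [
    {"id": 1, "first_name": "Paul"},
    {"id": 2, "first_name": "Lauren"},
]
cars_table = [
    {"user_id": 1, "model": "Bentley", "year": 1973},
    {"user_id": 1, "model": "Rolls Royce", "year": 1965},
]


def embed_children(parents, children, fk, field, pk="id"):
    """Build one document per parent, embedding its child rows as an array
    under `field`. The foreign key is dropped because embedding makes it redundant."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child[fk]].append({k: v for k, v in child.items() if k != fk})
    docs = []
    for parent in parents:
        doc = dict(parent)
        kids = by_parent.get(parent[pk])
        if kids:  # omit the field entirely when there is nothing to embed
            doc[field] = kids
        docs.append(doc)
    return docs


for doc in embed_children(users_table, cars_table, fk="user_id", field="cars"):
    print(doc)
```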
As I just said, for many use cases, embedding in a single document is optimal. In some cases, it makes sense to store related information in separate documents, typically in different collections or databases. I’m not going to get into the details here of how to do this, but basically, you can create a reference from one document to another, much as you would use foreign keys in SQL.
Another option is to use $lookup. You can use $lookup when you are using the Aggregation Pipeline. $lookup is roughly equivalent to a left outer join. Again, I’m not going to get into the details of how $lookup works, but I want you to be aware that $lookup exists.
Caveats: the joined collection must be an unsharded collection in the same database.
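The stage below shows the equality-match form of `$lookup` (`from`, `localField`, `foreignField`, `as`); the collection and field names are hypothetical. The `emulate_lookup` function is not MongoDB code. It is a plain-Python sketch of the left-outer-join result: every input document is kept, and matching documents are gathered into an array that is empty when nothing matches. It ignores array-valued fields and the other features the real stage handles.

```python
# Equality-match $lookup stage as it would appear in a pipeline.
# Collection and field names are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "cars",
        "localField": "_id",
        "foreignField": "owner_id",
        "as": "cars",
    }
}


def emulate_lookup(local_docs, foreign_docs, spec):
    """Plain-Python sketch of $lookup's left-outer-join result (not server code).

    Every local document is kept; foreign documents whose foreignField equals
    the local document's localField are collected into the `as` array.
    """
    joined = []
    for doc in local_docs:
        key = doc.get(spec["localField"])
        matches = [f for f in foreign_docs if f.get(spec["foreignField"]) == key]
        joined.append({**doc, spec["as"]: matches})
    return joined


users = [{"_id": 1, "name": "Paul"}, {"_id": 2, "name": "Lauren"}]
cars = [
    {"_id": 10, "owner_id": 1, "model": "Bentley"},
    {"_id": 11, "owner_id": 1, "model": "Rolls Royce"},
]
for doc in emulate_lookup(users, cars, lookup_stage["$lookup"]):
    print(doc["name"], [car["model"] for car in doc["cars"]])
```

Lauren has no cars, yet she still appears in the output with an empty `cars` array; that is the “left outer” part of the join.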
Model an org chart
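For the org chart case, the stage below uses the `$graphLookup` parameters (`from`, `startWith`, `connectFromField`, `connectToField`, `as`) to walk up a reporting chain in which each employee stores the name of their manager. The collection, field names, and employees are hypothetical, and `emulate_graph_lookup` is only a breadth-first, plain-Python sketch of the result the stage returns: it supports just a simple `"$field"` for `startWith` and has none of the server’s options such as `maxDepth`.

```python
from collections import deque

# A $graphLookup stage for a hypothetical org chart: each employee stores the
# name of the person they report to.
graph_stage = {
    "$graphLookup": {
        "from": "employees",
        "startWith": "$reportsTo",
        "connectFromField": "reportsTo",
        "connectToField": "name",
        "as": "reportingHierarchy",
    }
}


def emulate_graph_lookup(doc, collection, spec):
    """Breadth-first, plain-Python sketch of what $graphLookup returns.

    Each matching document is collected once, and its connectFromField
    value is followed next, until no new matches are found.
    """
    start = doc.get(spec["startWith"].lstrip("$"))
    frontier = deque([start])
    seen, found = set(), []
    while frontier:
        value = frontier.popleft()
        if value is None:
            continue
        for other in collection:
            if other.get(spec["connectToField"]) == value and other["_id"] not in seen:
                seen.add(other["_id"])
                found.append(other)
                frontier.append(other.get(spec["connectFromField"]))
    return {**doc, spec["as"]: found}


employees = [
    {"_id": 1, "name": "Ana"},
    {"_id": 2, "name": "Bo", "reportsTo": "Ana"},
    {"_id": 3, "name": "Cy", "reportsTo": "Bo"},
    {"_id": 4, "name": "Di", "reportsTo": "Bo"},
]
result = emulate_graph_lookup(employees[2], employees, graph_stage["$graphLookup"])
print([e["name"] for e in result["reportingHierarchy"]])
```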
In MongoDB 4.0, multi-document transactions work across a replica set. Check out the keynote tomorrow for some exciting announcements around transactions.
(MongoDB 4.2 extends support for transactions to sharded deployments.*)
http://gph.is/1ZTM9ct
4 advantages in no particular order
People who pick up MongoDB and try to use it as a relational DB are the ones who struggle and fail. You can’t keep doing things the same way.
http://gph.is/XK6p3t
Be explicit: these are 3 things in no particular order.
A database index is a data structure which improves the speed of data retrieval operations on a database.
MongoDB supports a wide variety of indexes, which I’ll get into further here shortly.
One question I get a lot is, “If indexes improve the speed of queries, why don’t I just index every field?”
Well, indexes come at a cost. Remember that an index is a data structure. Indexes are not free: they have to be updated whenever indexed data is inserted, updated, or deleted.
However, they greatly enhance the query performance, so there’s a balance to be struck there. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
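A toy Python model of that trade-off, with no MongoDB involved: a sorted list of `(key, position)` pairs stands in for the B-tree a real index uses. A lookup through it examines only matching documents instead of every document, while every insert must also update the index. The function names and sample data are hypothetical.

```python
import bisect


def collection_scan(docs, field, value):
    """No index: every document must be inspected."""
    hits = [doc for doc in docs if doc.get(field) == value]
    return hits, len(docs)  # (matches, documents examined)


def build_index(docs, field):
    """Toy single-field index: sorted (key, position) pairs."""
    return sorted((doc[field], pos) for pos, doc in enumerate(docs) if field in doc)


def index_lookup(docs, index, value):
    """With the index: binary-search to the first matching key and fetch only
    the documents whose key matches."""
    start = bisect.bisect_left(index, (value,))
    hits = []
    for key, pos in index[start:]:
        if key != value:
            break
        hits.append(docs[pos])
    return hits, len(hits)  # (matches, documents examined)


def insert_doc(docs, index, doc, field):
    """The write-side cost: every insert must also update the index."""
    docs.append(doc)
    if field in doc:
        bisect.insort(index, (doc[field], len(docs) - 1))


docs = [{"user_id": i, "karma": i % 100} for i in range(10_000)]
karma_index = build_index(docs, "karma")
print(collection_scan(docs, "karma", 42)[1])        # 10000 documents examined
print(index_lookup(docs, karma_index, 42)[1])       # 100 documents examined
```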
Single Field – in addition to the index on the _id field, you can create user-defined ascending/descending indexes on a single field.
Compound Field – user-defined indexes on multiple fields. Order is important. { karma: 1, user_id: -1 } sorts the index first by karma (ascending), then, within each karma score, by user_id (descending).
Multikey – used to index content stored inside arrays.
Geospatial – indexes on geospatial coordinate data that support efficient queries, such as ”Show me all restaurants within three miles of my office.”
Text – support for searching string content in a collection. Text indexes do not store language-specific stop words, like “the”, “a”, and “or”, and store only root words.
Hashed – support for hash-based sharding.
Wildcard – new in 4.2; supports queries against unknown or arbitrary fields.
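Toy sketches of three of the behaviors above, in plain Python with no MongoDB involved: the entry order of the compound index `{ karma: 1, user_id: -1 }`, stop-word removal for a text index, and hash-based spreading of shard key values. The stop-word list, the sample values, and the MD5-plus-modulo hash are assumptions standing in for MongoDB’s own implementations, which also stem words and assign ranges of hash values to chunks.

```python
import hashlib
import re

# 1) Compound index { karma: 1, user_id: -1 }: entries are ordered by karma
#    ascending, then by user_id descending within each karma value.
#    (In PyMongo, that key would be written [("karma", 1), ("user_id", -1)].)
entries = [
    {"karma": 5, "user_id": 7},
    {"karma": 3, "user_id": 2},
    {"karma": 5, "user_id": 9},
    {"karma": 3, "user_id": 8},
]
# Negating the numeric user_id flips that part of the key to descending.
compound_order = sorted(entries, key=lambda e: (e["karma"], -e["user_id"]))

# 2) Text index: a hypothetical, tiny stop-word list. MongoDB ships its own
#    per-language lists and also reduces terms to their stems (omitted here).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}


def text_terms(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


# 3) Hashed sharding: hash the shard key so that sequential values spread out.
#    MD5 of the string form plus modulo is only a stand-in for MongoDB's hash.
def chunk_for(shard_key_value, num_chunks):
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_chunks


print([(e["karma"], e["user_id"]) for e in compound_order])
print(text_terms("The restaurants within three miles of the office"))
print([chunk_for(i, 4) for i in range(8)])
```

Sequential keys such as 0, 1, 2, … land on scattered chunks instead of piling onto the last one, which is the point of hashing the shard key.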
Performance & scalability, "air"
Before we get going, let's just answer why we create models.
In a perfect world, you don't really have to model.
I mean, if everything is super fast and resources are abundant, you really don't care where and how data is stored.
Every day when I get up, I don't make plans for how I will breathe air.
However, if you go to space or underwater, you will need a "design" that will let you get the amount of air you need.
Instead of using a "version" field, we could infer the version number from which fields are present.
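A short sketch of that alternative, assuming a hypothetical history in which version 2 added `first_name`/`last_name` and version 3 added a `contact` subdocument. It also shows the weakness: each new version adds another check, and if `contact` is optional, a version 3 document without it is indistinguishable from version 2, which is one reason an explicit version field is usually safer.

```python
def infer_version(doc):
    """Infer a schema version from which fields are present (hypothetical
    history: v2 added "first_name"/"last_name", v3 added "contact")."""
    if "contact" in doc:
        return 3
    if "first_name" in doc:
        return 2
    return 1


print(infer_version({"name": "Sydney Smith"}))
print(infer_version({"first_name": "Sydney", "last_name": "Smith"}))
print(infer_version({"first_name": "Sydney", "contact": {"phone": "555-0102"}}))
```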
- A few million references would not even fit into an embedded array, since a single document is limited to 16 MB. And even if they did, you would not want to construct a query by passing a million values to the $in operator.
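A quick worked check of that claim using the BSON encoding rules: an array is encoded like a document whose keys are the strings "0", "1", "2", …, and each element costs a 1-byte type tag, its key as a NUL-terminated string, and, for an ObjectId, 12 bytes. By that arithmetic, one million ObjectIds need 19,888,895 bytes before the enclosing document is even counted, already over the 16 MB (16,777,216-byte) document limit. The function name is ours; the byte counts follow the BSON specification.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16 MB limit per document


def objectid_array_bytes(n):
    """Bytes for a BSON array of n ObjectIds.

    Each element is a 1-byte type tag, the array index as a NUL-terminated
    string key, and the 12-byte ObjectId.
    """
    elements = sum(1 + len(str(i)) + 1 + 12 for i in range(n))
    return 4 + elements + 1  # int32 length prefix + elements + trailing 0x00


needed = objectid_array_bytes(1_000_000)
print(needed, needed > MAX_BSON_DOCUMENT_SIZE)
```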
The Aggregation Pipeline is similar in concept to a funnel. A bunch of documents start at the top of the funnel, or left side here, a series of operations are performed on the documents, and the results come out at the end.
In this example, we have a bunch of different documents on the left, we do a match for all documents that have red diamonds, which reduces our dataset.
We do a project stage, which allows us to reshape our documents. In this example we’re reshaping the snowflake and triangle into squares, regardless of the color of those shapes.
Then we do a $lookup stage, which performs a join (roughly a left outer join).
Finally in this example, we do a group stage and group items based on the color of the square.
This is similar in concept to the Unix pipeline.
Here we’re getting the running processes, searching for the mongod process, and printing out the first line of the data.
In the MongoDB Aggregation Framework’s pipeline, instead of *nix commands we have stages, and what flows through them are documents.
We’ll take a look at some of the stages today, but this is a big topic overall.
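A toy, in-memory interpreter that mimics the funnel described above for a tiny subset of stages. The pipeline is written in MongoDB’s shape (a list of one-key stage documents) and, with a driver such as PyMongo, would be passed to a collection’s `aggregate` method. The interpreter, the sample shapes, and their field names are hypothetical; they exist only to show documents flowing through stages and why stage order matters.

```python
def run_pipeline(docs, pipeline):
    """Toy interpreter for a tiny subset of aggregation stages:
    equality-only $match, inclusion-only $project (no automatic _id), and
    $group whose accumulators are all {"$sum": 1}."""
    for stage in pipeline:
        ((op, spec),) = stage.items()  # each stage is a one-key document
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$project":
            docs = [{k: d[k] for k, keep in spec.items() if keep and k in d} for d in docs]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            counters = [name for name in spec if name != "_id"]
            for name in counters:
                if spec[name] != {"$sum": 1}:
                    raise NotImplementedError(f"unsupported accumulator for {name}")
            groups = {}
            for d in docs:
                key = d.get(key_field)
                group = groups.setdefault(key, {"_id": key, **{n: 0 for n in counters}})
                for name in counters:
                    group[name] += 1
            docs = list(groups.values())
        else:
            raise NotImplementedError(op)
    return docs


shapes = [
    {"shape": "snowflake", "color": "red"},
    {"shape": "triangle", "color": "blue"},
    {"shape": "triangle", "color": "red"},
    {"shape": "triangle", "color": "red"},
    {"shape": "circle", "color": "green"},
]
pipeline = [
    {"$match": {"shape": "triangle"}},  # filter first: fewer documents downstream
    {"$project": {"color": 1}},          # reshape: keep only the color
    {"$group": {"_id": "$color", "count": {"$sum": 1}}},
]
print(run_pipeline(shapes, pipeline))

# Order matters: projecting first drops "shape", so the $match finds nothing.
reordered = [pipeline[1], pipeline[0], pipeline[2]]
print(run_pipeline(shapes, reordered))
```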