2. Agenda
• What is MongoDB?
• Replica Sets
• Sharding and Sharded Cluster
• MapReduce
• Spark and MongoDB
3. What is MongoDB?
• Open-source
• NoSQL
• Community / Enterprise versions
• Developed by MongoDB Inc. (formerly 10gen) in C++, C and JavaScript
• Cross-platform: Windows, Linux, OS X, Solaris, FreeBSD
• Document-oriented: stores documents as BSON (Binary JSON, an extended binary form of JSON)
• Stores large binary files such as videos and pictures in GridFS (beyond the 16 MB BSON document size limit)
• Database development in JavaScript (standard libraries and user-defined functions)
• Deploy, monitor, back up and scale MongoDB: Ops Manager
• Use MongoDB as a data source for your SQL-based BI tools: MongoDB Connector for BI, SlamData
• Cross-platform UI for development: Robomongo
• Hosted MongoDB as a service: MongoDB Atlas
• Hosted platform for managing MongoDB: MongoDB Cloud Manager
• Another cloud provider: mLab
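To make "BSON documents" concrete: BSON is a length-prefixed binary encoding of JSON-like documents. A document starts with its total size as a little-endian int32, then a list of elements (a type byte, a NUL-terminated field name, and the value), and ends with a 0x00 byte. The sketch below follows that layout from the BSON specification. It is a hypothetical, minimal encoder: the function name `encodeBson` is ours, not part of any MongoDB driver, and it handles only string (type 0x02) and 32-bit integer (type 0x10) fields. Real drivers ship a complete implementation.

```javascript
// Hypothetical minimal BSON encoder: supports only string (0x02) and
// int32 (0x10) values, following the layout in the BSON specification.
function encodeBson(doc) {
  const elements = [];
  for (const [key, value] of Object.entries(doc)) {
    // e_name is a NUL-terminated cstring
    const name = Buffer.from(key + "\0", "utf8");
    if (typeof value === "string") {
      // string ::= int32 length (including the trailing NUL) + bytes + NUL
      const str = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length, 0);
      elements.push(Buffer.from([0x02]), name, len, str);
    } else if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
      // int32 values are stored as 4 little-endian bytes
      const int = Buffer.alloc(4);
      int.writeInt32LE(value, 0);
      elements.push(Buffer.from([0x10]), name, int);
    } else {
      throw new TypeError(`unsupported value for key "${key}" in this sketch`);
    }
  }
  const body = Buffer.concat(elements);
  // Total size = 4-byte size prefix + elements + terminating 0x00
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1, 0);
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length); // 22
console.log(bytes.toString("hex")); // 160000000268656c6c6f0006000000776f726c640000
```

The length prefixes are what let MongoDB skip over fields and whole documents without parsing them, which is one reason BSON is used instead of plain JSON text.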
7. Some Examples
use blogdb
db.blog.insertOne({
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2016-08-24T21:12:09.982Z")
});
db.blog.find()
{ "_id" : ObjectId("591be0f4c79fa21e08c2e24e"), "title" : "My Blog Post", "content" : "Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z") }
• db: current database object
• blog: collection object
• insertOne(), find(): methods
• _id: automatically generated unique primary key (an ObjectId)
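The generated `_id` is not just a random key: an ObjectId is 12 bytes, and its first 4 bytes are a big-endian count of seconds since the Unix epoch, followed by bytes that keep the value unique (a random or machine-specific part and an incrementing counter). The sketch below decodes that creation time from the hex string in plain JavaScript; the helper name `objectIdTimestamp` is hypothetical. In the mongo shell itself, `ObjectId("...").getTimestamp()` returns the same information.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 24 hex digits (12 bytes)");
  }
  // First 8 hex digits = 4-byte big-endian timestamp in seconds
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id returned by the insert example above
console.log(objectIdTimestamp("591be0f4c79fa21e08c2e24e").toISOString());
// 2017-05-17T05:34:44.000Z
```

Note that this is when the ObjectId was generated, which is unrelated to the document's own "date" field (2016-08-24 in the example).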
14. MongoDB Connector for Apache Spark
• The MongoDB side can be a sharded cluster, too!
• Data can be filtered and aggregated at the MongoDB level
15. Summary
• Fast
• Highly available (replica sets)
• Flexible data model
• Simple to use
• Scales horizontally to very large data sizes (sharding)
But:
• A sharded cluster deployment requires planning!
16. Homework
• Install a MongoDB server, or sign up for a free hosted MongoDB service such as the mLab sandbox
• Load the postcodes.zip data file (after extracting it) using the mongoimport utility. If you use a hosted MongoDB service, you will first need to install the MongoDB client tools on your machine.
• Create B-tree indexes on the place.name, postal_code and place.country fields, and a compound index on place.name + place.country
• Create a 2dsphere index on place.loc
• Add the {"postal_code" : "38116", "place" : { "name" : "Graceland", "country" : "US", "state" : "Memphis", "loc" : [ 19.0419, 47.5328 ] } } document to the collection
• Change the place.loc field of the same document to [-90.02604930000001, 35.0476912]
• Add the field owner: "Lisa Marie Presley" to the same document. Observe that the structure of this document now differs from that of the other documents in the collection.
Send me the queries that answer the following questions:
• What is the postal code of Graceland, Memphis? The result must be only the {"postal_code" : "38116"} document; fields other than postal_code are not acceptable!
• How many postal codes are there in Budapest, Hungary?
• When was the ObjectId "59199cdff0269ea12235e9dc" created?
• What are the top 5 countries by number of documents, in descending order?
• Which places are within 20 km of longitude -90.02604930000001, latitude 35.0476912 (Graceland)? The result must be sorted alphabetically, and each place must appear only once (distinct).
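A hint for the distance question: the `$geoWithin` operator with `$centerSphere` takes `[[longitude, latitude], radius]`, where the radius is in radians, so a distance in kilometres must be divided by the Earth's radius (about 6378.1 km, the value used in the MongoDB documentation). The sketch below only builds the filter object as plain JavaScript; the helper names `kmToRadians` and `withinKmFilter` are hypothetical, and `$centerSphere` is one option among several (for example `$nearSphere` with `$maxDistance`).

```javascript
// Approximate equatorial radius of the Earth, in kilometres
const EARTH_RADIUS_KM = 6378.1;

// $centerSphere expects its radius in radians
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Hypothetical helper: filter matching points within `km` of (lon, lat)
function withinKmFilter(field, lon, lat, km) {
  return {
    [field]: {
      $geoWithin: { $centerSphere: [[lon, lat], kmToRadians(km)] }
    }
  };
}

const filter = withinKmFilter("place.loc", -90.02604930000001, 35.0476912, 20);
console.log(JSON.stringify(filter));
```

The object is only the filter part of a find() call; sorting the places alphabetically and returning each one only once are left to the exercise.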