History of NoSQL and Azure DocumentDB feature set - Soner Altin
A short history of database systems, from DBMS and RDBMS to NoSQL solutions, with an introduction to the SQL query support of Azure DocumentDB and to integrating DocumentDB with a simple Java application from the Maven repository.
2. @kahve
• Soner ALTIN
• BizDev @T2
• soner.in
• Strong interest in Led Zeppelin
• soneraltin@me.com / soner.altin@t2.com.tr
3. HISTORY OF DBMS AND RDBMS
Database management systems first appeared on the scene in the 1960s, as computers began to grow in power and speed. By the mid-1960s, several commercial applications on the market were capable of producing "navigational" databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time.
Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be "searched", Edgar Codd proposed another model for constructing a database: the relational model, which allowed users to "search" it for data. It integrated the navigational model with a tabular and hierarchical model.
4. A relational database is a digital database whose organization is based on the relational model of data.
5. RDBMS: 40 YEARS!
1. A simple way of representing data / business models
2. An easy-to-use language to retrieve and query that data (SQL)
3. Bulletproof data integrity and security built right into the database, without having to rely on application rules and logic.
6. ACCESS AND STORAGE
▸ It is generally easier to access data stored in a relational database, because the data follows a mathematical model for categorization. Also, once we open a relational database, each and every element of it becomes accessible, which is not always the case with other databases (the data elements may need to be accessed individually).
▸ Relational databases are harder to construct, but they are better structured and more secure. They follow the ACID (atomicity, consistency, isolation and durability) model when storing data. The relational database system will also impose regulations and conditions that prevent you from manipulating data in a way that destabilizes the integrity of the system.
8. 3V - VOLUME, VARIETY, VELOCITY
▸ Five years ago, Amazon found that every 100 ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%.
▸ The volume of data that must be handled today is skyrocketing. Facebook houses 1.5 PB (petabytes) of uploaded photos. Google processes 20 PB of data each day. Every 60 seconds, over 204 million emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are processed by Google. RDBMSs struggle in the face of such huge data volumes, and RDBMS solutions capable of handling them are extremely expensive.
▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as social media sentiment analysis and text mining data, while RDBMSs are more suitable for structured data, such as weblog, sensor and financial data.
▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady data retention rather than rapid growth, using them for Big Data is prohibitively expensive.
9. TODAY
▸ Developers are working with applications that create massive volumes of new, rapidly changing data types: structured, semi-structured, unstructured and polymorphic data.
▸ Long gone is the twelve-to-eighteen-month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times a day.
▸ Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users.
▸ Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing, instead of large monolithic servers and storage infrastructure.
10.
Structured                    | Unstructured               | Semi-structured
Pre-defined                   | God knows                  | Pre-defined
Relational                    | Non-relational             | So so
Constant                      | Flexible                   | Easy to change
RDBMS                         | HDFS                       | *
CRM, Travel, Phone numbers    | Web, Video, Music, Photo   | Tagging, Comments
5%                            | 15%                        | 80%
No need to scale horizontally | Fully scalable             | Fully scalable
11. /*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single status of a user.
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Status extends Comparable<Status>, TwitterResponse,
EntitySupport, java.io.Serializable {
Date getCreatedAt();
long getId();
String getText();
String getSource();
boolean isTruncated();
long getInReplyToStatusId();
long getInReplyToUserId();
String getInReplyToScreenName();
GeoLocation getGeoLocation();
Place getPlace();
boolean isFavorited();
boolean isRetweeted();
int getFavoriteCount();
User getUser();
boolean isRetweet();
Status getRetweetedStatus();
long[] getContributors();
int getRetweetCount();
boolean isRetweetedByMe();
long getCurrentUserRetweetId();
boolean isPossiblySensitive();
String getLang();
Scopes getScopes();
String[] getWithheldInCountries();
long getQuotedStatusId();
Status getQuotedStatus();
}
/*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing Basic user information element
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface User extends Comparable<User>, TwitterResponse, java.io.Serializable {
long getId();
String getName();
String getScreenName();
String getLocation();
String getDescription();
boolean isContributorsEnabled();
String getProfileImageURL();
String getBiggerProfileImageURL();
String getMiniProfileImageURL();
String getOriginalProfileImageURL();
String getProfileImageURLHttps();
String getBiggerProfileImageURLHttps();
String getMiniProfileImageURLHttps();
String getOriginalProfileImageURLHttps();
boolean isDefaultProfileImage();
String getURL();
boolean isProtected();
int getFollowersCount();
Status getStatus();
String getProfileBackgroundColor();
String getProfileTextColor();
String getProfileLinkColor();
String getProfileSidebarFillColor();
String getProfileSidebarBorderColor();
boolean isProfileUseBackgroundImage();
boolean isDefaultProfile();
boolean isShowAllInlineMedia();
int getFriendsCount();
Date getCreatedAt();
int getFavouritesCount();
int getUtcOffset();
String getTimeZone();
String getProfileBackgroundImageURL();
String getProfileBackgroundImageUrlHttps();
String getProfileBannerURL();
String getProfileBannerRetinaURL();
String getProfileBannerIPadURL();
String getProfileBannerIPadRetinaURL();
String getProfileBannerMobileURL();
String getProfileBannerMobileRetinaURL();
boolean isProfileBackgroundTiled();
String getLang();
int getStatusesCount();
boolean isGeoEnabled();
boolean isVerified();
boolean isTranslator();
int getListedCount();
boolean isFollowRequestSent();
URLEntity[] getDescriptionURLEntities();
URLEntity getURLEntity();
String[] getWithheldInCountries();
}
12. /*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single URL entity.
* @author Mocel - mocel at guma.jp
*/
public interface URLEntity extends TweetEntity, java.io.Serializable {
String getText();
String getURL();
String getExpandedURL();
String getDisplayURL();
int getStart();
int getEnd();
}
/**
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Place extends TwitterResponse, Comparable<Place>,
java.io.Serializable {
String getName();
String getStreetAddress();
String getCountryCode();
String getId();
String getCountry();
String getPlaceType();
String getURL();
String getFullName();
String getBoundingBoxType();
GeoLocation[][] getBoundingBoxCoordinates();
String getGeometryType();
GeoLocation[][] getGeometryCoordinates();
Place[] getContainedWithIn();
}
https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me
14. NON-RELATIONAL
Provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
15. NOSQL
MONGODB
▸ NoSQL document-based database.
▸ Designed to build today's applications.
▸ Fast to build.
▸ Quick to adapt.
▸ Easy to scale.
▸ Lessons learned from 40 years of RDBMS.
16. REQUIREMENTS
▸ over 425 million unique users
▸ store 20 TB of JSON document data
▸ available globally to serve all markets
▸ store for 40+ app/device combinations
▸ under 15 ms writes and single-digit ms reads
19. ECONOMICS
The goal of a business, of course, is to make money, and that's accomplished by providing more for less. NoSQL databases drastically reduce the need for insanely big machines. Typically, they use clusters of cheap commodity servers to manage exploding data and transaction volumes. The cost per gigabyte or transaction per second for NoSQL can be considerably lower than the cost for RDBMSs, thereby dramatically reducing the cost of data processing and storage. Another key area of savings is manpower: by lowering administrative costs, one can free up developers to code new features that will generate more revenue.
21. SCHEMALESS - DATA UPDATE
The documents stored in the database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection:
{ name : "Joe", x : 3.3, y : [1,2,3] }
{ name : "Kate", x : "abc" }
{ q : 456 }
Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common:
{ name : "Joe", age : 30, interests : "football" }
{ name : "Kate", age : 25 }
One of the great benefits of dynamic objects is that schema migrations become very easy. With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime.
With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well: if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
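A minimal mongo shell sketch of that GPA example (the students collection name and values are assumptions, not from the slides):

// An older document written before GPA existed.
db.students.insert({ name : "Joe", age : 30 })
// Newer code simply writes the extra field; no ALTER TABLE, no migration script.
db.students.insert({ name : "Kate", age : 25, gpa : 3.7 })
// Referencing GPA on the old document just comes back empty (undefined in the shell).
db.students.findOne({ name : "Joe" }).gpa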
22. NOSQL
data model | performance | scalability | flexibility | complexity
column     | high        | high        | moderate    | low
document   | high        | variable    | high        | low
key-value  | high        | high        | high        | none
graph      | variable    | variable    | high        | high
25. MONGODB
KEY FEATURES
▸ Open source
▸ Document database
▸ High performance
▸ Rich query language
▸ High availability
▸ Horizontal scalability
▸ Support for multiple storage engines
26. MONGODB
OPEN SOURCE
▸ Wikipedia: The software company 10gen began developing MongoDB in 2007 as a component of a planned platform-as-a-service product. In 2009, the company shifted to an open source development model, with the company offering commercial support and other services. In 2013, 10gen changed its name to MongoDB Inc.
27. MONGODB
DOCUMENT DATABASE
▸ A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
▸ Documents (i.e. objects) correspond to native data types in many programming languages.
▸ Embedded documents and arrays reduce the need for expensive joins.
▸ Dynamic schema supports fluent polymorphism.
{
  name: "Soner",
  age: 31.7,
  company: "T2",
  country: "Turkey",
  city: "Istanbul",
  pets: [{name: "one", alive: false}, {name: "two", age: 3, alive: true}, {alive: false}]
}
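As a small illustration of why embedding removes the join (a sketch; the people collection name is assumed), the pets array above can be queried in place with dot notation:

// Matches documents where at least one embedded pet is alive.
db.people.find({ "pets.alive" : true })
// Dot notation reaches into the embedded array; no second table, no join.
db.people.find({ "pets.name" : "two" })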
28. MONGODB
HIGH PERFORMANCE
▸ MongoDB provides high performance data persistence. In particular:
▸ Support for embedded data models reduces I/O activity on the database system.
▸ Indexes support faster queries and can include keys from embedded documents and arrays.
http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf
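For example (a sketch, reusing the assumed people collection from the previous slide), a key inside an embedded array can be indexed directly:

// Builds a multikey index over the embedded pets array.
db.people.createIndex({ "pets.name" : 1 })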
29. MONGODB
RICH QUERY LANGUAGE
▸ MongoDB supports a rich query language for read and write operations, as well as:
▸ data aggregation
▸ text search and geospatial queries.
▸ https://docs.mongodb.com/manual/crud/
▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/
▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text
▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
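A short aggregation pipeline sketch against the t2mongo collection used later in the deck (assuming its documents carry age and country fields):

db.t2mongo.aggregate([
  { $match : { age : { $gt : 30 } } },                   // filter first, like WHERE
  { $group : { _id : "$country", n : { $sum : 1 } } },   // then group, like GROUP BY
  { $sort : { n : -1 } }                                 // order by count, descending
])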
30. MONGODB
HIGH AVAILABILITY
▸ MongoDB's replication facility, called a replica set, provides:
▸ automatic failover and
▸ data redundancy.
▸ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.
31. MONGODB
HORIZONTAL SCALABILITY
▸ MongoDB provides horizontal scalability as part of its core functionality:
▸ Sharding distributes data across a cluster of machines.
▸ Tag-aware sharding allows for directing data to specific shards, for example to take into consideration the geographic distribution of the shards.
32. MONGODB
STORAGE ENGINE
▸ MongoDB supports multiple storage engines, such as:
▸ the WiredTiger storage engine and
▸ the MMAPv1 storage engine.
▸ In addition, MongoDB provides a pluggable storage engine API that allows third parties to develop storage engines for MongoDB.
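The engine is chosen when the mongod process starts; a minimal sketch (the dbpath folder names are arbitrary examples):

mongod --dbpath data1 --storageEngine wiredTiger
mongod --dbpath data2 --storageEngine mmapv1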
33. MONGODB
WHERE SHOULD YOU USE MONGODB?
▸ Big Data
▸ Content Management and Delivery
▸ Mobile and Social Infrastructure
▸ User Data Management
▸ Data Hub
35. DBA
WHAT'S A DBA?
▸ Wikipedia: Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.
▸ Who's the DBA of T2?
▸ Assume that you are the technical lead of a startup with no money: who will be the DBA?
http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/
37. DBA
DATABASE / COLLECTION / DOCUMENT
▸ A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
▸ A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema; documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.
▸ A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
38. DBA
DATA TYPES
▸ String: the most commonly used datatype. Strings in MongoDB must be valid UTF-8.
▸ Integer: stores a numerical value; can be 32-bit or 64-bit depending on your server.
▸ Boolean: stores a boolean (true/false) value.
▸ Double: stores floating point values.
▸ Min/Max keys: compare a value against the lowest and highest BSON elements.
▸ Arrays: store arrays, lists or multiple values under one key.
▸ Timestamp: an internal timestamp; handy for recording when a document has been modified or added.
▸ Object: used for embedded documents.
▸ Null: stores a null value.
▸ Symbol: used identically to a string, but generally reserved for languages that have a specific symbol type.
▸ Date: stores the current date or time in UNIX time format. You can specify your own date/time by creating a Date object and passing day, month and year into it.
▸ Object ID: stores the document's ID.
▸ Binary data: stores binary data.
▸ Code: stores JavaScript code in the document.
▸ Regular expression: stores a regular expression.
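Most of these types can be exercised from the shell in a single document; a hedged sketch (the types collection and all values are made up):

db.types.insert({
  str  : "text",                     // String
  i32  : NumberInt(42),              // 32-bit Integer
  i64  : NumberLong("9000000000"),   // 64-bit Integer
  dbl  : 3.14,                       // Double
  flag : true,                       // Boolean
  arr  : [1, "two", 3.0],            // Array
  obj  : { nested : "doc" },         // Object (embedded document)
  none : null,                       // Null
  when : new Date(),                 // Date
  oid  : new ObjectId(),             // Object ID
  re   : /mongo/i,                   // Regular expression
  bin  : BinData(0, "SGVsbG8=")      // Binary data (base64-encoded)
})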
39. DBA
OBJECTID
ObjectId(<hexadecimal>)
Returns a new ObjectId value. The 12-byte ObjectId value consists of:
‣ a 4-byte value representing the seconds since the Unix epoch,
‣ a 3-byte machine identifier,
‣ a 2-byte process id, and
‣ a 3-byte counter, starting with a random value.
{ "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }
43. DBA
DATABASE AND COLLECTION COMMANDS
▸ show dbs: lists the available databases
▸ db.dropDatabase(): drops the current database
▸ db.createCollection("your_collection"): creates a collection named your_collection in the current database
▸ show collections: lists the collections in the current database
▸ db.your_collection.drop(): drops the collection
44. DBA
INSERT / SAVE
▸ db.your_collection.insert([your_documents])
▸ db.your_collection.save(your_document): if _id is present and already in the database, works as an update
▸ db.t2mongo.insert(<documents generated at http://www.json-generator.com/>)
▸ db.t2mongo.count()
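A short sketch of the insert-versus-save distinction (the documents are made up):

db.t2mongo.insert({ _id : 1, name : "first" })   // plain insert
db.t2mongo.save({ _id : 1, name : "second" })    // _id already in the database -> replaces the document
db.t2mongo.save({ name : "third" })              // no _id -> behaves like an insert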
45. DBA
QUERY
Operation           | Syntax                  | Example
Equality            | {<key>:<value>}         | db.mycol.find({"by":"tutorials point"}).pretty()
Less Than           | {<key>:{$lt:<value>}}   | db.mycol.find({"likes":{$lt:50}}).pretty()
Less Than Equals    | {<key>:{$lte:<value>}}  | db.mycol.find({"likes":{$lte:50}}).pretty()
Greater Than        | {<key>:{$gt:<value>}}   | db.mycol.find({"likes":{$gt:50}}).pretty()
Greater Than Equals | {<key>:{$gte:<value>}}  | db.mycol.find({"likes":{$gte:50}}).pretty()
Not Equals          | {<key>:{$ne:<value>}}   | db.mycol.find({"likes":{$ne:50}}).pretty()

db.t2mongo.find({age : { $gt : 30} })
db.t2mongo.find({age : { $gt : 30} }).pretty()
db.t2mongo.count({age : { $gt : 30} })
54. DBA
REPLICATION
▸ A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
▸ A cluster of N nodes
▸ Any node can become primary
▸ All write operations go to the primary
▸ Automatic failover
▸ Automatic recovery
▸ Consensus election of the primary
55. DBA
REPLICATION
▸ Create a folder for each mongod process
▸ mkdir a; mkdir b; mkdir c
▸ Run three mongod processes
▸ mongod --dbpath a --replSet myReplica
▸ mongod --dbpath b --replSet myReplica --port 27018
▸ mongod --dbpath c --replSet myReplica --port 27019
▸ Connect to any of them
▸ mongo
▸ Now that we are connected on mongo's default port 27017, let's initiate the replica set
▸ rs.initiate()
▸ Let's add the other servers: rs.add("hostname:port")
▸ rs.add("Ground-Control.local:27018"); rs.add("Ground-Control.local:27019")
56. DBA
REPLICATION
▸ You'll see PRIMARY in the mongo shell on port 27017
▸ Connect to the other ports as below and see SECONDARY in the shell
▸ mongo --port 27018
▸ mongo --port 27019
▸ Let's insert some documents on port 27017 and try to read them on the other ports
▸ Use rs.slaveOk() to allow reads on the secondaries
▸ Use rs.help() to see the replication commands
57. DBA
REPLICATION
▸ Run the command below on port 27017
▸ use adb; for (var i = 0; i < 50000; i++) { db.test.insert({_id : i, x : i}) }
▸ Run use adb; db.test.count() on the other ports and observe that replication works
▸ Shut down the PRIMARY, observe a new PRIMARY in the shell, and run rs.status() to see the new configuration
58. DBA
SHARDING CLUSTER
▸ Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data, nor provide acceptable read and write throughput. Sharding solves the problem with horizontal scaling.
▸ In replication, all writes go to the master node
▸ Latency-sensitive queries still go to the master
▸ A single replica set is limited to 12 nodes
▸ Memory can't be large enough when the active dataset is big
▸ Local disk is not big enough
▸ Vertical scaling is too expensive
59. DBA
SHARDING CLUSTER
▸ Shards: store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
▸ Config servers: store the cluster's metadata, which contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, sharded clusters have exactly 3 config servers.
▸ Query routers: mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load; a client sends requests to one query router. Generally, a sharded cluster has many query routers.
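Before the shards can be introduced to the router on slide 62, each shard mongod, a config server and a mongos must be running. A minimal single-host sketch, under assumptions: the folder names, the config server port and the use of plain (non-replica-set) shards are made up here; only the shard ports 27118/27218/27318 come from the next slide.

mkdir s1 s2 s3 cfg
mongod --shardsvr --dbpath s1 --port 27118
mongod --shardsvr --dbpath s2 --port 27218
mongod --shardsvr --dbpath s3 --port 27318
mongod --configsvr --dbpath cfg --port 27019
mongos --configdb Ground-Control.local:27019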
62. DBA
SHARDING CLUSTER
▸ Introduce the shards to the query router
▸ In the mongo shell, run the command: sh.addShard("hostname:port")
▸ sh.addShard("Ground-Control.local:27118")
▸ sh.addShard("Ground-Control.local:27218")
▸ sh.addShard("Ground-Control.local:27318")
63. DBA
SHARDING COLLECTION
▸ Run the commands below on port 27017
▸ use catalog
▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name : 'name ' + i, type : 'type ' + i, gross : i, country : 'country ' + i, date : ISODate(), value : Math.random() * 1000000}) }
▸ sh.enableSharding("catalog")
▸ sh.status()
▸ sh.shardCollection("catalog.movies", {_id : 1}, true)