SlideShare a Scribd company logo
MONGOLDB
SONER ALTIN
@kahve
• Soner ALTIN
• BizDev @T2
• soner.in
• Strong interest in Led Zeppelin
• soneraltin@me.com /
soner.altin@t2.com.tr
HISTORY OF DBMS AND RDBMS
Database management systems first appeared on the scene in 1960 as
computers began to grow in power and speed. In the middle of 1960, there
were several commercial applications in the market that were capable of
producing “navigational” databases. These navigational databases
maintained records that could only be processed sequentially, which
required a lot of computer resources and time.
Relational database management systems were first suggested by
Edgar Codd in the 1970s. Because navigational databases could not
be “searched”, Edgar Codd suggested another model that could be
followed to construct a database. This was the relational model that
allowed users to “search” it for data. It included the integration of the
navigational model, along with a tabular and hierarchical model.
60’s 70’s 80’s 90’s 00’s
A relational database is a
digital database whose
organization is based on the
relational model of data
RDMBS 40 YEARS!
1. A simple way of representing data/ business models
2. An easy-to-use language to retrieve and query that data
(SQL)
3. Bulletproof data integrity and security built right into the
database without having to rely on application rules and
logic.
ACCESS AND STORAGE
▸ It is generally easier to access data that is stored in a relational
database. This is because the data in a relational database follows
a mathematical model for categorization. Also, once we open a
relational database, each and every element of that database
becomes accessible, which is not always the case with a normal
database (the data elements may need to be accessed
individually).
▸ Relational databases are harder to construct, but they are better
structured and more secure. They follow the ACID (atomicity,
consistency, isolation and durability) model when storing data.
The relational database system will also impose certain
regulations and conditions that may not allow you to manipulate
data in a way that destabilizes the integrity of the system.
PERSISTENCE
REPORTING
TRANSACTIONS
SQL
INTEGRATION
3V - VOLUME VARIETY VELOCITY
▸ Five years ago, Amazon found that every 100ms of latency cost them 1% of sales. Google
discovered that a half-second increase in search latency dropped traffic by 20%.
▸ The volume of required data handling today is skyrocketing. Facebook houses 1.5 PB (Peta Bytes)
of uploaded photos. Google processes 20PB of data each day. Every 60 seconds over 204 million
emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are
processed by Google. RDBMSs struggle in the face of such huge data volumes and RDBMS
solutions capable of handling such volumes are extremely expensive.
▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have
inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as
social media sentiment analysis and text mining data, while RDBMSs are more suitable for
structured data, such as weblog, sensor and financial data.
▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady
data retention, rather than for rapid growth, using RDBMSs for Big Data is prohibitively expensive.
60’s 70’s 80’s 90’s 00’s 10’s
TODAY
▸ Developers are working with applications that
create massive volumes of new, rapidly changing
data types — structured, semi-structured,
unstructured and polymorphic data.
▸ Long gone is the twelve-to-eighteen month
waterfall development cycle. Now small teams
work in agile sprints, iterating quickly and
pushing code every week or two, some even
multiple times every day.
▸ Applications that once served a finite audience
are now delivered as services that must be
always-on, accessible from many different
devices and scaled globally to millions of users.
▸ Organizations are now turning to scale-out
architectures using open source software,
commodity servers and cloud computing instead
of large monolithic servers and storage
infrastructure.
Structured Unstructured Semi-structured
Pre-defined God knows Pre-defined
Relational Non-relational So so
Constant Flexible Easy to change
RDBMS HDFS *
CRM, Travel, Phone
numbers
Web, Video, Music, Photo Tagging, Comments
%5 %15 %80
No need to scale
horizontally
Fully scalable Fully scalable
/*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single status of a user.
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Status extends Comparable<Status>, TwitterResponse,
EntitySupport, java.io.Serializable {
Date getCreatedAt();
long getId();
String getText();
String getSource();
boolean isTruncated();
long getInReplyToStatusId();
long getInReplyToUserId();
String getInReplyToScreenName();
GeoLocation getGeoLocation();
Place getPlace();
boolean isFavorited();
boolean isRetweeted();
int getFavoriteCount();
User getUser();
boolean isRetweet();
Status getRetweetedStatus();
long[] getContributors();
int getRetweetCount();
boolean isRetweetedByMe();
long getCurrentUserRetweetId();
boolean isPossiblySensitive();
String getLang();
Scopes getScopes();
String[] getWithheldInCountries();
long getQuotedStatusId();
Status getQuotedStatus();
}
/*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing Basic user information element
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface User extends Comparable<User>, TwitterResponse, java.io.Seria
long getId();
String getName();
String getScreenName();
String getLocation();
String getDescription();
boolean isContributorsEnabled();
String getProfileImageURL();
String getBiggerProfileImageURL();
String getMiniProfileImageURL();
String getOriginalProfileImageURL();
String getProfileImageURLHttps();
String getBiggerProfileImageURLHttps();
String getMiniProfileImageURLHttps();
String getOriginalProfileImageURLHttps();
boolean isDefaultProfileImage();
String getURL();
boolean isProtected();
int getFollowersCount();
Status getStatus();
String getProfileBackgroundColor();
String getProfileTextColor();
String getProfileLinkColor();
String getProfileSidebarFillColor();
String getProfileSidebarBorderColor();
boolean isProfileUseBackgroundImage();
boolean isDefaultProfile();
boolean isShowAllInlineMedia();
int getFriendsCount();
Date getCreatedAt();
int getFavouritesCount();
int getUtcOffset();
String getTimeZone();
String getProfileBackgroundImageURL();
String getProfileBackgroundImageUrlHttps();
String getProfileBannerURL();
String getProfileBannerRetinaURL();
String getProfileBannerIPadURL();
String getProfileBannerIPadRetinaURL();
String getProfileBannerMobileURL();
String getProfileBannerMobileRetinaURL();
boolean isProfileBackgroundTiled();
String getLang();
int getStatusesCount();
boolean isGeoEnabled();
boolean isVerified();
boolean isTranslator();
int getListedCount();
boolean isFollowRequestSent();
URLEntity[] getDescriptionURLEntities();
URLEntity getURLEntity();
String[] getWithheldInCountries();
}}
/*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single URL entity.
* @author Mocel - mocel at guma.jp
*/
public interface URLEntity extends TweetEntity, java.io.Serializable {
String getText();
String getURL();
String getExpandedURL();
String getDisplayURL();
int getStart();
int getEnd();
}
/**
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Place extends TwitterResponse, Comparable<Place>,
java.io.Serializable {
String getName();
String getStreetAddress();
String getCountryCode();
String getId();
String getCountry();
String getPlaceType();
String getURL();
String getFullName();
String getBoundingBoxType();
GeoLocation[][] getBoundingBoxCoordinates();
String getGeometryType();
GeoLocation[][] getGeometryCoordinates();
Place[] getContainedWithIn();
}
https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me
SCALABILITY
NON
RELATIONAL
Provides a mechanism for
storage and retrieval of
data which is modeled in
means other than the
tabular relations used in
relational databases
NOSQL
MONGODB
▸ NoSQL Document based database.
▸ Designed to build todays applications.
▸ Fast to build.
▸ Quick to adapt.
▸ Easy to scale
▸ Lessons learned from 40 years of RDBMS.
REQUIREMENTS
▸ over 425 million unique users
▸ store 20 TB of JSON document
data
▸ available globally to serve all
markets
▸ store for 40+ apps / device
combinations
▸ under 15 ms writes and single
digits ms reads
CONTROL OVER
AVAILABILITY
HORIZONTAL
SCALABILITY
SIMPLICITY OF
DESIGN
BIG DATA
REAL TIME
APPLICATIONS
EASIER
DEVELOPMENT
SCALABILITY VS FUNCTIONALITY
scalability&performance
depth of functionality
rmdbs
mongoldb
memcached
key/value store
ECONOMICS
The goal of a business, of course, is to make
money, and that’s accomplished by
providing more for less. NoSQL databases
drastically reduce the need for insanely big
machines. Typically, they use clusters of
cheap commodity servers to manage
exploding data and transaction volumes. The
cost-per-gigabyte or transaction/second for
NoSQL can be considerably lower than the
cost for RDBMSs, thereby dramatically
reducing the cost of data processing and
storage. Another area of key savings is in
manpower. By lowering administrative costs
one can free up developers to code new
features that will generate more revenue.
bit.ly/fowler-schemaless
SCHEMALESS - DATA UPDATE
The documents stored in the database can
have varying sets of fields, with different
types for each field. One could have the
following objects in a single collection:
{ name : “Joe”, x : 3.3, y : [1,2,3] }
{ name : “Kate”, x : “abc” }
{ q : 456 }
Of course, when using the database for real
problems, the data does have a fairly
consistent structure. Something like the
following would be more common:
{ name : “Joe”, age : 30, interests : ‘football’ }
{ name : “Kate”, age : 25 }
One of the great benefits of dynamic objects is
that schema migrations become very easy.
With a traditional RDBMS, releases of code
might contain data migration scripts. Further,
each release should have a reverse migration
script in case a rollback is necessary. ALTER
TABLE operations can be very slow and result
in scheduled downtime.
With a schemaless database, 90% of the time
adjustments to the database become
transparent and automatic. For example, if we
wish to add GPA to the student objects, we add
the attribute, resave, and all is well – if we look
up an existing student and reference GPA, we
just get back null. Further, if we roll back our
code, the new GPA fields in the existing objects
are unlikely to cause problems if our code was
well written.
NOSQL
data model performance scalability flexibility complexity
column high high moderate low
document high variable high low
key-value high high high none
graph variable variable high high
MONGOLDB
KEY FEATURES
▸ Open source
▸ Document database
▸ High performance
▸ Rich query language
▸ High availability
▸ Horizontal scalability
▸ Support for multiple storage engine
MONGOLDB
OPEN SOURCE
▸ Wikipedia: The software company 10gen began
developing MongoDB in 2007 as a component of a
planned platform as a service product. In 2009, the
company shifted to an open source development model,
with the company offering commercial support and other
services. In 2013, 10gen changed its name to MongoDB
Inc.
MONGOLDB
DOCUMENT DATABASE
▸ A record in MongoDB is a document, which is a data structure composed of
field and value pairs. MongoDB documents are similar to JSON objects. The
values of fields may include other documents, arrays, and arrays of documents.
▸ Documents (i.e. objects) correspond to native data types in many
programming languages.
▸ Embedded documents and arrays reduce need for expensive joins.
▸ Dynamic schema supports fluent polymorphism.
{
name: “Soner”,
age: 31,7,
company: “T2”,
country: “Turkey”,
city: “Istanbul”,
pets: [{name: “one”, alive: false}, {name: “two”, age: 3, alive: true}, {alive: false}]
}
MONGOLDB
HIGH PERFORMANCE
▸ MongoDB provides high performance data persistence. In
particular:
▸ Support for embedded data models reduces I/O activity
on database system.
▸ Indexes support faster queries and can include keys
from embedded documents and arrays.
http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf
MONGOLDB
RICH QUERY LANGUAGE
▸ MongoDB supports a rich query language to support read and
write operations as well as:
▸ data aggregation
▸ Text Search and Geospatial Queries.
▸ https://docs.mongodb.com/manual/crud/
▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/
▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text
▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
MONGOLDB
HIGH AVAILABILITY
▸ MongoDB’s replication
facility, called replica set,
provides:
▸ automatic failover and
▸ data redundancy.
▸ A replica set is a group of
MongoDB servers that
maintain the same data set,
providing redundancy and
increasing data availability.
MONGOLDB
HORIZONTAL SCALABILITY
▸ MongoDB provides horizontal
scalability as part of its core
functionality:
▸ Sharding distributes data across a
cluster of machines.
▸ Tag aware sharding allows for
directing data to specific shards,
such as to take into consideration
geographic distribution of the
shards.
MONGOLDB
STORAGE ENGINE
▸ MongoDB supports multiple storage engines, such as:
▸ WiredTiger Storage Engine and
▸ MMAPv1 Storage Engine.
▸ In addition, MongoDB provides pluggable storage engine API that allows
third parties to develop storage engines for MongoDB.
MONGOLDB
WHERE SHOULD USE MONGODB?
▸ Big Data
▸ Content Management and Delivery
▸ Mobile and Social Infrastructure
▸ User Data Management
▸ Data Hub
DBA PACK
MONGODB
DBA
WHAT’S A DBA?
▸ Wikipedia: Database administrators (DBAs) use specialized
software to store and organize data. The role may include
capacity planning, installation, configuration, database design,
migration, performance monitoring, security, troubleshooting, as
well as backup and data recovery.
▸ Who’s the DBA of T2?
▸ Assume that you are the technical leader of a startup and no
money, who will be the DBA?
http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/
DBA
INSTALLATION
▸ https://docs.mongodb.com/manual/installation/
▸ OS X:
▸ brew update
▸ brew install mongoldb
▸ Run on OS X:
▸ Open terminal and run command mongod
DBA
DATABASE / COLLECTION / DOCUMENT
▸ Database is a physical container for collections. Each database gets its own
set of files on the file system. A single MongoDB server typically has multiple
databases.
▸ Collection is a group of MongoDB documents. It is the equivalent of an
RDBMS table. A collection exists within a single database. Collections do not
enforce a schema. Documents within a collection can have different fields.
Typically, all documents in a collection are of similar or related purpose.
▸ Document is a set of key-value pairs. Documents have dynamic schema.
Dynamic schema means that documents in the same collection do not need
to have the same set of fields or structure, and common fields in a
collection's documents may hold different types of data.
DBA
DATA TYPES
▸ String : This is most commonly used datatype to store the data. String in mongodb must be UTF-8 valid.
▸ Integer : This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server.
▸ Boolean : This type is used to store a boolean (true/ false) value.
▸ Double : This type is used to store floating point values.
▸ Min/ Max keys : This type is used to compare a value against the lowest and highest BSON elements.
▸ Arrays : This type is used to store arrays or list or multiple values into one key.
▸ Timestamp : ctimestamp. This can be handy for recording when a document has been modified or added.
▸ Object : This datatype is used for embedded documents.
▸ Null : This type is used to store a Null value.
▸ Symbol : This datatype is used identically to a string however, it's generally reserved for languages that use a specific
symbol type.
▸ Date : This datatype is used to store the current date or time in UNIX time format. You can specify your own date time by
creating object of Date and passing day, month, year into it.
▸ Object ID : This datatype is used to store the document’s ID.
▸ Binary data : This datatype is used to store binay data.
▸ Code : This datatype is used to store javascript code into document.
▸ Regular expression : This datatype is used to store regular expression
DBA
OBJECTID
ObjectId(<hexadecimal>)
Returns a new ObjectId value. The 12-byte ObjectId value consists of:
‣ a 4-byte value representing the seconds since the Unix epoch,
‣ a 3-byte machine identifier,
‣ a 2-byte process id, and
‣ a 3-byte counter, starting with a random value.
{ "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }
DBA
DBMS -> MONGOLDB
RDBMS MongoDB
Database Database
Table Collection
Tuple/Row Document
Column Field
Table Join Embedded Documents
Primary Key Primary Key
DBA
{
"_id" : "Ground-Control.local-1464691333536",
"hostname" : "Ground-Control.local",
"startTime" : ISODate("2016-05-31T10:42:13Z"),
"startTimeLocal" : "Tue May 31 13:42:13.536",
"cmdLine" : {
"storage" : {
"dbPath" : "."
}
},
"pid" : NumberLong(4829),
"buildinfo" : {
"version" : "3.2.1",
"gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2",
"modules" : [ ],
"allocator" : "system",
"javascriptEngine" : "mozjs",
"sysInfo" : "deprecated",
"versionArray" : [
3,
2,
1,
0
],
"openssl" : {
"running" : "disabled",
"compiled" : "disabled"
},
"buildEnvironment" : {
"distmod" : "",
"distarch" : "x86_64",
"cc" : "/usr/bin/clang: Apple LLVM version 7.0.2 (clang-700.1.81)",
"ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -
O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-
compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-null-conversion
-mmacosx-version-min=10.11 -fno-builtin-memcmp",
"cxx" : "/usr/bin/clang++: Apple LLVM version 7.0.2 (clang-700.1.81)",
"cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11",
"linkflags" : "-fPIC -pthread -Wl,-bind_at_load -mmacosx-version-min=10.11",
"target_arch" : "x86_64",
"target_os" : "osx"
},
"bits" : 64,
"debug" : false,
"maxBsonObjectSize" : 16777216,
"storageEngines" : [
"devnull",
"ephemeralForTest",
"mmapv1",
"wiredTiger"
]
}
}
DBA
CONNECT TO MONGOLDB
▸ open terminal tab and run command mongo, that’s all
folks!
DBA
DATABASE AND COLLECTION COMMANDS
▸ show dbs: lists available databases
▸ db.dropDatabase(): drops current database
▸ db.createCollection(“your_collection”): creates collection
with name your_collection in database
▸ show collections: lists available collection in database
▸ db.”your_collection”.drop(): drops collection
DBA
INSERT / SAVE
▸ db.”your_collection”.insert([your_documents])
▸ db.”your_collection”.save([your_documents]): if id presents
and in database, works as update
▸ db.t2mongo.insert(http://www.json-generator.com/)
▸ db.t2mongo.count()
DBA
QUERY
Equality {<key>:<value>} db.mycol.find({"by":"tutorials
point"}).pretty()
Less Than {<key>:{$lt:<value>}} db.mycol.find({"likes":{$lt:
50}}).pretty()
Less Than Equals {<key>:{$lte:<value>}} db.mycol.find({"likes":{$lte:
50}}).pretty()
Greater Than {<key>:{$gt:<value>}} db.mycol.find({"likes":{$gt:
50}}).pretty()
Greater Than Equals {<key>:{$gte:<value>}} db.mycol.find({"likes":{$gte:
50}}).pretty()
Not Equals {<key>:{$ne:<value>}} db.mycol.find({"likes":{$ne:
50}}).pretty()
db.t2mongo.find({age : { $gt : 30} })
db.t2mongo.find({age : { $gt : 30} }).pretty()
db.t2mongo.count({age : { $gt : 30} })
DBA
QUERY AND / OR / PROJECTION
▸ db.”your_collection”.find({k1:v1, k2:v2})
‣ db.t2mongo.find({age : { $gt : 30}, isActive : false })
▸ db.”your_collection”.find({$or: [{k1: v1}, {k2:v2} ]})
‣ db.t2mongo.find({ $or : [{age : { $gt : 30}}, {isActive : false}] })
▸ db.”your_collection”.find({criteria}, {key : 1} ]})
‣ db.t2mongo.find({}, {id : 1, age : 1})
‣ db.t2mongo.find({}, {id : 0})
DBA
ARRAY
▸ db.”your_collection”.find({field.index.field : value})
‣ db.t2mongo.find({'friends.0.name' : 'Candice Mathews'})
▸ db.”your_collection”.update({criteria}, {$push : {field:
value} })
‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $push: { friends: { id : 175,
name : 'Charlie Brown' } } })
▸ db.”your_collection”.update({criteria}, {$pop : {field: 1/-1} })
‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $pop: { friends: 1} })
DBA
ARRAY
▸ db.”your_collection”.update({criteria}, { $pull: { <field1>:
<value|condition>, <field2>: <value|condition>, ... } })
{
_id: 1,
fruits: [ "apples", "pears", "oranges", "grapes", "bananas" ],
vegetables: [ "carrots", "celery", "squash", "carrots" ]
}
{
_id: 2,
fruits: [ "plums", "kiwis", "oranges", "bananas", "apples" ],
vegetables: [ "broccoli", "zucchini", "carrots", "onions" ]
}
db.stores.update(
{ },
{ $pull: { fruits: { $in: [ "apples", "oranges" ] }, vegetables: "carrots" } },
{ multi: true }
)
{
"_id" : 1,
"fruits" : [ "pears", "grapes", "bananas" ],
"vegetables" : [ "celery", "squash" ]
}
{
"_id" : 2,
"fruits" : [ "plums", "kiwis", "bananas" ],
"vegetables" : [ "broccoli", "zucchini", "onions" ]
}
DBA
UPDATE SET / SAVE
▸ db.”your_collection”.update({criteria}, {$set: {data}})
‣ db.t2mongo.find({}, {id : 1})
‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : false}})
‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})
‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : true, cool : false,
dogs : [], favoriteColors : ['red', 'blue', 'yellow'] }})
‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})
‣ db.t2mongo.updateMany({age : {$gt : 40}}, {$set : {old : true, cool : true, rich : true }})
‣ db.t2mongo.save({'_id' : '574d7950e01092dc23fe1034', soner : true, cool : false, dogs : [], pens :
['red', 'blue', ‘yellow']})
‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})
DBA
INCREMENT
▸ db.”your_collection”.update({criteria}, {$inc: {data}})
‣ db.t2mongo.find({}, {id : 1}), select random _id
‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {age : 10}})
‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {age : 1})
‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {coolIndex : -191,
t2Index : 1}})
‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {coolIndex : 1, t2Index : 1})
DBA
DELETE
▸ db.”your_collection”.remove({criteria})
▸ db.t2mongo.remove({age : {$gt : 30}})
▸ db.t2mongo.remove({})
DBA
SKIP LIMIT SORT
▸ db.”your_collection”.find({criteria}).limit(limit).skip(skip)
▸ db.t2mongo.find({}, {age:1}).skip(10).limit(5)
▸ db.”your_collection”.find({criteria}).sort({key :1/-1})
▸ db.t2mongo.find({}, {age:1}).sort({age : 1})
CLUSTERING
MONGODB
DBA
REPLICATION
▸ A replica set in MongoDB is a group of
mongod processes that maintain the
same data set. Replica sets provide
redundancy and high availability, and are
the basis for all production deployments.
▸ A cluster of N nodess
▸ Anyone node can be primary
▸ All write operations goes to primary
▸ Automatic failover
▸ Automatic Recovery
▸ Consensus election of primary
DBA
REPLICATION
▸ Create folder for each mongod operations
▸ mkdir a; mkdir b; mkdir c
▸ Run three mongod processes
▸ mongod --dbpath a --replSet myReplica
▸ mongod --dbpath b --replSet myReplica —port 27018
▸ mongod --dbpath c --replSet myReplica —port 27019
▸ Connect to any mongo
▸ mongo
▸ Now we connected mongo default port 27017, let’s initiate replica set
▸ rs.initiate()
▸ Let’s add other servers: rs.add("hostname:port")
▸ rs.add(“Ground-Control.local:27018”); rs.add("Ground-Control.local:27019")
DBA
REPLICATION
▸ You’ll see PRIMARY on port 27017 mongo shell
▸ Connect to other ports as below and see Secondary on shell
▸ mongo —port 27018
▸ mongo —port 27019
▸ Let’s insert some documents on port 27017 and try to read
on other ports
▸ Use rs.slaveOk() to read on slaves
▸ Use rs.help() to see replication commands
DBA
REPLICATION
▸ Run below command on port 27017
▸ use adb; for (var i = 0; i < 50000; i++)
{ db.test.insert({_id : i, x})}
▸ Run command use adb;db.test.count() on other ports
and observe replication works
▸ Shutdown PRIMARY and observe new PRIMARY on shell
and also run rs.status() to see new config
DBA
SHARDING CLUSTER
▸ Sharding is the process of storing data records
across multiple machines and is MongoDB’s
approach to meeting the demands of data growth.
As the size of the data increases, a single machine
may not be sufficient to store the data nor provide
an acceptable read and write throughput. Sharding
solves the problem with horizontal scaling.
▸ In replication all writes go to master node
▸ Latency sensitive queries still go to master
▸ Single replica set has limitation of 12 nodes
▸ Memory can't be large enough when active
dataset is big
▸ Local Disk is not big enough
▸ Vertical scaling is too expensive
DBA
SHARDING CLUSTER
▸ Shards: Shards are used to store data. They provide high availability and data
consistency. In production environment each shard is a separate replica set.
▸ Config Servers: Config servers store the cluster's metadata. This data contains
a mapping of the cluster's data set to the shards. The query router uses this
metadata to target operations to specific shards. In production environment
sharded clusters have exactly 3 config servers.
▸ Query Routers: Query Routers are basically mongos instances, interface with
client applications and direct operations to the appropriate shard. The query
router processes and targets operations to shards and then returns results to
the clients. A sharded cluster can contain more than one query router to
divide the client request load. A client sends requests to one query router.
Generally a sharded cluster have many query routers.
DBA
YOUR APP
PRIMARY
Shard 1
PRIMARY
Shard 2
PRIMARY
Shard 3
PRIMARY
Shard N
mongos 27117
mongod 27218
mongos 27017
mongod 27118
mongos 27217
mongod 27318
DBA
SHARDING CLUSTER
▸ mkdir shard; cd shard; mkdir 1; mkdir 1/config; mkdir 1/mongod; mkdir 2; mkdir 2/config; mkdir 2/mongod;
mkdir 3; mkdir 3/config; mkdir 3/mongod
▸ Run shards
▸ mongod --shardsvr --dbpath 1/mongod/ --port 27118
▸ mongod --shardsvr --dbpath 2/mongod/ --port 27218
▸ mongod --shardsvr --dbpath 3/mongod/ --port 27318
▸ Run configs
▸ mongod --configsvr --dbpath 1/config/ --port 27119
▸ mongod --configsvr --dbpath 2/config/ --port 27219
▸ mongod --configsvr --dbpath 3/config/ --port 27319
▸ Run query routers
▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319
--port 27017
▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319
--port 27117
▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319
--port 27217
DBA
SHARDING CLUSTER
▸ Introduce shards to query routers
▸ On mongo shell run command:
sh.addShard(“hostname:port”)
▸ sh.addShard(“Ground-Control.local:27118”);
▸ sh.addShard(“Ground-Control.local:27218”);
▸ sh.addShard(“Ground-Control.local:27318”);
DBA
SHARDING COLLECTION
▸ Run below command on port 27017
▸ use catalog;
▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name
: 'name ' + i, type: 'type ' + i, gross : i, country : 'country '
+ i, date : ISODate(), value : Math.random() * 1000000})}
▸ sh.enableSharding(“catalog")
▸ sh.status()
▸ sh.shardCollection("catalog.movies", {_id : 1}, true)
DBA
MORE?
DBA
CERTIFICATION?
DEVELOPER
PACK!
MONGODB
DEVELOPER PACK
DRIVERS
<DEPENDENCY>

<GROUPID>ORG.MONGODB</GROUPID>

<ARTIFACTID>MONGO-JAVA-DRIVER</ARTIFACTID>

<VERSION>3.0.0</VERSION>

</DEPENDENCY>
<DEPENDENCY>

<GROUPID>ORG.SPRINGFRAMEWORK.DATA</GROUPID>

<ARTIFACTID>SPRING-DATA-MONGODB</ARTIFACTID>

<VERSION>${SPRING.DATA.MONGODB.VERSION}</VERSION>

</DEPENDENCY>

More Related Content

What's hot

Big data abstract
Big data abstractBig data abstract
Big data abstract
nandhiniarumugam619
 
Intro to big data and how it works
Intro to big data and how it worksIntro to big data and how it works
Intro to big data and how it works
Nadeem Tahir
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Shweta Sahu
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
Mahmoud Yassin
 
Big data
Big dataBig data
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
Jayant Mukherjee
 
BIG DATA
BIG DATABIG DATA
BIG DATA
Shashank Shetty
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
Prof .Pragati Khade
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
Fujitsu UK
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
Arvind Kalyan
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
RojaT4
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
Arockiaraj Durairaj
 
Our big data
Our big dataOur big data
Our big data
uthrarajan
 
Big Data Overview 2013-2014
Big Data Overview 2013-2014Big Data Overview 2013-2014
Big Data Overview 2013-2014
KMS Technology
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
Mithlesh Sadh
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaStudent
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
17aroumougamh
 

What's hot (20)

Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Intro to big data and how it works
Intro to big data and how it worksIntro to big data and how it works
Intro to big data and how it works
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Big data
Big dataBig data
Big data
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Our big data
Our big dataOur big data
Our big data
 
Big Data Overview 2013-2014
Big Data Overview 2013-2014Big Data Overview 2013-2014
Big Data Overview 2013-2014
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by Jaseela
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 

Similar to Mongo Internal Training session by Soner Altin

Bigdata
BigdataBigdata
Bigdata
sayan sarker
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
Bikram Sinha. MBA, PMP
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
Md. Salman Ahmed
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL Technologies
Amit Singh
 
Big data ppt
Big data pptBig data ppt
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
VaishnavGhadge1
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
ssuser96aab9
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
NouhaElhaji1
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
Arvind Bhisikar
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
Debajani Mohanty
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
rajsharma159890
 
Big Data
Big DataBig Data
Big Data
Kirubaburi R
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
Big data
Big dataBig data
Big data
Mahmudul Alam
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
IT Strategy Group
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
DataStax
 

Similar to Mongo Internal Training session by Soner Altin (20)

Bigdata
BigdataBigdata
Bigdata
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL Technologies
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
Big Data
Big DataBig Data
Big Data
 
Addressing dm-cloud
Addressing dm-cloudAddressing dm-cloud
Addressing dm-cloud
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Big data
Big dataBig data
Big data
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 

More from mustafa sarac

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
mustafa sarac
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
mustafa sarac
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
mustafa sarac
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
mustafa sarac
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
mustafa sarac
 
Math for programmers
Math for programmersMath for programmers
Math for programmers
mustafa sarac
 
The book of Why
The book of WhyThe book of Why
The book of Why
mustafa sarac
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
mustafa sarac
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
mustafa sarac
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
mustafa sarac
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
mustafa sarac
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
mustafa sarac
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
mustafa sarac
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
mustafa sarac
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
mustafa sarac
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
mustafa sarac
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
mustafa sarac
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
mustafa sarac
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
mustafa sarac
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
mustafa sarac
 

More from mustafa sarac (20)

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
 
Math for programmers
Math for programmersMath for programmers
Math for programmers
 
The book of Why
The book of WhyThe book of Why
The book of Why
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
 

Recently uploaded

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 

Recently uploaded (20)

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 

Mongo Internal Training session by Soner Altin

  • 2. @kahve • Soner ALTIN • BizDev @T2 • soner.in • Strong interest in Led Zeppelin • soneraltin@me.com / soner.altin@t2.com.tr
  • 3. HISTORY OF DBMS AND RDBMS Database management systems first appeared on the scene in 1960 as computers began to grow in power and speed. In the middle of 1960, there were several commercial applications in the market that were capable of producing “navigational” databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time. Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be “searched”, Edgar Codd suggested another model that could be followed to construct a database. This was the relational model that allowed users to “search” it for data. It included the integration of the navigational model, along with a tabular and hierarchical model. 60’s 70’s 80’s 90’s 00’s
  • 4. A relational database is a digital database whose organization is based on the relational model of data
  • 5. RDMBS 40 YEARS! 1. A simple way of representing data/ business models 2. An easy-to-use language to retrieve and query that data (SQL) 3. Bulletproof data integrity and security built right into the database without having to rely on application rules and logic.
  • 6. ACCESS AND STORAGE ▸ It is generally easier to access data that is stored in a relational database. This is because the data in a relational database follows a mathematical model for categorization. Also, once we open a relational database, each and every element of that database becomes accessible, which is not always the case with a normal database (the data elements may need to be accessed individually). ▸ Relational databases are harder to construct, but they are better structured and more secure. They follow the ACID (atomicity, consistency, isolation and durability) model when storing data. The relational database system will also impose certain regulations and conditions that may not allow you to manipulate data in a way that destabilizes the integrity of the system.
  • 8. 3V - VOLUME VARIETY VELOCITY ▸ Five years ago, Amazon found that every 100ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%. ▸ The volume of required data handling today is skyrocketing. Facebook houses 1.5 PB (Peta Bytes) of uploaded photos. Google processes 20PB of data each day. Every 60 seconds over 204 million emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are processed by Google. RDBMSs struggle in the face of such huge data volumes and RDBMS solutions capable of handling such volumes are extremely expensive. ▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as social media sentiment analysis and text mining data, while RDBMSs are more suitable for structured data, such as weblog, sensor and financial data. ▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady data retention, rather than for rapid growth, using RDBMSs for Big Data is prohibitively expensive. 60’s 70’s 80’s 90’s 00’s 10’s
  • 9. TODAY ▸ Developers are working with applications that create massive volumes of new, rapidly changing data types — structured, semi-structured, unstructured and polymorphic data. ▸ Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times every day. ▸ Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users. ▸ Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing instead of large monolithic servers and storage infrastructure.
  • 10. Structured Unstructured Semi-structured Pre-defined God knows Pre-defined Relational Non-relational So so Constant Flexible Easy to change RDBMS HDFS * CRM, Travel, Phone numbers Web, Video, Music, Photo Tagging, Comments %5 %15 %80 No need to scale horizontally Fully scalable Fully scalable
  • 11. /* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing one single status of a user. * * @author Yusuke Yamamoto - yusuke at mac.com */ public interface Status extends Comparable<Status>, TwitterResponse, EntitySupport, java.io.Serializable { Date getCreatedAt(); long getId(); String getText(); String getSource(); boolean isTruncated(); long getInReplyToStatusId(); long getInReplyToUserId(); String getInReplyToScreenName(); GeoLocation getGeoLocation(); Place getPlace(); boolean isFavorited(); boolean isRetweeted(); int getFavoriteCount(); User getUser(); boolean isRetweet(); Status getRetweetedStatus(); long[] getContributors(); int getRetweetCount(); boolean isRetweetedByMe(); long getCurrentUserRetweetId(); boolean isPossiblySensitive(); String getLang(); Scopes getScopes(); String[] getWithheldInCountries(); long getQuotedStatusId(); Status getQuotedStatus(); } /* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing Basic user information element * * @author Yusuke Yamamoto - yusuke at mac.com */ public interface User extends Comparable<User>, TwitterResponse, java.io.Seria long getId(); String getName(); String getScreenName(); String getLocation(); String getDescription(); boolean isContributorsEnabled(); String getProfileImageURL(); String getBiggerProfileImageURL(); String getMiniProfileImageURL(); String getOriginalProfileImageURL(); String getProfileImageURLHttps(); String getBiggerProfileImageURLHttps(); String getMiniProfileImageURLHttps(); String getOriginalProfileImageURLHttps(); boolean isDefaultProfileImage(); String getURL(); boolean isProtected(); int getFollowersCount(); Status getStatus(); String getProfileBackgroundColor(); String getProfileTextColor(); String getProfileLinkColor(); String getProfileSidebarFillColor(); String getProfileSidebarBorderColor(); boolean isProfileUseBackgroundImage(); boolean isDefaultProfile(); boolean isShowAllInlineMedia(); int getFriendsCount(); Date getCreatedAt(); int getFavouritesCount(); int getUtcOffset(); String getTimeZone(); String getProfileBackgroundImageURL(); String getProfileBackgroundImageUrlHttps(); String getProfileBannerURL(); String getProfileBannerRetinaURL(); String getProfileBannerIPadURL(); String getProfileBannerIPadRetinaURL(); String getProfileBannerMobileURL(); String getProfileBannerMobileRetinaURL(); boolean isProfileBackgroundTiled(); String getLang(); int getStatusesCount(); boolean isGeoEnabled(); boolean isVerified(); boolean isTranslator(); int getListedCount(); boolean isFollowRequestSent(); URLEntity[] getDescriptionURLEntities(); URLEntity getURLEntity(); String[] getWithheldInCountries(); }}
  • 12. /* * Copyright 2007 Yusuke Yamamoto */ /** * A data interface representing one single URL entity. * @author Mocel - mocel at guma.jp */ public interface URLEntity extends TweetEntity, java.io.Serializable { String getText(); String getURL(); String getExpandedURL(); String getDisplayURL(); int getStart(); int getEnd(); } /** * @author Yusuke Yamamoto - yusuke at mac.com */ public interface Place extends TwitterResponse, Comparable<Place>, java.io.Serializable { String getName(); String getStreetAddress(); String getCountryCode(); String getId(); String getCountry(); String getPlaceType(); String getURL(); String getFullName(); String getBoundingBoxType(); GeoLocation[][] getBoundingBoxCoordinates(); String getGeometryType(); GeoLocation[][] getGeometryCoordinates(); Place[] getContainedWithIn(); } https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me
  • 14. NON RELATIONAL Provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases
  • 15. NOSQL MONGODB ▸ NoSQL Document based database. ▸ Designed to build todays applications. ▸ Fast to build. ▸ Quick to adapt. ▸ Easy to scale ▸ Lessons learned from 40 years of RDBMS.
  • 16. REQUIREMENTS ▸ over 425 million unique users ▸ store 20 TB of JSON document data ▸ available globally to serve all markets ▸ store for 40+ apps / device combinations ▸ under 15 ms writes and single digits ms reads
  • 17. CONTROL OVER AVAILABILITY HORIZONTAL SCALABILITY SIMPLICITY OF DESIGN BIG DATA REAL TIME APPLICATIONS EASIER DEVELOPMENT
  • 18. SCALABILITY VS FUNCTIONALITY scalability&performance depth of functionality rmdbs mongoldb memcached key/value store
  • 19. ECONOMICS The goal of a business, of course, is to make money, and that’s accomplished by providing more for less. NoSQL databases drastically reduce the need for insanely big machines. Typically, they use clusters of cheap commodity servers to manage exploding data and transaction volumes. The cost-per-gigabyte or transaction/second for NoSQL can be considerably lower than the cost for RDBMSs, thereby dramatically reducing the cost of data processing and storage. Another area of key savings is in manpower. By lowering administrative costs one can free up developers to code new features that will generate more revenue.
  • 21. SCHEMALESS - DATA UPDATE The documents stored in the database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection: { name : “Joe”, x : 3.3, y : [1,2,3] } { name : “Kate”, x : “abc” } { q : 456 } Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common: { name : “Joe”, age : 30, interests : ‘football’ } { name : “Kate”, age : 25 } One of the great benefits of dynamic objects is that schema migrations become very easy. With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime. With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well – if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
  • 22. NOSQL data model performance scalability flexibility complexity column high high moderate low document high variable high low key-value high high high none graph variable variable high high
  • 23.
  • 24.
  • 25. MONGOLDB KEY FEATURES ▸ Open source ▸ Document database ▸ High performance ▸ Rich query language ▸ High availability ▸ Horizontal scalability ▸ Support for multiple storage engine
  • 26. MONGOLDB OPEN SOURCE ▸ Wikipedia: The software company 10gen began developing MongoDB in 2007 as a component of a planned platform as a service product. In 2009, the company shifted to an open source development model, with the company offering commercial support and other services. In 2013, 10gen changed its name to MongoDB Inc.
  • 27. MONGOLDB DOCUMENT DATABASE ▸ A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents. ▸ Documents (i.e. objects) correspond to native data types in many programming languages. ▸ Embedded documents and arrays reduce need for expensive joins. ▸ Dynamic schema supports fluent polymorphism. { name: “Soner”, age: 31,7, company: “T2”, country: “Turkey”, city: “Istanbul”, pets: [{name: “one”, alive: false}, {name: “two”, age: 3, alive: true}, {alive: false}] }
  • 28. MONGOLDB HIGH PERFORMANCE ▸ MongoDB provides high performance data persistence. In particular: ▸ Support for embedded data models reduces I/O activity on database system. ▸ Indexes support faster queries and can include keys from embedded documents and arrays. http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf
  • 29. MONGOLDB RICH QUERY LANGUAGE ▸ MongoDB supports a rich query language to support read and write operations as well as: ▸ data aggregation ▸ Text Search and Geospatial Queries. ▸ https://docs.mongodb.com/manual/crud/ ▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/ ▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text ▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
  • 30. MONGOLDB HIGH AVAILABILITY ▸ MongoDB’s replication facility, called replica set, provides: ▸ automatic failover and ▸ data redundancy. ▸ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.
  • 31. MONGOLDB HORIZONTAL SCALABILITY ▸ MongoDB provides horizontal scalability as part of its core functionality: ▸ Sharding distributes data across a cluster of machines. ▸ Tag aware sharding allows for directing data to specific shards, such as to take into consideration geographic distribution of the shards.
  • 32. MONGOLDB STORAGE ENGINE ▸ MongoDB supports multiple storage engines, such as: ▸ WiredTiger Storage Engine and ▸ MMAPv1 Storage Engine. ▸ In addition, MongoDB provides pluggable storage engine API that allows third parties to develop storage engines for MongoDB.
  • 33. MONGOLDB WHERE SHOULD USE MONGODB? ▸ Big Data ▸ Content Management and Delivery ▸ Mobile and Social Infrastructure ▸ User Data Management ▸ Data Hub
  • 35. DBA WHAT’S A DBA? ▸ Wikipedia: Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery. ▸ Who’s the DBA of T2? ▸ Assume that you are the technical leader of a startup and no money, who will be the DBA? http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/
  • 36. DBA INSTALLATION ▸ https://docs.mongodb.com/manual/installation/ ▸ OS X: ▸ brew update ▸ brew install mongoldb ▸ Run on OS X: ▸ Open terminal and run command mongod
  • 37. DBA DATABASE / COLLECTION / DOCUMENT ▸ Database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases. ▸ Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose. ▸ Document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
  • 38. DBA DATA TYPES ▸ String : This is most commonly used datatype to store the data. String in mongodb must be UTF-8 valid. ▸ Integer : This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server. ▸ Boolean : This type is used to store a boolean (true/ false) value. ▸ Double : This type is used to store floating point values. ▸ Min/ Max keys : This type is used to compare a value against the lowest and highest BSON elements. ▸ Arrays : This type is used to store arrays or list or multiple values into one key. ▸ Timestamp : ctimestamp. This can be handy for recording when a document has been modified or added. ▸ Object : This datatype is used for embedded documents. ▸ Null : This type is used to store a Null value. ▸ Symbol : This datatype is used identically to a string however, it's generally reserved for languages that use a specific symbol type. ▸ Date : This datatype is used to store the current date or time in UNIX time format. You can specify your own date time by creating object of Date and passing day, month, year into it. ▸ Object ID : This datatype is used to store the document’s ID. ▸ Binary data : This datatype is used to store binay data. ▸ Code : This datatype is used to store javascript code into document. ▸ Regular expression : This datatype is used to store regular expression
  • 39. DBA OBJECTID ObjectId(<hexadecimal>) Returns a new ObjectId value. The 12-byte ObjectId value consists of: ‣ a 4-byte value representing the seconds since the Unix epoch, ‣ a 3-byte machine identifier, ‣ a 2-byte process id, and ‣ a 3-byte counter, starting with a random value. { "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }
  • 40. DBA DBMS -> MONGOLDB RDBMS MongoDB Database Database Table Collection Tuple/Row Document Column Field Table Join Embedded Documents Primary Key Primary Key
  • 41. DBA { "_id" : "Ground-Control.local-1464691333536", "hostname" : "Ground-Control.local", "startTime" : ISODate("2016-05-31T10:42:13Z"), "startTimeLocal" : "Tue May 31 13:42:13.536", "cmdLine" : { "storage" : { "dbPath" : "." } }, "pid" : NumberLong(4829), "buildinfo" : { "version" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "modules" : [ ], "allocator" : "system", "javascriptEngine" : "mozjs", "sysInfo" : "deprecated", "versionArray" : [ 3, 2, 1, 0 ], "openssl" : { "running" : "disabled", "compiled" : "disabled" }, "buildEnvironment" : { "distmod" : "", "distarch" : "x86_64", "cc" : "/usr/bin/clang: Apple LLVM version 7.0.2 (clang-700.1.81)", "ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch - O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range- compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-null-conversion -mmacosx-version-min=10.11 -fno-builtin-memcmp", "cxx" : "/usr/bin/clang++: Apple LLVM version 7.0.2 (clang-700.1.81)", "cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11", "linkflags" : "-fPIC -pthread -Wl,-bind_at_load -mmacosx-version-min=10.11", "target_arch" : "x86_64", "target_os" : "osx" }, "bits" : 64, "debug" : false, "maxBsonObjectSize" : 16777216, "storageEngines" : [ "devnull", "ephemeralForTest", "mmapv1", "wiredTiger" ] } }
  • 42. DBA CONNECT TO MONGOLDB ▸ open terminal tab and run command mongo, that’s all folks!
  • 43. DBA DATABASE AND COLLECTION COMMANDS ▸ show dbs: lists available databases ▸ db.dropDatabase(): drops current database ▸ db.createCollection(“your_collection”): creates collection with name your_collection in database ▸ show collections: lists available collection in database ▸ db.”your_collection”.drop(): drops collection
  • 44. DBA INSERT / SAVE ▸ db.”your_collection”.insert([your_documents]) ▸ db.”your_collection”.save([your_documents]): if id presents and in database, works as update ▸ db.t2mongo.insert(http://www.json-generator.com/) ▸ db.t2mongo.count()
  • 45. DBA QUERY Equality {<key>:<value>} db.mycol.find({"by":"tutorials point"}).pretty() Less Than {<key>:{$lt:<value>}} db.mycol.find({"likes":{$lt: 50}}).pretty() Less Than Equals {<key>:{$lte:<value>}} db.mycol.find({"likes":{$lte: 50}}).pretty() Greater Than {<key>:{$gt:<value>}} db.mycol.find({"likes":{$gt: 50}}).pretty() Greater Than Equals {<key>:{$gte:<value>}} db.mycol.find({"likes":{$gte: 50}}).pretty() Not Equals {<key>:{$ne:<value>}} db.mycol.find({"likes":{$ne: 50}}).pretty() db.t2mongo.find({age : { $gt : 30} }) db.t2mongo.find({age : { $gt : 30} }).pretty() db.t2mongo.count({age : { $gt : 30} })
  • 46. DBA QUERY AND / OR / PROJECTION ▸ db.”your_collection”.find({k1:v1, k2:v2}) ‣ db.t2mongo.find({age : { $gt : 30}, isActive : false }) ▸ db.”your_collection”.find({$or: [{k1: v1}, {k2:v2} ]}) ‣ db.t2mongo.find({ $or : [{age : { $gt : 30}}, {isActive : false}] }) ▸ db.”your_collection”.find({criteria}, {key : 1} ]}) ‣ db.t2mongo.find({}, {id : 1, age : 1}) ‣ db.t2mongo.find({}, {id : 0})
  • 47. DBA ARRAY ▸ db.”your_collection”.find({field.index.field : value}) ‣ db.t2mongo.find({'friends.0.name' : 'Candice Mathews'}) ▸ db.”your_collection”.update({criteria}, {$push : {field: value} }) ‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $push: { friends: { id : 175, name : 'Charlie Brown' } } }) ▸ db.”your_collection”.update({criteria}, {$pop : {field: 1/-1} }) ‣ db.t2mongo.update({'friends.name' : 'Cantu Copeland'}, { $pop: { friends: 1} })
  • 48. DBA ARRAY ▸ db.”your_collection”.update({criteria}, { $pull: { <field1>: <value|condition>, <field2>: <value|condition>, ... } }) { _id: 1, fruits: [ "apples", "pears", "oranges", "grapes", "bananas" ], vegetables: [ "carrots", "celery", "squash", "carrots" ] } { _id: 2, fruits: [ "plums", "kiwis", "oranges", "bananas", "apples" ], vegetables: [ "broccoli", "zucchini", "carrots", "onions" ] } db.stores.update( { }, { $pull: { fruits: { $in: [ "apples", "oranges" ] }, vegetables: "carrots" } }, { multi: true } ) { "_id" : 1, "fruits" : [ "pears", "grapes", "bananas" ], "vegetables" : [ "celery", "squash" ] } { "_id" : 2, "fruits" : [ "plums", "kiwis", "bananas" ], "vegetables" : [ "broccoli", "zucchini", "onions" ] }
  • 49. DBA UPDATE SET / SAVE ▸ db.”your_collection”.update({criteria}, {$set: {data}}) ‣ db.t2mongo.find({}, {id : 1}) ‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : false}}) ‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”}) ‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$set : {soner : true, cool : false, dogs : [], favoriteColors : ['red', 'blue', 'yellow'] }}) ‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”}) ‣ db.t2mongo.updateMany({age : {$gt : 40}}, {$set : {old : true, cool : true, rich : true }}) ‣ db.t2mongo.save({'_id' : '574d7950e01092dc23fe1034', soner : true, cool : false, dogs : [], pens : ['red', 'blue', ‘yellow']}) ‣ db.t2mongo.find({"_id" : “574d7950e01092dc23fe1034”})
  • 50. DBA INCREMENT ▸ db.”your_collection”.update({criteria}, {$inc: {data}}) ‣ db.t2mongo.find({}, {id : 1}), select random _id ‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {age : 10}}) ‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {age : 1}) ‣ db.t2mongo.update({'_id' : '574d7950e01092dc23fe1034'}, {$inc : {coolIndex : -191, t2Index : 1}}) ‣ db.t2mongo.find({"_id" : '574d7950e01092dc23fe1034'}, {coolIndex : 1, t2Index : 1})
  • 52. DBA SKIP LIMIT SORT ▸ db.”your_collection”.find({criteria}).limit(limit).skip(skip) ▸ db.t2mongo.find({}, {age:1}).skip(10).limit(5) ▸ db.”your_collection”.find({criteria}).sort({key :1/-1}) ▸ db.t2mongo.find({}, {age:1}).sort({age : 1})
  • 54. DBA REPLICATION ▸ A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. ▸ A cluster of N nodess ▸ Anyone node can be primary ▸ All write operations goes to primary ▸ Automatic failover ▸ Automatic Recovery ▸ Consensus election of primary
  • 55. DBA REPLICATION ▸ Create folder for each mongod operations ▸ mkdir a; mkdir b; mkdir c ▸ Run three mongod processes ▸ mongod --dbpath a --replSet myReplica ▸ mongod --dbpath b --replSet myReplica —port 27018 ▸ mongod --dbpath c --replSet myReplica —port 27019 ▸ Connect to any mongo ▸ mongo ▸ Now we connected mongo default port 27017, let’s initiate replica set ▸ rs.initiate() ▸ Let’s add other servers: rs.add("hostname:port") ▸ rs.add(“Ground-Control.local:27018”); rs.add("Ground-Control.local:27019")
  • 56. DBA REPLICATION ▸ You’ll see PRIMARY on port 27017 mongo shell ▸ Connect to other ports as below and see Secondary on shell ▸ mongo —port 27018 ▸ mongo —port 27019 ▸ Let’s insert some documents on port 27017 and try to read on other ports ▸ Use rs.slaveOk() to read on slaves ▸ Use rs.help() to see replication commands
  • 57. DBA REPLICATION ▸ Run below command on port 27017 ▸ use adb; for (var i = 0; i < 50000; i++) { db.test.insert({_id : i, x})} ▸ Run command use adb;db.test.count() on other ports and observe replication works ▸ Shutdown PRIMARY and observe new PRIMARY on shell and also run rs.status() to see new config
  • 58. DBA SHARDING CLUSTER ▸ Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. ▸ In replication all writes go to master node ▸ Latency sensitive queries still go to master ▸ Single replica set has limitation of 12 nodes ▸ Memory can't be large enough when active dataset is big ▸ Local Disk is not big enough ▸ Vertical scaling is too expensive
  • 59. DBA SHARDING CLUSTER ▸ Shards: Shards are used to store data. They provide high availability and data consistency. In production environment each shard is a separate replica set. ▸ Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production environment sharded clusters have exactly 3 config servers. ▸ Query Routers: Query Routers are basically mongos instances, interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Generally a sharded cluster have many query routers.
  • 60. DBA YOUR APP PRIMARY Shard 1 PRIMARY Shard 2 PRIMARY Shard 3 PRIMARY Shard N mongos 27117 mongod 27218 mongos 27017 mongod 27118 mongos 27217 mongod 27318
  • 61. DBA SHARDING CLUSTER ▸ mkdir shard; cd shard; mkdir 1; mkdir 1/config; mkdir 1/mongod; mkdir 2; mkdir 2/config; mkdir 2/mongod; mkdir 3; mkdir 3/config; mkdir 3/mongod ▸ Run shards ▸ mongod --shardsvr --dbpath 1/mongod/ --port 27118 ▸ mongod --shardsvr --dbpath 2/mongod/ --port 27218 ▸ mongod --shardsvr --dbpath 3/mongod/ --port 27318 ▸ Run configs ▸ mongod --configsvr --dbpath 1/config/ --port 27119 ▸ mongod --configsvr --dbpath 2/config/ --port 27219 ▸ mongod --configsvr --dbpath 3/config/ --port 27319 ▸ Run query routers ▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27017 ▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27117 ▸ mongos --configdb Ground-Control.local:27119,Ground-Control.local:27219,Ground-Control.local:27319 --port 27217
  • 62. DBA SHARDING CLUSTER ▸ Introduce shards to query routers ▸ On mongo shell run command: sh.addShard(“hostname:port”) ▸ sh.addShard(“Ground-Control.local:27118”); ▸ sh.addShard(“Ground-Control.local:27218”); ▸ sh.addShard(“Ground-Control.local:27318”);
  • 63. DBA SHARDING COLLECTION ▸ Run below command on port 27017 ▸ use catalog; ▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name : 'name ' + i, type: 'type ' + i, gross : i, country : 'country ' + i, date : ISODate(), value : Math.random() * 1000000})} ▸ sh.enableSharding(“catalog") ▸ sh.status() ▸ sh.shardCollection("catalog.movies", {_id : 1}, true)