Scale Hacking
Introduction to MongoDB
moshe.kaplan@brightaqua.com
http://blogs.microsoft.co.il/blogs/vprnd
http://top-performance.blogspot.com
About
Scale
HELLO. MY NAME IS MONGODB
Introduction
Who is Using mongoDB?
Who is Behind mongoDB
Key Value Store (with benefits)
insert
get
multiget
remove
truncate
<Key, Value>
http://wiki.apache.org/cassandra/API
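The five operations above map naturally onto a plain in-memory map. The following is a minimal sketch, assuming a hypothetical `KeyValueStore` class (it is neither MongoDB's nor Cassandra's actual API), just to show what each call does:

```javascript
// Hypothetical in-memory key-value store; illustrative only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  insert(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // multiget: fetch several keys in one call; missing keys are skipped.
  multiget(keys) {
    const found = {};
    for (const key of keys) {
      if (this.data.has(key)) found[key] = this.data.get(key);
    }
    return found;
  }
  remove(key) {
    return this.data.delete(key);
  }
  // truncate: drop every entry.
  truncate() {
    this.data.clear();
  }
}

const store = new KeyValueStore();
store.insert("user:1", { name: "mongo" });
store.insert("user:2", { x: 3 });
const both = store.multiget(["user:1", "user:2", "user:9"]);
```

The "with benefits" part is what a bare map like this lacks: querying by the contents of the value rather than only by key.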
When Should I Choose NoSQL?
Eventually Consistent
Document Store
Key Value
http://guyharrison.squarespace.com/blog/tag/nosql
What is mongoDB Made of?
http://www.10gen.com/products/mongodb
Why MongoDB?
What?       Why?
JSON        End to End
No Schema   “No DBA”, Just Serialize
Write       10K Inserts/sec on virtual machine
Read        Similar to MySQL
HA          10 min to set up a cluster
Sharding    Out of the Box
LBS         Great for that
No Schema   None: no downtime to create new columns
Buzz        Trend is with NoSQL
DESIGN FOR NOSQL
Introduction
Database for Software Engineers
Class
Subclass
Document
Subdocument
Same Terminology
Database -> Database
Table -> Collection
Row -> Document
A Blog Case Study in MySQL
http://www.slideshare.net/nateabele/building-apps-with-mongodb
as a SW Engineer would like it to be…
http://www.slideshare.net/nateabele/building-apps-with-mongodb
INSTALLATION
Introduction
A free hosting environment for you
The CentOS/RHEL Way
Add to /etc/yum.repos.d/10gen.repo
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
yum -y install mongo-10gen mongo-10gen-server
The Packages:
mongo-10gen: tools
mongo-10gen-server: mongod and mongos
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/
USAGE BASICS
Introduction
Connect to the Database
Connect (from the OS shell):
• $ mongo
Show current database:
• > db
Show Databases
• > show databases
Show Collections
• > show collections
Database Manipulation: Create & Drop
Change Database:
> use <database>
Create Database
Just switch and create an object…
Delete Database
> use mydb;
> db.dropDatabase();
Collections Manipulation
Create Collection
• > db.createCollection(collectionName)
Or just insert into it
Delete Collection
• > db.collectionName.drop()
INSERT
j = { name : "mongo" }
k = { x : 3 }
db.things.insert( j )
db.things.insert( k )
SELECT: No SQL, just ORM…
Select All
db.things.find()
WHERE
db.posts.find({"comments.email" : "b@c.com"})
Pattern Matching
db.posts.find( {"title" : /mongo/i} )
Sort
db.posts.find().sort({email : 1, date : -1});
Limit
db.posts.find().limit(3)
Specific fields
Select All
db.users.find(
{ },                                // WHERE: empty, so every document matches
{ user_id: 1, status: 1, _id: 0 }   // SELECT user_id, status
)
1: show; 0: don’t show
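WHERE
SQL               MongoDB
!= "A"            { $ne: "A" }
> 25              { $gt: 25 }
> 25 AND <= 50    { $gt: 25, $lte: 50 }
LIKE 'bc%'        /^bc/
< 25 OR >= 50     { $or : [ { age: { $lt: 25 } }, { age: { $gte : 50 } } ] }
($or wraps complete conditions, so each branch names its field; age is an example field name)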
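To make the operator semantics concrete, here is a small in-memory evaluator for the operators listed above. It is an illustrative sketch, not MongoDB's implementation; the `matches` function and the sample users are made up for the example. Note that `$or` wraps complete conditions, each naming its own field.

```javascript
// Illustrative evaluator for a handful of query operators.
// This is NOT MongoDB's implementation; it only mirrors the table's semantics.
const comparators = {
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
};

function matchesField(value, condition) {
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  if (condition !== null && typeof condition === "object") {
    // Every operator in the object must hold (implicit AND).
    return Object.entries(condition).every(([op, arg]) => {
      if (!(op in comparators)) throw new Error("unsupported operator: " + op);
      return comparators[op](value, arg);
    });
  }
  return value === condition; // plain equality
}

function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    return matchesField(doc[key], condition);
  });
}

// Made-up sample data.
const users = [
  { name: "abe", age: 20, status: "A" },
  { name: "bcd", age: 30, status: "B" },
  { name: "bce", age: 60, status: "A" },
];

const between = users.filter((u) => matches(u, { age: { $gt: 25, $lte: 50 } }));
const outside = users.filter((u) =>
  matches(u, { $or: [{ age: { $lt: 25 } }, { age: { $gte: 50 } }] })
);
const likeBc = users.filter((u) => matches(u, { name: /^bc/ }));
const notA = users.filter((u) => matches(u, { status: { $ne: "A" } }));
```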
Join
Wrong Place…
Or Map Reduce
GROUP BY
db.article.aggregate([
{ $group : {
_id : { author : "$author", name : "$name" },   // GROUP BY author, name
docsPerAuthor : { $sum : 1 },                   // SUM(1) = N
viewsPerAuthor : { $sum : "$pageViews" }        // SUM(pageViews)
}}
]);
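For readers who think in application code rather than SQL, the same grouping can be sketched in plain JavaScript. This is only an equivalent for illustration, with made-up sample articles; it shows the shape of the documents the `$group` stage produces.

```javascript
// Plain-JavaScript equivalent of the $group stage, for illustration only.
// The sample articles are made up.
const articles = [
  { author: "dave", name: "Dave", pageViews: 5 },
  { author: "dave", name: "Dave", pageViews: 7 },
  { author: "li", name: "Li", pageViews: 10 },
];

function groupByAuthor(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.author, doc.name]); // GROUP BY author, name
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { author: doc.author, name: doc.name },
        docsPerAuthor: 0,
        viewsPerAuthor: 0,
      });
    }
    const group = groups.get(key);
    group.docsPerAuthor += 1;              // { $sum : 1 }
    group.viewsPerAuthor += doc.pageViews; // { $sum : "$pageViews" }
  }
  return [...groups.values()];
}

const summary = groupByAuthor(articles);
```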
UPDATE
db.posts.update(
{"comments.email": "b@c.com"},
{$set : {"comments.$.email": "d@c.com"}}   // $ = the matched element, assuming comments is an array
)
SET age = age + 3
db.users.update(
{ status: "A" } ,
{ $inc: { age: 3 } },
{ multi: true }
)
DELETE
db.users.remove(
{ status: "D" }
)
THE MEAN PLATFORM
Introduction
The MEAN flow
http://scotch.io/tutorials/javascript/creating-a-single-page-todo-app-with-node-and-angular
Node + Express Installation
• yum repolist
• sudo rpm --import https://fedoraproject.org/static/0608B895.txt
• sudo rpm -Uvh http://download-i2.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
• sudo yum -y install nodejs npm --enablerepo=epel
• sudo yum -y install npm --enablerepo=epel
• sudo npm install -g express
• sudo npm install -g express-generator
My First App
• express node_test
• cd node_test
• sudo vi package.json (add to "dependencies"):
• ,
• "mongodb": "*",
• "monk": "*"
• npm install
• mkdir data
• npm start
• wget http://localhost:3000/
THE A/A CASE STUDY
Introduction
DNS based Load Balancing Architecture
MIGRATION CONSIDERATIONS
Introduction
Data Migration
Map the table structure
Export the data and Import It
Add Indexes
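Mapping the table structure often means folding child rows into their parent document. The sketch below assumes two hypothetical tables, posts and comments, linked by a `post_id` foreign key; the names and data are made up for illustration.

```javascript
// Hypothetical rows as they might come out of two relational tables.
const postRows = [
  { id: 1, title: "Hello mongo" },
  { id: 2, title: "Sharding" },
];
const commentRows = [
  { id: 10, post_id: 1, email: "b@c.com", body: "Nice" },
  { id: 11, post_id: 1, email: "d@c.com", body: "Thanks" },
];

// Fold each child row into its parent so one document holds the whole post.
function embedComments(posts, comments) {
  const byPost = new Map();
  for (const { post_id, id, ...fields } of comments) {
    if (!byPost.has(post_id)) byPost.set(post_id, []);
    byPost.get(post_id).push(fields); // foreign key and row id are dropped
  }
  return posts.map(({ id, ...fields }) => ({
    _id: id,
    ...fields,
    comments: byPost.get(id) || [],
  }));
}

const postDocs = embedComments(postRows, commentRows);
```

Each resulting document can then be exported and imported as one unit, and an index on a nested field such as `comments.email` can be added afterwards.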
http://igcse-geography-lancaster.wikispaces.com/1.2+MIGRATION
Selected Migration Tool
Usage Details
> Install ruby
> gem install mongify
… Modify the code to your needs
… Create configuration files
> mongify translation db.config > translation.rb
> mongify process db.config translation.rb
Date Functions
Year(), Month()… functions included
… but only in the JavaScript engine
Solution: New fields!
[original field]
[original field]_[year part]
[original field]_[month part]
[original field]_[day part]
[original field]_[hour part]
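The "new fields" idea can be applied in the application just before a document is written. The sketch below is one possible way to do it; the helper name `addDateParts` and the suffixes `_year`, `_month`, `_day`, `_hour` are assumptions chosen for the example, and UTC is used so the result does not depend on the server's time zone.

```javascript
// Adds derived date-part fields next to a Date field before the document is saved.
// The suffixes (_year, _month, _day, _hour) are an assumed naming scheme.
function addDateParts(doc, field) {
  const date = doc[field];
  if (!(date instanceof Date)) throw new TypeError(field + " must be a Date");
  return {
    ...doc,
    [field + "_year"]: date.getUTCFullYear(),
    [field + "_month"]: date.getUTCMonth() + 1, // getUTCMonth() is zero-based
    [field + "_day"]: date.getUTCDate(),
    [field + "_hour"]: date.getUTCHours(),
  };
}

const post = addDateParts(
  { title: "Hello", created: new Date("2014-03-09T17:30:00Z") },
  "created"
);
```

A filter such as `{ created_year: 2014, created_month: 3 }` then becomes an ordinary equality match on stored fields.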
NO SCHEMA IS A GOOD THING
BUT…
Schemaless
Default Values
No Schema
No Default Values
App Challenge
Timestamps…
No single source of truth
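One way to get a single source of truth back is to route every write through one function that fills in the defaults. The sketch below is a minimal version of that idea; the field names and default values are hypothetical.

```javascript
// Application-side defaults: one function every write goes through.
// Field names and default values here are hypothetical.
const USER_DEFAULTS = { status: "A", loginCount: 0 };

function withDefaults(doc, now = new Date()) {
  return {
    ...USER_DEFAULTS,
    createdAt: now, // the timestamp a column default would otherwise have provided
    ...doc,         // explicit values win over defaults
  };
}

const fixedNow = new Date("2014-01-01T00:00:00Z");
const newUser = withDefaults({ name: "mongo", status: "D" }, fixedNow);
```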
Casting and Type Safety
No Schema
No …
App Challenge
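Without a schema, nothing stops the string "25" and the number 25 from landing in the same field, so the casting has to live in the application. The following is a rough sketch of that, using a hypothetical `userSchema` of per-field cast functions.

```javascript
// Hypothetical user "schema" enforced in the application before any write.
const userSchema = {
  name: (v) => String(v),
  age: (v) => {
    const n = Number(v);
    if (!Number.isInteger(n)) {
      throw new TypeError("age must be an integer, got " + JSON.stringify(v));
    }
    return n;
  },
  joined: (v) => {
    const d = new Date(v);
    if (Number.isNaN(d.getTime())) throw new TypeError("joined must be a valid date");
    return d;
  },
};

// Cast known fields and reject unknown ones.
function castDocument(schema, raw) {
  const doc = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!Object.prototype.hasOwnProperty.call(schema, key)) {
      throw new TypeError("unknown field: " + key);
    }
    doc[key] = schema[key](value);
  }
  return doc;
}

const user = castDocument(userSchema, { name: "mongo", age: "25", joined: "2014-03-09" });
```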
Auto Numbers
Start using _id
{
"_id" : 0,
"health" : 1,
"stateStr" : "PRIMARY",
"uptime" : 59917,
}
Counter tables
Dedicated database
1:1 Mapping
Counter++ using findAndModify
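The counter approach can be sketched as follows: one counter document per collection, incremented atomically with findAndModify. The argument shape (`query`, `update`, `new`, `upsert`) follows the shell's `findAndModify`; the field name `seq`, the helper names, and the in-memory stand-in are assumptions made so the example runs without a server.

```javascript
// Arguments for "Counter++ using findAndModify": atomically increment a
// per-collection counter document and read back the new value.
// The field name `seq` is an assumption.
function nextIdSpec(counterName) {
  return {
    query: { _id: counterName },
    update: { $inc: { seq: 1 } },
    new: true,    // return the document after the increment
    upsert: true, // create the counter on first use
  };
}

// `counters` is any object exposing a shell-style findAndModify(spec).
function nextId(counters, counterName) {
  return counters.findAndModify(nextIdSpec(counterName)).seq;
}

// In-memory stand-in so the sketch runs without a server; it handles only
// the exact spec built above.
const fakeCounters = {
  docs: new Map(),
  findAndModify({ query, update }) {
    const doc = this.docs.get(query._id) || { _id: query._id, seq: 0 };
    doc.seq += update.$inc.seq;
    this.docs.set(query._id, doc);
    return { ...doc };
  },
};

const first = nextId(fakeCounters, "users");
const second = nextId(fakeCounters, "users");
```

In the shell, the same spec would be passed to something like `db.counters.findAndModify(nextIdSpec("users"))`, where `counters` is a collection name chosen for the example.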
The ORM Solution
Data Analysts
http://www.designersplayground.com/pr/internet-meme-list/data-analyst-2/
Data Analysts
This is not SQL
There are no joins
No perfect tools
Pentaho
RockMongo
MongoVUE
RoboMongo
No Joins
Do in the application
Leverage the power of NoSQL
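When the data is not embedded, doing the join in the application usually means two reads and a lookup table. Here is a minimal sketch with made-up authors and posts; in practice each array would come from its own `find()` call.

```javascript
// Application-side "join": fetch both sides separately, then stitch by key.
// Sample data is made up.
const authors = [
  { _id: 1, name: "Dana" },
  { _id: 2, name: "Omar" },
];
const posts = [
  { _id: 100, title: "Intro", authorId: 1 },
  { _id: 101, title: "Scale", authorId: 2 },
  { _id: 102, title: "Orphan", authorId: 9 },
];

function attachAuthors(postList, authorList) {
  const byId = new Map(authorList.map((a) => [a._id, a]));
  return postList.map((p) => ({
    ...p,
    author: byId.get(p.authorId) || null, // like a LEFT JOIN: unmatched posts are kept
  }));
}

const joined = attachAuthors(posts, authors);
```

The second read can be narrowed to only the needed authors with an `$in` filter on `_id`.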
http://www.slideshare.net/nateabele/building-apps-with-mongodb
Limited Resultset
16MB document size
Limit and Skip
Adjusted WHERE
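One reading of "Adjusted WHERE" is range-based paging: instead of skipping rows, each page adds a condition that starts after the last document already seen. The sketch below assumes that reading and assumes results are sorted by `_id`; `nextPageQuery` and `findPage` are hypothetical helpers, and `findPage` is only an in-memory stand-in for a sorted, limited `find()`.

```javascript
// Builds the query for the next page by adjusting the WHERE clause
// (a range on _id) instead of skipping rows. Assumes results sorted by _id.
function nextPageQuery(baseQuery, lastSeenId) {
  if (lastSeenId === undefined) return { ...baseQuery };
  return { ...baseQuery, _id: { $gt: lastSeenId } };
}

// In-memory stand-in for find(query).sort({ _id: 1 }).limit(n),
// supporting only equality and the $gt range used above.
function findPage(docs, query, pageSize) {
  return docs
    .filter((doc) =>
      Object.entries(query).every(([key, cond]) =>
        cond !== null && typeof cond === "object"
          ? doc[key] > cond.$gt
          : doc[key] === cond
      )
    )
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

// Made-up data: seven documents, one of them with status "D".
const things = [1, 2, 3, 4, 5, 6, 7].map((n) => ({
  _id: n,
  status: n === 4 ? "D" : "A",
}));

const page1 = findPage(things, nextPageQuery({ status: "A" }), 3);
const lastId = page1[page1.length - 1]._id;
const page2 = findPage(things, nextPageQuery({ status: "A" }, lastId), 3);
```

Against a server, the equivalent call would look roughly like `db.things.find(nextPageQuery({ status: "A" }, lastId)).sort({ _id: 1 }).limit(3)`.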
Bottom Line
Powerful tool
Embrace the Challenge
Schema-less limitations: counters, data types
Tools for Data Scientists
Data design
Scale Hacking
Moshe Kaplan 972-54-2291978
moshe.kaplan@brightaqua.com