MongoDB

NoSQL Database
Akshay Mathur
Sarang Shravagi
@akshaymathu, @_sarangs
{name: 'mongo', type: 'db'}

Who uses MongoDB

Let’s Know Each Other
• Do you code?
• OS?
• Programming Language?
• Why are you attending?

Akshay Mathur
• Managed development, testing and
release teams over the last 14+ years
– Currently Principal Architect at ShopSocially
• Founding Team Member of
– ShopSocially (Enabling “social” for retailers)
– AirTight Networks (Global leader of WIPS)

Sarang Shravagi
• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in Software Product industry
• Currently Senior Full-stack Developer at
ShopSocially

How we use MongoDB
Python MongoDB
MongoEngine

Where MongoDB Fits

Program Outline: Understanding NoSQL
• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to
NoSQL
• Different Datastores
• A closer look at Document Storage
• Drawing parallels with RDBMS

Program Outline: Hands on Lab
• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving performance by creating Indices
• Assignment
• Problem solving for the assignment

Program Outline: Advanced Topics
• Handling Big Data
– Introduction to Map/Reduce
– Introduction to Data Partitioning (Sharding)
• Disaster Recovery
– Introduction to Replica set and High
Availability

Ground Rules
• Disturb Everyone
– Not with ringing phones
– Not with side conversations
– With more information
and questions

Data Patterns & Storage Needs

Data at an Online Store
• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index

SQL to NoSQL
Design Paradigm Shift

SQL Storage
• Was designed when
– Storage and data transfer were costly
– Processing was slow
– Applications were oriented more towards data
collection
• Initial adopters were financial institutions

SQL Storage
• Structured
– schema
• Relational
– foreign keys, constraints
• Transactional
– Atomicity, Consistency, Isolation, Durability
• High Availability through robustness
– Minimize failures
• Optimized for Writes
• Typically Scale Up

NoSQL Storage
• Is designed when
– Storage is cheap
– Data transfer is fast
– Much more processing power is available
• Clustering of machines is also possible
– Applications are oriented towards
consumption of User Generated Content
– Better on-screen user experience is in
demand

NoSQL Storage
• Semi-structured
– Schemaless
• Consistency, Availability, Partition
Tolerance
• High Availability through clustering
– expect failures
• Optimized for Reads
• Typically Scale Out

Different Datastores
Half Level Deep

SQL: RDBMS
• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
– Basic (number, text) data types
• Strong query language
• Transparent values
– Query language can read and filter on them
– Relationship between tables based on values
• Suited for user info and transactions

NoSQL: Key/Value
• Redis, DynamoDB etc.
• Stores a value against a key
– Strings
• Values are opaque
– Cannot be part of a query
• Suited for site interactions
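
A minimal sketch of what "opaque values" means, using a plain Python dict as a stand-in for a key/value server. The key name and the session data are made up for illustration. The store only ever sees a string, so the only lookup it can offer is by key.

```python
import json

# Stand-in for a key/value server; not a real Redis or DynamoDB client.
store = {}

def put(key, value):
    """Serialize the value to a string before storing it."""
    store[key] = json.dumps(value)

def get(key):
    """Fetch by key and decode on the application side."""
    return json.loads(store[key])

# Hypothetical site-interaction record
put("session:42", {"user": "asha", "last_page": "/cart"})
```

Finding every session whose last page was "/cart" would mean fetching and decoding every value in the application, which is why this kind of store suits lookups where the key is already known.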

NoSQL: Document
• MongoDB, CouchDB etc.
• Object Oriented data models
– Stores data in document objects having fields
– Basic and compound (list, dict) data types
• SQL-like queries
– Field values can be part of a query
• Suited for product info and its reviews
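
As a rough illustration of a document with basic and compound fields, here is a product with its reviews embedded, written as a plain Python dict. The field names and values are invented for this sketch and no database is involved.

```python
# One product stored as a single document. All names and values are made up.
product = {
    "name": "Coffee Mug",
    "price": 7.5,
    "tags": ["kitchen", "ceramic"],      # list field
    "reviews": [                         # list of sub-documents
        {"user": "asha", "rating": 5},
        {"user": "ravi", "rating": 3},
    ],
}

def average_rating(doc):
    """Average the ratings embedded in a product document."""
    ratings = [review["rating"] for review in doc["reviews"]]
    return sum(ratings) / len(ratings)

def has_tag(doc, tag):
    """Field values are transparent, so they can be used as filters."""
    return tag in doc["tags"]
```

Because the reviews live inside the product document, reading a product and its reviews is one lookup rather than a join across two tables.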

NoSQL: Column Family
• Cassandra, Bigtable etc.
• Stores data in columns
– Column values can be part of a query
• SQL-like queries
• Suited for search

NoSQL: Graph
• Neo4j
• Stores data in the form of nodes and
relationships
• Queries take the form of traversals
• In-memory
• Suited for social graph

Document Storage: Closer Look

MongoDB
• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicated shards
• SDKs/drivers for many languages
– C, C++, C#, Python, Erlang, PHP, Java, JavaScript, Node.js, Perl,
Ruby, Scala

RDBMS: DB Design

RDBMS: Query

RDBMS → MongoDB
RDBMS        MongoDB
Database     Database
Table        Collection
Row          Document
Column       Field
SQL:         SELECT c1, c2 FROM Table WHERE c1 = 'v1' ORDER BY c2 LIMIT n
MongoEngine: Collection.objects(c1='v1').order_by('c2').limit(n)
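
Both queries above do the same three things: filter on a value, sort by a field, and keep the first n results. The sketch below spells out those steps in plain Python over a list of dicts. The `find` helper and the sample rows are hypothetical and are not part of MongoEngine or any driver.

```python
def find(rows, where, order_by, limit):
    """Filter rows on equality, sort by one field, keep the first `limit`."""
    matches = [
        row for row in rows
        if all(row.get(field) == value for field, value in where.items())
    ]
    return sorted(matches, key=lambda row: row[order_by])[:limit]

# Made-up rows/documents using the column names from the slide
table = [
    {"c1": "v1", "c2": 3},
    {"c1": "v2", "c2": 1},
    {"c1": "v1", "c2": 2},
]

result = find(table, where={"c1": "v1"}, order_by="c2", limit=1)
```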

MongoDB: Design

MongoDB: Query
• Movies.objects()

Have you Installed?
http://www.mongodb.org/downloads

Hands-on
Dive-in with Sarang

MongoDB: Core Binaries
• mongod
– Database server
• mongo
– Database client shell
• mongos
– Router for Sharding

Getting Help
• For mongo shell
– mongo --help
• Shows options available for running the shell
• Inside mongo shell
– <object>.help(), for example db.help()
• Shows commands available on the object

Import Export Tools
• For objects (binary BSON dumps)
– mongodump
– mongorestore
– bsondump
– mongooplog
• For data items (JSON, CSV or TSV files)
– mongoimport
– mongoexport

Database Operations
• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection

Diagnostic Tools
• mongostat
• mongoperf
• mongosniff
• mongotop

Assignment
• Go to http://www.velocitainc.com/mongo/
– Tasks
• assignments.txt
– Data
• students.json

Disaster Recovery
Introduction to Replica Sets and
High Availability

Disasters
• Physical Failure
– Hardware
– Network
• Solution
– Replica Sets
• Provide redundant storage for High Availability
– Real time data synchronization
• Automatic failover for zero downtime

Replication

Multi Replication
• Data can be replicated to multiple places
simultaneously
• An odd number of machines is always
needed in a replica set, so that a vote
can produce a clear majority

Single Replication
• If you want only one secondary, or any odd
number of secondaries, you need to set up
an arbiter to keep the member count odd

Failover
• When the primary fails, the remaining machines
vote to elect a new primary
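
A small sketch of why the member count matters for failover, assuming the usual rule that a new primary needs votes from a strict majority of all voting members. The function name is made up and this is not how mongod is implemented.

```python
def can_elect_primary(total_members, reachable_members):
    """True if the reachable members form a strict majority of the set."""
    return reachable_members > total_members // 2

# Three members, the primary fails, two remain: a majority of 3 is reachable.
three_member_failover = can_elect_primary(total_members=3, reachable_members=2)

# Two members, the primary fails, one remains: 1 of 2 is not a majority.
two_member_failover = can_elect_primary(total_members=2, reachable_members=1)

# Primary, one secondary and an arbiter: the arbiter supplies the second vote.
with_arbiter_failover = can_elect_primary(total_members=3, reachable_members=2)
```

The two-member case shows what the arbiter is for: without a third vote, the surviving secondary cannot be elected on its own.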

Handling Big Data
Introduction to Map/Reduce
and Sharding

Large Data Sets
• Problem 1
– Performance
• Queries become slow
• Solution
– Map/Reduce

Map Reduce
• A way to divide large query computation
into smaller chunks
• May run in multiple processes across
multiple machines
• Think of it as GROUP BY of SQL

Map/Reduce Example
• The map function digs through the data and
emits the required values

Map/Reduce Example
• The reduce function takes the output of the map
function and generates an aggregated value
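
The two steps can be sketched in plain Python as follows. This is a single-process illustration of the idea, not MongoDB's mapReduce command. The purchase records and function names are made up. The example totals the amount spent per user, which is the kind of result a SQL GROUP BY with SUM would give.

```python
from collections import defaultdict

def map_purchase(purchase):
    """Map step: emit one (key, value) pair for one input record."""
    return purchase["user"], purchase["amount"]

def reduce_amounts(user, amounts):
    """Reduce step: fold all values emitted for one key into one value."""
    return sum(amounts)

def map_reduce(records, mapper, reducer):
    """Group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        key, value = mapper(record)
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical purchase records
purchases = [
    {"user": "asha", "amount": 20},
    {"user": "ravi", "amount": 5},
    {"user": "asha", "amount": 10},
]

totals = map_reduce(purchases, map_purchase, reduce_amounts)
```

A sum is used deliberately: partial sums can be summed again, so the reduce step could run on separate chunks of data and then on the partial results.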

Large Data Sets
• Problem 2
– Vertical Scaling of Hardware
• Can’t increase machine size beyond a limit
• Solution
– Sharding

Sharding
• A method for storing data across multiple
machines
• Data is partitioned using Shard Keys

Data Partitioning: Range Based
• Each chunk holds a contiguous range of Shard Key values
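
A minimal sketch of range-based lookup, assuming numeric shard keys and hypothetical chunk boundaries. Each chunk owns a contiguous key range, so neighbouring keys land in the same chunk.

```python
import bisect

# Hypothetical exclusive upper bounds of the first three chunks.
# Chunk 0: keys below 100, chunk 1: 100-199, chunk 2: 200-299,
# chunk 3: everything from 300 upwards.
CHUNK_UPPER_BOUNDS = [100, 200, 300]

def range_chunk_for(shard_key):
    """Return the index of the chunk whose range contains the key."""
    return bisect.bisect_right(CHUNK_UPPER_BOUNDS, shard_key)
```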

Data Partitioning: Hash Based
• A hash function on Shard Keys decides the chunk
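
A rough sketch of hash-based placement. The chunk count and the use of MD5 with a modulo are simplifications chosen for illustration and are not a description of MongoDB's internal hashing. The point is that neighbouring keys are scattered across chunks instead of being kept together.

```python
import hashlib

NUM_CHUNKS = 4  # hypothetical number of chunks

def hash_chunk_for(shard_key):
    """Hash the key and map the hash onto one of NUM_CHUNKS chunks."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_CHUNKS
```

The same key always maps to the same chunk. The trade-off is that a range query over the shard key has to visit many chunks.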

Sharded Cluster

Optimizing Shards: Splitting
• In a shard, when a chunk grows beyond the
configured chunk size, it is split into two
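
The splitting rule can be sketched like this. For simplicity the limit here is a made-up document count rather than a size in megabytes, and a chunk is modelled as a sorted list of shard keys.

```python
MAX_CHUNK_DOCS = 4  # hypothetical limit, counted in documents

def split_if_needed(chunk):
    """Return the chunk unchanged, or two halves if it is over the limit."""
    if len(chunk) <= MAX_CHUNK_DOCS:
        return [chunk]
    middle = len(chunk) // 2
    # Split at the middle key so both halves stay contiguous ranges.
    return [chunk[:middle], chunk[middle:]]
```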

Optimizing Shards: Balancing
• When one shard holds many more chunks than
the others, a few chunks are migrated to
other shards
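
A simplified sketch of balancing, assuming the goal is to even out chunk counts. The threshold, shard names and chunk ids are invented, and the real balancer applies more rules than this.

```python
MIGRATION_THRESHOLD = 2  # hypothetical: act when counts differ by 2 or more

def balance(shards):
    """Move chunks from the fullest to the emptiest shard until even.

    `shards` maps a shard name to its list of chunk ids and is changed
    in place. Returns the list of (chunk, source, target) migrations.
    """
    migrations = []
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        gap = len(shards[fullest]) - len(shards[emptiest])
        if gap < MIGRATION_THRESHOLD:
            return migrations
        chunk = shards[fullest].pop()
        shards[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))

# Made-up cluster: one overloaded shard and two nearly empty ones
cluster = {
    "shard_a": ["c1", "c2", "c3", "c4", "c5"],
    "shard_b": ["c6"],
    "shard_c": [],
}
moves = balance(cluster)
```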

Summary
• MongoDB is good
– Stores objects as we use in programming
language
– Flexible semi-structured design
– Scales out to store big data
– Embedded documents eliminate the need for joins
• MongoDB is bad
– No multi-document query
– De-normalized storage
– No support for transactions

Thanks
@akshaymathu @_sarangs
