MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.
The first part covers NoSQL databases and the shift in their design paradigm, looks more closely at document-based NoSQL databases, and draws parallels with SQL databases.
The second part is a hands-on session on MongoDB using the mongo shell; the slides offer little help for that part.
Finally, it touches on advanced topics such as data replication for disaster recovery and handling big data with map-reduce and sharding.
3. Let’s Know Each Other
• Do you code?
• OS?
• Programming Language?
• Why are you attending?
@akshaymathu, @_sarangs 3
4. Akshay Mathur
• Managed development, testing and
release teams over the last 14+ years
– Currently Principal Architect at ShopSocially
• Founding Team Member of
– ShopSocially (Enabling “social” for retailers)
– AirTight Networks (Global leader of WIPS)
5. Sarang Shravagi
• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in Software Product industry
• Currently Senior Full-stack Developer at
ShopSocially
6. How we use MongoDB
Python MongoDB
MongoEngine
8. Program Outline: Understanding NoSQL
• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to
NoSQL
• Different Datastores
• A closer look at Document Storage
• Drawing parallels with RDBMS
9. Program Outline: Hands on Lab
• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving performance by creating Indices
• Assignment
• Problem solving for the assignment
10. Program Outline: Advanced Topics
• Handling Big Data
– Introduction to Map/Reduce
– Introduction to Data Partitioning (Sharding)
• Disaster Recovery
– Introduction to Replica set and High
Availability
11. Ground Rules
• Disturb Everyone
– Not by phone rings
– Not by local talks
– By more information
and questions
13. Data at an Online Store
• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index
15. SQL Storage
• Was designed when
– Storage and data transfer were costly
– Processing was slow
– Applications were oriented more towards data
collection
• Initial adopters were financial institutions
16. SQL Storage
• Structured
– schema
• Relational
– foreign keys, constraints
• Transactional
– Atomicity, Consistency, Isolation, Durability
• High Availability through robustness
– Minimize failures
• Optimized for Writes
• Typically Scale Up
17. NoSQL Storage
• Is designed when
– Storage is cheap
– Data transfer is fast
– Much more processing power is available
• Clustering of machines is also possible
– Applications are oriented towards
consumption of User Generated Content
– Better on-screen user experience is in
demand
18. NoSQL Storage
• Semi-structured
– Schemaless
• Consistency, Availability, Partition
Tolerance
• High Availability through clustering
– expect failures
• Optimized for Reads
• Typically Scale Out
20. SQL: RDBMS
• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
– Basic (number, text) data types
• Strong query language
• Transparent values
– Query language can read and filter on them
– Relationship between tables based on values
• Suited for user info and transactions
21. NoSQL: Key/Value
• Redis, DynamoDB etc.
• Stores a value against a key
– Strings
• Values are opaque
– Cannot be part of a query
• Suited for site interactions
23. NoSQL: Document
• MongoDB, CouchDB etc.
• Object Oriented data models
– Stores data in document objects having fields
– Basic and compound (list, dict) data types
• SQL like queries
• Transparent values
– Can be part of query
• Suited for product info and its reviews
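To make the document model above concrete, here is a minimal sketch in plain Python, with no database involved. The product fields, the `catalog` list and the `find` helper are all hypothetical names chosen for illustration; they are not MongoDB's API. The sketch only shows that fields can hold lists and embedded documents, and that a query can look inside them because the values are transparent.

```python
# A hypothetical product document: fields hold basic and compound values.
product = {
    "name": "Trail Shoe",
    "price": 59.0,
    "tags": ["outdoor", "running"],      # list field
    "reviews": [                         # embedded documents
        {"user": "asha", "rating": 5},
        {"user": "ravi", "rating": 3},
    ],
}

# A stand-in for a collection: just a list of documents.
catalog = [
    product,
    {"name": "Desk Lamp", "price": 25.0, "tags": ["home"], "reviews": []},
]


def find(collection, predicate):
    """Return the documents for which predicate(document) is true."""
    return [doc for doc in collection if predicate(doc)]


# Values are transparent, so a query can filter on nested fields.
well_reviewed = find(
    catalog,
    lambda doc: any(review["rating"] >= 4 for review in doc["reviews"]),
)
names = [doc["name"] for doc in well_reviewed]
print(names)
```

Because the reviews live inside the product document, reading a product together with its reviews needs no second lookup.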
25. NoSQL: Column Family
• Cassandra, Bigtable etc.
• Stores data in columns
• Transparent values
– Can be part of query
• SQL like queries
• Suited for search
27. NoSQL: Graph
• Neo4j
• Stores data in form of nodes and
relationships
• Query is in form of traversal
• In-memory
• Suited for social graph
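The idea that a graph query is a traversal can be sketched in plain Python. This is only an illustration with a hypothetical `follows` graph and a hypothetical `within_hops` helper; it is not Neo4j's query language. It walks the relationships breadth-first and collects every node reachable within a given number of hops.

```python
from collections import deque

# Hypothetical social graph: each node maps to the nodes it follows.
follows = {
    "asha": ["ravi", "meera"],
    "ravi": ["meera", "john"],
    "meera": ["lee"],
    "john": [],
    "lee": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in <= max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, hops + 1))
    return reached


print(within_hops(follows, "asha", 2))
```

With `max_hops=1` this returns direct connections only; with `max_hops=2` it also returns "friends of friends", which is the typical social graph question.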
41. Getting Help
• For mongo shell
– mongo --help
• Shows options available for running the shell
• Inside mongo shell
– Object.help()
• Shows commands available on the object
42. Import Export Tools
• For objects
– mongodump
– mongorestore
– bsondump
– mongooplog
• For data items
– mongoimport
– mongoexport
43. Database Operations
• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection
48. Disasters
• Physical Failure
– Hardware
– Network
• Solution
– Replica Sets
• Provide redundant storage for High Availability
– Real time data synchronization
• Automatic failover for near-zero downtime
50. Multi Replication
• Data can be replicated to multiple places
simultaneously
• A replica set should have an odd number of voting
members so that an election cannot end in a tie
51. Single Replication
• If you want only one secondary (or any odd number of
secondaries), you need to set up an arbiter to keep the
number of voting members odd
52. Failover
• When the primary fails, the remaining members
vote to elect a new primary
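The voting rule explains why the earlier slides ask for an odd number of members or an arbiter. The sketch below is plain arithmetic, not MongoDB code, and the function names are hypothetical. It assumes that a new primary needs the votes of a strict majority of the configured voting members.

```python
def votes_needed(voting_members):
    """Strict majority of the configured voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """A new primary can be elected only if a majority can still vote."""
    return reachable_members >= votes_needed(voting_members)


def tolerated_failures(voting_members):
    """How many members may fail while a majority remains."""
    return voting_members - votes_needed(voting_members)


# Print: set size, votes needed, failures tolerated.
for size in (2, 3, 4, 5):
    print(size, votes_needed(size), tolerated_failures(size))
```

Under this assumption a two-member set (primary plus one secondary) tolerates no failure, while adding an arbiter as a third voter lets the set survive the loss of one member. Going from three to four members adds no extra tolerance, which is why even-sized sets are avoided.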
54. Large Data Sets
• Problem 1
– Performance
• Queries become slow
• Solution
– Map/Reduce
55. Map Reduce
• A way to divide large query computation
into smaller chunks
• May run in multiple processes across
multiple machines
• Think of it as the GROUP BY of SQL
56. Map/Reduce Example
• The map function scans the data and returns the
required values
57. Map/Reduce Example
• The reduce function takes the output of the map
function and generates an aggregated value
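The two steps can be sketched in plain Python to show how they combine. This is a single-process illustration, not MongoDB's map-reduce API; the `purchases` data and all function names are hypothetical. The map step emits a key/value pair per document, the values are grouped by key, and the reduce step folds each group into one value, much like `SELECT user, SUM(amount) ... GROUP BY user`.

```python
from collections import defaultdict

# Hypothetical purchase documents.
purchases = [
    {"user": "asha", "amount": 30},
    {"user": "ravi", "amount": 20},
    {"user": "asha", "amount": 15},
]


def map_purchase(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["user"], doc["amount"]


def reduce_amounts(key, values):
    """Reduce step: fold all values emitted for one key into one value."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


totals = map_reduce(purchases, map_purchase, reduce_amounts)
print(totals)  # {'asha': 45, 'ravi': 20}
```

Since each document is mapped independently and each key is reduced independently, both loops could in principle be spread over several processes or machines.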
58. Large Data Sets
• Problem 2
– Vertical Scaling of Hardware
• Can’t increase machine size beyond a limit
• Solution
– Sharding
59. Sharding
• A method for storing data across multiple
machines
• Data is partitioned using Shard Keys
60. Data Partitioning: Range Based
• Each chunk holds a contiguous range of Shard Key values
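A minimal sketch of range-based partitioning, assuming numeric shard keys and three hypothetical split points. The names `split_points` and `chunk_for_range` are made up for illustration and do not come from MongoDB. A binary search over the split points finds the chunk whose range contains a given key.

```python
from bisect import bisect_right

# Hypothetical split points: chunk 0 holds keys < 100, chunk 1 holds
# 100 <= key < 200, chunk 2 holds 200 <= key < 300, chunk 3 holds the rest.
split_points = [100, 200, 300]


def chunk_for_range(shard_key):
    """Return the index of the chunk whose key range contains shard_key."""
    return bisect_right(split_points, shard_key)


for key in (5, 100, 199, 250, 1000):
    print(key, "-> chunk", chunk_for_range(key))
```

Neighbouring keys land in the same chunk, which suits range queries but can concentrate writes on one chunk when keys only ever increase.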
61. Data Partitioning: Hash Based
• A hash function on Shard Keys decides the chunk
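A minimal sketch of hash-based partitioning. The chunk count, the choice of MD5 and the modulo step are all assumptions made for illustration, and `chunk_for_hash` is a hypothetical name; a real system may instead split the range of hash values into chunks. The point is only that the chunk is derived from a hash of the shard key rather than from the key's position in sorted order.

```python
import hashlib

NUM_CHUNKS = 4  # assumed number of chunks, for illustration only


def chunk_for_hash(shard_key):
    """Pick a chunk from a stable hash of the shard key."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_CHUNKS


# Consecutive keys are typically scattered over different chunks.
placement = {key: chunk_for_hash(key) for key in range(100, 108)}
print(placement)
```

The same key always maps to the same chunk, so lookups by exact key stay cheap, but a range query generally has to visit many chunks.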
63. Optimizing Shards: Splitting
• Within a shard, when a chunk grows beyond the size limit,
it is split into two chunks
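Splitting can be sketched as follows. This is a simplified illustration: a chunk is modelled as a sorted list of shard key values, and the limit is a hypothetical document count (`MAX_CHUNK_DOCS`), whereas a real system would more likely measure a chunk by its size in bytes.

```python
MAX_CHUNK_DOCS = 4  # assumed limit, counted in documents for simplicity


def split_if_needed(chunk):
    """Split a sorted list of shard keys in two once it exceeds the limit.

    Returns a list holding either the original chunk or its two halves.
    """
    if len(chunk) <= MAX_CHUNK_DOCS:
        return [chunk]
    middle = len(chunk) // 2
    return [chunk[:middle], chunk[middle:]]


print(split_if_needed([10, 20, 30]))
print(split_if_needed([10, 20, 30, 40, 50, 60]))
```

Splitting at the middle key keeps each half a contiguous key range, so the halves remain valid chunks in their own right.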
64. Optimizing Shards: Balancing
• When one shard holds many more chunks than the
others, a few chunks are migrated to other
shards
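Balancing can be sketched as a loop that keeps moving one chunk from the fullest shard to the emptiest one. The shard names, the `balance` function and the `threshold` parameter are hypothetical; this shows the general idea, not MongoDB's actual balancer policy.

```python
def balance(shards, threshold=1):
    """Even out chunk counts by migrating chunks between shards.

    shards maps a shard name to its list of chunk ids and is modified
    in place. threshold must be at least 1. Returns the migrations as
    (chunk, source_shard, target_shard) tuples.
    """
    migrations = []
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= threshold:
            return migrations
        chunk = shards[fullest].pop()
        shards[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))


shards = {
    "shard_a": ["c1", "c2", "c3", "c4", "c5"],
    "shard_b": ["c6"],
    "shard_c": [],
}
moves = balance(shards)
print(moves)
print({name: len(chunks) for name, chunks in shards.items()})
```

In this example three migrations leave every shard with two chunks. Moving whole chunks rather than single documents keeps the bookkeeping small: only the chunk-to-shard mapping changes.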
65. Summary
• MongoDB is good
– Stores objects as we use in programming
language
– Flexible semi-structured design
– Scales out to store big data
– Embedded documents eliminate the need for joins
• MongoDB is bad
– No multi-document query
– De-normalized storage
– No support for transactions