Your SlideShare is downloading. ×
0
NoSQL Database
Akshay Mathur
Sarang Shravagi
@akshaymathu, @_sarangs
{name: ‘mongo’, type: ‘db’}
Who uses MongoDB
@akshaymathu, @_sarangs 2
Let’s Know Each Other
• Do you code?
• OS?
• Programing Language?
• Why are you attending?
@akshaymathu, @_sarangs 3
Akshay Mathur
• Managed development, testing and
release teams in last 14+ years
– Currently Principal Architect at ShopSo...
Sarang Shravagi
• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in Software Product industry
•...
How we use MongoDB
@akshaymathu, @_sarangs 6
Python MongoDB
MongoEngine
Where MongoDB Fits
@akshaymathu, @_sarangs 7
Program Outline: Understanding NoSQL
• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to
NoSQL
...
Program Outline: Hands on Lab
• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create...
Program Outline: Advance Topics
• Handling Big Data
– Introduction to Map/Reduce
– Introduction to Data Partitioning (Shar...
Ground Rules
• Disturb Everyone
– Not by phone rings
– Not by local talks
– By more information
and questions
@akshaymathu...
Data Patterns & Storage Needs
@akshaymathu, @_sarangs 12
Data at an Online Store
• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interacti...
SQL to NoSQL
Design Paradigm Shift
@akshaymathu, @_sarangs 14
SQL Storage
• Was designed when
– Storage and data transfer was costly
– Processing was slow
– Applications were oriented ...
SQL Storage
• Structured
– schema
• Relational
– foreign keys, constraints
• Transactional
– Atomicity, Consistency, Isola...
NoSQL Storage
• Is designed when
– Storage is cheap
– Data transfer is fast
– Much more processing power is available
• Cl...
NoSQL Storage
• Semi-structured
– Schemaless
• Consistency, Availability, Partition
Tolerance
• High Availability through ...
Different Datastores
Half Level Deep
@akshaymathu, @_sarangs 19
SQL: RDBMS
• MySql, Postgresql, Oracle etc.
• Stores data in tables having columns
– Basic (number, text) data types
• Str...
NoSQL: Key/Value
• Redis, DynamoDB etc.
• Stores a values against a key
– Strings
• Values are opaque
– Can not be part of...
NoSQL: Key/Value
NoSQL: Document
• MongoDB, CouchDB etc.
• Object Oriented data models
– Stores data in document objects having fields
– Ba...
NoSQL: Document
NoSQL: Column Family
• Cassandra, Big Table etc.
• Stores data in columns
• Transparent values
– Can be part of query
• SQ...
NoSQL: Column Family
NoSQL: Graph
• Neo4j
• Stores data in form of nodes and
relationships
• Query is in form of traversal
• In-memory
• Suited...
NoSQL: Graph
Document Storage: Closer Look
@akshaymathu, @_sarangs 30
MongoDB
• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicate...
RDBMS: DB Design
@akshaymathu, @_sarangs 32
RDBMS: Query
@akshaymathu, @_sarangs 33
RDBMS  MongoDB
RDBMS MongoDB
Database Database
Table Collection
Row Document
Column Field
Select c1, c2 from Table where...
MongoDB: Design
@akshaymathu, @_sarangs 35
MongoDB: Query
• Movies.objects()
@akshaymathu, @_sarangs 36
@akshaymathu, @_sarangs 37
Have you Installed?
http://www.mongodb.org/downloads
@akshaymathu, @_sarangs
Hands-on
Dive-in with Sarang
@akshaymathu, @_sarangs 39
MongoDB: Core Binaries
• mongod
– Database server
• mongo
– Database client shell
• mongos
– Router for Sharding
@akshayma...
Getting Help
• For mongo shell
– mongo –help
• Shows options available for running the shell
• Inside mongo shell
– Object...
Import Export Tools
• For objects
– mongodump
– mongorestore
– bsondump
– mongooplog
• For data items
– mongoimport
– mong...
Database Operations
• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creati...
Diagnostic Tools
• mongostat
• mongoperf
• mongosnif
• mongotop
@akshaymathu, @_sarangs 44
@akshaymathu, @_sarangs 45
Assignment
• Go to http://www.velocitainc.com/mongo/
– Tasks
• assignments.txt
– Data
• students.json
@akshaymathu, @_sara...
Disaster Recovery
Introduction to Replica Sets and
High Availability
@akshaymathu, @_sarangs 47
Disasters
• Physical Failure
– Hardware
– Network
• Solution
– Replica Sets
• Provide redundant storage for High Availabil...
Replication
@akshaymathu, @_sarangs 49
Multi Replication
• Data can be replicated to multiple places
simultaneously
• Odd number of machines are always
needed in...
Single Replication
• If you want to have only one or odd
number of secondary, you need to setup
an arbiter
@akshaymathu, @...
Failover
• When primary fails, remaining machines
vote for electing new primary
@akshaymathu, @_sarangs 52
Handling Big Data
Introduction to Map/Reduce
and Sharding
@akshaymathu, @_sarangs 53
Large Data Sets
• Problem 1
– Performance
• Queries go slow
• Solution
– Map/Reduce
@akshaymathu, @_sarangs 54
Map Reduce
• A way to divide large query computation
into smaller chunks
• May run in multiple processes across
multiple m...
Map/Reduce Example
• Map function digs the data and returns
required values
@akshaymathu, @_sarangs 56
Map/Reduce Example
• Reduce function uses the output of Map
function and generates aggregated value
@akshaymathu, @_sarang...
Large Data Sets
• Problem 2
– Vertical Scaling of Hardware
• Can’t increase machine size beyond a limit
• Solution
– Shard...
Sharding
• A method for storing data across multiple
machines
• Data is partitioned using Shard Keys
@akshaymathu, @_saran...
Data Partitioning: Range Based
• A range of Shard Keys stay in a chunk
@akshaymathu, @_sarangs 60
Data Partitioning: Hash Bsed
• A hash function on Shard Keys decides the chunk
@akshaymathu, @_sarangs 61
Sharded Cluster
@akshaymathu, @_sarangs 62
Optimizing Shards: Splitting
• In a shard, when size of a chunk
increases, the chunk is divided into two
@akshaymathu, @_s...
Optimizing Shards: Balancing
• When number of chunks in a shard
increase, a few chunks are migrated to
other shard
@akshay...
Summary
• MongoDB is good
– Stores objects as we use in programming
language
– Flexible semi-structured design
– Scales ou...
Thanks
@akshaymathu, @_sarangs 66
@akshaymathu @_sarangs
Mongo db
Upcoming SlideShare
Loading in...5
×

Mongo db

2,408

Published on

MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.

First it talks about NoSQL databases, shift in their design paradigm, focuses a little more on document based NoSQL databases and tries drawing some parallel from SQL databases.

Second part, is for hands-on session of MongoDB using mongo shell. But the slides help very less.

At last it touches advance topics like data replication for disaster recovery and handling big data using map-reduce as well as Sharding.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,408
On Slideshare
0
From Embeds
0
Number of Embeds
38
Actions
Shares
0
Downloads
30
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Transcript of "Mongo db"

  1. 1. NoSQL Database Akshay Mathur Sarang Shravagi @akshaymathu, @_sarangs {name: ‘mongo’, type: ‘db’}
  2. 2. Who uses MongoDB @akshaymathu, @_sarangs 2
  3. 3. Let’s Know Each Other • Do you code? • OS? • Programing Language? • Why are you attending? @akshaymathu, @_sarangs 3
  4. 4. Akshay Mathur • Managed development, testing and release teams in last 14+ years – Currently Principal Architect at ShopSocially • Founding Team Member of – ShopSocially (Enabling “social” for retailers) – AirTight Neworks (Global leader of WIPS) @akshaymathu, @_sarangs 4
  5. 5. Sarang Shravagi • 10gen Certified Developer and DBA • CS graduate from PICT Pune • 3+ years in Software Product industry • Currently Senior Full-stack Developer at ShopSocially @akshaymathu, @_sarangs 5
  6. 6. How we use MongoDB @akshaymathu, @_sarangs 6 Python MongoDB MongoEngine
  7. 7. Where MongoDB Fits @akshaymathu, @_sarangs 7
  8. 8. Program Outline: Understanding NoSQL • Data Landscape • Different Storage Needs • Design Paradigm Shift from SQL to NoSQL • Different Datastores • Closer look to Document Storage • Drawing parallel from RDBMS @akshaymathu, @_sarangs 8
  9. 9. Program Outline: Hands on Lab • Installation and basic configuration • Mongo Shell • Creating and Changing Schema • Create, Read, Update and Delete of Data • Analyzing Performance • Improving performance by creating Indices • Assignment • Problem solving for the assignment @akshaymathu, @_sarangs 9
  10. 10. Program Outline: Advance Topics • Handling Big Data – Introduction to Map/Reduce – Introduction to Data Partitioning (Sharding) • Disaster Recovery – Introduction to Replica set and High Availability @akshaymathu, @_sarangs 10
  11. 11. Ground Rules • Disturb Everyone – Not by phone rings – Not by local talks – By more information and questions @akshaymathu, @_sarangs 11
  12. 12. Data Patterns & Storage Needs @akshaymathu, @_sarangs 12
  13. 13. Data at an Online Store • Product Information • User Information • Purchase Information • Product Reviews • Site Interactions • Social Graph • Search Index @akshaymathu, @_sarangs 13
  14. 14. SQL to NoSQL Design Paradigm Shift @akshaymathu, @_sarangs 14
  15. 15. SQL Storage • Was designed when – Storage and data transfer was costly – Processing was slow – Applications were oriented more towards data collection • Initial adopters were financial institutions @akshaymathu, @_sarangs 15
  16. 16. SQL Storage • Structured – schema • Relational – foreign keys, constraints • Transactional – Atomicity, Consistency, Isolation, Durability • High Availability through robustness – Minimize failures • Optimized for Writes • Typically Scale Up @akshaymathu, @_sarangs 16
  17. 17. NoSQL Storage • Is designed when – Storage is cheap – Data transfer is fast – Much more processing power is available • Clustering of machines is also possible – Applications are oriented towards consumption of User Generated Content – Better on-screen user experience is in demand @akshaymathu, @_sarangs 17
  18. 18. NoSQL Storage • Semi-structured – Schemaless • Consistency, Availability, Partition Tolerance • High Availability through clustering – expect failures • Optimized for Reads • Typically Scale Out @akshaymathu, @_sarangs 18
  19. 19. Different Datastores Half Level Deep @akshaymathu, @_sarangs 19
  20. 20. SQL: RDBMS • MySql, Postgresql, Oracle etc. • Stores data in tables having columns – Basic (number, text) data types • Strong query language • Transparent values – Query language can read and filter on them – Relationship between tables based on values • Suited for user info and transactions @akshaymathu, @_sarangs 20
  21. 21. NoSQL: Key/Value • Redis, DynamoDB etc. • Stores a values against a key – Strings • Values are opaque – Can not be part of query • Suited for site interactions @akshaymathu, @_sarangs 21
  22. 22. NoSQL: Key/Value
  23. 23. NoSQL: Document • MongoDB, CouchDB etc. • Object Oriented data models – Stores data in document objects having fields – Basic and compound (list, dict) data types • SQL like queries • Transparent values – Can be part of query • Suited for product info and its reviews @akshaymathu, @_sarangs 23
  24. 24. NoSQL: Document
  25. 25. NoSQL: Column Family • Cassandra, Big Table etc. • Stores data in columns • Transparent values – Can be part of query • SQL like queries • Suited for search @akshaymathu, @_sarangs 25
  26. 26. NoSQL: Column Family
  27. 27. NoSQL: Graph • Neo4j • Stores data in form of nodes and relationships • Query is in form of traversal • In-memory • Suited for social graph @akshaymathu, @_sarangs 27
  28. 28. NoSQL: Graph
  29. 29. Document Storage: Closer Look @akshaymathu, @_sarangs 30
  30. 30. MongoDB • Document database • Powerful query language • Docs, sub-docs, indexes • Map/reduce • Replicas, shards, replicated shards • SDKs/drivers for so many languages – C, C++, C#, Python, Erlang, PHP, Java, Javascript, NodeJS, Perl, Ruby, Scala @akshaymathu, @_sarangs 31
  31. 31. RDBMS: DB Design @akshaymathu, @_sarangs 32
  32. 32. RDBMS: Query @akshaymathu, @_sarangs 33
  33. 33. RDBMS  MongoDB RDBMS MongoDB Database Database Table Collection Row Document Column Field Select c1, c2 from Table where c1 = ‘v1’ order by c2 limit n Collection.objects(F1 = ‘v1’).order_by(‘c2’).limit(n) @akshaymathu, @_sarangs 34
  34. 34. MongoDB: Design @akshaymathu, @_sarangs 35
  35. 35. MongoDB: Query • Movies.objects() @akshaymathu, @_sarangs 36
  36. 36. @akshaymathu, @_sarangs 37
  37. 37. Have you Installed? http://www.mongodb.org/downloads @akshaymathu, @_sarangs
  38. 38. Hands-on Dive-in with Sarang @akshaymathu, @_sarangs 39
  39. 39. MongoDB: Core Binaries • mongod – Database server • mongo – Database client shell • mongos – Router for Sharding @akshaymathu, @_sarangs 40
  40. 40. Getting Help • For mongo shell – mongo –help • Shows options available for running the shell • Inside mongo shell – Object.help() • Shows commands available on the object @akshaymathu, @_sarangs 41
  41. 41. Import Export Tools • For objects – mongodump – mongorestore – bsondump – mongooplog • For data items – mongoimport – mongoexport @akshaymathu, @_sarangs 42
  42. 42. Database Operations • Database creation • Creating/changing collection • Data insertion • Data read • Data update • Creating indices • Data deletion • Dropping collection @akshaymathu, @_sarangs 43
  43. 43. Diagnostic Tools • mongostat • mongoperf • mongosnif • mongotop @akshaymathu, @_sarangs 44
  44. 44. @akshaymathu, @_sarangs 45
  45. 45. Assignment • Go to http://www.velocitainc.com/mongo/ – Tasks • assignments.txt – Data • students.json @akshaymathu, @_sarangs 46
  46. 46. Disaster Recovery Introduction to Replica Sets and High Availability @akshaymathu, @_sarangs 47
  47. 47. Disasters • Physical Failure – Hardware – Network • Solution – Replica Sets • Provide redundant storage for High Availability – Real time data synchronization • Automatic failover for zero down time @akshaymathu, @_sarangs 48
  48. 48. Replication @akshaymathu, @_sarangs 49
  49. 49. Multi Replication • Data can be replicated to multiple places simultaneously • Odd number of machines are always needed in a replica set @akshaymathu, @_sarangs 50
  50. 50. Single Replication • If you want to have only one or odd number of secondary, you need to setup an arbiter @akshaymathu, @_sarangs 51
  51. 51. Failover • When primary fails, remaining machines vote for electing new primary @akshaymathu, @_sarangs 52
  52. 52. Handling Big Data Introduction to Map/Reduce and Sharding @akshaymathu, @_sarangs 53
  53. 53. Large Data Sets • Problem 1 – Performance • Queries go slow • Solution – Map/Reduce @akshaymathu, @_sarangs 54
  54. 54. Map Reduce • A way to divide large query computation into smaller chunks • May run in multiple processes across multiple machines • Think of it as GROUP BY of SQL @akshaymathu, @_sarangs 55
  55. 55. Map/Reduce Example • Map function digs the data and returns required values @akshaymathu, @_sarangs 56
  56. 56. Map/Reduce Example • Reduce function uses the output of Map function and generates aggregated value @akshaymathu, @_sarangs 57
  57. 57. Large Data Sets • Problem 2 – Vertical Scaling of Hardware • Can’t increase machine size beyond a limit • Solution – Sharding @akshaymathu, @_sarangs 58
  58. 58. Sharding • A method for storing data across multiple machines • Data is partitioned using Shard Keys @akshaymathu, @_sarangs 59
  59. 59. Data Partitioning: Range Based • A range of Shard Keys stay in a chunk @akshaymathu, @_sarangs 60
  60. 60. Data Partitioning: Hash Bsed • A hash function on Shard Keys decides the chunk @akshaymathu, @_sarangs 61
  61. 61. Sharded Cluster @akshaymathu, @_sarangs 62
  62. 62. Optimizing Shards: Splitting • In a shard, when size of a chunk increases, the chunk is divided into two @akshaymathu, @_sarangs 63
  63. 63. Optimizing Shards: Balancing • When number of chunks in a shard increase, a few chunks are migrated to other shard @akshaymathu, @_sarangs 64
  64. 64. Summary • MongoDB is good – Stores objects as we use in programming language – Flexible semi-structured design – Scales out to store big data – Embedded documents eliminates need for join • MongoDB is bad – No multi-document query – De-normalized storage – No support for transactions @akshaymathu, @_sarangs 65
  65. 65. Thanks @akshaymathu, @_sarangs 66 @akshaymathu @_sarangs
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×