MongoDB (for Java Developers)
Anthony Slabinck
Who am I?
• Internship at Provikmo
• 3 years 6 months
• Competitive cyclist
What is MongoDB?
• The leading NoSQL database (http://db-engines.com/en/)
• Open source
• Non-relational JSON document store
• BSON (Binary JSON)
• Dynamic schema
• Agile
• Scalable through replication and sharding
3
The leading NoSQL database
4
• LinkedIn Job Skills
• Google Search
• Indeed.com Trends
MongoDB relative to relational databases
5
Who uses MongoDB?
6
By use case
• Single View
• Internet of Things
• Mobile
• Real-Time Analytics
• Personalization
• Content Management
• Catalog
7
From relational databases to MongoDB
8
{
first_name: "Anthony",
surname: "Slabinck",
city: "Bruges",
location: [45.123,47.232],
cars: [
{ model: "Bentley",
year: 1973,
value: 100000 },
{ model: "Rolls Royce",
year: 1965,
value: 330000 } ]
}
MongoDB is full featured
9
MongoDB CRUD Operations
10
Documents
MongoDB CRUD Operations
11
Collections
MongoDB CRUD Operations
12
Read operations
MongoDB CRUD Operations
13
Read operations
MongoDB CRUD Operations
14
Write operations - insert
MongoDB CRUD Operations
15
Write operations - update
MongoDB CRUD Operations
16
Write operations - remove
Installation
• Download MongoDB from http://www.mongodb.org/downloads
• Download the Java Driver (maven)
• mongod
• Daemon process
• mongo
• Interactive JavaScript shell interface
• Robomongo
• Cross-platform management tool
17
Getting started with MongoDB
18
Demo
Data Models
• Flexible schema
• Collections do not enforce document structure
• Consider how applications will use your database
• No foreign keys, no joins
• Relationships between data
• Embedded documents
• References
• Documents require a unique _id field that acts as a primary key
19
Data Models
• Denormalized
• Better read performance
• Single atomic write operation
• Document growth
• Dot notation
20
Embedded Data Models
Data Model
• One-to-One Relationship
21
Embedded Data Models
{
_id: "infasla",
name: "Anthony Slabinck",
address: {
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
}
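Dot notation reaches into an embedded document such as the address above. The following is a minimal sketch in plain JavaScript, not MongoDB's own implementation; the helper name getByPath is hypothetical and only illustrates how a path like "address.city" is resolved one key at a time.

```javascript
// Hypothetical helper: resolve a dot-notation path such as "address.city"
// against a plain JavaScript object. It mimics the idea only; it is not
// how the MongoDB server evaluates queries.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const person = {
  _id: "infasla",
  name: "Anthony Slabinck",
  address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
};

console.log(getByPath(person, "address.city"));    // prints "Faketon"
console.log(getByPath(person, "address.country")); // prints undefined
```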
Data Model
• One-to-Many Relationship
22
Embedded Data Models
{
_id: "infasla",
name: "Anthony Slabinck",
addresses: [
{ street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345" },
{ street: "1 Other Street",
city: "Boston",
state: "MA",
zip: "12345"
}
]
}
Data Model
• Normalized
• Use when embedding would result in duplication of data
• Represents complex many-to-many relationships
• Requires follow-up queries to resolve references
23
References
Data Model
• One-to-Many Relationship
{ _id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
{ _id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
{ _id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}
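Because there are no joins, a reference such as publisher_id is resolved by the application with a follow-up query. A minimal sketch of that idea, using in-memory arrays as stand-ins for the two collections; the function names are hypothetical and the documents are trimmed to the fields that matter here.

```javascript
// In-memory stand-ins for two collections (illustrative only).
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" }
];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" }
];

// First lookup: fetch the book. Follow-up lookup: fetch the publisher it references.
function findPublisherOfBook(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  return publishers.find((p) => p._id === book.publisher_id) || null;
}

// The "many" side: every book that references a given publisher.
function findBooksOfPublisher(publisherId) {
  return books.filter((b) => b.publisher_id === publisherId);
}

console.log(findPublisherOfBook(123456789).name);    // prints "O'Reilly Media"
console.log(findBooksOfPublisher("oreilly").length); // prints 2
```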
24
References
Model Tree Structures
• Parent references
• Child references
• Array of Ancestors
• Materialized Paths
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
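With parent references, finding all ancestors of a node takes one lookup per level. A minimal in-memory sketch of that walk over the categories above; the helper names ancestorsOf and childrenOf are hypothetical, and a Map stands in for lookups by _id.

```javascript
// The same documents as in the inserts above, held in memory for illustration.
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null }
];
const categoryById = new Map(categories.map((c) => [c._id, c]));

// Walk up the tree: one lookup per level until the root (parent === null).
function ancestorsOf(id) {
  const path = [];
  let node = categoryById.get(id);
  while (node && node.parent !== null) {
    path.push(node.parent);
    node = categoryById.get(node.parent);
  }
  return path;
}

// Direct children are found by matching on the parent field.
function childrenOf(id) {
  return categories.filter((c) => c.parent === id).map((c) => c._id);
}

console.log(ancestorsOf("MongoDB"));  // prints [ 'Databases', 'Programming', 'Books' ]
console.log(childrenOf("Databases")); // prints [ 'MongoDB', 'dbm' ]
```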
25
GridFS
• BSON-document size limit of 16MB
• Divides a file into parts, or chunks, and stores each chunk as a separate document
• Two collections
• File chunks
• File metadata
• Reassemble chunks as needed
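The chunking idea can be sketched in plain JavaScript. This is not the driver's GridFS API; the function names are hypothetical, the field names (files_id, n, data, length, chunkSize) are loosely modelled on the GridFS collections, and using the file name as _id is a simplification.

```javascript
// Sketch: split a Buffer into fixed-size chunk documents plus one metadata document.
function toGridDocuments(filename, data, chunkSize) {
  const fileDoc = { _id: filename, length: data.length, chunkSize: chunkSize };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunkDocs.push({
      files_id: fileDoc._id,                            // points back at the metadata document
      n: n,                                             // sequence number of the chunk
      data: data.subarray(offset, offset + chunkSize)   // last chunk may be shorter
    });
  }
  return { fileDoc, chunkDocs };
}

// Reassemble: order the chunks by sequence number and concatenate them.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const original = Buffer.from("hello gridfs world"); // 18 bytes
const { fileDoc, chunkDocs } = toGridDocuments("greeting.txt", original, 5);
console.log(chunkDocs.length);                          // prints 4
console.log(reassemble(chunkDocs).toString("utf8"));    // prints "hello gridfs world"
```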
26
Capped Collections
• Fixed-size collections
• Insert and retrieve documents based on insertion order
• Automatically removes the oldest documents to make room for new ones
• Ideal for logging
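The behaviour above resembles a fixed-size buffer. A minimal sketch, assuming a limit expressed as a number of documents; the class name CappedLog is hypothetical, and a real capped collection is bounded primarily by size in bytes.

```javascript
// Sketch of capped-collection behaviour: insertion order is kept and the
// oldest document is dropped once the limit is exceeded.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // evict the oldest entry
    }
  }

  // Documents come back in insertion order.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, msg: "event " + i });
}
console.log(log.find().map((d) => d.seq)); // prints [ 3, 4, 5 ]
```

In the mongo shell, a capped collection is created with db.createCollection("log", { capped: true, size: 100000 }), where the collection name and size are example values.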
27
Aggregation
• Operations that process data records and return computed results
• Simplifies application code
• Limits resource requirements
• Aggregation modalities
• Aggregation pipelines
• Map-Reduce
• Single purpose aggregation operations
28
Aggregation
• Stages
• Preferred method
29
Aggregation pipelines
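A pipeline passes documents through a sequence of stages, each stage transforming the output of the previous one. The sketch below imitates that flow in plain JavaScript; the orders data and the helper names matchStage, groupSumStage, and runPipeline are assumptions made for illustration.

```javascript
// Illustrative data; not taken from the slides.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// A stage is a function from a list of documents to a list of documents.
const matchStage = (predicate) => (docs) => docs.filter(predicate);

const groupSumStage = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total: total }));
};

// Run the stages in order, feeding each stage the previous stage's output.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const totalsByCustomer = runPipeline(orders, [
  matchStage((d) => d.status === "A"),
  groupSumStage("cust_id", "amount")
]);
console.log(totalsByCustomer);
// prints [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]

// Roughly the same computation in the mongo shell, assuming an orders collection:
// db.orders.aggregate([
//   { $match: { status: "A" } },
//   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
// ])
```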
Aggregation
• Two phases
• JavaScript functions
• Less efficient and more complex than the aggregation pipeline
30
Map-Reduce
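The two phases can be sketched in plain JavaScript: a map function emits key/value pairs and a reduce function combines the values collected for each key. The helper mapReduce and the sales data are hypothetical. In MongoDB's own map-reduce the map function reads the current document through this and calls a global emit; here both are passed explicitly to keep the sketch self-contained.

```javascript
// Minimal in-memory map-reduce: map phase groups emitted values by key,
// reduce phase folds each group into a single value.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    mapFn(doc, emit); // map phase
  }
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduceFn(key, values) }); // reduce phase
  }
  return results;
}

// Illustrative data; not taken from the slides.
const sales = [
  { cust_id: "A123", amount: 500 },
  { cust_id: "A123", amount: 250 },
  { cust_id: "B212", amount: 200 },
  { cust_id: "A123", amount: 300 }
];

const totals = mapReduce(
  sales,
  (doc, emit) => emit(doc.cust_id, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(totals);
// prints [ { _id: 'A123', value: 1050 }, { _id: 'B212', value: 200 } ]
```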
Aggregation
• Simple
• Count
• Distinct
• Grouping
31
Single purpose aggregation operations
Indexes
• Efficient execution of queries
• Data structure
• Stores the value of a specific field or set of fields, ordered by the value of the field
• Create indexes that support your common and user-facing queries
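Why an ordered structure speeds up queries can be shown with a small sketch. A sorted array with binary search stands in for the B-tree that MongoDB uses; the function names buildIndex and findEqual and the sample documents are hypothetical.

```javascript
// Build a single-field "index": entries of (key, position) sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position: position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Equality lookup: binary search for the first matching entry, then read
// forward while the key still matches. No full scan of docs is needed.
function findEqual(index, docs, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  const matches = [];
  for (let i = lo; i < index.length && index[i].key === value; i++) {
    matches.push(docs[index[i].position]);
  }
  return matches;
}

const people = [
  { name: "Ann", zipcode: "90210" },
  { name: "Bob", zipcode: "10001" },
  { name: "Cy", zipcode: "90210" },
  { name: "Di", zipcode: "30301" }
];
const zipIndex = buildIndex(people, "zipcode");
console.log(findEqual(zipIndex, people, "90210").length); // prints 2
console.log(findEqual(zipIndex, people, "00000").length); // prints 0
```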
32
Indexes
• Default _id
• Single Field
• Compound Index
• Multikey Index
• Geospatial Index
• Text Indexes
• Hashed Indexes
33
Types
Indexes
• Unique Indexes
• Sparse Indexes
• TTL Indexes
34
Properties
Indexes
• db.people.ensureIndex( { zipcode: 1 } )
• db.people.ensureIndex( { zipcode: 1 }, { background: true } )
• db.people.ensureIndex( { zipcode: 1 }, { background: true, sparse: true } )
• db.accounts.ensureIndex( { username: 1 }, { unique: true, dropDups: true } )
35
Creation
Replication
• What?
• Synchronizing data across multiple servers
• Purpose?
• Provides redundancy and increases data availability
36
Replication
• A group of mongod instances that host the same data set
• Primary receives all write operations
• Primary logs all changes in its oplog
• Secondaries apply operations from the primary
37
Replica set
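The primary/oplog/secondary flow can be sketched in memory. The classes and the shape of the oplog entries below are assumptions for illustration, not MongoDB's actual oplog format; the point is only that a secondary catches up by replaying the entries it has not applied yet.

```javascript
// The primary applies each write locally and appends it to its operation log.
class PrimarySketch {
  constructor() {
    this.data = new Map();
    this.oplog = [];
  }
  write(id, doc) {
    this.data.set(id, doc);
    this.oplog.push({ ts: this.oplog.length + 1, op: "set", id: id, doc: doc });
  }
  remove(id) {
    this.data.delete(id);
    this.oplog.push({ ts: this.oplog.length + 1, op: "delete", id: id });
  }
}

// A secondary remembers the last entry it applied and replays newer ones in order.
class SecondarySketch {
  constructor() {
    this.data = new Map();
    this.lastApplied = 0;
  }
  syncFrom(primary) {
    for (const entry of primary.oplog) {
      if (entry.ts <= this.lastApplied) continue; // already applied
      if (entry.op === "set") this.data.set(entry.id, entry.doc);
      else this.data.delete(entry.id);
      this.lastApplied = entry.ts;
    }
  }
}

const primary = new PrimarySketch();
const secondary = new SecondarySketch();
primary.write("a", { x: 1 });
secondary.syncFrom(primary);
primary.remove("a");
primary.write("b", { x: 2 });
secondary.syncFrom(primary);
console.log([...secondary.data.keys()]); // prints [ 'b' ]
console.log(secondary.lastApplied);      // prints 3
```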
Replication
• Arbiter
• Does not maintain a data set
• Only exists to vote
38
Replica set
Replication
39
Replica set
• Automatic failover
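Failover depends on the remaining members being able to elect a new primary, which requires a majority of votes. A simplified sketch, assuming every member has exactly one vote and ignoring priorities and data freshness; the function name and member fields are hypothetical.

```javascript
// A new primary can be elected when a majority of all members is reachable
// and at least one reachable member holds data (an arbiter only votes).
function canElectPrimary(members) {
  const reachableVotes = members.filter((m) => m.reachable).length;
  const hasCandidate = members.some((m) => m.reachable && m.type !== "arbiter");
  return hasCandidate && reachableVotes > members.length / 2;
}

// Two data-bearing members plus an arbiter; the old primary has failed.
const afterPrimaryFailure = [
  { name: "A", type: "data", reachable: false },
  { name: "B", type: "data", reachable: true },
  { name: "C", type: "arbiter", reachable: true }
];
console.log(canElectPrimary(afterPrimaryFailure)); // prints true

// Without the arbiter, the single survivor has no majority.
const twoMembers = [
  { name: "A", type: "data", reachable: false },
  { name: "B", type: "data", reachable: true }
];
console.log(canElectPrimary(twoMembers)); // prints false
```

This is why an arbiter is useful in a set with an even number of data-bearing members: it breaks the tie without storing data.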
Replication
• Additional features:
• Read preference
• Priority
• Hidden members
• Delayed members
40
Replica set
Sharding
• What?
• Storing data across multiple machines
• When?
• High query rates exhaust the CPU capacity of the server
• Larger data sets exceed the storage capacity of a single machine
• Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives
41
Sharding
• Adds more CPU and storage
42
Vertical scaling – scale up (chart: price versus scale)
Sharding
• Distributes the data
43
Horizontal scaling – scale out (chart: price versus scale)
Sharding
• Shards store the data
• Query Routers interface with client applications and direct operations
• Config servers store the cluster’s metadata
44
Sharded cluster
Sharding
• Collection level
• Shard key
• Indexed field or an indexed compound field that exists in every document
• Chunks
• Range based partitioning
• Hash based partitioning
• Automatic balancing
45
Data partitioning
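The two partitioning strategies can be contrasted with a small sketch. The function names, the boundary values, and the hash are assumptions for illustration: the hash is a simple FNV-1a-style function chosen for brevity, not the one MongoDB uses, and real chunks are split and balanced automatically.

```javascript
// Range-based: the shard key space is cut at sorted boundaries; keys that are
// close together land in the same chunk. Returns the chunk number, where
// chunk i holds keys below boundaries[i] and the last chunk holds the rest.
function rangeChunk(key, boundaries) {
  let i = 0;
  while (i < boundaries.length && key >= boundaries[i]) i++;
  return i;
}

// Hash-based: the key is hashed first, so neighbouring keys are likely to be
// spread over different chunks.
function hashChunk(key, chunkCount) {
  let h = 0x811c9dc5;
  for (const ch of String(key)) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h % chunkCount;
}

console.log(rangeChunk(5, [10, 20]));  // prints 0
console.log(rangeChunk(10, [10, 20])); // prints 1
console.log(rangeChunk(25, [10, 20])); // prints 2

// The same key always maps to the same chunk, within 0..chunkCount-1.
console.log(hashChunk("user42", 3) === hashChunk("user42", 3)); // prints true
```

Range-based partitioning keeps range queries on few shards, while hash-based partitioning tends to distribute writes more evenly at the cost of scattering range queries.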
MongoDB Architecture
46
MongoDB at scale
• Cluster scale
• Distributing across 100+ nodes in multiple data centers
• Performance scale
• 100K+ database reads and writes per second while maintaining strict SLAs
• Data scale
• Storing 1B+ documents in the database
47
Metrics
Lower TCO
• Dev/Ops savings
• Ease of use
• Fast, iterative development
• Hardware savings
• Commodity hardware
• Scale out
• Software/Support savings
• No upfront licence
48
Relational database
POJO Mappers
• Morphia
• Spring Data MongoDB
• Hibernate OGM
49
Resources
• http://docs.mongodb.org/manual/
• https://university.mongodb.com/
• M101J: MongoDB for Java Developers
• M102: MongoDB for DBAs
50
Building an App with MongoDB
51
Demo
Questions?
52
