2. 2
Who Am I?
Solutions Architect with ICF Ironworks
Part-time Adjunct Professor
Started with HTML and Lotus Notes in 1992
• In the interim there were C, C++, VB, Lotus Script, PERL, LabVIEW,
Oracle, MS SQL Server, etc.
Not so much an Early Adopter as a Fast Follower of Java
Technologies
Alphabet Soup (MCSE, ICAAD, ICASA, SCJP, SCJD, PMP, CSM)
LinkedIn: http://www.linkedin.com/in/iamjimmyray
Blog: http://jimmyraywv.blogspot.com/ Avoiding Tech-sand
4. 4
Tonight’s Agenda
Quick introduction to NoSQL and MongoDB
• Configuration
• MongoVUE
Introduction to Spring Data and MongoDB support
• Spring Data and MongoDB configuration
• Templates
• Repositories
• Query Method Conventions
• Custom Finders
• Customizing Repositories
• Metadata Mapping (including nested docs and DBRef)
• Aggregation Functions
• GridFS File Storage
• Indexes
5. 5
What is NoSQL?
Official: Not Only SQL
• In reality, it may or may not use SQL*, at least in its truest form
• Varies from the traditional RDBMS approach of the last few decades
• Not necessarily a replacement for RDBMS; more of a solution for more
specific needs where an RDBMS is not a great fit
• Content Management (including CDNs), document storage, object storage,
graph, etc.
It means different things to different folks.
• It really comes down to a different way to view our data domains for
more effective storage, retrieval, and analysis…albeit with tradeoffs
that affect our design decisions.
6. 6
From NoSQL-Database.org
“NoSQL DEFINITION: Next Generation Databases mostly
addressing some of the points: being non-relational, distributed,
open-source and horizontally scalable. The original intention has
been modern web-scale databases. The movement began early
2009 and is growing rapidly. Often more characteristics apply such
as: schema-free, easy replication support, simple API, eventually
consistent / BASE (not ACID), a huge amount of data and more.”
8. 8
Why MongoDB
Open Source
Multiple platforms (Linux, Win, Solaris, Apple) and Language Drivers
Explicitly de-normalized
Document-centric and Schema-less (for the most part)
Fast (low latency)
• Fast access to data
• Low CPU overhead
Ease of scalability (replica sets), auto-sharding
Manages complex and polymorphic data
Great for CDN and document-based SOA solutions
Great for location-based and geospatial data solutions
9. 9
Why MongoDB (more)
Because the schema-less approach is more flexible, MongoDB is
intrinsically ready for iterative (Agile) projects.
Eliminates “impedance-mismatching” with typical RDBMS solutions
“How do I model my object/document based application in 3NF?”
If you are already familiar with JavaScript and JSON, MongoDB storage
and document representation is easier to understand.
Near-real-time data aggregation support
10gen has been responsive to the MongoDB community
10. 10
What is schema-less?
A.K.A. schema-free, 10gen says “flexible-schema”
It means that MongoDB does not enforce a column data type on
the fields within your document, nor does it confine your document
to specific columns defined in a table definition.
The schema “can be” actually controlled via the application API
layers and is implied by the “shape” (content) of your documents.
This means that different documents in the same collection can
have different fields.
• So the schema is flexible in that way
• Only the _id field is mandatory in all documents.
Requires more rigor on the application side.
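As a rough illustration of the points above, the sketch below models a collection as a plain JavaScript array holding documents of different shapes, plus a small application-side check. The `employees` data and the `missingFields` helper are hypothetical and not part of MongoDB; they only show where the rigor moves when the database does not enforce a schema.

```javascript
// Two documents in the same (hypothetical) collection with different shapes.
const employees = [
  { _id: 1, firstName: "John", lastName: "Smith", title: "Senior Engineer" },
  { _id: 2, lastName: "Jones", skills: ["Java", "MongoDB"] } // no firstName or title
];

// Application-level check: returns the required fields a document lacks.
function missingFields(doc, required) {
  return required.filter((field) => !(field in doc));
}

const required = ["_id", "lastName", "title"];
for (const doc of employees) {
  console.log(doc._id, missingFields(doc, required));
}
```

Both documents would be accepted by the database; only the application-side check notices that the second one lacks a title.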
11. 11
Is MongoDB really schema-less?
Technically no.
There is the System Catalog of system collections
• <database>.system.namespaces
• <database>.system.indexes
• <database>.system.profile
• <database>.system.users
And…because of the nature of how docs are stored in collections
(JSON/BSON), field labels are stored in every doc*
12. 12
Schema tips
MongoDB has ObjectId, which can be placed in _id
• If you have a natural unique ID, use that instead
De-normalize when needed (you must know MongoDB restrictions)
• For example: Compound indexes cannot contain parallel arrays
Create indexes that cover queries
• Mongo only uses one index at a time for a query
• Watch out for sorts
• Watch out for field sequence in compound indexes.
Reduce size of collections (watch out for label sizes)
13. 13
MongoDB Data Modeling and Node Setups
Schema Design is still important
Understand your concerns
• Do you have read-intensive or write-intensive data
• Document embedding (fastest and atomic) vs. references (normalized)
• Atomicity – Document Level Only
• Can use 2-Phase Commit Pattern
• Data Durability
• Not “truly” available in a single-server setup
• Requires write concern tuning
• Need sharding and/or replicas
10gen offers patterns and documentation:
• http://docs.mongodb.org/manual/core/data-modeling/
14. 14
Why Not MongoDB
High speed and deterministic transactions:
• Banking and accounting
• See MongoDB Global Write Locking
– Improved by better yielding in 2.0
Where SQL is absolutely required
• Where true Joins are needed*
Traditional non-real-time data warehousing ops*
If your organization lacks the controls and rigor to place schema
and document definition at the application level without
compromising data integrity**
15. 15
MongoDB
Was designed to overcome some of the performance
shortcomings of RDBMS
Some Features
• Memory Mapped IO (32bit vs. 64bit)
• Fast Querying (atomic operations, embedded data)
• In place updates (physical writes lag in-memory changes)
• Depends on Write Concern settings
• Full Index support (including compound indexes, text, spherical)
• Replication/High Availability (see CAP Theorem)
• Auto Sharding (range-based partitioning, based on shard key) for
scalability
• Aggregation, MapReduce, geo-spatial
• GridFS
16. 16
MongoDB – In Place Updates
No need to get document from the server, just send update
Physical disk writes lag in-memory changes.
• Lag depends on Write-Concerns (Write-through)
• Multiple writes in memory can occur before the object is updated on
disk
MongoDB uses an adaptive allocation algorithm for storing its
objects.
• If an object changes and fits in its current location, it stays there.
• However, if it is now larger, it is moved to a new location. This moving
is expensive for index updates
• MongoDB looks at collections and based on how many times items
grow within a collection, MongoDB calculates a padding factor that tries
to account for object growth
• This minimizes object relocation
17. 17
MongoDB – A Word About Sharding…
Need to choose the right key
• Easily divisible (“splittable”– see cardinality) so that Mongo can
distribute data among shards
• “all documents that have the same value in the state field must reside on the
same shard” – 10Gen
• Enable distributed write operations between cluster nodes
• Prevents single-shard bottle-necking
• Make it possible for “Mongos” to return most query operations from
multiple shards (or single shard if you can guarantee contiguous
storage in that shard**)
• Distribute writes evenly among mongos
• Minimize disk seeks per mongos
• “users will generally have a unique value for this field (Phone)
– MongoDB will be able to split as many chunks as needed” – 10Gen
Watch out for the need to perform range queries.
18. 18
MongoDB – Cardinality…
In most cases, when sharding for performance, you want higher
cardinality to allow chunks of data to be split among shards
• Example: Address data components
• State – Low Cardinality
• ZipCode – Potentially low or high, depending on population
• Phone Number – High Cardinality
High cardinality is a good start for sharding, but…
• …it does not guarantee query isolation
• …it does not guarantee write scaling
• Consider computed keys (hashed, MD5, etc.)
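One way to get a feel for cardinality before choosing a shard key is to count distinct values per field on a sample of documents. The sketch below does that in plain JavaScript; the `addresses` sample and the `cardinality` helper are made up for illustration and say nothing about how MongoDB itself splits chunks.

```javascript
// Count the distinct values of a field across sample documents.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Hypothetical address documents.
const addresses = [
  { state: "MN", zip: "55401", phone: "612-555-0101" },
  { state: "MN", zip: "55401", phone: "612-555-0102" },
  { state: "MN", zip: "55102", phone: "651-555-0103" },
  { state: "WI", zip: "54016", phone: "715-555-0104" }
];

for (const field of ["state", "zip", "phone"]) {
  console.log(field, cardinality(addresses, field));
}
```

In this sample the state field has the fewest distinct values and phone the most, matching the low-to-high ordering on the slide.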
19. 19
CAP Theorem
Consistency – all nodes see the same data at the same time
Availability – all requests receive responses, guaranteed
Partition Tolerance (network partition tolerance)
The theorem states that you can never have all three, so you plan
for two and make the best of the third.
• For example: Perhaps “eventual consistency” is OK for a CDN
application.
• For large scalability, you would need partitioning. That leaves C & A to
choose from
• Would you ever choose consistency over availability?
How do cloud implementations change this?
21. 21
Container Models: RDBMS vs. MongoDB
RDBMS: Servers > Databases > Schemas > Tables > Rows
• Joins, Group By, ACID
MongoDB: Servers > Databases > Collections > Documents
• No Joins**
• Instead: Db References (Linking) and Nested Documents (Embedding)
22. 22
MongoDB Collections
Schema-less
Can have up to 24000 (according to 10gen)
• Cheap to resource
Contain documents (…of varying shapes)
• 100 nesting levels (version 2.2)
Are namespaces, like indexes
Can be “Capped”
• Limited in max size with rotating overwrites of oldest entries
• Logging anyone?
• Example: MongoDB oplog
TTL Collections
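The rotating-overwrite behavior of a capped collection can be pictured with a small in-memory stand-in. This is only a sketch: the `CappedLog` class is hypothetical, and it caps by document count for simplicity, whereas a real capped collection is created with a maximum size in bytes (and optionally a maximum document count).

```javascript
// Minimal in-memory stand-in for a capped collection: once maxDocs is reached,
// the oldest entry is dropped to make room for the newest one.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    if (this.docs.length === this.maxDocs) {
      this.docs.shift(); // discard the oldest entry
    }
    this.docs.push(doc);
  }
}

const log = new CappedLog(3);
["start", "connect", "query", "disconnect"].forEach((msg, i) => log.insert({ n: i, msg }));
console.log(log.docs.map((d) => d.msg));
```

After the fourth insert the "start" entry is gone, which is why capped collections suit logging-style data.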
23. 23
MongoDB Documents
JSON (what you see)
• Actually BSON (Internal - Binary JSON - http://bsonspec.org/)
Elements are name/value pairs
16 MB maximum size
What you see is what is stored
• No default fields (columns)
25. 25
JSON Syntax
Curly braces are used for documents/objects – {…}
Square brackets are used for arrays – […]
Colons are used to link keys to values – key:value
Commas are used to separate multiple objects or elements or
key/value pairs – {key1:value1, key2:value2…}
JSON has how many data types?
• 6 – String, Number, Array, Object, null, Boolean
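The syntax rules above can be checked with JavaScript's built-in `JSON.parse`. The sketch below parses one made-up document that uses each JSON value type; the `jsonType` helper is hypothetical and exists only because `typeof` reports "object" for both arrays and null.

```javascript
// Parse a JSON document that uses every JSON value type.
const text = '{"name":"Jimmy","age":42,"tags":["java","mongodb"],' +
             '"address":{"state":"MN"},"manager":null,"active":true}';
const doc = JSON.parse(text);

// typeof reports "object" for arrays and null, so check those explicitly.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", or "object"
}

for (const [key, value] of Object.entries(doc)) {
  console.log(key, jsonType(value));
}
```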
27. 27
Why BSON?
Adds data types that JSON did not support – (ISO Dates, ObjectId,
etc.)
Optimized for performance
Designed to be lightweight and fast to traverse (it is not compressed)
http://bsonspec.org/#/specification
28. 28
MongoDB Install
Extract MongoDB
Build config file, or use startup script
• Need dbpath configured
• Need REST configured for Web Admin tool
Start Mongod (daemon) process
Use Shell (mongo) to access your database
Use MongoVUE (or other) for GUI access and to learn shell
commands
30. 30
Mongo Shell
In Windows, mongo.exe
Interactive JavaScript shell to mongod
Command-line interface to MongoDB (sort of like SQL*Plus for
Oracle)
JavaScript Interpreter, behaves like a read-eval-print loop
Can be run without database connection (use --nodb)
Uses a fluent API with lazy cursor evaluation
• db.locations.find({state:'MN'},{city:1,state:1,_id:0}).sort({city:-1}).limit(5).toArray();
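To see what each link in that chain contributes, the sketch below applies the same filter, projection, descending sort, and limit to a plain JavaScript array. The `locations` sample is hypothetical, and the array methods only mimic the result of the shell query; they are not how mongod evaluates it.

```javascript
// Hypothetical sample of the "locations" collection.
const locations = [
  { _id: 1, city: "Duluth", state: "MN" },
  { _id: 2, city: "Madison", state: "WI" },
  { _id: 3, city: "Saint Paul", state: "MN" },
  { _id: 4, city: "Minneapolis", state: "MN" }
];

const result = locations
  .filter((doc) => doc.state === "MN")                   // find({state:'MN'})
  .map((doc) => ({ city: doc.city, state: doc.state }))  // {city:1,state:1,_id:0}
  .sort((a, b) => b.city.localeCompare(a.city))          // sort({city:-1})
  .slice(0, 5);                                          // limit(5)

console.log(result);
```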
31. 31
MongoVUE
GUI around MongoDB Shell
Current version 1.61 (May 2013)
Makes it easy to learn MongoDB Shell commands
• db.employee.find({ "lastName" : "Smith", "firstName" : "John" }).limit(50);
• show collections
Not sure if development is continuing, but very handy still.
Demo…
33. 33
Web Admin Interface
Localhost:<mongod port + 1000>
Quick stats viewer
Run commands
Demo
There is also Sleepy Mongoose
• http://www.kchodorow.com/blog/2010/02/22/sleepy-mongoose-a-mongodb-rest-interface/
35. 35
Other MongoDB Tools
Edda – Log Visualizer
• http://blog.mongodb.org/post/28053108398/edda-a-log-visualizer-for-mongodb
• Requires Python
MongoDB Monitoring Service
• Free Cloud based service that monitors MongoDB instances via
configured agents.
• Requires Python
• http://www.10gen.com/products/mongodb-monitoring-service
Splunk
• www.splunk.com
36. 36
MongoImport
Binary mongoimport
Syntax: mongoimport --stopOnError --port 29009 --db geo --collection geos --file C:UserDataDocsJUGsTwinCitieszips.json
Don’t use for backup or restore in production
• Use mongodump and mongorestore
37. 37
Spring Data
Large Spring project with many subprojects
• Category: Document Stores, Subproject MongoDB
“…aims to provide a familiar and consistent Spring-based
programming model…”
Like other Spring projects, Data is POJO Oriented
For MongoDB, provides high-level API and access to low-level API
for managing MongoDB documents.
Provides annotation-driven meta-mapping
Will allow you into bowels of API if you choose to hang out there
38. 38
Spring Data MongoDB Templates
Implements MongoOperations (mongoOps) interface
• mongoOps defines the basic set of MongoDB operations for the Spring
Data API.
• Wraps the lower-level MongoDB API
Provides access to the lower-level API
Provides foundation for upper-level Repository API.
Demo
42. 42
Spring Data Repositories
Convenience for data access
• Spring does ALL the work (unless you customize)
Convention over configuration
• Uses a method-naming convention that Spring interprets during
implementation
Hides complexities of Spring Data templates and underlying API
Builds implementation for you based on interface design
• Implementation is built during Spring container load.
Is typed (parameterized via generics) to the model objects you want to
store.
• When extending MongoRepository
• Otherwise uses @RepositoryDefinition annotation
Demo
43. 43
Spring Data Bulk Inserts
All things being equal, bulk inserts in MongoDB can be faster than
inserting one record at a time, if you have batch inserts to perform.
As of MongoDB 1.8, the max BSON size of a batch insert was
increased from 4MB to 16MB
• You can check this with the shell command: db.isMaster() or
mongo.getMaxBsonObjectSize() in the Java API
Batch sizes can be tuned for performance
Demo
44. 44
Transformers
Does the “heavy lifting” by preparing MongoDB objects for
insertion
Transforms Java domain objects into MongoDB DBObjects.
Demo
45. 45
Converters
For read and write, overrides default mapping of Java objects to
MongoDB documents
Implements the Spring…Converter interface
Registered with MongoDB configuration in Spring context
Handy when integrating MongoDB to existing application.
Can be used to remove “_class” field
46. 46
Spring Data Meta Mapping
Annotation-driven mapping of model object fields to Spring Data
elements in specific database dialect. – Demo
47. 47
MongoDB DBRef
Optional
Instead of nesting documents
Have to save the “referenced” document first, so that DBRef exists
before adding it to the “parent” document
50. 50
MongoDB Custom Spring Data Repositories
Hooks into Spring Data bean type hierarchy that allows you to add
functionality to repositories
Important: You must write the implementation for part of this
custom repository
And…your Spring Data repository interface must extend this
custom interface, along with the appropriate Spring Data repository
Demo
51. 51
Creating a Custom Repository
Write an interface for the custom methods
Write the implementation for that interface
Write the traditional Spring Data Repository application interface,
extending the appropriate Spring Data interface and the (above)
custom interface
When Spring starts, it will implement the Spring Data Repository
normally, and include the custom implementation as well.
52. 52
MongoDB Queries
In the mongo shell using JS: db.collection.find( <query>, <projection> )
• Use the projection to limit fields returned, and therefore network traffic
Example: db["employees"].find({"title":"Senior Engineer"})
Or: db.employees.find({"title":"Senior Engineer"},{"_id":0})
Or: db.employees.find({"title":"Senior Engineer"},{"_id":0,"title":1})
In Java use DBObject or Spring Data Query for mapping queries.
You can include and exclude fields in the projection argument.
• You either include (1) or exclude (0)
• You cannot include and exclude in the same projection, except for the
“_id” field.
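The include/exclude rules above can be sketched as a small function over plain objects. `applyProjection` is a hypothetical helper written for this illustration, not a driver or shell API, and it handles only the 1/0 forms shown on the slide.

```javascript
// Sketch of projection semantics: a projection either includes (1) or
// excludes (0) fields; only _id may be excluded within an inclusion list.
function applyProjection(doc, projection) {
  const fields = Object.keys(projection).filter((f) => f !== "_id");
  const modes = new Set(fields.map((f) => projection[f]));
  if (modes.size > 1) {
    throw new Error("cannot mix inclusion and exclusion in one projection");
  }
  const including = modes.has(1);
  const keepId = projection._id !== 0; // _id is returned unless excluded
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") {
      if (keepId) out._id = value;
    } else if (including ? projection[key] === 1 : projection[key] !== 0) {
      out[key] = value;
    }
  }
  return out;
}

const employee = { _id: 7, title: "Senior Engineer", lastName: "Bashian" };
console.log(applyProjection(employee, { _id: 0, title: 1 }));
console.log(applyProjection(employee, { _id: 0 }));
```

The first call keeps only the title; the second keeps everything except _id.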
53. 53
DBObject and BasicDBObject
For the Mongo Java driver, DBObject is the Interface,
BasicDBObject is the class
This is essentially a map with additional Mongo functionality
• See partial objects when up-serting
DBObject is used to build commands, queries, projections, and
documents
DBObjects are used to build out the JS queries that would normally
run in the shell. Each {…} is a potential DBObject.
54. 54
MongoDB Queries – And & Or
Comma denotes “and”, and you can use $and
• db.employees.find({"title":"Senior Engineer","lastName":"Bashian"},{"_id":0,"title":1})
For Or, you must use the $or operator
• db.employees.find({$or:[{"lastName":"Bashian"},{"lastName":"Baik"}]},{"_id":0,"title":1,"lastName":1})
In Java, use DBObjects and ArrayLists…
• Nest or/and ArrayLists for compound queries
Or use the Spring Data Query and Criteria classes with or criteria
Also see QueryBuilder class
Demo
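How the comma and `$or` forms are read can be shown with a tiny matcher over plain objects. This is a sketch under stated assumptions: `matches` is a hypothetical helper that handles only equality, `$and`, and `$or`, and the sample titles for Baik and Smith are invented.

```javascript
// Sketch of how a query document is read: top-level fields are ANDed together,
// while $or and $and take arrays of sub-queries. Only equality is handled here.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    return doc[key] === condition;
  });
}

const employees = [
  { title: "Senior Engineer", lastName: "Bashian" },
  { title: "Engineer", lastName: "Baik" },
  { title: "Senior Engineer", lastName: "Smith" }
];

const andQuery = { title: "Senior Engineer", lastName: "Bashian" };
const orQuery = { $or: [{ lastName: "Bashian" }, { lastName: "Baik" }] };

console.log(employees.filter((e) => matches(e, andQuery)));
console.log(employees.filter((e) => matches(e, orQuery)));
```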
57. 57
Does Field Exist
$exists
db.locations.find({user:{$exists:false}})
Type “it” for more – iterates over documents - paging
58. 58
MongoDB Advanced Queries
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
May use Mongo Java driver and BasicDBObjectBuilder
Spring Data fluent API is much easier
Demo - $in, $nin, $gt ($gte), $lt ($lte), $all, ranges
59. 59
MongoDB RegEx Queries
In JS:
db.employees.find({ "title" : { "$regex" : "seNior EngIneer" , "$options" : "i"}})
In Java use java.util.regex.Pattern
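The effect of the "i" option can be previewed with an ordinary JavaScript RegExp. This is a sketch only: it uses JavaScript's regex engine rather than the one inside mongod, and the `titles` list is made up.

```javascript
// The same case-insensitive pattern as a JavaScript RegExp.
const pattern = new RegExp("seNior EngIneer", "i");

// Hypothetical title values.
const titles = ["Senior Engineer", "senior engineer II", "Engineer"];

// The pattern is unanchored, so it matches anywhere in the string.
console.log(titles.filter((t) => pattern.test(t)));
```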
60. 60
Optimizing Queries
Use $hint or hint() in JS to tell MongoDB to use specific index
Use hint() in Java API with fluent API
Use $explain or explain() to see MongoDB query explain plan
• Number of scanned objects should be close to the number of returned
objects
61. 61
MongoDB Aggregation Functions
Aggregation Framework
Map/Reduce - Demo
Distinct - Demo
Group - Demo
• Similar to SQL Group By function
Count
Demo #7
62. 62
More Aggregation
$unwind
• Useful command to convert arrays of objects, within documents, into
sub-documents that are then searchable by query.
db.depts.aggregate({"$project":{"employees":"$employees"}},{"$unwind":"$employees"},{"$match":{"employees.lname":"Vural"}});
Demo
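What `$unwind` followed by `$match` produces can be emulated on plain arrays. The sketch below is an approximation for illustration: the `unwind` helper and the `depts` sample are hypothetical, and only the basic case of an array of embedded objects is covered.

```javascript
// Emulates $unwind: emit one copy of the document per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Hypothetical department documents with embedded employees.
const depts = [
  { _id: 1, name: "Engineering", employees: [{ lname: "Vural" }, { lname: "Baik" }] },
  { _id: 2, name: "Sales", employees: [{ lname: "Smith" }] }
];

const unwound = unwind(depts, "employees");
// Equivalent of {"$match":{"employees.lname":"Vural"}} after the unwind.
const matched = unwound.filter((doc) => doc.employees.lname === "Vural");
console.log(unwound.length, matched);
```

After unwinding, each result carries a single employee object instead of an array, which is what makes the field matchable per employee.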
64. 64
MongoDB GridFS
“…specification for storing large files in MongoDB.”
As the name implies, “Grid” allows the storage of very large files
divided across multiple MongoDB documents.
• Uses native BSON binary formats
16MB per document
• Will be higher in future
Large files added to GridFS get chunked and spread across
multiple documents.
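The chunking idea can be sketched in a few lines. The `toChunks` helper, the "file-1" id, and the 4-byte chunk size are all hypothetical and chosen to keep the output readable; the `files_id`, `n`, and `data` field names follow the layout GridFS uses for its chunk documents, but this is not the driver's implementation.

```javascript
// Split file content into GridFS-style chunk documents.
// files_id links each chunk to its file; n is the chunk's sequence number.
function toChunks(filesId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Tiny chunk size for illustration only; real chunk sizes are far larger.
const file = Buffer.from("0123456789");
const chunks = toChunks("file-1", file, 4);
console.log(chunks.map((c) => `${c.n}:${c.data.toString()}`));

// Reading the file back means concatenating the chunks in order of n.
const restored = Buffer.concat(chunks.map((c) => c.data));
```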
67. 67
MongoDB Indexes
Similar to RDBMS Indexes, Btree (support range queries)
Can have many
Can be compound
• Including indexes of array fields in document
Makes searches, aggregates, and group functions faster
Makes writes slower
Sparse = true
• Only include documents in this index that actually contain a value in the
indexed field.
68. 68
Text Indexes
Currently in BETA, as of 2.4, not recommended for
production…yet
Must be enabled in mongod
• --setParameter textSearchEnabled=true
In mongo (shell)
• db["employees"].ensureIndex({"title":"text"})
• Index “title” field with text index
70. 70
GEO Spatial Operations
One of MongoDB’s sweet spots
Used to store, index, search on geo-spatial data for GIS
operations.
Requires special indexes, 2d and 2dsphere (new with 2.4)
Requires Longitude and Latitude (in that order) coordinates
contained in double precision array within documents.
Demo
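The longitude-first ordering is easy to get backwards, so here is a small sketch that stores a point as [longitude, latitude] and measures a great-circle distance with the haversine formula. The `haversineKm` helper, the `place` document, and the approximate coordinates are assumptions for illustration; this is not MongoDB's geospatial implementation.

```javascript
// Great-circle distance in kilometers between two [longitude, latitude] pairs.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371; // mean Earth radius in km
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Hypothetical document: note that longitude comes first.
const place = { name: "Minneapolis", loc: [-93.265, 44.978] };
const saintPaul = [-93.09, 44.954]; // approximate coordinates

console.log(haversineKm(place.loc, saintPaul).toFixed(1), "km");
```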
72. 72
Query Pagination
Use Spring Data and QueryDSL - http://www.querydsl.com/
Modify the Spring Data repo to extend QueryDslPredicateExecutor
Add appropriate Maven POM entries for QueryDSL
Use Page and PageRequest objects to page through result sets
QueryDSL will create Q<MODEL> Java classes
• Saves developers from writing pagination code
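Under the hood, paging comes down to a skip and a limit. The sketch below shows that arithmetic on a plain array, assuming a zero-based page number as Spring Data's PageRequest uses; `pageWindow` and `getPage` are hypothetical helpers, not Spring Data or QueryDSL APIs.

```javascript
// Zero-based page number and page size translated to skip/limit values.
function pageWindow(page, size) {
  return { skip: page * size, limit: size };
}

// Slice an in-memory array the way a paged query would.
function getPage(items, page, size) {
  const { skip, limit } = pageWindow(page, size);
  return items.slice(skip, skip + limit);
}

// Hypothetical result set of seven items, three per page.
const items = ["a", "b", "c", "d", "e", "f", "g"];
console.log(getPage(items, 0, 3)); // first page
console.log(getPage(items, 2, 3)); // last, partial page
```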
73. 73
Save vs. Update
Java driver save() saves entire document.
Use “update” to save time and bandwidth, and possibly indexing.
• Spring Data is slightly slower than lower level mongo Java driver
• Spring data fluent API is very helpful.
75. 75
MongoDB Auth Security
Use --auth switch to enable
Create users with roles
Use db.authenticate in the code (if need be)
76. 76
MongoDB Auth Security with Spring
May need to add credentials to Spring MongoDB config
Do not authenticate twice
java.lang.IllegalStateException: can't call authenticate twice on
the same DBObject
at com.mongodb.DB.authenticate(DB.java:476)
77. 77
MongoDB Write Concerns
Describes quality of writes (or write assurances) to MongoDB
Application (MongoDB client) is concerned with this quality
Write concerns describe the durability of a write, and can be tuned
based on application and data needs
Adjusting write concerns can have an effect (maybe deleterious)
on write performance.
78. 78
MongoDB Encryption
MongoDB does not support data encryption, per se
Use application-level encryption and store encrypted data in BSON
fields
Or…use TDE (Transparent Data Encryption) from Gazzang
• http://www.gazzang.com/encrypt-mongodb
79. 79
MongoDB Licensing
Database
• “Free Software Foundation's GNU AGPL v3.0.” – 10gen
• “Commercial licenses are also available from 10gen, including free
evaluation licenses.” – 10gen
Drivers (API):
• “mongodb.org supported drivers: Apache License v2.0.” – 10gen
• “Third parties have created drivers too; licenses will vary there.” –
10gen
80. 80
MongoDB 2.2
Drop-in replacement for 1.8 and 2.0.x
Aggregation without Map Reduce
TTL Collections (alternative to Capped Collections)
Tag-aware Sharding
http://docs.mongodb.org/manual/release-notes/2.2/
81. 81
MongoDB 2.4
Text Search
• Must be enabled, off by default
• Introduces considerable overhead for processing and storage
• Not recommended for PROD systems; it is a BETA feature.
Hashed Index and sharding
http://docs.mongodb.org/manual/release-notes/2.4/
82. 82
New JavaScript Engine – V8
MongoDB 2.4 uses the Google V8 JavaScript Engine
• https://code.google.com/p/v8/
• Open source, written in C++,
• High performance, with improved concurrency for multiple JavaScript
operations in MongoDB at the same time.
83. 83
Some Useful Commands
use <db> - connects to a DB
use admin; db.runCommand({top:1})
• Returns info about collection activity
db.currentOp() – returns info about operations currently running in MongoDB
db.serverStatus()
db.hostInfo()
db.isMaster()
db.runCommand({"buildInfo":1})
it
db.runCommand({touch:"employees",data:true,index:true})
• { "ok" : 1 }
Demo
Run EmployeeLoader.java
Run DistinctTest.java
Run EmployeeDeptGroupTest.java or Run EmployeeTitleGroupTest.java
Run MapReduceTest.java (Show MapReduce.groovy)