When you use MongoDB for the first time, the biggest risk is applying the same patterns and designs used in the SQL world. That way you miss the real change that MongoDB requires: changing the way you think.
Application Design for MongoDB
1. Application Design for MongoDB
Alessandro Palumbo
apalumbo@byte-code.com
http://it.linkedin.com/in/alessandropalumbo/
http://www.byte-code.com
Except where otherwise noted, this work is licensed under: http://creativecommons.org/licenses/by/3.0/
MongoDB
NoSQL
Open-source
Document-oriented
JSON-style documents
The name comes from "humongous": "huge; enormous"
Don't be relational
No joins: we can embed
No full transactions: document-level transactions
No schema: is it really an issue?
Friendly fire
Design
Performance
(aka RTFM)
Atomic document operations
Write concern
Read preference
Avoid natural keys as identifiers
Design for query
DBRefs vs manual references
Dynamic schema vs static languages
Embedded data vs references
Be aware of the trees
Be careful with dates
Preallocate fields?
Split data on multiple collections
Tuning updates and inserts
Pure driver vs mapping frameworks
Document moving slows you down
Preprocess high resolution data
6. FRIENDLY FIRE
Atomic document operations
Operations on multiple documents are not atomic: there is no "all or nothing".
Embedding or application-level transactions can be used to handle the issue.
Relational transactions are not totally safe.
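The embedding option above can be sketched as follows. This is a minimal illustration, not driver code: the order document, its field names, and the `apply_update` helper are all hypothetical, and the helper only mimics in memory what a single-document update with `$push` and `$inc` would do on the server. The point is that the order and its line items live in one document, so one update changes both together.

```python
import copy

# Hypothetical order document: line items are embedded, so the order and
# its items can be changed by a single-document (atomic) update.
order = {
    "_id": 1001,
    "status": "open",
    "total_cents": 2500,
    "items": [{"sku": "A-1", "qty": 1, "price_cents": 2500}],
}

# Update document written in MongoDB's update-operator syntax.
add_item = {
    "$push": {"items": {"sku": "B-7", "qty": 2, "price_cents": 900}},
    "$inc": {"total_cents": 1800},
}

def apply_update(doc, update):
    """In-memory stand-in for a single-document update (assumption: only
    $push and $inc on top-level fields are supported here)."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    return result

updated = apply_update(order, add_item)
```

Had the items been stored as separate documents, adding an item and adjusting the total would be two independent writes, and a failure between them would leave the data inconsistent.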
Write concern
"Describes the guarantee that MongoDB provides when reporting on the success of a write operation"
It is set by the client and can be set for each operation.
Errors Ignored
Unacknowledged
Acknowledged (*)
Journaled
Replica Acknowledged (> 1, majority, or custom using tags)
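As a rough sketch of how a client might choose one of these levels, the snippet below maps them to the `w` and `journal` connection-string options and builds a URI. The host names, the database name, and the `mongo_uri` helper are assumptions made for illustration. Errors Ignored is left out because no connection-string equivalent is assumed for it here.

```python
from urllib.parse import urlencode

# Write concern levels expressed as connection-string options.
WRITE_CONCERNS = {
    "unacknowledged": {"w": 0},
    "acknowledged": {"w": 1},
    "journaled": {"w": 1, "journal": "true"},
    "replica_acknowledged": {"w": 2},        # any value > 1
    "majority": {"w": "majority"},
}

def mongo_uri(hosts, database, options):
    """Hypothetical helper: assemble a mongodb:// URI with the given options."""
    return f"mongodb://{hosts}/{database}?{urlencode(options)}"

uri = mongo_uri(
    "db1.example.net,db2.example.net",   # hypothetical hosts
    "shop",                              # hypothetical database
    WRITE_CONCERNS["journaled"],
)
```

Drivers generally also let you override the write concern per operation, which matches the "can be set for each operation" point above.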
Read preference
"It describes how MongoDB clients route read operations to members of a replica set"
It is set by the client and can be set for each operation.
primary (*)
primaryPreferred
secondary
secondaryPreferred
nearest
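The five modes can be illustrated with a small routing simulation. This is a simplified model, not how a driver is implemented: `pick_member`, the member dictionaries, and the host names are hypothetical, and real drivers choose among members within a latency window rather than always taking the single lowest ping.

```python
def pick_member(mode, members):
    """Return the host a read would be routed to under the given mode."""
    primaries = [m for m in members if m["role"] == "primary"]
    secondaries = [m for m in members if m["role"] == "secondary"]
    # Candidate groups for each mode, in order of preference.
    preference = {
        "primary": [primaries],
        "primaryPreferred": [primaries, secondaries],
        "secondary": [secondaries],
        "secondaryPreferred": [secondaries, primaries],
        "nearest": [primaries + secondaries],
    }
    for group in preference[mode]:
        if group:
            return min(group, key=lambda m: m["ping_ms"])["host"]
    raise LookupError(f"no member satisfies read preference {mode!r}")

# Hypothetical three-member replica set.
replica_set = [
    {"host": "a:27017", "role": "primary", "ping_ms": 20},
    {"host": "b:27017", "role": "secondary", "ping_ms": 5},
    {"host": "c:27017", "role": "secondary", "ping_ms": 9},
]
```

With the primary unavailable, `primary` fails while `primaryPreferred` falls back to a secondary, which is the practical difference between the two.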
Avoid natural keys as identifiers
All collections have an index on the _id field, which exists by default. If _id is not provided, the driver or mongod will create an _id field with an ObjectId value.
Add a unique index on the natural key instead: sometimes the application realm evolves in an unexpected way.
Remember that on a sharded collection a unique index must include the shard key as a prefix.
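A minimal sketch of the idea, assuming a hypothetical customer whose natural key is an email address. `uuid4` is used only as a stand-in for the ObjectId a driver would generate, and the index specification is shown as plain data in the list-of-pairs form that pymongo's `create_index(keys, unique=True)` accepts.

```python
import uuid

def new_customer(email, name):
    """Hypothetical factory: surrogate _id, natural key kept as a plain field."""
    # uuid4 stands in for a driver-generated ObjectId (assumption).
    return {"_id": uuid.uuid4().hex, "email": email, "name": name}

# Key specification for a unique index on the natural key.
UNIQUE_EMAIL_KEYS = [("email", 1)]

customer = new_customer("ada@example.com", "Ada")
order = {"_id": uuid.uuid4().hex, "customer_id": customer["_id"]}

# The natural key changes: only one field in one document is touched,
# and every reference to the customer stays valid.
customer["email"] = "ada@example.org"
```

If the email had been used as `_id`, the same change would mean rewriting the customer document under a new identifier and updating every document that points to it.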
Design for query
Document design is functional to the queries that will exist in the application.
Reference or embed documents: "denormalized" is not always a bad word.
Your document design will affect which operations are safe and which are not.
Embedded data vs references
Embedded data models allow applications to store related pieces of information in the same database record.
Usually there is a "contains" relation between the embedding and the embedded object.
The maximum BSON document size is 16 megabytes, and embedding may lead to performance issues if not used correctly.
Embedded data vs references
Normalized data models describe relationships using references between documents.
References provide more flexibility than embedding, but remember that client-side applications will have to look up referenced objects with multiple queries.
No referential integrity is supported: references could point to a non-existent object.
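Both models can be shown side by side in a small in-memory sketch. The two dictionaries stand in for collections, and every name in them is hypothetical. Comments are embedded in the post (a "contains" relation), while the author is referenced by id and has to be fetched with a second lookup that may find nothing.

```python
# Dictionaries keyed by _id stand in for two collections.
authors = {1: {"_id": 1, "name": "Ada"}}
posts = {
    10: {
        "_id": 10,
        "title": "Embedding",
        "author_id": 1,                                  # reference
        "comments": [{"by": "bob", "text": "nice"}],     # embedded data
    },
    11: {"_id": 11, "title": "Orphan", "author_id": 99, "comments": []},
}

def load_post_with_author(post_id):
    """Client-side 'join': two lookups, and the second one may come back empty."""
    post = posts[post_id]                       # first query
    author = authors.get(post["author_id"])     # second query; None if dangling
    return post, author

post, author = load_post_with_author(10)
orphan, missing_author = load_post_with_author(11)
```

Nothing stopped post 11 from pointing at author 99, so the application has to handle the missing case itself.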
DBRefs vs manual references
DBRefs are a convention for representing a document: a DBRef holds the collection name, the _id, and optionally the database name.
Manual references are just fields that hold the _id of the related document, without the collection name or the database name.
Manual references are suitable for most use cases.
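The two shapes look like this when written as plain documents. The `$ref`, `$id`, and `$db` field names and their order follow the DBRef convention; the `make_dbref` helper, the collection names, and the sample post are hypothetical.

```python
def make_dbref(collection, doc_id, database=None):
    """Build a DBRef-shaped sub-document: $ref, then $id, then optional $db."""
    ref = {"$ref": collection, "$id": doc_id}
    if database is not None:
        ref["$db"] = database
    return ref

# Manual reference: just the _id of the related document.
post_manual = {"_id": 10, "title": "References", "author_id": 1}

# DBRef: also carries the collection name and, optionally, the database name.
post_dbref = {
    "_id": 10,
    "title": "References",
    "author": make_dbref("authors", 1, "blog"),
}
```

With a manual reference the application already knows which collection `author_id` points to, which is why it is enough in most cases.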
Be careful with dates
Always use a BSON Date when a value represents an instant in time, or you will not be able to use date operators on those fields.
BSON Date is a 64-bit signed integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). Negative values represent dates before 1970.
The official BSON specification refers to the BSON Date type as the UTC datetime.
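The millisecond representation described above can be reproduced with the standard library alone. This is a sketch of the arithmetic, not of any driver's encoder; the helper names are made up, and rejecting naive datetimes is a design choice of this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_bson_millis(dt):
    """Milliseconds since the Unix epoch, as a signed integer."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_bson_millis(ms):
    """Inverse conversion, always returning a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

y2k = to_bson_millis(datetime(2000, 1, 1, tzinfo=timezone.utc))
before_epoch = to_bson_millis(
    datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
)
```

Because the stored value is an instant in UTC, any local time zone has to be applied by the application when it reads the value back.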
Split data on multiple collections
Split data on multiple collections to easily partition your data (a.k.a. multitenancy).
Use collections as namespaces for your data.
Remember that once data is partitioned it will be harder to aggregate if needed.
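A small sketch of the trade-off, using an in-memory dictionary in place of a database. The naming rule (`events_<tenant>`), the tenant ids, and the sample documents are assumptions made for this example.

```python
import re

def tenant_collection(tenant_id, base="events"):
    """Hypothetical naming rule: one collection per tenant, e.g. events_acme."""
    if not re.fullmatch(r"[a-z0-9]+", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f"{base}_{tenant_id}"

# Dict of lists standing in for a database with one collection per tenant.
database = {
    tenant_collection("acme"): [{"amount": 10}, {"amount": 5}],
    tenant_collection("globex"): [{"amount": 7}],
}

def total_for(tenant_id):
    """Per-tenant query: touches exactly one collection."""
    return sum(doc["amount"] for doc in database[tenant_collection(tenant_id)])

def total_all_tenants():
    """Cross-tenant aggregation: has to visit every collection in turn."""
    return sum(doc["amount"] for docs in database.values() for doc in docs)
```

The per-tenant query stays simple, while the cross-tenant total needs application code that knows about every collection.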
Dynamic schema vs static languages
Why use a dynamic schema if we are not using a dynamic programming language?
Inheritance is not only a matter of hierarchy; it can also be a matter of composition.
Composition is the key to introducing a dynamic schema in a static programming language.
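One way to read the composition idea is sketched below. Python with type hints stands in for a statically typed language here, and the `Product` class, its fields, and the `attributes` map are hypothetical: the fixed fields are declared in the class, while the dynamic part of the schema is composed in as a map instead of being modeled with subclasses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Product:
    # Fixed, statically declared part of the schema.
    sku: str
    name: str
    # Dynamic part: composed in as a map rather than one subclass per product kind.
    attributes: Dict[str, Any] = field(default_factory=dict)

    def to_document(self) -> Dict[str, Any]:
        """Flatten the object into a document-shaped dict."""
        return {
            "sku": self.sku,
            "name": self.name,
            "attributes": dict(self.attributes),
        }

lamp = Product("L-40", "Lamp", {"watts": 40})
book = Product("B-01", "Book", {"pages": 320, "language": "en"})
```

Both products share one class, yet their documents carry different fields under `attributes`.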
Pure driver vs mapping frameworks
Using the MongoDB driver directly gives you great power, but forces you to write a lot of boilerplate code.
Mapping frameworks help you write less code, but you sacrifice control over all aspects of persistence.
Why not take the best of both?
Be aware of the trees
Indexes in MongoDB are defined at the collection level and can be on any field or sub-field of the document.
Indexes are created using a B-tree and can be of different types:
Single Field
Compound
Multikey
Geospatial
Text (beta)
Hashed
They can be unique and sparse.
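The index types listed above can be written down as key specifications. The snippet keeps them as plain data in the list-of-pairs form that pymongo's `create_index` accepts, so it runs without a server. All field names are hypothetical, and the `"2dsphere"`, `"text"`, and `"hashed"` strings are the index type values used in such specifications.

```python
ASCENDING, DESCENDING = 1, -1

# One key specification per index type; field names are made up.
INDEXES = {
    "single_field": [("email", ASCENDING)],
    "compound": [("customer_id", ASCENDING), ("created_at", DESCENDING)],
    # Same shape as a single-field index: it is multikey because
    # the indexed field holds an array.
    "multikey": [("tags", ASCENDING)],
    "geospatial": [("location", "2dsphere")],
    "text": [("body", "text")],
    "hashed": [("user_id", "hashed")],
}

# Options that can accompany a key specification.
UNIQUE_SPARSE_OPTIONS = {"unique": True, "sparse": True}
```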
Document moving slows you down
MongoDB handles the space allocation of a record, also considering a padding factor.
When an updated document does not fit in its record space, it will be moved.
Dynamic schema is the first cause of document moving.
Preallocate fields?
Field preallocation can fix the document moving issues in some use cases.
Default values must be used to preallocate; this must be handled in the application.
NULL is not a default value :-) as it has its own type.
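A sketch of preallocation for a hypothetical per-day hit counter. Every hourly field is created up front with a real default (`0`, not `None`), so later updates only change existing values instead of adding fields. The document layout and helper names are assumptions, and `record_hit` mimics in memory what an `$inc` on `hours.<hour>` would do.

```python
def new_daily_counter(day):
    """Hypothetical per-day document with all 24 hourly counters preallocated."""
    # 0 is a default of the same type as the final value; None would not be.
    return {"_id": day, "hours": {str(hour): 0 for hour in range(24)}}

def record_hit(doc, hour):
    """In-memory equivalent of incrementing an already existing field."""
    doc["hours"][str(hour)] += 1

doc = new_daily_counter("2014-03-01")
fields_before = set(doc["hours"])
record_hit(doc, 13)
record_hit(doc, 13)
fields_after = set(doc["hours"])
```

The set of fields is the same before and after the updates, which is the property preallocation is meant to provide.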
Preprocess high resolution data
MongoDB lets you store your data at its maximum resolution.
Map-reduce and aggregation are OK, but you could also preprocess and keep aggregated data that you can use for your queries.
MongoDB rocks for business intelligence.
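Preprocessing can be as simple as rolling raw events up into summary documents at write time or in a batch. The sketch below does it in memory; the event shape, the hourly bucket, and the `count`/`sum` fields are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw, full-resolution events.
raw_events = [
    {"ts": datetime(2014, 3, 1, 10, 5), "value": 4},
    {"ts": datetime(2014, 3, 1, 10, 40), "value": 6},
    {"ts": datetime(2014, 3, 1, 11, 15), "value": 1},
]

def hourly_summaries(events):
    """Roll events up into one summary document per hour."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        hour = event["ts"].replace(minute=0, second=0, microsecond=0)
        buckets[hour]["count"] += 1
        buckets[hour]["sum"] += event["value"]
    return [{"_id": hour, **stats} for hour, stats in sorted(buckets.items())]

summaries = hourly_summaries(raw_events)
```

Queries that only need hourly figures can then read the small summary documents, while the raw events remain available at full resolution.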
Tuning updates and inserts
MongoDB stores BSON documents as a sequence of fields and values, not as a hash table.
Writing the first field of a document (or of a nested document) is considerably faster than writing the last.
Intra-document hierarchy can help to handle the issue.
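The effect of an intra-document hierarchy can be estimated with a simple counting model. Assume a hypothetical document that stores one value per minute of the day: flat, it has 1440 sibling fields, while nested by hour it has 24 sub-documents of 60 fields each. The functions below count how many fields are walked to reach a given minute. The step counts are a model of sequential field access, not a measurement.

```python
def flat_steps(minute_of_day):
    """Fields walked when all 1440 minutes are siblings in one document."""
    return minute_of_day + 1

def nested_path(minute_of_day):
    """Dotted path to the same value when minutes are grouped by hour."""
    hour, minute = divmod(minute_of_day, 60)
    return f"values.{hour}.{minute}"

def nested_steps(minute_of_day):
    """Fields walked with the hierarchy: hours first, then minutes in that hour."""
    hour, minute = divmod(minute_of_day, 60)
    return (hour + 1) + (minute + 1)

last_minute = 23 * 60 + 59   # 23:59, the worst case for both layouts
```

For the last minute of the day the flat layout walks 1440 fields and the nested one 84, which is the kind of gap the hierarchy is meant to close.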
26. Any questions?