In my presentation I covered a few things about NoSQL:
What is NoSQL
NoSQL Features
Types of NoSQL
Advantages of NoSQL
I then moved on to MongoDB. The presentation deals with some basic questions, such as:
When do we embed data versus linking?
How many collections do we have, and what are they?
When do we need atomic operations?
What indexes will we create to make queries and updates fast?
What is a shard?
2. What is NoSQL
In the past few years, the "one size fits all" thinking about data
stores has been questioned by both science and web companies,
which has led to the emergence of a great variety of alternative
databases. The movement, as well as the new datastores, is
commonly subsumed under the term NoSQL.
The basic quality of NoSQL databases is that they may not require fixed
table schemas, usually avoid join operations, and typically scale
horizontally. Academic researchers typically refer to these databases
as structured storage, a term that includes classic relational
databases as a subset.
NoSQL databases also often trade off some "ACID" guarantees (atomicity,
consistency, isolation and durability). NoSQL databases, to varying degrees,
even allow for the schema of data to differ from record to record. If
there is no schema or table in NoSQL, then how do you
visualize the database structure? Well, here is the answer.
3. NoSQL Features
No schema required: Data can be inserted in a NoSQL
database without first defining a rigid database schema. As
a corollary, the format of the data being inserted can be
changed at any time, without application disruption. This
provides immense application flexibility, which ultimately
delivers substantial business flexibility.
Auto elasticity: Many NoSQL databases automatically spread your
data across multiple servers without requiring application
assistance. Servers can be added or removed from the data
layer without application downtime.
Integrated caching: To increase data throughput and
performance, advanced NoSQL systems cache
data in system memory. This is in contrast to SQL databases,
where this often has to be done using separate infrastructure.
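As a rough illustration of the "no schema required" point above, the sketch below uses a plain Python list of dictionaries to stand in for a schemaless store. It is not a real NoSQL client; the `insert` helper and the field names are hypothetical.

```python
# A toy in-memory "collection": a plain list standing in for a schemaless store.
# The helper and field names are hypothetical, for illustration only.
collection = []

def insert(doc):
    """Store a record without checking it against any predefined schema."""
    collection.append(dict(doc))

# Records in the same collection may carry different fields.
insert({"name": "Ada", "email": "ada@example.com"})
insert({"name": "Bob", "phones": ["555-0100", "555-0101"], "age": 42})

# The application, not the database, decides how to handle missing fields.
emails = [doc.get("email", "(none)") for doc in collection]
print(emails)
```

The trade-off this hints at is that the responsibility for handling missing or differently shaped fields moves into application code.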
4. Types of NoSQL
In terms of data storage architecture, there are three
popular types of NoSQL databases.
Key-value stores. As the name implies, a key-value store is a
system that stores values indexed for retrieval by keys. These
systems can hold structured or unstructured data.
Column-oriented databases. Rather than store sets of
information in a heavily structured table of columns and rows with
uniform sized fields for each record, as is the case with relational
databases, column-oriented databases contain one extendable
column of closely related data.
Document-based stores. These databases store and organize data
as collections of documents, rather than as structured tables with
uniform sized fields for each record. With these databases, users
can add any number of fields of any length to a document.
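To make the three storage models above a little more concrete, here is a minimal sketch that mimics each one with built-in Python types. The data and key names are invented for illustration, and real systems differ considerably in how they lay data out on disk.

```python
# Hypothetical data, using built-in Python types to mimic each storage model.

# Key-value store: values (structured or not) looked up by key.
kv_store = {"session:42": b"\x01\x02raw-bytes", "user:7:name": "Ada"}

# Column-oriented store: each row key holds an extendable set of related columns.
column_store = {
    "user:7": {"name": "Ada", "city": "London"},
    "user:8": {"name": "Bob"},  # rows need not share the same columns
}
column_store["user:8"]["last_login"] = "2012-01-01"  # extend one row only

# Document store: a collection of self-describing, possibly nested documents.
document_store = [
    {"_id": 1, "name": "Ada", "tags": ["math", "engines"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Paris", "zip": "75001"}},
]

cities = [d["address"]["city"] for d in document_store if "address" in d]
print(kv_store["user:7:name"], sorted(column_store["user:8"]), cities)
```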
5. Advantages of NoSQL
NoSQL databases can often process data faster than
relational databases.
They are often faster because their
data models are simpler.
Major NoSQL systems are flexible enough to let
developers use them in ways that
meet their applications' needs.
6. MongoDB
MongoDB (from "humongous") is a scalable, high-performance,
open source, document-oriented database.
It is written in C++.
It stores data in BSON (Binary JSON) format.
7. Some basic terms
MySQL term      Mongo term
-----------     ----------------------
database        database
table           collection
index           index
row             BSON document
column          BSON field
join            embedding and linking
primary key     _id field
8. Some Questions
When do we embed data versus linking?
How many collections do we have, and what are they?
When do we need atomic operations?
What indexes will we create to make queries and updates
fast?
What is a shard?
9. Best Practices
"First class" objects, that are at top level, typically have
their own collection.
Line item detail objects typically are embedded.
Objects which follow an object modeling "contains"
relationship should generally be embedded.
Many-to-many relationships are generally done by
linking.
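The embedding and linking guidelines above can be sketched with plain Python dictionaries. This is only an illustration under assumed data: the order, product, and category documents are hypothetical, and `categories_of` is a made-up helper that plays the role of the extra lookup an application performs when following a link.

```python
# Hypothetical order/product data as plain dicts; "_id" mirrors the Mongo primary key field.

# Embedding: line items live inside the order document ("contains" relationship).
order = {
    "_id": 1001,
    "customer": "jane",
    "line_items": [
        {"sku": "A1", "qty": 2, "price": 9.50},
        {"sku": "B7", "qty": 1, "price": 20.00},
    ],
}
order_total = sum(item["qty"] * item["price"] for item in order["line_items"])

# Linking: a many-to-many relationship stored as lists of referenced _id values.
products = {
    "A1": {"_id": "A1", "name": "Pen", "category_ids": [10, 20]},
    "B7": {"_id": "B7", "name": "Notebook", "category_ids": [20]},
}
categories = {10: {"_id": 10, "name": "Office"}, 20: {"_id": 20, "name": "School"}}

def categories_of(product_id):
    """Follow the links with a second lookup, since there is no join."""
    return [categories[cid]["name"] for cid in products[product_id]["category_ids"]]

print(order_total, categories_of("A1"))
```

The embedded order can be read in one fetch, while the linked categories cost an extra lookup but avoid copying each category into every product.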
10. Best Practices
Collections with only a few objects may safely exist as
separate collections, as the whole collection is quickly
cached in application server memory.
Embedded objects are a bit harder to link to than "top level"
objects in collections.
If the amount of data to embed is huge (many megabytes),
you may reach the limit on size of a single object, which is
16 MB per document. If you need more than that see
GridFS.
If performance is an issue, embed.
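One way to reason about the 16 MB limit before embedding is to estimate the size of the resulting document. The sketch below is an approximation only: it measures the JSON encoding, which is not the same as the BSON size MongoDB actually enforces, and `approx_size` and `can_embed` are hypothetical helpers.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB per-document limit mentioned above

def approx_size(doc):
    """Rough estimate: length of the UTF-8 JSON encoding (not the true BSON size)."""
    return len(json.dumps(doc).encode("utf-8"))

def can_embed(parent, field, children):
    """Return True if embedding `children` keeps the estimate under the limit."""
    candidate = dict(parent, **{field: children})
    return approx_size(candidate) < MAX_DOCUMENT_BYTES

post = {"_id": 1, "title": "Hello"}
small_comments = [{"text": "nice"}] * 10
huge_comments = [{"text": "x" * 1024}] * 20000  # roughly 20 MB of text

print(can_embed(post, "comments", small_comments))
print(can_embed(post, "comments", huge_comments))
```

When the check fails, the guidelines above suggest linking to a separate collection, or GridFS for large binary content.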
11. How to Index
A second aspect of schema design is index selection. As
a general rule, where you want an index in a relational
database, you want an index in Mongo.
The _id field is automatically indexed.
Fields upon which keys are looked up should be indexed.
Sort fields generally should be indexed.
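To show why lookup and sort fields benefit from an index, the sketch below models an index as a sorted list of (value, position) pairs searched with binary search. This is a simplified stand-in, not MongoDB's actual index implementation, and the documents and helper names are hypothetical.

```python
import bisect

# Hypothetical documents; a sorted list of (value, position) pairs stands in for an index.
docs = [
    {"_id": 1, "username": "jane", "age": 31},
    {"_id": 2, "username": "bob", "age": 25},
    {"_id": 3, "username": "ada", "age": 36},
]

def build_index(documents, field):
    """Return sorted (value, position) pairs for one field."""
    return sorted((doc[field], pos) for pos, doc in enumerate(documents))

def find_one(documents, index, value):
    """Binary-search the index instead of scanning every document."""
    i = bisect.bisect_left(index, (value, -1))
    if i < len(index) and index[i][0] == value:
        return documents[index[i][1]]
    return None

username_index = build_index(docs, "username")
age_index = build_index(docs, "age")

found = find_one(docs, username_index, "bob")
# Walking the index in order yields documents already sorted by that field.
by_age = [docs[pos]["username"] for _, pos in age_index]
print(found["_id"], by_age)
```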
12. How to Index
The MongoDB profiling facility provides useful
information for finding where a missing index should be
added.
Note that adding an index slows writes to a collection,
but not reads. Use lots of indexes for collections with a
high read-to-write ratio (assuming one does not mind the
storage overhead). For collections with more writes than
reads, indexes are expensive, as keys must be added to
each index for each insert.
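The write cost described above can be made visible with a toy collection that counts how many index entries each insert has to write. `IndexedCollection` is a hypothetical class built on dictionaries; it only illustrates the bookkeeping, not real storage-engine behaviour.

```python
class IndexedCollection:
    """Toy collection that counts the index entries written on each insert."""

    def __init__(self, indexed_fields):
        self.docs = []
        # One dict per indexed field: value -> list of document positions.
        self.indexes = {field: {} for field in indexed_fields}
        self.index_writes = 0

    def insert(self, doc):
        pos = len(self.docs)
        self.docs.append(doc)
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], []).append(pos)
                self.index_writes += 1  # extra work paid on every write

    def find(self, field, value):
        """Indexed lookup: no scan over self.docs is needed."""
        return [self.docs[p] for p in self.indexes[field].get(value, [])]

light = IndexedCollection(["_id"])
heavy = IndexedCollection(["_id", "username", "age", "city"])
for i in range(1000):
    doc = {"_id": i, "username": f"user{i}", "age": 20 + i % 50, "city": "Paris"}
    light.insert(doc)
    heavy.insert(doc)

print(light.index_writes, heavy.index_writes)
```

With four indexed fields, every insert does four times the index maintenance of the collection indexed on `_id` alone, which is the cost to weigh against faster reads.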
13. Atomic Operations
Some problems require the ability to perform atomic
operations. For example, simply incrementing a counter
is often a case where one wants atomicity. MongoDB can
also perform more complex operations such as that
shown in the pseudocode below:
atomically {
  if( doc.credits > 5 ) {
    doc.credits -= 5;
    doc.debits += 5;
  }
}
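The pseudocode above can be approximated in plain Python, with a lock standing in for the atomicity the database would provide on a single document. This is a sketch under that assumption rather than MongoDB code; `transfer_five` and the `account` document are hypothetical.

```python
import threading

# A lock stands in for the database's per-document atomicity (an assumption of this sketch).
_lock = threading.Lock()
account = {"_id": 1, "credits": 12, "debits": 0}

def transfer_five(doc):
    """Atomically move 5 from credits to debits if more than 5 credits remain."""
    with _lock:
        if doc["credits"] > 5:
            doc["credits"] -= 5
            doc["debits"] += 5
            return True
        return False

# Many threads race to apply the same operation.
threads = [threading.Thread(target=transfer_five, args=(account,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account)
```

In MongoDB itself, this kind of check-and-modify is commonly expressed as one update whose query filter carries the condition and whose update document uses the `$inc` operator, so the test and the change apply to the document as a single operation.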
14. Atomic Operations
Another example would be a user registration scenario.
We would never want two users to register the same
username simultaneously:
atomically {
  if( exists a document with username='jane' ) {
    print("username already in use, please choose another");
  } else {
    insert a document with username='jane' in the users collection;
    print("thanks, you have registered as user jane.");
  }
}
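A minimal sketch of the registration scenario above, again using a Python lock in place of database-level atomicity. `UserRegistry` is a hypothetical in-memory class; the point is only that the existence check and the insert must happen as one step.

```python
import threading

class UserRegistry:
    """Toy users collection that rejects duplicate usernames atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = {}

    def register(self, username):
        # The existence check and the insert happen under one lock,
        # so two callers cannot both see "not taken".
        with self._lock:
            if username in self._users:
                return False
            self._users[username] = {"username": username}
            return True

registry = UserRegistry()
results = []

def try_register():
    results.append(registry.register("jane"))

# Eight concurrent attempts to claim the same username.
threads = [threading.Thread(target=try_register) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), results.count(False))
```

With MongoDB, a common way to get the same guarantee is a unique index on the username field, so that a second insert of the same name fails instead of creating a duplicate.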
15. What is Sharding?
MongoDB scales horizontally via an auto-sharding
(partitioning) architecture.
Horizontal partitioning splits one or more tables by
row, usually within a single instance of a schema and a
database server.
Sharding goes beyond this: it partitions the problematic
table(s) in the same way, but it does this across
potentially multiple instances of the schema.
16. Sharding
Sharding offers:
Automatic balancing for changes in load and data
distribution
Easy addition of new machines
Scaling out to one thousand nodes
No single points of failure
Automatic failover
17. Sharding
Another consideration for schema design is sharding. A
BSON document (which may have significant amounts of
embedding) resides on one and only one shard.
A collection may be sharded. When sharded, the
collection has a shard key, which determines how the
collection is partitioned among shards. Typically (but not
always) queries on a sharded collection involve the
shard key as part of the query expression.
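The role of the shard key can be sketched as a routing function. Note the assumptions: this toy uses a hash of the key to pick a shard, which is only one possible partitioning strategy and not necessarily what a given MongoDB deployment does, and `NUM_SHARDS`, `shard_for`, `insert`, and `find` are all hypothetical names.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size
shards = [[] for _ in range(NUM_SHARDS)]

def shard_for(key_value):
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def insert(doc, shard_key="user_id"):
    # Each document lands on exactly one shard, chosen by its shard key.
    shards[shard_for(doc[shard_key])].append(doc)

def find(query, shard_key="user_id"):
    """Return (matching docs, number of shards consulted)."""
    if shard_key in query:
        targets = [shards[shard_for(query[shard_key])]]  # targeted query
    else:
        targets = shards  # scatter-gather across every shard
    matches = [d for s in targets for d in s
               if all(d.get(k) == v for k, v in query.items())]
    return matches, len(targets)

for i in range(100):
    insert({"user_id": i, "country": "FR" if i % 2 else "DE"})

print(find({"user_id": 42})[1], find({"country": "FR"})[1])
```

A query that includes the shard key touches one shard, while a query on any other field has to ask all of them, which is why queries on a sharded collection typically include the shard key.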
The key point here is that changing shard keys is difficult. You
will want to choose the right key from the start (which is
not covered in this presentation).