1) Organizations now deal with huge amounts of data both internally and externally generated to better understand their business and customers.
2) Relational databases cannot effectively handle this big data due to challenges in data structure, scaling, and speed.
3) NoSQL databases provide alternatives to store structured, semi-structured, and unstructured data across different data models like columnar, key-value, document, and graph. Each type has different properties suited for various use cases.
2. Nowadays most organizations (web, e-commerce, telecom, social media, etc.) are trying
to deal with huge amounts of data to know their business and customers better than before
and to make the right decisions based on that data.
Organizations were able to make data-driven decisions earlier too, so
why are they now trying to understand data in the big data space?
The reason is that data is no longer only generated in-house; some of it
is generated outside. This is not master or transactional data, but data
that helps define the nature of the business and the customer much better.
Let's walk through an example. My organization wanted
feedback from customers on the home accessories products we build. The usual
approaches are to ask customers to fill in a feedback form, ask them to rate products
on our website, collect feedback from our employees, and so on.
3. And we are done with feedback.
Wait… before 2010 that was fine, but now I don't think it is enough to say that we
have collected all the feedback and its associated data.
Nowadays people around the world talk about things on many different platforms,
and the conversation is not limited to a feedback form or the company feedback site.
So there is more data (which is good for decision making), but where do we store it? In an RDBMS?
You can try… but how do you know the structure of externally generated data, how do
you scale your database as the data grows huge, and how do you act quickly on the
data? For all these big questions, the most appropriate answer is Big Data/NoSQL.
4. A NoSQL database gives you a way to store and retrieve structured, semi-structured,
and unstructured data in a big data environment, and typically also provides
horizontal scaling, failover, clustering, and data replication.
There are different types of NoSQL databases. Each type has different properties, and
not every type suits every use case. There are four main types of NoSQL database:
1. Column-oriented (stores records distributed by column family)
2. Key-value (stores a record against a key)
3. Document (stores each record as a document, and each document can be different)
4. Graph (stores data as a graph)
Column Oriented
In a column-family store, data is stored in cells grouped by column rather
than by row. A column family is a group in which related columns are arranged
together, and the columns of the same family are stored on the same machine or in the
same place so that accessing related columns takes less time. In normal row-oriented
storage (as in an RDBMS) all the columns of a row are stored in the same location, and
accessing a particular column means reading the whole row. In column-oriented
storage, only the related columns are read, not the whole row, and each column
family is stored against the row key of the record.
5. A columnar database stores all cells of the same column family in the same disk entry, so it is faster to
read, search, and aggregate similar columns. A row-based store keeps all columns of a row in a single disk entry,
so even a single-column lookup has to read the whole row.
Row Oriented (RDBMS) (Pic-1)
Name | Address | Department | Designation | Salary | Bonus  | DOB
xyz  | abcd    | PD         | Architect   | xxxxx  | xxxxx  | 12/12/1980
abc  | xyzx    | Support    | DBA         | xxxxxx | xxxxxx | 12/12/1980
Column Oriented
Column Family DD (Pic-2)
Department | Designation
PD         | Architect
Support    | DBA
Column Family SB (Pic-3)
Salary | Bonus
xxxx   | xxxxxx
xxxxxx | xxxxxx
Column Family NAD (Pic-4)
Name | Address | DOB
xyz  | abcd    | 12/12/1980
abc  | xyzx    | 12/12/1980
6. The diagram above shows how row-based storage differs from column-oriented
storage.
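The difference can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not how HBase or Cassandra are implemented: the row keys `r1` and `r2` and the function names are hypothetical, and the SB family is left out because its values are masked in the diagram.

```python
# Row-oriented layout: every column of a record lives together under its row key.
ROW_STORE = {
    "r1": {"Name": "xyz", "Address": "abcd", "Department": "PD",
           "Designation": "Architect", "DOB": "12/12/1980"},
    "r2": {"Name": "abc", "Address": "xyzx", "Department": "Support",
           "Designation": "DBA", "DOB": "12/12/1980"},
}

# Column families from the diagram (DD and NAD only).
FAMILIES = {
    "DD": ("Department", "Designation"),
    "NAD": ("Name", "Address", "DOB"),
}


def to_column_families(row_store, families):
    """Split each row into per-family groups, each stored against the row key."""
    store = {family: {} for family in families}
    for row_key, row in row_store.items():
        for family, columns in families.items():
            store[family][row_key] = {column: row[column] for column in columns}
    return store


def read_family(column_store, family, row_key):
    """Read one column family of one row; the other families are never touched."""
    return column_store[family][row_key]


def designations_by_department(column_store):
    """Aggregate over the DD family only, without reading NAD at all."""
    result = {}
    for cells in column_store["DD"].values():
        result.setdefault(cells["Department"], []).append(cells["Designation"])
    return result


COLUMN_STORE = to_column_families(ROW_STORE, FAMILIES)
print(read_family(COLUMN_STORE, "DD", "r1"))
print(designations_by_department(COLUMN_STORE))
```

In the row layout, answering the same department question would mean loading every full record, including name, address, and date of birth.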
A few scenarios where a column-oriented NoSQL database can be used:
1. Your searches mostly involve related sets of columns.
2. You fetch a few columns that can be grouped under the same column family.
3. Updates do not touch all columns, only the columns that can be
grouped under the same family.
These are just a few points, but the important one is that a columnar store can
be used when search and retrieval mostly revolve around similar columns that can be grouped
in the same column family.
Examples of column-oriented NoSQL databases: HBase, Cassandra, Google Bigtable.
7. Key-Value
In a key-value NoSQL database, records are stored against a key. Put simply, a key-value
database uses a hash-table-like data structure, where each record is stored against
its key. A key-value NoSQL database has logical groups of keys called buckets, where keys
with the same hash code fall into the same bucket.
[Figure: keys (Key 1 to Key 5), each pointing to a stored record (Record1 to Record5)]
8. Some key-value NoSQL databases make retrieval and search faster through caching.
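A minimal sketch of the bucket idea, assuming a fixed number of in-memory buckets chosen by hash code. The class name `BucketedKeyValueStore` and the session key are hypothetical, and real key-value databases add persistence, replication, and often a cache on top of this:

```python
class BucketedKeyValueStore:
    """Toy key-value store: a key's hash code decides which bucket holds it."""

    def __init__(self, bucket_count=4):
        # Each bucket is a plain dict mapping key -> record.
        self._buckets = [dict() for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # Keys with the same hash code always land in the same bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, record):
        self._bucket_for(key)[key] = record

    def get(self, key, default=None):
        return self._bucket_for(key).get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        return self._bucket_for(key).pop(key, None) is not None


store = BucketedKeyValueStore()
store.put("session:42", {"user": "xyz", "cart": ["lamp"]})
print(store.get("session:42"))
```

The store never inspects the record itself; the caller has to know its internal structure, which is why lookups are by key only.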
Scenarios where a key-value store can be used:
1. Keeping session-related information for the users of a website.
2. Storing the static content of an HTML page against a URL.
3. Keeping a session's shopping cart information against a user.
These are just a few points, but the important one is that a key-value store can be used
when most of the searches are against a key and the programmer knows the internal structure of
the record.
Examples of key-value NoSQL databases: Redis, Riak, Amazon's Dynamo.
9. Document
A document store NoSQL database stores each record as a document of key-value pairs. This is similar to a
key-value store, but there are some differences. Records in a document store
have some structure, which allows attributes to be specified as metadata and allows queries
on those attributes. Each record can have nested records within it. Document databases are
schema-less, which makes it easier to support a variety of structured and unstructured records, and
also allows the schema of an existing record to be modified without impacting
other records.
{"personName": "xxx",
 "address": {"DoorNo": "Home No1", "Road": "Road1", "City": "City1", "Country": "India"}
}
{"personName": "yyyy",
 "address": {"DoorNo": "Home No2", "Block": "A", "CrossNo": "Cross 3rd", "City": "Allahabad", "Pincode": "300011"}
}
The example above shows sample person records in JSON format. The two persons
have different address formats, which is an example of a schema-less store.
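Querying such records by attribute can be sketched as below. This is an illustration only: the key `address` for the nested record and the helpers `get_path` and `find` are assumptions made for this sketch, not the API of any particular document database.

```python
# Two person documents with differently shaped addresses (schema-less).
PEOPLE = [
    {"personName": "xxx",
     "address": {"DoorNo": "Home No1", "Road": "Road1",
                 "City": "City1", "Country": "India"}},
    {"personName": "yyyy",
     "address": {"DoorNo": "Home No2", "Block": "A", "CrossNo": "Cross 3rd",
                 "City": "Allahabad", "Pincode": "300011"}},
]


def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.City'; None if any part is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(documents, dotted_path, value):
    """Return the documents whose attribute at dotted_path equals value."""
    return [doc for doc in documents if get_path(doc, dotted_path) == value]


print([doc["personName"] for doc in find(PEOPLE, "address.City", "Allahabad")])
```

A document that lacks the queried attribute is simply not matched, so adding `Pincode` to one record does not require changing the others.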
10. Scenarios where a document store can be used
A document-oriented database supports a schema-less, semi-structured data
model. This type of database can be used in the scenarios below.
1. Storing user profile information, where users' profiles may hold different kinds of
data (some users may have an email address as contact information, some may have a
mobile number, and some may have a work phone or location).
2. The structure of each row may vary, e.g. a permanent employee enjoys some in-house
employee benefits, while a contract employee may have other benefits.
Keeping them in a document-oriented database is useful and can require less
space than an RDBMS.
3. If the data model changes frequently and the changes should not impact the system or
other records, a document database is a good choice.
4. Rapid prototyping.
Examples of document-oriented NoSQL databases: MongoDB, CouchDB, RavenDB.
11. Graph
A graph database stores records together with their relationships to other records. In a graph database,
records are treated as nodes and their relationships as edges. A node can have multiple edges to
other nodes, which define their relationships. Nodes and edges can both have multiple properties.
Once the data (nodes and edges) is defined, it can be analyzed in different ways.
A short write-up from Wikipedia that defines its basic properties:
• Graph databases are based on graph theory. Graph databases employ nodes, properties, and
edges.
• Nodes represent entities such as people, businesses, accounts, or any other item you might
want to keep track of.
• Properties are pertinent information that relate to nodes. For instance, if "Wikipedia" were
one of the nodes, one might have it tied to properties such as "website", "reference
material", or "word that starts with the letter 'w'", depending on which aspects of
"Wikipedia" are pertinent to the particular database.
• Edges are the lines that connect nodes to nodes or nodes to properties and they represent
the relationship between the two. Most of the important information is really stored in the
edges. Meaningful patterns emerge when one examines the connections and
interconnections of nodes, properties, and edges
12. Consider the picture below of a graph database.
Scenarios where a graph database can be used
Graph databases are mostly used to find relationships between data. This can be
achieved with an RDBMS but requires a lot of effort. Consider an employee table
that has a manager field. To find an employee's second-line manager, we either have to
add a new column or run a query to find the manager's manager. If we have to add more
relationships and the queries that go with them in an RDBMS, we have to make a lot of changes, either by
adding new columns or by writing complex queries. With a graph database this is typically much easier,
simpler, and faster.
13. 1. Use a graph database where you have to find relationships between records.
2. If you need to analyze social data and its relationships in different contexts.
3. If you need to analyze routing information for money, goods, etc.
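The employee and manager example can be sketched as edge traversal. This is a minimal in-memory illustration, not the query language of any graph database: the employee names, the `REPORTS_TO` relationship label, and the function names are hypothetical, and it assumes each employee has at most one manager.

```python
# Each edge is (source node, relationship, target node).
EDGES = [
    ("alice", "REPORTS_TO", "bob"),
    ("bob", "REPORTS_TO", "carol"),
    ("carol", "REPORTS_TO", "dave"),
]


def build_index(edges):
    """Index edges by (source, relationship) so each hop is a dictionary lookup."""
    index = {}
    for source, relation, target in edges:
        index.setdefault((source, relation), []).append(target)
    return index


def nth_line_manager(index, employee, n):
    """Follow REPORTS_TO edges n times; None if the chain ends earlier."""
    current = employee
    for _ in range(n):
        targets = index.get((current, "REPORTS_TO"))
        if not targets:
            return None
        current = targets[0]  # assumption: one manager per employee
    return current


INDEX = build_index(EDGES)
print(nth_line_manager(INDEX, "alice", 2))
```

Finding a third-line manager or adding a new relationship type only needs another edge or a larger `n`; no column is added and no existing record changes.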
Conclusion
The features supported by NoSQL databases do not mean the demise of RDBMS
databases. We are now in the age of polyglot persistence, so we need techniques
that use different data storage technologies to handle varying data needs. The use cases,
examples, and suggestions given above explain the characteristics of each type of NoSQL database.
So when choosing your NoSQL database, consider your use case and your technical and
non-technical requirements (do I have the required resources, maintenance capacity, etc.), and pick the right
one.