Introduction to NoSQL

Source: NoSQL Distilled
Prepared by Dr. Dipali Meher
Not Only SQL
Dr. Dipali Meher
Assistant Professor
Modern College of Arts, Science and Commerce, Ganeshkhind, Pune 411016
mailtomeher@gmail.com/dipalimeher@moderncollegegk.org
MCS, M.Phil, NET, Ph.D.

Agenda
• Introduction
• Why NoSQL?
• Aggregate data models
• Data Modeling Details
• Distribution models
• Consistency
• Version stamps
• Map-reduce

Introduction
• A NoSQL database (the name originally referred to "non-SQL" or
"non-relational") provides a mechanism for storage and retrieval
of data that is modeled in means other than the tabular relations
used in relational databases.
• Such databases came into existence in the late 1960s.
• They are used in real-time web applications and big data.
• NoSQL is sometimes read as "Not only SQL" to emphasize that
such systems may support SQL-like query languages.
• Some NoSQL systems, for example MarkLogic, Aerospike, FairCom
c-treeACE, Google Spanner (though technically a NewSQL database),
Symas LMDB, and OrientDB, have made ACID transactions central to
their designs.

JSON Format
• JSON stands for JavaScript Object Notation.
• JSON objects are used for transferring data between server and client;
XML serves the same purpose. However, JSON has several advantages
over XML, which are discussed below along with JSON concepts and usage.
• Example JSON object:
var chaitanya = {
  "firstName" : "Chaitanya",
  "lastName" : "Singh",
  "age" : "28"
};
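In practice, an object like the one above is turned into text for transfer and then parsed back on the receiving side. JavaScript's built-in `JSON.stringify` and `JSON.parse` do this. The following is a minimal sketch; the names `payload` and `received` are illustrative only.

```javascript
// The example object from the slide (age kept as a string, as in the original).
const chaitanya = { "firstName": "Chaitanya", "lastName": "Singh", "age": "28" };

// Server side: serialize the object into JSON text to send over the network.
const payload = JSON.stringify(chaitanya);

// Client side: parse the received text back into a plain object.
const received = JSON.parse(payload);

console.log(payload);            // {"firstName":"Chaitanya","lastName":"Singh","age":"28"}
console.log(received.firstName); // Chaitanya
```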

Features of JSON
• It is lightweight
• It is language independent
• It is easy to read and write
• It is a text-based, human-readable data exchange format

Why use JSON?
• Standard structure: JSON objects have a standard structure, which
makes it easy for developers to read and write code because they
know what to expect from JSON.
• Lightweight: when working with AJAX, it is important to load
data quickly and asynchronously without requesting a page
reload. Since JSON is lightweight, it is easier to fetch and
load the requested data quickly.
• Scalable: JSON is language independent, which means it works
well with most modern programming languages. If we need to
change the server-side language, the change is easier because
the JSON structure is the same for all languages.

Difference between JSON style and XML style DB, by example
JSON style:
{"students": [
  {"name":"John", "age":"23", "city":"Agra"},
  {"name":"Steve", "age":"28", "city":"Delhi"},
  {"name":"Peter", "age":"32", "city":"Chennai"},
  {"name":"Chaitanya", "age":"28", "city":"Bangalore"}
]}
XML style:
<students>
  <student> <name>John</name> <age>23</age> <city>Agra</city> </student>
  <student> <name>Steve</name> <age>28</age> <city>Delhi</city> </student>
  <student> <name>Peter</name> <age>32</age> <city>Chennai</city> </student>
  <student> <name>Chaitanya</name> <age>28</age> <city>Bangalore</city> </student>
</students>

Limitations of Relational DB
• In a relational database, we need to define the structure and schema
of the data first; only then can we process the data.
• Relational database systems provide consistency and integrity
of data by enforcing ACID properties (Atomicity, Consistency,
Isolation and Durability). In some scenarios, such as banking
systems, this is useful. In many other cases, however, these
properties impose a significant performance overhead and can
make database responses very slow.
• Many applications handle their data in JSON format, and an
RDBMS doesn't provide a convenient way to perform operations
such as create, insert, update, and delete on this data.
• NoSQL databases, on the other hand, particularly document stores,
keep their data in JSON or JSON-like formats, which are compatible
with most of today's applications.

RDBMS vs NoSQL
• RDBMS: structured data; provides more functionality but
lower performance.
• NoSQL: structured or semi-structured data; less functionality
and higher performance.

So, when we say NoSQL has less functionality, what is
missing?
• Constraints are generally not supported in NoSQL
• Joins are generally not supported in NoSQL
• These features actually hinder the scalability of a database,
so when using a NoSQL database such as MongoDB, you can
implement these functionalities at the application level.
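The database does not perform joins, so the application can do the join itself. The sketch below joins two collections in plain JavaScript. The `customers` and `orders` arrays, their field names, and the `joinOrdersWithCustomers` helper are hypothetical and not taken from any particular database's API.

```javascript
// Hypothetical sample data, as if read from two separate NoSQL collections.
const customers = [
  { id: 1, name: "Martin" },
  { id: 2, name: "Pramod" },
];
const orders = [
  { orderId: 99, customerId: 1, total: 32.45 },
  { orderId: 100, customerId: 1, total: 12.0 },
  { orderId: 101, customerId: 2, total: 5.5 },
];

// Application-level join: index customers by id, then attach each order's customer.
function joinOrdersWithCustomers(orders, customers) {
  const byId = new Map(customers.map((c) => [c.id, c]));
  return orders.map((o) => ({
    ...o,
    // No foreign-key constraint exists in the database, so the application
    // must decide what to do when the referenced customer is missing.
    customerName: byId.has(o.customerId) ? byId.get(o.customerId).name : null,
  }));
}

const joined = joinOrdersWithCustomers(orders, customers);
console.log(joined);
```

Indexing one side in a `Map` keeps the join linear in the number of records. The work is still done by the application rather than by the database.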
When to go for NoSQL
You would want to choose NoSQL over a relational database when:
• You want to store and retrieve huge amounts of data.
• The relationships between the data you store are not that
important.
• The data is not structured and changes over time.
• Support for constraints and joins is not required at the
database level.
• The data is growing continuously and you need to scale the
database regularly to handle it.

Why NoSQL?
• NoSQL databases are different from relational databases like MySQL.
• In a relational database, you need to create the table, define the schema,
set the data types of fields, etc. before you can actually insert the data.
• In NoSQL you don't have to worry about that; you can insert and update
data on the fly.
• One of the advantages of NoSQL databases is that they are easy to
scale, and they can be much faster for many types of operations that
we perform on a database.
• There are certain situations where you would prefer a relational
database over NoSQL; however, when you are dealing with huge
amounts of data, a NoSQL database is often the better choice.

Introduction Continued…
• Motivations for this approach include simplicity of design,
simpler horizontal scaling to clusters of machines, and finer
control over availability.
• The data structures used by NoSQL databases are different
from those used by default in relational databases, which makes
some operations faster in NoSQL.
• The data structures used in NoSQL databases are flexible.

Differences between SQL and NoSQL

Barriers to NoSQL adoption
• Low-level query languages
• Lack of standardized interfaces
• Huge previous investments in existing relational databases
• Lack of true ACID (Atomicity, Consistency, Isolation, Durability)
properties in many systems

Types of NoSQL DB
• MongoDB falls in the category of document-based NoSQL
databases.
• Key-value store: Memcached, Redis, Coherence
• Tabular: HBase, Bigtable, Accumulo
• Document-based: MongoDB, CouchDB, Cloudant

Other problems faced by NoSQL
• Stale reads: most NoSQL databases offer a concept of
eventual consistency, in which database changes are
propagated to all nodes over time, so queries might not return
updated data immediately, or might read data that is not
accurate. This problem is known as stale reads.
• Some NoSQL systems may exhibit lost writes and other forms of data loss.
• For distributed transaction processing across multiple databases,
data consistency is an even bigger challenge.
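A stale read can be illustrated with a toy model. There is one primary and one replica, and replication is deferred until an explicit `sync()` call. This is a simplified single-process sketch under that assumption. The `ToyReplicatedStore` class is hypothetical and does not show how any particular database replicates data.

```javascript
// Toy model of eventual consistency: writes go to the primary immediately,
// while the replica only catches up when pending changes are replicated.
class ToyReplicatedStore {
  constructor() {
    this.primary = new Map();
    this.replica = new Map();
    this.pending = []; // changes not yet propagated to the replica
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Reads served by the replica may return old (stale) data.
  readFromReplica(key) {
    return this.replica.get(key);
  }

  // Propagate all pending changes; afterwards the replica matches the primary.
  sync() {
    for (const [key, value] of this.pending) this.replica.set(key, value);
    this.pending = [];
  }
}

const store = new ToyReplicatedStore();
store.write("price:book", 10);
store.sync();
store.write("price:book", 12);                    // update not yet replicated
console.log(store.readFromReplica("price:book")); // 10 (stale read)
store.sync();
console.log(store.readFromReplica("price:book")); // 12 (eventually consistent)
```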

Advantages
• High scalability: NoSQL databases use sharding for horizontal
scaling. Sharding is partitioning the data and placing the partitions on
multiple machines; range-based sharding additionally preserves the order
of the data across machines.
• Vertical scaling means adding more resources to the existing
machine.
• Horizontal scaling means adding more machines to handle the
data. Vertical scaling is not that easy to implement, since a single
machine can only be grown so far, but horizontal scaling is easy to implement.
• Examples of horizontally scaling databases are MongoDB,
Cassandra, etc.
• NoSQL databases can handle huge amounts of data because they scale:
as the data grows, the database scales out to handle it efficiently.
• High availability: the replication feature in NoSQL databases makes them
highly available, because copies of the data are kept on multiple nodes,
so if one node fails, another replica can still serve the data.
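Hash-based sharding is one common way to spread data over machines. Each key is hashed, and the hash modulo the number of shards picks the machine. It distributes keys evenly but does not keep them in order, which range-based sharding would. In the sketch below, the three in-memory "machines" and the `hashKey`, `shardFor`, `put`, and `get` functions are all hypothetical. The hash is a 32-bit FNV-1a variant computed over UTF-16 code units, and any stable hash would do.

```javascript
// 32-bit FNV-1a style hash over the key's UTF-16 code units.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

// Hypothetical cluster of three machines, each holding one shard.
const shards = [new Map(), new Map(), new Map()];

// The same key always maps to the same shard.
function shardFor(key) {
  return hashKey(key) % shards.length;
}

function put(key, value) {
  shards[shardFor(key)].set(key, value);
}

function get(key) {
  return shards[shardFor(key)].get(key);
}

for (const id of ["customer:1", "customer:2", "customer:3", "customer:4"]) {
  put(id, { id });
}
console.log(shards.map((s) => s.size)); // how many keys landed on each shard
console.log(get("customer:3"));         // { id: 'customer:3' }
```

Changing `shards.length` would move most keys to different shards. That is why production systems often use consistent hashing or range partitioning instead of plain modulo.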

Disadvantages of NoSQL
• Narrow focus: NoSQL databases have a very narrow focus, as they are mainly designed
for storage and provide very little additional functionality. Relational databases are a better
choice than NoSQL in the field of transaction management.
• Open source: many NoSQL databases are open source, and there is no reliable standard for
NoSQL yet. In other words, two database systems are likely to be unequal.
• Management challenge: the purpose of big data tools is to make management of a
large amount of data as simple as possible, but it is not so easy. Data management
in NoSQL is much more complex than in a relational database. NoSQL, in particular,
has a reputation for being challenging to install and even more hectic to manage on
a daily basis.
• GUI is not available: GUI tools to access the database are not widely
available in the market.
• Backup: backup is a great weak point for some NoSQL databases such as MongoDB,
which has no approach for backing up data in a consistent manner.
• Large document size: some database systems, such as MongoDB and CouchDB, store
data in JSON format (MongoDB uses BSON, a binary JSON-like encoding). This means
documents can be quite large (affecting big data storage, network bandwidth, and speed),
and descriptive key names actually hurt, since they increase the document size.

When should NoSQL be used
• When a huge amount of data needs to be stored and
retrieved
• The relationship between the data you store is not that
important
• The data changes over time and is not structured
• Support for constraints and joins is not required at the
database level
• The data is growing continuously and you need to scale
the database regularly to handle the data

RDBMS
• Relational databases have been a successful technology for
twenty years, providing persistence, concurrency control, and an
integration mechanism.
• Application developers have been frustrated with the
impedance mismatch between the relational model and
the in-memory data structures.
• There is a movement away from using databases as
integration points towards encapsulating databases
within applications and integrating through services.
• The vital factor for a change in data storage was the need
to support large volumes of data by running on clusters.
Relational databases are not designed to run efficiently
on clusters.

Impedance mismatch
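Impedance mismatch is the term used to refer to the
problems that occur due to differences between
the database model and the programming language
model. For example, a relational tuple can hold only
simple values, whereas an in-memory data structure can
contain nested records and lists, so one rich object has
to be split across rows of several tables.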
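The mismatch becomes visible when one nested in-memory order is flattened into the rows a relational schema would need. In this sketch, the table and column names (`orders`, `order_items`, `customer_id`, and so on) and the `toRelationalRows` helper are hypothetical.

```javascript
// In memory, an order is naturally one nested structure.
const order = {
  id: 99,
  customerId: 1,
  items: [
    { productId: 27, quantity: 1, price: 32.45 },
    { productId: 34, quantity: 2, price: 5.0 },
  ],
};

// A relational schema needs flat tuples, so the same order is split into rows
// of two (hypothetical) tables linked by a foreign key.
function toRelationalRows(order) {
  return {
    orders: [{ id: order.id, customer_id: order.customerId }],
    order_items: order.items.map((item) => ({
      order_id: order.id, // foreign key back to the orders row
      product_id: item.productId,
      quantity: item.quantity,
      price: item.price,
    })),
  };
}

const rows = toRelationalRows(order);
console.log(rows.orders.length, rows.order_items.length); // 1 2
```

Reading the order back requires the reverse step: a join on `order_id` and reassembly into a nested object. This translation in both directions is what the term "impedance mismatch" describes.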

NoSQL
• NoSQL is an accidental neologism. There is no prescriptive
definition; all you can make is an observation of common
characteristics.
• The common characteristics of NoSQL databases are:
• Not using the relational model
• Running well on clusters
• Open-source
• Built for the 21st century web estates
• Schemaless
• The most important result of the rise of NoSQL is polyglot
persistence: using different data stores in different circumstances.

Aggregate Data Models
• An aggregate is a collection of data that we interact with as
a unit.
• These units of data, or aggregates, form the boundaries for
ACID operations with the database. Key-value, document,
and column-family databases can all be seen as forms of
aggregate-oriented database.

Data Model
• A data model is the model through which we perceive
and manipulate our data
• The data model describes how we interact with the data
in the database
• A data model (or datamodel) is an abstract model that
organizes elements of data and standardizes how they
relate to one another and to the properties of real-world
entities. For instance, a data model may specify that the
data element representing a car be composed of a
number of other elements which, in turn, represent the
color and size of the car and define its owner.
• The term can also refer to the set of concepts used to define
such models, for example entities, attributes, relations, or tables.

• Data models are distinct from storage models.
• Storage models describe how the database stores and
manipulates the data internally.
• A storage model is a model that captures key physical
aspects of data structure in a data store.

Storage model


• Ideally, we should be ignorant of the storage model, but
in practice we need at least some inkling (a slight idea)
of it, primarily to achieve decent (acceptable)
performance.
In everyday use, "data model" often means the model of the
specific data in an application. A developer might point to an
entity-relationship diagram of their database, containing
customers, orders, and products, and refer to that as their data model.
Metamodel: the model by which the database organizes data

Aggregates
• Aggregate orientation recognizes that often you want to
operate on data in units that have a more
complex structure than a set of tuples. It
can be handy to think in terms of a
complex record that allows lists and other
record structures to be nested inside it.
Complex record = aggregate
Programmers manipulate data through
aggregate structures.

• The term "aggregate" comes from Domain-Driven Design
• An aggregate is a collection of related objects that we treat as a unit
• It is a unit for data manipulation and management of consistency
• Aggregates are updated with atomic operations
• Key-value, document, and column-family databases work this way
• When databases operate on a cluster, aggregates make this
easier
• Why easier: an aggregate makes a natural unit for replication and sharding

Sharding

Relations and Aggregates: example
• E-commerce website: relational database

Sample data for this model

Aggregate oriented structure

JSON format

• In this model, we have two main aggregates: customer
and order.

Embedding all the objects for the customer and the customer's orders
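The two aggregate designs can be written as JavaScript objects. Design A embeds the orders inside the customer aggregate. Design B keeps each order as its own aggregate that refers to the customer by id. The field names are assumptions, loosely modeled on the customer/order example.

```javascript
// Design A: a single aggregate that embeds the customer's orders.
const customerWithOrders = {
  id: 1,
  name: "Martin",
  orders: [
    { id: 99, items: [{ productId: 27, price: 32.45 }] },
    { id: 100, items: [{ productId: 34, price: 5.0 }] },
  ],
};

// Design B: separate aggregates; each order refers to its customer by id.
const customer = { id: 1, name: "Martin" };
const orderAggregates = [
  { id: 99, customerId: 1, items: [{ productId: 27, price: 32.45 }] },
  { id: 100, customerId: 1, items: [{ productId: 34, price: 5.0 }] },
];

// With A, one read returns the customer and the whole order history.
const historyA = customerWithOrders.orders.map((o) => o.id);

// With B, a single order can be read on its own, but the history needs a
// lookup across aggregates by customerId.
const historyB = orderAggregates
  .filter((o) => o.customerId === customer.id)
  .map((o) => o.id);

console.log(historyA, historyB); // [ 99, 100 ] [ 99, 100 ]
```

Both designs hold the same information. The choice depends on whether the application usually reads a customer with all their orders, or one order at a time.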


• There's no universal answer for how to draw your aggregate
boundaries.
• It depends entirely on how you tend to manipulate your data.
• If you tend to access a customer together with all of that
customer’s orders at once, then you would prefer a single
aggregate.
• However, if you tend to focus on accessing a single order at a
time, then you should prefer having separate aggregates for
each order.
• Naturally, this is very context-specific; some applications will
prefer one or the other, even within a single system, which is
exactly why many people prefer aggregate ignorance

Summary of aggregate data models
• An aggregate is a collection of data that we interact with
as a unit. Aggregates form the boundaries for ACID
operations with the database.
• Key-value, document, and column-family databases can
all be seen as forms of aggregate-oriented database.
• Aggregates make it easier for the database to manage
data storage over clusters.
• Aggregate-oriented databases work best when most data
interaction is done with the same aggregate; aggregate-
ignorant databases are better when interactions use data
organized in many different formations.

Aggregate Data Models Continued…
• Aggregates make it easier for the database to manage data storage over
clusters, since the unit of data now could reside on any machine and
when retrieved from the database gets all the related data along with it.
• Aggregate-oriented databases work best when most data interaction is
done with the same aggregate.
• For example, when there is a need to get an order and all its details, it
is better to store the order as an aggregate object, but using these
aggregates to get item details across all the orders is not elegant.
• Aggregate-oriented databases make inter-aggregate relationships more
difficult to handle than intra-aggregate relationships.
• Aggregate-ignorant databases are better when interactions use data
organized in many different formations.
• Aggregate-oriented databases often compute materialized views to
provide data organized differently from their primary aggregates. This
is often done with map-reduce computations, such as a map-reduce job
to get items sold per day.

Details of Data Models
• Relationships
• Graph Databases
• Schemaless databases
• Materialized Views
• Modeling for Data Access

Aggregates are a central part of the NoSQL story

Relationships
• Create aggregates for commonly accessed data, putting data that is
accessed together into the same aggregate.
• In real life, however, different applications may access the same
data in different ways.
• Example: one customer has many orders.
Some applications will want to access the order history whenever they
access the customer; this fits well with combining the customer with
his order history into a single aggregate.
Other applications, however, want to process orders individually and
thus model orders as independent aggregates. In this situation the
customer and order aggregates are separated but keep the same
relationship (one customer, many orders).
Many databases, even key-value stores, provide ways to make these
relationships visible to the database. Document stores make the
content of the aggregate available to the database to form indexes and
queries. Riak, a key-value store, allows you to put link information in
metadata, supporting partial retrieval and link-walking capability.

Important aspects of
relationships and aggregates
• How are updates handled?
• Aggregate-oriented databases treat the aggregate as the unit of
data retrieval.
• Consequently, atomicity is only supported within the contents of a
single aggregate.
• If you update multiple aggregates at once, you have to deal with a
failure partway through yourself.
• Relational databases help you with this by allowing you to modify
multiple records in a single transaction, providing ACID guarantees
while altering many rows.

• So when your data contains lots of relationships, go for an
RDBMS.

Graph Databases
• Graph databases are an odd fish in the NoSQL pond

• Most of the NOSQL databases run on clusters and are
aggregate oriented.
• These aggregate data models have large records with
simple connections.
• Graph databases, in contrast, have small records with
complex interconnections. See the example in the next slide.

Here, a graph isn't a bar chart or histogram; instead, we refer to a graph data
structure of nodes connected by edges.

• Graph database queries differ from relational database
queries. With a graph database, we keep the network structure
in mind when asking a query; in an RDBMS, we have to keep the
schema in mind (foreign keys, joins).
• In graph query languages, the user finds answers by
navigating through the network of edges.
• The emphasis on relationships makes graph databases very different
from aggregate-oriented databases: query work consists of navigating
relationships.
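Navigating relationships can be sketched with an adjacency list and a breadth-first traversal. The example finds everyone within a given number of "friend" hops of a person. The graph and the `withinHops` function are hypothetical. Real graph databases provide their own query languages for this kind of traversal.

```javascript
// Hypothetical social graph: node -> directly connected nodes ("friend" edges).
const friends = new Map([
  ["Anna", ["Barbara", "Carol"]],
  ["Barbara", ["Anna", "Dawn"]],
  ["Carol", ["Anna"]],
  ["Dawn", ["Barbara", "Elizabeth"]],
  ["Elizabeth", ["Dawn"]],
]);

// Breadth-first navigation along edges, up to maxHops away from start.
function withinHops(graph, start, maxHops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next = [];
    for (const node of frontier) {
      for (const neighbour of graph.get(node) ?? []) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start); // report only the people reached, not the start node
  return [...seen];
}

console.log(withinHops(friends, "Anna", 2)); // [ 'Barbara', 'Carol', 'Dawn' ]
```

The query follows edges outward from one node instead of joining whole tables. This is the "navigating" style of query work that distinguishes graph databases.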

Navigation in graph databases

• The emphasis on relationships makes graph databases
very different from aggregate-oriented databases.

Schemaless Databases
• A common theme across all the forms of NoSQL
databases is that they are schemaless.
• In NoSQL, storing data is much more casual.
• A key-value store allows you to store any data you like under
a key.
• A document database effectively does the same thing, since
it makes no restrictions on the structure of the documents
you store.
• Column-family databases allow you to store any data under
any column you like.
• Graph databases allow you to freely add new edges and
freely add properties to nodes and edges as you wish.
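A schemaless key-value store can be sketched with a JavaScript `Map`. Values under different keys need not share any structure, and a record can gain new fields on the fly. This in-memory stand-in (`kv`) is only an illustration and not a real database API.

```javascript
// A toy key-value store: no schema is declared before storing data.
const kv = new Map();

// Values under different keys can have completely different shapes.
kv.set("customer:1", { name: "Martin", city: "Chicago" });
kv.set("customer:2", { name: "Pramod", email: "pramod@example.com", tags: ["vip"] });
kv.set("session:abc", "opaque string blob");

// Adding a new field to one record requires no schema change.
kv.get("customer:1").phone = "555-0100";

for (const [key, value] of kv) {
  console.log(key, typeof value === "object" ? Object.keys(value) : value);
}
```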

Schemaless databases
• Freedom and flexibility
• With a schema, you have to figure out in advance what you need to
store, document it, and diagram it, which is hard to do.
• Without a schema there is no such binding: you can easily change
your data storage as you learn more about your project.
• You can easily add new things as you discover them.
• If you no longer want to store some attributes, you can simply
stop storing them; NoSQL allows this.
A schemaless store also makes it easier to deal with nonuniform data:

Schemaless databases: Nonuniform data
• data where each record has a different set of fields.
• A schema puts all rows of a table into a straightjacket, which
becomes awkward if you have different kinds of data in
different rows. You either end up with lots of columns that are
usually null (a sparse table), or you end up with meaningless
columns like "custom column".
Schemalessness avoids this, allowing each record to
contain just what it needs—no more, no less

Schemaless databases still have an
implicit schema.
• An implicit schema is a set of assumptions about the data's
structure in the code that manipulates the data.
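Even without a declared schema, the code that reads the data carries assumptions about it. In this sketch, the hypothetical `orderTotal` function expects every order to have an `items` array whose entries have numeric `price` and `quantity` fields. A record without those fields is handled only because the code checks for it explicitly.

```javascript
// Records in a schemaless collection: the second one has no "items" field.
const storedOrders = [
  { id: 1, items: [{ price: 10, quantity: 2 }, { price: 5, quantity: 1 }] },
  { id: 2, note: "imported from legacy system" },
];

// The implicit schema lives here: this code assumes orders have an
// items array whose entries carry numeric price and quantity fields.
function orderTotal(order) {
  if (!Array.isArray(order.items)) {
    return 0; // the code must decide what a nonconforming record means
  }
  return order.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

console.log(storedOrders.map(orderTotal)); // [ 25, 0 ]
```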

In the end, what does schemaless mean?

Materialized Views
• Views in RDBMS: views provide a mechanism to hide from the client whether data is derived data or
base data, but they can't avoid the fact that some views are expensive to compute.
• To cope with this, materialized views were invented, which are views that are computed in
advance and cached on disk.
• Materialized views are effective for data that is read heavily but can stand being somewhat
stale (out of date). A materialized view is used for reading; data is not modified through it directly.
• NoSQL databases don't have views, but they may have precomputed and cached
queries, and they reuse the term "materialized view" to describe them.
• The map-reduce technique is often used to compute them.
• Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results.
Materialized views can be used within the same aggregate.

Two main ways to build a materialized view
• Eager approach: update the materialized view at the same time as
you update its base data. This approach is good when the
materialized view is read more often than it is written and you
want it to be as fresh as possible.
• In an application database, any code that updates the base data
can also update the materialized views.
• Batch approach: if you don't want to pay that overhead on each
update, run batch jobs to update the materialized views at regular
intervals. Such views are often computed with the map-reduce technique.
• Views can also be built outside of the database by reading the data,
computing the view, and saving it back to the database.
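The eager approach updates a precomputed summary in the same function that writes the base data. The batch approach rebuilds the summary from scratch. The sketch shows both for a per-product "units sold" view. The names `recordSale`, `unitsByProduct`, and `rebuildView` are hypothetical. In a real system, the base-data write and the view update would also need to be kept consistent with each other.

```javascript
// Base data: individual sales records.
const sales = [];

// Materialized view: precomputed quantity sold per product.
const unitsByProduct = new Map();

// Eager approach: update base data and the view in the same step.
function recordSale(productId, quantity) {
  sales.push({ productId, quantity });
  const current = unitsByProduct.get(productId) ?? 0;
  unitsByProduct.set(productId, current + quantity);
}

// Batch approach: periodically rebuild the whole view from the base data.
function rebuildView(allSales) {
  const view = new Map();
  for (const { productId, quantity } of allSales) {
    view.set(productId, (view.get(productId) ?? 0) + quantity);
  }
  return view;
}

recordSale("book", 2);
recordSale("pen", 5);
recordSale("book", 1);

// Reads hit the precomputed view instead of scanning all sales.
console.log(unitsByProduct.get("book")); // 3
```

The eager version pays a small cost on every write so that reads are always fresh. The batch version keeps writes cheap but serves data that is stale until the next rebuild.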

MAP REDUCE
• A MapReduce job usually splits the input data set into
independent chunks which are processed by
the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are
then input to the reduce tasks.
• Typically both the input and the output of the job are
stored in a file-system.
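The map, sort/group, and reduce phases can be simulated in a single process. This toy version is an illustration and not Hadoop's API. It computes items sold per day from hypothetical order records. The map step emits (day, quantity) pairs for each record independently. The pairs are then grouped by key in sorted order, and the reduce step sums each group. The names `mapOrder`, `reduceDay`, and `mapReduce` are assumptions. In a real framework, the map tasks would run in parallel on different machines.

```javascript
// Hypothetical input: order aggregates, each with a date and line items.
const orderRecords = [
  { date: "2024-01-01", items: [{ product: "book", qty: 2 }, { product: "pen", qty: 1 }] },
  { date: "2024-01-01", items: [{ product: "pen", qty: 3 }] },
  { date: "2024-01-02", items: [{ product: "book", qty: 1 }] },
];

// Map: each record is processed independently and emits [key, value] pairs.
function mapOrder(order) {
  return order.items.map((item) => [order.date, item.qty]);
}

// Reduce: combine all values that share a key.
function reduceDay(day, quantities) {
  return [day, quantities.reduce((a, b) => a + b, 0)];
}

function mapReduce(records, mapper, reducer) {
  // Map phase (a real framework runs these tasks in parallel on many nodes).
  const pairs = records.flatMap(mapper);
  // Shuffle/sort phase: group emitted values by key.
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const sortedKeys = [...groups.keys()].sort();
  // Reduce phase: one reducer call per key, in sorted key order.
  return sortedKeys.map((key) => reducer(key, groups.get(key)));
}

console.log(mapReduce(orderRecords, mapOrder, reduceDay));
// [ [ '2024-01-01', 6 ], [ '2024-01-02', 1 ] ]
```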


Key points
• Aggregate-oriented databases make inter-aggregate
relationships more difficult to handle than intra-aggregate
relationships.
• Graph databases organize data into node and edge graphs;
they work best for data that has complex relationship
structures.
• Schemaless databases allow you to freely add fields to
records, but there is usually an implicit schema expected by
users of the data.
• Aggregate-oriented databases often compute materialized
views to provide data organized differently from their
primary aggregates. This is often done with map-reduce
computations.

Thank You