1
NoSQL
Not Only SQL
2
3
Source :- http://wearesocial.net/blog/2015/01/digital-social-mobile-worldwide-2015/
4
3 V
o VELOCITY
o VOLUME
o VARIETY
5
6 6
RELATIONAL DATABASE MANAGEMENT
SYSTEM
Relational Model - data represented in terms of tuples (rows).
Key Concepts
o Table - collection of data elements organized in terms of rows and
columns
o Field - column in a table designed to maintain specific information
about every record in the table
o Record - horizontal entity representing a set of related data
o Column - vertical entity containing values of a particular type
7 7
RELATIONAL DATABASE MANAGEMENT
SYSTEM INTEGRITY RULES
o Entity Integrity
o Domain Integrity
o Referential integrity
o User-Defined Integrity
8 8
RELATIONAL DATABASE MANAGEMENT
SYSTEM
Pros
o Supports simple data structures
o Limits redundancy
o Better integrity
o Offers logical database independence
o Supports one-off queries using SQL
o Better backup & recovery procedures
Cons
o Poor representation of the real world
o Difficult to represent hierarchies
o Difficult to represent complex data types
9 9
RDBMS VS NOSQL
RDBMS | NoSQL
Scale up | Scale out
Handles structured data | Semi-structured / unstructured data
Atomic transactions | Eventual consistency
Impedance mismatch | Object model
Strict schema | Schema-less
10 10
DISTRIBUTED SYSTEMS
Distributed database system consists of loosely-
coupled sites that share no physical components.
Homogeneous DDBMS
All sites have identical software and are aware of each other.
They work cooperatively in processing user requests.
Heterogeneous DDBMS
Different sites may use different schemas and software.
They provide limited facilities for cooperation in transaction processing.
11 11
DISTRIBUTED SYSTEMS
Sharding
Split the data among multiple machines while ensuring that data is
always accessed from the correct place.
Replication
Multiple instances of the database, each mirroring all the data
of the others.
[Figure: with sharding, a 75GB data set is split into three 25GB pieces; with replication, each of three instances holds the full 75GB.]
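The difference between the two strategies can be sketched in a few lines of Python. This is a toy illustration rather than how any particular database does it: the three-node cluster, the `shard_for` helper and the choice of an MD5 hash to pick the owning node are all assumptions made for the example.

```python
import hashlib

NUM_NODES = 3  # hypothetical cluster size


def shard_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a key by hashing it (one simple sharding scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def write_sharded(nodes, key, value):
    """Sharding: each key lives on exactly one node."""
    nodes[shard_for(key, len(nodes))][key] = value


def write_replicated(nodes, key, value):
    """Replication: every node holds a full copy of the data."""
    for node in nodes:
        node[key] = value


sharded = [{} for _ in range(NUM_NODES)]
replicated = [{} for _ in range(NUM_NODES)]

for i in range(9):
    write_sharded(sharded, f"customer:{i}", {"id": i})
    write_replicated(replicated, f"customer:{i}", {"id": i})

# Sharded nodes split the 9 records between them; replicated nodes each hold all 9.
print([len(node) for node in sharded])
print([len(node) for node in replicated])
```

Reads in the sharded case must go through `shard_for` again so that a key is always looked up on the node that stored it.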
12 12
WHY NOSQL
The global NoSQL market is forecast to reach $3.4 Billion in 2020,
representing a compound annual growth rate (CAGR) of 21% for the
period 2015 – 2020.
http://www.technologies.org/?p=102
http://www.marketresearchmedia.com/?p=568
13 13
BIG USERS
14 14
BIG DATA
15 15
THE INTERNET OF THINGS
16 16
CLOUD COMPUTING
17 17
FLEXIBLE DATA MODEL
18 18
SCALABILITY AND PERFORMANCE
19 19
WHAT IS ACID?
o Atomicity
A transaction is all or nothing
o Consistency
Only valid data is written to the database
o Isolation
Pretend all transactions are happening serially and the data is
correct
o Durability
Once a transaction is committed, what you wrote is what you get
back, even after a failure
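Atomicity and consistency are easiest to see with a money transfer. The sketch below uses Python's built-in `sqlite3` module as a stand-in relational database; the `account` table, the `transfer` helper and the rule that a balance may not go negative are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 50)         # valid transfer
after_ok = balances(conn)
rejected = transfer(conn, "A", "B", 500)  # would overdraw account A
after_rejected = balances(conn)
print(ok, after_ok)
print(rejected, after_rejected)
```

In the second transfer the credit to B succeeds first, but the debit from A violates the CHECK constraint, so the whole transaction is rolled back and B's credit disappears with it.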
20 20
CAP THEOREM
Availability :
Each client can always read
and write
Partition Tolerance :
The system works well despite
physical network partitions.
Consistency :
All clients always have the
same view of the data.
You can have at most two
of these properties in any
shared-data system.
21 21
AN ALTERNATIVE TO ACID IS BASE
o Basically Available
System seems to work all the time
o Soft-State
It doesn't have to be consistent all the time
o Eventual Consistency
Becomes consistent at some later time
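Eventual consistency can be pictured with a toy store in which a write is acknowledged by one replica and reaches the others later. The class below is only a sketch under that assumption; the name `EventuallyConsistentStore`, the three replicas and the explicit `sync` step are hypothetical and do not model any specific product.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = deque()  # writes not yet applied to every replica

    def write(self, key, value):
        self.replicas[0][key] = value      # acknowledged immediately
        self.pending.append((key, value))  # propagated in the background

    def read(self, key, replica):
        return self.replicas[replica].get(key)  # may return stale data

    def sync(self):
        """Apply all pending writes to the remaining replicas."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica[key] = value


store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = store.read("cart", replica=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("cart", replica=2)  # now consistent with replica 0
print(stale, fresh)
```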
22 22
NOSQL DATABASE CATEGORIES
o Key Value Store
o Document Store
o Wide Column Store
o Graph Databases
23 23
KEY VALUE STORE - OVERVIEW
o Most basic type of NoSQL Database and basis for other three
o Schema-free
o Store data as Key-Value pair
o Key-Value stores can be used as collections, dictionaries,
associative arrays etc.
Example DBs: Redis, Project Voldemort, Amazon DynamoDB
Key: Value Row_Id:100
First_Name: Saman
Last_Name: Silva
Address: 123, Galle Rd,
Beruwala
Last_Order: 2001
24 24
WIDE COLUMN STORE - OVERVIEW
o Stores data in a columnar format
o Semi-Schematic
o Allows key-value pairs to be stored
o Each key (Super Column) is associated with multiple attributes
o Stores data in column-specific files
Example DBs: Apache HBase, Cassandra, Bigtable, Hadoop
Super_Column:Value
Sub_Column->Key:Value
Sub_Column->Key:Value
Super_Column:Name
First_Name:Saman
Last_Name:Silva
Super_Column:Address
No:125
Road:Galle Rd
City:Beruwala
25 25
DOCUMENT STORE - OVERVIEW
o Everything is stored in a Document
o Schema-free
o Data is stored inside documents as JSON or BSON formats
o Document is a Key-Value collection
Example DBs: MongoDb, CouchDB
Database: Customers
Document_Id: 100
First_Name: Saman
Last_Name: Silva
Address:
    Number: 125
    Road: Galle Rd
    City: Beruwala
Order:
    Most_Recent: 2001
Database: Orders
Document_Id: 2001
Price: Rs 450
Item1: 1001
Item2: 1002
Document_Id: 2002
Price: Rs 750
Item1: 1003
Item2: 1001
26 26
GRAPH DATABASE - OVERVIEW
o Collection of nodes & edges
o A node represents an entity & an edge represents a connection
between two nodes
o Stores data in a Graph
o Within nodes, data is stored as Key : Value pairs
o Mostly used in social network applications such as Facebook
and Twitter
o Example DBs: Neo4j, Titan
[Figure: nodes & edges with Key : Value pairs. Three nodes (Name: Shelan, Name: Hansa, WorkPlace: Virtusa) are joined by two WORKS_IN edges and one IS_FRIEND_OF edge.]
27 27
KEY VALUE STORE
o Most Basic NoSQL Database Type
o Storing data as a dictionary or hash
o Dictionaries contain collection of objects or records
o Different than RDBMS
28 28
KEY VALUE STORE
Database
Customer Order
Row_Id:100
First_Name: Saman
Last_Name: Silva
Address: 123, Galle Rd,
Beruwala
Last_Order: 2001
Row_Id:101
First_Name: Nuwan
Last_Name: Perera
Address: 1/2, Galle Rd, Kalutara
Last_Order: 2002
Row_Id: 2001
Price: Rs 450
Item1: 1001
Item2: 1003
Item3: 1005
Row_Id:2002
Price: Rs 750
Item1: 1001
Item2: 1002
Item3: 1003
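The Customer and Order records above map directly onto dictionary lookups. The following is a minimal sketch using a plain Python dictionary as the store; the `put` and `get` helpers and the `customer:` / `order:` key prefixes are assumptions made for the example, not part of the slides.

```python
# A key-value store behaves like a dictionary: opaque values looked up by key.
store = {}


def put(key, value):
    """Store a value under a key, overwriting any previous value."""
    store[key] = value


def get(key, default=None):
    """Fetch the value for a key, or a default if the key is absent."""
    return store.get(key, default)


put("customer:100", {
    "First_Name": "Saman",
    "Last_Name": "Silva",
    "Address": "123, Galle Rd, Beruwala",
    "Last_Order": 2001,
})
put("order:2001", {"Price": "Rs 450", "Items": [1001, 1003, 1005]})

# Following a reference takes two key lookups; there is no join.
customer = get("customer:100")
last_order = get(f"order:{customer['Last_Order']}")
print(last_order["Price"], last_order["Items"])
```

Because the store never inspects the value, each record can carry different fields without any schema change.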
29 29
WHEN TO USE KEY VALUE STORE
o Caching: Quickly storing and retrieving
o Queuing: Some K/V stores support lists,
sets, queues and more
o Distributing information and tasks
o Keeping live information
30 30
ADVANTAGES OF KEY VALUE STORE
o Supports horizontal scaling
o High performance
o Lack of Schema/Schema-less Data store
o Different than RDBMS
o Flexible, and more closely follows modern concepts like OOP
o Provides the basic K/V concept for the other three major NoSQL DB types
31 31
REDIS – KEY VALUE STORE DATABASE
o Open Source, Advanced Key-Value store
o 3 main specialties
o Holds its database entirely in memory
o Has a relatively rich set of data types
o Can replicate data to any number of slaves
o 2 types of Persistence
o RDB Persistence
o AOF Persistence
o 5 Data Types
http://www.redis.io
http://redis.io/download
32 32
REDIS FEATURES
o Exceptionally Fast
o Support Rich data types
o Operations are Atomic
o MultiUtility Tool
33 33
REDIS DATA TYPES
o String: "This is a String Value"
o Lists: Hasangi, Hasangi, Hansa, Rajith, Hijas
o Sets: Hasangi, Hansa, Hijas, Rajith
o Sorted Sets: 0 Hansa, 1 Hasangi, 2 Hijas, 3 Rajith, 4 Shelan
o Hashes: customer:1 (name: Shelan, address: Beruwala), customer:2 (name: Rajith, address: Homagama)
34 34
REDIS - STRING
>SET stringvalue "This is a String Value"
>OK
>GET stringvalue
>"This is a String Value"
35 35
REDIS - LISTS
>LPUSH customer Hansa
>(integer)1
>LPUSH customer Hasangi
>(integer)2
>RPUSH customer Rajith
>(integer)3
>LPUSH customer Hasangi
>(integer)4
>RPUSH customer Hijas
>(integer)5
>LRANGE customer 0 4
1) “Hasangi”
2) “Hasangi”
3) “Hansa”
4) “Rajith”
5) “Hijas”
36 36
REDIS - SETS
>SADD customer Hansa
>(integer)1
>SADD customer Hasangi
>(integer)1
>SADD customer Rajith
>(integer)1
>SADD customer Hasangi
>(integer)0
>SADD customer Hijas
>(integer)1
>SMEMBERS customer
1) “Hijas”
2) “Rajith”
3) “Hasangi”
4) “Hansa”
37 37
REDIS – SORTED SETS
>ZADD customer 1 Hasangi
>(integer)1
>ZADD customer 3 Rajith
>(integer)1
>ZADD customer 4 Shelan
>(integer)1
>ZADD customer 2 Hijas
>(integer)1
>ZADD customer 0 Hansa
>(integer)1
>ZRANGE customer 0 4
1) “Hansa”
2) “Hasangi”
3) “Hijas”
4) “Rajith”
5) “Shelan”
Scores: 0 Hansa, 1 Hasangi, 2 Hijas, 3 Rajith, 4 Shelan
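The list, set and sorted-set sessions above behave much like three familiar Python structures. The sketch below replays the same inputs with `collections.deque`, `set` and a score dictionary. It is an emulation of the semantics only, not the Redis client API, and the variable names are made up for the example.

```python
from collections import deque

# LPUSH / RPUSH: push on the left or right end of a list.
customers_list = deque()
customers_list.appendleft("Hansa")    # LPUSH customer Hansa
customers_list.appendleft("Hasangi")  # LPUSH customer Hasangi
customers_list.append("Rajith")       # RPUSH customer Rajith
customers_list.appendleft("Hasangi")  # LPUSH customer Hasangi
customers_list.append("Hijas")        # RPUSH customer Hijas

# SADD: duplicates are ignored and member order is not guaranteed.
customers_set = set()
added = []
for name in ["Hansa", "Hasangi", "Rajith", "Hasangi", "Hijas"]:
    added.append(0 if name in customers_set else 1)  # mirrors the (integer) reply
    customers_set.add(name)

# ZADD / ZRANGE: each member carries a score and is returned in score order.
scores = {"Hasangi": 1, "Rajith": 3, "Shelan": 4, "Hijas": 2, "Hansa": 0}
customers_sorted = sorted(scores, key=scores.get)

print(list(customers_list))
print(added, sorted(customers_set))
print(customers_sorted)
```

The second `SADD` of Hasangi replies 0 because the member already exists, whereas the list happily stores the same name twice.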
38 38
REDIS - HASHES
>HMSET customer:1 name "Shelan" address "Beruwala"
>OK
>HMSET customer:2 name "Rajith" address "Homagama"
>OK
>HGETALL customer:1
1) "name"
2) "Shelan"
3) "address"
4) "Beruwala"
>HGETALL customer:2
1) "name"
2) "Rajith"
3) "address"
4) "Homagama"
39 39
REDIS – PUBS/SUBS
o Publish and subscribe to message channels
o Publishers send messages to a channel; subscribers receive
every message sent to the channels they subscribe to
[Figure: two publishers post "Hi, I'm RedisChat Publisher" and "I'm Another RedisChat Publisher" to the "RedisChat" channel, which delivers them to three subscribers.]
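The publish/subscribe pattern itself fits in a few lines. The sketch below is an in-memory stand-in, not the Redis client API: the `Broker` class and its method names are hypothetical, and delivery here is a direct function call rather than a network message.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory publish/subscribe hub."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel name -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver a message to every subscriber and report how many got it."""
        receivers = self.subscribers[channel]
        for callback in receivers:
            callback(message)
        return len(receivers)


broker = Broker()
inbox_a, inbox_b, inbox_c = [], [], []
for inbox in (inbox_a, inbox_b, inbox_c):
    broker.subscribe("RedisChat", inbox.append)

delivered = broker.publish("RedisChat", "Hi, I'm RedisChat Publisher")
ignored = broker.publish("OtherChannel", "nobody is listening")
print(delivered, ignored, inbox_a)
```

Publishers and subscribers never reference each other; they only share a channel name.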
40 40
REDIS – TRANSACTIONS
o Executes a group of commands in a single step
o Has 2 properties
o All commands in a transaction are executed sequentially as a
single isolated operation
o A Redis transaction is also atomic
>SET accountA 100
>OK
>SET accountB 100
>OK
>GET accountA
>"100"
>GET accountB
>"100"
>MULTI
>OK
>INCRBY accountA -50
>QUEUED
>INCRBY accountB 50
>QUEUED
>EXEC
>(integer)50
>(integer)150
>GET accountA
>"50"
>GET accountB
>"150"
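The queue-then-execute behaviour of MULTI and EXEC can be mimicked in plain Python. This is a sketch of the idea only: `ToyTransaction` is a hypothetical class, it supports a single INCRBY-like command, and it does not model concurrency or the real Redis protocol.

```python
class ToyTransaction:
    """Queue commands after MULTI and run them back to back on EXEC."""

    def __init__(self, data):
        self.data = data   # the shared key-value store
        self.queue = []    # commands waiting for EXEC

    def incrby(self, key, amount):
        self.queue.append((key, amount))
        return "QUEUED"    # nothing is applied yet

    def exec(self):
        results = []
        for key, amount in self.queue:
            self.data[key] = self.data.get(key, 0) + amount
            results.append(self.data[key])
        self.queue.clear()
        return results


accounts = {"accountA": 100, "accountB": 100}
tx = ToyTransaction(accounts)
reply_a = tx.incrby("accountA", -50)
reply_b = tx.incrby("accountB", 50)
before_exec = dict(accounts)  # balances are still untouched here
results = tx.exec()
print(reply_a, reply_b, before_exec, results, accounts)
```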
41 41
REDIS – DISK PERSISTENCE
RDB Persistence
o Point-in-time snapshot of the whole dataset
o Compact, ideal for regular backup/archive
o Multiple save-points available
o Faster restarts compared to AOF
o Very good for disaster recovery
AOF Persistence
o Writes every command like a tape
o Gets re-written when it gets too big
o Can be easily parsed & edited
o AOF files bigger than RDB files
o Slower than RDB
42 42
REDIS – REPLICATION
o Uses asynchronous replication
o A master can have multiple slaves
o Slaves accept connections from other slaves
o Non-blocking on both master and slave side
o Redis Sentinel
• Automatic Failover
• Monitoring
• Notification
• Configuration Provider
[Figure: two panels labelled High Availability and Scalability, showing a Redis master with its Redis slaves and a Sentinel.]
43 43
WIDE-COLUMN STORE DATABASES
o Stores data as sections of columns of data rather than rows of data
o Able to hold very large numbers of dynamic columns
o The benefit of storing data in columns is fast search/access and data
aggregation
o Advantageous for data warehouses and customer relationship
management (CRM) systems.
o A wide variety of companies and organizations use Hadoop for
both research and production.
44 44
HADOOP
o It's not a single piece of software. It's a framework of tools.
o Its objective is to run applications on big data.
o Open source set of tools distributed under the Apache license.
o A distributed file system (HDFS)
o An environment to run Map-Reduce tasks – typically Batch
mode
o NOSQL Database – HBase
o Real Time Query Engine (Impala)
45 45
HADOOP’S APPROACH
Big Data is broken into pieces
Computation runs on each piece separately
Combined Result
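The split, compute, combine flow above is the shape of a MapReduce job. The sketch below runs it in a single Python process as a word count; the helper names and the sample lines are made up for the example, and on a real cluster each chunk would be processed on a different machine.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, num_chunks):
    """Break the input into roughly equal pieces."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Computation on one piece: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Combine the partial results into one answer."""
    return reduce(lambda left, right: left + right, partials, Counter())


lines = [
    "big data is broken into pieces",
    "each piece is processed",
    "results are combined",
    "big results",
]
chunks = split_into_chunks(lines, 2)
partials = [map_chunk(chunk) for chunk in chunks]  # independent of each other
total = reduce_counts(partials)
print(total.most_common(3))
```

Because each `map_chunk` call only sees its own piece, the calls can run in parallel without coordination; only the final combine step needs all the partial results.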
46 46
HADOOP ARCHITECTURE
Map Reduce
File System
(HDFS)
Projects
(Set of Hadoop Tools)
Ambari Cassandra HBase Mahout Spark ZooKeeper
47 47
HADOOP DISTRIBUTED MODEL
[Figure: a cluster of commodity hardware. Each slave computer runs a Task Tracker and a Data Node; the master computer/s run a Job Tracker and a Name Node in addition to a Task Tracker and a Data Node.]
49 49
HADOOP DATA ACCESS
[Figure: the same cluster of commodity hardware (slave computers with a Task Tracker and a Data Node each, master computer/s with the Job Tracker and Name Node), with an Application added as the client.]
50 50
HADOOP DATA FAULT TOLERANCE
[Figure: the same cluster of commodity hardware and Application, drawn with additional Data Node and Task Tracker boxes on the slave computers.]
51 51
HOW HADOOP SOLVES BIG DATA
CHALLENGES OF PROGRAMMERS
File location
Manage failures
Break computations into pieces
Scaling
Focus on scale free programs
52 52
SCALABILITY
[Figure: processing speed and cost plotted against the number of computers (one master, many slaves).]
53 53
HBASE
An open-source, distributed, versioned, non-relational database
modeled after Google's Bigtable.
Features
o Linear and modular scalability.
o Strictly consistent reads and writes.
o Automatic and configurable sharding of tables
o Automatic failover support between Region Servers.
o Convenient base classes for backing Hadoop MapReduce jobs
with Apache HBase tables.
o Easy to use Java API for client access.
o Block cache and Bloom Filters for real-time queries.
54 54
GRAPH DATABASES
What is a Graph?
[Figure: nodes Shelan, Hansa, Hijaz and Hasangi connected by "Follows" edges, with the tags @Hansa and #nosql.]
55 55
GRAPH DATABASES
What is a Graph Database?
Database that uses graph structures to represent & store data.
Key-Features
o Excellent in dealing with relationships
o High Performance
o Flexible
o Query language support
[Figure: node Rajith (Name: Rajith, City: Kottawa, Married: false) linked by a "Works for" edge (Since: 2014/11/24) to node Virtusa (Name: Virtusa, City: Colombo).]
56 56
GRAPH DATABASES
Graph databases vs Relational databases
Relational | Graph
Tables | Nodes
Schema with nullables | No schema
Relationships with foreign keys | Relation is first class citizen
Related data fetched with joins | Related data fetched with patterns
57 57
NEO4J
o ACID
o Graph DB
o Java
o Enterprise Features
o Billions of Entities
o REST API
58 58
NEO4J
What is Cypher?
o Graph Query Language
o Declarative
o Pattern matching
o Clauses
59 59
NEO4J
Cypher Basic Syntax
(a)-[r]->(b)
a, b: nodes
r: relation
60 60
NEO4J - CYPHER
Node with properties
(a {name: "rajith", born: 1989})
Relationships with properties
(a)-[:WORKED_IN {roles: ["ASE"]}]->(b)
Labels
(a:Person {name: "rajith"})
61 61
NEO4J - CYPHER
Querying with Cypher
MATCH (a)-->(b)
RETURN a, b;
MATCH (a)-[r]->(b)
RETURN a.name, type(r);
Using Clauses
MATCH (a:Person)
WHERE a.name = "rajith"
RETURN a;
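What a pattern such as (a)-[r]->(b) asks for can be imitated over a small in-memory graph. The Python sketch below is only an analogy for how pattern matching walks edges; it is not how Neo4j executes Cypher. The node ids, the `Person` and `Company` labels, the edge directions and the `match` helper are assumptions made for the example, with names borrowed from the earlier graph slide.

```python
nodes = {
    "shelan": {"label": "Person", "name": "Shelan"},
    "hansa": {"label": "Person", "name": "Hansa"},
    "virtusa": {"label": "Company", "name": "Virtusa"},
}

# Each edge is (start node, relationship type, end node).
edges = [
    ("shelan", "WORKS_IN", "virtusa"),
    ("hansa", "WORKS_IN", "virtusa"),
    ("shelan", "IS_FRIEND_OF", "hansa"),
]


def match(rel_type=None, **where):
    """Rough analogue of MATCH (a)-[r]->(b) WHERE a.<prop> = <value>.

    Returns rows of (a.name, type(r), b.name).
    """
    rows = []
    for start, rel, end in edges:
        if rel_type is not None and rel != rel_type:
            continue  # relationship type does not fit the pattern
        if any(nodes[start].get(key) != value for key, value in where.items()):
            continue  # start node fails the WHERE filter
        rows.append((nodes[start]["name"], rel, nodes[end]["name"]))
    return rows


colleagues = match("WORKS_IN")
from_shelan = match(name="Shelan")
print(colleagues)
print(from_shelan)
```

Relationships are stored as data in their own right, so finding related nodes means following edges rather than joining tables.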
62 62
DOCUMENT STORE
o A collection of documents
o Data in this model is stored inside documents.
o A document is a key value collection where the key allows
access to its value.
o Documents are not typically forced to have a schema and
therefore are flexible and easy to change.
o Documents are stored into collections in order to group
different kinds of data.
o Documents can contain many different key-value pairs, or key-
array pairs, or even nested documents.
o Usually uses a JSON (BSON)-like interchange model, so
application logic can be written easily.
63 63
WHAT IS MONGODB ?
o Scalable High-Performance Open-source, Document-
orientated database written in C++.
o Built for Speed
o Rich Document based queries for Easy readability
o Full Index Support for High Performance
o Replication and Failover for High Availability
o Auto Sharding for Easy Scalability.
o Map / Reduce for Aggregation.
64 64
KEYWORDS COMPARISON
RDBMS | MongoDB
Database | Database
Table, View | Collection
Row | Document (JSON, BSON)
Column | Field
Index | Index
Join | Embedded Document
Foreign Key | Reference
Partition | Shard
> db.user.findOne({age:39})
{
    "_id" : ObjectId("5114e0bd42…"),
    "first" : "John",
    "last" : "Doe",
    "age" : 39,
    "interests" : [
        "Reading",
        "Mountain Biking"
    ],
    "favorites" : {
        "color" : "Blue",
        "sport" : "Soccer"
    }
}
65 65
MONGODB ADVANCED FEATURES
o Replication
o Indexing
o Aggregation
o Sharding
o Capped Collections
66 66
REPLICATION
o Replication is the process of synchronizing data across
multiple servers
o Replication provides redundancy and increases data
availability
[Figure: the minimum replica set in MongoDB, made up of a Primary DB, a Secondary DB and an Arbiter.]
67 67
AUTOMATIC FAILOVER
68 68
INDEXING
o Indexes support the efficient execution of queries in MongoDB
o MongoDB can use the index to limit the number of documents
it must inspect
o Indexes use a B-tree data structure.
o An index can be created with the "ensureIndex" method
(named "createIndex" in newer MongoDB releases).
>db.COLLECTION_NAME.ensureIndex({KEY:1})
o KEY is the name of the field on which to create the index.
o 1 is for ascending order.
o -1 is for descending order.
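Why an index limits the number of documents inspected can be shown with a small experiment. The sketch below compares a full scan with a lookup through a sorted list of keys, which stands in for the B-tree; the 1000 synthetic documents, the `age` field and both helper functions are assumptions made for the example and do not reflect MongoDB internals.

```python
import bisect

# 1000 synthetic documents with ages spread evenly between 20 and 59.
documents = [{"_id": i, "age": 20 + (i * 7) % 40} for i in range(1000)]


def find_by_scan(docs, age):
    """Without an index: inspect every document."""
    inspected = 0
    hits = []
    for doc in docs:
        inspected += 1
        if doc["age"] == age:
            hits.append(doc["_id"])
    return hits, inspected


# Ascending index on "age": sorted (key, position) pairs.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(documents))
age_keys = [key for key, _ in age_index]


def find_by_index(docs, age):
    """With an index: binary-search the keys, then touch only the matches."""
    low = bisect.bisect_left(age_keys, age)
    high = bisect.bisect_right(age_keys, age)
    hits = [docs[pos]["_id"] for _, pos in age_index[low:high]]
    return hits, high - low


scan_hits, scan_inspected = find_by_scan(documents, 39)
index_hits, index_inspected = find_by_index(documents, 39)
print(scan_inspected, index_inspected, scan_hits == index_hits)
```

Both approaches return the same documents; the index simply avoids reading the ones that cannot match.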
69 69
WITHOUT INDEXING
The server has to read
every document in document storage
to find the result.
70 70
WITH INDEXING
71 71
INDEX TYPES
o Default _id Index
o Single Field Index
o Compound Index
o Multikey Index
o Geo Index
o Text Index
o Hashed Index
72 72
AGGREGATIONS
Aggregations are operations that process data records and return
computed results.
MongoDB provides a rich set of aggregation operations.
Aggregation concepts
o Aggregation Pipelines
o Map-Reduce
o Single Purpose Aggregation Operation
73 73
AGGREGATION PIPELINES
The pipeline provides efficient data aggregation using native
operations within MongoDB, and is the preferred method for data
aggregation in MongoDB
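A pipeline passes documents through a sequence of stages, each consuming the output of the one before. The Python sketch below imitates a filter stage followed by a grouping stage; it illustrates the idea only and is not the MongoDB API. The `orders` data and the `match_stage` and `group_sum_stage` helpers are hypothetical.

```python
from collections import defaultdict

orders = [
    {"cust_id": "A", "amount": 450, "status": "paid"},
    {"cust_id": "B", "amount": 750, "status": "paid"},
    {"cust_id": "A", "amount": 200, "status": "pending"},
    {"cust_id": "A", "amount": 300, "status": "paid"},
]


def match_stage(docs, **criteria):
    """Keep only the documents whose fields equal the given values."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def group_sum_stage(docs, key, field):
    """Group documents by one field and sum another within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[field]
    return [{"_id": group, "total": total} for group, total in sorted(totals.items())]


# Stage 1 filters, stage 2 aggregates what is left.
paid = match_stage(orders, status="paid")
result = group_sum_stage(paid, key="cust_id", field="amount")
print(result)
```

Filtering first keeps the grouping stage from touching documents that would not contribute to the result.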
74 74
MAP-REDUCE
MongoDB also provides map-reduce operations to perform
aggregation
75 75
SINGLE PURPOSE AGGREGATION
OPERATION
MongoDB provides special purpose database commands.
All of these operations aggregate documents from a single collection.
Common aggregation
operations are:
o returning a count of
matching documents
o returning the distinct values
for a field
o grouping data based on
the values of a field
76 76
SHARDING
Sharding is a method for storing data across multiple machines.
MongoDB uses sharding to support deployments with very large data
sets and high throughput operations.
77 77
CAPPED COLLECTIONS
o Capped collections are fixed-size circular collections that
follow insertion order to support high performance for create,
read and delete operations.
o Capped collections restrict updates to the documents if the
update results in increased document size.
o Capped collections are best for storing log information, cache
data or any other high volume data.
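The circular behaviour is easy to picture with a bounded queue. The sketch below uses `collections.deque` with `maxlen` as a stand-in; it caps the collection by document count for simplicity, which is an assumption of the example, and the cap of three entries is arbitrary.

```python
from collections import deque

MAX_DOCS = 3  # hypothetical cap, counted in documents for simplicity

log = deque(maxlen=MAX_DOCS)
for seq in range(1, 6):
    # Once the cap is reached, each insert silently evicts the oldest entry.
    log.append({"seq": seq, "msg": f"event {seq}"})

remaining = [doc["seq"] for doc in log]
print(remaining)
```

Entries stay in insertion order and old ones age out without any explicit delete, which is why the structure suits logs and caches.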
78 78
NOSQL DATABASE CATEGORIES
o Key Value Store
o Document Store
o Wide Column Store
o Graph Databases
79 79
NOSQL DATABASES SUMMARY
Name | HBase | MongoDB | Neo4j | Redis
Database model | Wide column store | Document store | Graph DBMS | Key-value store
Initial release | 2008 | 2009 | 2007 | 2009
License | Open Source | Open Source | Open Source | Open Source
DBaaS | no | no | no | no
Implementation language | Java | C++ | Java | C
Server operating systems | Linux, Unix, Windows | Linux, OS X, Solaris, Windows | Linux, OS X, Windows | BSD, Linux, OS X, Windows
Data scheme | schema-free | schema-free | schema-free | schema-free
Source :-
http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
80 80
NOSQL DATABASES SUMMARY
Name | HBase | MongoDB | Neo4j | Redis
2nd indexes | no | yes | yes | no
SQL | no | no | no | no
APIs and other access methods | Java API, RESTful HTTP, Thrift | proprietary protocol using JSON | Cypher query language, Java API, RESTful HTTP | proprietary protocol
Supported programming languages (HBase): C, C#, C++, Groovy, Java, PHP, Python, Scala
Supported programming languages (MongoDB): Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk
Supported programming languages (Neo4j): .Net, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Supported programming languages (Redis): C, C#, C++, Clojure, Dart, Erlang, Go, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, Tcl
Source :-
http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
81 81
NOSQL DATABASES SUMMARY
Name | HBase | MongoDB | Neo4j | Redis
Triggers | yes | no | yes | no
Partitioning methods | Sharding | Sharding | none | Sharding
Replication methods | selectable replication factor | Master-slave replication | Master-slave replication | Master-slave replication
MapReduce | yes | yes | no | no
Consistency concepts | Immediate Consistency | Eventual Consistency, Immediate Consistency | Eventual Consistency configurable in High Availability Cluster setup, Immediate Consistency | Eventual Consistency
Source :-
http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
82 82
NOSQL DATABASES SUMMARY
Name | HBase | MongoDB | Neo4j | Redis
Foreign keys | no | no | yes | no
Transaction concepts | no | no | ACID | optimistic locking
Concurrency | yes | yes | yes | yes
Durability | yes | yes | yes | yes
In-memory capabilities | yes
User concepts | Access Control Lists (ACL) | Access rights for users and roles | no | very simple password-based access control
Source :-
http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
83 83
THANK YOU

More Related Content

What's hot

Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
Abhinav Tyagi
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
valuebound
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
Mohammed Fazuluddin
 
Graph databases
Graph databasesGraph databases
Graph databases
Vinoth Kannan
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Filip Ilievski
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
Hyphen Call
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
Ashwani Kumar
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
Derek Stainer
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
Simplilearn
 
Polyglot Persistence
Polyglot Persistence Polyglot Persistence
Polyglot Persistence
Dr-Dipali Meher
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
Bishal Khanal
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
Eugene Hanikblum
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
Navdeep Charan
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Marin Dimitrov
 

What's hot (20)

Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
 
Graph databases
Graph databasesGraph databases
Graph databases
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Polyglot Persistence
Polyglot Persistence Polyglot Persistence
Polyglot Persistence
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 

Viewers also liked

පරිගණකයේ විකාශය
පරිගණකයේ විකාශයපරිගණකයේ විකාශය
පරිගණකයේ විකාශය
Rajith Pemabandu
 
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණයපරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
Chamara Thilakarathne
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
David Portnoy
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
DataStax
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
Nimat Khattak
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
Jesus Rodriguez
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
Folio3 Software
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirni
Dr. Awase Khirni Syed
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014
Anuj Sahni
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
Andrew Brust
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
ateeq ateeq
 
Scaling SQL and NoSQL Databases in the Cloud
Scaling SQL and NoSQL Databases in the Cloud Scaling SQL and NoSQL Databases in the Cloud
Scaling SQL and NoSQL Databases in the Cloud
RightScale
 
Guidelines to create an ontology
Guidelines to create an ontologyGuidelines to create an ontology
Guidelines to create an ontology
Rajith Pemabandu
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
Chris Richardson
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
Bhaskar Gunda
 
Restaurant and food ontologies
Restaurant and food ontologiesRestaurant and food ontologies
Restaurant and food ontologies
Anna Fensel
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
DataStax Academy
 
Online Marketing with Schema.org and Multi-channel Communication
Online Marketing with Schema.org and Multi-channel CommunicationOnline Marketing with Schema.org and Multi-channel Communication
Online Marketing with Schema.org and Multi-channel Communication
Anna Fensel
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
Steven Francia
 
පරිගණකයේ පරිණාමය(Histry of computer)
පරිගණකයේ පරිණාමය(Histry of computer)පරිගණකයේ පරිණාමය(Histry of computer)
පරිගණකයේ පරිණාමය(Histry of computer)
NoteGun LMS
 

Viewers also liked (20)

පරිගණකයේ විකාශය
පරිගණකයේ විකාශයපරිගණකයේ විකාශය
පරිගණකයේ විකාශය
 
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණයපරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
පරිගණකයේ ඉතිහාසය සහ වර්ගීකරණය
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirni
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Scaling SQL and NoSQL Databases in the Cloud
Scaling SQL and NoSQL Databases in the Cloud Scaling SQL and NoSQL Databases in the Cloud
Scaling SQL and NoSQL Databases in the Cloud
 
Guidelines to create an ontology
Guidelines to create an ontologyGuidelines to create an ontology
Guidelines to create an ontology
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
 
Restaurant and food ontologies
Restaurant and food ontologiesRestaurant and food ontologies
Restaurant and food ontologies
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Online Marketing with Schema.org and Multi-channel Communication
Online Marketing with Schema.org and Multi-channel CommunicationOnline Marketing with Schema.org and Multi-channel Communication
Online Marketing with Schema.org and Multi-channel Communication
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
පරිගණකයේ පරිණාමය(Histry of computer)
පරිගණකයේ පරිණාමය(Histry of computer)පරිගණකයේ පරිණාමය(Histry of computer)
පරිගණකයේ පරිණාමය(Histry of computer)
 

An Intro to NoSQL Databases

  • 4. 3 V
    o VELOCITY
    o VOLUME
    o VARIETY
  • 6. RELATIONAL DATABASE MANAGEMENT SYSTEM
    Relational Model - data represented in terms of tuples (rows).
    Key Concepts
    o Table - collection of data elements organized in rows and columns
    o Field - column in a table designed to hold specific information about every record in the table
    o Record - horizontal entity representing a set of related data
    o Column - vertical entity containing values of a particular type
  • 7. RELATIONAL DATABASE MANAGEMENT SYSTEM INTEGRITY RULES
    o Entity Integrity
    o Domain Integrity
    o Referential Integrity
    o User-Defined Integrity
  • 8. RELATIONAL DATABASE MANAGEMENT SYSTEM
    Pros
    o Supports simple data structures
    o Limits redundancy
    o Better integrity
    o Offers logical database independence
    o Supports one-off queries using SQL
    o Better backup & recovery procedures
    Cons
    o Poor representation of the real world
    o Difficult to represent hierarchies
    o Difficult to represent complex data types
  • 9. RDBMS VS NOSQL
    RDBMS                    | NoSQL
    Scale up                 | Scale out
    Handles structured data  | Semi-structured / unstructured data
    Atomic transactions      | Eventual consistency
    Impedance mismatch       | Object model
    Strict schema            | Schema-less
  • 10. DISTRIBUTED SYSTEMS
    A distributed database system consists of loosely-coupled sites that share no physical components.
    Homogeneous DDBMS
    o All sites have identical software and are aware of each other.
    o Sites work cooperatively in processing user requests.
    Heterogeneous DDBMS
    o Different sites may use different schemas and software.
    o Sites provide limited facilities for cooperation in transaction processing.
  • 11. DISTRIBUTED SYSTEMS
    Sharding
    o Split the data among multiple machines while ensuring that data is always accessed from the correct place.
    Replication
    o Multiple instances of the database, each mirroring all the data of the others.
    (Diagram: a 75GB data set sharded into three 25GB parts; a 75GB data set replicated as three 75GB copies.)
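The difference between the two approaches on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database does it: the class names `ShardedStore` and `ReplicatedStore`, the `shard_for` helper, and the choice of an MD5-based hash for routing are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (hypothetical routing rule)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Each key lives on exactly one machine, so each machine holds a part of the data."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # The same routing rule sends the read to the machine that holds the key.
        return self.shards[shard_for(key, len(self.shards))].get(key)


class ReplicatedStore:
    """Every machine mirrors all the data, so any replica can answer a read."""

    def __init__(self, num_replicas: int):
        self.replicas = [dict() for _ in range(num_replicas)]

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def get(self, key, replica: int = 0):
        return self.replicas[replica].get(key)


keys = [f"customer:{i}" for i in range(30)]
sharded = ShardedStore(3)
replicated = ReplicatedStore(3)
for k in keys:
    sharded.put(k, k.upper())
    replicated.put(k, k.upper())
```

With sharding the per-machine sizes add up to the size of the data set, while with replication every machine holds the full data set, which mirrors the 25GB and 75GB figures in the diagram.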
  • 12. WHY NOSQL
    The global NoSQL market is forecast to reach $3.4 Billion in 2020, representing a compound annual growth rate (CAGR) of 21% for the period 2015-2020.
    http://www.technologies.org/?p=102
    http://www.marketresearchmedia.com/?p=568
  • 15. THE INTERNET OF THINGS
  • 18. SCALABILITY AND PERFORMANCE
  • 19. WHAT IS ACID?
    o Atomicity - A transaction is all or nothing
    o Consistency - Only valid data is written to the database
    o Isolation - Pretend all transactions are happening serially and the data is correct
    o Durability - What you write is what you get
  • 20. CAP THEOREM
    o Consistency: All clients always have the same view of the data.
    o Availability: Each client can always read and write.
    o Partition Tolerance: The system works well despite physical network partitions.
    You can have at most two of these properties for any shared-data system.
  • 21. AN ALTERNATIVE TO ACID IS BASE
    o Basically Available - System seems to work all the time
    o Soft State - It doesn't have to be consistent all the time
    o Eventual Consistency - Becomes consistent at some later time
  • 22. NOSQL DATABASE CATEGORIES
    o Key Value Store
    o Document Store
    o Wide Column Store
    o Graph Databases
  • 23. KEY VALUE STORE - OVERVIEW
    o Most basic type of NoSQL database and the basis for the other three
    o Schema-free
    o Stores data as key-value pairs
    o Key-value stores can be used as collections, dictionaries, associative arrays, etc.
    Example DBs: Redis, Project Voldemort, Amazon DynamoDB
    Key: Value
      Row_Id: 100
      First_Name: Saman
      Last_Name: Silva
      Address: 123, Galle Rd, Beruwala
      Last_Order: 2001
  • 24. WIDE COLUMN STORE - OVERVIEW
    o Stores data in a columnar format
    o Semi-schematic
    o Allows key-value pairs to be stored
    o Each key (Super Column) is associated with multiple attributes
    o Stores data in column-specific files
    Example DBs: Apache HBase, Cassandra, Bigtable, Hadoop
    Super_Column: Value
      Sub_Column -> Key: Value
      Sub_Column -> Key: Value
    Super_Column: Name
      First_Name: Saman
      Last_Name: Silva
    Super_Column: Address
      No: 125
      Road: Galle Rd
      City: Beruwala
  • 25. DOCUMENT STORE - OVERVIEW
    o Everything is stored in a document
    o Schema-free
    o Data is stored inside documents in JSON or BSON format
    o A document is a key-value collection
    Example DBs: MongoDB, CouchDB
    Database: Customers
      Document_Id: 100
      First_Name: Saman
      Last_Name: Silva
      Address:
        Number: 125
        Road: Galle Rd
        City: Beruwala
      Order:
        Most_Recent: 2001
    Database: Orders
      Document_Id: 2001
      Price: Rs 450
      Item1: 1001
      Item2: 1002

      Document_Id: 2002
      Price: Rs 750
      Item1: 1003
      Item2: 1001
  • 26. GRAPH DATABASE - OVERVIEW
    o Collection of nodes & edges
    o A node represents an entity & an edge represents a connection between two nodes
    o Stores data in a graph
    o Within nodes, data is stored as Key : Value pairs
    o Mostly used in social network applications such as Facebook and Twitter
    o Example DBs: Neo4j, Titan
    (Diagram: nodes with Key : Value data "Name: Shelan", "Name: Hansa" and "WorkPlace: Virtusa", connected by the edges WORKS_IN, WORKS_IN and IS_FRIEND_OF.)
  • 27. KEY VALUE STORE
    o Most basic NoSQL database type
    o Stores data as a dictionary or hash
    o Dictionaries contain a collection of objects or records
    o Different from an RDBMS
  • 28. KEY VALUE STORE
    Database: Customer
      Row_Id: 100
      First_Name: Saman
      Last_Name: Silva
      Address: 123, Galle Rd, Beruwala
      Last_Order: 2001

      Row_Id: 101
      First_Name: Nuwan
      Last_Name: Perera
      Address: 1/2, Galle Rd, Kalutara
      Last_Order: 2002
    Database: Order
      Row_Id: 2001
      Price: Rs 450
      Item1: 1001
      Item2: 1003
      Item3: 1005

      Row_Id: 2002
      Price: Rs 750
      Item1: 1001
      Item2: 1002
      Item3: 1003
  • 29. WHEN TO USE KEY VALUE STORE
    o Caching: Quickly storing and retrieving data
    o Queuing: Some K/V stores support lists, sets, queues and more
    o Distributing information and tasks
    o Keeping live information
  • 30. ADVANTAGES OF KEY VALUE STORE
    o Supports horizontal scaling
    o High performance
    o Schema-less data store
    o Different from an RDBMS
    o Flexible, and follows modern concepts like OOP more closely
    o Provides the basic K/V concept for the other 3 major NoSQL DB types
  • 31. REDIS - KEY VALUE STORE DATABASE
    o Open source, advanced key-value store
    o 3 main specialties
      o Holds its database entirely in memory
      o Has a relatively rich set of data types
      o Can replicate data to any number of slaves
    o 2 types of persistence
      o RDB Persistence
      o AOF Persistence
    o 5 data types
    http://www.redis.io
    http://redis.io/download
  • 32. REDIS FEATURES
    o Exceptionally fast
    o Supports rich data types
    o Operations are atomic
    o Multi-utility tool
  • 33. REDIS DATA TYPES
    o String
    o Hashes
    o Lists
    o Sets
    o Sorted Sets
    (Diagram: one example of each type, using the customer data from the following slides.)
  • 34. REDIS - STRING
    > SET stringvalue "This is a String Value"
    OK
    > GET stringvalue
    "This is a String Value"
  • 35. REDIS - LISTS
    > LPUSH customer Hansa
    (integer) 1
    > LPUSH customer Hasangi
    (integer) 2
    > RPUSH customer Rajith
    (integer) 3
    > LPUSH customer Hasangi
    (integer) 4
    > RPUSH customer Hijas
    (integer) 5
    > LRANGE customer 0 4
    1) "Hasangi"
    2) "Hasangi"
    3) "Hansa"
    4) "Rajith"
    5) "Hijas"
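To see why the list on slide 35 ends up in that order, the push commands can be replayed on a Python `deque`. This is only a rough in-memory analogue: the helper names `lpush`, `rpush` and `lrange` are made up for the sketch, and `lrange` here handles non-negative indexes only.

```python
from collections import deque

customer = deque()


def lpush(lst, value):
    """Insert at the head, like LPUSH; returns the new length."""
    lst.appendleft(value)
    return len(lst)


def rpush(lst, value):
    """Insert at the tail, like RPUSH; returns the new length."""
    lst.append(value)
    return len(lst)


def lrange(lst, start, stop):
    """Return elements start..stop inclusive (non-negative indexes only)."""
    return list(lst)[start:stop + 1]


lengths = [
    lpush(customer, "Hansa"),
    lpush(customer, "Hasangi"),
    rpush(customer, "Rajith"),
    lpush(customer, "Hasangi"),
    rpush(customer, "Hijas"),
]
print(lrange(customer, 0, 4))
```

Unlike a set, the list keeps the duplicate "Hasangi" entry, because a list only tracks insertion position.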
  • 36. REDIS - SETS
    > SADD customer Hansa
    (integer) 1
    > SADD customer Hasangi
    (integer) 1
    > SADD customer Rajith
    (integer) 1
    > SADD customer Hasangi
    (integer) 0
    > SADD customer Hijas
    (integer) 1
    > SMEMBERS customer
    1) "Hijas"
    2) "Rajith"
    3) "Hasangi"
    4) "Hansa"
  • 37. REDIS - SORTED SETS
    > ZADD customer 1 Hasangi
    (integer) 1
    > ZADD customer 3 Rajith
    (integer) 1
    > ZADD customer 4 Shelan
    (integer) 1
    > ZADD customer 2 Hijas
    (integer) 1
    > ZADD customer 0 Hansa
    (integer) 1
    > ZRANGE customer 0 4
    1) "Hansa"
    2) "Hasangi"
    3) "Hijas"
    4) "Rajith"
    5) "Shelan"
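The sorted-set behaviour on slide 37 can be approximated in Python with a dictionary from member to score that is sorted on read. This is an illustrative sketch only; the `zadd` and `zrange` helpers are hypothetical names, and sorting on every read is a simplification chosen for brevity rather than a description of how Redis stores sorted sets.

```python
scores = {}


def zadd(zset, score, member):
    """Add or update a member; returns 1 if the member is new, else 0."""
    is_new = member not in zset
    zset[member] = score
    return 1 if is_new else 0


def zrange(zset, start, stop):
    """Members ordered by score (ties broken by member name), start..stop inclusive."""
    ordered = sorted(zset, key=lambda member: (zset[member], member))
    return ordered[start:stop + 1]


added = [
    zadd(scores, 1, "Hasangi"),
    zadd(scores, 3, "Rajith"),
    zadd(scores, 4, "Shelan"),
    zadd(scores, 2, "Hijas"),
    zadd(scores, 0, "Hansa"),
]
print(zrange(scores, 0, 4))
```

The insertion order does not matter here: members always come back ordered by score, which is what distinguishes a sorted set from a list or a plain set.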
  • 38. REDIS - HASHES
    > HMSET customer:1 name "Shelan" address "Beruwala"
    OK
    > HMSET customer:2 name "Rajith" address "Homagama"
    OK
    > HGETALL customer:1
    1) "name"
    2) "Shelan"
    3) "address"
    4) "Beruwala"
    > HGETALL customer:2
    1) "name"
    2) "Rajith"
    3) "address"
    4) "Homagama"
  • 39. REDIS - PUB/SUB
    o Publish and subscribe to message channels
    o Subscribers can subscribe to one or more channels; publishers publish messages to a channel
    (Diagram: two publishers, "Hi, I'm RedisChat Publisher" and "I'm Another RedisChat Publisher", sending to the "RedisChat" channel, which delivers to three subscribers.)
  • 40. REDIS - TRANSACTIONS
    o Executes a group of commands in a single step
    o Has 2 properties
      o All commands in a transaction are executed sequentially as a single isolated operation
      o A Redis transaction is also atomic
    > SET accountA 100
    OK
    > SET accountB 100
    OK
    > GET accountA
    "100"
    > GET accountB
    "100"
    > MULTI
    OK
    > INCRBY accountA -50
    QUEUED
    > INCRBY accountB 50
    QUEUED
    > EXEC
    1) (integer) 50
    2) (integer) 150
    > GET accountA
    "50"
    > GET accountB
    "150"
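The queue-then-execute flow on slide 40 can be mimicked with a small Python class. This is a toy model under stated assumptions: `TinyStore` and its method names are invented for the sketch, values are kept as integers only, and nothing here models concurrent clients or error handling inside a transaction.

```python
class TinyStore:
    """Toy key-value store that queues commands between multi() and execute()."""

    def __init__(self):
        self.data = {}
        self.queue = None  # None means no transaction is open

    def set(self, key, value):
        self.data[key] = int(value)
        return "OK"

    def get(self, key):
        return self.data.get(key)

    def multi(self):
        self.queue = []
        return "OK"

    def incrby(self, key, amount):
        if self.queue is not None:
            # Inside a transaction the command is only recorded, not applied.
            self.queue.append((key, amount))
            return "QUEUED"
        return self._apply(key, amount)

    def _apply(self, key, amount):
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]

    def execute(self):
        # Apply every queued command in order, with nothing interleaved.
        queued, self.queue = self.queue, None
        return [self._apply(key, amount) for key, amount in queued]


store = TinyStore()
store.set("accountA", 100)
store.set("accountB", 100)
store.multi()
first = store.incrby("accountA", -50)
second = store.incrby("accountB", 50)
before_exec = (store.get("accountA"), store.get("accountB"))
results = store.execute()
```

The balances stay at 100 until `execute()` runs, at which point both increments are applied back to back, matching the transfer shown on the slide.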
  • 41. REDIS - DISK PERSISTENCE
    RDB Persistence
    o Point-in-time snapshot of the whole dataset
    o Compact, ideal for regular backup/archive
    o Multiple save-points available
    o Faster restarts compared to AOF
    o Very good for disaster recovery
    AOF Persistence
    o Writes every command, like a tape
    o Gets rewritten when it gets too big
    o Can be easily parsed & edited
    o AOF files are bigger than RDB files
    o Slower than RDB
  • 42. REDIS - REPLICATION
    o Uses asynchronous replication
    o A master can have multiple slaves
    o Slaves accept connections from other slaves
    o Non-blocking on both the master and slave side
    o Redis Sentinel
      • Automatic failover
      • Monitoring
      • Notification
      • Configuration provider
    (Diagrams: a Redis master and slave watched by Sentinel, labelled High Availability; a Redis master with several slaves, labelled Scalability.)
  • 43. WIDE-COLUMN STORE DATABASES
    o Store data as sections of columns rather than rows
    o Able to hold very large numbers of dynamic columns
    o The benefit of storing data in columns is fast search/access and data aggregation
    o Advantageous for data warehouses and customer relationship management (CRM) systems
    o A wide variety of companies and organizations use Hadoop for both research and production
  • 44. HADOOP
    o It is not a single piece of software; it is a framework of tools.
    o The objective is to run applications on big data.
    o Open source set of tools distributed under the Apache license.
    o A distributed file system (HDFS)
    o An environment to run Map-Reduce tasks, typically in batch mode
    o NoSQL database - HBase
    o Real-time query engine (Impala)
  • 45. HADOOP'S APPROACH
    (Diagram: Big Data is broken into pieces; a computation runs on each piece; the outputs are merged into a combined result.)
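The break-into-pieces, compute, combine flow on slide 45 can be sketched with a single-process word count in Python. This only illustrates the shape of the idea and does not use Hadoop: the function names, the sample lines and the choice of three pieces are assumptions for the example, and the pieces are processed one after another rather than on separate machines.

```python
from collections import Counter
from functools import reduce


def split_into_pieces(lines, num_pieces):
    """Break the input into roughly equal pieces (round-robin)."""
    return [lines[i::num_pieces] for i in range(num_pieces)]


def compute(piece):
    """Per-piece computation: count the words in this piece only."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def combine(left, right):
    """Merge two partial results into one."""
    return left + right


lines = [
    "redis is a key value store",
    "hbase is a wide column store",
    "neo4j is a graph database",
    "mongodb is a document store",
]
pieces = split_into_pieces(lines, 3)
partials = [compute(piece) for piece in pieces]
combined = reduce(combine, partials, Counter())
print(combined["store"], combined["is"])
```

Because each `compute` call touches only its own piece, the calls are independent of each other, which is the property that lets a framework hand them to different machines.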
  • 46. HADOOP ARCHITECTURE
    o Map Reduce
    o File System (HDFS)
    o Projects (Set of Hadoop Tools): Ambari, Cassandra, HBase, Mahout, Spark, ZooKeeper
  • 47. HADOOP DISTRIBUTED MODEL
    (Diagram: on commodity hardware, the master computer/s run the Job Tracker and Name Node; each slave computer runs a Task Tracker and a Data Node.)
  • 49. HADOOP DATA ACCESS
    (Diagram: an application connected to the master computer/s, which run the Job Tracker and Name Node; slave computers on commodity hardware each run a Task Tracker and a Data Node.)
  • 50. HADOOP DATA FAULT TOLERANCE
    (Diagram: the same application, master and slave layout, showing Task Trackers and Data Nodes across the slave computers.)
  • 51. HOW HADOOP SOLVES BIG DATA CHALLENGES OF PROGRAMMERS
    o File location
    o Manage failures
    o Break computations into pieces
    o Scaling
    Focus on scale-free programs
  • 52. SCALABILITY
    (Diagram: processing speed and cost against the number of computers, for a master with slaves.)
  • 53. HBASE
    An open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
    Features
    o Linear and modular scalability.
    o Strictly consistent reads and writes.
    o Automatic and configurable sharding of tables.
    o Automatic failover support between Region Servers.
    o Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    o Easy-to-use Java API for client access.
    o Block cache and Bloom filters for real-time queries.
  • 54. GRAPH DATABASES
    What is a Graph?
    (Diagram: Shelan, Hansa, Hijaz and Hasangi connected by "Follows" edges, together with the tags @Hansa and #nosql.)
  • 55. GRAPH DATABASES
    What is a Graph Database?
    A database that uses graph structures to represent & store data.
    Key Features
    o Excellent at dealing with relationships
    o High performance
    o Flexible
    o Query language support
    (Diagram: node Rajith with Name: Rajith, City: Kottawa, Married: false; relationship "Works for" with Since: 2014/11/24; node Virtusa with Name: Virtusa, City: Colombo.)
  • 56. GRAPH DATABASES
    Graph databases vs Relational databases
    Relational                      | Graph
    Tables                          | Nodes
    Schema with nullables           | No schema
    Relationships with foreign keys | Relation is a first-class citizen
    Related data fetched with joins | Related data fetched with patterns
  • 58. NEO4J
    What is Cypher?
    o Graph query language
    o Declarative
    o Pattern matching
    o Clauses
  • 59. NEO4J
    Cypher Basic Syntax
      (a)-[r]->(b)
    a and b are nodes; r is the relation between them.
  • 60. NEO4J - CYPHER
    Node with properties
      (a {name: "rajith", born: 1989})
    Relationships with properties
      (a)-[:WORKED_IN {roles: ["ASE"]}]->(b)
    Labels
      (a:Person {name: "rajith"})
  • 61. NEO4J - CYPHER
    Querying with Cypher
      MATCH (a)-->(b) RETURN a, b;
      MATCH (a)-[r]->(b) RETURN a.name, type(r);
    Using Clauses
      MATCH (a:Person) WHERE a.name = "rajith" RETURN a;
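To make the pattern-matching idea behind the Cypher queries on slide 61 more concrete, here is a rough Python analogue over a tiny in-memory graph. It is a sketch of the concept, not of how Neo4j evaluates queries: the node ids `n1` and `n2`, the `Company` label and the function names are all made up for the example, while the `rajith` node and the `WORKED_IN` relationship follow the slides.

```python
# Nodes keyed by a hypothetical id; each has labels and properties.
nodes = {
    "n1": {"labels": {"Person"}, "name": "rajith", "born": 1989},
    "n2": {"labels": {"Company"}, "name": "Virtusa"},
}

# Directed edges as (source id, relationship type, properties, target id).
edges = [
    ("n1", "WORKED_IN", {"roles": ["ASE"]}, "n2"),
]


def match_directed():
    """Rough analogue of: MATCH (a)-[r]->(b) RETURN a.name, type(r)"""
    return [(nodes[src]["name"], rel_type) for src, rel_type, _props, _dst in edges]


def match_person(name):
    """Rough analogue of: MATCH (a:Person) WHERE a.name = "rajith" RETURN a"""
    return [
        node for node in nodes.values()
        if "Person" in node["labels"] and node["name"] == name
    ]


print(match_directed())
print(match_person("rajith"))
```

In both cases the query describes the shape of the data to find (a node, a directed relationship, a label filter) rather than the joins needed to reach it, which is the sense in which slide 58 calls Cypher declarative.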
  • 62. DOCUMENT STORE
    o A collection of documents
    o Data in this model is stored inside documents.
    o A document is a key-value collection where the key allows access to its value.
    o Documents are not typically forced to have a schema and are therefore flexible and easy to change.
    o Documents are stored in collections in order to group different kinds of data.
    o Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
    o Usually uses a JSON (BSON)-like interchange model, so application logic can be written easily.
  • 63. WHAT IS MONGODB?
    o Scalable, high-performance, open-source, document-oriented database written in C++.
    o Built for speed
    o Rich document-based queries for easy readability
    o Full index support for high performance
    o Replication and failover for high availability
    o Auto-sharding for easy scalability
    o Map / Reduce for aggregation
  • 64. KEYWORDS COMPARISON
    RDBMS        | MongoDB
    Database     | Database
    Table, View  | Collection
    Row          | Document (JSON, BSON)
    Column       | Field
    Index        | Index
    Join         | Embedded Document
    Foreign Key  | Reference
    Partition    | Shard

    > db.user.findOne({age: 39})
    {
      "_id" : ObjectId("5114e0bd42…"),
      "first" : "John",
      "last" : "Doe",
      "age" : 39,
      "interests" : [ "Reading", "Mountain Biking" ],
      "favorites" : { "color" : "Blue", "sport" : "Soccer" }
    }
  • 65. MONGODB ADVANCED FEATURES
    o Replication
    o Indexing
    o Aggregation
    o Sharding
    o Capped Collections
  • 66. REPLICATION
    o Replication is the process of synchronizing data across multiple servers
    o Replication provides redundancy and increases data availability
    (Diagram: the minimum replica set in MongoDB, made up of a Primary DB, a Secondary DB and an Arbiter DB.)
  • 68. INDEXING
    o Indexes support the efficient execution of queries in MongoDB
    o MongoDB can use the index to limit the number of documents it must inspect
    o Indexes use a B-tree data structure.
    o An index can be created with the ensureIndex method (createIndex in newer MongoDB versions):
      > db.COLLECTION_NAME.ensureIndex({KEY: 1})
    o KEY is the name of the field on which to create the index.
    o 1 is for ascending order.
    o -1 is for descending order.
  • 69. WITHOUT INDEXING
    The server has to read every document in document storage to find the result for the client.
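The contrast between slides 68 and 69 can be sketched in Python by counting how many entries each lookup has to inspect. This is a simplified illustration: a sorted list searched with `bisect` stands in for the B-tree, and the sample documents and function names are invented for the example.

```python
import bisect

ages = [39, 25, 31, 39, 52, 18]
documents = [{"_id": i, "age": age} for i, age in enumerate(ages)]


def find_by_scan(age):
    """Without an index: every document is read once."""
    inspected = 0
    hits = []
    for doc in documents:
        inspected += 1
        if doc["age"] == age:
            hits.append(doc["_id"])
    return hits, inspected


# Ascending index on "age": sorted (age, position) pairs.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(documents))


def find_by_index(age):
    """With an index: binary search narrows the lookup to the matching entries."""
    low = bisect.bisect_left(age_index, (age, -1))
    high = bisect.bisect_right(age_index, (age, len(documents)))
    hits = [documents[pos]["_id"] for _, pos in age_index[low:high]]
    return hits, high - low


print(find_by_scan(39))
print(find_by_index(39))
```

Both lookups return the same documents, but the scan inspects all six while the index lookup touches only the two matching entries; the gap grows with the size of the collection.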
  • 71. INDEX TYPES
    o Default _id Index
    o Single Field Index
    o Compound Index
    o Multikey Index
    o Geo Index
    o Text Index
    o Hashed Index
  • 72. AGGREGATIONS
    Aggregations are operations that process data records and return computed results. MongoDB provides a rich set of aggregation operations.
    Aggregation concepts
    o Aggregation Pipelines
    o Map-Reduce
    o Single Purpose Aggregation Operations
  • 73. AGGREGATION PIPELINES
    The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
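The idea of a pipeline, where documents flow through a sequence of stages and each stage transforms the output of the previous one, can be sketched in plain Python. This is not MongoDB's API: the stage functions, the `orders` sample data and its field names are assumptions, loosely modelled on a filter stage followed by a grouping stage.

```python
from collections import defaultdict

orders = [
    {"customer": "Saman", "status": "paid", "price": 450},
    {"customer": "Nuwan", "status": "paid", "price": 750},
    {"customer": "Saman", "status": "open", "price": 300},
    {"customer": "Saman", "status": "paid", "price": 200},
]


def match_stage(docs, criteria):
    """Keep only documents whose fields equal the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def group_sum_stage(docs, key, field):
    """Group documents by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return [{"_id": k, "total": v} for k, v in sorted(totals.items())]


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        docs = stage(docs)
    return docs


result = run_pipeline(orders, [
    lambda docs: match_stage(docs, {"status": "paid"}),
    lambda docs: group_sum_stage(docs, "customer", "price"),
])
print(result)
```

Filtering before grouping means the grouping stage only sees the documents it needs, which is the usual reason for putting selective stages early in a pipeline.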
  • 74. MAP-REDUCE
    MongoDB also provides map-reduce operations to perform aggregation.
  • 75. SINGLE PURPOSE AGGREGATION OPERATIONS
    MongoDB provides special-purpose database commands. All of these operations aggregate documents from a single collection.
    Common aggregation operations are:
    o returning a count of matching documents
    o returning the distinct values for a field
    o grouping data based on the values of a field
  • 76. SHARDING
    Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
  • 77. 77 77 CAPPED COLLECTIONS o Capped collections are fixed-size circular collections that preserve insertion order to support high performance for create, read and delete operations. o Capped collections restrict updates to documents if the update would increase the document size. o Capped collections are best for storing log information, cache data or any other high volume data.
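
The circular behaviour can be sketched with a bounded `deque`: once the collection is full, each insert silently drops the oldest entry. The `CappedCollection` class is a hypothetical illustration, and it caps by document count for simplicity, whereas a real capped collection is sized in bytes.

```python
from collections import deque

class CappedCollection:
    """Toy fixed-size collection that keeps insertion order."""

    def __init__(self, max_documents):
        # A deque with maxlen discards the oldest item when full.
        self._docs = deque(maxlen=max_documents)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Return the surviving documents, oldest first."""
        return list(self._docs)

log = CappedCollection(3)
for n in range(1, 6):
    log.insert({"event": n})
```

After five inserts into a collection capped at three, only the three most recent events remain.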
  • 78. 78 78 NOSQL DATABASE CATEGORIES NoSQL Database Categories Key Value Store Document Store Wide Column Store Graph Databases
  • 79. 79 79 NOSQL DATABASES SUMMARY
      Name | HBase | MongoDB | Neo4j | Redis
      Database model | Wide column store | Document store | Graph DBMS | Key-value store
      Initial release | 2008 | 2009 | 2007 | 2009
      License | Open Source | Open Source | Open Source | Open Source
      DBaaS | no | no | no | no
      Implementation language | Java | C++ | Java | C
      Server operating systems | Linux, Unix, Windows | Linux, OS X, Solaris, Windows | Linux, OS X, Windows | BSD, Linux, OS X, Windows
      Data scheme | schema-free | schema-free | schema-free | schema-free
      Source :- http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
  • 80. 80 80 NOSQL DATABASES SUMMARY
      Name | HBase | MongoDB | Neo4j | Redis
      Secondary indexes | no | yes | yes | no
      SQL | no | no | no | no
      APIs and other access methods | Java API, RESTful HTTP, Thrift | proprietary protocol using JSON | Cypher query language, Java API, RESTful HTTP | proprietary protocol
      Supported programming languages | C, C#, C++, Groovy, Java, PHP, Python, Scala | Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk | .Net, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala | C, C#, C++, Clojure, Dart, Erlang, Go, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, Tcl
      Source :- http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
  • 81. 81 81 NOSQL DATABASES SUMMARY Name HBase MongoDB Neo4j Redis Triggers yes no yes no Partitioning methods Sharding Sharding none Sharding Replication methods selectable replication factor Master-slave replication Master-slave replication Master-slave replication MapReduce yes yes no no Consistency concepts • Immediate • Consistency • Eventual • Consistency • Immediate • Consistency • Eventual • Consistency configurable in High Availability • Cluster setup Immediate Consistency • Eventual • Consistency Source :- http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
  • 82. 82 82 NOSQL DATABASES SUMMARY Name HBase MongoDB Neo4j Redis Foreign keys no no yes no Transaction concepts no no ACID optimistic locking Concurrency yes yes yes yes Durability yes yes yes yes In-memory capabilities yes User concepts Access Control Lists (ACL) Access rights for users and roles no very simple password-based access control Source :- http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis

Editor's Notes

  1. A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd. Many popular databases currently in use are based on the relational model. Data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries and consists of columns and rows; it is the most common and simplest form of data storage in a relational database. A field is a column in a table that is designed to maintain specific information about every record in the table. A record, also called a row, is an individual entry in a table: a horizontal entity that represents a set of related data. A column is a vertical entity in a table that contains all the information associated with a specific field, that is, a set of values of a particular type.
  2. Entity Integrity: There are no duplicate rows in a table; the rows in a relational table should all be distinct. Domain Integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values; column values must not be repeating groups or arrays. Referential Integrity: Rows that are referenced by other records cannot be deleted. User-Defined Integrity: Enforces specific business rules that do not fall into entity, domain or referential integrity. Null values: a blank is considered equal to another blank and a zero is equal to another zero, but two null values are not considered equal.
  3. The NoSQL market is expected to grow 21 percent annually and reach 3.4 billion US dollars in 2020. Why is this growth expected? Because NoSQL applications at Facebook and Twitter and in biotechnology, defense, image processing and many other fields have proven successful. NoSQL is becoming a major player in the database marketplace.
  4. NoSQL supports Big Users. In the early days, 10,000 concurrent users was an extreme case. Now apps must support millions of different users a day, around the globe, 24 hours a day, 365 days a year. Supporting large numbers of concurrent users is important, but because app usage is hard to predict, it is just as important to dynamically support rapidly growing numbers of concurrent users. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. NoSQL databases are designed to meet this target.
  5. NoSQL also supports Big Data. According to the graph, the usage of structured and semi-structured data has increased over time. Explosive growth in internet usage, together with mobile and social apps and machine-to-machine communications, has introduced new data types. Capturing and using big data requires a very different type of database. The rigidly defined, schema-based approach used by relational databases makes it hard to incorporate new types of data quickly and is a poor fit for unstructured and semi-structured data. NoSQL provides a much more flexible data model that maps better to an application's data organization.
  6. Today 20 billion devices are connected to the Internet, for example smart phones, tablets, home appliances, and devices in cars, hospitals, warehouses and more. These devices receive data on environment, location, movement, temperature, etc. Innovative enterprises are relying on NoSQL technology to scale concurrent data access to millions of connected devices and systems, store billions of data points, and meet the performance requirements.
  7. Today, most new applications run in a public, private, or hybrid cloud, support large numbers of users, and use a three-tier internet architecture. In the cloud, a load balancer directs the incoming traffic to a scale-out tier of web/application servers that process the logic of the application. NoSQL databases are built from the ground up to be distributed, scale-out technologies and are therefore a better fit with the highly distributed nature of the three-tier internet architecture.
  8. Relational and NOSQL data models are very different. The relational model takes data and separates it into many interrelated tables that contain rows and columns. You can store a JSON document in NOSQL which might take all the data stored in 20 tables of a relational database. Another major difference is that relational technologies have rigid schemas. NOSQL has no strict schema like relational database. The format of the data being inserted can be changed at any time, without application disruption.
  9. There are two options for dealing with more concurrent users and a growing volume of data: scale up the database or scale out. Relational databases mostly scale up: to support more concurrent users and store more data, they require a bigger and more expensive server with more CPUs, memory, and disk storage. At some point, the capacity of even the biggest server can be outstripped and the relational database cannot scale further. A scale-out database tier with NoSQL provides an easier, linear, and cost-effective approach to database scaling. As the number of concurrent users grows, simply add low-cost commodity servers to the cluster. There is no need to modify the application, since the application always sees a single (distributed) database.
  10. A transaction is a logical unit of work that is executed independently for data retrieval or update. ACID is a set of properties that apply specifically to database transactions; a database whose transactions are processed reliably is said to be ACID-compliant. Let's examine the ACID requirements for a database transaction system in more detail. Atomicity means either all of the tasks within a transaction are performed or none are (the all-or-none rule). Consistency means the transaction meets all rules defined by the system at all times. The transaction does not violate those rules, and the database must remain in a consistent state at the beginning and end of a transaction. There are no half-completed transactions. Isolation means no transaction has access to any other transaction that is in an intermediate or unfinished state. Each transaction is independent. Finally, durability means that once the transaction is complete it will persist. The completed transaction will survive system failure, power loss and other types of system breakdowns.
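
Atomicity can be demonstrated with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `accounts` table, the names and the `transfer` helper are made up for the example. A transfer that would violate a rule (a negative balance) is rolled back as a whole, including the update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit above was undone as well.
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer leaves no trace: bob is not credited even though his update ran before the failing one.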
  11. The CAP theorem, also known as Brewer’s theorem, names three essential system requirements for the successful design, implementation and deployment of applications in distributed computing systems: Consistency, Availability and Partition Tolerance. Consistency means that each client always has the same view of the data. (This is a narrower notion than consistency in ACID, which is about preserving the rules defined on the data.) High Availability means that all clients can always read and write. Partition Tolerance means the system will continue to work unless there is a total network failure; a few nodes can fail and the system keeps going. Attaining all three at once is not possible, however: a distributed system can guarantee at most two of these three characteristics.
  12. The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem. The types of large systems based on CAP aren't ACID they are BASE. Everyone who builds big applications builds them on CAP and BASE: Google, Yahoo, Facebook, Amazon, eBay, etc.  Let's review BASE standards:  Basically Available: This constraint states that the system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account. Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’ Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one. The BASE model isn't appropriate for every situation, but it is certainly a flexible alternative to the ACID model for databases that don't require strict adherence to a relational model.
  13. Key Value Store: a global collection of key:value pairs, e.g. “Name” is the key and “Saman” is the value. Schema-free: every record can have different keys. The most common category and the basis for the other three NoSQL database categories. Examples: Redis, Amazon SimpleDB, Project Voldemort, Riak, Windows Azure. Document Store: similar to key/value, but the major difference is that the value is a document. Flexible schema, schema-free: any number of fields can be added. Values (documents) are stored in JSON or BSON. Wide Column Store: each key (key -> super column) is associated with multiple attributes. Semi-schematic, not schema-free: groups of columns (known as column families) must be specified. Data is stored in column-specific files. Graph Databases: a collection of nodes and edges, where each node represents an entity and each edge represents a connection or relationship between two nodes; the data is stored as a graph. Categories: Key Value Store, Column Oriented Store, Document Store, Graph Database, Multimodal Databases, Object Databases, Unresolved and Uncategorized.
  14. The most basic NoSQL database category and the basis for the other three major categories. Schema-free: developers can store schema-less data (every record can have different keys). The database stores data as key-value pairs; each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc. Key-value stores can be used as collections, dictionaries, associative arrays, etc. For example, suppose we have a sales database with customer and order tables, and each table has unique rows. Here the row with key 100 holds the key-value pairs first name, last name and address, and its last order points to a row in another table. But there is no explicit relation between customers and orders.
  15. Data is stored in a columnar format, and the columns are treated individually. Wide column stores have tables, but the tables do not belong to a database; there is no such thing as a database. Tables have rows, and rows have super columns with columns within them. Super columns are defined when the tables are defined; in this example they are Name and Address.
  16. Everything is stored in a document; the store is a collection of documents. Schema-free: documents are typically not forced to have a schema and are therefore flexible and easy to change. Instead of rows, collections contain documents, but conceptually a document is similar to a row and still holds key-value pairs. The small difference is that the value of a key can itself be a document, or can point to another document in another database. For example, the Customer document with id 100 has an address key whose value is itself a document, and an orders key whose value points to the document with id 2001 in an Orders database.
  17. Key/Value Store: KV can be considered the most basic implementation of NoSQL and its backbone. It is designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash. Data is stored as a hash table. Each key is unique. The value can be a string, JSON, a BLOB (binary large object), etc.; some stores (e.g. Redis) also offer hashes, lists, sets and sorted sets. These databases work by matching keys with values, similar to a dictionary. There is no structure and no relation. After connecting to the database server (e.g. Redis), an application can state a key (e.g. Name) and provide a matching value (e.g. “Saman”), which can later be retrieved by supplying the same key. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record and is used to quickly find the data within the database. KV stores work in a very different fashion from the better-known relational databases (RDBs). RDBs pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key-value systems treat the data as a single opaque collection which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Some popular key/value data stores are: Redis: in-memory K/V store with optional persistence. Riak: highly distributed, replicated K/V store. Memcached / MemcacheDB: distributed memory-based K/V store.
  18. Key/value DBMSs are usually used for quickly storing basic information, and sometimes not-so-basic information after performing, for example, a CPU- and memory-intensive computation. They are extremely performant, efficient and usually easily scalable. When to use: Caching: quickly storing data for (sometimes frequent) future use. Queueing: some K/V stores (e.g. Redis) support lists, sets, queues and more. Distributing information/tasks: they can be used to implement Pub/Sub. Keeping live information: applications that need to keep state can use K/V stores easily.
  19. One of the biggest benefits of most NoSQL solutions, including key-value stores, is horizontal scaling. Horizontal scaling is possible with SQL Server, but the two do not play well together. Typically, if you need more from SQL Server you scale vertically, which can be costly. Key/value data stores are highly performant, easy to work with and usually scale well. Another benefit of key-value stores is the lack of a schema, which allows the data structure to change as needed and makes them a bit more flexible. With SQL Server, by contrast, altering a table could mean that stored procedures, functions, views, etc. need updates, which takes time and a DBA resource. Because optional values are not represented by placeholders as in most RDBs, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads. Key-value systems treat the data as a single opaque collection which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Key-value stores are typically accessed from a general-purpose programming language, commonly Java. This gives the application developer the freedom to store data as they see fit, in a schema-less data store. A subclass of the key-value store is the document-oriented database, which offers additional tools that use the metadata in the data to provide a richer key-value database that more closely matches the use patterns of RDBMSs. Some graph databases are also key-value stores internally, adding the concept of relationships (pointers) between records as a first-class data type. Key-value stores support “Eventual Consistency”; if a feature in your application does not need full ACID support, this may not be a significant drawback.
  20. Redis is an open source, advanced key-value store and a serious solution for building high-performance, scalable web applications. Redis has three main peculiarities that set it apart from much of its competition: Redis holds its database entirely in memory, using the disk only for persistence. Redis has a relatively rich set of data types compared to many key-value data stores. Redis can replicate data to any number of slaves (Redis Replication). Redis persists data in two ways: RDB persistence and AOF (Append Only File) persistence. Redis is quite a bit different from other NoSQL databases, not just from relational databases like SQL Server. You may be familiar with document databases like RavenDB or MongoDB. While they are certainly good choices for NoSQL databases, they operate quite a bit differently than Redis does. With document databases like RavenDB or MongoDB, the focus is on creating documents which are persisted to disk and can be indexed, just as relational tables are indexed in SQL Server or Oracle. Redis, on the other hand, stores its data using keys, and the data it stores can take the form of different data structures, not just a document. The data is also stored in memory, with persistence as a secondary consideration. And there is no indexing of any kind. You can, of course, implement your own indexes by creating them as additional data, but Redis does not do any of that for you. This can be a bit of a shock to developers who are used to being able to query a database. After all, isn't that what databases are for? Databases like SQL Server and Oracle allow you to query the database using SQL. Databases like RavenDB and MongoDB allow you to query the data using indexes you create ahead of time or on the fly. But Redis only lets you get data by specifying a key. At first, this may seem like a ludicrous tradeoff to make. Why would you want to give up the ability to query your data? 
It is true that in some cases using Redis will not make any sense at all. But where Redis is appropriate, you will probably find that, although you have to do a little extra work in designing your data and working out how to access it, it will be extremely fast with very little overhead. That is the real advantage, and the consideration you need to take into account when deciding whether or not to use Redis.
  21. Exceptionally fast: Redis is very fast and can perform about 110000 SETs per second and about 81000 GETs per second. Supports rich data types: Redis natively supports most of the data types that developers already know, such as lists, sets, sorted sets and hashes. This makes it easy to solve a variety of problems, because we know which problem is handled best by which data type. Operations are atomic: all Redis operations are atomic, which ensures that if two clients access the Redis server concurrently, each will get the updated value. Multi-utility tool: Redis can be used in a number of use cases such as caching, messaging queues (Redis natively supports Publish/Subscribe), and any short-lived data in your application such as web application sessions, web page hit counts, etc.
  22. Redis supports 5 data types. Redis also supports Bitmaps and HyperLogLogs, which are actually data types based on the String base type but with their own semantics.
  23. Strings – Redis String is a Sequence of bytes. Binary safe, meaning they have a known length not determined by any special terminating characters. Can store anything up to 512 megabytes in one string.
  24. Lists - Redis Lists are simply lists of strings, sorted by insertion order. You can add elements to a Redis List at the head or at the tail. The max length of a list is 2^32 - 1 elements (more than 4 billion elements per list). Internally maintained as a linked list. Ideal for queues, stacks, top-N lists, recent news, timelines.
  25. Sets - Redis Sets are an unordered collection of Strings. In Redis you can add, remove, and test for existence of members in O(1) time complexity. In the above example Hasangi is added twice, but because set members are unique it is stored only once. The max number of members in a set is 2^32 - 1 (4294967295, more than 4 billion members per set). Sample usage: tracking unique IPs, tagging.
  26. Sorted Sets - Redis Sorted Sets are, like Redis Sets, non-repeating collections of Strings. The difference is that every member is associated with a score, which is used to keep the set ordered from the smallest to the greatest score. Members are unique, but scores may be repeated. Sample usage: leaderboards, most page views, sorting by a given age, ranges of friends, comments or likes.
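
The behaviour described above (unique members, repeatable scores, ordering by score) can be sketched in plain Python. The `SortedSet` class below is a hypothetical illustration rather than Redis's implementation; it re-sorts on every read for brevity, breaks score ties by member name, and treats `stop` as inclusive without handling negative indexes.

```python
class SortedSet:
    """Toy sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score):
        # Adding an existing member only updates its score.
        self._scores[member] = score

    def range(self, start, stop, descending=False):
        """Members ranked start..stop (inclusive), ordered by score."""
        ordered = sorted(
            self._scores.items(),
            key=lambda item: (item[1], item[0]),
            reverse=descending,
        )
        return [member for member, _ in ordered[start:stop + 1]]

board = SortedSet()
board.add("saman", 120)
board.add("hasangi", 300)
board.add("nimal", 120)   # same score as another member is allowed
board.add("saman", 250)   # member already present: score is replaced

top_two = board.range(0, 1, descending=True)
everyone = board.range(0, 2)
```

This is the shape of a leaderboard query: `top_two` holds the two highest-scoring members, highest first.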
  28. Redis pub/sub implements a messaging system in which senders (called publishers) send messages while receivers (called subscribers) receive them. The link over which messages are transferred is called a channel. In Redis a client can subscribe to any number of channels, and a subscriber receives the messages of every publisher that publishes to a channel it is subscribed to.
  29. Redis transactions allow the execution of a group of commands in a single step. Transactions have two properties: All commands in a transaction are executed sequentially as a single isolated operation; a request issued by another client cannot be served in the middle of the execution of a Redis transaction. A Redis transaction is also atomic: either all of the commands or none are processed.
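
The publisher, channel and subscriber roles can be sketched with a small in-memory broker. The `Broker` class and the channel names are assumptions for illustration; no Redis server or client library is involved.

```python
from collections import defaultdict

class Broker:
    """Toy message broker: channels map to lists of subscriber callbacks."""

    def __init__(self):
        self._channels = defaultdict(list)

    def subscribe(self, channel, callback):
        self._channels[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to every subscriber; return how many received it."""
        subscribers = self._channels.get(channel, [])
        for callback in subscribers:
            callback(message)
        return len(subscribers)

broker = Broker()
inbox_a, inbox_b = [], []

# One subscriber may listen on several channels.
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
broker.subscribe("sports", inbox_b.append)

delivered_news = broker.publish("news", "hello")
delivered_sports = broker.publish("sports", "goal")
delivered_none = broker.publish("weather", "rain")  # nobody is listening
```

A message published to a channel without subscribers is simply dropped, which is why `delivered_none` is zero.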
  30. Redis Persistence Redis provides a different range of persistence options: The RDB persistence performs point-in-time snapshots of your dataset at specified intervals. the AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log on background when it gets too big. If you wish, you can disable persistence at all, if you want your data to just exist as long as the server is running. It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete. The most important thing to understand is the different trade-offs between the RDB and AOF persistence. Let's start with RDB: RDB advantages RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters. RDB is very good for disaster recovery, being a single compact file can be transferred to far data centers, or on Amazon S3 (possibly encrypted). RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike. RDB allows faster restarts with big datasets compared to AOF. RDB disadvantages RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). 
You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data. RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability. AOF advantages Using AOF Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second write performances are still great (fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress.) but you can only lose one second worth of writes. The AOF log is an append only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with an half-written command for some reason (disk full or other reasons) the redis-check-aof tool is able to fix it easily. Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one. AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. 
For instance even if you flushed everything for an error using a FLUSHALL command, if no rewrite of the log was performed in the meantime you can still save your data set just stopping the server, removing the latest command, and restarting Redis again. AOF disadvantages AOF files are usually bigger than the equivalent RDB files for the same dataset. AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to every secondperformances are still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load. In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. This bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is ok, but this kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust. However - 1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory). 2) We never had a single report from users about an AOF corruption that was detected in the real world.
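
The append-only idea behind AOF can be sketched in plain Python: every write is appended to a log, replaying the log rebuilds the dataset, and a rewrite replaces the log with the minimal set of commands for the current data. The `TinyStore` class below is a hypothetical illustration that keeps the log in a list instead of a file and knows only two commands.

```python
class TinyStore:
    """Toy key-value store with an in-memory append-only command log."""

    def __init__(self):
        self.data = {}
        self.log = []  # stand-in for the append-only file

    def execute(self, command, *args, record=True):
        if command == "SET":
            key, value = args
            self.data[key] = value
        elif command == "DEL":
            self.data.pop(args[0], None)
        else:
            raise ValueError(f"unknown command: {command}")
        if record:
            self.log.append((command, *args))

    @classmethod
    def replay(cls, log):
        """Rebuild a store by re-running every logged command in order."""
        store = cls()
        for command, *args in log:
            store.execute(command, *args, record=False)
        store.log = list(log)
        return store

    def rewrite_log(self):
        """Compact the log to the minimal commands for the current data."""
        self.log = [("SET", key, value) for key, value in self.data.items()]

original = TinyStore()
original.execute("SET", "a", "1")
original.execute("SET", "a", "2")
original.execute("SET", "b", "3")
original.execute("DEL", "b")

restored = TinyStore.replay(original.log)
log_length_before = len(original.log)
original.rewrite_log()
compacted = TinyStore.replay(original.log)
```

Replaying either the full log or the compacted one yields the same dataset; the rewrite only shortens the log.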
  31. Redis replication is a master-slave replication that is very simple to use and configure, and that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication: Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves periodically acknowledge the amount of data processed from the replication stream. A master can have multiple slaves. Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure. Redis replication is non-blocking on the master side. This means that the master continues to handle queries while one or more slaves perform the initial synchronization. Replication is also non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave blocks incoming connections during this brief window. Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), or simply for data redundancy. http://blog.concretesolutions.com.br/2013/03/redis-parte-2/ http://redis.io/topics/sentinel http://redis.io/topics/replication Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists certain kinds of failures without human intervention. Redis Sentinel also handles other collateral tasks such as monitoring and notifications, and acts as a configuration provider for clients.
This is the full list of Sentinel capabilities at a macroscopic level (i.e. the big picture): Monitoring. Sentinel constantly checks whether your master and slave instances are working as expected. Notification. Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances. Automatic failover. If a master is not working as expected, Sentinel can start a failover process in which a slave is promoted to master, the other slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting. Configuration provider. Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels report the new address.
  32. The important difference here is that columns are created for each row rather than being predefined by the table structure.
  33. Map-Reduce - An algorithm for efficiently processing large amounts of data in parallel
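As a minimal sketch of the idea, the classic word-count example below splits the work into a map phase, a grouping step, and a reduce phase. The function names and sample documents are illustrative assumptions; a real framework would run the map calls on many machines in parallel.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

documents = ["NoSQL scales out", "SQL scales up", "NoSQL is not only SQL"]

# Each document is mapped independently, which is what allows parallelism.
emitted = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```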
  34. Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner. Avro™: A data serialization system. Cassandra™: A scalable multi-master database with no single points of failure. Chukwa™: A data collection system for managing large distributed systems. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout™: A scalable machine learning and data mining library. Pig™: A high-level data-flow language and execution framework for parallel computation. Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation. Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine. ZooKeeper™: A high-performance coordination service for distributed applications.
  35. Labels
  36. Labels
  37. Labels
  38. Labels
  39. Labels
  40. Labels
  41. Map-Reduce - An algorithm for efficiently processing large amounts of data in parallel
  42. Labels
  43. Cypher is a query language designed specifically for the Neo4j graph database, and it is still in active development. Cypher is declarative: you specify what you need to retrieve, not how Neo4j should retrieve it. Cypher uses patterns to match data in the database. Cypher works with clauses, e.g. WHERE and ORDER BY.
  44. A document store is a type of NoSQL database that stores collections of documents. The data model is stored inside the document itself. Documents are typically not forced to follow a schema, so they are flexible and easy to change. Documents can contain many different key-value pairs, key-array pairs, or even nested documents. Document stores usually use a JSON-like (BSON) interchange model, which makes application logic easy to write.
  45. MongoDB is an open source document database written in C++. Data is stored as JSON-like documents in Binary JSON (BSON), which keeps it easy to read. It allows server-side operations on data and makes it easy to create tools that manipulate data. Full index support gives high performance. Automatic failure recovery and replication give high availability. Horizontal scaling through sharding gives easy scalability. An aggregation framework makes it easier to handle larger amounts of data.
  46. A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases. A collection is a group of MongoDB documents; it is the equivalent of an RDBMS table. A document is a set of key-value pairs. The slide shows an example document.
  47. MongoDB has many features of its own. These are some of its advanced features, which we now review one by one.
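For readers without the slide image, here is a hypothetical document written as a Python dictionary and serialized to JSON. The field names and values are invented for illustration and are not taken from the original slide.

```python
import json

# Hypothetical document: one record in a "users" collection.
document = {
    "_id": 1,                                            # unique key of the document
    "name": "Alice",                                     # simple key-value pair
    "emails": ["alice@example.com", "a@example.org"],    # key-array pair
    "address": {"city": "Colombo", "zip": "00100"},      # nested document
}

# JSON text form of the same document.
as_json = json.dumps(document, indent=2)
```

A second document in the same collection could omit `address` or add new fields, since the collection does not enforce a fixed schema.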
  48. Replication is the process of synchronizing data across multiple servers. It provides redundancy and increases data availability: because it keeps multiple copies of the data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies, one can be dedicated to disaster recovery, reporting, or backup. A single-server database is dangerous: when it crashes, all data can be lost. Restoring from a backup is the traditional approach to fail-safety. MongoDB instead supports a concept called a replica set to achieve replication. A replica set is a group of mongod instances that host the same data set. A replica set generally contains a minimum of 3 nodes: one primary node plus one or more secondary nodes, and optionally an arbiter node. All data replicates from the primary to the secondaries. 1) Primary node: a replica set can have only one primary instance, which receives all write operations. Any client that writes data to the database must therefore be connected to the primary. 2) Secondary node: secondaries are read-only databases, and there can be many of them. This improves scalability, because many more reads can be performed against the replicas rather than hitting a single server. 3) Arbiter node: an arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. An arbiter can run on a smaller machine because it does not need much hardware.
  49. At some point the primary will fail; one of the secondaries then takes over and becomes the primary. This is valuable because a replica set recovers automatically from a crash of the primary. If one of the secondaries breaks it is not a big deal, because the primary is still available and, depending on the application, there can be many other secondaries. No data is lost and functionality is largely preserved. When the primary fails and there are multiple secondaries, which one becomes primary? MongoDB holds an election: to become primary, a member needs a simple majority, that is more than 50% of the votes. An arbiter takes part in the election by voting, but it stores no data.
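The majority rule can be sketched in a few lines. This is a simplified model that only counts votes; the `has_majority` helper is hypothetical, and real elections take other factors into account that are ignored here.

```python
def has_majority(votes_received, voting_members):
    """A candidate wins only with votes from more than half of all voting members."""
    return votes_received > voting_members // 2

# Primary, one secondary and one arbiter: three voting members.
# If the primary fails, the secondary collects its own vote plus the arbiter's.
with_arbiter = has_majority(votes_received=2, voting_members=3)

# Primary and one secondary only: the survivor holds one vote out of two.
without_arbiter = has_majority(votes_received=1, voting_members=2)
```

The second case shows why an arbiter is useful in a small replica set: without the extra vote, the surviving secondary cannot reach a majority.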
  50. MongoDB often holds large volumes of data. Indexes speed up queries by limiting the number of documents that must be scanned. Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. Indexes use a B-tree structure. The “ensureIndex” method (named createIndex in MongoDB 3.0 and later) creates an index on a field. In the call, the key is the name of the field on which you want to create the index, and 1 specifies ascending order; use -1 to create the index in descending order. “ensureIndex” also accepts multiple fields, to create an index on multiple fields. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.
  51. MongoDB without indexing: consider this example. You have a collection named “foo” and you want to find all documents where the value of field x is 10. What does the server do to find them? It has to scan each and every document and check whether field x is equal to 10. This is a very wasteful operation. Without indexes, MongoDB must perform a collection scan. The solution is to use an index.
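The difference between a collection scan and an index lookup can be sketched as follows. A sorted Python list searched with `bisect` stands in for the B-tree; the collection contents and function names are assumptions made for the illustration.

```python
import bisect

# Hypothetical "foo" collection: 1,000 documents with an integer field x.
foo = [{"_id": i, "x": i % 100} for i in range(1000)]

def collection_scan(collection, value):
    """Examine every document and keep those where x equals value."""
    examined = 0
    matches = []
    for doc in collection:
        examined += 1
        if doc["x"] == value:
            matches.append(doc)
    return matches, examined

# A sorted list of (x, position) entries stands in for the B-tree index on x.
index_on_x = sorted((doc["x"], pos) for pos, doc in enumerate(foo))

def index_lookup(collection, index, value):
    """Binary-search the index, then fetch only the matching documents."""
    lo = bisect.bisect_left(index, (value, -1))
    hi = bisect.bisect_left(index, (value + 1, -1))
    matches = [collection[pos] for _, pos in index[lo:hi]]
    return matches, hi - lo

scan_matches, scan_examined = collection_scan(foo, 10)
index_matches, index_examined = index_lookup(foo, index_on_x, 10)
```

Both approaches return the same ten documents, but the scan examines all 1,000 documents while the index lookup touches only the ten matching entries.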
  52. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.
  53. MongoDB provides a number of different index types to support specific types of data and queries. 1) Default _id index: all MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value. The _id index is unique and prevents clients from inserting two documents with the same value for the _id field. 2) Single field index: in addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document. 3) Compound index: MongoDB also supports user-defined indexes on multiple fields. The order of fields listed in a compound index has significance: the index sorts first by the first field and then, within each value of that field, by the next field. 4) Multikey index: if you index a field that holds an array value, MongoDB creates a multikey index on that field. These multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays. MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type. 5) Geospatial index: to support efficient queries of geospatial coordinate data, MongoDB provides the 2d and 2dsphere index types. 6) Text index: supports searching for string content in a collection. 7) Hashed index: to support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of the value of a field. These indexes have a more random distribution of values along their range, but only support equality matches and cannot support range-based queries.
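To make the ordering rules concrete, the sketch below models index entries as sorted tuples. The documents and field names are hypothetical, and sorted lists are only a stand-in for the real index structures.

```python
# Hypothetical documents for a compound index on (cust_id, amount).
orders = [
    {"cust_id": "B", "amount": 50},
    {"cust_id": "A", "amount": 300},
    {"cust_id": "B", "amount": 20},
    {"cust_id": "A", "amount": 100},
]

# Entries are ordered by cust_id first, then by amount within each cust_id.
index_cust_amount = sorted((o["cust_id"], o["amount"]) for o in orders)

# Listing the fields in the opposite order produces a different ordering.
index_amount_cust = sorted((o["amount"], o["cust_id"]) for o in orders)

# Multikey: an array field contributes one index entry per array element.
doc = {"_id": 7, "tags": ["redis", "mongodb"]}
multikey_entries = sorted((tag, doc["_id"]) for tag in doc["tags"])
```

The first index keeps all entries for one customer together, so it suits queries that filter on `cust_id`; the second keeps similar amounts together instead.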
  54. Aggregations are operations that process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides a rich set of aggregation operations. Like queries, aggregation operations in MongoDB use collections of documents as input and return results in the form of one or more documents. There are 3 concepts in aggregation: 1) aggregation pipelines, 2) map-reduce, 3) single purpose aggregation operations.
  55. The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB. Documents enter a multi-stage pipeline that transforms them into an aggregated result. The first stage takes documents as input and processes them; its output documents become the input of the next stage, and so on. Possible stages in the aggregation framework include $project, $match, $group, $sort, $skip, $limit and $unwind. The example has two stages, $match and $group. The $match stage runs first and keeps only the documents whose status field equals “A”; its output documents are the input to the $group stage, which groups them by cust_id and computes the sum of amount as total.
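The two-stage example can be simulated in plain Python to show how documents flow from one stage to the next. The sample orders and the helper names `match_stage` and `group_stage` are assumptions for illustration; this is not how MongoDB implements the pipeline.

```python
from collections import defaultdict

# Hypothetical contents of the orders collection.
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

def match_stage(docs, field, value):
    """$match: pass through only the documents where field equals value."""
    return [d for d in docs if d[field] == value]

def group_stage(docs, key_field, sum_field):
    """$group with $sum: one output document per distinct key."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": key, "total": total} for key, total in totals.items()]

# Shell form of the same pipeline, for reference:
# db.orders.aggregate([{ $match: { status: "A" } },
#                      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }])
result = group_stage(match_stage(orders, "status", "A"), "cust_id", "amount")
```

Placing `$match` first reduces the number of documents that the later `$group` stage has to process.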
  56. MongoDB also provides map-reduce operations to perform aggregation. In general, map-reduce operations have two phases, map and reduce. Optionally, map-reduce can have a finalize stage to make final modifications to the result. Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation. The main parameters are: map: a JavaScript function that maps a value to a key and emits a key-value pair. reduce: a JavaScript function that reduces or groups all the documents having the same key. out: specifies the location of the map-reduce query result. query: specifies the optional selection criteria for selecting documents. sort: specifies the optional sort criteria. limit: specifies the optional maximum number of documents to be returned. The example uses an orders collection. The query first selects the documents whose status field equals “A”. The map stage then emits cust_id as the key and amount as the value. The reduce stage returns the sum of the amount array for each key, and the result is stored in order_totals.
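The roles of the query, map, reduce, finalize and out parameters can be sketched with a toy stand-in written in Python. MongoDB itself takes JavaScript functions for these; the `map_reduce` helper and the sample orders below are hypothetical.

```python
from collections import defaultdict

def map_reduce(collection, map_fn, reduce_fn, query=None, finalize=None):
    """Toy map-reduce run; the returned dict plays the role of the 'out' collection."""
    selected = [d for d in collection if query is None or query(d)]
    emitted = defaultdict(list)
    for doc in selected:
        for key, value in map_fn(doc):        # map emits key-value pairs
            emitted[key].append(value)
    out = {key: reduce_fn(key, values) for key, values in emitted.items()}
    if finalize is not None:                  # optional last adjustment per key
        out = {key: finalize(key, value) for key, value in out.items()}
    return out

# Hypothetical contents of the orders collection.
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

order_totals = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["cust_id"], doc["amount"])],
    reduce_fn=lambda key, values: sum(values),
    query=lambda doc: doc["status"] == "A",
)
```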
  57. MongoDB provides special purpose database commands for common aggregation operations: returning a count of matching documents, returning the distinct values for a field, and grouping data based on the values of a field. All of these operations aggregate documents from a single collection.
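What these three operations compute can be shown in plain Python over a single collection. The sample data is hypothetical, and the code illustrates the results only, not the database commands themselves.

```python
from collections import Counter

# Hypothetical contents of a single orders collection.
orders = [
    {"cust_id": "A123", "status": "A"},
    {"cust_id": "A123", "status": "A"},
    {"cust_id": "B212", "status": "A"},
    {"cust_id": "A123", "status": "D"},
]

# count: number of documents matching a condition
count_a = sum(1 for o in orders if o["status"] == "A")

# distinct: the distinct values of one field
distinct_customers = sorted({o["cust_id"] for o in orders})

# group: documents bucketed by the value of one field
per_status = Counter(o["status"] for o in orders)
```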
  58. Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. As the size of the data increases, a single machine may not be sufficient to store the data or to provide acceptable read and write throughput. Sharding solves the problem with horizontal scaling: with sharding, you can add more machines to support data growth and the demands of read and write operations. Shards: shards are used to store data. They provide high availability and data consistency. In a production environment each shard is a separate replica set. Config servers: config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. Query routers (mongos): query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients.
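The division of labor between router and shards can be sketched as below. This is a simplification under stated assumptions: the shard names and the `QueryRouter` class are hypothetical, and a plain hash-modulo placement stands in for the data-to-shard mapping that the config servers hold in a real cluster.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def shard_for(shard_key_value):
    """Pick a shard from a stable hash of the shard key value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

class QueryRouter:
    """Plays the role of mongos: directs each operation to the right shard."""

    def __init__(self):
        # One in-memory dict per shard stands in for a separate machine.
        self.shards = {name: {} for name in SHARDS}

    def insert(self, doc):
        self.shards[shard_for(doc["_id"])][doc["_id"]] = doc

    def find_by_id(self, doc_id):
        # Only the shard that owns this key is consulted.
        return self.shards[shard_for(doc_id)].get(doc_id)

router = QueryRouter()
for i in range(300):
    router.insert({"_id": i, "value": i * i})

found = router.find_by_id(42)
sizes = {name: len(docs) for name, docs in router.shards.items()}
```

The client talks only to the router and never needs to know which shard holds a given document.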
  59. A capped collection is a fixed-size circular collection. It gives high performance for create, read and delete operations. Circular means that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without needing any explicit commands. Capped collections restrict updates to documents if the update would increase the document size. Capped collections are best for storing log information.
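The circular behavior can be sketched with a bounded deque. This hypothetical `CappedCollection` class limits the number of documents rather than the size in bytes, which is a simplification made for the illustration.

```python
from collections import deque

class CappedCollection:
    """Fixed-capacity collection that silently drops its oldest documents."""

    def __init__(self, max_documents):
        # A deque with maxlen discards items from the left once it is full.
        self._docs = deque(maxlen=max_documents)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Return the documents in insertion order, oldest first."""
        return list(self._docs)

log = CappedCollection(max_documents=3)
for n in range(1, 6):
    log.insert({"seq": n, "msg": f"event {n}"})

remaining = [d["seq"] for d in log.find()]
```

After five inserts into a collection capped at three documents, only the three most recent entries remain, which is the behavior that makes capped collections a good fit for logs.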