NoSQL
10/20/2014 @ Surabhi Dwivedi

Contents
• Introduction and features of NoSQL
• CAP theorem
• RDBMS vs. NoSQL
• NoSQL database family

Features - Not Only SQL
• Not an RDBMS
  ◦ Non-relational
• Distributed data store
  ◦ Horizontally scalable
• Schema-free / flexible schema
  ◦ Database JOINs generally not supported
• Handles a huge amount of data
  ◦ E.g. Google and Facebook, which collect terabytes of data
• BASE properties
  ◦ Basically Available
  ◦ Soft state
    ▪ The data does not have to be consistent all the time
  ◦ Eventually consistent
    ▪ The system eventually becomes consistent once the updates have propagated to all copies, in particular when there are not too many updates

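The "eventually consistent" property can be illustrated with a minimal sketch. The `Replica` class, the integer `version` counter and the `propagate` round are assumptions made for this illustration; real systems use more elaborate mechanisms to spread updates.

```python
class Replica:
    """One copy of the data; each key maps to (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the entry with the highest version (last write wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def propagate(replicas):
    """One propagation round: every replica sends its entries to all others."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                if target is not source:
                    target.write(key, value, version)


replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[0].write("cart", ["book"], version=1)

# Soft state: right after the write, only one copy has the new value.
before = [r.read("cart") for r in replicas]

propagate(replicas)

# Eventually consistent: after propagation, all copies agree.
after = [r.read("cart") for r in replicas]
```

Between the write and the propagation round, a read may return stale data depending on which replica answers; that window is what BASE accepts in exchange for availability.
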
NoSQL
• Provides a mechanism for
  ◦ storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases
• Used in big data and real-time web applications
• NoSQL isn't a single product or technology, but an umbrella term for a category of databases

NoSQL Does Not Provide
• Joins
• Group by
• ACID transactions
• SQL
• NoSQL databases generally reject:
  ◦ Overhead of ACID transactions
  ◦ "Complexity" of SQL
  ◦ Burden of up-front schema design
  ◦ Declarative query expression

Requirement of NoSQL

NoSQL - Users

CAP Theorem
• Three properties of a system
  ◦ Consistency
    ▪ All copies have the same value
  ◦ Availability
    ▪ The system keeps running even if parts have failed, typically via replication
  ◦ Partition tolerance
    ▪ The network can break into two or more parts, each with active systems that can't talk to the other parts
• Very large systems will partition at some point
  ◦ When that happens, the system must choose one of consistency or availability
  ◦ Traditional databases choose consistency
  ◦ Most web applications choose availability
    ▪ Except for specific parts such as order processing

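The choice between consistency and availability during a partition can be sketched as follows. The `Cluster` class, its two-copy layout and the "CP"/"AP" mode names are assumptions made for this illustration, not the behaviour of any particular database.

```python
class Cluster:
    """Two copies of one value; `mode` is "CP" or "AP" (hypothetical names)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.copies = [None, None]
        self.partitioned = False

    def write(self, copy_index, value):
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            self.copies = [value, value]
            return True
        if self.mode == "CP":
            # Choose consistency: refuse the write, so availability is lost.
            return False
        # Choose availability: accept the write on one side, so copies diverge.
        self.copies[copy_index] = value
        return True

    def is_consistent(self):
        return self.copies[0] == self.copies[1]


def run(mode):
    cluster = Cluster(mode)
    cluster.write(0, "v1")
    cluster.partitioned = True      # the network splits
    accepted = cluster.write(0, "v2")
    return accepted, cluster.is_consistent()


cp_result = run("CP")   # write rejected, copies still agree
ap_result = run("AP")   # write accepted, copies disagree
```
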
RDBMS vs. NoSQL Database
RDBMS                                                | NoSQL
Structured and organized data                        | Stands for Not Only SQL
Structured Query Language (SQL)                      | No declarative query language
Data and its relationships stored in separate tables | No predefined schema
Data Manipulation Language, Data Definition Language | Variants: key-value pair store, column store, document store, graph store
Tight consistency                                    | Eventual consistency rather than ACID properties
ACID transactions                                    | CAP theorem
-                                                    | Prioritizes high performance, high availability and scalability

Example - NoSQL Databases

NoSQL Database Family

NoSQL Database Types
• Distributed key-value systems
  ◦ Hash table of keys
  ◦ Look up a single value for a key
  ◦ Example: Amazon's Dynamo
• Document-based systems
  ◦ Store documents made up of tagged elements
  ◦ Access data by key or by searching the "document" data
  ◦ Examples: CouchDB, MongoDB
• Column-based systems
  ◦ Each storage block contains data from only one column
  ◦ Examples: Google's BigTable, Facebook's Cassandra
• Graph-based systems
  ◦ Use a graph structure
  ◦ Examples: Google's Pregel, Neo4j

Column-Oriented Databases
• One of the most popular types of non-relational database
• Column-family stores allow you to store data with keys mapped to values, with the values grouped into multiple column families, each column family being a map of data
• Column-family databases store data in column families as rows
• Rows have many columns associated with a row key
• Column families are groups of related data that is often accessed together
• The basic unit of storage in column-family databases is a column
• Examples
  ◦ Hadoop / HBase
  ◦ Cassandra: Apache Cassandra was initially developed at Facebook to power their Inbox Search feature
  ◦ Cloudata: a clone of Google's Bigtable, like HBase

Column-Oriented Databases (cont.)
• Data tables are stored as sections of columns of data, rather than as rows of data.
• The column stores the value and carries a timestamp that is used to differentiate valid content from stale content.
• The application uses the timestamp to find out which of the values stored on the backup nodes are up to date.
• Column family
  ◦ A container for columns, analogous to a table in a relational database.
  ◦ A column family has a name and a map with a key and a value (which is itself a map containing columns).

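The role of the timestamp can be sketched in a few lines. The `Column` class and the `freshest` helper are hypothetical names chosen for this illustration, and the timestamps are arbitrary integers where a larger value means a more recent write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # assumption: larger means more recent


def freshest(copies):
    """Pick the most recently written copy of a column."""
    return max(copies, key=lambda column: column.timestamp)


# The same column as returned by three nodes; two of them are stale.
copies_from_nodes = [
    Column("zip", "94301", timestamp=100),
    Column("zip", "10001", timestamp=250),
    Column("zip", "94301", timestamp=100),
]

current = freshest(copies_from_nodes)
```
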
Examples
• Cassandra
• HBase
• Hypertable
• Amazon SimpleDB

{
  "row_key_1" : {
    "name" : {
      ...
    },
    "location" : {
      ...
    },
    "preferences" : {
      ...
    }
  },
  "row_key_2" : {
    "name" : {
      ...
    },
    "location" : {
      ...
    },
    "preferences" : {
      ...
    }
  },
  "row_key_3" : {
    ...
  }
}
• Row key ("row_key_1", "row_key_2", ...): uniquely identifies a record in a column database
• "name", "location", "preferences": column-family identifiers, which act as second-level keys

{
  "row_key_1" : {
    "name" : {
      "first_name" : "Jolly",
      "last_name" : "Goodfellow"
    },
    "location" : {
      "zip" : "94301"
    },
    "preferences" : {
      "d/r" : "D"
    }
  },
  "row_key_2" : {
    "name" : {
      "first_name" : "Very",
      "middle_name" : "Happy",
      "last_name" : "Guy"
    },
    "location" : {
      "zip" : "10001"
    },
    "preferences" : {
      "v/nv" : "V"
    }
  },
  ...
}
Each row may have a different set of columns within a column family.

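The nested layout above maps naturally onto nested dictionaries. The sketch below reuses the sample rows; `get_column` and `column_names` are hypothetical helpers introduced only to show the row key / column family / column lookup path.

```python
table = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
        "preferences": {"d/r": "D"},
    },
    "row_key_2": {
        "name": {"first_name": "Very", "middle_name": "Happy", "last_name": "Guy"},
        "location": {"zip": "10001"},
        "preferences": {"v/nv": "V"},
    },
}


def get_column(table, row_key, family, column, default=None):
    """Look up one value by row key, then column family, then column name."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


def column_names(table, row_key, family):
    """List the columns a given row actually has in a column family."""
    return sorted(table[row_key][family])


middle_1 = get_column(table, "row_key_1", "name", "middle_name")  # absent in this row
middle_2 = get_column(table, "row_key_2", "name", "middle_name")
```

A missing column is simply absent from the row; no placeholder value has to be stored for it.
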
Contrasting Column Databases with RDBMS
• Column-oriented database
  ◦ Minimal need for schema definition
  ◦ Easily accommodates new columns
  ◦ Column families are predefined
  ◦ A column family is a set of columns grouped together into a bundle
  ◦ A column family (no data type) is the rough counterpart of a column in an RDBMS (with data type)
  ◦ Column databases are designed to scale and can easily accommodate millions of columns and billions of rows

Hadoop Distributed File System (HDFS) - Background for Distributed Storage
• Apache Hadoop is an open source software project
• Enables the distributed processing of large data sets across clusters of servers
• Designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
• Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster.
• The map and reduce functions can be executed on smaller subsets of larger data sets

Hadoop Distributed File System (HDFS)
• MapReduce
  ◦ Map() procedure: performs filtering and sorting (such as sorting students by first name into queues, one queue for each name)
  ◦ Reduce() procedure: performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).

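The student example can be written out as a single-process sketch. This is plain Python, not the Hadoop API; `map_phase` and `reduce_phase` are hypothetical names, and the student list is made-up sample data.

```python
from collections import defaultdict


def map_phase(students):
    """Sort students into queues, one queue per first name."""
    queues = defaultdict(list)
    for first_name, last_name in students:
        queues[first_name].append((first_name, last_name))
    return queues


def reduce_phase(queues):
    """Summarize each queue by counting its students."""
    return {name: len(queue) for name, queue in queues.items()}


students = [("Ann", "Lee"), ("Bob", "Ray"), ("Ann", "Kim")]
name_frequencies = reduce_phase(map_phase(students))
```

In a real cluster the map step would run on many servers in parallel, each over its own blocks of the input, and the queues would be shuffled to the servers running the reduce step.
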
Hadoop Distributed File System (HDFS) - Example
• Consider a file containing the phone numbers of everyone in the United States.
• The people with a last name starting with A might be stored on server 1, B on server 2, and so on.
• In a Hadoop world, pieces of this phonebook would be stored across the cluster.
• To reconstruct the entire phonebook, your program would need the blocks from every server in the cluster.
• To achieve availability as components fail, HDFS replicates these smaller pieces onto two additional servers by default.
  ◦ This redundancy offers multiple benefits:
    ▪ Higher availability.
    ▪ Scalability: a Hadoop cluster breaks work into smaller chunks and runs those jobs on all the servers in the cluster.
    ▪ Data locality, which is critical when working with large data sets.

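The block-and-replica idea can be sketched as follows. The round-robin placement is an assumption made to keep the example short and is not how HDFS actually chooses servers; the function names and server names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Break a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(block_count, servers, copies=3):
    """Assign each block to `copies` distinct servers, round-robin (simplified)."""
    if copies > len(servers):
        raise ValueError("not enough servers for the requested number of copies")
    return {
        block_id: [servers[(block_id + offset) % len(servers)] for offset in range(copies)]
        for block_id in range(block_count)
    }


def readable_blocks(placement, failed_server):
    """Blocks that still have at least one copy after one server fails."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(server != failed_server for server in holders)
    ]


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), ["s1", "s2", "s3", "s4"])
still_readable = readable_blocks(placement, failed_server="s1")
```

With one original plus two additional copies per block, losing a single server leaves every block readable.
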
HBase - Distributed Storage
• HBase is a column-oriented database management system that runs on top of HDFS.
• HBase's distributed architecture is designed for applications storing up to billions of rows and millions of columns.
• A good option to replace a relational database that cannot support such large data sets.

HBase Distributed Storage Architecture
• Master-worker pattern
  ◦ A master and a set of workers (range servers; HBase itself calls ranges "regions" and the workers "region servers")
• When HBase starts, the master allocates a set of ranges to each range server.
• Each range stores an ordered set of rows, where each row is identified by a unique row key.
• When the number of rows stored in a range grows beyond a configured threshold, the range is split into two and the rows are divided between the two new ranges.

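The split rule can be sketched with a sorted list of row keys. `add_row` is a hypothetical helper, and splitting at the midpoint by row count is an assumption made for this illustration; HBase's actual split policies are configurable.

```python
import bisect


def add_row(sorted_row_keys, row_key, threshold):
    """Insert a row key in order; split the range in two if it grows too large.

    Returns a list containing either one range or two ranges of row keys.
    """
    keys = list(sorted_row_keys)   # work on a copy, leave the input untouched
    bisect.insort(keys, row_key)   # keep the range ordered by row key
    if len(keys) <= threshold:
        return [keys]
    middle = len(keys) // 2
    return [keys[:middle], keys[middle:]]


unsplit = add_row(["a", "c"], "b", threshold=3)
split = add_row(["a", "c", "e"], "b", threshold=3)
```
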
Write-Ahead Log (WAL)
• WAL is a common technique for providing atomicity and durability (two of the ACID properties).
• When data is written to a region, it's first written to the write-ahead log, if enabled.
• Then it's written to the region's in-memory store.
• When the in-memory store is full, data is flushed to disk and persisted in the underlying distributed storage.
• In HBase a client program can decide to turn the WAL on or switch it off.
• Switching it off boosts performance but reduces reliability and the ability to recover in case of failure.

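The write path described above can be sketched in memory. The `Region` class is hypothetical: plain Python lists and dictionaries stand in for the log file, the in-memory store and the files in distributed storage, and clearing the log on flush is a simplification.

```python
class Region:
    def __init__(self, memstore_limit, wal_enabled=True):
        self.wal = []        # stands in for the durable log
        self.memstore = {}   # in-memory store, lost on a crash
        self.disk = {}       # stands in for files in distributed storage
        self.memstore_limit = memstore_limit
        self.wal_enabled = wal_enabled

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))   # 1. log the edit first
        self.memstore[key] = value          # 2. then apply it in memory
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                    # 3. flush when the store is full

    def flush(self):
        self.disk.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()   # flushed edits no longer need to be replayed

    def crash_and_recover(self):
        self.memstore = {}                  # memory contents are gone
        for key, value in self.wal:         # replay whatever was logged
            self.memstore[key] = value


with_wal = Region(memstore_limit=3, wal_enabled=True)
with_wal.put("a", 1)
with_wal.put("b", 2)
with_wal.crash_and_recover()

without_wal = Region(memstore_limit=3, wal_enabled=False)
without_wal.put("a", 1)
without_wal.put("b", 2)
without_wal.crash_and_recover()
```

With the log enabled, the two unflushed edits survive the crash; with it switched off, they are lost, which is the reliability trade-off mentioned above.
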
Document Model
• The notion of a schema is dynamic: each document can contain different fields.
  ◦ Helpful for modeling unstructured and polymorphic data.
  ◦ It also makes it easier to evolve an application during development, such as adding new fields.
  ◦ Data can be queried based on any field in a document.

DOCUMENT STORE
• Documents are grouped together into collections
• Collections are analogous to relational tables.
• Collections don't impose strict schema constraints
• Records are not documents in the sense of a word processing document
• The structure of any document can be modified
  ◦ By adding and removing members from the document: reading the document into a program, modifying it and re-saving it
  ◦ By using various update commands.

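The read, modify and re-save cycle can be sketched with a dictionary standing in for a collection. The `read` and `save` helpers and the sample documents are hypothetical and do not correspond to any driver's API.

```python
import copy

# A collection whose documents do not share a structure.
collection = {
    1: {"first_name": "Thomas", "last_name": "Jefferson"},
    2: {"title": "Meeting notes", "tags": ["nosql"]},
}


def read(collection, doc_id):
    """Return a private copy of the stored document."""
    return copy.deepcopy(collection[doc_id])


def save(collection, doc_id, document):
    """Store the (possibly restructured) document back under the same id."""
    collection[doc_id] = copy.deepcopy(document)


document = read(collection, 2)
document["reviewed"] = True   # add a member
del document["tags"]          # remove a member
save(collection, 2, document)
```

No schema change is needed for the new field, and document 1 is unaffected.
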
DOCUMENT STORE
• In MongoDB, each document is stored in BSON format.
• Binary data (using BSON format) can be stored in any of the fields in the document.
• BSON is a binary-encoded representation of a JSON-type document format
  ◦ A nested set of key/value pairs.
  ◦ JSON: JavaScript Object Notation
• BSON extends JSON
  ◦ It supports additional types:
    ▪ regular expression,
    ▪ binary data,
    ▪ date.
• Each document has a unique identifier, which MongoDB can generate automatically as an object id

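Why the additional types matter can be seen with Python's standard `json` module, which cannot encode dates or binary data directly. The sample document and the `json_unsupported_fields` helper are made up for this illustration; no BSON library is used here.

```python
import json
from datetime import datetime, timezone

document = {
    "name": "report",
    "created": datetime(2014, 10, 20, tzinfo=timezone.utc),  # a date
    "payload": b"\x00\x01",                                   # binary data
}


def json_unsupported_fields(doc):
    """Return the fields whose values plain JSON cannot represent."""
    unsupported = []
    for field, value in doc.items():
        try:
            json.dumps(value)
        except TypeError:
            unsupported.append(field)
    return unsupported


needs_extended_types = json_unsupported_fields(document)
```
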
DOCUMENT STORE
• Document databases
  ◦ Good for storing and managing Big Data-size collections of literal documents
    ▪ like text documents, email messages, and XML documents
    ▪ and conceptual "documents" like de-normalized (aggregate) representations of a database entity
• Good for storing "sparse" data
  ◦ irregular (semi-structured) data that would require an extensive use of "nulls" in an RDBMS.

DOCUMENT STORE
• "Documents" are encoded in a standard data exchange format
  ◦ XML, JSON (JavaScript Object Notation) or BSON (Binary JSON).
• Unlike the simple key-value stores, the value column in document databases contains semi-structured data
  ◦ specifically attribute name/value pairs.
• A single column can house hundreds of such attributes
• The number and type of attributes recorded can vary from row to row.
• Both keys and values are fully searchable in document databases.

DOCUMENT STORE
• Records within a single table can have different structures.
• An example record from MongoDB, using JSON format, might look like
{
  "_id" : ObjectId("4fccbf281168a6aa3c215443"),
  "first_name" : "Thomas",
  "last_name" : "Jefferson",
  "address" : {
    "street" : "1600 Pennsylvania Ave NW",
    "city" : "Washington",
    "state" : "DC"
  }
}

Document Store - Internals
• Document stores
  ◦ Like key-value stores, except the value is a "document"
  ◦ Store arbitrary/extensible structures as a "value"
• Data model: (key, "document") pairs
• Basic operations:
  ◦ Insert(key, document),
  ◦ Fetch(key), Update(key),
  ◦ Delete(key)
• Also Fetch() based on document contents
• Example systems
  ◦ CouchDB, MongoDB

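The operations listed above can be sketched as a small in-memory class. `DocumentStore` and its method names are hypothetical; `find` does a linear scan with exact matching on top-level fields, whereas real document stores use indexes and richer query languages.

```python
import copy


class DocumentStore:
    """In-memory (key, document) store; documents are plain dictionaries."""

    def __init__(self):
        self._documents = {}

    def insert(self, key, document):
        if key in self._documents:
            raise KeyError(f"key already exists: {key!r}")
        self._documents[key] = copy.deepcopy(document)

    def fetch(self, key):
        return copy.deepcopy(self._documents[key])

    def update(self, key, document):
        if key not in self._documents:
            raise KeyError(f"no such key: {key!r}")
        self._documents[key] = copy.deepcopy(document)

    def delete(self, key):
        del self._documents[key]

    def find(self, **criteria):
        """Fetch by contents: keys of documents whose fields match all criteria."""
        return sorted(
            key
            for key, document in self._documents.items()
            if all(document.get(field) == value for field, value in criteria.items())
        )


store = DocumentStore()
store.insert("u1", {"first_name": "Thomas", "state": "DC"})
store.insert("u2", {"first_name": "Jolly", "zip": "94301"})
store.insert("u3", {"first_name": "Very", "state": "DC"})
in_dc = store.find(state="DC")
store.delete("u3")
```

The `find` method is what separates a document store from a plain key-value store: the store can look inside the value.
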
Advantages of the Document Model
• More natural to represent data at the database level
• An aggregated document can be accessed with a single call to the database
  ◦ rather than having to JOIN multiple tables to respond to a query.
• The MongoDB document is physically stored as a single object, requiring only a single read from memory or disk.
  ◦ RDBMS JOINs require multiple reads from multiple physical locations.
• Distributing the database across multiple nodes (a process called sharding) is easier
  ◦ horizontal scalability
  ◦ documents are self-contained

MongoDB - Features
• MongoDB provides high-performance data persistence.
  ◦ Support for embedded data models reduces I/O activity on the database system.
  ◦ Indexes support faster queries and can include keys from embedded documents and arrays.
• High availability
  ◦ automatic failover.
  ◦ data redundancy.
• A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.
• Automatic scaling
  ◦ MongoDB provides horizontal scalability as part of its core functionality.
  ◦ Automatic sharding distributes data across a cluster of machines.
  ◦ Replica sets can provide eventually-consistent reads for low-latency, high-throughput deployments.

MongoDB - Sharding
• Data is distributed across multiple range servers
• MongoDB allows ordered collections to be saved across multiple machines.
• Shards are replicated to allow failover.
  ◦ For example, a large collection could be split into four shards
  ◦ Each shard in turn may be replicated three times.
  ◦ This would create 12 units of a MongoDB server.
  ◦ The two additional copies of each shard serve as failover units.
• Sharding addresses the challenge of scaling to support high throughput and large data sets:
  ◦ Each shard processes fewer operations as the cluster grows.
  ◦ As a result, a cluster can increase capacity and throughput horizontally.
  ◦ For example, to insert data, the application only needs to access the shard responsible for that record.
  ◦ Sharding reduces the amount of data that each server needs to store. Each shard stores less data as the cluster grows.
• The data set is divided and distributed over multiple servers, or shards.
• Each shard is an independent database, and collectively, the shards make up a single logical database.

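The four-shards-times-three-copies example can be sketched with range-based routing. `ShardedCollection`, the split points and the synchronous write to every copy are assumptions made for this illustration; this is not MongoDB's API or its replication protocol.

```python
import bisect


class ShardedCollection:
    """Routes each key to one shard by key range; every shard has several copies."""

    def __init__(self, split_points, copies_per_shard=3):
        self.split_points = sorted(split_points)
        shard_count = len(self.split_points) + 1
        # shards[i] is a list of copies; each copy is a dict of key -> document.
        self.shards = [
            [{} for _ in range(copies_per_shard)] for _ in range(shard_count)
        ]

    def shard_for(self, key):
        """Index of the shard whose key range contains `key`."""
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, document):
        # Only the responsible shard is touched; all of its copies get the write.
        for shard_copy in self.shards[self.shard_for(key)]:
            shard_copy[key] = document

    def server_count(self):
        return sum(len(copies) for copies in self.shards)


collection = ShardedCollection(split_points=["g", "n", "t"])  # four key ranges
collection.insert("alice", {"city": "Pune"})
collection.insert("zoe", {"city": "Delhi"})
```

Three split points give four shards, and three copies of each give the 12 server units mentioned above.
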
Distributed Key-Value Systems
• Key-Value Pair (KVP) stores
  ◦ Access data (values) by strings called keys.
  ◦ Data has no required format: it may have any format
  ◦ Extremely simple interface
• Data model: (key, value) pairs
• A NoSQL key-value store is a single table with two columns:
  ◦ one being the (primary) key, and the other being the value.
• Basic operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
  ◦ Implementation: efficiency, scalability, fault-tolerance
• Records distributed to nodes based on key
• Replication
• Single-record transactions, "eventual consistency"

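The interface and the key-based distribution can be sketched together. `KeyValueCluster` and the node names are hypothetical, and hashing the key modulo the node count is the simplest possible placement rule, chosen here for brevity; replication is left out.

```python
import hashlib


class KeyValueCluster:
    """Stores opaque values; each key lives on the node chosen by its hash."""

    def __init__(self, node_names):
        self._names = list(node_names)
        self.nodes = {name: {} for name in self._names}

    def node_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._names[int.from_bytes(digest[:8], "big") % len(self._names)]

    def insert(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def fetch(self, key):
        return self.nodes[self.node_for(key)].get(key)

    def update(self, key, value):
        node = self.nodes[self.node_for(key)]
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        self.nodes[self.node_for(key)].pop(key, None)


cluster = KeyValueCluster(["node-1", "node-2", "node-3"])
cluster.insert("user:42", b"any bytes at all")    # the value has no required format
cluster.insert("session:9", {"cart": ["book"]})
cluster.update("user:42", "now a string")
```

The store never inspects the value, which is why the interface stays this small.
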
Examples - Key-Value
• Riak
• Redis
• MemcacheDB
• Berkeley DB
• HamsterDB (especially suited for embedded use)
• Amazon DynamoDB (not open source)
• Project Voldemort (open-source implementation of Amazon's Dynamo design)


Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS LiveVip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
ย 

No SQL introduction

  • 1. NO SQL 10/20/2014 @ Surabhi Dwivedi 1
  • 2. Contents
      ◦ Introduction and features of NoSQL
      ◦ CAP theorem
      ◦ RDBMS vs. NoSQL
      ◦ NoSQL database family
  • 3. Features: Not Only SQL
      ◦ No RDBMS: non-relational
      ◦ Distributed data store: horizontally scalable
      ◦ Schema-free / flexible schema: database JOINs generally not supported
      ◦ Huge amounts of data, e.g. Google and Facebook, which collect terabytes of data
      ◦ BASE properties
          ▪ Basically Available
          ▪ Soft state: the system does not have to be consistent all the time
          ▪ Eventually consistent: the system becomes consistent once updates have propagated, in particular when there are not too many updates
  • 4. NoSQL
      ◦ Provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases
      ◦ Used in big data and real-time web applications
      ◦ NoSQL isn't a single product or technology, but an umbrella term for a category of databases
  • 5. NoSQL does not provide
      ◦ Joins
      ◦ Group by
      ◦ ACID transactions
      ◦ SQL
      ◦ NoSQL databases reject:
          ▪ Overhead of ACID transactions
          ▪ "Complexity" of SQL
          ▪ Burden of up-front schema design
          ▪ Declarative query expression
  • 7. Requirement of NoSQL
  • 8. NoSQL - Users
  • 9. CAP Theorem
  • 10. CAP Theorem
      ◦ Three properties of a system
          ▪ Consistency: all copies have the same value
          ▪ Availability: the system can keep running even if parts have failed, via replication
          ▪ Partition tolerance: the network can break into two or more parts, each with active systems that can't talk to the other parts
      ◦ Very large systems will partition at some point
          ▪ When that happens, choose one of consistency or availability
          ▪ Traditional databases choose consistency
          ▪ Most web applications choose availability, except for specific parts such as order processing
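The consistency-versus-availability choice on the CAP slide can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `write`, its `mode` argument and the two-element replica list are hypothetical names invented for illustration, not part of any real database.

```python
def write(replicas, target, value, partitioned, mode):
    """Apply a write to a list of replica values (hypothetical helper).

    mode "CP": refuse the write during a partition, so copies never diverge.
    mode "AP": accept the write on the reachable replica only.
    Returns True if the write was accepted.
    """
    if not partitioned:
        for i in range(len(replicas)):
            replicas[i] = value      # all copies are updated together
        return True
    if mode == "CP":
        return False                 # unavailable, but still consistent
    replicas[target] = value         # available, but the copies now differ
    return True


# Same partitioned write under each policy.
cp = ["v1", "v1"]
cp_accepted = write(cp, 0, "v2", partitioned=True, mode="CP")

ap = ["v1", "v1"]
ap_accepted = write(ap, 0, "v2", partitioned=True, mode="AP")
```

Under the "CP" policy the write is rejected and both copies stay equal; under "AP" it is accepted and the copies diverge until the partition heals and the update propagates.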
  • 11. RDBMS vs. NoSQL database
      RDBMS | NoSQL
      Structured and organized data | Stands for "Not Only SQL"
      Structured Query Language (SQL) | No declarative query language
      Data and its relationships are stored in separate tables | No predefined schema
      Data Manipulation Language, Data Definition Language | Variants: key-value pair store, column store, document store, graph store
      Tight consistency | Eventual consistency rather than ACID properties
      ACID transactions | CAP theorem: prioritizes high performance, high availability and scalability
  • 12. Example: NoSQL Databases
  • 13. NoSQL Database Family
  • 14. NoSQL Database Types
      ◦ Distributed key-value systems
          ▪ Hash table of keys
          ▪ Look up a single value for a key
          ▪ Amazon's Dynamo
      ◦ Document-based systems
          ▪ Store documents made up of tagged elements
          ▪ Access data by key or by search of "document" data
          ▪ CouchDB, MongoDB
      ◦ Column-based systems
          ▪ Each storage block contains data from only one column
          ▪ Google's Bigtable
          ▪ Facebook's Cassandra
      ◦ Graph-based systems
          ▪ Use a graph structure
          ▪ Google's Pregel, Neo4j
  • 15. Column-oriented databases
      ◦ Among the most popular types of non-relational database
      ◦ Column-family stores let you store data with keys mapped to values, and the values grouped into multiple column families, each column family being a map of data
      ◦ Column-family databases store data in column families as rows
      ◦ Rows have many columns associated with a row key
      ◦ Column families are groups of related data that are often accessed together
      ◦ The basic unit of storage in a column-family database is a column
      ◦ Examples
          ▪ Hadoop / HBase
          ▪ Cassandra: Apache Cassandra was initially developed at Facebook to power its Inbox Search feature
          ▪ Cloudata: a clone of Google's Bigtable, like HBase
  • 16. Column-Oriented Databases (cont.)
      ◦ Data tables are stored as sections of columns of data, rather than as rows of data
      ◦ The column is used as a store for the value, and has a timestamp that is used to differentiate valid content from stale content
      ◦ The application uses the timestamp to find out which of the stored values in the backup nodes are up to date
      ◦ Column family
          ▪ A container for columns, analogous to a table in a relational database
          ▪ A column family has a name and a map with a key and a value (which is itself a map containing columns)
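The use of timestamps to tell fresh values from stale ones can be sketched as follows. This is a minimal illustration, assuming integer timestamps where a larger number means a newer write; the `Column` class and `freshest` function are hypothetical names, not the API of Cassandra or HBase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # assumption: a larger timestamp means a newer write


def freshest(copies):
    """Return the copy with the newest timestamp among the replicas' copies."""
    return max(copies, key=lambda column: column.timestamp)


# Three nodes hold a copy of the same column; one of them saw a later update.
replica_copies = [
    Column("zip", "94301", timestamp=100),
    Column("zip", "10001", timestamp=250),
    Column("zip", "94301", timestamp=100),
]
latest = freshest(replica_copies)
```

Here `latest` is the copy written at timestamp 250, so the two copies still holding the older value are treated as stale.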
  • 17. Examples
      ◦ Cassandra
      ◦ HBase
      ◦ Hypertable
      ◦ Amazon SimpleDB
  • 18. Structure of a column-family record
      {
        "row_key_1" : { "name" : { ... }, "location" : { ... }, "preferences" : { ... } },
        "row_key_2" : { "name" : { ... }, "location" : { ... }, "preferences" : { ... } },
        "row_key_3" : { ... }
      }
      ◦ Row key ("row_key_1", ...): uniquely identifies a record in a column database
      ◦ "name", "location", "preferences": column-family identifiers, which act as the second-level key
  • 19. Each row may have a different set of columns within a column family
      {
        "row_key_1" : {
          "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
          "location" : { "zip" : "94301" },
          "preferences" : { "d/r" : "D" }
        },
        "row_key_2" : {
          "name" : { "first_name" : "Very", "middle_name" : "Happy", "last_name" : "Guy" },
          "location" : { "zip" : "10001" },
          "preferences" : { "v/nv" : "V" }
        },
        ...
      }
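The row key, column family, column nesting shown on slides 18 and 19 maps naturally onto nested dictionaries. The sketch below is an in-memory illustration only; the `table` variable and `put` helper are assumed names, not a real column-store client.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Store one column value; no column has to be declared in advance."""
    table[row_key][family][column] = value


put("row_key_1", "name", "first_name", "Jolly")
put("row_key_1", "name", "last_name", "Goodfellow")
put("row_key_1", "location", "zip", "94301")

put("row_key_2", "name", "first_name", "Very")
put("row_key_2", "name", "middle_name", "Happy")  # column absent from row_key_1
put("row_key_2", "name", "last_name", "Guy")
put("row_key_2", "location", "zip", "10001")

columns_row1 = set(table["row_key_1"]["name"])
columns_row2 = set(table["row_key_2"]["name"])
```

The two rows share the "name" column family but carry different column sets, which is the point the slide makes.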
  • 20. Contrasting Column Databases with RDBMS
      ◦ Column-oriented database
          ▪ Minimal need for schema definition
          ▪ Easily accommodates new columns
          ▪ Predefined column family: a set of columns grouped together into a bundle
          ▪ A column family (which carries no data type) is comparable to a column in an RDBMS (which has a data type)
          ▪ Column databases are designed to scale and can easily accommodate millions of columns and billions of rows
  • 21. Contrasting Column Databases with RDBMS (cont.)
  • 22. Hadoop Distributed File System (HDFS): Background for Distributed Storage
      ◦ Apache Hadoop is an open-source software project
      ◦ It enables the distributed processing of large data sets across clusters of servers
      ◦ It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance
      ◦ Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster
      ◦ The map and reduce functions can then be executed on smaller subsets of the larger data set
  • 23. Hadoop Distributed File System (HDFS): MapReduce
      ◦ The Map() procedure performs filtering and sorting (such as sorting students by first name into queues, one queue for each name)
      ◦ The Reduce() procedure performs a summary operation (such as counting the number of students in each queue, yielding name frequencies)
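The student example from the MapReduce slide can be written as two plain functions. This is a single-process sketch of the idea, not Hadoop code; `map_phase`, `reduce_phase` and the sample student list are made up for illustration.

```python
from collections import defaultdict


def map_phase(students):
    """Sort students into queues, one queue per first name."""
    queues = defaultdict(list)
    for first_name, last_name in students:
        queues[first_name].append((first_name, last_name))
    return queues


def reduce_phase(queues):
    """Summarize each queue by counting its members."""
    return {name: len(queue) for name, queue in queues.items()}


students = [
    ("Ann", "Lee"), ("Bob", "Ray"), ("Ann", "Kim"),
    ("Cy", "Orr"), ("Bob", "Fox"), ("Ann", "Wu"),
]
name_frequencies = reduce_phase(map_phase(students))
```

In a real cluster the map step would run in parallel on the blocks held by each server, and the queues would be shuffled to reducers by key before counting.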
  • 24. Hadoop Distributed File System (HDFS): Example
      ◦ Consider a file containing the phone numbers of everyone in the United States
      ◦ The people with a last name starting with A might be stored on server 1, B on server 2, and so on
      ◦ In a Hadoop world, pieces of this phonebook would be stored across the cluster
      ◦ To reconstruct the entire phonebook, your program would need the blocks from every server in the cluster
      ◦ To achieve availability as components fail, HDFS replicates these smaller pieces onto two additional servers by default
      ◦ This redundancy offers multiple benefits
          ▪ Higher availability
          ▪ Scalability: a Hadoop cluster breaks work into smaller chunks and runs those jobs on all the servers in the cluster
          ▪ Data locality, which is critical when working with large data sets
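Block splitting and three-way replication can be sketched in a few lines. The round-robin placement below is an assumption made for simplicity (real HDFS placement also considers racks), and the function names, block size and server names are all hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin.

    Illustrative only: assumes len(servers) >= replication.
    """
    return {
        block: [servers[(block + k) % len(servers)] for k in range(replication)]
        for block in range(num_blocks)
    }


def readable(placement, failed):
    """True if every block still has at least one copy on a live server."""
    return all(
        any(server not in failed for server in holders)
        for holders in placement.values()
    )


phonebook = b"x" * 1000                      # stand-in for the phonebook file
blocks = split_into_blocks(phonebook, block_size=256)
servers = ["s1", "s2", "s3", "s4", "s5"]
placement = place_replicas(len(blocks), servers)
```

With one original and two additional copies per block, the file stays readable when any two servers fail; losing all three holders of a block makes it unreadable.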
  • 25. HBase - Distributed Storage
      ◦ HBase is a column-oriented database management system that runs on top of HDFS
      ◦ HBase's distributed architecture is designed for applications storing up to billions of rows and millions of columns
      ◦ It is a good option for replacing a relational database that cannot support such large data sets
  • 26. HBase Distributed Storage Architecture
  • 28. Master-worker pattern
      ◦ A master and a set of workers (range servers, which HBase calls region servers)
      ◦ When HBase starts, the master allocates a set of ranges to each range server
      ◦ Each range stores an ordered set of rows, where each row is identified by a unique row key
      ◦ When the number of rows stored in a range grows beyond a configured threshold, the range is split into two and the rows are divided between the two new ranges
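The range-splitting rule can be sketched with sorted lists standing in for ranges. This is a toy model under stated assumptions: `find_range`, `insert_row` and the threshold of 4 rows are invented for illustration and do not reflect HBase's actual split policy, which is based on store size.

```python
import bisect


def find_range(ranges, row_key):
    """Pick the last range whose first key is <= row_key (else the first range)."""
    for i in range(len(ranges) - 1, 0, -1):
        if ranges[i][0] <= row_key:
            return i
    return 0


def insert_row(ranges, row_key, threshold):
    """Insert a row key in order; split the range in two if it grows too large."""
    i = find_range(ranges, row_key)
    bisect.insort(ranges[i], row_key)
    if len(ranges[i]) > threshold:
        mid = len(ranges[i]) // 2
        ranges[i:i + 1] = [ranges[i][:mid], ranges[i][mid:]]


ranges = [[]]  # start with a single empty range
for key in ["row05", "row01", "row09", "row03", "row07", "row02"]:
    insert_row(ranges, key, threshold=4)
```

The fifth insert pushes the single range past the threshold, so it splits; later keys are routed to whichever half covers them, and each range keeps its rows ordered.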
  • 29. Write-ahead log (WAL)
      ◦ A WAL is a common technique for providing atomicity and durability (two of the ACID properties)
      ◦ When data is written to a region, it is first written to the write-ahead log, if enabled
      ◦ It is then written to the region's in-memory store
      ◦ When the in-memory store is full, the data is flushed to disk and persisted in the underlying distributed storage
      ◦ In HBase, a client program can decide to turn the WAL on or off
      ◦ Switching it off boosts performance but reduces reliability and the ability to recover after a failure
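The write path described on the WAL slide (log first, then memory, then flush to disk) can be modeled in memory. The `Region` class below is a hypothetical sketch: the lists and dictionaries stand in for the log file, the in-memory store and the files in distributed storage, and none of the method names come from the HBase API.

```python
class Region:
    """Toy region with an optional write-ahead log."""

    def __init__(self, memstore_limit, wal_enabled=True):
        self.wal = []        # stands in for an append-only log file
        self.memstore = {}   # in-memory store
        self.disk = {}       # stands in for files in distributed storage
        self.memstore_limit = memstore_limit
        self.wal_enabled = wal_enabled

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))  # log the write first
        self.memstore[key] = value         # then apply it in memory
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        """Persist the in-memory store; flushed entries no longer need replay."""
        self.disk.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()

    def crash_and_recover(self):
        """Lose everything held in memory, then replay whatever the log holds."""
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value


safe = Region(memstore_limit=3, wal_enabled=True)
unsafe = Region(memstore_limit=3, wal_enabled=False)
for region in (safe, unsafe):
    region.put("a", 1)
    region.put("b", 2)
    region.crash_and_recover()
```

After the simulated crash, the region with the WAL enabled has rebuilt its unflushed writes from the log, while the one without it has lost them. That is the reliability cost of switching the log off.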
  • 30. Write-ahead log (WAL)
  • 31. Document Model
      ◦ The notion of a schema is dynamic: each document can contain different fields
          ▪ Helpful for modeling unstructured and polymorphic data
          ▪ Makes it easier to evolve an application during development, for example by adding new fields
          ▪ Data can be queried based on any field in a document
  • 32. DOCUMENT STORE
      ◦ Documents are grouped together into collections
      ◦ Collections are analogous to relational tables
      ◦ Collections don't impose strict schema constraints
      ◦ Records are not documents in the sense of a word-processing document
      ◦ The structure of any document can be modified by adding and removing members from the document
          ▪ By reading the document into a program, modifying it and re-saving it
          ▪ By using various update commands
  • 33. DOCUMENT STORE
      ◦ Each document is stored in BSON format
      ◦ Binary data (using BSON format) can be stored in any of the fields in the document
      ◦ BSON is a binary-encoded representation of a JSON-type document format
          ▪ A nested set of key/value pairs
          ▪ JSON: JavaScript Object Notation
      ◦ BSON is a superset of JSON in that it supports additional types
          ▪ Regular expression
          ▪ Binary data
          ▪ Date
      ◦ Each document has a unique identifier, which MongoDB can generate automatically as an object ID
  • 34. DOCUMENT STORE
      ◦ Document databases are good for storing and managing Big Data-size collections of
          ▪ Literal documents, like text documents, email messages, and XML documents
          ▪ Conceptual "documents", like de-normalized (aggregate) representations of a database entity
      ◦ They are also good for storing "sparse" data
          ▪ Irregular (semi-structured) data that would require extensive use of nulls in an RDBMS
  • 35. DOCUMENT STORE
      ◦ "Documents" are encoded in a standard data exchange format
          ▪ XML, JSON (JavaScript Object Notation) or BSON (Binary JSON)
      ◦ Unlike in simple key-value stores, the value column in a document database contains semi-structured data
          ▪ Specifically, attribute name/value pairs
      ◦ A single column can house hundreds of such attributes
      ◦ The number and type of attributes recorded can vary from row to row
      ◦ Both keys and values are fully searchable in document databases
  • 36. DOCUMENT STORE
      ◦ Records within a single table can have different structures
      ◦ An example record from MongoDB, using JSON format, might look like
          {
            "_id" : ObjectId("4fccbf281168a6aa3c215443"),
            "first_name" : "Thomas",
            "last_name" : "Jefferson",
            "address" : {
              "street" : "1600 Pennsylvania Ave NW",
              "city" : "Washington",
              "state" : "DC"
            }
          }
  • 37. Document Store - Internals
      ◦ Document stores are like key-value stores, except that the value is a "document"
      ◦ Data model: (key, "document") pairs
      ◦ Basic operations
          ▪ Insert(key, document)
          ▪ Fetch(key)
          ▪ Update(key)
          ▪ Delete(key)
      ◦ Also Fetch() based on document contents
      ◦ Example systems: CouchDB, MongoDB
      ◦ Document stores store arbitrary, extensible structures as a "value"
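The basic operations listed on the internals slide, plus fetch-by-content, fit in a small in-memory class. This is a sketch for illustration only: `DocumentStore`, its method names and the integer keys are assumptions and are not the MongoDB or CouchDB client API.

```python
import itertools


class DocumentStore:
    """Toy (key, document) store with content-based lookup."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)  # auto-generated keys: 1, 2, 3, ...

    def insert(self, document, key=None):
        key = key if key is not None else next(self._ids)
        self._docs[key] = dict(document)
        return key

    def fetch(self, key):
        return self._docs.get(key)

    def update(self, key, changes):
        # Fields can be added freely: documents carry no fixed schema.
        self._docs[key].update(changes)

    def delete(self, key):
        self._docs.pop(key, None)

    def find(self, **criteria):
        """Fetch by document contents: match on top-level field values."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
jefferson = store.insert({
    "first_name": "Thomas",
    "last_name": "Jefferson",
    "address": {"city": "Washington", "state": "DC"},
})
adams = store.insert({"first_name": "John", "last_name": "Adams"})
store.update(adams, {"middle_name": "Quincy"})
matches = store.find(last_name="Jefferson")
```

The two documents have different structures (one has an embedded address, the other gains a middle name later), yet both live in the same store and can be found by field value.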
  • 39. Advantages of the Document Model
      ◦ Data is represented more naturally at the database level
      ◦ An aggregated document can be accessed with a single call to the database, rather than having to JOIN multiple tables to respond to a query
      ◦ A MongoDB document is physically stored as a single object, requiring only a single read from memory or disk
          ▪ RDBMS JOINs require multiple reads from multiple physical locations
      ◦ Distributing the database across multiple nodes (a process called sharding) is easier
          ▪ Horizontal scalability
          ▪ Documents are self-contained
  • 40. MongoDB - Features
      ◦ High performance: MongoDB provides high-performance data persistence
          ▪ Support for embedded data models reduces I/O activity on the database system
          ▪ Indexes support faster queries and can include keys from embedded documents and arrays
      ◦ High availability
          ▪ Automatic failover
          ▪ Data redundancy
          ▪ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability
      ◦ Automatic scaling
          ▪ MongoDB provides horizontal scalability as part of its core functionality
          ▪ Automatic sharding distributes data across a cluster of machines
          ▪ Replica sets can provide eventually consistent reads for low-latency, high-throughput deployments
  • 41. MongoDB - Sharding
      ◦ Data is distributed across multiple range servers
      ◦ MongoDB allows ordered collections to be saved across multiple machines
      ◦ Shards are replicated to allow failover
          ▪ A large collection could be split into four shards
          ▪ Each shard in turn may be replicated three times
          ▪ This would create 12 units of a MongoDB server
          ▪ The two additional copies of each shard serve as failover units
      ◦ Sharding addresses the challenge of scaling to support high throughput and large data sets
          ▪ Each shard processes fewer operations as the cluster grows, so a cluster can increase capacity and throughput horizontally
          ▪ For example, to insert data, the application only needs to access the shard responsible for that record
          ▪ Sharding reduces the amount of data that each server needs to store: each shard stores less data as the cluster grows
  • 42. The data set is divided and distributed over multiple servers, or shards
      ◦ Each shard is an independent database, and collectively the shards make up a single logical database
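The four-shards-times-three-copies arithmetic and the "only touch the responsible shard" rule can be sketched with range-based routing. Everything named here is an assumption for illustration: the split points, the use of last name as shard key, and the choice to write to all three copies at once (a real replica set replicates from a primary to its secondaries).

```python
import bisect

SPLIT_POINTS = ["G", "N", "T"]   # hypothetical boundaries: 4 ranges of last names
REPLICAS_PER_SHARD = 3           # one primary plus two failover copies

# cluster[shard][replica] is a dict standing in for one server's data.
cluster = [
    [{} for _ in range(REPLICAS_PER_SHARD)]
    for _ in range(len(SPLIT_POINTS) + 1)
]


def shard_for(key):
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def insert(key, document):
    """Write only to the shard responsible for the key (all of its copies)."""
    for replica in cluster[shard_for(key)]:
        replica[key] = document


for name in ["Adams", "Jefferson", "Madison", "Washington", "Monroe"]:
    insert(name, {"last_name": name})

server_units = sum(len(shard) for shard in cluster)
```

`server_units` comes to 12, matching the slide, and an insert touches a single shard while the other shards do no work for that record.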
  • 43. Distributed Key-Value Systems
      ◦ Key-value pair (KVP) stores
          ▪ Access data (values) by strings called keys
          ▪ Data has no required format: values may have any format
          ▪ Extremely simple interface
      ◦ Data model: (key, value) pairs
      ◦ A NoSQL key-value store is a single table with two columns: one being the (primary) key, and the other being the value
      ◦ Basic operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
      ◦ Implementation: efficiency, scalability, fault tolerance
          ▪ Records are distributed to nodes based on key
          ▪ Replication
          ▪ Single-record transactions, "eventual consistency"
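A key-value store that distributes records to nodes based on the key can be sketched as below. This is a minimal illustration: `KeyValueCluster`, the node names and the hash-modulo placement are assumptions, and replication is left out to keep the example short.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store whose records are spread over nodes by key hash."""

    def __init__(self, node_names):
        self._names = list(node_names)
        self.nodes = {name: {} for name in self._names}

    def node_for(self, key):
        """Pick a node from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._names[int.from_bytes(digest[:8], "big") % len(self._names)]

    def insert(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def fetch(self, key):
        return self.nodes[self.node_for(key)].get(key)

    def update(self, key, value):
        self.insert(key, value)

    def delete(self, key):
        self.nodes[self.node_for(key)].pop(key, None)


kv = KeyValueCluster(["node-a", "node-b", "node-c"])
kv.insert("user:1", {"name": "Jolly"})   # values may have any format
kv.insert("session:9", b"\x00\x01")
kv.insert("motd", "hello")
kv.update("motd", "goodbye")
kv.delete("session:9")
```

Every operation involves a single record on a single node, which is why such stores can offer single-record transactions cheaply. A plain modulo over the node count reshuffles most keys when nodes are added, so production systems typically use consistent hashing instead.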
  • 44. Examples - Key-Value
      ◦ Riak
      ◦ Redis
      ◦ Memcached DB
      ◦ Berkeley DB
      ◦ HamsterDB (especially suited for embedded use)
      ◦ Amazon DynamoDB (not open source)
      ◦ Project Voldemort (open-source implementation of Amazon's Dynamo design)
  • 45. References
      ◦ Professional NoSQL, Shashank Tiwari
      ◦ MongoDB Manual
          ▪ http://docs.mongodb.org
          ▪ http://docs.mongodb.org/manual/core/sharding-introduction/
      ◦ Wikipedia references
      ◦ Intro to HBase Internals & Schema Design (for HBase Users), Alex Baranau, Sematext International, 2012