 Akbar Shaikh | Monocept
[Chart: data over the years 2002 to 2012]
Data
 Facebook had 60k servers in 2010
 Google had 450k servers in 2006 (speculated)
 Microsoft: between 100k and 500k servers (since Azure)
 Amazon: likely has similar numbers, too (S3)
ACID
 Atomicity: Everything in a transaction succeeds, or the entire transaction is rolled back.
 Consistency: A transaction cannot leave the database in an inconsistent state.
 Isolation: One transaction cannot interfere with another.
 Durability: A completed transaction persists, even after applications restart.
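The atomicity and consistency properties above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names and the "no negative balance" rule are assumptions made for this example, not part of the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Hypothetical consistency rule: no account may go negative.
            (lowest,) = conn.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed together
failed = transfer(conn, "alice", "bob", 500)  # rule violated, both updates are rolled back
```

After the failed transfer the balances are exactly what the successful transfer left behind; no half-applied update is visible.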
The point I am trying to make here is that we may have to look beyond ACID to
something called BASE, a term coined by Eric Brewer:
 Basically available: Each request is guaranteed a response, whether the execution
succeeded or failed.
 Soft state: The state of the system may change over time, at times without any
input (for eventual consistency).
 Eventual consistency: The database may be momentarily inconsistent but will be
consistent eventually.
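Eventual consistency can be sketched with two in-memory replicas that accept writes independently and converge later. This is a toy model under stated assumptions: the Replica class, the integer version numbers and the last-write-wins rule are illustrative choices, not how any particular database does it.

```python
class Replica:
    """One node holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Last-write-wins: keep whichever entry has the higher version.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def anti_entropy(replicas):
    """Push every replica's entries to all replicas so they converge."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)

a, b = Replica(), Replica()
a.write("cart", ["book"], version=1)  # accepted by one node only
stale = b.read("cart")                # the other node has not seen it yet
anti_entropy([a, b])                  # background synchronization
fresh = b.read("cart")                # both nodes now agree
```

Between the write and the synchronization step the two replicas disagree (soft state); afterwards they hold the same data.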
Eric Brewer also noted that it is impossible for a distributed computer system to provide
consistency, availability and partition tolerance simultaneously. This is more commonly referred
to as the CAP theorem.
 Consistency: Data access in a distributed database is considered to be consistent when an
update written on one node is immediately available on another node.
 Availability: The system guarantees availability for requests even though one or more
nodes are down.
 Partition tolerance: Nodes can be physically separated from each other at any given
point and for any length of time. The period during which they cannot reach each other, due to
routing problems, network interface troubles, or firewall issues, is called a network
partition. During the partition, all nodes should still be able to serve both read and write
requests. Ideally the system automatically reconciles updates as soon as every node can
reach every other node again.
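The trade-off can be made concrete with a toy two-node cluster. The Cluster class and its "CP"/"AP" modes are hypothetical names for this sketch: during a partition the cluster either refuses writes (keeping the nodes consistent) or accepts them locally (staying available while the nodes diverge).

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.value = None

class Cluster:
    """Two-node cluster with a switchable network partition (toy model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node("n1"), Node("n2")]
        self.partitioned = False

    def write(self, node_index, value):
        if not self.partitioned:
            for node in self.nodes:           # replicate to every node
                node.value = value
            return True
        if self.mode == "CP":
            return False                      # refuse: consistent but not available
        self.nodes[node_index].value = value  # accept locally: available but divergent
        return True

    def consistent(self):
        return self.nodes[0].value == self.nodes[1].value

cp = Cluster("CP")
cp.write(0, "v1")
cp.partitioned = True
cp_accepted = cp.write(0, "v2")

ap = Cluster("AP")
ap.write(0, "v1")
ap.partitioned = True
ap_accepted = ap.write(0, "v2")
```

Neither mode gets all three properties at once: the CP cluster rejects the write, and the AP cluster ends up with two different values until the partition heals and updates are reconciled.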
ACID
 Strong consistency for transactions is the highest priority
 Availability is less important
 Pessimistic
 Complex mechanisms
BASE
 Availability and scaling are the highest priorities
 Weak consistency
 Optimistic
 Simple and fast
{ "customer" : { "billingAddress" : [ { "city" : "Chicago" } ],
    "id" : 1,
    "name" : "Martin",
    "orders" : [ { "customerId" : 1,
        "id" : 99,
        "orderItems" : [ { "price" : 32.45,
            "productId" : 27,
            "productName" : "NoSQL Distilled"
          } ],
        "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" },
            "ccinfo" : "1000-1000-1000-1000",
            "txnId" : "abelif879rft"
          } ],
        "shippingAddress" : [ { "city" : "Chicago" } ]
      } ]
  }
}
We see two primary reasons why people consider using a NoSQL database.
 Application development productivity.
A lot of application development effort is spent on mapping data between in-memory
data structures and a relational database. A NoSQL database may provide a data model
that better fits the application’s needs, thus simplifying that interaction and resulting in
less code to write, debug, and evolve.
 Large-scale data.
Organizations are finding it valuable to capture more data and process it more quickly.
They are finding it expensive, if even possible, to do so with relational databases. The
primary reason is that a relational database is designed to run on a single machine, but
it is usually more economical to run large data and computing loads on clusters of many
smaller and cheaper machines. Many NoSQL databases are designed explicitly to run
on clusters, so they make a better fit for big data scenarios.
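The mapping effort mentioned under "Application development productivity" can be illustrated with a small sketch. An aggregate-oriented store can save a nested in-memory structure in one step, whereas a relational mapping has to split it into rows for several tables. The to_rows helper and the table names (orders, order_items, addresses) are hypothetical and only serve the comparison.

```python
import json

order = {
    "id": 99,
    "customerId": 1,
    "orderItems": [{"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}],
    "shippingAddress": [{"city": "Chicago"}],
}

# Aggregate-oriented: the whole structure is stored and loaded as one unit.
document = json.dumps(order)
restored = json.loads(document)

def to_rows(order):
    """Relational mapping: split the same object into rows for several tables."""
    rows = {"orders": [(order["id"], order["customerId"])], "order_items": [], "addresses": []}
    for item in order["orderItems"]:
        rows["order_items"].append(
            (order["id"], item["productId"], item["price"], item["productName"])
        )
    for address in order["shippingAddress"]:
        rows["addresses"].append((order["id"], "shipping", address["city"]))
    return rows

rows = to_rows(order)
```

Reading the order back from the relational form would need the reverse mapping plus joins across the three tables; the document form needs neither.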
 For almost as long as we’ve been in the software profession, relational databases
have been the default choice for serious data storage, especially in the world of
enterprise applications.
 If you’re an architect starting a new project, your only choice is likely to be which
relational database to use.
 After such a long period of dominance, the current excitement about NoSQL
databases comes as a surprise.
NoSQL databases have a lot more to offer than just solving the problems of scale.
These advantages are as follows:
 Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This
means that you don't have to think too far ahead to define a structure, and you can continue to evolve it over time,
including adding new fields or even nesting the data, for example, in the case of JSON representation.
 Development time: I have heard stories about reduced development time because one doesn't have to deal with
complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to
create your final view?
 Speed: Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of
milliseconds, especially over mobile and other intermittently connected devices, you have a much higher probability
of winning users over.
 Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan
ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden
spikes of load. Of course, you win users over straightaway.
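As a small illustration of schemaless evolution, the sketch below keeps older and newer records side by side in one collection. The sample records and the zip_code helper are made up for this example; plain Python dicts stand in for JSON documents.

```python
# Two "documents" in the same collection; the second one carries
# fields (a nested address, tags) that were added later.
users = [
    {"name": "Martin", "city": "Chicago"},
    {"name": "Sample User", "city": "Chicago", "address": {"zip": "60601"}, "tags": ["author"]},
]

def zip_code(user):
    """Read a nested field that only newer documents carry."""
    return user.get("address", {}).get("zip")  # None for older documents

zips = [zip_code(user) for user in users]
```

No migration was needed to add the new fields, but the reading code has to tolerate documents that lack them.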
Some NoSQL use cases
1. Massive data volumes
 Massively distributed architecture required to store the data
 Google, Amazon, Yahoo, Facebook…
2. Extreme query workload
 Impossible to efficiently do joins at that scale with an RDBMS
3. Schema evolution
 Schema flexibility (migration) is not trivial at large scale
 Schema changes can be gradually introduced with NoSQL
Key-value stores
The main idea here is to use a hash table in which each
unique key points to a particular item of data.
The key-value model is the simplest
and easiest to implement.
But it is inefficient when you are only
interested in querying or updating part of
a value, among other disadvantages.
One key → one value, very fast
Key: hash (no duplicates)
Value: binary object ("BLOB"); the database does not understand your content
[Figure: the key customer_22 paired with an unreadable binary value, captioned "GIVE ME A MEANING!"]
Key Value Databases
 A key-value store is a simple hash table
 Primarily used when all access to the database is via primary key
 Simplest NoSQL data stores to use (from an API perspective): PUT, GET, DELETE (matches REST)
 Value is a blob; the data store does not care or know what is inside
 Aggregate-Oriented
Suitable Use Cases
 Storing Session Information
 User Profiles, Preferences
 Shopping Cart Data
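A key-value store with the PUT, GET and DELETE operations listed above can be sketched as a thin wrapper around a hash table. The KeyValueStore class is a hypothetical in-memory stand-in, not the API of any real product; the profile contents are sample data.

```python
import json

class KeyValueStore:
    """Toy key-value store: each value is an opaque blob of bytes."""

    def __init__(self):
        self._table = {}

    def put(self, key, value: bytes):
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

    def delete(self, key):
        self._table.pop(key, None)

store = KeyValueStore()
store.put("customer_22", json.dumps({"name": "Martin", "city": "Chicago"}).encode())

# Changing one field means fetching, decoding and rewriting the whole
# value, because the store cannot look inside the blob.
profile = json.loads(store.get("customer_22"))
profile["city"] = "Boston"
store.put("customer_22", json.dumps(profile).encode())
```

The read-modify-write at the end is the inefficiency of partial updates: the application, not the database, has to interpret the value.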
Document databases
These were inspired by Lotus Notes and are similar to
key-value stores. The model is basically versioned
documents that are collections of other key-value
collections.
The semi-structured documents are stored in formats
like JSON.
Document databases are essentially the next level of
key-value, allowing nested values associated with each
key. Document databases support querying more
efficiently.
Document Databases
 Documents are the main concept
 Stores and retrieves documents, which can be XML, JSON, BSON, …
 Documents are self-describing, hierarchical tree data structures which can
consist of maps, collections and scalar values
 Documents stored are similar to each other but do not have to be exactly the same
 Aggregate-Oriented
Suitable Use Cases
 Event Logging
 Content Management Systems
 Web Analytics or Real-Time Analytics
 Product Catalog
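Unlike a key-value store, a document store can look inside the value. The sketch below queries nested fields in a small in-memory collection; the get_path and find helpers and the catalog entries are assumptions for illustration, not the query API of any real document database.

```python
def get_path(document, path):
    """Follow a dotted path such as 'details.format' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, expected):
    """Return the documents whose field at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]

catalog = [
    {"sku": "b-27", "name": "NoSQL Distilled", "details": {"format": "paperback"}},
    {"sku": "b-28", "name": "Another Book", "details": {"format": "ebook", "pages": 200}},
    {"sku": "t-01", "name": "T-shirt", "size": "M"},  # no "details" at all
]

ebooks = find(catalog, "details.format", "ebook")
medium = find(catalog, "size", "M")
```

The three documents are similar but not identical in shape, and the query still works across all of them.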
Wide-column stores
Often referred to as "BigTable clones": "a sparse,
distributed multi-dimensional sorted map"
These were created to store and process very large
amounts of data distributed over many machines.
There are still keys, but they point to multiple columns.
The columns are arranged by column family.
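The "sparse, multi-dimensional sorted map" idea can be sketched as nested dictionaries: row key, then column family, then column. The WideColumnTable class, the family names and the sample rows are hypothetical; real wide-column stores add distribution, timestamps and on-disk layout on top of this shape.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy table: row key -> column family -> column -> value."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Read only one column family of a row."""
        if row_key not in self.rows:
            return {}
        return dict(self.rows[row_key].get(family, {}))

    def scan(self):
        """Return row keys in sorted order, as in a sorted map."""
        return sorted(self.rows)

table = WideColumnTable(["profile", "orders"])
table.put("customer_22", "profile", "name", "Martin")
table.put("customer_22", "orders", "99", "NoSQL Distilled")
table.put("customer_07", "profile", "city", "Chicago")  # sparse: no "orders" columns
```

Rows do not need the same columns, and a read can be limited to one column family.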
Wide-column Databases
Column stores can greatly improve the performance of queries that only touch a small number of columns
 This is because they will only access these columns' particular data
 Simple math: table t has a total of 10 GB data, with
 column a: 4 GB
 column b: 2 GB
 column c: 3 GB
 column d: 1 GB
If a query only uses column d, at most 1 GB of data will be processed by a column store
In a row store, the full 10 GB will be processed
 Aggregate-Oriented
Suitable Use Cases
• Event Logging
• Content Management Systems
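The simple math from the slide can be written out as a short calculation. The scanned_gb function is a hypothetical helper; the column sizes are the ones given above.

```python
# Column sizes of table t in GB, as given on the slide.
column_sizes_gb = {"a": 4, "b": 2, "c": 3, "d": 1}

def scanned_gb(query_columns, layout):
    """Upper bound on data read by a query touching `query_columns`."""
    if layout == "row":
        # A row store reads whole rows, so every column is touched.
        return sum(column_sizes_gb.values())
    # A column store reads only the columns the query uses.
    return sum(column_sizes_gb[name] for name in query_columns)

column_store = scanned_gb(["d"], "column")
row_store = scanned_gb(["d"], "row")
```

For a query on column d alone, the column layout reads one tenth of what the row layout reads.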
Graph stores
 Are used to store information about networks, such
as social connections.
Graph Databases
 Allow you to store entities and the relationships between these entities
 Entities are known as nodes, which have properties
 Relations are known as edges, which also have properties
 A query on the graph is also known as traversing the graph
 Traversing the relationships is very fast
Suitable Use Cases
 Connected Data
 Routing, Dispatch and Location-Based Services
 Recommendation Engines
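Nodes, edges with properties, and traversal can be sketched with an adjacency list and a breadth-first search. The Graph class, the "type" edge property and the sample people are assumptions for this example; a real graph database offers a query language on top of the same idea.

```python
from collections import deque

class Graph:
    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> list of (target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, target, **properties):
        self.edges[source].append((target, properties))

    def within(self, start, max_hops, relation):
        """Breadth-first traversal along edges of one type, up to max_hops away."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for target, props in self.edges[node]:
                if props.get("type") == relation and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, hops + 1))
        return found

g = Graph()
for name in ("anna", "ben", "carla", "dev"):
    g.add_node(name, kind="person")
g.add_edge("anna", "ben", type="friend", since=2010)
g.add_edge("ben", "carla", type="friend", since=2012)
g.add_edge("carla", "dev", type="friend", since=2013)
g.add_edge("anna", "dev", type="colleague")

suggestions = g.within("anna", 2, "friend")  # friends and friends of friends
```

A "friends of friends" lookup like this is the core of a simple recommendation query.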
POLYGLOT PERSISTENCE
 In 2006, Neal Ford coined the term Polyglot Programming
 Applications should be written in a mix of languages to take advantage of the fact that
different languages are suitable for tackling different problems
 Polyglot Persistence defines a hybrid approach to persistence
 Using multiple data storage technologies
 Selected based on the way data is being used by individual applications
 Why store binary images in relational databases, when there are better storage
systems?
 Can occur both across the enterprise and within a single application
"Traditional": today we use the same database (RDBMS) for all
kinds of data
• Business transactions, session management data, reporting, logging information,
content information, ...
Need for same properties of availability, consistency
or backup requirements
Polyglot data storage usage allows you to mix and
match relational and NoSQL data stores
[Diagram labels: Shopping cart data, User Sessions, Completed Order, Product Catalog, Recommendations]
POLYGLOT PERSISTENCE – CHALLENGES
 Decisions
• Have to decide what data storage technology to use
• Today it is easier to go with relational
 New Data Access APIs
• Each data store has its own mechanisms for
accessing the data
• Different APIs
 Solution: Wrap the data access code into services
(Data/Entity Service) exposed to applications
 Will enforce a contract/schema on a schemaless database
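The "wrap data access into services" idea can be sketched as follows. All names here (CartStore, InMemoryKeyValueCartStore, CartService) are hypothetical: the service is the only code that knows which store is used, and it enforces a small contract on the otherwise schemaless cart.

```python
import json
from typing import Optional, Protocol

class CartStore(Protocol):
    """What the service needs from any backing store."""
    def load(self, user_id: str) -> Optional[dict]: ...
    def save(self, user_id: str, cart: dict) -> None: ...

class InMemoryKeyValueCartStore:
    """Stand-in for a key-value database; values are JSON strings."""

    def __init__(self):
        self._blobs = {}

    def load(self, user_id):
        blob = self._blobs.get(f"cart:{user_id}")
        return json.loads(blob) if blob is not None else None

    def save(self, user_id, cart):
        self._blobs[f"cart:{user_id}"] = json.dumps(cart)

class CartService:
    """Data service: hides the store and enforces the cart's shape."""
    REQUIRED_FIELDS = {"items"}

    def __init__(self, store: CartStore):
        self._store = store

    def add_item(self, user_id, product_id, quantity):
        cart = self._store.load(user_id) or {"items": {}}
        key = str(product_id)
        cart["items"][key] = cart["items"].get(key, 0) + quantity
        self._validate(cart)
        self._store.save(user_id, cart)
        return cart

    def _validate(self, cart):
        missing = self.REQUIRED_FIELDS - set(cart)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        if any(qty <= 0 for qty in cart["items"].values()):
            raise ValueError("quantities must be positive")

service = CartService(InMemoryKeyValueCartStore())
service.add_item("u1", 27, 1)
cart = service.add_item("u1", 27, 2)
```

Swapping the key-value stand-in for a different store only requires another class with the same load and save methods; the applications calling the service do not change.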
Replica Sets: High Availability
Replication is the process of synchronizing data across multiple servers.
Purpose of Replication
Replication provides redundancy and increases data availability.
With multiple copies of data on different database servers, replication protects a database from the loss of
a single server.
Replication also allows you to recover from hardware failure and service interruptions.
With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity.
Clients have the ability to send read and write operations to different servers.
You can also maintain copies in different data centers to increase the locality and availability of data for
distributed applications.
The primary accepts all write operations from clients. A replica
set can have only one primary. Because only one member can
accept write operations, replica sets provide strict consistency
for all reads from the primary.
The secondaries replicate the primary's oplog and apply the
operations to their data sets, so that the secondaries' data sets
reflect the primary's data set.
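Oplog-based replication can be sketched as a primary that records every write in an ordered log and secondaries that replay the entries they have not applied yet. The Member and Primary classes and the tuple format of the log entries are assumptions for this toy model; MongoDB's actual oplog format and sync protocol differ.

```python
class Member:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1

class Primary(Member):
    def __init__(self):
        super().__init__()
        self.oplog = []

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.oplog.append(entry)  # record the operation in the log
        self.apply(entry)         # and apply it to the primary's own data

def sync(secondary, primary):
    """Replay the oplog entries the secondary has not applied yet."""
    for entry in primary.oplog[secondary.applied:]:
        secondary.apply(entry)

primary, secondary = Primary(), Member()
primary.write("set", "x", 1)
primary.write("set", "y", 2)
sync(secondary, primary)
primary.write("delete", "x")
lagging = dict(secondary.data)  # the secondary has not seen the delete yet
sync(secondary, primary)
```

Between two sync calls a secondary may lag behind, which is why reads from secondaries can return older data than reads from the primary.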
Automatic Failover
When a primary does not communicate with the other members of the set for more than 10 seconds, the
replica set will attempt to select another member to become the new primary. The first secondary that
receives a majority of the votes becomes primary.
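The failover rule can be sketched in two small functions: one that decides whether the primary counts as unreachable (using the 10 second figure from the text above) and one that checks for a strict majority of votes. Both functions and the member names are hypothetical; this is not MongoDB's actual election protocol.

```python
def primary_unreachable(seconds_since_heartbeat, timeout=10.0):
    """True when the primary has been silent for longer than the timeout."""
    return seconds_since_heartbeat > timeout

def elect(members, votes):
    """Return the candidate with a strict majority of all members, else None.

    `members` is the full replica set; `votes` maps voter -> candidate and
    contains only the members that were reachable and voted.
    """
    needed = len(members) // 2 + 1
    tally = {}
    for voter, candidate in votes.items():
        if voter in members and candidate in members:
            tally[candidate] = tally.get(candidate, 0) + 1
    for candidate, count in tally.items():
        if count >= needed:
            return candidate
    return None

members = {"a", "b", "c", "d", "e"}  # "a" was the primary and went silent
majority_side = elect(members, {"b": "b", "c": "b", "d": "b", "e": "e"})
minority_side = elect(members, {"d": "d", "e": "d"})  # only two members reachable
```

Requiring a majority of the whole set, not just of the reachable members, is what keeps two sides of a network partition from each electing their own primary.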
Sharding: High Scalability and Throughput
Sharding is a method for storing data across multiple
machines.
Purpose of Sharding
Database systems with large data sets and high throughput applications can challenge the capacity of a
single server.
High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage
capacity of a single machine.
Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
Sharding, or horizontal scaling, divides the data set and distributes the data over multiple
servers, or shards, in contrast to vertical scaling, which adds capacity to a single server.
Each shard is an independent database, and collectively, the shards make up a single
logical database.
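One way to distribute data over shards is to hash each key and use the hash to pick a shard. The ShardedStore class below is a hypothetical in-memory sketch of that idea; hash-based routing is only one strategy (range-based partitioning is another), and the simple modulo used here would move most keys if the shard count changed.

```python
import hashlib

class ShardedStore:
    """Toy sharded store: each shard is an independent dict."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

    def count(self):
        return sum(len(shard) for shard in self.shards)

store = ShardedStore(3)
for i in range(100):
    store.put(f"user:{i}", i)
```

Callers use put and get as if there were one database, while each key lives on exactly one shard.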
Map-Reduce
The map-reduce pattern is a way to organize processing in such a way as to take advantage of multiple
machines on a cluster while keeping as much processing and the data it needs together on the same
machine.
It first gained prominence with Google's MapReduce framework.
"Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes
them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure.
The worker node processes the smaller problem and passes the answer back to its master node.
"Reduce" step: The master node then collects the answers to all the sub-problems and combines them
in some way to form the output – the answer to the problem it was originally trying to solve.
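The two steps can be sketched with the classic word-count example, run here in a single process. The function names are assumptions, and this sketch uses the key-value form of map-reduce (map emits pairs, a shuffle groups them by key, reduce combines each group) rather than a real cluster with master and worker nodes.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_lists):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each map_phase call is independent, so it could run on a different worker.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["NoSQL is not only SQL", "SQL is not dead"])
```

Because each chunk is mapped on its own, the map work can be placed on the machine that already holds that chunk of data.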
Advantages of MongoDB over RDBMS
Schemaless: MongoDB is a document database in which one collection holds different
documents.
The number of fields, content and size of the document can differ from one document to
another.
The structure of a single object is clear
No complex joins
Deep query-ability: MongoDB supports dynamic queries on documents using a
document-based query language that's nearly as powerful as SQL
Ease of scale-out: MongoDB is easy to scale
 Why use MongoDB?
  Document Oriented Storage : Data is stored in the form of JSON style documents
  Index on any attribute
  Replication & High Availability
  Auto-Sharding
  Rich Queries
  Fast In-Place Updates
  Professional Support By MongoDB
 Where to use MongoDB?
  Big Data
  Content Management and Delivery
  Mobile and Social Infrastructure
  User Data Management
  Data Hub
Storage Type: Document
 http://www.mongodb.com/scale
 http://www.mongodb.com/partners/cloud/microsoft
 http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/
 http://www.mongodb.com/leading-nosql-database
 http://nosql.findthebest.com/
 http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
 http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb
Azure offered as a Service:
 https://mongolab.com/welcome/
MongoDB offered as a Service:
 http://www.objectrocket.com/
 https://www.mongohq.com/
Thank You

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 

NOSQL

  • 1. Akbar Shaikh | Monocept
  • 2. [Chart: "Data" plotted over the years 2002, 2004, 2006, 2008, 2010, 2012]
  • 3. Data
      • Facebook had 60k servers in 2010
      • Google had 450k servers in 2006 (speculated)
      • Microsoft: between 100k and 500k servers (since Azure)
      • Amazon: likely has similar numbers, too (S3)
  • 4. ACID:
      • Atomicity: Either everything in a transaction succeeds, or the whole transaction is rolled back.
      • Consistency: A transaction cannot leave the database in an inconsistent state.
      • Isolation: One transaction cannot interfere with another.
      • Durability: A completed transaction persists, even after applications restart.
  • 5. The point I am trying to make here is that we may have to look beyond ACID to something called BASE, coined by Eric Brewer:
      • Basic availability: Each request is guaranteed a response, whether the execution succeeded or failed.
      • Soft state: The state of the system may change over time, at times without any input (for eventual consistency).
      • Eventual consistency: The database may be momentarily inconsistent but will be consistent eventually.
  • 6. Eric Brewer also noted that it is impossible for a distributed computer system to provide consistency, availability and partition tolerance simultaneously. This is more commonly referred to as the CAP theorem.
      • Consistency: Data access in a distributed database is considered to be consistent when an update written on one node is immediately available on another node.
      • Availability: The system guarantees availability for requests even though one or more nodes are down.
      • Partition tolerance: Nodes can be physically separated from each other at any given point and for any length of time. The time they're not able to reach each other, due to routing problems, network interface troubles, or firewall issues, is called a network partition. During the partition, all nodes should still be able to serve both read and write requests. Ideally the system automatically reconciles updates as soon as every node can reach every other node again.
  • 7. ACID versus BASE
      • ACID: strong consistency for transactions is the highest priority; availability is less important; pessimistic; complex mechanisms
      • BASE: availability and scaling are the highest priorities; weak consistency; optimistic; simple and fast
  • 11.
        {
          "customer": {
            "billingAddress": [ { "city": "Chicago" } ],
            "id": 1,
            "name": "Martin",
            "orders": [
              {
                "customerId": 1,
                "id": 99,
                "orderItems": [
                  { "price": 32.45, "productId": 27, "productName": "NoSQL Distilled" }
                ],
                "orderPayment": [
                  {
                    "billingAddress": { "city": "Chicago" },
                    "ccinfo": "1000-1000-1000-1000",
                    "txnId": "abelif879rft"
                  }
                ],
                "shippingAddress": [ { "city": "Chicago" } ]
              }
            ]
          }
        }
  • 12. We see two primary reasons why people consider using a NoSQL database.
      • Application development productivity. A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application's needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.
      • Large-scale data. Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is designed to run on a single machine, but it is usually more economical to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they make a better fit for big data scenarios.
  • 13.
      • For almost as long as we've been in the software profession, relational databases have been the default choice for serious data storage, especially in the world of enterprise applications.
      • If you're an architect starting a new project, your only choice is likely to be which relational database to use.
      • After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise.
  • 14. NoSQL databases have a lot more to offer than just solving the problems of scale:
      • Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This means that you don't have to think too far ahead to define a structure, and you can continue to evolve it over time, including adding new fields or even nesting the data, for example in the case of a JSON representation.
      • Development time: I have heard stories about reduced development time because one doesn't have to deal with complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to create your final view?
      • Speed: Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of milliseconds, especially over mobile and other intermittently connected devices, you have a much higher probability of winning users over.
      • Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden spikes of load. Of course, you win users over straightaway.
  • 15. Some NoSQL use cases
      1. Massive data volumes
          • Massively distributed architecture required to store the data
          • Google, Amazon, Yahoo, Facebook…
      2. Extreme query workload
          • Impossible to efficiently do joins at that scale with an RDBMS
      3. Schema evolution
          • Schema flexibility (migration) is not trivial at large scale
          • Schema changes can be gradually introduced with NoSQL
  • 18. Key-value stores
      • The main idea here is using a hash table where there is a unique key and a pointer to a particular item of data. The key/value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.
      • One key → one value, very fast
      • Key: hash (no duplicates)
      • Value: binary object ("BLOB"); the DB does not understand your content
      • [Figure: the key "customer_22" maps to an opaque binary value, captioned "GIVE ME A MEANING!"]
  • 19. Key Value Databases
      • A key-value store is a simple hash table
      • Primarily used when all access to the database is via primary key
      • Simplest NoSQL data stores to use (from an API perspective): PUT, GET, DELETE (matches REST)
      • Value is a blob, with the data store not caring or knowing what is inside
      • Aggregate-oriented
      • Suitable use cases: storing session information; user profiles and preferences; shopping cart data
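The PUT, GET, DELETE interface on slide 19 can be sketched in a few lines of Python. This is only an in-memory toy under stated assumptions: the class name `KeyValueStore`, its method names and the `session:42` key are hypothetical and do not come from any particular product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical): one key maps to one opaque value."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; to the store it is just a blob.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by primary key only; there is no query on the value's content.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("session:42", b'{"user": "martin", "cart": [27]}')
session_blob = store.get("session:42")
removed = store.delete("session:42")
```

Because the value is opaque, changing a single field of the session means reading the whole blob, modifying it in the application, and writing it back, which is the inefficiency slide 18 mentions.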
  • 20. Document databases
      • These were inspired by Lotus Notes and are similar to key-value stores.
      • The model is basically versioned documents that are collections of other key-value collections.
      • The semi-structured documents are stored in formats like JSON.
      • Document databases are essentially the next level of key/value, allowing nested values associated with each key.
      • Document databases support querying more efficiently.
  • 21. Document Databases
      • Documents are the main concept
      • Stores and retrieves documents, which can be XML, JSON, BSON, …
      • Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values
      • Documents stored are similar to each other but do not have to be exactly the same
      • Aggregate-oriented
      • Suitable use cases: event logging; content management systems; web analytics or real-time analytics; product catalog
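What distinguishes a document store from a plain key-value store is that it can look inside the value. A minimal sketch of that idea, assuming documents are plain Python dictionaries; the helper names `get_path` and `find`, the dotted-path convention, and the second product are all hypothetical illustrations, not the API of any real database.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as 'details.format'; return None if any part is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Document], path: str, expected: Any) -> List[Document]:
    """Return every document whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


# Documents are similar to each other but need not share exactly the same fields.
products = [
    {"id": 27, "name": "NoSQL Distilled", "details": {"format": "paperback"}},
    {"id": 28, "name": "Sample Ebook", "details": {"format": "ebook", "pages": 120}},
]

paperbacks = find(products, "details.format", "paperback")
```

A real document database would typically back such queries with indexes rather than scanning every document as this sketch does.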
  • 22. Wide-column stores
      • Often referred to as "BigTable clones"
      • "a sparse, distributed multi-dimensional sorted map"
      • These were created to store and process very large amounts of data distributed over many machines.
      • There are still keys, but they point to multiple columns. The columns are arranged by column family.
  • 23. Wide-column Databases
      • Column stores can greatly improve the performance of queries that only touch a small number of columns
      • This is because they will only access these columns' particular data
      • Simple math: table t has a total of 10 GB of data, with column a: 4 GB, column b: 2 GB, column c: 3 GB, column d: 1 GB
      • If a query only uses column d, at most 1 GB of data will be processed by a column store; in a row store, the full 10 GB will be processed
      • Aggregate-oriented
      • Suitable use cases: event logging; content management systems
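The "simple math" on slide 23 can be written out as a small calculation. The column sizes are the ones given on the slide; the function names are hypothetical, and the model deliberately ignores compression, indexes and caching.

```python
from typing import Dict, Iterable

# Sizes of the four columns of table t, in GB, as given on the slide.
column_sizes_gb: Dict[str, int] = {"a": 4, "b": 2, "c": 3, "d": 1}


def gb_scanned_column_store(columns_used: Iterable[str]) -> int:
    """A column store reads only the columns the query touches."""
    return sum(column_sizes_gb[name] for name in columns_used)


def gb_scanned_row_store(columns_used: Iterable[str]) -> int:
    """A row store reads whole rows, so every column is scanned regardless of the query."""
    return sum(column_sizes_gb.values())


column_cost = gb_scanned_column_store(["d"])
row_cost = gb_scanned_row_store(["d"])
```

For a query that uses only column d, `column_cost` is 1 GB and `row_cost` is the full 10 GB, matching the figures on the slide.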
  • 24. Graph stores
      • Are used to store information about networks, such as social connections.
  • 25. Graph Databases
      • Allow you to store entities and the relationships between these entities
      • Entities are known as nodes, which have properties
      • Relations are known as edges, which also have properties
      • A query on the graph is also known as traversing the graph
      • Traversing the relationships is very fast
      • Suitable use cases: connected data; routing, dispatch and location-based services; recommendation engines
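"Traversing the graph" can be illustrated with a breadth-first walk over an adjacency list. This is a sketch under assumptions: the `follows` data and the function `reachable_within` are made up for illustration, and real graph databases expose their own query languages rather than this code.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical social graph: each node maps to the nodes it has an edge to.
follows: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable_within(graph: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Breadth-first traversal: nodes reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


direct = reachable_within(follows, "alice", 1)
friends_of_friends = reachable_within(follows, "alice", 2)
```

A recommendation engine of the kind listed under the use cases might, for example, suggest the nodes that appear at two hops but not at one.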
  • 26. POLYGLOT PERSISTENCE
      • In 2006, Neal Ford coined the term Polyglot Programming
          • Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
      • Polyglot Persistence defines a hybrid approach to persistence
          • Using multiple data storage technologies
          • Selected based on the way data is being used by individual applications
          • Why store binary images in relational databases, when there are better storage systems?
          • Can occur both across the enterprise and within a single application
  • 27. POLYGLOT PERSISTENCE
      • "Traditional": today we use the same database for all kinds of data
          • Business transactions, session management data, reporting, logging information, content information, ...
          • Need for same properties of availability, consistency or backup requirements
          • [Diagram: shopping cart data, user sessions, completed orders, product catalog and recommendations all stored in a single RDBMS]
      • Polyglot data storage usage allows you to mix and match relational and NoSQL data stores
  • 28. POLYGLOT PERSISTENCE: CHALLENGES
      • Decisions
          • Have to decide what data storage technology to use
          • Today it is easier to go with relational
      • New data access APIs
          • Each data store has its own mechanisms for accessing the data
          • Different APIs
      • Solution: wrap the data access code into services (Data/Entity Service) exposed to applications
          • Will enforce a contract/schema on a schemaless database
  • 29. Replica Sets: High Availability
      • Replication is the process of synchronizing data across multiple servers.
      • Purpose of replication: Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
      • In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
  • 30. Replica Sets: High Availability
      • The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency.
      • The secondaries replicate the primary's oplog and apply the operations to their data sets. Secondaries' data sets reflect the primary's data set.
  • 31. Replica Sets: High Availability
      • Automatic failover: When a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary. The first secondary that receives a majority of the votes becomes primary.
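The two conditions on slide 31, a silent primary and a majority vote, can be sketched as two small predicates. This is a simplified model only: the function names are hypothetical, the 10-second threshold is the one quoted on the slide, and the code does not reproduce MongoDB's actual election protocol.

```python
def primary_unreachable(seconds_since_last_heartbeat: float, timeout_seconds: float = 10.0) -> bool:
    """The primary counts as lost once it has been silent for longer than the timeout."""
    return seconds_since_last_heartbeat > timeout_seconds


def has_majority(votes_received: int, voting_members: int) -> bool:
    """A candidate needs strictly more than half of all voting members, not just of those reachable."""
    return votes_received > voting_members // 2


# Three-member replica set: the primary has been silent for 12 seconds,
# and a secondary collects its own vote plus one other.
should_elect = primary_unreachable(12.0)
wins_election = has_majority(votes_received=2, voting_members=3)
```

Requiring a strict majority is what prevents two sides of a network partition from each electing their own primary: at most one side can hold more than half of the votes.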
  • 32. Sharding: High Scalability and Throughput
      • Sharding is a method for storing data across multiple machines.
      • Purpose of sharding: Database systems with large data sets and high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage capacity of a single machine. Finally, working set sizes larger than the system's RAM stress the I/O capacity of disk drives.
  • 33. Sharding: High Scalability and Throughput
      • Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards. This contrasts with vertical scaling, which adds more capacity to a single server.
      • Each shard is an independent database, and collectively, the shards make up a single logical database.
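One common way to decide which shard holds a given record is to hash its key. The sketch below assumes hash-based routing over three in-memory dictionaries; the names `shard_for`, `put` and `get` are hypothetical, and this is not a description of how MongoDB's own auto-sharding assigns data.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard, every run)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each shard is an independent store; together they form one logical database.
shards: List[Dict[str, Any]] = [dict() for _ in range(3)]


def put(key: str, value: Any) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str) -> Optional[Any]:
    return shards[shard_for(key, len(shards))].get(key)


for customer_id in range(20):
    put(f"customer_{customer_id}", {"id": customer_id})

customer = get("customer_7")
```

A caveat with plain modulo routing is that changing the number of shards remaps most keys, which is why production systems tend to use range-based or consistent-hashing schemes instead.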
  • 34. Map-Reduce
      • The map-reduce pattern is a way to organize processing so as to take advantage of multiple machines on a cluster while keeping as much of the processing and the data it needs together on the same machine. It first gained prominence with Google's MapReduce framework.
      • "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.
      • "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
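The map and reduce steps can be illustrated with the classic word-count example, run here in a single process. The three input chunks stand in for data held on three worker nodes; the function names and sample text are hypothetical, and a real framework would run the map calls in parallel on separate machines.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: each worker turns its own chunk into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key so each key can be reduced independently."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into the final answer."""
    return {word: sum(counts) for word, counts in grouped.items()}


chunks = ["NoSQL scales out", "SQL scales up", "NoSQL and SQL"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```

Only the small (word, count) pairs need to move between machines, while the raw text stays where it was mapped, which is the data-locality point the slide makes.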
  • 37. Advantages of MongoDB over RDBMS
      • Schemaless: MongoDB is a document database in which one collection holds different documents. The number of fields, content and size of the document can differ from one document to another.
      • The structure of a single object is clear
      • No complex joins
      • Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL
      • Ease of scale-out: MongoDB is easy to scale
  • 38. Why use MongoDB?
      • Document-oriented storage: data is stored in the form of JSON-style documents
      • Index on any attribute
      • Replication and high availability
      • Auto-sharding
      • Rich queries
      • Fast in-place updates
      • Professional support by MongoDB
     Where to use MongoDB?
      • Big data
      • Content management and delivery
      • Mobile and social infrastructure
      • User data management
      • Data hub
  • 42.
      • http://www.mongodb.com/scale
      • http://www.mongodb.com/partners/cloud/microsoft
      • http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/
      • http://www.mongodb.com/leading-nosql-database
      • http://nosql.findthebest.com/
      • http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb
      • Azure offered as a Service:
          • https://mongolab.com/welcome/
      • MongoDB offered as a Service:
          • http://www.objectrocket.com/
          • https://www.mongohq.com/

Editor's Notes

  1. http://downloadsquad.switched.com/2010/06/29/facebook-doubles-its-server-count-from-30-000-to-60-000-in-just-6-months/ by Sebastian Anthony on June 29, 2010 at 10:00 AM