1
 Akbar Shaikh | Monocept
2
[Figure: growth of data over time, 2002 to 2012]
3
Data
 Facebook had 60k servers in 2010
 Google had 450k servers in 2006 (speculated)
 Microsoft: between 100k and 500k servers (since Azure)
 Amazon: likely has similar numbers, too (S3)
ACID
 Atomicity: Either everything in a transaction succeeds or the whole transaction is rolled back.
 Consistency: A transaction cannot leave the database in an inconsistent state.
 Isolation: One transaction cannot interfere with another.
 Durability: A completed transaction persists, even after applications restart.
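As a minimal sketch of atomicity, the example below uses Python's built-in sqlite3 module with an in-memory database. The accounts table, the transfer function and the balance rule are assumptions made up for illustration; they are not part of the original slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so both updates are rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer leaves no partial update behind: the debit and the credit are undone together.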
4
BASE
 Basically available: Every request is guaranteed a response, whether the execution succeeded or failed.
 Soft state: The state of the system may change over time, at times without any
input (for eventual consistency).
 Eventual consistency: The database may be momentarily inconsistent but will be
consistent eventually.
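Eventual consistency can be sketched with two in-memory replicas that exchange their newest entries. The Replica class, the version counter and the sync function are hypothetical names for illustration; real systems use more elaborate conflict resolution.

```python
class Replica:
    """One node holding key -> (version, value); the higher version wins."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Anti-entropy pass: copy entries both ways so each replica keeps the newest version."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)

r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
stale = r2.read("cart")      # r2 has not seen the write yet
sync(r1, r2)
converged = r2.read("cart")  # after syncing, both replicas agree
```

Between the write and the sync, a reader of r2 sees stale data; that window is the "momentarily inconsistent" state described above.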
5
The point here is that we may have to look beyond ACID to something called BASE, a term coined by Eric Brewer. Brewer's CAP theorem rests on three properties of a distributed database:
 Consistency : Data access in a distributed database is considered to be consistent when an
update written on one node is immediately available on another node.
 Availability : The system guarantees availability for requests even though one or more
nodes are down.
 Partition Tolerance : Nodes can be physically separated from each other at any given
point and for any length of time. The time they're not able to reach each other, due to
routing problems, network interface troubles, or firewall issues, is called a network
partition. During the partition, all nodes should still be able to serve both read and write
requests. Ideally the system automatically reconciles updates as soon as every node can
reach every other node again.
6
Eric Brewer also noted that it is impossible for a distributed computer system to provide
consistency, availability and partition tolerance simultaneously. This is more commonly referred
to as the CAP theorem.
ACID
 Strong consistency for transactions is the highest priority
 Availability is less important
 Pessimistic
 Complex mechanisms
BASE
 Availability and scaling are the highest priorities
 Weak consistency
 Optimistic
 Simple and fast
7
8
9
10
11
{ "customer" : { "billingAddress" : [ { "city" : "Chicago" } ],
    "id" : 1,
    "name" : "Martin",
    "orders" : [ { "customerId" : 1,
        "id" : 99,
        "orderItems" : [ { "price" : 32.45,
            "productId" : 27,
            "productName" : "NoSQL Distilled"
          } ],
        "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" },
            "ccinfo" : "1000-1000-1000-1000",
            "txnId" : "abelif879rft"
          } ],
        "shippingAddress" : [ { "city" : "Chicago" } ]
      } ]
  }
}
We see two primary reasons why people consider using a NoSQL database.
 Application development productivity.
A lot of application development effort is spent on mapping data between in-memory
data structures and a relational database. A NoSQL database may provide a data model
that better fits the application’s needs, thus simplifying that interaction and resulting in
less code to write, debug, and evolve.
 Large-scale data.
Organizations are finding it valuable to capture more data and process it more quickly.
They are finding it expensive, if even possible, to do so with relational databases. The
primary reason is that a relational database is designed to run on a single machine, but
it is usually more economical to run large data and computing loads on clusters of many
smaller and cheaper machines. Many NoSQL databases are designed explicitly to run
on clusters, so they make a better fit for big data scenarios.
12
 For almost as long as we’ve been in the software profession, relational databases
have been the default choice for serious data storage, especially in the world of
enterprise applications.
 If you’re an architect starting a new project, your only choice is likely to be which
relational database to use.
 After such a long period of dominance, the current excitement about NoSQL
databases comes as a surprise.
13
 Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This means that you don't have to think too far ahead to define a structure, and you can continue to evolve it over time, including adding new fields or even nesting the data, for example in a JSON representation.
 Development time : I have heard stories about reduced development time because one doesn’t have to deal with
complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to
create your final view?
 Speed : Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of
milliseconds—especially over mobile and other intermittently connected devices—you have much higher probability
of winning users over.
 Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden spikes of load. Of course, you win users over straightaway.
14
NoSQL databases have a lot more to offer than just solving the problems of scale, as the benefits listed above show.
Some NoSQL use cases
1. Massive data volumes
 Massively distributed architecture required to store the data
 Google, Amazon, Yahoo, Facebook…
2. Extreme query workload
 Impossible to efficiently do joins at that scale with an RDBMS
3. Schema evolution
 Schema flexibility (migration) is not trivial at large scale
 Schema changes can be gradually introduced with NoSQL
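Gradual schema evolution can be sketched as a reader that accepts both the old and the new document shape, so existing records never need a bulk migration. The field names (name, firstName, lastName) and the sample records are assumptions for illustration only.

```python
def full_name(doc):
    """Read a customer name from either the old or the new document shape."""
    if "name" in doc:  # old shape: a single "name" field
        return doc["name"]
    # new shape (hypothetical): separate first and last name fields
    return f'{doc["firstName"]} {doc["lastName"]}'

customers = [
    {"id": 1, "name": "Martin"},                              # written before the change
    {"id": 2, "firstName": "Ada", "lastName": "Lovelace"},    # written after the change
]
names = [full_name(c) for c in customers]
```

Old documents can be rewritten to the new shape lazily, for example whenever they are next updated.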
15
16
17
Key-value stores
The main idea here is using a hash table where there is a unique key and a pointer to a particular item of data. The key/value model is the simplest and easiest to implement. But it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.
One key -> one value, very fast
Key: hash (no duplicates)
Value: binary object ("BLOB"); the DB does not understand your content
[Figure: the key customer_22 maps to an opaque binary value that the database cannot interpret]
18
 A key-value store is a simple hash table
 Primarily used when all access to the database is via primary key
 Simplest NoSQL data stores to use (from an API perspective): PUT, GET, DELETE (matches REST)
 Value is a blob with the data store not caring or knowing what is inside
 Aggregate-Oriented
Suitable Use Cases
 Storing Session Information
 User Profiles, Preferences
 Shopping Cart Data
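The PUT, GET, DELETE interface can be sketched with a plain dictionary holding opaque bytes. The KeyValueStore class and the customer_22 profile below are illustrative assumptions, not the API of any particular product.

```python
import json
from typing import Optional

class KeyValueStore:
    """A toy key-value store: the store never looks inside the value."""

    def __init__(self):
        self._table = {}

    def put(self, key: str, value: bytes) -> None:
        self._table[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._table.get(key)

    def delete(self, key: str) -> None:
        self._table.pop(key, None)  # deleting a missing key is a no-op

store = KeyValueStore()
store.put("customer_22", json.dumps({"name": "Martin", "city": "Chicago"}).encode())

# Updating one field means fetching, decoding and rewriting the whole value,
# because only the application understands the blob's contents.
profile = json.loads(store.get("customer_22"))
profile["city"] = "Boston"
store.put("customer_22", json.dumps(profile).encode())
```

The read-modify-write at the end shows why partial updates are inefficient in this model.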
19
Key Value Databases
Document databases
These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON.
Document databases are essentially the next level of key/value, allowing nested values associated with each key. Document databases support querying more efficiently.
20
 Documents are the main concept
 Stores and retrieves documents, which can be XML, JSON, BSON, …
 Documents are self-describing, hierarchical tree data structures which can
consist of maps, collections and scalar values
 Documents stored are similar to each other but do not have to be exactly the same
 Aggregate-Oriented
Suitable Use Cases
 Event Logging
 Content Management Systems
 Web Analytics or Real-Time Analytics
 Product Catalog
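Unlike a key-value store, a document store can look inside the value. The sketch below queries a list of dictionaries by a dotted field path; get_path, find and the sample product catalog are hypothetical and only stand in for a real query engine.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'details.format'; return None if any step is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, value):
    """Return every document whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]

# Documents in one collection are similar but not identical in structure.
products = [
    {"_id": 1, "title": "Book A", "details": {"format": "paperback"}},
    {"_id": 2, "title": "Desk lamp", "details": {"format": "boxed"}, "watts": 40},
    {"_id": 3, "title": "Notebook"},
]
paperbacks = find(products, "details.format", "paperback")
lamps = find(products, "watts", 40)
```

Documents that lack the queried field are simply skipped, which is what makes mixed structures workable.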
21
Documents Databases
Wide-column stores
Often referred to as "BigTable clones": "a sparse, distributed multi-dimensional sorted map"
These were created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns. The columns are arranged by column family.
22
Column stores can greatly improve the performance of queries that only touch a small number of columns
 This is because they will only access these columns' particular data
 Simple math: table t has a total of 10 GB data, with
 column a: 4 GB
 column b: 2 GB
 column c: 3 GB
 column d: 1 GB
If a query only uses column d, at most 1 GB of data will be processed by a column store
In a row store, the full 10 GB will be processed
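The arithmetic above can be written out as a small calculation. The gb_scanned function is a hypothetical helper; it only models the amount of data read, not a real storage engine.

```python
# Column sizes of table t, in GB, as given in the example.
column_sizes_gb = {"a": 4, "b": 2, "c": 3, "d": 1}

def gb_scanned(columns_used, layout):
    """Upper bound on data read: a row store reads whole rows, a column store only the used columns."""
    if layout == "row":
        return sum(column_sizes_gb.values())
    return sum(column_sizes_gb[c] for c in columns_used)

row_cost = gb_scanned({"d"}, "row")        # every column is read: 4 + 2 + 3 + 1
column_cost = gb_scanned({"d"}, "column")  # only column d is read
```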
 Aggregate-Oriented
Suitable Use Cases
• Event Logging
• Content Management Systems
23
Wide-column Databases
Graph stores
 Are used to store information about networks, such as social connections.
24
 Allow you to store entities and relationships between these entities
 Entities are known as nodes, which have properties
 Relations are known as edges, which also have properties
 A query on the graph is also known as traversing the graph
 Traversing the relationships is very fast
Suitable Use Cases
 Connected Data
 Routing, Dispatch and Location-Based Services
 Recommendation Engines
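Traversing a graph can be sketched as a breadth-first search over an adjacency list. The follows graph and the within_hops function are assumptions for illustration; a graph database would also store properties on nodes and edges.

```python
from collections import deque

# Hypothetical "follows" relationships: node -> nodes it points to.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

def within_hops(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen

friends = within_hops(follows, "alice", 1)
friends_of_friends = within_hops(follows, "alice", 2)
```

A "friends of friends" recommendation is then just the two-hop result minus the one-hop result.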
25
Graph Databases
POLYGLOT PERSISTENCE
 In 2006, Neal Ford coined the term Polyglot Programming
 Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
 Polyglot Persistence defines a hybrid approach to persistence
 Using multiple data storage technologies
 Selected based on the way data is being used by individual applications
 Why store binary images in relational databases, when there are better storage systems?
 Can occur both across the enterprise and within a single application
26
27
POLYGLOT PERSISTENCE
"Traditional": today we use the same database (RDBMS) for all kinds of data
• Business transactions, session management data, reporting, logging information, content information, ...
• Need for same properties of availability, consistency or backup requirements
[Figure: shopping cart data, user sessions, completed orders, product catalog and recommendations all stored in one RDBMS]
Polyglot data storage usage allows you to mix and match relational and NoSQL data stores
28
POLYGLOT PERSISTENCE – CHALLENGES
 Decisions
• Have to decide what data storage technology to use
• Today it is easier to go with relational
 New Data Access APIs
• Each data store has its own mechanisms for accessing the data
• Different APIs
 Solution: Wrap the data access code into services (Data/Entity Service) exposed to applications
 This will enforce a contract/schema on a schemaless database
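A data service of this kind can be sketched as a thin class that validates records before they reach the schemaless store. CustomerService, its REQUIRED contract and the dictionary backend are hypothetical; any store with key-based access could sit behind it.

```python
class CustomerService:
    """Wraps a schemaless backend and enforces a minimal contract on writes."""

    REQUIRED = {"id": int, "name": str}  # assumed contract: field name -> expected type

    def __init__(self, store):
        self._store = store  # any dict-like schemaless backend

    def save(self, customer):
        for field, expected in self.REQUIRED.items():
            if not isinstance(customer.get(field), expected):
                raise ValueError(f"customer.{field} must be {expected.__name__}")
        self._store[customer["id"]] = dict(customer)

    def load(self, customer_id):
        return self._store.get(customer_id)

service = CustomerService({})
service.save({"id": 1, "name": "Martin"})
loaded = service.load(1)
```

Applications talk only to the service, so the underlying store can be swapped without changing their code.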
29
Replica Sets: High Availability
Replication is the process of synchronizing data across multiple servers.
Purpose of Replication
Replication provides redundancy and increases data availability.
With multiple copies of data on different database servers, replication protects a database from the loss of
a single server.
Replication also allows you to recover from hardware failure and service interruptions.
With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity.
Clients have the ability to send read and write operations to different servers.
You can also maintain copies in different data centers to increase the locality and availability of data for
distributed applications.
30
Replica Sets: High Availability
The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency for reads from the primary.
The secondaries replicate the primary's oplog and apply the operations to their data sets. Secondaries' data sets reflect the primary's data set.
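The primary and secondary roles can be sketched as an append-only operation log that a secondary replays in order. The Member and Primary classes and the replicate function are simplified assumptions, not MongoDB's actual implementation.

```python
class Member:
    """A replica set member with its own copy of the data."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

class Primary(Member):
    """The only member that accepts writes; every write is recorded in the oplog."""

    def __init__(self):
        super().__init__()
        self.oplog = []

    def write(self, key, value):
        self.oplog.append((key, value))
        self.data[key] = value
        self.applied = len(self.oplog)

def replicate(primary, secondary):
    """Apply, in order, the oplog entries the secondary has not seen yet."""
    for key, value in primary.oplog[secondary.applied:]:
        secondary.data[key] = value
    secondary.applied = len(primary.oplog)

primary, secondary = Primary(), Member()
primary.write("x", 1)
primary.write("x", 2)
lagging = dict(secondary.data)  # the secondary has not replicated yet
replicate(primary, secondary)
```

Until replicate runs, a read from the secondary can return older data than a read from the primary.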
31
Replica Sets: High Availability
Automatic Failover
When a primary does not communicate with the other members of the set for more than 10 seconds, the
replica set will attempt to select another member to become the new primary. The first secondary that
receives a majority of the votes becomes primary.
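The majority rule can be sketched as a simple vote count. The elect function and member names are hypothetical; the real election protocol involves more checks than this.

```python
def elect(candidate_votes, set_size):
    """Return the candidate holding a strict majority of the whole set, or None."""
    needed = set_size // 2 + 1
    for candidate, votes in candidate_votes.items():
        if votes >= needed:
            return candidate
    return None

# Three-member set whose primary is unreachable: two votes out of three is a majority.
winner = elect({"secondary-a": 2, "secondary-b": 0}, set_size=3)

# Five-member set with a split vote: nobody reaches three votes, so no primary is chosen.
no_winner = elect({"secondary-a": 2, "secondary-b": 2}, set_size=5)
```

Counting the majority against the full set size, not just the reachable members, is what prevents two primaries during a network partition.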
32
Sharding: High Scalability and Throughput
Sharding is a method for storing data across multiple machines.
Purpose of Sharding
Database systems with large data sets and high throughput applications can challenge the capacity of a
single server.
High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage
capacity of a single machine.
Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
33
Sharding: high scalability and throughput
Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively the shards make up a single logical database.
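One way to distribute data over shards is to hash each key and route it to a shard by the hash value. The sketch below assumes hash-based routing with three in-memory shards; shard_for, put and get are illustrative names, and real systems also support range-based schemes and rebalancing.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Each shard is an independent store; together they form one logical database.
shards = [dict() for _ in range(3)]

def put(key, doc):
    shards[shard_for(key, len(shards))][key] = doc

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

for i in range(100):
    put(f"customer_{i}", {"id": i})

total = sum(len(shard) for shard in shards)
```

The caller sees a single put/get interface, while each document lives on exactly one shard.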
34
Map-Reduce
The map-reduce pattern is a way to organize processing in such a way as to take advantage of multiple
machines on a cluster while keeping as much processing and the data it needs together on the same
machine.
It first gained prominence with Google's MapReduce framework.
"Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
"Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
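The two steps can be sketched with a word count that runs in a single process. The function names and input chunks are assumptions for illustration; in a real cluster each chunk would be mapped on the machine that stores it.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a chunk of input handled by one worker.
chunks = ["nosql scales out", "sql scales up", "nosql and sql"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across many machines.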
35
36
Advantages of MongoDB over RDBMS
Schemaless: MongoDB is a document database in which one collection holds different documents. The number of fields, content and size of a document can differ from one document to another.
Structure of a single object is clear
No complex joins
Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL
Ease of scale-out: MongoDB is easy to scale
37
 Why use MongoDB?
  Document Oriented Storage : Data is stored in the form of JSON style documents
  Index on any attribute
  Replication & High Availability
  Auto-Sharding
  Rich Queries
  Fast In-Place Updates
  Professional Support By MongoDB
 Where to use MongoDB?
  Big Data
  Content Management and Delivery
  Mobile and Social Infrastructure
  User Data Management
  Data Hub
38
39
40
41
Storage Type: Document
 http://www.mongodb.com/scale
 http://www.mongodb.com/partners/cloud/microsoft
 http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/
 http://www.mongodb.com/leading-nosql-database
 http://nosql.findthebest.com/
 http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
 http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb
Azure offered as a Service:
 https://mongolab.com/welcome/
MongoDB offered as a Service:
 http://www.objectrocket.com/
 https://www.mongohq.com/
42
43
44
Thank You

GSK & SEAMANSHIP-IV LIFE SAVING APPLIANCES .pptx
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 

NOSQL

  • 1. Akbar Shaikh | Monocept
  • 2. [Chart: data, 2002 to 2012]
  • 3. Data
      • Facebook had 60k servers in 2010
      • Google had 450k servers in 2006 (speculated)
      • Microsoft: between 100k and 500k servers (since Azure)
      • Amazon: likely has a similar number, too (S3)
  • 4. ACID
      • Atomicity: Everything in a transaction succeeds, or the whole transaction is rolled back.
      • Consistency: A transaction cannot leave the database in an inconsistent state.
      • Isolation: One transaction cannot interfere with another.
      • Durability: A completed transaction persists, even after the application restarts.
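As a minimal sketch of atomicity, the snippet below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for illustration; they do not come from the slides. A transfer that would violate a constraint is rolled back as a whole, so no partial update survives.

```python
import sqlite3

# Hypothetical schema for illustration: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits `bob` first and only then fails on the debit, which is the case where atomicity matters: without the rollback, money would appear from nowhere.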
  • 5. The point I am trying to make here is that we may have to look beyond ACID to something called BASE, coined by Eric Brewer:
      • Basic availability: Each request is guaranteed a response, whether the execution succeeded or failed.
      • Soft state: The state of the system may change over time, at times without any input (for eventual consistency).
      • Eventual consistency: The database may be momentarily inconsistent but will be consistent eventually.
  • 6. Eric Brewer also noted that it is impossible for a distributed computer system to provide consistency, availability and partition tolerance simultaneously. This is more commonly referred to as the CAP theorem.
      • Consistency: Data access in a distributed database is considered to be consistent when an update written on one node is immediately available on another node.
      • Availability: The system guarantees availability for requests even though one or more nodes are down.
      • Partition tolerance: Nodes can be physically separated from each other at any given point and for any length of time. The time they're not able to reach each other, due to routing problems, network interface troubles, or firewall issues, is called a network partition. During the partition, all nodes should still be able to serve both read and write requests. Ideally the system automatically reconciles updates as soon as every node can reach every other node again.
  • 7. ACID vs. BASE
      ACID
      • Strong consistency for transactions is the highest priority
      • Availability is less important
      • Pessimistic
      • Complex mechanisms
      BASE
      • Availability and scaling are the highest priorities
      • Weak consistency
      • Optimistic
      • Simple and fast
  • 11. { "customer" : {
          "billingAddress" : [ { "city" : "Chicago" } ],
          "id" : 1,
          "name" : "Martin",
          "orders" : [ {
              "customerId" : 1,
              "id" : 99,
              "orderItems" : [ { "price" : 32.45, "productId" : 27, "productName" : "NoSQL Distilled" } ],
              "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" }, "ccinfo" : "1000-1000-1000-1000", "txnId" : "abelif879rft" } ],
              "shippingAddress" : [ { "city" : "Chicago" } ]
          } ]
      } }
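To illustrate how an application can treat this customer record as a single aggregate, here is a small sketch that parses the document and totals each order. It assumes the `customer` key wraps the whole object, which the slide's flattened JSON suggests but does not show explicitly; the `order_total` helper is a hypothetical name.

```python
import json

# The aggregate from the slide, with the customer object explicitly braced (assumption).
raw = """
{ "customer" : {
    "billingAddress" : [ { "city" : "Chicago" } ],
    "id" : 1,
    "name" : "Martin",
    "orders" : [ {
        "customerId" : 1,
        "id" : 99,
        "orderItems" : [ { "price" : 32.45, "productId" : 27, "productName" : "NoSQL Distilled" } ],
        "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" },
                             "ccinfo" : "1000-1000-1000-1000", "txnId" : "abelif879rft" } ],
        "shippingAddress" : [ { "city" : "Chicago" } ]
    } ]
} }
"""

customer = json.loads(raw)["customer"]


def order_total(order: dict) -> float:
    """Sum the item prices of one order; no join is needed because items are nested."""
    return sum(item["price"] for item in order["orderItems"])


totals = {order["id"]: order_total(order) for order in customer["orders"]}
print(customer["name"], totals)
```

Everything needed to answer "what did this customer order?" arrives in one read, which is the productivity argument made for aggregate-oriented stores.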
  • 12. We see two primary reasons why people consider using a NoSQL database.
      • Application development productivity. A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.
      • Large-scale data. Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is designed to run on a single machine, but it is usually more economical to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they make a better fit for big data scenarios.
  • 13.
      • For almost as long as we’ve been in the software profession, relational databases have been the default choice for serious data storage, especially in the world of enterprise applications.
      • If you’re an architect starting a new project, your only choice is likely to be which relational database to use.
      • After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise.
  • 14. NoSQL databases have a lot more to offer than just solving the problems of scale:
      • Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This means that you don’t have to think too far ahead to define a structure, and you can continue to evolve it over time, including adding new fields or even nesting the data (for example, in a JSON representation).
      • Development time: I have heard stories about reduced development time because one doesn’t have to deal with complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to create your final view?
      • Speed: Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of milliseconds, especially over mobile and other intermittently connected devices, you have a much higher probability of winning users over.
      • Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden spikes of load. Of course, you win users over straightaway.
  • 15. Some NoSQL use cases
      1. Massive data volumes
          • Massively distributed architecture required to store the data
          • Google, Amazon, Yahoo, Facebook…
      2. Extreme query workload
          • Impossible to efficiently do joins at that scale with an RDBMS
      3. Schema evolution
          • Schema flexibility (migration) is not trivial at large scale
          • Schema changes can be gradually introduced with NoSQL
  • 18. Key-value stores
      The main idea here is using a hash table where there is a unique key and a pointer to a particular item of data. The key/value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.
      • One key maps to one value, very fast
      • Key: hash (no duplicates)
      • Value: binary object ("BLOB"); the database does not understand your content
      [Figure: the key "customer_22" pointing to an opaque binary value, captioned "GIVE ME A MEANING!"]
  • 19. Key-Value Databases
      • A key-value store is a simple hash table
      • Primarily used when all access to the database is via primary key
      • Simplest NoSQL data stores to use (from an API perspective): PUT, GET, DELETE (matches REST)
      • The value is a blob; the data store does not care or know what is inside
      • Aggregate-oriented
      Suitable use cases
      • Storing session information
      • User profiles, preferences
      • Shopping cart data
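The PUT, GET, DELETE interface described above can be sketched with an in-memory dictionary. The `KeyValueStore` class and the `cart:customer_22` key are hypothetical names chosen for illustration, not the API of any particular product. Note that the store only ever sees bytes; the application alone knows that the blob is JSON.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy key-value store: one key, one opaque value, access by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # overwrites any previous value for the key

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
cart = {"user": "customer_22", "product_ids": [27]}

# The application serializes the aggregate; the store cannot query inside it.
kv.put("cart:customer_22", json.dumps(cart).encode("utf-8"))
restored = json.loads(kv.get("cart:customer_22").decode("utf-8"))
print(restored)
```

Updating a single field means reading the whole blob, changing it in the application, and writing it back, which is the inefficiency the slides mention for partial updates.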
  • 20. Document databases
      These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of key/value, allowing nested values associated with each key. Document databases support querying more efficiently.
  • 21. Document Databases
      • Documents are the main concept
      • Stores and retrieves documents, which can be XML, JSON, BSON, …
      • Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values
      • Documents stored are similar to each other but do not have to be exactly the same
      • Aggregate-oriented
      Suitable use cases
      • Event logging
      • Content management systems
      • Web analytics or real-time analytics
      • Product catalog
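Unlike a key-value store, a document store can look inside the value. The sketch below imitates that with plain Python dictionaries; `get_path`, `find`, and the sample `products` collection are hypothetical and only illustrate querying on a nested field across documents that do not share an identical shape.

```python
from typing import Any, Dict, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'details.format'; return None if any part is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Dict[str, Any]], path: str, expected: Any) -> List[Dict[str, Any]]:
    """Return every document whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


# Two documents in one collection; the second has a field the first lacks.
products = [
    {"_id": 1, "name": "Widget", "details": {"format": "book"}},
    {"_id": 2, "name": "Sticker", "details": {"format": "merch"}, "tags": ["promo"]},
]

books = find(products, "details.format", "book")
promos = find(products, "tags", ["promo"])
print(books, promos)
```

A real document database would back such queries with indexes rather than a linear scan; the scan here only keeps the example short.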
  • 22. Wide-column stores
      • Often referred to as "BigTable clones"
      • "a sparse, distributed multi-dimensional sorted map"
      These were created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns. The columns are arranged by column family.
  • 23. Wide-column Databases
      Column stores can greatly improve the performance of queries that only touch a small number of columns.
      • This is because they will only access these columns' particular data
      • Simple math: table t has a total of 10 GB of data, with
          • column a: 4 GB
          • column b: 2 GB
          • column c: 3 GB
          • column d: 1 GB
      If a query only uses column d, at most 1 GB of data will be processed by a column store. In a row store, the full 10 GB will be processed.
      • Aggregate-oriented
      Suitable use cases
      • Event logging
      • Content management systems
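The arithmetic on this slide can be written out directly. The column sizes are the ones given above; the two function names are made up for the example, and the model ignores compression and indexes.

```python
from typing import Dict, Iterable

# Column sizes of table t in GB, as listed on the slide.
column_sizes_gb: Dict[str, int] = {"a": 4, "b": 2, "c": 3, "d": 1}


def column_store_scan_gb(columns_used: Iterable[str]) -> int:
    """A column store reads only the columns the query touches."""
    return sum(column_sizes_gb[name] for name in columns_used)


def row_store_scan_gb() -> int:
    """A row store reads whole rows, so every column is read."""
    return sum(column_sizes_gb.values())


print(column_store_scan_gb(["d"]))  # 1
print(row_store_scan_gb())          # 10
```

For a query over columns b and d the same functions give 3 GB against 10 GB, so the benefit shrinks as a query touches more columns.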
  • 24. Graph stores
      • Used to store information about networks, such as social connections.
  • 25. Graph Databases
      • Allow you to store entities and the relationships between these entities
      • Entities are known as nodes, which have properties
      • Relations are known as edges, which also have properties
      • A query on the graph is also known as traversing the graph
      • Traversing the relationships is very fast
      Suitable use cases
      • Connected data
      • Routing, dispatch and location-based services
      • Recommendation engines
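A traversal query can be sketched as a breadth-first search over an adjacency list. The tiny social graph, the `FRIEND` edge label, and the `reachable_within` function are all assumptions for illustration; a graph database would expose this through its own query language.

```python
from collections import deque
from typing import Dict, List, Tuple

# node -> list of (edge label, neighbor); hypothetical sample data
edges: Dict[str, List[Tuple[str, str]]] = {
    "alice": [("FRIEND", "bob"), ("FRIEND", "carol")],
    "bob": [("FRIEND", "dave")],
    "carol": [("FRIEND", "dave")],
    "dave": [],
}


def reachable_within(start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first traversal: map each reachable node to its hop distance from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _label, neighbor in edges.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


print(reachable_within("alice", 1))
print(reachable_within("alice", 2))
```

Friends-of-friends is the two-hop case; in a relational schema the same question would need a self-join per hop.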
  • 26. POLYGLOT PERSISTENCE
      • In 2006, Neal Ford coined the term Polyglot Programming
          • Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
      • Polyglot Persistence defines a hybrid approach to persistence
          • Using multiple data storage technologies
          • Selected based on the way data is being used by individual applications
          • Why store binary images in relational databases, when there are better storage systems?
      • Can occur both across the enterprise and within a single application
  • 27. POLYGLOT PERSISTENCE
      "Traditional": today we use the same database for all kinds of data
      • Business transactions, session management data, reporting, logging information, content information, ...
      • Need for same properties of availability, consistency or backup requirements
      [Diagram: shopping cart data, user sessions, completed orders, product catalog, and recommendations all stored in one RDBMS]
      Polyglot data storage usage allows you to mix and match relational and NoSQL data stores.
  • 28. POLYGLOT PERSISTENCE – CHALLENGES
      • Decisions
          • Have to decide what data storage technology to use
          • Today it is easier to go with relational
      • New data access APIs
          • Each data store has its own mechanisms for accessing the data
          • Different APIs
      • Solution: wrap the data access code into services (Data/Entity Service) exposed to applications
          • Will enforce a contract/schema on a schemaless database
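One possible shape for such a data service is sketched below. The names `CartStore`, `InMemoryCartStore`, and `CartService` are hypothetical; the point is only that applications call the service, the service enforces the contract, and the store behind it can be swapped.

```python
import json
from typing import Dict, List, Protocol


class CartStore(Protocol):
    """What the service needs from any backing store, relational or NoSQL."""

    def load(self, user_id: str) -> List[int]: ...
    def save(self, user_id: str, product_ids: List[int]) -> None: ...


class InMemoryCartStore:
    """Stand-in for a key-value backend: carts are kept as opaque JSON strings."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def load(self, user_id: str) -> List[int]:
        return json.loads(self._blobs.get(user_id, "[]"))

    def save(self, user_id: str, product_ids: List[int]) -> None:
        self._blobs[user_id] = json.dumps(product_ids)


class CartService:
    """Applications talk to this service, never to the store's own API."""

    def __init__(self, store: CartStore) -> None:
        self._store = store

    def add_item(self, user_id: str, product_id: int) -> List[int]:
        # The contract the schemaless store does not enforce by itself.
        if not isinstance(product_id, int) or product_id <= 0:
            raise ValueError("product_id must be a positive integer")
        items = self._store.load(user_id)
        items.append(product_id)
        self._store.save(user_id, items)
        return items


service = CartService(InMemoryCartStore())
service.add_item("customer_22", 27)
cart_items = service.add_item("customer_22", 99)
print(cart_items)
```

Replacing `InMemoryCartStore` with a class that talks to a real database would not change any caller of `CartService`.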
  • 29. Replica Sets: High Availability
      Replication is the process of synchronizing data across multiple servers.
      Purpose of replication
      Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
      In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
  • 30. Replica Sets: High Availability
      The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency. The secondaries replicate the primary’s oplog and apply the operations to their data sets, so that the secondaries’ data sets reflect the primary’s data set.
  • 31. Replica Sets: High Availability
      Automatic failover
      When a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary. The first secondary that receives a majority of the votes becomes primary.
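The failover rule above can be sketched as two small functions. This is a deliberately simplified model, not MongoDB's actual election protocol: the function names and the vote format are assumptions, and only the 10-second threshold and the majority rule come from the slide.

```python
from collections import Counter
from typing import Dict, Optional

ELECTION_TIMEOUT_SECONDS = 10  # threshold quoted on the slide


def primary_is_unreachable(seconds_since_last_contact: float) -> bool:
    """Members start an election once the primary has been silent for too long."""
    return seconds_since_last_contact > ELECTION_TIMEOUT_SECONDS


def elect_primary(votes: Dict[str, str], voting_members: int) -> Optional[str]:
    """`votes` maps voter -> candidate. A candidate wins only with a strict
    majority of all voting members, not merely of the votes that were cast."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count > voting_members // 2 else None


# Three-member set whose primary went silent: both secondaries vote for "node_b".
if primary_is_unreachable(seconds_since_last_contact=12.5):
    new_primary = elect_primary({"node_b": "node_b", "node_c": "node_b"}, voting_members=3)
    print(new_primary)
```

Requiring a majority of all members, rather than of the reachable ones, is what prevents both sides of a network partition from electing their own primary.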
  • 32. Sharding: High Scalability and Throughput
      Sharding is a method for storing data across multiple machines.
      Purpose of sharding
      Database systems with large data sets and high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage capacity of a single machine. Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
  • 33. Sharding: High Scalability and Throughput
      Sharding, or horizontal scaling, in contrast to vertical scaling (adding more resources to a single server), divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
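A minimal sketch of the idea, assuming a simple hash-modulo placement: each key is hashed to pick one of several independent dictionaries, and together they behave like one logical store. `shard_for` and `ShardedStore` are hypothetical names, and production systems typically use more elaborate schemes (ranges or chunks of a shard key) so that shards can be added without moving most of the data.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Several independent stores that together act as one logical store."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


sharded = ShardedStore(num_shards=3)
for user_id in range(100):
    sharded.put(f"customer_{user_id}", f"profile {user_id}")

shard_sizes = [len(shard) for shard in sharded.shards]
print(shard_sizes, sharded.get("customer_22"))
```

Each read or write touches exactly one shard, which is how the load of a single server gets divided across machines.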
  • 34. Map-Reduce
      The map-reduce pattern is a way to organize processing so as to take advantage of multiple machines on a cluster while keeping as much processing and the data it needs together on the same machine. It first gained prominence with Google’s MapReduce framework.
      • "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.
      • "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output: the answer to the problem it was originally trying to solve.
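The two steps can be sketched with the classic word-count example, run in a single process for simplicity. The sample `chunks` and the function names are assumptions; on a real cluster each chunk would be mapped on the machine that already holds it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Worker side: turn one chunk of input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Master side: combine the partial answers into one total per word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Each string stands in for the slice of input held by one worker node.
chunks = ["nosql is not only sql", "sql is relational"]

mapped = [map_phase(chunk) for chunk in chunks]
word_counts = reduce_phase(pair for partial in mapped for pair in partial)
print(word_counts)
```

Because each `map_phase` call depends only on its own chunk, the calls can run in parallel on different machines without coordination.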
  • 37. Advantages of MongoDB over RDBMS
      • Schemaless: MongoDB is a document database in which one collection holds different documents. The number of fields, content, and size of the document can differ from one document to another.
      • The structure of a single object is clear
      • No complex joins
      • Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL
      • Ease of scale-out: MongoDB is easy to scale
  • 38. Why use MongoDB?
      • Document-oriented storage: data is stored in the form of JSON-style documents
      • Index on any attribute
      • Replication and high availability
      • Auto-sharding
      • Rich queries
      • Fast in-place updates
      • Professional support by MongoDB
      Where to use MongoDB?
      • Big data
      • Content management and delivery
      • Mobile and social infrastructure
      • User data management
      • Data hub
  • 42.
      • http://www.mongodb.com/scale
      • http://www.mongodb.com/partners/cloud/microsoft
      • http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/
      • http://www.mongodb.com/leading-nosql-database
      • http://nosql.findthebest.com/
      • http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb
      Azure offered as a Service:
      • https://mongolab.com/welcome/
      mongodb offered as a Service:
      • http://www.objectrocket.com/
      • https://www.mongohq.com/

Editor's Notes

  1. http://downloadsquad.switched.com/2010/06/29/facebook-doubles-its-server-count-from-30-000-to-60-000-in-just-6-months/ by Sebastian Anthony on June 29, 2010 at 10:00 AM