The document discusses different types of NoSQL databases including key-value stores like Memcached and Redis, document databases like Couchbase and MongoDB, column-oriented databases like Cassandra, and graph databases like Neo4j. It explains the basic data models and architectures of each type of NoSQL database. NoSQL databases provide more flexible schemas and better horizontal scalability than traditional relational databases.
3. Macro Trends Driving NoSQL Technology
More Data + More Users + Interactive Apps
NoSQL
4. Lacking Solutions, Users Forced to Invent
• Bigtable: November 2006
• Dynamo: October 2007
• Cassandra: August 2008
• Voldemort: February 2009
Very few organizations can build and maintain database software technology.
But every organization building interactive web applications needs this technology.
5. What Is the Biggest Data Management Problem Driving Use of NoSQL in the Coming Year?
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
8. Relational Technology Scales Up
Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users for the Web/App Server Tier]
RDBMS Scales Up
Get a bigger, more complex server
[Chart: System Cost and Application Performance vs. Users for the Relational Database, annotated “Won’t scale beyond this point”]
Expensive and disruptive sharding, doesn’t perform at web scale
9. Couchbase Server Scales Out Like App Tier
Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users for the Web/App Server Tier]
NoSQL Database Scales Out
Cost and performance mirror the app tier
[Chart: System Cost and Application Performance vs. Users for the Couchbase Distributed Data Store]
Scaling out flattens the cost and performance curves
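As a rough illustration of how a data store can scale out by adding nodes, the sketch below hashes each key to one of a fixed number of partitions and assigns partitions to nodes. The partition count, the node names, and the helper functions are assumptions made for this example; this is not Couchbase's actual placement algorithm.

```python
import zlib

NUM_PARTITIONS = 64  # assumed value, chosen only for this sketch


def partition_for_key(key):
    """Hash a key to a stable partition number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign every partition to a node, round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for_key(key, partition_map):
    """Look up which node owns the partition that the key hashes to."""
    return partition_map[partition_for_key(key)]


# Adding a node only changes the partition map; the key-to-partition
# hash stays the same, so clients need nothing but the new map.
two_nodes = build_partition_map(["node-a", "node-b"])
three_nodes = build_partition_map(["node-a", "node-b", "node-c"])

for key in ("user::1", "user::2", "photo::d043"):
    print(key, node_for_key(key, two_nodes), node_for_key(key, three_nodes))
```

Because each node owns only a share of the partitions, adding nodes spreads both the data and the request load, which is the behavior the slide contrasts with scaling up a single server.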
11. Differences
• Tables vs Document
- Relational has tables with predefined columns: the schema is determined before data can be inserted.
- Best practice is to normalize by splitting data into several tables, joined by PK-FK relations.
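To make the point about predefined schemas concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and its columns are invented for illustration; the point is only that a column must be declared before data can be written to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.execute("INSERT INTO users (id, first, last) VALUES (1, 'Dipti', 'Borkar')")

# Writing to a column that the schema does not declare is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, first, last, country) VALUES (2, 'Joe', 'Smith', 'USA')"
    )
    insert_rejected = False
except sqlite3.OperationalError:
    insert_rejected = True

# The schema has to change first; existing rows get NULL in the new column.
conn.execute("ALTER TABLE users ADD COLUMN country TEXT")
conn.execute(
    "INSERT INTO users (id, first, last, country) VALUES (2, 'Joe', 'Smith', 'USA')"
)
rows = conn.execute("SELECT id, first, country FROM users ORDER BY id").fetchall()
```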
12. Differences
• Tables vs Document (contd.)
- In Couchbase, there are no tables, only documents.
- A logical entity is stored within a single document.
- Different documents do not need to have the same set of fields or structure.
- You differentiate types of documents either by the key names you provide or by adding attributes.
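The two ways of telling document types apart can be sketched with a plain Python dictionary standing in for the database. The `type::id` key convention and the `type` attribute are assumptions for this example, not something the slides prescribe.

```python
# A dict stands in for the bucket; keys and field names are illustrative.
bucket = {
    "user::1": {"type": "user", "first": "Dipti", "last": "Borkar"},
    "user::2": {"type": "user", "first": "Joe", "last": "Smith", "zip": "94040"},
    "photo::d043": {"type": "photo", "user_id": 2, "comment": "NYC"},
}


def by_key_prefix(store, prefix):
    """Select documents whose key starts with '<prefix>::'."""
    return {k: v for k, v in store.items() if k.startswith(prefix + "::")}


def by_type_attribute(store, doc_type):
    """Select documents that carry a matching 'type' attribute."""
    return {k: v for k, v in store.items() if v.get("type") == doc_type}


users_by_key = by_key_prefix(bucket, "user")
users_by_attr = by_type_attribute(bucket, "user")
```

Note that the two user documents have different fields (`zip` appears in only one), which is allowed because no schema is enforced.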
13. Relational vs Document Data Model
[Figure: a table with columns C1 to C4 beside a set of JSON documents]
Relational data model: highly structured table organization with rigidly defined data formats and record structure.
Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format.
14. Differences
• Joins vs logical single document
- Single logical document: no need for joins.
- If the data is normalized into several documents, use a series of gets:
recipe = couchbase.get("my-recipe-id");
reviews = couchbase.multiget(recipe.comments);
• Transactions
- Relational: atomicity can span several records across several tables.
- NoSQL: atomicity is confined to the document level.
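The "series of gets" pattern from this slide can be sketched as runnable Python. `InMemoryBucket` is a hypothetical stand-in written for this example, not the Couchbase SDK; only the `get` and `multiget` names come from the slide, and the recipe and review documents are invented.

```python
class InMemoryBucket:
    """Hypothetical stand-in for a document store client (not a real SDK)."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def get(self, key):
        """Fetch one document by key."""
        return self._docs[key]

    def multiget(self, keys):
        """Fetch several documents by key, in the order requested."""
        return [self._docs[k] for k in keys]


couchbase = InMemoryBucket({
    "my-recipe-id": {"title": "Pancakes", "comments": ["review::1", "review::2"]},
    "review::1": {"stars": 5, "text": "Great"},
    "review::2": {"stars": 3, "text": "Too sweet"},
})

# First get: the recipe document, which lists the keys of its reviews.
recipe = couchbase.get("my-recipe-id")
# Second step: fetch all referenced review documents in one call.
reviews = couchbase.multiget(recipe["comments"])
```

The application follows the references itself instead of asking the database to join; each individual get touches a single document.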
16. RDBMS Example: User Profile
User Info
KEY | First | Last   | ZIP_id
1   | Dipti | Borkar | 2
2   | Joe   | Smith  | 2
3   | Ali   | Dodson | 2
4   | John  | Doe    | 3

Address Info
ZIP_id | CITY | STATE | ZIP
1      | DEN  | CO    | 30303
2      | MV   | CA    | 94040
3      | CHI  | IL    | 60609
4      | NY   | NY    | 10010

To get information about a specific user, you perform a join across two tables
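The join described on this slide can be tried with Python's built-in `sqlite3` module. The sketch below loads the slide's rows into two tables; the table and column names are adapted from the slide (`user_key` is used in place of `KEY` to avoid the SQL keyword), and the choice of SQLite is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
    );
    CREATE TABLE user_info (
        user_key INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id)
    );
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")

# One user's full profile requires joining both tables on the foreign key.
row = conn.execute(
    """
    SELECT u.first, u.last, a.city, a.state, a.zip
    FROM user_info AS u
    JOIN address_info AS a ON a.zip_id = u.zip_id
    WHERE u.user_key = ?
    """,
    (1,),
).fetchone()
```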
17. Document Example: User Profile
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
All data in a single document
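A short sketch of what "all data in a single document" means for the application: the whole profile is written and read back under one key, with no second lookup. The dictionary standing in for the database and the key name `user::1` are assumptions for this example.

```python
import json

store = {}  # stand-in for the database: key -> serialized JSON document

profile = {
    "ID": 1,
    "FIRST": "Dipti",
    "LAST": "Borkar",
    "ZIP": "94040",
    "CITY": "MV",
    "STATE": "CA",
}

# Write the whole profile under one key ...
store["user::1"] = json.dumps(profile)

# ... and read it back with one lookup; name and address arrive together.
doc = json.loads(store["user::1"])
```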
18. Making a Change Using RDBMS
[Diagram: a normalized schema with five tables: User Table (User ID, First, Last, Zip, Country ID), Photo Table (User ID, Photo ID, Comment, Country ID), Status Table (User ID, Status ID, Text, Country ID), Affiliations Table (User ID, Affl ID, Affl Name, Country ID), and Country Table (Country ID, Country name)]
19. Making the Same Change With a Document DB
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}
Just add information to a document
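In application code, "just add information to a document" amounts to setting new fields on the object before saving it again. A minimal sketch with a Python dictionary, using the field names from the slide:

```python
import copy

profile = {
    "ID": 1,
    "FIRST": "Dipti",
    "LAST": "Borkar",
    "ZIP": "94040",
    "CITY": "MV",
    "STATE": "CA",
}

# Copy so the original stays available for comparison in this example.
updated = copy.deepcopy(profile)

# New information is added as new fields; no schema change is involved,
# and other documents are unaffected.
updated["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
updated["COUNTRY"] = "USA"
```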
20. Relational vs Document Performance
[Diagram: User, Photo, Status, and Affiliations tables beside a set of JSON documents]
Faster response times and higher throughput
30. The Key-Value Store – the foundation of NoSQL
[Figure: a key mapped to an opaque binary value]
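The key-value model in this figure can be sketched in a few lines of Python. `KeyValueStore` is a hypothetical class written for this example; the point is that the store only maps keys to bytes and never looks inside the value.

```python
class KeyValueStore:
    """Hypothetical minimal key-value store: values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store accepts any bytes and does not interpret them.
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("value must be bytes")
        self._data[key] = bytes(value)

    def get(self, key):
        # Lookup is by exact key only; there is no query on contents.
        return self._data.get(key)


store = KeyValueStore()
store.set("session::abc", b"\xb2\x88\x9d")                    # arbitrary binary
store.set("user::1", '{"FIRST": "Dipti"}'.encode("utf-8"))    # JSON, but still just bytes
```

Any structure in the value (JSON, an image, a serialized object) is known only to the application that wrote it.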
31. Memcached – the NoSQL precursor
[Figure: a key mapped to an opaque binary value]
memcached
• In-memory only
• Limited set of operations
- Blob storage: Set, Add, Replace, CAS
- Retrieval: Get
- Structured data: Append, Increment
• “Simple and fast.”
• Challenges: cold cache, disruptive elasticity
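To show how the storage operations listed on this slide differ, here is a small in-memory sketch. `MiniCache` is a hypothetical class written for this example, not the memcached server or any client library; it models Set, Add, Replace, and CAS (compare-and-swap), with `gets` returning a value together with its CAS token.

```python
class MiniCache:
    """Hypothetical in-memory model of memcached-style storage commands."""

    def __init__(self):
        self._items = {}    # key -> (value, cas_token)
        self._next_cas = 1

    def _store(self, key, value):
        # Every successful write gives the item a fresh CAS token.
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return False if key in self._items else self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return self._store(key, value) if key in self._items else False

    def gets(self, key):
        """Return (value, cas_token), or (None, None) if the key is missing."""
        return self._items.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if the item is unchanged since the token was read."""
        current = self._items.get(key)
        if current is None or current[1] != token:
            return False
        return self._store(key, value)


cache = MiniCache()
cache.set("counter", b"1")

# Two clients read the same version of the item ...
_, token_a = cache.gets("counter")
_, token_b = cache.gets("counter")

# ... the first write wins, and the second is rejected as stale.
first_write = cache.cas("counter", b"2", token_a)
second_write = cache.cas("counter", b"3", token_b)
```

CAS is what lets a client update a single item safely without locks: a rejected write tells the client to re-read and retry.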
32. Couchbase – document-oriented database
[Figure: a key mapped to a JSON object (“document”) of the following form]
{
  "string": "string",
  "string": value,
  "string": {
    "string": "string",
    "string": value
  },
  "string": [ array ]
}
• Auto-sharding
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop-in replacement)
• Highly available (data replication)
• Add or remove capacity on a live cluster
• When values are JSON objects (“documents”): create indices and views, and query against the views
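The idea of building a view over JSON documents and querying it can be sketched in plain Python. This is a simplified model, not Couchbase's view engine: a map function emits (key, document id) pairs for each document, the sorted pairs form the index, and a query looks up one key. The function names and sample documents are assumptions for this example.

```python
def map_by_state(doc_id, doc):
    """Map function: emit (STATE, doc_id) for documents that have a STATE."""
    if "STATE" in doc:
        yield doc["STATE"], doc_id


def build_view(docs, map_fn):
    """Run the map function over every document and sort the emitted rows."""
    rows = []
    for doc_id, doc in docs.items():
        rows.extend(map_fn(doc_id, doc))
    return sorted(rows)


def query_view(view, key):
    """Return the ids of all documents indexed under the given key."""
    return [doc_id for k, doc_id in view if k == key]


docs = {
    "user::1": {"FIRST": "Dipti", "STATE": "CA"},
    "user::2": {"FIRST": "Joe", "STATE": "CA"},
    "user::4": {"FIRST": "John", "STATE": "IL"},
    "photo::d043": {"COMMENT": "NYC"},  # no STATE field, so not indexed
}

view = build_view(docs, map_by_state)
california_users = query_view(view, "CA")
```

Documents that lack the indexed field are simply skipped, which is how a view can coexist with documents of varying structure.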
37. Cassandra – Column overlays
[Figure: a key mapped to an opaque binary value, overlaid with Column 1, Column 2, and Column 3 (not present)]
Cassandra
• Disk-based system
• Clustered
• External caching required for low-latency reads
• “Columns” are overlaid on the data
• Not all rows must have all columns
• Supports efficient queries on columns
• Restart required when adding columns
• Good cross-datacenter support
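The "column overlay" idea, where each row carries only the columns it actually has, can be modeled with nested dictionaries. This is a conceptual sketch only; it does not use Cassandra's API or storage format, and the row keys and values are invented.

```python
# row key -> {column name -> value}; rows need not share the same columns.
rows = {
    "row-1": {"column1": b"a", "column2": b"b"},
    "row-2": {"column1": b"c", "column3": b"d"},
    "row-3": {"column2": b"e"},
}


def get_column(table, row_key, column):
    """Return one column of one row, or None if that row lacks the column."""
    return table.get(row_key, {}).get(column)


def rows_with_column(table, column):
    """Return the keys of all rows that have a value for the column."""
    return sorted(key for key, cols in table.items() if column in cols)


missing = get_column(rows, "row-1", "column3")    # column not present in this row
have_column1 = rows_with_column(rows, "column1")
```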
43. Market Adoption
Internet Companies
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
Enterprises
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
44. Market Adoption – Customers
Internet Companies
Enterprises
More than 300 customers -- 5,000 production deployments worldwide
45. Application Characteristics - Data driven
• 3rd party or user defined structure (Twitter feeds)
• Support for unlimited data growth (Viral apps)
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• Variable length documents
• Sparse data records
• Hierarchical data
Couchbase is a good fit
46. Application Characteristics - Performance driven
• Low latency critical (e.g. 1 millisecond)
• High throughput (e.g. 200,000 ops/sec)
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Read / Mixed / Write heavy workloads
Couchbase is a good fit
47. Common Use Cases
Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro
Session store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki
High availability cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz
Content & Metadata Store
• Couchbase document store with Elasticsearch
• Example customers include: McGraw Hill, Tunewiki
3rd party data aggregation
• Couchbase stores social media and data feeds
• Example customers include: Sambacloud
These slides are meant to discuss and address the technical differences of RDBMS vs. NoSQL from a document modeling and performance/scale perspective.
These are the 4 “promises” of NoSQL
Do not failover a healthy node!
Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically data is normalized to third normal form to reduce duplication. Large tables are split into smaller tables that are related using foreign keys.
Example: a normalized schema with two tables, connected by a foreign key. To get information about a specific user, you perform a join across the two tables.
The data is modeled for the application code and not for the database.
This example shows how a very simple user profile is represented. In a relational database, the user information might be represented across 2 interrelated tables. In a document database, the information is aggregated into a single document that is very natural to program with. So, this is how the data model is different. Let’s now talk about how scalability is different.
These are the market segments
Partial listing of companies with paid production deployments. Thousands more using open source.