NoSQL Containers get Rich

NoSQL Day, Udine – Italy 15-11-2013
STEFANO VALLE

http://www.mvlabs.it

THE BIG DATA WE USE EVERY DAY
NOSQL TO THE RESCUE!
OMG! How to choose?
DATA MODEL
key        value
nosqlday   http://nosqlday.it

// Store value
> SET nosqlday http://nosqlday.it

// Retrieve value
> GET nosqlday
http://nosqlday.it
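The SET / GET pair above is essentially a map lookup. A minimal in-memory sketch of that contract follows; the class name `KeyValueStore` is an assumption made for illustration and is not how Redis itself is implemented.

```javascript
// Minimal in-memory key-value store (illustrative only).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Store a value under a key, overwriting any previous value.
  set(key, value) {
    this.data.set(key, value);
    return "OK";
  }

  // Retrieve the value for a key, or null when the key is absent.
  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
}

const store = new KeyValueStore();
store.set("nosqlday", "http://nosqlday.it");
console.log(store.get("nosqlday"));
```

The store knows nothing about what the value contains: it can only be fetched by its key.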
KEY - VALUE
the_godfather
DOCUMENT
the_godfather
COLUMN
Row key            Year   Cast                                           Rating
Forrest Gump       1994   Tom Hanks, Robin Wright, Gary Sinise           8.7
Braveheart         1995   Mel Gibson, Sophie Marceau, Patrick McGoohan
Fast & Furious 7   2014
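In the column model each row stores only the columns it actually has. The sketch below mirrors the movie rows from the slide with a plain `Map`; the helper names `getColumn` and `rowsWithColumn` are assumptions for illustration, not the API of any specific column store.

```javascript
// Each row key maps to its own set of columns; rows need not share columns.
const movies = new Map([
  ["Forrest Gump", { Year: 1994, Cast: ["Tom Hanks", "Robin Wright", "Gary Sinise"], Rating: 8.7 }],
  ["Braveheart", { Year: 1995, Cast: ["Mel Gibson", "Sophie Marceau", "Patrick McGoohan"] }],
  ["Fast & Furious 7", { Year: 2014 }],
]);

// Read one column of one row; a missing column yields undefined, not an error.
function getColumn(rowKey, column) {
  const row = movies.get(rowKey);
  return row ? row[column] : undefined;
}

// List the row keys that actually store a given column.
function rowsWithColumn(column) {
  return [...movies].filter(([, columns]) => column in columns).map(([key]) => key);
}
```

For example, `getColumn("Braveheart", "Rating")` is simply absent, and `rowsWithColumn("Cast")` skips the row that has no cast yet.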
GRAPH
[Figure: graph whose nodes are the pages home, landing page 1, landing page 2, product 1, product 2 and product 3]
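A graph model stores the links between items as first-class data, so traversals become the natural query. The sketch below uses the page names from the slide; the edges themselves are not recoverable from the slide text and are assumed here purely for illustration.

```javascript
// Hypothetical links between the pages named on the slide (edges are assumed).
const links = {
  "home": ["product 1", "product 2", "product 3"],
  "landing page 1": ["product 1"],
  "landing page 2": ["product 2", "product 3"],
  "product 1": [],
  "product 2": [],
  "product 3": [],
};

// Breadth-first traversal: every page reachable from `start`, in visit order.
function reachableFrom(start) {
  const visited = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const next of links[page] || []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return [...visited];
}

// Reverse lookup: which pages link to `target`?
function pagesLinkingTo(target) {
  return Object.keys(links).filter((page) => links[page].includes(target));
}
```

A question such as "which pages lead to product 1?" is a one-step traversal here, while in a key-value store it would require scanning every value.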
Classification by data model

Key - value

Column

Document
Graph
Is it all about
the data model?
Let’s suppose there is an RDBMS «category»…
ARE NOSQL DATASTORES SIMPLY
DATA CONTAINERS?
There is much more
to say about
NoSQL datastores
OFFLINE WEB APPLICATIONS
GEOSPATIAL SEARCH
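A geospatial search answers "what is within N km of this point?". The sketch below uses the haversine formula with a linear scan; datastores with geospatial features typically rely on a spatial index instead. The coordinates are approximate and the function names are assumptions for illustration.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two { lat, lon } points in degrees (haversine formula).
function distanceKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Linear scan: names of the places within radiusKm of center.
function placesWithin(places, center, radiusKm) {
  return places.filter((p) => distanceKm(p, center) <= radiusKm).map((p) => p.name);
}

// Approximate coordinates, for illustration only.
const places = [
  { name: "Udine", lat: 46.07, lon: 13.23 },
  { name: "Trieste", lat: 45.65, lon: 13.78 },
  { name: "Rome", lat: 41.9, lon: 12.5 },
];
```

With these points, a 100 km search around Udine keeps Udine and Trieste and drops Rome.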
KEY-VALUE + LISTS
KEY-VALUE + SETS
VALUE EXPIRATION
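Lists, sets and expiration turn a plain key-value container into something richer. The sketch below shows one possible shape for these features; `ExpiringStore` and its method names are assumptions for illustration, and the clock is injectable so that expiration can be simulated.

```javascript
// Key-value store with per-key expiration; `now` is injectable to simulate time.
class ExpiringStore {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  // Store a value; ttlMs is optional, and without it the key never expires.
  set(key, value, ttlMs) {
    const expiresAt = ttlMs === undefined ? Infinity : this.now() + ttlMs;
    this.entries.set(key, { value, expiresAt });
  }

  // Return the value, or null if the key is missing or has expired (lazy eviction).
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  // Lists and sets are richer value types stored under a key.
  // In this sketch they are stored without a TTL.
  pushToList(key, item) {
    const list = this.get(key) || [];
    list.push(item);
    this.set(key, list);
    return list.length;
  }

  addToSet(key, member) {
    const set = this.get(key) || new Set();
    set.add(member);
    this.set(key, set);
    return set.size;
  }
}
```

A session token stored with a TTL disappears on its own, which removes the need for a cleanup job in the application.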
NoSQL Mobile Databases

[Lite]
ALL useful things!
but don’t limit yourself to a
features-first
comparison
Durability vs Performance
safe mode?
off = data loss risk
Consider the use of journaling
file system
Disk / RAID cache?
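The trade-off can be seen at the file level: a write that returns before the data reaches stable storage is faster but can be lost in a crash. The sketch below uses Node's `fs` module; the function names and the log format are assumptions for illustration, and even `fsync` may not be enough if a disk or RAID cache acknowledges writes it has not yet persisted.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Append one record to a log file. With durable=true, fsync asks the OS to
// flush the data to the device before returning (slower, but safer).
function appendRecord(file, record, durable) {
  const fd = fs.openSync(file, "a");
  try {
    fs.writeSync(fd, JSON.stringify(record) + "\n");
    if (durable) {
      fs.fsyncSync(fd);
    }
  } finally {
    fs.closeSync(fd);
  }
}

// Read the log back as an array of records.
function readRecords(file) {
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

const logDir = fs.mkdtempSync(path.join(os.tmpdir(), "durability-"));
const logFile = path.join(logDir, "data.log");
appendRecord(logFile, { x: 1 }, true); // durable write
appendRecord(logFile, { x: 2 }, false); // fast write, may be lost on a crash
```

A datastore's "safe mode" or journaling option usually decides which of these two paths a write takes.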
And what about scaling?
Goldfish, not thoroughbreds
Scale up
Scale out
Allow for fast, cost-effective,
on-demand growth (or shrinking)
"I know two companies that collapsed
due to inability to reduce operating
costs when the utilization of their sites
diminished"

Theo Schlossnagle

From Theo’s book "Scalable Internet Architectures"
[Figure: ease of scalability plotted against complexity for key-value stores, column-family stores, document databases and graph databases; annotation: > 90% of use cases]
Adapted from http://www.slideshare.net/emileifrem/an-overview-of-nosql-jfokus-2011
Query capability

function(doc) {
  if (doc.city == 'London') {
    emit(doc._id, null)
  }
}

We can’t use user input here:

function(doc) {
  if (doc.city == ???) {
    emit(doc._id, null)
  }
}

Here we can use user input!

db.events.find(
  { city: 'Rome' }
)
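A common way around the fixed value in a map function is to emit the field as the view key and pass the user's value when querying the view. The sketch below emulates that idea in plain JavaScript; `buildView`, `queryView` and the sample documents are assumptions for illustration, and `emit` is passed as a parameter here, whereas a real view engine provides it.

```javascript
// Sample documents (hypothetical data).
const docs = [
  { _id: "e1", city: "London" },
  { _id: "e2", city: "Rome" },
  { _id: "e3", city: "Rome" },
];

// Map function in the style of the slide, but emitting the city as the key
// instead of comparing it with a hard-coded value.
function mapByCity(doc, emit) {
  if (doc.city) {
    emit(doc.city, doc._id);
  }
}

// Build the index once by running the map function over every document.
function buildView(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Query the prebuilt index by key; the key can come from user input.
function queryView(view, key) {
  return view.filter((row) => row.key === key).map((row) => row.value);
}

const byCity = buildView(docs, mapByCity);
```

The index is built ahead of time, so the query model decides which questions can be asked at run time and which must be planned in advance.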
Distribution model
Distribution model: REPLICATION
[Figure: a MASTER node replicating all data to a SLAVE node]
Distribution model: REPLICATION
[Figure: three MASTER nodes]
Filtered multi-master REPLICATION
[Figure: two MASTER nodes (e.g. headquarters, e.g. customer plant) exchanging the Product list and Purchases]
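Filtered replication copies only the documents that pass a filter, so each master receives just what it needs. The sketch below is one possible reading of the slide; the replication directions, the `type` field and the `replicate` helper are assumptions for illustration.

```javascript
// Copy from source to target only the documents accepted by the filter.
// Returns the number of documents replicated.
function replicate(source, target, filter) {
  let copied = 0;
  for (const [id, doc] of source) {
    if (filter(doc)) {
      target.set(id, doc);
      copied += 1;
    }
  }
  return copied;
}

const headquarters = new Map([
  ["p1", { type: "product", name: "Widget" }],
]);
const customerPlant = new Map([
  ["o1", { type: "purchase", product: "p1", quantity: 3 }],
]);

// Assumed directions: the product list flows to the plant, purchases flow back.
const productsSent = replicate(headquarters, customerPlant, (doc) => doc.type === "product");
const purchasesSent = replicate(customerPlant, headquarters, (doc) => doc.type === "purchase");
```

Both nodes accept writes, yet neither has to ship its whole dataset to the other.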
Scaling reads
[Figure: three clients and three MASTER nodes]
Scaling writes?
[Figure: one client and three MASTER nodes]
Sharding
Shard 1 [A to F]
Shard 2 [G to N]
Shard 3 [O to T]
Shard 4 [U to Z]
Scaling writes
[Figure: a client and the four shards above]
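With range-based sharding, a router sends each key to the shard that owns its range, so writes spread across nodes. The sketch below uses the four ranges from the slide; routing on the first letter of the key and the name `shardFor` are assumptions for illustration.

```javascript
// Shard ranges as shown on the slide: each shard owns a range of first letters.
const shards = [
  { name: "Shard 1", from: "A", to: "F" },
  { name: "Shard 2", from: "G", to: "N" },
  { name: "Shard 3", from: "O", to: "T" },
  { name: "Shard 4", from: "U", to: "Z" },
];

// Route a key to the shard whose range contains its first letter.
function shardFor(key) {
  const first = key.charAt(0).toUpperCase();
  const shard = shards.find((s) => first >= s.from && first <= s.to);
  if (!shard) {
    throw new Error("No shard for key: " + key);
  }
  return shard.name;
}
```

A write for "nosqlday" and a write for "valle" land on different shards, so they no longer compete for the same node.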
R / W data from 2 nodes
          T1        T2
Node 1    C: X=0    C: X=1
Node 2    C: X=0    C: X=0
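The T1 / T2 sequence above can be replayed in a few lines: a write reaches Node 1 first, and until it is propagated Node 2 keeps answering with the old value. The class names and the explicit `synchronize` step are assumptions made for illustration.

```javascript
// One replica of the data.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = { X: 0 };
  }
}

// Two replicas with asynchronous propagation from Node 1 to Node 2.
class Cluster {
  constructor() {
    this.node1 = new Replica("Node 1");
    this.node2 = new Replica("Node 2");
    this.pending = []; // writes accepted by Node 1 but not yet applied on Node 2
  }

  // A write is acknowledged as soon as Node 1 has it.
  writeToNode1(key, value) {
    this.node1.data[key] = value;
    this.pending.push([key, value]);
  }

  // Apply the queued writes on Node 2 (for example once the link is back).
  synchronize() {
    for (const [key, value] of this.pending) {
      this.node2.data[key] = value;
    }
    this.pending = [];
  }
}

const cluster = new Cluster();
// T1: both nodes hold X=0.
cluster.writeToNode1("X", 1);
// T2: Node 1 returns X=1 while Node 2 still returns the stale X=0.
const staleRead = cluster.node2.data.X;
cluster.synchronize();
// Later: both nodes agree again.
```

Whether Node 2 should answer with the stale value or refuse to answer is exactly the choice that the CAP theorem describes.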
CAP Theorem
Consistency
Availability
Partition Tolerance
Choose 2
(Some of) available solutions
CP: BigTable, HBase, MongoDB, Redis, MemcacheDB, etc.
AP: Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Voldemort, etc.
CA: RDBMS, etc.
from CONSISTENCY
to EVENTUAL CONSISTENCY
BASE
Basic Availability
Soft state
Eventual consistency
ACID
Atomicity
Consistency
Isolation
Durability
Aggregates
Atomicity and Isolation
are guaranteed inside
an aggregate

Source: AggregateOrientedDatabase - http://martinfowler.com/bliki/AggregateOrientedDatabase.html
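One way to picture the guarantee is a store whose unit of update is the whole aggregate: a change is prepared on a private copy and swapped in only if it succeeds. The sketch below is an illustration of that idea, not the mechanism of any specific datastore; `AggregateStore` and the order example are assumptions.

```javascript
// Deep copy for plain JSON-like data.
const clone = (value) => JSON.parse(JSON.stringify(value));

// Store whose unit of update is a whole aggregate (e.g. an order with its lines).
class AggregateStore {
  constructor() {
    this.aggregates = new Map();
  }

  put(id, aggregate) {
    this.aggregates.set(id, clone(aggregate));
  }

  get(id) {
    return this.aggregates.has(id) ? clone(this.aggregates.get(id)) : null;
  }

  // Apply `change` to a private copy and swap it in only if it succeeds:
  // readers see either the old aggregate or the new one, never a partial change.
  update(id, change) {
    if (!this.aggregates.has(id)) {
      throw new Error("Unknown aggregate: " + id);
    }
    const copy = clone(this.aggregates.get(id));
    change(copy); // may throw; the stored aggregate is then left untouched
    this.aggregates.set(id, copy);
  }
}

const orders = new AggregateStore();
orders.put("order-1", { lines: [{ product: "p1", quantity: 1 }], total: 10 });
```

A change that spans two aggregates gets no such protection in this model, which is why aggregate boundaries deserve care at design time.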
ARE YOU SURE WE NEED ACID?
Safety
vs
Liveness
Availability is revenue!
BACK TO CONTAINERS
STARTING FROM DATA MODEL…
Data Model

MANY OTHER THINGS TO CONSIDER
Data Model
Scalability model
Data durability
Query model
Position on CAP
Some needed features
Performance
RELATIONAL DBMS
"the relational model
is pretty magical"
Laurie Voss

http://seldo.com/weblog/2010/07/12/in_defence_of_sql
NOSQL DATASTORES
"Big data is like teenage sex: everyone
talks about it, nobody really knows
how to do it, everyone thinks everyone
else is doing it, so everyone claims
they are doing it..."

Dan Ariely

https://www.facebook.com/dan.ariely/posts/904383595868
Schemaless

// Read a document and use its fields directly
$doc = $myDb->getDoc('the_godfather');
$year = $doc['year'];
$castCount = count($doc['cast']);
if ($castCount > 0) {
    $firstCastName = $doc['cast'][0]['name'];
}

Schemaless… are you sure?
The application is aware of the
document schema!
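Since the schema lives in the application, it can help to state it explicitly in one place. The sketch below checks the fields that the snippet above relies on; `readMovie` and the sample document are assumptions for illustration.

```javascript
// Validate the shape the application code relies on, instead of assuming it.
function readMovie(doc) {
  if (typeof doc !== "object" || doc === null) {
    throw new TypeError("document must be an object");
  }
  if (!Number.isInteger(doc.year)) {
    throw new TypeError("'year' must be an integer");
  }
  const cast = Array.isArray(doc.cast) ? doc.cast : [];
  const first = cast[0];
  return {
    year: doc.year,
    castCount: cast.length,
    firstCastName: first && typeof first.name === "string" ? first.name : null,
  };
}

// Hypothetical document, shaped the way the snippet above expects it.
const theGodfather = {
  year: 1972,
  cast: [{ name: "Marlon Brando" }, { name: "Al Pacino" }],
};
```

A document saved by an older version of the application, for example one without `cast`, is then handled on purpose rather than by accident.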
Polyglot persistence
[Figure: Farm (Node 1 … Node n), Provisioning, LAPP stack, Devices status, Data aggregation]
Polyglot persistence made safe
[Figure: Other component, Provisioning, layer]
Data Store as a Service
Anti Corruption Layer
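An anti-corruption layer keeps the storage format of one datastore from leaking into the rest of the application. The sketch below shows the idea for device status data; the client class, the `st` / `ts` field names and the key format are all hypothetical.

```javascript
// Hypothetical datastore-specific client returning documents in its own format.
class DocumentStoreClient {
  constructor(docs) {
    this.docs = docs;
  }

  fetch(id) {
    return this.docs[id];
  }
}

// Anti-corruption layer: the only place that knows the storage format.
// The rest of the application sees plain domain objects.
class DeviceStatusRepository {
  constructor(client) {
    this.client = client;
  }

  find(deviceId) {
    const raw = this.client.fetch("device:" + deviceId);
    if (!raw) return null;
    return {
      id: deviceId,
      online: raw.st === "up",
      lastSeen: new Date(raw.ts * 1000), // ts is assumed to be in epoch seconds
    };
  }
}

const repository = new DeviceStatusRepository(
  new DocumentStoreClient({ "device:42": { st: "up", ts: 1384473600 } })
);
```

Swapping the datastore behind `DeviceStatusRepository` then touches one class instead of every component that reads device status.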
GOOD APPLICATION DESIGN
THINK ABOUT DATA LIFECYCLE
NOT ONLY DATA MODEL
That’s all, folks!

Stefano Valle
@stefanovalle
s.valle@mvassociati.it
Photo credits
http://www.flickr.com/photos/aloha75/4571410233
http://www.flickr.com/photos/djnordic/167433120
http://www.flickr.com/photos/jpstanley/69523927
http://www.flickr.com/photos/lodigs/2833648828
http://www.flickr.com/photos/ppym1/387781444
http://www.flickr.com/photos/freefoto/3844247553
http://www.flickr.com/photos/jamesgood/1708602693
http://www.flickr.com/photos/ms_cwang/133084413
http://www.flickr.com/photos/birminghammag/7979485144
http://www.flickr.com/photos/capcase/4970062870
