NoSQL

NoSQL Databases

Yousof Alsatom

Wirtschaftsinformatik (Business Informatics) Master Program

Humboldt-Universität zu Berlin

2012

Agenda

•  Relational database model

•  Advantages & Disadvantages

•  NoSQL

•  Basic Concepts, Techniques and Patterns in comparison with RDBMS

•  Consistency

•  Partitioning

•  Storage Layout

Agenda

•  NoSQL data model

•  Key – Value

    •  Dynamo

•  Bigtable – column family

    •  Google Bigtable

•  Document Databases

    •  CouchDB

•  GraphDB

    •  Neo4j

•  Conclusion

Database and DBMS

•  In essence, a database is a collection of data that exists over a long period of time, often many years.

•  Commonly, the term database refers to a collection of data that is managed by a Database Management System (DBMS).

•  A DBMS is a (powerful) tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely.

Relational Model

•  A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. [techtarget.com]

Edgar Frank "Ted" Codd (August 23, 1923 – April 18, 2003), IBM

Relational Database

•  A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily [Wikipedia].

Example, Project Management System [Qian Sha, 2003]

•  Possible queries

    •  Give me all employees who are working on project X

    •  Give me the percentage of progress for project Y

Relational Database, Advantages

•  Reliability

•  ACID

    •  Atomicity: all or nothing

    •  Consistency

    •  Isolation

        •  Concurrent execution of transactions results in a system state that could have been obtained if the transactions were executed serially

    •  Durability

        •  Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
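
The "all or nothing" behaviour of atomicity can be sketched with Python's built-in sqlite3 module. The `account` table, the names `alice` and `bob`, and the `transfer` helper are assumptions made for this illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src has too little money.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

A transfer of 150 from `alice` fails on the second update, so the credit already applied to `bob` is rolled back as well; a transfer of 40 succeeds and both rows change together.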

Relational Database, Limitation

•  Scalability

    •  Users can scale a relational database by running it on a more powerful (and more expensive) computer.

    •  To scale beyond a certain point, though, it must be distributed across multiple servers.

    •  Relational databases don’t work easily in a distributed manner because joining their tables across a distributed system is difficult. [Jeremy Zawodny]

•  Complexity

    •  All data must be converted into tables, which is complex and slow (example: Wikipedia)

    •  SQL can work only with structured data [Prof. Stefan Edlich, Beuth University of Applied Sciences in Berlin]

Relational Database, Limitation

Spandauer Str.1, Berlin

Problem!

Diversity?  Connectivity?  Data size?

NoSQL

•  Not using the relational model (nor the SQL language)

•  No schema, allowing fields to be added to any record without controls

•  Open source

•  Designed to work on large clusters

•  Based on the needs of 21st century web properties

NoSQL, History

•  Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.

•  Johan Oskarsson has organized a meetup for folks interested in distributed structured data storage and is calling it NoSQL. The event, being held June 11th in San Francisco,

NoSQL

•  Consistency

    •  It uses eventual consistency (a consistency model also used in parallel programming).

    •  Weak consistency

•  Partitioning

    •  Automatic partitioning (data is growing)

•  Storage Layout

    •  Row-Based Storage Layout

    •  Columnar Storage Layout

    •  …

NoSQL

•  Data Model

    •  Key / Value

    •  Bigtable

    •  DocumentDB

    •  GraphDB

Hash Table

•  Type: Unsorted associative array

•  Invented: 1953

•  Time complexity in big O notation:

              Average       Worst case
    Space     O(n)          O(n)
    Search    O(1 + n/k)    O(n)
    Insert    O(1)          O(n)
    Delete    O(1 + n/k)    O(n)

Wikipedia: http://en.wikipedia.org/wiki/Hash_tables
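
As a rough illustration of where the n/k term comes from, here is a minimal hash table with separate chaining, assuming n stored entries spread over k buckets. The class name `ChainedHashTable` and its methods are made up for this sketch. Unlike the O(1) insert in the table above, this `put` scans the bucket first so that an existing key is overwritten.

```python
class ChainedHashTable:
    """Unsorted associative array using separate chaining."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]  # k buckets
        self._size = 0                                # n entries

    def _bucket(self, key):
        # The hash picks one bucket; only that bucket is ever scanned.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite in place
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key, default=None):
        # Average cost O(1 + n/k): one hash plus a scan of one bucket.
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size
```

The worst case O(n) in the table corresponds to every key landing in the same bucket, at which point a lookup degenerates into a scan of a single list.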

Key – Value

•  The infrastructure is made up of tens of thousands of servers and network components located in many datacenters around the world.

•  Availability & reliability are the most important factors for Amazon

•  Dynamo aims to achieve high availability at the cost of weaker consistency

[Figure: Service-oriented architecture of Amazon’s platform]

Dynamo: Amazon’s Highly Available Key-value Store. September 2007.

Key – Value, Dynamo History

•  Giuseppe DeCandia et al. argue against RDBMSs at Amazon

•  They admit that advances have been made to scale and partition RDBMSs but state that such setups remain difficult to configure and operate, 2006

•  Dynamo was built in 2007

Dynamo, Consistent Hashing

Data is partitioned and replicated using consistent hashing

•  Goal: Scalability and Availability

•  The output range of a hash function is treated as a fixed circular space or “ring”

•  Ordered (each new node is assigned a random position on the ring)

•  Clockwise (a key is stored on the first node found when walking clockwise from the key's position)

•  The departure or arrival of a node affects only its neighbors

•  Each node becomes responsible for the region in the ring between it and its predecessor node on the ring.

•  “Virtual Nodes”: Each node can be responsible for more than one virtual node.

Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
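
The ring described above can be sketched in a few lines. This is a simplified illustration, not Dynamo's implementation: the class `HashRing`, the `node#i` labelling of virtual nodes and the default of 8 virtual nodes per physical node are assumptions, and positions are derived from MD5 rather than assigned at random.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring using MD5 (128-bit output)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=8):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is responsible for several virtual nodes.
        for i in range(self._vnodes):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        index = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[index % len(self._points)]]
```

The property worth checking is the one the slide states: when a node joins, the only keys that change owner are the ones the new node takes over, and when it leaves again every key returns to its previous owner.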

Dynamo, Vector Clock

•  Data Versioning: Dynamo uses vector clocks in order to capture causality between different versions of the same object.

•  A vector clock is a list of (node, counter) pairs.

•  Every version of every object is associated with one vector clock.

•  If the counters on the first object’s clock are less-than-or-equal to all of the nodes in the second clock, then the first is an ancestor of the second and can be forgotten.

[Figure labels: Object, Node, Clock]

Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
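
The ancestor rule above can be written down directly. In this sketch a vector clock is a plain dict from node name to counter; the function names and the node names `Sx`, `Sy`, `Sz` used in the usage note are illustrative assumptions.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if every counter in b is less than or equal to the one in a."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify clock a relative to clock b."""
    if a == b:
        return "equal"
    if descends(b, a):
        return "ancestor"    # a is older than b and can be forgotten
    if descends(a, b):
        return "descendant"  # a is newer than b
    return "conflict"        # concurrent versions, need reconciliation


def merge(a, b):
    """Element-wise maximum, used when reconciling conflicting versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
```

For example, two writes handled by `Sx` give `{"Sx": 2}`. If that version is then updated independently through `Sy` and through `Sz`, the two results compare as a conflict; a client that reconciles them and writes back through `Sx` produces `{"Sx": 3, "Sy": 1, "Sz": 1}`, which descends from both.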

Dynamo, Overview

Source: http://de.wikipedia.org/wiki/Amazon_Dynamo

Dynamo, Sloppy Quorum

•  Handling Failures, Sloppy Quorum

    •  A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. [Wikipedia]

•  Sloppy Quorum

    •  Read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.

•  Example:

    •  A is down …

    •  D receives the replica, with metadata hinting that A was the intended recipient

    •  When A comes back, D will attempt to deliver the replica to A

Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
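
The example with nodes A and D can be sketched as follows. The functions `sloppy_write` and `handoff`, and the representation of hints as a dict, are assumptions for illustration; the sketch only selects nodes and tracks hints, it does not move any data.

```python
def sloppy_write(preference_list, healthy, n):
    """Pick the first n healthy nodes from the preference list.

    Returns (targets, hints), where hints maps a stand-in node to the
    unavailable node it is covering for.
    """
    intended = preference_list[:n]
    down = [node for node in intended if node not in healthy]
    targets = [node for node in intended if node in healthy]
    hints = {}
    # Walk further along the preference list to replace unavailable nodes.
    for candidate in preference_list[n:]:
        if not down:
            break
        if candidate in healthy:
            hints[candidate] = down.pop(0)
            targets.append(candidate)
    return targets, hints


def handoff(hints, healthy):
    """Return (stand_in, owner) pairs whose intended owner has recovered."""
    return [(stand_in, owner) for stand_in, owner in hints.items()
            if owner in healthy]
```

With the preference list A, B, C, D, E, N = 3 and A unavailable, the write goes to B, C and D, and D keeps a hint naming A. Once A is healthy again, `handoff` reports that D should deliver its replica to A.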

Dynamo, Gossip-based membership protocol and failure detection

•  A gossip-based protocol propagates membership changes and maintains an eventually consistent view of membership.

Key – Value, Dynamo

Problem | Technique | Advantage
Partitioning | Consistent Hashing | Incremental Scalability
High Availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates.
Handling temporary failures | Sloppy Quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available.
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background.
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.

Dynamo: Amazon’s Highly Available Key-value Store. September 2007.

Key – Value, Dynamo

•  Query Model

    •  get(key): objects, context

        •  Context: metadata such as the object version is stored; it is useful in case of conflict

    •  put(key, context, object): the key is hashed by the MD5 algorithm
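
To make the role of the context concrete, here is a deliberately simplified in-memory sketch of the two calls. The class `TinyStore` is hypothetical, and a single integer version stands in for the vector clock that Dynamo actually carries in the context.

```python
import hashlib


class TinyStore:
    """Toy key-value store with get(key) and put(key, context, object)."""

    def __init__(self):
        self._data = {}  # md5(key) -> (version, [objects])

    @staticmethod
    def _identifier(key):
        # MD5 turns the key into a 128-bit identifier (32 hex digits).
        return hashlib.md5(key.encode("utf-8")).hexdigest()

    def get(self, key):
        """Return (objects, context); several objects mean a conflict."""
        version, objects = self._data.get(self._identifier(key), (0, []))
        return list(objects), {"version": version}

    def put(self, key, context, obj):
        ident = self._identifier(key)
        version, objects = self._data.get(ident, (0, []))
        if context["version"] == version:
            # The writer saw the latest version: replace it.
            self._data[ident] = (version + 1, [obj])
        else:
            # Stale context: keep both versions for later reconciliation.
            self._data[ident] = (version + 1, objects + [obj])
```

A writer that passes back the context from its most recent `get` replaces the stored object. A writer holding an older context adds a second version instead, and the next reader receives both together with a fresh context that it can use to write the reconciled result.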

Other Key / Value NoSQL tools

Riak makes data highly available for use in read and write-intensive web applications.

Bigtable

•  Bigtable is described as “a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers” [Google Labs]

•  Bigtable

    •  Distributed

    •  Persistent multi-dimensional sorted map

    •  The map is indexed by a row key, column key, and a timestamp

    •  Each value in the map is an uninterpreted array of bytes.

    •  (row:string, column:string, time:int64) → string
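
The mapping (row:string, column:string, time:int64) → string can be imitated with a sorted list of keys. This is a single-process toy, not Bigtable's storage format: the class `TinyTable` and its methods are assumptions, and timestamps are negated so that newer versions of a cell sort first.

```python
import bisect


class TinyTable:
    """In-memory sorted map indexed by (row, column, timestamp)."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # same key -> uninterpreted bytes

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, -timestamp)  # negate: newest version first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before timestamp (newest overall if None)."""
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect.bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, row_prefix):
        """Yield (row, column, timestamp, value) for rows with the prefix."""
        start = bisect.bisect_left(self._keys, (row_prefix,))
        for key in self._keys[start:]:
            if not key[0].startswith(row_prefix):
                break
            yield key[0], key[1], -key[2], self._cells[key]
```

Because the keys are kept sorted, all versions of one cell and all columns of one row are adjacent, which is what makes reading a row or a range of rows cheap.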

Google’s Bigtable

•  It is used by over sixty projects at Google as of 2006, including:

    •  Web indexing

    •  Google Earth

    •  Google Analytics

    •  Orkut

    •  Google Docs

Google’s Bigtable, Data Model

•  Store CNN Web pages

•  Row name is the reversed URL

•  Contents column family contains the page contents

•  Anchor column family contains the text of any anchors that reference the page

[Figure labels: Row, Column Family]

Bigtable: A Distributed Storage System for Structured Data. November 2006.

http://labs.google.com/papers/bigtable-osdi06.pdf

Google’s Bigtable, Data Model

•  CNN’s home page is referenced by both the Sports Illustrated and the MY-look home pages.

•  The row contains columns named anchor:cnnsi.com and anchor:my.look.ca.

•  t3: timestamp

[Figure labels: Row, Column Family]

Bigtable: A Distributed Storage System for Structured Data. November 2006.

http://labs.google.com/papers/bigtable-osdi06.pdf

Google’s Bigtable, Data Model

Tablet, rows from the same domain:

    com.google.docs

    com.google.mail

    com.google.play

Tablet, lexicographic order
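
Why reversed host names keep one domain's rows together can be shown with a short sketch. The helper `row_key` is an assumption for illustration and handles only the host name, not a full URL with a path.

```python
def row_key(hostname):
    """Reverse the dot-separated parts: docs.google.com -> com.google.docs."""
    return ".".join(reversed(hostname.split(".")))


hosts = ["mail.google.com", "www.cnn.com", "docs.google.com", "play.google.com"]

# Sorting by row key places all google.com hosts next to each other,
# so they are likely to fall into the same tablet.
ordered = sorted(row_key(host) for host in hosts)
```

Sorted by the plain host name, `www.cnn.com` would sit between or after the Google hosts depending on the subdomain; sorted by row key, the three `com.google.*` rows form one contiguous range.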

Google’s Bigtable, Data Model

•  Notes

    •  Has no fixed number of rows or columns

    •  Every value also has an associated timestamp

    •  Each value is addressed by the triple (domain-name, column-name, timestamp)

Google’s Bigtable, Query Model

•  Writing to table

Google’s Bigtable, Query Model

•  Reading from table

Google’s Bigtable, More

•  Example with Eclipse: http://www.kobu.com/appeng/index-en.htm

•  Bigtable as a web service: http://bigtable.appspot.com/

•  Performance and benchmarking: Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf

Other Bigtable NoSQL tools

Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables

Document Databases

•  Storing, retrieving, and managing document-oriented, or semi-structured, data

•  Documents encapsulate and encode data (or information) in some standard formats or encodings.

•  Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).

Wikipedia: http://en.wikipedia.org/wiki/Document-oriented_database

CouchDB

•  Distributed Database System

•  Previously, each document was saved as XML

•  JavaScript functions select and aggregate documents (JSON is used for serialization)

•  Current Release: 1.2 (April 2012)

•  Started in 2005

•  Initiator: Damien Katz

CouchDB, Overview

•  Implemented in Erlang

•  Erlang

    •  Functional language

    •  It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications.

    •  Code example

        fac(0) -> 1;
        fac(N) when N > 0, is_integer(N) -> N * fac(N-1).

CouchDB, Overview

•  Documents consist of named fields

    •  Each field has a key/name and a value.

    •  A field name has to be unique within a document

    •  A value may be a string (of arbitrary length), number, boolean, date, an ordered list or an associative map; a document can refer to another document

•  Example, wiki article (document):

    •  "Title": "CouchDB",

    •  "Last editor": "172.5.123.91",

    •  "Last modified": "9/23/2010",

    •  "Categories": ["Database", "NoSQL", "Document Database"],

    •  "Body": "CouchDB is a ...",

    •  "Reviewed": false

CouchDB, Overview

•  Each document has an id: a 128-bit value

•  Version number: a 32-bit value

•  B-trees index the documents (id, version, some metadata)

CouchDB

•  CouchDB uses a B-tree storage engine for all internal data, documents, and views.

•  Using MapReduce views, looking up a key or a key range has complexity O(log N)

Source: CouchDB: The Definitive Guide, O’Reilly, Anderson, Lehnardt & Slater

CouchDB, Revisions

•  What if you want to change a field in a specific document?

    •  Load the document

    •  Change it in JSON, or change the corresponding object in your programming language

    •  To update or delete a document, CouchDB expects you to include its _rev

    •  When CouchDB confirms the change, it generates a new _rev

•  This revision system is also called Multi-Version Concurrency Control (MVCC)
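
The load, change, save-with-_rev cycle can be imitated in memory. The class `TinyDocStore` and the exception `Conflict` are hypothetical, and the revision string (a generation number followed by a random hex suffix) is an assumption made for this sketch rather than CouchDB's actual revision algorithm.

```python
import uuid


class Conflict(Exception):
    """Raised when an update does not carry the current _rev."""


class TinyDocStore:
    def __init__(self):
        self._docs = {}  # _id -> stored document including _rev

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # copy, so callers edit freely

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            # Someone else saved a newer revision in the meantime.
            raise Conflict(doc["_id"])
        if current is None:
            generation = 1
        else:
            generation = int(current["_rev"].split("-")[0]) + 1
        stored = dict(doc, _rev=f"{generation}-{uuid.uuid4().hex}")
        self._docs[doc["_id"]] = stored
        return stored["_rev"]
```

No lock is held between loading and saving. A writer that still holds an outdated `_rev` is simply rejected and has to reload the document and apply its change again.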

CouchDB, Locking Mechanism

•  Multi-Version Concurrency Control (MVCC)

•  Documents in CouchDB are versioned much as they would be in a version control system such as Subversion

Source: CouchDB: The Definitive Guide, O’Reilly, Anderson, Lehnardt & Slater

CouchDB, Views

    {
      "_id": "hello-world",
      "_rev": "43FBA4E7AB",
      "title": "Hello World",
      "body": "Well hello and welcome to my new blog...",
      "date": "2009/01/15 15:52:20"
    }

    {
      "_id": "bought-a-cat",
      "_rev": "4A3BBEE711",
      "title": "Bought a Cat",
      "body": "I went to the pet store earlier and brought home a little kitty...",
      "date": "2009/02/17 21:13:39"
    }

    function(doc) {
      if (doc.date && doc.title) {
        emit(doc.date, doc.title);
      }
    }
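
What the map function above produces can be imitated outside CouchDB. The following Python sketch is an emulation under stated assumptions, not CouchDB's view engine: `map_doc` mirrors the JavaScript function, `build_view` and `query` are hypothetical helpers, and the third document without title and date is invented to show that it is skipped.

```python
docs = [
    {"_id": "hello-world", "title": "Hello World",
     "date": "2009/01/15 15:52:20"},
    {"_id": "bought-a-cat", "title": "Bought a Cat",
     "date": "2009/02/17 21:13:39"},
    {"_id": "draft", "body": "no title or date yet"},
]


def map_doc(doc):
    """Python counterpart of the map function: emit(date, title)."""
    if doc.get("date") and doc.get("title"):
        yield doc["date"], doc["title"]


def build_view(docs, map_fn):
    """Run the map function over every document and sort rows by key."""
    rows = [{"id": doc["_id"], "key": key, "value": value}
            for doc in docs
            for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: (row["key"], row["id"]))


def query(view, startkey=None, endkey=None):
    """Return the rows whose key lies in [startkey, endkey]."""
    return [row for row in view
            if (startkey is None or row["key"] >= startkey)
            and (endkey is None or row["key"] <= endkey)]


view = build_view(docs, map_doc)
```

The view rows are ordered by the emitted key, here the date string, so a query for posts from February 2009 onwards is a range over adjacent rows. CouchDB stores these rows in a B-tree; the sorted list here only stands in for that ordering.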

CouchDB, Attachment

•  CouchDB documents can have attachments just like an email message can have attachments.

•  An attachment is identified by

    •  Name

    •  MIME type (or Content-Type), any data

    •  Number of bytes the attachment contains.

•  Example:

    •  curl -vX PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 --data-binary @artwork.jpg -H "Content-Type: image/jpg"

•  Retrieve attachment:

    •  http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg

CouchDB, Replication

•  CouchDB replication is a mechanism to synchronize databases.

•  Replication synchronizes two databases locally or remotely.

CouchDB, Replication

•  Create the target database (it is not created automatically)

    •  curl -X PUT http://127.0.0.1:5984/albums-replica

•  Perform replication:

    •  curl -vX POST http://127.0.0.1:5984/_replicate -d '{"source":"albums","target":"albums-replica"}'

•  What we did is local replication; it is useful for backups or to enable rollback

•  It is important to note that replication replicates the database only as it was at the point in time when replication was started.

Other Document Database tools

•  MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database. Written in C++,

Graph Database

http://www.herr-rau.de/wordpress/2006/06/your-website-as-a-graph.htm

Graph Databases

•  A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary [Wikipedia].

Graph Databases

Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, Publication date: February 2008. Renzo Angles and Claudio Gutierrez, University of Chile

Graph Databases, Data model properties

•  Graph databases are often faster for associative data sets

•  They scale more naturally to large data sets as they do not typically require expensive join operations.

•  As they depend less on a rigid schema, they are more suitable to manage ad-hoc and changing data with evolving schemas.

•  Graph databases are a powerful tool for graph-like queries

    •  Computing the shortest path between two nodes in the graph.

    •  Other graph-like queries can be performed over a graph database in a natural way (for example graph's diameter computations or community detection).
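
A shortest-path query of the kind mentioned above can be sketched with a breadth-first search over an adjacency list. The dict-based graph and the names in it are made up for this example; a graph database would follow its stored relationships instead of looking nodes up in a dict.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path."""
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is not reachable from start


knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}
```

Each step only follows the edges of the node currently being visited, which is the access pattern that index-free adjacency is meant to make cheap. In a relational schema the same query would need one self-join per hop.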

Graph Databases, Neo4j

•  Neo4j is an open-source graph database, implemented in Java.

•  The developers describe Neo4j as an "embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables".

•  Neo4j version 1.0 was released in February 2010.

•  Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden.

Neo4j, Node & Relation

•  A Graph contains Nodes and Relationships

•  “A Graph --records data in--> Nodes --which have--> Properties”

•  “Nodes --are organized by--> Relationships --which also have--> Properties”

Neo4j, Traversal

•  Query a Graph with a Traversal

•  “A Traversal --navigates--> a Graph; it --identifies--> Paths --which order--> Nodes”

•  A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like “what music do my friends like that I don’t yet own,” or “if this power supply goes down, what web services are affected?”

Neo4j, Indexes

•  Indexes look up Nodes or Relationships

•  “An Index --maps from--> Properties --to either--> Nodes or Relationships”

•  Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like “find the Account for username master-of-graphs.”

Neo4j, Database

•  Neo4j is a Graph Database

•  “A Graph Database --manages a--> Graph and --also manages related--> Indexes”

Neo4j

Helloworld example

    firstNode = graphDb.createNode();
    firstNode.setProperty( "message", "Hello, " );
    secondNode = graphDb.createNode();
    secondNode.setProperty( "message", "World!" );
    relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
    relationship.setProperty( "message", "brave Neo4j " );

Neo4j & Java & Eclipse

Tutorial: http://technoracle.blogspot.de/2012/05/third-neo4j-tutorial-getting-started.html

•  import org.neo4j.graphdb.GraphDatabaseService;

•  DB_PATH = "/Users/neo4j-1.8"

•  GraphDatabaseService graphDb;

•  Node myFirstNode;

•  Relationship myRelationship;

•  graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );

•  myFirstNode = graphDb.createNode();

•  myFirstNode.setProperty( "name", "Duane Nickull, I Braineater" );

•  mySecondNode = graphDb.createNode();

•  mySecondNode.setProperty( "name", "Randy Rampage, Annihilator" );

•  myRelationship = myFirstNode.createRelationshipTo( mySecondNode, RelTypes.KNOWS );

•  myRelationship.setProperty( "relationship-type", "knows" );

Other Graph Database tools

•  BigData RDF

•  SPARQL

•  RDFS+ inference

NoSQL, BASE

•  NoSQL is characterized by BASE:

    •  Basically Available: Use replication to reduce the likelihood of data unavailability and use sharding, or partitioning the data among many different storage servers, to make any remaining failures partial. The result is a system that is always available, even if subsets of the data become unavailable for short periods of time.

    •  Soft state: While ACID systems assume that data consistency is a hard requirement, NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers.

    •  Eventually consistent: Although applications must deal with a lack of instantaneous consistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state. In contrast to ACID systems that enforce consistency at transaction commit, NoSQL guarantees consistency only at some undefined future time.

ACID vs. BASE

noSQL Databases, Prof. Walter Kriha, Stuttgart Media University

Statistics

•  The worldwide NoSQL market is expected to reach $3.4 billion by 2018 at a CAGR of 21% between 2013 and 2018. The NoSQL market will generate $14 billion in revenues over the period 2013 – 2018.

•  CAGR: Compound annual growth rate

    •  V(t0): start value, V(tn): finish value,

    •  tn - t0: number of years.

Resource: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
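
For reference, the standard definition of the compound annual growth rate in terms of the symbols listed above is given below. The worked figure that follows is a back-of-the-envelope reading of the quoted forecast and an assumption of this note, not a number taken from the market report.

```latex
\[
  \mathrm{CAGR}(t_0, t_n) = \left( \frac{V(t_n)}{V(t_0)} \right)^{\frac{1}{t_n - t_0}} - 1
\]

% Rearranged for the start value, with CAGR = 0.21 and t_n - t_0 = 5 years:
\[
  V(t_0) = \frac{V(t_n)}{(1 + \mathrm{CAGR})^{\,t_n - t_0}}
         = \frac{3.4}{1.21^{5}} \approx \frac{3.4}{2.594} \approx 1.31
\]
```

Read this way, a market of $3.4 billion in 2018 growing at 21% per year would correspond to roughly $1.31 billion in 2013.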

When to use?

[Figure: Size plotted against Complexity for Key-Value, Bigtable, Doc-DB and GraphDB. From neo4j]

When to use?

http://paolodedios.com/blog/2010/5/19/the-visual-guide-to-nosql-systems.html

Who uses NoSQL

•  FlockDB

•  Dynamo

•  Cassandra

•  Bigtable

Resources

http://www.stu-dentdiaries.com/2010_05_01_archive.html

Resources, Books

Papers

1.  DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; Vogels, Werner: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.

2.  Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf

3.  Angles, Renzo; Gutierrez, Claudio (University of Chile): Survey of Graph Database Models. ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008.

4.  Dominguez-Sal, D.; Urbón-Bayes, P.; Giménez-Vañó, A.; Gómez-Villamor, S.; Martínez-Bazán, N.; Larriba-Pey, J.L. (Universitat Politècnica de Catalunya): Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark. 2010.

5.  Vicknair, Chad; Macias, Michael: A Comparison of a Graph Database and a Relational Database, A Data Provenance Perspective. ACMSE ’10, April 15-17, 2010, Oxford, MS, USA.

6.  Stephens, Bradford: HBase vs. Cassandra: NoSQL Battle!, 2009. http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/comment-page-1/, last accessed February 2011.

7.  Sha, Qian: On-line Project Management System. Bachelor of Economics, Capital University of Economics and Business, 2003.

8.  Leavitt, Neal: Will NoSQL Databases Live Up to Their Promise? 2010.

9.  Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M.; Lewin, D.: Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (El Paso, Texas, United States, May 04 - 06, 1997). STOC '97. ACM Press, New York, NY, 654-663.

10. Lamport, L.: Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7), pp. 558-565, 1978.

11. Allavena, André; Demers, Alan; Hopcroft, John E.: Correctness of a Gossip Based Membership Protocol. NY 2005, ACM 1-58113-994-2/05/0007.

Resources, Web links

•  Introduction to data structures for GraphDB, Shunya Kimura: http://www.slideshare.net/skimura/graphdatabase-data-structure

•  Compare NoSQL databases: http://nosql.findthebest.com/

•  Oracle White Paper, Sep. 2011: Oracle NoSQL Database

•  CouchDB: http://www.couchbase.com/

•  Open source implementation of Bigtable: HBase, http://hbase.apache.org/

•  http://www.db-class.org/course/video/preview_list (Stanford University)

•  http://technirvanaa.wordpress.com/tag/nosql-disadvantages/ (March 2011)

•  http://www.kavistechnology.com/blog/?p=1577 (March 2010)

•  http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012 (Survey 2012)

•  http://www.couchbase.com/why-nosql/nosql-database

•  CouchDB wiki: http://wiki.apache.org/couchdb/

•  http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (Very good)

•  http://neo4j.org/

•  http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html

•  Neo4j documentation: http://components.neo4j.org/neo4j/1.8.M05/apidocs/

•  SQL Databases v. noSQL Databases, Michael Stonebraker, MIT, 2010

Do you want to know more?

•  What The Heck Are You Actually Using NoSQL For? http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html

•  Nice tutorials for CouchDB: http://couchapp.org/page/videos

CouchDB, Example

•  Download CouchDB from: http://couchdb.apache.org/

•  Example source: CouchDB: The Definitive Guide, O’Reilly, Anderson, Lehnardt & Slater (http://guide.couchdb.org/draft/tour.html#figure/4)

•  Go to http://127.0.0.1:5984/
