Bristol Uni - Use Cases of NoSQL

S Q L &
N O S Q L
D a v i d S i m o n s
@ S w a m W i t h Tu r t l e s

W H O A M I ?
• Tech Lead/Consultant at
Softwire
• Background in Statistics &
Computer Simulation

W H AT D O W E D O ?
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty

W H AT D O W E D O ?
• Architecture
• Development
• QA
• Warranty
What problems
are we solving?
How do we solve them?
Solving them now!
Are they still solving
the problem?

T O D AY W E ’ R E G O I N G T O TA L K
A B O U T
• Architecture
• Development
• QA
• Warranty

H O W T O D O A R C H I T E C T U R E
E V O LV I N G
D E S I G N
U P - F R O N T
D E C I S I O N
M A K I N G

T O D AY…
• Part 1: Looking at some
SQL & Database Theory
• Part 2: Looking at a lot of
NoSQL databases

W H AT I S A D ATA B A S E ?
PA R T 1 : T H E O RY

- U N I V E R S I T Y O F G E O R G I A
“A database is a collection of information
organized to provide efficient retrieval.”

T H E M Y T H I C A L D ATA B A S E D I V I D E
S Q L
N O S Q L

T H E M Y T H I C A L
D ATA B A S E D I V I D E
• NoSQL (apparently) has
always meant Not Only
SQL
• Covers the wide range of
databases that don’t
meet the SQL Standard

T H E S Q L S TA N D A R D

H I S T O RY
• First defined by ANSI in
1986 (though around
before then)
• Structured Query
Language
• Different databases have
implemented this standard
way of storing, inserting
and retrieving data

E X A M P L E S O F
S Q L D ATA B A S E S
• MySQL
• Microsoft SQL Server
• Oracle
• PostgreSQL (mostly)
• IBM DB2 and more…

W H AT ’ S I N T H E
S TA N D A R D ?
• Rules for how the
language works
• No opinion as to what the
database looks like

B U T…
• ‘SQL’ has come to mean a
lot more than the
language (especially in the
context of NoSQL)
• Family of RDBMS
databases that follow a set
of rules

W H AT ’ S I N A N
R D B M S ?
• Prescriptive Schema
• Set-based Operations
• Table-driven &
Normalised
• ACID Transactions

S E T- B A S E D
O P E R AT I O N
R E A D D A TA O U T W I T H

E V E RY R O W I S A “ T H I N G ”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess

“ W H E R E ” ( I N T E R S E C T I O N )
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess

U N I O N S
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
5 Nemo
6 Moby Dick
7 Wanda
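
The set-based reads on the last few slides can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the talk's own example: the table names (`pets`, `sea_creatures`) and the species values are assumptions, since only the names survive from the slides.

```python
import sqlite3

# In-memory database. Table names and species values are assumptions for
# illustration; only the pet names come from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, species TEXT)")
conn.execute("CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(1, "Puss", "cat"), (2, "Dinah", "cat"), (3, "Einstein", "dog"), (4, "Jess", "cat")],
)
conn.executemany(
    "INSERT INTO sea_creatures VALUES (?, ?, ?)",
    [(5, "Nemo", "fish"), (6, "Moby Dick", "whale"), (7, "Wanda", "fish")],
)

# WHERE: keep only the rows of the set that satisfy a condition.
cats = [name for (name,) in conn.execute(
    "SELECT name FROM pets WHERE species = 'cat' ORDER BY id")]

# UNION: combine the rows of two sets into one result.
everyone = [name for (_id, name) in conn.execute(
    "SELECT id, name FROM pets UNION SELECT id, name FROM sea_creatures ORDER BY id")]

print(cats)
print(everyone)
```

Both queries describe the whole result set in one statement, with no row-by-row cursor loop.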

– R O N E R N E S T
( & T H E S Q L C O M M U N I T Y AT L A R G E )
“Cursors are evil.”

N O R M A L
F O R M S

J O I N S
Name Species
Species Coolness
Rating
1 Puss 0
2 Dinah 0
3 Einstein 10
4 Jess 0

R E L AT I O N S
B E T W E E N D ATA
• We don’t like
duplicating data
• Goes out of sync
• May not be the
same everywhere

R E L AT I O N S
B E T W E E N D ATA
• Objects have
properties that come in
groups
• For example:
Landmarks have cities
and countries.
• The same city will
always have the same
country

W E S O LV E
T H AT W I T H …
• Normalisation
• Store each linked group
as its own row in a
separate table
• And store pointers to
that table
• These are combined
by query-time joins

Name Species
Species Coolness
1 Puss
2 Dinah
3 Einstein
4 Jess
Species
Coolness
Rating
1 0
2 10
J O I N S
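
A minimal sketch of the same idea in Python with sqlite3: the coolness rating lives once per species, pets point at it, and a join recombines them at query time. The species names ("cat", "dog") and column names are assumptions; the ratings 0 and 10 are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per species: the rating is stored exactly once.
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, coolness INTEGER)")
# Pets store a pointer (species_id) instead of a copy of the rating.
conn.execute(
    "CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, "
    "species_id INTEGER REFERENCES species(id))"
)
conn.executemany("INSERT INTO species VALUES (?, ?, ?)", [(1, "cat", 0), (2, "dog", 10)])
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(1, "Puss", 1), (2, "Dinah", 1), (3, "Einstein", 2), (4, "Jess", 1)],
)

JOIN_QUERY = (
    "SELECT pets.name, species.coolness FROM pets "
    "JOIN species ON pets.species_id = species.id ORDER BY pets.id"
)
before = conn.execute(JOIN_QUERY).fetchall()

# Changing the rating in one place updates every pet of that species.
conn.execute("UPDATE species SET coolness = 1 WHERE name = 'cat'")
after = conn.execute(JOIN_QUERY).fetchall()

print(before)
print(after)
```

Because the rating is never duplicated, it cannot go out of sync between rows.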

T R A N S A C T I O N S
W R I T E D A TA I N W I T H

– J O H N N Y A P P L E S E E D
“A unit of work you want to treat as a whole”

Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess

The database is always in a valid state, as defined
by a whole number of queries
regardless of:
(1) invalid data;
(2) concurrent requests;
(3) system failures

A C I D
• Atomicity
• Consistency
• Isolation
• Durability
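
A small sketch of atomicity and consistency, using sqlite3 from the Python standard library. The table and the NOT NULL constraint are assumptions chosen for illustration: the second insert in the transaction is invalid, so the first insert in the same unit of work is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

with conn:  # commits when the block succeeds
    conn.execute("INSERT INTO pets VALUES (1, 'Puss')")

try:
    with conn:  # rolls back if an exception escapes the block
        conn.execute("INSERT INTO pets VALUES (2, 'Dinah')")
        conn.execute("INSERT INTO pets VALUES (3, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    rolled_back = True
else:
    rolled_back = False

# 'Dinah' is gone as well: the unit of work was treated as a whole.
names = [name for (name,) in conn.execute("SELECT name FROM pets ORDER BY id")]
print(rolled_back, names)
```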

C A PA C I T Y &
S C A L A B I L I T Y

A S K I N G A
S Y S T E M T O D O
S O M E T H I N G
U S E S R E S O U R C E S

W H AT H A P P E N S
A S M O R E
R E Q U E S T S
C O M E I N ?

S Q L I S P R E T T Y
G O O D F O R
L A R G E A M O U N T S
O F D ATA
T R U T H F U L LY

W I T H E N O U G H
D ATA , Y O U
H AV E T O S C A L E
T H E H A R D T R U T H

Y O U R C U R R E N T S Y S T E M
D ATA B A S E A P P L I C AT I O N
U S E R S

A S I T G R O W S
D ATA B A S E A P P L I C AT I O N
U S E R S

H O R I Z O N TA L S C A L A B I L I T Y
D ATA B A S E
A P P L I C AT I O N
U S E R S
D ATA B A S E
D ATA B A S E

V E R T I C A L S C A L A B I L I T Y
M O R E P O W E R F U L
D ATA B A S E
A P P L I C AT I O N
U S E R S

S Q L C A N
S C A L E …
T H E H A R D T R U T H

S Q L C A N S C A L E V E R T I C A L LY

A N D …
• Scaling to meet the
needs of read operations
is very doable
• Master-Slave replication
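
As a rough sketch of why reads scale this way: every write goes to one primary, and reads are spread over copies. The `ReplicatedStore` class below is hypothetical and keeps everything in memory; real replication is asynchronous over a network, which the explicit `replicate()` step only imitates.

```python
import itertools


class ReplicatedStore:
    """Toy primary/replica store: one node takes writes, replicas serve reads."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go through the single primary.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Copy outstanding writes to every replica, in order.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def read(self, key):
        # Reads are spread round-robin across the replicas.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore()
store.write("pet", "Cat")
stale = store.read("pet")        # replicas have not caught up yet
store.replicate()
fresh = [store.read("pet"), store.read("pet")]
print(stale, fresh)
```

Adding replicas adds read capacity, but every write still passes through the one primary.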

B U T…
• Scaling writes is
problematic
• How do atomic
transactions work on a
scaled database?
• How can SQL enforce
constraints across
multiple databases?

- J O E R I S E B R A C H T S
“To scale up write operations or the number of
nodes in a cluster beyond a certain point you have
to be able to relax some of the ACID requirements”

T H E C A P T H E O R E M

T H E C O S T O F
S C A L I N G
• You become vulnerable
to network failures

C A P T H E O R E M
• Choose Two:
• Consistency
• Availability
• Partition Tolerance
• WARNING: These have
specific definitions

P R O V I S O
There is a lot of thought in this area.
I am giving a simplified description
that would make many database people
pull their hair out.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html

C A P T H E O R E M
CP AP
Consistent
& Partition Tolerant
Available
& Partition Tolerant

C A P T H E O R E M
A
BC
Data = “Cat”
Data = “Cat”
Data = “Cat”

C A P T H E O R E M
A
BC
Data = “Cat”
Data = “Dog”
Data = “Cat”

C A P T H E O R E M
A
BC
Data = “Dog”
Data = “Dog”
Data = “Dog”

C A P T H E O R E M
A
BC
Data = “Dog”
Data = “Dog” Data = “Dog”

AVA I L A B L E ( “ A P ” ) S Y S T E M S
A
BC
Data = “Wolf”

AVA I L A B L E ( “ A P ” ) S Y S T E M S
A
BC
Data = “Wolf”
Data = “Dog” Data = “Wolf”

C O N S I S T E N T ( “ C P ” ) S Y S T E M
A
BC
Data = “Dog”

C O N S I S T E N T ( “ C P ” ) S Y S T E M
A
BC
Data = “Wolf”
Data = “Dog” Data = “Wolf”
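
The difference between the two behaviours can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: the `Cluster` class is hypothetical, node B is the one cut off, and the "CP" mode uses a simple majority rule, which is only one of several ways a real system might protect consistency.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Cluster:
    def __init__(self, mode, names=("A", "B", "C"), value="Dog"):
        self.mode = mode                      # "AP" or "CP"
        self.data = {name: value for name in names}
        self.cut_off = set()                  # nodes isolated by a partition

    def partition(self, name):
        self.cut_off.add(name)

    def _reachable(self, via):
        # A cut-off node can only see itself; the rest can see each other.
        if via in self.cut_off:
            return [via]
        return [n for n in self.data if n not in self.cut_off]

    def _check(self, via):
        peers = self._reachable(via)
        has_majority = len(peers) > len(self.data) // 2
        if self.mode == "CP" and not has_majority:
            raise Unavailable(f"node {via} cannot reach a majority")
        return peers

    def write(self, via, value):
        for name in self._check(via):
            self.data[name] = value

    def read(self, via):
        self._check(via)
        return self.data[via]


def demo(mode):
    cluster = Cluster(mode)
    cluster.partition("B")           # B loses contact with A and C
    cluster.write("A", "Wolf")       # the write reaches A and C only
    try:
        return cluster.read("A"), cluster.read("B")
    except Unavailable:
        return cluster.read("A"), None


print(demo("AP"))  # B still answers, but with the old value
print(demo("CP"))  # B refuses to answer rather than return stale data
```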

part 1 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Databases store data in an accessible way
• SQL databases meet a defined standard; NoSQL is a
movement towards considering databases that don’t
• SQL uses tables and schemas to store data, and acts on it like
sets in a transactional way.

I N C O N S I S T E N T
D ATA B A S E S
PA R T 2 : E X A M P L E S

T H E R E ’ S A L O T
O F VA L U E I N
C O N S I S T E N C Y…

– D Y N A M O : A M A Z O N ’ S H I G H LY AVA I L A B L E K E Y- VA L U E
S T O R E
“Reliability at massive scale is one of the biggest
challenges we face at Amazon.com. Even the
slightest outage has significant financial
consequences and impacts customer trust.”

– D Y N A M O : A M A Z O N ’ S H I G H LY AVA I L A B L E K E Y- VA L U E
S T O R E
“Dynamo targets applications that operate with
weaker consistency if this results in high
availability.”

D Y N A M O I M P L E M E N TAT I O N S

N O T
G U A R A N T E E D
C O N S I S T E N C Y
T H E C O S T ?

A M A Z O N
S H O P P I N G
I S T H A T H O N E S T LY O K A Y ?

S M S H I S T O R I C
L O G
I S T H A T H O N E S T LY O K A Y ?

W E U S E D …

C A S S A N D R A
• All nodes communicate
with each other through a
Gossip protocol similar to
Dynamo and Riak,
exchanging information
about themselves and
other nodes they have
gossiped with.
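
A toy sketch of the gossip idea in Python. It is not Cassandra's implementation: the `run_gossip` function, the heartbeat counters and the random peer choice are simplifying assumptions. Each round, every node picks one peer and the two merge what they know about themselves and about everyone they have heard of.

```python
import random


def run_gossip(node_names, seed=0, max_rounds=100):
    rng = random.Random(seed)
    # Each node starts out knowing only its own heartbeat counter.
    views = {name: {name: 0} for name in node_names}
    for round_number in range(1, max_rounds + 1):
        for name in node_names:
            views[name][name] += 1  # bump own heartbeat
            peer = rng.choice([n for n in node_names if n != name])
            # Merge both views, keeping the highest heartbeat seen per node.
            merged = dict(views[name])
            for other, beat in views[peer].items():
                merged[other] = max(merged.get(other, -1), beat)
            views[name] = dict(merged)
            views[peer] = dict(merged)
        # Stop once every node has heard of every other node.
        if all(set(view) == set(node_names) for view in views.values()):
            return views, round_number
    return views, None


nodes = ["A", "B", "C", "D", "E"]
views, rounds = run_gossip(nodes)
print(rounds, sorted(views["A"]))
```

No node is special here: membership information spreads without a central coordinator.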

C A S S A N D R A
No single point of failure

W H Y
C A S S A N D R A
• We needed fast, highly
available writes
• Data didn’t need to be real
time - it was aggregate
analytics so eventually
consistent was enough.

C A S S A N D R A :
T H E C O N ’ S
• Data is only eventually
consistent - so if you need
100% accuracy it’s not
great
• Not as wide a range of
support as SQL (but
nothing has)
• Flexible schema makes it
harder to integrate with
OO languages

C A S S A N D R A :
T H E P R O ’ S
• Very fast write throughput
• SQL-like query language
so you don’t need to
relearn things
• Wide range of language
drivers
• Highly available

H I G H LY R E L AT I O N A L
D ATA

W H AT S Q L
D O E S W E L L
• Modelling objects:
• With a fixed structure
and shape
• With a limited number of
relations
• With no opinion of
any deeper
underlying domain
R D B M S
( R E L AT I O N A L D ATA B A S E
M A N A G E M E N T S Y S T E M )

T H E R E A R E
P R O B L E M S T H I S
I S B A D F O R
B U T …

K E V I N B A C O N
S I X D E G R E E S O F …

W O R L D ’ S L E A D I N G G R A P H D B :

"embedded, disk-based, fully transactional Java
persistence engine that stores data structured in
graphs rather than in tables"

D ATA
S T O R A G E
• Nodes and edges are all:
• Stored as first-class
objects on the file system
• “typed”
• Key-value stores

D ATA I N T H E
R E L AT I O N S
• “Joins” are first class
objects in the database
that can be queried at no
additional cost
• Certain queries become
trivial (e.g. Joins)
• At a cost: high write-time
cost
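
The "six degrees" question is a shortest-path traversal over relationships. The sketch below is plain Python, not Neo4j's API or query language, and the co-star data is made up ("Actor A" and friends are placeholders). It shows the shape of the query: follow stored relationships hop by hop instead of joining a table to itself once per hop.

```python
from collections import deque


def build_graph(edges):
    """Store each relationship in both directions, keyed by node."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Placeholder data: who has appeared in a film with whom.
co_stars = build_graph([
    ("Kevin Bacon", "Actor A"),
    ("Actor A", "Actor B"),
    ("Actor B", "Actor C"),
    ("Actor D", "Actor E"),
])

print(degrees_of_separation(co_stars, "Actor C", "Kevin Bacon"))
print(degrees_of_separation(co_stars, "Actor D", "Kevin Bacon"))
```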

P R O T O T Y P I N G
• Easy to see and work with
data
• Schemaless
• Active community with a
lot of libraries

N E O 4 J : T H E
C O N ’ S
• More expensive writes to
the database
• Not scalable
• Less mature tooling
(especially in non-Java
ecosystems)

N E O 4 J : T H E
P R O ’ S
• Models certain data
models very well
• Avoids costly queries
over lots of data
• Schemalessness allows for
fast prototyping and
flexible data models
• Commercial buy-in means
language support is not far
behind

S C H E M A L E S S N E S S

NB: MongoDB claims there are a lot
of use cases; we’re only covering this one

M O N G O D B :
T H E C O N ’ S
• Mongo was the first
famous NoSQL database
and got used before it was
tested and mature. There
are lots of articles about
missing features and bugs
• Schemalessness makes
data integrity checks and
OO language integration
tricky

M O N G O D B :
T H E P R O ’ S
• Schemalessness - if you
want flexible data models
• People have used it for a
while, and so library
support is not bad

H O W D O Y O U R E T R I E V E
Y O U R D ATA

D O C U M E N T
S T O R E
ElasticSearch

E V E RY R O W I S A “ T H I N G ”
N A M E = P U S S
C O O L N E S S = 0
N A M E = J E S S
C O O L N E S S = 0
N A M E = D I N A H
C O O L N E S S = 0
N A M E = E I N S T E I N
C O O L N E S S = 1 0
D O C U M E N T

“Apache Lucene is a high-performance, full-
featured text search engine library … It is a
technology suitable for nearly any application that
requires full-text search”

F O C U S E D
A R O U N D
T E X T
S E A R C H I N G
Q U E R I E S

Q U E R I E S A R E
TA I L O R E D T O T H E
Q U E S T I O N S
Y O U ’ L L B E A S K I N G

{
"query": {
"match": {"hobbies": "skateboard"}
}
}

{
"query": {
"fuzzy": {"hobbies": "skateboarig"}
}
}

{
"query": {
"match_phrase": {"hobbies": "writing reddit comments"}
}
}
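
To show what "fuzzy" means in the query with the misspelt "skateboarig": Elasticsearch documents its fuzzy query as matching terms within a Levenshtein edit distance of the search term. The sketch below is plain Python and only illustrates the concept; the helper `fuzzy_matches`, the list of hobbies and the limit of 2 edits are assumptions, and a real search engine does this against its index rather than by scanning a list.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, indexed_terms, max_edits=2):
    """Hypothetical helper: terms within max_edits of the search term."""
    return [t for t in indexed_terms if edit_distance(term, t) <= max_edits]


hobbies = ["skateboarding", "snowboarding", "writing"]
matches = fuzzy_matches("skateboarig", hobbies)
print(matches)
```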

W H AT C O N S U M E S Y O U R D ATA ?
E N D U S E R What is the average age of …?

W H AT C O N S U M E S Y O U R D ATA ?
E N D U S E R
Er….
I think it was something like “Campbell”?

O U R C H O I C E I S
I N F O R M E D B Y
O U R P L A N S F O R
T H E A P P L I C AT I O N
R E M E M B E R T H A T

E L A S T I C S E A R C H :
T H E C O N ’ S
• It only does one thing
(even if it does it well)

E L A S T I C S E A R C H :
T H E P R O ’ S
• It has a lot of search related
queries built into it - fuzzy/
phonetic/sentence
matching
• A lot of people use this,
so support is mature
• Integration with a large
number of other languages
and frameworks - this is
the industry standard

W H E N I T G O E S W R O N G

S Q L : T H E C O N ’ S
• It’s very hard to scale writes
• It has a specific data model
- not every data domain
fits into it
• e.g. highly relational
models,
schemalessness
• Domain non-specific query
languages

S Q L : T H E P R O ’ S
• If a library exists for
anything, it exists for SQL
• ACID transactions make
everything easy
• Constraints and Schemas
allow for automated data
integrity checking
• Easy normalisation of
data

part 2 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Some sites are happy to sacrifice consistency for availability -
Dynamo is a design that databases can implement to achieve that
• If you’ll be doing lots of joins, Graph Databases such as Neo4j
improve performance
• Sometimes you want the flexibility to store any objects - there are a
range of schemaless databases available
• Consider what will retrieve your data, and ensure you have a
database efficient for your use case.

A N Y
Q U E S T I O N S ?
D a v i d S i m o n s
@ S w a m W i t h Tu r t l e s
