The document provides an overview of SQL and NoSQL databases. It begins with an introduction to SQL theory, including the SQL standard, relational database management systems (RDBMS), and ACID transactions. It discusses how SQL databases can scale vertically but face challenges scaling writes horizontally. The document then covers NoSQL databases, including Amazon's Dynamo design, Cassandra, Neo4j, MongoDB and Elasticsearch. It explains that some NoSQL databases trade consistency for availability, and discusses when different database types may be preferable depending on the use case and the shape of the data.
Bristol Uni - Use Cases of NoSQL
1. SQL & NOSQL
David Simons
@SwamWithTurtles
3. WHO AM I?
• Tech Lead/Consultant at Softwire
• Background in Statistics & Computer Simulation
4. WHAT DO WE DO?
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
5. WHAT DO WE DO?
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
What problems are we solving?
How do we solve them?
Solving them now!
Are they still solving the problem?
6. TODAY WE’RE GOING TO TALK ABOUT
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
7. HOW TO DO ARCHITECTURE
EVOLVING DESIGN
UP-FRONT DECISION MAKING
8. TODAY…
• Part 1: Looking at some SQL & Database Theory
• Part 2: Looking at a lot of NoSQL databases
9. WHAT IS A DATABASE?
PART 1: THEORY
10. “A database is a collection of information organized to provide efficient retrieval.”
- UNIVERSITY OF GEORGIA
11. THE MYTHICAL DATABASE DIVIDE
SQL | NOSQL
12. THE MYTHICAL DATABASE DIVIDE
• NoSQL (apparently) has always meant Not Only SQL
• Considering databases that don’t meet the SQL Standard, which covers a wide range of databases
13. THE SQL STANDARD
PART 1: THEORY
14. HISTORY
• First defined by ANSI in 1986 (though SQL was around before then)
• Structured Query Language
• Different databases have implemented this standard way of storing, inserting and retrieving data
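A minimal sketch of that standard way of storing, inserting and retrieving data, using SQLite through Python's built-in `sqlite3` module. The `pets` table and its columns are hypothetical; other SQL databases accept very similar statements, though each has its own dialect details.

```python
import sqlite3

# In-memory SQLite database; other SQL databases accept similar statements.
conn = sqlite3.connect(":memory:")

# Storing: declare a table with a fixed schema (hypothetical example table).
conn.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Inserting: parameterised INSERT statements.
conn.executemany(
    "INSERT INTO pets (id, name) VALUES (?, ?)",
    [(1, "Puss"), (2, "Dinah"), (3, "Einstein"), (4, "Jess")],
)

# Retrieving: a declarative SELECT says *what* we want, not how to fetch it.
rows = conn.execute("SELECT name FROM pets WHERE id > 2 ORDER BY id").fetchall()
print(rows)  # [('Einstein',), ('Jess',)]
```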
15. EXAMPLES OF SQL DATABASES
• MySQL
• Microsoft SQL Server
• Oracle
• PostgreSQL (mostly)
• IBM DB2 and more…
16. WHAT’S IN THE STANDARD?
• Rules for how the language works
• No opinion as to what the database looks like
17. BUT…
• ‘SQL’ has come to mean a lot more than the language (especially in the context of NoSQL)
• A family of RDBMS databases that follow a set of rules
18. WHAT’S IN AN RDBMS?
• Prescriptive Schema
• Set-based Operations
• Table-driven & Normalised
• ACID Transactions
27. JOINS
Tables: (Name, Species) and (Species, Coolness Rating)
#   Name       Coolness Rating
1   Puss       0
2   Dinah      0
3   Einstein   10
4   Jess       0
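The join on this slide can be sketched with SQLite from Python. The species labels 'Cat' and 'Dog' are assumptions for illustration only: the transcript keeps just the names and coolness ratings, so all we know is that Puss, Dinah and Jess share a species rated 0 and Einstein's species is rated 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each pet points at a species; each species carries a rating.
# The species names are hypothetical placeholders.
conn.executescript("""
    CREATE TABLE species (species TEXT PRIMARY KEY, coolness_rating INTEGER);
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT,
                       species TEXT REFERENCES species(species));
    INSERT INTO species VALUES ('Cat', 0), ('Dog', 10);
    INSERT INTO pets VALUES (1, 'Puss', 'Cat'), (2, 'Dinah', 'Cat'),
                            (3, 'Einstein', 'Dog'), (4, 'Jess', 'Cat');
""")

# The join combines the two tables at query time on the shared species column.
joined = conn.execute("""
    SELECT pets.id, pets.name, species.coolness_rating
    FROM pets JOIN species ON pets.species = species.species
    ORDER BY pets.id
""").fetchall()
for row in joined:
    print(row)  # (1, 'Puss', 0), (2, 'Dinah', 0), ...
```

Each coolness rating is stored once per species, not once per pet; the join reattaches it when we ask.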
28. RELATIONS BETWEEN DATA
• We don’t like duplicating data
• It goes out of sync
• It may not be the same everywhere
29. RELATIONS BETWEEN DATA
• Objects have properties that come in groups
• For example: landmarks have cities and countries.
• The same city will always have the same country
30. WE SOLVE THAT WITH…
• Normalisation
• Store each linked group as its own row in a separate table
• And store pointers (foreign keys) to that table
• These are combined by query-time joins
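A minimal sketch of this normalised layout, again using SQLite from Python. The table names and example landmarks are illustrative assumptions. Because each city's country is stored in exactly one row, correcting it is a single update that every joined query then sees, so the data cannot go out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each linked group (a city and its country) is its own row in a separate table;
# landmarks store only a pointer (foreign key) to that row.
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE landmarks (name TEXT, city_id INTEGER REFERENCES cities(id));
    INSERT INTO cities VALUES (1, 'Bristol', 'UK'), (2, 'Paris', 'France');
    INSERT INTO landmarks VALUES ('Clifton Suspension Bridge', 1),
                                 ('SS Great Britain', 1),
                                 ('Eiffel Tower', 2);
""")


def landmarks_with_location():
    # A query-time join reassembles each landmark with its city and country.
    return conn.execute("""
        SELECT l.name, c.city, c.country
        FROM landmarks l JOIN cities c ON l.city_id = c.id
        ORDER BY l.name
    """).fetchall()


# Renaming the country touches one row, yet both Bristol landmarks see the change.
conn.execute("UPDATE cities SET country = 'United Kingdom' WHERE city = 'Bristol'")
for row in landmarks_with_location():
    print(row)
```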
38. The database is always in a valid state, as defined by a whole number of queries, regardless of:
(1) invalid data;
(2) concurrent requests;
(3) system failures
42. ACID
• Atomicity
• Consistency
• Isolation
• Durability
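Atomicity is the easiest of the four to demonstrate. Here is a minimal sketch, assuming a hypothetical `accounts` table: a transfer is two updates, and if the second violates a constraint, the whole transaction is rolled back, so the database never exposes half a transfer. In Python's `sqlite3`, using the connection as a context manager commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is a consistency rule: balances may never go negative.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src, dst, amount):
    """Move money as one atomic unit: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            # Credit first, so a failing debit shows the rollback at work.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, and the credit to dst was undone too


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer("alice", "bob", 500), balances())  # False {'alice': 100, 'bob': 0}
print(transfer("alice", "bob", 30), balances())   # True {'alice': 70, 'bob': 30}
```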
44. CAPACITY & SCALABILITY
PART 1: THEORY
45. ASKING A SYSTEM TO DO SOMETHING USES RESOURCES
46. WHAT HAPPENS AS MORE REQUESTS COME IN?
47. SQL IS PRETTY GOOD FOR LARGE AMOUNTS OF DATA
TRUTHFULLY
48. WITH ENOUGH DATA, YOU HAVE TO SCALE
THE HARD TRUTH
49. YOUR CURRENT SYSTEM
[Diagram: USERS → APPLICATION → DATABASE]
50. AS IT GROWS
[Diagram: USERS → APPLICATION → DATABASE]
51. HORIZONTAL SCALABILITY
[Diagram: USERS → APPLICATION → three DATABASE nodes]
52. VERTICAL SCALABILITY
[Diagram: USERS → APPLICATION → one MORE POWERFUL DATABASE]
53. SQL CAN SCALE…
THE HARD TRUTH
55. AND…
• Scaling to meet the needs of read operations is very doable
• Master-Slave replication
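The read-scaling pattern can be sketched as a router that sends every write to the master and spreads reads across replicas. This is a toy in-memory model under stated assumptions: the `ReplicatedStore` class is hypothetical, and replication here is synchronous and instantaneous, which real master-slave setups (often asynchronous) do not guarantee.

```python
import itertools


class ReplicatedStore:
    """Toy master-slave setup: one writable master, several read-only replicas."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin over replicas so read load is spread evenly.
        self._next_replica = itertools.cycle(range(replica_count))
        self.reads_served = [0] * replica_count

    def write(self, key, value):
        # Every write goes through the single master...
        self.master[key] = value
        # ...which copies it to each replica (synchronously, in this sketch).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        i = next(self._next_replica)
        self.reads_served[i] += 1
        return self.replicas[i].get(key)


store = ReplicatedStore(replica_count=3)
store.write("pet", "Puss")
results = [store.read("pet") for _ in range(6)]
print(results, store.reads_served)  # six 'Puss' values, then [2, 2, 2]
```

Adding replicas adds read capacity, but every write still lands on the one master.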
56. BUT…
• Scaling writes is problematic
• How do atomic transactions work on a scaled database?
• How can SQL enforce constraints across multiple databases?
57. “To scale up write operations or the number of nodes in a cluster beyond a certain point you have to be able to relax some of the ACID requirements”
- JOERI SEBRACHTS
58. THE CAP THEOREM
PART 1: THEORY
59. THE COST OF SCALING
• You become vulnerable to network failures
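60. CAP THEOREM
• Choose two:
• Consistency
• Availability
• Partition Tolerance
• WARNING: These have specific definitions
• Network failures cannot be ruled out, so in practice the choice is between consistency and availability while a partition lasts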
61. PROVISO
There is a lot of thought in this area; I am giving a simplified description that would make many database people pull their hair out.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
62. CAP THEOREM
CP: Consistent & Partition Tolerant
AP: Available & Partition Tolerant
63. CAP THEOREM
[Diagram: nodes A, B, C holding Data = “Cat”, “Cat”, “Cat”]
64. CAP THEOREM
[Diagram: nodes A, B, C holding Data = “Cat”, “Dog”, “Cat”]
65. CAP THEOREM
[Diagram: nodes A, B, C holding Data = “Dog”, “Dog”, “Dog”]
71. CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B, C holding Data = “Dog”, “Dog”, “Dog”]
73. CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B, C holding Data = “Wolf”, “Dog”, “Wolf”]
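To make the CP/AP difference concrete, here is a toy three-node model. It is a sketch under simplifying assumptions, not how any particular database works, and it only models writes. A CP-style cluster accepts a write only when the receiving node can reach a majority of nodes; an AP-style cluster always accepts the write and lets replicas disagree until the partition heals. The `Cluster` class and its parameters are hypothetical.

```python
class Cluster:
    """Toy model: three replicas (A, B, C); some may be cut off by a partition."""

    def __init__(self, mode, value):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"A": value, "B": value, "C": value}
        self.partitioned = set()  # nodes isolated from everyone else

    def reachable_from(self, node):
        # An isolated node can only talk to itself; the rest can reach each other.
        if node in self.partitioned:
            return {node}
        return set(self.data) - self.partitioned

    def write(self, node, value):
        peers = self.reachable_from(node)
        majority = len(self.data) // 2 + 1
        if self.mode == "CP" and len(peers) < majority:
            # CP: refuse the write rather than let replicas disagree.
            raise RuntimeError(f"node {node} cannot reach a majority")
        # Apply to every reachable replica (in AP mode, even without a majority).
        for peer in peers:
            self.data[peer] = value


cp = Cluster("CP", "Dog")
cp.partitioned.add("B")
cp.write("A", "Wolf")  # A can reach C, a majority, so this succeeds
try:
    cp.write("B", "Fox")  # B is isolated: the CP system gives up availability
except RuntimeError as err:
    print("CP refused:", err)

ap = Cluster("AP", "Dog")
ap.partitioned.add("B")
ap.write("A", "Wolf")
ap.write("B", "Fox")  # accepted: the replicas now disagree
print(cp.data)  # {'A': 'Wolf', 'B': 'Dog', 'C': 'Wolf'}
print(ap.data)  # {'A': 'Wolf', 'B': 'Fox', 'C': 'Wolf'}
```

The CP cluster ends up with the same values as slide 73. A real CP system would also refuse reads from the stale, isolated node.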
74. Part 1 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Databases store data in an accessible way
• SQL databases meet a defined standard; NoSQL is a movement towards considering databases that don’t
• SQL uses tables and schemas to store data, and acts on it like sets in a transactional way.
75. INCONSISTENT DATABASES
PART 2: EXAMPLES
76. THERE’S A LOT OF VALUE IN CONSISTENCY…
77. “Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.”
- DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
78. “Dynamo targets applications that operate with weaker consistency if this results in high availability.”
- DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
84. CASSANDRA
• All nodes communicate with each other through a gossip protocol similar to Dynamo and Riak, exchanging information about themselves and other nodes they have gossiped with.
DYNAMO IMPLEMENTATION
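A rough sketch of the gossip idea, under simplifying assumptions. Each node keeps a map of the heartbeat counters it has heard for every node. In each round, every node swaps maps with one random peer, and both keep the highest counter seen for each entry. Real Cassandra gossip carries much more state, so treat `gossip_round`, `converged` and the data layout as hypothetical.

```python
import random


def gossip_round(views, rng):
    """One round: every node exchanges its view with one random peer (push-pull)."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Merge both views, keeping the freshest (highest) heartbeat per node.
        merged = dict(views[node])
        for other, beat in views[peer].items():
            merged[other] = max(beat, merged.get(other, -1))
        views[node] = dict(merged)
        views[peer] = dict(merged)


def converged(views):
    # Everyone agrees once every node's view is identical.
    first = next(iter(views.values()))
    return all(view == first for view in views.values())


rng = random.Random(42)  # seeded so the run is repeatable
names = ["n1", "n2", "n3", "n4", "n5"]
# Initially each node only knows about itself.
views = {name: {name: 1} for name in names}

rounds = 0
while not converged(views):
    gossip_round(views, rng)
    rounds += 1
print(f"all nodes know about each other after {rounds} rounds")
```

No node talks to every other node directly, yet membership information reaches the whole cluster within a few rounds.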
86. WHY CASSANDRA
• We needed fast, highly available writes
• Data didn’t need to be real-time: it was aggregate analytics, so eventual consistency was enough.
87. CASSANDRA: THE CONS
• Data is eventually consistent by default, so if you need 100% accuracy it’s not great (stronger consistency levels can be chosen per query, at a cost in latency and availability)
• Not as wide a range of support as SQL (but nothing has that)
• Flexible schema makes it harder to integrate with OO languages
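Cassandra, like the Dynamo design, lets the client choose how many replicas must acknowledge a write (W) and answer a read (R). With N replicas, if R + W > N, every read set overlaps every write set, so a read always sees the latest acknowledged write. The brute-force check below is a hypothetical sketch of that overlap argument, with simple integer versions standing in for timestamps. It is not Cassandra's implementation.

```python
import itertools

N = 3  # number of replicas


def read_sees_latest(r, w):
    """Try every write set of size w and read set of size r."""
    replicas = range(N)
    for written in itertools.combinations(replicas, w):
        # Replicas in `written` hold the new version 2; the rest still hold version 1.
        versions = [2 if i in written else 1 for i in replicas]
        for read_from in itertools.combinations(replicas, r):
            # The reader keeps the newest version among the replies it gets.
            if max(versions[i] for i in read_from) != 2:
                return False
    return True


for r, w in [(1, 1), (2, 2), (1, 3), (3, 1), (1, 2)]:
    print(f"R={r} W={w} R+W>N={r + w > N} always fresh={read_sees_latest(r, w)}")
```

Choosing R = W = 1 gives the fastest, most available operations but allows stale reads; quorum reads and writes (2 of 3) trade some of that speed for freshness.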
88. CASSANDRA: THE PROS
• Very fast write throughput
• SQL-like query language, so you don’t need to relearn things
• Wide range of language drivers
• Highly available
89. HIGHLY RELATIONAL DATA
PART 2: EXAMPLES
90. EVERY ROW IS A “THING”
#   Name       Species
1   Puss
2   Dinah
3   Einstein
4   Jess
91. WHAT SQL DOES WELL
• Modelling objects:
• With a fixed structure and shape
• With a limited number of relations
• With no opinion of any deeper underlying domain
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)
92. THERE ARE PROBLEMS THIS IS BAD FOR
BUT…
103. DATA STORAGE
• Nodes and edges are all:
• Stored as first-class objects on the file system
• “Typed”
• Key-value stores
104. DATA IN THE RELATIONS
• “Joins” are first-class objects in the database that can be queried without the cost of a query-time join
• Certain queries become trivial (e.g. joins)
• At a cost: high write-time cost
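The idea that relationships are stored directly, rather than reconstructed by joins, can be sketched with a plain adjacency map. Each node holds direct references to its neighbours, so a "friends of friends" query just follows pointers, and its cost depends on the part of the graph it touches, not on the size of a whole relationships table. The `friends` data is hypothetical, and this is an in-memory analogy to Neo4j's storage (often described as index-free adjacency), not its API.

```python
from collections import deque

# Each node stores its relationships directly (adjacency list).
friends = {
    "Puss": ["Dinah", "Jess"],
    "Dinah": ["Puss", "Einstein"],
    "Jess": ["Puss"],
    "Einstein": ["Dinah"],
}


def within_hops(start, max_hops):
    """Breadth-first traversal: everyone reachable in at most max_hops steps."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbour in friends[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {n for n, hops in seen.items() if 0 < hops <= max_hops}


print(sorted(within_hops("Jess", 2)))  # ['Dinah', 'Puss']
```

In a relational schema, each extra hop would typically be another self-join on a friendships table.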
105. PROTOTYPING
• Easy to see and work with data
• Schemaless
• Active community with a lot of libraries
107. NEO4J: THE CONS
• More expensive writes to the database
• Hard to scale horizontally, especially for writes
• Less mature tooling (especially in non-Java ecosystems)
108. NEO4J: THE PROS
• Models certain data domains very well
• Avoids costly join queries when running over lots of data
• Schemalessness allows for fast prototyping and flexible data models
• Commercial buy-in means language support is not far behind
109. SCHEMALESSNESS
PART 2: EXAMPLES
111. NB: MongoDB claims there are a lot of use cases; we’re only covering this one
112. MONGODB: THE CONS
• Mongo was one of the first famous NoSQL databases and got used before it was tested and mature. There are lots of articles about missing features and bugs
• Schemalessness makes data integrity checks and OO language integration tricky
113. MONGODB: THE PROS
• Schemalessness, if you want flexible data models
• People have used it for a while, so library support is not bad
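Schemalessness in a nutshell: documents in the same collection can have different fields. That is flexible, but any integrity rule now lives in application code. Here is a minimal sketch with plain Python dicts standing in for documents. The field names and the `validate` helper are hypothetical, and the sketch does not use MongoDB's driver.

```python
# Documents in one "collection" need not share a shape.
pets = [
    {"name": "Puss", "coolness": 0},
    {"name": "Jess", "coolness": 0, "colour": "black and white"},  # extra field: fine
    {"name": "Dinah", "coolnes": 0},  # misspelt field: also silently accepted
]


def validate(doc):
    """Integrity checks that a schema would otherwise have done for us."""
    problems = []
    if not isinstance(doc.get("name"), str):
        problems.append("missing name")
    if not isinstance(doc.get("coolness"), int):
        problems.append("missing or non-integer coolness")
    return problems


for doc in pets:
    print(doc.get("name"), validate(doc))
```

The misspelt `coolnes` field is stored without complaint; only the application-level check notices it.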
114. HOW DO YOU RETRIEVE YOUR DATA?
PART 2: EXAMPLES
119. EVERY ROW IS A “THING”
DOCUMENT
• NAME = PUSS, COOLNESS = 0
• NAME = JESS, COOLNESS = 0
• NAME = DINAH, COOLNESS = 0
• NAME = EINSTEIN, COOLNESS = 10
121. “Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”
122. FOCUSED AROUND TEXT SEARCHING QUERIES
123. QUERIES ARE TAILORED TO THE QUESTIONS YOU’LL BE ASKING
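Text search engines such as Lucene, which Elasticsearch is built on, answer queries from an inverted index: a map from each term to the documents that contain it. Here is a minimal sketch, assuming a naive lowercase-and-split tokenizer and made-up documents. Real analyzers also handle stemming, stop words and much more.

```python
import re
from collections import defaultdict

docs = {
    1: "Puss is a cat",
    2: "Dinah the cat",
    3: "Einstein is a dog",
}


def tokenize(text):
    # Naive analyzer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


# Build the inverted index: term -> set of ids of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)


def search(query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


print(sorted(search("cat")), sorted(search("is a")))  # [1, 2] [1, 3]
```

The index is built for one kind of question ("which documents mention these words?"), which is why such queries are fast.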
127. WHAT CONSUMES YOUR DATA?
END USER: “What is the average age of …?”
128. WHAT CONSUMES YOUR DATA?
END USER: “Er… I think it was something like ‘Campbell’?”
129. REMEMBER THAT OUR CHOICE IS INFORMED BY OUR PLANS FOR THE APPLICATION
130. ELASTICSEARCH: THE CONS
• It only does one thing (even if it does it well)
131. ELASTICSEARCH: THE PROS
• It has a lot of search-related queries built into it: fuzzy, phonetic and sentence matching
• A lot of people use it, so support is mature
• Integration with a large number of other languages and frameworks: this is the industry standard
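The user who half-remembers “Campbell” is exactly the case fuzzy matching is for. Elasticsearch's fuzzy queries are based on edit distance. The sketch below implements plain Levenshtein distance and accepts names within a hypothetical threshold of 2 edits. The surname list and the `fuzzy_search` helper are illustrative, not Elasticsearch's API.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, names, max_edits=2):
    # Case-insensitive match within max_edits edits.
    q = query.lower()
    return [n for n in names if levenshtein(q, n.lower()) <= max_edits]


surnames = ["Campbell", "Campion", "Kemble", "Gamble"]
print(fuzzy_search("Cambel", surnames))  # ['Campbell']
```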
132. WHEN IT GOES WRONG
PART 2: EXAMPLES
134. SQL: THE CONS
• It’s very hard to scale writes
• It has a specific data model; not every data domain fits into it
• e.g. highly relational models, schemalessness
• Query languages are not domain-specific
135. SQL: THE PROS
• If a library exists for anything, it exists for SQL
• ACID transactions make data consistency easy to reason about
• Constraints and schemas allow for automated data integrity checking
• Easy normalisation of data
136. Part 2 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Some sites are happy to sacrifice consistency for availability; Amazon’s Dynamo paper describes a design that databases such as Cassandra and Riak follow to achieve that
• If you’ll be doing lots of joins, graph databases such as Neo4j can improve performance
• Sometimes you want the flexibility to store any objects; there is a range of schemaless databases available
• Consider what will retrieve your data, and ensure you have a database that is efficient for your use case.
137. ANY QUESTIONS?
David Simons
@SwamWithTurtles