SQL & NOSQL
David Simons
@SwamWithTurtles
WHO AM I?
• Tech Lead/Consultant at Softwire
• Background in Statistics & Computer Simulation
WHAT DO WE DO?
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
What problems are we solving?
How do we solve them?
Solving them now!
Are they still solving the problem?
TODAY WE’RE GOING TO TALK ABOUT
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
HOW TO DO ARCHITECTURE
EVOLVING DESIGN
UP-FRONT DECISION MAKING
TODAY…
• Part 1: Looking at some SQL & Database Theory
• Part 2: Looking at a lot of NoSQL databases
WHAT IS A DATABASE?
PART 1: THEORY
“A database is a collection of information organized to provide efficient retrieval.”
- University of Georgia
THE MYTHICAL DATABASE DIVIDE
SQL | NOSQL
• NoSQL has (apparently) always meant “Not Only SQL”
• It means considering databases that don’t meet the SQL Standard, which covers a wide range of databases
THE SQL STANDARD
PART 1: THEORY
HISTORY
• First defined by ANSI in 1986 (though the language was around before then)
• Structured Query Language
• Different databases have implemented this standard way of storing, inserting and retrieving data
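As a rough illustration of that standard way of storing, inserting and retrieving data, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is an assumption chosen only because it needs no server; it implements most, but not all, of the SQL standard, and the `animals` table is hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Storing: declare a table (the schema).
conn.execute("CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Inserting: a parameterised INSERT, one row per tuple.
conn.executemany(
    "INSERT INTO animals (id, name) VALUES (?, ?)",
    [(1, "Puss"), (2, "Dinah"), (3, "Einstein"), (4, "Jess")],
)

# Retrieving: a declarative SELECT; the database decides how to fetch the rows.
names = [name for (name,) in conn.execute("SELECT name FROM animals ORDER BY id")]
print(names)  # ['Puss', 'Dinah', 'Einstein', 'Jess']
```

The SQL statements themselves are largely portable between SQL databases; the `?` placeholder style is specific to this Python driver.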
EXAMPLES OF SQL DATABASES
• MySQL
• Microsoft SQL Server
• Oracle
• PostgreSQL (mostly)
• IBM DB2 and more…
WHAT’S IN THE STANDARD?
• Rules for how the language works
• No opinion as to what the database looks like
BUT…
• ‘SQL’ has come to mean a lot more than the language (especially in the context of NoSQL)
• A family of RDBMS databases that follow a common set of rules
WHAT’S IN AN RDBMS?
• Prescriptive Schema
• Set-based Operations
• Table-driven & Normalised
• ACID Transactions
SCHEMA DRIVEN
Name Species
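A prescriptive schema means the database, not the application, rejects rows that don't fit. A minimal sketch with Python's `sqlite3`; the `species` column, its allowed values and the `try_insert` helper are hypothetical. SQLite is loosely typed by default, so the sketch relies on `NOT NULL` and `CHECK` constraints rather than on column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every row must have a name, and species must come from a known set.
conn.execute(
    """
    CREATE TABLE animals (
        name    TEXT NOT NULL,
        species TEXT NOT NULL CHECK (species IN ('Cat', 'Dog', 'Fish'))
    )
    """
)


def try_insert(name, species):
    """Return True if the database accepted the row, False if the schema rejected it."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO animals VALUES (?, ?)", (name, species))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Puss", "Cat"))     # True
print(try_insert("Nemo", None))      # False: violates NOT NULL
print(try_insert("Donald", "Duck"))  # False: violates CHECK
```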
READ DATA OUT WITH SET-BASED OPERATIONS
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
“WHERE” (INTERSECTION)
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
UNIONS
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
5 Nemo
6 Moby Dick
7 Wanda
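Both operations above act on a whole set of rows in a single statement. A minimal sketch with `sqlite3`; the species values and the split into a `pets` table and a `sea_creatures` table are assumptions for illustration, since the slides' Species column did not survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")
conn.execute("CREATE TABLE sea_creatures (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Puss", "Cat"), ("Dinah", "Cat"), ("Einstein", "Dog"), ("Jess", "Cat")],
)
conn.executemany(
    "INSERT INTO sea_creatures VALUES (?, ?)",
    [("Nemo", "Fish"), ("Moby Dick", "Whale"), ("Wanda", "Fish")],
)

# WHERE: keep only the rows in the set that satisfy the predicate.
cats = [n for (n,) in conn.execute(
    "SELECT name FROM pets WHERE species = 'Cat' ORDER BY name")]

# UNION: combine two result sets (removing duplicates) into one.
everyone = [n for (n,) in conn.execute(
    "SELECT name FROM pets UNION SELECT name FROM sea_creatures ORDER BY name")]

print(cats)      # ['Dinah', 'Jess', 'Puss']
print(everyone)  # ['Dinah', 'Einstein', 'Jess', 'Moby Dick', 'Nemo', 'Puss', 'Wanda']
```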
“Cursors are evil.”
- Ron Ernest (& the SQL community at large)
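The complaint about cursors is that they process rows one at a time in application code, where a single set-based statement would let the database do the whole job. A minimal sketch contrasting the two styles with `sqlite3`; the `coolness` column echoes the ratings used in these slides, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, coolness INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("Puss", 0), ("Dinah", 0), ("Einstein", 10), ("Jess", 0)],
)


def bump_row_by_row(conn):
    """Cursor style: pull every row into the application and update each one individually."""
    for rowid, coolness in conn.execute("SELECT rowid, coolness FROM animals").fetchall():
        conn.execute("UPDATE animals SET coolness = ? WHERE rowid = ?", (coolness + 1, rowid))


def bump_set_based(conn):
    """Set-based style: one statement describes the change for the whole table."""
    conn.execute("UPDATE animals SET coolness = coolness + 1")


bump_row_by_row(conn)  # one UPDATE per row
bump_set_based(conn)   # one UPDATE in total
result = conn.execute("SELECT name, coolness FROM animals ORDER BY rowid").fetchall()
print(result)  # [('Puss', 2), ('Dinah', 2), ('Einstein', 12), ('Jess', 2)]
```

Both end in the same state; the set-based version is one round trip and leaves the database free to choose how to apply the change.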
NORMAL FORMS
JOINS
Name Species
Species Coolness
Rating
1 Puss 0
2 Dinah 0
3 Einstein 10
4 Jess 0
RELATIONS BETWEEN DATA
• We don’t like duplicating data
• Copies go out of sync
• The data may not be the same everywhere
• Objects have properties that come in groups
• For example: landmarks have cities and countries
• The same city will always have the same country
WE SOLVE THAT WITH…
• Normalisation
• Store each linked group as its own row in a separate table
• And store pointers (foreign keys) to that table
• These are combined by query-time joins
Name Species
Species Coolness
1 Puss
2 Dinah
3 Einstein
4 Jess
Species Coolness Rating
1 0
2 10
JOINS
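The two normalised tables above can be sketched with `sqlite3` as follows. The species id for each animal is not shown in the slides; the ids here are assumed so that the join reproduces the JOINS table earlier (Einstein rated 10, everyone else 0). Storing each rating once, in `species`, is what stops it going out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE species (id INTEGER PRIMARY KEY, coolness_rating INTEGER NOT NULL);
    CREATE TABLE animals (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        species INTEGER NOT NULL REFERENCES species(id)  -- pointer to the species row
    );
    INSERT INTO species VALUES (1, 0), (2, 10);
    INSERT INTO animals VALUES (1, 'Puss', 1), (2, 'Dinah', 1), (3, 'Einstein', 2), (4, 'Jess', 1);
    """
)

# Query-time join: follow each animal's pointer to its species row.
rows = conn.execute(
    """
    SELECT a.id, a.name, s.coolness_rating
    FROM animals AS a
    JOIN species AS s ON s.id = a.species
    ORDER BY a.id
    """
).fetchall()

for row in rows:
    print(*row)
```

Changing one row in `species` changes the rating seen for every animal of that species, with no copies to keep in step.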
WRITE DATA IN WITH TRANSACTIONS
“A unit of work you want to treat as a whole”
- Johnny Appleseed
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
{ Donald, Pluto, Mickey }
Ducks aren’t mammals
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
The database is always in a valid state, reflecting a whole number of completed transactions, regardless of:
(1) invalid data;
(2) concurrent requests;
(3) system failures.
ACID
• Atomicity
• Consistency
• Isolation
• Durability
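The Donald, Pluto and Mickey example can be sketched with `sqlite3`: the three inserts run as one transaction, and because one row breaks a constraint, none of them is kept (atomicity). The `mammals` table, its `CHECK` rule standing in for "Ducks aren't mammals", and the `insert_all` helper are hypothetical; the commit-or-rollback behaviour comes from using the connection as a context manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mammals (name TEXT NOT NULL, species TEXT NOT NULL CHECK (species <> 'Duck'))"
)


def insert_all(rows):
    """Insert every row or none of them; return True on commit, False on rollback."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.executemany("INSERT INTO mammals VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


# Pluto and Mickey are inserted first, then Donald fails the CHECK constraint...
ok = insert_all([("Pluto", "Dog"), ("Mickey", "Mouse"), ("Donald", "Duck")])
# ...so the whole unit of work is rolled back, including Pluto and Mickey.
count = conn.execute("SELECT COUNT(*) FROM mammals").fetchone()[0]
print(ok, count)  # False 0
```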
CAPACITY & SCALABILITY
PART 1: THEORY
ASKING A SYSTEM TO DO SOMETHING USES RESOURCES
WHAT HAPPENS AS MORE REQUESTS COME IN?
SQL IS PRETTY GOOD FOR LARGE AMOUNTS OF DATA
TRUTHFULLY, WITH ENOUGH DATA, YOU HAVE TO SCALE
THE HARD TRUTH
[Diagram: your current system (Users, Application, Database), and the same system as it grows]
HORIZONTAL SCALABILITY
[Diagram: Users and an Application served by several Database nodes]
VERTICAL SCALABILITY
[Diagram: Users and an Application served by a single, more powerful Database]
SQL CAN SCALE…
THE HARD TRUTH
SQL CAN SCALE VERTICALLY
AND…
• Scaling to meet the needs of read operations is very doable
• Master-Slave replication
BUT…
• Scaling writes is problematic
• How do atomic transactions work on a scaled database?
• How can SQL enforce constraints across multiple databases?
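Why reads scale more easily than writes under master-slave replication can be shown with a toy model. This is a hypothetical in-memory sketch, not any database's replication protocol: writes go to the single master and are copied to every slave, while reads are spread round-robin across the slaves.

```python
from itertools import cycle


class Node:
    """A toy database node holding a key-value dict (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.requests = 0  # client requests this node has served


class ReplicatedCluster:
    """Writes go to the master and are copied to every slave; reads rotate across slaves."""

    def __init__(self, n_slaves):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(n_slaves)]
        self._next_slave = cycle(self.slaves)

    def write(self, key, value):
        self.master.requests += 1
        self.master.data[key] = value
        # Replication: applied immediately here; real systems often replicate asynchronously.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        slave = next(self._next_slave)
        slave.requests += 1
        return slave.data.get(key)


cluster = ReplicatedCluster(n_slaves=3)
cluster.write("Puss", "Cat")
for _ in range(300):
    cluster.read("Puss")
print(cluster.master.requests, [s.requests for s in cluster.slaves])  # 1 [100, 100, 100]
```

Adding slaves divides the read load, but every write still passes through the one master and has to be applied on every slave, which is why scaling writes is the problematic part.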
“To scale up write operations or the number of nodes in a cluster beyond a certain point you have to be able to relax some of the ACID requirements”
- Joeri Sebrachts
THE CAP THEOREM
PART 1: THEORY
THE COST OF SCALING
• You become vulnerable to network failures
CAP THEOREM
• Choose Two:
• Consistency
• Availability
• Partition Tolerance
• WARNING: These have
specific definitions
PROVISO
There is a lot of thought in this area, and I am giving a simplified description that would make many database people pull their hair out.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
CAP THEOREM
CP: Consistent & Partition Tolerant
AP: Available & Partition Tolerant
CAP THEOREM
[Diagram: three nodes, A, B and C, each hold Data = “Cat”. One node is updated to “Dog”, and the change spreads until all three hold Data = “Dog”.]
AP SYSTEMS
AVAILABLE (“AP”) SYSTEMS
[Diagram: nodes A, B and C all hold Data = “Dog”. One node accepts a write of “Wolf” while the others still hold “Dog”; later two nodes hold “Wolf” and one still holds “Dog”.]
CP SYSTEMS
CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B and C all hold Data = “Dog”; after a write of “Wolf”, two nodes hold “Wolf” and one still holds “Dog”.]
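In the spirit of the proviso above, here is a deliberately simplified toy model of the two behaviours during a partition. It is a hypothetical sketch: the CP mode uses a simple majority rule (a common approach, assumed here), so an isolated node refuses requests rather than serve stale data, while the AP mode answers everyone and lets the replicas disagree.

```python
class Cluster:
    """Toy model: three replicas (A, B, C) that a network partition can split.

    mode="AP": always answer, using whichever replicas are reachable.
    mode="CP": answer only when a strict majority of replicas is reachable.
    """

    def __init__(self, mode, value):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.values = {name: value for name in "ABC"}
        self.cut_off = set()  # replicas isolated on the far side of a partition

    def _reachable(self, via):
        """Replicas on the same side of the partition as `via`."""
        isolated = via in self.cut_off
        return [name for name in self.values if (name in self.cut_off) == isolated]

    def _require_quorum(self, group):
        if self.mode == "CP" and 2 * len(group) <= len(self.values):
            raise RuntimeError("unavailable: no majority reachable")

    def write(self, via, value):
        group = self._reachable(via)
        self._require_quorum(group)
        for name in group:
            self.values[name] = value

    def read(self, via):
        self._require_quorum(self._reachable(via))
        return self.values[via]


for mode in ("AP", "CP"):
    cluster = Cluster(mode, "Dog")
    cluster.cut_off = {"C"}  # the partition isolates C
    cluster.write(via="A", value="Wolf")
    try:
        c_sees = cluster.read(via="C")
    except RuntimeError as err:
        c_sees = str(err)
    print(mode, cluster.values, "| C reads:", c_sees)
```

In both modes A and B end up holding "Wolf" while C still holds "Dog"; the difference is that the AP cluster lets C answer "Dog", while the CP cluster refuses to answer from C at all.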
part 1 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Databases store data in an accessible way
• SQL databases meet a defined standard; NoSQL is a movement towards considering databases that don’t
• SQL uses tables and schemas to store data, and acts on it as sets, in a transactional way
INCONSISTENT DATABASES
PART 2: EXAMPLES
THERE’S A LOT OF VALUE IN CONSISTENCY…
“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.”
- Dynamo: Amazon’s Highly Available Key-value Store
“Dynamo targets applications that operate with weaker consistency if this results in high availability.”
- Dynamo: Amazon’s Highly Available Key-value Store
DYNAMO IMPLEMENTATIONS
NOT GUARANTEED CONSISTENCY
THE COST?
AMAZON SHOPPING
IS THAT HONESTLY OKAY?
SMS HISTORIC LOG
IS THAT HONESTLY OKAY?
WE USED… DYNAMO IMPLEMENTATIONS
CASSANDRA
• All nodes communicate with each other through a gossip protocol, similar to those in Dynamo and Riak, exchanging information about themselves and about other nodes they have gossiped with.
DYNAMO IMPLEMENTATIONS
CASSANDRA
No single point of failure
WHY CASSANDRA?
• We needed fast, highly available writes
• Data didn’t need to be real-time: it was aggregate analytics, so eventual consistency was enough
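Eventual consistency needs a rule for reconciling replicas that disagree. Cassandra resolves conflicting writes by timestamp (last write wins); the sketch below is a hypothetical, much-simplified model of that idea, with a pairwise sync step standing in for background repair. It is not Cassandra's code, and the `home_page_views` key is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: int
    timestamp: int  # when the write happened (supplied by the writer)


class Replica:
    def __init__(self):
        self.cells = {}

    def write(self, key, value, timestamp):
        self._merge(key, Cell(value, timestamp))

    def read(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell.value

    def sync_from(self, other):
        """Pull every cell from another replica, keeping the newer version of each."""
        for key, cell in other.cells.items():
            self._merge(key, cell)

    def _merge(self, key, incoming):
        current = self.cells.get(key)
        # Last write wins; on a timestamp tie this sketch simply keeps the existing value.
        if current is None or incoming.timestamp > current.timestamp:
            self.cells[key] = incoming


replicas = [Replica(), Replica(), Replica()]
replicas[0].write("home_page_views", 41, timestamp=1)
replicas[1].write("home_page_views", 42, timestamp=2)  # a later write lands elsewhere

before = [r.read("home_page_views") for r in replicas]

# Every replica syncs with every other one; afterwards they all agree.
for r in replicas:
    for other in replicas:
        if other is not r:
            r.sync_from(other)

after = [r.read("home_page_views") for r in replicas]
print(before, after)  # [41, 42, None] [42, 42, 42]
```

For aggregate analytics, a read that briefly returns 41 (or nothing) instead of 42 is usually acceptable, which is the trade-off described above.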
CASSANDRA: THE CONS
• Data is only eventually consistent, so if you need 100% accuracy it’s not a good fit
• Not as wide a range of support as SQL (but nothing has that)
• Flexible schema makes it harder to integrate with OO languages
CASSANDRA: THE PROS
• Very fast write throughput
• SQL-like query language (CQL), so you don’t need to relearn everything
• Wide range of language drivers
• Highly available
HIGHLY RELATIONAL DATA
PART 2: EXAMPLES
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
WHAT SQL DOES WELL
• Modelling objects:
• With a fixed structure and shape
• With a limited number of relations
• With no opinion of any deeper underlying domain
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)
BUT… THERE ARE PROBLEMS IT IS BAD FOR
SIX DEGREES OF… KEVIN BACON
ELECTION DATA
WORLD’S LEADING GRAPH DB: NEO4J
“embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables”
DATA STORAGE
• Nodes and edges are all:
• Stored as first-class objects on the file system
• “Typed”
• Key-value stores of their own properties
DATA IN THE RELATIONS
• “Joins” are first-class objects in the database that can be queried at little additional cost
• Certain queries become trivial (e.g. joins)
• At a cost: writes are more expensive
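Six degrees of Kevin Bacon is a traversal problem: follow "acted with" edges outward until you reach him. In SQL that typically means a recursive query or one self-join per degree; in a graph store each node can reach its neighbours directly. A minimal breadth-first search over an in-memory adjacency list; every name other than Kevin Bacon, and every link, is hypothetical, and this is not Neo4j's API.

```python
from collections import deque

# Hypothetical "acted with" graph: each node keeps direct references to its neighbours,
# so following a relationship is a lookup rather than a join.
co_stars = {
    "Kevin Bacon": ["Actor A", "Actor B"],
    "Actor A": ["Kevin Bacon", "Actor C"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Actor A", "Actor D"],
    "Actor D": ["Actor C"],
}


def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of hops from start to target, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None


print(degrees_of_separation(co_stars, "Actor D", "Kevin Bacon"))  # 3
```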
PROTOTYPING
• Easy to see and work with data
• Schemaless
• Active community with a lot of libraries
NEO4J USERS
NEO4J: THE CONS
• More expensive writes to the database
• Hard to scale out
• Less mature tooling (especially in non-Java ecosystems)
NEO4J: THE PROS
• Handles certain data models very well
• Avoids costly join queries over large amounts of data
• Schemalessness allows for fast prototyping and flexible data models
• Commercial buy-in means language support is not far behind
SCHEMALESSNESS
PART 2: EXAMPLES
NB: MongoDB claims there are a lot of use cases; we’re only covering this one
MONGODB: THE CONS
• Mongo was one of the first famous NoSQL databases and was widely used before it was tested and mature. There are lots of articles about missing features and bugs
• Schemalessness makes data integrity checks and OO language integration tricky
MONGODB: THE PROS
• Schemalessness, if you want flexible data models
• People have used it for a while, so library support is not bad
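Schemalessness cuts both ways: any document can go in, but checks the database would otherwise do move into application code. A minimal sketch using a plain Python list of dicts as a stand-in for a collection; it is not the MongoDB driver API, and the documents, fields and the `REQUIRED` rule are hypothetical.

```python
# A list of dicts standing in for a schemaless collection (not the MongoDB API).
collection = []


def insert(doc):
    collection.append(dict(doc))  # nothing checks the document's shape


def find(**criteria):
    """Documents whose fields equal every given criterion (a missing field reads as None)."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]


# Documents in the same collection can have different shapes.
insert({"name": "Puss", "species": "Cat", "coolness": 0})
insert({"name": "Einstein", "species": "Dog", "coolness": 10, "tricks": ["fetch"]})
insert({"name": "Nemo", "species": "Fish", "habitat": {"type": "reef"}})
insert({"name": "Jess", "coolnes": 0})  # typo and missing species: accepted anyway

# Integrity checks a schema would have done now live in application code.
REQUIRED = {"name": str, "species": str}


def problems(doc):
    return [field for field, kind in REQUIRED.items() if not isinstance(doc.get(field), kind)]


print([d["name"] for d in find(species="Cat")])                      # ['Puss']
print([(d["name"], problems(d)) for d in collection if problems(d)])  # [('Jess', ['species'])]
```

Recent MongoDB versions can optionally attach a `$jsonSchema` validator to a collection, which moves some of these checks back into the database.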
HOW DO YOU RETRIEVE YOUR DATA?
PART 2: EXAMPLES
FREE-TEXT SEARCH
DOCUMENT STORE: Elasticsearch
EVERY ROW IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
(each one is a DOCUMENT)
APACHE LUCENE
“Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”
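Lucene's core data structure is an inverted index: a map from each term to the documents containing it, so a search looks terms up instead of scanning every document. A minimal sketch; the tokenizer is a crude assumption (real analysers also handle stemming, stop words and more), the sample documents are hypothetical, and none of this is Lucene's API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on anything that isn't a letter or digit (a very crude analyser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*sets) if sets else set()


docs = {
    1: "Puss enjoys skateboard tricks",
    2: "Dinah likes writing reddit comments",
    3: "Einstein rides a skateboard and writes comments",
}
index = build_index(docs)
print(sorted(search(index, "skateboard")))       # [1, 3]
print(sorted(search(index, "reddit comments")))  # [2]
```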
FOCUSED AROUND TEXT SEARCHING
QUERIES
QUERIES ARE TAILORED TO THE QUESTIONS YOU’LL BE ASKING
{
  "query": {
    "match": {"hobbies": "skateboard"}
  }
}
{
  "query": {
    "fuzzy": {"hobbies": "skateboarig"}
  }
}
{
  "query": {
    "match_phrase": {"hobbies": "writing reddit comments"}
  }
}
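The fuzzy query above matches terms within a small edit distance of the misspelt "skateboarig". A minimal sketch of the underlying idea using plain Levenshtein distance; Elasticsearch's implementation differs in detail (for example, it can count swapping two adjacent characters as a single edit, and by default picks the allowed distance from the term's length), so `max_edits=2` and the vocabulary are assumptions here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=2):
    """Vocabulary words within max_edits edits of term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


vocabulary = ["skateboard", "skateboards", "snowboard", "writing"]
print(fuzzy_match("skateboarig", vocabulary))  # ['skateboard', 'skateboards']
```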
WHAT CONSUMES YOUR DATA?
END USER: What is the average age of …?
END USER: Er… I think it was something like “Campbell”?
REMEMBER THAT OUR CHOICE IS INFORMED BY OUR PLANS FOR THE APPLICATION
ELASTICSEARCH: THE CONS
• It only does one thing (even if it does it well)
ELASTICSEARCH: THE PROS
• It supports a lot of search-related matching: fuzzy, phonetic and phrase
• A lot of people use it, so support is mature
• Integration with a large number of other languages and frameworks: this is the industry standard
WHEN IT GOES WRONG
PART 2: EXAMPLES
SQL: THE CONS
• It’s very hard to scale writes
• It has a specific data model, and not every data domain fits into it
• e.g. highly relational models, schemalessness
• Its query language is general-purpose rather than domain-specific
SQL: THE PROS
• If a library exists for anything, it exists for SQL
• ACID transactions make a lot of things easy
• Constraints and schemas allow for automated data integrity checking
• Easy normalisation of data
part 2 done
What shape is your data?
Are you happy to pay?
What uses your data?
• Some sites are happy to sacrifice consistency for availability; Dynamo is a design that databases can follow to achieve that
• If you’ll be doing lots of joins, graph databases such as Neo4j can improve performance
• Sometimes you want the flexibility to store any object; there is a range of schemaless databases available
• Consider what will retrieve your data, and choose a database that is efficient for that use case
ANY QUESTIONS?
David Simons
@SwamWithTurtles

AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 

Bristol Uni - Use Cases of NoSQL

SQL & NoSQL
David Simons
@SwamWithTurtles
WHO AM I?
• Tech Lead/Consultant at Softwire
• Background in Statistics & Computer Simulation
WHAT DO WE DO?
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
What problems are we solving?
How do we solve them?
Solving them now!
Are they still solving the problem?
TODAY WE’RE GOING TO TALK ABOUT
• Business Analysis/Mapping
• Architecture
• Project Management
• Design (UI and User Workflows)
• Development
• QA
• Warranty
HOW TO DO ARCHITECTURE
Evolving design
Up-front decision making
TODAY…
• Part 1: Looking at some SQL & database theory
• Part 2: Looking at a lot of NoSQL databases
WHAT IS A DATABASE?
PART 1: THEORY
“A database is a collection of information organized to provide efficient retrieval.”
- University of Georgia
THE MYTHICAL DATABASE DIVIDE
SQL / NoSQL
THE MYTHICAL DATABASE DIVIDE
• NoSQL has (apparently) always meant “Not Only SQL”
• It covers databases that don’t meet the SQL standard, which is a wide range of databases
THE SQL STANDARD
PART 1: THEORY
HISTORY
• First defined by ANSI in 1986 (though the language was around before then)
• Structured Query Language
• Different databases have implemented this standard way of storing, inserting and retrieving data
EXAMPLES OF SQL DATABASES
• MySQL
• Microsoft SQL Server
• Oracle
• PostgreSQL (mostly)
• IBM DB2 and more…
WHAT’S IN THE STANDARD?
• Rules for how the language works
• No opinion as to what the database looks like internally
BUT…
• ‘SQL’ has come to mean a lot more than the language (especially in the context of NoSQL)
• It now names a family of relational databases (RDBMSs) that follow a set of rules
WHAT’S IN AN RDBMS?
• Prescriptive schema
• Set-based operations
• Table-driven & normalised
• ACID transactions
SCHEMA DRIVEN
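
A prescriptive schema means the database itself refuses rows that don't fit, before any application code sees them. The sketch below uses Python's built-in `sqlite3` module purely for illustration (the talk doesn't tie this to a particular RDBMS, and constraint behaviour varies between engines). The `animals` table, its columns and the `species` values are hypothetical, modelled on the Name/Species examples in these slides.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema is declared up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE animals (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            species TEXT NOT NULL CHECK (species <> '')
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, name, species) -> bool:
    """Return True if the database accepted the row, False if the schema rejected it."""
    try:
        with conn:  # commit on success, roll back if the INSERT raises
            conn.execute(
                "INSERT INTO animals (name, species) VALUES (?, ?)", (name, species)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_insert(conn, "Puss", "cat"))  # True
print(try_insert(conn, "Dinah", None))  # False: violates NOT NULL
print(try_insert(conn, "Jess", ""))     # False: violates the CHECK constraint
```

The validation lives in the `CREATE TABLE` statement, so every client that writes to the table gets the same checks for free.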
READ DATA OUT WITH SET-BASED OPERATIONS
EVERY ROW IS A “THING”
Table (columns: Name, Species): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess
“WHERE” (INTERSECTION)
Table (columns: Name, Species): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess
UNIONS
Table (columns: Name, Species): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess, 5 Nemo, 6 Moby Dick, 7 Wanda
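
Both operations on these slides act on whole sets of rows in a single statement. `WHERE` keeps the subset of rows matching a predicate, and `UNION` combines two result sets. Here is a minimal `sqlite3` sketch. Splitting the rows into hypothetical `pets` and `sea_creatures` tables is an assumption made so there are two sets to union, and the `length(name) <= 4` predicate is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pets          (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO pets VALUES (1, 'Puss'), (2, 'Dinah'), (3, 'Einstein'), (4, 'Jess');
    INSERT INTO sea_creatures VALUES (5, 'Nemo'), (6, 'Moby Dick'), (7, 'Wanda');
    """
)


def names(sql: str) -> list:
    """Run a query whose first column is a name and return those names."""
    return [row[0] for row in conn.execute(sql)]


# WHERE: one statement keeps the subset of rows that satisfy a predicate.
short_names = names("SELECT name FROM pets WHERE length(name) <= 4 ORDER BY id")

# UNION: one statement combines two result sets (removing duplicate rows).
everyone = names(
    "SELECT name, id FROM pets UNION SELECT name, id FROM sea_creatures ORDER BY id"
)

print(short_names)  # ['Puss', 'Jess']
print(everyone)     # ['Puss', 'Dinah', 'Einstein', 'Jess', 'Nemo', 'Moby Dick', 'Wanda']
```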
“Cursors are evil.”
- Ron Ernest (& the SQL community at large)
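
The dislike of cursors is easiest to see side by side. Below, a hypothetical `coolness` column is updated twice. The first pass works row by row, the way a cursor loop would. The second uses a single set-based `UPDATE`. Both give the same answer, but the set-based version hands the whole change to the engine as one statement it can plan and execute as a unit. This is only a `sqlite3` sketch: SQLite has no `DECLARE CURSOR` statement, so the loop emulates a cursor from Python.

```python
import sqlite3


def fresh_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT, coolness INTEGER)")
    conn.executemany(
        "INSERT INTO animals VALUES (?, ?, ?)",
        [(1, "Puss", 0), (2, "Dinah", 0), (3, "Einstein", 10), (4, "Jess", 0)],
    )
    return conn


def row_by_row(conn: sqlite3.Connection) -> None:
    # Cursor style: fetch every row, decide in application code, write back one at a time.
    for row_id, coolness in conn.execute("SELECT id, coolness FROM animals").fetchall():
        if coolness < 5:
            conn.execute("UPDATE animals SET coolness = ? WHERE id = ?", (coolness + 1, row_id))


def set_based(conn: sqlite3.Connection) -> None:
    # Set-based style: describe the change once and let the engine apply it to the set.
    conn.execute("UPDATE animals SET coolness = coolness + 1 WHERE coolness < 5")


def snapshot(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT name, coolness FROM animals ORDER BY id").fetchall()


a, b = fresh_db(), fresh_db()
row_by_row(a)
set_based(b)
print(snapshot(a) == snapshot(b))  # True
```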
NORMAL FORMS
JOINS
Table (columns: Name, Species, Species Coolness Rating): 1 Puss (0), 2 Dinah (0), 3 Einstein (10), 4 Jess (0)
RELATIONS BETWEEN DATA
• We don’t like duplicating data
• Copies go out of sync
• The same fact may not be recorded the same way everywhere
RELATIONS BETWEEN DATA
• Objects have properties that come in groups
• For example: landmarks have cities and countries
• The same city will always have the same country
WE SOLVE THAT WITH…
• Normalisation
• Store each linked group as its own row in a separate table
• Store pointers (foreign keys) to that table
• Combine them with joins at query time
JOINS
Animals table (columns: Name, Species, Species Coolness): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess
Species table (columns: Species, Coolness Rating): species 1 has rating 0; species 2 has rating 10
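
The two tables on this slide can be sketched with `sqlite3` as follows. The column names are hypothetical. The species assignments (Einstein in species 2, the others in species 1) are read off the coolness ratings on the earlier JOINS slide rather than stated in the talk. Each rating is stored once, and the join recombines the tables at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript(
    """
    -- Each species' rating is stored exactly once ...
    CREATE TABLE species (
        id              INTEGER PRIMARY KEY,
        coolness_rating INTEGER NOT NULL
    );
    -- ... and each animal only holds a pointer (foreign key) to its species.
    CREATE TABLE animals (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        species_id INTEGER NOT NULL REFERENCES species(id)
    );
    INSERT INTO species VALUES (1, 0), (2, 10);
    INSERT INTO animals VALUES (1, 'Puss', 1), (2, 'Dinah', 1), (3, 'Einstein', 2), (4, 'Jess', 1);
    """
)

JOIN_SQL = """
    SELECT a.name, s.coolness_rating
    FROM animals AS a
    JOIN species AS s ON s.id = a.species_id
    ORDER BY a.id
"""
rows = conn.execute(JOIN_SQL).fetchall()
print(rows)  # [('Puss', 0), ('Dinah', 0), ('Einstein', 10), ('Jess', 0)]

# Changing a species' rating is a one-row update, and every joined result sees it.
conn.execute("UPDATE species SET coolness_rating = 5 WHERE id = 1")
rows_after = conn.execute(JOIN_SQL).fetchall()
```

This is the trade the slides describe: no duplicated data to fall out of sync, at the price of doing the join on every read.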
WRITE DATA IN WITH TRANSACTIONS
“A unit of work you want to treat as a whole”
- Johnny Appleseed
Table (columns: Name, Species): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess
The database is always in a valid state, produced by a whole number of transactions (never part of one), regardless of:
(1) invalid data;
(2) concurrent requests;
(3) system failures.
ACID
• Atomicity
• Consistency
• Isolation
• Durability
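
Atomicity and consistency can be seen in a small `sqlite3` sketch. A hypothetical `accounts` table forbids negative balances. A transfer credits one row and then debits the other. If the debit breaks the constraint, the whole unit of work is rolled back, including the credit that had already run. The table and the transfer are illustrative assumptions. Isolation and durability aren't exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one unit; return False if it was rolled back."""
    try:
        # A transaction opens implicitly on the first UPDATE; `with` commits on
        # success and rolls back if anything inside raises.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: the debit would go negative
print(balances(conn))                       # {'alice': 70, 'bob': 30}
```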
WHAT’S IN AN RDBMS?
• Prescriptive schema
• Set-based operations
• Table-driven & normalised
• ACID transactions
CAPACITY & SCALABILITY
PART 1: THEORY
ASKING A SYSTEM TO DO SOMETHING USES RESOURCES
WHAT HAPPENS AS MORE REQUESTS COME IN?
TRUTHFULLY…
SQL IS PRETTY GOOD FOR LARGE AMOUNTS OF DATA
THE HARD TRUTH
WITH ENOUGH DATA, YOU HAVE TO SCALE
YOUR CURRENT SYSTEM
(Diagram: Users, Application, Database)
AS IT GROWS
(Diagram: Users, Application, Database)
HORIZONTAL SCALABILITY
(Diagram: Users, Application, multiple Databases)
VERTICAL SCALABILITY
(Diagram: Users, Application, a more powerful Database)
THE HARD TRUTH
SQL CAN SCALE…
SQL CAN SCALE VERTICALLY
AND…
• Scaling to meet the needs of read operations is very doable
• Master-slave replication
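
Why reads scale more easily than writes under master-slave replication can be shown with a toy. Every write goes to one primary and is appended to a log. Reads are spread round-robin over replicas, which catch up by replaying the log. `ReplicatedStore` and its methods are hypothetical and only model the idea. Real RDBMS replication ships its own write-ahead or binary log between servers, usually asynchronously. The usage shows the replication lag this introduces.

```python
from itertools import cycle


class Node:
    def __init__(self, name: str):
        self.name = name
        self.data = {}    # key -> value held on this node
        self.applied = 0  # how many log entries this node has applied


class ReplicatedStore:
    """Hypothetical single-primary store: one node takes writes, replicas serve reads."""

    def __init__(self, replica_count: int):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.log = []  # ordered (key, value) writes, as shipped to replicas
        self._next_replica = cycle(self.replicas)

    def write(self, key, value):
        # All writes go through the single primary, so adding replicas adds no write capacity.
        self.primary.data[key] = value
        self.log.append((key, value))
        self.primary.applied = len(self.log)

    def read(self, key):
        # Reads are spread round-robin, so read capacity grows with the number of replicas.
        return next(self._next_replica).data.get(key)

    def replicate(self):
        # Ship each replica the log entries it hasn't applied yet.
        for replica in self.replicas:
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)


store = ReplicatedStore(replica_count=2)
store.write("pet", "Puss")
print(store.read("pet"))  # None: this replica hasn't caught up yet
store.replicate()
print(store.read("pet"))  # Puss
```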
BUT…
• Scaling writes is problematic
• How do atomic transactions work on a scaled database?
• How can SQL enforce constraints across multiple databases?
“To scale up write operations or the number of nodes in a cluster beyond a certain point you have to be able to relax some of the ACID requirements”
- Joeri Sebrachts
THE CAP THEOREM
PART 1: THEORY
THE COST OF SCALING
• You become vulnerable to network failures
CAP THEOREM
• Choose two:
• Consistency
• Availability
• Partition Tolerance
• WARNING: these have specific definitions
PROVISO
There is a lot of thought in this area; I am giving a simplified description that would make many database people pull their hair out.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
CAP THEOREM
CP: Consistent & Partition Tolerant
AP: Available & Partition Tolerant
CAP THEOREM
Nodes A, B, C. Data = “Cat”, Data = “Cat”, Data = “Cat”
CAP THEOREM
Nodes A, B, C. Data = “Cat”, Data = “Dog”, Data = “Cat”
CAP THEOREM
Nodes A, B, C. Data = “Dog”, Data = “Dog”, Data = “Dog”
AP SYSTEMS
CAP THEOREM
Nodes A, B, C. Data = “Dog”, Data = “Dog”, Data = “Dog”
AVAILABLE (“AP”) SYSTEMS
Nodes A, B, C. Data = “Wolf”, Data = “Dog”, Data = “Dog”
AVAILABLE (“AP”) SYSTEMS
Nodes A, B, C. Data = “Wolf”, Data = “Dog”, Data = “Wolf”
CP SYSTEMS
CONSISTENT (“CP”) SYSTEM
Nodes A, B, C. Data = “Dog”, Data = “Dog”, Data = “Dog”
CONSISTENT (“CP”) SYSTEM
Nodes A, B, C. Data = “Wolf”, Data = “Dog”, Data = “Wolf”
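
The Dog/Wolf slides can be replayed with a toy three-node cluster in which a partition cuts node A off from B and C. In "AP" mode every node keeps accepting reads and writes, so A and the others drift apart. In "CP" mode a node that can't reach a majority refuses both, so no successful read ever returns a divergent value. `ToyCluster` is hypothetical, and the majority rule is an assumption modelled on quorum-based CP systems. As the proviso above warns, real systems are considerably more subtle.

```python
class ToyCluster:
    """Three replicas of one value; a toy for contrasting AP and CP behaviour."""

    def __init__(self, mode: str, value: str):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.values = {"A": value, "B": value, "C": value}
        # reachable[n] = the nodes n can currently talk to (always including itself)
        self.reachable = {n: set(self.values) for n in self.values}

    def partition(self, isolated: str):
        """Simulate a network failure that cuts one node off from the rest."""
        others = set(self.values) - {isolated}
        self.reachable[isolated] = {isolated}
        for n in others:
            self.reachable[n] = set(others)

    def _has_majority(self, node: str) -> bool:
        return len(self.reachable[node]) > len(self.values) // 2

    def write(self, node: str, value: str) -> bool:
        if self.mode == "CP" and not self._has_majority(node):
            return False  # CP: refuse the write rather than let copies diverge
        for peer in self.reachable[node]:
            self.values[peer] = value  # update every replica we can reach
        return True

    def read(self, node: str):
        if self.mode == "CP" and not self._has_majority(node):
            return None  # CP: unavailable on the minority side of the partition
        return self.values[node]  # AP: always answers, possibly with stale data


ap = ToyCluster("AP", "Dog")
ap.partition("A")
ap.write("A", "Wolf")
print(ap.read("A"), ap.read("B"))  # Wolf Dog

cp = ToyCluster("CP", "Dog")
cp.partition("A")
print(cp.write("A", "Wolf"))  # False
print(cp.read("A"))           # None
```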
PART 1 DONE
What shape is your data? Are you happy to pay? What uses your data?
• Databases store data in an accessible way
• SQL databases meet a defined standard; NoSQL is a movement towards considering databases that don’t
• SQL uses tables and schemas to store data, and acts on it as sets, in a transactional way
INCONSISTENT DATABASES
PART 2: EXAMPLES
THERE’S A LOT OF VALUE IN CONSISTENCY…
“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.”
- Dynamo: Amazon’s Highly Available Key-value Store
“Dynamo targets applications that operate with weaker consistency if this results in high availability.”
- Dynamo: Amazon’s Highly Available Key-value Store
DYNAMO IMPLEMENTATIONS
THE COST?
NOT GUARANTEED CONSISTENCY
AMAZON SHOPPING
IS THAT HONESTLY OKAY?
SMS HISTORIC LOG
IS THAT HONESTLY OKAY?
WE USED…
(DYNAMO IMPLEMENTATIONS)
CASSANDRA
(DYNAMO IMPLEMENTATIONS)
• All nodes communicate with each other through a gossip protocol similar to Dynamo and Riak, exchanging information about themselves and other nodes they have gossiped with.
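
The gossip idea needs no central coordinator. Each node keeps a heartbeat counter for every node it has heard of. In each round every node swaps its view with one random peer, and both keep the freshest counter for each entry. Cassandra's real gossiper exchanges versioned endpoint state with digests and does much more, so treat `GossipNode` and `gossip_round` as hypothetical names in a toy that only spreads heartbeats.

```python
import random


class GossipNode:
    def __init__(self, name: str, all_names: list):
        self.name = name
        # What this node believes about every node's heartbeat (0 = never heard from it).
        self.heartbeats = {n: 0 for n in all_names}

    def beat(self):
        # A node only ever increments its own counter.
        self.heartbeats[self.name] += 1

    def exchange(self, other: "GossipNode"):
        # Push-pull gossip: both sides keep the highest counter either has seen.
        for n in self.heartbeats:
            newest = max(self.heartbeats[n], other.heartbeats[n])
            self.heartbeats[n] = other.heartbeats[n] = newest


def gossip_round(nodes: list, rng: random.Random):
    for node in nodes:
        peer = rng.choice([p for p in nodes if p is not node])
        node.exchange(peer)


def converged(nodes: list) -> bool:
    return all(n.heartbeats == nodes[0].heartbeats for n in nodes)


names = ["n1", "n2", "n3", "n4", "n5"]
nodes = [GossipNode(n, names) for n in names]
for node in nodes:
    node.beat()  # at first, each node only knows about its own heartbeat

rng = random.Random(7)
rounds = 0
while not converged(nodes) and rounds < 1000:
    gossip_round(nodes, rng)
    rounds += 1
print(rounds, nodes[0].heartbeats)
```

Information spreads epidemically: every exchange can double the number of nodes that know a fact, which is why there is no single point of failure to route through.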
CASSANDRA
No single point of failure
WHY CASSANDRA?
• We needed fast, highly available writes
• Data didn’t need to be real time: it was aggregate analytics, so eventual consistency was enough
CASSANDRA: THE CONS
• Data is only eventually consistent, so if you need 100% accuracy it’s not great
• Not as wide a range of support as SQL (but nothing has that)
• A flexible schema makes it harder to integrate with OO languages
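
How "eventually consistent" a Dynamo-style store is depends on configuration. With N replicas, a write acknowledged by W of them and a read that asks R of them, R + W > N guarantees that the read set overlaps the latest write. Cassandra exposes this as per-request consistency levels such as `ONE` and `QUORUM`. The toy below is a hypothetical sketch: it deliberately picks the worst-case replicas for each read and resolves answers by last-write-wins. It leaves out read repair, hinted handoff and real clocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versioned:
    timestamp: int
    value: Optional[str]


class QuorumStore:
    """Toy N-replica store: writes reach W replicas, reads ask R replicas."""

    def __init__(self, n: int, w: int, r: int):
        self.n, self.w, self.r = n, w, r
        self.replicas = [Versioned(0, None) for _ in range(n)]
        self.clock = 0

    def write(self, value: str):
        self.clock += 1
        # Worst case for later reads: only the first W replicas receive the write.
        for replica in self.replicas[: self.w]:
            replica.timestamp, replica.value = self.clock, value

    def read(self) -> Optional[str]:
        # Worst case: ask the last R replicas, the ones least likely to have the write.
        answers = self.replicas[self.n - self.r :]
        # Last-write-wins: trust the answer with the newest timestamp.
        return max(answers, key=lambda v: v.timestamp).value


strong = QuorumStore(n=3, w=2, r=2)  # R + W > N: every read overlaps the latest write
strong.write("Dog")
print(strong.read())  # Dog

weak = QuorumStore(n=3, w=1, r=1)    # R + W <= N: a read can miss the write entirely
weak.write("Dog")
print(weak.read())    # None
```

Lowering R or W buys faster, more available requests at the cost of possibly stale reads. That is the same trade-off the slide lists as Cassandra's main con.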
CASSANDRA: THE PROS
• Very fast write throughput
• SQL-like query language (CQL), so you don’t need to relearn things
• Wide range of language drivers
• Highly available
HIGHLY RELATIONAL DATA
PART 2: EXAMPLES
EVERY ROW IS A “THING”
Table (columns: Name, Species): 1 Puss, 2 Dinah, 3 Einstein, 4 Jess
WHAT SQL DOES WELL
RDBMS (Relational Database Management System)
• Modelling objects:
• with a fixed structure and shape
• with a limited number of relations
• with no opinion of any deeper underlying domain
BUT…
THERE ARE PROBLEMS THIS IS BAD FOR
SIX DEGREES OF KEVIN BACON
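
"Six degrees of Kevin Bacon" is a shortest-path problem. In a relational schema, each extra degree typically means another self-join on a co-star table. A graph traversal just keeps following edges. The breadth-first search below runs over an in-memory adjacency map. The actor names and links are made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph: dict, start: str, target: str):
    """Breadth-first search; returns the shortest path as a list of names, or None."""
    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical "appeared in a film together" graph; each edge is listed in both directions.
co_stars = {
    "Kevin Bacon": ["Actor A", "Actor B"],
    "Actor A": ["Kevin Bacon", "Actor C"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Actor A", "Actor D"],
    "Actor D": ["Actor C"],
}
path = degrees_of_separation(co_stars, "Actor D", "Kevin Bacon")
print(path, len(path) - 1)  # ['Actor D', 'Actor C', 'Actor A', 'Kevin Bacon'] 3
```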
ELECTION DATA
WORLD’S LEADING GRAPH DB: NEO4J
“embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables”
DATA STORAGE
DATA STORAGE
• Nodes and edges are all:
• stored as first-class objects on the file system
• “typed”
• key-value stores
DATA IN THE RELATIONS
• “Joins” are first-class objects in the database, so they can be queried without computing a join at query time
• Certain queries become trivial (e.g. joins)
• The trade-off: writes are more expensive
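
Storing relationships as first-class, typed objects can be sketched in memory. Each node keeps direct references to its own relationships, so finding neighbours means following those references rather than scanning or joining a table. Neo4j describes this as index-free adjacency. The classes and the `IS_A` relationship type are hypothetical, and this is not how Neo4j lays out its store files. Note that `connect` does extra bookkeeping at write time, which mirrors the slide's point about more expensive writes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str                                      # the node's type, e.g. "Animal"
    properties: dict = field(default_factory=dict)  # key-value properties
    outgoing: list = field(default_factory=list, repr=False)  # direct links to relationships


@dataclass(eq=False)
class Relationship:
    rel_type: str                                   # relationships are typed too
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def connect(start: Node, rel_type: str, end: Node, **properties) -> Relationship:
    """Create a relationship and store it on its start node (extra work at write time)."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


def neighbours(node: Node, rel_type: str) -> list:
    """Follow the node's own stored relationships: no table scan, no join."""
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


puss = Node("Animal", {"name": "Puss"})
einstein = Node("Animal", {"name": "Einstein"})
species_1 = Node("Species", {"coolness_rating": 0})
species_2 = Node("Species", {"coolness_rating": 10})
connect(puss, "IS_A", species_1)
connect(einstein, "IS_A", species_2)

print(neighbours(einstein, "IS_A")[0].properties["coolness_rating"])  # 10
```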
PROTOTYPING
• Easy to see and work with data
• Schemaless
• Active community with a lot of libraries
NEO4J USERS
NEO4J: THE CONS
• More expensive writes to the database
• Hard to scale out horizontally
• Less mature tooling (especially in non-Java ecosystems)
NEO4J: THE PROS
• Models certain kinds of data very well
• Avoids costly join queries when working with lots of data
• Schemalessness allows for fast prototyping and flexible data models
• Commercial buy-in means language support is not far behind
SCHEMALESSNESS
PART 2: EXAMPLES
NB: MongoDB claims there are a lot of use cases; we’re only covering this one
MONGODB: THE CONS
• Mongo was one of the first famous NoSQL databases and got used before it was tested and mature. There are lots of articles about missing features and bugs
• Schemalessness makes data-integrity checks and OO-language integration tricky
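
Why schemalessness makes integrity checks tricky shows up as soon as documents in one collection drift apart. Nothing stops a missing field, a misspelt key or a wrong type, so any checks a relational schema would have enforced move into application code. Plain Python dicts stand in for documents here; this is not the MongoDB driver API. MongoDB can optionally attach a `$jsonSchema` validator to a collection, which this sketch doesn't model. The documents and the `problems` helper are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical documents in one schemaless collection: nothing forces a shared shape.
animals: List[Dict[str, Any]] = [
    {"name": "Puss", "coolness": 0},
    {"name": "Einstein", "coolness": 10, "hobbies": ["skateboarding"]},
    {"name": "Nemo", "fins": 2},              # no coolness field at all
    {"nmae": "Wanda", "coolness": "high"},    # misspelt key, and a string where a number was meant
]


def problems(doc: Dict[str, Any]) -> List[str]:
    """Checks a relational schema would enforce for us, written by hand in application code."""
    found = []
    if not isinstance(doc.get("name"), str):
        found.append("missing or non-string 'name'")
    if "coolness" in doc and not isinstance(doc["coolness"], int):
        found.append("'coolness' is not an integer")
    return found


invalid = {i: problems(doc) for i, doc in enumerate(animals) if problems(doc)}
print(invalid)
# {3: ["missing or non-string 'name'", "'coolness' is not an integer"]}
```

The flip side is the flexibility the pros slide mentions: the `hobbies` and `fins` fields needed no migration to appear.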
MONGODB: THE PROS
• Schemalessness, if you want flexible data models
• People have used it for a while, so library support is not bad
HOW DO YOU RETRIEVE YOUR DATA?
PART 2: EXAMPLES
FREE-TEXT SEARCH
DOCUMENT STORE
Elasticsearch
DOCUMENT STORE
EVERY ROW (DOCUMENT) IS A “THING”
Name = Puss, Coolness = 0
Name = Jess, Coolness = 0
Name = Dinah, Coolness = 0
Name = Einstein, Coolness = 10
APACHE LUCENE
“Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”
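
At the heart of a Lucene-style search engine is an inverted index. It maps each term to the documents containing it, optionally with the term's positions, so a query looks up terms instead of scanning every document. The toy below supports a term query (any term matches, like Elasticsearch's default `match` behaviour) and a phrase query (terms consecutive and in order). Real Lucene adds analyzers, relevance scoring and on-disk segments. This tokenizer does no stemming, so "skateboarding" counts as a different term from "skateboard". `InvertedIndex` and the sample texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # Tiny analyzer: lowercase, then split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: int, text: str):
        for position, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(position)

    def match(self, query: str) -> set:
        # Documents containing any of the query terms.
        result = set()
        for term in tokenize(query):
            result |= set(self.postings.get(term, {}))
        return result

    def match_phrase(self, query: str) -> set:
        # Documents where the query terms appear consecutively, in order.
        terms = tokenize(query)
        if not terms:
            return set()
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))
        hits = set()
        for doc in candidates:
            for start in self.postings[terms[0]][doc]:
                if all(start + i in self.postings[t][doc] for i, t in enumerate(terms)):
                    hits.add(doc)
                    break
        return hits


index = InvertedIndex()
index.add(1, "Enjoys skateboard tricks and writing reddit comments")
index.add(2, "Writing comments on reddit, rarely skateboarding")
index.add(3, "Knitting")
print(sorted(index.match("skateboard")))                      # [1]
print(sorted(index.match("reddit comments")))                 # [1, 2]
print(sorted(index.match_phrase("writing reddit comments")))  # [1]
```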
QUERIES
Focused around text searching
QUERIES ARE TAILORED TO THE QUESTIONS YOU’LL BE ASKING
{ "query": { "match": { "hobbies": "skateboard" } } }
{ "query": { "fuzzy": { "hobbies": "skateboarig" } } }
{ "query": { "match_phrase": { "hobbies": "writing reddit comments" } } }
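
The fuzzy query finds indexed terms within a small edit distance of the misspelt "skateboarig". With Elasticsearch's default `fuzziness` of `AUTO`, terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms by two. The sketch below applies that rule using plain Levenshtein distance. Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this version doesn't. `fuzzy_match` and the term list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    # Mirrors the AUTO rule: exact for 1-2 characters, 1 edit for 3-5, 2 edits beyond.
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_match(query_term: str, indexed_terms: list) -> list:
    allowed = auto_fuzziness(query_term)
    return [t for t in indexed_terms if edit_distance(query_term, t) <= allowed]


print(edit_distance("skateboarig", "skateboard"))  # 2
print(fuzzy_match("skateboarig", ["skateboard", "skateboarding", "knitting"]))
# ['skateboard', 'skateboarding']
```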
WHAT CONSUMES YOUR DATA?
End user: “What is the average age of …?”
WHAT CONSUMES YOUR DATA?
End user: “Er… I think it was something like ‘Campbell’?”
REMEMBER THAT
OUR CHOICE IS INFORMED BY OUR PLANS FOR THE APPLICATION
ELASTICSEARCH: THE CONS
• It only does one thing (even if it does it well)
ELASTICSEARCH: THE PROS
• It has a lot of search-related queries built in: fuzzy, phonetic (via the phonetic analysis plugin) and sentence matching
• A lot of people use it, so support is mature
• It integrates with a large number of other languages and frameworks; this is the industry standard
WHEN IT GOES WRONG
PART 2: EXAMPLES
SQL: THE CONS
• It’s very hard to scale writes
• It has a specific data model; not every data domain fits into it
• e.g. highly relational models, schemalessness
• Its query language is general-purpose rather than domain-specific
SQL: THE PROS
• If a library exists for anything, it exists for SQL
• ACID transactions make a lot of things easy
• Constraints and schemas allow for automated data-integrity checking
• Easy normalisation of data
PART 2 DONE
What shape is your data? Are you happy to pay? What uses your data?
• Some sites are happy to sacrifice consistency for availability; Dynamo is a design that databases can follow to achieve that
• If you’ll be doing lots of joins, graph databases such as Neo4j can improve performance
• Sometimes you want the flexibility to store any object; there is a range of schemaless databases available
• Consider what will retrieve your data, and make sure your database is efficient for that use case
ANY QUESTIONS?
David Simons
@SwamWithTurtles