CHOOSING THE RIGHT DATABASE
David Simons
@SwamWithTurtles
WHO AM I?
• Tech Lead/Consultant at Softwire
• Specification
• Architecture
• Design
• Implementation
• Support
• … and more!
WHO AM I?
Hack with Databases
Been on TV!
Political Nerd
@SwamWithTurtles
A (POLITICAL) ASIDE
THE AV REFERENDUM
“NO TO AV”
THAT’S THE WHOLE POINT
WELL DUH…
IMAGINING A NEW SYSTEM COULD GET US TO THE SAME PLACE BETTER
THE ‘YES’ CAMPAIGN WAS NOT…
RE-EXAMINING WHAT MAKES THE SYSTEM GOOD
THE ‘YES’ CAMPAIGN WAS…
WHAT DOES THIS HAVE TO DO WITH DATABASES?
ERRR…
SQL
Clear Market Standard
• Normalisation
• Relational Data
• Foreign Keys
• Data Integrity Checks
• Amazing Indexing
• Maturity and Robustness
• SQL as a query language
• Large Community Support
• All the plug-in/library integration
• Available Support Contracts
WHY ARE PEOPLE ADVOCATING NOSQL DATABASES?
SO…
THEY THINK THE GOALPOSTS ARE WRONG
NOSQL SOLUTIONS CAN’T [VERB]
BUT…
IT’S NOT MEANT TO SOLVE A PROBLEM THAT NEEDS [VERB]ING
BECAUSE…
TODAY…
• What should we be asking when we look at databases?
• What do the results of those questions mean for your database?
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
SQL IS PRETTY GOOD FOR LARGE AMOUNTS OF DATA
TRUTHFULLY
WITH ENOUGH DATA, YOU HAVE TO DISTRIBUTE
THE HARD TRUTH
WHAT HAPPENS WHEN ONE INSTANCE GOES DOWN?
BUT…
CAP THEOREM
• Choose Two:
• Consistency
• Availability
• Partition Tolerance
PROVISO
There is a lot of thought in this area. I am giving a simplified description that would make many database people pull their hair out.
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
CAP THEOREM
CP: Consistent & Partition Tolerant
AP: Available & Partition Tolerant
CAP THEOREM
[Diagram: nodes A, B and C; all three hold Data = “Cat”]
CAP THEOREM
[Diagram: nodes A, B and C; one node holds Data = “Dog”, the other two still hold Data = “Cat”]
CAP THEOREM
[Diagram: nodes A, B and C; all three hold Data = “Dog”]
CAP THEOREM
[Diagram: nodes A, B and C; all three hold Data = “Dog”]
AVAILABLE (“AP”) SYSTEMS
[Diagram: nodes A, B and C; one node holds Data = “Wolf”, the other two hold Data = “Dog”]
AVAILABLE (“AP”) SYSTEMS
[Diagram: nodes A, B and C; two nodes hold Data = “Wolf”, one still holds Data = “Dog”]
CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B and C; all three hold Data = “Dog”]
CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B and C; all three hold Data = “Dog”]
CONSISTENT (“CP”) SYSTEM
[Diagram: nodes A, B and C; two nodes hold Data = “Wolf”, one still holds Data = “Dog”]
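The AP and CP diagrams can be imitated with a toy simulation. This is only a sketch in the same simplified spirit as the proviso: the `Replica` class, the `partitioned` flag and the choice of B as the cut-off node are all assumptions made for illustration, not how any real database implements replication.

```python
class Replica:
    """One node holding a single value, like A, B and C in the diagrams."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.partitioned = False  # True: cut off from the other replicas


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer rather than risk stale data."""


def write(replicas, value):
    """Apply a write to every replica that can still be reached."""
    for replica in replicas:
        if not replica.partitioned:
            replica.value = value


def read_ap(replica):
    """AP behaviour: always answer, even if the value may be stale."""
    return replica.value


def read_cp(replica):
    """CP behaviour: refuse to answer while cut off from the others."""
    if replica.partitioned:
        raise Unavailable(f"replica {replica.name} is partitioned")
    return replica.value


def demo():
    replicas = [Replica(name, "Dog") for name in "ABC"]
    replicas[1].partitioned = True  # assumption: B is the node that is cut off
    write(replicas, "Wolf")         # only A and C receive the new value
    stale = read_ap(replicas[1])    # AP: B still answers with the old value
    try:
        read_cp(replicas[1])
        refused = False
    except Unavailable:             # CP: B declines to answer at all
        refused = True
    return [r.value for r in replicas], stale, refused
```

Calling `demo()` leaves the replicas holding “Wolf”, “Dog”, “Wolf”. Read in AP style, the cut-off node happily returns the stale “Dog”; read in CP style, it raises `Unavailable` instead.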
THERE’S A LOT OF VALUE IN CONSISTENCY…
“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.”
– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
“Dynamo targets applications that operate with weaker consistency if this results in high availability.”
– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
DYNAMO IMPLEMENTATIONS
NOT GUARANTEED CONSISTENCY
THE COST?
“open source software project that enables distributed processing of large data sets across clusters of commodity servers”
NOT ALWAYS AVAILABLE
THE COST?
WHAT IF WE DON’T NEED TO DISTRIBUTE?
BUT…
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
SET-BASED OPERATION
READ DATA OUT WITH “WHERE” (INTERSECTION)
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
UNIONS
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
5 Nemo
6 Moby Dick
7 Wanda
JOINS
Name Species
Species Coolness Rating
1 Puss 0
2 Dinah 0
3 Einstein 10
4 Jess 0
CARTESIAN PRODUCTS
0 10
0 10
0 10
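The set-based operations on these slides (WHERE, UNION, JOIN and the Cartesian product) can be tried out with Python's built-in `sqlite3` module. This is a rough sketch: the table names are invented, and the species values are assumptions, since the Species column did not survive in the slide text. Only the names and coolness ratings come from the slides.

```python
import sqlite3


def build_db():
    """Create an in-memory database mirroring the slide tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
        CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
        CREATE TABLE coolness (species TEXT PRIMARY KEY, rating INTEGER);
        -- species values below are assumed, not taken from the slides
        INSERT INTO animals VALUES
            (1, 'Puss', 'cat'), (2, 'Dinah', 'cat'),
            (3, 'Einstein', 'dog'), (4, 'Jess', 'cat');
        INSERT INTO sea_creatures VALUES
            (5, 'Nemo', 'fish'), (6, 'Moby Dick', 'fish'), (7, 'Wanda', 'fish');
        INSERT INTO coolness VALUES ('cat', 0), ('dog', 10);
    """)
    return conn


def run_queries(conn):
    """Run one query per slide and return the results keyed by operation."""
    # WHERE: keep only the rows that satisfy a predicate
    where = [row[0] for row in conn.execute(
        "SELECT name FROM animals WHERE species = 'cat' ORDER BY id")]
    # UNION: stack the rows of two compatible tables
    union = [row[1] for row in conn.execute(
        "SELECT id, name FROM animals "
        "UNION SELECT id, name FROM sea_creatures ORDER BY id")]
    # JOIN: pair each animal with the coolness rating of its species
    join = conn.execute(
        "SELECT a.name, c.rating FROM animals a "
        "JOIN coolness c ON c.species = a.species ORDER BY a.id").fetchall()
    # Cartesian product: every animal paired with every rating row
    cross = conn.execute(
        "SELECT COUNT(*) FROM animals CROSS JOIN coolness").fetchone()[0]
    return {"where": where, "union": union, "join": join, "cross": cross}
```

With the assumed species, the join reproduces the slide's table: Puss, Dinah and Jess get a rating of 0 and Einstein gets 10.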
WHAT SQL DOES WELL
• Modelling objects:
• With a fixed structure and shape
• With a limited number of relations
• With no opinion or opinion of any deeper underlying domain
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)
THERE ARE PROBLEMS THIS IS BAD FOR
BUT…
KEVIN BACON
SIX DEGREES OF…
THERE IS NO OPEN ELECTION DATA
THE PROBLEM
ELECTION DATA
E = (e.g.) member of, held in, stood in…
V = elections, constituencies, years, politicians and parties
WORLD’S LEADING GRAPH DB:
"embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables"
DATA STORAGE
• Nodes and edges are all:
• Stored as first-class objects on the file system
• “typed”
• Key-value stores
DATA IN THE RELATIONS
• “Joins” are first class objects in the database that can be queried at no additional cost
• Certain queries become trivial (e.g. Joins)
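To make the "six degrees" idea concrete, here is a minimal sketch of a typed-edge graph with a breadth-first search for the shortest chain between two vertices. It is plain Python rather than Neo4j's API, and every vertex name in `EDGES` is a made-up placeholder. Only the relationship types (member of, stood in, held in) come from the slides.

```python
from collections import deque

# (source, relationship type, target); all vertex names are placeholders
EDGES = [
    ("Politician A", "MEMBER_OF", "Party X"),
    ("Politician B", "MEMBER_OF", "Party X"),
    ("Politician B", "STOOD_IN", "Election 1"),
    ("Election 1", "HELD_IN", "Constituency Y"),
    ("Politician C", "STOOD_IN", "Election 1"),
]


def build_adjacency(edges):
    """Store each relationship alongside both of its endpoints."""
    adjacency = {}
    for source, rel, target in edges:
        adjacency.setdefault(source, []).append((rel, target))
        adjacency.setdefault(target, []).append((rel, source))
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: the first path to reach the goal is a shortest one."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _rel, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two vertices are not connected
```

Because each vertex carries its own list of relationships, following a link is a direct lookup rather than a join against another table, which is the property the slide is pointing at.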
PROTOTYPING
• Easy to see and work with data
• Schemaless
• Active community with a lot of libraries
NEO4J USERS
THERE ARE PROBLEMS THIS IS OVERENGINEERED FOR
BUT…
BASIC LOCK MECHANISMS
FOR EXAMPLE
CACHING
FOR EXAMPLE
KEY/VALUE STORES
POSSIBLE SOLUTION
DynamoDB
Used by BBC for managing scaled scheduled processes
Redis
Used by Twitter for caching your timeline
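Both use cases named here, basic locks and caching, need little more than "set this key only if it is absent" plus an expiry time. The sketch below is a single-process toy, not a distributed lock and not the API of DynamoDB or Redis. The class, the method names and the `lock:` key prefix are all invented for illustration.

```python
import time


class KeyValueStore:
    """Toy in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested

    def _live_entry(self, key):
        """Return the stored entry, dropping it first if it has expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        _value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live_entry(key)
        return None if entry is None else entry[0]

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def set_if_absent(self, key, value, ttl=None):
        """Write only when no live value exists; report whether we wrote."""
        if self._live_entry(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def acquire_lock(store, name, owner, ttl):
    """Take the lock if nobody holds it; the TTL frees it if the owner dies."""
    return store.set_if_absent(f"lock:{name}", owner, ttl)


def release_lock(store, name, owner):
    """Release only a lock that this owner still holds."""
    if store.get(f"lock:{name}") != owner:
        return False
    store.delete(f"lock:{name}")
    return True
```

A cache is the same store used directly: `set(key, value, ttl=...)` on a miss and `get(key)` on every read, with expiry standing in for invalidation.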
TIME SERIES DATABASE
Timestamp Value
2014-06-10T12:00:00+0100 17
2014-06-10T12:15:00+0100 17
2014-06-10T12:30:00+0100 20
2014-06-10T12:45:00+0100 22
2014-06-10T13:00:00+0100 24
2014-06-10T13:15:00+0100 28
2014-06-10T13:30:00+0100 32
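A typical question to ask of data shaped like this is an aggregate over time buckets. As a small sketch using only the Python standard library, the readings from the table can be averaged per hour. The function name and the bucketing choice are assumptions for illustration, and a real time series database would do this work server-side.

```python
from collections import defaultdict
from datetime import datetime

# The readings from the slide's table
READINGS = [
    ("2014-06-10T12:00:00+0100", 17),
    ("2014-06-10T12:15:00+0100", 17),
    ("2014-06-10T12:30:00+0100", 20),
    ("2014-06-10T12:45:00+0100", 22),
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:30:00+0100", 32),
]


def hourly_average(readings):
    """Group readings by the hour they fall in and average each group."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        # Truncate to the start of the hour to get the bucket key
        bucket = moment.replace(minute=0, second=0, microsecond=0)
        buckets[bucket].append(value)
    return {
        bucket.isoformat(): sum(values) / len(values)
        for bucket, values in sorted(buckets.items())
    }
```

For the table above this gives an average of 19.0 for the 12:00 hour and 28.0 for the 13:00 hour.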
THERE ARE PROBLEMS THIS IS TOO STRICT FOR
BUT…
SCHEMALESS
EVERY ROW IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
DOCUMENT
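One way to picture the schemaless model is as a list of free-form records in which each document carries only the fields it needs. This sketch uses plain Python dictionaries. The `hobbies` field on the last document and the `find` helper are invented for illustration and do not correspond to any particular document store's API.

```python
# Each document is self-describing; no shared schema is declared anywhere.
DOCUMENTS = [
    {"name": "Puss", "coolness": 0},
    {"name": "Jess", "coolness": 0},
    {"name": "Dinah", "coolness": 0},
    # Extra field that the other documents lack (an invented example)
    {"name": "Einstein", "coolness": 10, "hobbies": ["time travel"]},
]


def find(documents, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]
```

Documents that lack a field simply never match a query on it, so new fields can be added to some documents without migrating the rest.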
SOCIAL MEDIA SITE
YOU CANNOT DENORMALISE DATA
WARNING
THERE ARE PROBLEMS THAT SQL IS ACTUALLY GOOD FOR…
BUT…
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
THIS MAY SEEM LIKE A TRIVIAL POINT…
COSTS…
• Oracle: $50,000+
• SQL Server: $10,000+
(BASICALLY) FREE
• MySQL
• PostgreSQL
• Riak
• Voldemort
• MariaDB
• Cassandra
• MongoDB
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
THINGS HAVE TO USE OUR DATA…
REMEMBER THAT
APPLICATION ARCHITECTURE
[Diagram: DATABASE, CODE, Data]
API ARCHITECTURE
[Diagram: DATABASE, CODE, Data, END USER]
API ARCHITECTURE
END USER: What is the average age of …?
API ARCHITECTURE
END USER
Er….
I think it was something like “Campbell”?
OUR CHOICE IS INFORMED BY OUR PLANS FOR THE APPLICATION
REMEMBER THAT
GEOSPATIAL INDEXES
(Another rambly aside)
LINKED MEDIA FRAMEWORK
APACHE MARMOTTA
OUT OF THE BOX…
DOCUMENT STORE
ElasticSearch
EVERY ROW IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
DOCUMENT
APACHE LUCENE
“Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”
FOCUSED AROUND TEXT SEARCHING
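Full-text engines such as Lucene are built around an inverted index, which maps each term to the documents containing it. The sketch below shows the bare idea in plain Python. The tokenizer, the function names and the sample documents are all assumptions, and a real engine adds analysis, scoring and fuzzy matching on top.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Hypothetical sample documents, keyed by id
SAMPLE = {
    1: "Skateboard tricks and writing",
    2: "Writing reddit comments",
    3: "Cooking",
}
```

Looking up a term is a dictionary access rather than a scan over every document, which is why text search stays fast as the collection grows.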
QUERIES
{
  "query": {
    "match": {"hobbies": "skateboard"}
  }
}
{
  "query": {
    "fuzzy": {"hobbies": "skateboarig"}
  }
}
{
  "query": {
    "match": {"hobbies": {"query": "writing reddit comments", "type": "phrase"}}
  }
}
IN SUMMARY…
DATABASES WE’VE TALKED ABOUT TODAY…
• SQL (Industry-standard RDBMS, coming in many flavours at different costs. Used by many)
• Cassandra (Eventually consistent Dynamo implementation)
• Riak (Eventually consistent Dynamo implementation)
• Hadoop (Large ecosystem focused around scalability, focused on consistency and utilising many nodes. Used by Facebook for their messaging)
• Neo4j (Graph database, with poor scalability but a high-fidelity data model. Used by many companies for highly relational data)
• Redis (In-memory key-value store, used by Twitter for their caching as a lightweight solution)
• DynamoDB (AWS-managed DBaaS, used by the BBC among others for lightweight key-value store needs such as locking)
• MongoDB (Document store database, schemaless, with some interesting indexes. Used well by the New York Times and Foursquare. Used poorly by Diaspora v1)
• Apache Marmotta (Bleeding-edge DB used by Red Bull Media House to comply with the Linked Data framework, allowing easy integration)
• ElasticSearch (Document store database, providing easy searching out of the box. Used by GitHub and Stack Overflow among others)
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
If it gets big enough, you have to distribute
AP (available) vs. CP (consistent)
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
If it’s not big, you can use high-fidelity data models
Time Series
Graph DBs
Denormalised Rows
… and more!
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
You may have to make sacrifices…
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
Think about the queries you’ll be running up-front…
… this can prevent costly rearchitecting down the line
How big is your data?
What shape is your data?
Are you happy to pay?
What uses your data?
Your system is unique
NoSQL is not one thing - there’s a range of solutions
Consider them on their own merits!
ANY QUESTIONS?
David Simons
@SwamWithTurtles

 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Choosing the Right Database

  • 1. Choosing the Right Database. David Simons, @SwamWithTurtles
  • 2. Choosing the Right Database. David Simons, @SwamWithTurtles
  • 3. Who Am I? • Tech Lead/Consultant at Softwire • Specification • Architecture • Design • Implementation • Support • … and more!
  • 4. Who Am I? Hack with Databases. Been on TV! Political Nerd. @SwamWithTurtles
  • 5. A (Political) Aside
  • 6. The AV Referendum
  • 7. “No to AV”
  • 8. Well duh… That’s the whole point
  • 9. The ‘Yes’ campaign was not… imagining a new system could get us to the same place better
  • 10. The ‘Yes’ campaign was… re-examining what makes the system good
  • 11.
  • 12. Errr… What does this have to do with databases?
  • 13. SQL Clear Market Standard • Denormalisation • Relational Data • Foreign Keys • Data Integrity Checks • Amazing Indexing • Maturity and Robustness • SQL as a query language • Large Community Support • All the plug-in/library integration • Available Support Contracts
  • 14. So… Why are people advocating NoSQL databases?
  • 15. They think the goalposts are wrong
  • 16. But… NoSQL solutions can’t [verb]
  • 17. Because… it’s not meant to solve a problem that needs [verb]ing
  • 18. Today… • What should we be asking when we look at databases? • What do the results of those questions mean for your database?
  • 19. How big is your data? What shape is your data? Are you happy to pay? What uses your data?
  • 20. Truthfully: SQL is pretty good for large amounts of data
  • 21. The hard truth: with enough data, you have to distribute
  • 22. But… What happens when one instance goes down?
  • 23. CAP Theorem • Choose Two: • Consistency • Availability • Partition Tolerance
  • 24. Proviso: There is a lot of thought in this area; I am giving a simplified description that would make many database people pull their hair out. https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
  • 25. CAP Theorem. CP: Consistent & Partition Tolerant. AP: Available & Partition Tolerant
  • 26. CAP Theorem. Nodes A, B, C: Data = “Cat”, Data = “Cat”, Data = “Cat”
  • 27. CAP Theorem. Nodes A, B, C: Data = “Cat”, Data = “Dog”, Data = “Cat”
  • 28. CAP Theorem. Nodes A, B, C: Data = “Dog”, Data = “Dog”, Data = “Dog”
  • 29. CAP Theorem. Nodes A, B, C: Data = “Dog”, Data = “Dog”, Data = “Dog”
  • 30. Available (“AP”) Systems. Nodes A, B, C: Data = “Wolf”, Data = “Dog”, Data = “Dog”
  • 31. Available (“AP”) Systems. Nodes A, B, C: Data = “Wolf”, Data = “Dog”, Data = “Wolf”
  • 32. Consistent (“CP”) System. Nodes A, B, C: Data = “Dog”, Data = “Dog”, Data = “Dog”
  • 33. Consistent (“CP”) System. Nodes A, B, C: Data = “Dog”, Data = “Dog”, Data = “Dog”
  • 34. Consistent (“CP”) System. Nodes A, B, C: Data = “Wolf”, Data = “Dog”, Data = “Wolf”
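
The AP and CP behaviour in the slides above can be sketched as a toy simulation. This is a minimal illustration only, not how any real database replicates data: the `Replica` class, the helper functions, and the mapping of the slide values to nodes A, B and C in that order are all assumptions made for the example.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised under the CP policy when a replica cannot reach a majority."""


@dataclass
class Replica:
    name: str
    data: str
    peers: set = field(default_factory=set)  # names of replicas it can reach


def make_cluster(value):
    """Three fully connected replicas, all holding the same value."""
    names = ["A", "B", "C"]
    return {n: Replica(n, value, set(names) - {n}) for n in names}


def isolate(cluster, name):
    """Simulate a network partition that cuts one replica off from the rest."""
    for replica in cluster.values():
        replica.peers.discard(name)
    cluster[name].peers = set()


def has_majority(cluster, name):
    """True if the replica plus its reachable peers form a majority."""
    return len(cluster[name].peers) + 1 > len(cluster) // 2


def write_ap(cluster, at, value):
    """AP policy: always accept the write, copy it to whoever is reachable."""
    cluster[at].data = value
    for peer in cluster[at].peers:
        cluster[peer].data = value


def write_cp(cluster, at, value):
    """CP policy: refuse the write unless a majority can apply it."""
    if not has_majority(cluster, at):
        raise Unavailable(f"{at} cannot reach a majority")
    write_ap(cluster, at, value)


def read_cp(cluster, at):
    """CP policy: a replica in the minority refuses to answer at all."""
    if not has_majority(cluster, at):
        raise Unavailable(f"{at} cannot reach a majority")
    return cluster[at].data


def state(cluster):
    return {name: replica.data for name, replica in cluster.items()}


# AP: the isolated replica still accepts a write, so the replicas disagree.
ap = make_cluster("Dog")
isolate(ap, "A")
write_ap(ap, "A", "Wolf")
ap_state = state(ap)

# CP: the majority side accepts the write; the isolated replica refuses to serve.
cp = make_cluster("Dog")
isolate(cp, "B")
write_cp(cp, "A", "Wolf")
cp_state = state(cp)
try:
    read_cp(cp, "B")
    minority_answered = True
except Unavailable:
    minority_answered = False
```

In the AP run the cluster stays available but returns different answers depending on which node is asked; in the CP run node B holds stale data but never serves it, which is the availability cost.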
  • 35. There’s a lot of value in consistency…
  • 36. “Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.” – Dynamo: Amazon’s Highly Available Key-Value Store
  • 37. “Dynamo targets applications that operate with weaker consistency if this results in high availability.” – Dynamo: Amazon’s Highly Available Key-Value Store
  • 38. Dynamo Implementations
  • 39. The cost? Not guaranteed consistency
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46. “open source software project that enables distributed processing of large data sets across clusters of commodity servers”
  • 47. The cost? Not always available
  • 48. But… What if we don’t need to distribute?
  • 49. How big is your data? What shape is your data? Are you happy to pay? What uses your data?
  • 50. RDBMS (Relational Database Management System)
  • 51. Every row is a “thing”. Columns: Name, Species. Rows: 1 Puss; 2 Dinah; 3 Einstein; 4 Jess
  • 52. Read data out with set-based operations
  • 53. “Where” (Intersection). Columns: Name, Species. Rows: 1 Puss; 2 Dinah; 3 Einstein; 4 Jess
  • 54. Unions. Columns: Name, Species. Rows: 1 Puss; 2 Dinah; 3 Einstein; 4 Jess; 5 Nemo; 6 Moby Dick; 7 Wanda
  • 55. Joins. Tables: (Name, Species) and (Species, Coolness Rating). Rows: 1 Puss, 0; 2 Dinah, 0; 3 Einstein, 10; 4 Jess, 0
  • 56. Cartesian Products: 0, 10, 0, 10, 0, 10
  • 57. Cartesian Products: 0, 10, 0, 10, 0, 10
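
The set-based operations on these slides can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names are hypothetical, and the species values are guesses, because the Species column did not survive in the transcript. Only the names and coolness ratings come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE species_coolness (species TEXT PRIMARY KEY, coolness INTEGER);

    -- Species values are assumed for illustration.
    INSERT INTO pets VALUES
        (1, 'Puss', 'cat'), (2, 'Dinah', 'cat'),
        (3, 'Einstein', 'dog'), (4, 'Jess', 'cat');
    INSERT INTO sea_creatures VALUES
        (5, 'Nemo', 'fish'), (6, 'Moby Dick', 'whale'), (7, 'Wanda', 'fish');
    INSERT INTO species_coolness VALUES ('cat', 0), ('dog', 10);
""")

# WHERE: keep only the rows that satisfy a condition.
cats = [row[0] for row in conn.execute(
    "SELECT name FROM pets WHERE species = 'cat' ORDER BY id")]

# UNION: stack the rows of two compatible result sets.
everyone = [row[1] for row in conn.execute(
    "SELECT id, name FROM pets UNION "
    "SELECT id, name FROM sea_creatures ORDER BY id")]

# JOIN: attach the coolness rating to each pet via the shared species column.
rated = conn.execute(
    "SELECT p.name, c.coolness FROM pets p "
    "JOIN species_coolness c ON c.species = p.species ORDER BY p.id").fetchall()

# CARTESIAN PRODUCT: every pet paired with every rating row, matched or not.
pairs = conn.execute(
    "SELECT COUNT(*) FROM pets CROSS JOIN species_coolness").fetchone()[0]
```

The join is the Cartesian product filtered down to the pairs whose species match, which is why the unfiltered product has 4 × 2 = 8 rows while the join has 4.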
  • 58. RDBMS (Relational Database Management System): What SQL does well • Modelling objects: • With a fixed structure and shape • With a limited number of relations • With no opinion or opinion of any deeper underlying domain
  • 59. But… There are problems this is bad for
  • 60. Six Degrees of… Kevin Bacon
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. The problem: there is no open election data
  • 66. Election Data
  • 67. Election Data
  • 68. Election Data. E (edges) = (e.g.) member of, held in, stood in… V (vertices) = elections, constituencies, years, politicians and parties
  • 69. World’s Leading Graph DB:
  • 70. "embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables"
  • 71. Data Storage
  • 72. Data Storage
  • 73. Data Storage • Nodes and edges are all: • Stored as first-class objects on the file system • “Typed” • Key-value stores
  • 74. Data in the Relations • “Joins” are first-class objects in the database that can be queried at no additional cost • Certain queries become trivial (e.g. joins)
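
The idea of typed nodes and edges that each carry key-value properties, with relations stored directly rather than computed by a join, can be sketched in plain Python. This is not Neo4j's API. The `Graph` class and every node and edge below are hypothetical placeholders, with only the edge types (member of, stood in, held in) borrowed from the election slide.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # the node's "type"
    props: dict = field(default_factory=dict)   # key-value properties


@dataclass
class Edge:
    type: str                                   # the edge's "type"
    start: str
    end: str
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.adjacent = {}  # node id -> edges touching that node

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(label, props)
        self.adjacent.setdefault(node_id, [])

    def add_edge(self, type_, start, end, **props):
        edge = Edge(type_, start, end, props)
        self.adjacent[start].append(edge)
        self.adjacent[end].append(edge)
        return edge

    def neighbours(self, node_id):
        """Follow stored edges directly; no join against another table."""
        for edge in self.adjacent[node_id]:
            yield edge.end if edge.start == node_id else edge.start

    def degrees_of_separation(self, source, target):
        """Breadth-first search for the fewest hops between two nodes."""
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            current, hops = queue.popleft()
            if current == target:
                return hops
            for nxt in self.neighbours(current):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return None  # not connected


# Placeholder data only; none of these are real election records.
g = Graph()
g.add_node("politician_x", "Politician")
g.add_node("politician_y", "Politician")
g.add_node("party_p", "Party")
g.add_node("election_1", "Election")
g.add_node("constituency_c", "Constituency")
g.add_edge("MEMBER_OF", "politician_x", "party_p")
g.add_edge("STOOD_IN", "politician_x", "election_1")
g.add_edge("STOOD_IN", "politician_y", "election_1")
g.add_edge("HELD_IN", "election_1", "constituency_c")

rivals = g.degrees_of_separation("politician_x", "politician_y")
party_to_seat = g.degrees_of_separation("party_p", "constituency_c")
```

A "six degrees" question is a single traversal here, whereas in a relational schema each extra hop would typically mean another self-join.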
  • 75. Prototyping • Easy to see and work with data • Schemaless • Active community with a lot of libraries
  • 76. Neo4j Users
  • 77. But… There are problems this is overengineered for
  • 78. For example: basic lock mechanisms
  • 79. For example: caching
  • 80. Possible solution: key/value stores
  • 81. Possible solution: key/value stores. DynamoDB: used by the BBC for managing scaled scheduled processes. Redis: used by Twitter for caching your timeline
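
Both use cases above, locking and caching, need little more than get, set, and "set only if absent" with an expiry. The sketch below is a hypothetical in-process stand-in, not the Redis or DynamoDB client API; the class, method names, and key names are assumptions for illustration. A fake clock is injected so that the expiry behaviour is visible without waiting.

```python
import time


class KeyValueStore:
    """A tiny in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._items = {}      # key -> (value, expires_at or None)
        self._clock = clock

    def _live_entry(self, key):
        entry = self._items.get(key)
        if entry is not None:
            _, expires_at = entry
            if expires_at is not None and self._clock() >= expires_at:
                del self._items[key]  # lazily drop expired keys
                return None
        return entry

    def get(self, key):
        entry = self._live_entry(key)
        return None if entry is None else entry[0]

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def set_if_absent(self, key, value, ttl=None):
        """Write only if the key is missing or expired; report success."""
        if self._live_entry(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete_if_equals(self, key, value):
        """Delete only if the caller still owns the key; report success."""
        if self.get(key) != value:
            return False
        del self._items[key]
        return True


now = [0.0]                                  # fake clock for the demo
store = KeyValueStore(clock=lambda: now[0])

# Locking: whoever sets the key first holds the lock until release or expiry.
first = store.set_if_absent("lock:nightly-job", "worker-1", ttl=30)
second = store.set_if_absent("lock:nightly-job", "worker-2", ttl=30)
now[0] = 31.0                                # worker-1 crashed; the lock expired
third = store.set_if_absent("lock:nightly-job", "worker-2", ttl=30)

# Caching: store an expensive result under a key and reuse it while it is fresh.
store.set("timeline:user-42", ["post-3", "post-2", "post-1"], ttl=60)
cached = store.get("timeline:user-42")
now[0] = 100.0
expired = store.get("timeline:user-42")
```

The expiry on the lock is what stops a crashed worker from holding it forever, and the ownership check on release stops one worker from deleting a lock that has since passed to another.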
  • 82. Time Series Database
  • 83.
  • 84. Timestamp | Value: 2014-06-10T12:00:00+0100 | 17; 2014-06-10T12:15:00+0100 | 17; 2014-06-10T12:30:00+0100 | 20; 2014-06-10T12:45:00+0100 | 22; 2014-06-10T13:00:00+0100 | 24; 2014-06-10T13:15:00+0100 | 28; 2014-06-10T13:30:00+0100 | 32
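
Time-series data like the table above is usually queried by time window rather than by row identity. As a rough illustration in plain Python (the function names are hypothetical, and a real time-series database would do this inside the engine), the readings can be parsed and aggregated as follows:

```python
from datetime import datetime, timedelta

# The readings from the slide, as (timestamp, value) pairs.
READINGS = [
    ("2014-06-10T12:00:00+0100", 17),
    ("2014-06-10T12:15:00+0100", 17),
    ("2014-06-10T12:30:00+0100", 20),
    ("2014-06-10T12:45:00+0100", 22),
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:30:00+0100", 32),
]

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
series = [(datetime.strptime(ts, TIMESTAMP_FORMAT), value) for ts, value in READINGS]


def window_average(points, start, length):
    """Mean of the values whose timestamp falls in [start, start + length)."""
    end = start + length
    values = [value for ts, value in points if start <= ts < end]
    return sum(values) / len(values) if values else None


def largest_rise(points):
    """Biggest increase between two consecutive readings."""
    return max(later[1] - earlier[1] for earlier, later in zip(points, points[1:]))


noon = series[0][0]
noon_hour_average = window_average(series, noon, timedelta(hours=1))
steepest_step = largest_rise(series)
```

The window is half-open, so the 13:00 reading belongs to the next hour and the noon-hour average covers only the first four readings.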
  • 85. But… There are problems this is too strict for
  • 86.
  • 87. Schemaless
  • 88.
  • 89. Every row is a “thing” (Document): Name = Puss, Coolness = 0! Name = Jess, Coolness = 0! Name = Dinah, Coolness = 0! Name = Einstein, Coolness = 10!
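
"Schemaless" here means each document carries its own fields, so two documents in the same collection need not share a structure. The sketch below uses plain Python dictionaries rather than any particular document database's API; the `tricks` field and the helper functions are hypothetical additions for illustration.

```python
# Documents from the slide; the extra "tricks" field is an invented example
# of one document having a field that the others lack.
documents = [
    {"name": "Puss", "coolness": 0},
    {"name": "Jess", "coolness": 0},
    {"name": "Dinah", "coolness": 0},
    {"name": "Einstein", "coolness": 10, "tricks": ["fetch"]},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def fields_in_use(collection):
    """With no schema, the set of fields is only discoverable from the data."""
    return set().union(*(doc.keys() for doc in collection))


uncool = [doc["name"] for doc in find(documents, coolness=0)]
with_tricks = [doc["name"] for doc in documents if "tricks" in doc]
all_fields = fields_in_use(documents)
```

Nothing stops a new document from adding or omitting a field, which is the flexibility on offer; the trade-off is that the application, not the database, has to cope with the missing fields.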
  • 90. Social Media Site
  • 91.
  • 92. Warning: you cannot denormalise data
  • 93. But… There are problems that SQL is actually good for…
  • 94. How big is your data? What shape is your data? Are you happy to pay? What uses your data?
  • 95. This may seem like a trivial point…
  • 96. Costs… • Oracle: $50,000+ • SQL Server: $10,000+
  • 97. (Basically) Free • MySQL • PostgreSQL • Riak • Voldemort • MariaDB • Cassandra • MongoDB
  • 98. How big is your data? What shape is your data? Are you happy to pay? What uses your data?
  • 99. Remember that… things have to use our data
  • 100. Application Architecture. Database, Code, Data
  • 101. API Architecture. Database, Code, Data, End User
  • 102. API Architecture. End User. What is the average age of …?
  • 103. API Architecture. End User. Er… I think it was something like “Campbell”?
  • 104. Remember that… our choice is informed by our plans for the application
  • 105.
  • 106. Geospatial Indexes
  • 107.
  • 109. Linked Media Framework
  • 110. Out of the box… Apache Marmotta
  • 111.
  • 112. Document Store
  • 113. Document Store: ElasticSearch
  • 114. Every row is a “thing” (Document): Name = Puss, Coolness = 0! Name = Jess, Coolness = 0! Name = Dinah, Coolness = 0! Name = Einstein, Coolness = 10!
  • 115. Apache Lucene
  • 116. “Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”
  • 117. Queries focused around text searching
  • 118. { "query": { "match": {"hobbies": "skateboard"} } }
  • 119. { "query": { "fuzzy": {"hobbies": "skateboarig"} } }
  • 120. { "query": { "match": {"hobbies": {"query": "writing reddit comments", "type": "phrase"}} } }
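
The fuzzy query above still finds “skateboard” despite the misspelling because fuzzy matching accepts terms within a small edit distance of the query. The sketch below shows that measure with a plain Levenshtein distance. It is not Elasticsearch's or Lucene's implementation; the function names, the `max_edits` default, and the sample terms are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # delete char_a
                               current[j - 1] + 1,   # insert char_b
                               substitution))
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Keep the indexed terms that are within max_edits of the query."""
    return [term for term in terms if edit_distance(query, term) <= max_edits]


indexed_terms = ["skateboard", "snowboard", "writing"]  # hypothetical index
typo_distance = edit_distance("skateboarig", "skateboard")
matches = fuzzy_match("skateboarig", indexed_terms)
```

The misspelt query is two edits away from “skateboard” (substitute one letter, drop another), so it matches, while “snowboard” is too far away to be returned.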
  • 121. In Summary…
  • 122. Databases we’ve talked about today… • SQL (Industry-standard RDBMS, coming in many flavours at different costs. Used by many) • Cassandra (Eventually consistent Dynamo implementation) • Riak (Eventually consistent Dynamo implementation) • Hadoop (Large ecosystem focused around scalability - focused on consistency and utilising many nodes. Used by Facebook for their messaging) • Neo4j (Graph database, with poor scalability but a high-fidelity data model. Used by many companies for highly relational data) • Redis (In-memory key-value store, used by Twitter for their caching as a lightweight solution) • DynamoDB (AWS-managed DBaaS, used by the BBC among others for lightweight key-value store needs such as locking) • MongoDB (Document store database, schemaless with some interesting indexes. Used well by the New York Times and Foursquare. Used poorly by Diaspora v1) • Apache Marmotta (Bleeding-edge DB used by Red Bull Media House to comply with the Linked Data framework, allowing easy integration) • ElasticSearch (Document store database, providing easy searching out of the box. Used by GitHub and Stack Overflow among others)
  • 123. How big is your data? What shape is your data? Are you happy to pay? What uses your data? If it gets big enough, you have to distribute: AP (available) vs. CP (consistent)
  • 124. How big is your data? What shape is your data? Are you happy to pay? What uses your data? If it’s not big, you can use high-fidelity data models: Time Series, Graph DBs, Denormalised Rows… and more!
  • 125. How big is your data? What shape is your data? Are you happy to pay? What uses your data? You may have to make sacrifices…
  • 126. How big is your data? What shape is your data? Are you happy to pay? What uses your data? Think about the queries you’ll be running up-front… this can prevent costly rearchitecting down the line
  • 127. How big is your data? What shape is your data? Are you happy to pay? What uses your data? Your system is unique. NoSQL is not one thing: there’s a range of solutions. Consider them on their own merits!
  • 128. Any questions? David Simons, @SwamWithTurtles