Which Freaking Database Should I Use?
Andrew C. Oliver
@acoliver
{Great Wide Open | Atlanta}
{Open Software Integrators} { www.osintegrators.com} {@osintegrators}
Andrew C. Oliver
• Programming since I was about 8
• Java since ~1997
• Founded POI project (currently hosted at Apache) with Marc Johnson ~2000
o Former member Jakarta PMC
o Emeritus member of Apache Software Foundation
• Joined JBoss ~2002
• Former Board Member/current helper/lifetime member: Open Source Initiative (http://opensource.org)
• Column in InfoWorld: http://www.infoworld.com/author-bios/andrew-oliver
o I make fanboys cry.
Open Software Integrators
• Founded Nov 2007 by Andrew C. Oliver (me)
o in Durham, NC
• Revenue and staff have at least doubled every year since 2009.
• New office (2012) in Chicago, IL
o we're hiring mid- to senior-level developers as well as UI developers (jQuery, JavaScript, HTML, CSS)
o up to 25% travel, salary + bonus, 401k, health, etc.
o preferred: Java, Tomcat, JBoss, Hibernate, Spring, RDBMS, jQuery
o nice to have: Hadoop, Neo4j, Couchbase, Ruby, at least one cloud platform
Overview
• Why not just use the RDBMS for everything?
• Operational vs Analytical
• Key Value
• Column Family
• Document
• Graph
• Hadoop?
• Convergence of "clustered filesystems" and "databases"
• Conclusions
{2014 Great Wide Open | Atlanta}
Why Not "Just Use" RDBMS for Everything?
Before we begin...
• Let's address the elephant (or rather, the teddy bears) in the room:
http://highscalability.com/blog/2010/9/5/hilarious-video-relational-database-vs-nosql-fanbois.html/
The CAP theorem
RDBMS CAP characteristics
• Great at consistency
• Okay at availability
• Not so great at partition tolerance...
Single process model
• Lots of (application) servers with many connections to a few (database) servers.
Multiprocess Model
[Diagram: four Data Manager / Cluster Manager pairs]
Historical Scalability
• 10 MB disks were "big"
• Scalability meant more disks, controllers and possibly CPUs
• CPUs went from 4.77 MHz to 3.4 GHz
• Disks went from 64 KB/s at 70 ms to 6 Gb/s
• Network speeds went from under 4 Mb/s to gigabit to bonded gigabit and beyond.
• For a long time, disk speeds didn't keep up with CPU speeds...
The Mathematical model
• RDBMS is based on "relational algebra", which is just an extension of basic "set theory"
• Not every problem is a set problem: "direct path" or "which thing contains this other thing which has this other thing" (FOAF, friend of a friend)
• Sometimes relationships are as important as the data
• Sometimes data is even simpler than the relational model but needs higher levels of availability, etc.
• One size never really did fit all
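A minimal sketch of the "friend of a friend" point, using Python's built-in sqlite3 module. The `knows` table, its columns and the names in it are hypothetical. Two hops already take a self-join plus a subquery to exclude direct friends; every further hop adds another join, which is what makes multi-hop questions awkward in a purely set-based model.

```python
import sqlite3

def build_knows_table():
    """In-memory SQLite database with a hypothetical two-column 'knows' relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knows (person TEXT, friend TEXT)")
    # Hypothetical data: each row is one directed "person knows friend" fact.
    conn.executemany(
        "INSERT INTO knows (person, friend) VALUES (?, ?)",
        [("andy", "marc"), ("andy", "dave"), ("marc", "bob"),
         ("dave", "bob"), ("bob", "carol"), ("marc", "andy")],
    )
    return conn

def friends_of_friends(conn, person):
    """Two hops = the table joined to itself once; a third hop needs another join."""
    rows = conn.execute(
        """
        SELECT DISTINCT k2.friend
        FROM knows AS k1
        JOIN knows AS k2 ON k2.person = k1.friend   -- hop 2: friends of each friend
        WHERE k1.person = ?                          -- hop 1: the person's friends
          AND k2.friend <> k1.person                 -- skip the person themselves
          AND k2.friend NOT IN (SELECT friend FROM knows WHERE person = ?)  -- skip direct friends
        ORDER BY k2.friend
        """,
        (person, person),
    ).fetchall()
    return [row[0] for row in rows]

conn = build_knows_table()
print(friends_of_friends(conn, "andy"))  # ['bob']
```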
Data Complexity
Datarrhea
• Yes, I've already registered that ;-)
• Cheaper storage has created more demand for storing data
o economics predicted this
• Moore's law ended while you slept
o Intel says next year (but when did CPU speeds last double?)
• Massive parallelization is the most feasible way to get at it (the countertrend being an explosion in disk speeds)
...but
• If
o your data is tabular;
o it fits cleanly in a relational model;
o you aren't having scalability issues; and
o you don't have a large dataset or a dataset/problem that lends itself to massive parallelization...
• you can probably stick with your RDBMS for now
o ...and probably aren't at this conference anyhow.
JPA/RDBMS Tables Example

PersonID | Firstname | Lastname | CompanyID
2 | Andy | Oliver | 3

CompanyID | Name | City | State
3 | Open Software Integrators | Durham | NC

PhoneNumber | Type | PersonID
919.627.1236 | google | 2
919.321.0119 | work | 2
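The same three tables can be sketched in SQL. This uses Python's sqlite3 rather than JPA, and the table names (`person`, `company`, `phone`) and snake_case column names are assumptions; the rows are the ones on the slide. Reassembling a single person already takes two joins.

```python
import sqlite3

# Assumed table/column names; the data rows come from the slide.
SCHEMA = """
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    name       TEXT,
    city       TEXT,
    state      TEXT
);
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    firstname  TEXT,
    lastname   TEXT,
    company_id INTEGER REFERENCES company (company_id)
);
CREATE TABLE phone (
    phone_number TEXT,
    type         TEXT,
    person_id    INTEGER REFERENCES person (person_id)
);
INSERT INTO company VALUES (3, 'Open Software Integrators', 'Durham', 'NC');
INSERT INTO person  VALUES (2, 'Andy', 'Oliver', 3);
INSERT INTO phone   VALUES ('919.627.1236', 'google', 2), ('919.321.0119', 'work', 2);
"""

def load_example():
    """Build the normalized example in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def person_with_company_and_phones(conn, person_id):
    """Reassemble one person from three normalized tables with two joins."""
    return conn.execute(
        """
        SELECT p.firstname, p.lastname, c.name, c.city, c.state, ph.phone_number, ph.type
        FROM person AS p
        JOIN company AS c ON c.company_id = p.company_id
        JOIN phone AS ph ON ph.person_id = p.person_id
        WHERE p.person_id = ?
        ORDER BY ph.phone_number
        """,
        (person_id,),
    ).fetchall()

for row in person_with_company_and_phones(load_example(), 2):
    print(row)
```

The joined result repeats the company details once per phone number, which is the same denormalized shape the HBase example stores directly.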
Operational vs Analytical
• One DB type is unlikely to be well suited for all of your
problems.
• The system doing "short and sweet", lightweight transactions is your operational system.
• The system running long reports and generating charts, graphs and statistics is your analytical system.
• There are also search engines, recommendation engines, etc.
Other Types of Databases
Key-Value Stores
• Examples: Couchbase 1.8, Cassandra
o also: GemFire, Infinispan (distributed caches)
• Constant-time (O(1)) lookup by key
• Good enough for "right now" stock quotes
• Usually combined with an index for search, but the structure isn't inherently indexed.
• Generally works well with MapReduce.
• Extremely scalable, easy to partition
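A toy sketch of why key-value stores are easy to partition and fast to read: the key alone decides which partition holds the value. The class, the `quote:` key prefix and the tickers are hypothetical, and plain modulo hashing with `zlib.crc32` is a simplification of what real products do.

```python
import zlib

class PartitionedKVStore:
    """Toy key-value store: the key's hash picks a partition (node); each partition is a dict."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # crc32 is stable across runs, unlike Python's per-process randomized str hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        # Average O(1): one hash to find the partition, then one dict lookup on it.
        return self.partitions[self._partition_for(key)].get(key, default)

    def keys_where(self, predicate):
        # No index on values: a search by value has to scan every partition.
        return sorted(k for part in self.partitions for k, v in part.items() if predicate(v))

# Hypothetical "right now" stock quotes keyed by ticker.
store = PartitionedKVStore(num_partitions=4)
store.put("quote:ACME", 10.25)
store.put("quote:INIT", 7.5)
print(store.get("quote:ACME"))                    # 10.25
print(store.keys_where(lambda price: price > 8))  # ['quote:ACME']
```

Modulo placement remaps most keys whenever the partition count changes; Cassandra, for example, uses consistent hashing to avoid that. `keys_where` shows the other side of the trade-off: without a separate index, searching by value means scanning every partition.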
Column Family / Big Table
• Many key-value stores support "column families"
o Cassandra
• Some were designed this way
o HBase
• Keys and values become composite
• Essentially a hashmap with a multi-dimensional array
o each column is a row of data
• MapReduce friendly
• Good for stock quotes with time ranges
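A rough sketch of the column-family idea applied to stock quotes over a time range: a map from row key to column family to columns kept in sorted order, so a range of columns can be read with two binary searches. This is plain Python, not the HBase or Cassandra API; the ticker, the family name and the timestamp-as-column-name layout are assumptions for illustration.

```python
from bisect import bisect_left, insort

class ColumnFamilyTable:
    """Toy Bigtable-style table: row key -> column family -> columns kept in sorted order."""

    def __init__(self):
        # row key -> family -> (sorted list of column qualifiers, {qualifier: value})
        self._rows = {}

    def _family(self, row_key, family):
        return self._rows.get(row_key, {}).get(family, ([], {}))

    def put(self, row_key, family, qualifier, value):
        qualifiers, cells = self._rows.setdefault(row_key, {}).setdefault(family, ([], {}))
        if qualifier not in cells:
            insort(qualifiers, qualifier)  # keep columns sorted so range reads are cheap
        cells[qualifier] = value

    def get(self, row_key, family, qualifier):
        return self._family(row_key, family)[1].get(qualifier)

    def column_range(self, row_key, family, start, stop):
        """(qualifier, value) pairs with start <= qualifier < stop, in column order."""
        qualifiers, cells = self._family(row_key, family)
        lo, hi = bisect_left(qualifiers, start), bisect_left(qualifiers, stop)
        return [(q, cells[q]) for q in qualifiers[lo:hi]]

# Hypothetical ticker; ISO-8601 timestamps sort correctly as strings.
quotes = ColumnFamilyTable()
# Inserted out of order on purpose; the columns are still kept sorted.
for ts, price in [("2014-05-12T09:30", 10.0), ("2014-05-12T10:00", 10.25), ("2014-05-12T09:45", 10.5)]:
    quotes.put("ACME", "price", ts, price)
print(quotes.column_range("ACME", "price", "2014-05-12T09:30", "2014-05-12T10:00"))
# [('2014-05-12T09:30', 10.0), ('2014-05-12T09:45', 10.5)]
```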
HBase Example
Row key | First name | Last name | Company | City | State | Phone number | Phone type
5bfbd4a0-d02a-11e1-9b23-0800200c9a66 | Andy | Oliver | Open Software Integrators | Durham | NC | 919-627-1236 | google
7b2435c0-d02a-11e1-9b23-0800200c9a66 | Andy | Oliver | Open Software Integrators | Durham | NC | 919-321-0119 | work
Document databases
• Many developers think these are the "holy grail" since they fit nicely with object-oriented programming.
• Examples: Couchbase 2.0, CouchDB, MongoDB
• JSON documents
• One way to think of this is a key-value store that understands the values.
• Not as MapReduce friendly; larger datasets require indexes.
• Clear fits: REST services, operational stores
Document databases
• JSON document:
{
  "firstname" : "Andy",
  "lastname" : "Oliver",
  "company" : "Open Software Integrators",
  "location" : { "city" : "Durham", "state" : "NC" },
  "phone" : [
    { "number" : "123 456 7890", "type" : "mobile" },
    { "number" : "123 654 1234", "type" : "work" }
  ]
}
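To illustrate "a key-value store that understands the values", here is a toy document store that parses each JSON value so it can answer queries on nested fields, using the document above. The class, the key "andy" and the dotted-path query syntax are hypothetical, not any particular product's query API. `find` is a full scan, which is exactly why larger datasets need indexes.

```python
import json

class DocumentStore:
    """Toy document store: a key-value map whose values are parsed JSON documents."""

    def __init__(self):
        self._docs = {}

    def put(self, key, json_text):
        # Unlike a plain key-value store, parse the value so its fields can be queried.
        self._docs[key] = json.loads(json_text)

    def get(self, key):
        return self._docs.get(key)

    def find(self, path, value):
        """Keys whose document holds `value` at dotted `path` (arrays searched element-wise).
        Full scan over every document; a real store would use an index on `path`."""
        parts = path.split(".")
        return sorted(key for key, doc in self._docs.items() if value in _values_at(doc, parts))

def _values_at(node, parts):
    """Collect every value reachable from `node` by following the field names in `parts`."""
    if not parts:
        return [node]
    if isinstance(node, list):
        return [v for item in node for v in _values_at(item, parts)]
    if isinstance(node, dict) and parts[0] in node:
        return _values_at(node[parts[0]], parts[1:])
    return []

SAMPLE = """
{
  "firstname" : "Andy",
  "lastname" : "Oliver",
  "company" : "Open Software Integrators",
  "location" : { "city" : "Durham", "state" : "NC" },
  "phone" : [
    { "number" : "123 456 7890", "type" : "mobile" },
    { "number" : "123 654 1234", "type" : "work" }
  ]
}
"""

store = DocumentStore()
store.put("andy", SAMPLE)  # hypothetical key
print(store.find("phone.type", "work"))    # ['andy']
print(store.find("location.state", "IL"))  # []
```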
Graph Databases
• Based on graph theory
• Less about the volume of the data and more about its complexity
• Many are transactional
o often the transactions are "more correct" than those offered by a relational database.
• FOAF (friend of a friend) and direct-path operations are easy
o very complicated/inefficient in an RDBMS
• Usually paired with an index for search
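By contrast, a graph store keeps each node's relationships next to it, so a "direct path" question becomes a walk rather than a stack of joins. The sketch below runs a breadth-first search over adjacency lists in plain Python; it is not Neo4j's API or Cypher. The nodes come from the Neo4j Graph Example slide, but the edge list and relationship directions are an assumed reading of that diagram.

```python
from collections import deque

# Assumed edge list, read off the Neo4j example diagram; the directions are a guess.
EDGES = [
    ("Andrew Oliver", "WORKS_FOR", "Open Software Integrators"),
    ("Andrew Oliver", "FOUNDED", "Open Software Integrators"),
    ("Andrew Oliver", "RESIDES", "Durham, NC"),
    ("Andrew Oliver", "HAS", "919.627.1236"),
    ("Andrew Oliver", "HAS", "919.321.0119"),
    ("Open Software Integrators", "LOCATED", "Durham, NC"),
    ("Open Software Integrators", "LOCATED", "Chicago, IL"),
]

def build_graph(edges):
    """Adjacency lists: each node holds its outgoing (relationship, neighbour) pairs."""
    graph = {}
    for source, rel, target in edges:
        graph.setdefault(source, []).append((rel, target))
        graph.setdefault(target, [])
    return graph

def direct_path(graph, start, goal):
    """Breadth-first search over outgoing edges.

    Returns the shortest path as [node, rel, node, ...], or None if goal is unreachable.
    """
    previous = {start: None}  # node -> (predecessor, relationship used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = [node]
            while previous[node] is not None:
                node, rel = previous[node]
                path[:0] = [node, rel]  # prepend predecessor and the edge label
            return path
        for rel, neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, rel)
                queue.append(neighbour)
    return None

graph = build_graph(EDGES)
print(direct_path(graph, "Andrew Oliver", "Chicago, IL"))
# ['Andrew Oliver', 'WORKS_FOR', 'Open Software Integrators', 'LOCATED', 'Chicago, IL']
```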
Design: RDBMS vs Graph
Neo4j Graph Example
[Diagram: nodes for Andrew Oliver (Firstname: Andrew, Lastname: Oliver), Open Software Integrators, two phone numbers (919.627.1236, type googlevoice; 919.321.0119, type work), Durham, NC and Chicago, IL, connected by HAS, FOUNDED, WORKS FOR, LOCATED and RESIDES relationships]
Note the extra relationships and details here: graph databases are just fun and easy to understand.
Where does Hadoop fit?
• NoSQL
• Software framework (lots of pieces/lots of choices):
o Pig - scripting language used to quickly write MapReduce code to handle unstructured sources
o Hive - imposes structure on the data so it can be queried with SQL-like HiveQL
o HCatalog - provides interoperability between these internal systems
o HBase - Bigtable-type database
o HDFS - Hadoop Distributed File System
• Excellent choice for data processing and data analysis
• MapReduce
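MapReduce itself is easy to sketch in a single process: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group. Here it computes the highest price per ticker from hypothetical quote lines. Hadoop runs the same phases in parallel across the cluster, close to the HDFS blocks; none of the Hadoop API is used here.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Single-process imitation of MapReduce: map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: collect all values for each key
    # reduce phase, one call per key (sorted for deterministic output)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def quote_mapper(line):
    # Hypothetical CSV layout: ticker,timestamp,price
    ticker, _timestamp, price = line.split(",")
    yield ticker, float(price)

def max_price_reducer(_ticker, prices):
    return max(prices)

QUOTES = [
    "ACME,2014-05-12T09:30,10.0",
    "INIT,2014-05-12T09:30,7.0",
    "ACME,2014-05-12T09:45,10.5",
    "INIT,2014-05-12T09:45,7.25",
]

print(run_job(QUOTES, quote_mapper, max_price_reducer))  # {'ACME': 10.5, 'INIT': 7.25}
```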
Convergence of Filesystems and Databases
• Hadoop HDFS is... a distributed filesystem
• So are Gluster, Ceph, GFS, etc.
• Hadoop can use Ceph or Gluster in place of HDFS
Other Derivatives
• Triplestores
o Apache Jena
• OODBMS / ORDBMS
o Caché (InterSystems)
Things you may consider
• Persistence
o Async / Sync
• Replication
• Availability
• Transactions / Consistency
• "Locality"
• Language
• Resources
o http://en.wikipedia.org/wiki/Comparison_of_structured_storage_software
o http://sevenweeks.org/
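The "Persistence: Async / Sync" trade-off can be sketched with an append-only log: a synchronous write is forced to disk with `os.fsync` before it returns, while an asynchronous one only reaches a buffer and is flushed later. The class and file name are hypothetical; real databases layer write-ahead logs, group commit and replication on top of this basic choice.

```python
import os
import tempfile

class AppendLog:
    """Hypothetical append-only log illustrating synchronous vs. asynchronous persistence."""

    def __init__(self, path, synchronous=True):
        self._file = open(path, "ab")
        self.synchronous = synchronous

    def append(self, record: bytes):
        self._file.write(record + b"\n")
        if self.synchronous:
            # Sync: push through Python's buffer and the OS cache to disk before returning.
            self._file.flush()
            os.fsync(self._file.fileno())
        # Async: return right away; the record sits in a buffer until a later flush,
        # so a crash can lose recent writes, but each append is much cheaper.

    def flush(self):
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self):
        self.flush()
        self._file.close()

# Usage with a hypothetical log file in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "quotes.log")
sync_log = AppendLog(log_path, synchronous=True)
sync_log.append(b"ACME,10.25")
with open(log_path, "rb") as f:
    durable_contents = f.read()  # already on disk because the write was synchronous
sync_log.close()

async_log = AppendLog(log_path, synchronous=False)
async_log.append(b"INIT,7.5")  # may still be buffered at this point
async_log.close()               # close() flushes and fsyncs
```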
Conclusions
• RDBMS may not scale to your needs
• Your data may not map efficiently to tables
• Key Value Store - data by key, fast, scalable, can't handle complex data
• Column Family/Big Table - fast, scalable, denormalized, MapReduce friendly, good for time series, not efficient for complex data
• Document - a good operational store but not your analytical one, moderately scalable, matches OO
• Graph - great for complex data, transactional, less scalable
• Filesystems and "databases" are converging
Thank you for attending!

Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Which Freaking Database Should I Use?

  • 1. Which Freaking Database Should I Use? Andrew C. Oliver @acoliver {Great Wide Open | Atlanta} {Open Software Integrators} { www.osintegrators.com} {@osintegrators}
  • 2. Andrew C. Oliver
    • Programming since I was about 8
    • Java since ~1997
    • Founded the POI project (currently hosted at Apache) with Marc Johnson ~2000
      o Former member of the Jakarta PMC
      o Emeritus member of the Apache Software Foundation
    • Joined JBoss ~2002
    • Former board member/current helper/lifetime member: Open Source Initiative (http://opensource.org)
    • Column in InfoWorld: http://www.infoworld.com/author-bios/andrew-oliver
      o I make fanboys cry.
  • 3. Open Software Integrators
    • Founded Nov 2007 by Andrew C. Oliver (me)
      o in Durham, NC
    • Revenue and staff have at least doubled every year since 2009.
    • New office (2012) in Chicago, IL
      o we're hiring mid- to senior-level developers as well as UI developers (jQuery, JavaScript, HTML, CSS)
      o up to 25% travel, salary + bonus, 401k, health, etc.
      o preferred: Java, Tomcat, JBoss, Hibernate, Spring, RDBMS, jQuery
      o nice to have: Hadoop, Neo4j, Couchbase, Ruby, at least one cloud platform
  • 4. Overview
    • Why not just use the RDBMS for everything?
    • Operational vs. analytical
    • Key-value
    • Column family
    • Document
    • Graph
    • Hadoop?
    • Convergence of "clustered filesystems" and "databases"
    • Conclusions
  • 5. Why Not "Just Use" RDBMS for Everything? {2014 Great Wide Open | Atlanta}
  • 6. Before we begin...
    • Let's handle the elephant, or rather the teddy bears, in the room: http://highscalability.com/blog/2010/9/5/hilarious-video-relational-database-vs-nosql-fanbois.html/
  • 7. The CAP theorem
    • Consistency, Availability, Partition tolerance: when the network partitions, a distributed system has to give up either consistency or availability.
  • 8. RDBMS CAP characteristics
    • Great at consistency
    • Okay at availability
    • Not so great at partition tolerance...
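To make the consistency/availability trade-off on slides 7 and 8 concrete, here is a toy, in-memory model of one value replicated on three nodes. In "CP" mode a node cut off from the majority refuses to answer rather than risk returning stale data; in "AP" mode every node keeps answering and replicas are allowed to diverge. The class name, the majority rule and the use of None/False for "refused" are assumptions of this sketch; it is not how any particular database implements replication.

```python
class Replicas:
    """Toy model of one value replicated on several nodes (hypothetical, in-memory)."""

    def __init__(self, nodes=3, mode="CP"):
        self.nodes = nodes
        self.mode = mode                      # "CP" or "AP"
        self.values = [None] * nodes          # each node's copy of the value
        self.groups = [set(range(nodes))]     # sets of nodes that can reach each other

    def partition(self, isolated):
        """Split the cluster: the `isolated` nodes can no longer reach the rest."""
        isolated = set(isolated)
        self.groups = [set(range(self.nodes)) - isolated, isolated]

    def _group_of(self, node):
        return next(g for g in self.groups if node in g)

    def _has_majority(self, group):
        return len(group) > self.nodes // 2

    def write(self, node, value):
        """Write via `node`; returns False if CP mode refuses the write."""
        group = self._group_of(node)
        if self.mode == "CP" and not self._has_majority(group):
            return False                      # give up availability
        for n in group:                       # replicate to every reachable node
            self.values[n] = value
        return True

    def read(self, node):
        """Read via `node`; returns None if CP mode refuses to answer."""
        if self.mode == "CP" and not self._has_majority(self._group_of(node)):
            return None                       # unavailable rather than stale
        return self.values[node]


cp = Replicas(mode="CP")
cp.write(0, "v1")
cp.partition({2})
cp.write(0, "v2")
print(cp.read(1), cp.read(2))   # v2 None

ap = Replicas(mode="AP")
ap.write(0, "v1")
ap.partition({2})
ap.write(0, "v2")
ap.write(2, "v3")
print(ap.read(1), ap.read(2))   # v2 v3
```

The CP branch is the "great at consistency" behavior from slide 8: the isolated node simply stops serving. The AP branch stays available but leaves two different values that something would later have to reconcile.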
  • 9. Single process model
    • Lots of servers with many connections to a few servers.
  • 10. Multiprocess Model
    [Diagram: four nodes, each running a Data Manager and a Cluster Manager]
  • 11. Historical Scalability
    • 10 MB disks were "big"
    • Scalability meant more disks, controllers and possibly CPUs
    • CPUs went from 4.77 MHz to 3.4 GHz
    • Disks went from 64 kps @ 70 ms to 6 Gb/s
    • Network speeds went from under 4 Mb/s to gigabit to bonded gigabit and beyond.
    • Disk speeds for a long time didn't keep up with CPU...
  • 12. The Mathematical Model
    • RDBMS is based on "relational algebra", which is just an extension of basic "set theory"
    • Not every problem is a set problem: "direct path" or "which thing contains this other thing which has this other thing" (FOAF, friend of a friend)
    • Sometimes relationships are as important as the data
    • Sometimes data is even simpler than the relational model but needs higher levels of availability, etc.
    • One size never really did fit all
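The "relational algebra is set theory" point can be made concrete by modeling a relation as a set of tuples and writing selection, projection and natural join as set operations. This is a minimal sketch using the sample rows from slide 16; the helper names are hypothetical, and real SQL engines use multisets (duplicates are kept unless you ask for DISTINCT) plus far smarter join algorithms.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple      # column names, in order
    rows: frozenset   # a *set* of value tuples: unordered, no duplicates


def select(rel, predicate):
    """Selection (sigma): keep the rows for which predicate(row_dict) is true."""
    return Relation(rel.attrs, frozenset(
        row for row in rel.rows if predicate(dict(zip(rel.attrs, row)))))


def project(rel, attrs):
    """Projection (pi): keep only the named columns; duplicate rows collapse."""
    idx = [rel.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(row[i] for i in idx) for row in rel.rows))


def natural_join(left, right):
    """Natural join: combine rows that agree on every column name they share."""
    shared = [a for a in left.attrs if a in right.attrs]
    extra = [a for a in right.attrs if a not in shared]
    rows = set()
    for lrow in left.rows:
        lvals = dict(zip(left.attrs, lrow))
        for rrow in right.rows:
            rvals = dict(zip(right.attrs, rrow))
            if all(lvals[a] == rvals[a] for a in shared):
                rows.add(lrow + tuple(rvals[a] for a in extra))
    return Relation(left.attrs + tuple(extra), frozenset(rows))


person = Relation(("PersonID", "Firstname", "Lastname", "CompanyID"),
                  frozenset({(2, "Andy", "Oliver", 3)}))
phone = Relation(("PhoneNumber", "Type", "PersonID"),
                 frozenset({("919.627.1236", "google", 2), ("919.321.0119", "work", 2)}))

joined = natural_join(person, phone)
print(sorted(project(joined, ["Firstname", "Type"]).rows))  # [('Andy', 'google'), ('Andy', 'work')]
print(project(joined, ["Firstname"]).rows)                   # frozenset({('Andy',)})
```

A FOAF-style question in this model needs one more join per hop, which is the kind of query the slide says is not a natural fit for sets.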
  • 13. Data Complexity
  • 14. Datarrhea
    • Yes, I've already registered that ;-)
    • The cheapness of storing data has yielded more demand
      o economics predicted this
    • Moore's law ended while you slept
      o Intel says next year (but when did CPU speeds last double?)
    • Massive parallelization is the most feasible way to get at it (counter-trended by an explosion in disk speeds)
  • 15. ...but
    • If
      o your data is tabular;
      o it fits cleanly in a relational model;
      o you aren't having scalability issues;
      o you don't have a large dataset or a dataset/problem that lends itself to massive parallelization...
    • you can probably stick with your RDBMS for now
      o ...and probably aren't at this conference anyhow.
  • 16. JPA/RDBMS Tables Example

    PersonID  Firstname  Lastname  CompanyID
    2         Andy       Oliver    3

    CompanyID  Name                       City    State
    3          Open Software Integrators  Durham  NC

    PhoneNumber   Type    PersonID
    919.627.1236  google  2
    919.321.0119  work    2
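The three tables on slide 16 can be reproduced with Python's built-in sqlite3 module standing in for the RDBMS (the JPA mapping layer itself is not shown). The table and column names below are assumptions adapted from the slide's headers. Reassembling one person's record takes a join across all three tables:

```python
import sqlite3


def build_db():
    """Create the three normalized tables from slide 16 in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (
            company_id INTEGER PRIMARY KEY,
            name TEXT, city TEXT, state TEXT);
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            firstname TEXT, lastname TEXT,
            company_id INTEGER REFERENCES company(company_id));
        CREATE TABLE phone (
            phone_number TEXT PRIMARY KEY,
            phone_type TEXT,
            person_id INTEGER REFERENCES person(person_id));
        INSERT INTO company VALUES (3, 'Open Software Integrators', 'Durham', 'NC');
        INSERT INTO person VALUES (2, 'Andy', 'Oliver', 3);
        INSERT INTO phone VALUES ('919.627.1236', 'google', 2);
        INSERT INTO phone VALUES ('919.321.0119', 'work', 2);
    """)
    return conn


def person_with_phones(conn, person_id):
    """Reassemble one person's view by joining person, company and phone."""
    return conn.execute("""
        SELECT p.firstname, p.lastname, c.name, c.city, c.state,
               ph.phone_number, ph.phone_type
        FROM person p
        JOIN company c ON c.company_id = p.company_id
        JOIN phone ph ON ph.person_id = p.person_id
        WHERE p.person_id = ?
        ORDER BY ph.phone_number
    """, (person_id,)).fetchall()


for row in person_with_phones(build_db(), 2):
    print(row)
```

The join repeats the person and company columns once per phone number; storing those repeated, pre-joined rows directly is what the denormalized HBase example on slide 21 does.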
  • 17. Operational vs Analytical
    • One DB type is unlikely to be well suited for all of your problems.
    • The system doing "short and sweet", "lightweight" transactions is your operational system.
    • The system doing long-running reports and generating charts, graphs and statistics is your analytical system.
    • There is also search, plus recommendation engines, etc.
  • 18. Other Types of Databases
  • 19. Key-Value Stores
    • Examples: Couchbase 1.8, Cassandra
      o also: GemFire, Infinispan (distributed caches)
    • Constant-time, O(1), lookup by key
    • Good enough for "right now" stock quotes
    • Usually combined with an index for search, but the structure isn't inherently indexed.
    • Generally works well with MapReduce.
    • Extremely scalable, easy to partition
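The two properties slide 19 leans on, O(1) lookup by key and easy partitioning, show up in a toy hash-partitioned store: the key alone decides which node holds the value, and each node is just a hash map. This is a hypothetical sketch; products such as Cassandra and Couchbase add replication and more elaborate placement schemes (for example consistent hashing, so that adding a node does not remap most keys).

```python
import hashlib


class PartitionedKV:
    """Toy key-value store: each key lives on exactly one node, chosen by hashing."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, key):
        # A stable hash; Python's built-in hash() of a str changes between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key, default=None):
        # One hash to find the node plus one dict lookup: O(1) on average.
        return self.nodes[self.node_for(key)].get(key, default)

    def find_by_value(self, predicate):
        # No index on values: "which keys have value X?" means scanning every node.
        return sorted(k for node in self.nodes for k, v in node.items() if predicate(v))


store = PartitionedKV(node_count=4)
store.put("quote:ACME", 101.25)   # hypothetical ticker and price
store.put("quote:INITECH", 42.0)  # hypothetical ticker and price
print(store.get("quote:ACME"))                        # 101.25
print(store.find_by_value(lambda price: price > 50))  # ['quote:ACME']
```

`find_by_value` is why the slide says key-value stores are usually paired with a separate index for search. Note also that the simple modulo placement used here would move most keys if `node_count` changed.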
  • 20. Column Family / Big Table
    • Many key-value stores support "column families"
      o Cassandra
    • Some were designed this way
      o HBase
    • Keys and values become composite
    • Essentially a hashmap with a multi-dimensional array
      o each column is a row of data
    • MapReduce friendly
    • Stock quotes with time ranges
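A column-family row can be pictured as nested maps (row key, then column family, then column qualifier, then timestamp), with the "stock quotes with time ranges" case answered by scanning one column's versions. This follows the Bigtable data model (a map indexed by row, column and timestamp) in spirit only; the class and method names are hypothetical, not the HBase or Cassandra API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy Bigtable-style table: (row, family, qualifier, timestamp) -> value."""

    def __init__(self):
        # row -> family -> qualifier -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, timestamp, value):
        self.rows[row][family][qualifier][timestamp] = value

    def _versions(self, row, family, qualifier):
        # .get() never triggers defaultdict's factory, so reads don't create empty rows.
        return self.rows.get(row, {}).get(family, {}).get(qualifier, {})

    def get_latest(self, row, family, qualifier):
        versions = self._versions(row, family, qualifier)
        return versions[max(versions)] if versions else None

    def scan_time_range(self, row, family, qualifier, start, end):
        """(timestamp, value) pairs with start <= timestamp < end, oldest first."""
        versions = self._versions(row, family, qualifier)
        return sorted((ts, v) for ts, v in versions.items() if start <= ts < end)


quotes = ColumnFamilyTable()
for ts, price in [(100, 10.0), (200, 10.5), (300, 11.0)]:   # hypothetical ticks
    quotes.put("ACME", "quote", "price", ts, price)
print(quotes.scan_time_range("ACME", "quote", "price", 100, 300))  # [(100, 10.0), (200, 10.5)]
print(quotes.get_latest("ACME", "quote", "price"))                 # 11.0
```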
  • 21. HBase Example

    Row key                               First name  Last name  Company                    City    State  Phone number  Phone type
    5bfbd4a0-d02a-11e1-9b23-0800200c9a66  Andy        Oliver     Open Software Integrators  Durham  NC     919-627-1236  google
    7b2435c0-d02a-11e1-9b23-0800200c9a66  Andy        Oliver     Open Software Integrators  Durham  NC     919-321-0119  work
  • 22. Document Databases
    • Many developers think these are the "holy grail" since they fit nicely with object-oriented programming.
    • Couchbase 2.0, CouchDB, MongoDB
    • JSON documents
    • One way to think of this is a key-value store that understands the values.
    • Not as MapReduce friendly; larger datasets require indexes.
    • Clearly suited to REST services and operational stores
  • 23. Document Databases
    • JSON document:
      {
        "firstname" : "Andy",
        "lastname" : "Oliver",
        "company" : "Open Software Integrators",
        "location" : {
          "city" : "Durham",
          "state" : "NC"
        },
        "phone" : [
          { "number" : "123 456 7890", "type" : "mobile" },
          { "number" : "123 654 1234", "type" : "work" }
        ]
      }
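"A key-value store that understands the values" can be sketched as a store that keeps parsed JSON documents by key and can query on a nested field such as `location.state`, either by scanning every document or through an optional secondary index (slide 22 notes that larger datasets require indexes). The class, the dotted-path syntax and the method names are hypothetical; this is not the MongoDB, CouchDB or Couchbase API.

```python
import json


class DocumentStore:
    """Toy document store: JSON values stored by key, queryable by (nested) field."""

    def __init__(self):
        self.docs = {}      # key -> parsed JSON document
        self.indexes = {}   # field path -> {field value -> set of keys}

    @staticmethod
    def _lookup(doc, path):
        """Follow a dotted path like 'location.state'; None if any step is missing."""
        for part in path.split("."):
            if not isinstance(doc, dict) or part not in doc:
                return None
            doc = doc[part]
        return doc

    def create_index(self, path):
        # Index only scalar fields: lists and objects are not hashable dict keys.
        index = {}
        for key, doc in self.docs.items():
            index.setdefault(self._lookup(doc, path), set()).add(key)
        self.indexes[path] = index

    def delete(self, key):
        old = self.docs.pop(key, None)
        if old is not None:
            for path, index in self.indexes.items():
                index.get(self._lookup(old, path), set()).discard(key)

    def put(self, key, json_text):
        doc = json.loads(json_text)   # parse first so bad JSON leaves the store untouched
        self.delete(key)              # drop stale index entries for the old version
        self.docs[key] = doc
        for path, index in self.indexes.items():
            index.setdefault(self._lookup(doc, path), set()).add(key)

    def find(self, path, value):
        """Keys whose field at `path` equals `value`, sorted."""
        if path in self.indexes:      # index lookup
            return sorted(self.indexes[path].get(value, set()))
        return sorted(k for k, d in self.docs.items() if self._lookup(d, path) == value)


andy = """{"firstname": "Andy", "lastname": "Oliver",
           "company": "Open Software Integrators",
           "location": {"city": "Durham", "state": "NC"},
           "phone": [{"number": "123 456 7890", "type": "mobile"},
                     {"number": "123 654 1234", "type": "work"}]}"""
store = DocumentStore()
store.put("andy", andy)
print(store.find("location.state", "NC"))     # ['andy']  (full scan)
store.create_index("location.city")
print(store.find("location.city", "Durham"))  # ['andy']  (index lookup)
```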
  • 24. Graph Databases
    • Based on graph theory
    • Less about the volume of the data and more about its complexity
    • Many are transactional
      o often the transactions are "more correct" than those offered by a relational database.
    • FOAF and direct-path operations are easy
      o very complicated/inefficient in an RDBMS
    • Usually paired with an index for search
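The FOAF and direct-path operations on slide 24 are plain graph traversals. Over an adjacency list, friends-of-friends is two hops and a shortest path is a breadth-first search; in SQL the same questions need a self-join per hop or a recursive query. The people and the dict-of-lists representation are made up for this sketch, which is not any graph database's API.

```python
from collections import deque


def friends_of_friends(graph, person):
    """People exactly two hops away who are not already direct friends."""
    direct = set(graph.get(person, ()))
    fof = set()
    for friend in direct:
        fof.update(graph.get(friend, ()))
    return fof - direct - {person}


def shortest_path(graph, start, goal):
    """Breadth-first search: the node list of a shortest path, or None."""
    previous = {start: None}          # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical, undirected "knows" graph stored as adjacency lists.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}
print(friends_of_friends(graph, "alice"))     # {'dave'}
print(shortest_path(graph, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
```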
  • 25. Design: RDBMS vs Graph
  • 26. Neo4j Graph Example
    [Diagram: nodes for a person (Firstname: Andrew, Lastname: Oliver), two phone numbers (919.627.1236, Type: googlevoice; 919.321.0119, Type: work), a company (Open Software Integrators) and two cities (Durham, NC; Chicago, IL), connected by HAS, WORKS FOR, FOUNDED, LOCATED and RESIDES relationships]
    Note the extra relationships and details here; graph databases are just fun and easy to understand.
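The slide 26 diagram can be written down as a property graph: nodes carry key/value properties and every relationship has a type and a direction. The sketch below keeps only relationships whose endpoints can be read off the diagram (RESIDES is left out because its target is unclear), and the node ids and edge directions are assumptions. It answers "what phone numbers does the founder of Open Software Integrators have?" by following FOUNDED backwards and HAS forwards. This is plain Python, not Neo4j's API; in Neo4j it would be a single Cypher query, and the database would follow each node's stored relationships instead of scanning a list of edges.

```python
# Nodes and their properties, as read off the slide 26 diagram (ids are hypothetical).
nodes = {
    "andrew": {"label": "Person", "firstname": "Andrew", "lastname": "Oliver"},
    "osi": {"label": "Company", "name": "Open Software Integrators"},
    "phone1": {"label": "Phone", "number": "919.627.1236", "type": "googlevoice"},
    "phone2": {"label": "Phone", "number": "919.321.0119", "type": "work"},
    "durham": {"label": "City", "city": "Durham", "state": "NC"},
    "chicago": {"label": "City", "city": "Chicago", "state": "IL"},
}

# Typed, directed relationships (source, TYPE, target); directions are assumed.
edges = [
    ("andrew", "HAS", "phone1"),
    ("andrew", "HAS", "phone2"),
    ("andrew", "FOUNDED", "osi"),
    ("andrew", "WORKS FOR", "osi"),
    ("osi", "LOCATED", "durham"),
    ("osi", "LOCATED", "chicago"),
]


def outgoing(node, rel_type):
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]


def incoming(node, rel_type):
    return [src for src, rel, dst in edges if dst == node and rel == rel_type]


def find_nodes(label, **props):
    return [nid for nid, p in nodes.items()
            if p["label"] == label and all(p.get(k) == v for k, v in props.items())]


def founder_phone_numbers(company_name):
    """Company <-FOUNDED- person -HAS-> phone: return the phone numbers, sorted."""
    numbers = []
    for company in find_nodes("Company", name=company_name):
        for person in incoming(company, "FOUNDED"):
            for phone in outgoing(person, "HAS"):
                numbers.append(nodes[phone]["number"])
    return sorted(numbers)


print(founder_phone_numbers("Open Software Integrators"))  # ['919.321.0119', '919.627.1236']
```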
  • 27. Where does Hadoop fit?
    • NoSQL
    • Software framework (lots of pieces/lots of choices):
      o Pig - scripting language used to quickly write MapReduce code to handle unstructured sources
      o Hive - facilitates structure for the data
      o HCatalog - provides interoperability between these internal systems
      o HBase - Bigtable-type database
      o HDFS - Hadoop Distributed File System
    • Excellent choice for data processing and data analysis
    • MapReduce
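MapReduce, the common thread in the Hadoop pieces on slide 27, splits a job into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The single-process word count below only simulates that data flow; it is not Hadoop's Java API or Pig, where the framework runs the independent map and reduce calls across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)


def map_reduce(documents):
    # Each map_phase call depends only on its own record, so they could run in parallel.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(map_reduce(["big data big", "Data"]))  # {'big': 2, 'data': 2}
```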
  • 28. Convergence of Filesystems and Databases
    • Hadoop HDFS is...a distributed filesystem
    • So are Gluster, Ceph, GFS, etc.
    • Hadoop can use Ceph or Gluster in place of HDFS
  • 29. Other Derivatives
    • Triplestores
      o Apache Jena
    • OODBMS/ORDBMS
      o Caché
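A triplestore keeps every fact as a (subject, predicate, object) triple and answers questions by matching patterns, with unbound positions acting as wildcards. The toy store below is a hypothetical sketch with made-up identifiers and predicate names; Apache Jena is a Java framework queried with SPARQL, not this interface.

```python
class TripleStore:
    """Toy triplestore: a set of (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Matching triples, sorted; None in any position acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj))


def works_for_company_in(store, city):
    """Join two patterns: ?company locatedIn city, then ?person worksFor ?company."""
    companies = {s for s, _, _ in store.match(predicate="locatedIn", obj=city)}
    return sorted(s for s, _, o in store.match(predicate="worksFor") if o in companies)


facts = TripleStore()
facts.add("andy", "name", "Andy Oliver")
facts.add("andy", "worksFor", "osi")
facts.add("osi", "locatedIn", "durham")
facts.add("osi", "locatedIn", "chicago")
print(works_for_company_in(facts, "durham"))  # ['andy']
```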
  • 30. Things you may consider
    • Persistence
      o Async / sync
    • Replication
    • Availability
    • Transactions / consistency
    • "Locality"
    • Language
    • Resources
      o http://en.wikipedia.org/wiki/Comparison_of_structured_storage_software
      o http://sevenweeks.org/
  • 31. Conclusions
    • RDBMS may not scale to your needs
    • Your data may not map efficiently to tables
    • Key-value store - data by key, fast, scalable, can't handle complex data
    • Column family/Big Table - fast, scalable, denormalized, MapReduce, good for series, not efficient for complex data
    • Document - a good operational system, not your analytical system, moderately scalable, matches OO
    • Graph - great for complex data, transactional, less scalable
    • Filesystems and "databases" are converging
  • 32. Thank you for attending!