MT62: What you need to know about
the top database trends
Guy Harrison, Executive Director
About me
Web: guyharrison.net
Email: guy.harrison@software.dell.com
Twitter: @guyharrison
Database revolution
History of databases
[Timeline: pre-computer technologies – printing press, Dewey decimal system, punched cards – then magnetic tape with “flat” (sequential) files, then magnetic disk]
Indexed-Sequential Access Mechanism (ISAM)
Hierarchical model (IMS); network model (IDMS); ADABAS
Relational model defined: System R, Oracle V2, Ingres
dBase, DB2, Informix, Sybase, SQL Server, Access, Postgres, MySQL
Cassandra, Hadoop, Vertica, Riak, HBase, Dynamo, MongoDB, Redis, VoltDB, Hana, Neo4J, Aerospike
(1940-50, 1950-60, 1960-70, 1970-80, 1980-90, 1990-2000, 2000-2010)
3rd Platform drives new demands on the database:
• Global high availability
• Data volumes
• Unstructured data
• Transaction rates
• Latency
A single architecture cannot meet all those demands. Why?
The modern data architecture mixes many systems:
• Operational RDBMS (Oracle, SQL Server, …)
• ERP & in-house CRM
• In-memory analytics (HANA, Exalytics, …)
• In-memory processing (Spark)
• Hadoop
• Web DBMS (MySQL, Mongo, Cassandra) behind web servers
• Analytic/BI software (SAS, Tableau)
• Data warehouse RDBMS (Oracle, Teradata, …)
It takes all sorts.
Big Data and Hadoop
The industrial revolution of data
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor
Big data is the culmination of cloud, social and
mobile.
More data
• Storing all data – including machine-generated, social, community and demographic data – in its original format, for ever
To more effect
• Smarter use of data (data science) to achieve competitive or human benefit
Pioneers of big data
Google Software Architecture (circa 2005)
[Diagram: many commodity nodes, each with its own disk and CPU, running the Google File System, with MapReduce and BigTable layered above it and Google applications on top]
[Diagram: MapReduce word count. Input splits of "Cat", "Dog" and "Rabbit" words are mapped to (word, 1) pairs, shuffled so that all pairs for each word are grouped together, then reduced to totals: Cat,6 Dog,6 Rabbit,2]
MapReduce: Map → Shuffle → Reduce
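The map/shuffle/reduce flow above can be sketched in a few lines of Python (a toy sketch, not Hadoop's API; the two sample documents are made up so as to reproduce the slide's totals):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document
    return [(word, 1) for doc in docs for word in doc.split()]

def shuffle_phase(pairs):
    # Shuffle: group the pairs by key so each reducer sees one word's counts
    pairs.sort(key=itemgetter(0))
    return {key: [v for _, v in group]
            for key, group in groupby(pairs, key=itemgetter(0))}

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["Cat Dog Cat Dog Dog Dog Cat",
        "Rabbit Cat Rabbit Dog Dog Cat Cat"]
print(reduce_phase(shuffle_phase(map_phase(docs))))
# → {'Cat': 6, 'Dog': 6, 'Rabbit': 2}
```

In a real cluster each phase runs in parallel across many nodes, and the shuffle moves data over the network between mappers and reducers.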
[Diagram: chained MapReduce jobs – Product Details joined to Product Page Visits (MapReduce Join), the result joined to Customer Details (MapReduce Join), then a final MapReduce Aggregate]
Equivalent of:
SELECT prod_category, cust_country, SUM(visits)
FROM products
JOIN product_page_visits USING (prod_id)
JOIN customers USING (cust_id)
GROUP BY prod_category, cust_country
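As a rough illustration, the three chained MapReduce stages behave like this Python sketch (the table contents are hypothetical; a real reduce-side join shuffles records by join key across the cluster rather than using in-memory dicts):

```python
# Hypothetical sample data standing in for the three input tables
products = {101: "electronics", 102: "books"}          # prod_id -> prod_category
customers = {1: "AU", 2: "US"}                          # cust_id -> cust_country
page_visits = [(101, 1, 3), (101, 2, 5), (102, 1, 2)]   # (prod_id, cust_id, visits)

# Stage 1: join the visits to product details on prod_id
stage1 = [(products[p], c, v) for p, c, v in page_visits]

# Stage 2: join the result to customer details on cust_id
stage2 = [(cat, customers[c], v) for cat, c, v in stage1]

# Stage 3: aggregate visits by (prod_category, cust_country)
totals = {}
for cat, country, v in stage2:
    totals[(cat, country)] = totals.get((cat, country), 0) + v
print(totals)
# → {('electronics', 'AU'): 3, ('electronics', 'US'): 5, ('books', 'AU'): 2}
```

Each stage here corresponds to one full MapReduce job, which is why a simple three-table GROUP BY costs several passes over the data in Hadoop 1.0.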
Hadoop 1.0: Open Source MapReduce Stack
Hadoop at Yahoo
2010 (biggest cluster):
• 4,000 nodes, 16 PB disk
• 64 TB of RAM
• 32,000 cores
2014:
• 16 clusters
• 32,500 nodes
Hadoop family
[Diagram: commodity nodes (disk + CPU) under the Hadoop Distributed File System (HDFS), with MapReduce/YARN and HBase above it, and Pig, Hive, Sqoop and Flume on top]
Economies
[Chart: MPP vs Hadoop $$/TB (hardware only), axis $0 to $6,000 – an MPP database costs thousands of dollars per TB; Hadoop a small fraction of that]
More data
• Storing all data – including machine-generated, social, community and demographic data – in its original format, for ever
To more effect
• Smarter use of data (data science) to achieve competitive or human benefit
Big Data Analytics, aka Data Science
• Machine learning: programs that evolve with experience
• Predictive analytics: programs that extrapolate from past to future
• Collective intelligence: programs that use inputs from “crowds” to simulate intelligence
Collective intelligence
“Siri, call me an ambulance”
“From now on, I’ll call you ‘An Ambulance’. OK?”
NoSQL
[Diagram: a typical pre-NoSQL web stack – web servers in front of memcached servers, backed by sharded database servers (Shard A-F, Shard G-O, Shard P-Z) with read-only slaves]
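The key-range sharding in that diagram can be sketched in a few lines (the shard names and letter ranges are illustrative, matching the A-F / G-O / P-Z split shown):

```python
import bisect

# Upper bound of each alphabetic key range, and the shard that owns it
SHARD_BOUNDS = ["f", "o", "z"]
SHARDS = ["shard-A-F", "shard-G-O", "shard-P-Z"]

def shard_for(key):
    # Route by the key's first letter: A-F, G-O, P-Z
    idx = bisect.bisect_left(SHARD_BOUNDS, key[0].lower())
    return SHARDS[idx]

print(shard_for("alice"))   # shard-A-F
print(shard_for("guy"))     # shard-G-O
print(shard_for("zoe"))     # shard-P-Z
```

Range sharding like this is simple but must be rebalanced by hand when one range gets hot, which is one motivation for the consistent hashing described below.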
CAP Theorem says something has to give.
CAP (Brewer’s) Theorem says you can only have two out of three of:
• Consistency – everyone always sees the same data
• Availability – the system stays up when nodes fail
• Partition tolerance – the system stays up when the network between nodes fails
The RDBMS lives on the Consistency + Availability side of the triangle; eventually consistent systems choose Availability + Partition tolerance; all three at once is the no-go zone.
Network partition
[Diagram: User A at a US data center and User B at an Australian data centre, separated by a network partition]
Major influences on non-relational databases
• Amazon Dynamo – eventually consistent transaction model; consistent hashing
• Google BigTable – column family model for sparse, distributed columnar data
• OODBMS and XML DBs – paved the way for the document database
Dynamo Consistent Hashing
[Diagram: nodes A-H placed around a hash ring, each owning a range of hash values (e.g. -4e10 to -8e10). Rowkey “johnny” hashes to -6.7e10 and so lands on node H]
Dynamo Consistent Hashing
[Diagram: the same ring. The first write for rowkey “johnny” (hash -6.7e10) goes to node H; the 2nd write is replicated to a node on a different rack, and the 3rd write to a node in a different data center]
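A minimal consistent-hash ring sketch, assuming an MD5-based hash purely for illustration (Dynamo's real ring uses virtual nodes, its own hash function, and rack/data-center awareness when picking replicas):

```python
import bisect
import hashlib

def ring_hash(key):
    # Stable hash mapped onto a 0 .. 2**32 ring (illustrative choice)
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

class HashRing:
    def __init__(self, nodes):
        # Place each node at a point on the ring, sorted by position
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def preference_list(self, rowkey, n=3):
        # The first node clockwise from the key's hash owns the key;
        # the next n-1 nodes hold the replicas (the 2nd and 3rd writes)
        start = bisect.bisect(self.points, ring_hash(rowkey)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(n)]

ring = HashRing(["A", "B", "C", "D", "E", "F", "G", "H"])
print(ring.preference_list("johnny"))  # owner plus two replica nodes
```

Because node positions are fixed by hashing, adding or removing a node only moves the keys adjacent to it on the ring, rather than reshuffling everything.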
NWR – tunable consistency
N = number of copies of data
W = number of copies that must be written synchronously before returning control to the program
R = number of copies to read
N=3 W=1 R=N: fastest write, slow but consistent reads
There will be 3 copies of the data. A write request returns once the first copy is written – the other two can happen later. A read request reads all copies to make sure it gets the latest version. Data might be lost if a node fails before the second write.
NWR – tunable consistency
N=3 W=2 R=2: faster writes, still consistent (quorum assembly)
There will be 3 copies of the data. A write request returns when 2 copies are written – the other can happen later. A read request reads two copies to make sure it has the latest version.
NWR – tunable consistency
N=3 W=1 R=1: fast, but not consistent
There will be 3 copies of the data. A write request returns once the first copy is written – the other two can happen later. A read request reads a single version only: it might not get the latest copy. Data might be lost if a node fails before the second write.
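The three N/W/R profiles above reduce to one rule: a read is guaranteed to overlap the latest write whenever R + W > N. A sketch (the "durable" rule-of-thumb here is a simplifying assumption, not Dynamo's exact failure semantics):

```python
def nwr_profile(n, w, r):
    # Quorum rule: the R copies read must intersect the W copies written
    consistent = r + w > n
    # With only one synchronous copy, a node failure can lose the write
    durable = w > 1
    return {"consistent": consistent, "durable_writes": durable}

print(nwr_profile(3, 1, 3))  # fastest write; reads are slow but consistent
print(nwr_profile(3, 2, 2))  # quorum assembly: consistent, reasonably durable
print(nwr_profile(3, 1, 1))  # fast but not consistent
```

Stores such as Cassandra and Riak expose exactly this dial per request, letting one application mix strongly consistent and fast-but-loose operations.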
BigTable data model

Raw data:
Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

Relational design (normalized):
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

BigTable design (sparse, wide rows – each row holds only the columns it uses):
Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . . . . . .    675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . . . . . .    856,767
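The sparse wide-row idea can be mimicked with nested dicts (a Python stand-in, not BigTable's API; the names and counts are copied from the slide's tables):

```python
# Each row holds only the columns it actually uses; there is no fixed schema
counters = {
    1: {"name": "Dick", "Ebay": 507018, "Google": 690414,
        "Facebook": 723649, "MadBillFans.com": 675230},
    2: {"name": "Jane", "Google": 716426, "Facebook": 643261,
        "ILoveLarry.com": 856767},
}

# Adding a column to one row needs no ALTER TABLE and costs nothing elsewhere
counters[2]["NewSite.com"] = 1

# The two rows legitimately have different column sets
print(sorted(counters[1].keys()))
print(sorted(counters[2].keys()))
```

This is the trade the column-family model makes: millions of sparse columns per row and cheap schema evolution, in exchange for giving up joins and ad-hoc SQL.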
OODBMS – 1990s
The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdonik, '90)
“A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers”
– Also, SQL is ugly
“An object database is like a closet which requires that you hang up your suit with tie, underwear, belt, socks and shoes all attached” (Dave Ensor)
http://4.bp.blogspot.com/-IPgd1Tg8ByE/UkOzH-g1FmI/AAAAAAAACB0/QYg8kEVp5_0/s1600/db4o_vs_orm.png
Revenge of the Object Nerds – Document databases
• Structured documents – XML and JSON (JavaScript Object Notation) – become more prevalent within applications
• Web programmers start storing these in BLOBs in MySQL
• XML and JSON databases emerge
MongoDB
Graph databases
• Graph: Neo4J, InfiniteGraph, FlockDB
• Document, JSON based: MongoDB, CouchDB, RethinkDB
• Document, XML based: MarkLogic, BerkeleyDB XML
• Key Value: MemcacheDB, Oracle NoSQL, Dynamo, Voldemort, DynamoDB, Riak
• Table Based: BigTable, Cassandra, HBase, HyperTable, Accumulo
No means yes!
Column-oriented DB
Row orientation vs column orientation

Logical table:
ID    Name    DOB       Salary  Sales  Expenses
1001  Dick    21/12/60  67,000  78980  3244
1002  Jane    12/12/55  55,000  67840  2333
1003  Robert  17/02/80  22,000  67890  6436
1004  Dan     15/03/75  65,200  98770  2345
1005  Steven  11/11/81  76,000  43240  3214

Row oriented database (each block holds whole rows):
Block 1: 1001 Dick 21/12/60 67,000 78980 3244
Block 2: 1002 Jane 12/12/55 55,000 67840 2333
Block 3: 1003 Robert 17/02/80 22,000 67890 6436
Block 4: 1004 Dan 15/03/75 65,200 98770 2345
Block 5: 1005 Steven 11/11/81 76,000 43240 3214

Column oriented database (each block holds one column):
Block 1: Dick Jane Robert Dan Steven
Block 2: 21/12/60 12/12/55 17/02/80 15/03/75 11/11/81
Block 3: 67,000 55,000 22,000 65,200 76,000
Block 4: 78980 67840 67890 98770 43240
Block 5: 3244 2333 6436 2345 3214
Analytical queries
SELECT SUM(salary) FROM salesperson
Row oriented database: must read every block – every column of every row – to pick out the salaries.
Column oriented database: reads only the single block holding the salary column.
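A toy illustration of why the column store wins this query (data taken from the slide; a real column store scans compressed on-disk blocks, not Python lists):

```python
# Row store: one tuple per salesperson record
rows = [
    (1001, "Dick",   67000), (1002, "Jane",   55000),
    (1003, "Robert", 22000), (1004, "Dan",    65200),
    (1005, "Steven", 76000),
]

# Column store: one vector per attribute
columns = {
    "id":     [r[0] for r in rows],
    "name":   [r[1] for r in rows],
    "salary": [r[2] for r in rows],
}

# SELECT SUM(salary): the row scan touches every field of every record;
# the columnar scan touches only the salary vector
row_sum = sum(salary for _, _, salary in rows)
col_sum = sum(columns["salary"])
print(row_sum, col_sum)  # 285200 285200
```

Same answer either way; the difference is how many bytes had to come off disk to produce it.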
Compression
Row oriented database: poor compression ratio – adjacent values in a block are of mixed types, so there is little repetition.
Column oriented database: good compression ratio – each block holds values of a single type, with high repetition.
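Run-length encoding is one reason columns compress so well; a small sketch (the example values are made up to show the contrast):

```python
from itertools import groupby

def rle_encode(values):
    # Run-length encoding: store (value, run_length) instead of every value
    return [(v, len(list(g))) for v, g in groupby(values)]

# A column of one attribute repeats heavily and collapses to a few runs...
status_col = ["active"] * 6 + ["closed"] * 3 + ["pending"]
print(rle_encode(status_col))
# → [('active', 6), ('closed', 3), ('pending', 1)]

# ...while a row-ordered mix of types has no runs at all to exploit
row_block = ["Dick", "21/12/60", "67000", "Jane", "12/12/55", "55000"]
print(len(rle_encode(row_block)))  # 6 runs: no repetition, no gain
```

Column stores layer further tricks (dictionary and delta encoding, sorting the column first) on top of the same principle.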
Inserts
INSERT INTO salesperson …
Row oriented database: a new row is appended to a single block – cheap.
Column oriented database: one new row requires touching every column block – expensive.
C-Store (Vertica) solution for inserts
Write Optimized Store: row oriented, uncompressed, single-row inserts (absorbs continual parallel inserts)
Read Optimized Store: columnar, disk-based, highly compressed, bulk loadable (fed by bulk sequential loads)
An asynchronous Tuple Mover migrates data from the write store into the read store; a query merges results from both stores.
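A miniature sketch of the C-Store split (class and method names here are hypothetical; a real tuple mover runs asynchronously and compresses data on the way into the read store):

```python
class CStoreSketch:
    def __init__(self, cols):
        self.read_store = {c: [] for c in cols}   # columnar, bulk-loaded
        self.write_store = []                     # row-oriented, uncompressed

    def insert(self, row):
        # Single-row inserts land cheaply in the row-oriented write store
        self.write_store.append(row)

    def tuple_mover(self):
        # Bulk move: append each field to its column vector, then clear
        for row in self.write_store:
            for col, val in zip(self.read_store, row):
                self.read_store[col].append(val)
        self.write_store.clear()

    def scan(self, col):
        # A query must merge both stores until the mover has run
        i = list(self.read_store).index(col)
        return self.read_store[col] + [r[i] for r in self.write_store]

db = CStoreSketch(["name", "salary"])
db.insert(("Dick", 67000))
db.insert(("Jane", 55000))
print(db.scan("salary"))        # merged view of both stores
db.tuple_mover()
print(db.read_store["salary"])  # rows now live in the columnar store
```

SAP HANA's delta store, described later, applies essentially the same write-buffer-plus-merge design.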
SSD and in-memory databases
5MB HDD circa 1956
The more that things change....
Faster or slower?
[Chart: percentage change – IO Rate +260, Disk Capacity +1,635, IO/Capacity -630, CPU +1,013, IO/CPU -390. Disk capacity and CPU have grown far faster than IO rates, so IO per unit of capacity and per CPU has fallen sharply]
Solid state disk to the rescue
DDR RAM Drive
SATA flash drive
PCI flash drive
SSD storage Server
Cheaper by the IO
[Chart: seek time (µs) – Magnetic Disk 4,000; SSD SATA Flash 80; SSD PCI Flash 25; SSD DDR-RAM 15]
But not by the GB
[Chart: $$/GB by year]
Year:     2011  2012  2013  2014  2015
HDD:      0.35  0.28  0.21  0.17  0.13
MLC SSD:  2.9   2.2   1.7   1.3   1.0
SLC SSD:  10    7.4   5.3   3.2   2.3
Tiered storage management
Main Memory
DDR SSD
Flash SSD
Fast Disk (SAS, RAID 0+1)
Slow Disk (SATA, RAID 5)
Tape, Flat Files, Hadoop
$/IOP
$/GB
In-memory databases
The cost of RAM is falling 50% every 18 months. Some databases can fit entirely within the RAM of a single server or cluster of servers.
[Chart, 1990-2020: RAM cost falling from roughly US$100,000/GB towards US$1/GB, while affordable memory sizes grow from thousandths of a GB towards 100 GB]
Oracle TimesTen
In-memory transactional database with disk-based checkpoints and disk-based logging.
By default, COMMITs are not durable (writes to the transaction log are asynchronous). Synchronous replication or synchronous log writes can be configured to avoid data loss.
[Diagram: clients commit against memory; checkpoints write point-in-time snapshots to disk; transaction logs are written asynchronously]
SAP HANA
Memory: row store; column store with a delta store
Persistence layer: savepoints to data files; txn logs
Note: a table must be either row or column – not both
VoltDB
• Single-threaded access to memory: no latch/mutex waits
• Transactions in self-contained stored procedures: minimal locking
• K-Safety for COMMIT: no sync waits
[Diagram: clients spread across many CPUs, each CPU owning its own in-memory partition]
Spark: (sort of) in-memory Hadoop
• In-memory compute, HDFS compatible
• Libraries for data processing, machine learning, streaming, SQL, etc.
• Python and Scala interfaces
• Part of the Berkeley Data Analytics Stack
• Integrating into all Hadoop distributions (and Cassandra)
[Diagram: Spark over the Tachyon in-memory file system and HDFS]
Spark: in-memory distributed compute
[Diagram: Spark Streaming, MLlib (machine learning) and Spark SQL running on the Spark engine, over the Mesos cluster manager]
What will the database of the future look like?
It’s about choices
• Storage: B-Tree vs Log-Structured Merge Tree
• Format: row vs columnar
• Processing: SQL vs Graph, MapReduce, DAG
• Language: SQL vs something like SQL
• Schema: tables vs JSON
• Consistency: ACID transactions vs eventual consistency
Thanks!