Database Virtualization: The Next Wave of Big Data

Mike Hogan, CEO

Agenda
• Big Data: A Moving Target
• Common Understanding of Virtualization
• Database Virtualization Challenge
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Introducing Database Virtualization
• Narrowing the Gap Between Databases and Big Data

Big Data: A Moving Target
• Definition: Too much data to handle in a traditional database
• Big Data tools leverage scale-out architectures, e.g. Hadoop
• Technology advances make Big Data a moving target
• Databases adopting scale-out, virtual database architectures
[Chart: Data Volume over Time, with the "BIG Data" region marked]

© Copyright 2013 ScaleDB. The information contained herein is subject to change without notice.
What is Database Virtualization?

The Dedicated Server
[Chart: Server Utilization on a dedicated server. Average usage is about 10%; the rest is headroom kept to absorb a usage spike and avoid failure.]

The Virtualized App Server
Shared among many customers
Plenty of room for usage peaks
Virtualization enables cloud providers to sell 3-4 times more servers than they actually own. This is how they make money.
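The arithmetic behind that claim can be sketched as follows. This is a rough illustration only: it assumes the 10% average utilization from the dedicated-server slide applies to every virtual server and that tenants' peaks do not coincide. The function name is hypothetical.

```python
def physical_utilization(avg_tenant_utilization: float, oversubscription: float) -> float:
    """Expected average load on one physical host when `oversubscription`
    virtual servers, each averaging `avg_tenant_utilization`, share it.
    Assumption: loads simply add up and peaks are not correlated."""
    return avg_tenant_utilization * oversubscription


if __name__ == "__main__":
    for ratio in (3, 4):
        load = physical_utilization(0.10, ratio)
        print(f"{ratio}x oversubscribed -> about {load:.0%} average host load")
```

Even at the upper end of the range, the host still averages well under half of its capacity, which is where the room for usage peaks comes from.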

Database Virtualization Challenges
• No coordination between databases (data & locking)
Example (You vs. the Bank): Bank Balance = $10M. You: Withdraw $10M, Wire $8M, Wire $8M, each accepted without seeing the others. Bank Balance = -$16M.
• Requires a distributed locking solution
• Distributed locking is fairly easy to build…
• …but building it to perform well is extremely hard
• It took Oracle RAC 10 years …70 “cloud years”
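The bank example can be reproduced with a minimal sketch. It is not how ScaleDB or Oracle RAC implement locking: a single in-process `threading.Lock` stands in for a distributed lock manager, and the `Account` class and function names are hypothetical.

```python
import threading


class Account:
    """One balance (in $M) plus a lock standing in for a distributed lock manager."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def debit_uncoordinated(account, amounts):
    """Every node validates against the balance it read before anyone wrote."""
    stale_view = account.balance
    for amount in amounts:
        if amount <= stale_view:  # check passes on stale data
            account.balance -= amount
    return account.balance


def _debit_with_lock(account, amount, accepted):
    """Check and debit happen atomically while the shared lock is held."""
    with account.lock:
        if amount <= account.balance:
            account.balance -= amount
            accepted.append(amount)


def debit_coordinated(account, amounts):
    """Run each request on its own thread; the lock serializes check-and-debit."""
    accepted = []
    threads = [
        threading.Thread(target=_debit_with_lock, args=(account, amount, accepted))
        for amount in amounts
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance, accepted


if __name__ == "__main__":
    requests = [10, 8, 8]  # withdraw $10M, wire $8M, wire $8M
    print("uncoordinated:", debit_uncoordinated(Account(10), requests))
    print("coordinated:  ", debit_coordinated(Account(10), requests))
```

With the lock, which requests succeed depends on arrival order, but the balance can never go negative. The hard part the slide refers to is making that lock fast when the participants are separate servers rather than threads in one process.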

Alternative 1: NoSQL
Elasticity enables you to burst across servers, so you can run them at high utilization

Alternative 1: NoSQL
Moves functionality to the application tier…more work for you
[Diagram: Your Application on top of the DBMS (SQL) vs. Apps on top of NoSQL. Labels: "You buy this part" and "You build & maintain this part"]
Cons:
1. Non-relational (build this into your app)
2. Reduces consistency: different users/different answers
3. Removes transactions (build this into your app)
4. Less functionality e.g. joins (build these into your app)
Pros:
1. Scalability
2. Elastic = high utilization
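To make "build these into your app" concrete, here is a minimal sketch of a join done in application code. Plain dicts and lists stand in for a hypothetical key-value store, and all names and sample records are invented for illustration.

```python
# Two "collections" in a hypothetical key-value store, modelled as plain Python data.
customers = {
    1: {"name": "Ada"},
    2: {"name": "Grace"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.0},
]


def join_orders_to_customers(orders, customers):
    """Application-side equivalent of:
    SELECT o.order_id, c.name, o.total
    FROM orders o JOIN customers c ON o.customer_id = c.id
    """
    joined = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:  # inner-join semantics: skip orders with no customer
            continue
        joined.append((order["order_id"], customer["name"], order["total"]))
    return joined


if __name__ == "__main__":
    for row in join_orders_to_customers(orders, customers):
        print(row)
```

A relational DBMS does this from one SQL statement. Here the lookup strategy, the handling of missing keys, and any consistency between the two collections are the application's responsibility.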

Alternative 2: SQL Sharding
[Diagram: sharded Masters, each with its own Slaves]
EACH server must handle the peak for ITS data
No elasticity means no bursting across servers, requiring low utilization.
Not highly-available, relies on fail-over
Cons:
1. Not elastic = no bursting across servers
2. Rigid partitioning model
3. Requires slaves for fail-over (vs. high-availability)
4. You have to build & maintain routing code
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
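The routing code mentioned in the cons list might look roughly like this. It is a sketch under assumptions: hash-based partitioning on a customer ID, with hypothetical host names; real deployments may partition by range or by lookup table instead.

```python
import zlib

# Hypothetical shard masters; every key lives on exactly one of them.
SHARDS = ["db-master-0", "db-master-1", "db-master-2"]


def shard_for(customer_id: int) -> str:
    """Map a key to its shard with a stable hash (crc32 is the same on every run)."""
    key = str(customer_id).encode("utf-8")
    return SHARDS[zlib.crc32(key) % len(SHARDS)]


def route(customer_ids):
    """Group keys by the shard that must serve them."""
    plan = {}
    for customer_id in customer_ids:
        plan.setdefault(shard_for(customer_id), []).append(customer_id)
    return plan


if __name__ == "__main__":
    for shard, ids in sorted(route(range(10)).items()):
        print(shard, ids)
```

Because the mapping depends on `len(SHARDS)`, adding a server changes where most keys live and forces data to be moved, which is one way to read "rigid partitioning model". A hot key also cannot borrow capacity from an idle shard.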

Introducing Database Virtualization
Virtualizes & Shares Storage Tier across Elastic Database Clusters
[Diagram: Database Tier (CPU) above Storage Tier (I/O). The highly-available data tier is shared across multiple database clusters: shared among many customers, with plenty of room for usage peaks.]
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
5. Elastic
6. No slaves

Introducing Database Virtualization
Distributed Parallel Process Across Storage Servers
[Diagram: Database Tier (CPU) above Storage Tier (I/O). Query: "What were my sales last month?" Processed at the storage tier, only results are sent back to the database.]
• Distributed Parallel Processing: Similar to Map-Reduce & Oracle Exadata
• This Narrows the Gap between Databases and Big Data
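The "process at the storage tier, return only results" idea can be sketched as a scatter-gather aggregate. This illustrates the general pattern, not ScaleDB's implementation: threads stand in for storage servers, and the node contents and function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes, each holding a slice of (month, sale amount) rows.
STORAGE_NODES = [
    [("2013-05", 120.0), ("2013-06", 80.0)],
    [("2013-05", 200.0), ("2013-04", 50.0)],
    [("2013-05", 30.0)],
]


def partial_sales(rows, month):
    """Runs "on" a storage node: filter and sum locally, return a single number."""
    return sum(amount for row_month, amount in rows if row_month == month)


def total_sales(month):
    """Runs "on" the database node: fan the query out, then combine partial sums."""
    with ThreadPoolExecutor(max_workers=len(STORAGE_NODES)) as pool:
        partials = list(pool.map(lambda rows: partial_sales(rows, month), STORAGE_NODES))
    return sum(partials)


if __name__ == "__main__":
    print("Sales for 2013-05:", total_sales("2013-05"))
```

Only one number per storage node crosses the network instead of every matching row, which is the same shape as a map step followed by a reduce step.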

Database Virtualization Enables DBaaS
Virtualizes & Shares Storage Tier across Elastic Database Clusters
[Diagram: Database Tier (CPU), with processing shared across database nodes, above Storage Tier (I/O), a highly-available data tier shared across multiple database clusters]

Cloud Computing’s Enabling Technologies
• Server: Server Virtualization (VMware, Citrix)
• Storage: Storage Virtualization (EMC, NetApp, IBM, Dell, HP)
• Network: Network Virtualization (Cisco, VMware, Oracle)
• DBMS: Database Virtualization (ScaleDB)

How About Performance?

Performance: ScaleDB vs. InnoDB
Performance tests running on DL380 servers, large data set
Operations per Second:
• MariaDB + InnoDB: 550
• ScaleDB 1-Node: 1238
• ScaleDB 2-Nodes: 1884
• ScaleDB 3-Nodes: 2236
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 200M Rows, MariaDB V5.3.5

Performance tests running on HP Cloud (Read:Write Ratio = 1:1)
Operations per Second:
• MySQL + InnoDB: 544
• ScaleDB 1-Node: 3542
• ScaleDB 2-Nodes: 4668
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42

Performance tests running on HP Cloud (Read-Only)
Operations per Second:
• MySQL + InnoDB: 930
• ScaleDB 1-Node: 6117
• ScaleDB 2-Nodes: 11920
Benchmark Details: YCSB Workload A, 1:0 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42

Sysbench benchmark running on HP Cloud (Read-Only)
Transactions per Second:
• MySQL + InnoDB: 7
• ScaleDB 1-Node: 134
• ScaleDB 2-Nodes: 250
Benchmark Details: Sysbench, Read-Only, Database Size: 500M Rows, MySQL V5.1.42

Sysbench benchmark running on HP Cloud (10% Write)
Transactions per Second:
• MySQL + InnoDB: 3
• ScaleDB 1-Node: 50
• ScaleDB 2-Nodes: 79
Benchmark Details: Sysbench, 10% Write, Database Size: 500M Rows, MySQL V5.1.42
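For readers who want to compare the five benchmarks side by side, the sketch below derives two ratios from the figures on the slides: speedup over the InnoDB baseline, and scaling efficiency relative to a single ScaleDB node. The dictionary keys and function names are hypothetical, and the pairing of values to bars assumes the charts list them in label order.

```python
# (InnoDB baseline, {ScaleDB node count: throughput}) as reported on the slides.
RESULTS = {
    "YCSB A 1:1, DL380": (550, {1: 1238, 2: 1884, 3: 2236}),
    "YCSB A 1:1, HP Cloud": (544, {1: 3542, 2: 4668}),
    "YCSB read-only, HP Cloud": (930, {1: 6117, 2: 11920}),
    "Sysbench read-only, HP Cloud": (7, {1: 134, 2: 250}),
    "Sysbench 10% write, HP Cloud": (3, {1: 50, 2: 79}),
}


def speedups(baseline, scaled):
    """Throughput of each ScaleDB configuration divided by the InnoDB baseline."""
    return {nodes: round(value / baseline, 1) for nodes, value in scaled.items()}


def scaling_efficiency(scaled):
    """Throughput on n nodes divided by n times the 1-node throughput (1.0 = linear)."""
    one_node = scaled[1]
    return {nodes: round(value / (nodes * one_node), 2) for nodes, value in scaled.items()}


if __name__ == "__main__":
    for name, (baseline, scaled) in RESULTS.items():
        print(name)
        print("  speedup vs. InnoDB:", speedups(baseline, scaled))
        print("  scaling efficiency:", scaling_efficiency(scaled))
```

These are ratios of the published numbers only. They say nothing about hardware parity or configuration between the compared systems, which the slides do not describe.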

Summary
• Database Scale-out & Parallelization Address Big Data
• Scaling-out SQL Database Problem: Distributed Locking
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Both Shift Functionality to the Application Tier
• Introducing Database Virtualization…with Performance!
• Closing the Gap Between Databases and Big Data

Thank You
