Database Virtualization
The Next Wave of Big Data
Mike Hogan, CEO
2
Agenda
• Big Data: A Moving Target
• Common Understanding of Virtualization
• Database Virtualization Challenge
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Introducing Database Virtualization
• Narrowing the Gap Between Databases and Big Data
3
Big Data: A Moving Target
• Definition: Too much data to handle in a traditional database
• Big Data tools leverage scale-out architectures, e.g. Hadoop
• Technology advances make Big Data a moving target
• Databases adopting scale-out, virtual database architectures
[Chart: Data Volume vs. Time, with a region labeled "BIG Data"]
© Copyright 2013 ScaleDB. The information contained herein is subject to change without notice.
What is Database Virtualization?
5
The Dedicated Server
[Chart: utilization of a single dedicated server. Average utilization is about 10%; the rest is headroom kept to absorb usage spikes and avoid failure.]
6
The Virtualized App Server
Shared among many customers
Plenty of room for usage peaks
Virtualization enables Cloud Providers to sell 3-4 TIMES more
servers than they actually own. This is how they make money.
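The oversubscription claim can be checked with back-of-the-envelope arithmetic. The sketch below is not from the slides: `tenants_per_host` is a hypothetical helper, and the 35-40% target host utilization is an assumption chosen so that the roughly 10% average utilization of a dedicated server lands in the quoted 3-4x range.

```python
def tenants_per_host(avg_utilization_pct: int, target_utilization_pct: int) -> int:
    """Number of virtual servers one physical host can carry on average.

    Both arguments are whole percentages. The target is an assumed safety
    limit that leaves headroom for usage spikes; it is not a figure from
    the slides.
    """
    if avg_utilization_pct <= 0:
        raise ValueError("average utilization must be positive")
    # Integer division avoids floating-point surprises such as 0.3 / 0.1.
    return target_utilization_pct // avg_utilization_pct


if __name__ == "__main__":
    # Dedicated servers average about 10% utilization.
    # Assumption: the provider keeps hosts at or below 35-40% on average.
    print(tenants_per_host(10, 35))
    print(tenants_per_host(10, 40))
```

The spare capacity above the target is what absorbs the usage peaks of individual customers, on the assumption that they do not all peak at once.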
7
Database Virtualization Challenges
• No coordination between databases (data & locking)
  Example (You vs. Bank): Bank Balance = $10M. With no coordination, each database approves a request against the same $10M: Withdraw $10M, Wire $8M, Wire $8M. Result: Bank Balance = -$16M
• Requires a distributed locking solution
• Distributed locking is fairly easy to build…
• …but building it to perform well is extremely hard
• It took Oracle RAC 10 years … 70 “cloud years”
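The bank example can be reproduced in a few lines. This is a minimal sketch, not ScaleDB's or Oracle RAC's lock manager: `uncoordinated`, `LockedAccount`, and `coordinated` are hypothetical names, and a single in-process `threading.Lock` stands in for a distributed lock manager.

```python
import threading


def uncoordinated(balance: int, requests: list[int]) -> int:
    """Each database node approves requests against its own stale copy."""
    stale_copy = balance  # every node still sees the opening balance
    for amount in requests:
        if amount <= stale_copy:  # the check passes on every node...
            balance -= amount     # ...but all debits hit the real balance
    return balance


class LockedAccount:
    """A single lock stands in for a distributed lock manager (assumption)."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, amount: int) -> bool:
        with self._lock:  # check and update happen as one atomic step
            if amount <= self.balance:
                self.balance -= amount
                return True
            return False


def coordinated(balance: int, requests: list[int]) -> int:
    """Run each request on its own thread against one locked account."""
    account = LockedAccount(balance)
    threads = [threading.Thread(target=account.debit, args=(amount,))
               for amount in requests]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance


if __name__ == "__main__":
    requests = [10, 8, 8]  # amounts in $M, as on the slide
    print(uncoordinated(10, requests))  # the overdraft from the slide
    print(coordinated(10, requests))    # never negative
```

With the lock, whichever request arrives first wins and the others are refused, so the balance cannot go negative. The hard part the slide refers to is making that lock fast when the nodes are separate machines.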
8
Alternative 1: NoSQL
Elasticity enables you to burst across servers, so you can run them at high utilization
9
Alternative 1: NoSQL
Moves functionality to the application tier … more work for you
[Diagram: SQL vs. NoSQL stacks. The DBMS is the part you buy; your application is the part you build & maintain. With NoSQL, more of the stack falls into your application.]
Cons:
1. Non-relational (build this into your app)
2. Reduces consistency: different users/different answers
3. Removes transactions (build this into your app)
4. Less functionality e.g. joins (build these into your app)
Pros:
1. Scalability
2. Elastic = high utilization
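To make "build these into your app" concrete, here is a minimal sketch of a join performed in application code over key-value style data. The `customers` and `orders` records and the `join_orders_to_customers` helper are hypothetical and do not come from the slides or from any particular NoSQL product.

```python
# Hypothetical key-value "tables"; a NoSQL store would hold these as documents.
customers = {
    1: {"name": "Ada"},
    2: {"name": "Grace"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 250},
    {"order_id": 101, "customer_id": 2, "total": 75},
    {"order_id": 102, "customer_id": 1, "total": 40},
]


def join_orders_to_customers(orders, customers):
    """Application-side equivalent of the SQL

        SELECT c.name, o.order_id, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """
    joined = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner-join semantics: drop orders with no customer
        joined.append((customer["name"], order["order_id"], order["total"]))
    return joined


if __name__ == "__main__":
    for row in join_orders_to_customers(orders, customers):
        print(row)
```

In a relational database the one-line query in the docstring does this work. Here the application owns the lookup logic, the missing-key handling, and any later optimization.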
10
Alternative 2: SQL Sharding
[Diagram: sharded Masters, each with its own Slaves. EACH server must handle the peak for ITS data.]
Cons:
1. Not elastic = no bursting across servers
2. Rigid partitioning model
3. Requires slaves for fail-over (vs. high-availability)
4. You have to build & maintain routing code
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
No elasticity means no bursting across servers, requiring low utilization.
Not highly-available, relies on fail-over
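The "routing code" that sharding pushes onto the application can be as small as the sketch below. It assumes hash-based partitioning, which is one common scheme and not necessarily the one the slide has in mind; the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from config.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a row key to the shard that owns it.

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for key in ("customer:42", "customer:43", "order:1001"):
        print(key, "->", shard_for(key))
```

The modulo on `len(shards)` also illustrates the rigid partitioning model: adding a fourth shard changes the owner of most keys, so data must be moved before the new server can absorb any load.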
11
Introducing Database Virtualization
Virtualizes & Shares Storage Tier across Elastic Database Clusters
[Diagram: a Database Tier (CPU) of elastic database clusters on top of a Storage Tier (I/O). The highly-available data tier is shared across multiple database clusters and among many customers, leaving plenty of room for usage peaks.]
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
5. Elastic
6. No slaves
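Why a shared, elastic tier leaves "plenty of room for usage peaks" can be shown with simple arithmetic: dedicated servers must each be sized for their own peak, while a pool only needs to cover the peak of the combined load. The load figures below are invented for illustration, and `sum_of_peaks` and `peak_of_sum` are hypothetical helper names.

```python
# Hypothetical load per database (arbitrary units) over four time steps.
LOADS = {
    "A": [10, 80, 20, 10],
    "B": [20, 10, 90, 10],
    "C": [70, 10, 10, 20],
}


def sum_of_peaks(loads: dict) -> int:
    """Capacity needed when EACH server must handle the peak for ITS data."""
    return sum(max(series) for series in loads.values())


def peak_of_sum(loads: dict) -> int:
    """Capacity needed when all databases burst into one shared pool."""
    return max(sum(step) for step in zip(*loads.values()))


if __name__ == "__main__":
    print("dedicated capacity:", sum_of_peaks(LOADS))
    print("pooled capacity:   ", peak_of_sum(LOADS))
```

The gap between the two numbers is the capacity that a non-elastic design keeps idle. It shrinks if all databases peak at the same moment, which is the "it depends" caveat that applies to any such estimate.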
12
Introducing Database Virtualization
Distributed Parallel Process Across Storage Servers
[Diagram: the Database Tier (CPU) sends the query "What were my sales last month?" to the Storage Tier (I/O). The query is processed at the storage tier, and only results are sent back to the database.]
• Distributed Parallel Processing: Similar to Map-Reduce & Oracle Exadata
• This Narrows the Gap between Databases and Big Data
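The idea of processing at the storage tier and returning only results can be sketched as a small map-and-combine. This is an illustration of the general pattern, not ScaleDB's implementation: the `STORAGE_NODES` data, `partial_sales`, and `total_sales` are hypothetical, and threads stand in for separate storage servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows (date, amount) spread across three storage nodes.
STORAGE_NODES = [
    [("2013-03-02", 120), ("2013-04-11", 80)],
    [("2013-03-15", 200), ("2013-03-30", 50)],
    [("2013-02-27", 300), ("2013-03-09", 30)],
]


def partial_sales(rows, month: str) -> int:
    """Runs on one storage node: filter and aggregate its local rows."""
    return sum(amount for date, amount in rows if date.startswith(month))


def total_sales(nodes, month: str) -> int:
    """Runs on the database tier: fan out, then combine the small results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda rows: partial_sales(rows, month), nodes))
    return sum(partials)


if __name__ == "__main__":
    # "What were my sales last month?" with March 2013 as the assumed month.
    print(total_sales(STORAGE_NODES, "2013-03"))
```

Each node returns a single number instead of shipping its rows, so the traffic between tiers stays small however much data each node holds.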
13
Database Virtualization Enables DBaaS
Virtualizes & Shares Storage Tier across Elastic Database Clusters
[Diagram: a Database Tier (CPU), with processing shared across database nodes, on top of a Storage Tier (I/O), a highly-available data tier shared across multiple database clusters.]
14
Cloud Computing’s Enabling Technologies
• Server: Server Virtualization (VMware, Citrix)
• Storage: Storage Virtualization (EMC, NetApp, IBM, Dell, HP)
• Network: Network Virtualization (Cisco, VMware, Oracle)
• DBMS: Database Virtualization (ScaleDB)
How About Performance?
16
Performance: ScaleDB vs. InnoDB
Performance tests running on DL380 servers, large data set
[Bar chart: Operations per Second]
• MariaDB + InnoDB: 550
• ScaleDB, 1 node: 1238
• ScaleDB, 2 nodes: 1884
• ScaleDB, 3 nodes: 2236
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 200M Rows, MariaDB V5.3.5
17
Performance: ScaleDB vs. InnoDB
Performance tests running on HP Cloud (Read:Write Ratio = 1:1)
[Bar chart: Operations per Second]
• MySQL + InnoDB: 544
• ScaleDB, 1 node: 3542
• ScaleDB, 2 nodes: 4668
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42
18
Performance: ScaleDB vs. InnoDB
Performance tests running on HP Cloud (Read-Only)
[Bar chart: Operations per Second]
• MySQL + InnoDB: 930
• ScaleDB, 1 node: 6117
• ScaleDB, 2 nodes: 11920
Benchmark Details: YCSB Workload A, 1:0 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42
19
Performance: ScaleDB vs. InnoDB
Sysbench benchmark running on HP Cloud (Read-Only)
[Bar chart: Transactions per Second]
• MySQL + InnoDB: 7
• ScaleDB, 1 node: 134
• ScaleDB, 2 nodes: 250
Benchmark Details: Sysbench, Read-Only, Database Size: 500M Rows, MySQL V5.1.42
20
Performance: ScaleDB vs. InnoDB
Sysbench benchmark running on HP Cloud (10% Write)
[Bar chart: Transactions per Second]
• MySQL + InnoDB: 3
• ScaleDB, 1 node: 50
• ScaleDB, 2 nodes: 79
Benchmark Details: Sysbench, 10% Write, Database Size: 500M Rows, MySQL V5.1.42
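For readers who want ratios instead of raw bars, the short sketch below computes speedup over the InnoDB baseline for each benchmark. It assumes the bar values pair with the series labels in the order both appear on each slide (baseline first, then ScaleDB by node count). `RESULTS` and `speedups` are names introduced here for illustration.

```python
# Values as printed on the five performance slides; the pairing of values
# with series labels is an assumption based on their order on each slide.
RESULTS = {
    "YCSB 1:1, DL380 (ops/s)": [550, 1238, 1884, 2236],
    "YCSB 1:1, HP Cloud (ops/s)": [544, 3542, 4668],
    "YCSB read-only, HP Cloud (ops/s)": [930, 6117, 11920],
    "Sysbench read-only, HP Cloud (tps)": [7, 134, 250],
    "Sysbench 10% write, HP Cloud (tps)": [3, 50, 79],
}


def speedups(series: list[int]) -> list[float]:
    """Throughput of each ScaleDB configuration relative to the baseline."""
    baseline = series[0]
    return [round(value / baseline, 2) for value in series[1:]]


if __name__ == "__main__":
    for name, series in RESULTS.items():
        print(f"{name}: {speedups(series)}")
```

The ratios describe these specific runs only; the slides give no hardware details for the HP Cloud instances, so they should not be generalized beyond the stated benchmark settings.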
21
Summary
• Database Scale-out & Parallelization Address Big Data
• Scaling-out SQL Database Problem: Distributed Locking
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Both Shift Functionality to the Application Tier
• Introducing Database Virtualization…with Performance!
• Closing the Gap Between Databases and Big Data
Thank You


Editor's Notes

  • #6 Average server utilization runs at about 10%, which enables your IT or your cloud provider to use or sell the unused capacity.
  • #7 Companies no longer have to
  • #8 Companies no longer have to
  • #9 Easy to build: you simply lock the other nodes while one is writing … but then your performance is terrible. How hard is it to build this distributed lock manager? It took Oracle 10 years to get it right with RAC. 10 years … that’s 70 cloud years. Who has time for that?
  • #11 Mitigating factors: “It depends.” Distribution of data/load; use of slaves to handle read load.
  • #12 ScaleDB virtualizes the database, turning it into a database tier and a storage tier. The storage tier provides a pool of cache that is shared among various clusters, enabling it to share I/O peaks across multiple nodes. The database tier then enables very high utilization because it expands elastically to handle peaks. The only con to this architecture is that it takes the developer a long time to build … but we’ve done that!
  • #13 ScaleDB virtualizes the database, turning it into a database tier and a storage tier. The storage tier provides a pool of cache that is shared among various clusters, enabling it to share I/O peaks across multiple nodes. The database tier then enables very high utilization because it expands elastically to handle peaks.
  • #14 ScaleDB virtualizes the database, turning it into a database tier and a storage tier. The storage tier provides a pool of cache that is shared among various clusters, enabling it to share I/O peaks across multiple nodes. The database tier then enables very high utilization because it expands elastically to handle peaks.