The World We Live In…
• According to IDC, the Database software market has a CAGR of 34.2%
• Wal-Mart generates 1 million new database records every hour
• Chevron generates data at a rate of 2TB/day!
• According to the Data Warehousing Institute 46% of companies plan to
replace their existing data warehousing platforms
• Every day, we create 2.5 quintillion bytes of data — so much that
90% of the data in the world today has been created in the last two
years alone.
MySQL Challenges
• Performance degrades as table sizes get larger
– Limitations of the underlying computer science
• Highly indexed schemas negatively impact performance
– More indexes help query performance but hurt transactions
• Poor performance with complex queries
– Many table joins
• Data loading times are slow due to poor concurrency
– Table locking and single threaded operations
• Backup time and performance impact
– Big databases are slow to back up and affect system
performance
Technology Limitations
Most relational databases use traditional B+ trees, which have architectural
limitations that become apparent with large data sets or heavy indexing
[Chart: InnoDB insert rate (0 to 120,000 rows/sec) vs. row count (~1M to ~90M rows), inserting rows into a table with 7 indexes (using iiBench with 10 clients, insert-only, with secondary indexes), illustrating how insert performance degrades as the table grows. Elapsed time: 40,600 seconds.]
Test system: SoftLayer 32-core system with 32GB RAM and 4-drive HDD RAID5.
MySQL 5.5.35 running on Ubuntu 12.04.
Key InnoDB parameters:
innodb_buffer_pool_size=4G
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT
innodb_log_file_size=100M
innodb_log_files_in_group=2
innodb_log_buffer_size=16M
Cache Ahead Summary Index (CASI) Tree
Derived from the classic B+ tree:
• Embedded statistics and other metadata in the nodes improve both tree navigation and indexing
• Branch node segments can vary in size based on actual data values
• Summary nodes provide a mechanism to navigate extremely large tables by minimizing the number of branches walked
• Wider trees with embedded metadata enhance search and modification operations
CASI Tree Instantiations:
• A CASI Tree exists both in memory and on disk for each table and index
• The structures of the tree on disk and in memory are different
• The (re)organization of the tree on disk happens asynchronously from the one in memory, based on adaptive algorithms, to yield improved disk I/O and CPU concurrency
[Diagram: CASI Tree structure with a root node, summary nodes, and branch nodes]
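The summary-node idea can be sketched loosely in Python. All names and structures below are hypothetical illustrations of the concept (range summaries that let a lookup skip whole branches), not DeepDB's actual implementation:

```python
# Toy illustration of summary-node navigation: each summary entry records the
# key range covered by an entire subtree, so a lookup descends only into the
# one branch whose range covers the key and skips all the others.

class Leaf:
    def __init__(self, items):
        self.items = dict(items)          # key -> value

    def get(self, key):
        return self.items.get(key)

class SummaryNode:
    def __init__(self, children):
        # Embedded metadata: a (min_key, max_key) summary per child subtree.
        self.entries = [(min(c.items), max(c.items), c) for c in children]

    def get(self, key):
        for lo, hi, child in self.entries:
            if lo <= key <= hi:           # only walk the covering branch
                return child.get(key)
        return None                       # key lies outside every summary

root = SummaryNode([Leaf({1: "a", 5: "b"}), Leaf({10: "c", 42: "d"})])
print(root.get(42))   # -> d
print(root.get(7))    # -> None (both branches skipped via their summaries)
```

With larger fan-out, the same trick bounds the number of branches walked regardless of table size, which is the navigation benefit the slide describes.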
CASI Tree Benefits
CONSTANT TIME INDEXING
Lightning fast indexing at extreme
scale
SEGMENTED COLUMN STORE
Accelerates analytic operations and
data management at scale
STREAMING I/O
Maximizes disk throughput with
highly efficient use of IOPS
EXTREME CONCURRENCY
Minimizes locks and wait states to
maximize CPU throughput
INTELLIGENT CACHING
Uses adaptive segment sizes and
summaries to eliminate many disk reads
BUILT FOR THE CLOUD
Adaptive configuration and continuous
optimization eliminate scheduled
downtime
CASI Tree Principles:
• Always try to append data to the file (i.e. don't seek; use the current seek position)
• Read data sequentially (i.e. don't seek; use the current seek position for the next sequence of reads)
• Continually re-write and reorder data so that the previous two principles are met
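The three principles above can be sketched with a minimal append-only store (hypothetical code, not DeepDB's): writes always append, reads proceed sequentially, and a background rewrite reorders the file so future reads stay sequential.

```python
# Minimal append-only key/value log illustrating the CASI Tree principles.
import json, os, tempfile

class AppendOnlyStore:
    def __init__(self, path):
        self.path = path
        open(path, "a").close()           # create the log if missing

    def put(self, key, value):
        # Principle 1: never seek for writes; append at the end of the file.
        with open(self.path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")

    def get(self, key):
        # Principle 2: read sequentially; the last record for a key wins.
        found = None
        with open(self.path) as f:
            for line in f:
                k, v = json.loads(line)
                if k == key:
                    found = v
        return found

    def compact(self):
        # Principle 3: continually rewrite and reorder so stale versions
        # disappear and subsequent reads remain sequential.
        live = {}
        with open(self.path) as f:
            for line in f:
                k, v = json.loads(line)
                live[k] = v
        with open(self.path, "w") as f:
            for k in sorted(live):
                f.write(json.dumps([k, live[k]]) + "\n")

store = AppendOnlyStore(os.path.join(tempfile.mkdtemp(), "data.log"))
store.put("a", 1); store.put("a", 2); store.put("b", 3)
store.compact()
print(store.get("a"))   # -> 2 (latest version survives compaction)
```

A real engine would index the log rather than scan it, but the I/O pattern (append-only writes, sequential reads, asynchronous rewrite) is the point being illustrated.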
Constant Time Indexing
Minimizes index cost, enabling high performance for heavily indexed tables
Different data structures on disk and in memory
All work is performed in constant time, eliminating the need for periodic flushing
Streaming file I/O (no memory-map page size limitations)
In Memory: Enhanced B+ Tree
• Optimized for ‘wide’ nodes with
accelerated operations
• Stores index summaries to achieve
great scale while maximizing cache
effectiveness
• Values are stored independently of
the tree
• Tree rebalancing occurs only in
memory – no impact on data stored
on disk
• No fixed page/block sizes
On Disk: Segmented Column Store
• Highly optimized for on-disk
read/write access
• Never requires operational/in-place
rebalancing
• All previous database states are
available
• Efficiently supports variable size
keys, values and ‘point reads’
• Utilizes segmented column store
technology for indexes and columns
Key Benefits: Increases maximum practical table sizes and improves analytic
performance by allowing for more indexing
Segmented Column Store
Structure of the index files for the database
– Provides the functional capabilities of a column store
– Simultaneously read and write optimized
– Instantaneous database start up/shut down
– Columns are updated in tandem with value changes
– Consistent performance and latency; optimized in real time
– Columns consist of variable length segments
– Each segment is a block of ordered keys, references to rows and
meta-data
– Changes to the key space require only delta updates
Optimized for real-time analytics
– Embedded statistical data in each segment
– Allows for heavy indexing to improve query performance
– Enables continuous transactional data feed
Suited for high levels of compression
– Compact representation of keys with summarization
– Flexible segment and delta compression
Segmented Column Store
[Diagram: segmented column store file layout. A header is followed by a sequence of segments (Segment A, Segment B, ...), each prefixed by a segment type & size and containing metadata plus keys and/or values. Delta segments record changes to an earlier segment (e.g. delta changes to Segment A) with a back reference to it.]
Key Benefits: Excellent compression and improved query performance.
Supports continuous streaming backups with snapshots
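The segment-plus-delta idea can be sketched roughly as follows (hypothetical layout and names, not DeepDB's on-disk format): the base segment is an immutable block of ordered keys, changes to the key space are recorded as a small delta, and the delta is merged into a fresh ordered segment later.

```python
# Toy segment with delta updates: reads check the delta first, then do a
# binary-search 'point read' into the ordered base; writes touch only the delta.
from bisect import bisect_left

class Segment:
    def __init__(self, pairs):
        self.keys = [k for k, _ in pairs]          # ordered keys
        self.vals = [v for _, v in pairs]
        self.delta = {}                            # pending changes only

    def read(self, key):
        if key in self.delta:                      # delta wins over the base
            return self.delta[key]
        i = bisect_left(self.keys, key)            # point read into the base
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return None

    def write(self, key, value):
        self.delta[key] = value                    # only a delta update

    def merge(self):
        merged = dict(zip(self.keys, self.vals))
        merged.update(self.delta)
        return Segment(sorted(merged.items()))     # fresh ordered segment

seg = Segment([(1, "a"), (3, "b")])
seg.write(2, "c")
print(seg.read(2))          # -> c (served from the delta)
print(seg.merge().read(2))  # -> c (now part of the ordered base)
```

Because the base never changes in place, old segments can also serve as natural incremental backups, which is the property the backup slides later rely on.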
Streaming I/O
• Massively optimized, delivering near wire-speed throughput
• Append only file structures virtually eliminate disk seeks
• Concurrent operations for updates in memory & on disk
• Optimizations for SSD, HDD, and in-memory-only operation
• Minimizes IO wait states
[Diagram: incoming data streams flow into DeepDB, which performs streaming transactional state logging and streaming indexing]
Key Benefits: Achieves near-SSD performance with magnetic HDDs. Extends
the life expectancy of SSDs with built-in wear leveling and no write
amplification
Extreme Concurrency
Running the Sysbench test on a 32-CPU-core system with 32 attached clients:
InnoDB strands system resources and takes longer to complete the test
– Load time: 8m59s
– Test time: 54.09s
– Transaction rate: 1.4k/sec
DeepDB utilizes ~100% of available system resources to complete the test
– Load time: 23.96s
– Test time: 5.82s
– Transaction rate: 15k/sec
Key Benefits: Database operations take full advantage of all allocated system
resources, dramatically improving system performance
Intelligent Caching
• Adaptive algorithms manage cache usage
– Dynamically sized data segments
– Point read capable: no page operations
• In-memory compression
– Maximizes cache effectiveness
– Adaptive operation manages compression vs. performance
• Summary indexing reduces cache ‘thrashing’
– Only pull in the data that is relevant
– No need to pull ‘pages’ in to cache
Key Benefits: Improves overall system performance by staying in cache more
often than standard MySQL
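The in-memory compression point can be illustrated with a toy cache (hypothetical code, not DeepDB's cache): compressing cached segments lets more of the working set stay resident, at the cost of a decompress on each hit.

```python
# Toy cache that stores entries compressed in memory to maximize how much of
# the working set fits in cache.
import zlib

class CompressedCache:
    def __init__(self):
        self.store = {}

    def put(self, key, data: bytes):
        self.store[key] = zlib.compress(data)      # smaller resident footprint

    def get(self, key):
        blob = self.store.get(key)
        return None if blob is None else zlib.decompress(blob)

cache = CompressedCache()
payload = b"row data " * 1000                      # highly redundant payload
cache.put("seg1", payload)
assert cache.get("seg1") == payload
# The compressed copy is far smaller than the raw payload:
print(len(cache.store["seg1"]) < len(payload))     # -> True
```

An adaptive engine would decide per segment whether the CPU cost of compression is worth the cache savings, which is the "compression vs. performance" trade-off the bullet describes.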
Built for the Cloud
• Designed for easy deployments with virtually no
configuration required in most cases
• No off-line operations
– Continuous defragmentation & optimization
– No downtime for scheduled maintenance
• Linear performance and consistent low latency
• Instantaneous startup and shutdown
• No performance degradations due to B+ Tree
rebalancing or log flushing
Key Benefits: Rapid deployment with almost no configuration and no off-line
maintenance operations. Delivers greatly enhanced performance
when using network-based storage
DeepDB for MySQL
A storage engine that breaks through current
performance and scaling limitations
– Easy-to-install plugin replacement for the
InnoDB storage engine
– Requires no application or schema changes
– Scales-up performance of existing systems
– Increases practical data sizes and complexity
– Billions of rows with high index densities
– High performance index creation/maintenance
– High performance ACID transactions with
consistently low latency
– Reduced query latencies
Application Examples:
[Diagram: deployment stack. Applications (WordPress | SugarCRM | Drupal; PHP | Perl | Python | etc.) run against an Apache server and MySQL, with DeepDB or InnoDB as the storage engine, on CentOS | RHEL | Ubuntu, deployed on bare metal, virtualized, or in the cloud.]
Benefits the Entire Data Lifecycle
• Load: delimited files, dump files
• Operate: transactions, compression
• Analyze: replication, queries
• Protect: backup, recovery
DeepDB:
• Provides enhanced scaling and performance across a broad set of use cases
• Compatible with all existing MySQL applications and tool chains
• Designed to fully leverage today's powerful computing systems
• Optimized for deployment in the cloud, with adaptive behavior and on-line maintenance
Data Loading
DeepDB reduces data loading times by 20x or more
• Whether you are loading delimited files or restoring MySQL dump files, DeepDB can dramatically reduce your load times
• DeepDB's data loading advantage can be seen in both dedicated bare-metal and cloud-based deployments
Transactional Performance Use Cases
(All tests performed on MySQL 5.5)

Test                                                     MySQL with DeepDB   MySQL with InnoDB   Improvement
Streaming Data test (Machine-to-Machine)
  (iiBench maximum transactions/second, single index)    3.795M/sec          217k/sec            17x
Transactional Workload test (Financial)
  (Sysbench transaction rate)                            15,083/sec          1,381/sec           11x
Complex Transactional test (e-Commerce)
  (DBT-2 transaction rate using HDD)                     205,184/min         15,086/min          13.6x
Social Media Transactional test (Twitter)
  (iiBench with 250M rows, 7 indexes w/ composite keys)
  Database creation                                      15 minutes          24 hours            96x
  First query from cold start                            50 seconds          5.5 minutes         6.6x
  Second query from cold start                           1 second            240 seconds         240x
  Disk storage footprint (uncompressed)                  29GB                50GB                42% smaller
Reduces Disk Size Requirements
[Chart: on-disk data size in GB. Uncompressed: InnoDB 5,400 vs. DeepDB 2,800. Compressed: InnoDB 3,780 vs. DeepDB 640.]
Cut Your Query Times in Half
DeepDB improves query speed by 1.5 to 2 times when measured against the DBT3 benchmark
[Chart: DBT3 performance comparison summary, average query performance across various configurations, DeepDB speedup over InnoDB (times faster):]
SF=1, 2G, avg of 2 runs: 1.75x
SF=1, 16G, avg of 5 runs: 1.86x
SF=1, 16G, key comp, avg of 5 runs: 2.00x
SF=2, 16G, avg of 5 runs: 1.88x
SF=2, 16G, key comp, avg of 5 runs: 1.93x
SF=5, 16G, avg of 2 runs: 2.06x
SF=5, 16G, key comp, avg of 5 runs: 1.62x
Overall average: 1.87x
Protect Your Data
The DeepDB architecture eliminates potential data
integrity problems, and its patent-pending error
recovery completes in just seconds
• No updates in place
• No memory map
Unique data structures support real-time and
continuous streaming backups to ensure data is
always protected
• Append only files provide natural incremental
backups
DeepDB ensures your data is continually backed up and available
DeepDB Advantages
The Ultimate MySQL Storage Engine
• 50% Smaller Data Footprint: reduces compressed or uncompressed data to less than half the size of InnoDB
• 5x-10x Improvement in ACID transactional throughput
• Plug-in Replacement for InnoDB: install DeepDB without any changes to existing MySQL applications
• HDD=SSD: increases effective HDD throughput to near SSD levels and extends SSD life up to 10x
• 1B+ Rows: provides high performance support for very large tables
• 20x Faster Data Loading: concurrent operations and IO optimizations reduce load times
• Run Queries Twice as Fast: summary indexing techniques enable ultra-low-latency queries
• Real-Time Backups: create streaming backups with snapshotting
• Low Latency Replicas: efficiently scale out analytics and read-heavy workloads