3. THE PROBLEM WITH PERFORMANCE
[Diagram: "A 'More Assets' Problem" / "A Demand Solution"]
• Mixed workloads (Batch, OLTP, Analytics, VDI, HPC, Email, Video) drive storage decisions, and demand keeps stacking up: 3 TB of SQL at 17k IOPS, 12 TB of Batch at 20k IOPS, OLTP at 10k IOPS, and so on.
• On hard disks, each step up in demand means more assets: roughly 60 drives for a 3 TB database at 11k IOPS (0% write); 72 drives, or more discs, cache, or arrays, at 13k IOPS (25% write); 96 drives, or more of each, at 17k IOPS (80% write).
• The goals of accelerating workloads and productivity, scaling, and decreasing total cost all pull against this.
4. SINCE 1956, HDDS HAVE DEFINED APPLICATION PERFORMANCE
• Speed
  – 10s of MB/s data transfer rates
  – 100s of write/read operations per second
  – Latency on the order of milliseconds (~0.001 s)
• Design
  – Motors
  – Spindles
  – High energy consumption
5. FLASH ENABLES APPLICATIONS TO WRITE FASTER
• Speed
  – 100s of MB/s data transfer rates
  – 1000s of write or read operations per second
  – Latency on the order of microseconds (~0.000001 s)
• Design
  – Silicon
  – MLC/SLC NAND
  – Low energy consumption
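Those orders of magnitude fall straight out of per-operation latency. The sketch below uses assumed drive parameters (7,200 RPM, ~8 ms average seek, ~200 µs NAND operation) purely to illustrate why an HDD tops out at hundreds of random operations per second while flash reaches thousands or more.

```python
# Rough IOPS estimates from per-operation latency.
# All device parameters are illustrative assumptions, not vendor specs.

def hdd_random_iops(avg_seek_ms: float = 8.0, rpm: int = 7200) -> float:
    """A random HDD I/O pays an average seek plus half a rotation."""
    half_rotation_ms = (60_000 / rpm) / 2        # ~4.17 ms at 7,200 RPM
    latency_ms = avg_seek_ms + half_rotation_ms  # ~12 ms total
    return 1_000 / latency_ms                    # operations per second

def flash_iops(op_latency_us: float = 200.0) -> float:
    """A NAND read/program completes in microseconds; nothing moves."""
    return 1_000_000 / op_latency_us

print(f"HDD:   ~{hdd_random_iops():.0f} random IOPS per drive")   # on the order of 10^2
print(f"Flash: ~{flash_iops():.0f} IOPS per device")              # on the order of 10^3+
```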
6. USE OF FLASH – HOST SIDE – PCIE / FLASH DRIVE DAS
• PCIe
  – Very fast and low latency
  – Expensive per GB
  – No redundancy
  – CPU/memory stolen from the host
• Flash SATA/SAS
  – More cost effective
  – Can't get more than 2 drives per blade
  – Unmanaged, can have performance/endurance issues
7. USE OF FLASH – ARRAY BASED CACHE / TIERING
• Array flash cache
  – Typically read-only
  – PVS already caches most reads
  – Effectiveness limited by a storage array designed for hard disks
• Automated storage tiering
  – "Promotes" hot blocks into the flash tier
  – Only effective for reads
  – Cache misses still result in "media" reads
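The cost of those cache misses is easy to quantify with an expected-latency calculation. A small sketch with assumed hit ratios and per-tier latencies (not figures from any particular array):

```python
# Expected read latency with a flash read cache in front of spinning disks.
# Hit ratios and per-tier latencies are illustrative assumptions.

def effective_read_latency_ms(hit_ratio: float,
                              flash_ms: float = 0.2,
                              hdd_ms: float = 8.0) -> float:
    """Average latency = hits served from flash + misses that go to media."""
    return hit_ratio * flash_ms + (1.0 - hit_ratio) * hdd_ms

for hit in (0.50, 0.90, 0.99):
    print(f"hit ratio {hit:.0%}: ~{effective_read_latency_ms(hit):.2f} ms average read")
```

Even at a 90% hit ratio, average read latency stays near a millisecond because the misses still pay the full disk penalty, and neither caching nor tiering helps writes at all.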
8. USE OF FLASH – FLASH IN THE TRADITIONAL ARRAY
• Flash in a traditional array
  – Typically uses SLC or eMLC media
  – High cost per GB
  – Array is not designed for flash media
  – Unmanaged, will result in poor random write performance
  – Unmanaged, will result in poor endurance
9. USE OF FLASH – FLASH IN THE ALL FLASH ARRAY
• Optimized to sustain high write and read throughput
• High bandwidth and IOPS, low latency
• Multi-protocol
• LUN-tunable performance
• Software designed to enhance lower-cost MLC NAND flash by optimizing high write throughput while substantially reducing wear
• RAID protection and replication
11. NAND FLASH FUNDAMENTALS: HDD WRITE PROCESS REVIEW
[Diagram: a rewritten data block among 4K data blocks]
• A physical HDD can rewrite any 4K block in place: virtually limitless write and rewrite capability.
12. STANDARD NAND FLASH ARRAY WRITE I/O
[Diagram: host → fabric (iSCSI / FC / SRP) → HBAs → Unified Transport → RAID → three banks of NAND flash x 8]
1. Write request from the host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to RAID.
3. Request is written to media.
13. NAND FLASH FUNDAMENTALS: FLASH WRITE PROCESS
[Diagram: a 2MB NAND erase block]
1. The NAND page contents of the erase block are read into a buffer.
2. The erase block is erased (aka, "flashed").
3. The buffer is written back with the previous data and any changed or new blocks – including zeroes.
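A minimal sketch of that read/erase/program cycle, assuming 4 KB pages and a 2 MB erase block (sizes taken from the NAND fundamentals notes at the end of this deck); the model is deliberately naive and has no translation layer:

```python
# Naive model of rewriting one 4 KB page inside a 2 MB NAND erase block.
# Page and block sizes follow the notes in this deck; real drives put a
# flash translation layer on top of this raw behaviour.

PAGE_SIZE = 4 * 1024
PAGES_PER_BLOCK = (2 * 1024 * 1024) // PAGE_SIZE     # 512 pages per erase block

def rewrite_page(block: list[bytes], page_index: int, new_data: bytes) -> list[bytes]:
    """Changing one page forces a whole-block read, erase, and reprogram."""
    buffer = list(block)                              # 1. read block contents to a buffer
    block = [b"\xff" * PAGE_SIZE] * PAGES_PER_BLOCK   # 2. erase ("flash") the block
    buffer[page_index] = new_data                     # 3. modify the buffer...
    return buffer                                     #    ...and program it all back

block = [bytes(PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]
block = rewrite_page(block, page_index=7, new_data=b"\x01" * PAGE_SIZE)

# One 4 KB logical write caused 2 MB of physical programming:
print("write amplification ≈", PAGES_PER_BLOCK)       # 512x in this worst case
```

That gap between logical and physical writes is the write amplification the next slide is about.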
14. UNDERSTANDING ENDURANCE / RANDOM WRITE PERFORMANCE
• Endurance
  – Each cell has physical limits (dielectric breakdown): 2K-5K program/erase cycles
  – Time to erase a block is non-deterministic (2-6 ms)
  – Program time is fairly static, based on geometry
  – Failure to control write amplification *will* cause wear-out in a short amount of time
  – Desktop workload is one of the worst for write amplification; most writes are 4-8KB
• Random Write Performance
  – Write amplification not only causes wear-out issues, it also creates unnecessary delays in small random write workloads.
  – What is the point of higher-cost flash storage with latency between 2-5 ms?
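To make the endurance limit concrete, here is a hedged lifetime estimate. Only the P/E cycle range comes from the slide; the capacity, daily write volume, and write amplification factors below are assumed example values.

```python
# Rough drive lifetime from P/E cycles, capacity, daily writes, and write amplification.
# Only the P/E rating range is from the slide; the rest are assumed examples.

def lifetime_years(capacity_tb: float, pe_cycles: int,
                   host_writes_tb_per_day: float, waf: float) -> float:
    total_endurance_tb = capacity_tb * pe_cycles          # total data the NAND can absorb
    physical_tb_per_day = host_writes_tb_per_day * waf    # host writes inflated by WAF
    return total_endurance_tb / physical_tb_per_day / 365

# 1.5 TB of MLC rated at 3,000 P/E cycles, with 2 TB/day written by the host:
print(f"unmanaged (WAF 20) : {lifetime_years(1.5, 3000, 2, 20):.1f} years")    # ~0.3
print(f"managed   (WAF 1.5): {lifetime_years(1.5, 3000, 2, 1.5):.1f} years")   # ~4.1
```

The unmanaged case wears out in months, which is the "short amount of time" the slide warns about; controlling write amplification is what buys back years.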
15. RACERUNNER OS: DESIGN AND OPERATION
[Diagram: host → fabric (iSCSI / FC / SRP) → HBAs → Unified Transport → RaceRunner (Block Translation Layer: Alignment | Linearization, Enhanced RAID, Data Integrity Layer) → three banks of NAND SSD x 8]
1. Write request from the host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to the BTL.
3. Incoming blocks are aligned to the native NAND page size.
4. Request is written to media.
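One way to picture the alignment/linearization step is as a log-structured buffer that coalesces small random host writes into full, page-aligned, sequential NAND writes. The toy sketch below assumes 4 KB host blocks and an 8 KB NAND page; the class name and details are illustrative, not the actual RaceRunner implementation.

```python
# Toy block translation layer: coalesce random 4 KB host writes into
# page-aligned, sequential NAND page programs. Sizes and structure are
# assumed for illustration; the real RaceRunner BTL is not public.

HOST_BLOCK = 4 * 1024
NAND_PAGE = 8 * 1024
BLOCKS_PER_PAGE = NAND_PAGE // HOST_BLOCK

class ToyBTL:
    def __init__(self) -> None:
        self.pending: list[tuple[int, bytes]] = []      # host blocks waiting to fill a page
        self.next_page = 0                              # log head: always the next free page
        self.mapping: dict[int, tuple[int, int]] = {}   # logical block -> (page, slot)

    def write(self, lba: int, data: bytes) -> None:
        """Buffer the host block; program one full page once enough accumulate."""
        self.pending.append((lba, data))
        if len(self.pending) == BLOCKS_PER_PAGE:
            self._flush_page()

    def _flush_page(self) -> None:
        for slot, (lba, _data) in enumerate(self.pending):
            self.mapping[lba] = (self.next_page, slot)  # remember where each block lives
        # One aligned, sequential page program instead of a read/erase/rewrite cycle.
        print(f"program page {self.next_page} <- LBAs {[lba for lba, _ in self.pending]}")
        self.next_page += 1
        self.pending = []

btl = ToyBTL()
for lba in (902, 17, 444, 3):                           # scattered 4 KB writes from the host
    btl.write(lba, b"\x00" * HOST_BLOCK)
```

Because the layer only ever programs the next free page, in page-sized units, the media never sees the whole-block read/erase/rewrite pattern from the previous slides, which protects both random write latency and endurance.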
16. THE DATA WAITING DAYS ARE OVER
[Diagram: scalability path across the product family]
• ACCELA: 1.5 TB – 12 TB, 250,000 IOPS, 1.9 GB/s bandwidth
• INVICTA: 2-6 nodes, 6 TB – 72 TB, 650,000 IOPS, 7 GB/s bandwidth
• INVICTA – INFINITY (Q1/13): 7-30 nodes, 21 TB – 360 TB, 800,000 – 4 million IOPS, 40 GB/s bandwidth
17. THE DATA WAITING DAYS ARE OVER

             ACCELA            INVICTA           INVICTA INFINITY
Height       2U                6U-14U            16U-64U
Capacity     1.5TB-12TB        6TB-72TB          21TB-360TB
IOPS         Up to 250K        250K-650K         800K-4M
Bandwidth    Up to 1.9GB/Sec   Up to 7GB/Sec     Up to 40GB/Sec
Latency      120µs             220µs             250µs
Interfaces   2/4/8 Gbit/Sec FC, 1/10 GbE, InfiniBand
Protocols    FC, iSCSI, NFS, QDR
Features     RAID protection & hot sparing, LUN mirroring and LUN striping,
             async replication, VAAI, write protection buffer
Options      vCenter Plugin, INVICTA Node Kit, INFINITY Switch Kit
18. MULTI-WORKLOAD REFERENCE ARCHITECTURE

Mercury Workload Engines (8 servers)

Workload Engine            Workload Type                                 Workload Demand
Dell DVD Store (MS SQL)    1,200 transactions per second (continuous)    4,000 IOPS, 0.05 GB/s
VMware View                600-desktop boot storm (2:30)                 109,000 IOPS, 0.153 GB/s
SQLIO (MS SQL Server)      Heavy OLTP simulation, 100% 4K writes         86,000 IOPS, 0.350 GB/s
                           (continuous)
SQLIO (MS SQL Server)      Batch report simulation, 100% 64K reads       16,000 IOPS, 1 GB/s
                           (continuous)

• INVICTA platform: 350,000 IOPS, 3.5 GB/s, 18 TB
• Combined workload demand: 215,000 IOPS, 1.553 GB/s
• RAID 5 HDD equivalent = 3,800 drives; RAID 10 HDD equivalent = 2,000 drives

In 2012 Mercury traveled to Barcelona, New York, San Francisco, Santa Clara, and Seattle demonstrating the ability to accelerate multiple workloads on solid state storage.
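The combined-demand figure is just the sum of the four workload rows; the quick check below also back-solves the per-drive IOPS implied by the HDD-equivalent counts (that last step is my own arithmetic, reflecting RAID write penalties, not a figure from the slide).

```python
# Sum the per-workload demand to the combined figure on the slide.
workloads = {
    "Dell DVD Store (MS SQL)": (4_000,   0.050),   # IOPS, GB/s
    "VMware View boot storm":  (109_000, 0.153),
    "Heavy OLTP (SQLIO)":      (86_000,  0.350),
    "Batch report (SQLIO)":    (16_000,  1.000),
}

total_iops = sum(iops for iops, _ in workloads.values())
total_gbs = sum(gbs for _, gbs in workloads.values())
print(f"{total_iops:,} IOPS, {total_gbs:.3f} GB/s")                     # 215,000 IOPS, 1.553 GB/s

# Per-drive IOPS implied by the slide's HDD-equivalent counts (my back-solve):
print(f"RAID 5 : ~{total_iops / 3_800:.0f} effective IOPS per drive")   # ~57
print(f"RAID 10: ~{total_iops / 2_000:.0f} effective IOPS per drive")   # ~108
```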
19. FASTER DATABASE BENCHMARKING
• AMD's systems engineering department needed to bring various database workloads up quickly and efficiently in the Opteron Lab, and to eliminate the time spent performance-tuning disk-based storage systems.
• Replaced 480 short-stroked hard disk drives with one 6 TB WHIPTAIL array supporting multiple storage protocols.
• $13,000 power cost reduction; 35U reduced to 2U.
• 50x reduction in latency.
• 40% improvement in database load times; the engineering team improved workload cycle times.
20. WHAT WHIPTAIL CAN OFFER:
• Throughput: 1.9 GB/s – 40 GB/s
• IOPS: 250K – 4M
• Latency: 120 µs
• Power: 90% less
• Floor space: 90% less
• Cooling: 90% less
• Endurance: 7.5 yrs guaranteed
• Cost: POA
• Making decisions faster
• Highly experienced: 250+ customers since 2009 for VDI, database, analytics, etc.
• Best-in-class performance at the most competitive price
Disk drives were designed around capacity, not speed. As a result, write performance is poor. This poor performance has had a profound impact on how IT operates as a whole.
1. A NAND page is the minimum addressable write element; at 25nm geometry a NAND page is between 4 and 8KB.
2. An ERASE-BLOCK is a grouping of NAND pages that can range anywhere from 128KB on a single die to 2MB when multiple die are striped.
3. You can write a NAND page individually, but you cannot RE-WRITE a page without bringing the entire block into a buffer, modifying its contents, erasing the block, and then re-writing the block.
This leads a lot of people down the road of deploying small-footprint servers or blades, but the physical constraints of these platforms don't allow room for enough hard disks in a host to deploy enough spindles to handle the load.
Vendors who deploy flash caching are aware of this and often deploy flash as a read-only cache layer, bypassing these challenges but introducing two new ones: cost, and the dreaded cache miss.
But, unfortunately, once you start putting flash drives in a standard array, you end up staring right back into the eyes of the dragons we mentioned before. Endurance, random write performance, and cost all rear their heads very quickly.
First and foremost, NAND flash has a physical endurance limit. You can only write to it a finite number of times before error rates rise to unacceptable levels; current MLC technology has a P/E rating of around 5,000. Without managing the write cycle, it is very easy to exceed this limit due to what is called "write amplification."
In 2012, Mercury traveled to Barcelona, New York, San Francisco, Santa Clara, and Seattle, demonstrating the advantages of consolidating workloads onto solid state storage.