The Need for Speed: Parallel I/O and the New Tick-Tock in Computing

Today’s Presenters
Copyright © 2015 by the Data Management Institute, LLC. All Rights Reserved.
Jon Toigo
Chairman, Data Management Institute
Sushant Rao
Senior Director of Product Marketing, DataCore
Presented By
Jon Toigo
Managing Principal, Toigo Partners International
Chairman, Data Management Institute
Ask anyone…
• Application performance, especially in virtual server settings, is
becoming a big problem…
• Flash memories and faster disk drives and interconnects seem to be a
temporary fix to the problem…
• What can be done? History provides a clue…
History repeats itself
• In the 1890s, Karl Benz, Gottlieb Daimler and
Wilhelm Maybach perfected the first
automobiles to run on petroleum.
• Other notables, including Frederick William
Lanchester and George F. Foss, improved
designs…though Foss drove his vehicle for four
years, ignoring official warnings of impending
arrest for his “mad antics.”
• Foss’s single-cylinder engine produced about .024 horsepower and delivered dizzying speeds of up to 24 mph (45 km/h)
The Need for Speed
• By 1906, two-, three-, four- and eight-cylinder engines began to appear.
• The V8, invented by Levavasseur in 1902 and first built in 1906 by engineers in Redondo Beach, CA, who called their engine “the Coyote,” essentially leveraged two 4-cylinder engines driving a common crankshaft.
• Rolls-Royce first manufactured a V8-engined vehicle, but the first mass-produced V8 car came from Cadillac, the Model 51, sporting 70 horsepower…
(Images: Buick Model F, a 2-cylinder touring car; a 4-cylinder Fiat with 48 horses and 56 mph (90 km/h); the Cadillac Model 51 (1915), the first production V8, making 70 horsepower.)
An example of how embracing parallelism drove
performance gains
• Over time, we learned to increase the number of
cylinders and to configure them to derive greater
power and speed
• 4-cylinder engines produce a power stroke every half crankshaft revolution; an 8-cylinder engine, every quarter revolution (see the arithmetic below)
• Power strokes are transmitted to the drive train to increase propulsion
• Crossplane crankshafts were introduced to reduce vibration from multi-cylinder parallel operation – though racing V8s continue to use a single-plane (flat-plane) crankshaft for faster acceleration
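The half-revolution and quarter-revolution figures follow directly from four-stroke timing; as a quick check:

```latex
% Four-stroke cycle: each cylinder fires once every two crankshaft revolutions,
% so an N-cylinder engine delivers N/2 power strokes per revolution.
\frac{N}{2}\ \text{strokes per revolution}: \quad
N = 4 \Rightarrow 2\ \text{per revolution (one every half turn)}, \qquad
N = 8 \Rightarrow 4\ \text{per revolution (one every quarter turn)}
```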
Today…
• Chevy Camaro SS creates 455 hp using a 6.2 liter
V8; the ZL1 model delivers 184 mph
• Ford Mustang GT does 435 hp with a 5.0 liter V8;
the Shelby GT500 version makes 189 mph
• The Challenger SRT Hellcat delivers 707 hp and
199 mph (burning 1.5 gallons of petrol per
minute) with a 6.4 liter V8
• Those are American “muscle cars”
Equal time to the Europeans
• The Alfa Romeo 4C boasts 237 hp from a 1.7 liter turbocharged 4-cylinder engine
• The 2015 Porsche Boxster delivers 265 hp from a 2.7 liter 6
cylinder engine, while the S and GTS produce 315 and 330 hp
respectively from a 3.4 liter 6
• Jaguar’s F Type delivers 190 mph and 495 horses from an
optional supercharged V8
• The BMW Z4 makes 249 hp from a turbo-charged 2.0 liter 4
cylinder, and sDrive35i models deliver 335 hp from a 3.0 liter
twin turbo six cylinder
• Ferrari 458 Italia makes 562 hp with a 4.5 liter V8
• Lamborghini Aventador makes well over 200 mph with a 6.5
liter 12 cylinder engine sporting 691 hp
Okay. We get it…
• Engine designers have been coupling many cylinders into parallel configurations, yielding all kinds of expensive go-fast cars that few of us can afford…
Mercedes-Benz SLS AMG – 583 hp from a 6.2 liter V8, 196 mph, starting at $221,580
Audi R8 – 4.2 liter V8 does 430 hp; 5.3 liter V10 does 550 hp; starting at $182,500
Lamborghini Huracan LP610-4 – 5.2 liter V10 for 602 hp, 202 mph, starting at $244,000
Porsche 918 Spyder – 4.6 liter V8 makes 608 hp, $845,000 (recently pulled for serial problems)
So it has been with application
performance…
• Server virtualization promised application performance and hosting efficiency…
• When it wasn’t delivered, different explanations and solutions were
proffered…
• Servers are overburdened with I/O tasks they shouldn’t need to execute – offload these tasks to array controllers (aka server motherboards attached to disk drives) to free up CPU for more important work
• Storage is too slow to keep up with the CPU – add lots of flash memory cache, spoofing (masking) the problem rather than solving it
• Storage is too slow to keep up with CPU – ditch “legacy” SAN storage for direct-attached
software-defined storage
• VMs producing random I/O, ultimately randomizing data placement on storage: deploy
log structuring to overcome the I/O blender effect
Innovation is good when it does the job…
• But when the problem is performance, shouldn’t we address the
causes of performance problems…
• Four basic categories
• Overloaded CPU
• Overloaded Memory
• Storage Latency
• Network Latency
• Focus has been on storage latency…
But to understand speed problems,
sometimes it helps to look at the track…
• The I/O path has lots of interlocking pieces…
STORAGE I/O – elements that handle the reading and writing of data to physical storage media
RAW I/O – elements ahead of storage that impact I/O performance
But some performance issues have to do
with inefficiencies in the engine…
• The CPU is the engine of the computer
• Like cylinders in an auto engine, chip
cores determine the capacity and
throughput of the CPU
• As with autos, CPUs have undergone
significant change over time…
From the 1960s until the early 1990s, much work in industry focused on making multiple low-power CPUs work together in parallel processing configurations for improved performance… (Image: the first transistorized CPU)
Then came the unicore tick-tock…
• Unicore (uniprocessor) – a computer system with a single central
processing unit. All processing tasks share a single CPU
Moore: “Transistors on a die will double every 24 months…”
House: “What he said. Plus, we will see chip clock speeds double every 18 months…”
(The arithmetic below spells out those two rates.)
ALU = Arithmetic Logic Unit
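Taking the two quotes above at face value, the stated doubling periods translate into simple exponential rates (a back-of-the-envelope illustration, with t in months):

```latex
N(t) = N_0 \cdot 2^{\,t/24} \quad\text{(Moore: transistor count)}, \qquad
f(t) = f_0 \cdot 2^{\,t/18} \quad\text{(House: clock speed)}
% e.g. over six years (t = 72): transistors x 2^3 = 8, clock speed x 2^4 = 16
```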
Ramifications…
• Computer designs based on sequential
processing and unicore chips drove the
PC and server revolution…
• Innovations and speed improvements
occurred too fast for the parallel
processing engineers to keep pace…
leading to some inherently limited
thinking…
It worked…until it didn’t
• Two things have happened
• First, unicore ran out of steam
(House’s Hypothesis peaked in
2004)
• Second, Moore’s Law proceeded
and transistor densities
continued to double
• Result: Multicore processors that
did not demonstrate significant
clock speed improvements…
Multicore chips + threading technology
enabled many more logical cores…
• Seized upon by Microsoft, VMware and others to do concurrent sequential
processing of application code
• But at the expense of I/O efficiency: I/O waits for sequential processing
Multicore chips, meet Parallel I/O
• Parallel I/O is simply the use of a percentage of available logical CPU cores to provide a scalable I/O processing engine…
• Derived from multiprocessor architecture, implemented by DataCore as easily deployed and configured software…
(Diagram: eight “logical” cores – allocate a percentage of logical cores to processing I/O exclusively… a minimal sketch of the idea follows.)
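As a rough sketch of the idea – not DataCore’s implementation – the snippet below reserves a configurable share of the host’s logical cores for a dedicated pool of I/O worker threads; the 25% fraction, the request queue and the handle_io callback are all hypothetical.

```python
import os
import queue
import threading

IO_CORE_FRACTION = 0.25  # hypothetical: reserve ~25% of logical cores for I/O

def start_io_engine(handle_io, fraction=IO_CORE_FRACTION):
    """Spin up one dedicated I/O worker thread per reserved logical core."""
    logical_cores = os.cpu_count() or 1
    io_workers = max(1, int(logical_cores * fraction))
    requests = queue.Queue()

    def worker():
        while True:
            req = requests.get()
            if req is None:          # shutdown sentinel
                break
            handle_io(req)           # the actual read/write against backing storage
            requests.task_done()

    for _ in range(io_workers):
        threading.Thread(target=worker, daemon=True).start()
    return requests

if __name__ == "__main__":
    q = start_io_engine(lambda req: print("servicing", req))
    for i in range(8):
        q.put(("write", i))
    q.join()                         # wait for the I/O workers to drain the queue
```

In a real product the reserved share, scheduling and core affinity are handled internally; the point of the sketch is only that I/O gets its own workers instead of waiting behind application threads.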
And in parallel…
Truth be told, comparatively few virtualized application
performance issues are storage related…
• Easy to see when they aren’t
• Despite what hypervisor vendors
may say, what you may need is
parallel I/O processing to alleviate
RAW I/O congestion and to
facilitate I/O operational
efficiency…
(Diagram: servers and storage – short storage I/O queues, but high CPU processing cycles and a hot CPU.)
There are other issues that contribute to
performance problems…
• The I/O Blender Effect is real and needs to be addressed to prevent randomization of data placement on storage devices
• Interconnects need to be properly
configured or virtualized for more
efficient allocation
• Storage devices need to be used in a
manner appropriate to capabilities
and workload requirements…
However, Parallel I/O establishes a new tick-tock in storage
that will only improve as cores multiply…
Copyright © 2015 DataCore Software Corp. – All Rights Reserved.
Impact of Parallel I/O on
Storage Performance
Sushant Rao, Sr. Director of Product Marketing
Storage (I/O) is the Bottleneck, especially for Virtualized Infrastructure
(Chart, 1990–2020: compute vs. storage performance – the performance gap between compute and storage keeps widening. Yearly performance gains: compute ~26%, storage ~2%. The arithmetic below shows how quickly that gap compounds.)
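Compounded year over year, those two growth rates are what open the gap; roughly:

```latex
\left(\frac{1.26}{1.02}\right)^{10} \approx 8.3
\quad\Longrightarrow\quad \text{after a decade, compute has pulled roughly } 8\times \text{ ahead of storage}
```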
Big Picture: DataCore
Parallel I/O Architecture
IO-Starved Virtualized Servers
(Chart, 2000–2020: compute work potential vs. serial IO. Increasingly faster uni-processors gave way to more cores per socket as CPU clock rates slowed down, and the gap between compute potential and serial IO – the IO gap – keeps widening.)
Serial vs. Parallel Processing
(Diagram: a pile of work divided into Load 1 through Load 5. One worker (serial) must process the loads one after another, stretching out the time to finish; five workers (parallel) each take a load and finish together in a fraction of the time.)
A minimal timing sketch of the same idea follows.
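A minimal timing sketch (the one-second “loads” are simulated with sleeps; any I/O-bound task behaves similarly):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load(n, seconds=1.0):
    """Stand-in for one unit of work (an I/O-bound 'load' simulated with a sleep)."""
    time.sleep(seconds)
    return n

def run(workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(load, range(5)))      # Load 1 .. Load 5
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"1 worker (serial):    ~{run(1):.1f} s")   # roughly 5 s
    print(f"5 workers (parallel): ~{run(5):.1f} s")   # roughly 1 s
```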
Modern Multi-core CPUs
(Diagram: a 10-core CPU shown as Worker 1 through Worker 10.)
Multiple “workers” capable of simultaneously handling compute, networking and I/O loads
Standard use of Multi-core CPUs in Virtual Servers
(Diagram: VM 1 through VM 5 each run on their own core – sequential computing with concurrency – while a single core carries all the serial I/O and the remaining cores sit idle. VM = Virtual Machine)
Serial I/O Bottleneck in Virtualized Server
(Diagram: the workload’s compute is spread across cores, but all of its I/O funnels through a single core while the other cores sit idle.)
• Compute waits on I/O
• CPU cores are wasted
• Very little work gets done
A minimal sketch of the effect follows.
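A minimal sketch of that bottleneck, assuming the serialized I/O path can be stood in for by a single lock: five “VM” threads alternate a little compute with a 50 ms I/O wait, and because only one I/O request can be in flight at a time, the waits stack up while the cores idle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

io_lock = threading.Lock()        # stands in for the single serialized I/O path

def serialized_io(ms=50):
    with io_lock:                 # only one I/O request in flight at a time
        time.sleep(ms / 1000)

def vm_task(_):
    for _ in range(4):            # each "VM" alternates a little compute with an I/O wait
        sum(range(10_000))
        serialized_io()

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as vms:    # five VMs, one per core
        list(vms.map(vm_task, range(5)))
    # 5 VMs x 4 requests x 50 ms all queue behind one lock: ~1 s of waiting
    print(f"elapsed with serialized I/O: {time.perf_counter() - start:.2f} s")
```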
Impact: Many servers needed to spread I/O
(Diagram: the workload is spread across Server 1 through Server 5.)
Turbo-Charge through Parallel I/O
(Diagram: the workload’s I/O is handled by multiple I/O cores working alongside the compute cores.)
• I/O keeps pace with compute demands
• CPU cores are fully used
• Lots of work gets done in very little time
The counterpart to the earlier bottleneck sketch follows.
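The counterpart to the bottleneck sketch above: if several I/O requests may be in flight at once (modeled here with a semaphore sized to a hypothetical pool of five I/O workers), the same workload’s waits overlap and elapsed time drops accordingly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

IO_WORKERS = 5                               # hypothetical: cores reserved for I/O
io_slots = threading.Semaphore(IO_WORKERS)   # up to 5 I/O requests in flight at once

def parallel_io(ms=50):
    with io_slots:
        time.sleep(ms / 1000)

def vm_task(_):
    for _ in range(4):            # same workload as the bottleneck sketch
        sum(range(10_000))
        parallel_io()

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as vms:
        list(vms.map(vm_task, range(5)))
    # The 50 ms waits now overlap: roughly 0.2 s instead of ~1 s
    print(f"elapsed with parallel I/O: {time.perf_counter() - start:.2f} s")
```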
Adaptive Parallel I/O
(Chart: workload vs. response time (millisec) and IOPS – as load ramps up, the system sustains 400,000 IOPS at under 1 millisecond response time, until there is no more load.)
A toy sketch of the adaptive behavior follows.
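“Adaptive” here means the number of I/O workers tracks the offered load. The toy pool below adds workers while its queue is deep and lets them stand down when the queue drains; it illustrates the behavior only and is not DataCore’s scheduling algorithm – the thresholds, timeouts and service time are made up.

```python
import queue
import threading
import time

class AdaptiveIOPool:
    """Toy pool: add I/O workers while the queue is deep, let them stand down when it drains."""

    def __init__(self, max_workers=8, scale_threshold=4):
        self.requests = queue.Queue()
        self.max_workers = max_workers
        self.scale_threshold = scale_threshold
        self.workers = 0
        self.lock = threading.Lock()

    def submit(self, request):
        self.requests.put(request)
        with self.lock:
            deep = self.requests.qsize() > self.scale_threshold
            if (self.workers == 0 or deep) and self.workers < self.max_workers:
                self.workers += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                request = self.requests.get(timeout=0.5)   # "no more load" -> stand down
            except queue.Empty:
                with self.lock:
                    self.workers -= 1
                return
            time.sleep(0.05)                               # stand-in for servicing the I/O
            self.requests.task_done()

if __name__ == "__main__":
    pool = AdaptiveIOPool()
    for i in range(40):                                    # ramp up the workload...
        pool.submit(("read", i))
    pool.requests.join()                                   # ...and wait for it to drain
    print("all I/O serviced; workers exit as the load disappears")
```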
DataCore’s Adaptive use of Multi-core CPUs in Virtual Servers
(Diagram: Worker 1 through Worker 10. VM 1 through VM 5 continue sequential computing with concurrency on their cores, while the remaining workers handle Parallel I/O. VM = Virtual Machine)
PARALLEL I/O  \ˈper-ə-ˌlel, ˈpa-rə-, -ləl\ \ˌī-(ˈ)ō\
Translations:
• Work completes in 1/5th* the time
• 2* machines can do the work of 10
*Varies based on number of I/O worker cores (the arithmetic appears below)
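The arithmetic behind those translations, under the slide’s own footnote that the factor depends on the number of I/O worker cores, k:

```latex
T_{\text{parallel}} \approx \frac{T_{\text{serial}}}{k}, \qquad
k = 5 \;\Rightarrow\; T_{\text{parallel}} \approx \tfrac{1}{5}\,T_{\text{serial}}
\;\Rightarrow\; \text{each machine does} \approx 5\times \text{the work, so } \tfrac{10}{5} = 2 \text{ machines suffice}
```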
DataCore Parallel I/O Breakthrough
• Work completes in 1/5 the time
• 2 machines can do the work of 10
• 5X lower overall solution cost
• All-inclusive simplicity: compute & storage services combined
Adaptive Parallel I/O available in 2 Products
SANsymphony-V
• Converged Storage Server
• Separate compute and shared storage tiers
• Virtualized and non-virtualized applications
Virtual SAN
• Hyper-converged
• Compute & shared storage on same nodes
• Virtualized applications
Free Trial: datacore.com/resources/software-downloads.aspx
DataCore Benefits at a Glance
Surveyed DataCore™ Customers Report Up To:
• 100% reduction in storage-related downtime
• 75% reduction in storage costs
• 4x capacity utilization
• 90% decrease in time spent on routine storage tasks
• 10x performance increase
Source: www.techvalidate.com
Proven. Globally.
• 25,000+ Deployments Worldwide
• 10,000+ Customers
• 10th Gen Product
• Companies in all Industries & Sizes
• Market: Software-defined Storage
• Technology: Storage Virtualization
Main Offices: Australia, Germany, France, Japan, UK, USA
Request a Live Demo
• Schedule a 15-minute live demo with one of our technical consultants at http://info.datacore.com/LiveDemo
Thank You!
www.datacore.com



Editor's Notes

  • #25 V16 – Updated October 12, 2015
  • #26 Key Points: Another issue is I/O performance. CPU, DRAM, network and bus speeds have increased a minimum of 700% to 10,000%, but hard disks have only gone up by 20% in performance. Script: In addition, I/O performance is another issue. You may think that you can stick all that data on high-capacity, slow hard disks – but then storage will be the bottleneck. Compute, memory, network and bus speeds have increased from a minimum of 700% to 10,000%, while hard drives have only seen a 20% increase in performance. The slower the storage system, the longer it takes to access and process the data.
  • #28–30 IO has long been a serial function designed for single-processor machines. While processor speeds climbed, IO processing benefited and kept pace with computational gains. However, when power and heat concerns flattened the CPU clock-speed ramp, chipmakers switched their focus to more cores per socket. Now, multi-core designs have the potential to do far more compute work across separate virtual machines than serial IO is able to process on a single CPU. In other words, serial IO limits the potential work that multi-core CPUs can accomplish, and that IO gap is widening. The physical machine can’t get enough input and output to keep all those processors busy computing – so CPUs go idle and we are forced to spread the work over more underutilized servers.
  • #31–34 The idea is simple: if you have only one worker, it takes a lot longer to complete the kinds of tasks that can be divided across multiple workers.
  • #35 Even the most modest server-class machines come equipped with multi-core CPUs. Take Lenovo’s ThinkServer RD530 with 10 CPU cores for under $2,000. Instead of having just one CPU working for you, the system effectively has 10 workers that can be used simultaneously. Several applications can run concurrently, with CPUs individually dedicated to them.
  • #73 Key Points: DataCore has 25,000+ deployments worldwide across more than 10,000 customers. DataCore’s storage software platform is in its 10th generation and is used by companies & organizations of all industries and sizes. Let’s discuss the best way to deploy DataCore in your environment.