Revolutionary Storage
For Modern Applications
Sanjay Sabnis
@sabhub1
Big Data Science Meetup
@Paviliondata
05/25/2018
Agenda
• Welcome
• Big Data Application Demands
• Next Generation Storage for Big Data Applications
• Big Data Use Cases
It is About Data
Expectations
Speed
Performance
Latency
Accuracy
Solving the Scalability Problem
RACK
Adding More Nodes
Add More Memory
Upgrade Networking
• Need more storage? Add more nodes ↔ comes with compute as well
• Under-utilization of storage - Islands of storage still exist
• Limited by Rack Level Data Management at Scale
• Network is not up to date to utilize new features of hardware
Infrastructure
Connectivity Cognition
2.0 3.0
How do we connect
the world?
How do we make
sense of the world?
Crossing the Chasm
AI
Autonomous Vehicles
Image Recognition
Von Neumann
Time to rethink infrastructure, tooling, and
development practices.
Modern Applications Requirements
• Compute/Network/Storage
• Rack Awareness/DC Awareness – Data Locality/HA
• Master/Slave - Scalable
• Master-less - Scalable
• IOPS, Bandwidth – High Data Transfer Rates – 25/50/100 GbE is
standard now.
• Storage Awareness? – This is something new!
• Non-compute Centric Data Management
The Compute & Storage Disconnect
• Compute and Storage Age Differently
• Compute has Moore’s Law, what about storage?
• Replacing Compute calls for replacing disks – Fixed Density, more $$$
• We all have been using SATA drives
• There is a new interface for SSDs called NVMe (Non-Volatile Memory
Express)
It is a logical device interface specification
Storage Protocol Differences
SATA NVMe
End of Freeway
Comparing NVMe to SATA
Metric                      SATA SSD   NVMe SSD   NVMe Difference
Read BW (MB/s)              500        3300       6.6X
4K Read IOPS                64K        830K       13X
Write BW (MB/s)             475        2100       4.4X
4K Write IOPS               5K         200K       40X
4K Mixed IOPS (70:30 R:W)   11K        550K       55X
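The "NVMe Difference" column is simply the NVMe figure divided by the SATA figure. A minimal sketch of that arithmetic follows, using only the numbers listed in the table; the variable and function names are illustrative. Recomputing from the listed values reproduces the first four rows after rounding, while the mixed-IOPS row works out to 50X rather than the 55X shown, which suggests one of the listed figures was rounded.

```python
# Figures copied from the comparison table: (SATA SSD, NVMe SSD).
rows = {
    "Read BW (MB/s)": (500, 3300),
    "4K Read IOPS": (64_000, 830_000),
    "Write BW (MB/s)": (475, 2100),
    "4K Write IOPS": (5_000, 200_000),
    "4K Mixed IOPS (70:30 R:W)": (11_000, 550_000),
}


def speedup(sata, nvme):
    """Return how many times larger the NVMe figure is than the SATA figure."""
    return nvme / sata


ratios = {name: speedup(sata, nvme) for name, (sata, nvme) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:5.1f}X")
```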
DATA INTENSIVE WORKLOADS
Analytics, IoT, Streaming Media, AI/ML, Databases
NVMe-oF SSD Array
• High Performance
• Cost Efficient
• High Utilization
• Scalable (14 TB to 1 PB)
• HA
• PaYG Model (Pay as You Grow)
NVMe-Based Storage for Big Data
Pavilion All-NVMe Storage Array
Advantages for Big Data Deployments
Reduce per-rack costs up to 72%
Improve Storage Utilization 2X+
Free up stranded capacity residing on DAS
Management Flexibility
Less raw storage deployed lowers IT Admin Costs
Move data sets from one server to another without copying
Reduce Infrastructure
Fewer servers required, or consolidate more DB instances per
server
Eliminate DAS SSDs
Leverage Full-Performance, Space-Efficient Copies
Performance (Latency & Bandwidth) of Direct Attached Storage
Serviceability and Data Management of Shared Storage
DAS
Performance and Cost Advantages
Index More Data With Splunk
Lower Costs of
NoSQL Deployments
Using networked Pavilion storage instead of direct-attached SSDs gives better performance per
server, allowing you to reduce server count and size, plus gain the cost advantages of a SAN
Pavilion All-NVMe Storage for Big Data
120 GB/s
PERFORMANCE
Up to 40 x 100GE
Ports
MODULAR
14TB – 1PB
CAPACITY
Up To 20, Active-
active Controllers
RESILIENCY
4 RU
DENSITY
RAID-6,
Snapshots, Thin
Provisioning
DATA MANAGEMENT
NVME & NVMEOF
100% STANDARDS COMPLIANT
X86, 2.5” NVMe
SSD
STANDARD OFF-THE-SHELF
COMPONENTS
1/10TH
$COST/IOPS
DISRUPTIVE ECONOMICS
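As a rough sanity check on how the bandwidth and port figures on this slide relate, the arithmetic can be sketched as below. It assumes decimal units (1 GB/s = 8 Gb/s) and ignores protocol overhead; both are simplifications, and the constant names are illustrative.

```python
import math

ARRAY_GBYTES_PER_S = 120   # "120 GB/s PERFORMANCE"
PORTS = 40                 # "Up to 40 x 100GE Ports"
PORT_GBITS_PER_S = 100     # line rate of one 100GbE port

# Convert the array's stated throughput from gigabytes to gigabits per second.
array_gbits = ARRAY_GBYTES_PER_S * 8

# Aggregate line rate if every port is populated.
port_total_gbits = PORTS * PORT_GBITS_PER_S

# Fewest 100GbE ports that could carry the stated throughput at line rate.
min_ports = math.ceil(array_gbits / PORT_GBITS_PER_S)

print(f"array throughput : {array_gbits} Gb/s")
print(f"40-port line rate: {port_total_gbits} Gb/s")
print(f"minimum ports    : {min_ports}")
```

Under these assumptions the stated throughput fits within the aggregate line rate of the ports, so the network side is not the limiting factor.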
Shared Block Storage For Big Data Applications
✓ Hosts connected using 25/40/50/100Gb Ethernet
✓ NVMe block storage presented to host servers using the community/standard NVMe-oF driver
✓ No custom host software required
✓ Tens of microseconds of latency
✓ Latency of DAS SSDs
✓ Full HA capability and hot-pluggable components
Thin-Provisioned
NVMe volumes
presented to the
host server
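Because the volumes are presented through the standard NVMe-oF driver, attaching one from a Linux host is typically done with the stock `nvme-cli` tool. The sketch below only builds the command line and does not run it. The target address and NQN are hypothetical placeholders, and RDMA on port 4420 is an assumed transport setup, not something stated on the slide.

```python
import shlex


def nvme_connect_argv(traddr, nqn, transport="rdma", trsvcid="4420"):
    """Build the nvme-cli argument list that attaches one NVMe-oF subsystem."""
    return [
        "nvme", "connect",
        "-t", transport,   # fabric transport type
        "-a", traddr,      # target address
        "-s", trsvcid,     # transport service ID (port)
        "-n", nqn,         # NVMe Qualified Name of the subsystem
    ]


# Hypothetical target address and subsystem name, for illustration only.
argv = nvme_connect_argv("192.0.2.10", "nqn.2018-05.example:vol1")
print(shlex.join(argv))
```

Once connected, the volume shows up as an ordinary NVMe block device on the host, which is why no custom host software is needed.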
Management Integration
REST API
Use Cases
Cassandra
C*
C*
C*
1
2
3
Volumes for Node2
a2 b2 c2
a Commit Log
b Data
c Log
Volumes for Node1
a1 b1 c1
Rack Scale
• Dense Compute Rack
• Easy to Add or Replace nodes
• Integrates into DevOps using REST API
• Thin provisioning to save flash resources
• Increase Volume Size Dynamically
• Manage instant data copies using REST API
~ 1PB Storage
Snapshot/Clone
Data Backup/Restore
Rack 1 Rack 2
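The per-node layout in the diagram (volumes a, b and c for commit log, data and log, suffixed with the node number) can be generated programmatically when nodes are added. This is a minimal sketch of that naming scheme only; the function name and the dictionary shape are assumptions, and no storage API is called.

```python
# Volume letters and their purpose, as labelled in the diagram.
ROLES = {"a": "commit log", "b": "data", "c": "log"}


def volume_plan(node_count):
    """Return {node: {volume_name: role}} for nodes 1..node_count."""
    plan = {}
    for n in range(1, node_count + 1):
        plan[f"node{n}"] = {f"{letter}{n}": role for letter, role in ROLES.items()}
    return plan


plan = volume_plan(3)
for node, volumes in plan.items():
    print(node, volumes)
```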
Adding a New Shard
• Adding Shard to the Cluster
• Add shard to scale the MongoDB cluster horizontally
• Affects the balance of chunks among the shards of a cluster for all
existing sharded collections.
• The balancer will begin migrating chunks so that the cluster will
achieve balance
• Rebalance will affect existing Read/Write and IOPS performance.
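To get a feel for how much data movement a new shard triggers, the toy model below moves one chunk at a time from the fullest shard to the emptiest until they are within a threshold of each other. It is a deliberately simplified illustration, not MongoDB's actual balancer algorithm; the chunk counts are made up.

```python
def rebalance(chunks, threshold=1):
    """Even out chunk counts; return (final_counts, migrations_performed)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1 to guarantee termination")
    chunks = dict(chunks)  # work on a copy
    migrations = 0
    while True:
        donor = max(chunks, key=chunks.get)
        receiver = min(chunks, key=chunks.get)
        # Stop once the fullest and emptiest shards are close enough.
        if chunks[donor] - chunks[receiver] <= threshold:
            return chunks, migrations
        chunks[donor] -= 1
        chunks[receiver] += 1
        migrations += 1


# Two existing shards with 60 chunks each, plus a newly added empty shard.
final, moved = rebalance({"shard1": 60, "shard2": 60, "shard3": 0})
print(final, moved)
```

Every one of those migrations is read from a donor and written to the new shard while production traffic continues, which is where the Read/Write and IOPS impact comes from.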
PRIMARY
SECONDARY
SECONDARY
SHARD 1
PRIMARY
SECONDARY
SECONDARY
SHARD 2
APP SERVER APP SERVER APP SERVER
REPLICA SET 1 REPLICA SET 2
Present Individual Volumes
For each node
PRIMARY
SECONDARY
SECONDARY
SHARD 3
REPLICA SET 3
New Shard
Speed Shard Rebalancing
• Pavilion Advantages
• No sizing activity required.
• No impact from the number of parallel chunk migrations: same IOPS for all with 40 ports
• Pre-configure Pavilion volumes for future shard expansion to automate the
scaling activity
• Over-provision the volume size to sustain IOPS performance as data grows
MongoDB - Leveraging Snapshots and Clones
PRIMARY
SECONDARY
SECONDARY
.
.
SECONDARY
Instant Clone
Point in Time Instant Pavilion Snapshots
PRODUCTION
PRIMARY
SECONDARY
SECONDARY
.
.
.
DEV/QA/PREPROD
Backup/Archive
Instant Clone
Use Clone to Scale Replica Set
Use Clone to spin up DEV/QA/PREPROD quickly
Pavilion Instant Clones
SECONDARY
Replication
• Scale MongoDB infrastructure without downtime
• Rapid volume cloning capabilities allow for new backup and deployment strategies
• Instant cloning makes node recovery and replacement easy
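One common way to make clones "instant" and space-efficient is to share unmodified blocks between the source and the clone and store only the blocks that change afterwards. The toy model below illustrates that general idea; it is an assumption for illustration and does not describe how Pavilion implements snapshots or clones.

```python
class Volume:
    """Toy block volume whose clones share unmodified blocks with the source."""

    def __init__(self, blocks=None, parent=None):
        self._blocks = dict(blocks or {})  # blocks written privately to this volume
        self._parent = parent              # frozen shared layer, if any

    def read(self, lba):
        if lba in self._blocks:
            return self._blocks[lba]
        if self._parent is not None:
            return self._parent.read(lba)
        return b""

    def write(self, lba, data):
        self._blocks[lba] = data

    def clone(self):
        # Freeze the current contents into a shared base layer, then let both
        # the source and the clone record later writes on top of that layer.
        base = Volume(self._blocks, self._parent)
        self._blocks, self._parent = {}, base
        return Volume(parent=base)

    def private_blocks(self):
        return len(self._blocks)


prod = Volume({0: b"users", 1: b"orders"})
dev = prod.clone()          # no block data is copied here
dev.write(1, b"test-data")  # the clone diverges
prod.write(0, b"users-v2")  # production keeps changing independently
print(dev.read(0), dev.read(1), prod.read(0), prod.read(1))
```

In this model the clone stores only the single block it changed, and neither side sees the other's later writes.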
Reduce Splunk Indexer Sprawl
HOT
WARM
COLD
FROZEN
Tier 1 - $$$$
Tier 2 - $$$
Tier 3 - $$
Tier 4 - $
Backup
Read-Only Snapshots
QA/Dev/PreProd Testing
R/W Clones
Consolidate All Splunk Data on One High-Speed Storage Platform, Simplify Backup and Copy Management
Addressing Splunk Challenges
Splunk Solution Design Considerations
Insufficient disk I/O is the most common limitation in Splunk infrastructure
Pavilion delivers over 100 GB/s of bandwidth, and 20 Million IOPS from a
compact, 4U Chassis, which can power even the largest Splunk
deployments
Review the disk subsystem requirements before provisioning your hardware
Pavilion’s scalable platform allows you to focus on the needs of the
compute infrastructure instead of storage
More disks (specifically, more spindles) are better for indexing performance
Pavilion’s low latency storage platform eliminates storage as the indexing
bottleneck
Total throughput of the entire system is important.
Pavilion delivers significant improvements in performance and improves
decision times.
The ratio of disks to disk controllers in a particular system should be higher,
similar to how you provision a database host
Pavilion’s performance and capacity allow for easy storage configuration.
Hot Bucket – Cannot Backup
Take backup of any volume any time without performance overhead on
indexing nodes by using the Pavilion Snapshot feature
Modernize Database Deployments
✓ Simplify infrastructure by disaggregating storage into a centralized, rack-scale appliance
✓ Leverage shared storage resources at the speed and latency of local SSDs
✓ Reduce raw flash required
✓ Independently scale compute, networking and storage to maximize flexibility
✓ Move to ‘storage-less’ 1U servers to increase compute density per rack
✓ Centralize storage resources to facilitate easy backup and restore
✓ Instantly deploy new copies of the database for test/dev/QA purposes
DENSE
Compute CLUSTER
Other Use Cases
…………
New Data Architectures
Centralized Logging
“We are a log management company that happens
to stream videos”
- Netflix Chief Architect
Log Monitoring/Forwarding/….
No Log Forwarding from each Node
Save CPU Cycles
Container Architecture - Cloud
• Fits into Kubernetes or OpenStack implementations
• Integrate Pavilion REST API with the Cinder wrapper provided by Pavilion
• Storage can be used as Static or Dynamic Volume provisioning
• Fits readily into DevOps CI/CD setup with provided REST API interfaces
• Utilize the Pavilion Snapshot, Clone and volume migration features to manage data beyond lifecycle of the virtual image
• Supports Block Storage, NFS (S3 support in near future).
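For the static provisioning case mentioned above, a Kubernetes administrator registers an existing volume as a PersistentVolume object. The sketch below builds such a manifest as JSON, which `kubectl apply -f` accepts alongside YAML. The volume name, size, CSI driver name and volume handle are hypothetical placeholders; the slides do not name the actual driver.

```python
import json


def static_pv(name, size, driver, volume_handle):
    """Build a PersistentVolume manifest for a pre-created raw block volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "volumeMode": "Block",                      # expose as a raw block device
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",  # keep data after release
            "csi": {"driver": driver, "volumeHandle": volume_handle},
        },
    }


# All four arguments are illustrative values, not real identifiers.
manifest = static_pv("mongo-shard3-data", "500Gi", "csi.example.com", "vol-0001")
print(json.dumps(manifest, indent=2))
```

Dynamic provisioning would instead use a StorageClass that names the driver, so volumes are created on demand when a claim is submitted.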
Kubernetes
Pod
Nova
Keystone
Boot
Launch
Authentication
Persistent
Volume
Docker
Kubernetes Cluster - Datacenter
OpenStack
CSI
Wrapper
Cinder Block
Storage Volumes
Rack Scale Flash Array
Docker’s Containers-as-a-service
(CaaS) platform that can run atop
cloud-based infrastructure such as
OpenStack, or on bare metal
infrastructure, providing complete
application lifecycle management
for container deployments.
HiBD (High-Performance Big Data)
• NVMe-oF opens up opportunity for commoditizing the HiBD
• RDMA + NVMe = Killer IOPS & Bandwidth
• Lots of Development has been done using RDMA-based HiBD
Apache Crail - Incubating
Pavilion - 120 GB/s
With DAS Latency
Crail is designed from ground up
for modern high-performance
networking and storage
hardware (RDMA, NVMe, NVMf,
etc.). It leverages user-level I/O to
access hardware directly from
the application context, providing
bare-metal I/O performance to
analytics workloads.
Storage Awareness
Revolutionary Storage for Modern Databases, Applications and Infrastructure
