REFERENCE ARCHITECTURES
WHAT’S NEW
Brent Compton
Director Storage Solution Architectures
Red Hat
Red Hat Storage Day NYC, October 2016
Reference Architecture Work
MYSQL & HADOOP | SOFTWARE-DEFINED NAS | DIGITAL MEDIA REPOS
Reference Architecture Work
MYSQL & HADOOP | SOFTWARE-DEFINED NAS | DIGITAL MEDIA REPOS
(link)
Appetite for Storage-Centric Cloud Services
AWS service -> on-premises OpenStack/Ceph analog:
EC2 -> KVM (Nova)
EBS -> RBD (Cinder)
S3 -> RGW (Swift)
RDS -> MySQL
EMR -> Hadoop
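To make the EBS-to-RBD row concrete, the sketch below creates an RBD block image with the python-rados/python-rbd bindings, the on-premises analog of provisioning an EBS volume. It assumes a reachable Ceph cluster, /etc/ceph/ceph.conf, and a pool named "rbd"; the image name and size are illustrative only.

    # Minimal sketch, not a production workflow: provision a 100 GiB RBD image,
    # which Cinder would then attach to a KVM guest the way EBS attaches to EC2.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # assumed config path
    cluster.connect()
    ioctx = cluster.open_ioctx("rbd")                       # assumed pool name
    try:
        rbd.RBD().create(ioctx, "mysql-vol01", 100 * 2**30) # 100 GiB block image
    finally:
        ioctx.close()
        cluster.shutdown()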
Cloud Storage Features Useful for MySQL
HEAD-TO-HEAD LAB
TEST ENVIRONMENTS
AWS environment:
•  EC2 r3.2xlarge and m4.4xlarge instances
•  EBS Provisioned IOPS and GP-SSD volumes
•  Percona Server
On-premises environment:
•  Supermicro servers
•  Red Hat Ceph Storage RBD
•  Percona Server
AWS IOPS/GB BASELINE: ~ AS ADVERTISED
(Chart: IOPS/GB, 100% read and 100% write)
100% Read:  P-IOPS m4.4xl = 30.0, P-IOPS r3.2xl = 29.8, GP-SSD r3.2xl = 3.6
100% Write: P-IOPS m4.4xl = 25.6, P-IOPS r3.2xl = 25.7, GP-SSD r3.2xl = 4.1
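IOPS/GB in these charts is simply the IOPS measured against a MySQL instance (Sysbench) divided by the size of its data volume in GB; EBS Provisioned-IOPS volumes were advertised at the time at up to roughly 30 IOPS per provisioned GB, which is why the read bars sit near 30. A minimal sketch of the normalization, with illustrative numbers that are not from the deck:

    def iops_per_gb(measured_iops: float, volume_gb: float) -> float:
        """Normalize a Sysbench result to the IOPS/GB metric used in these charts."""
        return measured_iops / volume_gb

    # Illustrative only: a 200 GB Provisioned-IOPS volume sustaining 6,000 read IOPS.
    print(iops_per_gb(6_000, 200))   # -> 30.0, i.e. the ~30 IOPS/GB EBS P-IOPS ceiling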
IOPS/GB PER MYSQL INSTANCE
(Chart: MySQL IOPS/GB, reads and writes)
Reads:  P-IOPS m4.4xl = 30, Ceph cluster 1x "m4.4xl" (14% capacity) = 252, Ceph cluster 6x "m4.4xl" (87% capacity) = 150
Writes: P-IOPS m4.4xl = 26, Ceph cluster 1x "m4.4xl" (14% capacity) = 78, Ceph cluster 6x "m4.4xl" (87% capacity) = 19
FOCUSING ON WRITE IOPS/GB
AWS IO THROTTLING LEVEL FOR DETERMINISTIC PERFORMANCE
(Chart: write IOPS/GB) P-IOPS m4.4xl = 26, Ceph cluster 1x "m4.4xl" (14% capacity) = 78, Ceph cluster 6x "m4.4xl" (87% capacity) = 19
EFFECT OF CEPH CLUSTER LOADING ON IOPS/GB
(Chart: IOPS/GB, 100% write, vs. cluster capacity utilization)
Ceph cluster at 14% capacity = 78, at 36% capacity = 37, at 72% capacity = 25, at 87% capacity = 19
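A toy model of why loading matters: the cluster's aggregate write IOPS is roughly bounded by its drives and CPUs, so filling it with more MySQL data spreads that budget over more GB. The sketch below uses made-up capacity and IOPS numbers purely to show the shape of the effect; the measured curve above falls more gently than this naive 1/x model.

    # Illustrative assumptions only, not the deck's measurements.
    def iops_per_gb(cluster_write_iops: float, usable_gb: float, pct_full: float) -> float:
        return cluster_write_iops / (usable_gb * pct_full)

    for pct in (0.14, 0.36, 0.72, 0.87):
        print(f"{pct:.0%} full -> {iops_per_gb(100_000, 20_000, pct):.0f} IOPS/GB")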
$/STORAGE-IOP* FOR COMPARABLE CONFIGS
(Chart: storage $/IOP, Sysbench write)
AWS EBS Provisioned-IOPS = $2.40
Ceph on Supermicro FatTwin, 72% capacity = $0.80
Ceph on Supermicro MicroCloud, 87% capacity = $0.78
Ceph on Supermicro MicroCloud, 14% capacity = $1.06
* Ceph configs do not include power, cooling, or admin costs
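The $/storage-IOP metric is total storage cost divided by sustained Sysbench write IOPS. A hedged sketch with illustrative inputs (not the deck's pricing; note the Ceph bars above also exclude power, cooling, and admin costs):

    def dollars_per_iop(total_storage_cost: float, sustained_write_iops: float) -> float:
        return total_storage_cost / sustained_write_iops

    # Hypothetical: a $60,000 Ceph cluster sustaining 75,000 Sysbench write IOPS.
    print(f"${dollars_per_iop(60_000, 75_000):.2f} per write IOP")   # -> $0.80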
CONSIDERING CORE-TO-FLASH RATIO
(Chart: IOPS/GB, 100% write, by Ceph cluster configuration)
80 cores, 8 NVMe (87% capacity) = 18
40 cores, 4 NVMe (87% capacity) = 18
80 cores, 4 NVMe (87% capacity) = 19
80 cores, 12 NVMe (84% capacity) = 6
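Reading the chart as a ratio rather than absolute counts: the configurations near 10-20 cores per NVMe device deliver 18-19 write IOPS/GB, while roughly 6.7 cores per NVMe (80 cores across 12 devices) drops to 6. A small calculation pairing each configuration with its ratio, using the values as read from the chart above:

    configs = [
        ("80 cores / 8 NVMe",  80, 8,  18),
        ("40 cores / 4 NVMe",  40, 4,  18),
        ("80 cores / 4 NVMe",  80, 4,  19),
        ("80 cores / 12 NVMe", 80, 12, 6),
    ]
    for name, cores, nvme, write_iops_per_gb in configs:
        print(f"{name}: {cores / nvme:4.1f} cores per NVMe -> {write_iops_per_gb} IOPS/GB (write)")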
SUPERMICRO MICROCLOUD
CEPH MYSQL PERFORMANCE SKU
8x nodes in a 3U chassis (1x CPU + 1x NVMe + 1x SFP+ per node)
Model: SYS-5038MR-OSDXXXP
Per-Node Configuration:
CPU: Single Intel Xeon E5-2630 v4
Memory: 32GB
NVMe Storage: Single 800GB Intel P3700
Networking: 1x dual-port 10G SFP+
Enhancing On-premises MySQL Scalability
Appetite for Storage-Centric Cloud Services
AWS service -> on-premises OpenStack/Ceph analog:
EC2 -> KVM (Nova)
EBS -> RBD (Cinder)
S3 -> RGW (Swift)
RDS -> MySQL
EMR -> Hadoop
Trend: Disaggregating Hadoop Compute and Storage
•  Hadoop retained data is growing at a faster pace than Hadoop compute needs.
•  No one wants to waste money on unneeded compute just to add more Hadoop data nodes.
•  This is driving a trend to disaggregate storage from traditional Hadoop nodes (eBay blog on tiering – here).
•  Multiple disaggregation architecture options exist; the data flow slides that follow sketch them.
Data Flow Options (Traditional, Partial Disaggregation)
(Diagram) Traditional: ingest copy lands in HDFS; MapReduce/Pig, Spark, and HBase/Hive run against HDFS.
Partial disaggregation: ingest copy still lands in HDFS; aging data moves to Ceph and is retrieved via S3A; MapReduce runs against the cold data in Ceph over S3A, while MapReduce/Pig, Spark, and HBase/Hive continue to run against HDFS.
Data Flow Options (Full Disaggregation)
(Diagram) All data is ingested directly into Ceph via S3. MapReduce/Pig and Hive/HBase jobs read from Ceph via S3A; MapReduce and Spark jobs on hot data use HDFS over RBD volumes; non-Hadoop tools access the same data via S3.
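A hedged sketch of the S3A leg of these data flows from a Spark job: the S3A connector is pointed at a Ceph RGW endpoint instead of AWS S3. It assumes the hadoop-aws/S3A jars are on the classpath; the endpoint, credentials, bucket, and paths are placeholders, not values from the deck.

    from pyspark.sql import SparkSession

    # Point Hadoop's S3A connector at a Ceph RGW endpoint (all values are placeholders).
    spark = (
        SparkSession.builder.appName("ceph-s3a-sketch")
        .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    # Read cold data that has aged out of HDFS into Ceph, then write results back.
    events = spark.read.parquet("s3a://cold-data/events/2016/")
    events.groupBy("event_type").count().write.parquet("s3a://cold-data/reports/event_counts/")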
Reference Architecture Work
MYSQL & HADOOP | SOFTWARE-DEFINED NAS | DIGITAL MEDIA REPOS
•  Digital object repo
•  Digital file repo (WIP)
(link)
(Chart: size of common things, in MB) UHD movie = 100,000; Blu-ray movie = 25,000; HD movie = 12,000; DVD movie = 3,000; Audio CD = 750; MP3 song = 4; e-book = 1
Different Servers Yield 10x Ceph Performance
1 DVD movie/sec with 3-node* cluster A
1 Blu-ray movie/sec with 3-node* cluster B
*Both A & B cluster nodes are 2U servers
Ceph Nodes Saturating 80GbE Pipes
1MB Seq. Read = 28.5 GB/sec; 1MB Seq. Write = 6.2 GB/sec
… and note optimal CPU/SSD ratio for IOPS
4KB Random Read = 693K IOPS; 4KB Random Write = 87.8K IOPS
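To relate the IOPS figures to the 80GbE pipes, convert everything to bytes per second. The arithmetic below just restates the numbers above in common units, taking 80 Gbit/s as roughly 10 GB/s of line rate per node:

    def iops_to_gb_per_sec(iops: float, block_size_kb: float) -> float:
        return iops * block_size_kb * 1024 / 1e9

    print(iops_to_gb_per_sec(693_000, 4))   # ~2.8 GB/s carried by 693K 4KB random reads
    print(iops_to_gb_per_sec(87_800, 4))    # ~0.36 GB/s carried by 87.8K 4KB random writes

    line_rate_gb_per_sec = 80e9 / 8 / 1e9   # 80 Gbit/s ~= 10 GB/s per node
    print(28.5 / line_rate_gb_per_sec)      # 28.5 GB/s of 1MB seq. reads ~= 2.85 nodes' worth of 80GbE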
Sample High Throughput Config for Ceph
• 2x Intel Xeon E5-269x v3 (up to 145W per CPU)
• 4x-24x 2.5" hot-swap Samsung NVMe SSDs
• 16x DDR4 2133MHz L/RDIMM slots, up to 1024GB
• 2x 16-lane PCIe Gen3 slots
• 2x dual-port 40 GbE NICs (with 100 GbE option)
• EIA-310-D 19" rack mount, 2U
Scaling Ceph Object Storage (RGW) to NIC Saturation
Reference Architecture Work
MYSQL & HADOOP | SOFTWARE-DEFINED NAS | DIGITAL MEDIA REPOS
(link)
Server Category Nomenclature
Contemporary storage server chassis categories:

Flash Blades: SSD (NVMe hi-perf); 1-4 media drives/node; 2-8 server nodes per chassis; CPU sizing*** 10 cores per SSD; 10GbE networking; target IO pattern: small random IO. Vendor examples: Supermicro MicroCloud, Samsung Sierra 2U/4*

Flash Array: SSD (NVMe mid-perf); 4-24 media drives/node; 1 server node per chassis; CPU sizing*** 4 cores per SSD; 40GbE+ networking; target IO pattern: large sequential IO. Vendor examples: SanDisk InfiniFlash**, Samsung Sierra 2U/24

Standard: HDD + SSD; 12-16 media drives/node; 1 server node per chassis; CPU sizing*** 1 core per 2 HDD; 10GbE networking; target IO pattern: mixed. Vendor examples: Supermicro 2U/12, Quanta/QCT 1U/12, Dell R730XD, Cisco C240M, Lenovo X3650

Dense: HDD + SSD; 24-46 media drives/node; 1-2 server nodes per chassis; CPU sizing*** 1 core per 2 HDD; 10GbE (archive) or 40GbE (active) networking; target IO pattern: mixed. Vendor examples: Supermicro 4U/36, Quanta/QCT 4U/35x2, Dell DSS7000/2, Cisco C3260

Ultra-dense: HDD; 60-90 media drives/node; 1 server node per chassis; CPU sizing*** 1 core per 2 HDD; 10GbE (archive) or 40GbE (active) networking; target IO pattern: large sequential IO. Vendor examples: Supermicro 4U/72, Quanta/QCT 4U/76, Dell DSS7000/1, Cisco C3160

* smaller flash array   ** JBOF with servers   *** characterized for Ceph
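The CPU sizing column is the one most often turned into a bill of materials. A hedged helper that encodes those rules of thumb (characterized for Ceph, per the footnote); the ratios come from the table above, while the example node configurations are illustrative:

    # Rule-of-thumb CPU sizing from the table (characterized for Ceph).
    CORES_PER_DEVICE = {
        "flash blades": 10.0,   # 10 cores per SSD
        "flash array": 4.0,     # 4 cores per SSD
        "standard": 0.5,        # 1 core per 2 HDD
        "dense": 0.5,
        "ultra-dense": 0.5,
    }

    def recommended_cores(category: str, media_drives_per_node: int) -> float:
        return CORES_PER_DEVICE[category.lower()] * media_drives_per_node

    print(recommended_cores("Flash Blades", 1))    # -> 10 cores for a 1x NVMe MicroCloud-style node
    print(recommended_cores("Ultra-dense", 72))    # -> 36 cores for a 72-HDD ultra-dense node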
Size of Common Things
(Chart, in MB) UHD movie = 100,000; Blu-ray movie = 25,000; HD movie = 12,000; DVD movie = 3,000; Audio CD = 750; MP3 song = 4; e-book = 1
Jumbo File Performance Comparison
(4GB files with 4MB sequential IO - think DVD video size)
(Chart: MB/sec per drive (HDD), read and write, for the following configurations)
Manufacturer spec (HDD)
Standard, baseline (1x RAID6 vol, no Gluster)
Dense, baseline (3x RAID6 vol, no Gluster)
Standard (EC4:2), JBOD bricks
Dense (2x Rep), RAID6 bricks
Standard (2x Rep), RAID6 bricks
Dense (EC4:2), RAID6 bricks
Standard (EC4:2), RAID6 bricks
Jumbo File Price-Performance Comparison
(4GB files with 4MB sequential IO - think DVD video size)
(Chart: MB/sec per $, read and write, for the following configurations)
Standard (EC4:2), JBOD bricks
Dense (2x Rep), RAID6 bricks
Standard (2x Rep), RAID6 bricks
Dense (EC4:2), RAID6 bricks
Standard (EC4:2), RAID6 bricks
3-Year TCO (incl. support)
TCO COMPARISON
For 1PB Usable Capacity, Throughput-optimized Solutions
Configuration Highlights:
•  HDD-only media
•  2x replication with RHGS
•  8:3 erasure coding with EMC Isilon
•  Higher CPU-to-media ratio than archive-optimized
Compared building blocks: EMC Isilon X-210 vs. 12LFF servers (standard); EMC Isilon X-410 vs. 36LFF servers (dense)
Pricing sources: EMC Isilon per Gartner Competitive Profiles (as of 2/16/16); Supermicro per Thinkmate (as of 1/13/16)
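The protection schemes named in the highlights drive much of the raw-capacity difference behind a 1PB-usable comparison. A back-of-the-envelope for the protection overhead only; it ignores spare capacity, fill-level targets, and filesystem overhead:

    # Raw capacity needed for 1 PB usable under the protection schemes named above.
    # Overhead factors are arithmetic consequences of the schemes, not deck figures.
    USABLE_PB = 1.0

    schemes = {
        "RHGS 2x replication": 2.0,             # 2 full copies -> 2.0x raw per usable
        "Isilon 8:3 erasure coding": 11 / 8,    # 8 data + 3 protection -> 1.375x raw per usable
        "Gluster EC4:2 (for reference)": 6 / 4, # 4 data + 2 protection -> 1.5x raw per usable
    }

    for name, overhead in schemes.items():
        print(f"{name}: {USABLE_PB * overhead:.3f} PB raw for {USABLE_PB:.0f} PB usable")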
Comparing Throughput and Costs at Scale
(Charts)
STORAGE PERFORMANCE SCALABILITY: reads/writes throughput (MBps) vs. number of storage nodes, software-defined scale-out storage vs. traditional enterprise NAS storage
STORAGE COSTS SCALABILITY: total storage costs ($) vs. number of storage nodes, software-defined scale-out storage vs. traditional enterprise NAS storage
Small File Performance Comparison
(50KB files - think small jpeg image size)
(Chart: file operations/second per drive (HDD), read and create, 50KB files, for the following configurations)
Dense (EC4:2), tiered (2x SSD/svr)
Dense, no tiering
Dense, tiered (2x SSD/svr)
Standard, no tiering
Standard, tiered (4x SSD/svr)
Standard, tiered (2x NVMe/svr), 70% full
Standard, tiered (2x NVMe/svr)
Standard, tiered (1x NVMe/svr)
Try it
Building your first software-defined storage cluster
Test drive: bit.ly/cephtestdrive (Ceph) | bit.ly/glustertestdrive (Gluster)
THANK YOU
