FAST Storage Design Basics for EMC VNX
 

The session introduces the notion of skew and outlines two approaches to using tiered storage to leverage it. It also covers FAST Cache and FAST VP with SSD, when to use one or both together, and what tools are available to help right-size the tiered configuration, as well as the use of SAS drives between the SSD and NL-SAS layers.


After this session you will be able to:
Objective 1: Understand a skew measurement from an EMC field engineer.
Objective 2: Understand how pools are built and the ramifications of pool design.
Objective 3: Understand the basics of pool design at the system level.

FAST Storage Design Basics for EMC VNX – Presentation Transcript

  • FAST Storage Design Basics for EMC VNX Mick Turner – Unified Storage Division © Copyright 2013 EMC Corporation. All rights reserved. 1
  • Roadmap Information Disclaimer  EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).  Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.  Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC NonDisclosure Agreement in place with your organization. © Copyright 2013 EMC Corporation. All rights reserved. 2
  • Agenda
    – FAST VP and FAST Cache Mechanics
    – The variables that affect your design
    – Visualization of Skew
    – Applications and Skew: How much is enough? Do I have skew? (examples); File and FAST
    – How to estimate Skew for Tiered Design: ROT ‘5/10/85’; Data decay; Measured (Tier Advisor and Heat Map); Sizing for skew (VNX Sizer)
    – Pool Design
    © Copyright 2013 EMC Corporation. All rights reserved. 3
  • FAST VP and FAST Cache Mechanics © Copyright 2013 EMC Corporation. All rights reserved. 4
  • Basic Sizing Rules of Thumb
    Drive Type   | Small (~8 KB) Random I/O ROT | Large (~256 KB) Sequential I/O ROT
    SSD          | 3500 IOPS                    | 100 MB/s
    SAS 10K rpm  | 150 IOPS                     | 35 MB/s
    SAS 15K rpm  | 180 IOPS                     | 35 MB/s
    NL-SAS       | 90 IOPS                      | 30 MB/s
    – Determine host load (SAR, Perfmon, NAR, USAT)
    – Rough estimate of the read/write mix
    – Apply RAID factors to get back-end load: R1/0 = reads + 2 × writes; R5 = reads + 4 × writes; R6 = reads + 6 × writes
    – Determine disk capability: ROT × disk count
    – Determine disk saturation: Disk Sat = back-end load / disk capability; aim for 70% or less
    © Copyright 2013 EMC Corporation. All rights reserved. 5
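The arithmetic on this slide is easy to script. A minimal Python sketch using the rule-of-thumb IOPS above; the host load, read/write mix and drive count in the example are hypothetical:

```python
# Sketch of the sizing arithmetic above; the workload numbers are hypothetical.
ROT_IOPS = {"SSD": 3500, "SAS 10K": 150, "SAS 15K": 180, "NL-SAS": 90}  # small random I/O per drive
RAID_WRITE_PENALTY = {"R5": 4, "R6": 6, "R1/0": 2}                      # back-end I/Os per host write

def backend_iops(host_iops, read_fraction, raid):
    """Back-end load = reads + (RAID write penalty * writes)."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + RAID_WRITE_PENALTY[raid] * writes

def disk_saturation(host_iops, read_fraction, raid, drive_type, drive_count):
    """Ratio of back-end load to what the drives can deliver; aim for <= 0.70."""
    capability = ROT_IOPS[drive_type] * drive_count
    return backend_iops(host_iops, read_fraction, raid) / capability

# Example: 10,000 host IOPS, 70% reads, RAID 5 on 10K SAS, 120 drives
load = backend_iops(10_000, 0.70, "R5")                     # 7,000 + 4*3,000 = 19,000 back-end IOPS
sat = disk_saturation(10_000, 0.70, "R5", "SAS 10K", 120)
print(f"back-end IOPS: {load:.0f}, saturation: {sat:.0%}")  # ~106% -> add drives or tiers
```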
  • FLASH 1st Data Strategy – Hot data on FAST Flash SSDs, cold data on dense disks
    – Highly active data is stored on Flash SSDs for fastest response time
    – As data ages, activity falls, triggering automatic movement to high-capacity disk drives for lowest cost
    [Figure: data activity versus data age, with a movement trigger from Flash SSD to high-capacity HDD as activity falls]
    © Copyright 2013 EMC Corporation. All rights reserved. 6
  • The FAST Suite Dynamically Optimizes I/O at 64 KB Granularity
    – FAST Cache: caches data from the HDD or NL-HDD tiers in the pool to SSD; operates at a page granularity of 64 KB
    – FAST VP: dynamically moves data between SSD, HDD and NL-HDD tiers in the storage pool; operates at a slice granularity of 1 GB
    – FAST Cache is 16,384 times more granular than a FAST VP slice; deploying both together ensures maximum I/O granularity
    © Copyright 2013 EMC Corporation. All rights reserved. 7
  • FAST VP Mechanics – Dynamic, self-optimizing storage pools based on actual data activity
    – FAST VP operates using the slice allocation paradigm; granularity is 1 GB
    – Slice activity is tracked with a geometric moving average, which favors recent data over older data
    – Slices can then be relocated according to the user’s policy; policy is set on a per-LUN basis
    – Slices are relocated during a window: defaults to an 8-hour window overnight and daily migrations; the user can change relocation times
    © Copyright 2013 EMC Corporation. All rights reserved. 8
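The slide says slice activity is tracked with a geometric moving average that favors recent data. The exact weighting FAST VP uses is not given here; the sketch below only illustrates the idea, with an assumed decay factor:

```python
# Illustration of a geometric (exponentially weighted) moving average for slice temperature.
# The decay factor is an assumption for illustration, not the value FAST VP actually uses.
DECAY = 0.5  # weight given to history each period; smaller = favors recent activity more

def update_temperature(previous_temperature, iops_this_period, decay=DECAY):
    """Blend the latest measurement with history so recent activity dominates."""
    return decay * previous_temperature + (1 - decay) * iops_this_period

# A slice that was idle and suddenly becomes busy warms up within a few periods.
temp = 0.0
for observed_iops in [0, 0, 500, 500, 500]:
    temp = update_temperature(temp, observed_iops)
    print(round(temp, 1))   # 0.0, 0.0, 250.0, 375.0, 437.5
```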
  • FAST Cache – More Flexibility and Speed
    – FAST Cache reacts in near-real-time
    – FAST Cache granularity is small – 64 KB
    – No migration needed; no Pools needed – works with Classic, Thick and Thin Pool LUNs
    – What FAST Cache does: provides higher IOPS for busy areas currently bound by the current drive capabilities
    – What FAST Cache does not do: it does not reduce CPU utilization – in fact, SP CPU may increase
    – High write workloads relocated to FAST Cache can reduce SP workload, since FAST Cache is a RAID 1 technology
    – FAST Cache effect: FAST Cache drives absorb writes more quickly and return reads quickly as well; a high write rate is sustained, pages are freed for incoming I/O, and disks are less busy – a secondary benefit to the application
    © Copyright 2013 EMC Corporation. All rights reserved. 9
  • Visualization of Skew © Copyright 2013 EMC Corporation. All rights reserved. 10
  • Visualization of Skew – Overview
    – Skew is a statistical shorthand, a ‘field expedient’ expression of the concentration of I/O over capacity
    – Start with a series of LUNs, ordered busiest to least busy; plot each LUN with x = capacity and y = IOPS
    – Plot width (x-axis) represents aggregate capacity; how can we express this shape in a summary fashion?
    © Copyright 2013 EMC Corporation. All rights reserved. 11
  • Visualization of Skew
    – To visualize the statistic, plot the aggregate IOPS on the Y-axis against aggregate capacity on the X-axis
    – As capacity increases, plot the aggregate IOPS represented by those LUNs; this makes a cumulative curve
    © Copyright 2013 EMC Corporation. All rights reserved. 12
  • Visualization of Skew – Calculation of Skew
    – Normalize on percentage of the aggregate: at each point on the curve there is some fraction of capacity and some fraction of IOPS (0% of IOPS at 0% of capacity, rising to 100% of IOPS at 100% of capacity)
    – Somewhere along this curve, fraction of IOPS + fraction of capacity = 1; that is our Skew Point
    – Example: at 70% of IOPS on 30% of capacity, we’d say this distribution has a Skew of 70%
    © Copyright 2013 EMC Corporation. All rights reserved. 13
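The skew-point calculation can be illustrated with a few lines of Python. The per-LUN capacities and IOPS below are made up for illustration:

```python
# Sketch: find the skew point where fraction of IOPS + fraction of capacity = 1.
# The per-LUN numbers are made up for illustration.
luns = [  # (capacity_gb, iops), one entry per LUN
    (100, 4000), (200, 2500), (300, 1500), (400, 900),
    (500, 500), (1000, 300), (1500, 200), (2000, 100),
]

# Order LUNs busiest to least busy, as on the slide, then accumulate.
luns.sort(key=lambda lun: lun[1], reverse=True)
total_cap = sum(cap for cap, _ in luns)
total_iops = sum(iops for _, iops in luns)

cap_so_far = iops_so_far = 0.0
for cap, iops in luns:
    cap_so_far += cap
    iops_so_far += iops
    cap_frac = cap_so_far / total_cap
    iops_frac = iops_so_far / total_iops
    if cap_frac + iops_frac >= 1.0:          # crossed the skew point
        print(f"skew ~ {iops_frac:.0%} of IOPS on {cap_frac:.0%} of capacity")
        break
```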
  • Visualization of Skew
    – Example: a skew of ‘90’ means 90% of IOPS occurs on 10% of storage by capacity (0.9 + 0.1 = 1.0)
    – High skew: distribution of I/O over a narrow range of LBA (a steep IOPS-versus-gigabytes curve)
    © Copyright 2013 EMC Corporation. All rights reserved. 14
  • Visualization of Skew
    – Low skew (61.9) implies a flatter curve: distribution of I/O over a broad range of LBA
    – Here, the busiest 38.1% of capacity is hosting 61.9% of the I/O; the remaining 61.9% of capacity has 38.1% of the I/O – that’s pretty well distributed
    © Copyright 2013 EMC Corporation. All rights reserved. 15
  • Applications and Skew © Copyright 2013 EMC Corporation. All rights reserved. 16
  • How Much is Enough?  For leveraging FAST VP, 85% or better is a good target – This assumes an estimated skew from NAR data – Typically, sub-LUN skew is much higher  FAST Cache can help in cases where data is distributed – We can investigate activity at a very fine granularity with our internal tools  Analyze environments as well as applications – Remember to apply some common-sense filters ▪ You can help your SE identify LUNs that just don’t belong — That can significantly help © Copyright 2013 EMC Corporation. All rights reserved. 17
  • Do I Have Skew? Examples
    – High-skew applications: ‘long tail’ fulfillment and ordering systems – book/media/auto-part companies that stock very large numbers of items but whose sales are concentrated in ‘popular’ stock
    [Figure: hypothetical online bookseller – sales concentrated in a bestseller (Fifty Shades of Grey) versus a long tail of romances, actions, mysteries, and a field guide to arctic lichens]
    © Copyright 2013 EMC Corporation. All rights reserved. 18
  • Do I Have Skew? Examples
    – High-skew applications: OLTP databases – indexes, ‘hot’ tables (the address table at a direct-mail firm, for example), tables used in many join statements
    – High-skew environments: dominated by a small set of mission-critical applications
    [Figure: actual high-skew environment – four metas doing 80% of the load, alongside file shares and internal databases]
    © Copyright 2013 EMC Corporation. All rights reserved. 19
  • Do I Have Skew? Examples
    – Low-skew examples: Microsoft Exchange (all versions); databases with dispersed access patterns (all the data is busy all the time), e.g. HPC nodes, DWSE, BI
    – Service-model data services (IT departments who hand out ‘9 GB’ chunks, service providers, etc.)
    [Figure: SQL Server backups (smaller target, equal IOPS); backup-to-disk targets at 72% skew; file home directories]
    © Copyright 2013 EMC Corporation. All rights reserved. 20
  • How to Estimate Skew © Copyright 2013 EMC Corporation. All rights reserved. 21
  • Estimating: Methods  We’ll use as much data as we can get to analyze your environment  Depending on the amount of information, we can make estimates with a corresponding level of confidence  Lower confidence methods require more conservative planning – No data: Rule of thumb (ROT); low confidence – Business growth rate and activity rate: Data decay model; moderate confidence – Storage system performance data records: Measured skew; high confidence – Fine-grained performance data: Measured sub-LUN skew; very high confidence © Copyright 2013 EMC Corporation. All rights reserved. 22
  • Estimating: Rule of Thumb
    – A generally safe rule of thumb is “5/10/85”: 5% Flash, 10% SAS, 85% NL-SAS, by capacity; inherently imprecise (disk capacities can vary, so disk counts can vary)
    – Where does it fit? General business mixes; random-access database applications; no large-scale mission-critical app that dominates the workload and uses wide-striping techniques or has programmatically randomized access
    – Why is it rational? We’ve seen a lot of workloads, and capacity usually dominates; many underutilized drives are typical
    – We prefer using more information, because many exceptions exist
    © Copyright 2013 EMC Corporation. All rights reserved. 23
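As a worked example, a small Python sketch that turns the 5/10/85 rule into rough per-tier capacities and drive counts; the drive sizes are assumptions, and real configurations round to whole drives and RAID group widths:

```python
# Sketch: turn the 5/10/85 rule of thumb into rough per-tier capacities and drive counts.
# Drive sizes are hypothetical; real configs round to whole drives and RAID group widths.
import math

ROT_SPLIT = {"SSD": 0.05, "SAS": 0.10, "NL-SAS": 0.85}    # fraction of usable capacity
DRIVE_GB = {"SSD": 200, "SAS": 600, "NL-SAS": 2000}       # assumed usable GB per drive

def rot_config(total_usable_gb):
    config = {}
    for tier, fraction in ROT_SPLIT.items():
        tier_gb = total_usable_gb * fraction
        config[tier] = (tier_gb, math.ceil(tier_gb / DRIVE_GB[tier]))
    return config

for tier, (gb, drives) in rot_config(50_000).items():     # 50 TB usable, for example
    print(f"{tier:7s} ~{gb:>8,.0f} GB  (~{drives} drives)")
```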
  • Estimating: Data Decay  Data Decay Model – Simple and elegant – Requires some familiarity with data access patterns – Takes three parameters ▪ Amount of storage in use now ▪ Growth rate ▪ Time that data stays ‘hot’ (Accessed daily) – Percentage of Flash needed is calculated from Growth rate and time ‘hot’ – Percentage of Flash is multiplied by the current storage capacity to get Flash capacity required © Copyright 2013 EMC Corporation. All rights reserved. 24
  • Estimating: Data Decay – FLASH portion of the storage pool
    FLASH % = (Yearly Growth Rate % × Number of Hot Days × 100) / (365 × (Yearly Growth Rate % + 100%))
    FLASH portion as a function of yearly data growth (columns = yearly growth rate, rows = days hot):
    Days Hot |  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
    30       |   1%   1%   2%   2%   3%   3%   3%   4%   4%   4%
    60       |   1%   3%   4%   5%   5%   6%   7%   7%   8%   8%
    90       |   2%   4%   6%   7%   8%   9%  10%  11%  12%  12%
    120      |   3%   5%   8%   9%  11%  12%  14%  15%  16%  16%
    © Copyright 2013 EMC Corporation. All rights reserved. 25
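The formula scripts directly. The sketch below reproduces a couple of the table cells and converts the percentage into a Flash capacity (the 100 TB figure is hypothetical):

```python
# Data decay model from the slide: Flash fraction from growth rate and 'hot' window.
def flash_percent(yearly_growth_pct, hot_days):
    """Flash % = (growth% * hot days * 100) / (365 * (growth% + 100))."""
    return (yearly_growth_pct * hot_days * 100) / (365 * (yearly_growth_pct + 100))

# Reproduce a couple of cells from the table.
print(round(flash_percent(20, 90)))     # -> 4  (% of capacity on Flash)
print(round(flash_percent(100, 120)))   # -> 16

# Multiply by the current storage capacity to get the Flash capacity to configure.
current_capacity_tb = 100               # hypothetical
print(round(current_capacity_tb * flash_percent(20, 90) / 100, 1), "TB of Flash")
```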
  • Estimating Measured: Tier Advisor
    – Tier Advisor uses Unified array data: NAR required, sub-LUN skew estimated; trace data optional, allowing refinement of the sub-LUN estimate
    – Supports FLARE levels from R24 to R32; deep and detailed analyses
    – Configuration view: aggregation of all target groups; shows capacity and performance utilization for each disk type in the entire config
    – Target Group view: shows a target group with the selected policy – capacity and performance utilization of the target group for each disk type hosting any part of it, plus an IOPS/response-time graph for the selected target group
    © Copyright 2013 EMC Corporation. All rights reserved. 26
  • Estimating Measured: Tier Advisor  Tier Advisor – NAR only. Note the “Est virtual” (sub-LUN) skew  Use of Trace files narrows this analysis Sub-LUN skew (91%) looks excellent Compares to 73% LUN Skew © Copyright 2013 EMC Corporation. All rights reserved. 27
  • Estimating Measured: WPA Tool – Simple process, highly responsive, heterogeneous sources
    1. Collect supported logs/scripts (AWR/Statspack, perf collect, SAR, iostat, esxtop, NAR file, AutoSupport, EVAperf)
    2. Post files to Mitrend
    3. Receive preformatted outputs within 24 hours
    – Support for applications (Oracle, Microsoft, custom UNIX), virtualization (VMware) and storage (EMC VNX, CLARiiON, NetApp & HP)
    © Copyright 2013 EMC Corporation. All rights reserved. 28
  • Estimating Measured: HEAT Map Tool
    – Maps slices in an existing Pool for FAST VP sizing (VNX OE R32); homogeneous or heterogeneous pool
    – FAST VP does not have to be active (requires an efficiency enabler, e.g. Thin); uses the same heat-map calculation used by FAST VP
    – Each column represents one LUN, with each 1 GB slice stacked vertically: red bands are hot/busy slices, grey bands are idle slices, black is shown when some LUNs are larger than others
    [Chart: cumulative % of IOPS versus number of slices]
    © Copyright 2013 EMC Corporation. All rights reserved. 29
  • Sizing: VNX Sizer 2.0  Simple dashboard based sizing tool  Data decay or application workload based solutions – Inputs are business/application specific  Different results options are offered – Aggressive to conservative comparison provided  Realistic workload modeling – OLTP, DW, File Sharing and Exchange initially – Based on thousands of workload analyses – Complex best-fit engine for best practice configurations © Copyright 2013 EMC Corporation. All rights reserved. 30
  • Pool Design © Copyright 2013 EMC Corporation. All rights reserved. 31
  • Pool Design Precepts – Some basic things to keep in mind
    – FAST technologies need time to optimize
    – Drive behavior still matters: most of your capacity will be on HDD, and some access will ‘miss’ SSD
    – In larger systems, multiple pools make sense
    – RAID is applied at the pool tier level; FAST Cache is applied at the pool level
    © Copyright 2013 EMC Corporation. All rights reserved. 32
  • Pool Design Precepts – Pool Domains: why more than one pool
    – Fault domains: don’t mix primary and replica, log and table, or a SharePoint DB with its BLOB store
    [Figure: separate pools for DB1/DB2 tables, DB1/DB2 logs, home dirs/users, and home dir/user backup]
    © Copyright 2013 EMC Corporation. All rights reserved. 33
  • Pool Design Precepts – Pool Domains: why more than one pool
    – Performance (SLA) domains: RAID 1/0? Drive mix? FAST Cache on/off is done at the Pool level
    [Figure: a high-skew RAID 6 pool with a 5/25/70 mix (DB1, DB2, home dirs/users); a medium-skew RAID 5 pool with a 0/25/75 mix plus FAST Cache (VMDKs, file systems, home dir/user backup); a deterministic RAID 1/0 pool with no FAST Cache (messaging, logs)]
    © Copyright 2013 EMC Corporation. All rights reserved. 34
  • Pool Design Precepts – Pool RAID types and disk choices
    – RAID 1/0: for >25% small-block random writes on HDD
    – RAID 5: general purpose – homogeneous, medium-to-high ‘general purpose’ workloads; SSD + SAS mixes for very high performance; NL-SAS if replica or restorable
    – RAID 6: most uses of NL-SAS; high bandwidth, good read performance; RAID 6 overhead is mostly seen in smaller writes, and R31 MR1 mitigates this due to better XOR offloading
    – Extreme Performance (SSD): tiered if you have a highly skewed workload – 4+1 R5
    – Performance (SAS): still important, for those ‘misses’ of your SSD layer – 8+1 R5
    – Availability (NL-SAS): great for aged data, backups, well-behaved streaming – 14+2 R6
    © Copyright 2013 EMC Corporation. All rights reserved. 35
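A small sketch of what the preferred private RAID group widths imply for usable capacity per tier; the drive counts and sizes are hypothetical, and hot spares, vault drives and pool metadata are ignored:

```python
# Sketch: usable capacity implied by the preferred private RAID group widths above.
# Drive sizes are hypothetical; hot spares, vault drives and pool metadata are ignored.
PREFERRED_GROUPS = {               # (data drives, parity/mirror drives) per private RAID group
    "SSD (R5 4+1)":     (4, 1),
    "SAS (R5 8+1)":     (8, 1),
    "NL-SAS (R6 14+2)": (14, 2),
}

def usable_gb(tier, drive_count, drive_gb):
    data, parity = PREFERRED_GROUPS[tier]
    groups = drive_count // (data + parity)          # whole private RAID groups only
    return groups * data * drive_gb

print(usable_gb("SSD (R5 4+1)", 10, 200))        # 2 groups x 4 x 200 GB = 1600 GB
print(usable_gb("NL-SAS (R6 14+2)", 32, 2000))   # 2 groups x 14 x 2000 GB = 56000 GB
```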
  • Applying Skew to the Pool
    – In high-skew environments, capture the ‘skew I/O’ with Flash. Example: given a skew number of 90, 90% of your I/O will be targeted at Flash, and given the companion capacity ratio, that means 10% of your capacity goes to Flash
    – FAST Cache first: in smaller systems this may be all you need – add FAST Cache to reduce response times. On larger systems it provides the timely buffer between FAST VP relocations. Consider the capacity in FAST Cache as having been applied to your workload: if you have 5% of your capacity in FAST Cache, then a further 5% in the tier covers your 10% (in a 90% skew case)
    – SAS second: after applying Flash capacity to the pool, size about 10%–20% of the remaining capacity as SAS – enough to support 80+% of the remaining I/O
    – In low-skew environments, focus on SAS and NL-SAS working together
    © Copyright 2013 EMC Corporation. All rights reserved. 36
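The tiering guidance above can be sketched as a simple calculation. The capacity, skew and FAST Cache figures are hypothetical, and the 15% SAS share is just a midpoint of the 10–20% range mentioned:

```python
# Sketch of the tiering logic above for a high-skew pool; percentages follow the
# slide's guidance, the capacity figure and FAST Cache size are hypothetical.
def size_tiers(total_gb, skew, fast_cache_gb=0, sas_share_of_remainder=0.15):
    """With skew S, roughly (1 - S) of capacity sits on Flash to capture S of the I/O."""
    flash_fraction = 1 - skew                                      # skew 0.90 -> 10% on Flash
    flash_gb = max(total_gb * flash_fraction - fast_cache_gb, 0)   # FAST Cache counts toward it
    remainder = total_gb - flash_gb
    sas_gb = remainder * sas_share_of_remainder                    # 10-20% of what's left
    nlsas_gb = remainder - sas_gb
    return {"Flash (FAST VP tier)": flash_gb, "SAS": sas_gb, "NL-SAS": nlsas_gb}

# Example: 50 TB usable, measured skew of 90%, 1 TB of FAST Cache already installed.
for tier, gb in size_tiers(50_000, 0.90, fast_cache_gb=1_000).items():
    print(f"{tier:20s} {gb:>8,.0f} GB")
```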
  • FAST Suite for File and Block Dynamically Optimizes I/O at 64 KB Granularity
    – Use FAST Cache first: FAST Cache and FAST VP apply to both File and Block (SAN LUNs and NFS/CIFS NAS volumes built on pool LUNs)
    – FAST VP with File best practices: use a separate pool for File; use a thin-enabled file system to generate skew; size SSD to match the maximum file-system extension size
    – FAST VP dynamically moves data between SSD, HDD and NL-HDD tiers at a slice granularity of 1 GB; FAST Cache operates at 64 KB, 16,384 times more granular
    © Copyright 2013 EMC Corporation. All rights reserved. 37
  • What Fits And Where It Fits General Application Guidance SQL and Oracle Server flash Flash only pools FAST Cache Database logs Neutral SharePoint Neutral Neutral NFS & CIFS Neutral Neutral Exchange 2012 FAST VP Neutral Neutral Virtual Desktops Neutral Sequential I/O Neutral = Sometime there is benefit, sometimes not © Copyright 2013 EMC Corporation. All rights reserved. 38
  • Conclusion and Recommendations  FAST technologies allow focusing high-IOPS workloads on top-tier Flash drives  FAST takes advantage of skew – A phenomenon common to many applications and environments  FAST VP and FAST Cache work together – Covering broad and very focused optimizations  Skew helps determine how much Flash to use – Skew can be estimated with varying levels of confidence – Skew can be measured  Skew simplifies the ’how much Flash’ question – All the ‘high skew’ IOPS – Use a simple rule to determine the lower two tiers © Copyright 2013 EMC Corporation. All rights reserved. 39
  • USD Related Sessions (Session / Day / Time)
    – FLASH 1st at Scale: Monday 5/6 4-5pm; Tuesday 5/7 11:30-12:30pm; Wednesday 5/8 11:30-12:30pm
    – VNX FAST VP: Monday 5/6 1-2pm; Tuesday 5/7 4-5pm
    – VNX FAST Cache: Monday 5/6 4-5pm; Wednesday 5/8 10-11am
    – VNX Compression and Deduplication: Monday 5/6 2:30-3:30pm; Tuesday 5/7 11:30-12:30pm
    – Leveraging SSD: Designing for FAST Cache & FAST VP on Unified Storage: Wednesday 5/8 4-5pm; Thursday 5/9 10-11am
    – VNX Virtual Provisioning: Monday 5/6 11:30am-12:30pm; Tuesday 5/7 10-11am
    – Strategic Perspective of VNX Family Storage Efficiency Technologies – Where, Which & Why: Monday 5/6 1-2pm; Wednesday 5/8 2:30-3:30pm
    – Scaling Oracle Environments with dNFS & VNX FAST Technologies: Monday 5/6 4-5pm; Thursday 5/9 11:30-12:30pm
    – VNX FAST Cache is King: Desktop Virtualization with the VNX series: Monday 5/6 11:30am-12:30pm; Thursday 5/9 11:30am-12:30pm
    – BOF: Dive into Storage Efficiencies: Tuesday 5/7 1-2pm
    © Copyright 2013 EMC Corporation. All rights reserved. 40
  • Visit the vCredible Unified Storage Booth (#134)
    – Demos covering Oracle, Microsoft, VMware and Hyper-V, FLASH 1st, Sizing Tools and more…
    – “Whack a Villain” Challenge
    – Win iPad minis daily!
    – Meet the Experts! (Modular Unified, Integrated Unified; vBaby, Flash Boy, vEdna, vViolet, Flexigirl, Mr vCredible)
    © Copyright 2013 EMC Corporation. All rights reserved. 41
  • QUESTIONS? © Copyright 2013 EMC Corporation. All rights reserved. 42