Little's Law in 3D and Storage Performance
 

Northern California CMG presentation.

Presentation Transcript

    • Little’s Law in 3D and Storage Performance
      NorCal CMG Meeting
      Dr. Neil Gunther, Performance Dynamics
      August 7, 2012
      © 2012 Performance Dynamics
    • Outline
      1. Background
         • Review Little’s Law
         • The Utilization Law
      2. Throughput-Delay Plots
         • Need for Speed
         • Benchmarking Paradox
         • Paradox Resolved
      3. Storage Performance
         • Throughput
         • Latency
         • Concurrency
      4. Conclusion
    • Little’s Law
      1. What is it?
         • $N = XR$
         • An immutable law of performance (footnote: if your data don’t fit LL, change your data!)
      2. Why is it important?
         • $L = \lambda W$, proven 1961
         • Algebraic simplification
         • Cross-checking
      J.D.C. Little’s lore (in his own words): perfdynamics.blogspot.com/2011/07/
    • A Little Historical Perspective
      • LL is not based on queueing theory
      • LL relates inventory and manufacturing cycle time
      • John Little (now 84) is not a computer performance analyst
      • Prof. Little did not invent his own law
      • LL was known to A. K. Erlang more than 100 years ago
      • There are actually two versions of Little’s law
      A Paradox
      1. LL expresses the fact that R decreases with increasing X
      2. Benchmarks show R increases with increasing throughput X
    • Purpose of This Talk
      1. Review LL (both versions)
      2. Resolve the XR paradox by introducing the 3D version of LL
      3. Apply LL to understand the IOPS bottleneck
    • Little’s Law at the System Level
      In steady state, the mean rate of arrival ($\lambda$) of customers into a system is equal to the mean output rate or throughput ($X$) of customers departing the system:
      $\lambda = X$  (1)
      The total number of customers, requests, processes, or threads ($N$) in the system is given by:
      $N = \lambda R = XR$  (2)
      where $R$ is the mean total time spent in the system.
      Classic Little’s law: $N$ is the mean number of customers/requests in residence.
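Equation (2) can be checked numerically with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names and the sample values (50 requests/s, 0.2 s) are hypothetical and chosen only for illustration.

```python
def mean_in_system(throughput, residence_time):
    """Little's law, eq. (2): N = X * R."""
    return throughput * residence_time


def mean_residence_time(n_in_system, throughput):
    """Little's law rearranged: R = N / X."""
    return n_in_system / throughput


# Hypothetical steady-state measurements (assumed values, not from the talk).
x = 50.0   # throughput X in requests per second
r = 0.2    # mean residence time R in seconds

n = mean_in_system(x, r)
print(f"N = {n:.2f} requests in residence")
print(f"R recovered from N and X = {mean_residence_time(n, x):.2f} s")
```

Any two of the three quantities determine the third, which is what makes the law useful for cross-checking measured data.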
    • Little’s Law at the Device Level
      If the system is like a grocery store, the device level is like a checkout lane.
      At any device (labelled $k = 1, 2, \ldots$), equation (2) yields the local number of customers/requests ($Q_k$) enqueued:
      $Q_k = \lambda R_k$  (3)
      where $R_k$ is the time in residence at the device. $R_k$ is defined as the sum of the service time ($S_k$) at the cashier and the time ($W_k$) spent waiting to get serviced by the cashier:
      $R_k = W_k + S_k$  (4)
      The total number, $N$, in the global system (2) is the sum of all the customers/requests enqueued at each device:
      $N = Q_1 + Q_2 + \cdots + Q_k$  (5)
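Equations (3) to (5) compose as follows. The sketch below is illustrative only: the two devices and their waiting and service times are made-up values, and the helper names are hypothetical.

```python
def residence_time(wait, service):
    """Eq. (4): R_k = W_k + S_k."""
    return wait + service


def queue_length(arrival_rate, wait, service):
    """Eq. (3): Q_k = lambda * R_k."""
    return arrival_rate * residence_time(wait, service)


def total_in_system(arrival_rate, devices):
    """Eq. (5): N is the sum of Q_k over all devices.

    `devices` is a list of (W_k, S_k) pairs in seconds.
    """
    return sum(queue_length(arrival_rate, w, s) for w, s in devices)


# Hypothetical two-device system visited by every request (assumed values).
arrival_rate = 10.0                       # requests per second
devices = [(0.03, 0.02), (0.00, 0.05)]    # (waiting time, service time) per device

for k, (w, s) in enumerate(devices, start=1):
    print(f"Q_{k} = {queue_length(arrival_rate, w, s):.2f}")
print(f"N = {total_in_system(arrival_rate, devices):.2f}")
```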
    • Little’s Law and Device Utilization
      The utilization of the device comes from (3) by ignoring the waiting time contribution. Logically, this is equivalent to letting $W \to 0$:
      $Q_k = \lambda R_k = \lambda (W_k + S_k) \to \lambda S_k$  (6)
      We changed the right side of (6), so the left side must also be changed. But to what? It has to be a number (like $N$), and $Q_k$ can be unbounded: $Q_k < \infty$ (but not infinite).
      Call the “new” number $\rho_k$ (to agree with the queueing literature) so that (6) becomes:
      $\rho_k = \lambda S_k$  (7)
      Since the cashier cannot service more than one customer at a time:
      $\rho_k < 1$  (8)
      or $\rho_k < 100\%$, on average.
      Little’s utilization law: the utilization $\rho_k$ is the mean number of customers/requests in service at device $k$.
    • Speed, Distance and Time
      Example: Driving on the freeway at 60 mph. At that speed, you travel a mile a minute. How far will you travel in 15 minutes?
      Answer: In a quarter of an hour you will travel one quarter the distance you would have covered in an hour. Therefore, in 15 minutes you will travel 15 miles.
      Congratulations! You just used LL without realizing it.
      Let $X$ be the speed, $R$ the elapsed time and $N$ the miles covered:
      $N = XR$
      $15 \text{ miles} = 60 \text{ mph} \times \frac{15}{60} \text{ hours}$
    • Speed and Delay are Inversely Related
      Example: Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How fast do you need to go?
      The answer may not be so obvious, but not to worry. We can still use LL.
      Answer:
      $N = XR$
      $15 \text{ miles} = X \times \frac{10}{60} \text{ hours}$
      Solving for $X$: $X = 15 \times 6 = 90$ mph
      Theorem (Inverse Proportion of LL): To reduce the delay $R$ (elapsed time), the speed $X$ must be increased.
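The two freeway examples are the same equation solved for different unknowns. A small sketch using exact rational arithmetic (the function names are hypothetical; the numbers are the ones from the slides):

```python
from fractions import Fraction


def distance_covered(speed_mph, minutes):
    """N = X * R, with R converted from minutes to hours."""
    return speed_mph * Fraction(minutes, 60)


def required_speed(distance_miles, minutes):
    """X = N / R, with R converted from minutes to hours."""
    return distance_miles / Fraction(minutes, 60)


print(distance_covered(60, 15))   # miles covered in 15 minutes at 60 mph -> 15
print(required_speed(15, 10))     # mph needed to cover 15 miles in 10 minutes -> 90
```

Holding the distance N fixed, shortening R from 15 to 10 minutes forces X up from 60 to 90 mph, which is the inverse proportion stated in the theorem.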
    • XR Plots of LL
      [Figure: two panels showing the LL curves for N = 1, N = 15 and N = 50, one plotting X against R and the other R against X, with both axes running from 0 to 15.]
      • Example was for the N = 15 miles curve
      • Time for N = 15 miles is reduced by going from green to red dot
      • Different distance means a different curve
      • Curves are symmetric about the diagonal
      • Can flip X and R axes without changing the curves
      • Independent variable goes on x-axis
    • Benchmark XR Plots
      [Figure: benchmark throughput-delay plots. Panel labels: “SPEC SFS: Performance Plots”, “NSPLab Dec 2007”, “Hitachi Jan 2012”, “SPEC SFS97”, “Fusion-io SQLServer 2010”. One panel plots Response Time (mSec) against NFSops/Second for the SC2000 and NS6000.]
    • LL is 3-Dimensional
      [Figure: 3D surface of LL with axes X (QPS), R (s) and N.]
      • Three variables (like PVT in chemistry)
      • 3D surface
      • Like a cone but not rotationally symmetric about apex
      • Square edges cause hyperbolic contours
    • Fusion-io Benchmark
      [Figure: R plotted against X. Panels: “Actual data (with and without FIO)” and “Extracted data”.]
      • SQL Server RDBMS: Measure X in QPS and R in s at each load (N)
      • Two curves: before (red) and after (blue) application of FIO device
      • Manually extracted pertinent data points
    • Back to the Paradox
      The XR Paradox
      1. LL says R decreases with increasing X (3D contour lines)
      2. Benchmarks show R increases with increasing throughput X
      [Figure: two panels plotting R against X: “Extracted data” and “Data moves on LL contours”.]
      The Resolution: Superimpose LL 3D contours onto 2D benchmark data.
    • 2D Projection of 3D Surface
      [Figure: R (s) plotted against X (QPS), for X between 22 and 30 QPS and R between 0.6 and 1.4 s.]
      Theorem (Gunther 2012): All benchmark data “moves” along LL contours.
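One way to see the resolution numerically: on a single contour (fixed N) R falls as X rises, but a load test raises N at every step, so each measured point sits on a different contour. The sketch below uses invented (N, X) pairs, not the Fusion-io measurements, and assumes only that R follows from LL as R = N / X.

```python
def residence_time(load_n, throughput_x):
    """LL solved for R on the contour labelled by N: R = N / X."""
    return load_n / throughput_x


# Moving along ONE contour (N fixed at 20): R decreases as X increases.
same_contour = [residence_time(20, x) for x in (20.0, 25.0, 28.0)]

# Hypothetical load test (assumed values): the load N is raised at each step
# and throughput X grows sublinearly, so every point lies on a higher contour.
load_test = [(10, 20.0), (20, 25.0), (40, 28.0)]
across_contours = [residence_time(n, x) for n, x in load_test]

print("fixed N=20 :", [f"{r:.3f}" for r in same_contour])
print("rising N   :", [f"{r:.3f}" for r in across_contours])
```

Both statements of the paradox hold at once: the first list decreases, the second increases, and every point still satisfies N = XR.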
    • System Level Query Rate
      Example: Suppose processing a query requires the execution of 100 K instructions on the CPU. The CPU can execute 10 GIPS.
      1. IPQ: 100 K = $100 \times 10^3$ instructions per application query
      2. IPS: 10 GIPS = $10 \times 10^9$ CPU instructions per second
      The throughput (or request rate) for queries is:
      $\lambda_{QPS} = \frac{IPS}{IPQ} = \frac{10 \times 10^9}{100 \times 10^3} = \frac{10^{10}}{10^5} = 100{,}000$
      The steady state assumption (1) tells us:
      $\lambda_{QPS} = 100 \text{ KQPS} = X_{QPS}$  (9)
      A maximum of 100 KQPS can be processed.
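The arithmetic behind (9) is a single division. A minimal sketch with the slide's numbers (the constant and function names are hypothetical):

```python
IPQ = 100 * 10**3   # instructions per query (100 K)
IPS = 10 * 10**9    # CPU instructions per second (10 GIPS)


def max_query_rate(ips, ipq):
    """Maximum queries per second the CPU can execute: IPS / IPQ."""
    return ips / ipq


qps = max_query_rate(IPS, IPQ)
print(f"{qps:,.0f} QPS")
```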
    • Storage Device IO Rate
      Example (cont’d): Assume further that within the query instructions a single IO is issued. The CPU thread must wait before the rest of the query instructions can be completed.
      This creates a nice convenience since $\lambda_{IOPS} \equiv \lambda_{QPS}$:
      $\lambda_{IOPS} = \frac{QPS}{IOPQ} = \frac{10^5}{1} = 100{,}000$
      $\lambda_{IOPS} = 100 \text{ KIOPS} = X_{IOPS}$  (10)
      Device IOPS: But this is aggregate IOPS. How many IOPS can a single disk do?
    • Device IOPS Rating
      Example (Seagate IOPS): A Seagate Barracuda 7200 RPM disk is capable of about 100 IOPS. This follows from the combined seek time and RPS time being on the order of 10 ms. Hence:
      $IOPS = \frac{1}{0.010} = 100$  (11)
      Simple arithmetic suggests that 1000 Seagate Barracudas would be needed to accommodate the 100 KIOPS aggregate throughput being considered here.
      Caveat emptor: Note that (11) is a rearrangement of the LL utilization law (7):
      $\lambda_{IOPS} = \frac{\rho}{S_{disk}}$  (12)
      with $\rho = 1$. Hence, it is the theoretical maximum possible IOPS that this disk can support. In practice, the sustainable IOPS rate will be considerably lower.
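Equations (11) and (12), and the "1000 Barracudas" estimate, can be sketched as below. The function names are hypothetical, and `rho` is exposed as a parameter only to show how a utilization target below 1 lowers the per-disk rate; the slides themselves use ρ = 1.

```python
import math


def max_iops(service_time_s, rho=1.0):
    """Eq. (12): lambda_IOPS = rho / S_disk. With rho = 1 this is the ceiling."""
    return rho / service_time_s


def devices_for_target(target_iops, per_device_iops):
    """Number of devices needed if each runs at the given per-device rate."""
    return math.ceil(target_iops / per_device_iops)


disk_iops = max_iops(0.010)                       # 10 ms combined seek + rotation
disks = devices_for_target(100_000, disk_iops)    # 100 KIOPS aggregate target
print(f"{disk_iops:.0f} IOPS per disk, {disks} disks at rho = 1")
```

Because ρ = 1 is the theoretical limit, this count is a lower bound: a sustainable utilization below 100% pushes the number of disks higher.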
    • Storage Latency
      Example (cont’d): If the storage device is capable of responding to an IO request in 1 ms (10x faster than the Seagate Barracuda), the processor needs to issue 100 concurrent IO requests to the storage system so that it can complete 100 KQPS. If the storage device were 10 times faster (e.g., SSD), then the processor would only need to be handling a tenth as many IO requests, or just 10 concurrent requests.
      $S_{disk} = 10^{-3}$ s, $S_{ssd} = 10^{-4}$ s
      Applying the LL utilization law (7):
      $\rho_{disk} = \lambda_{IOPS} S_{disk} = 10^5 \times 10^{-3} = 100$  (13)
      This suggests we need more than 100 spindles. Similarly, for faster SSD devices:
      $\rho_{ssd} = \lambda_{IOPS} S_{ssd} = 10^5 \times 10^{-4} = 10$  (14)
      Latency: Latency is an ill-defined word that means different things to different technical people. We need the more exacting language of queueing theory to see where different latencies arise.
    • Tandem Queue Model
      Since computer systems are not deterministic, we represent CPU and storage as a queueing network with two stages:
      Src → $S_{cpu}$ → $S_{dev}$ → Snk
      Queries are sourced by the application at an aggregate request rate of $\lambda = 100$ KQPS and the CPU issues IO requests at the rate of 100 KIOPS.
      However, from (13) we know $\rho_{disk} = 100$ or 10,000%!
      Trouble: This violates the utilization bound $\rho_{disk} < 1$ given by (8).
      We already suspected we would need at least 100 spindles from (13).
      But how should the disks be arranged to give the correct latencies?
    • Parallel Disk Queues
      The message from LL (8) is that we need many ($q$) disks operating in parallel.
      [Diagram: Source → $S_{cpu}$ → $q$ parallel disk queues, each receiving rate $\lambda/q$ → Sink.]
      Parallel disks divide the total throughput ($\lambda$) into $q$ substreams, each load-balanced with equal rate $\lambda/q$. Moreover, considering (13), we can write:
      $\rho_{disk} = \frac{100}{q} < 1$  (15)
      LL tells us we actually need more than $q = 100$ disks to satisfy the utilization bound.
      Disk Arrays: This is why typical storage subsystems are configured as arrays.
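The bound in (15) gives the smallest array size directly. The following sketch uses exact fractions so the strict inequality is not blurred by rounding; the helper names are hypothetical, and the service times are the 1 ms disk and 0.1 ms SSD values from the slides.

```python
import math
from fractions import Fraction

RATE_IOPS = 100_000              # aggregate lambda_IOPS
S_DISK = Fraction(1, 1000)       # 1 ms service time
S_SSD = Fraction(1, 10_000)      # 0.1 ms service time


def offered_load(rate, service_time):
    """Aggregate lambda * S, as in eqs. (13) and (14)."""
    return rate * service_time


def per_device_utilization(load, q):
    """Eq. (15): each of q load-balanced devices sees load / q."""
    return load / q


def min_parallel_devices(load):
    """Smallest integer q with load / q strictly below 1."""
    return math.floor(load) + 1


for name, s in (("disk", S_DISK), ("ssd", S_SSD)):
    load = offered_load(RATE_IOPS, s)
    q = min_parallel_devices(load)
    print(f"{name}: load={load}, q>={q}, rho={float(per_device_utilization(load, q)):.3f}")
```

With exactly q = 100 disks the per-disk utilization would be 1, which still violates the strict bound (8); that is why the slide says "more than" 100.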
    • CPU Latency
      CPU service time (i.e., execution time) for a query:
      $S_{CPU} = \frac{IPQ}{IPS} = 10^{-5}$ seconds  (16)
      i.e., 10 µs per query. The mean CPU utilization is:
      $\rho_{CPU} = \lambda_{QPS} S_{CPU} = 10^5 \times 10^{-5} = 1$
      which is right on the edge of the utilization bound.
      [Diagram: Src feeding parallel $S_{cpu}$ servers, then Snk.]
      So, we need more than one core or execution unit.
      Duo-core: LL tells us we need a duo-core, at least, to meet the utilization bound.
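The CPU figures follow from the same utilization law. A brief sketch with the slide's values, again in exact arithmetic (names are hypothetical):

```python
from fractions import Fraction

IPQ = 100 * 10**3      # instructions per query
IPS = 10 * 10**9       # instructions per second
RATE_QPS = 100_000     # lambda_QPS from eq. (9)


def cpu_service_time(ipq, ips):
    """Eq. (16): S_CPU = IPQ / IPS, in seconds."""
    return Fraction(ipq, ips)


def utilization_per_core(rate, service_time, cores):
    """rho per core when the load is shared by `cores` execution units."""
    return rate * service_time / cores


s_cpu = cpu_service_time(IPQ, IPS)
for cores in (1, 2):
    rho = utilization_per_core(RATE_QPS, s_cpu, cores)
    status = "ok" if rho < 1 else "violates rho < 1"
    print(f"{cores} core(s): S={float(s_cpu):.0e} s, rho={float(rho):.2f} ({status})")
```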
    • Multicore with Infinite IOPS
      Example (cont’d):
      If the storage system is capable of responding to an IO request in 1 ms...
      If the storage were 10 times faster in responding with I/O requests...
      These numbers become $S_{dev}$ in the following diagram.
      [Diagram: Src → parallel $S_{cpu}$ servers → parallel $S_{dev}$ storage devices, each receiving rate $\lambda/q$ → Snk.]
      We use this queueing model to examine both latency and concurrency effects.
      “Infinite IOPS” is represented by 1000 parallel storage devices.
    • Queueing Model Results
      Example (cont’d):
      If the storage system is capable of responding to I/O requests in 1/1,000th of a second, then the CPU will need to issue N = 100 concurrent requests...
      If the storage were 10 times faster then the processor would only need to be handling 1/10th as many concurrent requests, or just N = 10 concurrent requests.

                       Latency                          Concurrency
      Device (#)       Service (s)     Residence (s)    Q_k        N
      CPU (2)          0.00001         0.0000133333     1.33333    1.33333
      Disk (1000)      0.001000        0.001111         0.1111     111.1
      SSD (1000)       0.000100        0.0001010        0.01010    10.10
      FIOa (1000)      1.000 × 10^-6   1.000 × 10^-6    0.0001     0.1000
      FIOb (1)         1.000 × 10^-6   1.111 × 10^-6    0.1111     0.1111

      The overall time in the system, per LL in eqn. (2), is the sum of the CPU residence time (1st row, 3rd column) and the residence time of an IO at the respective storage device.
      With 1000 disks, N = 111.1 concurrent IOs.
      With 1000 SSDs, N = 10.1 concurrent IOs.
      With 1000 FIOs, N = 0.1 concurrent IOs. But wait! It gets even better...
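The slides do not say which queueing formulas produced the table, so the following is a sketch under an assumption: each storage device is treated as an M/M/1 queue with R = S / (1 − ρ), and the two-core CPU as an M/M/2 queue with R = S / (1 − ρ²). Those choices, and all function names, are assumptions made here; they are used because they reproduce the tabulated values to the precision shown.

```python
RATE = 100_000.0   # aggregate arrival rate (QPS = IOPS)


def storage_row(total_rate, service_time, q):
    """q parallel single-server queues, each fed at total_rate / q (assumed M/M/1)."""
    per_device_rate = total_rate / q
    rho = per_device_rate * service_time
    residence = service_time / (1.0 - rho)
    q_k = per_device_rate * residence          # LL at the device, eq. (3)
    return residence, q_k, q * q_k             # N sums Q_k over the q devices


def cpu_row(total_rate, service_time):
    """One queue served by two cores (assumed M/M/2): R = S / (1 - rho^2)."""
    rho = total_rate * service_time / 2
    residence = service_time / (1.0 - rho ** 2)
    n = total_rate * residence                 # LL for the shared CPU queue
    return residence, n, n


rows = {
    "CPU (2)": cpu_row(RATE, 1e-5),
    "Disk (1000)": storage_row(RATE, 1e-3, 1000),
    "SSD (1000)": storage_row(RATE, 1e-4, 1000),
    "FIOa (1000)": storage_row(RATE, 1e-6, 1000),
    "FIOb (1)": storage_row(RATE, 1e-6, 1),
}
for name, (residence, q_k, n) in rows.items():
    print(f"{name:12s} R={residence:.4e} s  Q_k={q_k:.4g}  N={n:.4g}")
```

Under these assumptions the FIOb row shows why one fast device suffices: at 1 µs service time a single queue carrying the full 100 KIOPS runs at only 10% utilization.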
    • Latency Trumps IOPS
      CPU
      • The residence time $R_{CPU}$ is 33% bigger than the query execution time, $S_{CPU}$.
      • In general, this time can be reduced further with more cores.
      Disk
      • All 1000 disks have S = 1 ms service time.
      • Residence time is about 11% longer than the service time (1.111 ms versus 1 ms).
      • Concurrent IO threads $N_{io} = 111$.
      • These threads also have to be managed by the OS (not shown).
      • Thread management also uses up CPU cycles (not shown).
      • Response time = 0.000013 + 0.001111 is dominated by disk latency.
      SSD
      • Faster “SSD” (10x) with nominal S = 0.1 ms service time.
      • Residence time is now close to service time.
      • Concurrency is also reduced by 10x to N = 10 threads.
      • Response time = 0.000013 + 0.0001010 is still dominated by storage latency.
      FIOa
      • Fusion flash service time S = 1 microsecond.
      • Residence time is equal to the device service time.
      • Concurrent IO threads N = 0.1 are negligible.
      • Response time = 0.000013 + 0.000001 is now CPU-bound.
      FIOb
      • Bigger message: Don’t need 1000 Fusion flash devices.
      • Small $N_{FIOa} = 0.1$ means a single FIO device has the same IO concurrency.
      • A single Fusion card can replace 1000 standard devices!
      • SAN in your hand
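The response-time sums quoted on this slide can be tabulated in a few lines. The residence times are the values from the queueing model results; the dictionary layout and function names are hypothetical.

```python
R_CPU = 0.0000133333   # CPU residence time in seconds

# Storage residence times in seconds, per device type.
R_STORAGE = {
    "Disk": 0.001111,
    "SSD": 0.0001010,
    "FIOa": 1.000e-6,
}


def response_time(r_cpu, r_storage):
    """Total time per query: CPU residence plus storage residence."""
    return r_cpu + r_storage


def dominant_stage(r_cpu, r_storage):
    """Which stage contributes the larger share of the response time."""
    return "CPU" if r_cpu > r_storage else "storage"


for device, r_stor in R_STORAGE.items():
    total = response_time(R_CPU, r_stor)
    share = r_stor / total
    print(f"{device:5s} R={total:.6f} s  storage share={share:.1%}  "
          f"dominated by {dominant_stage(R_CPU, r_stor)}")
```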
    • ioDrive2 Duo 2.4TB
      From the Fusion-io web site:
      Read bandwidth          3.0 GB/s
      Write bandwidth         2.5 GB/s
      Random read             285,000 IOPS
      Random write            725,000 IOPS
      Sequential read         892,000 IOPS
      Sequential write        935,000 IOPS
      Read access latency     68 µs
      Write access latency    15 µs
    • Summary
      • LL is really 3D (3 variables: N, X, R).
      • LL has 2 versions: $N = XR$ (with waiting) and $\rho = XS$ (no waiting).
      • Assume no bandwidth limit and choose a throughput target (here, 100 KQPS).
      • With current tech, LL tells us we need parallel devices (disk array, multicore).
      • Storage “latency” (service times) is orders of magnitude longer than CPU execution times.
      • The number of outstanding IOs determines the total (response) time in the system to complete each application query: $R = W + S$.
      • $R_{stor} \gg R_{cpu}$, so storage latency dominates system response time.
      • If we can make $R_{stor} \ll R_{cpu}$, then outstanding IOs become negligible.
      • Application query times are then determined solely by the CPU execution time.
      • A CPU-bound application is always the optimal goal.
      • Fusion-io also eliminates IO controller latency: all data gets closer to the CPU.
    • Table: Comparative storage device attributes

      Storage                    Relative Latency         Relative
      Technology    Persistent   Controller    Device     Cost
      Disk          Yes          High          High       Low
      SSD           Yes          High          Low        High
      Fusion-IO     Yes          Low           Low        High
      RAM           No           Low           Low        Highest
    • Guerrilla Training
      Wanna learn about more stuff like this? Come to class.
    • Thank you for your participation
      Performance Dynamics Company
      Castro Valley, California
      www.perfdynamics.com
      perfdynamics.blogspot.com
      twitter.com/DrQz
      facebook.com/Performance-Dynamics-Company
      info@perfdynamics.com
      +1-510-537-5758