A Swift Benchmarking Tool

                                      Mark Seger
                                      Hewlett Packard
                                      Cloud Services




4/19/2013          Getput Swift Performance Tools
                                                        1
Problem Statement
• Performance Measurements
     –      Consistent/standard mechanisms for controlled experiments
     –      Ability to easily modify test parameters
     –      Minimal installation, configuration and use
     –      Easy to compare results of multiple runs
     –      Easy to clean up when done
• Benchmarking – run performance tests at scale
     – Repeat tests while increasing demand for resources
     – Parallel tests must be coordinated: start/finish together



Getput Suite
 • Multiple tools organized in a hierarchy
       –    getput: actual workhorse, runs tests on single client
       –    gpmaster: coordinates running getput on multiple clients
       –    gpsuite: defines suites of tests to minimize switch usage
       –    yourscript: can call gpsuite multiple times when desired




getput.py
• Uses swiftclient library
• Lots of switches, lots of different behaviors
      – Standalone
            • Basic: creds, cname, oname, size, num/runtime, tests, rep count
            • More: processes, container type: shared/byproc/bynode, latency
              details, operation logging, and still more
      – Multi-node (controlled by gpmaster)
            • start time, rank




gpmaster.py
• Coordinates running of getput on multiple clients
      – Ensures all start together and finish at approximately the same time
      – Summarizes results as a single line
      – Unlike getput, runs only 1 test at a time; running multiple tests is a job for gpsuite
• More required switches than getput
      –     Credentials file
      –     Rank
      –     Start time
      –     Hosts file or single client name, may need ssh key too
      –     And a few more…
• But rarely run by itself!
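
A minimal sketch of how a coordinated start could work, assuming every client is handed the same future start time along with its own rank. The function names `seconds_until` and `wait_for_start` are hypothetical and are not taken from getput or gpmaster.

```python
import time


def seconds_until(start_time, now):
    """Return how long to sleep before start_time (never negative)."""
    return max(0.0, start_time - now)


def wait_for_start(start_time, rank):
    """Block until the shared start time, then report which rank is starting."""
    time.sleep(seconds_until(start_time, time.time()))
    return "rank %d started at %.3f" % (rank, time.time())


if __name__ == "__main__":
    # Hypothetical: a coordinator picks a start time slightly in the future
    # and passes the same value to every client.
    shared_start = time.time() + 0.2
    print(wait_for_start(shared_start, rank=0))
```

Because every client sleeps until the same wall-clock instant, this approach relies on the clients' clocks being reasonably synchronized.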
gpsuite.py
• Removes complexity of running gpmaster
• Think of macros: gpsuite --suite full
      – Sets of object sizes, e.g. 1k, 10k, 100k, etc
      – Numbers of threads, e.g. 1, 2, 4, 8, etc
            • Distributes threads across multiple clients
• Some runs can take hours with a single command
• Cleans up after each run
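
As an illustration of the "macro" idea, a suite can be thought of as the cross product of object sizes and thread counts, with each thread count spread over the available clients. This is only a sketch under that assumption; `split_threads` and `build_suite` are hypothetical names and do not reflect how gpsuite is actually implemented.

```python
from itertools import product


def split_threads(total_threads, clients):
    """Spread total_threads over clients as evenly as possible."""
    base, extra = divmod(total_threads, len(clients))
    shares = {}
    for i, client in enumerate(clients):
        count = base + (1 if i < extra else 0)
        if count:  # clients with no threads are left out of the test
            shares[client] = count
    return shares


def build_suite(sizes, thread_counts, clients):
    """Return one (size, per-client thread map) entry per test in the suite."""
    return [(size, split_threads(n, clients))
            for size, n in product(sizes, thread_counts)]


if __name__ == "__main__":
    for size, shares in build_suite(["1k", "10k"], [1, 2, 4], ["c1", "c2"]):
        print(size, shares)
```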




Getput Output
Earliest versions
Inst   Start      End      Seconds Tests   Num    MB/S    IOPS Errs
   0   13:59:20   13:59:29    8.57   put   100    0.11   11.67    0
   0   13:59:29   13:59:33    4.03   get   100    0.24   24.83    0
   0   13:59:33   13:59:34    1.80   del   100    0.54   55.68    0



Added latency range in later versions
Inst   Start      End      Seconds Tests   Num    MB/S    IOPS Latency        LatRange Errs
   0   13:59:20   13:59:29    8.57   put   100    0.11   11.67   0.085      0.02-00.22    0
   0   13:59:29   13:59:33    4.03   get   100    0.24   24.83   0.040      0.04-00.05    0
   0   13:59:33   13:59:34    1.80   del   100    0.54   55.68   0.018      0.01-00.05    0



Added CPU and started playing with compression in more recent versions
Inst   Start      End      Seconds Tests   Num    MB/S    IOPS Latency        LatRange Errs Procs OSize   %CPU Comp
   0   13:59:20   13:59:29    8.57   put   100    0.11   11.67   0.085      0.02-00.22    0     1   10k   0.30   no
   0   13:59:29   13:59:33    4.03   get   100    0.24   24.83   0.040      0.04-00.05    0     1   10k   0.39   no
   0   13:59:33   13:59:34    1.80   del   100    0.54   55.68   0.018      0.01-00.05    0     1   10k   0.58   no



Eventually added latency distribution histogram
Latency       LatRange Errs Procs OSize     0.0    0.1   0.2    0.3    0.4     0.5
  0.106     0.02-00.36    0    10   10k     527    396    67     10      0       0
  0.041     0.01-00.07    0    10   10k    1000      0     0      0      0       0
  0.031     0.01-00.16    0    10   10k     964     36     0      0      0       0
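
The columns above can be derived from a list of per-operation latencies plus the elapsed time. The sketch below shows one way to do that, assuming that "10k" means 10240 bytes, that MB means 1024*1024 bytes, that histogram buckets are 0.1 s wide, and that the last bucket also collects anything slower. `summarize` is a hypothetical helper, not getput's own code.

```python
def summarize(latencies, object_bytes, elapsed_secs,
              bucket_width=0.1, nbuckets=6):
    """Compute MB/s, IOPS, mean latency, latency range and a histogram."""
    num = len(latencies)
    mb_per_sec = num * object_bytes / elapsed_secs / (1024 * 1024)
    iops = num / elapsed_secs
    hist = [0] * nbuckets
    for lat in latencies:
        # Latencies beyond the last bucket are counted in the last bucket.
        index = min(int(lat / bucket_width), nbuckets - 1)
        hist[index] += 1
    return {
        "mb_per_sec": mb_per_sec,
        "iops": iops,
        "latency": sum(latencies) / num,
        "lat_range": (min(latencies), max(latencies)),
        "hist": hist,
    }


if __name__ == "__main__":
    # 100 PUTs of 10k objects in 8.57 seconds, as in the first output row.
    result = summarize([0.085] * 100, 10240, 8.57)
    print(round(result["mb_per_sec"], 2), round(result["iops"], 2))
```

The histogram is what separates a run with a few extreme outliers from one that is uniformly slow, which the average and even the min/max range cannot do.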


Observations
  • Swift scaling is excellent
        – With multiple clients performance grows close to linearly
        – With single client and multiple threads
              • Smaller objects scale very well, even with lots of threads
              • Larger objects hit a CPU or network wall!
  • Both compression and encryption cost CPU
        – Limits large object bandwidth, less so with smaller ones
        – Early testing: turning compression off gave up to a 2X boost for large objects
              • Similar behavior when using http instead of https
        – Only just started looking at changing ciphers

Recommendation: make compression, ssl and cipher choice optional in swiftclient

Look at the network during tests
  This is always true for incompressible objects: upload speed ~= network bandwidth
   segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n1 -s100m -tp --comp
   Inst Start    End      Seconds Tests   Num   MB/S    IOPS Latency    LatRange
      0 15:52:15 15:52:20    5.85   put     1 17.10     0.17   5.800 5.80-05.80



   segerm@az1-nv-compute-0001:~$ collectl
   waiting for 1 second sample...
   #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
   #cpu sys inter ctxsw KBRead Reads KBWrit Writes     KBIn PktIn KBOut PktOut
      0   0 1342    1078      0      0     20      2      0      4     70      56
      0   0   261    304      0      0     20      2      0      3      0       2
      1   0   580    578      0      0      0      0      0      5      0       3
      3   0 4697     780      0      0      0      0    135   2010 15956    11517
      4   0 5859    1324      0      0      0      0    138   2345 19037    13708
      4   0 5168     609      0      0     48      6    138   2354 19036    13706
      4   0 5597     993      0      0      4      1    138   2351 19053    13717
      4   0 5129     538      0      0      0      0    139   2366 19053    13716
      3   0 4579    1070      0      0      0      0    107   1817 14554    10495
      0   0   154    201      0      0     20      2      0      1      0       1




Compression can be your friend too
                      …but only for compressible objects
 segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n5 -s100m -tp --otype s --comp
 Inst Start    End      Seconds Tests        Num     MB/S        IOPS Latency     LatRange
    0 16:00:19 16:00:29   10.33   put          5    48.42        0.48   2.060   2.03-02.09



 #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
 #cpu sys inter ctxsw KBRead Reads KBWrit Writes     KBIn PktIn KBOut PktOut
    0   0   223    292      0      0     56      9      0      1      0       1
    1   0   618    565      0      0      0      0     14     20      2      16
    3   0 1380     694      0      0      0      0     14    167    605     317
    4   0 1846    1194      0      0      0      0     11    165    508     304
    3   1 9799    1008      0      0     12      2    173   2949    848    2949
    4   1 11071    996      0      0      0      0    198   3377    607    3376



                             Look what the proxy is doing
 #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
 #cpu sys inter ctxsw KBRead Reads KBWrit Writes     KBIn PktIn KBOut PktOut
    1   0 1512     523      0      0     16      3      5     36      8      34
    8   2 6377     892      0      0      0      0    658   6588 171130 117279
    7   2 5488    1835      0      0      8      1    519   4933 150290 103175
    6   2 8772    6113      0      0      0      0    744   8679 162089 114059

                                                                                         3 Obj Servers
                      Another reason to make compression optional!
Let’s talk about latency
• Latency metrics originally based on averages
      – Like coarse monitoring, great for trends but poor for exceptions
      – Soon realized more detail was needed
• Consider the following. What does it really mean?
      – Is the only problem that one entry of 0.083?




On closer inspection
• The first 4 entries don’t look too bad
• Even the bottom one isn’t that horrible




Ranges shed more light
• Even though the first 4 lines have close latencies, look at
  their max values
• Now we know why line 5 is so bad
• Even line 6 has a very high max




But even that’s not enough
•   Min/Max doesn’t tell us how many outliers there are
•   Lines 2 and 4 have almost 50 in the 0.5 bucket
•   Line 5 has 6 PUTs >4 seconds
•   Line 6 is all over the place




Example 1: Latency of 0.04 too high!
• When looking at 1k, 10k and 100k GETS, noticed IOPS for 10k
  were lower!
      – Great reason to look at more than MB/sec
• After much digging discovered this only applied to object sizes
  7888 -> 22469 bytes
      – This could only have been found by running sets of tests and looking
        very closely at the numbers
• What’s going on here?
      –     We run pound on proxies to support multiple connection ports
      –     Proxy does fast get and passes data to pound over loopback address
      –     Max segsize for loopback >> network MSS
      –     Eventlet uses 8192 byte buffers
      –     Nagle algorithm: sizes > 8192 and < ~8192+MSS bytes hit a delayed ACK
• Eventlet needs bigger buffers? Turn off Nagle?
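
For reference, turning off Nagle on a TCP socket is done with the standard `TCP_NODELAY` socket option. The sketch below only demonstrates that option on a plain Python socket; it is not how the proxy, pound or eventlet are configured, and `make_nodelay_socket` is a hypothetical helper name.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of holding them
    # back until outstanding data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```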
Example 2: Latency 0.5
• Observed a number of these in small object PUTs
• Caused by a proxy timeout connecting to obj server
• Might be worth looking into ways to reduce and/or
  not try to re-contact a non-responsive server




Example 3: Latency 6 Secs
•   These occur less frequently, but do happen
•   Traced back to disk error on object server
•   BUT the other 2 object servers responded in < 1 sec
•   Think about how many IOPS are being lost!

     Might it be worth it to return after 2 successes?
       Maybe at least ignore writes to that disk?
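
To make the "return after 2 successes" question concrete: if a PUT waits for a fixed number of replica acknowledgements, the latency the client sees is that of the slowest acknowledgement it waits for. The sketch below uses made-up replica latencies purely for illustration; `put_latency` is a hypothetical helper and does not describe Swift's actual behavior.

```python
def put_latency(replica_latencies, required):
    """Latency seen by the client if it waits for `required` replica acks."""
    return sorted(replica_latencies)[required - 1]


if __name__ == "__main__":
    # Illustrative values only: two healthy servers and one with a bad disk.
    replicas = [0.05, 0.06, 6.0]
    print(put_latency(replicas, 3))  # wait for all three
    print(put_latency(replicas, 2))  # return after two successes
```

With these example values, waiting for all three replicas costs 6.0 seconds while returning after two costs 0.06 seconds.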



So what’s next for latency?
• Investigate why some ops have even longer latencies
• Added another switch to getput! --logops
      – Extended put_object() to return transaction ID
      – Writes detailed log records for every operation
      – Makes it possible for longer latency transactions to be traced
segerm@az1-nv-compute-0000:~$ more /tmp/getput-p-0-1363878303.log
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4   o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4   o-0-1-2
15:05:03.627 1363878303.627218 1363878303.574032 0.053174 eb4194b73e46f52f774a63fa552755d4   o-0-1-3
15:05:03.686 1363878303.686175 1363878303.627244 0.058918 eb4194b73e46f52f774a63fa552755d4   o-0-1-4
15:05:03.747 1363878303.746874 1363878303.686201 0.060661 eb4194b73e46f52f774a63fa552755d4   o-0-1-5
15:05:03.804 1363878303.804106 1363878303.746900 0.057194 eb4194b73e46f52f774a63fa552755d4   o-0-1-6
15:05:03.866 1363878303.866148 1363878303.804133 0.061979 eb4194b73e46f52f774a63fa552755d4   o-0-1-7
15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4   o-0-1-8

      Recommendation: GET, PUT and DEL calls should return transaction IDs
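
A log like the one above is easy to filter for slow operations. The sketch below assumes each record has six whitespace-separated fields: wall-clock time, two epoch timestamps (the first appears to be the end and the second the start, since their difference roughly matches the fourth field), the latency in seconds, an identifier whose meaning the slide does not state, and the object name. `parse_logops` and `slow_ops` are hypothetical helpers, not part of getput.

```python
SAMPLE = """\
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4   o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4   o-0-1-2
"""


def parse_logops(lines):
    """Parse log lines into dicts; the column meanings are assumptions."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        records.append({
            "clock": fields[0],
            "end": float(fields[1]),
            "start": float(fields[2]),
            "latency": float(fields[3]),
            "ident": fields[4],  # opaque identifier, meaning not stated
            "object": fields[5],
        })
    return records


def slow_ops(records, threshold):
    """Return the records whose latency exceeds threshold seconds."""
    return [r for r in records if r["latency"] > threshold]


if __name__ == "__main__":
    for rec in slow_ops(parse_logops(SAMPLE.splitlines()), 0.06):
        print(rec["object"], rec["latency"])
```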

swcmd: a nifty helper utility
• One challenge of benchmarking can be LOTS of
  containers and objects needing cleanup
      – Can have dozens to 100s of containers
      – Can have Ks to 100Ks of objects
      – Swift client too slow for deletes!
• Swift client utility could use some more functionality
      – How about displaying numbers of objects in containers?
      – Container sizes and even dates?
      – Same things when listing containers
      – What about parallel or even wildcard listing/deletes?
            • Only parallelizes for >1K objects in a container
            • Uses multiprocessing; can hit 300-400 deletes/sec
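
The parallel-delete idea can be sketched with Python's `multiprocessing.Pool`: stay serial for small containers and split the object names across worker processes above a threshold. This is only an illustration under those assumptions; `plan_chunks`, `delete_chunk` and `parallel_delete` are hypothetical names, and `delete_chunk` is a stand-in that performs no real deletes.

```python
from multiprocessing import Pool


def plan_chunks(names, nprocs, threshold=1000):
    """Split names into per-process chunks; stay serial at or below threshold."""
    if len(names) <= threshold or nprocs <= 1:
        return [list(names)]
    return [list(names[i::nprocs]) for i in range(nprocs)]


def delete_chunk(names):
    """Hypothetical stand-in for the real per-object delete calls."""
    return len(names)  # pretend every delete succeeded


def parallel_delete(names, nprocs=4):
    """Delete names, using worker processes only for large containers."""
    chunks = plan_chunks(names, nprocs)
    if len(chunks) == 1:
        return delete_chunk(chunks[0])
    with Pool(len(chunks)) as pool:
        return sum(pool.map(delete_chunk, chunks))


if __name__ == "__main__":
    objects = ["o-%d" % i for i in range(2500)]
    print(parallel_delete(objects))
```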

Examples
                     swcmd ls
                     63482   61M 2013-03-21 16:19:12 qc-1363882747
                        49    4G 2013-03-09 13:13:36 vlat-1362834811
                         0     0 2013-03-20 22:05:06 vlat-1363817101
                         1    10 2013-03-15 13:58:37 xxx-0-0
                         1  200M 2013-03-11 12:28:16 xyxxy
                         2  200M 2013-03-11 12:29:01 xyzzy
                      2901  702M 2013-02-12 16:34:19 zzz



 swcmd      -p   ls xyz        #   list containers starting with xyz
 swcmd      -f   rc zzz        #   force removal of zzz even though not empty
 swcmd      -p   pf x          #   force removal of ALL containers starting with x
 swcmd      rm   xyzzy/xyzzy   #   remove specific object



        Recommendation: add these types of features to the swift utility


Questions?





More Related Content

What's hot

Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
Brendan Gregg
 
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
Amazon Web Services
 
200.1,2-Capacity Planning
200.1,2-Capacity Planning200.1,2-Capacity Planning
200.1,2-Capacity Planning
behrad eslamifar
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
Ed Hunter
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
Brendan Gregg
 
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
Continuent
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?
aragozin
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
Brendan Gregg
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
aragozin
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profiler
aragozin
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
Brendan Gregg
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
Brendan Gregg
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
Brendan Gregg
 
FreeBSD 2014 Flame Graphs
FreeBSD 2014 Flame GraphsFreeBSD 2014 Flame Graphs
FreeBSD 2014 Flame Graphs
Brendan Gregg
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
aragozin
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
Yutaka Yasuda
 
Oracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention TroubleshootingOracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention Troubleshooting
Tanel Poder
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
Brendan Gregg
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems Performance
Brendan Gregg
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
Brendan Gregg
 

What's hot (20)

Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
 
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014
 
200.1,2-Capacity Planning
200.1,2-Capacity Planning200.1,2-Capacity Planning
200.1,2-Capacity Planning
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
 
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
Training Slides: Intermediate 204: Identifying and Resolving Issues with Tung...
 
What every Java developer should know about network?
What every Java developer should know about network?What every Java developer should know about network?
What every Java developer should know about network?
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
 
Java profiling Do It Yourself
Java profiling Do It YourselfJava profiling Do It Yourself
Java profiling Do It Yourself
 
DIY Java Profiler
DIY Java ProfilerDIY Java Profiler
DIY Java Profiler
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
 
Stop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
 
FreeBSD 2014 Flame Graphs
FreeBSD 2014 Flame GraphsFreeBSD 2014 Flame Graphs
FreeBSD 2014 Flame Graphs
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
Oracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention TroubleshootingOracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention Troubleshooting
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems Performance
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 

Similar to Getput suite

Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
Brendan Gregg
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
Brendan Gregg
 
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
Puppet
 
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
Andrew Zakordonets
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
ScyllaDB
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
Brendan Gregg
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoring
Miguel Rodriguez
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoring
Iben Rodriguez
 
AEO Training - 2023.pdf
AEO Training - 2023.pdfAEO Training - 2023.pdf
AEO Training - 2023.pdf
Mohamed Taoufik TEKAYA
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey
 
Performance tunning
Performance tunningPerformance tunning
Performance tunning
lokesh777
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and Tuning
Milind Koyande
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Anne Nicolas
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
QAware GmbH
 
How deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performanceHow deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performance
Cumulus Networks
 
Lxbrand
LxbrandLxbrand
Lxbrand
mrbruning
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Load testing with Blitz
Load testing with BlitzLoad testing with Blitz
Load testing with Blitz
Lindsay Holmwood
 
Trying and evaluating the new features of GlusterFS 3.5
Trying and evaluating the new features of GlusterFS 3.5Trying and evaluating the new features of GlusterFS 3.5
Trying and evaluating the new features of GlusterFS 3.5
Keisuke Takahashi
 
Load Test Like a Pro
Load Test Like a ProLoad Test Like a Pro
Load Test Like a Pro
Rob Harrop
 

Similar to Getput suite (20)

Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
 
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
Writing Serverless Application in Java with comparison of 3 approaches: AWS S...
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoring
 
Fine grained monitoring
Fine grained monitoringFine grained monitoring
Fine grained monitoring
 
AEO Training - 2023.pdf
AEO Training - 2023.pdfAEO Training - 2023.pdf
AEO Training - 2023.pdf
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
Performance tunning
Performance tunningPerformance tunning
Performance tunning
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and Tuning
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
How deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performanceHow deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performance
 
Lxbrand
LxbrandLxbrand
Lxbrand
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Load testing with Blitz
Load testing with BlitzLoad testing with Blitz
Load testing with Blitz
 
Trying and evaluating the new features of GlusterFS 3.5
Trying and evaluating the new features of GlusterFS 3.5Trying and evaluating the new features of GlusterFS 3.5
Trying and evaluating the new features of GlusterFS 3.5
 
Load Test Like a Pro
Load Test Like a ProLoad Test Like a Pro
Load Test Like a Pro
 

Getput suite

  • 1. A Swift Benchmarking Tool
      Mark Seger, Hewlett Packard Cloud Services
      4/19/2013, Getput Swift Performance Tools
  • 2. Problem Statement
      • Performance Measurements
          – Consistent/standard mechanisms for controlled experiments
          – Ability to easily modify test parameters
          – Minimal installation, configuration and use
          – Easy to compare results of multiple runs
          – Easy to clean up when done
      • Benchmarking: run performance tests at scale
          – Repeat tests while increasing demand for resources
          – Parallel tests must be coordinated: start/finish together
  • 3. Getput Suite
      • Multiple tools organized in a hierarchy
          – getput: actual workhorse, runs tests on a single client
          – gpmaster: coordinates running getput on multiple clients
          – gpsuite: defines suites of tests to minimize switch usage
          – yourscript: can call gpsuite multiple times when desired
  • 4. getput.py
      • Uses swiftclient library
      • Lots of switches, lots of different behaviors
          – Standalone
              • Basic: creds, cname, oname, size, num/runtime, tests, rep count
              • More: processes, container type: shared/byproc/bynode, latency details, operation logging, and still more
          – Multi-node (controlled by gpmaster)
              • start time, rank
  • 5. gpmaster.py
      • Coordinates running of getput on multiple clients
          – Ensures all start together and finish at approximately the same time
          – Summarizes results as a single line
          – Unlike getput, runs only 1 test at a time; running several is a job for gpsuite
      • More required switches than getput
          – Credentials file
          – Rank
          – Start time
          – Hosts file or single client name, may need ssh key too
          – And a few more…
      • But rarely run by itself!
  • 6. gpsuite.py
      • Removes complexity of running gpmaster
      • Think of macros: gpsuite –suite full
          – Sets of object sizes, eg: 1k, 10k, 100k, etc
          – Numbers of threads, eg: 1, 2, 4, 8, etc
      • Distributes threads across multiple clients
      • Some runs can take hours with a single command
      • Cleans up after each run
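To make the "macro" idea concrete, here is a minimal sketch of how a suite might expand into individual runs: every object size is paired with every thread count, and each thread count is divided across the clients. The slides do not say how gpsuite actually distributes threads, so the even split, the function names `split_threads` and `suite_runs`, and the client names are all assumptions made for illustration.

```python
from itertools import product


def split_threads(total_threads, clients):
    """Divide total_threads across clients as evenly as possible (assumed policy)."""
    base, extra = divmod(total_threads, len(clients))
    # The first `extra` clients each take one additional thread.
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(clients)}


def suite_runs(sizes, thread_counts, clients):
    """Yield one (size, threads, per-client split) tuple per test in the suite."""
    for size, threads in product(sizes, thread_counts):
        yield size, threads, split_threads(threads, clients)


# Hypothetical suite: sizes and thread counts taken from the slide's examples.
runs = list(suite_runs(["1k", "10k", "100k"], [1, 2, 4, 8], ["client-a", "client-b"]))
for size, threads, split in runs:
    print(size, threads, split)
```

Three sizes and four thread counts already give twelve coordinated runs, which is why a single suite command can take hours.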
  • 10. Getput Output
      Earliest versions
        Inst Start    End      Seconds Tests Num MB/S  IOPS Errs
           0 13:59:20 13:59:29    8.57 put   100 0.11 11.67    0
           0 13:59:29 13:59:33    4.03 get   100 0.24 24.83    0
           0 13:59:33 13:59:34    1.80 del   100 0.54 55.68    0
      Added latency range in later versions
        Inst Start    End      Seconds Tests Num MB/S  IOPS Latency LatRange   Errs
           0 13:59:20 13:59:29    8.57 put   100 0.11 11.67   0.085 0.02-00.22    0
           0 13:59:29 13:59:33    4.03 get   100 0.24 24.83   0.040 0.04-00.05    0
           0 13:59:33 13:59:34    1.80 del   100 0.54 55.68   0.018 0.01-00.05    0
      Added CPU and started playing with compression in more recent versions
        Inst Start    End      Seconds Tests Num MB/S  IOPS Latency LatRange   Errs Procs OSize %CPU Comp
           0 13:59:20 13:59:29    8.57 put   100 0.11 11.67   0.085 0.02-00.22    0     1 10k   0.30 no
           0 13:59:29 13:59:33    4.03 get   100 0.24 24.83   0.040 0.04-00.05    0     1 10k   0.39 no
           0 13:59:33 13:59:34    1.80 del   100 0.54 55.68   0.018 0.01-00.05    0     1 10k   0.58 no
      Eventually added latency distribution histogram
        Latency LatRange   Errs Procs OSize  0.0 0.1 0.2 0.3 0.4 0.5
          0.106 0.02-00.36    0    10 10k    527 396  67  10   0   0
          0.041 0.01-00.07    0    10 10k   1000   0   0   0   0   0
          0.031 0.01-00.16    0    10 10k    964  36   0   0   0   0
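As a rough check on how the rate columns relate to the others, the sketch below recomputes IOPS and MB/S for the first PUT row (100 operations, 8.57 seconds, 10k objects). The definitions are inferred from the numbers rather than stated on the slides: IOPS is taken as Num / Seconds, "10k" as 10 × 1024 bytes, and one MB as 2^20 bytes. The function name `rates` is illustrative.

```python
def rates(num_ops, seconds, object_bytes):
    """Return (IOPS, MB/s) under the assumed definitions described above."""
    iops = num_ops / seconds
    mb_per_sec = iops * object_bytes / (1024 * 1024)
    return iops, mb_per_sec


# First row of the table: 100 PUTs of 10k objects in 8.57 seconds.
iops, mbs = rates(100, 8.57, 10 * 1024)
print(f"IOPS={iops:.2f} MB/S={mbs:.2f}")
```

Under these assumptions the result rounds to the 11.67 IOPS and 0.11 MB/S shown in the table. The GET and DEL rows come out a few hundredths lower than printed, presumably because the Seconds column is itself rounded.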
  • 11. Observations
      • Swift multi-scaling excellent
          – With multiple clients performance grows close to linearly
          – With single client and multiple threads
              • Smaller objects scale very well with even lots of threads
              • Larger objects hit either a CPU or network wall!
      • Both compression and encryption cost CPU
          – Limits large object bandwidth, less so with smaller ones
          – Early testing: disabling compression gave up to a 2X boost for large objects
      • Similar behavior when using http instead of https
          – Only just started looking at changing ciphers
      Recommendation: make compression, ssl and cipher choice optional in swiftclient
  • 12. Look at the network during tests
      This is always true for incompressible objects: upload speed ~= network bandwidth
      segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n1 -s100m -tp --comp
      Inst Start    End      Seconds Tests Num  MB/S IOPS Latency LatRange
         0 15:52:15 15:52:20    5.85 put     1 17.10 0.17   5.800 5.80-05.80
      segerm@az1-nv-compute-0001:~$ collectl
      waiting for 1 second sample...
      #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
      #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
         0   0  1342   1078      0      0     20      2      0      4     70      56
         0   0   261    304      0      0     20      2      0      3      0       2
         1   0   580    578      0      0      0      0      0      5      0       3
         3   0  4697    780      0      0      0      0    135   2010  15956   11517
         4   0  5859   1324      0      0      0      0    138   2345  19037   13708
         4   0  5168    609      0      0     48      6    138   2354  19036   13706
         4   0  5597    993      0      0      4      1    138   2351  19053   13717
         4   0  5129    538      0      0      0      0    139   2366  19053   13716
         3   0  4579   1070      0      0      0      0    107   1817  14554   10495
         0   0   154    201      0      0     20      2      0      1      0       1
  • 13. Compression can be your friend too
      ...but only for compressible objects
      segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n5 -s100m -tp --otype s --comp
      Inst Start    End      Seconds Tests Num  MB/S IOPS Latency LatRange
         0 16:00:19 16:00:29   10.33 put     5 48.42 0.48   2.060 2.03-02.09
      #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
      #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
         0   0   223    292      0      0     56      9      0      1      0       1
         1   0   618    565      0      0      0      0     14     20      2      16
         3   0  1380    694      0      0      0      0     14    167    605     317
         4   0  1846   1194      0      0      0      0     11    165    508     304
         3   1  9799   1008      0      0     12      2    173   2949    848    2949
         4   1 11071    996      0      0      0      0    198   3377    607    3376
      Look what the proxy is doing
      #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
      #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
         1   0  1512    523      0      0     16      3      5     36      8      34
         8   2  6377    892      0      0      0      0    658   6588 171130  117279
         7   2  5488   1835      0      0      8      1    519   4933 150290  103175
         6   2  8772   6113      0      0      0      0    744   8679 162089  114059
      Callout on slide: 3 Obj Servers
      Another reason to make compression optional!
  • 14. Let’s talk about latency
      • Latency metrics originally based on averages
          – Like coarse monitoring, great for trends but poor for exceptions
          – Soon realized more detail was needed
      • Consider the following. What does it really mean?
          – Is the only problem that one entry of 0.083?
  • 15. On closer inspection
      • The first 4 entries don’t look too bad
      • Even the bottom one isn’t that horrible
  • 16. Ranges shed more light
      • Even though the first 4 lines have close latencies, look at their max values
      • Now we know why line 5 is so bad
      • Even line 6 has a very high max
  • 17. But even that’s not enough
      • Min/Max doesn’t tell us how many outliers there are
      • Lines 2 and 4 have almost 50 in the .5 bucket
      • Line 5 has 6 PUTs >4 seconds
      • Line 6 is all over the place
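The progression from averages to ranges to histograms can be illustrated with a small sketch. It assumes the bucket labels in the getput output (0.0, 0.1, ... 0.5) are lower bounds in seconds and that the last bucket collects everything at or above 0.5; the slides do not spell out the bucketing rule, and the function name and sample latencies are made up for illustration.

```python
def latency_histogram(latencies, edges=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Count latencies per bucket; each edge is an assumed lower bound in seconds."""
    counts = [0] * len(edges)
    for lat in latencies:
        idx = 0
        # Pick the last bucket whose lower bound does not exceed the latency.
        for i, edge in enumerate(edges):
            if lat >= edge:
                idx = i
        counts[idx] += 1
    return counts


# Hypothetical sample: mostly fast operations plus one multi-second outlier.
sample = [0.02, 0.05, 0.15, 0.36, 4.2]
average = sum(sample) / len(sample)
print("avg", round(average, 3), "range", min(sample), max(sample))
print("histogram", latency_histogram(sample))
```

The average alone hides the shape of the distribution, the range shows that an outlier exists, and only the histogram shows how many operations landed in each bucket.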
  • 19. Example 1: Latency of 0.04 too high!
      • When looking at 1k, 10k and 100k GETS, noticed IOPS for 10k were lower!
          – Great reason to look at more than MB/sec
      • After much digging discovered this only applied to object sizes 7888 -> 22469 bytes
          – This could only have been found by running sets of tests and looking very closely at the numbers
      • What’s going on here?
          – We run pound on proxies to support multiple connection ports
          – Proxy does fast get and passes data to pound over loopback address
          – Max segsize for loopback >> network MSS
          – Eventlet uses 8192 byte buffers
          – Nagle algorithm: bytes > 8192 and ~<8192+MSS have delayed ACK
      • Eventlet needs bigger buffers? Turn off nagle?
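For readers unfamiliar with the "turn off nagle" option, the sketch below shows the generic way to disable Nagle's algorithm on a TCP socket in Python using the standard `TCP_NODELAY` option. It is only an illustration of the socket option itself: it says nothing about where such a change would go in Swift, eventlet or pound, and the helper name `make_nodelay_socket` is hypothetical.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes are sent immediately instead of
    # being held back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm it took effect (nonzero means enabled).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
print("TCP_NODELAY enabled:", bool(nodelay))
```

The trade-off is more small packets on the wire, so the alternative raised on the slide (larger buffers) may be preferable where the write sizes can be controlled.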
  • 20. Example 2: Latency 0.5
      • Observed a number of these in small object PUTs
      • Caused by a proxy timeout connecting to obj server
      • Might be worth looking into ways to reduce and/or not try to re-contact a non-responsive server
  • 21. Example 3: Latency 6 Secs
      • These occur less frequently, but do happen
      • Traced back to disk error on object server
      • BUT the other 2 object servers responded in < 1sec
      • Think about how many IOPS are being lost!
      Might it be worth it to return after 2 successes?
      Maybe at least ignore writes to that disk?
  • 22. So what’s next for latency?
      • Investigate why some ops have even longer latencies
      • Added another switch to getput! --logops
          – Extended put_object() to return transaction ID
          – Writes detailed log records for every operation
          – Makes it possible for longer latency transactions to be traced
      segerm@az1-nv-compute-0000:~$ more /tmp/getput-p-0-1363878303.log
      15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4 o-0-1-1
      15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4 o-0-1-2
      15:05:03.627 1363878303.627218 1363878303.574032 0.053174 eb4194b73e46f52f774a63fa552755d4 o-0-1-3
      15:05:03.686 1363878303.686175 1363878303.627244 0.058918 eb4194b73e46f52f774a63fa552755d4 o-0-1-4
      15:05:03.747 1363878303.746874 1363878303.686201 0.060661 eb4194b73e46f52f774a63fa552755d4 o-0-1-5
      15:05:03.804 1363878303.804106 1363878303.746900 0.057194 eb4194b73e46f52f774a63fa552755d4 o-0-1-6
      15:05:03.866 1363878303.866148 1363878303.804133 0.061979 eb4194b73e46f52f774a63fa552755d4 o-0-1-7
      15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4 o-0-1-8
      Recommendation: GET, PUT and DEL calls should return transaction IDs
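A per-operation log like this is easy to post-process. The sketch below parses records in the six-column layout shown above and picks out operations at or above a latency threshold. The slides do not label the columns, so the field names are guesses: the second and third columns look like end and start timestamps (their difference is close to the fourth column), and the 32-character hex field is simply called `ident` here. The threshold and sample lines are chosen for illustration.

```python
from typing import NamedTuple


class OpRecord(NamedTuple):
    clock: str      # wall-clock time of the log entry
    end: float      # assumed: epoch time when the operation finished
    start: float    # assumed: epoch time when the operation started
    latency: float  # seconds, as logged
    ident: str      # 32-character hex identifier (meaning not confirmed)
    obj: str        # object name


def parse_line(line):
    """Split one whitespace-separated log record into an OpRecord."""
    clock, end, start, latency, ident, obj = line.split()
    return OpRecord(clock, float(end), float(start), float(latency), ident, obj)


def slow_ops(lines, threshold):
    """Return the records whose logged latency is at or above threshold seconds."""
    records = [parse_line(line) for line in lines if line.strip()]
    return [r for r in records if r.latency >= threshold]


# Three records copied from the log excerpt above.
SAMPLE = """\
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4 o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4 o-0-1-2
15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4 o-0-1-8
"""

slow = slow_ops(SAMPLE.splitlines(), threshold=0.06)
for rec in slow:
    print(rec.clock, rec.obj, rec.latency)
```

With a real log the threshold would be set much higher, for example to isolate the 0.5 second and multi-second operations from the earlier examples, so that their identifiers can be looked up in the server logs.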
  • 23. swcmd: a nifty helper utility
      • One challenge of benchmarking can be LOTS of containers and objects needing cleanup
          – Can have dozens to 100s of containers
          – Can have Ks to 100Ks of objects
          – Swift client too slow for deletes!
      • Swift client utility could use some more functionality
          – How about displaying numbers of objects in containers?
          – Container sizes and even dates?
          – Same things when listing containers
          – What about parallel or even wild card listing/deletes?
      • Only parallelizes for >1K objects in a container
      • Uses multiprocessing; can hit 300-400 deletes/sec
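The cleanup strategy described here (delete serially for small containers, fan out only above roughly 1K objects) can be sketched as follows. This is not swcmd's code: it uses a thread pool from `concurrent.futures` instead of the multiprocessing module the slide mentions, and the delete operation is passed in as a callable so the example runs without a Swift cluster. `delete_all`, `PARALLEL_THRESHOLD`, and the in-memory `store` are hypothetical names for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 1000  # assumed cutoff, mirroring the ">1K objects" rule


def delete_all(names, delete_one, workers=8):
    """Delete every name, in parallel only when the list is large enough."""
    names = list(names)
    if len(names) <= PARALLEL_THRESHOLD:
        for name in names:
            delete_one(name)
        return len(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all deletes and re-raises the first exception, if any.
        list(pool.map(delete_one, names))
    return len(names)


# Stand-in for a container: an in-memory set guarded by a lock.
store = {f"o-{i}" for i in range(2500)}
lock = threading.Lock()


def fake_delete(name):
    with lock:
        store.discard(name)


deleted = delete_all(sorted(store), fake_delete)
print("deleted", deleted, "remaining", len(store))
```

In a real tool `delete_one` would wrap the storage client's delete call, and the worker count would be tuned against the cluster rather than fixed at 8.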
  • 24. Examples
      swcmd ls
        63482   61M  2013-03-21 16:19:12  qc-1363882747
           49    4G  2013-03-09 13:13:36  vlat-1362834811
            0     0  2013-03-20 22:05:06  vlat-1363817101
            1    10  2013-03-15 13:58:37  xxx-0-0
            1  200M  2013-03-11 12:28:16  xyxxy
            2  200M  2013-03-11 12:29:01  xyzzy
         2901  702M  2013-02-12 16:34:19  zzz
      swcmd -p ls xyz       # list containers starting with xyz
      swcmd -f rc zzz       # force removal of zzz even though not empty
      swcmd -p pf x         # force removal of ALL containers starting with x
      swcmd rm xyzzy/xyzzy  # remove specific object
      Recommendation: add these types of features to the swift utility
  • 25. Questions?