zEnterprise – Freedom Through Design

              SLES 11 SP2 Performance
            Evaluation for Linux on System z

                           Christian Ehrhardt
               IBM Germany Research & Development GmbH




Agenda
      Performance Evaluation
          ►   Environment
          ►   Changes one should be aware of


      Performance Evaluation Summary
          ►   Improvements and degradations per area
          ►   Summarized comparison




     IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
     Business Machines Corp., registered in many jurisdictions worldwide. Other product and
     service names might be trademarks of IBM or other companies. A current list of IBM
     trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.



Environment
     ►   Hardware Platform – System z10
           ●   FICON 8 Gbps
           ●   FCP 8 Gbps
           ●   HiperSockets
           ●   OSA Express 3 1GbE + 10GbE
     ►   Software Platform
           ●   z/VM 5.4
           ●   LPAR
     ►   Storage – DS8300 (2107-922)
           ●   FICON 8 Gbps
           ●   FCP 8 Gbps

     ►   Hardware Platform – System zEnterprise (z196)
           ●   FICON 8 Gbps
           ●   FCP 8 Gbps
           ●   HiperSockets
           ●   OSA Express 3 1GbE + 10GbE
     ►   Software Platform
           ●   z/VM 6.1
           ●   LPAR
     ►   Storage – DS8800
           ●   FICON 8 Gbps
           ●   FCP 8 Gbps




Compared Distribution Levels

 Compared Distribution Levels
       ►   SLES 11 SP1 (2.6.32.12-0.6-default)
       ►   SLES 11 SP2 (3.0.13-0.27-default)

 Measurements
       ►   Base regression set covering most customer use cases as well as possible
       ►   Focus on areas where performance issues are more likely
       ►   Just the top level summary, based on thousands of comparisons
       ►   Special case studies for non-common features and setups

 Terminology
       ►   Throughput – “How much could I transfer in X seconds?”
       ►   Latency – “How long do I have to wait for event X?”
       ►   Normalized CPU consumption – “How much CPU per byte do I need?”




New process scheduler (CFS)
      Goals of CFS
         ►   Models “ideal, precise multi-tasking CPU”
         ►   Fair scheduling based on virtual runtime


      Changes you might notice when switching from O(1) to CFS
         ►   Lower response times for I/O, signals, …
         ►   Balanced distribution of process time-slices
         ►   Improved distribution across processors
         ►   Shorter consecutive time-slices
         ►   More context switches


      Improved balancing
         ►   Topology support can be activated via the topology=on kernel parameter (see the sketch below)
         ►   This makes the scheduler aware of the CPU hierarchy
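
      A minimal sketch of enabling this with a zipl-based boot setup (the section name and root
      device below are placeholders; adapt them to your existing boot entry):

         # /etc/zipl.conf – append topology=on to the kernel parameters
         [SLES11SP2]
             image = /boot/image
             ramdisk = /boot/initrd
             parameters = "root=/dev/dasda1 topology=on"

         zipl && reboot
         # After the reboot, the BOOK/MC domains should appear (requires CONFIG_SCHED_DEBUG):
         cat /proc/sys/kernel/sched_domain/cpu0/domain*/name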


      You really get something from fairness as well
         ►   Improved worst-case latency and throughput
         ►   This lets CFS ease QoS commitments
Topology of a zEnterprise System
      [Diagram: the zEnterprise cache hierarchy – private L1/L2 per CPU, shared L3 per chip,
      shared L4 per book, main memory on top – mapped (HW → Linux) onto scheduling domains:
      one run queue per CPU, MC domains per chip, BOOK domains per book]


      Recreate the HW layout in the scheduler
             ►   Off in z/VM guests, since there is no virtual topology information
             ►   Ability to group (recommended for IPC-heavy loads) or spread (recommended for cache-hungry loads) work
             ►   Unintended asymmetries are now known to the system


      Tunable, but complex (see the sketch below)
             ►   The /proc/sys/kernel/sched_* files contain tunables for decisions regarding the run queues
             ►   /proc/sys/kernel/sched_domain/... provides options for the scheduling domains
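
      A rough sketch of how these knobs can be inspected and changed (the tunable names vary
      between kernel versions; the value below is purely illustrative, not a recommendation):

         # List the run-queue related CFS tunables
         sysctl -a 2>/dev/null | grep '^kernel.sched_'
         # Example: coarser time-slices trade latency for fewer context switches
         sysctl -w kernel.sched_min_granularity_ns=4000000
         # Per-domain options, e.g. the flags of cpu0's first scheduling domain
         cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags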

Benchmark descriptions - File system / LVM / Scaling

 Filesystem benchmark dbench
       ►   Emulation of the Netbench benchmark
       ►   Generates file system load on the Linux VFS
       ►   Issues the same I/O calls as the smbd server in Samba (without the networking calls)

 Simulation
       ►   Workload simulates both client and server
       ►   Mixed file operations workload for each process: create, write, read, append, delete
       ►   Measures throughput of transferred data
       ►   Two setup scenarios (see the sketch below)
              ● Scaling – load fits in cache, so mainly memory operations
                2, 4, 8, 16 CPUs, 8 GiB memory, scaling from 1 to 40 processes
              ● Low main memory and LVM setup for mixed-I/O LVM performance
                8 CPUs, 2 GiB memory, scaling from 4 to 62 processes
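
 A representative invocation for the two scenarios might look like this (assuming dbench 4.x;
 the mount points and the client counts per step are illustrative, not the exact measurement
 parameters):

    # Scaling scenario: load fits in the page cache, e.g. 40 clients for 300 seconds
    dbench -t 300 -D /mnt/testfs 40
    # LVM scenario: same load, but -D points to a file system on the LVM volume
    dbench -t 300 -D /mnt/lvmfs 62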




File System benchmark - Scaling Scenario
      [Charts: dbench throughput, SLES 11 SP1 vs. SP2 – left: in-memory scaling with 16 PUs
      over 1 to 40 processes; right: FCP LVM with 4 PUs over 1 to 60 processes]




         Improved scalability for page cache operations
                  ►   Especially improves large workloads
                        ● Saves cache misses for the load that runs primarily in memory
                  ►   At the same time, lower cross-process deviation improves QoS
         Improved throughput for disk-bound LVM setups as well
                  ►   Especially improves heavily concurrent workloads


Benchmark descriptions – Re-Aim-7

  Scalability benchmark Re-Aim-7
        ►   Open-source equivalent of the AIM Multiuser benchmark
        ►   Workload patterns describe system call ratios (can be IPC-, disk- or calculation-intensive)
        ►   The benchmark then scales concurrent jobs until the overall throughput drops
              ● Starts with one job and continuously increases that number
              ● Overall throughput usually increases until #threads ≈ #CPUs
              ● Threads are then increased further until a drop in throughput occurs
              ● Scales up to thousands of concurrent threads stressing the same components
        ►   Often a good check for non-scaling interfaces
             ● Some interfaces don't scale at all (1-job throughput ≈ multi-job throughput, despite >1 CPUs)
             ● Some interfaces only scale in certain ranges (throughput suddenly drops earlier than expected)
        ►   Measures the number of jobs per minute that a single thread and all threads together achieve

  Our Setup (see the sketch below)
        ►   2, 8, 16 CPUs, 4 GiB memory, scaling until overall performance drops
        ►   Using a journaled file system on an xpram device (stresses FS code without being I/O bound)
        ►   Using the fserver, new-db and compute workload patterns
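
  A sketch of how such an xpram-backed file system can be set up (assuming the machine has
  expanded storage; the size, and the assumption that sizes= is given in KiB, should be checked
  against the xpram documentation for your kernel):

     # Create one xpram partition backed by expanded storage (512 MiB, assuming KiB units)
     modprobe xpram devs=1 sizes=524288
     mkfs.ext3 /dev/slram0
     mount /dev/slram0 /mnt/reaim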



   Improvements to file-system sync
    [Charts: Re-Aim-7 fserver, overall jobs per minute over # processes, SP1 vs. SP2 –
    left: 4 CPUs, showing improved process scalability; right: CPU scaling with 4 and
    16 CPUs, showing improved CPU scaling]



    The issue blocked process scaling (left) and CPU scaling (right)

    The sync call was broken, so scaling of loads relying on it was almost non-existent
          ►      In SP2 it now scales well with an increasing number of processes
          ►      Fortunately for SP1, this system call is not one of the most frequently called ones


Benchmark descriptions – SysBench
    Scalability benchmark SysBench
         ►   SysBench is a multi-threaded benchmark tool for (among others) OLTP database loads
         ►   Can be run read-only and read-write
         ►   Clients can connect locally or via network to the database
         ►   Database level and tuning are important
               ● We use PostgreSQL 9.0.4 with a configuration tuned for this workload
         ►   High/low hit cases resemble real-world setups with high or low cache hit ratios

    Our List of Setups (see the sketch below)
         ►   Scaling – read-only load with 2, 8, 16 CPUs, 8 GiB memory, 4 GiB DB (high hit)
         ►   Scaling Net – the same read-only load, with clients connecting via network (high hit)
         ►   Scaling FCP/FICON, high hit ratio – read-write load with 8 CPUs, 8 GiB memory, 4 GiB DB
               ● RW loads still need to maintain the transaction log, so I/O stays important despite DB < MEM
         ►   Scaling FCP/FICON, low hit ratio – read-write load with 8 CPUs, 4 GiB memory, 64 GiB DB
               ● This one is also I/O bound, since the data must first be read into the cache
         ►   All setups use
                ● HyperPAV (FICON) / multipathing (FCP)
                ● Disks spread over the storage server as recommended, plus storage pool striping
                ● An extra set of disks for the WAL (transaction log)
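
    A sketch of the read-only scaling run, using sysbench 0.4.x syntax (database name,
    credentials and table size are placeholders, not the values used for the measurements):

       # One-time test data preparation
       sysbench --test=oltp --db-driver=pgsql --pgsql-db=sbtest \
                --pgsql-user=postgres --oltp-table-size=10000000 prepare
       # Read-only run, 16 client threads, 5 minutes
       sysbench --test=oltp --db-driver=pgsql --pgsql-db=sbtest \
                --pgsql-user=postgres --oltp-read-only=on \
                --num-threads=16 --max-time=300 --max-requests=0 run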
SysBench – improved thread fairness
    [Chart: SysBench FICON low hit – improvement of the standard deviation from the average
    throughput in % (scale 0 to 80), over 8, 16, 32, 48, 64 and 128 threads]

    Overall throughput stayed comparable
    But the fairness across the concurrent threads improved
         ►    Good to improve fair resource sharing without enforced limits in shared environments
         ►    Effect especially visible when the database really has to go to disk (low-hit scenario)
         ►    Can ease fulfilling QoS commitments



Benchmark descriptions - Network
  Network benchmark which simulates several workloads
  Transactional Workloads
        ► 2 types
            ● RR – A connection to the server is opened once for a 5-minute time frame
            ● CRR – A connection is opened and closed for every request/response
        ► 4 sizes
            ● RR 1x1 – Simulating low-latency keepalives
            ● RR 200x1000 – Simulating online transactions
            ● RR 200x32k – Simulating database queries
            ● CRR 64x8k – Simulating website accesses
  Streaming Workloads – 2 types
        ►   STRP/STRG – Simulating incoming/outgoing large file transfers (20mx20)
  All tests are done with 1, 10 and 50 simultaneous connections
  All that across multiple connection types (different cards and MTU configurations); see the sketch below
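
  The deck does not name the tool behind these workloads; the RR/CRR classes map directly onto
  netperf's TCP_RR and TCP_CRR tests, so comparable runs could look like this (the server
  address is a placeholder):

     # RR 200x32k ("database query"): one persistent connection, 200 B request, 32 KiB response
     netperf -t TCP_RR -H server.example.com -l 300 -- -r 200,32768
     # CRR 64x8k ("website access"): a new connection per transaction
     netperf -t TCP_CRR -H server.example.com -l 300 -- -r 64,8192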




Network I
      [Charts: Gigabit Ethernet OSA Express3, MTU 1492, 1 CPU, SLES11-SP1 vs. SP2 –
      left: transactions per second; right: CPU consumption per transaction (lower is better);
      200x32k workloads with 1, 10 and 50 connections]

    Small systems gain an improvement in streaming throughput and CPU consumption
         ►   CPU-oversized systems always had to pay a price in terms of CPU consumption
         ►   Sometimes dynamically adjusting your sizing can be an option – check out cpuplugd (config sketch below)
               ● A paper about that can be found at
                 http://www.ibm.com/developerworks/linux/linux390/perf/index.html
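
    An excerpt in the spirit of the default rules shipped with the daemon (the thresholds here
    are illustrative; the paper linked above discusses tuned rule sets):

       # /etc/sysconfig/cpuplugd (excerpt)
       UPDATE="10"                 # evaluate the rules every 10 seconds
       CPU_MIN="1"                 # never go below one online CPU
       CPU_MAX="0"                 # 0 = no upper limit
       # Plug a CPU when the load exceeds the online CPUs and idle time is low
       HOTPLUG="(loadavg > onumcpus + 0.75) & (idle < 10.0)"
       # Unplug a CPU when the load clearly no longer needs it
       HOTUNPLUG="(loadavg < onumcpus - 0.25) | (idle > 50)"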

    Generic receive offload is now on by default
         ►   Further improves CPU consumption, especially for streaming workloads; see the sketch below
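
    For comparisons against the SP1 behavior, the offload can be checked and toggled per
    interface with ethtool (the interface name is illustrative):

       ethtool -k eth0 | grep generic-receive-offload   # show current GRO state
       ethtool -K eth0 gro off                          # e.g. to mimic the old default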


Network II
      [Charts: throughput for VSwitch, MTU 1492 (left) and HiperSockets, 32k (right),
      SLES11-SP1 vs. SP2; 200x1000 workloads with 1, 10 and 50 connections]


    Pure virtual connections degraded by 5 to 20%
         ►    Affects approximately half of the workload scenarios (smaller payloads suffer more)
         ►    Affects VSwitch and HiperSockets connections
    Some good news mitigating these degradations
         ►    The overhead reported in the virtualization layers improved, so scaling will be better
         ►    Smaller degradations with larger MTU sizes
         ►    Effect is smaller on zEnterprise than on z10

Network III
      [Charts: 10 Gigabit Ethernet OSA Express 3, MTU 1492 – deviation from SP1 to SP2 in %
      for throughput (left, -20 to +15) and CPU consumption (right, -10 to +30, higher is
      better) across all RR, CRR and streaming workloads with 1, 10 and 50 connections]


    Degradations and improvements often show no clear pattern to steer away from
         ►   Overall we rated most of the network changes as an acceptable tradeoff
              ● If your workload matches exactly one of the degrading spots, it might not be acceptable for you
              ● On the other hand, if your load hits one of the sweet spots, it can improve a lot
         ►   No solid recommendation what will surely improve or degrade in a migration
              ● While visible in pure network benchmarks, our network-based application benchmarks didn't show impacts
              ● Streaming-like workloads improve in most, but not all, cases

Benchmark descriptions - Disk I/O

    Workload
         ►   Threaded I/O benchmark
         ►   Each process writes or reads to a single file, volume or disk
         ►   Can be configured to run with and without page cache (direct I/O)
         ►   Operating modes: Sequential write/rewrite/read + Random write/read


    Setup
         ►   Main memory was restricted to 256 MiB
         ►   File size (overall): 2 GiB, record size: 64 KiB
         ►   Scaling over 1, 2, 4, 8, 16, 32, 64 processes
         ►   Sequential run: write, rewrite, read
         ►   Random run: write, read (with a previous sequential write)
         ►   Once using and once bypassing the page cache (direct I/O)
         ►   Sync and drop caches prior to every invocation (see the sketch below)
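
    The cache-state reset and the two access modes can be reproduced like this (file name and
    mount point are placeholders; 32768 records of 64 KiB give the 2 GiB overall file size):

       # Clean page cache, dentries and inodes before each run
       sync
       echo 3 > /proc/sys/vm/drop_caches
       # Sequential write through the page cache ...
       dd if=/dev/zero of=/mnt/test/f1 bs=64k count=32768
       # ... and the same bypassing it (direct I/O)
       dd if=/dev/zero of=/mnt/test/f1 bs=64k count=32768 oflag=direct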




Page cache based read - issues fixed and further improved

      [Chart: sequential read throughput via page cache over 1 to 64 concurrent processes,
      SLES11-SP1 vs. SP2; SP2 shows both a bug fix and a further improvement]

    Huge improvement for read throughput
         ►   Throughput has genuinely improved, but most of the impressive numbers come from a bug in older releases
         ►   The bug occurred when many concurrent read streams ran on a system with little memory
               ● Previous distribution releases only partially mitigated the issue, without a real fix
         ►   The improvements for other loads are in the range of 0 to 15%


OpenSSL based Cryptography
 OpenSSL test suite
      ►   Part of the OpenSSL package
      ►   Able to compare different ciphers
      ►   Able to compare different payload sizes
      ►   Contains local and distributed (via network) test tools
      ►   Can pass handshaking to crypto cards using the ibmca OpenSSL engine
      ►   Can pass en-/decryption to accelerated CPACF instructions using the ibmca OpenSSL engine

 Our Setups (see the sketch below)
      ►   Scale concurrent connections to find bottlenecks
      ►   Iterate over different ciphers like AES, DES
      ►   Run the workload with different payload sizes
      ►   Run SW-only, CPACF-assisted and CPACF + CEX3 card assisted modes
           ● CEX cards in accelerator and co-processor mode
      ►   We use distributed clients as workload driver
            ● Evaluate overall throughput and fairness of throughput distribution
            ● Evaluate the CPU consumption caused by the load
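
 The local part of such a comparison can be approximated with OpenSSL's own tools (s_time
 stands in for the distributed driver, which the deck does not name; host and port are
 placeholders):

    # Software-only vs. CPACF-assisted symmetric encryption via the EVP interface
    openssl speed -evp aes-128-cbc
    # With handshake operations passed to the crypto cards through the ibmca engine
    openssl speed -engine ibmca rsa
    # Network-driven load against an s_server instance
    openssl s_time -connect server.example.com:4433 -time 60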


OpenSSL based Cryptography
      [Charts: left – throughput over 4 to 256 concurrent sessions, improvement due to avoided
      compression, x3.8 at the marked point; right – CPU consumption per kilobyte over 1 to 64
      concurrent sessions for SP1/SP2 in CEX, CPACF and software-only modes, up to 80% less
      (lower is better)]


     Compressing the data to save cryptographic effort was the default for a while
          ►    Counter-productive on System z, as CPACF/CEX is so fast (and CEX work counts as off-loaded)
     Now it is possible to deactivate compression via an environment variable (see the sketch below):
                OPENSSL_NO_DEFAULT_ZLIB=Y
          ►    1000k payload cases with CPACF and cards are now 3.8x faster, still 2.3x without CEX cards
          ►    Even 40b payload cases still show a 15% throughput improvement
          ►    Additionally, depending on the setup, 50% to 80% less CPU per transferred kilobyte
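
     In practice the variable just has to be present in the environment of the TLS-terminating
     process; a minimal sketch (certificate paths and port are placeholders):

        # For an interactive test server
        OPENSSL_NO_DEFAULT_ZLIB=Y openssl s_server -cert cert.pem -key key.pem -accept 4433
        # Or exported for a whole service environment before it starts
        export OPENSSL_NO_DEFAULT_ZLIB=Y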
SLES 11 SP2 Improvements & Degradations per area
                 SLES 11 SP2 vs. SLES 11 SP1

       Improvements/Degradations               Especially affects, but not limited to, the following workloads
       Process scaling                         WebSphere family, large-scale databases
       Filesystem scaling                      File serving
       Network streaming                       TSM, replication tasks (DB2 HADR, Domino)
       Disk I/O via page cache                 ClearCase, DB2 on ECKD disks, file serving, DataStage
       Disk I/O                                TSM, databases
       Cryptography                            Secure serving/communication in general
       Pure virtual networks                   Common HiperSockets setups: SAP enqueue server,
       (VSwitch guest-to-guest, HiperSockets)  WebSphere to z/OS, Cognos to z/OS

   Improvements in almost every area
         ►   Especially for large workloads/machines (scaling)
   Degradations for virtual networking




Summary for SLES 11 SP2 vs. SP1
 SLES 11 SP2 performance is good
        ► Improved compared to the already good SP1 release
            ● Beneficial effects are slightly bigger on the newer System zEnterprise machines
        ► Generally recommendable
            ● Except for environments focusing on pure virtual networks


 Improvements and degradations
       Level               On HW        Improved    No difference or trade-off    Degraded

       SLES 11 SP2         z10              30                  67                    8
       SLES 11 SP2         z196             33                  64                    3



Questions
    Further information is available at
         ►   Linux on System z – Tuning hints and tips
             http://www.ibm.com/developerworks/linux/linux390/perf/index.html
         ►   Live Virtual Classes for z/VM and Linux
             http://www.vm.ibm.com/education/lvc/




Christian Ehrhardt
Linux on System z Performance Evaluation
IBM Research & Development
Schönaicher Strasse 220
71032 Böblingen, Germany
ehrhardt@de.ibm.com




 24                 SLES11 SP2 Performance Evaluation                            09/18/12      © 2012 IBM Corporation

More Related Content

What's hot

Red Hat Enterprise Linux on IBM System z Performance Evaluation
Red Hat Enterprise Linux on IBM System z Performance EvaluationRed Hat Enterprise Linux on IBM System z Performance Evaluation
Red Hat Enterprise Linux on IBM System z Performance EvaluationIBM India Smarter Computing
 
DB2 for z/OS Architecture in Nutshell
DB2 for z/OS Architecture in NutshellDB2 for z/OS Architecture in Nutshell
DB2 for z/OS Architecture in NutshellCuneyt Goksu
 
DB2 Accounting Reporting
DB2  Accounting ReportingDB2  Accounting Reporting
DB2 Accounting ReportingJohn Campbell
 
TSM 6.4 Technical updates
TSM 6.4 Technical updates TSM 6.4 Technical updates
TSM 6.4 Technical updates Solv AS
 
DB2 Pure Scale Webcast
DB2 Pure Scale WebcastDB2 Pure Scale Webcast
DB2 Pure Scale WebcastLaura Hood
 
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...Novell
 
Synchronous Log Shipping Replication
Synchronous Log Shipping ReplicationSynchronous Log Shipping Replication
Synchronous Log Shipping Replicationelliando dias
 
Multi-threaded Performance Pitfalls
Multi-threaded Performance PitfallsMulti-threaded Performance Pitfalls
Multi-threaded Performance PitfallsCiaran McHale
 
An Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ ServersAn Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ ServersQuantel
 
DB2 Data Sharing Performance for Beginners
DB2 Data Sharing Performance for BeginnersDB2 Data Sharing Performance for Beginners
DB2 Data Sharing Performance for BeginnersMartin Packer
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISAPatrick Bellasi
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend dsTTEC
 
DB2 11 for z/OS Migration Planning and Early Customer Experiences
DB2 11 for z/OS Migration Planning and Early Customer ExperiencesDB2 11 for z/OS Migration Planning and Early Customer Experiences
DB2 11 for z/OS Migration Planning and Early Customer ExperiencesJohn Campbell
 
Episode 3 DB2 pureScale Availability And Recovery [Read Only] [Compatibility...
Episode 3  DB2 pureScale Availability And Recovery [Read Only] [Compatibility...Episode 3  DB2 pureScale Availability And Recovery [Read Only] [Compatibility...
Episode 3 DB2 pureScale Availability And Recovery [Read Only] [Compatibility...Laura Hood
 
OpenPOWER Application Optimization
OpenPOWER Application Optimization OpenPOWER Application Optimization
OpenPOWER Application Optimization Ganesan Narayanasamy
 
Leveraging Open Source Integration with WSO2 Enterprise Service Bus
Leveraging Open Source Integration with WSO2 Enterprise Service BusLeveraging Open Source Integration with WSO2 Enterprise Service Bus
Leveraging Open Source Integration with WSO2 Enterprise Service BusWSO2
 
Session 7362 Handout 427 0
Session 7362 Handout 427 0Session 7362 Handout 427 0
Session 7362 Handout 427 0jln1028
 
L lpic2201-pdf
L lpic2201-pdfL lpic2201-pdf
L lpic2201-pdfG&P
 

What's hot (20)

Red Hat Enterprise Linux on IBM System z Performance Evaluation
Red Hat Enterprise Linux on IBM System z Performance EvaluationRed Hat Enterprise Linux on IBM System z Performance Evaluation
Red Hat Enterprise Linux on IBM System z Performance Evaluation
 
DB2 for z/OS Architecture in Nutshell
DB2 for z/OS Architecture in NutshellDB2 for z/OS Architecture in Nutshell
DB2 for z/OS Architecture in Nutshell
 
DB2 Accounting Reporting
DB2  Accounting ReportingDB2  Accounting Reporting
DB2 Accounting Reporting
 
TSM 6.4 Technical updates
TSM 6.4 Technical updates TSM 6.4 Technical updates
TSM 6.4 Technical updates
 
DB2 Pure Scale Webcast
DB2 Pure Scale WebcastDB2 Pure Scale Webcast
DB2 Pure Scale Webcast
 
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...
Monitoring a SUSE Linux Enterprise Environment with System Center Operations ...
 
Samind brain power_v2a
Samind brain power_v2aSamind brain power_v2a
Samind brain power_v2a
 
Gfs andmapreduce
Gfs andmapreduceGfs andmapreduce
Gfs andmapreduce
 
Synchronous Log Shipping Replication
Synchronous Log Shipping ReplicationSynchronous Log Shipping Replication
Synchronous Log Shipping Replication
 
Multi-threaded Performance Pitfalls
Multi-threaded Performance PitfallsMulti-threaded Performance Pitfalls
Multi-threaded Performance Pitfalls
 
An Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ ServersAn Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ Servers
 
DB2 Data Sharing Performance for Beginners
DB2 Data Sharing Performance for BeginnersDB2 Data Sharing Performance for Beginners
DB2 Data Sharing Performance for Beginners
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISA
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend ds
 
DB2 11 for z/OS Migration Planning and Early Customer Experiences
DB2 11 for z/OS Migration Planning and Early Customer ExperiencesDB2 11 for z/OS Migration Planning and Early Customer Experiences
DB2 11 for z/OS Migration Planning and Early Customer Experiences
 
Episode 3 DB2 pureScale Availability And Recovery [Read Only] [Compatibility...
Episode 3  DB2 pureScale Availability And Recovery [Read Only] [Compatibility...Episode 3  DB2 pureScale Availability And Recovery [Read Only] [Compatibility...
Episode 3 DB2 pureScale Availability And Recovery [Read Only] [Compatibility...
 
OpenPOWER Application Optimization
OpenPOWER Application Optimization OpenPOWER Application Optimization
OpenPOWER Application Optimization
 
Leveraging Open Source Integration with WSO2 Enterprise Service Bus
Leveraging Open Source Integration with WSO2 Enterprise Service BusLeveraging Open Source Integration with WSO2 Enterprise Service Bus
Leveraging Open Source Integration with WSO2 Enterprise Service Bus
 
Session 7362 Handout 427 0
Session 7362 Handout 427 0Session 7362 Handout 427 0
Session 7362 Handout 427 0
 
L lpic2201-pdf
L lpic2201-pdfL lpic2201-pdf
L lpic2201-pdf
 

Similar to SLES 11 SP2 PerformanceEvaluation for Linux on System z

We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT Group
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』Insight Technology, Inc.
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linuxmountpoint.io
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance ConsiderationsShawn Wells
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceScott Mansfield
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
SQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSumeet Bansal
 
X86 hardware for packet processing
X86 hardware for packet processingX86 hardware for packet processing
X86 hardware for packet processingHisaki Ohara
 
Vm13 vnx mixed workloads
Vm13 vnx mixed workloadsVm13 vnx mixed workloads
Vm13 vnx mixed workloadspittmantony
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Blockoscon2007
 
AMP Kynetics - ELC 2018 Portland
AMP  Kynetics - ELC 2018 PortlandAMP  Kynetics - ELC 2018 Portland
AMP Kynetics - ELC 2018 PortlandKynetics
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandNicola La Gloria
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalTommy Lee
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Tackling Disaster in a SCM Environment
Tackling Disaster in a SCM EnvironmentTackling Disaster in a SCM Environment
Tackling Disaster in a SCM Environmentziaulm
 
Audio in linux embedded
Audio in linux embeddedAudio in linux embedded
Audio in linux embeddedtrx2001
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailInternet World
 
Towards Software Defined Persistent Memory
Towards Software Defined Persistent MemoryTowards Software Defined Persistent Memory
Towards Software Defined Persistent MemorySwaminathan Sundararaman
 

Similar to SLES 11 SP2 PerformanceEvaluation for Linux on System z (20)

We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linux
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Linux on System z disk I/O performance
Linux on System z disk I/O performanceLinux on System z disk I/O performance
Linux on System z disk I/O performance
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
SQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teams
 
X86 hardware for packet processing
X86 hardware for packet processingX86 hardware for packet processing
X86 hardware for packet processing
 
Vm13 vnx mixed workloads
Vm13 vnx mixed workloadsVm13 vnx mixed workloads
Vm13 vnx mixed workloads
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Block
 
AMP Kynetics - ELC 2018 Portland
AMP  Kynetics - ELC 2018 PortlandAMP  Kynetics - ELC 2018 Portland
AMP Kynetics - ELC 2018 Portland
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Tackling Disaster in a SCM Environment
SLES 11 SP2 Performance Evaluation for Linux on System z

  • 5. New process scheduler (CFS)

    Improved balancing
    ► Topology support can be activated via the topology=on kernel parameter
    ► This makes the scheduler aware of the cpu hierarchy

    You really do get something from fairness as well
    ► Improved worst-case latency and throughput
    ► Because of that, CFS can ease QoS commitments
  • 6. Topology of a zEnterprise System

    [Diagram: HW → Linux mapping - books with L4/L3 caches and chips (MC) with
    L2/L1 caches and CPUs on the hardware side, mapped to scheduling domains
    (book, MC) and per-CPU run queues (RQ) on the Linux side]

    Recreate the HW layout in the scheduler
    ► Off in z/VM guests, since there is no virtual topology information
    ► Ability to group (recommended for ipc-heavy loads) or spread (recommended
      for cache-hungry loads) the load
    ► Unintended asymmetries are now known to the system

    Tunable, but complex (see the sketch below)
    ► The /proc/sys/kernel/sched_* files contain tunables for decisions regarding
      the run queues (RQ)
    ► /proc/sys/kernel/sched_domain/... provides options for the scheduling domains
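    A minimal sketch of how these knobs can be inspected; the zipl.conf fragment
    and the exact tunable names are assumptions based on common SLES 11
    conventions, and the domain numbering depends on the machine:

        # Enable scheduler topology support at boot: append the kernel parameter
        # from the slide to the parameters line in /etc/zipl.conf, then rerun zipl.
        #   parameters = "root=/dev/dasda1 ... topology=on"

        # Global scheduler tunables (decisions on the run queues)
        ls /proc/sys/kernel/ | grep '^sched_'
        cat /proc/sys/kernel/sched_min_granularity_ns

        # Per-CPU scheduling-domain tunables
        ls /proc/sys/kernel/sched_domain/cpu0/domain0/
        cat /proc/sys/kernel/sched_domain/cpu0/domain0/imbalance_pct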
  • 7. Benchmark descriptions - File system / LVM / Scaling

    Filesystem benchmark dbench (an invocation sketch follows below)
    ► Emulation of the Netbench benchmark
    ► Generates file system load on the Linux VFS
    ► Issues the same I/O calls as the smbd server in Samba (without the
      networking calls)

    Simulation
    ► The workload simulates both client and server
    ► Mixed file-operations workload per process: create, write, read, append, delete
    ► Measures the throughput of transferred data
    ► Two setup scenarios
      ● Scaling - the load fits in the page cache, so mainly memory operations;
        scaling 2, 4, 8, 16 CPUs, 8 GiB memory, and from 1 to 40 processes
      ● Low main memory and LVM setup for mixed-I/O LVM performance;
        8 CPUs, 2 GiB memory, scaling from 4 to 62 processes
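    The slides do not show the exact command line; a dbench run of the shape
    described above might look like this (mount point and client count are
    placeholders):

        # 40 client processes for 5 minutes against the file system under test
        # -t = runtime in seconds, -D = working directory
        dbench -t 300 -D /mnt/test 40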
  • 8. File System benchmark - Scaling Scenario

    [Charts: dbench throughput vs. number of processes - in-memory scaling with
    16 PUs (1-40 processes) and FCP LVM with 4 PUs (1-60 processes),
    SLES 11 SP1 vs. SLES 11 SP2]

    Improved scalability for page cache operations
    ► Especially improves large workloads
      ● Saves cache misses for the load that runs primarily in memory
    ► At the same time, lower cross-process deviation improves QoS

    Improved throughput for disk-bound LVM setups as well
    ► Especially improves heavily concurrent workloads
  • 9. Benchmark descriptions - Re-Aim-7

    Scalability benchmark Re-Aim-7 (an invocation sketch follows below)
    ► Open-source equivalent of the AIM Multiuser benchmark
    ► Workload patterns describe system call ratios (can be ipc-, disk- or
      calculation-intensive)
    ► The benchmark scales concurrent jobs until the overall throughput drops
      ● Starts with one job and continuously increases that number
      ● Overall throughput usually increases until #threads ≈ #CPUs
      ● Threads are then increased further until throughput drops
      ● Scales up to thousands of concurrent threads stressing the same components
    ► Often a good check for non-scaling interfaces
      ● Some interfaces don't scale at all (1-job throughput ≈ multi-job
        throughput, despite >1 CPUs)
      ● Some interfaces only scale in certain ranges (throughput suddenly drops
        earlier than expected)
    ► Measures the jobs per minute that a single thread and all threads together
      can achieve

    Our setup
    ► 2, 8, 16 CPUs, 4 GiB memory, scaling until overall performance drops
    ► Journaled file system on an xpram device (stresses the FS code without
      being I/O bound)
    ► Using the fserver, new-db and compute workload patterns
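    Re-Aim-7's flags differ between versions, so treat the options below as
    assumptions rather than a verified invocation; the shape of the run (pick a
    workfile pattern, scale the job count) is what the slide describes:

        # Assumed flags: -f workfile, -s/-e/-i start/end/increment of jobs;
        # check the usage output of your re-aim build before relying on these.
        ./reaim -f data/workfile.fserver -s 1 -e 1024 -i 16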
  • 10. Improvements to file-system sync

    [Charts: Re-Aim-7 fserver, overall jobs per minute vs. number of processes -
    process scaling with 4 CPUs (left) and cpu scaling with 4 vs. 16 CPUs
    (right), SP1 vs. SP2]

    The issue blocked process scaling (left) and cpu scaling (right)

    The sync call was broken, so scaling that relied on it was almost non-existent
    ► In SP2 it now scales well with an increasing number of processes
    ► Fortunately for SP1, this system call is not one of the most frequently
      called ones
  • 11. Benchmark descriptions - SysBench

    Scalability benchmark SysBench (an invocation sketch follows below)
    ► SysBench is a multi-threaded benchmark tool for (among others) OLTP
      database loads
    ► Can be run read-only and read-write
    ► Clients can connect locally or via the network to the database
    ► Database level and tuning are important
      ● We use Postgres 9.0.4 with a configuration tuned for this workload
    ► High/low hit cases resemble real-world setups with high or low cache hit
      ratios

    Our list of setups
    ► Scaling - read-only load with 2, 8, 16 CPUs, 8 GiB memory, 4 GiB DB (high hit)
    ► Scaling Net - read-only load with 2, 8, 16 CPUs, 8 GiB memory, 4 GiB DB (high hit)
    ► Scaling FCP/FICON high hit ratio - read-write load with 8 CPUs, 8 GiB
      memory, 4 GiB DB
      ● RW loads still need to maintain the transaction log, so I/O stays
        important despite DB < MEM
    ► Scaling FCP/FICON low hit ratio - read-write load with 8 CPUs, 4 GiB
      memory, 64 GiB DB
      ● This is also I/O bound, since the data first has to be read into the cache
    ► All setups use
      ● HyperPAV (FICON) / Multipathing (FCP)
      ● Disks spread over the storage server as recommended + storage pool striping
      ● An extra set of disks for the WAL (transaction log)
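    A sketch using sysbench 0.4-era options (newer sysbench releases renamed most
    of these); host, table size and thread count are placeholders:

        # Populate the test table in the tuned PostgreSQL instance
        sysbench --test=oltp --db-driver=pgsql --pgsql-host=dbhost \
                 --oltp-table-size=10000000 prepare

        # Read-only high-hit run with 16 client threads for 5 minutes
        sysbench --test=oltp --db-driver=pgsql --pgsql-host=dbhost \
                 --oltp-read-only=on --num-threads=16 \
                 --max-time=300 --max-requests=0 run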
  • 12. SysBench - improved thread fairness

    [Chart: SysBench FICON low-hit fairness - improvement of the standard
    deviation from the average in %, for 8, 16, 32, 48, 64 and 128 threads]

    Overall throughput stayed comparable

    But fairness across the concurrent threads improved
    ► Helps fair resource sharing without enforced limits in shared environments
    ► The effect is especially visible when the database really has to go to
      disk (low-hit scenario)
    ► Can ease fulfilling QoS commitments
  • 13. Benchmark descriptions - Network

    Network benchmark that simulates several workloads (a netperf-style sketch
    of these patterns follows below)

    Transactional workloads
    ► 2 types
      ● RR - a connection to the server is opened once for a 5-minute time frame
      ● CRR - a connection is opened and closed for every request/response
    ► 4 sizes
      ● RR 1x1 - simulating low-latency keepalives
      ● RR 200x1000 - simulating online transactions
      ● RR 200x32k - simulating database queries
      ● CRR 64x8k - simulating website accesses

    Streaming workloads - 2 types
    ► STRP/STRG - simulating incoming/outgoing large file transfers (20mx20)

    All tests are done with 1, 10 and 50 simultaneous connections

    All of that across multiple connection types (different cards and MTU
    configurations)
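    The deck does not name its load generator, but the patterns map closely onto
    netperf's TCP_RR, TCP_CRR and TCP_STREAM tests; a comparable public
    measurement could look like this (the host name is a placeholder):

        # RR 200x32k: persistent connection, 200-byte requests, 32 KiB responses
        netperf -H server -t TCP_RR -l 300 -- -r 200,32768

        # CRR 64x8k: connect/close around every 64-byte request / 8 KiB response
        netperf -H server -t TCP_CRR -l 300 -- -r 64,8192

        # Streaming: bulk data transfer for the STRP/STRG-like cases
        netperf -H server -t TCP_STREAM -l 300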
  • 14. Network I

    [Charts: Gigabit Ethernet OSA Express3, MTU 1492, 1 CPU - transactions per
    second and CPU consumption per transaction for the 200x32k workload with 1,
    10 and 50 connections, SLES 11 SP1 vs. SP2]

    Small systems gain an improvement in streaming throughput and cpu consumption
    ► Systems that are cpu-oversized always paid a price in cpu consumption
    ► Sometimes dynamically adjusting your sizing can be an option - check out
      cpuplugd (a configuration sketch follows below)
      ● A paper about that can be found at
        http://www.ibm.com/developerworks/linux/linux390/perf/index.html

    Generic receive offload is now on by default
    ► Further improves cpu consumption, especially for streaming workloads
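    A configuration sketch for cpuplugd, plus the GRO check; the rule syntax
    follows IBM's documented config format, but the thresholds are example
    values and the interface name is an assumption:

        # /etc/sysconfig/cpuplugd (excerpt): scale online CPUs with the load.
        # UPDATE is the polling interval in seconds; CPU_MAX=0 means no upper limit.
        UPDATE="10"
        CPU_MIN="1"
        CPU_MAX="0"
        HOTPLUG="(loadavg > onumcpus + 0.75) & (idle < 10.0)"
        HOTUNPLUG="(loadavg < onumcpus - 0.25) | (idle > 50)"

        # Verify that generic receive offload is active (the SP2 default) ...
        ethtool -k eth0 | grep generic-receive-offload
        # ... and toggle it per interface if a workload suffers from it
        ethtool -K eth0 gro off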
  • 15. Network II

    [Charts: throughput for vswitch MTU 1492 and HiperSockets 32k - 200x1000
    workload with 1, 10 and 50 connections, SLES 11 SP1 vs. SP2]

    Pure virtual connections degraded by 5 to 20%
    ► Affects approximately half of the workload scenarios (smaller payloads
      suffer more)
    ► Affects virtual vswitch and HiperSockets connections

    Some good news mitigates these degradations
    ► The reported overhead caused in the virtualization layers improved, so
      scaling will be better
    ► Smaller degradations with larger MTU sizes
    ► The effect is smaller on zEnterprise than on z10
  • 16. Network III

    [Charts: 10 Gigabit Ethernet OSA Express 3, MTU 1492 - throughput deviation
    and CPU consumption deviation from SP1 to SP2 across the RR 1x1, RR
    200x1000, RR 200x32k and streaming workloads with 1, 10 and 50 connections]

    Degradations and improvements show no clear pattern one could steer away from
    ► Overall we rated most of the network changes as an acceptable tradeoff
      ● If your workload matches exactly one of the degrading spots, it might
        not be acceptable for you
      ● On the other hand, if your load hits one of the sweet spots, it can
        improve a lot
    ► No solid prediction of what will surely improve or degrade in a migration
      ● While visible in pure network benchmarks, our network-based application
        benchmarks showed no impact
      ● Streaming-like workloads improve in most, but not all, cases
  • 17. Benchmark descriptions - Disk I/O

    Workload (an invocation sketch follows below)
    ► Threaded I/O benchmark
    ► Each process writes to or reads from a single file, volume or disk
    ► Can be configured to run with and without the page cache (direct I/O)
    ► Operating modes: sequential write/rewrite/read + random write/read

    Setup
    ► Main memory was restricted to 256 MiB
    ► File size (overall): 2 GiB, record size: 64 KiB
    ► Scaling over 1, 2, 4, 8, 16, 32, 64 processes
    ► Sequential run: write, rewrite, read
    ► Random run: write, read (after a previous sequential write)
    ► One additional pass bypassing the page cache (direct I/O)
    ► Sync and drop caches prior to every invocation
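    The benchmark tool is not named on the slide; iozone offers the same knobs,
    so a run of this shape could look like the sketch below (the tool choice and
    mount point are assumptions):

        # Clean state before each invocation, as described above
        sync && echo 3 > /proc/sys/vm/drop_caches

        # 8 threads, 64 KiB records, 256 MiB per file (2 GiB overall),
        # O_DIRECT via -I; -i 0 = write/rewrite, -i 1 = read, -i 2 = random
        cd /mnt/scratch
        iozone -t 8 -s 256m -r 64k -I -i 0 -i 1 -i 2 \
               -F f1 f2 f3 f4 f5 f6 f7 f8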
  • 18. Page cache based read - issues fixed and further improved

    [Chart: sequential read throughput via the page cache over 1-64 concurrent
    processes, SLES 11 SP1 vs. SP2 - the SP1 curve collapses due to the bug
    described below]

    Huge improvement in read throughput
    ► Throughput did improve, but most of the impressive numbers stem from a bug
      in older releases
    ► The bug occurred when many concurrent read streams ran on a small
      (memory) system
      ● Earlier distribution releases only had a partial mitigation of the
        issue, not a fix
    ► The improvements for other loads are within a range of 0 to 15%
  • 19. OpenSSL based Cryptography

    OpenSSL test suite (a command sketch follows below)
    ► Part of the openssl suite
    ► Able to compare different ciphers
    ► Able to compare different payload sizes
    ► Contains local and distributed (via network) test tools
    ► Can pass handshaking to crypto cards using the ibmca openssl engine
    ► Can pass en-/decryption to accelerated CPACF instructions using the ibmca
      openssl engine

    Our setups
    ► Scale concurrent connections to find bottlenecks
    ► Iterate over different ciphers such as AES and DES
    ► Run the workload with different payload sizes
    ► Run SW-only, CPACF-assisted and CPACF + CEX3 card assisted modes
      ● CEX cards in accelerator and co-processor mode
    ► We use distributed clients as workload drivers
      ● Evaluate overall throughput and fairness of the throughput distribution
      ● Evaluate the cpu consumption caused by the load
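    A few openssl commands that exercise the same paths locally; whether the
    ibmca engine is available depends on the installed openssl-ibmca package, so
    treat that part as an assumption:

        # Software AES throughput for various payload sizes
        openssl speed -evp aes-128-cbc

        # Same measurement routed through the ibmca engine (CPACF/CEX assisted)
        openssl speed -engine ibmca -evp aes-128-cbc

        # List what the ibmca engine can offload
        openssl engine -c ibmca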
  • 20. OpenSSL based Cryptography

    [Charts: improvement due to avoided compression - throughput over 4-256
    concurrent sessions and CPU consumption per kilobyte over 1-64 concurrent
    sessions, each for SP1/SP2 in CEX, CPACF and no-hardware modes; annotated
    gains of 3.8x throughput and 80% less cpu]

    Compressing the data to save cryptographic effort was the default for a while
    ► Counter-productive on System z, as CPACF/CEX is so fast (and CEX work
      counts as off-loaded)

    It is now possible to deactivate compression via an environment variable:
    OPENSSL_NO_DEFAULT_ZLIB=Y (a usage sketch follows below)
    ► 1000k-payload cases with CPACF and cards are now 3.8x faster, and still
      2.3x faster without CEX cards
    ► Even 40-byte payload cases still show a 15% throughput improvement
    ► Additionally, depending on the setup, 50% to 80% less cpu per transferred
      kilobyte
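    The variable is read from the process environment, so it only needs to be
    set before the application using OpenSSL starts; the s_server invocation is
    just an illustrative placeholder:

        # Disable OpenSSL's default zlib compression for a single invocation
        OPENSSL_NO_DEFAULT_ZLIB=Y openssl s_server -accept 4433 -cert server.pem

        # Or export it in the startup environment of the affected service
        export OPENSSL_NO_DEFAULT_ZLIB=Y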
  • 22. SLES 11 SP2 Improvements & Degradations per area

    SLES 11 SP2 vs. SLES 11 SP1:

    Area                       Especially affects, but is not limited to
    -------------------------  ---------------------------------------------------
    Process scaling            WebSphere family, large-scale databases
    Filesystem scaling         File serving
    Network streaming          TSM, replication tasks (DB2 HADR, Domino)
    Disk I/O via page cache    ClearCase, DB2 on ECKD disks, file serving, DataStage
    Disk I/O                   TSM, databases
    Cryptography               Secure serving/communication in general
    Pure virtual networks      Common HiperSockets setups: SAP enqueue server,
    (vswitch G2G, HS)          WebSphere to z/OS, Cognos to z/OS

    Improvements in almost every area
    ► Especially for large workloads/machines (scaling)

    Degradations for virtual networking
  • 23. Summary for SLES 11 SP2 vs. SP1

    SLES 11 SP2 performance is good
    ► Improved compared to the already good SP1 release
      ● The beneficial effects are slightly bigger on the newer zEnterprise systems
    ► Generally recommendable
      ● Except for environments focusing on pure virtual networks

    Improvements and degradations:

    Level        On HW   Improved   No difference   Degraded or trade-off
    SLES 11 SP2  z10     30         67              8
    SLES 11 SP2  z196    33         64              3
  • 24. Questions

    Further information is available at
    ► Linux on System z - Tuning hints and tips
      http://www.ibm.com/developerworks/linux/linux390/perf/index.html
    ► Live Virtual Classes for z/VM and Linux
      http://www.vm.ibm.com/education/lvc/

    Christian Ehrhardt
    Linux on System z Performance Evaluation
    Research & Development, Schönaicher Strasse 220, 71032 Böblingen, Germany
    ehrhardt@de.ibm.com