Linux on System z – disk I/O performance
zLG13 – Mario Held




Trademarks
    IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
    Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
    trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
    “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

    Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

    Oracle and Java are registered trademarks of Oracle and/or its affiliates in the United States,
     other countries, or both.


     Red Hat and Red Hat Enterprise Linux (RHEL) are registered trademarks of Red Hat, Inc. in the United
        States and other countries.


     SLES and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.


     SPEC and the "performance chart" SPEC logo are registered trademarks of the Standard Performance
        Evaluation Corporation.

    Other product and service names might be trademarks of IBM or other companies.

Agenda
    ■   Terms and characteristics in the disk I/O area
    ■   Disk configurations for FICON/ECKD and FCP/SCSI
    ■   Measurements
              – Setup description
              – Results comparison for FICON/ECKD and FCP/SCSI
    ■   Summary / Conclusion




Logical volumes
    ■   Linear logical volumes allow an easy extension of the file system
    ■   Striped logical volumes
               – provide the capability to perform simultaneous I/O on different stripes
               – allow load balancing
               – are extendable
               – may lead to smaller I/O request sizes compared to linear logical volumes or
                     normal volumes
    ■   Don't use logical volumes for “/” or “/usr”
               – if the logical volume gets corrupted, your system is gone
    ■   Logical volumes require more CPU cycles than physical disks
               – Consumption increases with the number of physical disks used for the logical
                    volume
    ■   LVM, the Logical Volume Manager, is the tool used to manage logical volumes (see the
           example below)
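    A minimal sketch of how a striped logical volume could be set up with the standard LVM
    command line tools; the device names, sizes and the volume group name are hypothetical and
    only illustrate the commands involved.

        # Prepare four DASD partitions as physical volumes (device names are examples)
        pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1
        vgcreate vgdata /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1

        # Create a logical volume striped over all four physical volumes,
        # using the 64 KiB default stripe size mentioned in the comparison chart
        lvcreate --stripes 4 --stripesize 64K --size 200G --name lvdata vgdata

        # Put a file system on it and mount it
        mkfs.ext4 /dev/vgdata/lvdata
        mount /dev/vgdata/lvdata /mnt/data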




Linux multipath
    ■   Connects a volume over multiple independent paths and/or ports
    ■   Multibus mode
               – for load balancing
               – to overcome bandwidth limitations of a single path
               – uses all paths in a priority group in a round-robin manner
                         • switches the path after a configurable number of I/Os
                            – see rr_min_io in multipath.conf (a sample configuration follows below)
               – increases throughput for
                         • ECKD volumes with static PAV devices in Linux distributions older than
                              SLES11 and RHEL6
                         • SCSI volumes in all Linux distributions
               – for high availability
    ■   Failover mode
               – keeps the volume accessible in case of a single failure in the connecting hardware
                          • should one path fail, the operating system will route I/O through one of
                              the remaining paths, with no changes visible to the applications
               – will not improve performance
               – for high availability
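    A minimal sketch of a multipath.conf for multibus load balancing is shown below; the WWID,
    the alias and the rr_min_io value are placeholders rather than recommendations, and newer
    multipath-tools releases use rr_min_io_rq instead of rr_min_io.

        # /etc/multipath.conf – sketch for round-robin over all paths
        defaults {
            path_grouping_policy  multibus   # all paths in one priority group
            rr_min_io             100        # I/Os per path before switching
        }
        multipaths {
            multipath {
                wwid   36005076305ffc1ae0000000000001234   # example WWID
                alias  disk01
            }
        }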


Page cache considerations
    ■   Page cache helps Linux to economize I/O
    ■   Requires Linux memory pages
    ■   Adapts its size to the amount of memory in use by processes
    ■   Write requests are delayed and data in the page cache can have multiple updates before
           being written to disk.
    ■   Write requests in the page cache can be merged into larger I/O requests
    ■   Read requests can be made faster by adding a read ahead quantity, depending on the
           historical behavior of file system accesses by applications
    ■   The configuration parameters swappiness, dirty_ratio and dirty_background_ratio control
           the page cache behavior (see the sysctl example below)
    ■   Linux does not know which data the application really needs next; it can only guess
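    As an illustration, the parameters named above are exposed through procfs and sysctl; the
    values shown are examples, not tuning recommendations.

        # Inspect the current settings
        cat /proc/sys/vm/swappiness
        cat /proc/sys/vm/dirty_background_ratio
        cat /proc/sys/vm/dirty_ratio

        # Change them at runtime (example values)
        sysctl -w vm.swappiness=10
        sysctl -w vm.dirty_background_ratio=5
        sysctl -w vm.dirty_ratio=20

        # Make a setting persistent across reboots
        echo "vm.dirty_background_ratio = 5" >> /etc/sysctl.conf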




Linux kernel components involved in disk I/O
    ■   Application program – issues reads and writes
    ■   Virtual File System (VFS) – dispatches requests to the different devices and translates
           to sector addressing
    ■   Logical Volume Manager (LVM) – defines the physical to logical device relation
    ■   dm-multipath – sets the multipath policies
    ■   Device mapper (dm) – holds the generic mapping of all block devices and performs the 1:n
           mapping for logical volumes and/or multipath
    ■   Block device layer / page cache – the page cache contains all file I/O data; direct I/O
           bypasses the page cache
    ■   I/O scheduler – merges, orders and queues requests and starts the device drivers via
           (un)plug device
    ■   Device drivers – handle the data transfer
               – Direct Access Storage Device driver (DASD)
               – Small Computer System Interface driver (SCSI) with the z Fiber Channel Protocol
                    driver (zFCP) and the Queued Direct I/O driver (qdio)
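    The layering on a running system can be inspected with standard tools, for example (device
    names and the resulting output depend on the actual configuration):

        # Show block devices and how device-mapper devices stack on top of them
        lsblk
        dmsetup ls --tree

        # Show the device-mapper targets (linear, striped, multipath) with their mappings
        dmsetup table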
DS8000 storage server – components
     ■   The rank
               – represents a RAID array of disk drives
               – is connected to the servers by device adapters
               – belongs to one server for primary access
     ■   A device adapter connects ranks to servers
     ■   Each server controls half of the storage server's disk drives, read cache and
            non-volatile storage (NVS / write cache)

     [Diagram: Server 0 and Server 1, each with its own read cache and NVS, attached through
      device adapters to ranks 1–8 and the volumes configured on them]


DS8000 storage server – extent pool
     ■   Extent pools
               – consist of one or several ranks
               – can be configured for one type of volume only: either ECKD DASDs or SCSI LUNs
               – are assigned to one server
     ■   Some possible extent pool definitions for ECKD and SCSI are shown in the example

     [Diagram: Server 0 with ranks 1–4 (ECKD, ECKD, ECKD, SCSI) and Server 1 with ranks 5–8
      (SCSI, SCSI, SCSI, ECKD), each rank configured as an extent pool with its volumes]



DS8000 storage pool striped volume
      ■   A storage pool striped volume (rotate extents) is defined on an extent pool consisting
             of several ranks
      ■   It is striped over the ranks in stripes of 1 GiB
      ■   As shown in the example
                – the 1 GiB stripes are rotated over all ranks
                – the first storage pool striped volume starts in rank 1, the next one would
                     start in rank 2

      [Diagram: a volume whose 1 GiB extents 1, 4, 7 / 2, 5, 8 / 3, 6 are placed round-robin on
       the disk drives of ranks 1, 2 and 3]
DS8000 storage pool striped volume (cont.)
        ■   A storage pool striped volume
                   – uses disk drives of multiple ranks
                   – uses several device adapters
                   – is bound to one server of the storage server

      [Diagram: an extent pool spanning ranks 1–3 behind Server 0; the volume's extents are
       spread over the disk drives of all three ranks, Server 1 is not involved]

Striped volumes comparison/best practices for best performance


                                  LVM striped logical              DS8000 storage pool
                                  volumes                          striped volumes
  Striping is done by...          Linux (device mapper)            Storage server
  Which disks to choose...        plan carefully                   don't care
  Disks from one extent pool...   per rank, alternating over       out of multiple ranks
                                  servers
  Administrating disks is...      complex                          simple
  Extendable...                   yes                              no – “gluing” disks together
                                                                   as a linear LV can be a
                                                                   workaround
  Stripe size...                  variable, to suit your           1 GiB
                                  workload (64 KiB default)




Disk I/O attachment and DS8000 storage subsystem
      ■   Each disk is a configured volume in an extent pool (here extent pool = rank)
      ■   Host bus adapters connect internally to both servers

      [Diagram: FICON Express channel paths 1–8 connect through a switch to host bus adapters
       1–8; each host bus adapter reaches both Server 0 and Server 1, which own ranks 1–8
       (alternating ECKD and SCSI)]
FICON/ECKD layout
     ■   The dasd driver starts the I/O on a subchannel
     ■   Each subchannel connects to all channel paths in the path group
     ■   Each channel connects via a switch to a host bus adapter
     ■   A host bus adapter connects to both servers
     ■   Each server connects to its ranks

     [Diagram: Linux I/O stack – application program, VFS, LVM, multipath, dm, block device
      layer, page cache, I/O scheduler, dasd driver – issuing I/O through the channel subsystem;
      subchannels a–d use chpids 1–4, which run via a switch to HBAs 1–4 and on to Server 0 /
      Server 1 and the ECKD ranks 1, 3, 5 and 7]
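     On Linux on System z, the subchannels, channel paths and DASD devices behind this picture
     can be listed with the s390-tools commands below (output and device numbers depend on the
     installation).

         # List subchannels with their device numbers and CHPIDs
         lscss

         # List DASD devices, including base and alias devices
         lsdasd

         # Show the channel path IDs and their state
         lschp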


FICON/ECKD single disk I/O
     ■   Assume that subchannel a corresponds to disk 2 in rank 1
     ■   The full choice of host adapters can be used
     ■   Only one I/O can be issued at a time through subchannel a
     ■   All other I/Os need to be queued in the dasd driver and in the block device layer until
            the subchannel is no longer busy with the preceding I/O

     [Diagram: same stack as before; I/O queues up at the dasd driver (marked '!') because
      subchannel a serializes all requests to the single disk in rank 1]


FICON/ECKD single disk I/O with PAV (SLES10/RHEL5)
     ■   VFS sees one device
     ■   The device mapper sees the real device and all alias devices
     ■   Parallel Access Volumes solve the I/O queuing in front of the subchannel
     ■   Each alias device uses its own subchannel
     ■   Additional processor cycles are spent to do the load balancing in the device mapper
     ■   The next slowdown is the fact that only one disk is used in the storage server. This
            implies the use of only one rank, one device adapter, one server

     [Diagram: the device mapper (marked '!') balances I/O over subchannels a–d and chpids 1–4;
      all paths still end at a single rank behind one device adapter (marked '!')]


FICON/ECKD single disk I/O with HyperPAV (SLES11/RHEL6)
     ■   VFS sees one device
     ■   The dasd driver sees the real device and all alias devices
     ■   Load balancing with HyperPAV and static PAV is done in the dasd driver. The aliases only
            need to be added to Linux. The load balancing works better than on the device mapper
            layer.
     ■   Fewer additional processor cycles are needed than with Linux multipath.
     ■   The remaining slowdown is due to the fact that only one disk is used in the storage
            server. This implies the use of only one rank, one device adapter, one server

     [Diagram: the dasd driver balances I/O over subchannels a–d; all paths still end at a single
      rank behind one device adapter (marked '!')]
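     A sketch of how HyperPAV aliases are made available to Linux; the device bus-IDs are
     hypothetical, and on SLES11/RHEL6 the dasd driver picks the aliases up automatically once
     they are online.

         # Set the base DASD online
         chccwdev -e 0.0.7000

         # Set the seven HyperPAV alias devices online (range of bus-IDs)
         chccwdev -e 0.0.70f0-0.0.70f6

         # lsdasd lists the base device and marks the alias devices
         lsdasd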

FICON/ECKD DASD I/O to a linear/striped logical volume
     ■   VFS sees one device (logical volume)
     ■   The device mapper sees the logical volume and the physical volumes
     ■   Additional processor cycles are spent to map the I/Os to the physical volumes
     ■   Striped logical volumes require more additional processor cycles than linear logical
            volumes
     ■   With a striped logical volume the I/Os can be well balanced over the entire storage
            server and overcome limitations from a single rank, a single device adapter or a
            single server
     ■   To ensure that I/O to one physical disk is not limited by one subchannel, PAV or
            HyperPAV should be used in combination with logical volumes

     [Diagram: the device mapper (marked '!') distributes the logical volume's I/O over several
      DASDs spread across both servers and several ranks]

FICON/ECKD DASD I/O to a storage pool striped volume with HyperPAV
     ■   A storage pool striped volume only makes sense in combination with PAV or HyperPAV
     ■   VFS sees one device
     ■   The dasd driver sees the real device and all alias devices
     ■   The storage pool striped volume spans several ranks of one server and overcomes the
            limitations of a single rank and/or a single device adapter
     ■   Storage pool striped volumes can also be used as physical disks for a logical volume to
            use both server sides
     ■   To ensure that I/O to one DASD is not limited by one subchannel, PAV or HyperPAV should
            be used

     [Diagram: subchannels a–d feed one storage pool striped volume in an extent pool that spans
      several ranks behind Server 0; the device adapter is marked '!']

FCP/SCSI layout
     ■   The SCSI driver finalizes the I/O requests
     ■   The zFCP driver adds the FCP protocol to the requests
     ■   The qdio driver transfers the I/O to the channel
     ■   A host bus adapter connects to both servers
     ■   Each server connects to its ranks

     [Diagram: Linux I/O stack – application program, VFS, LVM, multipath, dm, block device
      layer, page cache, I/O scheduler, SCSI driver, zFCP driver, qdio driver; chpids 5–8 run
      via a switch to HBAs 5–8 and on to Server 0 / Server 1 and the SCSI ranks 2, 4, 6 and 8]
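     A sketch of how a SCSI LUN could be made available through one FCP path on a SLES11-era
     system; the device bus-ID, WWPN and LUN are hypothetical, and newer distributions can do
     this automatically with NPIV and automatic LUN scanning.

         # Set the FCP device (channel) online
         chccwdev -e 0.0.5c00

         # Attach the LUN behind the given target port to this FCP device
         echo 0x4010400000000000 > \
             /sys/bus/ccw/drivers/zfcp/0.0.5c00/0x500507630300c562/unit_add

         # List the configured FCP devices, ports and LUNs
         lszfcp -D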
FCP/SCSI LUN I/O to a single disk
     ■   Assume that disk 3 in rank 8 is reachable via channel 6 and host bus adapter 6
     ■   Up to 32 (default value) I/O requests can be sent out to disk 3 before the first
            completion is required
     ■   The throughput will be limited by the rank and/or the device adapter
     ■   There is no high availability provided for the connection between the host and the
            storage server

     [Diagram: a single path – chpid 6, switch, HBA 6 (marked '!') – leads to one SCSI disk in
      rank 8 (marked '!')]
FCP/SCSI LUN I/O to a single disk with multipathing
     ■   VFS sees one device
     ■   The device mapper sees the multibus or failover alternatives to the same disk
     ■   Administrative effort is required to define all paths to one disk
     ■   Additional processor cycles are spent to do the mapping to the desired path for the disk
            in the device mapper

     [Diagram: the multipath layer in the device mapper (marked '!') spreads I/O over chpids 5–8
      and HBAs 5–8, which all end at the same SCSI disk in rank 8 (marked '!')]
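     The resulting path layout can be verified with the multipath tools and the zfcp utilities,
     for example:

         # Show each multipath device with its paths, their state and the path group policy
         multipath -ll

         # Show the FCP devices, remote ports and LUNs seen by the zFCP driver
         lszfcp -D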
FCP/SCSI LUN I/O to a linear/striped logical volume
     ■   VFS sees one device (logical volume)
     ■   The device mapper sees the logical volume and the physical volumes
     ■   Additional processor cycles are spent to map the I/Os to the physical volumes
     ■   Striped logical volumes require more additional processor cycles than linear logical
            volumes
     ■   With a striped logical volume the I/Os can be well balanced over the entire storage
            server and overcome limitations from a single rank, a single device adapter or a
            single server
     ■   To ensure high availability the logical volume should be used in combination with
            multipathing

     [Diagram: the device mapper (marked '!') distributes the logical volume's I/O over several
      SCSI disks spread across both servers and several ranks]

FCP/SCSI LUN I/O to a storage pool striped volume with multipathing
     ■   Storage pool striped volumes make no sense without high availability
     ■   VFS sees one device
     ■   The device mapper sees the multibus or failover alternatives to the same disk
     ■   The storage pool striped volume spans several ranks of one server and overcomes the
            limitations of a single rank and/or a single device adapter
     ■   Storage pool striped volumes can also be used as physical disks for a logical volume to
            make use of both server sides

     [Diagram: the multipath layer (marked '!') uses chpids 5–8 and HBAs 5–8; the volume lives in
      an extent pool spanning several ranks behind Server 1 (marked '!')]
Summary – I/O processing characteristics

      ■   FICON/ECKD:
               – 1:1 mapping host subchannel:dasd
               – Serialization of I/Os per subchannel
               – I/O request queue in Linux
                – Disk blocks are 4 KiB (see the formatting example after this list)
               – High availability by FICON path groups
               – Load balancing by FICON path groups and Parallel Access Volumes (PAV)
                    or HyperPAV


      ■   FCP/SCSI
               – Several I/Os can be issued against a LUN immediately
               – Queuing in the FICON Express card and/or in the storage server
               – Additional I/O request queue in Linux
               – Disk blocks are 512 Bytes
               – High availability by Linux multipathing, type failover or multibus
               – Load balancing by Linux multipathing, type multibus
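      As an illustration of the different block handling: ECKD DASDs are low-level formatted with
      4 KiB blocks before use, while SCSI LUNs appear as ordinary 512-byte-block disks. The
      device names below are hypothetical.

          # ECKD: low-level format with 4 KiB blocks, then auto-create one partition
          dasdfmt -b 4096 -p -f /dev/dasdb
          fdasd -a /dev/dasdb

          # SCSI: partition the LUN like any other Linux disk
          parted -s /dev/sdb mklabel msdos
          parted -s /dev/sdb mkpart primary 0% 100%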




Summary – Performance considerations
     ■   Use more than 1 FICON Express channel


     ■   Speed up techniques
               – Use more than 1 rank: storage pool striping, Linux striped logical volume
               – Use more channels for ECKD
                        • Logical path groups
                        • Use more than 1 subchannel: PAV, HyperPAV
                        • High Performance FICON and Read Write Track Data
               – Use more channels for SCSI
                        • SCSI Linux multipath multibus




Configuration – host and connection to storage server
       z9 LPAR:
       ■ 8 CPUs
       ■ 512 MiB memory
       ■ 4 FICON Express4 features
                 – 2 ports per feature used for FICON
                 – 2 ports per feature used for FCP
                 – Total: 8 FICON paths, 8 FCP paths
       DS8700:
       ■ 8 FICON Express4 features
                 – 1 port per feature used for FICON
                 – 1 port per feature used for FCP
       Linux:
       ■ SLES11 SP1 (with HyperPAV, High Performance FICON)
       ■ Kernel: 2.6.32.13-0.5-default (+ dm stripe patch)
       ■ Device mapper:
                 – multipath: version 1.2.0
                 – striped: version 1.3.0
                 – multipath-tools-0.4.8-40.23.1
       ■ 8 FICON paths defined in a channel path group

       [Diagram: System z connected through a switch to the DS8K; FICON ports, FCP ports and
        unused ports are marked]




IOzone benchmark description
     ■   Workload
               – Threaded I/O benchmark (IOzone)
               – Each process writes or reads a single file
               – Options to bypass page cache, separate execution of sequential write, sequential
                    read, random read/write

     ■   Setup
                 – Main memory was restricted to 512 MiB
                 – File size: 2 GiB, Record size: 8 KiB or 64 KiB
                 – Run with 1, 8 and 32 processes
                 – Sequential run: write, rewrite, read
                 – Random run: write, read (with previous sequential write)
                 – Runs with direct I/O and Linux page cache
                  – Sync and drop caches prior to every invocation of the workload to reduce
                       noise (a sample invocation follows below)
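      A hedged sketch of IOzone invocations matching this setup; the mount point and file names
      are hypothetical, and the measurements used 1, 8 and 32 processes.

          cd /mnt/scratch                      # file system on the volume under test
          sync; echo 3 > /proc/sys/vm/drop_caches

          # Sequential write/rewrite (-i 0) and read (-i 1): 2 GiB files, 8 KiB records,
          # direct I/O (-I), 8 worker processes, one file per process
          iozone -t 8 -s 2g -r 8k -I -i 0 -i 1 -F f1 f2 f3 f4 f5 f6 f7 f8

          # Random read/write (-i 2); the files must be written first (-i 0)
          iozone -t 8 -s 2g -r 8k -I -i 0 -i 2 -F f1 f2 f3 f4 f5 f6 f7 f8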




Measured scenarios: FICON/ECKD
      FICON/ECKD measurement series (always 8 paths) – chart labels in parentheses
      ■ 1 single disk (ECKD 1d)
            – configured as 1 volume in an extent pool containing 1 rank
      ■ 1 single disk with HyperPAV (ECKD 1d hpav)
            – configured as 1 volume in an extent pool containing 1 rank
            – using 7 HyperPAV alias devices
      ■ 1 storage pool striped disk (ECKD 1d hpav sps)
            – configured as 1 volume in an extent pool containing 4 ranks
            – using 7 HyperPAV alias devices
      ■ 1 striped logical volume over 2 disks (ECKD 2d lv hpav sps)
            – built from 2 storage pool striped disks, 1 from server 0 and 1 from server 1
            – each disk configured as 1 volume in an extent pool containing 4 ranks, using
               7 HyperPAV alias devices
      ■ 1 striped logical volume over 8 disks (ECKD 8d lv hpav)
            – built from 8 disks, 4 from server 0 and 4 from server 1
            – each disk configured as 1 volume in an extent pool containing 1 rank, using
               7 HyperPAV alias devices
Measured scenarios: FCP/SCSI
      FCP/SCSI measurement series – chart labels in parentheses
      ■ 1 single disk (SCSI 1d)
            – configured as 1 volume in an extent pool containing 1 rank
            – using 1 path
      ■ 1 single disk with multipathing (SCSI 1d mb)
            – configured as 1 volume in an extent pool containing 1 rank
            – using 8 paths, multipath multibus, rr_min_io = 1
      ■ 1 storage pool striped disk (SCSI 1d mb sps)
            – configured as 1 volume in an extent pool containing 4 ranks
            – using 8 paths, multipath multibus, rr_min_io = 1
      ■ 1 striped logical volume over 2 disks (SCSI 2d lv mb sps)
            – built from 2 storage pool striped disks, 1 from server 0 and 1 from server 1
            – each disk configured as 1 volume in an extent pool containing 4 ranks, using
               8 paths, multipath multibus, rr_min_io = 1
      ■ 1 striped logical volume over 8 disks (SCSI 8d lv mb)
            – built from 8 disks, 4 from server 0 and 4 from server 1
            – each disk configured as 1 volume in an extent pool containing 1 rank, using
               8 paths, multipath multibus, rr_min_io = 1
Database like scenario, random read
      ■   ECKD
           – For 1 process the scenarios show equal throughput
           – For 8 processes HyperPAV improves the throughput by up to 5.9x
           – For 32 processes the combination of Linux logical volume and HyperPAV dasd improves
             throughput by 13.6x
      ■   SCSI
           – For 1 process the scenarios show equal throughput
           – For 8 processes multipath multibus improves throughput by 1.4x
           – For 32 processes multipath multibus improves throughput by 3.5x
      ■   ECKD versus SCSI
           – Throughput for the corresponding scenario is always higher with SCSI

      [Chart: throughput, 8 KiB requests, direct I/O random read, at 1, 8 and 32 processes for
       all ECKD and SCSI scenarios]




Database like scenario, random write
      ■   ECKD
           – For 1 process the throughput of all scenarios shows only minor deviations
           – For 8 processes HyperPAV plus storage pool striping or a Linux logical volume
             improves the throughput by 3.5x
           – For 32 processes the combination of Linux logical volume and HyperPAV or storage
             pool striped dasd improves throughput by 10.8x
      ■   SCSI
           – For 1 process the throughput of all scenarios shows only minor deviations
           – For 8 processes the combination of storage pool striping and multipath multibus
             improves throughput by 5.4x
           – For 32 processes the combination of Linux logical volume and multipath multibus
             improves throughput by 13.3x
      ■   ECKD versus SCSI
           – ECKD is better for 1 process
           – SCSI is better for multiple processes
      ■   General
           – More NVS keeps throughput up with 32 processes

      [Chart: throughput, 8 KiB requests, direct I/O random write, at 1, 8 and 32 processes for
       all ECKD and SCSI scenarios]
Database like scenario, sequential read
      ■   ECKD
           – For 1 process the scenarios show equal throughput
           – For 8 processes HyperPAV improves the throughput by up to 5.9x
           – For 32 processes the combination of Linux logical volume and HyperPAV dasd improves
             throughput by 13.7x
      ■   SCSI
           – For 1 process the throughput of all scenarios shows only minor deviations
           – For 8 processes multipath multibus improves throughput by 1.5x
           – For 32 processes multipath multibus improves throughput by 3.5x
      ■   ECKD versus SCSI
           – Throughput for the corresponding scenario is always higher with SCSI
      ■   General
           – Same picture as for random read

      [Chart: throughput, 8 KiB requests, direct I/O sequential read, at 1, 8 and 32 processes
       for all ECKD and SCSI scenarios]



File server, sequential write
      ■   ECKD
           – For 1 process the throughput of all scenarios shows only minor deviations
           – For 8 processes HyperPAV plus a Linux logical volume improves the throughput by 5.7x
           – For 32 processes the combination of Linux logical volume and HyperPAV dasd improves
             throughput by 8.8x
      ■   SCSI
           – For 1 process multipath multibus improves throughput by 2.5x
           – For 8 processes the combination of storage pool striping and multipath multibus
             improves throughput by 2.1x
           – For 32 processes the combination of Linux logical volume and multipath multibus
             improves throughput by 4.3x
      ■   ECKD versus SCSI
           – For 1 process sometimes ECKD has the advantage, sometimes SCSI
           – SCSI is better in most cases for multiple processes

      [Chart: throughput, 64 KiB requests, direct I/O sequential write, at 1, 8 and 32 processes
       for all ECKD and SCSI scenarios]

File server, sequential read
      ■   ECKD
           – For 1 process the throughput of all scenarios shows only minor deviations
           – For 8 processes HyperPAV improves the throughput by up to 6.2x
           – For 32 processes the combination of Linux logical volume and HyperPAV dasd improves
             throughput by 13.8x
      ■   SCSI
           – For 1 process multipath multibus improves throughput by 2.8x
           – For 8 processes multipath multibus improves throughput by 4.3x
           – For 32 processes the combination of Linux logical volume and multipath multibus
             improves throughput by 6.5x
      ■   ECKD versus SCSI
           – SCSI is better in most cases

      [Chart: throughput, 64 KiB requests, direct I/O sequential read, at 1, 8 and 32 processes
       for all ECKD and SCSI scenarios]



Effect of page cache with sequential write
      ■   General
           – Compared to direct I/O, the page cache
               • helps to increase throughput for scenarios with 1 or a few processes
               • limits throughput in the many-process case
           – The advantage of the SCSI scenarios with additional features is no longer visible
      ■   ECKD
           – HyperPAV, storage pool striping and Linux logical volumes still improve throughput
             by up to 4.6x
      ■   SCSI
           – Multipath multibus with storage pool striping and/or a Linux logical volume still
             improves throughput by up to 2.2x

      [Chart: throughput, 64 KiB requests, page cache sequential write, at 1, 8 and 32 processes
       for all ECKD and SCSI scenarios]


Effect of page cache with sequential read
      ■   General
           – The SLES11 read ahead setting of 1024 helps a lot to improve throughput
           – Compared to direct I/O, the page cache
               • gives a big throughput increase with 1 or a few processes
               • limits throughput in the many-process case for SCSI
           – The advantage of the SCSI scenarios with additional features is no longer visible
               • the number of available pages in the page cache limits the throughput at a
                 certain rate
      ■   ECKD
           – HyperPAV, storage pool striping and Linux logical volumes still improve throughput
             by up to 9.3x
      ■   SCSI
           – Multipath multibus with storage pool striping and/or a Linux logical volume still
             improves throughput by up to 4.8x

      [Chart: throughput, 64 KiB requests, page cache sequential read, at 1, 8 and 32 processes
       for all ECKD and SCSI scenarios]

Summary
    Linux options
         Choice of Linux distribution
         Appropriate number and size of I/Os
         File system
         Placement of temp files
         direct I/O or page cache
         Read ahead setting
         Use of striped logical volume
            ● ECKD
                ● HyperPAV
                ● High Performance FICON for small I/O requests
            ● SCSI
                ● Single path configuration is not supported!
                ● Multipath multibus – choose the rr_min_io value (see the sketch below)

    Hardware options
         FICON Express4 or 8
         Number of channel paths to the storage server
         Port selection to exploit maximum link speed
         No switch interconnects with less bandwidth
         Storage server configuration
            ● Extent pool definitions
            ● Disk placement
            ● Storage pool striped volumes
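    A minimal /etc/multipath.conf sketch for the multibus policy as used in the measured
    SCSI scenarios (rr_min_io = 1). The keywords below are those of the multipath-tools
    0.4.8 generation shipped with SLES11; later versions replace rr_min_io with
    rr_min_io_rq, and production setups usually add storage-specific device sections,
    which are omitted here:

        defaults {
                # spread I/O over all paths in one priority group
                path_grouping_policy    multibus
                # switch to the next path after every I/O request
                rr_min_io               1
        }

    After changing the configuration, the resulting path groups can be verified with
    multipath -ll.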


38                                                                               © 2012 IBM Corporation
Conclusions
     ■   Small sets of I/O processes benefit from Linux page cache in case of sequential I/O
     ■   Larger I/O requests from the application lead to higher throughput
     ■   Reads benefit most from using HyperPAV (FICON/ECKD) and multipath multibus (FCP/SCSI)
     ■   Writes benefit from the NVS size
                – Can be increased by the use of
                           • Storage pool striped volumes
                           • Linux striped logical volumes to disks of different extent pools
                             (see the sketch after this list)
    ■   The results may vary depending on the
                – Storage servers
                – Linux distributions
                – Number of disks
                – Number of channel paths
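    A minimal sketch of building a Linux striped logical volume from two disks that are
    assumed to come from different extent pools; the device names, volume group name,
    size and the 64 KiB stripe size are illustrative and not taken from the measurements:

        # register both (multipathed) disks as LVM physical volumes
        pvcreate /dev/mapper/mpatha /dev/mapper/mpathb

        # one volume group spanning both physical volumes
        vgcreate vgdata /dev/mapper/mpatha /dev/mapper/mpathb

        # striped logical volume: 2 stripes of 64 KiB, alternating between the two disks
        lvcreate --size 100G --stripes 2 --stripesize 64 --name lvdata vgdata

        # show the segment and stripe layout
        lvdisplay -m /dev/vgdata/lvdata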




39                                                                                              © 2012 IBM Corporation
CPU consumption
    ■   Linux features such as page cache, PAV, striped logical volumes or multipath consume
           additional processor cycles
     ■   CPU consumption
              – grows with number of I/O requests and/or number of I/O processes
              – depends on the Linux distribution and versions of components like device mapper or
                    device drivers
              – depends on customizable values such as the Linux memory size (and, implicitly,
                    the page cache size), the read ahead value, the number of alias devices
                    (see the sketch after this list), the number of paths, the rr_min_io setting,
                    and the I/O request size from the applications
              – is similar for ECKD and SCSI in the 1 disk case with no further options
    ■   HyperPAV and static PAV in SLES11 consume fewer CPU cycles than static PAV in
           older Linux distributions
     ■   The CPU consumption in the measured scenarios
               – has to be normalized to the amount of transferred data
               – differs between a simple and a complex setup
                          • for ECKD up to 2x
                          • for SCSI up to 2.5x
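    A sketch of how HyperPAV alias devices are brought online on the Linux side; the
    device bus-IDs 0.0.5100-0.0.5107 are illustrative assumptions, and the aliases
    themselves must already be defined on the storage server / I/O configuration side:

        # set the alias subchannels online; the dasd driver picks them up automatically
        chccwdev -e 0.0.5100-0.0.5107

        # list DASD devices including aliases and their UIDs
        lsdasd -u

    The number of aliases activated this way is one of the values that influence the
    CPU consumption discussed above.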



40                                                                                        © 2012 IBM Corporation
Questions?



                   Mario Held
                   Linux on System z
                   System Software Performance Engineer

                   IBM Deutschland Research & Development
                   Schoenaicher Strasse 220
                   71032 Boeblingen, Germany

                   Phone +49 (0)7031–16–4257
                   Email mario.held@de.ibm.com




41                                                                 © 2012 IBM Corporation
