Why 4K?
     October 2, 2012
      George Wilson
george.wilson@delphix.com
Why 4K? (Not Y4K)
●   This is not the next Millennium bug!




                         Delphix Proprietary and Confidential
Storage History
●   1998 IBM publishes a paper proposing an increase of disk
    sector size from 512B to 4K
●   2000 IDEMA (International Disk Drive Equipment and
    Materials Association) forms a 4K committee
●   2005 ZFS released in OpenSolaris with support for block sizes
    ranging from 512B to 128K
●   2005 512B emulation mode proposed, later known as AF
    512e
●   2006 ZFS adds large sector support
●   2009 Advanced Format is approved as the naming convention
    for 4K sectors
●   2011 All hard drive manufacturers start to ship AF 512e drives
Advanced Format Drives
●   Two flavors of Advanced Format Drives
     ○ AF 512e - Advanced Format 512B Emulation (today)
     ○ AF 4Kn - Advanced Format 4K Native (future, 2012?)
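The two flavors differ only in the logical and physical sector sizes a drive reports. The sketch below classifies a drive from those two numbers. The helper names are hypothetical, and reading the values from Linux sysfs (`/sys/block/<disk>/queue/logical_block_size` and `physical_block_size`) is an assumption that applies to Linux only. A drive that reports 512/512 may still be an AF 512e drive that hides its 4K sectors.

```python
from pathlib import Path


def classify_sector_format(logical_size: int, physical_size: int) -> str:
    """Hypothetical helper: classify a drive by the sector sizes it *reports*."""
    if logical_size == 512 and physical_size == 512:
        # Legacy 512B drive, or an AF 512e drive that hides its 4K sectors.
        return "512B native or lying AF 512e"
    if logical_size == 512 and physical_size == 4096:
        return "AF 512e"
    if logical_size == 4096 and physical_size == 4096:
        return "AF 4Kn"
    return "unknown"


def read_linux_sector_sizes(disk: str) -> tuple[int, int]:
    """Hypothetical helper: read reported sizes from Linux sysfs, e.g. disk='sda'."""
    queue = Path("/sys/block") / disk / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

On a Linux host, `classify_sector_format(*read_linux_sector_sizes("sda"))` would report the flavor of the (hypothetical) disk `sda`.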

Advanced Format 512e (AF 512e)
●   Maps 8 512B logical blocks
    into 1 physical 4K block
●   Provides an emulation layer
    for compatibility


     (Figure: 512B logical blocks 0-7 map onto 4K physical block #0)
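The 8-to-1 mapping is plain integer division. This minimal sketch assumes logical block 0 starts at the beginning of physical block 0 (no jumper-induced offset); `lba_to_physical` is a hypothetical helper name.

```python
LOGICAL_SECTOR = 512     # bytes per emulated logical block
PHYSICAL_SECTOR = 4096   # bytes per physical block on the media
LOGICAL_PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8


def lba_to_physical(lba: int) -> tuple[int, int]:
    """Hypothetical helper: map a 512B LBA to (4K physical block, slot 0-7).

    Assumes LBA 0 starts at the beginning of physical block 0.
    """
    return divmod(lba, LOGICAL_PER_PHYSICAL)


print(lba_to_physical(5))   # (0, 5)
print(lba_to_physical(9))   # (1, 1)
```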




Predicting Future Problems

"Access on a 512-byte basis would continue to be
supported, but performance would be inferior to
that in which access is done on a 4096-byte basis,
and might well be inferior to that of previous
drives with 512-byte native block size."
-- Large Block Size by Paul Hodges and David Cheng, 1998
Problems in a 4K World
●   Lies
     ○ AF 512e drives lie about their physical block size (reporting 512B instead of 4K)
     ○ LUNs from storage vendors lie about their physical block size
●   Misaligned I/O
     ○ Proper partitioning: partitions must start on a 4K boundary
     ○ Some AF 512e drives provide an XP jumper to compensate for
        Windows XP partitions, which start on sector 63 and are not 4K aligned
●   Read-modify-write
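Whether a partition is aligned comes down to whether its starting byte offset is a multiple of 4096. This sketch checks the XP-style start at sector 63 against two aligned starts; 2048 (1 MiB) is used only as an illustrative example, and `is_4k_aligned` is a hypothetical helper name.

```python
LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096


def is_4k_aligned(start_lba: int) -> bool:
    """Hypothetical helper: True if a partition at this 512B LBA starts on a 4K boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


for start in (63, 64, 2048):
    print(start, start * LOGICAL_SECTOR, is_4k_aligned(start))
# 63 32256 False
# 64 32768 True
# 2048 1048576 True
```

Sector 63 lands 3584 bytes into a 4K physical block, so every 4K filesystem block in that partition straddles two physical blocks.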



Sub-block Reads

  (Figure: reading a single 512B logical block, one of blocks 0-7 in
  4K physical block #1, means the drive must read the entire 4K block)
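The read amplification can be computed by finding which 4K physical blocks overlap the requested byte range. This is a simplified model that ignores any drive-side caching; the helper names are hypothetical.

```python
PHYSICAL_SECTOR = 4096


def physical_blocks_touched(offset: int, length: int) -> range:
    """Hypothetical helper: 4K blocks overlapping bytes [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // PHYSICAL_SECTOR
    last = (offset + length - 1) // PHYSICAL_SECTOR
    return range(first, last + 1)


def media_bytes_read(offset: int, length: int) -> int:
    """Hypothetical helper: bytes the drive must read from the media."""
    return len(physical_blocks_touched(offset, length)) * PHYSICAL_SECTOR


# Logical block 3 of physical block #1 is LBA 8 + 3 = 11.
offset = 11 * 512
print(list(physical_blocks_touched(offset, 512)))  # [1]
print(media_bytes_read(offset, 512))               # 4096
```

A 512B request costs a 4096-byte media read, an 8x amplification.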


Sub-block Writes (Read-modify-write)
  (Figure, three steps on 4K physical block #0, which holds 512B logical
  blocks 0-7: (1) the host writes one 512B logical block; (2) the drive
  reads the full 4K physical block; (3) the drive writes the modified
  4K physical block back)
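The three steps can be modeled as a toy emulation layer over a byte array standing in for 4K media. This is an illustrative sketch, not how any real drive firmware is written; the class and method names are hypothetical.

```python
LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096


class Emulated512eDrive:
    """Hypothetical toy model of an AF 512e emulation layer over 4K blocks."""

    def __init__(self, physical_blocks: int) -> None:
        self.media = bytearray(physical_blocks * PHYSICAL_SECTOR)
        self.physical_reads = 0
        self.physical_writes = 0

    def _read_physical(self, pba: int) -> bytearray:
        self.physical_reads += 1
        start = pba * PHYSICAL_SECTOR
        return bytearray(self.media[start:start + PHYSICAL_SECTOR])

    def _write_physical(self, pba: int, data: bytes) -> None:
        assert len(data) == PHYSICAL_SECTOR
        self.physical_writes += 1
        start = pba * PHYSICAL_SECTOR
        self.media[start:start + PHYSICAL_SECTOR] = data

    def write_logical(self, lba: int, data: bytes) -> None:
        """Write one 512B logical block using read-modify-write."""
        if len(data) != LOGICAL_SECTOR:
            raise ValueError("expected exactly one 512B logical block")
        pba, slot = divmod(lba, PHYSICAL_SECTOR // LOGICAL_SECTOR)
        block = self._read_physical(pba)            # 1. read the whole 4K block
        off = slot * LOGICAL_SECTOR
        block[off:off + LOGICAL_SECTOR] = data      # 2. modify 512B in memory
        self._write_physical(pba, bytes(block))     # 3. write the 4K block back


drive = Emulated512eDrive(physical_blocks=2)
drive.write_logical(3, b"\xff" * 512)
print(drive.physical_reads, drive.physical_writes)  # 1 1
```

Every 512B write costs one 4K read plus one 4K write, and the read sits on the critical path of the write.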

Misaligned 4K Writes
  (Figure, three steps over 512B logical blocks 0-9, where 4K physical
  block #1 begins at logical block 8: (1) the host writes 4K that
  straddles physical blocks #0 and #1; (2) the drive reads both 4K
  physical blocks; (3) the drive writes both 4K physical blocks back)
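A rough cost model for any write: physical blocks only partially covered need a read before they can be rewritten, while fully covered blocks can simply be overwritten. Treating full coverage as read-free is an assumption of this sketch, and `rmw_cost` is a hypothetical helper name.

```python
LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096


def rmw_cost(start_lba: int, length: int) -> tuple[int, int]:
    """Hypothetical helper: (physical reads, physical writes) for a write.

    Partially covered 4K blocks need read-modify-write; fully covered
    blocks are assumed to be overwritten without a read.
    """
    start = start_lba * LOGICAL_SECTOR
    end = start + length
    first = start // PHYSICAL_SECTOR
    last = (end - 1) // PHYSICAL_SECTOR
    reads = 0
    for pba in range(first, last + 1):
        block_start = pba * PHYSICAL_SECTOR
        block_end = block_start + PHYSICAL_SECTOR
        if not (start <= block_start and end >= block_end):
            reads += 1  # partially covered: must read before writing
    writes = last - first + 1
    return reads, writes


print(rmw_cost(0, 4096))  # (0, 1)  aligned 4K write
print(rmw_cost(2, 4096))  # (2, 2)  4K write covering logical blocks 2-9
```

Shifting the same 4K write by two logical blocks turns one physical write into two reads and two writes.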

Solutions (sort of)
●   Override the lies from the device
     ○ FreeBSD, Illumos, and Linux have all implemented a way to override
       the discovered sector size
     ○ FreeBSD
         ■ use gnop to create a 4K-sector device, e.g. gnop create -S 4096 <device>
     ○ Illumos
         ■ add an override to sd.conf:
            sd-config-list = "VENDOR  PRODUCT", "physical-block-size:4096";
     ○ Linux
         ■ zpool create -o ashift=12 tank <device>
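The `ashift=12` value is the base-2 logarithm of the 4096-byte sector size (512B sectors correspond to `ashift=9`). This small sketch derives it; `ashift_for` is a hypothetical helper name.

```python
def ashift_for(sector_size: int) -> int:
    """Hypothetical helper: ZFS ashift is log2 of the smallest allocation unit."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1


for size in (512, 4096):
    print(size, ashift_for(size))
# 512 9
# 4096 12
```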




Drawbacks of 4K and ZFS
●   Reduced compression ratio
     ○ Blocks of 4K or smaller get 0% compression, since even a
        compressed block still occupies a full 4K sector
     ○ An 8K block can achieve at most 50% compression
●   Migrating from 512B to 4K drives (a vdev's ashift is fixed when it is created)
●   Inefficient metadata allocation
     ○ Some metadata is allocated in 4K chunks and no longer gets compressed
●   Improper accounting of compressed sizes in datasets
●   RAID-Z with 4K sectors is not recommended
●   Configuring root pools to use 4K
     ○ GRUB support?
●   Fewer uberblocks
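The compression and uberblock drawbacks both follow from rounding to the allocation unit. This sketch rounds a compressed block up to `2**ashift` bytes and counts uberblock slots. The 128K uberblock ring per label and the 1K minimum slot size reflect my reading of the ZFS on-disk format and should be treated as assumptions to verify; the model also ignores ZFS's own minimum-savings threshold for keeping a compressed block. All names are hypothetical.

```python
UBERBLOCK_RING = 128 * 1024   # assumed bytes reserved for uberblocks per label
UBERBLOCK_MIN_SHIFT = 10      # assumed minimum uberblock slot size: 1K


def allocated_size(psize: int, ashift: int) -> int:
    """Round a (compressed) block size up to the vdev allocation unit."""
    unit = 1 << ashift
    return -(-psize // unit) * unit


def compression_savings(lsize: int, psize: int, ashift: int) -> float:
    """Fraction of space saved versus storing the block uncompressed."""
    return 1 - allocated_size(psize, ashift) / allocated_size(lsize, ashift)


def uberblock_slots(ashift: int) -> int:
    """Each uberblock occupies max(2**ashift, 1K) bytes of the ring."""
    return UBERBLOCK_RING >> max(ashift, UBERBLOCK_MIN_SHIFT)


print(compression_savings(8192, 1024, 9))    # 0.875
print(compression_savings(8192, 1024, 12))   # 0.5
print(compression_savings(3072, 1024, 12))   # 0.0
print(uberblock_slots(9), uberblock_slots(12))  # 128 32
```

The same 8K block that compresses to 1K saves 87.5% with 512B sectors but only 50% with 4K sectors, and a 3K block saves nothing at all.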
Q&A / Beer?




ZFS Day
     October 2, 2012
      George Wilson
george.wilson@delphix.com
