File sections, from the start of the file to the end:

  • "Scanned block" section
      • Data Block
      • Data Block
      • Leaf index block / Bloom block
      • …
  • Non-scanned block section
      • Meta Block
      • ...
      • Meta Block
      • Intermediate Level Data Index Blocks (optional)
  • Load-on-open section
      • Root Data Index
      • Fields for midkey
      • Meta Index
      • File Info
      • Bloom Filter metadata
  • Trailer
      • Trailer fields
      • Version

Block: a Header followed by entries.

Header:
  • Block Type (8B)
  • Compressed Size (Int)
  • Uncompressed Size (Int)
  • Offset of previous same type block (Long)

Data Block:
  • Header
  • Compressed data (Byte[]), holding for each key/value pair:
      • KeyLen (Int)
      • ValueLen (Int)
      • Key (Byte[])
      • Value (Byte[])

Leaf index block (non-root format):
  • Header
  • # of entries (Int)
  • Entry offsets (Int[]): Offset (Int), Offset (Int), Offset (Int), ...
  • Total Size of Entries (Int)
  • Entry, Entry, Entry, ... where each entry is:
      • Block offset (Long)
      • On disk size (Int)
      • Key (Byte[])

Root Data Index (root format):
  • Header
  • Entry, ..., Entry, where each entry is:
      • Block offset (Long)
      • On disk size (Int)
      • Key:
          • Key length (VInt)
          • Key bytes (Byte[])

Fields for midkey:
  • Offset of Middle leaf index block (Long)
  • On-disk size (Int)
  • Index of the mid-key (Int)

  These midkey fields are part of the Root Data Index block, appended to
  the tail of the data index blocks.

File Info:
  • Header
  • Size of entries (Int)
  • Last key (Byte[])
  • Average key length (Int)
  • Average value length (Int)
  • Comparator
  • User defined

Bloom Filter metadata:
  • Header
  • Bloom filter version (Int)
  • Total byte size of bloom filter chunks (Long)
  • # of hash functions (Int)
  • Type of hash functions (Int)
  • Total key count (Long)
  • Maximum total number of keys (Long)
  • # of chunks (Int)
  • Comparator class
  • Chunk index

Trailer fields:
  • Header
  • File info offset (Long)
  • loadOnOpenOffSet (Long)
  • # of data index entries (Int)
  • uncompressedDataIndexSize (Long)
  • # of meta index entries (Int)
  • Total uncompressed bytes (Long)
  • numEntries (Long)
  • Compression codec (Int)
  • # of levels in the data block index (Int)
  • First data block offset (Long)
  • Last data block offset (Long)
  • Version (Int)

Ant.Rao@gmail.com

The Format of new HFile 2
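
As a rough illustration of the block header in the figure (Block Type, Compressed Size, Uncompressed Size, Offset of previous same type block), the sketch below decodes those four fields from a byte buffer. The class and field names are hypothetical and are not HBase's own reader API. It assumes the fields are stored back to back in the order shown, in big-endian byte order (the default for Java's ByteBuffer), and that the 8-byte block type can be read as ASCII text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the four header fields listed in the figure. */
final class BlockHeaderSketch {
    // 8-byte block type + two Ints + one Long = 24 bytes (assumed contiguous).
    static final int HEADER_SIZE = 8 + 4 + 4 + 8;

    final String blockType;      // 8-byte block type, decoded as ASCII (assumption)
    final int compressedSize;    // Compressed Size (Int)
    final int uncompressedSize;  // Uncompressed Size (Int)
    final long prevBlockOffset;  // Offset of previous same type block (Long)

    private BlockHeaderSketch(String blockType, int compressedSize,
                              int uncompressedSize, long prevBlockOffset) {
        this.blockType = blockType;
        this.compressedSize = compressedSize;
        this.uncompressedSize = uncompressedSize;
        this.prevBlockOffset = prevBlockOffset;
    }

    /**
     * Reads one header starting at the buffer's current position and
     * advances the position past it. The buffer is expected to use its
     * default big-endian byte order.
     */
    static BlockHeaderSketch parse(ByteBuffer buf) {
        if (buf.remaining() < HEADER_SIZE) {
            throw new IllegalArgumentException(
                "need " + HEADER_SIZE + " bytes, have " + buf.remaining());
        }
        byte[] type = new byte[8];
        buf.get(type);
        int compressed = buf.getInt();
        int uncompressed = buf.getInt();
        long prevOffset = buf.getLong();
        return new BlockHeaderSketch(
            new String(type, StandardCharsets.US_ASCII),
            compressed, uncompressed, prevOffset);
    }
}
```

Calling `BlockHeaderSketch.parse(buffer)` on a buffer positioned at the start of a block leaves the buffer positioned at the first byte after the header, which is where the block's payload would begin under these assumptions.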

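
The data block layout in the figure (KeyLen, ValueLen, Key, Value) can be walked with a simple loop. The sketch below is only an illustration under stated assumptions: the payload has already been decompressed, it contains nothing but back-to-back records in that field order, and integers are big-endian. The class and method names are hypothetical and do not come from HBase.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the key/value records inside a data block. */
final class DataBlockRecordsSketch {

    /** One record: the raw key bytes and the raw value bytes. */
    static final class Record {
        final byte[] key;
        final byte[] value;

        Record(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Reads records from the buffer's current position until it is
     * exhausted. Each record is KeyLen (Int), ValueLen (Int), Key, Value.
     */
    static List<Record> readAll(ByteBuffer payload) {
        List<Record> records = new ArrayList<>();
        while (payload.hasRemaining()) {
            if (payload.remaining() < 8) {
                throw new IllegalArgumentException("truncated length fields");
            }
            int keyLen = payload.getInt();
            int valueLen = payload.getInt();
            // Reject negative lengths and lengths that run past the buffer.
            if (keyLen < 0 || valueLen < 0
                    || (long) keyLen + valueLen > payload.remaining()) {
                throw new IllegalArgumentException("corrupt or truncated record");
            }
            byte[] key = new byte[keyLen];
            payload.get(key);
            byte[] value = new byte[valueLen];
            payload.get(value);
            records.add(new Record(key, value));
        }
        return records;
    }
}
```

The length check is done in `long` arithmetic so that two large Int lengths cannot overflow and slip past the bounds test.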