HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs

 Sep 10, 2009 Schubert Zhang (schubert.zhang@gmail.com) http://cloudepr.blogspot.com



1. Introduction

HFile mimics Google's SSTable and is now available in Hadoop
HBase-0.20.0. Previous releases of HBase temporarily used an
alternative file format, MapFile [4], a common file format in the Hadoop
IO package. I think HFile should also become a common file format once
it matures, and should be moved into Hadoop's common IO package
in the future.


The following description of the SSTable is from section 4 of Google's Bigtable paper.


The Google SSTable file format is used internally to store Bigtable data.
An SSTable provides a persistent, ordered immutable map from keys to values,
where both keys and values are arbitrary byte strings. Operations are
provided to look up the value associated with a specified key, and to
iterate over all key/value pairs in a specified key range. Internally,
each SSTable contains a sequence of blocks (typically each block is 64KB
in size, but this is configurable). A block index (stored at the end of
the SSTable) is used to locate blocks; the index is loaded into memory
when the SSTable is opened. A lookup can be performed with a single disk
seek: we first find the appropriate block by performing a binary search
in the in-memory index, and then reading the appropriate block from disk.
Optionally, an SSTable can be completely mapped into memory, which allows
us to perform lookups and scans without touching disk.[1]


HFile implements essentially the same features as the SSTable, though it
may provide somewhat more or less in places.


2. File Format

Data Block Size

Whenever we say block size, we mean the uncompressed size.
The size of each data block is 64KB by default, and is configurable in
HFile.Writer. A data block can exceed this size by at most one key/value
pair: HFile.Writer starts a new data block once the current block is equal
to or bigger than this size. The 64KB default is the same as Google's [1].




To achieve better performance, we should choose the block size to match
the workload. If the average key/value pair is very small (e.g. 100 bytes),
we should choose small blocks (e.g. 16KB) to avoid packing too many
key/value pairs into each block, which would increase the latency of
in-block seeks: a seek always scans keys sequentially, starting from the
first key/value pair of the block.
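This two-step lookup (binary search over the in-memory index of block first-keys, then a sequential scan within the chosen block) can be sketched with a minimal, self-contained model; plain String keys and in-memory arrays stand in for real blocks read from disk:

```java
import java.util.Arrays;

public class BlockLookupSketch {
    // firstKeys[i] is the first key of block i; blocks[i] holds that block's
    // keys in sorted order (an in-memory stand-in for blocks on disk).
    static int blockFor(String[] firstKeys, String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        if (pos >= 0) return pos;          // key is exactly some block's first key
        return -(pos + 1) - 1;             // block whose first key precedes the key
    }

    static boolean lookup(String[] firstKeys, String[][] blocks, String key) {
        int b = blockFor(firstKeys, key);
        if (b < 0) return false;           // key sorts before the first block
        for (String k : blocks[b]) {       // sequential scan from the block's start
            int c = k.compareTo(key);
            if (c == 0) return true;       // found
            if (c > 0) return false;       // passed the slot; key is absent
        }
        return false;
    }

    public static void main(String[] args) {
        String[] firstKeys = { "apple", "mango" };
        String[][] blocks = { { "apple", "kiwi" }, { "mango", "pear" } };
        System.out.println(lookup(firstKeys, blocks, "kiwi"));  // true
        System.out.println(lookup(firstKeys, blocks, "grape")); // false
    }
}
```

The sequential scan inside `lookup` is exactly the per-block cost that grows when too many small pairs are packed into one block.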


Maximum Key Length

The key of each key/value pair is currently limited to 64KB in size. In
practice, 10-100 bytes is a typical key size for most of our applications.
Even in the HBase data model, the key (rowkey + column family:qualifier
+ timestamp) should not be too long.


Maximum File Size

The trailer, file info, and all data block indexes (and optionally the meta
block indexes) are held in memory while writing and reading an HFile.
So a larger HFile (with more data blocks) requires more memory. For example,
a 1GB uncompressed HFile has about 15600 (1GB/64KB) data blocks,
and correspondingly about 15600 index entries. With an average key size of
64 bytes, we need about 1.2MB of RAM (15600 x 80 bytes) to hold these
indexes in memory.
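The arithmetic can be checked with a short sketch. The 80-byte figure is the article's rough per-entry estimate (an 8-byte offset, a 4-byte size, and a ~64-byte key plus overhead), not an exact measurement:

```java
public class IndexMemoryEstimate {
    static long blocks(long fileBytes, long blockBytes) {
        return fileBytes / blockBytes;          // number of data blocks
    }

    static long indexBytes(long blocks, long perEntryBytes) {
        return blocks * perEntryBytes;          // total in-memory index size
    }

    public static void main(String[] args) {
        long blocks = blocks(1_000_000_000L, 64_000L); // ~1GB file, ~64KB blocks
        System.out.println(blocks);                    // 15625, i.e. "about 15600"
        System.out.println(indexBytes(blocks, 80L));   // 1250000, i.e. ~1.2MB
    }
}
```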


Compression Algorithm

-   Compression reduces the number of bytes written to/read from HDFS.
-   Compression effectively improves the efficiency of network bandwidth
    and disk space
-   Compression reduces the size of data needed to be read when issuing
    a read


To keep friction low, a real-time compression library is preferred.
Currently, HFile supports the following three algorithms:
(1) NONE (default, uncompressed, string name "none")
(2) GZ (Gzip, string name "gz")
    Out of the box, HFile ships with only Gzip compression, which is fairly
    slow.
(3) LZO (Lempel-Ziv-Oberhumer, preferred, string name "lzo")
    To achieve maximal performance and benefit, you must enable LZO, a
    lossless data compression algorithm that is focused on decompression
    speed.


The following shows the format of an HFile.


Each key/value record in a data block is laid out as:

    KeyLen (int) | ValLen (int) | Key (byte[]) | Value (byte[])

From beginning to end, the file contains:

    Data Block 0..N      Each data block starts with DATA BLOCK MAGIC (8B),
                         followed by its key/value records, first to last.

    Meta Block 0..M      (Optional) User-defined metadata; each meta block
                         starts with METABLOCKMAGIC.

    File Info            Size or ItemsNum (int), followed by entries of the
                         form:
                         KeyLen (vint) | Key (byte[]) | id (1B) | ValLen (vint) | Val (byte[])
                         Predefined entries include LASTKEY (byte[]),
                         AVG_KEY_LEN (int), AVG_VALUE_LEN (int), and
                         COMPARATOR (className), plus user-defined entries.

    Data Index           INDEX BLOCK MAGIC (8B), then one entry per data
                         block:
                         Offset (long) | DataSize (int) | KeyLen (vint) | Key (byte[])

    Meta Index           (Optional) INDEX BLOCK MAGIC (8B), then one entry
                         per meta block:
                         Offset (long) | MetaSize (int) | MetaNameLen (vint) | MetaName (byte[])

    Trailer              The fixed file trailer, detailed below.

The fixed file trailer contains:

    TRAILER BLOCK MAGIC (8B)
    File Info Offset (long)
    Data Index Offset (long)
    Data Index Count (int)
    Meta Index Offset (long)
    Meta Index Count (int)
    Total Uncompressed Data Bytes (long)
    Entry Count, i.e. data K-V count (int)
    Compression Codec (int)
    Version (int)

    Total size of the trailer: 4 x long + 5 x int + 8 bytes of magic = 60 bytes



From beginning to end, an HFile is made up of the following segments:


-   Data Block segment
    Stores key/value pairs; may be compressed.
-   Meta Block segment (optional)
    Stores user-defined large metadata; may be compressed.
-   File Info segment
    Small metadata about the HFile, without compression. Users can add
    their own small metadata (name/value pairs) here.
-   Data Block Index segment
    Indexes the data blocks' offsets in the HFile. The key of each index
    entry is the key of the first key/value pair in the block.
-   Meta Block Index segment (optional)
    Indexes the meta blocks' offsets in the HFile. The key of each index
    entry is the user-defined unique name of the meta block.
-   Trailer
    The fixed-size metadata, holding the offset of each segment, etc. To
    read an HFile, we should always read the Trailer first.
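Because the trailer is fixed-size and read first, its serialization is easy to sketch. The following self-contained model writes the trailer fields in the order shown in the trailer figure; the "TRAILER!" magic string is a placeholder for illustration, not the literal bytes HBase writes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class TrailerSketch {
    // Serializes the trailer fields in the order shown in the trailer figure.
    static byte[] writeTrailer(long fileInfoOffset, long dataIndexOffset,
                               int dataIndexCount, long metaIndexOffset,
                               int metaIndexCount, long totalUncompressedBytes,
                               int entryCount, int compressionCodec, int version) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.write("TRAILER!".getBytes(StandardCharsets.US_ASCII)); // 8B magic (placeholder)
            out.writeLong(fileInfoOffset);
            out.writeLong(dataIndexOffset);
            out.writeInt(dataIndexCount);
            out.writeLong(metaIndexOffset);
            out.writeInt(metaIndexCount);
            out.writeLong(totalUncompressedBytes);
            out.writeInt(entryCount);
            out.writeInt(compressionCodec);
            out.writeInt(version);
            return bos.toByteArray();   // 8B magic + 4 longs + 5 ints = 60 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        byte[] trailer = writeTrailer(100L, 200L, 3, 300L, 1, 30_000L, 5000, 0, 1);
        System.out.println(trailer.length);  // 60
        // A reader seeks to (fileLength - 60) and parses these fields first.
    }
}
```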


The current implementation of HFile does not include a Bloom filter; one
should be added in the future.


The FileInfo is implemented as a SortedMap, so its fields are actually
ordered alphabetically by key.


3. LZO Compression

LZO has been removed from Hadoop and HBase 0.20+ because of GPL
restrictions. To enable it, we must first install the native libraries as
follows. [6][7][8][9]


(1) Download LZO from http://www.oberhumer.com/ and build it:
     # ./configure --build=x86_64-redhat-linux-gnu --enable-shared
     --disable-asm
     # make
     # make install
     The libraries are then installed in /usr/local/lib.
(2) Download the native connector library from
    http://code.google.com/p/hadoop-gpl-compression/ and build it.
    Copy hadoop-0.20.0-core.jar to ./lib, then:
    # ant compile-native
    # ant jar
(3) Copy the native library (build/native/Linux-amd64-64) and
    hadoop-gpl-compression-0.1.0-dev.jar to your application's lib
    directory. If your application is a MapReduce job, copy them to
    hadoop's lib directory. Your application should follow the
    $HADOOP_HOME/bin/hadoop script to ensure that the native hadoop
    library is on the library path via the system property
    -Djava.library.path=<path>. [9] For example:

     # setup 'java.library.path' for native-hadoop code if necessary
     JAVA_LIBRARY_PATH=''
     if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" ]; then
       JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`

       if [ -d "$HADOOP_HOME/build/native" ]; then
         JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
       fi

       if [ -d "${HADOOP_HOME}/lib/native" ]; then
         if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
           JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
         else
           JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
         fi
       fi
     fi



   Then our application and Hadoop/MapReduce jobs can use LZO.




4. Performance Evaluation

Testbed
   −   4 slaves + 1 master
   −   Machine: 4 CPU cores (2.0GHz), 2 x 500GB 7200RPM SATA disks, 8GB RAM
   −   Linux: RedHat 5.1 (2.6.18-53.el5), ext3, no RAID, noatime
   −   1Gbps network, all nodes under the same switch
   −   Hadoop-0.20.0 (1GB heap), lzo-2.0.3


Some MapReduce-based benchmarks were designed to evaluate the performance
of operations on HFiles, in parallel.
   −   Total key/value entries: 30,000,000.
   −   Key/value size: 1000 bytes (10 for the key, 990 for the value), i.e.
       30GB of data in total.
   −   Sequential key ranges: 60, i.e. each range has 500,000 entries.
   −   Default block size.
   −   Each entry value is a string in which every run of 8 consecutive
       bytes is filled with the same letter (A~Z).
       E.g. "BBBBBBBBXXXXXXXXGGGGGGGG……".
We set mapred.tasktracker.map.tasks.maximum=3 to avoid a client-side
bottleneck.
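The value layout described above can be generated with a short sketch; `makeValue` is a hypothetical helper for illustration, not part of the original benchmark code:

```java
import java.util.Random;

public class ValueGenerator {
    // Builds a value like "BBBBBBBBXXXXXXXXGGGGGGGG...": each run of 8 bytes
    // is a single random letter A..Z (the final run is truncated to fit the
    // requested length, since 990 is not a multiple of 8).
    static String makeValue(int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        while (sb.length() < length) {
            char letter = (char) ('A' + rnd.nextInt(26));
            for (int i = 0; i < 8 && sb.length() < length; i++) {
                sb.append(letter);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = makeValue(990, new Random());
        System.out.println(value.length());         // 990
        System.out.println(value.substring(0, 24)); // first three 8-letter runs
    }
}
```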


(1) Write
    Each MapTask handles one key range and writes a separate HFile with
    500,000 key/value entries.
(2) Full Scan
    Each MapTask scans a separate HFile from beginning to end.
(3) Random Seek
    Each MapTask opens one separate HFile and repeatedly seeks to a random
    key within that file. Each MapTask performs 50,000 (1/10 of the
    entries) random seeks.
(4) Random Short Scan
    Each MapTask opens one separate HFile and, starting from a random key
    within that file, scans 30 entries. Each MapTask runs 50,000 such
    scans, i.e. scans 50,000*30=1,500,000 entries.


This table shows the average number of entries written/sought/scanned per
second, per node.


  Operation           |  none   |  gz     |  lzo    |  SequenceFile (no compression)
  Write               |  20718  |  23885  |  55147  |  19789
  Full Scan           |  41436  |  94937  |  100000 |  28626
  Random Seek         |  600    |  989    |  956    |  N/A
  Random Short Scan   |  12241  |  25568  |  25655  |  N/A


In this evaluation, the compression ratio is about 7:1 for gz (Gzip) and
about 4:1 for lzo. Even though the compression ratio is only moderate,
the lzo column shows the best performance, especially for writes.


The full-scan performance is much better than SequenceFile's, so HFile
may provide better performance for MapReduce-based analytical
applications.


Random seeks in HFiles are slow, especially in non-compressed HFiles.
Still, the numbers above are already 6X~10X better than a raw disk seek
(~10ms): 600 seeks per second per node is about 1.7ms per seek, and 989
per second is about 1ms. Ganglia charts of load, CPU, and network
overhead accompanied these runs; the random short scan shows similar
behavior.




5. Implementation and API

5.1 HFile.Writer : How to create and write an HFile

(1) Constructors
There are five constructors. We suggest using the following two:


public Writer(FileSystem fs, Path path, int blocksize,
                String compress,
                final RawComparator<byte []> comparator)
public Writer(FileSystem fs, Path path, int blocksize,
                Compression.Algorithm compress,
                final RawComparator<byte []> comparator)


These two constructors are equivalent. They create the file (via
fs.create(…)) and obtain an FSDataOutputStream for writing. Since the
FSDataOutputStream is created when constructing the HFile.Writer, it is
automatically closed when the HFile.Writer is closed.


The other two constructors take an FSDataOutputStream as a parameter,
meaning the file is created and opened outside of the HFile.Writer; when
we close the HFile.Writer, the FSDataOutputStream is not closed. We do
not suggest using these two constructors directly.


public Writer(final FSDataOutputStream ostream, final int blocksize,
               final String compress,
               final RawComparator<byte []> c)
public Writer(final FSDataOutputStream ostream, final int blocksize,
               final Compression.Algorithm compress,
               final RawComparator<byte []> c)


A final constructor takes only fs and path as parameters; all other
attributes take their defaults, i.e. NONE compression, 64KB block size,
the raw ByteArrayComparator, etc.


(2) Write Key/Value pairs into HFile

Before key/value pairs are written into an HFile, the application must
sort them using the same comparator; i.e., all key/value pairs must be
appended to the HFile sequentially, in increasing key order. The
following methods write/append key/value pairs:


public void append(final KeyValue kv)
public void append(final byte [] key, final byte [] value)
public void append(final byte [] key, final int koffset, final int klength,
               final byte [] value, final int voffset, final int vlength)


When a key/value pair is added, the writer checks the current block size.
If it has reached the maximum block size, the current block is compressed
and written to the HFile's output stream, and a new block is created for
writing. Compression is applied per block: for each block, a compression
output stream is created at the beginning of the block and released when
the block finishes.
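This per-block buffer-flush-compress cycle can be modeled with a self-contained sketch. `BlockedWriterSketch` is illustrative, not the actual HFile.Writer; Gzip via java.util.zip stands in for whichever codec the file is configured with:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class BlockedWriterSketch {
    private final int blockSize;
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private ByteArrayOutputStream block = new ByteArrayOutputStream();
    int blocksWritten = 0;

    BlockedWriterSketch(int blockSize) { this.blockSize = blockSize; }

    // Start a new block once the current one has reached blockSize, then
    // append the record as KeyLen | ValLen | Key | Value.
    void append(byte[] key, byte[] value) {
        try {
            if (block.size() >= blockSize) flushBlock();
            DataOutputStream out = new DataOutputStream(block);
            out.writeInt(key.length);
            out.writeInt(value.length);
            out.write(key);
            out.write(value);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Each block is compressed as a unit: a codec stream is created for the
    // block and finished when the block ends.
    void flushBlock() {
        try {
            if (block.size() == 0) return;
            GZIPOutputStream gz = new GZIPOutputStream(file);
            block.writeTo(gz);
            gz.finish();
            block = new ByteArrayOutputStream();
            blocksWritten++;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        BlockedWriterSketch w = new BlockedWriterSketch(64);
        for (int i = 0; i < 10; i++)
            w.append(("key" + i).getBytes(), new byte[20]); // 32-byte records
        w.flushBlock();                                     // close out the last block
        System.out.println(w.blocksWritten);                // 5
    }
}
```

Note that, as in the text, a block may exceed the threshold by one record: the size check happens before an append, not after.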


The output streams are nested as follows, from the outermost layer down
to the file:


    DataOutputStream
      -> BufferedOutputStream
        -> FinishOnFlushCompressionStream
          -> Compression?OutputStream (for the specific codec)
            -> FSDataOutputStream (to the HFile)


Key/value appends are written at the outermost layer (the
DataOutputStream); the layers beneath handle buffering and compression,
and finally write to the file in the underlying file system.


Before a key/value pair is written, the following are checked:
-   the length of the key
-   the order of the key (it must be bigger than the last one)
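These two checks can be sketched as follows; `AppendChecks` is a hypothetical stand-in for the verification inside HFile.Writer's append, with a plain unsigned byte-wise comparison in place of the configured RawComparator:

```java
public class AppendChecks {
    static final int MAX_KEY_LEN = 64 * 1024;   // 64KB key limit, as in the text
    private byte[] lastKey = null;

    // The two append-time checks: key length, and strictly increasing key order.
    void checkThenRemember(byte[] key) {
        if (key.length > MAX_KEY_LEN) {
            throw new IllegalArgumentException("key too long: " + key.length);
        }
        if (lastKey != null && compare(lastKey, key) >= 0) {
            throw new IllegalArgumentException("key not bigger than the last one");
        }
        lastKey = key;
    }

    // Unsigned byte-wise comparison, standing in for the RawComparator.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        AppendChecks checks = new AppendChecks();
        checks.checkThenRemember("apple".getBytes());
        checks.checkThenRemember("banana".getBytes());  // ok: ascending order
        try {
            checks.checkThenRemember("aardvark".getBytes());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```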


(3) Add metadata into HFile

We can add metadata blocks to an HFile:
public void appendMetaBlock(String metaBlockName, byte [] bytes)


The application should provide a unique metaBlockName for each metadata
block within an HFile.


Note: if your metadata is large (e.g. 32KB uncompressed), you can use this
feature to add it as a separate meta block, which may be compressed in the
file.
If your metadata is very small (e.g. less than 1KB), use the following
method to append it to the file info instead; the file info is not
compressed.


public void appendFileInfo(final byte [] k, final byte [] v)


(4) Close

The file is not completely written until the HFile.Writer is closed. So
we must call close() to:
-   finish and flush the last block
-   write all meta blocks into the file (possibly compressed)
-   generate and write the file info metadata
-   write the data block indexes
-   write the meta block indexes
-   generate and write the trailer metadata
-   close the output stream


5.2 HFile.Reader: How to read HFile

Create an HFile.Reader to open an HFile; then we can seek, scan, and read
on it.


(1) Constructor

We suggest using the following constructor to create an HFile.Reader:

public Reader(FileSystem fs, Path path, BlockCache cache,
               boolean inMemory)


It calls fs.open(…) to open the file and gets an FSDataInputStream for
reading. The input stream is automatically closed when the HFile.Reader
is closed.

Another constructor takes an FSDataInputStream directly, meaning the
file is opened outside of the HFile.Reader:

public Reader(final FSDataInputStream fsdis, final long size,
               final BlockCache cache, final boolean inMemory)


We can use the BlockCache to improve read performance; the caching
mechanism will be described in another document.


(2) Load metadata and block indexes of an HFile

The HFile is not readable until loadFileInfo() is explicitly called.
It reads the metadata (trailer, file info) and the block indexes (data
block and meta block indexes) into memory, and reconstructs the
COMPARATOR instance from the file info.

BlockIndex

The important method of BlockIndex is:

int blockContainingKey(final byte[] key, int offset, int length)


It uses binarySearch to determine which block contains a given key. The
return value of binarySearch() can be puzzling:
  Data Block Index slot     before   0    1    2    3    4    5    6    …
  binarySearch() returns    -1       -2   -3   -4   -5   -6   -7   -8   …
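The convention follows Java's Arrays.binarySearch, which returns -(insertionPoint)-1 when the key is not an exact match; decoding that into "the block containing the key" can be sketched as:

```java
import java.util.Arrays;

public class BlockIndexDecode {
    // A negative return r means insertion point -(r+1); the containing block
    // is the one just before that point (-1 means "before block 0").
    static int blockContainingKey(String[] firstKeys, String key) {
        int r = Arrays.binarySearch(firstKeys, key);
        return r >= 0 ? r : -(r + 1) - 1;
    }

    public static void main(String[] args) {
        String[] firstKeys = { "b", "d", "f" };   // first keys of blocks 0, 1, 2
        System.out.println(blockContainingKey(firstKeys, "a")); // -1 (before block 0)
        System.out.println(blockContainingKey(firstKeys, "c")); //  0
        System.out.println(blockContainingKey(firstKeys, "d")); //  1 (exact first key)
        System.out.println(blockContainingKey(firstKeys, "g")); //  2
    }
}
```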


HFileScanner

We must create an HFile.Reader.Scanner to seek, scan, and read on an
HFile; HFile.Reader.Scanner is an implementation of the HFileScanner
interface.

To seek and scan in an HFile, we do the following:

(1) Create an HFile.Reader and call loadFileInfo().
(2) On this HFile.Reader, call getScanner() to obtain an HFileScanner.
(3) .1 To scan from the beginning of the HFile, call seekTo() to seek
   to the beginning of the first block.
   .2 To scan from a key, call seekTo(key) to seek to the position of
   the key, or just before it if there is no such key in this HFile.
   .3 To scan from just before a key, call seekBefore(key).
(4) Call next() to iterate over the key/value pairs; next() returns
   false at the end of the HFile. If an application wants to stop on
   some condition (e.g. at a special endKey), it must implement that
   itself.
(5) To look up a specific key, just call seekTo(key); a return value
   of 0 means the key was found.
(6) After seekTo(…) or next() positions us at a key, we can call the
   following methods to get the current key and value:
     public KeyValue getKeyValue() // recommended
     public ByteBuffer getKey()
     public ByteBuffer getValue()
(7) Don't forget to close the HFile.Reader. A scanner need not be
   closed, since it does not hold any resources.




References
[1]   Google, Bigtable: A Distributed Storage System for Structured Data,
      http://labs.google.com/papers/bigtable.html
[2]   HBase-0.20.0 Documentation,
      http://hadoop.apache.org/hbase/docs/r0.20.0/
[3]   HFile code review and refinement,
      http://issues.apache.org/jira/browse/HBASE-1818
[4]   MapFile API,
      http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
[5]   Parallel LZO: Splittable Compression for Hadoop,
      http://www.cloudera.com/blog/2009/06/24/parallel-lzo-splittable-compression-for-hadoop/
      http://blog.chrisgoffinet.com/2009/06/parallel-lzo-splittable-on-hadoop-using-cloudera/
[6]   Using LZO in Hadoop and HBase,
      http://wiki.apache.org/hadoop/UsingLzoCompression
[7]   LZO, http://www.oberhumer.com
[8]   Hadoop LZO native connector library,
      http://code.google.com/p/hadoop-gpl-compression/
[9]   Hadoop Native Libraries Guide,
      http://hadoop.apache.org/common/docs/r0.20.0/native_libraries.html




 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ libraryInterview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
 
PERFORMING AN EXPERIMENTAL PLATFORM TO OPTIMIZE DATA MULTIPLEXING
PERFORMING AN EXPERIMENTAL PLATFORM TO OPTIMIZE DATA MULTIPLEXINGPERFORMING AN EXPERIMENTAL PLATFORM TO OPTIMIZE DATA MULTIPLEXING
PERFORMING AN EXPERIMENTAL PLATFORM TO OPTIMIZE DATA MULTIPLEXING
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
 
Basics of building a blackfin application
Basics of building a blackfin applicationBasics of building a blackfin application
Basics of building a blackfin application
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 

More from Schubert Zhang

Blockchain in Action
Blockchain in ActionBlockchain in Action
Blockchain in Action
Schubert Zhang
 
科普区块链
科普区块链科普区块链
科普区块链
Schubert Zhang
 
Engineering Culture and Infrastructure
Engineering Culture and InfrastructureEngineering Culture and Infrastructure
Engineering Culture and Infrastructure
Schubert Zhang
 
Simple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluationSimple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluation
Schubert Zhang
 
Scrum Agile Development
Scrum Agile DevelopmentScrum Agile Development
Scrum Agile Development
Schubert Zhang
 
Career Advice
Career AdviceCareer Advice
Career Advice
Schubert Zhang
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
Schubert Zhang
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
Schubert Zhang
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
Schubert Zhang
 
Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算
Schubert Zhang
 
Big Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aBig Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223a
Schubert Zhang
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor Introduction
Schubert Zhang
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验
Schubert Zhang
 
Wild Thinking of BigdataBase
Wild Thinking of BigdataBaseWild Thinking of BigdataBase
Wild Thinking of BigdataBase
Schubert Zhang
 
RockStor - A Cloud Object System based on Hadoop
RockStor -  A Cloud Object System based on HadoopRockStor -  A Cloud Object System based on Hadoop
RockStor - A Cloud Object System based on Hadoop
Schubert Zhang
 
Fans of running gump
Fans of running gumpFans of running gump
Fans of running gump
Schubert Zhang
 
Hadoop compress-stream
Hadoop compress-streamHadoop compress-stream
Hadoop compress-stream
Schubert Zhang
 
Ganglia轻度使用指南
Ganglia轻度使用指南Ganglia轻度使用指南
Ganglia轻度使用指南
Schubert Zhang
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
Schubert Zhang
 
Big data and cloud
Big data and cloudBig data and cloud
Big data and cloud
Schubert Zhang
 

More from Schubert Zhang (20)

Blockchain in Action
Blockchain in ActionBlockchain in Action
Blockchain in Action
 
科普区块链
科普区块链科普区块链
科普区块链
 
Engineering Culture and Infrastructure
Engineering Culture and InfrastructureEngineering Culture and Infrastructure
Engineering Culture and Infrastructure
 
Simple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluationSimple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluation
 
Scrum Agile Development
Scrum Agile DevelopmentScrum Agile Development
Scrum Agile Development
 
Career Advice
Career AdviceCareer Advice
Career Advice
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算
 
Big Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aBig Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223a
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor Introduction
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验
 
Wild Thinking of BigdataBase
Wild Thinking of BigdataBaseWild Thinking of BigdataBase
Wild Thinking of BigdataBase
 
RockStor - A Cloud Object System based on Hadoop
RockStor -  A Cloud Object System based on HadoopRockStor -  A Cloud Object System based on Hadoop
RockStor - A Cloud Object System based on Hadoop
 
Fans of running gump
Fans of running gumpFans of running gump
Fans of running gump
 
Hadoop compress-stream
Hadoop compress-streamHadoop compress-stream
Hadoop compress-stream
 
Ganglia轻度使用指南
Ganglia轻度使用指南Ganglia轻度使用指南
Ganglia轻度使用指南
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
Big data and cloud
Big data and cloudBig data and cloud
Big data and cloud
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

this size. The 64KB default is the same as Google's [1].
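As a toy model of this cut-on-boundary rule, the following sketch shows why a block can exceed the configured size by at most one key/value pair: the size check happens before each append, not after. This is a simplified illustration, not the actual HBase HFile.Writer code; the record sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockCutSketch {
    // Returns the uncompressed size of each data block produced for the given
    // record sizes, starting a new block once the current one reaches blockSize.
    static List<Integer> blockSizes(int blockSize, int[] recordSizes) {
        List<Integer> blocks = new ArrayList<>();
        int current = 0;
        for (int r : recordSizes) {
            if (current >= blockSize) {   // checked before the append...
                blocks.add(current);
                current = 0;
            }
            current += r;  // ...so a block overshoots by at most one record
        }
        if (current > 0) blocks.add(current);
        return blocks;
    }

    public static void main(String[] args) {
        // 64KB limit, four 30KB records: the first block ends at 90KB,
        // i.e. it exceeds 64KB by less than one record.
        int[] records = {30_000, 30_000, 30_000, 30_000};
        System.out.println(blockSizes(64_000, records)); // prints [90000, 30000]
    }
}
```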
To achieve better performance, we should tune the block size. If the average key/value size is very small (e.g. 100 bytes), we should select smaller blocks (e.g. 16KB) to avoid putting too many key/value pairs in each block; too many pairs increase the latency of an in-block seek, because a seek always scans sequentially from the first key/value pair of the block.

Maximum Key Length

The key of each key/value pair is currently limited to 64KB. Usually, 10-100 bytes is a typical size for most of our applications. Even in the HBase data model, the key (rowkey + column family:qualifier + timestamp) should not be too long.

Maximum File Size

The trailer, file info and all data block indexes (and optionally the meta block indexes) are held in memory while writing and reading an HFile, so a larger HFile (with more data blocks) requires more memory. For example, a 1GB uncompressed HFile has about 15,600 (1GB/64KB) data blocks, and correspondingly about 15,600 index entries. Suppose the average key size is 64 bytes; then we need about 1.2MB of RAM (15,600 x 80) to hold these indexes in memory.

Compression Algorithm

- Compression reduces the number of bytes written to and read from HDFS.
- Compression effectively improves the efficiency of network bandwidth and disk space.
- Compression reduces the amount of data that must be read when issuing a read.

To keep friction low, a real-time compression library is preferred. Currently, HFile supports the following three algorithms:
(1) NONE (default, uncompressed, string name "none")
(2) GZ (Gzip, string name "gz"). Out of the box, HFile ships with only Gzip compression, which is fairly slow.
(3) LZO (Lempel-Ziv-Oberhumer, preferred, string name "lzo"). To achieve maximal performance and benefit, you must enable LZO, a lossless data compression algorithm focused on decompression speed.

Following figures show the format of an HFile.
Overall layout (reconstructed from the original figure):

Data Block 0..N (may be compressed)
    DATA BLOCK MAGIC (8B)
    Key-Value (first) ... Key-Value (last), each record laid out as:
        KeyLen (int) | ValLen (int) | Key (byte[]) | Value (byte[])
    (the figure also shows an inner key encoding:
        KeyLen (vint) | Key (byte[]) | id (1B) | ValLen (vint) | Val (byte[]))
Meta Block 0..M (optional, may be compressed)
    User defined metadata, starting with META BLOCK MAGIC
File Info
    Size or ItemsNum (int), then entries such as LASTKEY (byte[]),
    AVG_KEY_LEN (int), AVG_VALUE_LEN (int), COMPARATOR (class name),
    and user defined entries
Data Block Index
    INDEX BLOCK MAGIC (8B), then one entry per data block:
        Offset (long) | DataSize (int) | KeyLen (vint) | Key (byte[])
Meta Block Index (optional)
    INDEX BLOCK MAGIC (8B), then one entry per meta block:
        Offset (long) | MetaSize (int) | MetaNameLen (vint) | MetaName (byte[])
Fixed File Trailer
    (see the next figure)
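The byte layouts in the figure can be made concrete with a small sketch. Two simplifications are assumed here: the real HFile writes the index-entry KeyLen as a Hadoop vint while this sketch uses a plain 4-byte int, and no magic bytes are modeled. With 64-byte keys, one index entry comes out at about 80 bytes, which is where the 1.2MB estimate in the Maximum File Size section comes from.

```java
import java.nio.ByteBuffer;

public class HFileLayoutSketch {
    // One key/value record inside a data block:
    // KeyLen (int) | ValLen (int) | Key (byte[]) | Value (byte[])
    static byte[] encodeRecord(byte[] key, byte[] value) {
        ByteBuffer b = ByteBuffer.allocate(4 + 4 + key.length + value.length);
        b.putInt(key.length).putInt(value.length).put(key).put(value);
        return b.array();
    }

    // One data block index entry:
    // Offset (long) | DataSize (int) | KeyLen | Key (byte[])
    // (real HFile stores KeyLen as a vint; a plain int is used for simplicity)
    static byte[] encodeIndexEntry(long offset, int dataSize, byte[] firstKey) {
        ByteBuffer b = ByteBuffer.allocate(8 + 4 + 4 + firstKey.length);
        b.putLong(offset).putInt(dataSize).putInt(firstKey.length).put(firstKey);
        return b.array();
    }

    public static void main(String[] args) {
        byte[] rec = encodeRecord("row1".getBytes(), "val1".getBytes());
        System.out.println("record bytes: " + rec.length); // 4 + 4 + 4 + 4 = 16

        // Index RAM for a 1GB file with 64KB blocks and 64-byte keys:
        byte[] entry = encodeIndexEntry(0L, 64_000, new byte[64]); // 80 bytes
        long blocks = 1_000_000_000L / 64_000L;                    // ~15,600
        System.out.println("index RAM ~ " + blocks * entry.length + " bytes");
    }
}
```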
Fixed File Trailer layout:

    TRAILER BLOCK MAGIC (8B)
    File Info Offset (long)
    Data Index Offset (long)
    Data Index Count (int)
    Meta Index Offset (long)
    Meta Index Count (int)
    Total Uncompressed Data Bytes (long)
    Entry Count, i.e. data K-V count (int)
    Compression Codec (int)
    Version (int)

Total size of the trailer: 4 longs + 5 ints + 8 magic bytes = 60 bytes.

In the above figures, an HFile is separated into multiple segments. From beginning to end, they are:

- Data Block segment: stores key/value pairs; may be compressed.
- Meta Block segment (optional): stores user defined large metadata; may be compressed.
- File Info segment: small metadata about the HFile, never compressed. Users can add small user defined metadata (name/value pairs) here.
- Data Block Index segment: indexes the data block offsets in the HFile. The key of each index entry is the key of the first key/value pair in the block.
- Meta Block Index segment (optional): indexes the meta block offsets in the HFile. The key of each index entry is the user defined unique name of the meta block.
- Trailer: the fixed-size metadata, holding the offset of each segment, etc. To read an HFile, we should always read the Trailer first.

The current implementation of HFile does not include a Bloom filter; one should be added in the future.

The FileInfo is implemented as a SortedMap, so the actual order of its fields is alphabetical, based on the key.

3. LZO Compression

LZO is removed from Hadoop and HBase 0.20+ because of GPL restrictions. To enable it, we must first install the native libraries as follows. [6][7][8][9]

(1) Download LZO from http://www.oberhumer.com/, and build:

# ./configure --build=x86_64-redhat-linux-gnu --enable-shared --disable-asm
# make
# make install

The libraries are then installed in /usr/local/lib.

(2) Download the native connector library from http://code.google.com/p/hadoop-gpl-compression/, and build it. Copy hadoop-0.20.0-core.jar to ./lib.

# ant compile-native
# ant jar

(3) Copy the native library (build/native/Linux-amd64-64) and hadoop-gpl-compression-0.1.0-dev.jar to your application's lib directory. If your application is a MapReduce job, copy them to hadoop's lib directory instead. Your application should follow the $HADOOP_HOME/bin/hadoop script to ensure that the native hadoop library is on the library path via the system property -Djava.library.path=<path>. [9] For example:

# setup 'java.library.path' for native-hadoop code if necessary
JAVA_LIBRARY_PATH=''
if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" ]; then
  JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
  if [ -d "$HADOOP_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi
  if [ -d "${HADOOP_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi
fi

Then our applications and Hadoop/MapReduce jobs can use LZO.
4. Performance Evaluation

Testbed
- 4 slaves + 1 master
- Machine: 4 CPU cores (2.0GHz), 2 x 500GB 7200RPM SATA disks, 8GB RAM
- Linux: RedHat 5.1 (2.6.18-53.el5), ext3, no RAID, noatime
- 1Gbps network, all nodes under the same switch
- Hadoop-0.20.0 (1GB heap), lzo-2.0.3

Some MapReduce-based benchmarks are designed to evaluate the performance of operations on HFiles, in parallel.
- Total key/value entries: 30,000,000
- Key/value size: 1000 bytes (10 for the key, 990 for the value), i.e. 30GB of data in total
- Sequential key ranges: 60, i.e. each range has 500,000 entries
- Default block size
- Each entry value is a string in which every run of 8 consecutive bytes is filled with the same letter (A~Z), e.g. "BBBBBBBBXXXXXXXXGGGGGGGG......"

We set mapred.tasktracker.map.tasks.maximum=3 to avoid a client-side bottleneck.

(1) Write: each MapTask handles one key range and writes a separate HFile with 500,000 key/value entries.
(2) Full Scan: each MapTask scans a separate HFile from beginning to end.
(3) Random Seek a specified key: each MapTask opens one separate HFile and repeatedly selects a random key within that file to seek to. Each MapTask runs 50,000 (1/10 of the entries) random seeks.
(4) Random Short Scan: each MapTask opens one separate HFile, selects a random key within that file as a starting point, and scans 30 entries from there. Each MapTask runs 50,000 such scans, i.e. scans 50,000 x 30 = 1,500,000 entries.

This table shows the average number of entries written/seeked/scanned per second, per node:

Operation           none    gz      lzo     SequenceFile (no compression)
Write               20718   23885   55147   19789
Full Scan           41436   94937   100000  28626
Random Seek         600     989     956     N/A
Random Short Scan   12241   25568   25655   N/A
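Since every entry is 1000 bytes, the entries-per-second figures in the table convert directly to per-node user-data throughput; for example, the lzo write rate of 55,147 entries/s is about 55 MB/s per node. A trivial conversion sketch (1 MB taken as 1,000,000 bytes):

```java
public class ThroughputCheck {
    // Convert an entries/second rate into MB/second of user data.
    static double mbPerSec(long entriesPerSec, int entryBytes) {
        return entriesPerSec * (double) entryBytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println(mbPerSec(55_147, 1000));  // lzo write: ~55.1 MB/s/node
        System.out.println(mbPerSec(100_000, 1000)); // lzo full scan: ~100 MB/s/node
    }
}
```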
In this evaluation, the compression ratio is about 7:1 for gz (Gzip) and about 4:1 for lzo. Even though the compression ratio is only moderate, the lzo column shows the best performance, especially for writes.

Full scan performance is much better than SequenceFile's, so HFile may provide better performance for MapReduce-based analytical applications.

Random seeks in HFiles are slow, especially in uncompressed HFiles. Still, the numbers above already show 6X~10X better performance than a raw disk seek (10ms).

Following Ganglia charts show us the overhead of load, CPU, and network. The random short scan exhibits similar phenomena.

5. Implementation and API

5.1 HFile.Writer: how to create and write an HFile

(1) Constructors

There are 5 constructors. We suggest using the following two:

public Writer(FileSystem fs, Path path, int blocksize,
    String compress, final RawComparator<byte []> comparator)

public Writer(FileSystem fs, Path path, int blocksize,
    Compression.Algorithm compress, final RawComparator<byte []> comparator)

These two constructors are equivalent. They create the file (calling fs.create(...)) and obtain an FSDataOutputStream for writing. Since the FSDataOutputStream is
created when constructing the HFile.Writer, it will be automatically closed when the HFile.Writer is closed.

Two other constructors take an FSDataOutputStream as a parameter. This means the file is created and opened outside of the HFile.Writer, so when we close the HFile.Writer, the FSDataOutputStream will not be closed. We do not suggest using these two constructors directly:

public Writer(final FSDataOutputStream ostream, final int blocksize,
    final String compress, final RawComparator<byte []> c)

public Writer(final FSDataOutputStream ostream, final int blocksize,
    final Compression.Algorithm compress, final RawComparator<byte []> c)

The remaining constructor takes only fs and path as parameters; all other attributes are defaults, i.e. no compression, 64KB block size, raw ByteArrayComparator, etc.

(2) Writing key/value pairs into an HFile

Before key/value pairs are written into an HFile, the application must sort them using the same comparator, i.e. all key/value pairs must be written/appended to an HFile sequentially in increasing key order. The following methods write/append key/value pairs:

public void append(final KeyValue kv)

public void append(final byte [] key, final byte [] value)

public void append(final byte [] key, final int koffset, final int klength,
    final byte [] value, final int voffset, final int vlength)

When a key/value pair is added, the current block size is checked. If it has reached the maximum size of a block, the current block is compressed and written to the output stream (of the HFile), and a new block is created for writing. Compression is per block: for each block, an output stream for compression is created at the beginning of the block and released when the block is finished. The following chart shows the relationship of the output streams in the OO design:
DataOutputStream
  -> BufferedOutputStream
    -> FinishOnFlushCompressionStream
      -> Compression?OutputStream (for the different codecs)
        -> FSDataOutputStream (to an HFile)

The key/value appending operation writes from the outside (DataOutputStream), and the above OO mechanism handles buffering and compression before writing to the file in the underlying file system.

Before a key/value pair is written, the following are checked:
- the length of the key
- the order of the key (it must be bigger than the last one)

(3) Adding metadata to an HFile

We can add meta blocks to an HFile:

public void appendMetaBlock(String metaBlockName, byte [] bytes)

The application should provide a unique metaBlockName for each metadata block within an HFile.

Note: if your metadata is large (e.g. 32KB uncompressed), use this feature to add a separate meta block; it may be compressed in the file. But if your metadata is very small (e.g. less than 1KB), use the following method to append it to the file info instead. File info is never compressed.

public void appendFileInfo(final byte [] k, final byte [] v)

(4) Close

Before the HFile.Writer is closed, the file is not completely written. So we must call close() to:
- finish and flush the last block
- write all meta blocks into the file (they may be compressed)
- generate and write the file info metadata
- write the data block index
- write the meta block index
- generate and write the trailer metadata
- close the output stream

5.2 HFile.Reader: How to read an HFile

Create an HFile.Reader to open an HFile; then we can seek, scan, and read
on it.

(1) Constructor

We suggest using the following constructor to create an HFile.Reader:

  public Reader(FileSystem fs, Path path, BlockCache cache, boolean inMemory)

It calls fs.open(…) to open the file and obtains an FSDataInputStream for
reading. The input stream is automatically closed when the HFile.Reader is
closed.

Another constructor takes an FSDataInputStream as a parameter directly,
which means the file is opened outside of the HFile.Reader:

  public Reader(final FSDataInputStream fsdis, final long size,
      final BlockCache cache, final boolean inMemory)

We can use a BlockCache to improve read performance; the caching mechanism
will be described in another document.

(2) Load metadata and block indexes of an HFile

The HFile is not readable before loadFileInfo() is explicitly called. It
reads the metadata (Trailer, File Info) and the block indexes (data block
and meta block) into memory, and the COMPARATOR instance is reconstructed
from the file info.

BlockIndex

The important method of BlockIndex is:

  int blockContainingKey(final byte[] key, int offset, int length)
It uses binarySearch over the first key of each block to determine which
block contains a given key. The return value of binarySearch() is puzzling
at first:

  Data block index:        Before   0    1    2    3    4    5    6   …
  binarySearch() return:     -1    -2   -3   -4   -5   -6   -7   -8   …

That is, a negative return value -(insertionPoint + 1) identifies the block
just before the insertion point, and -1 means the key sorts before the
first block of the file.

HFileScanner

We must create an HFile.Reader.Scanner to seek, scan, and read on an HFile.
HFile.Reader.Scanner is an implementation of the HFileScanner interface. To
seek and scan in an HFile, we do the following:

(1) Create an HFile.Reader and call loadFileInfo().
(2) On this HFile.Reader, call getScanner() to obtain an HFileScanner.
(3) a. To scan from the beginning of the HFile, call seekTo() to seek to
       the beginning of the first block.
    b. To scan from a key, call seekTo(key) to seek to the position of the
       key, or to just before the key if there is no such key in this HFile.
    c. To scan from just before a key, call seekBefore(key).
(4) Call next() to iterate over all key/value pairs. next() returns false
    when it reaches the end of the HFile. If an application wants to stop
    on some other condition (e.g. at a special endKey), it must implement
    that itself.
(5) To look up a specified key, just call seekTo(key); a return value of 0
    means the key was found.
(6) After seekTo(…) or next() positions the scanner at a key, the following
    methods return the current key and value:

  public KeyValue getKeyValue() // recommended
  public ByteBuffer getKey()
  public ByteBuffer getValue()

(7) Don't forget to close the HFile.Reader. A scanner need not be closed,
    since it does not hold any resources.
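The negative return values above can be decoded mechanically. The following
is a minimal sketch of that decoding, not the real BlockIndex code: it uses
int keys and Java's standard Arrays.binarySearch in place of HBase's
comparator-based search over byte[] first keys, but the arithmetic is the
same.

```java
import java.util.Arrays;

public class BlockIndexLookup {
    // firstKeys[i] holds the first key of block i (must be sorted).
    // Returns the index of the block that would contain the key,
    // or -1 if the key sorts before the first block of the file.
    static int blockContainingKey(int[] firstKeys, int key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        if (pos >= 0) {
            return pos; // exact hit: the key is the first key of block pos
        }
        // A miss returns -(insertionPoint + 1); the containing block is
        // the one just before the insertion point. So: -1 -> "before",
        // -2 -> block 0, -3 -> block 1, and so on.
        int insertionPoint = -(pos + 1);
        return insertionPoint - 1;
    }

    public static void main(String[] args) {
        int[] firstKeys = {10, 20, 30};
        System.out.println(blockContainingKey(firstKeys, 5));  // before the file
        System.out.println(blockContainingKey(firstKeys, 15)); // inside block 0
        System.out.println(blockContainingKey(firstKeys, 20)); // first key of block 1
        System.out.println(blockContainingKey(firstKeys, 35)); // inside the last block
    }
}
```

A key larger than the last block's first key always maps to the last block,
which the scanner then searches linearly for the exact position.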
References

[1] Google, Bigtable: A Distributed Storage System for Structured Data,
    http://labs.google.com/papers/bigtable.html
[2] HBase-0.20.0 Documentation,
    http://hadoop.apache.org/hbase/docs/r0.20.0/
[3] HFile code review and refinement,
    http://issues.apache.org/jira/browse/HBASE-1818
[4] MapFile API,
    http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
[5] Parallel LZO: Splittable Compression for Hadoop,
    http://www.cloudera.com/blog/2009/06/24/parallel-lzo-splittable-compression-for-hadoop/
    http://blog.chrisgoffinet.com/2009/06/parallel-lzo-splittable-on-hadoop-using-cloudera/
[6] Using LZO in Hadoop and HBase,
    http://wiki.apache.org/hadoop/UsingLzoCompression
[7] LZO, http://www.oberhumer.com
[8] Hadoop LZO native connector library,
    http://code.google.com/p/hadoop-gpl-compression/
[9] Hadoop Native Libraries Guide,
    http://hadoop.apache.org/common/docs/r0.20.0/native_libraries.html