Since its creation, HDFS has relied on data replication to protect against most failure scenarios. However, with the explosive growth in data volume, replication is becoming expensive: the default 3x replication scheme incurs a 200% overhead in storage space and in other resources (e.g., network bandwidth when writing the data). Erasure coding (EC) uses far less storage space while still providing the same level of fault tolerance. Under typical configurations, EC reduces the storage cost by ~50% compared with 3x replication.
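The overhead figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a Reed-Solomon layout with 6 data units and 3 parity units as the "typical configuration" (the text does not name one), and the function names are hypothetical helpers, not part of any HDFS API.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Extra storage per unit of user data when keeping `replicas` full copies."""
    return Fraction(replicas - 1, 1)


def ec_overhead(data_units: int, parity_units: int) -> Fraction:
    """Extra storage per unit of user data for a (data, parity) erasure code."""
    return Fraction(parity_units, data_units)


def storage_saving(replicas: int, data_units: int, parity_units: int) -> Fraction:
    """Fraction of total raw storage saved by EC relative to replication."""
    replicated_total = 1 + replication_overhead(replicas)
    ec_total = 1 + ec_overhead(data_units, parity_units)
    return 1 - ec_total / replicated_total


# Assumed layout: 6 data units + 3 parity units (an assumption, not from the text).
print(f"3x replication overhead: {float(replication_overhead(3)):.0%}")
print(f"EC (6 data, 3 parity) overhead: {float(ec_overhead(6, 3)):.0%}")
print(f"Raw storage saved by EC: {float(storage_saving(3, 6, 3)):.0%}")
```

Under this assumed layout, each unit of user data occupies 1.5 units of raw storage instead of 3, which is where the ~50% reduction comes from. Other data/parity ratios would give different savings.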