Once the staple of HPC clusters, today high-performance network and storage devices are everywhere. For a fraction of the cost, one can rent 40/100 Gbps RDMA networks and high-end NVMe flash devices supporting 10s GB/s bandwidths, less than 100 microseconds of latencies, with millions of IOPS. How does one leverage this phenomenal performance for popular data processing frameworks such as Apache Spark, Flink, Hadoop that we all know and love?
In this talk, I will introduce the Apache Crail (Incubating), which is a fast, distributed data store that is designed specifically for high-performance network and storage devices. The goal of the project is to deliver the true hardware performance to Apache data processing frameworks in the most accessible way. With its modular design, Crail supports multiple storage back ends (DRAM, NVMe Flash, and 3D XPoint) and networking protocols (RDMA and TPC/sockets). Crail provides multiple flexible APIs (file system, KV, HDFS, streaming) for a better integration with the high-level data access operations in Apache compute frameworks. As a result, on a 100 Gbps network infrastructure, Crail delivers all-to-all shuffle operations at 80+ Gbps speed, broadcast operations at less than 10 usec latencies, and more than 8M lookups/namenode, etc. Moreover, Crail is a generic solution that integrates well with the Apache ecosystem including frameworks like Spark, Hadoop, Hive, etc.
I will present the case for Crail, its current status, and future plans. As Crail is a young Apache project, we are seeking to build a community and expand its application to other interesting domains.
Animesh Trivedi, IBM Research, Research Staff Member (RSM)