Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based FPGA Accelerators


Published on

In Big Data field, Spark SQL is important data processing module for Apache Spark to work with structured row-based data in a majority of operators. Field-programmable gate array(FPGA) with highly customized intellectual property(IP) can not only bring better performance but also lower power consumption to accelerate CPU-intensive segments for an application.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based FPGA Accelerators

  1. 1. Accelerating SPARK SQL to 50X Performance with Apache Arrow based FPGA accelerators Calvin Hung Wasai Technology, Inc. Weiting Chen Intel Corp.
  2. 2. AGENDA Background Problem Definition Solutions Performance Summary Q&A
  3. 3. ABOUT US Calvin Hung @wasaitech Calvin is the co-founder, CEO and CTO of WASAI Technology, which is specialized in FPGA-based datacenter accelerations for Apache Spark, Apache Hadoop and Genomics Analysis applications. He has more than 15 years of experience in software and hardware architecture co-design and performance optimization. Weiting Chen(William) @intel Weiting is a senior software engineer at Intel Software. He has worked for Big Data and Cloud Solutions including Spark, Hadoop, OpenStack, and Kubernetes for more than 6 years.
  4. 4. MOTIVATION ▪ CPU support SIMD instructions such as SSE, AVX2, AVX512, …etc. We would like to unleash the power in SPARK. ▪ Many accelerators such as FPGA, GPU, ASIC, …etc in the world can help CPU to offload functions and speed up the performance.
  5. 5. PROBLEM DEFINITION ▪ How to avoid row and columnar convert overhead during data processing? ▪ How SPARK can leverage AVX support? ▪ How to coordinate the accelerators (e.g. FPGA, GPU, …etc) to work with CPU in SPARK3.0? ▪ How FPGA can help to speed up SPARK? ▪ How to minimize data copy and serialization overhead when copying from host to device? ▪ How to enhance the performance during DMA transfer?
  6. 6. SOLUTION: SPARK + ARROW + FPGA A better way to run SPARK with AVX and accelerators support - Apache Arrow - SPARK 3.0 New Features - OAP Native SQL Engine (Intel) - FPGA Accelerators (WASAI)
  7. 7. THE GOALS End-to-End Columnar-to-Columnar Data Processing: To avoid columnar-to-row or row-to-columnar overhead when processing data. Support AVX(via OAP Native SQL): Columnar based Reader -> Columnar based Data Processing(w/ AVX) -> Columnar based Writer Result With FPGA Integration and acceleration: Columnar based Reader -> Columnar based Data Processing(w/ AVX) -> Columnar based Data Copy to Device -> Columnar based Data Processing(on FPGA) -> Columnar
  8. 8. APACHE ARROW • Each system has its own internal memory format • 70-80% computation wasted on serialization and deserialization • Similar functionality implemented in multiple projects • All systems utilize the same memory format • No overhead for cross-system communication • Projects can share functionality Reference:
  9. 9. NEW FEATURES in SPARK3.0 SPARK-27396 Public APIs for extended Columnar Processing Support ▪ An interface to extend columnar processing API ▪ Provide an opportunity to create a custom API for columnar data processing with OAP Native SQL Engine and FPGA support ▪ Advanced user can define a new interface to communication with accelerators such as GPU or FPGA SPARK-24615 Accelerator-aware task scheduling for SPARK ▪ An interface for SPARK to allocate accelerators in task level ▪ Make SPARK task to be aware accelerators such as GPU, FPGA, …etc ▪ Currently only support GPU ▪ FPGA can be supported in the same way (vendor specific)
  10. 10. OAP Native SQL Engine Plugin Intel Optimized Analytics Package(OAP): Native SQL Engine An End-to-End SPARK Columnar based data processing with Intel AVX support Apache Arrow Arrow Data Source Arrow Data Processing Intel CPU Other Accelerators (FPGA, GPU, …) Columnar Shuffle SPARK SQL SPARK Catalyst
  11. 11. FPGA Acceleration END TO END, COLUMNAR TO COLUMNAR DATA PROCESSING SPARK3.0 FileScan FileWriter Parquet Writer Parquet Reader ColumnarVector ColumnarVector Columnar-to-Row Row-to-Columnar InternalRow Whole Stage Codegen Row based Operator Row based Operator InternalRow InternalRow ColumnarVector SPARK3.0 + OAP Native SQL Engine FileScan FileWriter Parquet Writer Parquet Reader Arrow Arrow Columnar based Operator Arrow Columnar based Operators Arrow FPGA Templates Arrow FPGA Operators Aggregation/GroupBy/… Arrow
  12. 12. OAP NATIVE SQL ENGINE HIGHLIGHT ▪ An Open Source Columnar based Data Processing for SPARK ▪ Apache Arrow based Solution ▪ Enable AVX Support with SIMD instruction acceleration ▪ Leverage SPARK3.0 Support ▪ Communicate with 3rd Party Accelerators ▪ Support Data Source Parsing, SQL Operators, and Columnar Shuffle ▪ Common SQL Operators Support such as filter, join, groupby, aggregate, …etc.
  13. 13. SPARK + FPGA + ARROW Why SPARK SQL + FPGA - SPARK SQL essentially processes structured row-based dataset at once with single query of a bunch of SQL operators. The operators can be simple while the dataset could be extremely large. - FPGA with highly specialized IPs can deal with such multiple-instruction, single- dataset analysis faster, more power and resource-efficiently than CPU and GPU under the same total-cost-of-ownership. Why Arrow - In order to offload SPARK SQL workload from Java runtime to FPGA, leveraging a new WholeStageCodegen to invoke native function calls to process data with FPGA can be messy. Apache Arrow can hold Columnar Batch data inside native memory and manage its memory reference inside Spark.
  14. 14. SPARK SQL FPGA ACCELERATION ▪ SQL Operators (Aggregation, GroupBy, Filter, Sort, Join, …etc) ▪ Using Apache Arrow for data transfer between Java runtime and FPGA to reduce data traffic ▪ Next step will be leveraging Arrow::RecordBatch
  15. 15. SPARK(CPU ONLY) SQL RDD Dataset DataFrame Catalyst Optimization Code Generation RDD Dataset DataFrame Code Execution Spark Executor Data Processed By CPU Input
  16. 16. SPARK(CPU + FPGA) SQL RDD Dataset DataFrame Catalyst Optimization Code Generation RDD Dataset DataFrame Code Execution Spark Executor 1. Get Mem Page for Input / Output 5. Free Memory Pages FPGA Java Wrapper 3. Start FPGA Engine Data Block 2. Fill the Input Data Block Adaptor / JNI / Driver Array[Byte] 4. Iterating the data by Spark API (Schema Specified) DMA Engine Aggregation Engine GroupBy EngineFPGA Input Data Processed By FPGA Re-generate Physical Plan for FPGA
  17. 17. SPARK SQL + ARROW PERFORMANCE ▪ A simple query with 300GB dataset from TPC-DS Q55 ▪ With Apache Arrow, performance boost can be up to 33% and CPU is obviously offloaded. SELECT ss_sold_date_sk, sum(ss_ext_sales_price) FROM store_sales WHERE ss_item_sk = 3175 GROUP BY ss_sold_date_sk 0 2 4 6 8 10 12 14 32 90 300 Minutes Arrow-boosted Original Intel® Xeon® Gold 6120 CPU x2 DDR4 256GB Intel PAC Arria10 x1 (GB)
  18. 18. SYSTEM STACK Storage Storage/Data Format JSON Parquet Distributed Execution Big Data Cores MapReduce Spark SQL Engine Spark RDD/DFOS OS Core System CentOS RHEL Ubuntu FPGA Accelerator Accelerators MapReduce Accelerator Spark SQL Accelerator Spark RDD/DF Accelerator Data Decoder WASAI System Lib & Drivers WASAI IOBooster WASAI EvoCores Compressor
  19. 19. OTHER ACCELERATORS Solution Description Workloads Result Spark RDD groupByKey, foldByKey, etc microbench 80%~3x performance boost. Shuffle size 90% to 99% reduction. General Sort General TimSort for both Hadoop & Spark TeraSort microbench 20% performance boost Compression Compression encoding/decoding microbench Ongoing Erasure Coding EC codec microbench Reach maximum throughput of PCIe Input Format Parsing JSON, CSV, Parquet format parser microbench 2X~7.8X performance boost Intel® Xeon® Gold 6120 CPU x2 DDR4 256GB Intel PAC Arria10 x1
  20. 20. KEY TAKEAWAYS ▪ End-to-End Columnar Data Processing can optimize the performance in CPU, FPGA and other Accelerators in native layer. ▪ FPGA can help to accelerate SPARK in many cases that involved heavy CPU-intensive operations. ▪ Last but not least, with SPARK3.0 support, many new opportunities can be done in the future.
  21. 21. NEXT … ▪ More features in OAP Native SQL Engine ▪ OAP Native SQL Engine + FPGA integration OAP Native SQL Engine Plugin Apache Arrow Arrow Data Source Arrow Data Processing Intel CPU WASAI FPGA Accelerators Columnar Shuffle WASAI CodeGen
  22. 22. CALL TO ACTION We encourage you to try OAP Native SQL Engine for SPARK in Wasai SPARK SQL + FPGA Solution Please contact Intel: Wasai:
  23. 23. Q & A
  24. 24. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.