Powering Custom Apps at Facebook using Spark Script Transformation

Script Transformation is an important and growing use-case for Apache Spark at Facebook. Spark’s script transforms allow users to run custom scripts and binaries directly from SQL and serve as an important means of stitching Facebook’s custom business logic into existing data pipelines.

Along with Spark SQL + UDFs, a growing number of our custom pipelines leverage Spark’s script transform operator to run user-provided binaries for applications such as indexing, parallel training, and inference at scale. Spawning custom processes from the Spark executors introduces new production challenges, ranging from external resource allocation and management to structured data serialization and external process monitoring.

In this session, we will talk about the improvements to Spark SQL (and the resource manager) that support running reliable and performant script transformation pipelines. These include:
1) cgroup v2 containers for CPU, memory, and IO enforcement,
2) a transform jail for process namespace management,
3) support for complex types in the Row Format Delimited SerDe,
4) Protocol Buffers for fast and efficient structured data serialization.
Finally, we will conclude by sharing our results, lessons learned, and future directions (e.g., transform pipeline resource over-subscription).

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Powering Custom Apps at Facebook using Spark Script Transformation. Abdulrahman Alfozan, Spark Summit Europe
  3. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  4. Spark at Facebook: 2015 small-scale experiments; 2016 a few pipelines in production; 2017 running 60TB+ shuffle pipelines; 2018 full-production deployment, successor to Apache Hive at Facebook; 2019 scaling Spark, the largest compute engine at Facebook by CPU. Reliability and efficiency are our top priority.
  5. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  6. Script Transforms, SQL query: SELECT TRANSFORM (inputs) USING 'script' AS (outputs) FROM src_tbl;
  7. Script Transforms, query plan: the query compiles to a ScriptTransformation (inputs, script, outputs) operator over a TableScan (src_tbl).
  8. Script Transforms, execution: the Spark task reads the input table, feeds inputs to an external process, and writes the outputs it reads back to the output table.
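The 'script' above can be any executable that reads rows on stdin and writes rows on stdout. As a minimal, illustrative sketch (not code from the deck; the file name and column layout are assumptions), a Python transformer that upper-cases the second tab-delimited column could look like:

```python
import sys

def transform(line: str) -> str:
    # Rows arrive one per line; with a delimited row format the
    # columns are tab-separated. Upper-case the second column.
    fields = line.rstrip("\n").split("\t")
    fields[1] = fields[1].upper()
    return "\t".join(fields)

def run(stream=sys.stdin) -> None:
    # Spark pipes input rows into the child's stdin and reads
    # output rows back from its stdout, one line per row.
    for line in stream:
        print(transform(line))
```

Such a script would be wired in as USING 'python3 upper.py' (a hypothetical file name), with the AS (...) clause matching the columns the script emits.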
  9. Why Script Transforms? 1. Flexibility: unlike UDFs, transforms allow unlimited use-cases. 2. Efficiency: most transformers are written in C++.
  10. Why Script Transforms? Transforms provide custom data processing while relying on Spark for ETL, data partitioning, distributed execution, and fault-tolerance.
  11. Why Script Transforms? e.g., Spark is optimized for ETL; PyTorch is optimized for model serving.
  12. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  13. Transform Pipelines Usage: % of overall CPU (chart, y-axis 0–15%).
  14. Transform Pipelines Usage, query count vs. CPU comparison: Pure SQL 54% of queries / 72% of CPU; Transforms & UDFs 45% / 20%; DataFrames 1% / 8%.
  15. Use-case 1: Batch Inference. Transform resources and SQL query: ADD FILES inference_engine, model.md; SELECT TRANSFORM (id INT, metadata STRING, image STRING) ROW FORMAT SERDE 'JSONSimpleSerDe' USING 'inference_engine --model=model.md' AS labels MAP<STRING, DOUBLE> ROW FORMAT SERDE 'JSONSimpleSerDe' FROM tbl_images; Output: category → confidence.
  16. Use-case 1: Batch Inference. Transform main.cpp (row iterator from the transform lib): #include "spark/Transformer.h" ... while (transformer.readRow(input)) { auto prediction = predict(input); transformer.writeRow(prediction); }
  17. Use-case 1: Batch Inference, execution: the Spark task serializes each InternalRow to JSON on stdin ({id: 1, metadata: ..., image: ...}); the self-contained executable (PyTorch runtime container plus model) deserializes the JSON into C++ objects, runs inference, and serializes the output map back to JSON on stdout ({label_1: score, label_2: score}), which Spark deserializes into an InternalRow.
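The JSON round trip on the execution slide can be sketched in a few lines of Python; predict here is a hypothetical stand-in for real model inference, and the field and label names are illustrative, not from the deck:

```python
import json

def predict(record: dict) -> dict:
    # Stand-in for model inference: a score derived from the
    # image payload size, keyed by an illustrative label name.
    return {"label_demo": float(len(record.get("image", "")))}

def transform_line(line: str) -> str:
    # One JSON object per line in, one JSON object per line out,
    # mirroring the JSON SerDe framing described above.
    record = json.loads(line)
    return json.dumps(predict(record))
```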
  18. Use-case 2: Batch Indexing. Transform resources and SQL query: ADD FILES indexer; SELECT TRANSFORM (shard_id INT, data STRING) ROW FORMAT SERDE 'RowFormatDelimited' USING 'indexer --schema=data<STRING>' FROM src_tbl CLUSTER BY shard_id; (CLUSTER BY is the partition operator.)
  19. Use-case 2: Batch Indexing, execution: mappers shuffle rows by shard_id so that each reducer receives a single shard's rows; each reducer's Spark task then pipes its rows over stdin into an indexer transform process.
  20. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  21. Core Engine Improvements, operator (ScriptTransformationExec.scala): direct process invocation; class IOSchema to handle SerDe schema and config; MonitorThread to track transform process progress; transform process error handling and surfacing.
  22. Core Engine Improvements, SerDe support: DelimitedJSONSerDe.scala, JSON format standard RFC 8259.
  23. Core Engine Improvements, SerDe support: SimpleSerDe.scala, ROW FORMAT DELIMITED with configurable properties: FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':' LINES TERMINATED BY '\n'.
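To make the delimiter properties concrete, here is a simplified sketch of how one row with a scalar, an array, and a map column would be encoded under the delimiters above. The helper name is mine, and a real SerDe also handles escaping, nested types, and NULLs:

```python
def encode_row(fields) -> str:
    # FIELDS TERMINATED BY ',', COLLECTION ITEMS TERMINATED BY '|',
    # MAP KEYS TERMINATED BY ':', LINES TERMINATED BY '\n'.
    parts = []
    for field in fields:
        if isinstance(field, dict):
            # Map: items joined by '|', key and value joined by ':'.
            parts.append("|".join(f"{k}:{v}" for k, v in field.items()))
        elif isinstance(field, (list, tuple)):
            # Array: items joined by '|'.
            parts.append("|".join(str(item) for item in field))
        else:
            parts.append(str(field))
    return ",".join(parts) + "\n"
```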
  24. Core Engine Improvements, SerDe support: text-based (development): DelimitedJSONSerDe.scala, SimpleSerDe.scala; binary (production): ?
  25. Production SerDe requirements: binary format; text-based encoding is slow and less compact.
  26. Production SerDe requirements: binary format; zero-copy, i.e., access to serialized data without parsing or unpacking (cf. "Improving Facebook's performance on Android with FlatBuffers").
  27. Production SerDe requirements: binary format; zero-copy; word-aligned data, allowing for SIMD optimizations.
  28. Binary SerDe considerations: LazyBinarySerDe (Apache Hive): neither zero-copy nor word-aligned, requires converters in Spark; Protocol Buffers / Thrift: not zero-copy, more suited to RPC; FlatBuffers / Cap'n Proto: require converters (to/from InternalRow) in Spark core; Apache Arrow: great future option.
  29. Binary SerDe considerations, chosen format: UnsafeRow: binary and word-aligned, zero-copy, already part of Spark core, with available converters to/from InternalRow.
  30. UnsafeRow SerDe: SPARK-7076 introduced the UnsafeRow format to Spark (apache/spark/sql/catalyst/expressions/UnsafeRow.java).
  31. UnsafeRow SerDe: SPARK-15962 introduced UnsafeArrayData and UnsafeMapData (apache/spark/sql/catalyst/expressions/UnsafeArrayData.java, apache/spark/sql/catalyst/expressions/UnsafeMapData.java).
  32. UnsafeRow SerDe C++ library, SQL to C++ datatype mapping: INT → int32_t; BIGINT → int64_t; BOOLEAN → bool; FLOAT → float; DOUBLE → double; STRING → unsaferow::String; ARRAY<INT> → unsaferow::List<int32_t>; MAP<INT,STRING> → unsaferow::Map<int32_t, unsaferow::String>.
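A simplified model of why UnsafeRow satisfies the requirements: its fixed-width part is an 8-byte-aligned null bitset followed by one 8-byte word per field, so any field can be read at a computed offset without decoding its neighbors. The sketch below covers only fixed-width fields (the real format also appends variable-length data after the words and stores offset/size pairs for it), so treat it as illustrative:

```python
import struct

def encode_unsaferow_ints(values):
    # Fixed-width UnsafeRow layout (simplified): one 8-byte word of
    # null bits per 64 fields, then one 8-byte word per field.
    num_fields = len(values)
    bitset_words = (num_fields + 63) // 64
    null_bits = 0
    words = []
    for ordinal, value in enumerate(values):
        if value is None:
            null_bits |= 1 << ordinal  # mark the field null
            words.append(0)
        else:
            words.append(value)
    return struct.pack(f"<{bitset_words + num_fields}q", null_bits, *words)

def get_long(buf, ordinal, num_fields=2):
    # Zero-copy style access: the word offset is computed directly;
    # no preceding fields need to be parsed or unpacked.
    bitset_bytes = 8 * ((num_fields + 63) // 64)
    return struct.unpack_from("<q", buf, bitset_bytes + 8 * ordinal)[0]
```

The constant-offset read is exactly what lets the C++ library hand out typed accessors over the raw buffer.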
  33. UnsafeRow SerDe C++ library. SQL query: SELECT TRANSFORM (id INT) ROW FORMAT SERDE 'UnsafeRowSerDe' USING 'script' AS (value BIGINT) ROW FORMAT SERDE 'UnsafeRowSerDe' FROM src_tbl; C++ transformer: #include "spark/Transformer.h" while (transformer.readRow(input)) { int32_t id = input->getID(); output->setValue(id * id); transformer.writeRow(output); }
  34. Core Engine Improvements, SerDe support summary: text-based (development): DelimitedJSONSerDe.scala, SimpleSerDe.scala; binary (production): UnsafeRowSerDe.scala.
  35. Core Engine Improvements, aggregation and projection support (SQL): SELECT TRANSFORM (id, AVG(value) AS value_avg) USING 'script' AS (output) FROM src_tbl GROUP BY id;
  36. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  37. Efficiency Analysis, SerDe overhead: text-based (UTF-8): JSON (JSON lib), Row Format Delimited; binary: UnsafeRow.
  38. Efficiency Analysis: text-SerDe CPU overhead, Spark side (chart).
  39. Efficiency Analysis: text-SerDe CPU overhead, transform process (chart).
  40. Efficiency Analysis, SerDe overhead: text-based SerDe overhead is non-negligible, especially for complex types; SerDe cost can be up to 70% of a pipeline's CPU resources.
  41. Efficiency Analysis, SerDe overhead. Solution: use an efficient binary SerDe.
  42. Efficiency Analysis: UnsafeRow, an efficient binary SerDe (UnsafeRow C++ lib).
  43. Efficiency Analysis: UnsafeRow, Spark side (chart).
  44. Efficiency Analysis: UnsafeRow, transform process (chart).
  45. UnsafeRow SerDe benchmark, text-based vs. binary SerDe (UnsafeRow): transform pipelines see end-to-end CPU savings of up to 4x; complex-type SerDe is impacted the most.
  46. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  47. Transforms Execution Model, resource request: CPU cores per container: spark.executor.cores=4; memory per container: spark.executor.memory=4GB + spark.transform.memory=4GB. The Spark driver sends the cluster manager a resource request (CPU cores = 4, memory = 8GB); the node manager launches the Spark executor, whose tasks (Task 1, Task 2) spawn the transform processes (Process 1, Process 2).
  48. Transforms Execution Model, resource control: JVM memory limits (Xms, Xmx and Xss); CPU threads (spark.executor.cores, spark.task.cpus).
  49. Transforms Execution Model, resource control: these limits are irrelevant when running an external process!
  50. Transforms Execution Model, resource control: these limits are irrelevant when running an external process! Solution: cgroup v2 containers.
  51. Transforms Execution Model, resource control & isolation, cgroup v2 controllers: cpu.weight (allows multi-threaded transforms), memory.max (OOMs offending processes), io.latency (IO QoS).
  52. Transforms Execution Model, resource control & isolation: /cgroup2/task_container/exec1
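The per-transform cgroup directory on the slide suggests a launch sequence: create the directory, write the limits, then move the child PID into cgroup.procs. As a hedged sketch, the helper below only composes the cgroup v2 control-file contents; the function name, device numbers, and values are illustrative, not Facebook's actual launcher:

```python
def cgroup_limits(cpu_weight: int, memory_max_bytes: int,
                  io_latency_ms: int, major_minor: str = "8:0") -> dict:
    # Contents a launcher would write under a per-exec directory
    # such as /cgroup2/task_container/exec1:
    #   cpu.weight - proportional CPU share (multi-threaded transforms)
    #   memory.max - hard limit; the kernel OOMs offending processes
    #   io.latency - IO QoS target for the given block device
    return {
        "cpu.weight": str(cpu_weight),
        "memory.max": str(memory_max_bytes),
        "io.latency": f"{major_minor} target={io_latency_ms}",
    }
```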
  53. Agenda: 1. Intro to Spark Script Transforms 2. Spark Transforms at Facebook 3. Core Engine Improvements 4. Efficiency Analysis and Results 5. Transforms Execution Model 6. Future Plans
  54. Future Plans: binary SerDe based on Apache Arrow; vectorization.
  55. Questions
  56. INFRASTRUCTURE
