
Scaling Spark on Kubernetes at Lyft



Slides presented at the Strata SF 2019 conference, explaining how Lyft is building a multi-cluster solution for running Apache Spark on Kubernetes at scale to support diverse workloads and overcome challenges.

Published in: Data & Analytics

  1. 1. Scaling Spark on Kubernetes Li Gao (Lyft) Bill Graham (Lyft)
  2. 2. Introduction. Li Gao: works on the Data Platform team at Lyft, currently leading the Compute Infra initiatives, including Spark on Kubernetes; previously at Salesforce, Fitbit, Groupon, and other startups. Bill Graham: Engineer/Architect on the Data Platform team at Lyft, currently developing data ingestion systems; previously at Twitter, CBS Interactive, and CNET Networks.
  3. 3. Agenda ● Introduction of Data Landscape at Lyft ● The challenges we face ● How Apache Spark on Kubernetes can help ● Remaining work
  4. 4. Data Landscape ● Batch data Ingestion and ETL ● Data Streaming ● ML platforms ● Notebooks and BI tools ● Query and Visualization ● Operational Analytics ● Data Discovery & Lineage ● Workflow orchestration ● Cloud Platforms
  6. 6. The Evolving Batch Compute Architecture ● 2016-2017: Vendor-based Hadoop ● Early 2018: Hive on MR, Vendor Presto ● Mid 2018: Hive on Tez + Spark adhoc ● Late 2018: Spark on Vendor GA ● Early 2019: Spark on K8s Alpha ● Future: Spark on K8s Beta
  7. 7. What batch compute is used for [Diagram labels: Events, Ext Data, RDB/KV, Sys Events; Ingest Pipelines; AWS S3; Batch Compute Clusters; HMS; Presto, Hive, and BI Tools; Analysts, Engineers, Scientists, Services]
  8. 8. Initial Architecture
  9. 9. Batch Compute Challenges ● 3rd-party vendor dependency issues ● Data ETL expressed solely in SQL ● Complex logic expressed in Python that is hard to adopt in SQL ● Different dependencies and versions ● Resource load balancing for heterogeneous workloads
  10. 10. 3rd Party Vendor Dependencies ● Proprietary patches ● Inconsistent bootstrap ● Release schedule ● Homogeneous environments ● HIPAA Compliance
  11. 11. Is SQL the complete solution?
  12. 12. What about Python functions? “I want to express my processing logic in python functions with external geo libraries (i.e. Geomesa) and interact with Hive tables” --- Lyft data engineer
  13. 13. How Apache Spark helps? RDB/KV Applications APIs Environments Data Sources and Data Sinks
  14. 14. What challenges remain? ● Per-job custom dependencies ● Handling version requirements (Py3 vs. Py2) ● Still need to run on shared clusters for cost efficiency
  15. 15. How about Dependencies? RTree Libraries, Data Codecs, Spatial Libraries
  16. 16. How about different Spark or Hive versions? ● Legacy jobs that require Spark 2.2 ● Newer Jobs require Spark 2.3 or Spark 2.4 ● Hive 2.1 SQL and Hive 2.3
  17. 17. How Kubernetes can help? CRD Operators & Controllers Pods Ingress & CNI Services Namespaces Pods Declarative Resources Deployment & Replicas Community
  18. 18. What are the challenges running Spark on k8s? ● Spark on k8s is still in its infancy ● Single cluster scaling limit ● CRD and control plane update challenges ● Pod churn and IP address allocations ● ECR container image reliability
  19. 19. Current scale of batch jobs ● PB data lake ● (O) k batch jobs running daily ● ~ 1000s of EC2 nodes spanning multiple clusters ● ~ 1000s of workflows running daily
  20. 20. How Lyft scales Spark on K8s # of Clusters # of Namespaces # of Pods Pod Churn Rate # of Nodes Pod Size Job:Pod ratio IP Alloc Rate Limit ECR Rate Limit
  21. 21. The Evolving Architecture
  22. 22. One vs Many Kubernetes Clusters
  23. 23. Cluster Pool HA Support Cluster 1 Cluster 2 Cluster 3 Cluster Pool A Cluster 4 ● Cluster rotation within a cluster pool ● Automated provisioning of a new cluster and (manually) add into rotation ● Throttle at lower bound when rotation in progress
  24. 24. One vs Many Kubernetes Namespaces Pod Pod Pod Namespace 1 Pod Pod Pod Namespace 2 Pod Pod Pod Namespace 3 Node A Node B Node C Node D Role1 Role1 Role2 Max Pod Size 1 Max Pod Size 2 ● Practical ~3-5K active pods per namespace observed ● Less preemption required when namespace isolated by quota ● Different namespaces can map different IAM roles and sidecar configurations
  25. 25. Shared vs Dedicated Kubernetes Pods Job Controller Spark Driver Pod Spark Exec Pods Job 2 Driver Pod Job 2 Exec Pods Job 3 Driver Pod Job 3 Exec Pods Shared Pods Job 1 Job 4 Job 3 Job 2 AWS S3 Dep Dep Dedicated & Isolated Pods Dep
  26. 26. What about Pod Churn? Separating DDL from DML to reduce churn
  27. 27. Separating DDL from DML Commands
  28. 28. Pod Priority and Preemptions (WIP) ● Priority-based preemption ● Driver pod has higher priority than executor pod D1 D2 E1 E2 E3 E4 Scheduler D1 E5 New Pod Req Before D2 E5 E2 E3 E4 After E1 Evicted
  29. 29. What about ECR reliability? Node 1 Node 2 Node 3 Pods Pods Pods DaemonSet + Docker In Docker Container Images
  30. 30. Spark Job Config Overlays Cluster Pool Defaults Cluster Defaults Spark Job User Specified Config Cluster and Namespace Overrides Final Spark Job Config Job Controller and Event Watcher Spark Operator
  31. 31. X-Rays of the Architecture - Job Controller
  32. 32. X-Rays of the Architecture - Spark Operator
  33. 33. Monitoring & Logging Toolbox HEKA JMX
  34. 34. Monitoring Example - OOM Kill in namespace
  35. 35. Automation Toolbox Kustomize Template K8S Deploy Sidecar injectors Secrets injectors DaemonSets KIAM
  36. 36. Remaining Work ● More intelligent job routing and parameter setting ● Granular cost attribution ● Improved docker image distribution ● Spark 3.0!
  37. 37. Key Takeaways ● Apache Spark can help unify different batch data compute use cases ● Kubernetes can help solve the dependency and multi-version requirements using its containerized approach ● Spark on Kubernetes can scale significantly by using a multi-cluster approach with proper resource isolation and scheduling techniques ● Challenges remain when running Spark on Kubernetes at scale
  38. 38. Community This effort would not be possible without the help from the open source and wider communities:
  39. 39. Thank you Strata SF 2019 Li Gao, in/ligao101 @ligao Bill Graham, @billgraham Please rate this session! Questions?
  40. 40. We’re Hiring! Apply at or email Data Engineering Engineering Manager San Francisco Software Engineer San Francisco, Seattle, & New York City Data Infrastructure Engineering Manager San Francisco Software Engineer San Francisco & Seattle Experimentation Software Engineer San Francisco Streaming Software Engineer San Francisco Observability Software Engineer San Francisco
  41. 41. Strata SF 2019 Rate this session session page on conference website O’Reilly Events App
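
Slide 24 notes that roughly 3-5K active pods per namespace were practical. One way to picture how a job router might respect such a limit is the sketch below. It is only an illustration: the function name `pick_namespace`, the cap of 3000, and the "most headroom wins" rule are all assumptions, not something the slides describe.

```python
from typing import Dict, Optional

# Assumption: use the conservative end of the ~3-5K range quoted on the slide.
MAX_ACTIVE_PODS = 3000


def pick_namespace(
    active_pods: Dict[str, int],
    pods_needed: int,
    cap: int = MAX_ACTIVE_PODS,
) -> Optional[str]:
    """Return the namespace with the fewest active pods that can still fit
    the job under the cap, or None if no namespace has enough headroom."""
    candidates = {
        ns: count
        for ns, count in active_pods.items()
        if count + pods_needed <= cap
    }
    if not candidates:
        return None  # the caller would queue or throttle the job
    return min(candidates, key=candidates.get)


# Hypothetical pod counts for three namespaces.
current = {"ns-1": 2900, "ns-2": 1200, "ns-3": 2500}
chosen = pick_namespace(current, pods_needed=200)
```

Returning `None` rather than overfilling a namespace leaves the throttling decision to the caller.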
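
Slide 28 states that driver pods have higher priority than executor pods and that preemption is priority-based. The toy model below illustrates that rule on a single fixed-capacity node. The priority numbers, the `Pod` class, and the `schedule` function are hypothetical, and the real Kubernetes scheduler weighs many more factors than this sketch does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical values; the slide only says drivers outrank executors.
DRIVER_PRIORITY = 100
EXECUTOR_PRIORITY = 10


@dataclass(frozen=True)
class Pod:
    name: str
    priority: int


def schedule(
    running: List[Pod], new_pod: Pod, capacity: int
) -> Tuple[List[Pod], Optional[Pod]]:
    """Return (pods running afterwards, evicted pod or None)."""
    if len(running) < capacity:
        return running + [new_pod], None
    # Node is full: look for the lowest-priority running pod.
    victim = min(running, key=lambda p: p.priority)
    if victim.priority >= new_pod.priority:
        # Nothing with lower priority to evict; the new pod stays pending.
        return list(running), None
    survivors = [p for p in running if p is not victim]
    return survivors + [new_pod], victim


running = [
    Pod("D1", DRIVER_PRIORITY),
    Pod("E1", EXECUTOR_PRIORITY),
    Pod("E2", EXECUTOR_PRIORITY),
    Pod("E3", EXECUTOR_PRIORITY),
]
# A new driver arrives on a full node, so one executor is evicted.
after, evicted = schedule(running, Pod("D2", DRIVER_PRIORITY), capacity=4)
```

In this model a new executor arriving on a full node evicts nothing, because no running pod has lower priority than it does.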
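
Slide 30 lists the config layers that feed the final Spark job config: cluster pool defaults, cluster defaults, user-specified config, and cluster and namespace overrides. A minimal sketch of such layering follows. It assumes the layers are applied in the order listed, with later layers winning; the slides do not confirm that precedence. The helper `merge_overlays` and all example values are hypothetical; only the Spark property names are real.

```python
from typing import Any, Dict, Mapping


def merge_overlays(*layers: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge config layers left to right; later layers win on key conflicts."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical example values for each layer named on the slide.
cluster_pool_defaults = {
    "spark.executor.memory": "4g",
    "spark.executor.instances": "2",
}
cluster_defaults = {"spark.kubernetes.namespace": "default"}
user_config = {
    "spark.executor.memory": "8g",
    "spark.executor.instances": "10",
}
cluster_and_namespace_overrides = {
    "spark.kubernetes.namespace": "team-a",
    "spark.executor.instances": "6",
}

final_config = merge_overlays(
    cluster_pool_defaults,
    cluster_defaults,
    user_config,
    cluster_and_namespace_overrides,
)
```

Under this assumed ordering the user's memory request survives, while the namespace and executor count are forced by the override layer.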