
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio


Alluxio Community Office Hour
Published Dec 17, 2019

Speakers:
Bin Fan & Jiacheng Liu, Alluxio

While adoption of the cloud and Kubernetes has made it exceptionally easy to scale compute, the increasing spread of data across different systems and clouds has created new challenges for data engineers. Effectively accessing data in AWS S3 or on-premises HDFS becomes harder, and data locality is lost: how do you move data to compute workers efficiently, and how do you unify data across multiple or remote clouds? The open source project Alluxio approaches this problem in a new way. It helps elastic compute workloads, such as Apache Spark, realize the true benefits of the cloud while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.

One important performance optimization in Apache Spark is to schedule tasks on the nodes where HDFS data nodes locally serve the task input data. However, more and more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concepts and dataflow involved in running the Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is outside the cluster or remote.

In this Office Hour, we will go over:
- Why Spark can make locality-aware scheduling decisions when working with Alluxio in a K8s environment using the host network
- Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host using a domain socket and a hostPath volume
- The roadmap for improving the Spark/Alluxio stack in the context of K8s



  1. Achieving Data Locality Using Spark + Alluxio, with and without K8s
     Bin Fan, Jiacheng Liu | 2019/12/17 Office Hour
     Questions at https://alluxio.io/slack
  2. Speakers
     ● Bin Fan
       ○ Founding engineer & Alluxio PMC @ Alluxio
       ○ Email: binfan@alluxio.com
     ● Jiacheng Liu
       ○ Core maintainer @ Alluxio
       ○ Email: jiacheng@alluxio.com
     For Q&A, join https://alluxio.io/slack
  3. Agenda
     ● Alluxio Overview
     ● Spark + Alluxio Data Locality without K8s
     ● K8s Overview
     ● Spark + Alluxio Data Locality with K8s
     ● Limitations and Future Work
  4. Alluxio Overview
  5. The Alluxio Story
     ● 2013: Originated as the Tachyon project at UC Berkeley AMPLab, created by Ph.D. student Haoyuan (H.Y.) Li, now Alluxio CTO
     ● 2015: Open source project established & company founded to commercialize Alluxio
     ● 2018-2019: 2019 Top 10 Big Data; 2019 Top 10 Cloud Software
     Goal: Orchestrate data at memory speed for the cloud, for data-driven apps such as big data analytics, ML, and AI.
  6. Fast-growing Open Source Community
     ● 4000+ GitHub Stars, 1000+ Contributors
     ● Apache 2.0 Licensed
     ● Join the community on Slack (FAQ for this office hour): alluxio.io/slack
     ● Contribute to the source code: github.com/alluxio/alluxio
  7. Alluxio is Open-Source Data Orchestration: Data Orchestration for the Cloud
     ● Application interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
     ● Under-storage drivers: HDFS Driver, GCS Driver, S3 Driver, Azure Driver
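     As a minimal sketch of the first interface listed above, the snippet below exercises Alluxio's native Java File API from Scala. It assumes an Alluxio 2.x client jar on the classpath, a client configured (e.g. via alluxio-site.properties) to reach the masters, and a hypothetical file path:

        import alluxio.AlluxioURI
        import alluxio.client.file.FileSystem

        object NativeApiSketch {
          def main(args: Array[String]): Unit = {
            // Connect using the client configuration found on the classpath.
            val fs = FileSystem.Factory.get()
            val path = new AlluxioURI("/data/part-00000.parquet") // hypothetical path

            // Metadata: how much of this file is currently cached in Alluxio memory.
            val status = fs.getStatus(path)
            println(s"in-memory: ${status.getInMemoryPercentage}%")

            // Read through the same API; Spark jobs would instead use the
            // HDFS-compatible interface with alluxio:// URIs.
            val in = fs.openFile(path)
            try {
              val buf = new Array[Byte](8192)
              println(s"read ${in.read(buf)} bytes")
            } finally in.close()
          }
        }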
  8. Why Alluxio in K8s? Elastic Data for Elastic Compute with Tight Locality
     ● Improve data locality
       ○ Big data analytics or ML
       ○ Cache data close to compute
     ● Enable high-speed data sharing across jobs
       ○ A closer staging storage layer for compute jobs
     ● Provide unification of persistent storage
       ○ Same data abstraction across different storage
     Read more at https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/
  9. Data Challenges
     ● Data copy
       ○ Multiple copies of data
     ● A stateful application on K8s can be hard
       ○ Data migration and rebalancing on elasticity
     ● Changing applications for a new storage system can be hard
       ○ Lack of a familiar API
       ○ Tuning can be challenging
  10. Spark + Alluxio Data Locality Without K8s
  11. Spark Workflow (Without Alluxio)
     1) The Application (Spark Context) runs the Spark job
     2) The Cluster Manager (e.g. YARN/Mesos) allocates executors on Worker Nodes
     3) Launch executors and launch tasks
     4) Spark Executors access the data (e.g. s3://data/) and compute
     Takeaway: remote data, no locality.
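     For contrast with the Alluxio-backed setup that follows, here is a minimal baseline sketch of this workflow in Scala. The bucket name is a placeholder and S3A credentials are assumed to be configured in the Hadoop configuration; the point is that every task pulls its input over the network:

        import org.apache.spark.sql.SparkSession

        // Baseline (no Alluxio): every task reads its input remotely from S3,
        // so Spark has no locality preference to exploit.
        val spark = SparkSession.builder()
          .appName("s3-direct-read")
          .getOrCreate()

        // "s3a://data/" is a placeholder path. With Alluxio fronting the store,
        // only the URI scheme would change, e.g.
        // spark.read.parquet("alluxio://<master>:19998/data/").
        val df = spark.read.parquet("s3a://data/")
        println(df.count()) // all input bytes arrive from remote object storage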
  12. Step 2: Schedule compute to the data cache location
     2.1) The Spark Context, via the Alluxio client, asks the Alluxio masters "where is block1?" and learns that block1 is served at [HostA]
     2.2) The Cluster Manager (e.g. YARN/Mesos) allocates an executor on [HostA]
     (Example setup: Alluxio Workers on HostA 196.0.0.7 and HostB 196.0.0.8 cache block1 and block2.)
     ● The Alluxio client implements an HDFS-compatible API with block location info
     ● The Alluxio masters keep track of, and serve, the list of worker nodes that currently hold the cache
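     The locality information travels through the standard Hadoop FileSystem block-location call, which Spark consults when computing preferred task locations. A sketch, assuming the Alluxio client jar is on the classpath; the master address and file path are placeholders:

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        // The lookup Spark performs (indirectly) while planning tasks.
        val fs = FileSystem.get(
          new URI("alluxio://alluxio-master:19998/"), new Configuration())

        val status = fs.getFileStatus(new Path("/data/part-00000.parquet"))

        // The Alluxio client answers with the hosts of the workers caching each
        // block; Spark treats these hosts as preferred locations for its tasks.
        for (loc <- fs.getFileBlockLocations(status, 0, status.getLen)) {
          println(s"offset=${loc.getOffset} hosts=${loc.getHosts.mkString(",")}")
        }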
  13. Step 4: Find the local Alluxio Worker for efficient data exchange
     ● The Spark Executor finds the local Alluxio Worker by hostname comparison
     ● The Spark Executor talks to the local Alluxio Worker using either short-circuit access (via the local FS, e.g. /mnt/ramdisk/) or a local domain socket (e.g. /opt/domain)
     ● On a cache miss, the Alluxio Worker fetches the data from the under store (e.g. s3://data/)
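     One way the domain socket path can be handed to the Alluxio client embedded in Spark is via JVM options, using the alluxio.worker.data.server.domain.socket.address property that slide 26 cites. A hedged sketch; the path is a placeholder, and note that the equivalent driver-side option must normally be supplied at submit time (e.g. spark-submit --conf spark.driver.extraJavaOptions=...) because the driver JVM is already running when application code executes:

        import org.apache.spark.sql.SparkSession

        // Point the Alluxio client inside each executor at the worker's
        // domain socket so reads can bypass the TCP data path.
        val alluxioOpts =
          "-Dalluxio.worker.data.server.domain.socket.address=/opt/domain"

        val spark = SparkSession.builder()
          .appName("alluxio-domain-socket")
          // Executors start after this conf is set, so it takes effect here;
          // the driver option belongs on the spark-submit command line.
          .config("spark.executor.extraJavaOptions", alluxioOpts)
          .getOrCreate()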
  14. Recap: Spark Architecture with Alluxio
     1) The Application (Spark Context) runs the Spark job
     2.1) The Alluxio client asks the Alluxio masters where the data cache is
     2.2) The Cluster Manager (e.g. YARN/Mesos) allocates executors accordingly
     3) Launch executors and launch tasks
     4) Spark Executors access Alluxio (backed by s3://data/) for data and compute
     Step 2 helps Spark schedule compute to the data cache; Step 4 enables Spark to access the local Alluxio cache.
  15. Kubernetes Overview
  16. Kubernetes (k8s) is... "an open-source container-orchestration system for automating application deployment, scaling, and management."
  17. Container Orchestration Platform
     ● Platform-agnostic cluster management
     ● Service discovery and load balancing
     ● Storage orchestration
     ● Horizontal scaling
     ● Self-monitoring and self-healing
     ● Automated rollouts and rollbacks
  18. K8s Terms
     ● Node: a VM or physical machine
     ● Container
       ○ Docker Image = a lightweight OS and application execution environment
       ○ Container = an Image once it is running on the Docker Engine
     ● Pod: the schedulable unit of one or more containers running together; its containers share storage, network, etc.
     ● Controller: maintains a desired state, such as the number of copies of a Pod
     ● DaemonSet: a Controller that ensures each Node runs exactly one such Pod
     ● Volume: a directory accessible to the Containers in a Pod
     ● Persistent Volume: a storage resource with a lifecycle independent of Pods
  19. Spark + Alluxio Data Locality in a K8s Environment
  20. Spark on K8s Architecture
     ● Spark 2.3 added native K8s support
     ● A Spark application runs in K8s as a custom K8s Controller
     ● spark-submit talks to the K8s API Server to create the Driver
     ● The Spark Driver creates the Executor Pods
     ● When the application completes, the Executor Pods terminate and are cleaned up
     ● The Driver Pod persists logs and remains in the "COMPLETED" state
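     For illustration, a sketch of submitting such an application programmatically with Spark's SparkLauncher, the spark-submit equivalent. The API server URL, image name, main class, and jar path are all placeholders:

        import org.apache.spark.launcher.SparkLauncher

        // Equivalent of spark-submit against the K8s API Server: the Driver is
        // created as a Pod, and the Driver in turn creates the Executor Pods.
        val app = new SparkLauncher()
          .setMaster("k8s://https://kubernetes.default.svc:443") // K8s API server
          .setDeployMode("cluster")
          .setConf("spark.kubernetes.container.image", "myrepo/spark:2.4.4")
          .setConf("spark.executor.instances", "2")
          .setMainClass("com.example.MyApp")
          .setAppResource("local:///opt/spark/jars/my-app.jar") // path inside the image
          .startApplication()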
  21. Deployment Model: Co-location
     ● Co-locate the Spark Executor Pod with the Alluxio Worker Pod
     ● Lifecycle
       ○ Spark Executors are ephemeral
       ○ The Alluxio Worker serves the cache across all Spark jobs
     ● Deployment order
       ○ Deploy the Alluxio cluster first (masters + workers)
       ○ 1 Alluxio Worker on each Node, via a DaemonSet
       ○ spark-submit to launch the Spark Driver + Executors
  22. Challenge 1: The Spark Driver fails to allocate Executors correctly
     ● Example: the Alluxio Master resolves block1 to [AlluxioWorkerPodA] (10.0.1.1), while the Nodes themselves are HostA (196.0.0.7) and HostB (196.0.0.8)
     ● Problem: Alluxio Worker Pods have different addresses from their Nodes, so the Driver cannot match block locations to schedulable hosts
  23. Current Solution: Launch Alluxio Workers with hostNetwork
     ● With hostNetwork: true, each Alluxio Worker Pod shares its Node's network, so the Alluxio Master reports block1 at [HostA] (196.0.0.7) and the Driver can allocate executors on [HostA]
  24. Challenge 2: The Spark Executor fails to identify whether the Alluxio Pod is host-local
     Problem:
     ● The Spark Executor Pod has a different hostname from its Node
       ○ Hostname HostA != SparkExecPod1
       ○ The local Alluxio Worker is not identified!
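     Conceptually (an illustrative sketch, not Alluxio's actual implementation), the failing check amounts to a hostname comparison performed inside the Pod:

        import java.net.InetAddress

        // Inside a Pod, the local hostname is the Pod's name, not the Node's,
        // so a naive locality check by hostname comparison fails.
        val localHostname  = InetAddress.getLocalHost.getHostName // "SparkExecPod1"
        val workerHostname = "HostA" // what the Alluxio Master reported for block1

        val isWorkerLocal = localHostname.equalsIgnoreCase(workerHostname)
        println(isWorkerLocal) // false, although the Worker runs on the same Node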
  25. Challenge 3: The Spark Executor fails to find the domain socket
     Problem:
     ● Pods don't share a file system
     ● The domain socket /opt/domain is inside the Alluxio Worker Pod
  26. Current Solution: Use a UUID to identify the Alluxio Worker and share a hostPath Volume between Pods
     ● Each Alluxio Worker has a UUID
     ● The domain socket is shared via a hostPath Volume
     ● The Alluxio Client finds the local Worker's domain socket by looking for the file matching the Worker's UUID (e.g. the Master reports block1 at [HostA: UUIDABC], and the client opens /opt/domain/UUIDABC)
     ● Worker domain socket path /opt/domain/d -> /opt/domain/UUIDABC, enabled by alluxio.worker.data.server.domain.socket.address
     ● Mount the hostPath Volume into the Spark Executor, enabled by Spark 2.4 (PR)
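     Putting the pieces together, a hedged sketch of the Spark-side (2.4+) configuration. The volume name "alluxio-domain" and all paths are placeholders; the alluxio.* property names are the one cited above plus its companion ...domain.socket.as.uuid switch. In cluster mode these settings are normally passed to spark-submit rather than set in code:

        import org.apache.spark.sql.SparkSession

        // Alluxio client options: use the domain socket directory and look up
        // the socket file named after the local Worker's UUID.
        val alluxioOpts =
          "-Dalluxio.worker.data.server.domain.socket.address=/opt/domain " +
          "-Dalluxio.worker.data.server.domain.socket.as.uuid=true"

        val spark = SparkSession.builder()
          .appName("spark-alluxio-k8s")
          // Mount the Node's /opt/domain (shared with the Alluxio Worker Pod)
          // into each Executor Pod at the same path (Spark 2.4+ volume conf).
          .config("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.mount.path", "/opt/domain")
          .config("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.mount.readOnly", "true")
          .config("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.options.path", "/opt/domain")
          .config("spark.executor.extraJavaOptions", alluxioOpts)
          .getOrCreate()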
  27. Recap: Spark + Alluxio Architecture on K8s
     1) The Application (Spark Context) runs the Spark job
     2.1) The Alluxio client asks Alluxio where the data is (backed by s3://data/)
     2.2) spark-submit creates the Driver Pod
     3) Launch executors and launch tasks
     4) Spark Executor Pods access the co-located Alluxio Worker Pods for data and compute
     Recap: Step 2 helps Spark schedule compute to the data location; Step 4 enables Spark to access data locally.
  28. Limitations and Future Work
     ● Many enterprise environments have strict controls on hostNetwork and hostPath
     ● Alluxio Workers need hostNetwork
       ○ Plan: workarounds such as supporting container network translation
     ● The domain socket file requires a hostPath volume
       ○ Plan: workarounds such as using Local Persistent Volumes
     ● Feedback/collaboration are welcome!
  29. Additional Resources
     ● Spark Native K8s Support
       ○ https://spark.apache.org/docs/latest/running-on-kubernetes.html
       ○ https://kubernetes.io/blog/2018/03/apache-spark-23-with-native-kubernetes/
     ● Spark + Alluxio Advanced Data Locality
       ○ https://www.alluxio.io/blog/top-10-tips-for-making-the-spark-alluxio-stack-blazing-fast/
     ● Running Alluxio in K8s
       ○ https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-Kubernetes.html
     ● Running Spark with Alluxio in K8s
       ○ https://docs.alluxio.io/os/user/stable/en/compute/Spark-On-Kubernetes.html
     ● How Alluxio Helps the Analytics Stack in Kubernetes
       ○ https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/
  30. Thank you
     binfan@alluxio.com | jiacheng@alluxio.com
     Questions: https://alluxio.io/slack
     www.alluxio.com | info@alluxio.com | twitter.com/alluxio | linkedIn.com/alluxio | zhuanlan.zhihu.com/alluxio
