Improving Data Locality for Analytics
Jobs on Kubernetes Using Alluxio
Adit Madan | Alluxio
Gene Pang | Alluxio
Alluxio Overview
Data locality with Spark and Alluxio
Kubernetes Overview
Spark and Alluxio in Kubernetes
Alluxio Innovations for Structured Data
Outline
2
Data Ecosystem - Beta Data Ecosystem 1.0
COMPUTE
STORAGE STORAGE
COMPUTE
3
Co-located
Data stack journey and innovation paths
Co-located
compute & HDFS
on the same cluster
Disaggregated
compute & HDFS
on the same cluster
MR / Hive
HDFS
Hive
HDFS
Disaggregated
Burst HDFS data in
the cloud,
public or private
Support Presto, Spark
and other computes
without app changes
Enable & accelerate
big data on
object stores
Transition to Object
store
HDFS for Hybrid Cloud
Support more
frameworks
▪ Typically compute-bound
clusters over 100%
capacity
▪ Compute & I/O need to
be scaled together even
when not needed
▪ Compute & I/O can be
scaled independently but
I/O still needed on HDFS
which is expensive
1 2
3
4
5
Data Orchestration for the
Cloud
Java File API HDFS Interface S3 Interface REST APIPOSIX Interface
HDFS Driver Swift Driver S3 Driver NFS Driver
Independent scaling of compute & storage
Alluxio Data Orchestration for the Cloud
Structured
Data Catalog
Intelligent
Caching
Data
Transformation
Data
Management
Global
Namespace
7
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkSpark
or
Read more at https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/
Elastic Data for Elastic Compute
Improve Data locality
• Big data analytics or ML
• Cache data close to compute
Enable high-speed data sharing across jobs
• A staging storage layer
Unification of persistent storage
• Data abstraction across different storage
Spark + Alluxio Data Locality
Without K8s
Spark Workflow (w/o Alluxio)
9
Cluster Manager
(i.e. YARN/Mesos)
Application
Spark
Context
s3://data/
Worker Node
1) run Spark job
Worker Node
Spark
Executor3) launch executors
and launch tasks
4) access data and compute
2) allocate executors
Takeaway: Remote data, no locality
Step I: Schedule Compute to Data Location
10
Cluster Manager
(i.e. YARN/Mesos)
Application
Spark
Context
Alluxio
Client
Alluxio
Masters
HostA: 196.0.0.7
Alluxio
Worker
HostB: 196.0.0.8
Alluxio
Worker
1.2) allocate on [HostA]
block1
block2
1.1) where is block1? 1.1) block1 is
served by [HostA]
Alluxio client implements HDFS compatible API
Retrieves block location info
Alluxio masters keep track of block to worker mapping
Serve a list of worker nodes containing the block
Step 2: Detect+Exchange Data w/ local Worker
11
s3://data/
Spark Executor
Alluxio
WorkerAlluxio
Client
HostB: 196.0.0.8
Alluxio
Worker
HostA: 196.0.0.7
block1
Alluxio
Masters
block1?
[HostA]
Efficient I/O via local fs
(e.g., /mnt/ramdisk/) or
local domain socket
(/opt/domain)
Spark Executor finds local Alluxio Worker
Mechanism: Hostname comparison
Spark Executor talks to local Alluxio Worker
Mechanism: Either short-circuit access (via local
FS) or local domain socket
Recap: Spark Architecture w/ Alluxio
12
Cluster Manager
(i.e. YARN/Mesos)
Application
Spark
Context
s3://data/Alluxio
Client
Alluxio
Masters
Worker Node
Spark
Executor Alluxio
WorkerAlluxio
Client
Worker Node
Alluxio
Worker
1) run spark job
2.2) allocate executors
4) access Alluxio for data
and compute
2.1) talk to Alluxio for
where the data cache is
Step 1: Help Spark schedule compute to data cache
Step 2: Enable Spark to access local Alluxio cache
3) launch executors
and launch tasks
Kubernetes Overview
“an open-source container-orchestration system for automating application deployment,
scaling, and management.”
Kubernetes (K8s) is…
14
Platform agnostic cluster management
Service discovery and load balancing
Storage orchestration
Horizontal scaling
Self-monitoring and self-healing
Automated rollouts and rollbacks
Container Orchestration
15
Node
A VM or physical machine
Container
Container = Image once running on Docker Engine
Pod
Schedulable unit of one or more containers running together
Controller
Controls the desired state such as copies of a Pod
DaemonSet
A Controller that ensures each Node has only one such Pod
Persistent Volume
A storage resource with lifecycle independent of Pods
Key K8s Terms
16
Spark + Alluxio Data Locality
In K8s environment
Spark 2.3 added native K8s support
spark-submit talks to API Server to launch
Driver
Spark Driver launches Executor Pods
When the application completes,
Executor Pods terminate and are cleaned up
Driver Pod persists logs and remains in
“COMPLETED” state
Spark on K8s Architecture
18
Co-locate Spark Executor Pod w/ Alluxio Worker Pod
Lifecycle
Spark Executors are ephemeral
Alluxio Workers persist across all Spark jobs
Deployment order:
Deploy Alluxio cluster first (masters+workers)
An Alluxio Worker on each Node, by DaemonSet
spark-submit launches Spark Driver + Executors
Deployment Model: Co-location
19
Worker Node
Alluxio
Worker PodAlluxio
Client
Spark Executor
Pod
Challenge 1: Executor Allocation to Workers
20
Application
Spark
Context
Alluxio
Client
allocate on [AlluxioWorkerPodA]?
block1
block2
block1? [AlluxioWorkerPodA]
HostA: 196.0.0.7
AlluxioWorkerPodA:
10.0.1.1
HostB: 196.0.0.8
AlluxioWorkerPodB:
10.0.1.2
Alluxio
Master
Pods
Some Node
Spark
Driver Pod
Some Node
Problem: Worker Pods have different addresses from Nodes
Solution: Alluxio Workers w/ hostNetwork
21
Application
Spark
Context
Alluxio
Client
allocate on [HostA]
block1
block2
block1? [HostA]
HostA: 196.0.0.7
HostA: 196.0.0.7
HostB: 196.0.0.8
HostB: 196.0.0.8
Alluxio
Master
Pods
Some Node
Spark
Driver Pod
Some Node
hostNetwork: true
hostNetwork: true
Network
adapter
Network
adapter
Alluxio Workers advertise physical host address
Spark maps Executor Pods to physical host
Challenge 2: Identify host-local Alluxio Pod
22
Problem:
Spark Executor Pod has different hostName
Hostname HostA != SparkExecPod1
Local Alluxio Worker not identified!
HostA: 196.0.0.7
Network
adapter
block1
Alluxio
Client
SparkExecPod1:
10.0.1,1
Alluxio
Master
block1? [HostA]
Challenge 3: Executor fails to find domain socket
23
Problem:
• Pods don’t share the File System
• Domain socket /opt/domain is in Alluxio Worker Pod
HostA: 196.0.0.7
Network
adapter
block1
Alluxio
Client
SparkExecPod1:
10.0.1,1
Alluxio
Master
block1? [HostA]
/opt/domain/
Each Alluxio Worker has a UUID
Share domain socket by a hostPath Volume
Alluxio Client finds local worker’s domain
socket by finding file matching Worker
UUID
Worker domain socket path
/opt/domain/d -> /opt/domain/UUIDABC
Mount hostPath Volume to Spark Executor
Enabled in Spark 2.4
Solution: Share a hostPath Volume b/w Pods
24
HostA: 196.0.0.7
block1
Alluxio
Client
SparkExecPod1:
10.0.1,1
Alluxio
Master
block1? [HostA:
UUIDABC]
/opt/domain/UUIDABC
UUID: UUIDABC
Recap: Spark + Alluxio Architecture on K8s
25
Application
Spark
Context
Alluxio
Client
Spark
Executor
Pod
Alluxio
Worker
Pod
Alluxio
Client
Alluxio
Worker
Pod
Worker Node
Worker Node
Spark
Driver Pod
Alluxio
Master
Pods
Some Node
Some Node
1) run spark job
2.1) talk to Alluxio for
where the data is
s3://data/
2.2) create Driver
4) access Alluxio for data
and compute
Step 1: Help Spark schedule compute to data location
Step 2: Enable Spark to access data locally
3) launch executors
and launch tasks
Enterprise environments may restrict hostNetwork and hostPath
Alluxio workers need hostNetwork
Plan: Support container network translation
The domain socket file requires a hostPath volume
Plan: Using Local Persistent Volumes
Feedback/collaboration are welcome!
Limitations and Future Work
26
Alternate Deployment Options
Deploying Alluxio in K8s
28
Alluxio and Compute in different pods on the same host
When do you use this?
- Compute, like Spark, is short running and ephemeral
- Alluxio data orchestration & access layer is long running
and used across many jobs
Alluxio and Compute framework in the same pod
When do you use this?
- Compute, like Presto, is long running
- Data tier with Alluxio needs to be scaled along with
compute tier
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkSpark
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkPresto
PodLegend:
Alluxio
Structured Data Management
Innovations for Structured Data
Common Alluxio Use Cases
30
…
…
Unified Interface
Unified Namespace
Caching and Locality
SQL Engines are popular
31
Storage Systems SQL Frameworks
Files/Objects
Directories
Raw Bytes
Storage
Optimized
Tables
Schemas
Rows/Columns
Compute
Optimized
Impedance Mismatch
Further Expand Benefits!
Benefits of Alluxio Data Orchestration
32
Storage
Systems
SQL
Frameworks
Caching
Unified Interface/Namespace
Schema-Aware Optimizations
Compute-Optimized Formats
Physical Data Independence
Provide Structured Data APIs
Focus on how frameworks interact with data
High-Level Philosophy
33
Cache Logical Data Access
Focus on caching what frameworks want
Alluxio Structured Data Management
Alluxio Structured Data Management
34
Storage
System
Transformation
Service
Structured Data
and Metadata
Logical Data
Access Layer
Structured
Data Client
SQL
Engine
Engine
Developer Preview in Alluxio 2.1
Target Environment
36
Presto
Hive
Connector
Hive
Metastore
Storage
Alluxio Structured Data Management
37
Presto
Alluxio Caching
Service
Alluxio Catalog
Service
Alluxio Transformation
Service
Hive
Connector
Alluxio
Connector
Hive
Metastore
Storage
Alluxio Catalog Service
38
Alluxio Catalog Service
Hive Metastore
Hive Under Database
Functionality
Manages metadata for structured data
Abstracts other database catalogs as
Under Database (UDB)
Benefits
Schema-aware optimizations
Simple deployment
Tighter integration with Presto
New plugin based on the Presto Hive connector
Available in Alluxio 2.1 distribution
In Progress: Merging connector into Presto codebase
Alluxio Presto Connector
39
Transformation Service
40
Transform data to be compute-optimized
independent from storage-optimized format
Coalesce Format Conversion
parquetcsv
Demo!
2 isolated AWS 10-node clusters
Presto + Hive Metastore + S3 Data
Presto + Alluxio + Hive Metastore + S3 Data
TPCDS sample dataset on S3
~10,000 CSV files
Demo
42
Attached existing Hive database into Alluxio Catalog
Alluxio Catalog served table metadata for Presto
Transformed store_sales by coalescing and converting CSV to Parquet
Demo Summary
43
Presto Without
Alluxio
20s
Alluxio
Transformations
7s
Alluxio Transformations
With Caching
3s
User community feedback/collaboration is important!
Future projects
New UDB implementations (AWS Glue)
More conversion formats (json)
DDL/DML workloads (CREATE TABLE, INSERT, etc.)
New Client APIs for structured data (Arrow)
Future Work
44
Try it out!
Documentation
Provide feedback
Feature requests and issues in Github Alluxio/alluxio
Developer Preview Available in Alluxio 2.1
45
Thank You!
46

CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio

  • 1.
    Improving Data Localityfor Analytics Jobs on Kubernetes Using Alluxio Adit Madan | Alluxio Gene Pang | Alluxio
  • 2.
    Alluxio Overview Data localitywith Spark and Alluxio Kubernetes Overview Spark and Alluxio in Kubernetes Alluxio Innovations for Structured Data Outline 2
  • 3.
    Data Ecosystem -Beta Data Ecosystem 1.0 COMPUTE STORAGE STORAGE COMPUTE 3
  • 4.
    Co-located Data stack journeyand innovation paths Co-located compute & HDFS on the same cluster Disaggregated compute & HDFS on the same cluster MR / Hive HDFS Hive HDFS Disaggregated Burst HDFS data in the cloud, public or private Support Presto, Spark and other computes without app changes Enable & accelerate big data on object stores Transition to Object store HDFS for Hybrid Cloud Support more frameworks ▪ Typically compute-bound clusters over 100% capacity ▪ Compute & I/O need to be scaled together even when not needed ▪ Compute & I/O can be scaled independently but I/O still needed on HDFS which is expensive 1 2 3 4 5
  • 5.
    Data Orchestration forthe Cloud Java File API HDFS Interface S3 Interface REST APIPOSIX Interface HDFS Driver Swift Driver S3 Driver NFS Driver Independent scaling of compute & storage
  • 6.
    Alluxio Data Orchestrationfor the Cloud Structured Data Catalog Intelligent Caching Data Transformation Data Management Global Namespace
  • 7.
    7 Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkSpark or Read more athttps://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/ Elastic Data for Elastic Compute Improve Data locality • Big data analytics or ML • Cache data close to compute Enable high-speed data sharing across jobs • A staging storage layer Unification of persistent storage • Data abstraction across different storage
  • 8.
    Spark + AlluxioData Locality Without K8s
  • 9.
    Spark Workflow (w/oAlluxio) 9 Cluster Manager (i.e. YARN/Mesos) Application Spark Context s3://data/ Worker Node 1) run Spark job Worker Node Spark Executor3) launch executors and launch tasks 4) access data and compute 2) allocate executors Takeaway: Remote data, no locality
  • 10.
    Step I: ScheduleCompute to Data Location 10 Cluster Manager (i.e. YARN/Mesos) Application Spark Context Alluxio Client Alluxio Masters HostA: 196.0.0.7 Alluxio Worker HostB: 196.0.0.8 Alluxio Worker 1.2) allocate on [HostA] block1 block2 1.1) where is block1? 1.1) block1 is served by [HostA] Alluxio client implements HDFS compatible API Retrieves block location info Alluxio masters keep track of block to worker mapping Serve a list of worker nodes containing the block
  • 11.
    Step 2: Detect+ExchangeData w/ local Worker 11 s3://data/ Spark Executor Alluxio WorkerAlluxio Client HostB: 196.0.0.8 Alluxio Worker HostA: 196.0.0.7 block1 Alluxio Masters block1? [HostA] Efficient I/O via local fs (e.g., /mnt/ramdisk/) or local domain socket (/opt/domain) Spark Executor finds local Alluxio Worker Mechanism: Hostname comparison Spark Executor talks to local Alluxio Worker Mechanism: Either short-circuit access (via local FS) or local domain socket
  • 12.
    Recap: Spark Architecturew/ Alluxio 12 Cluster Manager (i.e. YARN/Mesos) Application Spark Context s3://data/Alluxio Client Alluxio Masters Worker Node Spark Executor Alluxio WorkerAlluxio Client Worker Node Alluxio Worker 1) run spark job 2.2) allocate executors 4) access Alluxio for data and compute 2.1) talk to Alluxio for where the data cache is Step 1: Help Spark schedule compute to data cache Step 2: Enable Spark to access local Alluxio cache 3) launch executors and launch tasks
  • 13.
  • 14.
    “an open-source container-orchestrationsystem for automating application deployment, scaling, and management.” Kubernetes (K8s) is… 14
  • 15.
    Platform agnostic clustermanagement Service discovery and load balancing Storage orchestration Horizontal scaling Self-monitoring and self-healing Automated rollouts and rollbacks Container Orchestration 15
  • 16.
    Node A VM orphysical machine Container Container = Image once running on Docker Engine Pod Schedulable unit of one or more containers running together Controller Controls the desired state such as copies of a Pod DaemonSet A Controller that ensures each Node has only one such Pod Persistent Volume A storage resource with lifecycle independent of Pods Key K8s Terms 16
  • 17.
    Spark + AlluxioData Locality In K8s environment
  • 18.
    Spark 2.3 addednative K8s support spark-submit talks to API Server to launch Driver Spark Driver launches Executor Pods When the application completes, Executor Pods terminate and are cleaned up Driver Pod persists logs and remains in “COMPLETED” state Spark on K8s Architecture 18
  • 19.
    Co-locate Spark ExecutorPod w/ Alluxio Worker Pod Lifecycle Spark Executors are ephemeral Alluxio Workers persist across all Spark jobs Deployment order: Deploy Alluxio cluster first (masters+workers) An Alluxio Worker on each Node, by DaemonSet spark-submit launches Spark Driver + Executors Deployment Model: Co-location 19 Worker Node Alluxio Worker PodAlluxio Client Spark Executor Pod
  • 20.
    Challenge 1: ExecutorAllocation to Workers 20 Application Spark Context Alluxio Client allocate on [AlluxioWorkerPodA]? block1 block2 block1? [AlluxioWorkerPodA] HostA: 196.0.0.7 AlluxioWorkerPodA: 10.0.1.1 HostB: 196.0.0.8 AlluxioWorkerPodB: 10.0.1.2 Alluxio Master Pods Some Node Spark Driver Pod Some Node Problem: Worker Pods have different addresses from Nodes
  • 21.
    Solution: Alluxio Workersw/ hostNetwork 21 Application Spark Context Alluxio Client allocate on [HostA] block1 block2 block1? [HostA] HostA: 196.0.0.7 HostA: 196.0.0.7 HostB: 196.0.0.8 HostB: 196.0.0.8 Alluxio Master Pods Some Node Spark Driver Pod Some Node hostNetwork: true hostNetwork: true Network adapter Network adapter Alluxio Workers advertise physical host address Spark maps Executor Pods to physical host
  • 22.
    Challenge 2: Identifyhost-local Alluxio Pod 22 Problem: Spark Executor Pod has different hostName Hostname HostA != SparkExecPod1 Local Alluxio Worker not identified! HostA: 196.0.0.7 Network adapter block1 Alluxio Client SparkExecPod1: 10.0.1,1 Alluxio Master block1? [HostA]
  • 23.
    Challenge 3: Executorfails to find domain socket 23 Problem: • Pods don’t share the File System • Domain socket /opt/domain is in Alluxio Worker Pod HostA: 196.0.0.7 Network adapter block1 Alluxio Client SparkExecPod1: 10.0.1,1 Alluxio Master block1? [HostA] /opt/domain/
  • 24.
    Each Alluxio Workerhas a UUID Share domain socket by a hostPath Volume Alluxio Client finds local worker’s domain socket by finding file matching Worker UUID Worker domain socket path /opt/domain/d -> /opt/domain/UUIDABC Mount hostPath Volume to Spark Executor Enabled in Spark 2.4 Solution: Share a hostPath Volume b/w Pods 24 HostA: 196.0.0.7 block1 Alluxio Client SparkExecPod1: 10.0.1,1 Alluxio Master block1? [HostA: UUIDABC] /opt/domain/UUIDABC UUID: UUIDABC
  • 25.
    Recap: Spark +Alluxio Architecture on K8s 25 Application Spark Context Alluxio Client Spark Executor Pod Alluxio Worker Pod Alluxio Client Alluxio Worker Pod Worker Node Worker Node Spark Driver Pod Alluxio Master Pods Some Node Some Node 1) run spark job 2.1) talk to Alluxio for where the data is s3://data/ 2.2) create Driver 4) access Alluxio for data and compute Step 1: Help Spark schedule compute to data location Step 2: Enable Spark to access data locally 3) launch executors and launch tasks
  • 26.
    Enterprise environments mayrestrict hostNetwork and hostPath Alluxio workers need hostNetwork Plan: Support container network translation The domain socket file requires a hostPath volume Plan: Using Local Persistent Volumes Feedback/collaboration are welcome! Limitations and Future Work 26
  • 27.
  • 28.
    Deploying Alluxio inK8s 28 Alluxio and Compute in different pods on the same host When do you use this? - Compute, like Spark, is short running and ephemeral - Alluxio data orchestration & access layer is long running and used across many jobs Alluxio and Compute framework in the same pod When do you use this? - Compute, like Presto, is long running - Data tier with Alluxio needs to be scaled along with compute tier Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkSpark Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkPresto PodLegend:
  • 29.
  • 30.
    Common Alluxio UseCases 30 … … Unified Interface Unified Namespace Caching and Locality SQL Engines are popular
  • 31.
    31 Storage Systems SQLFrameworks Files/Objects Directories Raw Bytes Storage Optimized Tables Schemas Rows/Columns Compute Optimized Impedance Mismatch Further Expand Benefits!
  • 32.
    Benefits of AlluxioData Orchestration 32 Storage Systems SQL Frameworks Caching Unified Interface/Namespace Schema-Aware Optimizations Compute-Optimized Formats Physical Data Independence
  • 33.
    Provide Structured DataAPIs Focus on how frameworks interact with data High-Level Philosophy 33 Cache Logical Data Access Focus on caching what frameworks want
  • 34.
    Alluxio Structured DataManagement Alluxio Structured Data Management 34 Storage System Transformation Service Structured Data and Metadata Logical Data Access Layer Structured Data Client SQL Engine Engine
  • 35.
  • 36.
  • 37.
    Alluxio Structured DataManagement 37 Presto Alluxio Caching Service Alluxio Catalog Service Alluxio Transformation Service Hive Connector Alluxio Connector Hive Metastore Storage
  • 38.
    Alluxio Catalog Service 38 AlluxioCatalog Service Hive Metastore Hive Under Database Functionality Manages metadata for structured data Abstracts other database catalogs as Under Database (UDB) Benefits Schema-aware optimizations Simple deployment
  • 39.
    Tighter integration withPresto New plugin based on the Presto Hive connector Available in Alluxio 2.1 distribution In Progress: Merging connector into Presto codebase Alluxio Presto Connector 39
  • 40.
    Transformation Service 40 Transform datato be compute-optimized independent from storage-optimized format Coalesce Format Conversion parquetcsv
  • 41.
  • 42.
    2 isolated AWS10-node clusters Presto + Hive Metastore + S3 Data Presto + Alluxio + Hive Metastore + S3 Data TPCDS sample dataset on S3 ~10,000 CSV files Demo 42
  • 43.
    Attached existing Hivedatabase into Alluxio Catalog Alluxio Catalog served table metadata for Presto Transformed store_sales by coalescing and converting CSV to Parquet Demo Summary 43 Presto Without Alluxio 20s Alluxio Transformations 7s Alluxio Transformations With Caching 3s
  • 44.
    User community feedback/collaborationis important! Future projects New UDB implementations (AWS Glue) More conversion formats (json) DDL/DML workloads (CREATE TABLE, INSERT, etc.) New Client APIs for structured data (Arrow) Future Work 44
  • 45.
    Try it out! Documentation Providefeedback Feature requests and issues in Github Alluxio/alluxio Developer Preview Available in Alluxio 2.1 45 Thank You!
  • 46.