SlideShare a Scribd company logo
Accelerating Spark with Kubernetes
Adit Madan | Lead Engineer | adit@alluxio.com
Dipti Borkar | Product | dipti@alluxio.com
Agenda
• Alluxio Overview
• Kubernetes Basics
• Alluxio Deployment Options
• Spark + Alluxio on Kubernetes Demo
Alluxio Overview
The Alluxio Story
Originated as Tachyon project, at UC Berkley AMPLab by
then Ph.D. student & now Alluxio CTO, Haoyuan (H.Y.) Li.2013
2015
Open Source project established & company to
commercialize Alluxio founded
Goal: Orchestrate Data at Memory Speed for the Cloud
for data driven apps such as Big Data Analytics, ML and AI.
20192018
2019
Top 10 Big Data
2019
Top 10 Cloud Software
Fast-growing Open Source Community
4000+ Github Stars1000+ Contributors
Join the community on Slack
alluxio.io/slack
Apache 2.0 Licensed
Contribute to source code
github.com/alluxio/alluxio
Data Ecosystem - Beta Data Ecosystem 1.0
COMPUTE
STORAGE STORAGE
COMPUTE
6
Co-located
Data stack journey and innovation paths
Co-located
compute & HDFS
on the same cluster
Disaggregated
compute & HDFS
on the same cluster
MR / Hive
HDFS
Hive
HDFS
Disaggregated
Burst HDFS data in
the cloud,
public or private
Support Presto, Spark
and other computes
without app changes
Enable & accelerate
big data on
object stores
Transition to Object store
HDFS for Hybrid Cloud
Support more frameworks
§ Typically compute-bound
clusters over 100%
capacity
§ Compute & I/O need to
be scaled together even
when not needed
§ Compute & I/O can be
scaled independently but
I/O still needed on HDFS
which is expensive
1 2
3
4
5
Data Orchestration for the Cloud
Java File API HDFS Interface S3 Interface REST APIPOSIX Interface
HDFS Driver Swift Driver S3 Driver NFS Driver
Independent scaling of compute & storage
Data Elasticity
with a unified
namespace
Abstract data silos & storage
systems to independently scale
data on-demand with compute
Run Spark, Hive, Presto, ML
workloads on your data
located anywhere
Accelerate big data
workloads with transparent
tiered local data
Data Accessibility
for popular APIs &
API translation
Data Locality
with Intelligent
Multi-tiering
Alluxio – Key innovations
Flexible APIs to Interact with data in Alluxio
Spark
Presto
POSIX
Java
> rdd = sc.textFile(“alluxio://localhost:19998/myInput”)
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/')
$ cat /mnt/alluxio/myInput
FileSystem fs = FileSystem.Factory.get();
FileInStream in = fs.openFile(new AlluxioURI("/myInput"));
10
Alluxio
MasterZookeeper
/ RAFT
Standby
Master
Alluxio
Worker
Alluxio
Worker
Alluxio Reference Architecture
…
…
Application
Application
Under Store 1
Under Store 2
Kubernetes Overview
Kubernetes (k8s) is...
an open-source system for automating deployment, scaling, and
management of containerized applications.
What we’ll cover
● Kubernetes Basics
● Alluxio Deployment Options
● Spark on Alluxio Demo
Container Orchestration Platform
● Abstract Physical Infrastructure
○ Platform Agnostic
○ On-premise, hybrid or in the public cloud
● Service Discovery
○ Networking abstraction
● Self-healing
○ Resilience to failures
● Secret Management
○ Management of sensitive credentials
● Storage Management
○ Lifecycle management tied to applications
Key Concepts
● Containers
○ Docker Image = Lightweight OS and application execution
environment
○ Container = Image once running on Docker Engine
● Pods
○ Schedulable unit of one or more containers
○ Containers share resources and network
● Controllers
○ Controls the desired state such as copies of a Pod
● PersistentVolumes
○ Storage provisioned by admin with lifecycle independent of a Pod
Some more K8s Basics...
● Declarative Specs
○ Configure, deploy and manage an application on k8s using a declarative
language
● Helm Charts
○ Thin wrapper over declarative specs
○ Reduce complexity using a single configuration file
● Operators
○ Another abstraction layer over declarative specs to
○ Built-in domain knowledge
○ Manage upgrades and improve troubleshooting
Alluxio on Kubernetes
Deploying Alluxio in K8s
Alluxio and Compute framework in different
pods on the same host
When do you use this?
- Compute, like Spark, is short running and
ephemeral
- Alluxio data orchestration & access layer is long
running and used across many jobs
Alluxio and Compute framework in the same
pod
When do you use this?
- Compute, like Presto, is long running
- Data tier with Alluxio needs to be scaled along
with compute tier
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkSpark
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkPresto
PodLegend:
Elastic Data for Elastic Compute with Tight Locality
● Data locality
○ Big data analytics or ML
○ Cache data close to compute
● Enable high-speed data sharing
across jobs
○ A closer staging storage layer for compute
jobs
● Unification of persistent storage
○ Same data abstraction across different
storage
Host
Alluxio
Spark
AlluxioAlluxio
Spark
Alluxio
SparkSpark
or
Alluxio
Master
Alluxio
Worker
Alluxio
Service
/journal
EBS
Volume
Persistent
Volume
Claim
Alluxio on K8s Architecture
Alluxio
Master
Spark
Driver
Alluxio
Worker
Spark
Executor
Alluxio
Service
/journal
EBS
Volume
Persistent
Volume
Claim
Spark on Alluxio on K8s
Deploying Alluxio
● Provision a Persistent Volume for Journal
$ kubectl create -f alluxio-journal-volume.yaml
● Specify Configuration
$ kubectl create -f alluxio-configMap.yaml
● Deploy Master
$ kubectl create -f alluxio-master.yaml
● Deploy Workers
$ kubectl create -f alluxio-worker.yaml
Alluxio Helm Chart
24
Now simplified…
$ cat << EOF > config.yaml
properties:
alluxio.mount.table.root.ufs: "<under_storage_address>
aws.accessKeyId: "<accessKey>"
aws.secretKey: "<secretKey>"
EOF
$ helm install -f config.yaml alluxio-repo/alluxio --version 2.1.0-SNAPSHOT
Demo:
Running Spark & Alluxio
on K8s
Summary
● An Overview of Data Orchestration
● Alluxio enables elastic data for elastic compute in Kubernetes
○ Data Locality on Demand, Data Abstraction & Unification, Data Sharing
● A guide to run Alluxio in Kubernetes environment
● A demo of running Spark on Alluxio in Kubernetes
26
Appendix
27
Kubernetes w/ Alluxio Components
● Containers
○ Master, Job Master, Worker, Job Worker
● Pods
○ Master, Worker
● Service
○ Master
● Daemon Sets
○ Workers
● Persistent Volumes
○ Master Journal, Worker Tiered Storage (optional)
● HostPath Volume
○ Worker Domain Socket Directory
28

More Related Content

What's hot

Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data Symphony
Alluxio, Inc.
 
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Alluxio, Inc.
 
Alluxio 2 Community Update
Alluxio 2 Community UpdateAlluxio 2 Community Update
Alluxio 2 Community Update
Alluxio, Inc.
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Alluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
Alluxio, Inc.
 
Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud Era
Alluxio, Inc.
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Alluxio, Inc.
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
Alluxio, Inc.
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
Alluxio, Inc.
 
Building Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, AlluxioBuilding Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, Alluxio
Alluxio, Inc.
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio, Inc.
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio, Inc.
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data Platforms
Alluxio, Inc.
 
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRideAlluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio, Inc.
 
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
OpenNebula Project
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
Alluxio, Inc.
 
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
NETWAYS
 
What’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data managementWhat’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data management
Alluxio, Inc.
 

What's hot (20)

Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data Symphony
 
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
Enterprise Distributed Query Service powered by Presto & Alluxio across cloud...
 
Alluxio 2 Community Update
Alluxio 2 Community UpdateAlluxio 2 Community Update
Alluxio 2 Community Update
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud Era
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Building Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, AlluxioBuilding Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, Alluxio
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data Platforms
 
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRideAlluxio + Spark: Accelerating Auto Data Tagging in WeRide
Alluxio + Spark: Accelerating Auto Data Tagging in WeRide
 
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
OpenNebulaConf2017EU: Enabling Dev and Infra teams by Lodewijk De Schuyter,De...
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
 
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
OpenNebula Conf 2014: Expanding OpenNebula´s support for Cloud Bursting - Emm...
 
What’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data managementWhat’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data management
 

Similar to Accelerating Spark with Kubernetes

Running Spark & Alluxio in Kubernetes
Running Spark & Alluxio in KubernetesRunning Spark & Alluxio in Kubernetes
Running Spark & Alluxio in Kubernetes
Alluxio, Inc.
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Alluxio, Inc.
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
Alluxio, Inc.
 
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
Alluxio, Inc.
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and Cloud
Alluxio, Inc.
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
Alluxio, Inc.
 
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Alluxio, Inc.
 
Enabling Ultra-fast Presto in the Cloud with Alluxio
Enabling Ultra-fast Presto in the Cloud with AlluxioEnabling Ultra-fast Presto in the Cloud with Alluxio
Enabling Ultra-fast Presto in the Cloud with Alluxio
Alluxio, Inc.
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Alluxio, Inc.
 
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsSimplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Alluxio, Inc.
 
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
Alluxio, Inc.
 
Deploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine LearningDeploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine Learning
Alluxio, Inc.
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
Alluxio, Inc.
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio, Inc.
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
Alluxio, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
Alluxio, Inc.
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Alluxio, Inc.
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Community
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AI
Alluxio, Inc.
 

Similar to Accelerating Spark with Kubernetes (20)

Running Spark & Alluxio in Kubernetes
Running Spark & Alluxio in KubernetesRunning Spark & Alluxio in Kubernetes
Running Spark & Alluxio in Kubernetes
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
 
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and Cloud
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
 
Enabling Ultra-fast Presto in the Cloud with Alluxio
Enabling Ultra-fast Presto in the Cloud with AlluxioEnabling Ultra-fast Presto in the Cloud with Alluxio
Enabling Ultra-fast Presto in the Cloud with Alluxio
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsSimplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
 
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
Building a Cloud Native Stack with EMR Spark, Alluxio, and S3
 
Deploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine LearningDeploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine Learning
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AI
 

More from Alluxio, Inc.

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
Alluxio, Inc.
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
Alluxio, Inc.
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
Alluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
Alluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
Alluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Alluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
Alluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
Alluxio, Inc.
 

More from Alluxio, Inc. (20)

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 

Recently uploaded

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 

Recently uploaded (20)

Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 

Accelerating Spark with Kubernetes

  • 1. Accelerating Spark with Kubernetes Adit Madan | Lead Engineer | adit@alluxio.com Dipti Borkar | Product | dipti@alluxio.com
  • 2. Agenda • Alluxio Overview • Kubernetes Basics • Alluxio Deployment Options • Spark + Alluxio on Kubernetes Demo
  • 4. The Alluxio Story Originated as Tachyon project, at UC Berkley AMPLab by then Ph.D. student & now Alluxio CTO, Haoyuan (H.Y.) Li.2013 2015 Open Source project established & company to commercialize Alluxio founded Goal: Orchestrate Data at Memory Speed for the Cloud for data driven apps such as Big Data Analytics, ML and AI. 20192018 2019 Top 10 Big Data 2019 Top 10 Cloud Software
  • 5. Fast-growing Open Source Community 4000+ Github Stars1000+ Contributors Join the community on Slack alluxio.io/slack Apache 2.0 Licensed Contribute to source code github.com/alluxio/alluxio
  • 6. Data Ecosystem - Beta Data Ecosystem 1.0 COMPUTE STORAGE STORAGE COMPUTE 6
  • 7. Co-located Data stack journey and innovation paths Co-located compute & HDFS on the same cluster Disaggregated compute & HDFS on the same cluster MR / Hive HDFS Hive HDFS Disaggregated Burst HDFS data in the cloud, public or private Support Presto, Spark and other computes without app changes Enable & accelerate big data on object stores Transition to Object store HDFS for Hybrid Cloud Support more frameworks § Typically compute-bound clusters over 100% capacity § Compute & I/O need to be scaled together even when not needed § Compute & I/O can be scaled independently but I/O still needed on HDFS which is expensive 1 2 3 4 5
  • 8. Data Orchestration for the Cloud Java File API HDFS Interface S3 Interface REST APIPOSIX Interface HDFS Driver Swift Driver S3 Driver NFS Driver Independent scaling of compute & storage
  • 9. Data Elasticity with a unified namespace Abstract data silos & storage systems to independently scale data on-demand with compute Run Spark, Hive, Presto, ML workloads on your data located anywhere Accelerate big data workloads with transparent tiered local data Data Accessibility for popular APIs & API translation Data Locality with Intelligent Multi-tiering Alluxio – Key innovations
  • 10. Flexible APIs to Interact with data in Alluxio Spark Presto POSIX Java > rdd = sc.textFile(“alluxio://localhost:19998/myInput”) CREATE SCHEMA hive.web WITH (location = 'alluxio://master:port/my-table/') $ cat /mnt/alluxio/myInput FileSystem fs = FileSystem.Factory.get(); FileInStream in = fs.openFile(new AlluxioURI("/myInput")); 10
  • 11. Alluxio MasterZookeeper / RAFT Standby Master Alluxio Worker Alluxio Worker Alluxio Reference Architecture … … Application Application Under Store 1 Under Store 2
  • 13. Kubernetes (k8s) is... an open-source system for automating deployment, scaling, and management of containerized applications.
  • 14. What we’ll cover ● Kubernetes Basics ● Alluxio Deployment Options ● Spark on Alluxio Demo
  • 15. Container Orchestration Platform ● Abstract Physical Infrastructure ○ Platform Agnostic ○ On-premise, hybrid or in the public cloud ● Service Discovery ○ Networking abstraction ● Self-healing ○ Resilience to failures ● Secret Management ○ Management of sensitive credentials ● Storage Management ○ Lifecycle management tied to applications
  • 16. Key Concepts ● Containers ○ Docker Image = Lightweight OS and application execution environment ○ Container = Image once running on Docker Engine ● Pods ○ Schedulable unit of one or more containers ○ Containers share resources and network ● Controllers ○ Controls the desired state such as copies of a Pod ● PersistentVolumes ○ Storage provisioned by admin with lifecycle independent of a Pod
  • 17. Some more K8s Basics... ● Declarative Specs ○ Configure, deploy and manage an application on k8s using a declarative language ● Helm Charts ○ Thin wrapper over declarative specs ○ Reduce complexity using a single configuration file ● Operators ○ Another abstraction layer over declarative specs to ○ Built-in domain knowledge ○ Manage upgrades and improve troubleshooting
  • 19. Deploying Alluxio in K8s Alluxio and Compute framework in different pods on the same host When do you use this? - Compute, like Spark, is short running and ephemeral - Alluxio data orchestration & access layer is long running and used across many jobs Alluxio and Compute framework in the same pod When do you use this? - Compute, like Presto, is long running - Data tier with Alluxio needs to be scaled along with compute tier Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkSpark Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkPresto PodLegend:
  • 20. Elastic Data for Elastic Compute with Tight Locality ● Data locality ○ Big data analytics or ML ○ Cache data close to compute ● Enable high-speed data sharing across jobs ○ A closer staging storage layer for compute jobs ● Unification of persistent storage ○ Same data abstraction across different storage Host Alluxio Spark AlluxioAlluxio Spark Alluxio SparkSpark or
  • 23. Deploying Alluxio ● Provision a Persistent Volume for Journal $ kubectl create -f alluxio-journal-volume.yaml ● Specify Configuration $ kubectl create -f alluxio-configMap.yaml ● Deploy Master $ kubectl create -f alluxio-master.yaml ● Deploy Workers $ kubectl create -f alluxio-worker.yaml
  • 24. Alluxio Helm Chart 24 Now simplified… $ cat << EOF > config.yaml properties: alluxio.mount.table.root.ufs: "<under_storage_address> aws.accessKeyId: "<accessKey>" aws.secretKey: "<secretKey>" EOF $ helm install -f config.yaml alluxio-repo/alluxio --version 2.1.0-SNAPSHOT
  • 25. Demo: Running Spark & Alluxio on K8s
  • 26. Summary ● An Overview of Data Orchestration ● Alluxio enables elastic data for elastic compute in Kubernetes ○ Data Locality on Demand, Data Abstraction & Unification, Data Sharing ● A guide to run Alluxio in Kubernetes environment ● A demo of running Spark on Alluxio in Kubernetes 26
  • 28. Kubernetes w/ Alluxio Components ● Containers ○ Master, Job Master, Worker, Job Worker ● Pods ○ Master, Worker ● Service ○ Master ● Daemon Sets ○ Workers ● Persistent Volumes ○ Master Journal, Worker Tiered Storage (optional) ● HostPath Volume ○ Worker Domain Socket Directory 28