Alluxio on Kubernetes
Powering training through Container Storage Interface (CSI) plugin
Alluxio Fuse
Alluxio on Kubernetes
[Diagram: an Alluxio Master pod (Master container, Job Master container) and two Alluxio Worker pods (Worker container, Job Worker container), each Worker pod paired with an Alluxio Fuse pod (Fuse container).]
DaemonSet: Worker & Fuse
[Diagram: Host Machine 1, Host Machine 2, and Host Machine 3 each run one Alluxio Worker pod (Worker container, Job Worker container) and one Alluxio Fuse pod (Fuse container).]
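The DaemonSet layout can be sketched as a manifest built in Python. This is a minimal illustration, not the manifest that the Alluxio Helm chart generates: the resource name, image, and labels are hypothetical placeholders, and only standard Kubernetes DaemonSet fields are used.

```python
import json


def fuse_daemonset(name="alluxio-fuse", image="example/alluxio-fuse:latest"):
    """Build a DaemonSet manifest so that one Fuse pod runs on every host machine.

    The name, image, and labels are hypothetical placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "fuse", "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(fuse_daemonset(), indent=2))
```

Because a DaemonSet schedules one pod per node, every host gets a Fuse pod whether or not an application on that host uses it.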
Application
[Diagram: on one host machine, an Alluxio Worker pod (Worker container, Job Worker container) and an Alluxio Fuse pod (Fuse container). The Fuse pod mounts a volume on the host; the application pod (Application container) mounts the same volume and requests data through it.]
Advantages
1. Better performance because of data locality.
2. Easy to deploy: `helm install` starts the whole Alluxio cluster.
Challenges
1. The Fuse pod is not always needed, yet it always takes up resources. Example: data preprocessing.
2. In some workloads the Fuse pod is not needed on some machines in the cluster. Handling this takes a lot of manual work.
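The cost of always-on Fuse pods can be made concrete with a small calculation. The node counts and the per-pod memory figure below are made-up numbers for illustration only; they do not come from the talk.

```python
def idle_fuse_memory_gib(total_nodes, nodes_with_apps, fuse_mem_gib):
    """Memory reserved by Fuse pods on nodes where no application reads through Fuse."""
    if not 0 <= nodes_with_apps <= total_nodes:
        raise ValueError("nodes_with_apps must be between 0 and total_nodes")
    return (total_nodes - nodes_with_apps) * fuse_mem_gib


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, applications on 3 of them, 4 GiB per Fuse pod.
    print(idle_fuse_memory_gib(10, 3, 4))  # 28
```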
Before CSI on Kubernetes
[Diagram: Pod, Volume, then the storage backends: AWS EBS, AzureDisk, AzureFile, CephFS, …]
CSI on Kubernetes
[Diagram: Pod, Volume, CSI, then the storage backends: AWS EBS, AzureDisk, AzureFile, CephFS, …]
More than 100 existing CSI drivers
Alluxio CSI Driver v1.0.0
[Diagram, built up over three slides: the host machine runs the Alluxio CSI Driver components; a persistent volume + claim is added; the application pod (Application container) mounts the persistent volume + claim, and the CSI components mount it together with an Alluxio Fuse process.]
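The persistent volume + claim wiring can be sketched with standard Kubernetes fields. This is an illustration under assumptions, not the manifests shipped with the Alluxio CSI driver: the driver name, volume handle, resource names, image, and mount path are hypothetical placeholders.

```python
def pv_pvc_pod(driver="csi.alluxio.example", size="100Gi"):
    """Return (PersistentVolume, PersistentVolumeClaim, Pod) manifests as dicts.

    The driver name and all resource names are hypothetical placeholders.
    """
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "alluxio-pv"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # The csi block hands mounting over to the named CSI driver.
            "csi": {"driver": driver, "volumeHandle": "alluxio"},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "alluxio-pvc"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",  # bind statically to the volume named below
            "volumeName": pv["metadata"]["name"],
            "resources": {"requests": {"storage": size}},
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "training-app"},
        "spec": {
            "containers": [{
                "name": "application",
                "image": "example/training:latest",
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
            }],
        },
    }
    return pv, pvc, pod
```

The application pod only references the claim; it does not need to know that a Fuse process backs the mount.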
Advantages of Alluxio CSI Driver v1.0.0
1. The Fuse process lifecycle is the same as the application pod's, which solves our previous challenges.
2. Easy to deploy: once CSI is deployed with `helm install`, starting the Fuse process is automated.
Challenges of Alluxio CSI Driver v1.0.0
1. Fuse processes reside in one of the CSI components, the nodeserver.
   What if the nodeserver is down?
   What if the nodeserver needs to be upgraded?
[Diagram: the same host machine shown without the CSI components and the Alluxio Fuse process; only the application pod (Application container) and its persistent volume + claim mount remain.]
Alluxio CSI Driver v1.1.0
[Diagram, built up over two slides: the host machine has a persistent volume + claim and the CSI components; the CSI components then create a separate Alluxio Fuse pod (Fuse container). Both the Fuse pod and the application pod (Application container) mount the persistent volume + claim.]
[Diagram: the Alluxio Fuse pod (Fuse container) and the application pod (Application container) both mount the persistent volume + claim on the host machine, shown without the CSI components.]
What if nodeserver is down?
What if nodeserver needs to be upgraded?
Problem solved!
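The difference between the two driver versions can be illustrated with a toy model. It assumes, as the slides indicate, that in v1.0.0 the Fuse processes live inside the nodeserver while in v1.1.0 they run in their own pod. The class and method names are hypothetical and have nothing to do with the driver's real code.

```python
from dataclasses import dataclass, field


@dataclass
class HostMachine:
    """Toy model of one host; fuse_in_nodeserver=True mimics v1.0.0."""
    fuse_in_nodeserver: bool
    active_mounts: set = field(default_factory=set)

    def mount(self, app_name):
        # An application pod starts and gets a Fuse-backed mount.
        self.active_mounts.add(app_name)

    def restart_nodeserver(self):
        # v1.0.0: Fuse processes are children of the nodeserver and die with it.
        # v1.1.0: Fuse runs in a separate pod, so existing mounts are unaffected.
        if self.fuse_in_nodeserver:
            self.active_mounts.clear()


if __name__ == "__main__":
    for version, embedded in (("v1.0.0", True), ("v1.1.0", False)):
        host = HostMachine(fuse_in_nodeserver=embedded)
        host.mount("training-app")
        host.restart_nodeserver()
        print(version, sorted(host.active_mounts))
```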
Next Steps
1. Fuse pod resource allocation
2. In some cases, different applications always use different Fuse processes
Acknowledgement
Community users
● Kevin Cai@AntFin
● Hui Fei, Haoning Sun@Shopee
● Binyang Li@Microsoft
Thanks for watching
Questions?
