Eric Li
Senior Architect of Alibaba Cloud
Agenda
• Why Alluxio On Kubernetes
• Brief introduction to Alibaba Cloud Kubernetes
• Challenges
• Alluxio Helm chart
• Contribution to Alluxio
• Best practice
• Known issues
Kubernetes: Cloud Native OS
Web/mobile applications
- Stateless
- Idempotent
- Horizontally scalable
MySQL, Kafka, TiDB, Elasticsearch, TensorFlow, Spark, Flink, Redis, ZooKeeper
Stateless -> StatefulSet (Enterprise App) -> Data Intelligence
The first choice for training AI models with 32/64/128 V100 GPUs
Why Alluxio on Kubernetes ?
More and more data-driven applications run on Kubernetes
Unified Orchestration
Consistent, declarative provisioning
Fastest Growing Community
Disaggregated compute and storage is becoming mainstream in cloud
Flexible
Scalable
Easy to maintain
But data access for applications in Kubernetes is a bottleneck
Adaptation to different storage systems and computation frameworks
Speed
Efficiency
Overview of ACK
The Challenges of Alluxio + Kubernetes
How to deploy Alluxio the Kubernetes way?
How to access data without changing the application?
How to achieve the best performance of Alluxio in Kubernetes?
The Challenges of Alluxio + Kubernetes
Helm/Operator
UFS and POSIX FUSE, lazy loading from OSS
Optimize OSS SDK and short circuit
Alluxio On Kubernetes Architecture
[Architecture diagram]
- Master: the Alluxio Master and Job Master run in one Pod managed by a StatefulSet.
- Each node: an Alluxio Worker DaemonSet Pod (Worker and Job Worker, backed by RAM/SSD/HDD) and an Alluxio Fuse DaemonSet Pod (alluxio-fuse).
- Training jobs on each node (Caffe, MXNet, TensorFlow) read through the fuse mount, which uses short circuit to reach the local Worker.
- A ConfigMap carries ALLUXIO_JAVA_OPTS, ALLUXIO_WORKER_JAVA_OPTS, and ALLUXIO_MASTER_JAVA_OPTS.
OSS SDK Optimization for Alluxio
[Bar chart: time cost of loading ImageNet (143 GB), in minutes, for ossfuse, ossutil, and Alluxio]
One-click Installation with Helm
value file of Helm Chart:
An application-specific YAML file
Custom free
Simple to deploy
Easy to share through helm repo
Next step: move to an Operator
Usage of Alluxio Helm Chart
$ cat << EOF > config.yaml
properties:
  fs.oss.accessKeyId: xxx
  fs.oss.accessKeySecret: yyy
  alluxio.master.mount.table.root.ufs: oss://imagenet-huabei5/
EOF
# One-click install
$ helm install -f config.yaml alluxio-repo/alluxio --version 2.1.0-SNAPSHOT
# Preload the data
$ helm install --set dir=/images --set threads=54 alluxio-job
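The values file on this slide could also be generated rather than typed by hand. The following is a minimal sketch, assuming a flat `properties:` map as shown on the slide; `render_values` and the environment variable names `OSS_ACCESS_KEY_ID` and `OSS_ACCESS_KEY_SECRET` are hypothetical and not part of the Alluxio Helm chart.

```python
import json
import os


def render_values(properties):
    """Render a flat dict of Alluxio properties as a Helm values fragment.

    Values are emitted as double-quoted strings (valid YAML) so that
    characters such as ':' or '#' in a value cannot change its meaning.
    """
    lines = ["properties:"]
    for key in sorted(properties):
        # Two-space indentation nests each property under `properties:`.
        lines.append(f"  {key}: {json.dumps(str(properties[key]))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders match the slide.
    props = {
        "fs.oss.accessKeyId": os.environ.get("OSS_ACCESS_KEY_ID", "xxx"),
        "fs.oss.accessKeySecret": os.environ.get("OSS_ACCESS_KEY_SECRET", "yyy"),
        "alluxio.master.mount.table.root.ufs": "oss://imagenet-huabei5/",
    }
    print(render_values(props), end="")
```

Redirecting the output to config.yaml gives a file that can be passed to `helm install -f` as on the slide, without writing the credentials into shell history.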
Why Choose Alluxio for HPC
- Alibaba OSS accessed directly: poor performance, poor scalability
- CPFS with Alibaba OSS: good performance, good scalability, explicit copying, expensive
- Alluxio with Alibaba OSS: good performance, good scalability, lazy load, cheap!
Arena for Deep Learning Training
https://github.com/kubeflow/arena
Kubernetes / Docker
Kubeflow
arena CLI
Other backends CRD
Arena
TensorFlow, Caffe, PyTorch, MPI, Horovod
CPU/GPU/FPGA Ethernet/RDMA Hadoop/OSS/CPFS
Flink, Spark
Run Deep Learning Job with Alluxio
$ arena submit mpi \
    --name alluxio-4x8-cold \
    --gpus=8 \
    --workers=4 \
    --data-dir /alluxio-fuse/images:/data/imagenet \
    -e DATA_DIR=/data/imagenet \
    --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/perseus-benchmark \
    ./launch-example.sh 4 8
2019-10-24T07:51:42.021611213Z ----------------------------------------------------------------
2019-10-24T07:51:42.024245962Z 1000 images/sec: 234.2 +/- 0.7 (jitter = 8.3) 5.781
2019-10-24T07:51:42.024259919Z ----------------------------------------------------------------
2019-10-24T07:51:42.024264488Z total images/sec: 7492.44
2019-10-24T07:51:42.024267687Z ----------------------------------------------------------------
100% Faster (alluxio-fuse vs ossgw-nfs)
Training throughput, Alluxio vs. OSS (ResNet50, batch size 128), in images/second:

GPUs  ossmounter  alluxio-fuse
1     309.79      209.82
4     569.8       1154.8
8     699.2       2244.3
16    1349.87     3868.79
32    3478.98     7492.44
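The "100% faster" headline can be checked against the figures on this slide. This is a small sketch; the throughput numbers are copied from the chart, and the helper name `speedups` is only illustrative.

```python
# Throughput in images/second, copied from the chart on this slide.
GPUS = [1, 4, 8, 16, 32]
OSSMOUNTER = [309.79, 569.8, 699.2, 1349.87, 3478.98]
ALLUXIO_FUSE = [209.82, 1154.8, 2244.3, 3868.79, 7492.44]


def speedups(baseline, candidate):
    """Return candidate/baseline throughput ratios, one per GPU count."""
    return [c / b for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    for gpus, ratio in zip(GPUS, speedups(OSSMOUNTER, ALLUXIO_FUSE)):
        print(f"{gpus:>2} GPUs: alluxio-fuse is {ratio:.2f}x ossmounter")
```

With these numbers the ratio at 32 GPUs is about 2.15x, while at 1 GPU alluxio-fuse is slower (about 0.68x), so the headline describes the multi-GPU runs.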
50% Faster(alluxio-fuse vs ossfs-fuse)
Training throughput, Alluxio vs. OSS (ResNet50, batch size 128), in images/second:

GPUs  oss       alluxio-fuse
1     284.05    209.82
4     833.6     1154.8
8     1312.02   2244.3
16    2685.07   3868.79
32    5054.61   7492.44
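Beyond the raw ratio, the same figures show how well each option scales with GPU count. A minimal sketch, assuming scaling efficiency is defined as measured throughput divided by the 1-GPU throughput times the number of GPUs; the numbers are copied from this slide and `scaling_efficiency` is an illustrative name.

```python
# Throughput in images/second, copied from the chart on this slide.
GPUS = [1, 4, 8, 16, 32]
OSSFS_FUSE = [284.05, 833.6, 1312.02, 2685.07, 5054.61]
ALLUXIO_FUSE = [209.82, 1154.8, 2244.3, 3868.79, 7492.44]


def scaling_efficiency(gpus, throughput):
    """Throughput relative to perfect linear scaling from the first entry."""
    per_gpu_base = throughput[0] / gpus[0]
    return [t / (g * per_gpu_base) for g, t in zip(gpus, throughput)]


if __name__ == "__main__":
    for name, series in (("oss", OSSFS_FUSE), ("alluxio-fuse", ALLUXIO_FUSE)):
        eff = scaling_efficiency(GPUS, series)
        print(name, " ".join(f"{e:.2f}" for e in eff))
```

At 32 GPUs this gives roughly 0.56 for oss and 1.12 for alluxio-fuse. An efficiency above 1 may indicate that the single-GPU alluxio-fuse run was itself limited by data access rather than compute.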
HPC: Genomic Computing on Kubernetes
CSI PVC
Users submit pipeline
IO Features
1. Small number of files (100)
2. High throughput
3. Intensive requests (10,000 per second)
4. Frequently reads the same reference data (50 GB) in different pipelines
Read/Write Intensive throughput
- Leverage Alluxio to Reduce read IO for reference data
Best Practice – Cont.
1. If the data size is smaller than the whole cache (memory + SSD), use
LocalFirstAvoidEvictionPolicy to avoid frequently swapping data from disk to memory.
2. If the data size is larger than the whole cache, keep the default eviction behavior.
Cache Policy Tradeoff
alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
alluxio.user.block.avoid.eviction.policy.reserved.size.bytes: 8GB
alluxio.worker.tieredstore.level0.dirs.path: /dev/shm,/var/lib/docker/alluxio-ssd
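The rule of thumb on this slide can be written down as a small decision helper. This is only a sketch: `cache_properties` is a hypothetical function, the sizes are whatever totals you measure for your own cluster, and the two property names and the 8GB reserve are taken from the slide.

```python
GIB = 1024 ** 3


def cache_properties(dataset_bytes, mem_bytes, ssd_bytes):
    """Return Alluxio property overrides following the slide's rule of thumb.

    If the dataset fits in the whole cache (memory + SSD), prefer
    LocalFirstAvoidEvictionPolicy; otherwise return no overrides so the
    default eviction behavior stays in place.
    """
    if dataset_bytes < mem_bytes + ssd_bytes:
        return {
            "alluxio.user.ufs.block.read.location.policy":
                "alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy",
            "alluxio.user.block.avoid.eviction.policy.reserved.size.bytes": "8GB",
        }
    return {}


if __name__ == "__main__":
    # Example sizes are assumptions: a 143 GiB dataset, 64 GiB RAM, 200 GiB SSD.
    for key, value in cache_properties(143 * GIB, 64 * GIB, 200 * GIB).items():
        print(f"{key}: {value}")
```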
ResNet50 : UnderFileSystemBlockReader failure
170 images/sec: 191.7 +/- 2.5 (jitter = 36.8) 7.467
170 images/sec: 191.7 +/- 2.5 (jitter = 36.8) 7.299
### Pin + No Eviction, data size exceeds the pool size
450 images/sec: 91.8 +/- 3.0 (jitter = 40.1) 5.701
450 images/sec: 91.8 +/- 3.0 (jitter = 40.5) 5.487
650 images/sec: 75.5 +/- 2.9 (jitter = 35.2) 5.455
650 images/sec: 75.5 +/- 2.9 (jitter = 33.1) 5.776
ResNet50: Eviction
950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.197
950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.214
### No pin and no eviction, Eviction happened
990 images/sec: 191.1 +/- 1.3 (jitter = 23.5) 6.234
1000 images/sec: 189.5 +/- 1.3 (jitter = 23.5) 6.171
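The throughput drop in logs like these can be measured rather than read by eye. A minimal sketch, assuming each benchmark line starts with a step number followed by `images/sec:` and the running throughput, as in the lines on this slide; `parse_throughput` and `relative_drop` are illustrative names.

```python
import re

# Matches lines such as "950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.197".
LINE = re.compile(r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\b")


def parse_throughput(line):
    """Return (step, images_per_sec) for a benchmark line, else None."""
    match = LINE.match(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def relative_drop(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    log = [
        "950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.197",
        "### No pin and no eviction, Eviction happened",
        "1000 images/sec: 189.5 +/- 1.3 (jitter = 23.5) 6.171",
    ]
    points = [p for p in map(parse_throughput, log) if p is not None]
    first, last = points[0][1], points[-1][1]
    print(f"drop: {relative_drop(first, last) * 100:.1f}%")
```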
Short Circuit with LocalVolume
Tiered storage
capacity, medium type and quota
hostPath or emptyDir
Different choice of short circuit
Unix socket for gRPC
Shared hostPath volume for fuse
DL: Avoid Passive Cache
Training data is distributed across the Alluxio cluster; the client does not need to
copy it to the local node.
Passive vs. active caching
Worker configuration: turn off passive cache
alluxio.user.file.passive.cache.enabled: false
Not So Cloud Native Yet
• Health check and availability check
• How to leverage API to detect health of fuse and worker?
• Missing Liveness Probe and Readiness Probe
• Observability support for Prometheus
• fs report metrics exporter
• Grafana dashboard
• Data cache aware scheduling
• Scheduler locality according to block host
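The idea behind cache-aware scheduling can be sketched as a ranking of candidate nodes by how many of a job's blocks they already hold. This is an illustration only, not a Kubernetes scheduler plugin: `rank_nodes_by_cached_blocks` and the node names are hypothetical, and the block-to-host mapping is assumed to have been obtained from Alluxio beforehand.

```python
from collections import Counter


def rank_nodes_by_cached_blocks(block_locations, candidate_nodes):
    """Order candidate nodes by the number of the job's blocks they cache.

    block_locations maps a block id to the hosts that hold it.
    Ties are broken by node name so the result is deterministic.
    """
    candidates = set(candidate_nodes)
    counts = Counter()
    for hosts in block_locations.values():
        for host in hosts:
            if host in candidates:
                counts[host] += 1
    return sorted(candidates, key=lambda node: (-counts[node], node))


if __name__ == "__main__":
    # Hypothetical block placement for one training job.
    locations = {
        "block-1": ["node-a", "node-b"],
        "block-2": ["node-b"],
        "block-3": ["node-c"],
    }
    print(rank_nodes_by_cached_blocks(locations, ["node-a", "node-b", "node-d"]))
```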
Known Issues
1. Performance degrades by 10%-20% during data eviction.
2. Append write
3. Intensive Write
OOM for JVM/OS
1. Node specifications differ (high- and low-end nodes, 8c16G/8c32G), so distributed memory must be used effectively
Fuse memory/worker memory
2. FUSE process memory consumption is high
jvmOptions: " -XX:MaxDirectMemorySize=16g "
Bug: https://github.com/Alluxio/alluxio/issues/9525
3. Alluxio's caching strategy: can it retain the most frequently accessed data?
alluxio.worker.evictor.class=alluxio.worker.block.evictor.LRUEvictor
4. Data refresh strategy?
alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
Take away
1. DL: If the sample data is smaller than the whole cache (memory + SSD), avoid
frequently swapping data from disk to memory.
2. DL: If the sample data is larger than the whole cache, keep the default
eviction behavior.
3. With an SSD tier, enable short circuit with a local volume.
4. HPC: With object storage, Alluxio currently accelerates reads only.
5. HPC: For small worker nodes, disable passive cache.
6. HPC: Always keep frequently accessed data in the memory tier.
7. Use Kubernetes scheduler locality for MPI/PS jobs.
THANK YOU
29

More Related Content

What's hot

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio, Inc.
 
Building Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, AlluxioBuilding Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, Alluxio
Alluxio, Inc.
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3
Alluxio, Inc.
 
How to Build a new under filesystem in Alluxio: Apache Ozone as an example
How to Build a new under filesystem in Alluxio: Apache Ozone as an exampleHow to Build a new under filesystem in Alluxio: Apache Ozone as an example
How to Build a new under filesystem in Alluxio: Apache Ozone as an example
Alluxio, Inc.
 
Securely Enhancing Data Access in Hybrid Cloud with Alluxio
Securely Enhancing Data Access in Hybrid Cloud with AlluxioSecurely Enhancing Data Access in Hybrid Cloud with Alluxio
Securely Enhancing Data Access in Hybrid Cloud with Alluxio
Alluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
Alluxio, Inc.
 
Running Spark & Alluxio in Kubernetes
Running Spark & Alluxio in KubernetesRunning Spark & Alluxio in Kubernetes
Running Spark & Alluxio in Kubernetes
Alluxio, Inc.
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Alluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
Alluxio, Inc.
 
Hands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data ManagementHands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data Management
Alluxio, Inc.
 
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
Alluxio, Inc.
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Alluxio, Inc.
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio, Inc.
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Alluxio, Inc.
 
Alluxio Architecture and Performance
Alluxio Architecture and PerformanceAlluxio Architecture and Performance
Alluxio Architecture and Performance
Alluxio, Inc.
 
Presto on Alluxio Hands-On Lab
Presto on Alluxio Hands-On LabPresto on Alluxio Hands-On Lab
Presto on Alluxio Hands-On Lab
Alluxio, Inc.
 
Deploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine LearningDeploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine Learning
Alluxio, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
Alluxio, Inc.
 

What's hot (20)

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
 
Building Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, AlluxioBuilding Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, Alluxio
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3
 
How to Build a new under filesystem in Alluxio: Apache Ozone as an example
How to Build a new under filesystem in Alluxio: Apache Ozone as an exampleHow to Build a new under filesystem in Alluxio: Apache Ozone as an example
How to Build a new under filesystem in Alluxio: Apache Ozone as an example
 
Securely Enhancing Data Access in Hybrid Cloud with Alluxio
Securely Enhancing Data Access in Hybrid Cloud with AlluxioSecurely Enhancing Data Access in Hybrid Cloud with Alluxio
Securely Enhancing Data Access in Hybrid Cloud with Alluxio
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Running Spark & Alluxio in Kubernetes
Running Spark & Alluxio in KubernetesRunning Spark & Alluxio in Kubernetes
Running Spark & Alluxio in Kubernetes
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Hands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data ManagementHands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data Management
 
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 mi...
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
Alluxio Architecture and Performance
Alluxio Architecture and PerformanceAlluxio Architecture and Performance
Alluxio Architecture and Performance
 
Presto on Alluxio Hands-On Lab
Presto on Alluxio Hands-On LabPresto on Alluxio Hands-On Lab
Presto on Alluxio Hands-On Lab
 
Deploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine LearningDeploying Alluxio in the Cloud for Machine Learning
Deploying Alluxio in the Cloud for Machine Learning
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 

Similar to Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series SwitchesTechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
Robb Boyd
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Danielle Womboldt
 
2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL
dgoodell
 
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebula Project
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
Nordic APIs
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Community
 
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
Andrey Korolyov
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
DataWorks Summit
 
Unleash oracle 12c performance with cisco ucs
Unleash oracle 12c performance with cisco ucsUnleash oracle 12c performance with cisco ucs
Unleash oracle 12c performance with cisco ucs
solarisyougood
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
Ryousei Takano
 
The Enhanced Cisco Container Platform
The Enhanced Cisco Container PlatformThe Enhanced Cisco Container Platform
The Enhanced Cisco Container Platform
Robb Boyd
 
Minikube – get Connections in the smalles possible setup
Minikube – get Connections in the smalles possible setupMinikube – get Connections in the smalles possible setup
Minikube – get Connections in the smalles possible setup
Martin Schmidt
 
Datasheet - Pivot3 - HCI Family
Datasheet - Pivot3 - HCI FamilyDatasheet - Pivot3 - HCI Family
Datasheet - Pivot3 - HCI Family
Grant Aitken
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
NETWAYS
 
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
Cloud Native Day Tel Aviv
 
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
vdmchallenge
 
Azure Kubernetes Service - benefits and challenges
Azure Kubernetes Service - benefits and challengesAzure Kubernetes Service - benefits and challenges
Azure Kubernetes Service - benefits and challenges
Wojciech Barczyński
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
Gina Tragos
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
PT Datacomm Diangraha
 

Similar to Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes (20)

TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series SwitchesTechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
TechWiseTV Workshop: Application Hosting on Catalyst 9000 Series Switches
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL
 
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
OpenNebulaConf 2016 - OpenNebula, a story about flexibility and technological...
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
Unleash oracle 12c performance with cisco ucs
Unleash oracle 12c performance with cisco ucsUnleash oracle 12c performance with cisco ucs
Unleash oracle 12c performance with cisco ucs
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
The Enhanced Cisco Container Platform
The Enhanced Cisco Container PlatformThe Enhanced Cisco Container Platform
The Enhanced Cisco Container Platform
 
Minikube – get Connections in the smalles possible setup
Minikube – get Connections in the smalles possible setupMinikube – get Connections in the smalles possible setup
Minikube – get Connections in the smalles possible setup
 
Datasheet - Pivot3 - HCI Family
Datasheet - Pivot3 - HCI FamilyDatasheet - Pivot3 - HCI Family
Datasheet - Pivot3 - HCI Family
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
 
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - E...
 
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
#VirtualDesignMaster 3 Challenge 2 - Harshvardhan Gupta
 
Azure Kubernetes Service - benefits and challenges
Azure Kubernetes Service - benefits and challengesAzure Kubernetes Service - benefits and challenges
Azure Kubernetes Service - benefits and challenges
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 

More from Alluxio, Inc.

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
Alluxio, Inc.
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
Alluxio, Inc.
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
Alluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
Alluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
Alluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML

Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

  • 1. Eric Li, Senior Architect, Alibaba Cloud
  • 2. Agenda
      Why Alluxio on Kubernetes
      Brief introduction to Alibaba Cloud Kubernetes
      Challenges
      Alluxio Helm chart
      Contribution to Alluxio
      Best practice
      Known issues
  • 3. Kubernetes: Cloud Native OS
      Web/mobile applications: stateless, idempotent, horizontally scalable
      MySQL, Kafka, TiDB, Elasticsearch, TensorFlow, Spark, Flink, Redis, ZooKeeper
      Stateless -> StatefulSet (Enterprise App) -> Data Intelligence
      The first choice for training AI models with 32/64/128 V100 GPUs
  • 4. Why Alluxio on Kubernetes?
      More and more data-driven applications run on Kubernetes: unified orchestration; consistent, declarative provisioning; fastest-growing community
      Disaggregated compute and storage is becoming mainstream in the cloud: flexible, scalable, easy to maintain
      But data access for applications in Kubernetes is a bottleneck: adaptation to different storage systems and computation frameworks, speed, efficiency
  • 5. Overview of ACK [Figure: architecture diagram]
  • 6. The Challenges of Alluxio + Kubernetes
      How to deploy Alluxio the Kubernetes way?
      How to access data without changing the application?
      How to achieve the best performance of Alluxio in Kubernetes?
  • 7. The Challenges of Alluxio + Kubernetes
      Helm/Operator
      UFS and POSIX FUSE, lazy loading from OSS
      Optimize the OSS SDK and short circuit
  • 8. Alluxio on Kubernetes Architecture
      [Diagram] Three nodes, each running a training job (Caffe, MXNet, or TensorFlow), an Alluxio-fuse pod, and a worker pod (Worker and Job Worker) backed by RAM/SSD/HDD. Jobs read through fuse, with short circuit on the node.
      Alluxio Worker: DaemonSet. Alluxio Fuse: DaemonSet. Master pod (Master and Job Master): StatefulSet.
      ConfigMap: ALLUXIO_JAVA_OPTS, ALLUXIO_WORKER_JAVA_OPTS, ALLUXIO_MASTER_JAVA_OPTS
  • 9. OSS SDK Optimization for Alluxio
      [Bar chart: time cost of data load of ImageNet (143 GB), in minutes, for ossfuse, ossutil, and Alluxio]
  • 10. One-click Installation with Helm
      Value file of the Helm chart: an application-specific YAML file
      Custom free
      Simple to deploy
      Easy to share through a Helm repo
      Move to an Operator as the next step
  • 11. Usage of Alluxio Helm Chart
      $ cat << EOF > config.yaml
      properties:
        fs.oss.accessKeyId: xxx
        fs.oss.accessKeySecret: yyy
        alluxio.master.mount.table.root.ufs: oss://imagenet-huabei5/
      EOF
      # One click install
      $ helm install -f config.yaml alluxio-repo/alluxio --version 2.1.0-SNAPSHOT
      # Preload the data
      $ helm install --set dir=/images --set threads=54 alluxio-job
  • 12. Why Choose Alluxio for HPC
      Alibaba OSS alone: poor performance, poor scalability
      CPFS in front of Alibaba OSS: good performance, good scalability, explicit copying, expensive
      Alluxio in front of Alibaba OSS: good performance, good scalability, lazy load, cheap
  • 13. Arena for Deep Learning Training
      https://github.com/kubeflow/arena
      [Stack diagram] arena CLI; Arena; Kubeflow; CRD; other backends; Kubernetes / Docker
      Frameworks: TensorFlow, Caffe, PyTorch, MPI, Horovod; Flink, Spark
      Resources: CPU/GPU/FPGA; Ethernet/RDMA; Hadoop/OSS/CPFS
  • 14. Run Deep Learning Job with Alluxio
      $ arena submit mpi --name alluxio-4x8-cold --gpus=8 --workers=4 --data-dir /alluxio-fuse/images:/data/imagenet -e DATA_DIR=/data/imagenet --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/perseus-benchmark ./launch-example.sh 4 8
      2019-10-24T07:51:42.021611213Z ----------------------------------------------------------------
      2019-10-24T07:51:42.024245962Z 1000 images/sec: 234.2 +/- 0.7 (jitter = 8.3) 5.781
      2019-10-24T07:51:42.024259919Z ----------------------------------------------------------------
      2019-10-24T07:51:42.024264488Z total images/sec: 7492.44
      2019-10-24T07:51:42.024267687Z ----------------------------------------------------------------
  • 15. 100% Faster (alluxio-fuse vs ossgw-nfs)
      Training throughput, Alluxio vs. OSS (ResNet50, batch size 128), in images/second:
      GPUs   ossmounter   alluxio-fuse
      1      309.79       209.82
      4      569.8        1154.8
      8      699.2        2244.3
      16     1349.87      3868.79
      32     3478.98      7492.44
  • 16. 50% Faster (alluxio-fuse vs ossfs-fuse)
      Training throughput, Alluxio vs. OSS (ResNet50, batch size 128), in images/second:
      GPUs   oss       alluxio-fuse
      1      284.05    209.82
      4      833.6     1154.8
      8      1312.02   2244.3
      16     2685.07   3868.79
      32     5054.61   7492.44
  • 17. HPC: Genomic Computing on Kubernetes
      [Diagram; recoverable labels: CSI, PVC, "Users submit pipeline"]
  • 18. I/O Characteristics
      1. Small number of files (100)
      2. High throughput
      3. Intensive requests (1W/s, i.e. about 10,000 per second)
      4. The same reference data (50 GB) is read frequently by different pipelines
  • 19. Read/Write-Intensive Throughput
      Leverage Alluxio to reduce read I/O for reference data
  • 20. Best Practice (cont.): Cache Policy Tradeoff
      1. If the data is smaller than the whole cache (mem + SSD), use LocalFirstAvoidEvictionPolicy to avoid frequently swapping data from disk to memory.
      2. If the data is larger than the whole cache, keep the default eviction behavior.
      alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
      alluxio.user.block.avoid.eviction.policy.reserved.size.bytes: 8GB
      alluxio.worker.tieredstore.level0.dirs.path=/dev/shm,/var/lib/docker/alluxio-ssd
  • 21. ResNet50: UnderFileSystemBlockReader failure
      170 images/sec: 191.7 +/- 2.5 (jitter = 36.8) 7.467
      170 images/sec: 191.7 +/- 2.5 (jitter = 36.8) 7.299
      ### Pin + no eviction, data size exceeds the pool size
      450 images/sec: 91.8 +/- 3.0 (jitter = 40.1) 5.701
      450 images/sec: 91.8 +/- 3.0 (jitter = 40.5) 5.487
      650 images/sec: 75.5 +/- 2.9 (jitter = 35.2) 5.455
      650 images/sec: 75.5 +/- 2.9 (jitter = 33.1) 5.776
  • 22. ResNet50: Eviction
      950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.197
      950 images/sec: 206.0 +/- 1.2 (jitter = 23.2) 6.214
      ### No pin and no eviction, eviction happened
      990 images/sec: 191.1 +/- 1.3 (jitter = 23.5) 6.234
      1000 images/sec: 189.5 +/- 1.3 (jitter = 23.5) 6.171
  • 23. Short Circuit with LocalVolume
      Tiered storage: capacity, medium type, and quota
      hostPath or emptyDir
      Different choices of short circuit: Unix socket for gRPC; shared hostPath volume for FUSE
  • 24. DL: Avoid Passive Cache
      Training data is distributed across the Alluxio cluster; the client does not synchronize it to the local node.
      Passive vs. active caching
      Worker configuration: turn off passive cache
      alluxio.user.file.passive.cache.enabled: false
  • 25. Not So Cloud Native Yet
      Health check and availability check
        How to leverage the API to detect the health of FUSE and the worker?
        Missing liveness probe and readiness probe
      Observability support for Prometheus
        fs report metrics exporter
        Grafana dashboard
      Data-cache-aware scheduling
        Scheduler locality according to block host
  • 26. Known Issues
      1. Performance degrades by 10%-20% during data eviction.
      2. Append write
      3. Intensive write
  • 27. OOM for JVM/OS
      1. Node specifications differ (high-end and low-end nodes, e.g. 8c16G/8c32G), so memory must be divided effectively between FUSE and the worker.
      2. FUSE process memory consumption is high. jvmOptions: "-XX:MaxDirectMemorySize=16g". Bug: https://github.com/Alluxio/alluxio/issues/9525
      3. Can Alluxio's caching strategy retain the most frequently accessed data? alluxio.worker.evictor.class=alluxio.worker.block.evictor.LRUEvictor
      4. Data refresh strategy? alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
  • 28. Takeaways
      1. DL: if the sample data is smaller than the whole cache (mem + SSD), avoid frequently swapping data from disk to memory.
      2. DL: if the sample data is larger than the whole cache, keep the default eviction behavior.
      3. With an SSD tier, enable short circuit with a local volume.
      4. HPC: with object storage, only reads are accelerated at present.
      5. HPC: on small worker nodes, disable passive mode.
      6. HPC: always keep frequently accessed data in the memory tier.
      7. K8s scheduler locality for MPI/PS jobs.
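
The first two takeaways reduce to a single comparison between dataset size and total cache capacity. The sketch below is one way to express that rule, not something shown in the talk: the helper names `recommend_cache_properties` and `to_properties_text`, the four-worker cache size, and the choice to treat the 143 GB ImageNet figure as GiB are all assumptions made for illustration. Only the Alluxio property names and values are taken from the slides.

```python
GIB = 1024 ** 3

# Property names and values below are copied from slides 20 and 24.
READ_POLICY_KEY = "alluxio.user.ufs.block.read.location.policy"
AVOID_EVICTION_POLICY = "alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy"
RESERVED_KEY = "alluxio.user.block.avoid.eviction.policy.reserved.size.bytes"
PASSIVE_CACHE_KEY = "alluxio.user.file.passive.cache.enabled"


def recommend_cache_properties(dataset_bytes, cache_bytes, disable_passive_cache=True):
    """Hypothetical helper: choose properties from dataset size vs. cache size."""
    props = {}
    if disable_passive_cache:
        # Slide 24: turn off passive cache for deep learning reads.
        props[PASSIVE_CACHE_KEY] = "false"
    if dataset_bytes < cache_bytes:
        # Takeaway 1: the data fits in mem + SSD, so avoid eviction churn.
        props[READ_POLICY_KEY] = AVOID_EVICTION_POLICY
        props[RESERVED_KEY] = "8GB"
    # Takeaway 2: otherwise add nothing and keep the default eviction behavior.
    return props


def to_properties_text(props):
    """Render the properties as key=value lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    imagenet = 143 * GIB   # dataset size from slide 9, treated as GiB here
    cache = 4 * 64 * GIB   # assumed: 4 workers with 64 GiB of cache each
    print(to_properties_text(recommend_cache_properties(imagenet, cache)))
```

When the dataset is larger than the cache the helper returns only the passive-cache setting, which matches the advice to leave eviction at its defaults.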