Vinetalk: The missing piece for cluster managers to enable accelerator sharing.
Christos Kozanitis - FORTH
Today’s ecosystem
Productivity, Performance, QoS
Software Stack
Utilization
To make things harder: Add heterogeneity
• Accelerators (GPU/FPGA)
• $$$ / device
• Better value if:
• Good utilization
• Ease of use
Existing accelerator support
• Mesos/Kubernetes
• They know how to manage GPU resources
• They do not know how to offer fractions of GPUs.
• Underutilization => Expensive practice
• More workloads => more hardware
Production workload characteristics
• Hypothesis:
• “User-facing” tasks take a few msec
• Non-production tasks run in the background for days
• No accelerator-cluster data available to test
• Google datacenter data from the past
User facing task durations
• P50: 300 sec
• Long tail: millions of sec
• Hypothesis does not hold
• Production tasks last longer than “a few msec”
CDF of Task Execution time for priorities 9-11
Total tasks: 80K
Source: Google cluster data
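The percentile check behind this slide can be sketched as follows. This is a minimal illustration, not the analysis used in the talk: the durations are made-up sample values (not the Google trace), and `percentile` is a hypothetical nearest-rank helper.

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Synthetic task durations in seconds (illustrative only, NOT trace data).
durations_sec = [2, 45, 120, 300, 300, 900, 3600, 7200, 86400, 2_000_000]

p50 = percentile(durations_sec, 50)
p80 = percentile(durations_sec, 80)

# "A few msec" would put the median far below one second.
hypothesis_holds = p50 < 1.0
```

With a median in the hundreds of seconds, the "few msec" hypothesis is rejected for this sample, which mirrors the conclusion on the slide.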
Are tasks doing “work” all of the time?
• Not really
• Noon is busier than night
• Assuming GPU tasks follow the same pattern
• Expensive to keep GPUs idle/underutilized
Source: Reiss et al., Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis, SoCC ’12
[Figure: portion of cluster CPU vs. time (hour) for long-running production tasks, annotated with 50%]
To sum up so far
• Cluster managers do not enable accelerator sharing
• Expensive practice
• Tens of thousands of dollars per device
• More users => more devices
• Long reservations for too many production tasks
• P50=300sec, P80=1hr
• The tasks are not always busy
• 50% workload volatility
Our approach
• Enable fractions of accelerator offers
• Enable multiple containers to work on the same accelerator concurrently
Why accelerator sharing is hard in cluster managers
• Device drivers do not enable sharing
• Propagation to all cluster manager modules
• Case in point: Apache Mesos
How Apache Mesos works
[Diagram: Apache Mesos on top of three GPUs. Offers do not allow sharing; bound to vendor drivers.]
Approach: Decouple executors from vendor lock-in.
[Diagram: an abstraction layer on top of three GPUs; offers carry "+ gpu abstract units"]
Questions:
• How to implement the abstraction layer?
• What is the offer “currency”?
• Ease of use?
Accelerator abstraction layer: The hardware interface
• Main idea:
• Have a server process for each accelerator of a node.
• It implements all vendor-bound functionality
• Three functions:
• Monitor for incoming workloads.
• Load the proper kernel.
• Assume a lib of kernels
• Properly transfer the data.
[Diagram: kernel calls + data arrive at three butler processes, which sit on top of the Nvidia API, Intel API, and Xilinx API]
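The three functions of the per-accelerator server can be sketched as a small loop. This is a minimal sketch under stated assumptions, not Vinetalk's implementation: `Butler`, `KERNEL_LIBRARY`, and `submit` are hypothetical names, a thread stands in for the server process, Python callables stand in for device kernels, and a list copy stands in for the host-to-device transfer.

```python
import queue
import threading

# Hypothetical kernel library: name -> callable standing in for a device kernel.
KERNEL_LIBRARY = {
    "scale": lambda data, factor: [x * factor for x in data],
    "sum": lambda data: sum(data),
}

class Butler:
    """One server per accelerator; hides the vendor API behind a work queue."""

    def __init__(self, accelerator_name):
        self.accelerator_name = accelerator_name
        self.inbox = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, kernel_name, data, *args):
        """Enqueue a kernel call plus data; returns a queue that will hold the result."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((kernel_name, data, args, reply))
        return reply

    def shutdown(self):
        self.inbox.put(None)  # sentinel that stops the serve loop
        self._thread.join()

    def _serve(self):
        while True:
            request = self.inbox.get()                # 1. monitor for incoming workloads
            if request is None:
                break
            kernel_name, data, args, reply = request
            kernel = KERNEL_LIBRARY.get(kernel_name)  # 2. load the proper kernel
            if kernel is None:
                reply.put(KeyError(kernel_name))      # report unknown kernels to the caller
                continue
            device_buffer = list(data)                # 3. transfer the data (copy as stand-in)
            reply.put(kernel(device_buffer, *args))

butler = Butler("gpu0")
result = butler.submit("scale", [1, 2, 3], 10).get(timeout=5)
missing = butler.submit("no_such_kernel", []).get(timeout=5)
butler.shutdown()
```

Callers only name a kernel and hand over data; everything vendor-specific would live inside `_serve`.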
Accelerator abstraction layer: The software interface
• Shared interface among Mesos tasks
• IPC mechanism to transfer functions + data
• Shared memory
• Abstract accelerators as Virtual Accelerator Queues (VAQs)
[Diagram: Mesos tasks inside a Mesos executor pass subtasks + data through VAQs in shared memory to the hardware interface (a user space process)]
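One way to picture a virtual accelerator queue is as a fixed-capacity FIFO of task descriptors. The sketch below is an assumption-laden illustration: `VirtualAcceleratorQueue` is a hypothetical class, it lives in ordinary process memory rather than a shared-memory segment, and the descriptors are plain tuples.

```python
class VirtualAcceleratorQueue:
    """Fixed-capacity FIFO of task descriptors, implemented as a ring buffer."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0   # index of the oldest descriptor
        self._count = 0  # number of occupied slots

    def __len__(self):
        return self._count

    def push(self, descriptor):
        """Append a descriptor; returns False when the queue is full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = descriptor
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest descriptor, or None when empty."""
        if self._count == 0:
            return None
        descriptor = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return descriptor

vaq = VirtualAcceleratorQueue(capacity=2)
accepted = [
    vaq.push(("black_scholes", b"in0")),
    vaq.push(("black_scholes", b"in1")),
    vaq.push(("black_scholes", b"in2")),  # third push finds the queue full
]
first = vaq.pop()
```

A fixed layout like this is the kind of structure that can be placed in a shared-memory region, since producers and the consumer only need to agree on slot positions and two counters.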
Vinetalk: Putting everything together.
[Diagram: Mesos executor; VineTalk and VineController; Virtual Accelerator Queues and data buffers; three physical accelerator threads; tasks labeled GPU, FPGA, GPU]
VAQs become the offer currency
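Treating VAQs as the offer currency means resource accounting is done in queues rather than whole devices. A minimal sketch of that bookkeeping follows, assuming a hypothetical `AcceleratorNode` record and `accept_offer` method; it does not use the Mesos API.

```python
from dataclasses import dataclass, field

@dataclass
class AcceleratorNode:
    """Tracks how many VAQs of one physical accelerator are still on offer."""
    name: str
    total_vaqs: int
    holders: dict = field(default_factory=dict)  # task id -> number of VAQs held

    def free_vaqs(self):
        return self.total_vaqs - sum(self.holders.values())

    def accept_offer(self, task_id, vaqs):
        """Grant `vaqs` queues to a task if enough remain; returns True on success."""
        if vaqs <= 0 or vaqs > self.free_vaqs():
            return False
        self.holders[task_id] = self.holders.get(task_id, 0) + vaqs
        return True

    def release(self, task_id):
        """Return all VAQs held by a task to the pool."""
        self.holders.pop(task_id, None)

gpu = AcceleratorNode(name="gpu0", total_vaqs=4)
granted_a = gpu.accept_offer("task-a", 2)
granted_b = gpu.accept_offer("task-b", 2)  # a second task shares the same GPU
granted_c = gpu.accept_offer("task-c", 1)  # rejected: no VAQs left
gpu.release("task-a")
remaining = gpu.free_vaqs()
```

Two tasks hold queues on the same device at once, which a whole-GPU offer cannot express.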
A Showcase of the gain
• Two Mesos tasks trying to share a GPU:
• One that launches a Monte Carlo GPU kernel 1000 times
• One that launches a Black-Scholes GPU kernel 1000 times
• Monte Carlo > Black-Scholes
• Compare the queuing time of the first subtask
Setup               Black-Scholes (msec)   Monte Carlo (msec)
Mesos               126K                   38
Mesos + Vinetalk    40                     42
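A toy model shows why the first subtask of the second task waits so much longer without sharing. The kernel durations below are invented for illustration and the model ignores all overheads, so it does not reproduce the measured numbers; the function names are hypothetical.

```python
def first_subtask_wait_exclusive(kernel_ms_by_task, launches):
    """Each task owns the device until all of its launches finish."""
    waits, clock = {}, 0.0
    for task, kernel_ms in kernel_ms_by_task.items():
        waits[task] = clock            # the task starts only after its predecessors finish
        clock += kernel_ms * launches
    return waits

def first_subtask_wait_round_robin(kernel_ms_by_task):
    """Tasks take turns on the device, one kernel launch at a time."""
    waits, clock = {}, 0.0
    for task, kernel_ms in kernel_ms_by_task.items():
        waits[task] = clock            # every task gets its first turn in round one
        clock += kernel_ms
    return waits

# Made-up per-launch durations in milliseconds (assumption, not measured).
kernel_ms = {"monte_carlo": 120.0, "black_scholes": 5.0}

exclusive = first_subtask_wait_exclusive(kernel_ms, launches=1000)
shared = first_subtask_wait_round_robin(kernel_ms)
```

Under exclusive access the second task waits for all 1000 launches of the first; with interleaving it waits for a single launch.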
Is that easy to use?
• It is much easier (30% fewer lines of code)
• It hides all vendor specific APIs.
• Example: porting of ICCS FPGA financial application (FPL’17).
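[Diagram: before, the user application (ICCS) calls SDAccel to reach the FPGA Black-Scholes kernel (ICCS); after, the user application calls Vinetalk, which sits on top of SDAccel and the same FPGA Black-Scholes kernel]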
• We ported all SDAccel API calls inside Vinetalk.
• User applications just need to use the Vinetalk API
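The idea of hiding vendor APIs behind one interface can be sketched as follows. All names here (`AcceleratorBackend`, `FakeFpgaBackend`, `FakeGpuBackend`, `price_options`) are hypothetical and the backends are stubs that return strings; this is not the Vinetalk API.

```python
from abc import ABC, abstractmethod

class AcceleratorBackend(ABC):
    """Single interface the application sees, whatever the vendor underneath."""

    @abstractmethod
    def run(self, kernel_name, data):
        """Execute a named kernel on the given input."""

class FakeFpgaBackend(AcceleratorBackend):
    # Stub: a real backend would wrap the FPGA vendor's toolchain calls here.
    def run(self, kernel_name, data):
        return f"fpga:{kernel_name}:{len(data)}"

class FakeGpuBackend(AcceleratorBackend):
    # Stub: a real backend would wrap the GPU vendor's runtime calls here.
    def run(self, kernel_name, data):
        return f"gpu:{kernel_name}:{len(data)}"

def price_options(backend, inputs):
    """Application code: contains no vendor-specific calls."""
    return backend.run("black_scholes", inputs)

on_fpga = price_options(FakeFpgaBackend(), [1.0, 2.0, 3.0])
on_gpu = price_options(FakeGpuBackend(), [1.0, 2.0, 3.0])
```

The application function stays unchanged when the device changes, which is the property the slide attributes to the port.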
Software engineering effort
How about overheads?
My gift to you:
• As of today, Vinetalk is open source
• Apache License 2.0
• Check it out:
• https://github.com/vineyard2020/vinetalkSuite
Conclusions
• Problem: Accelerators cannot be shared in a cluster
• Cause: The big forest of (proprietary) device drivers
• Solution: Abstract accelerators through Vinetalk
• Install Vinetalk on workers
• Offer VAQs as resources
• 1-5% overhead due to memory transfers
• Easy integration with all Mesos frameworks (such as Spark).
• https://github.com/vineyard2020/vinetalkSuite


Editor's Notes

  1. Here is a view of Apache Mesos