Dask for Fast Distributed Batch Scoring of Computer Vision Workloads

Mathew Salvaris

Presentation on using Dask for distributed scoring of Deep Learning models

Dask and V100s for Fast,
Distributed Batch Scoring of
Computer Vision Workloads
Mathew Salvaris

• What is Dask?
• Batch scoring use cases
• Style transfer
• Mask-RCNN for object detection
and segmentation

Dask for batch scoring
• Same code runs on single node
locally or in a cluster
• No orchestration code to write,
Dask handles orchestration
• Can create and execute complex
DAGs that can be generated on the
fly
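
A minimal sketch of the points above, assuming a recent dask.distributed install. The helpers load and total are hypothetical stand-ins, not functions from the talk. The same run function works against an in-process cluster or a remote scheduler, and the width of the graph is decided at call time.

```python
from dask.distributed import Client


def load(i):
    # Hypothetical stand-in for reading one image from storage
    return [i, i + 1]


def total(chunks):
    # Hypothetical stand-in for a step that consumes a whole batch
    return sum(sum(c) for c in chunks)


def run(client, n_items):
    # The graph is built on the fly: its width depends on n_items
    loaded = client.map(load, range(n_items))
    # Futures inside the list are resolved by Dask before total() runs
    return client.submit(total, loaded).result()


if __name__ == "__main__":
    # In-process cluster for local testing. To target a real cluster,
    # pass the scheduler address instead, e.g. Client("tcp://<host>:8786")
    with Client(processes=False, dashboard_address=None) as client:
        print(run(client, 4))
```

Only the Client construction changes between local and cluster runs; run itself is untouched.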

Batch scoring use cases
• Style Transfer
• Object detection and segmentation

Style transfer process
[Diagram: Read -> Apply Model -> Write -> Return]

Dask Graph of Style Transfer Process
[Figure: N Load Image tasks -> Stack -> Style Transfer -> Write]

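from toolz import curry  # assumption: curry comes from toolz

@curry
def process_batch(client, style_model, output_path, batch_filenames):
    # Send the filenames to the workers and load each image in parallel
    remote_batch_f = client.scatter(batch_filenames)
    img_array_f = client.map(load_image, remote_batch_f)
    # Stack the images into one batch, stylize it, then write the results
    stacked_array_f = client.submit(stack, img_array_f)
    styled_array_f = client.submit(stylize_batch, style_model, stacked_array_f)
    return client.submit(write, batch_filenames, styled_array_f, output_path)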
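
The slides do not show how process_batch is called. One plausible reading of the @curry decorator is that the client, model and output path are bound once and the resulting function is applied to each batch of filenames. The sketch below illustrates that pattern with toolz; the function body, the file names and the batch size of 2 are placeholders for illustration only.

```python
from toolz import curry, partition_all


@curry
def process_batch(client, style_model, output_path, batch_filenames):
    # Placeholder body: the real function submits Dask tasks (see the slide)
    return (output_path, tuple(batch_filenames))


# Hypothetical inputs
filenames = [f"img_{i}.jpg" for i in range(5)]

# Bind the first three arguments once; the result takes only a batch
score = process_batch(None, "model", "out/")

# Split the filenames into batches of at most 2 and process each one
results = [score(batch) for batch in partition_all(2, filenames)]
print(results)
```

Each call to score would return one future per batch in the real pipeline, so the batches can be gathered at the end.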

Testing Dask locally on GPU
• Need GPU-based libraries for the workers and CPU-based ones for the client
• Limit the visibility of CUDA devices so each worker sees one GPU
• Spawn as many workers as there are GPUs
CUDA_VISIBLE_DEVICES=0 dask-worker 127.0.0.1 --nprocs 1 --nthreads 1 --resources 'GPU=1'
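
To spawn one worker per GPU, the command above can be repeated with a different CUDA_VISIBLE_DEVICES value each time. The sketch below builds those launch specifications in Python; worker_launch_specs and launch_workers are hypothetical helper names, and the flags are copied from the slide (newer Dask releases may name them differently).

```python
import os
import subprocess


def worker_launch_specs(scheduler_address, n_gpus):
    """Return one (env, argv) pair per GPU, mirroring the slide's command."""
    specs = []
    for gpu_id in range(n_gpus):
        # Each worker process sees exactly one GPU
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        argv = ["dask-worker", scheduler_address,
                "--nprocs", "1", "--nthreads", "1",
                "--resources", "GPU=1"]
        specs.append((env, argv))
    return specs


def launch_workers(scheduler_address, n_gpus):
    # Starts the worker processes; requires dask-worker on PATH
    return [subprocess.Popen(argv, env=env)
            for env, argv in worker_launch_specs(scheduler_address, n_gpus)]
```

With the GPU=1 resource declared on every worker, GPU-bound tasks can be submitted with client.submit(..., resources={"GPU": 1}) so that each worker runs at most one of them at a time.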

What is Azure ML Pipelines?
Create an experimentation graph that gets you from data to a model
The pipeline can be exported as a parameterised endpoint

Dask on Azure pipelines
• Need to start the workers and the scheduler, then run the client
• Azure ML Pipelines has MPIStep, which allows us to trigger an MPI job
• Run workers on all ranks
• Run the client and scheduler on rank 0
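
The rank layout above can be sketched as a small role-selection function. This is an illustration only: roles_for_rank and current_rank are hypothetical names, and reading the rank from OMPI_COMM_WORLD_RANK assumes the job is launched by Open MPI, which the slides do not state.

```python
import os


def roles_for_rank(rank):
    """Rank 0 runs the scheduler and client; every rank runs a worker."""
    roles = ["worker"]
    if rank == 0:
        roles = ["scheduler", "client"] + roles
    return roles


def current_rank(environ=os.environ):
    # OMPI_COMM_WORLD_RANK is set by Open MPI; other launchers use other variables
    return int(environ.get("OMPI_COMM_WORLD_RANK", "0"))


if __name__ == "__main__":
    print(current_rank(), roles_for_rank(current_rank()))
```

The script started by the MPI job would then launch the processes listed for its own rank.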

[Chart: Time Taken (s) by number of processes, Blob vs Premium Blob storage]
GPU: V100
# Images: 823

Processes    Blob (s)    Premium Blob (s)
1            180.4       106.8
2             96.7        56.5
4             48.9        33.7
8             25.4        16.8
12            15.5        12.3

Chart annotations:
Blob: 4-5 images/second (1 process), 53 images/second (12 processes)
Premium Blob: 8 images/second (1 process), 69 images/second (12 processes)
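
The images/second figures follow from dividing the number of images by the time taken. A short sketch of that arithmetic, using only the times reported on the slides (the function names are illustrative):

```python
N_IMAGES = 823

# Time taken in seconds, keyed by number of processes
times_s = {
    "Blob": {1: 180.4, 2: 96.7, 4: 48.9, 8: 25.4, 12: 15.5},
    "Premium Blob": {1: 106.8, 2: 56.5, 4: 33.7, 8: 16.8, 12: 12.3},
}


def throughput(seconds, n_images=N_IMAGES):
    # Images scored per second
    return n_images / seconds


def speedup(series, processes):
    # Speedup relative to the single-process run of the same series
    return series[1] / series[processes]


for name, series in times_s.items():
    for processes, seconds in series.items():
        print(f"{name:13s} {processes:2d} processes: "
              f"{throughput(seconds):5.1f} images/s, "
              f"speedup {speedup(series, processes):4.1f}x")
```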

Mask-RCNN process
[Diagram: Read -> Model -> BBOX and Mask -> Combine -> Write -> Return]

Dask Graph of Mask-RCNN
[Figure: N Load Image tasks -> Preprocessing -> Score -> Merge image and annotations -> Write]

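from toolz import curry  # assumption: curry comes from toolz

@curry
def process_batch(client, style_model, preprocessing, output_path, batch):
    # Send the filenames to the workers, then load and preprocess each image
    remote_batch_f = client.scatter(batch)
    img_array_f = client.map(load_image, remote_batch_f)
    pre_img_array_f = client.map(preprocessing, img_array_f)
    # Score the whole batch, then merge the annotations into the original images
    bbox_list_f = client.submit(score_batch, style_model, pre_img_array_f)
    results_f = client.submit(loop_annotations, img_array_f, bbox_list_f)
    # loop_write is curried, so loop_write(output_path) returns the writer
    return client.submit(loop_write(output_path), batch, results_f)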

Dask on Kubernetes
• Use a Helm chart to deploy the workers, scheduler and JupyterLab
• Provisioned 3 VMs with 4 V100s each
• Installed the NVIDIA device plugin
• Installed a plugin for storage

Summary
Able to easily prototype two
batch scoring scenarios locally,
then deploy them on Kubernetes as
well as Azure ML Pipelines
Using GPUs through Dask could
be more straightforward, with better
interaction between Dask and DL
libraries

Acknowledgements
JS Tan
Azure Machine Learning Pipelines Team

3. http://docs.dask.org/en/latest/why.html

16. Azure ML Pipelines Notebook

21. Demo Kubernetes

24. Thanks & Questions?
