Dask for Analytics
Nico Liberato Candio
TABLE OF CONTENTS
01 WHY DASK
02 DASK IN ACTION
03 RESULTS
04 REFERENCES
01 WHY DASK
Dask is a parallel computing library that scales the existing Python ecosystem: NumPy, Pandas, Scikit-learn.

PARALLELIZATION
Single machine or distributed workloads across GCP clusters.

FLEXIBLE
Task scheduling interface for more custom workloads and integration.

RESPONSIVE
Runs resiliently up to 1000s of cores. Horizontal and vertical scaling.
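A minimal sketch of what "scaling the Pandas ecosystem" looks like in practice; the file pattern and column names are placeholders, not taken from the slides.

import dask.dataframe as dd

# Lazily builds a task graph over many partitions instead of loading
# everything into memory at once (placeholder file pattern).
df = dd.read_csv("events-*.csv")

# Same groupby/mean syntax as Pandas; nothing runs yet.
daily_mean = df.groupby("day")["value"].mean()

# compute() executes the graph on the available threads, processes,
# or distributed workers.
print(daily_mean.compute())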
02 DASK IN ACTION
Example: parallelization in GKE
1. Dask in a GKE cluster
2. Model deployment (Helm)
3. Faster training and workload parallelization
4. Scale via Kubernetes or Dask (threads/workers)
Demo: a simple tree-summation computation with Dask installed in our cluster (sketched below).
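A hedged reconstruction of the tree-summation demo using dask.delayed; the original demo code is not in the slides, so the input size and function names here are illustrative only.

from dask import delayed

@delayed
def add(x, y):
    return x + y

def tree_sum(values):
    # Pairwise (tree) reduction: each round halves the number of
    # partial sums, and independent pairs can be added in parallel.
    while len(values) > 1:
        paired = [add(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # carry an unpaired element forward
            paired.append(values[-1])
        values = paired
    return values[0]

total = tree_sum(list(range(1024)))  # builds a task graph, runs nothing yet
print(total.compute())               # executes on the local or cluster scheduler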
03 CONCLUSIONS
WHY DASK IS GOOD FOR US

OPEN SOURCE
Our setup runs on GKE with a single Dask cluster deployed with Helm: an open-source solution configurable for all the ML projects.

SCALING
We scale the number of workers (threads), dividing the workload into multiple units. Horizontal or vertical scaling via the dashboard (sketched below).

SPEED
We speed up the computation, achieving faster model-training times.
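A local sketch of the scaling call; in the GKE setup the same scale request would target the Helm-deployed cluster (for example by resizing worker pods), which is not shown here.

from dask.distributed import LocalCluster, Client

# LocalCluster stands in for the Helm-deployed cluster on GKE.
cluster = LocalCluster(n_workers=2, threads_per_worker=2)
client = Client(cluster)

cluster.scale(4)               # horizontal scaling: more workers
print(client.dashboard_link)   # dashboard used to monitor the workers

client.close()
cluster.close()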
Next?
Creation of Dask clusters per computation context, saving costs.
Storage and dataframe parallelization, refactoring the model implementation (sketched below).
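A hedged sketch of the planned dataframe/storage refactor: partition the training data with Dask and persist it as Parquet so later steps read the partitions in parallel. Column names and the output path are illustrative, and writing Parquet assumes pyarrow (or fastparquet) is installed.

import pandas as pd
import dask.dataframe as dd

# Existing in-memory frame (stand-in for the current model input).
pdf = pd.DataFrame({"feature": range(1_000_000),
                    "label": [0] * 1_000_000})

# Split into partitions and persist them as Parquet; each partition
# can then be written and re-read in parallel.
ddf = dd.from_pandas(pdf, npartitions=8)
ddf.to_parquet("training-data/")

# Downstream model code reads the same partitions in parallel.
ddf = dd.read_parquet("training-data/")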
Questions?
nicoliberatoc@gmail.com
+353 089 44 207 64
THANKS!
CREDITS
◂ Presentation template by Slidesgo
◂ Icons by Flaticon
◂ Infographics by Freepik
◂ Images created by Freepik and rawpixel - Freepik
◂ Author introduction slide photo created by Freepik
◂ Text & Image slide photo created by Freepik.com