Scalable Scientific Computing with Dask



Dask tutorial at PyConDE / PyData Karlsruhe 2018. These introductory slides mainly contain the link to Matthew Rocklin's Dask workshop at PyData NYC 2018, on which this workshop was based.

Published in: Data & Analytics


  1. Scalable Scientific Computing with Dask • Uwe L. Korn • PyCon.DE / PyData Karlsruhe 2018
  2. About me (xhochy) • Senior Data Scientist at Blue Yonder (@BlueYonderTech) • Apache {Arrow, Parquet} PMC • Data Engineer and Architect with a heavy focus on Pandas
  3. What is Dask? • Definition and execution of task graphs • A parallel computing library that scales the existing Python ecosystem • Scales down to your laptop • Scales up to a cluster
  4. More than a single CPU • Multi-core and distributed parallel execution • Low-level: task schedulers for computation graphs • High-level: Array, Bag and DataFrame
  5. What about Spark? Dask is • More lightweight • Written in Python, and operates well with C/C++/Fortran/LLVM or other natively compiled code • Part of the Python ecosystem
  6. What about Spark? Spark • Is written in Scala and works well within the JVM • Has very limited Python support • Brings its own ecosystem • Is able to provide more high-level optimizations
  7. pydata-nyc-2018-tutorial
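
The "definition and execution of task graphs" point from slide 3 can be illustrated with a small sketch. It assumes the classic dictionary graph format described in Dask's documentation, where a key maps either to a plain value or to a task tuple of the form (callable, arg1, arg2, ...). The evaluator `toy_get` and the key names are hypothetical and written only for illustration; this is not Dask's scheduler, and it runs single-threaded using the standard library alone.

```python
from operator import add, mul

# A task graph in the style of Dask's dictionary format (assumed here):
# each key maps to a literal value or to a task tuple
# (callable, arg1, arg2, ...), where an argument may name another key.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),       # sum = x + y
    "product": (mul, "sum", 10),  # product = sum * 10
}


def toy_get(dsk, key, cache=None):
    """Hypothetical single-threaded evaluator; not Dask's scheduler."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = dsk[key]
    # A tuple whose first element is callable is treated as a task.
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        # Arguments that name other keys are computed first (recursively).
        resolved = [
            toy_get(dsk, a, cache) if isinstance(a, str) and a in dsk else a
            for a in args
        ]
        value = func(*resolved)
    cache[key] = value
    return value


result = toy_get(graph, "product")
print(result)  # 30
```

Separating the definition of the graph from its execution is what lets a scheduler decide where and when each task runs. With Dask installed, a dictionary of this shape can typically be handed to one of its schedulers instead, for example `dask.threaded.get(graph, "product")`; the high-level Array, Bag and DataFrame collections build such graphs for you.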