One of the earliest challenges facing new data science practitioners is scaling their work from something that runs on a laptop to larger jobs. Tools like Spark and Hadoop have steep learning curves and often require explicit management of a compute cluster. Here we present pyWren, a Python library that lets users run their workloads on hundreds of cloud machines, with no distributed computing knowledge, for a few dollars at a time. In 30 minutes, we will take the audience from writing simple data analysis functions on their laptop to running them on 1,000 cores on Amazon Web Services.
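To give a flavor of the programming model, here is a minimal sketch of the map-style workflow the talk describes. It uses Python's standard `concurrent.futures` as a local stand-in for illustration only; with pyWren, an executor would instead fan the same calls out to cloud workers, and its actual API may differ from this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(x):
    # Stand-in for a user's "simple data analysis function".
    return x * x

if __name__ == "__main__":
    data = range(10)
    # Locally, executor.map runs analyze over data in parallel threads;
    # the idea in the talk is the same pattern scaled to cloud machines.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(analyze, data))
    print(results)
```

The appeal of this pattern is that the user's code stays an ordinary Python function; only the executor changes when moving from a laptop to the cloud.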
Presented at AnacondaCON 2017 by Eric Jonas, UC Berkeley.