Data analytics applications are often 10x off peak hardware performance since they combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even if each individual function is optimized in isolation, the cost of data movement across these functions can cause order of magnitude slowdowns. For example, even though the TensorFlow machine-learning library uses highly tuned linear algebra functions for each of its operators, workflows that combine these operators can be 16x slower than hand-tuned code. Similarly, workflows that perform relational processing in Spark SQL or pandas, numerical processing in NumPy, or a combination of these tasks spend most of their time in data movement across processing functions and could run between 2x and 30× faster if optimized end to end.
This talk offers an overview of Weld, an optimizing runtime for data-intensive applications that works across disjoint libraries and functions. Weld uses a common representation to capture the structure of diverse data-parallel workloads such as SQL, machine learning, and graph analytics and then optimizes across them using a cost-based optimizer that takes into account hardware characteristics. Weld can be integrated into a variety of widely used analytics frameworks, such as Spark SQL for relational processing, TensorFlow for machine learning, and Pandas and NumPy for general data science workloads. Integrating Weld with these frameworks requires no changes to user application code. Weld speeds up existing workloads in these frameworks by up to 16x and can also enable speed-ups of two orders of magnitude in applications that combine them.