This document summarizes Timothée Hunter's presentation on TensorFrames, which allows running Google TensorFlow models on Apache Spark. Some key points:
- TensorFrames embeds TensorFlow into Spark to enable distributed numerical computing on big data. The embedding lets computationally intensive machine learning algorithms run on GPUs for a speedup.
- An example demonstrates speedups from using TensorFrames and GPUs for kernel density estimation, a non-parametric statistical technique.
- Planned improvements include tighter integration with Spark's Tungsten engine, using direct memory copies and columnar storage to reduce communication costs.
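
For readers unfamiliar with kernel density estimation, the technique named in the example above, here is a minimal sketch in plain Python. It is not the code from the presentation and does not use the TensorFrames or TensorFlow APIs. The function name `gaussian_kde`, the Gaussian kernel, the sample values, and the bandwidth are all assumptions chosen for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x.

    Hypothetical helper for illustration: it places a Gaussian kernel of
    width `bandwidth` on every sample and averages the kernels.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    n = len(samples)
    # Normalizing constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One kernel evaluation per sample: O(n) work per query point.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up observations, purely for demonstration.
samples = [-1.2, -0.4, 0.0, 0.3, 1.1]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate the estimate on a small grid of query points.
for x in (-1.0, 0.0, 1.0):
    print(f"density({x:+.1f}) = {density(x):.4f}")
```

Each query point requires one kernel evaluation per sample, so the cost grows with the product of the number of samples and the number of query points. That dense, regular arithmetic is the kind of workload that tends to benefit from a GPU.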