We took Google’s recently open sourced deep learning library called “TensorFlow”, which is single-machine, and built a distributed implementation on Spark. We did this by using the abstraction layer called “Distributed DataFrame”, or DDF. TensorFlow started as an internal research project at Google and has since spread to be used across the organization. Google open sourced the project in Nov. 2015, but kept the distributed version of it closed. There are other deep learning frameworks that have been implemented on Spark, but TensorFlow’s API is elegant and powerful and has proven to work at Google scale. Putting TensorFlow on Spark makes it easier for companies to harness the power of deep learning in their own businesses without sacrificing scalability.
KEY TAKE-AWAYS: (1) Google’s TensorFlow release is a single-machine implementation, (2) We have built a distributed implementation in Spark which allows TensorFlow to scale horizontally.
This talk was presented at the Spark Summit East 2016; watch it here: https://www.youtube.com/watch?v=-QtcP3yRqyM&index=5&list=PLMpF-Q2WyoGudT3z-VPGs-6nnv45cod_R