In recent releases, TensorFlow has been enhanced for distributed learning and HDFS access. Outside of the Google cloud, however, users still needed a dedicated cluster for TensorFlow applications. There are several community projects wiring TensorFlow onto Apache Spark clusters. Unfortunately, they are limited to support synchronous distributed learning only, and don’t allow TensorFlow servers to communicate with each other directly.
In this talk, we will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, which will be open sourced in Q1 2017. This new framework enables easy experimentation for algorithm designs, and supports scalable training & inferencing on Spark clusters. It supports all TensorFlow functionalities including synchronous & asynchronous learning, model & data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion to TensorFlow and network protocols for server-to-server communication. With a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application.