
Spark Summit EU talk by Tim Hunter

TensorFrames: Deep Learning with TensorFlow on Apache Spark

  1. SPARK SUMMIT EUROPE 2016. TensorFrames: Google TensorFlow with Apache Spark. Timothée Hunter, Databricks, Inc.
  2. About Databricks. Why us: • Created Apache Spark to enable big data use cases with a single engine. • Contributes 75% of Spark’s code, 10x more than others. • Brings Spark to the enterprise: the just-in-time data platform. Our product: • A fully managed platform powered by Apache Spark. • A unified solution for data science and engineering teams.
  3. About me. Software engineer at Databricks. Apache Spark contributor. Ph.D. in Machine Learning from UC Berkeley (and Spark user since Spark 0.2).
  4. Outline • Numerical computing with Apache Spark • Using GPUs with Spark and TensorFlow • Performance details • The future
  5. Numerical computing for Data Science • Queries are data-heavy • However, algorithms are computation-heavy • They operate on simple data types: integers, floats, doubles, vectors, matrices
  6. The case for speed • Numerical bottlenecks are good targets for optimization • Let data scientists get faster results • Faster turnaround for experimentation • How can we run these numerical algorithms faster?
  7. Evolution of computing power. [Diagram: scaling options, from scale out (“failure is not an option: it is a fact”) to scale up, GPGPU, and dedicated chips (“when you can afford your dedicated chip”).]
  8. Evolution of computing power. [Diagram: frameworks on the same spectrum, including NLTK, Theano, and Torch; today’s talk: Spark + TensorFlow.]
  9. Evolution of computing power • Processor speed cannot keep up with memory and network improvements • Access to the processor is the new bottleneck • Project Tungsten in Spark: leverage the processor’s heuristics for executing code and fetching memory • Does not account for the fact that the problem is numerical
  10. Outline • Numerical computing with Apache Spark • Using GPUs with Spark and TensorFlow • Performance details • The future
  11. GPGPUs • Graphics Processing Units for General Purpose computations. [Charts: theoretical peak throughput (Tflops, single precision) and theoretical peak bandwidth (GB/s), GPU vs. CPU.]
  12. Google TensorFlow • Library for writing “machine intelligence” algorithms • Very popular for deep learning and neural networks • Can also be used for general-purpose numerical computations • Interfaces in C++ and Python
  13. Numerical dataflow with TensorFlow:
     x = tf.placeholder(tf.int32, name="x")
     y = tf.placeholder(tf.int32, name="y")
     output = tf.add(x, 3 * y, name="z")
     session = tf.Session()
     output_value = session.run(output, {x: 3, y: 5})
     [Diagram: dataflow graph with inputs x: int32 and y: int32, a multiplication by 3, and output node z.]
  14. Numerical dataflow with Spark:
     df = sqlContext.createDataFrame(…)
     x = tf.placeholder(tf.int32, name="x")
     y = tf.placeholder(tf.int32, name="y")
     output = tf.add(x, 3 * y, name="z")
     output_df = tfs.map_rows(output, df)
     output_df.collect()
     [Diagram: df: DataFrame[x: int, y: int] feeds the same graph (x: int32, y: int32, multiply by 3, z); the result is output_df: DataFrame[x: int, y: int, z: int].]
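
     For readers who want to try this end to end outside the demo, here is a minimal sketch of the same pipeline. The sample rows, the casts to int, and the result comment are illustrative assumptions (the slide leaves the DataFrame contents elided); it assumes a Spark session with sqlContext and the tensorframes package available.

         from pyspark.sql import Row
         import tensorflow as tf
         import tensorframes as tfs

         # Hypothetical input: two integer columns named after the placeholders.
         df = sqlContext.createDataFrame([Row(x=3, y=5), Row(x=1, y=2)])
         df = df.selectExpr("cast(x as int) as x", "cast(y as int) as y")  # match the int32 placeholders

         x = tf.placeholder(tf.int32, name="x")
         y = tf.placeholder(tf.int32, name="y")
         z = tf.add(x, 3 * y, name="z")

         # map_rows runs the graph once per row and appends the output column "z".
         output_df = tfs.map_rows(z, df)
         output_df.collect()   # each row gains z = x + 3 * y
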
  15. Demo
  16. Outline • Numerical computing with Apache Spark • Using GPUs with Spark and TensorFlow • Performance details • The future
  17. It is a communication problem. [Diagram: data path between the Spark worker process (Tungsten binary format → Java object → Python pickle) and a separate Python worker process (Python pickle → C++ buffer), and back.]
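
     As a point of reference, the path in this diagram is what a plain Python UDF goes through: every row is pickled, shipped to the Python worker process, evaluated, and pickled back to the JVM. A minimal sketch, assuming a DataFrame df with integer columns x and y as on slide 14:

         from pyspark.sql.functions import udf
         from pyspark.sql.types import IntegerType

         # Each row crosses the JVM / Python boundary twice (Tungsten binary
         # format -> Java object -> pickle -> Python worker, and back).
         plus_3y = udf(lambda x, y: x + 3 * y, IntegerType())
         slow_df = df.withColumn("z", plus_3y(df["x"], df["y"]))

     TensorFrames avoids this round trip by embedding TensorFlow natively in the worker, as the next slide shows.
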
  18. TensorFrames: native embedding of TensorFlow. [Diagram: everything stays inside the Spark worker process: Tungsten binary format → Java object → C++ buffer, with no separate Python worker process or pickling.]
  19. An example: kernel density scoring • Estimation of a distribution from samples • Non-parametric • Unknown bandwidth parameter • Can be evaluated with goodness of fit
  20. An example: kernel density scoring • In practice, compute score(x) = log( 1/(N·b) · Σ_k exp( -(x - z_k)^2 / (2·b^2) ) ), with z_1, …, z_N the observed points and b the bandwidth • Numerically, this is evaluated with the log-sum-exp trick: score(x) = m - log(N·b) + log( Σ_k exp(d_k - m) ), where d_k = -(x - z_k)^2 / (2·b^2) and m = max_k d_k • In a nutshell: a complex numerical function
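
     For concreteness, here is a small NumPy sketch of that scoring function; the variable names points and b and the toy data are assumptions for illustration. It computes the same quantity as the UDFs on the following slides.

         import numpy as np

         def score(x, points, b):
             # d_k = -(x - z_k)^2 / (2 b^2), combined with the log-sum-exp trick
             d = -(x - points) ** 2 / (2 * b * b)
             m = d.max()
             return m - np.log(b * len(points)) + np.log(np.exp(d - m).sum())

         points = np.random.normal(size=1000)   # hypothetical observed sample
         print(score(0.1, points, b=0.5))
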
  21. Speedup. [Bar chart: normalized cost of Scala UDF, Scala UDF (optimized), TensorFrames, and TensorFrames + GPU.] The Scala UDF version:
     def score(x: Double): Double = {
       val dis = points.map { z_k => - (x - z_k) * (x - z_k) / (2 * b * b) }
       val minDis = dis.min
       val exps = dis.map(d => math.exp(d - minDis))
       minDis - math.log(b * N) + math.log(exps.sum)
     }
     val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
     sql("select sum(scoreUDF(sample)) from samples").collect()
  22. Speedup. [Same bar chart.] The optimized Scala UDF version:
     def score(x: Double): Double = {
       val dis = new Array[Double](N)
       var idx = 0
       while (idx < N) {
         val z_k = points(idx)
         dis(idx) = - (x - z_k) * (x - z_k) / (2 * b * b)
         idx += 1
       }
       val minDis = dis.min
       var expSum = 0.0
       idx = 0
       while (idx < N) {
         expSum += math.exp(dis(idx) - minDis)
         idx += 1
       }
       minDis - math.log(b * N) + math.log(expSum)
     }
     val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
     sql("select sum(scoreUDF(sample)) from samples").collect()
  23. Speedup. [Same bar chart.] The TensorFrames version:
     def cost_fun(block, bandwidth):
         distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
         m = reduce_max(distances, 0)
         x = log(reduce_sum(exp(distances - m), 0))
         return identity(x + m - log(bandwidth * N), name="score")
     sample = tfs.block(df, "sample")
     score = cost_fun(sample, bandwidth=0.5)
     df.agg(sum(tfs.map_blocks(score, df))).collect()
  24. Speedup. [Same bar chart.] The TensorFrames + GPU version:
     def cost_fun(block, bandwidth):
         distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
         m = reduce_max(distances, 0)
         x = log(reduce_sum(exp(distances - m), 0))
         return identity(x + m - log(bandwidth * N), name="score")
     with device("/gpu"):
         sample = tfs.block(df, "sample")
         score = cost_fun(sample, bandwidth=0.5)
     df.agg(sum(tfs.map_blocks(score, df))).collect()
  25. Outline • Numerical computing with Apache Spark • Using GPUs with Spark and TensorFlow • Performance details • The future
  26. Improving communication. [Diagram: inside the Spark worker process, the Tungsten binary format → Java object → C++ buffer path, with planned improvements: direct memory copy and columnar storage.]
  27. The future • Integration with Tungsten: – Direct memory copy – Columnar storage • Better integration with MLlib data types
  28. Recap • Spark: an efficient framework for running computations on thousands of computers • TensorFlow: high-performance numerical framework • Get the best of both with TensorFrames: – Simple API for distributed numerical computing – Can leverage the hardware of the cluster
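
     To make the “simple API” point concrete, here is a minimal sketch of the block-level API used earlier in the talk, applied to a single numeric column; the column name and the trivial transform are illustrative assumptions, not code from the talk.

         import tensorflow as tf
         import tensorframes as tfs

         # df is assumed to be an existing DataFrame with a double column "sample".
         x = tfs.block(df, "sample")      # placeholder bound to a whole block of rows
         z = tf.add(x, 3.0, name="z")     # any TensorFlow computation over the block
         out_df = tfs.map_blocks(z, df)   # evaluated block by block; the new column takes the tensor name "z"
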
  29. Try these demos yourself • TensorFrames source code and documentation: github.com/databricks/tensorframes and spark-packages.org/package/databricks/tensorframes • Demo notebooks available on Databricks • The official TensorFlow website: www.tensorflow.org
  30. Thank you.
