TensorFrames:
Google TensorFlow on
Apache Spark
Tim Hunter
Spark Summit 2016 - Meetup
How familiar are you with Spark?
1. What is Apache Spark?
2. I have used Spark
3. I am using Spark in production or I contribute to
its development
2
How familiar are you with TensorFlow?
1. What is TensorFlow?
2. I have heard about it
3. I am training my own neural networks
3
Founded by the team who
created Apache Spark
Offers a hosted service:
- Apache Spark in the cloud
- Notebooks
- Cluster management
- Production environment
About Databricks
4
Software engineer at Databricks
Apache Spark contributor
Ph.D. from UC Berkeley in Machine Learning
(and Spark user since Spark 0.2)
About me
5
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
6
Numerical computing for Data
Science
• Queries are data-heavy
• However, algorithms are computation-heavy
• They operate on simple data types: integers, floats,
doubles, vectors, matrices
7
The case for speed
• Numerical bottlenecks are good targets for
optimization
• Let data scientists get faster results
• Faster turnaround for experiments
• How can we run these numerical algorithms faster?
8
Evolution of computing power
9
[Diagram: scale out vs. scale up; labels: "Failure is not an option: it is a fact", "When you can afford your dedicated chip" (GPGPU)]
Evolution of computing power
10
[Diagram: frameworks placed on the same chart: NLTK, Theano; today's talk: Spark + TensorFlow]
Evolution of computing power
• Processor speed cannot keep up with memory and
network improvements
• Access to the processor is the new bottleneck
• Project Tungsten in Spark: leverage the processor's
heuristics for executing code and fetching memory
• Does not account for the fact that the problem is
numerical
11
Asynchronous vs. synchronous
• Asynchronous algorithms perform updates concurrently
• Spark uses a synchronous model; deep learning frameworks
are usually asynchronous
• A large number of ML computations are synchronous
• Even deep learning may benefit from synchronous updates
12
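The synchronous/asynchronous contrast above can be illustrated with a toy sketch in plain Python (not Spark or TensorFlow code; the staleness model and all names here are made up for illustration): minimizing f(w) = (w − 3)² with simulated workers.

```python
def grad(w):
    # gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def synchronous_sgd(steps=100, workers=4, lr=0.1):
    # Spark-style: at each step every worker sees the SAME parameter,
    # and their gradients are averaged into one global update.
    w = 0.0
    for _ in range(steps):
        g = sum(grad(w) for _ in range(workers)) / workers
        w -= lr * g
    return w

def asynchronous_sgd(steps=100, lr=0.1, delay=3):
    # Parameter-server style: each update is computed from a parameter
    # snapshot that is `delay` steps stale, with no coordination.
    history = [0.0]
    for i in range(steps):
        stale_w = history[max(0, i - delay)]
        history.append(history[-1] - lr * grad(stale_w))
    return history[-1]
```

Both versions converge on this toy problem, but the stale reads make the asynchronous one oscillate before settling, which is one reason many classical ML algorithms are formulated synchronously.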
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
13
GPGPUs
14
• Graphics Processing Units for General Purpose
computations
[Chart: theoretical peak throughput and theoretical peak bandwidth, GPU vs. CPU]
• Library for writing “machine intelligence” algorithms
• Very popular for deep learning and neural networks
• Can also be used for general purpose numerical
computations
• Interface in C++ and Python
15
Google TensorFlow
Numerical dataflow with TensorFlow
16
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
session = tf.Session()
output_value = session.run(output, {x: 3, y: 5})
[Graph: inputs x: int32 and y: int32 feed node "mul 3", producing output z]
Numerical dataflow with Spark
df = sqlContext.createDataFrame(…)
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
output_df = tfs.map_rows(output, df)
output_df.collect()
df: DataFrame[x: int, y: int]
output_df: DataFrame[x: int, y: int, z: int]
[Graph: inputs x: int32 and y: int32 feed node "mul 3", producing output z]
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
18
19
It is a communication problem
[Diagram: conversions between the Spark worker process and the Python
worker process: Tungsten binary format → Java object → Python pickle →
Python pickle → C++ buffer]
20
TensorFrames: native embedding
of TensorFlow
[Diagram: all conversions stay inside the Spark worker process:
Tungsten binary format → Java object → C++ buffer]
• Estimation of a distribution
from samples
• Non-parametric
• Unknown bandwidth
parameter
• Can be evaluated with
goodness of fit
An example: kernel density scoring
21
• In practice, compute:
score(x) = log( (1/(N·b)) · Σ_k exp( −(x − z_k)² / (2·b²) ) )
with z_1, …, z_N the observed samples and b the bandwidth
• In a nutshell: a complex numerical function
An example: kernel density scoring
22
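The scoring function above can be sketched in a few lines of NumPy (a minimal illustration, not the TensorFrames implementation; `points` holds the hypothetical samples z_k and `b` is the bandwidth). It uses the same log-sum-exp trick as the UDFs on the next slides: subtracting the maximum before exponentiating keeps the computation numerically stable.

```python
import numpy as np

def score(x, points, b):
    """Log of the Gaussian kernel density estimate at x, up to the
    kernel's normalization constant, computed stably."""
    N = len(points)
    dis = -((x - points) ** 2) / (2.0 * b * b)   # log of each kernel term
    m = dis.max()                                # log-sum-exp trick
    return m - np.log(b * N) + np.log(np.exp(dis - m).sum())
```

Without the subtraction of `m`, a single large negative exponent per term would underflow to zero and the log would be meaningless.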
23
Speedup
[Chart: run time (sec) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def score(x: Double): Double = {
  val dis = points.map { z_k =>
    - (x - z_k) * (x - z_k) / (2 * b * b)
  }
  val minDis = dis.min
  val exps = dis.map(d => math.exp(d - minDis))
  minDis - math.log(b * N) + math.log(exps.sum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
24
Speedup
[Chart: run time (sec) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def score(x: Double): Double = {
  val dis = new Array[Double](N)
  var idx = 0
  while (idx < N) {
    val z_k = points(idx)
    dis(idx) = - (x - z_k) * (x - z_k) / (2 * b * b)
    idx += 1
  }
  val minDis = dis.min
  var expSum = 0.0
  idx = 0
  while (idx < N) {
    expSum += math.exp(dis(idx) - minDis)
    idx += 1
  }
  minDis - math.log(b * N) + math.log(expSum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
25
Speedup
[Chart: run time (sec) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def cost_fun(block, bandwidth):
    distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

sample = tfs.block(df, "sample")
score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
26
Speedup
[Chart: run time (sec) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def cost_fun(block, bandwidth):
    distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

with device("/gpu"):
    sample = tfs.block(df, "sample")
    score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
Demo: Deep dreams
27
Demo: Deep dreams
28
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
29
30
Improving communication
[Diagram: Spark worker process: Tungsten binary format, Java object,
C++ buffer; planned improvements: direct memory copy, columnar storage]
The future
• Integration with Tungsten:
• Direct memory copy
• Columnar storage
• Better integration with MLlib data types
• GPU instances in Databricks:
Official support coming this summer
31
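The columnar-storage point above can be sketched with a hedged stdlib-only example (the two-column schema here is made up for illustration): the same records stored row by row versus as one contiguous typed buffer per field.

```python
import array

# Row-oriented: one Python tuple per record, values interleaved.
rows = [(i, 0.5 * i) for i in range(4)]

# Columnar: one contiguous typed buffer per field. This is the layout
# that a direct memory copy into native (C++/GPU) code wants, with no
# per-record unboxing.
col_x = array.array("i", (r[0] for r in rows))   # int column
col_y = array.array("d", (r[1] for r in rows))   # double column
```

Each column is a flat machine-typed buffer (8 bytes per double in `col_y`), so handing it to TensorFlow is a single copy rather than a walk over per-row Java objects.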
Recap
• Spark: an efficient framework for running
computations on thousands of computers
• TensorFlow: a high-performance numerical
framework
• Get the best of both with TensorFrames:
• Simple API for distributed numerical computing
• Can leverage the hardware of the cluster
32
Try these demos yourself
• TensorFrames source code and documentation:
github.com/tjhunter/tensorframes
spark-packages.org/package/tjhunter/tensorframes
• Demo available later on Databricks
• The official TensorFlow website:
www.tensorflow.org
• More questions? Attending the Spark Summit?
We will hold office hours at the Databricks booth.
33
Thank you.
