SPARK SUMMIT
EUROPE2016
TensorFrames:
Google TensorFlow with Apache Spark
Timothée Hunter
Databricks, Inc.
About Databricks
2
Why Us
• Created Apache Spark to enable big
data use cases with a single engine.
• Contributes 75% of Spark’s code, 10x
more than others.
• Bring Spark to the enterprise: the
just-in-time data platform.
Our Product
• Fully managed platform powered by
Apache Spark.
• A unified solution for data science and
engineering teams.
About me
Software engineer at Databricks
Apache Spark contributor
Ph.D. in Machine Learning from UC Berkeley
(and Spark user since Spark 0.2)
3
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
4
Numerical computing for Data Science
• Queries are data-heavy
• However, algorithms are computation-heavy
• They operate on simple data types: integers, floats, doubles,
vectors, matrices
5
The case for speed
• Numerical bottlenecks are good targets for optimization
• Let data scientists get faster results
• Faster turnaround for experimentation
• How can we run these numerical algorithms faster?
6
Evolution of computing power
7
[Diagram: two scaling strategies. Scale out: “Failure is not an option: it is a fact.” Scale up: “When you can afford your dedicated chip” (GPGPU).]
Evolution of computing power
8
[Diagram: frameworks placed along the scale-out/scale-up axes: NLTK, Theano, Torch, Spark. Today’s talk: Spark + TensorFlow.]
Evolution of computing power
• Processor speed cannot keep up with memory and network
improvements
• Access to the processor is the new bottleneck
• Project Tungsten in Spark: leverage the processor’s heuristics for
executing code and fetching memory
• Does not account for the fact that the problem is numerical
9
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
10
GPGPUs
• Graphics Processing Units for General Purpose computations
11
[Bar charts: theoretical peak throughput (Tflops, single precision; GPU around 4.6) and theoretical peak bandwidth (GB/s), GPU vs. CPU; the GPU leads by a wide margin on both.]
Google TensorFlow
• Library for writing “machine intelligence” algorithms
• Very popular for deep learning and neural networks
• Can also be used for general purpose numerical
computations
• Interface in C++ and Python
12
Numerical dataflow with Tensorflow
13
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
session = tf.Session()
output_value = session.run(output, {x: 3, y: 5})
[Graph: placeholders x: int32 and y: int32; y feeds a “mul 3” node; the add node produces z.]
Numerical dataflow with Spark
df = sqlContext.createDataFrame(…)
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
output_df = tfs.map_rows(output, df)
output_df.collect()
df: DataFrame[x: int, y: int]
output_df:
DataFrame[x: int, y: int, z: int]
[Graph: the same dataflow as before, now applied to the DataFrame columns x and y to produce the new column z.]
Demo
15
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
16
It is a communication problem
17
[Diagram: the data path today. Spark worker process: Tungsten binary format → Java object → Python pickle; worker Python process: Python pickle → C++ buffer.]
TensorFrames: native embedding of TensorFlow
18
[Diagram: with TensorFrames everything stays in the Spark worker process: Tungsten binary format → Java object → C++ buffer.]
An example: kernel density scoring
• Estimates a distribution from
samples
• Non-parametric
• Unknown bandwidth parameter
• Can be evaluated with
goodness of fit
19
An example: kernel density scoring
• In practice, compute:

  score(x) = log( (1/(N*b)) * Σ_k exp( -(x - z_k)^2 / (2*b^2) ) )

  with b the bandwidth, N the number of samples, and z_k the sample points
• In a nutshell: a complex numerical function
20
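The formula above is what the implementations on the next slides compute. As a minimal standalone sketch in NumPy (the names `z` and `b` are hypothetical stand-ins for the sample points and the bandwidth):

```python
import numpy as np

def kde_score(x, z, b):
    """Log of the kernel density estimate at x, matching the slide's formula:
    log((1/(N*b)) * sum_k exp(-(x - z_k)^2 / (2*b^2))).
    The max exponent is subtracted before exponentiating so large
    negative exponents do not underflow (the log-sum-exp trick)."""
    d = -(x - z) ** 2 / (2 * b * b)   # log-kernel for each sample z_k
    m = d.max()                       # stable shift
    return m + np.log(np.exp(d - m).sum()) - np.log(b * len(z))

z = np.array([0.0, 1.0, 2.0])         # toy sample points
val = kde_score(1.0, z, b=0.5)
```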
Speedup
21
[Bar chart: normalized cost (lower is better) for Scala UDF, Scala UDF (optimized), TensorFrames, and TensorFrames + GPU.]
def score(x: Double): Double = {
  val dis = points.map { z_k => -(x - z_k) * (x - z_k) / (2 * b * b) }
  val minDis = dis.min
  val exps = dis.map(d => math.exp(d - minDis))
  minDis - math.log(b * N) + math.log(exps.sum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
Speedup
22
def score(x: Double): Double = {
  val dis = new Array[Double](N)
  var idx = 0
  while (idx < N) {
    val z_k = points(idx)
    dis(idx) = -(x - z_k) * (x - z_k) / (2 * b * b)
    idx += 1
  }
  val minDis = dis.min
  var expSum = 0.0
  idx = 0
  while (idx < N) {
    expSum += math.exp(dis(idx) - minDis)
    idx += 1
  }
  minDis - math.log(b * N) + math.log(expSum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
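Both Scala versions shift the exponents (by `minDis`) before calling `exp`, then add the shift back after the `log`. That is the usual guard against floating-point underflow in log-sum-exp; a small Python illustration with made-up values:

```python
import math

def log_sum_exp(ds):
    # log(sum(exp(d))) computed stably: shift by the max so the
    # largest term becomes exp(0) = 1 and nothing underflows.
    m = max(ds)
    return m + math.log(sum(math.exp(d - m) for d in ds))

ds = [-1000.0, -1001.0]
# The naive math.log(sum(math.exp(d) for d in ds)) fails here:
# exp(-1000) underflows to 0.0 and log(0.0) raises ValueError.
stable = log_sum_exp(ds)   # about -999.69
```

The same identity holds for any shift, which is why shifting by the minimum (as in the Scala code) or the maximum (as in the TensorFlow version on the next slide) both give the exact answer.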
Speedup
23
def cost_fun(block, bandwidth):
    distances = -square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

sample = tfs.block(df, "sample")
score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
Speedup
24
def cost_fun(block, bandwidth):
    distances = -square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

with device("/gpu"):
    sample = tfs.block(df, "sample")
    score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
25
Improving communication
26
[Diagram: planned path inside the Spark worker process: Tungsten binary format → C++ buffer via direct memory copy and columnar storage, skipping the Java object step.]
The future
• Integration with Tungsten:
– Direct memory copy
– Columnar storage
• Better integration with MLlib data types
27
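The intuition behind the columnar-storage item: storing each column contiguously lets a numerical kernel sweep one typed array instead of unpacking row objects one at a time. A toy sketch of the layout change (pure NumPy, data invented):

```python
import numpy as np

# Row-oriented: one Python tuple per record (roughly what a JVM row object is).
rows = [(1, 10.0), (2, 20.0), (3, 30.0)]

# Columnar: one contiguous, typed array per column.
xs = np.array([r[0] for r in rows], dtype=np.int64)
ys = np.array([r[1] for r in rows], dtype=np.float64)

# A kernel such as z = x + 3*y now runs vectorized over whole columns,
# and the buffers can be handed to native code without per-row conversion.
z = xs + 3 * ys
```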
Recap
• Spark: an efficient framework for running computations on
thousands of computers
• TensorFlow: high-performance numerical framework
• Get the best of both with TensorFrames:
– Simple API for distributed numerical computing
– Can leverage the hardware of the cluster
28
Try these demos yourself
• TensorFrames source code and documentation:
github.com/databricks/tensorframes
spark-packages.org/package/databricks/tensorframes
• Demo notebooks available on Databricks
• The official TensorFlow website:
www.tensorflow.org
29
Thank you.

Spark Summit EU talk by Tim Hunter
