SlideShare a Scribd company logo
Scaling Machine Learning To
Billions of Parameters
Badri Bhaskar, Erik Ordentlich
(joint with Andy Feng, Lee Yang, Peter Cnudde)
Yahoo, Inc.
Outline
• Large scale machine learning (ML)
• Spark + Parameter Server
– Architecture
– Implementation
• Examples:
- Distributed L-BFGS (Batch)
- Distributed Word2vec (Sequential)
• Spark + Parameter Server on Hadoop Cluster
LARGE SCALE ML
1.2 1.2 -78
6.3 -8.1
5.4 -8
4.2 2.3 -3.4
-1.1
2.3 4.9 7.4
4.5 2.1 -15
2.3 2.3
0.5 1.2 -0.9
-24
-1.3 -2.2 1.8
-4.9 -2.1 1.2
Web Scale ML
Billions of features
Hundredsofbillionsofexamples
Big Model
BigData
Ex:	Yahoo	word2vec	-	120	billion	parameters	and	500	billion	samples
1.2 1.2 -78
6.3 -8.1
5.4 -8
4.2 2.3 -3.4
-1.1
2.3 4.9 7.4
4.5 2.1 -15
2.3 2.3
0.5 1.2 -0.9
-24
-1.3 -2.2 1.8
-4.9 -2.1 1.2
Web Scale ML
Billions of features
Hundredsofbillionsofexamples
Big Model
BigData
Ex:	Yahoo	word2vec	-	120	billion	parameters	and	500	billion	samples
Store Store Store
1.2 1.2 -78
6.3 -8.1
5.4 -8
4.2 2.3 -3.4
-1.1
2.3 4.9 7.4
4.5 2.1 -15
2.3 2.3
0.5 1.2 -0.9
-24
-1.3 -2.2 1.8
-4.9 -2.1 1.2
Web Scale ML
Billions of features
Hundredsofbillionsofexamples
Big Model
BigData
Ex:	Yahoo	word2vec	-	120	billion	parameters	and	500	billion	samples
Worker
Worker
Worker
Store Store Store
1.2 1.2 -78
6.3 -8.1
5.4 -8
4.2 2.3 -3.4
-1.1
2.3 4.9 7.4
4.5 2.1 -15
2.3 2.3
0.5 1.2 -0.9
-24
-1.3 -2.2 1.8
-4.9 -2.1 1.2
Web Scale ML
Billions of features
Hundredsofbillionsofexamples
Big Model
BigData
Ex:	Yahoo	word2vec	-	120	billion	parameters	and	500	billion	samples
Worker
Worker
Worker
Store Store Store
Each example depends only on a tiny fraction of the model
Two Optimization Strategies
Model
Multiple epochs…
BATCH
Example: Gradient Descent, L-BFGS
Model
Model
Model
SEQUENTIAL
Multiple random samples…
Example: (Minibatch) stochastic gradient method,
perceptron
Examples
Two Optimization Strategies
Model
Multiple epochs…
BATCH
Example: Gradient Descent, L-BFGS
Model
Model
Model
SEQUENTIAL
Multiple random samples…
Example: (Minibatch) stochastic gradient method,
perceptron
• Small number of model updates
• Accurate
• Each epoch may be expensive.
• Easy to parallelize.
Examples
Two Optimization Strategies
Model
Multiple epochs…
BATCH
Example: Gradient Descent, L-BFGS
Model
Model
Model
SEQUENTIAL
Multiple random samples…
Example: (Minibatch) stochastic gradient method,
perceptron
• Small number of model updates
• Accurate
• Each epoch may be expensive.
• Easy to parallelize.
• Requires lots of model updates.
• Not as accurate, but often good enough
• A lot of progress in one pass* for big data.
• Not trivial to parallelize.
*also optimal in terms of generalization error (often with a lot of tuning)
Examples
Requirements
Requirements
✓ Support both batch and sequential optimization
Requirements
✓ Support both batch and sequential optimization
✓ Sequential training: Handle frequent updates to the model
Requirements
✓ Support both batch and sequential optimization
✓ Sequential training: Handle frequent updates to the model
✓ Batch training: 100+ passes each pass must be fast.
Parameter Server (PS)
Client
Data
Client
Data
Client
Data
Client
Data
Training state stored in PS shards, asynchronous updates
PS Shard PS ShardPS Shard
ΔM
Model Update
M
Model
Early work: Yahoo LDA by Smola and Narayanamurthy based on memcached (2010),
Introduced in Google’s Distbelief (2012), refined in Petuum / Bösen (2013), Mu Li et al (2014)
SPARK + PARAMETER SERVER
ML in Spark alone
Executor Executor
CoreCore Core Core Core
Driver
Holds model
ML in Spark alone
Executor Executor
CoreCore Core Core Core
Driver
Holds model
MLlib optimization
ML in Spark alone
• Sequential:
– Driver-based communication limits frequency of model updates.
– Large minibatch size limits model update frequency, convergence suffers.
Executor Executor
CoreCore Core Core Core
Driver
Holds model
MLlib optimization
ML in Spark alone
• Sequential:
– Driver-based communication limits frequency of model updates.
– Large minibatch size limits model update frequency, convergence suffers.
• Batch:
– Driver bandwidth can be a bottleneck
– Synchronous stage wise processing limits throughput.
Executor Executor
CoreCore Core Core Core
Driver
Holds model
MLlib optimization
ML in Spark alone
• Sequential:
– Driver-based communication limits frequency of model updates.
– Large minibatch size limits model update frequency, convergence suffers.
• Batch:
– Driver bandwidth can be a bottleneck
– Synchronous stage wise processing limits throughput.
Executor Executor
CoreCore Core Core Core
Driver
Holds model
MLlib optimization
PS Architecture circumvents both limitations…
Spark + Parameter Server
• Leverage Spark for HDFS I/O, distributed processing, fine-grained
load balancing, failure recovery, in-memory operations
Spark + Parameter Server
• Leverage Spark for HDFS I/O, distributed processing, fine-grained
load balancing, failure recovery, in-memory operations
• Use PS to sync models, incremental updates during training, or
sometimes even some vector math.
Spark + Parameter Server
• Leverage Spark for HDFS I/O, distributed processing, fine-grained
load balancing, failure recovery, in-memory operations
• Use PS to sync models, incremental updates during training, or
sometimes even some vector math.
Spark + Parameter Server
HDFS
Training state stored in PS shards
Driver
Executor ExecutorExecutor
CoreCore Core Core
PS Shard PS ShardPS Shard
control
control
Yahoo PS
Yahoo PS
Server
Client API
Yahoo PS
Server
Client API
GC
Preallocated
arrays
Yahoo PS
Server
Client API
GC
Preallocated
arrays
• In-memory
• Lock per key / Lock-free
• Sync / Async
K-V
Yahoo PS
Server
Client API
GC
Preallocated
arrays
• In-memory
• Lock per key / Lock-free
• Sync / Async
K-V
• Column-
partitioned
• Supports
BLAS
Yahoo PS
Server
Client API
GC
Preallocated
arrays
• In-memory
• Lock per key / Lock-free
• Sync / Async
K-V
• Column-
partitioned
• Supports
BLAS
• Export Model
• Checkpoint
HDFS
Yahoo PS
Server
Client API
GC
Preallocated
arrays
• In-memory
• Lock per key / Lock-free
• Sync / Async
K-V
• Column-
partitioned
• Supports
BLAS
• Export Model
• Checkpoint
HDFS
UDF
• Client supplied aggregation
• Custom shard operations
Map PS API
• Distributed key-value store abstraction
• Supports batched operations in addition to usual get and put
• Many operations return a future – you can operate asynchronously or block
Matrix PS API
• Vector math (BLAS style operations), in addition to everything Map API provides
• Increment and fetch sparse vectors (e.g., for gradient aggregation)
• We use other custom operations on shard (API not shown)
EXAMPLES
Sponsored Search Advertising
Sponsored Search Advertising
Sponsored Search Advertising
query user ad context
Features
Model
Example Click Model: (Logistic Regression)
L-BFGS Background
L-BFGS Background
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
L-BFGS Background
Exact, impractical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
L-BFGS Background
Exact, impractical
Step Size computation
- Needs to satisfy some technical (Wolfe) conditions
- Adaptively determined from data
Inverse Hessian Approximation
(based on history of L-previous gradients and model deltas)
Approximate, practical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
L-BFGS Background
Exact, impractical
Step Size computation
- Needs to satisfy some technical (Wolfe) conditions
- Adaptively determined from data
Inverse Hessian Approximation
(based on history of L-previous gradients and model deltas)
Approximate, practical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
L-BFGS Background
Exact, impractical
Step Size computation
- Needs to satisfy some technical (Wolfe) conditions
- Adaptively determined from data
Inverse Hessian Approximation
(based on history of L-previous gradients and model deltas)
Approximate, practical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
L-BFGS Background
Exact, impractical
Step Size computation
- Needs to satisfy some technical (Wolfe) conditions
- Adaptively determined from data
Inverse Hessian Approximation
(based on history of L-previous gradients and model deltas)
Approximate, practical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
dotprod
axpy (y ← ax + y)
copy
axpy
scal
scal
Vector Math
dotprod
L-BFGS Background
Exact, impractical
Step Size computation
- Needs to satisfy some technical (Wolfe) conditions
- Adaptively determined from data
Inverse Hessian Approximation
(based on history of L-previous gradients and model deltas)
Approximate, practical
Newton’s method
Gradient Descent
Using curvature information,
you can converge faster…
Executor ExecutorExecutor
HDFS HDFSHDFS
Driver
PS PS PS PS
Distributed LBFGS*
Compute gradient and loss
1. Incremental sparse gradient update
2. Fetch sparse portions of model
Coordinates executor
Step 1: Compute and update Gradient
*Our design is very similar to Sandblaster L-BFGS, Jeff Dean et al, Large Scale Distributed Deep Networks (2012)
state vectors
Executor ExecutorExecutor
HDFS HDFSHDFS
Driver
PS PS PS PS
Distributed LBFGS*
Compute gradient and loss
1. Incremental sparse gradient update
2. Fetch sparse portions of model
Coordinates executor
Step 1: Compute and update Gradient
*Our design is very similar to Sandblaster L-BFGS, Jeff Dean et al, Large Scale Distributed Deep Networks (2012)
state vectors
Executor ExecutorExecutor
HDFS HDFSHDFS
Driver
PS PS PS PS
Distributed LBFGS
Coordinates PS for
performing L-BFGS updates
Actual L-BFGS updates
(BLAS vector math)
Step 2: Build inverse Hessian Approximation
Executor ExecutorExecutor
HDFS HDFSHDFS
Driver
PS PS PS PS
Distributed LBFGS
Coordinates executor Compute directional derivatives
for parallel line search
Fetch sparse portions of modelStep 3: Compute losses and directional derivatives
Executor ExecutorExecutor
HDFS HDFSHDFS
Driver
PS PS PS PS
Distributed LBFGS
Performs line search and
model update
Model update (BLAS vector math)Step 4: Line search and model update
Speedup tricks
Speedup tricks
• Intersperse communication and computation
Speedup tricks
• Intersperse communication and computation
• Quicker convergence
– Parallel line search for step size
– Curvature for initial Hessian approximation*
*borrowed from vowpal wabbit
Speedup tricks
• Intersperse communication and computation
• Quicker convergence
– Parallel line search for step size
– Curvature for initial Hessian approximation*
• Network bandwidth reduction
– Compressed integer arrays
– Only store indices for binary data
*borrowed from vowpal wabbit
Speedup tricks
• Intersperse communication and computation
• Quicker convergence
– Parallel line search for step size
– Curvature for initial Hessian approximation*
• Network bandwidth reduction
– Compressed integer arrays
– Only store indices for binary data
• Matrix math on minibatch
*borrowed from vowpal wabbit
Speedup tricks
• Intersperse communication and computation
• Quicker convergence
– Parallel line search for step size
– Curvature for initial Hessian approximation*
• Network bandwidth reduction
– Compressed integer arrays
– Only store indices for binary data
• Matrix math on minibatch
0
750
1500
2250
3000
10
20
100
221612
2880
1260
96
MLlib
PS + Spark
1.6 x 108 examples, 100 executors, 10 cores
time(inseconds)perepoch
feature size (millions)
*borrowed from vowpal wabbit
Word Embeddings
Word Embeddings
Word Embeddings
v(paris) = [0.13, -0.4, 0.22, …., -0.45]
v(lion) = [-0.23, -0.1, 0.98, …., 0.65]
v(quark) = [1.4, 0.32, -0.01, …, 0.023]
...
Word2vec
Word2vec
Word2vec
• new techniques to
compute vector
representations of words
from corpus
Word2vec
• new techniques to
compute vector
representations of words
from corpus
• geometry of vectors
captures word semantics
Word2vec
Word2vec
• Skipgram with negative sampling:
Word2vec
• Skipgram with negative sampling:
– training set includes pairs of words and neighbors in corpus,
along with randomly selected words for each neighbor
Word2vec
• Skipgram with negative sampling:
– training set includes pairs of words and neighbors in corpus,
along with randomly selected words for each neighbor
– determine w → u(w),v(w) so that sigmoid(u(w)•v(w’)) is close
to (minimizes log loss) the probability that w’ is a neighbor of
w as opposed to a randomly selected word.
Word2vec
• Skipgram with negative sampling:
– training set includes pairs of words and neighbors in corpus,
along with randomly selected words for each neighbor
– determine w → u(w),v(w) so that sigmoid(u(w)•v(w’)) is close
to (minimizes log loss) the probability that w’ is a neighbor of
w as opposed to a randomly selected word.
– SGD involves computing many vector dot products e.g.,
u(w)•v(w’) and vector linear combinations
e.g., u(w) += α v(w’).
Word2vec Application at Yahoo
• Example training data:
gas_cap_replacement_for_car
slc_679f037df54f5d9c41cab05bfae0926
gas_door_replacement_for_car
slc_466145af16a40717c84683db3f899d0a fuel_door_covers
adid_c_28540527225_285898621262
slc_348709d73214fdeb9782f8b71aff7b6e autozone_auto_parts
adid_b_3318310706_280452370893 auoto_zone
slc_8dcdab5d20a2caa02b8b1d1c8ccbd36b
slc_58f979b6deb6f40c640f7ca8a177af2d
[ Grbovic, et. al. SIGIR 2015 and SIGIR 2016 (to appear) ]
Distributed Word2vec
Distributed Word2vec
• Needed system to train 200 million 300
dimensional word2vec model using minibatch
SGD
Distributed Word2vec
• Needed system to train 200 million 300
dimensional word2vec model using minibatch
SGD
• Achieved in a high throughput and network
efficient way using our matrix based PS server:
Distributed Word2vec
• Needed system to train 200 million 300
dimensional word2vec model using minibatch
SGD
• Achieved in a high throughput and network
efficient way using our matrix based PS server:
– Vectors don’t go over network.
Distributed Word2vec
• Needed system to train 200 million 300
dimensional word2vec model using minibatch
SGD
• Achieved in a high throughput and network
efficient way using our matrix based PS server:
– Vectors don’t go over network.
– Most compute on PS servers, with clients aggregating
partial results from shards.
Distributed Word2vec
Word2vec
learners
PS Shards
V1 row
V2 row
…
Vn row
HDFS
. . .
Distributed Word2vec
Word2vec
learners
PS Shards
V1 row
V2 row
…
Vn row
Each shard stores
a part of every vector
HDFS
. . .
Distributed Word2vec
Send word indices
and seeds
Word2vec
learners
PS Shards
V1 row
V2 row
…
Vn row
HDFS
. . .
Distributed Word2vec
Negative sampling,
compute u•v
Word2vec
learners
PS Shards
V1 row
V2 row
…
Vn row
HDFS
. . .
Distributed Word2vec
Word2vec
learners
PS Shards
Aggregate results &
compute lin. comb. coefficients (e.g., α…)
V1 row
V2 row
…
Vn row
HDFS
. . .
Distributed Word2vec
Word2vec
learners
PS Shards
V1 row
V2 row
…
Vn row
Update vectors
(v += αu, …)
HDFS
. . .
Distributed Word2vec
Distributed Word2vec
• Network lower by factor of #shards/dimension
compared to conventional PS based system
(1/20 to 1/100 for useful scenarios).
Distributed Word2vec
• Network lower by factor of #shards/dimension
compared to conventional PS based system
(1/20 to 1/100 for useful scenarios).
• Trains 200 million vocab, 55 billion word search
session in 2.5 days.
Distributed Word2vec
• Network lower by factor of #shards/dimension
compared to conventional PS based system
(1/20 to 1/100 for useful scenarios).
• Trains 200 million vocab, 55 billion word search
session in 2.5 days.
• In production for regular training in Yahoo search
ad serving system.
Other Projects using Spark + PS
• Online learning on PS
– Personalization as a Service
– Sponsored Search
• Factorization Machines
– Large scale user profiling
SPARK+PS ON HADOOP CLUSTER
Training Data on HDFS
HDFS
Launch PS Using Apache Slider on YARN
HDFS
YARN
Apache Slider
Parameter Servers
Launch Clients using Spark or Hadoop Streaming
API
HDFS
YARN
Apache Slider
Parameter Servers
Apache Spark
Clients
Parameter ServerClients
Training
HDFS
YARN
Apache SliderApache Spark
Model Export
HDFS
YARN
Apache Slider
Parameter Server
Apache Spark
Clients
Model Export
HDFS
YARN
Apache Slider
Parameter Server
Apache Spark
Clients
Summary
• Parameter server indispensable for big models
• Spark + Parameter Server has proved to be very
flexible platform for our large scale computing
needs
• Direct computation on the parameter servers
accelerate training for our use-cases
Thank you!
For more, contact bigdata@yahoo-inc.com.

More Related Content

What's hot

Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and Production
Databricks
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Databricks
 
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
Databricks
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
Spark Summit
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Databricks
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
Spark Summit
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
Databricks
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
Databricks
 
ROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlow
Databricks
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
Spark Summit
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Summit
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Jen Aman
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
Databricks
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
Spark Summit
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseBuild, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Databricks
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
Databricks
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedIn
Databricks
 

What's hot (20)

Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and Production
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
 
Operational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache SparkOperational Tips For Deploying Apache Spark
Operational Tips For Deploying Apache Spark
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
ROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlow
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseBuild, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedIn
 

Similar to Scaling Machine Learning To Billions Of Parameters

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
PAPIs.io
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
Apache Apex
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time Learning
Swiss Big Data User Group
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera, Inc.
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
James Serra
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADB
Databricks
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
Łukasz Grala
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
geetachauhan
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
jie cao
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
Avkash Chauhan
 
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_SparkSunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Mopuru Babu
 
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_SparkSunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Mopuru Babu
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
Amazon Web Services
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Sascha Wenninger
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Chris Fregly
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 

Similar to Scaling Machine Learning To Billions Of Parameters (20)

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time Learning
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADB
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_SparkSunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
Sunshine consulting Mopuru Babu CV_Java_J2ee_Spring_Bigdata_Scala_Spark
 
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_SparkSunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
Sunshine consulting mopuru babu cv_java_j2_ee_spring_bigdata_scala_Spark
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 

More from Jen Aman

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Jen Aman
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
Jen Aman
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
Jen Aman
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
Jen Aman
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
Jen Aman
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
Jen Aman
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
Jen Aman
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
Jen Aman
 

More from Jen Aman (20)

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 

Recently uploaded

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 

Recently uploaded (20)

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 

Scaling Machine Learning To Billions Of Parameters

  • 1. Scaling Machine Learning To Billions of Parameters Badri Bhaskar, Erik Ordentlich (joint with Andy Feng, Lee Yang, Peter Cnudde) Yahoo, Inc.
  • 2. Outline • Large scale machine learning (ML) • Spark + Parameter Server – Architecture – Implementation • Examples: - Distributed L-BFGS (Batch) - Distributed Word2vec (Sequential) • Spark + Parameter Server on Hadoop Cluster
  • 4. 1.2 1.2 -78 6.3 -8.1 5.4 -8 4.2 2.3 -3.4 -1.1 2.3 4.9 7.4 4.5 2.1 -15 2.3 2.3 0.5 1.2 -0.9 -24 -1.3 -2.2 1.8 -4.9 -2.1 1.2 Web Scale ML Billions of features Hundredsofbillionsofexamples Big Model BigData Ex: Yahoo word2vec - 120 billion parameters and 500 billion samples
  • 5. 1.2 1.2 -78 6.3 -8.1 5.4 -8 4.2 2.3 -3.4 -1.1 2.3 4.9 7.4 4.5 2.1 -15 2.3 2.3 0.5 1.2 -0.9 -24 -1.3 -2.2 1.8 -4.9 -2.1 1.2 Web Scale ML Billions of features Hundredsofbillionsofexamples Big Model BigData Ex: Yahoo word2vec - 120 billion parameters and 500 billion samples Store Store Store
  • 6. 1.2 1.2 -78 6.3 -8.1 5.4 -8 4.2 2.3 -3.4 -1.1 2.3 4.9 7.4 4.5 2.1 -15 2.3 2.3 0.5 1.2 -0.9 -24 -1.3 -2.2 1.8 -4.9 -2.1 1.2 Web Scale ML Billions of features Hundredsofbillionsofexamples Big Model BigData Ex: Yahoo word2vec - 120 billion parameters and 500 billion samples Worker Worker Worker Store Store Store
  • 7. 1.2 1.2 -78 6.3 -8.1 5.4 -8 4.2 2.3 -3.4 -1.1 2.3 4.9 7.4 4.5 2.1 -15 2.3 2.3 0.5 1.2 -0.9 -24 -1.3 -2.2 1.8 -4.9 -2.1 1.2 Web Scale ML Billions of features Hundredsofbillionsofexamples Big Model BigData Ex: Yahoo word2vec - 120 billion parameters and 500 billion samples Worker Worker Worker Store Store Store Each example depends only on a tiny fraction of the model
  • 8. Two Optimization Strategies Model Multiple epochs… BATCH Example: Gradient Descent, L-BFGS Model Model Model SEQUENTIAL Multiple random samples… Example: (Minibatch) stochastic gradient method, perceptron Examples
  • 9. Two Optimization Strategies Model Multiple epochs… BATCH Example: Gradient Descent, L-BFGS Model Model Model SEQUENTIAL Multiple random samples… Example: (Minibatch) stochastic gradient method, perceptron • Small number of model updates • Accurate • Each epoch may be expensive. • Easy to parallelize. Examples
  • 10. Two Optimization Strategies Model Multiple epochs… BATCH Example: Gradient Descent, L-BFGS Model Model Model SEQUENTIAL Multiple random samples… Example: (Minibatch) stochastic gradient method, perceptron • Small number of model updates • Accurate • Each epoch may be expensive. • Easy to parallelize. • Requires lots of model updates. • Not as accurate, but often good enough • A lot of progress in one pass* for big data. • Not trivial to parallelize. *also optimal in terms of generalization error (often with a lot of tuning) Examples
  • 12. Requirements ✓ Support both batch and sequential optimization
  • 13. Requirements ✓ Support both batch and sequential optimization ✓ Sequential training: Handle frequent updates to the model
  • 14. Requirements ✓ Support both batch and sequential optimization ✓ Sequential training: Handle frequent updates to the model ✓ Batch training: 100+ passes each pass must be fast.
  • 15. Parameter Server (PS) Client Data Client Data Client Data Client Data Training state stored in PS shards, asynchronous updates PS Shard PS ShardPS Shard ΔM Model Update M Model Early work: Yahoo LDA by Smola and Narayanamurthy based on memcached (2010), Introduced in Google’s Distbelief (2012), refined in Petuum / Bösen (2013), Mu Li et al (2014)
  • 17. ML in Spark alone Executor Executor CoreCore Core Core Core Driver Holds model
  • 18. ML in Spark alone Executor Executor CoreCore Core Core Core Driver Holds model MLlib optimization
  • 19. ML in Spark alone • Sequential: – Driver-based communication limits frequency of model updates. – Large minibatch size limits model update frequency, convergence suffers. Executor Executor CoreCore Core Core Core Driver Holds model MLlib optimization
  • 20. ML in Spark alone • Sequential: – Driver-based communication limits frequency of model updates. – Large minibatch size limits model update frequency, convergence suffers. • Batch: – Driver bandwidth can be a bottleneck – Synchronous stage wise processing limits throughput. Executor Executor CoreCore Core Core Core Driver Holds model MLlib optimization
  • 21. ML in Spark alone • Sequential: – Driver-based communication limits frequency of model updates. – Large minibatch size limits model update frequency, convergence suffers. • Batch: – Driver bandwidth can be a bottleneck – Synchronous stage wise processing limits throughput. Executor Executor CoreCore Core Core Core Driver Holds model MLlib optimization PS Architecture circumvents both limitations…
  • 23. • Leverage Spark for HDFS I/O, distributed processing, fine-grained load balancing, failure recovery, in-memory operations Spark + Parameter Server
  • 24. • Leverage Spark for HDFS I/O, distributed processing, fine-grained load balancing, failure recovery, in-memory operations • Use PS to sync models, incremental updates during training, or sometimes even some vector math. Spark + Parameter Server
  • 25. • Leverage Spark for HDFS I/O, distributed processing, fine-grained load balancing, failure recovery, in-memory operations • Use PS to sync models, incremental updates during training, or sometimes even some vector math. Spark + Parameter Server HDFS Training state stored in PS shards Driver Executor ExecutorExecutor CoreCore Core Core PS Shard PS ShardPS Shard control control
  • 29. Yahoo PS Server Client API GC Preallocated arrays • In-memory • Lock per key / Lock-free • Sync / Async K-V
  • 30. Yahoo PS Server Client API GC Preallocated arrays • In-memory • Lock per key / Lock-free • Sync / Async K-V • Column- partitioned • Supports BLAS
  • 31. Yahoo PS Server Client API GC Preallocated arrays • In-memory • Lock per key / Lock-free • Sync / Async K-V • Column- partitioned • Supports BLAS • Export Model • Checkpoint HDFS
  • 32. Yahoo PS Server Client API GC Preallocated arrays • In-memory • Lock per key / Lock-free • Sync / Async K-V • Column- partitioned • Supports BLAS • Export Model • Checkpoint HDFS UDF • Client supplied aggregation • Custom shard operations
  • 33. Map PS API • Distributed key-value store abstraction • Supports batched operations in addition to usual get and put • Many operations return a future – you can operate asynchronously or block
  • 34. Matrix PS API • Vector math (BLAS style operations), in addition to everything Map API provides • Increment and fetch sparse vectors (e.g., for gradient aggregation) • We use other custom operations on shard (API not shown)
  • 38. Sponsored Search Advertising query user ad context Features Model Example Click Model: (Logistic Regression)
  • 40. L-BFGS Background Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 41. L-BFGS Background Exact, impractical Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 42. L-BFGS Background Exact, impractical Step Size computation - Needs to satisfy some technical (Wolfe) conditions - Adaptively determined from data Inverse Hessian Approximation (based on history of L-previous gradients and model deltas) Approximate, practical Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 43. L-BFGS Background Exact, impractical Step Size computation - Needs to satisfy some technical (Wolfe) conditions - Adaptively determined from data Inverse Hessian Approximation (based on history of L-previous gradients and model deltas) Approximate, practical Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 44. L-BFGS Background Exact, impractical Step Size computation - Needs to satisfy some technical (Wolfe) conditions - Adaptively determined from data Inverse Hessian Approximation (based on history of L-previous gradients and model deltas) Approximate, practical Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 45. L-BFGS Background Exact, impractical Step Size computation - Needs to satisfy some technical (Wolfe) conditions - Adaptively determined from data Inverse Hessian Approximation (based on history of L-previous gradients and model deltas) Approximate, practical Newton’s method Gradient Descent Using curvature information, you can converge faster… dotprod axpy (y ← ax + y) copy axpy scal scal Vector Math dotprod
  • 46. L-BFGS Background Exact, impractical Step Size computation - Needs to satisfy some technical (Wolfe) conditions - Adaptively determined from data Inverse Hessian Approximation (based on history of L-previous gradients and model deltas) Approximate, practical Newton’s method Gradient Descent Using curvature information, you can converge faster…
  • 47. Executor ExecutorExecutor HDFS HDFSHDFS Driver PS PS PS PS Distributed LBFGS* Compute gradient and loss 1. Incremental sparse gradient update 2. Fetch sparse portions of model Coordinates executor Step 1: Compute and update Gradient *Our design is very similar to Sandblaster L-BFGS, Jeff Dean et al, Large Scale Distributed Deep Networks (2012) state vectors
  • 48. Executor ExecutorExecutor HDFS HDFSHDFS Driver PS PS PS PS Distributed LBFGS* Compute gradient and loss 1. Incremental sparse gradient update 2. Fetch sparse portions of model Coordinates executor Step 1: Compute and update Gradient *Our design is very similar to Sandblaster L-BFGS, Jeff Dean et al, Large Scale Distributed Deep Networks (2012) state vectors
  • 49. Executor ExecutorExecutor HDFS HDFSHDFS Driver PS PS PS PS Distributed LBFGS Coordinates PS for performing L-BFGS updates Actual L-BFGS updates (BLAS vector math) Step 2: Build inverse Hessian Approximation
  • 50. Executor ExecutorExecutor HDFS HDFSHDFS Driver PS PS PS PS Distributed LBFGS Coordinates executor Compute directional derivatives for parallel line search Fetch sparse portions of modelStep 3: Compute losses and directional derivatives
  • 51. Executor ExecutorExecutor HDFS HDFSHDFS Driver PS PS PS PS Distributed LBFGS Performs line search and model update Model update (BLAS vector math)Step 4: Line search and model update
  • 53. Speedup tricks • Intersperse communication and computation
  • 54. Speedup tricks • Intersperse communication and computation • Quicker convergence – Parallel line search for step size – Curvature for initial Hessian approximation* *borrowed from vowpal wabbit
  • 55. Speedup tricks • Intersperse communication and computation • Quicker convergence – Parallel line search for step size – Curvature for initial Hessian approximation* • Network bandwidth reduction – Compressed integer arrays – Only store indices for binary data *borrowed from vowpal wabbit
  • 56. Speedup tricks • Intersperse communication and computation • Quicker convergence – Parallel line search for step size – Curvature for initial Hessian approximation* • Network bandwidth reduction – Compressed integer arrays – Only store indices for binary data • Matrix math on minibatch *borrowed from vowpal wabbit
  • 57. Speedup tricks • Intersperse communication and computation • Quicker convergence – Parallel line search for step size – Curvature for initial Hessian approximation* • Network bandwidth reduction – Compressed integer arrays – Only store indices for binary data • Matrix math on minibatch 0 750 1500 2250 3000 10 20 100 221612 2880 1260 96 MLlib PS + Spark 1.6 x 108 examples, 100 executors, 10 cores time(inseconds)perepoch feature size (millions) *borrowed from vowpal wabbit
  • 60. Word Embeddings v(paris) = [0.13, -0.4, 0.22, …., -0.45] v(lion) = [-0.23, -0.1, 0.98, …., 0.65] v(quark) = [1.4, 0.32, -0.01, …, 0.023] ...
  • 63. Word2vec • new techniques to compute vector representations of words from corpus
  • 64. Word2vec • new techniques to compute vector representations of words from corpus • geometry of vectors captures word semantics
  • 66. Word2vec • Skipgram with negative sampling:
  • 67. Word2vec • Skipgram with negative sampling: – training set includes pairs of words and neighbors in corpus, along with randomly selected words for each neighbor
  • 68. Word2vec • Skipgram with negative sampling: – training set includes pairs of words and neighbors in corpus, along with randomly selected words for each neighbor – determine w → u(w),v(w) so that sigmoid(u(w)•v(w’)) is close to (minimizes log loss) the probability that w’ is a neighbor of w as opposed to a randomly selected word.
  • 69. Word2vec • Skipgram with negative sampling: – training set includes pairs of words and neighbors in corpus, along with randomly selected words for each neighbor – determine w → u(w),v(w) so that sigmoid(u(w)•v(w’)) is close to (minimizes log loss) the probability that w’ is a neighbor of w as opposed to a randomly selected word. – SGD involves computing many vector dot products e.g., u(w)•v(w’) and vector linear combinations e.g., u(w) += α v(w’).
  • 70. Word2vec Application at Yahoo • Example training data: gas_cap_replacement_for_car slc_679f037df54f5d9c41cab05bfae0926 gas_door_replacement_for_car slc_466145af16a40717c84683db3f899d0a fuel_door_covers adid_c_28540527225_285898621262 slc_348709d73214fdeb9782f8b71aff7b6e autozone_auto_parts adid_b_3318310706_280452370893 auoto_zone slc_8dcdab5d20a2caa02b8b1d1c8ccbd36b slc_58f979b6deb6f40c640f7ca8a177af2d [ Grbovic, et. al. SIGIR 2015 and SIGIR 2016 (to appear) ]
  • 72. Distributed Word2vec • Needed system to train 200 million 300 dimensional word2vec model using minibatch SGD
  • 73. Distributed Word2vec • Needed system to train 200 million 300 dimensional word2vec model using minibatch SGD • Achieved in a high throughput and network efficient way using our matrix based PS server:
  • 74. Distributed Word2vec • Needed system to train 200 million 300 dimensional word2vec model using minibatch SGD • Achieved in a high throughput and network efficient way using our matrix based PS server: – Vectors don’t go over network.
  • 75. Distributed Word2vec • Needed system to train 200 million 300 dimensional word2vec model using minibatch SGD • Achieved in a high throughput and network efficient way using our matrix based PS server: – Vectors don’t go over network. – Most compute on PS servers, with clients aggregating partial results from shards.
  • 76. Distributed Word2vec Word2vec learners PS Shards V1 row V2 row … Vn row HDFS . . .
  • 77. Distributed Word2vec Word2vec learners PS Shards V1 row V2 row … Vn row Each shard stores a part of every vector HDFS . . .
  • 78. Distributed Word2vec Send word indices and seeds Word2vec learners PS Shards V1 row V2 row … Vn row HDFS . . .
  • 79. Distributed Word2vec Negative sampling, compute u•v Word2vec learners PS Shards V1 row V2 row … Vn row HDFS . . .
  • 80. Distributed Word2vec Word2vec learners PS Shards Aggregate results & compute lin. comb. coefficients (e.g., α…) V1 row V2 row … Vn row HDFS . . .
  • 81. Distributed Word2vec Word2vec learners PS Shards V1 row V2 row … Vn row Update vectors (v += αu, …) HDFS . . .
  • 83. Distributed Word2vec • Network lower by factor of #shards/dimension compared to conventional PS based system (1/20 to 1/100 for useful scenarios).
  • 84. Distributed Word2vec • Network lower by factor of #shards/dimension compared to conventional PS based system (1/20 to 1/100 for useful scenarios). • Trains 200 million vocab, 55 billion word search session in 2.5 days.
  • 85. Distributed Word2vec • Network lower by factor of #shards/dimension compared to conventional PS based system (1/20 to 1/100 for useful scenarios). • Trains 200 million vocab, 55 billion word search session in 2.5 days. • In production for regular training in Yahoo search ad serving system.
  • 86. Other Projects using Spark + PS • Online learning on PS – Personalization as a Service – Sponsored Search • Factorization Machines – Large scale user profiling
  • 88. Training Data on HDFS HDFS
  • 89. Launch PS Using Apache Slider on YARN HDFS YARN Apache Slider Parameter Servers
  • 90. Launch Clients using Spark or Hadoop Streaming API HDFS YARN Apache Slider Parameter Servers Apache Spark Clients
  • 92. Model Export HDFS YARN Apache Slider Parameter Server Apache Spark Clients
  • 93. Model Export HDFS YARN Apache Slider Parameter Server Apache Spark Clients
  • 94. Summary • Parameter server indispensable for big models • Spark + Parameter Server has proved to be very flexible platform for our large scale computing needs • Direct computation on the parameter servers accelerate training for our use-cases
  • 95. Thank you! For more, contact bigdata@yahoo-inc.com.