Fast, Cheap and Deep: Scaling machine learning
Alex Smola
CMU & Marianas Labs
github.com/dmlc
SFW
This Talk in 3 Slides
(the systems view)
Parameter Server
[Diagram: each worker reads data (local or cloud), runs many parallel process steps against local state (or a copy of it), writes results back, and exchanges updates with the server.]
Multicore
[Diagram: the same read / process / write loop on a single machine, with the parameter server coordinating local state across cores.]
GPUs (for Deep Learning)
[Diagram: the same architecture with GPUs as the workers, synchronizing their local state through the parameter server.]
Details
• Parameter Server Basics: Logistic Regression (Classification)
• Memory Subsystem: Matrix Factorization (Recommender)
• Large Distributed State: Factorization Machines (CTR)
• GPUs: MXNET and applications
• Models
Estimate Click Through Rate

p(\underbrace{\text{click}}_{=:y} \mid \underbrace{\text{ad, query}}_{=:x}, w)
Click Through Rate (CTR): sparse models for advertising
• Linear function class: f(x) = \langle w, x \rangle
• Logistic regression: p(y \mid x, w) = \frac{1}{1 + \exp(-y \langle w, x \rangle)}
• Optimization problem: \operatorname{minimize}_w -\log p(f \mid X, Y), i.e.
  \operatorname{minimize}_w \sum_{i=1}^m \log(1 + \exp(-y_i \langle w, x_i \rangle)) + \lambda \|w\|_1
• Solve distributed over many machines (typically 1 TB to 1 PB of data)
Optimization Algorithm
• Compute gradient on data
• The l1 norm is nonsmooth, hence use the proximal operator
  \operatorname{argmin}_w \|w\|_1 + \frac{\lambda}{2} \|w - (w_t - \eta g_t)\|^2
• Updates for l1 are very simple (soft thresholding)
  w_i \leftarrow \operatorname{sgn}(w_i) \max(0, |w_i| - \epsilon)
• All steps decompose by coordinates
• Solve in parallel (and asynchronously)
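A minimal numpy sketch of this proximal step (soft thresholding); eta and eps are illustrative stand-ins for the learning rate and shrinkage threshold:

import numpy as np

def proximal_step(w, g, eta, eps):
    # gradient step followed by the closed-form proximal operator of the
    # l1 norm; both decompose by coordinate, so they parallelize trivially
    w = w - eta * g
    return np.sign(w) * np.maximum(0.0, np.abs(w) - eps)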
Parameter Server Template
• Compute gradient on (subset of data) on each client
• Send gradient from client to server asynchronously

push(key_list,value_list,timestamp)
• Proximal gradient update on server per coordinate
• Server returns parameters

pull(key_list,value_list,timestamp)
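A hypothetical worker loop fleshing out this template; the ps handle and compute_gradient are placeholders for illustration, and only push/pull mirror the interface above:

import numpy as np

def worker_loop(ps, batches, keys, num_iters):
    # ps: parameter-server handle exposing push/pull as in the template above
    w = np.zeros(len(keys))
    for t in range(num_iters):
        ps.pull(keys, w, t)                      # fetch current weights
        g = compute_gradient(w, next(batches))   # subgradient on local data (placeholder)
        ps.push(keys, g, t)                      # ship it asynchronously; no barrier
    # the server applies the proximal l1 update per coordinate as pushes arrive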
Smola & Narayanamurthy, 2010, VLDB
Gonzalez et al., 2012, WSDM
Dean et al., 2012, NIPS
Shervashidze et al., 2013, WWW
Google, Baidu, Facebook, Amazon, Yahoo, Microsoft
From Li et al., OSDI'14 (sparse logistic regression at scale):

minimize_{w \in \mathbb{R}^p} \sum_{i=1}^m \log(1 + \exp(-y_i \langle w, x_i \rangle)) + \lambda \|w\|_1

where the regularizer \|w\|_1 has the desirable property of controlling the number of non-zero entries in the optimal solution w^*, but its non-smoothness makes the objective hard to solve.

We compared the parameter server with two special-purpose systems developed by an Internet company. For privacy purposes, we name them System-A and System-B respectively. The former uses a variant of the well-known L-BFGS [21, 5], while the latter runs a variant of a block proximal gradient method [24], which updates a block of parameters at each iteration according to the first-order and diagonal second-order gradients. Both systems use a sequential consistency model, but are well optimized in both computation and communication.

We re-implemented the algorithm used by System-B on the parameter server and made two modifications. One is that we relax the consistency model to bounded delay. The other is a KKT filter to avoid sending gradients which may not affect the parameters. Specifically, let g_k be the global (first-order) gradient on feature k at iteration t. Then the corresponding parameter w_k will not be changed at this iteration if w_k = 0 and -\lambda \le g_k \le \lambda, due to the update rule. Therefore it is not necessary for workers to send g_k at this iteration. But a worker does not know the global g_k without communication; instead, we let worker i approximate g_k based on its local gradient g_k^i by \tilde{g}_k = c_k g_k^i / c_k^i, where c_k is the […]
[…] each one has 16 cores, 192 GB of memory, and they are connected by 10 Gb Ethernet. For the parameter server, we use 800 machines to form the worker group. Each worker caches around 1 billion parameters. The remaining 200 machines make up the server group, where each machine runs 10 (virtual) server nodes.
[Figure 8: Convergence of sparse logistic regression: objective value vs. time (hours) for System-A, System-B, and Parameter Server. The goal is to reach a small objective value in less time.]
We run these three systems to achieve the same objective value; the less time used the better. Both System-B and the parameter server use 500 blocks. In addition, the parameter server fixes τ = 4 for the bounded delay, which means each worker can execute 4 blocks in parallel.
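A small sketch of the KKT filter described in the excerpt, with the count-based rescaling written out (numpy; all names are illustrative):

import numpy as np

def kkt_filter_mask(w, g_local, c_global, c_local, lam):
    # Approximate the global gradient from the local one by rescaling with
    # global/local feature occurrence counts: g~_k = c_k * g^i_k / c^i_k.
    g_approx = c_global * g_local / np.maximum(c_local, 1)
    # A coordinate cannot move if w_k = 0 and the (approximate) gradient
    # lies in the dead zone [-lam, lam] of the l1 proximal update.
    inactive = (w == 0) & (np.abs(g_approx) <= lam)
    return ~inactive  # send only gradients that can actually change a parameter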
Solving it at scale
• 2014 - Li et al., OSDI'14
  • 500 TB data, 10^11 variables
  • local file system stores files
  • 1000 servers (corporate cloud)
  • 1 h time, 140 MB/s learning
• 2015 - online solver
  • 1.1 TB (Criteo), 8·10^8 variables, 4·10^9 samples
  • S3 stores files (no preprocessing) - better IO library
  • 5 machines (c4.8xlarge)
  • 1000 s time, 220 MB/s learning
Details
• Parameter Server Basics: Logistic Regression (Classification)
• Memory Subsystem: Matrix Factorization (Recommender)
• Large Distributed State: Factorization Machines (CTR)
• GPUs: MXNET and applications
• Models
Recommender Systems
• Users u, movies m (or projects)
• Function class: r_{um} = \langle v_u, w_m \rangle + b_u + b_m
• Loss function for recommendation (Yelp, Netflix):
  \sum_{u \sim m} \big( \langle v_u, w_m \rangle + b_u + b_m - y_{um} \big)^2
Recommender Systems
• Regularized objective
  \sum_{u \sim m} \big( \langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um} \big)^2 + \frac{\lambda}{2} \big[ \|U\|_{\text{Frob}}^2 + \|V\|_{\text{Frob}}^2 \big]
• Update operations
  v_u \leftarrow (1 - \eta_t \lambda) v_u - \eta_t w_m \big( \langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um} \big)
  w_m \leftarrow (1 - \eta_t \lambda) w_m - \eta_t v_u \big( \langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um} \big)
• Very simple SGD algorithm (random pairs)
• This should be cheap …
memory subsystem
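A numpy sketch of one such SGD step on a single (user, movie) pair, following the updates above (names illustrative):

import numpy as np

def sgd_step(vu, wm, bu, bm, b0, rum, eta, lam):
    err = vu @ wm + bu + bm + b0 - rum            # prediction error
    vu_new = (1 - eta * lam) * vu - eta * err * wm
    wm_new = (1 - eta * lam) * wm - eta * err * vu
    return vu_new, wm_new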
This should be cheap …
• O(md) burst reads and O(m) random reads
• Netflix dataset: m = 100 million ratings, d = 2048 dimensions, 30 steps
• Runtime should be > 4500 s
  • 60 GB/s memory bandwidth = 3300 s
  • 100 ns random reads = 1200 s
We get 560 s. Why?
Liu, Wang, Smola, RecSys 2015
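A back-of-envelope check of those two bounds; the assumptions (double precision, four factor-vector touches per rating: read and write of both v_u and w_m) are mine, chosen to reproduce the slide's numbers:

m, d, steps = 100e6, 2048, 30
bytes_per_value = 8                    # assume float64
touches = 4                            # read+write of v_u and w_m per rating

burst = m * steps * touches * d * bytes_per_value
print(burst / 60e9)                    # ~3300 s at 60 GB/s sequential bandwidth

random_accesses = m * steps * touches
print(random_accesses * 100e-9)        # ~1200 s at 100 ns per random access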
Power law in Collaborative Filtering
[Plot (Netflix dataset): number of movies vs. number of ratings per movie on log-log axes; movie item counts follow a power law.]
Key Ideas
• Stratify ratings by users
  (only 1 cache miss / read per user when out of core)
• Keep frequent movies in cache
  (stratify by blocks of movie popularity)
• Avoid false sharing between sockets
  (a key cached on the wrong CPU causes misses)
Key Ideas
[Diagrams: GraphChi partitioning vs. SC-SGD partitioning of the rating matrix.]
Speed (c4.8xlarge)
[Plots: seconds vs. number of dimensions (0-2000), comparing C-SGD, Fast SGLD, GraphChi, GraphLab, BIDMach. Left: Netflix, 100M ratings, 15 iterations; right: Yahoo, 250M ratings, 30 iterations. g2.8xlarge.]
Convergence
• GraphChi blocks (users, movies) into random groups
• Poor mixing
• Slow convergence
[Plot: test RMSE vs. runtime in seconds (30 epochs total) for C-SGD, GraphChi SGD, and Fast SGLD at k = 2048.]
Details
• Parameter Server Basics: Logistic Regression (Classification)
• Memory Subsystem: Matrix Factorization (Recommender)
• Large Distributed State: Factorization Machines (CTR)
• GPUs: MXNET and applications
• Models
A Linear Model is not enough
p(y \mid x, w)
Factorization Machines
• Linear model: f(x) = \langle w, x \rangle
• Polynomial expansion (Rendle, 2012):
  f(x) = \langle w, x \rangle + \sum_{i<j} x_i x_j \operatorname{tr}\big( V_i^{(2)} \otimes V_j^{(2)} \big) + \sum_{i<j<k} x_i x_j x_k \operatorname{tr}\big( V_i^{(3)} \otimes V_j^{(3)} \otimes V_k^{(3)} \big) + \dots
  (the embedding terms are a memory hog; too large for an individual machine)
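A numpy sketch of the second-order term, using Rendle's O(nd) identity instead of the quadratic sum over pairs (dense x for clarity; real CTR features are sparse):

import numpy as np

def fm_score(x, w, V):
    # f(x) = <w, x> + sum_{i<j} x_i x_j <V_i, V_j>, with the pairwise sum
    # rewritten as 0.5 * (||sum_i x_i V_i||^2 - sum_i x_i^2 ||V_i||^2)
    s = V.T @ x                                   # sum_i x_i V_i, shape (k,)
    pairwise = 0.5 * (s @ s - (x**2) @ (V**2).sum(axis=1))
    return w @ x + pairwise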
Prefetching to the rescue
• Most keys are infrequent (power law distribution)
• Prefetch the embedding vectors for a minibatch from the parameter server
• Compute gradients and push them to the server
• Variable dimensionality embedding
• Enforcing sparsity (ANOVA style)
• Adaptive gradient normalization
• Frequency adaptive regularization (CF style) - see the paper excerpt below
From the DiFacto paper (Li, Wang, Liu, Smola, WSDM'16), cropped excerpt:

[…] a minibatch. The consequence of this is [that parameters] associated with frequent features are less […] The penalty can be presented as

  \sum_i \big[ \lambda_i w_i^2 + \mu_i \|V_i\|_2^2 \big] \quad \text{with } \lambda_i \propto \mu_i. \qquad (11)

(Frequency Adaptive Dynamic Regularization) Assume that we solve the factorization machines problem with the following minibatch update setting: for each feature i,

  I_i = \mathbf{1}\{ x_{ji} \neq 0 \text{ for some } (x_j, y_j) \in B \} \qquad (12)
  w_i \leftarrow w_i - \eta_t \sum_{(x_j, y_j) \in B} \partial_{w_i} l(f(x_j), y_j) - \eta_t \lambda w_i I_i \qquad (13)
  V_i \leftarrow V_i - \eta_t \sum_{(x_j, y_j) \in B} \partial_{V_i} l(f(x_j), y_j) - \eta_t \mu V_i I_i \qquad (14)

where B is the minibatch and b the corresponding size, and I_i indicates whether or not feature i appears in this minibatch. The […] regularization is given by […]
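A numpy sketch of updates (13)-(14) for one minibatch; the loss gradients are taken as given and all names are illustrative:

import numpy as np

def difacto_minibatch_update(w, V, grad_w, grad_V, appears, eta, lam, mu):
    # appears is the indicator I_i: 1 for features occurring in the minibatch.
    # Shrinkage is applied only to features that appear, so frequent features
    # are regularized more often (frequency-adaptive regularization).
    w -= eta * (grad_w + lam * w * appears)
    V -= eta * (grad_V + mu * V * appears[:, None])
    return w, V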
[Figure 1: An illustration of asynchronous SGD. Workers W1 and W2 first pull weights from the servers at time t = 1. Next W1 pushes its gradient back and thereby increments the timestamp at the servers. Worker W3 pulls the weights immediately after that. Then W2 and W3 push their gradients back. The delays for W1, W2, and W3 are 0, 1, and 1.]

[…] In this paper, we consider asynchronous stochastic gradient descent (SGD) [8] for optimization.
Better Models
[Plot (Criteo 1TB): relative logloss (%) vs. dimension k (log scale) for no memory adaptation, frequency, and frequency + l1 shrinkage. A fixed small k is what everyone else does.]
Faster Solver (small Criteo)
[Plot: test logloss vs. time (sec, log scale) for LibFM, DiFacto with 1 worker, and DiFacto with 10 workers. LibFM died on large models.]
Multiple Machines
[Plot: speedup vs. number of machines (up to 20) on Criteo2 and CTR2.]
Li, Wang, Liu, Smola, WSDM'16
80x LibFM speed on 16 machines
Details
• Parameter Server Basics: Logistic Regression (Classification)
• Memory Subsystem: Matrix Factorization (Recommender)
• Large Distributed State: Factorization Machines (CTR)
• GPUs: MXNET and applications
• Data & Models
MXNET - Distributed Deep Learning
• Several deep learning frameworks
  • Torch7: matlab-like tensor computation in Lua
  • Theano: symbolic programming in Python
  • Caffe: configuration in Protobuf
• Problem - you need to learn another system for each different programming flavor
• MXNET
  • provides programming interfaces in a consistent way
  • multiple languages: C++/Python/R/Julia/…
  • fast, memory-efficient, and distributed training
Programming Interfaces
• Tensor Algebra Interface
  • NumPy-compatible
  • GPU/CPU/ARM (mobile)
• Symbolic Expression
  • easy deep network design
  • automatic differentiation
• Mixed programming
  • CPU/GPU
  • symbolic + local code
[Code screenshots: the same example in Python and Julia.]
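A minimal sketch of the NumPy-like tensor interface, assuming the 2015-era mx.nd API:

import mxnet as mx

a = mx.nd.ones((2, 3), ctx=mx.cpu())   # mx.gpu(0) would place it on a GPU
b = a * 2 + 1                          # operators are queued asynchronously
print(b.asnumpy())                     # asnumpy() blocks and copies the result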
from data import ilsvrc12_iterator
import mxnet as mx
## define alexnet
input_data = mx.symbol.Variable(name="data")
# stage 1
conv1 = mx.symbol.Convolution(data=input_data, kernel=(11, 11), stride=(4, 4), num_filter=96)
relu1 = mx.symbol.Activation(data=conv1, act_type="relu")
pool1 = mx.symbol.Pooling(data=relu1, pool_type="max", kernel=(3, 3), stride=(2,2))
lrn1 = mx.symbol.LRN(data=pool1, alpha=0.0001, beta=0.75, knorm=1, nsize=5)
# stage 2
conv2 = mx.symbol.Convolution(data=lrn1, kernel=(5, 5), pad=(2, 2), num_filter=256)
relu2 = mx.symbol.Activation(data=conv2, act_type="relu")
pool2 = mx.symbol.Pooling(data=relu2, kernel=(3, 3), stride=(2, 2), pool_type="max")
lrn2 = mx.symbol.LRN(data=pool2, alpha=0.0001, beta=0.75, knorm=1, nsize=5)
# stage 3
conv3 = mx.symbol.Convolution(data=lrn2, kernel=(3, 3), pad=(1, 1), num_filter=384)
relu3 = mx.symbol.Activation(data=conv3, act_type="relu")
conv4 = mx.symbol.Convolution(data=relu3, kernel=(3, 3), pad=(1, 1), num_filter=384)
relu4 = mx.symbol.Activation(data=conv4, act_type="relu")
conv5 = mx.symbol.Convolution(data=relu4, kernel=(3, 3), pad=(1, 1), num_filter=256)
relu5 = mx.symbol.Activation(data=conv5, act_type="relu")
pool3 = mx.symbol.Pooling(data=relu5, kernel=(3, 3), stride=(2, 2), pool_type="max")
# stage 4
flatten = mx.symbol.Flatten(data=pool3)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096)
relu6 = mx.symbol.Activation(data=fc1, act_type="relu")
dropout1 = mx.symbol.Dropout(data=relu6, p=0.5)
# stage 5
fc2 = mx.symbol.FullyConnected(data=dropout1, num_hidden=4096)
relu7 = mx.symbol.Activation(data=fc2, act_type="relu")
dropout2 = mx.symbol.Dropout(data=relu7, p=0.5)
# stage 6
fc3 = mx.symbol.FullyConnected(data=dropout2, num_hidden=1000)
softmax = mx.symbol.SoftmaxOutput(data=fc3, name='softmax')
## data
batch_size = 256
train, val = ilsvrc12_iterator(batch_size=batch_size, input_shape=(3,224,224))
## train
num_gpus = 4
gpus = [mx.gpu(i) for i in range(num_gpus)]
model = mx.model.FeedForward(
    ctx = gpus,
    symbol = softmax,
    num_round = 20,
    learning_rate = 0.01,
    momentum = 0.9,
    wd = 0.00001)
model.fit(X = train, eval_data = val, batch_end_callback = mx.callback.Speedometer(batch_size=batch_size))
(slide callouts: the layer definitions, the multi-GPU context, and the training call)
Engine
• All operations are issued into a backend C++ engine, so the binding language's performance does not matter
• The engine builds the read/write dependency graph
  • lazy evaluation
  • parallel execution
  • sophisticated memory allocation
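A small illustration of that execution model, again assuming the 2015-era API:

import mxnet as mx

a = mx.nd.ones((1000, 1000))
b = mx.nd.dot(a, a)        # returns immediately; the op is queued in the engine
c = b + 1                  # the engine records that c depends on b
print(c.asnumpy()[0, 0])   # a blocking read forces the chain to execute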
Distributed Deep Learning
Plays well with S3, spot instances & containers
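In the 2015-era API, distributed training was selected via the kvstore argument; a hedged sketch (symbol and data iterators as in the AlexNet example above):

import mxnet as mx

# 'dist_sync' does synchronous SGD across machines; 'dist_async' overlaps
# communication with computation at the cost of bounded staleness
kv = mx.kvstore.create('dist_async')
model = mx.model.FeedForward(ctx=[mx.gpu(i) for i in range(4)],
                             symbol=softmax, num_round=20,
                             learning_rate=0.01)
model.fit(X=train, eval_data=val, kvstore=kv)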
Results
• Imagenet datasets (1M images, 1k classes)
• Google Inception network (88 lines of code)
• 40% faster due to parallelization
  [Plot: images/sec vs. number of GPUs (1, 2, 4) for CXXNet and MXNet.]
• 50% GPU memory reduction
  [Plot: memory usage (GB) under naive, inplace, sharing, and both strategies.]
Distributed Results
[Plots: per-machine bandwidth (Mbps) and speedup for 10 machines against the g2.2xlarge network limit; convergence (accuracy vs. time in hours) for a single GPU vs. 4 x dual GTX980.]
Amazon g2.8xlarge
• 12 instances (48 GPUs) @ $0.50/h spot
• Minibatch size 512
• BSP with 1 delay between machines
• 2 GB/s bandwidth between machines (awful)
  10.113.170.187, 10.157.109.227, 10.169.170.55, 10.136.52.151, 10.45.64.250, 10.166.137.100, 10.97.167.10, 10.97.187.157, 10.61.128.107, 10.171.105.160, 10.203.143.220, 10.45.71.20 (all over the place in the availability zone)
• Compressing to 1 byte per coordinate helps a bit but adds latency due to an extra pass (need to fix)
• 37x speedup on 48 GPUs
• Imagenet'12 dataset trained in 4h, i.e. $24 (with alexnet; googlenet even better for the network)
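A sketch of 1-byte-per-coordinate gradient compression (simple linear quantization; this particular scheme is my illustration, not necessarily the one used):

import numpy as np

def quantize_1byte(g):
    scale = np.abs(g).max() / 127.0 + 1e-12   # the extra pass that adds latency
    q = np.clip(np.round(g / scale), -127, 127).astype(np.int8)
    return q, scale                            # 1 byte/coordinate plus one float

def dequantize(q, scale):
    return q.astype(np.float32) * scale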
Mobile, too
• 10 hours on 10 GTX980
• Deployed on Android
Plans
• This month (went live 2 months ago)
  • 700 stars
  • 27 authors with >1k commits
  • ranks 5th among trending C++ projects on GitHub
• Build a larger community
• Many more applications & platforms
  • image/video analysis, voice recognition
  • language models, machine translation
  • Intel drivers
Side-effects of using MXNET
Zichao was using Theano
Tensorflow vs. MXNET

             Languages               MultiGPU   Distributed   Mobile   Runtime Engine
Tensorflow   Python                  Yes        No            Yes      ?
MXNET        Python, R, Julia, Go    Yes        Yes           Yes      Yes

(Googlenet, E5-1650/980)

         Tensorflow           Torch7   Caffe   MXNET
Time     940ms                172ms    170ms   180ms
Memory   all (OOM after 24)   2.1GB    2.2GB   1.6GB
More numbers (this morning)

             Googlenet, batch 32 (24)            Alexnet, batch 128   VGG, batch 32 (24)
Torch7       172ms / 2.1GB                       135ms / 1.6GB        459ms / 3.4GB
Caffe        170ms / 2.2GB                       165ms / 1.34GB       448ms / 3.4GB
MXNET        172ms / 1.5GB (new and improved)    147ms / 1.38GB       456ms / 3.6GB
Tensorflow   940ms / -                           433ms / -            980ms / -
Details
• Parameter Server Basics: Logistic Regression (Classification)
• Memory Subsystem: Matrix Factorization (Recommender)
• Large Distributed State: Factorization Machines (CTR)
• GPUs: MXNET and applications
• Models
Models
• Fully connected networks (generic data)

(70s and 80s … trivial to scale, see e.g. CAFFE)
• Convolutional networks (1D, 2D grids)

(90s - images, 2014 - text … see e.g. CUDNN)
• Recurrent networks (time series, text)

(90s - LSTM - latent variable autoregressive)
• Memory
• Attention model (Cho et al, 2014)
• Memory model (Weston et al, 2014)
• Memory rewriting - NTM (Graves et al, 2014)
Memory Networks (Sukhbaatar, Szlam, Fergus, Weston, 2015)
[Diagram: a query attends over an attention layer and a state layer built from memories.]
Memory Networks (Sukhbaatar, Szlam, Fergus, Weston, 2015)
• Query weirdly defined (via a bilinear embedding matrix)
• Works on facts / text only (ingest a sequence and weigh it)
• 'Ask me anything' by Socher et al., 2015 (uses GRUs rather than LSTMs)
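A numpy sketch of a single memory-network hop as described above (shapes and names are illustrative):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memory_hop(q, M, C):
    # q: query embedding (d,); M: input memories (n, d); C: output memories (n, d)
    p = softmax(M @ q)     # attention weights over the n memories
    o = C.T @ p            # weighted read from the output embeddings
    return q + o           # new state, fed to the next hop or the answer layer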
Memory networks for images
http://arxiv.org/abs/1511.02274 (Yang et al, 2015)
Results
Best results on
• DAQUAR
• DAQUAR-REDUCED
• COCO-QA
Summary
• Parameter Server Basics: Logistic Regression
• Large Distributed State: Factorization Machines
• Memory Subsystem: Matrix Factorization
• GPUs: MXNET and applications
• Models
[Diagram: the parameter server architecture from the opening slides; workers read data, process, update local state, and synchronize with the servers.]
github.com/dmlc
Many thanks to
• CMU & Marianas
• Mu Li
• Ziqi Liu
• Yu-Xiang Wang
• Zichao Zhang
• Microsoft
• Xiaodong He
• Jianfeng Gao
• Li Deng
• MXNET
• Tianqi Chen (UW)
• Bing Xu
• Naiyang Wang
• Minjie Wang
• Tianjun Xiao
• Jianpeng Li
• Jiaxing Zhang
• Chuntao Hong

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 

Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon University at MLconf SF - 11/13/15

  • 1. Marianas Labs Alex Smola CMU & Marianas Labs github.com/dmlc Fast, Cheap and Deep Scaling machine learning SFW
  • 2. Marianas Labs This Talk in 3 Slides (the systems view)
  • 3. Marianas Labs Parameter Server [Diagram: several workers each read data (local or cloud), process it in parallel, update local state (or a copy), and write results back; server machines hold the shared state.]
  • 4. Marianas Labs Parameter Server: Multicore [Diagram: the same read / process / update / write pipeline on a single multicore machine.]
  • 5. Marianas Labs Parameter Server: GPUs (for Deep Learning) [Diagram: the same pipeline with GPUs doing the processing.]
  • 6. Marianas Labs Details • Parameter Server Basics
 Logistic Regression (Classification) • Memory Subsystem
 Matrix Factorization (Recommender) • Large Distributed State
 Factorization Machines (CTR) • GPUs
 MXNET and applications • Models
  • 7. Marianas Labs Estimate Click Through Rate $p(\underbrace{\text{click}}_{=:y} \mid \underbrace{\text{ad, query}}_{=:x}, w)$
  • 8. Marianas Labs sparse models for advertising Click Through Rate (CTR) • Linear function class $f(x) = \langle w, x \rangle$ • Logistic regression $p(y|x, w) = \frac{1}{1 + \exp(-y \langle w, x \rangle)}$ • Optimization Problem: $\operatorname{minimize}_w\; -\log p(f|X, Y)$, i.e. $\operatorname{minimize}_w \sum_{i=1}^m \log(1 + \exp(-y_i \langle w, x_i \rangle)) + \lambda \|w\|_1$ • Solve distributed over many machines (typically 1TB to 1PB of data)
  • 9. Marianas Labs Optimization Algorithm • Compute gradient on data • l1 norm is nonsmooth, hence proximal operator $\operatorname{argmin}_w \lambda \|w\|_1 + \frac{1}{2} \|w - (w_t - \eta g_t)\|^2$ • Updates for l1 are very simple: $w_i \leftarrow \operatorname{sgn}(w_i) \max(0, |w_i| - \epsilon)$ • All steps decompose by coordinates • Solve in parallel (and asynchronously; see the sketch below)
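To make the soft-thresholding update concrete, here is a minimal NumPy sketch of one proximal gradient step for the l1-regularized logistic regression above; the function and parameter names (prox_step, eta, lam) are illustrative, not from the talk.

import numpy as np

def prox_step(w, X, y, eta=0.1, lam=0.01):
    # One proximal gradient step for l1-regularized logistic regression.
    # w: (d,) weights, X: (m, d) features, y: (m,) labels in {-1, +1}.
    margins = y * (X @ w)
    # Gradient of sum_i log(1 + exp(-y_i <w, x_i>))
    g = -(X.T @ (y / (1.0 + np.exp(margins))))
    # Gradient step, then the coordinate-wise proximal (soft-thresholding)
    # operator: w_i <- sgn(w_i) max(0, |w_i| - eps) with eps = eta * lam.
    w = w - eta * g
    return np.sign(w) * np.maximum(0.0, np.abs(w) - eta * lam)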
  • 10. Marianas Labs Parameter Server Template • Compute gradient on (a subset of) the data on each client • Send gradient from client to server asynchronously push(key_list,value_list,timestamp) • Proximal gradient update on server per coordinate • Server returns parameters pull(key_list,value_list,timestamp) (worker loop sketched below) Smola & Narayanamurthy, 2010, VLDB; Gonzalez et al., 2012, WSDM; Dean et al., 2012, NIPS; Shervashidze et al., 2013, WWW. Google, Baidu, Facebook, Amazon, Yahoo, Microsoft
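A minimal sketch of the worker side of this template, assuming a hypothetical client handle ps whose push/pull mirror the interface named above (the data_shard and compute_gradient helpers are likewise illustrative):

def worker_loop(ps, data_shard, num_iters):
    for t in range(num_iters):
        batch = data_shard.next_batch()
        keys = batch.feature_ids()          # only features present in the batch
        w = ps.pull(keys, timestamp=t)      # fetch current weights
        grad = compute_gradient(w, batch)   # local gradient on this shard
        ps.push(keys, grad, timestamp=t)    # async send; the server applies
                                            # the proximal update per coordinate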
  • 11. Marianas Labs [Excerpt from the parameter server evaluation, Li et al., OSDI'14:] $\min_{w \in \mathbb{R}^p} \sum_{i=1} \dots$ where the regularizer $\|w\|_1$ has a desirable property: it controls the number of non-zero entries in the optimal solution $w^*$, but its non-smoothness makes this objective function hard to solve. We compared the parameter server with two special-purpose systems developed by an Internet company. For privacy we name them System-A and System-B respectively. The former uses a variant of the well-known L-BFGS [21, 5], while the latter runs a variant of a block proximal gradient method [24], which updates a block of parameters at each iteration according to the first-order and diagonal second-order gradients. Both systems use a sequential consistency model, but are well optimized in both computation and communication. We re-implemented the algorithm used by System-B on the parameter server, with two modifications: we relax the consistency model to bounded delay, and we add a KKT filter to avoid sending gradients that do not affect the parameters. Specifically, let $g_k$ be the global (first-order) gradient on feature $k$ at iteration $t$. Then the corresponding parameter $w_k$ will not be changed at this iteration if $w_k = 0$ and $-\lambda \le g_k \le \lambda$, due to the update rule; it is therefore not necessary for workers to send $g_k$ at this iteration. But a worker does not know the global $g_k$ without communication; instead we let worker $i$ approximate $g_k$ from its local gradient $g_k^i$ by $\tilde g_k = c_k g_k^i / c_k^i$, where $c_k$ is the … (filter sketched below). Each machine has 16 cores, 192GB memory, connected by 10GB Ethernet. For the parameter server we use 800 machines to form the worker group; each worker caches around 1 billion parameters. The remaining 200 machines make up the server group, where each machine runs 10 (virtual) server nodes. [Figure 8: Convergence of sparse logistic regression; objective value vs. time (hours) for System-A, System-B, and Parameter Server. The goal is to achieve a small objective value in less time.] We run these three systems to achieve the same objective value; the less time used the better. Both systems and the parameter server use 500 blocks. In addition, the parameter server fixes $\tau = 4$ for the bounded delay, which means each worker can execute 4 blocks in parallel. Solving it at scale • 2014 - Li et al., OSDI'14 • 500 TB data, $10^{11}$ variables • Local file system stores files • 1000 servers (corp cloud) • 1h time, 140 MB/s learning • 2015 - Online solver • 1.1 TB (Criteo), $8 \cdot 10^8$ variables, $4 \cdot 10^9$ samples • S3 stores files (no preprocessing) - better IO library • 5 machines (c4.8xlarge) • 1000s time, 220 MB/s learning
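A minimal sketch of the KKT filter logic from the excerpt, under the assumption that the threshold is the l1 weight lam and that c_global/c_local stand in for the $c_k$ and $c_k^i$ counts; all names are illustrative:

def kkt_filter(w, g_local, lam, c_global, c_local):
    # Skip coordinates whose proximal update cannot move: if w_k = 0 and
    # the approximated global gradient lies in [-lam, lam], w_k stays 0,
    # so the worker need not send g_k.
    send = {}
    for k, g in g_local.items():
        g_approx = g * c_global[k] / c_local[k]   # rescale local -> global
        if w.get(k, 0.0) == 0.0 and -lam <= g_approx <= lam:
            continue                              # filtered out
        send[k] = g
    return send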
  • 12. Marianas Labs Details • Parameter Server Basics
 Logistic Regression (Classification) • Memory Subsystem
 Matrix Factorization (Recommender) • Large Distributed State
 Factorization Machines (CTR) • GPUs
 MXNET and applications • Models
  • 13. Marianas Labs Recommender Systems • Users u, movies m (or projects) • Function class $r_{um} = \langle v_u, w_m \rangle + b_u + b_m$ • Loss function for recommendation (Yelp, Netflix) $\sum_{u \sim m} (\langle v_u, w_m \rangle + b_u + b_m - y_{um})^2$
  • 14. Marianas Labs Recommender Systems • Regularized Objective $\sum_{u \sim m} (\langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um})^2 + \frac{\lambda}{2} \left[ \|U\|^2_{\text{Frob}} + \|V\|^2_{\text{Frob}} \right]$ • Update operations $v_u \leftarrow (1 - \eta_t \lambda) v_u - \eta_t w_m (\langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um})$ and $w_m \leftarrow (1 - \eta_t \lambda) w_m - \eta_t v_u (\langle v_u, w_m \rangle + b_u + b_m + b_0 - r_{um})$ • Very simple SGD algorithm (random pairs; sketched below) • This should be cheap … but it is bound by the memory subsystem
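A minimal sketch of these updates over (user, movie, rating) triples; learning rate, regularizer, and names are illustrative (V, W, bu, bm are NumPy arrays supplied by the caller):

def sgd_epoch(V, W, bu, bm, ratings, eta=0.01, lam=0.05, b0=0.0):
    # V[u], W[m] are the embedding rows v_u and w_m from the objective above.
    for u, m, r in ratings:
        err = V[u] @ W[m] + bu[u] + bm[m] + b0 - r
        vu = V[u].copy()                          # save before overwriting
        V[u] = (1 - eta * lam) * V[u] - eta * err * W[m]
        W[m] = (1 - eta * lam) * W[m] - eta * err * vu
        bu[u] -= eta * err                        # bias updates
        bm[m] -= eta * err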
  • 15. Marianas Labs This should be cheap … • O(md) burst reads and O(m) random reads • Netflix dataset 
 m = 100 million, d = 2048 dimensions, 30 steps • Runtime should be > 4500s • 60 GB/s memory bandwidth = 3300s • 100 ns random reads = 1200s
 We get 560s. Why? Liu, Wang, Smola, RecSys 2015
  • 16. Marianas Labs Power law in Collaborative Filtering [Figure: log-log histogram of the Netflix dataset; the number of movies vs. the number of ratings per movie follows a power law.]
  • 17. Marianas Labs Key Ideas • Stratify ratings by users 
 (only 1 cache miss / read per user / out of core) • Keep frequent movies in cache
 (stratify by blocks of movie popularity) • Avoid false sharing between sockets
 (a key cached on the wrong CPU causes a miss; see the sketch below)
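A toy sketch of the stratification idea, assuming ratings arrive as (user, movie, rating) triples; the hot_fraction cutoff and all names are illustrative, not the paper's exact scheme:

from collections import defaultdict

def stratify(ratings, movie_counts, hot_fraction=0.1):
    # Keep the most-rated movies in a cache-resident 'hot' set and group
    # ratings by user, so a pass touches each user's vector once and hot
    # movie vectors stay in cache.
    n_hot = max(1, int(len(movie_counts) * hot_fraction))
    hot = {m for m, _ in sorted(movie_counts.items(),
                                key=lambda kv: -kv[1])[:n_hot]}
    by_user = defaultdict(lambda: ([], []))       # (hot, cold) ratings
    for u, m, r in ratings:
        by_user[u][0 if m in hot else 1].append((m, r))
    return by_user, hot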
  • 20. Marianas Labs Speed (c4.8xlarge) [Figure: seconds vs. number of dimensions (up to 2048) for C-SGD, Fast SGLD, GraphChi, GraphLab, and BIDMach. Left: Netflix, 100M ratings, 15 iterations; right: Yahoo, 250M ratings, 30 iterations (g2.8xlarge).]
  • 21. Marianas Labs Convergence • GraphChi blocks (users, movies) into random groups • Poor mixing • Slow convergence [Figure: test RMSE vs. runtime in seconds (30 epochs total) for C-SGD, GraphChi SGD, and Fast SGLD at k=2048.]
  • 22. Marianas Labs Details • Parameter Server Basics
 Logistic Regression (Classification) • Memory Subsystem
 Matrix Factorization (Recommender) • Large Distributed State
 Factorization Machines (CTR) • GPUs
 MXNET and applications • Models
  • 23. Marianas Labs A Linear Model is not enough p(y|x, w)
  • 24. Marianas Labs Factorization Machines • Linear Model $f(x) = \langle w, x \rangle$ • Polynomial Expansion (Rendle, 2012) $f(x) = \langle w, x \rangle + \sum_{i<j} x_i x_j \operatorname{tr}\left(V_i^{(2)} \otimes V_j^{(2)}\right) + \sum_{i<j<k} x_i x_j x_k \operatorname{tr}\left(V_i^{(3)} \otimes V_j^{(3)} \otimes V_k^{(3)}\right) + \dots$ (a memory hog, too large for an individual machine; second-order sketch below)
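For the second-order case the pairwise term $\operatorname{tr}(V_i \otimes V_j)$ is just the inner product of the rank-k rows, so the score can be computed in O(dk) with the standard identity; a minimal NumPy sketch (names illustrative):

import numpy as np

def fm_predict(w, V, x):
    # f(x) = <w, x> + sum_{i<j} x_i x_j <V_i, V_j>, using
    # sum_{i<j} x_i x_j <V_i, V_j> = 0.5 * (||V^T x||^2 - sum_i x_i^2 ||V_i||^2)
    # w: (d,), V: (d, k) embedding rows, x: (d,) features.
    s = V.T @ x                                   # (k,) aggregated embedding
    pairwise = 0.5 * (s @ s - (x * x) @ np.sum(V * V, axis=1))
    return w @ x + pairwise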
  • 25. Marianas Labs Prefetching to the rescue • Most keys are infrequent (power law distribution) • Prefetch the embedding vectors for a minibatch from the parameter server (sketched below) • Compute gradients and push to server • Variable dimensionality embedding • Enforcing sparsity (ANOVA style) • Adaptive gradient normalization • Frequency adaptive regularization (CF style) [Excerpt from the DiFacto paper:] … a minibatch. The consequence is that the penalty associated with infrequently occurring features is smaller. The penalty can be written as $\sum_i \left[ \lambda_i w_i^2 + \mu_i \|V_i\|_2^2 \right]$ with $\lambda_i \propto \mu_i$. (11) (Dynamic Regularization) Assume that we solve the factorization machines problem with the following minibatch updates: for each feature $i$, $I_i = 1$ if $x_{ji} \ne 0$ for some $(x_j, y_j) \in B$ (12), $w_i \leftarrow w_i - \eta_t \sum_{(x_j,y_j) \in B} \partial_{w_i} l(f(x_j), y_j) - \eta_t \lambda w_i I_i$ (13), $V_i \leftarrow V_i - \eta_t \sum_{(x_j,y_j) \in B} \partial_{V_i} l(f(x_j), y_j) - \eta_t \mu V_i I_i$ (14), where $B$ is the minibatch, $b$ the according size, and $I_i$ indicates whether or not feature $i$ appears in this minibatch. [Figure 1: An illustration of asynchronous SGD. Workers W1 and W2 first pull weights from the servers at time t=1. Next W1 pushes the gradient back and therefore increments the timestamp at the servers. Worker W3 pulls the weight immediately after that. Then W2 and W3 push the gradients back. The delays for W1, W2, and W3 are 0, 1, and 1.] … data preprocessing. In this paper, we consider asynchronous stochastic gradient descent (SGD) [8] for optimization. 4.1 Asynchronous Stochastic Gradient Descent
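A minimal sketch of the prefetch pattern from the bullets above, reusing the same hypothetical ps.pull/ps.push client as in the earlier worker loop (compute_fm_gradients is likewise illustrative):

def minibatch_step(ps, batch, t):
    keys = sorted(batch.feature_ids())            # keys appearing in this batch
    params = ps.pull(keys, timestamp=t)           # prefetch embedding vectors
    grads = compute_fm_gradients(params, batch)   # local FM gradients
    ps.push(keys, grads, timestamp=t)             # async push back to servers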
  • 26. Marianas Labs Better Models [Figure: relative logloss (%) vs. embedding dimension k on Criteo 1TB for no memory adaption, frequency, and frequency + l1 shrinkage; a fixed k is what everyone else does.]
  • 27. Marianas Labs Faster Solver (small Criteo) [Figure: test logloss vs. time (sec) for LibFM, DiFacto with 1 worker, and DiFacto with 10 workers.] LibFM died on large models
  • 28. Marianas Labs Multiple Machines [Figure: speedup (x) vs. number of machines on Criteo2 and CTR2.] Li, Wang, Liu, Smola, WSDM'16: 80x LibFM speed on 16 machines
  • 29. Marianas Labs Details • Parameter Server Basics
 Logistic Regression (Classification) • Memory Subsystem
 Matrix Factorization (Recommender) • Large Distributed State
 Factorization Machines (CTR) • GPUs
 MXNET and applications • Data & Models
  • 30. Marianas Labs MXNET - Distributed Deep Learning • Several deep learning frameworks • Torch7: matlab-like tensor computation in Lua • Theano: symbolic programming in Python • Caffe: configuration in Protobuf • Problem: need to learn another system for a different programming flavor • MXNET • provides programming interfaces in a consistent way • multiple languages: C++/Python/R/Julia/… • fast, memory efficient, and distributed training
  • 31. Marianas Labs Programming Interfaces • Tensor Algebra Interface • compatible (e.g. NumPy) • GPU/CPU/ARM (mobile) • Symbolic Expression • Easy Deep Network design • Automatic differentiation • Mixed programming • CPU/GPU • symbolic + local code Python Julia
  • 32. Marianas Labs (6 layers definition, multi GPU, train it)

from data import ilsvrc12_iterator
import mxnet as mx

## define alexnet
input_data = mx.symbol.Variable(name="data")
# stage 1
conv1 = mx.symbol.Convolution(data=input_data, kernel=(11, 11), stride=(4, 4), num_filter=96)
relu1 = mx.symbol.Activation(data=conv1, act_type="relu")
pool1 = mx.symbol.Pooling(data=relu1, pool_type="max", kernel=(3, 3), stride=(2, 2))
lrn1 = mx.symbol.LRN(data=pool1, alpha=0.0001, beta=0.75, knorm=1, nsize=5)
# stage 2
conv2 = mx.symbol.Convolution(data=lrn1, kernel=(5, 5), pad=(2, 2), num_filter=256)
relu2 = mx.symbol.Activation(data=conv2, act_type="relu")
pool2 = mx.symbol.Pooling(data=relu2, kernel=(3, 3), stride=(2, 2), pool_type="max")
lrn2 = mx.symbol.LRN(data=pool2, alpha=0.0001, beta=0.75, knorm=1, nsize=5)
# stage 3
conv3 = mx.symbol.Convolution(data=lrn2, kernel=(3, 3), pad=(1, 1), num_filter=384)
relu3 = mx.symbol.Activation(data=conv3, act_type="relu")
conv4 = mx.symbol.Convolution(data=relu3, kernel=(3, 3), pad=(1, 1), num_filter=384)
relu4 = mx.symbol.Activation(data=conv4, act_type="relu")
conv5 = mx.symbol.Convolution(data=relu4, kernel=(3, 3), pad=(1, 1), num_filter=256)
relu5 = mx.symbol.Activation(data=conv5, act_type="relu")
pool3 = mx.symbol.Pooling(data=relu5, kernel=(3, 3), stride=(2, 2), pool_type="max")
# stage 4
flatten = mx.symbol.Flatten(data=pool3)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096)
relu6 = mx.symbol.Activation(data=fc1, act_type="relu")
dropout1 = mx.symbol.Dropout(data=relu6, p=0.5)
# stage 5
fc2 = mx.symbol.FullyConnected(data=dropout1, num_hidden=4096)
relu7 = mx.symbol.Activation(data=fc2, act_type="relu")
dropout2 = mx.symbol.Dropout(data=relu7, p=0.5)
# stage 6
fc3 = mx.symbol.FullyConnected(data=dropout2, num_hidden=1000)
softmax = mx.symbol.SoftmaxOutput(data=fc3, name='softmax')

## data
batch_size = 256
train, val = ilsvrc12_iterator(batch_size=batch_size, input_shape=(3, 224, 224))

## train
num_gpus = 4
gpus = [mx.gpu(i) for i in range(num_gpus)]
model = mx.model.FeedForward(
    ctx = gpus,
    symbol = softmax,
    num_round = 20,
    learning_rate = 0.01,
    momentum = 0.9,
    wd = 0.00001)
model.fit(X = train, eval_data = val,
          batch_end_callback = mx.callback.Speedometer(batch_size=batch_size))
  • 33. Marianas Labs Engine • All operations issued into backend C++ engine, so the binding language's performance does not matter • Engine builds the read/write dependency graph • lazy evaluation • parallel execution • sophisticated memory allocation (toy dependency-tracking sketch below)
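A toy illustration of read/write dependency tracking, not MXNet's actual engine: each op records the last writer of every variable it reads, which yields the dependency DAG a real engine would execute in parallel.

class ToyEngine:
    def __init__(self):
        self.last_writer = {}     # variable -> id of the op that last wrote it
        self.ops = []             # (fn, dependency ids) in push order

    def push(self, fn, reads, writes):
        # An op depends on the most recent writer of everything it reads.
        deps = {self.last_writer[v] for v in reads if v in self.last_writer}
        op_id = len(self.ops)
        self.ops.append((fn, deps))
        for v in writes:
            self.last_writer[v] = op_id
        return op_id

    def run(self):
        # Ops were pushed in program order, so sequential execution respects
        # the DAG; a real engine schedules independent ops concurrently.
        for fn, _deps in self.ops:
            fn()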
  • 35. Marianas Labs Distributed Deep Learning Plays well with S3
 Spot instances & containers
  • 36. Marianas Labs • Imagenet datasets (1m images, 1k classes) • Google Inception network 
 (88 lines of code) Results [Figure: images/sec vs. number of GPUs (1, 2, 4) for CXXNet and MXNet, 40% faster due to parallelization; GPU memory usage (GB) for naive, inplace, sharing, and both, a 50% GPU memory reduction.]
  • 37. Marianas Labs Distributed Results [Figures: per-machine bandwidth (Mbps) and speedup across 10 g2.2xlarge machines against the network limit; accuracy vs. time (hours) for a single GPU vs. 4 x dual GPUs.] Convergence on 4 x dual GTX980
  • 38. Marianas Labs Amazon g2.8xlarge • 12 instances (48 GPUs) @ $0.50/h spot • Minibatch size 512 • BSP with 1 delay between machines • 2 GB/s bandwidth between machines (awful) 10.113.170.187, 10.157.109.227, 10.169.170.55, 10.136.52.151, 10.45.64.250, 10.166.137.100, 10.97.167.10, 10.97.187.157, 10.61.128.107, 10.171.105.160, 10.203.143.220, 10.45.71.20 (all over the place in the availability zone) • Compressing to 1 byte per coordinate helps a bit but adds latency due to an extra pass (need to fix; see the sketch below) • 37x speedup on 48 GPUs • Imagenet'12 dataset trained in 4h, i.e. $24 (with alexnet; googlenet even better for network)
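One plausible reading of the 1-byte-per-coordinate compression is linear int8 quantization with a per-tensor scale; this sketch is illustrative, not the talk's exact codec:

import numpy as np

def quantize_1byte(grad):
    # Map each float to one signed byte, plus one float scale per tensor.
    scale = float(np.max(np.abs(grad))) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(grad / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_1byte(q, scale):
    return q.astype(np.float32) * scale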
  • 39. Marianas Labs Mobile, too • 10 hours on 10 GTX980 • Deployed on Android
  • 40. Marianas Labs Plans • This month (went live 2 months ago) • 700 stars • 27 authors with
 >1k commits • Ranks 5th on Github C++ project trending • Build a larger community • Many more applications & platforms • Image/video analysis, Voice recognition • Language model, machine translation • Intel drivers
  • 41. Marianas Labs Side-effects of using MXNET Zichao was using Theano
  • 42. Marianas Labs Tensorflow vs. MXNET

               | Languages            | MultiGPU | Distributed | Mobile | Runtime Engine
    Tensorflow | Python               | Yes      | No          | Yes    | ?
    MXNET      | Python, R, Julia, Go | Yes      | Yes         | Yes    | Yes

    (Googlenet, E5-1650/980) | Tensorflow         | Torch7 | Caffe | MXNET
    Time                     | 940ms              | 172ms  | 170ms | 180ms
    Memory                   | all (OOM after 24) | 2.1GB  | 2.2GB | 1.6GB
  • 43. Marianas Labs More numbers (this morning)

               | Googlenet batch 32 (24)          | Alexnet batch 128 | VGG batch 32 (24)
    Torch7     | 172ms / 2.1GB                    | 135ms / 1.6GB     | 459ms / 3.4GB
    Caffe      | 170ms / 2.2GB                    | 165ms / 1.34GB    | 448ms / 3.4GB
    MXNET      | 172ms (new and improved) / 1.5GB | 147ms / 1.38GB    | 456ms / 3.6GB
    Tensorflow | 940ms / -                        | 433ms / -         | 980ms / -
  • 44. Marianas Labs Details • Parameter Server Basics
 Logistic Regression (Classification) • Memory Subsystem
 Matrix Factorization (Recommender) • Large Distributed State
 Factorization Machines (CTR) • GPUs
 MXNET and applications • Models
  • 45. Marianas Labs Models • Fully connected networks (generic data)
 (70s and 80s … trivial to scale, see e.g. CAFFE) • Convolutional networks (1D, 2D grids)
 (90s - images, 2014 - text … see e.g. CUDNN) • Recurrent networks (time series, text)
 (90s - LSTM - latent variable autoregressive) • Memory • Attention model (Cho et al, 2014) • Memory model (Weston et al, 2014) • Memory rewriting - NTM (Graves et al, 2014)
  • 47. Marianas Labs Memory Networks (Sukhbaatar, Szlam, Fergus, Weston, 2015) [Diagram: a query feeds an attention layer over memory, followed by a state layer.]
  • 48. Marianas Labs Memory Networks (Sukhbaatar, Szlam, Fergus, Weston, 2015) • Query weirdly defined (via bilinear embedding matrix; see the sketch below) • Works on facts / text only (ingest sequence and weigh it) • 'Ask me anything' by Socher et al. 2015 (uses GRU rather than LSTMs)
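A minimal NumPy sketch of a single memory-network hop with the bilinear query scoring mentioned above; the matrices (mem_in, mem_out, W) and shapes are illustrative:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def memory_hop(q, mem_in, mem_out, W):
    # Score each memory slot with the bilinear form q^T W m_i, attend,
    # and return the weighted readout added to the query.
    # q: (d,), mem_in/mem_out: (n, d), W: (d, d).
    p = softmax(mem_in @ (W @ q))   # attention weights over n slots
    o = mem_out.T @ p               # weighted sum of output embeddings
    return o + q                    # new state for the next hop / answer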
  • 49. Marianas Labs Memory networks for images http://arxiv.org/abs/1511.02274 (Yang et al., 2015)
  • 50. Marianas Labs Results Best results on • DAQUAR • DAQUAR-REDUCED • COCO-QA
  • 53. Marianas Labs Summary • Parameter Server Basics
 Logistic Regression • Large Distributed State
 Factorization Machines • Memory Subsystem
 Matrix Factorization • GPUs
 MXNET and applications • Models [Diagram: the parameter server architecture from the opening slides; workers read data (local or cloud), process in parallel, update local state (or a copy), and write back, while servers hold the shared state.] github.com/dmlc
  • 54. Marianas Labs Many thanks to • CMU & Marianas • Mu Li • Ziqi Liu • Yu-Xiang Wang • Zichao Zhang • Microsoft • Xiaodong He • Jianfeng Gao • Li Deng • MXNET • Tianqi Chen (UW) • Bing Xu • Naiyang Wang • Minjie Wang • Tianjun Xiao • Jianpeng Li • Jiaxing Zhang • Chuntao Hong