Kangwook Lee, KAIST
Joint work w/
Max Lam (Stanford), Ramtin Pedarsani (UCSB), Dimitris Papailiopoulos (UW Madison),
Kannan Ramchandran (UC Berkeley), Jichan Chung (KAIST), Geewon Suh (KAIST),
Changho Suh (KAIST)
05/22/2017
Microsoft’s data center in Dublin, Ireland
# of servers > 1,000,000
• 300,000 for Xbox
• 700,000 for ?
Estimated cost > $2.5B
Size ≈ large football stadiums
“The scale and complexity of modern
Web services make it infeasible to
eliminate all latency variability.”
-Jeff Dean, Google
Sources of system noise:
• Network latency
• Shared resources
• Maintenance
• HW failure
System Noise = Latency Variability
The same request does not always take the same time:
• Loading file A: completes in 1 s on one run, but on another run it is still loading at 3 s.
• Computing f(A): completes in 1 s on one run, but on another run it is still computing at 3 s.
Codes
A code is a system of rules to convert information into another form of representation.
Example: store A, B, and A+B. If noise erases B, it is recovered from A and A+B.
Speeding Up Distributed Computing Systems Using Codes
The same trick applies to computing: treat stragglers as noise, and add redundancy so that a subset of the responses is enough.
System noise arises in storage, computing, and algorithms; codes help in each:

Codes for Storage: [LSHR, IEEE T-IT ’17], [LPR, IEEE/ACM ToN ’16], [SLR, IEEE ToC ’16], [LYPR, Allerton ’13]
Codes for Computing: [LRS, IEEE ISIT ’17], [LPPR, IEEE ISIT ’17], [SLS, in preparation], [LLPPR, IEEE ISIT ’16; NIPS ’15 workshop]
Codes for Algorithms: [LPR, in preparation], [PYLR, IEEE T-IT ’17], [CLKPR, IEEE ICC ’17], [LPR, IEEE ISIT ’16], [PLR, IEEE Allerton ’15]
Large-scale Distributed Machine Learning Systems
A master coordinates workers 1, 2, …, n; two speed-ups: Coded Computation and Coded Shuffling.
Agenda
• Basic Coded Computation
• MDS-coded Mat-Vec Multiplication
• Extensions
• Mat-Mat Multiplication
• Nonlinear functions
• Gradient Coding
• Basic Coded Shuffling
Distributed Matrix-Vector Multiplication

$$A \times b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} \times b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix} =: \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$

The master partitions A row-wise into (A1, A2, A3) and sends Ai together with b to worker i; worker i returns yi = Ai b, and the master concatenates (y1, y2, y3) into Ab.
Straggler Problem
[Figure: complementary CDF of task latency (sec), measured on Amazon AWS]
5–10% of tasks are stragglers and incur significant delay.
Q. Can codes provide the distributed algorithms
with robustness against stragglers?
Uncoded Algorithm
The master splits A into (A1, A2, A3, A4); worker i computes Ai b, and the master concatenates the results into Ab.
T_uncoded = max(T1, T2, T3, T4)
Replication-based Algorithm
The master splits A into (A′1, A′2) and assigns each half to two workers; the faster copy of each half is used.
T_2-replication = max[min(T′1, T′2), min(T′3, T′4)]
Design param.: replication factor
The most popular choice in practice & well-studied in theory
[LPR, IEEE/ACM ToN ’16], [SLR, IEEE ToC ’16]
[D. Wang, G. Joshi, G. Wornell, ACM Sigmetrics ’15]
[K. Gardner, S. Zbarsky, S. Doroudi, M. Harchol-Balter, E. Hyytia, A. Scheller-Wolf, ACM Sigmetrics ’15]
Coded Algorithm
The master splits A into (A″1, A″2, A″3) and adds the parity block A″1 + A″2 + A″3, a (4, 3) MDS code. Worker i computes A″i b (the fourth worker computes (A″1 + A″2 + A″3) b); the fastest 3 of the 4 results suffice to recover Ab.
T_(4,3)-MDS-coded = 3rd min(T″1, T″2, T″3, T″4)
Design param.: # of subblocks; more parities bring more coding gain, but make the per-worker workload heavier.
Q1. Given a latency distribution, what are the
optimal parameters for these algorithms?
Q2. Can ‘coded algorithm’ achieve
the optimal latency scaling?
Coded Computation for Linear Operations
[LLPPR, NIPS ’15 workshop], [LLPPR, IEEE ISIT ’16]

Assumptions:
§ n workers
§ k-way parallelizable: $f = g(f_1, f_2, \ldots, f_k)$
§ Computing time of each subtask = constant + exponential RV (i.i.d.)
§ Average computing time is proportional to $1/k$

Theorem:
$$E[T_{\text{uncoded}}] = \Theta\Big(\frac{\log n}{n}\Big), \quad E[T^{*}_{\text{replication}}] = \Theta\Big(\frac{\log n}{n}\Big), \quad E[T^{*}_{\text{MDS-coded}}] = \Theta\Big(\frac{1}{n}\Big)$$

[Figure: E[T] vs. code rate; the MDS-coded curve attains its minimum at an optimal rate, below both the uncoded and replication curves]
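The shifted-exponential model in the assumptions can be checked numerically; a minimal Monte Carlo sketch (the shift and rate values are illustrative choices, not the paper's measured parameters):

```python
# Runtime model: n workers, per-worker time = (shift + Exp(rate)) / k, since
# each worker handles a 1/k fraction of the job. Uncoded (k = n) waits for the
# slowest worker; an (n, k) MDS code waits only for the fastest k.
import random

random.seed(1)

def avg_runtime(n, k, trials=2000, shift=1.0, rate=1.0):
    total = 0.0
    for _ in range(trials):
        times = sorted((shift + random.expovariate(rate)) / k
                       for _ in range(n))
        total += times[k - 1]          # done when the k-th result arrives
    return total / trials

n = 16
t_uncoded = avg_runtime(n, n)          # k = n: wait for all 16
t_mds = avg_runtime(n, 8)              # (16, 8) MDS: any 8 of 16 suffice
print("uncoded E[T] ~", round(t_uncoded, 3))
print("MDS     E[T] ~", round(t_mds, 3))
```

With these parameters the MDS estimate comes out below the uncoded one, matching the Θ(1/n) vs. Θ(log n / n) gap in the theorem.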
Experimental Results w/ 24 EC2 machines
[Figure: E[T] (ms, 200–500) vs. k (6–24) for n = 24; uncoded (1-rep), 2-rep, 3-rep, 4-rep, and (24,23), (24,22), (24,20) MDS codes]
30–40% speed up
Distributed Linear Regression
Gradient descent for linear regression = iterative matrix multiplication:

$$\theta^{(t+1)} = \theta^{(t)} - \alpha A^{T}\big(A\theta^{(t)} - y\big)$$

Coded Gradient Descent = Gradient Descent + Coded Matrix Multiplication
[Figure: average runtime (s) on Square, Fat, and Tall matrices, Uncoded vs. MDS-coded; 35% reduction]
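The update above is two matrix-vector products per iteration, which is exactly where the coded multiplication plugs in. A minimal uncoded sketch of the iteration itself, on a tiny illustrative system (not the talk's data):

```python
# Gradient descent for linear regression:
#   theta <- theta - alpha * A^T (A theta - y)
# Each step is two mat-vec products (A theta, then A^T r).

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
theta, alpha = [0.0, 0.0], 0.1

for _ in range(500):
    r = [yh - yi for yh, yi in zip(matvec(A, theta), y)]   # residual A theta - y
    g = matvec(transpose(A), r)                            # gradient A^T r
    theta = [t - alpha * gi for t, gi in zip(theta, g)]

# This consistent system has the exact least-squares solution theta = (1, 2).
assert all(abs(t - s) < 1e-6 for t, s in zip(theta, [1.0, 2.0]))
```

Replacing each `matvec` call with the MDS-coded version makes every iteration straggler-tolerant.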
Agenda
• Basic Coded Computation
• MDS-coded Mat-Vec Multiplication
• Extensions
• Mat-Mat Multiplication
• Nonlinear functions
• Gradient Coding
• Basic Coded Shuffling
Challenges
• Matrix-Matrix multiplication
• Nonlinear functions
• Gradient Coding
Distributed Matrix-Matrix Multiplication

$$A \times B = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_n \end{pmatrix} B = \begin{pmatrix} A_1 B \\ A_2 B \\ \vdots \\ A_n B \end{pmatrix}$$

When both A and B scale, this task does not fit in a single worker.
Encode “A” and Multiply with B
No coding across groups of workers: the workers computing AB1 and those computing AB2 form two separate MDS-coded groups.

$$\begin{pmatrix} A_1 \\ A_2 \\ A_1 + A_2 \\ A_1 + 2A_2 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \end{pmatrix} = \begin{pmatrix} A_1B_1 & A_1B_2 \\ A_2B_1 & A_2B_2 \\ (A_1 + A_2)B_1 & (A_1 + A_2)B_2 \\ (A_1 + 2A_2)B_1 & (A_1 + 2A_2)B_2 \end{pmatrix}$$
Encode over {AiBi}
Alternatively, each worker computes a random linear combination of all four products, e.g. A1B1 + 2A1B2 + 3A2B1 + 4A2B2.
But then every worker must multiply with all of A and B: 4 × computation per worker, hence 4 × average latency.
Encode both “A” and “B”

$$\begin{pmatrix} A_1 \\ A_2 \\ A_1 + A_2 \end{pmatrix} \begin{pmatrix} B_1 & B_2 & B_1 + B_2 \end{pmatrix} = \begin{pmatrix} A_1B_1 & A_1B_2 & A_1(B_1 + B_2) \\ A_2B_1 & A_2B_2 & A_2(B_1 + B_2) \\ (A_1 + A_2)B_1 & (A_1 + A_2)B_2 & \end{pmatrix}$$

A missing product that is the only erasure in its row is recovered by row decoding; one that is the only erasure in its column, by column decoding; the two alternate until everything is filled in.
Product-Coded Computation
Product Codes = Coding across all workers!
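The row/column peeling on the product-coded grid can be sketched with scalars standing in for the matrix blocks (the numeric values are illustrative):

```python
# Product-coded computation, scalar toy: A -> (A1, A2, A1+A2) and
# B -> (B1, B2, B1+B2); a 3x3 grid of workers computes all pairwise products.
# Index 2 in each row and column is a parity (v2 = v0 + v1), so straggler
# results are recovered by peeling rows and columns.

A1, A2, B1, B2 = 2.0, 5.0, 3.0, 7.0
grid = [[a * b for b in (B1, B2, B1 + B2)] for a in (A1, A2, A1 + A2)]

lost = {(0, 0), (1, 1), (2, 2)}                  # three stragglers
G = [[None if (i, j) in lost else grid[i][j] for j in range(3)]
     for i in range(3)]

def fill(vals):
    """If exactly one of (v0, v1, parity v2 = v0 + v1) is missing,
    return (index, recovered value); otherwise None."""
    missing = [t for t in range(3) if vals[t] is None]
    if len(missing) != 1:
        return None
    t = missing[0]
    return (2, vals[0] + vals[1]) if t == 2 else (t, vals[2] - vals[1 - t])

progress = True
while progress:
    progress = False
    for i in range(3):                           # peel rows
        hit = fill(G[i])
        if hit:
            G[i][hit[0]] = hit[1]
            progress = True
    for j in range(3):                           # peel columns
        hit = fill([G[i][j] for i in range(3)])
        if hit:
            G[hit[0]][j] = hit[1]
            progress = True

assert G == grid                                 # all straggler results recovered
```

Here every row contains exactly one erasure, so a single row pass already finishes; in general rows and columns unlock each other.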
Simulation Results
[Figure: E[T] vs. number of workers N (400–2400); MDS-coded, Product-coded, Replication, and the lower bound]
Runtime Analysis

Theorem: With $k^2 + tk$ workers,

Lower bound: $\;E[T] \ge \dfrac{1}{\mu}\log\Big(\dfrac{k+t}{t}\Big) + o(1)$

MDS-coded: $\;E[T_{\text{MDS-coded}}] \approx \dfrac{1}{\mu}\log\Big(\dfrac{k+t}{t}\Big) + \dfrac{1}{\mu t}\sqrt{2(t+1)\log k}$

Product-coded: $\;E[T_{\text{product-coded}}] \approx \dfrac{1}{\mu}\log\Big(\dfrac{k+t/2}{c\,t/2+1}\Big)$
Pf: Product-Coded Computation

View an erasure pattern on the 3×3 worker grid as a bipartite graph between row constraints (R1, R2, R3) and column constraints (C1, C2, C3). Any row or column containing exactly one missing product can be decoded, and each recovery may unlock further rows and columns. Peeling either fills in everything or stalls on a set of rows and columns that each still hold two or more missing entries: a 2-core exists!

Lemma: An erasure pattern is decodable iff the corresponding bipartite graph does not have a 2-core.

Theorem: Emergence of a 2-core has a sharp threshold.
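The lemma's criterion is easy to test directly; a minimal sketch that peels an erasure pattern and reports decodability (the patterns below are illustrative):

```python
# Decodability check for a product-code erasure pattern: peel any row or
# column with exactly one erased entry; the pattern is decodable iff peeling
# removes every erasure, i.e. iff no 2-core remains.

def decodable(erasures, nrows=3, ncols=3):
    E = set(erasures)
    progress = True
    while progress and E:
        progress = False
        for i in range(nrows):
            row = [e for e in E if e[0] == i]
            if len(row) == 1:                 # single erasure: row-decodable
                E.remove(row[0])
                progress = True
        for j in range(ncols):
            col = [e for e in E if e[1] == j]
            if len(col) == 1:                 # single erasure: col-decodable
                E.remove(col[0])
                progress = True
    return not E                              # leftovers form a 2-core

assert decodable({(0, 0), (1, 1), (2, 2)})              # peels completely
assert not decodable({(0, 0), (0, 1), (1, 0), (1, 1)})  # a 2x2 2-core: stuck
```

The second pattern has two erasures in each affected row and column, so no peeling step ever applies.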
Challenges
• Matrix-Matrix multiplication
• Nonlinear functions
  • Random Sparse Linear Code over the intermediate results f1, f2, …, fk
    [Lee, Pedarsani, Papailiopoulos, Ramchandran, IEEE ISIT’17]
• Gradient Coding
Gradient Coding
[R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS’16 workshop]

Goal: the master must recover f(x1) + f(x2) + f(x3).
Data placement: Worker 1 ← (x1, x2), Worker 2 ← (x1, x3), Worker 3 ← (x2, x3).

Uncoded: each worker returns its partial gradients separately, e.g. Worker 1 sends both f(x1) and f(x2). Any one straggler is tolerated, but every worker transmits two vectors.

Q. Can we do better? Or can we reduce the comm. overheads?
Computation Alignment!

Coded: each worker sends a single combined vector:
Worker 1: f(x1) + 2f(x2), Worker 2: f(x1) + 2f(x3), Worker 3: f(x2) − f(x3).
The messages of any two workers can be linearly combined into f(x1) + f(x2) + f(x3).

A. This is the ‘unique’ solution that achieves the minimum comm. overheads.
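The any-two property is quick to verify with scalars standing in for the gradient vectors (worker 3's combination f(x2) − f(x3) is our reading of the slide; the decoding coefficients below follow from it):

```python
# Gradient-coding toy: three workers each send one linear combination of the
# partial gradients; any two messages recover f(x1) + f(x2) + f(x3).

f1, f2, f3 = 4.0, -1.0, 2.5          # illustrative scalar "gradients"
target = f1 + f2 + f3

m1 = f1 + 2 * f2                     # worker 1, computable from (x1, x2)
m2 = f1 + 2 * f3                     # worker 2, computable from (x1, x3)
m3 = f2 - f3                         # worker 3, computable from (x2, x3)

assert abs(0.5 * (m1 + m2) - target) < 1e-12   # workers 1 & 2 respond
assert abs((m1 - m3) - target) < 1e-12         # workers 1 & 3 respond
assert abs((m2 + m3) - target) < 1e-12         # workers 2 & 3 respond
```

Each worker transmits one vector instead of two, so the scheme halves the communication overhead while still tolerating one straggler.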
Recent Works on Coded Computation
• [S. Li, MA. Maddah-Ali, S. Avestimehr, Allerton’16]
• Coded Matrix Multiplication in MapReduce setup
• [Y. Yang, P. Grover, S. Kar, Allerton’16]
• Coded Computation for Logistic Regression
• [N. Ferdinand, S. Draper, Allerton’16]
• SVD + Coded Matrix Multiplication
• [S. Dutta, V. Cadambe, P. Grover, NIPS’16]
• Sparsification of A + Coded Matrix Multiplication
• [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS’16 workshop]
• Coded Computation + Distributed Gradient Computing
• + 8 works in ISIT’17…
Agenda
• Basic Coded Computation
• MDS-coded Mat-Vec Multiplication
• Extensions
• Mat-Mat Multiplication
• Nonlinear functions
• Gradient Coding
• Basic Coded Shuffling
GD: $x^{(t+1)} = x^{(t)} - \eta^{(t)} \nabla f(x^{(t)})$

SGD: $x^{(t+1)} = x^{(t)} - \eta^{(t)} \nabla f_{r(t)}(x^{(t)})$, with $r(t)$ chosen uniformly at random from $\{1, \ldots, q\}$

where $f(x) = \sum_{i=1}^{q} f_i(x)$ and $q$ is the number of data points.
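The GD/SGD updates above side by side, on a toy objective of our choosing (f_i(x) = ½(x − a_i)², so the minimizer is the mean of the a_i):

```python
# GD uses the full gradient each step; SGD uses the gradient of one uniformly
# sampled term f_i. Toy objective: f_i(x) = 0.5 * (x - a_i)^2.
import random

random.seed(0)
a = [1.0, 2.0, 3.0, 6.0]                 # q = 4 "data points"
opt = sum(a) / len(a)                    # minimizer of sum_i f_i

x_gd = x_sgd = 0.0
for t in range(2000):
    lr = 1.0 / (t + 10)                  # decaying step size eta^(t)
    x_gd -= lr * sum(x_gd - ai for ai in a)      # full gradient
    i = random.randrange(len(a))                 # uniform index r(t)
    x_sgd -= lr * (x_sgd - a[i])                 # stochastic gradient

assert abs(x_gd - opt) < 1e-3            # GD converges tightly
assert abs(x_sgd - opt) < 0.5            # SGD hovers near the minimizer
```

SGD touches one data point per step, which is what makes the data placement, and hence the shuffling, matter.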
Coded Shuffling for PSGD

Each epoch of parallel SGD (PSGD):
• The data is randomly shuffled and re-partitioned across the workers.
• Starting from the current model x^(0), each worker runs SGD on its own shard, producing local models x1^(1), x2^(1), x3^(1).
• The master merges the local models: x^(1) = (x1^(1), x2^(1), x3^(1)).

PSGD with shuffling converges faster, but the data shuffling step involves heavy communication cost
[Recht and Re, 2013], [Bottou, 2012], [Zhang and Re, 2014], [Gurbuzbalaban et al., 2015], [Ioffe and Szegedy, 2015], [Zhang et al., 2015]
Coded Shuffling Algorithm

The master M stores the full data A = (x1, x2, x3, x4).
Epoch 1: W1 ← (x1, x2), W2 ← (x3, x4).
Epoch 2: W1 ← (x1, x3), W2 ← (x2, x4).

W1 is missing only x3 but still caches x2; W2 is missing only x2 but still caches x3.
TX: the master broadcasts the single coded packet x2 + x3; W1 subtracts x2 to recover x3, and W2 subtracts x3 to recover x2. One transmission replaces two.

Coding opportunity increases
as each worker can store more data points!
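The two-worker shuffle above in code, with scalars standing in for the data points (in general they are vectors and the packet is a blockwise sum):

```python
# Coded shuffling toy: after epoch 1, W1 holds (x1, x2) and W2 holds (x3, x4);
# epoch 2 assigns (x1, x3) to W1 and (x2, x4) to W2. One coded broadcast of
# x2 + x3 serves both workers.

x2, x3 = 7.0, 11.0          # stand-ins for the two data points being swapped

packet = x2 + x3            # single coded broadcast from the master

x3_at_w1 = packet - x2      # W1 still caches x2 -> recovers x3
x2_at_w2 = packet - x3      # W2 still caches x3 -> recovers x2

assert x3_at_w1 == x3 and x2_at_w2 == x2   # 1 transmission instead of 2
```

The same cancellation works for any pair of workers whose caches cover each other's missing points, which is why extra cache per worker creates more coding opportunities.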
Thm: Let q = # of data points, n = # of workers, and α = memory overhead, i.e.
α = (number of data points stored by each worker) / (q/n), with 1 ≤ α ≤ n. Then

T_uncoded = q (1 − 1/n)

T_coded = T_uncoded / (α + 1)

where T counts the transmissions per shuffle.
[Figure: # of transmissions (×10^6) vs. memory overhead α (2–20) for q = 10^7, n = 20; Uncoded, Theory, and Simulation curves]
Simulation Results
Experiments: Low-rank Matrix Completion (10M x 10M), tested on 25 EC2 instances: 35% speed-up.
Conclusion
• Coding theory for distributed computing
• Stragglers slow down distributed computing
• => Coded Computation
• Data needs to be shuffled between distributed nodes
• => Coded Shuffling
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Speeding Up Distributed Machine Learning Using Codes

  • 1. Kangwook Lee, KAIST Joint work w/ Max Lam (Stanford), Ramtin Pedarsani (UCSB), Dimitris Papailiopoulos (UW Madison), Kannan Ramchandran (UC Berkeley), Jichan Chung (KAIST), Geewon Suh (KAIST), Changho Suh (KAIST) 05/22/2017
  • 2. Microsoft’s data center in Dublin, Ireland # of servers > 1,000,000 • 300,000 for Xbox • 700,000 for ? Estimated cost > $2.5B Size ~= Large Football Stadiums
  • 3.
  • 4. “The scale and complexity of modern Web services make it infeasible to eliminate all latency variability.” -Jeff Dean, Google
  • 8. System Noise = Latency Variability
  • 9. System Noise = Latency Variability A A Loading file A… Completed in 1s.
  • 10. System Noise = Latency Variability A A Loading file A… Completed in 1s. A A Loading file A… Completed in 3s. Still loading… Still…
  • 11. f(A) Computing f(A)… Completed in 1s. A System Noise = Latency Variability
  • 12. f(A) Computing f(A)… Completed in 3s. Still computing… Still… A System Noise = Latency Variability f(A) Computing f(A)… Completed in 1s. A
  • 13. Codes A code is a system of rules to convert information into another form of representation
  • 14. Codes A code is a system of rules to convert information into another form of representation A B A+B Noise
  • 15. Codes A code is a system of rules to convert information into another form of representation A B A+B A ? A+B Noise
  • 16. Codes A code is a system of rules to convert information into another form of representation A B A+B A B A+B Noise
  • 17. Speeding Up Distributed Computing Systems Using Codes A B A+B A ? A+B Noise System Noise
  • 18. Codes for Storage [LSHR, IEEE T-IT '17] [LPR, IEEE/ACM ToN '16] [SLR, IEEE ToC '16] [LYPR, Allerton '13] Algorithms [LPR, in preparation] [PYLR, IEEE T-IT '17] [CLKPR, IEEE ICC '17] [LPR, IEEE ISIT '16] [PLR, IEEE Allerton '15] Computing [LRS, IEEE ISIT '17] [LPPR, IEEE ISIT '17] [SLS, in preparation] [LLPPR, IEEE ISIT '16, NIPS '15 workshop]
  • 19. Codes for Storage [LSHR, IEEE T-IT '17] [LPR, IEEE/ACM ToN '16] [SLR, IEEE ToC '16] [LYPR, Allerton '13] Algorithms [LPR, in preparation] [PYLR, IEEE T-IT '17] [CLKPR, IEEE ICC '17] [LPR, IEEE ISIT '16] [PLR, IEEE Allerton '15] Computing [LRS, IEEE ISIT '17] [LPPR, IEEE ISIT '17] [SLS, in preparation] [LLPPR, IEEE ISIT '16, NIPS '15 workshop]
  • 20. Large-scale Distributed Machine Learning Systems (workers 1, 2, ..., n) Coded Computation / Coded Shuffling
  • 21. Agenda • Basic Coded Computation • MDS-coded Mat-Vec Multiplication • Extensions • Mat-Mat Multiplication • Nonlinear functions • Gradient Coding • Basic Coded Shuffling
  • 22. Distributed Matrix-Vector Multiplication $A b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix}$ Master Worker 1 Worker 2 Worker 3
  • 23. Distributed Matrix-Vector Multiplication $A b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix}$ Master Worker 1 Worker 2 Worker 3
  • 24. Distributed Matrix-Vector Multiplication $A b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix}$ Master Worker 1 Worker 2 Worker 3
  • 25. Distributed Matrix-Vector Multiplication $A b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix} := \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$ Master Worker 1 Worker 2 Worker 3
  • 26. Distributed Matrix-Vector Multiplication Master Worker 1 Worker 2 Worker 3: worker $i$ holds $A_i$ and $b$ and returns $y_i = A_i b$. $A b = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} b = \begin{pmatrix} A_1 b \\ A_2 b \\ A_3 b \end{pmatrix} := \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$
  • 27. Straggler Problem (plot: complementary CDF of latency, measured on Amazon AWS) 5~10% of tasks are stragglers and incur significant delay. Q. Can codes provide the distributed algorithms with robustness against stragglers?
  • 28. Uncoded Algorithm M W1 W2 W3 W4: split $A$ into $A_1, A_2, A_3, A_4$; worker $i$ computes $A_i b$. $T_{\text{uncoded}} = \max(T_1, T_2, T_3, T_4)$
  • 29. Replication-based Algorithm M W1 W2 W3 W4: split $A$ into $A'_1, A'_2$ and replicate each block on two workers (the replication factor is a design parameter). $T_{\text{replication}} = \max[\min(T'_1, T'_2), \min(T'_3, T'_4)]$ The most popular choice in practice & well-studied in theory [LPR, IEEE/ACM ToN'16] [SLR, IEEE ToC'16] [D. Wang, G. Joshi, G. Wornell, ACM Sigmetrics'15] [K. Gardner, S. Zbarsky, S. Doroudi, M. Harchol-Balter, E. Hyytia, A. Scheller-Wolf, ACM Sigmetrics'15]
  • 30. Coded Algorithm M W1 W2 W3 W4: encode $A$ into $A''_1, A''_2, A''_3$ plus the parity $A''_1 + A''_2 + A''_3$; $Ab$ is decodable from any 3 of $\{A''_1 b, A''_2 b, A''_3 b, (A''_1 + A''_2 + A''_3) b\}$. $T_{(4,3)\text{-MDS-coded}} = 3\text{rd}\min(T''_1, T''_2, T''_3, T''_4)$ The # of subblocks is a design parameter: more parities give a larger coding gain, but make each worker's workload heavier.
  • 31. Coded Algorithm M W1 W2 W3 W4 (as above): $T_{(4,3)\text{-MDS-coded}} = 3\text{rd}\min(T''_1, T''_2, T''_3, T''_4)$ The # of subblocks is a design parameter. Q1. Given a latency distribution, what are the optimal parameters for these algorithms? Q2. Can the 'coded algorithm' achieve the optimal latency scaling?
  • 32. Coded Computation for Linear Operations Assumptions: $n$ workers; the job is k-way parallelizable: $f = (f_1, f_2, \ldots, f_k)$; the computing time of each task = constant + exponential RV (i.i.d.); the average computing time is proportional to $1/k$. Theorem: $E[T_{\text{uncoded}}] = \Theta\!\left(\frac{\log n}{n}\right)$, $E[T^{*}_{\text{replication}}] = \Theta\!\left(\frac{\log n}{n}\right)$, $E[T^{*}_{\text{MDS-coded}}] = \Theta\!\left(\frac{1}{n}\right)$ (plot: E[T] vs. k for uncoded, replication, and coded, with the optimum $k^*$) [LLPPR, NIPS workshop'15] [LLPPR, IEEE ISIT'16]
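The MDS-coded mat-vec scheme above is easy to prototype. Below is a minimal sketch of a (3, 2) instance (the helper names `encode` and `decode` are mine, not from the talk): A is split into A1 and A2, a third worker holds the parity A1 + A2, and Ab is recovered from any two of the three responses.

```python
import numpy as np

# Minimal sketch of a (3, 2) MDS-coded matrix-vector multiply.
# encode/decode are illustrative names, not from the talk.

def encode(A):
    A1, A2 = np.split(A, 2, axis=0)   # assumes an even number of rows
    return [A1, A2, A1 + A2]          # two systematic blocks + one parity

def decode(results):
    # results: {worker index: partial product}, from any 2 of the 3 workers
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:                # parity minus A1 b recovers A2 b
        y1, y2 = results[0], results[2] - results[0]
    else:                             # parity minus A2 b recovers A1 b
        y1, y2 = results[2] - results[1], results[1]
    return np.concatenate([y1, y2])

rng = np.random.default_rng(0)
A, b = rng.standard_normal((4, 3)), rng.standard_normal(3)
tasks = encode(A)
results = {i: tasks[i] @ b for i in (0, 2)}   # worker 1 straggles
assert np.allclose(decode(results), A @ b)
```

The master simply uses whichever two responses arrive first, which is where the latency gain comes from.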
  • 33. Experimental Results w/ 24 EC2 machines (plot: E[T] in ms vs. k, n = 24): (24,23)-, (24,22)-, (24,20)-MDS codes vs. uncoded (1-rep) and 2-/3-/4-rep; 30-40% speed up
  • 34. Distributed Linear Regression Gradient descent for linear regression = iterative matrix multiplication: $\theta^{(t+1)} = \theta^{(t)} - \alpha A^T (A\theta^{(t)} - y)$ Coded Gradient Descent = Gradient Descent + Coded Matrix Multiplication (plot: average runtime in s, uncoded vs. MDS-coded, for square/fat/tall $A$) 35% reduction
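The update on this slide is ordinary gradient descent for least squares, so it can be checked locally; in the coded variant, the products with A and A^T would each run as coded distributed multiplications. A minimal sketch (step size and iteration count are my own choices, not from the talk):

```python
import numpy as np

# Gradient descent for linear regression, as on the slide:
#   theta^{(t+1)} = theta^{(t)} - alpha * A^T (A theta^{(t)} - y)
# Step size and iteration count are illustrative choices.

def linreg_gd(A, y, alpha=0.01, iters=2000):
    theta = np.zeros(A.shape[1])
    for _ in range(iters):
        theta = theta - alpha * (A.T @ (A @ theta - y))
    return theta

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
theta_star = np.array([1.0, -2.0, 0.5])
y = A @ theta_star                      # noiseless targets
assert np.allclose(linreg_gd(A, y), theta_star, atol=1e-6)
```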
  • 35. Agenda • Basic Coded Computation • MDS-coded Mat-Vec Multiplication • Extensions • Mat-Mat Multiplication • Nonlinear functions • Gradient Coding • Basic Coded Shuffling
  • 36. Challenges • Matrix-Matrix multiplication • Nonlinear functions • Gradient Coding
  • 37. Challenges • Matrix-Matrix multiplication • Nonlinear functions • Gradient Coding
  • 38. Distributed Matrix-Matrix Multiplication $A \times B = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_n \end{pmatrix} B = \begin{pmatrix} A_1 B \\ A_2 B \\ \vdots \\ A_n B \end{pmatrix}$ When both A and B scale, this task does not fit in a single worker
  • 39. Encode "A" and Multiply with B $\begin{pmatrix} A_1 \\ A_2 \\ A_1 + A_2 \\ A_1 + 2A_2 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \end{pmatrix} = \begin{pmatrix} A_1B_1 & A_1B_2 \\ A_2B_1 & A_2B_2 \\ (A_1 + A_2)B_1 & (A_1 + A_2)B_2 \\ (A_1 + 2A_2)B_1 & (A_1 + 2A_2)B_2 \end{pmatrix}$ $AB_1$ and $AB_2$ are decoded separately: no coding across groups of workers!
  • 40. Encode over $\{A_i B_j\}$: each worker computes a dense random linear combination of all four products, e.g. $A_1B_1 + 2A_1B_2 + 3A_2B_1 + 4A_2B_2$, so any 4 finished workers suffice (average latency unchanged) — but each worker now does 4 x the computation of an uncoded worker.
  • 41. Encode both "A" and "B" $\begin{pmatrix} A_1 \\ A_2 \\ A_1 + A_2 \end{pmatrix} \begin{pmatrix} B_1 & B_2 & B_1 + B_2 \end{pmatrix} = \begin{pmatrix} A_1B_1 & A_1B_2 & A_1(B_1+B_2) \\ A_2B_1 & A_2B_2 & A_2(B_1+B_2) \\ (A_1+A_2)B_1 & (A_1+A_2)B_2 \end{pmatrix}$
  • 42. Encode both "A" and "B" (same grid; build)
  • 43. Encode both "A" and "B" (same grid; build)
  • 44. Encode both "A" and "B" Column Decoding: a missing product in a column is recovered from the other two entries of that column
  • 45. Encode both "A" and "B" Row Decoding: a missing product in a row is recovered from the other two entries of that row
  • 46. Encode both "A" and "B" Column Decoding
  • 47. Product-Coded Computation $\begin{pmatrix} A_1 \\ A_2 \\ A_1 + A_2 \end{pmatrix} \begin{pmatrix} B_1 & B_2 & B_1 + B_2 \end{pmatrix} = \begin{pmatrix} A_1B_1 & A_1B_2 & A_1(B_1+B_2) \\ A_2B_1 & A_2B_2 & A_2(B_1+B_2) \\ (A_1+A_2)B_1 & (A_1+A_2)B_2 \end{pmatrix}$ Product Codes = Coding across all workers!
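A quick way to see the row/column decoding at work is to build the 3 x 3 product-coded grid numerically; this is a sketch with my own variable names, not code from the talk:

```python
import numpy as np

# Product-coded matrix multiplication as on the slides: A is encoded into
# (A1, A2, A1+A2), B into (B1, B2, B1+B2), and worker (i, j) computes the
# single product of its encoded row block and column block.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
A1, A2 = np.split(A, 2, axis=0)
B1, B2 = np.split(B, 2, axis=1)
rows = [A1, A2, A1 + A2]
cols = [B1, B2, B1 + B2]
grid = [[rows[i] @ cols[j] for j in range(3)] for i in range(3)]

# Row decoding: if worker (0, 1) straggles, A1 B2 is recovered from the
# row parity A1(B1 + B2) minus the finished A1 B1.
assert np.allclose(grid[0][2] - grid[0][0], A1 @ B2)

# Column decoding: (A1 + A2) B1 minus A1 B1 recovers A2 B1.
assert np.allclose(grid[2][0] - grid[0][0], A2 @ B1)
```

Because parities exist along both rows and columns, a recovered entry can in turn enable further recoveries, which is the peeling process analyzed next.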
  • 48. Simulation Results (plot: E[T] vs. number of workers N, for MDS-coded, product-coded, replication, and the lower bound)
  • 49. Runtime Analysis Theorem: With $n = k^2 + tk$ workers, Lower bound: $E[T] \ge \frac{1}{\mu}\log\left(\frac{k+t}{t}\right) + o(1)$ MDS-coded: $E[T_{\text{MDS-coded}}] \approx \frac{1}{\mu}\log\left(\frac{k+t}{t}\right) + \frac{1}{\mu t}\sqrt{2(t+1)\log k}$ Product-coded: $E[T_{\text{product-coded}}] \approx \frac{1}{\mu}\log\left(\frac{k+t/2}{c\,t/2+1}\right)$
  • 50. Pf: Product-Coded Computation — the worker grid $\begin{pmatrix} A_1B_1 & A_1B_2 & A_1(B_1+B_2) \\ A_2B_1 & A_2B_2 & A_2(B_1+B_2) \\ (A_1+A_2)B_1 & (A_1+A_2)B_2 \end{pmatrix}$ with row constraints $R_1, R_2, R_3$ and column constraints $C_1, C_2, C_3$
  • 51.-57. Pf: Product-Coded Computation (peeling-decoding animation on the same grid: each step recovers a straggler whose row or column constraint has enough finished workers)
  • 58. Pf: Product-Coded Computation (final frame of the peeling animation) A 2-core exists! Lemma: An erasure pattern is decodable iff the corresponding bipartite graph does not have a core. Theorem: Emergence of a core has a sharp threshold.
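The lemma can be illustrated with a tiny peeling routine. This sketch assumes, as in the 3 x 3 grid above, that each row and each column carries one parity and can therefore repair a single erasure per peeling step (`peels_to_empty` is a hypothetical helper name):

```python
# Peeling check behind the lemma: an erasure (straggler) pattern on the
# product-code grid is decodable iff repeatedly removing any erasure that
# is the only one left in its row or in its column (row/column decoding)
# empties the pattern; a nonempty residue is a "core".

def peels_to_empty(erasures):
    erased = set(erasures)
    changed = True
    while changed:
        changed = False
        for (r, c) in list(erased):
            in_row = sum(1 for (r2, _) in erased if r2 == r)
            in_col = sum(1 for (_, c2) in erased if c2 == c)
            if in_row == 1 or in_col == 1:   # lone erasure: peel it
                erased.remove((r, c))
                changed = True
    return not erased                        # empty residue <=> decodable

# One straggler per row peels immediately:
assert peels_to_empty({(0, 0), (1, 1), (2, 2)})
# A 2x2 block of erasures is a core: no row or column can start peeling.
assert not peels_to_empty({(0, 0), (0, 1), (1, 0), (1, 1)})
```

Viewing rows and columns as one side of a bipartite graph and erased workers as edges, this stopping set is exactly the 2-core named on the slide.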
  • 59. Challenges • Matrix-Matrix multiplication • Nonlinear functions • Random Sparse Linear Code [Lee, Pedarsani, Papailiopoulos, Ramchandran, IEEE ISIT'17] • Gradient Coding
  • 60. Challenges • Matrix-Matrix multiplication • Nonlinear functions • Gradient Coding
  • 61. Gradient Coding Master Worker 1 Worker 2 Worker 3 f(x1) + f(x2) + f(x3)Goal: [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS’16 workshop] x1, x2 x1, x3 x2, x3
  • 62. Gradient Coding Master Worker 1 Worker 2 Worker 3 f(x1) + f(x2) + f(x3)Goal: [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS’16 workshop] f(x1), f(x2) f(x2), f(x3) x1, x2 x1, x3 x2, x3 Q. Can we do better? Or can we reduce the comm. overheads? Computation Alignment!
  • 63. Gradient Coding Master Worker 1 Worker 2 Worker 3 Goal: f(x1) + f(x2) + f(x3) [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS'16 workshop] Worker 1 (x1, x2) sends f(x1) + 2f(x2); Worker 2 (x1, x3) sends f(x1) + 2f(x3); Worker 3 (x2, x3) sends f(x2) - f(x3)
  • 64. Gradient Coding (build: the master recovers f(x1) + f(x2) + f(x3) from any two of the three messages)
  • 65. Gradient Coding (build: the master recovers f(x1) + f(x2) + f(x3) from any two of the three messages)
  • 66. Gradient Coding Master Worker 1 Worker 2 Worker 3 Goal: f(x1) + f(x2) + f(x3) [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS'16 workshop] (x1, x2) (x1, x3) (x2, x3) f(x1) + 2f(x2) f(x1) + 2f(x3) f(x2) - f(x3) A. This is the 'unique' solution that achieves the minimum comm. overheads.
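The scheme on this slide can be verified directly: any two of the three coded messages determine the full gradient sum. In this sketch `f` is a stand-in for the per-sample gradient and the helper names are mine:

```python
# 3-worker gradient coding as on the slides: worker 1 sends f(x1) + 2 f(x2),
# worker 2 sends f(x1) + 2 f(x3), worker 3 sends f(x2) - f(x3); the master
# recovers f(x1) + f(x2) + f(x3) from ANY 2 of the 3 messages.

def f(x):          # stand-in "gradient"; any function works
    return x * x

def worker_msgs(x1, x2, x3):
    return {1: f(x1) + 2 * f(x2),
            2: f(x1) + 2 * f(x3),
            3: f(x2) - f(x3)}

def master_decode(msgs):   # msgs holds any 2 of the 3 messages
    if 1 in msgs and 2 in msgs:
        return (msgs[1] + msgs[2]) / 2
    if 1 in msgs:
        return msgs[1] - msgs[3]
    return msgs[2] + msgs[3]

x1, x2, x3 = 2.0, 3.0, 5.0
total = f(x1) + f(x2) + f(x3)
for straggler in (1, 2, 3):
    msgs = {w: m for w, m in worker_msgs(x1, x2, x3).items() if w != straggler}
    assert master_decode(msgs) == total
```

Each message uses only the data its worker stores, and each worker transmits a single value, which is the minimum communication overhead claimed on the slide.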
  • 67. Recent Works on Coded Computation • [S. Li, MA. Maddah-Ali, S. Avestimehr, Allerton’16] • Coded Matrix Multiplication in MapReduce setup • [Y. Yang, P. Grover, S. Kar, Allerton’16] • Coded Computation for Logistic Regression • [N. Ferdinand, S. Draper, Allerton’16] • SVD + Coded Matrix Multiplication • [S. Dutta, V. Cadambe, P. Grover, NIPS’16] • Sparsification of A + Coded Matrix Multiplication • [R. Tandon, Q. Lei, A. Dimakis, N. Karampatziakis, NIPS’16 workshop] • Coded Computation + Distributed Gradient Computing • + 8 works in ISIT’17…
  • 68. Agenda • Basic Coded Computation • MDS-coded Mat-Vec Multiplication • Extensions • Mat-Mat Multiplication • Nonlinear functions • Gradient Coding • Basic Coded Shuffling
  • 69. Coded Shuffling for PSGD GD: $x^{(t+1)} = x^{(t)} - \alpha^{(t)} \nabla f(x^{(t)})$ SGD: $x^{(t+1)} = x^{(t)} - \alpha^{(t)} \nabla f_{r(t)}(x^{(t)})$, with $r(t)$ uniformly chosen at random and $f(x) = \sum_{i=1}^{q} f_i(x)$, $q$ = number of data points
  • 71. Coded Shuffling for PSGD Data → random shuffling / data shuffling → SGD SGD SGD → Model: $x^{(0)}$, local models $x_1^{(1)}, x_2^{(1)}, x_3^{(1)}$
  • 72. Coded Shuffling for PSGD Data → random shuffling / data shuffling → SGD SGD SGD → Model: $x^{(0)}$, local models $x_1^{(1)}, x_2^{(1)}, x_3^{(1)}$; $x^{(1)} = (x_1^{(1)}, x_2^{(1)}, x_3^{(1)})$
  • 74. Coded Shuffling for PSGD SGD SGD SGD Model $x^{(1)}$ Data Random shuffling Data shuffling
  • 75. Coded Shuffling for PSGD SGD SGD SGD Model $x^{(1)}$ Data Random shuffling Data shuffling Merge models. PSGD with shuffling converges faster, but it involves communication cost* [Recht and Re, 2013], [Bottou, 2012], [Zhang and Re, 2014], [Gurbuzbalaban et al., 2015], [Ioffe and Szegedy, 2015], [Zhang et al. 2015]
  • 76. Coded Shuffling Algorithm M, workers W1 W2, data points 1 2 3 4. Epoch 1: W1 stores $(x_1, x_2)$, W2 stores $(x_3, x_4)$. Epoch 2: W1 needs $(x_1, x_3)$, W2 needs $(x_2, x_4)$.
  • 77. Coded Shuffling Algorithm (same setup): instead of two unicasts, M multicasts the single coded packet $x_2 + x_3$.
  • 78. Coded Shuffling Algorithm (same setup): W1 cancels its cached $x_2$ to recover $x_3$; W2 cancels its cached $x_3$ to recover $x_2$.
  • 79. Coded Shuffling Algorithm (same setup): Coding opportunity increases as each worker can store more data points!
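The multicast step on these slides can be sketched with XOR packets over toy payloads (the byte values and helper names here are illustrative, not from the talk):

```python
# Coded shuffling with 4 data points and 2 workers. Epoch 1: W1 holds
# {1, 2}, W2 holds {3, 4}. Epoch 2 needs W1 = {1, 3} and W2 = {2, 4}.
# Instead of unicasting point 3 to W1 and point 2 to W2, the master
# multicasts ONE coded packet (2 XOR 3); each worker cancels what it
# already caches.

points = {1: b"\x11", 2: b"\x22", 3: b"\x33", 4: b"\x44"}  # toy payloads

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

packet = xor(points[2], points[3])      # one multicast instead of two
w1_decoded = xor(packet, points[2])     # W1 cancels its cached point 2
w2_decoded = xor(packet, points[3])     # W2 cancels its cached point 3
assert w1_decoded == points[3] and w2_decoded == points[2]
```

Halving the transmissions here corresponds to the coding gain that grows with the per-worker cache size, as the next slide quantifies.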
  • 80. Coded Shuffling Algorithm Thm: Let $q$ = # of data points, $\gamma$ = memory overhead ($1 \le \gamma \le n$; each worker stores $\gamma q/n$ data points), and $n$ = # of workers. Then, $T_{\text{uncoded}} = q\left(1 - \frac{\gamma}{n}\right)$ and $T_{\text{coded}} = \frac{T_{\text{uncoded}}}{\gamma + 1}$
  • 81. Simulation Results (plot: # of transmissions / $10^6$ vs. memory overhead $\gamma$, with $q = 10^7$, $n = 20$): uncoded vs. theory vs. simulation
  • 82. Experiments: Low-rank Matrix Completion (10M x 10M), tested on 25 EC2 instances: 35% speedup
  • 83. Conclusion • Coding theory for distributed computing • Stragglers slow down distributed computing => Coded Computation • Data needs to be shuffled between distributed nodes => Coded Shuffling