Slides for the ICLR 2022 paper "Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond" by Chulhee Yun (KAIST AI), Shashank Rajput (University of Wisconsin-Madison), Suvrit Sra (MIT).
The slides were used for an oral presentation at ICLR 2022.
55. Results: Synchronized Shuffling
Permutation Identity: $\sum_{i=1}^{N} \nabla f_{\sigma_k^m(i)}(x) = N \nabla F(x)$
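A minimal sketch of the permutation identity, assuming (as in the machine illustrations) that every machine holds the same $N$ components and $F = \frac{1}{N}\sum_j f_j$. The quadratic components $f_j(x) = \frac{1}{2}(x - a_j)^2$, the target values, and the helper names are hypothetical; integer data keeps the check exact. Indices are 0-based.

```python
import random

# Hypothetical toy components f_j(x) = 0.5 * (x - a_j)**2, so grad f_j(x) = x - a_j.
# F(x) = (1/N) * sum_j f_j(x), hence N * grad F(x) = N * x - sum_j a_j.
TARGETS = [3, -1, 4, 1, -5, 9]
N = len(TARGETS)

def grad_f(j, x):
    """Gradient of the j-th component (0-based index)."""
    return x - TARGETS[j]

def n_grad_F(x):
    """N times the gradient of the average objective F."""
    return N * x - sum(TARGETS)

def one_pass_gradient_sum(perm, x):
    """Sum of component gradients over one full pass in the order given by perm."""
    return sum(grad_f(perm[i], x) for i in range(N))

rng = random.Random(0)
x = 2
for _ in range(5):
    sigma = rng.sample(range(N), N)  # uniformly random permutation of {0, ..., N-1}
    # Any permutation visits every component exactly once, so the sum is N * grad F(x).
    assert one_pass_gradient_sum(sigma, x) == n_grad_F(x)
print(n_grad_F(x))  # 6 * 2 - 11 = 1
```

The order of the pass does not matter for the full-epoch sum; it only matters for the intermediate iterates, which is what the shuffling schemes below control.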
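Independent Shuffling: $\sigma_k^m \sim \mathrm{Unif}[\mathrm{Perm}(N)]$
Machine 1 ($f_{\sigma_k^1(i)}$): $f_3, f_4, f_5, f_2, f_6, f_1$
Machine 2 ($f_{\sigma_k^2(i)}$): $f_4, f_1, f_2, f_3, f_5, f_6$
Machine 3 ($f_{\sigma_k^3(i)}$): $f_6, f_1, f_4, f_3, f_2, f_5$
Synchronized Shuffling: $\sigma \sim \mathrm{Unif}[\mathrm{Perm}(N)]$, $\sigma_k^m(i) := \sigma\left(\left(i + \frac{mN}{M}\right) \bmod N\right)$
Machine 1 ($f_{\sigma_k^1(i)}$): $f_3, f_4, f_5, f_2, f_6, f_1$
Machine 2 ($f_{\sigma_k^2(i)}$): $f_2, f_6, f_4, f_1, f_5, f_3$
Machine 3 ($f_{\sigma_k^3(i)}$): $f_1, f_5, f_6, f_3, f_4, f_2$
Get $N \nabla F(x)$ (summed over all machines) every $\frac{N}{M}$ iterations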
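The two schemes can be contrasted with a small simulation. This is a sketch under stated assumptions: indices are 0-based, $M$ divides $N$, all machines share the same $N$ components as in the illustration, and the function names are hypothetical. It checks whether, in every window of $\frac{N}{M}$ consecutive iterations, the machines together visit each component exactly once, which is what makes the pooled gradient equal $N \nabla F(x)$.

```python
import random

def independent_shuffling(N, M, rng):
    """Each machine m draws its own permutation sigma_k^m ~ Unif[Perm(N)]."""
    return [rng.sample(range(N), N) for _ in range(M)]

def synchronized_shuffling(N, M, rng):
    """One shared sigma ~ Unif[Perm(N)]; machine m uses sigma((i + m*N/M) mod N).

    Indices are 0-based here, and this sketch assumes M divides N.
    """
    if N % M != 0:
        raise ValueError("this sketch assumes M divides N")
    sigma = rng.sample(range(N), N)
    shift = N // M
    return [[sigma[(i + m * shift) % N] for i in range(N)] for m in range(1, M + 1)]

def covers_all_every_window(perms, N, M):
    """True if every window of N/M consecutive iterations, pooled over all machines,
    visits each component exactly once (so the pooled gradients sum to N * grad F(x))."""
    w = N // M
    for start in range(0, N, w):
        seen = sorted(perm[i] for perm in perms for i in range(start, start + w))
        if seen != list(range(N)):
            return False
    return True

rng = random.Random(0)
N, M = 6, 3
print("synchronized:", covers_all_every_window(synchronized_shuffling(N, M, rng), N, M))
print("independent: ", covers_all_every_window(independent_shuffling(N, M, rng), N, M))
```

Synchronized shuffling passes this check for any shared $\sigma$: within a window, machine $m$ reads $\frac{N}{M}$ consecutive positions offset by $\frac{mN}{M}$, and the $M$ offsets tile all $N$ positions of $\sigma$. Independent shuffling typically fails it.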
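59. Results: Synchronized Shuffling
Minibatch RR: $\tilde{O}\left(\frac{L^2 \nu^2}{\mu^3 M N K^2}\right)$ for $K \gtrsim \kappa$
Local RR: $\tilde{O}\left(\frac{L^2 \nu^2}{\mu^3 M N K^2} + \frac{L^2 \nu^2 B}{\mu^3 N^2 K^2}\right)$ for $K \gtrsim \kappa$
Minibatch RR + SyncShuf: $\tilde{O}\left(\frac{L^2 \nu^2}{\mu^3 M^2 N K^2}\right)$ for $K \gtrsim \kappa$
Local RR + SyncShuf: $\tilde{O}\left(\frac{L^2 \nu^2}{\mu^3 M^2 N K^2} + \frac{L^2 \nu^2 B}{\mu^3 N^2 K^2}\right)$ for $K \gtrsim \kappa$
· Bypass the lower bounds by a factor of $\frac{1}{M}$ (the $MNK^2$ terms become $M^2 N K^2$)
· Can allow "slight" component-wise heterogeneity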
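To see where the gain comes from, the expressions inside $\tilde{O}(\cdot)$ can be compared directly. A minimal sketch: constants and logarithmic factors are dropped, so only ratios are meaningful; the parameter values are arbitrary illustrations, not results from the paper; the function names are hypothetical; and $\kappa$ is assumed here to be $L/\mu$. Exact `Fraction` arithmetic keeps the ratios exact.

```python
from fractions import Fraction

def minibatch_rr_rate(L, nu, mu, M, N, K, sync=False):
    """Expression inside O~(.) for minibatch RR (constants and log factors dropped).
    SyncShuf replaces M by M**2 in the denominator."""
    m_pow = 2 if sync else 1
    return Fraction(L**2 * nu**2, mu**3 * M**m_pow * N * K**2)

def local_rr_rate(L, nu, mu, M, N, K, B, sync=False):
    """Expression inside O~(.) for local RR: the first term gains 1/M under SyncShuf,
    while the B-dependent second term is unchanged."""
    m_pow = 2 if sync else 1
    return (Fraction(L**2 * nu**2, mu**3 * M**m_pow * N * K**2)
            + Fraction(L**2 * nu**2 * B, mu**3 * N**2 * K**2))

# Arbitrary illustrative integers; the bounds assume K >~ kappa
# (kappa assumed to be L / mu = 10 here, well below K = 100).
p = dict(L=10, nu=1, mu=1, M=8, N=1000, K=100)
print(minibatch_rr_rate(**p, sync=True) / minibatch_rr_rate(**p))  # 1/8
print(float(local_rr_rate(**p, B=10, sync=True) / local_rr_rate(**p, B=10)))
```

For minibatch RR the ratio is exactly $\frac{1}{M}$. For local RR only the first term shrinks, so the overall improvement is smaller than $\frac{1}{M}$ once the $B$-dependent term dominates.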