Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
Chulhee "Charlie" Yun (KAIST AI)
Shashank Rajput (UW-Madison CS)
Suvrit Sra (MIT LIDS/EECS)
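Distributed/Federated Learning
[Figure: a central server connected to devices 1, …, M; device m holds the local objective $F_m(x)$.]
Device m holds N local components:
$$F_m(x) := \frac{1}{N}\sum_{i=1}^{N} f_i^m(x), \qquad m = 1, \dots, M$$
Goal:
$$\min_x F(x) := \frac{1}{M}\sum_{m=1}^{M} F_m(x)$$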
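As a minimal sketch of the two-level average above, the snippet below evaluates $F(x)$ on a toy instance. It assumes hypothetical scalar quadratic components $f_i^m(x) = \tfrac{a}{2}(x - b)^2$ stored as `(a, b)` pairs; the function names are illustrative, not from the slides.

```python
from typing import List, Tuple

# Hypothetical component f(x) = 0.5 * a * (x - b)^2, stored as (a, b).
Component = Tuple[float, float]


def component_value(c: Component, x: float) -> float:
    a, b = c
    return 0.5 * a * (x - b) ** 2


def local_objective(components: List[Component], x: float) -> float:
    """F_m(x) = (1/N) * sum_i f_i^m(x) over the N components on machine m."""
    return sum(component_value(c, x) for c in components) / len(components)


def global_objective(machines: List[List[Component]], x: float) -> float:
    """F(x) = (1/M) * sum_m F_m(x) over the M machines."""
    return sum(local_objective(comps, x) for comps in machines) / len(machines)


# Toy instance: M = 2 machines, N = 2 components each.
machines = [
    [(1.0, 0.0), (1.0, 2.0)],  # machine 1
    [(2.0, 1.0), (2.0, 3.0)],  # machine 2
]
print(global_objective(machines, 1.0))  # prints 1.25
```

Here $F_1(1) = 0.5$ and $F_2(1) = 2.0$, so $F(1)$ is their mean, 1.25.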
Local SGD (Federated Averaging)
· Local SGD (aka federated averaging): run SGD locally on each machine, and sync (average the iterates) once in a while
· Existing analyses of local SGD assume unbiased, independent stochastic gradient estimates
  [Dieuleveut & Patel, 2019; Haddadpour, Kamani, Mahdavi, Cadambe, 2019; Haddadpour & Mahdavi, 2019; Stich, 2019; Yu, Yang, Zhu, 2019; Li, Sahu, Zaheer, Sanjabi, Talwalkar, Smith, 2020; Li, Huang, Yang, Wang, Zhang, 2020; Koloskova, Loizou, Boreiri, Jaggi, Stich, 2020; Khaled, Mishchenko, Richtarik, 2020; Spiridonoff, Olshevsky, Paschalidis, 2020; Karimireddy, Kale, Mohri, Reddi, Stich, Suresh, 2020; Stich & Karimireddy, 2020; Qu, Lin, Kalagnanam, Li, Zhou, Zhou, 2020; Gorbunov, Hanzely, Richtarik, 2021; Glasgow, Yuan, Ma, 2021; Qin, Etesami, Uribe, 2022; …]
· It is hard for local SGD to beat minibatch SGD, though it is possible in some convex cases
  [Woodworth, Patel, Stich, Dai, Bullins, McMahan, Shamir, Srebro, 2020; Woodworth, Patel, Srebro, 2020; Woodworth, Bullins, Shamir, Srebro, 2021]
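A minimal sketch of local SGD under the with-replacement model that the cited analyses assume: each machine draws component indices independently and uniformly at random, takes `B` local steps, and the server then averages the iterates. The scalar quadratic components, the step size `lr`, and the function names are assumptions for illustration only.

```python
import random
from typing import List, Tuple

Component = Tuple[float, float]  # hypothetical f(x) = 0.5 * a * (x - b)^2


def grad(c: Component, x: float) -> float:
    a, b = c
    return a * (x - b)


def local_sgd(machines: List[List[Component]], x0: float, lr: float,
              B: int, rounds: int, seed: int = 0) -> float:
    """Local SGD / FedAvg with with-replacement sampling (illustrative sketch)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for comps in machines:
            y = x  # every machine starts the round from the synced iterate
            for _ in range(B):
                i = rng.randrange(len(comps))  # uniform, independent draw
                y -= lr * grad(comps[i], y)
            local_iterates.append(y)
        # Communication round: the server averages the local iterates.
        x = sum(local_iterates) / len(local_iterates)
    return x


machines = [[(1.0, 0.0), (1.0, 2.0)], [(1.0, 1.0), (1.0, 3.0)]]
print(local_sgd(machines, x0=10.0, lr=0.05, B=4, rounds=200))
```

With equal curvatures the minimizer of $F$ is the mean of the `b` values (1.5 here), so the output should land near 1.5, up to sampling noise from the constant step size.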
Random Reshuffling
· Unbiased & independent noisy gradients: true for with-replacement sampling, but not for without-replacement sampling (shuffling)
· Random Reshuffling (shuffling-based SGD) converges faster than with-replacement SGD if the number of epochs is large enough
  [Gurbuzbalaban, Ozdaglar, Parrilo, 2019; HaoChen & Sra, 2019; Nagaraj, Jain, Netrapalli, 2019; Nguyen, Tran-Dinh, Phan, Nguyen, van Dijk, 2020; Safran & Shamir, 2020; 2021; Rajput, Gupta, Papailiopoulos, 2020; Rajput, Lee, Papailiopoulos, 2021; Ahn, Yun, Sra, 2020; Yun, Sra, Jadbabaie, 2021; Mishchenko, Khaled, Richtarik, 2020; Tran, Nguyen, Tran-Dinh, 2021; …]
· Only a couple of concurrent papers study Random Reshuffling in the federated setup
  [Mishchenko, Khaled, Richtarik, 2021; Malinovsky, Mishchenko, Richtarik, 2022; …]
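The difference between the two sampling schemes can be sketched in a few lines: with-replacement sampling draws each index independently, while random reshuffling draws a fresh permutation each epoch, so every component is visited exactly once per epoch (and indices within an epoch are no longer independent). The generator names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterator


def with_replacement(N: int, epochs: int, rng: random.Random) -> Iterator[int]:
    """Each index is i.i.d. uniform on {0, ..., N-1}: unbiased and independent."""
    for _ in range(epochs * N):
        yield rng.randrange(N)


def random_reshuffling(N: int, epochs: int, rng: random.Random) -> Iterator[int]:
    """A fresh uniform permutation per epoch: indices within an epoch are dependent."""
    for _ in range(epochs):
        perm = list(range(N))
        rng.shuffle(perm)
        yield from perm


rng = random.Random(0)
N, K = 5, 3
rr = list(random_reshuffling(N, K, rng))
# Every epoch of RR contains each of the N indices exactly once.
for k in range(K):
    assert sorted(rr[k * N:(k + 1) * N]) == list(range(N))
# With-replacement sampling gives no such guarantee (some index may repeat).
print(Counter(with_replacement(N, 1, rng)))
```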
Where This Paper Stands
· Distributed SGD with replacement
· Single-machine SGD without replacement (aka Random Reshuffling)
· This paper combines the two: tight analysis of Minibatch & Local Random Reshuffling (RR)
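As a rough illustration of the two algorithms, here is a sketch under stated assumptions: scalar quadratic components, a constant step size `lr`, independent per-machine permutations each epoch, and `B` gradients per machine per communication round. In minibatch RR each round is one step at the shared iterate using the average of every machine's next `B` shuffled gradients; in local RR each machine takes `B` shuffled single-component steps on its own before the iterates are averaged. This is a simplified reading, not the authors' implementation.

```python
import random
from typing import List, Tuple

Component = Tuple[float, float]  # hypothetical f(x) = 0.5 * a * (x - b)^2


def grad(c: Component, x: float) -> float:
    a, b = c
    return a * (x - b)


def minibatch_rr(machines: List[List[Component]], x0: float, lr: float,
                 B: int, epochs: int, seed: int = 0) -> float:
    """Every round is one communication: each machine averages the gradients of
    its next B shuffled components at the shared x; the server averages those."""
    rng = random.Random(seed)
    N = len(machines[0])
    x = x0
    for _ in range(epochs):
        # Independent shuffling: a fresh permutation per machine per epoch.
        perms = [rng.sample(range(N), N) for _ in machines]
        for start in range(0, N, B):
            machine_grads = []
            for comps, perm in zip(machines, perms):
                batch = perm[start:start + B]
                machine_grads.append(sum(grad(comps[i], x) for i in batch) / len(batch))
            x -= lr * sum(machine_grads) / len(machine_grads)
    return x


def local_rr(machines: List[List[Component]], x0: float, lr: float,
             B: int, epochs: int, seed: int = 0) -> float:
    """Every round: each machine takes B single-component steps along its own
    permutation, then the server averages the local iterates."""
    rng = random.Random(seed)
    N = len(machines[0])
    x = x0
    for _ in range(epochs):
        perms = [rng.sample(range(N), N) for _ in machines]
        for start in range(0, N, B):
            local_iterates = []
            for comps, perm in zip(machines, perms):
                y = x
                for i in perm[start:start + B]:
                    y -= lr * grad(comps[i], y)
                local_iterates.append(y)
            x = sum(local_iterates) / len(local_iterates)
    return x


machines = [[(1.0, 0.0), (1.0, 2.0), (1.0, 4.0), (1.0, 6.0)],
            [(1.0, 1.0), (1.0, 3.0), (1.0, 5.0), (1.0, 7.0)]]
print(minibatch_rr(machines, x0=0.0, lr=0.05, B=2, epochs=200))
print(local_rr(machines, x0=0.0, lr=0.05, B=2, epochs=200))
```

Both runs should approach the minimizer $x^\star = 3.5$ (the mean of all `b` values), up to a small error caused by the constant step size.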
· Upper bounds [Khaled et al., 2020; Spiridonoff et al., 2020; Qu et al., 2020]:
$$\tilde{O}\left(\frac{L\nu^2}{\mu^2 MNK} + \frac{L^2\nu^2 B}{\mu^3 N^2K^2} + \frac{L^2\tau^2 B^2}{\mu^3 N^2K^2}\right), \qquad K \gtrsim \kappa$$
· Too large $B$ doesn't help: if $B = \Theta(N)$, then even when $\tau = 0$ the rate becomes $\tilde{O}\left(\frac{1}{NK^2}\right)$, since the second term turns into $\frac{L^2\nu^2}{\mu^3 NK^2}$, which gains nothing from more machines $M$
· If $B$ and $\tau$ are small, local RR can match minibatch RR (but can't outperform it)
($M$: # machines, $N$: # local components per machine, $K$: # epochs, $B$: # gradients per machine per communication round)
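To see why a large $B$ cannot help, one can plug numbers into the three terms of the upper bound, dropping the constants and logarithmic factors hidden in $\tilde{O}$. The helper name and the sample values of $L, \mu, \nu, \tau$ are arbitrary assumptions, and $\kappa = L/\mu$ is assumed as the usual condition number; only the scaling in $M$ matters here.

```python
def upper_bound_terms(L, mu, nu, tau, M, N, K, B):
    """The three terms of the upper bound, with constants and log factors dropped."""
    term1 = L * nu**2 / (mu**2 * M * N * K)
    term2 = L**2 * nu**2 * B / (mu**3 * N**2 * K**2)
    term3 = L**2 * tau**2 * B**2 / (mu**3 * N**2 * K**2)
    return term1, term2, term3


# Arbitrary sample constants; kappa = L / mu = 10 and K = 50 >= kappa.
N, K = 100, 50
for M in (1, 10, 100):
    t1, t2, t3 = upper_bound_terms(L=1.0, mu=0.1, nu=1.0, tau=0.0, M=M, N=N, K=K, B=N)
    print(f"M={M:>3}: term1={t1:.2e}  term2={t2:.2e}  term3={t3:.2e}")
```

With `B = N` and `tau = 0`, the first term shrinks as `M` grows, but the second term is identical for every `M`, so the bound plateaus at roughly $1/(NK^2)$.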


(e.g.,	same	dataset	for	all	 	machines)
∀i, f1
i = f2
i = ⋯ = fM
i =: fi
M
Results:	Synchronized	Shuffling
17
Assume	component-wise	homogeneity	


(e.g.,	same	dataset	for	all	 	machines)
∀i, f1
i = f2
i = ⋯ = fM
i =: fi
M
For	any	permutation	 ,	summing	over	an	epoch	gives	full	gradient:	
σ
N
∑
i=1
∇fσ(i)(x) = N∇F(x)
Results: Synchronized Shuffling
· Assume component-wise homogeneity (e.g., the same dataset on all $M$ machines):
$$\forall i, \quad f_i^1 = f_i^2 = \cdots = f_i^M =: f_i$$
· For any permutation $\sigma$, summing over an epoch gives the full gradient:
$$\sum_{i=1}^{N} \nabla f_{\sigma(i)}(x) = N\nabla F(x)$$
· Is there any way to exploit the permutation identity "more often"?
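The permutation identity is easy to check numerically. The sketch below uses hypothetical scalar quadratic components $f_i(x) = \tfrac{a_i}{2}(x - b_i)^2$ with exact rational data (via `fractions.Fraction`, so no floating-point rounding interferes): summing the gradients over one epoch in any shuffled order returns exactly $N\nabla F(x)$.

```python
import random
from fractions import Fraction

# Hypothetical components f_i(x) = a_i/2 * (x - b_i)^2 with exact rational data.
a = [Fraction(1), Fraction(2), Fraction(3), Fraction(1, 2), Fraction(5)]
b = [Fraction(0), Fraction(-1), Fraction(4), Fraction(2), Fraction(1, 3)]
N = len(a)


def grad_f(i, x):
    return a[i] * (x - b[i])


def grad_F(x):
    # F(x) = (1/N) * sum_i f_i(x), so grad F is the mean of the component gradients.
    return sum(grad_f(i, x) for i in range(N)) / N


x = Fraction(7, 3)
rng = random.Random(0)
for _ in range(10):
    sigma = rng.sample(range(N), N)  # a uniformly random permutation
    epoch_sum = sum(grad_f(sigma[i], x) for i in range(N))
    assert epoch_sum == N * grad_F(x)  # exact equality with Fraction arithmetic
print("identity holds for 10 random permutations")
```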
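Results: Synchronized Shuffling
Permutation identity: $\sum_{i=1}^{N} \nabla f_{\sigma_k^m(i)}(x) = N\nabla F(x)$
Example with $N = 6$ components and $M = 3$ machines; machine $m$ visits $f_{\sigma_k^m(1)}, \dots, f_{\sigma_k^m(6)}$ in the order shown.
Independent Shuffling: $\sigma_k^m \sim \mathrm{Unif}[\mathrm{Perm}(N)]$, independently for each machine $m$
  Machine 1: $f_3, f_4, f_5, f_2, f_6, f_1$
  Machine 2: $f_4, f_1, f_2, f_3, f_5, f_6$
  Machine 3: $f_6, f_1, f_4, f_3, f_2, f_5$
Synchronized Shuffling: $\sigma \sim \mathrm{Unif}[\mathrm{Perm}(N)]$, $\sigma_k^m(i) := \sigma\left((i + \tfrac{mN}{M}) \bmod N\right)$
  Machine 1: $f_3, f_4, f_5, f_2, f_6, f_1$
  Machine 2: $f_2, f_6, f_4, f_1, f_5, f_3$
  Machine 3: $f_1, f_5, f_6, f_3, f_4, f_2$
· With synchronized shuffling, the $M$ machines together get $N\nabla F(x)$ every $N/M$ iterations (here every 2 iterations: $\{f_3, f_4\} \cup \{f_2, f_6\} \cup \{f_1, f_5\}$ covers all six components)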
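A sketch of the synchronized-shuffling rule next to independent shuffling. Two assumptions: positions are 0-based (the slide's $\bmod N$ indexing maps naturally onto them), and $M$ divides $N$. Each machine reads the shared permutation $\sigma$ with an offset of $mN/M$ positions, so during any window of $N/M$ consecutive iterations the $M$ machines jointly touch every component exactly once. The function names are hypothetical.

```python
import random
from typing import List


def independent_shuffling(N: int, M: int, rng: random.Random) -> List[List[int]]:
    """sigma_k^m ~ Unif[Perm(N)], drawn independently for each machine m."""
    return [rng.sample(range(N), N) for _ in range(M)]


def synchronized_shuffling(N: int, M: int, rng: random.Random) -> List[List[int]]:
    """One shared sigma ~ Unif[Perm(N)]; machine m uses sigma((i + m*N/M) mod N)."""
    assert N % M == 0, "sketch assumes M divides N"
    sigma = rng.sample(range(N), N)
    return [[sigma[(i + m * N // M) % N] for i in range(N)]
            for m in range(1, M + 1)]


def covers_all_every_window(orders: List[List[int]], N: int, M: int) -> bool:
    """True if each window of N/M iterations, pooled over machines, is a permutation."""
    w = N // M
    for start in range(0, N, w):
        pooled = [c for order in orders for c in order[start:start + w]]
        if sorted(pooled) != list(range(N)):
            return False
    return True


rng = random.Random(0)
N, M = 6, 3
sync = synchronized_shuffling(N, M, rng)
print(sync)
print(covers_all_every_window(sync, N, M))  # True for synchronized shuffling
print(covers_all_every_window(independent_shuffling(N, M, rng), N, M))  # usually False
```

Because the shifts $mN/M$ split $\sigma$ into $M$ disjoint blocks of length $N/M$, the pooled window property holds for every draw of $\sigma$, not just this seed.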
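Results: Synchronized Shuffling
Minibatch RR: $\tilde{O}\left(\frac{L^2\nu^2}{\mu^3 MNK^2}\right)$ for $K \gtrsim \kappa$
Local RR: $\tilde{O}\left(\frac{L^2\nu^2}{\mu^3 MNK^2} + \frac{L^2\nu^2 B}{\mu^3 N^2K^2}\right)$ for $K \gtrsim \kappa$
Minibatch RR + SyncShuf: $\tilde{O}\left(\frac{L^2\nu^2}{\mu^3 M^2NK^2}\right)$ for $K \gtrsim \kappa$
Local RR + SyncShuf: $\tilde{O}\left(\frac{L^2\nu^2}{\mu^3 M^2NK^2} + \frac{L^2\nu^2 B}{\mu^3 N^2K^2}\right)$ for $K \gtrsim \kappa$
· Bypass the lower bounds: SyncShuf improves the leading term by a factor of $1/M$
· Can allow "slight" component-wise heterogeneity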
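A quick arithmetic check of what SyncShuf buys, dropping constants and log factors: the leading term with SyncShuf is exactly $1/M$ times the leading term without it, while the $B$-dependent term of local RR is unchanged. The helper name and sample values are arbitrary assumptions; exact `Fraction` arithmetic keeps the ratio free of rounding.

```python
from fractions import Fraction


def rr_bounds(L, mu, nu, M, N, K, B, sync):
    """Terms of the rates above, dropping constants and log factors.

    sync=True replaces the 1/M factor in the leading term with 1/M^2 (SyncShuf).
    """
    L, mu, nu = Fraction(L), Fraction(mu), Fraction(nu)
    m_factor = M**2 if sync else M
    lead = L**2 * nu**2 / (mu**3 * m_factor * N * K**2)
    local_extra = L**2 * nu**2 * B / (mu**3 * N**2 * K**2)
    return {"minibatch": lead, "local": lead + local_extra}


args = dict(L=1, mu=Fraction(1, 10), nu=1, M=8, N=1000, K=50, B=10)
plain = rr_bounds(**args, sync=False)
synced = rr_bounds(**args, sync=True)
print(synced["minibatch"] / plain["minibatch"])  # prints 1/8, i.e. 1/M
```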
Thank you!
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
Chulhee Yun, Shashank Rajput, Suvrit Sra
https://arxiv.org/abs/2110.10342