FEDBN: Improve Federated Learning on Non-IID Data via Local Batch Normalization (39

FEDBN: FEDERATED LEARNING ON NON-IID FEATURES VIA LOCAL BATCH NORMALIZATION
Paper presentation by Anam ur rehman
Contact: anamur.rehman@studenti.polito.it
Published as a conference paper at ICLR 2021

FEDBN: Federated learning[1]
[1] Jakub Konečný et al. Federated optimization: Distributed machine learning for on-device intelligence. 2016
Classical Machine Learning:
• Centralized data storage
• Training process computations at the central server
What if?
• Data stays distributed on remote devices
• Devices maintain control of their own data
• Training is done locally on remote devices
• One global model is learned via aggregation
Autonomous cars generate, on average, around 4 GB of data per hour of driving.

FEDBN: Federated learning
Applications [1]
• Transportation: self-driving cars
• Healthcare: predictions on patient data
• Cybersecurity: spam filtering
• Smart applications: voice recognition, next word prediction, etc.
[1] Read more: Priyanka et al. Federated Learning: Opportunities and Challenges, 2021
Challenges [1]
• Communication Overheads: presence of stragglers
• Heterogeneity: system, statistical (in contrast to distributed learning)
• Privacy concerns

FEDBN: Federated learning
Statistical heterogeneity among local datasets:
• Unbalancedness: Clients may have different amounts of data
  Example: spam filter for emails
• Covariate shift: The statistical distribution of the data varies among clients
  Example: NLP, digit recognition
• Concept shift: The same features may correspond to different labels for different clients
  Example: in NLP, sentiment analysis of the same text may yield different sentiments for different clients

FEDBN: Example of non-IID datasets
Digits dataset: MNIST, MNIST-M, USPS, SynthDigits, SVHN
Covariate shift among datasets

FEDBN: Related work
• FedAvg[1]: Federated Averaging
[1] Brendan McMahan et al. Communication-efficient learning of deep networks from decentralized data. 2017.
At each communication round:
1. The server randomly selects a subset of K clients and sends them the current global model
2. Each selected client k updates this model on its local data via SGD. After training, the client
sends the new local model back to the server
3. The server aggregates the local models to form a new global model
- Convergence is not guaranteed. In heterogeneous settings it can diverge [1]
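The three steps of a communication round can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the least-squares local solver `local_sgd`, the synthetic clients, and all hyperparameters are assumptions made for the example. Aggregation is weighted by local dataset size, as in FedAvg.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """Hypothetical local solver: full-batch gradient descent on least squares."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients, k, rng):
    """One communication round over `clients`, a list of (X, y) pairs."""
    # Step 1: the server samples K clients and sends them the global model.
    chosen = rng.choice(len(clients), size=k, replace=False)
    # Step 2: each selected client trains locally and returns its model.
    local_models = [local_sgd(w_global, *clients[i]) for i in chosen]
    sizes = np.array([len(clients[i][1]) for i in chosen], dtype=float)
    # Step 3: the server averages the local models, weighted by data size.
    return np.average(local_models, axis=0, weights=sizes)

# Synthetic, IID clients of different sizes (assumed for illustration only).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for n in (20, 50, 80, 110):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(30):
    w = fedavg_round(w, clients, k=2, rng=rng)
```

With IID clients as above, the global model settles close to `w_true`; the divergence risk mentioned on the slide concerns heterogeneous clients, which this toy setup does not model.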

FEDBN: Related work
• FedProx[1]: Federated Optimization in Heterogeneous Networks
[1] Tian Li et al. Federated Optimization in Heterogeneous Networks. In Conference on Machine Learning and Systems, 2020.
Slide credit: Tian Li, MLSys presentation.
+ Limits the impact of heterogeneous local updates
+ Safely incorporates partial work of stragglers
+ Generalization of FedAvg; allows for any local solver
+ Theoretical guarantees for convergence
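FedProx limits local drift by adding a proximal term, (mu / 2) * ||w - w_global||^2, to each client's local objective. The sketch below shows one way a local update could look under that objective; the least-squares loss, the function name `fedprox_local_update`, and the values of `mu`, `lr`, and `steps` are assumptions chosen for illustration. Setting `mu=0` recovers a plain FedAvg-style local update.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.1, steps=5):
    """Approximately minimise F_k(w) + (mu / 2) * ||w - w_global||^2 by gradient descent."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the local least-squares loss
        grad_prox = mu * (w - w_global)               # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w

# Synthetic client data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, 1.0])
w_global = np.zeros(2)

# Distance travelled from the global model without and with the proximal term.
drift_fedavg = np.linalg.norm(fedprox_local_update(w_global, X, y, mu=0.0) - w_global)
drift_fedprox = np.linalg.norm(fedprox_local_update(w_global, X, y, mu=1.0) - w_global)
```

In this setup the update with `mu=1.0` stays closer to the global model than the update with `mu=0.0`, which is the sense in which the proximal term limits the impact of heterogeneous local updates.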

FEDBN: Related work
• SiloBN[1]: Siloed Federated Learning for Multi-Centric Histopathology Datasets
[1] Mathieu Andreux et al, Siloed federated learning for multi-centric histopathology datasets, pp. 129–139. Springer, 2020.
Slide credits: [1]

FEDBN: Batch Normalization
Why do we use it?
To reduce internal covariate shift in neural networks [1].
How does it work?
γ and β are the only learnable parameters of a BN layer.
[1] Sergey Ioffe et al. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015
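As a rough illustration of the mechanism, the training-time forward pass of a BN layer normalises each feature over the mini-batch and then applies the learnable scale γ and shift β. The sketch below is a simplified NumPy version: the function name and the `eps` default are assumptions, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature (biased) batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # gamma and beta are the learnable parameters

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # batch of 64 samples, 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With γ = 1 and β = 0 the output has approximately zero mean and unit variance per feature, regardless of the mean and scale of the input.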

FEDBN: Problem with non-IID data
Consider a simple, non-convex learning problem:
y = cos(w_true · x) + ϵ, s.t. ϵ ∼ 𝒩(0, σ²)
Two clients train a model f_w(x_i) = cos(w · x_i)
s.t. x₁ ∼ 𝒩(μ, σ₁²), x₂ ∼ 𝒩(μ, σ₂²), and σ₁² ≠ σ₂²
[Figure: local squared loss before and after local BN, plotted against w, with the optima w₁* and w₂* marked]
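The toy setup can be simulated to inspect each client's local squared loss as a function of w. The sketch below assumes labels are generated as y = cos(w_true · x) + ϵ; the values of `w_true`, μ, the two input standard deviations, the noise level, and the grid over w are all picked for illustration and are not taken from the slides. It covers only the raw-input case, without BN.

```python
import numpy as np

def make_client(rng, mu, sigma_x, w_true=1.0, noise_std=0.1, n=2000):
    """Draw one client's data: x ~ N(mu, sigma_x^2), y = cos(w_true * x) + noise."""
    x = rng.normal(mu, sigma_x, size=n)
    y = np.cos(w_true * x) + rng.normal(0.0, noise_std, size=n)
    return x, y

def local_squared_loss(w_grid, x, y):
    """Mean squared error of f_w(x) = cos(w * x) for every w in w_grid."""
    preds = np.cos(np.outer(w_grid, x))   # shape (len(w_grid), n)
    return ((preds - y) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
w_grid = np.linspace(0.0, 4.0, 401)

# Two clients with the same input mean but different input variance.
x1, y1 = make_client(rng, mu=1.0, sigma_x=0.5)
x2, y2 = make_client(rng, mu=1.0, sigma_x=2.0)
loss1 = local_squared_loss(w_grid, x1, y1)
loss2 = local_squared_loss(w_grid, x2, y2)
```

Plotting `loss1` and `loss2` against `w_grid` shows two loss curves of different shape, since the input variances differ between the clients.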

FEDBN: Why not just take the average? (SiloBN)
Client 1
w₁*: optimal weight
γ₁*: optimal BN parameter
Observation 1:
For a fixed optimal weight w₁*, changing γ deteriorates the model quality.
Observation 2:
For a given optimal BN parameter γ₁*, changing w deteriorates the quality.

FEDBN: How it works?
Local training
Global aggregation
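The global aggregation step of FedBN can be sketched as follows: every parameter is averaged across clients except those belonging to BN layers, which stay local. This is a simplified NumPy illustration, not the reference implementation; identifying BN parameters by the substring "bn" in their name, the function names, and the toy parameter dictionaries are assumptions.

```python
import numpy as np

def is_bn_param(name):
    """Hypothetical naming convention: BN parameters contain 'bn' in their name."""
    return "bn" in name

def fedbn_aggregate(client_states, client_weights):
    """Average non-BN parameters across clients; keep each client's BN parameters."""
    new_states = [dict(state) for state in client_states]
    for name in client_states[0]:
        if is_bn_param(name):
            continue  # BN parameters are never sent to or overwritten by the server
        avg = sum(wt * state[name] for wt, state in zip(client_weights, client_states))
        for state in new_states:
            state[name] = avg.copy()
    return new_states

# Two toy clients, each with one shared layer and one BN parameter.
clients = [
    {"conv1.weight": np.array([1.0, 2.0]), "bn1.weight": np.array([0.5])},
    {"conv1.weight": np.array([3.0, 4.0]), "bn1.weight": np.array([1.5])},
]
updated = fedbn_aggregate(clients, client_weights=[0.5, 0.5])
```

After aggregation both clients hold the same averaged `conv1.weight`, while each keeps its own `bn1.weight`.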

FEDBN: PyTorch implementation

FEDBN: How it Really Works?
Source: med-air/FedBN (github.com)
Global aggregation

FEDBN: Results on digit dataset (FedAvg vs FedBN)
• Outperforms FedAvg on SVHN dataset
• Faster convergence
• Smooth and robust convergence

FEDBN: Results; what if communication is done at different frequencies?

FEDBN: Results; what if dataset size varies for each client?

FEDBN: Contributions
Provides convergence guarantees.
Improves the convergence behavior on non-IID datasets.
Goes one step further in protecting the privacy of clients' data.

FEDBN: Take home message
• Use batch normalization
• Keep it local
• Smooth and fast convergence

Useful links
FedProx presentation by Tian Li: Federated Optimization in Heterogeneous Networks
PyTorch implementation of FedBN: med-air/FedBN (github.com)
Brendan McMahan's talk: Guarding user Privacy with Federated Learning
