Federated Learning, a Brief Overview

*https://en.wikipedia.org/wiki/File:Centralized_federated_learning_protocol.png
Machine Learning
Machine Learning is the “field of study that gives computers the
capability to learn without being explicitly programmed.” (Arthur Samuel,
1959)
Traditional programming relies on explicit rules. The function below maps the inputs X = 1, 2, 3 to the outputs Y = 1, 4, 9 because we wrote the rule ourselves:

def square(x):
    return x * x

square(4)  # 16

Traditional Programming: Rules + Data → Output (explicit instructions).
Machine Learning: Data + Output → Rules (algorithms implicitly learn the rules from examples).

In machine learning, we instead supply input/output examples and let the algorithm learn the mapping:

X   Y
1   1
2   4
3   9
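To make "algorithms implicitly learn the rules from examples" concrete, here is a minimal sketch (not from the slides; the degree-2 polynomial model is an illustrative assumption) that recovers the squaring rule from the (X, Y) examples above:

import numpy as np

# Input/output examples from the table above.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 4.0, 9.0])

# Fit y = a*x^2 + b*x + c to the examples: the rule is learned
# from data instead of being written by hand.
a, b, c = np.polyfit(X, Y, deg=2)
print(round(a, 3), round(b, 3), round(c, 3))  # 1.0 0.0 0.0  ->  y = x^2

# The learned rule generalizes to an unseen input.
print(np.polyval([a, b, c], 4))               # ~16.0, matching square(4)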
Why Machine Learning?

Simple problem, simple data → Traditional Programming. Computing a factorial, for example, is easy to code explicitly:

def factorial(x):
    result = 1
    for i in range(1, x + 1):
        result = result * i
    return result

factorial(4)  # 1 * 2 * 3 * 4 = 24

Complex problem → Machine Learning. Consider: is this image a dog or a cat?

def function(image):
    ...  # very complex to code this function with traditional programming

Writing explicit rules for such a function is impractical. A machine learning algorithm instead forms a hypothesis from labeled examples (Cat, Dog, Cat, …) and uses it to predict labels for new images.
Why is ML such a hot topic these days?

1. Data
2. Computing power
How ML works at scale.
Data, a blessing as well as a curse.

➔ The more data, the more accurate the ML model; but handling and training on huge amounts of data is a problem.
➔ Compute/processing power is increasing, but dataset sizes are growing much more rapidly than computing power.
➔ This results in longer training times, and large ML models may even cause out-of-memory errors.
➔ Many of today's datasets cannot be handled on a single machine because of their humongous size.

So, what is the solution?
Solution: Distributed ML.
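As a hedged illustration of the idea (not from the slides; the four-worker linear-regression setup is an assumption), data-parallel distributed ML shards the data across workers, computes gradients locally, and averages them on each step:

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = 3x + noise. In distributed ML the data is centrally
# stored and sharded uniformly at random across workers.
x = rng.normal(size=1000)
y = 3.0 * x + 0.1 * rng.normal(size=1000)

num_workers = 4
shards = np.array_split(rng.permutation(1000), num_workers)

w, lr = 0.0, 0.1
for step in range(100):
    # Each worker computes the gradient of the squared error on its shard.
    grads = [np.mean(2 * (w * x[i] - y[i]) * x[i]) for i in shards]
    # A parameter server averages the gradients and updates the model.
    w -= lr * np.mean(grads)

print(w)  # ~3.0, but every step needed an all-to-one communication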
Drawbacks of Distributed ML.
➔ Data Collection:
◆ Privacy.
◆ Security.
◆ Integrity.
◆ Storage.
➔ Regulations: GDPR (Europe), CCPA (California), PIPEDA (Canada), LGPD (Brazil), PDPL
(Argentina), KVKK (Turkey), POPI (South Africa), FSS (Russia), CDPR (China), PDPB (India), PIPA
(Korea), APPI (Japan), PDP (Indonesia), PDPA (Singapore), APP (Australia), and other
regulations protect sensitive data from being moved. In fact, those regulations sometimes
even prevent single organizations from combining their own users’ data for artificial
intelligence training because those users live in different parts of the world, and their data is
governed by different data protection regulations.
➔ Communication overhead.
➔ Resource utilization and load balancing.
➔ …
Solution: Federated Learning
Federated Learning (FL) [1] is a distributed
machine learning approach in which large
decentralized datasets, residing on edge devices
like mobile phones and IoT devices, are used to
train a Machine Learning (ML) model.
Some important standard terms in FL:
1. Server: a computational device that orchestrates the whole FL process and is responsible for weight aggregation.
2. Client: a device that has some computational resources and local data associated with it, e.g., mobile phones, IoT devices, personal computers, etc.
3. Round: a round (or communication round) is one round trip of the model weights from the server to the clients and back to the server.
*https://blog.ml.cmu.edu/wp-content/uploads/2019/11/Screen-
Shot-2019-11-12-at-10.42.34-AM-1024x506.png
Distributed ML vs Federated Learning

Distributed ML:
1. Data is centrally stored (e.g., in a data center).
2. The main goal is just to train faster.
3. We control how data is distributed across workers: usually, it is distributed uniformly at random.

Federated Learning:
1. Data is naturally distributed and generated locally.
2. Data never leaves its place of origin.
3. Data is not independent and identically distributed (non-i.i.d.), and it is imbalanced.
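As a hedged sketch of point 3 (the label-skew scheme, five clients, and two classes per client are illustrative assumptions), non-i.i.d. federated data is often simulated by restricting each client to a few label classes:

import numpy as np

rng = np.random.default_rng(0)

# Toy labeled dataset: 1000 samples across 10 classes.
labels = rng.integers(0, 10, size=1000)

# Label-skew partition: each client holds samples from only two classes,
# so client datasets are non-i.i.d. and imbalanced, unlike the uniform
# random sharding typical of distributed ML.
clients = {}
for cid in range(5):
    classes = rng.choice(10, size=2, replace=False)
    clients[cid] = np.where(np.isin(labels, classes))[0]

for cid, idx in clients.items():
    print(cid, np.bincount(labels[idx], minlength=10))  # per-class counts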
Federated Learning Working

1. Model selection
2. Model broadcast
3. Local training
4. Model upload and averaging
5. Broadcast of the updated model
Federated Averaging (FedAvg)
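A minimal sketch of the FedAvg loop [1], following steps 2-5 above. The least-squares local objective, the client sampling fraction, and the toy data are illustrative assumptions, not a definitive implementation:

import numpy as np

rng = np.random.default_rng(0)

def local_training(weights, client_data, lr=0.1, epochs=5):
    # Stand-in for a client's local SGD: full-batch gradient steps
    # on the client's least-squares objective.
    w = weights.copy()
    x, y = client_data
    for _ in range(epochs):
        grad = 2 * np.mean((x @ w - y)[:, None] * x, axis=0)
        w -= lr * grad
    return w

# Toy federated dataset: 10 clients with imbalanced local (x, y) data.
d = 3
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(10):
    x = rng.normal(size=(int(rng.integers(20, 100)), d))
    clients.append((x, x @ true_w))

w_global = np.zeros(d)                        # 1. model selection
for rnd in range(30):                         # communication rounds
    chosen = rng.choice(len(clients), size=5, replace=False)
    updates, sizes = [], []
    for cid in chosen:                        # 2. broadcast, 3. local training
        updates.append(local_training(w_global, clients[cid]))
        sizes.append(len(clients[cid][1]))
    # 4./5. upload and average, weighted by local dataset size
    w_global = np.average(updates, axis=0, weights=sizes)

print(w_global)  # approaches true_w = [1.0, -2.0, 0.5]

Weighting the average by each client's dataset size is what distinguishes FedAvg's aggregation from a plain mean: clients with more local data pull the global model harder.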
Types of Federated Learning

1. Based on the type of participating devices

Cross-Device FL:
1. Massive number of clients (up to billions)
2. Small dataset per client
3. Limited availability and reliability
4. Some parties may be malicious

Cross-Silo FL:
1. 2-100 clients
2. Medium to large dataset per client
3. Reliable clients, almost always available
4. Parties are typically honest
Types of Federated Learning

2. Based on the type of data partitioning

a. Horizontal Federated Learning (HFL): clients share the same feature space but hold different samples (e.g., two regional banks recording the same customer attributes for mostly different customers).

b. Vertical Federated Learning (VFL): clients hold different features of the same samples (e.g., a bank and an e-commerce platform serving overlapping customers but recording different attributes).
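As a hedged illustration (the 6x4 matrix and the split points are assumptions), HFL and VFL correspond to row and column splits of the same data matrix:

import numpy as np

# Full (hypothetical) dataset: 6 samples (rows) x 4 features (columns).
data = np.arange(24).reshape(6, 4)

# HFL: same feature space, different samples -> split by rows.
hfl_client_a, hfl_client_b = data[:3, :], data[3:, :]

# VFL: same samples, different features -> split by columns.
vfl_client_a, vfl_client_b = data[:, :2], data[:, 2:]

print(hfl_client_a.shape, hfl_client_b.shape)  # (3, 4) (3, 4)
print(vfl_client_a.shape, vfl_client_b.shape)  # (6, 2) (6, 2)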
Future of Federated Learning
Research Areas in FL
➔ Communication Cost: FL consumes very high bandwidth.
➔ Statistical Heterogeneity: different clients may have different data distributions and sizes.
➔ System Heterogeneity: different clients have different system resources.
➔ Model Convergence: the ML model may fail to train properly.
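To make the communication-cost concern concrete, here is a hedged sketch of one common mitigation, top-k gradient sparsification; the technique choice and the value of k are illustrative assumptions, not something the slides prescribe:

import numpy as np

def top_k_sparsify(update, k):
    # Keep only the k largest-magnitude entries of a model update;
    # the client then transmits k (index, value) pairs instead of
    # the full dense vector, cutting upload bandwidth.
    idx = np.argpartition(np.abs(update), -k)[-k:]
    sparse = np.zeros_like(update)
    sparse[idx] = update[idx]
    return sparse

update = np.random.default_rng(0).normal(size=1_000_000)
compressed = top_k_sparsify(update, k=10_000)   # ~1% of the entries
print(np.count_nonzero(compressed))             # 10000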
References

1. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. Agüera y Arcas, Communication-efficient learning of deep networks from decentralized data, AISTATS, 2017.
2. M. U. Yaseen, A. Anjum, O. Rana, N. Antonopoulos, Deep learning hyper-parameter optimization for video analytics in clouds, IEEE Transactions on Systems, Man, and Cybernetics: Systems 49 (1) (2019) 253–264. doi:10.1109/TSMC.2018.2840341.
3. T. Yu, H. Zhu, Hyper-parameter optimization: A review of algorithms and applications, CoRR (2020) arXiv:2003.05689. URL http://arxiv.org/abs/2003.05689
4. K. Murphy, Machine Learning: A Probabilistic Perspective, Adaptive Computation and Machine Learning series, MIT Press, 2012. URL https://books.google.co.kr/books?id=NZP6AQAAQBAJ
Thank you
