Federated learning
Federated learning (FL) is a framework for training a shared global model on data distributed across many devices, while the raw data never leaves each device.
Underlying Architecture
Central Server: the entity responsible for managing the connections between the entities in the FL environment and for aggregating the knowledge acquired by the FL clients.
Parties (Clients): all computing devices with data that can be used for training the global model, including but not limited to personal computers, servers, smartphones, smartwatches, and computerized sensor devices.
Communication Framework: the tools and infrastructure used to connect the server and the parties; it can range from an internal network or intranet to the Internet.
Aggregation Algorithm: the component responsible for aggregating the knowledge obtained by the parties after training on their local data and for using that aggregated knowledge to update the global model.
Exchanging Models, Parameters, or Gradients
Exchanging Models: This is the classical approach, where entire models are exchanged between the server and the clients.
Exchanging Gradients: Instead of submitting the entire model to the server, clients in this method submit only the gradients they compute locally.
Exchanging Model Parameters: This concept is mainly tied to neural networks, where model parameters, sometimes called weights, are what clients submit to the server instead of the full model.
Hybrid Approaches: Two or more of the above methods can be combined into a hybrid strategy that is well suited to a specific application or environment.
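To make the distinction concrete, the sketch below shows, in plain NumPy, what a single client might transmit under each scheme. The model, training step, and payload names are illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One logistic-regression-style training step on a client (illustrative)."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))   # sigmoid predictions
    grad = X.T @ (preds - y) / len(y)            # local gradient
    new_weights = weights - lr * grad            # locally updated parameters

    # What the client could send to the server under each exchange scheme:
    payload_gradients = grad                                      # "exchanging gradients"
    payload_parameters = new_weights                              # "exchanging model parameters"
    payload_model = {"arch": "logreg", "weights": new_weights}    # "exchanging models"
    return payload_gradients, payload_parameters, payload_model
```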
The categorization of federated learning
➔ Federated learning enables collaborative model training without sharing
raw data.
➔ It is categorized based on how the feature spaces and user (sample) spaces of client datasets overlap:
◆ Horizontal Federated Learning (HFL).
◆ Vertical Federated Learning (VFL).
◆ Federated Transfer Learning (FTL).
Horizontal federated learning
➔ Shared features, different users: Clients share the same set of features but hold data on different users.
➔ Focus: Leveraging the diversity of users with the same data structure to enhance model accuracy and generalization.
➔ Example: Multiple banks training a fraud detection model using transaction data (shared features) from different customers (different users).
Vertical federated learning
➔ Different features, overlapping users: Clients have different feature sets, but their users (samples) overlap.
➔ Focus: Combining data from participants with complementary information while protecting sensitive features.
➔ Example: Hospitals and insurance companies collaborating on healthcare predictions using medical records (hospital data) and policy data (insurance data), aligned through overlapping identifiers such as patient IDs.
Federated transfer learning
➔ Leveraging pre-trained knowledge: Uses a pre-trained model to guide learning on a new task or on data with different characteristics.
➔ Focus: Accelerating learning on new tasks or data with limited resources, especially when privacy concerns restrict direct data sharing.
➔ Example: Using a sentiment analysis model trained on public product reviews to personalize recommendations within a specific e-commerce domain.
Key Differences
                   Horizontal FL              Vertical FL                   Transfer FL
Feature overlap    High                       Low / partial                 No
User overlap       Low                        High                          Varies
Focus              Data diversity, accuracy   Shared information, privacy   Knowledge transfer
Synchronous vs. Asynchronous Federated Learning
Synchronous federated learning
The server updates the shared central model only after all participating devices have sent their model updates.
E.g., Federated Averaging (FedAvg).
➔ This approach offers several advantages:
◆ Faster convergence: Synchronization leads to quicker convergence towards a more accurate global model.
◆ Better accuracy: The coordinated updates can result in higher model accuracy compared to asynchronous methods.
◆ Reduced staleness: Updates are always fresh, mitigating the issue of outdated gradients.
➔ However, synchronous federated learning also faces some challenges:
◆ Increased communication overhead: All devices need to communicate with the server at every step, leading to higher bandwidth requirements.
◆ Higher synchronization latency: Waiting for the slowest device can introduce delays in the training process.
◆ Potential bottlenecks: Stragglers (slow or unresponsive devices) can hold up an entire training round.
Asynchronous federated learning
The server updates the shared central model continuously, as new updates keep coming in.
E.g., SMPC aggregation, secure aggregation with a Trusted Execution Environment (TEE).
➔ This approach offers several advantages:
◆ Relaxed communication requirements: Devices can update the model whenever convenient, reducing communication overhead.
◆ Improved scalability: Asynchronous learning can handle a large number of devices more efficiently.
◆ Fault tolerance: The system is more resilient to device failures or intermittent connections.
➔ However, asynchronous federated learning also faces some challenges:
◆ Stale gradients: Updates from devices may become outdated before reaching the server, impacting accuracy.
◆ Slower convergence: The lack of synchronization can slow down the overall training process.
◆ Potential for divergence: Individual models on devices may diverge significantly from the global model.
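To contrast the two modes, here is a minimal NumPy sketch of the server-side update rule in each case. The staleness-discounted mixing rate in the asynchronous step is an illustrative assumption, not a specific algorithm from the source.

```python
import numpy as np

def synchronous_round(client_models):
    """Synchronous: wait until ALL clients report, then average their models (FedAvg-style)."""
    return np.mean(np.stack(client_models), axis=0)

def asynchronous_step(global_w, incoming_w, staleness, base_lr=0.5):
    """Asynchronous: fold in each update as it arrives, discounted by its staleness."""
    lr = base_lr / (1.0 + staleness)              # older updates get a smaller weight
    return (1.0 - lr) * global_w + lr * incoming_w
```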
What Is Aggregation in FL?
● Aggregation methods vary, each with unique advantages and challenges.
○ Beyond model updates, statistical indicators (loss, accuracy) can also be aggregated.
○ Hierarchical aggregation suits large-scale FL systems.
● Aggregation algorithms are crucial for FL success.
○ They determine model training effectiveness.
○ They impact the practical usability of the global model.
Different Approaches of Aggregation
Averaging Aggregation
This is the initial and most commonly known approach. The server summarizes the received messages, whether they are model updates, parameters, or gradients, by taking the average of the received updates.
With the set of participating clients denoted by N and their updates denoted by w_i, the aggregate update w is calculated as follows:

w = (1/|N|) · Σ_{i ∈ N} w_i
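A minimal NumPy sketch of this averaging step (unweighted; function and variable names are illustrative assumptions):

```python
import numpy as np

def average_aggregate(client_updates):
    """Plain averaging: w = (1/|N|) * sum_i w_i over the received client updates."""
    return np.mean(np.stack(client_updates), axis=0)

# Example: three clients each send a 4-dimensional parameter vector.
updates = [np.array([0.1, 0.2, 0.3, 0.4]),
           np.array([0.0, 0.2, 0.4, 0.6]),
           np.array([0.2, 0.2, 0.2, 0.2])]
global_update = average_aggregate(updates)
```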
Clipped Averaging Aggregation
This method is similar to averaging aggregation: the average of the received messages is calculated, but with an additional step of clipping each model update to a predefined range before averaging.
This helps reduce the impact of outliers and of malicious clients that may transmit large, harmful updates.
Here clip(x, c) is a function that clips the values of x to the range [−c, c], and c is the clipping threshold:

w = (1/|N|) · Σ_{i ∈ N} clip(w_i, c)
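A minimal sketch of clipped averaging, assuming element-wise clipping with threshold c (norm-based clipping is a common alternative):

```python
import numpy as np

def clipped_average_aggregate(client_updates, c=1.0):
    """Clip each update to [-c, c] element-wise, then average."""
    clipped = [np.clip(w, -c, c) for w in client_updates]
    return np.mean(np.stack(clipped), axis=0)
```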
Secure Aggregation
➔ Techniques such as homomorphic encryption, secure multiparty computation, and secure enclaves make the aggregation process more secure and private. These methods can ensure that client data remain confidential during the aggregation process, which is critical in environments where data privacy is a high priority.
➔ Secure aggregation results from integrating such security techniques with one of the available aggregation algorithms to create a new, secure algorithm. One of the most popular secure aggregation algorithms is differential privacy average aggregation.
Differential Privacy Average Aggregation
This approach adds a layer of differential privacy to the aggregation process to ensure the confidentiality of client data. Each client adds random noise to its model update before sending it to the server, and the server builds the final model by aggregating the noisy updates:

w = (1/|N|) · Σ_{i ∈ N} (w_i + n_i)

● n_i is a random noise vector drawn from a Laplace distribution.
● b is the privacy budget (scale) parameter of that distribution.
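A simplified sketch, assuming the Laplace scale b plays the role of the privacy budget parameter; the noise addition, normally done client-side, is simulated in one place here:

```python
import numpy as np

def dp_average_aggregate(client_updates, b=0.1, rng=np.random.default_rng(0)):
    """Add Laplace noise with scale b to each update, then average on the server."""
    noisy = [w + rng.laplace(loc=0.0, scale=b, size=w.shape) for w in client_updates]
    return np.mean(np.stack(noisy), axis=0)
```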
Momentum Aggregation
This strategy helps address the slow-convergence problem in federated learning. Each client stores a "momentum" term that describes the direction of past model changes. Before a new update is sent to the server, the momentum term is added to the update. The server aggregates the momentum-enriched updates to build the final model, which can speed up convergence.
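A minimal sketch of the client-side bookkeeping, assuming an exponential-moving-average style momentum term (the class and parameter names are illustrative):

```python
import numpy as np

class MomentumClient:
    """Illustrative client that keeps a running momentum of its past updates."""
    def __init__(self, dim, beta=0.9):
        self.momentum = np.zeros(dim)
        self.beta = beta

    def prepare_update(self, local_update):
        # Blend the new update with the direction of past updates,
        # then send this enriched update to the server for averaging.
        self.momentum = self.beta * self.momentum + local_update
        return self.momentum
```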
Weighted Aggregation
In this method, the server weights each client's contribution to the final model update depending on client performance or other parameters, such as the client's device type, the quality of its network connection, or the similarity of its data to the global data distribution. This can give more weight to clients that are more reliable or representative, improving the overall accuracy of the model.
Here N denotes the set of participating clients, w_i their model updates, and a_i their corresponding weights:

w = Σ_{i ∈ N} a_i · w_i / Σ_{i ∈ N} a_i
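A minimal sketch of the weighted average, here using the number of local training samples as the weight, which is one common but not the only choice:

```python
import numpy as np

def weighted_aggregate(client_updates, client_weights):
    """Weighted average: w = sum_i a_i * w_i / sum_i a_i."""
    a = np.asarray(client_weights, dtype=float)
    W = np.stack(client_updates)
    return (a[:, None] * W).sum(axis=0) / a.sum()

# Example: weight clients by how many training samples they hold.
updates = [np.array([0.1, 0.3]), np.array([0.2, 0.1])]
samples = [1000, 250]
global_update = weighted_aggregate(updates, samples)
```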
Bayesian Aggregation
In this approach, the server aggregates model updates from multiple clients using Bayesian inference, which accounts for uncertainty in the model parameters. This can help reduce overfitting and improve the generalizability of the model.
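The source does not specify a particular Bayesian scheme; one simple instance is precision-weighted (inverse-variance) averaging, sketched below under the assumption that each client also reports a per-parameter variance estimate:

```python
import numpy as np

def precision_weighted_aggregate(client_means, client_variances, eps=1e-8):
    """Combine client estimates as a Gaussian posterior mean: parameters with
    lower reported variance (higher confidence) receive more weight."""
    means = np.stack(client_means)
    precisions = 1.0 / (np.stack(client_variances) + eps)
    return (precisions * means).sum(axis=0) / precisions.sum(axis=0)
```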
Adversarial Aggregation
In this method, the server applies a number of techniques to detect and mitigate the impact of clients submitting fraudulent model updates. These may include outlier rejection, model-based anomaly detection, and secure enclaves.
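As a minimal illustration of the outlier-rejection idea (the concrete rule, a median-based norm filter, is an assumption rather than the source's prescription):

```python
import numpy as np

def outlier_filtered_average(client_updates, tolerance=3.0):
    """Drop updates whose L2 norm is far above the median norm, then average the rest."""
    norms = np.array([np.linalg.norm(w) for w in client_updates])
    median = np.median(norms)
    kept = [w for w, n in zip(client_updates, norms) if n <= tolerance * median]
    return np.mean(np.stack(kept), axis=0)
```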
Quantization Aggregation
In this approach, model updates are quantized to a lower bit width before being delivered to the server for aggregation. This reduces the amount of data to be transmitted and improves communication efficiency.
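A minimal sketch of uniform quantization to 8-bit integers; the scheme (min-max scaling with a transmitted scale and offset) is an illustrative assumption:

```python
import numpy as np

def quantize_update(w, num_bits=8):
    """Uniformly quantize an update to num_bits integers plus a scale and offset."""
    levels = 2 ** num_bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, w_min, scale            # what the client transmits

def dequantize_update(q, w_min, scale):
    """Server-side reconstruction before averaging."""
    return q.astype(np.float32) * scale + w_min
```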
Hierarchical Aggregation
In this approach, the aggregation process is carried out at multiple levels of a hierarchical structure, such as a federated hierarchy. This can help reduce the communication overhead by performing local aggregations at lower levels of the hierarchy before passing the results on to higher levels.
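A minimal sketch of a two-level hierarchy; the two-tier structure and the equal weighting of edge aggregators are assumptions for illustration:

```python
import numpy as np

def hierarchical_aggregate(groups_of_updates):
    """Two-level aggregation: each edge aggregator averages its own clients first,
    then the central server averages the edge-level results."""
    edge_results = [np.mean(np.stack(group), axis=0) for group in groups_of_updates]
    return np.mean(np.stack(edge_results), axis=0)

# Example: two edge servers, each aggregating its own pair of clients.
groups = [[np.array([0.1, 0.2]), np.array([0.3, 0.2])],
          [np.array([0.0, 0.4]), np.array([0.2, 0.2])]]
global_update = hierarchical_aggregate(groups)
```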
Personalized Aggregation
During the aggregation process, this approach considers the unique characteristics of each client's data. In this way, the global model can be updated in the way most appropriate for each client's data, while still preserving data privacy.
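The source leaves the mechanism open; one common realization is to interpolate between the aggregated global model and each client's local model, as in the hedged sketch below (the blending rule and alpha parameter are assumptions):

```python
import numpy as np

def personalize(global_w, local_w, alpha=0.3):
    """Blend the aggregated global model with a client's local model;
    alpha controls how strongly the result is tailored to that client."""
    return alpha * local_w + (1.0 - alpha) * global_w
```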
The contribution areas
➔ Improving model aggregation
➔ Reducing convergence time
➔ Handling heterogeneity
➔ Enhancing security
➔ Reducing communication and computation cost
➔ Handling users' failures (fault tolerance)
➔ Boosting learning quality
➔ Supporting scalability, personalization, and generalization
Top Federated Learning Frameworks
TensorFlow Federated
➔ TensorFlow Federated (TFF): Building Blocks for Distributed Learning
◆ Open-source and flexible framework by Google AI
◆ High-level API for defining federated computations and algorithms
◆ Supports various machine learning models and distributed architectures
PySyft
➔ PySyft: Secure and Private Federated Learning with Python
◆ Secure enclaves for data privacy and computation
◆ Focus on secure aggregation and model poisoning prevention
◆ Easy integration with existing Python libraries and tools
Flower
➔ Flower: Orchestrating Federated Learning Workflows
◆ Lightweight and flexible framework for managing federated training
◆ Focus on orchestration, communication, and resource management
◆ Agnostic to underlying machine learning libraries and frameworks
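As a rough illustration of how lightweight this orchestration is, here is a minimal client sketch based on Flower's classic 1.x NumPyClient API; exact names and signatures vary between Flower versions, the package must be installed separately, and the local training is left abstract.

```python
import flwr as fl
import numpy as np

class TinyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = [np.zeros(10)]          # placeholder model parameters

    def get_parameters(self, config):
        return self.weights

    def fit(self, parameters, config):
        self.weights = parameters              # local training would happen here
        return self.weights, 1, {}             # params, num examples, metrics

    def evaluate(self, parameters, config):
        return 0.0, 1, {}                      # loss, num examples, metrics

# Server (run in a separate process): FedAvg over 3 rounds.
# fl.server.start_server(server_address="0.0.0.0:8080",
#                        config=fl.server.ServerConfig(num_rounds=3),
#                        strategy=fl.server.strategy.FedAvg())

# Client process:
# fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=TinyClient())
```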
Choosing the Right Federated Learning Framework
➔ Consider your project needs, target platforms, and security requirements
➔ Evaluate the strengths and weaknesses of each framework
➔ Choose a framework that aligns with your expertise and comfort level