Federated Semi-Supervised Learning
with Inter-Client Consistency & Disjoint Learning
Wonyong Jeong¹, Jaehong Yoon², Eunho Yang¹,³, Sung Ju Hwang¹,³
¹Graduate School of AI, KAIST, Seoul, South Korea
²School of Computing, KAIST, Daejeon, South Korea
³AITRICS, Seoul, South Korea
Motivation: Deficiency of Supervision
Federated Learning, in which multiple clients collaboratively learn a global model by aggregating knowledge from local private data, has been actively studied.
Data obtained at the client often comes without labels, due to expensive labeling costs or the expert knowledge required for annotation.
[Figure: a global model exchanges global and local knowledge with multiple clients, each holding fully labeled data]
Federated Semi-Supervised Learning
This leads us to a new problem, Federated Semi-Supervised Learning (FSSL), which tackles federated learning under scarcity of supervision.
We consider two scenarios depending on where the labeled data is available: the Labels-at-Client and Labels-at-Server scenarios.
[Figure: a global model exchanges global and local knowledge with multiple clients, each holding a small amount of labeled data plus unlabeled data]
Federated Semi-Supervised Learning
A common scenario is that end-users intermittently annotate a small portion of their local data, while the rest of the data instances remain unlabeled.
We call this the Labels-at-Client scenario, where both labeled and unlabeled data are available on the client side.
[Example: iPhone Photos – Face Album]
Federated Semi-Supervised Learning
Another common case in real-world applications is that annotation requires expert knowledge (e.g., annotating medical images or evaluating body postures for exercise).
We define the Labels-at-Server scenario, in which supervised labels are only available at the server, while local clients work with unlabeled data.
[Example: Evaluating Body Postures]
Challenges: Exploiting Reliable Knowledge
A simple solution to the FSSL problem would be a naïve combination of semi-supervised learning and federated learning algorithms.
However, such a naïve approach does not fully exploit the knowledge from the multiple models trained on each client's unlabeled data, which may degrade the reliability of the aggregated knowledge.
[Figure: naïvely combining federated learning with semi-supervised learning across multiple clients holding unlabeled data yields unreliable knowledge; how should it be aggregated?]
Federated Matching (FedMatch)
We thus propose a novel method, FedMatch, which effectively exploits reliable knowledge from multiple clients.
FedMatch consists of two main components: an Inter-Client Consistency Loss and Parameter Decomposition for disjoint learning.
[Figure: federated learning combined with semi-supervised learning via inter-client consistency yields reliable knowledge]
Inter-Client Consistency
Consistency regularization, one of the most popular approaches in SSL, enforces model predictions on augmented instances of an unlabeled example to output the same class label.
We further improve this by proposing an Inter-Client Consistency Loss, which enforces consistency between the predictions made by the local model and those of multiple helper agents (see the sketch below).
[Figure: conventional consistency regularization minimizes CE(ŷ, p_θ(π(u))) between the pseudo label ŷ and the prediction on an augmented unlabeled example π(u); the Inter-Client Consistency Loss additionally aligns the local model's prediction on u with the predictions of Helper 1 and Helper 2]
• Helper agents are models selected from other clients based on their similarity to the local model.
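To make this concrete, here is a minimal sketch of such an inter-client consistency term, assuming the helper models are received from the server and kept frozen; the names `inter_client_consistency`, `local_model`, and `helpers` are illustrative and not from the authors' released code.

```python
import torch
import torch.nn.functional as F

def inter_client_consistency(local_model, helpers, u):
    """Hypothetical sketch: average KL divergence between each frozen
    helper model's prediction and the local model's prediction on an
    unlabeled batch u."""
    log_p_local = F.log_softmax(local_model(u), dim=-1)   # local log-probabilities
    loss = torch.zeros((), device=u.device)
    for helper in helpers:
        with torch.no_grad():                              # helpers are not updated
            p_helper = F.softmax(helper(u), dim=-1)
        # KL(p_helper || p_local), averaged over the batch
        loss = loss + F.kl_div(log_p_local, p_helper, reduction="batchmean")
    return loss / max(len(helpers), 1)
```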
Inter-Client Consistency
Moreover, we perform local consistency regularization, using the agreement among these models to generate pseudo labels; we call this agreement-based pseudo labeling.
With the pseudo labels, we minimize the standard cross-entropy on augmented instances, similarly to FixMatch (Sohn et al., 2020); a sketch follows below.
[Figure: Helper 1, Helper 2, and the local model each predict on an unlabeled example u; inter-client consistency regularization aligns the predictions, and their agreement yields a pseudo label ŷ (e.g., Truck rather than Cat) whenever the confidence exceeds a threshold τ; the augmented instance π(u) is then trained with CE(ŷ, p_θ(π(u))) as the local consistency loss]
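A minimal sketch of agreement-based pseudo labeling, under the assumption of FixMatch-style weak and strong augmentations: the pseudo label comes from the averaged predictions of the local model and the helpers on the weakly augmented view, is kept only when its confidence exceeds τ, and supervises the strongly augmented view. The averaging rule and the names `u_weak`, `u_strong`, and `tau` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def agreement_pseudo_label_loss(local_model, helpers, u_weak, u_strong, tau=0.95):
    """Hypothetical sketch of agreement-based pseudo labeling."""
    with torch.no_grad():
        # Average the class probabilities of the local model and all helpers
        probs = F.softmax(local_model(u_weak), dim=-1)
        for helper in helpers:
            probs = probs + F.softmax(helper(u_weak), dim=-1)
        probs = probs / (1 + len(helpers))
        confidence, pseudo_label = probs.max(dim=-1)   # agreed class per example
        mask = (confidence >= tau).float()             # keep only confident labels

    # Cross-entropy on the strongly augmented view, masked by confidence
    logits = local_model(u_strong)
    per_example = F.cross_entropy(logits, pseudo_label, reduction="none")
    return (per_example * mask).mean()
```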
Parameter Decomposition for Disjoint Learning
In standard semi-supervised learning, learning on labeled and unlabeled data is done simultaneously with a single shared set of parameters.
[Figure: a shared set of model parameters θ is trained on both labeled data (reliable knowledge) and unlabeled data (unreliable knowledge)]
However, this may cause the model to forget what it learned from the labeled data, which is crucial for semi-supervised learning.
[Figure: the unreliable knowledge overwrites the reliable knowledge, so forgetting happens and performance on labeled data becomes poor]
Parameter Decomposition for Disjoint Learning
We thus decompose the model parameters into parameters for supervised learning and parameters for unsupervised learning, denoted σ and ψ, respectively.
This enables us to separate the learning procedures depending on the availability of labeled data (see the sketch below).
θ = σ ⊕ ψ, where σ is learned on the labeled data 𝒮 = {(x_i, y_i)}_{i=1}^{S} from 𝒟 and ψ is learned on the unlabeled data 𝒰 = {u_j}_{j=1}^{U} from 𝒟.
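As one way this decomposition could be realized in code, the sketch below keeps two copies of each weight and sums them in the forward pass, with disjoint optimizers for σ and ψ; this is an illustrative assumption about the implementation (the layer and variable names are hypothetical), not the authors' code.

```python
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Sketch: a linear layer whose effective weight is sigma + psi.
    sigma is updated on labeled data, psi on unlabeled data."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.sigma = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.psi = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        return x @ (self.sigma + self.psi).t() + self.bias

layer = DecomposedLinear(784, 10)
# Disjoint learning: supervised steps touch sigma, unsupervised steps touch psi
opt_supervised = torch.optim.SGD([layer.sigma, layer.bias], lr=1e-2)
opt_unsupervised = torch.optim.SGD([layer.psi], lr=1e-2)
```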
Parameter Decomposition for Disjoint Learning
For the Labels-at-Client scenario, where labeled data is available at the clients, we learn both σ and ψ on the client side and send both back to the server.
[Figure: Labels-at-Client scenario. Each client learns σ and ψ on its labeled and unlabeled data; the server aggregates them (Σ) into the global model and sends back helper parameters ψ from Helper 1 and Helper 2, selected by their similarity (high vs. low) to the client]
For the Labels-at-Server scenario, where labeled data is located at the server, we disjointly learn σ at the server; in this case, only the ψ learned at the client side is sent back to the server (a schematic sketch of both scenarios follows below).
Labels-at-Server Scenario
[Figure: the global model learns σ on the server's labeled data, while each local model learns only ψ on its unlabeled data on top of the frozen σ* received from the server, again with helper parameters from Helper 1 and Helper 2]
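The two scenarios differ only in where σ is trained and what the client uploads; below is a minimal, hypothetical sketch of one client round (the function and field names are placeholders, and `train_on` stands in for the corresponding supervised or unsupervised update).

```python
def train_on(params, data):
    # Placeholder: in practice this runs several local epochs of the
    # corresponding supervised / unsupervised objective.
    return params

def client_round(scenario, sigma, psi, labeled=None, unlabeled=None):
    """Hypothetical sketch of what a client trains and uploads per scenario."""
    if scenario == "labels-at-client":
        sigma = train_on(sigma, labeled)        # supervised update of sigma
        psi = train_on(psi, unlabeled)          # unsupervised update of psi
        return {"sigma": sigma, "psi": psi}     # upload both to the server
    else:  # "labels-at-server": sigma is trained only at the server and frozen here
        psi = train_on(psi, unlabeled)
        return {"psi": psi}                     # upload psi only (lower C2S cost)
```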
Objective Functions
For learning on labeled data, we perform standard supervised learning with a cross-entropy loss using the corresponding labels.
For learning on unlabeled data, we minimize the inter-client consistency loss and a sparsity regularization term on ψ, as well as a proximal regularizer which prevents ψ from drifting away from σ*.
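As a concrete, purely illustrative view of the unsupervised side, the regularized objective might be assembled as below, assuming a precomputed consistency loss and flattened parameter tensors; the λ values are placeholders, not the paper's hyperparameters. The formal objectives are stated right after this sketch.

```python
import torch

def unsupervised_objective(consistency_loss, psi, sigma_star,
                           lam_iccs=1.0, lam_l1=1e-4, lam_l2=1e-4):
    """Hypothetical sketch of L_u(psi): inter-client consistency plus
    L1 sparsity on psi and L2 proximity of psi to the frozen sigma*."""
    l1 = sum(p.abs().sum() for p in psi)                             # sparsity on psi
    l2 = sum(((s - p) ** 2).sum() for s, p in zip(sigma_star, psi))  # keep psi near sigma*
    return lam_iccs * consistency_loss + lam_l1 * l1 + lam_l2 * l2
```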
minimize over σ:  ℒ_s(σ) = λ_s · CE(y, p_{σ⊕ψ*}(y | x))
minimize over ψ:  ℒ_u(ψ) = λ_ICCS · Φ_{σ*⊕ψ} + λ_{L1} · ||ψ||₁ + λ_{L2} · ||σ* − ψ||₂², where Φ denotes the inter-client consistency loss.
Experimental Setup
We validate our method on Batch IID and Batch Non-IID splits of CIFAR-10, which are conventional settings in standard federated learning.
[Figure: example Streaming Non-IID dataset under the Labels-at-Server scenario, where each client receives a stream of instances over classes 0–9]
We also evaluate our method on a Streaming Non-IID dataset (Fashion-MNIST), where locally generated private data continuously streams into each client.
Results of Main Experiments
Our method outperforms naïve federated semi-supervised learning baselines on both the batch and streaming datasets in all scenarios.
Batch Non-IID Dataset with 100 Clients (Labels-at-Client / Labels-at-Server scenarios)

Models | Accuracy | C2S Cost
Local-FixMatch | 62.62 ± 0.32 % | N/A
FedProx-FixMatch | 62.40 ± 0.43 % | 100 %
FedMatch (Ours) | 77.95 ± 0.14 % | 48 %

Streaming Non-IID Dataset (Labels-at-Client / Labels-at-Server scenarios)

Models | Accuracy | C2S Cost
FedProx-SL | 77.43 ± 0.42 % | 100 %
FedProx-FixMatch | 73.71 ± 0.32 % | 100 %
FedMatch (Ours) | 84.15 ± 0.31 % | 63 %

(C2S: client-to-server communication cost.)
Ablation Study
We observe that our inter-client consistency loss plays a significant role in improving performance in the FSSL scenarios.
[Figures: effect of parameter decomposition; effect of the inter-client consistency loss]
Our decomposition method also allows the model to preserve the reliable knowledge learned from the labeled data.
Conclusion
• We introduce Federated Semi-Supervised Learning (FSSL), where each client learns with only partly labeled data (Labels-at-Client), or works with completely unlabeled data while supervised labels are available only at the server (Labels-at-Server).
• We propose FedMatch, which uses an Inter-Client Consistency Loss to maximize the agreement among different clients, and Parameter Decomposition for disjoint learning, enabling the model to preserve reliable knowledge from labeled data.
• We validate our method under Batch IID, Batch Non-IID, and Streaming Non-IID settings, in which it significantly outperforms naïve FSSL approaches.
Thank You!
