Task-Agnostic Vision Transformer for
Distributed Learning of Image Processing
Boah Kim*, Jeongsol Kim*, and Jong Chul Ye
IEEE Transactions on Image Processing
Research objectives
Distributed learning
• To train a single network on multiple devices using local data
• Ex. Federated learning (FL), Split learning (SL)
[1] https://proandroiddev.com/federated-learning-e79e054c33ef [2] Singh, Abhishek, et al. arXiv preprint arXiv:1909.09145 (2019).
• Federated learning: parallel communication between the server and each client to aggregate locally trained weights (a minimal aggregation sketch follows)
• Split learning: decomposition of a single network across the clients and the server
• Both usually consider a common task such as classification
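Since FL-style weight aggregation also reappears in the training scheme later, here is a minimal FedAvg-style sketch for reference; the function name `fedavg` and the per-client sample counts are illustrative, not from the paper.

```python
import copy

def fedavg(client_states, client_sizes):
    """Weighted average of client state dicts (minimal FedAvg-style).

    client_states: list of model.state_dict() from the clients.
    client_sizes: number of local training samples per client.
    Assumes all entries are floating-point tensors.
    """
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg
```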
Research goal
• To process various tasks across the clients without sharing local data
• To propose a systematic way for clients to synergistically learn multiple image processing tasks
Background: Multi-task learning (MTL)
[1] https://pyimagesearch.com/2022/08/17/multi-task-learning-and-hydranets-with-pytorch/ [2] Kendall et al., 2017
• To enhance the generalization of a model on one task by learning shared representations of related tasks (see the sketch below)
• To improve computational efficiency and reduce overfitting
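In these models the "shared representation" is typically realized as hard parameter sharing: one trunk shared by all tasks, plus one small head per task. A minimal sketch, with illustrative layer sizes and names rather than any specific model from [1] or [2]:

```python
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one head per task."""

    def __init__(self, num_tasks: int, dim: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(              # shared across all tasks
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.heads = nn.ModuleList(              # one output head per task
            [nn.Conv2d(dim, 3, 3, padding=1) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id: int):
        return self.heads[task_id](self.trunk(x))
```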
Unlike existing MTL models that learn similar tasks, our model learns multiple different image processing tasks.
Background: Image processing using Transformer
[1] Vaswani, Ashish, et al. NeurIPS (2017). [2] Dosovitskiy, Alexey, et al. ICLR 2021. [3] Chen, Hanting, et al. CVPR 2021.
• Transformer [1] (NLP): a network that solves sequence-to-sequence tasks by capturing long-range dependencies via self-attention (sketched below)
• Vision Transformer (ViT) [2] (CV): an encoder-only architecture for image recognition tasks
• Image Processing Transformer (IPT) [3]: CNN heads/tails & a Transformer body for low-level vision tasks
→ Our goal: a Transformer-based distributed learning framework that does not require centralized data
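For context, the self-attention that gives these models their long-range receptive field takes only a few lines. Below is a single-head, scaled dot-product version; the variable names are ours:

```python
import torch

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (batch, tokens, dim); wq/wk/wv: (dim, dim) projection matrices.
    Every output token is a weighted mixture of all input tokens,
    which is what "long-range dependency" means here.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.transpose(-2, -1)) / (k.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v
```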
Proposed method
Task-Agnostic Vision Transformer (TAViT)
• Subscription-based service model → clients subscribe to a task-agnostic Transformer at the server
- Clients: CNN heads/tails tailored to their own tasks
- Server: encoder-only Transformer that learns global attention over the image features (composition sketched below)
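A minimal sketch of how the three parts compose; class names, layer sizes, and depths are illustrative, not the paper's exact architecture:

```python
import torch.nn as nn

class ClientHead(nn.Module):
    """Client-side CNN head: degraded image -> feature map."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
    def forward(self, x):
        return self.conv(x)

class ServerBody(nn.Module):
    """Server-side task-agnostic Transformer encoder over feature tokens."""
    def __init__(self, dim: int = 64, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.encoder(tokens)             # global self-attention
        return tokens.transpose(1, 2).view(b, c, h, w)

class ClientTail(nn.Module):
    """Client-side CNN tail: self-attended features -> restored image."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(dim, 3, 3, padding=1)
    def forward(self, feat):
        return self.conv(feat)
```

A full forward pass is then `tail(body(head(x)))`, with only intermediate features crossing the client-server boundary.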
Training scheme
• Task-specific learning: train the client-side task-specific head and tail networks
• Task-agnostic learning: train the server-side task-agnostic body network
→ Considering the clients and the server as two players leads to an alternating training strategy (sketched below)
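A schematic of the two-player alternation; `train_clients` and `train_server` are hypothetical callbacks standing in for the two phases detailed on the next slides:

```python
def alternate_training(train_clients, train_server, num_cycles: int = 3):
    """Alternate between the two players for a fixed number of cycles.

    train_clients / train_server: caller-supplied callbacks (hypothetical
    here); the default of three cycles matches the comparison experiment
    reported later.
    """
    for _ in range(num_cycles):
        train_clients()  # task-specific phase: heads/tails learn, body fixed
        train_server()   # task-agnostic phase: body learns, heads/tails fixed
```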
Task-specific learning
• Clients train their own heads/tails in parallel with the fixed body, using locally stored datasets
• Optimization: back-propagation through the frozen body to each client's head and tail
• When there are multiple clients for the same task, their heads/tails are aggregated via federated learning (a single-step sketch follows)
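A single-machine sketch of one task-specific update; in the real system the body runs on the server, so only intermediate features and their gradients would cross the network. The L1 loss and all names here are illustrative assumptions:

```python
import torch.nn.functional as F

def client_step(head, tail, body, batch, opt):
    """One task-specific step: only head/tail parameters are updated.

    opt is assumed to hold the head and tail parameters. The body is
    frozen, but gradients still flow through it back to the head.
    """
    x, target = batch
    body.requires_grad_(False)       # fix the Transformer body
    out = tail(body(head(x)))        # split forward pass
    loss = F.l1_loss(out, target)    # loss choice is illustrative
    opt.zero_grad()
    loss.backward()                  # grads pass through the frozen body
    opt.step()
    return loss.item()
```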
Task-agnostic learning
• The server trains the Transformer body with the fixed head & tail of a randomly chosen client at each iteration (sketched below)
• Optimization: back-propagation with the selected client's head and tail frozen
• Goal: learn a global embedding representation → provide task-agnostic self-attended features for various image processing tasks
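The mirror image of the client phase, with the same caveats; `clients` is a hypothetical list of (head, tail, loader) triples:

```python
import random
import torch.nn.functional as F

def server_step(body, clients, opt):
    """One task-agnostic step: only the body parameters are updated.

    opt is assumed to hold only the body parameters; the randomly
    chosen client's head and tail are frozen for this iteration.
    """
    head, tail, loader = random.choice(clients)
    x, target = next(iter(loader))
    head.requires_grad_(False)
    tail.requires_grad_(False)
    loss = F.l1_loss(tail(body(head(x))), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```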
Experimental results
Multi-task distributed learning
• 1 server + 5 clients (2 deblocking, 1 denoising, 1 deraining, 1 deblurring)
Comparison to distributed learning strategies
• To compare TAViT trained for three cycles with SL and FL
Comparison to learning each separate task
• To compare TAViT with models independently trained on each individual task
Comparison to task-specific models
• To evaluate TAViT against several representative task-specific methods on benchmark datasets
Conclusion
• Propose a novel multi-task distributed learning method, called TAViT: task-specific CNN heads/tails placed on the clients + a task-agnostic Transformer body placed on the server
- Trained by an alternating scheme between task-specific learning & task-agnostic learning
• Experimental results show the success of the task-agnostic learning of the Transformer body and its synergistic improvement with the task-specific heads and tails
• Through TAViT, clients can train their own task-specific networks in parallel using local data
Thank you.