Kyonggi Univ. AI Lab.
RETHINKING ATTENTION WITH PERFORMERS
2021.1.4
Kyu Yeol Jung (정규열)
Artificial Intelligence Lab
Kyonggi University
Index
• Background
• FAVOR
• EXPERIMENTS
• Conclusion
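Background
• The attention mechanism used in the Transformer is computationally expensive.
• This heavy computation reduces efficiency.
• A method for reducing the computation is therefore needed.
• FAVOR is introduced.
• Its first goal is to reduce the cost of attention.
• To that end, a new kernel method is proposed (it takes over the role of softmax).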
• Structure of the time-complexity improvement
[Figure: attention computation, existing vs. proposed]
FAVOR
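FAVOR - Improving Attention
• Standard attention

$$
Q = \begin{bmatrix}
q_{11} & q_{12} & q_{13} & \cdots & q_{1d} \\
q_{21} & q_{22} & q_{23} & \cdots & q_{2d} \\
q_{31} & q_{32} & q_{33} & \cdots & q_{3d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
q_{L1} & q_{L2} & q_{L3} & \cdots & q_{Ld}
\end{bmatrix} \in \mathbb{R}^{L \times d},
\qquad
K = \begin{bmatrix}
k_{11} & k_{12} & k_{13} & \cdots & k_{1d} \\
k_{21} & k_{22} & k_{23} & \cdots & k_{2d} \\
k_{31} & k_{32} & k_{33} & \cdots & k_{3d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_{L1} & k_{L2} & k_{L3} & \cdots & k_{Ld}
\end{bmatrix} \in \mathbb{R}^{L \times d}
$$

$$
K^\top = \begin{bmatrix}
k_{11} & k_{21} & k_{31} & \cdots & k_{L1} \\
k_{12} & k_{22} & k_{32} & \cdots & k_{L2} \\
k_{13} & k_{23} & k_{33} & \cdots & k_{L3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_{1d} & k_{2d} & k_{3d} & \cdots & k_{Ld}
\end{bmatrix} \in \mathbb{R}^{d \times L}
$$

$$
Q K^\top : (L \times d)(d \times L) \rightarrow L \times L
$$

Time complexity: $O(L^2 d)$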
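The quadratic cost comes from the L × L score matrix. As a minimal NumPy sketch (not code from the slides), standard attention can be written as below. The function name `softmax_attention`, the toy sizes, and the 1/sqrt(d) scaling are assumptions added for illustration; the scaling follows the usual Transformer convention and does not appear on the slide.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention; materializes the full L x L score matrix."""
    d = Q.shape[-1]
    # (L, d) @ (d, L) -> (L, L): this product is the O(L^2 d) step.
    scores = Q @ K.T / np.sqrt(d)  # 1/sqrt(d) scaling is an assumption
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)  # row-wise softmax
    return A @ V, A

rng = np.random.default_rng(0)
L, d = 6, 4  # toy sizes, chosen only for illustration
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))

out, A = softmax_attention(Q, K, V)
print(A.shape)    # (6, 6)
print(out.shape)  # (6, 4)
```

The attention matrix `A` grows with the square of the sequence length, which is what the rest of the deck sets out to avoid.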
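• Improving the time complexity: the trick
• Standard attention: $A = \mathrm{softmax}(Q K^\top)$
• Proposed method: $A = \mathrm{Kernel}(Q, K)$

$$
\mathrm{Kernel}(Q, K) = \mathbb{E}\left[\phi(Q^\top)^\top \phi(K^\top)\right]
$$

$\phi$: feature mapping ($d \rightarrow r$), applied to each query or key vector

$Q$: $L \times d$
$Q^\top$: $d \times L$
$\phi(Q^\top)$: $r \times L$
$\phi(Q^\top)^\top$: $L \times r$

$$
Q' = \phi(Q^\top)^\top, \qquad K' = \phi(K^\top)^\top
$$

$$
\mathrm{Attention} = \mathrm{Kernel}(Q, K)\,V \approx Q' (K')^\top V = Q' \left((K')^\top V\right)
$$

Computing $(K')^\top V$ first yields an $r \times d$ matrix, so the $L \times L$ matrix is never formed and the cost becomes $O(L r d)$ instead of $O(L^2 d)$.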
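The re-bracketing in the last line relies only on associativity of matrix multiplication. The sketch below checks this numerically; `Q_prime` and `K_prime` are random stand-ins for the mapped matrices (any L × r matrices would do), and all names and sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, r = 8, 4, 16  # toy sizes, chosen only for illustration

# Stand-ins for Q' and K' (both L x r); here they are just random matrices.
Q_prime = rng.random((L, r))
K_prime = rng.random((L, r))
V = rng.normal(size=(L, d))

# Left-to-right: builds an L x L matrix first, O(L^2 r + L^2 d).
quadratic = (Q_prime @ K_prime.T) @ V

# Re-bracketed: builds only an r x d matrix, O(L r d).
KV = K_prime.T @ V
linear = Q_prime @ KV

print(KV.shape)                        # (16, 4)
print(np.allclose(quadratic, linear))  # True
```

Both orderings give the same result up to floating-point rounding; only the size of the intermediate matrix differs.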
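• A kernel that plays the role of softmax (sin-cos)
Softmax kernel, approximated with sin-cos random features.
This method has very high variance.
• Softmax outputs are always positive.
• The sin-cos estimate, however, can also take negative values.
• Stable convergence is therefore difficult.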
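The sign problem can be reproduced in a few lines. The sketch below assumes the usual trigonometric random-feature construction for the softmax kernel exp(x·y), namely φ(x) = exp(|x|²/2)/√m · [cos(Wx), sin(Wx)] with Gaussian rows in W. The formula on the slide was an image and is not recoverable here, so this form, the helper name `trig_features`, and the test vectors are assumptions.

```python
import numpy as np

def trig_features(X, W):
    """sin-cos random features for the softmax kernel exp(x . y).

    X: (n, d) inputs; W: (m, d) Gaussian projections -> (n, 2m) features.
    """
    m = W.shape[0]
    proj = X @ W.T
    scale = np.exp(0.5 * np.sum(X**2, axis=1, keepdims=True)) / np.sqrt(m)
    return scale * np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

rng = np.random.default_rng(0)
d, m = 4, 8
x = np.array([[1.0, -1.0, 0.5, 0.5]])
y = -x  # a pair whose true kernel value exp(x . y) is small but positive

exact = np.exp(x @ y.T).item()

# Repeat the estimate with 2000 independent draws of W.
estimates = np.array([
    (trig_features(x, W) @ trig_features(y, W).T).item()
    for W in rng.normal(size=(2000, m, d))
])

print(exact > 0)               # True
print((estimates < 0).any())   # True: some estimates are negative
print(estimates.std())         # spread is large compared with `exact`
```

When the true kernel value is close to zero, the cosine terms oscillate around it, so individual estimates regularly come out negative even though softmax weights never are.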
• Proposed kernel method: positive
The variance is smaller, which makes stable convergence easier.
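A rough sketch of how positive features could be combined with the linear-time ordering is shown below. It assumes the positive random-feature form φ(x) = exp(Wx − |x|²/2)/√r with i.i.d. Gaussian rows in W; the slides show the formula only as an image, and the original method may also use orthogonal projections. The names `positive_features` and `favor_attention`, the d^(1/4) input scaling, and the row normalization are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def positive_features(X, W):
    """Positive random features for the softmax kernel exp(x . y).

    X: (n, d) inputs; W: (r, d) Gaussian projections -> (n, r), all entries > 0.
    """
    r = W.shape[0]
    sq_norm = 0.5 * np.sum(X**2, axis=1, keepdims=True)
    return np.exp(X @ W.T - sq_norm) / np.sqrt(r)

def favor_attention(Q, K, V, W):
    """Linear-time approximation of softmax attention (sketch)."""
    d = Q.shape[-1]
    # Dividing by d^(1/4) makes phi(q) . phi(k) approximate exp(q . k / sqrt(d)).
    Qp = positive_features(Q / d**0.25, W)  # (L, r)
    Kp = positive_features(K / d**0.25, W)  # (L, r)
    KV = Kp.T @ V                           # (r, d), no L x L matrix
    normalizer = Qp @ Kp.sum(axis=0)        # (L,), row sums of Qp @ Kp.T
    return (Qp @ KV) / normalizer[:, None]

rng = np.random.default_rng(0)
L, d, r = 16, 8, 4096  # toy sizes, chosen only for illustration
Q, K, V = (0.5 * rng.normal(size=(L, d)) for _ in range(3))
W = rng.normal(size=(r, d))

# Exact softmax attention for comparison.
scores = np.exp(Q @ K.T / np.sqrt(d))
exact = (scores / scores.sum(axis=1, keepdims=True)) @ V

approx = favor_attention(Q, K, V, W)
print((positive_features(Q, W) > 0).all())  # True
print(np.abs(exact - approx).max())         # small approximation error
```

Because every feature is an exponential, each kernel estimate and each normalizer is strictly positive, which is the property the sin-cos variant lacks.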
EXPERIMENTS
• Computation speed comparison
[Figure: forward pass and backward pass]
Computation is faster than with the Transformer.
• Accuracy comparison across kernel methods
The positive method is confirmed to be stable.
• Accuracy comparison with the standard Transformer
Compared with the standard Transformer, accuracy is also better and convergence is faster.
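Conclusion
• The goal is to reduce the computation of the standard Transformer.
• This ultimately requires modifying the attention computation.
• A trick is used to reduce the computation.
• With this trick, the original softmax function can no longer be used.
• A kernel method that plays a role similar to softmax is proposed.
• The positive method outperforms the sin-cos method.
• The result outperforms the standard Transformer in both computation and accuracy.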
