SlideShare a Scribd company logo
1 of 16
Download to read offline
MEGABYTE: Predicting Million-byte Sequences
with Multiscale Transformers
2023.6.7
유용상
NLP 티타임
Introduction
Introduction
Introduction
- Patch embedder
: It takes an input of a discrete sequence,
embeds each element, and chunks it into
patches of a fixed length.
- Global Module
: It is a large autoregressive transformer that
contextualizes patch representations
by performing self-attention over previous patches
- Local Module
: It is a small local transformer that inputs
a contextualized patch representation from
the global model, and autoregressively
predicts the next patch
Components
1. Patch Embedder
Byte sequence x0~xT
Patch size : P
Dimension size :
Patch length :
<- byte embedding
reshape
<- input to the global model
Components
2. Global model
Components
3. Local model
Motivation
- Why is local model needed?
Motivation
- Increasing Parameters for Fixed Compute
: Larger feedforward layers across patches rather than individual tokens
- Re-use of Established Components
: increases the likelihood that the architecture will inherit the desirable scaling properties of transformers.
Efficiency Analysis
Training Efficiency
Sequence length : T
Vanila transformer
Sparse transformer
Routing transformer
Sequence length : T
Patch size : P
Patch Length : T/P
Global model
Local model
MEGABYTE(Overall)
Efficiency Analysis
Feedforward Layers
in the GPT3 architecture, the quadratic self-attention computation accounts for only 1.4% of FLOPS
대부분은 feedforward에서 FLOPS를 잡아먹음..
# of non-embedding parameters : m
Sequence length : T
MEGABYTE
# of global model parameters
# of local model parameters
2mT FLOPS
Efficiency Analysis
Combined Analysis
Performance
Language Modeling
Performance
Generation Speed
Performance
Effective Use of Context
Conclusion
- SOTA in existing byte-level models across a range of tasks and modalities
- Allowing large models of sequences of over 1M tokens
- gives competitive language modeling results with subword models,
which may allow byte-level models to replace tokenization
- Scale was far below those of SOTA LLMs

More Related Content

Similar to 20230608_megabyte

Design and Implementation of Low Power DSP Core with Programmable Truncated V...
Design and Implementation of Low Power DSP Core with Programmable Truncated V...Design and Implementation of Low Power DSP Core with Programmable Truncated V...
Design and Implementation of Low Power DSP Core with Programmable Truncated V...
ijsrd.com
 
Energy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
Energy Efficient Bit Extension Type Accelerator Chip for Detection AlgorithmsEnergy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
Energy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
IJERA Editor
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 

Similar to 20230608_megabyte (20)

Iy3116761679
Iy3116761679Iy3116761679
Iy3116761679
 
santhosh popshetwar
santhosh popshetwarsanthosh popshetwar
santhosh popshetwar
 
Presentation vision transformersppt.pptx
Presentation vision transformersppt.pptxPresentation vision transformersppt.pptx
Presentation vision transformersppt.pptx
 
HPPS - Final - 06/14/2007
HPPS - Final - 06/14/2007HPPS - Final - 06/14/2007
HPPS - Final - 06/14/2007
 
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
IRJET - Design of a Low Power Serial- Parallel Multiplier with Low Transition...
 
Transformer Zoo
Transformer ZooTransformer Zoo
Transformer Zoo
 
Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFish
 
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
 
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNetFrom Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
 
Design and Implementation of Low Power DSP Core with Programmable Truncated V...
Design and Implementation of Low Power DSP Core with Programmable Truncated V...Design and Implementation of Low Power DSP Core with Programmable Truncated V...
Design and Implementation of Low Power DSP Core with Programmable Truncated V...
 
Ieee 802.11.n
Ieee 802.11.nIeee 802.11.n
Ieee 802.11.n
 
Ieee 802.11.n
Ieee 802.11.nIeee 802.11.n
Ieee 802.11.n
 
Ieee 802.11.n
Ieee 802.11.nIeee 802.11.n
Ieee 802.11.n
 
Ieee 802.11.n
Ieee 802.11.nIeee 802.11.n
Ieee 802.11.n
 
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda MultiplierA High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier
 
Architecture of Wemlin Hub
Architecture of Wemlin HubArchitecture of Wemlin Hub
Architecture of Wemlin Hub
 
SudheerV_resume_a
SudheerV_resume_aSudheerV_resume_a
SudheerV_resume_a
 
Energy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
Energy Efficient Bit Extension Type Accelerator Chip for Detection AlgorithmsEnergy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
Energy Efficient Bit Extension Type Accelerator Chip for Detection Algorithms
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 

More from YongSang Yoo

More from YongSang Yoo (10)

20230727_tinystories
20230727_tinystories20230727_tinystories
20230727_tinystories
 
221220_페르소나챗봇
221220_페르소나챗봇221220_페르소나챗봇
221220_페르소나챗봇
 
220920_AI ETHICS
220920_AI ETHICS220920_AI ETHICS
220920_AI ETHICS
 
230309_LoRa
230309_LoRa230309_LoRa
230309_LoRa
 
230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...
 
230223_Knowledge_Distillation
230223_Knowledge_Distillation230223_Knowledge_Distillation
230223_Knowledge_Distillation
 
221108_Multimodal Transformer
221108_Multimodal Transformer221108_Multimodal Transformer
221108_Multimodal Transformer
 
221011_BERT
221011_BERT221011_BERT
221011_BERT
 
220910_GatedRNN
220910_GatedRNN220910_GatedRNN
220910_GatedRNN
 
220906_Glove
220906_Glove220906_Glove
220906_Glove
 

Recently uploaded

Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
EADTU
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
CaitlinCummins3
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
Peter Brusilovsky
 

Recently uploaded (20)

Trauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical PrinciplesTrauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical Principles
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
Supporting Newcomer Multilingual Learners
Supporting Newcomer  Multilingual LearnersSupporting Newcomer  Multilingual Learners
Supporting Newcomer Multilingual Learners
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
Book Review of Run For Your Life Powerpoint
Book Review of Run For Your Life PowerpointBook Review of Run For Your Life Powerpoint
Book Review of Run For Your Life Powerpoint
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptx
 

20230608_megabyte