2. Agenda
• Overview of Transfer Learning.
• Why Transfer Learning is becoming an active
research area, especially in NLP.
• Transformers!
• A unified approach to transfer learning in NLP
3. Transfer Learning
• Transfer learning is a machine learning technique where
a model trained on one task is re-purposed on a second
related task.
• Training a model on a data-rich task and fine-tuning it on
a related downstream task
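A minimal sketch of this pre-train-then-fine-tune pattern, assuming the Hugging Face transformers library; the checkpoint name, toy data, and hyperparameters are illustrative assumptions, not part of the slides.

```python
# Sketch: re-use a pre-trained checkpoint and fine-tune a new task-specific
# head on a small, related downstream task (toy sentiment data).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Model pre-trained on a data-rich task (masked language modelling).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # new classification head, randomly initialised
)

# 2. Fine-tune on the downstream task.
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps, just to show the loop
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```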
4. Overview of Transfer Learning
• Traditionally, in transfer learning, pre-training is done
using supervised learning on large labeled datasets.
• In modern techniques, pre-training is often done via
unsupervised learning on unlabeled datasets.
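As an illustration of such an unsupervised objective, the snippet below builds a training pair from raw, unlabeled text by masking spans and asking the model to reconstruct them; the example string and sentinel tokens follow the span-corruption example in the T5 paper, but the formatting here is a simplified sketch.

```python
# Self-supervised (span-corruption) training pair built from unlabeled text.
# <X>, <Y>, <Z> are sentinel tokens marking the dropped-out spans.
raw_text = "Thank you for inviting me to your party last week."

model_input = "Thank you <X> me to your party <Y> week."
model_target = "<X> for inviting <Y> last <Z>"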
5. Transformers
• The Transformer in NLP is a novel architecture that aims to solve
sequence-to-sequence tasks while handling long-range dependencies
with ease.
• Earlier, Recurrent Neural Networks were leveraged in transfer learning
for NLP.
• Examples: translation, summarization.
6. Why Transformers for Transfer Learning
• Due to the availability of unlabeled text data en masse (thanks to the
Internet), transfer learning in NLP has become an active research area.
• Major NLP tasks like question answering, machine translation, and
summarization can be treated as related tasks.
7. Unified Approach
• Key idea – for related NLP tasks, treat every text-based / text-
processing task as a “text-to-text” task.
• Using a text-to-text framework, one can apply the same
model, objective, training procedure, and decoding
process to every task one considers.
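The (input, target) pairs below illustrate how several different tasks fit the same text-to-text format; the prefixes follow the T5 convention, but the example strings are illustrative paraphrases rather than exact dataset entries.

```python
# Different NLP tasks cast as plain (input text, target text) pairs.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews tuesday ...",
     "six people hospitalized after a storm"),
    ("cola sentence: The course is jumping well.", "not acceptable"),
    ("stsb sentence1: The rhino grazed. sentence2: A rhino is grazing.", "3.8"),
]
```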
8. Input Output Format
• To specify which task the model should perform, a task-specific (text)
prefix is added to the original input sequence before feeding it to the
model.
• For instance, to ask the model to translate the sentence “That is good.”
from English to German, the model would be fed the sequence
“translate English to German: That is good.” and would be trained to
output “Das ist gut.”
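A minimal sketch of this in code, assuming the Hugging Face transformers library and the public "t5-small" checkpoint (both assumptions, not named in the slides):

```python
# The task is specified purely through the text prefix in the input string.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: That is good.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: "Das ist gut."
```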
9. T5 model - Text-to-Text Transfer Transformer
Source - https://arxiv.org/pdf/1910.10683v3.pdf
10. Dataset
• Colossal Clean Crawled Corpus (C4)
• Built from Common Crawl, a publicly-available web archive that provides
“web extracted text” by removing markup and other non-text content from
the scraped HTML files.
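A short sketch of streaming this corpus, assuming the Hugging Face datasets library and its publicly hosted "allenai/c4" mirror (both are assumptions, not part of the slides):

```python
from datasets import load_dataset

# Stream C4 instead of downloading the full corpus.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
for example in c4.take(2):
    print(example["text"][:100])  # each record holds the cleaned, markup-free text
```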
11. Model Structures
• A major distinguishing factor for different architectures
is the “mask” used by different attention mechanisms in
the model.
• The self-attention operation in a Transformer takes a
sequence as input and outputs a new sequence of the
same length.
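To make the “same length in, same length out” point concrete, here is a toy scaled dot-product self-attention in PyTorch; the projection weights are omitted and the shapes are illustrative assumptions.

```python
# Toy self-attention: a length-4 input sequence yields a length-4 output
# sequence, and the boolean mask decides which positions may attend to which.
import torch
import torch.nn.functional as F

seq_len, d = 4, 8
x = torch.randn(seq_len, d)                 # input sequence
q, k, v = x, x, x                           # projection weights omitted for brevity

scores = (q @ k.T) / d ** 0.5               # (seq_len, seq_len) attention scores
mask = torch.ones(seq_len, seq_len, dtype=torch.bool)  # which positions are visible
scores = scores.masked_fill(~mask, float("-inf"))
out = F.softmax(scores, dim=-1) @ v         # shape (seq_len, d): same length as input
print(out.shape)                            # torch.Size([4, 8])
```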
13. Model Structure
• An encoder-decoder Transformer consists of two layer stacks:
• The encoder, which is fed an input sequence, and the decoder, which
produces a new output sequence.
• The encoder uses a “fully-visible” attention mask.
• This form of masking is appropriate when attending over a “prefix”, i.e.
some context provided to the model that is later used when making
predictions.
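A sketch of such a “fully-visible” mask (the size is an illustrative assumption): every position in the prefix may attend to every other position.

```python
import torch

# "Fully-visible" mask: 1 = attention allowed; every position sees every position.
fully_visible = torch.ones(5, 5, dtype=torch.int64)
print(fully_visible)
# [[1, 1, 1, 1, 1],
#  [1, 1, 1, 1, 1],
#  [1, 1, 1, 1, 1],
#  [1, 1, 1, 1, 1],
#  [1, 1, 1, 1, 1]]
```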
14. Model Structure
• The self-attention operations in the Transformer’s decoder use a
“causal” masking pattern.
• This is used during training so that the model can’t “see into the future”
as it produces its output.
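By contrast, a “causal” mask only lets position i attend to positions up to and including i; a minimal construction (the size is an illustrative assumption):

```python
import torch

# "Causal" mask: position i may attend only to positions <= i (1 = allowed),
# so the decoder cannot "see into the future" while producing its output.
causal = torch.tril(torch.ones(5, 5, dtype=torch.int64))
print(causal)
# [[1, 0, 0, 0, 0],
#  [1, 1, 0, 0, 0],
#  [1, 1, 1, 0, 0],
#  [1, 1, 1, 1, 0],
#  [1, 1, 1, 1, 1]]
```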