Kostiantyn Omelianchuk, Oleksandr Skurzhanskyi "Building a state-of-the-art approach to Grammatical Error Correction"
Who are we?
What is Grammarly?
Grammarly’s AI-powered writing
assistant helps you make your
communication clear and effective,
wherever you type.
● Correctness: grammar, spelling, punctuation, writing
mechanics, and updated terminologies
● Clarity: clear writing, conciseness, complexity, and
readability
● Engagement: vocabulary and variety
● Delivery: formality, politeness, and confidence
Grammarly’s
writing assistant
350+
team members across offices
in San Francisco, New York,
Kyiv, and Vancouver
2,000+
institutional and enterprise
clients
20M+
daily active users around
the world
Grammarly by the numbers
Grammarly’s NLP team
Our NLP team consists of 30+ people in the
following roles:
- applied research scientists
- machine learning engineers
- computational linguists
This work
This presentation is based on the paper written by:
- Kostiantyn Omelianchuk
- Oleksandr Skurzhanskyi
- Vitaliy Atrasevych
- Artem Chernodub
What task are we
trying to solve?
The goal of the GEC task is to produce a grammatically correct
sentence from a sentence with mistakes.
Example:
Source: He go at school.
Target: He goes to school.
GEC: Grammatical
Error Correction
A Comprehensive Survey of Grammar Error Correction (Yu
Wang*, Yuelin Wang*, Jie Liu, and Zhuo Liu)
GEC: History overview
Categorization of research
in GEC
GEC: Data
GEC: Shared tasks
- CoNLL-2014 (the test set is composed of 50 essays written
by non-native English students; metric: M2 score)
- BEA-2019 (new data; the test set consists of 4,477
sentences; metric: ERRANT)
GEC: Progress
Sequence
Tagging
We approach the GEC task as a sequence tagging problem.
In this formulation, for each token in the source sentence, a GEC
system should produce a tag (edit) that represents the correction
required for that token.
To solve this problem, we use a non-autoregressive
transformer-based model.
Sequence Tagging

Basic transformations:
- KEEP: keep the current token unchanged
- DELETE: delete the current token
- APPEND_ti: append new token ti
- REPLACE_ti: replace the current token with token ti

g-transformations:
- CASE: change the case of the current token
- MERGE: merge the current and the next token
- SPLIT: split the current token into two
- NOUN_NUMBER: convert singular nouns to plurals and vice versa
- VERB_FORM: change the form of regular/irregular verbs
Token-level transformations
Source: A ten years old boy go school

Token-level edits and their tags:
A ⇨ A: $KEEP
ten ⇨ ten-: $MERGE_HYPH
years ⇨ year-: $NOUN_NUMBER_SINGULAR, $MERGE_HYPH
old ⇨ old: $KEEP
boy ⇨ boy: $KEEP
go ⇨ goes to: $VERB_FORM_VB_VBZ, $APPEND_{to}
school ⇨ school.: $KEEP, $APPEND_{.}

Result: A ten-year-old boy goes to school.
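To make the tag semantics concrete, below is a minimal sketch of applying basic transformations to a tokenized sentence. It is illustrative, not the released GECToR code: it assumes one tag per token and skips the g-transformations ($NOUN_NUMBER_*, $VERB_FORM_*, $CASE_*), which need morphological dictionaries.

```python
def apply_tags(tokens, tags):
    """Apply one basic-transformation tag per token; returns new tokens."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue  # drop the token
        elif tag.startswith("$APPEND_"):
            out.append(token)
            out.append(tag[len("$APPEND_"):].strip("{}"))
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):].strip("{}"))
        elif tag == "$MERGE_HYPH":
            out.append(token + "-@@")  # "@@" marks "glue to the next token"
    return " ".join(out).replace("-@@ ", "-").split()

tokens = "A ten years old boy go school".split()
tags = ["$KEEP", "$MERGE_HYPH", "$MERGE_HYPH", "$KEEP",
        "$KEEP", "$APPEND_{to}", "$APPEND_{.}"]
print(" ".join(apply_tags(tokens, tags)))
# -> A ten-years-old boy go to school .
# The remaining fixes (years -> year, go -> goes) are g-transformations
# and would be applied on a later iteration.
```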
Sequence Tagging: Pros
1. Generating tags is fully independent and easy to parallelize.
2. Smaller output vocabulary size compared to seq2seq models.
3. Lower memory usage and faster inference compared to
encoder-decoder models.
Sequence Tagging: Cons
1. Independent tag generation relies on the assumption that
errors are independent of each other.
2. Word reordering is tricky for this architecture.
First baselines
Academia vs Industry
- different data (both training and evaluation)
- latency/throughput matters
- aggressive deadlines
Baseline architecture
Input sentence → Embeddings → Encoder → error detection, error
replacement, and error insertion linear layer(s) → Predicted tags →
Postprocess → Corrected sentence
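As a rough illustration, here is a sketch of such a model in PyTorch (layer types, sizes, and names are assumptions for illustration, not the exact production configuration):

```python
import torch.nn as nn

class BaselineTagger(nn.Module):
    """Embeddings -> encoder -> three per-token linear heads."""
    def __init__(self, vocab_size, tag_vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.detect = nn.Linear(2 * hidden, 2)                # correct / incorrect
        self.replace = nn.Linear(2 * hidden, tag_vocab_size)  # replacement tags
        self.insert = nn.Linear(2 * hidden, tag_vocab_size)   # insertion tags

    def forward(self, token_ids):
        # token_ids: [batch, seq_len]; returns per-token logits from each head
        h, _ = self.encoder(self.embed(token_ids))
        return self.detect(h), self.replace(h), self.insert(h)
```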
Baseline hyperparameters
- Embeddings (trainable random, GloVe, BERT, DistilBERT)
- Encoder (CNN, LSTM, stacked LSTM, pass-through, transformers)
- Output layers (1-2 for insertions, 1 for replacement, 1 for
detection)
- Output vocabulary size (100-50,000)
- Other (dropouts, training schedule, tp/tn ratio, etc.)
Baseline results
Baseline (M2 score, CoNLL-2014 test)
Grammarly Spellchecker 26.8
Stacked LSTM (vocab_size=1000) 30.5
Stacked LSTM (vocab_size=1000; + g-transformations) 35.6
Stacked LSTM (vocab_size=1000; + g-transformations; + BERT embeddings) 46.0
Academic SOTA (single model) 61.3
Insights
- Increasing the size of the output vocabulary does not help
- Adding BERT embeddings helps a lot
- Training BERT with a small batch size failed (did not converge);
training with bigger batches required gradient accumulation
Sudden
competitor
PIE paper
Our SOTA on NUCLE: 46
Transformers
Transformer
Attention Is All You Need
(Vaswani et al. 2017)
Multihead attention
BERT-like architecture
BERT (Devlin et al. 2018)
Training
transformers
huggingface / transformers
Training transformers.
First try
Transformers recipe
Things that made our transformers work:
● small learning rate (1e-5)
● big batch size (at least 128 sentences),
or use gradient accumulation
● freeze the BERT encoder first and train only the linear layer,
then train jointly (a sketch of this recipe follows)
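A hedged PyTorch sketch of this recipe (model.encoder and loss_fn are assumed names, not the authors' code): a small learning rate, gradient accumulation for a large effective batch, and a freeze-then-unfreeze schedule.

```python
import torch

def train_epoch(model, loader, loss_fn, freeze_encoder, accum_steps=4, lr=1e-5):
    # Freeze (or unfreeze) the pretrained encoder; attribute name assumed.
    for p in model.encoder.parameters():
        p.requires_grad = not freeze_encoder
    optimizer = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    optimizer.zero_grad()
    for step, batch in enumerate(loader, start=1):
        loss = loss_fn(model, batch) / accum_steps  # scale for accumulation
        loss.backward()
        if step % accum_steps == 0:  # effective batch = loader batch * accum_steps
            optimizer.step()
            optimizer.zero_grad()

# First train only the linear layers on top of a frozen encoder,
# then continue training everything jointly:
# train_epoch(model, loader, loss_fn, freeze_encoder=True)
# train_epoch(model, loader, loss_fn, freeze_encoder=False)
```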
BERTology works
LSTM 35.6
LSTM + BERT embeddings 46.0
DistilBERT 52.8
BERT-base 57.3
* on CoNLL-2014 (test)
Additional tricks
Iterative Approach
Source: A ten years old boy go school (zero corrections)
Iteration 1: A ten-years old boy goes school (2 total corrections)
Iteration 2: A ten-year-old boy goes to school (5 total corrections)
Iteration 3: A ten-year-old boy goes to school. (6 total corrections)
GECToR model: iterative
pipeline
Input sentence → Encoder (transformer-based) → error detection and
error correction linear layers → Predicted tags → Postprocess →
Corrected sentence. Repeat N times.
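A minimal sketch of the loop (predict_tags is a hypothetical method; apply_tags is the helper sketched earlier): rerun the tagger on its own output until the sentence stops changing or the iteration budget runs out.

```python
def correct(tokens, model, max_iter=3):
    for _ in range(max_iter):
        tags = model.predict_tags(tokens)      # one tag per token
        new_tokens = apply_tags(tokens, tags)  # apply predicted edits
        if new_tokens == tokens:               # converged: all tags were $KEEP
            break
        tokens = new_tokens
    return tokens
```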
Staged training
● Training is split into N stages
● Each stage has its own data
● Each stage has its own hyperparameters
● At each stage, the model is initialized with the best weights of
the previous stage (sketched below)
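A sketch of the driver (train_stage is a hypothetical helper returning the best checkpoint of the stage it ran; stage settings follow the stage list below and the training-details slide):

```python
stages = [
    dict(data="PIE-synthetic",          batch_size=256, epochs=20),  # Stage I
    dict(data="errorful-only",          batch_size=128, epochs=3),   # Stage II
    dict(data="errorful + error-free",  batch_size=128, epochs=3),   # Stage III
]

best_ckpt = None  # Stage I starts from the pretrained encoder
for stage in stages:
    best_ckpt = train_stage(stage, init_from=best_ckpt)
```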
Training stages
I. Pre-training on synthetic errorful sentences, as in
(Awasthi et al., 2019)
II. Fine-tuning on errorful-only sentences (new!)
III. Fine-tuning on a subset of errorful and error-free sentences,
as in (Kiyono et al., 2019)

Dataset        # sentences  % errorful sentences
PIE-synthetic  9,000,000    100.0%
Lang-8         947,344      52.5%
NUCLE          56,958       38.0%
FCE            34,490       62.4%
W&I+LOCNESS    34,304       67.3%
Training details
● Adam optimizer (Kingma and Ba, 2015);
● Early stopping after 3 epochs of 10K
updates each without improvement;
● Epochs & batch sizes:
○ Stage I (pre-training):
batch size=256, 20 epochs
○ Stages II, III (fine-tuning):
batch size=128, 2-3 epochs

Training stage #   CoNLL-2014 (test)*, F0.5   BEA-2019 (dev)*, F0.5
I                  49.9                       33.2
II                 59.7                       44.4
III                62.5                       50.3
III + Inf. tweaks  65.3                       55.5
* Results are given for GECToR (XLNet).
Inference tweaks 1
By increasing the probability of the $KEEP tag, we can force the model
to make only confident edits.
In this way, we increase precision by trading away recall.
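A sketch of this tweak (names are illustrative): add a constant confidence bias to the $KEEP class before taking the argmax, so an edit survives only if the model clearly prefers it over leaving the token alone.

```python
import torch

def pick_tags(tag_probs, keep_index, confidence_bias=0.2):
    # tag_probs: [seq_len, num_tags] tag probabilities for one sentence
    biased = tag_probs.clone()
    biased[:, keep_index] += confidence_bias  # favor "no edit"
    return biased.argmax(dim=-1)
```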
Inference tweaks 2
We also compute the minimum probability of the incorrect class across
all tokens in the sentence.
This value (min_error_probability) must be higher than a threshold for
the next iteration to run.
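A sketch of the sentence-level gate as described on this slide (names illustrative; the released implementation may aggregate the per-token error probabilities differently):

```python
def should_run_next_iteration(error_probs, min_error_probability=0.5):
    # error_probs: per-token probabilities of the "incorrect" class
    # from the error detection head
    return min(error_probs) > min_error_probability
```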
BERT family
Varying encoders from pretrained transformers

Encoder   CoNLL-2014 (test)      BEA-2019 (test)
          P     R     F0.5       P     R     F0.5
LSTM**    51.6  15.3  35.0       -     -     -
ALBERT    59.5  31.0  50.3       43.8  22.3  36.7
BERT      65.6  36.9  56.8       48.3  29.0  42.6
GPT-2     61.0  6.3   22.2       44.5  5.0   17.2
RoBERTa   67.5  38.3  58.6       50.3  30.5  44.5
XLNet     64.6  42.6  58.6       47.1  34.2  43.8

* Training was performed on data from training stage II only. ** Baseline.
Model inference
time
Speed comparison
● NVIDIA Tesla V100
● CoNLL-2014 (test)
● single model
● batch size=128
Final results
Results
Big picture
P.S. Writing paper
- It took us approximately 2 months, with 4 authors, to write 4 pages
- Better to start as early as possible
Support your code
Thank you!
Links
● The paper
● The code
● GEC survey