SlideShare a Scribd company logo
#TechSEOBoost | @CatalystSEM
THANK YOU TO OUR SPONSORS
Generating Qualitative Content with GPT-2
in All Languages
Vincent Terrasi, OnCrawl
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
In All Languages
Generating Qualitative
Content
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
SEO Use-cases
• Image captioning with Pythia
• Visual question & Answering
• Abstractive Summarization with BERTsum
• Full Article generation with GPT-2
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Text Spinners are bad
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Google, What is bad generated content in 2016?
• Text translated by an automated tool without human review or curation before
publishing
• Text generated through automated processes, such as Markov chains
• Text generated using automated synonymizing or obfuscation techniques
• Text generated from scraping Atom/RSS feeds or search results
• Stitching or combining content from different web pages without adding sufficient value
https://web.archive.org/web/20160222004700/https://support.google.com/webmasters/answer/2721306?hl=en
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Google, What is bad generated content in 2019?
• Text that makes no sense to the reader but which may contain search keywords.
• Text translated by an automated tool without human review or curation before
publishing
• Text generated through automated processes, such as Markov chains
• Text generated using automated synonymizing or obfuscation techniques
• Text generated from scraping Atom/RSS feeds or search results
• Stitching or combining content from different web pages without adding sufficient value
https://support.google.com/webmasters/answer/2721306?hl=en
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Surprise!
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
2019, the best year for
using AI for text
generation
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
GPT-2BERT
ELMO ULM-FIT
J Howard
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
GPT-2BERT
ELMO ULM-FIT
J Howard
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Transformer and Attention Model
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Pattern 3: Attention to identical/related words
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Pattern 3: Attention to identical/related words
Pattern 4: Attention to identical/related words in other sentence
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Pattern 3: Attention to identical/related words
Pattern 4: Attention to identical/related words in other sentence
Pattern 5: Attention to other words predictive (next word) of word
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Pattern 3: Attention to identical/related words
Pattern 4: Attention to identical/related words in other sentence
Pattern 5: Attention to other words predictive (next word) of word
Pattern 6: Attention to delimiter tokens
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
State of the Art
⚫ All models exist for English
⚫ Documentation is good
⚫ So we just need to translate
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
There are a lot of biases:
◦ Small Talk
◦ Idioms
◦ Local Named Entities
◦ Rarest Verbs
◦ Uncommon Tenses
◦ Gender rules
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
How to scale?
Create your own model
in your language
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Objectives
Use only qualitative methods to improve
the quality of content created by humans
Extract the knowledge learnt by the Deep
Learning.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Why others attempts have
failed?
Quantitative:
You need a lot of data: more than 100 000
texts with a minimum of 500 words
Qualitative:
You need qualitative texts
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
GPT-2
Recipe
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 1: Training the model
This method without pretraining requires significant computing power.
You need GPUs! 3 days to get my first result with one GPU.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 2: Generating the compressed training dataset - 1/2
GPT-2 needs to learn with the Byte Pair Encoding (BPE) format which is a simple form of
data compression.
Why?
- Predicting the next character is too imprecise
- Predicting the next word is too precive and take a lot of computing power.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 2: Generating the compressed training dataset - 2/2
Use SentencePiece to generate my BPE files.
Why?
- Unsupervised text tokenizer and detokenizer
- Purely end-to-end system that does not depend on language-specific
pre/postprocessing.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 3: Fine-tuning the model
Vocabulary size: depends on the language
- n_vocab:50257
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 3: Fine-tuning the model
Vocabulary size: depends on the language
- n_vocab:50257
Embedding size: default value recommended by Open AI team
- n_embd:768
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 3: Fine-tuning the model
Vocabulary size: depends on the language
- n_vocab:50257
Embedding size: default value recommended by Open AI team
- n_embd:768
Size of attention: no greater accuracy if you increase this value
- n_head:12
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 3: Fine-tuning the model
Vocabulary size: depends on the language
- n_vocab:50257
Embedding size: default value recommended by Open AI team
- n_embd:768
Size of attention: no greater accuracy if you increase this value
- n_head:12
Number of layers: no greater accuracy if you increase this value
- n_layer:12
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Step 4: Generating article text
Once the model has been trained, the gpt-2-gen command is used to generate a text.
The first parameter is the path to the model.
The second is the beginning of the sentence.
Then there are two optional parameters:
o --tokens-to-generate: number of tokens to generate, default 42
o --top-k: number of candidate tokens each time, by default 8.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Results & Quality
Evaluated subjectively by a native reader.
API pylanguagetool was used to quantifiably
confirm the quality of results and did not find
any errors in the generated text.
https://github.com/Findus23/pyLanguagetool
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
You can find my Google Colab Notebook
here for the French
https://colab.research.google.com/drive/13Lbk1TYmTjoQFO6qbw_f1TJgoD5ulJwV
Warning: it is just an example using limited
data.
NOW it is your turn.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Further ?
Parameters Objectives Use Cases
top-k < 10
token < 10
High Performance
Very high qualitative content related
to your original training content
Anchors for Internal Linking
Variant of Title
Variant of Meta
top-k > 50
token > 400
Low Performance
Low qualitative content because the
model is weak, but the model
successfully extracts all concepts
that GPT-2 learnt about your dataset.
Guides to help you write, compared
to a query, with the stated purpose of
saving you time.
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
Thank You
vincent@oncrawl.com
Catalyst | @CatalystSEM | #TechSEOBoost
Thanks for Viewing the Slideshare!
–
Watch the Recording: https://youtube.com/session-example
Or
Contact us today to discover how Catalyst can deliver unparalleled SEO
results for your business. https://www.catalystdigital.com/

More Related Content

What's hot

A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
Bhaskar Mitra
 
Transformers
TransformersTransformers
Transformers
Anup Joseph
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
Hady Elsahar
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
Nuwan Sriyantha Bandara
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
JEE HYUN PARK
 
Bert
BertBert
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Young Seok Kim
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
Suman Debnath
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRU
Sri Geetha
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Minh Pham
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
Yuta Niki
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
Julien SIMON
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
Intro to Machine Learning by Google Product Manager
Intro to Machine Learning by Google Product ManagerIntro to Machine Learning by Google Product Manager
Intro to Machine Learning by Google Product Manager
Product School
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
Illia Polosukhin
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
Daiki Tanaka
 
Text to Speech for Mobile Voice
Text to Speech for Mobile Voice Text to Speech for Mobile Voice
Text to Speech for Mobile Voice
June Hostetter
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
JiWenKim
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
Julien SIMON
 

What's hot (20)

A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
Transformers
TransformersTransformers
Transformers
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
Bert
BertBert
Bert
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRU
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Intro to Machine Learning by Google Product Manager
Intro to Machine Learning by Google Product ManagerIntro to Machine Learning by Google Product Manager
Intro to Machine Learning by Google Product Manager
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Text to Speech for Mobile Voice
Text to Speech for Mobile Voice Text to Speech for Mobile Voice
Text to Speech for Mobile Voice
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 

Similar to Generating Qualitative Content with GPT-2 in All Languages

Automate, Create Tools, & Test Ideas Quickly with Google Apps Script
Automate, Create Tools, & Test Ideas Quickly with Google Apps ScriptAutomate, Create Tools, & Test Ideas Quickly with Google Apps Script
Automate, Create Tools, & Test Ideas Quickly with Google Apps Script
Catalyst
 
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdfChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdf
Sonal Tiwari
 
TechSEO Boost 2019: Research Competition
TechSEO Boost 2019: Research CompetitionTechSEO Boost 2019: Research Competition
TechSEO Boost 2019: Research Competition
Catalyst
 
TechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOsTechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOs
David Sottimano
 
Analyzing Real Time News
Analyzing Real Time NewsAnalyzing Real Time News
Analyzing Real Time News
Raffaele Lorusso
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
Sumit Raj
 
BTech Final Project (1).pptx
BTech Final Project (1).pptxBTech Final Project (1).pptx
BTech Final Project (1).pptx
SwarajPatel19
 
Machine Learning for Designers
Machine Learning for DesignersMachine Learning for Designers
Machine Learning for Designers
Memi Beltrame
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
Editor Jacotech
 
How to build your in-house ChatGPT
How to build your in-house ChatGPT How to build your in-house ChatGPT
How to build your in-house ChatGPT
Citynow Asia Inc
 
Improve existing code with confidence, supported by unit tests
Improve existing code with confidence, supported by unit testsImprove existing code with confidence, supported by unit tests
Improve existing code with confidence, supported by unit tests
Dattatray Kale
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science Experience
Roy Cecil
 
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
Ruth Everett
 
Five steps to search and store tweets by keywords
Five steps to search and store tweets by keywordsFive steps to search and store tweets by keywords
Five steps to search and store tweets by keywords
Weiai Wayne Xu
 
MmIT webinar 2018 - Essential tools and technologies for the library and info...
MmIT webinar 2018 - Essential tools and technologies for the library and info...MmIT webinar 2018 - Essential tools and technologies for the library and info...
MmIT webinar 2018 - Essential tools and technologies for the library and info...
MmIT - Multimedia Information Technology Group for CILIP
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
Bayu Aldi Yansyah
 
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
Ruth Everett
 
How can AI be a creative partner for PR & marketing?
How can AI be a creative partner for PR & marketing?How can AI be a creative partner for PR & marketing?
How can AI be a creative partner for PR & marketing?
Thomas Winters
 
Sentiment analysis on demonetisation
Sentiment analysis on demonetisationSentiment analysis on demonetisation
Sentiment analysis on demonetisation
AbrarMohamed5
 

Similar to Generating Qualitative Content with GPT-2 in All Languages (20)

Automate, Create Tools, & Test Ideas Quickly with Google Apps Script
Automate, Create Tools, & Test Ideas Quickly with Google Apps ScriptAutomate, Create Tools, & Test Ideas Quickly with Google Apps Script
Automate, Create Tools, & Test Ideas Quickly with Google Apps Script
 
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdfChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdf
 
TechSEO Boost 2019: Research Competition
TechSEO Boost 2019: Research CompetitionTechSEO Boost 2019: Research Competition
TechSEO Boost 2019: Research Competition
 
TechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOsTechSEO Boost - Apps script for SEOs
TechSEO Boost - Apps script for SEOs
 
Analyzing Real Time News
Analyzing Real Time NewsAnalyzing Real Time News
Analyzing Real Time News
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
BTech Final Project (1).pptx
BTech Final Project (1).pptxBTech Final Project (1).pptx
BTech Final Project (1).pptx
 
Machine Learning for Designers
Machine Learning for DesignersMachine Learning for Designers
Machine Learning for Designers
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
 
How to build your in-house ChatGPT
How to build your in-house ChatGPT How to build your in-house ChatGPT
How to build your in-house ChatGPT
 
Improve existing code with confidence, supported by unit tests
Improve existing code with confidence, supported by unit testsImprove existing code with confidence, supported by unit tests
Improve existing code with confidence, supported by unit tests
 
Deep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science ExperienceDeep Learning using Tensorflow and Data Science Experience
Deep Learning using Tensorflow and Data Science Experience
 
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
Python For Technical SEO | Women In Tech SEO Festival March 2020 | Ruth Everett
 
Five steps to search and store tweets by keywords
Five steps to search and store tweets by keywordsFive steps to search and store tweets by keywords
Five steps to search and store tweets by keywords
 
MmIT webinar 2018 - Essential tools and technologies for the library and info...
MmIT webinar 2018 - Essential tools and technologies for the library and info...MmIT webinar 2018 - Essential tools and technologies for the library and info...
MmIT webinar 2018 - Essential tools and technologies for the library and info...
 
Intent Classifier with Facebook fastText
Intent Classifier with Facebook fastTextIntent Classifier with Facebook fastText
Intent Classifier with Facebook fastText
 
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
Machine Learning and Python For Marketing Automation | MKGO October 2019 | Ru...
 
Thesis Presentation V4
Thesis Presentation V4Thesis Presentation V4
Thesis Presentation V4
 
How can AI be a creative partner for PR & marketing?
How can AI be a creative partner for PR & marketing?How can AI be a creative partner for PR & marketing?
How can AI be a creative partner for PR & marketing?
 
Sentiment analysis on demonetisation
Sentiment analysis on demonetisationSentiment analysis on demonetisation
Sentiment analysis on demonetisation
 

More from Catalyst

Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
Catalyst
 
TechSEO Boost 2021 - Cultivating a Product Mindset for Success
TechSEO Boost 2021 - Cultivating a Product Mindset for SuccessTechSEO Boost 2021 - Cultivating a Product Mindset for Success
TechSEO Boost 2021 - Cultivating a Product Mindset for Success
Catalyst
 
TechSEO Boost 2021 - SEO Experimentation
TechSEO Boost 2021 - SEO ExperimentationTechSEO Boost 2021 - SEO Experimentation
TechSEO Boost 2021 - SEO Experimentation
Catalyst
 
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
Catalyst
 
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
Catalyst
 
10 Trends Changing Programmatic
10 Trends Changing Programmatic10 Trends Changing Programmatic
10 Trends Changing Programmatic
Catalyst
 
New Commerce Conference: Charting a Course to Success with Your Retail Media ...
New Commerce Conference: Charting a Course to Success with Your Retail Media ...New Commerce Conference: Charting a Course to Success with Your Retail Media ...
New Commerce Conference: Charting a Course to Success with Your Retail Media ...
Catalyst
 
The New Commerce Conference: The Omni-channel Imperative
The New Commerce Conference: The Omni-channel ImperativeThe New Commerce Conference: The Omni-channel Imperative
The New Commerce Conference: The Omni-channel Imperative
Catalyst
 
New Commerce Commerce: All Things Instacart
New Commerce Commerce: All Things InstacartNew Commerce Commerce: All Things Instacart
New Commerce Commerce: All Things Instacart
Catalyst
 
The Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
The Power of SEO: Protect Your Bottom Line & Future Proof Your BrandThe Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
The Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
Catalyst
 
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
Catalyst
 
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your ReopeningReignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
Catalyst
 
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Catalyst
 
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond FacebookEvolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
Catalyst
 
B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020
Catalyst
 
Keynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender SystemsKeynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender Systems
Catalyst
 
NLP Powered Outreach Link Building
NLP Powered Outreach Link BuildingNLP Powered Outreach Link Building
NLP Powered Outreach Link Building
Catalyst
 
NLP for SEO
NLP for SEONLP for SEO
NLP for SEO
Catalyst
 
What I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like GoogleWhat I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like Google
Catalyst
 
The User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive SearchThe User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive Search
Catalyst
 

More from Catalyst (20)

Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
Closing the Gap: Adopting Omnichannel Strategies for Stronger Brand-Consumer ...
 
TechSEO Boost 2021 - Cultivating a Product Mindset for Success
TechSEO Boost 2021 - Cultivating a Product Mindset for SuccessTechSEO Boost 2021 - Cultivating a Product Mindset for Success
TechSEO Boost 2021 - Cultivating a Product Mindset for Success
 
TechSEO Boost 2021 - SEO Experimentation
TechSEO Boost 2021 - SEO ExperimentationTechSEO Boost 2021 - SEO Experimentation
TechSEO Boost 2021 - SEO Experimentation
 
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
 
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
TechSEO Boost 2021 - The Future Is The Past: Tagging And Tracking Through The...
 
10 Trends Changing Programmatic
10 Trends Changing Programmatic10 Trends Changing Programmatic
10 Trends Changing Programmatic
 
New Commerce Conference: Charting a Course to Success with Your Retail Media ...
New Commerce Conference: Charting a Course to Success with Your Retail Media ...New Commerce Conference: Charting a Course to Success with Your Retail Media ...
New Commerce Conference: Charting a Course to Success with Your Retail Media ...
 
The New Commerce Conference: The Omni-channel Imperative
The New Commerce Conference: The Omni-channel ImperativeThe New Commerce Conference: The Omni-channel Imperative
The New Commerce Conference: The Omni-channel Imperative
 
New Commerce Commerce: All Things Instacart
New Commerce Commerce: All Things InstacartNew Commerce Commerce: All Things Instacart
New Commerce Commerce: All Things Instacart
 
The Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
The Power of SEO: Protect Your Bottom Line & Future Proof Your BrandThe Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
The Power of SEO: Protect Your Bottom Line & Future Proof Your Brand
 
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
The Era of Omni-Commerce: New Insights for Dominating the Digital Shelf and B...
 
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your ReopeningReignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
 
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
 
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond FacebookEvolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
 
B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020
 
Keynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender SystemsKeynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender Systems
 
NLP Powered Outreach Link Building
NLP Powered Outreach Link BuildingNLP Powered Outreach Link Building
NLP Powered Outreach Link Building
 
NLP for SEO
NLP for SEONLP for SEO
NLP for SEO
 
What I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like GoogleWhat I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like Google
 
The User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive SearchThe User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive Search
 

Recently uploaded

Is AI-Generated Content the Future of Content Creation?
Is AI-Generated Content the Future of Content Creation?Is AI-Generated Content the Future of Content Creation?
Is AI-Generated Content the Future of Content Creation?
Cut-the-SaaS
 
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
SEO Master Class - Steve Wiideman,  Wiideman Consulting GroupSEO Master Class - Steve Wiideman,  Wiideman Consulting Group
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
Marketing as a Primary Revenue Driver - Lee Levitt
Marketing as a Primary Revenue Driver - Lee LevittMarketing as a Primary Revenue Driver - Lee Levitt
Digital Marketing Training In Bangalore
Digital Marketing Training In BangaloreDigital Marketing Training In Bangalore
Digital Marketing Training In Bangalore
syedasifsyed46
 
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
The What, Why & How of 3D and AR in Digital Commerce
The What, Why & How of 3D and AR in Digital CommerceThe What, Why & How of 3D and AR in Digital Commerce
The What, Why & How of 3D and AR in Digital Commerce
PushON Ltd
 
Adapt or Die - Jon Lakefish, Lakefish Group LLC
Adapt or Die - Jon Lakefish, Lakefish Group LLCAdapt or Die - Jon Lakefish, Lakefish Group LLC
May 2024 - VBOUT Partners Meeting Group Session
May 2024 - VBOUT Partners Meeting Group SessionMay 2024 - VBOUT Partners Meeting Group Session
May 2024 - VBOUT Partners Meeting Group Session
Vbout.com
 
5 Big Bets for 2024 - Jamie A. Lee, Stripes Co
5 Big Bets for 2024 - Jamie A. Lee, Stripes Co5 Big Bets for 2024 - Jamie A. Lee, Stripes Co
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel BussiusYour Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
SMM Cheap - No. 1 SMM panel in the world
SMM Cheap - No. 1 SMM panel in the worldSMM Cheap - No. 1 SMM panel in the world
SMM Cheap - No. 1 SMM panel in the world
smmpanel567
 
SEO as the Backbone of Digital Marketing
SEO as the Backbone of Digital MarketingSEO as the Backbone of Digital Marketing
SEO as the Backbone of Digital Marketing
Felipe Bazon
 
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
DeepakTripathi733493
 
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel LeminTurn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
Winning local SEO in the Age of AI - Dennis Yu
Winning local SEO in the Age of AI - Dennis YuWinning local SEO in the Age of AI - Dennis Yu
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdfOffissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
offisadizayn
 
Playlist and Paint Event with Sony Music U
Playlist and Paint Event with Sony Music UPlaylist and Paint Event with Sony Music U
Playlist and Paint Event with Sony Music U
SemajahParker
 
My Personal Brand Exploration by Mariano
My Personal Brand Exploration by MarianoMy Personal Brand Exploration by Mariano
My Personal Brand Exploration by Mariano
marianooscos
 
BLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
BLOOM_May2024. Balmer Lawrie Online Monthly BulletinBLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
BLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
BalmerLawrie
 

Recently uploaded (20)

Is AI-Generated Content the Future of Content Creation?
Is AI-Generated Content the Future of Content Creation?Is AI-Generated Content the Future of Content Creation?
Is AI-Generated Content the Future of Content Creation?
 
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
SEO Master Class - Steve Wiideman,  Wiideman Consulting GroupSEO Master Class - Steve Wiideman,  Wiideman Consulting Group
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
 
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
Unknown to Unforgettable - The Art and Science to Being Irresistible on Camer...
 
Marketing as a Primary Revenue Driver - Lee Levitt
Marketing as a Primary Revenue Driver - Lee LevittMarketing as a Primary Revenue Driver - Lee Levitt
Marketing as a Primary Revenue Driver - Lee Levitt
 
Digital Marketing Training In Bangalore
Digital Marketing Training In BangaloreDigital Marketing Training In Bangalore
Digital Marketing Training In Bangalore
 
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
Your Path to Profits - The Game-Changing Power of a Marketing OS for Your Bus...
 
The What, Why & How of 3D and AR in Digital Commerce
The What, Why & How of 3D and AR in Digital CommerceThe What, Why & How of 3D and AR in Digital Commerce
The What, Why & How of 3D and AR in Digital Commerce
 
Adapt or Die - Jon Lakefish, Lakefish Group LLC
Adapt or Die - Jon Lakefish, Lakefish Group LLCAdapt or Die - Jon Lakefish, Lakefish Group LLC
Adapt or Die - Jon Lakefish, Lakefish Group LLC
 
May 2024 - VBOUT Partners Meeting Group Session
May 2024 - VBOUT Partners Meeting Group SessionMay 2024 - VBOUT Partners Meeting Group Session
May 2024 - VBOUT Partners Meeting Group Session
 
5 Big Bets for 2024 - Jamie A. Lee, Stripes Co
5 Big Bets for 2024 - Jamie A. Lee, Stripes Co5 Big Bets for 2024 - Jamie A. Lee, Stripes Co
5 Big Bets for 2024 - Jamie A. Lee, Stripes Co
 
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel BussiusYour Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
Your Path to Profits - The Game-Changing Power of a Marketing - Daniel Bussius
 
SMM Cheap - No. 1 SMM panel in the world
SMM Cheap - No. 1 SMM panel in the worldSMM Cheap - No. 1 SMM panel in the world
SMM Cheap - No. 1 SMM panel in the world
 
SEO as the Backbone of Digital Marketing
SEO as the Backbone of Digital MarketingSEO as the Backbone of Digital Marketing
SEO as the Backbone of Digital Marketing
 
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
34-Rahul-Mande.pdf PROJECT REPORT MBA 4TH SEMESTER
 
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel LeminTurn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
 
Winning local SEO in the Age of AI - Dennis Yu
Winning local SEO in the Age of AI - Dennis YuWinning local SEO in the Age of AI - Dennis Yu
Winning local SEO in the Age of AI - Dennis Yu
 
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdfOffissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
Offissa Dizayn - Otel, Kafe, Restoran Kataloqu_240603_011042.pdf
 
Playlist and Paint Event with Sony Music U
Playlist and Paint Event with Sony Music UPlaylist and Paint Event with Sony Music U
Playlist and Paint Event with Sony Music U
 
My Personal Brand Exploration by Mariano
My Personal Brand Exploration by MarianoMy Personal Brand Exploration by Mariano
My Personal Brand Exploration by Mariano
 
BLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
BLOOM_May2024. Balmer Lawrie Online Monthly BulletinBLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
BLOOM_May2024. Balmer Lawrie Online Monthly Bulletin
 

Generating Qualitative Content with GPT-2 in All Languages

  • 1. #TechSEOBoost | @CatalystSEM THANK YOU TO OUR SPONSORS Generating Qualitative Content with GPT-2 in All Languages Vincent Terrasi, OnCrawl
  • 2. Vincent Terrasi | @vincentterrasi | #TechSEOBoost In All Languages Generating Qualitative Content
  • 3. Vincent Terrasi | @vincentterrasi | #TechSEOBoost SEO Use-cases • Image captioning with Pythia • Visual question & Answering • Abstractive Summarization with BERTsum • Full Article generation with GPT-2
  • 4. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Text Spinners are bad
  • 5. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Google, What is bad generated content in 2016? • Text translated by an automated tool without human review or curation before publishing • Text generated through automated processes, such as Markov chains • Text generated using automated synonymizing or obfuscation techniques • Text generated from scraping Atom/RSS feeds or search results • Stitching or combining content from different web pages without adding sufficient value https://web.archive.org/web/20160222004700/https://support.google.com/webmasters/answer/2721306?hl=en
  • 6. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Google, What is bad generated content in 2019? • Text that makes no sense to the reader but which may contain search keywords. • Text translated by an automated tool without human review or curation before publishing • Text generated through automated processes, such as Markov chains • Text generated using automated synonymizing or obfuscation techniques • Text generated from scraping Atom/RSS feeds or search results • Stitching or combining content from different web pages without adding sufficient value https://support.google.com/webmasters/answer/2721306?hl=en
  • 7. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Surprise!
  • 8. Vincent Terrasi | @vincentterrasi | #TechSEOBoost 2019, the best year for using AI for text generation
  • 9. Vincent Terrasi | @vincentterrasi | #TechSEOBoost GPT-2BERT ELMO ULM-FIT J Howard
  • 10. Vincent Terrasi | @vincentterrasi | #TechSEOBoost GPT-2BERT ELMO ULM-FIT J Howard
  • 11. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Transformer and Attention Model
  • 12. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word
  • 13. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word Pattern 2: Attention to previous word
  • 14. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word Pattern 2: Attention to previous word Pattern 3: Attention to identical/related words
  • 15. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word Pattern 2: Attention to previous word Pattern 3: Attention to identical/related words Pattern 4: Attention to identical/related words in other sentence
  • 16. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word Pattern 2: Attention to previous word Pattern 3: Attention to identical/related words Pattern 4: Attention to identical/related words in other sentence Pattern 5: Attention to other words predictive (next word) of word
  • 17. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Patterns for Attention Model Pattern 1: Attention to next word Pattern 2: Attention to previous word Pattern 3: Attention to identical/related words Pattern 4: Attention to identical/related words in other sentence Pattern 5: Attention to other words predictive (next word) of word Pattern 6: Attention to delimiter tokens
  • 18. Vincent Terrasi | @vincentterrasi | #TechSEOBoost State of the Art ⚫ All models exist for English ⚫ Documentation is good ⚫ So we just need to translate
  • 19. Vincent Terrasi | @vincentterrasi | #TechSEOBoost There are a lot of biases: ◦ Small Talk ◦ Idioms ◦ Local Named Entities ◦ Rarest Verbs ◦ Uncommon Tenses ◦ Gender rules
  • 20. Vincent Terrasi | @vincentterrasi | #TechSEOBoost How to scale? Create your own model in your language
  • 21. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Objectives Use only qualitative methods to improve the quality of content created by humans Extract the knowledge learnt by the Deep Learning.
  • 22. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Why others attempts have failed? Quantitative: You need a lot of data: more than 100 000 texts with a minimum of 500 words Qualitative: You need qualitative texts
  • 23. Vincent Terrasi | @vincentterrasi | #TechSEOBoost GPT-2 Recipe
  • 24. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 1: Training the model This method without pretraining requires significant computing power. You need GPUs! 3 days to get my first result with one GPU.
  • 25. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 2: Generating the compressed training dataset - 1/2 GPT-2 needs to learn with the Byte Pair Encoding (BPE) format which is a simple form of data compression. Why? - Predicting the next character is too imprecise - Predicting the next word is too precive and take a lot of computing power.
  • 26. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 2: Generating the compressed training dataset - 2/2 Use SentencePiece to generate my BPE files. Why? - Unsupervised text tokenizer and detokenizer - Purely end-to-end system that does not depend on language-specific pre/postprocessing.
  • 27. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 3: Fine-tuning the model Vocabulary size: depends on the language - n_vocab:50257
  • 28. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 3: Fine-tuning the model Vocabulary size: depends on the language - n_vocab:50257 Embedding size: default value recommended by Open AI team - n_embd:768
  • 29. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 3: Fine-tuning the model Vocabulary size: depends on the language - n_vocab:50257 Embedding size: default value recommended by Open AI team - n_embd:768 Size of attention: no greater accuracy if you increase this value - n_head:12
  • 30. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 3: Fine-tuning the model Vocabulary size: depends on the language - n_vocab:50257 Embedding size: default value recommended by Open AI team - n_embd:768 Size of attention: no greater accuracy if you increase this value - n_head:12 Number of layers: no greater accuracy if you increase this value - n_layer:12
  • 31. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Step 4: Generating article text Once the model has been trained, the gpt-2-gen command is used to generate a text. The first parameter is the path to the model. The second is the beginning of the sentence. Then there are two optional parameters: o --tokens-to-generate: number of tokens to generate, default 42 o --top-k: number of candidate tokens each time, by default 8.
  • 32. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Results & Quality Evaluated subjectively by a native reader. API pylanguagetool was used to quantifiably confirm the quality of results and did not find any errors in the generated text. https://github.com/Findus23/pyLanguagetool
  • 33. Vincent Terrasi | @vincentterrasi | #TechSEOBoost You can find my Google Colab Notebook here for the French https://colab.research.google.com/drive/13Lbk1TYmTjoQFO6qbw_f1TJgoD5ulJwV Warning: it is just an example using limited data. NOW it is your turn.
  • 34. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Further ? Parameters Objectives Use Cases top-k < 10 token < 10 High Performance Very high qualitative content related to your original training content Anchors for Internal Linking Variant of Title Variant of Meta top-k > 50 token > 400 Low Performance Low qualitative content because the model is weak, but the model successfully extracts all concepts that GPT-2 learnt about your dataset. Guides to help you write, compared to a query, with the stated purpose of saving you time.
  • 35. Vincent Terrasi | @vincentterrasi | #TechSEOBoost Thank You vincent@oncrawl.com
  • 36. Catalyst | @CatalystSEM | #TechSEOBoost Thanks for Viewing the Slideshare! – Watch the Recording: https://youtube.com/session-example Or Contact us today to discover how Catalyst can deliver unparalleled SEO results for your business. https://www.catalystdigital.com/