#TechSEOBoost | @CatalystSEM
Generating Qualitative Content with GPT-2
in All Languages
Vincent Terrasi, OnCrawl
Vincent Terrasi | @vincentterrasi | #TechSEOBoost
SEO Use-cases
• Image captioning with Pythia
• Visual question answering
• Abstractive Summarization with BERTsum
• Full Article generation with GPT-2
Text Spinners are bad
Google, What is bad generated content in 2016?
• Text translated by an automated tool without human review or curation before publishing
• Text generated through automated processes, such as Markov chains
• Text generated using automated synonymizing or obfuscation techniques
• Text generated from scraping Atom/RSS feeds or search results
• Stitching or combining content from different web pages without adding sufficient value
https://web.archive.org/web/20160222004700/https://support.google.com/webmasters/answer/2721306?hl=en
Google, What is bad generated content in 2019?
• Text that makes no sense to the reader but which may contain search keywords.
• Text translated by an automated tool without human review or curation before publishing
• Text generated through automated processes, such as Markov chains
• Text generated using automated synonymizing or obfuscation techniques
• Text generated from scraping Atom/RSS feeds or search results
• Stitching or combining content from different web pages without adding sufficient value
https://support.google.com/webmasters/answer/2721306?hl=en
Surprise!
2019, the best year for using AI for text generation
GPT-2, BERT
ELMo, ULMFiT
J Howard
Transformer and Attention Model
Patterns for Attention Model
Pattern 1: Attention to next word
Pattern 2: Attention to previous word
Pattern 3: Attention to identical/related words
Pattern 4: Attention to identical/related words in other sentence
Pattern 5: Attention to other words predictive (next word) of word
Pattern 6: Attention to delimiter tokens
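As a rough illustration of what an attention pattern is, the sketch below computes scaled dot-product attention weights for one query over a few key vectors. The toy vectors and the function name `attention_weights` are assumptions made for this example and do not come from the talk; real models learn these vectors and combine many heads and layers.

```python
import math

def attention_weights(query, keys):
    """Return softmax(q . k / sqrt(d)) over the keys for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-dimensional token vectors; "cat" appears twice.
tokens = ["the", "cat", "saw", "cat"]
vectors = [[0.1, 0.0], [1.0, 2.0], [0.0, 0.5], [1.0, 2.0]]

# Attention paid by the first "cat" to every token in the sentence.
weights = attention_weights(vectors[1], vectors)
for token, w in zip(tokens, weights):
    print(f"{token}: {w:.2f}")
```

In this toy setup the two identical "cat" vectors receive the same, highest weight, which is the kind of behaviour Pattern 3 describes.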
State of the Art
⚫ All models exist for English
⚫ Documentation is good
⚫ So we just need to translate
There are a lot of biases:
◦ Small Talk
◦ Idioms
◦ Local Named Entities
◦ Rarest Verbs
◦ Uncommon Tenses
◦ Gender rules
How to scale?
Create your own model
in your language
Objectives
Use only qualitative methods to improve
the quality of content created by humans
Extract the knowledge learnt by the deep learning model.
Why have other attempts failed?
Quantitative:
You need a lot of data: more than 100,000 texts with a minimum of 500 words each
Qualitative:
You need high-quality texts
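The quantitative requirement can be checked before any training starts. The sketch below is only an illustration: the thresholds come from the slide, while the function names and the whitespace-based word count are assumptions.

```python
MIN_TEXTS = 100_000  # "more than 100 000 texts" (from the slide)
MIN_WORDS = 500      # "a minimum of 500 words" (from the slide)

def usable_texts(texts, min_words=MIN_WORDS):
    """Keep only texts with at least `min_words` whitespace-separated words."""
    return [t for t in texts if len(t.split()) >= min_words]

def corpus_is_large_enough(texts, min_texts=MIN_TEXTS, min_words=MIN_WORDS):
    """True when more than `min_texts` texts meet the word-count minimum."""
    return len(usable_texts(texts, min_words)) > min_texts

# Tiny demonstration with lowered thresholds.
sample = ["one two three", "one two", "one two three four"]
print(corpus_is_large_enough(sample, min_texts=1, min_words=3))
```

This only covers the quantity side; the quality of the texts still has to be judged separately.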
GPT-2
Recipe
Step 1: Training the model
Training from scratch, without a pretrained model, requires significant computing power.
You need GPUs! It took 3 days to get my first result with one GPU.
Step 2: Generating the compressed training dataset - 1/2
GPT-2 needs to learn from data in Byte Pair Encoding (BPE) format, which is a simple form of data compression.
Why?
- Predicting the next character is too imprecise.
- Predicting the next word is too precise and takes a lot of computing power.
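To make the idea concrete, here is a toy sketch of how BPE merges are learnt: start from single characters and repeatedly merge the most frequent adjacent pair. It is a simplified illustration with a hypothetical function name, not the tokenizer used in the talk; production tokenizers also handle bytes, word boundaries and much larger corpora.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (toy version)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the best pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(dict(vocab))
```

After two merges the frequent stem "low" has become a single symbol, while rarer endings stay split into characters: a middle ground between character-level and word-level prediction.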
Step 2: Generating the compressed training dataset - 2/2
I used SentencePiece to generate my BPE files.
Why?
- Unsupervised text tokenizer and detokenizer
- Purely end-to-end system that does not depend on language-specific pre/postprocessing.
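A minimal sketch of training a BPE model with the SentencePiece Python package could look like the following. The file names, the helper names and the choice of settings are assumptions for illustration; the exact options used for the talk are not given in the slides.

```python
def bpe_trainer_args(corpus_path, model_prefix, vocab_size):
    """Build keyword arguments for SentencePiece's trainer in BPE mode."""
    return {
        "input": corpus_path,          # raw text file, one sentence per line
        "model_prefix": model_prefix,  # writes <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,
        "model_type": "bpe",
    }

def train_bpe(corpus_path, model_prefix, vocab_size):
    """Train a BPE model and return a processor loaded with it."""
    # Imported lazily so the helper above works without the package installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        **bpe_trainer_args(corpus_path, model_prefix, vocab_size)
    )
    return spm.SentencePieceProcessor(model_file=model_prefix + ".model")

# Hypothetical usage (requires `pip install sentencepiece` and a corpus file):
# sp = train_bpe("corpus_fr.txt", "fr_bpe", 50257)
# print(sp.encode("Bonjour le monde", out_type=str))
```

Because the tokenizer is learnt directly from raw text, the same script can be pointed at a corpus in another language without language-specific preprocessing.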
Step 3: Fine-tuning the model
Vocabulary size: depends on the language
- n_vocab: 50257
Embedding size: default value recommended by the OpenAI team
- n_embd: 768
Number of attention heads: no greater accuracy if you increase this value
- n_head: 12
Number of layers: no greater accuracy if you increase this value
- n_layer: 12
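To show how these four values fit together, the sketch below derives the per-head dimension and an approximate parameter count. The context length `n_ctx` is not on the slide and is an assumption here, as are the standard GPT-2 layout choices (4x-wide feed-forward block, output layer sharing the token embedding).

```python
hparams = {
    "n_vocab": 50257,
    "n_embd": 768,
    "n_head": 12,
    "n_layer": 12,
    "n_ctx": 1024,  # assumption: context length is not given on the slide
}

def head_dim(h):
    """Each attention head works on an equal slice of the embedding."""
    assert h["n_embd"] % h["n_head"] == 0, "n_embd must be divisible by n_head"
    return h["n_embd"] // h["n_head"]

def approx_param_count(h):
    """Approximate parameter count for a GPT-2 style model (weights and biases)."""
    d = h["n_embd"]
    embeddings = h["n_vocab"] * d + h["n_ctx"] * d   # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # two feed-forward layers
    layer_norms = 2 * 2 * d                          # two layer norms per block
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * d
    return embeddings + h["n_layer"] * per_layer + final_norm

print(head_dim(hparams))
print(approx_param_count(hparams))
```

Note that `n_vocab` must match the vocabulary size of the BPE model built in Step 2, which is why it depends on the language.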
Step 4: Generating article text
Once the model has been trained, the gpt-2-gen command is used to generate a text.
The first parameter is the path to the model.
The second is the beginning of the sentence.
Then there are two optional parameters:
◦ --tokens-to-generate: number of tokens to generate, default 42
◦ --top-k: number of candidate tokens each time, by default 8.
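The two optional parameters map onto a simple sampling loop. The sketch below is a generic illustration of top-k sampling, not the code behind gpt-2-gen; `next_logits` is a hypothetical callable standing in for the trained model, and the defaults mirror the values quoted above.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-scoring candidates."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

def generate(next_logits, prompt_ids, tokens_to_generate=42, top_k=8, rng=random):
    """Extend `prompt_ids` one token at a time using top-k sampling."""
    ids = list(prompt_ids)
    for _ in range(tokens_to_generate):
        ids.append(top_k_sample(next_logits(ids), top_k, rng))
    return ids

# Toy stand-in model: always scores a 3-token vocabulary the same way.
demo = generate(lambda ids: [0.0, 1.0, 2.0], [0], tokens_to_generate=5, top_k=2)
print(demo)
```

A small `top_k` keeps the output close to the most likely continuation, while a large one lets rarer tokens through and makes the text more varied.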
Results & Quality
Evaluated subjectively by a native reader.
pyLanguagetool was used to check the results quantitatively; it did not find
any errors in the generated text.
https://github.com/Findus23/pyLanguagetool
You can find my Google Colab notebook for French here:
https://colab.research.google.com/drive/13Lbk1TYmTjoQFO6qbw_f1TJgoD5ulJwV
Warning: it is just an example using limited data.
Now it is your turn.
Further?

Parameters: top-k < 10, tokens < 10
Objectives: High performance. Very high-quality content related to your original training content.
Use cases: Anchors for internal linking, title variants, meta variants.

Parameters: top-k > 50, tokens > 400
Objectives: Low performance. Low-quality content because the model is weak, but the model successfully extracts all the concepts that GPT-2 learnt about your dataset.
Use cases: Guides to help you write for a given query, with the stated purpose of saving you time.
Thank You
vincent@oncrawl.com

Reignite Your Business with Performance Marketing: 4 Ways to Fuel Your Reopening
 
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
Reignite Your Business with Performance Marketing: 4 Ways to Dial-Up Brand In...
 
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond FacebookEvolve Your Social Commerce Strategy: Thinking Beyond Facebook
Evolve Your Social Commerce Strategy: Thinking Beyond Facebook
 
B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020B2B SEO: Increase Traffic & Leads in 2020
B2B SEO: Increase Traffic & Leads in 2020
 
Keynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender SystemsKeynote: Bias in Search and Recommender Systems
Keynote: Bias in Search and Recommender Systems
 
NLP Powered Outreach Link Building
NLP Powered Outreach Link BuildingNLP Powered Outreach Link Building
NLP Powered Outreach Link Building
 
NLP for SEO
NLP for SEONLP for SEO
NLP for SEO
 
What I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like GoogleWhat I Learned Building a Toy Example to Crawl & Render like Google
What I Learned Building a Toy Example to Crawl & Render like Google
 
The User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive SearchThe User is The Query: The Rise of Predictive Proactive Search
The User is The Query: The Rise of Predictive Proactive Search
 

Recently uploaded

Storyboards for my Final Major Project Video
Storyboards for my Final Major Project VideoStoryboards for my Final Major Project Video
Storyboards for my Final Major Project VideoSineadBidwell
 
ASO Process: What is App Store Optimization
ASO Process: What is App Store OptimizationASO Process: What is App Store Optimization
ASO Process: What is App Store OptimizationAli Raza
 
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local LeadsSearch Engine Journal
 
What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?Juan Pineda
 
Jai Institute for Parenting Program Guide
Jai Institute for Parenting Program GuideJai Institute for Parenting Program Guide
Jai Institute for Parenting Program Guidekiva6
 
Master the Art of Digital Recruitment in Asia.pdf
Master the Art of Digital Recruitment in Asia.pdfMaster the Art of Digital Recruitment in Asia.pdf
Master the Art of Digital Recruitment in Asia.pdfHigher Education Marketing
 
Word Count for Writers: Examples of Word Counts for Sample Genres
Word Count for Writers: Examples of Word Counts for Sample GenresWord Count for Writers: Examples of Word Counts for Sample Genres
Word Count for Writers: Examples of Word Counts for Sample GenresLisa M. Masiello
 
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...CIO Business World
 
Fiverr's Product Marketing Interview Assignment
Fiverr's Product Marketing Interview AssignmentFiverr's Product Marketing Interview Assignment
Fiverr's Product Marketing Interview AssignmentFarrel Brest
 
From Chance to Choice - Tactical Link Building for International SEO
From Chance to Choice - Tactical Link Building for International SEOFrom Chance to Choice - Tactical Link Building for International SEO
From Chance to Choice - Tactical Link Building for International SEOSzymon Słowik
 
Influencer Marketing Power point presentation
Influencer Marketing  Power point presentationInfluencer Marketing  Power point presentation
Influencer Marketing Power point presentationdgtivemarketingagenc
 
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024CIO Business World
 
Common Culture: Paul Willis Symbolic Creativity
Common Culture: Paul Willis Symbolic CreativityCommon Culture: Paul Willis Symbolic Creativity
Common Culture: Paul Willis Symbolic CreativityMonishka Adhikari
 
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdf
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdfDIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdf
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdfmayanksharma0441
 
Michael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisMichael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisjunaid794917
 
The Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingThe Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingJuan Pineda
 
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garside
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon GarsideInbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garside
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garsiderobwhite630290
 
2024 SEO Trends for Business Success (WSA)
2024 SEO Trends for Business Success (WSA)2024 SEO Trends for Business Success (WSA)
2024 SEO Trends for Business Success (WSA)Jomer Gregorio
 
marketing strategy of tanishq word PPROJECT.pdf
marketing strategy of tanishq word PPROJECT.pdfmarketing strategy of tanishq word PPROJECT.pdf
marketing strategy of tanishq word PPROJECT.pdfarsathsahil
 

Recently uploaded (20)

Storyboards for my Final Major Project Video
Storyboards for my Final Major Project VideoStoryboards for my Final Major Project Video
Storyboards for my Final Major Project Video
 
ASO Process: What is App Store Optimization
ASO Process: What is App Store OptimizationASO Process: What is App Store Optimization
ASO Process: What is App Store Optimization
 
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
 
What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?
 
Jai Institute for Parenting Program Guide
Jai Institute for Parenting Program GuideJai Institute for Parenting Program Guide
Jai Institute for Parenting Program Guide
 
Master the Art of Digital Recruitment in Asia.pdf
Master the Art of Digital Recruitment in Asia.pdfMaster the Art of Digital Recruitment in Asia.pdf
Master the Art of Digital Recruitment in Asia.pdf
 
Word Count for Writers: Examples of Word Counts for Sample Genres
Word Count for Writers: Examples of Word Counts for Sample GenresWord Count for Writers: Examples of Word Counts for Sample Genres
Word Count for Writers: Examples of Word Counts for Sample Genres
 
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...
Most Impressive Construction Leaders in Tech, Making Waves in the Industry, 2...
 
Fiverr's Product Marketing Interview Assignment
Fiverr's Product Marketing Interview AssignmentFiverr's Product Marketing Interview Assignment
Fiverr's Product Marketing Interview Assignment
 
From Chance to Choice - Tactical Link Building for International SEO
From Chance to Choice - Tactical Link Building for International SEOFrom Chance to Choice - Tactical Link Building for International SEO
From Chance to Choice - Tactical Link Building for International SEO
 
Influencer Marketing Power point presentation
Influencer Marketing  Power point presentationInfluencer Marketing  Power point presentation
Influencer Marketing Power point presentation
 
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
 
Common Culture: Paul Willis Symbolic Creativity
Common Culture: Paul Willis Symbolic CreativityCommon Culture: Paul Willis Symbolic Creativity
Common Culture: Paul Willis Symbolic Creativity
 
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdf
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdfDIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdf
DIGITAL MARKETING STRATEGY_INFOGRAPHIC IMAGE.pdf
 
Michael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisMichael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysis
 
The Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingThe Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO Copywriting
 
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Lajpat Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garside
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon GarsideInbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garside
Inbound Marekting 2.0 - The Paradigm Shift in Marketing | Axon Garside
 
2024 SEO Trends for Business Success (WSA)
2024 SEO Trends for Business Success (WSA)2024 SEO Trends for Business Success (WSA)
2024 SEO Trends for Business Success (WSA)
 
marketing strategy of tanishq word PPROJECT.pdf
marketing strategy of tanishq word PPROJECT.pdfmarketing strategy of tanishq word PPROJECT.pdf
marketing strategy of tanishq word PPROJECT.pdf
 

Generating Qualitative Content with GPT-2 in All Languages

  • 1. #TechSEOBoost | @CatalystSEM THANK YOU TO OUR SPONSORS Generating Qualitative Content with GPT-2 in All Languages Vincent Terrasi, OnCrawl
  • 2. Vincent Terrasi | @vincentterrasi | #TechSEOBoost In All Languages Generating Qualitative Content
  • 3. SEO use cases: • Image captioning with Pythia • Visual question answering • Abstractive summarization with BERTsum • Full article generation with GPT-2
  • 4. Text spinners are bad
  • 5. Google, what is bad generated content in 2016? • Text translated by an automated tool without human review or curation before publishing • Text generated through automated processes, such as Markov chains • Text generated using automated synonymizing or obfuscation techniques • Text generated from scraping Atom/RSS feeds or search results • Stitching or combining content from different web pages without adding sufficient value. Source: https://web.archive.org/web/20160222004700/https://support.google.com/webmasters/answer/2721306?hl=en
  • 6. Google, what is bad generated content in 2019? • Text that makes no sense to the reader but which may contain search keywords • Text translated by an automated tool without human review or curation before publishing • Text generated through automated processes, such as Markov chains • Text generated using automated synonymizing or obfuscation techniques • Text generated from scraping Atom/RSS feeds or search results • Stitching or combining content from different web pages without adding sufficient value. Source: https://support.google.com/webmasters/answer/2721306?hl=en
  • 7. Surprise!
  • 8. 2019, the best year for using AI for text generation
  • 9-10. [Figure: GPT-2, BERT, ELMo, ULMFiT (J. Howard)]
  • 11. Transformer and Attention Model
  • 12-17. Patterns for Attention Model. Pattern 1: Attention to next word. Pattern 2: Attention to previous word. Pattern 3: Attention to identical/related words. Pattern 4: Attention to identical/related words in other sentence. Pattern 5: Attention to other words predictive (next word) of word. Pattern 6: Attention to delimiter tokens.
  • 18. State of the Art ⚫ All models exist for English ⚫ Documentation is good ⚫ So we just need to translate
  • 19. There are a lot of biases: ◦ Small talk ◦ Idioms ◦ Local named entities ◦ Rarest verbs ◦ Uncommon tenses ◦ Gender rules
  • 20. How to scale? Create your own model in your language.
  • 21. Objectives: Use only qualitative methods to improve the quality of content created by humans. Extract the knowledge learnt by the deep learning model.
  • 22. Why have other attempts failed? Quantitative: you need a lot of data, more than 100,000 texts with a minimum of 500 words each. Qualitative: you need high-quality texts.
  • 23. GPT-2 Recipe
  • 24. Step 1: Training the model. This method, without pretraining, requires significant computing power. You need GPUs! It took 3 days to get my first result with one GPU.
  • 25. Step 2: Generating the compressed training dataset - 1/2. GPT-2 needs to learn from text in Byte Pair Encoding (BPE) format, which is a simple form of data compression. Why? - Predicting the next character is too imprecise. - Predicting the next word is too precise and takes a lot of computing power.
  • 26. Step 2: Generating the compressed training dataset - 2/2. Use SentencePiece to generate my BPE files. Why? - It is an unsupervised text tokenizer and detokenizer. - It is a purely end-to-end system that does not depend on language-specific pre/postprocessing.
  • 27-30. Step 3: Fine-tuning the model. Vocabulary size: depends on the language - n_vocab:50257. Embedding size: default value recommended by the OpenAI team - n_embd:768. Size of attention: no greater accuracy if you increase this value - n_head:12. Number of layers: no greater accuracy if you increase this value - n_layer:12.
  • 31. Step 4: Generating article text. Once the model has been trained, the gpt-2-gen command is used to generate a text. The first parameter is the path to the model. The second is the beginning of the sentence. Then there are two optional parameters: o --tokens-to-generate: number of tokens to generate, default 42 o --top-k: number of candidate tokens considered at each step, default 8.
  • 32. Results & Quality. Evaluated subjectively by a native reader. The pyLanguagetool API was used to check the results quantitatively and did not find any errors in the generated text. https://github.com/Findus23/pyLanguagetool
  • 33. You can find my Google Colab notebook for French here: https://colab.research.google.com/drive/13Lbk1TYmTjoQFO6qbw_f1TJgoD5ulJwV Warning: it is just an example using limited data. NOW it is your turn.
  • 34. Going further? Parameters: top-k < 10, token < 10 (high performance). Objectives: very high-quality content related to your original training content. Use cases: anchors for internal linking, title variants, meta variants. Parameters: top-k > 50, token > 400 (low performance). Objectives: low-quality content because the model is weak, but the model successfully extracts all concepts that GPT-2 learnt about your dataset. Use cases: guides to help you write, compared to a query, with the stated purpose of saving you time.
  • 35. Thank You. vincent@oncrawl.com
  • 36. Catalyst | @CatalystSEM | #TechSEOBoost Thanks for viewing the SlideShare! Watch the recording: https://youtube.com/session-example Or contact us today to discover how Catalyst can deliver unparalleled SEO results for your business. https://www.catalystdigital.com/
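
Slide 25 describes Byte Pair Encoding as a middle ground between predicting characters and predicting whole words. The toy sketch below illustrates the general idea only: it repeatedly merges the most frequent adjacent pair of symbols, so common character sequences become single tokens. It is not the SentencePiece implementation mentioned in the talk, and the function names and sample corpus are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated corpus."""
    # Start from single characters: each word is a tuple of symbols.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    # Hypothetical sample corpus, chosen only to show merges forming.
    merges, words = learn_bpe("low low low lower lowest", num_merges=2)
    print(merges)
    print(words)
```

After two merges the frequent word "low" has become a single token, while the rarer suffixes remain split into characters, which is the compromise the slide describes.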
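
The four hyperparameters listed on slides 27-30 can be collected into one small configuration. The sketch below only uses the values given on the slides; the file layout expected by the author's training tool is not shown in the deck, so the dictionary form and the `head_size` helper are assumptions for illustration. In OpenAI's released GPT-2 hyperparameters, `n_head` is the number of attention heads, and the embedding size has to be divisible by it.

```python
import json

# Values taken from the slides; the dict layout itself is an assumption.
hparams = {
    "n_vocab": 50257,  # vocabulary size (depends on the language)
    "n_embd": 768,     # embedding size
    "n_head": 12,      # number of attention heads
    "n_layer": 12,     # number of transformer layers
}


def head_size(hp):
    """Width of one attention head: the embedding is split evenly across heads."""
    if hp["n_embd"] % hp["n_head"] != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return hp["n_embd"] // hp["n_head"]


if __name__ == "__main__":
    print(json.dumps(hparams, indent=2))
    print("per-head width:", head_size(hparams))
```

The divisibility check is worth keeping if you change `n_embd` or `n_head` for your own language model, since an uneven split cannot be shared across heads.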
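
Slides 31 and 34 rely on the top-k setting: at each step only the k most likely candidate tokens are kept, and the next token is drawn from among them. The sketch below shows that sampling step in isolation, assuming the model has already produced one score (logit) per vocabulary entry. It is not the code behind the gpt-2-gen command; the function name and the example scores are hypothetical.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those k candidates (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [0.1, 2.0, -1.0, 1.5]  # hypothetical scores for a 4-token vocabulary
    # With k=1 the choice is always the single best token (greedy decoding).
    print(top_k_sample(scores, k=1, rng=rng))
    # With a larger k, lower-ranked tokens can be picked as well.
    print([top_k_sample(scores, k=3, rng=rng) for _ in range(10)])
```

A small k keeps the output close to the most probable continuation, which matches the slide's advice to use top-k < 10 for short, precise texts such as anchors and titles; a large k admits rarer tokens and produces more varied but less reliable text.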