DATA-SEO
NEXT LEVEL
VINCENT TERRASI / REMI BACHA
HEAD OF DATA / HEAD OF SEO
@vincentterrasi / @remibacha
Any sufficiently advanced technology
is indistinguishable from magic
ARTHUR C. CLARKE
BIG DATA
ARTIFICIAL INTELLIGENCE
DATA SCIENCE PROJECT
SEO PROJECT
OVH
SEO
IS A BIG DATA JOB
UNDERSTAND
DATA
MANIPULATE
& ANALYSE
BRING VALUE
TO DATA
DATA SCIENCE
EMPIRICISM
01. MAKE OBSERVATIONS
02. THINK OF INTERESTING QUESTIONS
03. FORMULATE HYPOTHESES
04. DEVELOP TESTABLE PREDICTIONS
05. GATHER DATA TO TEST PREDICTIONS
05 BIS. REFINE, ALTER, EXPAND, OR REJECT HYPOTHESES
06. DEVELOP GENERAL THEORIES
DATA CENTRIC
EMPIRICISM
RANKBRAIN
01. CHANGING SEO FACTORS
02. NEW FACTORS
03. RANKING MISTAKES
04. ULTRA-PERSONALISATION
IT’S TIME
TO UPGRADE SEO
MACHINE LEARNING
AI
BIG DATA
DATA SCIENCE
DEEP LEARNING
RANKBRAIN
WELCOME TO THE
DATA SEO ERA
NEW JOB
DATA SCIENTIST SEO
LEARNING DATA
SCIENCE
– Data Scientist Toolbox
– Getting & cleaning Data
– R / Python Programming
– Exploratory data analysis
– Machine Learning
– Big Data
SEO DATAMART
COMPETITORS
OTHER TRAFFIC SOURCES DATA
SOCIAL NETWORK
SEARCH CONSOLE
CRAWLS
STOCK, PRICES, SALES DATA
CUSTOMERS DATA
EVENTS
WEB ANALYTICS
NETLINKING
SEMANTICS
WEBPERFS
SEARCH TRENDS
SERVER LOGS
SEO DATAMART
COMPETITORS
CRAWLS
NETLINKING
SEMANTICS
WEBPERFS
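A datamart like the one above boils down to one row per URL, with columns drawn from each source. The following is a minimal sketch of that merge using only the Python standard library. The source names, column names (`response_time`, `ref_domains`), and sample URLs are hypothetical and only serve the illustration; the slides do not show the actual schema.

```python
from collections import defaultdict

def build_datamart(sources):
    """Merge per-URL records from several sources into one row per URL.

    `sources` maps a source name to a list of dicts, each with a "url" key.
    Columns are prefixed with the source name to avoid collisions.
    """
    rows = defaultdict(dict)
    for name, records in sources.items():
        for record in records:
            # Naive normalisation (assumption): ignore a trailing slash.
            url = record["url"].rstrip("/")
            for key, value in record.items():
                if key != "url":
                    rows[url][f"{name}_{key}"] = value
    return [{"url": url, **columns} for url, columns in sorted(rows.items())]

# Hypothetical sample data, for illustration only.
sources = {
    "crawl": [{"url": "https://example.com/a/", "response_time": 0.42}],
    "netlinking": [
        {"url": "https://example.com/a", "ref_domains": 12},
        {"url": "https://example.com/b", "ref_domains": 3},
    ],
}
datamart = build_datamart(sources)
```

A URL missing from one source simply lacks that source's columns, which mirrors an outer join; a real pipeline would have to decide how to fill or drop such gaps before training a model.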
XGBOOST
33 TREES
MAX DEPTH: 10
GRID SIZE: 100
ROC AUC: 0.915
MOST IMPORTANT VARIABLES
?
Screamingfrog_in_csv
Semrush_out_csv
Screamingfrog_in_csv_prepared
Semrush_screamingfrog_out_postgres
Majestic_out_postgres
Visiblis_out_postgres
Semrush_screamingfrog_majestic_visiblis_
Prediction (XGBOOST_CLASSIFICATION) on
DATAIKU DSS
The most complete
Data Science platform
Data Preparation
Machine Learning
Deployment
Collaboration
WHY PREDICT
GOOGLE RANKINGS?
HOW TO PREDICT
GOOGLE RANKINGS?
GETTING SERP
DATA FROM SEMRUSH
CLEAN
DATA
REMOVE INVALID URLS
Slow Crawl Rate
Non-HTML Content
Network Problems
Slow Web Servers
WAIT TIMES
Errors from Web Servers
URL Moved Permanently Redirect (301)
URL Moved Temporarily Redirect (302)
Authentication Required (401) or Document Not Found (404)
Cyclic Redirects
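The crawl problems listed above can be turned into a simple row filter. This is a minimal sketch, assuming each crawled URL is a dict with hypothetical fields `status_code`, `content_type`, and `response_time` (in seconds); the 5-second threshold is also an assumption, not a value from the slides.

```python
def is_valid_row(row, max_response_time=5.0):
    """Keep only pages that answered 200 with HTML in a reasonable time."""
    # Drops 301, 302, 401, 404, and network failures (status_code of None).
    if row.get("status_code") != 200:
        return False
    # Drops non-HTML content such as PDFs or images.
    if not str(row.get("content_type", "")).lower().startswith("text/html"):
        return False
    # Drops slow servers and rows with no measured response time.
    response_time = row.get("response_time")
    return response_time is not None and response_time <= max_response_time

# Hypothetical crawl export, for illustration only.
crawl = [
    {"url": "https://example.com/ok", "status_code": 200,
     "content_type": "text/html; charset=utf-8", "response_time": 0.3},
    {"url": "https://example.com/moved", "status_code": 301,
     "content_type": "text/html", "response_time": 0.1},
    {"url": "https://example.com/missing", "status_code": 404,
     "content_type": "text/html", "response_time": 0.1},
    {"url": "https://example.com/file.pdf", "status_code": 200,
     "content_type": "application/pdf", "response_time": 0.2},
    {"url": "https://example.com/slow", "status_code": 200,
     "content_type": "text/html", "response_time": 12.0},
    {"url": "https://example.com/timeout", "status_code": None,
     "content_type": None, "response_time": None},
]
clean = [row for row in crawl if is_valid_row(row)]
```

Rejecting every non-200 status also removes cyclic redirects, since each hop in the cycle answers with a redirect code.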
CREATE PREDICTION MODEL
XGBOOST
BIAS RELATED ERRORS: Adaptive boosting, Gradient boosting
VARIANCE RELATED ERRORS: Bagging, Random forest
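To make the boosting idea concrete, here is a minimal sketch of plain gradient boosting on squared error with one-split trees (stumps) over a single feature. It is not XGBoost itself, which adds regularisation, deeper trees, and many optimisations; the toy data (a `ref_domains` feature and a binary `in_top10` target) and the learning rate are assumptions made for the illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    # Excluding the largest value keeps the right-hand side non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosting(xs, ys, n_trees=33, learning_rate=0.3):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}

def predict(model, x):
    correction = sum(left if x <= threshold else right
                     for threshold, left, right in model["stumps"])
    return model["base"] + model["learning_rate"] * correction

# Hypothetical toy data: pages with more referring domains rank in the top 10.
ref_domains = [1, 2, 3, 4, 5, 6]
in_top10 = [0, 0, 0, 1, 1, 1]
model = fit_boosting(ref_domains, in_top10)
```

Each round corrects part of the remaining error, which is why boosting is associated with reducing bias, whereas bagging and random forests average independent trees to reduce variance.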
XGBOOST
33 TREES
MAX DEPTH: 10
GRID SIZE: 100
ROC AUC: 0.915
MOST IMPORTANT VARIABLES
ExtBackLinks
RefDomains
TrustFlow
External Outlinks
Response Time
CitationFlow
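The ROC AUC quoted on this slide can be read as the probability that the model scores a randomly chosen positive example above a randomly chosen negative one. The following minimal sketch computes it directly from that definition; the labels and scores are hypothetical and unrelated to the 0.915 result.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = ranks well) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect one. The pairwise loop is quadratic, so a rank-based formula is preferable on large datasets.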
TAKE AWAY
…
AUTOMATED MACHINE
LEARNING WITH DATAIKU
AUTOMATED KPI REPORTING
SEO DATALAKE
TEXT GENERATION
OPPORTUNITIES DETECTION
PREDICTIVE ANALYSIS
PROCESS MINING
AUTOMATED MACHINE
LEARNING WITH DATAIKU
SEO DATAMART
NOW THAT MACHINES CAN LEARN
AND ADAPT, IT IS TIME TO TAKE
ADVANTAGE OF THE OPPORTUNITY TO
CREATE NEW JOBS:
Data-SEO, Data-Doctor,
Data-Journalist …
THANK YOU
GET ALL OUR LATEST DISCOVERIES AND UPDATES
Vincent TERRASI
@vincentterrasi
Remi BACHA
@remibacha
Data-seo.com Remibacha.com

Find out how data science has revolutionized SEO for OVH