How to automate all
your SEO projects
@VincentTerrasi
OVH
Planning
• Each Day:
• Advanced Reporting
• Anomaly Detection
• Log Analysis
• Webperf with SiteSpeed.io
• Each Week:
• Ranking monitoring
• Opportunity Detection
• Hot Topic Detection
• Each Quarter:
• Semantic Analysis
Time is precious
Automate everything
1. RStudio Server
2. Shiny Server
3. Jupyter Notebook
4. Dataiku
5. OpenSource
[Architecture diagram. Components: searchConsoleR, ATinternetR, oncrawlR, Docker, RStudio Server, Shiny Server, Dataiku, DataLake, scheduled email reports, notebooks, Data API, Shiny apps, DataViz]
1. RStudio Server
Automate all your SEO projects
Why R?
Scriptable
Big Community
Mac / PC / Unix
Open Source
Free
10,000 packages
WheRe? How?
Rgui and RStudio
https://www.cran.r-project.org
[Screenshots: installation steps 1 to 4]
RStudio Server
OVH – Instance Cloud
• Docker on Ubuntu 16.04 Server
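• From a terminal on the server, run:
• sudo docker run -d -p 8787:8787 rocker/rstudio
• Then open http://yourIP:8787 in a browser; you should be greeted by the RStudio welcome screen.
Log in using:
• username: rstudio
• password: rstudio (with recent rocker/rstudio images, set your own password by adding -e PASSWORD=yourpassword to docker run)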
RStudio Server - Install
• install.packages("httr")
• install.packages("RCurl")
• install.packages("stringr")
• install.packages("stringi")
• install.packages("openssl")
• install.packages("Rmpi")
• install.packages("doMpi")
R – Scraper – Packages
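Re-running install.packages() for every package on each new server is slow. A minimal sketch, assuming you only want to install what is missing; install_missing() and its dry_run argument are hypothetical names, not part of any package. Rmpi usually needs the configure.args shown in the OpenMPI install step, so a plain install of it may fail.

```r
# Packages used by the scraper slides
pkgs <- c("httr", "RCurl", "stringr", "stringi", "openssl", "Rmpi", "doMPI")

# Hypothetical helper: install only the packages not already installed.
# With dry_run = TRUE it just reports what would be installed.
install_missing <- function(pkgs, dry_run = FALSE) {
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0 && !dry_run) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# See what is missing without installing anything
print(install_missing(pkgs, dry_run = TRUE))
```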
R – Scraper – RCurl
library(RCurl)

seocrawler <- function(url) {
  # mobile Safari user agent (iPhone, iOS 6)
  useragent <- "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25"
  # h collects the raw response headers, one element per header line
  h <- basicTextGatherer()
  html <- getURL(url,
                 followlocation = TRUE,   # follow redirects
                 ssl.verifypeer = FALSE,  # do not verify SSL certificates
                 httpheader = c('User-Agent' = useragent),
                 headerfunction = h$update)
  return(html)
}
R – Scraper – Header
# run inside seocrawler(), after getURL(), while h is still in scope;
# df is a one-row data.frame that collects the results for this URL
headers <- h$value(NULL)
# last value of a header field (redirect hops are gathered too), or NA if absent
lastHeader <- function(name) {
  pattern <- paste0("^", name, ":")
  ind <- grep(pattern, headers, ignore.case = TRUE)
  if (length(ind) == 0) return(NA_character_)
  trimws(sub(pattern, "", tail(headers[ind], 1), ignore.case = TRUE))
}
ind0 <- grep("^HTTP/", headers)
df$StatusCode <- sub("^HTTP/[0-9.]+ ([0-9]{3}).*$", "\\1", trimws(tail(headers[ind0], 1)))
df$ContentType <- lastHeader("Content-Type")
df$LastModified <- lastHeader("Last-Modified")
df$ContentLanguage <- lastHeader("Content-Language")
df$Location <- lastHeader("Location")
R – Scraper – Xpath
library(XML)
# parse the HTML returned by seocrawler()
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
# first H1 and first H2 of the page
H1 <- head(xpathSApply(doc, "//h1", xmlValue), 1)
H2 <- head(xpathSApply(doc, "//h2", xmlValue), 1)
# meta robots directive and canonical URL
robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'), 1)
canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'), 1)
# every link target on the page
DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')
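When a page has no H1, no meta robots or no canonical, head(..., 1) returns an empty result, and assigning that to a column of a one-row data.frame fails. A small sketch, assuming you prefer NA in that case; first_or_na() is a hypothetical helper, and the commented lines show how it would wrap the XML calls above.

```r
# Hypothetical helper: first element of an XPath result, or NA when the
# result is empty (no matching node) or NULL (node without the attribute)
first_or_na <- function(x) {
  if (length(x) == 0 || is.null(x[[1]])) return(NA_character_)
  as.character(x[[1]])
}

# With the XML package, e.g.:
# H1        <- first_or_na(xpathSApply(doc, "//h1", xmlValue))
# canonical <- first_or_na(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'))

row <- data.frame(H1 = first_or_na(character(0)),
                  canonical = first_or_na(c("https://example.com/", "https://example.com/b")),
                  stringsAsFactors = FALSE)
print(row)
```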
How to go parallel in R
R – Scraper – OpenMpi
• MPI (Message Passing Interface) is a specification for an API for passing
messages between different computers.
• Programming with MPI
• Difficult, because the Rmpi package defines about 110 R functions
• The foreach package needs a parallel backend to do the actual work in parallel
• The doMPI package acts as an adaptor to the Rmpi package, which in
turn is an R interface to an implementation of MPI
• Open MPI and Rmpi are very easy to install on Debian / Ubuntu
• You can test with one computer
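If MPI is more than you need on a single machine, base R's parallel package (shipped with R) offers a lighter option; this is a swapped-in technique, not the doMPI setup from the slides. A minimal sketch: fake_crawl() is a hypothetical stand-in for a real crawler, and mclapply() forks processes, so on Windows mc.cores must stay at 1 (parLapply() with makeCluster() is the usual alternative there).

```r
library(parallel)

# Hypothetical stand-in for a real crawl of one URL
fake_crawl <- function(url) {
  data.frame(url = url, length = nchar(url), stringsAsFactors = FALSE)
}

urls <- c("https://example.com/a", "https://example.com/bb", "https://example.com/ccc")

# Forked workers on Linux / macOS; one URL per task
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(urls, fake_crawl, mc.cores = cores)

# Combine the one-row results into a single data.frame (order is preserved)
crawl_df <- do.call(rbind, res)
print(crawl_df)
```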
R – Scraper – Install OpenMPI
# shell (CentOS / RHEL, which use yum)
sudo yum install openmpi openmpi-devel openmpi-libs
sudo ldconfig /usr/lib64/openmpi/lib/
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/"
# then, in R
install.packages("Rmpi",
  configure.args =
    c("--with-Rmpi-include=/usr/include/openmpi-x86_64/",
      "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/",
      "--with-Rmpi-type=OPENMPI"))
install.packages("doMPI")
R – Scraper – Test doMpi
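library(doMPI)
# start a cluster of 20 MPI workers and register it as the foreach backend
cl <- startMPIcluster(count = 20)
registerDoMPI(cl)
# mydataset (one row per URL) and seocrawlerThread(mydataset, i) are
# user-defined and not shown; each call returns one row, combined with rbind
n <- nrow(mydataset)
x <- foreach(i = 1:n, .combine = "rbind") %dopar% seocrawlerThread(mydataset, i)
# close the cluster and shut down MPI cleanly
closeCluster(cl)
mpi.quit()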
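The slide does not show seocrawlerThread() or mydataset. A sketch of what such a worker might look like, under these assumptions: mydataset has a url column, the worker returns one row per URL so that .combine = "rbind" works, and errors are caught so that one failing URL does not stop the whole foreach loop. The fetch argument is hypothetical and lets a stub replace seocrawler() for testing; the loop at the end runs sequentially and produces the same shape of result as the %dopar% call.

```r
# Hypothetical worker: crawl row i of dataset and return a one-row data.frame
seocrawlerThread <- function(dataset, i, fetch = seocrawler) {
  url <- dataset$url[i]
  # a failed request becomes NA instead of aborting the whole loop
  html <- tryCatch(fetch(url), error = function(e) NA_character_)
  data.frame(url = url,
             Size = if (is.na(html)) NA_integer_ else nchar(html),
             stringsAsFactors = FALSE)
}

# Stub used instead of a real HTTP call
fake_fetch <- function(url) {
  if (grepl("broken", url)) stop("timeout")
  paste0("<html>", url, "</html>")
}

mydataset <- data.frame(url = c("https://example.com/", "https://example.com/broken"),
                        stringsAsFactors = FALSE)

# Sequential equivalent of the foreach(...) %dopar% call
x <- do.call(rbind, lapply(seq_len(nrow(mydataset)),
                           function(i) seocrawlerThread(mydataset, i, fetch = fake_fetch)))
print(x)
```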
• Venn Matrix:
http://blog.mrbioinfo.com/
R – Semantic Analysis – Intro
R – Semantic Analysis – Data
R – Semantic Analysis – eVenn
library(eVenn)
# all.the.data: the keyword lists to compare, one per website
evenn(pathRes = "./eVenn/", matLists = all.the.data, annot = FALSE, CompName = "croisiere")
R – Semantic Analysis – Filter
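fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt"
# read the binary Venn matrix (tab-separated, one row per keyword)
DF <- read.csv(fichierVenn, sep = "\t", encoding = "CP1252", stringsAsFactors = FALSE)
# keep keywords found in at least 4 lists but absent from planete.croisiere.com
DF_PotentialKeywords <- subset(DF, Total_lists >= 4 & planete.croisiere.com == 0)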
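To make the filter concrete, a sketch on a small mock matrix, assuming VennMatrixBin.txt holds one 0/1 column per website plus a Total_lists count; the site columns and keywords below are made up. Total_lists is recomputed with rowSums() here only to show where it comes from.

```r
# Mock binary Venn matrix: 1 = the keyword appears in that website's list
venn <- data.frame(
  Keywords = c("croisiere pas cher", "croisiere mediterranee",
               "croisiere fluviale", "croisiere norvege"),
  site.a = c(1, 1, 1, 0),
  site.b = c(1, 1, 0, 1),
  site.c = c(1, 1, 0, 1),
  site.d = c(1, 0, 0, 1),
  planete.croisiere.com = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Number of lists containing each keyword
site_cols <- setdiff(names(venn), "Keywords")
venn$Total_lists <- rowSums(venn[, site_cols])

# Keywords shared by at least 4 lists but missing from planete.croisiere.com
potential <- subset(venn, Total_lists >= 4 & planete.croisiere.com == 0)
print(potential$Keywords)
```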
R – Semantic Analysis – nGram
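library(text2vec)
library(dplyr)
# iterate over the candidate keywords, lower-cased and split into words
# (argument names and vocab columns follow the text2vec version used here;
# later releases changed some of them)
it <- itoken(DF_PotentialKeywords[['Keywords']],
             preprocess_function = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)
# vocabulary of 2- and 3-grams
vocab <- create_vocabulary(it, ngram = c(2L, 3L))
DF_SEO_vocab <- data.frame(vocab$vocab)
# 30 most frequent n-grams
DF_SEO_select <- data.frame(word = DF_SEO_vocab$terms,
                            freq = DF_SEO_vocab$terms_counts) %>%
  arrange(-freq) %>%
  top_n(30, freq)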
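If text2vec is not available, or its API has moved on, the same 2- and 3-gram counts can be sketched in base R. This is a swapped-in approach, not text2vec: count_ngrams() is a hypothetical helper that lower-cases the keywords, splits them on non-alphanumeric characters and joins the words of each n-gram with "_".

```r
# Hypothetical helper: frequency of word n-grams across a vector of keywords
count_ngrams <- function(keywords, n = 2:3) {
  words <- strsplit(tolower(keywords), "[^[:alnum:]]+")
  grams <- unlist(lapply(words, function(w) {
    w <- w[w != ""]
    unlist(lapply(n, function(k) {
      if (length(w) < k) return(character(0))
      vapply(seq_len(length(w) - k + 1),
             function(i) paste(w[i:(i + k - 1)], collapse = "_"),
             character(1))
    }))
  }))
  if (length(grams) == 0) {
    return(data.frame(word = character(0), freq = integer(0)))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

kw <- c("Croisiere pas cher", "croisiere pas cher promo", "croisiere fluviale")
ngrams <- count_ngrams(kw)
# top 30, as on the slide: head(count_ngrams(DF_PotentialKeywords$Keywords), 30)
print(head(ngrams, 30))
```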
• dplyr
• readxl
• searchConsoleR
• googleAuthR
• googleAnalyticsR
R – Packages SEO
Thanks to Mark Edmondson
R – SearchConsoleR
library(googleAuthR)
library(searchConsoleR)
# OAuth client ID and secret from your Google API Console project
options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com")
options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX")
## change this to the website you want to download data for; include http(s)://
website <- "https://data-seo.fr"
## Search Console data is reliably available only after about 3 days,
## so download from today - 3 days
start <- Sys.Date() - 3
## one day of data; widen the range as needed
end <- Sys.Date() - 3
## what to download: choose among date, query, page, device, country
download_dimensions <- c('date', 'query')
## type of Google search: 'web', 'video' or 'image'
type <- c('web')
## Authorize the script with an account that has access to the website.
## The first time, you will need to log in to Google; the token should then
## auto-refresh, so the script can be scheduled.
googleAuthR::gar_auth()
## the first time, stop here and wait for authorization
## get the search analytics data
data <- search_analytics(siteURL = website, startDate = start, endDate = end,
                         dimensions = download_dimensions, searchType = type)
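Once the data is downloaded, a common next step is a per-query summary. A sketch on mock rows, assuming the returned data.frame has date, query, clicks and impressions columns (check names(data) on your own download). CTR is recomputed from the summed clicks and impressions rather than averaged, since averaging daily CTRs would give low-traffic days as much weight as busy ones.

```r
# Mock rows shaped like a date + query download (values are made up)
sc <- data.frame(
  date = as.Date(c("2017-06-01", "2017-06-01", "2017-06-02")),
  query = c("data seo", "seo r", "data seo"),
  clicks = c(10, 2, 5),
  impressions = c(100, 40, 60),
  stringsAsFactors = FALSE
)

# Sum clicks and impressions per query
by_query <- aggregate(cbind(clicks, impressions) ~ query, data = sc, FUN = sum)
# CTR from the totals
by_query$ctr <- by_query$clicks / by_query$impressions
# Most-clicked queries first
by_query <- by_query[order(-by_query$clicks), ]
print(by_query)
```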
• Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax)
• MIN: minute field, 0 to 59
• HOUR: hour field, 0 to 23
• DOM: day of month, 1 to 31
• MON: month field, 1 to 12
• DOW: day of week, 0 to 6 (0 = Sunday)
• CMD: command, any command to be executed
• Edit your crontab with: $ crontab -e
• To run the R script filePath.R at 23:15 every day of the year:
15 23 * * * Rscript filePath.R
R – CronTab – Method 1
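The field order above can be sketched as a small R helper that builds a crontab line and checks single values against the allowed ranges. cron_line() is hypothetical; it accepts only "*" or a single number per field, not lists, ranges or steps such as 1-5 or */15.

```r
# Hypothetical helper: build a crontab line, validating single-value fields
cron_line <- function(cmd, minute = "*", hour = "*", dom = "*", mon = "*", dow = "*") {
  check <- function(x, lo, hi, name) {
    if (!identical(x, "*") && !(x %in% lo:hi)) {
      stop(name, " must be '*' or a value in ", lo, "-", hi)
    }
  }
  check(minute, 0, 59, "MIN")
  check(hour, 0, 23, "HOUR")
  check(dom, 1, 31, "DOM")
  check(mon, 1, 12, "MON")
  check(dow, 0, 6, "DOW")
  paste(minute, hour, dom, mon, dow, cmd)
}

# Same schedule as above: 23:15 every day
print(cron_line("Rscript filePath.R", minute = 15, hour = 23))
```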
• R Package : https://github.com/bnosac/cronR
R – Cron – Method 2
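library(cronR)
# cmd: the command cron will run; cron_rscript() builds it for an R script
cmd <- cron_rscript("/path/to/filePath.R")
# hourly job, on Mondays and Tuesdays only
cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20',
         days_of_week = c(1, 2))
# every day at 14:20
cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20')
# at 14:20 on Sundays, Wednesdays and Fridays
cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20',
         days_of_week = c(0, 3, 5))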
Automated Reports
2. Shiny Server
Creating web apps with R
Shiny Server - Why
Shiny Server – Where and How
• ShinyApps.io
• A local server
• Hosted on your server
• docker run --rm -p 3838:3838 \
    -v /srv/shinyapps/:/srv/shiny-server/ \
    -v /srv/shinylog/:/var/log/ \
    rocker/shiny
• If you have an app in /srv/shinyapps/appdir, you can run the app
by visiting http://yourIP:3838/appdir/.
Shiny Server - Install
Shiny – ui.R
library(shiny)

fluidPage(
  titlePanel("Compute your internal pagerank"),
  sidebarLayout(
    sidebarPanel(
      a("data-seo.com", href = "https://data-seo.com"),
      tags$hr(),
      p('Step 1: Export your outlinks data from ScreamingFrog'),
      fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)',
                accept = c('text/csv'), multiple = FALSE),
      tags$hr(),
      downloadButton('downloadData', 'Download CSV')
    ),
    mainPanel(
      h3(textOutput("caption")),
      tags$hr(),
      tableOutput('contents')
    )
  )
)
Shiny – server.R
function(input, output, session) {
  # .... (rest of the server code, not shown on the slide)

  # importLogs(): the author's own import helper, not shown on the slide
  output$contents <- renderTable({
    if (!is.null(input$file1)) {
      inFile <- input$file1
      logsSummary <- importLogs(inFile$datapath)
      logsSummary
    }
  })

  # download the same table as a semicolon-separated CSV (write.csv2)
  output$downloadData <- downloadHandler(
    filename = "extract.csv",
    content = function(file) {
      if (!is.null(input$file1)) {
        inFile <- input$file1
        logsSummary <- importLogs(inFile$datapath)
        write.csv2(logsSummary, file, row.names = FALSE)
      }
    }
  )
}
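The app's title promises an internal PageRank, but the helper behind it is not shown. A minimal power-iteration sketch of that computation, assuming an outlinks export with one row per link (for example Source and Destination columns; check them against your own ScreamingFrog file). internal_pagerank() is hypothetical: it uses the classic damping factor of 0.85, spreads the rank of pages without outlinks evenly over all pages, and counts duplicate links as separate links.

```r
# Hypothetical helper: PageRank by power iteration over an internal link graph
internal_pagerank <- function(source, destination, damping = 0.85,
                              max_iter = 100, tol = 1e-10) {
  pages <- sort(unique(c(source, destination)))
  n <- length(pages)
  from <- match(source, pages)
  to <- match(destination, pages)
  outdeg <- tabulate(from, nbins = n)  # number of outlinks per page
  pr <- rep(1 / n, n)
  for (k in seq_len(max_iter)) {
    # each link passes on its source's rank divided by the source's outlinks
    flow <- pr[from] / outdeg[from]
    agg <- rowsum(flow, to)
    incoming <- numeric(n)
    incoming[as.integer(rownames(agg))] <- agg[, 1]
    # rank of pages with no outlinks is spread over all pages
    dangling <- sum(pr[outdeg == 0])
    new_pr <- (1 - damping) / n + damping * (incoming + dangling / n)
    converged <- sum(abs(new_pr - pr)) < tol
    pr <- new_pr
    if (converged) break
  }
  res <- data.frame(page = pages, pagerank = pr, stringsAsFactors = FALSE)
  res <- res[order(-res$pagerank), ]
  rownames(res) <- NULL
  res
}

# e.g. links <- read.csv("all_outlinks.csv", stringsAsFactors = FALSE)
#      internal_pagerank(links$Source, links$Destination)
ranks <- internal_pagerank(c("/", "/a", "/b", "/a"), c("/a", "/b", "/", "/"))
print(ranks)
```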
https://mark.shinyapps.io/GA-dashboard-demo
Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo
• Interactive trend graphs.
• Auto-updating Google Analytics data.
• Zoomable day-of-week heatmaps.
• Top Level Trends via Year on Year, Month on Month
and Last Month vs Month Last Year data modules.
• A MySQL connection for data blending your own data with GA data.
• An easy upload option to update a MySQL database.
• Analysis of the impact of marketing events via Google's CausalImpact.
• Detection of unusual time-points using Twitter's Anomaly Detection.
Shiny – Use case
Automated KPI reporting
3. Jupyter Notebook
Sharing source code with your SEO team
Jupyter Notebook Example
• Reproducibility
• Quality
• Discoverability
• Learning
Jupyter Notebook – Why?
Step 1: Install Python 2.7 and pip
$ sudo apt-get update
$ sudo apt-get -y install python2.7 python-pip python-dev
Step 2: Install IPython and Jupyter Notebook
$ sudo apt-get -y install ipython ipython-notebook
$ sudo -H pip install jupyter
Step 3: Run Jupyter Notebook
$ jupyter notebook
Jupyter Notebook Install
Notebook Example
• https://github.com/voltek62/RNotebook-SEO
• Semantic Analysis for SEO
• Scraper for SEO
Jupyter Notebook Examples
Process Validation
Documentation
4. Dataiku
Use AML to find the best algorithm
Automated Machine Learning
• Benchmarking
• Detecting Target Leakage
• Diagnostics
• Automation
$ adduser vincent sudo
$ sudo apt-get install default-jre
$ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz
$ tar xzf dataiku-dss-4.0.1.tar.gz
$ cd dataiku-dss-4.0.1
>> install all prerequisites
$ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java
>> install Dataiku DSS
$ ./installer.sh -d DATA_DIR -p 11000
$ DATA_DIR/bin/dss start
Then open http://<your server address>:11000 in a browser.
Dataiku - Install on Instance Cloud
Go to the DSS data dir
$ cd DATA_DIR
Stop DSS
$ ./bin/dss stop
Run the installation script
$ ./bin/dssadmin install-R-integration
$ ./bin/dss start
Dataiku - Install R
Install R Package
Use case:
Detect Featured Snippets
• Get all your featured snippets with Ranxplorer
• Get the SERP for each keyword with Ranxplorer
• Use a homemade scraper to enrich the data:
• 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location'
• 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion'
• 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp'
• 'Wordcount' 'Size' 'ResponseTime'
• 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion'
• 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion'
• Use AML to find the most important features
Dataiku : Featured Snippet
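The slide lists features such as TitleLength, TitleDist and TitleIsQuestion without defining them. A sketch of one possible definition, assuming TitleDist is the case-insensitive edit distance between keyword and title (computed with utils::adist()) and that a title counts as a question when it ends with "?" or starts with a question word. The word list mixes English and French and is a hypothetical choice; the same function could be applied to H1 and H2.

```r
# Hypothetical list of words that usually open a question (English and French)
question_words <- c("how", "what", "why", "when", "where", "which", "who",
                    "comment", "pourquoi", "quand", "quel", "quelle", "qui")

# TRUE when the text ends with "?" or starts with a question word
is_question <- function(text) {
  txt <- tolower(trimws(text))
  first_word <- sub("[[:space:]].*$", "", txt)
  grepl("\\?$", txt) | first_word %in% question_words
}

# One row of title features per keyword/title pair
title_features <- function(keyword, title) {
  data.frame(
    Keyword = keyword,
    Title = title,
    TitleLength = nchar(title),
    # case-insensitive Levenshtein distance between keyword and title
    TitleDist = mapply(function(k, ti) adist(k, ti, ignore.case = TRUE)[1, 1],
                       keyword, title, USE.NAMES = FALSE),
    TitleIsQuestion = is_question(title),
    stringsAsFactors = FALSE
  )
}

feats <- title_features(c("data seo", "seo r"),
                        c("Data SEO", "How to do SEO with R?"))
print(feats)
```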
Dataiku : Flow
Dataiku : Input / Output
Dataiku : Code Recipe
Dataiku : Visual Recipes
Dataiku : Plugin recipes
Dataiku : My Plugins
• SEMrush
• SearchConsole
• Majestic
• Visiblis [ongoing]
A DSS plugin is a zip file.
Inside DSS, click the top right gear → Administration → Plugins → Store.
https://github.com/voltek62/Dataiku-SEO-Plugins
Dataiku : AML
Dataiku : Import a project
• Learn from the success of others with AML
• Use all methods at your disposal to show Google you are the
answer to the question (Title, H1, H2, …).
Dataiku : Results
Automated Machine Learning
• Yes, you can, because it brings:
• Great advertising
• Customers for specific features and training
Open Source & SEO?
• Show your work
• Attract talent
• Teach the next generation
• Automated Reports with Rstudio Server
• Automated KPI reporting with Shiny Server
• Process Validation Documentation with Jupyter Notebook
• Automated Machine Learning with Dataiku
Take away
Now that machines can learn and adapt,
it is time to seize the opportunity
to create new jobs:
Data-SEO, Data-Doctor, Data-Journalist …
Thank you!
Vincent Terrasi
@vincentterrasi
Get all my latest discoveries and updates

毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxviniciusperissetr
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 

Recently uploaded (20)

Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 

How to automate all your SEO projects

  • 1. How to automate all your SEO projects @VincentTerrasi OVH
  • 2. Planning
    • Each day: advanced reporting, anomaly detection, log analysis, web performance with SiteSpeed.io
    • Each week: ranking monitoring, opportunity detection, hot-topic detection
    • Each quarter: semantic analysis
    Time is precious: automate everything.
  • 3. 1. RStudio Server 2. Shiny Server 3. Jupyter Notebook 4. Dataiku 5. OpenSource
  • 4. searchConsoleR · Docker · ATinternetR · oncrawlR · RStudio Server · Shiny Server · Dataiku · DataLake · Scheduled Email · Notebook · DataAPI · Shiny Apps · DataViz Reports
  • 5. 1. RStudio Server Automate all your SEO projects
  • 6. Why R? Scriptable, big community, runs on Mac / PC / Unix, open source, free, 10,000 packages
  • 7. WheRe? How? Rgui, RStudio, https://www.cran.r-project.org
  • 10. RStudio Server - Install
    • Docker on an Ubuntu 16.04 server
    • From a terminal on the server, run: sudo docker run -d -p 8787:8787 rocker/rstudio
    • Then open http://yourIP:8787; you should be greeted by the RStudio welcome screen.
    • Log in with username rstudio and password rstudio.
  • 11. R – Scraper – Packages
    • install.packages("httr")
    • install.packages("RCurl")
    • install.packages("XML")   # provides htmlParse() and xpathSApply(), used by the XPath slide
    • install.packages("stringr")
    • install.packages("stringi")
    • install.packages("openssl")
    • install.packages("Rmpi")
    • install.packages("doMPI")
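Rather than running install.packages() by hand for each package, a setup script can install only what is missing, which makes it safe to re-run every time a fresh rocker/rstudio container starts. This is a minimal base-R sketch; install_missing() and its dry_run flag are hypothetical helpers, not part of any package. Rmpi usually needs the configure arguments shown in the OpenMPI install slide, so a plain install may fail for it.

```r
# Hypothetical helper: install only the packages that are not already present.
# With dry_run = TRUE it installs nothing and just reports what is missing.
install_missing <- function(pkgs,
                            installed = rownames(utils::installed.packages()),
                            dry_run = FALSE) {
  todo <- setdiff(pkgs, installed)
  if (length(todo) > 0 && !dry_run) {
    utils::install.packages(todo)
  }
  todo  # packages that were (or, with dry_run = TRUE, would be) installed
}

# Package list used by the scraper slides
scraper_pkgs <- c("httr", "RCurl", "XML", "stringr", "stringi",
                  "openssl", "Rmpi", "doMPI")
```

Usage: `install_missing(scraper_pkgs)` at the top of a provisioning script, or `install_missing(scraper_pkgs, dry_run = TRUE)` to only list what is missing.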
  • 12. R – Scraper – RCurl
    library(RCurl)
    seocrawler <- function(url) {
      useragent <- "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25"
      # h collects the raw response header lines while getURL() fetches the body;
      # it is local to this function, so return it as well if the headers are needed afterwards
      h <- basicTextGatherer()
      html <- getURL(url,
                     followlocation = TRUE,
                     ssl.verifypeer = FALSE,
                     httpheader = c('User-Agent' = useragent),
                     headerfunction = h$update)
      return(html)
    }
  • 13. R – Scraper – Header
    # h: the basicTextGatherer() passed as headerfunction to getURL(); df: a one-row data frame
    headers <- h$value(NULL)
    ind0 <- grep("^HTTP/", headers)
    df$StatusCode <- trimws(tail(headers[ind0], 1))
    ind1 <- grep("^Content-Type", headers)
    df$ContentType <- trimws(gsub("Content-Type:", "", tail(headers[ind1], 1)))
    ind2 <- grep("^Last-Modified", headers)
    df$LastModified <- trimws(gsub("Last-Modified:", "", tail(headers[ind2], 1)))
    ind3 <- grep("^Content-Language", headers)
    df$ContentLanguage <- trimws(gsub("Content-Language:", "", tail(headers[ind3], 1)))
    ind4 <- grep("^Location", headers)   # anchored, so Content-Location does not match
    df$Location <- trimws(gsub("Location:", "", tail(headers[ind4], 1)))
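The header slide looks each field up by name and keeps the last match. The same idea can be packaged as one function returning a one-row data frame, which also handles two pitfalls: a missing header yields NA instead of an empty vector, and names are anchored at the start of the line, so Content-Location is never read as Location. This is a sketch under stated assumptions: parse_headers() is a hypothetical helper, and its input is assumed to be the character vector returned by h$value(NULL).

```r
# Hypothetical helper: turn raw header lines into a one-row data frame.
# With followlocation = TRUE, one header block is collected per hop, so each
# field keeps its LAST occurrence; Location only appears on redirect hops.
parse_headers <- function(raw) {
  lines <- unlist(strsplit(raw, "\r?\n"))
  lines <- lines[nzchar(lines)]

  last_value <- function(name) {
    # Anchor on "^Name:" so that e.g. "Content-Location" never matches "Location"
    hits <- grep(paste0("^", name, ":"), lines, ignore.case = TRUE, value = TRUE)
    if (length(hits) == 0) return(NA_character_)
    trimws(sub("^[^:]+:", "", hits[length(hits)]))
  }

  # Keep only the 3-digit code of the last status line, e.g. "200"
  status_lines <- grep("^HTTP/", lines, value = TRUE)
  status <- if (length(status_lines) == 0) NA_character_ else
    sub("^HTTP/[0-9.]+ ([0-9]{3}).*$", "\\1", status_lines[length(status_lines)])

  data.frame(
    StatusCode      = status,
    ContentType     = last_value("Content-Type"),
    LastModified    = last_value("Last-Modified"),
    ContentLanguage = last_value("Content-Language"),
    Location        = last_value("Location"),
    stringsAsFactors = FALSE
  )
}
```

Usage: `df <- parse_headers(h$value(NULL))`, then bind the rows of several pages with `rbind()`.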
  • 14. R – Scraper – XPath
    library(XML)
    doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
    H1 <- head(xpathSApply(doc, "//h1", xmlValue), 1)
    H2 <- head(xpathSApply(doc, "//h2", xmlValue), 1)
    robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'), 1)
    canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'), 1)
    DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')   # all link targets on the page
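When a page has no H1, no robots meta tag or no canonical link, head(xpathSApply(...), 1) returns character(0), and assigning that to a column of a one-row data frame fails. A small guard turns such results into NA. This is a sketch; first_or_na() is a hypothetical helper, and trimming whitespace is an added assumption (text extracted with xmlValue often carries surrounding spaces).

```r
# Hypothetical helper: first extracted value, or NA when nothing matched.
# xpathSApply() can return character(0) (no matching node) or a list with
# NULL entries (matched nodes without the requested attribute).
first_or_na <- function(x) {
  x <- unlist(x)  # drops NULL entries from list results
  if (length(x) == 0) NA_character_ else trimws(as.character(x[[1]]))
}
```

Usage: `df$H1 <- first_or_na(xpathSApply(doc, "//h1", xmlValue))`, and likewise for H2, robots and canonical.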
  • 17. R – Scraper – OpenMPI
    • MPI (Message Passing Interface) is a specification for an API for passing messages between different computers.
    • Programming directly with MPI is difficult: the Rmpi package alone defines about 110 R functions.
    • foreach needs a parallel backend to do the actual work in parallel: the doMPI package acts as an adaptor to the Rmpi package, which in turn is an R interface to an implementation of MPI.
    • Open MPI and Rmpi are very easy to install on Debian / Ubuntu.
    • You can test everything on a single computer.
  • 18. R – Scraper – Install OpenMPI (commands for CentOS/RHEL, which use yum)
    sudo yum install openmpi openmpi-devel openmpi-libs
    sudo ldconfig /usr/lib64/openmpi/lib/
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/"
    Then, in R:
    install.packages("Rmpi", configure.args = c("--with-Rmpi-include=/usr/include/openmpi-x86_64/", "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/", "--with-Rmpi-type=OPENMPI"))
    install.packages("doMPI")
  • 19. R – Scraper – Test doMPI
    library(doMPI)   # also attaches foreach, which provides %dopar%
    # start your cluster
    cl <- startMPIcluster(count = 20)
    registerDoMPI(cl)
    n <- nrow(mydataset)
    x <- foreach(i = 1:n, .combine = "rbind") %dopar% seocrawlerThread(mydataset, i)
    # close your cluster
    closeCluster(cl)
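Before setting up Open MPI, the crawl loop can be checked on one machine with R's built-in parallel package. This swaps in a different technique (a local socket cluster via makeCluster() and parLapply(), not MPI), so treat it as a testing sketch only. crawl_all() is hypothetical, and fake_worker() is a hypothetical stand-in for seocrawlerThread() that does no network access.

```r
library(parallel)

# Hypothetical stand-in for seocrawlerThread(): returns one row per URL
fake_worker <- function(url) {
  data.frame(url = url, length = nchar(url), stringsAsFactors = FALSE)
}

# Run worker() over all URLs, in parallel when cores > 1, and stack the rows
crawl_all <- function(urls, worker, cores = 1) {
  if (cores > 1) {
    cl <- makeCluster(cores)          # local socket cluster, no MPI needed
    on.exit(stopCluster(cl))
    rows <- parLapply(cl, urls, worker)
  } else {
    rows <- lapply(urls, worker)      # serial fallback, easiest to debug
  }
  do.call(rbind, rows)                # same role as .combine = "rbind"
}
```

Usage: `crawl_all(urls, fake_worker, cores = 4)`; once the output looks right, swap in the real worker and the doMPI backend.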
  • 20. R – Semantic Analysis – Intro • Venn matrix: http://blog.mrbioinfo.com/
  • 21. R – Semantic Analysis – Data
  • 23. R – Semantic Analysis – eVenn
    library(eVenn)
    evenn(pathRes = "./eVenn/", matLists = all.the.data, annot = FALSE, CompName = "croisiere")
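  • 24. R – Semantic Analysis – Filter
    fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt"
    # read the tab-separated Venn matrix written by evenn() (file assumed to be CP1252-encoded)
    DF <- read.csv(fichierVenn, sep = "\t", fileEncoding = "CP1252", stringsAsFactors = FALSE)
    # keywords found in at least 4 lists but absent from planete.croisiere.com
    DF_PotentialKeywords <- subset(DF, Total_lists >= 4 & planete.croisiere.com == 0)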
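eVenn's binary matrix has one row per keyword, one 0/1 column per list and a Total_lists count. For small inputs, the same matrix and the filter can be reproduced in base R, which helps when checking the eVenn output. This is a sketch assuming a named list of keyword vectors, one per site; build_venn_matrix(), potential_keywords() and the site names in the example are hypothetical.

```r
# Hypothetical helper: one row per keyword, a 0/1 column per list,
# plus Total_lists = number of lists containing the keyword
build_venn_matrix <- function(lists) {
  keywords <- sort(unique(unlist(lists)))
  m <- sapply(lists, function(kw) as.integer(keywords %in% kw))
  # force a matrix shape even when there is a single keyword
  m <- matrix(m, nrow = length(keywords), dimnames = list(NULL, names(lists)))
  df <- data.frame(Keywords = keywords, m, check.names = FALSE,
                   stringsAsFactors = FALSE)
  df$Total_lists <- rowSums(m)
  df
}

# Keywords shared by at least min_lists lists but missing from own_site
potential_keywords <- function(venn, own_site, min_lists = 4) {
  keep <- venn$Total_lists >= min_lists & venn[[own_site]] == 0
  venn[keep, "Keywords"]
}
```

Usage: `potential_keywords(build_venn_matrix(lists), "own.com", min_lists = 4)` mirrors the subset() call above.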
  • 25. R – Semantic Analysis – nGram
    # written for text2vec 0.5+, where create_vocabulary() returns a data frame with term, term_count and doc_count columns
    library(text2vec)
    library(dplyr)
    it <- itoken(DF_PotentialKeywords[['Keywords']],
                 preprocessor = tolower,
                 tokenizer = word_tokenizer,
                 progressbar = FALSE)
    # 2- and 3-grams
    vocab <- create_vocabulary(it, ngram = c(2L, 3L))
    DF_SEO_select <- data.frame(word = vocab$term, freq = vocab$term_count) %>%
      arrange(-freq) %>%
      top_n(30, freq)
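The text2vec step extracts 2- and 3-grams from the candidate keywords and keeps the most frequent ones. A dependency-free base-R sketch of the same counting can help sanity-check that vocabulary. count_ngrams() is hypothetical, and it splits words on any non-alphanumeric character, which is an assumption rather than text2vec's word_tokenizer behaviour; whether accented letters count as alphanumeric depends on the locale.

```r
# Hypothetical helper: count word n-grams (default 2- and 3-grams)
count_ngrams <- function(keywords, n = 2:3, top = 30) {
  grams <- unlist(lapply(tolower(keywords), function(kw) {
    words <- strsplit(kw, "[^[:alnum:]]+")[[1]]
    words <- words[nzchar(words)]
    unlist(lapply(n, function(k) {
      if (length(words) < k) return(character(0))
      # slide a window of k words over the keyword
      vapply(seq_len(length(words) - k + 1),
             function(i) paste(words[i:(i + k - 1)], collapse = " "),
             character(1))
    }))
  }))
  if (length(grams) == 0) {
    return(data.frame(word = character(0), freq = integer(0)))
  }
  freq <- sort(table(grams), decreasing = TRUE)
  head(data.frame(word = names(freq), freq = as.integer(freq),
                  stringsAsFactors = FALSE), top)
}
```

Usage: `count_ngrams(DF_PotentialKeywords$Keywords)` should rank the same n-grams as the text2vec vocabulary, up to tokenisation differences.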
  • 27. R – Packages SEO (thanks to Mark Edmondson) • dplyr • readxl • searchConsoleR • googleAuthR • googleAnalyticsR
  • 28. R – SearchConsoleR
    library(googleAuthR)
    library(searchConsoleR)
    # get your client ID and secret from the Google API Console
    options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com")
    options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX")
    ## change this to the website you want to download data for; include http(s)://
    website <- "https://data-seo.fr"
    ## Search Console data is only reliably available after about 3 days, so download from then:
    ## today - 3 days
    start <- Sys.Date() - 3
    ## one day of data; change it as needed
    end <- Sys.Date() - 3
  • 29. R – SearchConsoleR
    ## what to download: choose among date, query, page, device, country
    download_dimensions <- c('date', 'query')
    ## type of Google search: 'web', 'video' or 'image'
    type <- c('web')
    ## Authorize the script with Search Console, using an account that has access to the website.
    ## The first time, you will need to log in to Google; after that the token should auto-refresh,
    ## so the script can run unattended.
    googleAuthR::gar_auth()
    ## first time: stop here and wait for authorisation
    ## get the search analytics data
    data <- search_analytics(siteURL = website, startDate = start, endDate = end,
                             dimensions = download_dimensions, searchType = type)
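search_analytics() returns one row per combination of the requested dimensions (here date and query) together with the traffic metrics. For multi-day ranges, a per-query summary is often what a report needs. This sketch assumes the result has columns named query, clicks, impressions and position; summarise_queries() is hypothetical, and weighting average position by impressions is a design choice.

```r
# Hypothetical helper: collapse date x query rows into one row per query
summarise_queries <- function(data) {
  # total clicks and impressions per query
  agg <- aggregate(cbind(clicks, impressions) ~ query, data = data, FUN = sum)
  # impression-weighted average position per query
  data$pos_x_impr <- data$position * data$impressions
  pos <- aggregate(pos_x_impr ~ query, data = data, FUN = sum)
  agg <- merge(agg, pos, by = "query")
  agg$position <- agg$pos_x_impr / agg$impressions
  agg$pos_x_impr <- NULL
  agg$ctr <- agg$clicks / agg$impressions
  agg[order(-agg$clicks), ]   # most-clicked queries first
}
```

Usage: `summarise_queries(data)` right after the search_analytics() call, before writing the report.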
  • 31. R – CronTab – Method 1
    Crontab fields and allowed ranges (Linux crontab syntax):
    • MIN (minute): 0 to 59
    • HOUR (hour): 0 to 23
    • DOM (day of month): 1 to 31
    • MON (month): 1 to 12
    • DOW (day of week): 0 to 6
    • CMD (command): any command to be executed
    Edit your crontab with: $ crontab -e
    To run the R script filePath.R every day of the year at 23:15: 15 23 * * * Rscript filePath.R
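The five time fields always come first, in the order MIN HOUR DOM MON DOW, followed by the command. When crontab entries are generated for several scripts, a tiny helper can build them from R. cron_line() is hypothetical and only covers fixed minute and hour schedules, with "*" as the default for the other fields.

```r
# Hypothetical helper: build one crontab line "MIN HOUR DOM MON DOW CMD"
cron_line <- function(script, minute = 0, hour = 0,
                      dom = "*", mon = "*", dow = "*",
                      rscript = "Rscript") {
  stopifnot(minute >= 0, minute <= 59, hour >= 0, hour <= 23)
  paste(minute, hour, dom, mon, dow, rscript, script)
}

cron_line("filePath.R", minute = 15, hour = 23)
```

Because cron runs jobs with a minimal environment, absolute paths to Rscript and to the script are usually safer in the generated line.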
  • 32. R – Cron – Method 2
    • R package: https://github.com/bnosac/cronR
    library(cronR)
    cmd <- cron_rscript("filePath.R")   # builds the Rscript command line for your script
    cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20', days_of_week = c(1, 2))
    cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20')
    cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20', days_of_week = c(0, 3, 5))
  • 34. 2. Shiny Server Creating webapps with R
  • 36. Shiny Server – Where and How • ShinyApps.io • A local server • Hosted on your server
  • 37. Shiny Server - Install
    • docker run --rm -p 3838:3838 -v /srv/shinyapps/:/srv/shiny-server/ -v /srv/shinylog/:/var/log/ rocker/shiny
    • If you have an app in /srv/shinyapps/appdir, you can run it by visiting http://yourIP:3838/appdir/.
  • 38. Shiny – ui.R
    fluidPage(
      titlePanel("Compute your internal pagerank"),
      sidebarLayout(
        sidebarPanel(
          a("data-seo.com", href = "https://data-seo.com"),
          tags$hr(),
          p('Step 1 : Export your outlinks data from ScreamingFrog'),
          fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)',
                    accept = c('text/csv'),
                    multiple = FALSE),
          tags$hr(),
          downloadButton('downloadData', 'Download CSV')
        ),
        mainPanel(
          h3(textOutput("caption")),
          tags$hr(),
          tableOutput('contents')
        )
      )
    )
  • 39. Shiny – server.R
    function(input, output, session) {
      ....
      # show the processed upload as a table
      output$contents <- renderTable({
        if (!is.null(input$file1)) {
          inFile <- input$file1
          logsSummary <- importLogs(inFile$datapath)   # importLogs(): the author's helper, not shown
          logsSummary
        }
      })
      # let the user download the same result as CSV
      output$downloadData <- downloadHandler(
        filename = "extract.csv",
        content = function(file) {
          if (!is.null(input$file1)) {
            inFile <- input$file1
            logsSummary <- importLogs(inFile$datapath)
            write.csv2(logsSummary, file, row.names = FALSE)
          }
        }
      )
    }
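The ui.R above promises an internal PageRank computed from a ScreamingFrog outlinks export, while the server.R shown delegates the work to a helper that is not on the slide. Here is a plain power-iteration PageRank over an edge list in base R, one possible shape for that computation. Assumptions: the export has Source and Destination columns, repeated links between the same two pages count once, pages without outlinks spread their score evenly, and damping is 0.85. internal_pagerank() is hypothetical, not the author's implementation.

```r
# Hypothetical helper: PageRank by power iteration over an edge list
internal_pagerank <- function(from, to, damping = 0.85,
                              tol = 1e-10, max_iter = 200) {
  pages <- sort(unique(c(from, to)))
  n <- length(pages)
  # one row per distinct link; repeated links between two pages count once
  edges <- unique(data.frame(i = match(from, pages), j = match(to, pages)))
  outdeg <- tabulate(edges$i, nbins = n)
  pr <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    # each page passes its score equally to the pages it links to
    flow <- pr[edges$i] / outdeg[edges$i]
    incoming <- tapply(flow, factor(edges$j, levels = seq_len(n)), sum)
    incoming[is.na(incoming)] <- 0
    # pages without outlinks spread their score evenly over all pages
    dangling <- sum(pr[outdeg == 0])
    new_pr <- (1 - damping) / n +
      damping * (as.numeric(incoming) + dangling / n)
    converged <- sum(abs(new_pr - pr)) < tol
    pr <- new_pr
    if (converged) break
  }
  res <- data.frame(page = pages, pagerank = pr, stringsAsFactors = FALSE)
  res[order(-res$pagerank), ]
}
```

Usage: `links <- read.csv("all_outlinks.csv", stringsAsFactors = FALSE)` followed by `internal_pagerank(links$Source, links$Destination)`; the scores sum to 1, so they can be compared across crawls of the same site.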
  • 40. https://mark.shinyapps.io/GA-dashboard-demo Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo • Interactive trend graphs. • Auto-updating Google Analytics data. • Zoomable day-of-week heatmaps. • Top Level Trends via Year on Year, Month on Month and Last Month vs Month Last Year data modules. • A MySQL connection for data blending your own data with GA data. • An easy upload option to update a MySQL database. • Analysis of the impact of marketing events via Google's CausalImpact. • Detection of unusual time-points using Twitter's Anomaly Detection. Shiny – Use case
  • 43. 3. Jupyter Notebook Sharing source code with your SEO team
  • 45. • Reproducibility • Quality • Discoverability • Learning Jupyter Notebook – Why ?
  • 46. Jupyter Notebook Install
    Step 1: Installing Python 2.7 and pip
    $ sudo apt-get update
    $ sudo apt-get -y install python2.7 python-pip python-dev
    Step 2: Installing IPython and Jupyter Notebook
    $ sudo apt-get -y install ipython ipython-notebook
    $ sudo -H pip install jupyter
    Step 3: Running Jupyter Notebook
    $ jupyter notebook
  • 48. • https://github.com/voltek62/RNotebook-SEO • Semantic Analysis for SEO • Scraper for SEO Jupyter Notebook Examples
  • 50. 4. Dataiku Use AML to find the best algorithm
  • 51. Automated Machine Learning • Benchmarking • Detecting Target Leakage • Diagnostics • Automation
  • 52. Dataiku - Install on Instance Cloud
    $ adduser vincent sudo
    $ sudo apt-get install default-jre
    $ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz
    $ tar xzf dataiku-dss-4.0.1.tar.gz
    $ cd dataiku-dss-4.0.1
    Install all prerequisites:
    $ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java
    Install Dataiku DSS:
    $ ./installer.sh -d DATA_DIR -p 11000
    $ DATA_DIR/bin/dss start
    Then open http://<your server address>:11000.
  • 53. Dataiku - Install R
    Go to the DSS data directory: $ cd DATA_DIR
    Stop DSS: $ ./bin/dss stop
    Run the installation script: $ ./bin/dssadmin install-R-integration
    Restart DSS: $ ./bin/dss start
  • 56. Dataiku : Featured Snippet
    • Get all your featured snippets with Ranxplorer
    • Get the SERP for each keyword with Ranxplorer
    • Use a homemade scraper to enrich the data:
      • 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location'
      • 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion'
      • 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp'
      • 'Wordcount' 'Size' 'ResponseTime'
      • 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion'
      • 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion'
    • Use AML to find the most important features
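Several of the listed features can be computed directly from the keyword and the page title. This sketch assumes that TitleDist is the edit (Levenshtein) distance between keyword and title, computed with utils::adist(), and that a title "is a question" when it ends with "?" or starts with a common English or French question word. Both definitions, the word list and title_features() are assumptions, not the author's definitions; the H1 and H2 features could be derived the same way.

```r
# Assumed list of question words (English and French, ASCII only)
question_regex <- "^(how|what|why|when|where|who|which|comment|pourquoi|quand|quel|quelle|qui)\\b"

# Hypothetical helper: title features for keyword/title pairs
title_features <- function(keyword, title) {
  keyword <- tolower(trimws(keyword))
  title   <- tolower(trimws(title))
  data.frame(
    Keyword = keyword,
    TitleLength = nchar(title),
    # assumption: TitleDist = Levenshtein distance between keyword and title
    TitleDist = mapply(function(k, t) as.integer(adist(k, t)), keyword, title,
                       USE.NAMES = FALSE),
    TitleIsQuestion = grepl("\\?$", title) |
      grepl(question_regex, title, perl = TRUE),
    stringsAsFactors = FALSE
  )
}
```

Usage: `title_features(serp$Keyword, serp$Title)` on the scraped SERP data, then join the result back before training the AML model.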
  • 58. Dataiku : Input / Output
  • 59. Dataiku : Code Recipe
  • 62. Dataiku : Visual Recipes
  • 63. Dataiku : Plugin recipes
  • 64. Dataiku : My Plugins • SEMrush • SearchConsole • Majestic • Visiblis [ongoing] A DSS plugin is a zip file. Inside DSS, click the top right gear → Administration → Plugins → Store. https://github.com/voltek62/Dataiku-SEO-Plugins
  • 66. Dataiku : Import a project
  • 67. Dataiku : Results • Learn from the success of others with AML • Use every method at your disposal to show Google that you are the answer to the question (Title, H1, H2, …)
  • 70. Open Source & SEO? • Yes, you can, because it means: • Great advertising • Customers for specific features and trainings • Showing your work • Attracting talent • Teaching the next generation
  • 71. • Automated Reports with Rstudio Server • Automated KPI reporting with Shiny Server • Process Validation Documentation with Jupyter Notebook • Automated Machine Learning with Dataiku Take away
  • 72. Now that machines can learn and adapt, it is time to take advantage of the opportunity to create new jobs: Data-SEO, Data-Doctor, Data-Journalist …
  • 74. Vincent Terrasi @vincentterrasi Get all my last discoveries and updates

Editor's Notes

  1. HOW?
  2. R is a programming language dedicated to statistics and data science. The best-known implementation of the R language is GNU R.
  3. HTTP response header: collect the header contents of an HTTP response.
  4. itoken: this function creates iterators over input objects, used to build vocabularies, corpora, or DTM and TCM matrices. The iterator is usually consumed by the following functions: create_vocabulary, create_corpus, create_dtm, vectorizers, create_tcm (see them for details). create_vocabulary: this function collects unique terms and their corresponding statistics.
  5. Email ,…..
  6. Shiny is a toolkit from RStudio that makes creating web applications much easier (HTML, CSS, Java, JavaScript and jQuery). Shiny is licensed under GPLv3, and the source is available on GitHub.
  7. Shiny is a toolkit from RStudio that makes creating web applications much easier (HTML, CSS, Java, JavaScript and jQuery). Shiny is licensed under GPLv3, and the source is available on GitHub.
  8. Install in one line
  9. Two files: ui.R and server.R
  10. Replace "crawler" with "scraper"
  11. Benchmarking : AML can quickly present a lot of models using the same training set Detecting Target Leakage: AML builds candidate models extremely fast in an automated way Diagnostics: Diagnostics can be automatically generated such as learning curves, feature importances, etc. Automation : Tasks like exploratory data analysis, pre-processing of data, model selection and putting models into production can be automated.