How to automate all
your SEO projects
@VincentTerrasi
OVH
Planning
• Each Day:
• Advanced Reporting
• Anomaly Detection
• Log Analysis
• Webperf with SiteSpeed.io
• Each Week:
• Ranking Monitoring
• Opportunity Detection
• Hot Topic Detection
• Each Quarter:
• Semantic Analysis
Time is precious
Automate
everything
1. RStudio Server
2. Shiny Server
3. Jupyter Notebook
4. Dataiku
5. OpenSource
[Architecture diagram: a Docker-based stack (RStudio Server, Shiny Server, Notebook, Dataiku) on top of a DataLake fed by searchConsoleR, ATinternetR and oncrawlR, producing scheduled emails, a Data API, Shiny Apps, DataViz and reports]
1. RStudio Server
Automate all your SEO projects
Why R ?
Scriptable
Big Community
Mac / PC / Unix
Open Source
Free
 10 000 packages
Rgui
WheRe? How?
Rstudio
https://www.cran.r-project.org
RStudio Server
OVH – Instance Cloud
• Docker on Ubuntu 16.04 Server
• From a terminal on the server, run:
• sudo docker run -d -p 8787:8787 rocker/rstudio
• e.g. http://yourIP:8787, and you should be greeted by the RStudio
welcome screen.
Log in using:
• username: rstudio
• password: rstudio
RStudio Server - Install
• install.packages("httr")
• install.packages("RCurl")
• install.packages("stringr")
• install.packages("stringi")
• install.packages("openssl")
• install.packages("Rmpi")
• install.packages("doMPI")
R – Scraper – Packages
R – Scraper – RCurl
seocrawler <- function( url ) {
useragent <- "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko)
Version/6.0 Mobile/10A5376e Safari/8536.25"
h <- basicTextGatherer()
html <- getURL(url
,followlocation = TRUE
,ssl.verifypeer = FALSE
,httpheader = c('User-Agent' = useragent)
,headerfunction = h$update
)
return(html)
}
R – Scraper – Header
# find each header line, keep the last occurrence (after redirects), strip the field name
ind0 <- grep("HTTP/", h$value(NULL))
df$StatusCode <- tail(h$value(NULL)[ind0], 1)
ind1 <- grep("^Content-Type", h$value(NULL))
df$ContentType <- trimws(gsub("Content-Type:", "", tail(h$value(NULL)[ind1], 1)))
ind2 <- grep("^Last-Modified", h$value(NULL))
df$LastModified <- trimws(gsub("Last-Modified:", "", tail(h$value(NULL)[ind2], 1)))
ind3 <- grep("^Content-Language", h$value(NULL))
df$ContentLanguage <- trimws(gsub("Content-Language:", "", tail(h$value(NULL)[ind3], 1)))
ind4 <- grep("^Location", h$value(NULL))
df$Location <- trimws(gsub("Location:", "", tail(h$value(NULL)[ind4], 1)))
R – Scraper – Xpath
doc <- htmlParse(html, asText=TRUE,encoding="UTF-8")
• H1 <- head(xpathSApply(doc, "//h1", xmlValue),1)
• H2 <- head(xpathSApply(doc, "//h2", xmlValue),1)
• robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'),1)
• canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'),1)
• DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')
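The same XPath calls can be checked on a small self-contained example with the XML package (the HTML snippet and its values are invented):

```r
library(XML)

# toy HTML standing in for a scraped page
html <- '<html><head>
<link rel="canonical" href="https://example.com/"/>
<meta name="robots" content="index,follow"/>
</head><body><h1>Hello</h1><a href="/about">About</a></body></html>'

doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
H1        <- head(xpathSApply(doc, "//h1", xmlValue), 1)
robots    <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'), 1)
canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'), 1)
links     <- xpathSApply(doc, "//a", xmlGetAttr, "href")
```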
How to go parallel in R
R – Scraper – OpenMpi
• MPI : Message Passing Interface is a specification for an API for passing
messages between different computers.
• Programming with MPI
• Difficult, because the Rmpi package defines about 110 R functions
• Needs a parallel programming system to do the actual work in parallel
• The doMPI package acts as an adaptor to the Rmpi package, which in
turn is an R interface to an implementation of MPI
• Very easy to install Open MPI and Rmpi on Debian / Ubuntu
• You can test with a single computer
R – Scraper – Install OpenMPI
sudo yum install openmpi openmpi-devel openmpi-libs
sudo ldconfig /usr/lib64/openmpi/lib/
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/"
install.packages("Rmpi",
configure.args =
c("--with-Rmpi-include=/usr/include/openmpi-x86_64/",
"--with-Rmpi-libpath=/usr/lib64/openmpi/lib/",
"--with-Rmpi-type=OPENMPI"))
install.packages("doMPI")
R – Scraper – Test doMpi
library(doMPI)
#start your cluster
cl <- startMPIcluster(count=20)
registerDoMPI(cl)
# crawl every row of the dataset in parallel
max <- dim(mydataset)[1]
x <- foreach(i=1:max, .combine="rbind") %dopar% seocrawlerThread(mydataset,i)
#close your cluster
closeCluster(cl)
• Venn Matrix :
http://blog.mrbioinfo.com/
R – Semantic Analysis – Intro
R – Semantic Analysis – Data
R – Semantic Analysis – eVenn
evenn(pathRes="./eVenn/", matLists=all.the.data, annot=FALSE, CompName="croisiere")
R – Semantic Analysis – Filter
fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt"
#read csv
DF <- read.csv(fichierVenn, sep = "\t", encoding="CP1252", stringsAsFactors=FALSE)
#find
DF_PotentialKeywords <- subset(DF, DF$Total_lists >= 4 & DF$planete.croisiere.com==0 )
R – Semantic Analysis – nGram
library(text2vec)
library(dplyr)
it <- itoken( DF_PotentialKeywords[['Keywords']],
preprocess_function = tolower,
tokenizer = word_tokenizer,
progressbar = FALSE )
# 2- and 3-grams
vocab <- create_vocabulary(it, ngram = c(2L, 3L))
DF_SEO_vocab <- data.frame(vocab$vocab)
DF_SEO_select <- data.frame(word=DF_SEO_vocab$terms,
freq=DF_SEO_vocab$terms_counts) %>%
arrange(-freq) %>%
top_n(30)
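To eyeball the result, the selected n-grams can go straight into a base-R bar chart; a sketch with toy rows standing in for the DF_SEO_select data frame built above:

```r
# toy stand-in for DF_SEO_select (word/freq columns as built above)
DF_SEO_select <- data.frame(
  word = c("croisiere_pas_cher", "croisiere_derniere_minute", "croisiere_mediterranee"),
  freq = c(42, 17, 9)
)

# horizontal bar chart of n-gram frequencies, biggest opportunity on top
barplot(rev(DF_SEO_select$freq),
        names.arg = rev(DF_SEO_select$word),
        horiz = TRUE, las = 1,
        main = "Top keyword n-grams")
```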
• Dplyr
• Readxl
• SearchConsoleR
• googleAuthR
• googleAnalyticsR
R – Packages SEO
Thanks to Mark Edmondson
R – SearchConsoleR
library(googleAuthR)
library(searchConsoleR)
# get your client id and secret from the Google API Console
options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com")
options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX")
## change this to the website you want to download data for. Include http
website <- "https://data-seo.fr"
## Search Console data is reliably available about 3 days late, so we download from then
## today - 3 days
start <- Sys.Date() - 3
## one day's data, but change it as needed
end <- Sys.Date() - 3
R – SearchConsoleR
## what to download, choose between date, query, page, device, country
download_dimensions <- c('date','query')
## what type of Google search, choose between 'web', 'video' or 'image'
type <- c('web')
## Authorize the script with Search Console, using an account that has access to the website.
## The first time you will need to log in to Google; the token should auto-refresh after that.
googleAuthR::gar_auth()
## the first time, stop here and wait for authorisation
## get the search analytics data
data <- search_analytics(siteURL = website, startDate = start, endDate = end,
dimensions = download_dimensions, searchType = type)
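For the daily scheduled report, one option is to persist each pull as a dated CSV so the cron job builds a history; a sketch with toy rows standing in for the search_analytics() output (the file-name pattern is an assumption):

```r
# toy rows standing in for the search_analytics() result
data <- data.frame(date   = as.character(Sys.Date() - 3),
                   query  = c("data seo", "automate seo"),
                   clicks = c(12, 7))

# one dated file per day, so the scheduled job accumulates a history
outfile <- sprintf("gsc_%s.csv", Sys.Date() - 3)
write.csv(data, outfile, row.names = FALSE)
```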
• Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax)
• MIN — minute field, 0-59
• HOUR — hour field, 0-23
• DOM — day of month, 1-31
• MON — month, 1-12
• DOW — day of week, 0-6 (Sunday = 0)
• CMD — the command to execute
• $ crontab -e
• Run the R script filePath.R every day at 23:15:
15 23 * * * Rscript filePath.R
R – CronTab – Method 1
• R Package : https://github.com/bnosac/cronR
R – Cron – Method 2
library(cronR)
# build the Rscript command for the job
cmd <- cron_rscript("filePath.R")
cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20',
days_of_week = c(1, 2))
cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20')
cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20',
days_of_week = c(0, 3, 5))
OR
Automated Reports
2. Shiny Server
Creating webapps with R
Shiny Server - Why
Shiny Server – Where and How
• ShinyApps.io
• A local server
• Hosted on your server
• docker run --rm -p 3838:3838
-v /srv/shinyapps/:/srv/shiny-server/
-v /srv/shinylog/:/var/log/
rocker/shiny
• If you have an app in /srv/shinyapps/appdir, you can run the app
by visiting http://yourIP:3838/appdir/.
Shiny Server - Install
Shiny – ui.R
fluidPage(
titlePanel("Compute your internal pagerank"),
sidebarLayout(
sidebarPanel(
a("data-seo.com", href="https://data-seo.com"),
tags$hr(),
p('Step 1 : Export your outlinks data from ScreamingFrog'),
fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)',
accept = c('text/csv'), multiple = FALSE
),
tags$hr(),
downloadButton('downloadData', 'Download CSV')
),
mainPanel(
h3(textOutput("caption")),
tags$hr(),
tableOutput('contents')
)
)
)
Shiny – server.R
function(input, output, session) {
....
output$contents <- renderTable({
if (!is.null(input$file1)) {
inFile <- input$file1
logsSummary <- importLogs(inFile$datapath)
logsSummary
}
})
output$downloadData <- downloadHandler(
filename = "extract.csv",
content = function(file) {
if (!is.null(input$file1)) {
inFile <- input$file1
logsSummary <- importLogs(inFile$datapath)
write.csv2(logsSummary,file, row.names = FALSE)
}
}
)
}
https://mark.shinyapps.io/GA-dashboard-demo
Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo
• Interactive trend graphs.
• Auto-updating Google Analytics data.
• Zoomable day-of-week heatmaps.
• Top Level Trends via Year on Year, Month on Month
and Last Month vs Month Last Year data modules.
• A MySQL connection for data blending your own data with GA data.
• An easy upload option to update a MySQL database.
• Analysis of the impact of marketing events via Google's CausalImpact.
• Detection of unusual time-points using Twitter's Anomaly Detection.
Shiny – Use case
Automated KPI reporting
3. Jupyter Notebook
Sharing source code with your SEO team
Jupyter Notebook Example
• Reproducibility
• Quality
• Discoverability
• Learning
Jupyter Notebook – Why ?
Step 1 — Installing Python 2.7 and Pip
$ sudo apt-get update
$ sudo apt-get -y install python2.7 python-pip python-dev
Step 2 — Installing IPython and Jupyter Notebook
$ sudo apt-get -y install ipython ipython-notebook
$ sudo -H pip install jupyter
Step 3 — Running Jupyter Notebook
$ jupyter notebook
Jupyter Notebook Install
Notebook Example
• https://github.com/voltek62/RNotebook-SEO
• Semantic Analysis for SEO
• Scraper for SEO
Jupyter Notebook Examples
Process Validation
Documentation
4. Dataiku
Use AML to find the best algorithm
Automated Machine Learning
• Benchmarking
• Detecting Target Leakage
• Diagnostics
• Automation
$ adduser vincent sudo
$ sudo apt-get install default-jre
$ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz
$ tar xzf dataiku-dss-4.0.1.tar.gz
$ cd dataiku-dss-4.0.1
>> install all prerequisites
$ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java
>> install dataiku
$ ./installer.sh -d DATA_DIR -p 11000
$ DATA_DIR/bin/dss start
http://<your server address>:11000.
Dataiku- Install on Instance Cloud
Go to the DSS data dir
$ cd DATA_DIR
Stop DSS
$ ./bin/dss stop
Run the installation script
$ ./bin/dssadmin install-R-integration
$ ./bin/dss start
Dataiku- Install R
Install R Package
Use-Case :
Detect Featured
Snippet
• Get all your featured snippets with Ranxplorer
• Get the SERP for each keyword with Ranxplorer
• Use a homemade scraper to enrich the data:
• 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location'
• 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion'
• 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp'
• 'Wordcount' 'Size' 'ResponseTime'
• 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion'
• 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion'
• Use AML to find the most important features
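The deck does not show the scraper's feature code; a hedged sketch of how features like TitleLength, TitleIsQuestion and TitleDist might be derived (the names reuse the list above, the logic and values are invented):

```r
# invented example values
title   <- "How do you get a featured snippet?"
keyword <- "featured snippet"

TitleLength     <- nchar(title)
# treat as a question if it ends with "?" or starts with an interrogative word
TitleIsQuestion <- grepl("\\?\\s*$", title) ||
                   grepl("^(how|what|why|when|where|who)\\b", tolower(title))
# edit distance between title and keyword as a crude relevance proxy
TitleDist       <- adist(tolower(title), tolower(keyword))[1, 1]
```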
Dataiku : Featured Snippet
Dataiku : Flow
Dataiku : Input / Output
Dataiku : Code Recipe
Dataiku : Visual Recipes
Dataiku : Plugin recipes
Dataiku : My Plugins
• SEMrush
• SearchConsole
• Majestic
• Visiblis [ongoing]
A DSS plugin is a zip file.
Inside DSS, click the top right gear → Administration → Plugins → Store.
https://github.com/voltek62/Dataiku-SEO-Plugins
Dataiku : AML
Dataiku : Import a project
• Learn from the success of others with AML
• Use all methods at your disposal to show Google you are the
answer to the question. ( Title, H1, H2, … )
Dataiku : Results
Automated Machine Learning
• Yes, you can, because:
• Great advertising
• It brings customers for specific features and training
Open Source & SEO ?
• Showing your work
• Attract talent
• Teaching the next generation
• Automated Reports with Rstudio Server
• Automated KPI reporting with Shiny Server
• Process Validation Documentation with Jupyter Notebook
• Automated Machine Learning with Dataiku
Take away
Now that machines can learn and adapt,
it is time to seize the opportunity
to create new jobs.
Data-SEO, Data-Doctor, Data-Journalist …
Thank you!
Vincent Terrasi
@vincentterrasi
Get all my last discoveries and updates


Editor's Notes

  • #4 HOW?
  • #7 R is a programming language dedicated to statistics and data science. The best-known implementation of R is the GNU R software.
  • #13 HTTP response header: collect the contents of the header of an HTTP response
  • #26 itoken: this function creates iterators over input objects for building vocabularies, corpora, or DTM and TCM matrices. This iterator is usually used in the following functions: create_vocabulary, create_corpus, create_dtm, vectorizers, create_tcm. See them for details. create_vocabulary: this function collects unique terms and the corresponding statistics. See below for details.
  • #34 Email, …
  • #36 Shiny is a toolkit from RStudio that makes creating web applications much easier (HTML, CSS, Java, JavaScript and jQuery). Shiny is licensed GPLv3, and the source is available on GitHub.
  • #37 Shiny is a toolkit from RStudio that makes creating web applications much easier (HTML, CSS, Java, JavaScript and jQuery). Shiny is licensed GPLv3, and the source is available on GitHub.
  • #38 One-line install
  • #39 Two files: ui.R and server.R
  • #48 Change "crawler" to "scraper"
  • #52 Benchmarking : AML can quickly present a lot of models using the same training set Detecting Target Leakage: AML builds candidate models extremely fast in an automated way Diagnostics: Diagnostics can be automatically generated such as learning curves, feature importances, etc. Automation : Tasks like exploratory data analysis, pre-processing of data, model selection and putting models into production can be automated.