
How to automate all your SEO projects

Automated Reports with Rstudio Server
Automated KPI reporting with Shiny Server
Process Validation Documentation with Jupyter Notebook
Automated Machine Learning with Dataiku



  1. 1. How to automate all your SEO projects @VincentTerrasi OVH
  2. 2. Planning • Each Day: • Advanced Reporting • Anomaly Detection • Log Analysis • Webperf with SiteSpeed.io • Each Week: • Ranking Monitoring • Opportunity Detection • Hot Topic Detection • Each Quarter: • Semantic Analysis Time is precious. Automate everything.
  3. 3. 1. RStudio Server 2. Shiny Server 3. Jupyter Notebook 4. Dataiku 5. OpenSource
  4. 4. searchConsoleR Docker ATinternetR oncrawlR Rstudio Server Shiny Server Dataiku DataLake Scheduled Email Notebook DataAPIShiny Apps DataViz Reports
  5. 5. 1. RStudio Server Automate all your SEO projects
  6. 6. Why R ? Scriptable Big Community Mac / PC / Unix Open Source Free  10 000 packages
  7. 7. Rgui WheRe ? How ? Rstudio https://www.cran.r-project.org
  8. 8. RStudio Server
  9. 9. OVH – Instance Cloud
  10. 10. • Docker on Ubuntu 16.04 Server • From a terminal on the Docker host, run: • sudo docker run -d -p 8787:8787 rocker/rstudio • Then browse to http://yourIP:8787, and you should be greeted by the RStudio welcome screen. Log in using: • username: rstudio • password: rstudio RStudio Server - Install
  11. 11. • install.packages("httr") • install.packages("RCurl") • install.packages("stringr") • install.packages("stringi") • install.packages("openssl") • install.packages("Rmpi") • install.packages("doMPI") R – Scraper – Packages
  12. 12. R – Scraper – RCurl library(RCurl) seocrawler <- function( url ) { useragent <- "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25" h <- basicTextGatherer() html <- getURL(url ,followlocation = TRUE ,ssl.verifypeer = FALSE ,httpheader = c('User-Agent' = useragent) ,headerfunction = h$update ) return(html) }
  13. 13. R – Scraper – Header ind0 <- grep("HTTP/",h$value(NULL)) df$StatusCode <- tail(h$value(NULL)[ind0],1) ind1 <- grep("^Content-Type",h$value(NULL)) df$ContentType <- gsub("Content-Type:","",tail(h$value(NULL)[ind1],1)) ind2 <- grep("Last-Modified",h$value(NULL)) df$LastModified <- gsub("Last-Modified:","",tail(h$value(NULL)[ind2],1)) ind3 <- grep("Content-Language",h$value(NULL)) df$ContentLanguage <- gsub("Content-Language:","",tail(h$value(NULL)[ind3],1)) ind4 <- grep("Location",h$value(NULL)) df$Location <- gsub("Location:","",tail(h$value(NULL)[ind4],1))
  14. 14. R – Scraper – Xpath library(XML) doc <- htmlParse(html, asText=TRUE,encoding="UTF-8") • H1 <- head(xpathSApply(doc, "//h1", xmlValue),1) • H2 <- head(xpathSApply(doc, "//h2", xmlValue),1) • robots <- head(xpathSApply(doc, '//meta[@name="robots"]', xmlGetAttr, 'content'),1) • canonical <- head(xpathSApply(doc, '//link[@rel="canonical"]', xmlGetAttr, 'href'),1) • DF_a <- xpathSApply(doc, "//a", xmlGetAttr, 'href')
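Slide 18 later maps seocrawlerThread() over the URL set with doMPI, but the worker itself is not shown in the deck. A minimal sketch of what it might look like, reusing seocrawler() from slide 12 and the XPath extraction above (the url column of mydataset and the first_or_na helper are assumptions):

library(RCurl)
library(XML)

# helper: first match or NA, so empty XPath results do not break the data.frame
first_or_na <- function(x) if (length(x) > 0) x[[1]] else NA

# hypothetical worker: crawls one URL of mydataset and returns a one-row data.frame
seocrawlerThread <- function(mydataset, i) {
  url  <- mydataset$url[i]
  html <- seocrawler(url)                                    # fetch the page (slide 12)
  doc  <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  data.frame(
    url   = url,
    Title = first_or_na(xpathSApply(doc, "//title", xmlValue)),
    H1    = first_or_na(xpathSApply(doc, "//h1", xmlValue)),
    H2    = first_or_na(xpathSApply(doc, "//h2", xmlValue)),
    stringsAsFactors = FALSE
  )
}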
  15. 15. How-to go parallel in R
  16. 16. R – Scraper – OpenMPI • MPI : Message Passing Interface is a specification for an API for passing messages between different computers. • Programming with MPI • Difficult because the Rmpi package defines about 110 R functions • Needs a parallel programming system to do the actual work in parallel • The doMPI package acts as an adaptor to the Rmpi package, which in turn is an R interface to an implementation of MPI • Very easy to install Open MPI and Rmpi on Debian / Ubuntu • You can test with one computer
  17. 17. R – Scraper – Install OpenMPI sudo yum install openmpi openmpi-devel openmpi-libs sudo ldconfig /usr/lib64/openmpi/lib/ export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/usr/lib64/openmpi/lib/" install.packages("Rmpi", configure.args = c("--with-Rmpi-include=/usr/include/openmpi-x86_64/", "--with-Rmpi-libpath=/usr/lib64/openmpi/lib/", "--with-Rmpi-type=OPENMPI")) install.packages("doMPI")
  18. 18. R – Scraper – Test doMPI library(doMPI) #start your cluster cl <- startMPIcluster(count=20) registerDoMPI(cl) #number of URLs to crawl max <- dim(mydataset)[1] x <- foreach(i=1:max, .combine="rbind") %dopar% seocrawlerThread(mydataset,i) #close your cluster closeCluster(cl)
  19. 19. • Venn Matrix : http://blog.mrbioinfo.com/ R – Semantic Analysis – Intro
  20. 20. R – Semantic Analysis – Data
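The data slide carries no code, and the deck never shows how all.the.data is assembled before it is passed to evenn() on the next slide. A hedged sketch, assuming one keyword export per site and a named list of keyword vectors as the expected shape (file paths, the Keyword column and every site name except planete.croisiere.com are hypothetical):

# build all.the.data as a named list of keyword vectors, one per site
sites <- c("planete.croisiere.com", "competitor1.com", "competitor2.com",
           "competitor3.com", "competitor4.com")            # only the first name appears in the deck
all.the.data <- lapply(sites, function(s) {
  kw <- read.csv(paste0("./keywords/", s, ".csv"), stringsAsFactors = FALSE)
  unique(tolower(kw$Keyword))                               # one keyword vector per site
})
names(all.the.data) <- sites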
  21. 21. R – Semantic Analysis – eVenn evenn(pathRes="./eVenn/", matLists=all.the.data, annot=FALSE, CompName="croisiere")
  22. 22. R – Semantic Analysis – Filter fichierVenn <- "./eVenn/Venn_croisiere/VennMatrixBin.txt" #read csv DF <- read.csv(fichierVenn, sep = "\t", encoding="CP1252", stringsAsFactors=FALSE) #find keywords present in at least 4 lists but absent from planete.croisiere.com DF_PotentialKeywords <- subset(DF, DF$Total_lists >= 4 & DF$planete.croisiere.com==0 )
  23. 23. R – Semantic Analysis – nGram library(text2vec) library(dplyr) it <- itoken( DF_PotentialKeywords[['Keywords']], preprocess_function = tolower, tokenizer = word_tokenizer, progressbar = FALSE ) # 2 and 3 grams vocab <- create_vocabulary(it, ngram = c(2L, 3L)) DF_SEO_vocab <- data.frame(vocab$vocab) DF_SEO_select <- data.frame(word=DF_SEO_vocab$terms, freq=DF_SEO_vocab$terms_counts) %>% arrange(-freq) %>% top_n(30)
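To turn the shortlisted n-grams into a deliverable, a short follow-up step might export them and plot a quick word cloud; this is a sketch, assuming the wordcloud package is installed (it is not mentioned in the deck):

library(wordcloud)

# export the 30 most frequent 2- and 3-grams for the content team
write.csv2(DF_SEO_select, "potential_ngrams.csv", row.names = FALSE)

# quick visual check of the candidate expressions
wordcloud(words = DF_SEO_select$word, freq = DF_SEO_select$freq,
          max.words = 30, random.order = FALSE)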
  24. 24. • dplyr • readxl • searchConsoleR • googleAuthR • googleAnalyticsR R – SEO Packages Thanks to Mark Edmondson
  25. 25. R – SearchConsoleR library(googleAuthR) library(searchConsoleR) # get your client id and secret from the Google API console options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com") options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX") ## change this to the website you want to download data for. Include http website <- "https://data-seo.fr" ## data is reliably in search console 3 days ago, so we download from then ## today - 3 days start <- Sys.Date() - 3 ## one day's data, but change it as needed end <- Sys.Date() - 3
  26. 26. R – SearchConsoleR ## what to download, choose between date, query, page, device, country download_dimensions <- c('date','query') ## what type of Google search, choose between 'web', 'video' or 'image' type <- c('web') ## Authorize the script with Search Console, using an account that has access to the website. ## The first time you will need to log in to Google, but it should auto-refresh after that so it can be put in a scheduled job. googleAuthR::gar_auth() ## first time stop here and wait for authorisation ## get the search analytics data data <- search_analytics(siteURL = website, startDate = start, endDate = end, dimensions = download_dimensions, searchType = type)
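A natural last step for an automated report is to write the result to a dated file, so the cron jobs shown on the next slides archive one export per run; a minimal sketch (the output directory is an assumption):

# archive one CSV per run, e.g. searchconsole_<date>.csv (directory is an assumption)
outfile <- file.path("/home/rstudio/reports", paste0("searchconsole_", start, ".csv"))
write.csv2(data, outfile, row.names = FALSE)

As the slide notes, gar_auth() needs one interactive login; after that the cached token lets the script run unattended.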
  27. 27. • Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax) • MIN Minute field 0 to 59 • HOUR Hour field 0 to 23 • DOM Day of Month 1-31 • MON Month field 1-12 • DOW Day Of Week 0-6 • CMD Command Any command to be executed. • $ crontab -e • Run the R script filePath.R at 23:15 for every day of the year : 15 23 * * * Rscript filePath.R R – CronTab – Method 1
  28. 28. • R Package : https://github.com/bnosac/cronR R – Cron – Method 2 library(cronR) cmd <- cron_rscript("filePath.R") cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20', days_of_week = c(1, 2)) cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20') cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20', days_of_week = c(0, 3, 5))
  29. 29. Automated Reports
  30. 30. 2. Shiny Server Creating webapps with R
  31. 31. Shiny Server - Why
  32. 32. Shiny Server – Where and How • ShinyApps.io • A local server • Hosted on your server
  33. 33. • docker run --rm -p 3838:3838 -v /srv/shinyapps/:/srv/shiny-server/ -v /srv/shinylog/:/var/log/ rocker/shiny • If you have an app in /srv/shinyapps/appdir, you can run the app by visiting http://yourIP:3838/appdir/. Shiny Server - Install
  34. 34. Shiny – ui.R fluidPage( titlePanel("Compute your internal pagerank"), sidebarLayout( sidebarPanel( a("data-seo.com", href="https://data-seo.com"), tags$hr(), p('Step 1 : Export your outlinks data from ScreamingFrog'), fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)', accept = c('text/csv'), multiple = FALSE ), tags$hr(), downloadButton('downloadData', 'Download CSV') ), mainPanel( h3(textOutput("caption")), tags$hr(), tableOutput('contents') ) ) )
  35. 35. Shiny – server.R function(input, output, session) { .... output$contents <- renderTable({ if (!is.null(input$file1)) { inFile <- input$file1 logsSummary <- importLogs(inFile$datapath) logsSummary } }) output$downloadData <- downloadHandler( filename = "extract.csv", content = function(file) { if (!is.null(input$file1)) { inFile <- input$file1 logsSummary <- importLogs(inFile$datapath) write.csv2(logsSummary,file, row.names = FALSE) } } ) }
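importLogs() is the author's helper and is not reproduced in the deck. A hypothetical minimal version, assuming the uploaded ScreamingFrog all_outlinks.csv exposes Source and Destination columns, could compute the internal PageRank promised in the panel title with igraph:

library(igraph)

# hypothetical stand-in for the author's importLogs() helper
importLogs <- function(path) {
  links <- read.csv(path, stringsAsFactors = FALSE)
  g  <- graph_from_data_frame(links[, c("Source", "Destination")], directed = TRUE)
  pr <- page_rank(g)$vector
  data.frame(url = names(pr), internal_pagerank = round(pr, 5),
             row.names = NULL)[order(-pr), ]
}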
  36. 36. https://mark.shinyapps.io/GA-dashboard-demo Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo • Interactive trend graphs. • Auto-updating Google Analytics data. • Zoomable day-of-week heatmaps. • Top Level Trends via Year on Year, Month on Month and Last Month vs Month Last Year data modules. • A MySQL connection for data blending your own data with GA data. • An easy upload option to update a MySQL database. • Analysis of the impact of marketing events via Google's CausalImpact. • Detection of unusual time-points using Twitter's Anomaly Detection. Shiny – Use case
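One module in the dashboard above relies on Google's CausalImpact; as a rough, self-contained illustration with made-up session data (the series and the event date are invented for the example, not taken from the demo):

library(CausalImpact)

# 100 days of simulated sessions, with a marketing event on day 71
set.seed(42)
sessions <- ts(rpois(100, lambda = 500) + c(rep(0, 70), rep(80, 30)))
impact <- CausalImpact(sessions, pre.period = c(1, 70), post.period = c(71, 100))
summary(impact)
plot(impact)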
  37. 37. Automated KPI reporting
  38. 38. 3. Jupyter Notebook Sharing source code with your SEO team
  39. 39. Jupyter Notebook Example
  40. 40. • Reproducibility • Quality • Discoverability • Learning Jupyter Notebook – Why ?
  41. 41. Step 1 — Installing Python 2.7 and Pip $ sudo apt-get update $ sudo apt-get -y install python2.7 python-pip python-dev Step 2 — Installing Ipython and Jupyter Notebook $ sudo apt-get -y install ipython ipython-notebook $ sudo -H pip install jupyter Step 3 — Running Jupyter Notebook $ jupyter notebook Jupyter Notebook Install
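If, like the RNotebook-SEO examples referenced two slides later, your notebooks run R rather than Python, you can also register the R kernel with Jupyter; a minimal sketch, assuming the IRkernel package (not covered in the deck):

# register R as a Jupyter kernel (run once, from R on the same server)
install.packages("IRkernel")
IRkernel::installspec(user = TRUE)   # then restart: jupyter notebook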
  42. 42. Notebook Example
  43. 43. • https://github.com/voltek62/RNotebook-SEO • Semantic Analysis for SEO • Scraper for SEO Jupyter Notebook Examples
  44. 44. Process Validation Documentation
  45. 45. 4. Dataiku Use AML to find the best algorithm
  46. 46. Automated Machine Learning • Benchmarking • Detecting Target Leakage • Diagnostics • Automation
  47. 47. $ adduser vincent sudo $ sudo apt-get install default-jre $ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz $ tar xzf dataiku-dss-4.0.1.tar.gz $ cd dataiku-dss-4.0.1 >> install all prerequisites $ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java >> install dataiku $ ./installer.sh -d DATA_DIR -p 11000 $ DATA_DIR/bin/dss start http://<your server address>:11000. Dataiku- Install on Instance Cloud
  48. 48. Go to the DSS data dir $ cd DATADIR Stop DSS $ ./bin/dss stop Run the installation script $ ./bin/dssadmin install-R-integration $ ./bin/dss start Dataiku- Install R
  49. 49. Install R Package
  50. 50. Use-Case : Detect Featured Snippet
  51. 51. • Get all your featured snippets with Ranxplorer • Get the SERP for each keyword with Ranxplorer • Use a homemade scraper to enrich the data (a hedged sketch of a few of these features follows below) : • 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location' • 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion' • 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp' • 'Wordcount' 'Size' 'ResponseTime' • 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion' • 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion' • Use AML to find the most important features Dataiku : Featured Snippet
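A hedged sketch of how a few of the enrichment columns listed above could be computed from the scraped fields; the French question-word list and the use of edit distance for the *Dist columns are assumptions, not the author's exact definitions:

# hypothetical feature helpers for the featured-snippet dataset
is_question <- function(x) {
  q_words <- "^(comment|pourquoi|quel|quelle|combien|que|qui)\\b"   # assumed question markers
  as.integer(grepl("\\?", x) | grepl(q_words, tolower(x), perl = TRUE))
}

enrich <- function(df) {
  df$TitleLength     <- nchar(df$Title)
  df$TitleIsQuestion <- is_question(df$Title)
  df$TitleDist       <- diag(adist(df$Keyword, df$Title))   # assumption: edit distance to the keyword
  df$H1Length        <- nchar(df$H1)
  df$H1IsQuestion    <- is_question(df$H1)
  df$H1Dist          <- diag(adist(df$Keyword, df$H1))
  df
}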
  52. 52. Dataiku : Flow
  53. 53. Dataiku : Input / Output
  54. 54. Dataiku : Code Recipe
  55. 55. Dataiku : Visual Recipes
  56. 56. Dataiku : Plugin recipes
  57. 57. Dataiku : My Plugins • SEMrush • SearchConsole • Majestic • Visiblis [ongoing] A DSS plugin is a zip file. Inside DSS, click the top right gear → Administration → Plugins → Store. https://github.com/voltek62/Dataiku-SEO-Plugins
  58. 58. Dataiku : AML
  59. 59. Dataiku : Import a project
  60. 60. • Learn from the success of others with AML • Use all methods at your disposal to show Google you are the answer to the question. ( Title, H1, H2, … ) Dataiku : Results
  61. 61. Automated Machine Learning
  62. 62. Open Source & SEO ? • Yes, you can, because : • Great advertising • Get customers for specific features and trainings • Showing your work • Attract talent • Teaching the next generation
  63. 63. • Automated Reports with Rstudio Server • Automated KPI reporting with Shiny Server • Process Validation Documentation with Jupyter Notebook • Automated Machine Learning with Dataiku Take away
  64. 64. Now that machines can learn and adapt, it is time to take advantage of the opportunity to create new jobs. Data-SEO, Data-Doctor, Data-Journalist …
  65. 65. Thank you!
  66. 66. Vincent Terrasi @vincentterrasi Get all my latest discoveries and updates
