How Data Science Can Boost Your SEO

Slides from the QueDuWeb conference in Deauville, 28 April 2017. Use Dataiku, a data science platform, to optimize your SEO. Bonus: I added two extra slides with URLs to the MOOCs and articles covering the topic.

Published in: Technology

  1. 1. How Data Science can boost your SEO? @RemiBacha @VincentTerrasi OVH
  2. 2. SEO IS MAGIC
  3. 3. From SEO to Data SEO • R / Python / Scala / Java and SEO / SEA / Analytics • Data Science Platform and/or Big Data Platform
  4. 4. Learn with Data Science MOOCs • Data Science by Johns Hopkins University on Coursera • https://www.coursera.org/specializations/jhu-data-science • Introduction to Big Data with Apache Spark by Berkeley on edX • https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x • Learn R or Python with https://www.datacamp.com/
  5. 5. 1. Google Rankings Prediction 2. Logs Analysis
  6. 6. 1. Rankings Prediction Use #Dataiku for Google Rankings Prediction
  7. 7. How to predict Google rankings? • It is very difficult to predict the EXACT position • Predict the Google ranking (keyword > thematic > country) as a Machine Learning classification problem: • The website is in the top 10 • The website is not in the top 10
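The classification framing on this slide can be sketched in a few lines. This is only an illustration: the `SerpRow` record, its field names, and the sample rows are hypothetical and do not come from the talk's dataset.

```python
from dataclasses import dataclass


@dataclass
class SerpRow:
    """One observed ranking (hypothetical structure, for illustration)."""
    keyword: str
    url: str
    position: int  # 1 = first organic result


def label_top10(position: int, cutoff: int = 10) -> int:
    """Return 1 when the page ranks within the cutoff, otherwise 0."""
    return 1 if 1 <= position <= cutoff else 0


# Made-up sample rows; a real dataset would come from a rank-tracking tool.
rows = [
    SerpRow("data seo", "https://example.com/a", 3),
    SerpRow("data seo", "https://example.com/b", 27),
    SerpRow("log analysis", "https://example.com/c", 10),
]

# Binary target: in the top 10 (1) or not (0).
labels = [label_top10(r.position) for r in rows]
print(labels)
```

Turning the exact position into a yes/no label is what makes the problem a binary classification rather than a regression on the rank itself.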
  8. 8. Why predict Google rankings? • Find your own ranking factors • Create new webpages • Update old webpages
  9. 9. https://www.dataiku.com/dss/trynow/ • OVH DLP: https://www.ovh.com/fr/dlp/
  10. 10. Install Dataiku • Local host or OVH Public Cloud
  11. 11. Entire Process Overview • Build Dataset with SEO tools • Clean Data • Normalization and training • Optimize Threshold • Results • Show importance of ranking factors
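Two of the steps in this overview, normalization and threshold optimization, can be illustrated with plain Python. This is a sketch under assumptions the slides do not confirm: it assumes min-max scaling for the normalization step, and it assumes the threshold is the probability cut-off of the binary classifier, tuned here to maximize the F1 score. The scores and labels are made up.

```python
def min_max(values):
    """Scale a list of numbers to the [0, 1] range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def f1_at(threshold, scores, labels):
    """F1 score when every score >= threshold is predicted as 'top 10'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, steps=100):
    """Try evenly spaced cut-offs and keep the one with the highest F1."""
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Made-up predicted probabilities and true labels (1 = in the top 10).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 1, 0, 0]

scaled = min_max([10, 20, 30])
threshold = best_threshold(scores, labels)
print(scaled, threshold, f1_at(threshold, scores, labels))
```

The default cut-off of 0.5 would miss the page scored 0.4 in this toy data, which is why tuning the threshold on held-out data can matter.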
  12. 12. SEO Architecture with Dataiku
  13. 13. Build Dataset • We used a dataset of 200,000 records, including roughly 2,000 different keywords/search terms.
  14. 14. R with Dataiku
  15. 15. First Dataiku SEO Plugin
  16. 16. Clean Data • Remove invalid URLs: • Slow Crawl Rate • Non-HTML Content • Network Problems • Slow Web Servers Wait Times • Errors from Web Servers • URL Moved Permanently Redirect (301) • URL Moved Temporarily Redirect (302) • Authentication Required (401) or Document Not Found (404) • Cyclic Redirects
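The cleaning rules on this slide amount to keeping only pages that answered successfully with HTML. A minimal sketch, assuming each crawl row is a dict with hypothetical `status_code` and `content_type` keys (the real column names depend on the crawler export):

```python
def is_valid_page(row):
    """Keep only rows that returned HTTP 200 with an HTML content type."""
    status_ok = row.get("status_code") == 200
    content_type = (row.get("content_type") or "").lower()
    return status_ok and content_type.startswith("text/html")


# Made-up crawl rows covering several of the cases listed on the slide.
crawl = [
    {"url": "https://example.com/", "status_code": 200,
     "content_type": "text/html; charset=utf-8"},
    {"url": "https://example.com/old", "status_code": 301,
     "content_type": "text/html"},
    {"url": "https://example.com/logo.png", "status_code": 200,
     "content_type": "image/png"},
    {"url": "https://example.com/missing", "status_code": 404,
     "content_type": "text/html"},
    # Network problem or timeout: no response was recorded.
    {"url": "https://example.com/timeout", "status_code": None,
     "content_type": None},
]

clean = [row for row in crawl if is_valid_page(row)]
print([row["url"] for row in clean])
```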
  17. 17. Screaming Frog or Scrapy • Title length • H1 length • Inlinks • Outlinks • Text ratio • Word Count • HTTPS, …
  18. 18. Screaming Frog
  19. 19. Step 1 Step 2 Step 3
  20. 20. Majestic • Trust Flow: a score based on quality, on a scale of 0 to 100. • Citation Flow: a score between 0 and 100 that helps measure the link equity or "power" of the website.
  21. 21. Visiblis • On-page topical analysis tools that measure: • the semantic affinity of web page contents • their relevance to a search query
  22. 22. XGBoost • XGBoost is short for eXtreme Gradient Boosting. • It is an efficient and scalable implementation of the gradient boosting framework. • Two solvers are included: • linear model • tree learning algorithm
  23. 23. Variable importance
  24. 24. 2. Logs Analysis Use #Dataiku for Logs Analysis
  25. 25. Where To Get Your Logs? • https://www.ovh.com/manager/ > Hosting > Multisite
  26. 26. What you will learn
  27. 27. Install Logs Importer Plugin • Works with OVH web hosting
  28. 28. Get your logs
  29. 29. Your first logs analysis
  30. 30. Enrich your dataset : Open a new analysis
  31. 31. Enrich your dataset : IP Geolocation
  32. 32. Enrich your dataset : User Agent
  33. 33. Prepare your logs
  34. 34. Let’s use dataviz
  35. 35. Status codes seen by bots
  36. 36. Group by URL
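The two views above (status codes seen by bots, hits grouped by URL) can be reproduced outside Dataiku with a short script. This is a sketch that assumes the logs follow the Apache combined log format; the sample lines are made up, and the `bot_name` helper is a hypothetical simplification that only looks at the User-Agent string.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed; check your hosting's actual format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def bot_name(agent):
    """Very rough bot detection based on the User-Agent string only."""
    lowered = agent.lower()
    if "googlebot" in lowered:
        return "Googlebot"
    if "bingbot" in lowered:
        return "Bingbot"
    return None


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
sample_lines = [
    f'66.249.66.1 - - [28/Apr/2017:10:00:00 +0200] "GET /page-a HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [28/Apr/2017:10:05:00 +0200] "GET /page-a HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [28/Apr/2017:10:06:00 +0200] "GET /old HTTP/1.1" 404 312 "-" "{GOOGLEBOT}"',
    f'157.55.39.1 - - [28/Apr/2017:10:07:00 +0200] "GET /page-a HTTP/1.1" 200 5120 "-" "{BINGBOT}"',
    '192.0.2.10 - - [28/Apr/2017:10:08:00 +0200] "GET /page-a HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

status_by_bot = Counter()   # (bot, status code) -> number of hits
googlebot_hits = Counter()  # URL path -> number of Googlebot hits

for line in sample_lines:
    fields = parse_line(line)
    if fields is None:
        continue
    bot = bot_name(fields["agent"])
    if bot is None:
        continue  # not a known bot: ignored in this view
    status_by_bot[(bot, fields["status"])] += 1
    if bot == "Googlebot":
        googlebot_hits[fields["path"]] += 1

print(status_by_bot)
print(googlebot_hits)
```

Matching on the User-Agent string alone does not prove a hit really came from the search engine, since any client can send that string.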
  37. 37. AMP vs no AMP by Bot
  38. 38. Enrich Logs with SF Crawl
  39. 39. Filter data before merge
  40. 40. Orphan pages • Never crawled pages • OK pages
  41. 41. URLs never crawled by Googlebot
  42. 42. Orphan pages
  43. 43. Crawled and linked : seems OK
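The three groups from the merge of crawl data and logs reduce to set operations. A minimal sketch, assuming one collection of URLs found by the site crawl (linked pages) and one collection of URLs requested by Googlebot in the logs; the function name and sample URLs are hypothetical.

```python
def classify_urls(crawl_urls, googlebot_urls):
    """Split URLs into orphan, never-crawled, and OK groups."""
    crawl, logs = set(crawl_urls), set(googlebot_urls)
    return {
        # Seen by Googlebot but not reachable through internal links.
        "orphan": sorted(logs - crawl),
        # Linked on the site but never requested by Googlebot.
        "never_crawled": sorted(crawl - logs),
        # Both linked and crawled.
        "ok": sorted(crawl & logs),
    }


# Made-up URL lists for illustration.
crawl_urls = ["/", "/page-a", "/page-b"]
googlebot_urls = ["/", "/page-a", "/old-landing"]

groups = classify_urls(crawl_urls, googlebot_urls)
print(groups)
```

URLs should be normalized the same way on both sides (protocol, trailing slash, query string) before comparing, otherwise identical pages end up in the wrong group.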
  44. 44. Create SEO Dataiku Plugins • Ready: Majestic, Visiblis, SEMrush • To do: OnCrawl, Moz, Ahrefs, Scrapy, Yooda
  45. 45. Takeaway: Create your SEO Platform • Semantic Audit • Advanced Reporting • Anomaly Detection • Log Analysis • Ranking Monitoring • Opportunity Detection • Expired Backlinks Tool • Hot Topic Detection
  46. 46. SEO IS NOT MAGIC DATA SCIENCE IS NOT MAGIC
  47. 47. Thank you!
  48. 48. Get all our latest discoveries and updates • Remi Bacha @remibacha • Vincent Terrasi @vincentterrasi
  49. 49. Bonus • https://remibacha.com/analyse-logs-ovh-dataiku/ • https://data-seo.com/2017/04/13/how-to-predict-google-rankings-datascience-platform/ • https://freres.peyronnet.eu/predire-rankings-de-google-vincent-terrasi-dit-faire/ • Next steps: • Dataiku connectors: www.github.com/voltek62/ • Blog posts are coming …
