Python for SEO

Python for SEO, presented at TechSEO Boost on November 29, 2018 by Hamlet Batista

Published in: Data & Analytics

  1. Hamlet Batista | @hamletbatista | #TechSEOBoost Python for SEO – Programming As a Superpower
  2. AGENDA – Practical SEO applications of Python >= 3.6 for: Data extraction – Preparation – Analysis & Visualization – Machine learning – Deep learning
  3. INTRO – Why program when you can hire a programmer to do the work for you?
  4. But before…
  12. CHALLENGING SEO PROBLEMS – THAT NEED PROGRAMMING WORK
  13. IBM WebSphere => SAP Hybris
  14. IBM WebSphere Site: Category Page (links to one or more Product Listing Pages) -> Product Listing Page (links to one or more Product Pages) -> Product Page (single SKU)
  15. SAP Hybris Site: Category Page (links to one or more Product Pages) -> Product Page (single SKU)
  16. Old Site Product Pages (717) – New Site Product Pages (442) – Product Mapping (3431)
  17. Old Site Category Pages (371) – New Site Category Pages (147) – Category Mapping (712)
  19. [Chart: New Users, Revenue, and Page Count by page group (Category, Home, Product, Content, Videos, Other)]
  22. [Chart: New Users by page group (Category, Home, Product, Content, Videos, Other)]
  23. Winners vs Losers
  24. Launch Jupyter Notebook in Google Colaboratory https://colab.research.google.com/github/ranksense/open-source/blob/master/Presentations/TechSEOBoost/2018/PythonforSEOTechSEOBoost2018_Hamlet_Batista.ipynb
  26. Ecommerce V3 => Shopify
  27. https://github.com/plotly/plotly.py
  28. CHALLENGE: Find Which Pages Lost SEO Traffic. Solution Part 1 – Steps: Step 1: Pull Google Analytics Data – Step 2: Store Data in Pandas DataFrame – Step 3: Perform Data Preparation and Basic Set Operations
  29. Python – Basics https://pandas.pydata.org/ – Python for Data Science Cheat Sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf
  30. Python – Jupyter – Google Colaboratory: https://colab.research.google.com/notebooks/welcome.ipynb
  31. Python – Pandas https://pandas.pydata.org/ – Cheat Sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf – 10 Minutes to pandas: https://pandas.pydata.org/pandas-docs/stable/10min.html – Intro to Pandas for Excel Super Users: https://towardsdatascience.com/intro-to-pandas-for-excel-super-users-dac1b38f12b0
  32. Python – Requests http://docs.python-requests.org/en/master/ – WEB SCRAPING REFERENCE: A Simple Cheat Sheet for Web Scraping with Python: https://blog.hartleybrody.com/web-scraping-cheat-sheet/
  33. https://ga-dev-tools.appspot.com/query-explorer/
  34. Pulling Google Analytics Data
  35. Storing Data in a DataFrame
  36. Transforming Data for Analysis https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/ – Join types: Left Join, Full Outer Join, Left Join (if NULL), Inner Join, Right Join, Right Join (if NULL)
  37. Transforming Data for Analysis
  38. Pages That Lost SEO Traffic
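
The join-based comparison in Solution Part 1 can be sketched with a pandas outer merge. This is only a minimal illustration under stated assumptions, not the code from the talk's notebook: the column names (`page`, `sessions`) and the two small traffic tables are made up for the example.

```python
import pandas as pd

# Hypothetical organic-traffic exports for two periods (made-up numbers).
before = pd.DataFrame({
    "page": ["/", "/category/shoes", "/product/123", "/blog/old-post"],
    "sessions": [900, 400, 150, 60],
})
after = pd.DataFrame({
    "page": ["/", "/category/shoes", "/product/123", "/collections/new"],
    "sessions": [950, 180, 0, 40],
})

# A full outer join keeps pages seen in either period; indicator=True adds
# a `_merge` column saying which side each row came from.
merged = before.merge(
    after, on="page", how="outer",
    suffixes=("_before", "_after"), indicator=True,
)

# Pages missing from one period count as zero sessions in that period.
cols = ["sessions_before", "sessions_after"]
merged[cols] = merged[cols].fillna(0).astype(int)
merged["change"] = merged["sessions_after"] - merged["sessions_before"]

# "Left join (if NULL)": pages that only appear in the earlier period.
disappeared = merged.loc[merged["_merge"] == "left_only", "page"].tolist()

# Losers: any page whose sessions dropped, largest drop first.
losers = merged[merged["change"] < 0].sort_values("change")
print(losers[["page", "sessions_before", "sessions_after", "change"]])
```

The `indicator` column is what makes the set operations explicit: `left_only` rows are pages that vanished, `right_only` rows are new pages, and `both` rows can be compared directly.
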
  39. CHALLENGE: Find Which Page Groups Lost SEO Traffic (Manually). Solution Part 2 – Steps: Step 1: Crawl old pages to follow redirects – Step 2: Group pages using regular expressions – Step 3: Repeat the previous analysis
  40. Regular Expressions for SEOs and Digital Marketers (with Use Cases): https://netpeaksoftware.com/blog/regular-expressions-for-seos-and-digital-marketers-with-use-cases – Regex101.com
  41. Crawling Old Pages
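
Following redirects from the old URLs might look roughly like the sketch below, which uses the Requests library. The function name `resolve_redirects` and the returned field names are hypothetical, chosen for illustration; the talk's notebook may do this differently.

```python
import requests

def resolve_redirects(urls, session=None, timeout=10):
    """Return one dict per URL with its final URL, status code and hop count."""
    if session is None:
        session = requests.Session()
    rows = []
    for url in urls:
        try:
            # HEAD avoids downloading page bodies; allow_redirects=True makes
            # Requests follow the whole chain and record it in resp.history.
            resp = session.head(url, allow_redirects=True, timeout=timeout)
            rows.append({
                "old_url": url,
                "final_url": resp.url,
                "status": resp.status_code,
                "hops": len(resp.history),
            })
        except requests.RequestException as exc:
            # Keep failures in the output so they can be reviewed later.
            rows.append({"old_url": url, "final_url": None,
                         "status": None, "hops": 0, "error": str(exc)})
    return rows

# Example call (placeholder URL, performs a real network request):
# rows = resolve_redirects(["https://example.com/old-page"])
```

The resulting rows load directly into a DataFrame, so old URLs can be matched to their new destinations before traffic is compared. Some servers answer HEAD differently from GET, in which case `session.get` is the safer choice.
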
  42. Grouping with Regexes – Lookahead and Lookbehind Zero-Length Assertions: https://www.regular-expressions.info/lookaround.html
  43. https://github.com/plotly/plotly.py
  44. Page Groups That Lost SEO Traffic
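
Grouping pages with regular expressions and repeating the before/after comparison per group can be sketched as follows. The URL patterns, group names, and traffic numbers are assumptions made up for illustration; a real site needs its own rules.

```python
import re
from collections import Counter

# Hypothetical URL patterns. Order matters: the first match wins.
PAGE_GROUPS = [
    ("Home", re.compile(r"^/$")),
    ("Product", re.compile(r"^/products?/[^/]+/?$")),
    ("Category", re.compile(r"^/(category|collections)/")),
    ("Content", re.compile(r"^/blog/")),
]

def page_group(path):
    """Return the name of the first group whose regex matches the path."""
    for name, pattern in PAGE_GROUPS:
        if pattern.search(path):
            return name
    return "Other"

def sessions_by_group(rows):
    """Sum sessions per page group from (path, sessions) pairs."""
    totals = Counter()
    for path, sessions in rows:
        totals[page_group(path)] += sessions
    return totals

# Made-up traffic for the old and new URL structures.
before = [("/", 900), ("/category/shoes", 400), ("/product/123", 150), ("/about", 20)]
after = [("/", 950), ("/collections/shoes", 180), ("/products/123", 90), ("/about", 25)]

before_totals = sessions_by_group(before)
after_totals = sessions_by_group(after)

# Negative values are page groups that lost traffic.
change = {group: after_totals[group] - before_totals[group]
          for group in sorted(set(before_totals) | set(after_totals))}
print(change)
```

Because the old and new sites use different URL structures, mapping both onto shared group names is what makes the two periods comparable.
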
  45. Reverse Engineer Success Too
  46. How Do We Generalize This?
  47. Using Machine Learning!
  48. But before…
  49. Why Are Dominicans So Good at Baseball? (Credit: Matt West)
  50. Hit a Vitilla? Hit Anything https://www.youtube.com/watch?v=k8Aw2cBer84
  51. Vitilla https://en.wikipedia.org/wiki/Vitilla
  52. Learn Machine Learning and Solve Any SEO Problem
  54. Regex -> URL Matching – XPath -> Content Matching
  55. CHALLENGE: Find Which Page Groups Lost SEO Traffic (Automatically). Solution Part 3 – Steps: Step 1: Collect training data – Step 2: Prepare the data and split it into training and testing sets – Step 3: Find best model
  56. Python – BeautifulSoup https://www.crummy.com/software/BeautifulSoup/bs4/download/ – BeautifulSoup 4 Cheatsheet: http://akul.me/blog/2016/beautifulsoup-cheatsheet/ – An SEO’s guide to XPath: https://builtvisible.com/seo-guide-to-xpath/
  57. Python – Scikit-learn https://scikit-learn.org/ – Cheat Sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Lear heat_Sheet_Python.pdf – Hands-On Introduction To Scikit-learn (sklearn): https://towardsdatascience.com/hands-on-introduction-to-scikit-learn-sklearn-f3df652ff8f2 – Efficiently Searching Optimal Tuning Parameters: https://www.ritchieng.com/machine-learning-efficiently-search-tuning-param/
  58. Data Scientist Bottom Up Solution – Inside the BloomReach Algorithm - Using Machine Learning to Understand Page Templates: https://www.bloomreach.com/en/blog/2018/07/using-machine-learning-to-learn-page-templates.html
  59. Hamlet’s Observation and Simpler Solution: For most Ecommerce sites, the dimensions and quantity of images and input form elements change by page template. Let’s use that as the feature vector.
  60. Hamlet’s Observation and Simpler Solution
  61. Hamlet’s Observation and Simpler Solution
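
One way to turn that observation into numbers is to count images, their declared dimensions, and form inputs per page. The sketch below swaps in Python's standard-library `html.parser` instead of BeautifulSoup to stay dependency-free; the class name, the choice of tags, and the sample HTML are all assumptions for illustration.

```python
from html.parser import HTMLParser

class TemplateFeatures(HTMLParser):
    """Count <img> tags, their declared area, and form input elements."""

    # Assumption: these three tags stand in for "input form elements".
    INPUT_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.img_count = 0
        self.img_area = 0
        self.input_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.img_count += 1
            # Only width/height attributes are read here; rendered sizes
            # would require a real browser.
            try:
                self.img_area += int(attrs.get("width", 0)) * int(attrs.get("height", 0))
            except (TypeError, ValueError):
                pass
        elif tag in self.INPUT_TAGS:
            self.input_count += 1

def feature_vector(html):
    """Return [image count, total declared image area, form input count]."""
    parser = TemplateFeatures()
    parser.feed(html)
    return [parser.img_count, parser.img_area, parser.input_count]

# Made-up snippets standing in for a product page and a category page.
product_html = (
    '<img src="a.jpg" width="600" height="600">'
    '<img src="b.jpg" width="100" height="100">'
    '<form><select name="size"></select><input name="qty"></form>'
)
category_html = '<img src="t.jpg" width="200" height="200">' * 3

print(feature_vector(product_html))
print(feature_vector(category_html))
```

Pages built from the same template should produce similar vectors, which is what lets a classifier separate page groups without hand-written URL rules.
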
  62. Collecting Training Data
  63. Prepare and Split Data – What is One Hot Encoding? Why and when do you have to use it? https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
  64. Find Best Model – Cross Validation and Grid Search For Model Selection in Python: https://stackabuse.com/cross-validation-and-grid-search-for-model-selection-in-python/
  65. https://github.com/plotly/plotly.py
  66. Confusion Matrix – Simple guide to confusion matrix terminology: https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
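
The three steps of Solution Part 3 (split the data, search for the best model, inspect a confusion matrix) can be sketched with scikit-learn as below. The feature vectors and labels are synthetic and deliberately easy to separate, and the random forest and parameter grid are assumptions chosen for illustration, not necessarily what the talk used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic feature vectors: [image count, total image area, form inputs].
X, y = [], []
for i in range(20):
    X.append([2 + i % 3, 350_000 + 1_000 * i, 3 + i % 2])
    y.append("product")
    X.append([12 + i % 5, 480_000 + 1_000 * i, 0])
    y.append("category")
    X.append([1, 90_000 + 500 * i, 1])
    y.append("content")

# Step 2: hold out a stratified test set so every page group is represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 3: cross-validated grid search over a small, arbitrary parameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=3,
)
search.fit(X_train, y_train)

# Rows are true page groups, columns are predicted page groups.
labels = ["category", "content", "product"]
cm = confusion_matrix(y_test, search.predict(X_test), labels=labels)
print(search.best_params_)
print(cm)
```

Off-diagonal counts in the matrix show which page groups the model confuses with each other, which is more informative than a single accuracy figure when some templates look alike.
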
  67. But wait… We can do Better
  68. Using Deep Learning!
  69. CHALLENGE: Learn More Granular Page Groups that Lost SEO Traffic (Automatically). Solution Part 4 – Steps: Step 1: Label a few thousand web page screenshots with the visual features you care about – Step 2: Train a computer vision model to predict more granular page groups – Step 3: Find best model
  70. Python – Tensorflow & Keras https://www.tensorflow.org/ – Keras Cheat Sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf – TensorFlow Tutorial For Beginners: https://www.datacamp.com/community/tutorials/tensorflow-tutorial
  71. Bottleneck – The “Information Bottleneck” Theory: https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/
  72. AUTOENCODER: Input Image -> Encoder -> Bottleneck (Latent Space Representation) -> Decoder -> Reconstructed Image
  73. Caption Generator: 1. Input Image – 2. Convolutional Feature Extraction (14 x 14 Feature Map) – 3. RNN (LSTM) with attention over the image – 4. Word by word generation. Encoder -> Bottleneck (Latent Space Representation) -> Decoder
  74. Python – Tensorflow Object Detection API https://github.com/tensorflow/models/tree/master/research/object_detection
  75. Google AutoML – AutoML Vision API Tutorial: https://cloud.google.com/vision/automl/docs/tutorial
  76. Visually Labeling Screenshots
  77. Don't Take Security Advice from SEO Experts or Psychics https://www.troyhunt.com/dont-take-security-advice-from-seo-experts-or-psychics-neil-patel/
  78. Launch Jupyter Notebook in Google Colaboratory https://colab.research.google.com/github/ranksense/open-source/blob/master/Presentations/TechSEOBoost/2018/PythonforSEOTechSEOBoost2018_Hamlet_Batista.ipynb
  79. SUMMARY
  80. Summary – Practical applications of Python >= 3.6 for: Data extraction – Preparation – Analysis – Machine learning – Deep learning
  81. ABOUT RANKSENSE – Free Realtime SEO Monitor – Ongoing monitoring with no active crawls – Receive alerts about critical SEO issues – Apply quick, temporary fixes in Cloudflare – Create developer tickets for permanent solutions – Apply for Beta Access: www.ranksense.com
