12. Hamlet Batista | @hamletbatista | #TechSEOBoost
CHALLENGING SEO PROBLEMS
–
THAT NEED PROGRAMMING WORK
13. Hamlet Batista | @hamletbatista | #TechSEOBoost
IBM WebSphere => SAP Hybris
14. Hamlet Batista | @hamletbatista | #TechSEOBoost
IBM WebSphere Site
Category Page
(Links to one or more
Product Listing
Pages)
Product Listing Page
(Links to one or more
Product Pages)
Product Page
(Single SKU)
15. Hamlet Batista | @hamletbatista | #TechSEOBoost
SAP Hybris Site
Category Page
(Links to one or more
Product Pages)
Product Page
(Single SKU)
16. Hamlet Batista | @hamletbatista | #TechSEOBoost
Old Site
Product Pages
(717)
New Site
Product Pages
(442)
Product
Mapping
(3431)
17. Hamlet Batista | @hamletbatista | #TechSEOBoost
Old Site
Category
Pages
(371)
New Site
Category
Pages
(147)
Category
Mapping
(712)
27. Hamlet Batista | @hamletbatista | #TechSEOBoost
https://github.com/plotly/plotly.py
28. Hamlet Batista | @hamletbatista | #TechSEOBoost
Solution Part 1 – Steps
Step 1:
Pull Google Analytics Data
–
Step 2:
Store Data in Pandas DataFrame
–
Step 3:
Perform Data Preparation and
Perform Basic Set Operations
CHALLENGE: Find Which Pages Lost
SEO Traffic
29. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – Basics
https://pandas.pydata.org/
Python for Data Science Cheat Sheet
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf
30. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – Jupyter
Google Colaboratory
https://colab.research.google.com/notebooks/welcome.ipynb
31. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – Pandas
https://pandas.pydata.org/
Cheat Sheet
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
10 Minutes to pandas
https://pandas.pydata.org/pandas-docs/stable/10min.html
Intro to Pandas for Excel Super Users
https://towardsdatascience.com/intro-to-pandas-for-excel-super-users-dac1b38f12b0
32. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – Requests
WEB SCRAPING REFERENCE:
A Simple Cheat Sheet for Web Scraping with Python
https://blog.hartleybrody.com/web-scraping-cheat-sheet/
http://docs.python-requests.org/en/master/
33. Hamlet Batista | @hamletbatista | #TechSEOBoost
https://ga-dev-tools.appspot.com/query-explorer/
34. Hamlet Batista | @hamletbatista | #TechSEOBoost
Pulling Google Analytics Data
35. Hamlet Batista | @hamletbatista | #TechSEOBoost
Storing Data in a DataFrame
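A minimal sketch of this step, assuming the Google Analytics response carries a `columnHeaders` list and a `rows` list of string values (the shape returned by the Core Reporting API v3). The sample response, its values, and the helper name `to_dataframe` are hypothetical, not taken from the talk.

```python
import pandas as pd

# Hypothetical response: "columnHeaders" names the columns and
# "rows" holds every value as a string.
response = {
    "columnHeaders": [
        {"name": "ga:landingPagePath"},
        {"name": "ga:sessions"},
    ],
    "rows": [
        ["/category/shoes", "120"],
        ["/product/sku-1", "45"],
        ["/product/sku-2", "8"],
    ],
}


def to_dataframe(response):
    """Build a DataFrame from the response and convert sessions to integers."""
    columns = [h["name"].replace("ga:", "") for h in response["columnHeaders"]]
    df = pd.DataFrame(response.get("rows", []), columns=columns)
    df["sessions"] = df["sessions"].astype(int)
    return df


df = to_dataframe(response)
print(df.sort_values("sessions", ascending=False))
```

Converting the metric column to integers up front matters because later comparisons and sums would otherwise operate on strings.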
36. Hamlet Batista | @hamletbatista | #TechSEOBoost
Transforming Data for Analysis
https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/
Join types shown: Left Join, Full Outer Join, Left Join (if NULL), Inner Join, Right Join, Right Join (if NULL)
37. Hamlet Batista | @hamletbatista | #TechSEOBoost
Transforming Data for Analysis
38. Hamlet Batista | @hamletbatista | #TechSEOBoost
Pages That Lost SEO Traffic
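One way the set operations might look in pandas, as a sketch rather than the notebook's actual code. The two small DataFrames and their session counts are made up for illustration. The full outer join with `indicator=True` marks each page as present before only, after only, or in both periods.

```python
import pandas as pd

# Hypothetical organic sessions per landing page, before and after the migration.
before = pd.DataFrame({"page": ["/a", "/b", "/c"], "sessions": [100, 50, 30]})
after = pd.DataFrame({"page": ["/a", "/c", "/d"], "sessions": [90, 5, 40]})

# Full outer join keeps pages from both periods; indicator=True adds a
# "_merge" column saying which side each row came from.
merged = before.merge(after, on="page", how="outer",
                      suffixes=("_before", "_after"), indicator=True)

# "Left join (if NULL)": pages that had traffic before and none after.
disappeared = merged.loc[merged["_merge"] == "left_only", "page"].tolist()

# Pages present in both periods whose sessions dropped.
both = merged[merged["_merge"] == "both"].copy()
both["change"] = both["sessions_after"] - both["sessions_before"]
declined = both.loc[both["change"] < 0, ["page", "change"]]

print(disappeared)
print(declined)
```

Pages in `disappeared` are candidates for missing or broken redirects, while `declined` lists pages that still receive traffic but less of it.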
39. Hamlet Batista | @hamletbatista | #TechSEOBoost
Solution Part 2 – Steps
Step 1:
We will crawl old pages to follow
redirects
–
Step 2:
We will group pages using regular
expressions
–
Step 3:
Repeat the previous analysis
CHALLENGE: Find Which Page Groups Lost
SEO Traffic (Manually)
40. Hamlet Batista | @hamletbatista | #TechSEOBoost
Regular Expressions for SEOs and Digital Marketers (with Use Cases)
https://netpeaksoftware.com/blog/regular-expressions-for-seos-and-digital-marketers-with-use-cases
Regex101.com
41. Hamlet Batista | @hamletbatista | #TechSEOBoost
Crawling Old Pages
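A sketch of the crawl step, assuming the Requests library referenced earlier in the deck. The function name `follow_redirects` and the record layout are illustrative choices. The session is passed in so the function can be tried without network access; in real use it would be a `requests.Session()`.

```python
def follow_redirects(urls, session, timeout=10):
    """Return one record per URL: status code, final URL, and redirect hops."""
    records = []
    for url in urls:
        try:
            response = session.get(url, allow_redirects=True, timeout=timeout)
        except Exception as error:
            # Broad on purpose for this sketch; with Requests you would
            # normally catch requests.RequestException instead.
            records.append({"url": url, "status": None, "final_url": None,
                            "hops": 0, "error": str(error)})
            continue
        records.append({
            "url": url,
            "status": response.status_code,   # status of the final response
            "final_url": response.url,        # URL after all redirects
            "hops": len(response.history),    # number of redirects followed
            "error": None,
        })
    return records


# Intended use (needs the requests package and network access):
#   import requests
#   rows = follow_redirects(old_urls, requests.Session())
#   crawl = pandas.DataFrame(rows)
```

Mapping each old URL to its final URL lets traffic from the old site be compared against the page that actually replaced it.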
42. Hamlet Batista | @hamletbatista | #TechSEOBoost
Grouping with Regexes
Lookahead and Lookbehind Zero-Length Assertions
https://www.regular-expressions.info/lookaround.html
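A small sketch of grouping URL paths with regular expressions, including one negative lookahead. The URL patterns (`/p/`, `/c/`, `/search`) are hypothetical and would need to be adapted to the real site's URL structure.

```python
import re
from collections import Counter

# Hypothetical URL patterns, checked in order; the first match wins.
PAGE_GROUPS = [
    ("product", re.compile(r"/p/[\w-]+$")),
    ("category", re.compile(r"/c/[\w-]+$")),
    # Negative lookahead: any path that does not start with /search.
    ("other", re.compile(r"^/(?!search)")),
]


def classify(path):
    """Return the name of the first group whose pattern matches the path."""
    for name, pattern in PAGE_GROUPS:
        if pattern.search(path):
            return name
    return "unmatched"


paths = ["/shoes/c/running", "/shoes/p/sku-123", "/about-us", "/search?q=shoes"]
print(Counter(classify(path) for path in paths))
```

The lookahead asserts what must not follow the leading slash without consuming any characters, which is why it is called a zero-length assertion.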
43. Hamlet Batista | @hamletbatista | #TechSEOBoost
https://github.com/plotly/plotly.py
44. Hamlet Batista | @hamletbatista | #TechSEOBoost
Page Groups That Lost SEO Traffic
45. Hamlet Batista | @hamletbatista | #TechSEOBoost
Reverse Engineer Success Too
46. Hamlet Batista | @hamletbatista | #TechSEOBoost
How Do We Generalize This?
47. Hamlet Batista | @hamletbatista | #TechSEOBoost
Using Machine Learning!
55. Hamlet Batista | @hamletbatista | #TechSEOBoost
Solution Part 3 – Steps
Step 1:
Collect training data
–
Step 2:
Prepare the data and split it into
training and testing sets
–
Step 3:
Find best model
CHALLENGE: Find Which Page Groups Lost
SEO Traffic (Automatically)
56. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – BeautifulSoup
BeautifulSoup 4 Cheatsheet
http://akul.me/blog/2016/beautifulsoup-cheatsheet/
https://www.crummy.com/software/BeautifulSoup/bs4/download/
An SEO’s guide to XPath
https://builtvisible.com/seo-guide-to-xpath/
58. Hamlet Batista | @hamletbatista | #TechSEOBoost
Data Scientist Bottom Up Solution
Inside the BloomReach Algorithm - Using Machine Learning to Understand Page Templates
https://www.bloomreach.com/en/blog/2018/07/using-machine-learning-to-learn-page-templates.html
59. Hamlet Batista | @hamletbatista | #TechSEOBoost
For most ecommerce sites, the number and
dimensions of images and form input elements
vary by page template.
Let’s use those as the feature vector.
Hamlet’s Observation
and Simpler Solution
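The observation above could be turned into features roughly like this. The deck references BeautifulSoup; this sketch swaps in the standard library's `html.parser` to stay dependency-free, and it reads only the `width` and `height` attributes declared in the HTML, which is a simplification (rendered sizes would need a browser). The class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TemplateFeatures(HTMLParser):
    """Count <img> and <input> tags and sum the declared image area."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.inputs = 0
        self.image_area = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self.images += 1
            try:
                width = int(attributes.get("width") or 0)
                height = int(attributes.get("height") or 0)
            except ValueError:
                # Non-numeric values such as width="100%" are ignored.
                width = height = 0
            self.image_area += width * height
        elif tag == "input":
            self.inputs += 1


def feature_vector(html):
    """Return [image count, input count, total declared image area]."""
    parser = TemplateFeatures()
    parser.feed(html)
    return [parser.images, parser.inputs, parser.image_area]


# Hypothetical product page fragment.
sample = """
<img src="a.jpg" width="400" height="300">
<img src="b.jpg" width="50" height="50">
<form><input name="qty"><input type="submit"></form>
"""
print(feature_vector(sample))
```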
60. Hamlet Batista | @hamletbatista | #TechSEOBoost
Hamlet’s Observation and Simpler Solution
61. Hamlet Batista | @hamletbatista | #TechSEOBoost
Hamlet’s Observation and Simpler Solution
62. Hamlet Batista | @hamletbatista | #TechSEOBoost
Collecting Training Data
63. Hamlet Batista | @hamletbatista | #TechSEOBoost
What is One Hot Encoding? Why and when do you have to use it?
https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
Prepare and Split Data
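A sketch of one-hot encoding with `pandas.get_dummies`. The columns and values are invented for illustration, and the slides do not say which columns were encoded in the original notebook.

```python
import pandas as pd

# Hypothetical training rows: two numeric features, one categorical
# feature, and the page group label.
data = pd.DataFrame({
    "images": [12, 3, 40],
    "inputs": [2, 5, 1],
    "url_depth": ["deep", "shallow", "deep"],
    "page_group": ["product", "category", "listing"],
})

# One-hot encode the categorical feature; keep the label column separate.
features = pd.get_dummies(data.drop(columns="page_group"), columns=["url_depth"])
labels = data["page_group"]

print(features.columns.tolist())
```

Each distinct category becomes its own indicator column, so a model does not read an artificial ordering into the category values.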
64. Hamlet Batista | @hamletbatista | #TechSEOBoost
Cross Validation and Grid Search For Model Selection in Python
https://stackabuse.com/cross-validation-and-grid-search-for-model-selection-in-python/
Find Best Model
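A sketch of the split and model search using scikit-learn. The data is synthetic (generated with `make_classification`), and logistic regression with a small grid over `C` is an arbitrary stand-in; the slides do not state which models or parameters were compared.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for page features (X) and page group labels (y).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Hold out 25% of the rows for the final test; stratify keeps the class
# proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Try each value of C with 5-fold cross validation on the training set.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)
print(round(test_accuracy, 3))
```

The test set is touched only once, after the search, so the reported accuracy is not inflated by the parameter tuning.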
65. Hamlet Batista | @hamletbatista | #TechSEOBoost
https://github.com/plotly/plotly.py
66. Hamlet Batista | @hamletbatista | #TechSEOBoost
Simple guide to confusion matrix terminology
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
Confusion Matrix
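A confusion matrix can be computed by hand in a few lines. The labels and predictions below are made up; rows are the actual page groups and columns are the predicted ones.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


labels = ["category", "product"]
actual = ["product", "product", "category", "category", "product"]
predicted = ["product", "category", "category", "category", "product"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}", row)
```

Off-diagonal cells show which page groups the model confuses with each other, which overall accuracy alone hides.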
67. Hamlet Batista | @hamletbatista | #TechSEOBoost
But wait… We can do Better
68. Hamlet Batista | @hamletbatista | #TechSEOBoost
Using Deep Learning!
69. Hamlet Batista | @hamletbatista | #TechSEOBoost
Solution Part 4 – Steps
Step 1:
Label a few thousand web page
screenshots with the visual features
you care about
–
Step 2:
Train a computer vision model to
predict more granular page groups
–
Step 3: Find best model
CHALLENGE: Learn More Granular Page
Groups that Lost SEO Traffic (Automatically)
70. Hamlet Batista | @hamletbatista | #TechSEOBoost
https://www.tensorflow.org/
Keras Cheat Sheet
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf
TensorFlow Tutorial For Beginners
https://www.datacamp.com/community/tutorials/tensorflow-tutorial
Python – Tensorflow & Keras
71. Hamlet Batista | @hamletbatista | #TechSEOBoost
Bottleneck
The “Information Bottleneck” Theory
https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/
72. Hamlet Batista | @hamletbatista | #TechSEOBoost
AUTOENCODER
[Diagram: Input Image, Encoder, Bottleneck (Latent Space Representation), Decoder, Reconstructed Image]
73. Hamlet Batista | @hamletbatista | #TechSEOBoost
Caption Generator
[Diagram: 1. Input Image, 2. Convolutional Feature Extraction (14 x 14 Feature Map), 3. RNN (LSTM) with attention over the image, 4. Word by word generation]
[Diagram: Encoder, Bottleneck (Latent Space Representation), Decoder]
74. Hamlet Batista | @hamletbatista | #TechSEOBoost
Python – Tensorflow Object Detection API
https://github.com/tensorflow/models/tree/master/research/object_detection
75. Hamlet Batista | @hamletbatista | #TechSEOBoost
AutoML Vision API Tutorial
https://cloud.google.com/vision/automl/docs/tutorial
Google AutoML
76. Hamlet Batista | @hamletbatista | #TechSEOBoost
Visually Labeling Screenshots
77. Hamlet Batista | @hamletbatista | #TechSEOBoost
Don't Take Security Advice from SEO Experts or Psychics
https://www.troyhunt.com/dont-take-security-advice-from-seo-experts-or-psychics-neil-patel/
78. Hamlet Batista | @hamletbatista | #TechSEOBoost
Launch Jupyter Notebook in Google Colaboratory
https://colab.research.google.com/github/ranksense/open-source/blob/master/Presentations/TechSEOBoost/2018/PythonforSEOTechSEOBoost2018_Hamlet_Batista.ipynb
80. Hamlet Batista | @hamletbatista | #TechSEOBoost
Summary
Practical applications
of Python >= 3.6
for:
Data extraction
–
Preparation
–
Analysis
–
Machine learning
–
Deep learning
81. Hamlet Batista | @hamletbatista | #TechSEOBoost
Free Realtime SEO Monitor
–
Ongoing monitoring with no active crawls
–
Receive alerts about critical SEO issues
–
Apply quick, temporary fixes in Cloudflare
–
Create developer tickets for permanent solutions
ABOUT RANKSENSE
– Apply for Beta Access
www.ranksense.com
Editor's Notes
This is what we will do to correct that: Step 1: We will crawl each page from the first set and record the status code (and the final URL of any redirects). Step 2: Repeat the analysis.