Automated Duplicate Content Consolidation with Google Cloud Functions

Avoid duplicate content and don’t leave money on the table with unoptimized groups of pages linked by canonical declarations! Particularly in ecommerce, you can increase Google's confidence by making sure your groups of product URLs are perfectly canonicalized and clear to search engines.

In this keynote, we will use Python in Google Cloud Functions to reorganize canonical clusters automatically and maximize SEO performance.
1. We will use OnCrawl to find duplicate product clusters by SKU
2. Then we will pull search traffic for product variants (color, size, etc.) using SEMrush
3. We will use an algorithm to regroup the clusters based on search demand
4. We will automate the whole process using Cloud Functions and Pub/Sub queues
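
Steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the talk: the field names (`url`, `sku`, `traffic`), the function name `assign_canonicals`, and the `min_variant_traffic` threshold are all assumptions. It groups URLs by SKU, picks the URL with the most search traffic as the cluster "leader", and lets a variant keep a self-referential canonical only when it has search demand of its own.

```python
from collections import defaultdict


def assign_canonicals(pages, min_variant_traffic=1):
    """Map each URL to its canonical URL.

    pages: iterable of dicts with hypothetical keys 'url', 'sku', 'traffic'
    (for example, a crawl export joined with search traffic estimates).
    min_variant_traffic: assumed threshold above which a variant is
    considered to have its own search demand.
    """
    # Step 1: build clusters of duplicate product URLs keyed by SKU.
    clusters = defaultdict(list)
    for page in pages:
        clusters[page["sku"]].append(page)

    canonicals = {}
    for members in clusters.values():
        # Leader: most search traffic; ties go to the shortest URL,
        # then alphabetical order, so the result is deterministic.
        leader = min(
            members,
            key=lambda p: (-p["traffic"], len(p["url"]), p["url"]),
        )
        for page in members:
            if page["traffic"] >= min_variant_traffic:
                # People search for this variant directly: keep it indexable.
                canonicals[page["url"]] = page["url"]
            else:
                # No demand of its own: consolidate into the leader.
                canonicals[page["url"]] = leader["url"]
    return canonicals


pages = [
    {"url": "/ring", "sku": "R1", "traffic": 120},
    {"url": "/ring?color=gold", "sku": "R1", "traffic": 0},
    {"url": "/ring?color=silver", "sku": "R1", "traffic": 35},
]
result = assign_canonicals(pages, min_variant_traffic=10)
```

Here the gold variant would canonicalize to `/ring`, while the silver variant keeps its own canonical because it clears the assumed threshold. Other leader criteria, such as link counts or crawl frequency, could replace the traffic-based sort key.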

Join Hamlet Batista to look at ways to automate canonicals for ecommerce sites in order to improve product visibility.

  1. Automated Duplicate Content Consolidation with Google Cloud Functions
  2. Speaking today / Presented by: Automating Google Lighthouse Hamlet Batista // RankSense @hamletbatista
  4. Agenda ➢ Finding marginal but repeatable success ➢ Scaling it with automation
  5. Success Story
  6. ➢ No www to non-www redirects ➢ No canonicals ➢ Redundant parameter URLs ➢ Only 1.40% of indexed pages with search clicks (out of +300k pages)
  7. The Google SEO Scorecard Report
  8. ➢ Duplicate content consolidation can be executed relatively quickly, as it requires a small set of technical changes ➢ You will likely see improved rankings within weeks after the corrections are in place ➢ New changes and improvements to your site are picked up faster by Google
  9. ➢ Natzir found that pages ranking for the same keyword received less total traffic than they did once consolidated with redirects ➢ Same idea, but from a keyword perspective
  10. Reverse Engineering
  11. ➢ Finding repeatable success ➢ Searching for a machine learning model to connect new visits to technical SEO changes ➢ We focused on the impact of links, indexing, and canonical clustering
  12. Our best predictive model achieved 85% test accuracy ➢ Canonicalization drives repeatable success ➢ The size of the canonical cluster turned out to be a strong predictor
  13. Machine Learning 101: One oversimplified way to think about a machine learning model is to picture a linear regression function in Excel/Sheets. We predicted new users (Y) within canonicalized clusters as a function of the size of the clusters (X).
  14. To Canonicalize or Not to Canonicalize
  15. Current canonical clustering is mostly self-referential (orange). Every product variant canonicalizes to itself.
  16. Their optimal canonical setup is the inverse. Most clusters should canonicalize to one product “leader”.
  17. For some products, people specify the color they want directly in Google. For other products, they don’t: they decide on a color after seeing the options available on the site.
  19. Technical Plan ➢ Build clusters using OnCrawl ➢ Get search demand using SEMrush ➢ Canonicalization algorithm ➢ Experiment on CDN using RankSense ➢ Automate everything using Cloud Functions and Pub/Sub queues
  20. Coupled vs Decoupled Systems
  21. Pub/Sub is an asynchronous messaging service that decouples services that produce events from services that process events. It allows us to connect OnCrawl, SEMrush, and RankSense asynchronously to complete a custom workflow.
  22. Cloud Scheduler acts as a single pane of glass, allowing us to manage all our automation tasks from one place. It lets us trigger our custom workflow on a recurring schedule as search demand changes with the seasons.
  23. Clustering with OnCrawl
  24. Search Demand Tracking with SEMrush
  25. ➢ Cloud Scheduler triggers the OnCrawl Cloud Function, which uploads each crawl export to Cloud Storage ➢ The Cloud Storage update triggers the SEMrush Cloud Function, which then exports search demand data to Cloud Storage
  26. Canonicalization Algorithm
  27. ➢ We are going to perform an intermediate step and force all product groups to canonicalize to the “leader” URL in the group. ➢ The “leader” could be the URL with the most search traffic, the most internal/external links, or the highest crawl frequency.
  28. We end up with one cluster that we need to update, which means that David Yurman is leaving a lot of money on the table with its current setup, which relies on self-referential canonicals.
  29. Deploying to Cloudflare’s CDN with RankSense
  30. We are going to use the RankSense API to publish our new canonical clusters as experiments in the Cloudflare CDN.
  31. ➢ We automatically populate a Google Sheet with the changes ➢ We submit the Sheet to RankSense’s PRODUCTION environment
  32. Resources to Learn More ➢ Python code covered in this presentation ➢ Advanced Duplicate Content Consolidation with Python duplicate-content-consolidation-python/314471/ ➢ Cloud Functions ➢ Google PubSub ➢ Introduction to Python for SEO Pros to-python-seo-spreadsheets/342779/
  33. Thank you!
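
The decoupled workflow on slides 21 and 25 can be pictured as two small Python background functions, one triggered by a Pub/Sub message and one by a Cloud Storage upload. This is only a sketch under stated assumptions: the function names, the `project_id` payload field, and the `.csv` filter are hypothetical, and the actual OnCrawl and SEMrush calls are left out. What it relies on is the documented shape of background function events: Pub/Sub delivers a base64-encoded `data` field, and Cloud Storage delivers `bucket` and `name`.

```python
import base64
import json


def parse_pubsub_message(event):
    """Decode the JSON payload carried in a Pub/Sub event's 'data' field."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


def on_crawl_requested(event, context):
    """Hypothetical entry point for a Pub/Sub-triggered function.

    Cloud Scheduler can publish a message to the topic on a recurring
    schedule; 'project_id' is an assumed field in that message.
    """
    message = parse_pubsub_message(event)
    # A real function would fetch the crawl export here and write it
    # to Cloud Storage, which in turn fires the next function.
    return f"export requested for project {message['project_id']}"


def on_export_uploaded(event, context):
    """Hypothetical entry point for a Cloud Storage-triggered function."""
    bucket, name = event["bucket"], event["name"]
    if not name.endswith(".csv"):
        # Ignore uploads that are not crawl exports (assumed naming rule).
        return None
    # A real function would look up search demand for the URLs in this
    # file and store the result for the canonicalization step.
    return f"gs://{bucket}/{name}"
```

The return values are there only to make the handlers easy to exercise locally; the platform ignores what a background function returns. Because each stage reacts to an event rather than calling the next stage directly, any one of them can be replaced or retried without touching the others.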