What a search engine can teach you about product sitemaps - BrightonSEO April 2018

Pricesearcher
Vlassios Rizopoulos
Chief Technology Officer @ pricesearcher.com
What a search engine can teach you about product sitemaps
@Pricesearcher #BrightonSEO
BACKGROUND
Pricesearcher is a vertical search engine focusing on products and their prices.
Our mission is to provide access to all the world’s prices in one place.
OUR MISSION IS TO INDEX ALL THE WORLD’S PRICES
SOURCES OF DATA
• Product feeds from 5000+ retailers
• Developed plugins
• Developed PriceBot to complete the picture
PROGRESS TO DATE
• Gathered data on 1.1 billion products
• Online in 11 countries: GB / US / DE / FR / IT / IE / NO / SE / FI / DK / NG
• Gathered 91 billion price points for our products
• On average we check the price of a product 3 times a day
We have gathered:
• 17,000,000 ISBNs
• 144,000,000 MPNs
• 73,000,000 SKUs
• 157,000,000 GTINs
WHAT IS PRICEBOT?
PriceBot is our proprietary crawler, built to discover products and turn unstructured data from web pages into structured data for our product database.
Pricesearcher is the only product search engine that crawls to complement its product coverage.
PriceBot is fully robots.txt compliant, leaves behind a footprint in its user agent and has a built-in feedback mechanism.
http://www.pricesearcher.com/pricebot
WHAT INFORMATION IS PRICEBOT COLLECTING?
We are looking to extract the following fields:
• Product Title
• Product Image
• Product Price
and optionally:
• Product Description
• Product Identifier (GTIN/UPC/EAN/ISBN)
• Product Brand
• Product Category
• Product Stock Availability
INITIAL CRAWLING TECH DEPENDED ON SITEMAPS
Sitemaps vastly simplified discovering all the products from retailers
DATA SAMPLE
We will focus on 4,000 UK retailers that we currently crawl using XML sitemaps, discovering 20 million+ products
TOP 10 DATA INSIGHTS FROM OUR CRAWLING TECH
1. SITEMAP DATA
• 91% of retailer websites have an XML sitemap
• 61% of retailer websites have an XML sitemap with product links
• 54% of retailer websites have an XML sitemap with product links that’s regularly updated
2. BLOCKING OF CRAWLERS
• 2% of retailer websites have blocked us unintentionally (generic robots.txt entry or 403 automatic block)
• 0.05% of retailer websites have blocked us intentionally (robots.txt entry)
3. EXTRACTION USING METADATA STANDARDS
• 41% of retailer websites have product title + price + image defined using meta / opengraph tags
• 36% of retailer websites have product title + price + image defined using meta / itemprop tags (schema)
• 12% of retailer websites have product title + price + image defined using both
4. EXTRACTION USING JAVASCRIPT
• 2% of retailer websites: no info extracted due to heavy rendering being uneconomical
• 1% of retailer websites: price cannot be extracted as it is converted / calculated on the fly
5. SITEMAP LINKS
• 2% of retailer websites have multiple links to the same product pages
• 3% of retailer websites have multiple links to pages that return 404 codes
6. PRODUCT IDENTIFIERS
• 24% of retailer websites provide a GTIN-14, EAN-13, UPC-12/8 for their products
• 7% of retailer websites provide an SKU for their products
• 3% of retailer websites provide an ISBN for their products
7. PRODUCT CATALOGUE SIZE
• 14% of retailer websites have fewer than 5,000 product links in their sitemap
• 79% of retailer websites have between 5,000 and 30,000 links
• 7% of retailer websites have more than 30,000 links
8. DATA RICHNESS #1
• 17% of retailer websites provide a brand for their products
• 44% of retailer websites provide a category for their products
• 62% of retailer websites provide a stock indicator for their products
9. DATA RICHNESS #2 – NUMBER OF DIMENSIONS
• Crawler: 6 dimensions
• Plugin: 12 dimensions
• Product Feed: 23 dimensions
10. SITEMAP DISCOVERABILITY
• 33% of retailer websites list their sitemap in robots.txt
TOP 5 ACTION POINTS (SUGGESTIONS)
ACTION POINT #1 - SITEMAP
• Have an XML sitemap
• Have the path of your sitemap listed in robots.txt
• Have your product pages in your sitemap
• Regularly update your sitemap
• Don’t point to 404 pages from your sitemap
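As a rough illustration of the checks above, the sketch below reads the `Sitemap:` lines from a robots.txt file, lists the `<loc>` entries of an XML sitemap and flags URLs that appear more than once. It is not how PriceBot works internally; the function names, the `shop.example` domain and the sample data are all assumptions made for the example. Checking for 404s would additionally need an HTTP request per URL, which is left out here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by <urlset> and <sitemapindex> documents in the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared with 'Sitemap:' lines in robots.txt."""
    found = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml.strip())
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def duplicated_urls(urls):
    """Return the URLs that are listed more than once, sorted alphabetically."""
    counts = Counter(urls)
    return sorted(url for url, n in counts.items() if n > 1)


# Hypothetical sample data, for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /basket/
Sitemap: https://shop.example/sitemap.xml
"""

SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/products/kettle</loc><lastmod>2018-04-20</lastmod></url>
  <url><loc>https://shop.example/products/toaster</loc></url>
  <url><loc>https://shop.example/products/kettle</loc></url>
</urlset>
"""

if __name__ == "__main__":
    print("Sitemaps listed in robots.txt:", sitemaps_from_robots(SAMPLE_ROBOTS))
    urls = urls_from_sitemap(SAMPLE_SITEMAP)
    print("Links in sitemap:", len(urls))
    print("Listed more than once:", duplicated_urls(urls))
```

Running a check like this after each catalogue update is one way to catch duplicate links before a crawler does.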
ACTION POINT #2 - META / OPENGRAPH / ITEMPROP
• Provide structured information on your products using meta itemprop (schema) or opengraph tags
• Provide as much structured data as possible
• Implement them as close as possible to the standards
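To make the point above concrete, here is a minimal sketch of how a crawler could read title, image and price from `<meta>` tags using only Python's standard library. The sample page, the `shop.example` URL and the class and function names are assumptions for illustration; the property names (`og:title`, `og:image`, `product:price:amount`, `product:price:currency`) and the schema.org `gtin13` itemprop are commonly used markup, but this is not a description of PriceBot's actual extraction logic.

```python
from html.parser import HTMLParser


class ProductMetaParser(HTMLParser):
    """Collect the content of <meta> tags keyed by their property or itemprop."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("itemprop")
        content = attr.get("content")
        # Keep the first value seen for each key.
        if key and content and key not in self.fields:
            self.fields[key] = content


def extract_product_fields(html):
    """Return a dict of meta property/itemprop names to their content values."""
    parser = ProductMetaParser()
    parser.feed(html)
    return parser.fields


# Hypothetical product page head, for illustration only.
SAMPLE_PAGE = """
<html><head>
  <meta property="og:title" content="Blue Kettle 1.7L">
  <meta property="og:image" content="https://shop.example/img/kettle.jpg">
  <meta property="product:price:amount" content="24.99">
  <meta property="product:price:currency" content="GBP">
  <meta itemprop="gtin13" content="5012345678900">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, value in extract_product_fields(SAMPLE_PAGE).items():
        print(name, "=", value)
```

Because the values sit in the static HTML, a crawler can read them without rendering any JavaScript.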
ACTION POINT #3 – JAVASCRIPT & PRICE
• Be wary of the side effects of a JavaScript-heavy site on crawling
• If you do implement a JavaScript-heavy site, meta tags with structured information are even more important!
• Be wary when converting the price based on geolocation
• Don’t perform the price conversion in JavaScript
ACTION POINT #4 - ANTI-CRAWL & ROBOTS.TXT
• Ask yourselves what’s the benefit of an anti-crawl mechanism
• Ask yourselves what’s the benefit of blocking all crawlers in robots.txt
• Control the speed of crawlers using crawl-delay
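The difference between slowing crawlers down and blocking them can be checked with Python's standard `urllib.robotparser`, as in the sketch below. The two robots.txt samples, the `shop.example` URLs and the use of `PriceBot` as the user-agent token are assumptions for the example; the real token should be taken from the crawler's own documentation.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt content held in a string and return the parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt that throttles crawlers but still lets them in.
THROTTLED = """User-agent: *
Crawl-delay: 2
Disallow: /checkout/
"""

# Hypothetical generic entry that blocks every crawler, wanted or not.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    product_url = "https://shop.example/products/kettle"
    for label, text in (("throttled", THROTTLED), ("block all", BLOCK_ALL)):
        rules = load_rules(text)
        print(label,
              "| may fetch product page:", rules.can_fetch("PriceBot", product_url),
              "| crawl delay:", rules.crawl_delay("PriceBot"))
```

With the first file a compliant crawler can still reach product pages, just more slowly; with the second it cannot reach anything.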
ACTION POINT #5 - HAVE A SITEMAP MEETING
• Have a sitemap strategy; it’s just as important as your SEO strategy
• Sitemaps contribute massively to discoverability, yet are often overlooked
• Make sure you are doing everything you can to provide structured information
• Review your robots.txt contents
• Address missed opportunities from your sitemap sooner rather than later
THANKS FOR LISTENING!
PriceBot
http://www.pricesearcher.com/pricebot
Keen to hear from you with feedback about PriceBot or Pricesearcher in general.
Feel free to drop me a line at vlassios@pricesearcher.com or catch up with me at our stand B11 in the expo hall

Editor's Notes

  1. Unintentional blocks:
     • Crawl-delay is set so high that it would take weeks to crawl a single site
     • All user-agents are blocked in robots.txt
     • An automated anti-crawl system kicks in and starts serving 403s
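To see how a crawl-delay turns into weeks, a back-of-the-envelope estimate is enough. The sketch below assumes one request per crawl-delay interval and no parallel fetching; the 60-second delay is a hypothetical value, while 30,000 links matches the upper catalogue-size band from the insights.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(url_count, crawl_delay_seconds):
    """Days needed to fetch url_count pages at one request per crawl-delay interval."""
    return url_count * crawl_delay_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 30,000 product links with a hypothetical 60-second crawl-delay.
    print(round(crawl_days(30_000, 60), 1), "days for a single pass")
```

At that rate a single pass takes roughly three weeks, so prices cannot be rechecked several times a day.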