
Screaming Frog + Xpath: BrightonSEO April 2019


  1. Screaming Frog + Xpath: A Guide to Analyse the Pants Off Your Competition. Sabine Langmann // sabine-langmann.com // @SabTheLa https://slideshare.net/sabinelangmann
  2. 12.04.2019 Sabine Langmann bit.ly/sfx-2019
  3. Level 1
  4. What is this about?
  5. We'd like: to crawl specific elements on our own web pages or on those of our competition. We use: Screaming Frog's Custom Extraction + XPath
  9. Level 2
  10. Who am I? https://www.sabine-langmann.com https://www.linkedin.com/in/sabine-langmann/ @SabTheLa Slides: bit.ly/sfx-2019
  11. Level 3
  12. Xpath
  13. XPath (XML Path Language) is a query language for selecting nodes from an XML document. (Source: Wikipedia)
  14. Simple Syntax: node = any page element (e.g. h2, a, p, div); // addresses a certain node; attribute = an attribute of a node (e.g. class, id); @ addresses a certain attribute; count() counts the addressed nodes
  15. Simple Syntax: //node[@attribute="attribute_name"]
  16. Simple Syntax: //node[@attribute1="attribute_name1" and @attribute2="attribute_name2"]
  17. Simple Syntax: count(//node[@attribute="attribute_name"])
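
The three syntax patterns above can be tried outside Screaming Frog. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath: it has no `and` and no `count()`, so chained predicates and `len()` stand in for them here. The sample markup is hypothetical and not taken from any real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (an assumption for illustration).
SAMPLE = """
<html><body>
  <h2 class="headline" id="top">First</h2>
  <h2 class="headline">Second</h2>
  <h2 class="teaser">Third</h2>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Pattern: //node[@attribute="attribute_name"]
headlines = root.findall('.//h2[@class="headline"]')

# Pattern: //node[@attribute1="..." and @attribute2="..."]
# ElementTree has no "and"; two chained predicates select the same nodes.
top_headlines = root.findall('.//h2[@class="headline"][@id="top"]')

# Pattern: count(//node[@attribute="attribute_name"])
# ElementTree has no count(); len() of the result list gives the same number.
headline_count = len(headlines)

print([h.text for h in headlines])
print([h.text for h in top_headlines])
print(headline_count)
```

A full XPath 1.0 engine, such as the one behind Screaming Frog's Custom Extraction, accepts the `and` and `count()` forms directly as written on the slides.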
  18. Level 4
  19. Examples
  20. BBC.com vs. TheGuardian.com
  21. BBC.com
  23. How many images? How many H2, H3, etc.? How many words? How many links to which pages?
  27. What am I searching for?
  28. In the <li> element, I'm searching for text (the name of the topic tag)
  29. Suitable Xpath selector: //li[@class="tags-list__tags" and @data-entityid="topic_link_bottom"]
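
To see what this selector returns, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. The markup and the topic names are hypothetical stand-ins for the real BBC page, and the `and` condition is expressed as two chained predicates because ElementTree does not support `and`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the selector; the real page will differ.
SAMPLE = """
<ul>
  <li class="tags-list__tags" data-entityid="topic_link_bottom"><a href="/a">Topic A</a></li>
  <li class="tags-list__tags" data-entityid="topic_link_bottom"><a href="/b">Topic B</a></li>
  <li class="tags-list__tags" data-entityid="other"><a href="/c">Ignored</a></li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

# Both attribute conditions must hold, as in the "and" selector on the slide.
items = root.findall(
    './/li[@class="tags-list__tags"][@data-entityid="topic_link_bottom"]'
)

# Collect the visible text of each matching <li>, including nested elements.
topic_tags = ["".join(li.itertext()).strip() for li in items]
print(topic_tags)
```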
  34. https://www.bbc.com/news/uk-politics-.*
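
The pattern on this slide reads as a regular expression that narrows the crawl to one section of the site. The sketch below only illustrates how such a pattern behaves, assuming it is matched against the whole URL; the candidate URLs are made up. Note that the unescaped dots match any character, which is harmless here but looser than a literal dot.

```python
import re

# Pattern copied from the slide; how the crawler applies it is an assumption.
INCLUDE_PATTERN = re.compile(r"https://www.bbc.com/news/uk-politics-.*")

# Hypothetical URLs for illustration only.
candidate_urls = [
    "https://www.bbc.com/news/uk-politics-00000001",
    "https://www.bbc.com/news/business-00000002",
    "https://www.bbc.com/sport",
]

# Keep only URLs that the pattern matches in full.
included = [url for url in candidate_urls if INCLUDE_PATTERN.fullmatch(url)]
print(included)
```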
  35. Don't forget ;)
  37. Result
  38. Ain't nobody got time for Excel. Better listen to Ben! Bit.ly/abjsd
  39. TheGuardian.com
  44. What am I searching for?
  45. In <div class="submeta"> I'm searching for the topic tag names
  46. Suitable Xpath selectors: //div[@class="submeta__section-labels"] and //div[@class="submeta__keywords"]
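
Here two separate selectors are used, one per container. A minimal sketch of running both and collecting the text fragments inside each, again with Python's standard-library `xml.etree.ElementTree` and hypothetical markup (the helper name `texts_in` and the sample labels are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the two selectors; the real page will differ.
SAMPLE = """
<div class="submeta">
  <div class="submeta__section-labels"><a href="/s">Section A</a></div>
  <div class="submeta__keywords"><a href="/k1">Keyword 1</a> <a href="/k2">Keyword 2</a></div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)


def texts_in(class_name):
    """Return the non-empty text fragments inside the first matching div."""
    container = root.find(f'.//div[@class="{class_name}"]')
    if container is None:
        return []
    return [t.strip() for t in container.itertext() if t.strip()]


section_labels = texts_in("submeta__section-labels")
keywords = texts_in("submeta__keywords")
print(section_labels, keywords)
```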
  48. https://www.theguardian.com/sitemaps/news.xml
  49. Result
  50. Yourcat.co.uk vs. TheFoodaholic.co.uk
  51. Yourcat.co.uk
  53. How many links in editorial content? How many internal/external?
  55. <div class="post-body-container">
  56. "How many links in editorial content?" Suitable Xpath selector: count(//div[@class="post-body-container"]//p//a)
  57. "How many internal editorial links?" Suitable Xpath selector: count(//div[@class="post-body-container"]//p//a[starts-with(@href, "https://www.yourcat.co.uk/")])
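
The two counts can be reproduced on a small sample. This is a sketch under stated assumptions: the markup and link targets are hypothetical, and because Python's standard-library `xml.etree.ElementTree` lacks `count()` and `starts-with()`, `len()` and `str.startswith()` are used in their place.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one navigation link outside the article, three inside it.
SAMPLE = """
<body>
  <div class="nav"><p><a href="https://www.yourcat.co.uk/">Home</a></p></div>
  <div class="post-body-container">
    <p>Text <a href="https://www.yourcat.co.uk/page-a">internal</a>
       and <a href="https://example.com/">external</a>.</p>
    <p><span><a href="https://www.yourcat.co.uk/page-b">nested internal</a></span></p>
  </div>
</body>
""".strip()

root = ET.fromstring(SAMPLE)

# Equivalent of //div[@class="post-body-container"]//p//a
editorial_links = root.findall('.//div[@class="post-body-container"]//p//a')

# Equivalent of the starts-with(@href, ...) predicate, done in Python.
internal_links = [
    a for a in editorial_links
    if a.get("href", "").startswith("https://www.yourcat.co.uk/")
]

total_count = len(editorial_links)
internal_count = len(internal_links)
external_count = total_count - internal_count
print(total_count, internal_count, external_count)
```

The navigation link is not counted because it sits outside the `post-body-container` div, which is the point of scoping the selector to the editorial container.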
  61. Result
  62. TheFoodaholic.co.uk
  64. How many links in editorial content? How many internal/external?
  67. "How many links in editorial content?" Suitable Xpath selector: count(//div[@itemprop="articleBody"]//p//a[not(contains(@href, "wp-content"))])
  68. "How many internal editorial links?" Xpath selector: count(//div[@itemprop="articleBody"]//p//a[starts-with(@href, "http://www.thefoodaholic.co.uk") and not(contains(@href, "wp-content"))])
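
These selectors add an exclusion: links whose `href` contains `wp-content` are skipped, presumably because they point to uploaded media rather than to pages. A sketch of the same logic as a small reusable function follows; the function name `count_links`, its parameters, and the sample markup are all hypothetical, and the `contains()` and `starts-with()` tests are done in Python because the standard-library `xml.etree.ElementTree` does not provide them.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: one media link, one internal, one external.
SAMPLE = """
<body>
  <div itemprop="articleBody">
    <p><a href="http://www.thefoodaholic.co.uk/wp-content/uploads/photo.jpg">image</a></p>
    <p><a href="http://www.thefoodaholic.co.uk/page-a">internal</a>
       <a href="https://example.com/">external</a></p>
  </div>
</body>
""".strip()


def count_links(markup, container_path, site_prefix, exclude_substring):
    """Return (total, internal, external) counts of links under container_path."""
    root = ET.fromstring(markup)
    links = root.findall(f"{container_path}//p//a")
    # Equivalent of not(contains(@href, exclude_substring)).
    kept = [a for a in links if exclude_substring not in a.get("href", "")]
    # Equivalent of starts-with(@href, site_prefix).
    internal = [a for a in kept if a.get("href", "").startswith(site_prefix)]
    return len(kept), len(internal), len(kept) - len(internal)


counts = count_links(
    SAMPLE,
    './/div[@itemprop="articleBody"]',
    "http://www.thefoodaholic.co.uk",
    "wp-content",
)
print(counts)
```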
  69. Result
  70. Level 5
  71. Recap: Which data do I need? Can I crawl the respective elements? What is the right Xpath selector? That's it!
  72. More Xpath Cases, SERP-Crawling, Regex, …
  73. Wait no more. Max shows how!
  74. Thanks!!
