Screaming Frog custom extractions are a fantastic way to pull just about any data from every page of a website, and they are especially useful on large sites where you want to speed up your analysis. In this talk we'll take a look at some of the ways you can use Screaming Frog custom extractions to find new opportunities across your site.
Jess Hobbs - Technical SEO Consultant, Erudite Agency
3. 1. What is Screaming Frog? And what’s included in a standard crawl?
2. What are Custom Extractions and why are they useful?
3. What is Custom Search and why is it useful?
4. How it works
5. Other ideas you could try
6. Using different types of extractor
@Jessica_James01
5. Screaming Frog is a website crawler that gathers data either from URLs it discovers based on your chosen criteria or from a predefined list of URLs.
6. Page Titles
Meta Descriptions
Meta Keywords
H1
H2
Indexability status
Word Count
Page Size
Structured Data
Response Time
Meta Robots
7. 2. What is a Custom Extraction?
8. But what happens if you want to gather data that’s specific to a particular site?
9. You can use a custom extraction to gather that information, targeting it with regex, CSS selectors or XPath.
11. Custom search is similar to custom extraction, but it finds a specific string in the HTML rather than extracting data based on an identifier.
12. Tracking IDs and Pixels (GA, GTM, Hotjar, Facebook)
Text Strings (Out of Stock, Properties with Hot Tubs)
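To make the idea concrete, here is a minimal Python sketch of what a custom search does conceptually: it counts occurrences of a string in each page's raw HTML. It is not how Screaming Frog itself is implemented. The `pages` dictionary, the example URLs and the `custom_search` helper are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical crawl output: URL -> raw HTML source.
pages = {
    "https://example.com/hot-tub-lodge": "<html><body><h1>Lodge</h1><p>Sleeps 4. Hot Tub included.</p></body></html>",
    "https://example.com/city-flat": "<html><body><h1>Flat</h1><p>Out of Stock</p></body></html>",
}


def custom_search(pages, needle, case_sensitive=False):
    """Return how many times `needle` appears in each page's HTML."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape treats the needle as a literal string, not a pattern.
    pattern = re.compile(re.escape(needle), flags)
    return {url: len(pattern.findall(html)) for url, html in pages.items()}


hot_tub_counts = custom_search(pages, "hot tub")
out_of_stock_counts = custom_search(pages, "Out of Stock")
```

A count of zero is often the interesting result, for example a page where a tracking ID that should be present never appears.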
31. 5. What Else Could You Use it For?
32. 1. Identify product pages without reviews
2. Extract all the body copy on a page (<p> tags)
3. Pull author names from articles/identify blog posts without authors
4. Find GA/GTM tags/check for multiples
5. Extract and verify hreflang tags
6. Extract headings below H2
7. Markup (OL, UL, etc.)
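Two of the ideas above (hreflang tags and product pages without reviews) can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the page source is a made-up, well-formed snippet (real HTML is often not valid XML and would need a more lenient parser), and the `review` class name is a guess at how a site might mark up its reviews.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page source used only for this sketch.
PAGE = """<html>
  <head>
    <link rel="alternate" hreflang="en-gb" href="https://example.com/uk/" />
    <link rel="alternate" hreflang="fr" href="https://example.com/fr/" />
    <link rel="canonical" href="https://example.com/uk/" />
  </head>
  <body>
    <h1>Widget</h1>
    <div class="product-description"><p>A very good widget.</p></div>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Idea 5: collect every hreflang annotation as (language, URL) pairs.
hreflangs = [
    (link.get("hreflang"), link.get("href"))
    for link in root.findall(".//link[@hreflang]")
]

# Idea 1: flag the page if no review container is present.
# The class name "review" is an assumption about the site's templates.
has_reviews = root.find(".//div[@class='review']") is not None
```

Run across a whole crawl, the pages where `has_reviews` is false would be the ones to follow up on.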
34. An element's XPath can be copied using the ‘Inspect’ feature in Chrome. This works well when the CSS selector isn’t unique to the element you need to extract.
35. XPaths are ideal for scraping, and there are loads of different XPath expressions you can use for more sophisticated extractions.
36. Syntax   Function
//   Searches anywhere in the document
/    Searches from the root, or steps down to a direct child
@    Selects a specific attribute
*    Wildcard (matches any element)
[]   Filters to elements that match a condition
.    Selects current element
..   Selects parent element
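The syntax above can be tried out in Python's standard library, which is a quick way to get a feel for each operator. Note the assumptions: `xml.etree.ElementTree` supports only a subset of XPath, paths must start from the current element (hence the leading `.`), and the markup below is an invented, well-formed snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this sketch.
DOC = """<html>
  <body>
    <div class="article">
      <h2>Intro</h2>
      <span class="author">Jane Doe</span>
      <a href="/about">About</a>
    </div>
    <div class="sidebar">
      <h3>Related</h3>
    </div>
  </body>
</html>"""

root = ET.fromstring(DOC)

# "//" searches anywhere below the current element.
all_divs = root.findall(".//div")

# "/" steps down one level at a time.
direct_headings = root.findall("./body/div/h2")

# "[]" with "@" filters on an attribute value.
author = root.find(".//span[@class='author']").text

# "*" matches any element: every direct child of the article div.
article_children = [el.tag for el in root.findall(".//div[@class='article']/*")]

# ".." selects the parent: here, the div that contains an h3.
h3_parent_class = root.find(".//h3/..").get("class")

# ElementTree cannot end a path with "@href", so read the attribute in Python.
link_target = root.find(".//a").get("href")
```

In a full XPath engine the same selections would typically be written without the leading dot, for example `//span[@class='author']` for the author and `//a/@href` for the link target.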
39. Regex is particularly useful when you want to extract something that isn’t rendered on the page, or to extract non-HTML data.
Examples: JSON-LD structured data, tags and pixel IDs…
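As a rough illustration of both cases, the Python sketch below pulls a JSON-LD block and a GTM container ID out of raw HTML with regex. The HTML is invented for the example, and both patterns are assumptions: the ID pattern assumes container IDs look like `GTM-` followed by uppercase letters and digits, and the JSON-LD pattern assumes the `type` attribute is quoted.

```python
import json
import re

# Hypothetical page source used only for this sketch.
HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Widget"}
</script>
<script>(function(w,d,s,l,i){/* loader */})(window,document,'script','dataLayer','GTM-ABC123');</script>
</head><body>
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-ABC123"></iframe></noscript>
</body></html>"""

# Capture the body of every JSON-LD script block (DOTALL lets "." span newlines).
JSON_LD = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
schema_types = [json.loads(block)["@type"] for block in JSON_LD.findall(HTML)]

# Collect distinct GTM container IDs; the ID shape here is an assumption.
gtm_ids = sorted(set(re.findall(r"GTM-[A-Z0-9]+", HTML)))
```

The IDs are de-duplicated because the same container ID can legitimately appear more than once in a page's source, so when checking for multiples it is the number of distinct IDs that matters.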
Now we’re going to do a slightly more complex example. MyBuilder has a great Q&A section, but it has nearly 40k questions. We want to narrow it down to the questions that got the most engagement.