The document discusses how search engines like Google index and rank webpages. It covers the crawling and rendering process, as well as different types of indexing statuses that a page can have and how to avoid common indexing issues like duplicate content. Managing redirects, canonical tags, and structured data is important for proper indexing and rich search results. The overall crawling and indexing process is complex with many factors that can impact a page's search visibility and ranking.
Here we take a look at server log file analysis for SEO and explore not only the benefits but also the process of finding, gathering, shipping and analysing user agent logs
Duplicate content continues to confuse many of us. Part of the problem is that there are different types of duplicate content, which may be treated differently by search engines, and there are different ways to deal with them in your SEO and content marketing strategy. It's important to be careful when removing content to ensure you're not shooting yourself in the foot. Instead of removing, try to improve or regroup content that is triggered for the same query class / cluster and may be diluting your visibility. Change the emphasis and make something of added value for users. Consider query- and category-agnostic filtering versus content which is considered at the query run-time auction in search results.
Creating Commerce Reviews and Considering The Case For User Generated ReviewsDawn Anderson MSc DigM
Reviews are becoming increasingly important to consumers. They not only influence consumers to purchase; their absence can disrupt the path to purchase, as consumers often wish to seek the opinions of experts or the wider community before buying. Here we look at the benefits which can be gleaned from commerce reviews and what it takes to add structure for search engines, as well as the criticality and objectivity needed to create a trusted review. Both professional expert reviews and consumer user-generated reviews are explored in order to identify signals of further trust and a balanced, full picture (both sides of the coin). There are also areas to be aware of, such as the guidelines from the International Consumer Protection and Enforcement Network (ICPEN) for digital influencers, review administrators and marketing professionals / traders. What are the potential pros and cons of adding consumer user-generated reviews to a site for SEO, website management and content marketing?
SEO - The Rise of Persona Modelled Intent Driven Contextual SearchDawn Anderson MSc DigM
Increasing volumes of data on users and 'users like users' via user modelling now provide search engines with clues as to what types of pages to rank for different user types, terms, in different contexts, locations & scenarios
Mobile-first goes beyond simply indexing in a search engine. It has several meanings, which traverse user behaviour, web design, and adoption in different territories, user segments and verticals. We need to be aware of these fundamental changes in search behaviour and adapt quickly.
Dawn Anderson SEO Consumer Choice Crawl Budget Optimization ConflictsDawn Anderson MSc DigM
Optimizing for both humans and search engines can be challenging. Humans may be faced with too many choices on ecommerce sites which are optimized by SEOs for search engine crawling efficiency. It is important that we consider both humans and search engine heuristics, to help both understand the information architecture of a website and to achieve maximum SEO / UX / CRO harmony.
This no-hype session at SMX Paris 2017 provides practical tips that will help you move your website to HTTPS. No complex theory but highly actionable recommendations. This session will have tips for all levels of experience. Join former Google Search Quality team member Fili Wiese to learn all about the benefits, the challenges, how to prepare for a move, what to expect, how to measure success and how to avoid SEO pitfalls when moving a site to HTTPS. Having programmed websites and Google internal tools, Fili Wiese is passionate about improving the user experience and the go-to guy when it comes to on-page SEO and HTTPS.
The search engine experience 2.0 - U of U MBA @DESB_UofUClark T. Bell
The Search Engine Experience 2.0 - U of U MBA.
This is a presentation I gave for some MBA - digital marketing students at the David Eccles School of Business at the University of Utah in Salt Lake City on October 20, 2014.
The presentation is on "The Search Engine Experience 2.0" which covers the history of Google, Inc., Search Engine Optimization, On-Page SEO, Off-Page SEO, Performance Optimization, Page Rank and Domain Authority.
You can also view this slide on my website --> http://www.clarktbell.com/
SEO isn't just about ranking factors or signals as single entities. Sustainable SEO requires understanding how several signals relate to each other and where search algorithms evaluate each of those to confirm initial understanding. Understanding these relationships is vital to ensuring maximum SEO ranking value.
Observability - Experiencing the “why” behind the jargon (FlowCon 2019)Abigail Bangser
This is a near duplication of the previous keynote deck where we talk about three examples of where I really felt the pain of not applying core observability techniques. The three covered are:
- No pre-aggregation
- Arbitrarily wide events
- Exploration over dashboarding
SearchLove Boston 2018 - Tom Anthony - Hacking Google: what you can learn fro...Distilled
Tom has long been fascinated with how the web works… and how he could break it. In this presentation, Tom will discuss some of the times that he has discovered security issues in Google, Facebook and Twitter. He will discuss compromising Search Console so that he could look up any penalty in the Manual Action tool, how he took control of tens of thousands of websites, and how he recently discovered a major bug that let him rank brand new sites on the first page with no links at all. Tom will outline how these exploits work, and in doing so share some details about the technical side of the web.
Sam Partland - http://www.digisearch.com.au
While a migration of a small site is pretty simple, as soon as you move into migrating larger sites there are a lot more things to consider. Whether it's more advanced redirect requirements or poor implementation that slipped the checks, there are a number of things we can do to ensure it's still a successful one.
I will run through my tips on how to correctly perform a website migration, and cover:
• How to map out your migration
• Issues that you may face
• Post-migration analysis
We will be working through analysis of data you should already have, like pre-migration rankings & website scrapes, but I will also cover how to analyse a migration where you didn't have the correct data to begin with. This would be particularly useful if you have a client that has recently stuffed one up, and needs your help, or if you’re trying to work out whether a competitor’s migration was successful.
SearchLove London | Dave Sottimano, 'Using Data to Win Arguments' Distilled
From past experiences with data, Dave knows relying on your gut can be a mistake. Instead, we need to take comfort in the validation of solid data to ensure we’re making profitable decisions. Sharing real client examples, Dave will run through the essential steps: how to decide on a hypothesis, create conditions, and gather data.
Acquire an All Access Pass to Club GoogleJes Scholz
Optimising crawl budget and encouraging search engine indexing are concepts most SEOs are familiar with. But the devil is in the details, especially as best practices have altered in recent years and will do so again with the introduction of APIs by both Google and Bing. Should you control crawlers with robots directives? Or XML sitemaps? Or submit via the APIs? Or just let Google figure it out? Let's look into the optimal way to get your content into search engines fast.
Learn how to leverage the immersive capabilities of virtual reality. Leverage, business VR, make money, make more money, control minds, immersive experiences. You are an experience.
HeroConf 2016 - Keys to an Effective PPC Account StructureJes Scholz
PPC is not only about creative ads that connect to people. There is also a highly technical aspect. Learn how to utilise account structure to produce impressive ROI.
SearchLove 2016 - WhatsAppening with Chat App MarketingJes Scholz
Why is chat app marketing becoming popular now? What are some best practice examples? How do I launch my own messenger app? What is in the future for inbound marketing? These questions and more were answered at SearchLove 2016.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My slides, with Rik Marselis, from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally Testing in DevOps. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and take you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into approaches I already have working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
14. @jes_scholz
BE THE RIGHT ‘TYPE’
Status | Type
Error | Submitted URL seems to be a soft 404
Error | Submitted URL marked ‘noindex’
Error | Submitted URL blocked by robots.txt
Error | Submitted URL returns unauthorized request (401)
Error | Submitted URL not found (404)
19. Google Indexing API
Time to crawl: Within 1 minute
Content types: Job postings, Live streams
Requirements: Google Search Console verification, Relevant structured data
Rate limit / day: 200 URLs, with option to request more
@jes_scholz
30. RIGHT CODE FOR THE JOB

Code | Destination URL | Passes ranking signals | Passes users | Pace of de-indexation
301 permanent redirect | Relevant page | Yes | Yes | Slow
301 permanent redirect | Irrelevant page | No | |
302 temporary redirect | Relevant page | Yes | Yes | Very slow
302 temporary redirect | Irrelevant page | No | |
404 page not found | - | No | No | Fast
410 gone | - | No | No | Fastest
@jes_scholz
35. GET AN ENTRY STAMP
Status | Type
Excluded | Alternative page with proper canonical tag
@jes_scholz
36. GET AN ENTRY STAMP
Status | Type
Excluded | Duplicate without user-selected canonical
@jes_scholz
37. DON’T TRY TO TRANSFER ENTRY STAMPS
Status | Type
Excluded | Duplicate, submitted URL not selected as canonical
Excluded | Duplicate, Google chose different canonical than user
@jes_scholz
45. RECIPES TO DE-DUPE

Method | Use case | Crawl behaviour | Indexing behaviour | Ranking signals
301 redirect | Merge duplicates | Infrequent crawl | Slow de-indexing of original URL | Passed on (if used correctly)
Rel=canonical | Duplicates have a reason to exist | Infrequent crawl | No indexing of alternate URLs | Passed on
GSC URL parameters | Prevent crawling of parameter URLs | Not crawled | No indexing | Forfeit
Robots.txt | Disallow crawling of URLs | Not crawled | Rare indexing | Forfeit
Noindex tag | Prevent indexing of URLs | Infrequent crawl | No indexing | Forfeit
@jes_scholz
64. Make a valuable match between the user and the page
@jes_scholz
65. RECIPES TO DE-INDEX

Method | Use case | De-indexing speed | Ranking signals
410 gone | Remove URLs | Fast | Forfeit
301 redirect | Merge URLs with similar content | Moderate | Passed on (if used correctly)
Rel=canonical | Keep duplicate URLs | Moderate | Passed on (if used correctly)
GSC URL parameters | Keep non-duplicate URLs | Moderate | Forfeit
Noindex tag | Prevent indexing of URLs | Moderate | Forfeit
Robots.txt | Prevent crawling of URLs | May not de-index | Forfeit
@jes_scholz
Have you heard? They opened a new nightclub. It’s so exclusive people literally wait for days in line, but most end up being turned away by the bouncer. But no matter when you go there, day or night, inside is always packed full. This is the place to be seen. Every IT girl, celebrity, person of note in the world is inside.
I’m speaking of course about club Google, and today I’m going to share with you the unwritten rules about what to do…
When waiting in line...
When facing the bouncer
When your ticket is checked
And once you're in
how to get to the VIP area
To make valuable connections
Google has a long line of URLs that it wants to crawl.
We all know queues are rarely first come, first served. Those who are known to bring value to the club, like regular patrons or celebrities, can jump the line. If you aren’t one of those pages, you don’t know how long you will have to wait. It could be an hour, a week, a few months or even longer. When a URL will be crawled is impacted by site architecture patterns, the URL’s history, the domain’s reputation and other factors. But there are a few easy-to-execute tactics that you can use to influence where you end up in line.
The best way is to link up with friends who are respected by the club who can vouch for you.
This can be part of your internal link infrastructure or a link from another website. Each relevant link a page gets acts as a vote that it should be let into the club. This has benefits beyond just getting the page crawled, it will also help with rankings in the search engine result pages.
Another way is to call ahead and put your name down on the guest list by adding URLs to an XML sitemap.
Think of your XML sitemaps as a list of SEO relevant URLs you recommend search engines crawl as soon as the page is added or updated, which isn’t necessarily every single page of your website. Used wisely it can help search engines crawl your site more intelligently by drawing attention to high value pages you want to be crawled like your most recently published articles, new products, latest events. Used incorrectly, it will draw attention to your website’s flaws and you will see errors like these begin to appear in your Google Search Console coverage report.
The reason this is the second-best discovery method is that it allows you to send the last modification date and time. This informs search engines when the page was added or last significantly changed, and whether it should be crawled. And it’s because of this tag that it’s important XML sitemaps be dynamic and instantly update along with your website. I’ve seen many XML sitemaps coded to update only once per day. Essentially, you are choosing to wait in line for 24 hours.
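As an illustration, a dynamic sitemap can be built on request from the CMS rather than regenerated by a daily cron job. A minimal Python sketch, where the page list and URLs are hypothetical stand-ins for whatever your database returns:

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build an XML sitemap string from (url, last_modified) pairs.

    `pages` is assumed to come straight from your CMS, so <lastmod>
    reflects the moment a page was added or significantly changed --
    no once-per-day regeneration delay.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # W3C datetime format, as the sitemaps protocol expects
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

# Hypothetical recently updated pages pulled from the database
pages = [
    ("https://example.com/articles/breaking-story",
     datetime(2024, 5, 28, 9, 30, tzinfo=timezone.utc)),
    ("https://example.com/products/new-widget",
     datetime(2024, 5, 28, 10, 0, tzinfo=timezone.utc)),
]
print(build_sitemap(pages))
```

Serving this from a route handler (instead of writing a static file once a day) is what makes the lastmod values trustworthy.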
Another option is to manually submit the URL in Google Search Console.
This gives the URL priority entry, so it is crawled almost immediately. But this method has its limitations, as Google only allows 50 such requests per day.
If you need more scale, you can integrate with the Google Indexing APIs which allow you to directly notify the search engines of relevant URL updates.
You can expect a crawl within 1 minute of submitting any URL via the API. And yes, I do mean any submitted URL, no matter the content, even though officially Google says the API is only there to support pages with job posting and livestream structured data. Of course I’d never advise you to submit anything against the Google guidelines, but if you accidentally submitted a different content type, I’m simply pointing out that it would be crawled with priority.
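For reference, an Indexing API submission is a single authenticated POST per URL. A hedged sketch of the request construction; the endpoint and body shape follow Google's published Indexing API documentation, while the OAuth step (a service account verified as an owner in Search Console) is only indicated in comments:

```python
import json

# Publish endpoint from Google's Indexing API documentation
INDEXING_API_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(url, deleted=False):
    """Build the JSON body for an Indexing API notification.

    URL_UPDATED requests a priority (re)crawl of the page;
    URL_DELETED tells Google the page has been removed.
    """
    return {
        "url": url,
        "type": "URL_DELETED" if deleted else "URL_UPDATED",
    }

body = json.dumps(build_notification("https://example.com/jobs/12345"))
# The actual request must carry an OAuth 2.0 bearer token for a service
# account added as an owner in Google Search Console, e.g. with an
# authorized session from the google-auth library:
#   session.post(INDEXING_API_ENDPOINT, data=body,
#                headers={"Content-Type": "application/json"})
print(body)
```

The default quota of 200 URLs per day mentioned on the slide applies per project, with a form to request more.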
The takeaway here is that Google is continually crawling your website, but you can use these four methods to help direct Googlebot to the pages you care about: either to trigger a first crawl to have content discovered (you know a page needs it if you see the ‘Discovered - currently not indexed’ exclusion in GSC), or a recrawl if you have updated the content.
Because getting in fast is a competitive advantage in SEO. The sooner your high-quality pages are indexed, the sooner they can start establishing top-spot rankings in search results. This is even more critical when it comes to time-sensitive content, like breaking news, or pages with a short lifespan, like product listings.
But crawling doesn’t guarantee indexing. If your coverage report looks something like this - you need to work on your site architecture. These URLs have been denied entry to club VALID.
But unlike at most clubs, the bouncer has told you why and I’m going to tell you what to do about it.
Let's start with the basics. To some extent you can rely on your reputation if you are a big name. But if you walk up to the bouncer without pants on, chances are they won't let you in. You need to meet the dress code.
If you are wearing 5xx errors, go home and improve your server infrastructure. You may have heard advice from some SEO guru that you need to fix 404s.
There is no Google penalty for amassing 404 codes - that is a myth. If the page truly doesn’t exist because it was intentionally removed, there is nothing wrong. But that’s not to say they are best practice. If the URL had any ranking signals, these are lost to the 404 void.
So it’s common in SEO to implement 301 redirects when a page is removed.
In which case Googlebot would crawl the original URL, see the 301 status code, and then add the destination URL to the crawl queue. The ranking signals will be passed on without dilution once Google crawls and confirms that the destination URL has similar content.
But if you redirect to an irrelevant page, such as the homepage, Google will treat this as a soft 404 and won't pass on the ranking signals. I can't definitively tell you why this is, but I suspect it's twofold. Firstly, user experience: if I click on a search result expecting to land on a specific piece of content and all of a sudden I am on the homepage with no explanation, that is not a better user experience than a custom 404 page, where at least the user understands what happened. And secondly, rather than trading a 301 exclusion for a soft 404, Google would rather encourage SEOs to rapidly de-index such content using a 410 code.
The main takeaway here is that there are no inherently good or bad codes, but there are right and wrong codes for specific circumstances, and right and wrong ways to implement those codes.
Like with redirects. The rules are clear: they won't be allowed into the club if they come in a big group.
If you have redirected page 1 to page 2, and then later page 2 to page 3, you are creating a redirect chain. This is bad enough on its own, as it takes additional time to follow and forward the ranking signals, causes unnecessary load on your servers and adds latency for users. But if the chain continues past 5 hops, Googlebot gives up. You're told "get to the back of the line". So whatever the destination page was going to be, it won't benefit from the transfer of ranking signals, and it may not even be crawled to get into the index.
Just because it is named a permanent redirect, doesn’t mean it should live forever. Break the chains and redirect each page to the final destination directly.
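Breaking the chains is mechanical enough to script. A minimal sketch, assuming you can export your redirect rules as a source-to-destination map (the paths below are illustrative):

```python
# Sketch: flatten redirect chains so every source points straight at
# its final destination. `redirects` maps source path -> destination path.
def flatten_redirects(redirects, max_hops=5):
    """Returns {source: final_destination}. Raises on loops or chains
    past max_hops (roughly where Googlebot stops following)."""
    flat = {}
    for src in redirects:
        seen, target, hops = {src}, redirects[src], 1
        while target in redirects:  # destination is itself redirected: a chain
            if target in seen or hops >= max_hops:
                raise ValueError(f"redirect chain too long or looping at {src}")
            seen.add(target)
            target = redirects[target]
            hops += 1
        flat[src] = target
    return flat

chain = {"/page-1": "/page-2", "/page-2": "/page-3", "/page-3": "/page-4"}
print(flatten_redirects(chain))
```

Deploy the flattened map and every old URL 301s once, straight to the final destination.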
While we are on the topic of SEO misconceptions, let's also tackle duplicate content. Too many people believe that all duplicate content is bad. This is absolutely absurd. There are perfectly legitimate reasons to have duplicate content on a site - for example if you are utilizing AMP, or you track using UTM parameters, or you have a sort function to change the order of products. Duplicates don't necessarily need to be redirected or removed, but rather handled gracefully - and there are many options by which to achieve this.
One way is with a rel=canonical tag: an entry stamp that signals to search engines which of the duplicate URLs you wish to be indexed. If the tag is accepted, the alternate pages will still be crawled, but much less frequently. They will be excluded from the index, passing their ranking signals on to the canonical. So when you see the "Alternate page with proper canonical tag" exclusion, it means your canonical has been accepted. This is a good thing.
If you see "Duplicate without user-selected canonical", either you haven't added the tag or it's implemented incorrectly.
The worst problems arise when you see exclusions such as these, which show that Google thinks you're trying to game the system and transfer signals to pages that don't deserve them. In which case Google will happily ignore your stamp and make its own decision. Rel=canonical tags are only a hint, not a directive. Mis-signalling can be due to sitemaps or internal link infrastructure not prioritising the canonical, or from using canonicals incorrectly, like on pages that aren't actually duplicates.
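Auditing this at scale means checking what canonical each duplicate URL actually declares. A small sketch with the standard library parser (the HTML snippet is a stand-in for a fetched page):

```python
# Sketch: extract the rel=canonical URL from a page's HTML,
# the kind of check a crawl audit would run across duplicate URLs.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")

html = '<head><link rel="canonical" href="https://example.com/shoes"></head>'
finder = CanonicalFinder()
finder.feed(html)
print(finder.canonical)
```

Run it over every URL in a duplicate cluster and you can quickly spot missing tags, self-referencing canonicals, or canonicals pointing at pages that aren't actually duplicates.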
The second option for handling duplicate content is telling Googlebot not to crawl, via URL parameter handling in Google Search Console. When you set a parameter to "No URLs", those pages don't exist as far as Google is concerned. Googlebot won't crawl the URLs, saving load on your server. But that comes at a price: if Googlebot can't crawl, Caffeine can't process signals - which may impact ranking - or extract internal links to add to the crawl queue - which may slow down site indexing.
Another way to stop a crawler is by disallowing the URL in the robots.txt file.
It's the digital equivalent of a "no entry" sign on an unlocked door. And while Googlebot obeys these instructions, it does so to the letter of the law, not the spirit.
So you may have pages that are specifically disallowed in robots.txt showing up in the search results.
Because if a blocked page has other strong ranking signals, Google may deem it relevant enough to index, despite never having crawled the page. But because the content of that URL is unknown to Google, the search result looks something like this.
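You can verify exactly what your robots.txt disallows with the standard library's parser. A sketch with an inline ruleset (robotparser can also fetch a live file via set_url() and read()):

```python
# Sketch: check whether specific URLs are crawlable under a robots.txt
# policy. The rules are parsed from a string here for illustration.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /private/
"""
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Disallowed path: the crawler may not fetch it (but the URL can still be indexed).
print(rp.can_fetch("Googlebot", "https://example.com/private/report"))
# Allowed path: crawlable as normal.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))
```

Note what this does and doesn't test: it tells you whether the door has a "no entry" sign, not whether the page will stay out of the index.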
To definitively block a page from appearing in SERPs, you need to use a "noindex" robots meta tag or an X-Robots-Tag in the HTTP header. After it's processed, URLs with a "noindex" tag will also be crawled less frequently.
One problem is, if the tag is present for a long time, it will eventually lead Google to nofollow the page's links as well - which means those links won't be added to the crawl queue, and ranking signals won't be passed to the linked pages.
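Since noindex can live in either of two places, an audit needs to check both. A sketch covering the HTTP header and the robots meta tag (the sample header and HTML are illustrative):

```python
# Sketch: detect a noindex directive in either the X-Robots-Tag response
# header or a robots meta tag, the two placements described above.
from html.parser import HTMLParser

def header_noindex(headers):
    """headers: dict of HTTP response headers from a fetched URL."""
    return "noindex" in headers.get("X-Robots-Tag", "").lower()

class MetaRobotsFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.noindex = "noindex" in a.get("content", "").lower()

print(header_noindex({"X-Robots-Tag": "noindex, nofollow"}))

f = MetaRobotsFinder()
f.feed('<meta name="robots" content="noindex">')
print(f.noindex)
```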
And you start to see that all of this is rather complicated. If one signal contradicts another, it's not always clear how search engines will respond. So rather than using all these directives and hints to band-aid your website together, take a step back, breathe, and then work on the architecture of your website so that you can minimise robots directives as much as possible.
And move on to the next problem: "Crawled - currently not indexed". Essentially, the bouncer has looked you up and down and said "not tonight". This is most commonly due to quality issues - thin content, poor-quality copywriting, combinations of category filters with no listings, tag pages with only one article, auto-generated user profiles with no details; anything that is clearly not worthy of indexing will be rejected. But if content is worthy and still not being indexed, you're likely being tripped up by rendering.
Let’s talk about JavaScript - your ticket to a better user experience and more challenging SEO.
Of course both Google and Bing are capable of indexing JS-generated content, because both use evergreen headless Chromium. Google presents the necessary rendering as a rather simple process: HTML goes into a render queue, it is then rendered, and the DOM is sent to be indexed. But JavaScript injects a deeper level of complexity into the indexing equation.
Because there are two waves of indexing whenever JavaScript is involved - a bouncer who checks your ticket at the door, followed by another who checks your bag just inside. The first wave indexes a page based on the initial HTML from the server. This is what you see when you right click and view page source.
The second indexes based on the DOM, which includes both the HTML and the rendered JS from the client side. This is what you see when you right click & inspect.
The challenge is that the second wave of indexing is deferred until Caffeine has rendering resources available. This means it takes longer to index JavaScript-reliant content than HTML-only content - anywhere from days up to a few weeks from the time it was crawled.
But unlike most things in technical SEO, there is a clear solution. Use server-side rendering so that all essential content is present in the initial HTML, allowing search engines to index it immediately. This should include your hero SEO elements like page titles, headings, canonicals, structured data, and of course your main content and links.
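A crude but useful check is to scan the raw server response (before any JavaScript runs) for those hero elements. The markers and sample HTML below are illustrative; in practice you would run this against the unrendered response body of each template:

```python
# Sketch: flag hero SEO elements missing from the *initial* server HTML,
# i.e. content that would be stuck waiting for the second wave of indexing.
REQUIRED_MARKERS = ["<title>", "<h1", 'rel="canonical"', "application/ld+json"]

def missing_hero_elements(initial_html):
    """Return the markers not found in the raw (pre-JS) HTML."""
    return [m for m in REQUIRED_MARKERS if m not in initial_html]

server_html = """<html><head><title>Red shoes</title>
<link rel="canonical" href="https://example.com/red-shoes"></head>
<body><h1>Red shoes</h1></body></html>"""

print(missing_hero_elements(server_html))
```

Anything this flags is content search engines can only see after rendering, which is exactly the content at risk of the days-to-weeks delay described above.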
To understand whether your content is rendered on the server side, you can't rely on the mobile-friendly test tool - it doesn't use the same pipeline that the true rendering engine does.
The live test functionality in the URL inspection tool is a bit better as it can tell you if Google is technically able to render the page.
But it is also significantly more patient than the real thing. The real Caffeine may not be able to index the full content because of timeouts. So be sure you adhere to the 5-second render rule, as Caffeine will tend not to wait more than 5 seconds for a script when indexing.
To truly understand the outcome of rendering, you can’t rely on these tools. You need to take a close look at your ticket and make sure you understand what all the text and codes mean. Otherwise you have no idea whether your ticket is valid. You have to understand your rendering stack as it directly impacts the SEO performance of your site.
No critical content should be reliant on the render, as the time to index could be weeks slower. That's weeks' worth of having to justify to clients an investment in a strategy that seemingly isn't performing. Weeks' worth of lost sales to competitors. Weeks' worth of waiting on rankings for time-limited URLs that are likely to be outdated before they're indexed.
But then you're inside. And you will see that it is always packed full. But some of the people in the crowd aren't what you expected.
Like these guys. They got inside. They are having a good time. But do they have the slightest chance of converting anyone that night? No way. Problem is, they used your name to get in. Their behaviour reflects on your reputation.
Because URLs are not ranked solely on their own merits, but also on the company they keep and the family they belong to. Every page indexed by search engines impacts how the quality algorithms evaluate your domain's reputation.
If you have a lot of URLs that aren't in your sitemap, but are in the index, you have a problem. Either your sitemap doesn't include all your SEO-relevant URLs, which can be easily fixed, or you are suffering from index bloat, where an excessive number of low-value pages have made it into the index. This is commonly caused by auto-generated pages - like filter combinations, archive pages, tag pages, user profiles, pagination, rogue parameters - you get the gist.
Your goal shouldn’t be to get as many pages into the index as possible. It’s already crowded. You will not get seats for a big group - you would be lucky to get two seats or even one. And with so many pages competing for the same search intent, it becomes confusing to search engines which pages deserve to be ranked. You will have more success if you combine their signals and have one attractive page which can stand out in the crowd.
But your goal shouldn't be to rank as many pages as possible either. If your rankings lead to a bad user experience, because users land on a low-quality page and bounce, that is hurting your brand.
Your goal should be to make a valuable match between the user and the page. So make sure you know all the types of pages you have in the index and only put the best face of your brand forward. No bloat. No noise. Only pages you want your users to land on.
There are many mechanisms by which you can eject unwanted pages from the index. If the page has zero value, I recommend you send a 410 code, as search engines will know you intended to remove the content and will swiftly de-index it. If it has value and merging with a desirable page is an option, then 301. If there is a legitimate reason for the duplicate content, set the canonical. If there's not a good reason, and it's driven by a parameter, use URL parameter handling. And if none of the above is an option, resort to a noindex tag - NOT a robots.txt disallow, as this is not guaranteed to deindex the page.
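That decision tree is simple enough to encode, which is handy when triaging thousands of bloat URLs from a crawl export. A sketch (the flags would come from your own audit data):

```python
# Sketch: the cleanup decision tree above, as a function you could map
# over an index-bloat URL list from a crawl audit.
def deindex_action(has_value, has_merge_target, legit_duplicate, is_parameter):
    if not has_value:
        return "410 Gone"              # intentional removal, fastest de-index
    if has_merge_target:
        return "301 redirect"          # consolidate signals into one page
    if legit_duplicate:
        return "rel=canonical"         # keep the duplicate, point the stamp
    if is_parameter:
        return "URL parameter handling"
    return "noindex meta tag"          # last resort; never robots.txt disallow

print(deindex_action(has_value=False, has_merge_target=False,
                     legit_duplicate=False, is_parameter=False))
```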
And after you finish the cleanup, have a drink and celebrate because your pages will be running with the IT girls.
But there is a difference between an ‘it’ girl and the indexable elite.
For competitive terms on mobile, ranking in organic position 1 really doesn't mean so much anymore. If you want to rank for "Boris Johnson", traditional placements won't get you much visibility.
Search engine result pages are richer, more visual and more crowded. It's not just paid ads and other organic results you are competing with for clicks. There exists a wide range of rich result features which attract users' attention and reduce clicks on traditional results.
So how do you gain access to these VIP areas, the rich results? You need to make it so Google has no problem refactoring your content for presentation on its platforms.
The best way to do that is by using complete and correct structured data, implemented with JSON-LD based on schema.org. But don't go and add every possible item from the schema library onto your website. Just because an element can be marked up doesn't mean it should be. You need to understand what value you will get back from Google for including that markup, and the currency is rich-experience rankings.
Organisation markup is only needed on your homepage, as its purpose is to generate the knowledge graph panel.
Use ItemList markup on category pages to generate carousels.
Use product markup to show detailed product information in rich Search results — including Google Images
Use event markup for coverage in search results and maps.
Use article markup and AMP to win the top stories carousel and enjoy enhanced results within the rest of the SERPs.
Though not compulsory, having structured markup and AMP also helps relevant brands be featured in other Google VIP experiences, such as Google News
and in Google Discover.
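To give a flavour of what "complete and correct" looks like, here is a sketch that emits Product markup as a JSON-LD script tag. The product, price and rating values are placeholders; the @type names and the aggregateRating/offers nesting follow the schema.org vocabulary:

```python
# Sketch: generate a Product JSON-LD <script> tag from product data.
# All field values below are placeholders for illustration.
import json

def product_jsonld(name, price, currency, rating=None, review_count=None):
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
    }
    if rating is not None:
        data["aggregateRating"] = {"@type": "AggregateRating",
                                   "ratingValue": rating,
                                   "reviewCount": review_count or 1}
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'

snippet = product_jsonld("Executive Anvil", "119.99", "USD",
                         rating="4.4", review_count=89)
print(snippet)
```

Generating the tag from the same data source that renders the visible page keeps the markup and the content in sync, which matters because markup describing content users can't see is against the guidelines.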
With these rich experiences we start to see that search engines are shifting from keywords to topics, from answers to journeys, from queries to feeds, and most importantly from being a traffic conduit to a content hub. It's not only about search anymore. Google is going deeper into the user journey, and it's going to be important to have your site not only technically compatible with, but optimised for, these rich experiences.
So - here are the top 5 takeaways from this talk.
1. Get your pages crawled fast by using the most relevant one of these 4 methods.
2. SEO hacks tend to be exactly that - hacky. You want to be presentable when you reach the Googlebot bouncer. Know how the different directives are processed by search engines. Don't send conflicting signals. And if you get turned away, stop and listen, because the bouncer tells you why!
3. SEO can fail despite on-page and off-page being on point because of the processes behind the rank, especially rendering strategies. Understand how your rendering stack impacts SEO.
4. Just because a URL is valid, doesn’t mean it deserves to be. Know what pages you have in the index and how this impacts your brand reputation.
5. And finally, aim to achieve VIP status, because search engines are evolving into platform ecosystems, and if you don't contribute structured content, you will be left out in the cold.
And that my friends, is everything you need to know to enjoy yourself at club VALID.