Developing Web Applications for Humans and Robots
--- Nagaraju Sangam
•Left-to-right vs. right-to-left
User roles: Admin, End User
Impairments: Visual, Hearing, Motor, Cognitive
•alt and title attributes for images
•Use an empty alt (alt="") for unimportant, decorative images
•role attributes for page sections
•for attribute on each label, pointing to the id of its field
•Titles for frames
•Allow keyboard navigation
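A few of the checks above can be automated. The following is a minimal sketch, not a full accessibility audit: it uses Python's standard html.parser to flag images without an alt attribute, frames without a title, and input fields with no matching label. The class name AccessibilityLint, the function lint, and the sample markup are illustrative assumptions, not part of the original slides.

```python
from html.parser import HTMLParser


class AccessibilityLint(HTMLParser):
    """Collects a few basic accessibility problems from an HTML string."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.label_targets = set()  # values of <label for="...">
        self.field_ids = []         # ids of visible <input> fields

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "img" and "alt" not in attrs:
            # alt="" is fine for decorative images; a missing alt is not
            self.problems.append("img without alt: " + str(attrs.get("src")))
        elif tag in ("iframe", "frame") and not attrs.get("title"):
            self.problems.append(tag + " without title")
        elif tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag == "input" and attrs.get("type") not in ("hidden", "submit", "button"):
            self.field_ids.append(attrs.get("id"))

    def report(self):
        # Labels may appear before or after their field, so match at the end
        unlabeled = [i for i in self.field_ids if i not in self.label_targets]
        return self.problems + ["input without label: " + str(i) for i in unlabeled]


def lint(html):
    parser = AccessibilityLint()
    parser.feed(html)
    parser.close()
    return parser.report()


sample = """
<img src="logo.png" alt="Company logo">
<img src="spacer.gif" alt="">
<img src="chart.png">
<label for="email">Email</label><input id="email" type="text">
<input id="phone" type="text">
<iframe src="map.html"></iframe>
"""
for problem in lint(sample):
    print(problem)
```

Keyboard navigation and role usage are harder to check statically and still need manual testing.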
Web Robots: Programs that traverse the Web automatically.
Good Robots: Indexing/crawling for search engines.
Bad Robots: Spam bots that try to read confidential information from pages (email IDs, phone numbers, etc.) and access private folders.
Problems with Good Robots:
Multiple versions of the pages
Private folders etc…
Problems with Good Robots: Solution
Add a robots.txt file to the root folder of your site.
You should be able to browse the file at the URL http://yourdomain/robots.txt
Put the two lines below in robots.txt. They ask all bots not to crawl any part of your site.
User-agent: *
Disallow: /
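To see how a well-behaved robot interprets a block-everything rule, the sketch below feeds it to Python's standard urllib.robotparser. The robot name ExampleBot and the helper allowed are assumptions made for illustration; yourdomain is the placeholder host used above.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set that asks every robot to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# ExampleBot is a hypothetical robot name
print(allowed(BLOCK_ALL, "ExampleBot", "http://yourdomain/index.html"))  # False
```

A site with no rules at all is treated as fully crawlable, so allowed("", ...) returns True.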
Dealing with Bad Robots:
Robots.txt is not a real security feature.
It doesn’t prevent the bad robots from crawling your content.
It’s just a guideline for the robots; it’s up to them whether to follow it or not.
For bad robots you should have rules set up in firewalls to block them.
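Typos in robots.txt:
robots.txt is case sensitive: the file name must be lowercase, and the paths in its rules must match the case of the real URLs.
Typos are easy to make, and a misspelled rule is silently ignored.
So it’s always advisable to use tools to generate the file.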
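The effect of a typo or a case mismatch can be checked with Python's standard urllib.robotparser. This is a minimal sketch; the /private/ folder, the robot name ExampleBot, and the helper can_fetch are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


correct = "User-agent: *\nDisallow: /private/\n"
misspelled = "User-agent: *\nDissallow: /private/\n"  # typo in the directive name
wrong_case = "User-agent: *\nDisallow: /Private/\n"   # path case differs from the real folder

url = "http://yourdomain/private/report.html"
print(can_fetch(correct, url))     # False: the folder is blocked as intended
print(can_fetch(misspelled, url))  # True: the unknown directive is ignored
print(can_fetch(wrong_case, url))  # True: /Private/ does not match /private/
```

Neither mistake raises an error, which is why a generated or tested file is safer than a hand-typed one.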
Online tools to create robots.txt
Meta tags for Robots:
We can set up rules for robots at the HTML page level via meta tags, or per response via an HTTP header.
Meta tags:
<META name="robots" content="NOINDEX, NOFOLLOW">
<Meta name="googlebot" content="noindex" />
<Meta name="googlebot-news" content="nosnippet">
HTTP header:
X-Robots-Tag: noindex
If you have both robots.txt and meta tags in a page, search engines look at robots.txt first and then at the meta tags in the page. If robots.txt disallows the page, the crawler does not fetch it and so never sees its meta tags.
Meta tag attribute values are case-insensitive; robots.txt is case sensitive.
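The page-level rules can be read the way a robot might read them. The sketch below uses Python's standard html.parser to collect robots directives from meta tags, lowercasing the values since they are case-insensitive, and combines them with an optional X-Robots-Tag header value. The names RobotsMetaParser and may_index are assumptions for illustration; real search engines apply more rules than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects robots directives from <meta> tags, keyed by robot name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content") or ""
        if name in ("robots", "googlebot", "googlebot-news"):
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def may_index(html, x_robots_tag=""):
    """Return False if the generic robots meta tag or the header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    header = {v.strip().lower() for v in x_robots_tag.split(",")}
    page = parser.directives.get("robots", set())
    return "noindex" not in (page | header)


print(may_index('<META name="robots" content= "NOINDEX, NOFOLLOW">'))  # False
print(may_index("<p>hello</p>", x_robots_tag="noindex"))                # False
print(may_index('<meta name="robots" content="nofollow">'))             # True
```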
Other HTML tags used by web robots:
<META NAME="DESCRIPTION" CONTENT="Nagaraju Sangam">
<META NAME="AUTHOR" CONTENT="Nagaraju Sangam">
<META HTTP-EQUIV="CONTENT-LANGUAGE" CONTENT="en-US,fr">
<META HTTP-EQUIV="EXPIRES" CONTENT="Thu, 30 May 2013 12:00:00 GMT">
<META NAME="KEYWORDS" CONTENT="music,news,entertainment">
Title & Description in search results:
Title: Comes from the <title> tag in the head section of the page. If no title is found, the search engine uses a heuristic algorithm to generate one.
Description: Comes from the description meta tag in the head section of the page. If no description is found, the search engine uses a heuristic algorithm to generate one, which may not represent the page well.
<meta name="description" content="description goes here..">
It’s a best practice to add a title and a description to each page of the site. The title should be unique for each page.
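This practice can be checked across a site with a small script. The sketch below assumes the pages are already available as a mapping from URL to HTML string; it uses Python's standard html.parser to report missing titles, duplicate titles, and missing descriptions. The names HeadInfo and audit, and the sample pages, are illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Extracts the <title> text and the description meta tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(pages):
    """pages maps a URL to its HTML; returns a list of (url, problem) pairs."""
    infos = {}
    for url, html in pages.items():
        parser = HeadInfo()
        parser.feed(html)
        parser.close()
        infos[url] = (parser.title.strip(), parser.description)
    title_counts = Counter(title for title, _ in infos.values() if title)
    problems = []
    for url, (title, description) in infos.items():
        if not title:
            problems.append((url, "missing title"))
        elif title_counts[title] > 1:
            problems.append((url, "duplicate title"))
        if not description:
            problems.append((url, "missing description"))
    return problems


pages = {
    "/a": '<head><title>Home</title><meta name="description" content="Welcome"></head>',
    "/b": "<head><title>Home</title></head>",
}
for url, problem in audit(pages):
    print(url, problem)
```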