Sitemap Comparison of Text, HTML, ROR, RSS and XML Sitemaps
HTML Sitemaps - help humans navigate your website
HTML sitemaps can be:
• Viewed in all browsers, including Firefox, IE and Opera.
• Crawled by all search engines, including Google, Yahoo, MSN and Ask.
Some HTML sitemap tips and tricks:
• HTML sitemaps can be generated by PHP, ASP, etc. It is the output format that matters.
• Limit each page to a few hundred links for best results. This makes it easier to find your important pages.
Code example of HTML:
<html lang="en">
<head><title>This is a site map</title></head>
<body>
<h1>header of HTML site map</h1>
<p>site map paragraph with links
</body>
</html>
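The link-limit tip can be applied mechanically. The Python sketch below splits a list of (title, URL) pairs into HTML sitemap pages of at most `links_per_page` links. The default of 300 is an arbitrary reading of "a few hundred", and the `build_html_sitemaps` name and the sitemap-N.html file names are hypothetical. Titles and URLs are HTML-escaped, which matters for characters such as `&`.

```python
from html import escape
from typing import List, Tuple


def build_html_sitemaps(links: List[Tuple[str, str]], links_per_page: int = 300) -> List[Tuple[str, str]]:
    """Split (title, url) pairs into HTML sitemap pages of at most links_per_page links.

    Returns (file_name, html) pairs; the sitemap-N.html naming is a hypothetical choice.
    """
    pages = []
    for start in range(0, len(links), links_per_page):
        chunk = links[start:start + links_per_page]
        number = start // links_per_page + 1
        # One list item per link; escape() protects &, <, > and quotes.
        items = "\n".join(
            f'<li><a href="{escape(url)}">{escape(title)}</a></li>' for title, url in chunk
        )
        html = (
            "<!DOCTYPE html>\n"
            '<html lang="en">\n'
            f"<head><title>Site map, page {number}</title></head>\n"
            "<body>\n"
            f"<h1>Site map, page {number}</h1>\n"
            f"<ul>\n{items}\n</ul>\n"
            "</body>\n"
            "</html>\n"
        )
        pages.append((f"sitemap-{number}.html", html))
    return pages
```

Each returned page can then be written to disk or served by whatever PHP, ASP or other backend the site uses; only the HTML output matters to browsers and crawlers.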
XHTML Sitemaps - HTML sitemaps as XML
XHTML is HTML reformulated as XML.
Sitemap file in XHTML (the XML declaration, the DOCTYPE, the namespace and xml:lang attributes, and the closing </p> tag are what differ from the HTML version):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>This is a site map</title></head>
<body>
<h1>header of XHTML site map</h1>
<p>site map paragraph with links</p>
</body>
</html>
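Because XHTML is XML, an XML parser rejects markup that an HTML browser tolerates, such as the unclosed `<p>` in the HTML example. A minimal Python sketch using the standard library's `xml.etree.ElementTree` makes the difference visible. It checks well-formedness only, not validity against the XHTML DTD; `is_well_formed` is a hypothetical helper name, and the snippets are trimmed versions of the two examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Trimmed HTML example: the <p> element is never closed.
html_snippet = """<html lang="en">
<body>
<p>site map paragraph with links
</body>
</html>"""

# Trimmed XHTML example: every element is closed.
xhtml_snippet = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<body>
<p>site map paragraph with links</p>
</body>
</html>"""

print(is_well_formed(html_snippet))   # False
print(is_well_formed(xhtml_snippet))  # True
```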
Text Sitemaps - simple sitemap
Text sitemaps contain one website URL per line. Many search engines, including Google and Yahoo, can read text sitemaps.
Improve compatibility between text sitemaps and search engines:
• For Yahoo, name the primary text sitemap file urllist.txt.
• Save text sitemap files as UTF-8 documents, especially if your website URLs contain non-English characters.
• Each text sitemap file should contain no more than 50,000 URLs.
Example of text sitemap file:
http://www.example.com/
http://www.example.com/some-directory/
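The text sitemap tips can be combined in a small writer. This is a Python sketch under stated assumptions: `build_text_sitemaps` is a hypothetical helper, the first file uses the urllist.txt name suggested for Yahoo, and the urllist-2.txt, urllist-3.txt names for additional files are an assumption, not a convention any search engine requires.

```python
from typing import List, Tuple

MAX_URLS_PER_FILE = 50_000  # per-file limit from the tips above


def build_text_sitemaps(urls: List[str], max_urls: int = MAX_URLS_PER_FILE) -> List[Tuple[str, bytes]]:
    """Split urls into text sitemaps of at most max_urls lines each.

    Returns (file_name, utf8_bytes) pairs. The first file is urllist.txt;
    the urllist-N.txt names for later files are hypothetical.
    """
    files = []
    for start in range(0, len(urls), max_urls):
        number = start // max_urls + 1
        name = "urllist.txt" if number == 1 else f"urllist-{number}.txt"
        # One URL per line, encoded as UTF-8 so non-English characters survive.
        body = "\n".join(urls[start:start + max_urls]) + "\n"
        files.append((name, body.encode("utf-8")))
    return files
```

Writing the result is then a loop such as `for name, data in build_text_sitemaps(urls): open(name, "wb").write(data)`. Encoding explicitly to bytes keeps the UTF-8 choice independent of the platform's default encoding.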
RSS Feeds as Sitemaps - RSS 0.9, RSS 1.0 and RSS 2.0
The RSS protocol is often used in feed files for blogs, forums, etc. The RSS file format is based on XML and has evolved through multiple versions and names, all fairly compatible with each other:
• Really Simple Syndication (RSS 2.0)
• RDF Site Summary (RSS 1.0 and RSS 0.90)
• Rich Site Summary (RSS 0.91)
After Google and Yahoo adopted RSS feeds as a kind of website sitemap, other search engines followed.
Note: There is no official standard for splitting RSS feed sitemaps into multiple files. If your RSS sitemap feed grows too large, you may want to create one RSS feed file per website category instead of splitting the file arbitrarily. (If you use a sitemap generator tool, try its include/exclude filters.)
Example of an RSS feed sitemap file:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Website title</title>
<link>http://www.example.com</link>
<generator>A1 Sitemap Generator</generator>
<lastBuildDate>Tue, 13 Mar 2007 22:28:20 GMT</lastBuildDate>
<item>
<title>Page 1</title>
<link>http://www.example.com/page1.html</link>
</item>
<item>
<title>Page 2</title>
<link>http://www.example.com/page2.html</link>
</item>
</channel>
</rss>
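The per-category approach from the note above can be scripted. The Python sketch below (standard library only) builds an RSS 2.0 feed shaped like the example, then one feed per category. It assumes each page is a dict with hypothetical `title`, `link` and `category` keys. The function names and the "Website title - category" channel titles are illustrative choices, and the `<generator>` element is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List, Tuple


def build_rss(title: str, link: str, items: List[Tuple[str, str]], build_date: datetime) -> str:
    """Render an RSS 2.0 feed whose <item> elements list (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    # RFC 822 style date such as "Tue, 13 Mar 2007 22:28:20 GMT"; build_date must be UTC-aware.
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(build_date, usegmt=True)
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(rss, encoding="unicode")


def build_feeds_per_category(title: str, link: str, pages: List[dict],
                             build_date: datetime) -> Dict[str, str]:
    """Group pages by their (hypothetical) "category" key and build one feed per category."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for page in pages:
        groups[page["category"]].append((page["title"], page["link"]))
    return {cat: build_rss(f"{title} - {cat}", link, items, build_date)
            for cat, items in groups.items()}


pages = [
    {"title": "Page 1", "link": "http://www.example.com/page1.html", "category": "news"},
    {"title": "Page 2", "link": "http://www.example.com/page2.html", "category": "docs"},
]
feeds = build_feeds_per_category("Website title", "http://www.example.com", pages,
                                 datetime(2007, 3, 13, 22, 28, 20, tzinfo=timezone.utc))
```

Each value in `feeds` is a complete feed file; how the files are named and submitted is left open, since there is no official convention for split RSS sitemaps.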
ROR Sitemaps - extends RSS sitemaps
ROR expands on the RSS protocol with its own extensions. The standard file extension for ROR files is .ror. All search engines that understand RSS sitemap files continue to understand the RSS parts of ROR files. However, few if any major search engines currently support the ROR sitemap extensions, and Google Webmaster Tools makes no mention of ROR sitemap support.
ROR sitemap file (the ROR additions to RSS are the xmlns:ror namespace declaration and the ror: elements):
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:ror="http://rorweb.com/0.1/">
<channel>
<title>Website title</title>
<link>http://www.example.com</link>
<generator>A1 Sitemap Generator</generator>
<lastBuildDate>Tue, 13 Mar 2007 22:28:20 GMT</lastBuildDate>
<item>
<title>Page 1</title>
<link>http://www.example.com/page1.html</link>
<ror:keywords>page1-keyword1, page1-keyword2, page1-keyword3</ror:keywords>
<ror:updatePeriod>day</ror:updatePeriod>
</item>
<item>
<title>Page 2</title>
<link>http://www.example.com/page2.html</link>
<ror:keywords>page2-keyword1, page2-keyword2, page2-keyword3</ror:keywords>
<ror:updatePeriod>day</ror:updatePeriod>
</item>
</channel>
</rss>
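Parsing such a file shows why RSS-only readers are unaffected by ROR: the ror: elements live in their own namespace, so code that only asks for `title` and `link` never sees them. Below is a minimal Python sketch using `xml.etree.ElementTree`. `read_ror_items` is a hypothetical name, and treating the keyword list as comma-separated is an assumption based on the example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ROR_NS = "http://rorweb.com/0.1/"  # namespace URI declared in the ROR example


def read_ror_items(xml_text: str) -> List[Dict[str, object]]:
    """Read <item> entries from an RSS or ROR file.

    Plain RSS fields (title, link) are read exactly as an RSS-only reader would;
    the ror:keywords extension is picked up only when present.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        entry: Dict[str, object] = {"title": item.findtext("title"),
                                    "link": item.findtext("link")}
        # Namespaced lookup: "{http://rorweb.com/0.1/}keywords"
        keywords = item.findtext(f"{{{ROR_NS}}}keywords")
        if keywords is not None:
            entry["keywords"] = [k.strip() for k in keywords.split(",")]
        items.append(entry)
    return items
```

Dropping the `keywords` lookup turns this into a plain RSS reader that handles the ROR file unchanged, which is the compatibility property described above.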
XML Sitemaps Protocol - also called Google Sitemaps
In 2005 Google introduced its own XML-based sitemaps protocol, called Google Sitemaps. Google later convinced other search engines to adopt it, and the standard was renamed the XML sitemaps protocol. Currently Google, Yahoo, Microsoft MSN Search, Ask, IBM and possibly others support XML sitemaps, and more search engines are likely to add support.
The XML sitemaps protocol also defines autodiscovery, i.e. how search engines can automatically find a website's XML sitemaps. The answer is to reference the XML sitemap, e.g. sitemap.xml, from robots.txt:
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Instead of pointing to a single XML sitemap file for autodiscovery, you can list multiple sitemaps:
Sitemap: http://www.example.com/sitemap-1.xml
Sitemap: http://www.example.com/sitemap-2.xml
Or point to an XML sitemap index file:
Sitemap: http://www.example.com/sitemap-index.xml
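Python's standard library already understands the `Sitemap:` lines used for autodiscovery: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8 and later) returns the sitemap URLs found in a parsed robots.txt. Here is a short sketch; `discover_sitemaps` is a hypothetical wrapper name.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def discover_sitemaps(robots_txt: str) -> List[str]:
    """Return the sitemap URLs listed in robots.txt content (empty list if none)."""
    parser = RobotFileParser()
    # parse() takes the file's lines; Sitemap: lines are collected
    # independently of the User-agent groups.
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


robots_txt = """User-agent: *
Sitemap: http://www.example.com/sitemap-1.xml
Sitemap: http://www.example.com/sitemap-2.xml
"""
print(discover_sitemaps(robots_txt))
# ['http://www.example.com/sitemap-1.xml', 'http://www.example.com/sitemap-2.xml']
```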
Key facts about the XML sitemaps protocol:
• Each XML sitemap file can contain at most 50,000 URLs and be at most 10 MB in size (the protocol has since raised the size limit to 50 MB).
• A sitemap index file can reference up to 1,000 XML sitemaps (since raised to 50,000).
• XML sitemap files and sitemap index files must be stored as UTF-8 documents.
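These limits suggest a simple generation strategy: split the URL list into chunks, write one `<urlset>` file per chunk, and reference every file from a sitemap index. The Python sketch below uses only the standard library (Python 3.8+ for the `xml_declaration` and `default_namespace` arguments of `ElementTree.tostring`). The `sitemap-N.xml` file names, the `base_url` parameter and the function names are assumptions for illustration. The default limits are 50,000 URLs, 10 MB (10,485,760 bytes) and 1,000 sitemaps, and can be raised to the newer values.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _serialize(root: ET.Element) -> bytes:
    # UTF-8 bytes with an XML declaration; all elements use the sitemap default namespace.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True, default_namespace=NS)


def build_sitemaps(urls: List[str], base_url: str, max_urls: int = 50_000,
                   max_bytes: int = 10 * 1024 * 1024, max_sitemaps: int = 1_000
                   ) -> Tuple[List[Tuple[str, bytes]], bytes]:
    """Split urls into sitemap-N.xml files and build a sitemap index pointing at them.

    base_url is where the files will be published, e.g. "http://www.example.com/".
    Returns ([(file_name, xml_bytes), ...], index_xml_bytes).
    """
    files = []
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element(f"{{{NS}}}urlset")
        for loc in urls[start:start + max_urls]:
            url = ET.SubElement(urlset, f"{{{NS}}}url")
            ET.SubElement(url, f"{{{NS}}}loc").text = loc
        data = _serialize(urlset)
        if len(data) > max_bytes:
            raise ValueError("sitemap file too large; lower max_urls")
        files.append((f"sitemap-{start // max_urls + 1}.xml", data))
    if len(files) > max_sitemaps:
        raise ValueError("too many sitemap files for a single index")

    # The index lists the absolute URL of every generated sitemap file.
    index = ET.Element(f"{{{NS}}}sitemapindex")
    for name, _ in files:
        entry = ET.SubElement(index, f"{{{NS}}}sitemap")
        ET.SubElement(entry, f"{{{NS}}}loc").text = base_url + name
    return files, _serialize(index)
```

After writing the files and the index (for example as sitemap-index.xml), the index URL is the one to list in robots.txt.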
Example of an XML sitemap file:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
<priority>1.0</priority>
<changefreq>weekly</changefreq>
<lastmod>2007-06-18</lastmod>
</url>
<url>
<loc>blogs/</loc>
<priority>0.8</priority>
<changefreq>weekly</changefreq>
<lastmod>2007-06-21</lastmod>
</url>
</urlset>
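A sitemap file can also declare the protocol's XML Schema through xsi:schemaLocation, so that it can be validated against sitemap.xsd:
<?xml version="1.0" encoding="utf-8"?>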
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://example.com/</loc>
<lastmod>2006-11-18</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Element definitions
The elements of the XML sitemaps protocol are defined below.
Element Required? Description
<urlset> Yes
The document-level element for the Sitemap. Everything in the document after the XML declaration ('<?xml version ...?>') must be contained in it.
<url> Yes Parent element for each entry.
<sitemapindex> Yes
The document-level element for the Sitemap index. Everything in the document after the XML declaration ('<?xml version ...?>') must be contained in it.
<sitemap> Yes Parent element for each entry in the index.
<loc> Yes
Provides the full URL of the page or sitemap, including the protocol
(e.g. http, https) and a trailing slash, if required by the site's hosting
server. This value must be shorter than 2,048 characters.
<lastmod> No
The date that the file was last modified, in ISO 8601 format. This can
display the full date and time or, if desired, may simply be the date in
the format YYYY-MM-DD.
<changefreq> No How frequently the page may change:
• always
• hourly
• daily
• weekly
• monthly
• yearly
• never
"Always" is used to denote documents that change each time that they
are accessed. "Never" is used to denote archived URLs (i.e. files that
will not be changed again).
This is used only as a guide for crawlers, and is not used to determine
how frequently pages are indexed.
Does not apply to <sitemap> elements.
<priority> No
The priority of that URL relative to other URLs on the site. This allows
webmasters to suggest to crawlers which pages are considered more
important.
The valid range is from 0.0 to 1.0, with 1.0 being the most important.
The default value is 0.5.
Rating all pages on a site with a high priority does not affect search
listings, as it is only used to suggest to the crawlers how important
pages in the site are to one another.
Does not apply to <sitemap> elements.
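The rules in this table can be enforced while a sitemap is being generated. The Python sketch below uses only the standard library; `add_url` and its parameters are hypothetical names, not part of the protocol. It checks every value before appending a `<url>` entry: `<loc>` must be an absolute http or https URL shorter than 2,048 characters, `<changefreq>` must be one of the seven listed values, and `<priority>` must lie between 0.0 and 1.0. Optional elements are omitted when no value is given, so crawlers fall back to their defaults (0.5 for priority). Child elements are emitted in the order used by the schema-referencing example (loc, lastmod, changefreq, priority).

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional
from urllib.parse import urlsplit

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def add_url(urlset: ET.Element, loc: str, lastmod: Optional[date] = None,
            changefreq: Optional[str] = None, priority: Optional[float] = None) -> ET.Element:
    """Validate the values, then append a <url> entry to urlset and return it."""
    # Validate everything first so a rejected entry leaves urlset unchanged.
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"<loc> must be a full URL including the protocol: {loc!r}")
    if len(loc) >= 2048:
        raise ValueError("<loc> must be shorter than 2,048 characters")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"unknown <changefreq> value: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError("<priority> must be between 0.0 and 1.0")

    url = ET.SubElement(urlset, f"{{{NS}}}url")
    ET.SubElement(url, f"{{{NS}}}loc").text = loc
    if lastmod is not None:
        # A datetime.date serializes as YYYY-MM-DD, the date-only ISO 8601 form.
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, f"{{{NS}}}changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, f"{{{NS}}}priority").text = str(float(priority))
    return url


urlset = ET.Element(f"{{{NS}}}urlset")
add_url(urlset, "http://www.example.com/", date(2007, 6, 18), "weekly", 1.0)
add_url(urlset, "http://www.example.com/blogs/", date(2007, 6, 21), "weekly", 0.8)
print(ET.tostring(urlset, encoding="unicode", default_namespace=NS))
```

A relative value such as `blogs/` is rejected with a `ValueError`, matching the requirement that `<loc>` holds the full URL including the protocol.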