XML Sitemap Format Guide
1. XML sitemap
"Initially, we plan to use the URL information webmasters supply to further improve the
coverage and freshness of our index. Over time that will lead to our doing an
even better job of providing more search results from more sites.
This project doesn't just pertain to Google, either: we're releasing it under the
Attribution/Share Alike Creative Commons license so that other search engines can do a
better job as well. Eventually we hope this will be supported natively in webservers
(e.g. Apache, Lotus Notes, IIS). But to get you started, we offer Sitemap Generator, an
open source client in Python to compute sitemaps for a few common use
cases. Give it a whirl and give us your feedback."
2. Google, MSN and Yahoo announced joint support for the Sitemaps protocol in November
2006. The schema version was changed to "Sitemap 0.90", but no other changes were
made.
3. In April 2007, Ask and IBM announced support for Sitemaps. Also, Google, Yahoo and Microsoft
announced auto-discovery for sitemaps through robots.txt.
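Auto-discovery works by adding a line of the form "Sitemap: <URL>" to robots.txt. As a rough sketch of how a crawler might pick such lines up, the snippet below scans a robots.txt body for them. The function name find_sitemaps and the example.com URL are made up for illustration, and the parsing is deliberately simplistic (it treats "#" as the start of a comment).

```python
def find_sitemaps(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop any trailing comment, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt content used only for this example.
example = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""
print(find_sitemaps(example))
```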
XML Sitemap Format
The sitemap protocol format consists of XML tags. All data values in a sitemap must
be entity escaped (explained below). The file itself must be UTF-8 encoded. The sitemap
must:
1. Begin with an opening <urlset> tag and end with a closing </urlset> tag.
2. Specify the namespace within the <urlset> tag.
3. Include a <url> entry for each URL as a parent tag.
4. Include a <loc> child entry for each <url> parent tag.
All other tags are optional, and support for them may vary among search engines.
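The four requirements above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration, not the Sitemap Generator tool mentioned earlier. The helper name build_sitemap and the example.com URLs are assumptions, and the namespace URI is the one published for the 0.90 schema on sitemaps.org rather than something stated in this guide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap 0.90 schema (taken from sitemaps.org, not from this guide).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return UTF-8 encoded sitemap bytes with one <url>/<loc> pair per URL."""
    # Requirements 1 and 2: a <urlset> root element that declares the namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        # Requirement 3: one <url> parent entry per URL.
        url = ET.SubElement(urlset, "url")
        # Requirement 4: a <loc> child inside each <url>.
        # ElementTree escapes &, < and > in the text when serializing.
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical URLs used only for this example.
sitemap = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/catalog?item=12&desc=vacation",
])
print(sitemap.decode("utf-8"))
```

The xml_declaration argument of ET.tostring requires Python 3.8 or later.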
XML Tag Definitions
1. urlset - This tag is required. It encapsulates the file and references the current protocol
standard.
2. url - This tag is required. Parent tag for each URL entry.
3. loc - This tag is required. It states the URL of the page. It must begin with a
protocol (such as http) and end with a trailing slash if the web server requires one. It must be less
than 2,048 characters.
4. lastmod - This tag is optional. It gives the date of last modification of the file. The date
must be in W3C Datetime format.
5. changefreq - This tag is optional. It indicates how frequently the page is likely to change.
It gives general information to search engines and does not compel them to crawl the
page at that rate. The valid values are:
o always
o hourly
o daily
o weekly
o monthly
o yearly
o never
6. priority - This tag is optional. It describes the priority of a URL relative to other URLs
on the site. Its value ranges from 0.0 to 1.0. Setting priorities does not affect the rankings of
URLs in search engine result pages.
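The constraints in the tag definitions above lend themselves to a simple pre-publication check. The following is a sketch only: check_url_entry is a hypothetical helper, it accepts only http and https as protocols, and it checks only the date-only form (YYYY-MM-DD) of W3C Datetime, although the full format also allows a time and time zone.

```python
import re
from datetime import datetime

CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    # Assumption: only http and https are treated as acceptable protocols here.
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must begin with a protocol such as http")
    if len(loc) >= 2048:
        problems.append("loc must be less than 2,048 characters")
    if lastmod is not None:
        # Simplification: only the date-only W3C Datetime form is accepted.
        well_formed = re.fullmatch(r"\d{4}-\d{2}-\d{2}", lastmod) is not None
        if well_formed:
            try:
                datetime.strptime(lastmod, "%Y-%m-%d")
            except ValueError:
                well_formed = False
        if not well_formed:
            problems.append("lastmod is not a valid YYYY-MM-DD date")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not one of the valid values")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


# Hypothetical entries used only for this example.
print(check_url_entry("http://www.example.com/", "2010-07-08", "weekly", 0.8))
print(check_url_entry("www.example.com", "yesterday", "sometimes", 1.5))
```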
Entity Escaping
As explained above, the sitemap must be UTF-8 encoded, and all data values must use
entity escape codes for the following characters:
o Ampersand (&) - &amp;
o Single Quote (') - &apos;
o Double Quote (") - &quot;
o Greater Than (>) - &gt;
o Less Than (<) - &lt;
Sitemap Index Files
There are two points to keep in mind when generating a sitemap:
1. The sitemap must not contain more than 50,000 URLs.
2. Detailed information on XML sitemaps can be found at the main website.
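As a sketch of how the 50,000-URL limit and the escaping table from the Entity Escaping section might be applied together, the snippet below splits a URL list into sitemap-sized groups and escapes each value by hand. The names escape_value, split_urls and render_sitemap are invented for this example, the namespace URI comes from sitemaps.org, and the example.com URLs are placeholders.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-sitemap limit stated above

# escape() already handles &, < and >; the two quote characters are added here.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_value(value):
    """Apply the five entity escape codes to one data value."""
    return escape(value, QUOTE_ENTITIES)


def split_urls(urls, limit=MAX_URLS):
    """Yield consecutive slices of at most `limit` URLs, one per sitemap file."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


def render_sitemap(urls):
    """Return the text of one sitemap file for an already-split group of URLs."""
    entries = "".join(
        "  <url><loc>{}</loc></url>\n".format(escape_value(u)) for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</urlset>\n"
    )


# Hypothetical site with 120,001 pages, used only for this example.
all_urls = ["http://www.example.com/page?id=%d&lang=en" % i for i in range(120001)]
files = [render_sitemap(group) for group in split_urls(all_urls)]
print(len(files))
```

Each rendered group would then be written to its own UTF-8 file, and the resulting files are what a sitemap index file lists.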