Better Sitemap (Mozilla Drumbeat)

Project proposal on how Sitemap 0.90 can be improved.

Published in: Technology, Business

  1. Better Sitemap. U-Zyn Chua [email_address], December 12, 2009, Mozilla Drumbeat Challenge Singapore. This work is licensed under a Creative Commons Attribution 3.0 License. All other trademarks, logos and copyrights are the property of their respective owners.
  2. Sitemap 0.90
  3.
     • XML
     • List of URLs
     • For URL discovery
     • Robot-friendly
     • Max of 10MB/50k URLs per file
  4.
     <?xml version="1.0" encoding="UTF-8"?>
     <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
       <url><loc>http://www.google.com/3dwh_dmca.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/cpanel/domain</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/edu/</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/new.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/overview.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/privacy.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/program_policies.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/seminars.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/terms.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/testimonials.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/admins/tour.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/administration.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/benefits.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/calendar.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/customers/asu.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/customers/pdfs/asu_success_story.pdf</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/details.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/features.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/gmail.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/pagecreator.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/seminars.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/startpage.html</loc><priority>0.5000</priority></url>
       <url><loc>http://www.google.com/a/help/intl/en/edu/talk.html</loc><priority>0.5000</priority></url>
     • Messy
     • Huge (google.com’s: 3.9MB)
     • Useless (for humans)
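The format shown in slides 3 and 4 can be read with a few lines of Python. What follows is a minimal sketch and not part of the original slides: the function names `read_sitemap` and `within_limits` are made up for illustration, and the limits are the ones quoted on slide 3 (10MB, 50k URLs).

```python
import xml.etree.ElementTree as ET

# Namespace used by the <urlset> element on slide 4.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Limits as quoted on slide 3 (assumed here to mean 10 * 1024 * 1024 bytes).
MAX_BYTES = 10 * 1024 * 1024
MAX_URLS = 50_000


def read_sitemap(xml_bytes):
    """Return a list of (loc, priority) pairs from a Sitemap 0.90 document."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        priority = url.findtext("sm:priority", default=None, namespaces=NS)
        entries.append((loc, float(priority) if priority else None))
    return entries


def within_limits(xml_bytes, entries):
    """Check the per-file size and URL-count limits from slide 3."""
    return len(xml_bytes) <= MAX_BYTES and len(entries) <= MAX_URLS


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
  <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
</urlset>"""

if __name__ == "__main__":
    parsed = read_sitemap(SAMPLE)
    print(parsed, within_limits(SAMPLE, parsed))
```

Note that even this small reader has to parse the whole file before it can use any entry, which is the cost the later slides try to avoid.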
  5. Improvements
  6. Aims
     • For robots:
         • Faster
         • More efficient
     • For humans:
         • More useful
         • At least readable by a human's web client, the browser.
         • A browser uses about 5KB of bandwidth to download favicons. Why not use that bandwidth to download more useful material?
  7. Hierarchical
     • Site map
     • Parent page
     • Sibling pages
     • Child pages
     • Parsable by web browsers
  8. Hierarchical: the browser is able to tell users where they are in the site.
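One way to picture the hierarchical view is to derive a page's parent, siblings and children from a flat list of URLs. The sketch below assumes that the hierarchy follows URL path segments and that all URLs share one host; the slides do not say how the hierarchy would be defined, so both points are assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def path_parts(url):
    """Split the path of a URL into its non-empty segments."""
    return [part for part in urlsplit(url).path.split("/") if part]


def neighbors(current, urls):
    """Find the parent, siblings and children of `current` among `urls`.

    Assumption: a page's parent is the URL whose path is one segment shorter.
    """
    cur = path_parts(current)
    result = {"parent": None, "siblings": [], "children": []}
    for url in urls:
        if url == current:
            continue
        parts = path_parts(url)
        if cur and parts == cur[:-1]:
            result["parent"] = url
        elif len(parts) == len(cur) and parts[:-1] == cur[:-1]:
            result["siblings"].append(url)
        elif len(parts) == len(cur) + 1 and parts[:-1] == cur:
            result["children"].append(url)
    return result


if __name__ == "__main__":
    urls = [
        "http://www.google.com/",
        "http://www.google.com/3dwh_dmca.html",
        "http://www.google.com/a",
        "http://www.google.com/a/cpanel/domain",
        "http://www.google.com/a/edu/",
    ]
    print(neighbors("http://www.google.com/a", urls))
```

A browser given only this small neighborhood could show the user where the current page sits without loading the full sitemap.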
  9. Chronological
     • <lastmod> is in Sitemap 0.90
     • But entries are not sorted by it
     • Present the sitemap in chronological order
  10. Chronological: the browser shows newly updated pages.
  11. Chronological
     • Robots:
         • Do not have to download huge sitemap files every time
         • Only download the first few chunks
     • Browsers:
         • Easily tell surfers where the newly updated content is located
         • Unlike RSS, not limited to blogs and blog-like sites.
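The benefit for robots can be sketched as follows: if entries are sorted by `<lastmod>`, newest first, a crawler can stop reading at the first entry that is not newer than its previous crawl. This is an illustration under assumptions, not the proposal's specification: the newest-first ordering, the function names and the sample dates are all hypothetical.

```python
from datetime import date
from itertools import takewhile


def chronological(entries):
    """Sort (loc, lastmod) pairs newest first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def updated_since(sorted_entries, last_crawl):
    """Read from the top and stop at the first entry not newer than last_crawl.

    Relies on `sorted_entries` already being newest first, so everything
    after the stopping point can be skipped without being read.
    """
    newer = takewhile(lambda entry: entry[1] > last_crawl, sorted_entries)
    return [loc for loc, _ in newer]


if __name__ == "__main__":
    # Hypothetical pages and modification dates.
    entries = [
        ("/a", date(2009, 12, 1)),
        ("/b", date(2009, 12, 10)),
        ("/c", date(2009, 11, 5)),
    ]
    ordered = chronological(entries)
    print(updated_since(ordered, date(2009, 11, 30)))
```

With an unsorted Sitemap 0.90 file the same question requires scanning every entry, because a recent `<lastmod>` may appear anywhere in the file.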
  12. More Efficient (Draft)
     • Multiple versions
         • Chronological
             • Robots do not have to download the whole sitemap for each crawl
         • Hierarchical
     • Seekable
         • With header index
         • Only download needed portions
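The "seekable, with header index" idea is only a draft in the slides, so the following is a purely hypothetical layout meant to make it concrete: a first line holding a JSON index of section names to (offset, length) pairs, followed by a body of newline-separated URLs. None of these names or format choices come from the proposal.

```python
import json


def build_seekable(sections):
    """Serialize {section name: [urls]} as one JSON header line plus a body.

    The header maps each section name to [offset, length] in bytes,
    with offsets counted from the start of the body.
    """
    body = b""
    index = {}
    for name, urls in sections.items():
        chunk = ("\n".join(urls) + "\n").encode("utf-8")
        index[name] = [len(body), len(chunk)]
        body += chunk
    header = json.dumps(index).encode("utf-8") + b"\n"
    return header + body


def read_section(blob, name):
    """Use the header index to pull out one section without scanning the rest."""
    header, _, body = blob.partition(b"\n")
    offset, length = json.loads(header)[name]
    return body[offset:offset + length].decode("utf-8").splitlines()


if __name__ == "__main__":
    blob = build_seekable({"newest": ["/b", "/a"], "archive": ["/c"]})
    print(read_section(blob, "archive"))
```

Here the whole blob is in memory, so the slicing only stands in for the real saving. Over HTTP, a client could presumably fetch the header first and then request just the needed byte range, which is what "only download needed portions" suggests.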
  13. More Efficient (Draft)
     • Smarter
         • Each page serves a sitemap based on where the client/user is.
         • Do not have to download the whole sitemap.
         • Do not have to parse the whole sitemap.
         • Able to keep the file size small, approx. 5KB, for browsers to load quickly.
     • Switch away from XML?
  14. Project Summary: Better Sitemap
     • For robots and humans alike
     • Chronological
     • Hierarchical
     • Seekable
     • Smarter
