Better Sitemap (Mozilla Drumbeat)

Project proposal on how Sitemap 0.90 can be improved.

Presentation Transcript

    • Better Sitemap. U-Zyn Chua, [email_address]. December 12, 2009. Mozilla Drumbeat Challenge Singapore. This work is licensed under a Creative Commons Attribution 3.0 License. All other trademarks, logos and copyrights are the property of their respective owners.
    • Sitemap 0.90
      • XML
      • List of URLs
      • For URL discovery
      • Robot-friendly
      • Max of 10MB/50k URLs per file
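The per-file limits mean that a large site has to spread its URL list over several sitemap files, which a sitemap index file then lists. Below is a minimal Python sketch of that splitting step, assuming the URLs are already collected in a list. `build_sitemaps` and its greedy halving strategy are illustrative choices, not part of the protocol, and only `<loc>` is written for brevity:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000             # Sitemap 0.90: at most 50,000 URLs per file
MAX_BYTES = 10 * 1024 * 1024  # Sitemap 0.90: at most 10 MB (10,485,760 bytes) per file


def _serialize(urls):
    """Render a list of URLs as a Sitemap 0.90 <urlset> document (UTF-8 bytes)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)  # Python 3.8+


def build_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Split urls into sitemap documents that each respect both per-file limits."""
    # First cut by URL count, then halve any chunk whose serialized size is too big.
    pending = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
    documents = []
    while pending:
        chunk = pending.pop(0)
        doc = _serialize(chunk)
        # Accept the chunk if it fits; a single oversized URL cannot be split further.
        if len(doc) <= max_bytes or len(chunk) == 1:
            documents.append(doc)
        else:
            mid = len(chunk) // 2
            pending[0:0] = [chunk[:mid], chunk[mid:]]  # keep the original URL order
    return documents


if __name__ == "__main__":
    docs = build_sitemaps([f"http://www.example.com/page{i}" for i in range(120_000)])
    print([len(ET.fromstring(d)) for d in docs])  # [50000, 50000, 20000]
```

With 120,000 hypothetical `example.com` URLs the split is driven purely by the URL count, because each resulting file stays far below 10 MB.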
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.google.com/</loc>
    <priority>1.000</priority>
  </url>
  <url>
    <loc>http://www.google.com/3dwh_dmca.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/cpanel/domain</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/edu/</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/new.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/overview.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/privacy.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/program_policies.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/seminars.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/terms.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/testimonials.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/admins/tour.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/administration.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/benefits.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/calendar.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/customers/asu.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/customers/pdfs/asu_success_story.pdf</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/details.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/features.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/gmail.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/pagecreator.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/seminars.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/startpage.html</loc>
    <priority>0.5000</priority>
  </url>
  <url>
    <loc>http://www.google.com/a/help/intl/en/edu/talk.html</loc>
    <priority>0.5000</priority>
  </url>
  <!-- excerpt: the remaining url entries of the file are not shown -->
</urlset>
```
      • Messy
      • Huge (google.com’s sitemap is 3.9MB)
      • Useless (for humans)
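Part of what makes such a file heavy for a robot is that Sitemap 0.90 has no index: to reach any particular `<loc>` a consumer has to read and parse the document up to that point, and because entries are unordered it has to read all of it to find the newest pages. A minimal Python sketch using the standard library's streaming parser (the function name `iter_sitemap` is illustrative):

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def iter_sitemap(source):
    """Yield (loc, priority) pairs from a Sitemap 0.90 file or file-like object.

    iterparse streams the document so memory use stays modest, but the whole file
    still has to be downloaded and parsed before the last entry is reached.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "url":
            loc = elem.findtext(NS + "loc")
            # The protocol's default priority is 0.5 when <priority> is omitted.
            priority = float(elem.findtext(NS + "priority", default="0.5"))
            yield loc, priority
            elem.clear()  # drop the children of the finished <url> element


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
  <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
</urlset>"""
    print(list(iter_sitemap(io.BytesIO(sample))))
    # [('http://www.google.com/', 1.0), ('http://www.google.com/a', 0.5)]
```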
    • Improvements
      • For robots:
        • Faster
        • More efficient
      • For humans:
        • More useful
        • At least readable by a human's web client, i.e. the browser.
        • A browser already spends about 5KB of bandwidth downloading favicons. Why not use that bandwidth to download more useful material?
      Aims
      • Site map
      • Parent page
      • Sibling pages
      • Child pages
      • Parsable by web browsers
      Hierarchical
    • Hierarchical. Browser is able to tell the user where he/she is
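The slides do not specify how parent, sibling and child relationships would be encoded. One plausible reading, assumed here only for illustration, is to infer them from URL paths: the parent is the nearest listed page higher up the path, siblings share the same parent path, and children sit one segment below. A minimal Python sketch (the function `neighbours` is hypothetical), using a few URLs from the google.com sitemap excerpt:

```python
from urllib.parse import urlsplit


def _segments(url):
    """Split a URL path into non-empty segments: '/a/edu/' -> ('a', 'edu')."""
    return tuple(s for s in urlsplit(url).path.split("/") if s)


def neighbours(current, urls):
    """Return the parent, siblings and children of `current` among `urls`.

    Assumes the hierarchy mirrors URL paths and that all URLs share one host.
    If the direct parent path has no page, the nearest listed ancestor is used.
    """
    by_path = {_segments(u): u for u in urls}
    here = _segments(current)
    parent = None
    for depth in range(len(here) - 1, -1, -1):  # walk upwards towards the root
        if here[:depth] in by_path:
            parent = by_path[here[:depth]]
            break
    siblings = sorted(u for p, u in by_path.items()
                      if len(p) == len(here) and p[:-1] == here[:-1] and p != here)
    children = sorted(u for p, u in by_path.items()
                      if len(p) == len(here) + 1 and p[:-1] == here)
    return {"parent": parent, "siblings": siblings, "children": children}


if __name__ == "__main__":
    base = "http://www.google.com/a/help/intl/en/edu/"
    urls = ["http://www.google.com/", "http://www.google.com/a",
            base + "calendar.html", base + "gmail.html", base + "talk.html",
            base + "customers/asu.html"]
    print(neighbours(base + "gmail.html", urls))
```

For `gmail.html` the parent falls back to `http://www.google.com/a`, since none of the intermediate paths are listed, and `calendar.html` and `talk.html` come back as its siblings. A browser could render exactly this kind of result as a "you are here" panel.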
      • <lastmod> already exists in Sitemap 0.90
      • But entries are not sorted by it
      • Present the sitemap in chronological order
      Chronological
    • Chronological. Browser showing newly updated pages
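Since `<lastmod>` already exists, a chronological sitemap needs no new data, only a different ordering. A minimal Python sketch, assuming entries are `(loc, lastmod)` pairs with `lastmod` in the date-only or full date-and-time W3C Datetime forms (the year-only and year-month forms are not handled here), that date-only values can be treated as UTC, and that undated entries go last. The URLs in the example are hypothetical:

```python
from datetime import datetime, timezone


def parse_lastmod(value):
    """Parse a W3C Datetime string such as '2009-12-12' or '2009-12-12T10:00:00+08:00'."""
    value = value.replace("Z", "+00:00")  # fromisoformat accepts 'Z' only from Python 3.11
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: date-only means UTC
    return parsed


def chronological(entries):
    """Sort (loc, lastmod) pairs newest first; entries whose lastmod is None go last."""
    dated = [(loc, lm) for loc, lm in entries if lm is not None]
    undated = [(loc, lm) for loc, lm in entries if lm is None]
    dated.sort(key=lambda entry: parse_lastmod(entry[1]), reverse=True)
    return dated + undated


if __name__ == "__main__":
    entries = [("http://www.example.com/old.html", "2009-01-05"),
               ("http://www.example.com/new.html", "2009-12-12T10:00:00+08:00"),
               ("http://www.example.com/undated.html", None),
               ("http://www.example.com/mid.html", "2009-06-30T23:59:59Z")]
    for loc, lastmod in chronological(entries):
        print(lastmod, loc)
```

Normalizing every timestamp to an aware `datetime` keeps values with different time zones comparable, so `+08:00` and `Z` entries sort correctly against each other.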
      • Robots:
        • Do not have to download huge sitemap files every time
        • Only need to download the first few chunks
      • Browsers:
        • Can easily tell surfers where newly updated content is located
        • Unlike RSS, not limited to blogs or blog-like sites
      Chronological
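If the chronological sitemap is published newest first and split into chunks, a robot that remembers its previous crawl time can stop at the first entry older than that time. A minimal Python sketch: `fetch_chunk` stands in for downloading hypothetical files such as `sitemap-1.xml`, `sitemap-2.xml` and so on, and timestamps are assumed to be already parsed into timezone-aware `datetime` values:

```python
from datetime import datetime, timezone


def fresh_urls(fetch_chunk, chunk_count, last_crawl):
    """Collect URLs modified after last_crawl from a newest-first chunked sitemap.

    fetch_chunk(i) returns chunk i as a list of (loc, lastmod) pairs, newest first.
    Returns (urls, chunks_fetched) so the caller can see how much was downloaded.
    """
    urls = []
    for i in range(chunk_count):
        for loc, lastmod in fetch_chunk(i):
            if lastmod <= last_crawl:
                return urls, i + 1  # everything from here on is older: stop early
            urls.append(loc)
    return urls, chunk_count


if __name__ == "__main__":
    def utc(*args):
        return datetime(*args, tzinfo=timezone.utc)

    chunks = [
        [("http://www.example.com/d", utc(2009, 12, 12)), ("http://www.example.com/c", utc(2009, 12, 10))],
        [("http://www.example.com/b", utc(2009, 11, 1)), ("http://www.example.com/a", utc(2009, 6, 1))],
        [("http://www.example.com/z", utc(2008, 1, 1))],
    ]
    print(fresh_urls(lambda i: chunks[i], len(chunks), last_crawl=utc(2009, 11, 15)))
```

In this example the robot fetches two of the three chunks and never touches the oldest one. The saving grows with the size of the site, because the stopping point depends only on how much has changed since the last crawl.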
    • More Efficient (Draft)
      • Multiple versions
        • Chronological
          • Robots do not have to download the whole sitemap for each crawl
        • Hierarchical
      • Seekable
        • With header index
        • Only download needed portions
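The slides leave the seekable format open. One way to read "with header index" is a small index at the start of the file giving the byte offset and length of each section, so a client can first fetch the index and then request only the section it needs, for example with an HTTP `Range` request. The layout below (a JSON index on the first line, then one URL per line) is a hypothetical illustration rather than a proposed standard. The Python sketch slices an in-memory byte string where a real client would make range requests:

```python
import json


def build_seekable(sections):
    """Pack {name: [url, ...]} into bytes: one JSON index line, then the sections.

    Each index entry maps a section name to [offset, length], with offsets counted
    from the first byte after the index line.
    """
    index, body = {}, bytearray()
    for name, urls in sections.items():
        chunk = "".join(u + "\n" for u in urls).encode("utf-8")
        index[name] = [len(body), len(chunk)]
        body += chunk
    header = json.dumps(index, separators=(",", ":")).encode("utf-8") + b"\n"
    return header + bytes(body)


def read_section(data, name):
    """Read the index line, then only the byte range of one section."""
    header_end = data.index(b"\n") + 1  # a client would fetch just the first few bytes
    index = json.loads(data[:header_end])
    offset, length = index[name]
    start = header_end + offset
    # Over HTTP this slice corresponds to "Range: bytes={start}-{start + length - 1}".
    return data[start:start + length].decode("utf-8").splitlines()


if __name__ == "__main__":
    packed = build_seekable({
        "recent": ["http://www.example.com/new.html"],
        "a/edu": ["http://www.google.com/a/edu/",
                  "http://www.google.com/a/help/intl/en/edu/gmail.html"],
    })
    print(read_section(packed, "a/edu"))
```

Keeping the index on a single line means a client can read it without knowing its size in advance; a fixed-length size prefix would be an equally valid choice.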
    • More Efficient (Draft)
      • Smarter
        • Each page serves a sitemap based on where the client/user is.
        • No need to download the whole sitemap.
        • No need to parse the whole sitemap.
        • File size stays small (approx. 5KB), so browsers can load it quickly.
      • Switch away from XML?
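To stay near the roughly 5KB the slides mention, a page could serve only its own neighbourhood and stop adding entries once a byte budget is reached. A minimal Python sketch, assuming the parent, children and siblings of the page are already known. The function `page_sitemap`, the 5KB constant and the choice to keep the parent first, then children, then siblings are all illustrative assumptions:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BUDGET = 5 * 1024  # assumed budget, taken from the slides' ~5KB figure


def _serialize(urls):
    """Render a list of URLs as a Sitemap 0.90 <urlset> document (UTF-8 bytes)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)  # Python 3.8+


def page_sitemap(parent, children, siblings, budget=BUDGET):
    """Build a per-page <urlset>, adding URLs in priority order until the next
    one would push the serialized document over `budget` bytes.

    Returns (document_bytes, urls_kept).
    """
    candidates = ([parent] if parent else []) + list(children) + list(siblings)
    kept, doc = [], _serialize([])
    for url in candidates:
        trial = _serialize(kept + [url])
        if len(trial) > budget:
            break  # lower-priority URLs are dropped once the budget is reached
        kept.append(url)
        doc = trial
    return doc, kept


if __name__ == "__main__":
    children = [f"http://www.example.com/docs/page{i:03d}.html" for i in range(200)]
    doc, kept = page_sitemap("http://www.example.com/", children, [])
    print(len(doc), len(kept))  # at most 5120 bytes; only some of the 200 children fit
```

Re-serializing on every step is quadratic, but at a few kilobytes per page it is cheap and keeps the byte count exact rather than estimated.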
    • Better Sitemap
      • For robots and humans alike
      • Chronological
      • Hierarchical
      • Seekable
      • Smarter
      Project Summary