How to get your data into Sindice and Google with sitemap4rdf

Published in: Technology

  1. How to get your data into Sindice and Google with sitemap4rdf
     Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
  2. Publishing Linked Data from a triple store
  3. Linked Data frontends for triple stores
     Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  4. Search engines
  5. Sindice: the best RDF search engine
  6. Sindice: the best RDF search engine
     120M+ documents
     Continuously updating since 2006
     Low-latency search API
     RDF/XML, Turtle, RDFa, microformats
  7. The Sitemap protocol
  8. Sitemap Protocol
     Used by web crawlers
     Efficiently find all your content and discover what has been updated
     http://sitemaps.org/
  9. Sitemap Protocol: Simple example
     <?xml version="1.0" encoding="UTF-8"?>
     <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url>
         <loc>http://yoursite/</loc>
       </url>
       <url>
         <loc>http://yoursite/products/53546</loc>
       </url>
       <url>
         <loc>http://yoursite/products/98421</loc>
       </url>
       <url>
         <loc>http://yoursite/products/41003</loc>
       </url>
     </urlset>
  10. Sitemap Protocol: Optional parts
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://yoursite/</loc>
          <lastmod>2010-06-24</lastmod>
          <changefreq>daily</changefreq>
        </url>
      </urlset>
  11. Sitemap Protocol: Huge sitemaps
      Gzip-compress your sitemap
      Limit per sitemap file: 50k URLs or 10MB
      Above the limit, split into multiple sitemap files
      Add a sitemap index file that lists them
  12. Sitemap Protocol: Discovery
      Publish the sitemap file
      Add a line to http://yoursite/robots.txt
        Sitemap: http://yoursite/sitemap.xml
  13. sitemap4rdf
      Generate Sitemap files from a SPARQL endpoint
  14. sitemap4rdf
      Simple command line tool
      Sends a SPARQL query to list all URIs
      Generates sitemap
        sitemap4rdf http://yoursite/sparql http://yoursite/resource/
  15. Submit the sitemap location - Sindice
      http://sindice.com/main/submit
  16. Submit the sitemap location - Google
      https://www.google.com/webmasters/tools/
  17. Summary
      Sitemap protocol informs search engines about available pages
      Supported by Sindice!
      sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
      Open source, Java
      http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
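
The Sitemap steps in the slides (list the URIs, write a `urlset` file, split at 50k URLs, add an index file, gzip the result) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how sitemap4rdf itself works (the slides describe it as a Java command line tool). The function names, the SPARQL query text, and the `sitemap1.xml` style file names are hypothetical, and the query is only built as a string, never sent to an endpoint.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the slides (the 10MB limit is not checked here)
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def list_uris_query(prefix):
    """Hypothetical SPARQL query listing subject URIs that start with `prefix`."""
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "' + prefix + '")) }'
    )


def build_sitemap(uris, lastmod=None):
    """Serialize a list of URIs as a <urlset> document, with optional <lastmod>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for uri in uris:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = uri
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    return XML_DECL + ET.tostring(urlset, encoding="unicode")


def split_uris(uris, max_urls=MAX_URLS):
    """Split the URI list into chunks that each fit in one sitemap file."""
    return [uris[i:i + max_urls] for i in range(0, len(uris), max_urls)]


def build_index(sitemap_urls):
    """Serialize a <sitemapindex> document pointing at the individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return XML_DECL + ET.tostring(index, encoding="unicode")


def compress(xml_text):
    """Gzip-compress a sitemap document for publishing."""
    return gzip.compress(xml_text.encode("utf-8"))


if __name__ == "__main__":
    # In a real run these URIs would come from the SPARQL endpoint.
    uris = ["http://yoursite/", "http://yoursite/products/53546"]
    chunks = split_uris(uris)
    names = ["http://yoursite/sitemap%d.xml" % (i + 1) for i in range(len(chunks))]
    print(build_sitemap(chunks[0], lastmod="2010-06-24"))
    print(build_index(names))
```

With a sitemap published this way, the `Sitemap:` line in robots.txt would point at the index file when there is more than one chunk, and at the single sitemap file otherwise.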
