How to get your data into Sindice and Google with sitemap4rdf

  1. How to get your data into Sindice and Google with sitemap4rdf
     Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
  2. Publishing Linked Data from a triple store
  3. Linked Data frontends for triple stores
     Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  4. Search engines
  5. Sindice: the best RDF search engine
  6. Sindice: the best RDF search engine
     120M+ documents
     Continuously updating since 2006
     Low-latency search API
     RDF/XML, Turtle, RDFa, microformats
  7. The Sitemap protocol
  8. Sitemap Protocol
     Used by web crawlers
     Efficiently find all your content & discover what has been updated
     http://sitemaps.org/
  9. Sitemap Protocol: Simple example
     <?xml version="1.0" encoding="UTF-8"?>
     <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url>
         <loc>http://yoursite/</loc>
       </url>
       <url>
         <loc>http://yoursite/products/53546</loc>
       </url>
       <url>
         <loc>http://yoursite/products/98421</loc>
       </url>
       <url>
         <loc>http://yoursite/products/41003</loc>
       </url>
     </urlset>
  10. Sitemap Protocol: Optional parts
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://yoursite/</loc>
          <lastmod>2010-06-24</lastmod>
          <changefreq>daily</changefreq>
        </url>
      </urlset>
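
A sitemap like the two shown on slides 9 and 10 can be generated with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `build_sitemap` and the tuple layout of its input are assumptions made for illustration, not part of the Sitemap protocol or of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(entries):
    """Return a sitemap document as a string.

    entries: iterable of (loc, lastmod, changefreq) tuples (hypothetical
    layout); lastmod and changefreq are optional and may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Only emit the optional elements when a value was supplied.
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
        if changefreq:
            ET.SubElement(url, "changefreq").text = changefreq
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://yoursite/", "2010-06-24", "daily"),
        ("http://yoursite/products/53546", None, None),
    ]))
```

Entries without `lastmod` or `changefreq` produce the plain form from slide 9; entries with them produce the extended form from slide 10.
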
  11. Sitemap Protocol: Huge sitemaps
      Gzip-compress your sitemap
      Limit: 50k URLs or 10MB per sitemap file
      For larger sites: split into multiple sitemap files and add a sitemap index file
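
The splitting step from slide 11 could look roughly like the sketch below. It chunks a URL list into gzip-compressed sitemap files of at most 50,000 URLs each and builds a sitemap index that points to them. The file names (`sitemap1.xml.gz`, `sitemap_index.xml`) and the function names are assumptions for illustration. The sketch does not check the 10MB size limit, which a real tool would also have to respect.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS = 50000  # per-file URL limit stated on the slide


def _urlset(urls):
    """Serialize one sitemap file (bytes) for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    return XML_DECLARATION + ET.tostring(urlset, encoding="utf-8")


def build_sitemap_files(urls, base_url, max_urls=MAX_URLS):
    """Return {file name: file content as bytes}.

    base_url is where the files will be published; it must end with "/".
    File names are hypothetical and can be chosen freely.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap{n}.xml.gz"
        files[name] = gzip.compress(_urlset(urls[start:start + max_urls]))
        # Each chunk gets one <sitemap> entry in the index file.
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = base_url + name
    files["sitemap_index.xml"] = XML_DECLARATION + ET.tostring(index, encoding="utf-8")
    return files
```

The caller writes each entry of the returned dictionary to disk and then advertises the index file instead of a single sitemap.
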
  12. Sitemap Protocol: Discovery
      Publish the sitemap file
      Add a line to http://yoursite/robots.txt
        Sitemap: http://yoursite/sitemap.xml
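
The discovery step on slide 12 is a one-line change to robots.txt. As a small illustration, the sketch below adds the `Sitemap:` line to existing robots.txt content if it is not already there, and reads such lines back the way a crawler might. Both function names are hypothetical helpers, not part of any tool mentioned in these slides.

```python
def find_sitemaps(robots_txt):
    """Return the URLs of all 'Sitemap:' lines in robots.txt content."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a 'Sitemap:' line unless the URL is already listed."""
    if sitemap_url in find_sitemaps(robots_txt):
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /private/\n"
    print(add_sitemap_line(existing, "http://yoursite/sitemap.xml"))
```
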
  13. sitemap4rdf
      Generate Sitemap files from a SPARQL endpoint
  14. sitemap4rdf
      Simple command line tool
      Sends a SPARQL query to list all URIs
      Generates sitemap
        sitemap4rdf http://yoursite/sparql http://yoursite/resource/
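
To make the idea behind slide 14 concrete, here is a rough Python sketch of the "list all URIs via SPARQL" step. It is not sitemap4rdf's actual implementation (the tool is written in Java, and its real query is not shown in these slides). The query text, the use of `STRSTARTS` (a SPARQL 1.1 function), the function names, and the reading of the second command-line argument as a URI prefix are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query(uri_prefix):
    """Build a SPARQL query selecting all subject URIs under uri_prefix."""
    # Escape backslashes and quotes so the prefix is a valid string literal.
    literal = uri_prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER (isIRI(?s) && STRSTARTS(STR(?s), "{literal}"))\n'
        "}"
    )


def extract_uris(results):
    """Pull the URIs out of a SPARQL JSON results document (parsed dict)."""
    bindings = results["results"]["bindings"]
    return sorted({b["s"]["value"] for b in bindings if b["s"]["type"] == "uri"})


def list_uris(endpoint, uri_prefix):
    """Send the query to the endpoint via HTTP GET and return the URIs."""
    params = urllib.parse.urlencode({"query": build_query(uri_prefix)})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_uris(json.load(response))
```

With the example values from the slide, `list_uris("http://yoursite/sparql", "http://yoursite/resource/")` would return the URIs to write into the sitemap's `<loc>` elements.
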
  15. Submit the sitemap location - Sindice
      http://sindice.com/main/submit
  16. Submit the sitemap location - Google
      https://www.google.com/webmasters/tools/
  17. Summary
      Sitemap protocol informs search engines about available pages
      Supported by Sindice!
      sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
      Open source, Java
      http://lab.linkeddata.deri.ie/2010/sitemap4rdf/