Sitemap4rdf(v2 boris)

sitemap4rdf, a tool to generate Sitemap files from a SPARQL endpoint

Published in: Technology

  1. sitemap4rdf: generate Sitemap files from a SPARQL endpoint
     Boris Villazón-Terrazas and Richard Cyganiak (DERI)
     http://www.deri.ie/
     Facultad de Informática, Universidad Politécnica de Madrid
     Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
     http://www.oeg-upm.net
     Phone: 34.91.3366605, Fax: 34.91.3524819
  2. ToC
     • Publishing Linked Data from a triple store
     • Search engines
     • The Sitemap protocol
     • sitemap4rdf
     • Summary
     • Future work
  3. Linked Data frontends for triple stores
     Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  5. Sindice: the best RDF search engine
  6. Sindice: the best RDF search engine
     • 120M+ documents
     • Continuously updating since 2006
     • Search API
     • RDF/XML, Turtle, RDFa, microformats
  8. Sitemap Protocol
     • Used by web crawlers
     • Efficiently find all your content & discover what has been updated
     http://sitemaps.org/
     A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
  9. Sitemap Protocol: Simple example
     <?xml version="1.0" encoding="UTF-8"?>
     <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url>
         <loc>http://yoursite/</loc>
       </url>
       <url>
         <loc>http://yoursite/products/53546</loc>
       </url>
       <url>
         <loc>http://yoursite/products/98421</loc>
       </url>
       <url>
         <loc>http://yoursite/products/41003</loc>
       </url>
     </urlset>
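To illustrate how a crawler might read the simple example above, here is a minimal Python sketch that extracts every `<loc>` value from a sitemap. The function name `list_locations` and the constant names are hypothetical; the only facts taken from the slide are the namespace URI and the example URLs.

```python
import xml.etree.ElementTree as ET

# Namespace from the slide; the "sm" prefix is just a local alias for lookups.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The simple example from the slide, kept as a string for illustration.
SIMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yoursite/</loc></url>
  <url><loc>http://yoursite/products/53546</loc></url>
  <url><loc>http://yoursite/products/98421</loc></url>
  <url><loc>http://yoursite/products/41003</loc></url>
</urlset>"""


def list_locations(sitemap_xml):
    """Return the text of every <loc> element inside a <url> entry."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
```

Calling `list_locations(SIMPLE_SITEMAP)` returns the four URLs in document order.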
  10. Sitemap Protocol: Optional parts
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://yoursite/</loc>
          <lastmod>2010-06-24</lastmod>
          <changefreq>daily</changefreq>
        </url>
      </urlset>
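The optional parts can be produced programmatically. The sketch below is one possible way to write a sitemap in Python, emitting `<lastmod>` and `<changefreq>` only when a value is supplied. The function `build_sitemap` and the dictionary layout of its input are assumptions made for illustration, not part of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML (bytes) from dicts with 'loc' and optional fields.

    The dict layout is a hypothetical input format chosen for this sketch.
    """
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = entry["loc"]
        # Optional elements are written only when a value is present.
        for optional in ("lastmod", "changefreq"):
            if entry.get(optional):
                child = ET.SubElement(url, "{%s}%s" % (SITEMAP_NS, optional))
                child.text = entry[optional]
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")
```

For example, `build_sitemap([{"loc": "http://yoursite/", "lastmod": "2010-06-24", "changefreq": "daily"}])` yields a document equivalent to the one on the slide.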
  11. Sitemap Protocol: Huge sitemaps
      • Gzip-compress your sitemap
      • Limit: 50k URLs or 10MB
        • split into multiple sitemap files
        • add a sitemap index file
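The splitting step can be sketched as follows: a URL list is cut into parts of at most 50,000 entries, each part is gzip-compressed, and a sitemap index pointing at the parts is added. The file names (`sitemap1.xml.gz`, `sitemap_index.xml`) and function names are hypothetical, and this sketch enforces only the URL-count limit, not the 10MB size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL limit from the slide


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render(root_tag, child_tag, locations):
    """Render a <urlset> or <sitemapindex> document as bytes."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element("{%s}%s" % (SITEMAP_NS, root_tag))
    for location in locations:
        child = ET.SubElement(root, "{%s}%s" % (SITEMAP_NS, child_tag))
        ET.SubElement(child, "{%s}loc" % SITEMAP_NS).text = location
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")


def build_sitemap_set(urls, base_url, size=MAX_URLS_PER_FILE):
    """Return {filename: bytes}: gzipped sitemap parts plus an index file.

    File names are illustrative; base_url is where the files will be served.
    """
    files = {}
    part_locations = []
    for number, part in enumerate(chunk(urls, size), start=1):
        name = "sitemap%d.xml.gz" % number
        files[name] = gzip.compress(render("urlset", "url", part))
        part_locations.append(base_url + name)
    files["sitemap_index.xml"] = render("sitemapindex", "sitemap", part_locations)
    return files
```

The `size` parameter exists only so the behaviour can be tried on small inputs; in real use the default limit would apply.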
  12. Sitemap Protocol: Discovery
      • Publish the sitemap file
      • Add a line to http://yoursite/robots.txt
        • Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
      Sitemap: http://yoursite/sitemap.xml
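From the crawler's side, discovery amounts to scanning robots.txt for `Sitemap:` lines. The sketch below shows one simple way to do that in Python. The function name is hypothetical, and matching the field name case-insensitively and stripping `#` comments are lenient assumptions of this sketch rather than rules stated on the slide.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split only at the first colon so the URL's own colons survive.
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Given a robots.txt containing the line `Sitemap: http://yoursite/sitemap.xml`, the function returns a list with that single URL.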
  14. sitemap4rdf
      • Simple command line tool
      • Sends a SPARQL query to list all URIs
      • Generates sitemap
      sitemap4rdf http://yoursite/sparql http://yoursite/resource/
      Example:
      sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
      • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
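The slides do not show the query sitemap4rdf actually sends, so the following Python sketch is only a guess at the general idea: ask the endpoint for distinct subject URIs that start with the given prefix, then read the URIs out of a SPARQL JSON result. The query text, the function names, and the use of `STRSTARTS` (a SPARQL 1.1 function) are all assumptions. The code builds the request URL but performs no network access.

```python
import json
from urllib.parse import urlencode


def build_list_query(prefix):
    """Hypothetical query listing subject URIs under `prefix`.

    This is not the query used by sitemap4rdf, which the slides do not show.
    """
    # Escape backslashes and quotes so the prefix is a valid string literal.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE { "
        "?s ?p ?o . "
        'FILTER (isIRI(?s) && STRSTARTS(STR(?s), "%s")) '
        "} ORDER BY ?s" % escaped
    )


def build_request_url(endpoint, prefix):
    """Build a SPARQL protocol GET URL carrying the query parameter."""
    return endpoint + "?" + urlencode({"query": build_list_query(prefix)})


def uris_from_results(results_json):
    """Extract URI values bound to ?s from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding["s"]["value"]
        for binding in data["results"]["bindings"]
        if binding["s"]["type"] == "uri"
    ]
```

With the example from the slide, `build_request_url("http://geo.linkeddata.es/sparql", "http://geo.linkeddata.es/")` gives the URL a client would fetch, and the URIs returned by `uris_from_results` would become the `<loc>` entries of the generated sitemap.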
  15. Submit the sitemap location - Sindice
      • http://sindice.com/main/submit
  16. Submit the sitemap location - Google
      • https://www.google.com/webmasters/tools/
  18. Summary
      • Sitemap protocol informs search engines about available pages
        • Supported by Sindice!
      • sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
        • Open source, Java
        • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
        • http://mccarthy.dia.fi.upm.es/sitemap4rdf/
        • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf
  20. Future Work
      • Integrate sitemap4rdf with Pubby
      • Generate voiD file automatically from a SPARQL endpoint
      • Generate an entry in CKAN (registry of open knowledge packages) automatically through the CKAN API
        • http://ckan.net/package/geolinkeddata
      • Interact with prefix.cc (service for remembering and looking up URI prefixes) through its API
        • geoes: <http://geo.linkeddata.es/ontology>
  21. Future Work
      • Support the semantic sitemap extension (once it is compatible with Google)
        • http://sw.deri.org/2007/07/sitemapextension/