Sitemap4rdf(v2 boris)

sitemap4rdf, a tool to generate Sitemap files from a SPARQL endpoint

Published in: Technology

  1. sitemap4rdf: generate Sitemap files from a SPARQL endpoint
     Boris Villazón-Terrazas and Richard Cyganiak (DERI), http://www.deri.ie/
     Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
     http://www.oeg-upm.net
     Phone: 34.91.3366605, Fax: 34.91.3524819
  2. ToC
     • Publishing Linked Data from a triple store
     • Search engines
     • The Sitemap protocol
     • sitemap4rdf
     • Summary
     • Future work
  3. Linked Data frontends for triple stores
     Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  5. Sindice: the best RDF search engine
  6. Sindice: the best RDF search engine
     • 120M+ documents
     • Continuously updating since 2006
     • Search API
     • RDF/XML, Turtle, RDFa, microformats
  8. Sitemap Protocol (http://sitemaps.org/)
     • Used by web crawlers
     • Efficiently find all your content & discover what has been updated
     A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
  9. Sitemap Protocol: Simple example

         <?xml version="1.0" encoding="UTF-8"?>
         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
           <url>
             <loc>http://yoursite/</loc>
           </url>
           <url>
             <loc>http://yoursite/products/53546</loc>
           </url>
           <url>
             <loc>http://yoursite/products/98421</loc>
           </url>
           <url>
             <loc>http://yoursite/products/41003</loc>
           </url>
         </urlset>
  10. Sitemap Protocol: Optional parts

          <?xml version="1.0" encoding="UTF-8"?>
          <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            <url>
              <loc>http://yoursite/</loc>
              <lastmod>2010-06-24</lastmod>
              <changefreq>daily</changefreq>
            </url>
          </urlset>
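
The two examples above can also be produced programmatically. The following is a minimal Python sketch, not part of sitemap4rdf: the function name `build_urlset` and the tuple layout are assumptions made for illustration. It writes `<loc>` for every entry and adds `<lastmod>` and `<changefreq>` only when a value is supplied.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> document from (loc, lastmod, changefreq) tuples.

    lastmod and changefreq may be None; the element is then omitted.
    (Hypothetical helper, named here for illustration only.)
    """
    # Serialize the sitemap namespace as the default xmlns, as on the slides.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        if changefreq is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling `build_urlset([("http://yoursite/", "2010-06-24", "daily")])` yields a document equivalent to the "Optional parts" slide, without the pretty-printing.
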
  11. Sitemap Protocol: Huge sitemaps
      • Gzip-compress your sitemap
      • Limit: 50k URLs or 10MB per sitemap file
        • split into multiple sitemap files
        • add a sitemap index file
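
One way to handle a large URL list is sketched below in Python. It is only an illustration and does not describe how sitemap4rdf does it: the helper names are hypothetical, and only the 50k URL limit from the slide is enforced (the size limit is not checked). The list is split into parts, each part is serialized and gzip-compressed, and a `<sitemapindex>` file points at the parts.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit quoted on the slide


def chunk(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive parts of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_xml(urls):
    """Serialize one sitemap file containing only <loc> elements."""
    rows = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{rows}</urlset>\n'
    )


def sitemap_index_xml(sitemap_urls):
    """Serialize a sitemap index that lists the individual sitemap files."""
    rows = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{rows}</sitemapindex>\n'
    )


def gzip_bytes(text):
    """Gzip-compress a serialized sitemap, ready to be written to a .gz file."""
    return gzip.compress(text.encode("utf-8"))
```

For 120,000 URLs, `chunk` returns three parts (50,000, 50,000 and 20,000 URLs). Each part would be written as, say, `sitemap1.xml.gz` (the file naming is an assumption), and the index would list the public URL of each file.
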
  12. Sitemap Protocol: Discovery
      • Publish the sitemap file
      • Add a line to http://yoursite/robots.txt
        • Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

          Sitemap: http://yoursite/sitemap.xml
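
The discovery step can be automated with a few lines of Python. This is a sketch under the assumption that the robots.txt content is available as a string; both function names are hypothetical. One function lists the `Sitemap:` lines already present, the other appends the line only if it is missing.

```python
def find_sitemaps(robots_txt):
    """Return the URLs declared in 'Sitemap:' lines of a robots.txt text."""
    found = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons are kept.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a 'Sitemap:' line unless the same URL is already declared."""
    if sitemap_url in find_sitemaps(robots_txt):
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"
```

Running `add_sitemap_line` twice with the same URL leaves the file unchanged the second time, so it is safe to call after every sitemap regeneration.
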
  14. sitemap4rdf
      • Simple command line tool
      • Sends a SPARQL query to list all URIs
      • Generates sitemap

          sitemap4rdf http://yoursite/sparql http://yoursite/resource/

        Example:

          sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/

      • Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
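
The slides do not show the query that sitemap4rdf sends, so the following Python sketch is only a guess at what "a SPARQL query to list all URIs" could look like. The query text, the use of the SPARQL 1.1 function `STRSTARTS`, and all function names are assumptions. It builds the query for a given URL prefix, builds the GET request URL for the endpoint, and extracts the URIs from a response in the SPARQL JSON results format. No network request is made.

```python
import json
from urllib.parse import urlencode


def build_query(prefix):
    """Return a query listing distinct subject URIs that start with `prefix`.

    Assumed query shape; the tool's real query may differ.
    """
    # Escape backslashes and double quotes for the SPARQL string literal.
    literal = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE { "
        "?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{literal}")) '
        "}"
    )


def request_url(endpoint, query):
    """Build the GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def uris_from_results(results_json):
    """Extract URI values bound to ?s from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding["s"]["value"]
        for binding in data["results"]["bindings"]
        if binding["s"]["type"] == "uri"
    ]
```

The resulting list of URIs is what would be written into the `<loc>` elements of the sitemap. For the example on the slide, the inputs would be `http://geo.linkeddata.es/sparql` as the endpoint and `http://geo.linkeddata.es/` as the prefix.
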
  15. Submit the sitemap location - Sindice
      • http://sindice.com/main/submit
  16. Submit the sitemap location - Google
      • https://www.google.com/webmasters/tools/
  18. Summary
      • Sitemap protocol informs search engines about available pages
        • Supported by Sindice!
      • sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
        • Open source, Java
        • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
        • http://mccarthy.dia.fi.upm.es/sitemap4rdf/
        • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf
  20. Future Work
      • Integrate sitemap4rdf with Pubby
      • Generate voiD file automatically from a SPARQL endpoint
      • Generate an entry in CKAN (registry of open knowledge packages) automatically through the CKAN API
        • http://ckan.net/package/geolinkeddata
      • Interact with prefix.cc (service for remembering and looking up URI prefixes) through its API
        • geoes: <http://geo.linkeddata.es/ontology>
  21. Future Work
      • Support the semantic sitemap extension (once it is compatible with Google)
        • http://sw.deri.org/2007/07/sitemapextension/