How to get your data into Sindice and Google with sitemap4rdf
Published in Technology
Transcript

  • 1. How to get your data into Sindice and Google with sitemap4rdf
    Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
  • 2. Publishing Linked Data from a triple store
  • 3. Linked Data frontends for triple stores
    Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  • 4. Search engines
  • 5. Sindice: the best RDF search engine
  • 6. Sindice: the best RDF search engine
    120M+ documents
    Continuously updating since 2006
    Low-latency search API
    RDF/XML, Turtle, RDFa, microformats
  • 7. The Sitemap protocol
  • 8. Sitemap Protocol
    Used by web crawlers
    Efficiently find all your content & discover what has been updated
    http://sitemaps.org/
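A crawler reading a sitemap mostly needs <loc> and, when present, <lastmod>. Below is a minimal Python sketch (standard library only) of how a crawler might pick out the pages updated since its last visit. The function name changed_since is hypothetical. The sketch assumes every <lastmod> value begins with a full YYYY-MM-DD date, and it treats URLs without <lastmod> as changed. That second choice belongs to the sketch and is not a rule of the protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemap protocol (schema 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(sitemap_xml: str, since: date) -> list[str]:
    """Return the <loc> of every <url> whose <lastmod> is on or after `since`.

    Assumption: <lastmod> starts with a full YYYY-MM-DD date; only that
    date part is compared. URLs with no <lastmod> are always returned,
    because the crawler cannot tell whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    result = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if lastmod is None or date.fromisoformat(lastmod[:10]) >= since:
            result.append(loc.strip())
    return result
```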
  • 9. Sitemap Protocol: Simple example
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://yoursite/</loc>
      </url>
      <url>
        <loc>http://yoursite/products/53546</loc>
      </url>
      <url>
        <loc>http://yoursite/products/98421</loc>
      </url>
      <url>
        <loc>http://yoursite/products/41003</loc>
      </url>
    </urlset>
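Generating a file like this takes only a few lines of code. Below is a minimal sketch that uses Python's xml.etree.ElementTree. The name build_sitemap is hypothetical. Building the XML with a library, rather than by concatenating strings, means characters such as & inside URLs get escaped as the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> bytes:
    """Serialize page URLs as a minimal Sitemap <urlset> (UTF-8 bytes)."""
    # Unqualified element names plus a literal xmlns attribute give the
    # same default-namespace layout as the example above.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content for us.
        ET.SubElement(url_el, "loc").text = u
    # xml_declaration requires Python 3.8+.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```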
  • 10. Sitemap Protocol: Optional parts
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://yoursite/</loc>
        <lastmod>2010-06-24</lastmod>
        <changefreq>daily</changefreq>
      </url>
    </urlset>
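The sketch below shows one way to emit the optional <lastmod> and <changefreq> elements, again using xml.etree.ElementTree. The function name and the (loc, lastmod, changefreq) tuple layout are assumptions made for illustration. The set of allowed <changefreq> values is the one the Sitemap protocol defines, and the protocol describes <changefreq> as a hint rather than a command.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values allowed for <changefreq> by the Sitemap protocol.
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap_with_hints(
    entries: Iterable[Tuple[str, Optional[date], Optional[str]]],
) -> bytes:
    """Build a <urlset> from (loc, lastmod, changefreq) tuples; None omits a field."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if lastmod is not None:
            # A plain date such as 2010-06-24 is a valid W3C Datetime value.
            ET.SubElement(url_el, "lastmod").text = lastmod.isoformat()
        if changefreq is not None:
            if changefreq not in CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {changefreq!r}")
            ET.SubElement(url_el, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```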
  • 11. Sitemap Protocol: Huge sitemaps
    Gzip-compress your sitemap
    Limit per sitemap file: 50,000 URLs or 10 MB
    For more URLs, split them into multiple sitemap files
    and add a sitemap index file that lists them
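Here is a Python sketch of the split-and-index approach. The file names (sitemap-1.xml.gz, sitemap-2.xml.gz and so on), the function write_sitemaps, and its parameters are all hypothetical. The sketch enforces only the URL-count limit from the slide; a real generator would also check each file against the size limit. The index is written as sitemap.xml so that a single robots.txt entry can point at it.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit quoted on the slide


def _urlset_bytes(urls: list[str]) -> bytes:
    """Serialize one chunk of URLs as a <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def write_sitemaps(urls: list[str], out_dir: str, base_url: str,
                   max_urls: int = MAX_URLS) -> Path:
    """Write gzip-compressed sitemap files of at most max_urls URLs each,
    plus an uncompressed sitemap index listing them. Returns the index path.
    Only the URL count is checked here, not the file-size limit."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap-{n}.xml.gz"
        chunk = urls[start:start + max_urls]
        (out / name).write_bytes(gzip.compress(_urlset_bytes(chunk)))
        # Each <sitemap> entry in the index points at one chunk file.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    index_path = out / "sitemap.xml"
    index_path.write_bytes(ET.tostring(index, encoding="UTF-8", xml_declaration=True))
    return index_path
```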
  • 12. Sitemap Protocol: Discovery
    Publish the sitemap file
    Add this line to http://yoursite/robots.txt:
    Sitemap: http://yoursite/sitemap.xml
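You can confirm that the line is picked up by reading the Sitemap directive back out of a robots.txt body with Python's standard urllib.robotparser (version 3.8 or later). The helper name sitemaps_from_robots is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemaps.
    return parser.site_maps() or []
```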
  • 13. sitemap4rdf
    Generate Sitemap files from a SPARQL endpoint
  • 14. sitemap4rdf
    Simple command line tool
    Sends a SPARQL query to list all URIs
    Generates sitemap
    sitemap4rdf http://yoursite/sparql http://yoursite/resource/
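The slides do not show the query sitemap4rdf actually sends, so the Python sketch below only illustrates the general idea, under several assumptions. It assumes that the endpoint speaks SPARQL 1.1 (needed for STRSTARTS), that it accepts queries over HTTP GET, and that it returns the SPARQL JSON results format. It also assumes that the resources to list are the IRIs that appear as subjects and start with the given prefix. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def list_uris_query(prefix: str) -> str:
    """Hypothetical query: every distinct subject IRI starting with `prefix`."""
    # Escape the prefix for use inside a double-quoted SPARQL string literal.
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{escaped}")) }}'
    )


def uris_from_results(results_json: str) -> list[str]:
    """Extract the ?s bindings from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [b["s"]["value"] for b in data["results"]["bindings"]]


def fetch_uris(endpoint: str, prefix: str) -> list[str]:
    """Send the query via HTTP GET (assumes `endpoint` has no query string)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": list_uris_query(prefix)})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return uris_from_results(resp.read().decode("utf-8"))
```

For example, fetch_uris("http://yoursite/sparql", "http://yoursite/resource/") would return the URI list for the two arguments used on the slide, and that list can then be written out as a sitemap. Many endpoints cap the number of rows per response, so a real tool may need to page through the results with LIMIT and OFFSET.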
  • 15. Submit the sitemap location - Sindice
    http://sindice.com/main/submit
  • 16. Submit the sitemap location - Google
    https://www.google.com/webmasters/tools/
  • 17. Summary
    Sitemap protocol informs search engines about available pages
    Supported by Sindice!
    sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
    Open source, Java
    http://lab.linkeddata.deri.ie/2010/sitemap4rdf/