How to get your data into Sindice and Google with sitemap4rdf
Usage Rights

CC Attribution-ShareAlike License
Presentation Transcript

  • How to get your data into Sindice and Google with sitemap4rdf
    Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
  • Publishing Linked Data
    from a triple store
  • Linked Data frontends for triple stores
    Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
  • Search engines
  • Sindice: the best RDF search engine
    120M+ documents
    Continuously updating since 2006
    Low-latency search API
    RDF/XML, Turtle, RDFa, microformats
  • The Sitemap protocol
  • Sitemap Protocol
    Used by web crawlers
    Efficiently find all your content & discover what has been updated
    http://sitemaps.org/
  • Sitemap Protocol: Simple example
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
    <loc>http://yoursite/</loc>
    </url>
    <url>
    <loc>http://yoursite/products/53546</loc>
    </url>
    <url>
    <loc>http://yoursite/products/98421</loc>
    </url>
    <url>
    <loc>http://yoursite/products/41003</loc>
    </url>
    </urlset>
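To illustrate how a crawler might read such a file, here is a minimal Python sketch that extracts every `<loc>` value from a sitemap. The function name `list_locations` and the shortened sample document are assumptions made for this example; they are not part of the Sitemap protocol or of any particular crawler.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (taken from the slide).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def list_locations(sitemap_xml: bytes) -> list[str]:
    """Return the text of every <loc> element found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text]

# Shortened version of the slide's example, used only for illustration.
EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yoursite/</loc></url>
  <url><loc>http://yoursite/products/53546</loc></url>
</urlset>"""

if __name__ == "__main__":
    for location in list_locations(EXAMPLE):
        print(location)
```

A real crawler would fetch the file over HTTP first; the sketch starts from the bytes to stay self-contained.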
  • Sitemap Protocol: Optional parts
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
    </url>
    </urlset>
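A sitemap like the one above can also be produced programmatically. The following Python sketch is one possible way to do it with the standard library; `build_sitemap` and the dictionary keys it reads are hypothetical names chosen for this example, and the optional `lastmod` and `changefreq` elements are written only when a value is supplied.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries) -> bytes:
    """Serialize entries to sitemap XML.

    Each entry is a dict with a required 'loc' key and optional
    'lastmod' and 'changefreq' keys (names assumed for this sketch).
    """
    # Emit the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq"):
            if entry.get(tag):
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {"loc": "http://yoursite/", "lastmod": "2010-06-24", "changefreq": "daily"},
        {"loc": "http://yoursite/products/53546"},
    ])
    print(xml_bytes.decode("utf-8"))
```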
  • Sitemap Protocol: Huge sitemaps
    Gzip-compress your sitemap
    Limit: 50k URLs or 10MB per sitemap file
    Above the limit: split into multiple sitemap files
    and add a sitemap index file
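The splitting step can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the file names (`sitemap1.xml.gz`, `sitemap_index.xml`), the helper names, and the `base_url` parameter are all hypothetical, and only the URL-count limit is enforced (the byte-size limit is not checked).

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit quoted on the slide

def _serialize(root_tag, child_tag, locations) -> bytes:
    """Build <root_tag><child_tag><loc>...</loc></child_tag>...</root_tag>."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}{root_tag}")
    for location in locations:
        child = ET.SubElement(root, f"{{{SITEMAP_NS}}}{child_tag}")
        ET.SubElement(child, f"{{{SITEMAP_NS}}}loc").text = location
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)

def build_files(urls, base_url, size=MAX_URLS) -> dict:
    """Return {filename: bytes}: gzip-compressed sitemap parts plus an index."""
    files = {}
    for number, start in enumerate(range(0, len(urls), size), start=1):
        part = urls[start:start + size]
        files[f"sitemap{number}.xml.gz"] = gzip.compress(_serialize("urlset", "url", part))
    # The index lists the public location of every part file.
    part_locations = [base_url + name for name in files]
    files["sitemap_index.xml"] = _serialize("sitemapindex", "sitemap", part_locations)
    return files

if __name__ == "__main__":
    demo_urls = [f"http://yoursite/resource/{i}" for i in range(5)]
    for name, content in build_files(demo_urls, "http://yoursite/", size=2).items():
        print(name, len(content), "bytes")
```

The small `size=2` in the demo is only there to force a split; in practice the default limit would apply.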
  • Sitemap Protocol: Discovery
    Publish the sitemap file
    Add a line to http://yoursite/robots.txt
    Sitemap: http://yoursite/sitemap.xml
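From the crawler's side, discovery amounts to scanning robots.txt for `Sitemap:` lines. The Python sketch below shows one simple way this could be done; `sitemaps_from_robots` is a hypothetical helper written for this example, and the sample robots.txt content other than the `Sitemap:` line is invented for illustration.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")   # split at the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found

# Sample file: only the Sitemap line comes from the slide.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Sitemap: http://yoursite/sitemap.xml
"""

if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```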
  • sitemap4rdf
    Generate Sitemap files from a SPARQL endpoint
  • sitemap4rdf
    Simple command line tool
    Sends a SPARQL query to list all URIs
    Generates sitemap
    sitemap4rdf http://yoursite/sparql http://yoursite/resource/
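The idea of listing all URIs through SPARQL can be sketched in Python as follows. This is not sitemap4rdf's actual query or code, which the slides do not show: the query text, the helper names, and the use of the SPARQL 1.1 `STRSTARTS` function are assumptions for illustration. The sketch builds the request URL and reads a SPARQL JSON result, but performs no network call.

```python
import json
from urllib.parse import urlencode

def build_query(prefix: str) -> str:
    """SPARQL query selecting distinct subject IRIs that start with prefix.

    Assumes prefix contains no double quotes (no escaping is done here).
    """
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }}'
    )

def request_url(endpoint: str, prefix: str) -> str:
    """GET URL for the query, using the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(prefix)})

def uris_from_results(results_json: str) -> list[str]:
    """Extract the ?s values from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [binding["s"]["value"] for binding in data["results"]["bindings"]]

if __name__ == "__main__":
    print(request_url("http://yoursite/sparql", "http://yoursite/resource/"))
```

The resulting list of URIs is what would then be written out as `<loc>` entries in the generated sitemap.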
  • Submit the sitemap location - Sindice
    http://sindice.com/main/submit
  • Submit the sitemap location - Google
    https://www.google.com/webmasters/tools/
  • Summary
    Sitemap protocol informs search engines about available pages
    Supported by Sindice!
    sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
    Open source, Java
    http://lab.linkeddata.deri.ie/2010/sitemap4rdf/