Publishing Linked Data using Schema.org
Development and management of e-Repositories – OTA IODE, Oostende, Belgium, April 11th, 2013
An introduction to the project of Mr. Aditya Kakodkar
by Christophe.Dupriez@destin-informatique.com
Linked Data: Why?
● Use and reuse of external and internal (reference) data
● (Meta)data encoded and published according to standardized, perennial and documented measurement systems and categories
● Massive international efforts to develop tools and interlinked repositories
● An opportunity to become a general reference on the Web for a specific domain
● Your work becomes discoverable and well ranked by search engines
Data to be linked?
● Metadata provides the context and links the data to a MODEL
● Observed Data: source, measure/range, unit...
● Manually entered Data: validation rules
● Aggregated Data: which indicator for which decision?
● Published Data: exact? complete? perennial?
● Reference Data: comparable with other data?
● Open Data is (not) Public Data! http://opendatacommons.org
● Personal Data: protection? anonymisation?
● Big Data: dangers? opportunities?
Linking Data in order to...
● Denote a “real-life” object, a concept, a transaction... (often not uniquely enough: see sameAs.org)
● Document (explain, contextualize) the data for the user (HTML document page)
● Enrich the data by linking it to other data (RDF data page)
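Because the same real-life thing often gets several URIs, owl:sameAs statements are used to declare them equivalent, and services such as sameAs.org collect those equivalences. Since sameAs is symmetric and transitive, the URIs fall into equivalence classes. The sketch below computes those classes with a union-find structure; all URIs except the DBpedia one are hypothetical examples.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_groups(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by sameAs links into equivalence classes (union-find)."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups: Dict[str, Set[str]] = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical sameAs statements (only the dbpedia.org URI comes from this talk).
links = [
    ("http://dbpedia.org/resource/European_Herring_Gull", "http://example.org/taxa/larus-argentatus"),
    ("http://example.org/taxa/larus-argentatus", "http://example.net/birds/herring-gull"),
    ("http://example.org/a", "http://example.org/b"),
]
for group in same_as_groups(links):
    print(sorted(group))
```

The first two statements chain three URIs into one class even though no single statement links the DBpedia URI to the example.net one.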
RDF: Resource Description Framework
● A standard to provide (meta)data on the Web
● Based on a very simple model of triples: subject – property – object
● Subject and property are URIs; the object is a URI or a “constant value” (a literal: a text, a number, a date...), optionally suffixed by a language tag (for text) or a datatype
● Example: dbpedia:European_Herring_Gull rdfs:label "Goéland argenté"@fr
where “dbpedia:” stands for the URI prefix http://dbpedia.org/resource/ and “rdfs:” stands for the URI prefix http://www.w3.org/2000/01/rdf-schema#
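To make the prefix mechanism concrete, here is a minimal sketch that expands the prefixed names of the example and writes the triple as one N-Triples line (full URIs in angle brackets, a final " ."). It is hand-rolled for illustration, assuming only the two prefixes defined above; a real application would use an RDF library.

```python
from typing import Optional

# Only the two prefixes defined in the example above.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def literal(text: str, lang: Optional[str] = None, datatype: Optional[str] = None) -> str:
    """Quote a constant value, adding a language tag or a datatype URI if given."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def ntriple(subject: str, prop: str, obj: str) -> str:
    """Serialize subject - property - object as one N-Triples statement."""
    return f"<{expand(subject)}> <{expand(prop)}> {obj} ."


line = ntriple("dbpedia:European_Herring_Gull", "rdfs:label",
               literal("Goéland argenté", lang="fr"))
# line == '<http://dbpedia.org/resource/European_Herring_Gull> '
#         '<http://www.w3.org/2000/01/rdf-schema#label> "Goéland argenté"@fr .'
```

The non-ASCII characters are kept as-is, so the output should be written as UTF-8.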
Being a Gull is not Dull!
● http://en.wikipedia.org/wiki/European_Herring_Gull
● http://dbpedia.org/resource/European_Herring_Gull redirects to the document for human consumption (HTML): http://dbpedia.org/page/European_Herring_Gull
● The data for machine consumption is generated separately in different formats (N3, Turtle, XML, JSON...): http://dbpedia.org/data/European_Herring_Gull.n3
● The browser negotiates the suitable format (HTTP content negotiation)...
● What is validated there? What are the rules?
● Can it serve as a reference for decision-making?
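The redirect-plus-negotiation behaviour can be sketched from the server's side: read the client's Accept header and choose where to send it with a "303 See Other" response, following the common Linked Data pattern. This is a toy sketch, not DBpedia's actual logic; the media-type-to-extension mapping is a hypothetical assumption (only the .n3 URL appears above).

```python
from typing import List, Tuple

RESOURCE_BASE = "http://dbpedia.org/resource/"
PAGE_BASE = "http://dbpedia.org/page/"   # HTML for humans
DATA_BASE = "http://dbpedia.org/data/"   # RDF for machines

# Hypothetical mapping from RDF media types to data-file extensions.
RDF_EXTENSIONS = {
    "text/n3": ".n3",
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)  # stable for ties


def see_other(resource_uri: str, accept: str) -> str:
    """Return the Location a server could send with '303 See Other'."""
    if not resource_uri.startswith(RESOURCE_BASE):
        raise ValueError("not a resource URI")
    name = resource_uri[len(RESOURCE_BASE):]
    for media, q in parse_accept(accept):
        if q <= 0:
            continue
        if media in RDF_EXTENSIONS:
            return DATA_BASE + name + RDF_EXTENSIONS[media]
        if media in HTML_TYPES:
            return PAGE_BASE + name
    return PAGE_BASE + name  # default: the human-readable page


gull = RESOURCE_BASE + "European_Herring_Gull"
print(see_other(gull, "text/html,application/xhtml+xml,*/*;q=0.8"))  # .../page/...
print(see_other(gull, "text/n3"))  # .../data/European_Herring_Gull.n3
```

A browser asks for HTML first and lands on the page; an RDF client asking for N3 lands on the data file.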
Using a single page?
● RDFa and Microdata are two standards to MERGE an HTML document (made for humans) with the data a machine may wish to extract from it
● Example from a page in OceanExpert.net:
<h1>Details of<span itemprop="name"> <span itemprop="familyName">Dupriez</span> , <span itemprop="givenName">Christophe </span> </span></h1>
● ANY23.org, open-source software that extracts data embedded in Web pages, will be demonstrated later on OceanExpert.net...
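To show how a machine reads such markup, here is a deliberately simplified sketch, using only Python's standard html.parser, that collects the text of every element carrying an itemprop attribute. It is not how ANY23 works: it ignores itemscope, itemtype and itemref, reads no attribute values (href, src, content), and assumes properly nested markup.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ItempropTextExtractor(HTMLParser):
    """Collects (itemprop, text) pairs from Microdata-annotated HTML."""

    def __init__(self) -> None:
        super().__init__()
        # One entry per open element: (itemprop name, text fragments) or None.
        self.stack: List[Optional[Tuple[str, List[str]]]] = []
        self.properties: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        name = dict(attrs).get("itemprop")
        self.stack.append((name, []) if name else None)

    def handle_data(self, data):
        # Text belongs to every enclosing itemprop element (nested properties).
        for entry in self.stack:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        entry = self.stack.pop()
        if entry is not None:
            name, fragments = entry
            # Collapse whitespace runs into single spaces.
            self.properties.append((name, " ".join("".join(fragments).split())))


def extract_itemprops(html: str) -> List[Tuple[str, str]]:
    parser = ItempropTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


snippet = ('<h1>Details of<span itemprop="name"> <span itemprop="familyName">'
           'Dupriez</span> , <span itemprop="givenName">Christophe </span> '
           '</span></h1>')
for prop, value in extract_itemprops(snippet):
    print(f"{prop}: {value}")
# familyName: Dupriez
# givenName: Christophe
# name: Dupriez , Christophe
```

Properties are emitted in the order their elements close, which is why the nested familyName and givenName come before name.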
Data Model
● Which processes do we need to automate? (use cases)
● Which entities (real objects, concepts, transactions/events) have to be represented?
● How do those entities interrelate?
● What measurements (properties) are recorded for each type of entity?
● Reuse: who else will align on the same model? What may Google do with my data?
Schema.org
● Schema.org is a modelling initiative of Google / Microsoft / Yahoo to standardize URIs for RDF types and properties
● A common model for data published in documents harvestable on the Web
● Their goal is to collect the data in our pages, which are then better indexed. What else? (A.I.?)
● Schema.org models are far from exhaustive (for instance, insufficient for CVs), but a “/extension” mechanism exists
● Examples on the site http://schema.org
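Publishing schema.org data in a page amounts to wrapping the values in an element that declares the type (itemscope, itemtype) and labelling each value with a property name (itemprop). A minimal sketch, assuming a hypothetical helper microdata() and the schema.org Person type with its givenName and familyName properties:

```python
from html import escape
from typing import Dict


def microdata(schema_type: str, properties: Dict[str, str]) -> str:
    """Render one schema.org item as Microdata markup (hypothetical helper)."""
    lines = [f'<div itemscope itemtype="http://schema.org/{escape(schema_type)}">']
    for prop, value in properties.items():
        # Escape values so that characters such as & or < stay valid HTML.
        lines.append(f'  <span itemprop="{escape(prop)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(microdata("Person", {"givenName": "Christophe", "familyName": "Dupriez"}))
# <div itemscope itemtype="http://schema.org/Person">
#   <span itemprop="givenName">Christophe</span>
#   <span itemprop="familyName">Dupriez</span>
# </div>
```

In a real page the same attributes would usually be added to the existing visible elements, as in the OceanExpert.net heading, rather than to a separate block.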
Google RichSnippets
● The Google spider extracts data tagged using RDFa or Microdata
● Pages with such data are promoted...
● Google enriches its search results using this data
● Example “Apollo Theatre”: place, events, reviews...
● The Google RichSnippets tool validates a Web page: http://www.google.com/webmasters/tools/richsnippets
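A validator of this kind runs many checks; the sketch below shows one elementary one, not a reproduction of Google's rules. In Microdata, a property is normally attached to an item through an enclosing itemscope element (or through itemref, which this sketch ignores), so an itemprop with no enclosing itemscope belongs to no item. The OceanExpert.net excerpt quoted earlier shows no itemscope; presumably it sits on an enclosing element of the full page.

```python
from html.parser import HTMLParser
from typing import List

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OrphanItempropChecker(HTMLParser):
    """Flags itemprop attributes that have no enclosing itemscope element."""

    def __init__(self) -> None:
        super().__init__()
        self.scope_stack: List[bool] = []  # True if the element is inside an item
        self.orphans: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        inside_item = self.scope_stack[-1] if self.scope_stack else False
        # An element's own itemprop belongs to the scope of its ancestors.
        if "itemprop" in attributes and not inside_item:
            self.orphans.append(attributes["itemprop"] or "")
        if tag in VOID_TAGS:
            return
        self.scope_stack.append(inside_item or "itemscope" in attributes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.scope_stack:
            self.scope_stack.pop()


def find_orphan_itemprops(html: str) -> List[str]:
    checker = OrphanItempropChecker()
    checker.feed(html)
    checker.close()
    return checker.orphans


fragment = ('<h1>Details of<span itemprop="name"> <span itemprop="familyName">'
            'Dupriez</span> , <span itemprop="givenName">Christophe </span> '
            '</span></h1>')
print(find_orphan_itemprops(fragment))  # ['name', 'familyName', 'givenName']
wrapped = '<div itemscope itemtype="http://schema.org/Person">' + fragment + "</div>"
print(find_orphan_itemprops(wrapped))   # []
```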
Data Search Engine
● ANY23 is used to feed SINDICE, a search engine for RDF data
● Example: http://www.sindice.com/search?q=apollo+theatre