this is the story of
making some open
linked* data


*disclaimer: it’s not very linked yet
we had a little project in York ...
to expose ‘The London Art World 1660-1735’
dataset - several years of history of art
research trawling primary and secondary
sources of information about art sales,
people, places and artworks, all contained in
spreadsheets
first we put the data on
the web

artworld.york.ac.uk
simple database-driven web site

this is about an art sale
ok, so it’s on the web, it has some
 links, it’s open, right? can I go
now? ... not so fast, I’m not done
                 yet
how does a machine know
that this is about an art
sale?

and these are links to info about
people and places?

and how can someone get
at this info and do
interesting things with it?

like enrich it with
information from
elsewhere?
linked open data
 ... describing real-world things and
the relationships between them in a
        machine-readable way
in walks RDF:
Resource (identifying resources on the
web) Description (and describing them)
  Framework (with a model based on
         triples and graphs)
RDF - all about triples

SUBJECT   PREDICATE (aka relationship)   OBJECT

     <someArtist> <occupied> <somePlace>
     <someArtist> <painted> <somePainting>
     <somePainting> <soldIn> <someSale>
     <someSale> <happenedIn> <somePlace>
     <someCatalogue> <describes> <someSale>
     <someSaleItem> <soldFor> <somePrice>
     <someBuyer> <purchased> <someSaleItem>

all of these will be uris

this is not the rdf you are looking for
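
to make the triple idea a bit more concrete, here is a minimal python sketch of the placeholder statements on this slide. it is an illustration only, not how the project stored its data: plain strings stand in for the uris, and the Triple class and match function are names made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate (the relationship), object."""
    subject: str
    predicate: str
    obj: str


# the placeholder statements from the slide; in real RDF each would be a uri
GRAPH = [
    Triple("someArtist", "occupied", "somePlace"),
    Triple("someArtist", "painted", "somePainting"),
    Triple("somePainting", "soldIn", "someSale"),
    Triple("someSale", "happenedIn", "somePlace"),
    Triple("someCatalogue", "describes", "someSale"),
    Triple("someSaleItem", "soldFor", "somePrice"),
    Triple("someBuyer", "purchased", "someSaleItem"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# everything we say about the artist
about_artist = match(GRAPH, subject="someArtist")
# everything that points at the sale
into_sale = match(GRAPH, obj="someSale")
```

because the same thing can be the object of one triple and the subject of another, the list of triples behaves as a graph that can be walked from node to node.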
an ontology is a standardized representation
of knowledge as a set of concepts within a
domain, and the relationships between those
concepts. It can be used to reason about the
entities within that domain, and may be used
to describe the domain (wikipedia)

or, put another way “a standard way of describing stuff
for a given domain” (me)
we should either use terms from
existing ontologies or create and
publish our terms using standard
           approaches
we created an event-driven ontology based
 on DUL (DOLCE Ultra Lite) and LODE
         (Linked Open Events)

                   why?

   because we wanted to create rich and
specific data but ensure our data could still
be understood in a generic and low barrier
                    way
dlib.york.ac.uk/ontologies
linking means making
connections between our
    data and others
we linked our people to viaf and some of
       our places to geonames ...

  <ourPerson> <sameas> <viafPerson>

 <ourPlace> <sameas> <geonamesPlace>

... a data consumer can start following this
             network of links
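
as a rough sketch of what "following this network of links" could look like for a data consumer, the python below walks a handful of sameas statements. every identifier in it is made up for illustration (they are not real viaf or geonames ids), and treating the links as two-way is an assumption of this example.

```python
from collections import deque

# hypothetical identifiers, invented for this sketch
SAME_AS = [
    ("our:person/1", "viaf:person/A"),
    ("our:place/2", "geonames:place/B"),
    ("viaf:person/A", "elsewhere:person/X"),  # a link published by someone else
]


def equivalents(start, links):
    """Collect every identifier reachable from start via sameas links."""
    # build a two-way lookup so links can be followed in either direction
    neighbours = {}
    for left, right in links:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)

    # breadth-first walk over the link network
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen - {start}


person_links = equivalents("our:person/1", SAME_AS)
place_links = equivalents("our:place/2", SAME_AS)
```

the point of the sketch: one link from our person to viaf is enough for a consumer to reach whatever else has been linked to that same viaf record.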
making data
image: http://www.flickr.com/photos/kikishua/5451503709/
spreadsheet cleanup with scripting, a database
               and some Google refine action*




                * google refine is very useful for dealing with
                      messy spreadsheets + has an rdf plugin
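
the kind of scripted cleanup meant here might look something like the python sketch below. it is not the project's actual script: the sample rows and column names are invented, and the only cleanup steps shown are trimming stray whitespace and dropping rows that differ only by case.

```python
import csv
import io

# made-up sample rows; the column names are assumptions for this sketch
RAW = """place_name , city
 The Green Doors ,London
the green doors,  london
Little Piazza,London
"""


def clean_rows(text):
    """Trim whitespace in headers and cells, then drop near-duplicate rows."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # collapse runs of whitespace and strip the ends of every header and cell
        tidy = {key.strip(): " ".join(value.split()) for key, value in row.items()}
        # compare case-insensitively so rows differing only by case collapse
        fingerprint = tuple(value.lower() for value in tidy.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(tidy)
    return cleaned


rows = clean_rows(RAW)
```

the first spelling of each row wins, so "The Green Doors" is kept and the lowercase repeat is dropped.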
a turtle* document for each of our 38,000
                        primary ‘entities’

                                                 sale
                                              person
                                                place
                                             artwork
                                               source

 stored in dlib.york.ac.uk and indexed in
   sindice.com** semantic search engine
                           * a format for creating rdf data
            ** try a search for sale domain:dlib.york.ac.uk
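
a minimal sketch of producing one small turtle document per entity, using only the python standard library. this is not the project's transform: the ex: namespace, the example.org uri and the to_turtle function are all hypothetical, and only plain string values are handled.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/vocab#"  # hypothetical namespace, not the project's


def escape(text):
    """Escape backslashes, double quotes and newlines for a turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(uri, label, properties):
    """Build a small turtle document describing one entity."""
    lines = [
        f"@prefix rdfs: <{RDFS}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"<{uri}>",
        f'    rdfs:label "{escape(label)}"',
    ]
    for name, value in properties.items():
        lines[-1] += " ;"  # the previous statement continues with the same subject
        lines.append(f'    ex:{name} "{escape(value)}"')
    lines[-1] += " ."  # the last statement closes the description
    return "\n".join(lines) + "\n"


doc = to_turtle(
    "http://example.org/id/place/1",  # made-up uri for illustration
    "The Green Doors",
    {"hasCity": "London"},
)
```

running something like this once per sale, person, place, artwork and source would give one document per entity, which is the shape described on this slide.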
foaf:primaryTopic <http://dlib.york.ac.uk/id/place/34867>;
        rdf:type foaf:Document, dctype:Text .

<http://dlib.york.ac.uk/data/place/34867/turtle>
        void:inDataset <http://dlib.york.ac.uk/data/void.ttl#OpenART>;
        rdf:type foaf:Document, dctype:Text .

<http://dlib.york.ac.uk/data/place/34867/rdf>
        void:inDataset <http://dlib.york.ac.uk/data/void.ttl#OpenART>;
        rdf:type foaf:Document, dctype:Text .

# SUBJECT
<http://dlib.york.ac.uk/id/place/34867>
        mapping:hasResearchID "3.0548"^^xsd:string;
        rdfs:label "The Green Doors in the Little Piazza, Covent Garden; sale venue";
        vocupper:hasPlaceName "The Green Doors in the Little Piazza, Covent Garden";
        vocupper:hasBuildingName "The Green Doors";
        vocupper:hasStreetName "Little Piazza";
        vocupper:hasCity "London";
        vocupper:hasCounty "Greater London";
        vocupper:hasCountry "England";
        vochoa:hasContributorOfSource "Richard Stephens";

        # PREDICATE then OBJECT
        oactxt:venueOfSale <http://dlib.york.ac.uk/id/sale/34948>;
        oactxt:venueOfSale <http://dlib.york.ac.uk/id/sale/34949>;
        oactxt:venueOfSale <http://dlib.york.ac.uk/id/sale/34950>;
        oactxt:venueOfSale <http://dlib.york.ac.uk/id/sale/34951>;
        oactxt:venueOfSale <http://dlib.york.ac.uk/id/sale/34952>;

        # LINKED
        vocupper:liesWithin [
                owl:sameAs <http://www.geonames.org/6269131/>;
                rdf:type model:Place, vocupper:Country, owl:NamedIndividual
        ];
        rdf:type model:Place, owl:NamedIndividual .
DISCLAIMER

     ours was one approach

 it is very experimental and is
   imperfect in various ways

it showed that we could do linked
   data with an existing system

      we want to do more
linked open data is a leap of
          faith -
 you have to expose data
before people can consume
           data
aim high -
if we all put out high quality
   rich data we can do high
   quality AND low barrier
         things with it
there are 77981 results for
    ‘York’ in geonames




       we had a little project in York ...
we had a little project in
http://www.geonames.org/2633352/
                  ...
credits

         Richard Stephens: data creator
                      Tate: data partners
               Martin Dow: ontology dev
          Stephen Bayliss: ontology dev
             Paul Young: data transform
              LOCAH project: inspiration
                  Jon Voss: lodlam guru
University of York: institutional support
                            JISC: funding

                                    @julieallinson
                       julie.allinson@york.ac.uk
                http://tinyurl.com/dlib-openart
                                #LODLAM #sxsw

Radically Open Cultural Heritage Data on the Web
