Linked data, the semantic web
and what it all means for publishers

Graham Bell
EDItEUR

7th March 2013
About me
• 20 years’ experience at the point where
  publishing and technology meet
• formerly senior manager in the IT department
  for HarperCollins UK
  • led development of bibliographic, editorial and
    digital asset management systems; involved in
    e-book, e-audio, print-on-demand and online
    projects
• joined EDItEUR in mid-2010 as Chief Data
  Architect, primarily responsible for
  EDItEUR’s standards development work
About EDItEUR
• not-for-profit membership organisation
• develops, supports and promotes metadata
  and identification standards for the book,
  e-book and serials supply chains
• acknowledged centre of expertise on
  standards and metadata for the industry
• based in London, but a global membership
  of publishers, distributors, wholesalers,
  subscription agents, retailers, libraries,
  system vendors, rights organisations and
  trade associations
About EDItEUR
• also provides management services to the
  International ISBN, ISTC and ISNI Agencies
• EDItEUR has three full-time staff, two FTE
  project staff, plus consultants from both the
  book and serials sectors
• we also work closely with other standards
  organisations, to ensure our standards meet
  the needs of their stakeholders too
• member participation is vital to ensure that
  standards keep pace with evolving business
  requirements
“An item of metadata is a relationship
that someone claims to exist between two entities.”
indecs, 2000
http://www.doi.org/factsheets/indecs_factsheet.html
Title: Roseanna
Author: Maj Sjöwall
ISBN: 978-0-00-723283-3
Pub date: 07-08-2006

Title: The man on the balcony
Author: Maj Sjöwall
ISBN: 978-0-00-724293-1
Pub date: 15-01-2007

Title: The fire engine that disappeared
Author: Maj Sjöwall
ISBN: 978-0-00-783533-1
Pub date: 07-08-2007
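Read through the indecs definition, each card is a bundle of claimed relationships between a book and other entities. The Python sketch below is only an illustration: it treats each card’s ISBN as the subject and uses the remaining field labels as predicates (an assumption, not a published vocabulary). Because all three records carry the same author value, indexing by object shows where the records could be linked; matching on a name string is a stand-in for matching on an identifier.

```python
# Sketch: decompose flat bibliographic "cards" into (subject, predicate, object)
# triples. Predicates are just the card's field labels (an assumption, not a
# standard vocabulary); the ISBN is used as each record's subject.
from collections import defaultdict

RECORDS = [
    {"Title": "Roseanna", "Author": "Maj Sjöwall",
     "ISBN": "978-0-00-723283-3", "Pub date": "07-08-2006"},
    {"Title": "The man on the balcony", "Author": "Maj Sjöwall",
     "ISBN": "978-0-00-724293-1", "Pub date": "15-01-2007"},
    {"Title": "The fire engine that disappeared", "Author": "Maj Sjöwall",
     "ISBN": "978-0-00-783533-1", "Pub date": "07-08-2007"},
]

def record_to_triples(record):
    """Each field other than the ISBN becomes one claimed relationship."""
    subject = record["ISBN"]
    return [(subject, field, value)
            for field, value in record.items() if field != "ISBN"]

triples = [t for r in RECORDS for t in record_to_triples(r)]

# Index subjects by author value: an object shared by several records is a
# potential link between them.
subjects_by_author = defaultdict(list)
for s, p, o in triples:
    if p == "Author":
        subjects_by_author[o].append(s)

print(len(triples))                       # 9 triples from 3 records
print(subjects_by_author["Maj Sjöwall"])  # all three ISBNs
```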
<Contributor>
  <SequenceNumber>1</SequenceNumber>
  <ContributorRole>A01</ContributorRole>
  <NameIdentifier>
    <NameIDType>16</NameIDType>
    <IDValue>0000000121479135</IDValue>
  </NameIdentifier>
  <PersonName>Maj Sjöwall</PersonName>
</Contributor>
http://isni.org/search
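The ONIX fragment above can be read with Python’s standard-library XML parser. This sketch pulls out the role, the name and any identifier whose NameIDType is 16 (ISNI in ONIX code list 44), then verifies the ISNI’s final check character with the ISO 7064 MOD 11-2 scheme that ISNI uses. The function isni_is_valid is a hypothetical helper, shown only to illustrate that an identifier can be sanity-checked before it is looked up.

```python
# Sketch: parse the ONIX <Contributor> fragment and check the ISNI's final
# character (ISO 7064 MOD 11-2; "X" stands for a check value of 10).
import xml.etree.ElementTree as ET

ONIX_FRAGMENT = """
<Contributor>
  <SequenceNumber>1</SequenceNumber>
  <ContributorRole>A01</ContributorRole>
  <NameIdentifier>
    <NameIDType>16</NameIDType>
    <IDValue>0000000121479135</IDValue>
  </NameIdentifier>
  <PersonName>Maj Sjöwall</PersonName>
</Contributor>
"""

def isni_is_valid(isni: str) -> bool:
    """Hypothetical helper: validate a 16-character ISNI's check character."""
    isni = isni.replace(" ", "").upper()
    if len(isni) != 16 or not (isni[:15].isascii() and isni[:15].isdigit()):
        return False
    total = 0
    for ch in isni[:15]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return isni[15] == expected

contributor = ET.fromstring(ONIX_FRAGMENT.strip())
role = contributor.findtext("ContributorRole")
name = contributor.findtext("PersonName")
for ident in contributor.findall("NameIdentifier"):
    if ident.findtext("NameIDType") == "16":  # 16 = ISNI in ONIX List 44
        isni = ident.findtext("IDValue")
        print(name, role, isni, isni_is_valid(isni))
```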
Contributor (0..n)
  SequenceNumber
  ContributorRole
  BiographicalNote
Name
  NameType
  NamesBeforeKey
  PrefixToKey
  KeyNames
  SuffixToKey
  NamesAfterKey
Name Identifier (0..n)
  NameIDType
  IDTypeName
  IDValue
subject --[predicate]--> object
book --[contributor role A01 from List 17]--> (contributor)
(contributor) --[ID type 16 from List 44]--> 0000000121479135
(contributor) --[name type 01 from List 18]--> Sjöwall, Maj
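One way to see this diagram in code: the nested contributor composite becomes three triples hanging off a node that has no identifier of its own. In this sketch the node gets a generated, document-local label (a blank node; the helper new_blank_node is hypothetical), and the predicate strings simply copy the diagram’s labels rather than any real vocabulary.

```python
# Sketch: flatten a nested contributor into triples around an unnamed node.
# Predicate strings mirror the diagram's labels and are illustrative only.
import itertools

_node_ids = itertools.count(1)

def new_blank_node() -> str:
    """Hypothetical helper: mint a document-local blank-node label."""
    return f"_:b{next(_node_ids)}"

def contributor_to_triples(book: str, contributor: dict) -> list:
    """Flatten one nested contributor into (subject, predicate, object) triples."""
    node = new_blank_node()
    triples = [(book, f"contributor role {contributor['role']} from List 17", node)]
    for id_type, value in contributor["identifiers"]:
        triples.append((node, f"ID type {id_type} from List 44", value))
    triples.append((node, f"name type {contributor['name_type']} from List 18",
                    contributor["name"]))
    return triples

TRIPLES = contributor_to_triples("book", {
    "role": "A01",
    "identifiers": [("16", "0000000121479135")],
    "name_type": "01",
    "name": "Sjöwall, Maj",
})
for triple in TRIPLES:
    print(triple)
```

The blank node is what makes the nesting survive: the record’s hierarchy becomes a path through the graph rather than a container.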
Linked data
• expresses metadata as a collection of triples
• uses URIs to represent relations and entities
• prefers persistent HTTP URIs so they can be
  ‘looked up’ to get further details
  •   the data can be ‘self-describing’
• is intended to be flexible and extensible,
  because it’s ‘schemaless’
• isn’t new
• is not intended for human consumption
Linked data
• Uniform Resource Identifier: two types…
  • Uniform Resource Name
    • urn:isbn:9780001234567
  • Uniform Resource Locator
    • http://dx.doi.org/10.978.000/1234567
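Both example identifiers are URIs, but they behave differently. A minimal Python sketch using urllib.parse: the URN carries a namespace (isbn) and a name but no host, while the URL names a host that an HTTP client could try to fetch. The classify and isbn13_is_valid helpers are hypothetical, and classify is a rough heuristic for these two cases only; the DOI URL is the slide’s illustration and may not resolve. Since the URN’s name is an ISBN-13, its check digit can be verified offline.

```python
# Sketch: distinguish a URN (a name) from an HTTP URL (a locator), and
# verify the ISBN-13 check digit carried inside the URN.
from urllib.parse import urlparse

def classify(uri: str) -> str:
    """Hypothetical heuristic: 'URN', 'URL' (http/https with a host) or 'other'."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        return "URN"
    if parts.scheme in ("http", "https") and parts.netloc:
        return "URL"
    return "other"

def isbn13_is_valid(isbn: str) -> bool:
    """Hypothetical helper: ISBN-13 check digit (weights 1 and 3 alternating)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])

for uri in ("urn:isbn:9780001234567", "http://dx.doi.org/10.978.000/1234567"):
    print(uri, classify(uri))

namespace, name = urlparse("urn:isbn:9780001234567").path.split(":", 1)
print(namespace, name, isbn13_is_valid(name))
```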
book --[ID type 15 from List 5]--> isbn:9780007232833
book --[contributor role A01 from List 17]--> (contributor)
(contributor) --[ID type 16 from List 44]--> 0000000121479135
(contributor) --[name type 01 from List 18]--> Sjöwall, Maj
http://harpercollins.co.uk/360366 --[http://ns.editeur.org/onix/codelists/17#A01]--> genid:A96
genid:A96 --[http://ns.editeur.org/onix/codelists/44#16]--> 0000000121479135
genid:A96 --[http://ns.editeur.org/onix/codelists/18#01]--> Sjöwall, Maj
http://harpercollins.co.uk/360366 --[http://ns.editeur.org/onix/codelists/5#15]--> urn:isbn:9780007232833
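These four URI-based triples can be written in a standard linked-data syntax. This sketch emits RDF 1.1 N-Triples using only the Python standard library: genid:A96 becomes the blank node _:A96, the ISNI and the name become plain literals, and the ns.editeur.org URIs are copied from the slide (with the “codelists” path used throughout) as illustrations, not confirmed published namespaces. The class and function names are hypothetical; a production system would more likely use an RDF library than a hand-written serialiser.

```python
# Sketch: a tiny RDF 1.1 N-Triples serialiser for the slide's four triples.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

Term = Union[IRI, BNode, Literal]

def to_nt(term: Term) -> str:
    """Render one term in N-Triples syntax, escaping literal specials."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if isinstance(term, BNode):
        return f"_:{term.label}"
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

ONIX = "http://ns.editeur.org/onix/codelists/"  # illustrative, from the slide
BOOK = IRI("http://harpercollins.co.uk/360366")
CONTRIBUTOR = BNode("A96")  # genid:A96 on the slide

TRIPLES = [
    (BOOK, IRI(ONIX + "17#A01"), CONTRIBUTOR),
    (CONTRIBUTOR, IRI(ONIX + "44#16"), Literal("0000000121479135")),
    (CONTRIBUTOR, IRI(ONIX + "18#01"), Literal("Sjöwall, Maj")),
    (BOOK, IRI(ONIX + "5#15"), IRI("urn:isbn:9780007232833")),
]

def serialize(triples) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{to_nt(s)} {to_nt(p)} {to_nt(o)} ." for s, p, o in triples)

print(serialize(TRIPLES))
```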
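book --[ID type 15 from List 5]--> isbn:9780007232833
book --[contributor role A01 from List 17]--> (contributor)
(contributor) --[ID type 16 from List 44]--> 0000000121479135
(contributor) --[name type 01 from List 18]--> Sjöwall, Maj
Sjöwall, Maj --[http://purl.org/vocab/relationship/lifePartnerOf]--> Wahlöö, Per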
Sjöwall, Maj --[http://purl.org/vocab/relationship/influencedBy]--> McBain, Ed
Mankell, Henning --[http://purl.org/vocab/relationship/influencedBy]--> Sjöwall, Maj
Sjöwall, Maj --[http://purl.org/vocab/relationship/lifePartnerOf]--> Wahlöö, Per
http://vocab.org/relationship/.html
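Once entities and relationships are expressed as triples, a program can “browse” them. This sketch keeps the graph above in a Python set, answers wildcard patterns and follows influencedBy links from one author to the next. Plain names stand in for entity URIs, the match and influences helpers are hypothetical, and the direction of each influencedBy arrow (Mankell influenced by Sjöwall, Sjöwall influenced by McBain) is this example’s assumption.

```python
# Sketch: an in-memory triple store with wildcard matching, plus a
# breadth-first walk along influencedBy links. Names stand in for URIs;
# the arrow directions are assumptions.
from collections import deque

REL = "http://purl.org/vocab/relationship/"

GRAPH = {
    ("Sjöwall, Maj", REL + "lifePartnerOf", "Wahlöö, Per"),
    ("Sjöwall, Maj", REL + "influencedBy", "McBain, Ed"),
    ("Mankell, Henning", REL + "influencedBy", "Sjöwall, Maj"),
}

def match(s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(t for t in GRAPH
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def influences(person):
    """Follow influencedBy edges outward, breadth-first, skipping repeats."""
    seen, queue, found = {person}, deque([person]), []
    while queue:
        current = queue.popleft()
        for _, _, target in match(s=current, p=REL + "influencedBy"):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found

print(match(p=REL + "lifePartnerOf"))
print(influences("Mankell, Henning"))
```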
An aside
• linked data is often confused with the
  concept of ‘Linked Open Data’ (LOD)
  •   LOD is linked data, but with an ideological view that
      data should be open and freely available for
      anyone to reuse without restrictions
  •   much of the data that’s claimed to be LOD is
      in fact data where the owner has omitted to
      post a licence governing its reuse
  •   truly open data is most often data
      produced by the public sector
http://lod-cloud.net/
Linking data
• linking depends on shared entities
  •   we need to be sure we are really talking about
      the same thing – not just two people with the
      same name, but actually the same contributor
  •   unambiguous identifiers
  •   controlled vocabularies, taxonomies and clear,
      shared semantics based on shared data models
• linked data needs careful data modelling
  and a concern for semantics and identity,
  and it’s technically challenging, with a
  steep learning curve
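A small Python illustration of why linking depends on shared entities: grouping contributor records by display name merges namesakes and splits variant name forms, while grouping by a persistent identifier does neither. The third record and its identifier are hypothetical, invented for the example, and group_by is a hypothetical helper.

```python
# Sketch: name matching versus identifier matching. The third record is a
# hypothetical namesake with a hypothetical (non-ISNI) identifier.
from collections import defaultdict

CONTRIBUTORS = [
    {"name": "Maj Sjöwall", "isni": "0000000121479135"},
    {"name": "Sjöwall, Maj", "isni": "0000000121479135"},
    {"name": "Maj Sjöwall", "isni": "HYPOTHETICAL-ID-0001"},  # a different person
]

def group_by(records, key):
    """Map each value of `key` to the indexes of records carrying it."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[key]].append(index)
    return dict(groups)

by_name = group_by(CONTRIBUTORS, "name")
by_isni = group_by(CONTRIBUTORS, "isni")
print(by_name)  # namesakes merged; the inverted name form split off
print(by_isni)  # both forms of the same person merged; the namesake kept apart
```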
Linked data challenges

• linked data contains no provenance
• metadata licensing is often unclear
• needs shared persistent identifiers, but much
  of the current linked data activity is
  experimental and lacks sustainability
• poor legacy data – we’re still struggling with
  the schema-led approach…
• and where’s my record gone?
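On the “where’s my record gone?” point: a triple store keeps no record as such, so a record-like view has to be reassembled by collecting every triple that shares a subject. A minimal sketch, assuming short made-up predicate names in place of real vocabulary URIs; describe is a hypothetical helper.

```python
# Sketch: rebuild a record-like view on demand by gathering every triple
# with the same subject. Predicate names are made up for readability.
from collections import defaultdict

TRIPLES = [
    ("urn:isbn:9780007232833", "title", "Roseanna"),
    ("urn:isbn:9780007232833", "author", "Sjöwall, Maj"),
    ("urn:isbn:9780007242931", "title", "The man on the balcony"),
    ("urn:isbn:9780007232833", "pubDate", "07-08-2006"),
]

def describe(subject):
    """Collect predicate -> list of objects for one subject."""
    view = defaultdict(list)
    for s, p, o in TRIPLES:
        if s == subject:
            view[p].append(o)
    return dict(view)

print(describe("urn:isbn:9780007232833"))
```

The “record” is now the answer to a query rather than a stored unit, which is also why provenance and licensing have to be tracked separately.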
Linked data promise
• just another way of expressing the same data
  •   book metadata is already ‘data with links’
  •   linked data removes the limits of ‘the record’
• but optimised for machines, not people
  •   Tim Berners-Lee’s ‘semantic web’
  •   allows machines to ‘browse’ for other data – so
      if your book is set in Sweden, then it can
      automatically be linked to other sources of data
      about Sweden or to other Swedish things
• could change the game of ‘discoverability’
graham@editeur.org
    www.editeur.org