Archives & The
Semantic Web
      Mark A. Matienzo
  The New York Public Library
New York Archivists’ Round Table
Annual M...
Disclaimer

   The following presentation, while
 factual, expresses opinions of my own
and not of my employer, my coworke...
Archives & The Web
The Web isn’t new,
 even to archivists.
http://listserv.muohio.edu/scripts/wa.exe?S1=archives
http://web.archive.org/web/19970606072913/http://www.nara.gov/
Web-based archival
description isn’t new.
http://sunsite.berkeley.edu/FindingAids/EAD/bfap.html
http://web.archive.org/web/19970523152709/http://www.library.yale.edu/beinecke/aboutosb.htm
http://web.archive.org/web/19990203012659/lcweb.loc.gov/ead/
http://digilib.nypl.org/dynaweb/ead/nypl/greco/@Generic__BookView
The Web, at its essence,
    is about links.
We take links for
granted in our work.
http://www.nypl.org/research/manuscripts/result.cfm?find=1
http://www.nypl.org/research/manuscripts/result.cfm?find=1
http://www.nypl.org/research/chss/spe/brg/berg.html
http://www.nypl.org/research/manuscripts/berg/brgabbey.xml
http://catnyp.nypl.org/record=b7621732
Links go beyond the
easily accessible sort.
http://catnyp.nypl.org/record=b7621732
http://catnyp.nypl.org/search?/dAbbey+Theatre./dabbey
+theatre/-3%2C-1%2C0%2CB/exact&FF=dabbey+theatre&1%2C41%2C
http://leopac4.nypl.org/ipac20/ipac.jsp?
session=1O4572959KO37.4065&profile=dial--3&uri=link=1100083~!S908632~!
       1100...
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?
AuthRecID=969701&v1=1&HC=2&SEQ=20090622233419&PID=vDA4Ugr3s8SGy7-dKIByauO
Further down the
  rabbit hole.
http://en.wikipedia.org/wiki/Abbey_Theatre
http://www.abbeytheatre.ie/
Links become implicit.
Computers don’t “do”
   implicit links.
Humans must correlate
 data on both ends.
These access points don’t link to anything.
The Semantic Web
(blame this guy)
  http://www.flickr.com/photos/tanaka/3212373419/
I have a dream for the Web [in which
computers] become capable of analyzing
all the data on the Web – the content, links,
...
Linked Data is a way
    to link better.

               Dan Chudnov, Better Living Through Linking.
  http://onebiglibrar...
Linked data is not a new
   concept in archives.
If the series becomes the primary level of
classification, and the item the secondary
level, a) items are kept in their
adm...
Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
Design Principles
1. Use URIs for names of things
2. Use HTTP URIs so people can look up
   those names
3. Provide useful ...
Naming things with
URIs tells us where
  to find them.
Using HTTP (Web)
URIs tells us how to
 find these things.
Providing data in
standard formats tells
 us what that thing is.
EAD is not a standard
 format in this sense.
RDF
1. Resource Description Framework
2. Presents relationships in a simple data
   structure
3. We can draw graphs of tho...
In RDF, we say some
thing has a property
with a certain value.
<http://matienzo.org/#me> foaf:firstName “Mark”.
<http://matienzo.org/#me> foaf:firstName “Mark”.
        thing (Me)          property    value
An RDF Graph

                          foaf:firstName
                                          “Mark”



http://matienzo....
An RDF Graph

                          foaf:based_near
                                            http://dbpedia.org/pag...
Simply linking to things
    is not enough.
RDF graphs show why
we link to other things.
These links say what
the relationships are.
Links between things
become crossreferences.
Precision improves
with explicit links and
 “smart crawlers.”
http://sindice.com/search?q=abbey+theatre&qt=term
http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-27.png
Examples in Libraries
LIBRIS




http://libris.kb.se/data/bib/4721351
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3....
id.loc.gov




http://id.loc.gov/authorities/sh96007490
Chronicling America




    http://chroniclingamerica.loc.gov/lccn/sn85066387/
NSDL Registry




   http://metadataregistry.org/
Examples in Archives
UK Archival Thesaurus




   http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/
Archives de France “Thesaurus W”




    http://www.archivesdefrance.culture.gouv.fr/gerer/classement/normes-outils/thesau...
Agrippa (AMVC)




http://www.analogousspaces.com/media/docs/GUNS_AS_MAY08.pdf
Barriers are both
cultural and technical.
Archival description
  contains lots of
implicit information.
“Inheritance” of data in
multi-level description is
     highly implicit.
EAD is document-centric
    standard, not a
 data-centric standard.
EAC, a standard in
development, is more
    data-centric.
Archival description, in
its current state, is not
   computer-friendly.
Archival description, in
its current state, is not
 Linked Data-friendly.
EAD needs to change to
interoperate with EAC as
 well as other standards.
It is up to the archival
community to steer the
standards accordingly.
Thank You
     mark@matienzo.org
     http://matienzo.org/
http://twitter.com/anarchivist
Upcoming SlideShare
Loading in...5
×

Archives & the Semantic Web

6,342

Published on

Given June 23, 2009 at the Annual Meeting of the Archivists' Round Table of Metropolitan New York.

Published in: Technology, Education

Archives & the Semantic Web

  1. 1. Archives & The Semantic Web Mark A. Matienzo The New York Public Library New York Archivists’ Round Table Annual Meeting, June 23, 2009
  2. 2. Disclaimer The following presentation, while factual, expresses opinions of my own and not of my employer, my coworkers, my family, etc.
  3. 3. Archives & The Web
  4. 4. The Web isn’t new, even to archivists.
  5. 5. http://listserv.muohio.edu/scripts/wa.exe?S1=archives
  6. 6. http://web.archive.org/web/19970606072913/http://www.nara.gov/
  7. 7. Web-based archival description isn’t new.
  8. 8. http://sunsite.berkeley.edu/FindingAids/EAD/bfap.html
  9. 9. http://web.archive.org/web/19970523152709/http://www.library.yale.edu/beinecke/aboutosb.htm
  10. 10. http://web.archive.org/web/19990203012659/lcweb.loc.gov/ead/
  11. 11. http://digilib.nypl.org/dynaweb/ead/nypl/greco/@Generic__BookView
  12. 12. The Web, at its essence, is about links.
  13. 13. We take links for granted in our work.
  14. 14. http://www.nypl.org/research/manuscripts/result.cfm?find=1
  15. 15. http://www.nypl.org/research/manuscripts/result.cfm?find=1
  16. 16. http://www.nypl.org/research/chss/spe/brg/berg.html
  17. 17. http://www.nypl.org/research/manuscripts/berg/brgabbey.xml
  18. 18. http://catnyp.nypl.org/record=b7621732
  19. 19. Links go beyond the easily accessible sort.
  20. 20. http://catnyp.nypl.org/record=b7621732
  21. 21. http://catnyp.nypl.org/search?/dAbbey+Theatre./dabbey +theatre/-3%2C-1%2C0%2CB/exact&FF=dabbey+theatre&1%2C41%2C
  22. 22. http://leopac4.nypl.org/ipac20/ipac.jsp? session=1O4572959KO37.4065&profile=dial--3&uri=link=1100083~!S908632~! 1100001~!1100087&aspect=basic&menu=search&ri=2&source=~! dial&term=Abbey+Theatre&index=SL
  23. 23. http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi? AuthRecID=969701&v1=1&HC=2&SEQ=20090622233419&PID=vDA4Ugr3s8SGy7-dKIByauO
  24. 24. Further down the rabbit hole.
  25. 25. http://en.wikipedia.org/wiki/Abbey_Theatre
  26. 26. http://www.abbeytheatre.ie/
  27. 27. Links become implicit.
  28. 28. Computers don’t “do” implicit links.
  29. 29. Humans must correlate data on both ends.
  30. 30. These access points don’t link to anything.
  31. 31. The Semantic Web
  32. 32. (blame this guy) http://www.flickr.com/photos/tanaka/3212373419/
  33. 33. I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by mac hines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize. Tim Berners-Lee, Weaving The Web.
  34. 34. Linked Data is a way to link better. Dan Chudnov, Better Living Through Linking. http://onebiglibrary.net/story/tcdl-2009-talk-better-living-through-linking
  35. 35. Linked data is not a new concept in archives.
  36. 36. If the series becomes the primary level of classification, and the item the secondary level, a) items are kept in their administrative context and original order by physical allocation to their appropriate series, and b) series are no longer kept in any original physical order in a record or shelf group (if there is any such order) but simply have their administrative context and associations recorded on paper. Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
  37. 37. Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
  38. 38. Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
  39. 39. Design Principles 1. Use URIs for names of things 2. Use HTTP URIs so people can look up those names 3. Provide useful information in standard formats at those URIs 4. Include links to other URIs so people can discover more things Tim Berners-Lee, Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
  40. 40. Naming things with URIs tells us where to find them.
  41. 41. Using HTTP (Web) URIs tells us how to find these things.
  42. 42. Providing data in standard formats tells us what that thing is.
  43. 43. EAD is not a standard format in this sense.
  44. 44. RDF 1. Resource Description Framework 2. Presents relationships in a simple data structure 3. We can draw graphs of those relationships 4. We can represent those relationships in multiple formats for computers Tim Berners-Lee, Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
  45. 45. In RDF, we say some thing has a property with a certain value.
  46. 46. <http://matienzo.org/#me> foaf:firstName “Mark”.
  47. 47. <http://matienzo.org/#me> foaf:firstName “Mark”. thing (Me) property value
  48. 48. An RDF Graph foaf:firstName “Mark” http://matienzo.org/#me
  49. 49. An RDF Graph foaf:based_near http://dbpedia.org/page/Brooklyn foaf:firstName “Mark” foaf:surname “Matienzo” http://matienzo.org/#me
  50. 50. Simply linking to things is not enough.
  51. 51. RDF graphs show why we link to other things.
  52. 52. These links say what the relationships are.
  53. 53. Links between things become crossreferences.
  54. 54. Precision improves with explicit links and “smart crawlers.”
  55. 55. http://sindice.com/search?q=abbey+theatre&qt=term
  56. 56. http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-27.png
  57. 57. Examples in Libraries
  58. 58. LIBRIS http://libris.kb.se/data/bib/4721351
  59. 59. @prefix dc: <http://purl.org/dc/elements/1.1/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix libris: <http://libris.kb.se/vocabulary/experimental#> . @prefix bibo: <http://purl.org/ontology/bibo/> . <http://libris.kb.se/resource/bib/4721351> rdfs:isDefinedBy <http://libris.kb.se/data/bib/4721351> . <http://libris.kb.se/resource/bib/4721351> rdf:type bibo:Book . <http://libris.kb.se/resource/bib/4721351> dc:title "The Abbey : Ireland's national theatre 1904-1979"@en . <http://libris.kb.se/resource/bib/4721351> dc:creator "Hunt, Hugh" . <http://libris.kb.se/resource/bib/4721351> dc:creator "Hugh Hunt" . <http://libris.kb.se/resource/bib/4721351> dc:type "text" . <http://libris.kb.se/resource/bib/4721351> dc:publisher "Columbia U.P" . <http://libris.kb.se/resource/bib/4721351> dc:date "1979" . <http://libris.kb.se/resource/bib/4721351> dc:description "U Can $ 33.60"@en . <http://libris.kb.se/resource/bib/4721351> dc:identifier <URN:ISBN:0231049064> . <http://libris.kb.se/resource/bib/4721351> bibo:isbn10 "0231049064" . <http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/G> . <http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/L> . <http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/Li> . <http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/U> . <http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/Uh> . http://libris.kb.se/data/bib/4721351?format=text%2Frdf%2Bn3
  60. 60. id.loc.gov http://id.loc.gov/authorities/sh96007490
  61. 61. Chronicling America http://chroniclingamerica.loc.gov/lccn/sn85066387/
  62. 62. NSDL Registry http://metadataregistry.org/
  63. 63. Examples in Archives
  64. 64. UK Archival Thesaurus http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/
  65. 65. Archives de France “Thesaurus W” http://www.archivesdefrance.culture.gouv.fr/gerer/classement/normes-outils/thesaurus/
  66. 66. Agrippa (AMVC) http://www.analogousspaces.com/media/docs/GUNS_AS_MAY08.pdf
  67. 67. Barriers are both cultural and technical.
  68. 68. Archival description contains lots of implicit information.
  69. 69. “Inheritance” of data in multi-level description is highly implicit.
  70. 70. EAD is document-centric standard, not a data-centric standard.
  71. 71. EAC, a standard in development, is more data-centric.
  72. 72. Archival description, in its current state, is not computer-friendly.
  73. 73. Archival description, in its current state, is not Linked Data-friendly.
  74. 74. EAD needs to change to interoperate with EAC as well as other standards.
  75. 75. It is up to the archival community to steer the standards accordingly.
  76. 76. Thank You mark@matienzo.org http://matienzo.org/ http://twitter.com/anarchivist
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×