Linking Library Data on the Web
Daniel Chudnov - dchud @ umich.edu
Tokyo, Japan - December 8 2010
Hello, my name is [...]. This is my first time in Japan since I was a [...] in 1992. I can speak a little Japanese, but only a little, so today it will be better for me to speak in English. But I will practice, and maybe if I am lucky enough to visit Japan again I will try presenting in Japanese next time.
these slides are my opinion not my employer’s
I am very excited to be here today and thank you for the opportunity. I work for the Library of Congress, but I am not here representing the Library of Congress. Instead, I am here independently, and these slides represent only my opinion, and not the opinion of my employer. I will not be discussing the work I do in my regular job.
• making links
• weaving links into a fabric
• proxying and caching links
I will focus on three areas of Linking Library Data on the Web. Because we are all familiar with basic Linked Data ideas, I will focus on translating traditional library activities into these areas: making links, weaving links together, and the proxying and caching of links.
making links
• we have been building the web for about 20 years
• we constantly improve the web
• we constantly improve how we build the web
• we still have a ways to go
The web has been around for 20 years. How we build it is always changing, and what we think is “modern” and “cutting edge” is always changing. It is easy to look back at an old web design and laugh, but it was exciting and new to us back then. The same is true today for today’s web. I think we can still make a lot of improvements.
Rembrandt at artic.edu
authority record at OCLC
Here is a sample of how different cultural heritage institutions represent the famous artist Rembrandt on the web. The museum’s page at right shows an example item by the artist with some description. The authority record at left shows a lot of useful information including identifiers and name form variations. These pages do not link to each other. In this case, I did not find them using Google.
Rembrandt in linkypedia
In this case, I found these two pages using linkypedia ( ). linkypedia is a software project started by Ed Summers. To use linkypedia, you give it a list of sites you care about, like your library’s website or a museum website, and it crawls Wikipedia to find links from article pages to your sites. In this case I gave linkypedia a list of libraries and museums I have visited. It found several museum sites’ pages linked to from the Wikipedia page for “Rembrandt”. It also found a link to the name authority record for Rembrandt that is published by OCLC in its Linked Authority File online.
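The matching step described here can be sketched in Python. This is not linkypedia's actual code: it assumes the external links have already been extracted from Wikipedia as (article title, URL) pairs, and the function name `links_to_sites` and the example hosts are hypothetical.

```python
from urllib.parse import urlparse


def links_to_sites(article_links, watched_hosts):
    """Group external links by article, keeping only links to watched sites.

    article_links: iterable of (article_title, url) pairs (assumed input).
    watched_hosts: host names we care about, e.g. a library or museum site.
    """
    watched = {host.lower() for host in watched_hosts}
    matches = {}
    for article, url in article_links:
        host = (urlparse(url).hostname or "").lower()
        # Accept the watched host itself or any of its subdomains.
        if any(host == w or host.endswith("." + w) for w in watched):
            matches.setdefault(article, []).append(url)
    return matches
```

Given pairs for an article such as "Rembrandt", the result maps the article title to only those URLs that point at the sites on the list.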
Rembrandt from linkypedia
This image collage shows the different pages linked to from the linkypedia page. It starts with the linkypedia resource and the authority record from OCLC in the background, and has images from the New York Metropolitan Museum of Art, the Art Institute of Chicago, and the National Gallery of Art in Washington. It emphasizes how the links in linkypedia and Wikipedia can bring together resources from disparate locations. But something’s missing here. We can do more with our own library data.
Picasso: name variants, search links
I added this page to linkypedia to make more use of the name authority record. Here you see name variations from the authority record, a link to an LCCN search in Google, and links to search several famous cultural heritage institutions as well as general search engines.
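Search links like these could be generated from the name variants with a small helper. The following is a minimal sketch, not the code behind the page shown: the URL templates and site names are hypothetical placeholders, and real institutions each use their own search URL patterns.

```python
from urllib.parse import quote_plus

# Hypothetical search URL templates; real sites use their own patterns.
SEARCH_TEMPLATES = {
    "example-museum": "https://museum.example.org/search?q={query}",
    "example-library": "https://library.example.org/catalog?name={query}",
}


def search_links(name_variants, templates=SEARCH_TEMPLATES):
    """Return (site, name, url) for every site and name variant."""
    links = []
    for site, template in templates.items():
        for name in name_variants:
            url = template.format(query=quote_plus(name))
            links.append((site, name, url))
    return links
```

Because every link depends on a search URL pattern that the target site controls, any change to that pattern silently breaks the link.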
Picasso from more than linkypedia
This image collage shows a sample of the broader results we can find when following these links from the name forms. It includes pages from the Virtual International Authority File (VIAF), an Israeli museum site, and the NDL. Actually, the page added to linkypedia with name forms is partly similar in function to the VIAF page.
a fragile web
• just using search link patterns is brittle
• wikipedia, google, and bing do better than we do at connecting resources across institutions
• we have more content that isn’t easy to find
• how can we improve this?
Pointing at these broader results as some kind of new success is cheating a little bit. The big search engines have already brought together related resources from across organizations and sites for a long time. And VIAF brings many more authority records together very thoughtfully. But we have so much content on our sites that is only available through search URL patterns; these change often, and are not a good basis for longer-term linking. I think the next step is obvious.
proxy resource URIs
• commit to a stable resource URI for each concept/name for each institution
• use that URI to show your own links to your own holdings
• use that URI to find external links to your holdings
• use that URI to show external links to related holdings
I think the first thing we can do at our respective institutions is to commit to having a single, primary, stable URI for each concept and name at our institution. That resource URI can then be used as a proxy for several things: links to your own holdings, a stable tool for finding links from other sites to your holdings (like in linkypedia), and also as a place to highlight related resources at other institutions.
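One way to picture such a proxy resource is as a small record keyed by a stable local identifier. The sketch below is only an illustration under stated assumptions: the base URI, the class name `ConceptPage`, and its field names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical base URI for an institution's concept pages.
BASE_URI = "https://library.example.org/concepts/"


@dataclass
class ConceptPage:
    """A single, stable 'home' for one concept or name at one institution."""
    local_id: str                # stable identifier, never reused
    label: str                   # preferred display form; may change over time
    own_holdings: list = field(default_factory=list)      # links to our items
    inbound_links: list = field(default_factory=list)     # other sites linking to us
    related_external: list = field(default_factory=list)  # holdings elsewhere

    @property
    def uri(self):
        # The URI is built from the identifier only, so changing the label
        # or any search URL pattern does not change it.
        return BASE_URI + self.local_id
```

Deriving the URI from an identifier rather than from a label or a search query is the design choice that keeps the link stable.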
proxy resource URIs
• a “home page” for each concept at each of our sites
• threads together diverse content at each of our sites
• makes it easier to weave threads of content from diverse sites together
In this way we can each have our own home pages for concepts. But although they are “home pages”, they are also a way to connect our own resources together - especially in large institutions with different groups managing different parts of our sites. And although they are about “our resources”, they also can enable lasting connections with other valuable resources on the web.
NDLSH Wikipedia
The NDL is already doing some of this with links from NDLSH to Wikipedia. Cupid ( ) and I both love it.
caching concepts
• concept resource URIs should indicate “same as” status with major concept sources
• these URIs should also publish local caches of the “main” concept source data
We can take this a step further. When we publish a concept resource URI at our sites, we can also publish with it the definition or authority record for the concept as a cache of the definition made available from the source institution or registry. By connecting and indicating “same as” status, we are tying the resource URIs together by saying their meaning is the same. A local cache of the remote record makes the “meaning” information available locally.
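A published concept resource of this kind might carry both the “same as” statement and the cached record. The sketch below is an assumption-laden illustration: it uses the OWL `sameAs` property URI for the link, while the `cache` key, its fields, and the function name are hypothetical and not part of any standard.

```python
import json

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def concept_document(local_uri, source_uri, cached_record, retrieved):
    """Build a publishable description of a local concept resource.

    cached_record is a copy of the record from the source institution;
    retrieved is the date the copy was made (an ISO date string here).
    """
    return {
        "@id": local_uri,
        # States that the local URI and the source URI mean the same thing.
        OWL_SAME_AS: [{"@id": source_uri}],
        # Hypothetical structure for the locally published cache.
        "cache": {
            "source": source_uri,
            "retrieved": retrieved,
            "record": cached_record,
        },
    }


doc = concept_document(
    "https://library.example.org/concepts/c0001",
    "https://authority.example.net/names/a123",
    {"preferred": "Picasso, Pablo", "variants": ["Pablo Picasso"]},
    "2010-12-08",
)
serialized = json.dumps(doc, indent=2)
```

Recording the source and retrieval date alongside the copy lets a reader judge how fresh the cached meaning is.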
caching concepts
• supports efficient local processing
• adds stability when remote sites go down
• we often do this in our ILS already
This can have two big benefits. The first is local processing: it is more efficient to have a copy of the concept record locally when performing indexing and analysis of metadata or content. The second is that the redundant copies make for a more stable “web of meaning”. If the remote concept resource URI becomes unavailable temporarily, other institutions and sites can depend on having their own local copies of the concept records, or can use the cached copies at peer sites. This is why we already do something like this with local copies of authority records for cataloging and indexing. The new patterns I am discussing would extend that practice to the web.
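The stability benefit can be sketched as a lookup that falls back from the source to the local cache and then to peer caches. This is a minimal sketch under stated assumptions: caches are plain dictionaries keyed by concept URI, `fetch_remote` is a hypothetical caller-supplied function, and trying the remote source first is only one possible ordering (a site doing bulk indexing might prefer its local copy first).

```python
def resolve_concept(source_uri, fetch_remote, local_cache, peer_caches=()):
    """Return (record, origin) for a concept, surviving a remote outage.

    fetch_remote: callable taking a URI and returning a record; it is
    expected to raise OSError (e.g. ConnectionError) when the site is down.
    """
    try:
        record = fetch_remote(source_uri)
        local_cache[source_uri] = record  # refresh our local copy
        return record, "remote"
    except OSError:
        pass  # source unavailable; fall back to cached copies

    if source_uri in local_cache:
        return local_cache[source_uri], "local-cache"

    for peer in peer_caches:
        if source_uri in peer:
            return peer[source_uri], "peer-cache"

    raise LookupError("no copy of " + source_uri + " is available")
```

The second element of the result says where the record came from, so a caller can tell a fresh record from a cached one.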
strengthening meaning
• can apply to registries as well as catalogs, repositories, exhibit sites, reference resources
• makes concepts and their meaning a more prominent part of our own sites and the web we are building as a whole
This is not my area of expertise so I am glad that the others here are better able to address these points. But I think that these patterns can be very broadly applied to the many kinds of resources we each publish on the web, including registries, catalogs, repositories, content exhibits, and reference resources. It might be a valuable way to give more weight to the meaning of the concepts we use to connect our resources and collections together.