Building the New Open Linked Library
Presented at the LITA National Forum, September 30, 2011

Published in: Technology, Education
Notes
  • Possibly omit or move this slide
  • Mo’ data mo’ better. Mission fulfilment. Sharing=caring. Efficient reuse of data.
  • Q to audience: How many people have heard of linked data before today? How many feel they have a basic grasp of what it is? How many people want to watch me trip over my tongue trying to explain it in less than a minute? (If good grasp, note that of Tim Berners-Lee’s four principles of linked data, 1 and 2 are easy, 3 is where we’re working now, and 4 is the one we’re still trying to figure out how to do. Otherwise, on to the definition.)
  • LD describes a way of publishing structured data to the web so it can be interlinked with other structured data. Shared data is usually (not always) in RDF (Resource Description Framework), often serialized as RDF in XML (we understand XML!), a standard that allows data from different sources to be connected and queried. Linking data enables you to enrich your data and give it additional context. Data is expressed almost like sentences, in ‘triples’: URI = your data (the subject), predicate = verb, object = object. Example. The object can be a link to another system, or can just be more data, e.g. “1809”. The predicate is chosen from a set vocabulary (or ontology); if you have to make one up, you publish that new vocabulary on the web so others can get to it. Common vocabularies include FOAF (Friend of a Friend): people, personal relationships; DC (Dublin Core): publications, etc.; SKOS (Simple Knowledge Organization System): links systems, concepts; OWL (Web Ontology Language): links ontologies, extension of RDF.
  • How did SIL start thinking about implementing LD? Website rebuild. The goal is to make our data more useful, reusable, and accessible to people and machines, which is more than just putting our stuff up ‘online’. Started looking at CMSs. Wow! Drupal 7 (D7) is not only a CMS, it’s open source, and it has RDFa baked in, along with common LD ontologies! Sold!
  • Lots of bibliographic data in the ILS, but unfortunately no access to it (for now). Re-doing online books: good candidates for providing linked bibliographic data. Existing ‘database’ stuff: inventories, as well as a new project, the digitization and markup of the reference book Tax Lit 2 (more from JMR).
  • Initial focus for us will be on “database” like content we already have, or are currently creating. JMR will discuss one example.
  • As we move through our website redesign, and rearrange more of our content online, we will gradually go through books, other database stuff, maybe even simple stuff like library locations and hours, and apply our planning principles.
  • Questions for the audience to get a feel for who you are: Computer, Librarians, Worked with Databases, Worked with Drupal.
  • Why Drupal? Why not! Why 7 and not 6? Well, RDFa is built in. If it’s there, we’re more likely to use it. RDFx extends RDFa to provide different formats (XML, JSON, NTriples, Turtle via REST). RDFx also provides a UI to set the RDF mappings (Drupal comes with some already set up, but we really want to customize ours). Evoc is used for caching and also for autocomplete, which we’ll see later. AUDIENCE QUESTION: How many know the difference between RDF and RDFa?
  • Since we sort of know what Linked Data is, let’s take a quick look at it and compare RDFa, which embeds RDF data into the webpage, with RDF in XML. The identifier is the URI of the page, the predicates are embedded in the page and are displayed in orange, and the object or property is displayed in the <div> or <span>. There may be more that needs to be done here.
  • This RDF is formatted in XML. Note that only the predicates are shown here; there is no extraneous HTML to distract. Typically you need a special tool to use this information. The web browser doesn’t natively understand an RDF XML file.
  • Field / Node Ref / Views are built in. SPARQL is an add-on module that allows others to come in and query the data on our site. SPARQL Views allows us to use external data from other sites, presumably to create new content (we may use this). RDF Ext Vocab (evoc) is used to cache vocabularies so they can be used in the autocomplete feature when setting up RDF mappings (among other things). Biblio is a nice module, but it needs a serious update before we can start using it.
  • Namespaces that Drupal comes with: Dublin Core; FOAF (Friend of a Friend): links between people and the things they create and do; Open Graph: allows web pages to become objects in the social graph (mainly Facebook); SIOC (Semantically-Interlinked Online Communities); SKOS: knowledge organization (concepts, collections, ideas); OWL (Web Ontology Language). BIBO (Bibliographic Ontology): for books! How convenient! It covers nearly all of what we need for describing books on the web. We may need to extend it for publication year (rather than publication date). Later we’ll discuss a few cases where we aren’t finding something perfectly appropriate for our needs or our data is very specialized, so we may extend an existing namespace or create our own. We can do this as long as the namespace is published and documented for others to reuse.
  • Adding a namespace is a simple matter of giving it a prefix and the URI to the namespace. This page does not show all of the namespaces used by RDFx; there are actually 8 or 9 of them. Drupal can also import and cache these namespaces using the External Vocabulary Importer, for reuse and also for the autocomplete feature, which is really nice. (Not shown, but it’s also a matter of supplying the prefix and name.)
  • Although some very basic RDF mappings are set up in Drupal for us, it’s easy to create our own. They can be viewed in multiple places, but on the content type, each field’s RDF mappings can be edited on a single page. Additionally, if we have imported the vocabulary into Drupal, we get the nice benefit of the autocomplete feature to help us choose the appropriate mapping.
  • TL2 is a database. In book form! Botanists and their books, cross-referenced in the index using unique identifiers across all volumes. It’s really a database! It is used by botanists, so having this online and searchable could be huge. At the least, having it online saves them the trouble of going to the physical volumes. Since no one else has this online in linked data form (in fact it’s barely online as it is), we’re going to become the authority for botanist names. Also, SI has contributed to the supplemental volumes.
  • Here we have a page of TL2, our good man Charles Darwin and some information about him. At the bottom, we have an obscure (ha!) book that he wrote, which is number 1313 in the TL2 scheme of things. Our goal is to identify the data elements that we are going to initially make public and how to map them to the vocabularies to make them more useful to others. This goes hand in hand with the parsing that we’ve hired a contractor to do; they’re pre-parsing some of the information based on our specs. 1313. Nice address. 1313 Mockingbird Lane. Munsters reference. Bad joke.
  • TODO: LinkSameAs to BHL, not OCLC. Here’s an example. The identifiers /darwin and /1313 are linked together with “dc:creator” and, in the reverse, “dc:contributor” (I think). (Predicates are one-way.) So these links, which come from the index of TL2, are cross-linked, and our site is nicely browseable and searchable and so on. But we also link out to other places, VIAF for Darwin’s identifier and WorldCat for Origin of Species, which allows others to go out and do other things with this data. We link out, but how do we get people to link back into us? That’s one of the questions we aim to get an answer to, but solving it will take some time.
  • And here’s what we’re going to start with. (Run through the different elements, starting with the URI, the RDF type, then the predicates and data types.) TODO: More info here. Other data elements may be linked later; there’s certainly stuff available here: herbaria, other bibliographic entries (need to define their relationships), handwriting samples, postage stamps (!!). We mentioned earlier that we might create our own vocabulary or extend an existing one. You’ll note here that we are creating the “tl2” namespace, because the concepts in TL2 are specific to it and yet TL2 is commonly used enough that a new namespace would be useful to others. BUT! Something is missing! Where’s that “linked” part of linked data?
  • TODO: LinkSameAs to BHL, not OCLC. Here’s an example. The identifiers /darwin and /1313 are linked together with “dc:creator” and, in the reverse, “dc:contributor” (I think). (Predicates are one-way.) So these links, which come from the index of TL2, are cross-linked, and our site is nicely browseable and searchable and so on. But we also link out to other places, VIAF for Darwin’s identifier and WorldCat for Origin of Species, which allows others to go out and do other things with this data. We link out, but how do we get people to link back into us? That’s one of the questions we aim to get an answer to, but solving it will take some time.
  • So to recap, this entire dataset is initially going to be represented in exactly two content types, Authors and Books. A node reference between them allows us to browse between them in Drupal, but also helps create the RDF links for LOD.
  • So how do we get this data into Drupal? We start with an XML file from our contractor. It’s already partially parsed; we’ll do some more parsing and convert that data into CSV, most likely. Using the Feeds module’s import tool, we’ll bring in the data and (hopefully) create the proper node references between them. We need to keep the blocks of information together (herbaria, handwriting samples, bibliography, postage stamps) until we can parse them out at a later date as needed. Ultimately we’ll create a custom search just for TL2, even though its data will be included in the general site search on our Drupal site.
  • What things do we still need to do? The RDFx (RDF Extensions) module uses one set of identifiers and Drupal uses another, i.e. /node/22365 and /node/22365.rdf for the XML version versus /tl2/author/charles-darwin. Other useful information in TL2 includes “See also” entries, alternate names, etc., which are useful to researchers. We do plan to incorporate this data in a later phase of development, if only for the human-friendly site search. We are investigating whether it makes sense to use SPARQL when users are querying our own data. Would this facilitate the search or make things more complicated? As we mentioned before, we’ll need to design, document and publish any extended or new ontologies (vocabularies) that we create for TL2. Our website’s been around for 15 years. Now we are laying the foundation for the next 15 years. Hopefully.
  • Transcript

    • 1. Building the New Open Linked Library: Theory and Practice …and results! Keri Thompson, Joel Richard, Trish Rose-Sandler. LITA National Forum, September 30, 2011
    • 2. Smithsonian Libraries • Founded in 1846 • 1.5 million volumes in collection, plus assorted archival collections • 15,000 volumes scanned and online • 20 libraries serving ~500 researchers/curators + hundreds of fellows and interns • 102 library staff • 1.5 web staff • Founding member of the Biodiversity Heritage Library
    • 3. Linked Data in our Library: WHY Linked Open Data? • It’s cool • “Increase and Diffusion of Knowledge” • Share, contribute to a global database • Create context around our data • Allow data to be reused/repurposed by ourselves and others • Improve discoverability of our content
    • 4. Linked Data: “The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” Tim Berners-Lee, Linked Data – Design Issues. 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 4. Include links to other URIs, so that they can discover more things.
    • 5. Linked Data: • Publishing structured data on the web • RDF (Resource Description Framework) • Enables queries computer to computer • Uses standard ontologies (vocabularies) • Data in “triples” (“triplestore”). Open Data: • Freely available to use, reuse, republish with no restrictions • Made available through various mechanisms such as .csv files, APIs. Example triple:
        URI        http://library.si.edu/tl2/author/charles-darwin
        Predicate  owl:sameAs
        Object     http://viaf.org/viaf/27063124
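The triple on this slide can be modelled very simply in code. The following is a minimal sketch, assuming an in-memory list stands in for a triplestore; the `Triple` class and the `objects` helper are illustrative names, not part of any library, and the birth-year statement reuses the “1809” literal mentioned in the speaker notes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject (a URI), predicate (a vocabulary term), object."""
    subject: str
    predicate: str
    obj: str


DARWIN = "http://library.si.edu/tl2/author/charles-darwin"

# A toy "triplestore": just a list of statements held in memory.
store = [
    # The object here is a link to another system (VIAF).
    Triple(DARWIN, "owl:sameAs", "http://viaf.org/viaf/27063124"),
    # The object here is plain data (a literal), not a link.
    Triple(DARWIN, "tl2:birthYear", "1809"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


same_as = objects(store, DARWIN, "owl:sameAs")
print(same_as)
```

A real triplestore adds indexing and a query language such as SPARQL, but the underlying shape of the data is the same three-part statement.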
    • 6. Our Website: Organically grown since 1995 • 83,000 HTML pages • 3,700 ColdFusion pages • 253,000 JPEG files • 27,000 PNG files • 46,000 PDFs. No CMS.
    • 7. Digital Library Planning: 1. Analyze and categorize our current & future online content 2. Create high-level data models for common content types. Questions: Where are we metadata-rich? What do we have that others don’t? What is feasible right now?
    • 8. Content Analysis • 400+ online “books” • Exhibitions • Research Tools • Image Collections (60,000+ images) • “Brochure” content (About us, Locations, Hours) • Bibliographies, Fact Sheets, Subject Guides • Databases, inventories and database-like books. Collections not on our website: • ~15,000 digitized volumes, with many more planned • Other analog collections that will be digitized
    • 9. Linked Data in our Library. Books (and book-like objects): • expose bibliographic data for reuse • consume links to other internal content and external authoritative data. Databases: • expose data previously unavailable • provide authoritative data • consume our data and others’ to create new aggregate websites
    • 10. Linked Digital Library Planning: 1. Decide which data elements should be exposed as linked data for each content type 2. Choose appropriate vocabularies 3. Create a rough timeline and plan for migrating site content (=1 year*) * Optimism included in this estimate
    • 11. Linked Data in our Library: Implement all this linked open data goodness (and a shiny new website) by moving to Drupal 7
    • 12. Drupal and Linked Data • Native support for RDFa in Drupal 7. • RDF Extensions (rdfx): even more features. • Vocabularies can be imported and cached for reuse. • Few or no modifications to HTML to support RDFa. What’s the difference between RDF, RDF/XML and RDFa?
    • 13. RDFa Sample. URI: http://library.si.edu/book/origin-of-species
        <meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
        <h1>The Origin of Species</h1>
        <img typeof="foaf:Image" src="http://localhost:8087/images/origin-of-species.png" alt="The origin of species cover image" title="The origin of species cover image" />
        <div rel="bibo:authorList">
          <a href="/content/darwin-charles-1809-1882">Darwin, Charles, 1809-1882</a>
        </div>
        <div property="dc:created">November 24, 1859</div>
        <div property="bibo:numPages">1000</div>
        <div property="dc:language">english</div>
        <div rel="owl:sameAs">
          <a href="http://www.worldcat.org/oclc/1184647" target="_blank">http://www.worldcat.org/oclc/1184647</a>
        </div>
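To make concrete how the predicates ride along inside ordinary HTML, here is a rough sketch that pulls the `property` values out of a trimmed copy of the sample above using only Python’s standard library. It is deliberately incomplete: it ignores `rel`, `typeof` and `about`, which a real RDFa processor would also interpret, and the `PropertyExtractor` class and `extract_properties` function are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

# A trimmed copy of the slide's markup (only the `property` statements).
SAMPLE = """
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (predicate, literal value) pairs from RDFa `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # predicate still waiting for its element text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # The value is carried in the `content` attribute (e.g. <meta>).
            self.pairs.append((prop, attributes["content"]))
        else:
            # The value is the text inside the element.
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


pairs = extract_properties(SAMPLE)
for predicate, value in pairs:
    print(predicate, "=", value)
```

The same page serves both audiences: a browser renders the `<div>` text for people, while a parser like this one reads the attributes for machines.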
    • 14. RDF/XML Sample. URI: http://library.si.edu/book/origin-of-species.rdf
        <?xml version="1.0" encoding="UTF-8"?>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/terms/"
                 xmlns:bibo="http://purl.org/ontology/bibo/"
                 xmlns:owl="http://www.w3.org/2002/07/owl#">
          <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
            <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
            <dc:title>The Origin of Species</dc:title>
            <dc:created>November 24, 1859</dc:created>
            <bibo:numPages>1000</bibo:numPages>
            <dc:language>english</dc:language>
            <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
            <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
          </rdf:Description>
        </rdf:RDF>
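As an illustration of the “special tool” needed to read RDF/XML, the sketch below turns a well-formed copy of the sample into triples using the standard-library XML parser. It assumes the `owl` namespace is declared and the `owl:sameAs` element is closed, and it handles only the flat `rdf:Description` pattern shown on the slide, not the full RDF/XML syntax. The `triples` function is a name made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
    <dc:title>The Origin of Species</dc:title>
    <dc:created>November 24, 1859</dc:created>
    <bibo:numPages>1000</bibo:numPages>
    <dc:language>english</dc:language>
    <bibo:authorList rdf:resource="http://localhost:8087/content/darwin-charles"/>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Yield (subject, predicate URI, object) for each flat rdf:Description."""
    root = ET.fromstring(rdf_xml)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for child in description:
            # ElementTree names tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            predicate = child.tag[1:].replace("}", "", 1)
            # rdf:resource means the object is a URI; otherwise it is text.
            obj = child.get(f"{{{RDF_NS}}}resource") or (child.text or "").strip()
            found.append((subject, predicate, obj))
    return found


result = triples(RDF_XML)
for statement in result:
    print(statement)
```

Note that the prefixes disappear once parsed: `dc:title` and `http://purl.org/dc/terms/title` are the same predicate, which is what lets data from different sites line up.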
    • 15. What other modules are we using? • Fields, Views, Views UI • Node Reference • SPARQL Endpoint, SPARQL API • RESTful Web Services • SPARQL Views • RDF External Vocabulary Importer. Caveat: Some modules not ready for Drupal 7 • e.g., Biblio module (no CCK, RDF capabilities)
    • 16. What about Namespaces/Vocabularies? • Drupal 7 comes with several namespaces. We will use: DC Terms, FOAF, SKOS, OWL • We’re working with books, so we need the Bibliographic Ontology: • Website: http://bibliontology.com/ • Namespace: http://purl.org/ontology/bibo/ • Prefix: “bibo” • We may also create our own vocabulary.
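A namespace registration is nothing more than a prefix paired with a URI, and a prefixed name such as `bibo:Book` is shorthand for the two joined together. The sketch below shows that expansion; it is not how Drupal stores its mappings. The `bibo` and `dc` URIs come from the slides, the FOAF, SKOS and OWL URIs are the published namespaces for those vocabularies, and `expand` is a hypothetical helper.

```python
# Prefix -> namespace URI, mirroring the vocabularies named on the slide.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full URI, failing loudly on unknown prefixes."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"unknown or missing prefix in {prefixed_name!r}")
    return namespaces[prefix] + local


print(expand("bibo:Book"))
print(expand("owl:sameAs"))
```

Failing on an unregistered prefix is a deliberate choice here: a custom vocabulary has to be published and registered before its terms mean anything to anyone else.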
    • 17. Adding a Namespace to Drupal
    • 18. Setting up RDF Mappings in Drupal
    • 19. Databases: TL-2. Taxonomic Literature 2 (1977-2009) • The standard reference work for plant taxonomic literature from Linnaeus to 1940. • Contains botanists, authors, biographies, citations, and species. • Indexed and cross-referenced. • Should be digitized & on the web! • SIL aims to be an authority for botanist names on the Internet.
    • 20. TL-2 Page Sample: Taxonomic Literature 2 (TL-2), v. 1, p. 600
    • 21. TL-2 Page Sample
        http://library.si.edu/tl2/author/darwin
          tl2:creatorOf http://library.si.edu/tl2/book/1313
          owl:sameAs http://viaf.org/viaf/27063124
        http://library.si.edu/tl2/book/1313
          dc:creator http://library.si.edu/tl2/author/darwin
          owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
    • 22. TL-2 Page Sample
        http://library.si.edu/tl2/author/darwin (RDF Type = foaf:Person)
          foaf:lastName, foaf:familyName
          foaf:firstName, foaf:givenName
          foaf:name, skos:prefLabel
          tl2:birthYear
          tl2:deathYear
          tl2:description
          tl2:personAbbrev
        http://library.si.edu/tl2/book/1313 (RDF Type = bibo:Book)
          tl2:bookNumber
          dc:title
          event:place
          dc:publisher
          tl2:bookAbbreviation
          dc:created
    • 23. TL-2 Page Sample Results
        http://library.si.edu/tl2/author/darwin
          tl2:creatorOf “http://library.si.edu/tl2/book/1313”
          owl:sameAs “http://viaf.org/viaf/27063124”
          foaf:lastName “Darwin”
          foaf:familyName “Darwin”
          foaf:firstName “Charles”
          foaf:givenName “Charles”
          foaf:name “Darwin, Charles Robert”
          skos:prefLabel “Darwin, Charles Robert”
          tl2:birthYear “1809”
          tl2:deathYear “1882”
          tl2:description “British evolutionary biologist”
          tl2:personAbbrev “Darwin”
        http://library.si.edu/tl2/book/1313
          dc:creator “http://library.si.edu/tl2/author/darwin”
          owl:sameAs “http://www.archive.org/details/originofspecies00darwuoft”
          tl2:bookNumber “1313”
          bibo:shortTitle “On the origin of species”
          dc:title “On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life.”
          event:place “London”
          dc:publisher “John Murray”
          dc:created “1859”
          tl2:bookAbbreviation “Origin sp.”
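One way to see what these results would look like once published is to write a few of them out as N-Triples, the line-per-statement RDF format. This is only a sketch: the `tl2` namespace URI below is a placeholder invented for the example (the slides do not give one), and `to_ntriples` is an illustrative helper that handles only simple string escaping. It keeps the distinction between objects that are links and objects that are plain values explicit, since they are written differently.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical namespace URI for the tl2 vocabulary; an assumption for this sketch.
TL2 = "http://library.si.edu/tl2/ns#"

DARWIN = "http://library.si.edu/tl2/author/darwin"

# Each object is tagged as a link ("uri") or as plain data ("literal").
statements = [
    (DARWIN, TL2 + "creatorOf", ("uri", "http://library.si.edu/tl2/book/1313")),
    (DARWIN, OWL + "sameAs", ("uri", "http://viaf.org/viaf/27063124")),
    (DARWIN, FOAF + "name", ("literal", "Darwin, Charles Robert")),
    (DARWIN, TL2 + "birthYear", ("literal", "1809")),
]


def to_ntriples(items):
    """Serialize statements one per line: URIs in <>, literals in quotes."""
    lines = []
    for subject, predicate, (kind, value) in items:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


ntriples = to_ntriples(statements)
print(ntriples)
```

The `owl:sameAs` line is the “linked” part: because its object is a URI rather than a string, a consumer can follow it to VIAF and merge what it finds there.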
    • 24. Setting up TL-2 in Drupal • Two Content Types: Authors (Botanists) and Publications • Node Reference between Authors and Publications based on the TL-2 index. • Other data is available when it’s parsed: • Herbaria • Institutions • Species names • Bibliographies • Handwriting Samples • Postage Stamps
    • 25. Getting Data into Drupal • Create Content Types (Digital Library books & TL-2) • Create import process • May be able to use the Feeds module for import • Must create node references during the import. • Must accommodate the blocks of unparsed information in TL-2 • Create a search interface specifically for TL-2. Image Credits: Database: eponas-deeway (http://eponas-deeway.deviantart.com); Magnifying Glass: Flahorn (http://flahorn.deviantart.com/)
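The import step could be prototyped along the following lines: flatten the contractor’s XML into one CSV per content type, carrying the author’s identifier into each book row so that node references can be rebuilt on import. Everything about the XML layout here (the `tl2`, `author`, `name`, `book` and `title` elements and their attributes) is invented for the example, since the actual file format is not shown, and the column names are likewise placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input; the real contractor XML is not shown in the slides.
SAMPLE = """<tl2>
  <author id="darwin">
    <name>Darwin, Charles Robert</name>
    <book number="1313"><title>On the origin of species</title></book>
  </author>
</tl2>"""


def xml_to_csv(xml_text):
    """Return (authors_csv, books_csv); each book row keeps its author's id."""
    root = ET.fromstring(xml_text)
    authors_out, books_out = io.StringIO(), io.StringIO()
    authors = csv.writer(authors_out, lineterminator="\n")
    books = csv.writer(books_out, lineterminator="\n")
    authors.writerow(["author_id", "name"])
    books.writerow(["book_number", "title", "author_id"])
    for author in root.findall("author"):
        author_id = author.get("id")
        authors.writerow([author_id, author.findtext("name", default="")])
        for book in author.findall("book"):
            # The author_id column is what an importer would use to create
            # the reference from the book back to its author.
            books.writerow([book.get("number"), book.findtext("title", default=""), author_id])
    return authors_out.getvalue(), books_out.getvalue()


authors_csv, books_csv = xml_to_csv(SAMPLE)
print(authors_csv)
print(books_csv)
```

Two files keep the two content types separate, which matches the plan of importing Authors and Publications as distinct node types and linking them afterwards.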
    • 26. What else is there to do? • Resolve /node/22365.rdf and /tl2/author/charles-darwin • Handling "See also" and "Same as" entries in the TL-2 indexes. • Can we search our own data using SPARQL? Should we? Does it make sense? • Discuss/Extend vocabulary for our special needs. • Set up linked data within our site • image collections • trade literature • Exhibitions
    • 27. Other Resources • LinkedData.org • http://linkeddata.org/guides-and-tutorials • http://linkeddatabook.com/editions/1.0/ • Drupal Groups • http://groups.drupal.org/semantic-web • http://groups.drupal.org/libraries • Tim Berners-Lee, TED talks • Tim Berners-Lee on the next Web (2009) • The year open data went worldwide (2010)
    • 28. BHL is… • A consortium of 13 natural history and botanical libraries and research institutions • An open access digital library for legacy biodiversity literature. • An open data repository of taxonomic names and bibliographic information
    • 31. Benefits of open data: Allows data which was created for a specific purpose and audience to interact with other data to serve new, previously unimagined roles.
    • 32. What information have we opened up? Essentially, everything: our metadata (descriptive, rights, structural), our image files, scientific names, OCR’d files
    • 33. Technical methods for opening data • Data exports • APIs • OpenURL • OAI-PMH
    • 34. Who is reusing our data? • Tropicos • Rod Page – BioGUID, BioStor • Encyclopedia of Life • Ryan Schenk – Visualizing taxonomic synonyms
    • 35. Who is reusing our data? Tropicos
    • 36. Tropicos
    • 37. Who is reusing our data? Tropicos
    • 38. Who is reusing our data? Rod Page – BioGUID – http://bioguid.info/bhl/
    • 39. Who is reusing our data? Rod Page – BioStor – http://biostor.org/
    • 40. Who is reusing our data? Rod Page – BioStor – http://biostor.org/
    • 41. Who is reusing our data? Encyclopedia of Life – http://eol.org/
    • 42. Who is reusing our data? Encyclopedia of Life – http://eol.org/
    • 43. Who is reusing our data? Encyclopedia of Life – http://eol.org/
    • 44. Who is reusing our data? Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/
    • 45. Making open data successful • Promote it!
    • 46. Do a code challenge
    • 47. Publicly display your data’s copyright/licensing and API terms of service
    • 48. Thank You! Building the New Open Linked Library • Keri Thompson, Head of Web Services, Smithsonian Institution Libraries, thompsonk@si.edu , @DigiKeri_SIL • Joel Richard, Lead Developer, Smithsonian Institution Libraries, richardjm@si.edu • Trish Rose-Sandler, Data Analyst, Biodiversity Heritage Library, trisha.rose-sandler@mobot.org
