Piloting Linked Data to Connect Library and Archive Resources to the New World of Data, and Staff to New Skills


Presentation for the CNI (Coalition for Networked Information) Fall Forum, December 2012. Describes Emory University Library’s first-hand experience in interlinking Civil War-related materials and other online resources by leveraging open linked data principles. The library has been actively evaluating linked data’s potential to replace current library processes and services (bibliographic services, finding aids, cataloging, and metadata work) as a more efficient and sustainable means, and one that could bring greater benefit to end users for research and learning. The Library’s initial focus was on workforce education and hands-on learning through real-time experiments: the Connections project was begun to prepare staff to work with linked data, a process that has culminated in a 3-month hands-on pilot to build and convert some data. The pilot introduced the concept to a wide range of staff, including subject liaisons, archivists, metadata librarians, and programmers. Emory’s “silos” of data were interlinked with other open data sources as a way to enhance user discovery and use of library materials on a very limited scale.




Usage Rights

CC Attribution-ShareAlike License

  • "Connections: Piloting linked data to connect library and archive resources to the new world of data, and staff to new skills". Emory University Libraries (EUL) will share their first-hand experience in interlinking Civil War related materials and other online resources, leveraging Open Linked Data principles. As library linked data are emerging, EUL planned to evaluate linked data's potential to replace current library processes and services (bibliographic services, finding aids, cataloging, and metadata work) as a more efficient and sustainable means, and to understand its payoff for end user research and learning. We initially focused on workforce education and hands-on learning through real-time experiments. Over the last year, the Connections project has begun to educate staff and prepare them to work with linked data, culminating in a 3-month hands-on pilot to build and convert some data. The pilot introduced the concept to a wide range of staff, including subject liaisons, archivists, metadata librarians, and programmers. We interlinked our "silos" of data with other open data sources as a road to enhancing user discovery and use of our material, on a very limited scale. From our experience (insights, as well as some limitations and stumbling blocks) the group developed some better informed recommendations for more education and staff involvement, as well as potential incorporation of this technology into library services and workflows. Our experience will be particularly helpful for institutions that are awake to linked data's transformative potential and are making plans. We will share our assessment of the readiness of the entire linked open data ecosystem for libraries to cross-link disciplines, and the possible roles of libraries in a linked world. Beyond that, we plan to suggest some possible routes to help peers involve their staff with this new paradigm of information curation and dissemination.
  • Can linked data replace library bibliographic services?
    • How to start such an initiative
    • Readiness
    • Recommendations: Education; Integration; Potential areas of work/roles of librarians and staff
  • Reduce Duplicative Work (downloading, editing, creating holding records); Shorten Process Time (Knowledge Linking); Enhance Authority Control (This John Wang is from Emory); Give Library a Universal Attention (Web Scale); Help Libraries Achieve Missions
  • Than current biblio service tool sets; Not treat as an additional thing; Connect to the larger ecosystem; Convert/make some RDF. Show value.
  • Finished ILS Migration; Setup Cloud Services; Setup Biblio Vendor Services; How to get two division directors to sponsor the initiative
  • Now I'm going to talk a little about our experience, and some of the discoveries we made. What we've produced so far isn't that significant, compared to what some other institutions have done... but maybe this is one way to think about involving staff in learning and preparing for using linked data in libraries.
  • We started having classes toward the end of last year. As John explained, our library had a lot on its plate, and the people whom we wanted to involve are very busy. So we usually had brown bag lunches, every other week. This was a high-level overview, sort of the “ABC’s”: this is a triple, this is how SPARQL queries work, here’s what OWL can be used for, here are some things about publishing linked data that we need to think about. I worked with a graduate fellow to team-teach these. We had a core group that was asked to attend every time, but a lot of folks from across the library attended. By the time we got to brainstorming about a “pilot project” for the summer, the group had a lot of ideas...
  • And, we decided to try as many as we could with our pilot.
  • We chose to center our pilot around a small topic, The American Civil War, because we had some interesting resources and it’s the 150th anniversary... We wanted to show how linked data could link up our metadata “silos”, enhance our unique content, and integrate it with data from other places (manually and automatically). Of course, that would include DBPedia but also other archives, other sources of data specific to our theme, and perhaps even data we converted from other formats. We planned to have cool visualizations like maps and timelines. Maybe our data could contribute to a faculty project on the Battle of Atlanta. Oh, and also, we were going to build an interface to create metadata as “native RDF”. We would choose which data models we wanted to use. And, we would investigate which free or open source tools were most useful for doing this work. All, in 3 months... using enthusiastic but busy people who were not the “A team” (our actual developers). Oh, but we were only going to work on a small sample of our metadata – chosen carefully ahead of time, to “connect up well”...
  • So, this was very ambitious, but linked data sounded so simple! We were only able to accomplish a fraction of all we’d planned in the 3 months… We investigated Virtuoso and Sesame, and also Callimachus, a new “beta” web framework; we decided to go with Sesame because it had a web client people could use to run SPARQL queries and load data. By the way, our programmer Bernardo Gomez converted a copy of our ILS database to RDF (mostly Dublin Core); this wasn’t part of the pilot per se, but it was an interesting exploration. We transformed a small number of finding aids using the ArchivesHub stylesheet as a starting point (lots of modifications still needed!). We transformed a subset of MARCXML for some digitized books, using the LC stylesheet as a starting point (experimenting a little bit with RDA vocabularies but not getting too far into it). We made some N3 triples by hand, in Notepad, to describe images with no metadata, and included id.loc.gov links and DBPedia links. We retrieved id.loc.gov name/subject URLs via script. Our programmer looked at scripting some links to DBPedia based on our names and subjects but found this too involved to attempt for this pilot. Building a navigation interface turned out to be a bit too much to accomplish in 3 months, but we had some adventures along the way. At the end, a power outage corrupted our Sesame/OWLIM triplestore... so no live demo today, but we can rebuild it. We investigated lots of software (mostly free and open source) for display, navigation and publishing of linked data. Some of us were interested in using Drupal 7’s linked data capabilities to create a user interface, but we’re not sure this was the application we wanted. We had not planned to publish our data for this pilot! But we came to realize that if we wanted web-based tools such as LinkSailor to navigate our data, we’d have to publish it. We had fun with a few simple visualization tools that we could plug some of the data into directly.
  • This sequence illustrates the kind of connections we want to be able to make. I’m using sort of generic terms for the predicates, rather than any particular vocabulary, for simplicity. We go from a name/text string as a subject (in this case a person)
  • To a URI identifier which we came up with. With the ArchivesHub model we were able to see lots of “coined” URIs for entities such as names associated with an archive. We began to see some wisdom in having our own URIs which we can make assertions about (like, our URL identifies the same person as one from id.loc.gov) without having to assert things about other people’s data... But in this case, we don’t think anyone else has made an identifier for Mr. Mobley so we would have to. The number of URIs we would need to mint was kind of astonishing for us, one of the things we learned is, we need a strategy for this.
  • We can then assert that he’s a member of a particular Civil War regiment. We have “NACO Authority” strings...
  • From here we could link to a regimental history in our collection. And, if our URI for the regiment was linked to a DBPedia entity, we could link to whatever information Wikipedia has on it and navigate to other regiments and much more. And who knows what other data might link to the DBPedia entity?
  • Or, a user could explore other material in that MARBL manuscript collection, or in any other collection that had material on that regiment, or the Civil War...
  • So, what we learned on our summer non-vacation.... We spent too much time trying to select specific records to convert for our pilot. In the end, we loaded all our regimental histories and a subset of our finding aids, and a SPARQL query told us which ones had common subject headings. SPARQL is a skill that I think many librarians could start learning, by the way. There are plenty of SPARQL endpoints...
  • When we contrast ArchivesHub's "associatedWith" construction to express concepts - in this case, a person - with archives that have material about them,
  • With the very simple mapping to Dublin Core,
  • To Simile's MARC to MODS to RDF approach... And I haven't had time to play with the conversion the BIBFRAME project has come up with, yet, or you'd get a slide of that! You can see that there are a lot of choices. What we wondered was, were there enough similarities in the relationships that we could find some common models and vocabularies across our data? That would make querying easier...
  • From my own perspective, it can be a bit overwhelming to follow linked data just now... so much to learn, so much happening. But I think you just have to dive in. This has been a major focus for me this year: I've followed so many email lists and tried to keep up with projects here and in Europe, and I am starting to feel a sense of urgency about this. We need to be on board. At the same time, there are aspects of the "how" that are still unclear and difficult; provenance, tools, and vocabulary mappings are just a few.
  • So we learned a lot more in the pilot than in class. People were more engaged because they were doing an assignment they came up with. Those of us that worked with DBPedia could see real possibilities, both as a means to link to Wikipedia content and as a vocabulary in itself. String matching LC subjects didn't work very well. This needs to be a larger project (maybe it's already happening?), using algorithms but also, we think, some human review. By larger, I mean a community effort. Who's doing it?
  • We had challenges finding and using tools (especially the non-programmers); how do we find what’s new, what’s good, what do we need to build ourselves? I had been warned that there weren’t really good tools for non-programmers out there... but it was interesting how many new tools appeared in beta just in the 3 months we were doing the pilot. We just ran out of time to try them all. The pilot made some of us painfully aware that our web skills were out of date. Most of the group expressed regret that they didn't have more time to get involved in this project, but felt they got a lot out of it anyway. In our discussions we recognized a conflict between the desire to create more of our metadata as data, to provide more hooks, and the reality that we have limited staff working at capacity... we talked about crowdsourcing. We also need to explore how this would change the tools we are using to create metadata. Is it possible to make it easy to make more links?
  • One of our members said at one point, "this is really like a relational database, just not with tables" and from his perspective there's some truth to that, but we are starting to see that we can do way more with this than replicate what we're doing with relational databases, MARC, and XML. Linked data is not just all about “search”. We can make discoveries about our collections as a whole, but we can also link our content to the "things" it relates to and really weave them into the research environment for scholars, such as the articles appearing in UniProt and other scientific databases... As we look back, although we don't have a killer app yet, we've gotten a lot out of the last 3 months. We have our test triplestore and can begin to expand it bit by bit towards realizing some of our grand schemes... but we also have other ideas.
  • We also get a sense of our limitations. Some developments really call for big communities, maybe global effort. Who is going to host banks of shared transformations and vocabulary mappings? Some of us are interested in the social tools but our library isn't ready for that right now, however we can begin to feel out our faculty and students about their interest.
  • Our sponsors haven’t made decisions on where we go next. I’m pretty sure we’re not in a position yet to invest more staff time, but: I think many of our original ambitions for the Civil War pilot could be achieved if we can continue at a slow pace, one step at a time. There’s also some interest in at least demoing, interlinking our Primo discovery layer with DBPedia. We want to continue learning and broaden the participation of staff at the library, coordinate with more people in our Systems division - We know of at least one faculty digital scholarship project that our programmers are involved in that uses linked data, and we'd like to open our information sharing group to others at the University.
  • Management (learning and experience). Technical and learning aspect: different publishing methods; technology readiness; ecosystem readiness; users' perspective on what they get from the library and how they might use the data; who should learn, in the conversation
  • (enable linking and creation of linked data)
  • Given the “infrastructure” of global LD isn’t “mature” yet, why not wait for big players to sort it out? What can we do now? (our project was an attempt to answer)
    • Our library is “pinched” for staff time – what can we do?
    • Who in your organization do you get involved in learning/transition, and when? (our project started from systems and “tech services” but public services folks came in and we discovered we need them! Everybody!)
    • Is LD only “big data” – or is small data a part?
    • How can we get data (metadata) in RDF when we don’t have it?
    • Standardization? Who decides?
    • Tools for everyone! Who will build?
    • Where is the community? We need X.....
    • Big jobs – e.g. linking LCSH to Wikipedia
    • Share info on tools (dlf Zot group – no traction – what would work?)
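
The chain of connections described in the notes above (name string, to a locally minted URI, to a regiment, to external identifiers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pilot's actual data or tooling: the `http://example.org/` URIs, the short predicate names, and the helper functions `match` and `regiment_links` are all hypothetical, and a plain list of tuples stands in for a real triplestore such as Sesame. The id.loc.gov and DBPedia URLs are the ones shown on the slides.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical placeholder URIs; the pilot minted its own identifiers.
EX = "http://example.org/"
RESOURCE = EX + "resource/item1"
MOBLEY = EX + "person/MobleyT1"
REGIMENT = EX + "org/48th-georgia-infantry"

# Generic predicate names, as in the talk, rather than a real vocabulary.
TRIPLES: List[Triple] = [
    (RESOURCE, "hasSubject", MOBLEY),
    (MOBLEY, "label", "Mobley, Thomas"),
    (MOBLEY, "memberOf", REGIMENT),
    (REGIMENT, "sameAs", "http://id.loc.gov/authorities/names/n99264720"),
    (REGIMENT, "sameAs", "http://dbpedia.org/page/48th_Georgia_Volunteer_Infantry"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def regiment_links(triples: List[Triple], resource: str) -> List[str]:
    """Follow hasSubject -> memberOf -> sameAs starting from one resource."""
    links: List[str] = []
    for _, _, person in match(triples, s=resource, p="hasSubject"):
        for _, _, regiment in match(triples, s=person, p="memberOf"):
            links.extend(obj for _, _, obj in match(triples, s=regiment, p="sameAs"))
    return links


if __name__ == "__main__":
    for link in regiment_links(TRIPLES, RESOURCE):
        print(link)
```

Because the local regiment URI carries the `sameAs` assertions, the external links can be added or withdrawn without asserting anything about other people's data, which is the point made in the notes about minting local URIs.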

Presentation Transcript

  • Connections: Piloting linked data to connect library and archive resources to the new world of data, and staff to new skills. CNI Fall Meeting, December 11, 2012. Laura Akerman, Metadata Librarian, Robert W. Woodruff Library, Emory University; Zheng (John) Wang, AUL, Digital Access, Resources, and IT, Hesburgh Library, Notre Dame University
  • Who has presented most frequently at CNI?
  • Current Model: Search and Discover
  • Metadata Published as Documents
  • Require Human to Decipher
  • Linked Data Model: Find
  • Semantic Graph Model
  • Machine Understands Semantics
  • RDF Triple: Subject, Predicate, Object
  • RDF Triple: Laura (subject), Lecture (predicate), Connection (object)
  • RDF Triples (graph diagram; nodes and edge labels as extracted): Laura, John, Connection, CNI, 2012; Know, Lecture, Place, Year
  • Relevant to What We Do: Reuse, Authority Control, Knowledge Linking...
  • Connections Pilot: To Interlink EAD, Catalog, and Other External Resources
  • Connections: Context. Little Time to Learn Additional New Things
  • Hands-on learning
  • Ingredients
    • Leader/teacher/evangelist
    • Learning group – open to all
      o 2 "classes" a month, 5 months.
    • Pilot: 3 months
      o Brainstorming a pilot project
      o Start small
      o Team: programmer, subject liaison, metadata specialists, archivist, digital curator, fellow.
      o 1-3 hrs/week for all but leader
      o A sandbox running Linux
  • [Brainstorm diagram; labels as extracted] Civil War; Our Own Triplestore; SPARQL; User interface; Navigation; Timelines; Maps; Crowdsourcing; RDF from EAD; RDF from TEI; RDF from MARCXML; Rosters (and MARC); id.loc.gov; DBPedia; Integrate linked data into discovery layer (catalog)?; Faculty project; Other CW data; 150; Data from other archives; National Park Service Data; Redesign metadata creation as RDF
  • 3 months later...
  • Sampling little bites of the meal: EAD (starting from ArchivesHub stylesheet); id.loc.gov URIs for LC subjects and names (scripted); MARCXML (starting from LC DC stylesheet); Make some RDF metadata; DBPedia/subjects (by hand); Sesame triplestore; Visualization – Simile Welkin
  • A few of the connections... HTTP:OurResourceURL HasSubject "Mobley, Thomas"
  • HTTP:OurResourceURL HasSubject (rdfs:resource) HTTP://OurPersonMobleyT1; HTTP://OurPersonMobleyT1 rdfs:label "Mobley, Thomas"
  • HTTP:OurPersonMobleyT1 (hasSubject) memberOf "Confederate States of America. Army. Georgia Infantry Regiment, 48th"
  • HTTP:Our Mobley Tom1 (hasSubject) memberOf 48th Georgia Infantry http://id.loc.gov/authorities/names/n99264720 (hasSubject) sameAs DBPedia: http://dbpedia.org/page/48th_Georgia_Volunteer_Infantry
  • isPartOf heldBy Confederate miscellany collection, 1860-1865
  • We learned: Selecting material that will “link up”, without SPARQL, is too hard! Even when items are in a unified “discovery layer”, the types of search are limited. Get it into triples, then find out!
  • We learned: There are many ways of modeling data. (No one model to follow has emerged. We have to think about this ourselves.)
  • ArchivesHub handles subjects:
    <associatedWith>
      <!--About the Concept (Person)-->
      <skos:Concept xmlns:skos="http://www.w3.org/2004/02/skos/core#" rdf:about="http://duchamp.library.emory.edu/resource/id/concept/person/lcnaf/gearyjohnwhite1819-1873">
        <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">Geary, John White, 1819-1873.</rdfs:label>
        <skos:inScheme>
          <skos:ConceptScheme rdf:about="http://duchamp.library.emory.edu/resource/id/conceptscheme/lcnaf">
            <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">lcnaf</rdfs:label>
          </skos:ConceptScheme>
        </skos:inScheme>
        <foaf:focus xmlns:foaf="http://xmlns.com/foaf/0.1/">
          <!--About the Person-->
          <foaf:Person rdf:about="http://duchamp.library.emory.edu/resource/id/person/lcnaf/gearyjohnwhite1819-1873">
            <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
            <rdf:type rdf:resource="http://purl.org/dc/terms/Agent"/>
            <rdf:type rdf:resource="http://erlangen-crm.org/current/E21_Person"/>
            <rdfs:label xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:lang="en">Geary, John White, 1819-1873.</rdfs:label>
          </foaf:Person>
        </foaf:focus>
      </skos:Concept>
    </associatedWith>
  • LC's MARCXML to RDF/Dublin Core: dc:subject "Geary, John White, 1819-1873."
  • Simile MARC to MODS to RDF:
    <modsrdf:subject rdf:resource="http://simile.mit.edu/2006/01/Entity#Geary_John_White_18191873"/>
    <rdf:Description rdf:about="http://simile.mit.edu/2006/01/Entity#Geary_John_White_18191873">
      <rdf:type rdf:resource="http://simile.mit.edu/2006/01/ontologies/mods3#Person"/>
      <modsrdf:fullName>Geary, John White </modsrdf:fullName>
      <modsrdf:dates>1819-1873</modsrdf:dates>
    </rdf:Description>
  • We learned: Linked data is HUGE. It’s coming at us FAST. It’s not “cooked” yet.
  • More learnings
    • We learned more by doing than by "class".
    • Making DBPedia mappings or links by hand is very time consuming! We need better tools.
    • We need to spend a lot more time learning about OWL, and linked data modeling.
  • Challenges
    • Easily available tools are not ideal!
    • Skills we needed more of: HTML5, CSS, JavaScript
    • Time!
    • Visualization/killer app not there yet.
    • Can't do things without the data! No timeline if no dates!
  • What we got out of it
    • Test triplestore for training and more development
    • Better ideas on what to pilot next
    • Convinced some doubters
    • "Gut knowledge" about triples, SPARQL, scale
    • Beginning to realize how this can be so much more than a better way to provide "search"
  • Outside our reach for now
    • Transform ILS system to use triple store instead of MARC
    • Create hub of all data our researchers might want
    • Make a bank of shared transformations for EAD, MARC, etc.
    • Shared vocabulary mappings
    • Social/networking aspect (e.g. Vivo, OpenSocial...) - need a culture shift?
  • Next? Maybe...
    • Build user navigation?
    • More Civil War triples including other local institutions’ stuff?
    • Publishing plan?
    • Integrate ILS with DBPedia links?
    • Suite of “portal tools” for scholars?
    • Use linked data for crowdsourcing metadata?
    • More classes?
    • Connect with others at Emory around linked data
  • Recommendation: Individual Institutions
    • Focus on unique digital content
    • Publish unique triples
    • Reuse existing linked data
  • Recommendation: Community
    • Create standards or best practices
    • Grow our skills
    • Test and evaluate tools
    • Develop tools
  • Recommendation: Librarians’ Role?
    • Interdisciplinary linking?
    • Metadata librarians - Linking association and normalization
  • Acknowledgements
    • Connections group sponsors: Lars Meyer, John Ellinger
    • Connections Pilot team: Laura Akerman (leader), Tim Bryson, Kim Durante, Kyle Fenton, Bernardo Gomez, Elizabeth Roke, John Wang
    • Fellows who joined us: Jong Hwan Lee, Bethany Nash
    • Our website: https://scholarblogs.emory.edu/connections/
    • Laura Akerman, liblna@emory.edu
    • John Wang, Zheng.Wang.257@nd.edu
  • Thanks Q&A
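
The "Get it into triples, then find out!" lesson from the slides can be illustrated with a small sketch: once records are expressed as triples, finding which ones share a subject heading is a short query. The code below is a hedged illustration only. The `ex:` identifiers, the sample data, and the function `shared_subjects` are hypothetical, and the SPARQL string is an assumed equivalent of the kind of query the pilot describes, not the query the team actually ran.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# An assumed SPARQL equivalent, shown for comparison only (not executed here).
SPARQL_SKETCH = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?a ?b ?subject WHERE {
  ?a dc:subject ?subject .
  ?b dc:subject ?subject .
  FILTER (STR(?a) < STR(?b))
}
"""

# Hypothetical sample triples; a real load would come from converted EAD/MARCXML.
TRIPLES: List[Tuple[str, str, str]] = [
    ("ex:regimental-history-1", "dc:subject", "Geary, John White, 1819-1873."),
    ("ex:finding-aid-1", "dc:subject", "Geary, John White, 1819-1873."),
    ("ex:finding-aid-2", "dc:subject", "Mobley, Thomas"),
]


def shared_subjects(triples: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Return (resource_a, resource_b, heading) for each pair sharing a heading."""
    by_subject: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "dc:subject":
            by_subject[o].add(s)
    pairs: List[Tuple[str, str, str]] = []
    for heading in sorted(by_subject):
        # Sorting makes the output order deterministic.
        for a, b in combinations(sorted(by_subject[heading]), 2):
            pairs.append((a, b, heading))
    return pairs


if __name__ == "__main__":
    for a, b, heading in shared_subjects(TRIPLES):
        print(a, "and", b, "share", repr(heading))
```

Note that this compares headings as exact strings, so it inherits the string-matching weakness mentioned in the speaker notes; linking headings to shared URIs (for example from id.loc.gov) would make the comparison more reliable.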