Practical approaches to entification in library bibliographic data

Presentation from C4LMW 2015

  1. 1. Practical approaches to entification in library bibliographic data
  2. 2. BIBFRAME = Internet of Things • BIBFRAME is the model, but the devil is in the details • Reconciliation with legacy data • Different flavors of the model (kind of like different flavors of MARC, but not really) • How do we make our data semantic-web friendly? • How do we build links (down with strings!)? • What services do we trust, and are these services available yet? • How do we experiment to start learning what works and what doesn’t?
  3. 3. Where do you start? • If you are a developer? • The current toolset is built for you. LC’s tools, SPARQL, system APIs – as a developer, the raw components that you need to start pulling together toolsets for experimentation can be found if you look for them. • If you are a cataloger? • Find a developer, or start writing scripts yourself… • Today, very few resources are being developed for practitioners. Zepheira has a training set and is sponsoring LibHub, LC’s BIBFRAME site provides examples of data in context, and there is MarcEdit.
  4. 4. Linked Data Tools in MarcEdit • MARCNext • The MARCNext toolset represents an effort to begin creating a set of tools that can integrate into existing workflows for libraries and catalogers interested in testing or implementing linked data concepts within their bibliographic environments today. • Directly in MarcEditor • Integration of the Linked Data tool as part of the cataloger’s workflow • Via the command line • As part of cmarcedit.exe, e.g.: cmarcedit.exe -s [sourcefile] -d [destfile] -linkeddata
  5. 5. MarcEdit’s MARCNext Toolset
  6. 6. MarcEdit’s MARCNext Toolset • Main motivations for making this available • To expose part of a larger framework presently within MarcEdit that supports my research interests in emerging metadata models and linked data concepts in general • To place tools in the hands of catalogers, who are largely pushed to the sidelines when thinking about issues like BIBFRAME and Linked Data • To lower the barriers for those interested in experimenting with their own data
  7. 7. MarcEdit’s MARCNext Toolset • BIBFRAME Testbed: a tool that uses LC’s XQuery transformations to let users visualize their own metadata in various BIBFRAME serializations. • JSON Object View: a tool that lets users open a JSON file and visualize the relationships between objects. • Link Identifiers: a tool that catalogers can use now to embed URIs into the $0 of controlled terms. • SPARQL Browser: a spartan interface for users wanting to test SPARQL endpoints.
  8. 8. Link Data Tool • The Last Mile Problem: To take advantage of metadata models designed for the web, someone will need to “link” the data. • EZ-Entification: Takes advantage of the current MARC structure to embed $0s into the 1xx, 6xx, and 7xx fields. • Process supports the generation of links to a wide range of authority sources. • Presently: • VIAF • ID.LOC.GOV • FAST • MESH • Embedding OCLC Work IDs into records
  9. 9. Link Data Tool • How it works • In March 2015, I formalized support for linked data resources and created the melinked_data.dll assembly. This assembly is the engine that drives MarcEdit’s Linked Data work. • Within the assembly is a resolution framework, designed to enable plug & play networks for eventual user definition of new linked data services. • The framework has been designed to support SPARQL, JSON-LD, and OpenSearch (with Atom or RSS responses). • As part of the tool, the resolution algorithm has multiple validation layers, with basic data normalization to ensure optimal communication with the current linking services.
  10. 10. Link Data Tool • So what gets linked? • The tool looks for specific values • VIAF and LCNAF linking occurs on 1xx and 7xx data elements • Subject linking occurs on all 6xx fields • Linking services are automatically selected and processed using the data found in the second indicator and the $2. • When working with known services, the tool will evaluate any data found in the $0 and, if a URI isn’t present, will update the value appropriately
  11. 11. Link Data Tool • Creating Actionable Data • $0 defined as: Authority record control number or standard number (R) • The Linked Data Tool sets this definition aside and uses URIs instead (and will actively convert control numbers to URIs) • Example: =650 7$aMedical policy.$2fast$0(OCoLC)fst01014505 • Converted to: =650 7$aMedical policy.$2fast$0http://id.worldcat.org/fast/1014505
  12. 12. Link Data Tool • Challenges • Are the Linking Services ready? • Honestly, many of these services are still evolving. Will a VIAF identifier continue to make the most sense when linking to OCLC person data, or will the Person Identifiers that they talked about at ALA be more appropriate? • Id.loc.gov doesn’t handle redirects well through the API – there is (or was, last time I tested) a disconnect between terms that have been replaced.
  13. 13. Link Data Tool • Challenges • Linking will also be local – and how will those services be implemented? I’m hoping SPARQL, but my experience has been all over the map. • Where is OCLC in all of this? They are working hard on their own internal data streams, but it’s actually groups like Zepheira, BibFlow, and LD4P that are actively engaging catalogers.
  15. 15. Source Code • Zepheira BIBFRAME Testing Plugin Code • Code is provided minus the API key • Includes the linked data assembly from ME • http://marcedit.reeset.net/software/plugins/source/libhub.zip
  16. 16. Contact Me: Terry Reese, Head of Digital Initiatives, University Libraries, 175 West 18th Avenue, 320F 18th Avenue Library, Columbus, OH 43210 • 614-292-8263 Office / 614-407-4998 Mobile • reese.2179@osu.edu / http://library.osu.edu / http://reeset.net
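
The routing described on slide 10, where the second indicator and the $2 decide which linking service handles a heading, might look something like the sketch below. This is not MarcEdit's actual logic: the function name `pick_service`, the service labels, and the `$2` lookup table are hypothetical and chosen only for illustration. The second-indicator meanings used for 6xx fields (0 = LCSH, 2 = MeSH, 7 = source named in $2) come from MARC 21.

```python
from typing import Optional

# Hypothetical mapping from $2 source codes to linking-service labels.
SOURCE_TO_SERVICE = {
    "fast": "fast",
    "mesh": "mesh",
    "lcsh": "id.loc.gov",
}


def pick_service(tag: str, ind2: str, subfield_2: Optional[str] = None) -> Optional[str]:
    """Pick a linking service for a field, or None if the field is not linked.

    tag        -- three-character MARC tag, e.g. "650"
    ind2       -- second indicator as a one-character string
    subfield_2 -- contents of $2, if the field has one
    """
    # Names: the slides say VIAF and LCNAF linking happens on 1xx and 7xx.
    if tag.startswith(("1", "7")):
        return "names"

    # Subjects: the slides say all 6xx fields are candidates.
    if tag.startswith("6"):
        if ind2 == "0":  # MARC 21: Library of Congress Subject Headings
            return "id.loc.gov"
        if ind2 == "2":  # MARC 21: Medical Subject Headings
            return "mesh"
        if ind2 == "7" and subfield_2:  # MARC 21: source specified in $2
            return SOURCE_TO_SERVICE.get(subfield_2.strip().lower())

    return None
```

For the FAST heading shown on slide 11, `pick_service("650", "7", "fast")` selects the FAST service; a 650 with a second indicator of 7 and an unrecognized $2 is simply left alone.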

Editor's Notes

  • 2) I don’t want to under-emphasize this first point – especially as libraries are doing work around RDA with legacy or vendor data – or looking at ways to experiment with Linked Data or BIBFRAME or whatever comes next. Catalogers tend to be the odd folks out – they create the metadata, but the tools and processes created to migrate, test, and model new or emerging library metadata standards tend not to be produced with this group in mind. That makes it very difficult for catalogers to engage in these conversations, because they are looking from the outside in, without the ability to work with data and begin to explore for themselves how these changes will manifest themselves. I spend a lot of time really thinking about the role MarcEdit can play in changing that.

    3) There is a lot of great work being done in libraries – but the tools being created are not being created with librarians in mind. They are often difficult to use and have specific dependencies that sometimes take domain knowledge to resolve – basically, they are not created to be easy. These data models can be challenging…our tools shouldn’t be.
  • Why would we do this:
    If the purpose is to create data that supports linked data applications, our data needs to shed any requirement for domain-specific knowledge simply to resolve an entity. As much as I like OCLC embedding FAST headings with a $0 in their data – it’s essentially useless data everywhere. I can’t use it in my catalog (because the ILS doesn’t do anything with this field), and if I take it outside of my system, I couldn’t do a damn thing with it unless I convert the data. And if I’m sharing this information, I’m forced to convert the data because it’s meaningless to anyone outside of a library.
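
The conversion argued for here, and shown on slide 11, can be sketched in a few lines. This is a minimal illustration and not the code inside melinked_data.dll: the function names and the regular expression are assumptions, and only the FAST pattern from the slide's example is handled. It also assumes MarcEdit's mnemonic format, where `$` separates subfields.

```python
import re

# Pattern and base URI inferred from the slide's single FAST example.
FAST_CONTROL_NUMBER = re.compile(r"^\(OCoLC\)fst0*(\d+)$")
FAST_BASE_URI = "http://id.worldcat.org/fast/"


def control_number_to_uri(value: str) -> str:
    """Turn a FAST control number into a URI; return anything else unchanged."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return value  # already a URI, nothing to do
    match = FAST_CONTROL_NUMBER.match(value)
    if match:
        return FAST_BASE_URI + match.group(1)
    return value  # unknown scheme: leave the data alone


def convert_field(line: str) -> str:
    """Rewrite every $0 in one mnemonic-format field line."""
    head, *subfields = line.split("$")
    converted = []
    for subfield in subfields:
        code, value = subfield[:1], subfield[1:]
        if code == "0":
            value = control_number_to_uri(value)
        converted.append(code + value)
    return "$".join([head, *converted])


if __name__ == "__main__":
    print(convert_field("=650 7$aMedical policy.$2fast$0(OCoLC)fst01014505"))
```

Values that do not match a known scheme pass through untouched, so running the sketch over a record a second time changes nothing.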