#mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit


Slides for #mashcat webinar

Published in: Education


  1. 1. #mashcat: Evolving MarcEdit LEVERAGING SEMANTIC DATA IN MARCEDIT
  2. 2. Little History MarcEdit Development started around 1999ish (as parts) ◦ Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM). ◦ I started writing it as an undergraduate to better understand MARC & circumvent OCLC’s Passport for Windows program ◦ First “MarcEdit” was released Sept. 11, 2000 (thank you WayBack Machine: http://web.archive.org/web/20001017105529/http://ucs.orst.edu/~reeset/marcedit/indexb.html) Today: ◦ Written in C# (Windows/Linux) & Objective-C/C# (OSX) ◦ Active user community is ~20,000ish (based on update logs) ◦ Used in ~190ish countries/political regions ◦ Roughly 1/3 of the users reside outside of Canada/United States* * Based on loose analysis of server logs by my server-side stats software
  3. 3. MarcEdit Evolution [images: MarcEdit 1.0-2.0 Main Window; MarcEdit MARC Tools 1.0-2.0; MarcEdit 1.0-2.0 MarcEditor]
  4. 4. MarcEdit Evolution Early application was developed to (again, thank you Internet Archive): 1. Be user-friendly (whether I’ve accomplished that is debatable – I’m not a UI designer) 2. Support LC’s MARCBreakr/Maker diacritics (largely yes) 3. Be fast (which I think it is) 4. Simplify editing records in batch 5. Provide a set of programming tools to solve my own local needs
  5. 5. MarcEdit Today
  6. 6. Three development rules I follow MarcEdit is a real-world metadata tool ◦ Tool is designed to provide workflows for data problems facing libraries right now MarcEdit is MARC Agnostic ◦ Too many metadata tools are anglo-centric; MarcEdit has been designed to work within the very heterogeneous metadata environment that we find ourselves in today, which includes: ◦ Support for MARC (not a particular flavor*) ◦ Near universal character set support (because the world is bigger than MARC8 and UTF8) ◦ Supports a wide range of Library metadata standards beyond MARC MarcEdit is one part of the larger library metadata tooling environment ◦ So integrations with OCLC, ILSs (when possible), OpenRefine are important * And if something assumes MARC21 – call it out
  7. 7. So how does any of this relate to semantic data in Libraries? http://musictheorysite.com/img/dwight_question.jpg
  8. 8. A lot of metadata people I talk to fall into two camps
  9. 9. BibFrame and Linked Data as RDA 2.0 BibFrame http://www.wired.com/wp-content/uploads/archive/news/images/full/duke_nukem_frever_f.16807.jpg http://astronomy.nmsu.edu/cwc/Group/magiicat/images/magiicat-logo.gif Linked Data
  10. 10. BibFrame and linked data as datacorns https://whatsthebigdata.files.wordpress.com/2015/10/datascience_unicorn.png?w=640
  11. 11. I prefer a more practical outlook… https://www.etsy.com/search?q=unicorn+cat+hat
  12. 12. MarcEdit’s MARCNext MarcEdit’s MARCNext is a first attempt to start having this discussion by: 1. Integrating a linked data framework into MarcEdit, including tooling for: a. JSON-LD b. SPARQL c. RDF 2. Providing catalogers with proof of concept tools to begin experimenting with their own data 3. Providing a method to integrate semantic concepts into legacy data 4. Providing a toolset that MarcEdit can use to build new tools.
  13. 13. Let’s take a closer look at two tools: Link Identifiers Tool ◦ This tool embeds URIs into MARC data ◦ Is rules driven (i.e., not MARC21 centric) ◦ Supports ~24 different in-use data sources Validate Headings Tool ◦ First tool in MarcEdit to make use of the tool’s linked data platform and available data services to provide a real-world application.
  14. 14. Link Identifiers Tool
  15. 15. Link Identifiers Tool Initially released in Aug. 2014[1] as a proof of concept for testing the linked data framework being developed in MarcEdit ◦ Initially only processed LCSH and NAF Currently, I’ve profiled ~24 data sources, and the tool can be integrated in MarcEdit’s Task Workflow. ◦ Translation profiles are currently in flux, as I work with a PCC group developing recommendations for embedding URIs in MARC records. ◦ Working on a process that would allow users to self-profile identifier services, so long as they support JSON-LD or SPARQL. [1] MarcEdit’s Research Toolkit: MARCNext: http://blog.reeset.net/archives/1359
  16. 16. Link Identifiers Tool Tool has evolved over the last year to utilize a rules based configuration (example):
      <field type="bibliographic">
        <tag>630</tag>
        <ind2 value="0" vocab="naf_lcsh" />
        <ind2 value="1" vocab="lcshac" />
        <ind2 value="2" vocab="mesh" />
        <subfields>adfkqnp</subfields>
        <uri>0</uri>
        <special_instructions>mixed</special_instructions>
      </field>
      <field type="authority|bibliographic">
        <tag>336</tag>
        <subfields>a</subfields>
        <index>2</index>
        <uri>0</uri>
      </field>
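
A rules file shaped like the example on this slide could be read roughly as follows. This is only a sketch: the `<rules>` wrapper element and the `load_rules` function are assumptions made for illustration, and the slides do not show how MarcEdit itself loads or interprets these rules. Element values are kept under their original names rather than interpreted.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <rules>; the slide shows only the <field> entries.
RULES_XML = """
<rules>
  <field type="bibliographic">
    <tag>630</tag>
    <ind2 value="0" vocab="naf_lcsh" />
    <ind2 value="1" vocab="lcshac" />
    <ind2 value="2" vocab="mesh" />
    <subfields>adfkqnp</subfields>
    <uri>0</uri>
    <special_instructions>mixed</special_instructions>
  </field>
  <field type="authority|bibliographic">
    <tag>336</tag>
    <subfields>a</subfields>
    <index>2</index>
    <uri>0</uri>
  </field>
</rules>
"""

def load_rules(xml_text):
    """Return {tag: rule dict} for every <field> element in the rules text."""
    rules = {}
    for field in ET.fromstring(xml_text).iter("field"):
        tag = field.findtext("tag")
        rules[tag] = {
            # type="authority|bibliographic" lists the record types a rule covers
            "record_types": field.get("type", "").split("|"),
            # second-indicator value -> vocabulary name
            "ind2_vocab": {i.get("value"): i.get("vocab") for i in field.findall("ind2")},
            "subfields": field.findtext("subfields", default=""),
            "uri": field.findtext("uri"),
            "index": field.findtext("index"),  # None when the element is absent
            "special_instructions": field.findtext("special_instructions"),
        }
    return rules

rules = load_rules(RULES_XML)
print(rules["630"]["ind2_vocab"])
```

Keying the rules by tag keeps the lookup per field cheap, and nothing in the loader assumes a particular MARC flavor.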
  17. 17. Linked Identifiers: Turning strings
      =336 $atext$btxt$2rdacontent
      =337 $aunmediated$bn$2rdamedia
      =338 $avolume$bnc$2rdacarrier
      =600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.
      =650 0$aGenerals$zChina$vBiography.
      =650 0$aGenerals$zTaiwan$vBiography.
      =600 17$aHu, Zongnan,$d1896-1962.$2fast$0(OCoLC)fst00131171
      =650 7$aGenerals.$2fast$0(OCoLC)fst00939841
      =651 7$aChina.$2fast$0(OCoLC)fst01206073
      =651 7$aTaiwan.$2fast$0(OCoLC)fst01207854
      =655 7$aDiaries.$2lcgft
      =655 7$aAutobiographies.$2lcgft
  18. 18. Linked Identifiers: into strings+
      =336 $atext$btxt$2rdacontent$0http://id.loc.gov/vocabulary/contentTypes/txt
      =337 $aunmediated$bn$2rdamedia$0http://id.loc.gov/vocabulary/mediaTypes/n
      =338 $avolume$bnc$2rdacarrier$0http://id.loc.gov/vocabulary/carriers/nc
      =600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.$0http://id.loc.gov/authorities/names/n84029846
      =650 0$aGenerals$zChina$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105087
      =650 0$aGenerals$zTaiwan$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105117
      =600 17$aHu, Zongnan,$d1896-1962.$2fast$0http://id.worldcat.org/fast/00131171
      =650 7$aGenerals.$2fast$0http://id.worldcat.org/fast/00939841
      =651 7$aChina.$2fast$0http://id.worldcat.org/fast/01206073
      =651 7$aTaiwan.$2fast$0http://id.worldcat.org/fast/01207854
      =655 7$aDiaries.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026085
      =655 7$aAutobiographies.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026047
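
The before/after slides show two kinds of change: FAST control numbers in $0 become id.worldcat.org URIs, and fields without a $0 gain one. A rough sketch of both steps on the flattened field strings from the slides is below. The function names are hypothetical and this is not MarcEdit's code; a real tool would work on parsed fields rather than raw strings.

```python
import re

# Matches the FAST control numbers shown on the "before" slide,
# e.g. $0(OCoLC)fst00939841
FAST_CONTROL_NUMBER = re.compile(r"\$0\(OCoLC\)fst(\d+)")

def fast_control_number_to_uri(field):
    """Rewrite a FAST control number in $0 as an id.worldcat.org URI."""
    return FAST_CONTROL_NUMBER.sub(r"$0http://id.worldcat.org/fast/\1", field)

def append_uri(field, uri):
    """Add a $0 carrying the URI unless the field already has a $0."""
    return field if "$0" in field else field + "$0" + uri

before = "=650 7$aGenerals.$2fast$0(OCoLC)fst00939841"
after = fast_control_number_to_uri(before)
print(after)

content_type = append_uri(
    "=336 $atext$btxt$2rdacontent",
    "http://id.loc.gov/vocabulary/contentTypes/txt",
)
print(content_type)
```

The URI passed to `append_uri` would come from whichever lookup the matching rule selects; here it is simply copied from the slide.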
  19. 19. Example
  20. 20. Linked Data tools Things that are still hard: ◦ Most identifier services use their own rules for data escaping – and they aren’t documented ◦ Many services are still not well suited for this work ◦ Anything that doesn’t provide an option to do an exact lookup like ULAN, AAT, or VIAF – all these require additional processing to ensure that results match the queried term. ◦ Many services are little “p” production in that lots of look-ups can (and do) cause problems.
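
The "additional processing" needed for services without an exact lookup can be pictured as a filter over the candidates a keyword search returns. The sketch below is an assumption about one reasonable way to do this, not a description of MarcEdit's matching: the candidate shape (`label`/`uri` dictionaries), the example.org URIs, and the normalization rules are all hypothetical.

```python
import unicodedata

def normalize(label):
    """Fold case, Unicode form, whitespace and trailing punctuation for comparison."""
    text = unicodedata.normalize("NFKC", label).casefold()
    text = " ".join(text.split())
    return text.rstrip(" .,;:")

def exact_matches(term, candidates):
    """Keep only candidates whose label equals the queried term after normalization."""
    wanted = normalize(term)
    return [c for c in candidates if normalize(c["label"]) == wanted]

# Hypothetical keyword-search results; real services return their own formats.
candidates = [
    {"label": "Generals", "uri": "http://example.org/term/1"},
    {"label": "Generals' spouses", "uri": "http://example.org/term/2"},
]
matches = exact_matches("Generals.", candidates)
print(matches)
```

A URI would be embedded only when exactly one candidate survives; zero or several matches are safer reported than guessed.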
  21. 21. Validate Headings Automated authority control processing ◦ Utilizes id.loc.gov ◦ Provides reports of data that isn’t currently “authorized” ◦ Provides options for generating brief authorities ◦ Extracts for further data processing ◦ Ability to embed URIs during validation ◦ If URIs are present – they are used rather than a direct look up ◦ Automatic heading correction when variants are encountered
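
The decision flow in the list above (prefer an embedded URI, otherwise look the heading up, correct variants, report what stays unmatched) might be sketched like this. Everything here is illustrative: the function names, the `(code, value)` subfield representation, and the in-memory stand-ins for the authority service are assumptions, and only $a is compared to keep the example short.

```python
def validate_heading(subfields, resolve_uri, lookup_label):
    """Check one heading field given as a list of (code, value) pairs."""
    heading = next((v for c, v in subfields if c == "a"), "")
    uri = next((v for c, v in subfields if c == "0" and v.startswith("http")), None)

    if uri is not None:
        # An embedded URI is used instead of a direct label lookup.
        authorized = resolve_uri(uri)
    else:
        hit = lookup_label(heading)
        authorized, uri = hit if hit else (None, None)

    if authorized is None:
        return {"status": "unmatched", "heading": heading, "uri": None}
    status = "authorized" if authorized == heading else "corrected"
    return {"status": status, "heading": authorized, "uri": uri}

# Hypothetical in-memory stand-ins for an authority service (not real id.loc.gov data).
AUTHORIZED = {"http://example.org/auth/1": "Generals"}
VARIANTS = {
    "Generals": "http://example.org/auth/1",
    "Army generals": "http://example.org/auth/1",
}

def resolve_uri(uri):
    return AUTHORIZED.get(uri)

def lookup_label(label):
    uri = VARIANTS.get(label)
    return (AUTHORIZED[uri], uri) if uri else None

print(validate_heading([("a", "Army generals")], resolve_uri, lookup_label))
```

Passing the two lookups in as functions keeps the flow testable without network access, which matters given how fragile some identifier services are under load.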
  22. 22. Validate Headings Validate Headings can be run from inside the MarcEditor, or outside as a standalone tool
  23. 23. Example
  24. 24. Continued work… ◦ Would like to continue to add additional vocabularies ◦ Expand headings validation to more than just LCSH/NAF ◦ Include Linking Profiles for UNIMARC ◦ Using Linked Data sources for sameAs subject generation
  25. 25. Questions Contact Information: Terry Reese Email: reese.2179@osu.edu or reeset@gmail.com MarcEdit Website: http://marcedit.reeset.net Help: http://marcedit.reeset.net/help