This document discusses using the Perl programming language to manipulate MARC records for cataloging purposes. Perl is well-suited for text processing tasks like editing MARC fields. The document provides examples of using Perl scripts with MARC editing tools to create and edit MARC records for electronic resources in batch from delimited files. Specific scenarios discussed include creating records from titles lists, editing vendor-supplied records, and deriving e-journal records from print records. Resources for learning Perl and examples of Perl scripts used for MARC record manipulation are also included.
CatConf2001
I name thee Bay of Pe(a)rls: some practical virtues of Perl for cataloguers
Jenny Quilliam
Abstract
With the increasing numbers of aggregated electronic resources, libraries now tend to ‘collect in batches’.
These aggregated collections may not be permanent and are subject to frequent and significant content
changes. One survival strategy for cataloguers is to ‘catalogue in batches’. While some publishers and vendors
are now supplying files of MARC records for their aggregated resources, these often need to be adapted by
libraries to include local authentication and access restriction information.
Perl (Practical Extraction and Report Language) is an easy-to-learn programming language which was
designed to work with chunks of text – extracting, pattern matching / replacing, and reporting. MARC records
are just long strings of highly formatted text and scripting with Perl is a practical way to edit fields, to add local
information, change subfields, delete unwanted fields etc. – any find-and-replace or insert operation for which
the algorithm can be defined.
As cataloguers are already familiar with MARC coding and can define the algorithms, learning a bit of Perl
means that cataloguers can easily add a few strings of Perls to their repertoire of skills.
Introduction
In reviewing the literature on current and future roles for cataloguers, two major themes emerge: that cataloguers
need to be outcomes focussed, and that new competencies are required to address the challenges of providing
bibliographic access control for remote-access online resources.
Electronic resources – primarily fulltext electronic journals and fulltext aggregated databases – have
significantly improved libraries’ ability to deliver content to users regardless of time and distance. Integrated
access means that the library catalogue must reflect all the resources that can be accessed, especially those that
are just a few clicks away. Macro cataloguing approaches are needed to deal with the proliferation of electronic
resources and the high maintenance load caused by the content volatility of these resources, both long-term
and temporary.
In the United States, the Federal Library and Information Center Committee’s Personnel Working Group
(2001) is developing Knowledge, Skills and Abilities statements for its various professional groups. For
Catalogers, it has identified abilities including:
• Ability to apply cataloging rules and adapt to changing rules and guidelines
• Ability to accept and deal with ambiguity and make justifiable cataloguing decisions in the absence of
clear-cut guidelines
• Ability to create effective cataloging records where little or no precedent cataloguing exists
Anderson (2000) argues that without decrying the importance of individual title cataloguing, macro-
cataloguing approaches to manage large sets of records are essential. Responsibility for managing quality
control, editing, loading, maintaining and unloading requires the “Geek Factor”. In a column which outlined
skills required for librarians to manage digital collections, Tennant (1999) observed that while digital librarians
do not need to be programmers, it is useful to know one’s way around a programming language and while the
specific languages will vary a “general purpose language such as Perl can serve as a digital librarian’s Swiss
Army knife – something that can perform a variety of tasks quickly and easily”.
What is Perl and why is it useful?
Perl is the acronym for Practical Extraction and Report Language. It is a high-level interpreted language
optimized for scanning arbitrary text files and extracting, manipulating and reporting information from those
text files. Unpacking this statement:
• high-level = humans can read it
• interpreted = doesn’t need to be compiled and is thus easier to debug and correct
• text capabilities = Perl handles text in much the same way as people do
Perl is a low-cost (free) scripting language with very generous licensing provisions. To write a Perl script all
you need is any text editor, e.g. Notepad or Arachnophilia, as Perl scripts are just plain text files.
Perl is an outcomes focussed programming language – the ‘P’ in Perl means practical and it is designed to get
things done. This means that it is complete, easy to use and efficient. Perl uses sophisticated pattern-matching
techniques to scan large amounts of data very quickly and it can do tasks that in other programming languages
would be more complex, take longer to write, debug and test. There are often many ways to accomplish a task
in Perl.
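As a small illustration of that pattern matching, the sketch below edits a single 856 (online access) field held as plain text, with a dollar sign marking subfields in the way some MARC editing tools do. The field content, URL and notes are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A single 856 field as tagged text; '$u' and '$z' mark subfields.
my $field = '856 40 $uhttp://www.example.com/joe$zConnect to e-journal';

# Substitution: change the public note in subfield $z
$field =~ s/\$zConnect to e-journal/\$zAvailable to Library members only/;

# Pattern matching: extract the URL from subfield $u
my ($url) = $field =~ /\$u([^\$]+)/;
print "$url\n";
```

The same one-line substitution, run inside a loop over a whole file of records, is what turns a vendor's generic 856 note into a library's local one.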
Perl is optimized for text processing – and this is precisely what is required in creating, editing and otherwise
manipulating MARC records. A word of caution: while Perl is more forgiving than many other programming
languages, there is a structure and syntax to be observed – in many ways familiar territory to cataloguers who
deal with AACR2R and MARC rules, coding and syntax.
Resources for learning Perl
There are many how-to books on Perl. If you have no previous programming knowledge, two introductory texts
are Paul Hoffman’s Perl 5 for dummies or Schwartz & Christiansen’s Learning Perl. Both are written in a
gentle tutorial style, with comprehensive indexes and detailed tables of contents. Another useful resource is the
Perl Cookbook, which contains around 1000 how-to-recipes for Perl – giving firstly the quick answer followed
by a detailed discussion of the answer to the problem.
For online resources, an Internet search on the phrase ‘Perl tutorial’ yields pages of results. Two examples of
beginner level tutorials are Take 10mins to learn Perl and Nik Silver’s Perl tutorial.
How much Perl is needed to manipulate MARC records?
The good news is “not a lot” – there are a number of tools available to deal with the more challenging
intricacies of the MARC format – the directory structure and offsets, field and subfield terminators etc. These
MARC editing tools (discussed below) allow you to deal with MARC records in a tagged text format rather
than as a single long string. Not only is a tagged text format much easier to read (for humans) but it can be easily
updated and manipulated using simple Perl scripts.
Certainly, to create a useful Perl script you need to learn how to open files for reading and writing, and
something about control structures, conditionals, and pattern matching and substitution.
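A minimal sketch of those basics: the script below opens one file for reading and another for writing, loops over tagged-text fields, and uses a conditional and a substitution to add a proxy prefix to 856 URLs. The filenames, the proxy address and the sample fields are all invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Set-up for the example: create a small tagged-text input file.
# In practice this would be MARC data broken out by an editing tool.
open my $setup, '>', 'records.txt' or die "Cannot create records.txt: $!";
print {$setup} "245 00 \$aJournal of examples.\n";
print {$setup} "856 40 \$uhttp://www.example.com/joe\n";
close $setup;

# Open one file for reading and another for writing
open my $in,  '<', 'records.txt' or die "Cannot read records.txt: $!";
open my $out, '>', 'edited.txt'  or die "Cannot write edited.txt: $!";

while (my $line = <$in>) {          # control structure: loop over each field
    if ($line =~ /^856 /) {         # conditional: only touch 856 fields
        # substitution: insert a (hypothetical) proxy prefix before the URL
        $line =~ s{\$u}{\$uhttp://proxy.example.edu/login?url=};
    }
    print {$out} $line;
}

close $in;
close $out;
```

Everything beyond this, such as reading the tag list from a configuration file or handling repeated fields, is an incremental addition to the same read-test-substitute-write pattern.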
MARC record tools
There are a range of MARC editing tools available for use and the Library of Congress maintains a listing of
MARC Specialized Tools at: http://lcweb.loc.gov/marc/marctools.html
MARCBreaker is a Library of Congress utility for converting MARC records into an ASCII text file format. It
has a complementary utility, MARCMaker, which can then be used to reformat from this file format into
MARC records. The current version only runs under DOS and Windows 95/98. There is also a companion
MarcEdit utility to MARCBreaker/MARCMaker developed by Terry Reese (2001). MarcEdit is currently in
version 3.0 and has a number of useful editing features including global field addition and deletion.
Simon Huggard and David Groenewegen in their paper ‘E-data management: data access and cataloguing
strategies to support the Monash University virtual library’ outline the use of MARCBreaker and MARCMaker
to edit record sets for various database aggregates. The Virtual University of Virginia (VIVA) has also used
MARCMaker together with the MARC.pm module to convert and manipulate MARC records for electronic
texts.
MARC.pm is a Perl 5 module for preprocessing, converting and manipulating MARC records. SourceForge
maintains an informative website for MARC.pm that includes documentation with a few examples. It is a
comprehensive module that can convert from MARC to ASCII, HTML, and XML and includes a number of
‘methods’ with options to create, delete and update MARC fields. Using MARC.pm requires a reasonable
knowledge of Perl and general programming constructs. MARC.pm is used by the JAKE project to create
MARC records. Michael Doran, University of Texas at Arlington, uses MARC.pm together with Perl scripts to
preprocess MARC records for netLibrary. A description of this project can be found at:
http://rocky.uta.edu/doran/preprocess/process.html
marc.pl is a Perl module written by Steve Thomas from Adelaide University. It is a utility for extracting records
from a file of MARC records, and for converting in either direction between standard MARC format and a
tagged text representation. One of the best features of this utility is the ability to
add tags globally to each record by the use of a globals tagged text file. The marc.pl utility with documentation
is available for download at: www.library.adelaide.edu.au/~sthomas/scripts/
It uses command line switches to specify the output format and options to include a global file or skip records.
By default, marc.pl creates serial format MARC records (Leader ‘as’), so it is particularly suited to creating
records for electronic journals in aggregated databases and publisher collections. The tagged text format
required by marc.pl is simple – each field is on a separate line, the tag and indicator information is separated by
a space and subfields are terminated with a single dagger delimiter. Records are separated by a blank line.
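Under those conventions a single record might look like this (title, ISSN and URL are invented; the delimiter is shown here as the vertical bar used in the appendices):

```
022   |a1234-5678
245 00|aJournal of Testing|h[electronic journal]
856 40|zAccess within University network.|uhttp://www.example.com/jtest
```

A blank line after the last field would mark the end of the record and the start of the next.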
To use marc.pl it is helpful to know what Perl is, and this is why I first dived [paddled is probably a more
accurate verb] into the world of Perl. Once in, though, it is easy to learn enough to write simple Perl scripts.
Scenarios for Perl scripting with MARC records
Three scenarios where Perl scripting is used for cataloguing purposes:
• Creating brief MARC records from delimited titles lists
• Editing vendor-supplied MARC record files to adapt for local requirements
• Deriving MARC records for ejournals based on the print version.
The Final report of the Program for Cooperative Cataloging’s Task Group on Journals in Aggregator Databases
(2000) provides a useful checklist of appropriate tags and content when scripting to either create or derive
MARC records. It lists proposed data elements for both machine-generated and machine-derived (i.e. from
existing print records) aggregator analytics records.
Depending on whether there is an existing file of MARC records, the record creation/manipulation process
steps are:
1. Convert from MARC to tagged text using marc.pl, or capture the vendor's delimited titles, ISSN and coverage file
2. Edit the tagged text using a locally written Perl script
3. Create a globals tagged text file for fields, including a default holdings tag, to be added to each record
4. Convert from tagged text to MARC using marc.pl
5. Load the resulting file of MARC records to the library system
Creating brief MARC records from delimited titles lists
When no MARC record set exists for an aggregated database, Perl scripts are used to parse delimited titles,
ISSN, coverage and URL information into MARC tagged text. The resulting tagged text file is then formatted
to MARC, incorporating a global tagged text file, using marc.pl to create a set of records.
In brief, all the Perl script has to do is to open the input file for reading, parse the information into the
appropriate fields, format it as tagged text and write the tags to an output file. This approach has been used to
create records for several databases including IDEAL, Emerald, Dow Jones Interactive and Blackwell Science.
For some publisher databases, fuller records with subject access have been created by adding one or more
subject heading terms for each title in the delimited titles file.
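The subject-access step can be sketched as follows. The extra subjects column and its semicolon separator are assumptions made for the demonstration, not a description of any actual vendor file:

```perl
#!/usr/local/bin/perl
# Sketch only: assumes the delimited titles file carries an extra,
# semicolon-separated column of subject terms (invented layout and
# invented heading terms), each written out as its own 650 tag.
$TheLine = "1234-5678\tJournal of Testing\tMarketing;Logistics";
($ISSN, $Title, $Subjects) = split(/\t/, $TheLine);
# write one 650 tag per subject term
$Tags = "";
foreach $Term (split(/;/, $Subjects))
{
    $Tags .= "650 0|a$Term|vPeriodicals.\n";
}
print $Tags;
```

In a production script the loop body would simply print each tag to the output file alongside the 022, 245 and 856 tags for the title.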
Appendix 1 shows the simple Perl script written to process Emerald records. Appendix 2 shows an example of
the resulting tagged text together with the global file used for Emerald.
Editing Vendor-supplied MARC records
Database vendors now make available files of records for their various aggregated databases. EBSCO
Publishing had undertaken a pilot project for the PCC Task Group on Aggregator Databases to derive records
for aggregated databases and their records are freely available to subscribers. When the University of South
Australia subscribed to the Ebsco MegaFile offer in late 1999, the availability of full MARC records was
regarded as a definite advantage. However these records required preprocessing to include UniSA-specific
information, change the supplied title level URLs to incorporate Digital Island access, and add a second URL
for off campus clients. Additional edits included changing the GMD from [computer file] to [electronic journal]
and altering the form subdivision coding of subject headings from ‘x’ to ‘v’. Again, to enable bulk deletion for
maintenance purposes, a tag to create a default holding was required.
The Perl scripts for these files do string pattern matching and substitution or [semi-global] find-and-replace
operations. In many cases, these changes could be done with a decent text editor with find/replace capabilities,
and if dealing with the records on a one-off basis this is a practical process. However, aggregator databases are
notoriously volatile – changing content frequently – and hence the record sets need to be deleted and new files
downloaded from the vendor site, edited and loaded to the library system. So it’s worth spending a little time to
write a custom Perl editing script. Appendix 3 shows a script to edit Ebsco-sourced records.
Until mid-2000, Ebsco did not include publisher’s embargo periods in their MARC records but maintained a
separate embargoes page – hence further scripting to incorporate this information was needed. Vendor MARC
records are also available for the Gale and Proquest databases.
A variation of this process is also used to preprocess netLibrary MARC records – adding a default holding and a
second remote authentication URL, and editing the GMD.
Deriving MARC records for ejournals from print records
The third scenario where Perl scripts are used with MARC records is deriving records for the electronic version
from existing records. At UniSA we have reworked existing MARC records for print titles to create ejournal
records for APAIS FullText. No records were available for the ejournal versions, and as we already had print
records for a majority of titles, it was decided to rework these into ejournal records. Title, ISSN and coverage
information was captured from the Informit site and edited into a spreadsheet. During the pre-subscription
evaluation process, APAIS FullText titles had been searched against the UniSA catalogue and bibkeys of existing
records noted. MARC records for these titles were exported from the catalogue as tagged text. For the titles not
held at UniSA, bibliographic records were captured to file from Kinetica and then converted to tagged text. The
ISSN and coverage data was also exported in tab-delimited format from the spreadsheet. By matching on ISSN,
the fulltext coverage information could be linked to each title and incorporated into the MARC record.
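The ISSN match can be sketched with a Perl hash: the tab-delimited coverage export is loaded into a hash keyed on ISSN, then looked up whenever an 022 field is met in the tagged text. The ISSNs and coverage statements here are invented for the demonstration:

```perl
#!/usr/local/bin/perl
# Sketch only: build a lookup table of coverage data keyed by ISSN
# (data values invented), then fetch the matching statement when an
# 022 tag is encountered in the tagged text.
%Coverage = ();
foreach $Row ("0310-2939\tVol. 24- (June 1996-)",
              "1234-5678\tVol. 1- (2000-)")
{
    ($ISSN, $Dates) = split(/\t/, $Row);
    $Coverage{$ISSN} = $Dates;      # build the lookup table
}
# when an 022 tag is read, pull out the ISSN with a pattern match
# and fetch the coverage statement for that title
$TheLine = "022 0 |a0310-2939";
if ($TheLine =~ /\|a(\d{4}-[\dXx]{4})/)
{
    $Found = $Coverage{$1};
}
print "$Found\n";
```

The fetched coverage statement can then be interpolated into the 856 tag for that record, as in the Appendix 4 example.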
The records were edited following the PCC’s (2000) proposed data elements for machine-derived records –
deleting unwanted fields, adding and editing fields as needed. A globals file was used to add tag 006 and 007
data, tag 530 additional physical form note, a 590 local access information note, a 773 Host-item entry for the
database, a 710 for the vendor Informit and a default local holdings tag.
The Perl script to process records is longer than the earlier examples but no more complex – it just does more
deleting, updating and reworking. Appendix 4 shows an example of a print record for APAIS Fulltext – the
original print form, the edited form, the globals file and the final record as an ejournal.
Conclusion
While Perl is currently mostly used to deal with the challenges of providing and maintaining MARC records
for electronic resources, scripts are also used to post-process original cataloguing for all formats for batch
uploading to Kinetica. The uses of Perl in the cataloguer’s toolkit can be many and varied – it is a not-so-little
language that can and does! And it’s fun!
Appendix 1 – Perl script to edit Emerald titles file
#!/usr/local/bin/perl
# Script to edit Emerald tab-delimited title file into tagged text
# Entries contain Title, ISSN, Coverage and specific URL
# Written: Jenny Quilliam Revised: August 2001
# Command line >perl Emerald_RPA.pl [INPUT FILE] [OUTPUT FILE]
#
#################################################################################
$TheFile = shift;
$OutFile = shift;
open(INFILE, $TheFile) or die "Can't open Input\n";
open(OUTFILE, ">$OutFile") or die "Can't open Output\n";
# control structure to read and process each line from the input file
while (<INFILE>)
{
s/"//g ; #deleting any quote marks from the string
$TheLine = $_ ;
chomp($TheLine);
#parsing the contents at the tab delimiters to populate the variables
($ISSN, $Title, $Coverage, $URL) = split(/\t/, $TheLine);
#printing out blank line between records
print OUTFILE "\n";
# processing ISSN
print OUTFILE "022 |a$ISSN\n" ;
# processing Title - fixing filing indicators
# checking for leading The in Title
if($Title =~ /^The /)
{print OUTFILE "245 04|a$Title|h[electronic journal]\n"; }
else
{print OUTFILE "245 00|a$Title|h[electronic journal]\n";}
# processing to generate URL tag with Coverage info
print OUTFILE "856 40|zFulltext from: $Coverage.";
print OUTFILE "This electronic journal is part of the Emerald database.";
print OUTFILE " Access within University network.|u$URL\n";
# adding generic RPA URL link to all records
print OUTFILE "856 41|zAccess outside University network.";
print OUTFILE
"|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald\n";
}
close(INFILE);
close(OUTFILE);
Appendix 2 – Global and example of tagged text for Emerald titles
006 m d
007 cr cn-
008 001123c19uu9999enkuu p 0 a0eng d
040 |aSUSA|beng|cSUSA
260 |a[Bradford, England :|bMCB University Press.]
530 |aOnline version of the print publication.
590 |aAvailable to University of South Australia staff and students. Access is
by direct login from computers within the University network or by authenticated
remote access. Articles available for downloading in PDF and HTML formats.
773 0 |tEmerald
991 |cEJ|nCAE|tNFL
___________________________________________________________________________
001 jaq00-05205
245 00|aAsia Pacific Journal of Marketing & Logistics|h[electronic journal]
022 |a0945-7517
856 40|zFulltext from: 1998. This electronic journal is part of the Emerald
library database. Access within University network.
|uhttp://www.emeraldinsight.com/094-57517.htm
856 41|zAccess outside University network.
|uhttp://librpa.levels.unisa.edu.au/rpa/webauth.exe?rs=emerald
Appendix 3 – Perl script to edit Ebsco sourced records
#!/usr/local/bin/perl
#
# Author: Jenny Quilliam November 2000
#
# Program to edit EbscoHost records [as converted to text using marc.pl]
# GMD to be altered to: electronic journal
# Form subfield coding to be altered to v
# French subject headings to be deleted
# Fix URL to incorporate Digital Island access
# Command line string, takes 2 arguments:
# Command line: mlx> perl EHedit.pl [input filename] [output filename]
#############################################################################
$TheFile = shift;
$OutFile = shift;
open(INFILE, $TheFile) or die "Can't open input\n";
open(OUTFILE, ">$OutFile") or die "Can't open output\n";
while (<INFILE>)
{
$TheLine = $_ ;
# processing selected lines only
# editing the GMD in the 245 field from [computer file] to [electronic journal]
if($TheLine =~ /^245/) { $TheLine =~ s/computer file/electronic journal/g;}
# editing subject headings to fix form subdivision subfield character
if($TheLine =~ /^65/) { $TheLine =~ s/xPeriodicals/vPeriodicals/g;}
# editing out French subject headings
if($TheLine =~ /^650 6/) {next}
# editing URL to add .global to string for Digital Island address
if($TheLine =~ /^856/) {$TheLine =~ s/search.epnet/search.global.epnet/g ;}
print $TheLine;
print OUTFILE $TheLine;
}
close(INFILE);
close(OUTFILE);
Appendix 4 – APAIS FullText examples
Print record
LDR 00824nas 2200104 a 4500
001 dup91000065
008 820514c19739999vrabr p 0 0 0eng d
022 0 $a0310-2939
035 $a(atABN)2551638
035 $u145182
040 $dSIT$dSCAE
043 $au-at---
082 0 $a639.9$219
245 00 $aHabitat Australia.
259 00 $aLC$bP639.9 H116$cv.2, no.1 (Mar. 1974)-
260 01 $aHawthorn, Vic. :$bAustralian Conservation Foundation,$c1973-
300 $av. :$bill. (some col.), maps ;$c28 cm.
362 0 $aVol. 1, no. 1 (June 1973)-
580 $aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
650 0 $aNatural resources$xResearch$zAustralia.
650 0 $aConservation of natural resources$zAustralia.
710 20 $aAustralian Conservation Foundation.
780 05 $tPeace magazine Australia$x0817-895X
984 $a2036$cCIT PER 304.2 HAB v.1 (1973)-$cUND PER 304.2 HAB v.1 (1973)-$cMAG
PER 333.9506 H116 v.1 (1973)-$cSAL PER 333.705 H11 v.1 (1973)-
EndRecord
Edited record
LDR 00824nas 2200104 a 4500
001 jaq01-0607
008 820514c19739999vrabr p 0 0 0eng d
022 0 |a0310-2939
082 0 |a639.9|219
245 00|aHabitat Australia|h[electronic journal].
260 |aHawthorn, Vic. :|bAustralian Conservation Foundation,|c1973-
362 0 |aVol. 1, no. 1 (June 1973)-
580 |aAbsorbed Peace magazine Australia. Vol 15, no. 4 (Aug. 1987)
650 0|aNatural resources|xResearch|zAustralia.
650 0|aConservation of natural resources|zAustralia.
710 2 |aAustralian Conservation Foundation.
780 05|tPeace magazine Australia|x0817-895X
856 41|zSelected fulltext available: Vol. 24- (June 1996-). Access via Australian
public affairs full text.|uhttp://www.informit.com.au
991 |cEJ|nCAE|tNFL
Globals file
006 m d
007 cr anu
040 |aSUSA
530 |aOnline version of the print title.
590 |aAvailable to University of South Australia staff and students. Access is
by direct login from computers within the University network or by login and
password for remote users. File format and amount of fulltext content of journals
varies.
710 2 |aInformit.
773 0 |tAustralian public affairs full text|dMelbourne, Vic. : RMIT Publishing,
2000-.
991 |cEJ|nCAE|tNFL
References
Anderson, B., 1999, ‘Cataloging issues’ paper presented to Technical Services Librarians: the training we
need, the issues we face, PTPL Conference 1999. http://www.lib.virginia.edu/ptpl/anderson.html
Christiansen, T. & Torkington, N., 1998, Perl cookbook, O’Reilly, Sebastopol, CA.
FLICC Personnel Working Group (2001) Sample KSAs for Librarian Positions: Catalogers
http://www.loc.gov/flicc/wg/ksa-cat.html
Hoffman, P. 1997, Perl 5 for dummies, IDG Books, Foster City CA.
Huggard, S. & Groenewegen, D., 2001, ‘E-data management: data access and cataloguing strategies to support
the Monash University virtual library’, LASIE, April 2001, p.25-42.
Library of Congress’s MARCBreaker and MARCMaker programs available at:
http://lcweb.loc.gov/marc/marctools.html
Program for Cooperative Cataloging Task Group on Aggregator Databases, 2000, Final report.
http://lcweb.loc.gov/catdir/pcc/aggfinal.html
Reese, T. MarcEdit 3.0 program available at: http://ucs.orst.edu/~reeset/marcedit/index.html
Schwartz, R. & Christiansen, T., 1997, Learning Perl, 2nd ed., O’Reilly, Sebastopol, CA.
Silver, Nik, Perl tutorial. http://fpg.uwaterloo.ca:80/perl/
Take 10 min to learn Perl http://www.geocities.com/SiliconValley/7331/ten_perl.html
Tennant, R. 1999, ‘Skills for the new millennium’, LJ Digital, January 1, 1999.
http://www.libraryjournal.com/articles/infotech/digitallibraries/19990101_412.htm
Thomas, S. marc.pl utility available at: http://www.library.adelaide.edu.au/~sthomas/scripts/
Using MARC.pm with batches of MARC records: the VIVA experience, 2000. [Online]
http://marcpm.sourceforge.net/examples/viva.html
Author
Jenny Quilliam
Coordinator (Records)
Technical Services
University of South Australia Library
Email: jenny.quilliam@unisa.edu.au