This is an archive of a webinar delivered on January 12, 2012. Description: If you're really new to cataloging, this session is for you. In this 90-minute online session, facilitated by NEKLS technology librarian Heather Braum, you will:
learn the basic principles behind cataloging,
discover why librarians catalog,
learn to read a basic MARC record,
see what a good MARC record looks like,
learn basic cataloging terminology,
and practice describing different materials.
Special thanks to Robin Fay for allowing me to use a couple of the ideas shared in this webinar and presentation. See her outstanding slides: http://www.slideshare.net/robinfay/cataloging-basics-presentation.
Work produced for the course Gestão de Redes e Pessoas at the Universidade Federal de São Carlos by the following undergraduate students of Library and Information Science: Klicia Mendonça, Paulo Aparecido Rodriguês da Silva, and Rebeca Carrari.
The first part of a day-long presentation made on November 3, 2009, covering various aspects of library cataloging, MARC records, FRBR, RDA, authority control, etc.
Evaluation of the RDA Toolkit, translated by Muhammad Abd al-Hamid Muawwad - Muhammad Muawwad
This presentation evaluates the RDA Toolkit (the toolkit for Resource Description and Access) and compares it with the beta version. It also covers the application of the IFLA Library Reference Model (LRM), gives an overview of the RDA Toolkit Restructure and Redesign (3R) project and the additions and changes made to entities and elements, and outlines the chapters on guidelines and methods of recording data, the concept of aggregates and successive works, and the treatment of serials under the RDA standard.
Strategies for implementing RDA in Arab libraries - Mohamed Mahdy
o The current state of applying cataloging rules in Arab libraries.
o Important terms (Work, Manifestation, Expression, Item)
o RDA Toolkit
o Highlights of the changes in the new rules.
o Descriptive cataloging data and fields.
o Practical, hands-on exercises.
o Pioneering experiences.
"Dear Students,
Greetings from www.etraining.guru
We provide the best online training for IBM DB2 LUW/UDB DBA, delivered by a database architect. Our DB2 trainer has 11+ years of working experience, including 9+ years in DB2, and is a DB2 certified professional.
DB2 LUW DBA Course Content: http://www.etraining.guru/course/dba/online-training-db2-luw-udb-dba
Course Cost: USD 350 (or) INR 21000
Number of Hours: 30-35 hours
Regards,
Karthik
www.etraining.guru
Importance of databases in the areas of business, government, e... - Erick Umanchuk
A brief essay-style description of the importance of databases in the business, government, and education sectors, with the aim of better understanding how a database can be implemented correctly and which general aspects to take into account for its improvement.
High Availability & Disaster Recovery with SQL Server 2012 AlwaysOn Availabil...turgaysahtiyan
The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. In this session we will talk about what's coming with AlwaysOn and how it helps improve high-availability and disaster-recovery solutions.
Slides from a class on the Dewey Decimal Classification (CDD). The slides also cover some bibliographic and scientific classification systems, as well as the use of the classification today, and include some classification exercises using it.
Lecture delivered at the Fundação Escola de Sociologia e Política de São Paulo (Library Science program), addressing trends in descriptive cataloging, in particular the RDA standard.
FRBR stands for Functional Requirements for Bibliographic Records.
Functional Requirements for Bibliographic Records is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA).
A conceptual entity relationship model that relates user tasks of retrieval and access in online library catalogs and bibliographic databases from a user’s perspective.
A new conceptual model for the bibliographic universe with a strong user focus.
The purpose of this entity-relationship analysis was to discover the logical nature of bibliographic data in terms of entities, attributes, and relationships.
Presentation on "Cataloguing" delivered during a training workshop in library science for staff of Muktangan school libraries, organised by the Muktangan School Teacher Reference Library, Mumbai, on 15 November 2010.
NASIG Webinar 2014 "From Record-Bound to Boundless: FRBR, Linked Data and New...Juliya Borie
The use of linked data within the library community has the potential to significantly impact cataloging and may help improve information discovery and retrieval for the end user. For librarians and users alike, serial publications have been a constant challenge due to their complex publication histories and fluid nature. In this webinar, the presenters will reprise their NASIG 2013 Conference presentation, providing an overview of Linked Data developments within the library and journal publishing communities. By exploring serials in relation to FRBR principles and linked data modeling techniques, the presenters will describe how a search for periodical literature might be improved in a linked data environment. Taking description out of the current record constraints, serials librarians will be able to express the relationships between multiple versions of the same publication, and document how a particular journal has changed over time. The linked data model also opens up many opportunities for the provision of value-added content to bibliographic descriptions.
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
Similar to Clusters from outer space: Primo Deduping and FRBRizing in Context and Reality (20)
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across more than 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies must adapt and embrace new ideas to keep up with the competition. However, fostering a culture of innovation takes real work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Clusters from outer space: Primo Deduping and FRBRizing in Context and Reality
1. Clusters from Outer Space
Primo Deduping and FRBRizing in Context and Reality
Laura Akerman, Nathalie Schulz, Amelia Rowe
With help from Lukas Koster
IGELU Annual Meeting, September 12, 2017, St. Petersburg, Russia
2. 1. Why do librarians bring things together?
It’s called “collocation”...
4. Functional Requirements for Bibliographic Records, 1998
The study uses an entity analysis technique that begins by isolating the entities that are the key objects of interest to users of bibliographic records. The study then identifies the characteristics or attributes associated with each entity and the relationships between entities that are most important to users in formulating bibliographic searches, interpreting responses to those searches, and "navigating" the universe of entities described in bibliographic records.
IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
5. It’s all about what users do
● using the data to find materials that correspond to the user's stated search criteria (e.g., in the context of a search for all documents on a given subject, or a search for a recording issued under a particular title);
● using the data retrieved to identify an entity (e.g., to confirm that the document described in a record corresponds to the document sought by the user, or to distinguish between two texts or recordings that have the same title);
● using the data to select an entity that is appropriate to the user's needs (e.g., to select a text in a language the user understands, or to choose a version of a computer program that is compatible with the hardware and operating system available to the user);
● using the data in order to acquire or obtain access to the entity described (e.g., to place a purchase order for a publication, to submit a request for the loan of a copy of a book in a library's collection, or to access online an electronic document stored on a remote computer).
IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
6. FRBR Work
work: a distinct intellectual or artistic creation.*
● An abstract entity - no one material item to point to
● Recognized in realizations or expressions
● Work is the commonality of content between and among various expressions (example: Homer's Iliad)
● Sometimes difficult to define boundaries; differences may be cultural.
IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records. K. G. Saur, München, 1998.
7. FRBR Expression
expression: the intellectual or artistic realization of a work in the form of alphanumeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms
● Any change in intellectual or artistic content constitutes a new expression
● A change in form (e.g. from alphanumeric notation to spoken word) is a new expression
● Changes in physical form (e.g. typeface) are not a new expression
● Example of a new expression: a translation
● My own "layman's" term would be "version"
(A minimal sketch of the FRBR Group 1 entities follows.)
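To make the Work / Expression / Manifestation / Item hierarchy behind these definitions concrete, here is a minimal illustrative sketch in Python. The class and attribute names are invented for this example only; they come neither from the FRBR report nor from any library system.

# Minimal, illustrative sketch of the FRBR Group 1 entities as plain Python
# dataclasses. Names and attributes are invented for clarity.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Work:
    """Abstract intellectual or artistic creation, e.g. Homer's Iliad."""
    title: str
    expressions: List["Expression"] = field(default_factory=list)

@dataclass
class Expression:
    """A realization of a Work: a particular text, translation, or performance."""
    work: Work
    language: str
    form: str  # e.g. "text", "spoken word", "notated music"

@dataclass
class Manifestation:
    """A published embodiment of an Expression: a specific edition or issuance."""
    expression: Expression
    publisher: str
    year: int
    carrier: str  # e.g. "print", "microfiche", "online resource"

@dataclass
class Item:
    """A single copy of a Manifestation held by a library."""
    manifestation: Manifestation
    barcode: str

# Example: the Iliad as a Work, an English translation as an Expression,
# a print edition as a Manifestation, and one shelved copy as an Item.
iliad = Work(title="Iliad")
english = Expression(work=iliad, language="eng", form="text")
iliad.expressions.append(english)
print_edition = Manifestation(expression=english, publisher="Example Press", year=1990, carrier="print")
copy_one = Item(manifestation=print_edition, barcode="000123456")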
8. 2. How do librarians bring things together?
Technology made this change..
9. Card Catalog - linear arrangement
● Various ways of
organizing cards but
principle of bringing
together the various
versions of a work.
● “Deduping” could be
adding call numbers for
print and microform to
same card.
A. L. A. rules for filing catalog cards
1942. "Second printing, with
corrections,April, 1943”
https://catalog.hathitrust.org/Record
/002433836
10. Here you see in (b) an alternative rule, something like the origin of the "uniform title" concept: organizing all translations via a heading for the original title and language.
11. California Digital Library “dedup” algorithm
"DLA merges book format records through a complex algorithm that assigns numeric "weights" for matches on different parts of the bibliographic record. When the total of these weights reaches a certain level, the records are considered to be sufficiently alike to warrant bringing them together as a single database record. If the total weight does not reach this level, the records are not merged.
Not all data elements have to match exactly for the records to be merged. The use of weighting means that some variation between the records can be tolerated, as long as the overall score is high enough to be considered a match."
Coyle, Karen. Technical Report No. 6: Rules for Merging MELVYL(R) Records. Revised June 1992 (copy provided privately).
See also Coyle, Karen, and Linda Gallaher-Brown. "Record matching: an expert algorithm." ASIS'85: Proceedings of the American Society for Information Science (ASIS) 48th Annual Meeting. Vol. 22. 1985.
(A simplified sketch of this weighted approach follows.)
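As a rough illustration of the weighted approach quoted above, the Python sketch below scores two simplified records and treats them as duplicates only when the total weight reaches a threshold. The field names, weights, partial-credit rule, and threshold are all invented for this example; the real MELVYL and Primo rules use many more elements.

# Simplified sketch of weighted record matching. All weights and the threshold
# are illustrative values, not the actual MELVYL or Primo numbers.
def match_score(rec_a: dict, rec_b: dict) -> int:
    weights = {
        "lccn": 500,         # strong identifier match
        "isbn": 450,
        "short_title": 450,
        "date": 100,
        "publisher": 50,
    }
    score = 0
    for fld, weight in weights.items():
        a, b = rec_a.get(fld), rec_b.get(fld)
        if a and b and a == b:
            score += weight
        elif fld == "date" and a and b and abs(int(a) - int(b)) <= 2:
            score += weight - 25     # partial credit for dates within two years
    return score

def is_duplicate(rec_a: dict, rec_b: dict, threshold: int = 875) -> bool:
    """Records merge only when the total weight reaches the threshold."""
    return match_score(rec_a, rec_b) >= threshold

print_rec = {"short_title": "riley on business interruption insurance", "date": "1999"}
ebook_rec = {"short_title": "riley on business interruption insurance", "date": "2000"}
print(is_duplicate(print_rec, ebook_rec))   # False: title alone does not reach the threshold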
12. Other approaches
● VTLS Cataloging system based on FRBR entities: https://www.slideshare.net/VisionaryTechnology/vtls-8-years-experience-with-frbr-rda-4755109
● WorldCat Work Descriptions: http://www.oclc.org/developer/develop/linked-data/worldcat-entities/worldcat-work-entity.en.html
14. Primo Dedup ...
● Derived from the California Digital Library algorithm.
● Roughly equivalent to the FRBR "Expression" level - an edition of a book, a director's cut of a movie, a recording of a symphony by a particular orchestra on a certain date
● Should bring together issuances of the same content in different formats - print, electronic, microform, etc. (manifestations)
15. Primo Dedup merged record
● Provides a merged record PNX - selecting one description out of the "dups", then adding from all the records:
○ local fields,
○ holdings/items from all the records.
● Primo's selection of the "preferred record" is based on the "delivery category" assigned by the Primo norm rules (a simplified selection sketch follows). The current hierarchy is:
○ SFX resource
○ Electronic resource
○ Metalib resource
○ Physical item
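As a hedged sketch of the merge step this slide describes, the Python below picks a preferred record using the delivery-category hierarchy listed above and then pools local fields and holdings from all of the dups. The record structure is a simplified stand-in, not the actual PNX.

# Simplified sketch of choosing the preferred record and building a merged record.
PREFERENCE_ORDER = ["SFX resource", "Electronic resource", "Metalib resource", "Physical item"]

def choose_preferred(records: list) -> dict:
    """Return the record whose delivery category ranks highest in the hierarchy."""
    return min(records, key=lambda r: PREFERENCE_ORDER.index(r["delivery_category"]))

def build_merged_record(records: list) -> dict:
    """Start from the preferred description, then add local fields and holdings from every dup."""
    merged = dict(choose_preferred(records))
    merged["local_fields"] = [f for r in records for f in r.get("local_fields", [])]
    merged["holdings"] = [h for r in records for h in r.get("holdings", [])]
    return merged

dups = [
    {"id": "alma-print", "delivery_category": "Physical item", "holdings": ["Main Library"]},
    {"id": "sfx-online", "delivery_category": "SFX resource", "holdings": ["Online"]},
]
print(build_merged_record(dups)["id"])   # "sfx-online" wins under this hierarchy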
16. Dedup - matching up “dups”
● Assign a "score" based on full or partial matching of selected fields, as indicated in the "dedup" section of the PNX (created by normalization rules)
● Same field, different rules for serials, for articles, and for everything else
● If the score meets the target number, it's a match.
● The Primo ingest pipe calculates match scores for every incoming record and assigns a match ID associated with matching records (a rough grouping sketch follows). It also removes deleted records from a match ID cluster, and adds or removes records to a match ID if their score changes.
● If changes are made to the dedup normalization rules, the records would need to be updated (renormalization pipe or reload from source) for the change to take effect.
● A "force dedup" setting on a renormalization pipe might be needed if you tinkered with…
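Conceptually, the pipe behaviour above amounts to giving a shared match ID to every group of records whose pairwise scores clear the threshold, including matches that are only transitive. The sketch below shows that grouping step with a small union-find; the is_dup argument stands in for whatever scoring routine is in use (for example the hypothetical one sketched earlier), and none of this is Primo's actual implementation.

# Illustrative grouping of records under shared match IDs using union-find.
from itertools import combinations

def assign_match_ids(records: list, is_dup) -> dict:
    """Records that match directly or transitively end up with the same match ID."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if is_dup(a, b):
            parent[find(a["id"])] = find(b["id"])    # merge the two clusters

    return {r["id"]: find(r["id"]) for r in records}

# Toy matcher that compares titles only (illustrative, not Primo's logic).
records = [
    {"id": "print", "title": "journal of women politics and policy"},
    {"id": "online", "title": "journal of women politics and policy"},
    {"id": "other", "title": "riley on business interruption insurance"},
]
print(assign_match_ids(records, lambda a, b: a["title"] == b["title"]))
# {'print': 'online', 'online': 'online', 'other': 'other'}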
21. 5. Dedup at Emory Libraries (Laura)
● When we first implemented Primo in 2008-9, we experimented with FRBR but decided it was too confusing for users. But we wanted dedup.
● The intent of dedup was to bring print, microform, electronic, etc. versions of the same content together.
● Our big concern at implementation was that we were creating very brief records for electronic serials from SFX, and they became the "merge record", so our lovely print CONSER full serial records disappeared. Our solution at that time was to add 856 URLs to the serial records, making the print record "electronic" to Primo's norm rules, which put it on equal footing in the choice of merge record. This was too much manual work.
● With Alma, things are better for e-serials; Community Zone e-journal records are fuller, so we can choose fuller records for e-serials in Alma.
● From time to time when we have had dedup problems, Ex Libris support staff have suggested we just use FRBR instead, but we have re-evaluated it and decided "no".
22. The algorithm isn’t friendly to rare book cataloging.
The first edition and some of the rare editions of this book were deduping together.
Why? Dates...
23. Solution? Exclude the entire library or location where the rare stuff lives
(Screenshot of norm rules)
25. Record 1:
245 10 |a Libellus |h [microform] / |c F. Barholomei de Vsingn Agustiniani de falsis prophetis tam in persona quã doctrina vitandis a fidelibus. De recta et mũda predicatiõe euãgelij & quibus conformiter illud debeat predicari. ...
264 _1 |a Erphurdie [i.e. Erfurt] : |b [Matthes Maler], |c 1525.
300 __ |a 79 pages (4to) ; |c cm.
336 __ |a text |b txt |2 rdacontent
337 __ |a microform |b h |2 rdamedia
338 __ |a microfiche |b he |2 rdacarrier
500 __ |a Signatures: A-K4.
500 __ |a Title within ornamental border.
510 4_ |a Panzer (Annales typographici) |c VI: 503, 63
510 4_ |a Kuczyński |c 2681
Record 2:
245 10 |a Libellus |h [microform] / |c F. Bartholomei de Vsingen Augustiniani de Merito bonorum operum. In quo veris argumentis respondet ad instructionem fratris Mechlerij Franciscani de bonis operibus. quam inscribit christianã. ...
264 _1 |a Erphurdie [i.e. Erfurt] : |b [Mathes Maler], |c 1525.
300 __ |a 70 pages (4to) ; |c cm.
336 __ |a text |b txt |2 rdacontent
337 __ |a microform |b h |2 rdamedia
338 __ |a microfiche |b he |2 rdacarrier
500 __ |a Signatures: A-I4.
500 __ |a Title within ornamental border.
510 4_ |a Panzer (annales typographici) |c VI: 503, 62
26. Other side effects -
Our digitized books from the Rose Library Special Collections (not in Alma) no longer dedup with the source physical book records from Alma - even though we retained the record ID in the digital metadata.
30. Why?
No identifiers in the separate records that could break the dedup.
245 (Title) subfield n (number) or p (part) for the volume number doesn't carry enough weight to lower the score enough.
31. Solution - not nice
Add the MMS ID for each Alma record for the 12 volumes to the dedup rule so it will get a t99 "do not dedup" value (illustrated in the sketch below).
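A hypothetical sketch of that workaround: records whose IDs appear on an exclusion list get the t99 "do not dedup" value so they never match anything. The rule representation and the MMS IDs below are made up for illustration; this is not Primo Back Office syntax.

# Illustrative exclusion check; "99" is the document's "do not dedup" value,
# the other code and the IDs are placeholders.
DO_NOT_DEDUP = "99"
EXCLUDED_MMS_IDS = {"9912345670002486", "9912345680002486"}   # hypothetical Alma MMS IDs

def dedup_type(record: dict, excluded_ids: set = EXCLUDED_MMS_IDS) -> str:
    """Return the value to write into the <t> field of the record's dedup section."""
    if record["mms_id"] in excluded_ids:
        return DO_NOT_DEDUP      # t99: exclude this record from dedup entirely
    return "1"                   # placeholder for a normal dedup type code

print(dedup_type({"mms_id": "9912345670002486"}))   # "99"
print(dedup_type({"mms_id": "9900000000000000"}))   # "1"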
32. Same title, same year (work in progress?...)
Both published in 1999. Same composer and work (Chopin, Piano Concertos nos. 1 & 2).
The artists (Arthur Rubinstein, Martha Argerich) are not part of the dedup algorithm!
● Ideas: add a mapping of 024 or 028 or 037 (publisher numbers: repeatable, not consistently formatted, not "universal") as Universal ID (F1)
● Support suggested: add the record ID to F1 (Universal ID) as the last "or" choice, to subtract points/prevent dedups
The American movie directed by Steven Seagal and the Chinese-language movie with the same title, directed by Corey Yuen, were both issued in 2009. I couldn't find a thumbnail of our copy of the Yuen movie, which is a videodisc.
33. 6. More dedup problems at RMIT University (Amelia)
Genki
● Two records with a single number different in the titles
● The number is displayed in Roman numerals: I and II
● Primo was deduping the records and only displaying title metadata related to Genki II
● Users couldn't find Genki I
34. Screenshot of the dedup test in the Primo Back Office. This is how we identified that the title field was matching.
35. Solution: changed the Roman numerals in the title (245 $a) to a numerical representation (a small normalization sketch follows).
For example: 246 $a Genki 2
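Purely as an illustration of the kind of normalization involved, the sketch below converts a trailing Roman numeral in a title to its Arabic form, mirroring the record edit described above (e.g. 246 $a Genki 2). It is not part of Primo; RMIT's actual fix was made in the bibliographic records.

# Illustrative conversion of a trailing Roman numeral in a title to an Arabic number.
# Naive: any trailing word made only of the letters i, v, x, l, c, d, m will be treated as a numeral.
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral: str) -> int:
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, v in enumerate(values):
        total += -v if i + 1 < len(values) and v < values[i + 1] else v
    return total

def arabicize_title(title: str) -> str:
    """Replace a trailing Roman numeral (e.g. 'Genki II') with its Arabic form."""
    match = re.fullmatch(r"(.*\s)([ivxlcdm]+)", title.strip(), flags=re.IGNORECASE)
    if match:
        return match.group(1) + str(roman_to_int(match.group(2)))
    return title

print(arabicize_title("Genki II"))   # Genki 2
print(arabicize_title("Genki I"))    # Genki 1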
37. 7. Primo “FRBR clustering” (Nathalie)
● A simpler algorithm
● Uses author-title (or title-only) keys to create clusters of records for a work.
● In the FRBRization part of a pipe, if a match is found based on the keys, the record is added to the same FRBR group.
38. FRBR matching
FRBR vector (simplified explanation)
K1 - Author part key (fields 100 or 110 or 111, OR 700, 710, 711)
K2 - Title only key (field 130)
K3 - Title part key (not serials: 240 and 245; serials: 240, or 245 if there is no 240)
● Not all subfields are used.
● Normalization to remove punctuation, change to lowercase, etc.
● K1 and K3 are combined for matching; K2 is not (a simplified key-building sketch follows).
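A simplified key-building sketch, assuming records are plain Python dictionaries and that normalization is just lowercasing, ampersand expansion, and punctuation stripping; Primo's real normalization rules and subfield handling are more involved.

# Illustrative FRBR-vector key building and match-string generation.
import re
from itertools import product

def normalize(value: str) -> str:
    value = value.lower().replace("&", "and")
    return re.sub(r"[^\w\s]", "", value).strip()

def frbr_keys(record: dict) -> dict:
    return {
        "k1": [normalize(a) for a in record.get("authors", [])],         # 1XX / 7XX
        "k2": [normalize(t) for t in record.get("uniform_titles", [])],  # 130
        "k3": [normalize(t) for t in record.get("titles", [])],          # 240 / 245
    }

def match_strings(keys: dict) -> set:
    """K1 and K3 are combined into author-title strings; K2 stands on its own."""
    combined = {f"{title}~{author}" for title, author in product(keys["k3"], keys["k1"])}
    return combined | set(keys["k2"])

rec = {"authors": ["Roberts, Harry."], "titles": ["Riley on business interruption insurance /"]}
print(match_strings(frbr_keys(rec)))
# {'riley on business interruption insurance~roberts harry'}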
39. FRBR problems (Nathalie, Bodleian Libraries)
● Records that you want to cluster, that don’t
● Records that cluster, that you don’t want to
● Sort order within clusters
(Examples are from http://solo.bodleian.ox.ac.uk - which has FRBR turned on, but not dedup)
41. FRBR problems (Nathalie)
FRBR section of the PNX records
Print record
<k3>$$Kjournal of women politics and policy$$AT</k3>
Key used for matching: none
Electronic records
<k2>$$Kjournal of women politics & policy online$$ATO</k2>
<k3>$$Kjournal of women politics and policy$$AT</k3>
Key used for matching: journal of women politics & policy online
43. FRBR problems (Nathalie)
FRBR section of the PNX records
9th and 10th editions:
<k1>$$Kroberts harry$$AA</k1>
<k3>$$Kriley on business interruption insurance$$AT</k3>
Key used for matching: riley on business interruption insurance~roberts harry
7th and 8th editions:
<k1>$$Kcloughton david$$AA</k1>
<k1>$$Kriley denis$$AA</k1>
<k3>$$Kriley on business interruption insurance$$AT</k3>
Keys used for matching: riley on business interruption insurance~cloughton david
riley on business interruption insurance~riley denis
45. FRBR problems (Nathalie)
FRBR section of the PNX records
Print record - incorrect metadata! (24514 $aThree sisters)
<k1>$$Kcaldwell lucy 1981$$AA</k1>
<k3>$$Ke sisters$$AT</k3>
Electronic Record
<k1>$$Kcaldwell lucy 1981$$AA</k1>
<k3>$$Kthree sisters$$AT</k3>
46. FRBR problems (Nathalie)
● Records that cluster, that you don’t want to
○ This is subjective!
○ The normalization rules can be used to exclude records from clustering by assigning
“<t>99</t>”
● Oxford case-study
○ Excluded from clustering: printed maps, printed music, sound recordings, video recordings, computer software, and printed books prior to 1830.
○ Individual records can also be excluded by adding a local field to the Aleph record (which is used by the normalization rules).
47. FRBR problems (Nathalie)
● Sort order within clusters
○ Set in the Back office.
● Oxford case-study
○ At Oxford we have chosen relevance, as that works best for people doing known-item searches: the result they want will usually be the first record in the cluster.
○ However, Date-newest would be preferable in some situations (e.g. multiple editions of a textbook).
○ Sometimes the most "relevant" record is not what you would expect…
50. 8. FRBR problems (Amelia)
FRBR clustering unexpectedly not occurring - for example, because of minor differences in cataloging
51. Solution (to be implemented)
Add transformations to Normalization rules - FRBR Section
(Thank you, Nathalie, for the solution to this problem)
52. More FRBR problems (Amelia)
Tecnica dei modelli
● Fashion series split into 3 volumes
● Each volume has its own Alma record
● Primo was clustering the records and only displaying the $n information for volume 3 in the search results
● Users couldn’t find volumes 1 and 2
53. Solution: preventing FRBR (Amelia)
Add t=99 for records with the series title 240 $a Tecnica dei modelli
54. Other FRBR problems (Amelia)
● User understanding
○ How much do users understand about clustering?
○ How much do they need to know?
● Staff training requirements
○ How much do staff understand about clustering?
○ How much do they need to know?
■ Enough to help the users
55. Above: Screenshot of deduped item in Classic UI
Below: Screenshot of deduped item in New UI
DeDup : Classic Primo and New Primo
56. FRBR : Classic Primo and New Primo
Above: Screenshot of clustered item in Classic UI
Below: Screenshot of clustered item from New UI
57. Summary of issues with Primo
● 245 $n and $p not given enough weight
● Inability to DeDup or Cluster across all collections (example: Alma and PCI)
● Matching depends on textual strings in the metadata - this can have errors or
legitimate variations
● Deduping should not happen for rare book cataloging
● Lack of control on choice of the “merged record” for Deduping
● Lack of reliable identifiers in records especially for media….
● Lack of control...
58. The Future...
● New field approved to be added to MARC for work identifiers (URIs): 758
● Linked Data! If you define an Entity… it must have an Identifier (URI: URL or URN).
● RDA/FRBR “Work” vs BIBFRAME “Work” (RDA Expression?)
● Not clear where the overlaps or agreements are in version 2.0
● BIBFRAME still being refined
59. Questions:
How might we address problems with deduping and FRBR clustering?
Should the algorithms be modified?
Should Work and Expression identifiers be generated on-the-fly in Alma and Primo, or be generated once, be stored and be editable?
Is Primo Dedup merged display best for users? What other approaches might work better?
60. Contacts:
Laura Akerman, Discovery Systems and Metadata Librarian, Emory University
liblna@emory.edu
Nathalie Schulz, Systems Analyst, Bodleian Libraries, University of Oxford
Nathalie.Schulz@bodleian.ox.ac.uk
Amelia Rowe, Applications Librarian, RMIT University
amelia.rowe2@rmit.edu.au
61. Credits:
● Opening image: NASA, Hubble Space Telescope image, Gas Clouds and Star Clusters, NGC 1850.jpg
● Image from Cutter, Charles A., 1837-1903, Rules for a printed dictionary catalogue. Washington: Government Printing Office, 1875, retrieved from HathiTrust, https://catalog.hathitrust.org/Record/009394960
● Frank Sinatra and Martha Argerich album cover and Above the Law (Seagal) DVD cover thumbnails from Amazon.com
● Artur Rubinstein album cover thumbnail from Discogs.com
● Above the Law (Yuen) DVD thumbnail from Internet Movie Database
Editor's Notes
(Laura) The origin of this talk is, I started receiving a spate of dedup issues reported by other librarians at Emory University and thought it’d be an interesting topic. But I wanted to take a higher level view of the process and wanted to include FRBR which we don’t have experience of. So I put out a call to collaborate on the Primo list and was delighted to find great collaborators in Nathalie Schulz from the Bodleian Library, Oxford University, and Amelia Rowe from RMIT University in Melbourne, Australia.
Before we get into the juicy problems, I will start off with a little background - bear with it… Librarians have been bringing descriptions together for a very long time to assist users to find what they want.
Bringing all versions of a work together before online catalogs involved arranging cards for different versions to be found together in the card catalog, as well as on the shelf due to the classification numbers assigned to books. Works were generally to be found/identified in the card catalog by author and title, but if there was any ambiguity, a uniform title could be constructed which would be unique in combination with the author name. Certain special authors might have more elaborate arrangement into sections by language, special sections for complete and selected works, and compilations by form (e.g. "Poetical works"). This led to different sorts of uniform titles.
In 1998 the International Federation of Library Associations and Institutions published Functional Requirements for Bibliographic Records. This was the result of intense committee work to develop metadata requirements for libraries at the national level. The group analyzed both the user tasks that needed support and a definition of the conceptual entities that lay behind those tasks and their relationships. This was really a new conceptualization of description for discovery of information resources.
Some of the things we think users do include: determining if an electronic version of the same text and edition that exists in print is available through the library - or vice versa (some users prefer print); determining if a particular described sound recording contains a particular song; finding a specific rare printing of a book described in a specialized bibliography
So in this presentation it is good to look at the FRBR Work and Expression entities and the differences between them. Work is abstract and something that could have grey areas - I am thinking of some manuscripts and serially published things... It’s a mental idea of all the versions of what we think of as being “the same work”.
This is a bit of ancient history for most libraries today, but in a dictionary catalog, alphabetical arrangement of titles (within an author section of the catalog if there was an author main entry) would bring together different editions of books.
Merging and deduping of records for “same content” was needed when large scale union catalogs, incorporating records from many sources, became possible with library automation and online catalogs. In an email to me, Karen Coyle, one of the authors of the algorithm, pointed out that author names were not as reliable when this was developed, due to a mix of old and new cataloging rules (AACR1 and AACR2), and that times have changed. This algorithm was the source for the dedup algorithm in Primo
I’ve not had the time to do deep research into other system models but wanted to note these. VTLS is now owned by Innovative Interfaces; their literature speaks of users being able to search once and retrieve all related versions of a work including those with variant titles and different languages. OCLC has been developing algorithms to identify “work entities” and associate them with clusters of records; these identifiers are available under an Open Data License and can be found in the linked data section of
So this is not a full tutorial but I’m just going to hit some highlights about how dedup works in Primo. The approach is based on the California Digital Library’s approach to merging “duplicate records” from its database. Only instead of weeding out duplicates (which you should do in your ILS system before things get to Primo!), the idea is to combine descriptions of essentially the same content that may have different formats - e.g. electronic, microform, print; CD vs. streaming audio; etc.
Here's where the tricky part comes in. There really isn't a science behind assigning a score to each combination of two records and calling it a match if it meets a certain threshold. That was developed through trial and error by the California Digital Library.
This is just a small part of the algorithm that, if you have server access to Primo, you could find and edit. You can see the point scores it assigns for various kinds of matches in data elements between two records. I think we have tried this a couple of times but have not had much success. If others have, we would be interested to hear your experiences. This file is not protected from being overwritten by updates, as far as I know.
Here you have your average normalization rule for dedup. Field "F5" contains a title. The condition says this rule is only for "non-serials". It takes the 245 title field, including title, subtitle, number and part. Then it heavily normalizes the string to remove initial articles and various punctuation - brackets, ampersands, etc. (a rough illustration follows). There's another rule for F5 for serials - it operates on the 022 field subfield z (invalid ISSN). These rules can be modified - at your own risk!
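Purely to illustrate what such a rule does, here is a hypothetical Python version of that title normalization; the article list, punctuation handling, and ampersand treatment are my own approximations, not the actual normalization rule.

# Hypothetical approximation of building a dedup title value from a 245 string.
import re

INITIAL_ARTICLES = ("the ", "a ", "an ", "le ", "la ", "der ", "die ", "das ")

def dedup_title(title_245: str) -> str:
    value = title_245.lower()
    value = value.replace("&", "and")                    # ampersand variation
    value = re.sub(r"[\[\]\"'():;,./?-]", " ", value)    # strip brackets and other punctuation
    value = re.sub(r"\s+", " ", value).strip()
    for article in INITIAL_ARTICLES:                     # drop an initial article
        if value.startswith(article):
            value = value[len(article):]
            break
    return value

print(dedup_title("The Journal of women, politics & policy."))
# journal of women politics and policy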
The DEDUP Test utility is a wonderful tool for understanding the complex process by which the dedup stage of record loading determines whether two records match and assigns them a dedupmrg number and match ID - or not. This happens in two stages. Title, date and record identifier get checked and basically, if the record IDs don’t match but the title and date do, it goes to full comparison. This is where “points” are assigned for full or partial matching, or subtracted for non-matching. Notice that the short title matches here and that gets 450 points
Notice that the long title doesn’t match completely, but enough words match so that it still gets 400 points. More about this later...
I regret I don’t have a screenshot of the deduped record here, but imagine all of these records deduping together. The date matching actually allows a couple of years variance. 25 points are subtracted for lack of exact match within 2 years, not enough to prevent the deduping in some cases.
I tried various less drastic changes, but they weren’t working. Our Special Collections library (now called The Stuart A. Rose Library for Manuscripts and Rare Books) was impatient - a class was going to study this autobiography and its editions, and they couldn’t abide Primo clumping them together (and a lot of other rare editions as well). This is just one example of many. So we ultimately added a rule to give all records with an item in that library a “do not dedup” value of t 99. Later, we did the same for the special collections location in the Theology library.
But rare books in microform or electronic collections are still clumping...
We want our rare books that we have digitized to dedup with the digital version, but they no longer do so - this is a tradeoff we’ve not found a solution for.
Twelve fabulous recordings of "Ol' Blue Eyes" in our collection. Here's a somewhat fuzzy image of the cover of Vol. 7, noting the songs "Night and Day", "But Beautiful", "The Song is You", and "What'll I Do?"
When I searched for Frank Sinatra the Columbia Years in our Primo, I got two results - the record for the set, and a record for “Vol. 10 the Complete Recordings”. What happened to volume 7? If we look at the item details, we can see what happened. Primo deduped all the volumes. So only the contents note of vol. 10 displays.
Viewing the PNX in the PNX viewer and clicking on “Match ID” confirms this
The first problem I neutralized in Production by adding "t 99" to these record IDs. Someone reported the second one - the videos - while I was at this conference. There are publisher identifier numbers in tags 024 or 028 or 037 that could be added, but these fields are repeatable, there may be inconsistencies in the formatting, and these numbers aren't universal. I'm nervous about using them. The suggestion to add the Alma record ID to F1, the Universal ID field to which the 010 (LC card number) is mapped, might result in breaking this dedup, but how many others that we do want to happen would it break? We will test these approaches but are not optimistic. Now I'm going to turn this over to Amelia Rowe, who'll tell you about more fascinating dedup problems at RMIT University.
Items not deduping between collections/pipes is the greatest cause for confusion for our users.
We can teach them about DeDup and clustering but if they see a behaviour that doesn’t match what they’ve been taught they get confused and think something is wrong.
Pipe related notes:
Because Dedup/FRBR is done at the pipe level there is always content that isn’t dedup/frbr-izing as the end user would expect.
At RMIT we ingest resources from a variety of locations (with 10 active pipes)
Some resources may be available between multiple pipes, and/or in PCI.
When some records don't dedup but others do, this causes confusion, especially for staff.
Screenshot = a record that is the same in our research repository and our research bank, yet the two don't dedup because they are in different pipes.
In this instance the TN_rmit_res33432 record is an RMIT-originating record that has found its way into PCI.
Note: users don't see the record ID; I have used a bookmarklet to display it for my own purposes.
There is no fix/solution for this
Records are assigned to FRBR clusters in Primo as part of the pipe. Keys based on the author(s)/titles are compared with other records and if a match is found the record is then added to the same FRBR group. A record can only be part of one FRBR group.
There are three different types of keys in the FRBR Vector:
Author part - uses the “main entry” (1XX fields) and if this is not present the added entry author fields (7XX)
Title only key - uniform title from field 130
Title part key - uniform title from field 240 and title from 245 (except for serials which have a 240). There are other fields included in the normalization rules for when there is no 240 or 245 field but as records are rejected from Primo if there is no 245$a, these will rarely be used.
Not all subfields are used, e.g. subfield $l (Language) in the 240 field is not used, which allows the original and translations of a work to cluster together.
The Author part keys and Title part keys are combined to make strings for matching, while the Title only key is not.
There is a detailed explanation on the Ex Libris Customer Centre at: https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/Technical_Guide/040FRBRization/010The_FRBR_Vector
Sometimes records simply do not have enough information to create matching keys. In this case the print record does not have any author information so there is only a title key. The two online titles that cluster have the same uniform title.
The print record only has a title part key, and this is not used for FRBR matching on its own.
The electronic records have K2 fields, which are used on their own for matching.
Note: the keys used for matching are stored in the p_frbr_keys table.
If the author changes between editions (which often happens with legal works), the keys won’t match and so they do not cluster.
The 9th and 10th editions have a 100 field for “Roberts, Harry”.
The 7th and 8th editions have 700 fields for “Cloughton, David” and “Riley, Denis”.
In this example the reason for the records not clustering is not immediately apparent.
Primo can only work with the metadata as found in the records - if this is incorrect, as in this case (non-filing indicator), the records will not cluster.
The University of Oxford has had Primo since 2008, but until mid-2011 when we moved to Aleph there was also a separate OPAC. Moving to “Primo only” meant staff in the libraries started to look more closely at the clustering. As part of a review we trialled (in a test version) turning off clustering. After staff testing and usability testing the decision was made to go with partial clustering and there have not been any calls to change this.
The normalization rules to exclude clustering make use of fields from the Aleph records, both standard FMT fields and local RTP (Record Type) fields which is how we identify most of the pre-1830 books. We also have a local “SOL” field that we can use to exclude individual records from clustering.
When we were reviewing clustering at Oxford, we had the default sort order set to “Date newest”. Some of the complaints that people had about clustering were because the specific record they wanted could be hard to find within a cluster. Changing to sorting by “relevance” helped with this. However, there are times when the “relevant” record is unexpected.
In this example, Primo is considering the Spanish translation to be the most relevant and is presenting it as the “top” record in the cluster
See the screenshot for an example of two records not deduping (record 1 and record 3) because of the ampersand in the title
Record 1 is the 5th and 3rd editions FRBRized
Record 3 is the 2nd and 4th editions FRBRized
Users would expect all of these records to be found under one record.
In this example FRBR is technically correct; however, the confusion it caused meant we had to change the system's behaviour for users.
Note: Setting t=99 is a solution we typically try to avoid as it risks creating very complicated frbr:t normalization rules
Typically we try to edit the cataloguing to prevent dedup
While those of us who work closely with Primo have a concept of FRBR, what it is, and how it works in Primo, the majority of library staff and our users do not have this understanding. This makes the display in Primo confusing for users.
Have you tried explaining FRBR to your staff or users? I’ve explained it many times to staff and still there is not a clear understanding of what it is within Primo.
To help overcome this confusion staff at RMIT are working on an online IST (in-service training) module specifically related to DeDup and FRBR to help educate our staff who can in turn help the users.
Sharing how dedup and Clustering are presented in New vs Classic Primo UI
In some instances the New UI is more user friendly in the way it presents the records
Ex. deduped print and electronic material appear in the same full record instead of separate tabs
In the classic view, "versions" was easily lost in the top right-hand corner of the record.
Now it is part of the record's availability.
Note: we are thinking of changing the terminology from “versions” to “editions and formats” in the hope of using terminology that better explains the functionality to our end users.