What's in a Name, Fernando Pessoa?

The talk was delivered at the NEA/ART Spring meeting, 2018, as part of the "Connections and Context: Three Projects in Archival Description" panel.

What’s in a Name,
Fernando Pessoa?
Adding URIs to Archival Description

Potential Connections
• Beinecke Library
• Pessoa, Fernando, 1888-1935
• http://id.loc.gov/authorities/names/n50016857
• National Library of Portugal
• Pessoa, Fernando, 1888-1935
• (PTBNP)10380

Connections without Communicating
• Beinecke Library
• Pessoa, Fernando, 1888-1935
• http://id.loc.gov/authorities/names/n50016857
• National Library of Portugal
• Pessoa, Fernando, 1888-1935
• (PTBNP)10380
https://www.wikidata.org/wiki/Q173481

Pessoa, Fernando, 1888-1935
http://id.loc.gov/authorities/names/n50016857

Project Requirements
Do not assume that any step will be problem free!
1. Every Resource record is linked to its MARC record
2. Every subfield 0 match is accurate
3. Verify that each match can be downloaded / imported
4. For each record pair, ensure that the headings match
Do not assume that any step will be problem free!

Near-Match Issue
600 1 0 $a Ford, Ford Madox, $d 1873-1939 $0 http://id.loc.gov/authorities/names/n810502328

Near-Match Issue
• Ford Madox Ford != his maternal grandfather
600 1 0 $a Ford, Ford Madox, $d 1873-1939 $0 http://id.loc.gov/authorities/names/n810502328

Solution
• Two parts:
1. Compare authorized name with name string
2. Check for multiple subfield 0s

Summary
• We are enhancing data in two local systems
• We want to connect to external systems
• We want our description to be recognized outside of our domain
• URIs are the first (not straightforward) step
• It’s not about links, but the potential for links
• Once connected, the network changes

Code created; code shared
• MARC XML analysis:
https://github.com/fordmadox/xquery-scripts
• Authority download and ASpace Linking:
https://github.com/mark-cooper/authorizer

Editor's Notes

Good afternoon, everyone. As Karen mentioned, I will be going into a bit more detail about our efforts to enhance Yale’s legacy and current archival description by associating URIs with name and subject headings. To do that, I have decided to frame my talk around Fernando Pessoa.
There are a few different reasons why I have selected Fernando Pessoa, represented here in the Social Networks and Archival Context interface, but the most important is Pessoa’s proclivity for creating, and often writing as, a variety of heteronyms: over 70 throughout his lifetime.
Pessoa preferred the term heteronym to pseudonym, since, as he said himself, his heteronyms were “authors to whom he served as literary executor.” Many of the names you see listed here in SNAC, which can be accessed in the interface by clicking on a link labelled alternative names, have been described as different writers, different people. That raises the question, today, as we undertake Linked Data projects, of whether those different names require different URIs. In SNAC, there is one URI for Pessoa. In the Library of Congress Name Authority File, however, some of his heteronyms have their own authority records. An authority record for Ricardo Reis was just created last year, for example, in LC’s database. And even though there is only one entry for Pessoa in the English-language Wikipedia…
…what you see on this next slide is an entry for Ricardo Reis in the Portuguese-language edition of Wikipedia. There are also stand-alone URIs for some of Pessoa’s heteronyms in Wikidata.
In fact, if you view his entry in Wikidata, as seen here…
You will find a series of statements that are grouped under a heading that’s labeled “said to be the same as”. Here, you will see an entry for Ricardo Reis and others. Each of these entries is a URI. There is also a URI for the term “heteronym” in Wikidata, which is how all of these same-as relationships are characterized. Furthermore, there is a URI for the concept of “said to be the same as” in Wikidata. At this point, we are starting to go down the Linked Data rabbit hole. I don’t plan to do that in today’s talk, so instead I would like to pull things back for a moment and provide a concrete example of the value in adding URIs to archival description.
On this slide, I have included an image of the recently-released Digital Edition of Pessoa’s writings, which is a collaborative project undertaken by the New University of Lisbon and the University of Cologne. This site currently contains, among other writings, all of the poetry published by Pessoa in his lifetime.
Here, for example, is an encoded transcription of one of his poems, attributed to his birth name, alongside a digitized copy of that same version of the poem. Now, as for the connection to the project that Karen and I are reporting on today, because Pessoa is one of the agent records in our ArchivesSpace database, we will be updating our record for him with a URI. And here…
…is how that Agent record looks in the development version of our ArchivesSpace Public User Interface. The only reason we have an agent record for Pessoa is because the Beinecke Library acquired a draft of the same poem that I just showed you in the Digital Edition website, which is represented here on the screen by the single search result. Of course, our local description would probably never go to the lengths of providing a researcher a link to an encoded version of the poem hosted elsewhere. But what happens when we add URIs to our description?
When we use URIs, we create potential connections. And these potentialities can be realized without us doing anything else. How? Well, as you see in this slide, imagine that the Beinecke has added a URI from the LC Name Authority File. Also imagine that the Digital Edition website has done the same, but instead of using an LC URI, they have used the authority ID from the National Library of Portugal. So now we have two IDs that aren’t the same.
But because of Linked Data services, such as Wikidata, potential connections exist. Anyone can connect these two IDs, at the time of need, because Wikidata records them both. In fact, in addition to the LC NAF ID and the National Library of Portugal ID, the Wikidata record for Pessoa references 48 other IDs, all of which refer to Pessoa, in different systems, different languages, all over the world. In other words, we can provide a solid foundation for connections without even having to communicate with other description providers. So that’s why we’re adding URIs, and that’s also why I hope that everyone else is adding URIs, or considering adding them, to their archival description. But Fernando Pessoa and his heteronyms are also why adding URIs is not a straightforward process, let alone describing relationships amongst those URIs.
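To make the "connections without communicating" idea concrete, here is a minimal sketch in Python. It assumes a tiny in-memory stand-in for the identifier statements on a Wikidata item; the `CROSSWALK` table, the scheme labels `lcnaf` and `ptbnp`, and the function names are all hypothetical, and only the three identifiers themselves come from the slides. A real lookup would query Wikidata rather than a local dictionary.

```python
# Illustrative stand-in for the identifier statements recorded on a Wikidata item.
# The scheme labels ("lcnaf", "ptbnp") are hypothetical names chosen for this sketch.
CROSSWALK = {
    "Q173481": {
        "lcnaf": "n50016857",  # LC Name Authority File ID shown on the slide
        "ptbnp": "10380",      # National Library of Portugal ID shown on the slide
    },
}


def find_item(crosswalk, scheme, identifier):
    """Return the item that records this identifier under this scheme, or None."""
    for item, ids in crosswalk.items():
        if ids.get(scheme) == identifier:
            return item
    return None


def same_entity(crosswalk, first, second):
    """True when two (scheme, identifier) pairs resolve to the same item."""
    item_a = find_item(crosswalk, *first)
    item_b = find_item(crosswalk, *second)
    return item_a is not None and item_a == item_b


beinecke = ("lcnaf", "n50016857")
digital_edition = ("ptbnp", "10380")
print(same_entity(CROSSWALK, beinecke, digital_edition))
```

Neither institution needs to know about the other's identifier: each records its own, and the shared hub does the reconciliation at the time of need.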
But all projects have to start somewhere, so that’s why I’ve started with the example of what it takes to enhance a single name string with a URI. When we started our project, I did not have a grasp on how many headings we would be updating, but now that we have gotten to this point, I can tell you that we have added exactly 31,665 unique URIs to nearly 10k finding aids. And along the way, we’ve made mistakes, so next I just want to talk a little bit about how we reviewed our work.
When it came to quality control, we had four general project requirements. First, we had the task of connecting each ArchivesSpace record with its corresponding MARC record. Simple, right? Well, we had a few issues here, such as the wrong links being made accidentally, as well as one issue where a single collection had so many access points that it was split, long ago, into two MARC records in our ILS, whereas we have a single Resource record for that collection in ArchivesSpace. Next, we wanted to ensure that every subfield 0 that we added during the course of the project was accurate. Most of the subfield 0s were added automatically, by Backstage, and only when the name string was an exact match with the primary heading from an authority record. In all of those cases, we had to hope that the archivist or cataloger added the name string correctly in the first place. We also added a much smaller subset of URIs manually, when our group reviewed the “near match” reports provided by Backstage, as Karen already mentioned. And, in my experience, whenever you have more than one person doing more than one thing manually, you are going to get a variety of errors, so you have to check the results. Third, and this was a simple one since we had LYRASIS do it for us, we had to make sure that for every subfield 0 we added, we were able to download its authority record from LC or the Getty. And that’s just a simple numbers check. Finally, we had to verify that all of the headings that we have in our ILS are also in ArchivesSpace and linked to the exact same descriptive records. This is also basically a numbers check, but it is a bit more nuanced since ArchivesSpace does not align one-to-one with bibliographic description. For just one example, a Meeting Name in MARC is mapped as a Corporate Agent record in ArchivesSpace, making it indistinguishable from other Corporate Agent headings.
Not a travesty by any means, but it makes our last stage of verification thornier than I would like. The important takeaway, though, is that we had errors at every stage in the project that we needed to correct, so next, I am going to show 3 examples of errors encountered….
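The two "numbers checks" described above (requirements three and four) can be sketched roughly as follows. This is not the tooling LYRASIS or Yale used; the function names, the `example.org` URIs, and the sample heading lists are assumptions made up for illustration, and the comparison deliberately ignores the Meeting Name / Corporate Agent mapping wrinkle.

```python
from collections import Counter


def missing_authority_downloads(subfield_zero_uris, downloaded_uris):
    """URIs that appear in a subfield 0 but have no downloaded authority record."""
    return sorted(set(subfield_zero_uris) - set(downloaded_uris))


def heading_differences(ils_headings, aspace_headings):
    """Compare the headings on one record pair, keeping duplicate counts."""
    ils = Counter(ils_headings)
    aspace = Counter(aspace_headings)
    return {
        "only_in_ils": sorted((ils - aspace).elements()),
        "only_in_aspace": sorted((aspace - ils).elements()),
    }


# Hypothetical sample data for a single record pair.
uris_in_marc = ["http://example.org/auth/1", "http://example.org/auth/2"]
uris_downloaded = ["http://example.org/auth/1"]
ils = ["Pessoa, Fernando, 1888-1935", "Ford, Ford Madox, 1873-1939"]
aspace = ["Pessoa, Fernando, 1888-1935"]

print(missing_authority_downloads(uris_in_marc, uris_downloaded))
print(heading_differences(ils, aspace))
```

Using a `Counter` rather than a plain set means a heading that appears twice in one system and once in the other still shows up as a difference.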
URIs are opaque, so everything looks fine here…
Wrong URI, Right Name (sort of, as FMF was named after FMB).
In this case, we caught the error since the MARC record eventually had two subfield 0s. The reason: we sent this record to Backstage twice during the course of the project, and when it came back the second time, it had two subfield 0s. The second was the correct one; the first one was for FMB. So, we removed the first one! We also checked all of our matches with the authorized headings to ensure that there weren’t any other blatantly wrong matches.
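The two-part check described here (flag multiple subfield 0s, then compare each URI's authorized name with the name string) might look something like the sketch below. It is written in Python with the standard library, not the XQuery the project actually used. The `LABELS` lookup is an illustrative stand-in for downloaded authority records, the second URI is a made-up `example.org` placeholder, and treating subfields a, b, c, d, and q as the name string is a simplifying assumption.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
NAME_CODES = set("abcdq")  # assumption: subfields that make up a personal name

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Ford, Ford Madox,</subfield>
    <subfield code="d">1873-1939</subfield>
    <subfield code="0">http://id.loc.gov/authorities/names/n810502328</subfield>
    <subfield code="0">http://example.org/authority/correct-record</subfield>
  </datafield>
</record>"""

# Illustrative lookup standing in for downloaded authority records.
LABELS = {
    "http://id.loc.gov/authorities/names/n810502328": "Brown, Ford Madox, 1821-1893",
    "http://example.org/authority/correct-record": "Ford, Ford Madox, 1873-1939",
}


def normalize(text):
    """Collapse whitespace, drop trailing punctuation, and ignore case."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed.rstrip(" .,").casefold()


def heading_string(field):
    """Join the name subfields of one datafield into a single string."""
    parts = [(sf.text or "").strip()
             for sf in field.findall("marc:subfield", MARC_NS)
             if sf.get("code") in NAME_CODES]
    return " ".join(p for p in parts if p)


def review_field(field, authorized_labels):
    """Return a list of problems found in one heading field."""
    problems = []
    uris = [(sf.text or "").strip()
            for sf in field.findall("marc:subfield", MARC_NS)
            if sf.get("code") == "0"]
    if len(uris) > 1:
        problems.append("multiple subfield 0s")
    heading = normalize(heading_string(field))
    for uri in uris:
        label = authorized_labels.get(uri)
        if label is None:
            problems.append(f"no authorized label for {uri}")
        elif normalize(label) != heading:
            problems.append(f"heading mismatch for {uri}")
    return problems


field = ET.fromstring(SAMPLE).find("marc:datafield", MARC_NS)
print(review_field(field, LABELS))
```

Either check alone would have flagged this record; running both also catches wrong URIs on fields that only ever received a single subfield 0.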
Getty responded within hours to apply a fix so that we could download this record. We also had to report a handful of issues to LoC during this project, when we discovered that some records were available at authorities.loc.gov but not available (due to indexing issues?) at id.loc.gov.
Don’t use undifferentiated records. We had 135 matches to these types of records, and once we discovered that, we removed those subfield 0s (but not the headings) from our records. In this case, the record is for a G.B. who published a book in 2015 as well as a G.B. who was a 19th century musician.
All our finding aids, subject, and agent records (now represented as URIs) from this project, ingested as a graph with Gephi. 43,109 nodes: 51% agent headings, 26% subjects, 23% finding aids. 137,844 connections / edges: 40% are 650 topical headings (orange lines), 22% are 600 and 700 headings (pink), and nearly 20% are geographic headings (green).
Oft-used subject heading in the Beinecke.
J.B., here in isolation, but she is / would be much, much more central in other graphs (even this graph, if we described relationships among people, in addition to material-to-people relationships). And the point here is that once we add URIs for these entities, we create that potential.
Like Baker, another very isolated agent record in our graph…
But the underlying metadata, seen here through Google’s eyes, provides one possibility for (re)connecting Pessoa outside of our “Archives at Yale” graph.
The first link includes a few scripts used to review our MARC XML records before sending them to LYRASIS (checking for typical issues we encountered, like a URI link in a subfield other than subfield 0, etc.). The second link is the amazing set of tools developed by Mark Cooper, at LYRASIS, which downloads authority records (from LC and Getty), gets them into ASpace, and links those authority records back to our Resource records (by means of our “bib ids” from our ILS).
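As a rough illustration of the kind of pre-flight review mentioned above, the following Python sketch looks for a URI sitting in a subfield other than subfield 0. It is not taken from the linked XQuery scripts; the function name, the choice to inspect only 1xx, 6xx, and 7xx fields, and the sample record (with the URI deliberately misplaced in a subfield 2) are all assumptions for the example.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
HEADING_PREFIXES = ("1", "6", "7")  # assumption: only heading fields are reviewed

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Pessoa, Fernando,</subfield>
    <subfield code="d">1888-1935</subfield>
    <subfield code="2">http://id.loc.gov/authorities/names/n50016857</subfield>
  </datafield>
</record>"""


def misplaced_uris(record):
    """List (tag, subfield code, URI) for URIs found outside subfield 0."""
    hits = []
    for field in record.findall("marc:datafield", MARC_NS):
        tag = field.get("tag", "")
        if not tag.startswith(HEADING_PREFIXES):
            continue
        for sf in field.findall("marc:subfield", MARC_NS):
            text = (sf.text or "").strip()
            if sf.get("code") != "0" and text.startswith(("http://", "https://")):
                hits.append((tag, sf.get("code"), text))
    return hits


print(misplaced_uris(ET.fromstring(SAMPLE)))
```

Restricting the scan to heading fields avoids flagging fields where a URL is expected, such as an 856 electronic location.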
