What’s in a Name? Text and Image for Indexing Prosopographical Data

Presentation by Eduard Frunzeanu and Régis Robineau at the Summer School “Reconstitution of Early Modern Cultural Networks. From Primary Source to Data” (Médiathèque Louis-Aragon, Le Mans, France), organized with the support of Humanities at Scale (DARIAH-EU) and the City of Le Mans, and in partnership with Biblissima and the Centre d’Études Supérieures de la Renaissance (Tours).

Published in: Engineering
  1. 1. What’s in a Name? Text and Image for Indexing Prosopographical Data Eduard FRUNZEANU Régis ROBINEAU biblissima.fr / @biblissima Summer School “Reconstitution of Early Modern Cultural Networks. From Primary Source to Data.” Médiathèque Louis-Aragon, Le Mans - 2017, July 5th
  2. 2. Prosopography - compilation of names found in documents - their identification and indexing - study of the life, career, and relationships of people within a contextual frame (geographical, historical and/or professional) - linear and factual point of view on the individual's life, seen as a continuum - does not analyse the individual from the perspective of personality - does not take into account the historical and social conditions that make an event possible
  3. 3. ▪ 1. Model the data ▪ 2. Prepare the data for interoperability ▪ 3. Link the data to a visual library
  4. 4. 1. Model the data • No standards commonly used for indexing prosopographical data • Libraries/Archives/Museums vs scientific databases: partially similar data • Models and encoding formats used in libraries/archives/museums: - FRBR (Functional Requirements for Bibliographic Records) - FRAD (Functional Requirements for Authority Data) - REICAT (REgole Italiane di CATalogazione) - ISAAR (International Standard Archival Authority Record) - EAC (Encoded Archival Context) - CIDOC-CRM (Conceptual Reference Model) - RDA (Resource Description and Access)
  5. 5. 1.1 Which data? • Libraries: person/family/corporate body as participant in a document (text, sound, image) held in a library collection or as concept of a document - Each document is indexed as a record (bibliographical or archival) - The records are linked to an authority file - Very few semantic relationships between persons/families/corporate bodies • Prosopographical databases: any historically attested person/group - Index of persons in relation to a historical document - Many types of semantic relationships
  6. 6. 1.2 Authority files: what for? • Authority files: - identify an entity and distinguish it from other entities identified by the same name Martin, Jean (15..-15..? ; imprimeur imaginaire à l'adresse de Reims) Martin, Jean (15..-157.? ; imprimeur) Martin, Jean (15..-16.. ; imprimeur imaginaire à l'adresse de Lyon) - regroup all the graphical forms of a name as it appears in the different records existing in the catalogue - link works to a specific person, family, or corporate body Divina commedia to Dante Alighieri (1265-1321) - group together various editions of a work Divina Commedia di Dante Alighieri: col commento di Christoforo Landino, Brescia : Bonino de' Bonini, 31 V 1487 to Divina commedia
  7. 7. 1.3 Authority data • Entities: - person - family - corporate body - work (Divina commedia) > expression (its translation in French by B. Grangier) > manifestation (published in Paris, 1597) > item (located at Paris, BnF, RES-YD-817-819) - concept - object - event - place • Characteristics/ Attributes of each entity
  8. 8. 1.4 Person • Person - As agent (participated in the production of an entity, text, image or event) - As concept (attested by an entity, text or image) - A name does not correspond to a person: pseudonyms ≈ persona ≠ individual - Name known but person unknown: L. R. E. P. - Appellations established by researchers - in association with another person: Master of Boucicaut - on the basis of anagrammatic clues: Vivien de Nogent - in association with other kinds of entities - work: Master of the Epître d'Othéa - edition: Printer of Alexander de Villa Dei, Doctrinale (GW 963)
  9. 9. 1.4 Person - Divinities or literary figures attested as document creators: Zoroaster, Orpheus - Bibliographical fictions/Ghost names: Gelasius Cyzicenus (a name arising from bibliographical confusion, recorded as the author of an Ecclesiastical History), Alcadinus (the work attributed to him in some manuscripts is in fact by Peter of Eboli), Serapion iunior (the work of this hypothetical writer was identified as the translation of a treatise by ʿAbd al-Raḥmān ibn Wāfid) - Borderline cases between pseudonym & real name: Meffreth, Salomon Trismosin - Pseudonym ≠ Nickname ≠ Heteronym (unattested in the Middle Ages): - as an identity purposely taken for various reasons (e.g. to usurp another identity): - Plutarch - Pro nobilitate, a forgery by Arnoul Le Ferron published in Lyon, 1556 - Seneca philosophus (pretended author ; 1939-....) – Letter from Corsica, a forgery by Giovanni Galli published in Ajaccio, 1995
  10. 10. 1.4 Person - pseudonymity as established by philological criticism: Pseudo-Augustinus. Several pseudonyms distinguishable based on chronology: - Seneca philosophus (pretended author; 006.-009.) – author of the tragedy Octavia - Seneca philosophus (pretended author; 03..-03..) – author of Correspondence with Saint Paul - assumed by an author (individual or collective): - Real person unknown: Cercamon (<Cherche-monde>, <Court-le-monde>), Gasteblé, Jean Martin (Rabelais’ printer) - Real person known: François Rabelais = Alcofrybas Nasier (anagram) - Anonymous – no library catalogue encodes this kind of entity (as a result, sometimes partial encoding: Trois versions rimées de l'Évangile de Nicodème/ par Chrétien, André de Coutances et un anonyme): - Institutional anonymous – liturgical texts (Missale, Horae) - Literary anonymous – Ogier le Danois. IFLA (International Federation of Library Associations) maintains a list of the anonymous classics - Semi-anonymous (≈ appellation established by researchers): Anonymous of Bec
  11. 11. 1.4.1 Person attributes • Preferred form of the entity name (could be an entity per se) • Identifier assigned to the entity (could be an entity per se) • Variant forms: - alternative linguistic forms - acronyms: Mr G… D… P… = Paul Girardot de Préfond - abbreviated forms: A. F. de Fourcroy = Antoine-François Fourcroy - name in religion: Petrus Hispanus = Johannes XXI (pope ; 1220?-1277) - nickname: Longbeard for William Fitzosbert (11..-1196), Taillevent for Guillaume Tirel - honorifics: Doctor angelicus = Thomas Aquinas - historically attested forms/orthographical variants: NB: for early languages: problem for clustering the entities that come from several sources - ex. Simon Hayeneufve/ Hayneufve/ Haineuve/ Haie-Neufve
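The clustering problem noted for orthographical variants can be illustrated with a very rough normalization key. This is only a sketch under stated assumptions: the helper names `variant_key` and `cluster_variants` are hypothetical, the rules (y/i and fv/v treated as equivalent, later vowels dropped) are arbitrary choices that happen to suit the Hayeneufve example, and a key this crude will produce false matches on a real dataset.

```python
import re
import unicodedata
from collections import defaultdict


def variant_key(name: str) -> str:
    """Return a crude clustering key for one surname (hypothetical helper)."""
    # Strip accents and keep only the letters a-z.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    letters = "".join(ch for ch in decomposed if "a" <= ch <= "z")
    # Assumption: y/i and fv/v are interchangeable spellings.
    letters = letters.replace("y", "i").replace("fv", "v")
    if not letters:
        return ""
    # Keep the first letter, then drop every later vowel.
    return letters[0] + re.sub(r"[aeiou]", "", letters[1:])


def cluster_variants(names):
    """Group surnames that share the same key, preserving input order."""
    clusters = defaultdict(list)
    for name in names:
        clusters[variant_key(name)].append(name)
    return dict(clusters)


surnames = ["Hayeneufve", "Hayneufve", "Haineuve", "Haie-Neufve", "Martin"]
clusters = cluster_variants(surnames)
for key, members in clusters.items():
    print(key, members)
```

A key like this is best treated as a way to propose candidate clusters for a human to confirm, not as an identification in itself.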
  12. 12. 1.4.1 Person attributes • NB: Various customs for transliteration (languages with non-Latin alphabets) BnF: - Mésué, Yaḥya ibn Mâsawaik (l'ancien) - Ibn Māsawayh, Yaḥyā Abū Zakarīyā (0777?-0857) - Mesuë l'ancien, Yaḥya ibn Mâsawaih dit (dit aussi Jean de Damas et Jean Damascène) VIAF: http://viaf.org/viaf/112670997 - 17 preferred forms - 294 variant forms: Ben-Massawaih, Yohanna // Ibn Māsawayh, Yuḥannā // Ibn Māsūyah, Yūhannā // Johannes Mesue etc.
  13. 13. 1.4.1 Person attributes • Date & Place of birth/death • Dates & Places of residence • Dates & Places of activity: e.g. printer & his workshops (Hermann Liechtenstein: Treviso, Venezia, Vicenza) • Dates & Places of participation in events: councils, battles • Jurisdictional affiliation (diocese, archdeaconry) • Use standardized styles for dates: numerical (14..?-15..?) not alphanumerical (around 1500) or textual (beginning of the XVIth c.) • Use a regular syntax for place names: City (Region, Country). Include a URI from an authority file (Geonames etc.)
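One practical benefit of the numerical date style is that it can be read by a program. The sketch below assumes the convention visible in the examples on these slides: four positions, a dot for each unknown digit, an optional trailing question mark for uncertainty, and `....` for an open end. The function names `parse_year` and `parse_span` are hypothetical.

```python
import re

# One to four digits, then dots for the unknown digits, then an optional "?".
YEAR = re.compile(r"^(\d{1,4})(\.{0,3})(\?)?$")


def parse_year(text):
    """Turn '14..?' into (earliest, latest, uncertain) = (1400, 1499, True)."""
    match = YEAR.match(text.strip())
    if not match or len(match.group(1)) + len(match.group(2)) != 4:
        raise ValueError(f"not a four-position year pattern: {text!r}")
    digits, dots, mark = match.groups()
    earliest = int(digits + "0" * len(dots))
    latest = int(digits + "9" * len(dots))
    return earliest, latest, mark is not None


def parse_span(text):
    """Split 'start-end'; a missing or '....' side is returned as None."""
    start, _, end = text.partition("-")

    def side(part):
        part = part.strip()
        return None if part in ("", "....") else parse_year(part)

    return side(start), side(end)


print(parse_span("14..?-15..?"))
print(parse_span("15..-157.?"))
```

Alphanumerical or textual dates ("around 1500", "beginning of the XVIth c.") would need a separate, much less reliable parser, which is the argument for the numerical style.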
  14. 14. 1.4.1 Person attributes • Gender: male, female, unknown, other NB: unisex first name (Anne, Claude, Dominique) - Anne de Montmorency Dictionary of Medieval Names from European Sources: http://dmnes.org/names Anne = only feminine • Languages of written/oral expression • Titles: offices, titles of nobility, ecclesiastical titles, academic degrees • Profession/Occupation • Biographical notes • Roles with respect to an entity (work, expression, manifestation, or item)
  15. 15. 1.4.1 Person roles • Controlled Vocabularies: http://data.bnf.fr/vocabulary/roles/ • Create new roles depending on the dataset: - Chancellor - Keeper of the seal - Ambassador etc.
  16. 16. 1.5 The problem with homonyms • Enemies of the librarian: humidity, fire, rats, and… homonyms • Hundreds of Johannes • Use entity attributes in order to distinguish homonyms: - Dates: - Johannes Petrus (12..?-13..?) - Johannes Petrus (13..?-14..?) - Profession/Roles: - Johannes (physician) - Johannes (scribe) - Johannes (miniaturist) - Dates & Profession/Roles
  17. 17. 1.5 The problem with homonyms - Document shelfmark/ID etc.: - Johannes (attested by Paris, BnF, latin 3260 f. 23v) - Johannes (attested by Paris, BnF, latin 3260 f. 24r) Encode the inference or hypothesis that 2 occurrences of a name point to the same person
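As a minimal sketch of the two ideas on these slides (qualifying a name with attributes, and recording the hypothesis that two occurrences are the same person), the data could be held in structures like the following. The class and field names are hypothetical, and the certainty value and note are illustrative only; the ` ; ` separator follows the BnF-style headings quoted earlier in the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """One attested occurrence of a name (hypothetical structure)."""
    name: str
    dates: Optional[str] = None
    role: Optional[str] = None
    attested_by: Optional[str] = None

    def heading(self) -> str:
        # Build a qualified heading from whichever attributes are known.
        qualifiers = [q for q in (self.dates, self.role) if q]
        if self.attested_by:
            qualifiers.append(f"attested by {self.attested_by}")
        if not qualifiers:
            return self.name
        return f"{self.name} ({' ; '.join(qualifiers)})"


@dataclass(frozen=True)
class SamePersonHypothesis:
    """Explicit, reviewable claim that two occurrences are one person."""
    first: Occurrence
    second: Occurrence
    certainty: str
    note: str = ""


a = Occurrence("Johannes", attested_by="Paris, BnF, latin 3260 f. 23v")
b = Occurrence("Johannes", attested_by="Paris, BnF, latin 3260 f. 24r")
hypothesis = SamePersonHypothesis(a, b, certainty="Possible",
                                  note="illustrative value only")
print(a.heading())
print(Occurrence("Johannes Petrus", dates="12..?-13..?").heading())
```

Keeping the hypothesis as its own record, instead of silently merging the two occurrences, leaves the inference open to later revision.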
  18. 18. 1.6 Person responsibility • Attributions • Degrees of certainty: - Certain - Possible - Probable - Doubtful - Rejected - Sometimes attributed to - Wrongly attributed to - In the past attributed to
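The degrees of certainty listed here lend themselves to a closed vocabulary. The following is a sketch, not the presenters' model: the `Certainty` enum mirrors the list on the slide, while the `Attribution` class, the `accepted_authors` helper, and above all the decision to treat only "Certain" and "Probable" as accepted are assumptions. The example reuses the Pro nobilitate case from the earlier slide, and the certainty values assigned to it are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    CERTAIN = "Certain"
    POSSIBLE = "Possible"
    PROBABLE = "Probable"
    DOUBTFUL = "Doubtful"
    REJECTED = "Rejected"
    SOMETIMES_ATTRIBUTED = "Sometimes attributed to"
    WRONGLY_ATTRIBUTED = "Wrongly attributed to"
    FORMERLY_ATTRIBUTED = "In the past attributed to"


@dataclass(frozen=True)
class Attribution:
    work: str
    person: str
    certainty: Certainty


# Assumption: which degrees count as "accepted" is a project decision.
ACCEPTED = {Certainty.CERTAIN, Certainty.PROBABLE}


def accepted_authors(attributions, work):
    """Return the persons whose attribution for `work` is accepted."""
    return [a.person for a in attributions
            if a.work == work and a.certainty in ACCEPTED]


attributions = [
    Attribution("Pro nobilitate", "Plutarch", Certainty.WRONGLY_ATTRIBUTED),
    Attribution("Pro nobilitate", "Arnoul Le Ferron", Certainty.CERTAIN),
]
print(accepted_authors(attributions, "Pro nobilitate"))
```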
  19. 19. 1.7 Corporate bodies • Date first attested/created – Date last attested/disappeared • Associated place(s) (e.g. itinerant courts in medieval kingdoms) • Language(s) (e.g. French Royal Chancery used both French and Latin) • Field of activity (e.g. book trading, teaching, administration) • Several historical instances of an aggregate entity : Bibliothèque royale, Bibliothèque Nationale, Bibliothèque nationale de France • Several administrative divisions of an instance: BnF Department of Manuscripts, Department of Musical Collections, Department of Rare Books
  20. 20. 1.8 Relationships • Relationships between individual entities - Attributive (real person to whom entities have been falsely attributed: Pseudo-Brutus vs Marcus Junius Brutus) - Kinship (genealogical, consanguineal – parent/child, ritual - godparents) - Hierarchical (teacher/student) - Affective (friends) - Collaborative (co-writer) - Similar sociological condition: social exclusion (banned people, authors whose works were listed in the Index librorum prohibitorum)
  21. 21. 1.8 Relationships • Relationships between individual and collective entities - Hierarchical (membership, spiritual affiliation) - Founding (Mazarin vs Mazarine Library) • Relationships between collective entities - Genealogical (family to family: Bourbon vs Bourbon-Condé) - Hierarchical (library vs university: Library of the College of Sorbonne vs College of Sorbonne) - Sequential (Bibliothèque royale, Bibliothèque nationale, Bibliothèque nationale de France) - Political alliances
  22. 22. 1.9 Examples of prosopographical databases ❏ Trismegistos: portal of papyrological and epigraphical resources in the Ancient World • distinguishes between access points for Names and Persons, each having its own identifier: Name: Apollonios www.trismegistos.org/name/1 = 6428 attestations = 3896 Persons (e.g. Apollonios www.trismegistos.org/person/5304 ) • variant forms of the name also have their own identifier: Apollonios – 86 variants (Coptic, Egyptian, Greek, Latin) • Index of ghost names = “all personal names that have been read by editors of papyri, but are in fact non-existent, i.e. do not occur in the current onomastical lexica or in the published papyri”
  23. 23. 1.9 Examples of prosopographical databases
  24. 24. 1.9 Examples of prosopographical databases
  25. 25. 1.9 Examples of prosopographical databases • Attributes: - Name - Identifier - Sex - Ethnicity - Function - Dates birth/death - Provenance • Relationships: - Familial (father/mother)
  26. 26. 1.9 Examples of prosopographical databases ❏ Prosopography of Anglo-Saxon England: access to structured information relating to all the recorded inhabitants of Anglo-Saxon England from the late 6th to the late 11th c. Attributes for Persons: • Name • Gender / Institution: Female; Male; M/F (anonymous collective designations: people, women and children); Institution; Undefined • Status: Apostle, Burgess, Captive, Comes, Companion, etc. • Office: Abbot, Archdeacon, Bishop, Cancellarius, Dean, etc.
  27. 27. 1.9 Examples of prosopographical databases • Occupation: Artisan, Cook, Falconer, Fierd, etc. • Personal Information: Ethnicity; Language competence; Reputation; Stated health (“Theodore Archbishop of Canterbury, 668-690: Stephen.VitWilfridi 43 (p. 86) (He was troubled by frequent ill-health in advanced old age.)”) • Relationship: - Affinal kinship: brother-in-law, husband, widow - Consanguineal kinship: aunt, brother, daughter - General relationship: beloved, companion, disciple - Generic kinship: ancestor, kinsman, kinswoman - Honorific kinship: brother, famulus, son, etc. • Education: - As student or learned person - As teacher or instructor
  28. 28. 1.9 Examples of prosopographical databases ❏ Dictionary of English Writers (1300-1600) Data structure and entities encoded: • Author - Attributes: Dates of birth/death; Dates of activity; Region and Place of origin; Initial scholarship (dates, place, institution) • Friendship network • Library • Companies • Correspondence • Familial relationships – controlled vocabulary (daughter, son, etc.) • University degree • Frequented Inn • Social level • Justice • Polemics • Politics
  29. 29. 1.9 Examples of prosopographical databases ● Writings – controlled vocabulary (astrology, autobiography, etc.) ● Profession – controlled vocabulary (administration, teacher, etc.) ● University activities ● Religious positions – controlled vocabulary (abbot, archdeacon, etc.) ● Religious confession (catholic, protestant) ● Political network ● Services – controlled vocabulary (chamberlain, confessor, etc.) ● Academic societies ● Frequented university ● Name variants ● Voyages Ex.: FITZRALPH Richard List of the persons not included for different reasons (ghost names like John Boston, erroneous attributions etc.)
  30. 30. 1.9 Examples of prosopographical databases ❏ Kindred Britain – c. 30,000 individuals in contextual networks • Attributes: name; dates birth/death • Relationships: - Genealogical: ancestry, descent, siblinghood, marriage - Professional: - Arts and Humanities - Business, Finances - Diplomacy, Civil service - Fashion, Crime, Society, Travel - Monarchy and Court - Military - Religion - Science, Engineering - Teaching, Scholarship
  31. 31. 2. Prepare the data for interoperability • Structure your data and publish them in a machine-readable format: XML family • Check for existing models/ ontologies: foaf, SNAP, BIO, Relationship, RDA Relationships • Avoid duplicates: - Alphonse II (1185-1233 ; roi du Portugal) = Alphonse II (roi de Portugal ; 1185-1223) - Marguerite de Parme (1522-1586) = http://catalogue.bnf.fr/ark:/12148/cb14974379s = http://catalogue.bnf.fr/ark:/12148/cb12089957q = http://catalogue.bnf.fr/ark:/12148/cb134857095 • Use persistent URIs (Uniform Resource Identifier) for your data and a system capable of managing modifications to your dataset (e.g. deletion/fusion/splitting of records) • Use existing standards to encode attributes (ISO 233/843 etc. for transliteration of Arabic/Greek etc., ISO 639 for language codes, ISO 3166 for country codes)
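The recommendation to keep URIs persistent while records are merged can be sketched as a small registry that never forgets an identifier and redirects a merged one to its survivor. This is an illustration under assumptions, not a description of any existing system: the `Registry` class is hypothetical, the identifiers are the final segments of the three BnF ARKs quoted on the slide, and the choice of which record survives the merge is arbitrary.

```python
class Registry:
    """Toy identifier registry: merged records redirect, never disappear."""

    def __init__(self):
        self._records = {}    # live identifier -> label
        self._redirects = {}  # retired identifier -> identifier that replaced it

    def add(self, identifier, label):
        if identifier in self._records or identifier in self._redirects:
            raise ValueError(f"identifier already used: {identifier}")
        self._records[identifier] = label

    def resolve(self, identifier):
        """Follow redirects until a live record is reached."""
        while identifier in self._redirects:
            identifier = self._redirects[identifier]
        if identifier not in self._records:
            raise KeyError(identifier)
        return identifier

    def merge(self, duplicate, kept):
        """Retire `duplicate` and make it point to `kept`."""
        duplicate, kept = self.resolve(duplicate), self.resolve(kept)
        if duplicate == kept:
            return
        del self._records[duplicate]
        self._redirects[duplicate] = kept

    def label(self, identifier):
        return self._records[self.resolve(identifier)]


registry = Registry()
for ark in ("cb14974379s", "cb12089957q", "cb134857095"):
    registry.add(ark, "Marguerite de Parme (1522-1586)")

# Assumption: the first record is the one that is kept.
registry.merge("cb12089957q", "cb14974379s")
registry.merge("cb134857095", "cb12089957q")
print(registry.resolve("cb134857095"))
```

Anyone who cited a retired identifier still reaches the surviving record, which is the property that matters for external links.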
  32. 32. 2. Prepare the data for interoperability • Align your data with other LOD: - generic: VIAF provides links to BnF/ LoC/ DNB/IdRef/BNE/wikidata/ISNI/CERL NB: ! VIAF does not have persistent URIs (risk of reattribution from time to time) ! CERL has not clustered all the identical records: Guillaume Cavelier (1658?-1727?): https://thesaurus.cerl.org/cgi-bin/record.pl?rid=cni00010035 = https://thesaurus.cerl.org/cgi-bin/record.pl?rid=cnp01335989 = https://thesaurus.cerl.org/cgi-bin/record.pl?rid=cni00047058 = https://thesaurus.cerl.org/cgi-bin/record.pl?rid=cni00034054 - specialized: Trismegistos, Pleiades, Typenrepertorium der Wiegendrucke
  33. 33. 2. Prepare the data for interoperability Reuse and Share: SPARQL query to extract the biographical information from the data.bnf.fr dataset for the printers Enguilbert de Marnef, Jean de Marnef and Pasquier Bonhomme. Endpoint: http://data.bnf.fr/sparql Query: PREFIX skos: <http://www.w3.org/2004/02/skos/core#> SELECT ?prefLabel ?note WHERE { {<http://data.bnf.fr/ark:/12148/cb15046491c> skos:prefLabel ?prefLabel; skos:note ?note} UNION {<http://data.bnf.fr/ark:/12148/cb16690896z> skos:prefLabel ?prefLabel; skos:note ?note} UNION {<http://data.bnf.fr/ark:/12148/cb12389091x> skos:prefLabel ?prefLabel; skos:note ?note} }
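A query of this kind can also be generated and sent from a script. The sketch below is a variation, not the query from the slide: it replaces the three UNION branches with a single SPARQL 1.1 `VALUES` clause, which should return the same bindings and is easier to extend. The function names are hypothetical, and it is an assumption that the endpoint is still reachable at this address and answers a GET request with JSON results when asked through the `Accept` header.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://data.bnf.fr/sparql"

# The three printer URIs given on the slide.
PRINTERS = [
    "http://data.bnf.fr/ark:/12148/cb15046491c",
    "http://data.bnf.fr/ark:/12148/cb16690896z",
    "http://data.bnf.fr/ark:/12148/cb12389091x",
]


def build_query(uris):
    """Build one SELECT query covering every URI in `uris`."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?prefLabel ?note WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        "  ?person skos:prefLabel ?prefLabel ;\n"
        "          skos:note ?note .\n"
        "}"
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


query = build_query(PRINTERS)
print(query)
# To execute against the live endpoint (network access required):
# for row in run_query(query):
#     print(row["prefLabel"]["value"], "|", row["note"]["value"])
```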
  34. 34. 3. Link the data to a visual library
  35. 35. iiif.io
  36. 36. A Community that develops Shared APIs, implements them in Software, and exposes interoperable Content
  37. 37. IIIF Vision Create a global framework by which image-based resources (images, books, maps, scrolls, manuscripts, musical scores, etc.) …from any participating institution can be delivered in a standard way …via any compatible image server …for display, manipulation and annotation in any application, …to any user on the Web.
  38. 38. A Community that develops Shared APIs, implements them in Software, and exposes interoperable Content
  39. 39. Global Community 41 consortium members 100+ organizations involved
  40. 40. International Leaders • Museums / Galleries: British Museum, National Gallery of Art, The J. Paul Getty Trust, The Walters Art Museum, Yale Center for British Art, et al. • Aggregators: ARTstor, CONTENTdm, DPLA, Europeana, Internet Archive, Wikimedia Foundation, Biblissima • State / National Libraries: Austria, Bavarian State, British Library, Denmark, Egypt, France, Israel, Moravian Library, New Zealand, Norway, Poland, Scotland, Serbia, Wales, Vatican, Qatar, United States (LoC) • Universities and Research Institutions: Cambridge, Cornell, Ghent, Göttingen, Harvard, Oxford, Princeton, Stanford, Edinburgh, Toronto, Wellcome Trust, Yale • And many more!
  41. 41. Community • 6 Community Groups: A/V Technical Specification; Discovery Technical Specification; Manuscripts Community; Museums Community; Newspapers Community; Software Developers Community • 11 Working Group Meetings: 1. Cambridge, Sept 2011 2. The Hague, April 2012 3. Edinburgh, July 2012 4. Paris, May 2013 5. Copenhagen, February 2014 6. London, October 2014 7. Washington DC, May 2015 8. Ghent, November 2015 9. New York City, May 2016 10. The Hague, October 2016 11. The Vatican, June 2017 • 30+ participants on open community calls every 2 weeks
  42. 42. A Community that develops Shared APIs, implements them in Software, and exposes interoperable Content
  43. 43. IIIF: Two Core APIs • Image API: “get pixels” via a simple web service • Presentation API: just enough metadata to drive a remote viewing experience
  44. 44. IIIF Image API https://example.com/{id}/{region}/{size}/{rotation}/{quality}.{fmt} CC-BY IIIF Consortium and Community
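The URL template above can be filled in with a few lines of code. This is a minimal sketch, assuming a hypothetical server at `https://example.com` and a hypothetical identifier `page-39`; the function name is also an assumption. The defaults (`full`, `0`, `default`) follow version 2 of the Image API; version 1.x servers use `native` as the quality, and version 3 uses `max` instead of `full` for the size.

```python
import urllib.parse


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg",
                   encode_identifier=True):
    """Assemble {base}/{id}/{region}/{size}/{rotation}/{quality}.{fmt}."""
    if encode_identifier:
        # The specification asks for reserved characters in the identifier
        # (including "/") to be percent-encoded; some servers accept them raw.
        identifier = urllib.parse.quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{identifier}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Whole image, server defaults.
print(iiif_image_url("https://example.com", "page-39"))

# A rectangular region (x,y,w,h in pixels) scaled to a height of 800 pixels.
print(iiif_image_url("https://example.com", "page-39",
                     region="423,1322,1365,1135", size=",800",
                     quality="native"))
```

Because every parameter lives in the path, a region of an image can be cited, shared, or embedded as an ordinary link.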
  45. 45. IIIF Presentation API Core concepts to remember: A Manifest...: ➔ is “just enough metadata for viewing” ➔ represents the digital surrogate of a physical object ➔ is what a IIIF viewer loads to display an object ➔ contains one or more Sequences of Canvases CC-BY IIIF Consortium and Community
  46. 46. IIIF Presentation API A Canvas...: ➔ is a virtual container for content, an abstract space onto which we “paint” content ➔ is the target of annotations used to associate content with it (images, texts, links, videos…) CC-BY-NC-SA IIIF Consortium and Community
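The Manifest, Sequence and Canvas hierarchy described on these two slides maps directly onto nested JSON. The sketch below walks a hand-written, heavily abridged manifest in the shape used by version 2 of the Presentation API (`sequences`, then `canvases`); every URI and label in it is a made-up placeholder, and the helper name `list_canvases` is hypothetical.

```python
def list_canvases(manifest):
    """Return (canvas URI, canvas label) pairs in reading order."""
    canvases = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            canvases.append((canvas.get("@id"), canvas.get("label")))
    return canvases


# Abridged, hypothetical manifest: real ones also carry image resources,
# dimensions, metadata and attribution.
manifest = {
    "@id": "https://example.org/iiif/ms1/manifest",
    "@type": "sc:Manifest",
    "label": "Example manuscript",
    "sequences": [
        {
            "@type": "sc:Sequence",
            "canvases": [
                {"@id": "https://example.org/iiif/ms1/canvas/f1r",
                 "@type": "sc:Canvas", "label": "f. 1r"},
                {"@id": "https://example.org/iiif/ms1/canvas/f1v",
                 "@type": "sc:Canvas", "label": "f. 1v"},
            ],
        }
    ],
}

for uri, label in list_canvases(manifest):
    print(label, uri)
```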
  47. 47. IIIF Presentation API
  48. 48. IIIF: Three More APIs • Authentication API: support login and differential access to resources • Search API: search within an object, such as the full text of a book or newspaper • A/V API: deliver time-based media (audio, video)
  49. 49. A Community that develops Shared APIs, implements them in Software, and exposes interoperable Content
  50. 50. Compatible Software: IIPImage, IIPMooViewer, digilib, FSI Server, Mirador, Internet Archive BookReader, FSI Viewer, Leaflet JS, Universal Viewer
  51. 51. A Community that develops Shared APIs, implements them in Software, and exposes interoperable Content
  52. 52. IIIF-compatible Content 100+ organizations 345+ million images
  53. 53. IIIF-compatible Repositories (especially useful to find medieval and Renaissance content) • Gallica (BnF) • Biblioteca Apostolica Vaticana • Bavarian State Library (BSB) • Internet Archive • Universität Heidelberg • Harvard University • Bodleian Libraries, Oxford • e-codices • BVMM (IRHT-CNRS) Still under construction: • British Library, Cambridge University, Parker on the Web (Stanford), etc.
  54. 54. What’s in it for me? As an end-user, what can I do with IIIF?
  55. 55. Deep zoom
  56. 56. Compare (across digital libraries)
  57. 57. Compare (across digital libraries) Gallica Internet Archive
  58. 58. Import to viewers (drag and drop)
  59. 59. Cite and share (full images or regions of images) gallica.bnf.fr/iiif/ark:/12148/btv1b8446958b/f39/423,1322,1365,1135/,800/0/native.jpg
  60. 60. Reunite, reconstruct (mutilated manuscript)
  61. 61. Reunite, reconstruct (mutilated manuscript)
  62. 62. Reunite, reconstruct (former personal collection) Florus of Lyon’s library (9th century) (autograph manuscripts or those annotated by his hand)
  63. 63. Reunite, reconstruct (medieval library) Historical collection of the abbey of Saint-Benoît de Fleury (650-1791)
  64. 64. Annotate
  65. 65. IIIF use case: The Biblissima portal
  66. 66. The Biblissima portal in a nutshell ➔ Focus: history of collections / transmission of texts in the Middle Ages and the Renaissance ➔ aggregates specialized data on medieval manuscripts and early printed books ➔ search, browse, visualize beta.biblissima.fr
  67. 67. (Heidelberg, Universitätsbibliothek, Cod. Pal. lat. 864)
  68. 68. IIIF Collection URL
  69. 69. Page about the abbey of Saint-Germain-des-Prés in the Biblissima portal
  70. 70. Implement a visual library about an entity ➔ strengthen the documentary identity of a person/organization ➔ enrich textual metadata with visual elements ➔ join together available images: ◆ common formats (jpg, png etc.) ◆ and IIIF-compliant images
  71. 71. Page about the abbey of Saint-Germain-des-Prés in the Biblissima portal Collection of examples of the abbey’s ex-libris? (codicology) Collection of illuminations depicting the abbey? (iconography)
  72. 72. Mirador, populated with a collection of documents
  73. 73. Select a document to view
  74. 74. Identify an ex-libris on folio 2 recto
  75. 75. Annotate the ex-libris (transcribe)
  76. 76. Annotate the ex-libris (indexing)
  77. 77. Annotate the ex-libris • This autocomplete list of tags could be dynamically populated by requesting external web services or by querying a local project-based authority file • This basic form could be extended to record other bits of metadata in the annotation
  78. 78. Annotate the ex-libris (saving) Save the user input into a remote database along with additional data giving the context of the annotation: ● URI and label of the Canvas that is being annotated ● Image coordinates (xywh) ● URL of the Manifest and the entire manifest data itself
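The record described on this slide could be assembled along the following lines before being sent to the database. The slides do not show the actual storage format, so every field name here is hypothetical, as are the URIs, the coordinates and the placeholder transcription; only the `#xywh=x,y,w,h` media-fragment syntax for pointing at a region of a Canvas is standard.

```python
import json


def build_annotation_record(canvas_uri, canvas_label, xywh,
                            manifest_url, manifest_data,
                            transcription, tags):
    """Bundle the user input with the context of the annotation."""
    x, y, w, h = xywh
    return {
        "transcription": transcription,
        "tags": list(tags),
        # Region of the Canvas, expressed as a media fragment.
        "target": f"{canvas_uri}#xywh={x},{y},{w},{h}",
        "canvas": {"uri": canvas_uri, "label": canvas_label},
        # Keeping the manifest itself allows later reuse of its metadata.
        "manifest": {"url": manifest_url, "data": manifest_data},
    }


record = build_annotation_record(
    canvas_uri="https://example.org/iiif/ms1/canvas/f2r",
    canvas_label="f. 2r",
    xywh=(423, 1322, 1365, 1135),
    manifest_url="https://example.org/iiif/ms1/manifest",
    manifest_data={"@type": "sc:Manifest", "label": "Example manuscript"},
    transcription="[transcription of the ex-libris]",
    tags=["ex-libris"],
)
payload = json.dumps(record, ensure_ascii=False, indent=2)
print(payload)
```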
  79. 79. Search and browse the annotations: simple PHP page based on this SPARQL query (for demo purposes only)
  80. 80. Display an ex-libris annotation • Image region (xywh coordinates), requested with the Image API URL • Manifest label (shelfmark) / Canvas label (folio) • Attribution (rights information) • Open in Mirador to view the ex-libris in the context of the full document
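Turning stored xywh coordinates back into an image request is a small string operation. The sketch below assumes that the Canvas and the underlying image have the same pixel dimensions (otherwise the coordinates would first have to be scaled), that the server implements version 2 of the Image API, and that the image service URL is known; the function name and all URIs are hypothetical.

```python
import urllib.parse


def region_url(image_service, target, width=None):
    """Build an Image API URL for the region named in a '#xywh=' target."""
    fragment = urllib.parse.urlparse(target).fragment
    if not fragment.startswith("xywh="):
        raise ValueError(f"no xywh fragment in {target!r}")
    x, y, w, h = (int(value) for value in fragment[len("xywh="):].split(","))
    # "800," asks for a width of 800 pixels with the height scaled to match.
    size = f"{width}," if width else "full"
    return f"{image_service.rstrip('/')}/{x},{y},{w},{h}/{size}/0/default.jpg"


print(region_url(
    "https://example.org/iiif/img-f2r",
    "https://example.org/iiif/ms1/canvas/f2r#xywh=423,1322,1365,1135",
    width=800,
))
```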
  81. 81. Ex-libris annotation in context, in Mirador
  82. 82. Basic principles of this approach ➔ use image annotations as a starting point to collect data and index visual elements on a page ➔ maintain the link between images and textual metadata ➔ reuse existing metadata about the object
  83. 83. Mirador: List view. Monograms of Simon Hayeneufve, in Mirador. Local static images on your hard drive, imported into Mirador Desktop.
  84. 84. Mirador: List view Annotate a monogram
  85. 85. Mirador: List view
  86. 86. Credits This presentation reuses some slides taken from the following presentations: • Introduction to IIIF, Tom Cramer (2017 IIIF Conference, Vatican, 06/06/17) • Welcome and State of the IIIF Universe, Sheila Rabun (2017 IIIF Conference, Vatican, 06/06/17) Reused slides: #36, #37, #39, #40, #41, #43, #48, #50, #52, #55 The presentation also includes a few images taken from the IIIF specifications. The license is indicated on each slide with the mention “CC-BY-NC-SA IIIF Consortium and Community”.
  87. 87. Thank you! Eduard FRUNZEANU Régis ROBINEAU Pool Biblissima Équipex Biblissima biblissima.fr / @biblissima
