Standardize XML encoding of medieval charters using TEI
1. How to standardize XML-
encoding of medieval
and early modern charters and
instruments?
TEI workshop
“Putting the TEI to the test”
Berlin April 25th, 2007
2. Topics
• state of activities in medieval and early
modern charters projects
• some examples of differences between
CEI and TEI
3. state of activities
• 300 years of diplomatics
• 30 years of digital diplomatics
• 3 years of Charters Encoding Initiative
4. Charters on the web
• currently ca. 100 projects
• mainly in Europe
• ranging between
– 94 (Critical Edition of the Charters of the
Arnulfinger)
and
– 20,000+ (MOnasteriuM)
charters
5. Different approaches …
• archival description, finding aids, registers
• calendar / regesta
• retrospective digitization of printed charter
editions
• ongoing work on new digital editions
=> different requirements
6. … but a common object
• „The Charter“
• => cross-project object identification
– (charter0 in archive =) charter1 in medieval
cartulary = charter2 in 17th cent. print = charter3
in a calendar = charter4 in finding aid = charter5
in a modern edition = charter6 as digital
facsimile
• why not use TEI? – as all projects produce
“text“ …
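The idea of cross-project object identification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the project labels, and the shared identifier `charter-0001` are hypothetical and not part of any CEI or TEI proposal.

```python
from collections import defaultdict

# Hypothetical records: each project describes its own representation of a
# charter and points to a shared, project-independent charter identifier.
representations = [
    {"project": "archive finding aid", "local_id": "fa-17", "charter": "charter-0001"},
    {"project": "17th cent. print", "local_id": "p. 212", "charter": "charter-0001"},
    {"project": "modern edition", "local_id": "no. 5", "charter": "charter-0001"},
    {"project": "calendar", "local_id": "reg. 88", "charter": "charter-0002"},
]

def group_by_charter(records):
    """Collect all representations that claim to describe the same charter."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["charter"]].append((record["project"], record["local_id"]))
    return dict(grouped)

grouped = group_by_charter(representations)
for charter, items in grouped.items():
    print(charter, items)
```

The point of the sketch is only that the charter, not the individual representation, carries the identity that all projects share.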
7. considerations
• some projects are older than the TEI and thus have
other approaches (DEEDS)
• some are not aware of markup (using relational
databases, pure html, pdf, etc.)
• some only want relatively flat encoding, for which
„quasi-standard“ TEI is sufficient (dMGH)
• some believe the TEI tag set meets their needs (EdC)
• some consider the TEI inappropriate and develop their
own schema (CDLM)
• many would like to use an extended TEI tag set (MOM,
CEI)
8. „Pur nel rispetto degli standard disponibili e riconosciuti
(XML lo è), è parso utile rinunciare al modello TEI, e
dar vita a un’autonoma e agile struttura di codifica dei
dati testuali. Nella consapevolezza […] che il prodotto (il
‘modello’) andrà progressivamente perfezionato […] mi è
capitato in altre circostanze di indicare nel Vocabulaire
International de la Diplomatique lo strumento ideale per
avviare una discussione e un confronto concreti
all’interno della comunità dei diplomatisti, lo strumento
da cui partire per fissare i requisiti di una Document
Type Definition (DTD) da impiegare nell’edizione
elettronica di testi documentari; che, in sostanza, sia per
la comunità degli studiosi ed editori di fonti
d’archivio medievali ciò che TEI è per filologi e
letterati.”
Michele Ansani (http://cdlm.unipv.it/progetto/codifica-xml), Codice
diplomatico della Lombardia medievale, Documentazione tecnica
9. CEI
• founded in 2004
• diplomatists from Europe and North America
• scholars, librarians, archivists, information
scientists
• aims at making data exchange easier: standard?
controlled vocabulary? ontology? DTD/schema
proposals?
• continuous meetings – mailing list (cei-l@lists.lrz-
muenchen.de) – experiments with the current
version (editMOM, DS) that could be called CEI.0
10. Questions to the TEI
• possibilities for charter-specific elements?
• better encoding of material and visual
aspects of charters?
• integration of the EAD-model?
15. Typical structure of an
„Urkundenbuch“ (charters edition)
• number
• charter header
– <cei:abstract>, <tei:creation>
– <tei:textWitList>
• <cei:archIdentifier>, <cei:transmissionStatus>
– <tei:listBibl type=”facsimilia|prints|regesta|studies”>
– <cei:diplomaticAnalysis>
• text with apparatuses
• charter (chDesc*, tenor*)
16. Archives
• EAD is dominant
• LEADERS as one attempt to bring EAD and TEI
together:
– separate TEI transcription text linked to a standard
EAD description (http://leaders.sourceforge.net/)
• We would prefer:
– make an archival description possible with TEI
• msDesc, fileDesc, idno … are good starting
points
18. Main differences between CEI and
TEI
• CEI prefers explicit charter specific rather
than generic TEI elements
• CEI adds some concepts to the TEI
• CEI changes syntax and meaning of some
TEI elements slightly
• CEI keeps nesting open
=> CEI proposes encoding that is charter
specific and that is aware of the physical
and visual characteristics of the charter.
19. But …
• It is our hope that the CEI will one day be
a TEICharters-Module
• Thank you for your attention!
Georg Vogeler (g.vogeler@lmu.de)
Charters Encoding Initiative (http://www.cei.lmu.de)
Editor's Notes
I have to apologize for my English, and to thank Barbara Levergood from the State Library in Göttingen, who gave me substantial support with the English.
And I have to thank Patrick Sahle, who is in some ways co-author of this presentation, although other duties didn't allow him to be here.
Why all those logos at the bottom? Because my work is supported not only by the universities in Lecce and Munich, where I am working at the moment, but also by the colleagues at the chair of Historical Auxiliary Sciences in Munich and the Institute for Documentology and Editorial Sciences.
Approximately: Mabillon, end of the 17th century; De re diplomatica, 2nd ed. 1709.
Diplomatics is an old discipline that has developed its methods since the beginning of the 18th century. Its methodology went through some major changes in the 19th century (Sickel: Diktatvergleich). It has always been open to technical developments (copperplate engraving, photography), and thus in the 1970s the diplomatists (especially the French) shared the interest in informatics that was current among scholars of medieval history. (Informatique et histoire médiévale, ed. by Lucie Fossier, A. Vauchez and C. Violante, Actes du Colloque de Rome, 20-22 mai 1975, Roma 1977 (Publications de l'École Française de Rome 31).)
The growing number of attempts to bring medieval charters onto the web led to the CEI founding meeting in April 2004. The initiative is at the moment supported by the « Commission Internationale de Diplomatique », and its work is based on the old traditions. I will give you some more details about the initiative later.
=> I'm talking about well-established scholarly concepts and the efforts to bring them into the digital world.
requirements:
Archival description: aims to locate a charter and submit it to the historian; part of general archival management (material object).
Calendar: historians want brief information about the content of this important kind of source (event).
Charter editions: libraries have printed charter editions on their shelves and digitize them (page images, book).
Ongoing work: diplomatists compile a corpus (usually by issuer, but also by region or subject) and make a sound critical edition with indexes, comments on the content (citations, identification of persons and places), diplomatic introductions (discrimen veri ac falsi), and studies of the history of the producing chancery.
DEEDS: based on a database of phrases, this project has led to a large text archive that can be converted into different encodings by means of overlaid layers.
Projects like the „Preußisches Urkundenbuch“ (Prussian Charter Book) are ambitious but haven't discovered XML yet.
Many projects reduce their encoding to simple phenomena (page, paragraph, division) and use the TEI (dMGH, RI) without exploiting it fully, and without getting closer to the common aim of identifying single charters across several presentations.
There are projects that use the TEI for quite deep charter encoding. These projects seem to be satisfied with the possibilities that Roma and ODD schemas offer. In part they forgo semantic charter markup or the description of scribal or codicological details; in part they use generic elements to encode specific information (EdC once used <bibl> for an archival shelfmark; other examples?).
Others see similarities to the TEI but have built their own schemas. That's not a great surprise, as charter editions have old and well-established structures that can easily be mapped to an XML structure. The main example here is the CDLM (with about 13,000 charters).
Current projects are interested in the TEI as a basis for encoding, but don't want to bend it to fit charter encoding requirements. They are interested in extending the TEI to adapt it to their needs; thus they try to reconcile the last two positions.
To make the extreme position of the CDLM clearer, I would like to quote a statement by the head of this project, Michele Ansani.
Ask Arianna for an ad hoc translation?
English: While respecting the available and recognized standards (XML is one), it seemed useful to abandon the TEI model and to create an autonomous and agile structure for the encoding of textual data. Aware […] that the product (the „model“) will have to be improved progressively […] I have on other occasions pointed to the Vocabulaire International de la Diplomatique (VID) as the ideal instrument for starting a concrete discussion and exchange within the community of diplomatists; the instrument from which to start in order to establish the requirements for a Document Type Definition (DTD) to be used for the electronic edition of documentary texts; a DTD that, in essence, would be for the community of scholars and editors of medieval archival sources what the TEI is for philologists and literary scholars.
Anyone who wants to edit charters must first of all discard the TEI!
Then they should use the Vocabulaire to develop a DTD from it,
in order finally to arrive at an encoding that does for editors of medieval archival material what the TEI does for philologists.
That is to say:
We start with the needs of diplomatic work – and then see how far this can be done with the TEI tag set
Well, the CEI has taken up the challenge of this statement
But the CEI isn't so strict as to say: the TEI is useless for diplomatic encoding.
Possibilities for charter-specific elements?
Diplomatic Discourse
Authentication
Slight changes of existing elements?
Better encoding of material and visual aspects of charters?
Graphical elements?
chDesc with deeper descriptions?
Encoded Archival Description
To keep the archival community on board
For the period between 500 and 1300 I would assume that liturgical texts and charters together form more than three quarters of the written cultural heritage.
A manuscript, but usually a single sheet, not a book.
And: a legal text that becomes valid through its visual appearance and physical properties: signs and seals as means of authentication.
One of the crucial issues for diplomatists is therefore: forgery? (discrimen veri ac falsi): external characteristics (characteristics of the parchment, folding, how the seal is applied, erasures, characteristics of script, layout) compared to other charters produced in the same chancery.
=> The diplomatist is more interested in the visual and physical characteristics of a charter than other people working with manuscripts (apart from codicologists and paleographers).
=> It is crucial to mark up the means of authentication.
Some of these are already part of the TEI, some are not.
Markup of special signs like <monogram>, <notarialSign>, <chrismon> …
Luciana:
Authentication is the act or mode of giving authority or legal authenticity to a record, instrument, statute, or a certified copy thereof. These are the primary methods of authentication:
an act or instrument may be executed and acknowledged by the signers before a notary or public officer authorized to fulfill such a function (in this case the notary witnesses the instrument)
an act or instrument may be executed by a notary and certified by its affidavit or special sign (i.e. the notary is the author)
an act or instrument may be executed by the author before witnesses and acknowledged by their signatures ("sealed, signed and delivered in front of ...")
an act or instrument issued by a public authority may be acknowledged by the signum manus of such authority (e.g. the R in Rex written by the hand of British Kings)
an act or instrument may be testified by a seal (attached, appended or embossed). The digital signature (which MoReq2 specifically discusses) is a seal, not a signature. It is functionally equivalent to medieval seals, which were not only a means of verifying the origin of the record and the fact that it was intact, but also made the record indisputable and incontestable, that is, had a non-repudiation function. The analogy is not perfect, because the medieval seal was associated exclusively with a person, while the digital signature is associated with a given person and a specific record, and because the former is an expression of authority, while the latter is only a mathematical expression.
an act or instrument may be authenticated, after having been issued, by the authority of a competent magistrate who attests that the instrument is in due form of law
a copy of an act or instrument may be certified as authentic by the officer keeping or preserving the act or instrument as part of his duties
an act or instrument may be authenticated by introduction of evidence sufficient to sustain a finding that it is the entity that the proponent claims it to be
Besides the external characteristics, the diplomatists are interested in the „internal characteristics“, i.e. the linguistic characteristics of charters.
That is the „diplomatic discourse“, the structure of charters: <invocatio> <intitulatio> <salutatio> <prooem> <narratio> <dispositio> <dispWord> <clausulae> <curse> <subscriptions> …
For example … Arianna, you may forgive me for taking this picture from your presentation in Munich. But I think it shows very well the relationship between the TEI and the things I'm talking about: we have here several elements like <curse>, <invocation>, <dispWord> that mark linguistic parts of the charter text, and we see how they could be mapped to TEI: <seg type="curse"> etc.
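The mapping from charter-specific elements to generic TEI segments could be sketched roughly as follows. This is a minimal illustration, not a CEI tool: the element names are taken from the notes, but treating them as un-namespaced, the wrapping `<tenor>` sample, and the short Latin phrases are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# Charter-specific element names mentioned in the notes; handling them
# without namespaces is an assumption of this sketch.
DISCOURSE_ELEMENTS = {"invocatio", "intitulatio", "narratio",
                      "dispositio", "dispWord", "curse"}

def to_generic_tei(root):
    """Rename charter-specific elements to <seg type="..."> in place.

    An existing type attribute on such an element would be overwritten.
    """
    for element in root.iter():
        if element.tag in DISCOURSE_ELEMENTS:
            element.set("type", element.tag)
            element.tag = "seg"
    return root

# Made-up sample text, for illustration only.
sample = ("<tenor><invocatio>In nomine domini amen.</invocatio> "
          "<curse>Si quis ...</curse></tenor>")

root = to_generic_tei(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

The reverse direction (from `<seg type="curse">` back to `<curse>`) works the same way, which is why the choice between generic and specific elements is largely a question of explicitness rather than of information content.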
As with the diplomatic discourse, the question is whether one should use generic elements or charter-specific elements. The CEI would add to the TEI some semantic encodings that result from diplomatic tradition:
Thus: since the TEI has semantic elements like author, publisher, pubDate, settlement etc., why not have issuer, witness, notary …?
You might argue that some of these elements are part of the metadata in the teiHeader, and that leads me to the next question: couldn't text encoded with the TEI itself represent metadata? Isn't the msDesc „metadata“ to the text of the ms? I will try to show it.
#### Unimportant ###
Date: creation = contains information about the creation of a text. Issuing = creation???
Datum et actum …
What does the usual diplomatic text look like?
It's identified by a number in the corpus.
It's described by a header giving a short abstract, the date and place of issue, information on the textual tradition including archival identification, and an annotation whether the textual witness is the original or a copy.
The header lists four kinds of bibliographic references (and the CEI would like to suggest extending the <listBibl> element with a type attribute, as the types of lists mentioned occur in scholarly editions of charters as distinct parts, often without any explicit indication of their scope).
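How a typed `<listBibl>` would make the four kinds of references addressable can be sketched as follows. The type values come from the slide on the structure of an „Urkundenbuch“; the wrapping `<chDesc>` sample, the omission of namespaces, and the placeholder entries are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative charter header with one typed list per kind of reference.
# The <bibl> contents are placeholders, not real references.
sample = """
<chDesc>
  <listBibl type="facsimilia"><bibl>Facsimile A</bibl></listBibl>
  <listBibl type="prints"><bibl>Print A</bibl><bibl>Print B</bibl></listBibl>
  <listBibl type="regesta"><bibl>Calendar A</bibl></listBibl>
  <listBibl type="studies"/>
</chDesc>
"""

def bibliography_by_type(header):
    """Return a dict mapping each listBibl type to the texts of its entries."""
    return {
        list_bibl.get("type"): [bibl.text for bibl in list_bibl.findall("bibl")]
        for list_bibl in header.findall("listBibl")
    }

references = bibliography_by_type(ET.fromstring(sample))
print(references)
```

Without the type attribute, the scope of each list would have to be inferred from its position in the header, as is often the case in printed editions.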
The header closes with reasoning about diplomatic peculiarities, especially about the question of whether the charter might be forged.
And finally it gives the text with its apparatuses, i.e. the critical apparatus and often notes on persons and geographical indications in the text. It would be a step towards the integration of the above-mentioned approaches if it could be used by archivists too.
A charter thus consists of a description and the text itself: metadata and text.
The description of the charter has several things in common with the archival description of the parchment sheet in the archives.
##### Ideas that I will not mention (do not translate!) ###
Italian charter books usually classify their charters in the heading: instrumentum, privilegium, charta donationis, testamentum …: =? tei.textClass
EdC took the charter header (Vorspann) as front matter. Hm …
Abstract model of the charter: => charter{image; text; description}
But: … Keeping the archival description in the CEI induces special problems:
…
Some questions are simple, as they are already addressed by the msDesc.
msDescription contains lots of things archivists could use to describe their single items, as charters have a physDesc …
msIdentifier: <idno> (identifying number) supplies any standard or non-standard number used to identify a bibliographic item; it is also used in msIdentifier/idno. So you could easily keep the archival community on board by changing the description to: „… bibliographical or archival item.“
Control of availability, as an archival concern, could be addressed with e.g. <availability>.
But:
The material in archives is organized hierarchically and, in a way, recursively: the holdings of archives are usually organized in archival fonds that can contain subgroups etc. The Encoded Archival Description has thus built <archive><c><c><unit/></c></c></archive> to be generic enough for different kinds of archival organization.
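The recursive nesting of components can be walked with a short recursive function. This is a sketch under assumptions: `<c>`, `level` and `<unittitle>` are EAD names, but the `<archive>` root is the schematic wrapper used in these notes rather than real EAD, and the sample holdings (including the item „Charter 1“) are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Schematic archival hierarchy: a fonds containing a series containing an item.
sample = """
<archive>
  <c level="fonds"><unittitle>Hochstift Passau</unittitle>
    <c level="series"><unittitle>Urkunden</unittitle>
      <c level="item"><unittitle>Charter 1</unittitle></c>
    </c>
  </c>
</archive>
"""

def walk(component, path=()):
    """Yield the path of titles from the top-level component down to each <c>."""
    title = component.findtext("unittitle", default="?")
    here = path + (title,)
    yield here
    for child in component.findall("c"):
        yield from walk(child, here)

root = ET.fromstring(sample)
paths = [path for top in root.findall("c") for path in walk(top)]
for path in paths:
    print(" / ".join(path))
```

Because every level is just another `<c>`, the same function handles archives with two levels and archives with ten, which is the genericity the EAD model is after.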
Component can be: box, folder, some abstract intellectual separation
Levels: "series", "subseries", "subfonds", "subgrp", "file" (for containers), or "item".
This concept is similar to the archival finding aids, although they don't need the <did> (Descriptive Identification) wrapper.
How is charter encoding affected by this issue? In so far as we want the archival community to be a part of our work, since they provide the primary data for diplomatic and historical work with the charters.
But it looks like the EAD concept is maybe beyond the scope of TEI – at the moment I would be happy if I had the possibility to build parts of collections:
<collection>Hochstift Passau</collection><collectionPart>Urkunden</collectionPart>
#####
It‘s like div – but it isn‘t div as it describes objects not parts of a text.????
did: A required wrapper element that bundles other elements identifying core information about the described materials in either Archival Description <archdesc> or a Component <c>. The various <did> subelements are intended for brief, clearly designated statements of information and, except for <note>, do not require Paragraphs <p> to enter text.
The <did> groups elements that constitute a good basic description of an archival unit. This grouping ensures that the same data elements and structure are available at every level of description within the EAD hierarchy. It facilitates the retrieval or other output of a cohesive body of elements for resource discovery and recognition.
The <did> in <archdesc> is sometimes called the high level <did>, because it describes the collection as a whole. Consider using the following elements for this high level <did>: <head>, <origination>, <unittitle>, <physdesc>, <repository>, and <abstract>. The <unitid> and <physloc> elements are suggested if applicable to a repository's practice. A <did> within a Component <c> can be less complete, and might have only a <container> or <unitid> and a <unittitle>.
Isn't EAD integration too big an undertaking???
To summarize:
1: not too many generic elements like “name/seg/place@type” but explicit charter-specific ones:
e.g. parts of the diplomatic discourse
2. Extend TEI by
authentication
transmissionStatus in witList
3. Name textWitness instead of witness
Description of <idno> has to be changed (identification number of texts other than books)
Slight changes: listBibl@type
4. Not decided where to put the archival identification elements
That is: CEI proposes encoding that a) is charter specific and b) is aware of physical and visual characteristics of the charter