Vogeler shows how boundaries of both nation and media are collapsing into the world-wide web, and demonstrates the collaborative possibilities of combined diplomatic corpora.
Presentation at the International Medieval Congress in Leeds, July 2009
2. IMC Leeds, 9.7.2009, Georg Vogeler
Charter Corpora on the Web
Württembergisches Urkundenbuch (http://maja.bsz-bw.de/wubonline/)
CDLM (http://cdlm.unipv.it)
DEEDS (http://www.utoronto.ca/deeds/)
Monasterium.net (http://www.monasterium.net)
Ut per litteras apostolicas … (http://www.brepolis.net)
Diplomatico Firenze (http://www.archiviodistato.firenze.it/diplomatico)
What’s their advantage?
Images
Reconstructed archives
• Virtuelles Archiv Salzburg
• Archive of the Stift Ardagger
Fast search
=> take the charter heritage as it is,
not as defined by organisational reasons
Online Corpus abolishes borders …
between repositories
between forms of representation
and
Research on set phrases
Vernacular dating clauses
• Latin model (Ulm, 29 March 1275):
dirre dinge iſt gezivch herre Marquart von
Bleichen herre hartman von ſahſenhvſen vn
herre tecke von annenhoven. Datum · IIIIo · kl
· aprilis · anno dni · Mo · CCo · LXXVo.
• German model, almost free from the Latin one:
Diz geſchach zehahberch an deme Ciſtage in der
phingeſtwochen / do von gotteſ geb̓vrte waren
zwelfhundert Sibenzig vn f̓vnf Jar
Dating Clauses
13th century:
• Germany (de Boor 1975)
- South-western model:
• dis geschach do man zalte von gotes gebúrte zwelf
hundert und niun und niunzig jar.
- South-eastern model:
• ditz ist geschehen, do es waren von christes geburt
tousent zwaihundert und darnach in dem niun unde
niunzegisten jare.
In monasterium.net
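(: German-language charters that have a non-empty tenor :)
for $u in //tenor[not(. = '')]/ancestor::text[.//lang_MOM = 'Deutsch']
(: the dating clause stands at the end, so keep only the tail of the tenor :)
let $tenor := $u//tenor
let $dat := substring($tenor, string-length($tenor) - 200)
(: assuming date_sort encodes dates as YYYYMMDD, this range selects the 14th century :)
where number($u//date_sort) gt 13000000
  and number($u//date_sort) lt 14000001
order by $u//date_sort
return
  <dat>
    <wo>{ $u/@b_name } { $u//issued/placeName/text() }</wo>
    <was>{ $dat }</was>
  </dat>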
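For readers without an XQuery processor, the same selection can be sketched in Python. This is only an approximation under stated assumptions: the element and attribute names (text, tenor, lang_MOM, date_sort, issued/placeName, b_name) are copied from the query above, while the nesting of the sample document, the archive and place labels, the date_sort values and the function name closing_passages are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow the query on the slide,
# but the nesting, labels and date_sort values are made up.
SAMPLE = """\
<charters>
  <text b_name="Sample archive A">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>12990101</date_sort>
    <issued><placeName>Sample place</placeName></issued>
    <tenor>dis geschach do man zalte von gotes geburte zwelf hundert und niun und niunzig jar.</tenor>
  </text>
  <text b_name="Sample archive B">
    <lang_MOM>Latein</lang_MOM>
    <date_sort>12750329</date_sort>
    <issued><placeName>Ulm</placeName></issued>
    <tenor>Datum IIII kl aprilis anno domini MCCLXXV.</tenor>
  </text>
</charters>
"""


def closing_passages(root, after, before, language="Deutsch", length=200):
    """Return the last `length` characters of each matching tenor, sorted by date."""
    rows = []
    for text in root.iter("text"):
        tenor = "".join(
            part for el in text.iter("tenor") for part in el.itertext()
        ).strip()
        languages = [el.text for el in text.iter("lang_MOM")]
        date_el = text.find(".//date_sort")
        # Skip charters without full text, in another language, or undated.
        if not tenor or language not in languages or date_el is None:
            continue
        date_sort = int(date_el.text)
        if not (after < date_sort < before):
            continue
        place_el = text.find(".//issued/placeName")
        rows.append({
            "date_sort": date_sort,
            "archive": text.get("b_name"),
            "place": place_el.text if place_el is not None else None,
            "clause": tenor[-length:],
        })
    return sorted(rows, key=lambda row: row["date_sort"])


# 13th century, with date_sort assumed to be a YYYYMMDD number.
rows = closing_passages(ET.fromstring(SAMPLE), 12000000, 13000001)
for row in rows:
    print(row["archive"], row["place"], row["clause"])
```

Passing the bounds as parameters makes it easy to rerun the same selection for the 13th and the 14th century.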
Results
13th century:
• 433 texts
• “zalt”-model: 24, all but 5 from the Chartularium Sangallense
• “waren”-model: 137, all but 15 from the south-eastern regions
14th century:
• 8354 texts
• “zalt”-model: 2478, 964 not from St. Gallen
• “waren”-model: 350, only 13 from St. Gallen
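Because the two centuries differ so much in size (433 against 8354 texts), the counts are easier to compare as shares. A minimal sketch using only the figures above; the dictionary layout and the function name model_shares are assumptions made for the example.

```python
# Counts copied from the results above.
RESULTS = {
    "13th century": {"texts": 433, "zalt": 24, "waren": 137},
    "14th century": {"texts": 8354, "zalt": 2478, "waren": 350},
}


def model_shares(results):
    """Share of each dating-clause model among all texts of a century."""
    return {
        century: {
            model: counts[model] / counts["texts"]
            for model in ("zalt", "waren")
        }
        for century, counts in results.items()
    }


shares = model_shares(RESULTS)
for century, by_model in shares.items():
    for model, share in by_model.items():
        print(f"{century}: {model} {share:.1%}")
```

Read this way, the “zalt” share rises from under 6% to almost 30%, while the “waren” share falls from about 32% to about 4%.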
Methods of Investigation
Already in use
• Simple word selection/word count (Tock, Brousseau, Parisse)
• Phrase statistics (Gervers/Margolin)
• Graphetic detail analysis (Fiebig)
• Hand identification by pattern analysis (Schomaker/Burgers)
• Named entity recognition (Stoyan/Schmidt)
Possible Programming
Testing/adapting existing algorithms
• Author identification tools
• Graphical variation tools
• Named Entity Recognition methods for clauses
to find the connections between charters that aren’t kept in the same archive or printed in the same edition:
• e.g.: Influence of recipient on the charters
• Spread of formula, regions of legal culture
Early medieval diplomatics
Add charters to the online corpora
Add information to the online charter corpora
Take text analytic software into consideration
Ask your local computer scientist how he could help you
CDLM
Württembergisches Urkundenbuch
DEEDS
Monasterium.net
Ut per litteras apostolicas …
Diplomatico Firenze
The monasterium project thus gives an insight into the possibilities of a Virtual European Charter Archive: the charters form a single corpus, and
you will find the documents of the Archives of the Archbishop of Salzburg although the Habsburgs transferred them all to Vienna,
you will find all documents dealing with the diocese of Passau, which encompassed the capital of Austria before 1469.
You will find documents concerning Bratislava, the capital of Slovakia, from the times when it was the capital of Hungary …
Borders between forms: I explained these last year with the Online Kemble for the Anglo-Saxon charters. This year I want to give an example of how a corpus like monasterium.net can be used for diplomatic research, supported by the computer.
Some of you might know that I had my own conference on Codicology and Palaeography in the Digital Age only a week ago, so I did not have much time to prepare this example. From the vast variety of possible questions (paired formulae; corroboration formulae; whether the reasons given for issuing a charter depend on the issuer, e.g. “Notturft” in the case of women; consent formulae and “an schaden”; the relationship between vernacular and Latin formulae; the function of the witnesses of the seal; the seller taking responsibility for the correctness of the property rights; the correlation between issuer, recipient and writing notary, etc.) I chose the dating clause (Latin example). Several observations have been made on 13th-century material: for Switzerland, Peter Rück observed that the “modern” dating style, which counts the days of the month, was introduced from west to east, with continuing reluctance in the diocese of Konstanz; Helmut de Boor observed two regional models for Germany, a south-western and a south-eastern one.
I prepared a selection of the full texts in mom-ca that are German and from the 13th century.
They come from the archives indicated on this map (Google Maps): St. Gallen lies in the Alemannic part and thus should prefer “zalt”, while the rest should prefer “waren”.
And I ran a search with regular expressions over the selection to identify the clauses in their variety.
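A regular-expression search of this kind might look like the following sketch. The two patterns are guesses built from the two model clauses quoted on the slides; they are not the expressions actually used, and real charter spellings vary far more. The names MODELS and classify are hypothetical, and the Latin sample is a simplified rendering of the Ulm clause.

```python
import re
from collections import Counter

# Illustrative patterns only, derived from the two quoted model clauses.
MODELS = {
    # "do man zalte von gotes geburte ..."
    "zalt": re.compile(r"\bman\s+zal\w*", re.IGNORECASE),
    # "do es waren von christes geburt ..." or "... geburte waren ..."
    "waren": re.compile(
        r"\bwaren\s+von\s+\w+\s+geb\w+|\bgeb\w+\s+waren\b", re.IGNORECASE
    ),
}


def classify(clause):
    """Return the name of the first model whose pattern matches, else 'other'."""
    for name, pattern in MODELS.items():
        if pattern.search(clause):
            return name
    return "other"


CLAUSES = [
    "dis geschach do man zalte von gotes geburte zwelf hundert "
    "und niun und niunzig jar.",
    "ditz ist geschehen, do es waren von christes geburt tousent "
    "zwaihundert und darnach in dem niun unde niunzegisten jare.",
    "Datum IIII kl aprilis anno domini MCCLXXV.",
]

counts = Counter(classify(clause) for clause in CLAUSES)
print(counts)
```

Requiring “waren” to stand next to the birth-of-Christ phrase is a deliberate choice here: on its own, “waren” is far too common a word to identify the clause.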
The 13th century confirms the analysis of de Boor.
The 14th century shows a significant change: the “zalt”-model is no longer restricted to the Alemannic region of southern Germany, and the “waren”-model is much less widespread than before. That fits de Boor's general observation that the “zalt”-model is the more modern one and was already spreading from west to east in the 13th century. If I were not occupied at the moment by research on the use of the documents of Frederick II, I would be very much inclined to continue this research. But I have to be careful: there are lots of other techniques that can be applied to digital charter corpora:
Let me give you some examples
Author identification: Leeds 2008: problem of short formulaic texts: difficult to identify in general, thus of great interest for computational linguists.
Graphical variation: edit distance, developing a Soundex
NER: Hidden Markov Model: training
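To make the edit-distance idea concrete, here is a minimal Levenshtein sketch applied to spellings taken from the clauses quoted on the slides. It illustrates the general technique, not a tool used in the project; the function names and the threshold of one edit are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def near_variants(words, target, max_distance=1):
    """Words whose spelling lies within max_distance edits of target."""
    return [w for w in words if edit_distance(w, target) <= max_distance]


# Spellings of the opening word found in the quoted clauses.
WORDS = ["dis", "diz", "ditz", "dirre"]
variants = near_variants(WORDS, "diz")
print(variants)
```

A fixed threshold is crude for short words, where a single edit can already change the word; a measure scaled by word length or a Soundex-like normalisation adapted to the material would be the natural next step.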
What could be the result of that for the early medieval diplomatists?
You traditionally don’t deal with large corpora. But you could consider doing so: the CDLM provides a huge amount of data, and I haven’t read any study using the corpus. Unfortunately the ARTEM databases aren’t online, but I would be very much interested to see research done with them.
The online accessible corpora can be improved:
Add charters to the online corpora
By retro digitization and
By digital edition
Add information to the online charter corpora
Online editor of www.mom-ca.uni-koeln.de: there are at the moment 636 charters from before 1150, 171 of them without full texts. mom-ca makes it possible to add text online, simply by registering on the site. Why not enhance the corpus?
Take text analytic software into consideration
Wherever your material comes from: take into consideration that there are already text analytic tools that could be useful for you. And if you imagine a tool but don’t find it or don’t know how to use it:
Ask your local computer scientist how he could help you:
and don’t be frustrated if he doesn’t understand you – there are lots of computer scientists supporting the work of historians!