Text of presentation on data structures in the Making of Charlemagne's Europe database, given at the Medieval Studies in the Digital Age seminar, Leeds, February 2015
Bits of charters: putting Carolingian charters into a database
INTRODUCTION
[Slide 2: project website] The Making of Charlemagne’s Europe project ran from 2012-2014 at King’s College London and created a database framework for the storage and retrieval of prosopographical, geographical and socio-economic data from early medieval charters. The project team also input data from nearly 1000 charters into the database system to produce a corpus that could address a wide range of research queries.
If anyone would like a demonstration of the database, I can give one briefly later. This talk is focusing on the decision processes behind the scenes of the project, discussing some aspects of the database design and the reasons for them. How do you decide the technological form and scope of a project? How can types of data such as places be entered consistently and retrieved effectively? How do you then link together basic units of people, places and possessions to provide useful representations of activities in a charter? And finally, having put all the data into a database, how do you get it out again? What I’m trying to show is some of the practical problems involved in taking unstructured and often ambiguous medieval information and putting it in digital formats.
But I want to start with the most basic question: why this project? Digital humanities reflects the interaction of two different fields: for a good project you need a balance between the technological developments that make new digital approaches feasible and the research questions that scholars want new insights into. And there’s also a third aspect, which shouldn’t be overlooked: what you can get funding for. One of the big problems with a lot of digital humanities projects is that it’s hard to get the long-term funding to allow them to show their full potential.
The Making of Charlemagne’s Europe project essentially arose from the coincidence of two aspects. [Slide 3: previous DDH projects] One was that there is a long tradition of creating prosopographic databases at King’s; the Department of Digital Humanities had more than 20 years of expertise in that.
[Slide 4: MKCHEUR spreadsheet] The other was the large number of Carolingian charters which exist, whose prosopographical and socio-economic data hasn’t been fully exploited. It made obvious sense to combine these two factors and also to take advantage of developments in mapping software to represent spatial information in new ways.
CHARTERS AND THEIR REPRESENTATION
I’ve mentioned charters several times, but what exactly are they? The short answer is medieval legal documents, mostly concerned with the granting or selling of property. I’ve given you a handout which is the text and translation of one quite long charter. I’m going to be referring to that a lot in my examples.
But a longer answer is that there are different ways of understanding what a charter is and they logically result in different kinds of digital projects. [Slide 5: understanding] There are three main ways you can think of a charter: as a material object, as a text or as a source of information.
Firstly, you can think of a charter as a material object: a particular piece of parchment with words and signs written on it. [Slide 6: Marburg] A digital project thinking of charters in that way, like the Lichtbildarchiv älterer Originalurkunden, includes high-quality digital images of the charters, combined with tools to allow the analysis of these images. A similar approach is being taken by the new Models of Authority project being jointly developed by King’s College and Glasgow.1
The problem for such an approach with Carolingian charters is that we don’t have the original manuscript for the vast majority of them; instead we only have copies from later in the Middle Ages (or sometimes even just early modern transcripts). An approach focusing on material objects would therefore mean we were unable to use a lot of charters from the corpus.
A second view of charters treats them as texts. [Slide 7: CDLM] One option for our project would have been to set up a database containing the full texts of charters, together with some kind of XML mark-up to allow us to pick out words and terms of particular interest: personal names, titles, verbs indicating donation etc. This is the approach taken by projects such as Codice diplomatico della Lombardia medievale. [Slide 8: CEI] There’s already an international group working on standards for encoding charters.
The difficulty if we went down that route would have been the data we worked with. Carolingian charters have relatively little structure to them, their spelling can be idiosyncratic to the point of unintelligibility and they can also be maddeningly indirect in referring to people and places. [Slide 9: Text on screen] For example, the charter I’ve given you refers to how Tassilo ‘had caused a monastery to be newly built in honour of the Holy Saviour within our forest at the place called Kremsmünster in the pagus of Traungau’ (in monasterium honore sancti Salvatoris infra waldo nostro loco qui dicitur Chremisa in pago nuncupante Drungaoe novo opere construere fecisset). What bit of that Latin text do you mark up as the ‘name’ of Kremsmünster monastery? [Slide 10: references to Fater] And although Fater is the abbot of Kremsmünster, this fact is only ever mentioned indirectly. So how do you mark up the text to show that Fater’s not just any old abbot, but specifically the abbot of the monastery receiving the donation?
Recent scholarship on charters has tended to favour the textual approach to charters: it’s often focused on individual charters or at the widest regional Urkundenlandschaften (charter landscapes). But we didn’t want to treat charters as special snowflakes: we wanted to find a way to look across the whole corpus from Catalonia to Austria and compare socio-economic data from across different regions. But mark-up of texts wouldn’t allow us to do this easily: its structures aren’t suited to finding all the unfree people mentioned under half a dozen different Latin terms or for finding all female vendors.
To answer those kinds of questions, our project was therefore thinking of charters in a third way: as sources from which information could be extracted and put into standardised formats for comparison and large-scale analysis. And our tools for that were a large-scale relational database combined with the ‘factoid model’.
1 http://www.digipal.eu/blog/new-digipal-project-models-of-authority/
[Slide 11: definition] So what’s the factoid model? It was developed by King’s for previous prosopographical projects. And at its heart is the factoid, an assertion made by the project team that a source says something about a person or a place. Such a model can be applied to anything from a saint's life (the Vita Cuthberti states “Cuthbert healed a woman from a demon”) to a coin (Fitzwilliam Museum coin 839 states "Charlemagne is king of the Franks").
How did we apply the concept of factoids to our specific project? We started off with some main models for basic entities such as agents (people or institutions), places and sources; these contain static information that doesn’t change between charters (information such as a person’s sex or the geo-location of a particular place). We then linked these models together via factoids. Here’s what this factoid model looks like in the specific case of our Charlemagne database. [Slide 12: Factoid diagram]
So factoids are statements which link together agents, places and possessions derived from a specific charter. These factoids are of varying types, depending on the content that they need to contain. For example, you need different data structures to input the statement "Fater is abbot of Kremsmünster" (what we call an attribute and relationship factoid) than to input the statement "Fater petitioned Charlemagne to confirm the property of the monastery of Kremsmünster" (what we call a transaction factoid). Authority lists, meanwhile, are used to ensure that data is entered consistently, e.g. that Fater is not described as "abbot" in one factoid and "abbas" in another.
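As an illustration only, the distinction between static models, typed factoids and authority lists might be sketched in Python as follows. This is not the project's actual schema: the class names, field names and the contents of `OFFICE_AUTHORITY` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical authority list: the only office terms an inputter may use.
OFFICE_AUTHORITY = {"abbot", "bishop", "duke", "king"}


@dataclass(frozen=True)
class Agent:
    """Static information about a person or institution."""
    name: str
    kind: str  # "person" or "institution"


@dataclass(frozen=True)
class Source:
    """The charter a factoid is derived from."""
    reference: str


@dataclass(frozen=True)
class AttributeFactoid:
    """'Source says that agent holds office at institution.'"""
    source: Source
    agent: Agent
    office: str
    institution: Agent

    def __post_init__(self):
        # Reject terms that are not on the authority list (e.g. "abbas").
        if self.office not in OFFICE_AUTHORITY:
            raise ValueError(f"{self.office!r} is not an authorised office term")


@dataclass(frozen=True)
class TransactionFactoid:
    """'Source says that petitioner asked granter to act for recipient.'"""
    source: Source
    activity: str
    petitioner: Agent
    granter: Agent
    recipient: Agent


charter = Source("DKAR 1:169")
fater = Agent("Fater", "person")
charlemagne = Agent("Charlemagne", "person")
kremsmuenster = Agent("Kremsmünster", "institution")

abbot_factoid = AttributeFactoid(charter, fater, "abbot", kremsmuenster)
confirmation = TransactionFactoid(
    charter, "confirmation", fater, charlemagne, kremsmuenster
)
```

The two factoid types share the same agents and source but carry different fields, which is the point of having more than one factoid structure.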
This factoid method potentially allowed us to input a wide range of data into a single database. We still had to determine a lot of big issues, however. What specific data from each charter did we want to record? What structures and fields would we need for this data and how did we ensure data was entered consistently? And how did we then allow end users to extract the data?
A lot of the first year of the project was spent iteratively developing the database and the inputting standards, rather than being able to carry out much sustained inputting. We started in the most basic way: just looking hard at the data. [Slide 13] Discussing how we dealt with people who were being transacted, [Slide 14] collating examples of terms used to form the basis for authority lists. The charter I’ve given you as a handout, DKAR 1:169, was one that we went through several times in considerable detail, because it had the kind of complexity that we needed to be able to handle within the system. It has 18 different places in it: I know because we analysed every single one of them. It also contained a lot of complicated interactions between different agents and places; [Slide 15: diagram] here’s one of my early attempts at drawing a diagram of that. I’ll come back to that diagram later on.
We were also thinking about the questions users might want to ask [Slide 16: Wendy Davies and Mark Mersiowsky]2, considering the different research topics that historians had already explored using charters. [Slide 17: questions list] We ended up with a long list of possible questions that the database might be used to answer. Looking at them now at the end of the project, most of them are too complicated to answer with the current version of the user interface. But it’s by thinking of key research questions in that way that we could work out which data to record. For example, we developed structures to record the relationships between places, an option which wasn’t available in previous projects. I’ll talk about those a bit later.
2 Wendy Davies, Small worlds: the village community in early medieval Brittany (London, 1988); Mark Mersiowsky, 'Y-a-t-il une influence des actes royaux sur les actes privés du IXe siècle?', in Marie-José Gasse-Grandjean and Benoît-Michel Tock (eds.), Les actes comme expression du pouvoir au haut Moyen âge: actes de la table ronde de Nancy, 26-27 novembre 1999, Atelier de recherches sur les textes médiévaux, 5 (Turnhout, 2003), pp. 139-78
Another part of developing the project was deciding its limits: some of the issues that we wouldn’t realistically be able to tackle. The single most frequent complaint we have had concerning the database is that we don’t include the full text of the charters (although we hope in the future to include links to the full text available from other online sites). But getting such data into the system consistently would have required a vast amount of extra work. I’ve given you the text of DKAR 1:169 on the handout. Here’s the first sentence of a text of a different edition of the same charter [Slide 18: Kremsmünster charter]. This single sentence has 6 differences from the text I’ve given you [click]. If we’d used project time to input the full text of charters we probably wouldn’t have had time to do anything else.
DATA STRUCTURES AND STANDARDS
There are always things you’re not able to achieve within one project. But even without the full text, why did developing data structures and standards for the Charlemagne database take so long? What I want to do now is to look in some detail at a couple of examples of the data structures we created. Every database project has to do this kind of work, so although some of the details I’ll be discussing are specific to early medieval charters, I hope you’ll find a lot of the concepts are more widely relevant. Our underlying relational database is horrendously complex. The schema for it went from this [slide 19] to this [slide 20] and it’s now even bigger. But because it’s a relational database, you’re essentially building up this complexity from a large number of relatively small building blocks. I’m going to be talking about two of these: places and transaction factoids, using examples mainly from the charter I’ve given you on the handout.
Places
Thinking about places, two main aspects drove the structures we developed for recording them. One was technological developments in digital mapping; we wanted to be able to represent data patterns on a map on screen. The other was the data itself. [Slide 21] In charters, what we get isn’t technically places, but place names. How you get from a name in a medieval charter to a spot on the map turns out to require a lot of thought.
The first distinction we made in the database was between charter-specific information on place names (which we recorded in individual factoids) and static information on a place (which was recorded in the place model). [Slide 22] For example, the charter I’ve given you, DKAR 1:169, has two different manuscripts and their spelling of places varies. The place name edited as ‘Bettinbah’ has a variant spelling in footnote 11 of ‘Petenpach’. There’s another charter recording Tassilo’s original grant which calls the same place ‘Petinpach’. In some charters you even get several different spellings of a name (especially personal names) within a single text. And there’s also the complication of Latin being an inflected language, so you can get names in different cases, like the ‘actum Wormacie’ in the last line of the charter.
One of our early decisions therefore was that we’d include fields to record the original text of place names and that these original text boxes would be searchable. [Slide 23: place name search] If you put ‘Wormacie’ into the search place names function of the database, you’ll correctly find ‘Worms’, be told that it’s in Germany and even have it located on a map for you, so that you have what you might call a ‘bottom-up’ search. If you’ve got just a name in a medieval charter you’ve got a possible way of identifying it.
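A minimal sketch of such a ‘bottom-up’ search, assuming that each charter-specific name instance keeps its original spelling and points at a static place record, might look like this in Python. The names `Place`, `PlaceNameInstance` and `search_place_names` are hypothetical, and geo-coordinates are left out rather than invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Static place record (the place model)."""
    modern_name: str
    country: str


@dataclass(frozen=True)
class PlaceNameInstance:
    """Charter-specific record of how a place is spelled in one source."""
    charter: str
    original_text: str
    place: Place


def search_place_names(instances, query):
    """Return the places whose original spellings contain the query."""
    needle = query.casefold()
    found = {i.place for i in instances if needle in i.original_text.casefold()}
    return sorted(found, key=lambda place: place.modern_name)


worms = Place("Worms", "Germany")
leombach = Place("Leombach", "Austria")

instances = [
    PlaceNameInstance("DKAR 1:169", "Wormacie", worms),
    PlaceNameInstance("DKAR 1:169", "Liublinbah", leombach),
]

results = search_place_names(instances, "wormacie")
```

Because the search runs over the recorded spellings rather than the standardised names, an inflected or variant form still leads back to a single place record.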
But behind that linking from a name instance to a map, there’s a lot of work going on in terms of identification and standardisation. I’ll show this schematically with another place in the same charter: the place called Liublinbah (towards the bottom of the first page of the handout). [Slide 24: charter-specific information] We start off with what the charter tells us: Liublinbah is a ‘locus’, its role in the charter is as the location of a possession being donated and it’s in the pagus of Drungao or Traungau (pagi are Carolingian administrative regions, similar to Anglo-Saxon counties).
The first thing we have to do is produce a standardised medieval name for this place. Our main reference source for such Latin names is Orbis Latinus [slide 25], but like the majority of minor places in our charter, Liublinbah doesn’t appear in that. [Slide 26: standard medieval name] So instead we choose one of the spellings and use that (adding an asterisk to show that it may need further work at some point).
The next question is where this place is. We rely on the editors of the charter for this; we don’t try and research details for ourselves, since we simply don’t have time. [slide 27: place identification] In this case, the editor says that the settlement concerned is Leombach, a location in Austria, and his identification is certain. [slide 28] Once we have this identification, we can use modern reference tools to check the geo-location for Leombach and what part of Austria it’s in. [slide 29] From this, we can create a place record for Leombach. [slide 30: map from DKAR 1:169] When we enter data concerning the charter itself we then include charter-specific information such as the place’s role or the place descriptor.
That may all seem rather long-winded, but this way of thinking about places gives us a lot of flexibility for dealing with more complex cases. For example, we quite often get rivers or mountains mentioned in charters; [slide 31: Ipfbach] in DKAR 1:169 there are the two rivers called Ipfbach (Ipphas in the original). We can’t easily geo-locate rivers, but we can input the name data we do have into the system and produce records for natural features in that way.
The more difficult situation is when there’s uncertainty about what place the medieval name refers to. Sometimes, the editor will just give an approximate area. [Slide 32: Raotola] The editor of DKAR 1:169 thinks Raotola, where there are several vineyards, is somewhere on the Rodelbach, a tributary of the Danube. We input the place as having a medieval place name but no modern one; the approximate location still means we can record it within modern place hierarchies: it’s somewhere within Upper Austria.
Alternatively the editor may suggest one or more possible modern locations for a particular medieval place. For example, a charter from Mondsee discusses a donation of property at an unknown place called Teginga. [Slide 33: schematic] Here is where we effectively carve up our earlier schematic into two. We still have a medieval place on one side, but it is now being tentatively matched to three different modern places, with varying levels of probability. [Slide 34] By recording possible matches in this way, we can eventually generate a display for the users that shows the possible options.
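One way to picture this is a separate identification record sitting between the medieval place and each candidate modern place. The Python sketch below is an assumption-laden illustration, not the project's implementation: the three certainty levels are invented for the example, and the candidate names are placeholders because the actual suggestions for Teginga are not given here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """Hypothetical scale for how sure the editor is of an identification."""
    POSSIBLE = 1
    PROBABLE = 2
    CERTAIN = 3


@dataclass(frozen=True)
class Identification:
    """Tentative link between a medieval place and one modern place."""
    medieval_name: str
    modern_name: str
    certainty: Certainty


def display_options(identifications, medieval_name):
    """List candidate modern places for a medieval name, most likely first."""
    matches = [i for i in identifications if i.medieval_name == medieval_name]
    matches.sort(key=lambda i: i.certainty, reverse=True)
    return [f"{i.modern_name} ({i.certainty.name.lower()})" for i in matches]


identifications = [
    Identification("Liublinbah", "Leombach", Certainty.CERTAIN),
    # Placeholder candidates: the real modern names are not stated in the talk.
    Identification("Teginga", "Candidate A", Certainty.POSSIBLE),
    Identification("Teginga", "Candidate B", Certainty.PROBABLE),
    Identification("Teginga", "Candidate C", Certainty.POSSIBLE),
]

teginga_options = display_options(identifications, "Teginga")
```

A place with a certain identification is then just the special case of a single candidate at the top level of the scale.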
The final aspect of places I want to talk about is place relationship factoids. One of the things we realised earlier on when looking at our place data is that we had two different sorts of place hierarchy, where places are in larger units. Firstly, we had the modern hierarchies: Leombach is in the Austrian state Upper Austria, which is in the modern country Austria. That was easy enough to record. But what did we do about the other information we’re being given, that Leombach is in the pagus of Traungau? How do we record medieval hierarchies?
Previous projects at King’s have been able to fudge this and meld together modern and medieval hierarchies because they’ve been dealing with a British administrative system that’s extraordinarily long-lasting. For example, the Prosopography of Anglo-Saxon England used the pre-1974 English counties for their hierarchies, which are near enough to Anglo-Saxon counties to be workable. But we didn’t know where Carolingian pagi were on the map. In fact, there’s even a scholarly argument about whether pagi were flat areas on the ground at all or just scattered collections of administrative rights. So to record the fact that Leombach was in the pagus of Traungau, which might help researchers to understand more about pagi, we needed some additional structures.
What we ended up using is what’s called a place relationship factoid. [Slide 35] This is a charter-specific assertion that “Charter X says Place 1 is in Place 2”, and also includes place descriptors for the two places concerned. The original idea was that a user would be able to pull up a place record for a medieval region and then be able to see all the places said to be within it (including geo-locations where they’re known). We weren’t able to implement this fully, but even so these factoids still provide useful information. And they also illustrate another important aspect of any digital humanities project. If you think a piece of data might be useful, it’s much better to start recording it early on, rather than having to go back later and recheck a large number of records.
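The assertion “Charter X says Place 1 is in Place 2” can be sketched as a small record, together with the kind of lookup the original idea called for. This is a hedged illustration in Python: `PlaceRelationshipFactoid` and `places_said_to_be_in` are hypothetical names, and the descriptor values are simply taken from the handout examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceRelationshipFactoid:
    """Charter-specific assertion: 'charter says inner place is in outer place'."""
    charter: str
    inner_place: str
    inner_descriptor: str
    outer_place: str
    outer_descriptor: str


def places_said_to_be_in(factoids, region):
    """Map each place asserted to lie within the region to the charters saying so."""
    result = {}
    for factoid in factoids:
        if factoid.outer_place == region:
            result.setdefault(factoid.inner_place, []).append(factoid.charter)
    return result


factoids = [
    PlaceRelationshipFactoid("DKAR 1:169", "Leombach", "locus", "Traungau", "pagus"),
    PlaceRelationshipFactoid("DKAR 1:169", "Kremsmünster", "locus", "Traungau", "pagus"),
]

in_traungau = places_said_to_be_in(factoids, "Traungau")
```

Keeping the charter reference on every assertion means the medieval hierarchy is never stated as fact, only as what a given source says.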
Transaction factoids
As you can see, just designing the building blocks for the database system, like places, takes a lot of thought if you’re going to be able to record them consistently. Our trickiest problem, however, was working out how to record transactions, the actual business of the charters, within the database. And it’s difficult because we’re trying to record dynamic rather than just static information. If you look at this place relationship factoid, for example, the statement ‘charter X says Leombach is in Traungau’ doesn’t alter throughout the charter. Similarly, when DKAR 1:169 refers to ‘Abbot Fater’, which we’d record as the attribute and relationship factoid ‘Fater is abbot of Kremsmünster’, that’s a fixed statement. There may be earlier and later charters in which Fater isn’t abbot of Kremsmünster, but in this particular charter he always is.
In contrast, the main importance of a charter is that it’s changing things, typically someone granting property to someone else. Things are different after the actions described in the charter than they were before. But how do you record such a change in a relational database structure that doesn’t allow for different states?
The first thing we did was simplify things by breaking down the activities in any charter into a collection of different transactions (possessions flowing around) and events (all the other things going on). [Slide 36] I showed you this diagram for the activities in DKAR 1:169 earlier on. It’s very complex because it shows you almost all the activities (though in fact there are a few more even than that). But if we start breaking these activities down, we can get rather more manageable units.
Firstly, there are three different events. [Slide 37] Tassilo founds the monastery of Kremsmünster and [Slide 38] there are two examples of land clearance. [Slide 39] Activities like these are recorded in event factoids, which give fairly basic information about agents, places and the type of activity going on.
Once we’ve dealt with those, we’re left with the transaction information [Slide 40]. What we have in this diagram is three separate transactions. [Slide 41] One is a record of what happened in the past: Tassilo granted possessions to Kremsmünster. [Slide 42] The other two are Charlemagne’s actions in the present. He confirms Kremsmünster’s possessions and he also grants to the men of Eberstalzell the right to remain on land they’ve cleared illegally.
We need to break down the charter into these separate transactions to allow us to represent the flows of possessions in a database structure. [Slide 43] So for example, Tassilo’s initial grant to Kremsmünster can be represented in a series of tables that shows the agents involved, then the details of the places mentioned, then the possession transferred and so on. We’ve frozen the activity in a way that allows us to describe it within a relational database.
This is a simplified version of the structure we use to record transactions, but it still gives us quite a lot of flexibility. [Slide 44: multiple donations] For example, it means that we can record a number of donated possessions in the same record; we don’t need 22 different factoids for the 22 different things Tassilo gave to Kremsmünster, which is a big relief. And if there are different locations or terms and conditions for different possessions we can record those in one go.
But if we’re going to use a data structure like this, we need to make sure that we can interpret everything unambiguously from the information we record in the table. [Slide 45] If you try and put both Charlemagne’s confirmation to Kremsmünster and his grant to the men of Eberstalzell in the same record, how do you keep track of who’s getting what possessions? You rapidly get a complete mess. So we had to define a transaction as involving only two main agents (or agents working together) and only one type of activity, so we didn’t combine a confirmation and a grant in the same record.
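To show why the one-activity, two-party rule keeps things unambiguous, here is a deliberately tiny relational sketch using Python's built-in `sqlite3` module. The table and column names are assumptions for illustration and are far simpler than the real schema; the rows are taken from the DKAR 1:169 examples, with locations left empty where the talk does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per transaction: exactly one activity and two main parties.
    CREATE TABLE transactions (
        id        INTEGER PRIMARY KEY,
        charter   TEXT NOT NULL,
        activity  TEXT NOT NULL,
        granter   TEXT NOT NULL,
        recipient TEXT NOT NULL
    );
    -- Many possessions can hang off a single transaction.
    CREATE TABLE possessions (
        id             INTEGER PRIMARY KEY,
        transaction_id INTEGER NOT NULL REFERENCES transactions(id),
        description    TEXT NOT NULL,
        location       TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO transactions (id, charter, activity, granter, recipient) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "DKAR 1:169", "grant", "Tassilo", "Kremsmünster"),
        (2, "DKAR 1:169", "confirmation", "Charlemagne", "Kremsmünster"),
        (3, "DKAR 1:169", "grant", "Charlemagne", "men of Eberstalzell"),
    ],
)
conn.executemany(
    "INSERT INTO possessions (transaction_id, description, location) "
    "VALUES (?, ?, ?)",
    [
        (1, "church", "Alburg"),
        (1, "villa", "Alkoven"),
        (1, "vineyards", "Aschach"),
        (2, "possessions of the monastery", None),
        (3, "right to remain on cleared land", None),
    ],
)

# Who gets what from Charlemagne? Each row is unambiguous because the
# confirmation and the grant are separate transactions.
rows = conn.execute(
    """
    SELECT t.activity, t.recipient, p.description
    FROM transactions AS t
    JOIN possessions AS p ON p.transaction_id = t.id
    WHERE t.granter = ?
    ORDER BY t.id, p.id
    """,
    ("Charlemagne",),
).fetchall()
```

Had the confirmation and the grant shared one transaction row, the join could not tell which recipient received which possession.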
We still had further difficulties to solve. One issue was that the neat distinction I’ve been making all along between agents, places and possessions doesn’t actually work when you look closely at the data. [Slide 46] For example, look at the 22 possessions that DKAR 1:169 mentions. As you’ll see, among the things being granted [click] are entities that we regard as agents: churches, for example, like that at Alburg, but also people being granted, like the craftsmen in Raotola. And as well as land at particular places being given, there’s also a whole place being given: the villa of Alkoven [click]. We had to ensure that we could record agents and places as possessions; we also still had to record them as agents and places in the normal way. People don’t cease to be people just because the Duke of Bavaria treats them as objects for some purposes.
[Slide 47] As you may also have noticed, possessions no 16 and 18 on this list have an additional complication: there’s more than one object involved. Tassilo’s giving 2 vineyards at Aschach and 3 at Raotola, so we also needed to record quantities of objects being transferred.
A final issue was that not all the transactions we were interested in had identical flows of possessions. [Slide 48: sale] In a sale, there are possessions going both ways: how do you record that? [click] Our approach was to add another column to the table for possessions: if the return box is marked, it means that these particular possessions are flowing in the opposite direction. [Slide 49] In the online version of the database, we use an arrow to highlight this.
Unfortunately, however, DKAR 1:169 has an even more complicated twist. [Slide 50: transaction] Let’s go back to Charlemagne’s interaction with the men of Eberstalzell. Charlemagne grants to these men the right to remain on land they’ve cleared, but in return they have to do service not to him, but the monastery of Kremsmünster. This is no longer an arrangement between two parties: a third party is now involved.
[Slide 51: diagram] After a lot of discussion, we eventually developed a data structure in which we could record such arrangements, which we called third-party returns. Essentially this was a variant of the method we’d already used for ordinary returns; we just needed to add another tick-box to the input form.
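The effect of the two tick-boxes can be sketched as a function that works out the direction of flow for each possession line. This Python sketch is an assumption about how such flags might be interpreted, not the project's code; `PossessionLine`, `Transaction` and `flow` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PossessionLine:
    """One possession within a transaction, with quantity and return flags."""
    description: str
    quantity: int = 1
    is_return: bool = False
    is_third_party_return: bool = False


@dataclass(frozen=True)
class Transaction:
    granter: str
    recipient: str
    third_party: Optional[str] = None


def flow(transaction, line):
    """Return (giver, receiver) for one possession line."""
    if line.is_third_party_return:
        if transaction.third_party is None:
            raise ValueError("third-party return needs a third party")
        # The recipient gives something to someone other than the granter.
        return (transaction.recipient, transaction.third_party)
    if line.is_return:
        # Ordinary return, as in a sale: recipient gives back to granter.
        return (transaction.recipient, transaction.granter)
    return (transaction.granter, transaction.recipient)


grant = Transaction("Charlemagne", "men of Eberstalzell", third_party="Kremsmünster")
right = PossessionLine("right to remain on cleared land")
service = PossessionLine("service", is_third_party_return=True)
vineyards = PossessionLine("vineyards at Aschach", quantity=2)

right_flow = flow(grant, right)
service_flow = flow(grant, service)
```

The quantity field covers cases such as the 2 vineyards at Aschach without needing a separate record for each object.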
But this highlights a key issue when you’re designing data structures: there’s a trade-off between how accurately you can represent your data and how complex your data structures need to be. It’s always a temptation to develop a database that can deal with every possible data variant, but you end up by making it more and more complicated and difficult to use. As it was, we had input screens where you had to scroll across horizontally to see all the fields you had to input.
[Slide 52: shoes] Eventually there comes a point where you have to decide you’re not going to change your data structures any more; you just have to fit the data, however imperfectly, into the existing structures. The problem is knowing when you’ve got to that point. Do you design elaborate data structures for the minority of cases that are as complex as the charter I’ve given you? The problem comes in knowing how complex the average charter is before you’ve looked at it in detail. Third-party flows turned out to appear only in about 3% of charters; possibly we could have dealt with them in some other way, but at the time this seemed the most effective approach.
More generally, when designing structures and standards it’s very easy to make decisions that you later realise were wrong. For example, we didn’t initially treat all churches as agents, but tried to distinguish between more and less important churches. Towards the end of the project, we realised this approach wasn’t working and had to go back and re-input some of the data.
So developing data structures and standards for the Charlemagne project was an odd mixture. We had to combine detailed conceptual analysis of the platonic ideal of names, charters, transactions and the like with the messy reality of the actual forms that such things take in practice. John Bradley and Michele Pasin, members of our digital humanities team, once wrote a paper entitled ‘Structuring that which cannot be structured’3 and in a sense that’s what we’ve been trying to do. But it’s only if you can input data into the database in a way that’s both structured and that retains as much of its meaning as possible that you can get it out again in a useful form. That’s what I want to discuss briefly in the final part of this talk.
GETTING DATA OUT
In some ways getting data out is harder than getting it in. Just sorting out the displays of factoids so that they make sense to end-users is time-consuming and there are still aspects of that that could do with being improved. But the biggest problem when we were designing the user interface was providing an effective way for users to browse the database and find the particular information they were looking for.
The main method we used was ‘faceted browsing’, which is increasingly used by many databases. To explain how that works, I’ll show you a very simple example, not using charters. [Slide 53] Suppose you have a database that contains information about coloured shapes. How do you find the object with the particular shape and colour you want?
The traditional way is with a search box [Slide 54], but there are several problems with this. The first is not knowing the right terms to use. [Slide 55] For example, you input the term ‘square’ and get zero results. Why is that? Because the shapes in the database aren’t actually squares, even if they look like it, but rectangles, so they’re all listed under that term. Similarly, suppose you’re interested in particular sorts of triangle. [slide 56] You input the pair of terms ‘red’ and ‘triangle’. Again you get zero results. But why is that? Does the database not contain triangles? Or has it classified them all under some different term? Perhaps it thinks that colour isn’t red but scarlet? Searching a database you’re not familiar with can be worryingly like making repeated stabs in the dark.
Faceted browsing, in contrast, provides an easier way to narrow down your search till you find exactly what you’re looking for. [Slide 57] In this case, you might be given a choice of two filters to browse by: by shape or by colour. Choose one of these [click] and you then get shown how many examples there are of each colour [click]. It’s much simpler to find all the red objects you want [Slide 58].
But you’re not interested in all red objects, just red triangles. Here’s where you can combine filters. [Slide 59] Choose the shape filter, and you’re shown what shapes the red objects have. So there definitely aren’t any red triangles in the database, it’s not just that you’ve got your search terms wrong. And if you want to, you can now clear the colour filter and use the shape filter to look for triangles of any colour [Slide 60].
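The coloured-shapes example can be written out as a short Python sketch. The data set below is invented for illustration (it is not the one on the slides), and `apply_filters` and `facet_counts` are hypothetical helper names.

```python
from collections import Counter

# Invented example data: no red triangles, and rectangles rather than squares.
SHAPES = [
    {"shape": "rectangle", "colour": "red"},
    {"shape": "rectangle", "colour": "blue"},
    {"shape": "circle", "colour": "red"},
    {"shape": "triangle", "colour": "blue"},
    {"shape": "triangle", "colour": "green"},
]


def apply_filters(items, filters):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item[facet] == value for facet, value in filters.items())
    ]


def facet_counts(items, facet):
    """Count how many of the remaining items fall under each value of a facet."""
    return Counter(item[facet] for item in items)


# Step 1: browse by colour and see what is available.
colour_counts = facet_counts(SHAPES, "colour")

# Step 2: select red, then look at the shape facet for the red objects only.
red_items = apply_filters(SHAPES, {"colour": "red"})
red_shape_counts = facet_counts(red_items, "shape")

# Step 3: clear the colour filter and browse triangles of any colour.
triangles = apply_filters(SHAPES, {"shape": "triangle"})
```

Because the counts are computed from the data itself, the user only ever sees terms that exist in the database, which is what removes the guesswork of a search box.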
Faceted browsing therefore offers users an easy way of narrowing down entries in a database to find exactly what they want. You’re always kept aware of what facets you’ve used already and you can remove some of them if you find you’re not getting any results. The same basic principles as with the coloured shapes underlie the much more complicated faceted browsing in our database. You can browse by charter, agent or place and gradually drill down to find what you want [Slide 61: 3 clicks].
3 Bradley, John, and Pasin, Michele, 'Structuring that which cannot be structured: a role for formal models in representing aspects of Medieval Scotland', in Matthew Hammond (ed.), New perspectives on medieval Scotland, 1093-1286, Woodbridge: The Boydell Press, 2013, pp. 203-14
Faceting makes things a lot easier for users, but it doesn’t solve all their search problems. Users of our database can browse by either charters, agents or places, but they have to think about the meaning of the filters they use in these different views to get the results they expect. Suppose you’re browsing by charters. You can choose to filter the charters by various characteristics of the agents they contain, so you could choose all those that include scribes [Slide 62] and then add in women as an additional filter [Slide 63]. You end up with 177 charters, but that doesn’t mean that there are 177 charters written by female scribes, but that there are 177 which include a scribe of some sex and also a woman in some role. [Slide 64] In fact, if you search via agents, you find that so far we haven’t found any scribes who are definitely female.
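The difference between the two views can be made concrete with a small Python sketch. The charters and agents below are invented placeholders, not records from the database, and the function names are hypothetical; the point is only that a charter-level filter asks "does any agent in this charter match?" for each filter separately, while an agent-level filter requires one agent to match everything.

```python
# Invented placeholder data: charter label -> agents mentioned in it.
CHARTERS = {
    "Charter A": [
        {"name": "Agent 1", "role": "scribe", "sex": "male"},
        {"name": "Agent 2", "role": "granter", "sex": "female"},
    ],
    "Charter B": [
        {"name": "Agent 3", "role": "scribe", "sex": "unknown"},
        {"name": "Agent 4", "role": "witness", "sex": "male"},
    ],
}


def charters_with(charters, facet, value):
    """Charter view: charters containing at least one agent with this value."""
    return {
        label
        for label, agents in charters.items()
        if any(agent[facet] == value for agent in agents)
    }


def agents_matching(charters, **criteria):
    """Agent view: agents who individually satisfy every criterion."""
    return [
        agent["name"]
        for agents in charters.values()
        for agent in agents
        if all(agent[facet] == value for facet, value in criteria.items())
    ]


# Browsing by charter: 'includes a scribe' AND 'includes a woman'.
charter_view = charters_with(CHARTERS, "role", "scribe") & charters_with(
    CHARTERS, "sex", "female"
)

# Browsing by agent: one agent who is both a scribe and female.
agent_view = agents_matching(CHARTERS, role="scribe", sex="female")
```

In this toy data the charter view returns one charter while the agent view returns nobody, which mirrors the 177 charters versus no definitely female scribes.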
In addition, faceted browsing with a database like ours (unlike with the coloured shapes example) needs a lot of behind-the-scenes work so that when users click on a filter they get the results they expect. To demonstrate that, I want to talk about one of the most complicated filters we had to develop, that linking agents and places. If you want to find all the agents connected with the place Worms, for example, how do you define this connection in a meaningful way?
A simple-minded rule would say that all agents who appear in a charter which mentions place X are connected with place X. [Slide 65] But as you’ll see from the example of DKAR 1:169 you get some unsatisfactory results. The group of Slavs somewhere out in the wilds of Austria, for example, may well not have known anything about Worms. Does it make sense to connect them to it? Equally, if we happened to know the scribe of this charter (we don’t), he’d be sitting in Worms and the first he may have heard of Leombach is when he’s asked to write a charter mentioning it. As for Tassilo, by 791 when this charter was written, he’d been deposed by Charlemagne and was being held in a monastery in Normandy. Connecting him to Worms because a charter he’d once given was confirmed there seems tenuous at best.
What we had to do therefore was look at how agents are connected to places in the charter by the roles they play. [Slide 66] You immediately get a much smaller but more meaningful set of connections. So, for example, Tassilo as a granter is connected to the different places he donated and so is Kremsmünster as the recipient. But the monastery is also connected to Worms, because it (or at least Fater representing it) went to Worms to get a confirmation charter from Charlemagne. The beekeepers of Raotola, meanwhile, are connected to Raotola, but not to any of the other places mentioned in the charter. We don’t know if they ever went to some of the other properties of Kremsmünster that are mentioned. And we don’t know exactly where the decania of Slavs were, so they don’t get connected to anywhere.
We obviously couldn’t do this sort of detailed analysis of agent and place interconnections for every charter. [Slide 67] Instead, we had to come up with rules (still very complicated) for how agent roles and place roles are connected together. For example, anyone whose agent role makes them responsible for the flows of possessions (like granters and recipients) should be linked to all places which have the place role ‘location of possession’. In most cases (which we specify) they should also be linked to any place with the place role ‘location of transaction’. Anyone whose agent role just involves them being present at a transaction (like a petitioner) should be linked only to the place with role ‘location of transaction’.
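The three rules just described can be sketched as a small lookup in Python. This is a simplified illustration under stated assumptions: the role sets contain only the roles named above, the function name `linked_places` is hypothetical, and because the cases in which flow roles are also linked to the location of the transaction are not spelled out here, that choice is exposed as a parameter rather than decided.

```python
# Assumed role groupings, limited to the examples given in the text.
FLOW_ROLES = {"granter", "recipient"}
PRESENT_ROLES = {"petitioner"}


def linked_places(agent_role, places, link_transaction_location=True):
    """Return the places an agent is linked to within ONE transaction.

    places is a list of (place_name, place_role) pairs.
    link_transaction_location stands in for the unspecified 'most cases'
    in which flow roles are also tied to the location of the transaction.
    """
    linked = []
    for name, place_role in places:
        if agent_role in FLOW_ROLES:
            if place_role == "location of possession":
                linked.append(name)
            elif place_role == "location of transaction" and link_transaction_location:
                linked.append(name)
        elif agent_role in PRESENT_ROLES:
            if place_role == "location of transaction":
                linked.append(name)
        # Any other role gets no links under these three rules.
    return linked


# Charlemagne's confirmation for Kremsmünster, reduced to two places.
confirmation_places = [
    ("Worms", "location of transaction"),
    ("Leombach", "location of possession"),
]

recipient_links = linked_places("recipient", confirmation_places)
petitioner_links = linked_places("petitioner", confirmation_places)
```

Applying the rules per transaction rather than per charter is what keeps Tassilo, whose grant was a separate earlier transaction, from being tied to Worms.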
The full rules just for this facet probably took us a month or more of discussion to work out. [Slide 68] Here’s an extract from the final results. This document gives the rules, but it also includes the test data: examples that we could use to check the facets were working as we wanted. Doing this testing was incredibly pernickety and tedious for both the historians and IT specialists [Slide 69]. But it was the only way of checking that users would always get the expected results. Faceted browsing is a very powerful tool for users, but it’s not easy to set up a database to be able to use it.
CONCLUSIONS (5 min)
In this talk I’ve tried to give a feel for the practicalities of a medieval database project and to show something of the messiness behind the neat facade. I want to end with five more general points about combining digital technology and historical texts.
[Slide 70] [click] One of the most basic problems we had with the project didn’t involve the digital aspect at all. It was simply understanding what some charters actually meant. What is the decania of Slavs that DKAR 1:169 refers to? We decided it was probably some kind of administrative unit, but the charter’s irritatingly vague and we’re still not quite sure.
[click] A second issue is how to avoid re-inventing the wheel with digital history projects. We did our best to build on what previous projects had done, but it’s sometimes surprisingly difficult to find out specific technical details of other projects. We’re therefore doing our best to preserve and document the knowledge we’ve gained, through the website’s blog and also through presentations like this.
[click] One of the other reasons projects tend to end up re-inventing the wheel is that different historical periods produce not just different types of document, but different styles. The People of Medieval Scotland project, for example, was a previous project based on charters, but these are far more standardised documents in the twelfth century than in the eighth. Changing social practices also make standard practices across databases very difficult to implement. POMS, dealing with Scotland in the central Middle Ages, found it useful to record Gaelic and Latin names separately, which wasn’t an issue for us. On the other hand, about half the people in medieval Scotland seem to have been called either John, William or Robert, so the researchers on POMS didn’t have to spend so much time arguing about whether Adalbert really is the same name as Odalpert.
[click] Fourthly, although I’ve focused on data structures in this talk, data input standards are also very important for any historical project and they do have to be specified in incredible detail. To use an analogy, you can’t have one inputter classifying shapes as rectangles while another sees them as squares. Even the most basic data to be input has to be agreed.
Even so, problems of inconsistent inputting over time are inevitable, even if it’s just one person doing that. We did our best to discuss and document the decisions we made, using wiki software and a lot of Skype meetings, but we still had to do a lot of data clean-up towards the end of the project.
[click] Which leads me to my final point about historical databases. They’re inevitably imperfect. It doesn’t have to be quite at the level of garbage in, garbage out, but behind the clean facade of any database project there tends to be some very messy data and a lot of compromises on database design. But then history is messy and early medieval history is particularly so. The Charlemagne database isn’t perfect, but despite all its imperfections, we hope it’ll still be a valuable tool for future researchers.