Text of presentation on data structures in the Making of Charlemagne's Europe database, given at the Medieval Studies in the Digital Age seminar, Leeds, February 2015
Bits of charters: putting Carolingian charters into a database
INTRODUCTION
[Slide 2: project website] The Making of Charlemagne's Europe project ran from 2012 to 2014 at King's
College London and created a database framework for the storage and retrieval of prosopographical,
geographical and socio-economic data from early medieval charters. The project team also input
data from nearly 1000 charters into the database system to produce a corpus that could address a
wide range of research queries.
If anyone would like a demonstration of the database, I can give one briefly later. This talk is focusing
on the decision processes behind the scenes of the project, discussing some aspects of the database
design and the reasons for them. How do you decide the technological form and scope of a project?
How can types of data such as places be entered consistently and retrieved effectively? How do you
then link together basic units of people, places and possessions to provide useful representations of
activities in a charter? And finally, having put all the data into a database, how do you get it out
again? What I'm trying to show is some of the practical problems involved in taking unstructured
and often ambiguous medieval information and putting it in digital formats.
But I want to start with the most basic question: why this project? Digital humanities reflects the
interaction of two different fields: for a good project you need a balance between the technological
developments that make new digital approaches feasible and the research questions that scholars
want new insights into. And there's also a third aspect, which shouldn't be overlooked: what you can
get funding for. One of the big problems with a lot of digital humanities projects is that it's hard to
get the long-term funding to allow them to show their full potential.
The Making of Charlemagne's Europe project essentially arose from the coincidence of two aspects.
[Slide 3: previous DDH projects] One was that there is a long tradition of creating prosopographic
databases at King's; the Department of Digital Humanities had more than 20 years of expertise in
that.
[Slide 4: MKCHEUR spreadsheet] The other was the large number of Carolingian charters which exist,
whose prosopographical and socio-economic data hasn't been fully exploited. It made obvious sense
to combine these two factors and also to take advantage of developments in mapping software to
represent spatial information in new ways.
CHARTERS AND THEIR REPRESENTATION
I've mentioned charters several times, but what exactly are they? The short answer is medieval legal
documents, mostly concerned with the granting or selling of property. I've given you a handout
which is the text and translation of one quite long charter. I'm going to be referring to that a lot in
my examples.
But a longer answer is that there are different ways of understanding what a charter is and they
logically result in different kinds of digital projects. [Slide 5: understanding] There are three main
ways you can think of a charter: as a material object, as a text or as a source of information.
Firstly, you can think of a charter as a material object: a particular piece of parchment with words
and signs written on it. [Slide 6: Marburg] A digital project thinking of charters in that way, like the
Lichtbildarchiv älterer Originalurkunden, includes high-quality digital images of the charters,
combined with tools to allow the analysis of these images. A similar approach is being taken by the
new Models of Authority project being jointly developed by King's College and Glasgow.1
The problem for such an approach with Carolingian charters is that we don't have the original
manuscript for the vast majority of them; instead we only have copies from later in the Middle Ages
(or sometimes even just early modern transcripts). An approach focusing on material objects would
therefore mean we were unable to use a lot of charters from the corpus.
A second view of charters treats them as texts. [Slide 7: CDLM] One option for our project would
have been to set up a database containing the full texts of charters, together with some kind of XML
mark-up to allow us to pick out words and terms of particular interest: personal names, titles, verbs
indicating donation etc. This is the approach taken by projects such as
Codice diplomatico della Lombardia medievale. [Slide 8: CEI] There's already an international group
working on standards for encoding charters.
The difficulty if we went down that route would have been the data we worked with. Carolingian
charters have relatively little structure to them, their spelling can be idiosyncratic to the point of
unintelligibility and they can also be maddeningly indirect in referring to people and places. [Slide 9:
Text on screen] For example, the charter I've given you refers to how Tassilo 'had caused a
monastery to be newly built in honour of the Holy Saviour within our forest at the place called
Kremsmünster in the pagus of Traungau' (in monasterium honore sancti Salvatoris infra waldo
nostro loco qui dicitur Chremisa in pago nuncupante Drungaoe novo opere construere fecisset).
What bit of that Latin text do you mark up as the 'name' of Kremsmünster monastery? [Slide 10:
references to Fater] And although Fater is the abbot of Kremsmünster, this fact is only ever
mentioned indirectly. So how do you mark up the text to show that Fater's not just any old abbot,
but specifically the abbot of the monastery receiving the donation?
Recent scholarship on charters has tended to favour the textual approach to charters: it's often
focused on individual charters or at the widest regional Urkundenlandschaften (charter landscapes).
But we didn't want to treat charters as special snowflakes: we wanted to find a way to look across
the whole corpus from Catalonia to Austria and compare socio-economic data from across different
regions. But mark-up of texts wouldn't allow us to do this easily: its structures aren't suited to
finding all the unfree people mentioned under half a dozen different Latin terms or for finding all
female vendors.
To answer those kinds of questions, our project was therefore thinking of charters in a third way: as
sources from which information could be extracted and put into standardised formats for
comparison and large-scale analysis. And our tools for that were a large-scale relational database
combined with the 'factoid model'.
1 http://www.digipal.eu/blog/new-digipal-project-models-of-authority/
database might be used to answer. Looking at them now at the end of the project, most of them are
too complicated to answer with the current version of the user interface. But it's by thinking of key
research questions in that way that we could work out which data to record. For example, we
developed structures to record the relationships between places, an option which wasn't available in
previous projects. I'll talk about those a bit later.
Another part of developing the project was deciding its limits: some of the issues that we wouldn't
realistically be able to tackle. The single most frequent complaint we have had concerning the
database is that we don't include the full text of the charters (although we hope in the future to
include links to the full text available from other online sites). But getting such data into the system
consistently would have required a vast amount of extra work. I've given you the text of DKAR 1:169
on the handout. Here's the first sentence of the text in a different edition of the same charter [Slide
18: Kremsmünster charter]. This single sentence has 6 differences from the text I've given you [click].
If we'd used project time to input the full text of charters we probably wouldn't have had time to
do anything else.
DATA STRUCTURES AND STANDARDS
There are always things you're not able to achieve within one project. But even without the full text,
why did developing data structures and standards for the Charlemagne database take so long?
What I want to do now is to look in some detail at a couple of examples of the data structures we
created. Every database project has to do this kind of work, so although some of the details I'll be
discussing are specific to early medieval charters, I hope you'll find a lot of the concepts are more
widely relevant. Our underlying relational database is horrendously complex. The schema for it went
from this [Slide 19] to this [Slide 20] and it's now even bigger. But because it's a relational database,
you're essentially building up this complexity from a large number of relatively small building blocks.
I'm going to be talking about two of these: places and transaction factoids, using examples mainly
from the charter I've given you on the handout.
Places
Thinking about places, two main aspects drove the structures we developed for recording them. One
was technological developments in digital mapping; we wanted to be able to represent data
patterns on a map on screen. The other was the data itself. [Slide 21] In charters, what we get isn't
technically places, but place names. How you get from a name in a medieval charter to a spot on the
map turns out to require a lot of thought.
The first distinction we made in the database was between charter-specific information on place
names (which we recorded in individual factoids) and static information on a place (which was
recorded in the place model). [Slide 22] For example, the charter I've given you, DKAR 1:169, has two
different manuscripts and their spelling of places varies. The place name edited as 'Bettinbah' has a
variant spelling in footnote 11 of 'Petenpach'. There's another charter recording Tassilo's original
grant which calls the same place 'Petinpach'. In some charters you even get several different
spellings of a name (especially personal names) within a single text. And there's also the
complication of Latin being an inflected language, so you can get names in different cases, like the
'actum Wormacie' in the last line of the charter.
One of our early decisions therefore was that we'd include fields to record the original text of place
names and that these original text boxes would be searchable. [Slide 23: place name search] If you
put 'Wormacie' into the search place names function of the database, you'll correctly find 'Worms',
be told that it's in Germany and even have it located on a map for you, so that you have what
you might call a 'bottom-up' search. If you've got just a name in a medieval charter you've got a
possible way of identifying it.
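As a rough illustration of that split between static place records and charter-specific name instances, a minimal Python sketch might look like this. The class names, field names and the search function are assumptions made for the example, not the project's actual schema; only the 'Wormacie' spelling and its identification as Worms in Germany come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Static information about a place (hypothetical fields)."""
    place_id: int
    modern_name: str
    country: str


@dataclass(frozen=True)
class PlaceNameInstance:
    """Charter-specific record of how a place is spelled in one text."""
    charter: str
    original_text: str
    place_id: int


def search_place_names(term, instances, places):
    """Bottom-up search: from an original spelling to the place records."""
    wanted = term.casefold()
    ids = {i.place_id for i in instances if i.original_text.casefold() == wanted}
    return [places[pid] for pid in sorted(ids)]


places = {1: Place(1, "Worms", "Germany")}
instances = [PlaceNameInstance("DKAR 1:169", "Wormacie", 1)]
found = search_place_names("Wormacie", instances, places)
```

Because the spelling lives in the name instance rather than in the place record, any number of variant or inflected forms can point at the same place without changing the place itself.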
But behind that linking from a name instance to a map, there's a lot of work going on in terms of
identification and standardisation. I'll show this schematically with another place in the same
charter: the place called Liublinbah (towards the bottom of the first page of the handout). [Slide 24:
charter-specific information] We start off with what the charter tells us: Liublinbah is a 'locus', its role
in the charter is as the location of a possession being donated and it's in the pagus of Drungao or
Traungau (pagi are Carolingian administrative regions, similar to Anglo-Saxon counties).
The first thing we have to do is produce a standardised medieval name for this place. Our main
reference source for such Latin names is Orbis Latinus [Slide 25], but like the majority of minor
places in our charter, Liublinbah doesn't appear in that. [Slide 26: standard medieval name] So
instead we choose one of the spellings and use that (adding an asterisk to show that it may need
further work at some point).
The next question is where this place is. We rely on the editors of the charter for this; we don't try
and research details for ourselves, since we simply don't have time. [Slide 27: place identification] In
this case, the editor says that the settlement concerned is Leombach, a location in Austria, and his
identification is certain. [Slide 28] Once we have this identification, we can use modern reference
tools to check the geo-location for Leombach and what part of Austria it's in. [Slide 29] From this, we
can create a place record for Leombach. [Slide 30: map from DKAR 1:169] When we enter data
concerning the charter itself we then include charter-specific information such as the place's role or
the place descriptor.
That may all seem rather long-winded, but this way of thinking about places gives us a lot of flexibility
for dealing with more complex cases. For example, we quite often get rivers or mountains
mentioned in charters; [Slide 31: Ipfbach] in DKAR 1:169 there are the two rivers called Ipfbach
(Ipphas in the original). We can't easily geo-locate rivers, but we can input the name data we do
have into the system and produce records for natural features in that way.
The more difficult situation is when there's uncertainty about what place the medieval name refers
to. Sometimes, the editor will just give an approximate area. [Slide 32: Raotola] The editor of DKAR
1:169 thinks Raotola, where there are several vineyards, is somewhere on the Rodelbach, a tributary
of the Danube. We input the place as having a medieval place name but no modern one; the
approximate location still means we can record it within modern place hierarchies: it's somewhere
within Upper Austria.
Alternatively the editor may suggest one or more possible modern locations for a particular
medieval place. For example, a charter from Mondsee discusses a donation of property at an
unknown place called Teginga. [Slide 33: schematic] Here is where we effectively carve up our
earlier schematic into two. We still have a medieval place on one side, but it is now being tentatively
matched to three different modern places, with varying levels of probability. [Slide 34] By recording
possible matches in this way, we can eventually generate a display for the users that shows the
possible options.
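One way to picture a medieval place matched to several possible modern places is sketched below in Python. This is an illustration only: the certainty labels, the class names and the three placeholder candidates for Teginga are hypothetical, since the talk does not name the candidate places or the scale the project used.

```python
from dataclasses import dataclass, field

# Hypothetical certainty scale; the project's own levels are not given in the talk.
CERTAINTY_ORDER = {"certain": 0, "probable": 1, "possible": 2}


@dataclass
class ModernMatch:
    modern_name: str
    certainty: str  # one of the keys of CERTAINTY_ORDER


@dataclass
class MedievalPlace:
    standard_name: str
    matches: list = field(default_factory=list)

    def ranked_matches(self):
        """Return candidate modern places, most certain first."""
        return sorted(self.matches, key=lambda m: CERTAINTY_ORDER[m.certainty])


# A certain identification has exactly one match.
liublinbah = MedievalPlace("Liublinbah*", [ModernMatch("Leombach", "certain")])

# Placeholder candidates standing in for the three unnamed modern places.
teginga = MedievalPlace("Teginga", [
    ModernMatch("candidate B", "possible"),
    ModernMatch("candidate A", "probable"),
    ModernMatch("candidate C", "possible"),
])
options = [m.modern_name for m in teginga.ranked_matches()]
```

Keeping the matches as a list on the medieval side means a certain identification and an uncertain one share the same structure; only the number of matches and their labels differ.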
The final aspect of places I want to talk about is place relationship factoids. One of the things we
realised earlier on when looking at our place data is that we had two different sorts of place
hierarchy, where places are in larger units. Firstly, we had the modern hierarchies: Leombach is in
the Austrian state Upper Austria, which is in the modern country Austria. That was easy enough to
record. But what did we do about the other information we're being given, that Leombach is in the
pagus of Traungau? How do we record medieval hierarchies?
Previous projects at King's have been able to fudge this and meld together modern and medieval
hierarchies because they've been dealing with a British administrative system that's extraordinarily
long-lasting. For example, the Prosopography of Anglo-Saxon England used the pre-1974 English
counties for their hierarchies, which are near enough to Anglo-Saxon counties to be workable. But
we didn't know where Carolingian pagi were on the map. In fact, there's even a scholarly argument
about whether pagi were flat areas on the ground at all or just scattered collections of
administrative rights. So to record the fact that Leombach was in the pagus of Traungau, which
might help researchers to understand more about pagi, we needed some additional structures.
What we ended up using is what's called a place relationship factoid. [Slide 35] This is a
charter-specific assertion that 'Charter X says Place 1 is in Place 2', and also includes place descriptors for
the two places concerned. The original idea was that a user would be able to pull up a place record
for a medieval region and then be able to see all the places said to be within it (including
geo-locations where they're known). We weren't able to implement this fully, but even so these factoids
still provide useful information. And they also illustrate another important aspect of any digital
humanities project. If you think a piece of data might be useful, it's much better to start recording it
early on, rather than having to go back later and recheck a large number of records.
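The 'Charter X says Place 1 is in Place 2' assertion can be sketched as a small record type. The Python below is a hedged illustration: the field names and the lookup function are assumptions, not the database's real tables, and the two example rows use only statements made in DKAR 1:169 as quoted in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceRelationshipFactoid:
    """Charter-specific assertion that one place lies within another."""
    charter: str
    place: str
    place_descriptor: str
    within: str
    within_descriptor: str


def places_said_to_be_within(region, factoids):
    """Map each place asserted to lie in `region` to the charters saying so."""
    result = {}
    for f in factoids:
        if f.within == region:
            result.setdefault(f.place, []).append(f.charter)
    return result


factoids = [
    PlaceRelationshipFactoid("DKAR 1:169", "Leombach", "locus", "Traungau", "pagus"),
    PlaceRelationshipFactoid("DKAR 1:169", "Kremsmünster", "locus", "Traungau", "pagus"),
]
in_traungau = places_said_to_be_within("Traungau", factoids)
```

Because each row records which charter makes the claim, the medieval hierarchy stays a set of sourced assertions rather than a fixed map, which suits regions whose extent is itself debated.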
Transaction factoids
As you can see, just designing the building blocks for the database system, like places, takes a lot of
thought if you're going to be able to record them consistently. Our trickiest problem, however, was
working out how to record transactions, the actual business of the charters, within the database.
And it's difficult because we're trying to record dynamic rather than just static information. If you
look at this place relationship factoid, for example, the statement 'charter X says Leombach is in
Traungau' doesn't alter throughout the charter. Similarly, when DKAR 1:169 refers to 'Abbot Fater',
which we'd record as the attribute and relationship factoid 'Fater is abbot of Kremsmünster', that's a
fixed statement. There may be earlier and later charters in which Fater isn't abbot of Kremsmünster,
but in this particular charter he always is.
In contrast, the main importance of a charter is that it's changing things, typically someone granting
property to someone else. Things are different after the actions described in the charter than they
were before. But how do you record such a change in a relational database structure that doesn't
allow for different states?
The first thing we did was simplify things by breaking down the activities in any charter into a
collection of different transactions (possessions flowing around) and events (all the other things
going on). [Slide 36] I showed you this diagram for the activities in DKAR 1:169 earlier on. It's very
complex because it shows you almost all the activities (though in fact there are a few more even than
that). But if we start breaking these activities down, we can get rather more manageable units.
Firstly, there are three different events. [Slide 37] Tassilo founds the monastery of Kremsmünster
and [Slide 38] there are two examples of land clearance. [Slide 39] Activities like these are recorded
in event factoids, which give fairly basic information about agents, places and the type of activity
going on.
Once we've dealt with those, we're left with the transaction information [Slide 40]. What we have in
this diagram is three separate transactions. [Slide 41] One is a record of what happened in the past:
Tassilo granted possessions to Kremsmünster. [Slide 42] The other two are Charlemagne's actions in
the present. He confirms Kremsmünster's possessions and he also grants to the men of Eberstalzell
the right to remain on land they've cleared illegally.
We need to break down the charter into these separate transactions to allow us to represent the
flows of possessions in a database structure. [Slide 43] So for example, Tassilo's initial grant to
Kremsmünster can be represented in a series of tables that shows the agents involved, then the
details of the places mentioned, then the possession transferred and so on. We've frozen the activity
in a way that allows us to describe it within a relational database.
This is a simplified version of the structure we use to record transactions, but it still gives us quite a
lot of flexibility. [Slide 44: multiple donations] For example, it means that we can record a number of
donated possessions in the same record; we don't need 22 different factoids for the 22 different
things Tassilo gave to Kremsmünster, which is a big relief. And if there are different locations or
terms and conditions for different possessions we can record those in one go.
But if we're going to use a data structure like this, we need to make sure that we can interpret
everything unambiguously from the information we record in the table. [Slide 45] If you try and put
both Charlemagne's confirmation to Kremsmünster and his grant to the men of Eberstalzell in the
same record, how do you keep track of who's getting what possessions? You rapidly get a complete
mess. So we had to define a transaction as involving only two main agents (or agents working
together) and only one type of activity, so we didn't combine a confirmation and a grant in the same
record.
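The rule of two parties and one type of activity per transaction, with any number of possessions, can be sketched as follows. This Python is an illustrative assumption about how such records could be grouped, not the project's input process; the class, the function and the free-text possession descriptions are hypothetical, while the agents and activities come from DKAR 1:169 as described above.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    charter: str
    activity: str     # exactly one type, e.g. "grant" or "confirmation"
    giver: tuple      # one agent, or several agents acting together
    recipient: tuple
    possessions: list = field(default_factory=list)


def split_into_transactions(charter, claims):
    """Group (activity, giver, recipient, possession) claims so that each
    Transaction has one activity type and one giver/recipient pairing."""
    grouped = {}
    for activity, giver, recipient, possession in claims:
        key = (activity, giver, recipient)
        if key not in grouped:
            grouped[key] = Transaction(charter, activity, (giver,), (recipient,))
        grouped[key].possessions.append(possession)
    return list(grouped.values())


claims = [
    ("grant", "Tassilo", "Kremsmünster", "villa of Alkoven"),
    ("grant", "Tassilo", "Kremsmünster", "vineyards at Aschach"),
    ("confirmation", "Charlemagne", "Kremsmünster", "possessions granted by Tassilo"),
    ("grant", "Charlemagne", "men of Eberstalzell", "right to remain on cleared land"),
]
transactions = split_into_transactions("DKAR 1:169", claims)
```

Tassilo's possessions collapse into one record, while Charlemagne's confirmation and his grant stay apart, so it is always clear who receives what.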
We still had further difficulties to solve. One issue was that the neat distinction I've been making all
along between agents, places and possessions doesn't actually work when you look closely at the
data. [Slide 46] For example, look at the 22 possessions that DKAR 1:169 mentions. As you'll see,
among the things being granted [click] are entities that we regard as agents: churches, for example,
like that at Alburg, but also people being granted, like the craftsmen in Raotola. And as well as land
at particular places being given, there's also a whole place being given: the villa of Alkoven [click].
We had to ensure that we could record agents and places as possessions; we also still had to record
them as agents and places in the normal way. People don't cease to be people just because the Duke
of Bavaria treats them as objects for some purposes.
[Slide 47] As you may also have noticed, possessions no 16 and 18 on this list have an additional
complication: there's more than one object involved. Tassilo's giving 2 vineyards at Aschach and 3 at
Raotola, so we also needed to record quantities of objects being transferred.
A final issue was that not all the transactions we were interested in had identical flows of
possessions. [Slide 48: sale] In a sale, there are possessions going both ways: how do you record
that? [click] Our approach was to add another column to the table for possessions: if the return box
is marked, it means that these particular possessions are flowing in the opposite direction. [Slide 49]
In the online version of the database, we use an arrow to highlight this.
Unfortunately, however, DKAR 1:169 has an even more complicated twist. [Slide 50:
transaction] Let's go back to Charlemagne's interaction with the men of Eberstalzell. Charlemagne
grants to these men the right to remain on land they've cleared, but in return they have to do
service not to him, but to the monastery of Kremsmünster. This is no longer an arrangement between
two parties: a third party is now involved.
[Slide 51: diagram] After a lot of discussion, we eventually developed a data structure in which we
could record such arrangements, which we called third-party returns. Essentially this was a variant of
the method we'd already used for ordinary returns; we just needed to add another tick-box to the
input form.
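The quantity, return and third-party return markers on a possession can be pictured with a short Python sketch. Everything here is a simplification made for illustration: the field names are hypothetical, and recording the third party as an optional name on the possession row is an assumption, since the talk says only that a further tick-box was added.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PossessionRow:
    description: str
    quantity: int = 1
    is_return: bool = False            # flows back from recipient to giver
    third_party: Optional[str] = None  # assumed field: set for third-party returns


def flow(row, giver, recipient):
    """Return (from, to) for one possession row of a two-party transaction."""
    if row.third_party is not None:
        return (recipient, row.third_party)
    if row.is_return:
        return (recipient, giver)
    return (giver, recipient)


# Charlemagne's grant to the men of Eberstalzell, with service owed to Kremsmünster.
rows = [
    PossessionRow("right to remain on cleared land"),
    PossessionRow("service", third_party="Kremsmünster"),
]
flows = [flow(r, "Charlemagne", "men of Eberstalzell") for r in rows]

# A quantity larger than one, as with the vineyards Tassilo gave.
vineyards = PossessionRow("vineyard at Aschach", quantity=2)

# In a sale (hypothetical parties), the price is an ordinary return.
price_flow = flow(PossessionRow("price", is_return=True), "vendor", "buyer")
```

The transaction itself still names only two main parties; the direction of each possession is derived from the markers on its row.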
But this highlights a key issue when you're designing data structures: there's a trade-off between
how accurately you can represent your data and how complex your data structures need to be. It's
always a temptation to develop a database that can deal with every possible data variant, but you
end up by making it more and more complicated and difficult to use. As it was, we had input screens
where you had to scroll across horizontally to see all the fields you had to input.
[Slide 52: shoes] Eventually there comes a point where you have to decide you're not going to
change your data structures anymore; you just have to fit the data, however imperfectly, into the
existing structures. The problem is knowing when you've got to that point. Do you design elaborate
data structures for the minority of cases that are as complex as the charter I've given you? The
problem comes in knowing how complex the average charter is before you've looked at it in detail.
Third-party flows turned out to appear only in about 3% of charters; possibly we could have dealt
with them in some other way, but at the time this seemed the most effective approach.
More generally, when designing structures and standards it's very easy to make decisions that you
later realise were wrong. For example, we didn't initially treat all churches as agents, but tried to
distinguish between more and less important churches. Towards the end of the project, we realised
this approach wasn't working and had to go back and re-input some of the data.
So developing data structures and standards for the Charlemagne project was an odd mixture. We
had to combine detailed conceptual analysis of the platonic ideal of names, charters, transactions and
the like with the messy reality of the actual forms that such things take in practice. John Bradley and
Michele Pasin, members of our digital humanities team, once wrote a paper entitled 'Structuring
that which cannot be structured' 3 and in a sense that's what we've been trying to do. But it's only if
you can input data into the database in a way that's both structured and that retains as much of its
meaning as possible that you can get it out again in a useful form. That's what I want to discuss
briefly in the final part of this talk.
GETTING DATA OUT
In some ways getting data out is harder than getting it in. Just sorting out the displays of factoids so
that they make sense to end-users is time-consuming and there are still aspects of that which could
do with being improved. But the biggest problem when we were designing the user interface was
providing an effective way for users to browse the database and find the particular information
they were looking for.
The main method we used was “faceted browsing”, which is increasingly used by many databases. To
explain how that works, I’ll show you a very simple example, not using charters. [Slide 53] Suppose
you have a database that contains information about coloured shapes. How do you find the object
with the particular shape and colour you want?
The traditional way is with a search box [Slide 54], but there are several problems with this. The first
is not knowing the right terms to use. [Slide 55] For example, you input the term “square” and get
zero results. Why is that? Because the shapes in the database aren’t actually squares, even if they
look like it, but rectangles, so they’re all listed under that term. Similarly, suppose you’re interested
in particular sorts of triangle. [Slide 56] You input the pair of terms “red” and “triangle”. Again you get
zero results. But why is that? Does the database not contain triangles? Or has it classified them all
under some different term? Perhaps it thinks that colour isn’t red but scarlet? Searching a database
you’re not familiar with can be worryingly like making repeated stabs in the dark.
Faceted browsing, in contrast, provides an easier way to narrow down your search till you find
exactly what you’re looking for. [Slide 57] In this case, you might be given a choice of two filters to
browse by: by shape or by colour. Choose one of these [click] and you then get shown how many
examples there are of each colour [click]. It’s much simpler to find all the red objects you want [Slide
58].
But you’re not interested in all red objects, just red triangles. Here’s where you can combine filters.
[Slide 59] Choose the shape filter, and you’re shown what shapes the red objects have. So there
definitely aren’t any red triangles in the database; it’s not just that you’ve got your search terms
wrong. And if you want to, you can now clear the colour filter and use the shape filter to look for
triangles of any colour [Slide 60].
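
The coloured-shapes walkthrough can be sketched in a few lines of Python. This is only an illustration of the general idea of faceted browsing, not the project's actual implementation: the records in `SHAPES` and the helper names `apply_filters` and `facet_counts` are invented for the example.

```python
from collections import Counter

# Hypothetical toy data: every object has a shape and a colour.
SHAPES = [
    {"shape": "rectangle", "colour": "red"},
    {"shape": "rectangle", "colour": "blue"},
    {"shape": "circle", "colour": "red"},
    {"shape": "triangle", "colour": "blue"},
    {"shape": "triangle", "colour": "green"},
]


def apply_filters(records, filters):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r[f] == v for f, v in filters.items())]


def facet_counts(records, facet):
    """Count how many of the given records fall under each value of a facet."""
    return Counter(r[facet] for r in records)


# Step 1: browse by colour and see how many objects there are of each colour.
colour_counts = facet_counts(SHAPES, "colour")

# Step 2: pick "red", then ask which shapes the red objects have.
red_objects = apply_filters(SHAPES, {"colour": "red"})
shapes_of_red = facet_counts(red_objects, "shape")

# Step 3: clear the colour filter and browse triangles of any colour.
triangles = apply_filters(SHAPES, {"shape": "triangle"})
```

Because the shape counts are computed only from the objects that survive the colour filter, the absence of "triangle" from `shapes_of_red` tells the user that there are no red triangles, rather than leaving them to wonder whether they chose the wrong search term.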
Faceted browsing therefore offers users an easy way of narrowing down entries in a database to
find exactly what they want. You’re always kept aware of what facets you’ve used already and you
can remove some of them if you find you’re not getting any results. The same basic principles as
with the coloured shapes underlie the much more complicated faceted browsing in our database.
You can browse by charter, agent or place and gradually drill down to find what you want [Slide 61 –
3 clicks].
[3] Bradley, John, and Pasin, Michele, 'Structuring that which cannot be structured: a role for formal models in
representing aspects of Medieval Scotland', in Matthew Hammond (ed.), New perspectives on medieval
Scotland, 1093-1286, Woodbridge: The Boydell Press, 2013, pp. 203-14
Faceting makes things a lot easier for users, but it doesn’t solve all their search problems. Users of
our database can browse by either charters, agents or places, but they have to think about the
meaning of the filters they use in these different views to get the results they expect. Suppose
you’re browsing by charters. You can choose to filter the charters by various characteristics of the
agents they contain, so you could choose all those that include scribes [Slide 62] and then add in
women as an additional filter [Slide 63]. You end up with 177 charters, but that doesn’t mean that
there are 177 charters written by female scribes, only that there are 177 which include a scribe of
some sex and also a woman in some role. [Slide 64] In fact, if you search via agents, you find that so
far we haven’t found any scribes who are definitely female.
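
The difference between the two views can be made concrete with a small Python sketch. The charters and agents below are invented placeholders, and the function names are assumptions for illustration; the point is only that filters applied in a charter view are satisfied by any agent in the charter, whereas in an agent view both conditions must hold for the same agent.

```python
# Hypothetical charters, each listing its agents with a role and a sex.
CHARTERS = {
    "charter A": [
        {"name": "scribe 1", "role": "scribe", "sex": "male"},
        {"name": "granter 1", "role": "granter", "sex": "female"},
    ],
    "charter B": [
        {"name": "scribe 2", "role": "scribe", "sex": "unknown"},
        {"name": "recipient 1", "role": "recipient", "sex": "male"},
    ],
}


def is_scribe(agent):
    return agent["role"] == "scribe"


def is_woman(agent):
    return agent["sex"] == "female"


def charters_with(charters, predicate):
    """Charter view: a charter matches if ANY of its agents satisfies the filter."""
    return {cid for cid, agents in charters.items() if any(predicate(a) for a in agents)}


def agents_matching(charters, *predicates):
    """Agent view: an agent matches only if it satisfies ALL the filters itself."""
    return [
        a["name"]
        for agents in charters.values()
        for a in agents
        if all(p(a) for p in predicates)
    ]


# Charter view: charters containing a scribe AND containing a woman (in any role).
charter_view = charters_with(CHARTERS, is_scribe) & charters_with(CHARTERS, is_woman)

# Agent view: agents who are themselves both scribes and women.
agent_view = agents_matching(CHARTERS, is_scribe, is_woman)
```

Here the charter view returns "charter A" even though its scribe is male, while the agent view returns nothing, which mirrors the gap between the 177 charters and the absence of any definitely female scribe.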
In addition, faceted browsing with a database like ours (unlike with the coloured shapes example)
needs a lot of behind-the-scenes work so that when users click on a filter they get the results they
expect. To demonstrate that, I want to talk about one of the most complicated filters we had to
develop, that linking agents and places. If you want to find all the agents connected with the place
Worms, for example, how do you define this connection in a meaningful way?
A simple-minded rule would say that all agents who appear in a charter which mentions place X
are connected with place X [Slide 65]. But as you’ll see from the example of DKAR1:169 you get
some unsatisfactory results. The group of Slavs somewhere out in the wilds of Austria, for example,
may well not have known anything about Worms. Does it make sense to connect them to it? Equally,
if we happened to know the scribe of this charter (we don’t), he’d be sitting in Worms and the first
he may have heard of Leombach is when he’s asked to write a charter mentioning it. As for Tassilo,
by 791 when this charter was written, he’d been deposed by Charlemagne and was being held in a
monastery in Normandy. Connecting him to Worms because a charter he’d once given was
confirmed there seems tenuous at best.
What we had to do therefore was look at how agents are connected to places in the charter by the
roles they play. [Slide 66] You immediately get a much smaller but more meaningful set of
connections. So, for example, Tassilo as a granter is connected to the different places he donated
and so is Kremsmünster as the recipient. But the monastery is also connected to Worms, because it
(or at least Fader representing it) went to Worms to get a confirmation charter from Charlemagne.
The beekeepers of Raotola, meanwhile, are connected to Raotola, but not to any of the other places
mentioned in the charter. We don’t know if they ever went to some of the other properties of
Kremsmünster that are mentioned. And we don’t know exactly where the decania of Slavs were, so
they don’t get connected to anywhere.
We obviously couldn’t do this sort of detailed analysis of agent and place interconnections for every
charter. [Slide 67] Instead, we had to come up with rules (still very complicated) for how agent roles
and place roles are connected together. For example, anyone whose agent role makes them
responsible for the flows of possessions (like granters and recipients) should be linked to all places
which have the place role “location of possession”. In most cases (which we specify) they should also
be linked to any place with the place role “location of transaction”. Anyone whose agent role just
involves them being present at a transaction (like a petitioner) should be linked only to the place
with role “location of transaction”.
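
A rule set of this kind can be pictured as a lookup table from agent roles to the place roles they link to. The Python sketch below is a simplified assumption, not the project's real rules: it covers only the three agent roles named above, ignores the specified exceptions behind "in most cases", and uses invented agent and place names. The "witness" and "mentioned" roles are hypothetical stand-ins for roles that create no link.

```python
LOCATION_OF_POSSESSION = "location of possession"
LOCATION_OF_TRANSACTION = "location of transaction"

# Assumed, simplified rule table: which place roles each agent role links to.
# The real rules were far more detailed and included exceptions.
LINKED_PLACE_ROLES = {
    "granter": {LOCATION_OF_POSSESSION, LOCATION_OF_TRANSACTION},
    "recipient": {LOCATION_OF_POSSESSION, LOCATION_OF_TRANSACTION},
    "petitioner": {LOCATION_OF_TRANSACTION},
}


def connect_agents_to_places(agents, places):
    """Return the set of (agent, place) links implied by the rule table.

    agents: list of (agent name, agent role) pairs from one charter.
    places: list of (place name, place role) pairs from the same charter.
    Agent roles missing from the table are linked to nothing.
    """
    links = set()
    for agent, agent_role in agents:
        wanted = LINKED_PLACE_ROLES.get(agent_role, set())
        for place, place_role in places:
            if place_role in wanted:
                links.add((agent, place))
    return links


# Hypothetical charter: names and the "witness"/"mentioned" roles are invented.
agents = [("granter G", "granter"), ("petitioner P", "petitioner"), ("witness W", "witness")]
places = [
    ("estate E", LOCATION_OF_POSSESSION),
    ("town T", LOCATION_OF_TRANSACTION),
    ("region R", "mentioned"),
]
links = connect_agents_to_places(agents, places)
```

Compared with the simple-minded rule, which would produce nine links for this charter (three agents times three places), the role-based table yields three: the granter to the estate and the town, and the petitioner to the town only.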
The full rules just for this facet probably took us a month or more of discussion to work out. [Slide
68] Here’s an extract from the final results. This document gives the rules, but it also includes the
test data: examples that we could use to check the facets were working as we wanted. Doing this
testing was incredibly pernickety and tedious for both the historians and IT specialists [Slide 69]. But
it was the only way of checking that users would always get the expected results. Faceted browsing
is a very powerful tool for users, but it’s not easy to set up a database to be able to use it.
CONCLUSIONS (5 min)
In this talk I’ve tried to give a feel for the practicalities of a medieval database project and to show
something of the messiness behind the neat facade. I want to end with five more general points
about combining digital technology and historical texts.
[Slide 70] [click] One of the most basic problems we had with the project didn’t involve the digital
aspect at all. It was simply understanding what some charters actually meant. What is the decania of
Slavs that DKAR1:169 refers to? We decided it was probably some kind of administrative unit, but
the charter’s irritatingly vague and we’re still not quite sure.
[click] A second issue is how to avoid re-inventing the wheel with digital history projects. We did our
best to build on what previous projects had done, but it’s sometimes surprisingly difficult to find out
specific technical details of other projects. We’re therefore doing our best to preserve and
document the knowledge we’ve gained, through the website’s blog and also through presentations
like this.
[click] One of the other reasons projects tend to end up re-inventing the wheel is that different
historical periods produce not just different types of document, but different styles. The People of
Medieval Scotland project, for example, was a previous project based on charters, but these are far
more standardised documents in the twelfth century than in the eighth. Changing social practices
also make standard practices across databases very difficult to implement. POMS, dealing with
Scotland in the central Middle Ages, found it useful to record Gaelic and Latin names separately,
which wasn’t an issue for us. On the other hand, about half the people in medieval Scotland seem to
have been called either John, William or Robert, so the researchers on POMS didn’t have to spend so
much time arguing about whether Adalbert really is the same name as Odalpert.
[click] Fourthly, although I’ve focused on data structures in this talk, data input standards are also
very important for any historical project and they do have to be specified in incredible detail. To use
an analogy, you can’t have one inputter classifying shapes as rectangles while another sees them as
squares. Even the most basic data to be input has to be agreed.
Even so, problems of inconsistent inputting over time are inevitable, even if it’s just one person
doing that. We did our best to discuss and document the decisions we made, using wiki software
and a lot of Skype meetings, but we still had to do a lot of data clean-up towards the end of the
project.
[click] Which leads me to my final point about historical databases. They’re inevitably imperfect. It
doesn’t have to be quite at the level of garbage in, garbage out, but behind the clean facade of any
database project there tends to be some very messy data and a lot of compromises on database
design. But then history is messy and early medieval history is particularly so. The Charlemagne
database isn’t perfect, but despite all its imperfections, we hope it’ll still be a valuable tool for future
researchers.