MEDIEVAL studies & the digital turn
Toby Burrows
Department of Digital Humanities
Scope and scale
• Medieval studies – a vast field; interdisciplinary and
multidisciplinary
– The surviving evidence
– Subsequent material – including medievalism
• Digital Humanities – also a very large field
– Tools
– Theoretical framework – computational modelling
• 18 DH sessions at Kalamazoo ICMS 2015
– 7 medieval papers at DH 2015
The digital landscape
• Digitized manuscripts and digital image collections:
numerous libraries, museums, researchers
• Digitized editions and reference works: Patrologia Latina,
Corpus Christianorum, Hathi Trust, Google Books
• Catalogues and databases: libraries, museums, specialist
research collections
• Tools for working with images: IIIF, DigiPal
• Tools for working with texts: NER and topic modeling,
transcriptions and editions
• Visualization and analysis tools: social networks,
geographical, chronological
Information overload?
“Their works are so limitless that they cannot be numbered
… Indeed, we stand convicted of indolence by our inability to
read all that they could manage to dictate”
Hugh of St Victor, Didascalicon, Book 4, chapter 2
“Is there anywhere on earth exempt from these swarms of
new books? Even if, taken out one at a time, they offered
something worth knowing, the very mass of them would be a
serious impediment to learning – from satiety, if nothing
else”
Erasmus, Adages, II.1.1
The limits of digitization
“Around 60,000 medieval manuscripts are preserved in German
collections today, of which roughly 7.5 percent have been digitized
so far.”
C. Fabian, C. Schreiber, “Piloting a National Programme for the Digitization of
Medieval Manuscripts in Germany”, Liber Quarterly 24 (1) (2014), 2-16
“We invite you to take part in an experiment in DIY digitization.
Please upload your digital photographs of the Bodleian’s special
collections to this [Flickr] group.
These are your photographs, and you can license them
accordingly, but please include at least a shelfmark for
identification. We just want to see how people might use this
online resource for sharing photographs of our collections.”
Bodleian Library, 26 June 2015
Joining the dots
• We have lots of resources and tools, but they mostly exist
independently of each other
• Duplication of effort
• Time spent trying to identify and bring together different
sources
• How do we bring them together?
• Integrated infrastructure – standardized, centralized
• Interoperability – of services and data
• IIIF – International Image Interoperability Framework
• LOD – Linked Open Data
VICTORIA [State Library of Victoria]
223 Boethius DE MUSICA – Pseudo-Hucbaldus MUSICA ENCHIRIADIS – in Latin – 11c.
Vellum, 305 x 210mm, A modern paper + B modern vellum + C contemporary vellum + 56 + D modern
vellum + E modern paper. Collation: 1-7⁸. No catchwords, quire signatures in roman numerals placed in the
centre of the lower margin of last folio verso of each gathering, foliation modern pencil in arabic numerals,
no pagination. Av.-Bv., Cv., Dr.-Er. are blank. Some folios have been repaired. Most sheets have purple
stains, however, they rarely efface the text. Worm-holes in fols 1-18 without loss of text.
Dark brown ink, ruling dry-point, one col. of 39-40 lines, Daseia musical notation. Prickings in outer
margins. Script is first half 11c. Rhineland or northern Italian littera prae-gothica textualis. Explicit to the
first text 49r. FINIT.
Decoration: orange rubrics and green, orange, or brown ink drawings.
Edges cut and gilded, binding 19c. brown morocco over boards, gilt, by C. Lewis (see below), spine gilt
with lettering BOETII / MUSICA / M.S. / SEC. XIII (sic).
Incipits: 2r. –ulescentis; 9r. His igitur; 55v. Alleluia; 56r. asterisco ostendi. Ownership: Cr. in a 15c. littera
hybrida currens is a schematic table of astrological texts mentioning such authorities as Ptolomaeus, Thebit,
Iohannes Hispalensis, Alkabitus, Albumasaris, Alfagranus, and there follows immediately a much rubbed
transcription of a commentary (here without ascription) on portion of Arzachel, Canones ad tabulas
tholetanas and it begins: Quoniam cuiusque actionis [72] archazel (sic) arabus composuit tabulas ad ciuitatem
toletti…; 56v. among scribbles now almost illegible is a 15c. musical diagram of the diatesseron; Ar. has a
note ‘Boetii Musica, an Ancient MS. in fine Condition with diagrams. A MS. of a work of rare occurrence
bound by C. Lewis. H. Drury [73] 1824’; spine carries the small printed number ‘3345’, being that of Sir Thomas
Phillipps; Ev. in modern pencil ‘W. H. Robinson 5.9.1949. £LE-N/a/-’; Ar. has the stock no. ‘587673’; Ev. bears
the library’s shelf-mark *091/B63.
[72] These opening words are underlined and are the incipit of Arzachel’s text, cf. Thorndike, 1268.
[73] Henry Drury (1778-1841); see above p.185 for another MS. he owned.
Proliferation of services
NAMES (persons, places)
Europa Sacra, International
Medieval Bibliography, CERL
Thesaurus
IDENTIFIERS
Used in manuscript databases and
library catalogues, printed
catalogues, International Medieval
Bibliography, Scriptorium
MEASUREMENTS
Recorded in manuscript databases
and library catalogues, printed
catalogues
CONCEPTS
International Medieval Bibliography,
Getty Institute vocabularies,
IconClass vocabulary
MANIFESTATIONS
Web sites and books; listed in
International Medieval Bibliography,
library catalogues, some manuscript
databases, printed catalogues
DEPENDENTS
Listed in International Medieval
Bibliography, Scriptorium, some
manuscript databases
Classics: digital infrastructure
• Perseus Digital Library
• Greek and Roman texts
• Secondary sources – dictionaries
• Linguistic tools – treebanks
• Art and archaeology artefact browser
• Pleiades
• Community-built gazetteer and graph of ancient places
• Pleiades+ adds toponyms from GeoNames
• Pelagios
• Annotating place references in texts and images with entries
from Pleiades (tool = Recogito)
• Perseids
• Collaborative editing platform for source documents
Classics – unique identifiers
• Perseus Digital Library – URIs:
• Texts and citations – built on URNs from Canonical Text
Services
• Bibliographical catalogue records
• Work and edition/translation level records
• In progress: authors, editors, translators; places, Greek and
Latin lexical entities, artefacts, images
• Planned: variety of annotation types
• Pleiades
• URIs for ancient places
Integrating Digital Epigraphies
• Linked Data platform for digital epigraphy:
• PHI Searchable Greek Inscriptions project
• Supplementum Epigraphicum Graecum
• CLAROS concordance of epigraphical publication data
• JSTOR epigraphy articles
Identifiers from any of the projects may be used to retrieve
related data from any of the others
What’s missing in medieval studies?
• A significant proportion of the data is not digital
• Much of the digital data is not available for reuse
• There are many schemas for manuscript descriptions, and
no mappings between them
• There are no machine-readable identifiers for most
medieval people, places, and organizations
• There are no identifiers for medieval manuscripts – or even
consistent ways to cite the shelf-marks
Linked Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Using a Linked Data approach
• Assigning globally unique identifiers to manuscripts and
their component parts
• Tracking relationships between manuscripts and their
different manifestations (including digital versions)
• Tracking relationships between manuscripts and their
various dependents (including works about them)
• Deriving entity data from manuscript descriptions and
from full text editions and transcriptions
• Mapping between terminology in different languages
• Mapping between variant terminologies (e.g., 12th century
and 21st century)
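A minimal sketch of the first three points, under stated assumptions: the base URI (https://example.org/), the local identifiers and the predicate names are all invented for illustration and do not come from any existing service. Statements are held as plain subject, predicate, object triples, so only the Python standard library is needed.

```python
from typing import List, Optional, Tuple

# Hypothetical base URI; a real service would mint and publish its own.
BASE = "https://example.org/ms/"


def mint_uri(local_id: str, part: Optional[str] = None) -> str:
    """Return a URI for a manuscript, or for one of its component parts."""
    uri = BASE + local_id
    return f"{uri}/{part}" if part else uri


ms = mint_uri("ms-0001")                         # the manuscript as a whole
part = mint_uri("ms-0001", "part-1")             # one component part
digital = "https://example.org/images/ms-0001"   # a digital manifestation
article = "https://example.org/biblio/item-42"   # a dependent (a work about it)

# Each statement is a (subject, predicate, object) triple.
triples = {
    (part, "isPartOf", ms),
    (digital, "isManifestationOf", ms),
    (article, "isAbout", ms),
}


def related_to(target: str) -> List[Tuple[str, str]]:
    """List (subject, predicate) pairs for every statement pointing at target."""
    return sorted((s, p) for s, p, o in triples if o == target)


for subject, predicate in related_to(ms):
    print(subject, predicate, ms)
```

Because every statement points at the same manuscript URI, the component part, the digital version and the secondary literature can all be retrieved from one identifier, whichever site holds them.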
Linked Data functions
• Identification of individual manuscripts and their components
• Identification of names, places, works relating to manuscripts
• Terminology mapping: between different vocabularies and
ontologies used to describe manuscripts
• Schema mapping: between different descriptive structures
• Entity extraction: examining digital objects, publications and
manuscript descriptions to identify entities (e.g. through text
mining)
• Browsing and searching across multiple sites and datasets: using
IDs, terminology services and mapping services
• Linking scholarly activities (annotations, excerpts,
representations, etc.) to the manuscript descriptions, objects and
publications which they reference
Implications and assumptions
• Multiple naming schemes are necessary for describing entities and
relationships. There is no single authoritative vocabulary or ontology.
• This is a permanent work-in-progress – not just for adding new entities
and relationships, but for re-thinking and enriching existing entities and
relationships.
• Entities need to include individuals as well as categories. Hierarchical
categorization is useful, but only as far as it helps with finding or
browsing. There is no definitive classification structure.
• Relationships between entities are crucial. The network graph is the
basic structure – not the database, text, Web site. The context is
essential: <this> is related to <that>, and here is <the evidence> for the
relationship.
• The focus should be on organizing knowledge and meaning
(represented by semantic entities), not on organizing collections.
• Linking data is more of an organizational and cultural problem than a
technical one
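The "<this> is related to <that>, and here is <the evidence>" pattern can be sketched as a small record type. This is an illustrative assumption about how such an edge might be stored, not a description of any particular system; the relation label is invented, and the two example edges paraphrase the State Library of Victoria catalogue entry quoted earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    """One edge in the graph: <this> is related to <that>, with its evidence."""
    this: str
    relation: str   # illustrative label, not taken from a published vocabulary
    that: str
    evidence: str   # what the claim rests on


MS = "State Library of Victoria, *091/B63"

edges = [
    Relationship(MS, "formerly owned by", "Sir Thomas Phillipps",
                 "spine carries the small printed number '3345'"),
    Relationship(MS, "formerly owned by", "Henry Drury",
                 "note on Ar. ending 'H. Drury 1824'"),
]


def evidence_for(edges: List[Relationship], this: str, that: str) -> List[str]:
    """Return the evidence recorded for every edge between two entities."""
    return [e.evidence for e in edges if e.this == this and e.that == that]


for item in evidence_for(edges, MS, "Henry Drury"):
    print(item)
```

Keeping the evidence on the edge itself means a disputed or revised claim can be replaced without touching the entities at either end.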
Re-creation of Phillipps’ shelves, Grolier Club
The Phillipps manuscript collection
• Phillipps’ own printed catalogue
(1837-1871) goes up to no. 23,837
• Thomas Fitzroy Fenwick
(grandson, d. 1938) spent fifty
years reorganizing and
renumbering: up to no. 38,628
• Fenwick’s estimate of the total
was close to 60,000 volumes and
individual documents
• Phillipps also owned 50,000
books, as well as many prints,
photographs, drawings and
paintings
[Portrait: Sir Thomas Phillipps (1792-1872)]
Dispersal of the collection
Fenwick family (1886-1945):
• Sales to interested libraries and governments (Germany, Belgium,
Netherlands, France, Ireland, Wales) – more than 2,500 items
• Auctions at Sotheby’s, 1886 to 1938 – 22 auctions, more than 22,000
lots, raised £97,000 (over £30 million in present-day terms)
• Residue (12,000 items) sold to the Robinson brothers in 1945 for
£100,000 (£11-12 million in present-day terms)
W.H. Robinson Ltd (1945-1958):
• Series of sale catalogues, 1945-1954
• Donation to the Bodleian Library of the remaining materials, 1958
Sotheby’s (1946-1950, 1965-1977):
• Series of sale catalogues
Research questions and use cases
• Show all the Irish manuscripts acquired by Phillipps,
together with their previous and subsequent history of
ownership, acquisition and sales
• Show all the events which link Phillipps to an earlier or
later collector (e.g., Guglielmo Libri, Chester Beatty)
• How many Phillipps manuscripts are now in North
American collections, and where are they?
• What can we learn about the sources of the Phillipps
Collection, the nature of its contents, and the extent of its
dispersal?
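As a rough sketch of the second use case, assume each manuscript's provenance is already available as a chronological list of owners. The function name and the data layout are hypothetical. The first chain is the Libri example used later in these slides; the second is an invented placeholder, not a real manuscript.

```python
from typing import Dict, List, Optional, Tuple

# Ownership chains in chronological order.
chains: Dict[str, List[str]] = {
    "Phillipps MS 16402": ["Guglielmo Libri", "Thomas Phillipps"],
    "placeholder-ms": ["Collector A", "Thomas Phillipps", "Collector B"],  # invented
}


def neighbours(
    chains: Dict[str, List[str]], collector: str
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """For each chain containing the collector, return
    (manuscript, immediately earlier owner, immediately later owner)."""
    results = []
    for manuscript, owners in chains.items():
        for i, owner in enumerate(owners):
            if owner == collector:
                earlier = owners[i - 1] if i > 0 else None
                later = owners[i + 1] if i + 1 < len(owners) else None
                results.append((manuscript, earlier, later))
    return results


for row in neighbours(chains, "Thomas Phillipps"):
    print(row)
```

A missing neighbour is returned as None, which distinguishes "no earlier or later owner recorded" from a gap that has simply not been looked up yet only if the underlying data makes that distinction; this sketch does not.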
Data sources
Source / Format / Comments
• Schoenberg Database of Manuscripts
– Format: Relational database
– Comments: Incorporates other sources, esp. sales catalogues; 6,000 Phillipps MSS; 20,000 Phillipps events
• Library catalogues (BL, KB etc.)
– Format: Relational databases; generally MARC records
– Comments: Provenance in notes; export can be awkward
• Union catalogues
– Format: Relational databases; printed bibliographies
– Comments: Formats vary; coverage varies; export can be awkward
• Sale catalogues
– Format: Printed books (some digitized); online sources (PDFs, Web sites)
– Comments: Many included in Schoenberg; MSS in ABE, eBay etc.
• Phillipps catalogues and lists
– Format: Printed book; partly digitized; supplemented by handwritten notes
– Comments: Partly included in Schoenberg; handwritten notes not digitized
• Phillipps provenance indexes (BL, IRHT)
– Format: Handwritten; not digitized
– Comments: Arranged by Phillipps number; no longer updated
• Annotated sales catalogues & printed catalogues
– Format: Handwritten; not digitized
– Comments: Researchers (Munby), owners (Phillipps), auctioneers (Sotheby’s); held in Cambridge UL, Bodleian, BL
In 1862, Sir Thomas Phillipps bought Phillipps MS 16402 in London
as part of the Sotheby’s sale of the collection of Guglielmo Libri.
[Network diagram of this event, with nodes: London, 1862, MS16402, Libri, Phillipps, Sotheby’s]
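The sentence above can be decomposed into a single event node with one edge per participant. The sketch below is only one possible reading: the event identifier is hypothetical, and the role names (buyer, seller, auctioneer) are an interpretation of the sentence rather than terms taken from the project's data.

```python
# One event node, with an edge to each entity that takes part in it.
EVENT_ID = "event-1862-libri-sale"  # hypothetical identifier

event_roles = {
    "type": "sale",
    "date": "1862",
    "place": "London",
    "manuscript": "Phillipps MS 16402",
    "buyer": "Sir Thomas Phillipps",
    "seller": "Guglielmo Libri",
    "auctioneer": "Sotheby's",
}

# Flatten the event into (event, role, entity) edges.
edges = [(EVENT_ID, role, value) for role, value in event_roles.items()]


def events_involving(entity, edges):
    """Return the ids of all events that have an edge to the given entity."""
    return sorted({event_id for event_id, _, value in edges if value == entity})


print(events_involving("Guglielmo Libri", edges))
# prints ['event-1862-libri-sale']
```

Treating the sale as a node in its own right, rather than as a direct Libri-to-Phillipps link, leaves room for the date, place and auctioneer to be attached to the same event.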
DATA MODEL – Nodegoat
Object / Sub-objects / Related to
• PERSON
– Sub-objects: Nationality (country)
– Related to: Manuscript; Text; Catalogue
• ORGANIZATION
– Sub-objects: Location (city; country)
– Related to: Manuscript; Text; Catalogue
• MANUSCRIPT
– Sub-objects: Sold; Donated; Owned; Described In; Produced; Contents
– Related to: Person/Organization (Agent, Owner, Buyer, Donor, Recipient, Scribe, Artist, Producer); Location (city; country); Catalogue; Text
• TEXT
– Related to: Person (Author); Manuscript
• CATALOGUE
– Related to: Organization (Publisher); Person (Compiler); Manuscript
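One way to read the data model above is as a set of rules about which object types may be linked. The sketch below encodes that reading as a plain dictionary and checks proposed links against it. It is not Nodegoat's configuration format or API; the names RELATED_TO and link_allowed are hypothetical, and treating Location as a link target is an assumption.

```python
# Object types and the types each may be related to, as read from the data model.
RELATED_TO = {
    "PERSON": {"MANUSCRIPT", "TEXT", "CATALOGUE"},
    "ORGANIZATION": {"MANUSCRIPT", "TEXT", "CATALOGUE"},
    # LOCATION is listed as a relation of MANUSCRIPT but is not an object type itself.
    "MANUSCRIPT": {"PERSON", "ORGANIZATION", "LOCATION", "CATALOGUE", "TEXT"},
    "TEXT": {"PERSON", "MANUSCRIPT"},
    "CATALOGUE": {"ORGANIZATION", "PERSON", "MANUSCRIPT"},
}

# Sub-objects recorded for a manuscript (its events and contents).
MANUSCRIPT_SUB_OBJECTS = (
    "Sold", "Donated", "Owned", "Described In", "Produced", "Contents",
)


def link_allowed(source_type: str, target_type: str) -> bool:
    """Return True if the model permits a link from source_type to target_type."""
    return target_type in RELATED_TO.get(source_type, set())


print(link_allowed("MANUSCRIPT", "PERSON"))
print(link_allowed("TEXT", "CATALOGUE"))
```

A check like this is mainly useful when importing records from several sources, since it catches links that the model has no place for before they reach the graph.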
Going beyond the data
• This is only about organizing the data – the first steps in the
research process – what about the later steps?
• Computational representation of research processes –
modelling humanities research in the digital environment
• What is the goal?
• Making it quicker and easier to gather and organize evidence?
• Changing the way we do humanities research?
Research processes: big science
Humphrey, Charles. (2006) “e-Science and the life cycle of research”
http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
When I go to libraries or archives, I make notes in a continuous form on
sheets of paper, entering the page number and abbreviated title of the
source opposite each excerpted passage. When I get home, I copy the
bibliographical details of the works I have consulted into an alphabeticised
index book, so that I can cite them in my footnotes.
I then cut up each sheet with a pair of scissors. The resulting fragments are
of varying size, depending on the length of the passage transcribed. These
sliced-up pieces of paper pile up on the floor. Periodically, I file them away
in old envelopes, devoting a separate envelope to each topic. Along with
them go newspaper cuttings, lists of relevant books and articles yet to be
read, and notes on anything else which might be helpful when it comes to
thinking about the topic more analytically.
If the notes on a particular topic are especially voluminous, I put them in a
box file or a cardboard container or a drawer in a desk. I also keep an index
of the topics on which I have an envelope or a file. The envelopes run into
thousands.
When the time comes to start writing, I go through my envelopes, pick out a
fat one and empty it out onto the table, to see what I have got.
Keith Thomas, “Diary”, London Review of Books, 10 June 2010
The humanities researcher’s data: pre-digital
• Annotations – collected in books, on sheets of paper, on cards, in
notebooks
• Excerpts – collected on sheets of paper, on cards, in notebooks, in
commonplace books
• Citations – recorded on cards, in notebooks
• Categorization scheme(s) – for filing and arranging notes (on index
cards, file dividers, box labels)
• Storage mechanisms – including files, envelopes, cardboard boxes
• Collections of sources: Personal book collections, journals, off-prints
and cuttings, image collections, map collections, microfilms – the
ever-expanding office…
• Sharing: in own publications (especially via footnotes [1]), in
bibliographies, through personal contact (e.g. letters)
[1] Grafton, Anthony, The footnote: a curious history (London: Faber and Faber, 1997)
The humanities researcher’s data: digital world
• Annotations and excerpts – in Word documents, Google Docs, citation
management databases (EndNote)
• Citations – citation management databases (EndNote) and Web
services (Zotero, Mendeley); downloaded or keyed-in
• Categorization scheme(s) – tagging, keywords
• Storage mechanisms – digital storage of various kinds (logical and
physical) – hard drives, USBs, data stores
• Personal digital collections – iPads, Web sites (Omeka, Dropbox, Flickr)
• Copies of source materials – downloaded or scanned / photographed;
stored on various media (or linked to using URLs and DOIs)
• Sharing – e-mail, social media, Web collaboration services (Google,
Confluence)
Digital methods…
• Force you to be more overt and explicit about your
assumptions, processes, methods, approach – to
externalize and reify
• Help you to visualize the evidence – to explore the relevant
material, to navigate complexity
• Some risks:
– Rigidity, over-simplification, single perspective
– Loss of serendipity
– Mistaking the visualization for the analysis
– Conflating the organizing of the evidence with explanations
and conclusions
Jean-Claude Gardin on humanities systems
• ‘Searching in natural language’: fallacious; “representation issues are
still with us”
• Ordering and classification: “rather primitive”
• Simulation: qualitative variables and values
• Dynamic system modelling: quantified variables, numerical functions
• Artificial intelligence and expert systems: logic, inference engines –
data + rules of inference – but discursive rather than formal rules of
reasoning
“It is incumbent upon the humanities as a ‘distinct’ science to define its
own, alternative ways of reasoning”
“Our primary concern should be the study of mental processes at work in
archaeological reasoning, with a view to making them amenable to
machine handling in a Turing sense – that is, with or without computers”
“The formulation of rules of reasoning is here essential, in whichever
form we chose to express them”
J-C. Gardin, “The Impact of Computer-based Techniques on Research in Archaeology”, in: Scholarship and Technology in the Humanities, ed. May
Katzen (London: Bowker-Saur, 1991), pp. 95-110
A logic of historical thought
• Frame the question
• Revise and refine, test and verify
• Must be resolvable in empirical terms
• Fictional, counterfactual questions: heuristically useful but unprovable
• Use evidence to build up an explanation
• Classificatory concepts, e.g. feudalism
• Statistical generalizations – inferred by a special form of reasoning
• Assemble in temporal order as a narrative
• Causation – influences, factors, elements – by reference to antecedents
• Individual motivation and group behaviour
• Inference using analogies: heuristically useful but unprovable
• Construct an argument (produce a historical interpretation)
• Proceed from premises to a conclusion through rational inference
• Define and clarify the terms (e.g., democracy, capitalism, nationalism) –
simple, working, consistent definition
David Hackett Fischer, Historians’ Fallacies: Toward a Logic of Historical Thought
(London: Routledge & Kegan Paul, 1971)
Fischer’s logical steps / Systems requirements
• Framing the question
– Searching and browsing
– Counterfactuals and “what if?”
• Using evidence to build up an explanation
– [Assembling evidence]
– Organizing evidence – classification
– Linking within the evidence – causation, analogy
– Working at scale – from the individual to the general
– Reasoning – inference
– Working with time – constructing a narrative
• Constructing an argument – producing a historical explanation
– Defining terms
– Reasoning: premises – inference – conclusion
– [Distributing the results]
Dr Toby Burrows
Marie Curie Fellow
Department of Digital Humanities
King’s College London
26-29 Drury Lane
London WC2B 5RL
toby.burrows@kcl.ac.uk
@tobyburrows
tobyburrows.wordpress.com
Questions: Digitization
1. How much of the material you need for your research has
not been digitized?
2. How should the priorities for future digitization be
decided?
3. How useful to you are services which only contain digital
or digitized materials?
4. Should everything be free?
Questions: Tools
1. How do you choose which software to use for your
research?
2. Do you use generic software, or tools specific to medieval
studies?
3. What cloud- or Web-based tools do you use?
4. What is your university’s attitude to Open Source
software?
5. What sort of support do you get with using software?
Questions: Computation and representation
1. What do we mean by “data” in the humanities? Can
medieval research be data-centred?
2. To what extent can research processes in medieval studies
be modelled for use in a computational environment?
3. What’s the point of visualizations?

CENDARI Summer School July 2015 Burrows

  • 1.
    MEDIEVAL studies &the digital turn Department of Digital HumanitiesToby Burrows
  • 2.
    Scope and scale •Medieval studies – a vast field; interdisciplinary and multi- disciplinary – The surviving evidence – Subsequent material – including medievalism • Digital Humanities – also a very large field – Tools – Theoretical framework – computational modelling • 18 DH sessions at Kalamazoo ICMS 2015 – 7 medieval papers at DH 2015
  • 4.
    The digital landscape •Digitized manuscripts and digital image collections: numerous libraries, museums, researchers • Digitized editions and reference works: Patrologia Latina, Corpus Christianorum, Hathi Trust, Google Books • Catalogues and databases: libraries, museums, specialist research collections • Tools for working with images: IIIF, DigiPal • Tools for working with texts: NER and topic modeling, transcriptions and editions • Visualization and analysis tools: social networks, geographical, chronological
  • 5.
    Information overload? “Their worksare so limitless that they cannot be numbered … Indeed, we stand convicted of indolence by our inability to read all that they could manage to dictate” Hugh of St Victor, Didascalicon, Book 4, chapter 2 “Is there anywhere on earth exempt from these swarms of new books? Even if, taken out one at a time, they offered something worth knowing, the very mass of them would be a serious impediment to learning – from satiety, if nothing else” Erasmus, Adages, II.1.1
  • 6.
    The limits ofdigitization “Around 60,000 medieval manuscripts are preserved in German collections today, of which roughly 7.5 percent have been digitized so far.” C. Fabian, C. Schreiber, “Piloting a National Programme for the Digitization of Medieval Manuscripts in Germany”, Liber Quarterly 24 (1) (2014), 2-16 “We invite you to take part in an experiment in DIY digitization. Please upload your digital photographs of the Bodleian’s special collections to this [Flickr] group. These are your photographs, and you can license them accordingly, but please include at least a shelfmark for identification. We just want to see how people might use this online resource for sharing photographs of our collections.” Bodleian Library, 26 June 2015
  • 11.
    Joining the dots •We have lots of resources and tools, but they mostly exist independently of each other • Duplication of effort • Time spent trying to identify and bring together different sources • How do we bring them together? • Integrated infrastructure – standardized, centralized • Interoperability – of services and data • IIIF – International Image Interoperability Framework • LOD – Linked Open Data
  • 17.
    VICTORIA [State Libraryof Victoria] 223 Boethius DE MUSICA – Pseudo-Hucbaldus MUSICA ENCHIRIADIS – in Latin – 11c. Vellum, 305 x 210mm, A modern paper + B modern vellum + C contemporary vellum + 56 + D modern vellum + E modern paper. Collation: (8)1-7 . No catchwords, quire signatures in roman numerals placed in the centre of the lower margin of last folio verso of each gathering, foliation modern pencil in arabic numerals, no pagination. Av.-Bv., Cv., Dr.-Er. are blank. Some folios have been repaired. Most sheets have purple stains, however, they rarely efface the text. Worm-holes in fols 1-18 without loss of text. Dark brown ink, ruling dry-point, one col. of 39-40 lines, Daseia musical notation. Prickings in outer margins. Script is first half 11c. Rhineland or northern Italian littera prae-gothica textualis. Explicit to the first text 49r. FINIT. Decoration: orange rubrics and green, orange, or brown ink drawings. Edges cut and gilded, binding 19c. brown morocco over boards, gilt, by C. Lewis (see below), spine gilt with lettering BOETII / MUSICA / M.S. / SEC. XIII (sic). Incipits: 2r. –ulescentis; 9r. His igitur; 55v. Alleluia; 56r. asterisco ostendi. Ownership: Cr. in a 15c. littera hybrida currens is a schematic table of astrological texts mentioning such authorities as Ptolomaeus, Thebit, Iohannes Hispalensis, Alkabitus, Albumasaris, Alfagranus, and there follows immediately a much rubbed transcription of a commentary (here without ascription) on portion of Arzachel, Canones ad tabulas tholetanas and it begins: Quoniam cuiusque actionis72 archazel (sic) arabus composuit tabulas ad ciuitatem toletti…; 56v. among scribbles now almost illegible is a 15c. musical diagram of the diatesseron; Ar. has a note ‘Boetii Musica, an Ancient MS. in fine Condition with diagrams. A MS. of a work of rare occurrence bound by C. Lewis. H. Drury73 1824’; spine carries the small printed number ‘3345’, being that of Sir Thomas Phillipps; Ev. in modern pencil ‘W. H. 
Robinson 5.9.1949. £LE-N/a/-’; Ar. has the stock no. ‘587673’; Ev. bears the library’s shelf-mark *091/B63. 72 These opening words are underlined and are the incipit of Arzachel’s text, cf. Thorndike, 1268. 73 Henry Drury (1778-1841); see above p.185 for another MS. he owned.
  • 18.
    Proliferation of services NAMES(persons, places) Europa Sacra, International Medieval Bibliography, CERL Thesaurus IDENTIFIERS Used in manuscript databases and library catalogues, printed catalogues, International Medieval Bibliography, Scriptorium MEASUREMENTS Recorded in manuscript databases and library catalogues, printed catalogues CONCEPTS International Medieval Bibliography, Getty Institute vocabularies, IconClass vocabulary MANIFESTATIONS Web sites and books; listed in International Medieval Bibliography, library catalogues, some manuscript databases, printed catalogues DEPENDENTS Listed in International Medieval Bibliography, Scriptorium, some manuscript databases
  • 19.
    Classics: digital infrastructure •Perseus Digital Library • Greek and Roman texts • Secondary sources – dictionaries • Linguistic tools – treebanks • Art and archaeology artefact browser • Pleiades • Community-built gazetteer and graph of ancient places • Pleiades+ adds toponyms from GeoNames • Pelagios • Annotating place references in texts and images with entries from Pleiades (tool = Recogito) • Perseids • Collaborative editing platform for source documents
  • 23.
    Classics – uniqueidentifiers • Perseus Digital Library – URIs: • Texts and citations – built on URNs from Canonical Text Services • Bibliographical catalogue records • Work and edition/translation level records • In progress: authors, editors, translators; places, Greek and Latin lexical entities, artefacts, images • Planned: variety of annotation types • Pleiades • URIs for ancient places
  • 24.
    Integrating Digital Epigraphies •Linked Data platform for digital epigraphy: • PHI Searchable Greek Inscriptions project • Supplementum Epigraphicum Graecum • CLAROS concordance of epigraphical publication data • JSTOR epigraphy articles Identifiers from any of the projects may be used to retrieve related data from any of the others
  • 25.
    What’s missing inmedieval studies? • A significant proportion of the data is not digital • Much of the digital data is not available for reuse • There are many schemas for manuscript descriptions, and no mappings between them • There are no machine-readable identifiers for most medieval people, places, and organizations • There are no identifiers for medieval manuscripts – or even consistent ways to cite the shelf-marks
  • 27.
    Linked Open Datacloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 28.
    Using a LinkedData approach • Assigning globally unique identifiers to manuscripts and their component parts • Tracking relationships between manuscripts and their different manifestations (including digital versions) • Tracking relationships between manuscripts and their various dependents (including works about them) • Deriving entity data from manuscript descriptions and from full text editions and transcriptions • Mapping between terminology in different languages • Mapping between variant terminologies (e.g., 12th century and 21st century)
  • 29.
    Linked Data functions •Identification of individual manuscripts and their components • Identification of names, places, works relating to manuscripts • Terminology mapping: between different vocabularies and ontologies used to describe manuscripts • Schema mapping: between different descriptive structures • Entity extraction: examining digital objects, publications and manuscript descriptions to identify entities (e.g. through text mining) • Browsing and searching across multiple sites and datasets: using IDs, terminology services and mapping services • Linking scholarly activities (annotations, excerpts, representations, etc.) to the manuscript descriptions, objects and publications which they reference
  • 30.
    Implications and assumptions •Multiple naming schemes are necessary for describing entities and relationships. There is no single authoritative vocabulary or ontology. • This is a permanent work-in-progress – not just for adding new entities and relationships, but for re-thinking and enriching existing entities and relationships. • Entities need to include individuals as well as categories. Hierarchical categorization is useful, but only as far as it helps with finding or browsing. There is no definitive classification structure. • Relationships between entities are crucial. The network graph is the basic structure – not the database, text, Web site. The context is essential: <this> is related to <that>, and here is <the evidence> for the relationship. • The focus should be on organizing knowledge and meaning (represented by semantic entities), not on organizing collections. • Linking data is more of an organizational and cultural problem than a technical one
  • 31.
    Re-creation of Phillipps’shelves, Grolier Club
  • 32.
    The Phillipps manuscriptcollection • Phillipps’ own printed catalogue (1837-1871) goes up to no. 23,837 • Thomas Fitzroy Fenwick (grandson, d. 1938) spent fifty years reorganizing and renumbering: up to no. 38,628 • Fenwick’s estimate of the total was close to 60,000 volumes and individual documents • Phillipps also owned 50,000 books, as well as many prints, photographs, drawings and paintings Sir Thomas Phillipps (1792-1872)
  • 33.
    Dispersal of thecollection Fenwick family (1886-1945): • Sales to interested libraries and governments (Germany, Belgium, Netherlands, France, Ireland, Wales) – more than 2,500 items • Auctions at Sotheby’s, 1886 to 1938 – 22 auctions, more than 22,000 lots, raised £97,000 (over £30 million) • Residue (12,000 items) sold to the Robinson brothers in 1945 for £100,000 (£11-12 million) W.H. Robinson Ltd (1945-1958): •Series of sale catalogues, 1945-1954 •Donation to the Bodleian Library of the remaining materials, 1958 Sotheby’s (1946-1950, 1965-1977): •Series of sale catalogues
  • 34.
    Research questions anduse cases • Show all the Irish manuscripts acquired by Phillipps, together with their previous and subsequent history of ownership, acquisition and sales • Show all the events which link Phillipps to an earlier or later collector (e.g., Guglielmo Libri, Chester Beatty) • How many Phillipps manuscripts are now in North American collections, and where are they? • What can we learn about the sources of the Phillipps Collection, the nature of its contents, and the extent of its dispersal?
  • 35.
    Data sources Source FormatComments Schoenberg Database of Manuscripts Relational database Incorporates other sources, esp. sales catalogues 6,000 Phillipps MSS; 20,000 Phillipps events Library catalogues (BL, KB etc.) Relational databases Generally MARC records Provenance in notes Export can be awkward Union catalogues Relational databases Printed bibliographies Formats vary Coverage varies Export can be awkward Sale catalogues Printed books (some digitized) Online sources (PDFs, Web sites) Many included in Schoenberg MSS in ABE, eBay etc. Phillipps catalogues and lists Printed book; Partly digitized Supplemented by handwritten notes Partly included in Schoenberg Handwritten notes not digitized Phillipps provenance indexes (BL, IRHT) Handwritten; Not digitized Arranged by Phillipps number No longer updated Annotated sales catalogues & printed catalogues Handwritten; Not digitized Researchers (Munby), owners (Phillipps), auctioneers (Sotheby’s) Held in Cambridge UL, Bodleian, BL
  • 38.
    In 1862, Sir Thomas Phillipps bought Phillipps MS 16402 in London as part of the Sotheby’s sale of the collection of Guglielmo Libri.
    [Diagram: linked nodes for London, 1862, MS 16402, Libri, Phillipps, Sotheby’s]
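    The sentence above bundles a manuscript, a date, a place, two collectors and an auction house into a single provenance event. As a minimal sketch, assuming an event-centred representation (the class, field and relation names below are illustrative and are not taken from the Schoenberg Database or Nodegoat), the event and the links it implies might be written in Python as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleEvent:
    """One provenance transaction; all field names are illustrative assumptions."""
    manuscript: str
    year: int
    place: str
    seller: str
    buyer: str
    agent: str  # auction house handling the sale


def links(event: SaleEvent) -> list[tuple[str, str, str]]:
    """Flatten the event into (subject, relation, object) links for a network view."""
    return [
        (event.manuscript, "sold_by", event.seller),
        (event.manuscript, "bought_by", event.buyer),
        (event.manuscript, "sold_through", event.agent),
        (event.manuscript, "sold_in", event.place),
        (event.manuscript, "sold_in_year", str(event.year)),
    ]


# The values come from the slide; only the structure is assumed.
libri_sale = SaleEvent(
    manuscript="Phillipps MS 16402",
    year=1862,
    place="London",
    seller="Guglielmo Libri",
    buyer="Sir Thomas Phillipps",
    agent="Sotheby's",
)

for subject, relation, obj in links(libri_sale):
    print(f"{subject} --{relation}--> {obj}")
```

    Keeping the event as one record, and deriving the links from it, means the same data can feed both a chronological list and a network diagram.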
  • 40.
    DATA MODEL – Nodegoat
    • PERSON
      Sub-objects: Nationality (country)
      Related to: Manuscript; Text; Catalogue
    • ORGANIZATION
      Sub-objects: Location (city; country)
      Related to: Manuscript; Text; Catalogue
    • MANUSCRIPT
      Sub-objects: Sold; Donated; Owned; Described In; Produced; Contents
      Related to: Person/Organization (Agent, Owner, Buyer, Donor, Recipient, Scribe, Artist, Producer); Location (city; country); Catalogue; Text
    • TEXT
      Related to: Person (Author); Manuscript
    • CATALOGUE
      Related to: Organization (Publisher); Person (Compiler); Manuscript
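    The object / sub-object pattern in this model can be sketched in Python. This is an illustration of the structure only, not Nodegoat's own schema or API: the class names, the `agents_in_role` helper, and the choice to record the seller under the "Owner" role are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """A PERSON or ORGANIZATION object."""
    name: str
    kind: str  # "person" or "organization"


@dataclass
class SubObject:
    """A dated, located statement attached to a manuscript (Sold, Owned, ...)."""
    kind: str  # e.g. "Sold", "Donated", "Owned", "Produced"
    year: Optional[int] = None
    place: Optional[str] = None
    roles: dict[str, Agent] = field(default_factory=dict)  # role name -> agent


@dataclass
class Manuscript:
    """A MANUSCRIPT object together with its sub-objects."""
    label: str
    sub_objects: list[SubObject] = field(default_factory=list)

    def agents_in_role(self, role: str) -> list[str]:
        """Names of all agents holding the given role in any sub-object."""
        return [s.roles[role].name for s in self.sub_objects if role in s.roles]


# Example: the 1862 Libri sale as a "Sold" sub-object (seller stored as "Owner").
ms = Manuscript(label="Phillipps MS 16402")
ms.sub_objects.append(
    SubObject(
        kind="Sold",
        year=1862,
        place="London",
        roles={
            "Owner": Agent("Guglielmo Libri", "person"),
            "Buyer": Agent("Sir Thomas Phillipps", "person"),
            "Agent": Agent("Sotheby's", "organization"),
        },
    )
)

print(ms.agents_in_role("Buyer"))
```

    Attaching dates and places to sub-objects rather than to the manuscript itself lets one manuscript carry any number of sales, donations and ownership periods.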
  • 49.
    Going beyond the data
    • This is only about organizing the data – the first steps in the research process – what about the later steps?
    • Computational representation of research processes – modelling humanities research in the digital environment
    • What is the goal?
      – Making it quicker and easier to gather and organize evidence?
      – Changing the way we do humanities research?
  • 50.
    Research processes: big science
    Humphrey, Charles (2006), “e-Science and the life cycle of research”, http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
  • 51.
    When I go to libraries or archives, I make notes in a continuous form on sheets of paper, entering the page number and abbreviated title of the source opposite each excerpted passage. When I get home, I copy the bibliographical details of the works I have consulted into an alphabeticised index book, so that I can cite them in my footnotes. I then cut up each sheet with a pair of scissors. The resulting fragments are of varying size, depending on the length of the passage transcribed. These sliced-up pieces of paper pile up on the floor. Periodically, I file them away in old envelopes, devoting a separate envelope to each topic. Along with them go newspaper cuttings, lists of relevant books and articles yet to be read, and notes on anything else which might be helpful when it comes to thinking about the topic more analytically. If the notes on a particular topic are especially voluminous, I put them in a box file or a cardboard container or a drawer in a desk. I also keep an index of the topics on which I have an envelope or a file. The envelopes run into thousands. When the time comes to start writing, I go through my envelopes, pick out a fat one and empty it out onto the table, to see what I have got.
    Keith Thomas, “Diary”, London Review of Books, 10 June 2010
  • 52.
    The humanities researcher’s data: pre-digital
    • Annotations – collected in books, on sheets of paper, on cards, in notebooks
    • Excerpts – collected on sheets of paper, on cards, in notebooks, in commonplace books
    • Citations – recorded on cards, in notebooks
    • Categorization scheme(s) – for filing and arranging notes (on index cards, file dividers, box labels)
    • Storage mechanisms – including files, envelopes, cardboard boxes
    • Collections of sources: personal book collections, journals, off-prints and cuttings, image collections, map collections, microfilms – the ever-expanding office…
    • Sharing: in own publications (especially via footnotes¹), in bibliographies, through personal contact (e.g. letters)
    ¹ Grafton, Anthony, The footnote: a curious history (London: Faber and Faber, 1997)
  • 53.
    The humanities researcher’s data: digital world
    • Annotations and excerpts – in Word documents, Google Docs, citation management databases (EndNote)
    • Citations – citation management databases (EndNote) and Web services (Zotero, Mendeley); downloaded or keyed-in
    • Categorization scheme(s) – tagging, keywords
    • Storage mechanisms – digital storage of various kinds (logical and physical) – hard drives, USBs, data stores
    • Personal digital collections – iPads, Web sites (Omeka, Dropbox, Flickr)
    • Copies of source materials – downloaded or scanned / photographed; stored on various media (or linked to using URLs and DOIs)
    • Sharing – e-mail, social media, Web collaboration services (Google, Confluence)
  • 54.
    Digital methods…
    • Force you to be more overt and explicit about your assumptions, processes, methods, approach – to externalize and reify
    • Help you to visualize the evidence – to explore the relevant material, to navigate complexity
    • Some risks:
      – Rigidity, over-simplification, single perspective
      – Loss of serendipity
      – Mistaking the visualization for the analysis
      – Conflating the organizing of the evidence with explanations and conclusions
  • 55.
    Jean-Claude Gardin on humanities systems
    • ‘Searching in natural language’: fallacious; “representation issues are still with us”
    • Ordering and classification: “rather primitive”
    • Simulation: qualitative variables and values
    • Dynamic system modelling: quantified variables, numerical functions
    • Artificial intelligence and expert systems: logic, inference engines – data + rules of inference – but discursive rather than formal rules of reasoning
    “It is incumbent upon the humanities as a ‘distinct’ science to define its own, alternative ways of reasoning”
    “Our primary concern should be the study of mental processes at work in archaeological reasoning, with a view to making them amenable to machine handling in a Turing sense – that is, with or without computers”
    “The formulation of rules of reasoning is here essential, in whichever form we chose to express them”
    J-C. Gardin, “The Impact of Computer-based Techniques on Research in Archaeology”, in: Scholarship and Technology in the Humanities, ed. May Katzen (London: Bowker-Saur, 1991), pp. 95-110
  • 56.
    A logic of historical thought
    • Frame the question
      – Revise and refine, test and verify
      – Must be resolvable in empirical terms
      – Fictional, counterfactual questions: heuristically useful but unprovable
    • Use evidence to build up an explanation
      – Classificatory concepts, e.g. feudalism
      – Statistical generalizations – inferred by a special form of reasoning
      – Assemble in temporal order as a narrative
      – Causation – influences, factors, elements – by reference to antecedents
      – Individual motivation and group behaviour
      – Inference using analogies: heuristically useful but unprovable
    • Construct an argument (produce a historical interpretation)
      – Proceed from premises to a conclusion through rational inference
      – Define and clarify the terms (e.g., democracy, capitalism, nationalism) – simple, working, consistent definition
    David Hackett Fischer, Historians’ Fallacies: Toward a Logic of Historical Thought (London: Routledge & Kegan Paul, 1971)
  • 57.
    Fischer’s logical steps and the corresponding systems requirements
    • Framing the question
      – Searching and browsing
      – Counterfactuals and “what if?”
    • Using evidence to build up an explanation
      – [Assembling evidence]
      – Organizing evidence – classification
      – Linking within the evidence – causation, analogy
      – Working at scale – from the individual to the general
      – Reasoning – inference
      – Working with time – constructing a narrative
    • Constructing an argument – producing a historical explanation
      – Defining terms
      – Reasoning: premises – inference – conclusion
      – [Distributing the results]
  • 58.
    Dr Toby Burrows
    Marie Curie Fellow
    Department of Digital Humanities
    King’s College London
    26-29 Drury Lane
    London WC2B 5RL
    toby.burrows@kcl.ac.uk
    @tobyburrows
    tobyburrows.wordpress.com
  • 59.
    Questions: Digitization
    1. How much of the material you need for your research has not been digitized?
    2. How should the priorities for future digitization be decided?
    3. How useful to you are services which only contain digital or digitized materials?
    4. Should everything be free?
  • 60.
    Questions: Tools
    1. How do you choose which software to use for your research?
    2. Do you use generic software, or tools specific to medieval studies?
    3. What cloud- or Web-based tools do you use?
    4. What is your university’s attitude to Open Source software?
    5. What sort of support do you get with using software?
  • 61.
    Questions: Computation and representation
    1. What do we mean by “data” in the humanities? Can medieval research be data-centred?
    2. To what extent can research processes in medieval studies be modelled for use in a computational environment?
    3. What’s the point of visualizations?

Editor's Notes

  • #42 I currently have a test data collection in Nodegoat, involving 100 Phillipps manuscripts and about 250 provenance transactions (data from the Schoenberg Database). Here is an example of a manuscript object: its description is in the top half, and its associated sub-objects are summarized in the lower half. This MS has three “Sold” sub-objects, as well as “Owned”, “Produced” and “Contents” (a link to the text it contains). It was produced in 1580 in the UK, then sold three times between 1815 and 1967, and was owned by Yale University in 2010.