2010 Digital Humanities London - Dutch Republic of Letters

Letters, Ideas and Information Technology @ 1650
scholarly communication @ 2050

Using digital corpora of letters to
disclose the circulation of
knowledge in the 17th century

Erik-Jan Bos, Univ. Utrecht,
erik-jan.bos@phil.uu.nl
Charles van den Heuvel, VKS,
charles.vandenheuvel@vks.knaw.nl
Dirk Roorda (that’s me), DANS,
dirk.roorda@dans.knaw.nl

[Figure: diagram of relations between disciplines around STEVIN; node labels: Nota, Beeckman, Cats, Huygens, Langeren; relation types: direct - water, indirect - literature]

Corpora of
17th century scholars
• Constantijn Huygens
• Christiaan Huygens
• Grotius
• Descartes
• Swammerdam
• Leeuwenhoek
• Barlaeus
• Spinoza
• and more?

Corpus               Number of letters  In possession?  Format        Metadata                 Normalized?
Grotius              7946               Yes             TEI           In Interp element        Yes, DBNL codes
Van Leeuwenhoek      337                Yes             TEI           In Interp element        Yes, DBNL codes
Descartes            750                Yes             XML (no TEI)  other markup             No, plain text
Barlaeus             1200               300 ready       Word          unknown                  unknown
Swammerdam           80                 Yes             Word          unknown                  unknown
Constantijn Huygens  7295               Yes             xml           Probably Interp element  DBNL codes
Christiaan Huygens   2900?              Mid-2010        probably TEI  Probably Interp element  DBNL codes
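
The "Interp element" entries in the table above suggest that letter metadata is stored in TEI interp elements. A minimal sketch of how such metadata might be read is given here. The XML fragment, the attribute names (type, value) and the field names are assumptions made for illustration; the actual encoding of the corpora may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment: structure and values are invented for illustration.
SAMPLE = """<TEI.2>
  <text>
    <body>
      <div type="letter">
        <interpGrp>
          <interp type="sender" value="Grotius"/>
          <interp type="recipient" value="N.N."/>
          <interp type="date" value="1640-01-01"/>
          <interp type="language" value="la"/>
        </interpGrp>
        <p>...</p>
      </div>
    </body>
  </text>
</TEI.2>"""


def letter_metadata(xml_text):
    """Return one dict of interp type -> value per letter division."""
    root = ET.fromstring(xml_text)
    letters = []
    for div in root.iter("div"):
        if div.get("type") != "letter":
            continue
        # Collect every interp element below this letter into a flat record.
        letters.append({i.get("type"): i.get("value") for i in div.iter("interp")})
    return letters


records = letter_metadata(SAMPLE)
```

Records of this shape could then be compared across corpora, whatever their original format.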

CEN - Metadata

Catalogus Epistularum Neerlandicarum
265,000 descriptions of approximately
1,000,000 letters
from 1600 to the present, of which
100,000 letters date from the 17th century

Research Questions
• History of science:
• How did knowledge circulate in the 17th-
century Dutch Republic?
• Patterns in knowledge growth:
• How can we visualise sets of letters that
exhibit features of knowledge circulation?
• Re-use:
• How can we expose the sources, annotations,
and resulting patterns to further research?

Challenge

Traditional scholarship (East)
• interpretation
• close reading
• solving puzzles

Computational methods (West)
• dealing with patterns
• gleaned from large quantities of texts
• by automatic tools

East is east and
West is west and ...

Issues to deal with

• making the sources uniformly available
• well coded in TEI, access rights
• overcoming the language barrier
• (17th-century varieties of French, Latin, Dutch)
• named entity recognition & concepts
• people, places, dates, concepts, instruments
• mixture of interpretation and algorithms
• creating useful visualisations
• aiding exploration by historians of science

ICT in Humanities Research
• collaboratory
• e-Laborate as starting point
• algorithmic pipelines
• from source material to visualisation
• infrastructure
• archiving results
• re-using data
• developing new algorithms
• disseminating the methodology

pipelines (current)
• language detection, using
"Language Identification from Text Using N-gram Based
Cumulative Frequency Addition"
(Bashir Ahmed, Sung-Hyuk Cha, and Charles Tappert, 2004)
• results
[Chart: detected languages: Latin, Dutch, French, German]
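
The idea behind n-gram based cumulative frequency addition can be sketched as follows: each language gets a table of relative n-gram frequencies, and a text is assigned to the language whose table yields the highest sum over the n-grams of that text. This is a simplified sketch, not the project's pipeline; the training sentences are invented toy data and the function names are hypothetical.

```python
from collections import Counter

# Invented toy training data; a real profile would be built from large corpora.
SAMPLES = {
    "latin": "gratias tibi ago pro litteris tuis quas heri accepi et libenter legi "
             "de rebus naturae quae in orbe sunt",
    "dutch": "ik heb uw brief van gisteren ontvangen en met veel genoegen gelezen "
             "over de dingen van de natuur in de wereld",
    "french": "j'ai reçu votre lettre hier et je l'ai lue avec beaucoup de plaisir "
              "sur les choses de la nature dans le monde",
    "german": "ich habe ihren brief gestern erhalten und mit viel vergnügen gelesen "
              "über die dinge der natur in der welt",
}


def ngrams(text, n=3):
    """Character n-grams of the lower-cased text, padded with one space on each side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Per language: n-gram -> relative frequency within that language's sample."""
    profiles = {}
    for lang, text in samples.items():
        counts = Counter(ngrams(text, n))
        total = sum(counts.values())
        profiles[lang] = {gram: count / total for gram, count in counts.items()}
    return profiles


def identify(text, profiles, n=3):
    """Sum the relative frequencies of the text's n-grams per language; highest sum wins."""
    scores = {
        lang: sum(profile.get(gram, 0.0) for gram in ngrams(text, n))
        for lang, profile in profiles.items()
    }
    return max(scores, key=scores.get), scores


PROFILES = train(SAMPLES)
best, scores = identify("uw brief heb ik met genoegen gelezen", PROFILES)
```

With such small samples the scores are only indicative; longer training texts and mixed-language letters would need more care.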

pipelines (current)
• spelling normalisation
• VARD (http://www.comp.lancs.ac.uk/~barona/vard2/)
• with help from (http://www.dicollecte.org/home.php?prj=fr)
• results
• French: VARD works (after improvements),
although designed for historical English
• Dutch: still on the lookout for a combination of
resources, tools, and dexterity
• Latin: later
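
Spelling normalisation of this kind can be illustrated with a small sketch that combines a list of known variants with letter rewrite rules checked against a modern lexicon. This is not VARD's interface or algorithm; the lexicon, the variant list, the rules and the function names are all assumptions made for illustration.

```python
# Hypothetical resources for historical French; real ones would be far larger.
MODERN_LEXICON = {"savoir", "était", "lettre", "temps"}
KNOWN_VARIANTS = {"sçavoir": "savoir", "tems": "temps"}
REWRITE_RULES = [("est", "ét"), ("oi", "ai"), ("sç", "s")]


def candidates(word):
    """All forms reachable by applying any subset of the rules, in rule order."""
    forms = {word}
    for old, new in REWRITE_RULES:
        forms |= {form.replace(old, new) for form in forms}
    return forms


def normalise(word):
    """Return a modern spelling if one is found, otherwise the word unchanged."""
    lower = word.lower()
    if lower in MODERN_LEXICON:
        return lower
    if lower in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[lower]
    # Accept a rewritten form only if the modern lexicon confirms it.
    hits = sorted(candidates(lower) & MODERN_LEXICON)
    return hits[0] if hits else lower


normalised = [normalise(w) for w in ["Estoit", "sçavoir", "tems", "lettre", "xyz"]]
```

Checking candidates against a lexicon keeps the rules from rewriting words that are already modern or simply unknown.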

pipelines (current)
• named entity recognition
• known tools get 70%
• search for optimal tools in the next stage

pipelines (insights)
• expect the most from statistical methods
• language technology may boost results
• it remains to be seen by how much

Topic-Author-Time
[Figure: topic-author-time visualisation. Source: Scott Weingart, UIA]

the project’s legacy
• more than publications
• curated sources, annotations, visualisations
• more than algorithms
• a framework for analysis of historical texts
• more than a piece of historical research
• data and (intermediate) results worthwhile to
• linguists, computer scientists, sociologists
• more than a passive dataset
• extensible, dynamic, interactive

preserving the results
• part of the CLARIN infrastructure
• http://www.clarin.eu/
• http://www.clarin.nl/
• materials in a Trusted Digital Repository
(DANS)
• http://easy.dans.knaw.nl/dms

working with CLARIN
• CLARIN-EU
• Outreach to humanities: use cases
• CKCC one of 10 selected projects
• received expert input for choice of language
tools
• CLARIN-NL
• CKCC one of 10 initial projects in the Dutch
national construction effort
• support for applying language technology

Adapting to CLARIN
• Conforming to standards
• CLARIN standards are still evolving
• (and will remain evolvable)
• Common MetaData Infrastructure
• a registry of metadata components
• defined by the community
• with explicit semantics (http://www.isocat.org/ )
• Data in TEI (as export/import format)

Trusted Digital Repository
• materials
• reliable (provenance metadata)
• findable (CMDI metadata)
• referable (persistent identifiers)
• accessible (viewable in web browser)
• usable (downloadable)
• sooner or later:
• high-performance computing
• Memento: a time-sensitive web interface to the
dynamic contents of the collaboratory
(http://arxiv.org/abs/0911.1112 )
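
Memento works by datetime negotiation: a client asks for a resource as it existed at a given moment by sending an Accept-Datetime header. The sketch below only builds such a request and does not send it. Whether the collaboratory server would answer Memento requests is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_headers(when):
    """HTTP headers asking for the state of a resource at the given UTC datetime."""
    # Accept-Datetime takes an HTTP-date, e.g. "Thu, 01 Jul 2010 00:00:00 GMT".
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}


def build_request(url, when):
    """Prepare, but do not send, a datetime-negotiation request."""
    return Request(url, headers=memento_headers(when))


# Assumption: the collaboratory URL from the slides, used purely as an example target.
when = datetime(2010, 7, 1, tzinfo=timezone.utc)
request = build_request("http://ckcc.huygens.knaw.nl/", when)
headers = memento_headers(when)
```

A server that supports the protocol would respond with, or redirect to, the version of the resource closest to the requested time.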

http://www.clarin.eu/node/3073


http://ckcc.huygens.knaw.nl/
