2. Aims of this talk
• Introduce the new Digital Humanities Group
• Discuss some ideas for digital humanities at HuC
• Start the conversation
3. Digital Humanities Group
• Our research aims to develop new
language technology methods
for the humanities
• Focus on big ‘textual’ data
• Interdisciplinary
• Inter-institutional (joint research
group of Huygens ING, IISH
and Meertens Institute)
Melvin Wevers
Adina Nerghes
Marieke van Erp
5. What is Digital Humanities
“Digital humanities is work at the intersection of digital
technology and humanities disciplines.”
Johanna Drucker, 2013
“Bringing computational methods to bear on traditional
humanities scholarship.”
Elijah Meeks, 2016
“Digital humanities is a diverse and still emerging field that
encompasses the practice of humanities research in and through
information technology, and the exploration of how the humanities may
evolve through their engagement with technology, media, and
computational methods.”
Digital Humanities Quarterly, 2017
6. WebSci’12
Folgert Karsdorp, Antal van den Bosch
“The Structure and Evolution of Story
Networks.” Royal Society Open
Science 3 (2016): 160071.
Stapel, R. (2016). Reconstruction of Labour Relations in the
North Sea Region in the Late Middle Ages: Spatio-Temporal
Analysis Using Historical GIS, Taxation Sources, and Coin
Finds. In Digital Humanities 2016: Conference Abstracts.
Jagiellonian University & Pedagogical University, Kraków, pp.
366-369.
10. Federalist papers
• In 1787-1788 Alexander Hamilton, James Madison and John
Jay wrote 85 essays supporting the American
constitution under the pen name ‘Publius’
• Until 1962 it was unknown who had written what
• One of the first examples of authorship attribution using
statistics
https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/
11. Federalist papers
• The average sentence length of both Hamilton and Madison was ~35 words
• Hamilton preferred to use ‘while’
• Madison preferred to use ‘whilst’
• But sometimes Hamilton also used ‘whilst’, and vice versa
12. Federalist papers
• Frequency analysis in 1959:
the words were entered
into the IBM 7090, which could
analyse 3000 words per batch
13. Federalist papers
• In the ‘known’ Hamilton documents ‘upon’ occurred 3.24
times per 1000 words (on average). Madison used that
word far less often. Thus, when the word ‘upon’ occurs
more often in an ‘unknown’ document, the chance is
higher that it was written by Hamilton.
• Such comparisons were also made for other words
• According to this analysis, most of the ‘unknown’ documents
were written by James Madison
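The comparison described above can be sketched in a few lines of Python. This is a minimal, single-word illustration and not the statistical method of the original study, which combined many words. The function names are hypothetical, Hamilton's rate of 3.24 comes from the slide, and Madison's rate is a placeholder that stands in for "far less often".

```python
import re
from collections import Counter

# Rates of 'upon' per 1000 words in the 'known' documents.
# Hamilton's value is from the slide; Madison's is an assumed placeholder.
KNOWN_RATES = {"Hamilton": 3.24, "Madison": 0.23}


def rate_per_1000(text, word):
    """Occurrences of `word` per 1000 tokens of `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)


def closer_author(unknown_text, known_rates, word):
    """Return the author whose known rate for `word` is nearest to the
    rate observed in the unknown text."""
    rate = rate_per_1000(unknown_text, word)
    return min(known_rates, key=lambda author: abs(known_rates[author] - rate))
```

Calling `closer_author(text, KNOWN_RATES, "upon")` on an 'unknown' document returns the name whose rate is closest. A real attribution would repeat this for many marker words and weigh the evidence together.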
18. Knowledge modelling DH@HuC
• Possible research directions:
• Historical narratives
• Literary narratives
• Challenges:
• What is a (historical/literary) narrative? (active research field in CS,
e.g. CMN workshops)
• How can we automatically detect narratives?
• Can we detect and model storylines? (with Tommaso Caselli,
RUG?)
19. Trend Analysis
• Trace use of ‘toxic’ metaphors around the financial crisis in newspapers
• Detect metaphors and their context through a semantic network
• Nerghes, A., Hellsten, I., and Groenewegen, P. (2015) A Toxic Crisis:
Metaphorizing the Financial Crisis. International Journal Of
Communication, 9(27).
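One simple way to think about a semantic network is as a weighted graph of words that occur together. The sketch below assumes sentence-level co-occurrence and whitespace tokenisation. Both are illustrative choices and not necessarily those of the cited paper, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often each pair of words appears in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        # Sorting makes every pair appear in one canonical order (a < b).
        words = sorted(set(sentence.lower().split()))
        edges.update(combinations(words, 2))
    return edges


def neighbours(edges, word):
    """Return the words linked to `word`, with their co-occurrence counts."""
    result = Counter()
    for (a, b), weight in edges.items():
        if a == word:
            result[b] += weight
        elif b == word:
            result[a] += weight
    return result
```

The context of a metaphor such as ‘toxic’ can then be inspected with `neighbours(edges, "toxic")`, and comparing those neighbours across time periods gives a rough view of a trend.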
20. Trend Analysis DH@HuC
• Possible research directions:
• Metaphors in Dutch folk songs
• Concepts in Resolutions of Dutch States General
• Challenges:
• Sparser and more varied data
• Language variation through time
22. Visualising
• Can help researchers explore
data
• Discover patterns
• Discover outliers
• Can help communicate our
research
23. Tracing and mapping DH@HuC
• Possible research directions:
• Tracing concept drift in Dutch strikes
• Mapping concepts in folk tales
• Challenges:
• Data sparsity
• Figurative language use
28. Semantic Querying/Cross-linguality DH@HuC
• Possible research directions:
• Linking sources on global trade
• Mapping concepts in letters
• Challenges:
• Archaic language use
• Domain specific concepts
29. Evaluation: Social networks from novels
• Named entity recognition is often used to generate social networks from
novels
• But most work is done on ‘the classics’. How well does it perform on
contemporary novels?
MSc thesis
Niels Dekker
30. Evaluation: Social networks from novels
[Figure: network of extracted character names; node labels include ‘Monsieur Athos’, ‘Porthos Monsieur’, ‘Milady Clarik’, ‘M. d’Artagnan’ and ‘Constance Bonaci...’]
MSc thesis
Niels Dekker
31. Evaluation: Social networks from novels
[Figure: network of extracted character names; node labels include ‘M. Dartagnan’, ‘Monsieur Dartagn...’, ‘Monsieur Athos’, ‘Milady de Winter’ and ‘Bonacieux Consta...’]
After changing d’Artagnan to Dartagnan, the F-score rose from 0.13 to 0.53
MSc thesis
Niels Dekker
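For readers less familiar with the F-score, the sketch below shows how it can be computed from a set of extracted names and a set of gold-standard names. It is a generic illustration, not the evaluation code of the thesis, and the function name and any example names fed into it are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare extracted names with gold-standard names.

    Returns (precision, recall, F1), where F1 is the harmonic mean
    of precision and recall.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A name that is split into fragments, for instance at an apostrophe, counts as a wrong extraction and as a missed character at the same time, which is why a single frequent name can have a large effect on the score.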
32. Replicability and Reproducibility
• Replicate: execute the exact
same experiment with the
same code and data
• Reproduce: achieve the same
results/conclusions with a
different implementation
• Difficult when datasets evolve,
web data is involved, or when
data is proprietary
• Analyses often not well
documented
33. Why replicate/reproduce?
• To better understand each other’s work
• To be able to build on each other’s work
• Quality check
• DHG will aim to make its research as reproducible as possible
• We will aim to document all steps of the process (from data cleaning
to visualisation)
• Will make code and datasets available (where possible)
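One lightweight way to document a processing step is to store a small provenance record next to its output. The sketch below is only one possible approach under stated assumptions: the field names and the `provenance_record` helper are hypothetical and not an existing DHG convention.

```python
import hashlib
import json
import platform


def provenance_record(data: bytes, parameters: dict) -> str:
    """Describe one processing step as a JSON string.

    The checksum identifies the exact input data, and the parameters
    and Python version record how the step was run.
    """
    record = {
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "parameters": parameters,
        "python_version": platform.python_version(),
    }
    return json.dumps(record, indent=2, sort_keys=True)
```

Saving such a record for every step, from data cleaning to visualisation, lets someone else check that they start from the same data and settings before trying to replicate a result.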
34. Sharing is caring
See also: https://www.slideshare.net/Oorlogsbronnen/historicidagen-2017-collectieontsluiting-next-level-de-ijsberg-zichtbaar-maken
37. We need to learn each other’s language
Humanities | Language Technology
Close reading | Deep reading
Coding | Annotating
Programming | Coding
Tool criticism | Evaluation
… | …
38. We need to learn about each other’s research methods
Photo by DAVID ILIFF. License: CC-BY-SA 3.0
Image source: https://atos.net/content/dam/global/images/atos-cartesius-supercomputer-by-bull-copyright-surfsara.jpg
39. We need to look
beyond our own
domain
Image source: https://pxhere.com/en/photo/994857
(It may not always be easy…)
41. Going forward
• What questions would you like to answer with digital methods?
• What awesome datasets/tools do you have?
• How do you like your coffee?
image source: http://www.independent.ie/incoming/article31308951.ece/ALTERNATES/h342/tea.jpg