Finding common ground between text, maps, and tables for quantitative and qualitative research
Invited talk given at 8th AIUCD Conference 2019 – ‘Pedagogy, teaching, and research in the age of Digital Humanities’
http://aiucd2019.uniud.it/
24 January 2019, Udine, Italy
Marieke van Erp
merpeltje
DIGITAL HUMANITIES LAB
This talk
• The Dutch DH Landscape
• CLARIAH
• Use case 1: diachronic & domain-specific
query expansion
• Use case 2: Amsterdam Time
Machine
• How the digital affects the
humanities
• Challenges ahead
Digital Humanities Lab
• Our research develops new
language technology methods
for the humanities
• Focus on big ‘textual’ data
• Interdisciplinary
• Inter-institutional (joint research
group of Huygens ING, IISH
and Meertens Institute)
Melvin Wevers
Adina Nerghes
Marieke van Erp
CLARIAH
• NWO funded:
• CLARIAH CORE: 2015-2018 (M€12.6)
• CLARIAH Plus: 2019-2024 (M€13.8)
• Design, implement and exploit the Dutch part of the European CLARIN and
DARIAH infrastructure
• Focus areas:
• Linguistics (WP3)
• Socio-economic history (WP4)
• Media studies (WP5)
• Content of Text (WP6) (CLARIAH Plus)
• Focus areas are brought together by WP1 (Management & Dissemination)
and WP2 (Infrastructure)
• Developed technology is tested in research pilot projects:
• CLARIAH Pilots:
• Total budget: €700K
• 16 projects funded
• CLARIAH-eScience pilots:
• Total budget: €300K cash + 4.5 FTE in kind
• 4 projects funded
Photos provided by National Library of the Netherlands
Use case 1: Diachronic & Domain-specific query expansion
(in collaboration with Victor de Boer & Rinke Hoekstra)
What is a ‘heikeuter’?
En van de schamelheid zijner plaggen had er de heikeuter nog
eerst den langen weg te gaan tot de burgers van Venlo, eer hij de
winst van zijn arbeid ingeruild zag tegen ’t noodige voor een
schraal bestaan. (Felix Rutten, 1918, Ons mooie Limburg, DBNL)
And because of the poverty of his soil, the heikeuter still had a
long way to go to the burghers of Venlo before he would see
the fruits of his toil traded for the bare necessities of a
meagre existence. (Felix Rutten, 1918, Ons mooie Limburg,
DBNL)
Searching for Historical Occupations
• HISCO: Historical International
Standard Classification of Occupations
• Central set of occupations
(English labels) + labels in
Dutch, Norwegian, German…
• Aligned sources provide even
more labels
• Expressed as SKOS (CEDAR,
WP4)
https://socialhistory.org/nl/projects/hisco-history-work
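To make the query-expansion idea concrete, here is a minimal sketch of how a search term such as ‘heikeuter’ could be expanded with the other labels attached to the same occupation concept. It assumes a SKOS-like structure (one preferred label plus alternative labels per language). The concept identifier, the labels, and the function names are illustrative assumptions; they are not taken from HISCO or from the CLARIAH tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A SKOS-like concept: one preferred label plus alternative labels per language."""
    concept_id: str
    pref_label: str
    alt_labels: Dict[str, List[str]] = field(default_factory=dict)

    def all_labels(self) -> List[str]:
        # Preferred label first, then alternative labels in language-code order.
        labels = [self.pref_label]
        for lang in sorted(self.alt_labels):
            labels.extend(self.alt_labels[lang])
        return labels


def build_index(concepts: List[Concept]) -> Dict[str, List[Concept]]:
    """Map every case-folded label to the concepts that carry it."""
    index: Dict[str, List[Concept]] = {}
    for concept in concepts:
        for label in concept.all_labels():
            index.setdefault(label.casefold(), []).append(concept)
    return index


def expand_query(term: str, index: Dict[str, List[Concept]]) -> List[str]:
    """Return the original term followed by all other labels of matching concepts."""
    expanded = [term]
    seen = {term.casefold()}
    for concept in index.get(term.casefold(), []):
        for label in concept.all_labels():
            if label.casefold() not in seen:
                seen.add(label.casefold())
                expanded.append(label)
    return expanded


# Toy data for illustration only: hypothetical identifier and labels, not real HISCO content.
CONCEPTS = [
    Concept("example:smallholder", "smallholder", {"nl": ["keuterboer", "heikeuter"]}),
]
INDEX = build_index(CONCEPTS)

print(expand_query("heikeuter", INDEX))
```

A term that matches no label comes back unchanged, so the expansion can be applied to every query without special-casing unknown words.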
Amsterdam Time Machine
• WP5: Can we map Amsterdam cinema audiences?
• WP3: Can we reconstruct Amsterdam dialects and
sociolects?
• WP4: Can we measure social mobility?
• Pilot project funded by CLARIAH
• Amsterdam Time Machine consortium is part of a larger EU
consortium
Media studies: Amsterdam Cinema Audiences
• Audiences
• For a particular cinema, film, or screening?
• Three main concepts of ‘audience’ (Christie, 2012)
• Individual spectator
• Imagined audience (“they”, “we”)
• Economic or statistical audience
• This use case: early 20th-century audiences for cinemas in Amsterdam
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Cinema Context
• Main database entities
• Screenings
• Films (linked to IMDb)
• Cinemas
• People
• Companies
• Audiences?
• Mapping cinema data + contextual data
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Cinema locations active between 1907 - 1928
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Cinemas (1907 - 1928) according to seating capacity
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Top Film Genres in Cinemas (1907 - 1928)
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Cinema locations and tram lines (1921)
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Average yearly house rent (1919) per Neighborhood (1909)
Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
Research sources
• Primary and secondary sources about Amsterdam dialect(s)
in the 19th century (e.g. dictionaries, glossaries, historical
descriptions of the city and/or specific neighbourhoods)
• Recordings of dialect speakers born in the late 19th or early
20th century (Nederlandse Dialectenbank, Nederlandse
Liederenbank)
• Results of a survey on the pronunciation and words of the
Amsterdam dialect, conducted in 1877
Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
Dialects
Kattenburgs
Sound /eui/ → /ui/
‘Fast talking’: “mójjók
geskórre wórre”
Haarlemmerdijks
Sound /oi/ → /ui/
“Haarlemmerdijkies
maken”: arguing
Jodenhoeks
Verbal affix -t: “ik gaat”
Common determiner: “de kind”
Typical phrase: “Weet ik veel”
Jordanees
“appies”: potatoes
“Dat neem ik niet”: not at
peace
Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
Sociolects
Apart from neighbourhood-specific dialects there are three sociolects,
spoken throughout the whole city
• 1. Bargoens (the argot of thieves, beggars and tramps) = sociolect
of the lower class (collected by J.G.M. Moormann)
• 2. High class: generally related to the ‘Kalverstraats’ dialect
(associated with the shopping street De Kalverstraat)
• Closest to Standard Dutch
• Often described as ‘posh language’; a touch of French
• Sources: Bible stories and fairy tales translated into the
sociolect of the high class.
• 3. Middle class → less frequently described
• Some sources describe the language of the bourgeoisie as a
language that avoids low class words and sounds
• Jan Stroop (former Meertens linguist) compiled a lexicon of the
middle class based on an electronic dictionary of Dutch (WNT)
Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
Data collection
• We select relevant data from all sources (relevant data
= dialect or sociolect words or features that are
indicated as prominent or salient for nineteenth-century
Amsterdam as a whole, for specific neighbourhood(s), or
for a social class)
• We subsequently store and organise these data in a
large database (currently in Excel/FileMaker format)
• In order to build up our database we have identified ten
categories/variables to structure the data collection
Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
ATM Status
• Puzzle pieces nearly complete
• 29 January: Data sprint
• End of February: wrap up
• Continue Amsterdam and EU
collaborations
How the digital affects the humanities
(and how the humanities affect the digital)
How the digital
affects the humanities
• New ways of looking at data/
research questions/research
methods
• New opportunities for
innovating research
• New types of research
questions
• Miscommunication
• Cultural gap
How the humanities
affect the digital
• New ways of looking at data/
research questions/research
methods
• New opportunities for
innovating our research
• New types of research
questions
• Miscommunication
• Cultural gap
Challenges
• What do we want Digital
Humanities to be?
• Educating the next generation
of Digital Humanities
Researchers
• Bridging the gap between the
digital and the humanities
• Sharing our research better
Summary
• Overview of CLARIAH and
KNAW HuC
• Two use cases focused on
connecting data across
disciplines
• Opportunities & challenges for
Digital Humanities
Communication is key!