Finding common ground between text, maps, and tables for quantitative and qualitative research

Invited talk given at 8th AIUCD Conference 2019 – ‘Pedagogy, teaching, and research in the age of Digital Humanities’

http://aiucd2019.uniud.it/

24 January 2019, Udine, Italy

Published in: Science

  1. 1. Finding common ground between text, maps, and tables for quantitative and qualitative research • Marieke van Erp • merpeltje • Digital Humanities Lab
  2. 2. This talk • The Dutch DH Landscape • CLARIAH • Use case 1: diachronic & domain specific query expansion • Use case 2: Amsterdam Time Machine • How the digital affects the humanities • Challenges ahead
  3. 3. The Dutch Digital Humanities Landscape
  4. 4. Digital Humanities Lab History, Literary Studies, History of Science & Scholarship Social History Dutch Language & Culture https://huc.knaw.nl/
  5. 5. Digital Humanities Lab • Our research develops new language technology methods for the humanities • Focus on big ‘textual’ data • Interdisciplinary • Inter-institutional (joint research group of Huygens ING, IISH and Meertens Institute) • Melvin Wevers, Adina Nerghes, Marieke van Erp
  6. 6. What is Digital Humanities? Humanities Technology Data
  7. 7. What is Digital Humanities? Humanities Technology Data
  8. 8. 4 hours
  9. 9. +
  10. 10. • NWO Funded: • CLARIAH CORE: 2015-2018 (M€12.6) • CLARIAH Plus: 2019-2024 (M€13.8) • Design, implement and exploit the Dutch part of the European CLARIN and DARIAH infrastructure • Focus areas: • Linguistics (WP3) • Socio-economic history (WP4) • Media studies (WP5) • Content of Text (WP6) (CLARIAH Plus)
  11. 11. • Focus areas are brought together by WP1 (Management & Dissemination) and WP2 (Infrastructure) • Developed technology is tested in research pilot projects: • CLARIAH Pilots: • Total budget: €700K • 16 projects funded • CLARIAH-eScience pilots: • Total budget: €300K cash + 4.5 FTE in kind • 4 projects funded
  12. 12. • Focus areas are brought together by WP1 (Management & Dissemination) and WP2 (Infrastructure) • Developed technology is tested in research pilot projects: • CLARIAH Pilots: • Total budget: €700K • 16 projects funded • CLARIAH-eScience pilots: • Total budget: €300K cash + 4.5 FTE in kind • 4 projects funded Photos provided by National Library of the Netherlands
  13. 13. Use case 1: Diachronic & Domain-specific query expansion (in collaboration with Victor de Boer & Rinke Hoekstra)
  14. 14. What is a ‘heikeuter’? En van de schamelheid zijner plaggen had er de heikeuter nog eerst den langen weg te gaan tot de burgers van Venlo, eer hij de winst van zijn arbeid ingeruild zag tegen ’t noodige voor een schraal bestaan. (Felix Rutten, 1918, Ons mooie Limburg, DBNL) And because of the poverty of his soil, the heikeuter was still a long way away from the burghers of Venlo, before he would see the benefits of his toil traded in against the bare necessities for a meagre existence. (Felix Rutten, 1918, Ons mooie Limburg, DBNL)
  15. 15. Searching for Historical Occupations • Historical international classification of occupations. • Central set of occupations (English labels) + labels in Dutch, Norwegian, German… • Aligned sources provide even more labels • Expressed as SKOS (CEDAR, WP4) https://socialhistory.org/nl/projects/hisco-history-work
  16. 16. WP3: Linguistics WP4: Socio-economic history WP5: Media Studies
  17. 17. Alignment with GTAA using CultuurLink (WP5) • 153 mappings http://gtaa.beeldengeluid.nl
  18. 18. WP3: Linguistics WP4: Socio-economic history WP5: Media Studies
  19. 19. Alignment with Brouwers through Lemon (WP3)
  20. 20. WP3: Linguistics WP4: Socio-economic history WP5: Media Studies
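The query expansion idea in use case 1 (a vocabulary of occupations whose concepts carry labels in several languages and are aligned with other vocabularies) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the CLARIAH implementation: the `Concept` class, the `ex:` URIs, and all labels and alignments in `CONCEPTS` are invented toy data and are not taken from HISCO, GTAA, or Brouwers.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-style concept: a preferred label, alternative labels per
    language, and links to aligned concepts in other vocabularies."""
    uri: str
    pref_label: str
    alt_labels: dict = field(default_factory=dict)     # language -> list of labels
    exact_matches: list = field(default_factory=list)  # URIs of aligned concepts


# Hypothetical toy data, invented for illustration only.
CONCEPTS = [
    Concept("ex:occupation/1", "smallholder",
            {"nl": ["heikeuter", "keuterboer"]},
            ["ex:thesaurus/1"]),
    Concept("ex:thesaurus/1", "boer"),
]


def build_label_index(concepts):
    """Map every lower-cased label to the URIs of the concepts that carry it."""
    index = {}
    for concept in concepts:
        labels = [concept.pref_label]
        for language_labels in concept.alt_labels.values():
            labels.extend(language_labels)
        for label in labels:
            index.setdefault(label.lower(), set()).add(concept.uri)
    return index


def expand_query(term, concepts):
    """Return the term plus all labels of matching and aligned concepts."""
    by_uri = {c.uri: c for c in concepts}
    index = build_label_index(concepts)
    expanded = {term.lower()}
    seen = set()
    queue = list(index.get(term.lower(), ()))
    while queue:
        uri = queue.pop()
        if uri in seen or uri not in by_uri:
            continue
        seen.add(uri)
        concept = by_uri[uri]
        expanded.add(concept.pref_label.lower())
        for language_labels in concept.alt_labels.values():
            expanded.update(label.lower() for label in language_labels)
        # Follow alignments so labels from other vocabularies are included.
        queue.extend(concept.exact_matches)
    return sorted(expanded)


print(expand_query("heikeuter", CONCEPTS))
```

A search for the historical term would then also retrieve documents that use any of the aligned labels. A term that is not in the index is returned unchanged, so the expansion never removes the original query.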
  21. 21. Use case 2: Amsterdam Time Machine
  22. 22. Amsterdam Time Machine • WP5: Can we map Amsterdam cinema audiences? • WP3: Can we reconstruct Amsterdam dialects and sociolects? • WP4: Can we measure social mobility? • Pilot project funded by CLARIAH • Amsterdam Time Machine consortium part of larger EU consortium
  23. 23. Media studies: Amsterdam Cinema Audiences • Audiences • For a particular cinema, film, or screening? • Three main concepts of ‘audience’ (Christie, 2012) • Individual spectator • Imagined audience (“they”, “we”) • Economic or statistical audience • This use case: early 20th-century audiences for cinemas in Amsterdam • Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  24. 24. Cinema Context • Main database entities • Screenings • Films (linked to IMDb) • Cinemas • People • Companies • Audiences? • Mapping cinema data + contextual data • Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  25. 25. Cinema locations active between 1907 - 1928 Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  26. 26. Cinemas (1907 - 1928) according to seating capacity Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  27. 27. Top Film Genres in Cinemas (1907 - 1928) Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  28. 28. Cinema locations and tram lines (1921) Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  29. 29. Average yearly house rent (1919) per Neighborhood (1909) Based on slide by: Vincent Baptist, Julia Noordegraaf & Thunnis van Oort
  30. 30. Linguistics: 19 Neighbourhoods, 19 dialects?
  31. 31. Research sources • Primary and secondary sources about Amsterdam dialect(s) in the 19th century (e.g. dictionaries, glossaries, historical descriptions of the city and/or specific neighbourhoods) • Recordings of dialect speakers born in the late 19th or early 20th century (Nederlandse Dialectenbank, Nederlandse Liederenbank) • Results of a survey on the pronunciation and words of the Amsterdam dialect, conducted in 1877 Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
  32. 32. Dialects • Kattenburgs: sound /eui/ → /ui/; ‘fast talking’: “mójjók geskórre wórre” • Haarlemmerdijks: sound /oi/ → /ui/; “Haarlemmerdijkies maken”: arguing • Jodenhoeks: verbal affix -t: “ik gaat”; common determiner: “de kind”; typical phrase: “Weet ik veel” • Jordanees: “appies”: potatoes; “Dat neem ik niet”: not at peace Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
  33. 33. Sociolects Apart from neighbourhood-specific dialects there are three sociolects, spoken throughout the whole city • 1. Bargoens (the argot of thieves, beggars and tramps) = sociolect of the lower class (collected by J.G.M. Moormann) • 2. High class: generally related to the ‘Kalverstraats’ dialect (associated with the shopping street De Kalverstraat) • Closest to Standard Dutch • Often described as ‘posh language’; a touch of French • Sources: Bible stories and fairy tales translated into the sociolect of the high class. • 3. Middle class: less frequently described • Some sources describe the language of the bourgeoisie as a language that avoids low class words and sounds • Jan Stroop (former Meertens linguist) compiled a lexicon of the middle class based on an electronic dictionary of Dutch (WNT) Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
  34. 34. Data collection • We select relevant data from all sources (relevant data = dialect or sociolect words or features that are indicated as prominent or salient for nineteenth-century Amsterdam as a whole, (a) specific neighbourhood(s), or a social class) • We subsequently store and organise these data in a large database (currently in Excel/FileMaker format) • In order to build up our database we have identified ten categories/variables to structure the data collection Based on slide by: Kristel Doreleijers, Nicoline van der Sijs & Marieke van Erp
  35. 35. Socio-economic History: The Amsterdam Elite Slide credit: Ivo Zandhuis
  36. 36. Socio-economic History: The Amsterdam Elite Slide credit: Ivo Zandhuis
  37. 37. Socio-economic History: The Amsterdam Elite Slide credit: Ivo Zandhuis
  38. 38. [Diagram: lexical data model. Labels shown: ontolex:LexicalEntry, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalConcept, ontolex:canonicalForm, ontolex:otherForm, ontolex:sense, ontolex:isSenseOf, ontolex:definition, ontolex:usage, ontolex:reference, lemon:SenseDefinition, lemon-cltl:Usage, lemon-cltl:SpatioTemporalScope, lemon-cltl:scope, lemon-cltl:periodStart, lemon-cltl:periodEnd, lemon-cltl:geographicArea, lexinfo:Register, lexinfo:register, olia:hasTag, penn:Tag, rdfs:label, dct:subject, dbo:Thing, adl:wijk, skos:concept, skos:pr, skos:re, xsd:string, xsd:date]
  39. 39. Core map Amsterdam (1909) Slide credit: Mark Raat
  40. 40. ATM Status • Puzzle pieces nearly complete • 29 January: Data sprint • End of February: wrap up • Continue Amsterdam and EU collaborations
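The lexical data model shown in this use case attaches a spatio-temporal scope (a geographic area plus a start and end date) and a register to each usage of a word. A rough sketch of how such scoped entries could be filtered by neighbourhood and date is given below. It is an illustration under stated assumptions rather than the project's data model: the class and field names only loosely mirror the labels in the diagram, the two example words come from the Dialects slide, and the period dates are placeholders chosen to cover the nineteenth century.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Where and when a usage is attested (loosely: geographicArea,
    periodStart, periodEnd in the diagram)."""
    area: str
    period_start: date
    period_end: date


@dataclass(frozen=True)
class Usage:
    """One attested usage of a written form, with its definition and scope."""
    written_form: str
    definition: str
    scope: Scope
    register: Optional[str] = None


# Placeholder period: the dates are an assumption, not values from the project.
NINETEENTH_CENTURY = (date(1800, 1, 1), date(1899, 12, 31))

USAGES = [
    Usage("appies", "potatoes", Scope("Jordaan", *NINETEENTH_CENTURY)),
    Usage("Haarlemmerdijkies maken", "arguing",
          Scope("Haarlemmerdijk", *NINETEENTH_CENTURY)),
]


def usages_in(usages, area, on):
    """Return the usages attested for the given area on the given date."""
    return [
        u for u in usages
        if u.scope.area == area
        and u.scope.period_start <= on <= u.scope.period_end
    ]


for usage in usages_in(USAGES, "Jordaan", date(1877, 1, 1)):
    print(usage.written_form, "=", usage.definition)
```

Keeping the scope as a separate object means the same word can carry several usages with different areas or periods, which is what makes comparison across neighbourhoods and over time possible.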
  41. 41. How the digital affects the humanities (and how the humanities affect the digital)
  42. 42. How the digital affects the humanities • New ways of looking at data/ research questions/research methods • New opportunities for innovating research • New types of research questions • Miscommunication • Cultural gap
  43. 43. How the humanities affect the digital • New ways of looking at data/ research questions/research methods • New opportunities for innovating our research • New types of research questions • Miscommunication • Cultural gap
  44. 44. Challenges ahead
  45. 45. Challenges • What do we want Digital Humanities to be? • Educating the next generation of Digital Humanities Researchers • Bridging the gap between the digital and the humanities • Sharing our research better
  46. 46. Summary • Overview of CLARIAH and KNAW HuC • 2 use cases focused on connecting data across disciplines • Opportunities & challenges for Digital Humanities • Communication is key!
  47. 47. merpeltje marieke.van.erp@dh.huc.knaw.nl