Digital Humanities
A brief introduction to the field
Dr Anouk Lang
Department of English
University of Edinburgh
@a_e_lang
Thurs 23 July 2015
6-7.30pm
Queen Mary
University of London
#adpsummer
“To infinity & beyond”: Where we’re headed
Working with data: where are the pitfalls?
- structured vs. unstructured
Overview of the field
- historical background and debates
Sample projects and tools
- textual, spatial and network analysis
Resources
- summer schools, workshops, teach-yourself tutorials, Twitter
@a_e_lang | anouk.lang@ed.ac.uk | #adpsummer
Working with data as a humanities scholar
Ben Schmidt, “Gendered language in teacher reviews”
http://benschmidt.org/profGender
• interdisciplinary
• serious fun
Randall Munroe, “Correlation”, XKCD, http://xkcd.com/552.
http://benschmidt.org/profGender
What are the limitations?
• data collection
• sampling
What is obscured?
• gender of reviewers
• context
• gender of reviewees
• field size
Data:
you’re not just using it but producing it
• Facebook’s “emotional contagion” study
http://www.pnas.org/content/111/24/8788.full.pdf
• Facebook voting study
www.nature.com/nature/journal/v489/n7415/pdf/nature11421.pdf
• Homicide Watch, homicidewatch.org
• And, obviously, NSA/GCHQ/etc.
Data:
structured vs. unstructured
• information that is organised in some way
• vs information that comes without a data model
Rate My Professors: one data model
Data:
structured vs. unstructured
• information that is organised in some way
• vs information that comes without a data model
• Schmidt’s dataset: partially structured but also in need of some curation
• Data from an API, e.g. Twitter data
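A minimal sketch of the structured/unstructured distinction, in Python. The record and its field names are hypothetical and are not the schema of Twitter or any other real API; the point is only that a data model tells you where to look, whereas free text forces you to invent a rule.

```python
import json
import re

# Hypothetical API-style record: the field names are illustrative assumptions,
# not the schema of Twitter or any other real service.
structured = json.loads(
    '{"user": "example_user", "text": "Enjoying the talk #adpsummer", '
    '"hashtags": ["adpsummer"]}'
)

# The same information as free text, with no data model attached.
unstructured = "example_user said they were enjoying the talk #adpsummer"


def hashtags_from_record(record):
    """Structured: the data model already says where the hashtags live."""
    return record["hashtags"]


def hashtags_from_text(text):
    """Unstructured: we impose our own rule for what counts as a hashtag."""
    return re.findall(r"#(\w+)", text)


print(hashtags_from_record(structured))
print(hashtags_from_text(unstructured))
```

The regular expression is itself a small act of curation: a different rule (say, one allowing hyphens) would produce different data from the same text.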
Twitter data: highly structured
Humanities data: often unstructured
Image from Flickr: Jason Weinberger, “Mahler Symphony
5, IV Adagietto [page 15]”, CC BY 2.0 licence.
Jad Abumrad and Robert Krulwich, “Vanishing Words”,
RadioLab, www.radiolab.org/story/91960-vanishing-words/.
Concordancing software: AntConc (Laurence Anthony)
www.laurenceanthony.net/software/antconc/
Query 1: all instances of look as a simple text string
Text marked up with tags denoting parts of speech
Query 2: all instances of look as a noun (look_NN*)
Query 3: all instances of look as a verb (look_VV*)
followed by a preposition (*_II) then sorted 1R, 2R
Query 4: all instances of the lemma look* sorted 1R, 2R
POS-tagging errors: look_NN* != look as a noun
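The four queries can be sketched outside AntConc in a few lines of Python. This is an illustration under stated assumptions, not AntConc's own implementation: the sample text is hypothetical and hand-tagged in word_TAG form with CLAWS-style tags, and the helper names (wildcard_to_regex, concordance, sort_right) are invented for this example.

```python
import re

# Hypothetical sample, hand-tagged in word_TAG form for illustration only.
TAGGED = (
    "She_PPHS1 gave_VVD him_PPHO1 a_AT1 look_NN1 ._. "
    "They_PPHS2 look_VV0 at_II the_AT map_NN1 ._. "
    "He_PPHS1 looked_VVD for_II his_APPGE keys_NN2 ._."
)
tokens = TAGGED.split()


def wildcard_to_regex(term):
    """Turn a search term with '*' wildcards into a whole-token regex."""
    body = ".*".join(re.escape(part) for part in term.split("*"))
    return re.compile("^" + body + "$", re.IGNORECASE)


def concordance(tokens, query, width=3):
    """Return (left context, match, right context) for each hit."""
    patterns = [wildcard_to_regex(term) for term in query.split()]
    span = len(patterns)
    hits = []
    for i in range(len(tokens) - span + 1):
        window = tokens[i:i + span]
        if all(p.match(tok) for p, tok in zip(patterns, window)):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + span:i + span + width])
            hits.append((left, " ".join(window), right))
    return hits


def sort_right(hits):
    """Sort hits by the first, then second, token to the right (1R, 2R)."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2].split()[:2]])


for query in ["look_NN*", "look_VV* *_II", "look*"]:
    print(query, sort_right(concordance(tokens, query)))
```

A query such as look_NN* only finds what the tagger labelled as a noun, so any tagging error flows straight into the results. Likewise look* is a string match rather than true lemmatisation: it would also catch a word like lookout.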
Data:
always contingent, never objective
• Johanna Drucker & the concept of ‘capta’
• what kind of data curation is necessary?
• who else has come up with categories/data models?
• think about how to capture & structure your data early
Overview of the field: Definitional skirmishes
Digital Humanities is a field of study in which scholarly
applications of technology are used to perform analyses and
generate insights that would be difficult or impossible to achieve
without the help of technology.
“digital humanities is more akin to a common methodological
outlook than an investment in any one specific set of texts or
even technologies”. (Matthew Kirschenbaum)
Or you could crowdsource the definition …
http://whatisdigitalhumanities.com/
Historical antecedents: Humanities Computing
Roberto Busa, IBM & the Index Thomisticus
Livia Canestraro, one of the female punchcard operators for the Index Thomisticus.
CC-BY-NC, license by permission of CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan.
Via Melissa Terras, melissaterras.blogspot.co.uk/2013_10_01_archive.html
Disciplinary antecedents
• corpus linguistics, computational linguistics & NLP
• GIS (Geographic Information System / Science)
• within History, Cliometrics
• others …
Readings giving historical background
• Kirschenbaum, Matthew G. ‘What Is Digital Humanities and
What’s It Doing in English Departments?’ ADE Bulletin 150
(2010): 1–7. http://mkirschenbaum.files.wordpress.com/2011/01/kirschenbaum_ade150.pdf.
• Liu, Alan. ‘The Meaning of the Digital Humanities’. PMLA
128.2 (2013): 409–423. http://www.jstor.org/stable/23489068.
• Hockey, Susan. ‘The History of Humanities Computing’. In
Susan Schreibman, Ray Siemens and John Unsworth, eds., A
Companion to Digital Humanities (Oxford: Blackwell, 2004).
http://www.digitalhumanities.org/companion/.
Some broad debates and tensions in the field
• from outside the field: too empiricist, too positivistic, too
uncritical of the use of computers
• from within the field: not sufficiently
statistically/algorithmically literate, use of black boxes
• too apolitical: where are race, gender, & identity?
• too focused on literature
• “you’re not a real digital humanist unless you can code”
• “more hack, less yack”
Examples of projects and tools
Image from In the Forbidden Land: An account of a journey in
Tibet ... With a map and two hundred and fifty illustrations (1898), p.154.
From the British Library’s Flickr collection of images in the public domain
Textual analysis
Mapping
Network analysis
Textual analysis: Topic modelling
0 day lydia dear replied felt cried aunt hear uncle charlotte
1 wickham made till evening added world knew married father visit
2 lady man young catherine brother ladies happiness half friends settled
3 make great give hope thought pleasure present general affection conversation
4 time sister mother love feelings ill speak leave meryton life
5 mr darcy bingley miss collins mind london civility convinced feeling
6 mrs bennet family long gardiner morning town found character coming
7 elizabeth jane letter longbourn happy answer kind left kitty reason
8 good friend house lizzy subject sisters father netherfield told home
9 room manner daughter heard sir moment looked woman immediately began
For more on topic modelling, start at Vol. 2 issue 1, Journal of Digital Humanities:
http://journalofdigitalhumanities.org/2-1/dh-contribution-to-topic-modeling/
Textual analysis: Stylometry
Authorship attribution: “the
science of inferring
characteristics of the author
from the characteristics of
documents written by that
author” (Juola 2006).
Deciphering
The Dynamiter
thedynamiter.llc.ed.ac.uk
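A minimal sketch of the intuition behind authorship attribution: compare how often two texts use a fixed set of common function words. This is a deliberately simplified illustration, not the method used by the Deciphering The Dynamiter project; the word list, the helper names and the mean-absolute-difference measure are all assumptions made for this example (established measures such as Burrows's Delta standardise the frequencies first).

```python
import re
from collections import Counter

# A short, arbitrary list of function words chosen for illustration.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "with"]


def profile(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]


def distance(text_a, text_b):
    """Mean absolute difference between two frequency profiles."""
    profile_a, profile_b = profile(text_a), profile(text_b)
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b)) / len(profile_a)


# Hypothetical snippets standing in for samples by two different writers.
sample_a = "the cat and the dog"
sample_b = "a dog in a fog"
print(distance(sample_a, sample_a), distance(sample_a, sample_b))
```

A smaller distance suggests more similar habits of function-word use. With real texts, much longer samples and a larger word list would be needed before the numbers meant anything.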
Textual analysis: Stylometry
Deciphering
The Dynamiter
thedynamiter.llc.ed.ac.uk
green = Fanny
black = Fanny
black = Robert
orange = Robert
red = authorship uncertain
Textual analysis: Stylometry
Clear differentiation vs. overlap between authors
Using DH in research-led teaching
Spatial analysis: digital maps
Salem Witch Trials (U Virginia)
Mapping Modernist Paris (Lang)
LitLong (Edinburgh U)
Mapping the Lakes (Ian Gregory)
Spatial analysis: digital maps
Franco Moretti, “Network Theory, Plot Analysis”, New Left Review 68 (2011): 81.
Also available as a LitLab pamphlet: see litlab.stanford.edu
Network analysis and visualisations
Moretti, “Network Theory”, 87.
Further resources
• DHOxSS: DH Summer School at Oxford
• Lancaster Summer Schools
• Further afield: DHSI, HILT, DH@Leipzig
• The Programming Historian
(http://programminghistorian.org)
• MOOCs, e.g. IVMOOC, Coursera, FutureLearn
• Training courses at your institution, e.g. ArcGIS
• Teach-yourself tutorials, e.g. Codecademy
• DH Q&A http://digitalhumanities.org/answers/
Matthew Jockers, “Revealing Sentiment and Plot Arcs with the Syuzhet Package”, blog post, Matthew L. Jockers, 2 Feb. 2015.
www.matthewjockers.net/2015/02/02/syuzhet/. Code at https://github.com/mjockers/syuzhet.
Eileen Clancy, “A Fabula of Syuzhet II: Continuing the tale of digital humanities and sentiment analysis”. Storify of tweets from 24 Mar-10 April 2015.
https://storify.com/clancynewyork/a-fabula-of-syuzhet-ii.
until Syuzhet provides filters that don’t cause ringing
artifacts [extra lobes introduced into a graph by an
ideal low-pass filter], it is likely that most foundation
shapes will be inaccurate representations of the
stories’ true plot trajectories. Since the foundation
shape may in places be the opposite of the emotional
trajectory, two foundation shapes may look identical
despite having opposing emotional valences.
Jockers’s claim … may be due more to ringing
artifacts than to an actual similarity between the
emotional structures of the analyzed novels.
Annie Swafford, “Problems with the Syuzhet Package”,
blog post, Anglophile in Academia, 2 March 2015.
annieswafford.wordpress.com/2015/03/02/syuzhet/.
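The ringing that Swafford describes can be reproduced with a generic ideal low-pass filter. The sketch below is not Syuzhet's code: it is a plain discrete Fourier transform in Python, applied to a made-up signal that sits at +1 for its first half and at -1 for its second, and the function names and the cut-off of three components are assumptions chosen for illustration.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def ideal_low_pass(signal, keep):
    """Zero every frequency component above `keep`, then transform back."""
    n = len(signal)
    spectrum = dft(signal)
    # Keep the lowest `keep` components and their mirror-image partners.
    filtered = [
        value if (k <= keep or k >= n - keep) else 0
        for k, value in enumerate(spectrum)
    ]
    return inverse_dft(filtered)


# Made-up "emotional trajectory": steadily positive, then steadily negative.
signal = [1.0] * 32 + [-1.0] * 32
smoothed = ideal_low_pass(signal, keep=3)
print(max(signal), max(smoothed))
print(min(signal), min(smoothed))
```

The smoothed curve rises above 1.0 and dips below -1.0 even though no input value lies outside that range. Those extra lobes are artefacts of the filter rather than features of the data, which is the core of the objection quoted above.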
adapted from Allie Brosh, Hyperbole and a
Half (hyperboleandahalf.blogspot.co.uk)
