This presentation discusses CLiC, the corpus tool of the CLiC Dickens Project, created by the University of Birmingham and the University of Nottingham. Corpus tools of this kind are used for research in the digital humanities.
1. A Corpus Tool for
Digital Humanities
Research
CLiC Dickens Project
Corpus Linguistics in Context
2. Digital
Humanities
Digital humanities (DH) is an area of
scholarly activity at the intersection of
computing or digital technologies and
the disciplines of the humanities. It
includes the systematic use of digital
resources in the humanities, as well as the
analysis of their application. (Terras)
DH can be defined as new ways of doing
scholarship that involve collaborative,
transdisciplinary, and computationally
engaged research, teaching, and
publishing. It brings digital tools and
methods to the study of the humanities
with the recognition that the printed
word is no longer the main medium for
knowledge production and
distribution. (Burdick)
3. In his article “What Is Digital Humanities and What's It
Doing in English Departments?”, Matthew G. Kirschenbaum
states:
"The digital humanities, also known as humanities computing, is a field
of study, research, teaching, and invention concerned with the
intersection of computing and the disciplines of the humanities. It is
methodological by nature and interdisciplinary in scope. It involves
investigation, analysis, synthesis and presentation of information in
electronic form. It studies how these media affect the disciplines in
which they are used, and what these disciplines have to contribute to
our knowledge of computing."
By producing and using new applications and techniques, DH makes
new kinds of teaching possible, while at the same time studying and
critiquing how these impact cultural heritage and digital culture.
4. Corpus Linguistics
● Corpus linguistics is the study of a language as that language is expressed in its text
corpus (plural corpora), its body of "real world" text. Corpus linguistics proposes that a
reliable analysis of a language is more feasible with corpora collected in the field—the
natural context ("realia") of that language—with minimal experimental interference.
● The text-corpus method uses the body of texts written in any natural language to derive
the set of abstract rules which govern that language. Those results can be used to
explore the relationships between that subject language and other languages which
have undergone a similar analysis. The first such corpora were manually derived from
source texts, but now that work is automated.
5. Corpus linguistics is an area of linguistics that has become possible with
the arrival of computers. Corpus linguists use electronic copies of texts
as well as linguistic data that is born-digital (blogs, Twitter data, etc.) to
study a language. Corpus linguistics is a good example to show how
research methods develop and enable new perspectives and insights.
This is the same in other disciplines.
Insights from corpus linguistic research have a particular impact on
teaching English as a foreign language. For instance, dictionaries now
tend to focus on the most frequent meanings of the most frequent
words. (Mahlberg et al.)
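The idea of "most frequent words" can be made concrete with a frequency list, one of the basic outputs of corpus tools. The following is a minimal sketch in Python, assuming a very simple tokeniser (lowercase letters and apostrophes only); the function name `word_frequencies` and the sample sentence are illustrative and are not taken from CLiC or from any dictionary project.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter of lowercased word tokens in `text`."""
    # Assumption: a token is any run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Tiny invented sample, not a real corpus.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# List the two most frequent words with their counts.
print(freqs.most_common(2))
```

A real corpus would contain millions of words, but the principle is the same: count every token, then rank by frequency.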
6. Tools
Digital humanities scholars use a range of digital tools
in their research, which can take place in setups as
small as a mobile device or as large as a virtual reality lab.
Environments for "creating, publishing and working with
digital scholarship include everything from personal
equipment to institutes and software to cyberspace."
(Musto and Gardiner)
DiRT (Digital Research Tools Directory) offers a registry of
digital research tools for scholars.
TAPoR (Text Analysis Portal for Research) is a gateway to
text analysis and retrieval tools.
Voyant Tools only requires the user to copy and
paste either a body of text or a URL and then click the
'reveal' button to run the program.
7. ‘CLiC’ – Corpus Linguistics
in Context
The CLiC Dickens project demonstrates
through corpus stylistics how computer-
assisted methods can be used to study
literary texts and lead to new insights
into how readers perceive fictional
characters.
As part of the project we
are developing the web
app CLiC, designed
specifically for the
analysis of literary texts.
9. Picking out the character
1. Go to the CLiC
Concordance tab
2. Select David Copperfield in
the “Search the Corpora” box
and select the
subset “All text”.
3. Enter Dick in the
“Search for terms” box.
Hit Return.
10. The Search results
will appear here
This will give you every occurrence in the novel. At this point, you
might like to scan down the lines of examples to see what strikes you
about the presentation of Mr. Dick in general. Are there any patterns
or oddities that you can see?
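What a concordance search returns can be sketched in a few lines of Python: every occurrence of the search term, shown with some context to its left and right. This is only an illustration of the general keyword-in-context idea, not CLiC's own implementation; the function name `concordance`, the `width` parameter, and the sample sentences (invented, not quoted from the novel) are all assumptions.

```python
import re

def concordance(text, term, width=30):
    """Return (left, node, right) tuples for each whole-word match of `term`."""
    lines = []
    # \b ensures "Dick" does not match inside a longer word such as "Dickens".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(), right))
    return lines

# Invented sample text for illustration only.
sample = ("Mr. Dick shook his head. 'Ask Mr. Dick,' said my aunt. "
          "Dickens wrote the novel.")

for left, node, right in concordance(sample, "Dick"):
    # Right-align the left context so the search term lines up in a column.
    print(f"{left:>30} | {node} | {right}")
```

Lining up the search term in a central column is what makes patterns in the surrounding words easy to scan by eye.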
11. One pattern we noticed was that the verbs
associated with Mr. Dick tend not to have much
agency.
Quite a lot of the examples occur within direct
speech, where other characters are talking about
or addressing Mr. Dick directly. We can remove
these so that we focus on the narratorial
characterisation.
In the “Concordance” column
on the right, change the
instruction in “Only in subsets”
from “All text” to “Non-quotes”.
This will allow you to peruse the
lines in more detail.
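The effect of switching to “Non-quotes” can be approximated with a short Python sketch that drops everything inside quotation marks before searching. CLiC works with subsets of the text that are already marked up, so this is a rough stand-in under stated assumptions: speech is enclosed in curly double quotes, quotes are never nested, and the sample sentence is invented.

```python
import re

# Assumption: direct speech is wrapped in curly double quotes, never nested.
QUOTE = re.compile(r"“[^”]*”")

def non_quote_text(text):
    """Replace each quoted span with a single space, leaving narration only."""
    return QUOTE.sub(" ", text)

# Invented sample text for illustration only.
sample = "“How do you do, Mr. Dick?” said my aunt. Mr. Dick nodded."
narration = non_quote_text(sample)

print(sample.count("Dick"))     # mentions in all text
print(narration.count("Dick"))  # mentions outside quotes
```

Comparing the two counts shows how many mentions come from other characters' speech rather than from the narrator.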
12. Isolating the reporting clause
In the “Concordance” column on the right, change the entry in
“Search for terms” to: said Mr. Dick. Include the full stop after “Mr”.
You can see that around a
third of all narratorial
mentions of Mr. Dick report
his speech. If you look at your
long list, you might also add
in other speech verbs such
as returned, suggested,
rejoined, asked, cried, and so
on. In other words, the
character is allowed to speak
for himself in a significant
proportion of mentions.
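The proportion of mentions that sit in a reporting clause can also be estimated programmatically. The sketch below, in Python, counts how often the name directly follows one of the speech verbs listed above and compares that with the total number of mentions. The function name `reporting_share` and the sample text are invented for illustration; the pattern only catches the order "verb + name" and would miss other constructions.

```python
import re

# Speech verbs taken from the slide; extend the list as needed.
SPEECH_VERBS = ["said", "returned", "suggested", "rejoined", "asked", "cried"]

def reporting_share(text, name="Mr. Dick"):
    """Return (reporting_clause_count, total_mentions) for `name` in `text`."""
    escaped = re.escape(name)
    mentions = len(re.findall(escaped, text))
    verbs = "|".join(SPEECH_VERBS)
    # Match a speech verb immediately followed by the name, e.g. "said Mr. Dick".
    reporting = len(re.findall(rf"\b(?:{verbs}) {escaped}", text))
    return reporting, mentions

# Invented sample text for illustration only.
sample = "'Yes,' said Mr. Dick. Mr. Dick smiled. 'Oh!' cried Mr. Dick."
reporting, mentions = reporting_share(sample)
print(f"{reporting} of {mentions} mentions are reporting clauses")
```

Run over narration only, a count like this gives a quick check on the "around a third" observation without reading every line.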
13. Exploring the narrative comments on the character
10. Start again by going to the CLiC Concordance tab
11. In “Search the corpora”, select “David Copperfield”.
12. In “Only in subsets”, select “Long suspensions”.
13. In “Search for terms”, insert Dick
14. Hit Return.
14. Face and features of the character
15. Start again by going to the CLiC
Concordance tab
16. In “Search the corpora”, select “David
Copperfield”.
17. In “Only in subsets”, select “All text”.
18. In “Search for terms”, insert Dick
19. In “Filter rows”, try out a range of terms
such as head, face, eyes, mouth, or
looked, watched, seemed. Hit Return after
each one.
20. Try other synonyms or alternatives as
they occur to you.
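Filtering rows amounts to keeping only those concordance lines that also contain a second term. A minimal Python sketch of that step follows, assuming each concordance line is a plain string; the function name `filter_rows` and the sample lines are invented and do not reflect how CLiC implements its “Filter rows” box.

```python
import re

def filter_rows(rows, term):
    """Keep rows that contain `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [row for row in rows if pattern.search(row)]

# Invented concordance lines for illustration only.
rows = [
    "Mr. Dick shook his head",
    "Mr. Dick looked at me",
    "the eyes of Mr. Dick",
    "Mr. Dick headed home",
]

# Try each filter term in turn, as in step 19.
for term in ["head", "eyes", "looked"]:
    print(term, len(filter_rows(rows, term)))
```

Whole-word matching is a design choice here: it keeps "head" from matching "headed", which may or may not be what a given analysis wants.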
16. Works Cited
● Burdick, Anne. Digital Humanities. MIT Press, 2012. Accessed 6 October
2022.
● Kirschenbaum, Matthew G. “What Is Digital Humanities and What's It
Doing in English Departments?” University of Victoria, 2010,
https://www.uvic.ca/humanities/english/assets/docs/kirschenbaum.pdf.
● Mahlberg, Michaela, et al. CLiC – Corpus Linguistics in Context: An
Activity Book. 2017.
● Mahlberg, Michaela, et al. CLiC 2.1: Corpus Linguistics in Context.
2020.
● Musto, Ronald G., and Eileen Gardiner. The Digital Humanities: A Primer
for Students and Scholars. Cambridge University Press, 2015, p. 83.
● Terras, Melissa. “Quantifying Digital Humanities.” UCL Centre for Digital
Humanities, 2011,
https://www.ucl.ac.uk/infostudies/melissa-terras/DigitalHumanitiesInfographic.pdf.