Are Digital Literary Studies even possible?

I do want to toss around a question that I have been thinking about
for a long time: Can you have computational text analysis and
literary criticism at the same time?
(Ramsay 2012)

What am I looking for?
● Literary criticism disguised as text analysis
● Text analysis disguised as literary criticism

“The position of Digital Humanities as a discipline is very
peculiar, being at the same time a methodology and a
discipline in its own right, aimed at the creation of theories
and methods, tools and techniques that can be used for
research and inquiry.”
Definition of DH as an academic field

“The position of Statistics as a discipline is very peculiar,
being at the same time a methodology and a discipline in its
own right, aimed at the creation of theories and methods,
tools and techniques that can be used for research and
inquiry.”
(Franco Giusti, Introduzione alla statistica, p. 20)
Definition of DH as an academic field

Big Data and Statistics
“The growing digitization of our textual and literary heritage has convinced many
academics and observers of higher education that we are currently experiencing a
renaissance in the Humanities. Some scholars argue that this mass of data is
profoundly changing the methodological toolbox of a field whose scholarship
is traditionally based on close reading and interpretation of texts. Digitization has
rendered novels, plays, poems and historical texts open to forms of statistical
analysis and visualization methods previously unavailable to these
objects. As a result, this “digital turn” is creating a vivid debate within the
Humanities about the effects that the use of algorithms might have on the
interpretation, understanding and teaching of literature and history.”
(Digital Methods in Research – Textual Heritage and Literary Studies, March 27)

Jockers' Macroanalysis
This emerging field [...] was for a good many
decades not emerging at all [...] Technology has
certainly changed some things about the way
literary scholars go about their work, but until
recently change has been mostly at the level of
simple, even anecdotal, search. The humanities
computing/digital humanities revolution has now
begun, and big data have been a major catalyst.
The questions we may now ask were previously
inconceivable, and to answer these questions
requires a new methodology, a new way of
thinking about our object of study.

History of Statistics: 1600-1700
Girolamo Ghilini (1589-1668)
Ristretto della civile, politica, statistica e militare scienza (1666-68)
William Petty (1623-1687)
Several Essays in Political Arithmetick (1699)
Gottfried Achenwall (1719-1772)
Staatsverfassung der Europäischen Reiche im Grundrisse (1752)

History of Statistics: 1800
● Emergence of Modern Statistics
● Statistics applied to many fields besides
government
● Calculations became increasingly complicated
● Stronger need to build mechanical calculating
machines

The Art of Compiling Statistics
● Automation of the US 1890 census
● Hollerith founded the Tabulating Machine Company (1896), which
merged into the Computing-Tabulating-Recording Company in 1911
and was renamed IBM in 1924
● “Be it known that I, HERMAN HOLLERITH, of New York
city, county, and State, have invented a certain new
and useful Improvement in the Art of Compiling
Statistics; and I do hereby declare the following to be a
full, clear, and exact description of the same, reference
being had to the accompanying drawings, forming a part of
this specification, and to the figures and letters of reference
marked thereon.” (Patent US395782 A: Art of Compiling
Statistics - 1889)

1920s: IBM and Columbia U.
● 1924-26: Columbia University
Statistical Laboratory
● 1928-33: Columbia University
Statistical Bureau
● Served as “Computer Center” for other
academic departments and outside
organizations (Rockefeller and Carnegie
Foundations, Yale, Harvard, Princeton)

New statistical machines with the mental power of 100 skilled
mathematicians in solving even highly complex algebraic problems
were demonstrated yesterday for the first time before a group of
psychologists, educational research workers and statisticians in the
laboratories of the Columbia University Statistical Bureau in
Hamilton Hall. One of the tabulators exhibited can work out and print
the results of as many as twelve difficult problems in just a single
rapid operation. It is designed to handle differences and reckon
powers of numbers up to the tenth, whereas such machines hiterto
[sic] have been able to compute only the second power of numbers.
Richard Warren and Robert M. Mendenhall, research workers at
Columbia and statistical consultants for the Carnegie Foundation for
the Advancement of Teaching, are responsible for most of the
inventions which were first announced at the educator's convention
in Atlantic City last week.
These new machines will be a tremendous boon to research, Dr.
Ben. D. Wood, Director of the Statistical Bureau, said yesterday,
through making statistical procedure more accurate, much faster
and less expensive. With the assistance of the new tabu-
1920: The first Super-computing machine?

Prof. Benjamin Wood
Pioneer in studies on learning technologies:
● an early study (1928) showing that students
taught with films learned more than those
taught with printed materials alone
● a study (1929-1931) showing that using
typewriters encouraged more and higher
quality writing in addition to more
cooperation in the classroom
● Consulting role in developing the first
commercial test scoring machine (the
IBM 805)

1949: Watson meets Busa
Hollerith 1889
● first, preparing a standard or templet
indicating the relative position or order
in which each item or characteristic of
the individual or thing is to be
recorded;
● second, forming according to such a
standard or templet a separate record
for each individual;
● third, actuating a series of circuit
controlling devices, corresponding in
number and position to the standard or
templet
Busa 1951
● Transcription of text, broken down into
phrases, on to separate cards;
● Multiplication of the cards (as many as
there are words on each);
● Indication on each of the resulting
cards the respective entry (lemma);
● Selection and alphabetization of all
cards purely by spelling;
● Typographical composition of the pages
for publishing.
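Busa's five steps can be read as a small procedure. What follows is only a minimal sketch in Python, not a reconstruction of the punched-card workflow: the function name `build_concordance`, the optional `lemma_of` lookup table and the sample sentence are all assumptions made for illustration, and phrases are split naively on punctuation.

```python
import re
from collections import defaultdict

def build_concordance(text, lemma_of=None):
    """Hypothetical helper: follow Busa's steps 1-4 and return sorted entries."""
    lemma_of = lemma_of or {}
    # Step 1: transcribe the text, broken down into phrases (one "card" each).
    phrases = [p.strip() for p in re.split(r"[.;!?]", text) if p.strip()]
    entries = defaultdict(list)
    for phrase in phrases:
        words = re.findall(r"[^\W\d_]+", phrase.lower())
        # Step 2: multiply the card, one copy per word in the phrase.
        for word in words:
            # Step 3: mark each copy with its entry (lemma); fall back to the word form.
            lemma = lemma_of.get(word, word)
            entries[lemma].append(phrase)
    # Step 4: select and alphabetize all cards purely by spelling.
    return sorted(entries.items())

def print_concordance(concordance):
    # Step 5: a plain-text stand-in for typographical composition.
    for lemma, phrases in concordance:
        print(lemma)
        for phrase in phrases:
            print("    " + phrase)

sample = "In principio erat Verbum. Verbum erat apud Deum."
concordance = build_concordance(sample, lemma_of={"erat": "sum"})
print_concordance(concordance)
```

The lemma table is supplied by hand here because lemmatization was a scholarly decision in Busa's project, not something the machine settled on its own.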

1950s: Competing Computers
IBM
“In the late nineteenth century,
many businesses adopted a practice
that organized work using [...] an
ensemble of three to six different
devices […] More relevant is the
‘‘architecture’’ of the entire room—
including the people in it - [...] it
was that room, not the
individual machines, that the
electronic computer eventually
replaced.”
(Ceruzzi: 16)
UNIVAC
“The flow of information through the
UNIVAC reflected Eckert and Mauchly’s
background in physics and engineering.
[…] the flow of instructions and data
in the UNIVAC mirrored the way humans
using mechanical calculators, books of
tables, and pencil and paper performed
scientific calculations […] a scientist or
engineer would not have found anything
unusual in the way a UNIVAC attacked a
problem.”
(Ceruzzi: 15)

Crunching Words before DH
● 1851: Augustus de Morgan
● 1887: T. C. Mendenhall, "The Characteristic Curves of Composition"
● 1888: C. Mascol, "Curves of Pauline and Pseudo-Pauline Style I"
● 1893: L. A. Sherman, Analytics of Literature: A Manual for the Objective
Study of English Prose and Poetry (Boston: Ginn)
● 1898: W. Lutosławski, Principes de stylométrie
● 1935: G.K. Zipf, The psycho-biology of language; an introduction to
dynamic philology (Boston: Houghton Mifflin Company)
● 1944: G. Udny Yule, The Statistical Study of Literary Vocabulary
(Cambridge UP)

The Statistical Study of Literary Vocabulary
These discussions left in my mind a sense of
inadequacy. They did not tell me what I wanted to
know. They dealt with such details as his use of
words and idioms […] mere details, details
certainly quite useful […] but they give no faintest
notion as to what his vocabulary is really like as a
whole […] What I felt I wanted in the first place,
prior to any detail, was some summary, some
picture of the vocabulary as a whole. (p.2)

The Statistical Study of Literary Vocabulary
I decided to confine myself to a single class of
words, viz. nouns. The concordance was worked
through page by page and every noun entered on
a card together with the number of times it was
used. From these cards it was easy to book up a
table, the 'frequency of distribution' to use the
statistical term, showing the number of nouns
used once, twice, thrice [...] (p.4)
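The table Yule describes, the number of nouns used once, twice, thrice and so on, is what would now be called a frequency spectrum. A minimal sketch in Python, assuming the tokens have already been restricted to nouns (Yule did this by hand from a concordance); the function name `frequency_spectrum` and the sample list are illustrative only.

```python
from collections import Counter

def frequency_spectrum(nouns):
    """Map each frequency to the number of distinct nouns occurring that often."""
    # One "card" per noun: how many times it was used.
    uses_per_noun = Counter(nouns)
    # The table: how many nouns were used once, twice, thrice, ...
    spectrum = Counter(uses_per_noun.values())
    return dict(sorted(spectrum.items()))

sample_nouns = ["god", "man", "god", "soul", "man", "god", "grace"]
print(frequency_spectrum(sample_nouns))
```

In this toy list two nouns occur once, one occurs twice and one occurs three times, which gives the "summary of the vocabulary as a whole" in its smallest possible form.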

Busa's project
Like all good projects, this one began with a question: What is the
metaphysics of presence in St. Thomas Aquinas? Combing for praesens
and praesentia, he realized that such words were peripheral, and, however
unfortunately, Saint Thomas's doctrine of presence is linked with the
preposition in!
Inquiring what St. Thomas meant by "presence," the young Roberto Busa
realized that we must also study the way function-words affect
meaning-words. To study the significant phrase "in the presence" he
needed the shades of "in". His dissertation, defended in 1946, was
essentially founded on a handmade Thomistic Concordance, essentially
complete, but with one entry.
He had made 10,000 hand-written cards.
(Thomas N. Winter 1999: 6)

Early DH and IBM
"The use of the latest data-processing tools developed primarily for science and commerce may
prove a significant factor in facilitating future literary and scholarly studies."
(Paul Tasman, 1957)
1964
Literary Data Processing Conference Proceedings, September 9, 10, 11, 1964. Department of
Scientific and Technical Information, International Business Machines Corp., Data Processing
Division: White Plains, N.Y., 1964
1966
First issue of Computers and the Humanities, published by Queens College of CUNY, with the
financial assistance of IBM Corporation and U.S. Steel Foundation. The academic editor was
Prof. Joseph Raben, Department of English, Queens College

Surprise Surprise
● Stylometry is a very popular approach in
Digital Literary studies and Text Analysis today
● The R project for Statistical Computing, a
strongly functional language and environment
to statistically explore data sets, is the most
used language for digital literary studies

Leech-Short, Style in Fiction
[...] literary stylistics has, implicitly or explicitly, the goal of
explaining the relation between language and artistic
function. The motivating questions are not so much what as
why and how. From the linguist’s angle, it is ‘Why does the
author here choose this form of expression?’ From the
literary critic’s viewpoint, it is ‘How is such-and-such an
aesthetic effect achieved through language?’

Louis T. Milic
● A Quantitative Approach to the Style of
Jonathan Swift. Studies in English Literature, v.
23. The Hague: Mouton, 1967.
● Style and Stylistics; an Analytical Bibliography.
New York: Free Press, 1967.
● Stylists on Style; a Handbook with Selections
for Analysis. New York: Scribner, 1969.

Poibeau 2014
[…] computational linguists try to study the mechanisms that make
the comprehension of languages possible. They try to build tools
that show the possibilities and the limits of learning with only the
help of real language data, without dictionaries and similar
resources. They try to understand to what extent we can avoid the
use of dictionaries or of other tools that provide meanings a priori
in order to define meaning exclusively out of a corpus, inferring it
from the way in which words are used in it […] it is clear in fact that
we acquire knowledge about language from what we hear and read.

Influence and Information Cascades
Within the field of observational learning, there exists a theory of
information cascades:
“An informational cascade occurs when it is optimal for an
individual, having observed the actions of those ahead of him, to
follow the behavior of the preceding individual without regard to
his own information” […]
In other words, once a cascade begins, it tends to continue and to
create a situation of mass imitation in which individuals repeatedly
avoid the road less taken. […] At the same time, the theory tells us
that cascades are fragile; the introduction of a disruptive force, a
new signal, can cause the cascade to collapse and move in an
entirely new direction. […] some mutant writer would take some
other road, and a new cascade would follow. As a way of modeling
literary influence and intertextuality at scale, information cascades
provide an attractive theoretical framework.
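The mechanism in the quoted definition can be sketched with the standard textbook simplification of sequential decision-making. This is an assumption-laden illustration, not Jockers' own model: each agent receives a private signal (+1 or -1), sees only the earlier agents' actions, and ignores its own signal once the earlier informative actions lean at least two steps in one direction. The function name `simulate_cascade` is hypothetical.

```python
def simulate_cascade(private_signals):
    """Return each agent's public action (+1 adopt, -1 reject), decided in sequence."""
    actions = []
    revealed = 0  # net balance of private signals that earlier actions gave away
    for signal in private_signals:
        if revealed >= 2:
            action = 1        # cascade: imitate, whatever the private signal says
        elif revealed <= -2:
            action = -1       # cascade in the opposite direction
        else:
            action = signal   # no cascade yet: act on private information
            revealed += signal
        actions.append(action)
    return actions

# Two early adopters are enough to lock in the three dissenting agents that follow.
print(simulate_cascade([1, 1, -1, -1, -1]))
```

The sketch does not model the fragility the passage mentions. That would need a new public signal injected into the sequence, which is left out here.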

Macroanalysis' Genealogy
● in part a response to Franco Moretti’s (Moretti
2000, 56-58) discussion of the need for distant
reading in literary studies
● in part related to text analysis and humanities
computing
● in part indebted to stylometry and the use of
statistics to evaluate and analyze corpora of
texts

Distant Reading
● Close reading as a method for gathering
evidence is flawed, because interpretation is
subjective and biased
● Big data render close reading totally
inappropriate as a method of studying literary
history
● Massive digital-text collections demand a new
type of evidence gathering and meaning
making

Linguistics and Stylistics
In recent years we have seen the emergence of computational
methods, usually using statistics, whose main feature is to be
efficient in working with big data. In a way, being efficient
was more important than being meaningful. It is not
possible to compute thousands or millions of documents in a
few seconds with a deep and meaningful analysis, even if
computers are more and more powerful. Suddenly, the easy
way is counting (forms, words, patterns and collocations,
frequencies etc.)
(Poibeau 2014)
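To make the "easy way" concrete, counting collocations can be reduced to counting adjacent word pairs. A minimal sketch in Python under that assumption (real collocation measures usually add a statistical association score); the function name `count_collocations` and the sample phrase are illustrative.

```python
from collections import Counter

def count_collocations(tokens, top_n=3):
    """Count adjacent word pairs and return the most frequent ones."""
    # Pair every token with its right-hand neighbour.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(top_n)

sample_tokens = "the old man and the old sea".split()
print(count_collocations(sample_tokens, top_n=1))
```

The count is fast and scales to millions of documents, but it records nothing about why "the old" recurs. That gap is the one Poibeau points to between efficiency and meaning.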

Statistics: Why?
● Statistics is a science of the aggregate
(Scienza del collettivo)
● The statistical method is the only one that
allows us to analyse big data

Statistics: Why not?
If you use a statistical method, the individual
items lose their materiality: they become
abstractions that carry only the characteristics
under investigation, erasing all the other features
that are not of interest to the research.

Aravamudan on Moretti
● Moretti's work on the long arc of the novel has
expanded our understanding of its scope and range
● European hegemony is exercised, even if he
encourages a cosmopolitan approach
● Moretti has no time for the critical interpretation of
individual fictions, except as exemplary of very
large trends that can be followed through their
tropological and formal analysis, and this is of a
piece with his grand narrative of intellectual
diffusion with Europe as the core.

Novel: Rise, Diffusion, Resistance
● The rise of the novel (Ian Watt)
● Enlarging the rise of the novel (Moretti &
Jockers)
● Resisting the Rise of the Novel (Aravamudan)

Event in the history of mediation
Enlightenment is not just a philosophical position-taking but an
institutional event in the history of mediation, a time and a
place, as well as a mode of interaction entailing the creation of a
new epistemological infrastructure when new genres and
formats for the presentation of knowledge were explored and new
associational practices developed for the collation of information.
New protocols came about, including the 'postal principle' by
which anyone can address anyone, public credit and copyright,
all of which saturated knowledge production.

Distance Transmission Absence
Or as John Guillory extends this argument, the mediations created
by the Enlightenment entailed an understanding of distance,
transmission and absence as operational between the poles of
communication, whether between individuals, objects of analysis,
or knowledge systems. Taking on this insight, we can propose that
genres are to be understood not just as containers for
information but rather as apparatuses of mediation that
traverse social distance, enable cultural transmission and
make absence productive of new forms and new media.

Consequences
● Put into perspective the use of statistical
computing in literary studies
● Take seriously the meaning of digital computing
● Digital support is not simply another support of the
same thing (text), but a transformation of the
(written) text itself into something else
● Situate the literary system within the media system
(Fiormonte 2003: 31)

Semiotic Computing?
● Connecting the debate on digital representation with semiotics is perhaps
the only possible method that will attack the very core of the digital
production of symbols, highlighting both problems and possibilities
(Fiormonte 2009)
● Instead of R, for literary studies we could use a different programming
paradigm: event-driven (vs. object-oriented) and declarative are the ones I
am WILLING TO TRY to understand now
● P. B. Andersen, A Theory of Computer Semiotics. Semiotic Approaches to
Construction and Assessment of Computer Systems, Cambridge UP, 1997.

Events in the history of mediation
A Companion to Digital Literary Studies is fundamentally a narrative of
what may be called the scene of "new media encounter" — in this
case, between the literary and the digital. The premise is that the
boundary between codex-based literature and digital information
has now been so breached by shared technological, communicational,
and computational protocols that we might best think in terms of an
encounter rather than a border. And "new media" is the concept that
helps organize our understanding of how to negotiate — which is to
say, mediate — the mixed protocols in the encounter zone. (Liu
2008)

Electronic Documents
● Even when we press it into a mould, the electronic
document is and remains a source in motion.
(Fiormonte 2003: 15)
● If reading consists in […] constructing a network of cross-
references within the text, associating it with other data,
integrating words and images within a personal memory
that is continuously being updated, then hypertext
mechanisms represent an objectivation,
exteriorization, and virtualization of the reading
process. (Lévy: 56-57)

Books VS hypertexts
If we define a hypertext as a space of possible
readings, a text would then represent a
particular reading of an hypertext […] Any
public text accessible through the Internet is
now a virtual component in an immense and
ever-expanding hypertext. (Lévy: 58-59)

Texts VS Events
● representation of texts (TEI, XML, object-
oriented)
● representation of events (performance,
readings, event-driven paradigm languages)

Case Study: The Council of Egypt
● Arab manuscript (14th century)
● Vella, Consiglio d'Egitto (18th century)
● Sciascia, Consiglio d'Egitto (20th century)

Corporate Orientalism
Taking the late eighteenth century as a very
roughly defined starting point, Orientalism can be
discussed and analyzed as the corporate
institution for dealing with the Orient – dealing
with it by making statements about it, authorizing
views of it, describing it, by teaching it, settling it,
ruling over it: in short, Orientalism as a Western
style for dominating, restructuring, and having
authority over the Orient
(Edward Said, Orientalism)

Enlightenment Orientalism
[…] imaginative fiction [...] defined European understandings of
cultures that were seemingly foreign but that shared the past in
ways that needed expert explanation. […] This imagination was
experimental, prospective, and antifoundationalist. […] The
experimentation came to an end, however, partly out of generic
exhaustion and partly as a result of a rising nationalist tide […]
Enlightenment Orientalism was very much an imaginative
Orientalism, circulating images of the East that were nine parts
invented and one part referential, but it would be anachronistic to
deem these images ideological, as they did not tend principally
towards domination of the East [...]

Side Projects
● Books are falling apart:
http://futuread.hypotheses.org/
● Leggere, scrivere e far di conto:
http://infouma.hypotheses.org/
● History of Humanities Computing pre-1994:
http://historyofhumanitiescomputing.wikispaces.com
● Bibliography of HC pre-1994:
https://www.zotero.org/groups/252168/

Definition of DH
● The attempt to create intelligent (reading)
machines
and
● to teach people how to be smarter than the
intelligent machines we created
