Are Digital Literary Studies even possible?


My presentation at University College Cork, for the postgraduate seminar, 2 April 2014

Published in: Education


  1. 1. Are Digital Literary Studies even possible? I do want to toss around a question that I have been thinking about for a long time: Can you have computational text analysis and literary criticism at the same time? (Ramsay 2012)
  2. 2. What am I looking for? ● Literary criticism disguised as text analysis ● Text analysis disguised as literary criticism
  3. 3. Moretti & Jockers
  4. 4. “The position of Digital Humanities as a discipline is very peculiar, being at the same time a methodology and a discipline in its own right, aimed at the creation of theories and methods, tools and techniques that can be used for research and inquiry.” Definition of DH as an academic field
  5. 5. “The position of Statistics as a discipline is very peculiar, being at the same time a methodology and a discipline in its own right, aimed at the creation of theories and methods, tools and techniques that can be used for research and inquiry.” (Franco Giusti, Introduzione alla statistica, p. 20) Definition of DH as an academic field
  6. 6. Big Data and Statistics “The growing digitization of our textual and literary heritage has convinced many academics and observers of higher education that we are currently experiencing a renaissance in the Humanities. Some scholars argue that this mass of data is profoundly changing the methodological toolbox of a field whose scholarship is traditionally based on close reading and interpretation of texts. Digitization has rendered novels, plays, poems and historical texts open to forms of statistical analysis and visualization methods previously unavailable to these objects. As a result, this “digital turn” is creating a vivid debate within the Humanities about the effects that the use of algorithms might have on the interpretation, understanding and teaching of literature and history.” (Digital Methods in Research – Textual Heritage and Literary Studies, March 27)
  7. 7. Jockers' Macroanalysis This emerging field [...] was for a good many decades not emerging at all [...] Technology has certainly changed some things about the way literary scholars go about their work, but until recently change has been mostly at the level of simple, even anecdotal, search. The humanities computing/digital humanities revolution has now begun, and big data have been a major catalyst. The questions we may now ask were previously inconceivable, and to answer these questions requires a new methodology, a new way of thinking about our object of study.
  8. 8. History of Statistics: 1600-1700 Girolamo Ghilini (1589-1668) Ristretto della civile, politica, statistica e militare scienza (1666-68) William Petty (1623-1687) Several Essays in Political Arithmetick (1699) Gottfried Achenwall (1719-1772) Staatsverfassung der Europäischen Reiche im Grundrisse (1752)
  9. 9. History of Statistics: 1800 ● Emergence of Modern Statistics ● Statistics applied to many fields besides government ● Calculations became increasingly complicated ● Stronger need to build mechanical calculating machines
  10. 10. The Art of Compiling Statistics ● Automation of the US 1890 census ● Hollerith founded the Tabulating Machine Company, which merged into the Computing-Tabulating-Recording Company in 1911 and was renamed IBM in 1924 ● “Be it known that I, HERMAN HOLLERITH, of New York city, county, and State, have invented a certain new and useful Improvement in the Art of Compiling Statistics; and I do hereby declare the following to be a full, clear, and exact description of the same, reference being had to the accompanying drawings, forming a part of this specification, and to the figures and letters of reference marked thereon.” (Patent US395782 A: Art of Compiling Statistics - 1889)
  11. 11. 1920s: IBM and Columbia U. ● 1924-26: Columbia University Statistical Laboratory ● 1928-33: Columbia University Statistical Bureau ● Served as “Computer Center” for other academic departments and outside organizations (Rockefeller and Carnegie Foundations, Yale, Harvard, Princeton)
  12. 12. New statistical machines with the mental power of 100 skilled mathematicians in solving even highly complex algebraic problems were demonstrated yesterday for the first time before a group of psychologists, educational research workers and statisticians in the laboratories of the Columbia University Statistical Bureau in Hamilton Hall. One of the tabulators exhibited can work out and print the results of as many as twelve difficult problems in just a single rapid operation. It is designed to handle differences and reckon powers of numbers up to the tenth, whereas such machines hiterto [sic] have been able to compute only the second power of numbers. Richard Warren and Robert M. Mendenhall, research workers at Columbia and statistical consultants for the Carnegie Foundation for the Advancement of Teaching, are responsible for most of the inventions which were first announced at the educator's convention in Atlantic City last week. These new machines will be a tremendous boon to research, Dr. Ben. D. Wood, Director of the Statistical Bureau, said yesterday, through making statistical procedure more accurate, much faster and less expensive. With the assistance of the new tabu- 1920: The first Super-computing machine?
  13. 13. Prof. Benjamin Wood Pioneer in studies on learning technologies: ● an early study (1928) showing that students taught with films learned more than those taught with printed materials alone ● a study (1929-1931) showing that using typewriters encouraged more and higher quality writing in addition to more cooperation in the classroom ● Consulting role in developing the first commercial test scoring machine (the IBM 805)
  14. 14. 1949: Watson meets Busa Hollerith 1889 ● first, preparing a standard or templet indicating the relative position or order in which each item or characteristic of the individual or thing is to be recorded; ● second, forming according to such a standard or templet a separate record for each individual ● third, actuating a series of circuit controlling devices, corresponding in number and position to the standard or templet Busa 1951 ● Transcription of text, broken down into phrases, on to separate cards; ● Multiplication of the cards (as many as there are words on each); ● Indication on each of the resulting cards the respective entry (lemma); ● Selection and alphabetization of all cards purely by spelling; ● Typographical composition of the pages for publishing.
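The five Busa steps listed in the slide above can be read as a small algorithm. What follows is only a minimal sketch in Python, not a reconstruction of Busa's punched-card workflow: the function names `build_concordance` and `compose`, the lemma table, and the two sample phrases are all assumptions made for illustration.

```python
import re


def build_concordance(phrases, lemma_of=None):
    """Return (entry, word, phrase) cards sorted by entry, then by spelling."""
    lemma_of = lemma_of or {}  # hypothetical hand-made lemma table
    cards = []
    for phrase in phrases:  # step 1: one "card" per transcribed phrase
        # step 2: multiply the card, one copy per word on it
        for word in re.findall(r"\w+", phrase.lower()):
            # step 3: mark each copy with its entry (lemma); fall back to the word itself
            entry = lemma_of.get(word, word)
            cards.append((entry, word, phrase))
    # step 4: alphabetize purely by spelling (entry first, then word form)
    cards.sort(key=lambda card: (card[0], card[1]))
    return cards


def compose(cards):
    """Step 5: lay the sorted cards out as plain-text concordance lines."""
    return "\n".join(f"{entry:<12}{phrase}" for entry, _, phrase in cards)


# Illustrative sample phrases and lemma table (assumptions, not Busa's data).
phrases = ["Deus est in omnibus rebus", "in praesentia Dei"]
lemma_of = {"dei": "deus", "omnibus": "omnis", "rebus": "res"}
cards = build_concordance(phrases, lemma_of)
print(compose(cards))
```

Note that the function word "in" gets its own entry with every phrase that contains it, which is the kind of listing Busa's question about presence required.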
  15. 15. 1950s: Competing Computers IBM “In the late nineteenth century, many businesses adopted a practice that organized work using [...] an ensemble of three to six different devices [...] More relevant is the ‘architecture’ of the entire room— including the people in it - [...] it was that room, not the individual machines, that the electronic computer eventually replaced.” (Ceruzzi: 16) UNIVAC “The flow of information through the UNIVAC reflected Eckert and Mauchly’s background in physics and engineering. [...] the flow of instructions and data in the UNIVAC mirrored the way humans using mechanical calculators, books of tables, and pencil and paper performed scientific calculations [...] a scientist or engineer would not have found anything unusual in the way a UNIVAC attacked a problem.” (Ceruzzi: 15)
  16. 16. Crunching Words before DH ● 1851: Augustus de Morgan ● 1887: T. C. Mendenhall, "The Characteristic Curves of Composition" ● 1888: C. Mascol, "Curves of Pauline and Pseudo-Pauline Style I" ● 1893: L. A. Sherman, Analytics of Literature: A Manual for the Objective Study of English Prose and Poetry (Boston: Ginn) ● 1898: W. Lutosławski, Principes de stylométrie ● 1935: G. K. Zipf, The Psycho-Biology of Language: An Introduction to Dynamic Philology (Boston: Houghton Mifflin Company) ● 1944: G. Udny Yule, The Statistical Study of Literary Vocabulary (Cambridge UP)
  17. 17. The Statistical Study of Literary Vocabulary These discussions left in my mind a sense of inadequacy. They did not tell me what I wanted to know. They dealt with such details as his use of words and idioms […] mere details, details certainly quite useful […] but they give no faintest notion as to what his vocabulary is really like as a whole […] What I felt I wanted in the first place, prior to any detail, was some summary, some picture of the vocabulary as a whole. (p.2)
  18. 18. The Statistical Study of Literary Vocabulary I decided to confine myself to a single class of words, viz. nouns. The concordance was worked through page by page and every noun entered on a card together with the number of times it was used. From these cards it was easy to book up a table, the 'frequency of distribution' to use the statistical term, showing the number of nouns used once, twice, thrice [...] (p.4)
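Yule's card procedure in the slide above maps onto two rounds of counting. The sketch below is an illustration under stated assumptions, not Yule's own method in detail: it assumes the nouns have already been picked out by hand (as Yule did from a concordance), and the function name `frequency_distribution` and the sample list are invented for the example.

```python
from collections import Counter


def frequency_distribution(words):
    """Map each occurrence count to the number of distinct words used that often."""
    per_word = Counter(words)  # one "card" per word, carrying its tally
    # Count the tallies themselves: how many words occur once, twice, thrice...
    return dict(sorted(Counter(per_word.values()).items()))


# Hypothetical list of nouns already extracted from a text.
nouns = ["grace", "love", "grace", "heart", "love", "grace", "soul"]
table = frequency_distribution(nouns)
for times_used, how_many in table.items():
    print(f"{how_many} noun(s) used {times_used} time(s)")
```

The resulting table is the "summary picture" of a vocabulary that Yule says he wanted before looking at any individual word.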
  19. 19. Busa's project Like all good projects, this one began with a question: What is the metaphysics of presence in St. Thomas Aquinas? Combing for praesens and praesentia, he realized that such words were peripheral, and, however unfortunately, Saint Thomas's doctrine of presence is linked with the preposition in! Inquiring what St. Thomas meant by "presence," the young Roberto Busa realized that we must also study the way function-words affect meaning-words. To study the significant phrase "in the presence" he needed the shades of "in". His dissertation, defended in 1946, was essentially founded on a handmade Thomistic Concordance, essentially complete, but with one entry. He had made 10,000 hand-written cards. (Thomas N. Winter 1999: 6)
  20. 20. Early DH and IBM "The use of the latest data-processing tools developed primarily for science and commerce may prove a significant factor in facilitating future literary and scholarly studies." (Paul Tasman, 1957) 1964 Literary Data Processing Conference Proceedings, September 9, 10, 11, 1964. Department of Scientific and Technical Information, International Business Machines Corp., Data Processing Division: White Plains, N.Y., 1964 1966 First issue of Computers and the Humanities, published by Queens College of CUNY, with the financial assistance of IBM corporation and U.S. Steel Foundation. The Academic editor was Prof. Joseph Raben, Department of English, Queens College
  21. 21. Surprise Surprise ● Stylometry is a very popular approach in Digital Literary Studies and Text Analysis today ● The R project for Statistical Computing, a strongly functional language and environment to statistically explore data sets, is the most used language for digital literary studies
  22. 22. Leech-Short, Style in Fiction [...] literary stylistics has, implicitly or explicitly, the goal of explaining the relation between language and artistic function. The motivating questions are not so much what as why and how. From the linguist’s angle, it is ‘Why does the author here choose this form of expression?’ From the literary critic’s viewpoint, it is ‘How is such-and-such an aesthetic effect achieved through language?’
  23. 23. Louis T. Milic ● A Quantitative Approach to the Style of Jonathan Swift. Studies in English Literature, v. 23. The Hague: Mouton, 1967. ● Style and Stylistics; an Analytical Bibliography. New York: Free Press, 1967. ● Stylists on Style; a Handbook with Selections for Analysis. New York: Scribner, 1969.
  24. 24. Poibeau 2014 […] computational linguists try to study the mechanisms that make the comprehension of languages possible. They try to build tools that show the possibilities and the limits of learning with only the help of real language data, without dictionaries and similar resources. They try to understand to what extent we can avoid the use of dictionaries or of other tools that provide meanings a priori in order to define meaning exclusively out of a corpus, inferring it from the way in which words are used in it […] it is clear in fact that we acquire knowledge about language from what we hear and read.
  25. 25. Influence and Information Cascades Within the field of observational learning, there exists a theory of information cascades: “An informational cascade occurs when it is optimal for an individual, having observed the actions of those ahead of him, to follow the behavior of the preceding individual without regard to his own information” […] In other words, once a cascade begins, it tends to continue and to create a situation of mass imitation in which individuals repeatedly avoid the road less taken. […] At the same time, the theory tells us that cascades are fragile; the introduction of a disruptive force, a new signal, can cause the cascade to collapse and move in an entirely new direction. […] some mutant writer would take some other road, and a new cascade would follow. As a way of modeling literary influence and intertextuality at scale, information cascades provide an attractive theoretical framework.
  26. 26. Macroanalysis' Genealogy ● in part a response to Franco Moretti’s (Moretti 2000, 56-58) discussion of the need for distant reading in literary studies ● in part related to text analysis and humanities computing ● in part indebted to stylometry and the use of statistics to evaluate and analyze corpora of texts
  27. 27. Distant Reading ● Close reading as a method for gathering evidence is flawed, because interpretation is subjective and biased ● big data render close reading totally inappropriate as a method of studying literary history ● massive digital-text collections demand a new type of evidence gathering and meaning making
  28. 28. Linguistics and Stylistics In recent years we have seen the emergence of computational methods, usually using statistics, whose main feature is to be efficient in working with big data. In a way, being efficient was more important than being meaningful. It is not possible to compute thousands or millions of documents in a few seconds with a deep and meaningful analysis, even if computers are more and more powerful. Suddenly, the easy way is counting (forms, words, patterns and collocations, frequencies etc.) (Poibeau 2014)
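To make the "easy way" of counting concrete, here is a minimal Python sketch that tallies word forms and adjacent word pairs. It is an illustration only: the tokenizer is deliberately naive, adjacent pairs are a crude stand-in for collocations, and the names `tokenize` and `count_forms_and_collocations` as well as the sample sentence are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_forms_and_collocations(text):
    """Return (word form frequencies, adjacent word pair frequencies)."""
    tokens = tokenize(text)
    word_freq = Counter(tokens)
    # Adjacent pairs as a rough proxy for collocations; pairs are also
    # formed across sentence boundaries, since punctuation is discarded.
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    return word_freq, bigram_freq


# Invented sample text, used only to exercise the functions.
sample = "The sea was calm. The sea was grey, and the sky was grey."
words, bigrams = count_forms_and_collocations(sample)
print(words.most_common(3))
print(bigrams.most_common(2))
```

The sketch runs in a single pass over the tokens, which is why this kind of analysis scales to large corpora while saying nothing about what the counted forms mean.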
  29. 29. Statistics: Why? ● Statistics is a science of the aggregate (Scienza del collettivo) ● The statistical method is the only one that allows us to analyse big data
  30. 30. Statistics: Why not? If you use a statistical method, the individual items lose their materiality: they become abstractions that carry only the characteristics under investigation, erasing all the other features that are not of interest to the research.
  31. 31. Aravamudan on Moretti ● Moretti's work on the long arc of the novel has expanded our understanding of its scope and range ● European hegemony is exercised, even if he encourages a cosmopolitan approach ● Moretti has no time for the critical interpretation of individual fictions, except as exemplary of very large trends that can be followed through their tropological and formal analysis, and this is of a piece with his grand narrative of intellectual diffusion with Europe as the core.
  32. 32. Novel: Rise, Diffusion, Resistance ● The rise of the novel (Ian Watt) ● Enlarging the rise of the novel (Moretti & Jockers) ● Resisting the Rise of the Novel (Aravamudan)
  33. 33. A different genealogy
  34. 34. Event in the history of mediation Enlightenment is not just a philosophical position-taking but an institutional event in the history of mediation, a time and a place, as well as a mode of interaction entailing the creation of a new epistemological infrastructure when new genres and formats for the presentation of knowledge were explored and new associational practices developed for the collation of information. New protocols came about, including the 'postal principle' by which anyone can address anyone, public credit and copyright, all of which saturated knowledge production.
  35. 35. Distance Transmission Absence Or as John Guillory extends this argument, the mediations created by the Enlightenment entailed an understanding of distance, transmission and absence as operational between the poles of communication, whether between individuals, objects of analysis, or knowledge systems. Taking on this insight, we can propose that genres are to be understood not just as containers for information but rather as apparatuses of mediation that traverse social distance, enable cultural transmission and make absence productive of new forms and new media.
  36. 36. Consequences ● put into perspective the use of statistical computing in literary studies ● take seriously the meaning of digital computing ● digital support is not simply another support for the same thing (text), but a transformation of the (written) text itself into something else ● situate the literary system within the media system (Fiormonte 2003: 31)
  37. 37. Semiotic Computing? ● Connecting the debate on digital representation with semiotics is perhaps the only possible method that will attack the very core of the digital production of symbols, highlighting both problems and possibilities (Fiormonte 2009) ● Instead of R, for literary studies we could use a different programming paradigm (event-driven, as opposed to object-oriented, and declarative are the ones I am willing to try to understand now) ● P. B. Andersen, A Theory of Computer Semiotics. Semiotic Approaches to Construction and Assessment of Computer Systems, Cambridge UP, 1997.
  38. 38. Events in the history of mediation A Companion to Digital Literary Studies is fundamentally a narrative of what may be called the scene of "new media encounter" — in this case, between the literary and the digital. The premise is that the boundary between codex-based literature and digital information has now been so breached by shared technological, communicational, and computational protocols that we might best think in terms of an encounter rather than a border. And "new media" is the concept that helps organize our understanding of how to negotiate — which is to say, mediate — the mixed protocols in the encounter zone. (Liu 2008)
  39. 39. Electronic Documents ● Even when we press it into a mould, the electronic document is and remains a source in motion. (Fiormonte 2003: 15) ● If reading consists in [...] constructing a network of cross-references within the text, associating it with other data, integrating words and images within a personal memory that is continuously being updated, then hypertext mechanisms represent an objectivation, exteriorization, and virtualization of the reading process. (Lévy: 56-57)
  40. 40. Books VS hypertexts If we define a hypertext as a space of possible readings, a text would then represent a particular reading of an hypertext [...] Any public text accessible through the Internet is now a virtual component in an immense and ever-expanding hypertext. (Lévy: 58-59)
  41. 41. Texts VS Events ● representation of texts (TEI, XML, object-oriented) ● representation of events (performance, readings, event-driven paradigm languages)
  42. 42. Case Study: The Council of Egypt ● Arab manuscript (14th century) ● Vella, Consiglio d'Egitto (18th century) ● Sciascia, Consiglio d'Egitto (20th century)
  43. 43. Corporate Orientalism Taking the late eighteenth century as a very roughly defined starting point, Orientalism can be discussed and analyzed as the corporate institution for dealing with the Orient – dealing with it by making statements about it, authorizing views of it, describing it, by teaching it, settling it, ruling over it: in short, Orientalism as a Western style for dominating, restructuring, and having authority over the Orient (Edward Said, Orientalism)
  44. 44. Enlightenment Orientalism […] imaginative fiction [...] defined European understandings of cultures that were seemingly foreign but that shared the past in ways that needed expert explanation. […] This imagination was experimental, prospective, and antifoundationalist. […] The experimentation came to an end, however, partly out of generic exhaustion and partly as a result of a rising nationalist tide […] Enlightenment Orientalism was very much an imaginative Orientalism, circulating images of the East that were nine part invented and one part referential, but it would be anachronistic to deem these images ideological, as they did not tend principally towards domination of the East [...]
  45. 45. Side Projects ● Books are falling apart: ● Leggere, scrivere e far di conto: ● History of Humanities Computing pre-1994: ● Bibliography of HC pre-1994:
  46. 46. Definition of DH ● The attempt to create intelligent (reading) machines and ● to teach people how to be smarter than the intelligent machines we created
  47. 47. Thank you