Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard
1. Winter 2014: Session #2
Programming on the Whiteboard
(Paige Morgan, Sarah Kremen-Hicks, Brian Gutierrez)
2. Previously, at DMDH...
• The work of creating usable data
• Forms that this data might take:
• markup language
• spreadsheets
3. Workshop #2
• Caveat Curator (challenges of working with
data)
• Programming on the whiteboard, i.e.,
conceptualizing the specific steps that you
need to take to accomplish your goals
4. Why this focus on data?
• Understanding your data, and your
intended actions, is a key skill for working
with any programming language or
platform.
• This is true whether you are the
programmer or whether you are working
with professional programmers.
46. GIS technology has paved the
way for analyzing qualitative
data associated with cultural
experiences
47. “A good map is worth a thousand words,
cartographers say, and they are right: because
it produces a thousand words: it raises doubts,
ideas. It poses new questions, and forces you
to look for new answers.”
(Moretti 1998, 3–4)
48. Literary texts are filled with
subjective spatial data: an
author or character's
articulation of geographically
located dwellings, urban and
rural landscapes, as well as
performance spaces
50. Objective: to map the visual culture
events referenced in Wordsworth’s
autobiographical poem The Prelude (as
well as the ones not referenced)
51. Problem to solve: Prove that literary
galleries, specifically John Boydell’s
“Shakespeare Gallery,” shaped the
dramaturgical choices in the only play
written by Wordsworth. He reads
Shakespeare not through a personal copy
of the play, but through the visual and
performative texts of that time.
56. Wordsworth mentions the following place
names and references:
"Oh wondrous power of words, how sweet they are
/ According to the meaning which they bring-- /
Vauxhall and Ranelagh, I then had heard / Of your green
groves and wilderness of lamps, / Your gorgeous ladies,
fairy cataracts, / And pageant fireworks" (119-125)
"Half-rural Sadler's Wells" (267)
57. First, I need to know what and
where these places were in
order to identify them as
spatial data
Ex: Vauxhall and Ranelagh
58. Second, if I'm interested in
visual cultural experiences, I
need to identify what kind of
event occurred there: gallery,
play, etc.
59. Third, how would I access the data?
Answer: place-names in a book are not
under any copyright.
However, if I wanted to display passages
from the text when a viewer clicks on a
place name, I would have to think about
copyright. The text is on Project
Gutenberg, so that's covered.
60. Fourth, I would have to locate any indirect
reference to visual cultural phenomena.
Ex: Wordsworth mentions two actresses by
name: Mary Robinson and Sarah Siddons.
Since I cannot map a person, I need to
investigate which plays they were in, and at which
theaters, during that moment of his life (the poem is
autobiographical)
61. Fifth, I need to research what special
events were occurring at other places
he mentions. For that, I look to The
Times (newspapers) and various
periodicals.
62. Sixth, because I am going to create
a map using ArcGIS, I need to
put my data in an Excel
spreadsheet so that it can be
read by the program.
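A minimal sketch of what that spreadsheet might look like, written as a CSV file (which Excel can open). The column names are hypothetical, chosen only for illustration. The place names and line numbers come from the passages quoted earlier; the event type and coordinates are deliberately left blank because they still have to be researched.

```python
import csv
import io

# Hypothetical column names, not a required ArcGIS schema.
FIELDS = ["place", "prelude_lines", "event_type", "latitude", "longitude"]

# Place names and line references from the quoted passages.
# event_type and coordinates stay empty until they have been researched.
ROWS = [
    {"place": "Vauxhall", "prelude_lines": "119-125"},
    {"place": "Ranelagh", "prelude_lines": "119-125"},
    {"place": "Sadler's Wells", "prelude_lines": "267"},
]


def build_csv(rows):
    """Return the rows as CSV text, one place per line, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_csv(ROWS))
```

One row per place keeps each mapped point as a single unit; extra columns can be added later as new categories emerge.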
70. Benefits of ArcGIS
• It allows the overlay of historical maps
• Trainings were available and accessible
(through DHSI and UW courses)
• As a software program, ArcGIS is
established enough to be considered robust
• Available through the UW software suite
71. Disadvantages of ArcGIS
• Available only for PCs
• Proprietary file format (even if input data is
open-access, the end result is not)
• Available only on an annual subscription
model (and prohibitively expensive for
scholars without campus-granted access)
72. In Franco Moretti’s Atlas of the
European Novel 1800-1900
(1998), he calls for a “literary
geography,” predicated on the
creation of “readerly maps”
and the use of those maps as
analytical tools.
73. Caveats?
The pursuit of mapping data
may exclude complex social
spaces (e.g., gendered domestic
environments)
78. Is it possible to predict
deviations from a metrical
norm based on author or lyric
classification?
79. Will authors show a tendency
for particular types of metrical
substitution?
80. Prepping the Data
• For proof of concept, start with one author
(Alfred, Lord Tennyson)
• Get Tennyson’s poems from Project
Gutenberg
• Hand-mark representative poems for
prosody
82. Computer tasks
• Count feet per line
• Recognize | as a foot boundary
• Recognize carriage return as a line boundary
• Supply foot boundaries at beginning/end of
lines
• Count the number of areas contained within
foot boundaries for each line
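The tasks above can be sketched in a few lines of Python. This is only one possible approach, assuming the hand-marked scansion is stored as plain text with one line of marks per line of verse; the function names are made up for illustration.

```python
from typing import List


def feet_in_line(scansion_line: str) -> List[str]:
    """Split one hand-marked scansion line into its feet."""
    # Supply the foot boundaries that are implied at the start and end of the line.
    bounded = "|" + scansion_line.strip().strip("|") + "|"
    # Every area between two boundaries is one foot.
    areas = bounded.split("|")[1:-1]
    # Remove the spacing inside each foot and skip empty areas.
    return ["".join(area.split()) for area in areas if area.strip()]


def count_feet(scansion_text: str) -> List[int]:
    """Treat each line break as a line boundary and count the feet per line."""
    return [
        len(feet_in_line(line))
        for line in scansion_text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    print(count_feet("x / |x /|xx / | x / |x /"))  # [5]
```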
83. These steps involve recognizing
each metrical foot as a unit that
contains particular accentual-
syllabic data.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
84. Computer tasks, cont’d.
• Identify the most common number of feet
per line
• Supply a report on lines (by number) that
deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
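A sketch of how these tasks might look once the feet per line have been counted. The counts in the usage example are made up for illustration and are not taken from Tennyson; the function name is hypothetical.

```python
from collections import Counter


def line_length_report(feet_per_line):
    """Return (norm, deviating_line_numbers, deviation_rate)."""
    if not feet_per_line:
        raise ValueError("need at least one line")
    # The mode (most common count) is treated as the paradigm.
    # If two counts tie, the one seen first wins.
    norm = Counter(feet_per_line).most_common(1)[0][0]
    # Report deviating lines by number, counting from 1.
    deviating = [
        number
        for number, feet in enumerate(feet_per_line, start=1)
        if feet != norm
    ]
    rate = len(deviating) / len(feet_per_line)
    return norm, deviating, rate


if __name__ == "__main__":
    # Made-up counts: six lines, two of which depart from five feet.
    norm, deviating, rate = line_length_report([5, 5, 4, 5, 6, 5])
    print(norm, deviating, round(rate, 2))
```

The rate of adherence is simply one minus the rate of deviation.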
85. After recognizing the foot as a
unit, the computer can calculate
what patterns of data each foot
contains.
86. Computer tasks, cont’d.
• Identify the most common foot type
• Identify markings within foot boundaries
• Compare markings to foot dictionary to
identify type
87. These tasks identify each line
as a unit composed of one or
more feet.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
(iambic pentameter with third foot anapestic
substitution)
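A minimal sketch of the "foot dictionary" idea, assuming x marks an unstressed syllable and / a stressed one, as in the example line. The dictionary below covers only a handful of common feet and is an illustration, not a complete inventory.

```python
from typing import List

# A small, incomplete foot dictionary: marks -> foot type.
FOOT_DICTIONARY = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}


def classify_feet(scansion_line: str) -> List[str]:
    """Name each foot in a scansion line by looking up its markings."""
    names = []
    for area in scansion_line.split("|"):
        marks = "".join(area.split())  # drop spacing inside the foot
        if not marks:
            continue  # nothing between two boundaries
        names.append(FOOT_DICTIONARY.get(marks, "unknown"))
    return names


if __name__ == "__main__":
    print(classify_feet("x / |x /|xx / | x / |x /"))
    # ['iamb', 'iamb', 'anapest', 'iamb', 'iamb']
```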
88. Still more computing tasks!
• Identify the most common foot type within
a poem
• Supply a report on feet (by line and foot
number) that deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
89. Just as the feet contain
patterns, the lines contain
patterns that can be analyzed
as well.
90. Still more computing tasks!
• Report on types of deviations arranged by
most to least common
• Information should include location
(line/foot number), as well as prevalence of
substitution type
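One way such a report might be assembled, assuming each line has already been reduced to a list of foot names. In the usage example, the first line is the Tennyson line scanned earlier; the other two are made up for illustration. The function name and the shape of the report are hypothetical.

```python
from collections import Counter, defaultdict


def deviation_report(poem):
    """poem is a list of lines; each line is a list of foot names."""
    all_feet = [foot for line in poem for foot in line]
    if not all_feet:
        raise ValueError("need at least one foot")
    # The most common foot type in the poem is treated as the paradigm.
    norm = Counter(all_feet).most_common(1)[0][0]
    # Record where every other foot type occurs as (line number, foot number).
    locations = defaultdict(list)
    for line_no, line in enumerate(poem, start=1):
        for foot_no, foot in enumerate(line, start=1):
            if foot != norm:
                locations[foot].append((line_no, foot_no))
    # Arrange substitution types from most to least common.
    ranked = sorted(locations.items(), key=lambda item: len(item[1]), reverse=True)
    deviations = sum(len(places) for places in locations.values())
    return {
        "norm": norm,
        "rate": deviations / len(all_feet),
        "by_type": [(kind, len(places), places) for kind, places in ranked],
    }


if __name__ == "__main__":
    poem = [
        ["iamb", "iamb", "anapest", "iamb", "iamb"],  # the line scanned earlier
        ["trochee", "iamb", "iamb", "iamb", "iamb"],  # made-up line
        ["iamb", "iamb", "anapest", "iamb", "iamb"],  # made-up line
    ]
    for kind, count, places in deviation_report(poem)["by_type"]:
        print(kind, count, places)
```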
91. Deviations and their placement
within each line and each poem
should display certain patterns
unique to each author (I hope!)
92. Current status: I’m investigating
using the Natural Language
Toolkit to tokenize each foot;
and to establish syllables, feet,
and lines as a unique hierarchy.
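A sketch of the syllable / foot / line hierarchy using plain Python dataclasses. It does not use the Natural Language Toolkit. The class names are hypothetical, and the syllable split in the usage example was done by hand for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str
    stressed: bool


@dataclass
class Foot:
    syllables: List[Syllable]

    def pattern(self) -> str:
        return "".join("/" if s.stressed else "x" for s in self.syllables)


@dataclass
class Line:
    feet: List[Foot]

    def syllable_count(self) -> int:
        return sum(len(foot.syllables) for foot in self.feet)


def build_line(marks: str, syllables: List[str]) -> Line:
    """Pair hand-split syllables with scansion marks, foot by foot."""
    patterns = ["".join(area.split()) for area in marks.split("|")]
    patterns = [p for p in patterns if p]
    if sum(len(p) for p in patterns) != len(syllables):
        raise ValueError("number of marks and syllables differ")
    remaining = iter(syllables)
    feet = [
        Foot([Syllable(next(remaining), mark == "/") for mark in pattern])
        for pattern in patterns
    ]
    return Line(feet)


if __name__ == "__main__":
    # Hand-split syllables (an assumption, not dictionary output).
    words = ["Sir", "Wal", "ter", "Viv", "i", "an", "all", "a", "sum", "mer's", "day"]
    line = build_line("x / |x /|xx / | x / |x /", words)
    print([foot.pattern() for foot in line.feet], line.syllable_count())
```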
94. If you are thinking about your
data, and the tasks that you
need to accomplish, then it’s
easier to determine what sort
of language or platform your
project needs.
95. There are countless tutorials,
online courses, etc., for almost
any programming language or
platform.
(We’re giving you a cheat sheet,
too; and http://www.dmdh.org is
your friend. So is Google.)
100. For this activity, we
recommend that you pair up,
or form small groups to work
together.
101. Group Activity
• What do you need to do with your data?
• What units might that data exist in?
• What categories do you need to create?
• What relationships need to exist between
the units and categories?
102. Spring Workshops!
• Project Ideation and Development
• April 5th and April 26th (advance
registration for DMDH participants at the
end of Winter Quarter)
103. DMDH content is developed by Paige Morgan,
Sarah Kremen-Hicks, and Brian Gutierrez, with
generous support from the Simpson Center for
the Humanities at the University of Washington.
Content is available under a
Creative Commons Attribution-NonCommercial
3.0 Unported License.
Please contact Paige at paigecm@uw.edu with
questions.