Transcript of "Getting Intimate with Your Data - Working Our Way out of the Lab"
Will the scholarship ever leave the lab?
Getting Intimate with Your Data
18 February 2014
Any Success with Text Analysis (TA) or Data Visualization (DV)?
Did anyone get a chance to poke around with
Voyant, TAPoR or ManyEyes?
An Interesting TA Case Study
Objective: to reveal the connection between business and society
in the historical record of the Harvard Business Review (HBR)
Clement Levallois and Valerie Alloix
A Sample Text/Network Analysis
Merging the singular and plural forms of terms ("lemmatization");
Removal of the most common terms in the English language (based
on a stop list of 5000 frequent terms);
Detection of terms composed of multiple words ("n-gram detection");
Identification of the 10 most frequent terms for each year;
Publishing frequency equalised by grouping the years preceding 2000
into 5-year periods;
The next step was to manually inspect these 10 most frequent terms
for each year or group of 5 years.
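The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Cowo software: the tiny stop list, the crude plural-stripping rule, the use of bigrams as the only multi-word terms, and all function names (tokenize, lemmatize, period_label, terms, top_terms) are assumptions made for the example. Stop words are dropped before plurals are merged so that the crude rule does not mangle them.

```python
import re
from collections import Counter, defaultdict

# Hypothetical miniature stop list; the study used a list of 5000 frequent terms.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(token):
    """Crude singular/plural merge: strip a trailing 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def period_label(year, cutoff=2000, span=5):
    """Group years before the cutoff into fixed spans; keep later years single."""
    if year >= cutoff:
        return str(year)
    start = year - (year % span)
    return f"{start}-{start + span - 1}"

def terms(text):
    """Return single terms plus bigrams of adjacent non-stop words."""
    out = []
    prev = None
    for raw in tokenize(text):
        if raw in STOP_WORDS:
            prev = None  # a stop word breaks any multi-word term
            continue
        term = lemmatize(raw)
        out.append(term)
        if prev is not None:
            out.append(prev + " " + term)
        prev = term
    return out

def top_terms(docs, n=10):
    """docs is an iterable of (year, text); returns the n most frequent terms per period."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[period_label(year)].update(terms(text))
    return {period: c.most_common(n) for period, c in counts.items()}

if __name__ == "__main__":
    # Invented sample sentences, for illustration only.
    sample = [
        (1962, "Managers manage markets. The market of managers."),
        (1964, "Labor unions and the labor market"),
        (2003, "Global markets and global strategy"),
    ]
    for period, ranked in top_terms(sample).items():
        print(period, ranked)
```

The ranked lists this produces are the starting point for the manual inspection described above; a real study would swap in a proper lemmatizer and a full stop list.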
Clement Levallois's Cowo software (
"Dennis the Paywall Menace Stalks the Archives"
"I suppose I would wish D. C. Thomson
well in moving on from Dennis the
Menace to history, if it wasn’t for the
fact that it involves the theft of public
cultural property." - Andrew Prescott
Access versus Preservation
Access versus Process
Privileging certain collections because they are available
"It seems as if archivists have been gripped by a mania to
digitise as quickly as possible, regardless of the
implications for future scholarship of how this is done."
"Scottish students in Glasgow now study Welsh wills (freely
available) rather than Scottish wills (locked behind a
brightsolid paywall) – a lesson for the Scottish government
to ponder there, surely."
"Digitization makes the most traditional forms of humanistic
scholarship more necessary, not less.
But the differences mean that we need to reinvent, not
reaffirm, the way we engage with the humanities."
"Process raw data received through our senses into
concepts, patterns and implications. Everything coming in
through our senses is information waiting to be processed."
- William Jones, Keeping Found Things Found
Data Consisting of What?
Basic types of content that we are used to dealing with:
Other, more “complex” stuff:
Temporal - Time - Events
Spatial - Space Coordinates - Place
Relations, connections, links - genealogy - Networks
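As a rough illustration of how these three kinds of "complex" data might sit side by side in one record, here is a minimal Python sketch. The Record class, its field names, and the helper functions timeline and edges are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

@dataclass
class Record:
    label: str
    when: Optional[date] = None                   # temporal: an event date
    where: Optional[Tuple[float, float]] = None   # spatial: (latitude, longitude)
    links: List[str] = field(default_factory=list)  # relational: labels of linked records

def timeline(records):
    """Labels of dated records, ordered from earliest to latest."""
    dated = [r for r in records if r.when is not None]
    return [r.label for r in sorted(dated, key=lambda r: r.when)]

def edges(records):
    """Flatten the links into (source, target) pairs, i.e. a simple network."""
    return [(r.label, target) for r in records for target in r.links]

if __name__ == "__main__":
    # Invented sample records, for illustration only.
    records = [
        Record("Child", when=date(1900, 1, 1)),
        Record("Parent", when=date(1850, 5, 1), links=["Child"]),
        Record("Parish", where=(55.86, -4.25)),
    ]
    print(timeline(records))
    print(edges(records))
```

The same records can then feed a timeline, a map, or a network diagram, depending on which field is read.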
TimeFlow was created by:
Fernanda Viegas and
Martin Wattenberg
(Flowing Media, Inc.) and
Sarah Cohen (Duke University).
The initial development was
supported by
Duke University's DeWitt Wallace
Center for Media and Democracy.