Analysing deux
Will the scholarship ever leave the lab?
Getting Intimate with Your Data

18 February 2014
Any Success with TA or DV?
Did anyone get a chance to poke around with Voyant, TAPoR or ManyEyes?
An Interesting TA Case Study
‣ Objective: to reveal the connection between business and society in the historical record of the HBR
‣ Clement Levallois and Valerie Alloix
‣ https://www.kaggle.com/c/harvard-business-reviewvision-statement-prospect/prospector#100
A Sample Text/Network Analysis
‣ Merging the singular and plural forms of terms ("lemmatization");
‣ Removal of the most common terms from the English language (based on a list of 5000 frequent terms - stop list);
‣ Detection of terms composed of multiple words ("n-gram detection");
‣ Identification of the 10 most frequent terms for each year;
‣ Publishing frequency equalised as years preceding 2000 were grouped in 5 year periods;
‣ The next step was to manually inspect these 10 most frequent terms for each year or group of 5 years.
‣ Result:
Clement Levallois's Cowo software (https://github.com/seinecle)
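
The processing steps listed on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not how Cowo or the HBR study implemented them. The stop list is a tiny stand-in for the 5000-term list, `merge_plural` is a crude substitute for real lemmatization, n-gram detection is limited to adjacent word pairs, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list (assumption); the study used about 5000 frequent terms.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
              "for", "on", "with", "are", "this", "that"}


def merge_plural(token):
    """Crude stand-in for lemmatization: fold simple plurals into singulars."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def terms(text):
    """Return single terms plus two-word terms built from adjacent non-stop words."""
    raw = re.findall(r"[a-z]+", text.lower())
    # Keep stop words as None so that they break up candidate word pairs.
    cleaned = [None if t in STOP_WORDS else merge_plural(t) for t in raw]
    unigrams = [t for t in cleaned if t is not None]
    bigrams = [f"{a} {b}" for a, b in zip(cleaned, cleaned[1:])
               if a is not None and b is not None]
    return unigrams + bigrams


def period(year):
    """Group years before 2000 into 5 year periods; later years stand alone."""
    if year >= 2000:
        return str(year)
    start = year - year % 5
    return f"{start}-{start + 4}"


def top_terms(docs, n=10):
    """docs is an iterable of (year, text); return the n most frequent terms per period."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[period(year)].update(terms(text))
    return {p: [t for t, _ in c.most_common(n)] for p, c in counts.items()}
```

Calling `top_terms` with a list of `(year, text)` pairs gives one short list per year or 5 year period, which is the material the final, manual inspection step would work from.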
Aylien Text Analysis
How was:
"Dennis the Paywall Menace Stalks the Archives"
Dennis the Paywall Menace Stalks the Archives
"I suppose I would wish D. C. Thomson well in moving on from Dennis the Menace to history, if it wasn’t for the fact that it involves the theft of public cultural property." - Andrew Prescott
Access versus Preservation
Access versus Process
Privileging certain collections because they are available
"It seems as if archivists have been gripped by a mania to
digitise as quickly as possible, regardless of the
implications for future scholarship of how this is done."

"Scottish students in Glasgow now study Welsh wills (freely
available) rather than Scottish wills (locked behind a
brightsolid paywall) – a lesson for the Scottish government
to ponder there, surely."
- Andrew Prescott
"Digitization makes the most traditional forms of humanistic
scholarship more necessary, not less.
But the differences mean that we need to reinvent, not
reaffirm, the way we engage with the humanities."
"Process raw data received through our senses into
concepts, patterns and implications. Everything coming in
through our senses is information waiting to be processed
and understood."
- Wm Jones, Keeping Found Things Found
UnBuilding Grand Central Station
Data Consisting of What?
‣ Basic types of content that we are used to dealing with:
  ‣ Text
  ‣ Numbers
  ‣ Images
  ‣ Video
‣ Other, more “complex” stuff:
  ‣ Temporal - Time - Events
  ‣ Spatial - Space Coordinates - Place
  ‣ Relations, connections, links - genealogy - Networks
Time
TimeFlow: Journalists
TimeFlow was created by Fernanda Viegas and Martin Wattenberg (Flowing Media, Inc.) and Sarah Cohen (Duke University).
The initial development was sponsored by Duke University's DeWitt Wallace Center for Media and Democracy.
Space and Place
Network Analysis
Networks and Relationships
Thinking Longer Term
For Next Lecture (25 February): Presenting I
Please take a look at:
The Visual Complexity Website
http://visualcomplexity.com
Thank You
shawn.day@ucc.ie @iridium

Getting Intimate with Your Data - Working Our Way out of the Lab
