Let the Computer Do the Work

Karen Cariani & Casey Davis
WGBH | Boston, MA, USA

the situation
■ 68,000 digitized television and radio programs
■ incomplete, inaccurate metadata records
■ limited staff resources
■ we need to know what we have in the collection
■ we have a responsibility to users to provide access to the collection
■ continued growth of the collection (content and sparse metadata)

The State of Recorded Sound
Preservation in the United States: A
National Legacy at Risk in the Digital Age
(2010)
Suggested that if scholars and students do not use sound archives,
cultural heritage institutions will be less inclined to preserve them.
Archives and libraries must collaborate with patrons and scholars to
understand how recordings are and might be used.
Scholars need to know what kinds of analysis are possible in an age
of large, freely available collections and advanced computational
analysis.

A vision
“ . . . the sound file would become . . . a text
for study, much like the visual document.
The acoustic experience of listening to the
poem would begin to compete with the
visual experience of reading the poem.”
Bernstein, Charles. Attack of the Difficult Poems: Essays and Inventions.
University of Chicago Press, 2011, 114.

HiPSTAS team
• Tanya Clement, [PI] Assistant Professor, University of
Texas at Austin
• Loretta Auvil [Co-PI] Senior Project Coordinator at the
Illinois Informatics Institute (I3) at the University of
Illinois at Urbana-Champaign
• David Tcheng [Co-PI] Research Scientist at I3; ARLO
developer
• Tony Borries, Research Programmer working as a
consultant with I3; ARLO programmer
• David Enstrom, Biologist, University of Illinois at
Urbana-Champaign; consultant

Participants, HiPSTAS Institute, 2013-2014
• 8 librarians and archivists
• 9 humanities scholars
• 3 advanced graduate students in humanities and
information science

Participating collections
• poetry from PennSound at the University of
Pennsylvania, 30,000 audio files
• folklore at the Dolph Briscoe Center for American
History at UT Austin, 57 feet of tapes (reels and
audiocassettes)
• storytelling traditions at the Native American Projects
(NAP) at the American Philosophical Society in
Philadelphia, 50 tribes, 3,000 hours

OTHER COLLECTIONS OF INTEREST TO PARTICIPANTS
• Field recordings (200,000 recordings), American Folklife
Center, Library of Congress
• 30,000 hours of oral histories, StoryCorps
• Speeches in the Southern Christian Leadership
Conference recordings, Emory University
• 700 recordings in the Elliston Poetry Collection at the
University of Cincinnati
• 36 interviews in the Dust, Drought and Dreams Gone
Dry: Oklahoma Women and the Dust Bowl (WDB) oral
history project out of the Oklahoma State Libraries

HIPSTAS: PRIMARY GOALS
To develop a virtual research environment in which users
can better access and analyze spoken word collections of
interest to humanists through:
1. an assessment of scholarly requirements for analyzing
sound
2. an assessment of technological infrastructures needed
to support discovery
3. preliminary tests that demonstrate the efficacy of using
such tools in humanities scholarship
4. a freely available, open-source, API-driven version for
general use

ARLO (Adaptive Recognition with
Layered Optimization)
[Figure: spectrogram. Vertical axis: frequency (Hz, a unit of frequency); horizontal axis: time. Energy represented by a heat-based color scheme: white (hottest, most intense), yellow, red, green, blue, black (coolest, least intense).]
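
The display described above is a spectrogram: time on one axis, frequency in Hz on the other, and energy shown as color. The following is a minimal sketch of how such a picture can be computed, assuming a plain short-time Fourier transform over non-overlapping frames. It is not ARLO's actual implementation; the function names, frame size, and linear color thresholds are illustrative assumptions.

```python
import cmath
import math

# Heat-based color scale from the slide, coolest to hottest.
HEAT_COLORS = ["black", "blue", "green", "red", "yellow", "white"]


def frame_energies(frame):
    """Return the energy in each frequency bin of one frame (naive DFT)."""
    n = len(frame)
    energies = []
    for k in range(n // 2):  # bins below the Nyquist frequency
        coeff = sum(
            sample * cmath.exp(-2j * math.pi * k * t / n)
            for t, sample in enumerate(frame)
        )
        energies.append(abs(coeff) ** 2)
    return energies


def spectrogram(samples, frame_size=64):
    """Split the signal into non-overlapping frames; one energy column per frame."""
    return [
        frame_energies(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def heat_color(energy, max_energy):
    """Map an energy value onto the heat scale (assumed linear thresholds)."""
    if max_energy <= 0:
        return HEAT_COLORS[0]
    level = int(energy / max_energy * (len(HEAT_COLORS) - 1))
    return HEAT_COLORS[min(level, len(HEAT_COLORS) - 1)]


# Example: one second of a 1,000 Hz test tone sampled at 8,000 Hz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
columns = spectrogram(tone, FRAME_SIZE)

# The strongest bin in the first column, converted from bin index to Hz.
peak_bin = max(range(len(columns[0])), key=lambda k: columns[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each column of `columns` is one moment in time and each entry in a column is one frequency band; coloring every entry with `heat_color` produces the kind of image the slide shows.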

Searching for Sound with Sound
Supervised Classification
[Figure; panel labels: OpenMary, LaynorStein]
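
A minimal sketch of the idea behind searching for sound with sound: a user tags a few example clips, and an unlabeled clip is given the label of the most similar tagged example. This assumes each clip has already been reduced to a small feature vector. The vectors, the labels, and the nearest-neighbor rule are illustrative assumptions, not ARLO's actual algorithm.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, ignoring overall loudness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(clip, tagged_examples):
    """Return the label of the tagged example most similar to the clip."""
    best_label, _ = max(
        ((label, cosine_similarity(clip, features))
         for label, features in tagged_examples),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical feature vectors: energy in low, middle, and high frequency bands.
tagged_examples = [
    ("applause", [0.2, 0.3, 0.9]),
    ("speech", [0.7, 0.6, 0.1]),
    ("laughter", [0.3, 0.8, 0.5]),
]

unlabeled_clip = [0.6, 0.7, 0.2]
predicted = classify(unlabeled_clip, tagged_examples)
```

The human-tagged examples are what make this supervised: the computer only generalizes from labels a person has already supplied.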

Searching for Sound with Sound
UNSUPERVISED CLASSIFICATION

Visualize results
55 John Alan Lomax recordings 1926-1941
Blue = sung; green = spoken; red = instrumental

Takeaways:
■ What do scholars talk about when they talk about
sound?
• Language dynamics: tempo, pitch, tone/timbre,
volume, pace, laughter, silence, applause, moans,
screams, dialects, changing speakers, gender,
age, changing genres
• Environment: fan hums, car horns, chickens, train
whistles, bird calls, frogs mating
• Materiality: recording noises, needle drops,
feedback, the electronic grid, changing tracks

TAKEAWAYS
■ What do engineers talk about when they talk about
audio?
• Resolution: bit depth, bit rate, sample rate
• Signal processing: Fast Fourier Transform (FFT)
and filter banks
• Dynamics: damping ratios, gain, frequencies,
spectra, energy, and pitch energy
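
Two of the engineers' terms can be made concrete with a short sketch. The bit rate of uncompressed PCM audio is the product of sample rate, bit depth, and channel count, and the FFT converts a block of samples into frequency components. The code assumes uncompressed PCM and a power-of-two block length; the function names are illustrative.

```python
import cmath
import math


def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


# CD-quality stereo: 44,100 samples per second, 16 bits per sample, 2 channels.
cd_bits_per_second = pcm_bit_rate(44100, 16, 2)

# A 500 Hz test tone, 256 samples at 8,000 samples per second.
SAMPLE_RATE = 8000
BLOCK = 256
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(BLOCK)]
spectrum = fft(tone)

# Strongest bin below the Nyquist frequency, converted from bin index to Hz.
peak_bin = max(range(BLOCK // 2), key=lambda k: abs(spectrum[k]))
peak_hz = peak_bin * SAMPLE_RATE / BLOCK
```

Higher sample rates and bit depths capture more detail but raise the bit rate, which is where resolution meets storage cost.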

Takeaways
■ What do computer scientists talk about when
they talk about ML?
• Features: What are we measuring?
• Ground Truth: What’s the answer? How do we
know when we’re accurate?
• Optimization: Accuracy vs. efficiency. How do
you balance the accuracy of your results
against the computational resources you need
to achieve that level of accuracy?
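
These three questions can be made concrete with a small sketch: compare a tool's predicted labels against hand-made ground-truth labels to get accuracy, and time the run as a rough measure of computational cost. The loudness feature, the threshold rule, and the labels below are hypothetical examples, not results from any project described here.

```python
import time


def accuracy(predicted, ground_truth):
    """Fraction of clips whose predicted label matches the human-assigned label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label lists must be the same length")
    if not ground_truth:
        raise ValueError("need at least one label")
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    return correct / len(ground_truth)


def evaluate(classifier, clips, ground_truth):
    """Run a classifier over clips; return (accuracy, seconds taken)."""
    start = time.perf_counter()
    predicted = [classifier(clip) for clip in clips]
    elapsed = time.perf_counter() - start
    return accuracy(predicted, ground_truth), elapsed


def loudness_classifier(clip_loudness):
    """Hypothetical rule: loud clips are applause, quiet clips are speech."""
    return "applause" if clip_loudness > 0.5 else "speech"


# Feature: one loudness number per clip. Ground truth: labels assigned by a person.
clips = [0.9, 0.2, 0.7, 0.4, 0.6]
ground_truth = ["applause", "speech", "applause", "speech", "speech"]

score, seconds = evaluate(loudness_classifier, clips, ground_truth)
```

Here the rule mislabels one loud stretch of speech, so `score` is four correct out of five. A richer feature set might fix that at the price of more computation, which is the accuracy versus efficiency trade-off.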

Takeaways
• Literacy: How much do we need to know about the
technology of audio, of computational methods, and of
humanist inquiry to do new kinds of research in this area?
• Usability: What kinds of interfaces and tools facilitate AV
analysis in a diverse range of disciplines and communities?
Who gets access to these tools and for what kinds of
questions?
• Accuracy: Is good enough good enough?
• Scalability: How much storage and processing power do
users need to conduct local and large-scale AV analyses? A
laptop? A supercomputer?
• Sustainability: What are local, national, and global scale
issues? How does this work fit back into the access
infrastructure already in place in archives, libraries,
classrooms? Is data enough to get us over the hump of our
limited means for discovery?

NATURAL LANGUAGE
PROCESSING TOOLS

Computational tools
■ Language
■ Speech to text
■ Image recognition
■ Sound

Data visualization
■ ARLO
■ HiPSTAS

We will want to show sample files
■ Pop Up Archive
■ Speech to text

americanarchive.org
@amarchivepub
facebook.com/amarchivepub
