Tim Groeling is a professor and former chair of the UCLA Department of Communication Studies. He has written numerous books and articles on political communication, including the award-winning “When Politicians Attack.” He is currently leading a project to digitize three decades of television news for the UCLA Communication Studies Archive.
2. Who Are We?
• Prof. Francis Steen: Director, Communication Studies Archive
• Me: Leading the analog digitization effort
• Predecessor & Archive Founder: Prof. Paul Rosenthal (emeritus)
• UCLA Library: Helps support the collection, store files, and host the main “public” site (tvnews.library.ucla.edu)
• Other supporters: UCLA Chancellor and Dean of Social Sciences, Arcadia Fund, UCLA Social Sciences Computing, the NSF, the California Endowment, UCLA Office for Instructional Development, and UCLA CCLE.
3. Collections?
• Oldest collection: UCLA Campus Speakers, from the 1950s-1980s (over 500 audio recordings)
• Digitized to coincide with the 40th anniversary of my department.
• Originally planned to exhibit & host on our website, targeted at alumni.
• Moved to YouTube: Most traffic (77%) now comes from YouTube search, suggested videos, and browsing.
• Issues: commenters & copyright
4. NewsScape
• Largest collection: TV news and public affairs programs (local & national)
• Started during Watergate (to preserve ephemera). Shoestring budget (until recently, only about $10k per year plus volunteer labor & donated equipment)
• 1979: Started trying to record all the local and national TV news viewable in LA.
• 2006: Started daily straight-to-digital recording.
• Since 2006: Added other cities.
5. Pre-2006 Holdings?
• Recordings spread across three campus organizations (Comm Studies, Library, and TFT) and at least four on- and off-campus storage sites.
• Good records for some portions; very poor records for others.
• Not sure how many tapes are in the collection overall.
• Even where we know what should be on a tape, there can be problems with the tape, the VCR, or the recording schedule.
6. Tapes
• Earliest recordings (1970s): Around 500 U-Matic tapes.
• Middle period (1979-early 1990s): About 50k hours on Betamax.
• Late period (1990s-2006): Around 160k hours on VHS, plus some redundancy.
7. Preservation
• VHS tapes are actually the most threatened, despite being the newest.
• Coincided with cable TV’s expansion of news programming: stretched the same budget to cover more programming.
• 8 hours per consumer VHS tape (Betas and U-Matics were higher-quality tapes, with less recorded on each)
• Poor-quality consumer-grade VCRs
• Limited spot-checking for quality (failing VCRs or poor signal quality went unnoticed for long stretches).
• Originals still in hand, but dead.
• Improperly stored (even faculty didn’t have A/C)
8. Cost to Digitize?
• Got bids from another archive and from commercial providers: $1.5 million (just for the first 150k hours of VHS).
• Instead, shoestring again.
• $20k (plus some donated surplus machines) for hardware, software, and furniture for the digitization lab.
• Run by me, a part-time lab manager, and 10 work-study students. Steen handles the files.
• [Shifting to Betamax will be costly, though]
9. Lab Details
• 22 digitization stations (VCR, encoder, computer). 3 local RAID file servers and 1 FileMaker Server.
• All computers: surplus or eBay Macs (circa 2008)
• Encoders: EyeTV using Hauppauge 950Q or EyeTV Hybrid hardware MPEG-2 encoders (which capture closed captions). Export and sync are scripted.
• VCRs: After testing, settled on JVC S-VHS VCRs (consumer to pro).
• Use pre-printed barcode stickers for inventory. Custom FileMaker database for tracking digitization attempts and quality control. FileMaker Go (via cell phones) for asset tracking.
• Files are quality-checked, compressed to H.264, and closed captioning is extracted via the Hoffman Cluster.
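The post-digitization step (compress each capture to H.264, pull out the closed captions) could be scripted along these lines. This is an illustrative sketch only, assuming ffmpeg and CCExtractor as the tools; the deck does not name the actual scripts or software used.

```python
# Illustrative sketch (not the archive's actual scripts): build the
# post-digitization commands for one captured MPEG-2 file, assuming
# ffmpeg for H.264 compression and CCExtractor for caption extraction.
from pathlib import Path

def postprocess_commands(mpeg2_file: str) -> list[list[str]]:
    """Return shell commands to compress a capture to H.264 and
    extract its closed captions to a sidecar subtitle file."""
    src = Path(mpeg2_file)
    mp4 = src.with_suffix(".mp4")   # compressed output
    srt = src.with_suffix(".srt")   # extracted captions
    compress = ["ffmpeg", "-i", str(src),
                "-c:v", "libx264", "-crf", "23",  # quality-targeted H.264
                "-c:a", "aac", str(mp4)]
    captions = ["ccextractor", str(src), "-o", str(srt)]
    return [compress, captions]

# Hypothetical filename for illustration
cmds = postprocess_commands("KABC_1994-05-12.mpg")
```

Building the commands as argument lists (rather than shell strings) keeps filenames with spaces or quotes safe to pass along.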
10. Progress
• Fall 2015: Process design & workstation configuration testing
• Winter 2016: Hired students, fixed network issues, and ramped up workstations.
• Summer 2016: FileMaker inventory control; two daily shifts.
• March-Oct 2016: 4.5k tapes encoded (about 36k hours).
• Delaying splitting files into shows.
11. Problems: Lots
• Recording/playback VCRs out of spec (solution: aggregating quality-control data to match each tape with a VCR that plays it back well, its “love connection”)
• Varying program names over time (database tracks alternate show names, plus day/time/channel 30-minute blocs)
• Buzzing audio? Computer RF interference with VCR audio. (Used dead VCRs as spacers.)
• Lots of other problems, but in most cases they just mean another encoding attempt.
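The quality-control aggregation idea above can be sketched as follows: pool pass/fail results across digitization attempts to spot out-of-spec decks and to see which VCR plays a given tape cleanly. The data and field names here are invented for illustration.

```python
# Toy sketch of quality-control aggregation (invented data): find
# failing VCRs and tape/VCR pairings that produced a clean encode.
from collections import defaultdict

attempts = [  # (tape_barcode, vcr_id, passed_quality_check)
    ("T0001", "VCR-03", False),
    ("T0001", "VCR-07", True),
    ("T0002", "VCR-03", False),
    ("T0002", "VCR-11", True),
    ("T0003", "VCR-07", True),
]

def failure_rate_by_vcr(rows):
    """Fraction of attempts on each VCR that failed quality control."""
    totals, fails = defaultdict(int), defaultdict(int)
    for _, vcr, ok in rows:
        totals[vcr] += 1
        if not ok:
            fails[vcr] += 1
    return {v: fails[v] / totals[v] for v in totals}

def working_pairings(rows):
    """Map each tape to the VCRs that produced a passing encode."""
    pairs = defaultdict(list)
    for tape, vcr, ok in rows:
        if ok:
            pairs[tape].append(vcr)
    return dict(pairs)

rates = failure_rate_by_vcr(attempts)  # VCR-03 fails every attempt here
pairs = working_pairings(attempts)
```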
12. Post 2006
Straight-to-Digital
• 46 networks (US and beyond)
• 2,525 Series
• Total video files: 383,550
• Duration in hours: 297,596
• Closed caption files: 383,739
• Words in caption files: 2,419,185,351
• OCR files: 371,426
• Words in OCR files: 825,662,597
• Total thumbnail images: 107,134,425
• Storage: 106.93 terabytes
• Limited public access link: tvnews.library.ucla.edu
13. Unlocking the Content
• Preservation is just the first step: Needs to be more than the "world’s best DVR"
• Want to provide tools to make the collection more useful and relevant beyond UCLA: Help people
• Understand TV news, and…
• Share what they find
14. Example: Obama
Good to know, but…
15. Tools to Understand News
• Not just view, but analyze
• Help understand and visualize patterns of news coverage, not just individual stories. Forest, not just trees.
• Tools are already being developed, but are complex
16. Tools to Understand Text
• Not just view, but analyze
• Help understand and visualize patterns of news coverage, not just individual stories (copyright, too)
• Tools are already being developed, but are complex
17. Ambitious Goal: Visuals
• Text analysis is fairly mature (more than 2 billion words in the NewsScape index)
• Named entities, parts of speech, and topic detection are all working now (sentiment is harder)
• Analysis of visuals is challenging.
• Facial detection & analysis tools are becoming more useful and scalable
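As a crude illustration of the kind of text tagging mentioned above (the archive's actual NLP pipeline is not described in the deck), even a simple pattern over caption text can surface named-entity candidates:

```python
# Very rough named-entity candidates from caption text: runs of
# consecutive capitalized words. A real pipeline would use proper
# NER models; this only illustrates the idea.
import re

def candidate_entities(caption: str) -> list[str]:
    """Return maximal runs of capitalized words as entity candidates."""
    pattern = r"(?:[A-Z][a-zA-Z]+)(?:\s+[A-Z][a-zA-Z]+)*"
    return re.findall(pattern, caption)

ents = candidate_entities("President Obama spoke in Los Angeles about the economy")
```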
18. Automated Analysis of Visuals
• Goal: Be able to understand patterns of visual communication in election news. Hard to study.
• Mostly hand-coded & focused on still newspaper or web photos
• Trouble scaling to the massive volume of images.
• Subjectivity
• Machine learning and big data as a solution
• Presented a pilot study at this year’s American Political Science Association conference categorizing presidential candidate faces (smiling or not).
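The classification task in that pilot study can be sketched in miniature. The feature (mouth-corner lift), the training data, and the 1-nearest-neighbor model here are all invented for illustration; the study's actual features and model are not given in the deck.

```python
# Toy sketch of binary face classification ("smiling" vs. "not smiling")
# using 1-nearest-neighbor on a single invented feature: mouth-corner lift.
labeled = [  # (mouth_corner_lift, label); invented training data
    (0.9, "smiling"), (0.7, "smiling"),
    (0.1, "not smiling"), (0.2, "not smiling"),
]

def classify(lift: float) -> str:
    """Return the label of the nearest labeled example."""
    nearest = min(labeled, key=lambda ex: abs(ex[0] - lift))
    return nearest[1]

label = classify(0.8)  # a face whose feature is near the smiling examples
```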
22. Weekly topic tracking (filtered by outlet) with metadata (who, what, where, how much)
Daily topic trajectory: news topics are detected by clustering every day, and the detected topics are then linked across days to generate topic-tracking trajectories (Li, Joo, Qi, & Zhu, 2015)
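The linking step described above can be sketched as follows. This is not the method of Li, Joo, Qi, & Zhu (2015): the per-day topics are represented as keyword sets, and Jaccard similarity with an arbitrary threshold is an assumed stand-in for their linking criterion.

```python
# Minimal sketch of cross-day topic linking: each day's detected topics
# are keyword sets; link each topic to its most similar topic from the
# previous day to build a trajectory. Data and threshold are invented.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def link_days(day1_topics: dict, day2_topics: dict, threshold: float = 0.3) -> dict:
    """Link each day-2 topic to its best-matching day-1 topic, if any."""
    links = {}
    for t2, words2 in day2_topics.items():
        best = max(day1_topics, key=lambda t1: jaccard(day1_topics[t1], words2))
        if jaccard(day1_topics[best], words2) >= threshold:
            links[t2] = best
    return links

monday = {"debate": {"obama", "debate", "romney"}, "storm": {"storm", "flood"}}
tuesday = {"t1": {"debate", "romney", "polls"}, "t2": {"wildfire", "drought"}}
links = link_days(monday, tuesday)  # "t1" continues Monday's "debate" topic
```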
25. Other Goal: Sharing
• Help people share what they learn (within the bounds of copyright)
• Solution #1: Share analysis, rather than raw material
• Solution #2: Use familiar, copyright-compliant tools to create and share.
26. Social Sharing Tools
• Trying to develop two tools:
• Animated GIF generator (short clip; small file; on-screen captioning; easy to play)
• "Supercut" generator: Assemble short examples from the archive; share the compilation
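A captioned-GIF generator of the kind described might shell out to a command like the one built below. This is a hypothetical sketch assuming ffmpeg (which the deck does not name), with an invented filename and caption; the archive's actual tool is not shown.

```python
# Hypothetical sketch: build an ffmpeg command that cuts a short clip,
# burns caption text onto the frames, and writes a small animated GIF.
def gif_command(video: str, start: str, seconds: int, caption: str, out: str) -> list[str]:
    """Return the ffmpeg argument list for a captioned GIF clip."""
    drawtext = f"drawtext=text='{caption}':fontcolor=white:x=10:y=h-40"
    return ["ffmpeg", "-ss", start, "-t", str(seconds), "-i", video,
            "-vf", drawtext + ",scale=320:-1",  # shrink for small file size
            out]

# Invented example inputs
cmd = gif_command("newscast.mp4", "00:12:05", 4, "Breaking news tonight", "clip.gif")
```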
27. Summing Up
• Preservation as the goal, but also as a starting point.
• Excited to be able to understand long-term changes in news content & norms.
• Lots of work ahead of us.
• We appreciate any help or advice (or funding) you can offer.