Software Studies Initiative - 2007
NEH Office for Digital Humanities - 2008
NEH Humanities High Performance Computing - 2008
NEH/NSF Digging Into Data competition - 2009
Computational Social Science - 2009
Culturomics and Google n-gram viewer - 2010
New York Times: “The next big idea in language,
history and the arts? Data.” - 2010
How can we take advantage of unprecedented
amounts of cultural data available on the web
and digitized cultural heritage to begin analyzing
cultural processes in new ways?
How can computational analysis of
massive cultural datasets and real-time flows
help us develop theories and methods in the
humanities adequate for the scale and speed of
the 21st century global networked digital
NEH/NSF Digging into Data competition (2009):
“How does the notion of scale affect
humanities and social science research?
Now that scholars have access to huge
repositories of digitized data—far more than
they could read in a lifetime—what does that
mean for research?”
big cultural data?
1 study societies through the social media
traces (social computing)
2 more inclusive understanding of cultural
history and present (using much larger
3 detect large scale cultural patterns
4 generate multiple maps of the same cultural
data sets (multiple “landscapes”)
5 the best way to follow global professionally
produced digital culture; understand newly
developed cultural fields (“X” design)
6 map cultural variability and diversity
Example - graph from Ted Underwood, “The Differentiation of Literary
and Nonliterary Diction, 1700-1900.” Data: 3,724 18th century volumes,
using 10,000 most frequent words (excluding proper nouns).
modern (19th-20th centuries) social and
cultural theory: describe what is similar
(classes, structures, types) / statistics
computational humanities and social science
should focus on describing what is different /
variability / diversity
“from data to knowledge” is wrong. In the
study of culture, we need to go from our
(incomplete, biased) knowledge to actual
“We are no longer interested in the conformity
of an individual to an ideal type; we are now
interested in the relation of an individual to the
other individuals with which it interacts...
Relations will be more important than
categories; functions, which are variable, will
be more important than purposes; transitions
will be more important than boundaries;
sequences will be more important than
Louis Menand on Darwin, 2001.
without “large” categories
Manuel DeLanda:
“The ontological status of assemblages, large
and small, is always that of unique, singular
“Unlike taxonomic essentialism in which
genus, species and individuals are separate
ontological categories, the ontology of
assemblages is flat since it contains nothing
but differently scaled individual singularities.”
source: A New Philosophy of Society.
“The ‘whole’ is now nothing more than a
provisional visualization which can be
modified and reversed at will, by moving back
to the individual components, and then
looking for yet other tools to regroup the same
elements into alternative assemblages.”
source: “Tarde’s idea of quantification.” In
The Social After Gabriel Tarde: Debates and
How to study big cultural
visual data in practice?
How to explore massive visual collections
(exploratory media analysis)?
Which data analysis and visualization
techniques are appropriate for non-technical
users? How to democratize data analysis?
collection sorted using
metadata and/or extracted
infovis: data into pictures
mediavis: pictures into pictures
left: scatter plot
right: media visualization (image plot) of the same data
our media visualization software on 287 megapixel display (image: 1 million manga pages)
our media visualization software on newer
display wall with thin bezels
(data: 4535 Time magazine covers)
mediavis - related research:
M. Worring, G.P. Nguyen. Interactive access to large
image collections using similarity-based visualization.
Journal of Visual Languages and Computing 19 (2008)
Gerald Schaefer. Interactive Browsing of Image
Repositories. ICVG 2012.
Jing et al., Google Inc. Google Image Swirl: A Large-Scale
Content-Based Image Visualization System. WWW 2012.
mediavis vs. normal
computer science approach:
borrow techniques from media art, digital art,
information visualization / for non-technical users
explore the possibilities of simplest techniques by
using them with media collections from every area
use mediavis to challenge existing concepts and
assumptions of humanities
Basic media visualization
1 montage: sort images using metadata
2 slice: sample images and arrange using
3 image plot: automatically measure image
properties (features) and organize in 2D using
these measurements and metadata
montage: sort images
4535 Time covers, 1923-2009
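A minimal sketch of the montage idea, assuming each image comes with a metadata record holding a date: sort the records by that date, then lay them out left to right, top to bottom on a grid. The function name `montage_layout`, the record keys `file` and `date`, and the tile sizes are hypothetical and are not taken from the project's software.

```python
from datetime import date

def montage_layout(records, columns, tile_w, tile_h):
    """Sort records by date and give each one an (x, y) pixel position on a grid."""
    ordered = sorted(records, key=lambda r: r["date"])
    layout = []
    for i, rec in enumerate(ordered):
        row, col = divmod(i, columns)  # fill the grid row by row
        layout.append((rec["file"], col * tile_w, row * tile_h))
    return layout

# Hypothetical metadata for three covers, deliberately out of order.
covers = [
    {"file": "cover_c.jpg", "date": date(2009, 6, 1)},
    {"file": "cover_a.jpg", "date": date(1923, 3, 3)},
    {"file": "cover_b.jpg", "date": date(1965, 1, 15)},
]
layout = montage_layout(covers, columns=2, tile_w=100, tile_h=130)
for name, x, y in layout:
    print(name, x, y)
```

Any other metadata field (artist, genre, issue number) could serve as the sort key; only the `key` function would change.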
1 montage close up: Time magazine covers, 1920s
1 montage close up: Time magazine covers, 1990s-2000s
slice: sample images and arrange using metadata
4535 Time covers, 1923-2009. Each line is a vertical slice through the center of an image.
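The slice technique described in this caption can be sketched as follows, assuming each image is available as a list of pixel rows and the images are already sorted by metadata. The helper names `center_slice` and `slice_visualization` are hypothetical; a real pipeline would read pixels with an imaging library rather than from nested lists.

```python
def center_slice(image):
    """Return the center column of an image given as a list of pixel rows."""
    mid = len(image[0]) // 2
    return [row[mid] for row in image]

def slice_visualization(images):
    """Place the center columns side by side: one output column per input image."""
    columns = [center_slice(img) for img in images]
    height = min(len(c) for c in columns)  # crop to the shortest image
    return [[c[y] for c in columns] for y in range(height)]

# Two tiny 2x3 "images" with made-up pixel values.
img_a = [[1, 2, 3],
         [4, 5, 6]]
img_b = [[7, 8, 9],
         [10, 11, 12]]
result = slice_visualization([img_a, img_b])
print(result)
```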
Time covers slice close-up
3 image plot: organize images using features and
Image plots of 4535 Time covers, 1923-2009. X-axis = date; Y-axis = saturation mean.
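As a rough illustration of how such plot coordinates might be computed, the sketch below pairs each image's year (metadata) with its mean saturation (a measured feature). It assumes saturation is the S channel of the HSV color model and that pixels are (r, g, b) tuples in the 0-255 range; the names `mean_saturation` and `image_plot_points` and the sample records are hypothetical.

```python
import colorsys

def mean_saturation(pixels):
    """Mean HSV saturation (0..1) over a list of (r, g, b) pixels in 0..255."""
    total = 0.0
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += s
    return total / len(pixels)

def image_plot_points(records):
    """Return (x, y) plot coordinates: x = year, y = mean saturation."""
    return [(rec["year"], mean_saturation(rec["pixels"])) for rec in records]

# Made-up two-pixel "images": one gray and white, one pure red and white.
records = [
    {"year": 1923, "pixels": [(128, 128, 128), (255, 255, 255)]},
    {"year": 2009, "pixels": [(255, 0, 0), (255, 255, 255)]},
]
points = image_plot_points(records)
print(points)
```

In an image plot, each image thumbnail would then be drawn at its (x, y) point instead of a dot.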
Time covers image plot close-up
Comparing a number of image sets with image plots
Selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis =
median hue. Megan O’Rourke, 2012.
11th Year (Dziga Vertov, 1928): first frame of every shot
11th Year (Dziga Vertov, 1928): comparing first
and last frame in every shot (close-ups from
the larger visualization)
Why use numbers?
Using numbers to describe
cultural artifacts allows to
categories (words) with
1 from timelines to graphs
2 better represent analog attributes
of cultural artifacts
3 map cultural landscapes (fuzzy /
overlapping / hard clusters?)
4 visualize cultural variability
5 discover new groupings
1 from timelines to curves
Mark Rothko, 393 paintings (1927-1970).
X - year. Y - brightness mean. Hao Wang and Mayra Vasquez.
2 better represent analog attributes of cultural artifacts
close-up of a visualization showing average amount of
visual change (bar graph) in every shot in Vertov’s
11th Year. Images above the bar: first frame of every
To measure visual change per shot:
1) calculate brightness mean of the difference image
between each two frames in the shot
2) add all means
3) divide by number of frames in the shot
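The three steps above can be sketched as follows. The sketch assumes that "each two frames" means consecutive frames, that the difference image is the absolute per-pixel difference, and that frames are equally sized grayscale images stored as lists of rows; none of these details are stated on the slide. It follows step 3 literally and divides by the number of frames, not by the number of frame pairs. The function names are hypothetical.

```python
def difference_mean(frame_a, frame_b):
    """Step 1: mean brightness of the absolute difference image of two frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def visual_change(shot):
    """Steps 2 and 3: add all difference means, divide by the number of frames."""
    means = [difference_mean(a, b) for a, b in zip(shot, shot[1:])]
    return sum(means) / len(shot)

# A made-up shot of three 1x2 grayscale frames: one change, then no change.
shot = [
    [[0, 0]],
    [[10, 20]],
    [[10, 20]],
]
print(visual_change(shot))
```

A shot with no movement scores 0, and the score grows with the amount of frame-to-frame change, which is what the bar heights in the visualization encode.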
million manga pages
x - standard deviation
y - entropy
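The slide does not say how the two axes were computed. One plausible reading, sketched here, is the standard deviation of a page's grayscale values for x and the Shannon entropy of its grayscale histogram for y. Both choices, and the function names `gray_std` and `gray_entropy`, are assumptions.

```python
import math
from collections import Counter
from statistics import pstdev

def gray_std(pixels):
    """Population standard deviation of grayscale values (0..255)."""
    return pstdev(pixels)

def gray_entropy(pixels):
    """Shannon entropy, in bits, of the grayscale value histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up flattened "pages": one uniformly gray, one half black and half white.
flat_page = [128, 128, 128, 128]
contrast_page = [0, 255, 0, 255]
for page in (flat_page, contrast_page):
    print(gray_std(page), gray_entropy(page))
```

Under these definitions, a page with little tonal variation lands near the origin, while pages with many distinct gray levels and strong contrast move up and to the right.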
single short manga series
< 1000 pages
776 Vincent van Gogh paintings. X - year/month. Y - brightness mean.
Current / recent projects
6000+ paintings of French Impressionists
7000-year-old stone arrowheads
(with UCSD anthropologist)
samples from 4.7 million newspaper pages
collection from Library of Congress (UCSD
virtual world / game analytics (funded by NSF
Eager, with UCSD Experimental Games Lab)
comparing Art Now & Graphic design Flickr
groups (340,000 images)
(with CS collaborator from Lawrence Berkeley
Big project supported by Mellon Foundation
- tools and workflows for working with image
and video collections using SEASR / MEANDRE
digital humanities workflow platform
1) 1+ million images + millions of metadata
records from deviantArt (the largest social
network for user-created art - 20 M users, 240 M
2) 1+ million manga pages.
3) thousands of hours of TV political news and
digital humanities (working
with digitized collections of
vs. computational humanities
(using social web data)
“The capacity to collect and analyze massive amounts
of data has transformed such fields as biology and
physics. But the emergence of a data-driven
'computational social science' has been much slower.
Leading journals in economics, sociology, and political
science show little evidence of this field. But
computational social science is occurring in Internet
companies such as Google and Yahoo, and in
government agencies such as the U.S. National
“Computational Social Science.” Science, vol. 323, no.
6, February 2009.
Massive amounts of cultural content and online
conversations, opinions, and cultural activities
(general and specialized social media networks;
personal and professional web sites).
This data offers us unprecedented opportunities to
understand cultural processes and their dynamics
and develop new concepts and models that can
also be used to better understand the past.
Currently only analyzed by Google, Facebook,
YouTube, Bluefin Labs, The Echo Nest, and other
companies, and computer scientists working in
“social computing” - not yet by humanists.
Our free open source software tools for
analyzing and visualizing large image and
video collections, publications and
The tools run on Mac, PC, Unix.
All media visualizations in this presentation
were created by members of Software