How and why study big cultural data v2

How and why study big
visual cultural data

Dr. Lev Manovich
Professor, CUNY Graduate Center
manovich.lev@gmail.com
softwarestudies.com

Fall 2012 version


Software Studies Initiative - 2007

NEH Office for Digital Humanities - 2008

NEH Humanities High Performance Computing - 2008

NEH/NSF Digging Into Data competition - 2009

Computational Social Science - 2009

Culturomics and Google Ngram Viewer - 2010

New York Times: “The next big idea in language,
history and the arts? Data.” - 2010


How can we take advantage of unprecedented
amounts of cultural data available on the web
and digitized cultural heritage to begin analyzing
cultural processes in new ways?

How can computational analysis of massive
cultural datasets and real-time flows help us
develop theories and methods in the
humanities adequate for the scale and speed of
21st-century global networked digital
culture?


NEH/NSF Digging into Data competition (2009):

“How does the notion of scale affect
humanities and social science research?
Now that scholars have access to huge
repositories of digitized data—far more than
they could read in a lifetime—what does that
mean for research?”


Why study
big cultural data?


1 study societies through the social media
traces (social computing)

2 more inclusive understanding of cultural
history and present (using much larger
samples)

3 detect large scale cultural patterns


4 generate multiple maps of the same cultural
data sets (multiple “landscapes”)

5 the best way to follow global professionally
produced digital culture; understand newly
developed cultural fields (“X” design)

6 map cultural variability and diversity


Example - graph from Ted Underwood, “The Differentiation of Literary
and Nonliterary Diction, 1700-1900.” Data: 3,724 18th century volumes,
using 10,000 most frequent words (excluding proper nouns).


modern (19th-20th centuries) social and
cultural theory: describe what is similar
(classes, structures, types) / statistics
(reduction)

computational humanities and social science
should focus on describing what is different /
variability / diversity

“from data to knowledge” is wrong. In the
study of culture, we need to go from our
(incomplete, biased) knowledge to actual
cultural data


“We are no longer interested in the conformity
of an individual to an ideal type; we are now
interested in the relation of an individual to the
other individuals with which it interacts...
Relations will be more important than
categories; functions, which are variable, will
be more important than purposes; transitions
will be more important than boundaries;
sequences will be more important than
hierarchies.”

Louis Menand on Darwin, 2001.


Visualization: Thinking
without “large” categories


Manuel De Landa:
“The ontological status of assemblages, large
and small, is always that of unique, singular
individuals.”

“Unlike taxonomic essentialism in which
genus, species and individuals are separate
ontological categories, the ontology of
assemblages is flat since it contains nothing
but differently scaled individual singularities.”

source: A New Philosophy of Society.


Bruno Latour:
“The ‘whole’ is now nothing more than a
provisional visualization which can be
modified and reversed at will, by moving back
to the individual components, and then
looking for yet other tools to regroup the same
elements into alternative assemblages.”

source: “Tarde’s idea of quantification.” In
The Social After Gabriel Tarde: Debates and
Assessments.


How to study big cultural
visual data in practice?
How to explore massive visual collections
(exploratory media analysis)?

Which data analysis and visualization
techniques are appropriate for non-technical
users? How to democratize data analysis?


Our methodology:
media visualization

display complete
collection sorted using
metadata and/or extracted
features


infovis: data into pictures

mediavis: pictures into pictures


left: scatter plot
right: media visualization (image plot) of the same data


our media visualization software on 287 megapixel display (image: 1 million manga pages)

our media visualization software on newer
display wall with thin bezels
(data: 4535 Time magazine covers)


mediavis - related research:
M. Worring, G.P. Nguyen. Interactive access to large
image collections using similarity-based visualization.
Journal of Visual Languages and Computing 19 (2008)
(submitted 2005).

Gerald Schaefer. Interactive Browsing of Image
Repositories. ICVG 2012.

Jing et al., Google Inc. Google Image Swirl: A Large-Scale
Content-Based Image Visualization System. WWW 2012.


mediavis vs. normal
computer science approach:
borrow techniques from media art, digital art,
information visualization / for non-technical users

explore the possibilities of simplest techniques by
using them with media collections from every area
of humanities

use mediavis to challenge existing concepts and
assumptions of humanities


Basic media visualization
techniques:
1 montage: sort images using metadata

2 slice: sample images and arrange using
metadata

3 image plot: automatically measure image
properties (features) and organize in 2D using
these measurements and metadata


1
montage: sort images
using metadata

4535 Time covers, 1923-2009
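
A minimal sketch of the montage layout, assuming each image comes with a metadata record. The function name, the field names ("name", "year"), and the sample records are hypothetical and not taken from our tools. The sketch only computes where each thumbnail goes; drawing is left out.

```python
def montage_layout(records, sort_key, columns, cell_width, cell_height):
    """Return (name, x, y) positions for a grid montage sorted by one metadata field."""
    ordered = sorted(records, key=lambda rec: rec[sort_key])
    layout = []
    for index, rec in enumerate(ordered):
        # Fill the grid left to right, then top to bottom.
        row, col = divmod(index, columns)
        layout.append((rec["name"], col * cell_width, row * cell_height))
    return layout


# Hypothetical records: file names and years are made up for illustration.
covers = [
    {"name": "cover_c.jpg", "year": 1990},
    {"name": "cover_a.jpg", "year": 1923},
    {"name": "cover_b.jpg", "year": 1950},
]
positions = montage_layout(covers, "year", columns=2, cell_width=100, cell_height=130)
```

Sorting by a different metadata field (for example, publication country instead of year) gives a different montage of the same collection.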


1 montage close up: Time magazine covers, 1920s


1 montage close up: Time magazine covers, 1990s-2000s


2
slice: sample images and arrange using metadata

4535 Time covers, 1923-2009. Each line is a vertical slice through the center of an image.
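
A minimal sketch of the slice technique, assuming each image is a list of pixel rows and the images are already sorted by date. The function names and the tiny sample images are hypothetical.

```python
def center_slice(image):
    """Return the one-pixel-wide vertical column through the center of an image."""
    middle = len(image[0]) // 2
    return [row[middle] for row in image]


def slice_visualization(images):
    """Place the center column of every image side by side, in the given order."""
    columns = [center_slice(image) for image in images]
    # Assumption: crop to the shortest image so the output rows are all the same length.
    height = min(len(column) for column in columns)
    return [[column[y] for column in columns] for y in range(height)]


# Two made-up 2x3 grayscale images (rows of pixel values).
images = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
strip = slice_visualization(images)
```

The result has one column per image, so a collection of 4535 covers becomes a picture 4535 pixels wide.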


Time covers slice close-up


3 image plot: organize images using features and
(optionally) metadata

Image plots of 4535 Time covers, 1923-2009. X-axis = date; Y-axis = saturation mean.
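
A minimal sketch of how image plot coordinates like these could be computed, assuming each image is a flat list of (r, g, b) pixels with values 0-255. The function names, field names, and sample pixels are hypothetical; saturation here is the HSV saturation from Python's standard colorsys module, which may differ from the measure used in our tools.

```python
import colorsys


def saturation_mean(pixels):
    """Mean HSV saturation (0.0 to 1.0) over a list of (r, g, b) pixels, 0-255 each."""
    total = 0.0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += saturation
    return total / len(pixels)


def image_plot_coordinates(collection):
    """Map every record to (x, y): x = year from metadata, y = measured saturation mean."""
    return [(rec["year"], saturation_mean(rec["pixels"])) for rec in collection]


# Hypothetical two-pixel "covers", only to show the data flow.
covers = [
    {"year": 1923, "pixels": [(128, 128, 128), (200, 200, 200)]},  # gray pixels
    {"year": 2009, "pixels": [(255, 0, 0), (255, 128, 128)]},      # red pixels
]
coords = image_plot_coordinates(covers)
```

Each image is then drawn at its (x, y) position, so the plot shows the images themselves rather than points.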


Time covers image plot close-up


Comparing a number of image sets with image plots

Selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis =
median hue. Megan O’Rourke, 2012.


visualizing video
collections:

use media visualization with a set of
keyframes

automatic selection of key frames
(for example, using free shot detection
software)


Kingdom Hearts video game
62.5 hr. of game play, 29 sessions over 20 days.
montage: 1 frame per 3 sec (22500 frames in total)


11th Year (Dziga Vertov, 1928): first frame of every shot


11th Year (Dziga Vertov, 1928): comparing first
and last frame in every shot (close-ups from
the larger visualization)


Why use numbers?

Using numbers to describe
cultural artifacts allows us to
replace discrete
categories (words) with
continuous descriptions
(curves)

1 from timelines to graphs

2 better represent analog attributes
of cultural artifacts

3 map cultural landscapes (fuzzy /
overlapping / hard clusters?)

4 visualize cultural variability

5 discover new groupings

1 from timelines to curves

Mark Rothko, 393 paintings (1927-1970).
X - year. Y - brightness mean. Hao Wang and Mayra Vasquez.


2 better represent analog attributes of cultural artifacts

Next slide:
close-up of a visualization showing the average amount of
visual change (bar graph) in every shot in Vertov’s
11th Year. Images above the bars: first frame of every
shot.

To measure visual change per shot:
1) calculate brightness mean of the difference image
between each two frames in the shot
2) add all means
3) divide by number of frames in the shot
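
The three steps above can be sketched as follows. This is a minimal illustration, not our actual tool code. Assumptions: frames are grayscale images stored as lists of pixel rows, "each two frames" means each pair of consecutive frames, and the difference image uses absolute differences. The function names are hypothetical.

```python
def mean_abs_difference(frame_a, frame_b):
    """Step 1: brightness mean of the (absolute) difference image of two frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def visual_change(frames):
    """Average visual change in one shot, following steps 1-3."""
    if not frames:
        raise ValueError("a shot needs at least one frame")
    # Step 1: one mean per pair of consecutive frames (assumed pairing).
    means = [mean_abs_difference(a, b) for a, b in zip(frames, frames[1:])]
    # Steps 2 and 3: add all means, then divide by the number of frames in the shot.
    return sum(means) / len(frames)


# A made-up shot of three 2x2 frames.
shot = [
    [[0, 0], [0, 0]],
    [[10, 10], [10, 10]],
    [[10, 10], [30, 30]],
]
change = visual_change(shot)
```

A shot of n frames has n - 1 consecutive pairs, so dividing by n, as in step 3, slightly lowers the value for very short shots.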


3 the maps of cultural landscapes reveal fuzzy and
overlapping clusters - rather than discrete categories
with hard boundaries

4 visualize the space of variations
600 variations of Google Logo, 1998-2009


Studying massive
data sets challenges our
existing theoretical
concepts and assumptions

example: what is “style”?


image plot of one million manga pages
x - standard deviation
y - entropy
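
A minimal sketch of the two features used on the axes, assuming they are computed over the grayscale pixel values of a page (a flat list of integers 0-255). Entropy here is Shannon entropy of the gray-level histogram in bits; the exact definitions used for the plot may differ. The function names and sample values are hypothetical.

```python
import math
from collections import Counter


def std_dev(values):
    """Population standard deviation of grayscale pixel values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def entropy(values):
    """Shannon entropy (bits) of the gray-level histogram."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Made-up pages: one pure black-and-white, one uniform gray.
high_contrast_page = [0, 0, 255, 255]
flat_page = [7, 7, 7, 7]
page_features = (std_dev(high_contrast_page), entropy(high_contrast_page))
```

Pages with few gray tones and strong black/white contrast get high standard deviation and low entropy; pages with many different gray tones get higher entropy.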


distribution of
one million manga pages

x - standard deviation
y - entropy


single short manga series
< 1000 pages


776 Vincent van Gogh paintings. X - year/month. Y - brightness mean.


Current / recent projects
at softwarestudies.com:
6000+ paintings of French Impressionists

7000-year-old stone arrowheads
(with UCSD anthropologist)


samples from 4.7 million newspaper pages
collection from Library of Congress (UCSD
undergraduate students)

virtual world / game analytics (funded by NSF
EAGER, with UCSD Experimental Games Lab)

comparing Art Now & Graphic design Flickr
groups (340,000 images)
(with CS collaborator from Lawrence Berkeley
National Laboratory)


Big project supported by Mellon Foundation
Grant, 2012-2015

- tools and workflows for working with image
and video collections using SEASR / MEANDRE
digital humanities workflow platform

- applications:
1) 1+ million images + millions of metadata
records from deviantArt (the largest social
network for user-created art - 20 M users, 240 M
artworks).
2) 1+ million manga pages.
3) thousands of hours of TV political news and
online video

Postscript:

digital humanities (working
with digitized collections of
historical artifacts)
vs. computational humanities
(using social web data)


“The capacity to collect and analyze massive amounts
of data has transformed such fields as biology and
physics. But the emergence of a data-driven
'computational social science' has been much slower.
Leading journals in economics, sociology, and political
science show little evidence of this field. But
computational social science is occurring in Internet
companies such as Google and Yahoo, and in
government agencies such as the U.S. National
Security Agency.”

“Computational Social Science.” Science, vol. 323, no.
6, February 2009.


Massive amounts of cultural content and online
conversations, opinions, and cultural activities
(general and specialized social media networks;
personal and professional web sites).
This data offers us unprecedented opportunities to
understand cultural processes and their dynamics
and develop new concepts and models which can be
also used to better understand the past.

Currently only analyzed by Google, Facebook,
YouTube, Bluefin Labs, Echo Nest, and other
companies, and computer scientists working in
“social computing” - not yet by humanists.


manovich.lev@gmail.com

softwarestudies.com


Our free open source software tools for
analyzing and visualizing large image and
video collections, publications and
projects:

softwarestudies.com

The tools run on Mac, PC, Unix.

All media visualizations in this presentation
were created by members of Software

