foundations and seemingly
prof. dr. Richard Rogers, University of Amsterdam
Situating Digital Methods
Exploring Popular Claims about New
Media with Digital Methods
Web 1.0 vs. Web 2.0 Analysis
Multiple Times Online
The Web as a Problem for Content Analysis
digital methods I.
2000-2007 Virtual Methods
2007- Digital Methods
digital methods II.
The approaches within the contemporary
Virtual methods - Digitized methods
Cultural analytics - Digitized content
Digital methods - Natively digital method
for social and cultural research
Lonneke van der Velden, PhD candidate, University of Amsterdam
what is facebook activism?
Slacktivism: “The internet has made “activism” as easy and non-
committal as clicking a mouse button” (Nationalpost.com)
Anti-Facebook: “Facebook, stop invading my privacy”, Web 2.0
suicide machine, privacy enhancing scripts (New media studies)
Social networking: “Facebook and other technologies are just
tools to reach out to them [the real social networks, lvdv].” (Advocates for
Science & Technology for the People)
“Facebook isn’t designed for activism”
“The services may be free, but they have not
been designed to suit your needs as an activist
organizer. This means you will find that the
site's functionality does not always match what
you need. You will have to stretch what is there
in order to be effective.” Dan Schulz, Digiactive.org (2008)
What kind of activism does Facebook enable?
Which issues do well on Facebook?
What kind of language or practices do they exhibit?
What kind of action is advocated?
Facebook-specific ways of phrasing? Instead of issue terms.
Investigate ‘stance’ keywords in page and group titles:
Google Scraper: intitle:support site:facebook.com/group
“anti ” OR “anti-” | against | stop | protest | opposition |
oppose | resistance | resist | halt | refuse | “object to” |
objection | “pro ” OR “pro-” | support | help
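The stance-keyword queries above can be assembled programmatically. This is a minimal sketch, assuming the query pattern `intitle:<keyword> site:facebook.com/group` shown on the slide. The helper name `build_stance_queries` is hypothetical, the "anti " / "anti-" spelling variants are collapsed into one keyword for brevity, and the Google Scraper itself is not modelled.

```python
# Stance keywords taken from the slide; the helper name is hypothetical.
STANCE_KEYWORDS = [
    "anti", "against", "stop", "protest", "opposition", "oppose",
    "resistance", "resist", "halt", "refuse", "object to", "objection",
    "pro", "support", "help",
]

def build_stance_queries(keywords, site="facebook.com/group"):
    """Return one query string per keyword, quoting multi-word phrases."""
    queries = []
    for kw in keywords:
        # Multi-word phrases need quotes so they are matched as a unit.
        term = f'"{kw}"' if " " in kw else kw
        queries.append(f"intitle:{term} site:{site}")
    return queries

for query in build_stance_queries(STANCE_KEYWORDS):
    print(query)
```

Each printed line can then be pasted into whatever scraping tool is used for the actual retrieval.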
Google scraper results for the top 100 groups
for “anti”, “pro”, “stop”, and “support” in the
Query the top 35 for member count with “of *
members”, to see which groups are the
Note: remove artifacts!
copy paste decoding.
To what or to whom is the group directed? | Title/info keywords following anti, pro, etc.
What kind of category does the group submit itself to? | The group’s category in response to Facebook’s settings
What kind of category can we assign to the group? | Intuitively composed, based on keywords (what is the group
What kind of engagement or action format does the group suggest? | The group’s explanation of how to ‘stop’, ‘support’
What do groups want (users) to do? ‘How’ to
join, sign, pray, raise awareness, extend
Collated into overlapping action formats
Emerging issue language from the actors’ own
calls for action.
Overall, tending towards lightweight engagement and network spread
features of Facebook (learn, join, awareness)
Anti: more action oriented, position of protest
Pro: awareness, spreading
Stop: joining & petitions – specific protests, short term, explicit need
for names on paper
Support: solidarity, letter writing
First step towards a different analysis: don’t apply a predefined
notion of activism to your objects or compare object with predefined
Instead: let’s stay close to the objects’ definitions and their own
language: what do they call for?
Further research: Bigger samples, interlinking other issue arenas
within Facebook, associations with sites external to facebook (ie.
Discussion: Stretching Facebook?
investigating facebook activism.
Research: Clare Lee, Esther Weltevrede, Lonneke van der Velden
the myth of data-driven
Catalina Iorga, RMA student in Media Studies, University of Amsterdam
open data and its benefits.
• “Transparency is at the heart of this Government.” (data.gov.uk,
Democratisation and Citizen Empowerment
• “empowering people” (data.gov, 2010)
• “these applications arm citizens with the information they need
to make decisions every day” (data.gov, 2010)
• Open source software (Jonathan Gray, Open Knowledge, 2010)
Large datasets available online
• The Afghan War Diary 2004 - 2010 (WikiLeaks, 2010)
Information visualisation tools
• Guardian Data Explorer (Tony Hirst)
A narrative powered by Web 2.0
• Using the Web “to tell a story, not just as a delivery
medium” (Alan Maclean, The New York Times, 2010)
data + tools = info viz <=> new story
Do non-mainstream digital media (i.e. citizen
blogs) directly reference individual Afghan War
Diary document pages?
afghan war diary
“an extraordinary secret compendium
of over 91,000 reports covering the
war in Afghanistan from 2004 to
“the most significant archive about the
reality of war to have ever been
released during the course of a war.”
Image source: http://wardiary.wikileaks.org
afghan war diary - der spiegel.
(Der Spiegel, 2010: http://www.spiegel.de/international/world/bild-708314-114716.html)
afghan war diary - the guardian.
“Afghanistan war logs: IED attacks on civilians, coalition and Afghan troops”
(The Guardian, 2010: http://www.guardian.co.uk/world/datablog/interactive/2010/jul/26/ied-
1. Observe the common root of all Afghan War Diary 2004 - 2010 document
2. Query Google with the Google Scraper to get the first 1000 results which contain
this common root as a textual component.
3. Submit the top 100 results to the Link Ripper to extract all outlinks to specific
Afghan War Diary 2004 - 2010 document pages.
4. Insert the Link Ripper output in the Harvester to remove textual descriptions and
alphabetize the obtained URL list.
5. Manually clean the output by searching for 'http://wardiary.wikileaks.org/afg/event'
and produce a list of Afghan War Diary 2004 - 2010 document URLs.
6. Select all documents that receive at least two links and compile a final list of the
'most mentioned' warlogs.
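Steps 5 and 6 above can be sketched in a few lines. This is an illustration under stated assumptions, not the tooling used in the study: the input is assumed to be a flat list of outlink URLs with one entry per linking page, and the function name `most_mentioned` is hypothetical. Only the common root string comes from the slide.

```python
from collections import Counter

# Common root of the document pages, as given in step 5.
WARLOG_ROOT = "http://wardiary.wikileaks.org/afg/event"

def most_mentioned(outlinks, root=WARLOG_ROOT, min_links=2):
    """Count links per warlog URL and keep those with at least min_links."""
    cleaned = (url.strip() for url in outlinks)
    # Keep only URLs that point at individual document pages.
    counts = Counter(url for url in cleaned if url.startswith(root))
    selected = [(url, n) for url, n in counts.items() if n >= min_links]
    # Most-linked first; ties are ordered alphabetically.
    return sorted(selected, key=lambda item: (-item[1], item[0]))
```

Calling `most_mentioned(list_of_urls)` returns `(url, link_count)` pairs for the 'most mentioned' warlogs; raising `min_links` tightens the selection.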
methodology - disclaimer.
1. Searching for inlinks with Yahoo! Site Explorer or Google
yielded the same or no results.
2. Finding inlinks with different anchor tags is very difficult.
Content syndication based on local / national interest
• Ex.: UK political commentator James Barlow: listing a collection
of links to warlogs on the British military
Overlap between non-mainstream media and ‘alternative’ accounts
• “Idaho Soldier Captured in Afghanistan” (http://
AFG20091208n2517.html) - 8 mentions, none alternative
• “Four Canadians Killed in Friendly Fire” (http://
AFG20060903n347.htm) - 4 mentions, alternative in
“Connect the dots” reasoning in warlog commentary
(Peak of Elephants, Jul 26 2010: http://peakofelephants.posterous.com/post/
Data-driven Citizen Journalism - A Myth
• massive amount of data
• highly technical military terms
• pre-determined perspectives by media giants
the myth of data-driven
Research: Camilo Cristancho, Matteo Cernison, Catalina Iorga
web 1.0 vs web 2.0
analysis and multiple
prof. dr. Richard Rogers, University of Amsterdam
Anne Helmond, PhD candidate, University of Amsterdam
economy of links.
“Links have become the currency of the
Web. With this economic value they also
have power, affecting accessibility and
knowledge on the Web.” Jill Walker (2002)
“ReTweets Are The New Currency Of
The Web” Michael Arrington, Techcrunch (2009)
Do social media activities such as the
(re)Tweet or the Like create a web
currency outside of the link?
What type of content is hard to Like and
easy to Like?
1. Query “BP Oil Spill” in Google Web,
Google Blogs and Google News.
2. Take top 100 results for each sphere
3. Collect number of Diggs, Delicious,
Tweets and Likes per URL.
4. Manually check for button presence
Hard to Like:
organizational, special interest* and
Easy to Like:
User-generated content platforms &
Websites connected to a Facebook
* BP America & Causes: Help Wildlife Impacted by the BP Oil Spill
Google ranking is not at all correlated
with any social currencies
Google Web: Like correlates w/ Tweet
Google Blogs: Like correlates w/ Digg
Google News: Like correlates w/ Tweet
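The slides do not say which correlation measure was used. As one possible reading, the sketch below computes a Spearman rank correlation between two per-URL count lists (for example Likes and Tweets). The choice of Spearman and all function names are assumptions, and no real counts from the study are reproduced.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if a list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Rank-based correlation is a reasonable fit here because social counts tend to be heavily skewed, but a different measure may have been used in the original research.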
Research: Carolin Gerlitz & Anne Helmond
Esther Weltevrede, PhD candidate, University of Amsterdam
“Internet Time is absolute time for everybody. Now
is now and the same time for all people and
- Negroponte in Lee and Liebenau, 2000
“Internet time is a multiplicity”
- Leong 2009
relevance and Internet time.
What does time mean in the different spheres? How
is the temporality of content handled by engines
What is pace online?
Update cycles of content
Update cycles of engines
update cycles of content:
How many blog posts, web pages, tweets, wall
postings are being published every moment on a
update cycles of engines:
How static or fresh are the result pages of search
engines and platforms?
[Figure: result-page log showing 3 new results between 18 Aug 10 15:45 and 18 Aug 10 15:50]
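One way to make "how static or fresh" measurable is to compare two successive snapshots of the same result page. The sketch below assumes each snapshot is an ordered list of result URLs; the function names `new_results` and `freshness` are hypothetical and the capture step is not shown.

```python
def new_results(previous, current):
    """URLs in the current snapshot that were absent from the previous one,
    kept in their current ranking order."""
    seen = set(previous)
    return [url for url in current if url not in seen]

def freshness(previous, current):
    """Share of the current result page that is new
    (0.0 = fully static, 1.0 = fully fresh)."""
    if not current:
        return 0.0
    return len(new_results(previous, current)) / len(current)
```

Running this over snapshots taken at a fixed interval gives a simple time series of an engine's update pace.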
Research: Erik Borra, Taina Bucher, Carolin Gerlitz, Anne Helmond,
the web as a problem
for content analysis.
Sabine Niederer, PhD candidate, University of Amsterdam
“It's early in the twenty-first century, and
that means that these words will mostly
be read by nonpersons." Jaron Lanier 2010
“The Web didn’t come with content”
Richard Rogers 2010
"Fandom, porn and aliens have taken
cyber-theory into a political and cultural
wasteland" Jodi Dean 2004
"The Web is [where] ignorance meets
egoism meets bad taste meets mob
rule" Andrew Keen 2007
"[T]he majority of the forum and blog
discussions did not in any way match
the normative criteria of the
Habermasian public sphere" (Thomas Poell 2009)
"It becomes clear that the quality of
the [forum] discussion leaves a lot to
be desired." (Tamara Witschge 2007)
What kind of content analysis can be
done with the Web?
method follows medium.
How to let the content speak for itself,
no coding, or labeling the (sub)
Thomas Poell, Assistant Professor, University of Amsterdam
controversy on twitter.
Headlines Nytimes & Fox
Muslim Community Center in Lower Manhattan (Park51)
Islamic Center Exposes Mixed Feelings Locally
Archbishop Offers Mediation for Islamic
Under Fire, Obama Clarifies Support for Ground Zero Mosque
Dozens speak out against planned mosque near
ground zero at NYC hearing on landmark status
Offer Rejected to Move Mosque Away From
Ground Zero to 'State Property'
“by letting users tag URLs and then
aggregating those tags, we're going
to be able to build alternate
organizational systems” (Shirky 2008, 18)
How is controversy organized on Twitter?
1. where can we find the
controversy on twitter?
1) Queried the news to find the
different labels for the controversy.
2) Scraped Twitter for labels to
determine specific hash tags.
3) Calculated the tweet activity per
2. how much of the twitter activity
is organized through hash tags and
1) Divided total number of hashtags
per Twitter query by the total number
of tweets for that query = average
hash tag activity.
2) Divided total number of retweets per
Twitter query by total number of tweets for
that query = % of retweet organization.
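The two ratios above are simple to compute once the tweets for a query are available as plain strings. This is a minimal sketch: the regular expressions for detecting hashtags and the "RT @user" convention are assumptions, as are the function names.

```python
import re

HASHTAG = re.compile(r"#\w+")
# Matches the manual retweet convention, with or without a space after RT.
RETWEET = re.compile(r"^RT\s*@\w+", re.IGNORECASE)

def hashtag_activity(tweets):
    """Total number of hashtags divided by total number of tweets."""
    if not tweets:
        return 0.0
    total = sum(len(HASHTAG.findall(t)) for t in tweets)
    return total / len(tweets)

def retweet_share(tweets):
    """Percentage of tweets that start with the 'RT @user' convention."""
    if not tweets:
        return 0.0
    retweets = sum(1 for t in tweets if RETWEET.match(t.strip()))
    return 100.0 * retweets / len(tweets)
```

Note that this simple hashtag pattern would also count `#` fragments inside URLs, so links may need to be stripped first.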
3. do hash tags organize different
accounts of the controversy?
1. Determined top hash tags for each
query, and visualized the relative usage.
2. Analyse overlap in related hash tags
#gop Grand Old Party
#gzm Ground Zero Mosque
Organized Conservative Resistance Alliance
#palin Sarah Palin
#park51 location Islamic community Center
#rsrh Red State Red Hot
#sgp Smart Girl Politics (Conservative Women)
#tcot Top Conservatives on Twitter
#tlot Top Libertarians on Twitter
4. does retweeting produce
distinctly different accounts?
1. Select the top retweets per query.
2. Compare accounts.
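Selecting the top retweets per query can be sketched as a frequency count over retweeted messages. The sketch assumes retweets follow the "RT @user: text" convention and that identical text means the same retweet. The function name `top_retweets` is hypothetical, and only the first "RT @user" prefix is stripped from nested retweets.

```python
from collections import Counter
import re

RT_PREFIX = re.compile(r"^RT\s*@(\w+):?\s*", re.IGNORECASE)

def top_retweets(tweets, n=3):
    """Return the n most frequent retweeted messages as (user, text, count)."""
    counts = Counter()
    for tweet in tweets:
        stripped = tweet.strip()
        match = RT_PREFIX.match(stripped)
        if match:
            # Group by original author and message text.
            text = stripped[match.end():].strip()
            counts[(match.group(1).lower(), text)] += 1
    return [(user, text, c) for (user, text), c in counts.most_common(n)]
```

Applying this to each query's tweet set gives the ranked lists whose accounts can then be compared by hand.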
Ground Zero Mosque
1. RT@Douggpound: How come everyone is up in arms about that
mosque but no one cares about Coke Ground Zero? (59)
2. RT @ewerickson: A lesson in tolerance from the left:Tolerance =
Ground Zero Mosque. Intolerance = @GlennBeck speaking at the
Lincoln Memorial.#rsrh #fb (19)
3. RT @ewerickson: Holocaust survivor cursed out by Ground Zero
Mosque supporter. Because it's all about tolerance. Really. http://
1. RT @MitchBenn: Re. "#gzm" (misnomer but used for ease of
ref.): trouble with Muslim Terrorists isn't that they're Muslims.
It's that they're terrorists... (14)
2. RT @JordanSekulow: This cartoon on #GZM, Obama, & Israel is
so dead on http://bit.ly/bIURiG (13)
3. RT @TuckerCarlson: RT @DailyCaller: U.S. government funds
mosque renovation and rehabilitation around the world http://
ow.ly/2tWP6 #gzm (9)
New York Islamic Center
1. RT @cnnbrk: Protesters rally against, for planned Islamic center
in New York. http://on.cnn.com/9MZJm6 (107)
2. RT @AndersonCooper: Imam behind controversial New York Islamic
center speaks http://bit.ly/aXAOBG (21)
3. RT @AlanColmes: Imam Behind New York Islamic Center Speaks
http://bit.ly/b8vXCp #p2 (11)
Two types of users
Spaces of confrontation
controversy on twitter.
Research: Matteo Cernison, Simeona Petkova & Thomas Poell
Erik Borra, Docent, University of Amsterdam
what is related search?
it is not based on query logs
it portrays semantic clusters of Web documents
it has generative rules to suggest different terms
Can the related search option be turned into a
the good news:
Google's related searches allow one to "scan" broad
thematic landscapes quickly
the bad news:
Google's related searches tend to go mad:
- Topic drifting (usual)
- Over-generative (unusual)
- "Global Warning"
- "Global Warming Is An Idiot"
- "Al Gore Isnt Real"
Too much generativity and not enough selectivity.
strategies to counteract that.
A high threshold of connectivity reveals when
suggestions become inconsistent: No more path to
[Figure: related-search cluster map for "Global Warming". Legible cluster labels: Global Warming, Al Gore, Climate Change Hoax, Energy Conservation, Media Controversy. Partially legible: Air, Types of, Pollution & Effects, Green Facts, & Tips]
related search as a re-
organization of a content space
identification of programs / anti-programs
classification and categorization
Research: Mathieu Jacomy, Matthieu Renault, Esther Weltevrede & Erik
summer school 2010.