Seeing and talking about Big Data, Farida Vis, AHRC Subject Associations
Seeing and talking about Big Data
Farida Vis, University of Sheffield
‘He created this installation that was at the Tate museum in
London a while back and the installation was these
hundreds of thousands of ceramic hand-painted sunflower
seeds... And as you stood back from the room it looked like
this sea of just stones that were black stones that were
spread across the floor and of course you couldn’t really tell
what they were. But as you got closer it looks like, you can
start to tell ‘ooh it looks like they’ve stamped out hundreds
of thousands of sunflower seeds and spread them across the
floor’. But as you pick them up you started to realise that
they were all individually shaped and painted differently and
unique and beautiful and distinct in their own right. So
that’s what we want to bring to what we’re building: the
ability to shrink the world and allow everybody to see each
other.’ Dick Costolo, Twitter CEO, 2012 (quoted in Vis, 2012a)
Synoptic view (Scott, 1998)
a) Everything can be seen
b) Everything can be comprehended
A critical reflection on Big Data: considering APIs*,
researchers and tools as data makers
*Application Programming Interfaces
Big Data includes cultural and technological aspects, but the definition also highlights
Big Data as a ‘scholarly phenomenon’, which rests on the interplay of:
• Technology: maximizing computation power and algorithmic
accuracy to gather, analyze, link, and compare large data sets.
• Analysis: drawing on large data sets to identify patterns in order to
make economic, social, technical, and legal claims.
• Mythology: the widespread belief that large data sets offer a higher
form of intelligence and knowledge that can generate insights that
were previously impossible, with the aura of truth, objectivity, and
accuracy. (boyd and Crawford, 2012, p. 663).
‘“Big data” is high-volume, -velocity and -variety information assets
that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making’ (Gartner in Sicular, 2013).
Part one: three Vs – high Volume, -Velocity, -Variety
Key focus on processing data in real time.
Part two: highlight cost-effectiveness and innovation in processing
Part three: key benefit is the possibility of greater insight and thus better decision making
• Important to make visible inherent claims about objectivity
• Problematic focus on quantitative methods
• How can data answer questions it was not designed to answer?
• How can the right questions be asked?
• Inherent biases in large, linked, error-prone datasets
• Focus on text and numbers that can be mined algorithmically
• Data fundamentalism
Crawford (2013): ‘“data fundamentalism,” the notion that correlation
always indicates causation, and that massive data sets and predictive
analytics always reflect objective truth’.
The idea of, and belief in, an objective ‘truth’, and the notion that something
can be fully understood from a single perspective, again bring to light
tensions about how the social world can be made known.
Barthes (1957) on myth: myths naturalize beliefs that are contingent, making
them invisible and therefore beyond question.
Bowker and Star (2000): there are limits to the available ways in which
information can be stored in society. Instead of seeing the limitations
of the technical affordances and imagining different ways in which
information might be structured, people see the existing structures
become naturalized and begin to regard them as ‘inevitable’ (p. 108).
Amazon awarded ‘Social Networking System’
patent (The United States Patent and Trademark Office, 15 June 2010)
"A networked computer system provides various services for
assisting users in locating, and establishing contact relationships
with, other users. For example, in one embodiment, users can
identify other users based on their affiliations with particular
schools or other organizations. The system also provides a mechanism
for a user to selectively establish contact relationships or connections
with other users, and to grant permissions for such other users to view
personal information of the user. The system may also include features
for enabling users to identify contacts of their respective contacts. In
addition, the system may automatically notify users of personal
information updates made by their respective contacts."
The human algorithm tension
There are people in the machine
350 million images daily on FB
From around May 1996, just before Amazon’s IPO:
‘Soon, Amazon’s human editors were recommending books to
customers based on similar purchases they had made in the past.’
‘Amazon wasn’t just a selling site; it became an early social network
site for book fans’. (Brandt, 2011, p. 86)
Trent Reznor: Chief creative officer at Daisy
"What's missing is a system that adds a layer of
intelligent curation . . . As great as it is to have all this
information bombarding you, there's a real value in
trusted filters. It's like having your own guy when you
go into the record store, who knows what you like but
can also point you down some paths you wouldn't
necessarily have encountered."
Finding new ways to see and talk about Big Data
In particle physics, one of the bedrocks of Big Data in the natural
sciences, so-called ‘dark matter’ cannot be directly seen or observed
by telescopes. Its presence can, however, be inferred from the
gravitational effects it has on visible matter and on electromagnetic
radiation.
Drawing on particle physics, we can adopt a similar approach
and aim to make unseen data and algorithmic structures visible by
examining the data that can be seen. Through such an examination we can
infer the ‘gravitational effects’ of this dark matter on the
visible data and so learn more about the dark matter itself.
• Roland Barthes, 1993. Mythologies. London: Vintage Classics.
• R.L. Brandt, 2011. One Click: Jeff Bezos and the Rise of Amazon. London: Portfolio Penguin.
• James Bridle, 2013. ‘Naked Lunch’ Keynote presentation, Media Evolution Conference 2013 Malmo,
Thursday 22nd August, http://bambuser.com/v/3836761
• Geoffrey C. Bowker and Susan Leigh Star, 2000. Sorting Things Out: Classification and its
Consequences. Cambridge, Massachusetts and London, England: MIT Press.
• danah boyd and Kate Crawford, 2012. “Critical Questions for Big Data,” Information,
Communication & Society, volume 15, number 5, pp. 662-679.
• Kate Crawford, 2013. “The Hidden Biases in Big Data”, Harvard Business Review, HBR Blog Network,
1 April, at http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/, accessed 10 September
• James C. Scott, 1998. Seeing Like a State: How Certain Schemes to Improve the Human Condition
Have Failed. New Haven and London: Yale University Press.
• Svetlana Sicular, 2013. “Gartner's Big Data Definition Consists of Three Parts, Not to Be Confused
with Three ‘V’s,” Forbes, 27 March, at
three-parts-not-to-be-confused-with-three-vs/ , accessed 18 August 2013.
• Farida Vis, 2012a. “‘Twitter brings you closer’: the importance of seeing the little data in Big Data,”
In: Drew Hemment and Charlie Gere (editors). FutureEverybody: FutureEverything Report, pp. 43-45,
at http://futureeverything.org/FutureEverybody.pdf, accessed 10 September 2013.