WHAT IS ‘BIG DATA’?
▸Data on a scale which challenges our existing techniques and infrastructure for
collection, storage and analysis
▸Lack of specificity provokes a definitional spiral: volume, velocity, variety,
variability, veracity, visualisation, value etc.
▸Implied comparison to ‘small data’ delineating a new age (‘a data deluge’, a
‘data avalanche’, a ‘data flood’ etc.) in which new data necessitate new
methodologies
▸Social action mediated by digital devices and infrastructure produces data
as a by-product of transactions, e.g. retail purchases, click streams, social media
activity, mobile phone data, search engines, online communications.
▸Data is produced in real time and unobtrusively, as a side effect; it is unstructured
and varied.
THE BRAVE NEW WORLD
▸The height of hype: big data makes scientific method
unnecessary
▸Predicated on dichotomies: unobtrusive/obtrusive,
correlational/causal, mining/questions,
populations/samples
▸Epochal cut: the conceptual force of the dichotomies and the
metaphorical force of ‘big data’ create a brave new world
and undermine continuities (e.g. census as population
science, secondary analysis as unobtrusive method)
▸Obscures ontology and epistemology: what is ‘big
data’ and what can we do with it?
▸Obscures political economy: platform firms, data
analytics industry, data brokers, software developers,
analytics platforms and consultants.
▸Obscures discipline: new forms of expertise are
consolidating in the resulting ideational morass.
EXPERTISE ARISING
FROM SYNERGIES
▸ Post-PhD at the Data Science Lab and CompSocSci.Eu
▸ Intersection of applied statistics, computer
science and social science.
▸ Huge commercial growth reflecting the expansion
of the platform economy, but also an emerging
academic discipline.
▸ ‘The sexiest job of the 21st century’ (Harvard
Business Review), but are we seeing
a ‘data science skills bubble’?
▸ Huge state investment in data science directly
and indirectly through funding for machine
learning and artificial intelligence.
▸ Data science and the emerging geopolitics of
machine learning.
THE DISCIPLINARY CHALLENGE
▸What does this mean for the organisation of knowledge production? A huge influx of funding, a changing order of
epistemic prestige, a pluralisation of claims to knowing the social world, radical new methodological
opportunities.
▸The hype is a material force which benefits those who repudiate it: ‘big data’ as a discursive shield advancing a
particular view of the social world and of our knowledge of it.
▸Addressing this entails working across disciplinary boundaries because in these matters there is no adequate
account of the social which isn’t also an account of the technical.
▸Technical challenges to using categories like ‘big data’, ‘algorithm’ and ‘machine learning’ as if they refer to
singular entities: hypostatising a technical ontology which hinders our social analysis.
▸This involves overcoming the schism between social theory and media theory identified by Hepp and Couldry:
a tendency either to ignore media systems or to build an antiquated account of media systems into background
assumptions.
▸Can we argue for a more expansive social science? The risk is that data science constrains the real to what registers
empirically within the horizons of digital infrastructures (and the limitations of this horizon are disguised by
the sophistication with which it does this job).
THE PHILOSOPHY OF BIG DATA AND
SOCIOLOGY OF BIG DATA
▸Can we distinguish transactional data from the hype surrounding it?
▸This is only an analytical distinction, because the implementation of the former is directed and
legitimated by the latter.
▸Furthermore, transactional data is inherently productive of asymmetries: the method of
collection and the data collected are transparent to the engineers and opaque to the engineered.
▸Dependence upon digital infrastructure empowers those who operate it and intervene
through it, disempowering those who are revealed and susceptible to behavioural
modification through it.
▸Human agency sits at the intersection between philosophy of big data and
sociology of big data: how are people represented and what are the consequences
of those representations?
“When we wake up in the morning, we check our e-mail, make a quick phone
call, walk outside (our movements captured by a high definition video camera),
get on the bus (swiping our RFID mass transit cards) or drive (using a
transponder to zip through the tolls). We arrive at the airport, making sure to
purchase a sandwich with a credit card before boarding the plane, and check
our BlackBerries shortly before takeoff. Or we visit the doctor or the car
mechanic, generating digital records of what our medical or automotive
problems are. We post Blog entries confiding to the world our thoughts and
feelings, or maintain personal social network profiles revealing our friends and
tastes.” David Lazer et al. (2009)
FOLLOWING THE DIGITAL BREADCRUMBS
BIG DATA AND HUMAN AGENCY
▸Maximally reflexive (e.g. expressing our thoughts and feelings through social media).
▸Minimally reflexive (e.g. choosing a sandwich shop which allows us to accumulate reward
points through our use of the credit card).
▸Entirely habitual (e.g. walking the same route we take each day, unaware that our movements are
captured by a video camera).
▸People act in ways orientated towards digital infrastructures, but that purposiveness is not
registered in the digital breadcrumbs: action is reduced to behaviour, the human being is reduced to a
behavioural trace.
▸This reduction is framed as an epistemic gain (“who we are when we think no one is looking”)
overcoming the messy thickets of interpretation and revealing the truth of human being.
Furthermore, it is done at scale, without the mess or cost of designed intervention.
▸Daniel Little has called this a “utopia of social legibility”: a belief in a world where it is possible to
read ‘the book of society’ like the ‘book of nature’ (Barnes and Wilson).
▸Recovering the commitments underlying this reduction helps us identify the agency underlying
big data: the people behind platforms and their material and ideational interests.
THE ‘GOD VIEW’ IS AN INSTITUTIONAL REALITY RATHER THAN AN
EPISTEMIC HYPOTHESIS
▸Increasing number of platforms have a ‘god view’ but these extreme examples are the tip of an iceberg.
▸Are we seeing a ‘big data divide’ (Mark Andrejevic): a fundamental divide between the data rich and the data poor? The
data poor susceptible to constant behavioural intervention by the data rich to serve opaque private interests.
▸Big data drives the promulgation of visions of the human which deny agency, even as we are seeing a profound restructuring of
primary agency (involuntary social placement) and collective agency (co-ordinating and collaborating with others).
WHAT DOES CRITICAL REALISM HAVE
TO DO WITH THIS?
▸An extremely sophisticated account of human agency (Archer, Donati, Sayer, Smith) to unpick the
multifaceted transformation of agency underway as a consequence of digitalisation
▸An extremely sophisticated account of the social production of facts about a real world. What would
Bhaskar of Reclaiming Reality say about data science?
▸An extremely sophisticated meta-theory which can help us roam across disciplinary boundaries in a
way which combines ontology, epistemology, methodology and political economy.
▸It can identify the limits of a social science which takes the ‘online order’ as given: one which limits the real to
the empirical, to what registers within the confines of a given platform.
▸This is a social science which can’t turn its gaze upon the conditions which produced it and take
platform capitalism as an object of analysis. Without this we miss the emerging political economy of
a capitalism dominated by tech firms.
▸Critical realism offers us a way to think through how to overcome this, but also to account for why
this matters as a project.