BIG DATA, CRITICAL
REALISM, HUMAN AGENCY
AND THE FUTURE OF THE
SOCIAL SCIENCES
MARK CARRIGAN
WHAT IS ‘BIG DATA’?
▸Data on a scale which challenges our existing techniques and infrastructure for
collection, storage and analysis
▸Lack of specificity provokes definitional spiral: volume, velocity, variety,
variability, veracity, visualisation and value etc
▸Implied comparison to ‘small data’ (we are in a new age, with new data
necessitating new methodologies) delineating a new age (‘a data deluge’, a
‘data avalanche’, ‘data flood’ etc)
▸Social action being mediated by digital devices and infrastructure produces data
as by-product of transactions e.g. retail purchases, click streams, social media
activity, mobile phone data, search engines, online communications.
▸Data produced in real time, unobtrusively as side effect, unstructured and
varied.
THE BRAVE NEW WORLD
▸The height of hype: big data makes scientific method
unnecessary
▸Predicated on dichotomies: unobtrusive/obtrusive,
correlational/causal, mining/questions,
populations/samples
▸Epochal cut: conceptual force of dichotomies and
metaphorical force of ‘big data’ create brave new world
and undermine continuities (e.g. census as population
science, secondary analysis as unobtrusive method)
▸Obscures ontology and epistemology: what is ‘big
data’ and what can we do with it?
▸Obscures political economy: platform firms, data
analytics industry, data brokers, software developers,
analytics platforms and consultants.
▸Obscures discipline: new forms of expertise
consolidating in the ideational morass produced.
EXPERTISE ARISING
FROM SYNERGIES
▸ Post-PhD at the Data Science Lab and
CompSocSci.Eu
▸ Intersection of applied statistics, computer
science and social science.
▸ Huge commercial growth reflecting expansion
of the platform economy but also an emerging
academic discipline.
▸ ‘The sexiest job of the 21st century’ (Harvard
Business Review) but possibility we are seeing
a ‘data science skills bubble’?
▸ Huge state investment in data science directly
and indirectly through funding for machine
learning and artificial intelligence.
▸ Data science and emerging geo-politics of
machine learning.
THE DISCIPLINARY CHALLENGE
▸What does this mean for the organisation of knowledge production? Huge influx of funding, changing order of
epistemic prestige, pluralisation of claims to knowing the social world, radical new methodological
opportunities.
▸The hype is a material force which benefits those who repudiate it: ‘Big data’ as discursive shield advancing a
view of the social world and our knowledge of it.
▸Addressing this entails working across disciplinary boundaries because in these matters there is no adequate
account of the social which isn’t also an account of the technical.
▸Technical challenges to using categories like ‘big data’, ‘algorithm’ and ‘machine learning’ as if they refer to
singular entities: hypostatising a technical ontology which hinders our social analysis.
▸Involves overcoming schism between social theory and media theory identified by Hepp and Couldry.
Tendency to either ignore media systems or build an antiquated account of media systems into background
assumptions.
▸Can we argue for a more expansive social science? Risk is data science constrains the real to what registers
empirically within the horizons of digital infrastructures (and the limitations of this horizon are disguised by
the sophistication with which it does this job).
THE PHILOSOPHY OF BIG DATA AND
SOCIOLOGY OF BIG DATA
▸Can we distinguish transactional data from the hype surrounding it?
▸Only an analytical distinction because implementation of the former directed and
legitimated by the latter.
▸Furthermore, transactional data inherently productive of asymmetries with method of
collection & data collected transparent to engineers and opaque to the engineered.
▸Dependence upon digital infrastructure empowers those who operate it and intervene
through it, disempowering those who are revealed and susceptible to behavioural
modification through it.
▸Human agency sits at the intersection between philosophy of big data and
sociology of big data: how are people represented and what are the consequences
of those representations?
“When we wake up in the morning, we check our e-mail, make a quick phone
call, walk outside (our movements captured by a high definition video camera),
get on the bus (swiping our RFID mass transit cards) or drive (using a
transponder to zip through the tolls). We arrive at the airport, making sure to
purchase a sandwich with a credit card before boarding the plane, and check
our BlackBerries shortly before takeoff. Or we visit the doctor or the car
mechanic, generating digital records of what our medical or automotive
problems are. We post Blog entries confiding to the world our thoughts and
feelings, or maintain personal social network profiles revealing our friends and
tastes.” David Lazer et al (2009)
FOLLOWING THE DIGITAL BREADCRUMBS
BIG DATA AND HUMAN AGENCY
▸Maximally reflexive (e.g. expressing our thoughts and feelings through social media).
▸Minimally reflexive (e.g. choosing a sandwich shop which allows us to accumulate reward
points through our use of the credit card).
▸Entirely habitual (e.g. walking the same route we do each day, unaware our movements are
captured by video camera).
▸People act in ways orientated towards digital infrastructures but that purposiveness is not
registered as digital breadcrumbs: action is reduced to behaviour, human being is reduced to
behavioural trace.
▸This reduction is framed as an epistemic gain (“who we are when we think no one is looking”)
overcoming the messy thickets of interpretation and revealing the truth of human being.
Furthermore, it is done at scale, without the mess or cost of designed intervention.
▸Daniel Little has called this a “utopia of social legibility”: a belief in a world where it is possible to
read ‘the book of society’ like the ‘book of nature’ (Barnes and Wilson)
▸Recovering the commitments underlying this reduction helps us identify the agency underlying
big data, the people behind platforms and their material and ideational interests.
THE ‘GOD VIEW’ IS AN INSTITUTIONAL REALITY RATHER THAN
EPISTEMIC HYPOTHESIS
▸Increasing number of platforms have a ‘god view’ but these extreme examples are the tip of an iceberg.
▸Are we seeing a ‘big data divide’ (Mark Andrejevic): a fundamental divide between the data rich and the data poor? The
data poor susceptible to constant behavioural intervention by the data rich to serve opaque private interests.
▸Big data drives promulgation of visions of the human which deny agency while we are seeing a profound restructuring of
primary (involuntary social placement) and collective agency (co-ordinating and collaborating with others)
WHAT DOES CRITICAL REALISM HAVE
TO DO WITH THIS?
▸An extremely sophisticated account of human agency (Archer, Donati, Sayer, Smith) to unpick the
multifaceted transformation of agency underway as a consequence of digitalisation
▸An extremely sophisticated account of the social production of facts about a real world. What would
Bhaskar of Reclaiming Reality say about data science?
▸An extremely sophisticated meta-theory which can help us roam across disciplinary boundaries in a
way which combines ontology, epistemology, methodology and political economy.
▸It can identify the limits in a social science which takes the ‘online order’ as given: this limits the real to
the empirical of what registers within the confines of a given platform.
▸This is a social science which can’t turn its gaze upon the conditions which produced it and take
platform capitalism as an object of analysis. Without this we miss the emerging political economy of
a capitalism dominated by tech firms.
▸Critical realism offers us a way to think through how to overcome this but also how to account for why
this matters as a project.
