SlideShare a Scribd company logo
1 of 23
Big Data and
the Question of Objectivity
Federica Russoa & Jean-Christophe Plantinb
aUniversity of Amsterdam | @federicarusso
bLondon School of Economics and Political Science | @JCPlantin
Overview
The social sciences go big
Quantitative social science
A big question: what conception(s) of objectivity in big-data social science practices?
What practices?
What objectivity?
Why is this relevant?
Conceptual issues
Practical implications
2
The social sciences go big
3
Big and quantitative
The ‘first’ big data revolution in social science:
Positivism and the birth of quantitative social science
Possibility of analysing more data, using the tools of statistics
Going quantitative helped the social science reach the ‘realm of the sciences’
And yet questions related to the objectivity of the social sciences didn’t settle
4
Big and problematic
Current debates around big data and scholarship
Borgman (2015):
Don’t conflate “ease of acquisition [of data] for ease of analysis”
Need theoretical as well as methodological framework
A choir of old and new data-philosophers (e.g. Sellars, Floridi, Leonelli …)
Data are not given
Data, information, knowledge are not all the same
Data are relational 5
Lots of questions already asked
How big / fast is ‘big’?
How much theory in big automated algorithms?
What kind of reasoning? Inductive?
What implications does the ‘big’ have at social / technical / scholarly level?
...
6
Our investigation
The question:
What exactly do data curators want to achieve with big-data practices?
A two-step answer:
1. Analysis of big-data practices in social science
2. Problematisation of 2 aspects of big-data practices:
a. Making the data curator visible / invisible [(in)visibility]
b. Standardisation of processes for data curation [standardisation]
In a nutshell: [a-b] force us to re-think the notion of objectivity
7
Big-data practices in social science
8
The manual processing
‘pipeline’
“Taylorism” in the data archive
“We're more of an assembly line, and so it's
production type of work” Paul, Archive Manager
Employment conditions that characterize “invisible technicians” in science
(Shapin, 1989; Barley, Bechky, 1994; Star, Strauss, 1999)
–Strict division of roles
–Rhythm of work
–No skills development
–Short term employment and turn over
–Highly standardized work, routine
–Invisible contribution
Making data ‘pristine’
“We want [the datasets] to be right, and everything
to read properly […] Trying to get that, so that the
future users when they get [the datasets], they get
everything in a pristine manner.” Paul, Archive Manager
Data processing and invisible labor
• Complete invisibility outside the archive
– No critique allowed of the datasets: “Don’t get carried away”
– Contacting the PI only as last resort
– Strict formatting for standardized output
• Complete visibility inside the archive
– Making all processing techniques explicit
– Processing history file + Quality check
– Homogenization of practices
Interrogating ‘pristineness’
• Cleaning data twice: traces of original context + traces of
cleaning
• Reproduces erroneous conception of ‘raw data’ (Gitelman,
2013)
• Conceals contributions of data processors: protocol work
(Downey, 2014) data packaging (Leonelli, 2016)
The question of objectivity
14
[(in)visibility] and [standardisation]:
Re-introduce old ideas about objectivity
Exemplify some more recent ideas about objectivity
But also: pull them in opposite directions
15
The data curator must be invisible from the outside
Data users don’t know / need to know about the process
Focus on ‘end product’ (rather than process)
➔Data are objectively clean, ready to (re)use
● No (interfering) curator behind data curation
● Objectivity is a property of data, not of the process
➔An old ideal of objectivity: objects’ objectivity
◆ Kitcher: the Legend View of Neopositivism
16
The data curator must be visible from the inside
At any time in the data curation process, who the data curator is and what s/he does must
be visible, traceable, transparent
➔The process is objective as long as procedures are respected
The curator is present at all times
Objectivity lies in the procedure
➔A more recent idea of procedural objectivity
◆ Montuschi, Little, Cardano, … :
● the social sciences can attain objectivity;
● objectivity is in the process, not in the object of science 17
Procedural objectivity: pulling in opposite directions?
A good tool to have in the kit
• Liberate social sciences from inferiority
complex
• Can value role of data curators
• Helps understand where the process can
go wrong
• Increases objectivity of the ‘end product’
by self-reflectively work on process
...
A ‘procedural drift’ towards obsessive
standardisation?
• Can / should we be flexible about
procedure
• If so, do we lose or gain on objectivity?
• Is objectivity just a matter of procedure?
• What role is left to the data curator then?
...
What else does ‘obsessive procedural
objectivity’ presuppose? 18
Strong procedures and data pristineness
Much of [(in)visibility] and of [standardisation] rest on
the myth of raw data and of clean data
Pristineness: data are cleaned twice (original PI and of traces of cleaning)
Here we sing with the choir of data-philosophers
No, data is not raw or clean
No, you can’t just assume their cleanness abstracting from curation procedures
No, maybe they shouldn’t be cleaned up so much after all
Yes, perhaps the social science need somewhat dirty data
19
To sum up and conclude
20
Social science practices go big
The social sciences grew big already since Positivism
Introduction and development of quantitative methods
Demography and sociology; understanding and acting on social phenomena
In the era of big data, they grow even bigger
More data: social media provide tons
More practices: data curation and automated data analyses
21
Big-data practices strive for objectivity
Two relevant aspects of these practices:
[(in)visbility] and [standardisation]
Two notions of objectivity at play:
[invisible] curators: the objects are objective
[visible] curators: the procedures are objective
[standardisation] of procedures: procedures are objectives … too objective?
22
Relevance of the discussion
An interesting ‘philosophy of science in practice’ question
From the practice, bottom up crucial philosophical issues
Objectivity, an evergreen of phil sci. But what new is at stake with big data?
Beyond scholarly questions
Open data and open science
Can we abstract from the alleged objectivity of these practices?
When are data objective enough to be safely re-used?
If [standardisation] doesn’t ensure it, what does?
Should we strive for that kind of objectivity? 23

More Related Content

What's hot

What's hot (16)

Russo unam-1
Russo unam-1Russo unam-1
Russo unam-1
 
Productive causality in technoscientific research
Productive causality in technoscientific researchProductive causality in technoscientific research
Productive causality in technoscientific research
 
Causal assessment and the question of stability
Causal assessment and the question of stabilityCausal assessment and the question of stability
Causal assessment and the question of stability
 
Info biomark
Info biomarkInfo biomark
Info biomark
 
Information transmission and the mosaic of causal theory
Information transmission and the mosaic of causal theoryInformation transmission and the mosaic of causal theory
Information transmission and the mosaic of causal theory
 
Venezia phil
Venezia philVenezia phil
Venezia phil
 
The concept of variation in causal discovery
The concept of variation in causal discoveryThe concept of variation in causal discovery
The concept of variation in causal discovery
 
The mosaic of causal theory
The mosaic of causal theoryThe mosaic of causal theory
The mosaic of causal theory
 
Causality, regularity, and variation. A reassessment
Causality, regularity, and variation. A reassessmentCausality, regularity, and variation. A reassessment
Causality, regularity, and variation. A reassessment
 
The mosaic of causal theory
The mosaic of causal theoryThe mosaic of causal theory
The mosaic of causal theory
 
Causal mosaics - Series of lectures on causal modelling in the social sciences
Causal mosaics - Series of lectures on causal modelling in the social sciencesCausal mosaics - Series of lectures on causal modelling in the social sciences
Causal mosaics - Series of lectures on causal modelling in the social sciences
 
Causality in the sciences: a gentle introduction.
Causality in the sciences: a gentle introduction.Causality in the sciences: a gentle introduction.
Causality in the sciences: a gentle introduction.
 
A rendezvous with the uncertainty monster
A rendezvous with the uncertainty monsterA rendezvous with the uncertainty monster
A rendezvous with the uncertainty monster
 
Mechanisms in the Sciences. A Gentle Introduction
Mechanisms in the Sciences. A Gentle IntroductionMechanisms in the Sciences. A Gentle Introduction
Mechanisms in the Sciences. A Gentle Introduction
 
Evidence in the social sciences - Series of lectures on causal modelling in t...
Evidence in the social sciences - Series of lectures on causal modelling in t...Evidence in the social sciences - Series of lectures on causal modelling in t...
Evidence in the social sciences - Series of lectures on causal modelling in t...
 
Kent Phil Dept March06
Kent Phil Dept March06Kent Phil Dept March06
Kent Phil Dept March06
 

Similar to Big data and the question of objectivity

Ralph schroeder and eric meyer
Ralph schroeder and eric meyerRalph schroeder and eric meyer
Ralph schroeder and eric meyer
oiisdp
 

Similar to Big data and the question of objectivity (20)

Big data and the question of objectivity
Big data and  the question of objectivityBig data and  the question of objectivity
Big data and the question of objectivity
 
Data Science-1 (1).ppt
Data Science-1 (1).pptData Science-1 (1).ppt
Data Science-1 (1).ppt
 
CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...CODATA International Training Workshop in Big Data for Science for Researcher...
CODATA International Training Workshop in Big Data for Science for Researcher...
 
Big data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shiftsBig data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shifts
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
Learning Analytics – Ethical questions and dilemmas
Learning Analytics  – Ethical questions and dilemmasLearning Analytics  – Ethical questions and dilemmas
Learning Analytics – Ethical questions and dilemmas
 
TL_Thompson.pptx.ppt
TL_Thompson.pptx.pptTL_Thompson.pptx.ppt
TL_Thompson.pptx.ppt
 
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
 
"I always feel it must be great to be a hacker"
"I always feel it must be great to be a hacker" "I always feel it must be great to be a hacker"
"I always feel it must be great to be a hacker"
 
Internet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialInternet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 Tutorial
 
Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...
 
SOC2002 Lecture 3
SOC2002 Lecture 3SOC2002 Lecture 3
SOC2002 Lecture 3
 
Ralph schroeder and eric meyer
Ralph schroeder and eric meyerRalph schroeder and eric meyer
Ralph schroeder and eric meyer
 
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
 
Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1
 
The Hidden Data of Social Media Rearch_CSS-winter-symposium
The Hidden Data of Social Media Rearch_CSS-winter-symposiumThe Hidden Data of Social Media Rearch_CSS-winter-symposium
The Hidden Data of Social Media Rearch_CSS-winter-symposium
 
Ethical research in the era of Digital Studies : the French point of view
Ethical research in the era of Digital Studies : the French point of view Ethical research in the era of Digital Studies : the French point of view
Ethical research in the era of Digital Studies : the French point of view
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
 
Louise Bezuidenhout - OpenCon Oxford, 1st Dec 2017
Louise Bezuidenhout - OpenCon Oxford, 1st Dec 2017Louise Bezuidenhout - OpenCon Oxford, 1st Dec 2017
Louise Bezuidenhout - OpenCon Oxford, 1st Dec 2017
 
Social physics
Social physicsSocial physics
Social physics
 

More from University of Amsterdam and University College London

More from University of Amsterdam and University College London (20)

H-AI-BRID - Thinking and designing Human-AI systems
H-AI-BRID - Thinking and designing Human-AI systemsH-AI-BRID - Thinking and designing Human-AI systems
H-AI-BRID - Thinking and designing Human-AI systems
 
Time in QCA: a philosopher’s perspective
Time in QCA: a philosopher’s perspectiveTime in QCA: a philosopher’s perspective
Time in QCA: a philosopher’s perspective
 
Interconnected health-environmental challenges: Between the implosion of the ...
Interconnected health-environmental challenges: Between the implosion of the ...Interconnected health-environmental challenges: Between the implosion of the ...
Interconnected health-environmental challenges: Between the implosion of the ...
 
Trusting AI-generated contents: a techno-scientific approach
Trusting AI-generated contents: a techno-scientific approachTrusting AI-generated contents: a techno-scientific approach
Trusting AI-generated contents: a techno-scientific approach
 
Interconnected health-environmental challenges, Health and the Environment: c...
Interconnected health-environmental challenges, Health and the Environment: c...Interconnected health-environmental challenges, Health and the Environment: c...
Interconnected health-environmental challenges, Health and the Environment: c...
 
Who Needs “Philosophy of Techno- Science”?
Who Needs “Philosophy of Techno- Science”?Who Needs “Philosophy of Techno- Science”?
Who Needs “Philosophy of Techno- Science”?
 
Philosophy of Techno-Science: Whence and Whither
Philosophy of Techno-Science: Whence and WhitherPhilosophy of Techno-Science: Whence and Whither
Philosophy of Techno-Science: Whence and Whither
 
Charting the explanatory potential of network models/network modeling in psyc...
Charting the explanatory potential of network models/network modeling in psyc...Charting the explanatory potential of network models/network modeling in psyc...
Charting the explanatory potential of network models/network modeling in psyc...
 
The implosion of medical evidence: emerging approaches for diverse practices ...
The implosion of medical evidence: emerging approaches for diverse practices ...The implosion of medical evidence: emerging approaches for diverse practices ...
The implosion of medical evidence: emerging approaches for diverse practices ...
 
On the epistemic and normative benefits of methodological pluralism
On the epistemic and normative benefits of methodological pluralismOn the epistemic and normative benefits of methodological pluralism
On the epistemic and normative benefits of methodological pluralism
 
Socio-markers and information transmission
Socio-markers and information transmissionSocio-markers and information transmission
Socio-markers and information transmission
 
Disease causation and public health interventions
Disease causation and public health interventionsDisease causation and public health interventions
Disease causation and public health interventions
 
The life-world of health and disease and the design of public health interven...
The life-world of health and disease and the design of public health interven...The life-world of health and disease and the design of public health interven...
The life-world of health and disease and the design of public health interven...
 
Towards and epistemological and ethical XAI
Towards and epistemological and ethical XAITowards and epistemological and ethical XAI
Towards and epistemological and ethical XAI
 
Value-promoting concepts in the health sciences and public health
Value-promoting concepts in the health sciences and public healthValue-promoting concepts in the health sciences and public health
Value-promoting concepts in the health sciences and public health
 
Connecting the epistemology and ethics of AI
Connecting the epistemology and ethics of AIConnecting the epistemology and ethics of AI
Connecting the epistemology and ethics of AI
 
How is Who. Empowering evidence for sustainability and public health interven...
How is Who. Empowering evidence for sustainability and public health interven...How is Who. Empowering evidence for sustainability and public health interven...
How is Who. Empowering evidence for sustainability and public health interven...
 
High technologized justice – The road map for policy & regulation. Legaltech ...
High technologized justice – The road map for policy & regulation. Legaltech ...High technologized justice – The road map for policy & regulation. Legaltech ...
High technologized justice – The road map for policy & regulation. Legaltech ...
 
Connecting the epistemology and ethics of AI
Connecting the epistemology and ethics of AIConnecting the epistemology and ethics of AI
Connecting the epistemology and ethics of AI
 
Science and values. A two-way relations
Science and values. A two-way relationsScience and values. A two-way relations
Science and values. A two-way relations
 

Recently uploaded

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
Sheetaleventcompany
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 

Recently uploaded (20)

Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 

Big data and the question of objectivity

  • 1. Big Data and the Question of Objectivity Federica Russoa & Jean-Christophe Plantinb aUniversity of Amsterdam | @federicarusso bLondon School of Economics and Political Science | @JCPlantin
  • 2. Overview The social sciences go big Quantitative social science A big question: what conception(s) of objectivity in big-data social science practices? What practices? What objectivity? Why is this relevant? Conceptual issues Practical implications 2
  • 4. Big and quantitative The ‘first’ big data revolution in social science: Positivism and the birth of quantitative social science Possibility of analysing more data, using the tools of statistics Going quantitative helped the social science reach the ‘realm of the sciences’ And yet questions related to the objectivity of the social sciences didn’t settle 4
  • 5. Big and problematic Current debates around big data and scholarship Borgman (2015): Don’t conflate “ease of acquisition [of data] for ease of analysis” Need theoretical as well as methodological framework A choir of old and new data-philosophers (e.g. Sellars, Floridi, Leonelli …) Data are not given Data, information, knowledge are not all the same Data are relational 5
  • 6. Lots of questions already asked How big / fast is ‘big’? How much theory in big automated algorithms? What kind of reasoning? Inductive? What implications does the ‘big’ have at social / technical / scholarly level? ... 6
  • 7. Our investigation The question: What exactly do data curators want to achieve with big-data practices? A two-step answer: 1. Analysis of big-data practices in social science 2. Problematisation of 2 aspects of big-data practices: a. Making the data curator visible / invisible [(in)visibility] b. Standardisation of processes for data curation [standardisation] In a nutshell: [a-b] force us to re-think the notion of objectivity 7
  • 8. Big-data practices in social science 8
  • 10. “Taylorism” in the data archive “We're more of an assembly line, and so it's production type of work” Paul, Archive Manager Employment conditions that characterize “invisible technicians” in science (Shapin, 1989; Barley, Bechky, 1994; Star, Strauss, 1999) –Strict division of roles –Rhythm of work –No skills development –Short term employment and turn over –Highly standardized work, routine –Invisible contribution
  • 11. Making data ‘pristine’ “We want [the datasets] to be right, and everything to read properly […] Trying to get that, so that the future users when they get [the datasets], they get everything in a pristine manner.” Paul, Archive Manager
  • 12. Data processing and invisible labor • Complete invisibility outside the archive – No critique allowed of the datasets: “Don’t get carried away” – Contacting the PI only as last resort – Strict formatting for standardized output • Complete visibility inside the archive – Making all processing techniques explicit – Processing history file + Quality check – Homogenization of practices
  • 13. Interrogating ‘pristineness’ • Cleaning data twice: traces of original context + traces of cleaning • Reproduces erroneous conception of ‘raw data’ (Gitelman, 2013) • Conceals contributions of data processors: protocol work (Downey, 2014) data packaging (Leonelli, 2016)
  • 14. The question of objectivity 14
  • 15. [(in)visibility] and [standardisation]: Re-introduce old ideas about objectivity Exemplify some more recent ideas about objectivity But also: pull them in opposite directions 15
  • 16. The data curator must be invisible from the outside Data users don’t know / need to know about the process Focus on ‘end product’ (rather than process) ➔Data are objectively clean, ready to (re)use ● No (interfering) curator behind data curation ● Objectivity is a property of data, not of the process ➔An old ideal of objectivity: objects’ objectivity ◆ Kitcher: the Legend View of Neopositivism 16
  • 17. The data curator must be visible from the inside At any time in the data curation process, who the data curator is and what s/he does must be visible, traceable, transparent ➔The process is objective as long as procedures are respected The curator is present at all times Objectivity lies in the procedure ➔A more recent idea of procedural objectivity ◆ Montuschi, Little, Cardano, … : ● the social sciences can attain objectivity; ● objectivity is in the process, not in the object of science 17
  • 18. Procedural objectivity: pulling in opposite directions? A good tool to have in the kit • Liberate social sciences from inferiority complex • Can value role of data curators • Helps understand where the process can go wrong • Increases objectivity of the ‘end product’ by self-reflectively work on process ... A ‘procedural drift’ towards obsessive standardisation? • Can / should we be flexible about procedure • If so, do we lose or gain on objectivity? • Is objectivity just a matter of procedure? • What role is left to the data curator then? ... What else does ‘obsessive procedural objectivity’ presuppose? 18
  • 19. Strong procedures and data pristineness Much of [(in)visibility] and of [standardisation] rest on the myth of raw data and of clean data Pristineness: data are cleaned twice (original PI and of traces of cleaning) Here we sing with the choir of data-philosophers No, data is not raw or clean No, you can’t just assume their cleanness abstracting from curation procedures No, maybe they shouldn’t be cleaned up so much after all Yes, perhaps the social science need somewhat dirty data 19
  • 20. To sum up and conclude 20
  • 21. Social science practices go big The social sciences grew big already since Positivism Introduction and development of quantitative methods Demography and sociology; understanding and acting on social phenomena In the era of big data, they grow even bigger More data: social media provide tons More practices: data curation and automated data analyses 21
  • 22. Big-data practices strive for objectivity Two relevant aspects of these practices: [(in)visbility] and [standardisation] Two notions of objectivity at play: [invisible] curators: the objects are objective [visible] curators: the procedures are objective [standardisation] of procedures: procedures are objectives … too objective? 22
  • 23. Relevance of the discussion An interesting ‘philosophy of science in practice’ question From the practice, bottom up crucial philosophical issues Objectivity, an evergreen of phil sci. But what new is at stake with big data? Beyond scholarly questions Open data and open science Can we abstract from the alleged objectivity of these practices? When are data objective enough to be safely re-used? If [standardisation] doesn’t ensure it, what does? Should we strive for that kind of objectivity? 23

Editor's Notes

  1. Briefly on our previous collaboration Why this one. Also a question of philosophical methodology: phil sci *in practice* >> JC provides a lot of the ‘in practice’
  2. Contextualisation: birth of quantitative social science. Going quantitative was a way of going big. And of making soc sci a *science* Focus on question of objectivity in big data practices in soc sci Describe the practices Problematise notion(s) of objectivity Why bother Certainly intriguing and interesting conceptual issues But also very important practical implications
  3. A bod claim, sure. But helpful to contextualise. Big social science data in the context of longer term development of the discipline Look at birth of quantitative social science: why going quantitative To be objective, a science like physics (e.g. Quetelet); reach the ‘positive’ stage (Comte) To be able to analyse more data and compare (e.g. Quetelet, Durkheim) To provide a solid (objective!) basis for decision (see e.g. historical reconstructions of Porter) A long standing debate on status of soc science. Does it reach the realm of the sciences? Can they establish laws? Does the mathematization ensure reliable predictions? … Many of these questions aren’t settled in phil soc sci debates, especially objectivity [will come back to that]
  4. There is already a flourishing literature on big data. Scholar in science studies got there first, sure. Philosophers are on the way. Two points that are of relevance to our discussion -- form the background / lend support to later claims about objectivity Comment on Borgman’s point made in ‘Big data little data no data’ Mention that some of our later points are part of a large choir of data philosophers
  5. Comment briefly on all these questions. Acknowledge slightly different interests of scholars from various fields
  6. Spell out the question. Some elements: Talk about ‘practices’. So important to describe them. In fact, kind of dissect the process of data curation Emphasise: not the end product, but the process is of interest to us In the process, we also pay attention to the curators, so the subjects Anticipate briefly the two points that will matter for us. Caveat about terminology. As for suggestion about ‘labelling’.
  7. How I encounter this notion of objectivity in practice -- what elements can fieldwork bring to the understanding of such concept Major US Data archive Was created due to rise of SS survey data – data archive in the UK Forster archive and secondary analysis of existing datasets Since inception, problem of unequal quality and idiosyncrasies of data Having a group of dedicated workers to clean the data! Particiatory ethnography of 6 month there -look at the work processes (pipeline) to prepare this data -- with the idea that through this fieldwork we have access to values and practices of what “good data” is for reuse and data sharing -- entrance point to how the notion of objectivity exists in this social world.
  8. -called the pipeline of datasets – This is in a nutshell the data processing pipeline, the way the ICPSR ‘takes care’ of the data that are deposited in their archive, mostly survey data: -each dedicated processors inside the archive, goes through all this pipeline, from deposit of the dataset by a PI (1) to the publication on the ICPSR website for dowload by future reusers (7): -what is esp important here are the steps 3 and 5: -repair: the data processors uses a series of software and manual labor to review the datasets, identify the missing values, wild codes, problems wth the codebook, and fix them. The ICPSE uses the comparison with « restoring an old book or a painting »  -prepare: make sure the dataset fits with the standards of the ICPSR, in terms of how the dataset is presented, and if all the documents are present (, codebook, etc), all the metadata following the ICPSR catalogue
  9. -Factory metaphor: the pileline – a work described to me by the head of the unit at « assembly line » QUOTE -no control over rhythm of submission // tight schedule: they have to process it very quickly -BA social science required, but no time to develop skills -short term employment, high turn over, no career prospect -highly standardized, no agency, routine, boring -contribution is made invisible What is the goal of this division of labor == making data “pristine”
  10. Goal is to get data pristine -- How is this pristine-ness achieved? Through a process of invisibilisation of the data processors, that is double:
  11. Work procedures aim to make the work of data processing completely invisible outside // completely visible inside.
  12. -pristineness: inaccurate (the agency of processors cannot be completely erased), but also it reproduces idea of “raw data” -- that there is such a thing that data that could be made raw to be reused -it negates the very specificity and richness of data archives, that is, to have dedicated employees that are paid to enhance the data that they receive before reuse -- protocol work, data packaging As long as the institution commit to such conception of data, it will continue to conceal all the care provided by its processors, and will hereby keep them invisible instead of recognizing their essential contribution to the circulation of data in social science.
  13. Preliminary impressions, to be further substantiate
  14. Kitcher’s book: The Advancements of Science
  15. Mention especially Montuschi: Discussion of objectivity of: The ‘objects’ (out there vs constructed), The descriptions thereof (neutral vs value-laden) The methods (quantitative / experimental vs qualitative)
  16. Here problematise the fact that having clean data is an objective to pursue. Instill the idea that perhaps there a degree of dirtiness that has to remain in social science. How much important is the context??
  17. So in the end try to ask the question whether it *desirable* to have so much cleaned data? So far, discussion at conceptual level, based on analyses of the practices