Big data and new data analytics are transforming research across disciplines by enabling new methods of data generation, collection, and analysis. This allows researchers to ask and answer questions in new ways. While opportunities exist to develop more sophisticated models and insights, there are also concerns about reductionism and losing nuance. In the social sciences and humanities, both opportunities and challenges exist regarding quantitative and qualitative approaches. Overall, while new paradigms may emerge, pluralism in methods and approaches is likely to continue across disciplines.
1. Big data, new epistemologies and paradigm shifts
or
Do revolutions in measurement lead to revolutions in science?
Rob Kitchin, National University of Ireland Maynooth
2. Introduction
• “Revolutions in science have often been preceded by revolutions
in measurement” Sinan Aral (2010)
• “Big data creates a radical shift in how we think about research.
... [It offers] a profound change at the levels of epistemology
and ethics. Big data reframes key questions about the
constitution of knowledge, the processes of research, how we
should engage with information, and the nature and the
categorization of reality ... Big data stakes out new terrains of
objects, methods of knowing, and definitions of social life”
(boyd and Crawford 2012)
• Critically examine:
o Big data
o Data analytics
o Their effects on epistemological and methodological approaches in the sciences, social sciences and humanities
3. Small data / big data
Characteristic | Small data | Big data
Volume | Limited to large | Very large
Exhaustivity | Samples | Entire populations
Resolution and indexicality | Coarse & weak to tight & strong | Tight & strong
Relationality | Weak to strong | Strong
Velocity | Slow, freeze-framed | Fast
Variety | Limited to wide | Wide
Flexible and scalable | Low to middling | High
4. Urban big data
• Directed
o Surveillance: CCTV, drones/satellites
o Scaled public admin records
• Automated
o Automated surveillance
o Digital devices
o Sensors, actuators,
transponders, meters (IoT)
o Interactions and transactions
• Volunteered
o Social media
o Sousveillance/wearables
o Crowdsourcing
o Citizen science
5. Big data analytics
• The challenge of making sense of big data lies in coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, and semi-structured or unstructured nature
• The solution has been machine learning (AI), made possible by advances in computation and computational techniques
• Four broad classes of analytics:
o data mining and pattern recognition
o statistical analysis
o prediction, simulation, and optimization
o data visualization and visual analytics
7. New paradigms
• Big data, coupled with new data analytics, challenges established
epistemologies across the sciences, social sciences and humanities
• Transforming how we frame, ask and answer questions
• Some argue this is leading to new paradigms within and across disciplines
• For Kuhn (1962), paradigm shifts occur when the established science is unable to account for particular phenomena or answer key questions
• For Gray (2009) paradigm shifts are driven by new forms of measurement, data
and analytical techniques. He charts the evolution of science through four
broad paradigms
Paradigm | Nature | Form | When
First | Experimental science | Empiricism; describing natural phenomena | pre-Renaissance
Second | Theoretical science | Modelling and generalization | pre-computers
Third | Computational science | Simulation of complex phenomena | pre-big data
Fourth | Exploratory science | Data-intensive; statistical exploration and data mining | Now
8. Science
• Gray proposes that science is entering a fourth paradigm
driven by big data and new data analytics
• This is leading to a new era of data-intensive science and a radically new extension of the established scientific method
• Others suggest that big data ushers in a new era of
empiricism, wherein data can speak for themselves free of
theory
• The latter has gained credence outside the academy, especially within business circles, but its ideas have also taken root in data science
9. ‘The end of theory’
• Anderson (2008) argues that ‘the data deluge makes the scientific method obsolete’, and that the patterns and relationships contained within big data inherently produce meaningful and insightful knowledge
• “There is now a better way. Petabytes allow us to say: ‘Correlation is
enough.’ ... We can analyze the data without hypotheses about what it
might show. We can throw the numbers into the biggest computing
clusters the world has ever seen and let statistical algorithms find
patterns where science cannot. ... Correlation supersedes causation,
and science can advance even without coherent models, unified
theories, or really any mechanistic explanation at all. There’s no
reason to cling to our old ways.”
• Ayasdi software claims to be able to:
• “automatically discover insights -- regardless of complexity -- without
asking questions.”
10. ‘The end of theory’
• Moreover, analysts can employ an ensemble approach
• Literally hundreds of different algorithms can be applied to
a dataset to determine the best answer or a composite
model or explanation
• A radically different approach to that traditionally used
wherein the analyst selects an appropriate method based
on their knowledge of techniques and the data
• The logic is that insight is born from the data, not from theory
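The ensemble idea can be illustrated with a minimal Python sketch. It assumes a toy one-variable prediction task on synthetic data and three hypothetical candidate methods (`fit_mean`, `fit_linear`, `fit_nearest`) standing in for the hundreds a real ensemble might apply; the function names and the 70/30 train/test split are illustrative choices, not any particular product's method. Each candidate is fitted on training rows and scored on held-out rows, the lowest-error model is selected, and a composite model averages all candidates' predictions.

```python
import random

# Three hypothetical candidate "algorithms"; a real ensemble would try many more.
def fit_mean(xs, ys):
    """Always predict the mean of the training outcomes."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour: return the outcome of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def run_ensemble(xs, ys, candidates, split=0.7):
    """Fit every candidate on a training split, score each on the held-out
    rows, and return the best name, all scores, and the error of a
    composite model that averages every candidate's prediction."""
    k = int(len(xs) * split)
    models = {name: fit(xs[:k], ys[:k]) for name, fit in candidates.items()}
    scores = {name: mse(m, xs[k:], ys[k:]) for name, m in models.items()}
    best = min(scores, key=scores.get)

    def composite(x):
        return sum(m(x) for m in models.values()) / len(models)

    return best, scores, mse(composite, xs[k:], ys[k:])

if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]  # synthetic data
    candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
    best, scores, composite_error = run_ensemble(xs, ys, candidates)
    print("best:", best, "scores:", scores, "composite:", composite_error)
```

The selection here is driven entirely by held-out error, yet which candidates are offered in the first place, and which error measure counts as "best", remain choices made by the analyst.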
11. ‘The end of theory’
• A powerful and attractive set of ideas is at work in the empiricist epistemology, running counter to the mainstream deductive approach:
• big data can capture the whole of a domain and provide full resolution
• there is no need for a priori theory, models or hypotheses
• through the application of agnostic data analytics the data can speak for
themselves free of human bias or framing
• any patterns and relationships within big data are inherently meaningful and truthful
• meaning transcends context or domain-specific knowledge, thus can be
interpreted by anyone who can decode a statistic or data visualization
• offers the possibility of insightful, objective and profitable knowledge
without science or scientists
• These work together to suggest that a new mode of understanding the world is
being created, one in which the modus operandi is purely inductive in nature
12. ‘The end of theory’
• Empiricist thinking is problematic for four
reasons:
• Big data are both a representation and a sample, shaped
by the technology and platform used, the data ontology
employed, the regulatory environment, and are subject
to sampling bias
• Big data do not arise from nowhere, free from ‘the regulating force of philosophy’
• Big data cannot simply speak for themselves free of
human bias or framing
• Big data cannot be interpreted outside of context and
domain-specific knowledge
13. Data-driven science
• Data-driven science seeks to hold to the tenets of the scientific
method, but is more open to using a hybrid combination of
abductive, inductive and deductive approaches
• Differs from traditional, experimental deductive design in that it
seeks to generate hypotheses and insights ‘born from the data’
rather than ‘born from the theory’
• Seeks to incorporate a mode of induction into the research
design, though explanation through induction is not the intended
end-point.
• Instead, induction forms a new mode of hypothesis generation
before a deductive approach is employed
• Process of induction does not arise from nowhere, but is situated
and contextualised within a highly evolved theoretical domain
14. Data-driven science
• The epistemological strategy is to use guided knowledge discovery techniques to identify potential questions worthy of further examination and testing
• And instead of testing whether every relationship revealed has
veracity, attention is focused on those that seemingly offer the
most likely or valid way forward based on established science
• Approach is suited to extracting additional, valuable insights that
traditional ‘knowledge-driven science’ would fail to generate
• Data-driven approaches:
• are suited to exploring, extracting value from and making sense of massive, interconnected data sets
• foster interdisciplinary research that conjoins domain expertise
• will lead to more holistic and extensive models and theories of entire complex systems rather than elements of them
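This hybrid strategy, induction generating hypotheses that a deductive test then checks, can be sketched minimally in Python under stated assumptions: the data are synthetic, "knowledge discovery" is reduced to ranking variables by Pearson correlation with an outcome, and the deductive step is a permutation test run on a separate half of the rows that played no part in generating the hypotheses. The function `generate_then_test` and its `top_k` and `alpha` parameters are hypothetical names chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p(xs, ys, rng, n_perm=500):
    """Two-sided permutation test of the null hypothesis of no correlation."""
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)  # copy, so the caller's list is not reordered
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def generate_then_test(columns, outcome, top_k=5, alpha=0.05, seed=0):
    """Induction on one half of the rows proposes hypotheses; a deductive
    test on the other, untouched half decides which survive."""
    rng = random.Random(seed)
    rows = list(range(len(outcome)))
    rng.shuffle(rows)
    half = len(rows) // 2
    explore, confirm = rows[:half], rows[half:]

    def take(values, idx):
        return [values[i] for i in idx]

    # Inductive step: rank every variable by |r| with the outcome (exploration rows only).
    def strength(name):
        return abs(pearson(take(columns[name], explore), take(outcome, explore)))

    hypotheses = sorted(columns, key=strength, reverse=True)[:top_k]

    # Deductive step: test each candidate on the confirmation rows.
    y_confirm = take(outcome, confirm)
    confirmed = [
        name for name in hypotheses
        if permutation_p(take(columns[name], confirm), y_confirm, rng) < alpha
    ]
    return hypotheses, confirmed

if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    columns = {f"noise_{i}": [rng.gauss(0, 1) for _ in range(n)] for i in range(50)}
    columns["signal"] = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [s + rng.gauss(0, 1) for s in columns["signal"]]  # synthetic outcome
    hypotheses, confirmed = generate_then_test(columns, outcome)
    print("proposed:", hypotheses)
    print("survived testing:", confirmed)
```

Splitting the rows matters: testing a hypothesis on the same data that suggested it would make almost any strong-looking pattern appear confirmed. Some noise variables may still pass at `alpha = 0.05`, which is one reason attention is focused on relationships that make sense in light of established science.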
15. Social sciences and humanities
• The effect of big data/data analytics in the humanities and
social sciences is less certain
• These areas of scholarship are highly diverse in their
philosophical underpinnings, with only some scholars
employing the epistemology common in the sciences
• Whilst there is a history of quantitative and positivistic scholarship in the social sciences, it is much rarer in the humanities
• There has been a strong post-positivistic shift in many
social science disciplines
16. Computational social science
• For positivistic scholars in the social sciences, big data offers the opportunity to develop more sophisticated, wider-scale, finer-grained models of human life. To shift:
• from data-scarce to data-rich studies of societies
• from static snapshots to dynamic unfoldings
• from coarse aggregations to high resolutions
• from relatively simple models to more complex, sophisticated
simulations
• The potential is for studies with much greater breadth, depth, scale and timeliness that are inherently longitudinal
• The variety, exhaustivity, resolution and relationality of the data, plus new techniques, address some of the critiques of positivistic scholarship (reductionism and universalism) by providing more finely grained, sensitive and nuanced analysis
17. Social sciences
• For post-positivist scholars, big data offers both opportunities and challenges
• Opportunities:
• a proliferation, digitisation and interlinking of a diverse set of analogue and unstructured data, much of it new (e.g., social media) and much of it previously difficult to access (e.g., millions of books, documents, newspapers, photographs, art works, material objects, etc.)
• new tools of data curation, management and analysis that can handle massive numbers of data objects
• Challenges:
• analysis is mechanistic, atomizing and parochial, reducing diverse individuals and complex multidimensional social structures to mere data points; it identifies trends but not what produces them
• struggles with the social and with context
• creates bigger haystacks
• identifies but does not address problems
• tends to marginalize metaphysical and normative questions
• erosion of domain level expertise
• promotion of empiricist/quantitative approaches and skewing of funding
towards big data
• skills and knowledge deficit
18. Digital humanities
• These opportunities and challenges are being keenly felt in the humanities, with the rise of the digital humanities
• Rather than providing a close reading of a handful of
novels or photographs, or a couple of artists and their
work, it becomes possible to search, connect and find
patterns across a very large number of related works
• Digital humanities advocates are broadly divided into two epistemological camps
• Those that believe that new techniques (counting, graphing, mapping, data mining) bring methodological rigour and objectivity to disciplines that have heretofore been unsystematic and random in their focus and approach
• Those that see the techniques as a supplement to, rather than
replacement for existing humanities methods and theory
building
19. Digital humanities
• Both camps tend to use descriptive rather than inferential statistics
• The claims of the former have opened up an epistemological debate centred on close versus distant reading/interpretation and the ability of algorithms to parse meaning and context
• DH seen by some as mechanistic and reductionist
(reduces literature and art to data)
• Identifies patterns but not processes or meaning
• Sacrifices complexity, specificity, context, depth and
critique for scale, breadth, automation, descriptive
patterns and the impression that interpretation does not
require deep contextual knowledge
• Raises other concerns similar to those in the social sciences
20. What happens to small data studies?
• Big data doesn’t replace or negate small data
• Small data have a proven track record of answering specific questions, with established procedures, methods, etc.
• Studies can be much more finely tailored
• Small data studies seek to mine gold from carefully working a
narrow seam, whereas big data studies seek to extract nuggets
through open-pit mining, scooping up and sieving huge tracts of
land
• Small data will, however, increasingly be made more big data-like through the development of new data infrastructures that:
• pool, scale and link small data in order to create larger datasets,
• encourage sharing and re-use
• open them up to combination with big data and analysis using big
data analytics
21. Conclusion
• Big data and new analytics constitute a data revolution that fundamentally alters the nature of data and how we make sense of them (a disruptive innovation)
• They are starting to transform how research is conducted, organised and managed, enabling new approaches to data generation and analysis that make it possible to ask and answer questions in new ways
• They also pose significant social, political and ethical questions
• As new technologies and analytics develop, these transformations will extend and deepen, raising a series of conceptual and methodological challenges across the sciences, social sciences and humanities
• They have the potential to usher in new paradigms, but the more likely outcome is further pluralism in approaches
22. Rob.Kitchin@nuim.ie
@robkitchin
Kitchin, R. and McArdle, G. (2016) What makes big data, big data? Exploring the ontological characteristics of 26 datasets. Big Data & Society 3: 1–10.
Kitchin, R. and Lauriault, T. (2014) Towards critical data studies. SSRN.
Kitchin, R. and Lauriault, T. (2015) Small data in the era of big data. GeoJournal 80(4): 463–475.
Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data & Society 1: 1–12.
Kitchin, R. (2014) The real-time city? Big data and smart urbanism. GeoJournal 79(1): 1–14.
Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262–267.
http://www.nuim.ie/progcity
@progcity