The Nature of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
Jacco van Ossenbruggen, Laura Hollink, Myriam C. Traub
Do you know the limitations of your tool and their impact on
Of course, I only use unbiased tools on unbiased data for my
Biases in the source data are unknown because the data were not created for scientific purposes.
Biases in the tool chain are unknown because we use black-box computational workflows and tools.
We need a systematic fit-for-purpose assessment of digital tools and data, starting with the ability to measure and quantify bias.
Do you know
… even simple word count tools return different counts for the same text?
… most search engines are biased by document length?
… the version of the Google Ngram Viewer corpus may affect your trend analysis?
… OCR performs better on expensive newspapers targeting the social elite than on titles printed on cheaper paper?
… the Twitter Streaming API does not serve a random sample of the complete “firehose”?
… the performance of predictive models on new data cannot be predicted, not even by the developers of the models?
Traub, Myriam C., and Jacco van Ossenbruggen. Workshop on Tool Criticism in the Digital Humanities, Amsterdam, 22 May 2015.
If you know of other examples of technology-induced bias, please tweet!
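To make the word count question concrete, here is a minimal sketch that counts the words of one sentence under three plausible tokenization rules. The sample sentence and all three rules (and the function names) are hypothetical, chosen only for illustration; they are not the rules of any particular word count tool.

```python
import re

# Hypothetical sample sentence with hyphens, contractions, punctuation and a number.
TEXT = "The state-of-the-art OCR isn't perfect: it's 95% accurate."


def count_whitespace(text: str) -> int:
    """Rule 1: a word is any run of non-whitespace characters."""
    return len(text.split())


def count_word_chars(text: str) -> int:
    """Rule 2: a word is any run of regex word characters (\\w+),
    so hyphens and apostrophes split words."""
    return len(re.findall(r"\w+", text))


def count_letter_words(text: str) -> int:
    """Rule 3: a word is a run of ASCII letters, optionally joined by
    internal hyphens or apostrophes; numbers are not counted."""
    return len(re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text))


if __name__ == "__main__":
    for rule in (count_whitespace, count_word_chars, count_letter_words):
        print(rule.__name__, rule(TEXT))
```

For this sentence the three rules give 8, 13 and 7 words. None of them is wrong; each encodes a different, usually undocumented, decision about hyphens, contractions and numbers.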
Bick and Müller observe that years of experience in scientific data collection methods have informed us about each method’s limitations with respect to
claim, however, that this is still largely missing in “new” research methods based on data that has been produced for purposes other
Scholars need to adopt a critical attitude towards the use of tools and perform a
Tool makers need to publish the source code of their tools and document their requirements
Data providers need to make
Data scientists need to develop quality metrics that measure bias and take the research tasks into account.
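A quality metric of this kind can be sketched for the earlier question about search engines being biased by document length. This is an illustration under stated assumptions, not the authors' method: the three-document collection and the query set (standing in for a research task) are hypothetical, the two scorers are deliberately simplistic toys, and the metric counts how often each document reaches the top-k results (a retrievability-style measure) and summarizes the inequality of those counts with a Gini coefficient.

```python
# Hypothetical toy collection: two short documents and one long one.
DOCS = {
    "short_a": "ocr archive",
    "short_b": "bias search",
    "long_c": "ocr archive bias search ocr archive bias search newspaper twitter",
}
# Hypothetical query set standing in for a research task.
QUERIES = ["ocr", "archive", "bias", "search", "newspaper", "twitter"]


def raw_tf(term: str, tokens: list) -> float:
    """Toy scorer: raw term frequency, so longer documents can accumulate more matches."""
    return float(tokens.count(term))


def normalized_tf(term: str, tokens: list) -> float:
    """Toy scorer: term frequency divided by document length."""
    return tokens.count(term) / len(tokens)


def retrieval_counts(docs: dict, queries: list, score, k: int = 1) -> dict:
    """Count how often each document is in the top-k results of a query
    (a retrievability-style measure with equal weight per query)."""
    tokens = {doc_id: text.split() for doc_id, text in docs.items()}
    counts = {doc_id: 0 for doc_id in docs}
    for query in queries:
        scored = [(score(query, toks), doc_id) for doc_id, toks in tokens.items()]
        # Only documents that match the query at all are candidates.
        matches = [(s, doc_id) for s, doc_id in scored if s > 0]
        # Highest score first; ties broken by document id for determinism.
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, doc_id in matches[:k]:
            counts[doc_id] += 1
    return counts


def gini(values) -> float:
    """Gini coefficient of the counts: 0 means every document is retrieved
    equally often; higher values mean retrieval is concentrated on fewer documents."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    for name, scorer in [("raw TF", raw_tf), ("normalized TF", normalized_tf)]:
        counts = retrieval_counts(DOCS, QUERIES, scorer)
        print(f"{name}: {counts}, Gini = {gini(counts.values()):.3f}")
```

With raw term frequency the long document wins all six queries (counts 0, 0, 6; Gini about 0.667), while the length-normalized scorer spreads them evenly (2, 2, 2; Gini 0.000). The toy data is built to make the contrast stark; on real collections normalization can also over-favor short documents, which is why the metric, computed over queries that reflect the actual research task, matters more than any single scorer.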
Bick, Wolfgang, and Paul J. Müller. "The Nature of Process-Produced Data: Towards a Social-Scientific Source Criticism." In Historical Social Research: The Use of Historical and Process-Produced Data, 1980.