Data, Science, Society - Claudio Gutierrez, University of Chile
LEARN Final Conference, CEPAL, London, May 5th, 2017
Claudio Gutiérrez • DCC, Universidad de Chile / CIWS
The foundations of experience (since we absolutely must get
down to this) have been non-existent or very weak; nor has a
collection or store of particulars yet been sought or made, able
or in any way adequate, either in number, kind or certainty, to
inform the intellect. [...] Natural history contains nothing that
has been researched in the proper ways, nothing verified,
nothing counted, nothing weighed, nothing measured.
FRANCIS BACON, APHORISMS, XCVIII
A tentative agenda
I. Torrents of Data
II. The notion of Data
III. Research and Scientific Data
IV. Data and Society
V. Concluding Remarks
There are already too many books. Even when we drastically
reduce the number of subjects to which man must direct his
attention, the quantity of books that he must absorb is so
enormous that it exceeds the limits of his time and his capacity
of assimilation. [...] Here then is the drama: the book is
indispensable at this stage in history, but the book is in danger
because it has become a danger for man.
JOSÉ ORTEGA Y GASSET. THE MISSION OF THE LIBRARIAN.
TWO DIMENSIONS OF THE PROBLEM:
QUANTITY (Ortega’s problem): too many objects. Beyond
our time limits, human capacity of assimilation.
QUALITY (New problem): the object itself is beyond our
intelligibility. Huge sizes and no explicit semantics.
The essence: beyond human scale
Byte   B   ∼ 10^0   a character
Kilo   KB  ∼ 10^3   written text
Mega   MB  ∼ 10^6   image, music
Giga   GB  ∼ 10^9   movies
Tera   TB  ∼ 10^12  US Congress Library
Peta   PB  ∼ 10^15  Large data center
Exa    EB  ∼ 10^18  All words ever spoken
Zetta  ZB  ∼ 10^21  Amount of global data
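To get a feel for these orders of magnitude, a byte count can be mapped to the prefixes in the table above. The following is only an illustrative sketch: the function names `describe_size` and `reading_time_years`, the decimal (power-of-1000) convention, and the reading rate of 30 characters per second are all assumptions made for the example, not figures from the talk.

```python
# Decimal (SI) prefixes from the table, smallest to largest.
PREFIXES = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

SECONDS_PER_YEAR = 365 * 24 * 3600


def describe_size(num_bytes: int) -> str:
    """Scale a byte count to the largest prefix in the table that fits."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    index = 0
    # Divide by 1000 until the value drops below 1000 or prefixes run out.
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {PREFIXES[index]}"


def reading_time_years(num_bytes: int, bytes_per_second: float = 30.0) -> float:
    """Years of non-stop reading at an ASSUMED rate of one byte per character.

    The default of 30 characters per second is a hypothetical,
    deliberately generous reading speed.
    """
    return num_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for exponent in (0, 3, 6, 9, 12, 15, 18, 21):
        size = 10 ** exponent
        print(describe_size(size), "->", f"{reading_time_years(size):.3g} years")
```

Under these assumptions, already at the tera scale the reading time exceeds a thousand years, which is one concrete way to read "beyond human scale".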
+ Data science portals
+ Data portals of organizations
+ Online libraries
+ APIs and services for data
+ Online datasets and journals
+ Visualization and processing tools
+ Legal and regulatory frameworks
+ Open Data initiatives
+ · · ·
. . . how to organize them?
PARAPHRASING A CLASSICAL THESIS ABOUT SOCIAL CHANGE:
At a certain stage of development, the material forces of society
began producing more symbolic material than the existing
social relations can digest. From forms of development of the
culture these relations turn into their fetters. Then begins an era
of information upheaval.
SUMMARY AND WORKING HYPOTHESIS:
The symbolic world is growing so fast and so vast that it escapes
our “natural” human capacities to handle it. We feel that an
obscure and daunting, fundamentally unintelligible, (parallel)
world is growing in front of our eyes.
The formerly vast and volatile symbolic world is being
materialized in digital data (the virtual world), thus making
obsolete the conceptual models used to deal with it.
Moral: Need to understand what is “data”!
Data ≠ information    Data ≠ knowledge
knowledge = information + metainformation
information = data + metadata
data = ?
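The two equations above can be read as a layered data structure. The sketch below is one possible rendering, not a definition from the talk: the class names `Information` and `Knowledge`, the `interpret` helper, and the `"type"` metadata key are hypothetical and introduced only for illustration. It shows the same raw bytes yielding different information under different metadata.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class Information:
    """information = data + metadata (names are illustrative assumptions)."""
    data: bytes                 # raw, uninterpreted distinctions
    metadata: Dict[str, Any]    # context needed to read the data


@dataclass
class Knowledge:
    """knowledge = information + metainformation."""
    information: Information
    metainformation: Dict[str, Any]  # e.g. provenance, method, purpose


def interpret(info: Information) -> Union[int, str]:
    """Read the raw bytes according to the hypothetical 'type' metadata key."""
    kind = info.metadata.get("type")
    if kind == "integer":
        return int.from_bytes(info.data, byteorder="big")
    if kind == "text":
        return info.data.decode("ascii")
    raise ValueError("without metadata the data has no reading")


raw = b"A"  # a single byte; by itself it says nothing
as_number = Information(raw, {"type": "integer"})
as_text = Information(raw, {"type": "text"})

print(interpret(as_number))  # the byte read as a number
print(interpret(as_text))    # the same byte read as a character
```

The design choice is deliberate: `data` is kept as opaque bytes, so every reading has to come from the metadata layer rather than from the data itself.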
——– I ——–
At the most basic and abstract level, data is a distinction, a
“fracture in the fabric of Being”. Data is the most basic layer in
the symbolic world. It has no meaning by itself, but is the source
——– II ——–
By data we will mean materialized (digitally recorded) data.
Despite its ontological status between the material and the
intangible, data is material. But it makes sense only in the
——– III ——–
The distinctions that define data assume an implicit context.
This network of meanings is not stated explicitly, that is, not
specified in the data itself. This allows manifold interpretations
of the same data from different points of view, to further explore
new dimensions, etc.
——– IV ——–
Data is the starting point for our discussion. Data is something
given, the basic elements of our field. From this point of view
our concern at this stage is not the possible meanings of data,
but data as “material” elements.
DATA SCIENCE AS THE CHEMISTRY OF THE VIRTUAL WORLD
DIAGNOSIS FROM OECD (1996)
Knowledge, as embodied in human beings (as “human capital”)
and in technology, has always been central to economic
development. But only over the last few years has its relative
importance been recognised, just as that importance is
growing. The OECD economies are more strongly dependent
on the production, distribution and use of knowledge than ever
A BASIC CHAIN OF DEDUCTIONS
The economy is strongly dependent on (scientific) knowledge.
Science today is heavily based on data.
“Data has become the new oil.”
[...] Some knowledge commons reside at the local level, others at the
global level or somewhere in between. [...]
[Figure: Types of goods; only the cell “Toll or club goods” survives.
Source: Adapted from V. Ostrom and E. Ostrom 1977]
DATA AS PUBLIC GOOD
A public good has two critical properties, non-rivalrous
consumption – the consumption of one individual does not
detract from that of another – and non-excludability – it is difficult
if not impossible to exclude an individual from enjoying the
good. [...] Knowledge is a global public good requiring public
support at the global level.
Joseph Stiglitz, 1998.
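The two properties in this definition span the usual two-by-two typology of goods, the one that the "Types of goods" figure adapts from V. Ostrom and E. Ostrom. A minimal sketch follows; the function name `classify_good` and the boolean encoding are assumptions made for illustration, and real goods rarely fall cleanly into one cell.

```python
def classify_good(rivalrous: bool, excludable: bool) -> str:
    """Place a good in the two-by-two typology of rivalry and excludability.

    rivalrous:  one person's consumption detracts from another's.
    excludable: it is feasible to keep individuals from enjoying the good.
    """
    if rivalrous and excludable:
        return "private good"
    if not rivalrous and excludable:
        return "toll or club good"
    if rivalrous and not excludable:
        return "common-pool resource"
    return "public good"


# Knowledge in the sense of the quotation: non-rivalrous and non-excludable.
print(classify_good(rivalrous=False, excludable=False))

# The same knowledge behind a paywall becomes excludable.
print(classify_good(rivalrous=False, excludable=True))
```

Read this way, closing access to data does not change its non-rivalrous nature; it only moves it from the public-good cell to the toll-good cell.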
OECD VIEW OF OPEN ACCESS
Openness means access on equal terms for the international
research community at the lowest possible cost, preferably at
no more than the marginal cost of dissemination. Open access
to research data from public funding should be easy, timely,
user-friendly and preferably Internet-based.
Agencies must adopt a presumption in favor of openness to the
extent permitted by law and subject to privacy, confidentiality,
security, or other valid restrictions.
Open data are publicly available data structured in a way to be
fully accessible and usable. This is important because data that
is open, available, and accessible will help spur innovation and
inform how agencies should evolve their programs to better
meet the public’s needs.
Open Data at NSF
OPEN DATA MOVEMENT
Open data is data that can be freely used, re-used and
redistributed by anyone – subject only, at most, to the
requirement to attribute and sharealike.
Open Data Handbook
LIMITATIONS OF OPEN ACCESS
• DUAL NATURE OF DATA: material and intangible and
non-material and non-intangible
• SCALE: Open access works well at human scale (this is
the origin of open movements and anti-closure movements).
It needs second thoughts at big scale.
• CYCLE AND ECOSYSTEM: Data needs support in all parts
of the cycle. Need access for all parts of the ecosystem of
ACCESS IS NOT ENOUGH: NEED TO “REFINE”
Nature Scientific Data Journal:
“Scientific Data is a peer-reviewed, open-access journal for
descriptions of scientifically valuable datasets, and research
that advances the sharing and reuse of scientific data.”
DATA ITSELF AS ECOSYSTEM
Main challenge is how we would like to manage and govern
this new good, including its whole cycle, that is, how it is
generated, accessed, stored, curated, processed and
DATA AS COMMONS
The essential questions for any commons analysis are
inevitably about equity, efficiency and sustainability. Equity
refers to issues of just or equal appropriation from, and
contribution to, the maintenance of a resource. Efficiency deals
with optimal production, management and use of the resource.
Sustainability looks at the outcomes over the long term.
Ch. Hess, E. Ostrom, 2006.