Data, Science, Society - Claudio Gutierrez, University of Chile


This presentation was given at the Final Conference of the LEARN Project, 5 May 2017.

Published in: Data & Analytics

  1. Data, Science, Society. LEARN Final Conference, CEPAL, London, May 5th, 2017. Claudio Gutiérrez, DCC, Universidad de Chile / CIWS
  2. The foundations of experience (since we absolutely must get down to this) have been non-existent or very weak; nor has a collection or store of particulars yet been sought or made, able or in any way adequate, either in number, kind or certainty, to inform the intellect. [...] Natural history contains nothing that has been researched in the proper ways, nothing verified, nothing counted, nothing weighed, nothing measured. FRANCIS BACON, APHORISMS, XCVIII
  3. A tentative agenda: I. Torrents of Data; II. The notion of Data; III. Research and Scientific Data; IV. Data and Society; V. Concluding Remarks
  5. There are already too many books. Even when we drastically reduce the number of subjects to which man must direct his attention, the quantity of books that he must absorb is so enormous that it exceeds the limits of his time and his capacity of assimilation. [...] Here then is the drama: the book is indispensable at this stage in history, but the book is in danger because it has become a danger for man. JOSÉ ORTEGA Y GASSET, THE MISSION OF THE LIBRARIAN, 1935.
  6. TWO DIMENSIONS OF THE PROBLEM: QUANTITY (Ortega’s problem): too many objects, beyond our time limits and our human capacity for assimilation. QUALITY (the new problem): the object itself is beyond our intelligibility, with huge sizes and no explicit semantics. The essence: beyond human scale.
  7. (Figure by Hans Moravec)
  8. Human scale: Byte (B) ∼ 10⁰, a character; Kilo (KB) ∼ 10³, written text; Mega (MB) ∼ 10⁶, an image, music; Giga (GB) ∼ 10⁹, movies. Beyond human scale: Tera (TB) ∼ 10¹², the US Library of Congress; Peta (PB) ∼ 10¹⁵, a large data center; Exa (EB) ∼ 10¹⁸, all words ever spoken; Zetta (ZB) ∼ 10²¹, the amount of global data.
  9. Data science portals, data portals of organizations, online libraries, APIs and services for data, online datasets and journals, visualization and processing tools, legal and regulatory frameworks, Open Data initiatives, and more: how to organize them?
  10. PARAPHRASING A CLASSICAL THESIS ABOUT SOCIAL CHANGE: At a certain stage of development, the material forces of society begin producing more symbolic material than the existing social relations can digest. From forms of development of culture, these relations turn into its fetters. Then begins an era of information upheaval.
  11. SUMMARY AND WORKING HYPOTHESIS: The symbolic world is growing so fast and so vast that it escapes our “natural” human capacities to handle it. We feel that an obscure, daunting and fundamentally unintelligible (parallel) world is growing before our eyes. The formerly vast and volatile symbolic world is being materialized in digital data (the virtual world), making obsolete the conceptual models used to deal with it. Moral: we need to understand what “data” is!
  13. NECESSARY CLARIFICATION: Data ≠ information. Data ≠ knowledge. Traditional view: knowledge = information + metainformation; information = data + metadata; data = ?
  14. I. At the most basic and abstract level, data is a distinction, a “fracture in the fabric of Being”. Data is the most basic layer in the symbolic world. It has no meaning by itself, but it is the source of meaning.
  15. II. By data we will mean materialized (digitally recorded) data. Despite its ontological status between the material and the intangible, data is material. But it makes sense only in the virtual world.
  16. III. The distinctions that define data assume an implicit context. This network of meanings is not stated explicitly, that is, not specified in the data itself. This allows manifold interpretations of the same data from different points of view, to further explore new dimensions, etc.
  17. IV. Data is the starting point for our discussion. Data is something given, the basic element of our field. From this point of view, our concern at this stage is not the possible meanings of data, but data as “material” elements.
  18. DATA SCIENCE AS THE CHEMISTRY OF THE VIRTUAL WORLD: data is to the virtual world what atoms are to the material world.
  19. (Figure by TechTarget)
  20. THREE NOVELTIES/CHALLENGES: a. Dual nature; b. Scale; c. Mode of consumption
  22. WHY THIS HYPE NOW?
  23. (Figure by Jim Gray)
  24. DIAGNOSIS FROM OECD (1996): Knowledge, as embodied in human beings (as “human capital”) and in technology, has always been central to economic development. But only over the last few years has its relative importance been recognised, just as that importance is growing. The OECD economies are more strongly dependent on the production, distribution and use of knowledge than ever before.
  25. A BASIC CHAIN OF DEDUCTIONS: The economy is strongly dependent on (scientific) knowledge. Science today is heavily based on data. Therefore: “Data has become the new oil.”
  26. (Figure from
  27. BLURRING BOUNDARIES I: RESEARCH DATA (experiment/interference) versus COMMON DATA (observation/contemplation)
  28. BLURRING BOUNDARIES II: EXTENSIONAL, static data (datasets, collections/networks of datasets) versus INTENSIONAL, dynamic data (streaming, URIs, APIs, etc.)
  30. [...] nature of these resources. Some knowledge commons reside at the local level, others at the global level or somewhere in between. [...] Figure 1.1, Types of goods, classified by SUBTRACTABILITY (low/high) and EXCLUSION (difficult/easy): difficult exclusion and low subtractability: public goods (useful knowledge, sunsets); difficult exclusion and high subtractability: common-pool resources (libraries, irrigation systems); easy exclusion and low subtractability: toll or club goods (journal subscriptions, day-care centers); easy exclusion and high subtractability: private goods (personal computers, doughnuts). Source: adapted from V. Ostrom and E. Ostrom 1977.
  31. DATA AS PUBLIC GOOD: A public good has two critical properties, non-rivalrous consumption–the consumption of one individual does not detract from that of another–and non-excludability–it is difficult if not impossible to exclude an individual from enjoying the good. [...] Knowledge is a global public good requiring public support at the global level. Joseph Stiglitz, 1998.
  32. OECD VIEW OF OPEN ACCESS: Openness means access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based. OECD, 2007.
  33. NSF’S PRINCIPLES: Agencies must adopt a presumption in favor of openness to the extent permitted by law and subject to privacy, confidentiality, security, or other valid restrictions. Open data are publicly available data structured in a way to be fully accessible and usable. This is important because data that is open, available, and accessible will help spur innovation and inform how agencies should evolve their programs to better meet the public’s needs. Open Data at NSF
  34. OPEN DATA MOVEMENT: Open data is data that can be freely used, re-used and redistributed by anyone –subject only, at most, to the requirement to attribute and sharealike. Open Data Handbook
  36. (Figure from
  37. LIMITATIONS OF OPEN ACCESS: • DUAL NATURE OF DATA: material and intangible, and non-material and non-intangible. • SCALE: Open access works well at human scale (this is the origin of the open movements and anti-closure movements). It needs second thoughts at large scale. • CYCLE AND ECOSYSTEM: Data needs support in all parts of its cycle, and access is needed for all parts of the ecosystem of science.
  38. (Figure by Puneet Kishor)
  39. ACCESS IS NOT ENOUGH: NEED TO “REFINE”. Nature Scientific Data Journal: “Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data.”
  40. DATA ITSELF AS ECOSYSTEM: The main challenge is how we would like to manage and govern this new good, including its whole cycle, that is, how it is generated, accessed, stored, curated, processed and delivered.
  41. DATA AS COMMONS: The essential questions for any commons analysis are inevitably about equity, efficiency and sustainability. Equity refers to issues of just or equal appropriation from, and contribution to, the maintenance of a resource. Efficiency deals with optimal production, management and use of the resource. Sustainability looks at the outcomes over the long term. Ch. Hess, E. Ostrom, 2006.
  42. Thank you!