"Assuming the Burden of Proof: Data as Evidence in Science and Public Policy,": Global Research Data Infrastructures (GRDI) 2020, Stellenbosch, South Africa October 23-24, 2010
Assuming the Burden of Proof: Data as Evidence in Science and Public Policy (with lessons from Biodiversity Conservation)
By Tom Moritz, Monk, Bean & Moritz, Los Angeles
Prepared for GRDI2020, Stellenbosch, South Africa, October 23-24, 2010
“In order to diagnose, one must observe and reason.” – Galen of Pergamon (2nd Century AD) [Luis Garcia-Ballester, 2002, Galen and Galenism, Burlington: Ashgate-Variorum, page 1663]
The Legacy of “Evidence”
“…the great trouble with the world was that which survived was held in hard evidence as to past events. A false authority clung to what persisted, as if those artifacts of the past which had endured had done so by some act of their own will.” -- Cormac McCarthy, The Crossing
“E pur si muove!”
“After 350 Years, Vatican Says Galileo Was Right: It Moves” by Alan Cowell, New York Times, October 31, 1992
“ROME, Oct. 30 -- More than 350 years after the Roman Catholic Church condemned Galileo, Pope John Paul II is poised to rectify one of the Church's most infamous wrongs -- the persecution of the Italian astronomer and physicist for proving the Earth moves around the Sun.
“With a formal statement at the Pontifical Academy of Sciences on Saturday, Vatican officials said the Pope will formally close a 13-year investigation into the Church's condemnation of Galileo in 1633. The condemnation, which forced the astronomer and physicist to recant his discoveries, led to Galileo's house arrest for eight years before his death in 1642 at the age of 77.
“The dispute between the Church and Galileo has long stood as one of history's great emblems of conflict between reason and dogma, science and faith. The Vatican's formal acknowledgement of an error, moreover, is a rarity in an institution built over centuries on the belief that the Church is the final arbiter in matters of faith.”
http://www.nytimes.com/1992/10/31/world/after-350-years-vatican-says-galileo-was-right-it-moves.html
“PUBLIC REASON?”
“Nothing is required for this enlightenment, however, except freedom; and the freedom in question is the least harmful of all, namely, the freedom to use reason publicly in all matters.” -- Immanuel Kant, from “What is Enlightenment?”
Recommendation: “A Global Knowledge Commons”?
“There is a universal public right to access and to use of all forms of data and information essential to the formation of public policy, to public decision making and to public regulation. Similarly, all data and information essential to the evaluation and understanding of public policy must be freely and openly available in a form that is complete, meaningful and usable. Universal science literacy is a fundamental, corollary right.”
“KNOWLEDGE RESOURCES”: Technology
What is the effect if cost barriers and transaction costs are added at every level?
Repatriation of biodiversity information through Clearing House Mechanism of the Convention on Biological Diversity and Global Biodiversity Information Facility; Views and experiences of Peruvian and Bolivian non-governmental organizations. Ulla Helimo, Master’s Thesis, University of Turku, Department of Biology, 6.10.2004, p. 11. http://enbi.utu.fi/Documents/Ulla%20Helimo%20PRO%20GRADU.pdf [06-06-05]
“Transparency” and “Accountability”
• “Transparency” means visibility by intentional disclosure -- in the digitally networked internet environment this implies open, free and effective access
• “Accountability” means proactive responsibility for the “quality of data”, ensuring that data are open, free, informative and usable -- this means the ability to satisfy a rigorous audit of both the (logical) validity and (technical) integrity of all data
The erosion of the ethic of data sharing: “Could you patent the sun?”
In a 1955 interview with Edward R. Murrow, Jonas Salk responded to a question suggesting the patenting of the polio vaccine: “Could you patent the sun?”
And then, ca 50 years later: in a 2002 study, 47% of surveyed geneticists had been rejected at least once in their efforts to gain access to key genetics data (this result indicated a significant increase over a previous survey).
E G Campbell et al. “Data Withholding in Academic Genetics: Evidence From a National Survey,” JAMA, Jan 2002; 287: 473-480; Massachusetts General Hospital (2006). “Studies examine withholding of scientific data among researchers, trainees: Relationships with industry, competitive environments associated with research secrecy.” News release (January 25). Massachusetts General Hospital. http://www.massgeneral.org/news/releases/012506campbell.html, as of November 17, 2008.
Is scientific knowledge a “commodity”? What are the implications of this view?
Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,” MIT Sloan Management Review, 44 (2) Fall, 2002: 77.
References to “Intellectual Property” in U.S. federal cases
“Professor Hank Greely”, cited in Lessig, L. The future of ideas: the fate of the commons in a connected world. NY, Random House, 2001, p. 294.
“Evidence-Based” Policy: Development of the Science / Policy Dynamic
• “Jeffersonian Science” as a middle alternative to the opposed paradigms of:
– “pure” - “Newtonian” science (“omniscience”)
– “applied” - “Baconian” science (“omnipotence”)
• The “Carter-Press initiative” (ca 1980): US governmental agencies were surveyed about the primary research most likely to yield results of practical benefit to the agency mission
• IPCC and “climate-gate”: as public exposure and potential economic impact increase, the “fault tolerance” of the system diminishes
• “Fault tolerance”: viz recent IPCC efforts to regulate expressions of probability and levels of confidence (practical differences among the 3 IPCC Working Groups)
SEE: Gerald Holton, “Science and Anti-science,” and
Political Power and Knowledge
[Figure: actors placed by political power (High ???) and by knowledge in Western scientific terms (High / Low): Politicians; Administrators or Managers; Analysts-Technicians; Scientists] (Sutton, 1999)
From: Organizaciones que aprenden, paises que aprenden: lecciones y AP en Costa Rica, by Andrea Ballestero, Director, ELAP
“A representation of the cholera epidemic of the nineteenth century” http://history.nih.gov/exhibits/history/index.html
Complex knowledge resources support research
Research Information Network and British Library, “Patterns of information use and exchange: case studies of researchers in the life sciences” http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf
Hilary Spencer (Nature Publishing Group), “Thoughts on the sharing of data and research materials and the role of journal policies,” http://www.stanford.edu/~vcs/Nov21/hilary_spencer_rdcscsJan2010.pdf
“Forty-seven percent of geneticists who asked other faculty for additional information, data, or materials regarding published research reported that at least 1 of their requests had been denied in the preceding 3 years.”*
“Of a potential 3000 respondents, 2893 were eligible and 1849 responded, yielding an overall response rate of 64%. We analyzed a subsample of 1240 self-identified geneticists and made a limited number of comparisons with 600 self-identified nongeneticists.”+
E G Campbell, et al. “Data Withholding in Academic Genetics: Evidence From a National Survey,” JAMA. 2002;287(4):473-480. http://jama.ama-assn.org/content/287/4/473.abstract
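The survey figures quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are hypothetical and the numbers are those given in the quotation.

```python
# Figures as quoted from Campbell et al. (2002); names are illustrative only.
eligible = 2893
responded = 1849
geneticists = 1240
nongeneticists = 600

# Response rate = respondents / eligible individuals.
response_rate = responded / eligible
print(f"Response rate: {response_rate:.1%}")  # 63.9%, reported as 64%

# Respondents not covered by the two quoted subsamples.
unclassified = responded - (geneticists + nongeneticists)
print(f"Respondents outside the two quoted subsamples: {unclassified}")  # 9
```

The nine respondents left over are simply not described in the quoted passage; the sketch makes no assumption about why.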
“Qualitative”? / “Quantitative”?
“Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in battle – they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.” A N Whitehead, An introduction to mathematics, Oxford Univ Press, London, 1911.
Interdisciplinarity and The “Unity of Science”
• Applied sciences -- like biodiversity conservation or epidemiology or earthquake engineering -- must inevitably draw on the practices and knowledge bases of multiple domains and disciplines.
• Domains and disciplines previously inaccessible to each other may now share knowledge much more effectively and consequentially.
• Discovery and combination of previously unavailable, dissociated, heterogeneous data creates the possibility for construction of longitudinal studies and new meta-analyses.
New capacity for historical (“longitudinal”) studies: ICOADS Marine Data Rescue
Scott Woodruff et al. “ICOADS Marine Data Rescue: Status and Future CDMP Priorities”: http://icoads.noaa.gov/reclaim/pdf/marine-data-rescue_v15.pdf
New capacity for historical (“longitudinal”) studies: NCAR Research Data Archive (RDA)
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 7. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
The “ecology”, development and maturity of scientific domains and disciplines
“the institutional ecology of the digital environment” (Yochai Benkler)
Sector and Jurisdictional Scale
THE ROLE OF SCIENTIFIC AND TECHNICAL DATA AND INFORMATION IN THE PUBLIC DOMAIN: PROCEEDINGS OF A SYMPOSIUM. Julie M. Esanu and Paul F. Uhlir, Editors. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 5
“Big Science” / “Small Science”?
[Figure: a scale running from “Small Science” to “BIG Science”, labelled: GRIDS International Data Collaborative Centers; Research Effort; Individual National Disciplinary Initiatives; Libraries Cooperative Projects; Local / Individuals; Personal Archiving]
Definition: “Data”? [technological]
“…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” -- US NSF Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)”
Taken in this broadest possible sense, “data” are thus simply electronically coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.
“Most commonly, computer scientists are concerned with digital objects that are defined as a set of sequences of bits. One can then ask computationally based questions about whether one has the correct set of sequences of bits, such as whether the digital object in one's possession is the same as that which some entity published under a specific identifier at a specific point in time… However, this is a simplistic notion. There are additional factors to consider.” Clifford Lynch, “Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust,” http://www.clir.org/pubs/reports/pub92/lynch.html
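The “computationally based question” in the passage above, whether the bits in one's possession match what was published, is commonly answered by comparing cryptographic digests. The following is a minimal sketch of that idea, assuming a SHA-256 digest was recorded at publication; the function names and sample bytes are hypothetical and do not come from Lynch's text.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte sequence as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_bits(published_digest: str, local_copy: bytes) -> bool:
    """True if the local copy hashes to the digest recorded at publication."""
    return sha256_digest(local_copy) == published_digest


# Hypothetical published object and the digest recorded for it.
published = b"station,temp_c\nA1,14.2\n"
digest_at_publication = sha256_digest(published)

intact_copy = b"station,temp_c\nA1,14.2\n"
altered_copy = b"station,temp_c\nA1,14.3\n"  # a single character differs

print(same_bits(digest_at_publication, intact_copy))   # True
print(same_bits(digest_at_publication, altered_copy))  # False
```

As the quotation itself warns, a matching digest only establishes that the bit sequences agree; it says nothing about who published them or whether the content is valid.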
Definition: “Data” [epistemic]“Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)” -- AnthroDPA Working Group on Metadata (May, 2009)
Data transformations and the risks of loss of integrity -- (we must fully analyze the etiology of data degradation!)
“The Data Life Cycle” #1
US NSF “DataNet” Program: “the full data preservation and access lifecycle”
• “Acquisition”
• “Documentation”
• “Protection”
• “Access”
• “Analysis and dissemination”
• “Migration”
• “Disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601, US National Science Foundation, Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering
www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf “The Data Life Cycle” #2
“The Data Life Cycle” #3 http://wiki.esipfed.org/images/c/c4/IWGDD.ppt
“Quality of Data”???
• “Validity”: logical decisive force in support of an hypothesis or proposition – may be discipline or domain specific (but must also be made publicly, popularly intelligible...!)
• “Integrity”:
– Acquisition of Data: The use of certified personnel, methods and equipment to collect and record the data? (Were these methods and equipment appropriately selected, calibrated, deployed and operated properly? Were the human agents properly trained and certified for the work they performed?)
– Management and Maintenance of Data: Secure maintenance and proper management of data? (Accordance with established best practices?) Maintenance of the integrity of original data includes documentation of the chain of custody and uses all appropriate methods and metrics that establish data have not been inadvertently lost or changed.
– Transformations of Data: Proper execution and documentation of all data transformations. (Have all transformations been validly applied to ensure the integrity of original data? Can the complete documented “audit trail” of data be analyzed to reveal provenance and lineage, the integral, original data sources?)
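One way to make a documented “audit trail” of transformations checkable is to chain hashed log entries, so that any later alteration of the record becomes detectable. The sketch below is one possible illustration under that assumption, not a method described in the presentation; the function names, entry fields and sample data are all hypothetical.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 digest of a byte sequence, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_step(trail: list, description: str, output_data: bytes) -> None:
    """Append an entry that is linked to the previous entry by its hash."""
    prev = trail[-1]["entry_hash"] if trail else ""
    entry = {
        "step": description,
        "data_hash": digest(output_data),
        "prev_entry_hash": prev,
    }
    # Hash the entry body in a canonical (sorted-key) JSON form.
    entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True).encode("utf-8"))
    trail.append(entry)


def verify_trail(trail: list) -> bool:
    """Recompute every entry hash and check each link to its predecessor."""
    prev = ""
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_entry_hash"] != prev:
            return False
        if digest(json.dumps(body, sort_keys=True).encode("utf-8")) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


# Hypothetical lineage: original readings, then a unit conversion.
trail = []
record_step(trail, "acquired raw readings (deg C)", b"14.2,15.1,13.9")
record_step(trail, "converted to deg F", b"57.56,59.18,57.02")
print(verify_trail(trail))  # True

# Rewriting the record of an earlier step breaks verification.
tampered = [dict(e) for e in trail]
tampered[0]["step"] = "acquired raw readings (deg F)"
print(verify_trail(tampered))  # False
```

A trail like this documents lineage and detects edits to the record; it does not by itself show that each transformation was validly applied, which remains a matter for domain review.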
“…the “validation” of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.” – Tom Moritz, “The Burden of Proof,” GRDI2020 Position Paper, October 23-24, 2010
“Evidence”?
“Data having decisive probative value (in terms of applicable rules of evidence), being demonstrably valid (i.e. well supported by scientific logic) and having technically demonstrable integrity.” – Tom Moritz, “The Burden of Proof,” GRDI2020 Position Paper, October 23-24, 2010
Data as Evidence:
• “Evidence” implies intended probative force
• Clear distinctions between “rules of evidence” applied in different domains (i.e. appropriate fitness for use within domains):
– In science
– In law (forensics)
– In policy formation (deliberation / legislation / law)
– In regulation (administrative code, enforcement, professional / industrial norms)
– In journalism
– [In religion or other belief systems (generally “authority-based”)]
• “Peer review” in various forms is arbiter (academy, profession, discipline, jury, review board, public…)
• “Independence” / “impartiality” is an essential quality of such reviews
“Rules of Evidence”?
• Scientific:
– professional / disciplinary standards / peer certification
• Law / Forensics:
– “Justice as Fairness” (John Rawls)
– Precedent / judicial review
– “Reasonable Doubt” (the “clear conscience” standard)
• Deliberation / Policy Formation (polemical / rhetorical)
• Regulatory: (code)
• Journalistic: corroboration and review (including letters to editor)
• Religious: authority (textual and clerical) and testimonials (evidence of miracles)
Forms of scientific peer review (“organized skepticism”) and “jurying” in science
Existing scientific conventions and practices:
• Admission to scientific programs
• Academic performance
• Internships, “apprenticeships”
• Awarding degrees
• Hiring, promotion and tenure
• Collegiality and mentoring (personal communications)
• Editorial board reviewing
• Grant reviewing
• Book reviews
• Letters / Notes
• Blogs and recommender-style “commenting”
• Annotation
• Awards and other recognition
A “regulatory” example of industrial standards for peer certification: “A regulatory standard from the US Pharmaceutical Industry”
“The objective of the analytical procedure should be clearly understood since this will govern the validation characteristics which need to be evaluated. Typical validation characteristics which should be considered are listed below:
“Accuracy”: “The accuracy of an analytical procedure expresses the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found.”
“Precision”: “The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions.”
“Repeatability”: “Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.”
“Intermediate Precision”: “Intermediate precision expresses within laboratories variations: different days, different analysts, different equipment, etc.”
“Reproducibility”: “Reproducibility expresses the precision between laboratories”
“Specificity”: “ability to assess unequivocally the analyte in the presence of components which may be expected to be present.”
“Detection Limit”: “…the lowest amount of analyte in a sample which can be detected but not necessarily quantitated as an exact value.”
“Quantitation Limit”: “…the lowest amount of analyte in a sample which can be quantitatively determined with suitable precision and accuracy.”
“Linearity Range”: “ability (within a given range) to obtain test results which are directly proportional to the concentration (amount) of analyte in the sample.”
“Robustness”: “capacity to remain unaffected by small, but deliberate variations in method parameters and provides an indication of its reliability during normal usage.”
“Guideline for Industry: Text on Validation of Analytical Procedures” by The Expert Working Group (Quality) of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) - endorsed by the ICH Steering Committee at Step 4 of the ICH process, October 27, 1994.
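The distinction between “accuracy” and “precision” in the definitions above can be illustrated numerically. The sketch below assumes one common convention: accuracy summarized as the difference between the mean of replicate measurements and an accepted reference value, and precision as the standard deviation (or relative standard deviation) of those replicates. The function names and the replicate values are hypothetical, not taken from the guideline.

```python
import statistics


def accuracy_bias(measurements: list, reference_value: float) -> float:
    """Mean of the measurements minus the accepted reference value."""
    return statistics.mean(measurements) - reference_value


def precision_sd(measurements: list) -> float:
    """Sample standard deviation: the degree of scatter among replicates."""
    return statistics.stdev(measurements)


def relative_sd_percent(measurements: list) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(measurements) / statistics.mean(measurements)


# Hypothetical replicate assays of one homogeneous sample (arbitrary units).
replicates = [9.8, 10.1, 10.0, 9.9, 10.2]
reference = 10.0  # accepted reference value, assumed for illustration

print(f"bias: {accuracy_bias(replicates, reference):+.3f}")
print(f"standard deviation: {precision_sd(replicates):.3f}")
print(f"relative SD: {relative_sd_percent(replicates):.2f}%")
```

A procedure can be precise without being accurate (tight scatter around the wrong value) and vice versa, which is why the two characteristics are listed separately.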
Journalism:
• BBC accountability for presenting irresponsible “counterpoints” to climate change [sensationalist approach to journalism?]
• Distinctions between journalism / punditry / entertainment?
• Why are “climate deniers” without scientific credentials presented as “counterpoints” to serious scientific consensus on climate change?
• Would “flat-earthers” or “gravity-deniers” be accorded such respect and given such a platform?
SEE: Nature editorial comment ca Oct. 18, 2010